G3: Genes|Genomes|Genetics Early Online, published on July 3, 2019 as doi:10.1534/g3.119.400429

Single-gene deletions contributing to loss of heterozygosity in Saccharomyces cerevisiae: genome-wide screens and reproducibility

Kellyn M. Hoffert* and Erin D. Strome*

*Department of Biological Sciences, Northern Kentucky University, Highland Heights, Kentucky, 41099


© The Author(s) 2013. Published by the Genetics Society of America.

S. cerevisiae heterozygous LOH screen

Keywords: S. cerevisiae, loss of heterozygosity, MAT, MET15, screen comparison

Erin D. Strome, FH359G Nunn Drive, Highland Heights, KY 41099, 859-572-6635, [email protected]


ABSTRACT

Loss of heterozygosity (LOH) is a phenomenon commonly observed in cancers; the loss of chromosomal regions can be both causal and indicative of underlying genome instability. Yeast has long been used as a model organism to study genetic mechanisms difficult to study in mammalian cells. Studying gene deletions leading to increased LOH in yeast aids our understanding of the processes involved, and guides exploration into the etiology of LOH in cancers. Yet, before in-depth mechanistic studies can occur, candidate genes of interest must be identified. Utilizing the heterozygous Saccharomyces cerevisiae deletion collection (≈ 6500 strains), we identified 217 genes whose disruption leads to increased LOH events at the endogenously heterozygous mating type locus. Our investigation to refine this list of genes to candidates with the most definite impact on LOH includes: secondary testing for LOH impact at an additional locus, analysis to determine common gene characteristics, and positional gene enrichment studies to identify chromosomal regions important in LOH events. Further, we conducted extensive comparisons of our data to screens with similar, but distinct, methodologies to distinguish genes that are more likely to be true contributors to instability because they reproduce, and were not identified only because of the stochastic nature of LOH. Finally, we selected nine candidate genes and quantitatively measured their impact on LOH as a benchmark for the impact of genes identified in our study. Our data add to the existing body of work and strengthen the evidence of single-gene knockdowns contributing to genome instability.


INTRODUCTION

Genome instability underlies a multitude of changes observed during tumorigenesis. The accumulation of mutations, both as drivers and as a result of tumor formation, gives important insight into cancer progression (Loeb et al. 2003). For most genes conferring a protective effect against tumorigenesis, loss of function alterations need to occur in both alleles before the development of cancer phenotypes (Kacser and Burns 1981; Knudson 1993). However, some genes have been discovered to impart haploinsufficient effects whereby a mutation event in only one allele leads to an abnormal cellular phenotype (Veitia 2002; Trotman et al. 2003; Alimonti et al. 2010; Berger et al. 2011). In most cancers, the accumulation of mutations leading to tumorigenesis does not happen only at the single nucleotide level, but also through large chromosomal gains or losses (Lengauer et al. 1998; Loeb et al. 2003; Giam and Rancati 2015; Gao et al. 2016). Loss of heterozygosity (LOH) events are one such type of genome instability implicated in tumorigenesis and can arise from a myriad of underlying mechanisms (Fig 1) (Knudson 1971; Vladusic et al. 2010; Choi et al. 2017; for additional review see Levine 1993; Thiagalingam et al. 2002; Payne and Kemp 2005).

Identifying driver mutations responsible for LOH events has proven a difficult task in mammalian cells; by the time tumors are large enough to be detected and examined, tens if not hundreds to thousands of mutations have occurred. Utilizing Saccharomyces cerevisiae for studying genomic instability mechanisms, including LOH, has long been an invaluable resource in furthering our understanding of molecular underpinnings (for review see Mager and Winderickx 2005; Botstein and Fink 2011). Much of the knowledge gained through yeast can then be applied to higher eukaryotic organisms.

With the creation of the S. cerevisiae gene knockout collections, large scale surveys for genes implicated in a particular phenotype can be systematically conducted (for review see Giaever and Nislow 2014). The 6,477 strains included in the heterozygous deletion collection (hereafter referred to as SCDChet) target both essential and non-essential genes. In this study, we use SCDChet to systematically screen for genes with a haploinsufficient effect resulting in increased LOH events. We first utilize the mating type (MAT) locus, on chromosome III, which is endogenously heterozygous, MATa/MATα, in diploid yeast cells. The non-mating diploid phenotype results from a co-repressive mechanism driven by the MATa and MATα alleles (Duntze et al. 1970; Herskowitz 1995; Schmidlin et al. 2007; Haber 2012). When an LOH event occurs at this locus, the co-repressive mechanism is no longer active to prevent mating, and the diploid cell mates as if it were haploid. As a secondary assay to confirm increases in LOH events, mutants identified as top-hits for LOH at the MAT locus were tested for increases in LOH at the separate MET15 locus on chromosome XII. The SCDChet is heterozygous at MET15 (met15Δ/MET15) and a color-identifying sectoring assay allows LOH at this locus to be readily measured (Cost and Boeke 1996).

The MAT and MET15 loci have been exploited as markers in other LOH screens interested in identifying heterozygous and/or homozygous knockouts that increase genome instability (Yuen et al. 2007; Andersen et al. 2008; Choy et al. 2013). Our goals were therefore two-fold: first, to identify genes with haploinsufficiency impacts on genome stability, and second, by performing a variety of follow-up analyses, to identify those top-hits with the most potential to be cancer susceptibility genes. These follow-up analyses include gene ontology characterizations, positional gene enrichment identification, and multiple comparisons to other screens interested in identifying genes with impacts on genome stability, to determine the enrichment and reproducibility of gene identification. These analyses not only give insight into candidate cancer susceptibility genes that have a haploinsufficient impact, but also deepen our understanding of the differences in results due to variations in screening set-ups and the stochastic nature of LOH events. By extensively documenting the methodology used, as well as comparing datasets across multiple screens, we aim to sort through much of the noise of screen results and avoid the far too common crisis of reproducibility (Yamada and Hall 2015; Drucker 2016).

MATERIALS AND METHODS

Creation of mating tester haploids

One MATa haploid (MATa: his3-∆200, trp1-∆1, CF:[ura3: TRP1 SUP11 CEN4 D8B], leu2-3, lys2-801, ura3-52, ade2-101) and one MATα haploid (MATα: his3-∆200, trp1-∆1, CF:[ura3: TRP1 SUP11 CEN4 D8B], leu2-3, ura3-52, ade2-101) were transformed using lithium acetate transformation with a HIS3MX cassette, carrying the Schizosaccharomyces pombe HIS5 gene, to replace the wildtype CAN1 locus. This resulted in the production of haploid strains EDS585a (MATa: his3-∆200, trp1-∆1, CF:[ura3: TRP1 SUP11 CEN4 D8B], leu2-3, lys2-801, ura3-52, ade2-101, can1::HIS5) and EDS588α (MATα: his3-∆200, trp1-∆1, CF:[ura3: TRP1 SUP11 CEN4 D8B], leu2-3, LYS2, ura3-52, ade2-101, can1::HIS5), hereafter referred to as 585a and 588α.

Heterozygous deletion collection (SCDChet) screen for LOH at the MAT locus

The heterozygous deletion collection (6,477 strains) was purchased from Open Biosystems (YSC1055). Three initial copies of the collection were made (200 μL YPD + G418 (0.2 mg/mL)); 10 μL of cells were transferred to a 96-well plate and grown overnight at 30°, 60 μL of 67% glycerol was added, and plates were kept frozen at -80°. Haploid strains 585a and 588α were struck from freeze-down stocks and grown for 3 days. One colony from each strain was inoculated individually into 100 mL SC-His and grown overnight on a shaker. On the same day, deletion collection plates were thawed and new copies were made of each plate, grown in 200 μL YPD + G418 (0.2 mg/mL) with 20 μL starting inoculum of deletion collection cells. These copies were incubated overnight at 30° to allow LOH events to occur. The deletion collection cells were mixed with either 585a or 588α haploid cells (200 μL YPD, 10 μL SCDChet cells, 10 μL haploid cells) and grown for 24 hours at 30° to allow mating events to occur. The cells were then spun down at 2,500 rpm for 4 minutes, the supernatant was removed, and the cells were washed with 200 μL ddH2O. The cells were again spun down at 2,500 rpm, the supernatant was removed, and the cells were resuspended in 200 μL ddH2O. Resuspended cells were pinned to SC-His + G418 (0.4 mg/mL) in duplicate and incubated for 48 hours at 30° before colony counts were taken. A scoring system was utilized to record colony count ranges. Pinned spots with no colonies were recorded as “0,” 1-9 colonies were scored as “+,” 10-19 as “++,” 20-29 as “+++,” and any colony counts above 30 were scored as “++++” (Fig 2C). Two complete trials for each haploid mater were conducted, utilizing different copies of the SCDChet collection for each trial.
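The colony-count bins described above can be written as a small lookup. The following Python sketch is illustrative only and is not code used in the screen; the function names `plus_score` and `plus_label` are hypothetical. The text does not say how a count of exactly 30 was binned, so the sketch assumes it belongs to the top category.

```python
def plus_score(colony_count: int) -> int:
    """Map the colony count of one pinned spot to a score of 0-4 ('+' symbols)."""
    if colony_count < 0:
        raise ValueError("colony count cannot be negative")
    if colony_count == 0:
        return 0          # recorded as "0"
    if colony_count <= 9:
        return 1          # "+"
    if colony_count <= 19:
        return 2          # "++"
    if colony_count <= 29:
        return 3          # "+++"
    # Assumption: a count of exactly 30 is grouped with the top bin,
    # since the text only specifies "above 30" for "++++".
    return 4              # "++++"


def plus_label(score: int) -> str:
    """Render a numeric score as the label used in the scoring scheme."""
    return "+" * score if score else "0"
```

Keeping the score numeric makes it straightforward to sum replicate scores later, while `plus_label` recovers the notation used in Fig 2C.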

The methodology used in this screening was verified through a series of control experiments. 585a and 588α were put through the screening protocol as described above, but without SCDChet cells introduced to the samples. Haploid tester strains were grown in duplicate and 20 μL samples were placed into 96-well plates. None of the 192 wells per haploid pinned onto SC-His + G418 (0.4 mg/mL) contained colonies, indicating that false positives from the haploid testers growing alone on double selection plates are unlikely. To screen for false positives originating from SCDChet cells, five SCDChet plates were selected at random and put through the screening protocol above without the addition of a haploid tester. Of the 413 unique strains pinned, 23 samples contained colonies. Fourteen of these strains produced lawn growth rather than single colonies, which we interpret as their starting with a genotype conferring an ability to grow on SC-His + G418 (0.4 mg/mL) without needing to mate with 585a or 588α. (When assayed again in the full screen, these 14 wells continued to show lawn growth when incubated with either the 585a or 588α strains.) Therefore, these wells, as well as any other wells that showed lawn growth after incubation with both 585a and 588α, were excluded as top-hits. The wells identified with this characteristic can be found in Table S1. Of the remaining nine wells that contained colonies, none showed background growth scoring above “++” on the scale established. This indicates a low level of false positives (nine of 413 strains, approximately 2%) if any amount of growth is classified as a positive hit. This informed our decision on a threshold for ‘top-hits’ from our screen (as discussed later), ensuring that background events of this type are not sufficient to categorize a gene as a ‘top-hit’. 585a and 588α were also mated with control haploids containing KANMX cassettes to determine false negative rates. Two different control MATa and two different control MATα strains, with KANMX cassettes inserted at different locations in the genome, were utilized. All conditions were kept the same as in the screening described above, but instead of inoculating SCDChet cells, the KANMX-containing haploid controls were inoculated into 100 mL YPD + G418 (0.2 mg/mL). Opposite mating types were paired. All pairings, 32 samples in total, resulted in uncountable lawn growth or colony counts greater than thirty (++++ score). This verifies that strain mating occurs in the incubation time allotted, scoring at the top level of growth, and gives no indication of false negatives.

Secondary screen for LOH at MET15

Methodology was adapted from previous screenings (Ono et al. 1991; Cost and Boeke 1996; Andersen et al. 2008); briefly, the 217 top-hits identified in the MAT locus LOH assay were additionally evaluated to determine if LOH was also increased at a secondary locus. When cells lacking functional MET15 are placed on plates containing lead (Pb2+), the excess sulfur precipitates as dark-colored lead (II) sulfide (PbS) (Cost and Boeke 1996). Cells must contain at least one functional copy of MET15 to remain white in appearance; cells that have undergone LOH at MET15 and no longer maintain a functional copy will appear as a black/brown sector. Two microliters of each SCDChet top-hit were inoculated into 200 μL YPD and incubated for three days at 30° to allow LOH events to occur. Cells were then pinned in triplicate to plates containing 0.7 mg/mL lead nitrate (for full plate recipe see Guide to Yeast Genetics and Molecular Cell Biology, Elsevier, 2002), and allowed to grow at 30° for four days. Plates were then placed at 4° for 24 hours to aid in color development before sectors were counted. As with other published methodologies (Ono et al. 1991; Cost and Boeke 1996; Andersen et al. 2008), pinned samples grew as patches, not individual colonies, and sectors were counted as the total number of dark growth regions within the larger patch. The screen was run twice (two biological replicates); strains were scored as positive for LOH at MET15 if an average of more than 13 sectors per replicate appeared across the 6 replicates when analyzed at 24x magnification. Eight mutants (KIN3, MET17, STH1, MVD1, HRB1, BSC5, VAM6, and YKR073C) could not be analyzed for sectoring because they presented as entirely brown colonies from the initial pinning to the lead-containing plate.

The methodology used in this screening was verified through a series of control experiments. To determine the baseline rate for LOH at MET15 for the SCDChet collection, we utilized two biological replicates of a plate chosen from the SCDChet collection that contains no strains identified as increasing LOH at MAT in our primary screening assay. To ensure that LOH events that happened earlier, and thus resulted in a larger sector, were not unduly overrepresented in our results due to increased visibility, we utilized a dissecting microscope for analysis of sectors. Sectors were counted at 24x magnification and averaged across all 94 strains on the plate for 3 technical replicates. Analysis at 24x magnification revealed a high level of sectoring in these strains, with an average of 11 sectors. Previous screens utilizing the MET15 locus and analyzing sector appearance did so without magnification and set their threshold for classification as a hit at 2 sectors (Andersen et al. 2008). We therefore set our top-hits threshold at +2 over the average seen in this control set of analyzed strains; that is, 13 or more sectors was taken to indicate an increase in LOH over the baseline.
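The threshold arithmetic above (control average of 11 sectors plus a margin of 2) can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code; the name `met15_positive` and its `inclusive` flag are hypothetical. The Methods phrase the cut-off both as "more than 13" and as "13 or more", so the sketch exposes that choice as a parameter and assumes the inclusive reading by default.

```python
from statistics import mean

BASELINE_SECTORS = 11                    # control-plate average reported in the text
MARGIN = 2                               # offset adopted from the unmagnified threshold
THRESHOLD = BASELINE_SECTORS + MARGIN    # 13 sectors


def met15_positive(sector_counts, threshold=THRESHOLD, inclusive=True):
    """Decide whether a strain's mean sector count exceeds the baseline cut-off.

    sector_counts: the six counts (2 biological x 3 technical replicates).
    inclusive: assumption; True treats a mean of exactly `threshold` as positive.
    """
    avg = mean(sector_counts)
    return avg >= threshold if inclusive else avg > threshold
```

For example, a strain with counts of 13 in all six replicates is positive under the inclusive reading and negative under the strict one, which is the only case in which the two wordings disagree.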

It has been previously documented (Bae et al. 2017) that the SCDChet strain that contains the knockout for YLR303W/MET17 (alias of MET15) is genotyped as met15∆::kanMX/met15∆. Due to its lack of functioning MET15, all cells of this strain plated on Pb2+-containing plates should appear entirely black/brown. This phenotype was confirmed in all replicates (two biological, three technical) where this strain was subjected to the screening protocol. This strain serves as a positive control for this secondary screening.

Fluctuation Analysis Benchmarking of Top Candidate Gene Deletions

The parental strain used to create SCDChet was ordered from Open Biosystems (BY4743). This strain underwent lithium acetate transformation with a can1::HIS3 cassette in order to render the CAN1 locus heterozygous. Mutant strains containing the heterozygous knockouts of SSA1, CCT7, MRC1, RSM7, HSL7, SSE2, PUG1, YOR364W, and MEC1 were grown from SCDChet and subjected to the same can1::HIS3 switch out via lithium acetate transformation. Strains were then struck for individual colonies and grown at 30° until they reached ~3 mm, allowing LOH events to occur. Twenty-four individual colonies per strain were then resuspended in water and assessed for optical density, and the 15 colonies with the most similar size, as read by absorbance at 562 nm greater than 0.5, were used for analysis. The rate of LOH events resulting in a change to the CAN1 locus was determined by plating dilutions on non-selective plates (YPD, for population size) and on SC-Arg plus canavanine (60 μg/mL) plates (for LOH events). Plates were grown for 3-5 days at 30°, followed by colony counting. LOH rates with 95% confidence intervals (CI) were calculated by fluctuation analysis utilizing the R advanced calculation package Salvador (rSalvador) (Zheng 2002, 2008, 2016). The LOH rates and confidence intervals were measured for two biological replicates for each strain.
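To make the idea of a fluctuation-analysis rate estimate concrete, the Python sketch below estimates the expected number of events per culture, m, by maximum likelihood under the Lea-Coulson form of the Luria-Delbrück distribution, using the Ma-Sandri-Sarkar recursion, and divides by the number of cells per culture. It is not the rSalvador implementation and does not reproduce its confidence intervals. The function names, the golden-section search, and the assumption that the whole culture is plated (no correction for plating efficiency or dilution) are all simplifications introduced here.

```python
import math


def ld_probabilities(m: float, n_max: int) -> list:
    """P(0..n_max mutants per culture) via the Ma-Sandri-Sarkar recursion."""
    p = [math.exp(-m)]
    for r in range(1, n_max + 1):
        s = sum(p[i] / (r - i + 1) for i in range(r))
        p.append(m / r * s)
    return p


def log_likelihood(m: float, counts) -> float:
    """Log-likelihood of the observed per-culture counts for a given m."""
    p = ld_probabilities(m, max(counts))
    return sum(math.log(p[c]) for c in counts)


def mle_m(counts, lo: float = 1e-6, hi: float = None, tol: float = 1e-8) -> float:
    """Golden-section search for the m that maximises the log-likelihood.

    Assumption: the log-likelihood is unimodal on [lo, hi].
    """
    if hi is None:
        hi = max(counts) + 1.0
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d          # maximum lies in [a, d]
        else:
            a = c          # maximum lies in [c, b]
        c, d = b - g * (b - a), a + g * (b - a)
    return (a + b) / 2


def loh_rate(counts, cells_per_culture: float) -> float:
    """Events per cell per division, taken as m divided by final cell number."""
    return mle_m(counts) / cells_per_culture
```

With 15 cultures per strain, `counts` would hold the 15 canavanine-resistant colony counts and `cells_per_culture` the population size estimated from the YPD plates; any dilution factors would need to be accounted for before calling `loh_rate`.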

Data Availability Statement

Strains are available upon request. Table S1. Table of complete MAT locus LOH screen data. Table S2. Top hits identified in MAT locus LOH screen. Table S3. Sector counts of the 217 SCDChet gene deletions analyzed in secondary MET15 screen. Table S4. GO Slim Mapper results for the 217 identified genes in the MAT locus screen. Table S5. GO Slim Mapper results for the 100 identified genes in the MET15 secondary screen. Table S6. GO Slim Mapper and GO Term Finder results for the 91 genes identified in two or more related independent studies. Table S7. Positional Gene Enrichment analysis of the 217 MAT Top Hits, 100 MAT + MET15 Top hits, and 91 genes identified in 2+ independent studies. Table S8. Human homologs and their associated cancer types for genes identified by the MAT locus screen. Table S9. Human homologs and their associated cancer types for genes identified by multiple independent studies. Table S10. Positional Gene Enrichment analysis of genes identified by similar independent studies. Figure S1. Visual representation of Positional Gene Enrichment data for enriched region on chromosome II. All supplemental files are available in the GSA Figshare portal.

RESULTS

MAT Locus Screen For Genes With Haploinsufficiency Effects On LOH

To identify genes that, when heterozygously mutated, resulted in increased incidence of LOH at the MAT locus, the SCDChet collection was screened. SCDChet cells were grown in 96-well plates and paired separately with MATa and MATα haploid tester strains. When pinned onto double selection media, only cells that had mated and formed triploids grew (Fig 2A and 2B). The entire collection was screened twice (biological replicates) with technical replicates performed each time. Furthermore, this was done with both MATa and MATα haploid maters, resulting in 8 data points for each SCDChet strain. Each trial stamped on SC-His + G418 plates was counted independently for growth and assigned a score of 0, +, ++, +++, or ++++ (Fig 2C). The scores of all four replicates were summed for an SCDChet strain with a given haploid pairing; any combination of “+” scores adding to 12 or higher was further analyzed for consideration as a ‘top-hit’. For comparison, across the whole SCDChet collection, the average amount of growth seen on the double-selection plates scored as a single “+” for any one spot, and a summed plus score of three across the four replicates. Further, any pairing that resulted in a score of twelve but contained a replicate that scored “++” or below was removed from top-hit consideration due to the possibility of it being a false positive. Strains that were annotated in the SCDChet database provided by Open Biosystems as having a phenotype related to mating were excluded from consideration as a top-hit, as their ability to mate did not represent an LOH event, but an inherent characteristic of that particular mutant. This screening mechanism resulted in 217 heterozygous gene mutations, approximately 3.4% of the genome, being scored as top-hits (Table S2). For a listing of all scores of all strains from this study, see Table S1.
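The top-hit rules stated above for a single strain and haploid pairing can be summarized in a few lines of Python. This is a hedged sketch of the rules as written, not the procedure used to generate Table S2; the function name and the two boolean flags (`lawn_growth`, `mating_phenotype`) are hypothetical stand-ins for the lawn-growth and mating-annotation exclusions described in the text.

```python
def is_top_hit_candidate(replicate_scores, lawn_growth=False, mating_phenotype=False):
    """Apply the summed-score rules to one strain paired with one haploid tester.

    replicate_scores: four integers in 0-4, the number of '+' per replicate.
    lawn_growth: assumption; True if the well showed lawn growth with both testers.
    mating_phenotype: assumption; True if the strain is annotated as mating-related.
    """
    if len(replicate_scores) != 4:
        raise ValueError("expected four replicate scores per pairing")
    if any(s < 0 or s > 4 for s in replicate_scores):
        raise ValueError("scores must lie between 0 and 4")
    if lawn_growth or mating_phenotype:
        return False
    total = sum(replicate_scores)
    if total < 12:
        return False
    # A summed score of exactly 12 that includes a replicate at "++" or
    # below is treated as a possible false positive and dropped.
    if total == 12 and min(replicate_scores) <= 2:
        return False
    return True
```

Under these rules a pairing scoring "+++" in all four replicates (sum 12) qualifies, whereas one scoring "++++", "++++", "++", "++" (also sum 12) does not.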

Secondary Screen For Haploinsufficiency Effects On LOH At An Alternate Locus (MET15)

To further understand the extent of the impact of these 217 top-hit heterozygous gene mutations on LOH events, these strains were screened for their LOH impacts at an alternate secondary locus: the MET15 gene on chromosome XII. Each identified top-hit SCDChet strain was grown in a 96-well plate for three days, pinned to lead-containing plates, then grown into a patch over four days until color developed. When pinned onto plates containing Pb2+, cells within a patch that have undergone an LOH event will appear as black/brown sectors. All 217 top-hit SCDChet strains were tested twice (biological replicates) and stamped onto lead-containing plates in triplicate each time (technical replicates), resulting in 6 sector counts for each mutant strain. One hundred of our initial 217 top-hit heterozygous gene deletions induced increased LOH events at the secondary MET15 locus. This indicates that ~48% of our initially identified heterozygous gene mutations have large reproducible effects on LOH at multiple loci, representing 1.5% of the genome. Eight of the initial 217 strains could not have sectors counted for LOH at MET15 because they presented a brown phenotype throughout the entire patch. See Table S3 for MET15 sectoring data for each tested strain.

Multiple Screen Comparison To Identify Reproduced Results From Independent Studies Assessing LOH And/Or Haploinsufficiency Induced Genome Instability

To further expand our analysis of the reproducibility of gene identification, we compared our results to previously reported screens that asked similar questions about genes contributing to genome instability. Screens were selected for comparison based on a) utilizing heterozygous knockouts to screen for genome instability, b) screening homozygous knockouts specifically for LOH, or c) a combination of the two (for a visual representation of screen selection, see Fig 3A).

Choy et al. The greatest number of overlapping hits, 26, was seen when our results from the MAT locus screen were compared to the screen that most mirrored our own, having utilized the SCDChet to screen for LOH at the MAT locus (Fig 3B) (Choy et al. 2013). This overlap is statistically significant at p = 3.995 × 10^-5 when a hypergeometric probability is calculated using a normal approximation (Table 1). (Ten of the overlapping genes, APD1, SSE2, HSM3, FRM2, MRH1, LSM4, TMN3, PUG1, RPL36B, and CCT7, also reproduced in our secondary screen for increasing LOH at MET15.) The Choy et al. screen arrayed four samples from each well of the SCDChet onto solid YPD media with either a MATa or MATα haploid tester to allow for mating. The mating results for each pinned position were then summed to result in a score of 0-4 for each. The entire screen was performed in triplicate; strains that had a score ≥ 2 SD above the mean were included as top-hits.

Strome et al. Previously, an independent screen was conducted using heterozygous knockouts generated by random insertional mutagenesis to identify gene mutations involved in genome instability, specifically chromosome transmission fidelity (CTF) (Strome et al. 2008). Heterozygous knockouts containing a chromosome fragment (CF) allowed for visual identification of fragment loss due to insertional mutagenesis induced genome instability (Hieter et al. 1985). The SCDChet was not used in this screening. Of the 164 hits identified, 4 (2.4% of their 164) overlap with genes we have identified for increasing MAT LOH, representing 1.8% of our dataset (Fig 3B). Two of the identified gene deletions, RLI1 and BPH1, also increased LOH at MET15.

Andersen et al. Screens that assayed for LOH events, even if not utilizing heterozygous deletions, can still provide valuable insight. This study (Andersen et al. 2008) utilized the homozygous deletion collection (SCDChom) and looked at LOH events at three different loci. Their primary screen utilized the intrinsic heterozygosity at the MET15 locus in SCDChom to measure LOH. To further examine genes of interest identified in their initial screen, they constructed a strain with Many Heterozygous Markers (MHM) on chromosomes III, IV, and XII to understand the extent of the events at various loci. From the initial MET15 screening in SCDChom, they identified 132 gene deletions that resulted in increased LOH. They were able to successfully recreate 114 of these knockouts in their MHM background, which they screened again for MET15 loss at elevated rates. Of the 114 MHM knockouts, 61 again demonstrated elevated LOH at MET15. These 61 knockout strains were then examined further for the extent of their LOH activity at two additional loci, SAM2 and MAT. Additional assays identified 26 of these genes with effects increasing LOH events across three independent loci. In comparing all genes identified as increasing LOH from their screening data, we observe three overlaps with our gene list: SLX5, IRA2, and ICE2 (Fig 3B). A representation factor (Rf) calculated for this overlap is greater than 1 (Rf = 1.4), indicating more overlap than expected, although not at a p-value < 0.05 (Table 1). Our secondary MET15 screen also identified SLX5 and ICE2 knockdowns as contributing to increased MET15 LOH. These three genes represent a 4.9% overlap in their dataset and 1.7% of our data (when corrected to remove essential genes not identifiable in their screening method).
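The overlap statistics used in these comparisons can be illustrated with a short Python sketch. The representation factor is taken here in its usual sense, the observed overlap divided by the overlap expected by chance for two lists drawn from the same gene pool; the accompanying p-value is an exact hypergeometric upper tail rather than the normal approximation mentioned above. The function names are hypothetical, and the gene-pool size passed as `population` is an input the reader must supply, since the value used for Table 1 is not stated in this section.

```python
from math import comb


def representation_factor(overlap: int, size_a: int, size_b: int, population: int) -> float:
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / population
    return overlap / expected


def hypergeom_upper_tail(overlap: int, size_a: int, size_b: int, population: int) -> float:
    """P(X >= overlap) when list B (size_b genes) is drawn without replacement
    from a pool of `population` genes that contains the size_a genes of list A."""
    total = comb(population, size_b)
    top = min(size_a, size_b)
    favourable = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, top + 1)
    )
    return favourable / total
```

An Rf above 1 indicates more overlap than expected by chance and an Rf below 1 indicates less; the tail probability indicates whether that excess is distinguishable from chance for the list sizes involved.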

Yuen et al. Another screen looking at LOH events due to homozygous mutations once again utilized SCDChom but examined the presence of the bi-mater (BiM) phenotype (among other genome instability assays) (Yuen et al. 2007). The BiM assay measures LOH events at the MAT locus that allow for mating with haploid testers of both mating types. When comparing top-hits between our screens, 5 genes overlap as having elevated levels of LOH (Fig 3B). A representation factor (Rf) calculated for this overlap is greater than 1 (Rf = 1.2), indicating more overlap than expected, although not at a p-value < 0.05 (Table 1). This represents 2.8% of top-hit data from our screen (corrected to remove essential genes not identifiable with the SCDChom), and 4.1% of their list of identified genes that lead to a BiM phenotype. Two genes identified in both screens, IML3 and SGO1, reproduced in our secondary MET15 screening.

Schmidlin et al. A third screen utilizing SCDChom again analyzed mating capabilities with haploid testers to examine LOH (Schmidlin et al. 2007). Mating pairs of SCDChom cells and a haploid tester were pinned to plates of minimal medium that allowed for triploid selection; growth of four or more colonies in a sample was considered a positive hit. One hundred homozygous gene deletions were identified in this initial screen, and seven of those overlap with top-hits found in our screening (Fig 3B). A representation factor (Rf) calculated for this overlap is greater than 1 (Rf = 2.0), indicating more overlap than expected, although at a p-value = 0.06 (Table 1). Only including our non-essential hits, this represents 3.9% of our dataset, and 7% of the strains initially identified in their screening. Eighty-nine homozygous deletions were then remade by mating the corresponding deletion collection haploid strains to make new homozygous diploids in the same background. In assays to confirm the mating phenotype, six reconfirmed. One of the six genes they identified, IST3, overlapped with one of our hits. IST3, as well as RSM7 (one of the genes identified in their initial screen), were identified in our secondary screen as increasing the rate of LOH at MET15.

While the stochastic nature of LOH events, as well as the differences in the instability phenotypes being assayed, contribute to the limited overlap between screens, genes that consistently appear as top-hits in screens interested in similar instability mechanisms provide interesting avenues for further investigation. Fig 3B shows the results of the comparison of each of these individual screens to each other and the 91 genes found minimally in two independent publications, seven of which were identified in three studies. Furthermore, in the current climate of questions surrounding reproducibility, we are pleased to see a significant level of overlap between our screen results and the Choy et al. screening for MAT locus LOH with SCDChet strains; approximately 11.9% of our top-hit genes were identified in their results and this represents 7.8% of their dataset.

365 Gene Ontology Analysis for Identified Enrichment Categories

366 To achieve a primary understanding of the functions carried out by members of our first

367 list of top-hits, we analyzed this 217-member MAT locus LOH gene list using two Gene

368 Ontology (GO) tools: Saccharomyces Genome Database (SGD) GO Slim Mapper and GO Term

369 Finder. GO Slim Mapper provides an overview of broader parent terms that a gene can be

370 mapped to, selected by SGD curators, and does not automatically generate enrichment p-

371 values based on its grouping of genes into categories. Fisher’s exact test, with the Benjamini-

372 Hochberg correction for false discovery rates (FDR) (Q = 0.05) were selected and applied

373 based on their documented use for ontology analysis (Benjamini and Hochberg 1995; Sabatti et

374 al. 2003; Rivals et al. 2007). Significant results from SGD GO Slim Mapper (p-value < 0.05, and

375 p < Benjamini-Hochburg critical value) are summarized in Table 2A. Four ontology varieties of

376 SGD GO Slim Mapper were utilized: Cellular Component, Molecular Function, Biological 17

377 Process, and Macromolecular Complex. Alternatively, SGD GO Term Finder selects the most

378 granular term for each gene within a query, providing as detailed of an analysis as currently

379 available based on the literature (Christie et al. 2009). GO Term Finder uses a binomial

380 distribution to calculate p-values corrected for multiple comparison analysis, and there are three

381 varieties of annotations: Cellular Component, Molecular Function, and Biological Process. Many

382 of the categories found in GO Slim Mapper Macromolecular Complex are absorbed into the

383 more specific GO Term Finder Cellular Component, but GO Term Finder and GO Slim Mapper

384 analysis were kept separate due to the nature of the algorithms and the statistical analysis of

385 each dataset.
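The enrichment statistics described above for the GO Slim Mapper groupings can be sketched as follows. This is a minimal illustration, assuming a one-sided Fisher's exact test (the hypergeometric upper tail) per category followed by Benjamini-Hochberg adjustment; the function names are hypothetical and this is not the code used in the study.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value.

    k: query genes annotated to the term, n: query list size,
    K: background genes annotated to the term, N: background size.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy inputs for illustration only (not values from Table 2A).
toy_p = fisher_enrichment_p(5, 5, 5, 10)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A category would then be called enriched when its adjusted p-value falls below the chosen FDR threshold (Q = 0.05 in the text).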

386 The SGD Slim Mapper Gene Ontology tool reported two significantly enriched groups

387 after FDR correction. For the Molecular Function ontology, Molecular Function Unknown (padj =

388 4.997x10^-4) was identified as enriched with 88 genes from the 217 top-hits list lacking specific

389 information on their molecular function. Within the Biological Process ontology, Chromosome

390 Segregation (padj = 4.605x10^-4) was found to be overrepresented with 17 genes classified in this

391 group. To determine if a further understanding of the relationships between the top-hits

392 identified in this screen could be found using more explicit ontology terms, GO Term Finder

393 results were analyzed. The only ontology category that was found to be enriched through GO

394 Term Finder was Cellular Component – Chaperonin-containing T-complex (CCT-complex) (padj

395 = 0.0489). Four genes that are part of this complex were identified in our screening: CCT2,

396 CCT7, and CCT8, three core subunits of the complex, as well as SSA1, an ATPase that

397 associates with the core subunits. A complete list of GO Slim Mapper annotations, adjusted p-

398 values, and genes annotated to each term can be found in Table S4.

399 With the goal of identifying additional ontologies of interest, SGD GO Slim Mapper and

400 GO Term Finder were also applied to the narrowed list of 100 genes identified as increasing

401 LOH events at two loci. Molecular Function Unknown (p = 2.328x10^-4) again reproduced as

402 being significantly enriched in this dataset, with 47 genes from the 100 top-hit list represented (Table

403 2B). For the list of all ontologies for the 100 gene list and adjusted p-values of representation see

404 Table S5.

405 To determine if multiple screen comparisons for repetition of gene identification were

406 likely to lead to ontologies worthy of further pursuit, the 91 genes that were identified in at least

407 two of the independent studies previously discussed were also analyzed with SGD GO Slim

408 Mapper and GO Term Finder to determine category enrichment. For GO Term Finder Biological

409 Process, 82 categories were annotated as significant (p < 0.05); the four categories with largest

410 enrichment are DNA Metabolic Process (padj = 4.63x10^-13), Cellular Response to DNA Damage

411 Stimulus (padj = 9.12x10^-15), Cellular Response to Stress (padj = 3.75x10^-13), and DNA Repair

412 (padj = 1.19x10^-12). GO Slim Mapper identified 14 Biological Process categories (p<0.05); the

413 four with most significant enrichment are Organelle Fission (padj = 4.79x10^-11), Cellular

414 Response to DNA Damage Stimulus (padj = 3.44x10^-16), Mitotic Cell Cycle (padj = 1.04x10^-10),

415 and DNA Repair (padj = 1.30x10^-14). Thirty-five categories were enriched for GO Term Finder

416 Cellular Component (p < 0.05); the four categories with largest enrichment are Chromosome

417 (padj = 3.25x10^-12), Chromosomal Part (padj = 4.23x10^-12), Nuclear Chromosome (padj = 6.29x10^-8),

418 and Nucleus (padj = 2.41x10^-8). GO Slim Mapper identified two Cellular Component categories

419 (p<0.05): Chromosome (padj = 3.42x10^-9) and Nucleus (padj = 4.16x10^-9). DNA-Dependent

420 ATPase Activity (padj = 0.000228), DNA Binding (padj = 0.00224), G-quadruplex DNA Binding

421 (padj = 0.00416) and Exonuclease Activity (padj = 0.0290) were the significant categories

422 identified using GO Term Finder Molecular Function, while GO Slim Mapper also picked up

423 DNA Binding (padj = 7.41x10^-05) and ATPase Activity (padj = 0.000808) within the Molecular

424 Function ontology. GO Slim Mapper additionally found 5 enriched groups in the Macromolecular

425 Complex category (p<0.05): Chromosome, Centromeric Region (padj = 1.16x10^-06), Kinetochore

426 (padj = 0.000135), SUMO-Targeted Ubiquitin Ligase Complex (padj = 0.00019), Condensed

427 Nuclear Chromosome Outer Kinetochore (padj = 0.00019) and Condensed Nuclear Chromosome

428 Kinetochore (padj = 0.000205). For a complete list of GO results from the 91 gene top-hit list of

429 overlaps from the multiple screen comparison, see Table S6.

430 Analysis Of Positional Gene Enrichment for Chromosomal Regions of Interest

431 To investigate if our screens identified any enriched chromosomal regions, which might

432 be indicative of neighborhoods in the genome that contribute to LOH events, we mapped the

433 location of each gene identified as a top-hit using Positional Gene Enrichment analysis (PGE)

434 (De Preter et al. 2008). Genes that, when mutated, were identified as top-hits causing increased

435 LOH were dispersed throughout all 16 chromosomes. However, utilizing PGE to assess our

436 different top-hit gene lists allowed us to identify locations with significant enrichment. Analysis of

437 our MAT 217 gene top-hits list identified 41 total locations with significant enrichment; further

438 refinement revealed seven clusters of genes, located on six different chromosomes, with three

439 or more ORFs constituting the enrichment and with multiple comparison adjusted p-values <

440 0.01 (Fig 4). When we analyzed the narrowed list of 100 mutants that increase LOH at both

441 assayed loci (hereafter referenced as the MAT + MET15 dataset), 27 total locations were

442 identified with significant enrichment; further refinement revealed nine clusters of genes, located on

443 six different chromosomes, with three or more ORFs constituting the enrichment and with

444 multiple comparison adjusted p-values < 0.01. Analysis of our multiple screen comparison 91

445 gene top-hits list identified 22 total locations with significant enrichment; further refinement

446 revealed two clusters of genes, located on two different chromosomes, with three or more ORFs

447 constituting the enrichment and with multiple comparison adjusted p-values < 0.01. For a list of

448 all enriched chromosomal locations, see Table S7.
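The idea behind testing a chromosomal window for an excess of top-hits can be sketched with a hypergeometric tail probability. The code below is an illustrative simplification, not the PGE implementation (De Preter et al. 2008), and it applies no multiple comparison adjustment; the function names are hypothetical, and the background of 6500 genes is an assumption taken from the approximate size of the deletion collection.

```python
from math import comb

def window_enrichment_p(hits_in_window, window_size, total_hits, total_genes):
    """Probability of at least `hits_in_window` hits among `window_size`
    adjacent ORFs if hits were scattered at random (hypergeometric tail)."""
    upper = min(window_size, total_hits)
    favourable = sum(
        comb(total_hits, i) * comb(total_genes - total_hits, window_size - i)
        for i in range(hits_in_window, upper + 1)
    )
    return favourable / comb(total_genes, window_size)

def best_window(is_hit, window_size, total_hits, total_genes):
    """Slide a fixed-size window along one chromosome's ORFs (in order) and
    return (start_index, hit_count, p_value) for the lowest p-value."""
    best = None
    for start in range(len(is_hit) - window_size + 1):
        count = sum(is_hit[start:start + window_size])
        p = window_enrichment_p(count, window_size, total_hits, total_genes)
        if best is None or p < best[2]:
            best = (start, count, p)
    return best

# Counts quoted in the text for chromosome II: 7 top-hits in a 20-ORF region,
# from a list of 217 top-hits. The 6500-gene background is an assumption.
raw_p = window_enrichment_p(7, 20, 217, 6500)

# Toy chromosome of 8 ORFs with three adjacent hits (illustration only).
toy_best = best_window([0, 0, 1, 1, 1, 0, 0, 0], 3, 3, 8)
```

The unadjusted value from this sketch is not expected to match the adjusted p-values reported for each region, since PGE evaluates many candidate regions and corrects for that.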

449 Chromosome II A large region of significance, comprised of 20 ORFs, was identified on

450 the right arm of chromosome II (base pair (BP) region 501798-545972) (padj = 2.75 x 10^-4),

451 when the 217 MAT locus LOH top-hits list was evaluated, as shown in Fig 4A. Seven of the

452 ORFs found in this region were identified by our MAT locus LOH screen – HSL7, CKS1,

453 YBR137W, ATG42, BMT2, YBR144C, and APD1. When the MAT + MET15 data set was

454 subjected to PGE analysis, a more defined region of chromosome II (a subsection of the region

455 mentioned above) was found to be more significantly enriched (BP region 504848-545972) (padj

456 = 9.69 x 10^-6) (Fig 4A). Six of the seven previously mentioned gene deletions in this region again

457 appear in a now refined territory comprised of 18 ORFs. We next refined our PGE investigation

458 to all top-hits found in our multiple screen analysis, aiming to better identify chromosomal

459 regions important in LOH events. The region of chromosome II identified above is expanded

460 (BP region 442918-575991) (padj = 0.016) and contains 5 ORF hits (YBR099C, IML3, HSL7,

461 APD1, and SSE2) in a 79 ORF region.

462 Chromosome V When the 217 top-hits from the initial screen of increasing LOH at the MAT

463 locus were analyzed, a 24-gene region containing eight top-hit ORFs was found to be enriched

464 on chromosome V (BP region 375211-424307) (padj = 2.31 x 10^-4). These top-hit genes are

465 FLO8, LSM4, TMN3, RPL23B, DSE1, SAK1, COM2, and RPS26B (Fig 4B). The genes SAK1,

466 COM2, and RPS26B are immediately adjacent to one another and are identified as a further

467 enriched cluster with a p-value of 9.57 x 10^-4. After narrowing our MAT LOH top-hits list to 100

468 genes that increase LOH at both MAT and MET15, one larger region (BP 387228-560360) with

469 11 hits across a 100-ORF span (padj = 9.69x10^-6) and two further enriched subsections of this

470 region (BP region 387228-438340) and (BP region 387228-397649) were identified. The

471 downstream-shifted 30-gene region of chromosome V identified as enriched (BP region

472 387228-438340) (padj = 9.69 x 10^-6) is comprised of seven top-hits (LSM4, TMN3, RPL23B,

473 DSE1, SAK1, COM2, and YER135C) in the region of 30 genes, while the smaller region is

474 comprised of three ORFs in a six ORF span (padj = 2.57x10^-4) (Fig 4B). Running the PGE analysis

475 with the overlapping 91 genes from the six compared screens presented a further narrowed

476 enriched region (BP region 387228-396168) (padj = 2.03x10^-5), with 3 repeated ORFs (LSM4,

477 TMN3, and SLX8) in a 5 ORF region (Fig. 4B). Notably, LSM4 and TMN3 were identified by two

478 independent studies (this study and Choy et al.), and SLX8 was identified by three independent

479 studies (Schmidlin et al., Andersen et al., and Yuen et al.).

480

481 Chromosome VII An enriched region of significance was identified comprised of 23 ORFs on

482 chromosome VII (BP region 67598-98589) (padj = 3.57 x 10^-3), when the 217 MAT locus LOH

483 top-hits list was evaluated, as shown in Fig 4C. Five of the ORFs (SHE10, MTC3, YGL218W,

484 KIP3, and SIP2) found in this region were identified by our MAT locus LOH screen. When the

485 narrowed list of 100 mutants that increase LOH at both MAT + MET15 was subjected to PGE

486 analysis, this region is no longer identified as being enriched, and further does not recur in PGE

487 evaluation of the 91 overlap list from the multiple screen comparison.

488 Chromosome IX An enriched region was identified (BP region 255113 – 270572) when the 217

489 MAT locus LOH top-hits list was evaluated. This region contains five ORFs (GPP1, MMF1,

490 PCL7, NEO1, and MET30) identified in a 10 ORF region (padj = 4.40 x 10^-4) (Fig 4D). Two

491 enriched regions were identified when PGE was run with the 100 top-hits that appeared in both

492 our MAT + MET15 screens. The larger region runs from 193592-264891bp and had five hits

493 (ICE2, AIM19, YIL060W, MMF1, and NEO1) in a 49 ORF region. A smaller subsection was

494 identified as the second hit; this region is 18,502bp long and contains 14 ORFs of which 3 were

495 found in our top-hits (BP region 246389-264891) (padj = 2.72 x 10^-3) (Fig 4D). A 13 ORF region

496 of chromosome IX (BP region 83302-100501) contains two genes identified by at least two of

497 the six compared studies (padj = 0.0299). One of the core CCT-complex genes, CCT2, is found

498 in this region, and was identified by our study as well as Choy et al. Additionally, CSM2 was

499 identified by this study, and the primary screening conducted by Schmidlin et al. (Fig. 4D).

500 Chromosome X No enriched regions containing more than 2 ORFs are found when running

501 the 217 MAT locus genes alone. The MAT + MET15 100 gene analysis identifies three ORFs

502 (YJL009W, SYS1, and SPC1) in a 23 ORF region (BP region 419849-458354) (padj = 9.01 x 10^-3)

503 (Fig. 4E). A second enriched locus of interest is identified on chromosome X when the

504 overlapping gene list from the multiple screen comparison is analyzed. This region is from

505 491074-517506bp and contains 3 top hits (CPR7, PET191, and POL32) in a 12 ORF region (padj

506 = 4.18x10^-4).

507 Chromosome XI Three ORFs (YKR073C, ECM4, and YKR078W) found in the MAT screen

508 are identified in an 8 ORF region on chromosome XI (BP region 577829-586351) (padj = 7.51 x

509 10^-3) (Fig 4F). This region is not found in our MAT + MET15 list. However, an overlapping

510 region is found when the multiple screen comparison list is studied (BP region 584594-595940)

511 (padj = 0.013) and contains 2 ORFs, YKR078W and NUP133, in a 5 ORF region.

512 Chromosome XV A large region is returned from PGE analysis of the MAT top-hits list (BP

513 region 69376-139045) (padj = 2.88x10^-3) with seven (MED7, YOL134C, HRP1, PAP2, WSC3,

514 NDJ1, and COQ3) of the 46 ORFs in the region being found in that screen. A subsection of this

515 region (BP region 69376-115808) (padj = 3.32 x 10^-3) with four ORFs (MED7, HRP1, PAP2, and

516 WSC3) out of a 33 ORF section is identified from the 100 gene MAT + MET15 list (Fig 4G).

517 Analysis of the 91 overlap list from the multiple screen comparison did not return any enriched

518 regions on this chromosome.

519 Chromosome XVI No enriched regions are identified from the 217 MAT locus gene list for

520 chromosome XVI. However, when the 100 top-hits gene list from the MAT + MET15 screens is

521 analyzed for chromosomal enrichment four ORFs in an 83 ORF region are found (BP region

522 600646-728613) (padj = 0.041). Two (RPL36B and DIP5) out of a 20 ORF region are found in a

523 separate region of chromosome XVI (BP region 41043-76239) (padj = 0.046) and two other

524 ORFs (DDC1 and YIG1) out of a 9 ORF region (BP region 169769-181114) (padj = 0.019) are

525 also identified when the list of 91 genes that were found in at least two separate screens is

526 analyzed (Fig 4H). Within these regions several genes from heterozygous LOH/genome stability

527 screens are found: DIP5 was identified by both Strome et al. (2008) and Choy et al., and

528 RPL36B was identified by Choy et al. and this study.

529 Fluctuation Analysis of LOH candidate genes selected from data narrowing criteria

530 In an effort to determine the level of LOH induced by heterozygous gene mutations of interest

531 we created strains capable of a quantitative fluctuation analysis assessment. Nine genes,

532 SSA1, CCT7, MRC1, RSM7, HSL7, SSE2, PUG1, YOR364W, and MEC1, were chosen for this

533 assessment based on appearing within our screen and at least one of the secondary analyses

534 discussed above (additional description of gene functions can be found in the discussion

535 section below); the LOH rates of strains heterozygously mutated for these genes were then

536 benchmarked using fluctuation analysis at the CAN1 locus. PUG1, YOR364W, and RSM7 were

537 chosen because heterozygous mutations in these genes result in the highest LOH event

538 scores in both our MAT and MET15 assays. Furthermore, YOR364W is categorized as a dubious

539 open reading frame (ORF), indicating lack of clarity on whether a functional product is produced,

540 and if so, an unknown function for that product. As such, YOR364W lacks characterization and

541 falls in the Unknown categories for all three main GO terms: Molecular Function, Biological

542 Process, and Cellular Component. MRC1 was selected because it was identified in three of the

543 studies we analyzed, while SSE2 was chosen because it was identified in two of the separate

544 published studies (this study and Choy et al. that showed significant overlap) we analyzed for

545 increasing LOH. To learn more about one of the clustered regions identified through PGE we

546 selected three genes from the chromosome II region: MEC1, SSE2, and HSL7. CCT7 and SSA1

547 were included in these analyses because they are part of the overrepresented gene ontology

548 category of the CCT-complex, identified by both our study and Choy et al. Finally, CCT7, SSE2,

549 MRC1, RSM7, HSL7 and SSA1 all have identified human homologs (see discussion), which

550 could make their further investigation more relevant to LOH events in cancers. The SCDChet

551 strains containing heterozygous mutations in SSA1, CCT7, MRC1, RSM7, HSL7, SSE2, PUG1,

552 YOR364W, and MEC1, as well as the parental strain used to construct SCDChet – BY4743 –

553 were transformed with a can1::HIS3 cassette to render their CAN1 locus heterozygous. Four

554 independent fluctuation analysis experiments for the parental strain BY4743 were conducted to

555 estimate the baseline LOH rate and 95% confidence intervals (see Fig 5) using the rSalvador

556 package (Zheng 2002). The nine heterozygous gene deletion strains tested all showed

557 significantly increased LOH rates, with 7- to 31-fold increases over the parental LOH rate (see Fig

558 5). These increased rates demonstrate that all of our secondary analyses were successful in

559 narrowing candidates of interest whose single-gene deletion leads to a significant increase in

560 LOH.
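To illustrate how a rate and a fold increase are derived from fluctuation data, the sketch below uses the classical p0 method rather than the likelihood-based estimation performed with rSalvador in this study. It is a simplified stand-in under stated assumptions: the function names are hypothetical, the culture counts are invented for illustration, and no confidence intervals are computed.

```python
from math import log

def p0_rate(cultures_without_events, total_cultures, cells_per_culture):
    """Estimate events per cell per division with the p0 method.

    The fraction of cultures with no event, p0, gives the expected number
    of events per culture as m = -ln(p0); dividing by the final cell
    number converts m into a rate.
    """
    if not 0 < cultures_without_events < total_cultures:
        raise ValueError("need some, but not all, cultures without events")
    p0 = cultures_without_events / total_cultures
    m = -log(p0)
    return m / cells_per_culture

def fold_change(mutant_rate, parental_rate):
    """Fold increase of a mutant strain's rate over the parental rate."""
    return mutant_rate / parental_rate

# Invented counts for illustration only (not data from Fig 5).
parental = p0_rate(10, 20, 1e6)
mutant = p0_rate(5, 20, 1e6)
increase = fold_change(mutant, parental)
```

The p0 method is only usable when a reasonable fraction of cultures show no events, which is one reason likelihood-based tools are generally preferred for reporting rates and confidence intervals.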

561

562 DISCUSSION

563 Through the work presented here we sought to identify heterozygous gene mutations

564 with haploinsufficient impacts on LOH, as the homologs of these genes are particularly

565 interesting targets for study since loss/mutation to only one allele could induce LOH-based

566 cancer phenotypes. Multiple refinement strategies were employed to attempt to identify those

567 genes with the greatest potential as candidate cancer susceptibility genes for further study.

568 Human Homolog Identification and Cancer Associations

569 Of the 217 genes identified as top-hits in the primary MAT locus LOH screening, we

570 were able to identify 127 with a known human homolog (58.5%) utilizing YeastMine

571 (https://yeastmine.yeastgenome.org) and NCBI Homologene

572 (https://www.ncbi.nlm.nih.gov/homologene). Literature searches on these 127 genes identified

573 86 with a known association with cancer incidence (40% of the top-hits list, 68% of the “genes

574 with human homologs” list) (see Table S8 for all top-hits with known human homologs). When

575 the smaller list of mutants that increased LOH at both MAT + MET15 was examined for human

576 homologs in the same manner as above, 58 of the 100 genes (58%) were identified as having a

577 human homolog, with 36 of the 58 human homologs with a known association to cancer (62%).
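The percentages in this paragraph follow directly from the counts given in the text; a short check, using a hypothetical helper `pct`, makes the denominators explicit.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

# Counts quoted in the text.
mat_hits, mat_homologs, mat_cancer = 217, 127, 86
both_hits, both_homologs, both_cancer = 100, 58, 36

mat_summary = {
    "homologs_of_hits": pct(mat_homologs, mat_hits),
    "cancer_of_hits": pct(mat_cancer, mat_hits),
    "cancer_of_homologs": pct(mat_cancer, mat_homologs),
}
both_summary = {
    "homologs_of_hits": pct(both_homologs, both_hits),
    "cancer_of_homologs": pct(both_cancer, both_homologs),
}
```

Rounded to whole numbers, these values agree with the 58.5%, 40%, 68%, 58%, and 62% figures stated above.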

578 By again performing data comparisons to multiple independent studies, we are able to

579 collect more information about genes with human homologs and those with known associations

580 to cancers. Sixty-seven out of the 91 genes identified in at least two screens have a known

581 human homolog (74%). Including the genes identified in our screen (discussed above), a search

582 of the current literature revealed that 54 of those 67 genes have an association with cancer

583 (69%) (Table S9) (Sun et al. 2002; Leone et al. 2003; Mason et al. 2014; Abdel-Fatah et al.

584 2014; Hennecke et al. 2015; The et al. 2015; Wang et al. 2015; Taguchi et al. 2015; Dai et al.

585 2016). This serves as a positive control that performing these screens, and the multiple screen

586 analyses, identifies genes, both with known human homologs that can be investigated for

587 impacts on cancer development, as well as those that have already been linked to roles in

588 cancer progression. The genes with known human homologs not already associated with

589 cancer phenotypes are prime candidates as novel cancer susceptibility genes for future studies

590 (as an example see discussion of LSM4 below).

591 Multiple Screen Comparisons for Reproducibility of Gene Identification

592 In addition to conducting new screens to search for additional candidates, pooling the

593 data from multiple, related screens, serves as a further layer to test reproducibility and enables

594 identification of candidate genes that are likely contributors to genome instability.

595 SSE2 encodes a heat shock protein 70 (HSP70) family member which functions as a

596 nucleotide exchange factor for cytosolic Hsp70s during protein refolding (Easton et al. 2000;

597 Shaner et al. 2005; Dragovic et al. 2006). This gene was identified in our primary MAT screen,

598 was reproduced in our secondary LOH screen at MET15, was also identified by Choy et al. as

599 contributing to LOH at MAT in a haploinsufficient manner, is found in an enriched region of

600 chromosome II, and has a human homolog (HSPH1) with a known cancer association. The

601 discovered relationship of heterozygous loss of this gene to induction of LOH events in yeast

602 could provide an avenue for further mechanistic studies in this model system, a pathway for

603 study design in mammalian cells, and a justification to screen for HSPH1 alterations in

604 additional cancer types.

605 Similarly, LSM4 was identified in our MAT and MET15 datasets (with one of the highest

606 levels of LOH events observed in each), was also identified by Choy et al., is found in an

607 enriched region of chromosome V, and has a human homolog (LSM4); however, there is no

608 current direct cancer association. This gene encodes part of a protein complex involved in

609 mRNA splicing, processing body assembly, and decay (He and Parker 2000; Beggs 2005). First

610 characterized in systemic lupus erythematosus patients, this protein is an intriguing prospect

611 for further study, as SLE is a heterogeneous disorder linked to increased incidence of many

612 cancer types; however, the data are inconsistent (Gayed et al. 2009; Song et al. 2018). Direct

613 investigation into LSM4 haploinsufficiency in a yeast model might allow investigation into the

614 role of LOH events in disease progression.

615 Additionally, the seven genes found in three overlapping, related screens – DCC1,

616 SLX8, EMI2, MRC1, MCM21, RRM3, and ELG1 – qualify as strong candidates for further

617 investigation due to consistently contributing to instability via LOH mechanisms under a variety

618 of experimental conditions. Six of these genes are categorized in the DNA Metabolic Processes

619 gene ontology and are further linked to chromosome organization (DCC1, SLX8, MRC1,

620 MCM21, RRM3, and ELG1). With known human homologs and cancer associations for all but

621 MCM21, study of the mechanisms of association with increased LOH events could yield more

622 information for inclusion in epidemiological studies as well as pathway information for targeted

623 treatments.

624 Additional genes that induced the highest levels of LOH in our screens and showed

625 statistically significant increases in LOH via fluctuation analysis are RSM7, PUG1, and

626 YOR364W. RSM7 encodes a small subunit mitochondrial ribosomal protein (Saveanu et al.

627 2001) and mutations in this gene were also picked up by Schmidlin et al. as increasing LOH

628 events. Little else has been published about this gene and it is intriguing to consider how

629 defects in a mitochondrial ribosomal protein induce nuclear genome instability. Further, while a

630 human homolog has been identified, to date there has been no association with cancer, making

631 it a novel candidate cancer predisposition gene identified here. PUG1 encodes a

632 protoporphyrin uptake protein in the plasma membrane also involved in heme transport

633 (Protchenko et al. 2008; Manente and Ghislain 2009). Choy et al. identified haploinsufficiency

634 effects of loss of this gene on chromosome maintenance, and Zhu et al. have shown increased

635 colony sectoring in a CTF assay using the SCDChet (Zhu et al. 2015). Again, little additional

636 information has been published on this gene. YOR364W, however, has had no direct work

637 published about it, and is therefore another intriguing candidate. This ORF is still considered

638 “dubious” and “unlikely to encode a functional protein”

639 (https://www.yeastgenome.org/locus/S000005891). We, however, saw high levels of LOH

640 induction in both our MAT locus and MET15 locus screens as well as a 14-fold increase in LOH at

641 the CAN1 locus via fluctuation analysis.

642 Gene Ontology Enrichment

643 Further indication that our screen results are pertinent to the identification of human

644 cancer-relevant gene mutations comes from our gene ontology analyses. On top of identifying

645 many ontologies with clear cancer relevance such as Chromosome Segregation, Mitotic Cell

646 Cycle, and DNA Repair, our GO analysis identified other ontologies which may hold relevance,

647 but require more analysis of the members for candidate cancer susceptibility genes. For

648 example, one of the top GO Term Finder ontology hits identified the CCT-complex; conserved

649 from yeast to humans, some components of this complex have known cancer impacts. In yeast,

650 systematic identical null mutations in each of the eight core CCT-complex subunits

651 revealed varying phenotypic effects, indicating the possibility of secondary roles that individual

652 subunits play in addition to cytoskeletal subunit folding as part of the CCT-complex (Amit et al.

653 2010). Secondary cellular roles of these subunits are additionally supported in research linking

654 many, but not all, of the eight subunits to different cancer types. For instance, the homolog of

655 one of our identified top-hits, CCT2, along with TCP1, has recently been linked as a driver

656 mutation in breast tumor formation (Guest et al. 2015). In addition, a change in expression of CCT8

657 has been linked to poor prognosis in glioma patients (Qiu et al. 2015). Studying the proteins of

658 this complex could lead to more mechanistic insight on their secondary roles as well as

659 determination of whether other members of the complex play a role in cancer development.

660 Furthermore, because members of the CCT-complex are essential genes, utilizing the SCDChet

661 allowed for identification of this complex where it could not have been assessed using

662 homozygous deletions, thereby allowing for a more comprehensive look at genes contributing

663 to genome instability through LOH events.

664 Chromosome Location Enrichment Clustering Of Genes Involved In LOH

665 In the search for additional information that might aid in generating a better

666 understanding of LOH we investigated the chromosome location of hits within our screen results

667 alone, and further compared across multiple screens. These searches hold the potential to

668 identify chromosomal regions and genes important in LOH events as well as to rule out genes

669 that might have been identified due to artifacts in strain construction. On the one hand there are

670 acknowledged errors, advantages, and disadvantages, for using a particular deletion collection

671 (aneuploid strains, secondary mutations, incorrect genotyping) as well as screen-specific issues

672 of strain reconstruction due to the nature of certain mutations (Hughes et al. 2000; Giaever et al.

673 2002; Deutschbauer et al. 2005; Yuen et al. 2007; Schmidlin et al. 2007; Andersen et al.

674 2008; Ben-Shitrit et al. 2012; Giaever and Nislow 2014). These known inconsistencies however

675 do not negate the importance of the S. cerevisiae deletion collections as a tool for

676 understanding genome instability, and the ability to apply knowledge to the study of higher

677 eukaryotes. Conversely, enriched regions may contain regulatory sequences or identify the

678 presence of a particularly important gene in the vicinity, and may point us to candidates with the

679 most potential for further investigation. Because there is not conclusive evidence in all cases to

680 confirm if a gene knockout is contributing to LOH due to regional effects, or via an independent

681 mechanism, all possibilities need to be considered. We take the identified region of

682 chromosome II as an example to discuss these models.

683 Chromosome II The large region of significance identified on the right arm of chromosome II,

684 included seven MAT locus LOH screen identified genes within a 20 ORF region – HSL7, CKS1,

685 YBR137W, ATG42, BMT2, YBR144C, and APD1 and expanded to 19 identified hits in a region

686 of 49 ORFs when run with the 91 gene multiple screen comparison top-hits list. Several

687 possibilities exist for why this region was identified as an enriched locus for increasing LOH

688 events.

689 The first possibility is that all/most of the genes in this neighborhood have independent

690 impacts on LOH occurrences. While a few occasions of clustering of yeast genes with similar

691 functions have been identified (Zhang and Smith 1998), it is possible that due to the large

692 variety of gene functions that could be perturbed and lead to LOH that this region was not

693 previously identified as one such cluster. Evolutionarily, this clustering may have come

694 into existence due to co-regulation of these genes for their involvement in genome stability

695 maintenance.

696 A second possibility is that neighboring gene effects are driving LOH events through a

697 single driver gene in the locus. For example, near the center of this region lies the gene MEC1.

698 Orthologous to human ATR, MEC1 is a critical mitotic checkpoint gene that plays an important

699 role in responding to DNA damage as the cell navigates through the cell cycle (Harrison and

700 Haber 2006; Bandhu et al. 2014). Further, mutations in MEC1 have previously been shown to

701 increase mitotic recombination events (Fasullo and Sun 2008) and decrease chromosome

702 maintenance events (Choy et al. 2013), both of which could lead to increases in LOH; MEC1 has also

703 been identified in at least two previous screens looking for genes with impacts on genome

704 stability (Stirling et al. 2011; Choy et al. 2013). MEC1 was identified as a top-hit by Choy et al.,

705 indicating heterozygous loss of this locus can increase LOH events. As an essential gene this

706 ORF would not be identifiable through screens utilizing the homozygous deletion collection.

707 Other groups have reported neighborhood effects of KANMX insertions; however, these are

708 generally limited to adjacent genes within 600bp upstream or downstream of the driver locus.

709 Ben-Shitrit et al. published a mechanism to predict such neighboring gene effects; however,

710 implementation requires a known protein-protein interaction network with anchoring proteins for

711 the phenotype being measured (Ben-Shitrit et al. 2012). Since LOH events are caused by

712 widely variable mechanisms via genes with widely variable functions we were unable to utilize

713 their algorithm in this situation. Further study of this region to determine if all the identified ORFs

714 are presenting as top-hits due to their proximity to a particular gene, like MEC1, or if they are

715 contributing to genome instability through neighborhood-independent mechanisms could be

716 accomplished through additional studies. This could take the form of individual knockout of

717 every gene separately in a new strain, LOH rate quantification (such as via fluctuation analysis),

718 and complementation assays, or through separate testing for expression levels of all of the

719 genes in the region in each of the individual SCDChet strains representing the genes across this

720 chromosomal region. Examining the extent of individual neighborhood effects in these clusters

721 is a future direction that falls outside the scope of this study.

722 A third model is that the chromosomal region itself, potentially through TF binding,

723 histone modification, replication firing, or three-dimensional architecture, may play roles in

724 multiple loci being identified. Several studies have shown that a gene's expression level tends to

725 be similar to that of its neighbors on a chromosome (Zhang and Smith 1998; Kruglyak and Tang

726 2000; Cohen et al. 2000). If these are dosage sensitive genes, this might contribute to them

727 having haploinsufficiency effects, which might account for the chromosomal region’s

728 identification. To assess this possibility for the chromosome II region we searched for

729 topologically associating domains (TAD) within a high-throughput chromosome conformation

730 capture (Hi-C) dataset (Eser et al. 2017). A 130kb TAD was found to stretch across the region

731 we identified (spanning from ~450-580kb on Chr II); however, the authors of that study report

732 that the TAD-like domains they found were more strongly correlated with replication timing than

733 with transcription.
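The statement that the TAD stretches across the enriched region can be checked with a simple interval overlap. In the sketch below the region coordinates are those quoted in the Results for chromosome II, while the TAD bounds are rounded from the approximate "~450-580kb" span given above, so the result is indicative only; the function name is hypothetical.

```python
def overlap_bp(a, b):
    """Number of base pairs shared by two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)

# Approximate TAD bounds on Chr II, rounded from the ~450-580kb span (assumption).
tad = (450_000, 580_000)

# Enriched regions reported in the Results for chromosome II.
mat_region = (501_798, 545_972)      # 217 MAT top-hits list
overlap_region = (442_918, 575_991)  # 91 gene multiple screen comparison list

mat_fraction = overlap_bp(mat_region, tad) / (mat_region[1] - mat_region[0])
shared_bp = overlap_bp(overlap_region, tad)
overlap_fraction = shared_bp / (overlap_region[1] - overlap_region[0])
```

Under these rounded bounds the MAT-derived region lies entirely inside the TAD, and most of the expanded multiple screen region does as well.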

734 A fourth model is that chromosomal regions are being identified as involved in increased

735 LOH as a result of an artifact of how the knockouts were generated. To assess the

736 possibility of region identification due to strain artifacts, we performed two analyses. We started

737 by running PGE with all 898 genes from the multiple screen comparisons (De Preter et al.

738 2008). The logic is that if large regions are identified without individual genes being

739 replicated across screens, this could point to artifacts. Further, in running PGE in this

740 manner we noted which screen identified each gene (Table S10). We then

741 categorized each identified gene by screen set-up, and combined screens utilizing the same starting

742 strains (i.e., SCDChet vs. SCDChom vs. non-SCDC strains) to determine whether strain

743 creation artifacts could be at play. We found that for the region in chromosome II, nearly all of

744 the hits (16/19) were found from screens that utilized SCDChet (Figure S1). This could support

745 the theory that something about these particular strains and the way they were made is leading

746 to identification of this locus, but could also indicate a region high in genes with

747 haploinsufficiency effects or essential genes not able to be identified by SCDChom screens

748 (further substantiated by the fact that the other screen that identified two genes in this region did

749 not use either deletion collection but completed insertional mutagenesis to assay for

750 heterozygous effects (Strome et al. 2008)). We therefore moved to a second analysis to assess

751 how strains were created as part of making the S. cerevisiae deletion collections. Since past

752 artifacts have been identified when a set of strains were all created in the same lab (Lehner et

753 al. 2007), we wanted to gauge whether this chromosomal region of knockouts met this criterion.

754 Information on strain creation from the Stanford yeast deletion project website

755 (http://www-sequence.stanford.edu/group/yeast_deletion_project/overlapping.html) identifies “lab 11” as

756 having created this entire block of gene mutations. The block ends just where our identified region ends

757 (at ORF YBR172C) but starts farther up: lab 11 created strains starting at YBR080C, whereas

758 our identified region starts at YBR127C. This both supports and conflicts with the

759 theory of artifact-induced LOH events in all strains made by this group. In support,

760 this group did make all the gene knockout strains we identified. In conflict, we

761 would have expected to identify all or most genes in the chromosomal region that

762 encompasses all of the strains they created, not approximately half.
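
The "approximately half" estimate above can be checked with a short calculation on the systematic ORF names given in the text. This is a rough sketch under a stated assumption: the three-digit index in a systematic name is used as a proxy for the number of strains in the block, which ignores suffixed ORFs and any ORFs absent from the collection. The helper names are hypothetical.

```python
import re

# Systematic S. cerevisiae ORF names: Y + chromosome letter + arm (L/R)
# + three-digit index + strand (W/C), with an optional suffix such as "-A".
ORF_PATTERN = re.compile(r"Y([A-P])([LR])(\d{3})([WC])(?:-[A-Z])?")


def parse_orf(name):
    """Return (chromosome letter, arm, index) for a systematic ORF name."""
    match = ORF_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a systematic ORF name: {name}")
    chromosome, arm, index, _strand = match.groups()
    return chromosome, arm, int(index)


def index_span(first, last):
    """Count index positions from first to last ORF, inclusive."""
    chrom1, arm1, idx1 = parse_orf(first)
    chrom2, arm2, idx2 = parse_orf(last)
    if (chrom1, arm1) != (chrom2, arm2):
        raise ValueError("ORFs must lie on the same chromosome arm")
    return abs(idx2 - idx1) + 1


# Block created by "lab 11" versus the region identified in this screen.
lab_block = index_span("YBR080C", "YBR172C")
identified = index_span("YBR127C", "YBR172C")
fraction = identified / lab_block
print(f"{identified} of {lab_block} index positions ({fraction:.0%})")
```

Under this proxy the identified region covers 46 of the 93 index positions in the lab 11 block, consistent with the "approximately half" stated above.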

763 These models could apply to all of the PGE-identified enriched regions, and further

764 investigation into all of these loci is warranted but is outside the scope of this mutant

765 screen report.

766 Essential genes

767 Because essential genes are not assayed in haploid studies or in the more frequently conducted

768 studies of the homozygous deletion collection, they are less frequently evaluated loci, and the cohort of essential

769 genes from our screen is an interesting target for further study. Thirty-seven essential genes were

770 identified in our primary screen: RPN6, LSM4, HSK3, PFS2, RLI1, SSY1, RSC3, BUR6, SLD3,

771 ERG7, TTI1, HRP1, PSA1, FAP7, KRS1, FRQ1, MOB2, YJL009W, CCT7, SPC3, MED7,

772 GPN2, RPO26, TSC13, MPS1, MET30, STH1, CCT2, CCT8, TUB1, MVD1, YOL134C, RKI1,

773 RET1, MCM2, SMC1 and NEO1. Of these essential genes, one, CCT7, was chosen for strain

774 recreation to allow quantitative assessment of the LOH rate due to heterozygous mutation. The

775 significant 27-fold increase in LOH due to loss of this gene demonstrates its haploinsufficient

776 impact on genome stability. Among this group, 32 have previously identified human homologs,

777 of which 23 have a known cancer association. Once again, analyses of these

778 genes/proteins/pathways in a yeast system may help uncover additional mechanistic

779 information important in understanding their roles in cancer development. Further, the nine

780 genes (LSM4, BUR6, ERG7, KRS1, SPC3, GPN2, TSC13, MVD1, and NEO1) with known

781 human homologs not already associated with cancer phenotypes are prime candidates as novel

782 cancer susceptibility genes because they, like their yeast counterparts, may induce

783 haploinsufficient effects that do not abide by the two-hit hypothesis and may therefore be more impactful

784 gene mutations.

785 Conclusions

786 Analyzing new screen data independently and then in conjunction with gene lists from

787 relevant published works, as well as performing analyses beyond individual genes by looking at

788 such items as ontology representation and positional gene enrichment, can increase the power

789 of a study to identify genes of most importance. The studies a group chooses for

790 comparison could be selected by searching the literature for relevant screens. We propose

791 that this methodology of candidate narrowing allows the community to wade through the noise

792 of data and focus on chromosomal deletions that play important roles in the phenotype or

793 pathway of interest.

794 Acknowledgements

795 The content is solely the responsibility of the authors and does not necessarily represent the

796 official views of the National Institutes of Health. Research was supported by an Institutional

797 Development Award (IDeA) from the National Institute of General Medical Sciences of the

798 National Institutes of Health: P20GM1234 and an NIGMS R15 Area Award: 1R15GM109269-

799 01A1.

800

801 References

802 Abdel-Fatah, T. M. A., R. Russell, N. Albarakati, D. J. Maloney, D. Dorjsuren et al., 2014 Genomic and

803 protein expression analysis reveals flap endonuclease 1 (FEN1) as a key biomarker in breast and

804 ovarian cancer. Mol. Oncol. 8: 1326–1338.

805 Alimonti, A., A. Carracedo, J. G. Clohessy, L. C. Trotman, C. Nardella et al., 2010 Subtle variations in Pten

806 dose determine cancer susceptibility. Nat. Genet. 42: 454–458.

807 Amit, M., S. J. Weisberg, M. Nadler-Holly, E. A. McCormack, E. Feldmesser et al., 2010 Equivalent

808 Mutations in the Eight Subunits of the Chaperonin CCT Produce Dramatically Different Cellular

809 and Gene Expression Phenotypes. J. Mol. Biol. 401: 532–543.

810 Andersen, M. P., Z. W. Nelson, E. D. Hetrick, and D. E. Gottschling, 2008 A Genetic Screen for Increased

811 Loss of Heterozygosity in Saccharomyces cerevisiae. Genetics 179: 1179–1195.

812 Antonacopoulou, A. G., P. D. Grivas, L. Skarlas, M. Kalofonos, C. D. Scopa et al., 2008 POLR2F, ATP6V0A1

813 and PRNP expression in colorectal cancer: new molecules with prognostic significance?

814 Anticancer Res. 28: 1221–1227.

815 Apasu, J. E., D. Schuette, R. LaRanger, J. A. Steinle, L. D. Nguyen et al., 2019 Neuronal calcium sensor 1

816 (NCS1) promotes motility and metastatic spread of breast cancer cells in vitro and in vivo. FASEB

817 J. Off. Publ. Fed. Am. Soc. Exp. Biol. 33: 4802–4813.

818 Babeu, J.-P., S. D. Wilson, É. Lambert, D. Lévesque, F.-M. Boisvert et al., 2019 Quantitative Proteomics

819 Identifies DNA Repair as a Novel Biological Function for Hepatocyte Nuclear Factor 4α in

820 Colorectal Cancer Cells. Cancers 11:.

821 Bae, N. S., A. P. Seberg, L. P. Carroll, and M. J. Swanson, 2017 Identification of Genes in Saccharomyces

822 cerevisiae that Are Haploinsufficient for Overcoming Amino Acid Starvation. G3

823 GenesGenomesGenetics 7: 1061–1084.

824 Bai, D., J. Zhang, T. Li, R. Hang, Y. Liu et al., 2016 The ATPase hCINAP regulates 18S rRNA processing and

825 is essential for embryogenesis and tumour growth. Nat. Commun. 7: 12310.

826 Baldeyron, C., A. Brisson, B. Tesson, F. Némati, S. Koundrioukoff et al., 2015 TIPIN depletion leads to

827 apoptosis in breast cancer cells. Mol. Oncol. 9: 1580–1598.

828 Bandhu, A., J. Kang, K. Fukunaga, G. Goto, and K. Sugimoto, 2014 Ddc2 Mediates Mec1 Activation

829 through a Ddc1- or Dpb11-Independent Mechanism. PLoS Genet. 10:.

830 Bausch, B., F. Schiavi, Y. Ni, J. Welander, A. Patocs et al., 2017 Clinical Characterization of the

831 Pheochromocytoma and Paraganglioma Susceptibility Genes SDHA, TMEM127, MAX, and

832 SDHAF2 for Gene-Informed Prevention. JAMA Oncol. 3: 1204–1212.

833 Beggs, J. D., 2005 Lsm proteins and RNA processing. Biochem. Soc. Trans. 33: 433–438.

834 Benjamini, Y., and Y. Hochberg, 1995 Controlling the false discovery rate: a practical and powerful

835 approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57: 289–300.

836 Ben-Shitrit, T., N. Yosef, K. Shemesh, R. Sharan, E. Ruppin et al., 2012 Systematic identification of gene

837 annotation errors in the widely used yeast mutation collections. Nat. Methods 9: 373–378.

838 Berger, A. H., A. G. Knudson, and P. P. Pandolfi, 2011 A continuum model for tumour suppression.

839 Nature 476: 163–169.

840 Bianco, J. N., V. Bergoglio, Y.-L. Lin, M.-J. Pillaire, A.-L. Schmitz et al., 2019 Overexpression of Claspin and

841 Timeless protects cancer cells from replication stress in a checkpoint-independent manner. Nat.

842 Commun. 10:.

843 Boele, J., H. Persson, J. W. Shin, Y. Ishizu, I. S. Newie et al., 2014 PAPD5-mediated 3’ adenylation and

844 subsequent degradation of miR-21 is disrupted in proliferative disease. Proc. Natl. Acad. Sci. U.

845 S. A. 111: 11467–11472.

846 Botstein, D., and G. R. Fink, 2011 Yeast: an experimental organism for 21st Century biology. Genetics

847 189: 695–704.

848 Brant, K. A., and G. D. Leikauf, 2014 Dysregulation of FURIN by prostaglandin‐endoperoxide synthase 2

849 in lung epithelial NCI‐H292 cells. Mol. Carcinog. 53: 192–200.

850 Broustas, C. G., K. M. Hopkins, S. K. Panigrahi, L. Wang, R. K. Virk et al., 2019 RAD9A promotes

851 metastatic phenotypes through transcriptional regulation of anterior gradient 2 (AGR2).

852 Carcinogenesis 40: 164–172.

853 Chang, K.-W., H.-L. Chen, Y.-H. Chien, T.-C. Chen, and C.-T. Yeh, 2011 SLC25A13 gene mutations in

854 Taiwanese patients with non-viral hepatocellular carcinoma. Mol. Genet. Metab. 103: 293–296.

855 Chang, J.-S., C.-Y. Su, W.-H. Yu, W.-J. Lee, Y.-P. Liu et al., 2015 GIT1 promotes lung cancer cell metastasis

856 through modulating Rac1/Cdc42 activity and is associated with poor prognosis. Oncotarget 6:

857 36278–36291.

858 Chen, Y.-C., B. Humphries, R. Brien, A. E. Gibbons, Y.-T. Chen et al., 2018 Functional Isolation of Tumor-

859 Initiating Cells using Microfluidic-Based Migration Identifies Phosphatidylserine Decarboxylase

860 as a Key Regulator. Sci. Rep. 8: 244.

861 Chen, T.-M., M.-C. Lai, Y.-H. Li, Y.-L. Chan, C.-H. Wu et al., 2019 hnRNPM induces translation switch

862 under hypoxia to promote colon cancer development. EBioMedicine 41: 299–309.

863 Chen, J., Y. Lu, J. Xu, Y. Huang, H. Cheng et al., 2004 Identification and characterization of NBEAL1, a

864 novel human neurobeachin-like 1 protein gene from fetal brain, which is up regulated in glioma.

865 Mol. Brain Res. 125: 147–155.

866 Chen, Z., L. Xu, T. Su, Z. Ke, Z. Peng et al., 2017 Autocrine STIP1 signaling promotes tumor growth and is

867 associated with disease outcome in hepatocellular carcinoma. Biochem. Biophys. Res. Commun.

868 493: 365–372.

869 Chen, C., Y. Yin, C. Li, J. Chen, J. Xie et al., 2016 ACER3 supports development of acute myeloid leukemia.

870 Biochem. Biophys. Res. Commun. 478: 33–38.

871 Cho, H.-J., S. Lee, Y. G. Ji, and D. H. Lee, 2018 Association of specific gene mutations derived from

872 machine learning with survival in lung adenocarcinoma. PloS One 13: e0207204.

873 Choi, W. H., S. Lee, and S. Cho, 2017 Microsatellite Alterations and Protein Expression of 5 Major Tumor

874 Suppressor Genes in Gastric Adenocarcinomas. Transl. Oncol. 11: 43–55.

875 Choi, C., N. Thi Thao Tran, T. Van Ngu, S. W. Park, M. S. Song et al., 2018 Promotion of tumor

876 progression and cancer stemness by MUC15 in thyroid cancer via the GPCR/ERK and integrin-

877 FAK signaling pathways. Oncogenesis 7: 85.

878 Chou, Y.-T., J.-K. Jiang, M.-H. Yang, J.-W. Lu, H.-K. Lin et al., 2018 Identification of a noncanonical

879 function for ribose-5-phosphate isomerase A promotes colorectal cancer formation by

880 stabilizing and activating β-catenin via a novel C-terminal domain. PLoS Biol. 16: e2003714.

881 Choy, J. S., E. O’Toole, B. M. Schuster, M. J. Crisp, T. S. Karpova et al., 2013 Genome-wide

882 haploinsufficiency screen reveals a novel role for γ-TuSC in spindle organization and genome

883 stability. Mol. Biol. Cell 24: 2753–2763.

884 Christie, K. R., E. L. Hong, and J. M. Cherry, 2009 Functional annotations for the Saccharomyces

885 cerevisiae genome: the knowns and the known unknowns. Trends Microbiol. 17: 286–294.

886 Ciou, S.-C., Y.-T. Chou, Y.-L. Liu, Y.-C. Nieh, J.-W. Lu et al., 2015 Ribose-5-phosphate isomerase A

887 regulates hepatocarcinogenesis via PP2A and ERK signaling. Int. J. Cancer 137: 104–115.

888 Cohen, B. A., R. D. Mitra, J. D. Hughes, and G. M. Church, 2000 A computational analysis of whole-

889 genome expression data reveals chromosomal domains of gene expression. Nat. Genet. 26:

890 183–186.

891 Cost, G. J., and J. D. Boeke, 1996 A useful colony colour phenotype associated with the yeast

892 selectable/counter-selectable marker MET15. Yeast 12: 939–941.

893 Dai, B., P. Zhang, Y. Zhang, C. Pan, G. Meng et al., 2016 RNaseH2A is involved in human gliomagenesis

894 through the regulation of cell proliferation and apoptosis. Oncol. Rep. 36: 173–180.

895 Datta, B., 2009 Roles of P67/MetAP2 as a tumor suppressor. Biochim. Biophys. Acta 1796: 281–292.

896 De Preter, K., R. Barriot, F. Speleman, J. Vandesompele, and Y. Moreau, 2008 Positional gene enrichment

897 analysis of gene sets for high-resolution identification of overrepresented chromosomal regions.

898 Nucleic Acids Res. 36: e43.

899 Deutschbauer, A. M., D. F. Jaramillo, M. Proctor, J. Kumm, M. E. Hillenmeyer et al., 2005 Mechanisms of

900 Haploinsufficiency Revealed by Genome-Wide Profiling in Yeast. Genetics 169: 1915–1925.

901 Di Giammartino, D. C., and J. L. Manley, 2014 New links between mRNA polyadenylation and diverse

902 nuclear pathways. Mol. Cells 37: 644–649.

903 Diefenbacher, M., and A. Orian, 2017 Stabilization of nuclear oncoproteins by RNF4 and the ubiquitin

904 system in cancer. Mol. Cell. Oncol. 4: e1260671.

905 Ding, N., R. Li, W. Shi, and C. He, 2018 CENPI is overexpressed in colorectal cancer and regulates cell

906 migration and invasion. Gene 674: 80–86.

907 Donato, M. D., M. Fanelli, M. Mariani, G. Raspaglio, D. Pandya et al., 2015 Nek6 and Hif-1α cooperate

908 with the cytoskeletal gateway of drug resistance to drive outcome in serous ovarian cancer. Am.

909 J. Cancer Res. 5: 1862–1877.

910 Dong, J., J. Geng, and W. Tan, 2018 MiR-363-3p suppresses tumor growth and metastasis of colorectal

911 cancer via targeting SphK2. Biomed. Pharmacother. Biomedecine Pharmacother. 105: 922–931.

912 Dragovic, Z., S. A. Broadley, Y. Shomura, A. Bracher, and F. U. Hartl, 2006 Molecular chaperones of the

913 Hsp110 family act as nucleotide exchange factors of Hsp70s. EMBO J. 25: 2519–2528.

914 Drucker, D. J., 2016 Never Waste a Good Crisis: Confronting Reproducibility in Translational Research.

915 Cell Metab. 24: 348–360.

916 Duntze, W., V. MacKay, and T. R. Manney, 1970 Saccharomyces cerevisiae: A Diffusible Sex Factor.

917 Science 168: 1472–1473.

918 Easton, D. P., Y. Kaneko, and J. R. Subjeck, 2000 The Hsp110 and Grp170 stress proteins: newly

919 recognized relatives of the Hsp70s. Cell Stress Chaperones 5: 276–290.

920 Engin, H. B., D. Carlin, D. Pratt, and H. Carter, 2017 Modeling of RAS complexes supports roles in cancer

921 for less studied partners. BMC Biophys. 10:.

922 Eser, U., D. Chandler-Brown, F. Ay, A. F. Straight, Z. Duan et al., 2017 Form and function of topologically

923 associating genomic domains in budding yeast. Proc. Natl. Acad. Sci. 114: E3061–E3070.

924 Evensen, N. A., P. P. Madhusoodhan, J. Meyer, J. Saliba, A. Chowdhury et al., 2018 MSH6

925 haploinsufficiency at relapse contributes to the development of thiopurine resistance in

926 pediatric B-lymphoblastic leukemia. Haematologica.

927 Fang, Y., D. Fu, and X.-Z. Shen, 2010 The potential role of ubiquitin c-terminal hydrolases in oncogenesis.

928 Biochim. Biophys. Acta 1806: 1–6.

929 Fang, Z., C. Gong, S. Yu, W. Zhou, W. Hassan et al., 2018 NFYB-induced high expression of E2F1

930 contributes to oxaliplatin resistance in colorectal cancer via the enhancement of CHK1 signaling.

931 Cancer Lett. 415: 58–72.

932 Fang, H., J. Kang, R. Du, X. Zhao, X. Zhang et al., 2015 Growth inhibitory effect of adenovirus-mediated

933 tissue-targeted expression of ribosomal protein L23 on human colorectal carcinoma cells. Oncol.

934 Rep. 34: 763–770.

935 Faria, J. A. Q. A., N. C. R. Corrêa, C. de Andrade, A. C. de Angelis Campos, R. Dos Santos Samuel de

936 Almeida et al., 2013 SET domain-containing Protein 4 (SETD4) is a Newly Identified Cytosolic and

937 Nuclear Lysine Methyltransferase involved in Breast Cancer Cell Proliferation. J. Cancer Sci. Ther.

938 5: 58–65.

939 Fasullo, M., and M. Sun, 2008 The Saccharomyces cerevisiae checkpoint genes RAD9, CHK1 and PDS1 are

940 required for elevated homologous recombination in a mec1 (ATR) hypomorphic mutant. Cell

941 Cycle Georget. Tex 7: 2418–2426.

942 Fernández-Sáiz, V., B.-S. Targosz, S. Lemeer, R. Eichner, C. Langer et al., 2013 SCFFbxo9 and CK2 direct

943 the cellular response to growth factor withdrawal via Tel2/Tti1 degradation and promote

944 survival in multiple myeloma. Nat. Cell Biol. 15: 72–81.

945 Gagou, M. E., A. Ganesh, G. Phear, D. Robinson, E. Petermann et al., 2014 Human PIF1 helicase supports

946 DNA replication and cell growth under oncogenic-stress. Oncotarget 5: 11381–11398.

947 Gao, C., Y. Su, J. Koeman, E. Haak, K. Dykema et al., 2016 Chromosome instability drives phenotypic

948 switching to metastasis. Proc. Natl. Acad. Sci. U. S. A. 113: 14793–14798.

949 Gayed, M., S. Bernatsky, R. Ramsey-Goldman, A. Clarke, and C. Gordon, 2009 Lupus and cancer. Lupus

950 18: 479–485.

951 Giaever, G., A. M. Chu, L. Ni, C. Connelly, L. Riles et al., 2002 Functional profiling of the Saccharomyces

952 cerevisiae genome. Nature 418: 387–391.

953 Giaever, G., and C. Nislow, 2014 The yeast deletion collection: a decade of functional genomics. Genetics

954 197: 451–465.

955 Giam, M., and G. Rancati, 2015 Aneuploidy and chromosomal instability in cancer: a jackpot to chaos.

956 Cell Div. 10:.

957 Guest, S. T., Z. R. Kratche, A. Bollig-Fischer, R. Haddad, and S. P. Ethier, 2015 Two members of the TRiC

958 chaperonin complex, CCT2 and TCP1 are essential for survival of breast cancer cells and are

959 linked to driving oncogenes. Exp. Cell Res. 332: 223–235.

960 Gutierrez-Erlandsson, S., P. Herrero-Vidal, M. Fernandez-Alfara, S. Hernandez-Garcia, S. Gonzalo-Flores

961 et al., 2013 R-RAS2 overexpression in tumors of the human central nervous system. Mol. Cancer

962 12: 127.

963 Haber, J. E., 2012 Mating-type genes and MAT switching in Saccharomyces cerevisiae. Genetics 191: 33–

964 64.

965 Harrison, J. C., and J. E. Haber, 2006 Surviving the breakup: the DNA damage checkpoint. Annu. Rev.

966 Genet. 40: 209–235.

967 Hattori, A., M. Tsunoda, T. Konuma, M. Kobayashi, T. Nagy et al., 2017 Cancer progression by

968 reprogrammed BCAA metabolism in myeloid leukaemia. Nature 545: 500–504.

969 He, W., L. Chen, K. Yuan, Q. Zhou, L. Peng et al., 2018 Gene set enrichment analysis and meta-analysis to

970 identify six key genes regulating and controlling the prognosis of esophageal squamous cell

971 carcinoma. J. Thorac. Dis. 10: 5714–5726.

972 He, X., Z. Liu, Y. Peng, and C. Yu, 2016 MicroRNA-181c inhibits glioblastoma cell invasion, migration and

973 mesenchymal transition by targeting TGF-β pathway. Biochem. Biophys. Res. Commun. 469:

974 1041–1048.

975 He, W., and R. Parker, 2000 Functions of Lsm proteins in mRNA degradation and splicing. Curr. Opin. Cell

976 Biol. 12: 346–350.

977 Hennecke, S., J. Beck, K. Bornemann-Kolatzki, S. Neumann, H. Murua Escobar et al., 2015 Prevalence of

978 the Prefoldin Subunit 5 Gene Deletion in Canine Mammary Tumors. PloS One 10: e0131280.

979 Herskowitz, I., 1995 MAP kinase pathways in yeast: For mating and more. Cell 80: 187–197.

980 Hieter, P., C. Mann, M. Snyder, and R. W. Davis, 1985 Mitotic stability of yeast chromosomes: a colony

981 color assay that measures nondisjunction and chromosome loss. Cell 40: 381–392.

982 Hlaváč, V., V. Brynychová, R. Václavíková, M. Ehrlichová, D. Vrána et al., 2014 The role of cytochromes

983 p450 and aldo-keto reductases in prognosis of breast carcinoma patients. Medicine (Baltimore)

984 93: e255.

985 Huang, B., H. Zhou, X. Lang, and Z. Liu, 2014 siRNA‑induced ABCE1 silencing inhibits proliferation and

986 invasion of breast cancer cells. Mol. Med. Rep. 10: 1685–1690.

987 Hughes, T. R., C. J. Roberts, H. Dai, A. R. Jones, M. R. Meyer et al., 2000 Widespread aneuploidy revealed

988 by DNA microarray expression profiling. Nat. Genet. 25: 333–337.

989 Husni, R. E., A. Shiba-Ishii, T. Nakagawa, T. Dai, Y. Kim et al., 2019 DNA hypomethylation-related

990 overexpression of SFN, GORASP2 and ZYG11A is a novel prognostic biomarker for early stage

991 lung adenocarcinoma. Oncotarget 10: 1625–1636.

992 Jin, L., J. Chun, C. Pan, A. Kumar, G. Zhang et al., 2018 The PLAG1-GDH1 Axis Promotes Anoikis

993 Resistance and Tumor Metastasis through CamKK2-AMPK Signaling in LKB1-Deficient Lung

994 Cancer. Mol. Cell 69: 87-99.e7.

995 Jo, Y. S., H. R. Oh, M. S. Kim, N. J. Yoo, and S. H. Lee, 2016 Frameshift mutations of OGDH, PPAT and

996 PCCA genes in gastric and colorectal cancers. Neoplasma 63: 681–686.

997 Joseph, C., O. Macnamara, M. Craze, R. Russell, E. Provenzano et al., 2018 Mediator complex (MED) 7: a

998 biomarker associated with good prognosis in invasive breast cancer, especially ER+ luminal

999 subtypes. Br. J. Cancer 118: 1142–1151.

1000 Kacser, H., and J. A. Burns, 1981 The Molecular Basis of Dominance. Genetics 97: 639–666.

1001 Kim, J.-S., J. W. Chang, J. K. Park, and S.-G. Hwang, 2012 Increased aldehyde reductase expression

1002 mediates acquired radioresistance of laryngeal cancer cells via modulating p53. Cancer Biol.

1003 Ther. 13: 638–646.

1004 Kim, M. S., E. G. Jeong, Y. J. Chung, N. J. Yoo, and S. H. Lee, 2008 Absence of somatic mutation of a

1005 tumor suppressor gene eukaryotic translation elongation factor 1, epsilon-1 (EEF1E1), in

1006 common human cancers. APMIS 116: 832–833.

1007 Kim, C. G., H. Lee, N. Gupta, S. Ramachandran, I. Kaushik et al., 2018 Role of Forkhead Box Class O

1008 proteins in cancer progression and metastasis. Semin. Cancer Biol. 50: 142–151.

1009 Knudson, A. G., 1993 Antioncogenes and human cancer. Proc. Natl. Acad. Sci. U. S. A. 90: 10914–10921.

1010 Knudson, A. G., 1971 Mutation and Cancer: Statistical Study of Retinoblastoma. Proc. Natl. Acad. Sci. U.

1011 S. A. 68: 820–823.

1012 Kobayashi, M., K. Harada, M. Negishi, and H. Katoh, 2014 Dock4 forms a complex with SH3YL1 and

1013 regulates cancer cell migration. Cell. Signal. 26: 1082–1088.

1014 Kruglyak, S., and H. Tang, 2000 Regulation of adjacent yeast genes. Trends Genet. TIG 16: 109–111.

1015 Kurbasic, E., M. Sjöström, M. Krogh, E. Folkesson, D. Grabau et al., 2015 Changes in glycoprotein

1016 expression between primary breast tumour and synchronous lymph node metastases or

1017 asynchronous distant metastases. Clin. Proteomics 12: 13.

1018 Lee, S. T., M. Feng, Y. Wei, Z. Li, Y. Qiao et al., 2013 Protein tyrosine phosphatase UBASH3B is

1019 overexpressed in triple-negative breast cancer and promotes invasion and metastasis. Proc.

1020 Natl. Acad. Sci. U. S. A. 110: 11121–11126.

1021 Lee, M., and A. R. Venkitaraman, 2014 A cancer-associated mutation inactivates a region of the high-

1022 mobility group protein HMG20b essential for cytokinesis. Cell Cycle 13: 2554–2563.

1023 Lehner, K., M. M. Stone, R. A. Farber, and T. D. Petes, 2007 Ninety-six haploid yeast strains with

1024 individual disruptions of open reading frames between YOR097C and YOR192C, constructed for

1025 the Saccharomyces genome deletion project, have an additional mutation in the mismatch

1026 repair gene MSH3. Genetics 177: 1951–1953.

1027 Leng, J.-J., H.-M. Tan, K. Chen, W.-G. Shen, and J.-W. Tan, 2012 Growth-inhibitory effects of MOB2 on

1028 human hepatic carcinoma cell line SMMC-7721. World J. Gastroenterol. 18: 7285–7289.

1029 Lengauer, C., K. W. Kinzler, and B. Vogelstein, 1998 Genetic instabilities in human cancers. Nature 396:

1030 643–649.

1031 Leone, P. E., M. Mendiola, J. Alonso, C. Paz-y-Miño, and A. Pestaña, 2003 Implications of a RAD54L

1032 polymorphism (2290C/T) in human meningiomas as a risk factor and/or a genetic marker. BMC

1033 Cancer 3: 6.

1034 Levine, A. J., 1993 The Tumor Suppressor Genes. Annu. Rev. Biochem. 62: 623–651.

1035 Li, L., Y. Cao, H. Zhou, Y. Li, B. He et al., 2018 Knockdown of CCNO decreases the tumorigenicity of gastric

1036 cancer by inducing apoptosis. OncoTargets Ther. 11: 7471–7481.

1037 Li, M., C. Jin, M. Xu, L. Zhou, D. Li et al., 2017a Bifunctional enzyme ATIC promotes propagation of

1038 hepatocellular carcinoma by regulating AMPK-mTOR-S6 K1 signaling. Cell Commun. Signal. CCS

1039 15: 52.

1040 Li, Z., L. Kong, L. Yu, J. Huang, K. Wang et al., 2014 Association between MSH6 G39E polymorphism and

1041 cancer susceptibility: a meta-analysis of 7,046 cases and 34,554 controls. Tumour Biol. J. Int.

1042 Soc. Oncodevelopmental Biol. Med. 35: 6029–6037.

1043 Li, Y., Q. Liang, Y. -q Wen, L. -l Chen, L. -t Wang et al., 2010 Comparative proteomics analysis of human

1044 osteosarcomas and benign tumor of bone. Cancer Genet. Cytogenet. 198: 97–106.

1045 Li, C., V. W. Liu, P. M. Chiu, D. W. Chan, and H. Y. Ngan, 2012 Over-expressions of AMPK subunits in

1046 ovarian carcinomas with significant clinical implications. BMC Cancer 12: 357.

1047 Li, Y., N. Sun, Z. Lu, S. Sun, J. Huang et al., 2017b Prognostic alternative mRNA splicing signature in non-

1048 small cell lung cancer. Cancer Lett. 393: 40–51.

1049 Li, A., Q. Yan, X. Zhao, J. Zhong, H. Yang et al., 2016 Decreased expression of PBLD correlates with poor

1050 prognosis and functions as a tumor suppressor in human hepatocellular carcinoma. Oncotarget

1051 7: 524–537.

1052 Lin, L.-H., Y.-W. Xu, L.-S. Huang, C.-Q. Hong, T.-T. Zhai et al., 2017 Serum proteomic-based analysis

1053 identifying autoantibodies against PRDX2 and PRDX3 as potential diagnostic biomarkers in

1054 nasopharyngeal carcinoma. Clin. Proteomics 14: 6.

1055 Liu, J.-W., J. K. Nagpal, W. Sun, J. Lee, M. S. Kim et al., 2008 ssDNA-binding protein 2 is frequently

1056 hypermethylated and suppresses cell growth in human prostate cancer. Clin. Cancer Res. Off. J.

1057 Am. Assoc. Cancer Res. 14: 3754–3760.

1058 Loeb, L. A., K. R. Loeb, and J. P. Anderson, 2003 Multiple mutations and cancer. Proc. Natl. Acad. Sci.

1059 100: 776–781.

1060 Mager, W. H., and J. Winderickx, 2005 Yeast as a model for medical and medicinal research. Trends

1061 Pharmacol. Sci. 26: 265–273.

1062 Maleva Kostovska, I., J. Wang, N. Bogdanova, P. Schürmann, S. Bhuju et al., 2016 Rare ATAD5 missense

1063 variants in breast and ovarian cancer patients. Cancer Lett. 376: 173–177.

1064 Manente, M., and M. Ghislain, 2009 The lipid-translocating exporter family and membrane phospholipid

1065 homeostasis in yeast. FEMS Yeast Res. 9: 673–687.

1066 Maragozidis, P., E. Papanastasi, D. Scutelnic, A. Totomi, I. Kokkori et al., 2015 Poly(A)-specific

1067 ribonuclease and Nocturnin in squamous cell lung cancer: prognostic value and impact on gene

1068 expression. Mol. Cancer 14:.

1069 Mason, J. M., H. L. Logan, B. Budke, M. Wu, M. Pawlowski et al., 2014 The RAD51-stimulatory compound

1070 RS-1 can exploit the RAD51 overexpression that exists in cancer cells and tumors. Cancer Res.

1071 74: 3546–3555.

1072 Ni, S., W. Weng, M. Xu, Q. Wang, C. Tan et al., 2018 miR-106b-5p inhibits the invasion and metastasis of

1073 colorectal cancer by targeting CTSA. OncoTargets Ther. 11: 3835–3845.

1074 Nicolas, E., E. A. Golemis, and S. Arora, 2016 POLD1: Central mediator of DNA replication and repair, and

1075 implication in cancer and other pathologies. Gene 590: 128–141.

1076 Ono, B., N. Ishii, S. Fujino, and I. Aoyama, 1991 Role of hydrosulfide ions (HS-) in methylmercury

1077 resistance in Saccharomyces cerevisiae. Appl. Environ. Microbiol. 57: 3183–3186.

1078 Oo, H. Z., K. Sentani, S. Mukai, T. Hattori, S. Shinmei et al., 2016 Fukutin, identified by the Escherichia

1079 coli ampicillin secretion trap (CAST) method, participates in tumor progression in gastric cancer.

1080 Gastric Cancer Off. J. Int. Gastric Cancer Assoc. Jpn. Gastric Cancer Assoc. 19: 443–452.

1081 Oo, H. Z., K. Sentani, N. Sakamoto, K. Anami, Y. Naito et al., 2014 Identification of novel transmembrane

1082 proteins in scirrhous-type gastric cancer by the Escherichia coli ampicillin secretion trap (CAST)

1083 method: TM9SF3 participates in tumor invasion and serves as a prognostic factor. Pathobiol. J.

1084 Immunopathol. Mol. Cell. Biol. 81: 138–148.

1085 Pan, J., D. Cao, and J. Gong, 2018 The endoplasmic reticulum co-chaperone ERdj3/DNAJB11 promotes

1086 hepatocellular carcinoma progression through suppressing AATZ degradation. Future Oncol.

1087 Lond. Engl. 14: 3001–3013.

1088 Patra, K. C., Q. Wang, P. T. Bhaskar, L. Miller, Z. Wang et al., 2013 Hexokinase 2 is required for tumor

1089 initiation and maintenance and its systemic deletion is therapeutic in mouse models of cancer.

1090 Cancer Cell 24: 213–228.

1091 Payne, S. R., and C. J. Kemp, 2005 Tumor suppressor genetics. Carcinogenesis 26: 2031–2045.

1092 Protchenko, O., M. Shakoury-Elizeh, P. Keane, J. Storey, R. Androphy et al., 2008 Role of PUG1 in

1093 Inducible Porphyrin and Heme Transport in Saccharomyces cerevisiae. Eukaryot. Cell 7: 859–

1094 871.

1095 Provost, E., J. M. Bailey, S. Aldrugh, S. Liu, C. Iacobuzio-Donahue et al., 2014 The Tumor Suppressor rpl36

1096 Restrains KRASG12V-Induced Pancreatic Cancer. Zebrafish 11: 551–559.

1097 Qi, T., W. Zhang, Y. Luan, F. Kong, D. Xu et al., 2014 Proteomic profiling identified multiple short-lived

1098 members of the central proteome as the direct targets of the addicted oncogenes in cancer

1099 cells. Mol. Cell. Proteomics MCP 13: 49–62.

1100 Qin, Y., Q. Hu, J. Xu, S. Ji, W. Dai et al., 2019 PRMT5 enhances tumorigenicity and glycolysis in pancreatic

1101 cancer via the FBW7/cMyc axis. Cell Commun. Signal. 17: 30.

1102 Qiu, X., X. He, Q. Huang, X. Liu, G. Sun et al., 2015 Overexpression of CCT8 and its significance for tumor

1103 cell proliferation, migration and invasion in glioma. Pathol. - Res. Pract. 211: 717–725.

1104 Rahman, M. R., T. Islam, E. Gov, B. Turanli, G. Gulfidan et al., 2019 Identification of Prognostic Biomarker

1105 Signatures and Candidate Drugs in Colorectal Cancer: Insights from Systems Biology Analysis.

1106 Med. Kaunas Lith. 55:.

1107 Reid, R. J. D., X. Du, I. Sunjevaric, V. Rayannavar, J. Dittmar et al., 2016 A Synthetic Dosage Lethal Genetic

1108 Interaction Between CKS1B and PLK1 Is Conserved in Yeast and Human Cancer Cells. Genetics

1109 204: 807–819.

1110 Rivals, I., L. Personnaz, L. Taing, and M.-C. Potier, 2007 Enrichment or depletion of a GO category within

1111 a class of genes: which test? Bioinformatics 23: 401–407.

1112 Romero-Garcia, S., H. Prado-Garcia, and J. S. Lopez-Gonzalez, 2014 Transcriptional analysis of hnRNPA0,

1113 A1, A2, B1, and A3 in lung cancer cell lines in response to acidosis, hypoxia, and serum

1114 deprivation conditions. Exp. Lung Res. 40: 12–21.

1115 Sabatti, C., S. Service, and N. Freimer, 2003 False discovery rate in linkage and association genome

1116 screens for complex disorders. Genetics 164: 829–833.

1117 Saritha, V. N., V. S. Veena, K. M. Jagathnath Krishna, T. Somanathan, and K. Sujathan, 2018 Significance

1118 of DNA Replication Licensing Proteins (MCM2, MCM5 and CDC6), p16 and p63 as Markers of 48

1119 Premalignant Lesions of the Uterine Cervix: Its Usefulness to Predict Malignant Potential. Asian

1120 Pac. J. Cancer Prev. APJCP 19: 141–148.

1121 Saveanu, C., M. Fromont-Racine, A. Harington, F. Ricard, A. Namane et al., 2001 Identification of 12 new

1122 yeast mitochondrial ribosomal proteins including 6 that have no prokaryotic homologues. J. Biol.

1123 Chem. 276: 15861–15867.

Schmidlin, T., M. Kaeberlein, B. A. Kudlow, V. MacKay, D. Lockshon et al., 2007 Single-gene deletions that restore mating competence to diploid yeast. FEMS Yeast Res. 8: 276–286.

Shafique, S., and S. Rashid, 2018 Structural basis for renal cancer by the dynamics of pVHL-dependent JADE1 stabilization and β-catenin regulation. Prog. Biophys. Mol. Biol.

Shan, N., W. Zhou, S. Zhang, and Y. Zhang, 2016 Identification of HSPA8 as a candidate biomarker for endometrial carcinoma by using iTRAQ-based proteomic analysis. OncoTargets Ther.

Shaner, L., H. Wegele, J. Buchner, and K. A. Morano, 2005 The yeast Hsp110 Sse1 functionally interacts with the Hsp70 chaperones Ssa and Ssb. J. Biol. Chem. 280: 41262–41269.

Shen, X., S. Kan, J. Hu, M. Li, G. Lu et al., 2016 EMC6/TMEM93 suppresses glioblastoma proliferation by modulating autophagy. Cell Death Dis. 7: e2043.

Shi, S. Q., J.-J. Ke, Q. S. Xu, W. Q. Wu, and Y. Y. Wan, 2018 Integrated network analysis to identify the key genes, transcription factors, and microRNAs involved in hepatocellular carcinoma. Neoplasma 65: 66–74.

Shimozono, N., M. Jinnin, M. Masuzawa, M. Masuzawa, Z. Wang et al., 2015 NUP160-SLC43A3 is a novel recurrent fusion oncogene in angiosarcoma. Cancer Res. 75: 4458–4465.

Siołek, M., C. Cybulski, D. Gąsior-Perczak, A. Kowalik, B. Kozak-Klonowska et al., 2015 CHEK2 mutations and the risk of papillary thyroid cancer. Int. J. Cancer 137: 548–552.

Song, H., E. Dicks, S. J. Ramus, J. P. Tyrer, M. P. Intermaggio et al., 2015 Contribution of Germline Mutations in the RAD51B, RAD51C, and RAD51D Genes to Ovarian Cancer in the Population. J. Clin. Oncol. 33: 2901–2907.

Song, L., Y. Wang, J. Zhang, N. Song, X. Xu et al., 2018 The risks of cancer development in systemic lupus erythematosus (SLE) patients: a systematic review and meta-analysis. Arthritis Res. Ther. 20:.

Stirling, P. C., M. S. Bloom, T. Solanki-Patil, S. Smith, P. Sipahimalani et al., 2011 The Complete Spectrum of Yeast Chromosome Instability Genes Identifies Candidate CIN Cancer Genes and Functional Roles for ASTRA Complex Components. PLoS Genet. 7:.

Strome, E. D., X. Wu, M. Kimmel, and S. E. Plon, 2008 Heterozygous Screen in Saccharomyces cerevisiae Identifies Dosage-Sensitive Genes That Affect Chromosome Stability. Genetics 178: 1193–1207.

Sun, M., L. Ma, L. Xu, J. Li, W. Zhang et al., 2002 A human novel gene DERPC on 16q22.1 inhibits prostate tumor cell growth and its expression is decreased in prostate and renal tumors. Mol. Med. 8: 655–663.

Sung, Y., S. Park, S. J. Park, J. Jeong, M. Choi et al., 2018 Jazf1 promotes prostate cancer progression by activating JNK/Slug. Oncotarget 9: 755–765.

Taguchi, A., J.-H. Rho, Q. Yan, Y. Zhang, Y. Zhao et al., 2015 MAPRE1 as a plasma biomarker for early-stage colorectal cancer and adenomas. Cancer Prev. Res. Phila. Pa 8: 1112–1119.

Tan, S.-X., R.-C. Hu, J.-J. Liu, Y.-L. Tan, and W.-E. Liu, 2014 Methylation of PRDM2, PRDM5 and PRDM16 genes in lung cancer cells. Int. J. Clin. Exp. Pathol. 7: 2305–2311.

The, I., S. Ruijtenberg, B. P. Bouchet, A. Cristobal, M. B. W. Prinsen et al., 2015 Rb and FZR1/Cdh1 determine CDK4/6-cyclin D requirement in C. elegans and human cancer cells. Nat. Commun. 6: 5906.

Thiagalingam, S., R. L. Foy, K. Cheng, H. J. Lee, A. Thiagalingam et al., 2002 Loss of heterozygosity as a predictor to map tumor suppressor genes in cancer: molecular basis of its occurrence. Curr. Opin. Oncol. 14: 65–72.

Tina, E., S. Prosén, S. Lennholm, G. Gasparyan, M. Lindberg et al., 2019 Expression profile of the amino acid transporters SLC7A5, SLC7A7, SLC7A8 and the enzyme TDO2 in basal cell carcinoma. Br. J. Dermatol. 180: 130–140.

Tóth, C., F. Sükösd, E. Valicsek, E. Herpel, P. Schirmacher et al., 2017 Expression of ERCC1, RRM1, TUBB3 in correlation with apoptosis repressor ARC, DNA mismatch repair proteins and p53 in liver metastasis of colorectal cancer. Int. J. Mol. Med. 40: 1457–1465.

Trotman, L. C., M. Niki, Z. A. Dotan, J. A. Koutcher, A. D. Cristofano et al., 2003 Pten Dose Dictates Cancer Progression in the Prostate. PLOS Biol. 1: e59.

Veitia, R. A., 2002 Exploring the etiology of haploinsufficiency. BioEssays 24: 175–184.

Vladusic, T., R. Hrascan, N. Pecina-Slaus, I. Vrhovac, M. Gamulin et al., 2010 Loss of heterozygosity of CDKN2A (p16INK4a) and RB1 tumor suppressor genes in testicular germ cell tumors. Radiol. Oncol. 44: 168–173.

Wang, C., J.-F. Chang, H. Yan, D.-L. Wang, Y. Liu et al., 2015 A conserved RAD6-MDM2 ubiquitin ligase machinery targets histone chaperone ASF1A in tumorigenesis. Oncotarget 6: 29599–29613.

Wang, J., W. Chen, W. Wei, and J. Lou, 2017a Oncogene TUBA1C promotes migration and proliferation in hepatocellular carcinoma and predicts a poor prognosis. Oncotarget 8: 96215–96224.

Wang, T., M. A. Goodman, R. L. McGough, K. R. Weiss, and U. N. M. Rao, 2014a Immunohistochemical Analysis of Expressions of RB1, CDK4, HSP90, cPLA2G4A, and CHMP2B Is Helpful in Distinction between Myxofibrosarcoma and Myxoid Liposarcoma. Int. J. Surg. Pathol. 22: 589–599.

Wang, Y., Y.-X. Qi, Z. Qi, and S.-Y. Tsang, 2019 TRPC3 Regulates the Proliferation and Apoptosis Resistance of Triple Negative Breast Cancer Cells through the TRPC3/RASA4/MAPK Pathway. Cancers 11:.

Wang, J., J. Qian, M. D. Hoeksema, Y. Zou, A. V. Espinosa et al., 2013 Integrative genomics analysis identifies candidate drivers at 3q26-29 amplicon in squamous cell carcinoma of the lung. Clin. Cancer Res. Off. J. Am. Assoc. Cancer Res. 19: 5580–5590.

Wang, Y., F. Ren, Y. Wang, Y. Feng, D. Wang et al., 2014b CHIP/Stub1 functions as a tumor suppressor and represses NF-κB-mediated signaling in colorectal cancer. Carcinogenesis 35: 983–991.

Wang, Y., N. Wu, B. Pang, D. Tong, D. Sun et al., 2017b TRIB1 promotes colorectal cancer cell migration and invasion through activation MMP-2 via FAK/Src and ERK pathways. Oncotarget 8: 47931–47942.

Wang, L., M. Zhang, and D.-X. Liu, 2014c Knock-down of ABCE1 gene induces G1/S arrest in human oral cancer cells. Int. J. Clin. Exp. Pathol. 7: 5495–5504.

Wu, Y., A. Wang, B. Zhu, J. Huang, E. Lu et al., 2018 KIF18B promotes tumor progression through activating the Wnt/β-catenin pathway in cervical cancer. OncoTargets Ther. 11: 1707–1720.

Xie, Y., A. Wang, J. Lin, L. Wu, H. Zhang et al., 2017 Mps1/TTK: a novel target and biomarker for cancer. J. Drug Target. 25: 112–118.

Yamada, K. M., and A. Hall, 2015 Reproducibility and cell biology. J. Cell Biol. 209: 191–193.

Yamaguchi, K., R. Yamaguchi, N. Takahashi, T. Ikenoue, T. Fujii et al., 2014 Overexpression of Cohesion Establishment Factor DSCC1 through E2F in Colorectal Cancer (S. P. Chellappan, Ed.). PLoS ONE 9: e85750.

Yang, Y., Y. Gao, L. Mutter-Rottmayer, A. Zlatanou, M. Durando et al., 2017 DNA repair factor RAD18 and DNA polymerase Polκ confer tolerance of oncogenic DNA replication stress. J. Cell Biol. 216: 3097–3115.

Yang, F., J. Xu, H. Li, M. Tan, X. Xiong et al., 2019 FBXW2 suppresses migration and invasion of lung cancer cells via promoting β-catenin ubiquitylation and degradation. Nat. Commun. 10: 1382.

Yang, Z., L. Zhuang, P. Szatmary, L. Wen, H. Sun et al., 2015 Upregulation of heat shock proteins (HSPA12A, HSP90B1, HSPA4, HSPA5 and HSPA6) in tumour tissues is associated with poor outcomes from HBV-related early-stage hepatocellular carcinoma. Int. J. Med. Sci. 12: 256–263.

Yuen, K. W. Y., C. D. Warren, O. Chen, T. Kwok, P. Hieter et al., 2007 Systematic genome instability screens in yeast and their potential relevance to cancer. Proc. Natl. Acad. Sci. U. S. A. 104: 3925–3930.

Yunlei, Z., C. Zhe, L. Yan, W. Pengcheng, Z. Yanbo et al., 2013 INMAP, a novel truncated version of POLR3B, represses AP-1 and p53 transcriptional activity. Mol. Cell. Biochem. 374: 81–89.

Zhang, X., and T. F. Smith, 1998 Yeast “Operons.” Microb. Comp. Genomics 3: 133–140.

Zhang, Z., F. Wang, C. Du, H. Guo, L. Ma et al., 2017 BRM/SMARCA2 promotes the proliferation and chemoresistance of pancreatic cancer cells by targeting JAK2/STAT3 signaling. Cancer Lett. 402: 213–224.

Zhang, Y.-F., B.-C. Zhang, A.-R. Zhang, T.-T. Wu, J. Liu et al., 2013 Co-transduction of ribosomal protein L23 enhances the therapeutic efficacy of adenoviral-mediated p53 gene transfer in human gastric cancer. Oncol. Rep. 30: 1989–1995.

Zheng, Q., 2008 A note on plating efficiency in fluctuation experiments. Math. Biosci. 216: 150–153.

Zheng, Q., 2016 Comparing mutation rates under the Luria-Delbrück protocol. Genetica 144: 351–359.

Zheng, Q., 2015 Methods for comparing mutation rates using fluctuation assay data. Mutat. Res. 777: 20–22.

Zheng, Q., 2002 Statistical and algorithmic methods for fluctuation analysis with SALVADOR as an implementation. Math. Biosci. 176: 237–252.

Zhong, A., H. Zhang, S. Xie, M. Deng, H. Zheng et al., 2018 Inhibition of MUS81 improves the chemical sensitivity of olaparib by regulating MCM2 in epithelial ovarian cancer. Oncol. Rep.

Zhou, W., X. Feng, C. Ren, X. Jiang, W. Liu et al., 2013 Over-expression of BCAT1, a c-Myc target gene, induces cell proliferation, migration and invasion in nasopharyngeal carcinoma. Mol. Cancer 12: 53.

Zhou, S., Y. Shen, M. Zheng, L. Wang, R. Che et al., 2017a DNA methylation of METTL7A gene body regulates its transcriptional level in thyroid cancer. Oncotarget 8: 34652–34660.

Zhou, P., N. Xiao, J. Wang, Z. Wang, S. Zheng et al., 2017b SMC1A recruits tumor-associated-fibroblasts (TAFs) and promotes colorectal cancer metastasis. Cancer Lett. 385: 39–45.

Zhu, J., D. Heinecke, W. A. Mulla, W. D. Bradford, B. Rubinstein et al., 2015 Single-Cell Based Quantitative Assay of Chromosome Transmission Fidelity. G3 Bethesda Md 5: 1043–1056.

Zhu, Y., H. Lin, Z. Li, M. Wang, and J. Luo, 2001 Modulation of expression of ribosomal protein L7a (rpL7a) by ethanol in human breast cancer cells. Breast Cancer Res. Treat. 69: 29–38.


Fig 1. Representative LOH events impacting the MAT locus that can result in diploids exhibiting haploid mating behavior. Due to the co-repressible nature of the MAT locus, both MATa and MATα alleles must be present and active in order to suppress the production of mating pheromones and receptors, leading to the non-mating diploid phenotype.

Fig 2. Theory and set-up of LOH screen at the MAT locus. A) Example of a possible triploid cell formation mechanism that would allow for growth under double selection conditions. The LOH event could occur through a variety of mechanisms, all leading to the ability of the diploid cell to mate. B) Model of the MAT locus LOH screening methodology. C) The scoring system of colony counts utilized in the screening. Strains demonstrating a minimum score of “+++” in all four replicates for mating with a particular haploid mating type were included as top-hits.
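The top-hit rule in Fig 2C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the score strings (`"+"` to `"++++"`, with `"-"` for no colonies), the tester labels, and the strain names are hypothetical, since the legend does not list the full scoring scale.

```python
from typing import Dict, List

MIN_SCORE = 3      # "+++" threshold stated in the legend
N_REPLICATES = 4   # four replicates per haploid tester

def plus_count(score: str) -> int:
    """Number of '+' signs in a score string; '-' or '' counts as 0 (assumed encoding)."""
    return score.count("+")

def summed_score(scores: List[str]) -> int:
    """Sum of '+' scores across replicates for one mating type (the ordering used in Table S2)."""
    return sum(plus_count(s) for s in scores)

def is_top_hit(scores_by_tester: Dict[str, List[str]]) -> bool:
    """True if every replicate reaches '+++' for at least one haploid tester mating type."""
    for scores in scores_by_tester.values():
        if len(scores) == N_REPLICATES and all(plus_count(s) >= MIN_SCORE for s in scores):
            return True
    return False

# Hypothetical strains and scores, for illustration only.
example = {
    "strainA": {"MATa": ["+++", "++++", "+++", "+++"], "MATalpha": ["+", "-", "+", "++"]},
    "strainB": {"MATa": ["+++", "++", "+++", "+++"], "MATalpha": ["++", "++", "+", "+"]},
}
top_hits = [name for name, scores in example.items() if is_top_hit(scores)]
print(top_hits)
```

Each mating type is evaluated separately, so a single replicate below “+++” disqualifies that tester but not the strain as a whole.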

Fig 3. Multiple independent screen selection and gene lists comparison. A) Data sets were compared across screens that either assayed heterozygous deletions, assayed for LOH events, or both. Screens looking at haploinsufficiency and its connection to genome instability, or utilizing homozygous knockouts to understand LOH mechanisms, were chosen as they provide the most relevant data sets for comparison. B) Gene lists from the six screens selected for comparison were mapped for their overlapping top-hits. Gene deletions identified in two independent studies are shown in burgundy, whereas genes appearing as hits in three independent studies are shown in pink. If a particular gene deletion reproduced in multiple screens within the same publication, the solid color was changed to a striped pattern. Only screens performed in a diploid system were considered when determining the number of screens a gene reproduced in.

Fig 4. Positional Gene Enrichment (PGE) analysis for enriched chromosome regions. Genes highlighted in green were identified by the MAT screen; genes highlighted in orange were identified by the MAT and MET15 screens; genes highlighted in blue were identified by one independent study (studies selected for comparison are shown in Fig. 3); genes highlighted in burgundy were identified in two independent studies; genes highlighted in pink were identified by three independent studies. The key denoting color labels remains the same throughout all parts of the figure. A) Enriched regions of chromosome II from PGE analysis on MAT locus screen top-hits (padj = 2.74 × 10^-4), MAT + MET15 top-hits (padj = 9.69 × 10^-6), and Multiple Screen Comparison top-hits (six screens compared in Fig. 3; BP region 442918-575991) (padj = 0.016). B) Enriched regions of chromosome V from PGE analysis on MAT locus screen top-hits (padj = 2.31 × 10^-4), MAT + MET15 top-hits (p = 9.69 × 10^-6), and Multiple Screen Comparison top-hits (padj = 2.03 × 10^-5). C) Enriched regions of chromosome VII from PGE analysis on MAT locus screen top-hits (padj = 0.00358). No regions of three or more identified genes were enriched on this chromosome when the MAT + MET15 or Multiple Screen Comparison datasets were analyzed. D) Enriched regions of chromosome IX from PGE analysis on MAT locus screen top-hits (padj = 4.40 × 10^-4), MAT + MET15 top-hits (padj = 0.00194), and Multiple Screen Comparison top-hits (padj = 0.0299). E) Enriched regions of chromosome X from PGE analysis on MAT locus screen top-hits (padj = 0.00901) and Multiple Screen Comparison top-hits (padj = 4.18 × 10^-4). No regions containing three or more identified genes were enriched when our MAT + MET15 dataset was run. F) Enriched region of chromosome XI from PGE analysis on MAT locus screen top-hits (padj = 0.00751) and the Multiple Screen Comparison top-hits (padj = 0.013). G) Enriched region of chromosome XV from PGE analysis on MAT locus screen top-hits (padj = 0.00288) and MAT + MET15 top-hits (padj = 0.00332). No enriched regions containing more than three identified genes were found on chromosome XV for the Multiple Screen Comparison analysis. H) Enriched region of chromosome XVI from PGE analysis on Multiple Screen Comparison dataset (padj = 0.046).

Figure 5. LOH rates at the CAN1 locus due to nine separate heterozygous gene mutations. The data shown represent a combination of a minimum of two independent experiments. The black circle depicts the mean LOH rate, with the tails showing the experimental 95% CIs. Rates whose 95% CIs do not overlap that of the wildtype BY4743 strain are considered significantly different. The 95% CI overlap method mimics a two-tailed, two-population t-test at the conventional p < 0.05 level, with an improvement in type I error rate and statistical power when compared to a t-test, which has been found unsuitable for FA data analysis (Zheng 2015).
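The CI-overlap decision described for Figure 5 reduces to an interval comparison, sketched here in Python. The rates and interval bounds are hypothetical placeholders, not values from this study, and the sketch does not estimate the intervals themselves.

```python
from typing import NamedTuple

class RateEstimate(NamedTuple):
    rate: float      # estimated LOH rate
    ci_low: float    # lower bound of the 95% CI
    ci_high: float   # upper bound of the 95% CI

def intervals_overlap(a: RateEstimate, b: RateEstimate) -> bool:
    """True if the two 95% confidence intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high

def differs_from_wildtype(mutant: RateEstimate, wildtype: RateEstimate) -> bool:
    """Call a mutant significantly different when its CI does not overlap the wildtype CI."""
    return not intervals_overlap(mutant, wildtype)

# Hypothetical numbers, for illustration only.
wildtype = RateEstimate(1.0e-5, 0.8e-5, 1.2e-5)
mutant_x = RateEstimate(2.5e-5, 1.9e-5, 3.1e-5)   # CI lies entirely above the wildtype CI
mutant_y = RateEstimate(1.3e-5, 1.0e-5, 1.6e-5)   # CI overlaps the wildtype CI

print(differs_from_wildtype(mutant_x, wildtype))
print(differs_from_wildtype(mutant_y, wildtype))
```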

Table 1. Multiple Screen Comparison Gene Identification Overlap. A hypergeometric probability was calculated with a normal approximation using the webtool http://nemates.org/MA/progs/overlap_stats.html. The same tool was used to calculate a representation factor, defined as the number of genes in common between two studies divided by the number of expected genes. The number of expected genes is estimated as the number of genes in the first study times the number of genes in the second study, divided by the total number of genes that were screened. Representation factors greater than 1 indicate more overlap than expected; representation factors less than 1 indicate less overlap than expected.
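As a worked check of the representation factor definition, the following Python sketch recomputes it from the counts listed in Table 1. The exact upper-tail hypergeometric probability is included only as an illustration; it is an assumption of this sketch and is not claimed to reproduce the normal-approximation p-values reported by the webtool. Function names are hypothetical.

```python
from math import comb

def expected_overlap(n_first: int, n_second: int, n_screened: int) -> float:
    """Expected number of shared genes: (hits in study 1 x hits in study 2) / genes screened."""
    return n_first * n_second / n_screened

def representation_factor(n_overlap: int, n_first: int, n_second: int, n_screened: int) -> float:
    """Observed overlap divided by expected overlap."""
    return n_overlap / expected_overlap(n_first, n_second, n_screened)

def hypergeom_upper_tail(n_overlap: int, n_first: int, n_second: int, n_screened: int) -> float:
    """Exact P(X >= n_overlap) for a hypergeometric draw (illustrative; not the webtool's method)."""
    denom = comb(n_screened, n_second)
    numer = sum(
        comb(n_first, k) * comb(n_screened - n_first, n_second - k)
        for k in range(n_overlap, min(n_first, n_second) + 1)
    )
    return numer / denom

# (screen, genes screened, hits in this study, hits in other study, overlap), from Table 1.
# 217 hits are used against heterozygous screens, 180 non-essential hits against homozygous screens.
rows = [
    ("Choy et al. 2013", 6477, 217, 332, 26),
    ("Strome et al. 2008", 6477, 217, 164, 4),
    ("Andersen et al. 2008", 5134, 180, 61, 3),
    ("Yuen et al. 2007", 5134, 180, 122, 5),
    ("Schmidlin et al. 2007", 5134, 180, 100, 7),
]
factors = {
    name: round(representation_factor(overlap, ours, theirs, total), 1)
    for name, total, ours, theirs, overlap in rows
}
print(factors)
```

For example, for Choy et al. 2013 the expected overlap is 217 × 332 / 6477 ≈ 11.1 genes, so 26 shared genes give a representation factor of about 2.3.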

Table 2. GO Slim Mapper and GO Term Finder Results for A) 217 MAT Top Hits B) 100 MAT + MET15 Top Hits. Significantly enriched gene ontology categories identified with the SGD Slim Mapper and SGD Term Finder tools are shown. For SGD Term Finder, categories with multiple-comparison-corrected p-values < 0.05 are shown. P-values from SGD Slim Mapper were evaluated against the Benjamini-Hochberg critical value: all significantly enriched categories with an uncorrected p-value below their Benjamini-Hochberg critical value (Q = 0.05) are shown. Gene Ontology Identification numbers (GOID) and the names of the genes that represent the enrichment are included.
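The Benjamini-Hochberg critical value used for the Slim Mapper results can be illustrated with a short Python sketch. It assumes the standard step-up form, in which the p-value of rank i among m tested terms is compared with (i/m) × Q; the term names and p-values below are hypothetical, and the number of terms tested in this study is not stated in the legend.

```python
from typing import Dict, List

def bh_critical_value(rank: int, n_tests: int, q: float = 0.05) -> float:
    """Benjamini-Hochberg critical value (rank / n_tests) * Q for a p-value of the given rank."""
    return (rank / n_tests) * q

def bh_significant(p_values: Dict[str, float], q: float = 0.05) -> List[str]:
    """Terms declared significant by the standard step-up procedure.

    All terms up to the largest rank whose p-value is at or below its critical value are kept.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    n_tests = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= bh_critical_value(rank, n_tests, q):
            cutoff_rank = rank
    return [term for term, _ in ranked[:cutoff_rank]]

# Hypothetical GO terms and uncorrected p-values, for illustration only.
p_values = {"term_a": 0.001, "term_b": 0.015, "term_c": 0.035, "term_d": 0.2, "term_e": 0.6}
print(bh_significant(p_values))
```

As an arithmetic aside, a rank-1 term among 101 tested terms would have a critical value of 0.05 / 101 ≈ 0.000495, which is numerically close to the 0.00049505 listed in Table 2A; this is an inference from the arithmetic, not a figure given in the source.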

Table S1. Complete MAT locus LOH screen data. Score data for every gene in the SCDChet for each of the individual experimental replicates.

Table S2. Top hits identified in MAT locus LOH screen. Two hundred and seventeen genes, when deleted, contribute to LOH at the MAT locus. Genes reported are listed based on the sum of the “+” scores of all four replicates of mating with a MATa or MATα haploid tester, with each mating type scored separately. Of the 217 genes identified, 17.5% are essential; genes that have systematic ORF identifiers but no standard name (generally attributed to a lack of understanding of their function) also make up 17.5% of the dataset. For a complete list of scores recorded for every gene in the collection, as well as genes excluded from the analysis, see Table S1.

Table S3. Sector counts of the 217 SCDChet gene deletions analyzed in secondary MET15 screen. Score data, for all 217 top-hits from the MAT locus screen, in the MET15 sectoring screen for each of the individual experimental replicates.

Table S4. GO Slim Mapper results for the 217 identified genes in the MAT locus screen. SGD GO Slim Mapper results for Cellular Component, Molecular Function, Biological Process, and Macromolecular Complex reported with Benjamini-Hochberg critical values for multiple comparison correction.

Table S5. GO Slim Mapper results for the 100 identified genes in the MET15 secondary screen. SGD GO Slim Mapper results for Cellular Component, Molecular Function, Biological Process, and Macromolecular Complex reported with Benjamini-Hochberg critical values for multiple comparison correction.

Table S6. GO Slim Mapper and GO Term Finder results for the 91 genes identified in two or more related independent studies. SGD GO Slim Mapper and GO Term Finder results for Cellular Component, Molecular Function, Biological Process, and Macromolecular Complex reported with Benjamini-Hochberg critical values and corrected p-values for multiple comparison correction.

Table S7. Positional Gene Enrichment analysis of the 217 MAT Top Hits, 100 MAT + MET15 Top Hits, and 91 genes identified in 2+ independent studies. All chromosomal regions reported from PGE analysis are included with basepair positioning and adjusted p-values. Further, the number of elements identified and the total number of elements in the region are reported.

Table S8. Human homologs and their associated cancer types for genes identified by the MAT locus screen. Literature searches for cancer association were conducted for every MAT screen identified yeast gene with a known human homolog. The yeast systematic and standard names are included as well as the name of the human homolog and the citation for literature associating that human homolog with cancer. Whether the yeast genes were additionally identified in the MET15 screen is also noted. (Zhu et al. 2001; Chen et al. 2004; Kim et al. 2008; Antonacopoulou et al. 2008; Liu et al. 2008; Datta 2009; Li et al. 2010; Fang et al. 2010; Chang et al. 2011; Kim et al. 2012; Li et al. 2012; Leng et al. 2012; Fernández-Sáiz et al. 2013; Faria et al. 2013; Yunlei et al. 2013; Zhou et al. 2013; Lee et al. 2013; Patra et al. 2013; Zhang et al. 2013; Wang et al. 2013; Gutierrez-Erlandsson et al. 2013; Oo et al. 2014; Tan et al. 2014; Wang et al. 2014c; Qi et al. 2014; Romero-Garcia et al. 2014; Brant and Leikauf 2014; Kobayashi et al. 2014; Wang et al. 2014b; Li et al. 2014; Boele et al. 2014; Lee and Venkitaraman 2014; Di Giammartino and Manley 2014; Provost et al. 2014; Hlaváč et al. 2014; Donato et al. 2015; Kurbasic et al. 2015; Yang et al. 2015; Guest et al. 2015; Ciou et al. 2015; Fang et al. 2015; Wang et al. 2015; Qiu et al. 2015; Shimozono et al. 2015; Chang et al. 2015; Chen et al. 2016; Jo et al. 2016; Li et al. 2016; Shen et al. 2016; He et al. 2016; Shan et al. 2016; Bai et al. 2016; Reid et al. 2016; Chen et al. 2017; Hattori et al. 2017; Lin et al. 2017; Xie et al. 2017; Zhang et al. 2017; Zhou et al. 2017b; Li et al. 2017b; Zhou et al. 2017a; Wang et al. 2017b; Engin et al. 2017; Bausch et al. 2017; Wang et al. 2017a; Li et al. 2017a; Cho et al. 2018; Jin et al. 2018; Ni et al. 2018; Shi et al. 2018; Wu et al. 2018; Chou et al. 2018; Sung et al. 2018; Chen et al. 2018; Zhong et al. 2018; Saritha et al. 2018; Evensen et al. 2018; Fang et al. 2018; Joseph et al. 2018; Ding et al. 2018; Choi et al. 2018; Pan et al. 2018; Yang et al. 2019; Tina et al. 2019; Husni et al. 2019; Chen et al. 2019; Apasu et al. 2019; Wang et al. 2019).

Table S9. Human homologs and their associated cancer types for genes identified by multiple independent studies. Literature searches for cancer association were conducted for every multiple screen comparison identified yeast gene with a known human homolog. The yeast systematic and standard names are included as well as the name of the human homolog and the citation for literature associating that human homolog with cancer. (Abdel-Fatah et al. 2014; Hennecke et al. 2015; Leone et al. 2003; Mason et al. 2014; Sun et al. 2002; Taguchi et al. 2015; The et al. 2015; Wang et al. 2015; Guest et al. 2015; Wang et al. 2014c; Shan et al. 2016; Qi et al. 2014; Saritha et al. 2018; Zhong et al. 2018; Kim et al. 2018; Datta 2009; Zhou et al. 2017b; Yang et al. 2015; Shi et al. 2018; Fang et al. 2018; Chen et al. 2004; Donato et al. 2015; Tina et al. 2019; Zhu et al. 2001; Patra et al. 2013; Song et al. 2015; Rahman et al. 2019; Diefenbacher and Orian 2017; Shafique and Rashid 2018; Gagou et al. 2014; Nicolas et al. 2016; Wang et al. 2014a; Oo et al. 2016; Baldeyron et al. 2015; Babeu et al. 2019; Maleva Kostovska et al. 2016; Dong et al. 2018; Broustas et al. 2019; Qin et al. 2019; Huang et al. 2014; Dai et al. 2016; Chen et al. 2018; Ding et al. 2018; Chen et al. 2019; Wang et al. 2019; Provost et al. 2014; Tina et al. 2019; Lee and Venkitaraman 2014; Maragozidis et al. 2015; Yamaguchi et al. 2014; Yang et al. 2017; He et al. 2018; Tóth et al. 2017; Li et al. 2018; Bianco et al. 2019; Siołek et al. 2015).

Table S10. Positional Gene Enrichment analysis of genes identified by similar independent studies. All chromosomal regions reported from PGE analysis run on the full 898 top-hits list compiled from all genes identified across all six screens compared in this study. Enriched regions are included with basepair positioning and adjusted p-values. Further, the number of elements identified and the total number of elements in the region are reported. The systematic name of every gene in that region is included and color-coded based on which of the six screens identified each gene. Orange: (Strome et al. 2008); Yellow: (Choy et al. 2013); Pink: This Study; Green: (Schmidlin et al. 2007); Blue: (Yuen et al. 2007); Purple: (Andersen et al. 2008). Systematic names in gray indicate the gene was identified in two independent studies; names in black indicate three independent studies.

Figure S1. Positional Gene Enrichment example figure Chr II. Visual representation of the data of chromosome II PGE analysis including gene name, study which identified the gene, and type of screen/analysis performed (SCDChet, SCDChom, non-SCDC). The majority of genes representing the enriched region of chromosome II come from heterozygous screens, and most of these are found utilizing SCDChet.

Table 1. Multiple Screen Comparison Gene Identification Overlap

| Screen | Total Number of Genes Screened | Number of Mutants with Phenotype (“hits”) | Number Overlapping with This Study “hits” | p-value | Representation Factor |
|---|---|---|---|---|---|
| This Study | 6477 | 217 (180*) | - | - | - |
| Choy et al. 2013 | 6477 | 332 | 26 | 0.00003995 | 2.3 |
| Strome et al. 2008 | 6477 | 164 | 4 | 0.351 | 0.7 |
| Andersen et al. 2008 | 5134 | 61 | 3 | 0.362 | 1.4 |
| Yuen et al. 2007 | 5134 | 122 | 5 | 0.427 | 1.2 |
| Schmidlin et al. 2007 | 5134 | 100 | 7 | 0.060 | 2.0 |

*217 total top-hit genes were identified; the 180 non-essential top-hit genes were used for comparison with homozygous knockout screens.


Table 2. GO Slim Mapper and GO Term Finder Results

A. 217 MAT Top Hits List

| GO Categories | GO Method | GO Term | p-value | Benjamini-Hochberg critical value (Q = 0.05) | Genes in term |
|---|---|---|---|---|---|
| Cellular Component | SGD Slim Mapper | - | - | - | - |
| Cellular Component | SGD Term Finder | Chaperonin-containing T-complex (GOID: 5832) | 0.04889 | - | SSA1, CCT2, CCT8, CCT7 |
| Biological Process | SGD Slim Mapper | Chromosome segregation (GOID: 7059) | 0.0004605 | 0.00049505 | KIN3, IML3, MRC1, MPS1, SPC19, SMC1, KIP3, SPO22, STH1, CSM2, HSK3, CTF3, TUB1, NDJ1, SGO1, KIN4, GPN2 |
| Biological Process | SGD Term Finder | - | - | - | - |
| Molecular Function | SGD Slim Mapper | Molecular Function Unknown (GOID: 3674) | 0.0004997 | 0.001136 | YAL067W-A, YBR096W, VID24, IML3, YBR137W, YBR144C, APD1, UBS1, HSM3, LDB16, MRC1, BPH1, RMD1, QRI7, RGT2, YDL211C, YDR029W, MRH1, SSY1, UBX5, ECM11, YDR509W, EMI2, GRH1, ZRG8, YER087C-A, TMN3, DSE1, YER135C, BCK2, YER181C, PUG1, SNO3, YGL218W, MTC3, SHE10, NNF2, YGR201C, YHI9, MTC6, AIM18, YIL025C, YIL032C, MMF1, YIL060W, SPO22, AIM19, ICE2, CSM2, SYS1, YJL009W, PRY3, PRM10, YJL120W, SPC1, HIT1, AIM24, ILM1, YKL018C-A, TTI1, FAT3, YKR073C, EMC6, PER33, YLR342W-A, YLR374C, CTF3, YML037C, TUB1, YML094C-A, YMR105W-A, YMR119W-A, YMR122C, YMR153C-A, YNL146C-A, YNL146W, VID27, RTC4, BSC5, YOL134C, MED7, YOR072W, SGO1, AIM41, YOR364W, YPL080C, YIG1, OPY2 |
| Molecular Function | SGD Term Finder | - | - | - | - |

B. 100 MAT + MET15 Top Hits List

| GO Categories | GO Method | GO Term | p-value | Benjamini-Hochberg critical value (Q = 0.05) | Genes in term |
|---|---|---|---|---|---|
| Cellular Component | SGD Slim Mapper | - | - | - | - |
| Cellular Component | SGD Term Finder | - | - | - | - |
| Biological Process | SGD Slim Mapper | - | - | - | - |
| Biological Process | SGD Term Finder | - | - | - | - |
| Molecular Function | SGD Slim Mapper | Molecular Function Unknown (GOID: 3674) | 0.000233 | 0.001136 | YAL067W-A, YBR096W, IML3, YBR137W, YBR144C, APD1, HSM3, BPH1, RGT2, YDL211C, MRH1, ECM11, YDR509W, GRH1, ZRG8, YER087C-A, TMN3, DSE1, YER135C, BCK2, YER181C, PUG1, SNO3, NNF2, MMF1, YIL060W, AIM19, ICE2, SYS1, YJL009W, PRY3, YJL120W, SPC1, HIT1, AIM24, YKL018C-A, TUB1, YML094C-A, YMR105W-A, YMR122C, VID27, RTC4, MED7, YOR072W, SGO1, YOR364W, OPY2 |
| Molecular Function | SGD Term Finder | - | - | - | - |
