https://www.alphaknockout.com

Mouse Thoc5 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Thoc5 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Thoc5 gene (NCBI Reference Sequence: NM_172438; Ensembl: ENSMUSG00000034274) is located on mouse chromosome 11. Twenty exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 20 (Transcript: ENSMUST00000038237). Exons 3~5 are selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Thoc5 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-436C9 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a knock-out allele exhibit embryonic lethality prior to E5.5.

Exon 3 starts at about 4.78% of the way into the coding region. Knockout of exons 3~5 will result in a frameshift of the gene. Intron 2, which receives the 5'-loxP site, is 1245 bp; intron 5, which receives the 3'-loxP site, is 1662 bp. The effective cKO region is ~1781 bp and does not contain any other known gene.
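The frameshift statement above rests on a simple rule: removing exons shifts the reading frame when the total number of deleted coding nucleotides is not a multiple of three. A minimal Python sketch of that check follows. The function name and the exon lengths in the usage lines are hypothetical placeholders, not the actual Thoc5 exon sizes, which would need to be taken from the Ensembl record for ENSMUST00000038237.

```python
def causes_frameshift(deleted_coding_lengths):
    """Return True if deleting exons with these coding lengths (nt) shifts the frame.

    The downstream reading frame is preserved only when the total number of
    deleted coding nucleotides is divisible by three.
    """
    total = sum(deleted_coding_lengths)
    return total % 3 != 0


# Hypothetical coding lengths for three deleted exons (NOT real Thoc5 values).
example_lengths = [100, 90, 120]
print(causes_frameshift(example_lengths))
```

With real exon sizes substituted for the placeholders, the same call would confirm whether the deletion of exons 3~5 is out of frame.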

Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1 to 6 and exon 20 shown, with the 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Thoc5, homology arm, cKO region, loxP site.]
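The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model: Cre recombination between two loxP sites in the same orientation removes the intervening DNA and leaves a single loxP site behind. The sketch below assumes the canonical 34 bp loxP sequence; the function name and the short flanking sequences are hypothetical and stand in for the real homology arms and cKO region.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(allele, site=LOXP):
    """Model Cre-mediated excision between the first two directly repeated sites.

    Returns the allele unchanged if fewer than two sites are present.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    # Keep everything up to and including the first site, then resume after the second.
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy targeted allele: 5' arm, loxP, floxed region, loxP, 3' arm (placeholder sequences).
targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
ko_allele = cre_excise(targeted)
print(len(targeted), len(ko_allele))
```

This model ignores site orientation and recombination efficiency; it only shows why the KO allele retains one loxP site and loses the floxed exons.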

Overview of the Dot Plot

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse complement matches. Window size: 10 bp.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
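The self-alignment described in the note can be sketched as an exact-match word comparison: every 10 bp window of the sequence is compared with every other window, on the forward strand and as a reverse complement, and off-diagonal matches indicate repeats. The Python below is a minimal illustration under that assumption; the function names are hypothetical and the actual tool used to draw the dot plot is not stated in this report.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dot_plot_matches(seq, window=10):
    """Return (forward, reverse) lists of (i, j) positions with identical words.

    forward: word at i equals word at j (i != j, main diagonal excluded).
    reverse: word at j equals the reverse complement of the word at i.
    """
    seq = seq.upper()
    index = defaultdict(list)
    n_words = len(seq) - window + 1
    for i in range(n_words):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy input: a 10 bp unit repeated twice in tandem gives off-diagonal forward hits.
fwd, rev = dot_plot_matches("ACGGTCATTG" * 2, window=10)
print(fwd, rev)
```

A sequence without tandem repeats yields few or no off-diagonal forward matches, which is the pattern the note reports for this region.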

Overview of the GC Content Distribution

[Figure: GC content distribution along Sequence 12. Window size: 300 bp.]

Summary: Full length (8281 bp) | A (25.83%, 2139) | C (21.75%, 1801) | T (29.60%, 2451) | G (22.82%, 1890)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
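The overall GC content follows directly from the base counts in the summary line, and the windowed profile can be computed by sliding a 300 bp window along the sequence. The sketch below uses the reported counts; the `windowed_gc` helper and its `step` parameter are illustrative assumptions, since the report does not state how the windows were stepped.

```python
# Base counts as reported in the summary line of this report.
counts = {"A": 2139, "C": 1801, "T": 2451, "G": 1890}
total = sum(counts.values())
gc_fraction = (counts["G"] + counts["C"]) / total
print(f"length = {total} bp, GC = {gc_fraction:.2%}")


def windowed_gc(seq, window=300, step=1):
    """Return the GC fraction of each window of the given size along seq."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions
```

The counts sum to the stated full length of 8281 bp and give an overall GC content of about 44.57%.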

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       4897903    4900902    3000
YourSeq  345    1      355   3000   98.6%     chr6   -       26953655   26954009   355
YourSeq  344    1      357   3000   98.4%     chr3   +       12981189   12981580   392
YourSeq  343    1      762   3000   96.5%     chr4   +       84863206   84863988   783
YourSeq  343    1      350   3000   99.2%     chr17  +       89391011   89391413   403
YourSeq  342    1      355   3000   98.4%     chr13  -       51055242   51055649   408
YourSeq  342    1      355   3000   98.4%     chr7   +       128315611  128315969  359
YourSeq  341    1      363   3000   96.2%     chr4   -       105742721  105743080  360
YourSeq  341    1      355   3000   98.1%     chr3   -       9763381    9763735    355
YourSeq  341    1      629   3000   94.6%     chr17  -       23685564   23686372   809
YourSeq  341    1      350   3000   98.9%     chr11  -       105847889  105848291  403
YourSeq  341    1      355   3000   98.1%     chr9   +       106499031  106499385  355
YourSeq  341    1      356   3000   98.1%     chr6   +       91342049   91342410   362
YourSeq  340    1      355   3000   98.1%     chrX   -       117271852  117272259  408
YourSeq  340    1      355   3000   98.1%     chr9   -       90808144   90808551   408
YourSeq  340    1      350   3000   98.6%     chr3   -       70842057   70842406   350
YourSeq  340    1      343   3000   99.8%     chr3   +       112266486  112266881  396
YourSeq  339    1      350   3000   98.6%     chr3   -       70836524   70836926   403
YourSeq  339    1      355   3000   97.8%     chr19  -       49426446   49426800   355
YourSeq  339    1      350   3000   98.6%     chr12  -       19855432   19855834   403

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr11  +       4902684    4905683    3000
YourSeq  214    373    1003  3000   86.6%     chr7   +       80262612   80263016   405
YourSeq  190    337    1001  3000   87.4%     chr16  -       20614972   20615479   508
YourSeq  183    375    1013  3000   85.1%     chr1   -       60145864   60146280   417
YourSeq  179    348    1001  3000   84.3%     chr1   +       179518419  179518784  366
YourSeq  173    804    1005  3000   91.2%     chr5   +       100878628  100878819  192
YourSeq  172    337    998   3000   85.6%     chr5   -       137691740  137692306  567
YourSeq  172    805    1004  3000   95.3%     chr11  +       78255713   78283951   28239
YourSeq  171    804    1001  3000   92.5%     chr4   -       139294979  139295165  187
YourSeq  170    807    1001  3000   93.9%     chr2   -       119432547  119432965  419
YourSeq  169    804    1001  3000   91.0%     chrX   -       60613160   60613347   188
YourSeq  169    804    1004  3000   94.3%     chr1   -       43697557   43697757   201
YourSeq  169    806    1001  3000   94.8%     chr1   -       35980721   35980922   202
YourSeq  169    804    1004  3000   94.8%     chrX   +       52980960   52981176   217
YourSeq  168    804    1004  3000   92.7%     chr5   -       3665527    3665725    199
YourSeq  168    804    1003  3000   91.6%     chr3   -       88127196   88127390   195
YourSeq  168    822    1022  3000   93.3%     chr10  -       93948180   93948526   347
YourSeq  168    804    1006  3000   94.3%     chr14  +       57563630   57564138   509
YourSeq  168    804    1022  3000   88.7%     chr11  +       85982405   85982608   204
YourSeq  168    804    1002  3000   90.5%     chr10  +       4459919    4460107    189

Note: The 3000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
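One way to make the "no significant similarity" reading of these BLAT tables explicit is to compute, for each hit, the fraction of the 3000 bp query it covers and keep only hits that are both long and highly identical. The sketch below parses rows in the column order shown above. The `BlatHit` type, the helper names and the 50% coverage and 90% identity thresholds are illustrative assumptions, not criteria stated in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one BLAT result row; tolerates a leading 'browser details' prefix."""
    fields = [f for f in line.split() if f not in ("browser", "details")]
    _query, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def query_coverage(hit):
    """Fraction of the query sequence covered by the aligned query interval."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def strong_hits(hits, min_coverage=0.5, min_identity=90.0):
    """Keep hits above the (assumed) coverage and identity thresholds."""
    return [h for h in hits
            if query_coverage(h) >= min_coverage and h.identity >= min_identity]


rows = [
    "YourSeq 3000 1 3000 3000 100.0% chr11 + 4902684 4905683 3000",
    "YourSeq 214 373 1003 3000 86.6% chr7 + 80262612 80263016 405",
]
hits = [parse_hit(r) for r in rows]
for h in strong_hits(hits):
    print(h.chrom, h.t_start, h.t_end)
```

Under these assumed thresholds only the self-match on chr11 is retained from the two example rows; the partial hits elsewhere cover a small part of the query.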

Gene and information: Thoc5 THO complex 5 [ Mus musculus (house mouse) ] Gene ID: 107829, updated on 12-Aug-2019

Gene summary

Official Symbol: Thoc5 (provided by MGI)
Official Full Name: THO complex 5 (provided by MGI)
Primary source: MGI:MGI:1351333
See related: Ensembl:ENSMUSG00000034274
Gene type: protein coding
RefSeq status: PROVISIONAL
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Fmip; Sgk2; PK1.3; 1700060C24Rik; A430085L24Rik
Expression: Ubiquitous expression in testis adult (RPKM 28.1), CNS E14 (RPKM 23.1) and 28 other tissues
Orthologs: human

Genomic context

Location: 11; 11 A1

Exon count: 21

Annotation release  Status             Assembly                     Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (4895283..4928867)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (4795346..4828868)

[Figure: genomic context of Thoc5 on Chromosome 11 - NC_000077.6.]

Transcript information: This gene has 6 transcripts

Gene: Thoc5 ENSMUSG00000034274

Description: THO complex 5 [Source:MGI Symbol;Acc:MGI:1351333]
Gene Synonyms: 1700060C24Rik, A430085L24Rik, Fmip, PK1.3
Location: Chromosome 11: 4,895,320-4,928,867 forward strand. GRCm38:CM001004.2
About this gene: This gene has 6 transcripts (splice variants), 213 orthologues, is a member of 1 Ensembl protein family and is associated with 1 phenotype.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt     Flags
Thoc5-201  ENSMUST00000038237.7  2235  683aa       ENSMUSP00000045580.1  Protein coding  CCDS24393  A0A0R4J0J6  TSL:1 GENCODE basic APPRIS P1
Thoc5-202  ENSMUST00000101615.8  2116  635aa       ENSMUSP00000099137.2  Protein coding  -          Q5SVF9      TSL:5 GENCODE basic
Thoc5-203  ENSMUST00000142543.2  1082  300aa       ENSMUSP00000118940.1  Protein coding  -          Q5SVF8      CDS 3' incomplete TSL:5
Thoc5-204  ENSMUST00000144371.1  770   No protein  -                     lncRNA          -          -           TSL:5
Thoc5-206  ENSMUST00000155872.1  600   No protein  -                     lncRNA          -          -           TSL:2
Thoc5-205  ENSMUST00000148117.7  592   No protein  -                     lncRNA          -          -           TSL:2

[Figure: Ensembl genome browser view of a 53.55 kb region of chromosome 11 (4.89 Mb to 4.93 Mb), forward and reverse strands. Tracks shown: Thoc5-201, Thoc5-202 and Thoc5-203 (protein coding); Thoc5-204, Thoc5-205 and Thoc5-206 (lncRNA); Nipsnap1-201 and Nipsnap1-204 (protein coding); Nipsnap1-203 (nonsense mediated decay); Nipsnap1-206 (lncRNA); Mir6918-201 (miRNA); AA413626-201 (processed pseudogene); Nefh-201 (protein coding); contig AL645522.12; Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene, pseudogene).]

Transcript: ENSMUST00000038237

[Figure: protein-level view of Thoc5-201 (protein coding; 33.52 kb, forward strand). Tracks shown: MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), Pfam (THO complex, subunit 5), PANTHER (THO complex, subunit 5; PTHR13375:SF3), and sequence variants (dbSNP and all other sources). Variant legend: stop gained, missense variant, splice region variant, synonymous variant. Scale bar: 0 to 683.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
