https://www.alphaknockout.com

Mouse Thoc2 Knockout Project (CRISPR/Cas9)

Objective: To create a Thoc2 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Thoc2 gene (NCBI Reference Sequence: NM_001033422; Ensembl: ENSMUSG00000037475) is located on mouse chromosome X. The gene has 39 exons, with the ATG start codon in exon 1 and the TAA stop codon in exon 38 (transcript ENSMUST00000047037). Exons 4-7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 4 starts at about 4.66% of the coding region. Exons 4-7 cover 7.93% of the coding region. The size of the effective KO region is ~7648 bp. The KO region does not contain any other known gene.
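The ~7648 bp figure can be cross-checked against the chrX coordinates of the two flanking sequences reported in the BLAT results (upstream flank 41879938-41881908, downstream flank 41870290-41872289). Because Thoc2 lies on the reverse strand, the sequence upstream of exon 4 sits at the higher coordinates. The sketch below assumes that the effective KO region is exactly the stretch between the two flanks and uses 1-based inclusive coordinates as displayed by BLAT; the helper names are hypothetical and this is an illustrative check, not the design tool used for this project.

```python
# Sketch: derive the effective KO size from the two flanking intervals.
# Coordinates are 1-based and inclusive (chrX, GRCm38), as shown in the BLAT tables.

def interval_length(interval):
    """Length of a 1-based, inclusive (start, end) interval."""
    start, end = interval
    return end - start + 1

def gap_between(flank_a, flank_b):
    """Number of bases lying strictly between two non-overlapping intervals."""
    (_, left_end), (right_start, _) = sorted([flank_a, flank_b])
    if right_start <= left_end:
        raise ValueError("flanking intervals overlap")
    return right_start - left_end - 1

# Thoc2 is on the reverse strand, so "upstream of exon 4" has the larger coordinates.
upstream_flank = (41879938, 41881908)    # 1971 bp section upstream of exon 4
downstream_flank = (41870290, 41872289)  # 2000 bp section downstream of exon 7

print(interval_length(upstream_flank))                # 1971
print(interval_length(downstream_flank))              # 2000
print(gap_between(upstream_flank, downstream_flank))  # 7648
```

The result matches the stated effective KO size, which supports reading the two BLAT query regions as the sequences immediately flanking the deletion.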


Overview of the Targeting Strategy

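[Figure: Targeting strategy. Wildtype allele of mouse Thoc2 showing exons 1, 4, 5, 6, 7 and 39, with the 5' and 3' gRNA regions flanking the knockout region (exons 4-7). Legend: exon of mouse Thoc2; knockout region.]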


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: Self-comparison dot plot of the upstream sequence, showing forward and reverse-complement matches.]

Note: The 1971 bp section upstream of exon 4 was aligned with itself to check for tandem repeats. No significant tandem repeats were found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
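A self-comparison dot plot can be sketched with exact word matches: every position pair (i, j) at which the 15 bp words starting at i and j are identical (forward) or reverse complements of each other (reverse) becomes a dot. The main diagonal is the trivial self match, while a tandem repeat shows up as extra diagonal lines parallel to it. This sketch assumes exact matching of 15-mers, which may differ from the scoring used by the tool that produced the plots in this report; `dot_plot`, `off_diagonal` and the example sequence are hypothetical.

```python
# Sketch of a word-match dot plot (window = 15 bp), forward and reverse complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def dot_plot(seq: str, window: int = 15):
    """Return (forward, reverse) sets of (i, j) dots for a sequence against itself.

    forward: seq[i:i+window] == seq[j:j+window]
    reverse: seq[i:i+window] == revcomp(seq[j:j+window])
    """
    seq = seq.upper()
    positions = {}  # word -> list of start positions
    for j in range(len(seq) - window + 1):
        positions.setdefault(seq[j:j + window], []).append(j)
    forward, reverse = set(), set()
    for i in range(len(seq) - window + 1):
        word = seq[i:i + window]
        forward.update((i, j) for j in positions.get(word, []))
        reverse.update((i, j) for j in positions.get(revcomp(word), []))
    return forward, reverse

def off_diagonal(dots):
    """Forward dots off the main diagonal: the signal a tandem repeat would leave."""
    return sorted((i, j) for i, j in dots if i != j)

unit = "AATGCCGTTAGCATC"     # arbitrary 15-mer used only for illustration
fwd, _ = dot_plot(unit * 3)  # a three-copy tandem repeat
print(len(off_diagonal(fwd)) > 0)  # True: repeats produce parallel diagonals
```

For a region without tandem repeats, `off_diagonal` on the forward dots would be empty or sparse, which corresponds to the "no significant tandem repeat" conclusion.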

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: Self-comparison dot plot of the downstream sequence, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of exon 7 was aligned with itself to check for tandem repeats. No significant tandem repeats were found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution of the upstream sequence, 300 bp window.]

Summary: Full Length(1971bp) | A(29.58% 583) | C(15.47% 305) | T(38.2% 753) | G(16.74% 330)

Note: The 1971 bp section upstream of exon 4 was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
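The overall GC content follows directly from the base counts in the summary lines: (C + G) / length = (305 + 330) / 1971 ≈ 32.2% for the upstream section and (334 + 336) / 2000 = 33.5% for the downstream section. A sliding-window profile like the plotted one can be sketched as below. The 300 bp window matches the report; the 50 bp step, the 0.60 "high GC" threshold and the function names are illustrative assumptions, not values stated in the report.

```python
# Sketch: overall and sliding-window GC content.

def gc_from_counts(counts):
    """GC fraction from a dict of base counts, e.g. {"A": 583, "C": 305, ...}."""
    total = sum(counts.values())
    return (counts["C"] + counts["G"]) / total

def gc_profile(seq, window=300, step=50):
    """(start, GC fraction) for each window; the step size is an assumption."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return profile

def high_gc_windows(profile, threshold=0.60):
    """Windows whose GC fraction exceeds a (hypothetical) threshold."""
    return [(start, gc) for start, gc in profile if gc > threshold]

upstream = {"A": 583, "C": 305, "T": 753, "G": 330}    # counts from the summary line
downstream = {"A": 525, "C": 334, "T": 805, "G": 336}
print(f"{gc_from_counts(upstream):.1%}")    # 32.2%
print(f"{gc_from_counts(downstream):.1%}")  # 33.5%
```

Applied to the actual flank sequences, an empty result from `high_gc_windows` would correspond to the "no significant high GC-content region" conclusion.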

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution of the downstream sequence, 300 bp window.]

Summary: Full Length(2000bp) | A(26.25% 525) | C(16.7% 334) | T(40.25% 805) | G(16.8% 336)

Note: The 2000 bp section downstream of exon 7 was analyzed for GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  1971   1       1971  1971   100.0%    chrX   -       41879938   41881908   1971
YourSeq  40     1672    1785  1971   95.5%     chr3   -       133570876  133571025  150
YourSeq  40     1644    1756  1971   72.4%     chr11  -       77194873   77194966   94
YourSeq  37     1639    1676  1971   100.0%    chr5   +       65767402   65767454   53
YourSeq  36     1672    1780  1971   75.0%     chr19  +       25263615   25263711   97
YourSeq  33     1674    1775  1971   75.7%     chr12  +       86384963   86385056   94
YourSeq  30     1734    1766  1971   87.1%     chr9   -       47984026   47984056   31
YourSeq  30     1749    1779  1971   100.0%    chr11  +       97617897   97617928   32
YourSeq  29     1641    1669  1971   100.0%    chr12  -       40147100   40147128   29
YourSeq  29     1735    1766  1971   86.7%     chr1   +       193676182  193676211  30
YourSeq  28     1642    1669  1971   100.0%    chr8   -       18406216   18406243   28
YourSeq  27     1749    1780  1971   96.6%     chr16  +       11467064   11467096   33
YourSeq  25     1748    1780  1971   93.2%     chr15  -       102426423  102426456  34
YourSeq  25     1643    1669  1971   88.5%     chr13  -       112708511  112708536  26
YourSeq  25     1737    1766  1971   93.2%     chr1   +       90011221   90011254   34
YourSeq  24     1735    1761  1971   84.0%     chr12  -       81775002   81775026   25
YourSeq  24     1742    1766  1971   100.0%    chr3   +       40311575   40311601   27
YourSeq  24     1742    1766  1971   100.0%    chr3   +       22609485   22609511   27

(QSTART/QEND: positions within the query sequence; TSTART/TEND: positions on the target chromosome.)

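Note: The 1971 bp section upstream of exon 4 was searched against the genome with BLAT. Apart from the full-length match to itself on chrX, all hits are short partial alignments (score 40 or lower), and no significant similarity was found.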

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chrX   -       41870290   41872289   2000
YourSeq  92     810     920   2000   99.0%     chrX   +       41871370   41871480   111
YourSeq  65     278     806   2000   68.1%     chr3   -       108976404  108976477  74
YourSeq  53     743     809   2000   93.6%     chr2   -       123659383  123659494  112
YourSeq  52     744     809   2000   98.2%     chr17  -       57003207   57003284   78
YourSeq  51     743     809   2000   92.8%     chr15  -       5346011    5346076    66
YourSeq  50     745     810   2000   90.8%     chr11  -       110636129  110636192  64
YourSeq  50     744     806   2000   96.3%     chr16  +       83120500   83120776   277
YourSeq  50     743     809   2000   87.8%     chr13  +       76961982   76962046   65
YourSeq  49     743     809   2000   91.0%     chr1   -       45710058   45710123   66
YourSeq  49     743     809   2000   91.0%     chr11  +       111852131  111852196  66
YourSeq  48     743     809   2000   85.2%     chr9   -       81774618   81774680   63
YourSeq  48     742     809   2000   81.2%     chr3   -       112600670  112600726  57
YourSeq  48     746     809   2000   84.7%     chr17  -       38213693   38213750   58
YourSeq  48     743     808   2000   90.8%     chr2   +       40320102   40320166   65
YourSeq  47     726     788   2000   94.4%     chr4   -       58729674   58729750   77
YourSeq  47     734     788   2000   94.4%     chr16  -       77737105   77737161   57
YourSeq  47     746     806   2000   86.6%     chr7   +       3899190    3899247    58
YourSeq  46     744     809   2000   79.3%     chr18  -       24721683   24721739   57
YourSeq  46     742     809   2000   77.8%     chr17  -       41000816   41000871   56

(QSTART/QEND: positions within the query sequence; TSTART/TEND: positions on the target chromosome.)

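Note: The 2000 bp section downstream of exon 7 was searched against the genome with BLAT. Apart from the full-length match to itself on chrX, all hits are short partial alignments (score 92 or lower), and no significant similarity was found.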
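The "no significant similarity" conclusion can be made explicit by setting aside the self hit (the full-length match whose score and span equal the query size) and comparing the remaining scores with the query size. In both searches the best off-target score is small relative to the query (40 of 1971 upstream, 92 of 2000 downstream). The sketch below parses whitespace-separated rows in the column order of the BLAT result tables; the 10% score cut-off and all helper names are illustrative assumptions, not criteria stated in the report.

```python
# Sketch: separate the self hit from off-target BLAT hits.
from dataclasses import dataclass

@dataclass
class BlatHit:
    score: int
    qstart: int
    qend: int
    qsize: int
    identity: float  # percent
    chrom: str
    strand: str
    tstart: int
    tend: int
    span: int

def parse_hits(text):
    """Parse rows: QUERY SCORE QSTART QEND QSIZE IDENTITY CHROM STRAND TSTART TEND SPAN."""
    hits = []
    for line in text.strip().splitlines():
        f = line.split()  # f[0] is the query name, e.g. "YourSeq"
        hits.append(BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                            float(f[5].rstrip("%")), f[6], f[7],
                            int(f[8]), int(f[9]), int(f[10])))
    return hits

def is_self_hit(hit):
    """Full-length match: the whole query aligned over a span of its own length."""
    return hit.score == hit.qsize and hit.span == hit.qsize

def flagged_off_target(hits, max_fraction=0.10):
    """Off-target hits scoring above max_fraction of the query size (hypothetical cut-off)."""
    return [h for h in hits if not is_self_hit(h) and h.score > max_fraction * h.qsize]

rows = """
YourSeq 2000 1 2000 2000 100.0% chrX - 41870290 41872289 2000
YourSeq 92 810 920 2000 99.0% chrX + 41871370 41871480 111
YourSeq 65 278 806 2000 68.1% chr3 - 108976404 108976477 74
"""
hits = parse_hits(rows)
best = max(h.score for h in hits if not is_self_hit(h))
print(best, flagged_off_target(hits))  # 92 []
```

Only the first three downstream rows are included here for brevity; the same calls apply unchanged to the full tables.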


Gene information: Thoc2 THO complex 2 [Mus musculus (house mouse)], Gene ID: 331401, updated on 12-Aug-2019

Gene summary

Official Symbol: Thoc2 (provided by MGI)
Official Full Name: THO complex 2 (provided by MGI)
Primary source: MGI:MGI:2442413
See related: Ensembl:ENSMUSG00000037475
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Tho2; Gm1139; Gm1793; 6330441O12Rik; D130005M13Rik
Expression: Broad expression in limb E14.5 (RPKM 9.1), whole brain E14.5 (RPKM 9.0) and 23 other tissues
Orthologs: human

Genomic context

Location: X; X A4
Exon count: 39

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | X | NC_000086.7 (41794992..41911937, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | X | NC_000086.6 (39148171..39265078, complement)
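The KO region can be placed within the gene on the current assembly. Assuming the effective KO region is the stretch between the two BLAT flanks (chrX:41872290-41879937 on GRCm38, a derived interval rather than one stated in the report), it lies inside both the NCBI gene span (41794992..41911937, 116,946 bp) and the Ensembl gene span (41,794,991-41,920,674). The sketch below is an illustrative bookkeeping check with hypothetical helper names.

```python
# Sketch: check that the derived KO interval lies inside the annotated gene span.
# All intervals are 1-based, inclusive, GRCm38 chrX coordinates.

def length(interval):
    start, end = interval
    return end - start + 1

def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

gene_ncbi = (41794992, 41911937)     # NC_000086.7, complement (annotation release 108)
gene_ensembl = (41794991, 41920674)  # Ensembl ENSMUSG00000037475, reverse strand
ko_region = (41872290, 41879937)     # assumed: bases between the two BLAT flanks

print(length(gene_ncbi), length(ko_region))  # 116946 7648
print(contains(gene_ncbi, ko_region), contains(gene_ensembl, ko_region))  # True True
```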

[Figure: Chromosome X - NC_000086.7]


Transcript information: This gene has 16 transcripts

Gene: Thoc2 ENSMUSG00000037475

Description: THO complex 2 [Source:MGI Symbol;Acc:MGI:2442413]
Gene Synonyms: 6330441O12Rik, D130005M13Rik, LOC382210, LOC386493
Location: Chromosome X: 41,794,991-41,920,674 reverse strand (GRCm38:CM001013.2)
About this gene: This gene has 16 transcripts (splice variants), 207 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.
Transcripts:

Name | Transcript ID | Length (bp) | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Thoc2-201 | ENSMUST00000047037.14 | 7622 | 1594aa | ENSMUSP00000044677.8 | Protein coding | CCDS40951 | B1AZI6 | TSL:5, GENCODE basic, APPRIS P1
Thoc2-203 | ENSMUST00000124458.7 | 3506 | 232aa | ENSMUSP00000119460.1 | Protein coding | - | Q8C690 | CDS 5' incomplete, TSL:1
Thoc2-210 | ENSMUST00000145586.7 | 574 | 86aa | ENSMUSP00000118960.1 | Protein coding | - | F6W2C7 | CDS 5' incomplete, TSL:5
Thoc2-215 | ENSMUST00000152921.1 | 489 | 87aa | ENSMUSP00000114148.1 | Protein coding | - | B1AZI7 | CDS 3' incomplete, TSL:2
Thoc2-209 | ENSMUST00000143557.7 | 462 | 117aa | ENSMUSP00000118106.1 | Protein coding | - | F6VK25 | CDS 5' incomplete, TSL:5
Thoc2-216 | ENSMUST00000155369.1 | 427 | 109aa | ENSMUSP00000120342.1 | Protein coding | - | B8A4C2 | CDS 5' incomplete, TSL:5
Thoc2-212 | ENSMUST00000151430.7 | 2213 | 165aa | ENSMUSP00000115815.1 | Nonsense mediated decay | - | F6SBF2 | CDS 5' incomplete, TSL:5
Thoc2-205 | ENSMUST00000129915.7 | 869 | 125aa | ENSMUSP00000121941.1 | Nonsense mediated decay | - | F6RBN0 | CDS 5' incomplete, TSL:5
Thoc2-207 | ENSMUST00000131259.1 | 229 | 53aa | ENSMUSP00000119823.1 | Nonsense mediated decay | - | F6QCT0 | CDS 5' incomplete, TSL:5
Thoc2-204 | ENSMUST00000127338.7 | 2697 | No protein | - | Retained intron | - | - | TSL:1
Thoc2-211 | ENSMUST00000150443.7 | 2067 | No protein | - | Retained intron | - | - | TSL:2
Thoc2-213 | ENSMUST00000151454.1 | 1654 | No protein | - | Retained intron | - | - | TSL:3
Thoc2-214 | ENSMUST00000151802.1 | 1356 | No protein | - | Retained intron | - | - | TSL:1
Thoc2-206 | ENSMUST00000130993.1 | 675 | No protein | - | Retained intron | - | - | TSL:3
Thoc2-202 | ENSMUST00000123512.7 | 936 | No protein | - | lncRNA | - | - | TSL:5
Thoc2-208 | ENSMUST00000134084.1 | 908 | No protein | - | lncRNA | - | - | TSL:5
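The transcript table can be summarized by biotype, which reproduces the stated total of 16 transcripts (6 protein coding, 3 nonsense mediated decay, 5 retained intron, 2 lncRNA). The transcript used for the design, ENSMUST00000047037 (Thoc2-201), encodes 1594 aa, so its CDS spans 1594 × 3 + 3 = 4785 bp including the stop codon. The sketch below is a bookkeeping aid whose data are transcribed from the table; the helper name `cds_length` is hypothetical.

```python
# Sketch: tally Thoc2 transcripts by biotype and derive the CDS length of Thoc2-201.
from collections import Counter

transcripts = {  # name -> biotype, transcribed from the Ensembl transcript table
    "Thoc2-201": "Protein coding", "Thoc2-203": "Protein coding",
    "Thoc2-210": "Protein coding", "Thoc2-215": "Protein coding",
    "Thoc2-209": "Protein coding", "Thoc2-216": "Protein coding",
    "Thoc2-212": "Nonsense mediated decay", "Thoc2-205": "Nonsense mediated decay",
    "Thoc2-207": "Nonsense mediated decay",
    "Thoc2-204": "Retained intron", "Thoc2-211": "Retained intron",
    "Thoc2-213": "Retained intron", "Thoc2-214": "Retained intron",
    "Thoc2-206": "Retained intron",
    "Thoc2-202": "lncRNA", "Thoc2-208": "lncRNA",
}

def cds_length(protein_aa, include_stop=True):
    """Coding length in bp: 3 bp per codon, plus one stop codon if requested."""
    return protein_aa * 3 + (3 if include_stop else 0)

by_biotype = Counter(transcripts.values())
print(sum(by_biotype.values()))  # 16
print(by_biotype["Protein coding"], by_biotype["Retained intron"])  # 6 5
print(cds_length(1594))  # 4785
```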


[Figure: Ensembl genome browser view of a 145.68 kb region of chromosome X (scale marks at 41.80, 41.85 and 41.90 Mb). Forward strand: Gm7657-201 (processed pseudogene). Reverse strand: the Thoc2 transcripts Thoc2-201 to Thoc2-216 and Gm5135-201 (processed pseudogene). Tracks: contigs (BX005253.10, AL954355.10); genes (comprehensive set); Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank). Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript, pseudogene).]


Transcript: ENSMUST00000047037

[Figure: Protein view of Thoc2-201 (protein coding; reverse strand, 116.91 kb; protein scale 0-1594 aa). Tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); domains (Pfam THO complex subunit 2, N-terminal domain; THO complex, subunit THOC2, N-terminal; THO complex, subunit THOC2, C-terminal); PANTHER (THO complex subunit 2, PTHR21597:SF1); sequence variants from dbSNP and all other sources (stop gained, missense variant, splice region variant, synonymous variant).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
