https://www.alphaknockout.com

Mouse Thap2 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Thap2 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Thap2 gene (NCBI Reference Sequence: NM_025780; Ensembl: ENSMUSG00000020137) is located on mouse chromosome 10. Three exons have been identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 3 (Transcript: ENSMUST00000218842). Exon 3 is selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Thap2 gene. To engineer the targeting vector, the homologous arms and cKO region will be generated by PCR using BAC clone RP23-145J21 as the template. Cas9, gRNA, and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note:

Exon 3 covers 58.99% of the coding region.
The start codon is in exon 1, and the stop codon is in exon 3.
The size of intron 2 for 5'-loxP site insertion is 3405 bp.
The size of the effective cKO region is ~657 bp.
The cKO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1 and 3 shown, with 5' and 3' gRNA regions), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Thap2, homology arm, cKO region, loxP site.]
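The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model. This is a minimal sketch, not the vendor's design: it assumes two loxP sites in the same orientation flanking the cKO region, uses the canonical 34 bp loxP sequence, and the flanking and exon strings are hypothetical placeholders rather than Thap2 sequence. The function name `cre_excise` is introduced here for illustration only.

```python
# Canonical 34 bp loxP site (13 bp repeat, 8 bp spacer, 13 bp inverted repeat).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele, site=LOXP):
    """Model Cre-mediated excision between two same-orientation loxP sites.

    The segment between the first two sites is removed together with one
    copy of the site, so a single loxP remains. Alleles with fewer than
    two sites are returned unchanged.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    return allele[:first + len(site)] + allele[second + len(site):]


if __name__ == "__main__":
    # Hypothetical placeholder sequences, not real Thap2 sequence.
    five_prime_arm = "AAAA"
    ckO_region = "CCCCCCCC"      # stands in for exon 3
    three_prime_arm = "GGGG"
    targeted = five_prime_arm + LOXP + ckO_region + LOXP + three_prime_arm
    knockout = cre_excise(targeted)
    print(len(targeted), len(knockout))
```

In this model the KO allele is shorter than the targeted allele by the length of the floxed segment plus one loxP site, which is the kind of size difference a genotyping PCR across the region would detect.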


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself (forward and reverse complement).]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix. This region is therefore suitable for PCR screening or sequencing analysis.
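A self-alignment dot plot of this kind can be sketched as follows. This is an illustrative exact-match version with a 10 bp window, not the tool used to produce the figure; the function names are hypothetical. Each reported pair (i, j) is a dot: the window starting at i equals the window starting at j. Dots off the main diagonal indicate repeated sequence, and a run of dots parallel to the diagonal at a short offset indicates a tandem repeat.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def dot_plot_matches(seq_a, seq_b, window=10):
    """Return all (i, j) where seq_a[i:i+window] == seq_b[j:j+window]."""
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    matches = []
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            matches.append((i, j))
    return matches


def off_diagonal_matches(seq, window=10):
    """Self-comparison dots excluding the trivial main diagonal (i == j)."""
    return [(i, j) for i, j in dot_plot_matches(seq, seq, window) if i != j]


if __name__ == "__main__":
    # Toy sequence with a 12 bp unit repeated in tandem (placeholder data).
    unit = "ACGTACGGTCAT"
    toy = "TTTT" + unit + unit + "GGGG"
    print(off_diagonal_matches(toy))
    # Reverse-complement panel: compare the sequence with its own reverse complement.
    print(dot_plot_matches(toy, reverse_complement(toy)))
```

For a sequence of the size analyzed here (several kb), the dictionary index keeps the comparison fast, since only identical 10-mers are ever paired.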

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full length 6884 bp | A: 30.68% (2112) | C: 19.80% (1363) | T: 29.13% (2005) | G: 20.40% (1404)

Note: The sequence of the homologous arms and cKO region is analyzed to determine its GC content. No region of significantly high GC content is found. This region is therefore suitable for PCR screening or sequencing analysis.
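The figures in the Summary line can be checked with a few lines of arithmetic, and the windowed profile can be sketched in the same way. The base counts below are copied from the Summary line; the function names and the `step` parameter are assumptions made for illustration, and the report does not state how its own plot was computed.

```python
# Base counts copied from the Summary line of this report.
COUNTS = {"A": 2112, "C": 1363, "T": 2005, "G": 1404}


def base_percentages(counts):
    """Percentage of each base relative to the total count."""
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


def gc_percent(counts):
    """Overall GC content as a percentage."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300, step=1):
    """GC percentage of each window of the given size along seq."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return values


if __name__ == "__main__":
    print(sum(COUNTS.values()))
    print({b: round(p, 2) for b, p in base_percentages(COUNTS).items()})
    print(round(gc_percent(COUNTS), 2))
```

The four counts sum to the stated full length of 6884 bp, and C plus G together give an overall GC content of about 40.2%.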


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr10  -       115373197  115376196  3000
YourSeq  303    1343   1725  3000   94.3%     chr10  -       41143455   41144004   550
YourSeq  303    1395   1735  3000   95.1%     chr13  +       101282054  101282414  361
YourSeq  299    1398   1724  3000   96.1%     chr1   -       156176682  156177026  345
YourSeq  297    1398   1726  3000   95.5%     chr6   -       82451150   82451496   347
YourSeq  296    1398   1724  3000   95.8%     chr6   +       50511813   50554375   42563
YourSeq  295    1397   1724  3000   95.5%     chr1   +       142083910  142084256  347
YourSeq  293    1396   1725  3000   94.9%     chr14  +       35128844   35129192   349
YourSeq  291    1400   1725  3000   95.4%     chr12  -       99863475   99863818   344
YourSeq  289    1398   1734  3000   94.5%     chr1   -       153751081  153751441  361
YourSeq  285    1393   1724  3000   94.5%     chr3   +       130849586  130849937  352
YourSeq  284    1398   1725  3000   94.7%     chr19  -       49872226   49872572   347
YourSeq  284    1398   1724  3000   95.3%     chr6   +       49568447   49568778   332
YourSeq  283    1397   1724  3000   95.4%     chr5   -       36409575   36409897   323
YourSeq  283    1394   1729  3000   94.7%     chr14  +       118811229  118811566  338
YourSeq  281    1126   1724  3000   91.0%     chr3   +       51351672   51352199   528
YourSeq  280    1397   1724  3000   94.1%     chr7   -       23144900   23145246   347
YourSeq  280    1397   1724  3000   94.1%     chr7   -       21921711   21922057   347
YourSeq  280    1398   1724  3000   96.1%     chr4   -       121204688  121205014  327
YourSeq  280    1397   1724  3000   94.1%     chr16  -       54634337   54634683   347

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr10  -       115369563  115372562  3000
YourSeq  150    495    1153  3000   83.6%     chr16  +       35613421   35613590   170
YourSeq  145    895    1429  3000   92.4%     chr8   -       110794676  110795218  543
YourSeq  144    486    1148  3000   82.2%     chr10  +       128747227  128747415  189
YourSeq  138    927    1156  3000   93.7%     chr4   -       126561012  126561328  317
YourSeq  138    891    1144  3000   95.4%     chr18  +       36708158   36709500   1343
YourSeq  135    891    1157  3000   85.4%     chr11  +       95134678   95134833   156
YourSeq  134    913    1157  3000   96.6%     chr4   +       116885798  116886247  450
YourSeq  134    889    1132  3000   89.0%     chr16  +       11235959   11236471   513
YourSeq  134    924    1171  3000   93.0%     chr1   +       179170208  179467375  297168
YourSeq  132    894    1168  3000   92.3%     chr12  +       4363697    4363996    300
YourSeq  130    891    1157  3000   84.4%     chr19  -       46048717   46048871   155
YourSeq  128    893    1156  3000   84.1%     chr14  -       121240851  121240995  145
YourSeq  128    498    1157  3000   79.9%     chr4   +       125999535  125999699  165
YourSeq  127    891    1153  3000   84.2%     chr5   +       25027338   25027489   152
YourSeq  127    924    1172  3000   85.2%     chr2   +       84699858   84700009   152
YourSeq  126    494    1147  3000   80.5%     chr4   +       150718674  150718819  146
YourSeq  125    891    1157  3000   82.9%     chr11  -       98989084   98989238   155
YourSeq  125    486    1135  3000   79.9%     chr16  +       55587015   55587166   152
YourSeq  124    891    1153  3000   85.2%     chr18  -       36583831   36583979   149

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
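One way to read the two BLAT tables is to compare the best secondary hit with the full-length self match. The sketch below parses rows in the column order shown above and reports the best off-target score as a fraction of the query size. The report does not state what threshold it uses for "significant similarity", so no cutoff is applied here; the class and function names are hypothetical, and the three sample rows are copied from the upstream table.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one result row; only the last 10 whitespace-separated fields are used,
    so leading labels such as the query name are ignored."""
    f = line.split()[-10:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def best_off_target_fraction(hits):
    """Highest non-top score divided by the query size (0.0 if only one hit)."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    rest = ranked[1:]
    if not rest:
        return 0.0
    return max(h.score for h in rest) / ranked[0].q_size


# First three rows of the upstream table in this report.
ROWS = [
    "YourSeq 3000 1 3000 3000 100.0% chr10 - 115373197 115376196 3000",
    "YourSeq 303 1343 1725 3000 94.3% chr10 - 41143455 41144004 550",
    "YourSeq 303 1395 1735 3000 95.1% chr13 + 101282054 101282414 361",
]

if __name__ == "__main__":
    hits = [parse_hit(row) for row in ROWS]
    print(best_off_target_fraction(hits))
```

For the upstream flank the best secondary score is 303 against a query of 3000 bp, and for the downstream flank it is 150, so in both cases the secondary hits cover only a small part of the query.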


Gene and information:

Thap2, THAP domain containing, apoptosis associated protein 2 [Mus musculus (house mouse)]
Gene ID: 66816, updated on 12-Aug-2019

Gene summary

Official Symbol: Thap2 (provided by MGI)
Official Full Name: THAP domain containing, apoptosis associated protein 2 (provided by MGI)
Primary source: MGI:MGI:1914066
See related: Ensembl:ENSMUSG00000020137
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI450385; AI649097; 2900040O07Rik; 9030625G08Rik
Expression: Ubiquitous expression in whole brain E14.5 (RPKM 5.3), CNS E11.5 (RPKM 4.9) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 10; 10 D2

Exon count: 3

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  10   NC_000076.6 (115368398..115384435, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    10   NC_000076.5 (114807022..114821491, complement)

[Figure: genomic context, Chromosome 10 - NC_000076.6]


Transcript information: This gene has 3 transcripts

Gene: Thap2 ENSMUSG00000020137

Description: THAP domain containing, apoptosis associated protein 2 [Source:MGI Symbol;Acc:MGI:1914066]
Gene Synonyms: 2900040O07Rik, 9030625G08Rik
Location: Chromosome 10: 115,368,404-115,384,443 reverse strand. GRCm38:CM001003.2
About this gene: This gene has 3 transcripts (splice variants), 181 orthologues, 5 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID         bp    Protein  Translation ID        Biotype         CCDS       UniProt     Flags
Thap2-202  ENSMUST00000218842.1  4970  217aa    ENSMUSP00000151353.1  Protein coding  CCDS24179  Q9D305      TSL:1 GENCODE basic APPRIS P1
Thap2-201  ENSMUST00000020346.5  3681  49aa     ENSMUSP00000020346.5  Protein coding  -          A0A1X7SB55  TSL:1 GENCODE basic
Thap2-203  ENSMUST00000218989.1  527   37aa     ENSMUSP00000151925.1  Protein coding  -          A0A1W2P852  TSL:NA GENCODE basic

[Figure: Ensembl genome browser view of a 36.04 kb region of chromosome 10 (115.36 Mb to 115.39 Mb; contig AC126943.5). Forward strand: Zfc3h1-201 (protein coding). Reverse strand: Thap2-201, Thap2-202 and Thap2-203 (protein coding); Tmem19-201, -202, -203, -204, -206, -209 and -210 (protein coding), Tmem19-205 (lncRNA) and Tmem19-207 (retained intron). Legends: Regulatory Build features (CTCF, enhancer, open chromatin, promoter, promoter flank) and gene biotypes (protein coding; non-protein coding).]


Transcript: ENSMUST00000218842

[Figure: Ensembl protein view of Thap2-202 (protein coding, reverse strand, 16.03 kb; protein ENSMUSP00000151...). Annotation tracks: low complexity (Seg), coiled-coils (Ncoils), Superfamily (SSF57716), SMART, Pfam and PROSITE profiles (THAP-type zinc finger), PANTHER (THAP domain-containing protein 2), Gene3D (THAP-type zinc finger superfamily), and sequence variants from dbSNP and all other sources (stop gained, missense variant, synonymous variant). Scale bar: 0 to 217 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
