https://www.alphaknockout.com

Mouse Thap12 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Thap12 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Thap12 gene (NCBI Reference Sequence: NM_028410.1; Ensembl: ENSMUSG00000030753) is located on mouse chromosome 7. Five exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 5 (Transcript: ENSMUST00000033009). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Thap12 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-32F21 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 3.96% of the coding region. The knockout of exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 3518 bp, and the size of intron 2 for 3'-loxP site insertion is 3008 bp. The size of the effective cKO region is ~621 bp. The cKO region does not contain any other known gene.
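The "about 3.96% of the coding region" figure can be turned into an approximate nucleotide position. The sketch below is illustrative only: it assumes the coding region is the 758 aa protein of ENSMUST00000033009 plus the stop codon, and the function names are hypothetical. The report does not state the coding length of exon 2, so the frameshift helper is shown as a general rule and is not applied to exon 2 here.

```python
# Approximate coding position at which exon 2 begins, using figures from this report.
PROTEIN_LENGTH_AA = 758        # Thap12-201 (ENSMUST00000033009)
EXON2_START_FRACTION = 0.0396  # "about 3.96% of the coding region"


def coding_length_nt(protein_aa: int, include_stop: bool = True) -> int:
    """Coding sequence length in nucleotides (assumption: stop codon counted)."""
    return protein_aa * 3 + (3 if include_stop else 0)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


cds_nt = coding_length_nt(PROTEIN_LENGTH_AA)
upstream_nt = round(cds_nt * EXON2_START_FRACTION)

print(f"CDS length: {cds_nt} nt")
print(f"Coding bases upstream of exon 2: ~{upstream_nt} nt (~{upstream_nt // 3} codons)")
```

Under these assumptions only a short stretch of coding sequence lies in exon 1, which is consistent with targeting exon 2 for an early frameshift.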


Overview of the Targeting Strategy

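[Figure: schematic of the targeting strategy showing, from top to bottom, the wildtype allele (exons 1, 2 and 5 labeled, with the 5' and 3' gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Thap12; homology arm; cKO region; loxP site.]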
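The relationship between the targeted allele and the constitutive KO allele can be illustrated with a toy string model. This is a minimal sketch, not the project's vector design: the flanking and exon sequences are hypothetical placeholders, `cre_excise` is a hypothetical helper, and only the canonical 34 bp loxP sequence is real. It models excision between two loxP sites in the same orientation, which leaves a single loxP site behind.

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34 bp loxP site


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Remove the segment between the first two same-orientation sites, keeping one site."""
    first = allele.find(site)
    if first == -1:
        return allele  # no loxP site: nothing to recombine
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele  # a single loxP site: nothing to excise
    return allele[: first + len(site)] + allele[second + len(site):]


# Hypothetical toy sequences (lowercase = intron, uppercase = exon).
intron1 = "ccatgg"
exon2 = "ATGGCCAAG"
intron2 = "ggtacc"

targeted_allele = intron1 + LOXP + exon2 + LOXP + intron2
ko_allele = cre_excise(targeted_allele)

print(len(targeted_allele), len(ko_allele))
```

In this model the floxed segment and one loxP copy are lost, so the KO allele is shorter than the targeted allele by the length of the floxed segment plus 34 bp.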


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself. Legend: Forward; Reverse Complement.]

Note: The sequence of the homologous arms and cKO region is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix, so it may be difficult to construct this targeting vector.
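A self-alignment dot plot of this kind can be sketched by indexing every 10 bp window and recording the positions at which the same window occurs more than once. The code below is an illustrative approximation, not the tool used for this report: it considers exact forward-strand matches only (no reverse complement), the function names are hypothetical, and the demo sequence is made up. A tandem repeat appears as many dots sharing the same offset from the main diagonal.

```python
from collections import Counter, defaultdict


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs, i < j, where the window at i equals the window at j."""
    starts = defaultdict(list)
    for i in range(len(seq) - window + 1):
        starts[seq[i:i + window]].append(i)
    return sorted(
        (a, b)
        for hits in starts.values()
        for idx, a in enumerate(hits)
        for b in hits[idx + 1:]
    )


def offset_counts(dots):
    """Count dots per diagonal offset (j - i); a dominant offset suggests a repeat unit."""
    return Counter(j - i for i, j in dots)


# Hypothetical demo: unique flanks around four tandem copies of a 7 bp unit.
demo = "ACGTTGCAAC" + "GATTACA" * 4 + "TTGACCGTAA"
offsets = offset_counts(self_dot_plot(demo, window=10))
print(offsets.most_common(3))
```

The most frequent offset corresponds to the repeat unit length; a sequence without repeats yields no off-diagonal dots at all.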

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution plot for Sequence 12.]

Summary: Full length (7121 bp) | A (29.32%, 2088) | C (17.08%, 1216) | T (32.47%, 2312) | G (21.13%, 1505)

Note: The sequence of the homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
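The base counts in the summary give an overall GC content of about 38.21% (C plus G over the full length). The sketch below reproduces that arithmetic and shows one way a windowed GC profile could be computed; the function names and the short demo sequence are hypothetical, and the actual 7121 bp sequence is not included in this report.

```python
# Base counts taken from the summary line of this report.
BASE_COUNTS = {"A": 2088, "C": 1216, "T": 2312, "G": 1505}


def gc_fraction_from_counts(counts: dict) -> float:
    """Overall GC fraction from per-base counts."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each full-length window along the sequence."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


full_length = sum(BASE_COUNTS.values())
overall_gc = gc_fraction_from_counts(BASE_COUNTS)
print(f"Full length: {full_length} bp, overall GC: {overall_gc:.2%}")

# Hypothetical toy sequence with a small window, for illustration only.
print(sliding_gc("GGCCAATT", window=4, step=2))
```

With the real sequence, calling `sliding_gc(sequence, window=300)` would give a profile comparable to the 300 bp window plot described above.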


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr7   +       98703748   98706747   3000
YourSeq  138    2728   2899  3000   91.1%     chr11  +       88725104   88725271   168
YourSeq  132    2773   2937  3000   88.5%     chr2   -       12428972   12429119   148
YourSeq  124    2729   2883  3000   88.9%     chr18  -       43536224   43536377   154
YourSeq  123    2738   2880  3000   91.5%     chr13  +       112899026  112899166  141
YourSeq  121    2474   2862  3000   95.5%     chr5   -       130064711  130065103  393
YourSeq  121    2734   2885  3000   90.4%     chr13  +       42545633   42545777   145
YourSeq  120    2739   2885  3000   90.6%     chr1   +       152706548  152706692  145
YourSeq  119    2730   2880  3000   91.6%     chr16  -       77024389   77024545   157
YourSeq  119    2729   2871  3000   89.4%     chr12  +       82673687   82673827   141
YourSeq  119    2738   2885  3000   90.3%     chr1   +       155912987  155913127  141
YourSeq  118    2728   2871  3000   91.2%     chr1   +       134769193  134769334  142
YourSeq  117    2739   2880  3000   91.3%     chr15  -       84703827   84703967   141
YourSeq  115    2735   2865  3000   90.6%     chr3   +       137294979  137295105  127
YourSeq  114    2719   2877  3000   86.6%     chr16  +       89008690   89008832   143
YourSeq  114    2735   2871  3000   88.7%     chr1   +       93003645   93003776   132
YourSeq  113    2738   2871  3000   92.5%     chr11  -       116591347  116591487  141
YourSeq  112    2734   2878  3000   87.0%     chr9   +       100997478  100997612  135
YourSeq  112    2734   2871  3000   89.1%     chr14  +       30167423   30167555   133
YourSeq  111    2735   2871  3000   88.6%     chr11  -       3653243    3653365    123

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr7   +       98707369   98710368   3000
YourSeq  43     592    663   3000   79.4%     chr15  +       52033515   52033584   70
YourSeq  34     17     79    3000   94.8%     chr1   -       189509009  189509082  74
YourSeq  27     63     97    3000   72.5%     chr15  -       73000360   73000388   29
YourSeq  27     1634   1660  3000   100.0%    chr12  -       76968676   76968702   27
YourSeq  27     1632   1660  3000   96.6%     chr13  +       54639166   54639194   29
YourSeq  25     1634   1662  3000   93.2%     chr11  -       105958765  105958793  29
YourSeq  24     1632   1662  3000   84.7%     chr2   +       156372858  156372886  29
YourSeq  24     1633   1661  3000   96.2%     chr13  +       59385586   59385616   31
YourSeq  24     1748   1775  3000   92.9%     chr12  +       9391914    9391941    28
YourSeq  22     1628   1655  3000   89.3%     chr12  -       11413956   11413983   28

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
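The top hits of the two BLAT searches give the chr7 coordinates of the 3000 bp flanks, and the gap between them matches the ~621 bp effective cKO region stated in the strategy summary. The short calculation below makes that explicit; it assumes the listed coordinates are 1-based and inclusive, and the variable and function names are illustrative.

```python
# chr7 coordinates of the 100% identity hits in the BLAT tables of this report
# (assumption: 1-based, inclusive start and end).
UPSTREAM_FLANK = (98703748, 98706747)    # BLAT Search Results (up)
DOWNSTREAM_FLANK = (98707369, 98710368)  # BLAT Search Results (down)


def span(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


cko_start = UPSTREAM_FLANK[1] + 1
cko_end = DOWNSTREAM_FLANK[0] - 1
cko_size = span(cko_start, cko_end)
flanked_size = span(UPSTREAM_FLANK[0], DOWNSTREAM_FLANK[1])

print(f"cKO region: chr7:{cko_start}-{cko_end} ({cko_size} bp)")
print(f"Flank + cKO + flank: {flanked_size} bp")
```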


Gene and information: Thap12 THAP domain containing 12 [ Mus musculus (house mouse) ] Gene ID: 72981, updated on 26-Jun-2020

Gene summary

Official Symbol: Thap12 (provided by MGI)
Official Full Name: THAP domain containing 12 (provided by MGI)
Primary source: MGI:MGI:1920231
See related: Ensembl:ENSMUSG00000030753
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Dap4; Prkrir; Rpkrir; 2900052B10Rik
Expression: Ubiquitous expression in CNS E11.5 (RPKM 18.2), bladder adult (RPKM 15.8) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 7; 7 E1

Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108.20200622        current            GRCm38.p6 (GCF_000001635.26)  7    NC_000073.6 (98703036..98718062)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    7    NC_000073.5 (105851873..105866572)

Chromosome 7 - NC_000073.6


Transcript information: This gene has 4 transcripts

Gene: Thap12 ENSMUSG00000030753

Description: THAP domain containing 12 [Source:MGI Symbol;Acc:MGI:1920231]
Gene Synonyms: 2900052B10Rik, Dap4, Prkrir
Location: Chromosome 7: 98,703,103-98,718,062 forward strand. GRCm38:CM001000.2
About this gene: This gene has 4 transcripts (splice variants), 354 orthologues, 13 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Thap12-201  ENSMUST00000033009.15  3723  758aa       ENSMUSP00000033009.9  Protein coding           CCDS21474  Q9CUX1   TSL:1, GENCODE basic, APPRIS P1
Thap12-202  ENSMUST00000126356.7   1123  79aa        ENSMUSP00000118403.1  Nonsense mediated decay  -          D6RI43   TSL:3
Thap12-204  ENSMUST00000153566.1   607   120aa       ENSMUSP00000118736.1  Nonsense mediated decay  -          D6RH75   TSL:3
Thap12-203  ENSMUST00000146473.1   670   No protein  -                     Retained intron          -          -        TSL:2

[Figure: Ensembl genome browser view of the Thap12 locus (34.96 kb; 98.70 Mb to 98.72 Mb; forward and reverse strands). Forward strand transcripts: Thap12-201 (protein coding), Thap12-202 (nonsense mediated decay), Thap12-204 (nonsense mediated decay), Thap12-203 (retained intron). Contig: AC111124.8. Reverse strand genes: Gm15506-201 (processed transcript), Gm15506-202 (processed transcript), Gm15506-203 (transcribed unitary pseudogene), Gm19656-201 (lincRNA). Regulatory Build legend: CTCF; Open Chromatin; Promoter; Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript; pseudogene).]


Transcript: ENSMUST00000033009

[Figure: Ensembl protein summary for Thap12-201 (protein coding; 14.96 kb, forward strand; protein ENSMUSP00000033...). Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily; SMART; Pfam; PROSITE profiles; PANTHER; Gene3D; all sequence SNPs (sequence variants, dbSNP and all other sources). Feature labels: SSF57716; Ribonuclease H-like superfamily; THAP-type zinc finger; Domain of unknown function DUF4371; HAT, C-terminal dimerisation domain; PTHR46289:SF7; PTHR46289; THAP-type zinc finger superfamily. Variant legend: missense variant; synonymous variant. Scale bar: 0 to 758.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
