https://www.alphaknockout.com

Mouse Tpd52 Knockout Project (CRISPR/Cas9)

Objective: To create a Tpd52 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tpd52 gene (NCBI Reference Sequence: NM_001025261; Ensembl: ENSMUSG00000027506) is located on mouse chromosome 3. Eight exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 8 (Transcript: ENSMUST00000094381). Exons 2~4 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 18.49% of the coding region. Exons 2~4 cover 49.53% of the coding region. The size of the effective KO region is ~9189 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1, 2, 3, 4 and 8 of mouse Tpd52, with the two gRNA regions flanking the knockout region. Legend: exon of mouse Tpd52; knockout region.]

Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of Sequence 1 against Sequence 2. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of Sequence 1 against Sequence 2. Legend: Forward; Reverse Complement.]

Note: The 1033 bp section downstream of Exon 4 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
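
A self dot plot of this kind can be sketched as follows. This is a minimal illustration, not the tool used for this report: it marks forward matches only (the plots above also show reverse-complement matches), it uses exact word matching with the 15 bp window size stated above, and it runs on a hypothetical toy sequence rather than the Tpd52 flanks.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs with i < j where the window-length
    words starting at positions i and j (0-based) are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)

    dots = []
    for starts in positions.values():
        # Every pair of occurrences of the same word is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


# Hypothetical toy input: an 18 bp unit repeated twice in tandem,
# padded with short flanks. Not taken from the Tpd52 locus.
unit = "ACGTTGCAAGCTTAGGCA"
toy = "TTTTT" + unit * 2 + "GGGGG"

dots = self_dot_plot(toy, window=15)
for i, j in dots:
    print(i, j, j - i)  # the third column is the repeat offset
```

A run of dots that share the same offset forms a line parallel to the main diagonal, which is how a tandem repeat appears in the plot.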

Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(2000bp) | A(25.6% 512) | C(20.7% 414) | T(27.05% 541) | G(26.65% 533)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(1033bp) | A(20.23% 209) | C(19.85% 205) | T(35.43% 366) | G(24.49% 253)

Note: The 1033 bp section downstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
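
The overall GC content of each flank follows directly from the base counts in the two summaries above. The sketch below recomputes it and also shows one way a 300 bp sliding-window profile could be calculated. The function names are illustrative, and the window is advanced 1 bp at a time as an assumption, since the report does not state the step size.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300):
    """GC percentage of every full window, sliding 1 bp at a time (assumed step)."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Base counts copied from the two summaries above.
upstream = {"A": 512, "C": 414, "T": 541, "G": 533}    # 2000 bp upstream of exon 2
downstream = {"A": 209, "C": 205, "T": 366, "G": 253}  # 1033 bp downstream of exon 4

print(f"upstream GC:   {gc_percent(upstream):.2f}%")
print(f"downstream GC: {gc_percent(downstream):.2f}%")
```

From these counts the upstream flank is 47.35% GC (947 of 2000 bp) and the downstream flank is about 44.34% GC (458 of 1033 bp).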

BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   -       8953800    8955799    2000
YourSeq  214    1015   1233  2000   99.1%     chr7   -       99431634   99431856   223
YourSeq  209    1015   1274  2000   94.6%     chr16  +       17049306   17396766   347461
YourSeq  207    1015   1229  2000   98.6%     chr8   -       119872734  119872956  223
YourSeq  204    1015   1235  2000   94.5%     chr14  -       52125502   52125718   217
YourSeq  202    1015   1235  2000   95.3%     chr6   -       67227555   67227768   214
YourSeq  202    1015   1226  2000   98.6%     chr4   +       89266964   89267181   218
YourSeq  200    1018   1233  2000   96.8%     chr6   -       38543364   38543592   229
YourSeq  200    1015   1235  2000   95.7%     chrX   +       74357291   74357507   217
YourSeq  198    1015   1233  2000   94.6%     chr17  +       29488088   29488290   203
YourSeq  196    1014   1235  2000   95.9%     chr5   +       121456986  121457208  223
YourSeq  196    1015   1230  2000   95.8%     chr2   +       30855246   30855461   216
YourSeq  194    1014   1233  2000   93.7%     chr11  +       94661940   94662147   208
YourSeq  192    1013   1235  2000   95.3%     chr9   -       35201155   35201402   248
YourSeq  191    1002   1240  2000   91.6%     chr11  +       102744489  102744713  225
YourSeq  191    1015   1262  2000   95.3%     chr11  +       85503156   85503441   286
YourSeq  190    1016   1225  2000   96.7%     chr10  +       117141706  117141951  246
YourSeq  189    1015   1228  2000   94.8%     chr2   -       166028703  166028931  229
YourSeq  189    1011   1233  2000   92.7%     chr17  -       35192144   35192357   214
YourSeq  189    1016   1233  2000   96.1%     chr15  -       99046161   99046769   609

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
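
To inspect where secondary BLAT hits fall on the query, the result rows can be parsed into records. The sketch below is an illustration only: it uses the first four rows of the upstream results, the `BlatHit` type and `parse_blat_row` helper are hypothetical names, and no significance threshold is applied because the report does not define one.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row):
    """Parse one whitespace-separated BLAT result row (query name first)."""
    f = row.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


# First four rows of the upstream BLAT results, copied from the report.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr3 - 8953800 8955799 2000",
    "YourSeq 214 1015 1233 2000 99.1% chr7 - 99431634 99431856 223",
    "YourSeq 209 1015 1274 2000 94.6% chr16 + 17049306 17396766 347461",
    "YourSeq 207 1015 1229 2000 98.6% chr8 - 119872734 119872956 223",
]

hits = [parse_blat_row(r) for r in rows]
self_hit = max(hits, key=lambda h: h.score)         # full-length match to the locus itself
secondary = [h for h in hits if h is not self_hit]  # everything else

best_secondary = max(h.score for h in secondary)
covered = (min(h.q_start for h in secondary), max(h.q_end for h in secondary))
print(f"self hit: {self_hit.chrom}:{self_hit.t_start}-{self_hit.t_end}")
print(f"best secondary score: {best_secondary}, query span covered: {covered}")
```

For these rows the secondary hits all fall between query positions 1015 and 1274, so a summary of this kind shows which part of the flank to avoid when placing primers.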

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1033   1      1033  1033   100.0%    chr3   -       8943578    8944610    1033
YourSeq  22     789    810   1033   100.0%    chr1   +       14379753   14379774   22
YourSeq  21     730    750   1033   100.0%    chr1   -       16662868   16662888   21
YourSeq  21     496    516   1033   100.0%    chr10  +       95812287   95812307   21
YourSeq  20     739    762   1033   91.7%     chr11  -       61744343   61744366   24
YourSeq  20     577    598   1033   95.5%     chr11  +       60996786   60996807   22
YourSeq  20     663    682   1033   100.0%    chr10  +       77206030   77206049   20
YourSeq  20     193    212   1033   100.0%    chr1   +       113891705  113891724  20

Note: The 1033 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
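
The chr3 coordinates of the two flanks, taken from the first row of each BLAT result, allow a cross-check of the ~9189 bp effective KO region given in the strategy summary. The sketch below assumes that the upstream flank ends immediately before the deleted interval and the downstream flank begins immediately after it. The report does not state this, so treat it as a consistency check rather than a definition of the deletion.

```python
def interval_length(interval):
    """Length of a 1-based, inclusive (start, end) interval."""
    start, end = interval
    return end - start + 1


def gap_between(a, b):
    """Number of bases strictly between two non-overlapping inclusive intervals."""
    (_, first_end), (second_start, _) = sorted([a, b])
    return second_start - first_end - 1


# chr3 coordinates of the self hits in the two BLAT tables (minus strand).
upstream_flank = (8953800, 8955799)    # 2000 bp upstream of exon 2
downstream_flank = (8943578, 8944610)  # 1033 bp downstream of exon 4

gap = gap_between(upstream_flank, downstream_flank)
print(interval_length(upstream_flank), interval_length(downstream_flank), gap)
```

Under that assumption the interval between the flanks is 8953800 - 8944610 - 1 = 9189 bp, which matches the stated size of the effective KO region.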

Gene and information:

Tpd52 tumor protein D52 [Mus musculus (house mouse)]
Gene ID: 21985, updated on 10-Oct-2019

Gene summary

Official Symbol: Tpd52 (provided by MGI)
Official Full Name: tumor protein D52 (provided by MGI)
Primary source: MGI:MGI:107749
See related: Ensembl:ENSMUSG00000027506
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: mD52; AI043038
Expression: Ubiquitous expression in large intestine adult (RPKM 34.0), colon adult (RPKM 25.8) and 26 other tissues
Orthologs: human; all

Genomic context

Location: 3; 3 A1
Exon count: 11

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (8926530..9005393, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (8929436..9004515, complement)

[Figure: genomic context of Tpd52 on chromosome 3 (NC_000069.6).]

Transcript information: This gene has 11 transcripts

Gene: Tpd52 ENSMUSG00000027506

Description: tumor protein D52 [Source:MGI Symbol;Acc:MGI:107749]
Gene Synonyms: mD52
Location: Chromosome 3: 8,925,593-9,004,723 reverse strand. GRCm38:CM000996.2
About this gene: This gene has 11 transcripts (splice variants), 216 orthologues, 3 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Tpd52-205  ENSMUST00000120143.7   6317  185aa       ENSMUSP00000112830.1  Protein coding           CCDS50860  Q62393   TSL:1, GENCODE basic
Tpd52-203  ENSMUST00000091355.11  3081  199aa       ENSMUSP00000088914.5  Protein coding           CCDS38384  Q62393   TSL:1, GENCODE basic
Tpd52-202  ENSMUST00000091354.11  2193  224aa       ENSMUSP00000088913.5  Protein coding           CCDS38386  Q62393   TSL:1, GENCODE basic, APPRIS ALT1
Tpd52-204  ENSMUST00000094381.10  1074  247aa       ENSMUSP00000091943.4  Protein coding           CCDS38387  F8WHQ1   TSL:3, GENCODE basic, APPRIS P4
Tpd52-201  ENSMUST00000063496.13  697   194aa       ENSMUSP00000066826.7  Protein coding           CCDS38385  E9PUA7   TSL:2, GENCODE basic
Tpd52-206  ENSMUST00000121038.7   849   162aa       ENSMUSP00000113368.1  Protein coding           -          D3Z637   TSL:5, GENCODE basic
Tpd52-210  ENSMUST00000145905.7   698   179aa       ENSMUSP00000123147.1  Protein coding           -          D3Z125   CDS 3' incomplete, TSL:5
Tpd52-209  ENSMUST00000134788.7   673   206aa       ENSMUSP00000119899.1  Protein coding           -          D3Z7X7   CDS 3' incomplete, TSL:5
Tpd52-207  ENSMUST00000124956.1   561   77aa        ENSMUSP00000119077.1  Protein coding           -          D3Z2U2   CDS 3' incomplete, TSL:5
Tpd52-211  ENSMUST00000155450.1   409   57aa        ENSMUSP00000120317.1  Nonsense mediated decay  -          D6RJ37   TSL:3
Tpd52-208  ENSMUST00000129736.7   419   No protein  -                     lncRNA                   -          -        TSL:3

[Figure: Ensembl genome browser view of a 99.13 kb region (8.92 Mb to 9.00 Mb). Tracks: contigs AC133935.3 and AC161056.5; genes (comprehensive set): Mrps28-201 and Tpd52-201, -202, -204, -205, -206, -207, -209, -210 (protein coding), Tpd52-208 (lncRNA), Tpd52-211 (nonsense mediated decay); Regulatory Build.]

Transcript: ENSMUST00000094381

[Figure: Ensembl protein summary for Tpd52-204 (protein coding; reverse strand, 33.26 kb; scale 0 to 247 aa). Tracks: MobiDB lite; low complexity (Seg); coiled-coils (Ncoils); Pfam: Tumour protein D52; PANTHER: Tumour protein D52 (PTHR19307:SF12); sequence variants from dbSNP and all other sources (missense and synonymous).]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
