https://www.alphaknockout.com

Mouse Wdhd1 Knockout Project (CRISPR/Cas9)

Objective: To create a Wdhd1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Wdhd1 gene (NCBI Reference Sequence: NM_172598; Ensembl: ENSMUSG00000037572) is located on mouse chromosome 14. A total of 25 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 25 (Transcript: ENSMUST00000111792). Exons 2~11 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 2.41% of the coding region. Exons 2~11 cover 34.91% of the coding region. The size of the effective KO region is ~9182 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: Wildtype allele drawn 5' to 3', showing exons 1 to 11 and exon 25 of mouse Wdhd1, with two gRNA regions marked. Legend: exon of mouse Wdhd1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the upstream sequence aligned with itself, showing forward and reverse-complement matches.]

Note: The 1665 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
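The self-alignment behind a dot plot can be illustrated with a minimal Python sketch. It is not the tool used for this report: the function name `dot_plot_matches` and the toy sequence are assumptions made for illustration, and only exact forward matches are reported (the reverse-complement track is omitted). Every pair of positions whose 15 bp windows are identical would appear as a dot off the main diagonal, and a tandem repeat shows up as many such pairs at a fixed offset.

```python
def dot_plot_matches(seq, window=15):
    """Return sorted (i, j) pairs with i < j whose windows of length `window` are identical."""
    positions = {}
    for i in range(len(seq) - window + 1):
        # Group window start positions by the exact window content.
        positions.setdefault(seq[i:i + window], []).append(i)

    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


# Toy sequence only (hypothetical, not taken from the Wdhd1 locus):
# a 15 bp unit repeated three times in tandem.
tandem = "ACGTTGCAAGCTTAG" * 3
offsets = {j - i for i, j in dot_plot_matches(tandem)}
print(sorted(offsets))  # offsets that are multiples of the repeat unit length
```

A region with no off-diagonal hits at this window size, as reported in the note above, is free of exact repeats of 15 bp or longer in the forward orientation.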

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the downstream sequence aligned with itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 11 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the upstream sequence.]

Summary: Full Length(1665bp) | A(24.5% 408) | C(21.92% 365) | T(31.95% 532) | G(21.62% 360)

Note: The 1665 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the downstream sequence.]

Summary: Full Length(2000bp) | A(32.1% 642) | C(16.8% 336) | T(33.55% 671) | G(17.55% 351)

Note: The 2000 bp section downstream of Exon 11 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
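The base counts in the two summaries above give the overall GC percentage of each flank directly, and a 300 bp sliding window can be sketched in the same way. The Python below is a minimal illustration, not the tool that produced the plots; the function names are assumptions, and the windowed function steps one base at a time, which the report does not specify.

```python
def gc_percent_from_counts(a, c, t, g):
    """Overall GC percentage from per-base counts."""
    return 100.0 * (g + c) / (a + c + t + g)


def windowed_gc(seq, window=300):
    """GC percentage of every full window of length `window`, stepping one base at a time."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Base counts copied from the two summary lines of this report.
upstream_gc = gc_percent_from_counts(a=408, c=365, t=532, g=360)
downstream_gc = gc_percent_from_counts(a=642, c=336, t=671, g=351)
print(f"upstream GC:   {upstream_gc:.2f}%")
print(f"downstream GC: {downstream_gc:.2f}%")
```

With these counts the upstream flank comes to roughly 43.5% GC and the downstream flank to roughly 34.4% GC, which is consistent with the statement that neither region is GC-rich.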


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1665   1      1665  1665   100.0%    chr14  -       47274856   47276520   1665
YourSeq  150    474    637   1665   96.4%     chr6   +       82953840   82954226   387
YourSeq  147    481    649   1665   96.3%     chr7   +       28625960   28626351   392
YourSeq  144    314    629   1665   86.7%     chr1   -       58419060   58419274   215
YourSeq  141    477    629   1665   96.8%     chr4   -       152207698  152207857  160
YourSeq  140    478    629   1665   96.7%     chr18  -       75268713   75268867   155
YourSeq  140    474    633   1665   94.3%     chr14  +       17791366   17791526   161
YourSeq  139    474    629   1665   94.9%     chr4   -       36844510   36844671   162
YourSeq  138    477    629   1665   95.5%     chr7   -       127871465  127871619  155
YourSeq  137    477    629   1665   95.4%     chr17  -       80762130   80762285   156
YourSeq  136    485    629   1665   97.3%     chr1   -       114359727  114359875  149
YourSeq  136    474    629   1665   95.4%     chr8   +       126860530  126860693  164
YourSeq  136    483    629   1665   96.6%     chr10  +       67041614   67041761   148
YourSeq  135    485    629   1665   96.6%     chr3   -       127582060  127582204  145
YourSeq  135    484    633   1665   95.4%     chr5   +       31143883   31144033   151
YourSeq  135    489    629   1665   97.9%     chr11  +       97861307   97861447   141
YourSeq  134    490    643   1665   95.3%     chrX   -       13153231   13153388   158
YourSeq  134    477    629   1665   95.3%     chr18  -       43176565   43176759   195
YourSeq  134    489    629   1665   95.7%     chr14  -       69037588   69037726   139
YourSeq  134    489    637   1665   95.3%     chr5   +       146379531  146379688  158

Note: The 1665 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
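One way to read the BLAT table is to ask how much of the query each hit covers. The Python sketch below parses three rows copied from the upstream results and computes that coverage; the `BlatHit` type and both helper functions are illustrative assumptions, not part of BLAT or of the original analysis.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated BLAT summary row (the query name is skipped)."""
    f = line.split()
    return BlatHit(
        int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7],
        int(f[8]), int(f[9]), int(f[10]),
    )


def query_coverage(hit):
    """Percentage of the query sequence covered by the hit."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / hit.q_size


# Three rows copied from the upstream BLAT results in this report.
rows = [
    "YourSeq 1665 1 1665 1665 100.0% chr14 - 47274856 47276520 1665",
    "YourSeq 150 474 637 1665 96.4% chr6 + 82953840 82954226 387",
    "YourSeq 147 481 649 1665 96.3% chr7 + 28625960 28626351 392",
]
hits = [parse_hit(row) for row in rows]
for hit in hits:
    print(hit.chrom, hit.score, f"{query_coverage(hit):.1f}% of query")
```

Only the chr14 self-hit spans the full query. The best secondary hits cover about 10% of it, which is in line with the note that no significant similarity is found elsewhere in the genome.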

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr14  -       47263674   47265673   2000
YourSeq  259    803    1869  2000   93.4%     chr2   +       170214641  170242807  28167
YourSeq  175    1701   1896  2000   95.4%     chr4   +       129658097  129658299  203
YourSeq  170    1700   1896  2000   93.5%     chr1   +       191948240  191948437  198
YourSeq  169    1684   1892  2000   89.5%     chr7   -       133382757  133382948  192
YourSeq  169    1700   1899  2000   91.0%     chr4   -       116055138  116055326  189
YourSeq  169    1701   1873  2000   98.9%     chr11  -       75475921   75476093   173
YourSeq  167    1697   1897  2000   91.1%     chr8   -       20429428   20429621   194
YourSeq  166    1700   1905  2000   92.8%     chr15  +       76962800   76963006   207
YourSeq  166    1700   1911  2000   90.3%     chr14  +       30997801   30998007   207
YourSeq  166    1699   1893  2000   93.0%     chr11  +       4100305    4100496    192
YourSeq  165    1697   1896  2000   93.7%     chr2   -       30930824   30931026   203
YourSeq  165    1701   1892  2000   90.9%     chr10  -       94567107   94567291   185
YourSeq  164    1698   1892  2000   90.5%     chr2   -       93271004   93271194   191
YourSeq  163    1697   1896  2000   89.8%     chr17  +       70898453   70898640   188
YourSeq  163    1701   1892  2000   92.9%     chr12  +       92248395   92248583   189
YourSeq  162    1700   1892  2000   90.7%     chr16  -       22638354   22638537   184
YourSeq  162    1701   1892  2000   90.3%     chr15  -       81560102   81560286   185
YourSeq  162    1704   1892  2000   91.6%     chr6   +       33680428   33680607   180
YourSeq  162    357    971   2000   93.6%     chr11  +       78704947   78705608   662

Note: The 2000 bp section downstream of Exon 11 is BLAT searched against the genome. No significant similarity is found.
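The full-length chr14 hits for the two flanking sections can be used as a sanity check on the stated KO region size. The Python sketch below assumes that both flanks directly abut the deleted region, which the report does not state explicitly; the variable and function names are illustrative. Because Wdhd1 lies on the reverse strand, the section upstream of exon 2 has the higher genomic coordinates.

```python
# Genomic coordinates (chr14, GRCm38) of the full-length BLAT self-hits in this report.
upstream_flank = (47274856, 47276520)    # 1665 bp section upstream of exon 2
downstream_flank = (47263674, 47265673)  # 2000 bp section downstream of exon 11


def flank_length(flank):
    """Length in bp of a closed, 1-based interval."""
    return flank[1] - flank[0] + 1


def interval_between(low_flank, high_flank):
    """Closed interval lying strictly between two non-overlapping flanks, plus its length."""
    start = low_flank[1] + 1
    end = high_flank[0] - 1
    return start, end, end - start + 1


# The gene is on the reverse strand, so the downstream flank is the lower interval.
ko_start, ko_end, ko_size = interval_between(downstream_flank, upstream_flank)
print(f"KO region: chr14:{ko_start}-{ko_end} ({ko_size} bp)")
```

Under that assumption the interval between the flanks is 9182 bp, which matches the effective KO region size given in the strategy summary.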


Gene and information: Wdhd1 WD repeat and HMG-box DNA binding protein 1 [ Mus musculus (house mouse) ] Gene ID: 218973, updated on 24-Oct-2019

Gene summary

Official Symbol: Wdhd1 (provided by MGI)
Official Full Name: WD repeat and HMG-box DNA binding protein 1 (provided by MGI)
Primary source: MGI:MGI:2443514
See related: Ensembl:ENSMUSG00000037572
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AND-1; A630092E18; D630024B06Rik
Expression: Biased expression in liver E14 (RPKM 17.1), liver E14.5 (RPKM 13.7) and 14 other tissues
Orthologs: human, all

Genomic context

Location: 14; 14 C1
Exon count: 26

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (47240944..47276861, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (47860619..47896532, complement)

[Figure: genomic context of Wdhd1 on Chromosome 14 - NC_000080.6.]


Transcript information: This gene has 6 transcripts

Gene: Wdhd1 ENSMUSG00000037572

Description: WD repeat and HMG-box DNA binding protein 1 [Source:MGI Symbol;Acc:MGI:2443514]
Gene Synonyms: AND-1, D630024B06Rik
Location: 47,240,944-47,276,857 reverse strand. GRCm38:CM001007.2
About this gene: This gene has 6 transcripts (splice variants), 211 orthologues, is a member of 1 Ensembl protein family and is associated with 3 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS  UniProt     Flags
Wdhd1-204  ENSMUST00000187531.7  4260  1118aa      ENSMUSP00000141182.2  Protein coding   -     A0A2K6EDP7  TSL:5 GENCODE basic APPRIS P5
Wdhd1-202  ENSMUST00000111792.8  4174  1081aa      ENSMUSP00000107422.1  Protein coding   -     A0A0R4J1F8  TSL:5 GENCODE basic APPRIS ALT2
Wdhd1-201  ENSMUST00000111790.1  2962  641aa       ENSMUSP00000107420.1  Protein coding   -     P59328      TSL:2 GENCODE basic
Wdhd1-205  ENSMUST00000227041.1  203   67aa        ENSMUSP00000154026.1  Protein coding   -     A0A2I3BQ38  CDS 5' and 3' incomplete
Wdhd1-203  ENSMUST00000139124.1  2777  No protein  -                     Retained intron  -     -           TSL:1
Wdhd1-206  ENSMUST00000228810.1  706   No protein  -                     Retained intron  -     -           -


[Figure: Ensembl genome browser view of a 55.91 kb region, 47.24 Mb to 47.28 Mb. Forward strand: Gm49150-201 (lncRNA), Socs4-201 and Socs4-202 (protein coding). Reverse strand: Wdhd1-202, Wdhd1-204, Wdhd1-205 and Wdhd1-201 (protein coding), Wdhd1-206 and Wdhd1-203 (retained intron), Gm49149-201 (TEC), Gm24378-201 (snRNA). Contigs: AC154589.3, AC156016.2. Tracks: Genes (Comprehensive set), Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000111792

[Figure: Ensembl protein domain view of Wdhd1-202 (protein coding, reverse strand, 35.91 kb), with a scale bar from 0 to 1081. Annotation tracks:
MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils)
Superfamily: WD40-repeat-containing domain superfamily; High mobility group box domain superfamily
SMART: WD40 repeat; High mobility group box domain
Pfam: WD40 repeat; Anaphase-promoting complex subunit 4, WD40 domain; High mobility group box domain; Minichromosome loss protein Mcl1, middle region
PROSITE profiles: WD40-repeat-containing domain; High mobility group box domain; WD40 repeat
PROSITE patterns: WD40 repeat, conserved site
PANTHER: PTHR19932:SF10; PTHR19932
Gene3D: WD40/YVTN repeat-like-containing domain superfamily; High mobility group box domain superfamily
CDD: cd00200; cd00084
Sequence variants (dbSNP and all other sources). Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
