https://www.alphaknockout.com

Mouse Wdr37 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Wdr37 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Wdr37 gene (NCBI Reference Sequence: NM_001039388; Ensembl: ENSMUSG00000021147) is located on mouse chromosome 13. Fourteen exons have been identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 14 (Transcript: ENSMUST00000054251). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Wdr37 gene. To engineer the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-90H23 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 9.34% of the coding region. Knockout of exon 3 will result in a frameshift of the gene. The size of intron 2, for 5'-loxP site insertion, is 4159 bp, and the size of intron 3, for 3'-loxP site insertion, is 2736 bp. The size of the effective cKO region is ~597 bp. The cKO region does not contain any other known gene.
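The position and frameshift statements above come down to simple arithmetic. The following Python sketch is illustrative only. It assumes the coding region length can be taken from the 496 aa protein reported for ENSMUST00000054251 plus one stop codon. The function names and the example deletion lengths are hypothetical, because the report does not state the coding length of exon 3.

```python
def cds_length_nt(protein_aa, include_stop=True):
    """Coding sequence length in nucleotides for a protein of the given size."""
    return protein_aa * 3 + (3 if include_stop else 0)


def causes_frameshift(deleted_coding_nt):
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


# 496 aa is the protein length listed for ENSMUST00000054251 in this report.
cds_nt = cds_length_nt(496)

# Approximate coding position at which exon 3 begins (9.34% of the coding region).
approx_exon3_start = round(0.0934 * cds_nt)

print(cds_nt, approx_exon3_start)

# Hypothetical exon coding lengths, used only to show the frameshift rule.
for length in (99, 100):
    print(length, causes_frameshift(length))
```

Under these assumptions, a frameshifting exon that begins this early in the coding region leaves only a short stretch of the native protein sequence before the frame is lost.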


Overview of the Targeting Strategy

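[Figure: schematic of the targeting strategy, showing the wildtype allele (exons numbered 1 to 14, with the 5' and 3' gRNA regions flanking exon 3), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Wdr37; homology arm; cKO region; loxP site.]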


Overview of the Dot Plot

[Figure: self-alignment dot plot. Window size: 10 bp. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The sequence of the homology arms and cKO region was aligned with itself to check for tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
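A self-alignment dot plot of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the tool used to produce the figure: it assumes exact matching of fixed-length windows (10 bp by default, as in the figure), and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def self_dot_plot(seq, window=10):
    """Return (i, j, strand) for each pair of window start positions that match.

    strand is "+" when the two windows are identical (the trivial i == j
    diagonal is skipped) and "-" when the window at j equals the reverse
    complement of the window at i.
    """
    seq = seq.upper()
    starts = range(len(seq) - window + 1)

    # Index every window by its sequence so lookups are constant time.
    index = defaultdict(list)
    for i in starts:
        index[seq[i:i + window]].append(i)

    dots = []
    for i in starts:
        word = seq[i:i + window]
        for j in index.get(word, []):
            if j != i:
                dots.append((i, j, "+"))
        for j in index.get(reverse_complement(word), []):
            dots.append((i, j, "-"))
    return dots


# Toy sequence with one direct repeat of an 8 bp word.
toy = "ACGGTCAT" + "TTTT" + "ACGGTCAT"
print(self_dot_plot(toy, window=8))
```

Dots that form lines parallel to the main diagonal indicate direct repeats, and closely spaced parallel lines indicate tandem repeats. A plot with no such lines supports the conclusion in the note.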

Overview of the GC Content Distribution

[Figure: GC content distribution plot. Window size: 300 bp. Axis label: Sequence 12.]

Summary: Full length (7097 bp) | A (27.19%, 1930) | C (17.88%, 1269) | T (34.34%, 2437) | G (20.59%, 1461)

Note: The sequence of the homology arms and cKO region was analyzed to determine its GC content. No region of significantly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
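The base counts in the summary can be cross-checked, and a windowed GC profile computed, with a short Python sketch. The counts are copied from the summary line. The function names and the step parameter are assumptions made for illustration, and the report does not say how its own profile was computed.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=300, step=1):
    """GC fraction of each full window along the sequence."""
    return [
        gc_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts as reported in the summary line.
counts = {"A": 1930, "C": 1269, "T": 2437, "G": 1461}
total = sum(counts.values())
overall_gc = (counts["G"] + counts["C"]) / total

print(total)                        # matches the reported full length of 7097 bp
print(round(100 * overall_gc, 2))   # overall GC percentage

# Small demonstration of the sliding window on a toy sequence.
print(windowed_gc("GGCCAATT", window=4, step=2))
```

The four counts sum to the reported 7097 bp, and the overall GC content works out to roughly 38.5%, which is consistent with the note that no high-GC region was found.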


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr13  -       8857179    8860178    3000
YourSeq  252    1      2846  3000   95.0%     chr11  -       50007651   50078953   71303
YourSeq  228    16     2847  3000   93.9%     chr5   +       150662638  151011070  348433
YourSeq  217    1      2846  3000   94.3%     chr3   -       122739166  122856295  117130
YourSeq  155    2652   2846  3000   95.3%     chr17  -       11599058   11599320   263
YourSeq  145    2687   2853  3000   91.4%     chr7   +       145263305  145263467  163
YourSeq  144    2692   2866  3000   89.1%     chr9   +       108544255  108544418  164
YourSeq  144    2687   2845  3000   92.9%     chr8   +       70510845   70510997   153
YourSeq  143    2686   2846  3000   92.5%     chr8   -       112756812  112756969  158
YourSeq  143    2687   2846  3000   94.3%     chr1   +       86049912   86050070   159
YourSeq  142    2687   2846  3000   92.4%     chr2   +       102403129  102403285  157
YourSeq  141    2692   2846  3000   93.5%     chr9   +       44432390   44432541   152
YourSeq  141    2692   2846  3000   94.8%     chr15  +       46833840   46833993   154
YourSeq  140    2687   2846  3000   91.8%     chr14  +       54922909   54923065   157
YourSeq  139    2523   2831  3000   86.6%     chr3   -       152259413  152259591  179
YourSeq  139    2692   2846  3000   92.8%     chr15  -       102647402  102647553  152
YourSeq  139    2687   2845  3000   91.7%     chr9   +       108656780  108656935  156
YourSeq  139    2695   2846  3000   95.4%     chr5   +       90266522   90266672   151
YourSeq  139    2692   2846  3000   92.8%     chr12  +       66606165   66606316   152
YourSeq  138    2695   2846  3000   93.3%     chr4   -       33314406   33314554   149

Note: The 3000 bp section upstream of exon 3 was searched against the genome with BLAT. No significant similarity to other genomic regions was found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr13  -       8853582    8856581    3000
YourSeq  69     1      2198  3000   87.3%     chr11  -       76738796   76887942   149147
YourSeq  62     2082   2322  3000   73.0%     chr1   -       127401693  127401896  204
YourSeq  51     2128   2226  3000   75.8%     chr6   +       113790224  113790322  99
YourSeq  47     6      95    3000   94.5%     chr12  +       91420269   91420361   93
YourSeq  45     12     94    3000   86.8%     chr1   -       128301467  128301547  81
YourSeq  45     6      94    3000   96.0%     chr2   +       34105014   34105102   89
YourSeq  44     1      106   3000   81.2%     chr15  +       74919100   74919201   102
YourSeq  42     10     94    3000   75.3%     chr13  -       95530684   95530773   90
YourSeq  42     2143   2200  3000   86.3%     chr11  +       111084767  111084824  58
YourSeq  41     4      86    3000   91.2%     chr14  -       48887855   48887936   82
YourSeq  41     2166   2231  3000   81.6%     chr13  +       30106134   30106199   66
YourSeq  40     60     108   3000   93.5%     chr11  +       102941129  102941178  50
YourSeq  39     1      73    3000   90.3%     chr4   -       93974581   93974651   71
YourSeq  38     15     109   3000   64.0%     chr1   -       136822658  136822744  87
YourSeq  37     1      72    3000   93.2%     chr14  -       33365994   33366298   305
YourSeq  37     8      73    3000   86.1%     chr1   -       136537084  136537147  64
YourSeq  37     60     109   3000   80.0%     chr9   +       69448739   69448784   46
YourSeq  37     2131   2195  3000   78.5%     chr5   +       74968965   74969029   65
YourSeq  35     2185   2313  3000   63.6%     chr11  -       95262829   95262957   129

Note: The 3000 bp section downstream of exon 3 was searched against the genome with BLAT. No significant similarity to other genomic regions was found.
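As a consistency check, the chr13 coordinates of the two 100% self-hits can be compared with the stated size of the cKO region. The Python sketch below assumes that the two 3000 bp query windows directly flank the effective cKO region. The report does not say this explicitly, and the dictionary layout and function name are hypothetical.

```python
# Top self-hits copied from the two BLAT result tables (chr13, minus strand).
upstream_hit = {"chrom": "chr13", "start": 8857179, "end": 8860178}
downstream_hit = {"chrom": "chr13", "start": 8853582, "end": 8856581}


def gap_between(hit_a, hit_b):
    """Number of bases lying strictly between two non-overlapping hits."""
    if hit_a["chrom"] != hit_b["chrom"]:
        raise ValueError("hits are on different chromosomes")
    left, right = sorted((hit_a, hit_b), key=lambda hit: hit["start"])
    if right["start"] <= left["end"]:
        raise ValueError("hits overlap")
    return right["start"] - left["end"] - 1


gap = gap_between(upstream_hit, downstream_hit)
print(gap)
```

Under that assumption, the interval between the two hits is 597 bp, which agrees with the effective cKO region size of ~597 bp given in the strategy notes.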


Gene and information: Wdr37 WD repeat domain 37 [ Mus musculus (house mouse) ] Gene ID: 207615, updated on 12-Aug-2019

Gene summary

Official Symbol: Wdr37 (provided by MGI)
Official Full Name: WD repeat domain 37 (provided by MGI)
Primary source: MGI:MGI:1920393
See related: Ensembl:ENSMUSG00000021147
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Ubiquitous expression in cerebellum adult (RPKM 6.0), whole brain E14.5 (RPKM 5.6) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 13; 13 A1
Exon count: 19

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  13   NC_000079.6 (8802966..8872100, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    13   NC_000079.5 (8802214..8870976, complement)

[Figure: genomic context of Wdr37 on Chromosome 13 - NC_000079.6.]


Transcript information: This gene has 17 transcripts

Gene: Wdr37 ENSMUSG00000021147

Description: WD repeat domain 37 [Source:MGI Symbol;Acc:MGI:1920393]
Gene Synonyms: 3110035P10Rik, 4933417A01Rik
Location: Chromosome 13: 8,802,968-8,871,909 reverse strand. GRCm38:CM001006.2
About this gene: This gene has 17 transcripts (splice variants), 252 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 26 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Wdr37-202  ENSMUST00000054251.12  4626  496aa       ENSMUSP00000062174.6  Protein coding           CCDS26232  Q8CBE3   TSL:5, GENCODE basic, APPRIS P1
Wdr37-201  ENSMUST00000021572.10  4526  496aa       ENSMUSP00000021572.4  Protein coding           CCDS26232  Q8CBE3   TSL:1, GENCODE basic, APPRIS P1
Wdr37-212  ENSMUST00000176922.7   3780  170aa       ENSMUSP00000135742.1  Protein coding           CCDS49204  Q05BF4   TSL:1, GENCODE basic
Wdr37-203  ENSMUST00000164183.8   1780  170aa       ENSMUSP00000131469.2  Protein coding           CCDS49204  Q05BF4   TSL:1, GENCODE basic
Wdr37-211  ENSMUST00000176813.7   833   179aa       ENSMUSP00000135097.1  Protein coding           -          H3BJR7   CDS 3' incomplete, TSL:5
Wdr37-209  ENSMUST00000176587.1   699   133aa       ENSMUSP00000135271.1  Protein coding           -          H3BK66   CDS 5' incomplete, TSL:3
Wdr37-207  ENSMUST00000176329.7   697   185aa       ENSMUSP00000135101.1  Protein coding           -          H3BJS1   CDS 3' incomplete, TSL:5
Wdr37-208  ENSMUST00000176429.7   580   193aa       ENSMUSP00000134916.1  Protein coding           -          H3BJB3   CDS 5' and 3' incomplete, TSL:5
Wdr37-205  ENSMUST00000175958.1   402   49aa        ENSMUSP00000135182.1  Protein coding           -          H3BJZ0   CDS 3' incomplete, TSL:5
Wdr37-214  ENSMUST00000177404.1   348   30aa        ENSMUSP00000135785.1  Protein coding           -          H3BLH1   CDS 3' incomplete, TSL:3
Wdr37-206  ENSMUST00000176098.7   611   78aa        ENSMUSP00000135094.1  Nonsense mediated decay  -          H3BJR4   CDS 5' incomplete, TSL:5
Wdr37-216  ENSMUST00000177537.7   527   66aa        ENSMUSP00000135577.1  Nonsense mediated decay  -          H3BKY3   CDS 5' incomplete, TSL:5
Wdr37-210  ENSMUST00000176715.1   426   57aa        ENSMUSP00000134987.1  Nonsense mediated decay  -          H3BJH4   TSL:5
Wdr37-217  ENSMUST00000221401.1   2252  No protein  -                     Retained intron          -          -        TSL:NA
Wdr37-215  ENSMUST00000177409.1   502   No protein  -                     Retained intron          -          -        TSL:3
Wdr37-213  ENSMUST00000177112.7   728   No protein  -                     lncRNA                   -          -        TSL:5
Wdr37-204  ENSMUST00000175687.1   338   No protein  -                     lncRNA                   -          -        TSL:5


[Figure: Ensembl genome browser view of the Wdr37 locus, 88.94 kb, spanning 8.80 Mb to 8.88 Mb, with forward and reverse strands. Tracks: contigs (AC132433.3, AC124732.5); genes (Comprehensive set), showing Gm48374-201 (TEC) on the forward strand and, on the reverse strand, Gm48327-201 (TEC), Gm48375-201 (TEC) and the Wdr37 transcripts Wdr37-201, -202, -203, -205, -207, -208, -209, -211, -212 and -214 (protein coding), Wdr37-206, -210 and -216 (nonsense mediated decay), Wdr37-215 and -217 (retained intron), and Wdr37-204 and -213 (lncRNA); Regulatory Build.

Regulation legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site.

Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000054251

[Figure: protein domain and variant view of Wdr37-202 (protein coding; reverse strand, 68.76 kb), protein ENSMUSP00000062..., scale bar 0 to 496. Annotation tracks:

MobiDB lite
Coiled-coils (Ncoils)
Superfamily: WD40-repeat-containing domain superfamily
SMART: WD40 repeat
Prints: G-protein beta WD-40 repeat
Pfam: WD40 repeat
PROSITE profiles: WD40-repeat-containing domain; WD40 repeat
PROSITE patterns: WD40 repeat, conserved site
PANTHER: PTHR19855; PTHR19855:SF12
Gene3D: WD40/YVTN repeat-like-containing domain superfamily
CDD: cd00200
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources)

Variant legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
