https://www.alphaknockout.com

Mouse Wdr37 Knockout Project (CRISPR/Cas9)

Objective: To create a Wdr37 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Wdr37 gene (NCBI Reference Sequence: NM_001039388; Ensembl: ENSMUSG00000021147) is located on mouse chromosome 13. Fourteen exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 14 (Transcript: ENSMUST00000054251). Exons 2 to 5 are selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: The coding region starts in exon 2. Exons 2 to 5 cover 26.61% of the coding region. The effective KO region is ~7596 bp. The KO region does not overlap any other known gene.
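The 26.61% figure can be cross-checked with a little arithmetic. The sketch below assumes (the report does not say how it counts) that the coding region is the 496 codons of ENSMUST00000054251 without the TAA stop codon, i.e. 496 × 3 = 1,488 bp; the constant names are illustrative only.

```python
# Rough cross-check of "Exons 2 to 5 cover 26.61% of the coding region".
# ASSUMPTION: the coding region is counted as 496 codons x 3 bp, stop codon excluded.
PROTEIN_LENGTH_AA = 496      # ENSMUST00000054251 encodes a 496 aa protein
REPORTED_FRACTION = 0.2661   # fraction of the coding region in exons 2 to 5

cds_bp = PROTEIN_LENGTH_AA * 3                   # coding bases without the stop codon
targeted_bp = round(REPORTED_FRACTION * cds_bp)  # nearest whole number of coding bases
back_calculated = targeted_bp / cds_bp           # fraction implied by that integer

print(f"CDS length (stop excluded): {cds_bp} bp")
print(f"Coding bases in exons 2 to 5: ~{targeted_bp} bp ({targeted_bp // 3} codons)")
print(f"Back-calculated fraction: {back_calculated:.2%}")
```

Under that assumption, exons 2 to 5 contribute about 396 coding bases (132 codons), and 396/1,488 rounds back to 26.61%, so the reported percentage is consistent with a stop-codon-excluded count.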


Overview of the Targeting Strategy

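[Figure: the wildtype Wdr37 allele (5' to 3') showing exons 1, 2, 3, 4, 5 and 14, with a gRNA region on each side of the knockout region spanning exons 2 to 5. Legend: exon of mouse Wdr37; knockout region.]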


Overview of the Dot Plot (up) Window size: 15 bp

[Dot plot: the upstream sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of exon 2 is aligned against itself to check for tandem repeats. Tandem repeats are present in the dot plot matrix, so the gRNA site is selected outside of them.
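The self-alignment check can be sketched as an exact-match dot plot. This is a minimal illustration, not the tool used in the report: it assumes a dot is recorded only when all 15 bases of a window match (many dot-plot programs instead tolerate a set number of mismatches per window), and the function names and toy sequence are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Exact-match dot plot of `seq` against itself.

    A dot (i, j) is recorded when the window-bp word starting at i equals the
    word starting at j (forward list), or equals its reverse complement
    (revcomp list). Positions are 0-based word starts; the trivial main
    diagonal i == j is dropped from the forward list.
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    # Index every word by its start positions so only real matches are visited.
    starts: Dict[str, List[int]] = {}
    for j in range(n_words):
        starts.setdefault(seq[j:j + window], []).append(j)

    forward, revcomp = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        forward.extend((i, j) for j in starts[word] if j != i)
        revcomp.extend((i, j) for j in starts.get(reverse_complement(word), []))
    return forward, revcomp


def diagonal_offsets(forward: List[Tuple[int, int]]) -> Counter:
    """Count forward dots per diagonal offset j - i (upper triangle only)."""
    return Counter(j - i for i, j in forward if j > i)


if __name__ == "__main__":
    unit = "GATTACAGC"                  # hypothetical 9 bp repeat unit (toy data)
    demo = "CCCC" + unit * 5 + "TTTT"   # toy sequence, not the Wdr37 flank
    fwd, rev = self_dot_plot(demo, window=15)
    print(diagonal_offsets(fwd).most_common(3))
```

In a tandem repeat with a unit of u bp, matching windows line up along the diagonal offset u from the main one, so the most populated offset in the toy example is the 9 bp unit length.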

Overview of the Dot Plot (down) Window size: 15 bp

[Dot plot: the downstream sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of exon 5 is aligned against itself to check for tandem repeats. Tandem repeats are present in the dot plot matrix, so the gRNA site is selected outside of them.
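Choosing a gRNA site outside the repeats amounts to scanning both strands for candidate protospacers and discarding any that overlap a masked interval. The sketch assumes the standard Streptococcus pyogenes Cas9 target (a 20 nt protospacer followed by an NGG PAM), which the report does not state, and it performs no off-target scoring; `spcas9_sites` and the `excluded` intervals are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Lookahead so overlapping candidates are all reported: 20 nt protospacer + NGG PAM.
_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spcas9_sites(seq: str, excluded: Iterable[Tuple[int, int]] = ()) -> List[Tuple[int, str, str]]:
    """List (start, strand, protospacer) for NGG-PAM sites on both strands.

    `start` is the 0-based forward-strand position of the protospacer's
    leftmost base. Sites whose 20 nt protospacer overlaps any half-open
    (begin, end) interval in `excluded` (e.g. tandem repeats) are dropped.
    """
    seq = seq.upper()
    n = len(seq)
    blocked = list(excluded)
    hits = [(m.start(), "+", m.group(1)) for m in _SITE.finditer(seq)]
    for m in _SITE.finditer(reverse_complement(seq)):
        # rc[k:k+20] maps back to forward positions [n-k-20, n-k).
        hits.append((n - m.start() - 20, "-", m.group(1)))
    return [h for h in hits
            if not any(h[0] < end and h[0] + 20 > begin for begin, end in blocked)]
```

Minus-strand protospacers are reported in guide orientation (5' to 3' on the minus strand), with `start` translated back to forward-strand coordinates so the repeat mask can be expressed once.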


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full length 2000 bp | A: 659 (32.95%) | C: 360 (18.0%) | T: 581 (29.05%) | G: 400 (20.0%) | G+C: 760 (38.0%)

Note: The 2000 bp section upstream of exon 2 is analyzed for GC content. No region of notably high GC content is found, so this region is suitable for PCR screening and sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full length 2000 bp | A: 580 (29.0%) | C: 287 (14.35%) | T: 729 (36.45%) | G: 404 (20.2%) | G+C: 691 (34.55%)

Note: The 2000 bp section downstream of exon 5 is analyzed for GC content. No region of notably high GC content is found, so this region is suitable for PCR screening and sequencing analysis.
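The GC summaries above can be reproduced from the base counts, and a 300 bp distribution is a sliding-window average. This sketch assumes a step of 1 bp and uses a hypothetical 0.6 cutoff for flagging "high GC" windows; the report states neither its step size nor its threshold.

```python
from typing import Dict, List


def gc_from_counts(counts: Dict[str, int]) -> float:
    """GC fraction from per-base counts."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300) -> List[float]:
    """GC fraction of every window-bp window (step 1), using a running sum."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    current = sum(is_gc[:window])
    values = [current / window]
    for start in range(1, len(seq) - window + 1):
        # Slide right by one: add the entering base, drop the leaving base.
        current += is_gc[start + window - 1] - is_gc[start - 1]
        values.append(current / window)
    return values


def high_gc_windows(values: List[float], threshold: float = 0.6) -> List[int]:
    """Start positions of windows above a GC threshold (0.6 is a hypothetical cutoff)."""
    return [i for i, v in enumerate(values) if v > threshold]


# Base counts copied from the two "Summary" lines above.
UPSTREAM = {"A": 659, "C": 360, "T": 581, "G": 400}
DOWNSTREAM = {"A": 580, "C": 287, "T": 729, "G": 404}

if __name__ == "__main__":
    print(f"upstream GC:   {gc_from_counts(UPSTREAM):.2%}")    # 38.00%
    print(f"downstream GC: {gc_from_counts(DOWNSTREAM):.2%}")  # 34.55%
```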


BLAT Search Results (up)

(The first START/END pair gives positions in the query; the second gives genomic coordinates.)

QUERY    SCORE  START  END    QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  2000   1      2000   2000   100.0%    chr13  -       8861226    8863225    2000
YourSeq  294    657    1666   2000   94.3%     chr11  +       85939357   86547150   607794
YourSeq  277    658    1666   2000   94.9%     chr11  -       109813265  110340344  527080
YourSeq  276    662    1666   2000   95.5%     chr9   +       54788410   55510740   722331
YourSeq  238    658    1666   2000   92.2%     chr1   -       153865481  154069491  204011
YourSeq  161    657    839    2000   91.9%     chr4   +       108673918  108674088  171
YourSeq  158    191    814    2000   90.7%     chr4   +       129669917  129670508  592
YourSeq  153    645    824    2000   92.1%     chr12  +       105219874  105220052  179
YourSeq  152    659    824    2000   95.8%     chrX   -       52196393   52196558   166
YourSeq  152    655    824    2000   94.8%     chr10  -       26792958   26793127   170
YourSeq  151    658    824    2000   95.3%     chr5   -       98959588   98959754   167
YourSeq  151    659    1035   2000   84.0%     chr11  +       111903265  111903444  180
YourSeq  150    1422   1666   2000   94.7%     chr7   -       104052011  104052639  629
YourSeq  150    655    1033   2000   84.7%     chr12  +       69688135   69688303   169
YourSeq  149    661    824    2000   95.8%     chr7   -       127951646  127951811  166
YourSeq  149    658    824    2000   93.4%     chr8   +       112094702  112094867  166
YourSeq  149    657    821    2000   95.2%     chr1   +       183167247  183167411  165
YourSeq  148    657    820    2000   95.2%     chr11  -       61753727   61753890   164
YourSeq  148    661    824    2000   95.2%     chr3   +       72456391   72456554   164
YourSeq  147    663    823    2000   95.7%     chr1   +       133233608  133233768  161

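Note: The 2000 bp section upstream of exon 2 is BLAT-searched against the genome. Apart from the expected full-length self-match on chr13 (score 2000, 100% identity), all hits are partial alignments (highest score 294), so no significant similarity elsewhere in the genome is found.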

BLAT Search Results (down)

QUERY    SCORE  START  END    QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  2000   1      2000   2000   100.0%    chr13  -       8851670    8853669    2000
YourSeq  84     1844   1985   2000   78.1%     chr12  -       105204856  105204996  141
YourSeq  84     1824   1973   2000   76.1%     chr11  -       101205639  101205781  143
YourSeq  84     1826   1969   2000   75.8%     chr5   +       150746382  150746518  137
YourSeq  82     1834   1968   2000   81.1%     chr7   -       79939002   79939139   138
YourSeq  82     1842   1964   2000   81.7%     chr1   +       59754498   59754618   121
YourSeq  81     1830   1968   2000   80.5%     chrX   -       155352871  155353016  146
YourSeq  81     1824   1984   2000   75.7%     chr10  +       111547057  111547215  159
YourSeq  80     1826   1967   2000   76.1%     chr14  -       33074666   33074804   139
YourSeq  80     1824   1969   2000   77.4%     chr12  +       12831141   12831281   141
YourSeq  76     1849   1969   2000   82.8%     chr11  -       102243917  102244195  279
YourSeq  75     1826   1964   2000   79.3%     chr5   +       87583345   87583478   134
YourSeq  74     1851   1968   2000   81.4%     chr4   -       108673936  108674053  118
YourSeq  74     1830   1973   2000   73.5%     chr11  -       85939371   85939507   137
YourSeq  73     1826   1978   2000   75.0%     chr10  -       95260201   95260341   141
YourSeq  73     1628   1969   2000   74.8%     chr17  +       15545588   15545772   185
YourSeq  72     1824   1970   2000   75.3%     chr7   -       122011834  122011954  121
YourSeq  70     1842   1969   2000   75.9%     chr12  -       110996035  110996160  126
YourSeq  68     1851   1968   2000   78.9%     chr15  -       46725203   46725320   118
YourSeq  68     1852   1969   2000   78.9%     chr11  -       78484974   78485091   118

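Note: The 2000 bp section downstream of exon 5 is BLAT-searched against the genome. Apart from the expected full-length self-match on chr13 (score 2000, 100% identity), the remaining hits are short alignments near the end of the query (positions 1628 to 1985, highest score 84), so no significant similarity elsewhere in the genome is found.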
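One way to make "no significant similarity" concrete is to set the self-match aside and compare the best remaining BLAT score with the query size. The sketch assumes that criterion (the report does not define one) and parses rows in the column order of the BLAT tables; `BlatHit`, `parse_hits` and `split_self_hit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlatHit:
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hits(text: str) -> List[BlatHit]:
    """Parse rows laid out as QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    hits = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 11 or f[0] == "QUERY":
            continue  # skip blank and header lines
        hits.append(BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                            float(f[5].rstrip("%")), f[6], f[7],
                            int(f[8]), int(f[9]), int(f[10])))
    return hits


def split_self_hit(hits: List[BlatHit]) -> Tuple[List[BlatHit], Optional[BlatHit]]:
    """Separate full-length 100% identity self-matches from the rest and
    return (self_hits, best remaining hit by score)."""
    self_hits, others = [], []
    for h in hits:
        full_length = h.q_start == 1 and h.q_end == h.q_size
        (self_hits if full_length and h.identity == 100.0 else others).append(h)
    best = max(others, key=lambda h: h.score, default=None)
    return self_hits, best


# First three rows of the upstream BLAT table.
UP_ROWS = """
QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN
YourSeq 2000 1 2000 2000 100.0% chr13 - 8861226 8863225 2000
YourSeq 294 657 1666 2000 94.3% chr11 + 85939357 86547150 607794
YourSeq 277 658 1666 2000 94.9% chr11 - 109813265 110340344 527080
"""

if __name__ == "__main__":
    self_hits, best = split_self_hit(parse_hits(UP_ROWS))
    print(self_hits[0].chrom, self_hits[0].t_start, self_hits[0].t_end)
    print(f"best off-target score: {best.score} of {best.q_size} ({best.score / best.q_size:.1%})")
```

By this measure, the best off-target hit for the upstream section (score 294) is under 15% of the 2,000 bp query, and for the downstream section it is 84 (4.2%).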


Gene information: Wdr37, WD repeat domain 37 [Mus musculus (house mouse)], Gene ID: 207615, updated on 12-Aug-2019

Gene summary

Official Symbol: Wdr37 (provided by MGI)
Official Full Name: WD repeat domain 37 (provided by MGI)
Primary source: MGI:MGI:1920393
See related: Ensembl:ENSMUSG00000021147
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Expression: Ubiquitous expression in cerebellum adult (RPKM 6.0), whole brain E14.5 (RPKM 5.6) and 28 other tissues
Orthologs: human; all

Genomic context

Location: chromosome 13; band 13 A1

Exon count: 19

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 13 | NC_000079.6 (8802966..8872100, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 13 | NC_000079.5 (8802214..8870976, complement)
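These coordinates are 1-based and inclusive, so a span is end - start + 1 (the BLAT SPAN column follows the same convention: 8863225 - 8861226 + 1 = 2000). The sketch below computes the gene spans and checks that both 2000 bp flanks from the BLAT self-matches fall inside the annotated gene. It assumes the BLAT coordinates are on GRCm38 (mm10), which the report does not state, and writes the RefSeq chromosome NC_000079.6 as chr13; `Interval` is a hypothetical helper.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Genomic interval in 1-based, fully closed coordinates."""
    chrom: str
    start: int
    end: int

    def length(self) -> int:
        # Both ends are included, hence the +1.
        return self.end - self.start + 1

    def contains(self, other: "Interval") -> bool:
        return self.chrom == other.chrom and self.start <= other.start and other.end <= self.end


gene_grcm38 = Interval("chr13", 8802966, 8872100)   # annotation release 108, GRCm38.p6
gene_mgscv37 = Interval("chr13", 8802214, 8870976)  # Build 37.2, MGSCv37
up_flank = Interval("chr13", 8861226, 8863225)      # BLAT self-match of the upstream 2000 bp
down_flank = Interval("chr13", 8851670, 8853669)    # BLAT self-match of the downstream 2000 bp

if __name__ == "__main__":
    print(gene_grcm38.length(), gene_mgscv37.length())                       # 69135 68763
    print(gene_grcm38.contains(up_flank), gene_grcm38.contains(down_flank))  # True True
    # Wdr37 is on the reverse strand, so the 5' (upstream) flank sits at higher coordinates.
    print(up_flank.start > down_flank.end)                                   # True
```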

Chromosome 13 - NC_000079.6


Transcript information: This gene has 17 transcripts

Gene: Wdr37 ENSMUSG00000021147

Description: WD repeat domain 37 [Source:MGI Symbol;Acc:MGI:1920393]
Gene Synonyms: 3110035P10Rik, 4933417A01Rik
Location: Chromosome 13: 8,802,968-8,871,909 reverse strand (GRCm38:CM001006.2)
About this gene: This gene has 17 transcripts (splice variants), 252 orthologues and 1 paralogue, is a member of 1 Ensembl protein family, and is associated with 26 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Wdr37-202 | ENSMUST00000054251.12 | 4626 | 496 aa | ENSMUSP00000062174.6 | Protein coding | CCDS26232 | Q8CBE3 | TSL:5, GENCODE basic, APPRIS P1
Wdr37-201 | ENSMUST00000021572.10 | 4526 | 496 aa | ENSMUSP00000021572.4 | Protein coding | CCDS26232 | Q8CBE3 | TSL:1, GENCODE basic, APPRIS P1
Wdr37-212 | ENSMUST00000176922.7 | 3780 | 170 aa | ENSMUSP00000135742.1 | Protein coding | CCDS49204 | Q05BF4 | TSL:1, GENCODE basic
Wdr37-203 | ENSMUST00000164183.8 | 1780 | 170 aa | ENSMUSP00000131469.2 | Protein coding | CCDS49204 | Q05BF4 | TSL:1, GENCODE basic
Wdr37-211 | ENSMUST00000176813.7 | 833 | 179 aa | ENSMUSP00000135097.1 | Protein coding | - | H3BJR7 | CDS 3' incomplete, TSL:5
Wdr37-209 | ENSMUST00000176587.1 | 699 | 133 aa | ENSMUSP00000135271.1 | Protein coding | - | H3BK66 | CDS 5' incomplete, TSL:3
Wdr37-207 | ENSMUST00000176329.7 | 697 | 185 aa | ENSMUSP00000135101.1 | Protein coding | - | H3BJS1 | CDS 3' incomplete, TSL:5
Wdr37-208 | ENSMUST00000176429.7 | 580 | 193 aa | ENSMUSP00000134916.1 | Protein coding | - | H3BJB3 | CDS 5' and 3' incomplete, TSL:5
Wdr37-205 | ENSMUST00000175958.1 | 402 | 49 aa | ENSMUSP00000135182.1 | Protein coding | - | H3BJZ0 | CDS 3' incomplete, TSL:5
Wdr37-214 | ENSMUST00000177404.1 | 348 | 30 aa | ENSMUSP00000135785.1 | Protein coding | - | H3BLH1 | CDS 3' incomplete, TSL:3
Wdr37-206 | ENSMUST00000176098.7 | 611 | 78 aa | ENSMUSP00000135094.1 | Nonsense mediated decay | - | H3BJR4 | CDS 5' incomplete, TSL:5
Wdr37-216 | ENSMUST00000177537.7 | 527 | 66 aa | ENSMUSP00000135577.1 | Nonsense mediated decay | - | H3BKY3 | CDS 5' incomplete, TSL:5
Wdr37-210 | ENSMUST00000176715.1 | 426 | 57 aa | ENSMUSP00000134987.1 | Nonsense mediated decay | - | H3BJH4 | TSL:5
Wdr37-217 | ENSMUST00000221401.1 | 2252 | No protein | - | Retained intron | - | - | TSL:NA
Wdr37-215 | ENSMUST00000177409.1 | 502 | No protein | - | Retained intron | - | - | TSL:3
Wdr37-213 | ENSMUST00000177112.7 | 728 | No protein | - | lncRNA | - | - | TSL:5
Wdr37-204 | ENSMUST00000175687.1 | 338 | No protein | - | lncRNA | - | - | TSL:5


[Figure: Ensembl genome browser view of the Wdr37 region on chromosome 13 (88.94 kb, about 8.80 to 8.88 Mb). Tracks: contigs AC132433.3 and AC124732.5; genes (comprehensive set), showing Gm48374-201 (TEC) on the forward strand and, on the reverse strand, Gm48327-201 (TEC), Gm48375-201 (TEC) and the 17 Wdr37 transcripts (Wdr37-201 to Wdr37-217); Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000054251 (Wdr37-202, protein coding; reverse strand, 68.76 kb)

[Figure: domain and variant annotation of the encoded 496 aa protein (scale 0 to 496 aa). Tracks: MobiDB lite; Coiled-coils (Ncoils); Superfamily: WD40-repeat-containing domain superfamily; SMART: WD40 repeat; Prints: G-protein beta WD-40 repeat; Pfam: WD40 repeat; PROSITE profiles: WD40-repeat-containing domain, WD40 repeat; PROSITE patterns: WD40 repeat, conserved site; PANTHER: PTHR19855, PTHR19855:SF12; Gene3D: WD40/YVTN repeat-like-containing domain superfamily; CDD: cd00200; sequence variants (dbSNP and all other sources). Variant legend: synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
