https://www.alphaknockout.com

Mouse Nvl Knockout Project (CRISPR/Cas9)

Objective: To create an Nvl knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Nvl gene (NCBI Reference Sequence: NM_026171; Ensembl: ENSMUSG00000026516) is located on mouse chromosome 1. 23 exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 23 (Transcript: ENSMUST00000027797). Exons 3~9 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from about 5.15% of the coding region. Exons 3~9 cover 32.2% of the coding region. The size of the effective KO region is ~9798 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

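[Figure: Wildtype allele drawn 5' to 3', showing exons 1 through 23 of mouse Nvl (exons 1, 3, 4, 5, 6, 7, 8, 9 and 23 are labeled), with one gRNA region on each side of the knockout region. Legend: exon of mouse Nvl; knockout region.]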


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot of the upstream section against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
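The self-alignment described in the note can be illustrated with a minimal sketch. It assumes exact matching of 15 bp words, which may differ from the scoring actually used to draw the dot plots in this report; the function names `revcomp` and `self_dot_matches` are hypothetical.

```python
from collections import defaultdict


def revcomp(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def self_dot_matches(seq, window=15):
    """Compare a sequence with itself using exact word matches.

    Returns two lists of (i, j) start positions:
    forward - words at i and j are identical (main diagonal excluded)
    reverse - the word at j equals the reverse complement of the word at i
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    index = defaultdict(list)
    for i in range(n_words):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_words):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


# Toy example with a short window: "ACGT" occurs twice, so two
# off-diagonal forward dots appear.
fwd, rev = self_dot_matches("ACGTACGT", window=4)
print(fwd)
```

In a plot of these points, a tandem repeat would appear as runs of forward dots parallel to the main diagonal; an empty or sparse forward list is consistent with the "no significant tandem repeat" reading.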

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot of the downstream section against itself. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of Exon 9 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(2000bp) | A(31.9% 638) | C(21.1% 422) | T(26.55% 531) | G(20.45% 409)

Note: The 2000 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(2000bp) | A(26.35% 527) | C(20.8% 416) | T(33.5% 670) | G(19.35% 387)

Note: The 2000 bp section downstream of Exon 9 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
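As a quick check of the two summaries, the overall GC content follows directly from the base counts: (422 + 409) / 2000 = 41.55% upstream and (416 + 387) / 2000 = 40.15% downstream. The sketch below reproduces that arithmetic and shows one way a 300 bp sliding-window profile could be computed; the step size of 1 bp and the function names are assumptions, not details taken from the report.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def windowed_gc(seq, window=300):
    """GC percentage of every window of the given size (step of 1 bp, assumed)."""
    seq = seq.upper()
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Base counts copied from the two Summary lines.
upstream = {"A": 638, "C": 422, "T": 531, "G": 409}
downstream = {"A": 527, "C": 416, "T": 670, "G": 387}

print(round(gc_percent(upstream), 2))
print(round(gc_percent(downstream), 2))
```

A high-GC region would show up as a stretch of windows well above these overall values.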


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr1   -       181139307  181141306  2000
YourSeq  669    1      822   2000   91.4%     chr3   -       57889930   57890762   833
YourSeq  667    1      818   2000   91.4%     chr17  +       86848841   86849656   816
YourSeq  655    1      822   2000   90.3%     chr4   +       99787548   99788375   828
YourSeq  652    1      822   2000   89.9%     chr2   -       29609772   29610595   824
YourSeq  648    1      822   2000   89.8%     chrX   -       71722768   71723583   816
YourSeq  647    1      822   2000   90.3%     chr14  -       46855948   46856767   820
YourSeq  646    1      822   2000   90.4%     chr7   +       97942293   97943115   823
YourSeq  645    1      822   2000   91.2%     chr5   +       29299558   29300380   823
YourSeq  644    1      822   2000   90.4%     chr16  +       13346037   13346863   827
YourSeq  642    19     822   2000   91.1%     chr1   -       167525767  167526574  808
YourSeq  641    1      822   2000   89.7%     chr19  -       42642477   42643299   823
YourSeq  640    1      822   2000   89.8%     chr18  -       77111213   77112034   822
YourSeq  640    1      822   2000   90.0%     chr13  +       76015383   76016206   824
YourSeq  639    1      821   2000   90.0%     chr16  -       7415243    7416065    823
YourSeq  639    1      822   2000   90.3%     chr17  +       53408520   53409343   824
YourSeq  638    1      822   2000   89.6%     chr16  +       36458505   36459328   824
YourSeq  637    1      822   2000   90.5%     chr18  +       12398908   12399706   799
YourSeq  635    1      822   2000   90.0%     chr6   -       149371790  149372613  824
YourSeq  635    1      822   2000   90.5%     chr11  +       20761080   20761902   823

Note: The 2000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
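To make the columns of the BLAT listing easier to inspect, the following sketch parses one result row and reports how much of the 2000 bp query the hit covers. The `BlatHit` type and both helper functions are hypothetical, and the report does not state what threshold it uses to call a hit "significant", so no cutoff is applied here.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int   # start of the match within the query
    q_end: int     # end of the match within the query
    q_size: int    # full query length
    identity: float
    chrom: str
    strand: str
    t_start: int   # start of the match on the chromosome
    t_end: int     # end of the match on the chromosome
    span: int


def parse_hit(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row."""
    fields = row.split()
    # Rows copied from the web view may start with the link text "browser details".
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    _, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))


def query_coverage(hit: BlatHit) -> float:
    """Percentage of the query covered by the hit."""
    return 100.0 * (hit.q_end - hit.q_start + 1) / hit.q_size


# The chr3 row from the upstream results.
hit = parse_hit("YourSeq 669 1 822 2000 91.4% chr3 - 57889930 57890762 833")
print(hit.chrom, hit.identity, round(query_coverage(hit), 1))
```

For the chr3 row, the hit covers positions 1 to 822 of the query, that is 822 of 2000 bp.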

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr1   -       181127509  181129508  2000
YourSeq  217    1195   1482  2000   90.1%     chr5   -       25585161   25585783   623
YourSeq  178    23     442   2000   85.8%     chr8   -       116361525  116361730  206
YourSeq  173    262    543   2000   92.6%     chr2   +       112711854  112712359  506
YourSeq  170    263    472   2000   92.5%     chr2   -       33524802   33525022   221
YourSeq  169    263    448   2000   95.7%     chrX   +       151131834  151132020  187
YourSeq  169    258    444   2000   95.2%     chr10  +       117713891  117714077  187
YourSeq  168    254    445   2000   95.7%     chr2   -       169975594  169976005  412
YourSeq  167    260    447   2000   94.7%     chr8   +       57389708   57389899   192
YourSeq  166    258    447   2000   94.7%     chr10  -       119514066  119514255  190
YourSeq  166    113    446   2000   90.2%     chr18  +       17073570   17074038   469
YourSeq  166    255    446   2000   93.3%     chr12  +       83937234   83937425   192
YourSeq  166    257    446   2000   93.7%     chr11  +       106554240  106554429  190
YourSeq  166    255    447   2000   93.3%     chr11  +       86550694   86550887   194
YourSeq  164    173    445   2000   92.3%     chrX   +       159287841  159288497  657
YourSeq  163    266    444   2000   96.6%     chrX   -       92491645   92491827   183
YourSeq  163    257    445   2000   92.5%     chr4   -       126062112  126062298  187
YourSeq  163    257    446   2000   94.1%     chr2   -       38850961   38851150   190
YourSeq  163    261    444   2000   93.9%     chr19  +       5783560    5783741    182
YourSeq  163    255    445   2000   91.6%     chr12  +       59167268   59167457   190

Note: The 2000 bp section downstream of Exon 9 is BLAT searched against the genome. No significant similarity is found.
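The chr1 self-hits of the two BLAT searches also give a way to cross-check the stated KO region size. The sketch below assumes that the upstream flank ends immediately before exon 3 and the downstream flank begins immediately after exon 9; the report does not say this explicitly. Because Nvl lies on the reverse strand, the upstream flank has the higher chromosome coordinates.

```python
# chr1 coordinates of the 100% identity hits from the two BLAT searches.
upstream_flank = (181139307, 181141306)    # 2000 bp upstream of exon 3
downstream_flank = (181127509, 181129508)  # 2000 bp downstream of exon 9


def interval_length(interval):
    """Length of a closed coordinate interval."""
    start, end = interval
    return end - start + 1


def gap_between(lower, upper):
    """Number of bases strictly between two non-overlapping closed intervals."""
    return upper[0] - lower[1] - 1


print(interval_length(upstream_flank), interval_length(downstream_flank))
print(gap_between(downstream_flank, upstream_flank))
```

Under that assumption the region between the two flanks is 181139307 - 181129508 - 1 = 9798 bp, which agrees with the ~9798 bp effective KO region given in the strategy summary.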


Gene and information: Nvl nuclear VCP-like [ Mus musculus (house mouse) ] Gene ID: 67459, updated on 12-Aug-2019

Gene summary

Official Symbol: Nvl (provided by MGI)
Official Full Name: nuclear VCP-like (provided by MGI)
Primary source: MGI:MGI:1914709
See related: Ensembl:ENSMUSG00000026516
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 1200009I24Rik
Expression: Ubiquitous expression in CNS E18 (RPKM 9.1), CNS E14 (RPKM 8.2) and 28 other tissues
Orthologs: human

Genomic context

Location: 1; 1 H4
Exon count: 24

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (181087138..181144214, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NC_000067.5 (183023554..183074288, complement)

Chromosome 1 - NC_000067.6


Transcript information: This gene has 5 transcripts

Gene: Nvl ENSMUSG00000026516

Description: nuclear VCP-like [Source:MGI Symbol;Acc:MGI:1914709]
Gene Synonyms: 1200009I24Rik
Location: 181,087,138-181,144,204 reverse strand. GRCm38:CM000994.2
About this gene: This gene has 5 transcripts (splice variants), 204 orthologues, 5 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name     Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Nvl-201  ENSMUST00000027797.8  9392  855aa       ENSMUSP00000027797.7  Protein coding   CCDS15581  Q9DBY8   TSL:1 GENCODE basic APPRIS P1
Nvl-204  ENSMUST00000193758.1  2529  No protein  -                     Retained intron  -          -        TSL:NA
Nvl-203  ENSMUST00000191728.1  1502  No protein  -                     Retained intron  -          -        TSL:1
Nvl-205  ENSMUST00000195209.1  1193  No protein  -                     Retained intron  -          -        TSL:2
Nvl-202  ENSMUST00000191721.1  603   No protein  -                     lncRNA           -          -        TSL:2

[Figure: Ensembl view of the Nvl locus, 77.07 kb, from 181.08 Mb to 181.14 Mb. Forward strand: Fgfr3-ps-201 (processed pseudogene); Cnih4-203, Cnih4-204 and Cnih4-205 (protein coding). Contig: AC119911.10. Reverse strand: Nvl-201 (protein coding), Nvl-202 (lncRNA), Nvl-203, Nvl-204 and Nvl-205 (retained intron). Regulatory Build legend: CTCF; open chromatin; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript; pseudogene).]


Transcript: ENSMUST00000027797

[Figure: Nvl-201 (protein coding), reverse strand, 57.07 kb, with protein feature tracks for ENSMUSP00000027... drawn on a scale bar from 0 to 855. Tracks: PDB-ENSP mappings; MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily: P-loop containing nucleoside triphosphate hydrolase; SMART: AAA+ ATPase domain; Pfam: NVL2, nucleolin binding domain; ATPase, AAA-type, core; AAA ATPase, AAA+ lid domain; PROSITE patterns: ATPase, AAA-type, conserved site; PANTHER: PTHR23077:SF55, PTHR23077; Gene3D: NVL2, N-terminal domain superfamily; 3.40.50.300; 1.10.8.60; CDD: cd00009; sequence variants (dbSNP and all other sources). Variant legend: missense variant; splice region variant; synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
