https://www.alphaknockout.com

Mouse Nckipsd Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Nckipsd conditional knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Nckipsd gene (NCBI Reference Sequence: NM_030729; Ensembl: ENSMUSG00000032598) is located on mouse chromosome 9. Thirteen exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 13 (Transcript: ENSMUST00000035218). Exons 3~5 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Nckipsd gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP24-372F13 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null mutation exhibit altered composition of postsynaptic densities and actin cytoskeleton in hippocampal neurons.

Exon 3 starts at about 13.17% of the coding region. Knockout of exons 3~5 will result in a frameshift of the gene. The size of intron 2 for 5'-loxP site insertion is 393 bp, and the size of intron 5 for 3'-loxP site insertion is 1239 bp. The size of the effective cKO region is ~1607 bp. The cKO region does not contain any other known gene.
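The two quantitative statements above can be illustrated with a minimal Python sketch. It assumes the coding length is taken as 3 x the 714 aa protein length reported for ENSMUST00000035218 (stop codon not counted), and it applies the general rule that a deletion shifts the reading frame when the deleted coding length is not a multiple of 3. The function names are hypothetical, and the deletion lengths in the example are placeholders because the report does not list individual exon sizes.

```python
def coding_offset_nt(fraction: float, protein_aa: int) -> int:
    """Approximate coding position (nt) reached at a given fraction of the CDS."""
    cds_nt = protein_aa * 3  # assumption: CDS length counted without the stop codon
    return round(fraction * cds_nt)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


if __name__ == "__main__":
    # 13.17% and 714 aa are taken from this report.
    print(coding_offset_nt(0.1317, 714))
    # Hypothetical deleted coding lengths, for illustration only.
    print(causes_frameshift(400), causes_frameshift(399))
```

Under these assumptions, exon 3 would begin roughly 282 nt into the coding sequence.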


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (5' to 3', exons numbered 1 to 13, with the two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: exon of mouse Nckipsd; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
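The self-alignment described in the note can be sketched in Python as follows. This is a simplified illustration, not the tool used to produce the plot: it records only exact forward-strand matches of 10 bp windows, and the function name and toy sequence are hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs, i < j, where the windows starting at i and j are identical."""
    seq = seq.upper()
    positions = defaultdict(list)
    # Index every window-sized substring by its start position.
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    # Any substring seen at more than one position yields off-diagonal dots.
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)


if __name__ == "__main__":
    # Toy sequence with one 10 bp unit repeated in tandem (hypothetical data).
    toy = "TTT" + "GATTACAGGC" * 2 + "CCC"
    print(self_dot_plot(toy))
```

Off-diagonal dots that line up parallel to the main diagonal at a short offset are the signature of tandem repeats.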

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(8053bp) | A(21.9% 1764) | C(26.86% 2163) | T(22.96% 1849) | G(28.28% 2277)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found. It may be difficult to construct this targeting vector.
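As a rough illustration of the GC analysis, the Python sketch below computes overall GC content from base counts and a windowed GC profile. The base counts are those given in the summary line above. The function names, the non-overlapping stepping in the example, and the toy sequence are assumptions for illustration, since the report does not describe how its windows are advanced.

```python
def gc_percent_from_counts(counts: dict[str, int]) -> float:
    """Overall GC percentage from a base-count dictionary."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list[float]:
    """GC percentage of each window of the given size, advancing by `step`."""
    seq = seq.upper()
    profile = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        profile.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return profile


if __name__ == "__main__":
    # Counts taken from the summary line of this report (full length 8053 bp).
    counts = {"A": 1764, "C": 2163, "T": 1849, "G": 2277}
    print(round(gc_percent_from_counts(counts), 2))
    # Toy sequence (hypothetical) with non-overlapping 4 bp windows.
    print(gc_windows("GGCCAATT", window=4, step=4))
```

With the reported counts, the overall GC content works out to about 55.13%.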


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       108808354  108811353  3000
YourSeq  31     1865   1913  3000   85.8%     chr7   -       73447825   73447871   47
YourSeq  25     8      36    3000   96.3%     chr2   +       147180641  147180670  30
YourSeq  24     1862   1887  3000   88.0%     chr10  +       126407420  126407444  25
YourSeq  23     2581   2608  3000   84.0%     chr5   -       98050411   98050436   26
YourSeq  23     2976   3000  3000   96.0%     chr17  -       87924692   87924716   25
YourSeq  23     2976   3000  3000   87.5%     chr1   -       36291369   36291392   24
YourSeq  21     2514   2534  3000   100.0%    chr4   -       66489286   66489306   21
YourSeq  21     2824   2844  3000   100.0%    chr15  -       99784161   99784181   21
YourSeq  21     796    816   3000   100.0%    chr9   +       41141266   41141286   21
YourSeq  21     2513   2533  3000   100.0%    chr6   +       38690586   38690606   21
YourSeq  20     358    377   3000   100.0%    chr1   -       41333567   41333586   20
YourSeq  20     966    987   3000   95.5%     chr1   -       15695836   15695857   22

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr9   +       108812961  108815960  3000
YourSeq  84     371    492   3000   85.6%     chr9   -       66602603   66602735   133
YourSeq  80     355    486   3000   83.5%     chr11  +       13767902   13768037   136
YourSeq  79     360    485   3000   77.9%     chr1   -       182166181  182166302  122
YourSeq  79     367    493   3000   85.6%     chr7   +       141021816  141021953  138
YourSeq  77     360    479   3000   90.8%     chrX   -       103377593  103378192  600
YourSeq  77     360    488   3000   82.4%     chr16  +       45475015   45475156   142
YourSeq  76     360    461   3000   87.3%     chr17  +       28926805   28926906   102
YourSeq  72     360    492   3000   92.0%     chr2   -       29980945   29981091   147
YourSeq  71     367    461   3000   87.4%     chr17  -       34942575   34942669   95
YourSeq  71     366    461   3000   91.8%     chr16  +       94344761   94344856   96
YourSeq  70     360    461   3000   84.4%     chr12  -       97302771   97302872   102
YourSeq  70     360    461   3000   84.4%     chr14  +       47482676   47482777   102
YourSeq  70     360    463   3000   83.7%     chr10  +       80869999   80870102   104
YourSeq  69     367    461   3000   86.4%     chr12  -       24967234   24967328   95
YourSeq  68     368    461   3000   86.2%     chrX   +       155221079  155221172  94
YourSeq  67     362    464   3000   82.6%     chr16  +       50459734   50459836   103
YourSeq  66     368    461   3000   82.8%     chr1   +       162459171  162459263  93
YourSeq  65     368    461   3000   85.2%     chr18  -       42496348   42496443   96
YourSeq  65     367    463   3000   83.6%     chr16  -       73609456   73609552   97

Note: The 3000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
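The chr9 self-hits in the two BLAT tables can be used for a quick consistency check on the size of the region between the flanks. The Python sketch below assumes the displayed coordinates are 1-based and inclusive, which agrees with the reported SPAN of 3000 for both self-hits. The helper names are hypothetical.

```python
def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases lying strictly between two flanking intervals."""
    return downstream_start - upstream_end - 1


if __name__ == "__main__":
    # Self-hit coordinates on chr9 taken from the BLAT tables in this report.
    up_start, up_end = 108808354, 108811353
    down_start, down_end = 108812961, 108815960
    print(span(up_start, up_end), span(down_start, down_end))
    print(gap_between(up_end, down_start))
```

Under that coordinate assumption the gap is 1607 bp, which agrees with the ~1607 bp effective cKO region stated in the strategy summary.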


Gene and protein information:
Nckipsd NCK interacting protein with SH3 domain [Mus musculus (house mouse)]
Gene ID: 80987, updated on 12-Aug-2019

Gene summary

Official Symbol: Nckipsd (provided by MGI)
Official Full Name: NCK interacting protein with SH3 domain (provided by MGI)
Primary source: MGI:MGI:1931834
See related: Ensembl:ENSMUSG00000032598
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: DIP1; ORF1; WISH; Wasbp; AF3P21; SPIN90; WASLBP
Expression: Ubiquitous expression in adrenal adult (RPKM 18.3), cerebellum adult (RPKM 18.0) and 28 other tissues

Genomic context

Location: 9; 9 F2

Exon count: 14

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  9    NC_000075.6 (108808346..108818839)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    9    NC_000075.5 (108710711..108720697)

Chromosome 9 - NC_000075.6


Transcript information: This gene has 6 transcripts

Gene: Nckipsd ENSMUSG00000032598

Description: NCK interacting protein with SH3 domain [Source:MGI Symbol;Acc:MGI:1931834]
Gene Synonyms: AF3P21, DIP1, ORF1, SPIN90, WISH, Wasbp
Location: Chromosome 9: 108,808,368-108,818,844 forward strand. GRCm38:CM001002.2
About this gene: This gene has 6 transcripts (splice variants), 1 gene allele, 196 orthologues, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts:

Name         Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Nckipsd-201  ENSMUST00000035218.8  3360  714aa       ENSMUSP00000035218.7  Protein coding   CCDS23538  Q9ESJ4      TSL:1, GENCODE basic, APPRIS P1
Nckipsd-206  ENSMUST00000195323.1  467   76aa        ENSMUSP00000141728.1  Protein coding   -          A0A0A6YWW3  TSL:3, GENCODE basic
Nckipsd-205  ENSMUST00000194819.1  367   101aa       ENSMUSP00000141702.1  Protein coding   -          A0A0A6YWU4  CDS 3' incomplete, TSL:3
Nckipsd-203  ENSMUST00000192678.1  178   45aa        ENSMUSP00000141857.1  Protein coding   -          A0A0A6YX64  CDS 5' incomplete, TSL:1
Nckipsd-202  ENSMUST00000192180.1  1089  No protein  -                     Retained intron  -          -           TSL:2
Nckipsd-204  ENSMUST00000194413.1  754   No protein  -                     Retained intron  -          -           TSL:3


[Figure: Ensembl genomic context of Nckipsd, 30.48 kb window (108.80 Mb to 108.82 Mb). Forward-strand transcripts: Ip6k2-201, Ip6k2-206 and Ip6k2-211 (protein coding), Ip6k2-207 and Ip6k2-208 (retained intron); Nckipsd-201, Nckipsd-203, Nckipsd-205 and Nckipsd-206 (protein coding), Nckipsd-202 and Nckipsd-204 (retained intron); Celsr3-201 and Celsr3-207 (protein coding); Gm37714-201 (TEC). Contig: AC168054.4. Reverse-strand transcripts: Gm35025-202 and Gm35025-203 (lncRNA), Gm35025-201 (TEC). Regulatory Build legend: CTCF; promoter; promoter flank; transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000035218

[Figure: Nckipsd-201 (protein coding), 10.48 kb, forward strand, with protein annotation for ENSMUSP00000035... Tracks: MobiDB lite; low complexity (Seg); Superfamily: SH3-like domain superfamily; SMART: SH3 domain; Pfam: SH3 domain, Domain of unknown function DUF2013; PROSITE profiles: SH3 domain; PANTHER: PTHR13357; Gene3D: 2.30.30.40; CDD: SPIN90, SH3 domain. Sequence variants (dbSNP and all other sources), with variant legend: stop gained; missense variant; synonymous variant. Scale bar: 0 to 714.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
