https://www.alphaknockout.com

Mouse Snapc4 Knockout Project (CRISPR/Cas9)

Objective: To create a Snapc4 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Snapc4 gene (NCBI Reference Sequence: NM_172339; Ensembl: ENSMUSG00000036281) is located on mouse chromosome 2. 23 exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 22 (Transcript: ENSMUST00000035427). Exons 2 to 16 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: The coding region starts in exon 2. Exons 2 to 16 cover 49.66% of the coding region. The size of the effective KO region is ~9498 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', with exons of mouse Snapc4 numbered 1 to 16 and 23 and two gRNA regions marked. Legend: exon of mouse Snapc4; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot. Legend: Forward, Reverse Complement. Axis label: Sequence 12.]

Note: The 1912 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
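The self-alignment described in the note can be sketched as a word-match dot plot. This is only an illustrative sketch: the tool actually used for the plots is not named in this report, and the function names and the exact-match rule (no mismatches allowed within the 15 bp window) are assumptions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def dot_plot_matches(seq, window=15):
    """Self-align seq using exact matches of `window`-bp words.

    Returns two lists of (i, j) start positions (0-based):
      forward: the word at i equals the word at j (main diagonal excluded)
      reverse: the word at i equals the reverse complement of the word at j
    Runs of forward dots parallel to the main diagonal suggest tandem or
    direct repeats.
    """
    seq = seq.upper()
    n_words = len(seq) - window + 1
    index = {}
    for i in range(n_words):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = [], []
    for j in range(n_words):
        word = seq[j:j + window]
        for i in index.get(word, []):
            if i != j:  # skip the trivial self-match
                forward.append((i, j))
        for i in index.get(reverse_complement(word), []):
            reverse.append((i, j))
    return forward, reverse
```

For a toy sequence made of two copies of an 8 bp unit, `dot_plot_matches(seq, window=8)` reports the off-diagonal forward pairs at offsets 0 and 8, which is the signature a tandem repeat leaves in the matrix.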

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot. Legend: Forward, Reverse Complement. Axis label: Sequence 12.]

Note: The 730 bp section downstream of Exon 16 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up) Window size: 300 bp


Summary: Full Length(1912bp) | A(21.29% 407) | C(22.86% 437) | T(29.81% 570) | G(26.05% 498)

Note: The 1912 bp section upstream of Exon 2 is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.
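A GC-content distribution of this kind can be computed with a sliding window. The sketch below is a minimal illustration, assuming a window that advances one base at a time; the step size and the cut-off used in this report to call a region "high GC" are not stated, and the function name is hypothetical.

```python
def gc_windows(seq, window=300):
    """Return (start, gc_fraction) for every `window`-bp window of seq.

    Positions are 0-based. The window advances one base at a time and the
    GC count is updated incrementally instead of being recounted.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(1 for base in seq[:window] if base in "GC")
    result = [(0, gc / window)]
    for start in range(1, len(seq) - window + 1):
        gc -= seq[start - 1] in "GC"           # base leaving the window
        gc += seq[start + window - 1] in "GC"  # base entering the window
        result.append((start, gc / window))
    return result
```

With the 300 bp window used here, the 1912 bp upstream section yields 1613 windows; candidate gRNA sites would then be compared against whichever windows exceed the chosen threshold.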

Overview of the GC Content Distribution (down) Window size: 300 bp


Summary: Full Length(730bp) | A(21.37% 156) | C(24.11% 176) | T(27.12% 198) | G(27.4% 200)

Note: The 730 bp section downstream of Exon 16 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
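The two composition summaries can be checked arithmetically from the base counts they report. The sketch below recomputes the per-base percentages and derives the overall GC content of each section; the overall GC figures are derived here and are not stated in the report, and the variable and function names are illustrative.

```python
# Base counts copied from the two "Summary" lines of this report.
upstream = {"A": 407, "C": 437, "T": 570, "G": 498}    # 1912 bp section
downstream = {"A": 156, "C": 176, "T": 198, "G": 200}  # 730 bp section


def summarize(counts):
    """Return (total length, per-base percentages, overall GC percentage)."""
    total = sum(counts.values())
    percents = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc_percent = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return total, percents, gc_percent


for name, counts in (("upstream", upstream), ("downstream", downstream)):
    total, percents, gc_percent = summarize(counts)
    print(name, total, percents, gc_percent)
```

The counts sum to 1912 and 730 and reproduce the listed percentages; the overall GC content comes out at roughly 48.9% upstream and 51.5% downstream.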


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  1912   1      1912  1912   100.0%    chr2   -       26378624   26380535   1912
YourSeq  216    732    1125  1912   91.0%     chr7   -       73609495   73610204   710
YourSeq  185    859    1120  1912   89.1%     chr16  +       82752580   82753109   530
YourSeq  183    854    1120  1912   91.5%     chr11  -       62188828   62189337   510
YourSeq  182    854    1131  1912   91.4%     chr2   -       18222038   18222343   306
YourSeq  182    860    1120  1912   90.7%     chr11  -       29672435   29672863   429
YourSeq  181    811    1102  1912   92.2%     chr11  -       75527361   75527962   602
YourSeq  178    812    1084  1912   91.3%     chr1   -       131991175  131991759  585
YourSeq  176    860    1127  1912   93.6%     chr1   +       72232488   72242579   10092
YourSeq  171    860    1125  1912   91.4%     chr11  +       55472438   55472987   550
YourSeq  163    854    1100  1912   94.6%     chr11  -       86589606   86941702   352097
YourSeq  162    784    1003  1912   87.4%     chr2   +       142919278  142919477  200
YourSeq  158    865    1085  1912   91.7%     chr12  +       106536845  106537382  538
YourSeq  157    793    995   1912   88.3%     chr17  -       33591056   33591239   184
YourSeq  156    809    1085  1912   93.0%     chr11  -       101662641  101663080  440
YourSeq  155    793    1048  1912   84.2%     chr12  +       21459328   21459525   198
YourSeq  153    854    1096  1912   91.3%     chr8   -       90911463   90911941   479
YourSeq  150    775    1009  1912   86.5%     chr15  +       96897091   96897313   223
YourSeq  150    775    993   1912   87.0%     chr12  +       40083902   40084109   208
YourSeq  146    809    993   1912   88.5%     chr4   -       55197046   55197210   165

Note: The 1912 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
--------------------------------------------------------------------------------------
YourSeq  730    1      730  730    100.0%    chr2   -       26368409   26369138   730
YourSeq  23     461    486  730    96.0%     chr2   -       122250581  122250606  26
YourSeq  22     219    241  730    100.0%    chr16  -       52215386   52215411   26
YourSeq  22     428    451  730    87.0%     chr15  +       93300234   93300256   23
YourSeq  20     516    537  730    95.5%     chr15  +       67553906   67553927   22

Note: The 730 bp section downstream of Exon 16 is BLAT searched against the genome. No significant similarity is found.
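The top hit in each BLAT table places the analyzed section on chr2, so the two self-hits can be used for a rough consistency check against the stated KO region size. The sketch below is an illustration only: it assumes the coordinates are 1-based and inclusive (consistent with SPAN = END - START + 1 in both tables), and the names `Hit` and `gap_between` are hypothetical. The exact gRNA cut sites are not given in this report, so the result is not expected to equal ~9498 bp exactly.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Genomic placement of a BLAT hit (1-based, inclusive coordinates)."""
    chrom: str
    strand: str
    start: int
    end: int

    @property
    def span(self):
        return self.end - self.start + 1


# 100% identity self-hits copied from the two BLAT tables.
upstream = Hit("chr2", "-", 26378624, 26380535)    # 1912 bp, upstream of exon 2
downstream = Hit("chr2", "-", 26368409, 26369138)  # 730 bp, downstream of exon 16


def gap_between(a, b):
    """Number of bases lying strictly between two non-overlapping hits."""
    lo, hi = sorted((a, b), key=lambda hit: hit.start)
    return hi.start - lo.end - 1


print(upstream.span, downstream.span, gap_between(upstream, downstream))
```

The interval between the two sections is 9485 bp, within a few bases of the ~9498 bp effective KO region, which would be consistent with cut sites lying just inside the analyzed flanks.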


Gene and information: Snapc4 small nuclear RNA activating complex, polypeptide 4 [ Mus musculus (house mouse) ] Gene ID: 227644, updated on 14-Aug-2019

Gene summary

Official Symbol: Snapc4 (provided by MGI)
Official Full Name: small nuclear RNA activating complex, polypeptide 4 (provided by MGI)
Primary source: MGI:MGI:2443935
See related: Ensembl:ENSMUSG00000036281
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 5730436L13Rik
Expression: Ubiquitous expression in ovary adult (RPKM 8.1), genital fat pad adult (RPKM 7.7) and 28 other tissues
Orthologs: human

Genomic context

Location: 2; 2 A3
Exon count: 25

Annotation release  Status             Assembly                        Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)    2    NC_000068.7 (26362765..26380661, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)      2    NC_000068.6 (26218285..26236173, complement)

Chromosome 2 - NC_000068.7


Transcript information: This gene has 14 transcripts

Gene: Snapc4 ENSMUSG00000036281

Description: small nuclear RNA activating complex, polypeptide 4 [Source:MGI Symbol;Acc:MGI:2443935]
Gene Synonyms: 5730436L13Rik
Location: Chromosome 2: 26,362,765-26,380,653 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 14 transcripts (splice variants), 176 orthologues, 15 paralogues, is a member of 1 Ensembl protein family and is associated with 3 phenotypes.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Snapc4-201  ENSMUST00000035427.10  4363  1325aa      ENSMUSP00000041767.4  Protein coding  CCDS15802  Q8BP86   TSL:1 GENCODE basic APPRIS P2
Snapc4-202  ENSMUST00000114115.8   4368  1333aa      ENSMUSP00000109750.2  Protein coding  -          A2AIV6   TSL:1 GENCODE basic APPRIS ALT2
Snapc4-203  ENSMUST00000123934.1   761   254aa       ENSMUSP00000122456.1  Protein coding  -          F7D5X7   CDS 5' and 3' incomplete TSL:3
Snapc4-212  ENSMUST00000149850.7   4459  No protein  -                     lncRNA          -          -        TSL:1
Snapc4-213  ENSMUST00000150121.1   2039  No protein  -                     lncRNA          -          -        TSL:2
Snapc4-207  ENSMUST00000135171.1   814   No protein  -                     lncRNA          -          -        TSL:2
Snapc4-211  ENSMUST00000149316.7   797   No protein  -                     lncRNA          -          -        TSL:2
Snapc4-209  ENSMUST00000144871.1   793   No protein  -                     lncRNA          -          -        TSL:3
Snapc4-208  ENSMUST00000136054.1   724   No protein  -                     lncRNA          -          -        TSL:3
Snapc4-210  ENSMUST00000148024.1   709   No protein  -                     lncRNA          -          -        TSL:5
Snapc4-204  ENSMUST00000124843.1   678   No protein  -                     lncRNA          -          -        TSL:3
Snapc4-205  ENSMUST00000125789.1   543   No protein  -                     lncRNA          -          -        TSL:2
Snapc4-214  ENSMUST00000155643.1   372   No protein  -                     lncRNA          -          -        TSL:3
Snapc4-206  ENSMUST00000133778.7   319   No protein  -                     lncRNA          -          -        TSL:3


[Figure: genome browser view of a 37.89 kb region (26.36 Mb to 26.39 Mb). Forward strand genes (Comprehensive set): Gm13562 (lncRNA), Gm13563 (lncRNA), Pmpca (protein coding and lncRNA transcripts). Contig: AL732541.11. Reverse strand genes (Comprehensive set): Card9, Snapc4 and Entr1 (protein coding and lncRNA transcripts). Additional track: Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (RNA gene).]


Transcript: ENSMUST00000035427

[Figure: protein view of Snapc4-201 (protein coding; reverse strand, 17.88 kb), ENSMUSP00000041... Annotation tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Superfamily (Homeobox-like domain superfamily); SMART (SANT/Myb domain); Pfam (PF13921); PROSITE profiles (Myb-like domain, Myb domain); PANTHER (PTHR46621); Gene3D (1.10.10.60); CDD (SANT/Myb domain); All sequence SNPs/i... (sequence variants, dbSNP and all other sources). Variant legend: inframe insertion, missense variant, synonymous variant. Scale bar: 0 to 1325.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
