https://www.alphaknockout.com

Mouse Slc35c2 Knockout Project (CRISPR/Cas9)

Objective: To create a Slc35c2 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Slc35c2 gene (NCBI Reference Sequence: NM_144893; Ensembl: ENSMUSG00000017664) is located on mouse chromosome 2. Ten exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 10 (transcript: ENSMUST00000109300). Exons 2-10 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs to produce KO mice. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 0.09% of the coding region. Exons 2-10 cover 100.0% of the coding region. The effective KO region is ~6218 bp. The KO region does not overlap any other known gene.
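PCR genotyping of a deletion allele rests on a size difference: with primers placed outside both ends of the deleted region, the knockout product is the wild-type product minus the deleted length. The sketch below illustrates that arithmetic with hypothetical primer coordinates (the report does not give primer positions); only the ~6218 bp deletion size comes from the text above.

```python
# Hypothetical sketch: expected PCR band sizes when genotyping a deletion allele.
# The primer coordinates used below are illustrative assumptions, not the project's primers.


def amplicon_size(fwd_start: int, rev_end: int) -> int:
    """Length of a PCR product spanning fwd_start..rev_end (1-based, inclusive)."""
    return rev_end - fwd_start + 1


def expected_bands(fwd_start: int, rev_end: int, deletion_bp: int) -> dict:
    """Wild-type and knockout product sizes for primers flanking a deletion.

    Assumes the entire deletion lies between the two primers, so the KO
    product is the WT product minus the deleted length.
    """
    wt = amplicon_size(fwd_start, rev_end)
    if deletion_bp >= wt:
        raise ValueError("deletion must lie strictly inside the amplicon")
    return {"WT": wt, "KO": wt - deletion_bp}


# Hypothetical primers placed ~300 bp outside each end of a ~6218 bp deletion.
bands = expected_bands(fwd_start=1, rev_end=6818, deletion_bp=6218)
print(bands)  # {'WT': 6818, 'KO': 600}
```

Because a wild-type product of several kilobases may amplify poorly under short extension times, a common complement is a second primer pair lying inside the deleted region that amplifies only the wild-type allele; the exact design used for this project is not stated.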


Overview of the Targeting Strategy

[Figure: wild-type allele of mouse Slc35c2 showing exons 1-10, the 5' and 3' gRNA regions, and the knockout region.]


Overview of the Dot Plot (up)
Window size: 15 bp

[Figure: dot plot of the upstream sequence aligned with itself; panels: Forward, Reverse Complement.]

Note: The 2000 bp section upstream of the start codon was aligned with itself to check for tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)
Window size: 15 bp

[Figure: dot plot of the downstream sequence aligned with itself; panels: Forward, Reverse Complement.]

Note: The 2000 bp section downstream of the stop codon was aligned with itself to check for tandem repeats. No significant tandem repeat was found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
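A self dot plot marks every pair of positions whose words of a fixed length match; a tandem repeat then shows up as diagonals running parallel to the main diagonal, offset by the repeat-unit length. The following is a minimal sketch assuming exact matches of 15 bp words (the window size quoted above); the tool used for the report may score windows differently, for example by allowing mismatches, and the toy sequence is hypothetical.

```python
# Minimal self dot plot using exact matches of window-length words.
# Assumption: a "hit" is an exact word match; the report's tool may differ.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq: str, window: int = 15, reverse_complement: bool = False) -> set:
    """Return the (i, j) cells where the window starting at i in seq matches
    the window starting at j in seq (or in its reverse complement)."""
    seq = seq.upper()
    other = revcomp(seq) if reverse_complement else seq
    # Index every window-length word on the second axis by its start positions.
    index = {}
    for j in range(len(other) - window + 1):
        index.setdefault(other[j:j + window], []).append(j)
    hits = set()
    for i in range(len(seq) - window + 1):
        for j in index.get(seq[i:i + window], []):
            hits.add((i, j))
    return hits


def off_diagonal(hits: set) -> set:
    """Drop the trivial main diagonal (i == j) of a forward self-comparison."""
    return {(i, j) for i, j in hits if i != j}


# Toy tandem repeat: a 9 bp unit repeated four times (36 bp).
seq = "GATTACAGC" * 4
offsets = sorted({j - i for i, j in off_diagonal(dot_plot(seq))})
print(offsets)  # [-18, -9, 9, 18]: parallel diagonals spaced by the repeat unit
```

For a sequence without tandem repeats, `off_diagonal(dot_plot(seq))` is empty, leaving only the main diagonal, which is the pattern the notes above describe as "no significant tandem repeat".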


Overview of the GC Content Distribution (up)
Window size: 300 bp

[Figure: GC content distribution across the 2000 bp upstream sequence.]

Summary: Full Length(2000bp) | A(23.3% 466) | C(26.2% 524) | T(21.65% 433) | G(28.85% 577)

Note: The 2000 bp section upstream of the start codon was analyzed for GC content. The overall GC content is 55.05% (C + G = 1101 of 2000 bp) and no region of markedly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)
Window size: 300 bp

[Figure: GC content distribution across the 2000 bp downstream sequence.]

Summary: Full Length(2000bp) | A(21.1% 422) | C(28.4% 568) | T(23.05% 461) | G(27.45% 549)

Note: The 2000 bp section downstream of the stop codon was analyzed for GC content. The overall GC content is 55.85% (C + G = 1117 of 2000 bp) and no region of markedly high GC content was found, so this region is suitable for PCR screening or sequencing analysis.
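The Summary lines and the 300 bp window plots can be reproduced from a sequence with a few helpers. This is a sketch under stated assumptions: the 1 bp step between windows and the 70% "high GC" cutoff are hypothetical choices, not parameters given in the report. The final loop recomputes the overall GC percentages from the reported base counts only.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage (2 d.p.) and count per base, in the order of the Summary lines."""
    seq = seq.upper()
    counts = Counter(seq)
    return {b: (round(100 * counts[b] / len(seq), 2), counts[b]) for b in "ACTG"}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each window-length slice (the step size is an assumption)."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70) -> list:
    """Start positions of 1 bp-step windows above a hypothetical GC cutoff."""
    return [i for i, gc in enumerate(sliding_gc(seq, window)) if gc > threshold]


# Overall GC content recomputed from the reported base counts.
reported = {
    "up": {"A": 466, "C": 524, "T": 433, "G": 577},
    "down": {"A": 422, "C": 568, "T": 461, "G": 549},
}
for name, c in reported.items():
    total = sum(c.values())
    print(name, total, round(100 * (c["G"] + c["C"]) / total, 2))
# up 2000 55.05
# down 2000 55.85
```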


BLAT Search Results (up)

QUERY | SCORE | START | END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr2 | - | 165283369 | 165285368 | 2000
YourSeq | 24 | 1295 | 1323 | 2000 | 84.7% | chr6 | + | 13062080 | 13062106 | 27
YourSeq | 22 | 696 | 720 | 2000 | 96.0% | chr6 | - | 98563485 | 98563511 | 27
YourSeq | 22 | 1054 | 1075 | 2000 | 100.0% | chr19 | - | 18597160 | 18597181 | 22
YourSeq | 22 | 379 | 400 | 2000 | 100.0% | chr18 | - | 11043575 | 11043596 | 22
YourSeq | 22 | 1372 | 1396 | 2000 | 95.9% | chr10 | + | 53878111 | 53878137 | 27
YourSeq | 21 | 444 | 465 | 2000 | 100.0% | chr1 | - | 31750839 | 31750861 | 23
YourSeq | 21 | 1298 | 1328 | 2000 | 83.9% | chrX | + | 73829401 | 73829431 | 31
YourSeq | 20 | 1333 | 1352 | 2000 | 100.0% | chr10 | - | 36172756 | 36172775 | 20
YourSeq | 20 | 1747 | 1766 | 2000 | 100.0% | chr1 | - | 74594402 | 74594421 | 20
YourSeq | 20 | 524 | 543 | 2000 | 100.0% | chr1 | + | 32833791 | 32833810 | 20

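Note: The 2000 bp section upstream of the start codon was BLAT-searched against the genome. Apart from the full-length self-match on chr2, all hits are short (score 20-24, span 20-31 bp), so no significant similarity elsewhere in the genome was found.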

BLAT Search Results (down)

QUERY | SCORE | START | END | QSIZE | IDENTITY | CHROM | STRAND | START | END | SPAN
YourSeq | 2000 | 1 | 2000 | 2000 | 100.0% | chr2 | - | 165275149 | 165277148 | 2000
YourSeq | 30 | 1218 | 1266 | 2000 | 96.9% | chr1 | + | 119667099 | 119667149 | 51
YourSeq | 22 | 1006 | 1027 | 2000 | 100.0% | chr13 | - | 33894390 | 33894411 | 22
YourSeq | 22 | 1487 | 1508 | 2000 | 100.0% | chr9 | + | 65277840 | 65277861 | 22
YourSeq | 22 | 567 | 588 | 2000 | 100.0% | chr4 | + | 62837535 | 62837556 | 22
YourSeq | 20 | 1420 | 1439 | 2000 | 100.0% | chr1 | - | 57330574 | 57330593 | 20
YourSeq | 20 | 631 | 650 | 2000 | 100.0% | chr1 | - | 36740189 | 36740208 | 20
YourSeq | 20 | 1030 | 1049 | 2000 | 100.0% | chr1 | - | 34764422 | 34764441 | 20
YourSeq | 20 | 1826 | 1845 | 2000 | 100.0% | chr1 | + | 42603015 | 42603034 | 20
YourSeq | 20 | 728 | 747 | 2000 | 100.0% | chr1 | + | 17864906 | 17864925 | 20

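Note: The 2000 bp section downstream of the stop codon was BLAT-searched against the genome. Apart from the full-length self-match on chr2, all hits are short (score 20-30, span 20-51 bp), so no significant similarity elsewhere in the genome was found.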
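Screening the BLAT tables for off-target similarity amounts to discarding the full-length self-match and checking whether any remaining hit is large. The sketch below parses rows in the whitespace-separated layout of the raw BLAT web output and applies a hypothetical score cutoff of 50; the report does not state the threshold it used to call a hit "significant".

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_row(row: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row.
    Field 0 is the query name (YourSeq) and field 4 is QSIZE; both are skipped."""
    f = row.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), float(f[5].rstrip("%")),
                   f[6], f[7], int(f[8]), int(f[9]), int(f[10]))


def off_target_hits(rows, qsize: int = 2000, min_score: int = 50):
    """Hits other than the full-length self-match whose score reaches min_score.
    min_score = 50 is a hypothetical cutoff, not one stated in the report."""
    hits = [parse_row(r) for r in rows]
    return [h for h in hits
            if not (h.score == qsize and h.identity == 100.0)
            and h.score >= min_score]


# First three rows of the upstream table.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr2 - 165283369 165285368 2000",
    "YourSeq 24 1295 1323 2000 84.7% chr6 + 13062080 13062106 27",
    "YourSeq 22 696 720 2000 96.0% chr6 - 98563485 98563511 27",
]
print(off_target_hits(rows))  # []
```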


Gene information: Slc35c2, solute carrier family 35, member C2 [Mus musculus (house mouse)]; Gene ID: 228875, updated on 14-Aug-2019

Gene summary

Official Symbol: Slc35c2 (provided by MGI)
Official Full Name: solute carrier family 35, member C2 (provided by MGI)
Primary source: MGI:MGI:2385166
See related: Ensembl:ENSMUSG00000017664
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: C85957; CGI-15; Ovcov1; D2Wsu58e
Expression: Ubiquitous expression in duodenum adult (RPKM 63.1), small intestine adult (RPKM 33.3) and 28 other tissues

Genomic context

Location: 2 H3; 2 85.53 cM
Exon count: 14

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 2 | NC_000068.7 (165276522..165287888, complement)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 2 | NC_000068.6 (165102056..165113327, complement)

[Figure: genomic context of Slc35c2 on Chromosome 2, NC_000068.7.]
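NCBI and Ensembl both report 1-based, inclusive coordinates, so a feature's length is end - start + 1. As a small worked check, the sketch below applies this to the NCBI locations in the table above and to the Ensembl gene location given under the transcript information; the Ensembl span reproduces the 11.32 kb shown in the Ensembl transcript view. The two databases define the gene boundaries slightly differently, which is why the GRCm38 spans differ.

```python
def span_bp(start: int, end: int) -> int:
    """Length of a 1-based, closed interval start..end (NCBI and Ensembl convention)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


ncbi_grcm38 = span_bp(165276522, 165287888)   # NC_000068.7, annotation release 108
ncbi_mgscv37 = span_bp(165102056, 165113327)  # NC_000068.6, Build 37.2
ensembl = span_bp(165276554, 165287869)       # ENSMUSG00000017664, GRCm38

print(ncbi_grcm38, ncbi_mgscv37, ensembl)  # 11367 11272 11316
print(f"{ensembl / 1000:.2f} kb")          # 11.32 kb
```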


Transcript information: This gene has 16 transcripts

Gene: Slc35c2 ENSMUSG00000017664

Description: solute carrier family 35, member C2 [Source:MGI Symbol;Acc:MGI:2385166]
Gene Synonyms: CGI-15, D2Wsu58e, Ovcov1
Location: Chromosome 2: 165,276,554-165,287,869 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 16 transcripts (splice variants), 168 orthologues, 9 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts:

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Slc35c2-204 | ENSMUST00000109300.8 | 2008 | 364aa | ENSMUSP00000104923.2 | Protein coding | CCDS17074 | Q5GMH2, Q8VCX2 | TSL:1, GENCODE basic, APPRIS P1
Slc35c2-202 | ENSMUST00000109298.7 | 1991 | 364aa | ENSMUSP00000104921.1 | Protein coding | CCDS17074 | Q5GMH2, Q8VCX2 | TSL:1, GENCODE basic, APPRIS P1
Slc35c2-203 | ENSMUST00000109299.7 | 1926 | 364aa | ENSMUSP00000104922.1 | Protein coding | CCDS17074 | Q5GMH2, Q8VCX2 | TSL:1, GENCODE basic, APPRIS P1
Slc35c2-201 | ENSMUST00000017808.13 | 1843 | 364aa | ENSMUSP00000017808.7 | Protein coding | CCDS17074 | Q5GMH2, Q8VCX2 | TSL:1, GENCODE basic, APPRIS P1
Slc35c2-211 | ENSMUST00000133961.7 | 862 | 192aa | ENSMUSP00000118227.1 | Protein coding | - | Q5GMG8 | CDS 3' incomplete, TSL:3
Slc35c2-215 | ENSMUST00000155289.7 | 830 | 199aa | ENSMUSP00000119071.1 | Protein coding | - | Q5GMH1 | CDS 3' incomplete, TSL:5
Slc35c2-216 | ENSMUST00000156134.7 | 796 | 192aa | ENSMUSP00000116288.1 | Protein coding | - | Q5GMG8 | CDS 3' incomplete, TSL:2
Slc35c2-206 | ENSMUST00000129210.7 | 728 | 163aa | ENSMUSP00000118605.1 | Protein coding | - | Q5GMG7 | CDS 3' incomplete, TSL:5
Slc35c2-209 | ENSMUST00000131409.1 | 690 | 137aa | ENSMUSP00000120036.1 | Protein coding | - | A2A5A3 | CDS 3' incomplete, TSL:3
Slc35c2-207 | ENSMUST00000129336.7 | 588 | 196aa | ENSMUSP00000123299.1 | Protein coding | - | Q5GMG9 | CDS 3' incomplete, TSL:5
Slc35c2-208 | ENSMUST00000130393.1 | 376 | 58aa | ENSMUSP00000123450.1 | Protein coding | - | Q5GMG6 | CDS 3' incomplete, TSL:3
Slc35c2-210 | ENSMUST00000132270.7 | 1892 | 72aa | ENSMUSP00000125708.1 | Nonsense mediated decay | - | E0CYZ1 | TSL:1
Slc35c2-212 | ENSMUST00000145301.7 | 786 | 72aa | ENSMUSP00000123757.1 | Nonsense mediated decay | - | E0CYZ1 | TSL:5
Slc35c2-213 | ENSMUST00000147247.1 | 2042 | No protein | - | Retained intron | - | - | TSL:2
Slc35c2-205 | ENSMUST00000125550.1 | 1110 | No protein | - | Retained intron | - | - | TSL:1
Slc35c2-214 | ENSMUST00000154608.7 | 775 | No protein | - | Retained intron | - | - | TSL:5
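The transcript table can be filtered programmatically, for instance to list the full-length principal isoforms (flag APPRIS P1) that share CCDS17074, and the 364 aa protein length implies the coding-sequence length at 3 nt per codon. The sketch below holds only an excerpt of the table rows, typed in by hand; the helper names are hypothetical and are not part of any Ensembl API.

```python
from typing import NamedTuple, Optional


class Transcript(NamedTuple):
    name: str
    transcript_id: str
    length_bp: int
    protein_aa: Optional[int]  # None for "No protein"
    biotype: str
    flags: tuple


P1 = ("TSL:1", "GENCODE basic", "APPRIS P1")

# Excerpt of the transcript table above (not all 16 rows).
TRANSCRIPTS = [
    Transcript("Slc35c2-204", "ENSMUST00000109300.8", 2008, 364, "Protein coding", P1),
    Transcript("Slc35c2-202", "ENSMUST00000109298.7", 1991, 364, "Protein coding", P1),
    Transcript("Slc35c2-203", "ENSMUST00000109299.7", 1926, 364, "Protein coding", P1),
    Transcript("Slc35c2-201", "ENSMUST00000017808.13", 1843, 364, "Protein coding", P1),
    Transcript("Slc35c2-211", "ENSMUST00000133961.7", 862, 192, "Protein coding",
               ("CDS 3' incomplete", "TSL:3")),
    Transcript("Slc35c2-213", "ENSMUST00000147247.1", 2042, None, "Retained intron",
               ("TSL:2",)),
]


def principal_isoforms(transcripts):
    """Transcripts flagged APPRIS P1 (principal isoform)."""
    return [t for t in transcripts if "APPRIS P1" in t.flags]


def cds_length_bp(protein_aa: int, include_stop: bool = True) -> int:
    """Coding-sequence length implied by a protein length (3 nt per codon)."""
    return 3 * (protein_aa + (1 if include_stop else 0))


print([t.name for t in principal_isoforms(TRANSCRIPTS)])
# ['Slc35c2-204', 'Slc35c2-202', 'Slc35c2-203', 'Slc35c2-201']
print(cds_length_bp(364))  # 1095 (364 codons plus the TGA stop codon)
```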

[Figure: Ensembl genome browser view of a 31.32 kb region of chromosome 2 (165.27-165.29 Mb, contig AL591430.8) showing the 16 Slc35c2 transcripts on the reverse strand, neighboring Elmo2 transcripts and the Gm25569 snoRNA, and the Regulatory Build track (CTCF, enhancer, promoter, promoter flank).]


Transcript: ENSMUST00000109300 (Slc35c2-204, protein coding; reverse strand, 11.32 kb)

[Figure: Ensembl protein view of the 364 aa translation showing transmembrane helices, low-complexity (Seg) regions, the Pfam sugar phosphate transporter domain, PANTHER families PTHR11132 and PTHR11132:SF238, and sequence variants (missense, splice region, synonymous) from dbSNP and other sources.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
