https://www.alphaknockout.com

Mouse Cysrt1 Knockout Project (CRISPR/Cas9)

Objective: To create a Cysrt1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Cysrt1 gene (NCBI Reference Sequence: NM_026415; Ensembl: ENSMUSG00000036731) is located on mouse chromosome 2. Two exons are identified, with both the ATG start codon and the TAG stop codon in exon 2 (Transcript: ENSMUST00000043379). Exon 2 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts from about 0.23% of the coding region. Exon 2 covers 100.0% of the coding region. The size of the effective KO region is ~424 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', showing exons 1 and 2 and two marked gRNA regions. Legend: exon of mouse Cysrt1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
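The self-alignment described in the note can be illustrated with a minimal sketch. It assumes exact matching of 15 bp words, which is a simplification: the report does not state which dot plot tool or scoring scheme was used, and the function names here are hypothetical. Off-diagonal forward matches at a regular spacing are what a tandem repeat would look like.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact words of length `window`.

    Returns two sorted lists of (i, j) start positions:
    forward  - the word at i also occurs at j on the same strand (i < j),
    reverse  - the word at i equals the reverse complement of the word at j.
    Exact matching is an assumption of this sketch, not the report's method.
    """
    seq = seq.upper()
    n = len(seq)
    index = {}
    for i in range(n - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    forward = sorted(
        (i, j) for pos in index.values() for i in pos for j in pos if i < j
    )

    rc = reverse_complement(seq)
    reverse = []
    for k in range(n - window + 1):
        # A word starting at k in the reverse complement covers the
        # original positions starting at n - window - k.
        for i in index.get(rc[k:k + window], []):
            reverse.append((i, n - window - k))
    return forward, sorted(reverse)


if __name__ == "__main__":
    # Toy input: a 16 bp unit repeated three times gives off-diagonal dots.
    toy = "ACGGTCATTGCAGCTA" * 3
    fwd, rev = self_dot_plot(toy, window=15)
    print(len(fwd), "forward off-diagonal matches")
```

An empty forward list for a real 2000 bp flank would be consistent with the note's conclusion that no tandem repeat is present, within the limits of exact matching.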

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of Sequence 12 against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(2000bp) | A(22.1% 442) | C(27.8% 556) | T(24.05% 481) | G(26.05% 521)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
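A sliding-window GC calculation of the kind described here can be sketched as follows. The 300 bp window comes from the report; the step size, the function names, and the toy input are assumptions made for illustration, and the report does not say what threshold counts as a "significant" high-GC region.

```python
def gc_windows(seq, window=300, step=1):
    """Return (start, GC percent) for each full window along the sequence."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        out.append((start, 100.0 * gc / window))
    return out


def gc_percent_from_counts(counts):
    """Overall GC percent from a dict of base counts keyed by A, C, T, G."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Base counts copied from the upstream summary line of the report.
    upstream = {"A": 442, "C": 556, "T": 481, "G": 521}
    print(round(gc_percent_from_counts(upstream), 2))

    # Toy sequence with a small window, only to show the output shape.
    print(gc_windows("GGCCAATT", window=4, step=2))
```

For the upstream summary, C + G is 556 + 521 = 1077 of 2000 bases, or 53.85% overall; the windowed profile is what would reveal a local GC-rich stretch that the overall figure hides.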

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(2000bp) | A(23.75% 475) | C(24.1% 482) | T(24.25% 485) | G(27.9% 558)

Note: The 2000 bp section downstream of stop codon is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr2   -       25239499   25241498   2000
YourSeq  27     1349   1377  2000   96.6%     chr2   -       45321821   45321849   29
YourSeq  26     1347   1380  2000   88.3%     chr19  +       7956919    7956952    34
YourSeq  25     1349   1377  2000   93.2%     chr7   -       17738296   17738324   29
YourSeq  25     1349   1377  2000   93.2%     chr18  -       6714081    6714109    29
YourSeq  25     1598   1622  2000   100.0%    chr3   +       51571964   51571988   25
YourSeq  25     1018   1051  2000   96.3%     chr11  +       103372231  103372266  36
YourSeq  23     545    567   2000   100.0%    chr2   +       179791859  179791881  23
YourSeq  21     549    569   2000   100.0%    chr7   +       123124811  123124831  21

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.
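One way to screen such a result list programmatically is sketched below. The rows are copied from the upstream results with the query name omitted; the field names, the helper functions, and the score cut-off of 50 are hypothetical choices for illustration, since the report does not define what it treats as significant similarity.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line):
    """Parse one whitespace-separated result row (query name omitted)."""
    f = line.split()
    return BlatHit(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]),
        float(f[4].rstrip("%")), f[5], f[6],
        int(f[7]), int(f[8]), int(f[9]),
    )


def secondary_hits(hits, min_score):
    """Return hits other than the top-scoring one with score >= min_score."""
    best = max(hits, key=lambda h: h.score)
    return [h for h in hits if h is not best and h.score >= min_score]


# First three rows of the upstream BLAT results in this report.
ROWS = [
    "2000 1 2000 2000 100.0% chr2 - 25239499 25241498 2000",
    "27 1349 1377 2000 96.6% chr2 - 45321821 45321849 29",
    "26 1347 1380 2000 88.3% chr19 + 7956919 7956952 34",
]

if __name__ == "__main__":
    hits = [parse_hit(r) for r in ROWS]
    # min_score=50 is an illustrative threshold, not one stated in the report.
    print(secondary_hits(hits, min_score=50))
```

The top-scoring row is the query matching its own locus, so it is excluded before the threshold is applied.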

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
------------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr2   -       25237073   25239072   2000
YourSeq  211    263    2000  2000   93.2%     chr1   +       190996843  191365404  368562
YourSeq  176    286    1995  2000   90.0%     chr4   +       116981893  117080135  98243
YourSeq  148    266    426   2000   96.3%     chr5   -       114896104  114896267  164
YourSeq  148    262    429   2000   94.6%     chr6   +       54547724   54547906   183
YourSeq  148    260    429   2000   94.7%     chr11  +       23690163   23690346   184
YourSeq  148    262    429   2000   94.7%     chr10  +       75619748   75619921   174
YourSeq  146    260    425   2000   95.2%     chr4   +       137761610  138035006  273397
YourSeq  146    262    426   2000   92.6%     chr14  +       62781487   62781649   163
YourSeq  142    271    441   2000   89.1%     chr11  -       4066871    4067034    164
YourSeq  142    262    429   2000   93.4%     chr1   -       130365465  130464742  99278
YourSeq  140    262    426   2000   92.7%     chr1   -       119538113  119538288  176
YourSeq  140    258    422   2000   90.0%     chr17  +       86774959   86775119   161
YourSeq  140    266    429   2000   93.2%     chr14  +       31027579   31027742   164
YourSeq  139    260    430   2000   91.8%     chr13  -       13755864   13756039   176
YourSeq  138    263    425   2000   92.6%     chrX   +       92351484   92351647   164
YourSeq  138    265    429   2000   92.8%     chr3   +       68581084   68581251   168
YourSeq  138    266    429   2000   92.7%     chr2   +       72406459   72406626   168
YourSeq  138    262    421   2000   91.2%     chr1   +       54114296   54114453   158
YourSeq  137    261    429   2000   92.0%     chr11  -       13465368   13465536   169

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.


Gene and information: Cysrt1 cysteine rich tail 1 [ Mus musculus (house mouse) ] Gene ID: 67859, updated on 12-Aug-2019

Gene summary

Official Symbol: Cysrt1 (provided by MGI)
Official Full Name: cysteine rich tail 1 (provided by MGI)
Primary source: MGI:MGI:1915109
See related: Ensembl:ENSMUSG00000036731
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AV023762; 2310002J15Rik
Expression: Biased expression in stomach adult (RPKM 175.0), lung adult (RPKM 24.2) and 1 other tissue

Genomic context

Location: 2; 2 A3
Exon count: 2

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  2    NC_000068.7 (25238819..25239897, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    2    NC_000068.6 (25094339..25095417, complement)

Chromosome 2 - NC_000068.7


Transcript information: This gene has 3 transcripts

Gene: Cysrt1 ENSMUSG00000036731

Description: cysteine rich tail 1 [Source:MGI Symbol;Acc:MGI:1915109]
Gene Synonyms: 2310002J15Rik
Location: Chromosome 2: 25,238,818-25,242,452 reverse strand. GRCm38:CM000995.2
About this gene: This gene has 3 transcripts (splice variants), 77 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID         bp   Protein  Translation ID        Biotype         CCDS       UniProt     Flags
Cysrt1-203  ENSMUST00000186719.1  909  142aa    ENSMUSP00000140416.1  Protein coding  CCDS50526  Q9D1E4      TSL:NA, GENCODE basic, APPRIS P1
Cysrt1-201  ENSMUST00000043379.4  873  142aa    ENSMUSP00000045363.3  Protein coding  CCDS50526  Q9D1E4      TSL:1, GENCODE basic, APPRIS P1
Cysrt1-202  ENSMUST00000114356.2  509  114aa    ENSMUSP00000109996.2  Protein coding  -          A0A0A0MQE7  CDS 3' incomplete, TSL:5


[Figure: Ensembl genome browser view of a 23.64 kb region of chromosome 2 (25.23 Mb to 25.25 Mb). Forward strand: Rnf208-201 and Rnf208-202 (protein coding). Contig: AL732309.9. Reverse strand: Slc34a3 (transcripts 201 to 207), Rnf224 (201 and 202), Cysrt1 (201, 202 and 203), AL732309.1-201, and Ndor1 (201 to 211). Regulatory Build legend: CTCF, promoter, promoter flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000043379

[Figure: Cysrt1-201 (protein coding), reverse strand, 1.17 kb. Protein feature tracks for ENSMUSP00000045...: MobiDB lite; low complexity (Seg); Pfam and PANTHER, both Uncharacterised protein family UPF0574; sequence variants (dbSNP and all other sources), with missense and synonymous variants marked. Scale bar: 0 to 142.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
