https://www.alphaknockout.com

Mouse ccdc198 Knockout Project (CRISPR/Cas9)

Objective: To create a ccdc198 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The ccdc198 gene (NCBI Reference Sequence: NM_025956.4; Ensembl: ENSMUSG00000021850) is located on mouse chromosome 14. Seven exons have been identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 7 (Transcript: ENSMUST00000228936). Exons 3~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts at about 24.72% of the coding region. Exons 3~5 cover 30.84% of the coding region. The size of the effective KO region is ~9349 bp. The KO region does not contain any other known gene.
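The percentages in the note can be turned into approximate base-pair counts. The sketch below assumes the coding region is 294 codons (the protein length listed for ENSMUST00000228936 in this report) and that the percentages exclude the stop codon. Both are assumptions, and the resulting counts are estimates derived from rounded percentages, not values stated in the report.

```python
# Back-of-envelope conversion of the quoted coding-region percentages to bp.
PROTEIN_LENGTH_AA = 294  # protein length listed for ENSMUST00000228936

# Assumption: percentages are relative to the CDS without the stop codon.
cds_length_bp = PROTEIN_LENGTH_AA * 3


def bp_from_percent(percent: float, total_bp: int) -> int:
    """Convert a quoted percentage of the CDS into a rounded bp count."""
    return round(percent / 100 * total_bp)


# Estimated coding bases that lie upstream of exon 3.
bp_before_exon3 = bp_from_percent(24.72, cds_length_bp)

# Estimated coding bases removed with exons 3~5.
bp_in_exons_3_to_5 = bp_from_percent(30.84, cds_length_bp)

# A deleted coding length that is not a multiple of 3 shifts the reading frame.
causes_frameshift = bp_in_exons_3_to_5 % 3 != 0

print(cds_length_bp, bp_before_exon3, bp_in_exons_3_to_5, causes_frameshift)
```

Under these assumptions the deleted coding length is not a multiple of three, which would be consistent with a frameshift downstream of exon 2. The exon boundaries in the Ensembl annotation should be checked before relying on this estimate.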


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exons labeled 1, 3, 4, 5 and 7 and two gRNA regions marked. Legend: exon of mouse ccdc198; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self dot plot of Sequence 12. Legend: Forward; Reverse Complement.]

Note: The 1229 bp section upstream of Exon 3 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self dot plot of Sequence 12. Legend: Forward; Reverse Complement.]

Note: The 1491 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
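The self-alignment described in the two dot plot notes can be sketched as follows. This is a minimal illustration that assumes exact matching of 15 bp windows. The tool and scoring actually used for the report are not stated, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def self_dot_plot(seq: str, window: int = 15):
    """Compare a sequence with itself using exact window matches.

    Returns two sets of (i, j) start positions:
    forward - windows at i and j are identical (i != j, so the trivial
              main diagonal is excluded);
    reverse - the window at i equals the reverse complement of the
              window at j.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1

    # Index every window by its sequence for fast lookup.
    index = {}
    for i in range(n_windows):
        index.setdefault(seq[i:i + window], []).append(i)

    forward, reverse = set(), set()
    for i in range(n_windows):
        kmer = seq[i:i + window]
        for j in index.get(kmer, []):
            if i != j:
                forward.add((i, j))
        for j in index.get(reverse_complement(kmer), []):
            reverse.add((i, j))
    return forward, reverse


# Toy input (not the real flanking sequence): a 15 bp unit repeated twice.
unit = "ACGGTCATTGCAGTA"
fwd, rev = self_dot_plot(unit + unit, window=15)
print(len(fwd), len(rev))
```

Runs of off-diagonal forward dots parallel to the main diagonal are what would indicate tandem repeats. A sequence with no such runs corresponds to the "no significant tandem repeat" conclusion in the notes.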


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full length (1229 bp) | A (29.37%, 361) | C (18.31%, 225) | T (32.38%, 398) | G (19.93%, 245)

Note: The 1229 bp section upstream of Exon 3 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full length (1491 bp) | A (26.36%, 393) | C (23.41%, 349) | T (26.96%, 402) | G (23.27%, 347)

Note: The 1491 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
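The GC figures above can be reproduced from the base counts in the two summary lines, and a sliding-window profile like the plotted one can be computed from a sequence. The sketch below is illustrative only. The function names are hypothetical, and the 300 bp window is taken from the figure headings, while the step size is an assumption.

```python
def gc_percent(counts: dict) -> float:
    """Overall GC content (%) from a dict of base counts."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1) -> list:
    """GC content (%) of each window of the given size along the sequence."""
    seq = seq.upper()
    return [
        100 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two summary lines in this report.
upstream_counts = {"A": 361, "C": 225, "T": 398, "G": 245}    # 1229 bp
downstream_counts = {"A": 393, "C": 349, "T": 402, "G": 347}  # 1491 bp

print(round(gc_percent(upstream_counts), 2))
print(round(gc_percent(downstream_counts), 2))
```

From the reported counts, the overall GC content is about 38.2% for the upstream section and about 46.7% for the downstream section, in line with the conclusion that neither flank is GC-rich.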


BLAT Search Results (up)

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND  T_START    T_END      SPAN
YourSeq  1229   1        1229   1229   100.0%    chr14  -       49243812   49245040   1229
YourSeq  33     860      914    1229   80.0%     chr4   +       111386703  111386757  55
YourSeq  28     850      895    1229   80.5%     chr4   -       149816431  149816476  46
YourSeq  28     852      895    1229   81.9%     chr1   +       193818825  193818868  44
YourSeq  27     844      876    1229   89.7%     chr2   -       153190090  153190121  32
YourSeq  27     686      718    1229   93.4%     chr1   +       162350175  162350212  38
YourSeq  25     861      891    1229   90.4%     chr14  -       56592161   56592191   31
YourSeq  22     886      909    1229   95.9%     chr1   +       134606586  134606609  24
YourSeq  21     520      540    1229   100.0%    chr14  -       59661591   59661611   21
YourSeq  21     835      855    1229   100.0%    chr1   -       6080032    6080052    21
YourSeq  20     858      891    1229   79.5%     chr7   +       77929473   77929506   34
YourSeq  20     1111     1130   1229   100.0%    chr10  +       20824289   20824308   20
YourSeq  20     148      167    1229   100.0%    chr1   +       108160875  108160894  20

Note: The 1229 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  Q_START  Q_END  QSIZE  IDENTITY  CHROM  STRAND  T_START    T_END      SPAN
YourSeq  1491   1        1491   1491   100.0%    chr14  -       49232972   49234462   1491
YourSeq  55     262      439    1491   92.4%     chr17  +       46594042   46594290   249
YourSeq  52     258      347    1491   85.2%     chr13  -       112436265  112436356  92
YourSeq  49     361      433    1491   76.9%     chr2   -       69701903   69701971   69
YourSeq  47     743      803    1491   88.6%     chr10  +       83319932   83319992   61
YourSeq  46     291      348    1491   89.7%     chr3   -       90107621   90107678   58
YourSeq  45     263      337    1491   80.0%     chr10  -       25914048   25914122   75
YourSeq  45     321      780    1491   67.3%     chr15  +       100446528  100446931  404
YourSeq  45     260      340    1491   77.8%     chr15  +       78733446   78733526   81
YourSeq  44     743      806    1491   84.4%     chr14  -       45517710   45517773   64
YourSeq  44     740      803    1491   84.4%     chr13  -       20987514   20987577   64
YourSeq  44     743      806    1491   84.4%     chr10  -       32342295   32342358   64
YourSeq  44     744      801    1491   88.0%     chr9   +       114013875  114013932  58
YourSeq  44     737      806    1491   88.0%     chr17  +       25886373   25886563   191
YourSeq  43     732      788    1491   87.8%     chr4   -       89422485   89422541   57
YourSeq  43     743      803    1491   85.3%     chr14  +       66659146   66659206   61
YourSeq  43     289      345    1491   87.8%     chr11  +       79925485   79925541   57
YourSeq  42     740      803    1491   82.9%     chr8   -       3257709    3257772    64
YourSeq  42     268      343    1491   77.7%     chr6   -       18383506   18383581   76
YourSeq  42     740      803    1491   82.9%     chr17  -       91108240   91108303   64

Note: The 1491 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
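The full-length hits in the two BLAT tables give the genomic positions of both flanking sections, so the interval between them can be checked against the stated KO region size. The sketch below assumes the two analyzed sections directly abut the KO region, which the report does not state explicitly. The constant and function names are hypothetical.

```python
# chr14 coordinates (minus strand) of the 100% identity BLAT hits.
UPSTREAM_HIT = (49243812, 49245040)    # 1229 bp section upstream of Exon 3
DOWNSTREAM_HIT = (49232972, 49234462)  # 1491 bp section downstream of Exon 5


def span(hit: tuple) -> int:
    """Length in bp of an inclusive (start, end) genomic interval."""
    start, end = hit
    return end - start + 1


def gap_between(lower: tuple, upper: tuple) -> int:
    """Number of bases strictly between two non-overlapping intervals."""
    return upper[0] - lower[1] - 1


# The gene is on the reverse strand, so the upstream flank has the
# higher genomic coordinates and the downstream flank the lower ones.
ko_region_bp = gap_between(DOWNSTREAM_HIT, UPSTREAM_HIT)

print(span(UPSTREAM_HIT), span(DOWNSTREAM_HIT), ko_region_bp)
```

Under this assumption, the interval between the flanks is 9349 bp, matching the effective KO region size given in the strategy summary.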


Gene and protein information:

ccdc198 coiled-coil domain containing 198 [Mus musculus (house mouse)]
Gene ID: 67082, updated on 26-Jun-2020

Gene summary

Official Symbol: ccdc198 (provided by MGI)
Official Full Name: coiled-coil domain containing 198 (provided by MGI)
Primary source: MGI:MGI:1914332
See related: Ensembl:ENSMUSG00000021850
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 1700011H14Rik
Expression: Biased expression in kidney adult (RPKM 9.1), testis adult (RPKM 7.4) and 3 other tissues
Orthologs: human

Genomic context

Location: 14; 14 C1
Exon count: 7

Annotation release  Status             Assembly                      Chr  Location
108.20200622        current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (49226358..49245445, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (49846034..49865103, complement)

Chromosome 14 - NC_000080.6


Transcript information: This gene has 5 transcripts

Gene: ccdc198 ENSMUSG00000021850

Description: coiled-coil domain containing 198 [Source:MGI Symbol;Acc:MGI:1914332]
Gene Synonyms: 1700011H14Rik
Location: Chromosome 14: 49,219,588-49,245,474 reverse strand. GRCm38:CM001007.2
About this gene: This gene has 5 transcripts (splice variants), 132 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name         Transcript ID          bp    Protein  Translation ID        Biotype                  CCDS       UniProt     Flags
ccdc198-205  ENSMUST00000228936.1   3444  294aa    ENSMUSP00000154726.1  Protein coding           CCDS26996  Q9CPZ1      GENCODE basic, APPRIS P2
ccdc198-201  ENSMUST00000022398.14  1349  303aa    ENSMUSP00000022398.8  Protein coding           -          A0A2K6EDK1  TSL:1, GENCODE basic, APPRIS ALT2
ccdc198-204  ENSMUST00000227113.1   798   251aa    ENSMUSP00000153946.1  Protein coding           -          A0A2I3BPW4  CDS 3' incomplete
ccdc198-202  ENSMUST00000130853.1   790   216aa    ENSMUSP00000117775.1  Protein coding           -          A9C480      CDS 3' incomplete, TSL:5
ccdc198-203  ENSMUST00000148109.1   800   65aa     ENSMUSP00000114834.1  Nonsense mediated decay  -          D6RDQ1      TSL:3


[Figure: Ensembl genome browser view of a 45.89 kb region, 49.21 Mb to 49.25 Mb. Forward strand: Gm48935-202 (lincRNA), Gm48935-201 (lincRNA), Gm48936-201 (TEC). Contigs: AC154227.2, CT030637.9. Reverse strand: ccdc198-201 (protein coding), ccdc198-205 (protein coding), ccdc198-203 (nonsense mediated decay), ccdc198-204 (protein coding), ccdc198-202 (protein coding). Regulation legend: CTCF, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000228936

[Figure: protein view of ccdc198-205 (protein coding; reverse strand, 21.35 kb), ENSMUSP00000154..., scale bar 0 to 294. Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Pfam: Protein of unknown function DUF4619; PANTHER: Protein of unknown function DUF4619; sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
