https://www.alphaknockout.com

Mouse Ccdc80 Knockout Project (CRISPR/Cas9)

Objective: To create a Ccdc80 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Ccdc80 gene (NCBI Reference Sequence: NM_026439; Ensembl: ENSMUSG00000022665) is located on mouse chromosome 16. Eight exons are identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 8 (Transcript: ENSMUST00000061050). Exon 2 is selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a null allele exhibit increased adiposity, hyperglycemia, glucose intolerance, impaired insulin secretion, and altered energy intake and expenditure when fed a high-fat diet. Mice homozygous for a different null allele develop thyroid adenomas and ovarian carcinomas.

The coding region starts in exon 2, and exon 2 covers 65.86% of the coding region. The effective KO region is ~1886 bp and does not contain any other known gene.
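The 65.86% figure can be cross-checked against the 949 aa protein length listed for ENSMUST00000061050 in the transcript table. The sketch below is only a sanity check, not the calculation used to prepare this report. It assumes that the coding region is counted with the stop codon included; the function names are illustrative. Under that assumption the coding part of exon 2 comes to about 1877 bp, close to the ~1886 bp effective KO region.

```python
# Sanity check of the exon 2 coding fraction (illustrative sketch only).
PROTEIN_LENGTH_AA = 949          # protein length of Ccdc80-201 from the transcript table
EXON2_CODING_FRACTION = 0.6586   # fraction of the coding region covered by exon 2


def cds_length_bp(protein_aa: int, include_stop: bool = True) -> int:
    """Coding sequence length in bp. Counting the stop codon is an assumption."""
    return 3 * (protein_aa + (1 if include_stop else 0))


def coding_bp_in_exon(fraction: float, cds_bp: int) -> float:
    """Number of coding bases that fall inside the exon."""
    return fraction * cds_bp


cds_bp = cds_length_bp(PROTEIN_LENGTH_AA)
exon2_coding_bp = coding_bp_in_exon(EXON2_CODING_FRACTION, cds_bp)
print(cds_bp, round(exon2_coding_bp))
```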


Overview of the Targeting Strategy

[Figure: wildtype allele, 5' to 3', with two gRNA regions and exons 1, 2 and 8 labeled. Legend: exon of mouse Ccdc80; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, with forward and reverse-complement matches.]

Note: The 446 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, with forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
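The self-alignment behind the dot plots can be sketched as follows. This is a minimal illustration, not the tool used for this report: it records only exact forward-strand matches of a 15 bp window (the reverse-complement comparison is omitted), and the `max_offset` cutoff used to call a tandem repeat is a hypothetical parameter.

```python
def dot_plot_matches(seq: str, window: int = 15):
    """Return sorted (i, j) pairs, i < j, whose window-length substrings are identical."""
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - window + 1):
        positions.setdefault(seq[i:i + window], []).append(i)
    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


def has_tandem_repeat(seq: str, window: int = 15, max_offset: int = 100) -> bool:
    """Flag off-diagonal matches that lie close to the main diagonal.

    max_offset is a hypothetical cutoff chosen for illustration.
    """
    return any(j - i <= max_offset for i, j in dot_plot_matches(seq, window))


repeat_seq = "ACGTTGCA" * 4                      # 8 bp unit repeated in tandem
unique_seq = "ACGTAGCTAGGATCCGATTACGGCTAAGCTTGCA"  # no repeated 15-mer
print(has_tandem_repeat(repeat_seq), has_tandem_repeat(unique_seq))
```

Each match is one dot of the plot; runs of dots parallel to the main diagonal at a small offset correspond to the tandem repeats the note refers to.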


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(446bp) | A(20.63% 92) | C(25.34% 113) | T(36.1% 161) | G(17.94% 80)

Note: The 446 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(26.3% 526) | C(19.3% 386) | T(29.75% 595) | G(24.65% 493)

Note: The 2000 bp section downstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
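The overall GC content of each flanking region follows directly from the base counts in the two summaries. The sketch below recomputes it and also shows one way a 300 bp sliding-window profile could be calculated. The function names and the `step` parameter are illustrative assumptions, and the report does not state which tool produced its plots.

```python
def gc_percent_from_counts(c: int, g: int, total: int) -> float:
    """GC content in percent from base counts."""
    return 100.0 * (c + g) / total


def sliding_gc(seq: str, window: int = 300, step: int = 1):
    """GC percent for each window of the sequence (step is an illustrative parameter)."""
    seq = seq.upper()
    profile = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        profile.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return profile


# Base counts taken from the two summaries in this report.
gc_up = gc_percent_from_counts(c=113, g=80, total=446)
gc_down = gc_percent_from_counts(c=386, g=493, total=2000)
print(round(gc_up, 2), round(gc_down, 2))
```

Both regions come out at roughly 43 to 44% GC overall, consistent with the notes that no significant high GC-content region is found.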


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  446    1      446   446    100.0%    chr16  +       45094438   45094883   446
YourSeq  30     240    278   446    74.2%     chr12  +       41092558   41092588   31
YourSeq  29     359    392   446    94.0%     chr10  -       36573269   36573315   47
YourSeq  25     360    388   446    88.9%     chr11  -       43479304   43479331   28
YourSeq  25     358    383   446    100.0%    chr1   +       88081571   88081611   41
YourSeq  24     284    312   446    81.5%     chr3   +       141580951  141580977  27
YourSeq  22     2      25    446    87.0%     chr14  +       62962614   62962636   23
YourSeq  21     38     58    446    100.0%    chr15  -       22333169   22333189   21
YourSeq  21     387    409   446    95.7%     chr13  -       77806284   77806306   23
YourSeq  21     386    406   446    100.0%    chr1   -       11227292   11227312   21
YourSeq  21     394    414   446    100.0%    chr1   +       29603312   29603332   21
YourSeq  20     385    404   446    100.0%    chr10  -       100107310  100107329  20
YourSeq  20     180    199   446    100.0%    chr13  +       30001091   30001110   20
YourSeq  20     257    278   446    95.5%     chr1   +       64281681   64281702   22

Note: The 446 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr16  +       45096759   45098758   2000
YourSeq  55     95     196   2000   88.8%     chr18  -       11019753   11020062   310
YourSeq  55     131    219   2000   77.4%     chr10  -       119000570  119000650  81
YourSeq  55     124    206   2000   88.9%     chr16  +       24928093   24928192   100
YourSeq  54     114    196   2000   83.1%     chr10  +       70768689   70768768   80
YourSeq  51     133    232   2000   75.4%     chr13  +       34341106   34341194   89
YourSeq  50     142    231   2000   76.6%     chr11  +       47656752   47656828   77
YourSeq  45     130    196   2000   83.6%     chr19  -       29669089   29669155   67
YourSeq  45     162    231   2000   78.2%     chr11  -       80834503   80834561   59
YourSeq  45     69     196   2000   66.7%     chr1   -       24162080   24162164   85
YourSeq  45     125    196   2000   78.3%     chr12  +       70104644   70104713   70
YourSeq  44     120    200   2000   72.4%     chr12  +       8226392    8226468    77
YourSeq  43     138    219   2000   74.6%     chr4   -       41590438   41590507   70
YourSeq  43     138    200   2000   84.2%     chr3   +       71568426   71568488   63
YourSeq  43     157    231   2000   74.6%     chr2   +       152775021  152775084  64
YourSeq  41     157    231   2000   75.6%     chr19  +       26468576   26468637   62
YourSeq  41     130    192   2000   82.6%     chr13  +       11833124   11833186   63
YourSeq  40     168    231   2000   80.9%     chr17  -       80015995   80016053   59
YourSeq  40     138    193   2000   85.8%     chr12  -       80936231   80936286   56
YourSeq  40     129    196   2000   79.5%     chr11  +       86522945   86523012   68

Note: The 2000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
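One way to make "no significant similarity" concrete is to compare each secondary hit's score with the query size. The sketch below parses whitespace-separated BLAT rows in the column order shown above and reports any hit other than the best one whose score reaches a chosen fraction of the query length. The 0.5 threshold and all names are hypothetical and are not taken from this report.

```python
from typing import List, NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one row: QUERY SCORE START END QSIZE IDENTITY CHROM STRAND START END SPAN."""
    f = line.split()
    return BlatHit(int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def off_target_hits(hits: List[BlatHit], min_score_fraction: float = 0.5):
    """Hits other than the best one whose score is a large fraction of the query size.

    min_score_fraction is a hypothetical threshold chosen for illustration.
    """
    best = max(hits, key=lambda h: h.score)
    return [h for h in hits
            if h is not best and h.score >= min_score_fraction * h.q_size]


# First three rows of the downstream BLAT table.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr16 + 45096759 45098758 2000",
    "YourSeq 55 95 196 2000 88.8% chr18 - 11019753 11020062 310",
    "YourSeq 55 131 219 2000 77.4% chr10 - 119000570 119000650 81",
]
hits = [parse_hit(r) for r in rows]
print(off_target_hits(hits))
```

With these rows the only high-scoring hit is the query's own locus on chr16, so the filter returns an empty list.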


Gene and information: Ccdc80 coiled-coil domain containing 80 [ Mus musculus (house mouse) ] Gene ID: 67896, updated on 17-Sep-2019

Gene summary

Official Symbol: Ccdc80 (provided by MGI)
Official Full Name: coiled-coil domain containing 80 (provided by MGI)
Primary source: MGI:MGI:1915146
See related: Ensembl:ENSMUSG00000022665
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Urb; DRO1; Ssg1; 2610001E17Rik
Expression: Biased expression in subcutaneous fat pad adult (RPKM 96.6), mammary gland adult (RPKM 61.1) and 12 other tissues

Genomic context

Location: 16; 16 B5

Exon count: 10

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  16   NC_000082.6 (45093407..45127924)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    16   NC_000082.5 (45094166..45128037)

[Figure: genomic context of Ccdc80 on chromosome 16 (NC_000082.6).]


Transcript information: This gene has 6 transcripts

Gene: Ccdc80 ENSMUSG00000022665

Description: coiled-coil domain containing 80 [Source:MGI Symbol;Acc:MGI:1915146]
Gene Synonyms: 2610001E17Rik, DRO1, Ssg1, Urb
Location: Chromosome 16: 45,093,402-45,128,077 forward strand. GRCm38:CM001009.2
About this gene: This gene has 6 transcripts (splice variants), 97 orthologues, is a member of 1 Ensembl protein family and is associated with 20 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Ccdc80-202  ENSMUST00000099498.9  4083  949aa       ENSMUSP00000097097.2  Protein coding  CCDS28192  Q8R2G6   TSL:1 GENCODE basic APPRIS P1
Ccdc80-201  ENSMUST00000061050.5  3648  949aa       ENSMUSP00000058752.5  Protein coding  CCDS28192  Q8R2G6   TSL:1 GENCODE basic APPRIS P1
Ccdc80-203  ENSMUST00000134924.1  788   No protein  -                     lncRNA          -          -        TSL:3
Ccdc80-204  ENSMUST00000138048.7  535   No protein  -                     lncRNA          -          -        TSL:3
Ccdc80-206  ENSMUST00000155800.1  411   No protein  -                     lncRNA          -          -        TSL:3
Ccdc80-205  ENSMUST00000139509.1  363   No protein  -                     lncRNA          -          -        TSL:2

[Figure: Ensembl genome browser view of the Ccdc80 locus (54.68 kb, 45.09Mb to 45.13Mb, forward and reverse strands) showing transcripts Ccdc80-201 and Ccdc80-202 (protein coding) and Ccdc80-203, Ccdc80-204, Ccdc80-205 and Ccdc80-206 (lncRNA), contigs AC124600.5 and AC129601.4, and the Regulatory Build track. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: Ensembl protein coding; RNA gene (non-protein coding).]


Transcript: ENSMUST00000061050

[Figure: Ensembl protein view of Ccdc80-201 (protein coding; 33.87 kb, forward strand) for ENSMUSP00000058..., with tracks for MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), cleavage site (Sign...), Pfam domain of unknown function DUF4174, PANTHER PTHR46792 and PTHR46792:SF2, and sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 949.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
