https://www.alphaknockout.com

Mouse Ccdc50 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Ccdc50 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Ccdc50 gene (NCBI Reference Sequence: NM_026202; Ensembl: ENSMUSG00000038127) is located on mouse chromosome 16. Eleven exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 11 (Transcript: ENSMUST00000100026). Exon 3 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Ccdc50 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-239K7 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from about 12.35% of the coding region. The knockout of exon 3 will result in a frameshift of the gene. The size of intron 2 for 5'-loxP site insertion is 941 bp, and the size of intron 3 for 3'-loxP site insertion is 2546 bp. The size of the effective cKO region is ~627 bp. The cKO region does not contain any other known gene.
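The position and frameshift statements in the note can be checked with simple arithmetic. The sketch below is only illustrative: it assumes the coding length can be inferred from the 305 aa protein listed for ENSMUST00000100026 (stop codon excluded), which the report does not state directly, and the exon lengths passed to `causes_frameshift` are hypothetical because the coding length of exon 3 is not given here.

```python
# Assumption: CDS length inferred from the 305 aa protein of ENSMUST00000100026,
# excluding the stop codon. The report does not state the CDS length itself.
PROTEIN_LENGTH_AA = 305
CDS_LENGTH_NT = PROTEIN_LENGTH_AA * 3  # 915 nt

# "Exon 3 starts from about 12.35% of the coding region"
EXON3_START_FRACTION = 0.1235


def approx_cds_offset(fraction: float, cds_length: int) -> int:
    """Approximate number of coding nucleotides lying upstream of a CDS fraction."""
    return round(fraction * cds_length)


def causes_frameshift(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_nt % 3 != 0


upstream_nt = approx_cds_offset(EXON3_START_FRACTION, CDS_LENGTH_NT)
upstream_codons = upstream_nt // 3  # complete codons translated before exon 3

print(f"~{upstream_nt} coding nt (~{upstream_codons} codons) precede exon 3")
# Hypothetical exon lengths, for illustration only:
print(causes_frameshift(100), causes_frameshift(99))
```

Under these assumptions, only a short N-terminal stretch would be translated normally before the frameshift introduced by deleting exon 3.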


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1, 2, 3 ... 11, with the two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Ccdc50, homology arm, cKO region, loxP site.]
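The step from the targeted allele to the constitutive KO allele can be illustrated with a toy string model of Cre recombination: the segment between two loxP sites in the same orientation is excised and a single loxP site remains. This is a minimal sketch, not the project's vector design. The arm and exon sequences are hypothetical placeholders, and `cre_excise` is an illustrative helper name.

```python
# Canonical 34 bp loxP site: 13 bp repeat + 8 bp spacer + 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Remove the segment between the first two same-orientation loxP sites.

    One loxP site is left behind, as after Cre-mediated excision.
    Alleles with fewer than two sites are returned unchanged.
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    return allele[:first] + site + allele[second + len(site):]


# Hypothetical placeholder sequences (not the real Ccdc50 sequences).
left_arm = "AAAACCCC"
exon3 = "GGGTTTGGG"
right_arm = "CCCCAAAA"

targeted_allele = left_arm + LOXP + exon3 + LOXP + right_arm
ko_allele = cre_excise(targeted_allele)

print(len(targeted_allele), len(ko_allele))
```

The model ignores site orientation and inversion events; it only captures the deletion outcome shown in the schematic.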


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of Sequence 12 aligned against itself, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
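A self-comparison of this kind can be sketched with exact word matching. The example below assumes exact matches over a 10 bp window (the report gives the window size but not the scoring scheme of the tool used), and the toy sequence is hypothetical. Off-diagonal forward matches that share one offset indicate a tandem repeat whose unit length equals that offset.

```python
from collections import Counter, defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) start pairs, i < j, whose windows match exactly.

    forward: seq[i:i+window] == seq[j:j+window]
    reverse: seq[i:i+window] == reverse complement of seq[j:j+window]
    """
    starts_by_word = defaultdict(list)
    for i in range(len(seq) - window + 1):
        starts_by_word[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for word, starts in starts_by_word.items():
        forward.extend((i, j) for i in starts for j in starts if i < j)
        rc_starts = starts_by_word.get(reverse_complement(word), [])
        reverse.extend((i, j) for i in starts for j in rc_starts if i < j)
    return sorted(forward), sorted(reverse)


def dominant_offset(pairs):
    """Most frequent diagonal offset (j - i), or None if there are no matches."""
    if not pairs:
        return None
    return Counter(j - i for i, j in pairs).most_common(1)[0][0]


# Hypothetical toy sequence: a 10 bp unit repeated three times between short flanks.
unit = "ACGGTCATTG"
toy = "TTTT" + unit * 3 + "CCCC"
forward_hits, reverse_hits = self_dot_plot(toy, window=10)
print(dominant_offset(forward_hits))
```

Exact matching is a simplification; real dot plot tools usually tolerate mismatches within the window.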

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(7127bp) | A(27.84% 1984) | C(17.92% 1277) | T(34.42% 2453) | G(19.83% 1413)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
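The summary figures can be reproduced from the base counts, and the windowed profile can be sketched in a few lines. The counts below are copied from the summary line. The `step` value and the function name `gc_windows` are assumptions for illustration, since the report states only the 300 bp window size.

```python
# Base counts copied from the Summary line of the report.
COUNTS = {"A": 1984, "C": 1277, "T": 2453, "G": 1413}
TOTAL = sum(COUNTS.values())

# Per-base percentages and overall GC content, rounded to two decimals.
percent = {base: round(100 * n / TOTAL, 2) for base, n in COUNTS.items()}
overall_gc = round(100 * (COUNTS["G"] + COUNTS["C"]) / TOTAL, 2)


def gc_windows(seq: str, window: int = 300, step: int = 50):
    """Return (start, GC percent) for each full window along seq.

    The step size is an assumption; only the window size is given in the report.
    """
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, 100 * gc / window))
    return result


print(TOTAL, percent, overall_gc)
```

The counts sum to the stated full length of 7127 bp, and the overall GC content works out to a little under 38%, consistent with the note that no high GC-content region was found.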


BLAT Search Results (up)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq   3000      1   3000   3000    100.0%  chr16  +        27403402   27406401    3000
YourSeq    147   1234   1423   3000     88.9%  chr5   -       102825121  102825310     190
YourSeq     74   1248   1406   3000     94.2%  chr12  -        55270821   55425692  154872
YourSeq     64   1931   2470   3000     71.5%  chr16  -        24291736   24292086     351
YourSeq     63   2244   2308   3000     98.5%  chr4   -       100567937  100568001      65
YourSeq     62   1355   1458   3000     89.5%  chr10  -        76869506   76869608     103
YourSeq     58   1266   1397   3000     92.6%  chr2   -       121603798  121603933     136
YourSeq     56   1351   1430   3000     87.2%  chr11  -       107116534  107116612      79
YourSeq     56   1356   1459   3000     84.1%  chr1   +       133079271  133079370     100
YourSeq     54   1353   1461   3000     89.8%  chr11  +       101137616  101137864     249
YourSeq     53   1282   1397   3000     84.8%  chr11  +        75558231   75558339     109
YourSeq     52   1351   1419   3000     95.0%  chr7   -        35071999   35072072      74
YourSeq     52   1353   1651   3000     68.4%  chr9   +        54678100   54678265     166
YourSeq     50   1356   1671   3000     93.3%  chr9   +       118362219  118362641     423
YourSeq     50   1353   1458   3000     86.3%  chr17  +        47195535   47195637     103
YourSeq     49   1362   1419   3000     94.9%  chr9   -       102618200  102618264      65
YourSeq     49   1903   2001   3000     78.2%  chr6   -        91278973   91279051      79
YourSeq     49   1352   1430   3000     87.7%  chr10  -        91166238   91166318      81
YourSeq     48   1917   1996   3000     87.5%  chr7   -        35667815   35667897      83
YourSeq     48   1341   1402   3000     94.6%  chr3   +       159856232  159856294      63

Note: The 3000 bp section upstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START    END  QSIZE  IDENTITY  CHROM  STRAND      START        END    SPAN
-------------------------------------------------------------------------------------------
YourSeq   3000      1   3000   3000    100.0%  chr16  +        27407029   27410028    3000
YourSeq    179   2596   2885   3000     86.9%  chr2   -        60595206   60595491     286
YourSeq    176   2609   2905   3000     86.2%  chr8   -        86608165   86608456     292
YourSeq    165   2603   2904   3000     85.8%  chr8   +        88557752   88558054     303
YourSeq    164   2596   2912   3000     84.5%  chr7   +        72750597   72750904     308
YourSeq    163   2603   2871   3000     85.3%  chr16  -        21006715   21006990     276
YourSeq    160   2603   2912   3000     84.9%  chr2   +        45825494   45825800     307
YourSeq    159   2619   2885   3000     87.1%  chr3   -        19155187   19155442     256
YourSeq    159   2609   2895   3000     90.5%  chr10  -        65672772   65673126     355
YourSeq    157   2609   2912   3000     87.7%  chr13  -        14271230   14271562     333
YourSeq    156   2596   2915   3000     91.1%  chr17  +        66591461   66591804     344
YourSeq    155   2511   2885   3000     83.9%  chr2   +        59222354   59222656     303
YourSeq    154   2596   2871   3000     87.2%  chr7   +        44765471   44765745     275
YourSeq    154   2611   2884   3000     86.3%  chr2   +        16424775   16425052     278
YourSeq    151   2596   2871   3000     88.3%  chr8   -        33971292   33971594     303
YourSeq    148   2609   2885   3000     89.3%  chr2   +        43575358   43575635     278
YourSeq    148   2596   2871   3000     84.4%  chr13  +        31703813   31704075     263
YourSeq    147   2634   2912   3000     84.6%  chr3   -       153453226  153453522     297
YourSeq    146   2613   2871   3000     84.4%  chr8   -        34208690   34208945     256
YourSeq    146   2608   2898   3000     84.6%  chr3   -        78133510   78133795     286

Note: The 3000 bp section downstream of Exon 3 is BLAT searched against the genome. No significant similarity is found.
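The self-hits in the two BLAT tables (score 3000, 100.0% identity, chr16, + strand) give the genomic coordinates of the searched flanks, and the gap between them can be compared with the stated cKO region size. The sketch below assumes the two 3000 bp query sections directly abut the cKO region, which the report does not state explicitly. The tuple layout and function names are illustrative.

```python
# Top (self) hits copied from the BLAT tables: (chrom, strand, start, end), 1-based inclusive.
UPSTREAM_HIT = ("chr16", "+", 27403402, 27406401)
DOWNSTREAM_HIT = ("chr16", "+", 27407029, 27410028)


def span(hit) -> int:
    """Length in bp of an inclusive genomic interval."""
    _, _, start, end = hit
    return end - start + 1


def gap_between(upstream, downstream) -> int:
    """Number of bases strictly between two same-strand intervals on one chromosome."""
    if upstream[0] != downstream[0] or upstream[1] != downstream[1]:
        raise ValueError("hits must be on the same chromosome and strand")
    return downstream[2] - upstream[3] - 1


print(span(UPSTREAM_HIT), span(DOWNSTREAM_HIT))
print(gap_between(UPSTREAM_HIT, DOWNSTREAM_HIT))
```

Under that assumption, the interval left between the two flanks matches the ~627 bp effective cKO region given in the strategy note.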


Gene and information: Ccdc50 coiled-coil domain containing 50 [ Mus musculus (house mouse) ] Gene ID: 67501, updated on 12-Aug-2019

Gene summary

Official Symbol: Ccdc50 (provided by MGI)
Official Full Name: coiled-coil domain containing 50 (provided by MGI)
Primary source: MGI:MGI:1914751
See related: Ensembl:ENSMUSG00000038127
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: C3orf6
Expression: Ubiquitous expression in CNS E11.5 (RPKM 15.7), CNS E14 (RPKM 15.0) and 28 other tissues

Genomic context

Location: 16 B2; 16 18.98 cM

Exon count: 14

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  16   NC_000082.6 (27387065..27452218)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    16   NC_000082.5 (27389063..27452304)

Chromosome 16 - NC_000082.6


Transcript information: This gene has 5 transcripts

Gene: Ccdc50 ENSMUSG00000038127

Description: coiled-coil domain containing 50 [Source:MGI Symbol;Acc:MGI:1914751]
Gene Synonyms: 2610529H08Rik, 5730448P06Rik, D16Bwg1543e
Location: Chromosome 16: 27,388,869-27,452,218 forward strand. GRCm38:CM001009.2
About this gene: This gene has 5 transcripts (splice variants), 184 orthologues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt         Flags
Ccdc50-203  ENSMUST00000100026.9   7089  305aa       ENSMUSP00000097604.3  Protein coding  CCDS37305  A6H6M8, Q810U5  TSL:1, GENCODE basic, APPRIS P1
Ccdc50-202  ENSMUST00000096127.10  1096  290aa       ENSMUSP00000093841.4  Protein coding  CCDS37306  Q3TNK7, Q810U5  TSL:1, GENCODE basic
Ccdc50-201  ENSMUST00000039443.13  2192  264aa       ENSMUSP00000038509.7  Protein coding  -          Q810U5          TSL:1, GENCODE basic
Ccdc50-204  ENSMUST00000143823.1   767   256aa       ENSMUSP00000118633.1  Protein coding  -          F6UK66          CDS 5' and 3' incomplete, TSL:5
Ccdc50-205  ENSMUST00000149077.1   226   No protein  -                     lncRNA          -          -               TSL:5

[Figure: Ensembl gene view, 83.35 kb, forward strand, chromosome 16 from 27.38 Mb to 27.46 Mb. Tracks: transcripts Ccdc50-203, Ccdc50-202, Ccdc50-201 and Ccdc50-204 (protein coding) and Ccdc50-205 (lncRNA); contig CT010568.14; Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank, transcription factor binding site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana), non-protein coding (RNA gene).]


Transcript: ENSMUST00000100026

[Figure: Ensembl transcript view of Ccdc50-203 (protein coding), 63.35 kb, forward strand, with protein feature tracks for ENSMUSP00000097...: MobiDB lite, low complexity (Seg), coiled-coils (Ncoils), Pfam (Coiled-coil domain-containing protein 50, N-terminal), PANTHER (Coiled-coil domain-containing protein 50; PTHR22115:SF1), and sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 305.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
