https://www.alphaknockout.com

Mouse Fam20a Knockout Project (CRISPR/Cas9)

Objective: To create a Fam20a knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Fam20a gene (NCBI Reference Sequence: NM_153782; Ensembl: ENSMUSG00000020614) is located on mouse chromosome 11. Eleven exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 11 (Transcript: ENSMUST00000020938). Exons 2~8 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit abnormal ameloblast morphology, disrupted dental enamel formation in both incisor and molar teeth, abnormal kidney morphology, disseminated calcifications of muscular arteries, and intrapulmonary calcifications.

Exon 2 starts at about 24.95% of the coding region. Exons 2~8 cover 50.22% of the coding region. The size of the effective KO region is ~9685 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with exon labels 1 2 3 4 5 6 7 8 11 and two gRNA regions marked. Legend: exon of mouse Fam20a; knockout region.]


Overview of the Dot Plot (up) Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement. Axis label as extracted: Sequence 12.]

Note: The 2000 bp section upstream of Exon 2 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement. Axis label as extracted: Sequence 12.]

Note: The 655 bp section downstream of Exon 8 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
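The self-alignment described in the dot plot notes can be sketched in a few lines. This is a minimal illustration, not the tool used to produce the plots: the function names `reverse_complement` and `self_dot_plot` are hypothetical, and the matching rule (exact identity over a 15 bp window) is an assumption based only on the stated window size.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def self_dot_plot(seq, window=15):
    """Compare a sequence with itself using exact window matches.

    Returns two lists of (i, j) window start positions:
    forward matches (same window at i and j, with j > i) and
    reverse-complement matches (window at j equals the reverse
    complement of the window at i, with j >= i).
    """
    seq = seq.upper()
    last_start = len(seq) - window
    positions = defaultdict(list)
    for i in range(last_start + 1):
        positions[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(last_start + 1):
        kmer = seq[i:i + window]
        # Off-diagonal forward dots indicate direct repeats.
        forward.extend((i, j) for j in positions.get(kmer, []) if j > i)
        # Reverse-complement dots indicate inverted repeats.
        rc = reverse_complement(kmer)
        reverse.extend((i, j) for j in positions.get(rc, []) if j >= i)
    return forward, reverse

# Toy input: a 15 bp unit repeated twice in tandem (made-up sequence).
unit = "ACGGTCATTGCAGTA"
fwd, rev = self_dot_plot("TTTT" + unit * 2 + "CCCC", window=15)
print(fwd)
```

A region with no off-diagonal dots in either list has no exact repeats at this window size, which is the condition the notes use to call a flank suitable for PCR screening.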


Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution plot. Axis label as extracted: Sequence 12.]

Summary: Full Length(2000bp) | A(26.8% 536) | C(23.0% 460) | T(25.75% 515) | G(24.45% 489)

Note: The 2000 bp section upstream of Exon 2 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution plot. Axis label as extracted: Sequence 12.]

Summary: Full Length(655bp) | A(24.43% 160) | C(27.63% 181) | T(20.0% 131) | G(27.94% 183)

Note: The 655 bp section downstream of Exon 8 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
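The overall GC content of each flank follows directly from the base counts in the two summary lines. The sketch below recomputes it and also shows one way a 300 bp sliding-window profile could be calculated. The function names are hypothetical, and the step size of the original plots is not stated, so `step` is an assumption.

```python
def gc_fraction_from_counts(a, c, t, g):
    """GC fraction from per-base counts."""
    total = a + c + t + g
    if total == 0:
        raise ValueError("empty sequence")
    return (g + c) / total

def sliding_gc(seq, window=300, step=1):
    """GC fraction of each window along seq (step size is an assumption)."""
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions

# Base counts copied from the two summary lines.
upstream_gc = gc_fraction_from_counts(a=536, c=460, t=515, g=489)
downstream_gc = gc_fraction_from_counts(a=160, c=181, t=131, g=183)
print(f"upstream GC: {upstream_gc:.2%}")      # 47.45%
print(f"downstream GC: {downstream_gc:.2%}")  # 55.57%
```

Both values are overall averages; a local high-GC stretch would only show up in the windowed profile, which needs the actual sequences rather than the counts.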


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       109685528  109687527  2000
YourSeq  176    538    858   2000   88.4%     chr13  -       66893666   66893996   331
YourSeq  172    538    841   2000   89.0%     chr12  +       24467301   24467614   314
YourSeq  166    533    853   2000   89.4%     chr8   -       36295684   36296021   338
YourSeq  166    538    842   2000   88.9%     chr14  -       109468284  109468592  309
YourSeq  164    567    797   2000   89.1%     chr2   +       102262488  102262745  258
YourSeq  161    533    806   2000   90.5%     chr10  -       70895807   71053897   158091
YourSeq  161    538    847   2000   87.1%     chr2   +       168245540  168245863  324
YourSeq  155    533    842   2000   85.8%     chr1   -       186401511  186401834  324
YourSeq  153    538    797   2000   89.7%     chr7   -       66474545   66474830   286
YourSeq  153    538    802   2000   87.5%     chr2   +       52350037   52350327   291
YourSeq  153    554    842   2000   87.8%     chr19  +       58359455   58359757   303
YourSeq  151    538    842   2000   87.7%     chr15  +       56159560   56159870   311
YourSeq  149    538    842   2000   89.5%     chr12  -       118185999  118186317  319
YourSeq  148    538    842   2000   88.6%     chr18  -       45889349   45889667   319
YourSeq  147    533    794   2000   87.8%     chr6   +       141116408  141116669  262
YourSeq  147    538    792   2000   88.7%     chr3   +       90304410   90304702   293
YourSeq  146    544    797   2000   90.2%     chr13  -       116461169  116461425  257
YourSeq  145    533    798   2000   87.1%     chr12  -       83892289   83892549   261
YourSeq  145    538    791   2000   87.3%     chr10  +       60866005   60866281   277

Note: The 2000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
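One way to quantify how much of the query each hit covers is to parse the result rows and compute query coverage. The sketch below is illustrative only: `BlatHit`, `parse_hit` and `query_coverage` are hypothetical names, and the row layout is assumed to be the eleven whitespace-separated columns of the header shown above, optionally preceded by the "browser details" link text.

```python
from typing import NamedTuple

class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int

def parse_hit(line):
    """Parse one BLAT result row into a BlatHit."""
    fields = line.split()
    # Drop the leading link text if it survived extraction.
    if fields[:2] == ["browser", "details"]:
        fields = fields[2:]
    _query, score, qs, qe, qsize, ident, chrom, strand, ts, te, span = fields
    return BlatHit(int(score), int(qs), int(qe), int(qsize),
                   float(ident.rstrip("%")), chrom, strand,
                   int(ts), int(te), int(span))

def query_coverage(hit):
    """Fraction of the query covered by the aligned query interval."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size

# Two rows copied from the upstream BLAT table.
rows = [
    "browser details YourSeq 2000 1 2000 2000 100.0% chr11 - 109685528 109687527 2000",
    "browser details YourSeq 176 538 858 2000 88.4% chr13 - 66893666 66893996 331",
]
hits = [parse_hit(r) for r in rows]
for h in hits:
    print(h.chrom, h.score, f"{query_coverage(h):.1%}")
```

In the table above, every secondary hit aligns within query positions 533 to 858, so each covers only a small part of the 2000 bp query. What threshold counts as "significant" is a judgment the report does not state.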

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
--------------------------------------------------------------------------------------
YourSeq  655    1      655  655    100.0%    chr11  -       109675188  109675842  655
YourSeq  27     578    608  655    86.7%     chr17  -       36228357   36228386   30
YourSeq  25     532    558  655    96.3%     chr9   +       103249625  103249651  27
YourSeq  23     532    556  655    87.5%     chr14  +       34292721   34292744   24
YourSeq  23     532    556  655    87.5%     chr1   +       35787581   35787604   24
YourSeq  22     575    599  655    95.9%     chr14  -       79624064   79624093   30
YourSeq  21     488    508  655    100.0%    chr2   +       155650391  155650411  21
YourSeq  20     407    426  655    100.0%    chr11  +       53873239   53873258   20

Note: The 655 bp section downstream of Exon 8 is BLAT searched against the genome. No significant similarity is found.
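The chr11 coordinates of the two perfect-match BLAT hits can be used as a cross-check on the stated effective KO region size of ~9685 bp. The sketch below assumes 1-based inclusive coordinates (consistent with each hit's SPAN equalling END - START + 1) and assumes that the KO region is the interval lying strictly between the two flanking sections; the actual deletion endpoints depend on the gRNA cut sites, which are not given here. `gap_between` is a hypothetical helper name.

```python
def gap_between(lower_block, upper_block):
    """Number of bases strictly between two inclusive intervals."""
    lower_start, lower_end = lower_block
    upper_start, upper_end = upper_block
    if upper_start <= lower_end:
        raise ValueError("intervals overlap or are in the wrong order")
    return upper_start - lower_end - 1

# chr11 coordinates of the 100% identity hits (minus strand gene, so the
# section downstream of exon 8 has the lower genomic coordinates).
downstream_flank = (109675188, 109675842)  # 655 bp section
upstream_flank = (109685528, 109687527)    # 2000 bp section

ko_size = gap_between(downstream_flank, upstream_flank)
print(ko_size)  # 9685
```

Under these assumptions the gap between the flanks matches the ~9685 bp figure given in the strategy summary.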


Gene and information:

Fam20a family with sequence similarity 20, member A [Mus musculus (house mouse)]
Gene ID: 208659, updated on 12-Aug-2019

Gene summary

Official Symbol: Fam20a (provided by MGI)
Official Full Name: family with sequence similarity 20, member A (provided by MGI)
Primary source: MGI:MGI:2388266
See related: Ensembl:ENSMUSG00000020614
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: AI606893
Expression: Ubiquitous expression in testis adult (RPKM 14.5), duodenum adult (RPKM 11.5) and 26 other tissues
Orthologs: human; all

Genomic context

Location: 11; 11 E1
Exon count: 12

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (109669746..109723163, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (109534240..109583570, complement)

Chromosome 11 - NC_000077.6


Transcript information: This gene has 4 transcripts

Gene: Fam20a ENSMUSG00000020614

Description: family with sequence similarity 20, member A [Source:MGI Symbol;Acc:MGI:2388266]
Location: Chromosome 11: 109,669,749-109,722,279 reverse strand. GRCm38:CM001004.2
About this gene: This gene has 4 transcripts (splice variants), 209 orthologues, 2 paralogues, is a member of 1 Ensembl protein family and is associated with 27 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Fam20a-201  ENSMUST00000020938.7  2541  541aa       ENSMUSP00000020938.7  Protein coding           CCDS25584  Q8CID3   TSL:1 GENCODE basic APPRIS P1
Fam20a-204  ENSMUST00000155559.7  3085  541aa       ENSMUSP00000116687.1  Nonsense mediated decay  -          Q8CID3   TSL:1
Fam20a-203  ENSMUST00000146408.7  828   No protein  -                     Retained intron          -          -        TSL:3
Fam20a-202  ENSMUST00000144972.1  794   No protein  -                     Retained intron          -          -        TSL:3

[Figure: Ensembl genome browser view of a 72.53 kb region of chromosome 11 (109.66 Mb to 109.72 Mb). Forward strand: Prkar1a-201, Prkar1a-202 and Prkar1a-203 (protein coding), Prkar1a-205 (lncRNA). Contigs: AL732387.10, AL691448.24. Reverse strand: Fam20a-201 (protein coding), Fam20a-202 and Fam20a-203 (retained intron), Fam20a-204 (nonsense mediated decay). Additional track: Regulatory Build. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript, RNA gene).]


Transcript: ENSMUST00000020938

[Figure: protein feature view of Fam20a-201 (protein coding), reverse strand, 49.33 kb. Tracks as extracted: ENSMUSP00000020..., Transmembrane heli..., Low complexity (Seg), Pfam (FAM20, C-terminal), PANTHER (FAM20; PTHR12450:SF12), sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant. Scale bar: 0 to 541.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
