https://www.alphaknockout.com

Mouse Notum Knockout Project (CRISPR/Cas9)

Objective: To create a Notum knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Notum gene (NCBI Reference Sequence: NM_175263; Ensembl: ENSMUSG00000042988) is located on mouse chromosome 11. Twelve exons are identified, with the ATG start codon in exon 2 and the TAG stop codon in exon 12 (Transcript: ENSMUST00000106178). Exons 2-12 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Mice homozygous for a null mutation display perinatal lethality, abnormal kidney development, and impaired tracheal cartilage development. Mice homozygous for a gene-trapped allele exhibit abnormal dentin development, periodontal inflammation, tooth decay and increased bone mineral density.

Exon 2 starts at about 0.07% of the coding region. Exons 2-12 cover 100.0% of the coding region. The size of the effective KO region is ~6114 bp. The KO region does not contain any other known gene.
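The 0.07% figure can be sanity-checked with a short calculation. The sketch below rests on assumptions that are not stated in this report: that the percentage is the 1-based position of the exon's first coding base divided by the coding-sequence length, and that the coding sequence is the 503 codons of the protein listed for ENSMUST00000106178 plus one stop codon. The function name is hypothetical.

```python
def percent_into_cds(first_coding_base, cds_length):
    """Position of a 1-based coding base as a percentage of the CDS length."""
    if not 1 <= first_coding_base <= cds_length:
        raise ValueError("position must lie within the coding sequence")
    return 100.0 * first_coding_base / cds_length


# Assumption: CDS length = 503 codons + 1 stop codon (not given in the report).
CDS_LENGTH = (503 + 1) * 3

# The ATG start codon is in exon 2, so the coding part of exon 2 begins at base 1.
start_pct = percent_into_cds(1, CDS_LENGTH)
print(f"Exon 2 starts at about {start_pct:.2f}% of the coding region")
```

Under these assumptions the first coding base sits at roughly 0.07% of the coding region, which agrees with the value quoted above.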


Overview of the Targeting Strategy

[Figure: diagram of the wildtype allele (5' to 3') showing exons 1-12 of mouse Notum, the two gRNA regions, and the knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self-alignment dot plot of the upstream sequence (axes labeled "Sequence 12"); legend: Forward, Reverse Complement.]

Note: The 2000 bp section upstream of start codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
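A self-alignment dot plot of this kind can be sketched in a few lines. The example below is only an illustration, not the tool used to produce the report: it marks every pair of positions whose 15 bp words are identical (forward strand only) and tallies the hits by diagonal offset, since a tandem repeat appears as many hits at one small offset. The function names and the toy sequence are hypothetical.

```python
from collections import defaultdict


def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, whose window-sized words are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def repeat_offsets(hits):
    """Count off-diagonal hits per offset (j - i)."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


# Toy input (not the Notum sequence): four copies of an 8 bp unit, then unique sequence.
toy = "ACGTTGCA" * 4 + "GATTACAGGCTTAACCGGTT"
hits = self_dot_plot(toy, window=15)
offsets = repeat_offsets(hits)
print(offsets)
```

In the toy input the dominant offset equals the length of the repeated unit; a sequence without repeats yields no off-diagonal hits at all.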

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self-alignment dot plot of the downstream sequence (axes labeled "Sequence 12"); legend: Forward, Reverse Complement.]

Note: The 2000 bp section downstream of stop codon is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot of the upstream sequence (axis labeled "Sequence 12").]

Summary: Full Length(2000bp) | A(18.15% 363) | C(31.5% 630) | T(20.1% 402) | G(30.25% 605)

Note: The 2000 bp section upstream of start codon is analyzed to determine the GC content. Significant high GC-content regions are found. The gRNA site is selected outside of these high GC-content regions.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot of the downstream sequence (axis labeled "Sequence 12").]

Summary: Full Length(2000bp) | A(30.45% 609) | C(23.05% 461) | T(25.65% 513) | G(20.85% 417)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
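The overall GC content implied by the two base-count summaries, and a windowed GC profile of the kind plotted above, can be computed as follows. This is a minimal sketch: the function names are hypothetical, the toy sequence is not the Notum flank, and the report does not say which tool or step size produced its plots.

```python
def gc_percent(counts):
    """GC percentage from a dict of base counts with keys A, C, G, T."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


def sliding_gc(seq, window=300, step=1):
    """Return (start, GC%) for each full window, using prefix sums of G/C counts."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    prefix = [0]  # prefix[i] = number of G or C bases in seq[:i]
    for base in seq:
        prefix.append(prefix[-1] + (base in "GC"))
    return [
        (start, 100.0 * (prefix[start + window] - prefix[start]) / window)
        for start in range(0, len(seq) - window + 1, step)
    ]


# Base counts copied from the two Summary lines of this report.
UPSTREAM = {"A": 363, "C": 630, "T": 402, "G": 605}
DOWNSTREAM = {"A": 609, "C": 461, "T": 513, "G": 417}
up_gc = gc_percent(UPSTREAM)
down_gc = gc_percent(DOWNSTREAM)
print(f"upstream GC: {up_gc:.2f}%, downstream GC: {down_gc:.2f}%")

# Toy sequence to show the windowed profile.
toy_profile = sliding_gc("GC" * 5 + "AT" * 5, window=4, step=4)
```

The counts give about 61.75% GC upstream and 43.90% GC downstream, consistent with high GC-content regions being reported only for the upstream section.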


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       120660339  120662338  2000
YourSeq  35     168    203   2000   100.0%    chr15  +       22282830   22282867   38
YourSeq  34     171    204   2000   100.0%    chr14  +       27156640   27156673   34
YourSeq  32     171    202   2000   100.0%    chr2   -       32203871   32203902   32
YourSeq  29     520    554   2000   96.8%     chr7   -       94495654   94495689   36
YourSeq  27     562    590   2000   96.6%     chr11  -       85721726   85721754   29
YourSeq  26     522    547   2000   100.0%    chr16  -       71816869   71816894   26
YourSeq  25     562    588   2000   96.3%     chr6   -       145133830  145133856  27
YourSeq  25     524    549   2000   100.0%    chr10  -       108958439  108958466  28
YourSeq  25     644    680   2000   67.9%     chr1   -       85874159   85874186   28
YourSeq  25     562    590   2000   93.2%     chr8   +       106998116  106998144  29
YourSeq  24     532    555   2000   100.0%    chr3   +       93045485   93045508   24
YourSeq  24     520    545   2000   88.0%     chr18  +       72325545   72325569   25
YourSeq  23     526    549   2000   100.0%    chr1   -       29180733   29180759   27
YourSeq  23     562    584   2000   100.0%    chr9   +       21928086   21928108   23
YourSeq  23     562    584   2000   100.0%    chr19  +       8923395    8923417    23
YourSeq  22     562    583   2000   100.0%    chr15  -       78812961   78812982   22
YourSeq  22     564    585   2000   100.0%    chr16  +       21196675   21196696   22
YourSeq  21     564    584   2000   100.0%    chr7   -       3295291    3295311    21
YourSeq  21     564    584   2000   100.0%    chr16  -       91160606   91160626   21

Note: The 2000 bp section upstream of start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr11  -       120652223  120654222  2000
YourSeq  339    811    1899  2000   92.3%     chr5   -       100504510  100876663  372154
YourSeq  311    696    1137  2000   87.9%     chr11  -       61559918   61560330   413
YourSeq  281    695    1104  2000   92.0%     chr2   -       181704294  181705023  730
YourSeq  277    694    1107  2000   88.8%     chr1   +       131985016  131985404  389
YourSeq  275    721    1103  2000   92.6%     chr11  +       75579740   75710723   130984
YourSeq  274    716    1112  2000   91.5%     chr18  -       46333521   46913010   579490
YourSeq  272    736    1111  2000   90.8%     chr10  -       128449390  128450276  887
YourSeq  267    721    1103  2000   88.3%     chr14  -       40987553   40987929   377
YourSeq  263    721    1114  2000   87.4%     chr12  +       84950251   84950633   383
YourSeq  261    702    1114  2000   86.1%     chr5   -       146917975  146918371  397
YourSeq  259    720    1107  2000   84.2%     chr13  -       93182179   93182535   357
YourSeq  251    701    1105  2000   87.2%     chr1   +       74173845   74174357   513
YourSeq  243    764    1113  2000   87.6%     chr19  -       32690923   32691581   659
YourSeq  239    1419   1964  2000   93.4%     chr17  +       26426513   26966489   539977
YourSeq  233    721    1113  2000   90.1%     chr14  -       13992214   13992767   554
YourSeq  232    828    1175  2000   90.9%     chr9   +       62048116   62048593   478
YourSeq  226    1009   1591  2000   93.8%     chr11  -       87010251   87011066   816
YourSeq  219    765    1113  2000   88.4%     chr15  +       81701326   81701864   539
YourSeq  210    923    1573  2000   84.8%     chr9   -       55183484   55183902   419

Note: The 2000 bp section downstream of stop codon is BLAT searched against the genome. No significant similarity is found.
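Rows of a BLAT result listing like the ones above can be parsed and ranked programmatically. The sketch below is an illustration only: the field names, the helper functions, and the rule used to recognise the self hit (score equal to the query size at 100% identity) are assumptions, and it makes no claim about what score the report treats as significant. The sample rows are the first three rows of each listing in this report.

```python
from collections import namedtuple

BlatHit = namedtuple(
    "BlatHit",
    "query score q_start q_end q_size identity chrom strand t_start t_end span",
)


def parse_blat_row(row):
    """Parse one whitespace-separated BLAT result row into a BlatHit."""
    f = row.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}: {row!r}")
    return BlatHit(
        f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
        float(f[5].rstrip("%")), f[6], f[7], int(f[8]), int(f[9]), int(f[10]),
    )


def best_off_target(hits):
    """Return the highest-scoring hit that is not the full-length self hit."""
    others = [h for h in hits if not (h.score == h.q_size and h.identity == 100.0)]
    return max(others, key=lambda h: h.score) if others else None


UPSTREAM_ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr11 - 120660339 120662338 2000",
    "YourSeq 35 168 203 2000 100.0% chr15 + 22282830 22282867 38",
    "YourSeq 34 171 204 2000 100.0% chr14 + 27156640 27156673 34",
]
DOWNSTREAM_ROWS = [
    "YourSeq 2000 1 2000 2000 100.0% chr11 - 120652223 120654222 2000",
    "YourSeq 339 811 1899 2000 92.3% chr5 - 100504510 100876663 372154",
    "YourSeq 311 696 1137 2000 87.9% chr11 - 61559918 61560330 413",
]

up_best = best_off_target([parse_blat_row(r) for r in UPSTREAM_ROWS])
down_best = best_off_target([parse_blat_row(r) for r in DOWNSTREAM_ROWS])
print(up_best.chrom, up_best.score, down_best.chrom, down_best.score)
```

Ranking the secondary hits this way makes it easy to compare the upstream and downstream flanks against whatever threshold a given project adopts.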


Gene and information: Notum

notum palmitoleoyl-protein carboxylesterase [Mus musculus (house mouse)]
Gene ID: 77583, updated on 14-Oct-2019

Gene summary

Official Symbol: Notum (provided by MGI)
Official Full Name: notum palmitoleoyl-protein carboxylesterase (provided by MGI)
Primary source: MGI:MGI:1924833
See related: Ensembl:ENSMUSG00000042988
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: 5730593N15Rik
Expression: Biased expression in liver adult (RPKM 11.4), kidney adult (RPKM 4.2) and 10 other tissues
Orthologs: human

Genomic context

Location: 11; 11 E2
Exon count: 14

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  11   NC_000077.6 (120653788..120661809, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    11   NC_000077.5 (120515102..120522151, complement)

Chromosome 11 - NC_000077.6


Transcript information: This gene has 4 transcripts

Gene: Notum ENSMUSG00000042988

Description: notum palmitoleoyl-protein carboxylesterase [Source:MGI Symbol;Acc:MGI:1924833]
Gene Synonyms: 5730593N15Rik
Location: Chromosome 11: 120,653,788-120,661,175 reverse strand. GRCm38:CM001004.2
About this gene: This gene has 4 transcripts (splice variants), 242 orthologues, is a member of 1 Ensembl protein family and is associated with 16 phenotypes.

Transcripts

Name       Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt  Flags
Notum-202  ENSMUST00000106177.7  2212  503aa       ENSMUSP00000101783.1  Protein coding   CCDS49007  Q8R116   TSL:5 GENCODE basic APPRIS P1
Notum-203  ENSMUST00000106178.8  2012  503aa       ENSMUSP00000101784.2  Protein coding   CCDS49007  Q8R116   TSL:1 GENCODE basic APPRIS P1
Notum-204  ENSMUST00000150458.1  728   215aa       ENSMUSP00000122788.1  Protein coding   -          B7ZCA6   CDS 3' incomplete TSL:3
Notum-201  ENSMUST00000055439.3  1433  No protein  -                     Retained intron  -          -        TSL:1

[Figure: Ensembl genome browser view of the Notum locus on chromosome 11, 120.65-120.67 Mb (27.39 kb; contig AL663030.12). Forward strand: Myadml2os-201 (lncRNA) and Notumos-201 (lncRNA). Reverse strand: Myadml2-201 (protein coding), Myadml2-203 (protein coding), Myadml2-202 (lncRNA), Notum-203 (protein coding), Notum-202 (protein coding), Notum-201 (retained intron) and Notum-204 (protein coding). Regulatory Build legend: CTCF, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript).]


Transcript: ENSMUST00000106178

[Figure: Ensembl protein view of Notum-203 (protein coding; reverse strand, 7.04 kb), translation ENSMUSP00000101..., scale 0-503 aa. Tracks: MobiDB lite, Low complexity (Seg), Cleavage site (Sign..., Pfam (Pectinacetylesterase/NOTUM), PANTHER (PTHR21562:SF7; Pectinacetylesterase/NOTUM), and sequence variants (dbSNP and all other sources). Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
