https://www.alphaknockout.com

Mouse Gm44504 Knockout Project (CRISPR/Cas9)

Objective: To create a Gm44504 knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Gm44504 gene (NCBI Reference Sequence: NM_001278271; Ensembl: ENSMUSG00000015290) is located on mouse chromosome X. Seven exons are identified, with the ATG start codon in exon 4 and the TAG stop codon in exon 7 (Transcript: ENSMUST00000178691). Exon 6 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 6 starts from about 32.91% of the coding region. Exon 6 covers 44.37% of the coding region. The size of the effective KO region: ~209 bp. The KO region does not contain any other known gene.
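The percentages in this note can be checked with simple arithmetic. The sketch below assumes a coding sequence of 471 bp (157 codons x 3 nt, stop codon excluded); the 157 aa protein length is taken from the transcript table of this report, while the 471 bp figure is an inference and is not stated in the source. It also checks whether a deletion of the full 209 bp would be a multiple of 3; all function and constant names are illustrative.

```python
# Assumption: CDS length inferred as 157 codons x 3 nt (stop codon excluded).
CDS_LENGTH_BP = 157 * 3          # 471 bp, inferred rather than stated
KO_REGION_BP = 209               # effective KO region given in the note
EXON6_START_FRACTION = 0.3291    # "starts from about 32.91% of the coding region"


def ko_coverage_percent(ko_bp: int, cds_bp: int) -> float:
    """Percentage of the coding region covered by the KO region."""
    return 100.0 * ko_bp / cds_bp


def causes_frameshift(deleted_bp: int) -> bool:
    """A deletion shifts the reading frame when its length is not a multiple of 3."""
    return deleted_bp % 3 != 0


coverage = ko_coverage_percent(KO_REGION_BP, CDS_LENGTH_BP)
coding_bp_before_exon6 = round(EXON6_START_FRACTION * CDS_LENGTH_BP)

print(f"KO region covers {coverage:.2f}% of the assumed CDS")
print(f"Coding bases upstream of exon 6: about {coding_bp_before_exon6}")
print(f"Frameshift if all 209 bp are deleted: {causes_frameshift(KO_REGION_BP)}")
```

Under the stated assumption the coverage agrees with the 44.37% quoted above, and 209 is not divisible by 3, so removing the whole region would be expected to disrupt the downstream reading frame.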


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with a gRNA region marked on each side of the knockout region; exon labels 1, 6 and 7. Legend: exon of mouse Gm44504; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 209 bp section of Exon 6 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 209 bp section of Exon 6 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
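The self-alignment described in these notes can be approximated as follows. This is a minimal sketch and not the tool used to produce the plots: it reports every pair of positions where a 15 bp window of the sequence matches another window of the same sequence exactly (forward) or matches a window of its reverse complement. The exact-match criterion and the toy input are assumptions; the real exon 6 sequence is not reproduced in this report.

```python
from collections import defaultdict

WINDOW = 15  # window size stated for the dot plots

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def window_matches(query, target, window=WINDOW):
    """Return (i, j) pairs where query[i:i+window] == target[j:j+window]."""
    index = defaultdict(list)
    for j in range(len(target) - window + 1):
        index[target[j:j + window]].append(j)
    return [(i, j)
            for i in range(len(query) - window + 1)
            for j in index.get(query[i:i + window], [])]


def self_dot_plot(seq, window=WINDOW):
    """Off-diagonal forward matches and reverse-complement matches of seq against itself."""
    seq = seq.upper()
    forward = [(i, j) for i, j in window_matches(seq, seq, window) if i != j]
    reverse = window_matches(seq, reverse_complement(seq), window)
    return forward, reverse


# Hypothetical toy input with a deliberate tandem repeat (not the exon 6 sequence).
unit = "ACGTTGCAAGGCTAC"
forward, reverse = self_dot_plot(unit + unit + "TTTT")
print("forward repeats:", forward)
print("reverse-complement matches:", reverse)
```

An empty forward list corresponds to a dot plot with only the main diagonal, which is the situation the notes describe as having no significant tandem repeat.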


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(209bp) | A(26.32% 55) | C(28.23% 59) | T(20.57% 43) | G(24.88% 52)

Note: The 209 bp section of Exon 6 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(209bp) | A(26.79% 56) | C(28.23% 59) | T(20.57% 43) | G(24.4% 51)

Note: The 209 bp section of Exon 6 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
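The base percentages in the two summaries follow directly from the base counts. As a small check, the sketch below recomputes them and derives the overall GC fraction, which the summaries do not state explicitly. The counts are copied from the summary lines; the function and variable names are illustrative.

```python
def composition_summary(counts):
    """Return (total length, per-base percentages, GC percentage) from base counts."""
    total = sum(counts.values())
    percents = {base: round(100.0 * n / total, 2) for base, n in counts.items()}
    gc_percent = round(100.0 * (counts["G"] + counts["C"]) / total, 2)
    return total, percents, gc_percent


# Base counts copied from the (up) and (down) summary lines.
UP_COUNTS = {"A": 55, "C": 59, "T": 43, "G": 52}
DOWN_COUNTS = {"A": 56, "C": 59, "T": 43, "G": 51}

for label, counts in (("up", UP_COUNTS), ("down", DOWN_COUNTS)):
    total, percents, gc = composition_summary(counts)
    print(f"{label}: length {total} bp, {percents}, GC {gc}%")
```

Both sets of counts sum to 209 bp, and the derived GC content is 53.11% (up) and 52.63% (down), consistent with the statement that no high GC-content region is present.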


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  209    1      209  209    100.0%    chrX   -       74367700   74367908   209
YourSeq  27     166    197  209    93.6%     chr14  -       63665416   63665447   32
YourSeq  25     163    188  209    100.0%    chr6   +       48422412   48422441   30
YourSeq  22     115    138  209    87.0%     chr6   -       87900551   87900573   23
YourSeq  22     115    138  209    87.0%     chr3   -       135498216  135498238  23
YourSeq  20     115    136  209    95.5%     chr3   -       109597061  109597082  22
YourSeq  20     131    150  209    100.0%    chr10  +       64386117   64386136   20

Note: The 209 bp section of Exon 6 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  209    1      209  209    100.0%    chrX   -       74367702   74367910   209
YourSeq  25     165    190  209    100.0%    chr6   +       48422412   48422441   30
YourSeq  22     117    140  209    87.0%     chr6   -       87900551   87900573   23
YourSeq  20     117    138  209    95.5%     chr3   -       109597061  109597082  22
YourSeq  20     133    152  209    100.0%    chr10  +       64386117   64386136   20

Note: The 209 bp section of Exon 6 is BLAT searched against the genome. No significant similarity is found.


Gene and information: Gm44504 predicted readthrough transcript (NMD candidate), 44504 [ Mus musculus (house mouse) ] Gene ID: 100169864, updated on 15-Aug-2019

Gene summary

Official Symbol: Gm44504, provided by MGI
Official Full Name: predicted readthrough transcript (NMD candidate), 44504, provided by MGI
Primary source: MGI:MGI:5621304
See related: Ensembl:ENSMUSG00000015290
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Tu11; Gm38419; SlcUbl4a; Slc10a3-ubl4
Summary: This locus represents naturally occurring readthrough transcription between the neighboring Slc10a3 (solute carrier family 10 (sodium/bile acid cotransporter family), member 3) and Ubl4 (ubiquitin-like 4) on chromosome X. While some readthrough transcripts encode a protein that is identical to the downstream gene product, at least one readthrough transcript appears to be a candidate for nonsense-mediated mRNA decay (NMD), and is unlikely to produce a protein product. [provided by RefSeq, May 2013]
Expression: Ubiquitous expression in adrenal adult (RPKM 44.9), ovary adult (RPKM 39.9) and 28 other tissues

Genomic context

Location: X A7.3; X

Exon count: 7

Annotation release  Status   Assembly                      Chr  Location
108                 current  GRCm38.p6 (GCF_000001635.26)  X    NC_000086.7 (74367447..74373349, complement)

Chromosome X - NC_000086.7


Transcript information: This gene has 7 transcripts

Gene: Ubl4a ENSMUSG00000015290

Description: ubiquitin-like 4A [Source:MGI Symbol;Acc:MGI:95049]
Gene Synonyms: DXHXS254E, DXS254Eh, Gdx, Ubl4
Location: Chromosome X: 74,365,718-74,373,218 reverse strand. GRCm38:CM001013.2
About this gene: This gene has 7 transcripts (splice variants), 165 orthologues, 12 paralogues, is a member of 1 Ensembl protein family and is associated with 5 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt         Flags
Ubl4a-206  ENSMUST00000155676.7   2442  157aa       ENSMUSP00000120461.1  Protein coding           CCDS30230  P21126, Q3UK94  TSL:1, GENCODE basic, APPRIS P1
Ubl4a-207  ENSMUST00000178691.1   1038  157aa       ENSMUSP00000136070.1  Protein coding           CCDS30230  P21126, Q3UK94  TSL:2, GENCODE basic, APPRIS P1
Ubl4a-201  ENSMUST00000015434.14  782   102aa       ENSMUSP00000015434.8  Nonsense mediated decay  -          F8WHM4          TSL:3
Ubl4a-205  ENSMUST00000140632.1   681   No protein  -                     Retained intron          -          -               TSL:3
Ubl4a-204  ENSMUST00000126191.1   614   No protein  -                     Retained intron          -          -               TSL:2
Ubl4a-203  ENSMUST00000126117.1   2220  No protein  -                     lncRNA                   -          -               TSL:1
Ubl4a-202  ENSMUST00000125914.1   369   No protein  -                     lncRNA                   -          -               TSL:2


[Figure: Ensembl genome browser view of a 27.50 kb region (74.36 Mb to 74.38 Mb), forward and reverse strands. Contig: AL807376.4. Tracks: Genes (Comprehensive set); Regulatory Build.]

Transcripts shown in the figure:
Ubl4a-206 (protein coding)
Ubl4a-203 (lncRNA)
Ubl4a-201 (nonsense mediated decay)
Ubl4a-205 (retained intron)
Ubl4a-207 (protein coding)
Ubl4a-204 (retained intron)
Ubl4a-202 (lncRNA)
Slc10a3-201 (protein coding)
Slc10a3-202 (protein coding)
Slc10a3-203 (protein coding)
Slc10a3-204 (protein coding)

Regulation legend: CTCF; Promoter; Promoter Flank
Gene legend, protein coding: Ensembl protein coding; merged Ensembl/Havana
Gene legend, non-protein coding: processed transcript; RNA gene


Transcript: ENSMUST00000178691

[Figure: protein domain view of Ubl4a-207 (protein coding; reverse strand, 5.75 kb), translation ENSMUSP00000136..., scale 0 to 157.]

Annotation tracks shown in the figure:
Low complexity (Seg)
Coiled-coils (Ncoils)
Superfamily: Ubiquitin-like domain superfamily
SMART: Ubiquitin domain
Prints: Ubiquitin
Pfam: Ubiquitin domain; Ubl4, C-terminal TUGS domain
PROSITE profiles: Ubiquitin domain
PROSITE patterns: Ubiquitin conserved site
PANTHER: PTHR46555
Gene3D: 3.10.20.90
CDD: cd01807
Sequence variants (dbSNP and all other sources); variant legend: synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
