https://www.alphaknockout.com

Mouse Rbpms Knockout Project (CRISPR/Cas9)

Objective: To create an Rbpms knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Rbpms gene (NCBI Reference Sequence: NM_019733; Ensembl: ENSMUSG00000031586) is located on mouse chromosome 8. Nine exons are identified, with the ATG start codon in exon 1 and the TGA stop codon in exon 7 (Transcript: ENSMUST00000191473). Exon 5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 5 starts at about 41.79% of the coding region and covers 25.55% of it. The size of the effective KO region is ~151 bp. The KO region does not contain any other known gene.
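The percentages in the note can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the coding region is counted as 197 codons without the stop codon (the 197 aa length is taken from the transcript table in this report), and it assumes the whole 151 bp region is removed cleanly. The variable names are hypothetical.

```python
# Values quoted in this report.
PROTEIN_LENGTH_AA = 197   # length of the Rbpms-213 protein (ENSMUST00000191473)
KO_REGION_BP = 151        # size of the effective KO region

# Assumption: the coding region is counted without the stop codon.
cds_bp = PROTEIN_LENGTH_AA * 3

# Share of the coding region removed by the knockout.
ko_fraction = KO_REGION_BP / cds_bp

# Number of coding bases that precede exon 5, implied by the 41.79% figure.
bases_before_exon5 = round(0.4179 * cds_bp)

# A deletion whose length is not a multiple of 3 shifts the reading frame.
causes_frameshift = KO_REGION_BP % 3 != 0

print(f"Coding region length: {cds_bp} bp")
print(f"KO region share of coding region: {ko_fraction:.2%}")
print(f"Coding bases before exon 5: about {bases_before_exon5}")
print(f"Deletion shifts the reading frame: {causes_frameshift}")
```

Under these assumptions the 151 bp region corresponds to the 25.55% quoted above, and because 151 is not a multiple of 3, a clean deletion would be expected to shift the reading frame of the downstream coding sequence.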


Overview of the Targeting Strategy

[Figure: wild-type allele drawn 5' to 3', with exons 1, 5 and 9 labeled and two gRNA regions marked. Legend: exon of mouse Rbpms; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of Exon 5 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
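A self-alignment of this kind can be sketched in a few lines. The example below is a minimal illustration, not the tool used for this report: it finds exact matches of 15 bp words in the forward direction only (the reverse-complement half of the plot is omitted), and the function names and the toy sequence are hypothetical.

```python
from collections import defaultdict

def self_dot_plot(seq, window=15):
    """Return sorted (i, j) pairs, i < j, where the window-length words at i and j are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)

def tandem_candidates(hits, max_offset=100):
    """Count hits per diagonal offset (j - i); many hits at a short offset suggest a tandem repeat."""
    by_offset = defaultdict(int)
    for i, j in hits:
        if j - i <= max_offset:
            by_offset[j - i] += 1
    return dict(by_offset)

# Toy sequence (not the Rbpms flank): an 8 bp unit repeated 5 times, then unique sequence.
toy = "ACGTTGCA" * 5 + "GATTACAGGCTTAACCGGTT"
hits = self_dot_plot(toy, window=15)
offsets = tandem_candidates(hits)
print(offsets)
```

In the toy sequence the hits fall on diagonals at multiples of the 8 bp repeat unit, which is the pattern a tandem repeat produces in a dot plot. A sequence without such repeats gives few or no off-diagonal hits.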

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content plotted along the sequence.]

Summary: Full Length(2000bp) | A(21.9% 438) | C(22.1% 442) | T(32.3% 646) | G(23.7% 474)

Note: The 2000 bp section upstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content plotted along the sequence.]

Summary: Full Length(2000bp) | A(23.2% 464) | C(22.0% 440) | T(29.2% 584) | G(25.6% 512)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
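The overall GC content follows directly from the base counts in the two summaries, and a windowed profile can be computed in the same way. The sketch below is an illustration only: the counts are copied from this report, while the function names, the step size and the toy sequence are hypothetical.

```python
def gc_percent(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total

def sliding_gc(seq, window=300, step=50):
    """Return (start, GC%) for each window of the given size along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile

# Base counts from the two summaries in this report.
upstream = {"A": 438, "C": 442, "T": 646, "G": 474}
downstream = {"A": 464, "C": 440, "T": 584, "G": 512}

print(f"Upstream GC: {gc_percent(upstream):.1f}%")
print(f"Downstream GC: {gc_percent(downstream):.1f}%")

# Toy sequence (not the Rbpms flank): an AT-only half followed by a GC-only half.
toy = "AT" * 300 + "GC" * 300
toy_profile = sliding_gc(toy, window=300, step=300)
print(toy_profile)
```

From the reported counts the overall GC content is 45.8% upstream and 47.6% downstream, which is consistent with the notes that no significant high GC-content region is present.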


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr8   -       33834455   33836454   2000
YourSeq  182    225    436   2000   94.5%     chr4   +       136325089  136325298  210
YourSeq  178    225    423   2000   93.9%     chr4   -       149188886  149189083  198
YourSeq  178    217    422   2000   94.5%     chr10  +       4411982    4412193    212
YourSeq  175    220    420   2000   91.6%     chr9   -       44362224   44362415   192
YourSeq  175    236    443   2000   92.8%     chr16  +       20271162   20271362   201
YourSeq  174    228    422   2000   95.9%     chr8   -       84195499   84195703   205
YourSeq  174    221    422   2000   92.5%     chr6   -       42368721   42368920   200
YourSeq  172    230    422   2000   95.9%     chr10  +       116185819  116186026  208
YourSeq  171    229    420   2000   95.3%     chr3   -       20075539   20075736   198
YourSeq  171    219    422   2000   93.5%     chr4   +       150367201  150367416  216
YourSeq  170    230    431   2000   92.9%     chr8   -       61426954   61427159   206
YourSeq  169    223    422   2000   91.9%     chr13  -       100537816  100538014  199
YourSeq  169    226    421   2000   95.2%     chrX   +       102202319  102202875  557
YourSeq  169    230    420   2000   96.2%     chr4   +       57889240   57889432   193
YourSeq  168    238    420   2000   95.0%     chr5   +       100573812  100573992  181
YourSeq  168    238    464   2000   95.3%     chr10  +       117086224  117086803  580
YourSeq  167    217    421   2000   89.2%     chr2   -       115548882  115549077  196
YourSeq  167    220    419   2000   91.7%     chr9   +       45013337   45013533   197
YourSeq  167    225    423   2000   90.5%     chr6   +       125462744  125462936  193

Note: The 2000 bp section upstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr8   -       33832304   33834303   2000
YourSeq  41     1679   1780  2000   76.0%     chr14  +       16626542   16626631   90
YourSeq  40     1754   1861  2000   78.6%     chr5   +       128614474  128614566  93
YourSeq  39     1734   1783  2000   93.4%     chr7   +       127132885  127132938  54
YourSeq  35     1734   1783  2000   85.4%     chr17  +       53647285   53647332   48
YourSeq  34     1734   1781  2000   92.5%     chr3   -       131472619  131472667  49
YourSeq  32     1732   1783  2000   76.4%     chr7   -       123522752  123522797  46
YourSeq  32     1755   1790  2000   97.3%     chr6   -       81241326   81241367   42
YourSeq  32     1750   1783  2000   97.1%     chr2   -       170403062  170403095  34
YourSeq  32     1752   1783  2000   100.0%    chr16  -       18456754   18456785   32
YourSeq  32     1746   1779  2000   97.1%     chr13  +       14000368   14000401   34
YourSeq  31     1751   1790  2000   90.0%     chr8   +       107441806  107441847  42
YourSeq  31     1748   1784  2000   91.9%     chr10  +       111571840  111571876  37
YourSeq  30     1746   1781  2000   91.7%     chr2   -       112436439  112436474  36
YourSeq  30     1910   1993  2000   87.5%     chr13  -       82709180   82709261   82
YourSeq  30     1752   1783  2000   96.9%     chr11  -       54987847   54987878   32
YourSeq  30     1754   1783  2000   100.0%    chr11  -       53880297   53880326   30
YourSeq  30     1767   1850  2000   96.9%     chrX   +       71712276   71712361   86
YourSeq  30     1746   1781  2000   91.7%     chr13  +       97147965   97148000   36
YourSeq  29     1754   1784  2000   96.8%     chr8   -       83767366   83767396   31

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. No significant similarity is found.
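The top hit in each BLAT table is the query itself, so the two self-hits give the genomic positions of the flanks, and the interval between them is the targeted region. The sketch below is an illustration under two assumptions: the coordinates are 1-based and inclusive (each 2000 bp flank satisfies end - start + 1 = 2000), and the tuple layout and function name are hypothetical.

```python
# Self-hits of the two 2000 bp flanks, copied from the BLAT tables in this report.
# Layout (assumed for this sketch): (chromosome, strand, start, end), 1-based inclusive.
upstream_flank = ("chr8", "-", 33834455, 33836454)
downstream_flank = ("chr8", "-", 33832304, 33834303)

def gap_between(flank_a, flank_b):
    """Return (chrom, start, end, length) of the interval lying between two flanks."""
    chrom_a, _, start_a, end_a = flank_a
    chrom_b, _, start_b, end_b = flank_b
    if chrom_a != chrom_b:
        raise ValueError("flanks are on different chromosomes")
    # Order the flanks by genomic position, regardless of gene orientation.
    left, right = sorted([(start_a, end_a), (start_b, end_b)])
    gap_start = left[1] + 1
    gap_end = right[0] - 1
    if gap_end < gap_start:
        raise ValueError("flanks overlap or are adjacent")
    return chrom_a, gap_start, gap_end, gap_end - gap_start + 1

region = gap_between(upstream_flank, downstream_flank)
print(region)
```

With the coordinates above the interval is chr8:33834304-33834454, which is 151 bp long and agrees with the reported size of the effective KO region. Because the gene lies on the reverse strand, the upstream flank has the higher genomic coordinates.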


Gene and information:

Rbpms RNA binding protein gene with multiple splicing [Mus musculus (house mouse)]
Gene ID: 19663, updated on 12-Aug-2019

Gene summary

Official Symbol: Rbpms (provided by MGI)
Official Full Name: RNA binding protein gene with multiple splicing (provided by MGI)
Primary source: MGI:MGI:1334446
See related: Ensembl:ENSMUSG00000031586
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: RBP-MS; hermes; AU017537; 2010300K22Rik; 2700019M19Rik
Expression: Broad expression in ovary adult (RPKM 24.2), bladder adult (RPKM 22.5) and 20 other tissues
Orthologs: human

Genomic context

Location: 8; 8 A4
Exon count: 14

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (33782643..33929922, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (34893116..35040313, complement)

[Figure: genomic context of Rbpms on chromosome 8 (NC_000074.6).]


Transcript information: This gene has 15 transcripts

Gene: Rbpms ENSMUSG00000031586

Description: RNA binding protein gene with multiple splicing [Source:MGI Symbol;Acc:MGI:1334446]
Gene Synonyms: 2010300K22Rik, 2700019M19Rik, hermes
Location: Chromosome 8: 33,782,643-33,929,863 reverse strand. GRCm38:CM001001.2
About this gene: This gene has 15 transcripts (splice variants), 195 orthologues, 1 paralogue and is a member of 1 Ensembl protein family.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Rbpms-213  ENSMUST00000191473.6   2547  197aa       ENSMUSP00000140387.1  Protein coding           CCDS40317  Q9WVB0   TSL:1, GENCODE basic, APPRIS ALT1
Rbpms-203  ENSMUST00000053251.11  2466  197aa       ENSMUSP00000055813.4  Protein coding           CCDS40317  Q9WVB0   TSL:1, GENCODE basic, APPRIS ALT1
Rbpms-201  ENSMUST00000033994.14  2375  191aa       ENSMUSP00000033994.8  Protein coding           CCDS72112  T1ECW4   TSL:1, GENCODE basic
Rbpms-202  ENSMUST00000033995.13  1039  220aa       ENSMUSP00000033995.6  Protein coding           CCDS40318  Q9WVB0   TSL:1, GENCODE basic, APPRIS P4
Rbpms-210  ENSMUST00000183088.1   592   159aa       ENSMUSP00000138420.1  Protein coding           -          S4R1Y2   CDS 3' incomplete, TSL:5
Rbpms-209  ENSMUST00000183062.1   459   64aa        ENSMUSP00000138726.1  Protein coding           -          S4R2P3   CDS 3' incomplete, TSL:3
Rbpms-208  ENSMUST00000182987.7   1385  197aa       ENSMUSP00000138483.1  Nonsense mediated decay  -          Q9WVB0   TSL:1
Rbpms-205  ENSMUST00000182256.7   865   93aa        ENSMUSP00000138140.1  Nonsense mediated decay  -          S4R1A1   TSL:3
Rbpms-212  ENSMUST00000183336.7   549   70aa        ENSMUSP00000138533.1  Nonsense mediated decay  -          S4R280   CDS 5' incomplete, TSL:3
Rbpms-207  ENSMUST00000182926.7   405   58aa        ENSMUSP00000138361.1  Nonsense mediated decay  -          S4R1T1   CDS 5' incomplete, TSL:3
Rbpms-215  ENSMUST00000231942.1   2534  No protein  -                     Retained intron          -          -        -
Rbpms-211  ENSMUST00000183122.1   525   No protein  -                     Retained intron          -          -        TSL:2
Rbpms-204  ENSMUST00000182184.7   643   No protein  -                     lncRNA                   -          -        TSL:1
Rbpms-214  ENSMUST00000231786.1   516   No protein  -                     lncRNA                   -          -        -
Rbpms-206  ENSMUST00000182871.1   317   No protein  -                     lncRNA                   -          -        TSL:3


[Figure: Ensembl region view, 167.22 kb, from about 33.80 Mb to 33.90 Mb.
Forward strand: Gtf2e2-201, Gtf2e2-202 and Gtf2e2-203 (protein coding); Gtf2e2-204 and Gm26632-201 (lncRNA).
Contigs: AC123616.11, AC162523.9.
Reverse strand: Rbpms-201, -202, -203, -209, -210 and -213 (protein coding); Rbpms-205, -207, -208 and -212 (nonsense mediated decay); Rbpms-211 and -215 (retained intron); Rbpms-204, -206 and -214 (lncRNA); Gm26978-201 (lncRNA).
Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site.
Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000191473

[Figure: protein domain view of Rbpms-213 (protein coding; reverse strand, 147.20 kb), translation ENSMUSP00000140..., scale bar 0 to 197.
Domain tracks: Low complexity (Seg); Superfamily: RNA-binding domain superfamily; SMART, Pfam and PROSITE profiles: RNA recognition motif domain; PANTHER: PTHR10501, PTHR10501:SF25; Gene3D: Nucleotide-binding alpha-beta plait domain superfamily; CDD: cd12682.
Variant track: sequence variants (dbSNP and all other sources). Variant legend: missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
