https://www.alphaknockout.com

Mouse Msmb Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Msmb conditional knockout mouse model (C57BL/6J) by CRISPR/Cas9-mediated genome engineering.

Strategy summary: The Msmb gene (NCBI Reference Sequence: NM_020597; Ensembl: ENSMUSG00000021907) is located on mouse chromosome 14. Four exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 4 (transcript: ENSMUST00000022464). Exon 2 is selected as the conditional knockout region (cKO region), and deleting it should cause loss of function of the mouse Msmb gene. To build the targeting vector, the homology arms and the cKO region will be generated by PCR using BAC clone RP23-95M10 as the template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs to produce cKO mice. Pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-in allele develop early prostate disease that progresses from intraepithelial neoplasia with microinvasion to well-differentiated adenocarcinoma of the prostate gland, together with enlargement of the prostate gland and lymph nodes and increased metastatic potential.

Exon 2 starts at about 1.18% of the coding region. Deleting exon 2 will cause a frameshift in the gene. The intron used for the 5'-loxP site insertion (intron 1) is 6001 bp, and the intron used for the 3'-loxP site insertion (intron 2) is 1981 bp. The effective cKO region is ~606 bp and contains no other known gene.
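The frameshift claim rests on codon arithmetic: removing an internal coding exon whose length is not a multiple of 3 shifts the reading frame of everything downstream, which usually exposes a premature stop codon. The Python sketch below illustrates this on a made-up 21 nt toy CDS. The sequence, the deleted interval and the helper names are all hypothetical, since the actual exon 2 sequence and length are not given in this report.

```python
# Minimal sketch: does deleting an exon shift the reading frame, and where is
# the first in-frame stop codon afterwards? Toy data only (hypothetical).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(exon_len: int) -> bool:
    """Deleting an internal coding exon shifts the downstream frame unless
    its length is a multiple of 3."""
    return exon_len % 3 != 0


def first_stop_codon(cds: str, del_start: int = 0, del_end: int = 0):
    """Remove cds[del_start:del_end] (0-based, end-exclusive), then scan the
    result codon by codon from the start.

    Returns the 0-based index of the first in-frame stop codon, or None.
    """
    spliced = cds[:del_start] + cds[del_end:]
    for i in range(0, len(spliced) - 2, 3):
        if spliced[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


if __name__ == "__main__":
    # Hypothetical toy CDS: ATG GCC AAG ATA AGG CTG TAA (stop is codon 6).
    toy_cds = "ATGGCCAAGATAAGGCTGTAA"
    # Pretend a 4 nt "exon" occupies positions 3..6 (0-based).
    print(causes_frameshift(4))             # True
    print(first_stop_codon(toy_cds))        # 6 (intact CDS)
    print(first_stop_codon(toy_cds, 3, 7))  # 2 (premature stop after deletion)
```

A real check would use the annotated exon boundaries of ENSMUST00000022464 instead of toy coordinates.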


Overview of the Targeting Strategy

[Figure: Overview of the targeting strategy. Panels: wildtype allele (exons 1-4, with the 5' and 3' gRNA regions), targeting vector, targeted allele, and constitutive KO allele (after Cre recombination). Legend: exon of mouse Msmb, homology arm, cKO region, loxP site.]
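The relationship between the targeted (floxed) allele and the constitutive KO allele can be modelled as string editing: Cre recombination between two loxP sites in the same orientation excises the DNA between them and leaves a single loxP site behind. The Python sketch below assumes the standard 34 bp loxP sequence; the flanking "intron" and "exon" strings are hypothetical placeholders, and the function deliberately matches only directly repeated sites (Cre inverts, rather than excises, DNA between inverted sites).

```python
# Minimal sketch of Cre-mediated excision on a toy floxed allele.
# LOXP is the standard 34 bp loxP site; flanking sequences are hypothetical.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Excise the DNA between the first two directly repeated copies of
    `site`, keeping one copy, as Cre does with two loxP sites in the same
    orientation. Returns the allele unchanged if fewer than two are found."""
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    return allele[:first] + site + allele[second + len(site):]


if __name__ == "__main__":
    intron1, exon2, intron2 = "AAAA", "GGGG", "CCCC"  # hypothetical placeholders
    floxed = intron1 + LOXP + exon2 + LOXP + intron2  # targeted allele
    ko = cre_excise(floxed)                           # constitutive KO allele
    print(ko == intron1 + LOXP + intron2)             # True
    print(ko.count(LOXP))                             # 1
```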


Overview of the Dot Plot (window size: 10 bp)

[Figure: dot plot of the sequence aligned with itself, forward and reverse-complement panels.]

Note: The sequence of the homology arms and cKO region is aligned with itself to detect tandem repeats. Tandem repeats are present in the dot plot matrix, so this targeting vector may be difficult to construct.
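A self-alignment dot plot marks every pair of positions (i, j) where the same 10 bp word occurs; a tandem repeat with period p shows up as extra diagonals offset by p from the main diagonal. The Python sketch below uses exact 10-mer matches, which is a simplification (the tool used for this report may score windows allowing mismatches), and tallies off-diagonal forward hits by offset. The helper names and the `min_hits` cut-off are hypothetical.

```python
# Minimal sketch of a word-match dot plot (exact 10-mers), forward and
# reverse-complement, plus a tally of off-diagonal forward hits by offset.
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_positions(seq: str, k: int) -> dict:
    """Map each k-mer to the list of 0-based positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def forward_dot_hits(seq: str, k: int = 10):
    """Off-diagonal dots (i < j) where seq[i:i+k] == seq[j:j+k]."""
    hits = []
    for positions in kmer_positions(seq, k).values():
        for a in positions:
            for b in positions:
                if a < b:
                    hits.append((a, b))
    return hits


def reverse_dot_hits(seq: str, k: int = 10):
    """Dots (i, j) where seq[i:i+k] == reverse_complement(seq)[j:j+k]."""
    rc_index = kmer_positions(reverse_complement(seq), k)
    return [(i, j)
            for i in range(len(seq) - k + 1)
            for j in rc_index.get(seq[i:i + k], [])]


def repeat_offsets(seq: str, k: int = 10, min_hits: int = 5) -> dict:
    """Diagonal offsets (j - i) supported by at least `min_hits` forward dots.
    A tandem repeat of period p yields a strong diagonal at offset p."""
    counts = Counter(b - a for a, b in forward_dot_hits(seq, k))
    return {d: n for d, n in counts.items() if n >= min_hits}


if __name__ == "__main__":
    tandem = "ACGTTGCAGC" * 4  # hypothetical 10 bp unit repeated 4 times
    print(repeat_offsets(tandem))  # {10: 21, 20: 11}
```

Run on the real homology-arm and cKO sequence, any strong offset other than the main diagonal would flag the kind of tandem repeat the note describes.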

Overview of the GC Content Distribution (window size: 300 bp)

[Figure: GC content distribution along the sequence.]

Summary: full length 7106 bp | A 1834 (25.81%) | C 1524 (21.45%) | T 2166 (30.48%) | G 1582 (22.26%)

Note: The GC content of the homology arms and cKO region is analyzed. No region of markedly high GC content is found, so this region is suitable for PCR screening and sequencing analysis.
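The per-base percentages and the overall GC content (43.71%) can be recomputed from the base counts in the summary line, and a windowed scan mirrors the 300 bp GC distribution plot. In this Python sketch the windows are non-overlapping by default, and the 70% "high GC" threshold is an assumption rather than a value taken from this report.

```python
# Minimal sketch: base composition and windowed GC content.
# The 0.70 "high GC" threshold is a hypothetical cut-off.


def composition_summary(counts: dict):
    """Return (total length, per-base percentages, GC percentage)."""
    total = sum(counts.values())
    pct = {base: round(100 * n / total, 2) for base, n in counts.items()}
    gc = round(100 * (counts["G"] + counts["C"]) / total, 2)
    return total, pct, gc


def windowed_gc(seq: str, window: int = 300, step: int = 0):
    """(start, GC fraction) per window; step defaults to the window size,
    giving non-overlapping windows."""
    step = step or window
    result = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        result.append((start, (w.count("G") + w.count("C")) / window))
    return result


def high_gc_windows(seq: str, window: int = 300, threshold: float = 0.70):
    """Start positions of windows whose GC fraction exceeds `threshold`."""
    return [start for start, frac in windowed_gc(seq, window) if frac > threshold]


if __name__ == "__main__":
    counts = {"A": 1834, "C": 1524, "T": 2166, "G": 1582}  # from the summary line
    total, pct, gc = composition_summary(counts)
    print(total, pct, gc)
    # 7106 {'A': 25.81, 'C': 21.45, 'T': 30.48, 'G': 22.26} 43.71
```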


BLAT Search Results (upstream)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  3000   1       3000  3000   100.0%    chr14  +       32144827   32147826   3000
YourSeq  361    2469    3000  3000   92.3%     chr6   -       37680866   37681565   700
YourSeq  324    2469    2866  3000   92.3%     chr17  -       26317589   26318007   419
YourSeq  321    2469    2866  3000   92.3%     chr14  -       27539126   27539522   397
YourSeq  321    2471    2866  3000   92.6%     chr13  +       85523334   85523749   416
YourSeq  320    2469    2852  3000   93.8%     chr4   -       144812059  144812472  414
YourSeq  319    2469    2866  3000   91.5%     chr17  -       26309078   26309497   420
YourSeq  315    2469    2866  3000   90.4%     chr9   +       5867811    5868233    423
YourSeq  314    2469    2878  3000   91.0%     chr7   -       142234125  142238413  4289
YourSeq  311    2469    2855  3000   93.6%     chr9   -       37976267   37976666   400
YourSeq  311    2469    2852  3000   92.9%     chr10  -       40787098   40787516   419
YourSeq  311    2471    2862  3000   91.1%     chrX   +       120224471  120224887  417
YourSeq  311    2469    2867  3000   92.5%     chr7   +       28530085   28530519   435
YourSeq  309    2469    2855  3000   92.6%     chr18  -       6593107    6593507    401
YourSeq  308    2469    2863  3000   92.4%     chr15  -       33328347   33328759   413
YourSeq  308    2469    2866  3000   90.3%     chr15  +       76961415   76961842   428
YourSeq  306    2469    2852  3000   92.5%     chr9   -       104727262  104727673  412
YourSeq  303    2322    2826  3000   92.7%     chr6   -       129106636  129107177  542
YourSeq  303    2469    2867  3000   91.1%     chr19  -       61217104   61217513   410
YourSeq  302    2469    2852  3000   91.4%     chr7   -       142234603  142235009  407

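Note: The 3000 bp section upstream of exon 2 was BLAT-searched against the genome. Apart from the full-length 100% match at the target locus on chr14, every other hit covers only part of the query (query positions 2322-3000, at most 532 bp) at 90.3-93.8% identity; no significant similarity elsewhere in the genome is found.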
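One way to make "no significant similarity" concrete is to measure how much of the 3000 bp query each hit covers. The Python sketch below does this for the first rows of the upstream BLAT table, treating BLAT's displayed query coordinates as 1-based and inclusive. The `BlatHit` class, the 50% coverage and 90% identity cut-offs, and the rule for recognising the self-match are hypothetical choices, not criteria stated in this report.

```python
# Minimal sketch: flag BLAT hits that cover a large part of the query.
# Thresholds and the self-match rule are hypothetical.
from dataclasses import dataclass


@dataclass
class BlatHit:
    score: int
    q_start: int      # 1-based, inclusive, as displayed by BLAT
    q_end: int
    q_size: int
    identity: float   # percent
    chrom: str

    @property
    def query_coverage(self) -> float:
        """Fraction of the query covered by this alignment."""
        return (self.q_end - self.q_start + 1) / self.q_size


def is_on_target(hit: BlatHit, target_chrom: str) -> bool:
    """The expected self-match: full length, 100% identity, target chromosome."""
    return (hit.chrom == target_chrom and hit.identity == 100.0
            and hit.query_coverage == 1.0)


def significant_off_target(hits, target_chrom, min_coverage=0.5, min_identity=90.0):
    """Hits other than the self-match that cover a large part of the query."""
    return [h for h in hits
            if not is_on_target(h, target_chrom)
            and h.query_coverage >= min_coverage
            and h.identity >= min_identity]


if __name__ == "__main__":
    hits = [  # first rows of the upstream BLAT table
        BlatHit(3000, 1, 3000, 3000, 100.0, "chr14"),
        BlatHit(361, 2469, 3000, 3000, 92.3, "chr6"),
        BlatHit(324, 2469, 2866, 3000, 92.3, "chr17"),
        BlatHit(303, 2322, 2826, 3000, 92.7, "chr6"),
    ]
    print(significant_off_target(hits, "chr14"))  # []
```

The best partial hit covers only about 18% of the query, which is why none passes a coverage-based filter.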

BLAT Search Results (downstream)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
YourSeq  3000   1       3000  3000   100.0%    chr14  +       32148433   32151432   3000
YourSeq  345    2308    2805  3000   87.0%     chr1   -       44082687   44083195   509
YourSeq  331    2308    2804  3000   85.0%     chr14  +       100290034  100290533  500
YourSeq  330    2308    2794  3000   89.9%     chr12  -       58298387   58298892   506
YourSeq  325    2308    2794  3000   87.4%     chrX   +       163685380  163685882  503
YourSeq  324    2314    2805  3000   87.8%     chr1   -       53964996   53965506   511
YourSeq  323    2327    2803  3000   87.9%     chr10  -       32262398   32675790   413393
YourSeq  323    2308    2803  3000   88.4%     chr6   +       105647016  105647529  514
YourSeq  321    2310    2787  3000   89.3%     chr2   +       69967626   69968120   495
YourSeq  319    2357    2803  3000   88.0%     chr11  -       4412976    4413419    444
YourSeq  315    2081    2803  3000   86.3%     chr11  +       70111340   70112013   674
YourSeq  314    2308    2794  3000   86.9%     chr2   +       3902101    3902600    500
YourSeq  312    2315    2805  3000   88.5%     chr3   +       50246113   50246616   504
YourSeq  312    2382    2794  3000   88.7%     chr19  +       20022477   20022905   429
YourSeq  309    2385    2804  3000   87.5%     chr16  -       16449405   16449828   424
YourSeq  309    2308    2794  3000   86.4%     chr11  +       10036462   10036963   502
YourSeq  305    2340    2794  3000   87.5%     chr1   +       84080912   84081391   480
YourSeq  304    2396    2803  3000   89.4%     chr11  +       110382224  110382638  415
YourSeq  303    2396    2803  3000   87.4%     chr9   -       53218938   53219337   400
YourSeq  299    2396    2791  3000   88.1%     chr17  -       3751136    3751539    404

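Note: The 3000 bp section downstream of exon 2 was BLAT-searched against the genome. Apart from the full-length 100% match at the target locus on chr14, every other hit covers only part of the query (query positions 2081-2805, at most 723 bp) at 85.0-89.9% identity; no significant similarity elsewhere in the genome is found.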
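The ~606 bp effective cKO size agrees with the two full-length BLAT self-matches: the upstream section ends at chr14:32147826 and the downstream section starts at chr14:32148433. Assuming the two 3000 bp sections directly flank the cKO region (the report does not state this explicitly), the region between them can be computed in Python with 1-based inclusive coordinates:

```python
# Minimal sketch: interval arithmetic on 1-based, inclusive genomic coordinates.
# Assumes the upstream and downstream BLAT sections directly flank the cKO region.


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_between(upstream_end: int, downstream_start: int):
    """Interval lying strictly between two flanking intervals."""
    return upstream_end + 1, downstream_start - 1


if __name__ == "__main__":
    upstream = (32144827, 32147826)    # chr14 +, upstream BLAT self-match
    downstream = (32148433, 32151432)  # chr14 +, downstream BLAT self-match
    print(interval_length(*upstream), interval_length(*downstream))  # 3000 3000
    cko = gap_between(upstream[1], downstream[0])
    print(cko, interval_length(*cko))  # (32147827, 32148432) 606
```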


Gene information: Msmb beta-microseminoprotein [Mus musculus (house mouse)], Gene ID: 17695, updated on 12-Aug-2019

Gene summary

Official Symbol: Msmb (provided by MGI)
Official Full Name: beta-microseminoprotein (provided by MGI)
Primary source: MGI:MGI:97166
Related: Ensembl:ENSMUSG00000021907
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: PIP; PSP94; beta-MSP
Expression: Low expression observed in reference dataset
Orthologs: human

Genomic context

Location: chromosome 14; 14 B

Exon count: 6 (NCBI annotation; the Ensembl transcript targeted here, ENSMUST00000022464, has 4 exons)

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  14   NC_000080.6 (32142023..32158327)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    14   NC_000080.5 (32955209..32971513)
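The two rows describe the same gene span in two assemblies. A quick Python consistency check with 1-based inclusive coordinates shows that the span length is identical (16,305 bp) and that both ends move by the same amount between GRCm38 and MGSCv37. This constant offset is only a local observation for this locus; converting coordinates between assemblies in general calls for a dedicated tool such as UCSC liftOver or NCBI Remap.

```python
# Minimal sketch: compare the Msmb span between two assemblies (values from the table).
GRCM38 = (32142023, 32158327)   # NC_000080.6
MGSCV37 = (32955209, 32971513)  # NC_000080.5


def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def end_offsets(new, old):
    """Shift of each end from `new` to `old`; equal values mean a pure translation."""
    return old[0] - new[0], old[1] - new[1]


if __name__ == "__main__":
    print(interval_length(*GRCM38), interval_length(*MGSCV37))  # 16305 16305
    print(end_offsets(GRCM38, MGSCV37))                         # (813186, 813186)
```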

[Figure: Chromosome 14 - NC_000080.6]


Transcript information: This gene has 2 transcripts

Gene: Msmb ENSMUSG00000021907

Description: beta-microseminoprotein [Source:MGI Symbol;Acc:MGI:97166]
Gene Synonyms: PIP, PSP94, beta-MSP, beta-inhibin, prostatic inhibin protein
Location: Chromosome 14: 32,142,023-32,158,370 forward strand. GRCm38:CM001007.2
About this gene: This gene has 2 transcripts (splice variants), 125 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name      Transcript ID          bp   Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Msmb-201  ENSMUST00000022464.13  562  113aa       ENSMUSP00000022464.6  Protein coding  CCDS36862  O08540   TSL:1 GENCODE basic APPRIS P1
Msmb-202  ENSMUST00000130397.1   537  No protein  -                     lncRNA          -          -        TSL:3
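For Msmb-201 the table lists a 562 bp transcript encoding 113 aa. Assuming, as Ensembl conventionally reports, that the protein length excludes the stop codon, the coding sequence including the TAA stop is (113 + 1) × 3 = 342 bp, which leaves 220 bp for the 5' and 3' UTRs combined. A Python sketch of that arithmetic:

```python
# Minimal sketch: CDS and UTR lengths from the Msmb-201 row of the transcript table.
# Assumes the reported protein length excludes the stop codon.


def cds_length(protein_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode `protein_aa` residues (plus the stop codon)."""
    return 3 * (protein_aa + (1 if include_stop else 0))


if __name__ == "__main__":
    transcript_bp, protein_aa = 562, 113
    cds = cds_length(protein_aa)
    print(cds, transcript_bp - cds)  # 342 220
```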

[Figure: Ensembl view of a 36.35 kb region of chromosome 14 (32.14-32.16 Mb). Forward strand: Msmb-201 (protein coding), Msmb-202 (lncRNA) and the neighbouring Ncoa4 transcripts Ncoa4-201, -202, -203, -204, -205, -207 and -208 (protein coding), Ncoa4-206 (nonsense mediated decay) and Ncoa4-212 (retained intron). Contig: AC154532.3. Reverse strand: Gm18909-201 (processed pseudogene). Tracks also show the Regulatory Build (CTCF, enhancer, open chromatin, promoter, promoter flank). Gene legend: protein coding (Ensembl protein coding, merged Ensembl/Havana); non-protein coding (RNA gene, processed transcript, pseudogene).]


Transcript: ENSMUST00000022464

[Figure: Ensembl protein view of Msmb-201 (protein coding; 16.34 kb, forward strand). Tracks: the translated protein with a cleavage site, Pfam and PANTHER Beta-microseminoprotein domains (PANTHER PTHR10500:SF0), Gene3D 2.10.70.10 and 2.20.25.590, and sequence variants from dbSNP and all other sources (missense and synonymous). Scale: 0-113 aa.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
