https://www.alphaknockout.com

Mouse Shcbp1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create a Shcbp1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Shcbp1 gene (NCBI Reference Sequence: NM_011369; Ensembl: ENSMUSG00000022322) is located on mouse chromosome 8. Thirteen exons are identified, with the ATG start codon in exon 1 and the TAG stop codon in exon 13 (Transcript: ENSMUST00000022945). Exon 4 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Shcbp1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP24-78D8 as template. Cas9, gRNA and targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a knock-out allele exhibit normal viability, fertility and T cell development but show decreased susceptibility to experimental autoimmune encephalomyelitis.

Exon 4 starts at about 19.36% of the coding region. Knockout of exon 4 will result in a frameshift of the gene. The size of intron 3 for 5'-loxP site insertion is 2817 bp, and the size of intron 4 for 3'-loxP site insertion is 10568 bp. The size of the effective cKO region is ~709 bp. The cKO region does not contain any other known gene.
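The frameshift reasoning above can be sketched in Python. This is a minimal illustration, not the vendor's calculation: the exon length passed to causes_frameshift is a hypothetical value (the report does not state the length of exon 4), and the coding-sequence length is an assumption derived from the 668 aa protein of ENSMUST00000022945 plus one stop codon.

```python
# Assumption: CDS length = 668 codons (protein length of the main transcript) + 1 stop codon.
CDS_LEN_BP = 668 * 3 + 3


def causes_frameshift(deleted_coding_bp: int) -> bool:
    """A deletion shifts the reading frame when its coding length is not a multiple of 3."""
    return deleted_coding_bp % 3 != 0


def approx_cds_offset(fraction: float, cds_len_bp: int = CDS_LEN_BP) -> int:
    """Convert a fractional position in the coding region to an approximate nucleotide offset."""
    return round(fraction * cds_len_bp)


if __name__ == "__main__":
    # 19.36% is the reported start of exon 4 within the coding region.
    print("Approximate CDS offset of exon 4 start:", approx_cds_offset(0.1936))
    # 100 bp is a hypothetical exon length used only to show the check.
    print("Hypothetical 100 bp exon causes frameshift:", causes_frameshift(100))
```

Substituting the true coding length of exon 4 for the hypothetical value would confirm whether its deletion leaves the downstream exons out of frame.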


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (exons 1 to 13, with two gRNA regions marked around exon 4), the targeting vector, the targeted allele, and the constitutive KO allele after Cre recombination. Legend: exon of mouse Shcbp1; homology arm; cKO region; loxP site.]
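The conversion of the targeted allele into the constitutive KO allele can be illustrated with a minimal Python sketch. It assumes two loxP sites in the same orientation flanking the cKO region and uses the canonical 34 bp loxP sequence. The function name excise_floxed and the toy flanking sequences are hypothetical and are not taken from the Shcbp1 locus.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def excise_floxed(seq: str, site: str = LOXP) -> str:
    """Model Cre recombination between two directly repeated loxP sites.

    The sequence between the sites and one copy of the site are removed,
    leaving a single loxP site. If fewer than two sites are found, the
    sequence is returned unchanged.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[: first + len(site)] + seq[second + len(site):]


if __name__ == "__main__":
    # Toy allele: 5' arm + loxP + floxed region + loxP + 3' arm (all placeholders).
    targeted = "AAAA" + LOXP + "CCCCGGGG" + LOXP + "TTTT"
    print(excise_floxed(targeted))
```

The sketch only models excision between sites in direct orientation; sites in inverted orientation would invert the intervening sequence instead, which is not handled here.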


Overview of the Dot Plot

Window size: 10 bp

[Figure: dot plot of the sequence aligned against itself, forward and reverse complement.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
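A self-comparison of this kind can be approximated with a short Python sketch. It is a simplified stand-in for the dot plot tool used in the report, which is not named: it reports only exact forward-strand matches of a fixed window and ignores the reverse complement. The function name self_dotplot_matches and the toy sequence are hypothetical.

```python
from collections import defaultdict


def self_dotplot_matches(seq: str, window: int = 10) -> list[tuple[int, int]]:
    """Return (i, j) pairs with i < j where seq[i:i+window] == seq[j:j+window].

    Each pair corresponds to an off-diagonal dot in a self dot plot.
    Runs of consecutive pairs along a diagonal indicate a repeat.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    pairs = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.append((starts[a], starts[b]))
    return sorted(pairs)


if __name__ == "__main__":
    # Toy sequence with one 10 bp unit repeated after a 4 bp spacer.
    toy = "ACGTACGTAC" + "TTTT" + "ACGTACGTAC"
    print(self_dotplot_matches(toy, window=10))
```

An empty result for a real sequence would be consistent with the absence of exact repeats at that window size; it does not rule out diverged repeats.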

Overview of the GC Content Distribution

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(7209bp) | A(28.62% 2063) | C(17.05% 1229) | T(35.05% 2527) | G(19.28% 1390)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No significant high GC-content region is found. So this region is suitable for PCR screening or sequencing analysis.
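The overall GC content follows directly from the base counts in the summary line, and a windowed profile can be computed in the same way. The Python sketch below uses the reported counts; the function names and the short test sequence are hypothetical, and the sliding step of 1 bp is an assumption, since the report states only the 300 bp window size.

```python
# Base counts as reported in the summary line (full length 7209 bp).
REPORTED_COUNTS = {"A": 2063, "C": 1229, "T": 2527, "G": 1390}


def gc_fraction_from_counts(counts: dict[str, int]) -> float:
    """Overall GC fraction from per-base counts."""
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total


def windowed_gc(seq: str, window: int = 300) -> list[float]:
    """GC fraction of every window of the given size, sliding by 1 bp (assumed step)."""
    seq = seq.upper()
    profile = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


if __name__ == "__main__":
    gc = gc_fraction_from_counts(REPORTED_COUNTS)
    print(f"Overall GC content: {gc * 100:.2f}%")  # 36.33%
    print(windowed_gc("GGCCAATT", window=4))
```

With the reported counts the overall GC content is about 36.33%, which agrees with the note that no high GC-content region dominates the sequence.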


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       4765340    4768339    3000
YourSeq  52     826    917   3000   93.4%     chr16  -       8331956    8332475    520
YourSeq  51     805    917   3000   79.1%     chr15  -       36961886   36961989   104
YourSeq  51     878    984   3000   88.1%     chr1   +       157428036  157428147  112
YourSeq  45     804    972   3000   72.6%     chr14  +       123527491  123527621  131
YourSeq  44     877    984   3000   70.4%     chr4   -       145182864  145182971  108
YourSeq  42     1568   1950  3000   69.6%     chr1   +       93943371   93943690   320
YourSeq  41     878    982   3000   69.6%     chr19  -       14472432   14472536   105
YourSeq  41     878    984   3000   69.2%     chr13  -       100544167  100544273  107
YourSeq  41     883    984   3000   93.8%     chr10  -       84550041   84550145   105
YourSeq  40     878    984   3000   62.7%     chr14  -       48511038   48511128   91
YourSeq  39     776    909   3000   62.7%     chr1   -       74979699   74979788   90
YourSeq  36     945    981   3000   100.0%    chr9   +       59438448   59438492   45
YourSeq  35     948    984   3000   97.3%     chr17  +       12354135   12354171   37
YourSeq  34     947    984   3000   94.8%     chr15  -       3521211    3521248    38
YourSeq  34     893    984   3000   68.5%     chr7   +       44618242   44618333   92
YourSeq  33     878    918   3000   92.4%     chr14  -       79737565   79737606   42
YourSeq  32     877    984   3000   64.9%     chr18  -       19737759   19737866   108
YourSeq  32     836    907   3000   86.2%     chr16  -       22415160   22415229   70
YourSeq  32     944    982   3000   94.5%     chr6   +       39479275   39479321   47

Note: The 3000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
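Rows of the BLAT output can be screened programmatically. The Python sketch below parses rows in the column order shown in the table (with or without the leading "browser details" link text) and lists hits other than the self-match that cover a large share of the query. The names BlatHit, parse_hit and significant_off_targets are hypothetical, and the 50% coverage cutoff is an illustrative assumption, not a criterion stated in this report.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one result row; fields are read after the 'YourSeq' query name."""
    f = line.split()
    f = f[f.index("YourSeq") + 1:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query covered by the aligned interval."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


def significant_off_targets(hits: list[BlatHit], min_coverage: float = 0.5) -> list[BlatHit]:
    """Hits that are not the full-length 100% self-match and reach the coverage cutoff."""
    result = []
    for h in hits:
        is_self = query_coverage(h) == 1.0 and h.identity == 100.0
        if not is_self and query_coverage(h) >= min_coverage:
            result.append(h)
    return result


if __name__ == "__main__":
    rows = [
        "browser details YourSeq 3000 1 3000 3000 100.0% chr8 - 4765340 4768339 3000",
        "browser details YourSeq 52 826 917 3000 93.4% chr16 - 8331956 8332475 520",
        "browser details YourSeq 51 805 917 3000 79.1% chr15 - 36961886 36961989 104",
    ]
    hits = [parse_hit(r) for r in rows]
    print(significant_off_targets(hits))
```

For the three upstream rows used here, only the chr8 self-match spans the whole query, so no off-target hit passes the assumed cutoff.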

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr8   -       4761631    4764630    3000
YourSeq  395    2445   2990  3000   90.8%     chr15  +       31978615   31979108   494
YourSeq  395    2546   3000  3000   92.9%     chr11  +       35645051   35645500   450
YourSeq  392    2553   3000  3000   94.2%     chr9   -       109037300  109037751  452
YourSeq  392    2556   3000  3000   94.2%     chr16  -       6546395    6546841    447
YourSeq  390    2531   3000  3000   91.6%     chr17  -       42119259   42119713   455
YourSeq  388    2541   3000  3000   92.2%     chr7   +       6462692    6463151    460
YourSeq  387    2530   3000  3000   90.5%     chr7   +       63387767   63388218   452
YourSeq  387    2566   3000  3000   94.1%     chr3   +       99489439   99489872   434
YourSeq  387    2566   3000  3000   95.2%     chr17  +       25238513   25239054   542
YourSeq  386    2555   2996  3000   93.0%     chr19  -       25293398   25293836   439
YourSeq  386    2565   3000  3000   94.3%     chr3   +       136161722  136162157  436
YourSeq  385    2564   3000  3000   93.6%     chr7   -       7024020    7024455    436
YourSeq  385    2565   3000  3000   94.7%     chr11  +       105114274  105114716  443
YourSeq  384    2565   3000  3000   94.1%     chr1   +       66464531   66464966   436
YourSeq  383    2556   3000  3000   92.3%     chr15  -       94809315   94809755   441
YourSeq  383    2566   3000  3000   94.1%     chr1   -       162769104  162769538  435
YourSeq  382    2565   3000  3000   94.1%     chr5   -       36724734   36725361   628
YourSeq  382    2562   3000  3000   93.7%     chr3   -       60047462   60047913   452
YourSeq  382    2558   3000  3000   93.3%     chr15  -       6677762    6678206    445

Note: The 3000 bp section downstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.


Gene and information:

Shcbp1 Shc SH2-domain binding protein 1 [Mus musculus (house mouse)]
Gene ID: 20419, updated on 12-Aug-2019

Gene summary

Official Symbol: Shcbp1 (provided by MGI)
Official Full Name: Shc SH2-domain binding protein 1 (provided by MGI)
Primary source: MGI:MGI:1338802
See related: Ensembl:ENSMUSG00000022322
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: mPAL
Expression: Biased expression in CNS E11.5 (RPKM 16.0), liver E14 (RPKM 13.9) and 10 other tissues
Orthologs: human, all

Genomic context

Location: 8; 8 A1.1

Exon count: 13

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  8    NC_000074.6 (4735976..4779567, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    8    NC_000074.5 (4735980..4779534, complement)

[Figure: genomic context of Shcbp1 on chromosome 8 (NC_000074.6).]


Transcript information: This gene has 7 transcripts

Gene: Shcbp1 ENSMUSG00000022322

Description: Shc SH2-domain binding protein 1 [Source:MGI Symbol;Acc:MGI:1338802]
Gene Synonyms: mPAL
Location: Chromosome 8: 4,735,976-4,779,567 reverse strand. GRCm38:CM001001.2
About this gene: This gene has 7 transcripts (splice variants), 136 orthologues, 1 paralogue, is a member of 1 Ensembl protein family and is associated with 6 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Shcbp1-201  ENSMUST00000022945.8  2156  668aa       ENSMUSP00000022945.7  Protein coding   CCDS22088  Q9Z179      TSL:1, GENCODE basic, APPRIS P1
Shcbp1-205  ENSMUST00000207876.1  620   207aa       ENSMUSP00000146454.1  Protein coding   -          A0A140LHK7  CDS 5' and 3' incomplete, TSL:3
Shcbp1-202  ENSMUST00000207262.1  508   170aa       ENSMUSP00000146339.1  Protein coding   -          A0A140LHA7  CDS 5' and 3' incomplete, TSL:3
Shcbp1-203  ENSMUST00000207665.1  2606  No protein  -                     Retained intron  -          -           TSL:1
Shcbp1-207  ENSMUST00000208856.1  2365  No protein  -                     Retained intron  -          -           TSL:1
Shcbp1-206  ENSMUST00000208381.1  1937  No protein  -                     Retained intron  -          -           TSL:1
Shcbp1-204  ENSMUST00000207725.1  1022  No protein  -                     Retained intron  -          -           TSL:1


[Figure: Ensembl genome browser view of a 63.59 kb region of chromosome 8 (4.73 Mb to 4.78 Mb). Tracks: contigs AC155814.6 and AC129942.7; the processed pseudogene Gm6334-201 on the forward strand; the Shcbp1 transcripts on the reverse strand, namely Shcbp1-201, Shcbp1-202 and Shcbp1-205 (protein coding) and Shcbp1-203, Shcbp1-204, Shcbp1-206 and Shcbp1-207 (retained intron); Regulatory Build (CTCF, open chromatin, promoter, promoter flank).]


Transcript: ENSMUST00000022945

[Figure: protein domain view of Shcbp1-201 (protein coding, reverse strand, 43.58 kb), with a scale bar from 0 to 668 amino acids. Annotated features: low complexity (Seg); Superfamily: Pectin lyase fold/virulence factor; SMART: Parallel beta-helix repeat; Pfam: Right handed beta helix domain; PANTHER: PTHR14695 and PTHR14695:SF8; Gene3D: Pectin lyase fold. Sequence variants (dbSNP and all other sources) are shown as missense and synonymous variants.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
