https://www.alphaknockout.com

Mouse Sgip1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Sgip1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Sgip1 gene (NCBI Reference Sequence: NM_144906; Ensembl: ENSMUSG00000028524) is located on mouse chromosome 4. 24 exons are identified, with the ATG start codon in exon 1 and the TAA stop codon in exon 24 (Transcript: ENSMUST00000080728). Exon 2 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in loss of function of the mouse Sgip1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP24-63D7 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 2 starts at about 0.45% of the coding region. Knockout of exon 2 will result in a frameshift of the gene. The size of intron 1 for 5'-loxP site insertion is 92317 bp, and the size of intron 2 for 3'-loxP site insertion is 7823 bp. The size of the effective cKO region is ~564 bp. The cKO region does not contain any other known gene.
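The position and frameshift statements in the note can be illustrated with a little arithmetic. The sketch below is not part of the project design: the helper names are hypothetical, and it assumes the 806 aa protein of transcript ENSMUST00000080728 and a coding length that counts the stop codon. Under those assumptions, 0.45% of the coding region corresponds to roughly the 11th coding nucleotide. The length of exon 2 itself is not given in this report, so the frameshift rule is shown only in general form.

```python
# Illustrative arithmetic only; all helper names are hypothetical.

def cds_length_nt(protein_aa: int) -> int:
    """Coding length in nucleotides, counting the stop codon (an assumption)."""
    return (protein_aa + 1) * 3

def offset_in_cds(fraction: float, protein_aa: int) -> int:
    """Approximate nucleotide offset for a fractional position in the CDS."""
    return round(fraction * cds_length_nt(protein_aa))

def deletion_shifts_frame(deleted_coding_nt: int) -> bool:
    """A deletion shifts the reading frame unless its coding length is a multiple of 3."""
    return deleted_coding_nt % 3 != 0

cds = cds_length_nt(806)             # 806 aa is taken from the transcript table
offset = offset_in_cds(0.0045, 806)  # 0.45% of the coding region
print(cds, offset)
```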


Overview of the Targeting Strategy

[Figure: schematic of the targeting strategy. Wildtype allele (exons 1, 2 and 24 labelled, with 5' and 3' gRNA regions); Targeting vector; Targeted allele; Constitutive KO allele (after Cre recombination). Legend: exon of mouse Sgip1; homology arm; cKO region; loxP site.]


Overview of the Dot Plot

[Figure: dot plot of Sequence 12 aligned against itself. Window size: 10 bp. Forward and reverse-complement matches are shown.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. It may be difficult to construct this targeting vector.
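A self-alignment of this kind can be approximated in a few lines. The sketch below is not the tool used to produce the plot: the function names are hypothetical, only exact forward-strand matches are considered (reverse-complement matches are left out), and the window size of 10 bp is taken from the figure. Each match is a dot at (i, j); many dots sharing the same offset j - i form a diagonal, which is what a tandem repeat looks like in the matrix.

```python
from collections import defaultdict

def self_dot_plot(seq: str, window: int = 10):
    """Return sorted (i, j) pairs, i < j, whose window-length substrings match exactly."""
    positions = defaultdict(list)
    for i in range(len(seq) - window + 1):
        positions[seq[i:i + window]].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)

def repeat_offsets(dots):
    """Count dots per diagonal offset (j - i); a crowded diagonal suggests a repeat."""
    counts = defaultdict(int)
    for i, j in dots:
        counts[j - i] += 1
    return dict(counts)

# Toy input: three tandem copies of a 4 bp unit, scanned with a 4 bp window.
toy_offsets = repeat_offsets(self_dot_plot("ACGTACGTACGT", window=4))
print(toy_offsets)
```

On the toy input the offsets cluster at the repeat unit length and its multiple, which is the pattern to look for in a real sequence.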

Overview of the GC Content Distribution

[Figure: GC content distribution along Sequence 12. Window size: 300 bp.]

Summary: Full Length(7064bp) | A(28.94% 2044) | C(20.61% 1456) | T(27.31% 1929) | G(23.15% 1635)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
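The summary line can be cross-checked, and the windowed analysis reproduced in outline, with a short sketch. The function names below are hypothetical and the code is not the tool that generated the plot; the base counts are copied from the summary and the 300 bp window from the figure. The reported counts sum to the stated full length of 7064 bp and give an overall GC content of about 43.8%.

```python
def base_composition(seq: str) -> dict:
    """Count A, C, G and T in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}

def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    counts = base_composition(seq)
    total = sum(counts.values())
    return (counts["G"] + counts["C"]) / total if total else 0.0

def windowed_gc(seq: str, window: int = 300, step: int = 1):
    """GC fraction of each full window along the sequence."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]

# Cross-check of the counts reported in the summary line.
reported = {"A": 2044, "C": 1456, "T": 1929, "G": 1635}
total = sum(reported.values())
overall_gc = (reported["G"] + reported["C"]) / total
print(total, round(overall_gc * 100, 2))
```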


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr4   +       102849634  102852633  3000
YourSeq  78     1808   1991  3000   87.9%     chr8   +       7780316    7780486    171
YourSeq  66     1812   1883  3000   97.2%     chr1   -       185238047  185238118  72
YourSeq  64     144    268   3000   87.4%     chr17  -       57242424   57242547   124
YourSeq  62     1808   1886  3000   86.0%     chr7   -       71544001   71544065   65
YourSeq  62     1808   1889  3000   84.7%     chr5   -       71215171   71215236   66
YourSeq  61     146    526   3000   66.3%     chr8   -       14721711   14721821   111
YourSeq  60     1808   1884  3000   85.5%     chr18  +       69506027   69506089   63
YourSeq  59     144    222   3000   84.7%     chr1   -       57884229   57884306   78
YourSeq  54     144    268   3000   82.6%     chr6   -       30428295   30428411   117
YourSeq  54     145    269   3000   88.4%     chr5   -       137591961  137592082  122
YourSeq  54     148    269   3000   72.2%     chr12  +       28926515   28926636   122
YourSeq  53     180    532   3000   62.0%     chr6   +       90607005   90607108   104
YourSeq  52     144    211   3000   87.1%     chr6   -       92386252   92386317   66
YourSeq  52     1829   1886  3000   94.5%     chr2   -       163967841  163967897  57
YourSeq  52     1833   1884  3000   100.0%    chr11  -       42252763   42252814   52
YourSeq  52     1835   1886  3000   100.0%    chr3   +       12309165   12309216   52
YourSeq  52     1833   1884  3000   100.0%    chr17  +       63431203   63431254   52
YourSeq  51     177    264   3000   88.4%     chr5   -       122676929  122677015  87
YourSeq  51     180    263   3000   91.7%     chr4   +       137671047  137671130  84

Note: The 3000 bp section upstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  3000   1      3000  3000   100.0%    chr4   +       102853198  102856197  3000
YourSeq  32     1735   1768  3000   97.1%     chr3   +       18593025   18593058   34
YourSeq  29     1726   1769  3000   71.1%     chr18  -       24356573   24356610   38
YourSeq  28     1158   1198  3000   96.7%     chr4   -       127422620  127422661  42
YourSeq  26     556    588   3000   96.5%     chr6   -       44886964   44886997   34
YourSeq  25     556    587   3000   96.3%     chr14  -       36699595   36699627   33
YourSeq  25     556    587   3000   96.3%     chr13  -       16961399   16961431   33
YourSeq  25     556    587   3000   96.3%     chr10  +       98529550   98529582   33
YourSeq  25     556    587   3000   96.3%     chr10  +       19049191   19049223   33
YourSeq  24     557    587   3000   96.2%     chr12  -       61087305   61087336   32
YourSeq  24     556    587   3000   96.2%     chr1   -       22169370   22169402   33
YourSeq  22     45     68    3000   87.0%     chr2   +       14525019   14525041   23

Note: The 3000 bp section downstream of Exon 2 is BLAT searched against the genome. No significant similarity is found.
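The top hit in each BLAT table is the query matching its own genomic location, so the two sets of coordinates can be used to cross-check the stated size of the cKO region. The sketch below uses hypothetical helper names and assumes the chr4 coordinates are 1-based and inclusive, which agrees with the SPAN column. Under that assumption, the interval between the upstream and downstream sections is 564 bp, matching the ~564 bp effective cKO region given in the strategy note.

```python
# chr4 (+ strand) coordinates of the top self-hit in each BLAT table.
UP_START, UP_END = 102849634, 102852633
DOWN_START, DOWN_END = 102853198, 102856197

def span(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval (assumed coordinate convention)."""
    return end - start + 1

def gap_between(left_end: int, right_start: int) -> int:
    """Number of bases lying strictly between two intervals."""
    return right_start - left_end - 1

up_len = span(UP_START, UP_END)
down_len = span(DOWN_START, DOWN_END)
cko_len = gap_between(UP_END, DOWN_START)
print(up_len, cko_len, down_len)
```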


Gene information:

Sgip1 SH3-domain GRB2-like (endophilin) interacting protein 1 [Mus musculus (house mouse)]
Gene ID: 73094, updated on 10-Oct-2019

Gene summary

Official Symbol: Sgip1 (provided by MGI)
Official Full Name: SH3-domain GRB2-like (endophilin) interacting protein 1 (provided by MGI)
Primary source: MGI:MGI:1920344
See related: Ensembl:ENSMUSG00000028524
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: gag; SGIP1alpha; 3110007P09Rik
Expression: Biased expression in frontal lobe adult (RPKM 10.0), cortex adult (RPKM 5.5) and 7 other tissues
Orthologs: human, all

Genomic context

Location: 4; 4 C6

Exon count: 30

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (102741302..102977426)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (102432968..102643782)

[Figure: genomic context of Sgip1 on Chromosome 4 - NC_000070.6]


Transcript information: This gene has 15 transcripts

Gene: Sgip1 ENSMUSG00000028524

Description: SH3-domain GRB2-like (endophilin) interacting protein 1 [Source:MGI Symbol;Acc:MGI:1920344]
Gene Synonyms: 3110007P09Rik
Location: Chromosome 4: 102,741,297-102,973,628 forward strand. GRCm38:CM000997.2
About this gene: This gene has 15 transcripts (splice variants), 250 orthologues, 5 paralogues, is a member of 1 Ensembl protein family and is associated with 12 phenotypes.

Transcripts

Name       Transcript ID          bp    Protein     Translation ID        Biotype                  CCDS       UniProt  Flags
Sgip1-201  ENSMUST00000066824.13  4305  659aa       ENSMUSP00000063712.7  Protein coding           CCDS71429  Q8VD37   TSL:1, GENCODE basic, APPRIS P4
Sgip1-202  ENSMUST00000072481.11  3915  639aa       ENSMUSP00000072301.5  Protein coding           CCDS71430  Q8VD37   TSL:1, GENCODE basic, APPRIS ALT1
Sgip1-203  ENSMUST00000080728.12  3712  806aa       ENSMUSP00000079553.6  Protein coding           CCDS18404  Q8VD37   TSL:1, GENCODE basic
Sgip1-205  ENSMUST00000106882.8   6061  826aa       ENSMUSP00000102495.2  Protein coding           -          Q8VD37   TSL:5, GENCODE basic
Sgip1-213  ENSMUST00000149547.1   2251  575aa       ENSMUSP00000122556.1  Protein coding           -          F7CPX0   CDS 5' incomplete, TSL:5
Sgip1-214  ENSMUST00000156596.7   415   70aa        ENSMUSP00000115075.1  Protein coding           -          F6VMS8   CDS 3' incomplete, TSL:3
Sgip1-215  ENSMUST00000183855.7   1121  188aa       ENSMUSP00000139337.1  Nonsense mediated decay  -          Q8VD37   TSL:2
Sgip1-204  ENSMUST00000097948.8   744   No protein  -                     Retained intron          -          -        TSL:3
Sgip1-212  ENSMUST00000142430.7   2271  No protein  -                     lncRNA                   -          -        TSL:1
Sgip1-210  ENSMUST00000140676.1   815   No protein  -                     lncRNA                   -          -        TSL:3
Sgip1-209  ENSMUST00000140289.1   713   No protein  -                     lncRNA                   -          -        TSL:3
Sgip1-207  ENSMUST00000136164.1   658   No protein  -                     lncRNA                   -          -        TSL:2
Sgip1-206  ENSMUST00000131431.1   549   No protein  -                     lncRNA                   -          -        TSL:3
Sgip1-211  ENSMUST00000141624.1   523   No protein  -                     lncRNA                   -          -        TSL:3
Sgip1-208  ENSMUST00000137592.7   466   No protein  -                     lncRNA                   -          -        TSL:3


[Figure: Ensembl genomic region view, 252.33 kb, chromosome 4 from about 102.75 Mb to 102.95 Mb. Forward strand: Sgip1-201, -202, -203, -205, -213 and -214 (protein coding), Sgip1-204 (retained intron), Sgip1-206 to -212 (lncRNA), Sgip1-215 (nonsense mediated decay), Gm24869-201 (snoRNA), Tctex1d1-208 (protein coding) and Tctex1d1-207 (nonsense mediated decay). Contigs: AL772290.3, AL732543.8. Reverse strand: Gm12709-201 (lncRNA). Regulatory Build legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank, Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript; RNA gene).]


Transcript: ENSMUST00000080728

[Figure: Sgip1-203 (protein coding), 210.81 kb, forward strand, with protein annotation for ENSMUSP00000079... on a scale of 0 to 806 aa. Tracks: MobiDB lite; Low complexity (Seg); Superfamily: AP-2 complex subunit mu, C-terminal superfamily; Pfam: Muniscin C-terminal; PROSITE profiles: Mu homology domain; PANTHER: PTHR23065, PTHR23065:SF27; CDD: SGIP1, mu-homology domain; sequence variants (dbSNP and all other sources). Variant legend: missense variant, splice region variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
