http://www.alphaknockout.com/ Mouse Insm1 Conditional Knockout Project (CRISPR/Cas9)

Objective: To create an Insm1 conditional knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Insm1 gene (NCBI Reference Sequence: NM_016889; Ensembl: ENSMUSG00000068154) is located on mouse chromosome 2. One exon is identified, with both the ATG start codon and the TAG stop codon in exon 1 (Transcript: ENSMUST00000089257). Exon 1 will be selected as the conditional knockout region (cKO region). Deletion of this region should result in the loss of function of the mouse Insm1 gene. To engineer the targeting vector, homologous arms and the cKO region will be generated by PCR using BAC clone RP23-114P5 as template. Cas9, gRNA and the targeting vector will be co-injected into fertilized eggs for cKO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null allele display perinatal and neonatal lethality, respiratory failure, and impaired pancreatic and intestinal endocrine cell development.

Exon 1 starts from the start codon. The knockout of exon 1 covers 100% of the coding region. The size of the effective cKO region is ~3232 bp. This strategy is designed based on genetic information in existing databases. Due to the complexity of biological processes, the risks that loxP insertion poses to gene transcription, RNA splicing and translation cannot all be predicted with existing technology.
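The conditional design described above relies on Cre recombinase excising the sequence between two loxP sites. A minimal sketch of that logic in Python follows, assuming both loxP sites are in the same orientation on the same strand. The flank and exon strings are toy placeholders, not Insm1 sequence, and the function name `cre_excise` is hypothetical.

```python
# Canonical 34 bp loxP site: 13 bp inverted repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(allele: str, site: str = LOXP) -> str:
    """Model Cre recombination between two same-orientation loxP sites.

    The sequence between the first two sites is removed and a single
    loxP site remains. Alleles with fewer than two sites are returned
    unchanged (assumption: no cryptic sites are considered).
    """
    first = allele.find(site)
    if first == -1:
        return allele
    second = allele.find(site, first + len(site))
    if second == -1:
        return allele
    # Keep everything up to and including the first site,
    # then resume after the second site.
    return allele[: first + len(site)] + allele[second + len(site):]


# Toy example: placeholder flanks around a placeholder "exon".
five_prime_arm = "AAAA"
exon = "CCCCGGGG"
three_prime_arm = "TTTT"
targeted_allele = five_prime_arm + LOXP + exon + LOXP + three_prime_arm
ko_allele = cre_excise(targeted_allele)
```

In this toy model, the targeted allele corresponds to the floxed exon and `ko_allele` to the constitutive KO allele after Cre recombination.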


Overview of the Targeting Strategy

[Figure: schematic of the wildtype allele (5' to 3', with the ATG in exon 1 and two gRNA regions marked), the targeting vector, the targeted allele, and the constitutive KO allele (after Cre recombination). Legend: homology arm; exon of mouse Insm1; cKO region; loxP site.]


Overview of the Dot Plot

[Figure: dot plot of Sequence 12 aligned against itself, window size 10 bp, showing forward and reverse-complement matches.]

Note: The sequence of homologous arms and cKO region is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix. So this region is suitable for PCR screening or sequencing analysis.
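The self-alignment described in the note can be approximated with exact k-mer matching. The sketch below is an illustration only: it assumes exact matches of a 10 bp window (the tool that produced the original plot may use a different scoring scheme), and the function names are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 10):
    """Find exact window-sized matches of a sequence against itself.

    Returns two lists of (i, j) start positions:
    forward - seq[i:i+window] equals seq[j:j+window], with j > i
              (the trivial main diagonal is skipped);
    reverse - seq[i:i+window] equals the reverse complement of
              seq[j:j+window].
    """
    index = defaultdict(list)
    n_windows = len(seq) - window + 1
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        kmer = seq[i:i + window]
        for j in index.get(kmer, []):
            if j > i:
                forward.append((i, j))
        for j in index.get(revcomp(kmer), []):
            reverse.append((i, j))
    return forward, reverse


# Toy sequence with one direct repeat of a 10-mer.
toy = "GATTACAGGC" + "TTT" + "GATTACAGGC"
toy_forward, toy_reverse = self_dot_plot(toy, window=10)
```

Runs of consecutive forward matches along one off-diagonal would indicate a direct or tandem repeat; an empty or sparse result corresponds to the "no significant tandem repeat" conclusion.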

Overview of the GC Content Distribution

[Figure: GC content distribution of Sequence 12, window size 300 bp.]

Summary: Full Length(9100bp) | A(22.49% 2047) | C(27.49% 2502) | G(26.26% 2390) | T(23.75% 2161)

Note: The sequence of homologous arms and cKO region is analyzed to determine the GC content. Regions of significantly high GC content are found, so this targeting vector may be difficult to construct.


BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr2   +       146218922  146221921  3000
YourSeq  32     2942    2976  3000   97.1%     chr5   -       114796937  114796971  35
YourSeq  31     2936    2968  3000   97.0%     chr10  +       42018384   42018416   33
YourSeq  30     2954    2998  3000   89.2%     chr11  +       94328216   94328265   50
YourSeq  30     2941    2976  3000   90.7%     chr10  +       126895714  126895748  35
YourSeq  29     1411    1442  3000   96.9%     chr2   +       114926731  114926764  34
YourSeq  29     2935    2967  3000   83.9%     chr10  +       94450628   94450658   31
YourSeq  24     2945    2968  3000   100.0%    chr1   +       157554870  157554893  24

Note: The 3000 bp section upstream of Exon 1 is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
YourSeq  3000   1       3000  3000   100.0%    chr2   +       146225022  146228021  3000
YourSeq  30     1625    1661  3000   94.0%     chrX   -       116507847  116507891  45
YourSeq  24     893     916   3000   100.0%    chr10  -       60368341   60368364   24
YourSeq  23     1915    1937  3000   100.0%    chr14  -       64924035   64924057   23
YourSeq  21     2584    2604  3000   100.0%    chr6   -       7755906    7755926    21
YourSeq  21     2259    2279  3000   100.0%    chr13  -       119753326  119753346  21

Note: The 3000 bp section downstream of Exon 1 is BLAT searched against the genome. No significant similarity is found.
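The "no significant similarity" conclusion can be checked mechanically by comparing each secondary hit with the query length. The sketch below parses the upstream BLAT rows from this report and flags secondary hits that cover a chosen fraction of the query. The 10% coverage cutoff and all function names are assumptions for illustration; the report does not state which criterion was used.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_line(line: str) -> BlatHit:
    """Parse one BLAT result row; only the last 10 fields are used,
    so any leading query name or link text is ignored."""
    f = line.split()[-10:]
    return BlatHit(
        int(f[0]), int(f[1]), int(f[2]), int(f[3]),
        float(f[4].rstrip("%")), f[5], f[6],
        int(f[7]), int(f[8]), int(f[9]),
    )


def significant_secondary_hits(hits, min_fraction: float = 0.1):
    """Return hits other than the top-scoring (self) hit whose aligned
    query length is at least min_fraction of the query (assumed cutoff)."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    return [
        h for h in ranked[1:]
        if (h.q_end - h.q_start + 1) / h.q_size >= min_fraction
    ]


# Upstream BLAT rows as listed in this report.
UPSTREAM_ROWS = """\
YourSeq 3000 1 3000 3000 100.0% chr2 + 146218922 146221921 3000
YourSeq 32 2942 2976 3000 97.1% chr5 - 114796937 114796971 35
YourSeq 31 2936 2968 3000 97.0% chr10 + 42018384 42018416 33
YourSeq 30 2954 2998 3000 89.2% chr11 + 94328216 94328265 50
YourSeq 30 2941 2976 3000 90.7% chr10 + 126895714 126895748 35
YourSeq 29 1411 1442 3000 96.9% chr2 + 114926731 114926764 34
YourSeq 29 2935 2967 3000 83.9% chr10 + 94450628 94450658 31
YourSeq 24 2945 2968 3000 100.0% chr1 + 157554870 157554893 24
"""

upstream_hits = [parse_blat_line(row) for row in UPSTREAM_ROWS.splitlines()]
flagged = significant_secondary_hits(upstream_hits)
```

With these rows the longest secondary alignment covers 45 of 3000 query bases, so nothing is flagged under the assumed cutoff.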

Gene and protein information:
Insm1 insulinoma-associated 1 [ Mus musculus (house mouse) ]
Gene ID: 53626, updated on 17-Nov-2020

Gene summary

Official Symbol: Insm1 (provided by MGI)
Official Full Name: insulinoma-associated 1 (provided by MGI)
Primary source: MGI:MGI:1859980
See related: Ensembl:ENSMUSG00000068154
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: IA; IA-1

Genomic context

Location: 2; 2 G1
Exon count: 1

Annotation release | Status | Assembly | Chr | Location
109 | current | GRCm39 (GCF_000001635.27) | 2 | NC_000068.8 (146063917..146066940)
108.20200622 | previous assembly | GRCm38.p6 (GCF_000001635.26) | 2 | NC_000068.7 (146221997..146225020)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 2 | NC_000068.6 (146047733..146050756)
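Because the same locus is listed in three assemblies, a quick consistency check on the intervals can help avoid mixing coordinate systems. The sketch below uses only the coordinates listed above; the function names are hypothetical, and the computed offset is a local observation for this interval, not a substitute for a proper liftover tool.

```python
# Insm1 intervals on chromosome 2 as listed in the table above (1-based, inclusive).
LOCATIONS = {
    "GRCm39": (146063917, 146066940),
    "GRCm38.p6": (146221997, 146225020),
    "MGSCv37": (146047733, 146050756),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def local_offset(src: str, dst: str) -> int:
    """Shift that maps the src interval onto the dst interval.

    Raises ValueError if start and end shift by different amounts,
    which would mean the interval length differs between assemblies.
    """
    (s1, e1), (s2, e2) = LOCATIONS[src], LOCATIONS[dst]
    if s2 - s1 != e2 - e1:
        raise ValueError("interval length differs between assemblies")
    return s2 - s1


spans = {name: span_length(*loc) for name, loc in LOCATIONS.items()}
shift_38_to_39 = local_offset("GRCm38.p6", "GRCm39")
```

The BLAT target coordinates elsewhere in this report (chr2:146218922-146221921 and chr2:146225022-146228021) flank the GRCm38.p6 interval, which suggests they use that assembly, whereas the Ensembl transcript view uses GRCm39.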

[Figure: genomic context of Insm1 on Chromosome 2 - NC_000068.8]


Transcript information: This gene has 1 transcript

Gene: Insm1 ENSMUSG00000068154

Description: insulinoma-associated 1 [Source:MGI Symbol;Acc:MGI:1859980]
Gene Synonyms: IA-1
Location: Chromosome 2: 146,063,841-146,066,940 forward strand. GRCm39:CM000995.3
About this gene: This gene has 1 transcript (splice variant), 208 orthologues, 1 paralogue and is associated with 36 phenotypes.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt Match | Flags
Insm1-201 | ENSMUST00000089257.6 | 3100 | 521aa | ENSMUSP00000092048.4 | Protein coding | CCDS16830 | Q05BD7 Q63ZV0 | TSL:NA GENCODE basic APPRIS P1

[Figure: Ensembl genomic view, 23.10 kb, 146.055Mb to 146.075Mb, forward and reverse strands. Tracks: Insm1-201 (protein coding); Cfap61-206 and Cfap61-214 (protein coding); Cfap61-208, Cfap61-209 and Cfap61-210 (processed transcript); contig AL935056.18; Regulatory Build. Regulation legend: CTCF, Promoter, Promoter Flank. Gene legend: protein coding (merged Ensembl/Havana, Ensembl protein coding); non-protein coding (processed transcript).]


Transcript: ENSMUST00000089257

[Figure: Ensembl protein view of Insm1-201 (protein coding; transcript 3.10 kb, forward strand), scale 0 to 521 aa. Annotation tracks: MobiDB lite; Low complexity (Seg); Superfamily: Zinc finger C2H2 superfamily; SMART: Zinc finger C2H2-type; Pfam: Zinc finger C2H2-type; PROSITE profiles: Zinc finger C2H2-type; PROSITE patterns: Zinc finger C2H2-type; PANTHER: Insulinoma-associated protein 1/2, PTHR15065:SF5; Gene3D: 3.30.160.60; sequence variants (dbSNP and all other sources). Variant legend: frameshift variant, missense variant, synonymous variant.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC, VectorBuilder.
