https://www.alphaknockout.com

Mouse Tagln2 Knockout Project (CRISPR/Cas9)

Objective: To create a Tagln2 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tagln2 gene (NCBI Reference Sequence: NM_178598; Ensembl: ENSMUSG00000026547) is located on mouse chromosome 1. Five exons have been identified, with the ATG start codon in exon 2 and the TGA stop codon in exon 5 (Transcript: ENSMUST00000111230). Exons 2 to 5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a null allele display abnormalities in T cell physiology, including cytotoxicity.

Exon 2 starts at about 0.17% of the coding region. Exons 2 to 5 cover 100.0% of the coding region. The effective KO region is ~1558 bp in size and does not contain any other known gene.
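The ~1558 bp figure can be cross-checked against the chr1 coordinates of the two self-hits in the BLAT tables of this report. The sketch below assumes that the 2000 bp upstream flank ends immediately before the start codon and that the 2000 bp downstream flank begins immediately after the stop codon; the function and variable names are illustrative only.

```python
# chr1 coordinates (GRCm38) copied from the 100% self-hits in the BLAT tables.
UPSTREAM_FLANK_END = 172505148      # last base of the 2000 bp upstream flank
DOWNSTREAM_FLANK_START = 172506707  # first base of the 2000 bp downstream flank


def region_between(flank_end: int, flank_start: int) -> int:
    """Count the bases strictly between two flanks (1-based, inclusive coordinates)."""
    return flank_start - flank_end - 1


ko_region_bp = region_between(UPSTREAM_FLANK_END, DOWNSTREAM_FLANK_START)
print(ko_region_bp)  # 1558
```

Under these assumptions the interval between the two flanks matches the stated size of the effective KO region.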


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3' with exons 1 to 5 and the two gRNA regions. Legend: exon of mouse Tagln2; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
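As an illustration of how a self dot plot exposes repeats, the sketch below marks every pair of positions whose 15 bp windows match exactly, on the forward strand or as a reverse complement. Exact word matching and the function names are assumptions made for this example; the report does not state which tool or matching rule produced its plots.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Return (forward, reverse) lists of (i, j) window starts that match exactly.

    The trivial main diagonal (i == j) is left out of the forward list.
    """
    seq = seq.upper()
    n_windows = len(seq) - window + 1
    index = defaultdict(list)  # window word -> all start positions
    for i in range(n_windows):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(n_windows):
        word = seq[i:i + window]
        forward.extend((i, j) for j in index[word] if j != i)
        reverse.extend((i, j) for j in index.get(reverse_complement(word), []))
    return forward, reverse


# Toy sequence: a 7 bp unit repeated twice in tandem.
forward, reverse = self_dot_plot("GATTACAGATTACA", window=7)
print(forward)  # [(0, 7), (7, 0)]
print(reverse)  # []
```

Off-diagonal forward matches such as (0, 7) are what a tandem repeat looks like in the matrix; a sequence with no such matches gives an empty forward list.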

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot of the sequence against itself, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(21.35% 427) | C(26.1% 522) | T(25.95% 519) | G(26.6% 532)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
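The percentages in the summary line follow directly from the base counts, and a windowed GC profile can be computed in a few lines. The sketch below is illustrative: the function names and the step size are assumptions, and only the upstream base counts are taken from this report.

```python
def base_composition(counts: dict) -> dict:
    """Convert raw base counts into percentages of the total length."""
    total = sum(counts.values())
    return {base: round(100 * n / total, 2) for base, n in counts.items()}


def gc_windows(seq: str, window: int = 300, step: int = 1) -> list:
    """GC fraction of each window of the given size along the sequence."""
    seq = seq.upper()
    fractions = []
    for i in range(0, len(seq) - window + 1, step):
        chunk = seq[i:i + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


# Base counts reported for the 2000 bp upstream section.
upstream_counts = {"A": 427, "C": 522, "T": 519, "G": 532}
composition = base_composition(upstream_counts)
overall_gc = 100 * (upstream_counts["G"] + upstream_counts["C"]) / sum(upstream_counts.values())

print(composition)  # {'A': 21.35, 'C': 26.1, 'T': 25.95, 'G': 26.6}
print(overall_gc)   # 52.7
```

With the actual 2000 bp sequence, `gc_windows(sequence, window=300)` would give the kind of profile the plot summarizes.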

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along the sequence.]

Summary: Full Length(2000bp) | A(20.8% 416) | C(26.5% 530) | T(27.8% 556) | G(24.9% 498)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr1   +       172503149  172505148  2000
YourSeq  50     901    965   2000   93.3%     chr2   -       3741034    3741112    79
YourSeq  50     903    965   2000   89.7%     chr9   +       90536677   90536738   62
YourSeq  50     899    965   2000   87.1%     chr13  +       72629659   72629718   60
YourSeq  43     903    965   2000   78.5%     chr6   +       134624948  134625003  56
YourSeq  38     916    965   2000   95.4%     chr5   +       128958586  128958644  59
YourSeq  38     909    965   2000   76.8%     chr17  +       15631694   15631740   47
YourSeq  36     899    938   2000   97.4%     chr6   -       71374887   71374931   45
YourSeq  36     900    939   2000   92.2%     chr4   +       88320744   88320782   39
YourSeq  35     909    959   2000   94.9%     chr13  +       18128370   18128422   53
YourSeq  34     909    961   2000   94.8%     chr17  -       13530780   13530842   63
YourSeq  34     903    947   2000   83.4%     chr4   +       89217123   89217165   43
YourSeq  33     916    955   2000   94.8%     chrX   -       18343161   18343201   41
YourSeq  33     903    965   2000   69.5%     chr5   -       148348039  148348078  40
YourSeq  32     903    945   2000   91.9%     chr2   -       134656420  134656659  240
YourSeq  32     909    965   2000   66.7%     chr11  +       7695064    7695100    37
YourSeq  31     909    960   2000   83.4%     chr7   -       71409097   71409143   47
YourSeq  31     904    940   2000   94.2%     chr11  -       54168500   54168541   42
YourSeq  30     909    948   2000   75.8%     chr5   -       88589590   88589622   33
YourSeq  30     904    940   2000   79.5%     chr4   +       107924071  107924104  34

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. Apart from the query's own locus on chr1, no significant similarity is found.
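One way to read such a table programmatically is to parse each row, set aside the query's own locus, and compare the best remaining score with the query length. The sketch below uses the first three upstream rows as sample input; the `BlatHit` type and `parse_hit` helper are hypothetical names for this example, and any leading "browser details" link text is skipped if present. The report does not state what cutoff it uses for "significant", so none is applied here.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one whitespace-separated BLAT result row."""
    f = line.split()
    if f[:2] == ["browser", "details"]:  # link text left over from the web page
        f = f[2:]
    return BlatHit(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


ROWS = """\
YourSeq 2000 1 2000 2000 100.0% chr1 + 172503149 172505148 2000
YourSeq 50 901 965 2000 93.3% chr2 - 3741034 3741112 79
YourSeq 50 903 965 2000 89.7% chr9 + 90536677 90536738 62
"""

hits = [parse_hit(line) for line in ROWS.splitlines() if line.strip()]
self_hit = max(hits, key=lambda h: h.score)          # the query's own locus
secondary = [h for h in hits if h is not self_hit]   # everything else
best_secondary = max(h.score for h in secondary)

print(self_hit.chrom, self_hit.score)     # chr1 2000
print(best_secondary / self_hit.q_size)   # 0.025
```

In this sample the strongest secondary hit scores 50 against a 2000 bp query, which is the pattern the note describes as no significant similarity.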

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr1   +       172506707  172508706  2000
YourSeq  53     178    323   2000   98.2%     chr14  -       64408730   64409088   359
YourSeq  30     194    223   2000   100.0%    chr6   +       100185202  100185231  30
YourSeq  29     1050   1084  2000   87.9%     chr14  -       35001599   35001632   34
YourSeq  29     1753   1821  2000   96.8%     chr5   +       99438763   99438831   69
YourSeq  27     1533   1560  2000   100.0%    chr18  -       76645857   76645896   40
YourSeq  27     1536   1562  2000   100.0%    chr9   +       13699828   13699854   27
YourSeq  24     40     63    2000   100.0%    chr1   +       140506590  140506613  24
YourSeq  24     452    482   2000   72.0%     chr1   +       22663526   22663550   25
YourSeq  23     40     62    2000   100.0%    chr6   +       92426495   92426517   23
YourSeq  22     1571   1592  2000   100.0%    chr4   +       86933395   86933416   22
YourSeq  21     1533   1553  2000   100.0%    chr7   -       79514036   79514056   21
YourSeq  21     1372   1392  2000   100.0%    chr4   -       114659556  114659576  21
YourSeq  21     1533   1553  2000   100.0%    chr7   +       4924501    4924521    21
YourSeq  21     41     61    2000   100.0%    chr10  +       27801231   27801251   21
YourSeq  21     1300   1320  2000   100.0%    chr1   +       62950656   62950676   21

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. Apart from the query's own locus on chr1, no significant similarity is found.


Gene and information: Tagln2 transgelin 2 [ Mus musculus (house mouse) ] Gene ID: 21346, updated on 8-Oct-2019

Gene summary

Official Symbol: Tagln2 (provided by MGI)
Official Full Name: transgelin 2 (provided by MGI)
Primary source: MGI:MGI:1312985
See related: Ensembl:ENSMUSG00000026547
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: Sm22B; Sm22a; SM22beta; 2700094C18Rik
Expression: Broad expression in stomach adult (RPKM 455.3), lung adult (RPKM 276.3) and 19 other tissues
Orthologs: human

Genomic context

Location: 1 H3; 1 79.89 cM
Exon count: 5

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  1    NC_000067.6 (172500246..172507375)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    1    NC_000067.5 (174430377..174437506)

Chromosome 1 - NC_000067.6


Transcript information: This gene has 4 transcripts

Gene: Tagln2 ENSMUSG00000026547

Description: transgelin 2 [Source:MGI Symbol;Acc:MGI:1312985]
Gene Synonyms: 2700094C18Rik, Sm22B
Location: 172,500,047-172,507,380 forward strand. GRCm38:CM000994.2
About this gene: This gene has 4 transcripts (splice variants), 158 orthologues, 5 paralogues, is a member of 1 Ensembl protein family and is associated with 2 phenotypes.

Transcripts

Name        Transcript ID         bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Tagln2-202  ENSMUST00000111230.7  1586  199aa       ENSMUSP00000106861.1  Protein coding   CCDS35784  Q9WVA4      TSL:1 GENCODE basic APPRIS P1
Tagln2-201  ENSMUST00000111228.1  716   199aa       ENSMUSP00000106859.1  Protein coding   CCDS35784  Q9WVA4      TSL:3 GENCODE basic APPRIS P1
Tagln2-204  ENSMUST00000192460.1  369   90aa        ENSMUSP00000141983.1  Protein coding   -          A0A0A6YXG6  CDS 3' incomplete TSL:2
Tagln2-203  ENSMUST00000138296.7  1101  No protein  -                     Retained intron  -          -           TSL:2

[Figure: Ensembl genome browser view of a 27.33 kb region (172.495 Mb to 172.515 Mb; forward and reverse strands). Tracks show Igsf9 transcripts (Igsf9-201, -202, -203, -205, -206, -208, -210, -211, -212, -213, -214), Gm37125-201 (TEC) and the Tagln2 transcripts (Tagln2-201, -202, -203, -204), together with contig AC121551.11 and the Regulatory Build. Regulation legend: CTCF, enhancer, open chromatin, promoter, promoter flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (processed transcript).]


Transcript: ENSMUST00000111230

[Figure: Tagln2-202 (protein coding), 7.33 kb, forward strand, with protein domain and variant tracks for ENSMUSP00000106... on a scale of 0 to 199 aa.]

Protein domain annotations shown in the figure:
Superfamily: CH domain superfamily
SMART: Calponin homology domain
Prints: PR00890; Smooth muscle protein/calponin
Pfam: Calponin homology domain; Calponin repeat
PROSITE profiles: Calponin homology domain; Calponin repeat
PROSITE patterns: Calponin repeat
PANTHER: PTHR18959; Transgelin-2
Gene3D: CH domain superfamily
CDD: Calponin homology domain
Sequence variants (dbSNP and all other sources): missense variant; synonymous variant

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
