https://www.alphaknockout.com

Mouse Tm4sf1 Knockout Project (CRISPR/Cas9)

Objective: To create a Tm4sf1 knockout Mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Tm4sf1 gene (NCBI Reference Sequence: NM_008536; Ensembl: ENSMUSG00000027800) is located on mouse chromosome 3. Seven exons are identified, with the ATG start codon in exon 3 and the TAA stop codon in exon 7 (Transcript: ENSMUST00000196979). Exons 3~7 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: Exon 3 starts from about 0.17% of the coding region. Exons 3~7 cover 100.0% of the coding region. The size of the effective KO region is ~7028 bp. The KO region does not contain any other known gene.
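The PCR genotyping step can be illustrated with simple amplicon arithmetic. The sketch below assumes a clean deletion of the ~7028 bp KO region and a hypothetical primer pair sitting 300 bp upstream and 400 bp downstream of it. These flank distances and the function name `expected_amplicons` are illustrative only and are not taken from the project design. Real alleles may carry additional indels at the cut sites, which is why sequencing follows the PCR.

```python
def expected_amplicons(deletion_bp, left_flank_bp, right_flank_bp):
    """Return (wildtype_bp, knockout_bp) for primers flanking a deletion.

    Assumes the deletion removes exactly deletion_bp between the primers.
    """
    wildtype = left_flank_bp + deletion_bp + right_flank_bp
    knockout = left_flank_bp + right_flank_bp
    return wildtype, knockout


# Hypothetical flank distances; only the ~7028 bp figure comes from the report.
wt_bp, ko_bp = expected_amplicons(deletion_bp=7028, left_flank_bp=300, right_flank_bp=400)
print(f"wildtype amplicon: {wt_bp} bp, knockout amplicon: {ko_bp} bp")
```

With flanks of this size the wildtype product would be several kilobases long, so a second primer pair inside the deleted region is a common way to confirm the wildtype allele.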


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with two gRNA regions marked and exons labeled 1, 3, 4, 5, 6, 7. Legend: exon of mouse Tm4sf1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section upstream of the start codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
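As a rough illustration of the self-alignment behind a dot plot, the Python sketch below reports every pair of positions whose 15 bp windows match exactly, either in the forward orientation or as reverse complements. Exact matching and the function names `revcomp` and `self_dot_plot` are assumptions made for this example; the tool used for the report may score windows differently.

```python
from collections import defaultdict


def revcomp(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def self_dot_plot(seq, window=15):
    """Return (forward, reverse) lists of (i, j) window start positions.

    forward: windows at i and j (j > i) are identical, i.e. off-diagonal dots.
    reverse: the window at i equals the reverse complement of the window at j.
    """
    seq = seq.upper()
    last_start = len(seq) - window
    index = defaultdict(list)
    for i in range(last_start + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for i in range(last_start + 1):
        kmer = seq[i:i + window]
        forward.extend((i, j) for j in index[kmer] if j > i)
        reverse.extend((i, j) for j in index.get(revcomp(kmer), []))
    return forward, reverse


# Toy input with an obvious tandem repeat (not the Tm4sf1 sequence).
fwd, rev = self_dot_plot("ACGTACGT", window=4)
print(fwd)
```

A sequence free of tandem repeats yields few or no forward pairs away from the main diagonal, which is the pattern the note describes.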

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: dot plot. Legend: Forward; Reverse Complement.]

Note: The 2000 bp section downstream of the stop codon is aligned with itself to determine whether there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(2000bp) | A(26.4% 528) | C(22.75% 455) | T(27.25% 545) | G(23.6% 472)

Note: The 2000 bp section upstream of the start codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
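The base counts and the windowed GC profile can be reproduced with a few lines of Python. The sketch below is a minimal version under stated assumptions: the window slides with a configurable `step` (the report gives only the 300 bp window size, not the step), and the function names are illustrative. From the counts reported for the upstream section, C (455) plus G (472) gives an overall GC content of 927/2000 = 46.35%.

```python
from collections import Counter


def base_composition(seq):
    """Return {base: (count, percent)} for A, C, G and T."""
    seq = seq.upper()
    counts = Counter(seq)
    total = len(seq)
    return {b: (counts[b], 100.0 * counts[b] / total) for b in "ACGT"}


def gc_windows(seq, window=300, step=1):
    """Return (start, gc_percent) for each full window along the sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, 100.0 * gc / window))
    return profile


# Toy input (not the Tm4sf1 sequence): GC drops from 100% to 0% along it.
print(gc_windows("GGCCAATT", window=4, step=2))
```

A high GC-content region would show up as a run of windows well above the sequence-wide average.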

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution plot.]

Summary: Full Length(2000bp) | A(29.5% 590) | C(20.2% 404) | T(29.8% 596) | G(20.5% 410)

Note: The 2000 bp section downstream of the stop codon is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   -       57294789   57296788   2000
YourSeq  27     913    951   2000   96.6%     chr1   +       176019877  176019967  91
YourSeq  21     1962   1982  2000   100.0%    chr4   -       6976486    6976506    21
YourSeq  21     740    761   2000   100.0%    chr1   +       193867219  193867241  23

Note: The 2000 bp section upstream of the start codon is BLAT searched against the genome. No significant similarity is found.

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr3   -       57285759   57287758   2000
YourSeq  70     1924   1998  2000   97.4%     chr11  +       51800401   51910802   110402
YourSeq  69     1909   1999  2000   89.1%     chr13  +       29845633   30202041   356409
YourSeq  60     1919   1998  2000   87.5%     chr5   -       126708500  126708579  80
YourSeq  60     1917   1996  2000   86.9%     chr12  +       99552821   99552899   79
YourSeq  60     1921   1992  2000   95.6%     chr12  +       86908029   86908100   72
YourSeq  59     1924   1998  2000   92.9%     chr12  -       79972698   79972772   75
YourSeq  59     1921   1997  2000   88.4%     chr2   +       71711490   71711566   77
YourSeq  59     1922   1992  2000   91.6%     chr16  +       37348860   37348930   71
YourSeq  58     1921   1998  2000   87.2%     chr10  -       53980922   53980999   78
YourSeq  57     1924   1998  2000   88.0%     chr12  -       8768636    8768710    75
YourSeq  57     1924   1998  2000   88.0%     chr11  +       44436380   44436454   75
YourSeq  56     1928   1993  2000   96.8%     chr6   -       94300002   94300067   66
YourSeq  56     1927   1998  2000   95.2%     chr1   -       153350591  153350663  73
YourSeq  56     1927   1992  2000   96.8%     chr16  +       32030100   32030165   66
YourSeq  56     1924   1997  2000   87.9%     chr13  +       114829268  114829341  74
YourSeq  56     1927   1992  2000   96.8%     chr13  +       98928659   98928724   66
YourSeq  56     1921   1979  2000   98.4%     chr10  +       36426568   36426627   60
YourSeq  56     1927   1998  2000   88.9%     chr1   +       88765725   88765796   72
YourSeq  55     1926   1998  2000   87.7%     chr10  -       63832680   63832752   73

Note: The 2000 bp section downstream of the stop codon is BLAT searched against the genome. No significant similarity is found.
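The BLAT result rows can be screened programmatically. The Python sketch below parses rows in the column order shown in the results (QUERY, SCORE, query START and END, QSIZE, IDENTITY, CHROM, STRAND, genomic START and END, SPAN) and lists hits other than the full-length self match whose score reaches a chosen fraction of the query size. The 10% cutoff, the `BlatHit` class and the function names are assumptions for illustration; the report does not state the criterion used to call a similarity significant.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    query: str
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(row):
    """Parse one whitespace-separated result row into a BlatHit."""
    f = row.split()
    if f[:2] == ["browser", "details"]:  # drop link text if present
        f = f[2:]
    return BlatHit(f[0], int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                   float(f[5].rstrip("%")), f[6], f[7],
                   int(f[8]), int(f[9]), int(f[10]))


def secondary_hits(hits, min_fraction=0.1):
    """Hits that are not the full-length self match but score at least
    min_fraction of the query size (an assumed, adjustable cutoff)."""
    return [h for h in hits
            if h.score < h.q_size and h.score >= min_fraction * h.q_size]


# Rows copied from the upstream BLAT results.
rows = [
    "YourSeq 2000 1 2000 2000 100.0% chr3 - 57294789 57296788 2000",
    "YourSeq 27 913 951 2000 96.6% chr1 + 176019877 176019967 91",
    "YourSeq 21 1962 1982 2000 100.0% chr4 - 6976486 6976506 21",
    "YourSeq 21 740 761 2000 100.0% chr1 + 193867219 193867241 23",
]
hits = [parse_blat_row(r) for r in rows]
print(secondary_hits(hits))
```

Under this assumed cutoff the upstream rows leave no secondary hits, which agrees with the note that no significant similarity is found.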


Gene and information: Tm4sf1 transmembrane 4 superfamily member 1 [ Mus musculus (house mouse) ] Gene ID: 17112, updated on 24-Oct-2019

Gene summary

Official Symbol: Tm4sf1 (provided by MGI)
Official Full Name: transmembrane 4 superfamily member 1 (provided by MGI)
Primary source: MGI:MGI:104678
See related: Ensembl:ENSMUSG00000027800
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: L6; M3s1
Expression: Broad expression in lung adult (RPKM 43.9), heart adult (RPKM 24.6) and 19 other tissues
Orthologs: human

Genomic context

Location: 3; 3 D

Exon count: 10

Annotation release  Status             Assembly                     Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  3    NC_000069.6 (57285611..57387736, complement)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    3    NC_000069.5 (57090986..57105841, complement)

[Figure: genomic context, Chromosome 3 - NC_000069.6.]


Transcript information: This gene has 6 transcripts

Gene: Tm4sf1 ENSMUSG00000027800

Description: transmembrane 4 superfamily member 1 [Source:MGI Symbol;Acc:MGI:104678]
Gene Synonyms: 12A8 target antigen, L6, L6 antigen, M3s1
Location: Chromosome 3: 57,285,611-57,301,988 reverse strand. GRCm38:CM000996.2
About this gene: This gene has 6 transcripts (splice variants), 122 orthologues, 4 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name        Transcript ID          bp    Protein     Translation ID        Biotype          CCDS       UniProt     Flags
Tm4sf1-205  ENSMUST00000196979.4   3372  202aa       ENSMUSP00000143652.1  Protein coding   CCDS38433  Q64302      TSL:1 GENCODE basic APPRIS P1
Tm4sf1-202  ENSMUST00000171384.7   2855  202aa       ENSMUSP00000130999.1  Protein coding   CCDS38433  Q64302      TSL:1 GENCODE basic APPRIS P1
Tm4sf1-201  ENSMUST00000029376.12  1436  202aa       ENSMUSP00000029376.8  Protein coding   CCDS38433  Q64302      TSL:1 GENCODE basic APPRIS P1
Tm4sf1-203  ENSMUST00000196506.1   747   136aa       ENSMUSP00000143697.1  Protein coding   -          A0A0G2JGU1  CDS 3' incomplete TSL:2
Tm4sf1-204  ENSMUST00000196704.1   798   No protein  -                     Retained intron  -          -           TSL:3
Tm4sf1-206  ENSMUST00000198030.1   767   No protein  -                     lncRNA           -          -           TSL:3

[Figure: Ensembl genome browser view of a 36.38 kb region (57.28 Mb to 57.31 Mb) with contigs AC125103.4 and AC119854.7. Reverse-strand transcripts shown: Tm4sf1-202 (protein coding), Tm4sf1-205 (protein coding), Tm4sf1-201 (protein coding), Tm4sf1-206 (lncRNA), Tm4sf1-203 (protein coding), Tm4sf1-204 (retained intron). A Regulatory Build track is included. Regulation legend: CTCF, Enhancer, Open Chromatin, Promoter, Promoter Flank. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana) and non-protein coding (RNA gene; processed transcript).]


Transcript: ENSMUST00000196979

[Figure: protein view of Tm4sf1-205 (protein coding, reverse strand, 16.33 kb). Tracks: transmembrane helices, low complexity (Seg), Pfam L6 membrane, PANTHER PTHR14198:SF18 (L6 membrane), and sequence variants (dbSNP and all other sources). Variant legend: missense variant; synonymous variant. Scale bar: 0 to 202 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
