https://www.alphaknockout.com

Mouse Map4 Knockout Project (CRISPR/Cas9)

Objective: To create a Map4 knockout mouse model (C57BL/6N) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Map4 gene (NCBI Reference Sequence: NM_001205330; Ensembl: ENSMUSG00000032479) is located on mouse chromosome 9. 19 exons are identified, with the ATG start codon in exon 2 and the TAA stop codon in exon 19 (Transcript: ENSMUST00000035055). Exons 4~13 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis. Note: Mice homozygous for a gene-trapped allele are viable and do not display any overt phenotypic abnormalities.

Exon 4 starts at about 8.68% of the coding region. Exons 4~13 cover 73.66% of the coding region. The size of the effective KO region is ~42885 bp. The KO region does not contain any other known gene.

Overview of the Targeting Strategy

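[Figure: Wildtype allele, drawn 5' to 3', showing exons 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 19 of mouse Map4, with one gRNA region on each side of the knockout region. Legend: exon of mouse Map4; knockout region.]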

Overview of the Dot Plot (up) Window size: 15 bp

[Figure: self dot plot. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 2000 bp section upstream of Exon 4 is aligned with itself to determine if there are tandem repeats. Tandem repeats are found in the dot plot matrix. The gRNA site is selected outside of these tandem repeats.
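The self-alignment described in this note can be sketched in a few lines. The following is a minimal illustration, not the tool used to produce the report: it assumes exact 15-mer matching, and the function names `reverse_complement` and `self_dot_plot` are hypothetical. Off-diagonal forward dots that line up at a constant offset suggest tandem repeats.

```python
from collections import defaultdict

# Complement table for unambiguous DNA bases (assumption: input is A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def self_dot_plot(seq: str, window: int = 15):
    """Compare a sequence with itself using exact matches of `window` bases.

    Returns two sorted lists of (i, j) pairs with i < j:
      forward: the window at i is identical to the window at j
      reverse: the window at i is the reverse complement of the window at j
    The trivial main diagonal (i == j) is left out.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - window + 1):
        index[seq[i:i + window]].append(i)

    forward, reverse = [], []
    for kmer, positions in index.items():
        # Every pair of positions sharing the same window is a forward dot.
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                forward.append((positions[a], positions[b]))
        # Windows equal to the reverse complement give reverse-complement dots.
        for i in positions:
            for j in index.get(reverse_complement(kmer), []):
                if i < j:
                    reverse.append((i, j))
    return sorted(forward), sorted(reverse)
```

For a 2000 bp flank, `forward` would typically be empty or sparse; a run of dots sharing the same `j - i` offset is the pattern a tandem repeat produces, and a gRNA site would be placed outside the positions involved.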

Overview of the Dot Plot (down) Window size: 15 bp

[Figure: self dot plot. Legend: Forward; Reverse Complement. Axis label: Sequence 12.]

Note: The 2000 bp section downstream of Exon 13 is aligned with itself to determine if there are tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.

Overview of the GC Content Distribution (up) Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(23.15% 463) | C(22.6% 452) | T(33.7% 674) | G(20.55% 411)

Note: The 2000 bp section upstream of Exon 4 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
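A windowed GC scan of this kind can be sketched as follows. This is only an illustration under stated assumptions: the report gives the 300 bp window size but not the step size or the cutoff for a "high GC-content region", so `step` and the `threshold` value of 0.65 below are hypothetical, as are the function names.

```python
def gc_windows(seq: str, window: int = 300, step: int = 1):
    """Return (start, gc_fraction) for each window of `window` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results


def high_gc_windows(seq: str, window: int = 300, step: int = 1,
                    threshold: float = 0.65):
    """Return the windows whose GC fraction is at or above `threshold`.

    The threshold is an assumed cutoff for illustration; the report does
    not state the criterion it applies.
    """
    return [(s, f) for s, f in gc_windows(seq, window, step) if f >= threshold]
```

An empty result from `high_gc_windows` for a flanking sequence would correspond to the "no significant high GC-content region" outcome stated in the note.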

Overview of the GC Content Distribution (down) Window size: 300 bp

[Figure: GC content distribution plot. Axis label: Sequence 12.]

Summary: Full Length(2000bp) | A(26.25% 525) | C(20.7% 414) | T(29.05% 581) | G(24.0% 480)

Note: The 2000 bp section downstream of Exon 13 is analyzed to determine the GC content. No significant high GC-content region is found, so this region is suitable for PCR screening or sequencing analysis.
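The overall GC content of each flank follows directly from the base counts in the two Summary lines. The short check below uses only those counts; the helper name `gc_percent` is hypothetical.

```python
def gc_percent(counts: dict) -> float:
    """Overall GC content in percent from per-base counts."""
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


# Base counts copied from the Summary lines of the report.
upstream = {"A": 463, "C": 452, "T": 674, "G": 411}
downstream = {"A": 525, "C": 414, "T": 581, "G": 480}

for name, counts in (("upstream", upstream), ("downstream", downstream)):
    print(f"{name}: {sum(counts.values())} bp, GC = {gc_percent(counts):.2f}%")
```

Both sets of counts sum to 2000 bp, giving an overall GC content of 43.15% for the upstream section and 44.70% for the downstream section.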

BLAT Search Results (up)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr9   +       110024102  110026101  2000
YourSeq  349    237     1317  2000   85.9%     chrX   -       20541133   20542091   959
YourSeq  347    37      926   2000   86.3%     chr8   +       51246781   51247629   849
YourSeq  345    336     926   2000   90.5%     chr6   -       21755730   21756337   608
YourSeq  345    322     926   2000   89.9%     chr19  +       22346392   22347020   629
YourSeq  343    89      926   2000   85.0%     chr7   -       26802262   26802917   656
YourSeq  342    336     921   2000   87.9%     chr8   -       42429966   42430546   581
YourSeq  342    288     926   2000   90.4%     chr2   -       84097880   84098548   669
YourSeq  342    322     926   2000   86.4%     chr11  -       3944722    3945289    568
YourSeq  342    296     920   2000   90.2%     chr10  +       87664493   87665117   625
YourSeq  340    46      926   2000   87.7%     chrX   +       138897964  138898948  985
YourSeq  336    341     909   2000   87.8%     chr5   -       20419561   20420129   569
YourSeq  336    336     915   2000   92.5%     chr3   -       84932778   84933365   588
YourSeq  335    341     926   2000   87.5%     chr5   -       63612387   63612963   577
YourSeq  335    322     921   2000   89.6%     chr11  +       105321572  105322195  624
YourSeq  334    288     926   2000   87.9%     chr9   -       82323270   82323914   645
YourSeq  334    321     921   2000   87.4%     chr11  -       87300325   87300943   619
YourSeq  333    344     915   2000   90.8%     chr1   -       96919810   96920395   586
YourSeq  333    341     926   2000   90.2%     chrX   +       111764419  111765022  604
YourSeq  332    94      926   2000   91.2%     chrX   +       161202357  161312706  110350

Note: The 2000 bp section upstream of Exon 4 is BLAT searched against the genome. No significant similarity is found.
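Rows of the BLAT listing can be read programmatically for further screening. The sketch below is an assumption-laden illustration rather than part of the original workflow: it takes the last ten whitespace-separated fields of a row (score, query start/end, query size, identity, chromosome, strand, target start/end, span), and the names `BlatHit`, `parse_hit` and `query_coverage` are hypothetical.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float   # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_hit(line: str) -> BlatHit:
    """Parse one result row; leading labels and the query name are ignored."""
    f = line.split()[-10:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def query_coverage(hit: BlatHit) -> float:
    """Fraction of the query sequence covered by the hit (1-based, inclusive)."""
    return (hit.q_end - hit.q_start + 1) / hit.q_size


# Two rows copied from the upstream BLAT listing.
self_hit = parse_hit("YourSeq 2000 1 2000 2000 100.0% chr9 + 110024102 110026101 2000")
other = parse_hit("YourSeq 349 237 1317 2000 85.9% chrX - 20541133 20542091 959")
print(self_hit.chrom, query_coverage(self_hit))
print(other.chrom, other.identity, query_coverage(other))
```

The first row is the query matching its own locus on chr9 over its full length; the remaining rows are partial matches elsewhere in the genome, which can be ranked by score, identity or query coverage as needed.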

BLAT Search Results (down)

QUERY    SCORE  QSTART  QEND  QSIZE  IDENTITY  CHROM  STRAND  TSTART     TEND       SPAN
-----------------------------------------------------------------------------------------
YourSeq  2000   1       2000  2000   100.0%    chr9   +       110068987  110070986  2000
YourSeq  142    239     440   2000   93.9%     chr1   +       94803428   94803953   526
YourSeq  134    236     609   2000   92.0%     chr10  +       117613665  117614144  480
YourSeq  131    238     386   2000   94.6%     chr16  -       13735401   13735555   155
YourSeq  130    239     386   2000   94.6%     chr5   -       31273630   31273783   154
YourSeq  129    236     387   2000   93.5%     chr10  +       75914463   75914628   166
YourSeq  128    207     384   2000   86.5%     chr16  -       33339237   33339400   164
YourSeq  127    238     386   2000   93.3%     chrX   -       12833881   12834228   348
YourSeq  127    238     386   2000   93.3%     chr15  -       84507757   84507912   156
YourSeq  125    238     386   2000   92.6%     chr8   -       58439802   58439956   155
YourSeq  125    237     386   2000   92.7%     chr4   -       58400431   58400590   160
YourSeq  125    220     378   2000   93.2%     chrX   +       104186817  104187317  501
YourSeq  125    237     387   2000   92.0%     chr4   +       54481151   54481330   180
YourSeq  124    242     396   2000   91.4%     chr15  +       8488231    8488389    159
YourSeq  123    226     386   2000   89.6%     chrX   -       151099079  151099240  162
YourSeq  123    239     384   2000   93.2%     chr19  -       10372407   10372560   154
YourSeq  123    228     383   2000   91.5%     chr1   -       68311739   68311901   163
YourSeq  123    239     383   2000   93.2%     chr9   +       71370706   71370864   159
YourSeq  123    239     383   2000   93.2%     chr1   +       157406153  157406304  152
YourSeq  122    239     386   2000   91.9%     chr14  -       67694782   67694935   154

Note: The 2000 bp section downstream of Exon 13 is BLAT searched against the genome. No significant similarity is found.
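The self-hits in the two BLAT listings also allow a quick consistency check on the stated size of the effective KO region. The sketch below assumes that the coordinates are 1-based and inclusive and that the two 2000 bp sections lie immediately next to the KO region; neither assumption is stated in the report, and the helper names are hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_between(upstream_end: int, downstream_start: int) -> int:
    """Number of bases strictly between two flanking intervals."""
    return downstream_start - upstream_end - 1


# chr9 coordinates of the 100% self-hits in the two BLAT listings.
UPSTREAM_FLANK = (110024102, 110026101)
DOWNSTREAM_FLANK = (110068987, 110070986)

print(interval_length(*UPSTREAM_FLANK))
print(interval_length(*DOWNSTREAM_FLANK))
print(gap_between(UPSTREAM_FLANK[1], DOWNSTREAM_FLANK[0]))
```

Under these assumptions each flank is 2000 bp long and the interval between them is 42885 bp, which agrees with the effective KO region size given in the strategy summary.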

Gene information: Map4 microtubule-associated protein 4 [ Mus musculus (house mouse) ] Gene ID: 17758, updated on 14-Aug-2019

Gene summary

Official Symbol: Map4 (provided by MGI)
Official Full Name: microtubule-associated protein 4 (provided by MGI)
Primary source: MGI:MGI:97178
See related: Ensembl:ENSMUSG00000032479
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: MAP-4; Mtap4; Mtap-4; AA407148
Expression: Ubiquitous expression in cerebellum adult (RPKM 29.3), frontal lobe adult (RPKM 18.1) and 28 other tissues
Orthologs: human; all

Genomic context

Location: 9 F2; 9 59.83 cM
Exon count: 24

Annotation release | Status | Assembly | Chr | Location
108 | current | GRCm38.p6 (GCF_000001635.26) | 9 | NC_000075.6 (109929841..110083954)
Build 37.2 | previous assembly | MGSCv37 (GCF_000001635.18) | 9 | NC_000075.5 (109834278..109986451)

Chromosome 9 - NC_000075.6

Transcript information: This gene has 17 transcripts

Gene: Map4 ENSMUSG00000032479

Description: microtubule-associated protein 4 [Source:MGI Symbol;Acc:MGI:97178]
Gene Synonyms: MAP 4, Mtap4
Location: Chromosome 9: 109,931,460-110,083,955 forward strand. GRCm38:CM001002.2
About this gene: This gene has 17 transcripts (splice variants), 224 orthologues, 2 paralogues, is a member of 2 Ensembl protein families and is associated with 1 phenotype.

Transcripts

Name | Transcript ID | bp | Protein | Translation ID | Biotype | CCDS | UniProt | Flags
Map4-201 | ENSMUST00000035055.14 | 6091 | 1125aa | ENSMUSP00000035055.10 | Protein coding | CCDS57704 | P27546 | TSL:1, GENCODE basic
Map4-204 | ENSMUST00000164930.7 | 4925 | 933aa | ENSMUSP00000131285.3 | Protein coding | CCDS81082 | E9PZ43 | TSL:1, GENCODE basic, APPRIS P1
Map4-214 | ENSMUST00000199498.4 | 4055 | 902aa | ENSMUSP00000142439.1 | Protein coding | CCDS81083 | A0A0G2JDN7 | TSL:1, GENCODE basic
Map4-206 | ENSMUST00000169851.7 | 3446 | 99aa | ENSMUSP00000131660.2 | Protein coding | CCDS57703 | Q78TF3 | TSL:1, GENCODE basic
Map4-205 | ENSMUST00000165876.7 | 5573 | 1124aa | ENSMUSP00000132662.3 | Protein coding | - | P27546 | TSL:5, GENCODE basic
Map4-202 | ENSMUST00000163190.7 | 4322 | 1441aa | ENSMUSP00000143171.1 | Protein coding | - | A0A0G2JFH2 | CDS 5' and 3' incomplete, TSL:5
Map4-203 | ENSMUST00000163979.6 | 3320 | 414aa | ENSMUSP00000129362.3 | Protein coding | - | A0A140T8T5 | TSL:1, GENCODE basic
Map4-215 | ENSMUST00000199548.4 | 1383 | 290aa | ENSMUSP00000143408.1 | Protein coding | - | A0A0G2JG35 | CDS 5' incomplete, TSL:2
Map4-213 | ENSMUST00000199461.4 | 1176 | 276aa | ENSMUSP00000143296.1 | Protein coding | - | A0A0G2JFT4 | CDS 5' incomplete, TSL:5
Map4-211 | ENSMUST00000198511.4 | 825 | 220aa | ENSMUSP00000142558.1 | Protein coding | - | A0A0G2JDY5 | CDS 5' incomplete, TSL:2
Map4-217 | ENSMUST00000200480.1 | 758 | 116aa | ENSMUSP00000142501.1 | Protein coding | - | A0A0G2JDU1 | CDS 5' incomplete, TSL:3
Map4-216 | ENSMUST00000199985.1 | 754 | 81aa | ENSMUSP00000142640.1 | Protein coding | - | A0A0G2JE57 | CDS 5' incomplete, TSL:3
Map4-212 | ENSMUST00000199161.4 | 693 | 207aa | ENSMUSP00000143205.1 | Protein coding | - | A0A0G2JFK3 | CDS 5' incomplete, TSL:3
Map4-208 | ENSMUST00000196763.1 | 740 | No protein | - | Retained intron | - | - | TSL:3
Map4-209 | ENSMUST00000197289.1 | 940 | No protein | - | lncRNA | - | - | TSL:3
Map4-210 | ENSMUST00000198185.1 | 829 | No protein | - | lncRNA | - | - | TSL:5
Map4-207 | ENSMUST00000196729.1 | 442 | No protein | - | lncRNA | - | - | TSL:3

[Figure: Ensembl region view of chromosome 9, 109.94 Mb to 110.08 Mb (172.50 kb), forward and reverse strands. Forward-strand gene track (comprehensive set): Map4-201 to Map4-217 (protein coding, lncRNA and retained intron transcripts as listed in the table above), Gm42433-201 (TEC), Gm42432-201 (TEC) and Gm4734-201 (processed pseudogene). Contig: AC161246.2. Reverse-strand gene track: Gm4644-201 (processed pseudogene), Gm43732-201 (lncRNA) and Dhx30 transcripts (protein coding, retained intron, nonsense mediated decay). Regulatory Build track legend: CTCF; Enhancer; Open Chromatin; Promoter; Promoter Flank; Transcription Factor Binding Site. Gene legend: protein coding (Ensembl protein coding; merged Ensembl/Havana); non-protein coding (RNA gene; pseudogene; processed transcript).]

Transcript: ENSMUST00000035055

[Figure: Ensembl protein view for Map4-201 (protein coding; 152.50 kb, forward strand), protein ENSMUSP00000035... Tracks: MobiDB lite; Low complexity (Seg); Coiled-coils (Ncoils); Pfam, PROSITE profiles and PROSITE patterns (Microtubule associated protein, tubulin-binding repeat); PANTHER PTHR11501:SF16 (Microtubule associated protein MAP2/MAP4/Tau); sequence variants (dbSNP and all other sources). Variant legend: inframe insertion; missense variant; synonymous variant. Scale bar: 0 to 1125 amino acids.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
