https://www.alphaknockout.com

Mouse Gpbp1l1 Knockout Project (CRISPR/Cas9)

Objective: To create a Gpbp1l1 knockout mouse model (C57BL/6J) by CRISPR/Cas-mediated genome engineering.

Strategy summary: The Gpbp1l1 gene (NCBI Reference Sequence: NM_029868; Ensembl: ENSMUSG00000034042) is located on mouse chromosome 4. Twelve exons are identified, with the ATG start codon in exon 3 and the TAG stop codon in exon 12 (Transcript: ENSMUST00000030460). Exons 3~5 will be selected as the target site. Cas9 and gRNA will be co-injected into fertilized eggs for KO mouse production. The pups will be genotyped by PCR followed by sequencing analysis.

Note: The coding region starts in exon 3. Exons 3~5 cover 32.98% of the coding region. The size of the effective KO region is ~3744 bp. The KO region does not contain any other known gene.


Overview of the Targeting Strategy

[Figure: wildtype allele drawn 5' to 3', with two gRNA regions marked and exons 1, 3, 4, 5 and 12 labeled. Legend: exon of mouse Gpbp1l1; knockout region.]


Overview of the Dot Plot (up)

Window size: 15 bp

[Figure: self-alignment dot plot of Sequence 12, showing forward and reverse-complement matches.]

Note: The 689 bp section upstream of Exon 3 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.
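The self-alignment described in the note can be illustrated with a minimal sketch. It assumes exact word matching with the stated 15 bp window; the report does not say which tool or scoring rule produced the plot, and the function name `self_dot_plot` is hypothetical.

```python
def self_dot_plot(seq, window=15):
    """Return (i, j, strand) triples for exact window-length matches.

    strand '+' : the word at i equals the word at j (forward match)
    strand '-' : the word at i equals the reverse complement of the word at j
    The main diagonal (i == j on the forward strand) is skipped.
    """
    seq = seq.upper()
    complement = str.maketrans("ACGT", "TGCA")

    # Index every window-length word by its start positions.
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)

    hits = []
    for j in range(len(seq) - window + 1):
        word = seq[j:j + window]
        for i in index.get(word, []):
            if i != j:
                hits.append((i, j, "+"))
        reverse_complement = word.translate(complement)[::-1]
        for i in index.get(reverse_complement, []):
            hits.append((i, j, "-"))
    return hits


# Toy example (not the report's sequence): a 7 bp unit repeated in tandem.
toy = "GATTACAGATTACA"
for hit in self_dot_plot(toy, window=5):
    print(hit)
```

A tandem repeat shows up as a run of '+' hits parallel to the main diagonal, offset by the repeat unit length (7 in the toy example). A sequence with no such runs would correspond to the "no significant tandem repeat" conclusion above.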

Overview of the Dot Plot (down)

Window size: 15 bp

[Figure: self-alignment dot plot of Sequence 12, showing forward and reverse-complement matches.]

Note: The 2000 bp section downstream of Exon 5 is aligned with itself to determine whether it contains tandem repeats. No significant tandem repeat is found in the dot plot matrix, so this region is suitable for PCR screening or sequencing analysis.


Overview of the GC Content Distribution (up)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(689bp) | A(29.03% 200) | C(15.82% 109) | T(37.01% 255) | G(18.14% 125)

Note: The 689 bp section upstream of Exon 3 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.
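As a rough check of the figures above, the sketch below derives the overall GC percentage from the base counts in the Summary line and shows one way a sliding-window GC profile could be computed. The 300 bp window comes from the report. The step size, the function names, and the toy sequence are assumptions for illustration only.

```python
def gc_percent_from_counts(counts):
    """Overall GC percentage from a dict of base counts."""
    total = sum(counts.values())
    return round(100.0 * (counts["G"] + counts["C"]) / total, 2)


def gc_windows(seq, window=300, step=1):
    """GC percentage of each full window along seq (no partial windows)."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append(100.0 * gc / window)
    return profile


# Base counts copied from the upstream Summary line (689 bp in total).
upstream = {"A": 200, "C": 109, "T": 255, "G": 125}
print(sum(upstream.values()))            # 689
print(gc_percent_from_counts(upstream))  # 33.96

# Toy sequence only; the report does not include the actual 689 bp sequence.
print(gc_windows("GGCCAATT", window=4, step=2))  # [100.0, 50.0, 0.0]
```

The counts sum to the stated 689 bp and give an overall GC content of about 34%, which is consistent with the note that no high-GC region is present.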

Overview of the GC Content Distribution (down)

Window size: 300 bp

[Figure: GC content distribution along Sequence 12.]

Summary: Full Length(2000bp) | A(30.85% 617) | C(18.75% 375) | T(31.9% 638) | G(18.5% 370)

Note: The 2000 bp section downstream of Exon 5 is analyzed to determine the GC content. No region of significantly high GC content is found, so this region is suitable for PCR screening or sequencing analysis.


BLAT Search Results (up)

QUERY    SCORE  START  END  QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
---------------------------------------------------------------------------------------
YourSeq  689    1      689  689    100.0%    chr4   +       116570236  116570924  689
YourSeq  73     223    451  689    91.1%     chr14  -       57832305   57832627   323
YourSeq  69     206    365  689    87.1%     chr9   -       28843037   28843190   154
YourSeq  69     223    372  689    89.9%     chr12  +       27433381   27433557   177
YourSeq  68     162    302  689    92.6%     chr6   +       60983118   60983263   146
YourSeq  62     93     300  689    74.3%     chr4   -       136058204  136058327  124
YourSeq  61     206    302  689    93.0%     chr8   -       88194505   88194617   113
YourSeq  61     223    301  689    93.0%     chr5   -       119478250  119478338  89
YourSeq  61     187    301  689    83.6%     chrX   +       92785423   92785524   102
YourSeq  60     223    301  689    89.5%     chr14  -       10233998   10234089   92
YourSeq  59     216    302  689    90.5%     chr12  +       35183275   35183362   88
YourSeq  58     225    302  689    91.5%     chr1   -       136708595  136708683  89
YourSeq  57     206    300  689    93.9%     chrX   -       47020965   47021068   104
YourSeq  57     223    298  689    92.7%     chr10  -       112366531  112366623  93
YourSeq  57     223    305  689    91.4%     chrX   +       153411452  153411552  101
YourSeq  57     223    302  689    88.0%     chr10  +       68872743   68872833   91
YourSeq  55     223    298  689    93.7%     chr5   -       110241710  110241799  90
YourSeq  55     635    689  689    100.0%    chr15  -       42032742   42032796   55
YourSeq  54     223    302  689    86.5%     chr11  +       42125252   42125341   90
YourSeq  53     635    689  689    98.2%     chr1   -       4693424    4693478    55

Note: The 689 bp section upstream of Exon 3 is BLAT searched against the genome. Apart from the query's own locus on chr4, no significant similarity is found.
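To make the reading of the BLAT table concrete, here is a small sketch that parses result rows and separates the query's own locus from the remaining hits. The report gives no numeric threshold for "significant" similarity, so the sketch only reports the best remaining score relative to the query size. The names `BlatHit`, `parse_blat_row` and `is_self_hit` are hypothetical, and only the first three rows of the table are used.

```python
from typing import NamedTuple


class BlatHit(NamedTuple):
    score: int
    q_start: int
    q_end: int
    q_size: int
    identity: float  # percent identity
    chrom: str
    strand: str
    t_start: int
    t_end: int
    span: int


def parse_blat_row(line):
    # The last ten whitespace-separated tokens are the data columns; anything
    # before them (link text such as "browser details", the query name) is ignored.
    f = line.split()[-10:]
    return BlatHit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   float(f[4].rstrip("%")), f[5], f[6],
                   int(f[7]), int(f[8]), int(f[9]))


def is_self_hit(hit):
    # Assumption: a full-length, 100% identical hit is the query's own locus.
    return hit.identity == 100.0 and hit.q_start == 1 and hit.q_end == hit.q_size


ROWS = """\
YourSeq 689 1 689 689 100.0% chr4 + 116570236 116570924 689
YourSeq 73 223 451 689 91.1% chr14 - 57832305 57832627 323
YourSeq 69 206 365 689 87.1% chr9 - 28843037 28843190 154
"""

hits = [parse_blat_row(row) for row in ROWS.splitlines() if row.strip()]
off_target = [h for h in hits if not is_self_hit(h)]
best = max(off_target, key=lambda h: h.score)
print(best.chrom, best.score, f"{100 * best.score / best.q_size:.1f}% of query size")
```

For these rows the best hit outside the chr4 locus scores 73 against a 689 bp query, roughly a tenth of the self-hit score.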

BLAT Search Results (down)

QUERY    SCORE  START  END   QSIZE  IDENTITY  CHROM  STRAND  START      END        SPAN
----------------------------------------------------------------------------------------
YourSeq  2000   1      2000  2000   100.0%    chr4   +       116574616  116576615  2000
YourSeq  129    525    671   2000   94.6%     chr13  -       59699614   59699975   362
YourSeq  128    510    684   2000   92.7%     chr5   -       100709547  100709746  200
YourSeq  128    514    675   2000   86.8%     chr1   +       24465277   24465434   158
YourSeq  118    532    686   2000   92.3%     chr6   +       121126290  121126645  356
YourSeq  113    526    716   2000   92.5%     chr2   -       21162834   21163047   214
YourSeq  112    557    1085  2000   78.9%     chr13  -       90877131   90877367   237
YourSeq  111    532    686   2000   92.4%     chr4   -       132798717  132798902  186
YourSeq  109    511    651   2000   90.0%     chr3   -       107652916  107653060  145
YourSeq  109    525    651   2000   95.1%     chr1   +       34527172   34527299   128
YourSeq  106    549    1074  2000   75.9%     chr4   +       120917171  120917461  291
YourSeq  104    532    692   2000   94.1%     chr9   -       80112542   80112831   290
YourSeq  103    105    651   2000   78.9%     chrX   -       101107281  101107593  313
YourSeq  93     535    649   2000   95.2%     chr9   +       57662526   57662642   117
YourSeq  93     553    675   2000   90.6%     chr19  +       11064282   11064428   147
YourSeq  93     532    681   2000   83.4%     chr17  +       94465432   94465553   122
YourSeq  92     535    649   2000   94.3%     chr3   -       154216180  154216296  117
YourSeq  91     535    650   2000   96.0%     chr19  +       57143002   57143119   118
YourSeq  90     536    649   2000   94.2%     chrX   -       140396440  140396554  115
YourSeq  89     535    652   2000   92.4%     chr2   -       32546556   32546674   119

Note: The 2000 bp section downstream of Exon 5 is BLAT searched against the genome. Apart from the query's own locus on chr4, no significant similarity is found.


Gene and information: Gpbp1l1 GC-rich promoter binding protein 1-like 1 [Mus musculus] Gene ID: 77110, updated on 24-Oct-2019

Gene summary

Official Symbol: Gpbp1l1 (provided by MGI)
Official Full Name: GC-rich promoter binding protein 1-like 1 (provided by MGI)
Primary source: MGI:MGI:1924360
See related: Ensembl:ENSMUSG00000034042
Gene type: protein coding
RefSeq status: VALIDATED
Organism: Mus musculus
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
Also known as: BC002292; 5330440M15Rik
Expression: Ubiquitous expression in large intestine adult (RPKM 15.8), thymus adult (RPKM 14.8) and 28 other tissues
Orthologs: human, all

Genomic context

Location: 4; 4 D1
Exon count: 14

Annotation release  Status             Assembly                      Chr  Location
108                 current            GRCm38.p6 (GCF_000001635.26)  4    NC_000070.6 (116557179..116593902)
Build 37.2          previous assembly  MGSCv37 (GCF_000001635.18)    4    NC_000070.5 (116230332..116266487)

Chromosome 4 - NC_000070.6


Transcript information: This gene has 4 transcripts

Gene: Gpbp1l1 ENSMUSG00000034042

Description: GC-rich promoter binding protein 1-like 1 [Source:MGI Symbol;Acc:MGI:1924360]
Gene Synonyms: 5330440M15Rik
Location: Chromosome 4: 116,557,658-116,593,882 forward strand. GRCm38:CM000997.2
About this gene: This gene has 4 transcripts (splice variants), 215 orthologues, 2 paralogues and is a member of 1 Ensembl protein family.

Transcripts

Name         Transcript ID          bp    Protein     Translation ID        Biotype         CCDS       UniProt  Flags
Gpbp1l1-201  ENSMUST00000030460.14  3588  473aa       ENSMUSP00000030460.8  Protein coding  CCDS18511  Q6NZP2   TSL:1 GENCODE basic APPRIS P1
Gpbp1l1-202  ENSMUST00000106475.1   2642  473aa       ENSMUSP00000102083.1  Protein coding  CCDS18511  Q6NZP2   TSL:1 GENCODE basic APPRIS P1
Gpbp1l1-203  ENSMUST00000131913.1   714   No protein  -                     lncRNA          -          -        TSL:3
Gpbp1l1-204  ENSMUST00000138837.1   448   No protein  -                     lncRNA          -          -        TSL:3


[Figure: Ensembl region view of chromosome 4, 116.55Mb to 116.60Mb (56.23 kb), with contig AL669953.7 and the Regulatory Build track.
Forward strand genes (Comprehensive set...): Gm12953-201 (lncRNA); Gpbp1l1-201, Gpbp1l1-202 (protein coding); Gpbp1l1-203, Gpbp1l1-204 (lncRNA); Ccdc17-201 (protein coding); Ccdc17-202, Ccdc17-203, Ccdc17-204, Ccdc17-205 (lncRNA).
Reverse strand genes (Comprehensive set...): Tmem69-201, Tmem69-202 (protein coding); Tmem69-203 (lncRNA); C530005A16Rik-201 (lncRNA); Nasp-201, Nasp-202, Nasp-203 (protein coding); Nasp-204, Nasp-208 (lncRNA).
Regulation legend: CTCF, Open Chromatin, Promoter, Promoter Flank.
Gene legend: Protein Coding (Ensembl protein coding; merged Ensembl/Havana); Non-Protein Coding (RNA gene).]


Transcript: ENSMUST00000030460

[Figure: Gpbp1l1-201 (protein coding), 36.23 kb, forward strand.
Protein feature tracks for ENSMUSP00000030...: MobiDB lite; Low complexity (Seg); Pfam: Vasculin family; PANTHER: Vasculin family, PTHR14339:SF10.
All sequence SNPs/i...: Sequence variants (dbSNP and all other sources).
Variant legend: missense variant, splice region variant, synonymous variant.
Scale bar: 0 to 473.]

We wish to acknowledge the following valuable scientific information resources: Ensembl, MGI, NCBI, UCSC.
