Same As Figure 3 but for Mouse Protein-Coding Genes
[Figure: average normalised counts per kilobase (y-axis) against gene length in kb (x-axis).]

Supplementary Figure 1 | Same as Figure 3 but for mouse protein-coding genes.

[Figure: barplots of the number of genes (y-axis) against the number of transcript isoforms (x-axis). Panels: PROTEIN-CODING (top), PSEUDOGENES (bottom).]

Supplementary Figure 2 | Olfactory receptors have several isoforms per gene. Barplots of the number of genes with the indicated number of different transcript isoforms. Genes have been split into protein-coding (top) and pseudogenes (bottom), and by species (human on the right, mouse on the left).

[Figure: transcript structures with splice junction read counts.
Panel A: Ifi47, Olfr56; supporting clones BC147437.1, BC141219.1, AK133471.1, AK53352.1, BB664859.1.
Panel B: Olfr115, Olfr116, Gm18213; supporting clone AA543264.1.
Panel C: Olfr139, Olfr399; supporting clone CB271224.1.]

Supplementary Figure 3 | Same as Figure 4 but for the additional mouse ORs that share a 5' UTR with a neighbouring gene. mRNA, EST or PacBio clones supporting splice junctions between the two genes are indicated above the corresponding transcript.

[Figure: transcript structures with splice junction read counts.
Panel A: OR7C1, OR7A5; supporting clones DA117551.1, DB071724.1, HY003914.1, DA444957.1.
Panel B: OR5V1, OR11A1, OR12D3; supporting clone AA936177.1.
Panel C: HBE1, OR51B5; supporting clones BC022184.1, BE789678.1, BE074228.1, Pacbio_capture_seq_hsK562, BM450021.1, BG496706.1, BF699214.1.]

Supplementary Figure 4 | Same as Figure 4 but for the additional human ORs that share a 5' UTR with a neighbouring gene. mRNA, EST or PacBio clones supporting splice junctions between the two genes are indicated above the corresponding transcript.
[Figure: transcript structures of Olfr680-ps1, Olfr681 and Olfr682-ps1 with splice junction read counts, followed by three alignments.]

Strain alignment around the indel:
CAACAAAGTAT--AAGAAGAGT   reference
CAACAAAGTAT--AAGAAGAGT   BALB/cJ
CAACAAAGTAT--AAGAAGAGT   CBA/J
CAACAAAGTATAGAAGAAGAGT   BUB/BnJ
CAACAAAGTATAGAAGAAGAGT   CAST/EiJ
CAACAAAGTATAGAAGAAGAGT   NOD/ShiLtJ
CAACAGAGTATAGAAGAAGAGT   SPRET/EiJ

Splice junction (codons on either side of the intron, whose ends are shown as gt and ag):
gtc atc ccc aag gt ag gtc ctg ctt atc   V I P K V L L I   Olfr682-ps1
gtc atc ctc aag gt ag gtc ctg ctt atc   V I L K V L L I   Olfr680-ps1
gtc att cct aag gt ag gtc ctg ctc atc   V I P K V L L I   Dog
gtc atc cct aag gt ag gtc ctg ttc atc   V I P K V L F I   Cow
gtc atc cct aag gt ag gtc ctg ctc atc   V I P K V L L I   Sheep

Protein alignment:
73 V I L C L T V I P K V L L I F W F N M K 93   Olfr682-ps1
73 V I L C L T V I L K V L L I F W F N M K 93   Olfr680-ps1
74 I I L C L T V I P K V L L I F W F N M K 94   Dog
74 V I L C L T V I P K V L F I F W F D M K 94   Cow
74 V I L C L T V I P K V L L I F W F D M K 94   Sheep

Supplementary Figure 5 | Additional example of a split OR gene. On chromosome 7, Olfr682-ps1 was annotated as a pseudogene, but we identified an open reading frame (ORF) spanning two exons that codes for a 311 aa protein. This gene is a polymorphic pseudogene that, in the reference genome, contains a frameshift in the C-terminal domain (purple transcript); however, several mouse strains contain a 2 bp indel at position 105,126,541 that restores the correct frame. The splice junction and protein sequence are conserved in several mammals, including dog, cow and sheep. Olfr682-ps1 has a close paralogue, Olfr680-ps1, which shares 97% identity at the protein level. Whereas Olfr680-ps1 lacks transcriptional evidence, we used the conservation with Olfr682-ps1 and other mammals to annotate a full-length split transcript structure.
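The strain alignment in Supplementary Figure 5 can be read programmatically. The following is a minimal Python sketch, not the authors' pipeline: it takes the aligned sequences exactly as printed in the figure, lists the bases each strain carries at the reference gap columns, and checks whether that length is a multiple of three. The function names (`inserted_bases`, `shifts_frame`, `summarise`) are hypothetical and introduced only for illustration.

```python
# Aligned sequences as printed in Supplementary Figure 5 ("-" marks a gap).
ALIGNMENT = {
    "reference":  "CAACAAAGTAT--AAGAAGAGT",
    "BALB/cJ":    "CAACAAAGTAT--AAGAAGAGT",
    "CBA/J":      "CAACAAAGTAT--AAGAAGAGT",
    "BUB/BnJ":    "CAACAAAGTATAGAAGAAGAGT",
    "CAST/EiJ":   "CAACAAAGTATAGAAGAAGAGT",
    "NOD/ShiLtJ": "CAACAAAGTATAGAAGAAGAGT",
    "SPRET/EiJ":  "CAACAGAGTATAGAAGAAGAGT",
}


def inserted_bases(ref_aln: str, strain_aln: str) -> str:
    """Bases present in the strain at columns where the reference has a gap."""
    if len(ref_aln) != len(strain_aln):
        raise ValueError("aligned sequences must have equal length")
    return "".join(s for r, s in zip(ref_aln, strain_aln) if r == "-" and s != "-")


def shifts_frame(indel_length: int) -> bool:
    """An indel changes the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def summarise(alignment: dict, ref_name: str = "reference") -> dict:
    """Map each non-reference strain to the bases it has at reference gaps."""
    ref = alignment[ref_name]
    return {
        name: inserted_bases(ref, seq)
        for name, seq in alignment.items()
        if name != ref_name
    }


if __name__ == "__main__":
    for strain, bases in summarise(ALIGNMENT).items():
        if bases:
            print(f"{strain}: +{len(bases)} bp ({bases}), "
                  f"frame-changing: {shifts_frame(len(bases))}")
        else:
            print(f"{strain}: same as reference at the gap")
```

The check only shows that a 2 bp difference changes the reading frame relative to the reference. That this change restores the correct frame, rather than disrupting it, comes from the caption and is not something the sketch verifies.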