[Figure: y-axis, average normalised counts per kilobase; x-axis, length (kb).]

Supplementary Figure 1 | Same as Figure 3 but for mouse -coding .

[Figure: panels PROTEIN-CODING (top) and PSEUDOGENES (bottom); x-axis, number of transcript isoforms.]

Supplementary Figure 2 | Olfactory receptors have several isoforms per gene. Barplots of the number of genes with the indicated number of different transcript isoforms. Genes have been split into protein-coding (top) and pseudogenes (bottom), and by species (human on the right, mouse on the left).

[Figure: panel A, Ifi47 and Olfr56, with clones BC147437.1, BC141219.1, AK133471.1, AK53352.1 and BB664859.1; panel B, Olfr115, Olfr116 and Gm18213, with clone AA543264.1; panel C, Olfr139 and Olfr399, with clone CB271224.1.]

Supplementary Figure 3 | Same as Figure 4 but for the additional mouse ORs that share a 5’ UTR with a neighbouring gene. mRNA, EST or PacBio clones supporting splice junctions between the two genes are indicated above the corresponding transcript.

[Figure: panel A, OR7C1 and OR7A5, with clones DA117551.1, DB071724.1, HY003914.1 and DA444957.1; panel B, OR5V1, OR11A1 and OR12D3, with clone AA936177.1; panel C, HBE1 and OR51B5, with clones BC022184.1, BE789678.1, BE074228.1, Pacbio_capture_seq_hsK562, BM450021.1, BG496706.1 and BF699214.1.]

Supplementary Figure 4 | Same as Figure 4 but for the additional human ORs that share a 5’ UTR with a neighbouring gene. mRNA, EST or PacBio clones supporting splice junctions between the two genes are indicated above the corresponding transcript.

[Figure: gene track showing Olfr680-ps1, Olfr681 and Olfr682-ps1, followed by three alignment panels.]

Strain alignment:

reference     CAACAAAGTAT--AAGAAGAGT
BALB/cJ       CAACAAAGTAT--AAGAAGAGT
CBA/J         CAACAAAGTAT--AAGAAGAGT
BUB/BnJ       CAACAAAGTATAGAAGAAGAGT
CAST/EiJ      CAACAAAGTATAGAAGAAGAGT
NOD/ShiLtJ    CAACAAAGTATAGAAGAAGAGT
SPRET/EiJ     CAACAGAGTATAGAAGAAGAGT

Splice junction alignment (nucleotides, then translation):

Olfr682-ps1   gtc atc ccc aag gt ag gtc ctg ctt atc   V I P K V L L I
Olfr680-ps1   gtc atc ctc aag gt ag gtc ctg ctt atc   V I L K V L L I
Dog           gtc att cct aag gt ag gtc ctg ctc atc   V I P K V L L I
Cow           gtc atc cct aag gt ag gtc ctg ttc atc   V I P K V L F I
Sheep         gtc atc cct aag gt ag gtc ctg ctc atc   V I P K V L L I

Protein alignment:

Olfr682-ps1   73 V I L C L T V I P K V L L I F W F N M K 93
Olfr680-ps1   73 V I L C L T V I L K V L L I F W F N M K 93
Dog           74 I I L C L T V I P K V L L I F W F N M K 94
Cow           74 V I L C L T V I P K V L F I F W F D M K 94
Sheep         74 V I L C L T V I P K V L L I F W F D M K 94

Supplementary Figure 5 | Additional example of a split OR gene. On chromosome 7, Olfr682-ps1 was annotated as a pseudogene, but we identified an open reading frame (ORF) spanning two exons that codes for a 311 aa protein. This gene is a polymorphic pseudogene that, in the reference, contains a frameshift in the C-terminal domain (purple transcript); however, several mouse strains contain a 2 bp indel at position 105,126,541 that restores the correct frame. The splice junction and protein sequence are conserved in several mammals, including dog, cow and sheep. Olfr682-ps1 has a close paralogue, Olfr680-ps1, which shares 97% identity at the protein level. Whereas Olfr680-ps1 lacks transcriptional evidence, we used the conservation with Olfr682-ps1 and other mammals to annotate a full-length split transcript structure.
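The strain comparison described in this caption can be illustrated with a minimal Python sketch. It takes the aligned sequences shown in the strain panel of the figure and counts, for each strain, how many bases are present where the reference has alignment gaps. The dictionary and function names are hypothetical and used only for illustration; this is not the authors' analysis pipeline, and it covers only the short window shown in the figure.

```python
# Aligned sequences transcribed from the strain panel of Supplementary Figure 5.
# "-" marks an alignment gap. Names below are illustrative, not from the paper.
ALIGNMENT = {
    "reference": "CAACAAAGTAT--AAGAAGAGT",
    "BALB/cJ": "CAACAAAGTAT--AAGAAGAGT",
    "CBA/J": "CAACAAAGTAT--AAGAAGAGT",
    "BUB/BnJ": "CAACAAAGTATAGAAGAAGAGT",
    "CAST/EiJ": "CAACAAAGTATAGAAGAAGAGT",
    "NOD/ShiLtJ": "CAACAAAGTATAGAAGAAGAGT",
    "SPRET/EiJ": "CAACAGAGTATAGAAGAAGAGT",
}


def extra_bases_vs_reference(alignment, reference_key="reference"):
    """Return, per strain, the number of bases found at reference gap columns."""
    reference = alignment[reference_key]
    gap_columns = [i for i, base in enumerate(reference) if base == "-"]
    counts = {}
    for strain, sequence in alignment.items():
        if strain == reference_key:
            continue
        # Count non-gap characters in the columns where the reference is gapped.
        counts[strain] = sum(1 for i in gap_columns if sequence[i] != "-")
    return counts


extra = extra_bases_vs_reference(ALIGNMENT)

# An insertion whose length is not a multiple of 3 changes the reading frame
# relative to the reference sequence.
frame_changing = sorted(s for s, n in extra.items() if n % 3 != 0)

for strain, n in extra.items():
    print(f"{strain}: {n} extra base(s) relative to the reference")
print("Strains with a frame-changing insertion:", ", ".join(frame_changing))
```

In this window, the strains carrying two extra bases are the ones in which the reading frame differs from the reference, which the caption describes as restoring the correct frame.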