bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Invertebrate Metaxins 1 and 2: Widely Distributed Proteins Homologous to Vertebrate Metaxins Implicated in Protein Import into Mitochondria
Kenneth W. Adolph
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455
Email Address: [email protected]
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
ABSTRACT
Metaxin 1 and 2 genes, previously investigated in vertebrates, are shown to be widely
distributed among invertebrates. But metaxin 3 is absent. The predicted proteins of the
invertebrate metaxins were initially identified by homology with human metaxin 1 and 2 proteins,
and by the presence of characteristic GST_Metaxin protein domains. Invertebrate metaxins were
revealed for a variety of phyla, including Echinodermata, Cnidaria, Porifera, Chordata,
Arthropoda, Mollusca, Brachiopoda, Placozoa, and Nematoda. Metaxins were also found in
insects (Arthropoda) of different taxonomic orders: Diptera, Coleoptera, Lepidoptera,
Hymenoptera, and Blattodea. Invertebrate and human metaxin 1 proteins have about 41% identical
amino acids, while metaxin 2 proteins have about 49% identities. Invertebrate and vertebrate
metaxins share the same characteristic protein domains, further strengthening the identification of
the invertebrate proteins as metaxins. The domains are, for metaxin 1, GST_N_Metaxin1_like,
GST_C_Metaxin1_3, and Tom37. For metaxin 2, they are GST_N_Metaxin2, GST_C_Metaxin2,
and Tom37. Phylogenetic trees show that invertebrate metaxin 1 and metaxin 2 proteins are
related, but form separate groups. The invertebrate proteins are also closely related to vertebrate
metaxins, though forming separate clusters. These phylogenetic results indicate that all metaxins
likely arose from a common ancestral sequence. The neighboring genes of the invertebrate
metaxin 1 and 2 genes are largely different for different invertebrate species. This is unlike the
situation with vertebrate metaxin genes, where, for example, the metaxin 1 gene is adjacent to the
thrombospondin 3 gene. The dominant secondary structures predicted for the invertebrate
metaxins are alpha-helical segments, with little beta-strand. The conserved pattern of helical
segments is the same as that found for vertebrate metaxins 1, 2, and 3.
2 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1. INTRODUCTION
The metaxins have previously been identified as vertebrate proteins involved in the import
of newly synthesized proteins into mitochondria. Three vertebrate metaxins have been
characterized: metaxin 1 (Bornstein et al., 1995), metaxin 2 (Armstrong et al., 1999), and metaxin
3 (Adolph, 2004, 2005). The vertebrate metaxins are thought to be components of protein
complexes functionally equivalent to the TOM or SAM complexes of the outer mitochondrial
membrane of yeast (Saccharomyces cerevisiae). The role of the TOM and SAM complexes in
protein import into yeast mitochondria is much better understood than protein import in human
cells and other vertebrate cells. This investigation concerns whether invertebrates, like vertebrates,
have metaxin genes. The aims have been to understand which, if any, of the metaxins are found in
invertebrates, how widely distributed they might be among invertebrate phyla, and, most
importantly, fundamental properties of the metaxins encoded by invertebrate genomes. The results
show that metaxin 1 and 2 genes are widely distributed among invertebrates, just as in vertebrates.
But although metaxin 3 is highly conserved among vertebrates (Adolph, 2019), metaxin 3 was not
found for the invertebrates studied.
The invertebrates in this investigation represent many of the invertebrate phyla. An
emphasis was on invertebrates that are widely used as model organisms in biological and
biomedical research. These include the classic model organisms C. elegans (Brenner, 1974; White
et al., 1986) and Drosophila melanogaster (Wangler et al., 2017). Both had their genomes
sequenced early in the history of whole genome sequencing (C. elegans sequencing consortium,
1998; Adams et al., 2000). Other organisms in this study that are important model systems and
have had their genomes sequenced include Ciona intestinalis (Satoh, 2003; Dehal et al., 2002),
3 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Nematostella vectensis (Genikhovich and Technau, 2009; Putnam et al., 2007), and Trichoplax
adhaerens (Schierwater and DeSalle, 2018; Srivastava et al., 2008).
The first metaxin gene to be identified was the mouse metaxin 1 gene (Mtx1) (Bornstein
et al., 1995). The metaxin 1 protein was revealed to be located in the outer mitochondrial
membrane, and experimental evidence demonstrated a role in protein import into mitochondria
(Armstrong et al., 1997). The mouse metaxin 1 gene is in close proximity to the thrombospondin 3
gene (Thbs3) and the glucocerebrosidase gene (Gba). The thrombospondins are extracellular
matrix proteins, and are important mediators of cell-matrix and cell-cell interactions (Adams and
Lawler, 2011; Mosher and Adams, 2012).
A gene homologous to the mouse metaxin 1 gene was found to be present in humans at
cytogenetic location 1q21. In human cells, the metaxin 1 gene (MTX1) is between a
glucocerebrosidase pseudogene (psGBA1) and the gene for thrombospondin 3 (THBS3) (Long et
al., 1996; Adolph et al., 1995), with a metaxin 1 pseudogene (psMTX1) also at this locus. The gene
order is GBA1—psMTX1—psGBA1—MTX1—THBS3. In humans, a deficiency of the
glucocerebrosidase enzyme as a result of mutations in GBA1 can cause Gaucher disease (Mistry et
al., 2017; Tayebi et al., 2003; LaMarca et al., 2004). An increased genetic risk for the development
of Parkinson’s disease is also associated with GBA1 mutations (Do et al., 2019).
A second metaxin protein was identified in the mouse through its interaction with metaxin
1 protein (Armstrong et al., 1999). The metaxin 2 protein shows 29% amino acid identities with
mouse metaxin 1. The gene (Mtx2) is adjacent to a group of homeobox genes coding for
transcription factors. Because of their interaction, metaxin 1 and metaxin 2 proteins might form a
complex in the outer mitochondrial membrane involved in the import of proteins into
4 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
mitochondria. A human gene (MTX2) homologous to Mtx2 is at 2q31.2. Like the mouse gene, it is
adjacent to a cluster of homeobox genes.
Zebrafish metaxin 3 was the first metaxin 3 to be identified (Adolph, 2004). The zebrafish
metaxin 3 protein sequence, deduced from its cDNA sequence, was 40% homologous to zebrafish
metaxin 1, and 26% homologous to metaxin 2. Xenopus laevis metaxin 3 was also identified
through sequencing of cDNAs (Adolph, 2005). Zebrafish and Xenopus metaxin 3 proteins have
55% amino acid identities, indicating that the proteins are conserved to a great degree. The
recognition that metaxin 3 is a metaxin but distinct from metaxins 1 and 2 was based on factors
including amino acid sequence homology, presence of GST_Metaxin domains, phylogenetic
analysis, and conserved patterns of protein secondary structure.
The import of proteins into mitochondria has been investigated in great detail in yeast
(Saccharomyces cerevisiae) (Pfanner et al., 2019; Wiedemann and Pfanner, 2017; Neupert, 2015).
In particular, the TOM complex (translocase of the outer mitochondrial membrane complex) and
SAM complex (sorting and assembly machinery complex) bring about the import of beta-barrel
proteins into yeast mitochondria and their insertion into the outer membrane. The Tom37 domain,
a conserved protein domain of the SAM complex, is also present in the metaxin proteins, both
vertebrate and invertebrate (this study).
Although the invertebrates include important research organisms used in many labs,
nothing has been published about the metaxins of invertebrates. Metaxin proteins representing
diverse invertebrate phyla have therefore been studied. The proteins were primarily those
predicted by genome sequencing. Metaxin 1 and metaxin 2 were found to be highly conserved
among invertebrates, and fundamental properties of the proteins were determined.
5 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
2. MATERIALS AND METHODS
The amino acid sequence alignments in Figure 1 were produced using the Global Align
program of the NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi; Needleman and Wunsch, 1970;
Altschul et al., 1997). This program allowed the identities and similarities of two metaxin amino
acid sequences to be determined along their entire lengths. The EMBOSS Needle program
(https://www.ebi.ac.uk/Tools/psa/emboss_needle/) was also used to generate pairwise sequence
alignments. The CD_Search tool (www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi; Marchler-
Bauer et al., 2017) was used to identify the protein domains included in Figure 2 through
searching the conserved domain database of the NCBI. Evolutionary relationships of invertebrate
metaxins (Figure 3) were examined with the COBALT multiple sequence alignment tool and the
phylogenetic trees produced from the multiple sequence alignments
(www.ncbi.nlm.nih.gov/tools/cobalt; Papadopoulos and Agarwala, 2007). In addition, the Clustal
Omega program (https://www.ebi.ac.uk/Tools/msa/clustalo/) was used to generate multiple
sequence alignments and phylogenetic trees. The genes adjacent to invertebrate metaxin genes
were identified from the related information on “Gene neighbors” that resulted from BLAST
searches (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Adjacent genes were also identified using the
Genome Data Viewer and, earlier, the Map Viewer genome browsers
(www.ncbi.nlm.nih.gov/genome/gdv). The PSIPRED secondary structure prediction server
(bioinf.cs.ucl.ac.uk/psipred/; Jones, 1999) was used to predict the alpha-helical structures of
invertebrate metaxin proteins in Figure 4. The presence of transmembrane helices was investigated
with the PHOBIUS program (www.ebi.ac.uk/Tools/pfa/phobius/; Madeira et al., 2019) and also
the TMHMM prediction server (www.cbs.dtu.dk/services/TMHMM/; Krogh et al., 2001).
6 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
3. RESULTS AND DISCUSSION
3.1. Conservation of Metaxin Proteins Among Invertebrates
In this investigation, invertebrates were found to have distinct metaxin 1 and metaxin 2
genes like humans and other vertebrates. However, invertebrates were not observed to have
metaxin 3 genes. Plants and bacteria are different and have a single metaxin-like gene (K.W.
Adolph, unpublished). For invertebrates other than insects, metaxin 1 and 2 genes were found for
representative invertebrates in a variety of phyla. These phyla include Echinodermata (example:
sea urchin), Cnidaria (coral), Porifera (sponge), Chordata (amphioxus), Arthropoda (spider),
Mollusca (octopus), Brachiopoda (brachiopod), Placozoa (Tricoplax), and Nematoda (C. elegans).
Insects were also found to have metaxin 1 and metaxin 2 genes, while metaxin 3 was not
observed. This was true for representative insects (Phylum: Arthropoda; Class: Insecta) of various
taxonomic orders: Diptera (example: fruit fly), Coleoptera (beetle), Lepidoptera (moth),
Hymenoptera (honey bee), Hemiptera (bed bug), and Blattodea (termite).
For the fruit fly Drosophila, a second metaxin 2 gene was present in 13 of 17 Drosophila
species studied. The two genes are designated metaxins 2a and 2b. The predicted 2a and 2b
proteins have about 48% sequence identities. However, other insects have a single metaxin 2 gene,
including the Mediterranean fruit fly, oriental fruit fly, and olive fruit fly. Different species of
Drosophila have very similar metaxin proteins. For example, metaxin 1 proteins of different
species have about 85% sequence identities, and metaxin 2a proteins 91%.
7 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
3.2. Amino Acid Sequence Homology of Invertebrate Metaxins
Invertebrate metaxin genes and proteins were first identified by homology with the amino
acid sequences of human metaxin 1 and human metaxin 2 proteins. The invertebrate protein
sequences are, in most cases, the sequences predicted from genomic DNA sequences. The
identification of the invertebrate proteins as metaxins was strengthened by the presence of
conserved, characteristic protein domains, the GST_N_Metaxin, GST_C_Metaxin, and Tom37
domains, but no additional major domains. Also, the lengths of the invertebrate metaxin proteins
had to be comparable to human metaxin 1 (317 amino acids) or human metaxin 2 (263 amino
acids).
The invertebrate metaxin 1 proteins share about 41% amino acid sequence identities with
human metaxin 1. The metaxin 2 proteins share about 49% identities with human metaxin 2. As an
example, the metaxin 1 protein sequence of the purple sea urchin Strongylocentrotus purpuratus
(Echinodermata) has 46% identical amino acids when aligned with human metaxin 1. As
additional examples, the sponge Amphimedon queenslandica (Porifera) has 28% identities, and the
Pacific oyster Crassostrea gigas (Mollusca) has 39% identities. When aligned with the human
metaxin 2 protein sequence, the sea urchin metaxin 2 sequence shows 50% identities, the sponge
metaxin 2 sequence 35%, and the oyster sequence 47%.
Figure 1(A) includes the sequence of human metaxin 1 aligned with that of the roundworm
C. elegans (Caenorhabditis elegans; Nematoda). There are 31% amino acid identities. In Figure
1(B), 36% identities are found in aligning human metaxin 2 with C. elegans metaxin 2. C.elegans
is extensively used as a model organism for investigating the genetic control of development and
physiology (Brenner, 1974; White et al., 1986), and was the first multicellular organism to have its
genome completely sequenced (C. elegans sequencing consortium, 1998).
8 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 1(C) shows the alignment of human metaxin 1 with Drosophila melanogaster
metaxin 1. The identities are 27%. For human and Drosophila metaxin 2a, 41% identities are
observed (Figure 1(D)). The Drosophila metaxins are included because Drosophila melanogaster
is widely used as a model organism for studies of molecular genetics, population genetics, and
developmental biology (Adams et al., 2000; Wangler et al., 2017).
3.3. Protein Domains of Invertebrate Metaxins
A defining characteristic of metaxin proteins is the presence of GST_N_Metaxin and
GST_C_Metaxin domains, and, in addition, Tom37 domains. Figure 2(A) shows the domain
structure of representative invertebrate metaxin 1 and metaxin 2 proteins, while insects are in
Figure 2(B). The invertebrates in 2(A) include the sea squirt Ciona intestinalis, which has the
smallest genome of any chordate that can be utilized experimentally. As shown in the figure,
Ciona metaxin 1 has GST_N_Metaxin1_like, GST_C_Metaxin1_3, and Tom37 domains. Ciona
metaxin 2 has GST_N_Metaxin2, GST_C_Metaxin2, and Tom37 domains. The metaxin domains
of the starlet sea anemone (Nematostella vectensis) and Trichoplax adhaerens are also included in
Figure 2(A). The domains are identical to those of Ciona. The starlet sea anemone is a Cnidarian,
the simplest animals with cells organized into tissues. Trichoplax adhaerens is the simplest known
animal and has the smallest known genome of any animal. (For references, see Introduction.) The
metaxin 1 and metaxin 2 domain structures of the three organisms are seen to be highly conserved.
The metaxin proteins of vertebrates, in particular humans, have similar domain structures
to those seen in Figure 2. Human metaxin 1 and metaxin 3 have GST_N_Metaxin1_like,
9 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
GST_C_Metaxin1_3, and Tom37 domains. Human metaxin 2 has GST_N_Metaxin2,
GST_C_Metaxin2, and Tom37 domains.
The metaxin domains of representative insects are shown in Figure 2(B). The metaxin 1
and metaxin 2 domains are the same as those in 2(A) for invertebrates that don’t include insects.
The examples in Figure 2(B) are the yellow fever mosquito (Aedes aegypti), a dampwood termite
(Zootermopsis nevadensis), and the common bed bug (Cimex lectularius).
The Tom37 domain, as identified by the NCBI conserved domain database, is present in
yeast (Saccharomyces cerevisiae) protein Tom37, now generally known as Sam37. Sam37 is a
component of the SAM complex, the sorting and assembly machinery complex. The SAM
complex, consisting of Sam50, Sam37, Sam35, and Mdm10, is located in the outer mitochondrial
membrane, and is involved in the insertion of beta-barrel proteins into the outer membrane. The
involvement of Sam37 (Tom37) in protein import into mitochondria of yeast is suggestive of a
similar role for the metaxins in vertebrates and invertebrates, and such a role in human and mouse
mitochondria has been experimentally demonstrated.
3.4. Evolutionary Relationships of Invertebrate Metaxin Proteins
Phylogenetic analysis demonstrates that the invertebrate metaxin 1 and metaxin 2
homologs are grouped separately from vertebrate metaxins, including human metaxins. Figure
3(A) shows the phylogenetic relationships of selected invertebrate and vertebrate metaxins. The
analysis was based on multiple sequence alignments of the protein sequences. The horizontal
lengths of the branches are proportional to the extent of evolutionary change. As Figure 3(A)
10 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
demonstrates, the metaxin 1 proteins form a distinct group relative to the metaxin 2 proteins. And
within each group, the invertebrate and vertebrate proteins form separate clusters.
Also within each group, invertebrate metaxins of the same phylum are seen to be the most
closely related. This includes the starfish and sea urchin metaxin 1 and metaxin 2 proteins
(Echinodermata), the oyster and snail metaxins (Mollusca), and the horseshoe crab and scorpion
metaxins (Arthropoda). Similarly, for vertebrate metaxin 1 and metaxin 2 proteins, the human and
mouse proteins are more closely related compared to the Xenopus and zebrafish metaxins.
The observed grouping of the invertebrate and vertebrate metaxin 1 proteins indicates that
the metaxin 1 proteins arose from a common ancestor. The separate grouping of the metaxin 2
proteins likewise indicates that the metaxin 2 proteins originated from a common ancestor. In
addition, the results strengthen the likelihood of a common ancestral gene sequence for both of the
metaxins. This is true even though alignments of the metaxin 1 and 2 protein sequences show
relatively low percentages of amino acid identities (22% for human metaxins 1 and 2).
Insect metaxins are included in Figure 3(B). As with the other invertebrate metaxins in
Figure 3(A), the insect metaxin 1 and metaxin 2 proteins form separate groups. The metaxins of
insects of the same taxonomic order are seen to be more closely related than those in different
orders. The examples shown are moth and silkworm metaxins (Order: Lepidoptera), honey bee
and bumblebee metaxins (Hymenoptera), and Drosophila metaxins of different species (Diptera).
Although two distinct groupings are found for the metaxin 1 and metaxin 2 proteins, the insect
metaxins, like other invertebrate metaxins, most likely originated from an ancestral metaxin, even
though the amino acid sequences of metaxins 1 and 2 have diverged significantly.
11 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The metaxins of the three species of Drosophila in Figure 3(B) are very closely related
phylogenetically. For metaxin 2, the two Drosophila proteins (metaxins 2a and 2b) found for a
number of Drosophila species are divided into a metaxin 2a cluster and a metaxin 2b cluster. The
results indicate that the 2a proteins are related through a common ancestor, and similarly for the
2b cluster. And both the 2a and 2b proteins are likely to have originated from a common metaxin 2
ancestral sequence.
3.5. Neighboring Genes of Invertebrate Metaxins
The genes adjacent to invertebrate metaxin 1 and metaxin 2 genes are largely different
comparing one invertebrate with another. Also, no similarities are seen with the arrangement of
genes in humans and other vertebrates, where the metaxin 1 gene (MTX1) is next to the
thrombospondin 3 gene (THBS3), the metaxin 2 gene (MTX2) is adjacent to a cluster of homeobox
genes, and the metaxin 3 gene (MTX3) is next to the thrombospondin 4 gene (THBS4). In contrast,
invertebrate metaxin genes are not adjacent to thrombospondin or homeobox genes. And the genes
in the chromosomal regions up to 10 to 20 or more genes from the metaxin genes are different
than the genes of humans and other vertebrates, and generally differ from one invertebrate to
another. For example, the roundworm C. elegans, which has a well-annotated genome, has a
metaxin 1 gene (mtx-1) between the humpback gene and a heat-shock protein gene. And a metaxin
2 gene (mtx-2) is next to the prolyl carboxy peptidase gene. Unique arrangements of genes
adjacent to the metaxin genes were also found for Ciona intestinalis, a sea squirt, with the metaxin
1 gene (mtx-1) between genes for a regulatory protein and a transforming, acidic coiled-coil
protein. There are exceptions, however, to having different neighboring genes. A few genes were
found to be close to the metaxin genes of some, but not all, invertebrates. As an example, an ADP-
12 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
dependent glucokinase gene is close to 3 of 10 invertebrate metaxin 1 genes. But the metaxin 2
genes of the 10 invertebrates showed no similarities in adjacent genes.
More specifically for insects, the genes near metaxin 1 genes are different for Drosophila
melanogaster compared to the Mediterranean fruit fly, honey bee, etc. Also, for Drosophila
melanogaster, the genes close to the two metaxin 2 genes (2a and 2b) are different. However, the
genes adjacent to the single metaxin 1 homolog are conserved in ten of twelve Drosophila species.
Other exceptions to having different genes are closely related species such as the oriental fruit fly
and olive fruit fly, which have the same 5 adjacent genes.
3.6. Secondary Structures of Invertebrate Metaxin Proteins
Alpha-helical segments are the dominant secondary structures of invertebrate metaxins.
There is relatively little beta-strand. Figure 4 shows that representative invertebrate metaxin 1 and
2 proteins each have eight predicted helical segments (H2 through H9) with the same pattern of
spacing between the segments. Metaxin 1 proteins have an additional 9th helical segment (H10) in
their extended C-terminal regions, while metaxin 2 proteins have a 9th helix (H1) in their N-
terminal regions. This pattern of alpha-helices is the same as the pattern for metaxins 1, 2, and 3 of
humans and other vertebrates. As shown in Figure 4(A), the conserved pattern of helices was
found for invertebrate phyla including Cnidaria (example: coral), Porifera (sponge), Arthropoda
(amphipod), Mollusca (octopus), Chordata (amphioxus), and Brachiopoda (brachiopod). Figure
4(B) includes multiple sequence alignments showing the pattern of conserved helices for a
selection of insects (Phylum: Arthropoda; Class: Insecta). The insect taxonomic orders represented
in Figure 4(B) are Hymenoptera (ant, bumblebee), Coleoptera (beetle), Diptera (fruit fly),
13 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Blattodea (termite), and Hemiptera (whitefly). The pattern of predicted alpha-helical segments is
very similar to the pattern in Figure 4(A) for representative invertebrates, not including insects.
Eight helices (H2 to H9) are found for both metaxin 1 and metaxin 2 of insects. H1, an extra N-
terminal helix, is characteristic of metaxin 2 proteins. H10, an additional C-terminal helix,
characterizes metaxin 1 proteins. These highly conserved patterns are found for other insects and
non-insect invertebrates not included in Figure 4.
Metaxin 1 proteins of vertebrates have a predicted transmembrane helix near the C-
terminus. However, vertebrate metaxin 2 and metaxin 3 proteins don’t have a C-terminal
transmembrane helix. The metaxin 1 transmembrane helix is identified as helix H10 of
vertebrates, since it largely coincides with H10. The transmembrane helix could serve to anchor
metaxin 1 proteins to the outer mitochondrial membrane in connection with their role in protein
import into mitochondria. The presence of a C-terminal transmembrane helix is also predicted for
many invertebrate metaxin 1 proteins. For example, the sea urchin Strongylocentrotus purpuratus
has a transmembrane helix between residues 272 and 294 of the 319 amino acid protein. Belcher’s
lancelet or amphioxus metaxin 1 has the helix between residues 269 and 291 of the 321 amino
acid protein. The transmembrane helix mostly coincides with C-terminal helix H10. But neither of
the metaxin 2 proteins of these invertebrates has a transmembrane helix. Considering insects, the
Drosophila melanogaster metaxin 1 homolog (CG9393) has a transmembrane helix between
residues 281 and 301 of the 316 amino acid protein. But neither of the two D. melanogaster
metaxin 2 homologs (CG5662 and CG8004) has a transmembrane helix.
14 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
REFERENCES
Adams, J.C. and Lawler, J., The thrombospondins. Cold Spring Harb. Perspect. Biol. 3:a009712. doi: 10.1101/cshperspect.a009712 (2011). Adams, M.D., Celniker, S.E., Holt, R.A. et al., The genome sequence of Drosophila melanogaster. Science 287, 2185-2195 (2000). Adolph, K.W., The zebrafish metaxin 3 gene (mtx3): cDNA and protein structure, and comparison to zebrafish metaxins 1 and 2. Gene 330, 67-73 (2004). Adolph, K.W., Characterization of the cDNA and amino acid sequences of Xenopus metaxin 3, and relationship to Xenopus metaxins 1 and 2. DNA Sequence 16, 252-259 (2005). Adolph, K.W., Metaxin 3 is a highly conserved vertebrate protein homologous to mitochondrial import proteins and GSTs. bioRxiv doi: 10.1101/813451 (2019). (http://biorxiv.org/cgi/content/short/813451v1)
Adolph, K.W., Long, G.L., Winfield, S., Ginns, E.I. and Bornstein, P., Structure and organization of the human thrombospondin 3 gene (THBS3). Genomics 27, 329-336 (1995). Altschul, S.F., Madden, T.L., Schaeffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402 (1997). Armstrong, L.C., Komiya, T., Bergman, B.E., Mihara, K. and Bornstein, P., Metaxin is a component of a preprotein import complex in the outer membrane of the mammalian mitochondrion. J. Biol. Chem. 272, 6510-6518 (1997). Armstrong, L.C., Saenz, A.J. and Bornstein, P., Metaxin 1 interacts with metaxin 2, a novel related protein associated with the mammalian mitochondrial outer membrane. J. Cell. Biochem. 74, 11-22 (1999).
15 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Bornstein, P., McKinney, C.E., LaMarca, M.E., Winfield, S., Shingu, T., Devarayalu, S., Vos, H.L. and Ginns, E.I., Metaxin, a gene contiguous to both thrombospondin 3 and glucocerebrosidase, is required for embryonic development in the mouse: implications for Gaucher disease. Proc. Natl. Acad. Sci. USA 92, 4547-4551 (1995). Brenner, S., The genetics of Caenorhabditis elegans. Genetics 77, 71-94 (1974). C. elegans sequencing consortium, Genome sequence of the nematode C. elegans: a platform for investigating biology, Science 282, 2012-2018 (1998). Dehal, P., Satou, Y., Campbell, R.K. et al., The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157-2167 (2002). Do, J., McKinney, C., Sharma, P. and Sidransky, E., Glucocerebrosidase and its relevance to Parkinson disease. Mol. Neurodegener. 14, 36 (2019) doi:10.1186/s13024-019-0336-2. Genikhovich, G. and Technau, U., The starlet sea anemone Nematostella vectensis: an anthozoan model organism for studies in comparative genomics and functional evolutionary developmental biology. Cold Spring Harb. Protoc. (2009) pdb.emo129. doi: 10.1101/pdb.emo129 Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202 (1999). Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L.L., Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580 (2001). LaMarca, M.E., Goldstein, M., Tayebi, N., Arcos-Burgos, M., Martin, B.M. and Sidransky, E., A novel alteration in metaxin 1, F202L, is associated with N370S in Gaucher disease. J. Hum. Genet. 49, 220-222 (2004). Long, G.L., Winfield, S., Adolph, K.W., Ginns, E.I. and Bornstein, P., Structure and organization of the human metaxin gene (MTX) and pseudogene. Genomics 33, 177-184 (1996).
16 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Madeira, F., Park, Y.M., Lee, J., Buso, N., Gur, T., Madhusoodanan, N., Basutkar, P., Tivey, A.R.N., Potter, S.C., Finn, R.D. and Lopez, R., The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 47, W636-W641 (2019). Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczcki, C.J., Lu, S., Chitsaz, F., Derbyshire, M.K., Geer, R.C., Gonzales, N.R., Gwadz, M., Hurwitz, D.I., Lu, F., Marchler, G.H., Song, J.S., Thanki, N., Wang, Z., Yamashita, R.A., Zhang, D., Zheng, C., Geer, L.Y. and Bryant, S.H., CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45, D200-D203 (2017). Mistry, P.K., Lopez, G., Schiffmann, R., Barton, N.W., Weinreb, N.J. and Sidransky, E., Gaucher disease: progress and ongoing challenges. Mol. Genet. Metab. 120, 8-21 (2017). Mosher, D.F. and Adams, J.C., Adhesion-modulating/matricellular ECM protein families: a structural, functional and evolutionary appraisal. Matrix Biol. 31, 155-161 (2012). Needleman, S.B. and Wunsch, C.D., A general method applicable to the search for similarities in the amino acid sequences of two proteins. J. Mol. Biol. 48, 443-453 (1970). Neupert, W., A perspective on transport of proteins into mitochondria: a myriad of open questions. J. Mol. Biol. 427, 1135-1158 (2015). Papadopoulos, J.S. and Agarwala, R., COBALT: constraint-based alignment tool for multiple protein sequences. Bioinformatics 23, 1073-1079 (2007). Pfanner, N., Warscheid, B. and Wiedemann, N., Mitochondrial proteins: from biogenesis to functional networks. Nat. Rev. Mol. Cell Biol. 20, 267-284 (2019). Putnam, N.H., Srivastava, M., Hellsten, U. et al., Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317, 86-94 (2007). Satoh, N., The ascidian tadpole larva: comparative molecular development and genomics. Nat. Rev. Genet. 4, 285-295 (2003). Schierwater, B. and DeSalle, R., Placozoa. Curr. Biol. 28, R97-R98 (2018). Srivastava, M., Begovic, E., Chapman, J. et al., The Trichoplax genome and the nature of placozoans. Nature 454, 955-960 (2008).
17 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Tayebi, N., Stubblefield, B.K., Park, J.K., Orvisky, E., Walker, J.M., LaMarca, M.E. and Sidransky, E., Reciprocal and nonreciprocal recombination at the glucocerebrosidase gene region: implications for complexity in Gaucher disease. Am J. Hum. Genet. 72, 519-534 (2003). Wangler, M.F., Yamamoto, S., Chao, H.T., Posey, J.E., Westerfield, M., Postlethwait, J., Hieter, P., Boycott, K.M., Campeau, P.M. and Bellen, H.J., Model organisms facilitate rare disease diagnosis and therapeutic research. Genetics 207, 9-27 (2017). White, J.G., Southgate, E., Thomson, J.N. and Brenner, S., The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. B Biol. Sci. 314, 1-340 (1986). Wiedemann, N. and Pfanner, N., Mitochondrial machineries for protein import and assembly. Annu. Rev. Biochem. 86, 685-714 (2017).
18 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
FIGURE LEGENDS
Figure 1. Homology of invertebrate metaxins: amino acid sequence alignments. (A) The
amino acid sequence of human metaxin 1 (317 amino acids) is aligned with the sequence of a
representative invertebrate C. elegans (312 aa), an important model system for studies of
development. The identities and similarities are shown in red between the sequences. Identities are
31% and similarities are 46% for this example. Invertebrates, generally, share about 41% amino
acid sequence identities with human metaxin 1. (B) The alignment of human metaxin 2 (263 aa)
with C. elegans metaxin 2 (244 aa) shows 36% identical amino acids, with 57% similarities.
Invertebrate metaxin 2 sequences more typically share about 49% identities relative to human
metaxin 2. Comparing the sequences of various invertebrate metaxin 1 proteins with invertebrate
metaxin 2 proteins shows about 27% identities. (C) Alignment of human metaxin 1 and an insect
metaxin 1. In this alignment with a Drosophila melanogaster metaxin 1 homolog (315 aa), 27%
identical amino acids and 45% similarities are found. D. melanogaster was included because it is a
classic model system of genetics and developmental biology. Comparing the metaxin 1 sequences
of insects of different orders with each other shows about 38% identities. (D) For human metaxin
2 and D. melanogaster metaxin 2a (269 aa), there are 41% identities and 61% similarities. D.
melanogaster is one of the Drosophila species with two metaxin 2 proteins, 2a and 2b. The two
proteins have 48% sequence identities. Comparing the metaxin 2 sequences of a variety of insects
of different orders shows about 51% homology.
Figure 2. Domain structure of invertebrate metaxin proteins. (A) The metaxin 1 and metaxin 2
protein domains of representative invertebrates, not including insects, are shown. These are the
characteristic GST_N_Metaxin, GST_C_Metaxin, and Tom37 domains. The invertebrate
19 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
examples include a sea squirt (Ciona intestinalis; Phylum: Chordata), a sea anemone
(Nematostella vectensis; Phylum: Cnidaria), and Tricoplax adhaerens (Phylum: Placozoa). The
domain structures are highly conserved in these examples, and in the metaxins of other
invertebrates. (B) The metaxin protein domains are shown for three representative insects: the
yellow fever mosquito (Aedes aegypti; Order: Diptera), a termite (Zootermopsis nevadensis;
Order: Blattodea), and the common bed bug (Cimex lectularius; Order: Hemiptera). Other insects
have the same domains, and the insect domains are the same as for the invertebrates in (A). The
presence of a Tom37 domain in both metaxin 1 and 2 proteins is in keeping with a possible role
for the invertebrate metaxins in the import of proteins into mitochondria. Tom37 is a yeast protein
with a well-characterized role in protein import.
Figure 3. Evolutionary relationships of metaxin 1 and 2 proteins of invertebrates.
(A) Phylogenetic tree of selected invertebrate and vertebrate metaxins. The invertebrate species
shown are the crown-of-thorns starfish (Acanthaster planci), purple sea urchin (Strongylocentrotus
purpuratus), Pacific oyster (Crassostrea gigas), snail (Biomphalaria glabrata), Atlantic horseshoe
crab (Limulus polyphemus), and bark scorpion (Centruroides sculpturatus). The starfish and sea
urchin are examples of the phylum Echinodermata, the oyster and snail are Mollusca, and the
horseshoe crab and scorpion are Arthropoda. (B) The insect metaxins are those of a moth
(Amyelois transitella), the domestic silkworm (Bombyx mori), the Western honey bee (Apis
mellifera), and the common eastern bumblebee (Bombus impatiens). Three Drosophila species are
also included (Drosophila melanogaster, Drosophila elegans, and Drosophila eugracilis). The
moth and silkworm are examples of the taxonomic order Lepidoptera, the honey bee and
bumblebee are Hymenoptera, and the Drosophila species are Diptera.
20 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 4. Secondary structures of invertebrate metaxin proteins: multiple sequence
alignments. (A) Representative invertebrate metaxin 1 and metaxin 2 amino acid sequences are
aligned, and the predicted alpha-helical segments are underlined and shown in red. The helices are
labeled H1 through H10. The invertebrate species include a stony coral (Acropora digitifera;
Phylum: Cnidaria), a sponge (Amphimedon queenslandica; Porifera), an amphipod crustacean
(Hyalella azteca; Arthropoda), the California two-spot octopus (Octopus bimaculoides; Mollusca),
the Florida lancelet or amphioxus (Branchiostoma floridae; Chordata), and a brachiopod (Lingula
anatina; Brachiopoda). Invertebrate metaxin proteins, like the vertebrate metaxins, have predicted
secondary structures that are dominated by alpha-helical segments, with very little beta-strand.
The pattern of alpha-helical segments is highly conserved between species. Invertebrate metaxin 1
and metaxin 2 proteins each have 9 helices. Eight of the helices are shared by both metaxin
proteins. Metaxin 1 has a C-terminal helix (H10) that metaxin 2 lacks, while metaxin 2 has an N-
terminal helix (H1) that metaxin 1 lacks. The conservation of helices is striking, considering that
the two metaxins only share about 20% sequence identities. For example, the metaxin 1 and 2
proteins of the California sea hare Aplysia californica (Mollusca) have only 21% identities. This
result demonstrates how invertebrate metaxin secondary structure is more highly conserved than
primary structure. (B) shows the alignment of representative insect metaxin 1 and 2 protein
sequences. The alpha-helical segments, labeled H1 to H10, are underlined and in red. The insects
included are the Florida carpenter ant (Camponotus floridanus; Order: Hymenoptera), the
mountain pine beetle (Dendroctonus ponderosae; Coleoptera), the Mediterranean fruit fly
(Ceratitis capitata; Diptera), the drywood termite (Cryptotermes secundus; Blattodea), the
silverleaf whitefly (Bemisia tabaci; Hemiptera), and the common eastern bumblebee (Bombus
21 bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
impatiens; Hymenoptera). As in Figure 4(A), the insect metaxins have 9 alpha-helical segments
that are highly conserved between species. Eight of the helices (H2 to H9) are found for both
metaxin 1 and metaxin 2. Metaxin 2 has an extra N-terminal helix (H1), and metaxin 1 has an
extra C-terminal helix (H10).
22
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(A) Human Metaxin 1/C.elegans Metaxin 1
Identities = 31%, Similarities = 46%
Human 1 MAAPMELFCWSGGWGLPSVDLDSLAVLTYARFTGAPLKVHKISNPWQSPSGTLPALRTSH 60 M EL W +GLP++D+ SL L ++ +P++V + + PW+SPSG LP + + C.elegans 1 M----ELHIWPSDFGLPTIDVVSLQFLACSKMCASPVRVIQSTRPWRSPSGELPMVAQTE 56
Human 61 GEVISVP--HKIITHLRK--EKYNADYDLSARQGADTLAFMSLLEEKLLPVLVHTFWIDT 116 GE V K + L+K + D DL+ + A AF L L P ++HTFW D C.elegans 57 GEAKPVTDFEKFVDILKKCGQDVVIDADLTTIEKAQLDAFSCYLHHNLYPAVMHTFWTDE 116
Human 117 KNYVEVTRKWYAEAMPFPLNFFLPGRMQRQYMERLQLLTGEHRPEDEEELEKELYREARE 176 NY VT+ WYA + FP N + ++++ + L+LL G++ + E+ +EA C.elegans 117 LNYNTVTQYWYASHLHFPYNLYY---LEKRRKKALRLLAGKN------DTEILKEAFM 165
Human 177 CLTLLSQRLGSQKFFFGDAPASLDAFVFSYLALLLQAKLPSGKLQVHLRGLHNLCAYCTH 236 L LS +LG KFF G+ P SLDA VF YLA LL+ LP+ +LQV L NL + C.elegans 166 ALNTLSTKLGDNKFFCGNKPTSLDALVFGYLAPLLRVPLPNDRLQVQLSACPNLVRFVET 225
Human 237 ILSLYFP------WDGAEVPPQRQTPAGPETEE------EPYRRRNQILSV 275 + S+Y P W + A TEE E R+ IL C.elegans 226 VSSIYLPLGEDELKRQQANRKMWQSRISKAKADKEAAKTTEEASESLPEEPPMRDAILFT 285
Human 276 LAGLAAMVGYALLSGIVSIQRATPARAPGTRTLGMAEEDEEE 317 L L + +A+ +G++ + EE+ E C.elegans 286 LGALTLSLVFAIHTGLIQVS------VEEEISE 312
(B) Human Metaxin 2/C.elegans Metaxin 2
Identities = 36%, Similarities = 57%
Human 1 MSLVAEAFVSQIAAAEPWPENATLYQQLKGEQILLSDNAASLAVQAFLQMCNLPIKVVCR 60 M+ AA+ W E+ +L+ +Q L+ D A LAVQ FL+M +LP V R C.elegans 1 MN------AAQDW-EDVSLFTPYLNDQALMYDFADCLAVQTFLRMTSLPFNVRQR 48
Human 61 ANAEYMSPSGKVPFIHVGNQVVSELGPIVQFVKAKGHSLSDGLEEVQKAEMKAYMELVNN 120 N +++SP G VP + + +++ IV FV KG +L+ L E Q A+M+A + ++ + C.elegans 49 PNVDFISPDGVVPLLKINKTLITGFNAIVDFVHKKGVTLTSHLSETQVADMRANISMIEH 108
Human 121 MLLTAELYLQWCDEATVGEITHARYGSPYPWPLNHILAYQKQWEVKRKMKAIGWGKKTLD 180 +L T E ++ W + T ++T RYGS Y WPL+ +L + K+ ++ ++ W KT+D C.elegans 109 LLTTVEKFVLWNHDETYDKVTKLRYGSVYHWPLSSVLPFVKRRKILEELSDKDWDTKTMD 168
Human 181 QVLEDVDQCCQALSQRLGTQPYFFNKQPTELDALVFGHLYTILTTQLTNDELSEKVKNYS 240 +V E D+ +ALS +LG+Q Y PTE DAL+FGH+YT++T +L ++ +K YS C.elegans 169 EVGEQADKVFRALSAQLGSQKYLTGDLPTEADALLFGHMYTLITVRLPLTNITNILKKYS 228
Human 241 NLLAFCRRIEQHYFEDRGKGRLS 263 NL+ F +RIEQ YF+ C.elegans 229 NLIEFTKRIEQQYFKQ 244
Figure 1(A,B). Homology of invertebrate metaxins: amino acid sequence alignments.
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(C) Human Metaxin 1/Drosophila melanogaster Metaxin 1 (CG9393)
Identities = 27%, Similarities = 45%
Human 1 MAAPME--LFCWSGGWGLPSVDLDSLAVLTYARFTGAPLKVHKISNPWQSPSGTLPALRT 58 M + L+ + G +GLPS+D + L L RFT P+ V SNP +S +G LP L+ D.melanogaster 1 MEMQLGAMLYVYKGEYGLPSIDFECLRALCLLRFTRCPMDVQTSSNPLRSGAGKLPYLQI 60
Human 59 SHGEVISVPHKIITHLRKEKYNADYDLSARQGADTLAFMSLLEEKLLPVLVHTFWIDTKN 118 + + +I L E Y D LS +Q + A+ + + L + + + N D.melanogaster 61 GNQKFAGY-RQIKRVLDLEGYPIDAHLSTKQKHLSTAYANWVFTNLHAYYHYFLFGEPHN 119
Human 119 YVEVTRKWYAEAMPFPLNFFLPGRMQRQYMERLQLLTGEHRPEDEEELEKELYREARECL 178 + TR YA+ PFP NF+ P QR+ + +Q++ G + ++ E + D.melanogaster 120 FDTTTRGLYAKRTPFPFNFYYPSSYQREACDVVQVMAGFDVNDKLDKHEGDY------171
Human 179 TLLSQRLGSQKFFFGDAPASLDAFVFSYLALLLQAKLPSGKLQVHLRGLHNLCAYCTHIL 238 S++LG + +FFGD + DA V+SYLA++ + LP+ LQ H++G NL + I D.melanogaster 172 ---SRKLGRKVWFFGDTYSEFDAIVYSYLAIIFKIALPNNPLQNHIKGCQNLVNFINRIT 228
Human 239 SLYFPWDGAEVPPQRQTPAGPETE----EEPYRRRNQILSVLAGLAAMVGYALLSGIVSI 294 F +G +TP+G E E + ++AG+ A++ + I D.melanogaster 229 KDIFRIEGYSSVKLTKTPSGTEASLTASERKFLDSELNTKIVAGVGAVLAMGAFAAWRGI 288
Human 295 QRATPARAPGTRTLGMAEEDEE-----E 317 R+ T G+ ED+E + D.melanogaster 289 YNQVTTRS-STDYDGIDYEDDEMEEGLD 315
(D) Human Metaxin 2/Drosophila melanogaster Metaxin 2a (CG8004)
Identities = 41%, Similarities = 61%
Human 1 MS--LVAEAFVSQIAAAEPWPENATLYQQLKGEQILLSDNAASLAVQAFLQMCNLPIKVV 58 M+ +++ + +AEPWPE+ATLYQ + EQILL +NA+ LAV+A+L+MCNLP + D.melanogaster 1 MTSQYLSQLITADKLSAEPWPEDATLYQPYEAEQILLPENASCLAVKAYLKMCNLPFLIR 60
Human 59 CRANAEYMSPSG---KVPFIHVGNQVVSELGPIVQFVKAKGHSLSDGLEEVQKAEMKAYM 115 ANAE+MSP G K+PFI G + +E PIV FV+ K ++ +E +KA+M+ Y+ D.melanogaster 61 SCANAEHMSPGGRMTKLPFIRAGAFIFAEFEPIVNFVEQKELAIGSWQDEDEKADMRTYV 120
Human 116 ELVNNMLLTAELYLQWCDEATVGEITHARYGSPYPWPLNHILAYQKQWEVKRKMKAIGWG 175 LV N+ AELY+ + +E E+T R G +PWPLNH+ Y K+ R +K W D.melanogaster 121 SLVENIFTMAELYISFKNERVYKEVTAPRNGVVFPWPLNHMQNYGKRRNALRLLKVYQWD 180
Human 176 KKTLDQVLEDVDQCCQALSQRLGTQP---YFFNKQPTELDALVFGHLYTILTTQLTNDEL 232 +D V++ V +CC+ L +L P +F+ QP ELDA+ FGHL++ILTT L N L D.melanogaster 181 DLDIDSVIDKVAKCCETLEYKLKESPETPFFYGDQPCELDAIAFGHLFSILTTTLPNMAL 240
Human 233 SEKVKNYSNLLAFCRRIEQHYFEDRGKGRLS 263 +++ YF+ R + D.melanogaster 241 AQTVQKFQHLVEFCRFVDEKYFQTRCLP--N 269
Figure 1(C,D). Homology of invertebrate metaxins: amino acid sequence alignments.
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(A) Invertebrate Metaxins 1 and 2 Protein Domains
Ciona intestinalis (sea squirt, vase tunicate) metaxin 1
(1-314)
Nematostella vectensis (starlet sea anemone) metaxin 1
(1-325)
Trichoplax adhaerens metaxin 1
(1-292)
Ciona intestinalis (sea squirt, vase tunicate) metaxin 2
(1-261)
Nematostella vectensis (starlet sea anemone) metaxin 2
(1-244)
Trichoplax adhaerens metaxin 2
(1-231)
GST_N_Metaxin1_like; GST_C_Metaxin1_3; Tom37
GST_N_Metaxin2; GST_C_Metaxin2
Figure 2(A). Domain structure of invertebrate metaxin proteins. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(B) Insect Metaxins 1 and 2 Protein Domains
Aedes aegypti (yellow fever mosquito) metaxin 1
(1-334)
Zootermopsis nevadensis (Nevada dampwood termite) metaxin 1
(1-321)
Cimex lectularius (common bed bug) metaxin 1
(1-318)
Aedes aegypti (yellow fever mosquito) metaxin 2
(1-265)
Zootermopsis nevadensis (Nevada dampwood termite) metaxin 2
(1-262)
Cimex lectularius (common bed bug) metaxin 2
(1-317)
GST_N_Metaxin1_like; GST_C_Metaxin1_3; Tom37
GST_N_Metaxin2; GST_C_Metaxin2
Figure 2(B). Domain structure of invertebrate metaxin proteins. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(A) Invertebrate and Vertebrate Metaxins
Not Including Insects
Crown-of-thorns starfish metaxin 1
Purple sea urchin metaxin 1
Pacific oyster metaxin 1
Snail (Biomphalaria glabrata) metaxin 1
Atlantic horseshoe crab metaxin 1
// Bark scorpion metaxin 1
Human metaxin 1
Mouse (Mus musculus) metaxin 1
Xenopus laevis metaxin 1
Zebrafish (Danio rerio) metaxin 1
Crown-of-thorns starfish metaxin 2
Purple sea urchin metaxin 2
Snail (Biomphalaria glabrata) metaxin 2
Pacific oyster metaxin 2
Atlantic horseshoe crab metaxin 2 // Bark scorpion metaxin 2
Zebrafish (Danio rerio) metaxin 2
Xenopus laevis metaxin 2
Human metaxin 2
Mouse (Mus musculus) metaxin 2
Figure 3(A). Evolutionary relationships of metaxin 1 and 2 proteins of invertebrates. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(B) Insect Metaxins
Amyelois moth metaxin 2
Domestic silkworm metaxin 2
Western honey bee metaxin 2
Common eastern bumblebee metaxin 2
// Drosophila elegans metaxin 2a
Drosophila eugracilis metaxin 2a
Drosophila melanogaster metaxin 2a (CG 8004)
Drosophila eugracilis metaxin 2b
Drosophila elegans metaxin 2b
Drosophila melanogaster metaxin 2b (CG5662)
Drosophila elegans metaxin 1
Drosophila eugracilis metaxin 1
Drosophila melanogaster metaxin 1 (CG9393)
// Western honey bee metaxin 1
Common eastern bumblebee metaxin 1
Amyelois moth metaxin 1
Domestic silkworm metaxin 1
Figure 3(B). Evolutionary relationships of metaxin 1 and 2 proteins of invertebrates. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(A) Invertebrate Metaxin 1 and 2 Amino Acid Sequences
H1 H2
1 MAATV ------MELQS-WPGD--WNLPSIDTRCLEVLAYAKF AGCPINVN-RNNNPWKSPLGEFPVLRSGG 61 Coral MTX 1
1 MASFG[ 1]GSEEIRSKMTLYV-FTGN--EYSASVDPESLQILTYLRL[6]ENEEIEIK-YTCKHWESPTGVFPTLVSDG 76 Sponge MTX 1
1 ------MELHV-WEGDssYGLPSMDLESLQIMMYVKF SGAPVKII-KSMSPAHAPSRNLPVLSCPS 58 Amphipod MTX 1
1 MSSL-[10]ASVPWPDDAALYQPYEKE--QILLTDSANCLSVKTFLKM CGLNFNLELRVNAEEMSPSGKIPFIKVGP 80 Octopus MTX 2
1 ----- ASEPWPDDVCLYQPYKLQ--QVLLPDNAQCLAVQCYLQM CGLPFRTVLRTNAEHMSPSGKPPFLRAGP 66 Amphioxus MTX 2
1 MTTLK[12]AKEPWPDNVNLYQTYEAG--QILLPDSASCLSVKAFLEM CGLKYSVVLRNNAEHMSPSGKVPFIQIGP 83 Brachiopod MTX 2
H3 H4 H5
62 DVLTAPDKIMDYFRLKNFNADYQLNAKQQADTLALVSLVRSKLYPAISFTFWVDERNYTKFTR--PWFSKRLPFPLNYFL 139 Coral MTX 1
77 LYYTGVDSIIKELKKQGLDLDADCSDEELAEISAYSAFLSDRLKKALLYLHWVEAKNYTEVTRgwLSGATVFPFTLYILS 156 Sponge MTX 1
59 GNLNKFSAITAHLRRKNFSCDHELTRKQCSDVLAYQQLIEEKLFPAFVYLWWVEPRSYQEIIH--KWYYSRMRFPYKLWV 136 Amphipod MTX 1
81 FLISEMDPIIAFVNTKGFHISNHLSDSQQSEMRAYMALVESILINAELYLSWCHEDVYYEITK--PRYGACYPWPLNMIL 158 Octopus MTX 2
67 YVISEVDPIIKFVSTKGHSLSDKLDRPQQAEMKAYLALVQGTLGNAELYISWCDPTTASEISR--PRYSSPYPWPLNTLL 144 Amphioxus MTX 2
84 FLISEFEPIIGFVNTKGFHLSANLDDTERAEMRAYMALVDNILVNAEHYISWCDATVLQTVTK--PRYGSPYPWPLNKLV 161 Brachiopod MTX 2
H6 H7 H8
140 PGKlSRQKRSEICEGVYSHEIDDLQLEN---KLFKEALECLKLLSIVLGDKDFFFSHRPTTLDAVVFSHLAVIYKAPLP- 215 Coral MTX 1
157 RQH-RLAKQVIMTSSSGNTSNNTSTIEQcgiELLSQAEEAISLLAGRLGDQKYFYGSSLSSFDVLVFGYLDLIAQYELPg 235 Sponge MTX 1
137 PSN-RHNAYKAFIESIVDEPDDEVAVET---ELLKRGQECLTLLSVRLGDKGFFFGSSPSSLDAILAPYLALLLKVDLR- 211 Amphipod MTX 1
159 PLK----KFTEIKNRLTSIGWFNKSFE----EVCEQIKTCCQVLSERLGNQTYFFGDKPTELDALVFGHLFTLLTTNLP- 229 Octopus MTX 2
145 AYR----KQWDVQNQLKALGWYSKTLE----EVYDEVDNCCKALAQRLGNHHYFFKQGPTELDALVFGHLYTILTTPLT- 215 Amphioxus MTX 2
162 PLK----KWYEETSRLKALGWSQKTLS----EVCEEVQTCCQALSERLDKNNFFFGNRPTELDALVFGHLFTILTTPMP- 232 Brachiopod MTX 2
H9 H10
216 NNKLHNYLQGYDNLSTFCRRVLQRYFS[6]QRQQDQPSAKRSSSQDDAAVDRNS[17]TAYAFISGLV[12]EESTPELG 319 Coral MTX 1
236 NNQLGTRLKSLVQLTNFCSDIREKAYP EIKSRHVSTSSSMSPSSPLRKDPT[ 1]WIFVGVAGIL[ 3]QGMRVGLT 308 Sponge MTX 1
212 VPALQNHLRQCPNLTAYVSRALATCFP NENPSSIGYGRDKPRGPPPAEDGV[16]FGYVFASGLV[ 9]DGEEYGLA 305 Amphipod MTX 1
230 VDQFALVIQKYDNLVEFCNNIQATYYK ------256 Octopus MTX 2
216 DNSLADIVKEYSNLINFCRHIEKKFFE ERERGPKD------250 Amphioxus MTX 2
233 VNKLAEIVKNFDNLVHFCARIEQQYFS ATEAS------264 Brachiopod MTX 2
320 YTFEG[10] 334 Coral MTX 1
309 RKLLK 313 Sponge MTX 1
306 EEEEE[ 4] 314 Amphipod MTX 1 Figure 4(A). Secondary structures of invertebrate metaxin proteins: multiple sequence alignments.
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.06.895979; this version posted January 7, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
(B) Insect Metaxin 1 and 2 Amino Acid Sequences
H1 H2
1 ------MDNAEI------LQLDI-WKGDWGLPSVDIDCLQVLAYAKFSGIPLKVNLTSN-PFKTPNGRLPLLRA 60 Ant MTX 1
1 ------MLGNDK-----FTLYV-YDGDYGLPSFNVECTKVLLYIAMVKVSVEIKTLNN-IKSCAFYTGPCFMH 60 Beetle MTX 1
1 ------MLLGG------VLYV-YKGEWGLPSIDFDCLRALCLVKFTRCPLQIDTNVN-PMRSSTGKLPYIQI 58 Fruit fly MTX 1
1 -MPSVLLTESIALELGAQEPWPQDVKLFQPYEVEQILLPDNANCLAVQAFLKMCGLDFEVEPRSNAEYMSPSGRVPFIKC 79 Termite MTX 2
1 mVSSNVLLDSIQIEIGGDNEWPNDAKLFQPYETEQILMPDSANCLAVQAFLQMANLKYETVYKANAAEMSPTGRVPFIKV 80 Whitefly MTX 2
1 -MPCPLLADSITLELEAQEPWPQPITLYQPYEVEQILLPDNANCLAVQAFLKMCGLDFQIEPRSNAEYMSPSGRVPFIKC 79 Bumblebee MTX 2
H3 H4 H5
61 GLNTLDTVKDILPFFRAKHYNSDYTLTDKQCADVLAYDALLKEKLYPALQFIWWIDKKNLD____ELIRPWYCKALPFPF 136 Ant MTX 1
61 KNLSFGSFGDIVPYLNTLNFDLDVQLSPKQRSEALALSNMVQAKLRGPVEFAFWVDYRNFE[41]ELVK------166 Beetle MTX 1
59 GTERFCGYREIKKMLDKEGFAIDAKLQVKHELLSPAYANWVFTNLHSYYNYFMYGEPNNFD____-TTRSLYAKRTMFPL 133 Fruit fly MTX 1
80 GAFVVAELDPIISFVNHKGISLSDKLDSVQKADMRAYMSLINTVLANAELYITWCDKTTLQ____EVTKPRFGSVLPWPL 155 Termite MTX 2
81 GQYLVAELEPIVAFVQNKNISLTSNLTDAEKSDMRAYMSLINTVFGNAENYITWCDPETYS____KVTKKRFGSIHPFPL 156 Whitefly MTX 2
80 GAFLISEFDNIVSFIGNKGRSLSDHLSATCKADMRAYMSLVNNVFVNAELYICWVDEPTLN____EVTKVRHGSVYPWPL 155 Bumblebee MTX 2
H6 H7 H8
137 KFYYPGKFERQAQALMQSLYSME DNID[4]EVYSEAQKCLTLLSMRLGDGDFFFGQQ-PSTIDAIVYSYLALLLKAP 213 Ant MTX 1
167 ------EYLN ---SLASECLASLSARLGVSKYFFRDQ-PSSLDVVVYSYVAPLVKLP 213 Beetle MTX 1
134 NFFYPSSYQREACDIVMVMSGFD[5]ENHD[2]SLTINAKKCVNLLSKKLGKKVWFFGDV-FSEFDAIVYSYLAILFKTT 213 Fruit fly MTX 1
156 NHILAWQKRSQVVKKLGVLGWGN KTLD EVYEEIENCCSALSERLENRPYFFGDR-PTELDAQVFGHLFTILTTP 228 Termite MTX 2
157 NHILVWKKRSATIERMKALGGKT EQ-E___AVLDDVDLCCQSLSDRLGDREFFFKSG-PTELDALVFGHIFTIITTP 228 Whitefly MTX2
156 NHFLNWQKRKEVIKKLNVLGWYN KTIE___EVCKEVQNCCTALSERLEGSDYFSGEKtPNELDALVFGHIFTIVTTP 229 Bumblebee MTX 2
H9 H10
214 LPNPVLQNHLRNCTNLVKYVSRISQRYFEYE[4]EKDKAEKNAEKLgeDECEFPNKRRNQIFAAFVAAMAmaGYALSTGI 294 Ant MTX 1
214 FPSNDLSIVVALWPNLVSFINRIDAKFLPEL[1]KESKYIKNEAKLktSDDEVSYVAISILTVSAISLLW--GFAVSKGF 289 Beetle MTX 1
214 LPNNPLQNHIKGCQNLVNFINRITKDIFKNE[2]NSVKAIKSQSSN--DPLLTATERHFLESEKKTKILAgiGAVVAMGT 290 Fruit fly MTX 1
229 LPNNRFASIVRGYPNLIELCKLVEKEYF------ERTAEA------262 Termite MTX 2
229 LPNNQLNGIVRGFPNLIEHCKRIDKMFF------KTNFSSEWM------265 Whitefly MTX 2
230 LPGNKLANIVQSYPLLVHLCKRIETSIF------SPQEIRVK------265 Bumblebee MTX 2
295 V[12]STEYI[7] 319 Ant MTX 1
290 I SSKLF 295 Beetle MTX 1
291 F[12]SKRYY[3] 311 Fruit fly MTX 1
Figure 4(B). Secondary structures of invertebrate metaxin proteins: multiple sequence alignments.