www.nature.com/scientificreports

The cytochrome P450 (CYP) superfamily in cnidarians

Kirill V. Pankov1,2, Andrew G. McArthur1, David A. Gold3, David R. Nelson4, Jared V. Goldstone5 & Joanna Y. Wilson2*

The cytochrome P450 (CYP) superfamily is a diverse and important enzyme family, playing a central role in chemical defense and in synthesis and metabolism of major biological signaling molecules. The CYPomes of four cnidarian genomes (Hydra vulgaris, Acropora digitifera, Aurelia aurita, Nematostella vectensis) were annotated; phylogenetic analyses determined the evolutionary relationships amongst the sequences and with existing metazoan CYPs. In total, 155 functional CYPs and 90 fragments were identified. Genes were from 24 new CYP families and several new subfamilies; genes were in 9 of the 12 established metazoan CYP clans. All species had large expansions of clan 2 diversity, with H. vulgaris having reduced diversity for both clan 3 and mitochondrial clan. We identified potential candidates for xenobiotic metabolism and steroidogenesis. That each genome contained multiple, novel CYP families may reflect the large evolutionary distance within the cnidarians, unique physiology in the cnidarian classes, and/or different ecology of the individual species.

The cytochrome P450 superfamily. The cytochrome P450 (CYP) superfamily consists of a large group of hemeproteins that catalyze a wide range of reactions, playing important roles in several fundamental biological processes (e.g. steroid synthesis, metabolism, chemical defense)1. CYPs primarily perform a monooxygenase function, adding polar hydroxyl groups to a substrate, making it more reactive and hydrophilic2. Each individual CYP has a specific suite of compounds that fit into its active site, often with a structure–activity relationship. Collectively, CYP enzymes interact with both endogenous (e.g. steroids and fatty acids) and exogenous (e.g. drugs, plant secondary metabolites, and pollutants) compounds with highly diverse chemical structures. Thus, the CYP superfamily plays a role in multiple, critical physiological processes. In vertebrates, the CYP11, CYP17, CYP19, CYP21 and CYP51 families catalyze steps in the synthesis of hormones, including sex steroids3. The CYP51 family, the only CYP family found in all domains of life, produces an intermediate, 4,4-dimethylcholesta-8(9),14,24-trien-3β-ol, in the cholesterol biosynthesis pathway4, the end product of which is an important component of the lipid bilayer and involved in cell signaling. Many of the enzymes in the CYP4 family play a role in the metabolism of omega fatty acids5, providing a source of adenosine triphosphate (ATP) for cells when carbohydrate stores are low. Certain enzymes in the CYP1, CYP2, and CYP3 families have high substrate specificity for foreign compounds such as polycyclic aromatic hydrocarbons (PAHs) and drugs, modifying them as part of xenobiotic response and detoxification6,7. CYP genes are named according to a standard set of nomenclature conventions based largely on sequence similarity8,9. CYPs with greater than 40% amino acid sequence identity belong to the same family (represented by a number) and those with greater than 55% sequence identity belong to the same subfamily (represented by a letter).
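The 40% and 55% identity thresholds can be illustrated with a short worked example. This is a minimal sketch only: the function names `percent_identity` and `naming_level` and the toy aligned strings are assumptions made for illustration, and formal names are assigned by curators using full-length alignments and phylogeny rather than a bare cutoff.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def naming_level(identity: float) -> str:
    """Apply the >40% (family) and >55% (subfamily) rules of thumb."""
    if identity > 55.0:
        return "same subfamily"
    if identity > 40.0:
        return "same family"
    return "different family"


# Toy aligned fragments (hypothetical input; real comparisons use whole proteins).
seq_a = "FSAGPRNCIG"
seq_b = "FSAGSRNCLG"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity -> {naming_level(identity)}")
```

The thresholds are strict inequalities here because the text says "greater than"; a pair at exactly 40% would fall outside the family under this reading.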
Each CYP is assigned a number, typically based on order of discovery. As an example, CYP17A1 would be the first gene identified in the ‘A’ subfamily of the ‘CYP17’ family. A broader classification of CYPs into ‘clans’ has also been implemented, based on phylogenetic relationships among CYP families10. All of the CYP families that repeatedly cluster in the same phylogenetic clade are grouped into the same clan, which is usually given the name of the lowest-numbered family present in that clade. For example, since CYP39, CYP7 and CYP8 sequences consistently form a single clade in phylogenetic analyses, all three families became part of clan 7. The crucial nature of CYP proteins is emphasized by their presence in some capacity across all domains of life and they are thought to be present in all metazoan species11. The progression of genomic sequencing technologies has allowed the entire CYP gene complement (the CYPome) of many metazoan species to be identified, including

1M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, DeGroote School of Medicine, McMaster University, Hamilton, ON, Canada. 2Department of Biology, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada. 3Department of Earth and Planetary Sciences, University of California Davis, Davis, CA, USA. 4Department of Microbiology, Immunology and Biochemistry, University of Tennessee, Memphis, TN, USA. 5Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA. *email: [email protected]

Scientific Reports | (2021) 11:9834 | https://doi.org/10.1038/s41598-021-88700-y

Figure 1. A simplified view of the cnidarian phylum in the tree of life. Branch lengths on the trees are arbitrary and meant only to visually display the relationship between the clades of species, not evolutionary distance between them. (A) The placement of the cnidarian phylum on the metazoan tree of life is highlighted. It diverged from the majority of metazoans before the development of bilateral symmetry and the split between protostome and deuterostome development. (B) Cartoon of the relationships within the cnidarian phylum. The placement of A. aurita, H. vulgaris, A. digitifera and N. vectensis is illustrated. Adapted from refs. 25 and 30.

those from major metazoan phyla. Studies have been completed on chordates7, model organisms such as mouse, fish, frog, and chicken, and fish such as Danio rerio (zebrafish)12 and Takifugu rubripes (fugu pufferfish)13. Other groups that have been investigated in detail include the urochordate Ciona intestinalis (sea squirt tunicate)14, the echinoderm Strongylocentrotus purpuratus (sea urchin)15, the arthropods Drosophila melanogaster (fruit fly)16 and Daphnia pulex (water flea crustacean)17, and the annelid Capitella teleta (marine worm)18. In many of these studies, full length and partial genes have been identified (as well as pseudogenes), along with estimations of intron–exon boundaries. In general, most vertebrate genomes tend to have between 50 and 100 CYP genes19, but the number of genes expected in other metazoan species is hard to predict as the diversity and evolutionary origins of individual CYP families are not well understood. For example, the genome of the annelid Capitella teleta contained 24 novel CYP families and 7 novel CYP subfamilies amongst its predicted total of 96 functional CYPs18. We can anticipate continued discovery of novel CYP genes where function is completely unknown as we sequence an increasingly broad diversity of metazoan genomes. While the number of novel CYPs varies across phyla (even the human genome contains the poorly understood “orphan” CYP20 enzyme)20, they are of great interest as identification of their function may provide important clues to the unique physiology and environmental stressors of different species, as well as the evolution of core physiological processes such as steroid biosynthesis. Relatively little work has been done to identify cytochrome P450s in the species that comprise the cnidarian phylum, one of the most diverse and earliest-branching groups on the metazoan tree of life. 
One defensome study identified CYP sequences from the starlet sea anemone Nematostella vectensis, but the 82 genes were not named, in part due to their low similarity with known CYP genes21. With more recent genome sequencing in other cnidarians, notably the moon jellyfish Aurelia aurita22, the brown hydra Hydra vulgaris23 and the stony coral Acropora digitifera24, we now have an opportunity to examine cnidarian CYPomes in detail.
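The naming conventions introduced earlier (family number, subfamily letter, gene number, and clans named after the lowest-numbered member family) can be sketched in a few lines. The helper names `parse_cyp_name` and `clan_name` are hypothetical, and the clan rule is only the usual convention described in the text; exceptions such as the mitochondrial clan are not handled.

```python
import re

# Family digits, subfamily letters, then the gene number, e.g. CYP17A1 or CYP4LA1.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp_name(name: str) -> tuple[int, str, int]:
    """Split a CYP gene name into (family, subfamily, gene number)."""
    match = CYP_NAME.match(name)
    if match is None:
        raise ValueError(f"not a standard CYP gene name: {name!r}")
    family, subfamily, number = match.groups()
    return int(family), subfamily, int(number)


def clan_name(families: list[int]) -> str:
    """Name a clan after its lowest-numbered family (the usual convention only)."""
    return f"clan {min(families)}"


print(parse_cyp_name("CYP17A1"))   # family 17, subfamily A, gene 1
print(clan_name([39, 7, 8]))       # CYP39, CYP7 and CYP8 cluster together
```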

The cnidarian phylum: evolutionary significance. There are more than 20,000 known cnidarian species, including many types of jellyfish, corals, anemones and other aquatic animals. Many of these species have become cost-effective models for research related to toxicology, regeneration and other biochemical systems25. Evidence from ribosomal DNA indicates that the common ancestor of the cnidarian phylum diverged from the rest of the metazoans approximately 600 million years ago26, although more recent molecular studies support an older divergence in the Neoproterozoic Era, around 600–800 million years ago27–29. This was before the split between protostomes (arthropods, annelids and mollusks) and deuterostomes (chordates, hemichordates, and echinoderms) that defines many of the species that have had their CYP content characterized (Fig. 1A). The lack of data on CYP genes from this area of the tree of life is a major gap in understanding CYP evolution in metazoans, and the functional role that CYPs may play in cnidarian physiology. There is extensive morphological, behavioural, developmental, and ecological diversity within the phylum and an understanding of CYP genes in the context of generation of biological signaling molecules and metabolism may be highly relevant to the diversity within this ancient lineage. The defining feature that unites the phylum is the presence of a specialized cell known as a cnidocyte, often referred to as the ‘stinging cell’. It produces a structure called a cnida (i.e. nematocyst) that contains toxins used as defense against predators and as a means of attack to capture prey25. Within the phylum, the three major lineages are the anthozoa, endocnidozoa and medusozoa; hydrozoans, scyphozoans and cubozoans are all in the medusozoa30. The anthozoan class is the most abundant in species and includes corals and anemones that reside on marine seafloors and act as habitats for other aquatic animals. The endocnidozoa consist of parasitic species. The medusozoans are the jellyfish, hydroids and siphonophores. Hydrozoans such as the brown hydra are small predatory animals that often live in salt and freshwater habitats, typically in large colonies. The scyphozoans (i.e. “true jellies”) and cubozoans (i.e. box jellyfish) play important roles in the pelagic food chain and ecology. Cnidarian life cycles can include swimming medusae and sessile polyp stages, asexual and/or sexual reproduction, free-living and parasitic stages, solitary or colonial groupings, and associations with photosynthetic dinoflagellates. Complete genome assemblies had been published for several cnidarian species; we selected genomes from two of the three cnidarian classes: anthozoa (Nematostella vectensis, Acropora digitifera) and medusozoa (the hydrozoan Hydra vulgaris and the scyphozoan Aurelia aurita) (Fig. 1B)25; the Clytia genome has also recently been completed31. The hydrozoan H. vulgaris, commonly known as the brown hydra, has been studied as a model organism for regeneration and more recently, stem cell differentiation23. H. vulgaris is part of the Hydra vulgaris group, a group where species boundaries are uncertain and across which there is limited sequence diversity32. NCBI considers H. magnipapillata33 and H. attenuata as heterotypic synonyms for H. vulgaris (NCBI:txid6087)34 and H. vulgaris strain 105 specifically denotes the hydra previously described as H. magnipapillata. We use H. vulgaris as the current taxonomy but H. magnipapillata was used with the genome release23. The anthozoan A. digitifera, or stony coral, is one of the main species that comprises the architecture of coral reefs, sustaining some of the most diverse marine ecosystems on the planet35. Another anthozoan, the starlet sea anemone N. vectensis, has become a model organism for gene knockout studies due to its quick developmental cycle36. Lastly, the scyphozoan moon jellyfish A. 
aurita has long been of interest to the scientific community due to its unique body plan and complex life cycle, as well as periodic ‘blooms’ of jellyfish that can have destructive effects on marine ecosystems37. A broad study of the defensome of N. vectensis was conducted by Goldstone21, including searching the genome for putative CYPs. A total of 82 potential CYP genes were identified, but at that time they were not grouped into existing families or subfamilies due to low amino acid sequence similarity with other known CYPs (< 40%)21. However, phylogenetic investigation revealed that the majority of these genes belonged in cytochrome P450 clans 2 and 3, which include CYPs from other organisms that metabolize both exogenous and endogenous compounds. These data suggest that detailed genomic and phylogenetic analyses of cnidarian CYPs present an exciting opportunity to identify novel CYP families and subfamilies given the early divergence of the cnidaria from the bilateral metazoans and the ecological diversity among individual cnidarian species. The available genome sequences provide the means for a broad scope analysis of cytochrome P450 genes in the cnidarian phylum as a whole, as well as a comparison between three major classes. The goal of this study was thus to identify and characterize the cytochrome P450 superfamily in cnidarians, which could lead to the discovery of novel CYP families with unique functional properties, illuminate the ways that these species respond to environmental stressors, and provide insight into the evolution of CYP functional diversity within the Metazoa.

Results

Genome-wide annotation of cytochrome P450 genes. For each cnidarian genome, well curated CYPs from other species (human, fruit fly, zebrafish, nematode worm, and annelid worm) were used in the basic local alignment search tool (BLAST) to identify regions of the genome that contained a likely CYP gene. 
Annotations were individually curated to locate, where possible, start and stop codons, appropriate signal sequences for intron–exon boundaries, an appropriate size of approximately 500 amino acids in translated sequence, and the presence and location of the key CYP gene motifs (I-helix, K-helix, meander coil and heme loop); expressed sequence tag (EST) support and a match to the Pfam CYP hidden Markov model (HMM) were used to confirm identified CYP genes. The 82 CYP regions previously identified in N. vectensis were curated without further searching of the genome. In total, there were 19 cytochrome P450 genes found in the H. vulgaris genome, 14 in the A. digitifera coral species, 22 in the A. aurita moon jelly, and 44 in N. vectensis that met all criteria during curation (start and stop codon, translated to approximately 500 aa, contained all of the CYP motifs). Many of the H. vulgaris gene predictions had strong EST support and all predicted complete genes in all species matched the entirety of the Pfam CYP model (HMM) with high confidence. Exon number was low in many of the predicted cnidarian CYP genes, with typically only 1–3 exons per gene; the only cnidarian CYPs with greater than 5 exons were found in the coral A. digitifera. There were several ‘partial’ CYP genes identified in each species that were missing several hundred amino acid residues at the N-terminus or C-terminus of the protein (thus missing start and/or stop codons), but still contained all four of the key CYP motifs. There were 5 ‘partial’ CYPs in H. vulgaris, 10 in the A. digitifera coral, 15 in the A. aurita moon jelly, and 25 in N. vectensis. These partial genes contained enough genetic information that they are presumed functional, even though they are not completely resolved in the current genome assemblies available for each species. A list of all the ‘complete’ and ‘partial’ genes from each species, complete with their genomic location and length, is available in Table S1. 
Each of these was assigned a formal name by the CYP Nomenclature Committee. The length of each sequence and conservation of signature CYP motifs are summarized in Table 1 (H. vulgaris), Table 2 (A. digitifera), and Table 3 (A. aurita) for each of the newly identified complete and partial genes. The K-helix and heme loop motifs were the most highly conserved in all three cnidarian species; the glutamic acid and arginine residues in the K-helix were present in every gene and the only deviations in the heme loop were a fairly conservative G > A substitution in the A. aurita CYP3552G1 and a deletion in the CYP20 proteins that we have observed in other metazoans (and noted in ref. 38). The I-helix and meander coil had significantly more variation between different CYP genes; the I-helix is degenerate in the CYP20 genes (a feature we and others38 have observed in metazoan CYP20s; Lemaire personal communication and data not shown) and while the phenylalanine, proline and arginine residues in the meander coil are highly conserved, the others vary considerably.


CYP        Length  AA   I-helix            AA   K-helix  AA   Meander coil  AA   Heme loop
                        [AG]G-x-[DE]T[TS]       E-x-x-R       FDPER              F-x-x-G-x-R-x-C-x-G
CYP4LA1    491     302  EGHDTT             358  ESLR     410  FIPER         431  FSAGPRNCIG
CYP4LA2    502     300  AGHDTI             356  ESMR     408  FIPER         429  FSAGSRNCLG
CYP4LA3    497     308  EGHDTT             364  ESLR     416  FIPER         437  FSAGPRNCIG
CYP4LA4    498     309  EGHDST             365  ESMR     417  FIPER         438  FSAGPRNCIG
CYP4LA5    505     316  EGHDTT             372  ESLR     424  FIPER         424  FSAGPRNCIG
CYP4LB1    506     291  AGHDTT             349  ESLR     401  FIPER         422  FSAGPRNCIG
CYP20A1    496     –    –                  360  EVMR     412  FDPDR         433  FGFAGKRKCPG
CYP3352A1  497     297  AGSETS             354  ETLR     407  FYPER         430  FSSGPRSCIG
CYP3352A2  447     303  AGSETS             360  ETLR     413  FYPER         436  FSSGPRSCLG
CYP3352A3  493     298  AGSETS             355  ETLR     408  FYPER         431  FSNGPRSCLG
CYP3352A4  493     298  AGSETS             355  ETLR     408  FYPER         431  FSGGPRACLG
CYP3352A5  326     131  AGSETS             188  ETLR     241  FYPER         264  FSSGPRSCLG
CYP3352A6  322     127  AGSETS             184  ETLR     237  FYPER         260  FSGGPRSCLG
CYP3352A7  308     129  AGSETS             169  ETLR     222  FYPER         245  FSGGPRSCLG
CYP3352B1  505     300  SGSETT             357  ETLR     410  FNPYR         433  FSTGLRACLG
CYP3352C1  506     300  AGTETA             357  ETLR     411  FNPYR         434  FSAGTRVCLG
CYP3352C2  518     300  AGTETA             357  ETLR     357  FNPYR         434  FSAGTRVCLG
CYP3352D1  506     300  GGVETT             357  ESLR     410  FNPYR         433  FSIGLRACLG
CYP3352D2  507     260  AGSETT             317  ECLR     370  FNPHR         393  FSAGTRVCLG
CYP3352D3  499     276  TGSETT             333  ECLR     386  FNPYR         409  FSAGTRVCLG
CYP3352D4  475     276  TGSETT             333  ECLR     386  FNPYR         409  FSAGTRVCLG
CYP3352D5  444     300  GGSETT             357  ECLR     410  FNPYR         433  FSAGTRVCLG
CYP3353A1  489     295  AGSETT             351  ETLR     404  FNPMR         427  FSAGPRGCIG
CYP3354A1  499     299  AGTETT             355  ETLR     408  FDPKR         431  FSAGARVCLG

Table 1. The occurrence and position of conserved cytochrome P450 motifs in the H. vulgaris CYPome. ‘Length’ indicates the total length of the translated protein sequence, and ‘AA’ is the position of the first amino acid residue for each motif. Bolded residues indicate residues that were conserved.
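The consensus patterns in the table header translate directly into regular expressions, which gives a rough way to locate motifs and report the 1-based position of their first residue, as in the ‘AA’ columns. The sketch below is an assumption-laden illustration, not the curation procedure used in the study: the meander coil pattern is relaxed to F-x-P-x-R (only the phenylalanine, proline and arginine are described as highly conserved), the test sequence is synthetic (the CYP3352A1 motif strings from Table 1 joined by leucine filler), and short patterns such as E-x-x-R will match by chance in real proteins unless position and context are also checked.

```python
import re

# Consensus motifs as regular expressions ("x" in the table becomes ".").
MOTIFS = {
    "I-helix": re.compile(r"[AG]G.[DE]T[TS]"),
    "K-helix": re.compile(r"E..R"),
    "meander coil": re.compile(r"F.P.R"),   # assumption: relaxed from FDPER
    "heme loop": re.compile(r"F..G.R.C.G"),
}


def find_motifs(protein: str) -> dict:
    """Return {motif name: (1-based start, matched text)} or None if absent."""
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein)  # first occurrence only
        hits[name] = (match.start() + 1, match.group()) if match else None
    return hits


# Synthetic sequence: CYP3352A1 motif strings separated by filler residues.
filler = "LLLLL"
synthetic = filler + "AGSETS" + filler + "ETLR" + filler + "FYPER" + filler + "FSSGPRSCIG"

for motif, hit in find_motifs(synthetic).items():
    print(motif, hit)
```

A motif that returns `None` under the strict consensus (for example the I-helix of CYP4LA2, which ends in isoleucine) is not necessarily absent; it may simply deviate from the pattern, which is why manual curation is still needed.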

This type of variation in the I-helix and meander coil is not uncommon in CYPs from other species and thus is not surprising for the cnidarians. As with all genome annotations of CYPs, multiple ‘fragments’ of CYPs were identified; the fragments varied in length, were missing one or more of the main CYP motifs, and it was unclear whether they represented functional genes or pseudogenes (Table S2): 11 fragment genes in H. vulgaris, 34 in A. digitifera, 31 in A. aurita, and 14 in N. vectensis. Based on the complete and partial CYP gene sequences and setting aside the fragmentary gene sequences, the cnidarian genomes contain a minimum of 24 (H. vulgaris and A. digitifera), 37 (A. aurita), or 69 (N. vectensis) functional CYP genes and a total of 24 novel CYP families and 12 new subfamilies. Two of the new families are exclusive to H. vulgaris (CYP3353, CYP3354), another two are exclusive to A. digitifera (CYP3344, CYP3346), six are exclusive to N. vectensis (CYP377, CYP443, CYP3340, CYP3345, CYP3347, CYP3349), and four are exclusive to A. aurita (CYP3336-3337, CYP3341, CYP3351). In total, more than 85% of the complete or partial genes identified are from new CYP families and, other than the CYP20 genes and one CYP16 in N. vectensis, the rest are from new subfamilies within the CYP3, CYP4 and CYP46 families.
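The three curation categories and the per-species minimum gene counts follow from simple rules and arithmetic, sketched below. The class and function names are hypothetical, and `classify_prediction` is a deliberate simplification: it uses only start/stop codons and motif count, whereas the curation described above also considered sequence length, EST support and the Pfam match. The counts are the ones reported in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesCounts:
    complete: int
    partial: int
    fragment: int

    @property
    def minimum_functional(self) -> int:
        # Complete and partial genes are both presumed functional in the text.
        return self.complete + self.partial


def classify_prediction(has_start: bool, has_stop: bool, motifs_found: int) -> str:
    """Simplified rule: fragments lack a motif, partials lack a terminus."""
    if motifs_found < 4:
        return "fragment"
    if has_start and has_stop:
        return "complete"
    return "partial"


# Counts as reported in the Results text.
COUNTS = {
    "H. vulgaris": SpeciesCounts(complete=19, partial=5, fragment=11),
    "A. digitifera": SpeciesCounts(complete=14, partial=10, fragment=34),
    "A. aurita": SpeciesCounts(complete=22, partial=15, fragment=31),
    "N. vectensis": SpeciesCounts(complete=44, partial=25, fragment=14),
}

for species, counts in COUNTS.items():
    print(f"{species}: at least {counts.minimum_functional} functional CYP genes")

total_fragments = sum(counts.fragment for counts in COUNTS.values())
print("fragments across all four genomes:", total_fragments)
```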

Phylogenetic analysis of cnidarian cytochrome P450 sequences. The phylogenetic relationships between the cnidarian CYP genes are shown in Fig. 2. This maximum likelihood tree contains all of the complete and partial CYPs from each cnidarian species, but not the fragmentary genes. The phylogeny includes CYPs from humans, zebrafish and other vertebrates from eighteen vertebrate CYP families: CYP1-5, CYP7, CYP8, CYP11, CYP16, CYP17, CYP19-21, CYP24, CYP26, CYP27, CYP46 and CYP51. There are a total of 330 CYP sequences included in this tree. The vertebrate CYPs were clustered into their clans as expected and all clans had high bootstrap support (greater than 90%). Known species phylogeny was evident within the CYP gene families, both for the vertebrate species and those families where there were sequences from multiple cnidarian species. For example, N. vectensis and A. digitifera CYPs (the two anthozoan species) often clustered together, with representatives from each species in the CYP3094, CYP3339-CYP3340, CYP3342, and CYP3343-CYP3348 families clustered in the clan 2 and clan 3 clades. Likewise, H. vulgaris and A. aurita sequences always clustered together throughout the tree (CYP3351-3354 in clan 2, CYP4 and CYP20). In general, the cnidarian genes formed distinct clades within each clan (for example, the clan 2 H. vulgaris, clan 4 H. vulgaris and clan 3 A. aurita CYPs). Clan 2 contained the largest number of cnidarian sequences, most of which were in clades separate from any vertebrate sequences. Nonetheless, there were cnidarian sequences from the CYP3094 and CYP3349 families that clustered close to the CYP17 and CYP21 families and several cnidarian families (CYP3346, CYP3347, CYP3343, and CYP3348) that clustered with the CYP1 family. Vertebrate CYP27 and CYP24 families clustered with the


CYP        Length  AA   I-helix            AA   K-helix  AA   Meander coil  AA   Heme loop
                        [AG]G-x-[DE]T[TS]       E-x-x-R       FDPER              F-x-x-G-x-R-x-C-x-G
CYP375B3   504     303  AGVDTT             360  ETLR     412  FKPER         439  FGFGTRMCLG
CYP3094B1  504     309  AGLETS             366  EVLR     419  FNPER         442  FGAGRRGCLG
CYP3094H1  440     284  AGSETL             342  EALR     395  FDPNR         418  FSAGRRVCLG
CYP3094J1  485     290  AGLETT             347  EVLR     400  FDPSR         423  FGAGRRVCLG
CYP3094K1  451     280  AGSDIT             337  EIMR     390  FDPTR         414  FGGGLRACMG
CYP3094K2  451     280  AGSDIT             337  EIMR     390  FDPTR         414  FGGGLRACMG
CYP3339A1  516     311  AGYETS             369  EALR     421  FDPER         442  FGSGPRVCIG
CYP3339A2  462     267  AGYETS             326  ESQR     378  FDPER         399  FGAGPRNCIG
CYP3339A3  511     317  AGYETS             376  EALR     428  FDPER         449  FGAGPRNCIG
CYP3339A4  493     299  AGYETS             358  ETLR     410  FDPER         431  FGAGPRNCIG
CYP3342A5  228     36   AGYETS             94   ETLR     146  FDPER         167  FGHGPHNCVG
CYP3343A1  481     291  GGYEKL             347  EILR     400  FNPYR         422  FSTGDRKCPG
CYP3344A1  489     291  AGIDTV             348  ETLR     401  FNPHR         424  FSTGGRRCLG
CYP3346A1  489     290  AGFETS             347  ELLR     400  FNPRR         424  FSGGRRKCPG
CYP3346A2  476     276  GGAIDTS            334  ELLR     387  FNPRR         411  FSVGRRQCPG
CYP3346A3  476     276  GGAIDTS            334  ELLR     387  FNPRR         411  FSVGRRQCPG
CYP3346A4  492     293  AGFDTS             350  ELLR     403  FCPSR         427  FSAGRRKCPG
CYP3346A5  359     160  AGFETS             217  ELLR     270  FNPKR         294  FSGGRRKCPG
CYP3346A6  491     313  AGFETS             370  ELLR     423  FYPRR         447  FSGGRRKCPG
CYP3348A1  397     206  AGYETT             262  ESLR     315  FNPYR         337  FGAGRRVCAG
CYP3348A2  403     207  AGYETT             263  ESLR     316  FNPYR         338  FGAGRRVCAG
CYP3348B6  346     197  AGYETT             254  EALR     307  FNPQR         329  FSAGRRVCAG
CYP3348B7  348     197  AGYETT             254  EALR     307  FNPHR         329  FSAGRRVCAG
CYP3355C1  495     298  AGMETT             355  ETLR     408  FRPDR         431  FSGGRRSCLG

Table 2. The occurrence and position of conserved cytochrome P450 motifs in the A. digitifera CYPome. ‘Length’ indicates the total length of the translated protein sequence, and ‘AA’ is the position of the first amino acid residue for each motif. Bolded residues indicate residues that were conserved.

cnidarian CYP3338, CYP377, and CYP375 families in the mitochondrial clan except for the CYP375D subfamily in N. vectensis, which is basal to the same clade. Phylogenetic analysis also had clear support for cnidarian CYP4, CYP20 and CYP46 family members. Interestingly, the phylogenetic results also supported N. vectensis having several genes related to CYP3N (CYP3662A1-3) and CYP3P (CYP3662B1) subfamilies, in addition to novel cnidarian CYP families (notably CYP3340) that clustered close to the vertebrate CYP5s in clan 3. With the exception of five genes in A. aurita (CYP3336A1, CYP3336B1, CYP3336B2-1, CYP3336B2-2, and CYP3337A1), all of the cnidarian CYPs could be grouped into existing metazoan CYP clans. The proportion of each CYPome that fell into the major metazoan clans is shown in Fig. 3A. Clan 2 genes made up between 50 and 75% of each species’ CYPome, making it the most prevalent clan in each cnidarian genome. Interestingly, clan 3 was the second most prevalent (20–25%) for all of the cnidarians other than H. vulgaris, which lacked any complete or partial CYPs from this clan. While clan 4 CYPs represented a small portion of the N. vectensis and A. aurita CYPomes and were completely absent in A. digitifera, a relatively large proportion (25%) of the H. vulgaris CYPs fell in this clan. In total numbers this translates to only six clan 4 CYPs in H. vulgaris compared to two and one in N. vectensis and A. aurita, respectively. The mitochondrial clan CYPs were not highly prevalent in any of the cnidarian species (roughly 4–11% of the total CYPomes, except for H. vulgaris, where they were absent). Overall, when examining the presence/absence of the twelve metazoan CYP clans for each of the cnidarian species, the same clans were not found in all cnidarian genomes (Fig. 3B, Table S4). However, when fragmentary CYP genes are included, clans 2, 3, 4 and 20 are present in all four cnidarian species. 
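The clan proportions quoted for H. vulgaris can be reproduced from the gene names in Table 1. In this sketch the per-family counts are tallied by hand from that table, the family-to-clan mapping follows the clan placements stated in the text (CYP3352 to CYP3354 in clan 2, CYP4 in clan 4, CYP20 in clan 20), and the function name `clan_proportions` is a hypothetical helper.

```python
from collections import Counter

# Family -> clan, as placed in the text for the H. vulgaris families.
FAMILY_TO_CLAN = {
    "CYP4": "clan 4",
    "CYP20": "clan 20",
    "CYP3352": "clan 2",
    "CYP3353": "clan 2",
    "CYP3354": "clan 2",
}

# Complete plus partial H. vulgaris genes per family, counted from Table 1.
HVUL_FAMILY_COUNTS = {"CYP4": 6, "CYP20": 1, "CYP3352": 15, "CYP3353": 1, "CYP3354": 1}


def clan_proportions(family_counts: dict, family_to_clan: dict) -> dict:
    """Fraction of a CYPome that falls into each clan."""
    clan_counts = Counter()
    for family, n_genes in family_counts.items():
        clan_counts[family_to_clan[family]] += n_genes
    total = sum(clan_counts.values())
    return {clan: n / total for clan, n in clan_counts.items()}


for clan, fraction in clan_proportions(HVUL_FAMILY_COUNTS, FAMILY_TO_CLAN).items():
    print(f"{clan}: {fraction:.1%}")
```

Six of 24 genes gives the 25% reported for clan 4, and the clan 2 share falls inside the 50 to 75% range given for all four species.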
To better examine CYP family associations, phylogenetic trees were generated with only clan 2 sequences, including the CYP1, CYP2, CYP17 and CYP21 vertebrate families (Fig. 4; 121 sequences) or only clan 3 and 4 sequences, including the CYP3, CYP4 and CYP5 vertebrate families (Fig. 5; 61 sequences). The five A. aurita CYPs that did not place within an existing metazoan clan in Fig. 2, but were basal to the clan 3 and 4 CYPs, were included in the clan 3 and 4 tree. These subtrees generally had higher bootstrap support than the main tree and it was clear that the cnidarian CYPs formed distinct groups within the larger clan 2 clade; this pattern was observed both on the large tree containing all of the cnidarian CYPs (Fig. 2) and within the clan 2-specific tree (Fig. 4), with strong bootstrap support separating the major groups in both cases. Within clan 2, the CYP3343-CYP3348 families in N. vectensis and A. digitifera formed a sister clade to the CYP1 family of enzymes, while all of the H. vulgaris and A. aurita CYPs in this clan, as well as the CYP3094, CYP3349 and CYP3355 families from N. vectensis and A. digitifera, formed a large clade with the CYP17 and CYP21 vertebrate families, with varying degrees of relatedness. The phylogenetic tree of the clan 3 and clan 4 CYPs verified that the five uncategorized A. aurita CYPs were basal to those clans (Fig. 5). The cnidarian clan 3 CYPs formed a clade of exclusively cnidarian


CYP        Length  AA   I-helix            AA   K-helix  AA   Meander coil  AA   Heme loop
                        [AG]G-x-[DE]T[TS]       E-x-x-R       FDPER              F-x-x-G-x-R-x-C-x-G
CYP4KY1    525     332  EGHDTT             390  ESLR     443  FIPER         464  FSAGPRNCVG
CYP20A1    481     –    –                  348  ETLR     400  FDPER         423  F-AGKRKCPG
CYP375E1   440     248  AGVDTT             305  ETLR     357  FKPER         380  FGFGVRMCLG
CYP375E2   440     248  AGVDTT             305  ETLR     357  FKPER         380  FGFGVRMCLG
CYP3336A1  502     310  AGHHTS             367  EAMR     419  YNPDR         441  FARGPRMCIG
CYP3336B1  504     309  AGHETT             366  ESMR     418  FNPDR         442  FSRGPRMCIG
CYP3336B2  332     143  AGSTSL             196  ESMR     248  FNPDR         270  FSRGPRMCIG
CYP3336B3  323     134  AGSTSL             187  ESMR     239  FNPDR         261  FSRGPRMCIG
CYP3337A1  474     315  AGFETT             372  ESLR     424  FDPER         447  FSNGPRNCIG
CYP3338A1  521     330  AAVDTT             387  ESMR     439  FKPKR         462  FGYGVRMCLG
CYP3341A1  501     309  AGYDTS             366  ETLR     418  FKPER         439  FGAGPRNCIG
CYP3341A2  501     309  AGYDTS             366  ETLR     418  FKPER         439  FGAGPRNCIG
CYP3341A3  502     309  AGYEST             366  ETLR     418  FKPER         439  FGMGPRNCIG
CYP3341A4  228     123  AGYDTS             182  ETLR     228  FPDPE         255  FGMGPRNCIG
CYP3341A5  225     34   AGYDTS             93   ETLR     139  FPDPE         166  FGMGPRNCIG
CYP3341A6  227     34   AGYEST             91   ETLR     143  FKPER         164  FGMGPRNCIG
CYP3341A7  226     34   AGYDTS             91   ETLR     137  FPDPE         164  FGMGPRNCIG
CYP3341A8  226     34   VGYDTS             91   ETLR     143  FKPER         164  FGMGPRNCIG
CYP3350A1  499     302  AGSDTT             358  EVLR     411  FQPER         433  FSAGKRICLG
CYP3350B1  498     305  TGTETT             360  EVLR     413  FAPER         435  FSVGKRVCLG
CYP3350B2  220     27   TGTETT             82   EVLR     135  FAPER         157  FSVGKRVCLG
CYP3351A1  316     122  AGVDTT             179  ETLR     231  FMPER         254  FSIGRRRCPG
CYP3351A2  469     275  AGVDTT             332  ETLR     384  FMPER         407  FSIGRRRCPG
CYP3352E1  499     302  AGSETT             359  ETLR     412  FIPER         435  FSAGRRVCLG
CYP3352E2  499     302  AGSETT             359  ETLR     412  FIPER         435  FSAGRRVCLG
CYP3352F1  495     302  AGAETT             358  ETLR     411  FRPER         434  FGAGRRVCVG
CYP3352F2  495     302  GGAETT             358  ETLR     411  FRPER         434  FGAGRRVCLG
CYP3352F3  487     291  AGSETT             347  EIFR     399  FKPER         422  FGAGRRGCLG
CYP3352F4  503     302  SGAETT             358  ETLR     411  FKPER         434  FGAGRRVCIG
CYP3352F5  328     127  SGAETT             183  ETLR     236  FKPER         259  FGAGRRVCIG
CYP3552G1  506     300  AGSETT             357  EVLR     410  FNPER         433  FSAGRRVCLA
CYP3352H1  498     303  AAIDTT             360  ETLR     413  FKPER         436  FSAGKRVCFG
CYP3352H2  322     127  AAIETT             184  ETAR     237  FQPER         260  FSAGKRVCFG
CYP3352H3  498     303  AAIETT             360  ETAR     413  FQPER         436  FSAGKRVCFG
CYP3352H4  498     303  AATETT             360  ESAR     413  FQPER         436  FSAGKRVCFG
CYP3352H5  322     127  AATETT             184  ESAR     237  FQPER         260  FSAGKRVCFG
CYP3352H6  498     303  AAIETT             360  ETAR     413  FQPER         436  FSAGKRVCFG

Table 3. The occurrence and position of conserved cytochrome P450 motifs in the A. aurita CYPome. ‘Length’ indicates the total length of the translated protein sequence, and ‘AA’ is the position of the first amino acid residue for each motif. Bolded residues indicate residues that were conserved.
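Residue conservation within a motif, such as the invariant glutamic acid and arginine of the K-helix, can be checked column by column. The sketch below uses the eight distinct K-helix strings that appear in Table 3; the helper names `column_residues` and `invariant_positions` are assumptions made for illustration, and the approach only works for motifs of equal length with no insertions or deletions.

```python
# Distinct K-helix motif strings listed for A. aurita in Table 3.
K_HELIX_MOTIFS = ["ESLR", "ETLR", "EAMR", "ESMR", "EVLR", "EIFR", "ETAR", "ESAR"]


def column_residues(motifs: list[str]) -> list[set[str]]:
    """Set of residues observed at each motif position."""
    if len({len(m) for m in motifs}) != 1:
        raise ValueError("all motifs must have the same length")
    return [set(column) for column in zip(*motifs)]


def invariant_positions(motifs: list[str]) -> list[int]:
    """1-based positions where every motif has the same residue."""
    return [i for i, residues in enumerate(column_residues(motifs), start=1)
            if len(residues) == 1]


for position, residues in enumerate(column_residues(K_HELIX_MOTIFS), start=1):
    print(position, "".join(sorted(residues)))

print("invariant positions:", invariant_positions(K_HELIX_MOTIFS))
```

Positions 1 and 4 come out invariant (E and R), matching the E-x-x-R consensus, while the two middle positions vary.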

genes that were most closely related to CYP5 genes. Similarly, all of the cnidarian CYP4 genes formed a distinct group within the clan 4 clade most closely related to CYP4V genes, and each of the genes belonged to new CYP4 subfamilies. Each of the phylogenetic trees included annotated CYPs from vertebrate species. Inclusion of invertebrate CYPs from D. melanogaster, C. elegans, and C. teleta interfered with tree construction, often resulting in low bootstrap support (< 50%). This has been reported in other studies using both vertebrate and invertebrate sequences16,18. Even with the vertebrate sequences, the deeper branches that separate the clans in Fig. 2 had some bootstraps under 50% (not shown on the tree), likely due to the evolutionary distance that separates the species between the various phyla. However, the relationship between the clans on this tree remains consistent with trends that are commonly seen in maximum-likelihood CYP analysis12,13,21 and bootstrap support for the individual clans remained high. The tree was rooted with the CYP51 family because CYP51 is the only CYP family found in all domains of life (including bacteria and viruses), and thus is thought to be the first eukaryotic CYP44,10; CYP51 genes are commonly used to root trees in CYP phylogenetic analyses19. The same root was chosen for the clan 2 phylogeny and the mitochondrial clan CYP11 family was used to root the clan 3 and 4 phylogeny. Subtree roots were chosen from clans that typically form basal clades to the main clans being analyzed in those subtrees12,13.


[Figure 2 image: phylogenetic tree of cnidarian and vertebrate CYP sequences; scale bar 0.5.]

Figure 2. Maximum-likelihood phylogenetic tree of cnidarian cytochrome P450 enzymes. The tree was generated with rapid bootstrapping and a gamma distribution with the JTT substitution matrix. Each CYP clan was assigned a different colour on the tree branches: clan 2 is purple, clan 3 is light green, clan 4 is blue, clan 7 is brown, clan 19 is red, clan 20 is dark green, clan 26 is orange, clan 46 is salmon, clan 51 is yellow, and the mitochondrial clan is turquoise. The five A. aurita CYPs that do not fit into established metazoan clans are light grey. The cnidarian CYPs are highlighted in dark grey on the perimeter of the tree. The coloured asterisks indicate which cnidarian each CYP is from: H. vulgaris is brown, A. digitifera is black, A. aurita is yellow, N. vectensis is turquoise. The tree is rooted with the CYP51 family. Bootstrap values for each CYP clan, major CYP family divisions within each clan and major cnidarian clades are indicated on the tree. Bootstrap support for the placement of the branches that separate the individual clans was low (less than 50). Species acronyms are based on the genus and species: Drer, zebrafish; Ggal, chicken; Hsap, human; Mmus, mouse; Rnor, rat; Xlae, frog; Trub, fugu pufferfish; Dnig, green pufferfish; Hvul, brown hydra; Adig, stony coral; Aaur, moon jellyfish; Nvec, starlet sea anemone.


Figure 3. The distribution of cytochrome P450 genes among the metazoan CYP clans in four cnidarian species. (A) The proportion of each species' CYPome that falls within the major metazoan clans (clan 2, clan 3, clan 4 and the mitochondrial clan). The number of CYP genes included in the analysis is indicated under each chart and includes all of the complete and partial CYPs from each species. Clan 16, clan 20 and clan 46 CYPs, as well as five A. aurita CYPs that did not fall discretely within any of the established metazoan CYP clans, were designated as 'Other'. (B) The presence or absence of each of the twelve metazoan CYP clans within each cnidarian species. Each column represents a different clan, and each row represents a different species. A filled black dot means that the clan was identified in the complete or partial genes, a white dot with a black outline means that the clan was identified in the fragmentary genes, and a white dot with a grey outline means that the clan was not found in the data. We have included CYP16 in Clan 16 because this clan designation has recently been determined by the nomenclature committee and other phylogenetic analyses support that CYP16 forms a separate clan61. In our phylogenetic analyses, the CYP16 genes clustered with Clan 26 genes (Fig. 2).

Discussion

Cnidarian CYPomes. The genomic analysis of H. vulgaris, N. vectensis, A. digitifera and A. aurita demonstrated that cnidarian CYPomes are highly diverse, with 24 novel CYP families and 12 novel subfamilies involving more than 85% of the predicted cnidarian CYP genes. This means that a majority of cytochrome P450 genes in cnidarians are less than 40% similar to all other documented genes in this superfamily, a testament to more than 500 million years of natural selection and diversification26. There are other cnidarian gene families which have proven to be distinct from their metazoan counterparts. For example, innexin transmembrane proteins identified in H. vulgaris were only 25% identical to similar genes in protostomes39. Similarly, while there are many identified precursors of neuropeptide signaling genes in N. vectensis (such as RPamides, RWamides, and VRamides), none of them are confirmed orthologs of bilaterian neuropeptides40. However, there is not always a significant difference between cnidarian sequences and other studied metazoans. The Wnt signaling protein family is largely conserved between cnidarians and vertebrates, as most major subfamilies are present in both


[Figure 4 image: phylogenetic tree of clan 2 CYP sequences; scale bar 0.4.]

Figure 4. Maximum-likelihood phylogenetic tree of cnidarian clan 2 cytochrome P450 enzymes. The tree was generated with rapid bootstrapping and a gamma distribution with the JTT substitution matrix. The CYP1 family and a sister cnidarian clade are salmon. The yellow branch identifies a large group of cnidarian CYPs and the vertebrate CYP17 and CYP21 families; this branch is further divided based on evolutionary distance to the vertebrate CYPs (blue is almost all CYPs that clustered directly with the vertebrate sequences, while the red and green groups both clustered outside of the vertebrate sequences). The cnidarian CYP names are highlighted in dark grey on the tree. The coloured asterisks indicate which cnidarian each CYP is from: H. vulgaris is brown, A. digitifera is black, A. aurita is yellow, N. vectensis is turquoise. The tree is rooted with the CYP51 family. Bootstrap values for major CYP family divisions and major cnidarian clades are indicated on the tree. Species acronyms are based on the genus and species: Drer, zebrafish; Ggal, chicken; Hsap, human; Mmus, mouse; Rnor, rat; Xlae, frog; Hvul, brown hydra; Adig, stony coral; Aaur, moon jellyfish; Nvec, starlet sea anemone.

phyla41,42. Additionally, an examination of the SOX transcription factor gene family in cnidarians identified clear bilaterian orthologs along with expected expression patterns43. This makes it clear that certain gene families arose in an early metazoan ancestor and remained largely unchanged in cnidarians, while others such as the cytochrome P450 superfamily arose early in the history of metazoans but were subject to significant change in this phylum. Using sequence similarity to known genes does have its challenges, particularly when there is bias in the taxonomic sampling across the tree of life. Many of the known CYP genes are from eukaryotes and, of the metazoan sequences, vertebrate and insect sequences dominate (see the Cytochrome P450 Homepage; https://drnelson.uthsc.edu/CytochromeP450.html). Thus, it will be highly interesting to reassess the unique cnidarian genes and determine which are the result of duplications within the cnidarian lineage and which have non-vertebrate orthologs. Much better taxonomic sampling, both within and near the cnidarians, would be helpful to address this question. There is clear diversity in the cytochrome P450 complement across the cnidarians as a phylum. Figure 3 demonstrates that, when considering gene number, proportion of CYPs belonging to particular clans, and the presence/absence of different CYP clans, none of the four species considered were exactly the same. Differences between the N. vectensis and H. vulgaris genomes have been well documented, since they were the first two cnidarian species to be fully sequenced, both in terms of general features such as genome size and in terms of specific genes such as the 301 family epitheliopeptides and the SWT domain of receptors and secreted proteins44. Publication of the A. digitifera genome additionally identified marked differences in the repertoire of TIR-domain-containing proteins between A. digitifera, N. vectensis, and H. vulgaris24.
The four cnidarian species analyzed here differ in total CYP number and in the families and clans of CYPs that are present in their genomes. There are certain families that were exclusive to a particular species (Table S3); based on the phylogenetic history of the phylum25,30 (Fig. 1B) it is possible to infer when these families arose relative to the speciation events that formed the different classes and species. Table S3 identifies the CYP families by species and indicates likely gene gain or loss by species or class. In general, N. vectensis has a much larger CYP gene complement than the other cnidarians and that is largely based on gene gains, even in CYP families shared with A. digitifera (e.g. 17 CYP3094 genes versus 5 in A. digitifera; 7 CYP3342 genes versus 1 in A. digitifera). Gene expansions are common among CYP families in other species, including those as diverse as mouse, sea urchin,


[Figure 5 image: phylogenetic tree of clan 3 and clan 4 CYP sequences; scale bar 0.3.]

Figure 5. Maximum-likelihood phylogenetic tree of cnidarian clan 3 and clan 4 cytochrome P450 enzymes. The tree was generated with rapid bootstrapping and a gamma distribution with the JTT substitution matrix. The Clan 3 CYPs are light green, the Clan 4 CYPs are dark blue, and the CYPs that do not fall within a clan are grey. The cnidarian CYP names are highlighted in dark grey on the tree. The coloured asterisks indicate which cnidarian each CYP is from: H. vulgaris is brown, A. digitifera is black, A. aurita is yellow, N. vectensis is turquoise. The tree is rooted with the CYP11 family. Bootstrap values for each CYP clan, major CYP family divisions within each clan and major cnidarian clades are indicated on the tree. Species acronyms are based on the genus and species: Drer, zebrafish; Hsap, human; Mmus, mouse; Rnor, rat; Xlae, frog; Hvul, brown hydra; Adig, stony coral; Aaur, moon jellyfish; Nvec, starlet sea anemone.

and mosquito. These expansions are often in CYP families involved in responding to the chemical environment, although in many cases little is known about the enzyme function. Gene expansions have been previously identified in anthozoans22; it is not clear what novel functions the CYP gene expansions afford N. vectensis. Fourteen of the new families (CYP377, CYP443, CYP3094, CYP3339, CYP3340, CYP3342-3349 and CYP3355) were exclusive to the anthozoan class, meaning these likely arose after the anthozoans diverged from the remainder of the cnidarians. Of these families, CYP3094, CYP3339, CYP3342, CYP3343, CYP3348 and CYP3355 are found in both A. digitifera and N. vectensis, indicating that they likely arose after the anthozoan split from the rest of the cnidarians, but before the split between the two species within this class. CYP3344 and CYP3346 are exclusive to A. digitifera while CYP377, CYP3340, CYP3345, CYP3347, and CYP3349 are exclusive to N. vectensis, which means these families likely arose separately after the split of the lineages including these species. CYP443 was exclusive to N. vectensis in this gene set, but CYP443 genes have been found in A. digitifera, A. palmata, A. millepora, Aiptasia pallida and Anemonia viridis (D. Nelson, unpublished data), suggesting this gene family is not exclusive to N. vectensis. CYP443 genes are in Clan 74 and are found in only a few species. Five CYP families (CYP3341, CYP3336-3338 and CYP3351) are exclusive to the scyphozoan species and just two (CYP3353 and CYP3354) are exclusive to the hydrozoan species. The CYP3352 family is present in both H. vulgaris and A. aurita, which means it likely arose after the anthozoan split, but before the split between hydrozoans and scyphozoans. The presence of multiple species-exclusive CYP families is likely the result of exposure to different selective pressures and environments over time.
Even the two anthozoan species, which are presumed to be the most similar to each other based on phylogenetics, diverged roughly 520–490 million years ago24. That means there was a significant period of time (roughly 500 million years) when these cnidarian lineages were exposed to different environments, had access to different resources, and were differentially exposed to other factors that would influence CYP evolution.
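The inference pattern used in this paragraph (a family found in a single species, in a single class, or across classes) can be sketched as a simple set comparison. The presence table below is a small subset transcribed from the text for illustration only; the species codes follow the figure captions, and the `scope` function and its labels are hypothetical helpers, not part of the original analysis.

```python
from __future__ import annotations

# Cnidarian class of each species, as used in the text.
CLASS_OF = {
    "Nvec": "Anthozoa",
    "Adig": "Anthozoa",
    "Aaur": "Scyphozoa",
    "Hvul": "Hydrozoa",
}

# Illustrative subset of family presence taken from the paragraph above.
PRESENCE = {
    "CYP3094": {"Nvec", "Adig"},
    "CYP3342": {"Nvec", "Adig"},
    "CYP3344": {"Adig"},
    "CYP3345": {"Nvec"},
    "CYP3351": {"Aaur"},
    "CYP3352": {"Hvul", "Aaur"},
    "CYP3353": {"Hvul"},
}


def scope(species: set[str]) -> str:
    """Classify a family by how widely it is distributed."""
    classes = {CLASS_OF[s] for s in species}
    if len(species) == 1:
        return "species-exclusive: " + next(iter(species))
    if len(classes) == 1:
        return "class-exclusive: " + next(iter(classes))
    return "shared across classes: " + ", ".join(sorted(classes))


for family, species in PRESENCE.items():
    print(family, "->", scope(species))
```

A family labelled class-exclusive is a candidate for having arisen after that class diverged but before its species split, whereas a species-exclusive family is a candidate for a later, lineage-specific origin. Presence in unsampled species (as noted for CYP443) would change these labels.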

Cnidarian CYP Clans. The CYP clan framework was implemented as a means to describe the recurring deep branching clusters observed in metazoan CYP phylogenies9. The clans provide a way to describe and characterize the complexity of CYP evolution between species. Multiple phylogenetic analyses have established certain relationships between the metazoan CYP clans; for example, clan 3 and clan 4 consistently form sister clades in metazoan phylogenies17,19,21. These same patterns are observed in the trees generated for this study, although the bootstrap support is not high for this grouping in this phylogeny (Fig. 2). The presence and absence of each clan has been examined in multiple phyla, establishing expectations for which clans are present in different types of species. Almost all metazoans, excluding certain early-branching phyla such as ctenophores, have CYPs from clans 2, 3, 4, and the mitochondrial clan19. When fragmentary genes are considered, this is true for all of the


cnidarian species analyzed (Fig. 3B). Large clan losses have been documented in certain phyla as well, particularly in nematodes and arthropods19. Our analysis indicates that clans 7 and 51 were lost in the cnidarian phylum as a whole; CYP19 is not present in any cnidarian genome (Fig. 3B). Examining the phylogenetic relationships between the cnidarian CYPs and clans provides conservative hypotheses about what types of CYPs are present in these species, based on the evolutionary distance between the sequences. Clan 2 genes were especially abundant in the cnidarian CYPomes and are discussed in detail, as is the particularly interesting loss of clan 51.

Clan 2. In general, the clan 2 genes are the most prevalent CYPs in all four cnidarians (approximately 50% of the N. vectensis and A. aurita CYPomes and approximately 75% of the A. digitifera and H. vulgaris CYPomes), which is also the case in humans7, zebrafish12, the tunicate Ciona intestinalis14, the marine annelid Capitella teleta18, and the echinoderm Strongylocentrotus purpuratus15. The clan 2 phylogenetic tree (Fig. 4) elaborates on the relationships seen in the larger phylogeny (Fig. 2), with high bootstrap support. None of the cnidarian genes clustered with the CYP2 family, which is prominent in vertebrates and includes enzymes that catalyze arachidonate and drug metabolism. In contrast, CYP families that cluster closely with CYP1s (sometimes referred to as "CYP1-like") have frequently been found in other species, such as tunicates14, annelids18, and the sea urchin15. In cnidarians, the CYP3343-CYP3348 families from N. vectensis and A. digitifera formed a large clade with the CYP1 family, making these cnidarian genes strong candidates for xenobiotic metabolism. The CYP1 family is critical for the metabolism of planar, halogenated compounds such as polycyclic aromatic hydrocarbons in vertebrates6,7 and there is evidence that cnidarians (N. vectensis in particular) respond to PAHs and weathered oil through increased expression of certain CYPs as well as antioxidant enzymes such as superoxide dismutases and catalases45,46 (Berger and Tarrant, personal communication). A large group of cnidarian clan 2 CYPs form a clade with the CYP17 and CYP21 families, which are directly linked to steroidogenesis in vertebrates. The human CYP17A1 enzyme, also known as steroid 17α-monooxygenase/17,20-lyase, oxidizes pregnenolone or progesterone, which is one of the first steps in androgen or corticoid synthesis following cholesterol side-chain cleavage47.
The CYP21A1 gene product, steroid 21-hydroxylase, catalyzes the hydroxylation of progesterone and 17α-hydroxyprogesterone, one of the intermediate steps in the synthesis of the stress hormone cortisol48. The CYP3094 and CYP3349 families in N. vectensis and A. digitifera form a clade immediately basal to the CYP17/21 genes, while the CYP3355 family was clearly outside of this clade (Fig. 4). There are a total of sixty-seven cnidarian enzymes in this large group of CYPs (25 N. vectensis, 6 A. digitifera, 17 H. vulgaris and 19 A. aurita), with particularly large expansions of the CYP3352 family in H. vulgaris and A. aurita and the CYP3094 family in N. vectensis and A. digitifera. Most species typically have a fairly low copy number of steroidogenic CYPs (for example, humans have only one CYP17A1 gene7 and zebrafish have only two12), which means it is unlikely that all of these genes are steroidogenic. Similarly, the presence of multiple clan 46 genes in N. vectensis is surprising. The CYP46 family, or cholesterol 24-hydroxylase, modifies cholesterol so that it can pass through the blood–brain barrier in humans and be metabolized in the liver49. In vertebrates these genes are usually present in low copy number, with only one in humans7 and two in zebrafish12. CYP46 copy number is likewise low in the scallop Chlamys farreri (2)50, the polychaete C. teleta (1)18, and the ctenophore Mnemiopsis leidyi19 but was much higher in four species of Brachionus rotifers (7–8 per species)51. There are seven CYP46 genes in N. vectensis. Although the N. vectensis nervous system lacks a brain-like centre or core, the molecular function of these CYPs is presumed to be similar to their vertebrate homologs based on high sequence similarity52. The capacity to synthesize estrogen de novo by cnidarians has been demonstrated in the anthozoans Scleractinia53 and Montipora54.
In vertebrates, select CYP families (CYP11, CYP17, CYP19, CYP21, CYP51) and hydroxysteroid-dehydrogenase enzymes play important roles in steroidogenesis, including estrogen production55. However, there were few cnidarian CYPs that qualify as candidate homologs of vertebrate steroid synthesis genes. No cnidarian CYP gene clustered with the CYP19 or CYP51 families and the cnidarian CYPs in the mitochondrial clan clearly do not cluster with the CYP11 family (Fig. 2). Thus, if CYPs play a major role in de novo estrogen production in cnidarians, it is likely that the CYP family (or families) involved are phylogenetically distinct from the vertebrate genes, and pinpointing the exact genes will be challenging. The best candidates in our results are the CYPs in phylogenetic proximity to the steroidogenic CYP17 and CYP21 vertebrate families (Fig. 2), such as CYP3094K1, CYP3094K2, and CYP3094A1 from the anthozoans N. vectensis and A. digitifera.

Clan 51. Clan 51, which consists entirely of the CYP51 family, was not present in any of the cnidarian genomes analyzed in this study, even when potential pseudogenes or gene fragments were considered. The lack of CYP51 genes is interesting as this family encodes lanosterol 14α-demethylase, which catalyzes an essential step in the biosynthesis of cholesterol56. This family has been found in molluscs, annelids, and sponges in addition to vertebrates, but is absent in insects, crustaceans, and nematodes19, and some tunicates57. The absence of this family from the cnidarian species studied here implies that these species lack de novo cholesterol synthesis and would need to obtain this important signaling molecule from their environment and diet. Studies on the evolution of the cholesterol biosynthesis genes found losses in Cnidaria58. This is in line with research that indicated cholesterol as the most prominent sterol (50–63%) in lipid samples of cnidarians as a result of their generally carnivorous diet59. The data regarding de novo cholesterol biosynthesis in Cnidaria vary57; for example, some, but not all, anthozoans tested in one study exhibited de novo sterol synthesis60. Thus obligate dietary cholesterol may be a character that varies within Cnidaria, and indeed even within Anthozoa.

Hydra vulgaris. The CYPome of H. vulgaris is of special interest due to several unique differences from the CYPomes of other species. There were 24 complete CYP genes predicted in H. vulgaris and only 11 fragments that may or may not be functional. This is smaller than most reported metazoan CYPomes; to date, the sponge


Amphimedon queenslandica with 35 genes19 and the tomato russet mite with 23 genes61 have the smallest CYPomes. The results for H. vulgaris are a large departure from the 69 predicted CYPs in N. vectensis, which is comparable to most vertebrate CYPomes (usually around 50–100 genes)19 and other species such as the crustacean Daphnia pulex (75 CYPs)17 and the sea urchin Strongylocentrotus purpuratus (120 CYPs)15. While there were only 24 CYPs predicted in A. digitifera, there were 34 fragments, which means it is possible that this coral CYPome is considerably larger. Certain CYP families (CYP375, CYP3350) are present in the anthozoan and scyphozoan cnidarians, but absent in the hydrozoan H. vulgaris. The phylogenetic history of the Cnidaria (Fig. 1B) suggests that any gene families present in anthozoans and scyphozoans arose in the ancestral cnidarian and would be present in hydrozoans as well unless lost. Likewise, clan 3 genes were identified in all of the cnidarian species except H. vulgaris, even though clan 3 CYPs are believed to have been present in very early animals, before the origin of cnidarians19. As indicated in Fig. 3B, a single gene fragment in H. vulgaris had high sequence similarity to CYP3 genes from other species, but it was missing the critical CYP heme loop motif (Table S2) and may represent a pseudogene. Regardless, it is clear that clan 3 genes have undergone a significant reduction in H. vulgaris when compared to the other cnidarians (Fig. 3A). A similar reduction in gene count is observed in the mitochondrial CYP clan in H. vulgaris (Fig. 3B). All of this evidence suggests that extensive loss of CYP genes and clans has occurred in H. vulgaris, more so than in the other cnidarian species analyzed, in agreement with observations by other researchers that considerable gene loss has occurred in the Hydra genome as a whole. For example, while N. vectensis has roughly 140 homeobox genes from nearly 60 gene families, H. vulgaris has only about 50 homeobox genes from 30 gene families (no such reduction has been observed in A. digitifera)62. The genomic origin of the Hox gene cluster is thought to be a close neighbour to the original cytochrome P450 locus that expanded to form the large CYP superfamily, which provides further support for the smaller Hydra CYPome19. More than half of the H. vulgaris genome is composed of repetitive elements (including transposable elements) and estimates of protein-coding gene number are significantly lower in this species23. There is no clear explanation for this reduction in genome size and loss of multiple gene families in H. vulgaris, although it is correlated with the lack of a larval stage in the H. vulgaris life cycle. Given the reduced H. vulgaris CYPome, determining the function of the novel H. vulgaris CYPs is of particular interest.

Annotation and phylogenetic analysis of cytochrome P450 genes. Precise gene annotation and phylogenetic analysis can be difficult when studying species that are distantly related to those with well-studied and defined CYPomes. The CYPs used to search the cnidarian genomes were well-curated genes from humans, D. rerio, D. melanogaster, C. elegans, and C. teleta. As Tables 1, 2 and 3 indicate, all four significant P450 motifs are generally well conserved in the cnidarian CYPs. The I-helix motif ([AG]G-x-[DE]T[TS]) has been shown to play a role in proton delivery for the reactions that CYPs catalyze, while the K-helix (E-x-x-R) and meander coil (FDPER) are thought to stabilize the core structure of the protein through salt bridge interactions63. The heme loop (F-x-x-G-x-R-x-C-x-G) is arguably the most important of the four because it facilitates association with the heme cofactor in the active site64. Other than a few known exceptions (for example, the CYP20 family is missing the I-helix motif and contains a deletion in the heme loop (K. Pankov, personal observation), both observed in our cnidarian data), the large majority of metazoan cytochrome P450 enzymes maintain these four motifs, and their relative positions in CYP proteins are conserved. Of the predicted cnidarian CYPs, there were only a few predicted amino acid substitutions in the I-helix or meander coil and perfect matches to the K-helix and heme loop motifs in all cases, suggesting accurate prediction of gene models in these cnidarian genomes. However, multiple partial genes and gene fragments were identified in each annotated genome, which could be the result of evolutionary processes or artifacts of genome sequencing and assembly (although 90% of the H. vulgaris and A. digitifera genomes are estimated to be present in the current assemblies)23,24.
By our definition, partial sequences had sufficient length to identify all conserved CYP motifs, a strong match to the P450 HMM, and EST coverage when applicable, increasing our confidence that the partial genes encode functional CYP genes but that their sequences include genome assembly gaps. For example, certain partial CYPs on Contig36189 for H. vulgaris and Contig NW_015441134.1 for A. digitifera have sequencing gaps directly upstream of the predicted genes. The predicted fragmentary genes are more problematic; additional research will be needed to determine whether they reflect real pseudogenes or genome sequencing artifacts. However, pseudogenes are common in the CYP superfamily, since much of the diversity of cytochrome P450s arose from multiple tandem gene duplications65, and there is clear evidence for tandem CYP gene duplication in all of the cnidarian genomes (Table S1). There were 13 potential pseudogenes in the annelid C. teleta CYPome18 and 17 in the vertebrate Fugu CYPome13, comparable to the fragmentary genes found in the cnidarian genomes: 11 gene fragments in H. vulgaris, 34 in A. digitifera, 31 in A. aurita, and 14 in N. vectensis.
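As a rough illustration of the motif checks discussed in this section, the four conserved patterns can be written as regular expressions and searched in a protein sequence. This is a minimal Python sketch, not the ScanProsite procedure used in the study: the names `CYP_MOTIFS` and `find_cyp_motifs` and the toy sequences are hypothetical, and the patterns are taken literally from the text, so real CYPs with the substitutions noted for the I-helix or meander coil would not match.

```python
import re

# PROSITE-style motifs quoted in the text, rewritten as regular expressions.
# "x" (any residue) becomes ".", bracketed alternatives are kept unchanged.
CYP_MOTIFS = {
    "I-helix": r"[AG]G.[DE]T[TS]",
    "K-helix": r"E..R",
    "meander coil": r"FDPER",
    "heme loop": r"F..G.R.C.G",
}


def find_cyp_motifs(protein: str) -> dict:
    """Return the 0-based start of the first match for each motif, or None."""
    seq = protein.upper()
    hits = {}
    for name, pattern in CYP_MOTIFS.items():
        match = re.search(pattern, seq)
        hits[name] = match.start() if match else None
    return hits


# Toy sequences (hypothetical, not real CYPs): one with all four motifs,
# one truncated before the heme loop.
toy_full = "MKL" + "AGTDTT" + "LLL" + "ETLR" + "PPP" + "FDPERF" + "SS" + "FGAGPRNCIG" + "K"
toy_truncated = "MKL" + "AGTDTT" + "LLL" + "ETLR" + "PPP" + "FDPERF"

print(find_cyp_motifs(toy_full))
print(find_cyp_motifs(toy_truncated))
```

A pattern as short as E-x-x-R will match by chance in many unrelated proteins, so a check like this is only informative alongside the HMM match and the expected order of the motifs along the sequence.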

Conclusion. Gene annotation has made it clear that there is significant diversity between cnidarian CYPomes, even when those species belong to the same class. Over 500 million years of evolutionary history has expanded cnidarian CYPomes in some clans (especially clan 2) and resulted in extensive loss in others (clan 3 and mitochondrial clan CYPs). Certain cnidarians have a relatively large and diverse set of CYPs (the starlet sea anemone and the moon jellyfish), while others have a smaller, less diverse CYP complement (the stony coral and the brown hydra). Overall, cnidarian CYPs are found in nine of the twelve metazoan CYP clans; clan 16 is the most recent61. Yet, our analysis identified 24 novel cnidarian CYP families. This presents exciting opportunities for discovery of new functional capabilities of the cytochrome P450 superfamily in the context of metazoan and cnidarian evolution. This study presents several cnidarian candidates for genes with detoxification roles (CYP3343-3348 in the anemone and coral) and many more from all four cnidarians that could be involved in steroidogenesis66,67. All of the predicted functional cytochrome P450s in H. vulgaris, A. digitifera, N. vectensis, and A. aurita have been identified with confidence. Functional assays and in silico methods such as structural modeling and substrate docking may provide additional clues to the role of these cnidarian enzymes.

Methods
Genome-wide annotation of cytochrome P450 genes. The first versions of the genome assemblies for each species (H. vulgaris, A. digitifera, A. aurita) were searched for cytochrome P450 genes. The Basic Local Alignment Search Tool (v2.2.31)68 was used to identify local alignments between the cnidarian genomes and a query that consisted of all annotated CYPs in humans and zebrafish (vertebrate) and select CYPs in D. melanogaster (), C. elegans (nematode), and C. teleta (annelid) in a TBLASTN search (protein query against a translated genome nucleotide sequence). Though previous efforts had been successful with just vertebrate queries18, a large variety of sequences from across metazoans were included to maximize CYP gene identification. Only those BLAST high-scoring pairs with expectation values of 1.0 × 10⁻¹¹ or smaller were considered significant. The JBrowse genome viewer (v1.12.1)69 was used to manually annotate the significant regions of each genome from the BLAST results, identifying start (ATG) and stop (TGA/TAA/TAG) codons, plus introns, exons, and splice site signals (GT/AG) at intron–exon boundaries. Expressed sequence tag (EST) data sets publicly available for both H. vulgaris and A. aurita were aligned to the respective genomes with BLAT (BLAST-like alignment tool)70 as an aid to identification of genes and intron–exon boundaries, and for confirmation of gene expression. The H. vulgaris EST data set consisted of approximately 18,000 individual reads23 while the A. aurita EST data set contained only 77 reads37.
Potential CYPs in each cnidarian species were identified, considered full length at ~500 amino acid residues, and were matched to the well-curated cytochrome P450 HMM in the Pfam protein family database71 to confirm identity. The ScanProsite tool72 was used to verify the presence of four largely conserved CYP motifs: the I-helix, K-helix, meander coil and heme loop. Pfam and Prosite scans included the 82 N. vectensis CYPs previously identified but not named21. Each putative CYP was classified as complete (proper length with start and stop codon, all motifs present, and match to the HMM) or partial (presence of at least the entire ~120 amino acid region that contains all motifs, but clearly less than full length). Any potential CYP that was missing at least one of the Prosite motifs was considered a gene fragment. The resulting complete cnidarian CYPs were used as queries for another BLAST search of each species' genome to ensure that all paralogs were identified.
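The significance threshold and the complete/partial/fragment rules described in this section can be sketched as a small decision function. This is a minimal Python sketch under stated assumptions, not the authors' pipeline (annotation in the study was manual): the `Candidate` record, its field names, and the function names are hypothetical, and the handling of a candidate that has all motifs but fails one of the other "complete" criteria is an assumption, since the text does not spell out every combination.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1.0e-11  # BLAST significance threshold stated in the Methods
MOTIFS = frozenset({"I-helix", "K-helix", "meander coil", "heme loop"})


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one putative CYP gene model."""
    name: str
    full_length: bool          # about 500 residues, as judged during annotation
    has_start_and_stop: bool   # start and stop codons both annotated
    motifs_found: frozenset    # subset of MOTIFS detected in the sequence
    matches_hmm: bool          # hit to the Pfam cytochrome P450 HMM


def is_significant(e_value: float) -> bool:
    """Keep only high-scoring pairs at or below the stated cutoff."""
    return e_value <= E_VALUE_CUTOFF


def classify(candidate: Candidate) -> str:
    """Apply the complete / partial / fragment definitions from the text."""
    if not MOTIFS <= candidate.motifs_found:
        # Missing at least one motif: treated as a gene fragment.
        return "fragment"
    if candidate.full_length and candidate.has_start_and_stop and candidate.matches_hmm:
        return "complete"
    # Assumption: anything with all motifs that is not complete counts as partial.
    return "partial"


example = Candidate("hypothetical_CYP", False, False, MOTIFS, True)
print(classify(example))
```

Keeping the rules in one function makes it easy to see which criterion moved a gene model from one category to another when an assembly is updated.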

Phylogenetic analysis of cnidarian cytochrome P450 sequences. All of the complete and partial CYPs were included in phylogenetic analyses, plus the previously predicted CYP protein sequences for N. vectensis21, with all predicted CYP proteins assigned a formal name by the CYP Nomenclature Committee. Clustal Omega (v1.2.4)73 was used to generate a global multiple sequence alignment of all H. vulgaris, A. digitifera, N. vectensis and A. aurita sequences plus a variety of vertebrate CYPs, including all major families from humans and Danio rerio, and select families from Mus musculus, Xenopus laevis, Rattus norvegicus, Gallus gallus, Takifugu rubripes, and Dichotomyctere nigroviridis (over 300 sequences in total). Mesquite (v3.10)74 was used to remove poorly aligned regions of uncertain homology, especially at the termini of the protein sequences where significant variation is typically observed. The final trimmed alignment was used as input for the Randomized Axelerated Maximum Likelihood program (RAxML v8.2.9)75, with the rapid bootstrap algorithm (-x), a gamma distribution for among-site rate variation, the JTT substitution matrix with empirical amino acid frequencies, and 100 bootstrap replicates for assessment of phylogenetic confidence. The final maximum likelihood phylogenetic tree was visualized with FigTree (v1.4.3)76 and rooted using the CYP51 family of enzymes. Clan-specific multiple sequence alignments and phylogenetic trees were generated using the same process, restricted to include only sequences that clustered with a particular clan in the global phylogeny. The clan 2 tree was rooted with CYP51, while the clan 3 and 4 trees were rooted with the CYP11 family.

Data availability
No new sequence data were generated; previously published accessions are included in the manuscript.

Received: 29 September 2020; Accepted: 15 April 2021

References
1. Danielson, P. B. The cytochrome P450 superfamily: Biochemistry, evolution and drug metabolism in humans. Curr. Drug Metab. 3, 561–597 (2002).
2. Hill, H. A. O., Roder, A. & Williams, R. J. P. Cytochrome P-450: Suggestions as to the structure and mechanism of action. Nature 57, 69–72 (1970).
3. Zhang, L. H., Rodriguez, H., Ohno, S. & Miller, W. L. Serine phosphorylation of human P450c17 increases 17,20-lyase activity: Implications for adrenarche and the polycystic ovary syndrome. Proc. Natl. Acad. Sci. USA 92, 10619–10623 (1995).
4. Lepesheva, G. I. & Waterman, M. R. Sterol 14α-Demethylase Cytochrome P450 (CYP51), a P450 in all biological kingdoms. Biochim. Biophys. Acta 1770, 467–477 (2007).
5. Arnold, C., Konkel, A., Fischer, R. & Schunck, W. H. Cytochrome P450-dependent metabolism of omega-6 and omega-3 long-chain polyunsaturated fatty acids. Pharmacol. Rep. 62, 536–547 (2010).
6. Lewis, D. F. Human cytochromes P450 associated with the phase 1 metabolism of drugs and other xenobiotics: A compilation of substrates and inhibitors of the CYP1, CYP2 and CYP3 families. Curr. Med. Chem. 10, 1955–1972 (2003).
7. Lewis, D. F. 57 varieties: The human cytochromes P450. Pharmacogenomics 5, 305–318 (2004).
8. Nelson, D. R. et al. P450 superfamily: Update on new sequences, gene mapping, accession numbers and nomenclature. Pharmacogenetics 6, 1–42 (1996).
9. Nelson, D. R. Metazoan cytochrome P450 evolution. Comp. Biochem. Physiol. C Pharmacol. Toxicol. Endocrinol. 121, 15–22 (1998).


10. Nelson, D. R. Cytochrome P450 and the individuality of species. Arch. Biochem. Biophys. 369, 1–10 (1999).
11. Nelson, D. R. Progress in tracing the evolutionary paths of cytochrome P450. Biochim. Biophys. Acta 1814, 14–18 (2011).
12. Goldstone, J. V. et al. Identification and developmental expression of the full complement of cytochrome P450 genes in zebrafish. BMC Genomics 11, 643. https://doi.org/10.1186/1471-2164-11-643 (2010).
13. Nelson, D. R. Comparison of P450s from human and fugu: 420 million years of vertebrate P450 evolution. Arch. Biochem. Biophys. 409, 18–24 (2003).
14. Goldstone, J. V. et al. Cytochrome P450 1 genes in early deuterostomes (tunicates and sea urchins) and vertebrates (chicken and frog): Origin and diversification of the CYP1 gene family. Mol. Biol. Evol. 24, 2619–2631 (2007).
15. Goldstone, J. V. et al. The chemical defensome: Environmental sensing and response genes in the Strongylocentrotus purpuratus genome. Dev. Biol. 300, 366–384 (2006).
16. Tijet, N., Helvig, C. & Feyereisen, R. The cytochrome P450 gene superfamily in Drosophila melanogaster: Annotation, intron-exon organization and phylogeny. Gene 262, 189–198 (2001).
17. Baldwin, W. S., Marko, P. B. & Nelson, D. R. The cytochrome P450 (CYP) gene superfamily in Daphnia pulex. BMC Genomics 10, 169. https://doi.org/10.1186/1471-2164-10-169 (2009).
18. Dejong, C. A. & Wilson, J. Y. The cytochrome P450 superfamily complement (CYPome) in the annelid Capitella teleta. PLoS ONE 9, e107728. https://doi.org/10.1371/journal.pone.0107728 (2014).
19. Nelson, D. R., Goldstone, J. V. & Stegeman, J. J. The cytochrome P450 genesis locus: The origin and evolution of animal cytochrome P450s. Philos. Trans. R. Soc. B Biol. Sci. 368, 20120474. https://doi.org/10.1098/rstb.2012.0474 (2013).
20. Guengerich, F. P. & Cheng, Q. Orphans in the human cytochrome P450 superfamily: Approaches to discovering functions and relevance in pharmacology. Pharmacol. Rev. 63, 684–699 (2011).
21. Goldstone, J. V. Environmental sensing and response genes in cnidaria: The chemical defensome in the sea anemone Nematostella vectensis. Cell Biol. Toxicol. 24, 483–502 (2008).
22. Gold, D. A. et al. The genome of the jellyfish Aurelia and the evolution of animal complexity. Nat. Ecol. Evol. 3, 96–104 (2019).
23. Chapman, J. A. et al. The dynamic genome of Hydra. Nature 464, 592–596 (2010).
24. Shinzato, C. et al. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476, 320–323 (2011).
25. Brusca, R. C., Wendy, M. & Shuster, S. M. Invertebrates 3rd edn. (Oxford University Press, 2016).
26. Sullivan, J. C., Reitzel, A. M. & Finnerty, J. R. A high percentage of introns in human genes were present early in animal evolution: Evidence from the basal metazoan Nematostella vectensis. Genome Inform. 17, 219–229 (2006).
27. Dohrmann, M. & Wörheide, G. Dating early animal evolution using phylogenomic data. Sci. Rep. 7, 3599 (2017).
28. Gold, D. A. Life in changing fluids: A critical appraisal of swimming animals before the . Integr. Comp. Biol. 58, 677–687 (2018).
29. Park, E. et al. Estimation of divergence times in cnidarian evolution based on mitochondrial protein-coding genes and the fossil record. Mol. Phylogenet. Evol. 62, 329–345 (2012).
30. Kayal, E. et al. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol. Biol. 18, 68 (2018).
31. Leclère, L. et al. The genome of the jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle. Nat. Ecol. Evol. 3, 801–810 (2019).
32. Martínez, D. E. et al. Phylogeny and biogeography of Hydra (Cnidaria: Hydridae) using mitochondrial and nuclear DNA sequences. Mol. Phylogenet. Evol. 57, 403–410 (2010).
33. Itô, T. A new fresh-water polyp, Hydra magnipapillata, n. sp. from Japan. Sci. Rep. 18, 6–10 (1947).
34. Pallas, P. S. Elenchus Zoophytorum. Apud Petrum van Cleef, Hagae-Comitum (1766).
35. Hinrichs, S., Patten, N. L., Feng, M., Strickland, D. & Waite, A. M. Which environmental factors predict seasonal variation in the coral health of Acropora digitifera and Acropora spicifera at Ningaloo Reef?. PLoS ONE 8, e60830. https://doi.org/10.1371/journal.pone.0060830 (2013).
36. Genikhovich, G. & Technau, U. The starlet sea anemone Nematostella vectensis: An anthozoan model organism for studies in comparative genomics and functional evolutionary developmental biology. Cold Spring Harb. Protoc. 2009, pdb.emo129. https://doi.org/10.1101/pdb.emo129 (2009).
37. Brekhman, V., Malik, A., Haas, B., Sher, N. & Lotan, T. Transcriptome profiling of the dynamic life cycle of the scyphozoan jellyfish Aurelia aurita. BMC Genomics 16, 74. https://doi.org/10.1186/s12864-015-1320-z (2015).
38. Lemaire, B. et al. Cytochrome P450 20A1 in zebrafish: Cloning, regulation and potential involvement in hyperactivity disorders. Toxicol. Appl. Pharmacol. 296, 73–84 (2016).
39. Takaku, Y. et al. Innexin gap junctions in nerve cells coordinate spontaneous contractile behavior in hydra polyps. Sci. Rep. 4, 3573. https://doi.org/10.1038/srep03573 (2014).
40. Elphick, M. R., Mirabeau, O. & Larhammar, D. Evolution of neuropeptide signaling systems. J. Exp. Biol. 221, 151092. https://doi.org/10.1242/jeb.151092 (2018).
41. Kusserow, A. et al. Unexpected complexity of the Wnt gene family in a sea anemone. Nature 433, 156–160 (2005).
42. Lengfeld, T. et al. Multiple Wnts are involved in Hydra organizer formation and regeneration. Dev. Biol. 330, 186–199 (2009).
43. Magie, C. R., Pang, K. & Martindale, M. Q. Genomic inventory and expression of SOX and FOX genes in the cnidarian Nematostella vectensis. Dev. Genes Evol. 215, 618–630 (2005).
44. Steele, R. E., David, C. N. & Technau, U. A genomic view of 500 million years of cnidarian evolution. Trends Genet. 27, 7–13 (2011).
45. Tarrant, A. M., Reitzel, A. M., Kwok, C. K. & Jenny, M. J. Activation of the cnidarian oxidative stress response by ultraviolet radiation, polycyclic aromatic hydrocarbons and crude oil. J. Exp. Biol. 217, 1444–1453 (2014).
46. Tarrant, A. M., Payton, S. L., Reitzel, A. M., Porter, D. R. & Jenny, M. J. Ultraviolet radiation significantly enhances the molecular response to dispersant and sweet crude oil exposure in Nematostella vectensis. Mar. Environ. Res. 134, 96–108 (2018).
47. Petrunak, E. M., DeVore, N. M., Porubsky, P. R. & Scott, E. E. Structures of human steroidogenic cytochrome P450 17A1 with substrates. J. Biol. Chem. 289, 32952–32964 (2014).
48. Pallan, P. S. et al. Human cytochrome P450 21A2, the major steroid 21-hydroxylase: Structure of the enzyme·progesterone substrate complex and rate-limiting C-H bond cleavage. J. Biol. Chem. 290, 13128–13143 (2015).
49. Nebert, D. W., Wikvall, K. & Miller, W. L. Human cytochromes P450 in health and disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 368, 20120431. https://doi.org/10.1098/rstb.2012.0431 (2013).
50. Guo, H. et al. Identification of cytochrome P450 (CYP) genes in Zhikong scallop (Chlamys farreri). J. Ocean Univ. China 12, 97–102 (2013).
51. Han, J., Park, J. C., Hagiwara, A., Park, H. G. & Lee, J. S. Identification of the full 26 cytochrome P450 (CYP) genes and analysis of their expression in response to benzo[a]pyrene in the marine rotifer Brachionus rotundiformis. Comp. Biochem. Physiol. D 29, 185–192 (2019).
52. Marlow, H. Q., Srivastava, M., Matus, D. Q., Rokhsar, D. & Martindale, M. Q. Anatomy and development of the nervous system of Nematostella vectensis, an anthozoan cnidarian. Dev. Neurobiol. 69, 235–254 (2009).
53. Twan, W. H., Hwang, J. S. & Chang, C. F. Sex steroids in scleractinian coral, Euphyllia ancora: Implication in mass spawning. Biol. Reprod. 68, 2255–2260 (2003).


54. Tarrant, A. M., Atkinson, M. J. & Atkinson, S. Estrone and estradiol-17 beta concentration in tissue of the scleractinian coral, Montipora verrucosa. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 122, 85–92 (1999).
55. Goldstone, J. V. et al. Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution. Mol. Phylogenet. Evol. 94(Pt B), 676–687 (2016).
56. Lamb, D. C., Kelly, D. E. & Kelly, S. L. Molecular diversity of sterol 14α-demethylase substrates in plants, fungi and humans. FEBS Lett. 425, 263–265 (1998).
57. Morrison, A. M. S. et al. Identification, modeling and ligand affinity of early deuterostome CYP51s and functional characterization of recombinant zebrafish sterol 14α-demethylase. Biochim. Biophys. Acta 1840, 1825–1836 (2014).
58. Zhang, T. et al. Evolution of the cholesterol biosynthesis pathway in animals. Mol. Biol. Evol. 36, 2548–2556 (2019).
59. Nelson, M. M., Phleger, C. F., Mooney, B. D. & Nichols, P. D. Lipids of gelatinous Antarctic zooplankton: Cnidaria and ctenophora. Lipids 35, 551–559 (2000).
60. Kanazawa, A., Teshima, S. & Tomita, S. Sterol biosynthesis in some coelenterates and echinoderms. Bull. Jpn. Soc. Sci. Fish 40, 1257–1260 (1974).
61. Dermauw, W., Van Leeuwen, T. & Feyereisen, R. Diversity and evolution of the P450 family in arthropods. Insect Biochem. Mol. Biol. 127, 103490 (2020).
62. Technau, U. & Schwaiger, M. Recent advances in genomics and transcriptomics of cnidarians. Mar. Genomics 24, 131–138 (2015).
63. Hasemann, C. A., Kurumbail, R. G., Boddupalli, S. S., Peterson, J. A. & Deisenhofer, J. Structure and function of cytochromes P450: A comparative analysis of three crystal structures. Structure 3(1), 41–62 (1995).
64. Werck-Reichhart, D. & Feyereisen, R. Cytochromes P450: A success story. Genome Biol. 1, 3003.1–3003.9. https://doi.org/10.1186/gb-2000-1-6-reviews3003 (2000).
65. Werck-Reichhart, D., Bak, S. & Paquette, S. Cytochrome P450. In The Arabidopsis Book e0028 (2002).
66. Baker, M. E., Nelson, D. R. & Studer, R. Origin of the response to adrenal and sex steroids: Roles of promiscuity and co-evolution of enzymes and steroid receptors. J. Steroid Biochem. Mol. Biol. 151, 12–24 (2014).
67. Markov, G. V. et al. Independent elaboration of steroid hormone signaling pathways in metazoans. Proc. Natl. Acad. Sci. USA 106, 11913–11918 (2009).
68. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
69. Skinner, M. E., Uzilov, A. V., Stein, L. D., Mungall, C. J. & Holmes, I. H. JBrowse: A next generation genome browser. Genome Res. 19, 1630–1638 (2009).
70. Kent, W. J. BLAT – The BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
71. Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211–D222. https://doi.org/10.1093/nar/gkp985 (2010).
72. Sigrist, C. J. A., Cerutti, L. & de Castro, E. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 38, D161–D166. https://doi.org/10.1093/nar/gkp885 (2010).
73. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539. https://doi.org/10.1038/msb.2011.75 (2011).
74. Maddison, W. P. & Maddison, D. R. Mesquite: A Modular System for Evolutionary Analysis. Version 3.10. http://mesquiteproject.org (2016).
75. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
76. Rambaut, A. FigTree v1.4.3. http://tree.bio.ed.ac.uk/software/figtree/ (2016).

Acknowledgements
This study was supported by grants from the Natural Sciences and Engineering Research Council of Canada in the form of a Discovery Grant (RGPIN-328204-2011 and RGPIN-05767-2016) to J.Y.W. A.G.M. held a Cisco Research Chair in Bioinformatics, supported by Cisco Systems Canada, Inc. K.V.P. was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Undergraduate Student Research Award (USRA). Computer resources were supplied by the McMaster Service Lab and Repository computing cluster, funded in part by grants to A.G.M. from the Canadian Foundation for Innovation (34531 to A.G.M.). The authors would like to thank Amos Raphenya and Brian Alcock (McMaster University) for assistance with computational tasks.

Author contributions
Genome annotation of CYPs in Hydra, Acropora, and Aurelia genomes was completed by K.V.P.; J.V.G. completed the genome annotations in Nematostella. D.A.G. shared the Aurelia genome data for this study. Gene alignments, motif searching, and phylogenetic analyses were completed by K.V.P. A.G.M. and J.Y.W. were co-supervisors of K.V.P. and contributed with K.V.P. to draft manuscript writing and editing; A.G.M. provided bioinformatics support in terms of cluster access. D.R.N. provided the nomenclature assignment of all the genes based on sequence and phylogenies provided by K.V.P. and J.Y.W. Gene nomenclature was verified by J.V.G. and J.Y.W. The project idea and major funding were from J.Y.W. All co-authors contributed to writing and editing the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-88700-y.
Correspondence and requests for materials should be addressed to J.Y.W.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021
