Gene 575 (2016) 478–487

Contents lists available at ScienceDirect

Gene

journal homepage: www.elsevier.com/locate/gene

Research paper In silico identification of BESS-DC genes and expression analysis in the silkworm, Bombyx mori

Zhongchen Rao a, Jun Duan b, Qingyou Xia b,QiliFenga,⁎ a Guangzhou Key Laboratory of Insect Development Regulation and Application Research, School of Life Sciences, South China Normal University, Guangzhou, 510631, China b State Key Laboratory of Silkworm Genome Biology, Southwest University, 400716, China article info abstract

Article history: BESS domain is a binding domain that can interact each other or with other domains. In this study, 323 Received 27 January 2015 BESS domain containing (BESS-DC) were identified in 3328 proteomes. These BESS-DC genes pertain Received in revised form 10 September 2015 to 41 species of five phyla, most of which are arthropod insects. A BESS domain contains two α-helixes linked Accepted 11 September 2015 by a coil or β-turn. Phylogenetic tree and architecture analysis show that the BESS domain seems to generate Available online 16 September 2015 along with the DNA-binding MADF domain. Two hundred thirty three BESS-DC genes (71.1%) contain at least

Keywords: one MADF domain, while 59 genes (18.2%) had only the BESS domain. In addition to BESS and MADF domains, BESS domain some of genes also contain other ligand binding domains, such as DAO, DUS and NAD_C. Nineteen genes Protein–protein interaction (5.8%) are associated with other DNA binding domains, such as Myb and BED. The BESS-DC genes can be divided Bioinformatics into 17 subfamilies, eight of which have more than one clade. In Bombyx mori, 12 BESS-DC genes that do not Expression contain intron in the BESS domain region were localized to eight chromosomes. Real-time PCR results showed Bombyx mori that most of the B. mori BESS-DC genes highly expressed from late larval stage to adult stage. The results of sequence comparison and evolution analyses suggest a hypothesis that the BESS-DC genes may play a role in central nervous system development, long term memory and metamorphosis of insects of different phyla. © 2015 Elsevier B.V. All rights reserved.

1. Introduction BESS domain is a kind of specific protein-binding domain, named after three proteins that originally defined the domain: BEAF32 (Boundary A is a conserved part of a protein sequence and can element associated factor 32) (Zhao et al., 1995), Su(var)3–7(Reuter function independently from the rest of the protein sequence. A domain et al., 1990)andStonewall (Clark and McKearin, 1996). A BESS domain forms a compact three-dimensional structure that usually is stable and usually contains 40 residues and composes of two or three properly folded independently. Many proteins may consist of several α-helices, which might be associated with a /SANT HTH domain structural domains and a domain may appear in a variety of different (Bhaskar and Courey, 2002). A BESS domain in a protein is involved in proteins. Domains can serve as modules for building up large assem- a variety of protein–protein interaction, for example, interaction with blies, such as virus particles or muscle fibers. Domains also can provide Dorsal and TBP-associated factors (Cutler et al., 1998). The proteins specific catalytic or binding sites for substrates, as found in enzymes or that contain BESS domain(s) are usually called BESS domain containing regulatory proteins. The domains that act as a specific binding site can (BESS-DC) proteins. be divided into several classes based on nature of the ligands they BESS domain is firstly found in Drosophila and believed to be specific bind with, such as nucleic acid binding, protein binding or iron binding. in Drosophila (Cutler et al., 1998; England et al., 1992). But in online research and our preliminary study on Bombyx mori, the BESS- Abbreviations: BESS, BEAF32 (Boundary element associated factor 32), Su(var)3–7and DC genes were also found in other species. BESS-DC proteins are Stonewall; BESS-DC, BESS-domain containing; MADF, myb/SANT-like domain in Adf-1; involved in many biology processes, for example, nurse cells and oocyte NLS, nuclear localization signal; NES, nuclear export signal; HP 1, heterochromatin protein specialization (Clark and McKearin, 1996), eye development (Duong 1; qRT-PCR, quantitative real time-polymerase chain reaction; ER, endoplasmic reticulum; et al., 2008), immune response (Ratnaparkhi et al., 2008), lethal hybrid BLAST, The Basic Local Alignment Search Tool. ⁎ Corresponding author. rescue (Maheshwari and Barbash, 2012; Chatterjee et al., 2007; Brideau E-mail address: [email protected] (Q. Feng). et al., 2006), chromosomes boundary (Cuvier et al., 1998; Pathak et al.,

http://dx.doi.org/10.1016/j.gene.2015.09.024 0378-1119/© 2015 Elsevier B.V. All rights reserved. Z. Rao et al. / Gene 575 (2016) 478–487 479

2007) and long-term memory (DeZazzo et al., 2000). In this study, domain sequences (2 genes contain more than one BESS domain) genes that encode BESS-DC proteins were identified by in silico could be grouped into 17 subfamilies, out of which eight subfamilies approach in different organisms with completely sequenced proteomes could be further classified into several clades (Table C1). (UniProt, 2010). Expression patterns of some of these BESS-DC genes in Generally, a homologous gene from the closest species that is not B. mori were analyzed to explore their possible functions. analyzed is selected as the best out-group in phylogenetic tree construc- tion. In our study, however, all the BESS-DC genes are considered as in- 2. Result group genes, out-group does not exist theoretically because they were predicted from the global species with complete proteome. Therefore, 2.1. Identification of BESS-DC proteins in organisms with complete the gene that meets the following two conditions was chosen as proteomes presumed out-group in our analysis: 1) a gene from a species that has a farther evolution distance than any other species; 2) the gene cannot Through comprehensive searches in 3328 complete proteomes be grouped into the other 17 subfamilies. Thus, the gene H3ILF2 from deposited in Uniprot (www.uniprot.org), a total of 323 independent echinodermata was selected as an out-group. BESS-DC proteins encoded by 323 genes were found (Supplement Phylogenetic tree analysis indicated that BESS-1 subfamily formed a Table 1) by using HMMsearch. These BESS-DC genes are derived from binary monophyletic group with BESS-2 subfamily (Fig. 2), both of 41 eukaryota species of 20 families of five different phyla (Fig. 1). The which are from Diptera and Lepidoptera (Table C1). Similarly, BESS-4 BESS-DC genes from the arthropod phylum accounted for 94% of the and BESS-5 subfamilies formed another binary monophyletic group, total genes, most of which are from insecta, including the orders of both of which are from Diptera. The BESS-3 subfamily is also from Diptera, Hymenoptera, Lepidoptera, Coleoptera and Hemiptera. Eighty Diptera and might be a common ancestor for BESS-1/BESS-2 and two percent (252) BESS-DC genes are from the order of Diptera. BESS-4/BESS-5 subfamily groups. The above subfamilies form a ternary Thirteen genes are from the chordata phylum. Two, one and one BESS- tree with BESS-6 and BESS-7 subfamilies. BESS-8, BESS-9 and BESS-10 DC genes are from echinodermata, porifera and nematode, respectively. subfamilies form a ternary tree with a bootstrap value of 62 and these three subfamilies are from Diptera. Most of the members from the 2.2. Phylogenetic and evolutionary analysis of the BESS-DC protein family above 10 subfamilies (BESS-1 to BESS-10, except BESS-6) are found in either Diptera or Lepidoptera. The members of BESS-6 are present in The predicated BESS domain sequences (37 amino acids each) were Coleoptera and Lepidoptera. The members of the subfamily BESS-11 used to construct phylogenetic tree. The results show that 325 BESS were found in Diptera, Lepidoptera, Coleoptera and Hymenoptera. The

Fig. 1. Distribution of BESS-DC genes. BESS domain containing genes distribute to five phyla of eukaryota, seven different classes, 14 orders and 41 species. Most (94%) of the BESS-DC genes belong to arthropoda. 480 Z. Rao et al. / Gene 575 (2016) 478–487

Fig. 2. Phylogenetic tree and architecture analysis of the subfamilies of BESS domains in different species. (A) Phylogenetic relationship of the BESS domain was reconstructed by using the maximum likelihood method. Only subfamilies and subclades are shown. Branch length of the tree is related to distances between subfamilies or subclade. The branch length of subfamily or subclade are represented by the branch length of its members, which have the nearest distant to other subfamilies or subclade. (B) Domain arrangement in the subfamilies of 323 BESS- DC genes. common ancestor of BESS-1 to BESS-11 forms a binary monophyletic and 10th sites of the BESS domain are phenylalanine and serine, respec- group with BESS-13 subfamily, which is derived from Hymenoptera, tively, with the homology more than 85%. Another 8 conserved sites suggesting that BESS-13 subfamily might appear earlier than any also share more than 50% homology. The remaining 11 conserved sites other subfamilies of BESS-1 to BESS-11. BESS-12 subfamily consisted have 2–3 variations of amino acid residues. The total homology was of species of 10 orders in different phyla and could be divided into 6 more than 75% (Fig. 3A). clades with lower bootstrap values. The common ancestor of the Based on the similarity analysis of secondary structure by Sopma above subfamilies forms a binary monophyletic group with BESS-14, (Geourjon and Deleage, 1995),Jpred3 (Cole et al., 2008; Cuff et al., which was found only in Lepidoptera. BESS-15 and BESS-16 subfamilies 1998), CFSSP (Cuff et al., 1998), HNN (Zhao et al., 2004) and PHD were found in Hymenoptera and Diptera, respectively, sharing a com- (Cleard et al., 1997; Cattanach et al., 1972), 307 BESS domains were mon ancestor with the above 15 subfamilies and they are considered predicted to contain two α-helices with 12–14 amino acids long. The as the earliest subfamilies in the BESS family. Thus, when the H3ILF2 first α-helix is mainly located between the 3rd or 5th residues and the was considered as an out-group, the BESS-1 and BESS-4 subfamilies 16th or 17th residue. The start site of the second α-helix was variable were the most distant from the oldest ancestor of the BESS family, and it ended at the 33rd to 35th residue. Twelve genes from the BESS- implying that the BESS-1 and BESS-4 might be the latest subfamilies 9 subfamily were predicted to have only one α-helix; four genes from in the BESS families; on contrary, the BESS-15, BESS-16 and BESS-17 the BESS-17 subfamily were predicted to have three α-helices. For subfamilies were the most close to the oldest ancestor and they may other genes, the BESS domain contains two α-helices, and linked by be the original clades in the BESS families. coil(s) or β-turn(s).

2.3. Sequence comparison of BESS domain and secondary structure analysis 2.4. Architectural structure of BESS-DC proteins

Alignment of the predicated 323 BESS domain sequences revealed Full sequence analysis revealed that in addition to the BESS domain, both the obvious conservation and divergence. Totally 21 conserved BESS-DC proteins may contain DNA-binding domain(s) or other ligand amino acids were found in the BESS domain of 37 residues. The 7th binding domain(s) (Fig. 2B). BESS domains can form 12 architectural Z. Rao et al. / Gene 575 (2016) 478–487 481

The results of phylogenetic and architectural analysis suggest that the BESS domain might generate along with the MADF domain, and then during the evolution process, some subfamilies might lose the MADF domain, as seen in BESS-5 and BESS-16 in Diptera, while some subfamilies developed other DNA-binding domains, as seen in BESS- 10, BESS-11 and BESS-12 in Diptera. Some subfamilies, such as BESS-1 in Lepidoptera, might duplicate the MADF-BESS structure.

2.5. Analysis of nuclear localization signal (NLS) and nuclear export signal (NES)

Structure analysis reveal that many BESS-DC proteins are transcrip- tion factors functioning in the cell nucleus and some transcription factors play a role between cytoplasm and nucleus (Zhao et al., 2004). NLS is an amino acid sequence that targets a protein into the nucleus from cytoplasm, while NES is a short amino acid sequence that leads a protein into cytoplasm from nucleus through the nuclear pore complex. There were 254 BESS-DC proteins found to have NLS, 138 (42.5%) of which have a single NLS in either monopartile type or bipartile type. All members of the BESS-16a subfamily [su(var)3–7 family] contained more than one NLS. Proteins of the su(var)3–7 family are heterochromatin-associated proteins in the nuclei of the larval salivary glands of Drosophila and involved in the genomic silencing of position-effect variegation (Cleard et al., 1997). However, 70 proteins mainly from BESS-8, BESS-12b and BESS-16c subfamilies do not have Fig. 3. Alignment of different domains of 17 subfamilies BESS-DC genes. Alignment of 17 any NLS. The functions of these proteins are not known, but some of subfamilies of BESS-DC genes. Conserved amino acids are the ones that are conserved in them contained KKXX-like motif in the C-terminus or XXRR-like motif all of the 17 subfamilies. The letters underlined indicate the conserved amino acid residues with similarity more than 50%. in the N-terminus, which are endoplasmic reticulum membrane signals, implying that they may function in cytoplasm. Thirty two proteins mainly belonging to BESS-1, BESS-3 and BESS-7 subfamilies contained both NLS and NES (Complementary data 1), indicating that these structures with other domains. Two hundred thirty three BESS proteins proteins may function by shuttling between nucleus and cytoplasm. contain a MADF domain at the N-terminal or middle region. MADF and BESS domains can form eight different types of architec- 2.6. Properties of BESS-DC proteins tures. BESS-1 to BESS-7 subfamilies appeared to originate from the same ancestor and most of them contain the typical MADF–BESS structure. Length of amino acid sequences of BESS-DC proteins varied from 56 MADF–MADF–BESS organization was also found in BESS-6 and BESS-7 amino acids (E2ACY9) to 1984 amino acids (H9K8X1). Most (210) of and the distance between two MADF domains was 60 amino acids, the BESS-DC proteins (65% of total BESS-DC proteins) have 200 to 400 suggesting that MADF domain probably had been duplicated in these amino acid residues. Twenty eight proteins mainly in BESS-16c have subfamilies. Other repeat structures were found in two proteins of less than 200 amino acids, whereas 56 proteins in BESS-1c, BESS-2, BESS-1 subfamily in B. mori. H9JS66 has a structure of MADF–BESS– BESS-11 and BESS-13 have more than 400 amino acids. Most members MADF–BESS and H9J8Q4 has a structure of MADF–BESS–MADF– of the BESS-1a and BESS-16a subfamilies have more than 600 amino MADF–BESS, indicating that duplication of the MADF and BESS domains acids. occurred in these genes in B. mori. Some proteins contained other Predicted isoelectric point (pI) values of the identified BESS-DC domain(s) in the downstream of the MADF–BESS structure, such as proteins in this study were distributed in two ranges of 5 to 7 and 9 to cullin (ubiquitin protein ligase binding domain), Dus (flavin adenine 10. The predicted pI values of all the members of BESS-2, BESS-4, denucleotide binding domain) and Dao (FAD dependent oxidoreduc- BESS-6, BESS-8, BESS-9, BESS-13, BESS-15 and BESS-16 subfamilies are tase activity domain). The distance between MADF and BESS domains acidic, while the predicted pI values of members of BESS-1, BESS-7 was about 80–140 amino acids, depending on the subfamilies they and BESS-8 subfamilies are basic. belong to. The distance between BESS and other domain was random, Grand average of hydropathicity (GRAVY, G-value) is often used for varying from 30 to 500 amino acids. hydrophobic analysis of proteins. The hydrophobic analysis shows that Zf-BED, Myb-DNA-binding-5 and Thap domains were found to all the BESS-DC proteins appeared to be hydrophilic. The members substitute the MADF domain at the upstream of the BESS domain in of the BESS-11 subfamily are the most hydrophilic proteins, with some of the proteins of the BESS-10, BESS-11 and BESS-12 subfamilies E9IP78 from Solenopsis invicta being the strongest hydrophilic protein and the average distances of these domains apart from the BESS domain (G-value −1.44). In average, BESS-5 subfamily show lower hydrophilia were 190, 170 and 140 amino acids, respectively. NAD_Gly3P_dh_N do- than other subfamilies. B4JRC0 from Drosophila grimshawi has the low- main (NAD-dependent glycerol-3-phosphate dehydrogenase N- est hydrophilia among all of the BESS-DC proteins. terminus) and NAD_Gly3P_dh_C domain (NAD-dependent glycerol-3- phosphate dehydrogenase C-terminus) were found to locate at the 2.7. BESS-DC proteins in B. mori downstream of the BESS domain in the BESS-16 subfamily and the distance apart from the BESS domain were about 30 and 20 amino Twelve BESS-DC proteins were identified in silkworm proteomes by acid residues, respectively. HMM searching (E-value b 10E-5). Alignment of the 12 B. mori BESS-DC Fifty nine proteins contained BESS domain only and they distributed is shown in Fig. 4A. Secondary structure analysis revealed that B. mori to the BESS-5, BESS-9 and BESS-16 subfamilies. Because these proteins BESS-DC proteins consist of two α-helixes linking by random coil. The are full-length proteins and have only BESS domain, it appears that two conserved amino acids, phenylalanine (F) and serine (S), are locat- their functions can be achieved by the BESS domain alone. ed at the front part of the BESS domain and separated by leucine (L) and 482 Z. Rao et al. / Gene 575 (2016) 478–487

Fig. 4. Alignment of 14 BmBESS-DC domains and structure of BmBESS-DC genes in B. moir. (A) Designation of α helix 1 and α helix 2 is based on analysis by Jpred3 and Sopma. The sub- family names are listed on the right. The highly conserved amino acid residues that share 75% similarity are indicated with black background and two sites with 100% similarity are indi- cated with arrows on the top. The amino acid residues with gray background are conserved amino acids that share 50% similarity. (B) Organization, transcription direction, introns and exons of the BmBESS-DC genes are shown. The light gray bands represent the exon; the black and dark gray boxes show the position of the MADF and BESS domains, respectively. one random amino acid. Three other conserved amino acids with more locate in eight chromosomes. Chromosome 27 contains three genes, than 75% similarity among the sequences are two aspartic acid (D)sites 9JV08, H9JV62 and H9JV63 and the latter two genes cluster together, at α-helix 1 and a lysine (K) site at α-helix 2. Eight B. mori BESS-DC implying these two genes may share similar function or collaborate proteins have the MADF-BESS architectural structure. Two proteins each other to function. Chromosome 3 has D5MTP4 and H9JHS6 genes have only a BESS domain, while H9JS66 and H9J8Q4 have more than and chromosome 17 contains H9J7Q5 and H9J366 genes. These BESS one BESS domain and have MADF-MADF-BESS-MADF-BESS and genes do not cluster together. Each of chromosomes 4, 9, 12, 16 and MADF-BESS-MADF-BESS architectures, respectively, which are different 21 contains one gene, while no BESS-DC genes were found in other from the BESS DC proteins in the other species. chromosomes. Length of the BmBESS-DC genes is various, ranging from 539 bp 2.8. Chromosome localization and structure of B. mori BESS-DC genes to 8087 bp and the coding regions span from 531 bp to 1398 bp (Table 1). Six genes are transcribed in forward direction and the others Chromosome localization analysis is an important part of functional are in reverse direction (Fig. 4B). Most of BmBESS-DC genes except genomics and the location of genes may be related to the physiological H9J9F0, H9JS66 and H9J8Q4 have one or two introns. All of the BESS function. Chromosome localization of the 12 B. mori BESS-DC (BmBESS- domain regions of the genes do not contain any intron, whereas all of DC) proteins genes is listed in Table 1. All of the BmBESS-DC genes the MADF domain regions have at least one intron (Fig. 4B).

Table 1 Chromosome localization and coding region of BmBESS-DC genes in B. mori. ⁎ No. ID ID from silkdb Protein length Location (chrom.) Genome position Strand Gene length coding region Exons

1 H9J8Q4 BGIBMGA005896 466 12 nscaf2842:1340634..1348721 + 8087 1398 7 2 H9JS66 BGIBMGA012376 464 21 nscaf3041:1062087..1070179 − 8092 1392 6 3 H9J7Q5 BGIBMGA005547 210 17 nscaf2829:2873897..2879887 − 5990 630 3 4 H9J9F0 BGIBMGA006143 179 4 nscaf2847:4116945..4117484 + 539 537 1 5 H9JU56 BGIBMGA013068 238 16 nscaf3058:7344788..7346694 + 1896 714 3 6 H9JV63 BGIBMGA013426 177 27 nscaf3072:2605811..2606542 + 731 531 2 7 H9JV62 BGIBMGA013425 346 27 nscaf3072:2600822..2604881 + 4059 1038 3 8 H9JHS6 BGIBMGA009074 206 3 nscaf2931:719960..721376 − 1416 618 2 9 H9JSG6 BGIBMGA012477 225 9 nscaf3045:644681..646101 − 1420 675 3 10 D5MTP4 BGIBMGA007371 193 3 nscaf2883:584561..586983 − 2422 579 3 11 H9JV08 BGIBMGA013371 236 27 nscaf3071:640677..641881 + 1204 708 2 12 H9J366 BGIBMGA003953 211 17 nscaf2766:60390..63243 − 2853 633 2

⁎ “+” indicates sense strand; “−” indicates antisense strand. Z. Rao et al. / Gene 575 (2016) 478–487 483

2.9. Identification of orthologous families in Drosophila melanogaster and of the tested stages (Fig. 6). H9J7Q5, H9JU56, H9JHS6 and D5MTP4 Aedes aegypti were expressed in all of the tested stages from egg to adult, but the expression levels of the latter three genes were very low (Fig. 6A). Fourteen BESS domains of B. mori BESS-DC proteins (BmBESS), 21 H9J7Q5 belongs to the BESS-12 subfamily and is homologous to Adf- BESS domains of D. melanogaster BESS-DC proteins (DmBESS) and 10 1-like, which plays an important role in dendrite morphogenesis and BESS domains of A. aegypti BESS-DC proteins (AaBESS) were used to affects long-term memory in fly(Parrish et al., 2006; Sokolowski, construct a ML tree (Fig. 5). These insect species are the top three 2001; Burrell and Sahley, 2001). Microarray analysis of gene expression ones with the largest number of BESS genes including all subfamilies in B. mori (GSE17571) showed that H9J7Q5 was highly expressed in of the BESS domains. Sea urchin H3ILF2 was used as an out-group. larval/adult nervous system and adult ovary, and also expressed at BmBESS belong to BESS-1, 2, 3, 6, 9, 11, 12 and 17 subfamilies. Seven moderate levels in heart, mid/hind gut, malpighian tubule and salivary BmBESS formed monophyletic groups with three DmBESS and four gland (Xia et al., 2007). AaBESS with high bootstrap values. These DmBESS and AaBESS H9J8Q4, H9JV62, H9JV63, H9J9F0 and H9JS66 shared similar tempo- are considered orthologous members of BmBESS. Two pairs of ral expression patterns. They were expressed starting from late larval BmBESS BESS domains (two domains from H9JS66 and two domains stage and peaked at wandering stage (Fig. 6B). H9J8Q4 and H9JS66 are from H9JV62 and H9JV63, respectively) shared high similarity and members of the BESS-1c subfamily. Temporal expression data showed formed a monophyletic group, suggesting that H9JV62 probably is a that H9J8Q4 started to express at L5D6 and peaked at L5D7 and W0 paralogous member of H9JV63. D5MTP4 and H9JHS6 could not form stage, then decreased slowly (Fig. 6B). Microarray data (GSE17571) monophyletic groups with other BESS domains from D. melanogaster showed that the gene was highly expressed in Malpighian tubule, and and A. aegypti, suggesting that no orthologues were present in fly and moderate expressed in testis and ovary (Xia et al., 2007), suggesting mosquito. that the H9J8Q4 may mainly function in Malpighian tubule at the late larval stage. H9JS66 was highly expressed from W2 to adult stage with 2.10. Expression profiles of BmBESS-DC genes and functional prediction a peak at prepupal stage. This gene was highly expressed in male testis and female ovary in microarray analysis (GSE17571) and moderately The qRT-PCR results showed that six genes (H9J7Q5, H0JV08, expressed in silk grand (Xia et al., 2007), implying that H9JS66 may H9J8Q5, H9JV62, H9J9F0 and H9JS66) were highly expressed at some play a role in reproductive process and cocoons spinning.

Fig. 5. Phylogenetic tree of the BESS domains in silkworm, fruit fly and mosquito. (A) A maximum likelihood tree of 14 BmBESS domains with 21 DmBESS domains and 10AaBESS domains is shown. Branch lengths of the tree are not proportional to distances, and bootstrap values less than 50 are not shown. The gray boxes indicate the silkworm BESS domain and the black dots represent the junction nodes of the members from the same subfamilies. (B) Part of ML tree of H9JHS6 with its orthologues. (C) Part of ML of D5MTP4 with its orthologues. 484 Z. Rao et al. / Gene 575 (2016) 478–487

Fig. 6. Developmental expression of 12 BESS-DC genes in B. mori. Samples of total RNA was isolated at 20 time points from egg stage to adult stage. BmActin-A2 gene was used as control. (A) Four genes (H9J7Q5, H9JU56, H9JHS6 and D5MTP4) expressed at all testing stage; (B) five genes (H9J8Q4, H9JS66, H9JV62, H9JV63 and H9J9F0) started to highly express at late larval stage; (C) Three genes (H9JSG6, H9J366 and H9JV08) started to highly express at pupa stage.

H0JV62 and H0JV63 are members of the BESS-6 subfamily. Both of NLS have not been recognized. We hypothesize that these 54 BESS-DC them were expressed highly from W2 to P2 stage, and then decreased transcription factors may use other unidentified NLS or other proteins until P6 stage when their expression increased again. Though sharing to help enter nucleus from the cytoplasm. Some of these BESS-DC similar expression patterns, the expression level of H0JV62 was 10 proteins contain NES or ER membrane signal, suggesting that they times higher than H0JV63. may function in the cytoplasm or organelles. Fifty nine proteins, mainly H9JSG6, H9J366 and H9JV08, which are members of the BESS-2, in the BESS-5 and BESS-16 subfamilies, contain only one BESS domain. BESS-9 and BESS-17 subfamilies, respectively, shared similar expression Out of these 59 genes, 43 contain NLS, suggesting that they probably patterns and they were highly expressed starting from pupa stage are capable of functioning in cell nucleus with only one BESS domain. (Fig. 6C). These three genes were highly expressed in the testis or ovary in microarray analysis (GSE17571), indicating that they may 3.2. Function prediction of BESS-DC protein subfamilies function in the reproductive system. All the results of qRT-PCR and microarray analysis showed that most Functions of BESS-DC proteins can be predicted based on the infor- of these BmBESS-DC genes were expressed from late larval to adult mation of annotation, homologous comparison, expression data and stage, implying that they may play important roles in nervous system references. Possible functions of 7 out of 17 subfamilies are discussed development and reproductive system development during metamor- here. The BESS-1 subfamily can be further classified into three clades. phosis in B. mori. Based on the gene annotation information, clade 1a is stonewall subfamily. Transcriptional regulation of stonewall is essential in 3. Discussion cystocytes for maturation into specialized nurse cells and oocyte (Clark and McKearin, 1996). Stonewall mutations block proper oocyte 3.1. Localization of BESS-DC proteins differentiation and frequently cause the presumptive oocyte to develop into a nurse cell. Eventually, the germ cells degenerate by apoptosis. No Sequence analysis showed that 265 BESS-DC proteins (81.7%) were direct functional information is available for clade 1b and 1c, but the transcription factors with different DNA-binding domains, but 54 of expression profiles of these two subfamilies imply that they may be them had no predicted NLS. Usually, a must enter related to nervous system development and adult reproduction. nucleus to regulate target gene expression. A typical NLS is generally The BESS-4 subfamily is represented by the gene Dip3 (Dorsal predicted as monopartite and bipartite sequence, while other types of interacting protein 3). Dip3 plays an important role in eye development, Z. Rao et al. / Gene 575 (2016) 478–487 485 for example, altering fate determination and resulting in formation of companion of HP1 in the genomic silencing of position-effect variega- extra antennae when the gene was mis-expressed in the early eye- tion (Cleard et al., 1997). Losing Su(var)3–7 may impact male X antennal disc (Duong et al., 2008). Dip3 is also involved in dorsoventral polytene chromosome morphology and dosage compensation (Spierer patterning and immune response (Ratnaparkhi et al., 2008). These et al., 2005). Expression data also implies that Su(var)3–7maypartici- functions are consistent with the expression data that showed the pate in reproductive system in both sex. The BESS-16b subfamily is genes were highly expressed in eyes and cuticle. related to Ravus, the phantom of Su(var)3–7. Ravus associates in vivo The BESS-5 subfamily is annotated to Lhr (Lethal hybrid rescue), with polytene chromosomes but, in contrast to Su(var)3–7, it does not which causes lethality of F1 hybrid male in the crosses between show specificity for the chromocenter (Delattre et al., 2002). The D. melanogaster and Drosophilasimulans at late larval or prepupal stages expression profile suggested that ravus may play a role in female due to failure in chromosome condensation during mitosis (Brideau ovary development in late larval stage. et al., 2006; Watanabe, 1979). However a Lhr mutant male of There is no functional information for the members of the remaining D. simulans produces viable hybrid males when the mutant male was 10 subfamilies. The members of BESS-6, BESS-13, BESS-14, BESS-15 and crossed to females of D. melanogaster (Prigent et al., 2009). Many studies BESS-17 subfamilies are highly conserved, all of which belong to four revealed that it is the Lhr gene of D. simulans that induces the lethality of orders of diptera, coleoptera, lepidoptera and hymenoptera. The data the hybrid males by regulating activities of the X chromosome(s) for of BLAST search analysis and gene expression appear to imply that dosage compensation (Chatterjee et al., 2007; Shirata et al., 2013). BESS-6, BESS-14 and BESS-15 may be involved in some biochemical Annotation of the BESS-9 subfamily leads to coop (corepressor of processes in reproductive system and the BESS-13 and BESS-17 subfam- pan), which plays an important role in the Wnt/Wg signaling pathway. ilies may be involved in development of many organs, such as eye, The Wnt/Wg signaling pathway controls various processes, such as central nervous system and especially in reproductive system. patterning, growth, energy and tissue homeostasis and maintenance The members of BESS-3, BESS-7 and BESS-8 subfamilies can be of somatic stem cells (Reya and Clevers, 2005; Gat et al., 1998; divided into several clades, all of which belong to one order of diptera, Hartmann, 2006; Bennett et al., 2003; Christodoulides et al., 2009). The BLAST search analysis and gene expression data imply that BESS-3 Many human diseases including cancer and metabolic disorders may may be related to nerve, wing and reproductive systems. Recent result from the error regulation of the Wnt/Wg pathway (Clevers, research shows that a member of BESS-3 subfamily, hng2,maybe 2006; Logan and Nusse, 2004; Prestwich and Macdougald, 2007). In involved in wing hinge patterning (Shukla et al., 2014). BESS-7 and the Wnt/Wg signaling pathway, Pangolin and Armdillo, as well BESS-8 are related to many organs, such as central nervous system, as their interacting cofactors, mediate transcriptional activation midgut, Malpighian tubules, fat body and especially the reproductive in the present of Wg (Pai et al., 1997; Aberle et al., 1997). Coop system. interacts with Pangolin by BESS domain, forming a complex on the The members of BESS-2 and BESS-11 show less conserved as Pangolin/TCF-binding motif, antagonizing the binding of Armadillo to compared with most of the subfamilies (except for BESS-12 subfamily) Pangolin. Over-expression of coop specifically suppresses Wg target discussed above, all of which belong to four orders of diptera, coleop- genes, whereas loss of coop function causes de-repression of Wg genes tera, lepidoptera and hymenoptera from insect. The BLAST search and (Song et al., 2010). The expression data show that coop genes are expression data show that these two subfamilies are being involved mainly expressed in mesothoracic tergum (Thibault et al., 2004). the functions in many organs, such as eye, crop, midgut, Malpighian The BESS-10 subfamily is annotated to BEAF32 (Boundary element- tubules, fat body, central nervous system and reproductive system. associated factor 32) gene family. Genomes of higher eukaryotes are In summary, the results of this study imply that the BESS-DC genes usually organized into distinct functional domains that are defined by may play an important role in central nervous system and long term boundary elements. These boundaries restrict the action of regulatory memory in many different species from the five phyla. BESS-DC genes elements to the appropriate genomic targets (Gerasimova and Corces, may also play specific function in several physiological systems 1996). Boundaries also act as barriers against the spread of inactive of insecta, such as reproduction and nervous systems, wing hinge heterochromatin to euchromatic regions. Sequences that act as bound- patterning and eye development. aries have been identified in yeast and mammal (Bell et al., 2001; West et al., 2002). Scs and scsʹ are among the first boundary sequences identified (Kellum and Schedl, 1991, 1992). The CGATA motif in scsʹ is 3.3. Number of BESS genes one of the targets that BEAF32 acts on. BEAF32 also binds to numerous other interbands on the polytene chromosomes, some of which have BESS domain is first defined by three genes from Drosophila,butalso been identified and shown to possess boundary activity, such as nuclear found in other organisms. The average numbers of the identified BESS matrix (Cuvier et al., 1998; Pathak et al., 2007). Expression data show genes in Diptera, Lepidoptera and other insects are 17, 10 and 4, respec- that BEAF32 is highly expressed in adult ovaries. tively. Phylogenetic tree analysis indicates that Drosophila BESS-DC The BESS-12 subfamily is represented by Adf-1.Somestudies proteins are associated with many subfamilies, such as BESS-5, which show that Adf-1 is involved in dendrite morphogenesis and memory. is predicted to be involved in a specific process of lethal hybrid rescue Adf-1 mutants show normal early memory but defective long-term in Drosophila. Some subfamilies have been suggested to be functionally memory (DeZazzo et al., 2000). Adf-1 is also involved in larval locomo- related to the metamorphosis. For example, CG3838, hng, coop and Dip3 tory behavior (Parrish et al., 2006; Sokolowski, 2001). The expression from BESS-2, BESS-3, BESS-9 and BESS-4 subfamilies, respectively, data of this study suggests that adf-1 may be related to the nervous appear to be up-regulated by juvenile hormone III treatment in adult fe- formation of female reproductive system. In Drosophila, Dip3 (Dorsal- male of D. melanogaster, indicating that these families may be related to interacting protein 3), a homolog of adf-1 and Stonewall, functions as a the juvenile hormone pathway. The results of functional prediction and and is able to bind DNA in a sequence specificmannerand qRT-PCR in this study also imply that BESS-DC genes mainly function in activate transcription. This gene possesses an N-terminal MADF domain metamorphosis stage, suggesting that Diptera and Lepidoptera may and a C-terminal BESS domain. The MADF domain directs the DNA need more specific BESS genes for gene transcription regulation during binding, while the BESS domain directs a variety of protein–protein metamorphosis. On the contrary, less BESS-DC genes were found in interactions. The MADF and BESS domains are related to the SANT other organisms that do not have such a special development stage in domain, which is found in many transcriptional regulators and their life, like zebrafish in Chordata, contain only 1–2 BESS-DC genes coregulators (Bhaskar and Courey, 2002). mainly in the BESS-12 subfamily, which is related to adf-1 that is in- The BESS-16a is a Su(var)3–7 (suppressor of variegation 3–7) volved in memory function. Arthopoda with specific metamorphosis subfamily. Su(var)3–7 is a heterochromatin-associated protein and seems to contain more BESS-DC genes than other phyla. 486 Z. Rao et al. / Gene 575 (2016) 478–487

4. Material and methods pupa3, pupa4, pupa5, pupa6 and adult, by using RNAiso (TAKARA, Japan) according to the instruction of manufacturer's protocol. The 4.1. Identification of BESS-DC genes in different species concentration and quality of RNA was determined by Nanodrop ND- 100 Spectrophotometer (GE Healthcare Life Sciences, Piscataway, Because the resultant BESS domains from pfam online research USA) and agarose gel analysis. The RNA was then used a template to (http://pfam.xfam.org/) not well recognized the adf family members, synthesize cDNA by reverse transcription (M-MLV RTase cDNA Synthe- for example, dmadf-1, in this study, HMM was reconstructed with sis Kit, TAKARA, Japan). qRT-PCR was performed in 96 well plate and Hmmbuilder from hmmer3.0 (Finn et al., 2011). Forty nine BESS each reaction for the target genes and control (BmActin) was repeated domain seed sequences, out of which 43 were referred to pfam database for three times in one plate. Reaction was performed under the follow- (http://pfam.sanger.ac.uk), was chosen from fly (30 sequences), mos- ing condition: 1 cycle of denaturation at 95 °C for 10 s, 40 cycles of 95 °C quito (11 sequences), bee (5 sequences) and globefish (3 sequences). for 5 s, 60 °C for 31 s. The qRT-PCR analyses were repeated biologically BESS-DC proteins were searched in the proteomes of different species for three times. by Hmmsearch. Supplementary data to this article can be found online at http://dx. All of the proteomes in this study were downloaded from uniprot doi.org/10.1016/j.gene.2015.09.024. (http://www.uniprot.org/) and only those proteomes (3328 in total) of the species whose genomes have been completely sequenced were Acknowledgments used for analysis, including proteomes of 348 eukaryota species, 1653 bacteria, 133 archaea and 1194 viruses. All the proteome data were This study was supported by grants from National Natural Science combined by using python script and Hmmsearch was used to search Foundation of China (31330071) and National Basic Research Program BESS-DC proteins in these 3328 proteomes with the E-value lower of China (2012CB114602). than 10−5 as a threshold.

References 4.2. Phylogenetic analysis Aberle, H., Bauer, A., Stappert, J., Kispert, A., Kemler, R., 1997. Beta-catenin is a target for Phylogenetic analysis was conducted using MEGA5.2. Because BESS the ubiquitin–proteasome pathway. EMBO J. 16, 3797–3804. domain length is short, the distance method like neighbor joining (NJ) Bell, A.C., West, A.G., Felsenfeld, G., 2001. Insulators and boundaries: versatile regulatory elements in the eukaryotic genome. Science 291, 447–450. is not suitable. Therefore, the characteristic method was used in this Bennett, C.N., Hodge, C.L., MacDougald, O.A., Schwartz, J., 2003. Role of Wnt10b and C/ study. All the identified BESS domains were used to construct the EBPalpha in spontaneous adipogenesis of 243 cells. Biochem. Biophys. Res. Commun. – – 302, 12–16. maximum likelihood (ML) characteristic tree. Jones Taylor Thornton – “ Bhaskar, V., Courey, A.J., 2002. The MADF BESS domain factor Dip3 potentiates synergistic (JTT) model was used as the best protein model with Rates and activation by dorsal and twist. Gene 299, 173–184. Patterns” set to Gamma distributed (G) and “Data subset to use” set to Brideau, N.J., Flores, H.A., Wang, J., Maheshwari, S., Wang, X., et al., 2006. Two partial deletion at a value of 95. The other options were set defaults. Dobzhansky–Muller genes interact to cause hybrid lethality in Drosophila. Science 314, 1292–1295. Maximum parsimony (MP) and NJ analysis were also performed as Burrell, B.D., Sahley, C.L., 2001. Learning in simple systems. Curr. Opin. Neurobiol. 11, references. The tree was bootstrapped with 1000 replicates to provide 757–764. statistical information and to guarantee the accuracy of constructed Cattanach, B.M., Williams, C.E., Bailey, H., 1972. Identification of the linkage groups carried by the metacentric chromosomes of the tobacco mouse (Mus poschiavinus). Cytoge- clades. The threshold was set to 50%. netics 11, 412–423. Chatterjee, R.N., Chatterjee, P., Pal, A., Pal-Bhadra, M., 2007. Drosophila simulans Lethal 4.3. Alignment and structure analysis of sequences hybrid rescue mutation (Lhr) rescues inviable hybrids by restoring X chromosomal dosage compensation and causes fluctuating asymmetry of development. J. Genet. 86, 203–215. Multiple sequence alignment was made by using ClustalW in Christodoulides, C., Lagathu, C., Sethi, J.K., Vidal-Puig, A., 2009. Adipogenesis and WNT Mega5.1 and DNAman. Secondary structure was predicted using online signalling. Trends Endocrinol. Metab. 20, 16–24. programs, Jpred3 (http://www.compbio.dundee.ac.UK/www-jpred/) Clark, K.A., McKearin, D.M., 1996. The Drosophila stonewall gene encodes a putative transcription factor essential for germ cell development. Development 122, 937–950. (Cole et al., 2008; Cuff et al., 1998) and Sopma (http://npsa-pbil.ibcp. Cleard, F., Delattre, M., Spierer, P., 1997. SU(VAR)3–7, a Drosophila heterochromatin- fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html)(Geourjon and associated protein and companion of HP1 in the genomic silencing of position- – Deleage, 1995), CFSSP (http://www.biogem.org/tool/chou-fasman/) effect variegation. EMBO J. 16, 5280 5288. Clevers, H., 2006. Wnt/beta-catenin signaling in development and disease. Cell 127, (Cuff et al., 1998), HNN (https://npsa-prabi.ibcp.fr/cgi-bin/secpred_ 469–480. gor4.pl)(Zhao et al., 2004) and PHD methods (https://npsa-prabi.ibcp. Cole, C., Barber, J.D., Barton, G.J., 2008. The Jpred 3 secondary structure prediction server. fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_phd.html)(Cleard Nucleic Acids Res. 36, W197–W201. Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., Barton, G.J., 1998. JPred: a consensus et al., 1997; Cattanach et al., 1972). secondary structure prediction server. Bioinformatics 14, 892–893. Nuclear localization signal (NLS) and other signal sequences were Cutler, G., Perry, K.M., Tjian, R., 1998. Adf-1 is a nonmodular transcription factor that predicted by using wolfpsort (http://wolfpsort.org/). Nuclear export contains a TAF-binding Myb-like motif. Mol. Cell. Biol. 18, 2252–2261. Cuvier, O., Hart, C.M., Laemmli, U.K., 1998. Identification of a class of chromatin boundary signal (NES) was predicted by using NetNES (http://www.cbs.dtu.dk/ elements. Mol. Cell. Biol. 18, 7478–7486. services/NetNES). Delattre, M., Spierer, A., Hulo, N., Spierer, P., 2002. A new gene in Drosophila melanogaster, The isoelectric point (pI) and molecular mass were predicted by Ravus, the phantom of the modifier of position-effect variegation Su(var)3–7. Int. J. Dev. Biol. 46, 167–171. using Compute pI/Mw online (http://web.expasy.org/compute_pi/). DeZazzo, J., Sandstrom, D., de Belle, S., Velinzon, K., Smith, P., et al., 2000. nalyot, a The aliphatic index and hydropathicity were predicted by using mutation of the Drosophila myb-related Adf1 transcription factor, disrupts synapse protparam online (http://web.expasy.org/protparam/). formation and olfactory memory. Neuron 27, 145–158. Duong, H.A., Wang, C.W., Sun, Y.H., Courey, A.J., 2008. Transformation of eye to antenna by misexpression of a single gene. Mech. Dev. 125, 130–141. 4.4. Expression profile of BESS-DC genes in B. mori England, B.P., Admon, A., Tjian, R., 1992. Cloning of Drosophila transcription factor Adf-1 reveals homology to Myb oncoproteins. Proc. Natl. Acad. Sci. U. S. A. 89, 683–687. Expression profilesoftheBESS-DCgenesinB. mori were analyzed by Finn, R.D., Clements, J., Eddy, S.R., 2011. HMMER web server: interactive sequence similar- ity searching. Nucleic Acids Res. 39, W29–W37. using quantitative real-time PCR (qRT-PCR) method. Total RNA was Gat, U., DasGupta, R., Degenstein, L., Fuchs, E., 1998. De novo hair follicle morphogenesis extracted from the whole insect of the silkworm at 20 developmental and hair tumors in mice expressing a truncated beta-catenin in skin. Cell 95, stages: egg, L1D1 (day 1 from 1st instars larval stage), L2D1, L3D1, 605–614. Geourjon, C., Deleage, G., 1995. SOPMA: significant improvements in protein secondary L4D1, L5D4, L5D5, L5D6, L5D7, W0 (the moment entering wandering structure prediction by consensus prediction from multiple alignments. Comput. stage), W1, W2, prepupa, pupa1 (day 1 from pupa stage), pupa2, Appl. Biosci. 11, 681–684. Z. Rao et al. / Gene 575 (2016) 478–487 487

Gerasimova, T.I., Corces, V.G., 1996. Boundary and insulator elements in chromosomes. Shirata, M., Araye, Q., Maehara, K., Enya, S., Takano-Shimizu, T., et al., 2013. Allelic Curr. Opin. Genet. Dev. 6, 185–192. asymmetry of the Lethal hybrid rescue (Lhr) gene expression in the hybrid between Hartmann, C., 2006. A Wnt canon orchestrating osteoblastogenesis. Trends Cell Biol. 16, Drosophila melanogaster and D. simulans:confirmation by using genetic variations of 151–158. D. melanogaster. Genetica. Kellum, R., Schedl, P., 1991. A position-effect assay for boundaries of higher order Shukla, V., Habib, F., Kulkarni, A., Ratnaparkhi, G.S., 2014. Gene duplication, lineage- chromosomal domains. Cell 64, 941–950. specific expansion, and subfunctionalization in the MADF–BESS family patterns the Kellum, R., Schedl, P., 1992. A group of scs elements function as domain boundaries in an Drosophila wing hinge. Genetics 196, 481–496. enhancer-blocking assay. Mol. Cell. Biol. 12, 2424–2431. Sokolowski, M.B., 2001. Drosophila: genetics meets behaviour. Nat. Rev. Genet. 2, Logan, C.Y., Nusse, R., 2004. The Wnt signaling pathway in development and disease. 879–890. Annu. Rev. Cell Dev. Biol. 20, 781–810. Song, H., Goetze, S., Bischof, J., Spichiger-Haeusermann, C., Kuster, M., et al., 2010. Coop Maheshwari, S., Barbash, D.A., 2012. An indel polymorphism in the hybrid incompatibility functions as a corepressor of Pangolin and antagonizes Wingless signaling. Genes gene lethal hybrid rescue of Drosophila is functionally relevant. Genetics 192, Dev. 24, 881–886. 683–691. Spierer, A., Seum, C., Delattre, M., Spierer, P., 2005. Loss of the modifiers of variegation Pai, L.M., Orsulic, S., Bejsovec, A., Peifer, M., 1997. Negative regulation of Armadillo, a Su(var)3–7 or HP1 impacts male X polytene chromosome morphology and dosage Wingless effector in Drosophila. Development 124, 2255–2266. compensation. J. Cell Sci. 118, 5047–5057. Parrish, J.Z., Kim, M.D., Jan, L.Y., Jan, Y.N., 2006. Genome-wide analyses identify transcrip- Thibault, S.T., Singer, M.A., Miyazaki, W.Y., Milash, B., Dompe, N.A., et al., 2004. A tion factors required for proper morphogenesis of Drosophila sensory neuron complementary transposon tool kit for Drosophila melanogaster using P and piggyBac. dendrites. Genes Dev. 20, 820–835. Nat. Genet. 36, 283–287. Pathak, R.U., Rangaraj, N., Kallappagoudar, S., Mishra, K., Mishra, R.K., 2007. Boundary UniProt, C., 2010. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, element-associated factor 32B connects chromatin domains to the nuclear matrix. D142–D148. Mol. Cell. Biol. 27, 4796–4806. Watanabe, T.K., 1979. A gene that rescues the lethal hybrids between Drosophila Prestwich, T.C., Macdougald, O.A., 2007. Wnt/beta-catenin signaling in adipogenesis and melanogaster and D. simulans.Jpn.J.Genet.54,325–331. metabolism. Curr. Opin. Cell Biol. 19, 612–617. West, A.G., Gaszner, M., Felsenfeld, G., 2002. Insulators: many functions, many Prigent, S.R., Matsubayashi, H., Yamamoto, M.T., 2009. Transgenic Drosophila simulans mechanisms. Genes Dev. 16, 271–288. strains prove the identity of the speciation gene Lethal hybrid rescue. Genes Genet. Xia, Q., Cheng, D., Duan, J., Wang, G., Cheng, T., et al., 2007. Microarray-based gene Syst. 84, 353–360. expression profiles in multiple tissues of the domesticated silkworm, Bombyx mori. Ratnaparkhi, G.S., Duong, H.A., Courey, A.J., 2008. Dorsal interacting protein 3 potentiates Genome Biol. 8, R162. activation by Drosophila Rel homology domain proteins. Dev. Comp. Immunol. 32, Zhao, K., Hart, C.M., Laemmli, U.K., 1995. Visualization of chromosomal domains with 1290–1300. boundary element-associated factor BEAF-32. Cell 81, 879–889. Reuter, G., Giarre, M., Farah, J., Gausz, J., Spierer, A., et al., 1990. Dependence of position- Zhao, X., Gan, L., Pan, H., Kan, D., Majeski, M., et al., 2004. Multiple elements regulate effect variegation in Drosophila on dose of a gene encoding an unusual zinc-finger nuclear/cytoplasmic shuttling of FOXO1: characterization of phosphorylation- and protein. Nature 344, 219–223. 14-3-3-dependent and -independent mechanisms. Biochem. J. 378, 839–849. Reya, T., Clevers, H., 2005. Wnt signalling in stem cells and cancer. Nature 434, 843–850.