g in Geno nin m i ic M s

ta & a P Sohpal et al., J Data Mining Genomics Proteomics 2013, 4:2 D r

f o Journal of

o t

e

l DOI: 10.4172/2153-0602.1000128

o

a

m

n

r

i

c u

s o J ISSN: 2153-0602 Data Mining in Genomics & Proteomics

Research Article Open Access Computational Analysis of Distance and Character based for Capsid of Human Herpes Virus Vipan Kumar Sohpal1*, Apurba Dey2 and Amarpal Singh3 1Department of Chemical & Biotechnology, Beant College of Engineering & Technology, Gurdaspur Punjab, India 2Department of Biotechnology, National Institute of Technology, Durgapur West Bengal, India 3Department of Electronics & Communication Engineering, Beant College of Engineering & Technology, Gurdaspur Punjab, India

Abstract Molecular phylogenetic is a fundamental aspect of evolutionary analysis and depends on distance & character based methods. In this paper, we compare the viral capsid proteins of HHV to analyze the relationship among proteins using substitution models, phylogenetic model with exhaustive search and ME techniques. The effect of Poisson correction with shape parameter on NJ and UPGMA trees also analyze. We show by extensive computer simulation that phylogenetic tree is the reflection of substitution distance. The effect of max-mini branch & bound method and mini- mini heuristic model and log likelihood associated with character based tree also discussed. We applied ML and MP for perfectly analysis of proteins relationship. We conclude that substitution models, shape parameter, search level and SBL have a critical role to reconstruct phylogenetic tree. study shows that χ2 value is higher in closely as compare distant related proteins.

Keywords: Human herpes virus (HHV); Phylogenetic methods (NJ, significant computational challenge. Chen et al. [9] compared with GI, UPGMA, ML and MP); Sum of branch length (SBL); Molecular clock they found that the GII geno-group had four deletions and two special insertions in the VP1 region. Introduction Posada and Buckley [10] focused on most commonly implemented Molecular phylogenetic analyses consist of three stages: (1) model selection approach, the hierarchical likelihood ratio test, is not selection and multiple of homologous the optimal strategy for model selection but the Akaike Information (2) substitution model of amino acid evolution, (3) tree building Criterion (AIC) and Bayesian methods offer to cut edge. Saitou and and tree evaluation. The phylogeny of protein assess through various Nei [11] proposed an alternative for reconstructing phylogenetic software such as; Paup, Phylip and Mega. These contain numbers of trees from evolutionary distance data. The principle of this method proteins as well as nucleotide substitution models for evolutionary is to find pairs of operational taxonomic units which minimize analysis as mentioned in step 2 of molecular phylogenetic. Similarly, the total branch length at each stage of clustering. Sohpal et al. [12] the tree construction and boot strap evaluation methods are also reviewed the software and analyzed that MEGA5, is available in analysis tool. In this paper, we focus on MEGA software user-friendly software for mining online databases, building sequence for Phylogenetic analysis of viral capsid protein of human herpes virus alignments and phylogenetic trees, and using methods of evolutionary (HHV), which causes chicken pox, herpes zoster (VZV), cancers, and bioinformatics in basic biology, biomedicine, and evolution. Kumar encephalitis in the human being. Previous attempts (Literature) have and Filipski [13] compared the traditional approach to phylogeny yielded comparative analysis of phylogenetic methods. reconstruction, in a set of homologous sequences which represent to a character. Sullivan et al. [14] conducted a direct evaluation using Alfaro and Huelsenbeck [1] used a simulation approach to assess simulations and compared the accuracy of phylogenies estimated using the performance of both Bayesian and AIC-based methods and full optimization of all model parameters on each tree evaluated to the compared it. Geddes et al. [2] compared the phylogenetic trees of core accuracy of trees estimated via successive approximations. Tamura erythritol catabolic genes with species phylogeny provides evidence et al. [15] developed MEGA5 which is a collection of Maximum that is consistent with these loci having been horizontally transferred Likelihood (ML) analyses for inferring evolutionary trees, selecting from the alpha-proteobacteria into both the beta and gamma - best fit substitution models inferring ancestral states and sequences proteobacteria. Felsenstein [3] applied the maximum likelihood and estimating evolutionary rates site-by-site. Yang [16] performed techniques to the estimation of evolutionary trees from nucleic acid the simulation with assumptions underlying the maximum parsimony sequence data. Felsenstein [4] developed statistical method bootstrap (MP) method of phylogenetic tree reconstruction intuitively examine. and show significant evidence for a group if it is defining by three or more characters. Kenney and Rosenzweig [5] study indicates that Mbn- like compounds may be more widespread than previously thought bactins. Guindon et al. [6] introduced a new algorithm to search the *Corresponding author: Vipan Kumar Sohpal, Department of Chemical & tree space, and they used parsimony criterion to filter out the least Biotechnology, Beant College of Engineering & Technology, Gurdaspur Punjab, India, E-mail: [email protected] promising topology. Sourdis and Nei [7] studied the relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in Received March 23, 2013; Accepted March 27, 2013; Published April 03, 2013 obtaining the correct tree using computer simulation. Krause et al. [8] Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and worked on metagenomics, which is providing striking insights into the Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128 of microbial communities. They developed massively parallel 454 pyrosequencing technique that gives the opportunity to rapidly Copyright: © 2013 Sohpal VK, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits obtain metagenomic sequences at a low cost and without cloning bias. unrestricted use, distribution, and reproduction in any medium, provided the The phylogenetic analysis of the short reads produced represents a original author and source are credited.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 2 of 8

The results of simulated work appeared to support the intuitive. Yang distance use as a parameter to compare substitution models of proteins. [17] show that failure to account for rate variation can have drastic Table 1 gives the data for distance versus 12 viral capsid proteins of effects, leading to biased dating of speciation events, biased estimation human herpes virus for three models. The analysis shows that distance of the transition/transversion rate ratio, and incorrect reconstruction between Triplex capsid protein HHV2H and HHV2G has minimum of phylogenies. distance in all the substitution models. The distance divergences between these two strains are 0.019, 2.00 and 0.019 for the p-distance, In this paper, we focus on MEGA software for reconstruction of number of amino acid and poisson models respectively. The second an evolutionary tree of viral capsid protein of human herpes virus lowest distance values observe in major capsid protein HHV1 and (HHV). The objective of this paper to analyze the relationship between HHV2H are 0.093, 10.00 and 0.108 for the p-distance, number of amino protein using distance and character based methods. We also focus acid and poisson models. On the other hand, the highest divergence on the poisson correction distance substitution with shape parameter found in portal protein HHV2H and triplex capsid protein HHV2H are to develop the relatedness of proteins using SBL and Log-likelihood 0.935, 100 and 97.89 for the same evolutionary distance models. Table parameter. 1 shows that the results are qualitative consistent irrespective of model. Results Lower divergence (Triplex capsid protein HHV2H and HHV2G) signify the closely related proteins and a substantial increase in the Effect of substitution models estimated distance for the divergent sequences (portal protein HHV2H and triplex capsid protein HHV2H). These substitution models lead to Evolutionary distances are a fundamental tool for the study of the creation of trees with different branch lengths and topologies. The molecular evolution, phylogenetic reconstruction and the estimate effects of substitution models of amino acid on phylogenetic trees for of divergence time. Here, we use the three substitution models 12 capsid proteins detect using the neighbor-joining method, which p-distance, number of differences and poisson model for estimating uses the distance information that is present in table 1. The trees made the evolutionary divergence between the viral capsid of proteins and using figure 1 the p-distance and figure 2 the Poisson correction having

[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [ 1] Triplex_Capsid_Protein_HHV11 [ 2] Triplex_Capsid_Protein_HHV2G 0.421 [ 3] Triplex_Capsid_Protein_HHV2H 0.43 0.019 [ 4] Triplex_Capsid_Protein_HHV2 0.879 0.897 0.888 [ 5] Capsid_Protein-II_HHV11 0.888 0.925 0.925 0.916 [ 6] Capsid_Protein-II_HHV2H 0.888 0.907 0.907 0.897 0.168 [ 7] Capsid_Protein__HHV1 0.822 0.85 0.86 0.888 0.888 0.879 [ 8] Capsid_Protein_HHV2 0.822 0.813 0.822 0.86 0.85 0.841 0.28 [ 9] Portal_Protein_HHV11 0.925 0.925 0.916 0.916 0.935 0.925 0.907 0.925 [10] Portal_Protein_HHV2H 0.925 0.935 0.935 0.916 0.925 0.916 0.907 0.916 0.112 [11] MajorCapsid_Protein_HHV2H 0.897 0.907 0.916 0.897 0.804 0.766 0.907 0.907 0.907 0.916 [12] Major_Capsid_Protein_HHV1 0.907 0.916 0.925 0.888 0.794 0.757 0.897 0.897 0.879 0.888 0.093 [ 1] Triplex_Capsid_Protein_HHV11 [ 2] Triplex_Capsid_Protein_HHV2G 45 [ 3] Triplex_Capsid_Protein_HHV2H 46 2 [ 4] Triplex_Capsid_Protein_HHV2 94 96 95 [ 5] Capsid_Protein-II_HHV11 95 99 99 98 [ 6] Capsid_Protein-II_HHV2H 95 97 97 96 18 [ 7] Capsid_Protein__HHV1 88 91 92 95 95 94 [ 8] Capsid_Protein_HHV2 88 87 88 92 91 90 30 [ 9] Portal_Protein_HHV11 99 99 98 98 100 99 97 99 [10] Portal_Protein_HHV2H 99 100 100 98 99 98 97 98 12 [11] MajorCapsid_Protein_HHV2H 96 97 98 96 86 82 97 97 97 98 [12] Major_Capsid_Protein_HHV1 97 98 99 95 85 81 96 96 94 95 10 [ 1] Triplex_Capsid_Protein_HHV11 [ 2] Triplex_Capsid_Protein_HHV2G 0.965 [ 3] Triplex_Capsid_Protein_HHV2H 1.012 0.019 [ 4] Triplex_Capsid_Protein_HHV2 29.43 40.78 34.41 [ 5] Capsid_Protein-II_HHV11 34.41 75.68 75.68 60.23 [ 6] Capsid_Protein-II_HHV2H 34.41 49.09 49.09 40.78 0.221 [ 7] Capsid_Protein__HHV1 13.91 19.57 22.22 34.41 34.41 29.43 [ 8] Capsid_Protein_HHV2 13.91 12.56 13.91 22.22 19.57 17.36 0.459 [ 9] Portal_Protein_HHV11 75.68 75.68 60.23 60.23 97.98 75.68 49.09 75.68 [10] Portal_Protein_HHV2H 75.68 97.98 97.98 60.23 75.68 60.23 49.09 60.23 0.134 [11] MajorCapsid_Protein_HHV2H 40.78 49.09 60.23 40.78 11.39 7.998 49.09 49.09 49.09 [12] Major_Capsid_Protein_HHV1 49.09 60.23 75.68 34.41 10.37 7.379 40.78 40.78 29.43 0.108 0.108 Table 1: Distance data versus 12 viral capsid proteins of human herpes virus for three substitution models.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 3 of 8 gamma distribution 0.52. Branch lengths are in units of evolutionary tree. The second lowest distance separation group distances used to infer NJ tree. The sum of the branch lengths was 313.93 (major capsid protein HHV1 and HHV2H) has the next narrow taxon in figure 1 and 3.156 in figure 2 with bootstrap consensus. Figure 1 of separation on 100% bootstrap consensus. Those 12 OTUs distance NJ (p-distance) tree shows that 100% of the bootstrap trials, in entire relationships are visualize in the phylogenetic tree. The Sum of Branch capsid proteins support as being a . Figure 2, the clade containing Length (SBL) is the second parameter for analysis of distance based portal protein HHV11 and HHV2H support 98% of the bootstrap phylogenetic tree. Neighbor joining method minimizes the sum of replicates. While the capsid proteins of HHV11 and HHV2H have 28% branch length at each stage of clustering OTU’s that is a function of the of bootstrap tree, this means that in 72% of the bootstrap trees, triplex gamma distribution (shape parameter). We construct the NJ tree under capsid protein of HHV2 joined that group of proteins. The minimum different gamma distribution range from 0.5 to 10. Figure 3a-3d shows divergence observed in table 1, figures 1 and 2 between triplex capsid the phylogenetic tree with poisson correction and shape parameter protein HHV2H and HHV2G and distant related proteins are portal of 1.0, 2.5, 4.0 and 5.0. Quantitatively lowest taxon separation protein HHV2H & triplex capsid protein HHV2H. From that we observes in triplex capsid protein HHV2G and HHV2H in the entire can infer that there is not strong support for a distinct, closely relate phylogenetic tree irrespective of the shape parameter values. Contrary major/capsid group that shared an ancestor with the portal protein. to that bootstrap replicate reach to 100% from 22% on increasing the So further analysis of distance and character based phylogenetic trees gamma distribution. The higher or 100% bootstrap consensus allows reconstruction is required. measuring the distance more accurately than low bootstrapping value. From figure 3d on the basis of taxon separation, the proteins can be Distance based phylogenetic tree arranged as triplex capsid

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 4 of 8

72 Triplex Capsid Protein_HHV2G 90 Triplex Capsid Protein_HHV2G 75 0.010 96 0.0095 0.497 Triplex Capsid Protein HHV2H 0.3605 Triplex Capsid Protein HHV2H 0.010 0.0095 35 8.384 Triplex Capsid Protein_HHV11 45 2.1355 Triplex Capsid Protein_HHV11 0.507 0.3700 9.473 22 Capsid Protein _HHV1 45 1.2870 75 Capsid Protein _HHV1 0.233 0.1948 8.658 Capsid Protein_HHV2 2.3107 Capsid Protein_HHV2 13.110 0.233 0.8223 0.1948 Triplex Capsid Protein_HHV2 Triplex Capsid Protein_HHV2 18.363 3.7925 50 Capsid Protein-II_HHV11 1.0525 92 Capsid Protein-II_HHV11 20 0.111 0.1011 5.174 Capsid Protein-II HHV2H 68 1.6932 Capsid Protein-II HHV2H 0.111 0.1011 42 23.088 MajorCapsid Protein HHV2H 2.8204 78 MajorCapsid Protein HHV2H 5.285 0.0515 3.100 28 Major Capsid Protein_HHV1 1.7427 Major Capsid Protein_HHV1 18.157 0.0515 99 Portal Protein_HHV11 99 Portal Protein_HHV11 10.216 0.067 0.0632 18.089 Portal Protein_HHV2H 5.6040 Portal Protein_HHV2H 0.067 0.0632

(a) (b)

98 Triplex Capsid Protein_HHV2G 100 Triplex Capsid Protein_HHV2G 100 0.0095 100 0.0095 0.3005 Triplex Capsid Protein HHV2H 0.2834 Triplex Capsid Protein HHV2H 0.0095 0.0095 58 0.9979 Triplex Capsid Protein_HHV11 65 0.7826 Triplex Capsid Protein_HHV11 0.3100 0.2928 54 0.3923 99 Capsid Protein _HHV1 51 0.2644 100 Capsid Protein _HHV1 0.1758 0.1700 1.1321 Capsid Protein_HHV2 0.9054 Capsid Protein_HHV2 0.2057 0.1758 0.1297 0.1700 Triplex Capsid Protein_HHV2 Triplex Capsid Protein_HHV2 1.7002 1.3399 0.2489 100 Capsid Protein-II_HHV11 0.1543 100 Capsid Protein-II_HHV11 0.0956 0.0938 80 0.9515 Capsid Protein-II HHV2H 85 0.7947 Capsid Protein-II HHV2H 0.0956 0.0938 0.8588 99 MajorCapsid Protein HHV2H 0.5811 100 MajorCapsid Protein HHV2H 0.0500 0.0495 0.9971 Major Capsid Protein_HHV1 0.8389 Major Capsid Protein_HHV1 0.0500 0.0495 100 Portal Protein_HHV11 100 Portal Protein_HHV11 0.0609 0.0602 2.0939 Portal Protein_HHV2H 1.5638 Portal Protein_HHV2H 0.0609 0.0602

(c) (d)

Figure 3: A dataset consisting of 12 viral capsid proteins of HHV strains and tree were generated using neighbor joining technique, the poisson correction and shape parameter of (a) 1.0, (b) 2.5, (c) 4.0 and (d) 5.0.

100 Triplex Capsid Protein_HHV2G 100 Triplex Capsid Protein_HHV2G 100 0.010 100 0.0095 0.497 Triplex Capsid Protein HHV2H 0.3605 Triplex Capsid Protein HHV2H 0.010 0.0095 72 8.384 Triplex Capsid Protein_HHV11 77 2.1355 Triplex Capsid Protein_HHV11 0.507 0.3700 56 9.473 100 Capsid Protein _HHV1 1.2870 100 Capsid Protein _HHV1 0.233 54 Capsid Protein_HHV2 0.1948 8.375 8.658 2.3107 Capsid Protein_HHV2 0.233 0.8223 0.1948 Triplex Capsid Protein_HHV2 Triplex Capsid Protein_HHV2 18.363 11.926 100 Capsid Protein-II_HHV11 3.7925 0.111 1.0525 100 Capsid Protein-II_HHV11 97 4.943 Capsid Protein-II HHV2H 0.1011 0.111 97 1.6932 Capsid Protein-II HHV2H 21.684 100 MajorCapsid Protein HHV2H 0.1011 0.054 2.8204 100 MajorCapsid Protein HHV2H 5.000 Major Capsid Protein_HHV1 0.0515 0.054 1.7427 Major Capsid Protein_HHV1 100 Portal Protein_HHV11 0.0515 0.067 100 Portal Protein_HHV11 38.597 Portal Protein_HHV2H 0.0632 0.067 5.6040 Portal Protein_HHV2H 0.0632 (a) (b)

100 Triplex Capsid Protein_HHV2G 100 Triplex Capsid Protein_HHV2G 100 0.0095 100 0.0095 0.3005 Triplex Capsid Protein HHV2H 0.2834 Triplex Capsid Protein HHV2H 0.0095 0.0095 75 0.9979 Triplex Capsid Protein_HHV11 77 0.7826 Triplex Capsid Protein_HHV11 0.3100 0.2928 59 0.3923 100 Capsid Protein _HHV1 54 0.2644 100 Capsid Protein _HHV1 0.1758 0.1700 1.1321 Capsid Protein_HHV2 0.9054 Capsid Protein_HHV2 0.2057 0.1758 0.1297 0.1700 Triplex Capsid Protein_HHV2 Triplex Capsid Protein_HHV2 1.7002 1.3399 0.2489 100 Capsid Protein-II_HHV11 0.1543 100 Capsid Protein-II_HHV11 0.0956 0.0938 95 0.9515 Capsid Protein-II HHV2H 95 0.7947 Capsid Protein-II HHV2H 0.0956 0.0938 0.8588 100 MajorCapsid Protein HHV2H 0.5811 100 MajorCapsid Protein HHV2H 0.0500 0.0495 0.9971 Major Capsid Protein_HHV1 0.8389 Major Capsid Protein_HHV1 0.0500 0.0495 100 Portal Protein_HHV11 100 Portal Protein_HHV11 0.0609 0.0602 2.0939 Portal Protein_HHV2H 1.5638 Portal Protein_HHV2H 0.0609 0.0602

(c) (d)

Figure 4: A dataset consisting of 12 viral capsid proteins of HHV strains and tree were generated using UPGMA, the poisson correction and shape parameter of (a) 1.0, (b) 2.5, (c) 4.0 and (d) 5.0.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 5 of 8

total cost (number of changes) in MP search level (1 & 3) is equal, and 140 2 is approximately similar. UPGMA NJ Maximum likelihood method is also character-based approach, which uses probabilistic models to choose an optimum tree. In this 120 section, we incorporate gamma parameters that account rate variations across sites and measure impact of gamma parameter on the log likelihood score for viral capsid data. Poisson substitution model use 100 with shape parameter values in the range of 2 to 7. The log likelihood score elevate with a rise in gamma parameter from 2 to 5, but the value of the log likelihood become constant from 5.25 gamma value. 80 The extreme value observes -2467.29 with discrete rate 0.931, 0.9613, 1.0364, 1.1009 and 1.1011. Figure 7a shows that gamma value control L B

S the log likelihood value (probability) up to a limit in the original tree 60 highest values indicate (gamma=5) that greatest likelihood producing in observed dataset. Figure 7a is the original tree without bootstrapping consensus. Now we use parametric bootstrapping analysis to remove sampling error of phylogenetic tree and reduce the deviation in sum of 40 the branch length. SBL is also the function of the gamma distribution at the lowest rate, but it become constant at a higher rate. The minimum SBL observed in both the cases (original and bootstrap consensus 20 tree) at lowest gamma parameter 2.0. The values of SBL vary from 8.3799 to 8.3927 in the original tree and 8.2414 to 8.3927 in case of bootstrap consensus tree. The lowest value of SBL variation is 1.65% 0 0 2 4 6 8 10 and approximately negligible at higher gamma value. It reflects that Shape Parameter original trees are shadow of bootstrap consensus and a similar result shown in figure 7b. Figure 5: A dataset consisting of 12 viral capsid proteins of HHV strains and sum of branch length were generated using UPGMA and neighbor joining having shape parameter in the range of 0.5 -10. Molecular clock We use the Tajima test of the molecular clock which basis on the equal rate (null hypothesis). The equality rate between the triplex capsid heuristic approach restricts the branch length 497, and the tree is shown protein sequences analyzed and their unique difference in sequence A, in figure 6a, and Long-branch Attraction (LBA) artifact use to analyze B and C are 0, 6 and 84 respectively. χ2 test statistic value for triplex the phylogenetic tree. In max-mini branch-&-bound tree depict that all capsid proteins 6.0 and P-value 0.01431. The P values less than 0.05 so clade has 100% bootstrap replication consensus for 12 OTU’s. Capsid the null hypothesis is rejecting meant equal rates between lineages does protein-II of HHV11 is rapidly evolving proteins as compare to Portal not exist. Similar simulation also performed with two more sets [1 A protein of HHV11 and HHV2H. Major capsid and capsid proteins (Capsid Protein _HHV1), B (Capsid Protein_HHV2) and 2 A (Major (HHV1 and HHV2 strains) develop at the same rate of mutation while Capsid Protein HHV2H), B (Major Capsid Protein_HHV1), C (Capsid triplex capsid proteins evolve with slowest rate. Protein_HHV2) and C (Capsid Protein-II_HHV11) and found that χ2 Mini-mini heuristic search algorithm also uses for finding the and P values reduce as shown in table 2. The observation shows that in maximum parsimony tree. Here, we use search factor up to three levels addition of lead lower χ2 and increases p-value. to control the extensiveness of the reconstructed phylogenetic tree. Analysis of figure 6b for 12 OTU of viral protein revealed that triplex Discussion capsid protein and portal protein evolve from monophyletic clade Substitution model is a powerful tool for reconstructing a distance while major capsid and capsid protein arise from other clade. Triplex based phylogenetic tree. The present study also supports this argument; capsid protein HHV2 and capsid protein HHV1 are a combination of it has shown that NJ method is efficient to obtaining the correct tree top and bottom . LBA analysis for search factor one indicates on the distance data. One of the main reasons for this is that the table that portal protein of HHV11 and HHV2H have high mutation during 1 produces distance matrix under three different substitution models evolution and same confirm by count the number of change. Capsid protein –II of HHV11 and HHV2H have less mutation as compare to Equality rate between sequences χ2 test statistic P-value portal protein virtue of that number of change at taxon level is less than A (Triplex Capsid Protein_HHV2G) 7. Major capsid protein is not far behind the portal and capsid protein B (Triplex Capsid Protein HHV2H) 6.0 0.01431 in term of change. Triplex capsid protein shows minimum value for the C (Triplex Capsid Protein_HHV11) A (Capsid Protein _HHV1) number of changes relative to other OTU’s. On comparative analysis B (Capsid Protein_HHV2) 4.0 0.04550 for search factor two and three as shown in figure 6c and 6d, that triplex C (Capsid Protein-II_HHV11) capsid protein’s clade length is constant and minimum mutagenic A (MajorCapsid Protein HHV2H) evolution. Capsid protein –II HHV-11 evolve at a faster rate relative to B (Major Capsid Protein_HHV1) 3.0 0.08326 other capsid proteins. Portal proteins show the maximum cost on the C (Capsid Protein_HHV2 basis of changes observes in both phylogenetic trees are 5 and 7. Capsid Table 2: Tajima’s relative rate test for χ2 test statistic, P-value less than 0.05 is often proteins and major capsid proteins show the variation in respect of used to reject the null hypothesis of equal rates between lineages. The analysis involved 3 amino acid sequences. All positions containing gaps and missing data search level. Mini-mini heuristic search algorithm assessed that the were eliminated. Evolutionary analyses were conducted in MEGA5.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 6 of 8 1 1 1 1 V V H H 1 2 2 1 1 1 H 1 V V H 1 HH V HH 2 V H V 1 G 2 1 _ _ 2 V 2 V V V II II H H HH V HH - HH - V HH 2 2 HH _ _ _ n n _ _ HH V V i n i n HH

HH n HH n i i

n HH e i e _ i HH n

_ i t e t e n e 1 i e n _ t t e n n o i t o t i HH HH e t i i o V o 2

r n r o e t o e i o r e r e t r t P o P r V II II t t r

e

o P P - - r o P t

o P

o d

r P d H

HH r l l n n

P r r o i i l d

2 i i P _ a G a P r s d s HH i

P

P 1 a

t d t e e i

2 V p s p t l P _ d i t t 1 r n r s

d d i r a p s a i a V o o n o i o i p d s V t o i a p e r r s s i C a P 2 C r P p t HH

e

a

P

s 2 p P p P

C

o t a o

V

C p 1 a HH a

V C r n o HH P d d r a C _

i r r i i r V _ C P C o r

e s

s o o n j C P n HH t j o j i 4

x d 4 p x p 3 a i HH j _ i o a a e x d e a e a 5 3 2 5 e _ HH a s t r i l l n e M t _

s M M n i p o V C l C p P p

o

M i

i r i p e a p

r n r e t d r i a 0 P i 1 1 t i P

C 0 r T o T

3 HH

e s

2 o C r d 3 T t 2

d 10 _

r i p 1 10 i P 0 0 o 0 s

n a P s r

i p d 0 p 10 C 10 P i d 0 e a

0

a i t s x d s 10 o C p 10 i C

e 10 p r

a s l x a x P p p

e C e i

l a C d l

r x i p p C x i T s e

i

2 r l e 3 p r 1 2 l 1 T p 1 1 a T

1 1 i p

5 5 i r C 1 4 r

0 1 T 4

1 T 1

10 3 3 2 2 3 2 1 9 1 1 8 1 8 1 9 1 0 4 5 5 9 9 8 7 1 1 9 9 1 9 1 1 9 1 9 6 2 3 0 1 9 5 5 7 7 7 2 4 4 5 5 3 3

(a) (b) 1 1 2 1 1 V V V H 1 H 1 2 2 1 V 1 V H HH HH HH V V V 2 _ _ _ HH n n V II H i i HH _ - HH 1 HH 2 HH

e e _ n 1 n _ t t _ i V n i n HH n o o i V i

n H e e i r r i t e t e 2 n 1 e t t e P P o i G o t HH

V t o r V o 2

r HH e 2 o d d r o r t P _ i i P r V II r V

P

o P - s s

P

II d HH r P d H

HH p p - l

n i

d i l 2 i P i _ a a s G a d s n n HH

HH 1 a H s t e i i i p 2 V p t _ d C C t _ 1 n r 2 s p e e

a i r a i H V o n t t o n a p s V V x x o i e 2 i r C o o a C P p t e e

HH C e

P e

2 r r P V l l

r t a o t r

C HH P P p p

V r o n o HH o HH o d

i i C _ j

i r j i r r _ P d d r r r a

e s o HH n a i i II P P n 1 T T t j o i 4

- d p s s 3 _

i HH j 1 M i o a e M d d p a p

n 5 n e _

a s t r i i i i V t a a s s M n p o C P e e

o

M i

r p p t a C C t

r

e d a a o P o 0 1 t i HH P

C x r r

s o _ C C d e 3 2 2

P

2 d P 10 r i l p

n i 0 x s p l a i P d s

i e p i a e p l r C 10 d t t s a

0 a i p T r o p

i x 3 s C o r a C r

e p 10

0 P P l 1 T x

a

C x

1

p l e V e i 0 l 10 C a l

r t p 1 p x i T r 0 i

r HH e o 3 r 2 l T _ 1 1 10 P 0 T

1 p 8

0 5 i n r 1 i 10 0 4 T e 3 1

t 10 2 o 3 5 r 0 V 2 2 1 P

5 d 4 HH i _ s p n 2 1 i a 1 9 1 e 0 t C 9

o 10 r P 1

5 d i 9 s 7 1 5 p 1 5 9 a C

4 8 2 9 6 1 1 2 0 4 3 0 5 1 4 5 7 9 9 2 3 4 4 3 3 3 5 3 2

(c) 1 (d)

Figure 6: A dataset consisting of 12 viral capsid proteins of HHV strains and tree were generated using maximum parsimony technique and heuristic analysis (a) max-mini (b) mini-mini with search level 1.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 7 of 8

2467.29 Original tree Score of Maximum Likelihood tree 9 Bootstrap Consenus tree

2467.28 8

2467.27 7

6 h 2467.26 t g d n o e L

ho 5 h li c e n k 2467.25 i a r L

B

f 4 og o

L m

2467.24 u S 3

2467.23 2

2467.22 1

2467.21 0 2 3 4 5 6 7 2 3 4 5 6 7 (a) Gamma Parameter (b) Gamma Parameter

Figure 7: A dataset consisting of 12 viral capsid proteins of HHV strains and tree were generated using maximum likelihood technique using CNI analysis (a) Log likelihood (b) Sum of branch length.

(p-distance, Number of amino acid and poisson correction method) bound method which discard the clade of capsid protein HHV1 and and the replica of distance into tree form shown in figures 1 and 2. HHV2 as shown in figure 6a and results are consistent with distance Triplex capsid proteins of HHV2G and HHV2H have least matrix based approach. We apply the mini-mini heuristic technique (MP) different in the entire substitution model and the same observation to authenticate the results obtained in the previous simulation. The found in reconstructed phylogenetic tree. Portal protein of HHV2H MP search level 1 and 3 having an equal number of changes (19), has unexceptional highest distance value in substitution methods while MP search factor 2 has only 14 changes with the assumption producing clade at lowest bootstrapping value of 28% in figure 3. that outfit discard. Moreover, the one more consistency observed Therefore, the efficiency of the distance based methods depends on the that triplex capsid proteins have the lowest cost. The accuracy and model and relatedness in OTU’s. robustness of mini-mini search method having value 2 is better than Several lines of evidence from figures 3 and 4 suggest that UPGMA other tree represent in figure 6b-6d. Log -likelihood value of ML tree and NJ estimates are more robust because gamma distances (poisson is also changing with gamma parameter. The simulation based analysis correction method) also account for rate variation among sites in suggest that lower value of gamma give the transient, and higher value calculating evolutionary distances. We have used shape parameter (1.0, of gamma provide the stable log likelihood, which are most suitable for 2.5, 4.0 and 5.0) as a criterion because we are interest in estimating phylogenetic tree analysis. The difference between SBL in the original a clade relationship and taxon separation in the phylogenetic tree. tree and bootstrapping tree are insignificant, indicates that closeness NJ method results show that each protein placed in fixed position in between viral capsid proteins. the four phylogenetic trees, the distances between proteins vary, but Molecular clock‘s results show that the null hypothesis is fail due bootstrapping values of the trees can be quite large, depending on to p<0.5 in first two cases, and evolution takes in various proteins at gamma distribution. Contrary to NJ, gamma distribution does not different rates. The results of Tajima molecular clock also favour the influence UPGMA bootstrapping value. The clade separation distance phylogenetic tree discussed in figures 3-5. in all the four sets of phylogenetic tree is similar in quantitative to NJ trees. This happen because UPGMA method’s assumption (all Conclusion taxa evolve at a constant rate and that they are equally distant from the root, implying that a molecular clock is in effect) is applicable in Our phylogenetic analyses of viral capsid proteins deduced from computer simulation for phylogenetic tree. In a comparative study the HHV-1 and HHV-2 strongly suggests that the major capsid, triplex of the UPGMA and NJ methods, show that significant parameter capsid and portal protein have common lineage. However, sequences SBL maintain the original data from a distance matrix. Clustering of capsid protein HHV1 and HHV2 evolved are comparatively fast and optimally techniques (ME) results indicate that higher gamma with respect to their counterpart HHV-II-1 and HHV-II-2. Poisson substitutions with shape parameter are requiring understanding distribution is most suitable for reconstruction and analysis of tree. the phylogenetic tree. Our results suggest that the taxon separation We also use both character based approach (MP and ML) to of proteins is less and highly closeness between viral strains. examine the relationship among viral capsid proteins of HHV strains. Comprehensive knowledge of evolution in HHV strain may allow the Simulation of 12 OTU’s of proteins using max-mini branch-and- identification of targets and propose drug for therapy in human.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128 Citation: Sohpal VK, Dey A, Singh A (2013) Computational Analysis of Distance and Character based Phylogenetic Tree for Capsid Proteins of Human Herpes Virus. J Data Mining Genomics Proteomics 4: 128. doi:10.4172/2153-0602.1000128

Page 8 of 8

References 9. Chen L, Wu D, Ji L, Wu X, Xu D, et al. (2013) Bioinformatics analysis of the epitope regions for norovirus capsid protein. BMC Bioinformatics 14: S5. 1. Alfaro ME, Huelsenbeck JP (2006) Comparative performance of Bayesian and AIC-based measures of phylogenetic model uncertainty. Syst Biol 55: 89-96. 10. Posada D, Buckley TR (2004) Model selection and model averaging in : advantages of akaike information criterion and bayesian 2. Geddes BA, Hausner G, Oresnik IJ (2013) Phylogenetic analysis of erythritol approaches over likelihood ratio tests. Syst Biol 53: 793-808. catabolic loci within the Rhizobiales and Proteobacteria. BMC Microbiol 13: 46. 11. Saitou N, Nei M (1987) The neighbor-joining method: a new method for 3. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the reconstructing phylogenetic trees. Mol Biol Evol 4: 406-425. bootstrap. Evolution 39: 783-791. 12. Sohpal VK, Dey A, Singh A (2010) MEGA biocentric software for sequence and phylogenetic analysis: a review. Int J Bioinform Res Appl 6: 230-240. 4. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368-376. 13. Kumar S, Filipski AJ (2001) Molecular Phylogeny Reconstruction. Encyclopedia of Life Sciences, Nature Publishing Group, London, UK. 5. Kenney GE, Rosenzweig AC (2013) Genome mining for methanobactins. BMC Biol 11: 17. 14. Sullivan J, Abdo Z, Joyce P, Swofford DL (2005) Evaluating the performance of a successive-approximations approach to parameter optimization in maximum- 6. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, et al. (2010) likelihood phylogeny estimation. Mol Biol Evol 22: 1386-1392. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59: 307-321. 15. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, 7. Sourdis J, Nei M (1988) Relative efficiencies of the maximum parsimony and evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: distance-matrix methods in obtaining the correct phylogenetic tree. Mol Biol 2731-2739. Evol 5: 298-311. 16. Yang Z (1996) Phylogenetic analysis using parsimony and likelihood methods. 8. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, et al. (2008) J Mol Evol 42: 294-307. Phylogenetic classification of short environmental DNA fragments. Nucleic 17. Yang Z (1996) Among-site rate variation and its impact on phylogenetic Acids Res 36: 2230-2239. analyses. Trends Ecol Evol 11: 367-372.

J Data Mining Genomics Proteomics ISSN: 2153-0602 JDMGP, an open access journal Volume 4 • Issue 2 • 1000128