Characterization and diversity of selected Actinorhizal haemoglobin genes and proteins with reference to Alnus-Frankia symbiosis

Thesis submitted to the University of North Bengal For the Award of Doctor of Philosophy in Botany

By Sanghati Bhattacharya

Supervisor Prof. Arnab Sen

Department of Botany University of North Bengal Raja Rammohunpur, Siliguri September, 2017

This work is dedicated to my parents

Declaration

I declare that the thesis entitled “Characterization and diversity of selected Actinorhizal haemoglobin genes and proteins with reference to Alnus-Frankia symbiosis” has been prepared by me under the guidance of Prof. Arnab Sen, Department of Botany, University of North Bengal. No part of the thesis has formed the basis for the award of any degree or fellowship previously.

[Sanghati Bhattacharya] Department of Botany, University of North Bengal, RajaRammohunpur, Siliguri-734013

Date: 28.08.2017

UNIVERSITY OF NORTH BENGAL
Department of Botany
Accredited by NAAC with Grade A
ENLIGHTENMENT TO PERFECTION
RajaRammohunpur, Siliguri 734013, West Bengal, INDIA
Phone: +91 353 2699118; Fax: +91 353 2699001; www.nbu.ac.in

Prof. Arnab Sen, Professor & Head

Certificate

I certify that Miss Sanghati Bhattacharya has prepared the thesis entitled “Characterization and diversity of selected Actinorhizal haemoglobin genes and proteins with reference to Alnus-Frankia symbiosis” for the award of the Ph.D. degree of the University of North Bengal, under my guidance. She has carried out the work at the Department of Botany, University of North Bengal. The results incorporated in this thesis have not been submitted for any other degree elsewhere.

[Prof. Arnab Sen]
Supervisor
Department of Botany, University of North Bengal, RajaRammohunpur, Siliguri-734013, INDIA

Place: Siliguri
Date: 28/08/2017

Abstract

Nitrogen is one of the most essential nutrients for the survival of living organisms. It makes up four-fifths of the atmosphere but, being inert, is not directly available to higher organisms. Some microorganisms help fix atmospheric nitrogen into its reactive forms by forming associations with legume or actinorhizal plants. The process of fixing inert molecular nitrogen into a usable form by the enzyme nitrogenase is called Biological Nitrogen Fixation (BNF). Nitrogenase is known to be oxygen labile and needs protection from free oxygen to facilitate the reduction of molecular nitrogen to ammonia. Consequently, the haemoglobin (Hb) gene, which is found to be involved in this protection, comes into the picture.

The present study deals with the molecular analysis and expression study of Alnus nepalensis Hbs, along with in-silico analysis of actinoHbs (actinorhizal Hbs and actinobacterial Hbs), to elucidate the inherent functional diversity and the underlying evolutionary mechanism among intra- and inter-specific members of actinoHbs.

A. nepalensis, the host plant of this study, is an excellent example of a successional plant, as it is found to be an early colonizer of landslide regions. The present study includes a thorough survey of the host plant A. nepalensis in sub-Himalayan West Bengal and Sikkim, together with a population genetics study. The total area was divided into three populations with reference to the vehicle routes of Darjeeling and Gangtok. The population genetics study revealed that the entire population of A. nepalensis has segregated according to geographical distribution, and in this respect the river Teesta acts as a geographical barrier to the dispersal of germplasm. Soil from the base of the collected A. nepalensis plants was taken during field visits to estimate the soil nutrients present. Extremely variable soil nutrient conditions, with slightly high soil carbon, were found to favor the growth of A. nepalensis in the studied region.

Further, we carried out an in-silico analysis of actinorhizal Hbs along with the different types of plant Hbs (pHbs) and actinobacterial Hbs (bHbs) available in the public domain. pHbs constitute a diverse group of haem proteins and evolutionarily belong to three different classes: symbiotic, non-symbiotic and truncated. Since truncated pHbs (ptHbs) have a 2-on-2 structure, they are structurally different from the other two Hb groups.

To analyze the various aspects of Hbs, analysis was performed at two fundamental levels: the sequence level and the structure level.

At the sequence level, a thorough sequence analysis was done of 96 pHbs along with 121 bHbs identified from 72 type strains of actinobacteria currently available in the public domain. Physiochemical properties and motif annotation of actinoHbs were determined using various software and algorithms, which revealed that class II non-symbiotic Hbs (nsHbs) showed a remarkable resemblance to class II symbiotic Hbs (sHbs)/legHbs (Lhbs), and that these properties distinguish them from class I Hbs and ptHbs.

Motif annotation revealed two functionally active stretches of 50 and 21 amino acids containing “YjbI”-like activity in ptHbs, which is responsible for inorganic ion transport and metabolism; this is similar to bHbs but totally absent in other pHbs. This finding suggests that ptHbs share characteristics with bHbs, which makes them distinct from other pHbs, and it prompted a more detailed analysis of bHbs. For this, codon-level analysis and functional annotation of bHbs were performed. The whole genome sequences, along with the Hb genes, of these type strains were retrieved. Codon usage patterns, assessed using parameters such as GC, GC3 and Nc together with percentile calculation, revealed that Frankia Hbs are expressed at a moderate to high level and that their codon bias depends on 1) GC compositional constraints and 2) natural selection for translational efficiency.

A function-based phylogenetic approach was undertaken to understand the ancient lineage of the protein family, which subsequently revealed that the functions of bHbs depend on 1) their different functions within the same genera, 2) host specificity and 3) eco-geographical habitat. Functional annotation also revealed that a single Hb of Frankia alni contains a NOD factor, which subsequently develops a unique pathway to synthesize a chitin-based signal molecule and helps the interaction of host and microsymbiont to facilitate symbiosis.

Sequence comparison was done to identify the conservation level amongst actinoHbs. This revealed that the conservation of sHbs and nsHbs of actinorhizal plants is quite similar to that of other pHbs, whereas ptHbs showed a high conservation level with bHbs, which further supported the results of the physiochemical property test and motif annotation. It was also found that the actinoHb proteins may not be structurally related to the ferredoxin-NADP+ type reductase and may be involved in a different mechanism for the reduction of oxidized haem iron to the ferrous form.

Further, we constructed 3D structures of actinorhizal Hb proteins from Alnus firma, Casuarina glauca, Myrica gale and Datisca glomerata through homology modeling and examined their structural properties. Structure-based studies revealed that the C. glauca sHb and D. glomerata ptHb show stereo-chemical properties distinct from those of the C. glauca, M. gale and A. firma nsHb proteins.

To analyse structural similarity, the backbones of actinoHbs were compared, followed by structure-based phylogeny. The results revealed that the backbone structures of pHbs (including sHb, nsHb and ptHb) and bHbs were totally different, though the side chain modifications were much more similar for ptHbs and bHbs. This result supports the hypothesis that ptHbs have a structural arrangement similar to that of bHbs.

To analyze functional divergence, we assessed land pHbs along with bHbs and estimated altered evolutionary rates among all types of member proteins. The assessment revealed that ptHbs have functionally diverged from the other pHbs (class I & II), while some properties are similar to those of class I nsHbs.

Quantitative RT-PCR was performed to study the expression level of Hb genes in different plant parts of A. nepalensis. A partial Hb mRNA from A. nepalensis was identified, which showed 96% similarity with the class I nsHb of A. firma. The expression study showed that Hb expression is elevated in nodules when plants are inoculated with Frankia, but that the expression level is significantly high in the root region of untreated plants. This might be due to the plant's search for its microsymbiont Frankia for interaction. The present study also suggests that divergence may have occurred between plant and bacterial haemoglobin proteins, with ptHbs forming a link between the two.


Preface

This Ph.D. thesis contains the results of my research work undertaken at the Department of Botany, University of North Bengal, from January 2013 to February 2017. During this time I went through various favorable and unfavorable mental states. Without the help, support, motivation and inspiration of others, I would never have been able to finish my research work properly. Therefore I would like to thank those people who were a source of inspiration towards the completion of my research work, and please forgive me if I miss any of them.

First of all I would like to express my deep sense of gratitude and heartfelt thanks to my research supervisor, Prof. Arnab Sen, Department of Botany, University of North Bengal, for his esteemed supervision, invaluable guidance, priceless suggestions and constant encouragement. His scientific knowledge helped me to carry forward my research work from the beginning. I express my sincere thanks and regards to him.

Special thanks to Dr. Malay Bhattacharya, Assistant Professor, Department of Tea Science, University of North Bengal, for his valuable suggestions, enormous encouragement and support. He devoted his precious time and helped me a lot to finish this research work.

I am also thankful to the Head of the Department and all the teachers and staff of the Department of Botany, University of North Bengal, for their support.

I express my gratitude to all the members of the Department of Tea Science, University of North Bengal, for supporting me in completing the work.

It would be an injustice if I forgot the name of Dr. Dorjay Lama, Department of Microbiology, St. Joseph College, who permitted me to use his laboratory throughout my research work.

I would like to thank the Department of Biotechnology, Govt. of India, for providing the Bioinformatics Facility at North Bengal University, and I would also like to acknowledge the Department of Biotechnology, Government of India, for financial support during my research work.

I am grateful to all my lab mates, Dr. Subarna Thakur, Dr. Arvind Kr. Goyal, Dr. Tanmayee Mishra, Dr. Ritu Rai, Dr. Debadin Bose, Dr. BC Basistha, Dr. Ayan Roy, Arnab Chakraborty, Pallab Kar, Indrani Sarkar, Mousumi Saha, Saroja Chhetri, Reha Labar, Moushikha Lala and Krishanu Ghosh, for offering me the necessary help and assistance with their selfless fidelity. I would be remiss if I neglected to acknowledge Abhijit Chettri, research scholar of St. Joseph College, and Upashna Chettri, research scholar of the University of North Bengal, for their help during the course of my research work. I would also like to acknowledge the service of my lab attendants Raj and Basu.

My sincere thanks to my parents, Mr. Madhabendra Bhattacharya and Mrs. Kalyani Bhattacharya, for their persistent encouragement, motivation, and economic and moral support, which helped me to overcome various hurdles during my research work. I am also thankful to them for supporting my determination for higher education and for being my all-time support system during the hard phases of my life. I am also grateful to my elder sister Mrs. Oindobi Bhattacharya and my brother-in-law Mr. Sourav Bhattacharya for their massive support, encouragement and mental boost at every step of my research work. I would also like to thank my little niece Arjya Bhattacharya.

It would be unfair if I forgot the name of Mr. Rakesh Banerjee, my husband, who has constantly encouraged and supported me over the last year so that I could complete my research work serenely. My earnest thanks to him for his supportive nature. My sincere thanks also to my mother-in-law Mrs. Mita Banerjee, for making me feel comfortable so that I could complete my research work peacefully.

I am also thankful to the many other friends and family members whose names may not appear here but who all helped me morally and in many other ways.

Lastly I am thankful to the almighty for all the blessings.

[Sanghati Bhattacharya]


Table of Contents

Declaration
Abstract
Preface
List of tables
List of figures
List of appendices

1 Introduction

2 Review of Literature
  2.1 Actinorhizal plants
  2.2 Biological Nitrogen Fixation
  2.3 The microsymbiont - Frankia
  2.4 Systematic approach of Frankia-actinorhizal symbiosis
  2.5 Haemoglobin – a key factor in global nitrogen balance
  2.6 Computational biology of haemoglobin proteins associated with nitrogen fixation
  2.7 In-silico analysis involved in computational biology
  2.8 Challenges and future prospects

3 Materials and Methods
  3.1 Ecology of Alnus nepalensis in sub-Himalayan West Bengal and Sikkim
  3.2 Genetic diversity of A. nepalensis
  3.3 Characterization of actinohaemoglobins (in-silico)
  3.4 Comparative study amongst actinohaemoglobins in gene level
  3.5 Homology modelling of actinorhizal haemoglobin proteins
  3.6 Evolutionary trend of truncated plant haemoglobins
  3.7 Expression study of A. nepalensis haemoglobin in different plant regions

4 Results and Discussion
  4.1 Ecology of A. nepalensis
  4.2 Population genetics and genetic diversity studies of A. nepalensis
  4.3 Characterization of actinohaemoglobins (in-silico)
  4.4 Comparative study amongst actinohaemoglobins in gene level
  4.5 Homology modeling of different actinorhizal haemoglobins (in-silico)
  4.6 Evolutionary trend of plant truncated haemoglobins
  4.7 Expression study of haemoglobins of A. nepalensis in different plant regions

Conclusion
Bibliography
Index
Appendix

List of Tables

Table Title
3.1 Detail information of study area
3.2 Geographical location from where soil samples were collected
3.3 Detail description of collection site
3.4 Primer sequences producing successful amplification
3.5 List of land plant haemoglobins used in present study
3.6 Different type-strains of actinobacterial genome selected for present study
3.7 Structural resemblance analysis
3.8 Structure based phylogenetic analysis amongst selected plant haemoglobins and actinobacterial haemoglobins
3.9 Detail of the samples used in expression study
4.1 Compilation of collected field data of A. nepalensis in sub-Himalayan West-Bengal and Sikkim
4.2 Morphological data of A. nepalensis recorded during field study
4.3 Estimation and analysis of nutrients present in soil collected from the base of A. nepalensis in studied area
4.4 Purity of A. nepalensis DNA
4.5 Total number and size of amplified bands, number of monomorphic and polymorphic bands and percentage of polymorphism generated by the RAPD primers
4.6 Similarity coefficient matrix of RAPD profile of 43 Alnus species
4.7 Primers found in all three populations producing distinct scorable bands
4.8 Similarity coefficient amongst studied populations of A. nepalensis
4.9 Physiochemical properties of haemoglobin protein sequences from various plants
4.10 Identification of actinobacterial haemoglobins with well distinguished biotope codes
4.11 Codon usage patterns and percentile of codon usage in actinobacterial haemoglobins
4.12 Motifs identified from various actinohaemoglobins
4.13 Distribution of motifs present in plant and actinobacterial haemoglobins
4.14 Similarity coefficient matrix of motifs present in plant and bacterial haemoglobins
4.15 Template selection of query modeled haemoglobin proteins on the basis of e-value and percentages of sequence similarity
4.16 Validation of crude actinorhizal haemoglobin protein models by various algorithms
4.17 Characteristics of modeled actinorhizal haemoglobin proteins
4.18 Functional divergent analysis of member proteins of plant haemoglobin in contrast to bacterial haemoglobins
4.19 Detail result of nodulation during plant infectivity test


List of Figures

Figure Title
2.1 Phylogenetic grouping of actinorhizal plants
2.2 Evolution of haemoglobins from the ancient earth with the time period
2.3 Distribution of haemoglobins in different life systems
2.4 Classification of haemoglobins in plant system
2.5 Systematic order of haemoglobins in plant system from the Algal ancestor till date
3.1 Map route of study area
3.2 Naturally occurring A. nepalensis in study area and their surroundings vegetations
3.3 Data collection sheet
3.4 (A&B) Surface sterilization procedure of A. nepalensis nodules by using 30% H2O2
    (C) Removal of upper epidermal layer from A. nepalensis nodules
4.1 (A&B) Different root nodules of A. nepalensis collected from different collection sites
    (C) Morphological study of A. nepalensis according to altitudinal variation
4.2 Correlation between weight of nodules and particular soil nitrogen
4.3 DNA-gel electrophoresis of crude DNA isolated from studied samples
4.4 (A-I) Gel showing DNA bands of A. nepalensis species amplified by the RAPD primers
4.5 Dendrogram constructed on the basis of the data obtained from RAPD analysis
4.6 Dendrogram constructed on the basis of the genetic similarity and distance between populations of A. nepalensis
4.7 Percentile plot of actinobacterial haemoglobins
4.8 GC3 versus NC plot of Frankia haemoglobins
4.9 Functional phylogeny of actinobacterial haemoglobins
4.10 Motif based phylogeny amongst actinohaemoglobins
4.11 Amino acid sequence of actinorhizal symbiotic, non-symbiotic and truncated haemoglobins along with some selected plant and bacterial haemoglobins
4.12 3D structure of A. firma class I non-symbiotic haemoglobin
4.13 (A) Ramachandran plot of A. firma modeled protein
     (B) Plot shows Z-score of modeled A. firma non-symbiotic haemoglobin in PDB determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length
4.14 Image showed the normalized atomic displacement plot of actinorhizal haemoglobins
4.15 Solvent accessibility plot of the A. firma non-symbiotic haemoglobin protein
4.16 Lesk-Hubbard plot to identify the similarity between plant and bacterial haemoglobins
4.17 Structure based phylogenetic tree amongst selected actinohaemoglobins
4.18 The phylogenetic tree comprised both plant and actinobacterial haemoglobins. Grey portions are some actinobacterial haemoglobin proteins which are not considered for study but had taken to construct the phylogenetic tree
4.19 Substitution rate among plant and bacterial haemoglobin clusters
4.20 (A) Microscopic view of A. nepalensis winged seed
     (B) Germinated A. nepalensis seeds in BOD incubator
4.21 Expression of A. nepalensis non-symbiotic haemoglobin in different plant parts


List of Appendices

Appendix Title
Appendix A List of publications
Appendix B List of abbreviations
Appendix C Buffer and chemical used for DNA fingerprinting studies
Appendix D Software used in study
Appendix E Web server used in present study


INTRODUCTION

Chapter 1 Introduction

Symbiotic association is a perfect example of division of labor between two organisms, the outcome of which may lead to endeavors like biological nitrogen fixation (BNF). Reduction of atmospheric N2 to ammonia and its further assimilation into amino acids and other bio-molecules enables gaseous nitrogen to be incorporated into life processes. As all organisms need nitrogen to survive, nitrogen fixation is probably the second most important biochemical pathway after carbon fixation. In nature, there are two major ways of fixing nitrogen. Natural abiotic nitrogen fixation, which can be mediated by lightning or fires, oxidizes N2 to nitrate (NO3-). The NO3- produced in this way can be washed out of the atmosphere with precipitation and is thus deposited in terrestrial ecosystems. The other way of nitrogen fixation involves the activity of certain soil bacteria that absorb atmospheric N2 gas and convert it into ammonium. This process is known as BNF. However, the ability to fix nitrogen is found only in one biological kingdom, the Prokaryota (Sprent and Sprent, 1990). Thus other organisms have exploited the ability of prokaryotes to fix nitrogen by establishing various types of interactions (Werner, 1992).

Cyanobacteria and plant-microbe symbioses are considered to be among the major milestones in the evolution of life on Earth, bringing together the two most essential biochemical pathways, carbon fixation and nitrogen fixation. There are two main types of symbiosis between nitrogen-fixing bacteria and vascular plants: one between Rhizobium and leguminous plants, and the other between Frankia and actinorhizal plants (Wall, 2000). The large number of woody dicotyledonous plants that form symbiotic associations with actinomycetes belonging to the genus Frankia are called actinorhizal plants (viz. Alnus nepalensis, Elaeagnus pyriformis, Myrica nagi, Casuarina equisetifolia, Coriaria nepalensis and Hippophae sp. etc.).

The rhizobia-legume symbiosis involves more than 1700 plant species of the family Fabaceae (Leguminosae), distributed in three sub-families, Mimosoideae, Caesalpinioideae and Papilionoideae (Wall, 2000), with bacterial partners belonging to the family Rhizobiaceae (Rhizobium, Azorhizobium, Sinorhizobium, Bradyrhizobium and Mesorhizobium) (Mousavi et al., 2014). Unlike Rhizobium, Frankia forms symbioses with 24 different genera of dicotyledonous, mostly woody plants belonging to 8 families. These plants, called actinorhizal plants, comprise 260 species associated with the filamentous actinomycete Frankia. Actinorhizal plants rival legumes in the amount of nitrogen they fix on a global basis (Schwintzer, 2012; Mirza et al., 2007), yet knowledge of their biology and uses is, for the most part, very recent. They have in common a predilection to grow in marginally fertile soils and often serve as pioneer species early in successional plant community development (Kennedy et al., 2010). Because they often thrive on marginal soils, actinorhizal plants have current and potential applications in reclaiming and conditioning soils, producing timber and pulp, and acting as windbreak, ornamental and fuel wood plants. Besides, globally they have potential for integration into schemes for addressing issues of reforestation (Benson and Silvester, 1993; Bose and Sen, 2006).

However, in all cases of Rhizobia-legume or Frankia-actinorhizal symbiosis a new plant organ, the nodule, is developed, in which the bacteria proliferate, express the most vital enzyme, nitrogenase, and fix nitrogen into ammonia. These compounds are then assimilated and transported to the rest of the plant parts (Hirsch, 1992; Pawlowski and Bisseling, 1996; Franche et al., 1998). Several genes of both host and bacteria are involved in the process of nodulation and nitrogen fixation. Actinorhizal nodules are characterized by a central vascular bundle and peripheral infected tissue surrounded by cortical nodule parenchyma. Actinorhizal nodules are ontogenetically related to roots. The nodule also provides an environment with a low O2 content inside the host system. This environment in the root nodules is essential, as the enzyme nitrogenase has to be protected from free oxygen that enters the nodules. Nitrogenase, the enzyme that catalyzes nitrogen fixation, is oxygen labile. Some oxygen, however, must be provided so that the bacteria can respire and produce the energy required both for survival and to drive N2 fixation. A special O2-transporting protein called haemoglobin (Hb) thus plays a crucial role in shielding the nitrogenase from oxygen while supplying a carefully controlled amount of O2.

Hb is a ubiquitous protein, present in all life forms, which functions to provide reversible binding of gaseous ligands such as oxygen and nitric oxide (Dordas et al., 2003). Hb usually reminds us of mammals and the circulatory system of blood, but Hbs exist almost ubiquitously in all life forms and have been reported to be critically involved in oxygen transport and circulation mechanisms. Plant haemoglobins (pHbs) are different from mammalian globins and were first identified in the leguminous plant Glycine max, which is nodulated by rhizobia (Kubo, 1939). The presence of Hbs in the plant nodules accounted for their red pigmentation, and these proteins were termed leghaemoglobin (LHb) (Dordas et al., 2003). pHb proteins were initially thought to be restricted to plant species carrying out symbiotic nitrogen fixation, but were later found in non-nodulated plants as well (Bogusz et al., 1988).

There are three distinct types of pHbs: symbiotic haemoglobins (sHbs), nonsymbiotic haemoglobins (nsHbs) and truncated haemoglobins (ptHbs) (Dordas et al., 2003). Previously it was thought that sHb is mainly found in the root nodules of plants that enter into symbiotic association with Rhizobia (Kubo, 1939; Guldner et al., 2004), but further research revealed the presence of Hbs in actinorhizal root nodules (Benson and Silvester, 1993). Parasponia Hbs have originated from sHbs and are classically thought to have non-symbiotic roles (Appleby et al., 1983). These Hbs are also called LHb.

The term sHb is used specifically for Hbs found in actinorhizal root nodules (Kubo, 1939; Appleby, 1984), whereas those present in rhizobial symbiosis are termed LHb (Appleby, 1984; Perazzolli et al., 2004).

sHbs are required for symbiotic nitrogen fixation, but not for cellular functions of plants (Vinogradov et al., 2006). The ratio of oxygen and nitric oxide concentration in symbiotic nodules is maintained by the nsHbs. Initially, pHbs were assumed to exist only in symbiotic plant systems due to their abundance in symbiotic root nodules (Bogusz et al., 1990). However, investigation of various plant species like Parasponia andersonii, Trema tomentosa, Celtis australis and others showed that many of these Hbs are nonsymbiotic in nature (Reeder and Hough, 2014). NsHbs are found mainly in leaf and stem. Nevertheless, they are also present in the roots of seedlings, root tissues exposed to flooding stress and oxygen-stressed aleurone tissues (Taylor et al., 1994). NsHbs have a greater affinity for oxygen than sHbs (Hill, 2012). Nonsymbiotic haemoglobins are of two types: Class 1 and Class 2 haemoglobins. However, the definite functions of these two types of haemoglobins are not fully determined yet. The presence of different types of Hb in various plant tissues indicates that they have other functions besides oxygen transportation (Hebelstrup et al., 2013).

A third type of pHb, i.e. ptHb, is found to be distributed in a wide range of life forms, from bacteria to unicellular eukaryotes to higher plants (Sasakura et al., 2006; Vinogradov et al., 2006). PtHbs have a 2-on-2 sandwich structure (Igamberdiev et al., 2004) instead of the classical 3-on-3 Hb fold (Vinogradov et al., 2006; Bogusz et al., 1990; Ciaccio et al., 2015), and are also 20-40 amino acids shorter than the classical form of haemoglobins (Vinogradov et al., 2006; Jokipii-Lukkari et al., 2009). Conversion from the hexa- to the penta-coordinate arrangement in ptHb takes around 20 minutes, which is extremely slow compared to other classes of Hbs (Jokipii-Lukkari et al., 2009; Ascenzi et al., 2017). The above evidence reveals that the ptHbs are structurally different from the two other classes of pHb (sHbs and nsHbs). Some plant species contain two ptHbs, but the functions of these Hbs still remain elusive (Jokipii et al., 2008).

Although ptHbs are longer than bacterial haemoglobins (Vinogradov et al., 2006; Jokipii et al., 2008), a phylogenetic tree based on sequence alignment placed ptHbs closer to the haemoglobins of gram-positive bacteria (Wittenberg et al., 2002; Hoy & Hargrove, 2008).

Some work has been done on the expression of pHb in response to symbiotic and pathogenic bacteria (Nagata et al., 2008; Sasakura et al., 2006), but virtually no work has been done on actinorhizal pHbs, particularly those of A. nepalensis found in the sub-Himalayan West Bengal and Sikkim region. To date, no report has been found on the three-dimensional structure of actinorhizal pHb proteins or on their expression pattern in different plant parts, especially in Alder of sub-Himalayan West Bengal and Sikkim. Likewise, the evolution of ptHbs in the actinorhizal plant system, their functions in the plant system and the general trend of actinorhizal pHbs relative to other pHbs at the gene and protein levels remain unreported.

Alder, being a major constituent of forest flora, particularly in the high altitude Himalayan region, plays a pivotal role in the early succession of forest ecosystems and in maintaining the fertility of forest soil (Benson and Silvester, 1993). However, no work has been done on the ecology, genetic diversity or population genetics of the host plant A. nepalensis in the sub-Himalayan West Bengal and Sikkim region.

The above information prompted us to undertake an in-depth study of the characters, phylogeny and expression of the different Hb genes and proteins found in actinorhizal plants, with special reference to the actinobacterial Hbs (bHbs), especially those of Frankia, the symbiotic counterpart of Alnus. In the present study, both experimental methods and in-silico analysis were used to understand the divergence in functionality of different pHbs as well as bHbs. Keeping in mind the above information, the present study entitled “Characterization and diversity of selected actinorhizal haemoglobin genes and proteins with reference to Alnus-Frankia symbiosis” has been designed with the following objectives:

• Determination of the ecology of A. nepalensis in sub-Himalayan West Bengal and Sikkim.
• Determination of the population genetics and genetic diversity of sub-Himalayan A. nepalensis.
• Characterization of actinorhizal pHb genes with reference to all kinds of pHbs available in the public domain (in-silico).
• Determination of functional regions as well as specific residues responsible for actinoHbs (actinorhizal pHbs and bHbs).
• Comparative codon usage analysis of selected 100 bHbs (in-silico).
• Determination of functional diversity of bHbs.
• Determination of 3D models of actinorhizal pHb proteins using the homology modeling technique, and study of their intrinsic dynamics.
• Comparative phylogenetic analysis amongst actinoHbs based on their structures, sequences and presence of functional motifs among them.
• Structural resemblance determination among actinoHbs.
• Determination of functional divergence of actinoHbs to describe the evolution of ptHbs into the plant system.
• To learn the expression of A. nepalensis Hb genes in response to Frankia.
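The codon usage objective listed above relies on compositional indices such as GC and GC3, which the Abstract names as parameters. As a minimal sketch only, assuming an in-frame coding sequence supplied as a plain DNA string, the two indices could be computed as below. The function names and the toy sequence are illustrative assumptions and are not taken from the software used in this thesis. GC3 is computed here over the third position of every complete codon; some codon usage tools instead report a variant restricted to synonymous codons, which would give slightly different values.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases over the whole sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes the sequence starts in frame; trailing bases that do not
    form a complete codon are ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    third_positions = cds[2:n_codons * 3:3]
    return sum(base in "GC" for base in third_positions) / n_codons


# Hypothetical toy coding sequence (not a real haemoglobin gene).
toy_cds = "ATGGCCGAGCTGACCTAA"
print(f"GC  = {gc_content(toy_cds):.3f}")
print(f"GC3 = {gc3_content(toy_cds):.3f}")
```

In a GC-rich genome such as that of Frankia, a GC3 value well above the overall GC value is the usual sign of compositional constraint on synonymous sites, which is why the two indices are typically reported together.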

REVIEW OF LITERATURE

Chapter 2
Review of Literature

Symbiosis is an association between two different organisms living in close physical contact, typically to gain advantages from each other. These relationships are essential to many organisms and ecosystems, providing a balance that can only be achieved by working together. There are mainly two kinds of symbiotic relations. The first is mutualism, where both the symbiont and the host benefit, and the second is commensalism, where the symbiont benefits with little effect on the host. Two types of mutualism are found: 1) obligate, in which one organism cannot survive without the other, and 2) facultative, in which each organism can survive independently but both benefit when they remain together.

Symbioses are observed in eukaryote-eukaryote, eukaryote-prokaryote and prokaryote-prokaryote systems. The present study is confined to one such eukaryote-prokaryote facultative mutualism, i.e. the Alnus-Frankia actinorhizal symbiosis.

2.1. Actinorhizal plants:

The actinobacterium Frankia develops root nodules in actinorhizal plants and fixes atmospheric nitrogen. The actinomycete Frankia and the site of its nitrogen fixation, i.e. the root nodule, are together termed actinorhiza. A few angiosperms are able to form a nitrogen-fixing symbiosis with Frankia through modification of their root hairs. This group of dicotyledonous plants that are nodulated by Frankia is called actinorhizal plants. To date, about 260 species in 24 genera, belonging to 8 families of 7 orders, have been reported to form actinorhizal associations (Froussart et al., 2016) (Figure 2.1).

Actinorhizal plants are found in different geographical locations of the earth, ranging from arctic tundra (Dryas spp.) and alpine forest (Alnus sp., Coriaria sp., etc.) to coastal and xeric regions (Casuarina sp.). In Nepal, Alnus nepalensis is found over a wide altitudinal range, descending as low as


Family: Genus (native distribution)
Myricaceae: Myrica (●), Comptonia (●)
Betulaceae: Alnus (●)
Casuarinaceae: Gymnostoma (♠), Casuarina (♠), Allocasuarina (♠), Ceuthostoma (♠)
Elaeagnaceae: Elaeagnus (●), Hippophae (●), Shepherdia (●)
Rhamnaceae: Colletia (♦), Discaria (♦), Kentrothamnus (♦), Retanilla (♦), Talguenea (♦), Trevoa (♦), Ceanothus (◘)
Rosaceae: Dryas (◘), Purshia (◘), Cowania (◘), Cercocarpus (◘), Chamaebatia (◘)
Coriariaceae: Coriaria (■)
Datiscaceae: Datisca (■)
Frankia groups shown alongside the plant families, from top to bottom: Frankia Clade I (isolate), RH-sV; Frankia Clade II (isolate), IP-sV; Frankia Clade III (isolate), IP-nsV; Frankia like

Figure 2.1: Phylogenetic grouping of actinorhizal plants (the symbol in brackets after each plant genus indicates its native geographical distribution – (●) most continents; (♠) Australia and Western Pacific; (♦) South America and southern New Zealand; (◘) Western North America; (■) disjunct distribution in northern and southern temperate zones; RH – root hair infection; IP – intercellular penetration; sV – septate vesicles in nodule; nsV – nonseptate vesicles in nodule) (Benson and Clawson 2000, Jeong et al. 1999, Huss-Danell 1997, Soltis et al. 1995, Swensen and Mullin 1977, Wall 2000)

500 m, but most common from 900 m up to 2700 m. At lower altitudes, A. nepalensis is mostly found in moist regions such as riversides, but it is also abundant on rough rocky land exposed by landslides and on cultivable land (Sharma, 2012). It is liable to be damaged by browsing animals when young, but seedlings over 50 cm high are relatively immune.

Records of the presence of A. nepalensis in sub-Himalayan West Bengal and Sikkim during the British period are available in the Bengal District Gazetteers by L. S. S. O’Malley (1907).

Frankia in symbiosis with actinorhizal plants fixes molecular nitrogen in the range of 240-350 kg N₂ ha⁻¹ y⁻¹, which is almost equivalent to that fixed by leguminous

plants (Dawson, 1990; Hibbs and Cromack, 1990; Wheeler and Miller, 1990; Wall, 2000). Since actinorhizal plants are early colonizers of marginal soils, they are considered pioneer species in landslide-affected and other threatened areas. These plants are also economically important as sources of timber, firewood, food, chemicals, etc. (Benoit and Berry, 1990; Diem and Dommergues, 1990; Hibbs and Cromack, 1990; Myrold, 1994). In India, actinorhizal plants are reported to grow as dominant species under different environmental conditions. The morphological characteristics, habit, habitat, distribution and possible ecological significance of actinorhizal plants found in the Kumaun Himalayan region were studied and reported by Bargali (2011). Swensen (1996) identified three major phylogenetic subgroups of actinorhizal plants. The first subgroup includes symbiotic taxa from the families Betulaceae, Myricaceae and Casuarinaceae. The second subgroup includes symbiotic taxa from the families Datiscaceae and Coriariaceae, and the third subgroup includes symbiotic taxa from the families Rhamnaceae, Rosaceae and Elaeagnaceae (Swensen, 1996).

The morphology of the nodules formed by different actinorhizal plants differs considerably, depending upon the organization of infected cells in the cortex (i.e. symbiotic, nitrogen-fixing cells), oxygen protection mechanisms for nitrogenase, infection mechanisms, patterns of Frankia differentiation, and the organization of carbon and nitrogen metabolism (Pawlowski and Bisseling, 1996; Berry et al., 2011; Schubert et al., 2011). Actinorhizal root nodules are formed through a series of interactions between the host actinorhizal plant and the symbiont, i.e. Frankia. The interaction takes place in the cortical cells of the lateral roots of actinorhizal plants, where nodules subsequently form. Actinorhizal root nodules are perennial structures consisting of multiple lobes. The lobes in Alnus are morphologically different from those of Myrica and Ceanothus: Myrica and Ceanothus lobes are distinct, but in Alnus they are compactly crowded.

The ideal conditions for the root infection process and subsequent nodule development are not well understood in the Frankia-actinorhizal system. The hyphae of Frankia become embedded in the mucilage layer of the root hairs of actinorhizal plants and establish the

infection process. It is established that a single root-hair infection is sufficient for the development of a nodule. However, in the natural environment, more than one infection occurs simultaneously in the root hair during nodule formation. It is also well established that, under laboratory conditions, the occurrence of nodulation is directly proportional to the amount of inoculum given (Newcomb and Wood, 1987). Knowlton et al. (1980) reported that, in actinorhizal plants, several non-symbiotic soil bacteria are involved in the root hair deformation process. These are called ‘helpers’, and they play an important role in Frankia-actinorhizal interactions. Under controlled conditions, Pseudomonas sp. helps in nodule formation in Alnus and Casuarina (Knowlton et al., 1980). The signal molecules of the host plant allow certain essential chemical changes and regulate the nitrogen-fixing symbiosis with Frankia. This involves a minimum of two different and successive signal molecules that lead to the root nodule formation mechanism in actinorhizal plants (Wall, 2000).

Two different pathways are reported for infection of the host tissue. It takes place either by deformation of the root hair, as found in Alnus, Casuarina, Myrica, etc. (Callaham et al., 1979; Berry and Torry, 1983), or intercellularly, as found in Ceanothus, Elaeagnus, etc. Some strains can follow both pathways for successful infection (Miller and Baker, 1986).

In rhizobia–legume symbioses, the nodulation process is established with the help of NOD signal molecules, which form the basis of host specificity (Oldroyd et al., 2009). Such NOD factors, and the corresponding reporter gene bioassay responses, are reported to be absent in Frankia (Ceremonie et al., 1999). On the basis of actinorhizal root hair deformation studies (Gherbi et al., 2008; Markmann et al., 2008) and the signal factors present in arbuscular mycorrhizal fungi (Maillet et al., 2011), it is predicted that a chitin-based molecule in Frankia acts as an equivalent of the NOD factor of the rhizobia–legume symbiosis. Recently, a LysM-type mycorrhizal receptor was reported to be involved in the infection process of the Rhizobium symbiosis of the non-legume Parasponia andersonii (Den Camp et al., 2011). This finding suggested that Frankia might have gone through a series of

unique pathways for the synthesis of a novel chitin-based signal molecule distinct from that of Rhizobium. These molecules allow an independent infection mechanism and act as a NOD factor, as in the rhizobia–legume symbiosis (Pawlowski et al., 2011). In the nodule, Frankia follows a unique pathway for nitrogen utilization, and the primary product of nitrogen assimilation is reported to be stored as arginine (Berry et al., 2011).

2.1.1. Evolution of actinorhizal plants by studying the facultative mutualism:

Frankia and the actinorhizal symbiosis evolved at almost the same time. Two different hypotheses regarding the evolution of the actinorhizal symbiosis have been proposed. According to the first hypothesis, plant genera obtained the ability to form a symbiotic relationship with Frankia to achieve certain selective advantages related to ecological niches. So, the divergence of families into their respective genera had taken place after acquisition of the symbiotic character. The second hypothesis states that a group of plants was forced to form an association with the nitrogen-fixing soil bacterium Frankia, as the available soil nitrogen was limited in the early Cretaceous period (Bond, 1983), and that this group subsequently survived in the struggle for existence with other angiosperms, mostly belonging to Magnoliidae. Some of these plants progressively evolved afterwards into the actinorhizal plants. Nitrogen-fixing microbes fixed nitrogen in the soil and increased the availability of nitrogen during the last 100 million years. So, due to the stable availability of nitrogen in the soil, some of the ancient plants lost their symbiotic nitrogen-fixing ability. However, this primordial property remains active in some selected pioneer plants, i.e. the actinorhizal plants. DNA hybridization studies on actinorhizal plants support the second hypothesis (Normand and Bousquet, 1989). The nitrogen fixation process in nodules might have evolved at a definite time in the earth’s evolution (Doyle, 1998). The nitrogen-fixing actinorhizal plants are grouped in the Rosid I lineage of the seed plants. This Rosid I group may be further subgrouped into four lineages, of which three are actinorhizal and the fourth is Fabaceae (Soltis et al., 1995). Geographical distribution, fossil records, and anatomical and morphological studies lend support to this hypothesis (Wall, 2000).

2.2. Biological Nitrogen Fixation:

Nitrogen is an important nutrient used by living organisms for their continued

existence. It is the most commonly deficient element, responsible for reduced agricultural yields throughout the world. Dinitrogen (N₂), or molecular nitrogen, makes up four-fifths of the atmosphere but is not directly available metabolically to higher plants or animals. Nitrogen is available to some microorganisms through Biological Nitrogen Fixation (BNF), in which atmospheric nitrogen is converted to ammonia with the help of the enzyme nitrogenase. The ammonia from the microorganisms is then transferred to the plants to meet their nutritional need of nitrogen for the synthesis of proteins, enzymes, nucleic acids, chlorophyll, etc. Thus nitrogen enters the food chain from the atmosphere through microorganisms and plants, and all eukaryotes, including higher plants and animals, depend on N-fixing microbes and their BNF activity for their nitrogen supply. An organism capable of growing without external sources of fixed nitrogen is called a diazotroph. According to the literature on BNF, only prokaryotes (members of the domains Archaea and Bacteria) are capable of performing it. The nitrogen-fixing ability is extensively distributed across both the bacterial and archaeal domains (Raymond et al., 2004).

BNF provides a means to meet the needs of a growing population with a nutritious, environmentally friendly, sustainable food supply. This makes the case for BNF research very convincing in the present situation. In the last two decades, various interesting discoveries regarding nitrogen fixation have been reported. Genome sequencing approaches, ‘omics’ approaches for microorganisms and genetically modified crops have found a common platform in agricultural biology. Advanced nitrogen fixation research is mainly focused on the enzyme complex called nitrogenase. This enzyme complex is involved in various biochemical processes apart from its usual functions. These include signal transduction, inter- and intramolecular electron transfer, protein-protein interaction, involvement in enzymatic catalysis, etc. (Peters et al., 1995).

2.3. The microsymbiont - Frankia:

Research on the root nodules of non-leguminous plants has been in focus since the early nineteenth century. The physiological nature of nodules and the cause of nodule formation were first speculated upon by Meyen (1829), in an attempt to understand the parasitic infection in the

root. The detailed anatomy of the nodules was studied by Woronin (1866). He observed hypha-like structures with some round vesicles within them. He also observed some intracellular regions through which the hyphae were passing, and found that some hyphae ended in round, swollen vesicular tips. Woronin considered the unknown organism to be a fungus and named it Schinzia alni, as this endophyte showed resemblance to a fungus called Schinzia cellulicolia (Sen, 1996). Brunchorst (1886) studied the cytological differences between leguminous and non-leguminous roots and named the unknown organism Frankia subtilis to honour his teacher A.B. Frank. Frank was a Swiss microbiologist who, ironically, considered the structures to be protein granules and did not believe in the presence of living microorganisms in any kind of nodule. Afterwards Frank changed his view and, along with Brunchorst (1885), considered Frankia to be a fungus. In 1895, the name Frankia alni was also coined by Von Tubeuf (1895) as a tribute to A.B. Frank. Several other names were proposed for Frankia later on: Plasmodiophora alni (Moller, 1885), Frankiella alni (Maire and Tison, 1909), Aktinomyces alni (Peklo, 1910), Actinomyces alni and Nocardia alni by Von Plotho (1941) and Waksman (1941) respectively, Proactinomyces alni (Krassilnikov, 1949) and Streptomyces alni (Fiuczek, 1959; Normand & Fernandez, 2006).

Hiltner (1898) identified the endophyte as a member of the actinomycetes for the first time, while studying the roots of Alnus and Elaeagnus, and found it to be closely allied with Streptomyces (Hirsch, 2009). Hellriegel and Wilfarth (1886-1888) (Quispel, 1988) studied atmospheric nitrogen fixation by the bacteria residing in the cortical layer of leguminous nodules. They introduced two terms during their research, ‘nitrogen user’ and ‘nitrogen accumulator’, and described the differences between them. They identified actinorhizal plants as nitrogen accumulators, and also described how those plants accumulate atmospheric nitrogen for plant growth. Their findings opened up a new era in plant-microbe research (Bottomley, 1912; Quispel, 1988; Sen, 1996). Later, Hiltner demonstrated that young alders need root nodules to survive in nitrogen-free soils. This result indicated that although alders are certainly not nitrogen fixers, they are nitrogen accumulators in the soil, and that another organism must be involved in

the nitrogen fixation process. In another study, Brewin (2002) reported that bacteria isolated from the root nodules of leguminous plants failed to infect non-leguminous plants, which proved that leguminous and non-leguminous microsymbionts are two different organisms (Brewin, 2002; Pawlowski and Bisseling, 1996). Finally, an actinomycete was identified as the non-leguminous microsymbiont (Quispel, 1990). Pommer (1956) first successfully isolated a slow-growing actinobacterium from nodules of Alnus glutinosa in pure culture, with unique morphological features such as multilocular sporangia, hyphae and vesicles. After 2-3 weeks of growth on glucose-asparagine agar, the bacterium was obtained as 0.6 mm colonies of the type described by Waksman (1950) for actinomycetes. Unfortunately, this strain was lost before it could be studied further in different independent laboratories. In the year 1964 (Becking et al., 1964; Silver, 1964), scientists observed prokaryotic structures by electron microscopy in root nodules of Alnus glutinosa and Myrica cerifera. This re-established that these root nodules were inhabited by actinomycete bacteria. Finally, Frankia was isolated for the first time by Callaham et al. (1978) from Comptonia peregrina in pure culture, followed by other workers (Diem and Dommergues, 1983; Diem et al., 1983 and Sarma et al., 1998).

Lalonde (1978) worked on Frankia strain CpI1, isolated by Callaham et al. (1978), demonstrated its ability to reinfect the host plant and established its symbiotic behaviour. The morphological characters of the bacterium isolated by Callaham et al. (1978) show an unexpected resemblance to the lost strain of Pommer, which he isolated in 1959 from A. glutinosa nodules (Benson and Silvester, 1993). Because of the slow-growing nature of Frankia, cultures were frequently contaminated by various fast-growing organisms, fungi and other actinomycetes, which were often mistaken for Frankia. To overcome this problem, Lechevalier and Lechevalier (1984) proposed the following definition of Frankia: “Actinomycetic, nitrogen fixing, nodule forming endophytes or endoparasite that have grown in pure culture in vitro and that: a) induce effective or ineffective nodules in a host plant and may be reisolated from within the nodules of that plant, b) produce sporangia containing nonmotile spores in submerged liquid culture, and may also form vesicles, c) free living actinomycetes having no

known nodule forming or nitrogen fixing capacity, but that show the morphology described above.”

Several authors (Myrold, 1994; Benson and Silvester, 1993; Akkermans and Hirsch, 1997) reported that three different kinds of cellular structure are produced by Frankia in pure culture or in symbiosis, i.e. vegetative hyphae, vesicles, and multilocular sporangia containing spores. However, Frankia infecting Casuarina does not produce vesicles in the root system of the plant (Sen, 1996).

2.4. Systematic approach of Frankia-actinorhizal symbiosis:

16S rDNA sequence studies revealed that Frankia can be divided into three clades. Apart from those three clades, there is another clade, referred to as the ‘Frankia like clade’. The existence of this clade was shown by Benson and Clawson (2000) in polymerase chain reaction (PCR) amplification studies of the 16S rDNA. Frankia has been successfully isolated from Clade I and Clade II; however, there has been no isolation of Frankia from Clade III. These data revealed that the Frankia-actinorhizal symbiosis evolved from more than one common evolutionary ancestor or ancestral group, and did so at least three or four times in the history of evolution. This evidence clearly contradicted the Doyle (1998) hypothesis that nitrogen fixation in nodules might have evolved at a definite time in the earth’s history. Swensen (1996), Benson and Clawson (2000) and Jeong et al. (2000) supported the above idea for further study. Evolutionary distance was measured using the ribosomal RNA-encoding gene (rrn gene); however, only a few rrn genes are available for studying evolutionary distance. Normand et al. (1996), Jeong et al. (1999) and Clawson et al. (2004) carried out this type of study, with different conclusions. For this type of work (Normand and Fernandez, 2008), Ochman's metric test (Ochman and Wilson, 1987) was known to be the best. According to this metric, the ancestor of Frankia evolved about 350 MY ago from a group of soil actinomycetes. The first trace of land plants was found in this period. Frankia is thought to have arisen from this ancestor after a second phase of evolution, about 100 MY ago. The first dicotyledonous plant families started appearing on the earth during this period. These findings indicate that the Frankia clusters emerged 100-200 MY ago. During this period

the oldest actinorhizal plant genera, such as Myrica and Alnus, appeared (Normand and Fernandez, 2008) and interacted with the genus Frankia to form a symbiosis. Frankia has relaxed host specificity compared to rhizobia. The host range of Frankia was found to be extremely broad, though limited to clade-to-clade interactions. Plants belonging to the Hamamelidae clade are nodulated by Clade I Frankia, Clade II Frankia nodulates plants belonging to the Elaeagnaceae and Rhamnaceae, and Rosaceae plants are nodulated by Clade III Frankia (Benson and Clawson, 2000). So, it can be concluded that this specificity largely depends on a group of signal molecules, or a group of specific molecules, that share common characters as well as a chemical backbone. Primitive actinorhizal plants of the Myrica and Gymnostoma type produce nodules in their root systems when infected by both Clade I and Clade II Frankia (Wall, 2000). This observation established the primitiveness of these plants, which were subsequently considered the most primitive actinorhizae. The habitats of Gymnostoma and Casuarina may offer a probable explanation of this infection pattern. Casuarina inhabits the drier Australian continent, while Gymnostoma branched out in the wetter Melanesian region from the Tertiary era, beginning 65 MY ago (Pawlowski and Demchenko, 2009).

2.5. Haemoglobin – a key factor in global nitrogen balance:

Kubo first identified haemoglobin (Hb) as the haemoprotein-like red pigment in the root nodule of soybean (Kubo, 1939). Further research revealed that this protein has remarkable oxygenation properties, and oxido-reductase properties as well, in which the central iron can change its valency (Burris and Haas, 1944).

Hb + O₂ ⇌ HbO₂ (oxygenation)

This property of valency change helps the protein to bind reversibly with oxygen and carbon monoxide (Jokipii-Lukkari, 2016), and it therefore functions as an excellent oxygen carrier. The presence of Hb in the root nodules of all leguminous and actinorhizal plants, with a very low partial pressure of carbon monoxide, points to the fact that Hb is somehow connected with BNF (Santi et al., 2013), by buffering the free oxygen concentration at a low tension to protect the nitrogenase from oxygen inactivation (Appleby, 1992).


2.5.1. Importance of haemoglobin in actinorhizal symbiosis:

The vesicle is the special structure produced by Frankia to protect its nitrogenase enzyme from O₂. In Casuarina nodules, Frankia does not form vesicles. Frankia has been reported to synthesize Hb to protect nitrogenase from oxygen (Beckwith et al., 2002). Symbiotic haemoglobin (sHb) in the Casuarina nodule provides a partially anoxic environment, which subsequently helps Frankia to fix nitrogen (Berg and McDowell, 1987; Jacobsen-Lyon et al., 1995). However, in pure culture, Casuarina-infective Frankia produces vesicles to protect its nitrogenase from oxygen (Myrold et al., 1994). Datisca glomerata produces truncated haemoglobin (ptHb) in Frankia-infected root nodules (Pawlowski et al., 2007). Unlike class I Hb, it is not involved in O₂ transport, but helps in nitric oxide (NO) detoxification. The occurrence of large amounts of a class I haemoglobin in alder (Sasakura et al., 2006) and Myrica gale (Heckmann et al., 2006), along with the evidence of ptHb in Datisca glomerata root nodules with Frankia, indicates that large amounts of nitric oxide are produced in actinorhizal root nodules, as seen in legume nodules (Horchani et al., 2011), which further leads to high levels of stress. Reactive oxygen species (ROS) such as hydrogen peroxide (H₂O₂) and superoxide anions (O₂⁻) are produced by Class II Hbs in root nodules, which leads to oxidative stress (Becana et al., 2000). RNAi inhibition of leghaemoglobin gene transcription in root nodules of the legume Lotus japonicus confirms this fact. It also helps to reduce H₂O₂ content (Gunther et al., 2007), along with the maintenance of free O₂, loss of nitrogenase and nitrogen fixation in root nodules. Tavares et al. (2007) have shown that Frankia contributes to the production of ROS in nodules. So, both actinorhizal and legume nodules have to cope with high levels of stress. However, research has revealed that actinorhizal plants can grow in relatively more adverse conditions than legumes, and it is therefore possible that they have an improved antioxidant-based defence mechanism (Pawlowski et al., 2011). Intercellular infection takes place at a higher rate compared to root hair infection. There are a number of bacterial factors and ecological conditions involved in the nodulation process. The ecological factors involve

water, light, nitrogen availability, soil pH, phosphate, pO₂ and pCO₂. The bacterial factors are the physiological state of the strain, the amount and concentration of the inoculum, and the nitrogen-fixing ability of the bacterium. These factors play a crucial role in the regulation, development and function of the root nodules in actinorhizal plants. Valverde and Wall (1999) showed that the tap root system has a short-term window of susceptibility for nodulation.

2.5.2. Multipurpose nature of haemoglobin:

Hb is a complex, spherical protein containing a haem b prosthetic group within the alpha-helical globular fold, attached to a proximal histidine side chain by covalent bonding. The prosthetic group of Hb contains a characteristic iron atom and is popularly known as the haem component. It is the crucial part of the molecule and plays a key role in the reversible binding of oxygen, with four of the six coordination sites occupied by the haem pyrrole nitrogens. The protein part that encompasses and protects the haem component is known as the globin segment of the molecule (Goodman et al., 1988).

Usually, Hb reminds us of mammalian blood in the circulatory system, but Hb is ubiquitous in eukaryotes as well as in many bacteria, where it plays a significant role (Dordas et al., 2003a; Dordas et al., 2003b). In fact, Hb is present in almost all forms of life and has been reported to be critically involved in oxygen transport and circulation mechanisms in eukaryotic forms of life, including mammals (Dordas et al., 2003a; Dordas et al., 2003b).

Plant haemoglobin (pHb) plays a crucial role in nitrogen fixation, diffusion and buffering of the concentration of free oxygen, supply of oxygen to developing tissues, regulation of nitric oxide scavenging activities, seed dormancy, transition to flowering, root development, etc. (Hebelstrup et al., 2007). Nevertheless, pHb largely remains behind the curtains, and there are only a few review articles about the structure and functionality of plant-related Hb.

2.5.3. History (Evolutionary Trend of haemoglobin from very beginning):

The magic molecule “Hb” has been in existence and circulation right from antiquity and is found to be very stable in nature, after being subject to severe forces of selection. Hb is probably one of the earliest proteins of the ‘Hadean era’, thought to be present in the Last


Universal Common Ancestor (LUCA). LUCA was a single-celled organism with a propensity to be much more flexible in performing its metabolic processes. LUCA was found to contain an Hb-like protein called protoglobin (Vinogradov et al., 2007). In the anoxygenic period, protoglobin in LUCA established itself in anoxygenic photosynthesizers and employed oxygen even before it existed in the free state (Vinogradov and Moens, 2008). Subsequently, protoglobin shifted from anoxygenic to oxygenic photosynthesizers, which released oxygen as a byproduct. So in the ‘Archean era’, protoglobin was one of the most important proteins, and was actually responsible for the transformation of our atmosphere from anoxygenic to oxygenic (Vinogradov et al., 2013). By the end of the ‘Archean era’, protoglobin was thought to be used by LUCA to balance oxygen homeostasis and to protect itself from antioxidant-enzyme-like by-products. Thus, it avoided probable interaction with more oxygen and acted like the sHb of modern-day nitrogen-fixing bacteria. So, probably at the junction of the ‘Archean’ and ‘Proterozoic’ eras, single domain protoglobin diversified into the oxygenic photosynthesizer type and the oxygen homeostasis type (Vinogradov and Moens, 2008).

The single domain globin of oxygenic photosynthesizers has been characterised by single or repetitive events, with a subsequently developed 2/2 fold. There was also a marked evolutionary transition from the monomeric to the oligomeric protein state (D'Alessio, 1999).

Some ancient eukaryotes, bacteria and archaea acquired this protein by lateral gene transfer (Marti et al., 2006). In the eukaryotic system, this globin subsequently developed into neuroglobins, cytoglobins, myoglobins and Hbs (Freitas et al., 2005).

2.5.4. Variation of haemoglobin in different domain of life form:

The discovery of Hbs in virtually all forms of life has revealed that, besides transporting oxygen between tissues, they execute additional functions which range from intracellular oxygen transport to catalysis of redox reactions (Vinogradov et al., 2006). These functional modifications of Hbs illustrate their acquisition of new roles through changes in the coding regions as well as in the regulatory

elements of the same genes. Although low oxygen concentrations induce many of these diverse Hbs, to date none of the molecular mechanisms for their hypoxic induction reveals common regulatory proteins (Fago et al., 2004). There are various sub-groups of the globin family, which include myoglobin, erythrocruorin, Hb beta, leghaemoglobin (LHb), Hb alpha, globin-nematode type, myoglobin-trematode type, globin-lamprey-hagfish type, Hb-extracellular type and globin-annelid type (Vinogradov et al., 2007). Eight types of globins, including Hb, neuroglobin, myoglobin, cytoglobin, androglobin, globin E, globin X and globin Y, are found in vertebrates. It is noteworthy that Hbs and myoglobins have the ability to bind oxygen reversibly, and they are the only active members of the globin family that can bind to the heme prosthetic group (Pesce et al., 2003). Nevertheless, in the case of mammals, alignments of non-coding DNA sequences did not reveal significant matches between the mammalian α- and β-globin gene clusters, which are thought to have diverged approximately 450 million years ago and are still expressed in a coordinated fashion (Anderson et al., 1996).

The α-globin gene is characterised by a CpG island, which is actually responsible for functionality, whereas the β-globin gene tends to be AT-rich (Dikshit et al., 1990; Hardison, 1998). Gene organization studies revealed the presence of three exons in mammalian Hbs (Hardison, 1998). Plant Hb sequences are similar to those of other globin proteins; interestingly, splitting of the middle exon in LHb has given that particular Hb a four-exon organization (Anderson et al., 1996).

Bacterial haemoglobin (bHb) binds oxygen at low concentration and is significantly involved in delivering oxygen to terminal respiratory oxidases such as cytochrome O (Freitas et al., 2005). Various other functions, such as sensing oxygen concentration, detoxification of NO and passing signals to transcription factors, have also been found. Apart from the above functions, btHb is directly involved in the mechanism of heme degradation and iron release (Ott et al., 2005).

Hb has also been found in another lineage of life, namely the archaeal system. Its main function includes protection from nitrosative and oxidative stress (Vinogradov et al.,


2007). However, the orientation of the B10 distal tyrosine in the archaeal system is slightly different from that in other globin proteins: it is parallel to the heme plane, which subsequently helps to decrease the oxygen binding capacity (Vinogradov et al., 2006).

2.5.5. Structural evolution of haemoglobin in various domains of life:

Hbs in different domains of life have increased their amino acid length by the addition of a C-terminal region. The accumulation of the C-terminal domain occurred with the second segment of the single domain globin. This globin was acquired by unicellular organisms and a few lower multicellular organisms such as green algae, lower eukaryotes, some protozoans, some algae like phytoplankton, archaea and bacteria. Those events occurred during the early ‘Proterozoic era’ (Vinogradov et al., 2007).

Consequently, this single domain globin developed into 3/3 fold structures, also termed ‘chimeric globins’. Higher plants probably acquired this chimeric globin by lateral gene transfer events, and it further diversified into flavoHb and globin-coupled sensor. Significant lateral gene transfer also occurred from bacterial to eukaryotic systems in the middle of the ‘Proterozoic era’ (Vinogradov and Moens, 2008). The pattern of diversification of Hbs in different living forms is illustrated in Figure 2.2.

At the end of the ‘Proterozoic era’, the bacterial system included:
3/3 flavoHbs, 3/3 single domain globins, 3/3 globin-coupled sensors, 3/3 protoglobins and 2/2 truncated Hbs.
The eukaryotic system comprised:
3/3 flavoHbs, 3/3 single domain globins and 2/2 truncated Hbs, and lacked 3/3 globin-coupled sensors and 3/3 protoglobins.
The archaeal system contained:
3/3 globin-coupled sensors, 3/3 protoglobins and 2/2 truncated Hbs, but lacked 3/3 flavoHbs and 3/3 single domain globins.

Such patterns of differential occurrence of the various types of Hbs in diverse forms of life persist to date (Kavanaugh et al., 1992). 2/2 truncated


Figure 2.2: Evolution of haemoglobins from the ancient earth with the time period

Hbs have been reported to exist in prokaryotic as well as eukaryotic clades of life, as is clearly evident from Figure 2.3.

Protoglobins are considered to have been present in LUCA (Last Universal Common Ancestor), depending on the determined globin domain length, the position of the proximal histidine and distal residues, and the chemical nature of the heme pocket (Vinogradov et al., 2007).

2.5.6. Discovery of plant haemoglobins:

Several hypotheses have been proposed regarding the history and evolution of globin. The LHbs and animal Hb genes are the result of a convergent evolution


Figure 2.3: Distribution of haemoglobins in different life systems

story, where each evolved independently and acquired identical characters (Freitas et al., 2005). Another hypothesis is that globin arose in the distant past, prior to the common ancestor of living organisms; a major portion of lineages then lost their Hb genes or, perhaps, the genes simply mutated to a great extent (Vinogradov et al., 2013). Another significant possibility is lateral gene transfer, in which the trait was acquired laterally and subsequently passed on from ancestors to descendants. However, there have also been reports that claim a vertical shift of Hb genes at the end of the ‘Proterozoic era’ (Roesner et al., 2005).

PHbs are structurally similar to animal Hbs and myoglobins and were first characterized from the root nodules of leguminous plants (Kubo, 1939). Further investigation revealed that the presence of Hbs accounted for the red pigmentation, and these proteins were termed LHb.

PHbs are abundant in both dicots and monocots. The intricacies of monocot

associated Hb functions in non-symbiotic tissues still remain in the dark. Initially, pHb proteins were thought to be restricted to plant species carrying out symbiotic nitrogen fixation, but further investigations established their presence in non-nodulating plants as well. Of the two types of Hbs, Hb-1 is expressed in leaves and roots while Hb-2 is expressed only in leaves. The biochemical properties suggest that this Hb protein probably does not function to facilitate the diffusion of oxygen (Arredondo-Peter et al., 1997).

SHb helps in the diffusion of oxygen and buffers the concentration of free oxygen to protect nitrogenase from oxygen-inactivation, and thus helps to improve soil fertility. In legumes, symbiotic nitrogen fixation occurs in specific organs called nodules. The LHbs are exclusively required for symbiotic nitrogen fixation, but not for the cellular functions of plants. Physiological analysis of nodules revealed the crucial contribution of LHb towards establishing low free-oxygen concentrations but a high energy state in nodules, a condition that is necessary for effective symbiotic nitrogen fixation. Such an observation demonstrates the strength of Hbs in plants and helps in elucidating the complexities of Hb functions in plants versus animals (Ott et al., 2005).

The ratio of oxygen to nitric oxide concentration in symbiotic nodules is maintained by the non-symbiotic haemoglobin (nsHb). Flavins present in the prosthetic group reduce globins by transferring electrons from NAD(P)H to the heme iron of the globin protein. This reduction mechanism depends on various factors such as the pyridine nucleotide, the flavin coenzyme and the class of globin, by means of a specific combination with NADH or NADPH. FAD reduces class 1 globins at a fast rate, while class 2 globins are reduced slowly. Significantly, class 3 globins are reduced more rapidly by FMN than by FAD (Sainz et al., 2013).

2.5.7. Types of plant haemoglobin:

PHb has been classified in various ways. Based on symbiotic attributes, Hb may be broadly classified into class ‘0’, sHb, nsHb and ptHb (Ohwaki et al., 2005) (Figure 2.4). The functional intricacies of the various types of Hbs are discussed later in this review.

2.5.7.1. Class ‘0’ haemoglobin:

In ancient Bryophytes, Hb is often termed class ‘0’. This coarse group

contains both penta- and hexacoordinated structures (Garrocho-Villegas et al., 2007). They contain no intron in their gene sequences like other pHbs (Reddy, 2006).

2.5.7.2. Symbiotic haemoglobin:

SHbs evolved from nsHbs (Trevaskis et al., 1997; Sturms et al., 2011a) among non-legume nodulating plants (Kubo, 1939; Appleby, 1984; Bhattacharya et al., 2013). SHbs in the plant system are found in various forms such as nitrosyl-LHb, oxy Hb, ferryl Hb, ferrous Hb etc. (Herold and Puppo, 2005; Ji et al., 1994). The major function of sHb is to facilitate oxygen diffusion, thus aiding bacterial endosymbiosis (Perazzolli et al., 2006). It also buffers the free oxygen concentration, facilitating nitrogenase

Figure 2.4: Classification of haemoglobins in plant system

to remain at a low oxygen tension state and in turn protecting nitrogenase from oxygen-inactivation (Ohwaki et al., 2005; Appleby, 1992). Higher oxygen association and lower oxygen dissociation rate constants help to achieve binding and release of oxygen along the gradient within the nodule (Igamberdiev et al., 2011; Kundu et al., 2003).

SHb mostly falls under class-II Hb, other than Parasponia Hb, which originates from class-I and is structurally and functionally similar to nsHb (Bhattacharya et al., 2013).

2.5.7.3. Leg haemoglobin:

LHb is distributed widely in leguminous plants and functions to modulate NO during symbiosis. LHbs evolved after the diversification of sHbs from nsHbs (Appleby, 1984; Perazzolli et al., 2006).

2.5.7.4. Non-symbiotic haemoglobin:

Initially it was assumed that Hb exists only in symbiotic plant systems, owing to its abundance in symbiotic root nodules (Arredondo-Peter et al., 1997; Ross et al., 2001; Appleby et al., 1988; Heckmann et al., 2006). However, a series of studies involving various plant species showed that Hb is present in virtually all kinds of plant system, although most of these Hbs are non-symbiotic (Appleby et al., 1988; Heckmann et al., 2006).

NsHbs were found to dwell mainly in leaf and stem. Nevertheless, Taylor et al. (1994) and Sowa et al. (1998) reported their presence in the roots of seedlings, particularly in root tissues exposed to flooding stress and in oxygen-stressed aleurone tissues. The presence of different types of Hb in different plant tissues indicates a function other than oxygen transportation (Arredondo-Peter et al., 1997; Ross et al., 2001).

NsHbs are structurally hexacoordinate: the distal histidine reversibly binds the iron at the sixth coordination site (Arredondo-Peter et al., 1997; Duff et al., 1997).

NsHbs have also been observed to participate actively in mitochondrial ATP synthesis (Nie and Hill, 1997).

NsHb expression is modulated by the calcium-ATPase concentration and has also been found to interfere with calcium-modulated signal transduction (Nie et al., 2006; Ross et al., 2004). Excess nsHb helps to maintain root growth and cytosolic ATP levels and to lower nitric oxide levels under hypoxia (Dordas et al., 2003a; Igamberdiev and Hill, 2004). Nitrate, nitrite, and nitric oxide donors

induce nsHb proteins to associate with NADH-nitrate reductase and enhance its function (Ohwaki et al., 2005; Igamberdiev et al., 2006a).

Phylogenetic analyses reflect two distinct classes of nsHb (Class-I and Class-II) (Trevaskis et al., 1997; Gopalasubramaniam et al., 2008), in which evolution was observed from a hexacoordinate to a pentacoordinate transitional state. Further study revealed that this classification is also associated with the affinity of Hbs towards oxygen (Ohwaki et al., 2005). Monocotyledonous plants contain exclusively the class-I nsHbs, whereas both classes of nsHbs (class-I and II) are reported in dicots (Hunt et al., 2002).

2.5.7.5. Non-symbiotic class-I haemoglobin:

Class-I nsHb shows high affinity for oxygen (Hargrove et al., 2000; Smagghe et al., 2009) and also regulates nitric oxide scavenging activities. It is thus crucially correlated with major developmental processes such as seed dormancy, transition to flowering and root development (Mendelson et al., 1994; Gupta et al., 2011), and with adaptation to abiotic and biotic stress (Baudouin, 2011). Nitrate reductase produces NO, which helps in plant adaptation and cold tolerance and also regulates the expression of genes involved in phosphatidic acid synthesis and sphingolipid phosphorylation (Cantrel et al., 2011).

Class-I nsHbs are characterized by low values of the hexacoordination equilibrium constant (KH) in comparison to other classes of Hbs (Garrocho-Villegas et al., 2007; Hebelstrup et al., 2007).

The first class-I nsHb crystal structure to be studied thoroughly was that of Oryza sativa Hb I (Hargrove, 2000; Halder et al., 2007). Class-I nsHbs usually contain one or two cysteine residues per monomer, which help in the reduction of ferric ions and protect the protein from autoxidation (Hoy and Hargrove, 2008; Igamberdiev et al., 2006b). Cysteine residues, along with sulfhydryl reagents such as reduced glutathione, facilitate the transition of the protein from the ferric to the ferrous state (Igamberdiev et al., 2005; Sturms et al., 2010) and allow nsHb I to serve as a soluble electron transport protein in the enzymatic system (Sturms et al., 2011b; Dordas et al., 2003a). Class-I nsHb has also been found to assist the maintenance of the ATP level at low oxygen concentrations (Arechaga-Ocampo et al., 2001; Trevisan et al.,


2011; Dordas et al., 2004; Sowa et al., 1998). Moreover, overexpression of class-I nsHb leads to vigorous growth, protects the plant from hypoxia and modulates the survival rate (Ohwaki et al., 2005; Hunt et al., 2002). Furthermore, highly expressed class-I nsHb was found to exhibit a wide range of functions, including hydrogen peroxide scavenging under hypoxia (Perazzolli et al., 2004; Igamberdiev et al., 2014; Yang et al., 2005), peroxidase-like activities, NADH oxidation, and S-nitrosoHb nitric oxide scavenging (Perazzolli et al., 2004; Sakamoto et al., 2004).

Expression of class-I nsHbs in the root region is sometimes repressed by fungal infection (Gupta et al., 2011; Uchiumi et al., 2002) but enhanced by conditions such as hypoxia, cold stress, rhizobial infection, plant hormones, nitric oxide and sucrose levels (Sereglyes et al., 2000; Shimoda et al., 2005; Bidon-Chanal et al., 2007; Marti et al., 2006; Igamberdiev et al., 2005; Qu et al., 2006).

Hordeum vulgare class-I nsHb has been observed bound to a cyanide ligand (Smagghe et al., 2008; Ioanitescu et al., 2005). It is evident from the Hordeum vulgare class-I nsHb structure that the ‘piston’ movement of the E-helix along the helical axis is responsible for the ligand binding properties (Smagghe et al., 2006; Bykova et al., 2006). Class-I nsHbs have been found to confer lower necrotic symptoms and higher nitric oxide scavenging activity in Nicotiana tabacum plants (Sereglyes et al., 2004; Sereglyes et al., 2000). The distal histidine of nsHb I has been reported to interact with a conserved phenylalanine of the B-helix (PheB10), resulting in disorder in the CD-region (Sasakura et al., 2006; Roesner et al., 2005; Smagghe et al., 2006). Ligand migration in class-I nsHb is regulated by suitable interaction between the hydrophobic channel and internal ligand docking sites (Bruno et al., 2007).

2.5.7.6. Non-symbiotic class-II haemoglobins:

Class-II nsHb displays a comparatively lower affinity for oxygen (Dordas et al., 2003b; Vigeolas et al., 2011) and is involved in supplying oxygen to developing tissues (Hill, 2012; Vigeolas et al., 2011).

It has generally been reported that dicot plants usually contain class-II nsHb. However, the only monocot

plant to contain class-II nsHb is Zea mays (Garrocho-Villegas and Arredondo-Peter, 2008).

The hexacoordinate arrangement in class-I nsHbs appears to be much more stable than in class-II nsHbs. This arrangement helps class-II nsHbs to increase their oxygen storage and diffusion affinities and also lowers their NO scavenging function (Kakar et al., 2011a; Garrocho-Villegas and Arredondo-Peter, 2008).

NO and oxygen occupy two ligand binding sites of class-I nsHbs, while class-II nsHbs offer only one ligand binding site, as the other site is always occupied by NO (Nienhaus et al., 2010). Protein instability regulates the exchange of ligand to the distal pocket in class-II nsHbs (Bruno et al., 2007).

The oxygen binding properties of class-II nsHbs are more similar to those of sHbs, and they display an affinity for oxygen resembling that of cytochrome oxidase. Such a predilection for oxygen binding suggests a role in oxygen diffusion (Spyrakis et al., 2011; Vigeolas et al., 2011).

It has also been revealed that under low external oxygen concentration the expression of class-II nsHb increases up to 5-fold and prevents fermentation (Hunt et al., 2001; Sakamoto et al., 2004; Thiel et al., 2011), which signifies that class-II nsHbs are deeply involved in seed oil production and in supplying oxygen to developing seeds (Hoy and Hargrove, 2008; Trevaskis et al., 1997; Roesner et al., 2005).

2.5.7.7. Comparative relationship between symbiotic and non-symbiotic haemoglobins:

SHbs differ structurally from nsHbs and myoglobin, as their points of ligand orientation are completely different. NsHbs and myoglobin depend on their distal pockets, whereas sHb depends on its proximal pocket for ligand regulation (Kakar et al., 2011b; Kundu et al., 2003). In myoglobin the distal histidine is found to be bound to a stabilised water molecule by a strong hydrogen bond (Kundu and Hargrove, 2003); the water is subsequently removed, paving the way for another ligand to bind. In sHb, on the contrary, water is not stabilized in the distal pocket. The proximal histidine has been found to bear an eclipsed orientation with respect to the haem pyrrole nitrogens in nsHbs and myoglobins, whereas in sHb the proximal histidine increases the ligand affinity by maintaining a definite orientation (Quillin et al., 1993; Takano, 1977; Kundu et al.,


2003; Tarricone et al., 1997).

2.5.7.8. Plant truncated (Class-III) haemoglobin:

PtHbs are distributed across almost the entire plant kingdom, from the ancient algal ancestor to the recent higher plants (Couture et al., 1994). They have very low affinity for oxygen and are therefore classified as class III Hb (Hunt et al., 2001); however, their function is somewhat obscure as of now. PtHb shows very low similarity to class-I and class-II Hbs and possesses 40-45 % sequence similarity to the structural motif of btHbs. BtHb sequences are usually shorter than the usual vertebrate and non-vertebrate Hbs and myoglobins, though their length differs extensively from case to case (Ascenzi et al., 2007). However, ptHbs possess genes that are longer than the genes encoding 3-on-3 Hbs (Jokipii-Lukkari et al., 2009), and a 2-on-2 sandwich system is found instead of the classical 3-on-3 Hb fold (Vinogradov et al., 2006).

According to researchers, the ‘2 on 2 Hb fold’ originated by the division of the classical ‘3 on 3 alpha globin fold’ (Holm and Sander, 1993; Baudouin, 2011; Cantrel et al., 2011).

PtHbs possess glycine-glycine motifs in the region of the globin fold, with glutamine E7 and tyrosine B10, which stabilize the ligand (Pesce et al., 2003). The surface of the protein is connected to the distal haem pocket by a polar tunnel with xenon binding sites at the tunnel entry and at the tunnel branches (Milani et al., 2004a; Milani et al., 2004b).

Some plant species contain two ptHbs: the first is usually found in the root region and resembles sHbs in its expression, whereas the second is found in the base of the nodules, vascular tissues and mycorrhizal roots. However, the function of these Hbs still remains unexplored (Vieweg et al., 2005; Watts et al., 2001). Based on the functional patterns, scientists have proposed that the function of ptHb might be related to the suppression of defence processes against symbiosis and also to the detoxification of nitric oxide (Vieweg et al., 2005; Pawlowski et al., 2007).

In the oxygenated state ptHb shows a pentacoordinate arrangement (Watts et al., 2001), while a hexacoordinate arrangement has also been seen during reduction and deoxygenation (Couture et al., 1999; Das et al., 1999). Conversion from the hexa- to the pentacoordinate arrangement in ptHbs

takes around 20 minutes, which is extremely slow compared to other classes of Hbs (Wittenberg et al., 2002; Gupta et al., 2011).

2.5.8. Resemblance of plant haemoglobins with different forms of globin protein:

Characterization of pHbs showed similarities of class-I and II Hbs with animal myoglobin, and of class-III Hbs with btHb (Watts et al., 2001). Structural analysis and intron-exon arrangement revealed that Barley (class-I) and Rice (class-II) Hbs resemble animal myoglobin and Hb (Holm and Sander, 1993; Hoy et al., 2007; Perutz, 1979), which further established that two conserved introns are prevalent both in the animal system (myoglobin and Hb genes) and in the plant system (sHb and nsHb) (Mathieu et al., 1998).

Additionally, plant symbiotic and nsHbs have been reported to contain a third intron that resembles those of the neuroglobin and cytoglobin genes of vertebrates and the Hb genes of invertebrates (Burmester et al., 2002; Roesner et al., 2005). This indicates that the three-intron arrangement is the primitive version of the Hb genes, from which the central intron might have been deleted in the course of evolution. Comprehensive sequence- and structure-based studies suggest that class-I and II plant nsHbs and animal myoglobin and Hb might have been derived from a common unicellular ancestor, estimated to have lived around 1500 million years ago (Moens et al., 1996).

2.5.9. Structural rearrangement of plant haemoglobin:

Hbs may also be classified on the basis of the coordination of the haem iron. SHbs and nsHbs are homologous to the 3/3 fold of animal Hb, and the remaining pHbs have the 2/2 fold and are called ‘ptHb’ (Vazquezlim et al., 2012). 3/3 Hbs emerged from green algae even before the evolution of embryophytes (Meakin et al., 2007). 2/2 Hbs originated from ancient organic photosynthesizers and further interacted with photosystems I and II, an interaction that has continued to date (Vinogradov et al., 2006). Duplication, diversification, and functional adaptations have been attributed to evolutionary patterning in the plant kingdom. In the middle of the ‘Proterozoic era’, diversification of plant nsHb genes into nsHb-1 and nsHb-2 occurred, and these then emerged gradually in monocot and dicot plants

respectively, followed by subsequent evolution into sHbs at 600 mya (Arredondo-Peter, 2011). Rearrangement of heme-Fe coordination paved the way for the successful transition from hexa- to pentacoordination, which subsequently led from the non-symbiotic to the symbiotic arrangement of pHbs (Nakajima et al., 2005). The systematic order of Hbs in the plant system from the algal ancestor to date is shown in Figure 2.5.

In plants, pentacoordinate and hexacoordinate Hbs are predominant. Erythrocyte Hb and other oxygen-transporting Hbs possess a pentacoordinate structure. The haem prosthetic group contains an iron atom with six coordination sites, of which four are occupied by the haem pyrrole nitrogens. Usually two histidine side chains are attached to the prosthetic group of Hb (Trent and Hargrove, 2002).

When the proximal histidine coordinates the fifth position, leaving the sixth position free to bind exogenous diatomic gas ligands such as nitric oxide and oxygen, the pentacoordinate structure of Hb is formed. Pentacoordinate Hb has the ability to transport its bound ligands and thus supports scavenging, sensing, detoxification and electron transfer (Arredondo-Peter et al., 1998).

In the case of hexacoordination, both the distal and the proximal histidine occupy the haem active site. The proximal histidine occupies the fifth position as usual, while the binding of the distal histidine is reversible, which also allows gaseous exogenous ligands to bind. Therefore, the pentacoordinate state provides a more favourable condition for the storage and transportation of oxygen than the hexacoordinate state (Gupta et al., 2011).

2.6. Computational biology of haemoglobin proteins associated with nitrogen fixation:

Along with the development of various computational techniques, a large amount of data has been made available in the public domain. This data includes nucleic-acid and amino-acid sequences of Hbs from a wide range of plants and microbes. However, very little is known about the structure and role of all these Hb proteins. X-ray crystallography and NMR spectroscopy are the most commonly used methods to determine pHb and bHb protein structures experimentally. However, the tertiary structures of a large number of Hb proteins from different plants and


Figure 2.5: Systematic order of haemoglobins in plant system from the Algal ancestor till date

microbes, particularly symbiotic, non-symbiotic and truncated Hbs, have not yet been resolved. The exact working mechanism of these plant and bHb proteins is also relatively unknown owing to the difficulty of obtaining crystals of oxygenated Hbs. This is because the resting state of the protein does not bind oxygen, and because of the competitive binding of O2 and NO to Hb. Additionally, a number of discrepancies have also emerged regarding the X-ray crystallographic protein structures (Chang et al., 2012). In this regard, an alternative approach to predicting the three-dimensional protein structure was introduced: homology-based virtual protein modelling. Homology modelling is a consistent technique that can predict the three-dimensional structure of a protein with an accuracy similar to that obtained at low resolution by experimental means (Marti-Renom et al., 2000).

This computer-based bioinformatics technique of protein modelling depends on various parameters such as the selection of a template for the query protein and the alignment of the protein sequence of unknown structure (target). The technique is particularly useful in the case of slow-growing organisms and their proteins, which pose difficulties during protein purification. The homology modelling technique was introduced by Browne et al. (1969). A large number of homology models of proteins with different folds and functions have been reported since the mid 1980s (Johnson et al., 1994; Sali, 1995). The protein models generated by homology modelling are quite useful in providing conformational properties and structure-function relationships.

The three-dimensional protein structures of plant Hbs as well as bHbs are often considered ideal model systems for the study of oxygen binding properties, electron transfer mechanisms, complex metal cluster assembly, protein-protein interactions, structure-function relationships and, last but not least, plant-microbe association, especially the actinorhizal symbiosis.

Molecular dynamics simulations offer detailed information about the molecular motional properties over a given time period and are widely used to study protein motions at the atomic level. The first protein simulation, of 9.2 ps, was carried out by McCammon et al.


(1977). This research group ran bovine pancreatic trypsin inhibitor (BPTI) in a molecular dynamics simulation to understand the motional properties of that inhibitor. Case and Karplus (1979) worked on a dynamics study of ligand binding to a heme protein. The first application of normal modes to identify low-frequency motions of proteins was made by Brooks and Karplus (1983). This was achieved through the oscillations obtained after energy minimization with the molecular mechanics force field of the protein. The first simulation of a protein in a virtual water box was done by Levitt and Sharon (1988).

Metalloproteins like pHbs are responsible for many vital functions. Structure-based studies have focused mainly on the storage role, the involvement in electron-transfer processes and the binding properties of these proteins.

On the basis of the availability of their 3D structures, the structural divergence of these proteins has also been studied, apart from molecular dynamics simulation. The studies revealed that various pHb and bHb proteins have structural, functional and mechanistic similarities as well as evolutionary relationships. The structures of various homologs of pHb and bHb proteins have been utilized for phylogenetic reconstruction based on a structural dissimilarity (RMSD) matrix (Boyd et al., 2011).

2.7. In-silico analysis involved in computational biology:

A phylogenetic approach is one of the best techniques in computational biology for understanding the evolution of proteins. It is useful for constructing the phylogenetic tree of divergently evolved proteins. Phylogenetic approaches have mainly been based on similarities in nucleic acid as well as protein sequences. However, previous studies revealed that the origin and extended distribution of pHbs and bHbs are confusing from a phylogenetic perspective. This is because of factors that confound molecular phylogeny, such as sequence divergence, paralogy, and horizontal gene transfer (Raymond et al., 2004). This confusion led to the postulation that sequence-based phylogeny alone is not enough to disclose the complex evolutionary trend in pHbs and bHbs. Many workers (Nadler, 1995; Qi et al., 2004; Sims et al., 2009) supported the view that sequence-based alignment methods can be erroneous. Hence alternative approaches emerged.
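As a rough illustration of how a tree can be derived from a structural dissimilarity (RMSD) matrix, the following minimal sketch applies UPGMA (average-linkage) clustering to a small matrix and returns the topology as a Newick string. The clustering method is an assumption chosen for simplicity and is not necessarily the one used in the studies cited above; the labels `Hb_A` to `Hb_D` and the RMSD values are hypothetical and serve only to show the mechanics. The pairwise RMSDs are assumed to have been computed beforehand by a structure superposition tool.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster a symmetric distance matrix with UPGMA; return a Newick topology."""
    # Each active cluster id maps to (newick_string, number_of_leaves).
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # Pairwise distances between active clusters, keyed by the pair of ids.
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        (name_a, size_a), (name_b, size_b) = clusters.pop(a), clusters.pop(b)
        d.pop(pair)
        # Distance from the merged cluster to every remaining cluster is the
        # size-weighted average of the two old distances (UPGMA rule).
        for k in clusters:
            d_ak = d.pop(frozenset((a, k)))
            d_bk = d.pop(frozenset((b, k)))
            d[frozenset((next_id, k))] = (
                (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
            )
        clusters[next_id] = (f"({name_a},{name_b})", size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical pairwise RMSD values (in angstroms) for four globin structures.
labels = ["Hb_A", "Hb_B", "Hb_C", "Hb_D"]
rmsd = [
    [0.0, 1.8, 1.2, 4.0],
    [1.8, 0.0, 1.5, 4.2],
    [1.2, 1.5, 0.0, 3.9],
    [4.0, 4.2, 3.9, 0.0],
]
tree = upgma(labels, rmsd)
print(tree)
```

Only the branching order is reported here; branch lengths, bootstrap support and the choice between UPGMA and neighbour-joining would all need to be considered in a real analysis.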


A suitable alternative to sequence alignment is the structure-based phylogenetic approach. It is a well-known fact that, in homologous proteins, the features of protein conformational structure are more highly conserved than the amino acid sequences (Chothia et al., 1986; Hubbard and Blundell, 1987). Various lines of research have demonstrated that homologous proteins maintain similar structure as well as function, though they may diverge beyond recognition at the level of their amino acid sequences. Various protein families such as the short-chain alcohol dehydrogenases (Breitling et al., 2001) and the metallo-β-lactamases (Garau et al., 2005) show low sequence similarity but retain similar protein folds as well as broad biochemical features. A phylogenetic approach based on 3D structures has been utilized for the characterization of the functional properties of cupin folds (Agarwal et al., 2009). It was found that structure-based phylogenetic analysis reflects the functional clustering of the cupin superfamily.

Therefore, structure-based approaches can be employed to assess the phylogenetic relationships of pHb proteins with many other bHb proteins with diverse biological functions, which share low sequence similarity but high structural similarity. Along with the evolution of pHb and bHb, a feature that needs attention is the functional divergence of the proteins involved in this biological process. Experimental works by various workers (Gu, 1999; Dermitzakis and Clark, 2001; Raes and Van de Peer, 2003) have shown that a gene duplication event leads to a shift in protein function from the ancestral function. As a result, the functional constraints on some residues are altered, which subsequently leads to functional divergence. This is the reason behind the different evolutionary rates found at these sites, which vary among the homologous genes of a gene family. Site-specific altered evolutionary rates can be detected by comparing the rate correlation between gene clusters, when the phylogeny is given (Gu, 1999). This approach was earlier exploited by Gribaldo and co-workers (2003) to trace the functional divergence in vertebrate Hbs. Alpha subunits of G-protein (Zheng et al., 2007), the OPR gene family in plants (Li et al., 2009) and the anoctamin family of

membrane proteins (Milenkovic et al., 2010) have also been studied by this method. But a broad picture of the functional divergence in the pHb and bHb protein family is still unavailable.

2.8. Challenges and future prospects:

Considerable progress has been made in understanding the machinery of pHbs in the last decade. The major part of the research has focused on the diversity of actinorhizal Hbs relative to other pHbs and on the elucidation of the compositions and functions of all of the pHb gene products. In the past, the detection of Hb genes from different types of plants, especially the symbiotic and truncated Hbs of actinorhizal plants, and the subsequent crystallization of the protein were the major problems in plant Hb research. But as we entered the post-genomics era, the major hurdles have been removed. So the challenge now is to bring all the known information together and, with the combined application of biochemistry, genetics and bioinformatics techniques, to determine the molecular properties and functions of the protein at the basic level. With the rapid increase in the number of pHb genes, along with the complete genomes of varied bacteria in the public domain, computational tools act as a powerful weapon for handling the unresolved properties of actinoHb (actinorhizal pHbs and bHbs).

With the help of in-silico protein modelling techniques and new algorithms associated with structural divergence, the problems associated with the interpretation of sequence data and the functional evolution of pHbs and bHbs can be resolved in a better way, which subsequently opens new avenues of research. Studies aided by bioinformatics tools offer a global view of the expression, regulation, dynamics and evolution of the proteins and of their evolutionary rates, and offer new opportunities to preserve and improve biotic resources.

MATERIALS AND METHODS Chapter 3 Materials and Methods

3.1. Ecology of Alnus nepalensis in sub-Himalayan West Bengal and Sikkim:

3.1.1. The study area:

The Eastern Himalayas are famous for their richness in biodiversity. The study area was selected in view of its rich biodiversity and comprises parts of the eastern Himalayas: the Darjeeling and Kalimpong districts of West Bengal, and Gangtok with its adjoining areas in Sikkim. Darjeeling and Kalimpong districts lie in the northernmost part of West Bengal, a state in eastern India, in the foothills of the Himalayas. The study area of Darjeeling and Kalimpong lies between 26°31′ and 27°13′N latitude and 87°59′ and 88°53′E longitude (Das and Ghose, 2011). It is situated on the flanks of the eastern Himalayas and is bordered by the Tibetan plateau in the north, Nepal in the west, Bhutan in the east and the Jalpaiguri and Dinajpur districts of West Bengal in the south (Figure 3.1). Geographically, the districts are mostly mountainous, with plains known as the Terai. Darjeeling district includes three subdivisions: Kurseong, Mirik and Siliguri. Kurseong and Mirik subdivisions are hill terrain, while Siliguri lies in the plains (Table 3.1). Gangtok is the capital of Sikkim, the tiny 22nd Himalayan state of India, located between 27°04′46″ and 28°07′48″N latitude and 88°00′58″ and 88°55′25″E longitude (Basistha et al., 2010). It is situated on the edges of the Eastern Himalayas and is enclosed by the Tibetan plateau in the north, Nepal in the west, Bhutan in the east and Darjeeling in the south. The state is predominantly mountainous, with no plain land, and is spread over 7096 sq km (Basistha et al., 2010). However, the study area covers Gangtok and adjoining regions, popularly known as East Sikkim, which lies between 27°17′25″N and 27°20′11″N latitude and 88°35′27″E and 88°36′55″E longitude (Table 3.1).

3.1.2. Environmental studies:

Literature regarding the sampling intensity of Alnus nepalensis in the study area is extremely lacking. A

Figure 3.1: Map route of study area (panels: East and South Sikkim; Darjeeling and Kalimpong). Marked sites include Enchey monastery, Upper Sichey, Burtuk, Gairi Gaon, Tadong, Deorali, Rani Pool, Manpur, Sukhia Pokhri, Pashupati and Mirik.


total of 18 different sites were selected throughout the study area to record different geophysico-chemical parameters. The morphological characters of the plants were also recorded. The sites were named as per the information collected from local people. Each study site included slopes, vertical slopes, valleys, landslide areas and non-riverside areas. Geophysico-chemical parameters such as altitude, latitude and longitude were recorded with a Global Positioning System (GPS) receiver (Garmin eTrex Vista H). Other data such as aerial temperature and humidity were recorded using a digital thermometer (Multi thermometer, CE make) and a humidity meter (Model LR6 Mignon AA, England make), soil pH and moisture with a soil pH and moisture meter (Takemura, Japan make, model DM-15), and soil temperature with a soil thermometer (standard field equipment). Soil quality, colour, topography, site cover, vegetation type, management, species association, etc. were studied in each of the 18 sites.

Table 3.1: Detailed information on the study area

| Study area | Altitude (ft) | Rainfall (mm) | Soil type | Ecological region |
|---|---|---|---|---|
| Darjeeling | 5473-7431 | 82 - 3092 | Loamy | Temperate |
| Kurseong | 4888-5473 | 36 - 3736 | Sandy | Warm & temperate |
| Mirik | 5267 | 24 - 2876 | Clayish | & temperate |
| Kalimpong | 3811-7598 | 34 - 2395 | Clayish | Mild, warm & temperate |
| Gangtok | 3926-4857 | 56 - 3626 | Rocky | Mild - temperate |
| Namchi | 4154 | 50 - 2699 | Loamy | Temperate |

3.1.3. Germplasm collection:
The germplasm for carrying out experiments based on the objectives was collected from various locations and altitudes of the geophysico-chemical study sites. Fully developed fruits and seeds were collected during the month of April and stored in a sterile glass bottle. Leaves were collected at the same time for DNA isolation.

3.1.4. Soil collection:
Soils were collected from the nodule collection sites of A. nepalensis in the month of April. For details of the collection sites, refer to Table 3.2.

3.1.5. Soil analysis:
Essential data such as aerial temperature, pH, soil temperature, humidity, soil texture and colour were recorded during the study of geophysico-chemical parameters. Soil collected

from a depth of 8-10 inches below the surface (after removing the topmost soil) was stored in a tightly sealed and labelled screw-cap bottle. Different soil parameters such as sulphur, pH, nitrogen (Bremner, 1996), organic carbon (Walkley, 1947), potash (Pratt, 1968) and phosphate (Truog, 1930) content were estimated in the laboratory.

3.2. Genetic diversity of A. nepalensis:
3.2.1. Material collection:
Fresh leaves of A. nepalensis were randomly collected from different zones of natural populations of sub-Himalayan West Bengal and Sikkim. The collected leaves were wrapped in aluminium foil and preserved in a cryogenic container (CRYOSEAL-IR-7, Model no. 023017) containing liquid nitrogen. To explore the genetic diversity of A. nepalensis, the study area was divided into three parts based on geographical proximity. The 1st zone of natural population includes Kurseong, Sonada and Ghoom. Darjeeling, the western region of Ghoom, Sukhiya Pokhri, Pashupati and Mirik were considered as the 2nd zone, and the 3rd zone includes Gangtok, Rishop,

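Table 3.2. Geographical location from where soil samples were collected (Sl no = Serial number)

| Sl no. | Zone | Site name | Altitude (ft) | Latitude (°N) | Longitude (°E) |
|---|---|---|---|---|---|
| 1 | East Sikkim (Gangtok and adjoining region) | Enchey monastery | 5909 | 27.3357 | 88.6176 |
| 2 | East Sikkim (Gangtok and adjoining region) | Upper Sichey | 5428 | 27.3349 | 88.6144 |
| 3 | East Sikkim (Gangtok and adjoining region) | Lower Burtuk | 5183 | 27.3471 | 88.6126 |
| 4 | East Sikkim (Gangtok and adjoining region) | Deorali | 4574 | 27.3219 | 88.6063 |
| 5 | East Sikkim (Gangtok and adjoining region) | Gairi Gaon | 4418 | 27.3144 | 88.6012 |
| 6 | East Sikkim (Gangtok and adjoining region) | Tadong | 3720 | 27.3203 | 88.5967 |
| 7 | East Sikkim (Gangtok and adjoining region) | Upper Tadong | 4131 | 27.3118 | 88.5972 |
| 8 | East Sikkim (Gangtok and adjoining region) | Samdur | 3569 | 27.2975 | 88.5923 |
| 9 | East Sikkim (Gangtok and adjoining region) | Rani pool | 3926 | 27.2966 | 88.5958 |
| 10 | Kalimpong | Kalimpong | 5088 | 27.0864 | 88.4983 |
| 11 | Darjeeling | Mongpoo | 3616 | 26.9788 | 88.3601 |
| 12 | Darjeeling | Kurseong | 4888 | 26.8863 | 88.2777 |
| 13 | Darjeeling | Sonada | 6238 | 26.9577 | 88.2695 |
| 14 | Darjeeling | Ghoom | 7347 | 27.0081 | 88.2594 |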


Kalimpong and Mongpoo. Collection of plant parts was done in mid-April. A. nepalensis leaves were collected from 17 plants of population 1, 14 plants of population 2 and 12 plants of population 3. In the sampling area, plants were located approximately 4-7 meters apart from each other (Table 3.3).

3.2.2. Genomic DNA isolation from A. nepalensis:
Previously stored A. nepalensis leaves were taken out of the cryogenic container (CRYOSEAL-IR-7, Model no. 023017) and used for isolation of genomic DNA by the following steps.

Five (5) grams of fresh leaves were taken in a mortar and pestle and ground into a fine powder with liquid N2.

The pulverized material was transferred to an Oakridge tube containing 15 ml of pre-warmed (65°C) CTAB extraction buffer (refer appendix C for composition) and gently swirled.

The tube was then incubated in a water bath (Rivotek) for 1 hour at 65°C with occasional mixing by gentle swirling.

Following this, an equal volume of chloroform (Merck India, Cat# 822265) : isoamyl alcohol (Merck India, Cat# 8.18969.1000) (24:4) was added and the contents were mixed gently by inverting the tube.

The mixture was centrifuged (REMI make, Model No. C-24) at 6000 rpm for 15 minutes at 25°C and the supernatant was transferred carefully to a fresh tube.

An equal volume of ice-cold isopropanol (Merck India, Cat # 17813) was added to the final supernatant.

Upon gentle swirling, the DNA-CTAB complex precipitated as a whitish network and was spooled out of the solution using a bent Pasteur pipette. (For some samples, the DNA-CTAB complex was not observed at this stage; in these cases DNA precipitation was observed after addition of ice-cold isopropanol, incubation at 4°C for 30 minutes and centrifugation at 12000 rpm for 15 minutes at 4°C.)

The DNA was then washed in 70% ethyl alcohol (BDH Cat#10107) and allowed to dry in air. It was finally dissolved in 500 µl of 1X TE buffer (pH 7.4).

3.2.3. Purification of DNA:
RNA, protein and polysaccharides are the most important contaminants found in crude DNA preparations, and it is important to remove them as they will hamper the further downstream


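Table 3.3: Detailed description of collection sites (Sl no = Serial number)

| Sl no. | Population | Region | Site name | Altitude (ft) | Latitude (°N) | Longitude (°E) |
|---|---|---|---|---|---|---|
| I-1 | Population I | Kurseong | Baghgora | 4959 | 26.879509 | 88.278594 |
| I-2 | Population I | Kurseong | Hill cart road | 4905 | 26.880552 | 88.277335 |
| I-3 | Population I | Kurseong | Montiviot | 4888 | 26.886389 | 88.277721 |
| I-4 | Population I | Kurseong | St. Marys post office | 5096 | 26.894273 | 88.279787 |
| I-5 | Population I | Kurseong | Chaita Pani Tea garden | 5383 | 26.903776 | 88.292014 |
| I-6 | Population I | Kurseong | Edenvale Tea garden | 5473 | 26.916533 | 88.289731 |
| I-7 | Population I | Sonada | Lower Sonada | 5842 | 26.936511 | 88.290210 |
| I-8 | Population I | Sonada | Sonada khasmahal | 6238 | 26.957725 | 88.269532 |
| I-9 | Population I | Sonada | Sonada post office | 6389 | 26.960081 | 88.271320 |
| I-10 | Population I | Sonada | Tibetian monastery | 6700 | 26.966750 | 88.274087 |
| I-11 | Population I | Sonada | Sonada forest | 7174 | 26.971903 | 88.280224 |
| I-12 | Population I | Sonada | Upper Sonada | 6915 | 26.984010 | 88.278330 |
| I-13 | Population I | Ghoom | Ghoom | 7091 | 26.989249 | 88.270367 |
| I-14 | Population I | Ghoom | Dooteria forest | 7033 | 26.995121 | 88.259360 |
| I-15 | Population I | Ghoom | Senchal | 7362 | 27.003892 | 88.260622 |
| I-16 | Population I | Ghoom | Bhalikhop | 7347 | 27.008180 | 88.259458 |
| I-17 | Population I | Ghoom | Ghoom monastery | 7293 | 27.011612 | 88.250448 |
| II-1 | Population II | Darjeeling | Saint Josephs College | 6492 | 27.059512 | 88.250667 |
| II-2 | Population II | Darjeeling | Richmond hill | 6451 | 27.057831 | 88.256503 |
| II-3 | Population II | Darjeeling | Chauk bazar | 6904 | 27.047997 | 88.265206 |
| II-4 | Population II | Darjeeling | Pandam Limbu | 6577 | 27.042177 | 88.269932 |
| II-5 | Population II | Darjeeling | Limbugaon | 7052 | 27.038853 | 88.266641 |
| II-6 | Population II | Darjeeling | Jalapahar | 7431 | 27.03207 | 88.264865 |
| II-7 | Population II | Darjeeling | West point | 6738 | 27.026802 | 88.254201 |
| II-8 | Population II | Darjeeling | Batasia loop | 6968 | 27.018308 | 88.248157 |
| II-9 | Population II | Ghoom | Lepcha jagat | 6952 | 27.010235 | 88.196913 |
| II-10 | Population II | Ghoom | Majdhura | 7077 | 27.004332 | 88.171849 |
| II-11 | Population II | Sukhia pokhri | Sukhia pokhri | 7102 | 26.997813 | 88.167230 |
| II-12 | Population II | Pashupati | Mim Nagri | 6756 | 26.973733 | 88.130303 |
| II-13 | Population II | Pashupati | Pashupatinagar | 6669 | 26.946805 | 88.125566 |
| II-14 | Population II | Mirik | Mirik | 5267 | 26.886290 | 88.187986 |
| III-1 | Population III | Gangtok | Lower sichey | 4857 | 27.336767 | 88.606937 |
| III-2 | Population III | Gangtok | Tadong | 4131 | 27.311819 | 88.597223 |
| III-3 | Population III | Gangtok | Namchi | 4154 | 27.160503 | 88.369701 |
| III-4 | Population III | Gangtok | Ranipool | 3926 | 27.296625 | 88.595803 |
| III-5 | Population III | Rishop | Rishop | 7598 | 27.183197 | 88.521206 |
| III-6 | Population III | Rishop | Icha forest | 6663 | 27.126697 | 88.580521 |
| III-7 | Population III | Kalimpong | Delo | 5088 | 27.086460 | 88.498359 |
| III-8 | Population III | Kalimpong | Baghdhara | 3811 | 27.067951 | 88.471776 |
| III-9 | Population III | Kalimpong | Sunwar | 3880 | 27.037848 | 88.447891 |
| III-10 | Population III | Mongpoo | Upper Mongpoo | 6144 | 26.972187 | 88.339870 |
| III-11 | Population III | Mongpoo | Chinchona plantation | 3643 | 26.973322 | 88.370000 |
| III-12 | Population III | Mongpoo | Mongpoo bazar | 3616 | 26.978898 | 88.360149 |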

processing. The CTAB DNA extraction buffer helps to eliminate polysaccharides from DNA preparations to a large extent. RNA was removed by treating the sample with RNase. Extraction with phenol : chloroform following RNase treatment was also employed to eliminate RNA and most of the proteins, using the following protocol.

3.2.3.1. Protein purification:
For removal of protein from DNA samples, the dissolved DNA was extracted with an equal volume of equilibrated phenol (pH 8.0) (Sigma, Cat#P4557-400ML) and mixed properly for 10-15 minutes. It was then centrifuged at 12000 rpm for 15 minutes at 25°C.

The upper aqueous layer was taken in a fresh tube, extracted with an equal volume of chloroform : isoamyl alcohol (24:1) and then centrifuged at 10000 rpm for 10 minutes at 25°C.

The upper aqueous layer was taken in a fresh tube. To it, 0.1 volume of 3M sodium acetate (pH 5.2) (Sigma, Cat#S-9513) and a double volume of ice-cold absolute ethyl alcohol (BDH Cat#10107) were added, and the DNA was precipitated at 4°C for 30 minutes in a cooling centrifuge (REMI make, Model No. C-24) at 12,000 rpm. The pellet obtained was washed in 70% ethyl alcohol, dried and dissolved in 50 µl of 1X TE buffer (pH 7.4) (refer appendix C for composition).

3.2.3.2. Purification of RNA:
RNase-A (50 µg/ml) (Sigma, Cat#R-4875) was added to the genomic DNA dissolved in 500 µl of 1X TE buffer (pH 7.4), and the mixture was incubated at 37°C for 1 hr in a dry bath (GeNeiTM make, Cat#107173).

An equal volume of chloroform : isoamyl alcohol (24:1) was added and mixed properly.

The mixture was centrifuged at 10000 rpm for 15 minutes at room temperature.

The upper aqueous layer was transferred to a fresh microcentrifuge tube.

To this aqueous phase, 0.1 volume of 3M sodium acetate (pH 5.2) and a double volume of ice-cold ethanol (100%) were added for DNA precipitation.

It was then centrifuged at 13000 rpm for 30 minutes at 4°C.

The DNA pellet obtained was washed in 80% ethyl alcohol, air dried and finally dissolved in 50 µl of 1X TE (pH 7.4).
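The precipitation steps in Sections 3.2.3.1 and 3.2.3.2 give reagent amounts relative to the recovered aqueous phase. The following Python sketch shows that arithmetic only. It assumes that both the 0.1 volume of 3M sodium acetate and the double volume of ethanol are measured against the aqueous phase alone; the function name `precipitation_volumes` and the 500 µl example are hypothetical and are not taken from the thesis.

```python
def precipitation_volumes(aqueous_ul, salt_fraction=0.1, ethanol_multiple=2.0):
    """Return (sodium acetate, ethanol, total) volumes in microlitres.

    Assumption: both reagents are scaled to the aqueous phase volume,
    not to the volume after adding the salt.
    """
    if aqueous_ul <= 0:
        raise ValueError("aqueous phase volume must be positive")
    salt_ul = aqueous_ul * salt_fraction        # 0.1 volume of 3M sodium acetate
    ethanol_ul = aqueous_ul * ethanol_multiple  # double volume of ice-cold ethanol
    total_ul = aqueous_ul + salt_ul + ethanol_ul
    return salt_ul, ethanol_ul, total_ul


if __name__ == "__main__":
    # Hypothetical example: 500 µl of aqueous phase recovered after extraction.
    salt, ethanol, total = precipitation_volumes(500.0)
    print(f"sodium acetate: {salt:.0f} µl, ethanol: {ethanol:.0f} µl, total: {total:.0f} µl")
```

A helper of this kind is mainly useful for checking that the final mixture still fits the tube being used.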


3.2.4. Quantification of DNA using spectrophotometer:
DNA quantification is important for several applications in molecular biology, including amplification of target DNA by polymerase chain reaction and complete digestion of DNA by restriction enzymes. DNA quantification is generally carried out by spectrophotometric measurement or by agarose gel analysis. Both methods were employed in the present study.

The spectrophotometer (Thermo UVI spectrophotometer, Thermo Electron Corporation, England, UK) was calibrated at 260 nm and 280 nm using 600 µl of 1X TE buffer in a cuvette (*Photon cell, New Jersey, USA).

DNA (6 µl diluted in 594 µl of 1X TE) was taken in a cuvette and the optical density (OD) was recorded at both 260 nm and 280 nm.

The DNA concentration (ng/µl) was calculated using the following formula:

Amount of DNA (ng/µl) = (OD260 × 50 × DF) / 1000 (DF stands for “dilution factor”).

The quality of DNA was judged from the OD values recorded at 260 nm and 280 nm. DNA showing an A260/A280 ratio of around 1.8 was chosen for further PCR-based methods.

3.2.5. DNA analysis by gel electrophoresis:
Pure molecular biology grade, DNase-free agarose (0.8%, gelling temperature 36°C) was used to cast the gel in 0.5X TBE (Tris-Borate-EDTA) buffer (refer appendix C for composition) containing 7 µl of ethidium bromide (10 mg/ml) on a gel platform (100 × 70 mm) (Tarsons, Cat # 7024).

Five microlitres (5 µl) of each DNA sample was mixed with 3 µl of 6X gel loading dye (refer appendix C for composition) and loaded carefully into the well.

Lambda DNA/EcoRI/HindIII double digest (2 µl) and a 100 bp ladder were loaded in the adjacent well as molecular markers to determine the size of the genomic DNA.

The gel was run at 50 volts (V) and 100 milliamperes for 1.5 hours in a midi submarine electrophoresis unit (Tarsons, Cat #7050) connected to an electrophoresis power supply unit (Tarsons, Cat #7090).

After the run, the gel was visualized on a UV transilluminator (GeNeiTM, cat #SF805).
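The spectrophotometric calculation of Section 3.2.4 can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the thesis. It assumes the common convention that one A260 unit corresponds to about 50 µg/ml of double-stranded DNA, so that OD260 × 50 × DF is in µg/ml (numerically the same as ng/µl) and dividing that value by 1000 gives µg/µl. The function names, the example readings and the ±0.1 tolerance around the 1.8 purity ratio are hypothetical.

```python
def dilution_factor(sample_ul, diluent_ul):
    """Total volume divided by sample volume (6 µl in 594 µl gives 100)."""
    return (sample_ul + diluent_ul) / sample_ul


def dna_concentration_ng_per_ul(od260, df, ug_per_ml_per_od=50.0):
    """Concentration of the undiluted sample.

    Assumption: 1 A260 unit ~ 50 µg/ml double-stranded DNA;
    µg/ml is numerically identical to ng/µl.
    """
    return od260 * ug_per_ml_per_od * df


def purity_ratio(od260, od280):
    """A260/A280 ratio used to judge DNA quality."""
    return od260 / od280


def passes_purity(od260, od280, target=1.8, tolerance=0.1):
    # The tolerance is a hypothetical cut-off; the text only says "around 1.8".
    return abs(purity_ratio(od260, od280) - target) <= tolerance


if __name__ == "__main__":
    df = dilution_factor(6, 594)                     # dilution described in the text
    conc = dna_concentration_ng_per_ul(0.05, df)     # hypothetical OD260 reading
    print(f"DF = {df:.0f}, {conc:.1f} ng/µl = {conc / 1000:.3f} µg/µl")
    print("purity acceptable:", passes_purity(0.05, 0.028))
```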


The molecular size of the genomic DNA was detected in the form of bands. The size of the bands was estimated with Photo Capt Version 12.4 (Vilber Lourmat, USA).

3.2.6. Gel Photography:
The gel was photographed using an indigenously built gel documentation system fitted with a Canon SLR camera (EOS 350D) bearing a Marumi orange filter (58 mm YA2, Marumi, Japan). The software used for the purpose was EOS Utility.

3.2.7. RAPD-PCR analysis:
RAPD-PCR was done to study the genetic diversity of A. nepalensis. A total of 40 random oligonucleotide primers (Chromus Biotech make) were screened on forty-three (43) samples of A. nepalensis collected from the different areas under study. Thirty-four (34) primers amplified successfully and were used for downstream amplifications (Table 3.4).

3.2.8. PCR-RAPD amplification:
In a sterile 0.2 ml thin-walled PCR tube (Tarson, Cat #500050), the following components were added sequentially, in the order given below, for a PCR reaction volume of 25 µl.

ReadyMix TM Taq PCR reaction mix with MgCl2: 12.5 µl
Primer: 1.25 µl (0.25 µM)
Template DNA: 2 µl (25 ng)
Pyrogen-free water: to a final volume of 25 µl

One negative control tube was also prepared with the PCR mix without DNA.

The ingredients were mixed evenly in a SpinWin PCR microcentrifuge (Tarson, Cat# 1000).

The PCR reactions were performed in an Applied Biosystems 2720 Thermal Cycler.

The amplification consisted of 35 cycles with the following specifications:

Cycle 1: Denaturation at 94°C for 4 minutes, primer annealing at 58°C for 1 minute, primer extension at 72°C for 1 minute.
Cycle 2-34: Denaturation at 94°C for 1 minute, primer annealing at 58°C for 1 minute, primer extension at 72°C for 1 minute.
Cycle 35: Denaturation at 94°C for 1 minute, primer annealing at 58°C for 1 minute, primer extension at 72°C for 7 minutes.

3.2.9. RAPD-PCR gel analysis:
All the PCR products of RAPD, after their respective amplifications, were


separated on a 1.5% agarose gel containing 7 µl of ethidium bromide solution, run in 0.5X TBE buffer (pH 8.0).

PCR product (12 µl) was mixed well with 4 µl of gel loading dye (refer appendix C for composition), loaded in the agarose gel and run for 2.5 hours.

Lambda DNA/EcoRI/HindIII double digest (2 µl) and a 100 bp (base pair) ladder (2 µl) were used as molecular markers to determine the size of genomic DNA.

The estimation of band sizes of the genomic DNA and photography were done as mentioned earlier.

Table 3.4: Primer sequences producing successful amplification

| Primer ID | Sequence (5′-3′) | Primer ID | Sequence (5′-3′) |
|---|---|---|---|
| OPA 1 | CAGGCCCTTC | OPB 1 | GTTTCGCTCC |
| OPA 2 | TGCCGAGCTG | OPB 2 | TGATCCCTGG |
| OPA 3 | AGTCAGCCAC | OPB 3 | CATCCCCCTG |
| OPA 4 | AATCGGGCTG | OPB 4 | GGACTGGAGT |
| OPA 5 | ATTTTGCTTG | OPB 5 | TGCGCCCTTC |
| OPA 7 | GAAACGGGTG | OPB 6 | TGCTCTGCCCC |
| OPA 9 | GGGTAACGCC | OPB 7 | GGTGACGCAG |
| OPA 10 | GTGATCGCAG | OPB 8 | GTCCACACGG |
| OPA 11 | CAATCGCCGT | OPB 10 | CTGCTGGGAC |
| OPA 12 | TCGGCGATAG | OPB 11 | GTAGACCCGT |
| OPA 13 | CAGCACCCAC | OPB 12 | CCTTGACGCA |
| OPA 15 | TTCCGAACCC | OPB 13 | TTCCCCCGCT |
| OPA 16 | AGCCAGCGAA | OPB 14 | TCCGCTCTGG |
| OPA 17 | GACCGCTTGT | OPB 15 | GGAGGGTGTT |
| OPA 18 | AGGTGACCGT | OPB 17 | AGGGAACGAG |
| OPA 19 | CAAACGTCGG | OPB 18 | CCACAGCAGT |
| OPA 20 | GTTGCGATCC | OPB 20 | GGACCCTTAC |

The PCR was carried out thrice and only the clear and reproducible bands were compared with the adjacent

marker DNA to estimate the sizes.

3.2.10. Data analysis:
The RAPD-PCR fingerprints were scored in binary form, i.e. the presence of a band as 1 and its absence as 0, and assembled into a data matrix. The data were primarily analysed with the NTSYSpc2 and POPGENE programme packages. Only distinct RAPD bands were recorded. Genetic variation in terms of genetic similarity as well as distance amongst individuals in each population and within population was also calculated using binary matrix analysis.

3.3. Characterization of actinohaemoglobins (in-silico):
3.3.1. Characterization of actinorhizal haemoglobins:
3.3.1.1. Sequences retrieval:
Haemoglobin (Hb) genes of actinorhizal plants (Alnus firma, Casuarina glauca, Myrica gale and Datisca glomerata) and nonactinorhizal plants present in the public domain were downloaded and compared during the period of study. Nucleotide and amino acid sequences, along with their annotations, of various symbiotic, non-symbiotic and truncated types of plant haemoglobins (pHb) were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/) database. The accession numbers and details of the retrieved sequences are listed in Table 3.5. Five actinorhizal and 91 nonactinorhizal symbiotic haemoglobins (sHbs), non-symbiotic haemoglobins (nsHbs) and plant truncated haemoglobins (ptHbs) were selected for further study.

3.3.1.2. Analysis of physiochemical parameters:
Physiochemical data of land plant Hbs were generated with the ProtParam software (Gasteiger et al., 2005) on the ExPASy server. Amino acid sequences in FASTA format were used for further analysis. Different tools on the proteomics server (ProtParam and Compute pI/Mw) were applied to determine physiochemical properties such as protein length, molecular weight (kilodalton), oxygen binding capacity, thermostability and hydrophobicity of the different types of pHbs.

3.3.2. Characterization of actinobacterial haemoglobins:
3.3.2.1. Sequence Retrieval:
Nucleic acid and amino acid sequences were selected from the JGI-IMG (http://img.jgi.doe.gov/) database on the basis of 100 different type-strains


Table 3.5: List of land plant haemoglobins used in present study (NsHb: Non-symbiotic haemoglobin; SHb: Symbiotic haemoglobin; LHb: Leg haemoglobin; PtHb: Plant truncated haemoglobin)

| Plant name | Protein type | Accession number |
|---|---|---|
| Alnus firma | NsHb-1 | BAE75956.1 |
| Arabidopsis thaliana | NsHb-1 | AAB82769.1 |
| A. thaliana | NsHb-2 | AAB82770.1 |
| A. thaliana | PtHb | NP_567901.1 |
| Astragalus sinicus | LHb-2 | ABB13622.1 |
| Brachypodium distachyon | NsHb-1 | XP_003558445.1 |
| B. distachyon | PtHb | XP_003563697 |
| Brassica napus | NsHb-2 | AAK07741.1 |
| Canavalia lineata | LHb-2 | AAA18503 |
| C. lineata | PtHb | ACQ91204.1 |
| Casuarina glauca | NsHb-1 | CAA37898.1 |
| C. glauca | SHb-2 | P08054.2 |
| Ceratodon purpureus | NsHb-1 | ABK41124.1 |
| Chamaecrista fasciculata | NsHb-1 | ABR68293 |
| Cichorium intybus × Cichorium endivia | NsHb-2 | CAA07547.1 |
| Citrus unshiu | NsHb-1 | AAK07675 |
| Datisca glomerata | PtHb | CAD33536 |
| Euryale ferox | NsHb-1 | AAQ22728.1 |
| E. ferox | NsHb-2 | AAQ22729.1 |
| Eutrema halophilum | NsHb-2 | BAJ33934.1 |
| E. halophilum | PtHb | BAJ34404.1 |
| Glycine max | LHb-2 | CAA23730.1 |
| G. max | LHb-2 | CAA23731.1 |
| G. max | LHb-2 | CAA23732.1 |
| G. max | LHb-2 | AAA33980.1 |
| G. max | NsHb-1 | AAA97887.1 |
| G. max | PtHb | AAS48191 |
| Gossypium hirsutum | NsHb-1 | AAX86687.1 |
| G. hirsutum | NsHb-2 | AAK21604.1 |
| Hordeum vulgare | NsHb-1 | AAB70097.1 |
| H. vulgare | PtHb | AAK55410.1 |
| Lotus japonicus | LHb-2 | BAB18108.1 |
| L. japonicus | LHb-2 | BAB18107.1 |
| L. japonicus | LHb-2 | BAB18106.1 |
| L. japonicus | NsHb-1 | BAE46739.1 |
| Lupinus luteus | LHb-2 | AAC04853.1 |
| Malus domestica | NsHb-1 | AAP57676 |
| Malus hupehensis | NsHb-1 | ACV41424 |
| Medicago sativa | LHb-2 | AAA32659.1 |
| M. sativa | NsHb-1 | AAG29748.1 |
| Medicago truncatula | LHb-2 | CAA40899.1 |
| M. truncatula | LHb-2 | CAA40900.1 |
| M. truncatula | PtHb | XP_003603592.1 |
| Myrica gale | NsHb-1 | ABN49927.1 |
| Oryza sativa | NsHb-1 | AAK72229.1 |
| O. sativa | NsHb-1 | AAC49881.1 |
| O. sativa | NsHb-1 | AAK72230.1 |
| O. sativa | NsHb-1 | AAK72231.1 |
| O. sativa | PtHb | NP_001057972.1 |
| Parasponia andersonii | NsHb-1 | AAB86653.1 |
| P. andersonii | SHb-1 | 1212354A |
| Parasponia rigida | NsHb-1 | P68169 |
| Phaseolus vulgaris | LHb-2 | AAA33767.1 |
| Physcomitrella patens | NsHb-1 | ABK20873.1 |
| P. patens | PtHb | XP_001781680.1 |
| P. patens | PtHb | XP_001760820.1 |
| Picea sitchensis | NsHb-1 | ABR17163 |
| P. sitchensis | PtHb | ABK22150 |
| Pisum sativum | LHb-2 | BAA31156 |
| Populus tremula × Populus tremuloides | NsHb-1 | ABM89109.1 |
| P. tremula × P. tremuloides | PtHb | ABM89110.1 |
| Populus trichocarpa | NsHb-1 | XP_002313074.1 |
| P. trichocarpa | PtHb | XP_002309574.1 |
| Psophocarpus tetragonolobus | LHb | AAC60563.1 |
| Pyrus communis | NsHb-1 | AAP57677 |
| Quercus petraea | NsHb-1 | ABO93466 |
| Raphanus sativus | NsHb-1 | AAP37043 |
| Rheum australe | NsHb-1 | ACH63214 |
| Ricinus communis | NsHb-1 | EEF43319.1 |
| R. communis | PtHb | XP_002516587.1 |
| R. communis | PtHb | XP_002537252.1 |
| R. communis | PtHb | XP_002539183.1 |
| Selaginella moellendorffii | NsHb-1 | EFJ10590.1 |
| S. moellendorffii | PtHb | EFJ07410.1 |
| Sesbania rostrata | LHb-2 | CAA31859.1 |
| S. rostrata | LHb-2 | CAA32043.1 |
| Solanum lycopersicum | NsHb-1 | AAK07676.1 |
| S. lycopersicum | NsHb-2 | AAK07677.1 |
| Sorghum bicolor | PtHb | EER89990.1 |
| Trema orientalis | NsHb-1 | CAB16751.1 |
| Trema tomentosa | NsHb-1 | CAA68405.1 |
| Trema virgata | NsHb-1 | CAB63706.1 |
| Triticum aestivum | NsHb-1 | AAN85432.1 |
| T. aestivum | PtHb | ACH86231.1 |
| Vicia faba | LHb-2 | CAA90870.1 |
| Vicia sativa | LHb-2 | CAA70431.1 |
| Vigna unguiculata | LHb-2 | AAA86756.1 |
| V. unguiculata | LHb-2 | AAB65769.1 |
| Vitis vinifera | NsHb-1 | CBI32537.3 |
| V. vinifera | NsHb-1 | CBI32538.3 |
| V. vinifera | PtHb | XP_002284484.1 |
| Wolffia arrhiza | NsHb-1 | AEQ39061 |
| Zea mays ssp. mays | NsHb-1 | AAG01375.1 |
| Z. mays ssp. mays | NsHb-1 | AAZ98790.1 |
| Z. mays ssp. mays | PtHb | ACG29525.1 |
| Zea mays ssp. parviglumis | NsHb-1 | AAG01183.1 |
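Section 3.3.1.2 derives properties such as protein length and hydrophobicity from FASTA sequences of the proteins in Table 3.5. As a rough illustration of what such a calculation involves, the Python sketch below parses FASTA text and computes the sequence length and the grand average of hydropathicity (GRAVY) on the Kyte-Doolittle scale, which is one of the values ProtParam reports. It is not the ProtParam implementation; the function names and the toy sequence are hypothetical, and the toy sequence is not one of the haemoglobins used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy over the standard residues of a sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


if __name__ == "__main__":
    toy_fasta = ">toy_protein (hypothetical)\nMGAL\nKVSE\n"
    for name, seq in read_fasta(toy_fasta).items():
        print(name, "length:", len(seq), "GRAVY:", round(gravy(seq), 3))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value an overall hydrophilic one.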


Table 3.6. Different type-strains of actinobacterial genome selected for present study (Sl no. = Serial number)

Columns: Sl no.; Taxon_oid; Genome Name / Sample Name; Family; Type strain

Sl no. 1-26
Genome names: Rhodococcus equi; Bifidobacterium dentium; Mycobacterium leprae; Nocardia farcinica; Corynebacterium ulcerans; Corynebacterium jeikeium; Mobiluncus curtisii; Bifidobacterium longum; Corynebacterium diphtheriae; Bifidobacterium bifidum; Mycobacterium massiliense; Bifidobacterium asteroides; Corynebacterium resistens; Corynebacterium variabile; Mycobacterium tuberculosis; Gardnerella vaginalis; Bifidobacterium breve; Mycobacterium intracellulare; Tsukamurella paurometabola; Segniliparus rotundus; Gordonia polyisoprenivorans; Bifidobacterium animalis; Catenulispora acidiphila; Arcanobacterium haemolyticum; Corynebacterium aurimucosum; Corynebacterium glutamicum
Families: Nocardiaceae; Segniliparaceae; Mycobacteriaceae; Tsukamurellaceae; Catenulisporaceae; Actinomycetaceae; Bifidobacteriaceae; Corynebacteriaceae
Taxon_oid: 637000085; 650716029; 643348566; 646564587; 643692018; 639279306; 651053043; 637000198; 649633089; 646564566; 648028043; 649633015; 651053005; 646311910; 642555107; 649633046; 644736339; 646564505; 2517093035; 2519103109; 2511231114; 2511231126; 2517093032; 2519899775; 2513237236; 2512564042

Sl no. 27-50
Genome names: Frankia sp. CN3; Frankia sp. CcI3; Frankia sp. EuI1c; Frankia sp. EAN1pec; Candidatus Frankia datiscae Dg1; Frankia alni; Rhodococcus opacus; Rhodococcus jostii; Rothia mucilaginosa; Rhodococcus erythropolis; Blastococcus saxobsidens; Modestobacter marinus; Tropheryma whipplei; Acidothermus cellulolyticus; Rothia dentocariosa; Mycobacterium gilvum PYR-GCK; Mycobacterium vanbaalenii; Kineococcus radiotolerans; Amycolicicoccus subflavus; Jonesia denitrificans; Brachybacterium faecium; Nakamurella multipartita; Geodermatophilus obscurus; Stackebrandtia nassauensis
Families: Jonesiaceae; Frankiaceae; Nocardiaceae; Micrococcaceae; Kineosporiaceae; Nakamurellaceae; Acidothermaceae; Glycomycetaceae; Mycobacteriaceae; Dermabacteraceae; Cellulomonadaceae; Geodermatophilaceae
Taxon_oid: 646564571; 640753031; 637000330; 644736331; 644736376; 649633093; 646311951; 644736393; 646311931; 643692033; 637000234; 646564564; 637000115; 637000116; 641228492; 649633045; 639633001; 640427122; 639633044; 650716009; 2506783011; 2508501039; 2512564033

Sl no. 51-75
Genome names: Nocardioides sp. JS614; Leifsonia xyli; Kocuria rhizophila; Arthrobacter aurescens; Salinispora tropica; Propionibacterium acnes; Verrucosispora maris; Arthrobacter chlorophenolicus; Microlunatus phosphovorus; Microbacterium testaceum; Arthrobacter phenanthrenivorans; Micrococcus luteus; Kytococcus sedentarius; Cellulomonas flavigena; Kribbella flavida; Sanguibacter keddieii; Actinoplanes missouriensis; Micromonospora aurantiaca; calvum; Arthrobacter arilaitensis; Saccharomonospora viridis; Beutenbergia cavernae; Xylanimonas cellulosilytica; Clavibacter michiganensis; Propionibacterium freudenreichii
Families: Micrococcaceae; Nocardioidaceae; Dermacoccaceae; Beutenbergiaceae; Microbacteriaceae; Sanguibacteraceae; Cellulomonadaceae; Pseudonocardiaceae; Propionibacteriaceae; Micromonosporaceae; Promicromonosporaceae
Taxon_oid: 646564565; 637000149; 650716057; 646311968; 643692008; 646564520; 640427108; 649633057; 649633006; 639633005; 643348509; 644736380; 644736390; 650377905; 642555133; 648028042; 640427140; 650716103; 650716058; 644736404; 651053058; 639633046; 646311938; 649633084; 2513237176

Sl no. 76-100
Genome names: Thermobifida fusca; Streptomyces scabiei; Olsenella uli; Streptomyces coelicolor; Streptomyces cattleya; Streptomyces avermitilis; Rubrobacter xylanophilus; Streptomyces bingchenggensis; Eggerthella lenta; Thermomonospora curvata; Pseudonocardia dioxanivorans; Actinosynnema mirum; Nocardiopsis alba; Thermobispora bispora; Saccharopolyspora erythraea; Streptomyces griseus; Cryptobacterium curtum; Atopobium parvulum; Conexibacter woesei; Kitasatospora setae; Slackia heliotrinireducens; Acidimicrobium ferrooxidans; Amycolatopsis mediterranei; Streptosporangium roseum; Nocardiopsis dassonvillei
Families: Nocardiopsaceae; Rubrobacteraceae; Coriobacteriaceae; Conexibacteraceae; Acidimicrobiaceae; Streptomycetaceae; Pseudonocardiaceae; Streptosporangiaceae; Thermomonosporaceae
Taxon_oid: 646311917; 637000248; 644736405; 648028047; 644736358; 644736346; 644736327; 646311958; 637000319; 646311963; 644736322; 637000305; 641522653; 646564557; 646862346; 646564576; 637000304; 640069329; 651053061; 646564581; 644736323; 2511231200; 2518645556; 2511231086; 2511231181
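Sections 3.3.2.4 and 3.3.2.5 rely on G+C content, GC3s, the effective number of codons and the codon adaptation index for the Hb genes of the strains in Table 3.6. The Python sketch below illustrates how such indices can be computed for a single coding sequence. It is a simplified illustration under stated assumptions, not a reimplementation of CodonW or CAI Calculator2: GC3s is taken over codons other than ATG, TGG and the stop codons; the Nc function is the relation of equation (1) with S read as GC3s, following Wright (1990); and the codon weights passed to `cai` are hypothetical values that would in practice be derived from a reference set such as ribosomal protein genes.

```python
import math

# Codons without synonymous alternatives (Met, Trp) and the three stop codons.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def codons(seq):
    """Split a coding sequence into complete codons, ignoring a trailing remainder."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def gc_content(seq):
    """Fraction of G and C bases in the whole sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3s(seq):
    """Fraction of G or C at the third position of synonymous codons."""
    third = [c[2] for c in codons(seq) if c not in NON_SYNONYMOUS]
    return sum(base in "GC" for base in third) / len(third)


def nc_from_gc3s(s):
    """Equation (1): Nc = 2 + S + 29 / (S^2 + (1 - S)^2)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def cai(seq, weights):
    """Equation (2): exponential of the mean log relative adaptedness.

    Only codons present in `weights` are scored; L is their count.
    """
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    toy_gene = "ATGGCTGCCGCTTAA"                  # hypothetical sequence
    toy_weights = {"GCT": 1.0, "GCC": 0.5}        # hypothetical weights
    s = gc3s(toy_gene)
    print("GC:", round(gc_content(toy_gene), 3))
    print("GC3s:", round(s, 3), "Nc from eq. (1):", round(nc_from_gc3s(s), 2))
    print("CAI:", round(cai(toy_gene, toy_weights), 3))
```

Equation (1) peaks at S = 0.5, so genes whose observed Nc lies well below the value it returns for their GC3s show more codon bias than base composition alone would predict.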

MATERIALS AND METHODS 56 of actinobacterial genome (Table 3.6). been investigated by several potent

3.3.2.2. Haemoglobin identification: indices like amount of Guanine and Cytosine in the nucleotide sequence Hb genes and proteins, present within (G+C content), frequency of G and C those strains were identified by using in the third position of codons (GC3s) the JGI-IMG database. Moreover have and effective number of codons (Nc) NCBI database was used to filter-out (Wright, 1990). All these parameters additional Hb genes and proteins were calculated by the software present within the selected CodonW (Ver. 1.4.2) (Peden, 1999). actinobacterial strains. The parameters are crucial for 3.3.2.3. Secondary information determining the level of codon usage collection: bias in studied genomes and emphasize

Necessary secondary information’s like the factors affecting codon usage Gene Id, Locus Tag, Pfams, COG pattern. Categories, Product Name, Gene The effective number of codons (Nc) is Symbol, DNA Sequence Length, a parameter to measure the biasness of Amino Acid Sequence Length of synonymous codons. It is a quantitative actinobacterial haemoglobins (bHbs) calculation which reflects the were assembled from JGI-IMG occurrence of a minute subset of database. codons used by a gene (Wright, 1990) Based on the predominant lifestyle, and its value ranges from 20 (on usage selected bHbs were segregated into of one codon for each amino acid) to different niche, hence named as 61 (on usage of all the codons with different species biotopes. Biotope is equal occurrence without the an area of uniform environmental termination codons). Nc was computed conditions providing a living place for as in equation (1). a specific assemblage of Nc=2+S+{29/[S2+(1-S)2]} ------(1) microorganisms (Sen et al., 2014). 3.3.2.5. Prediction of expression 3.3.2.4. Codon usage variation pattern: analysis: The codon adaptation index (CAI) is a The codon usage discrepancy in frequently used measurement of codon various actinobacterial genomes have usage within a gene relative to a

reference set, i.e. ribosomal protein genes. The CAI values were computed using CAI Calculator2 (Wu et al., 2005), a web-based application. The resulting CAI value varies between 0 and 1.0 (Sharp and Li, 1987). A higher CAI value indicates that the gene of interest has a codon usage pattern closer to that of the reference set of genes. CAI is usually calculated according to equation (2).

$$\mathrm{CAI} = \exp\left(\frac{1}{L}\sum_{k=1}^{L}\ln \omega_k\right) \qquad (2)$$

In equation (2), $\omega_k$ signifies the relative adaptedness of the kth codon and L represents the number of synonymous codons present in the gene.

Ideally, the reference set for CAI calculation is composed of highly expressed genes. The CAI value therefore provides an indication of gene expression level, under the assumption that translational selection optimizes gene sequences according to their expression levels. Ribosomal proteins are generally highly expressed in a given genome and are consequently taken as the reference set for CAI calculation.

3.3.2.6. Percentile calculation:

The Nc, GC, GC3s and CAI percentiles of Hb genes in each strain were calculated using the pre-calculated NC-AVG, GC-AVG, GC3s-AVG and CAI-AVG values to understand the comparative expression level of bHb genes according to their biotopes.

3.3.2.7. Functional annotation:

BACMAP (http://bacmap.wishartlab.com/) and the UniProt server (http://www.uniprot.org/) were used to identify different functional properties associated with bHbs.

3.3.2.8. Function based phylogeny:

A binary matrix was prepared on the basis of the molecular and biological functions served by bHbs, i.e. the presence of a particular band was scored as 1 and its absence as 0, and the scores were assembled into a data matrix. Subsequently, a similarity matrix and then a dendrogram were prepared by analyzing the similarity coefficient matrix generated by NTSYSpc2 software.

3.4. Comparative study amongst actinohaemoglobins at gene level:

3.4.1. Identification of domains and motifs:

For domain search, the Pfam site (http://www.sanger.ac.uk/software/pfam/search.html) was used. The amino acid sequences from both plants

and bHbs (Table 3.5, 3.6) were selected, aligned and subjected to BLOCK MAKER (Henikoff et al., 1998) for domain analysis. The blocks were then fed to the MEME suite (Bailey et al., 2015) for motif elicitation, followed by a MAST (http://meme.ebi.edu.au/meme/cgi-bin/meme.cgi) search using the conserved blocks from the MOTIF server of BLOCK MAKER. The conserved protein motifs were characterized for biological function analysis using protein BLAST. InterProScan, which provides the best possible match based on the highest similarity score (Quevillon et al., 2005), was also used for further study.

3.4.1.1. Motif based binary tree:

A binary matrix was prepared by scoring the motifs present in the plant and actinobacterial systems, i.e. the presence of a motif was scored as 1 and its absence as 0, and the scores were assembled into a data matrix. Subsequently, the similarity was calculated by analyzing the similarity coefficient matrix generated by NTSYSpc2 software.

3.4.2. Amino acid sequence comparison:

The amino acid sequences of actinorhizal sHb, nsHb and ptHbs, along with some selected pHbs and bHbs, were compared using CLUSTALW2 (Thompson et al., 1994) software to identify the conserved residues amongst actinorhizal Hbs.

3.5. Homology modeling of actinorhizal plant haemoglobin proteins:

3.5.1. Template selection and model building:

The amino acid sequences of actinorhizal nsHbs (A. firma, C. glauca, M. gale), sHb (C. glauca) and ptHb (D. glomerata) were retrieved from the sequence database of NCBI. Thorough scanning of the sequences revealed that the actinorhizal proteins of interest were not available in the Protein Data Bank (PDB) (http://www.rcsb.org/pdb/home/home.do), and therefore the protein 3D structures were constructed using the homology modeling technique. Template selection was done using PSI-BLAST (position specific iterative BLAST) (Altschul et al., 1997) against the PDB database. Various parameters, such as the quality of the template structure, environmental likeness and phylogenetic similarity, were considered while choosing the desired template. Alignment of the template and target sequences was carried out

with the help of the software CLUSTALW2. Crude three-dimensional all-atom models were constructed using Modeller 9v7, 9v10 (Webb and Sali, 2014) from the sequence alignment between the template sequence and the target sequence. Energy minimization was done by selecting the desired parameters to satisfy the spatial restraints acquired from the alignment (Centeno et al., 2005). These restraints, obtained on the basis of homology, are generally improved by stereo-chemical restraints on bond lengths, bond angles, dihedral angles and non-bonded atom-atom contacts that are obtained from a molecular mechanics force field (Oliveira et al., 2011).

The I-TASSER server (Yang and Zhang, 2015) for protein structure and function prediction was also used to construct the protein structure where similarity with the template was very low.

3.5.2. Up-gradation of the crude model structures:

Refinement of the crude protein models was necessary because rough models constructed by the homology modeling technique often contain a certain amount of error, which may mislead if the residues concerned are present in functionally important segments (Centeno et al., 2005). Energy minimization was applied to all the protein atoms, using the steepest descent and conjugate gradient methods to eliminate existing bad contacts within the protein structures. All these computational analyses were performed using the Swiss-PDB Viewer package (Kaplan and Littlejohn, 2001) in vacuo with the GROMOS96 43B1 parameter set. Hydrogen bonds were not considered for the final model.

3.5.3. Model evaluation:

The model was analyzed further with a series of programs to ensure its internal stability and reliability. The Ramachandran plot (Ramachandran et al., 1963) was constructed to examine the possible conformations of the φ and ψ angles of the modeled proteins. The quality of the model was checked using ProSA (Wiederstein and Sippl, 2007), ERRAT analysis (Colovos and Yeates, 1993) and VERIFY3D (Eisenberg et al., 1997). The refined models were then submitted to the ProFunc server (Laskowski et al., 2005) to identify functionally important regions, and the presence of pockets in the structure was predicted using the CASTp server (Dundas et al., 2006). The 3v website was also used to calculate the

volume of modeled proteins.

3.5.4. Conformational dynamics study:

The structural dynamics were studied using various web-based algorithms. The slowest mode of each protein was calculated using the software WEBnm (Hollup et al., 2005), and the related deformation energies were also determined. ElNemo (Suhre and Sanejouand, 2004) was utilized to perform Normal Mode Analysis (NMA) of the proteins, which identifies the modes that contribute to the movements of the corresponding protein. NMA predicts the probable movements of the proteins and aids in selecting the slowest motions of the proteins (Hollup et al., 2005). The lowest frequency modes were examined using MolMovDB (Alexandrov et al., 2005). Solvent accessibility of the amino acid residues in the modelled proteins was determined using the ASA-view algorithm (Ahmad et al., 2004).

3.6. Evolutionary trend of plant truncated haemoglobins:

3.6.1. Structural resemblance determination among plant and bacterial haemoglobins:

DALI, a structural alignment server (Holm and Sander, 1993), was employed to explore the similarity

Table 3.7: Structural resemblance analysis (ptHb=plant truncated haemoglobin; nsHb=non-symbiotic haemoglobin; sHb=symbiotic haemoglobin; LHb=leghaemoglobin; bHb=actinobacterial haemoglobin)

| Name | Type | PDB-ID |
|---|---|---|
| Datisca glomerata | ptHb | Modelled |
| Oryza Sativa | nsHb | 1d8u |
| Glycine max | LHb | 1fsl |
| Casuarina glauca | sHb | Modelled |
| Myrica gale | nsHb | Modelled |
| Zea mays | nsHb | 2r50 |
| Alnus firma | nsHb | Modelled |
| Trema tomentosa | nsHb | 3qqq |
| Arabidopsis thaliana | nsHb | 3zhw |
| A. thaliana | ptHb | 4con |
| Mycobacterium tuberculosis | bHb | 1idr |
| Bacillus subtilis | bHb | 1ux8 |
| Vitreoscilla stercoraria | bHb | 1vhb |
| Geobacillus stearothermophilus | bHb | 2bkm |
| Thermobifida fusca | bHb | 2bmm |
| Agrobacterium tumefaciens | bHb | 2xyk |
| Tetrahymena pyriformis | bHb | 3aq5 |
| Chlamydomonas moewusii | bHb | 1dly |

between the 3D structures of the modelled proteins and those of some selected Hb proteins already present in the PDB (Table 3.7) on the basis of their e-values. Simultaneously, a robust multiple structural alignment algorithm, MUSTANG (Konagurthu et al., 2006), was used to carry out the structural superposition. A Lesk-Hubbard graph was used to plot the backbone structure (positions of C-alpha atoms and their RMSD values) of some randomly selected pHbs and bHbs.

3.6.1.1. Structure based phylogenetic tree:

Crystal structures of some selected pHbs and bHbs (Table 3.8) were retrieved and submitted to the MUSTANG server, followed by the FITCH program of the PHYLIP package (Felsenstein, 1989), to explore the structure based phylogenetic closeness amongst them.

3.6.2. Functional divergent analysis:

3.6.2.1. Data collection and multiple sequence alignments:

Amino acid and nucleotide sequences of selected Hb proteins from plant and actinobacterial species (Table 3.5, 3.6) were selected for functional divergence analysis. A total of two hundred and seventeen protein sequences were subjected to multiple alignment using ClustalW 1.83 (Thompson et al., 1994). A phylogenetic tree was constructed based on the neighbor-joining method using the software MEGA 4 (Tamura et al., 2007) as well as the PHYLIP package.

Table 3.8: Structure based phylogenetic analysis amongst selected plant haemoglobins and actinobacterial haemoglobins (bHb=bacterial haemoglobin; ptHb=plant truncated haemoglobin; sHb=symbiotic haemoglobin; nsHb=non-symbiotic haemoglobin; LHb=leghaemoglobin)

| Name | Type | PDB-ID |
|---|---|---|
| Vitreoscilla stercoraria | bHb | 1vhb |
| Datisca glomerata | ptHb | Modelled |
| Mycobacterium tuberculosis | bHb | 1idr |
| Arabidopsis thaliana | ptHb | 4con |
| Thermobifida fusca | bHb | 2bmm |
| Glycine max | LHb | 1fsl |
| Lupinus luteus | LHb | 2gdm |
| Oryza Sativa | nsHb | 1d8u |
| Trema tomentosa | nsHb | 3qqq |
| Myrica gale | nsHb | Modelled |
| Alnus firma | nsHb | Modelled |
| Casuarina glauca | sHb | Modelled |
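The backbone comparison described in Section 3.6.1 rests on C-alpha RMSD values obtained after structural superposition. A minimal sketch of that quantity is given here, assuming the two structures have already been superposed and their C-alpha atoms paired one-to-one (the kind of output a structural alignment tool would provide). The function name `ca_rmsd` and the coordinates are hypothetical and purely illustrative; they are not taken from the modelled proteins.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) C-alpha coordinates that are assumed to be superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates (in angstroms), for illustration only:
# the second chain is the first one shifted by 1 angstrom along z.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(structure_1, structure_2))
```

In this toy case every paired atom is displaced by the same distance, so the RMSD equals that displacement. Real comparisons would read the paired coordinates from the superposed PDB files.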

3.6.2.2. Functional divergence detection:

Hb sequence duplication events were tested for type I functional divergence based on the method of Gu and Vander (2002). The study was carried out with the Diverge program version 2.0. This method uses maximum likelihood procedures to estimate functional divergence, which was calculated for each position in the alignment. To detect amino acid residues reflecting functional divergence, Hb subfamilies were compared pair-wise with each other. The highest scoring residues were successively eliminated from the alignment to determine the cut-off value for the posterior probability, until the functional divergence coefficient dropped to zero. Residues calculated to be functionally divergent were finally confirmed and tabulated.

3.7. Expression study of A. nepalensis haemoglobin in different plant regions:

3.7.1. Collection of Germplasm:

Germplasm in two forms, viz. seeds and nodules of A. nepalensis, was collected from different sites in the hilly region of West Bengal, mainly in Darjeeling and Kalimpong districts, in the month of April. Nodules borne on A. nepalensis roots (source of Frankia) were collected along the Siliguri – Kurseong – Darjeeling – Pasupati – Mirik – Siliguri route, and seeds were collected from healthy plants near

Figure 3.2: Naturally occurring A. nepalensis in the study area and the surrounding vegetation

the Cinchona plantation at Mongpoo in Darjeeling district. Photographs of the plants and the surrounding vegetation were taken (Figure 3.2).

The light colour of a nodule is an indication of its active growth and youngness (Myrold and Huss-Danell, 1994). Therefore, nodules of a light whitish-brown colour were collected, and essential field data on the tree and the locality were recorded in a data collection sheet (attached) (Figure 3.3), which includes habit, habitat, vegetation of surrounding areas, soil type and nodules.

The percentage of nodulation was calculated using the formula of Raman and Elumalai (1991).

The collected nodules were stored in plastic bags containing moist tissue paper to maintain active water potential and kept in an ice box to minimize tissue degradation. The soil samples were collected in the same manner, and both sets of samples were kept at -20°C in the laboratory for further use.

3.7.2. Germination of Seeds and seedlings cultivation:

Two treatments of seeds were used for the experiment: i) soaked overnight in aerated water (seeds were dipped in aerated water at room temperature) and ii) not soaked in water.

The seeds were taken separately in two different sterile conical flasks and surface sterilized with 30% H2O2 for 10 minutes. Seeds were then washed several times (minimum 10-12 times) with sterile distilled water. The seeds of A. nepalensis were then placed on sterile moist filter paper in a petri plate and kept in the BOD incubator at 31°C ± 3°C for germination. One month old seedlings were transferred to sterile pouches containing nitrogen free Hoagland solution at different concentrations (1/4th, 1/8th and 1/16th) adjusted to pH 7 (Hoagland and Arnon, 1950).

3.7.3. Nodule sterilization:

The collected nodules were first washed thoroughly in running tap water to remove soil and organic debris. The lobules and lobes were slowly separated using a pair of forceps and a scalpel. The lobes were further cleaned using a mild detergent (Extrant) in a petri plate. This activity was repeated till the lobes appeared to be clean. Finally, the lobes were washed with distilled water followed by sterile distilled water several times (10-12


Figure 3.3: Data collection sheet

times).

Since Frankia is a slow growing organism, the chance of contamination of its culture by fast growing soil borne microorganisms is high (Lechevalier, 1994). To overcome this problem, surface sterilization of the nodules was done in a Laminar Air Flow chamber by the following method:

Nodules were treated with 0.1% HgCl2 for two minutes and washed 6-7 times with sterile distilled water to remove the traces of HgCl2.

The surface sterilized lobes were transferred into a side arm flask using a pair of forceps.

Concentrated H2O2 (30%) was gradually poured into the side arm flask till it immersed the nodules (Figure 3.4 A).

A vacuum pump was then attached to the arm of the side arm flask to further clean the nodule parts. This step was continued for 10-15 seconds and was repeated 2-3 times.

The side arm flask was left to stand for 20 minutes, with occasional mixing at intervals of 5 minutes.

After 20 minutes, the nodules were washed 10-15 times with sterile distilled water using a series of petri plates (Figure 3.4 B).


Figure 3.4: (A & B) Surface sterilization procedure of A. nepalensis nodules by using 30% H2O2; (C) Removal of upper epidermal layer from A. nepalensis nodules

3.7.4. Plant infectivity test:

After the sterilization procedure, the nodule lobes were placed on a sterile glass plate and the upper epidermal layer was peeled off using two sterilized needles (Figure 3.4 C).

The nodule lobes were then washed thoroughly with sterilized distilled water and used for the plant infectivity test. The following three sets were prepared for further study.

(1) Naturally occurring matured A. nepalensis plant parts were used as +ve control, (2) 15 months old seedlings inoculated with crushed nodules were under test and (3) un-inoculated seedlings were used in each set as –ve control.

Each set (10 replicates), containing twenty seedlings, was allowed to grow in a plant growth chamber at 26°C with approximately 90% humidity and 1100 lux illumination.

3.7.5. Expression study of haemoglobin genes by Real-Time PCR:

3.7.5.1. Sample preparation:

Four different plant parts (leaf, stem, root and nodules) were used to study the expression of the Hb gene through RT-PCR.

A total of eleven samples were prepared for this study and named accordingly (Table 3.9).

3.7.5.2. RNA isolation:

Total RNA from the eleven samples was extracted using the TRIzol reagent (Invitrogen) according to the manufacturer's instructions (Perazzolli et al., 2004). The presence of RNA in each sample was checked by agarose gel electrophoresis. Quantification of RNA was done using

Table 3.9: Detail of the samples used in expression study (Set no.=Set number)

| Set no. | Description | Sample number | Sample |
|---|---|---|---|
| Set 1 | Fully grown naturally occurring matured A. nepalensis used as +ve control | 1 | Leaf |
| | | 2 | Stem |
| | | 3 | Root |
| | | 4 | Nodules |
| Set 2 | Seedlings inoculated with crushed nodule under test | 5 | Leaf |
| | | 6 | Stem |
| | | 7 | Root |
| | | 8 | Nodules |
| Set 3 | Un-inoculated seedlings were used in each set as '-ve' control | 9 | Leaf |
| | | 10 | Stem |
| | | 11 | Root |
| | | - | - |

spectrophotometer (Thermo UVI spectrophotometer, Thermo Electron Corporation, England UK). The samples showing values greater than 2.0 were used for further study.

3.7.5.3. Primer designing:

At first, the EST database was used to identify the homologous genes, which were subsequently used as controls for designing the primer pair. Primers were designed using Primer3 software (Untergasser et al., 2012) with a melting temperature of 58-60°C. Primers were designed to be as specific as possible for the selected gene family members.

3.7.5.4. Expression study by RT-PCR:

Two-step RT-PCR was performed to study the expression level of Hb genes. Quantitative RT-PCR was performed on the ABI Prism 7700 Sequence Detection System (Applied Biosystems) in a 96-well reaction plate. DNaseI-treated total RNA (50 ng) was used for cDNA synthesis with random hexamer primers according to the manufacturer's instructions. cDNAs were diluted with nuclease free water at a ratio of 1:5. Each 25 μl reaction mixture contained 12.5 μl of SYBR Green PCR master mix (Applied Biosystems), 0.125 μl of RNase inhibitor (40 U μl–1; TaKaRa) and 0.125 μl of Superscript II-RT (200 U μl–1; Invitrogen), with 1 μl of each primer of the pair (the primer concentration was 5 µM), using the parameters recommended by the manufacturer. The PCR was carried out in 40 cycles of 95°C for 15 s and 60°C for 1 min. Each PCR reaction was performed in triplicate without any template controls. A Microsoft Excel file was used to analyse the raw expression values for the tested genes. The analysis was done using ABI Prism Dissociation Curve Analysis Software. The primer sequences used for RT-PCR were as follows: AhbF 5′ TCCCATGCCATGTCTGTCTT 3′ and AhbR 5′ AGCTTCTCCCCATGCAATCT 3′, for quantifying the RNA transcript of Hb genes present in each part of A. nepalensis.
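As a quick sanity check on the primer pair quoted above, the length, GC content and a rough melting temperature of each primer can be computed directly from its sequence. The sketch below is only illustrative: the helper names are hypothetical, and the Wallace rule (2 °C per A/T, 4 °C per G/C) is a simple approximation swapped in for convenience, not the thermodynamic model that Primer3 applies.

```python
def base_counts(seq):
    """Count each nucleotide in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    counts = base_counts(seq)
    return 100.0 * (counts["G"] + counts["C"]) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule.
    Assumption: used only as a quick estimate for short oligos;
    it is not the calculation performed by Primer3."""
    counts = base_counts(seq)
    return 2 * (counts["A"] + counts["T"]) + 4 * (counts["G"] + counts["C"])


# Primer sequences as given in Section 3.7.5.4.
primers = {
    "AhbF": "TCCCATGCCATGTCTGTCTT",
    "AhbR": "AGCTTCTCCCCATGCAATCT",
}

for name, seq in primers.items():
    print(name, len(seq), gc_percent(seq), wallace_tm(seq))
```

Both primers are 20 nucleotides long with 50% GC, and the Wallace estimate of 60 °C for each is in line with the 58-60°C design range stated for Primer3.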

RESULTS AND DISCUSSION Chapter 4 Results and Discussion

4.1. Ecology of Alnus nepalensis:

4.1.1. Study area and ecology:

Ecology and habitat related information on Alnus nepalensis in sub-Himalayan West Bengal and Sikkim is presented in Table 4.1. A. nepalensis was found growing naturally in areas occupied by evergreen hilly forest, stony slopes, water channel areas and steep grassland. In the field study in sub-Himalayan West Bengal and Sikkim, A. nepalensis was found growing naturally in an area with a geographical boundary of 26°87′78″ to 27°34′71″N and 88°25′00″ to 88°61′76″E and with relative humidity of 19% to 35%. Sharma and Ambasht (1986) reported that this plant is commonly found at altitudes ranging from 5479ft to 9842.51ft in Nepal, with a temperate climate, but during the field study I found this plant in the studied region growing mainly at altitudes ranging from 3616ft to 7598ft, with soil temperature between 3.4 and 10°C and air temperature 6 to 19°C.

It was observed that young seedlings are defoliated by frost and are very often killed. However, the plant occurs naturally up to approximately 7600ft in areas with low frost. These differences may be due to the locality, or possibly the time of occurrence.

Particularly at lower elevations, it is characteristic of moist sites such as riversides, but it also colonizes gravelly land exposed by landslides and abandoned cultivable land (Sharma, 2012). Field work also confirms the altitudinal range of 6000ft-7000ft as the favorable height for growth of A. nepalensis in the studied region.

A. nepalensis was found to grow in varied soil types. The study area consisted of various combinations of soil, i.e. loamy, sandy, clayish etc., with colour varying from reddish brown to blackish brown on hilly vertical slopes. The study area consists of hilly slopes with virtually no plain land. Slopes are an important ecological component in

Table 4.1: Compilation of collected field data of A. nepalensis species in sub-Himalayan West Bengal and Sikkim

| Region | Place | Altitude (ft.) | Soil colour | Soil temperature (°C) | Soil moisture (%) | Soil pH | Air temperature (°C) | Humidity (%) | Topology | Vegetation type | Location | Nodule growth from |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gangtok | Rani pool | 3926 | Brown | 4.8 | 20 | 4.74 | 15 | 33 | Hilly vertical slope | Village road side | Clumped | Lateral root |
| Gangtok | Samdur | 3569 | Blackish Brown | 9.3 | 10 | 4.75 | 12 | 35 | Landslides area now steep slope | Road side | Clumped | Lateral root |
| Gangtok | Upper Tadong | 4131 | Brown | 5.7 | 40 | 6.2 | 19 | 31 | Hilly steep slope | Forest | Clumped | Lateral root |
| Gangtok | Tadong | 3720 | Brown | 10 | 30 | 5.13 | 16 | 23 | Hilly steep slope | Forest | Scattered | Lateral root |
| Gangtok | Gairi Gaon | 4418 | Blackish Brown | 7.8 | 55 | 5.7 | 12 | 27 | Hilly slope | Forest | Scattered | Lateral root |
| Gangtok | Deorali bazaar | 4574 | Brown | 6.5 | 30 | 4.58 | 13 | 25 | Hilly | Village road side | Scattered | Lateral root |
| Gangtok | Upper Sichey gaon | 5428 | Brown | 5.8 | 16 | 3.85 | 9 | 22 | Hilly | Well drained Forest | Clumped | Lateral root |
| Gangtok | Enchey monostry | 5909 | Brown | 5.2 | 15 | 4.25 | 14 | 21 | Hilly | Grazing Forest | Clumped | Lateral root |
| Gangtok | Lower Burtuk road | 5183 | Light Brown | 4.2 | 60 | 4.03 | 8 | 20 | Hilly steep slope | Village road side forest | Clumped | Lateral root |
| Kalimpong | Kalimpong | 5088 | Blackish Brown | 3.4 | 42 | 5.34 | 15 | 27 | Hilly vertical slope | Forest | Clumped | Lateral root |
| Darjeeling | Mongpoo | 3616 | Light brown | 6.7 | 35 | 6.18 | 11 | 21 | Hilly slope | Well drained Forest | Scattered | Lateral root |
| Darjeeling | Kurseong | 4888 | Sandy brown | 4.9 | 40 | 6.22 | 10 | 23 | Hilly road side | Village road side | Clumped | Lateral root |
| Darjeeling | Sonada | 6238 | Blackish Brown | 9.2 | 14 | 5.72 | 6 | 25 | Steep slope | Village road side forest | Clumped | Lateral root |
| Darjeeling | Ghoom | 7347 | Redish brown | 6.8 | 28 | 4.29 | 9 | 28 | Hilly steep slope | Forest | Clumped | Lateral root |
| Darjeeling | Darjeeling | 6968 | Brown | 8.6 | 21 | 5.53 | 7 | 19 | Hilly | Forest | Scattered | Lateral root |
| Darjeeling | Sukhia pokhri | 7102 | Blackish brown | 7.6 | 31 | 5.23 | 8 | 32 | Hilly road side | Forest | Scattered | Lateral root |
| Darjeeling | Pashupati | 6669 | Brown | 6.9 | 29 | 4.79 | 13 | 30 | Hilly slope | Forest | Scattered | Lateral root |
| Darjeeling | Mirik | 5267 | Light brown | 9.1 | 33 | 5.01 | 11 | 29 | Hilly slope | Forest | Scattered | Lateral root |

identifying and evaluating potential environmental impacts related to landform alteration along with vegetation succession. Soil pH of the study area was measured and found to be acidic, ranging from 3.85 to 6.22. A similar soil pH range was observed in hilly regions of Darjeeling district of West Bengal by Mukherjee (2009). Soil moisture of the study area was extremely variable, ranging from 10 to 60%. This wide range of soil moisture favors growth of the plant. It was also observed that A. nepalensis grows mainly in landslide areas, underlining its importance as a successional plant.

All matured A. nepalensis plants in the study area were found to bear root nodules due to symbiotic association with Frankia, an actinomycete, to fix atmospheric nitrogen. Seeds are produced in cones and are abundant, winged and easily shed by slight shaking. This aids easy dispersal and may be one important reason for its growth in landslide prone areas.

4.1.2. Plant morphology:

Morphological data of A. nepalensis are presented in Table 4.2. Fully matured A. nepalensis trees from the sub-Himalayan region were found to be around 25-33 meters in height and 47±5cm in circumference. Leaves are broad, dark green in colour, entire, linear, with a rough lower surface and a slightly shiny upper surface. Leaf size varied with growing conditions and location. Leaf size of A. nepalensis was 6.54 to 12.86cm in length and 3.72 to 6.34cm in width. The morphological study (Table 4.2) revealed that the average inter-branch distance of A. nepalensis trees was 23.63±10cm. It was also noticed that the inter-branch distance of A. nepalensis trees varied with altitude: at higher altitudes inter-branch distances were greater, while at lower altitudes the inter-branch distance was comparatively shorter. Distances between root nodule clusters were also variable. The average inter-nodular distance was found to be 3.5±1.5cm. A cluster of nodules borne on a matured plant weighed 33.28 to 123.05gm (Figure 4.1-A,B). However, inter-nodular distance was found to be inversely correlated with altitude, but positively correlated with weight of nodules (Figure 4.1-C).

4.1.3. Soil analysis:

Soil samples were collected during the collection of nodules. Soil around clumped and scattered root nodules of A. nepalensis was collected from different geographical locations.

RESULTS AND DISCUSSION 71 Average Average ence (cm)ence tree trunktree circumfer- 28 51.06 30 38.97 (m) Average Average tree height tree (gm) 33.28 48.05 nodules cluster ofcluster Mass ofMass a tance (cm)tance nodular dis- Average inter Average inter inter Average Average branch dis- tance (inch)tance 4.18 24.23 3.4 5.02 30.45 2.9 5.74 31.97 2.1 55.26 29 47.62 4.86 3.72 4.14 5.78 26.74 4.32 24.65 29.28 33.09 3.8 31.23 3.3 3.1 106.09 2.3 2.6 92.12 74.17 25 68.29 33 57.09 26 49.99 32 45.16 31 41.91 52.06 39.98 5.22 5.38 23.67 21.05 2.8 4.1 111.02 96.72 30 27 52.01 50.91 5.89 6.34 25.09 27.93 3.5 3.1 89.49 94.75 33 31 50.17 44.23 5.02 21.19 4.2 123.05 31 53.03 6.12 4.03 13.82 22.43 4.5 3.9 85.93 85.07 30 29 49.44 42.87 5.45 14.87 4.7 62.17 31 50.85 5.75 4.02 12.72 14.45 4.8 5.1 92.15 95.08 25 27 45.23 47.64 (cm) (cm) Average widthAverage length (cm) Average leafAverage (ft.) (ft.) 5267 7.15 6669 9.48 7102 11.98 4888 6238 6.54 7.95 7347 6968 12.86 10.02 3616 8.97 5183 5088 12.23 10.54 5428 5909 10.41 12.75 4574 11.88 4418 9.76 3720 10.25 4131 9.11 3926 3569 8.45 7.30 Altitude Mirik Place Place Sonada Sonada Ghoom Enchey Enchey Tadong Tadong Samdur monostry Pashupati Kurseong Mongpoo Rani pool Darjeeling Gairi GaonGairi Kalimpong Upper SicheyUpper Sukhia pokhri Lower Lower Burtuk Upper Tadong Upper Deorali bazaarDeorali Table4.2: Morphologycal data A.of nepalensis recordedduring field study


Figure 4.1: (A & B) Different root nodules of A. nepalensis collected from different collection sites; (C) Morphological study of A. nepalensis according to altitudinal variation

Different soil parameters, namely pH, organic carbon, soil nitrogen, potash, phosphorus and sulphur content, were estimated (Table 4.3). The pH of the collected soil samples was found to be acidic (pH 3.85-6.22). Phosphate concentration varied considerably, from 10 to 45ppm (parts per million). Within the study area, the lowest concentration of phosphate was estimated at Sonada (10ppm) and the highest at Mongpoo (45ppm). Average sulphur concentration was estimated to be 30±15ppm. The soil sulphur content at Ranipool in Sikkim was found to be relatively high, i.e. 45ppm. The test results were extremely variable and were totally dependent on micro-ecological factors. However, a highly significant correlation between soil nitrogen and organic carbon was found.

I made a correlation study between the weight of nodules and the amount of nitrogen estimated in the particular nodulated soil. The result showed that the weight of nodules was greater in the places where the amount of soil nitrogen was lower (Figure 4.2). In Deorali bazaar soil nitrogen estimated

Table 4.3: Estimation and analysis of nutrients present in soil collected from the base of A. nepalensis in studied area (Alt.=Altitude)

| Place | Alt. (ft.) | pH | Carbon (%) | Potash (ppm) | Phosphate (ppm) | Sulphur (ppm) | Nitrogen (%) |
|---|---|---|---|---|---|---|---|
| Enchey monostry | 5909 | 3.85 | 1.52 | 70 | 25 | 30 | 0.130 |
| Lower Burtuk | 5183 | 4.58 | 0.74 | 200 | 35 | 35 | 0.063 |
| Deorali bazaar | 4574 | 5.13 | 0.66 | 340 | 15 | 20 | 0.057 |
| Upper Sichey | 5428 | 4.74 | 1.63 | 385 | 30 | 30 | 0.140 |
| Gairi Gaon | 4418 | 5.70 | 2.30 | 395 | 25 | 25 | 0.197 |
| Tadong | 4131 | 6.20 | 1.98 | 240 | 15 | 30 | 0.170 |
| Upper Tadong | 3720 | 4.75 | 3.47 | 235 | 25 | 35 | 0.298 |
| Samdur | 3569 | 6.22 | 1.39 | 75 | 40 | 35 | 0.119 |
| Rani pool | 3926 | 4.25 | 1.56 | 53 | 35 | 45 | 0.134 |
| Kalimpong | 5088 | 4.03 | 1.03 | 230 | 15 | 25 | 0.089 |
| Mongpoo | 3616 | 5.34 | 1.19 | 155 | 45 | 35 | 0.103 |
| Kurseong | 4888 | 6.18 | 1.59 | 87 | 35 | 30 | 0.136 |
| Sonada | 6238 | 5.72 | 2.85 | 72 | 10 | 25 | 0.245 |
| Ghoom | 7347 | 5.53 | 2.88 | 67 | 15 | 30 | 0.247 |

was 0.057% and the weight of nodule clusters was found to be 123.05gm (Table 4.2). Likewise, in Lower Burtuk road the weight of nodules was found to be 111.02gm and the nitrogen content was estimated to be 0.063%. However, in places like Upper Tadong, Ghoom and Sonada the weight of nodules was found to be much lower, i.e. 62.17gm, 68.29gm and 74.17gm, and the soil nitrogen in those places was estimated to be 0.298%, 0.297% and 0.245% respectively. Thus, a strong negative correlation was seen in the result. This result may suggest that excellent quality nodulation in A. nepalensis takes place where the amount of nitrogen is low.
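The negative relationship described above can be illustrated with a Pearson correlation coefficient computed on the five site values quoted in this paragraph. This is only a sketch under stated assumptions: the text does not say which correlation statistic was used, so Pearson's r is an assumption, and the five sites are the examples mentioned in the prose rather than the full data set behind Figure 4.2. The function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values as quoted in the paragraph above, in the order:
# Deorali bazaar, Lower Burtuk road, Upper Tadong, Ghoom, Sonada.
soil_nitrogen_percent = [0.057, 0.063, 0.298, 0.297, 0.245]
nodule_weight_gm = [123.05, 111.02, 62.17, 68.29, 74.17]

r = pearson_r(soil_nitrogen_percent, nodule_weight_gm)
print(round(r, 2))
```

For these five sites the coefficient is close to -1, which agrees with the strong negative correlation reported. Because the sites are selected extremes, the value should not be read as the correlation for the whole study area.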

4.2. Population genetics and genetic diversity studies of A. nepalensis:

4.2.1. Diversity study through molecular characterization:

4.2.1.1. DNA extraction:

Genomic DNA from A. nepalensis leaves was isolated using the standard protocol of Doyle and Doyle (1987) with minor modifications. A whitish network of nucleic acid and starch was precipitated as a DNA-CTAB complex and then subjected to a purification process for complete elimination of polysaccharide. The DNA obtained was used for further downstream processing. The presence of DNA bands was observed using agarose gel electrophoresis.

Figure 4.2: Correlation between weight of nodules and particular soil nitrogen

4.2.1.2. DNA quantification and quality check:

The DNA quantification was done by

UV spectrophotometer using 260nm and 280nm filters. The ratio of A260/A280 was calculated for each sample. All the experiments were performed in three or more replicates, and the samples showing a ratio of around 1.8 were chosen for further studies (Table 4.4). The concentration of the isolated DNA samples varied from 215 to 2465ng/µl. The A260/A280 ratios of around 1.8 showed that the isolated DNA samples were reasonably pure.

The intactness of the DNA was determined with the help of 0.8% agarose gel electrophoresis using λ DNA-EcoRI+Hind III double digest and 100bp ladder. The size of the bands was found to be approximately 20-21kb (Figure 4.3).

4.2.1.3. RAPD analysis:

RAPD fingerprinting was used to study the genetic diversity among indigenous A. nepalensis from the sub-Himalayan study area of my interest. Amplification of all the collected DNA samples was attempted with 40 different 10-mer primers (refer to materials and methods for primer sequences). Of the 40 primers screened, 34 produced distinct and scorable bands. A representative photograph of the RAPD profile of A. nepalensis is given in Figure 4.4. The total number of amplified bands, the number of monomorphic and polymorphic bands, the size of the amplified bands and the percentage of polymorphism generated by the RAPD primers are tabulated in Table 4.5. In population I, 32 primers produced 367 bands ranging between 220bp and 2100bp, of which 16 were monomorphic while the rest were polymorphic (Table 4.5). The

Figure 4.3: DNA-gel electrophoresis of crude DNA isolated from studied samples (Representative photo; 1: 100 bp DNA ladder; Collection site: 2-14; 2: Mongpoo bazaar (3616 ft.); 3: Ranipool (3926 ft.); 4: Tadong (4131 ft.); 5: Baghgora (4959 ft.); 6: Delo (5088 ft.); 7: Lower Sonada (5842 ft.); 8: Upper Mongpoo (6144 ft.); 9: Sonada post office (6389 ft.); 10: Pashupati (6669 ft.); 11: Batasia loop (6968 ft.); 12: Dooteria forest (7033 ft.); 13: Bhalikhop basti (7347 ft.); 14: Rishop (7598 ft.); 15: λ DNA-EcoRI+Hind III double digest DNA ladder)

RESULTS AND DISCUSSION 76

Table 4.4: Purity of A. nepalensis DNA (Sl no.=Serial number)

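Sl no. | Population | A260 | A280 | A260/A280 ratio | Quantity of DNA (ng/µl)
I-1 | Population I | 0.074 | 0.041 | 1.8 | 370
I-2 | Population I | 0.044 | 0.024 | 1.83 | 220
I-3 | Population I | 0.058 | 0.032 | 1.81 | 290
I-4 | Population I | 0.086 | 0.047 | 1.82 | 430
I-5 | Population I | 0.043 | 0.024 | 1.79 | 215
I-6 | Population I | 0.076 | 0.042 | 1.8 | 380
I-7 | Population I | 0.192 | 0.105 | 1.82 | 960
I-8 | Population I | 0.097 | 0.053 | 1.83 | 485
I-9 | Population I | 0.326 | 0.179 | 1.82 | 1630
I-10 | Population I | 0.079 | 0.044 | 1.79 | 395
I-11 | Population I | 0.088 | 0.048 | 1.83 | 440
I-12 | Population I | 0.079 | 0.044 | 1.79 | 395
I-13 | Population I | 0.077 | 0.042 | 1.83 | 385
I-14 | Population I | 0.072 | 0.040 | 1.8 | 360
I-15 | Population I | 0.098 | 0.054 | 1.81 | 490
I-16 | Population I | 0.083 | 0.046 | 1.8 | 415
I-17 | Population I | 0.069 | 0.038 | 1.81 | 345
II-1 | Population II | 0.493 | 0.273 | 1.8 | 2465
II-2 | Population II | 0.234 | 0.129 | 1.81 | 1170
II-3 | Population II | 0.094 | 0.052 | 1.8 | 470
II-4 | Population II | 0.076 | 0.042 | 1.8 | 380
II-5 | Population II | 0.083 | 0.046 | 1.80 | 415
II-6 | Population II | 0.066 | 0.036 | 1.83 | 330
II-7 | Population II | 0.049 | 0.027 | 1.81 | 245
II-8 | Population II | 0.095 | 0.053 | 1.79 | 475
II-9 | Population II | 0.062 | 0.034 | 1.82 | 310
II-10 | Population II | 0.078 | 0.043 | 1.81 | 390
II-11 | Population II | 0.066 | 0.036 | 1.83 | 330
II-12 | Population II | 0.095 | 0.052 | 1.82 | 475
II-13 | Population II | 0.082 | 0.045 | 1.82 | 410
II-14 | Population II | 0.038 | 0.021 | 1.80 | 190
III-1 | Population III | 0.091 | 0.05 | 1.82 | 455
III-2 | Population III | 0.087 | 0.048 | 1.81 | 435
III-3 | Population III | 0.092 | 0.051 | 1.8 | 460
III-4 | Population III | 0.083 | 0.046 | 1.8 | 415
III-5 | Population III | 0.089 | 0.049 | 1.81 | 445
III-6 | Population III | 0.078 | 0.043 | 1.81 | 390
III-7 | Population III | 0.106 | 0.059 | 1.79 | 530
III-8 | Population III | 0.086 | 0.047 | 1.82 | 430
III-9 | Population III | 0.069 | 0.038 | 1.81 | 345
III-10 | Population III | 0.337 | 0.187 | 1.8 | 1685
III-11 | Population III | 0.067 | 0.037 | 1.81 | 335
III-12 | Population III | 0.375 | 0.208 | 1.8 | 1875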
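The purity and quantity columns of Table 4.4 can be reproduced with a short calculation. The sketch below is illustrative only: the factor of 50 ng/µl per A260 unit is the usual conversion for double-stranded DNA, while the 100-fold dilution factor is an assumption inferred from the tabulated values (for example, 0.074 x 50 x 100 = 370) and is not stated in this section. The function names are hypothetical.

```python
# Sketch: A260/A280 purity ratio and dsDNA concentration from absorbance readings.
NG_PER_UL_PER_A260 = 50.0   # usual conversion for double-stranded DNA
DILUTION_FACTOR = 100.0     # assumption inferred from Table 4.4, not stated in the text


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio; values near 1.8 indicate reasonably pure DNA."""
    return a260 / a280


def dna_concentration(a260, dilution=DILUTION_FACTOR):
    """Return the DNA concentration in ng/µl for a diluted sample."""
    return a260 * NG_PER_UL_PER_A260 * dilution


# A few readings copied from Table 4.4 (sample: (A260, A280)).
readings = {"I-1": (0.074, 0.041), "I-9": (0.326, 0.179), "II-1": (0.493, 0.273)}

for sample, (a260, a280) in readings.items():
    ratio = purity_ratio(a260, a280)
    quantity = dna_concentration(a260)
    print(f"{sample}: ratio={ratio:.2f}, quantity={quantity:.0f} ng/µl")
```

Because the tabulated absorbances are rounded to three decimals, a recomputed ratio may differ from the printed one in the second decimal place.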


Panels A-C, POPULATION-I: (A)=OPA01, (B)=OPA15, (C)=OPB20; Lane 0: 100 bp DNA ladder; 1: Baghgora; 2: Hill curt road; 3: Montiviot; 4: St. Marys post office; 5: Chaita Pani Tea Garden; 6: Edenvale tea garden; 7: Lower Sonada; 8: Sonada khasmahal; 9: Sonada post office; 10: Tibetian monastery; 11: Sonada forest; 12: Upper Sonada; 13: Ghoom; 14: Dooteria forest; 15: Senchal; 16: Bhalikhop; 17: Ghoom monastery; 18: λ DNA-EcoRI+Hind III double digest DNA ladder

Panels D-F, POPULATION-II: (D)=OPA02, (E)=OPA19, (F)=OPB15; Lane 0: 100 bp DNA ladder; 1: Saint Josephs College; 2: Richmond hill; 3: Chauk bazaar; 4: Pandam Limbu; 5: Limbugaon; 6: Jalapahar; 7: West point; 8: Batasia loop; 9: Lepcha jagat; 10: Majdhura; 11: Sukhia pokhri; 12: Mim Nagri; 13: Pashupatinagar; 14: Mirik; 15: λ DNA-EcoRI+Hind III double digest DNA ladder

Panels G-I, POPULATION-III: (G)=OPA02, (H)=OPB03, (I)=OPA19; Lane 0: 100 bp DNA ladder; 1: Lower sichey; 2: Tadong; 3: Namchi; 4: Ranipool; 5: Rishop; 6: Icha forest; 7: Delo; 8: Baghdhara; 9: Sunwar; 10: Upper Mongpoo; 11: Chinchona plantation; 12: Mongpoo bazaar; 13: λ DNA-EcoRI+Hind III double digest DNA ladder

Figure 4.4 (A-I): Gel showing DNA bands of A. nepalensis species amplified by the RAPD primers

Table 4.5: Total number and size of amplified bands, number of monomorphic and polymorphic bands and percentage of polymorphism generated by the RAPD primers (MB=Monomorphic bands, PB=Polymorphic bands; a dash indicates no scorable amplification)

Amplified RAPD loci
Primer | Sequence | Pop I Band No | Pop I MB | Pop I PB | Pop I Band size (bp) | Pop II Band No | Pop II MB | Pop II PB | Pop II Band size (bp) | Pop III Band No | Pop III MB | Pop III PB | Pop III Band size (bp)
OPA 01 | CAGGCCCTTC | 13 | 2 | 11 | 250-1900 | 9 | 2 | 7 | 210-1300 | 13 | 2 | 11 | 210-1300
OPA 02 | TGCCGAGCTG | 11 | 0 | 11 | 230-1600 | 10 | 0 | 10 | 280-1400 | 10 | 0 | 10 | 220-1580
OPA 03 | AGTCAGCCAC | - | - | - | - | 11 | 0 | 11 | 230-1600 | 15 | 2 | 13 | 200-1320
OPA 04 | AATCGGGCTG | 10 | 1 | 9 | 250-1580 | 17 | 1 | 16 | 220-1670 | 9 | 1 | 8 | 230-1860
OPA 05 | ATTTTGCTTG | 12 | 0 | 12 | 240-1700 | 9 | 0 | 9 | 400-1880 | - | - | - | -
OPA 07 | GAAACGGGTG | 9 | 0 | 9 | 230-1400 | 12 | 0 | 12 | 250-1540 | 8 | 0 | 8 | 400-1800
OPA 09 | GGGTAACGCC | - | - | - | - | 10 | 0 | 10 | 390-1320 | 10 | 0 | 10 | 190-1370
OPA 10 | GTGATCGCAG | 10 | 2 | 8 | 280-1300 | 15 | 0 | 15 | 220-1300 | 14 | 1 | 13 | 220-1900
OPA 11 | CAATCGCCGT | 9 | 0 | 9 | 270-1920 | 13 | 1 | 13 | 210-1700 | 13 | 0 | 13 | 250-1690
OPA 12 | TCGGCGATAG | 9 | 0 | 9 | 310-1850 | 9 | 0 | 9 | 210-1920 | 11 | 0 | 11 | 260-1800
OPA 13 | CAGCACCCAC | 11 | 2 | 9 | 320-1900 | 10 | 1 | 9 | 250-1700 | 14 | 2 | 12 | 230-1320
OPA 15 | TTCCGAACCC | 9 | 1 | 8 | 300-1640 | 14 | 1 | 13 | 230-1680 | 10 | 1 | 9 | 240-1700
OPA 16 | AGCCAGCGAA | 14 | 0 | 14 | 250-1430 | 8 | 0 | 8 | 380-1880 | 9 | 0 | 9 | 220-1860
OPA 17 | GACCGCTTGT | 10 | 0 | 10 | 260-1680 | 11 | 2 | 9 | 220-1650 | 11 | 0 | 11 | 190-1900
OPA 18 | AGGTGACCGT | 12 | 0 | 12 | 240-1600 | - | - | - | - | - | - | - | -
OPA 19 | CAAACGTCGG | 9 | 0 | 9 | 240-1900 | 12 | 0 | 12 | 250-1320 | 8 | 0 | 8 | 210-1700
OPA 20 | GGACCCTTAC | 10 | 0 | 10 | 300-1850 | - | - | - | - | 8 | 1 | 7 | 310-1886
OPB 01 | GTTTCGCTCC | 9 | 0 | 9 | 230-1900 | 10 | 0 | 10 | 220-1320 | - | - | - | -
OPB 02 | TGATCCCTGG | 13 | 0 | 13 | 240-1950 | 9 | 0 | 9 | 310-1950 | 10 | 0 | 10 | 200-1820
OPB 03 | AGTCAGCCAC | 10 | 0 | 10 | 260-1700 | 9 | 1 | 8 | 230-1800 | 13 | 1 | 12 | 300-1800
OPB 04 | GGACTGGAGT | 10 | 2 | 8 | 270-1500 | 9 | 2 | 7 | 310-1860 | 15 | 1 | 14 | 200-1450
OPB 05 | TGCGCCCTTC | 9 | 0 | 9 | 320-1800 | 14 | 1 | 13 | 320-1600 | 10 | 2 | 8 | 210-1490
OPB 06 | TGCTCTGCCCC | 15 | 1 | 14 | 310-1400 | 16 | 0 | 16 | 310-1790 | 9 | 1 | 8 | 420-1900
OPB 07 | GGTGACGCAG | 9 | 0 | 9 | 220-2100 | 12 | 2 | 10 | 210-1580 | 10 | 2 | 8 | 310-1800
OPB 08 | GTCCACACGG | 13 | 0 | 13 | 220-1680 | 9 | 0 | 9 | 200-1600 | 14 | 2 | 12 | 280-1380
OPB 10 | CTGCTGGGAC | 11 | 0 | 11 | 240-1960 | 10 | 0 | 10 | 210-1390 | 8 | 2 | 6 | 140-1600
OPB 11 | GTAGACCCGT | 18 | 1 | 17 | 250-1650 | 14 | 0 | 13 | 240-1760 | 12 | 1 | 11 | 230-1930
OPB 12 | CCTTGACGCA | 12 | 1 | 11 | 310-1800 | 19 | 1 | 18 | 200-1900 | 11 | 1 | 10 | 220-1600
OPB 13 | TTCCCCCGCT | 16 | 1 | 15 | 220-1450 | 12 | 1 | 11 | 240-1530 | 9 | 1 | 8 | 290-1840
OPB 14 | TCCGCTCTGG | 15 | 0 | 15 | 250-2000 | 16 | 1 | 15 | 220-1730 | 11 | 2 | 9 | 210-1740
OPB 15 | GGAGGGTGTT | 13 | 0 | 13 | 250-1770 | 12 | 0 | 12 | 200-1940 | 13 | 0 | 13 | 210-1930
OPB 17 | AGGGAACGAG | 10 | 0 | 10 | 220-1500 | - | - | - | - | 17 | 1 | 16 | 300-2000
OPB 18 | CCACAGCAGT | 16 | 1 | 15 | 300-1400 | - | - | - | - | 15 | 0 | 15 | 200-1680
OPB 20 | GGACCCTTAC | 10 | 1 | 9 | 220-1900 | 15 | 0 | 15 | 240-1750 | 12 | 1 | 11 | 220-1640
Total | | 367 | 16 | 351 | | 356 | 17 | 339 | | 352 | 28 | 324 |
% of polymorphism | | 95.64 | | | | 95.22 | | | | 92.04 | | |
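The percentage of polymorphism in the last row of Table 4.5 follows from the band totals as polymorphic bands divided by total bands. The sketch below is a minimal illustration using the totals from the table; the function name is hypothetical and the formula is the conventional one, assumed here rather than quoted from the Materials and Methods.

```python
# Sketch: percentage of polymorphism from total and monomorphic band counts.
def percent_polymorphism(total_bands, monomorphic_bands):
    """Return 100 * polymorphic / total, where polymorphic = total - monomorphic."""
    polymorphic_bands = total_bands - monomorphic_bands
    return 100.0 * polymorphic_bands / total_bands


# Totals copied from Table 4.5: (total bands, monomorphic bands).
population_totals = {
    "Population I": (367, 16),
    "Population II": (356, 17),
    "Population III": (352, 28),
}

for name, (total, monomorphic) in population_totals.items():
    print(f"{name}: {percent_polymorphism(total, monomorphic):.2f}%")
```

For population III the exact quotient is 324/352 = 92.045...%, so the tabulated 92.04 appears to be a truncated rather than rounded value.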

percentage of polymorphism was found to be 95.64%. The number of polymorphic bands generated by each decamer primer ranged between 8 and 17.

Likewise, 30 primers produced 356 bands in population II. The fragments ranged from 200 bp to 1950 bp, of which 17 were monomorphic and the remaining 339 were polymorphic (Table 4.5). The percentage of polymorphism was found to be 95.22%. The number of polymorphic bands generated by each decamer primer ranged between 7 and 16.

In population III, 28 monomorphic and 324 polymorphic bands were recorded following successful amplification with 31 primers. The amplified products ranged in size from 140 bp to 2000 bp. The amplification pattern showed that the percentage of polymorphism was 92.04%, and the number of polymorphic bands generated by each decamer primer ranged between 6 and 15 (Table 4.5).

In total, 1075 major scorable bands, of which 61 were monomorphic and 1014 polymorphic, were recorded in the whole study, and the amplification products ranged from 140 bp to 2100 bp (Table 4.5). A. nepalensis collected from population I showed the highest polymorphism, while population III showed the lowest polymorphism amongst the three populations.

4.2.2. Similarity matrix analysis:

The similarity matrix obtained using the Dice coefficient of similarity (Nei and Li 1979) is presented in Table 4.6. The similarity coefficient among the 43 samples ranged from 0.435 to 0.95. The lowest similarity (43.5%) was observed between samples collected from Tibetian monastery (C10, 6700 ft) and Lower Sichey (C32, 4857 ft), while the highest similarity (95%) was found between samples collected from Upper Sonada (C12, 6915 ft) and Ghoom (C13, 7091 ft).

The resulting dendrogram, generated using 34 oligonucleotide primers, is shown in Figure 4.5. The dendrogram showed three distinct clusters, represented by 15, 15 and 13 samples respectively.

It is interesting to note that, while collecting samples, I divided the collection sites into two parts. In one part I collected samples from the Darjeeling hills, which was further divided into the route from Darjeeling to Kurseong

Table 4.6: Similarity coefficient matrix of RAPD profile of 43 A. nepalensis samples (lower triangle; columns run C1 to C43 in the same order as the rows)

C1 1.00
C2 0.82 1.00
C3 0.81 0.79 1.00
C4 0.79 0.85 0.78 1.00
C5 0.66 0.75 0.69 0.74 1.00
C6 0.76 0.75 0.74 0.73 0.79 1.00
C7 0.76 0.70 0.69 0.72 0.71 0.75 1.00
C8 0.76 0.72 0.71 0.74 0.69 0.74 0.78 1.00
C9 0.73 0.74 0.69 0.71 0.69 0.71 0.81 0.74 1.00
C10 0.71 0.76 0.69 0.71 0.69 0.71 0.80 0.75 0.91 1.00
C11 0.75 0.80 0.73 0.81 0.71 0.69 0.74 0.74 0.78 0.79 1.00
C12 0.73 0.78 0.71 0.81 0.71 0.70 0.72 0.71 0.74 0.75 0.92 1.00
C13 0.75 0.77 0.73 0.81 0.74 0.72 0.74 0.71 0.76 0.77 0.93 0.95 1.00
C14 0.51 0.54 0.56 0.54 0.55 0.58 0.57 0.58 0.59 0.59 0.61 0.62 0.63 1.00
C15 0.45 0.50 0.49 0.49 0.56 0.52 0.57 0.54 0.54 0.51 0.56 0.56 0.56 0.57 1.00
C16 0.76 0.77 0.70 0.79 0.72 0.72 0.70 0.75 0.72 0.69 0.76 0.76 0.76 0.53 0.44 1.00
C17 0.61 0.69 0.71 0.69 0.66 0.66 0.61 0.63 0.60 0.64 0.64 0.64 0.66 0.56 0.46 0.71 1.00
C18 0.59 0.61 0.55 0.57 0.57 0.59 0.59 0.63 0.57 0.61 0.58 0.57 0.62 0.59 0.55 0.58 0.56 1.00
C19 0.57 0.59 0.54 0.56 0.51 0.59 0.55 0.56 0.50 0.55 0.58 0.56 0.58 0.61 0.52 0.56 0.57 0.77 1.00
C20 0.51 0.53 0.51 0.48 0.46 0.52 0.51 0.49 0.46 0.51 0.47 0.46 0.47 0.63 0.56 0.44 0.56 0.71 0.74 1.00
C21 0.54 0.55 0.48 0.50 0.50 0.56 0.52 0.51 0.50 0.51 0.51 0.50 0.52 0.61 0.59 0.52 0.51 0.77 0.74 0.79 1.00
C22 0.53 0.55 0.52 0.50 0.50 0.51 0.54 0.47 0.57 0.61 0.56 0.54 0.56 0.61 0.59 0.48 0.51 0.64 0.61 0.66 0.64 1.00
C23 0.55 0.59 0.53 0.52 0.54 0.52 0.53 0.51 0.52 0.53 0.54 0.52 0.56 0.61 0.56 0.49 0.55 0.66 0.72 0.74 0.68 0.74 1.00
C24 0.56 0.59 0.54 0.56 0.56 0.61 0.56 0.54 0.54 0.56 0.56 0.57 0.59 0.62 0.59 0.51 0.54 0.74 0.76 0.78 0.81 0.66 0.76 1.00
C25 0.56 0.56 0.54 0.54 0.46 0.54 0.49 0.50 0.50 0.54 0.52 0.50 0.52 0.64 0.58 0.48 0.53 0.70 0.76 0.76 0.79 0.61 0.72 0.83 1.00
C26 0.57 0.58 0.55 0.57 0.53 0.60 0.55 0.54 0.56 0.58 0.58 0.57 0.59 0.64 0.59 0.52 0.51 0.74 0.77 0.76 0.79 0.64 0.75 0.90 0.91 1.00
C27 0.56 0.59 0.54 0.56 0.51 0.54 0.52 0.49 0.56 0.59 0.56 0.54 0.56 0.58 0.54 0.48 0.56 0.70 0.64 0.71 0.71 0.69 0.72 0.79 0.81 0.79 1.00
C28 0.56 0.60 0.54 0.56 0.52 0.55 0.53 0.49 0.54 0.57 0.56 0.55 0.56 0.57 0.54 0.51 0.58 0.72 0.64 0.73 0.71 0.68 0.73 0.82 0.81 0.81 0.92 1.00
C29 0.54 0.59 0.56 0.56 0.51 0.59 0.47 0.54 0.49 0.51 0.54 0.56 0.56 0.60 0.56 0.53 0.54 0.74 0.71 0.77 0.74 0.62 0.71 0.81 0.82 0.85 0.75 0.79 1.00
C30 0.55 0.57 0.53 0.55 0.48 0.58 0.50 0.52 0.51 0.53 0.54 0.56 0.56 0.57 0.57 0.51 0.51 0.71 0.74 0.73 0.74 0.59 0.71 0.82 0.86 0.89 0.76 0.80 0.93 1.00
C31 0.55 0.56 0.53 0.55 0.49 0.58 0.49 0.52 0.49 0.51 0.53 0.55 0.54 0.57 0.53 0.53 0.55 0.71 0.69 0.73 0.68 0.59 0.70 0.79 0.82 0.85 0.75 0.80 0.94 0.93 1.00
C32 0.51 0.51 0.52 0.49 0.51 0.53 0.49 0.51 0.47 0.44 0.52 0.53 0.52 0.52 0.61 0.49 0.49 0.54 0.54 0.55 0.57 0.53 0.58 0.60 0.57 0.57 0.54 0.56 0.55 0.58 0.56 1.00
C33 0.49 0.53 0.53 0.52 0.49 0.49 0.49 0.48 0.51 0.49 0.51 0.55 0.53 0.56 0.66 0.47 0.55 0.58 0.54 0.60 0.56 0.61 0.59 0.55 0.55 0.55 0.54 0.57 0.54 0.56 0.53 0.78 1.00
C34 0.49 0.51 0.50 0.49 0.51 0.51 0.50 0.49 0.46 0.47 0.50 0.52 0.51 0.51 0.61 0.47 0.55 0.61 0.56 0.61 0.61 0.58 0.64 0.62 0.62 0.59 0.64 0.64 0.57 0.60 0.59 0.84 0.80 1.00
C35 0.52 0.53 0.51 0.52 0.54 0.52 0.50 0.48 0.48 0.49 0.50 0.52 0.51 0.50 0.59 0.49 0.54 0.56 0.56 0.57 0.58 0.58 0.60 0.62 0.62 0.59 0.61 0.63 0.57 0.61 0.60 0.86 0.81 0.90 1.00
C36 0.45 0.50 0.49 0.48 0.52 0.49 0.46 0.49 0.48 0.46 0.53 0.54 0.53 0.56 0.60 0.49 0.56 0.54 0.52 0.59 0.59 0.59 0.59 0.61 0.55 0.58 0.56 0.54 0.56 0.56 0.56 0.82 0.76 0.79 0.74 1.00
C37 0.54 0.56 0.55 0.51 0.54 0.53 0.54 0.57 0.50 0.52 0.55 0.56 0.55 0.58 0.59 0.56 0.56 0.60 0.59 0.61 0.60 0.59 0.61 0.60 0.61 0.60 0.56 0.61 0.56 0.59 0.58 0.76 0.76 0.79 0.82 0.75 1.00
C38 0.48 0.49 0.46 0.48 0.51 0.48 0.47 0.48 0.46 0.49 0.51 0.55 0.53 0.50 0.64 0.50 0.48 0.59 0.58 0.60 0.59 0.59 0.59 0.58 0.55 0.55 0.52 0.57 0.54 0.56 0.56 0.81 0.84 0.84 0.83 0.79 0.78 1.00
C39 0.50 0.54 0.49 0.53 0.53 0.50 0.51 0.50 0.53 0.54 0.54 0.56 0.58 0.56 0.58 0.54 0.53 0.60 0.57 0.56 0.56 0.61 0.61 0.59 0.53 0.54 0.54 0.58 0.51 0.54 0.52 0.73 0.78 0.78 0.75 0.75 0.77 0.84 1.00
C40 0.51 0.57 0.53 0.58 0.54 0.49 0.46 0.54 0.54 0.54 0.57 0.58 0.59 0.53 0.60 0.54 0.56 0.61 0.61 0.53 0.55 0.64 0.59 0.58 0.58 0.55 0.61 0.63 0.54 0.59 0.56 0.75 0.77 0.79 0.84 0.71 0.76 0.77 0.78 1.00
C41 0.48 0.49 0.50 0.49 0.55 0.52 0.51 0.51 0.51 0.54 0.50 0.52 0.53 0.56 0.61 0.50 0.55 0.59 0.58 0.59 0.59 0.62 0.56 0.62 0.62 0.62 0.62 0.60 0.54 0.59 0.57 0.74 0.73 0.80 0.81 0.76 0.82 0.77 0.75 0.77 1.00
C42 0.48 0.47 0.49 0.48 0.54 0.51 0.51 0.49 0.52 0.54 0.49 0.51 0.50 0.54 0.61 0.49 0.52 0.56 0.56 0.57 0.56 0.65 0.57 0.61 0.59 0.61 0.61 0.60 0.54 0.59 0.56 0.74 0.73 0.77 0.80 0.76 0.78 0.77 0.74 0.76 0.94 1.00
C43 0.56 0.54 0.56 0.56 0.55 0.52 0.51 0.55 0.51 0.51 0.53 0.56 0.57 0.51 0.54 0.54 0.51 0.58 0.58 0.57 0.55 0.56 0.57 0.56 0.56 0.55 0.56 0.59 0.59 0.57 0.59 0.76 0.73 0.76 0.77 0.71 0.75 0.81 0.75 0.76 0.77 0.76 1.00

C1: Baghgora (4959); C2: Hill curt road (4905); C3: Montiviot (4888); C4: St. Marys post office (5096); C5: Chaita Pani Tea Garden (5383); C6: Edenvale tea garden (5473); C7: Lower Sonada (5842); C8: Sonada khasmahal (6238); C9: Sonada post office (6389); C10: Tibetian monastery (6700); C11: Sonada forest (7174); C12: Upper Sonada (6915); C13: Ghoom (7091); C14: Dooteria forest (7033); C15: Senchal (7362); C16: Bhalikhop (7347); C17: Ghoom monastery (7293); C18: Saint Josephs College (6492); C19: Richmond hill (6451); C20: Chauk bazaar (6904); C21: Pandam Limbu (6577); C22: Limbugaon (7052); C23: Jalapahar (7431); C24: West point (6738); C25: Batasia loop (6968); C26: Lepcha jagat (6952); C27: Majdhura (7077); C28: Sukhia pokhri (7102); C29: Mim Nagri (6756); C30: Pashupatinagar (6669); C31: Mirik (5267); C32: Lower sichey (4857); C33: Tadong (4131); C34: Namchi (4154); C35: Ranipool (3926); C36: Rishop (7598); C37: Icha forest (6663); C38: Delo (5088); C39: Baghdhara (3811); C40: Sunwar (3880); C41: Upper Mongpoo (6144); C42: Chinchona plantation (3643); C43: Mongpoo bazaar (3616)
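Each entry of Table 4.6 is a Dice (Nei and Li 1979) similarity between two RAPD band profiles, that is, twice the number of shared bands divided by the sum of the bands in both samples. The sketch below shows that calculation on two short, hypothetical presence/absence profiles; the real profiles scored in this study are not reproduced here, and the function name is illustrative.

```python
# Sketch: Dice (Nei and Li 1979) similarity between two binary band profiles.
def dice_similarity(profile_a, profile_b):
    """Return 2 * shared bands / (bands in a + bands in b) for 0/1 profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("both profiles must be scored at the same loci")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    total = sum(profile_a) + sum(profile_b)
    return 2.0 * shared / total if total else 0.0


# Hypothetical band scores (1 = band present, 0 = band absent) at eight loci.
sample_x = [1, 1, 0, 1, 0, 1, 1, 0]
sample_y = [1, 0, 0, 1, 1, 1, 0, 0]

# Three shared bands, five bands in x and four in y: 2 * 3 / 9.
print(f"Dice similarity: {dice_similarity(sample_x, sample_y):.3f}")
```

Applying the same function to every pair of samples would fill a lower-triangular matrix of the kind shown in Table 4.6.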


Figure 4.5: Dendrogram constructed on the basis of the data obtained from RAPD analysis (C1: Baghgora (4959); C2: Hill curt road (4905); C3: Montiviot (4888); C4: St. Marys post office (5096); C5: Chaita Pani Tea Garden (5383); C6: Edenvale tea garden (5473); C7: Lower Sonada (5842); C8: Sonada khasmahal (6238); C9: Sonada post office (6389); C10: Tibetian monastery (6700); C11: Sonada forest (7174); C12: Upper Sonada (6915); C13: Ghoom (7091); C14: Dooteria forest (7033); C15: Senchal (7362); C16: Bhalikhop (7347); C17: Ghoom monastery (7293); C18: Saint Josephs College (6492); C19: Richmond hill (6451); C20: Chauk bazaar (6904); C21: Pandam Limbu (6577); C22: Limbugaon (7052); C23: Jalapahar (7431); C24: West point (6738); C25: Batasia loop (6968); C26: Lepcha jagat (6952); C27: Majdhura (7077); C28: Sukhia pokhri (7102); C29: Mim Nagri (6756); C30: Pashupatinagar (6669); C31: Mirik (5267); C32: Lower sichey (4857); C33: Tadong (4131); C34: Namchi (4154); C35: Ranipool (3926); C36: Rishop (7598); C37: Icha forest (6663); C38: Delo (5088); C39: Baghdhara (3811); C40: Sunwar (3880); C41: Upper Mongpoo (6144); C42: Chinchona plantation (3643); C43: Mongpoo bazaar (3616))


(collection route I) and the other from Ghoom to Mirik via Pashupathi (collection route II). The third route was from Kalimpong to Gangtok via Rangpo (collection route III), which lies on the left side of the Teesta river. It was observed that in the RAPD analysis the samples clustered almost exactly on the basis of their geographical locations.

The samples C1 to C17 (other than C14 and C15), grouped together as cluster I, were from collection route I. C18 to C31 and C14 grouped as cluster II (collection route II), and C32 to C43 along with C15 grouped as cluster III (collection route III). Henceforth, they will be called POP I, II and III respectively.

However, sample C14 (Dooteria forest at 7033 ft) and sample C15 (Senchal at 7362 ft), although they belonged to cluster II and cluster III respectively, were both collected from collection route I (POP I).

The above result showed that genetic distance and similarity are clearly dependent on the geographical location of the populations. The route via Pashupathi is relatively less popular and runs through the western part of the Darjeeling hills, whereas the POP I route is more popular and runs through the eastern part. It has been shown that the population of A. nepalensis in the eastern part of the Darjeeling hills is different from that of the western part. Similarly, POP III, which is on the left side of the river Teesta (incidentally, POP I and II are situated on the right side of the Teesta), clustered in a completely different manner (cluster III).

4.2.3. Genetic similarity and distance between populations:

I also measured the closeness between the three populations in terms of genetic similarity and distance. Of the 34 primers producing distinct scorable bands, 26 were found to be common, producing scorable bands in each of the three populations individually (Table 4.7). Therefore, I considered the bands produced by these 26 primers to measure the genetic variation between populations. Table 4.7 shows that POP I and II had 6 common bands, whereas POP I and POP III shared 5 bands. POP II and III, on the other hand, shared 4 common bands with each other. Table 4.7 also revealed that the primer sequences of OPA01, OPA04, OPA13, OPB04, OPB06, OPB12 and OPB13 were found to be present in the A. nepalensis samples of POP I and II. Likewise, A. nepalensis of POP I


Table 4.7: Primers found in all three populations producing distinct scorable bands

Primers | Common bands in population I & II | Common bands in population I & III | Common bands in population II & III
OPA 01 | 1 | 0 | 0
OPA 02 | 0 | 0 | 0
OPA 04 | 0 | 0 | 0
OPA 07 | 0 | 0 | 0
OPA 10 | 0 | 1 | 0
OPA 11 | 0 | 0 | 0
OPA 12 | 0 | 0 | 0
OPA 13 | 1 | 1 | 0
OPA 15 | 0 | 1 | 0
OPA 16 | 0 | 0 | 0
OPA 17 | 0 | 0 | 0
OPA 19 | 0 | 0 | 0
OPB 02 | 0 | 0 | 0
OPB 03 | 0 | 0 | 0
OPB 04 | 2 | 0 | 0
OPB 05 | 0 | 0 | 1
OPB 06 | 0 | 0 | 0
OPB 07 | 0 | 0 | 1
OPB 08 | 0 | 0 | 0
OPB 10 | 0 | 0 | 0
OPB 11 | 0 | 1 | 0
OPB 12 | 1 | 1 | 1
OPB 13 | 1 | 0 | 0
OPB 14 | 0 | 0 | 1
OPB 15 | 0 | 0 | 0
OPB 20 | 0 | 0 | 0
Total | 6 | 5 | 4

and III shared the primer sequences of OPA10, OPA13, OPA15, OPB11 and OPB12. Similarly, the primer sequences of OPB05, OPB07, OPB12 and OPB14 were found in A. nepalensis samples of POP II and III.

A binary matrix was prepared (Table 4.8) by counting the presence and absence of common monomorphic bands in all three populations, and the similarity between populations was subsequently calculated. The similarity coefficient amongst the three populations ranged from 81.14% to 88.57%. The similarity between populations I and II was the highest at 88.57%, while the similarity between populations II and III was the lowest at 81.14%. The similarity between populations I and III was found to be 83.42%.

The resulting dendrogram (Figure 4.6) showed that POP I and II are closer to each other than either is to POP III. In this case, the Teesta river may be considered a geographical barrier to the dispersal of germplasm between its two sides.

4.3. Characterization of actinohaemoglobins (in-silico):

4.3.1. Characterization of haemoglobins from actinorhizal plants:

4.3.1.1. Sequences retrieval:

Amongst the selected 96 plant haemoglobins (pHbs) available in the public domain of NCBI, 32 species

Table 4.8: Similarity coefficient amongst studied populations of A. nepalensis

Population | Population I | Population II | Population III
Population I | 1.00 | |
Population II | 0.88 | 1.00 |
Population III | 0.83 | 0.81 | 1.00

Figure 4.6: Dendrogram constructed on the basis of the genetic similarity and distance between populations of A. nepalensis
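The branching order of a dendrogram such as Figure 4.6 can be recovered from the similarity values in Table 4.8 by agglomerative clustering. The sketch below uses average linkage (UPGMA); this choice is an assumption, since the clustering algorithm is not named in this section, and the function is a hypothetical pure-Python illustration rather than the software used in the study.

```python
# Sketch: average-linkage (UPGMA) clustering on a small similarity matrix.
# UPGMA is an assumption here; the section does not name the clustering method.
def upgma(labels, similarity):
    """Repeatedly merge the two most similar clusters; return the merge steps."""
    clusters = {i: [i] for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for idx, a in enumerate(keys):
            for b in keys[idx + 1:]:
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                average = sum(similarity[i][j] for i, j in pairs) / len(pairs)
                if best is None or average > best[0]:
                    best = (average, a, b)
        average, a, b = best
        merges.append((
            sorted(labels[i] for i in clusters[a]),
            sorted(labels[i] for i in clusters[b]),
            round(average, 3),
        ))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Similarity coefficients copied from Table 4.8.
labels = ["POP I", "POP II", "POP III"]
similarity = [
    [1.00, 0.88, 0.83],
    [0.88, 1.00, 0.81],
    [0.83, 0.81, 1.00],
]

for left, right, level in upgma(labels, similarity):
    print(left, "+", right, "joined at similarity", level)
```

With these values POP I and POP II join first, and POP III joins the pair afterwards, which agrees with the grouping described for Figure 4.6.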

were found to contain a single pHb gene, 13 species contained a pair of pHb genes, 5 species contained 3 pHb genes and 3 species contained 4 pHb genes, while Oryza sativa and Glycine max contained 5 and 6 copies of pHb genes respectively in their genomes.

Haemoglobins (Hbs) from plant systems were found to include 2 symbiotic Hbs (sHbs) (nonlegume Parasponia class I and actinorhizal Datisca class II), 22 legHbs (LHbs) (class II), 50 nonsymbiotic Hbs (nsHbs) (43 class I and 7 class II) and 22 plant truncated Hbs (ptHbs).

4.3.1.2. Physiochemical parameter analysis:

The physiochemical properties of pHbs are tabulated in Table 4.9. The results showed that the total number of amino acid residues varied depending on the pHb class. In class I Hbs (sHb and nsHbs), the length varied from 140 to 250 residues, whereas in class II (sHb/LHbs and nsHbs) it ranged from 123 to 161, with the only exception being the Vicia sativa class II LHb, whose sequence was comparatively short at 73 amino acid residues. PtHbs ranged in length from 156 to 177 residues. Only one ptHb of Ricinus communis was found to be comparatively longer than the other ptHbs, at 269 amino acid residues. The molecular weight of pHbs was found to be positively correlated with their length.

So, class I Hbs (sHb and nsHbs) were found to be comparatively larger amongst the studied pHbs.

In the case of class I Hbs and ptHbs, the pI values varied widely, from 5.02 to 9.83 and from 5.49 to 9.01, respectively. Class II pHbs (sHb/LHbs and nsHbs) showed pI values ranging from 5.38 to 7.1. The low pI values of class II Hbs can be attributed to the dominance of surface metal-OH species (Takeda and Fukawa 2005) and help to explain the higher oxygen binding capacity of the metal cofactors relative to class I pHb members. This property of class II pHbs is also attributed to their tendency towards oxygen diffusion. A similar observation was reported for the Arabidopsis thaliana class II nsHb by Vigeolas et al., (2011) and Spyrakis et al., (2011).

The results also revealed that the surfaces of class II Hbs and ptHbs were rich in negatively-charged residues, while the surfaces of class I nsHbs were rich in positively-charged residues. Exceptions were found in the Zea mays, Physcomitrella


Table 4.9: Physiochemical properties of haemoglobin protein sequences from various plants (Blue colour indicates actinorhizal haemoglobin) (A=Serial number; B=Organisms name; C=Accession number; D=Types of haemoglobin; E=Number of amino acids; F=Molecular weight; G=Theoretical pI; H=Total -ve residues; I=Total +ve residues; J=Instability index; K=Aliphatic index; L=GRAVY)

A | B | C | D | E | F | G | H | I | J | K | L
1 | Zea mays | AAZ98790.1 | nsHb-1 | 191 | 20646.3 | 5.02 | 31 | 22 | 38.17 | 74.40 | -0.239
2 | Zea mays subsp. mays | AAG01375.1 | nsHb-1 | 165 | 18279.3 | 6.32 | 23 | 22 | 22.38 | 88.24 | -0.016
3 | Wolffia arrhiza | AEQ39061.1 | nsHb-1 | 161 | 17872.8 | 8.97 | 21 | 24 | 31.67 | 84.97 | -0.057
4 | Vitis vinifera | CBI32538.3 | nsHb-1 | 233 | 26174.2 | 9.23 | 26 | 33 | 32.27 | 82.49 | -0.273
5 | V. vinifera | CBI32537.3 | nsHb-1 | 191 | 21357.8 | 8.93 | 21 | 25 | 40.83 | 82.67 | -0.171
6 | Triticum aestivum | AAN85432.1 | nsHb-1 | 162 | 18144.1 | 8.67 | 21 | 23 | 29.68 | 85.62 | -0.138
7 | Trema virgata | CAB63706.1 | nsHb-1 | 161 | 18152.1 | 8.59 | 20 | 22 | 26.58 | 85.40 | -0.163
8 | Trema tomentosa | CAA68405.1 | nsHb-1 | 161 | 18150.1 | 8.59 | 20 | 22 | 31.76 | 82.98 | -0.220
9 | T. orientalis | CAB16751.1 | nsHb-1 | 161 | 18221.1 | 7.83 | 20 | 21 | 28.72 | 82.36 | -0.181
10 | Solanum lycopersicum | AAK07676.1 | nsHb-1 | 152 | 17197.0 | 8.85 | 20 | 23 | 21.41 | 52.11 | -0.220
11 | Selaginella moellendorffii | EFJ10590.1 | nsHb-1 | 190 | 21055.5 | 7.10 | 23 | 23 | 36.47 | 95.05 | 0.063
12 | Ricinus communis | EEF43319.1 | nsHb-1 | 229 | 25922.5 | 9.27 | 23 | 30 | 33.96 | 92.40 | -0.030
13 | Rheum australe | ACH63214.1 | nsHb-1 | 163 | 18192.1 | 8.59 | 21 | 23 | 31.07 | 81.41 | -0.146
14 | Raphanus sativus | AAP37043.1 | nsHb-1 | 160 | 17982.0 | 8.77 | 20 | 23 | 29.93 | 82.31 | -0.230
15 | Populus tremula | ABM89109.1 | nsHb-1 | 160 | 18015.0 | 8.89 | 20 | 23 | 38.68 | 81.06 | -0.193
16 | Picea sitchensis | ABR17163.1 | nsHb-1 | 250 | 27821.1 | 9.54 | 23 | 33 | 42.28 | 80.36 | -0.219
17 | Physcomitrella patens | ABK20873.1 | nsHb-1 | 180 | 20000.8 | 6.32 | 23 | 22 | 42.89 | 77.61 | -0.343
18 | Parasponia rigida | P68169.2 | nsHb-1 | 162 | 18150.1 | 8.59 | 19 | 21 | 39.37 | 84.88 | -0.112
19 | Parasponia andersonii | AAB86653.1 | nsHb-1 | 162 | 18178.2 | 8.59 | 19 | 21 | 38.32 | 86.05 | -0.098
20 | Oryza sativa Japonica | AAK72231.1 | nsHb-1 | 167 | 18606.5 | 9.30 | 18 | 22 | 21.03 | 83.05 | -0.093
21 | O. sativa Japonica | AAK72230.1 | nsHb-1 | 169 | 18568.6 | 9.83 | 15 | 23 | 25.76 | 83.25 | -0.041
22 | O. sativa Indica | AAC49881.1 | nsHb-1 | 169 | 18615.6 | 8.99 | 18 | 21 | 28.61 | 82.60 | -0.042
23 | Oryza sativa Japonica | AAK72229.1 | nsHb-1 | 166 | 18444.4 | 6.91 | 21 | 21 | 36.63 | 90 | -0.019
24 | Myrica gale | ABN49927.1 | nsHb-1 | 160 | 18175.0 | 6.33 | 22 | 21 | 28.49 | 84.06 | -0.237
25 | Medicago sativa | AAG29748.1 | nsHb-1 | 160 | 17957.9 | 9.08 | 19 | 23 | 23.79 | 81.69 | -0.255
26 | Malus hupehensis | ACV41424.1 | nsHb-1 | 158 | 17810.9 | 8.57 | 21 | 23 | 28.91 | 88.23 | -0.189
27 | Malus domestica | AAP57676.1 | nsHb-1 | 158 | 17844.9 | 8.57 | 21 | 23 | 28.91 | 85.76 | -0.196
28 | Lotus japonicus | BAE46739.1 | nsHb-1 | 161 | 18036.0 | 9.00 | 18 | 22 | 33.40 | 80.56 | -0.201
29 | Hordeum vulgare | AAB70097.1 | nsHb-1 | 162 | 18043.0 | 7.84 | 21 | 22 | 27.97 | 86.23 | -0.096
30 | Gossypium hirsutum | AAX86687.1 | nsHb-1 | 163 | 18383.5 | 8.77 | 21 | 24 | 20.56 | 86.20 | -0.080
31 | Populus trichocarpa | XP_002313074.1 | nsHb-1 | 160 | 17972.9 | 8.89 | 20 | 23 | 37.50 | 78.62 | -0.219
32 | Brachypodium distachyon | XP_003558445.1 | nsHb-1 | 162 | 17654.3 | 8.63 | 20 | 22 | 30.17 | 81.48 | -0.197
33 | Zea mays subsp. parviglumis | AAG01183.1 | nsHb-1 | 165 | 18278.3 | 7.82 | 22 | 23 | 22.65 | 88.24 | -0.018
34 | Glycine max | AAA97887.1 | nsHb-1 | 161 | 18047.9 | 8.97 | 19 | 22 | 27.05 | 79.94 | -0.245
35 | Quercus petraea | ABO93466.1 | nsHb-1 | 161 | 17913.8 | 8.57 | 20 | 22 | 30.39 | 85.34 | -0.130
36 | Pyrus communis | AAP57677.1 | nsHb-1 | 158 | 17871.9 | 8.59 | 20 | 22 | 30.35 | 86.96 | -0.177
37 | Euryale ferox | AAQ22728.1 | nsHb-1 | 140 | 15839.3 | 9.02 | 18 | 21 | 30.10 | 79.43 | -0.155
38 | Citrus unshiu | AAK07675.1 | nsHb-1 | 183 | 20535.8 | 9.15 | 22 | 27 | 36.66 | 80 | -0.287
39 | Chamaecrista fasciculata | ABR68293.1 | nsHb-1 | 150 | 16195.5 | 6.08 | 17 | 15 | 22.09 | 89.27 | -0.013
40 | Ceratodon purpureus | ABK41124.1 | nsHb-1 | 177 | 19535.4 | 6.31 | 22 | 21 | 37.09 | 80.62 | -0.217
41 | Casuarina glauca | CAA37898.1 | nsHb-1 | 160 | 17845.7 | 8.93 | 20 | 23 | 24.18 | 81.12 | -0.220
42 | Arabidopsis thaliana | AAB82769.1 | nsHb-1 | 160 | 18034.1 | 8.46 | 20 | 22 | 32.47 | 85.31 | -0.148
43 | Alnus firma | BAE75956.1 | nsHb-1 | 160 | 17904.8 | 8.95 | 19 | 22 | 26.72 | 84.12 | -0.156
44 | Arabidopsis thaliana | AAB82770.1 | nsHb-2 | 158 | 17871.4 | 5.40 | 26 | 19 | 41.40 | 92.59 | -0.312
45 | Solanum lycopersicum | AAK07677.1 | nsHb-2 | 156 | 17755.4 | 5.85 | 25 | 22 | 37.37 | 84.49 | -0.337
46 | Brassica napus | AAK07741.1 | nsHb-2 | 161 | 18318.0 | 5.89 | 26 | 22 | 30.63 | 87.20 | -0.363
47 | Gossypium hirsutum | AAK21604.1 | nsHb-2 | 159 | 18096.6 | 5.44 | 28 | 21 | 36.67 | 82.20 | -0.473
48 | Euryale ferox | AAQ22729.1 | nsHb-2 | 147 | 16404.9 | 6.10 | 20 | 18 | 29.43 | 81.63 | -0.173
49 | Eutrema halophilum | BAJ33934.1 | nsHb-2 | 158 | 17962.6 | 5.70 | 26 | 21 | 32.68 | 88.23 | -0.372
50 | Cichorium intybus | CAA07547.1 | nsHb-2 | 161 | 18023.9 | 5.51 | 25 | 21 | 28.61 | 90.87 | -0.188
51 | Parasponia andersonii | 1212354A | sHb-1 | 162 | 18184.2 | 8.59 | 19 | 21 | 38.32 | 86.05 | -0.098
52 | Casuarina glauca | P08054.2 | sHb-2 | 152 | 17242.9 | 6.97 | 20 | 20 | 13.91 | 91.91 | -0.370
53 | Canavalia lineata | AAA18503.1 | Lhb-2 | 149 | 16295.7 | 6.51 | 18 | 17 | 17.65 | 86.44 | -0.144
54 | Medicago sativa | AAA32659.1 | Lhb-2 | 145 | 15612.8 | 5.90 | 17 | 15 | 36.92 | 89.52 | -0.006
55 | Phaseolus vulgaris | AAA33767.1 | Lhb-2 | 146 | 15619.7 | 5.37 | 17 | 14 | 22.09 | 95.07 | 0.009
56 | Glycine max | AAA33980.1 | Lhb-2 | 145 | 15524.6 | 5.38 | 18 | 15 | 21.87 | 90.97 | 0.026
57 | Vigna unguiculata | AAA86756.1 | Lhb-2 | 145 | 15363.5 | 6.11 | 17 | 16 | 14.00 | 97.03 | 0.108
58 | V. unguiculata | AAB65769.1 | Lhb-2 | 145 | 15350.5 | 5.68 | 18 | 26 | 10.38 | 96.34 | 0.093
59 | Lupinus luteus | AAC04853.1 | Lhb-2 | 154 | 16753.2 | 5.79 | 19 | 16 | 19.34 | 101.95 | 0.066
60 | Psophocarpus tetragonolobus | AAC60563.1 | Lhb-2 | 145 | 15550.7 | 5.90 | 17 | 15 | 18.60 | 92.28 | -0.002
61 | Astragalus sinicus | ABB13622.1 | Lhb-2 | 148 | 15853.1 | 6.06 | 16 | 14 | 32.27 | 93.65 | 0.052
62 | Pisum sativum | BAA31156.1 | Lhb-2 | 146 | 15998.4 | 5.56 | 19 | 15 | 31.17 | 96.23 | 0.037
63 | Lotus japonicus | BAB18106.1 | Lhb-2 | 123 | 12831.7 | 7.10 | 12 | 12 | 21.49 | 96.99 | 0.193
64 | L. japonicus | BAB18107.1 | Lhb-2 | 123 | 12875.8 | 7.10 | 12 | 12 | 22.57 | 96.99 | 0.190
65 | Lotus japonicus | BAB18108.1 | Lhb-2 | 147 | 15754.9 | 5.29 | 18 | 13 | 19.45 | 87.69 | -0.007
66 | Glycine max | CAA23730.1 | Lhb-2 | 144 | 15388.5 | 5.37 | 17 | 14 | 18.67 | 93.61 | 0.058
67 | G. max | CAA23731.1 | Lhb-2 | 144 | 15373.5 | 6.10 | 17 | 16 | 19.76 | 94.31 | 0.099
68 | G. max | CAA23732.1 | Lhb-2 | 145 | 15581.8 | 5.36 | 18 | 15 | 18.18 | 94.28 | 0.081
69 | Sesbania rostrata | CAA31859.1 | Lhb-2 | 148 | 15800.0 | 5.89 | 16 | 14 | 27.21 | 91.01 | 0.045
70 | S. rostrata | CAA32043.1 | Lhb-2 | 148 | 15990.2 | 5.35 | 18 | 14 | 27.07 | 90.34 | 0.012
71 | Medicago truncatula | CAA40899.1 | Lhb-2 | 147 | 15841.0 | 5.59 | 18 | 15 | 43.16 | 92.31 | 0.029
72 | M. truncatula | CAA40900.1 | Lhb-2 | 146 | 15752.1 | 6.29 | 17 | 16 | 34.97 | 92.88 | 0.037
73 | Vicia sativa | CAA70431.1 | Lhb-2 | 73 | 8201.6 | 6.10 | 12 | 11 | 10.94 | 92.19 | -0.045
74 | Vicia faba | CAA90870.1 | Lhb-2 | 146 | 15875.2 | 5.90 | 16 | 14 | 36.12 | 92.88 | 0.025
75 | Hordeum vulgare | AAK55410.1 | Pthb | 171 | 19602.9 | 6.16 | 21 | 18 | 45.30 | 56.55 | -0.661
76 | Glycine max | AAS48191.1 | Pthb | 172 | 19781.2 | 7.16 | 18 | 18 | 40.48 | 76.10 | -0.501
77 | Picea sitchensis | ABK22150.1 | Pthb | 164 | 19060.2 | 6.20 | 21 | 19 | 38.50 | 58.96 | -0.709
78 | Populus tremula x Populus tremuloides | ABM89110.1 | Pthb | 165 | 19045.3 | 6.65 | 21 | 20 | 36.29 | 61.52 | -0.676
79 | Zea mays | ACG29525.1 | Pthb | 171 | 19477.8 | 5.88 | 23 | 18 | 47.97 | 62.28 | -0.561
80 | Triticum aestivum | ACH86231.1 | Pthb | 171 | 19574.9 | 5.96 | 21 | 17 | 47.02 | 58.83 | -0.596
81 | Carica papaya | ACQ91204.1 | Pthb | 169 | 19490.7 | 5.86 | 23 | 18 | 33.33 | 64.20 | -0.647
82 | Eutrema halophilum | BAJ34404.1 | Pthb | 173 | 20090.3 | 5.49 | 25 | 19 | 46.38 | 63.82 | -0.758
83 | Datisca glomerata | CAD33536.1 | Pthb | 169 | 19661.2 | 7.06 | 19 | 19 | 48.77 | 69.29 | -0.592
84 | Sorghum bicolor | EER89990.1 | Pthb | 171 | 19505.8 | 5.88 | 23 | 18 | 44.68 | 62.28 | -0.543
85 | Selaginella moellendorffii | EFJ07410.1 | Pthb | 156 | 18088.3 | 6.52 | 23 | 22 | 52.46 | 65.77 | -0.623
86 | Arabidopsis thaliana | NP_567901.1 | Pthb | 175 | 20196.6 | 5.79 | 25 | 20 | 45.66 | 62.51 | -0.743
87 | Oryza sativa Japonica Group | NP_001057972.1 | Pthb | 172 | 19551.8 | 6.23 | 22 | 19 | 41.69 | 53.43 | -0.640
88 | Physcomitrella patens | XP_001760820.1 | Pthb | 156 | 18313.6 | 7.97 | 22 | 23 | 47.37 | 61.35 | -0.694
89 | P. patens | XP_001781680.1 | Pthb | 170 | 19441.9 | 9.01 | 17 | 20 | 29.10 | 65.53 | -0.485
90 | Vitis vinifera | XP_002284484.1 | Pthb | 169 | 19523.9 | 6.65 | 20 | 19 | 41.27 | 63.55 | -0.644
91 | Populus trichocarpa | XP_002309574.1 | Pthb | 165 | 19102.4 | 6.65 | 21 | 20 | 35.84 | 61.52 | -0.695
92 | Ricinus communis | XP_002516587.1 | Pthb | 177 | 20645.1 | 6.04 | 22 | 18 | 45.92 | 67.85 | -0.554
93 | R. communis | XP_002537252.1 | Pthb | 157 | 17474.3 | 7.71 | 18 | 19 | 15.11 | 87.58 | -0.191
94 | R. communis | XP_002539183.1 | Pthb | 269 | 29504.3 | 8.96 | 29 | 33 | 28.19 | 88.18 | -0.294
95 | Brachypodium distachyon | XP_003563697.1 | Pthb | 171 | 19526.8 | 5.97 | 22 | 18 | 47.09 | 60.00 | -0.620
96 | Medicago truncatula | XP_003603592.1 | Pthb | 170 | 19883.3 | 6.49 | 21 | 19 | 34.00 | 70.06 | -0.632
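Several columns of Table 4.9 can be derived directly from a protein sequence. The sketch below illustrates three of them under stated assumptions: GRAVY as the mean Kyte-Doolittle hydropathy value, the aliphatic index following the Ikai (1980) formula, and the charged-residue totals as Asp+Glu and Arg+Lys. The ten-residue peptide and the function names are hypothetical, and the tool actually used to generate the table is not named in this section. The stability check simply applies the threshold of 40 discussed in the text.

```python
# Sketch: sequence-derived parameters of the kind listed in Table 4.9.
# Kyte-Doolittle hydropathy scale, used for the GRAVY score.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Ikai (1980): X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole %."""
    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / len(sequence)
    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residues(sequence):
    """Return (negative, positive) counts as (Asp + Glu, Arg + Lys)."""
    negative = sequence.count("D") + sequence.count("E")
    positive = sequence.count("R") + sequence.count("K")
    return negative, positive


def is_predicted_stable(instability_index):
    """Apply the threshold used in the text: values below 40 count as stable."""
    return instability_index < 40


peptide = "MALVEKDLIR"  # hypothetical sequence, for illustration only
print("GRAVY:", round(gravy(peptide), 3))
print("Aliphatic index:", round(aliphatic_index(peptide), 2))
print("(-ve, +ve) residues:", charged_residues(peptide))
print("Instability 13.91 stable?", is_predicted_stable(13.91))
```

The instability index itself depends on a dipeptide weight table (Guruprasad et al., 1990) that is too long to reproduce here, so only the threshold test is shown.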

patens, Myrica gale, Chamaecrista fasciculata and Ceratodon purpureus class I nsHbs, and also in the P. patens and R. communis ptHbs.

The in vivo half-life of a protein was estimated via the instability index (Guruprasad et al., 1990), which indicates the extent of stability of the protein. Proteins having an instability index of more than 40 have an in vivo half-life of less than 5 h, while proteins having an instability index of less than 40 have a longer in vivo half-life of 16 h (Rogers et al., 1986). The results showed that the instability index values for most of the studied class I and class II pHbs were lower than 40, except for the class II nsHb from A. thaliana, the class I nsHbs of Vitis vinifera, Picea sitchensis and P. patens, and the class II LHb of Medicago truncatula. Surprisingly, most of the studied ptHbs had higher instability index values and were hence unstable, while the Picea sitchensis, Carica papaya, P. patens, Populus trichocarpa, R. communis and M. truncatula ptHbs were found to be stable in nature.

The thermostability of the proteins was assessed by their aliphatic index, which is directly proportional to thermostability and is defined as the relative volume occupied by the aliphatic side chains (Ikai 1980). For class II LHbs and nsHbs, the aliphatic index was found to vary from 86.44 to 101.95 and from 81.63 to 92.59 respectively, whereas for class I Hbs it ranged from 52.11 to 95.05. In the case of ptHbs it varied from 53.43 to 88.18. Thus, class II members were found to be more thermostable than ptHbs and class I members of pHbs, based on their higher aliphatic index values.

GRAVY (grand average of hydropathicity) values reflect the hydrophobicity of the amino acids. An increase in GRAVY value indicates a more hydrophobic protein (Klein and Thongboonkerd 2004). GRAVY values were found to be highest for class II sHb/Lhb, followed by class II nsHbs. Members of class I Hbs possess comparatively lower GRAVY values than class II members. GRAVY scores for ptHbs were extremely low, indicating lower hydrophobicity compared to the other pHbs.

An investigation of the physiochemical parameters revealed that class II nsHbs showed a remarkable resemblance to class II sHb/Lhbs, and that those properties distinguish them from class I Hbs and ptHbs. The results also showed that actinorhizal Hbs (Alnus firma,

RESULTS AND DISCUSSION 88

Casuarina glauca, M. gale and Datisca glomerata) followed the general trend of the other pHbs, whereas the nsHb of M. gale had a higher oxygen binding capacity than the other actinorhizal class I nsHbs.

4.3.2. Characterization of haemoglobins from selected actinobacteria:

4.3.2.1. Identification of actinobacterial haemoglobins:

Amongst the 100 selected actinobacterial type strains, 72 species were found to contain Hb genes in their genomes (Table 4.10). A total of 121 actinobacterial Hbs (bHbs) were identified: 44 species contained a single Hb gene, 18 species had a pair of Hb genes, 4 species had 3 Hbs and 3 species contained 4 Hb genes. Only 1 species carried 5 Hb genes, and 2 species contained 6 Hb genes.

Different strains of Frankia possessed different numbers of Hbs in their genomes. Table 4.10 shows that Candidatus Frankia datiscae Dg1 and Frankia sp. CcI3 each contained 2 Hbs, whereas Frankia sp. EAN1pec contained 3 Hbs, Frankia alni ACN14a 4 Hbs, Frankia sp. CN3 5 Hbs and Frankia sp. EuI1c 6 Hbs. Hb genes were therefore distributed in an irregular manner within Frankia genomes.

4.3.2.2. Distribution of actinobacterial haemoglobins into species biotopes:

A biotope is an area of uniform environmental conditions providing a living place for a specific assemblage of microorganisms (Sen et al., 2014). Actinobacteria are able to survive under different environmental conditions (Sen et al., 2014). Based on their predominant lifestyle, they were segregated into seven different biotopes. A letter code was assigned to each biotope (T = Thermal, S = Soil-borne, M = Mammalian-borne, P = Plant-borne, X = Extremophile, W = Water-borne, C = Arthropod-borne). All 22 Hbs identified in Frankia showed plant specificity and hence fall under the plant-borne biotope.

4.3.2.3. Codon usage of actinobacterial haemoglobins:

Codon usage patterns were determined to calculate the level of heterogeneity in codon use. The codon usage variation of the bHbs is tabulated in Table 4.11. In Frankia, the GC and GC3 content of


Table 4.10: Identification of actinobacterial haemoglobins with their biotope codes (I = Species name; II = Number of haemoglobins; III = Product name; IV = Gene ID; V = Amino acid sequence length; VI = Biotope code; T = Thermal, S = Soil-borne, M = Mammalian-borne, P = Plant-borne, X = Extremophile, W = Water-borne, C = Arthropod-borne)

I | II | III | IV | V | VI
Acidothermus cellulolyticus 11B | 1 | Globin | 639730395 | 132 | T
Actinoplanes missouriensis NBRC 102363 | 2 | Truncated hemoglobins | 2514081203 | 104 | S
 |  | Truncated hemoglobins | 2514080769 | 252 | S
Actinosynnema mirum 101, DSM 43827 | 1 | Globin | 644941681 | 127 | S
Arthrobacter arilaitensis re117, CIP 108037 | 1 | Bacterial globin-like protein | 649664446 | 172 | S
Arthrobacter aurescens TC1 | 1 | Protozoan/cyanobacterial globin-like family protein | 639801187 | 160 | S
Arthrobacter chlorophenolicus A6 | 1 | Globin | 643590570 | 149 | S
Arthrobacter phenanthrenivorans Sphe3 | 2 | Hemoglobin-like flavoprotein | 650465365 | 386 | S
 |  | Truncated hemoglobin | 650466896 | 149 | S
Beutenbergia cavernae HKI 0122, DSM 12333 | 2 | Globin | 643865198 | 151 | S
 |  | Globin | 643866730 | 153 | S
Brachybacterium faecium 6-10, DSM 4810 | 2 | Hemoglobin-like flavoprotein | 644998274 | 433 | M
 |  | Truncated hemoglobin | 644997225 | 148 | M
Candidatus Frankia datiscae Dg1 | 2 | Globin | 2506866885 | 229 | P
 |  | Globin | 2506866034 | 137 | P
Catenulispora acidiphila ID139908, DSM 44928 | 2 | Globin | 644960434 | 129 | S
 |  | Globin | 644953730 | 127 | S
Cellulomonas flavigena 134, DSM 20109 | 1 | Globin | 646794228 | 127 | S
Clavibacter michiganensis michiganensis NCPPB 382 | 1 | Hemoglobin-like protein, truncated hemoglobin family | 640540772 | 131 | P
Conexibacter woesei ID131577, DSM 14684 | 1 | Globin | 646500897 | 152 | S
Corynebacterium aurimucosum CN-1, ATCC 700975 | 1 | Hemoglobin-like protein | 643835428 | 129 | M
Corynebacterium diphtheriae 241 | 1 | Hemoglobin-like protein | 2511723893 | 130 | M
Corynebacterium glutamicum Kalinowski ATCC 13032 | 1 | Hemoglobin-like protein | 639316939 | 131 | M
Corynebacterium jeikeium K411 | 1 | Hemoglobin-like protein | 637664328 | 133 | M
Corynebacterium resistens DSM 45100 | 1 | Hemoglobin-like protein | 650932389 | 133 | M
Corynebacterium ulcerans 0102 | 1 | Hemoglobin-like protein | 2517222784 | 130 | M
Corynebacterium variabile DSM 44702 | 1 | Hemoglobin-like protein | 2511686199 | 157 | P
Frankia alni ACN14a | 4 | Flavohemoprotein (Hemoglobin-like protein) (Flavohemoglobin) (Nitric oxide dioxygenase) (NO oxygenase) (NOD) | 638101691 | 449 | P
 |  | Hypothetical protein-similar to truncated hemoglobins | 638100010 | 142 | P
 |  | Hemoglobin-like protein HbO | 638098563 | 135 | P
 |  | Hemoglobin-like protein HbN (Flavohemoglobin) | 638099186 | 118 | P
Frankia sp. CcI3 | 2 | Globin | 637879906 | 119 | P
 |  | Globin | 637878481 | 140 | P
Frankia sp. CN3 | 5 | Hemoglobin-like flavoprotein | 2508672604 | 140 | P
 |  | Hemoglobin-like flavoprotein | 2508675771 | 139 | P
 |  | Truncated hemoglobins | 2508673600 | 147 | P
 |  | Truncated hemoglobins | 2508678287 | 149 | P
 |  | Truncated hemoglobins | 2508676245 | 133 | P
Frankia sp. EAN1pec | 3 | Globin | 641239178 | 257 | P
 |  | Globin | 641244174 | 139 | P
 |  | Globin | 641242564 | 141 | P
Frankia sp. EuI1c | 6 | Globin | 649753510 | 276 | P
 |  | Globin | 649748549 | 149 | P
 |  | Globin | 649751219 | 152 | P
 |  | Globin | 649748460 | 147 | P
 |  | Globin | 649752185 | 145 | P
 |  | Globin | 649751567 | 124 | P
Geodermatophilus obscurus G-20, DSM 43160 | 1 | Globin | 646517211 | 129 | X
Gordonia polyisoprenivorans VH2, DSM 44266 | 2 | Hemoglobin-like protein | 2512745548 | 138 | S
 |  | Globin | 2512744326 | 168 | S
Intrasporangium calvum 7KIP, DSM 43043 | 1 | Globin | 649829399 | 128 | S
Jonesia denitrificans 55134, DSM 20603 | 1 | Globin | 645003676 | 135 | M
Kineococcus radiotolerans SRS30216 | 1 | Globin | 640827188 | 149 | X
Kitasatospora setae KM-6054, NBRC 14216 | 2 | Hemoglobin-like flavoprotein | 2511598615 | 138 | S
 |  | Truncated hemoglobins | 2511601179 | 154 | S
Kocuria rhizophila DC2201 | 1 | 2-on-2 hemoglobin | 642590476 | 136 | S
Kribbella flavida IFO 14399, DSM 17836 | 3 | Globin | 646488807 | 139 | S
 |  | Globin | 646486696 | 153 | S
 |  | Globin | 646491322 | 128 | S
Kytococcus sedentarius 541, DSM 20547 | 1 | Truncated hemoglobin | 644991181 | 178 | W
Leifsonia xyli xyli CTCB07 | 1 | Globin-like protein | 637525750 | 135 | P
Micrococcus luteus NCTC 2665 | 2 | Hemoglobin-like flavoprotein | 644805325 | 394 | W
 |  | Truncated hemoglobin | 644805941 | 357 | W
Micromonospora aurantiaca ATCC 27029 | 6 | Globin | 648137780 | 388 | W
 |  | Globin | 648136646 | 403 | W
 |  | Globin | 648140713 | 373 | W
 |  | Globin | 648137781 | 129 | W
 |  | Globin | 648139511 | 145 | W
 |  | Globin | 648135685 | 124 | W
Mycobacterium gilvum PYR-GCK | 2 | Globin | 640457487 | 124 | S
 |  | Globin | 640456788 | 136 | S
Mycobacterium intracellulare ATCC 13950 | 3 | Truncated hemoglobins | 2519499742 | 131 | M
 |  | Globin | 2519501035 | 135 | M
 |  | Truncated hemoglobins | 2519499937 | 138 | M
Mycobacterium leprae Br4923 | 1 | Hemoglobin-like, oxygen carrier | 643606067 | 128 | M
Mycobacterium massiliense GO 06 | 1 | Hemoglobin-like protein | 2517209746 | 139 | M
Mycobacterium tuberculosis CCDC5079 | 2 | Hemoglobin-like protein | 651086212 | 358 | M
 |  | Hemoglobin glbN | 651084320 | 130 | M
Mycobacterium vanbaalenii PYR-1 | 2 | Globin | 639806461 | 124 | S
 |  | Globin | 639807530 | 136 | S
Nakamurella multipartita Y-104, DSM 44233 | 4 | Globin | 645041591 | 153 | W
 |  | Globin | 645040952 | 145 | W
 |  | Globin | 645041024 | 122 | W
 |  | Globin | 645040233 | 150 | W
Nocardioides sp. JS614 | 1 | Globin | 639775974 | 124 | S
Nocardiopsis alba ATCC BAA-2165 | 2 | Globin family protein | 2518832609 | 287 | C
 |  | Bacterial-like globin | 2518829770 | 167 | C
Nocardiopsis dassonvillei dassonvillei DSM 43111 | 2 | Globin | 646840997 | 215 | M
 |  | Globin | 646839507 | 173 | M
Pseudonocardia dioxanivorans CB1190 | 1 | Globin | 651156893 | 126 | W
Rhodococcus equi 103S | 1 | Bacterial-like globin glbo | 649741818 | 161 | M
Rhodococcus erythropolis PR4 | 1 | 2-on-2 Hemoglobin | 643780427 | 139 | S
Rhodococcus jostii RHA1 | 1 | Probable globin protein | 638087975 | 146 | S
Rhodococcus opacus B4 | 1 | 2-on-2 Hemoglobin | 646583198 | 135 | S
Rothia dentocariosa ATCC 17931 | 1 | Protozoan/cyanobacterial globin family protein | 649719684 | 155 | M
Rothia mucilaginosa DY-18 | 2 | Hemoglobin-like flavoprotein | 646472031 | 418 | M
 |  | Hemoglobin-like flavoprotein | 646470423 | 344 | M
Saccharomonospora viridis P101, DSM 43017 | 1 | Truncated hemoglobin | 644975270 | 132 | M
Saccharopolyspora erythraea NRRL 2338 | 1 | Hemoglobin-like, oxygen carrier | 640172072 | 124 | S
Salinispora tropica CNB-440 | 3 | Globin | 640475188 | 404 | W
 |  | Globin | 640476380 | 403 | W
 |  | Globin | 640475792 | 142 | W
Sanguibacter keddieii ST-74, DSM 10542 | 1 | Truncated hemoglobin | 646609806 | 133 | M
Segniliparus rotundus CDC 1076, DSM 44985 | 1 | Globin | 646816763 | 142 | M
Stackebrandtia nassauensis LLR-40K-21, DSM 44728 | 2 | Globin | 646676912 | 146 | S
 |  | Globin | 646679104 | 150 | S
Streptomyces avermitilis MA-4680 | 1 | Globin-like protein | 637208528 | 134 | S
Streptomyces bingchenggensis BCW-1 | 1 | Globin | 646975158 | 152 | S
Streptomyces cattleya DSM 46488 | 1 | Globin | 2511978826 | 134 | S
Streptomyces coelicolor A3(2) | 1 | Globin | 637266913 | 137 | S
Streptosporangium roseum NI 9100, DSM 43021 | 1 | Truncated hemoglobins-like protein | 646459606 | 255 | S
Thermobifida fusca YX | 1 | Similar to Truncated hemoglobins | 637685922 | 188 | T
Thermobispora bispora R51, DSM 43833 | 1 | Globin | 646811125 | 129 | T
Thermomonospora curvata DSM 43183 | 2 | Globin | 646421851 | 128 | T
 |  | Globin | 646419031 | 127 | T
Tsukamurella paurometabola 33, DSM 20162 | 1 | Globin | 646804221 | 135 | C
Verrucosispora maris AB-18-032 | 4 | Globin | 650817613 | 390 | W
 |  | Globin | 650820211 | 372 | W
 |  | Globin | 650817614 | 132 | W
 |  | Globin | 650819200 | 142 | W
Xylanimonas cellulosilytica XIL07, DSM 15894 | 1 | Globin | 646442515 | 127 | P
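The tallies quoted in section 4.3.2.1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text (species grouped by number of Hb genes, and the per-strain Frankia counts read from Table 4.10); the variable names are illustrative and not part of the original analysis.

```python
# Species grouped by Hb copy number, as reported in section 4.3.2.1:
# {number of Hb genes per genome: number of species}
species_by_copy_number = {1: 44, 2: 18, 3: 4, 4: 3, 5: 1, 6: 2}

# Number of Hb genes per Frankia strain, as listed in Table 4.10.
frankia_hb_counts = {
    "Candidatus Frankia datiscae Dg1": 2,
    "Frankia sp. CcI3": 2,
    "Frankia sp. EAN1pec": 3,
    "Frankia alni ACN14a": 4,
    "Frankia sp. CN3": 5,
    "Frankia sp. EuI1c": 6,
}

# Total species carrying at least one Hb gene.
total_species = sum(species_by_copy_number.values())

# Total Hb genes: copies per genome multiplied by the number of such species.
total_hbs = sum(copies * n for copies, n in species_by_copy_number.items())

# Total Hb genes across the six Frankia strains.
frankia_hbs = sum(frankia_hb_counts.values())

print(total_species, total_hbs, frankia_hbs)  # 72 121 22
```

The three totals agree with the 72 species, 121 bHbs and 22 Frankia Hbs reported in the text.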


Table 4.11: Codon usage patterns and percentile of codon usage in actinobacterial haemoglobins (I = Genome name; II = Biotope code; III = COG; IV = DNA sequence length; V = Nc; VI = Percentile of Nc; VII = GC3; VIII = Percentile of GC3; IX = GC; X = CAI; XI = Percentile of CAI)

Gene ID | I | II | III | IV | V | VI | VII | VIII | IX | X | XI
637208528 | Streptomyces avermitilis MA-4680 | S |  | 405 | 32.72 | 98.25826 | 0.929 | 102.9477 | 0.684 | 0.7 | 115.0937
637266913 | Streptomyces coelicolor A3(2) | S |  | 414 | 32.75 | 102.6968 | 0.945 | 102.8851 | 0.691 | 0.676 | 106.4064
637525750 | Leifsonia xyli xyli CTCB07 | P |  | 408 | 40.4 | 110.6546 | 0.794 | 93.51078 | 0.674 | 0.529 | 92.35335
637664328 | Corynebacterium jeikeium K411 | M |  | 402 | 34.76 | 86.44616 | 0.775 | 103.8735 | 0.634 | 0.52 | 114.0851
637685922 | Thermobifida fusca YX | T |  | 567 | 42.48 | 115.4034 | 0.825 | 98.84975 | 0.676 | 0.554 | 90.27212
637878481 | Frankia sp. CcI3 | P |  | 423 | 39.28 | 106.7681 | 0.858 | 100.716 | 0.681 | 0.681 | 105.2225
637879906 | F. sp. CcI3 | P |  | 360 | 38.46 | 104.5393 | 0.803 | 94.25989 | 0.669 | 0.577 | 89.15328
638087975 | Rhodococcus jostii RHA1 | S |  | 441 | 32.67 | 84.76907 | 0.904 | 109.6422 | 0.671 | 0.807 | 109.1868
638098563 | Frankia alni ACN14a | P |  | 408 | 32.63 | 95.40936 | 0.93 | 104.5179 | 0.714 | 0.717 | 110.3247
638099186 | F. alni ACN14a | P |  | 357 | 29.2 | 85.38012 | 0.948 | 106.5408 | 0.723 | 0.812 | 124.9423
638100010 | F. alni ACN14a | P |  | 429 | 33.28 | 97.30994 | 0.902 | 101.3711 | 0.709 | 0.696 | 107.0934
638101691 | F. alni ACN14a | P |  | 1350 | 30.05 | 87.8655 | 0.918 | 103.1693 | 0.752 | 0.721 | 110.9401
639316939 | Corynebacterium glutamicum Kalinowski ATCC 13032 | M |  | 396 | 39.03 | 83.13099 | 0.636 | 111.7358 | 0.562 | 0.706 | 102.2003
639730395 | Acidothermus cellulolyticus 11B | T |  | 399 | 46.25 | 111.4995 | 0.786 | 99.92372 | 0.679 | 0.662 | 103.1313
639775974 | Nocardioides sp. JS614 | S |  | 375 | 31.58 | 90.2028 | 0.958 | 111.253 | 0.699 | 0.714 | 118.2903
639801187 | Arthrobacter aurescens TC1 | S |  | 483 | 43.5 | 99.97702 | 0.764 | 103.1596 | 0.635 | 0.497 | 96.5611
639806461 | Mycobacterium vanbaalenii PYR-1 | S |  | 375 | 41 | 112.0831 | 0.835 | 97.44428 | 0.667 | 0.583 | 98.54632
639807530 | M. vanbaalenii PYR-1 | S |  | 411 | 39.13 | 106.971 | 0.882 | 102.9292 | 0.659 | 0.619 | 104.6315
640172072 | Saccharopolyspora erythraea NRRL 2338 | S |  | 375 | 31.07 | 97.5204 | 0.94 | 102.2295 | 0.683 | 0.684 | 104.5232
640456788 | Mycobacterium gilvum PYR-GCK | S |  | 411 | 35.79 | 97.38776 | 0.921 | 107.9466 | 0.679 | 0.691 | 115.998
640457487 | M. gilvum PYR-GCK | S |  | 375 | 36.69 | 99.83673 | 0.835 | 97.86685 | 0.659 | 0.646 | 108.4438
640475188 | Salinispora tropica CNB-440 | W |  | 1215 | 34.19 | 93.6969 | 0.888 | 104.3233 | 0.702 | 0.643 | 109.7832
640475792 | S. tropica CNB-440 | W |  | 429 | 36.4 | 99.75336 | 0.821 | 96.45207 | 0.695 | 0.543 | 92.70958
640476380 | S. tropica CNB-440 | W |  | 1212 | 36.54 | 100.137 | 0.865 | 101.6212 | 0.684 | 0.582 | 99.36828
640540772 | Clavibacter michiganensis michiganensis NCPPB 382 | P |  | 396 | 30.02 | 100.2672 | 0.959 | 101.139 | 0.705 | 0.751 | 107.3624
640827188 | Kineococcus radiotolerans SRS30216 | X |  | 450 | 31.52 | 103.9578 | 0.971 | 102.7078 | 0.725 | 0.742 | 1.065633
641239178 | Frankia sp. EAN1pec | P |  | 774 | 37.24 | 106.1574 | 0.86 | 97.84958 | 0.684 | 0.597 | 94.46203
641242564 | F. sp. EAN1pec | P |  | 426 | 37.13 | 105.8438 | 0.898 | 102.1732 | 0.7 | 0.673 | 106.4873
641244174 | F. sp. EAN1pec | P |  | 420 | 33.99 | 96.89282 | 0.91 | 103.5385 | 0.719 | 0.613 | 96.99367
642590476 | Kocuria rhizophila DC2201 | S |  | 411 | 29.06 | 93.17089 | 0.96 | 103.795 | 0.721 | 0.653 | 101.1149
643590570 | Arthrobacter chlorophenolicus A6 | S |  | 450 | 36.65 | 97.6032 | 0.869 | 106.5343 | 0.651 | 0.588 | 102.6536
643606067 | Mycobacterium leprae Br4923 | M |  | 387 | 51.8 | 105.3917 | 0.68 | 101.0101 | 0.594 | 0.643 | 102.4212
643780427 | Rhodococcus erythropolis PR4 | S |  | 420 | 40.67 | 92.8963 | 0.828 | 112.8065 | 0.631 | 0.623 | 114.8175
643835428 | Corynebacterium aurimucosum CN-1, ATCC 700975 | M |  | 390 | 38.04 | 90.593 | 0.767 | 106.321 | 0.633 | 0.482 | 97.37374
643865198 | Beutenbergia cavernae HKI 0122, DSM 12333 | S |  | 456 | 34.37 | 110.3371 | 0.913 | 97.29327 | 0.748 | 0.684 | 96.50113
643866730 | B. cavernae HKI 0122, DSM 12333 | S |  | 462 | 32.02 | 102.7929 | 0.945 | 100.7033 | 0.741 | 0.697 | 98.33521
644805325 | Micrococcus luteus NCTC 2665 | W |  | 1185 | 25.76 | 85.46782 | 0.992 | 105.107 | 0.703 | 0.857 | 129.6717
644805941 | M. luteus NCTC 2665 | W |  | 1074 | 29.61 | 98.24154 | 0.962 | 101.9284 | 0.753 | 0.69 | 104.4031
644941681 | Actinosynnema mirum 101, DSM 43827 | S |  | 384 | 28.58 | 96.97998 | 0.992 | 103.7874 | 0.709 | 0.762 | 109.263
644953730 | Catenulispora acidiphila ID139908, DSM 44928 | S |  | 384 | 31.5 | 95.5414 | 0.927 | 103.587 | 0.677 | 0.608 | 105.0086
644960434 | C. acidiphila ID139908, DSM 44928 | S |  | 390 | 31.06 | 94.20685 | 0.876 | 97.88803 | 0.69 | 0.532 | 91.88256
644975270 | Saccharomonospora viridis P101, DSM 43017 | M |  | 399 | 34.96 | 91.83084 | 0.894 | 106.3653 | 0.674 | 0.779 | 117.6382
644991181 | Kytococcus sedentarius 541, DSM 20547 | W |  | 537 | 27.51 | 87.5 | 0.946 | 102.6253 | 0.738 | 0.496 | 100.4049
644997225 | Brachybacterium faecium 6-10, DSM 4810 | M |  | 447 | 29.37 | 95.98039 | 0.972 | 103.824 | 0.73 | 0.892 | 108.6083
644998274 | B. faecium 6-10, DSM 4810 | M |  | 1302 | 29.3 | 95.75163 | 0.959 | 102.4354 | 0.734 | 0.862 | 104.9556
645003676 | Jonesia denitrificans 55134, DSM 20603 | M |  | 408 | 46.75 | 99.44693 | 0.548 | 88.54419 | 0.565 | 0.456 | 79.18041
645040233 | Nakamurella multipartita Y-104, DSM 44233 | W |  | 453 | 31.56 | 97.79981 | 0.921 | 101.7792 | 0.72 | 0.672 | 103.0201
645040952 | N. multipartita Y-104, DSM 44233 | W |  | 438 | 31.26 | 96.87016 | 0.906 | 100.1216 | 0.701 | 0.652 | 99.95401
645041024 | N. multipartita Y-104, DSM 44233 | W |  | 369 | 35.06 | 108.6458 | 0.88 | 97.24831 | 0.697 | 0.656 | 100.5672
645041591 | N. multipartita Y-104, DSM 44233 | W |  | 462 | 41.98 | 130.0899 | 0.79 | 87.30246 | 0.68 | 0.497 | 76.19194
646419031 | Thermomonospora curvata DSM 43183 | T |  | 384 | 25.99 | 83.43499 | 0.933 | 101.3469 | 0.701 | 0.702 | 105.9623
646421851 | T. curvata DSM 43183 | T |  | 387 | 35.24 | 113.13 | 0.902 | 97.97958 | 0.682 | 0.587 | 88.60377
646442515 | Xylanimonas cellulosilytica XIL07, DSM 15894 | P |  | 384 | 31.95 | 102.5682 | 0.925 | 99.14255 | 0.701 | 0.605 | 93.58082
646459606 | Streptosporangium roseum NI 9100, DSM 43021 | S |  | 768 | 30.96 | 93.25301 | 0.921 | 101.4988 | 0.714 | 0.724 | 106.6431
646470423 | Rothia mucilaginosa DY-18 | M |  | 1035 | 44.06 | 119.8912 | 0.702 | 99.68759 | 0.629 | 0.423 | 75.0133
646472031 | R. mucilaginosa DY-18 | M |  | 1257 | 29.79 | 81.06122 | 0.799 | 113.4621 | 0.62 | 0.751 | 133.1796
646486696 | Kribbella flavida IFO 14399, DSM 17836 | S |  | 462 | 30.2 | 94.64118 | 0.938 | 102.772 | 0.749 | 0.704 | 105.9124
646488807 | K. flavida IFO 14399, DSM 17836 | S |  | 420 | 32.19 | 100.8775 | 0.891 | 97.62244 | 0.7 | 0.642 | 96.58493
646491322 | K. flavida IFO 14399, DSM 17836 | S |  | 387 | 29.1 | 91.19398 | 0.945 | 103.539 | 0.719 | 0.763 | 114.7886
646500897 | Conexibacter woesei ID131577, DSM 14684 | S |  | 459 | 32.41 | 107.6387 | 0.918 | 98.07692 | 0.73 | 0.686 | 94.24371
646517211 | Geodermatophilus obscurus G-20, DSM 43160 | X |  | 390 | 29.56 | 96.22396 | 0.933 | 99.13931 | 0.724 | 0.745 | 106.902
646583198 | Rhodococcus opacus B4 | S |  | 408 | 34.16 | 93.02832 | 0.863 | 101.6011 | 0.659 | 0.619 | 103.7372
646609806 | Sanguibacter keddieii ST-74, DSM 10542 | M |  | 402 | 29.37 | 94.74194 | 0.96 | 103.1593 | 0.699 | 0.77 | 106.8851
646676912 | Stackebrandtia nassauensis LLR-40K-21, DSM 44728 | S |  | 441 | 45.37 | 135.07 | 0.775 | 88.10823 | 0.603 | 0.5 | 78.0884
646679104 | S. nassauensis LLR-40K-21, DSM 44728 | S |  | 453 | 33.42 | 99.4939 | 0.894 | 101.6371 | 0.707 | 0.716 | 111.8226
646794228 | Cellulomonas flavigena 134, DSM 20109 | S |  | 384 | 33.46 | 110.1745 | 0.958 | 101.0655 | 0.958 | 0.732 | 102.5785
646804221 | Tsukamurella paurometabola 33, DSM 20162 | C |  | 408 | 30.41 | 84.82566 | 0.921 | 107.7697 | 0.689 | 0.645 | 114.1795
646811125 | Thermobispora bispora R51, DSM 43833 | T |  | 390 | 29.74 | 95.10713 | 0.976 | 103.6423 | 0.703 | 0.875 | 115.7867
646816763 | Segniliparus rotundus CDC 1076, DSM 44985 | M |  | 429 | 42.36 | 107.6767 | 0.765 | 92.15757 | 0.653 | 0.492 | 99.41402
646839507 | Nocardiopsis dassonvillei dassonvillei DSM 43111 | M |  | 522 | 27.97 | 91.8555 | 0.963 | 102.72 | 0.709 | 0.781 | 114.8867
646840997 | N. dassonvillei dassonvillei DSM 43111 | M |  | 648 | 32.12 | 105.4844 | 0.946 | 100.9067 | 0.741 | 0.663 | 97.52868
646975158 | Streptomyces bingchenggensis BCW-1 | S |  | 459 | 29.41 | 87.68634 | 0.951 | 105.9138 | 0.7 | 0.751 | 127.7863
648135685 | Micromonospora aurantiaca ATCC 27029 | W |  | 375 | 30.56 | 98.54886 | 0.95 | 102.1176 | 0.806 | 0.648 | 95.49072
648136646 | M. aurantiaca ATCC 27029 | W |  | 1212 | 29.18 | 94.09868 | 0.956 | 102.7625 | 0.758 | 0.727 | 107.1323
648137780 | M. aurantiaca ATCC 27029 | W |  | 1167 | 27.6 | 89.00355 | 0.911 | 97.9254 | 0.717 | 0.61 | 89.89095
648137781 | M. aurantiaca ATCC 27029 | W |  | 390 | 29.78 | 96.03354 | 0.937 | 100.7202 | 0.7 | 0.71 | 104.6272
648139511 | M. aurantiaca ATCC 27029 | W |  | 438 | 27.8 | 89.6485 | 0.941 | 101.1502 | 0.724 | 0.752 | 110.8164
648140713 | Micromonospora aurantiaca ATCC 27029 | W |  | 1122 | 28.12 | 90.68043 | 0.972 | 104.4824 | 0.725 | 0.787 | 115.9741
649664446 | Arthrobacter arilaitensis re117, CIP 108037 | S |  | 519 | 47.88 | 111.7126 | 0.594 | 0.856648 | 0.552 | 0.43 | 95.96072
649719684 | Rothia dentocariosa ATCC 17931 | M |  | 468 | 47.12 | 96.30084 | 0.694 | 124.8651 | 0.576 | 0.498 | 100.3223
649741818 | Rhodococcus equi 103S | M |  | 486 | 33.03 | 95.35219 | 0.893 | 101.8709 | 0.681 | 0.493 | 99.55574
649748460 | Frankia sp. EuI1c | P |  | 444 | 35.25 | 107.8972 | 0.891 | 97.43029 | 0.703 | 0.649 | 95.96333
649748549 | F. sp. EuI1c | P |  | 450 | 33.5 | 102.5406 | 0.867 | 94.8059 | 0.707 | 0.54 | 79.84622
649751219 | F. sp. EuI1c | P |  | 459 | 37.35 | 114.3251 | 0.858 | 93.82176 | 0.667 | 0.562 | 83.09922
649751567 | F. sp. EuI1c | P |  | 375 | 30.16 | 92.31711 | 0.951 | 103.9913 | 0.728 | 0.782 | 115.6292
649752185 | F. sp. EuI1c | P |  | 438 | 27.34 | 83.68534 | 0.928 | 101.4762 | 0.72 | 0.737 | 108.9753
649753510 | F. sp. EuI1c | P |  | 831 | 42.11 | 128.895 | 0.748 | 81.79333 | 0.7 | 0.377 | 55.74449
649829399 | Intrasporangium calvum 7KIP, DSM 43043 | S |  | 387 | 31.66 | 94.45107 | 0.926 | 1.025584 | 0.701 | 0.699 | 108.6585
650465365 | Arthrobacter phenanthrenivorans Sphe3 | S |  | 1161 | 33.45 | 85.72527 | 0.852 | 106.4335 | 0.658 | 0.677 | 116.8248
650466896 | A. phenanthrenivorans Sphe3 | S |  | 450 | 39.86 | 102.1527 | 0.847 | 105.8089 | 0.635 | 0.59 | 101.8119
650817613 | Verrucosispora maris AB-18-032 | W |  | 1173 | 31.38 | 95.37994 | 0.875 | 97.58002 | 0.703 | 0.517 | 81.48148
650817614 | V. maris AB-18-032 | W |  | 399 | 29.22 | 88.81459 | 0.868 | 96.79938 | 0.684 | 0.544 | 85.7368
650819200 | V. maris AB-18-032 | W |  | 429 | 32.69 | 99.3617 | 0.947 | 105.6095 | 0.723 | 0.69 | 108.747
650820211 | V. maris AB-18-032 | W |  | 1119 | 32.87 | 99.90881 | 0.924 | 103.0445 | 0.696 | 0.659 | 103.8613
650932389 | Corynebacterium resistens DSM 45100 | M |  | 402 | 43.47 | 89.64735 | 0.711 | 112.8751 | 0.607 | 0.361 | 97.04301
651084320 | Mycobacterium tuberculosis CCDC5079 | M |  | 393 | 45.52 | 109.7926 | 0.787 | 99.73387 | 0.628 | 0.609 | 100
651086212 | M. tuberculosis CCDC5079 | M |  | 1077 | 37.17 | 89.65268 | 0.846 | 107.2107 | 0.669 | 0.675 | 110.8374
651156893 | Pseudonocardia dioxanivorans CB1190 | W |  | 381 | 36 | 113.4931 | 0.93 | 99.89259 | 0.685 | 0.696 | 104.0514
2506866034 | Candidatus Frankia datiscae Dg1 | P |  | 414 | 34.59 | 94.66338 | 0.9 | 105.6338 | 0.71 | 0.684 | 104.1096
2506866885 | C. Frankia datiscae Dg1 | P |  | 690 | 40.93 | 112.0142 | 0.8 | 93.89671 | 0.696 | 0.576 | 87.67123
2508672604 | Frankia sp. CN3 | P |  | 423 | 34.84 | 102.8335 | 0.903 | 100.8826 | 0.702 | 0.627 | 91.31955
2508673600 | F. sp. CN3 | P |  | 444 | 39.65 | 117.0307 | 0.839 | 93.73254 | 0.649 | 0.615 | 89.5718
2508675771 | F. sp. CN3 | P |  | 420 | 30.07 | 88.75443 | 0.977 | 109.1498 | 0.739 | 0.829 | 120.7399
2508676245 | F. sp. CN3 | P |  | 402 | 32.11 | 94.77568 | 0.944 | 105.4631 | 0.734 | 0.763 | 111.1273
2508678287 | F. sp. CN3 | P |  | 450 | 35.41 | 104.5159 | 0.909 | 101.5529 | 0.709 | 0.672 | 97.87358
2511598615 | Kitasatospora setae KM-6054, NBRC 14216 | S |  | 417 | 28.26 | 98.50122 | 0.948 | 99.20469 | 0.703 | 0.676 | 102.2229
2511601179 | K. setae KM-6054, NBRC 14216 | S |  | 465 | 26.94 | 93.90031 | 0.96 | 100.4604 | 0.766 | 0.733 | 110.8423
2511686199 | Corynebacterium variabile DSM 44702 | P |  | 474 | 33.71 | 98.48086 | 0.884 | 103.0784 | 0.696 | 0.602 | 111.6883
2511723893 | Corynebacterium diphtheriae 241 | M |  | 393 | 39.81 | 82.6963 | 0.647 | 118.9338 | 0.585 | 0.525 | 141.2429
2511978826 | Streptomyces cattleya DSM 46488 | S |  | 405 | 28.2 | 89.24051 | 0.961 | 104.9356 | 0.714 | 0.718 | 120.9976
2512744326 | Gordonia polyisoprenivorans VH2, DSM 44266 | S |  | 507 | 39.11 | 103.933 | 0.79 | 95.06619 | 0.603 | 0.521 | 94.29864
2512745548 | G. polyisoprenivorans VH2, DSM 44266 | S |  | 417 | 37 | 98.3258 | 0.777 | 93.50181 | 0.64 | 0.53 | 95.9276
2514080769 | Actinoplanes missouriensis NBRC 102363 | S |  | 759 | 32.09 | 98.31495 | 0.958 | 105.5647 | 0.734 | 0.722 | 113.0952
2514081203 | A. missouriensis NBRC 102363 | S |  | 315 | 34.15 | 104.6262 | 0.922 | 101.5978 | 0.728 | 0.582 | 91.16541
2517209746 | Mycobacterium massiliense GO 06 | M |  | 420 | 43.62 | 105.8481 | 0.785 | 99.80928 | 0.645 | 0.489 | 98.52912
2517222784 | Corynebacterium ulcerans 0102 | M |  | 393 | 51.65 | 98.83276 | 0.664 | 123.0085 | 0.608 | 0.438 | 125
2518829770 | Nocardiopsis alba ATCC BAA-2165 | C |  | 504 | 34.22 | 99.97079 | 0.886 | 101.524 | 0.697 | 0.659 | 111.0175
2518832609 | N. alba ATCC BAA-2165 | C |  | 864 | 41.38 | 120.8881 | 0.783 | 89.72155 | 0.679 | 0.421 | 70.92318
2519499742 | Mycobacterium intracellulare ATCC 13950 | M |  | 396 | 38.58 | 106.281 | 0.879 | 101.7479 | 0.664 | 0.604 | 99.39115
2519499937 | M. intracellulare ATCC 13950 | M |  | 417 | 34.04 | 93.7741 | 0.885 | 102.4424 | 0.686 | 0.629 | 103.505
2519501035 | M. intracellulare ATCC 13950 | M |  | 408 | 37.48 | 103.2507 | 0.863 | 99.89582 | 0.674 | 0.605 | 99.5557
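The statement that Frankia Hb genes have a high GC3 content (>80%) with low Nc values can be checked against the Frankia rows of Table 4.11. The following is a minimal sketch, assuming the GC3 column is a fraction between 0 and 1 and that ">80%" means strictly greater than 0.80; the tuple layout and variable names are illustrative.

```python
# (gene ID, strain, Nc, GC3) for the 22 Frankia Hb genes, copied from Table 4.11.
frankia_rows = [
    ("637878481", "Frankia sp. CcI3", 39.28, 0.858),
    ("637879906", "Frankia sp. CcI3", 38.46, 0.803),
    ("638098563", "Frankia alni ACN14a", 32.63, 0.930),
    ("638099186", "Frankia alni ACN14a", 29.20, 0.948),
    ("638100010", "Frankia alni ACN14a", 33.28, 0.902),
    ("638101691", "Frankia alni ACN14a", 30.05, 0.918),
    ("641239178", "Frankia sp. EAN1pec", 37.24, 0.860),
    ("641242564", "Frankia sp. EAN1pec", 37.13, 0.898),
    ("641244174", "Frankia sp. EAN1pec", 33.99, 0.910),
    ("649748460", "Frankia sp. EuI1c", 35.25, 0.891),
    ("649748549", "Frankia sp. EuI1c", 33.50, 0.867),
    ("649751219", "Frankia sp. EuI1c", 37.35, 0.858),
    ("649751567", "Frankia sp. EuI1c", 30.16, 0.951),
    ("649752185", "Frankia sp. EuI1c", 27.34, 0.928),
    ("649753510", "Frankia sp. EuI1c", 42.11, 0.748),
    ("2506866034", "Candidatus Frankia datiscae Dg1", 34.59, 0.900),
    ("2506866885", "Candidatus Frankia datiscae Dg1", 40.93, 0.800),
    ("2508672604", "Frankia sp. CN3", 34.84, 0.903),
    ("2508673600", "Frankia sp. CN3", 39.65, 0.839),
    ("2508675771", "Frankia sp. CN3", 30.07, 0.977),
    ("2508676245", "Frankia sp. CN3", 32.11, 0.944),
    ("2508678287", "Frankia sp. CN3", 35.41, 0.909),
]

GC3_THRESHOLD = 0.80  # assumed reading of the ">80%" level quoted in the text

# Split the genes into those strictly above the threshold and the rest.
above = [row for row in frankia_rows if row[3] > GC3_THRESHOLD]
not_above = [row for row in frankia_rows if row[3] <= GC3_THRESHOLD]

# Range of Nc values across all Frankia Hb genes.
nc_values = [row[2] for row in frankia_rows]

print(len(frankia_rows), len(above))     # 22 20
print(min(nc_values), max(nc_values))    # 27.34 42.11
for gene_id, strain, nc, gc3 in not_above:
    print(gene_id, strain, nc, gc3)
```

Under these assumptions, 20 of the 22 genes lie above the threshold; the two that do not (649753510 of Frankia sp. EuI1c and 2506866885 of Candidatus Frankia datiscae Dg1) are also the two with the highest Nc values among the Frankia genes.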


Hbs were found to be high (>80%), with correspondingly low Nc values. A low Nc value indicates strong codon bias (Botzman and Margalit, 2011), thus confirming the effect of GC compositional constraint on these genes.

4.3.2.3.1. Expression pattern analysis:

Actinobacterial CAI values showed a positive correlation with their GC values (Table 4.11). Most of the bHbs had higher CAI values than the average CAI value of their corresponding genome, indicating a high expression level of their Hbs. The Hb genes of F. alni were found to be putatively highly expressed, whereas the Hb genes of Frankia sp. EuI1c were lowly expressed. Hb genes in the other Frankia strains were found to be moderately expressed.

4.3.2.3.2. Percentile calculation:

The percentile calculation of the bHbs (Table 4.11) was carried out to compare the codon behaviour of Hbs across the biotopes of the studied actinobacteria, and the results were plotted in Figure 4.7. Figure 4.7 shows that the majority of plant-borne bHbs tend towards higher GC3 values and lower Nc values, along with moderate to high expression levels. The only exception was the Hb of Frankia sp. EuI1c, which showed high Nc values

Figure 4.7: Percentile plot of actinobacterial haemoglobins

with correspondingly low GC3 and CAI values, which signify a low level of gene expression.

To understand the translational efficiency of the Hb genes of Frankia, I plotted GC3 versus Nc (Figure 4.8) for the Hb genes together with their respective protein-coding genes. GC3 versus Nc plots are used to examine the factors responsible for variation in codon usage among genes and genomes (Wright 1990). Genes fall on the continuous GC3-Nc curve when the codon usage of a genome is governed strictly by GC compositional constraint (Wright 1990). In this study, Frankia showed the effect of GC compositional constraint on its genomes and on its Hb genes as well.

Figure 4.8 shows that the Hb genes lie well below the GC3-Nc curve, which signifies that these genes are not shaped by compositional constraint alone but are also under natural selection for translational efficiency. Therefore, for the Hb genes of Frankia, codon bias was

Figure 4.8: GC3 versus Nc plots of Frankia haemoglobins (A = Frankia alni ACN14a; B = Frankia sp. CN3; C = Frankia sp. EAN1pec; D = Frankia sp. EuI1c). Red circles represent all protein-coding genes, green circles indicate ribosomal protein genes and blue circles represent Hb genes.
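The reference curve in a GC3 versus Nc plot is the expected Nc when codon usage depends on GC3 composition alone (Wright 1990), Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 fraction. The sketch below evaluates this expression for the four F. alni ACN14a Hb genes (panel A) using the Nc and GC3 values of Table 4.11. It assumes that the tabulated GC3 values can be used directly as s; the function and variable names are illustrative and are not taken from the software used in this study.

```python
def expected_nc(gc3: float) -> float:
    """Expected Nc under GC3 compositional constraint alone (Wright 1990)."""
    s = gc3
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


# (gene ID, observed Nc, GC3) for the F. alni ACN14a Hb genes, from Table 4.11.
f_alni_hbs = [
    ("638098563", 32.63, 0.930),
    ("638099186", 29.20, 0.948),
    ("638100010", 33.28, 0.902),
    ("638101691", 30.05, 0.918),
]

# Compare each observed Nc with the value expected from GC3 alone.
results = []
for gene_id, nc_obs, gc3 in f_alni_hbs:
    nc_exp = expected_nc(gc3)
    results.append((gene_id, nc_obs, nc_exp, nc_obs < nc_exp))
    print(f"{gene_id}: observed Nc = {nc_obs:.2f}, "
          f"expected Nc = {nc_exp:.2f}, below curve = {nc_obs < nc_exp}")
```

For all four genes the observed Nc is lower than the value expected from GC3 alone, which is the sense in which a gene lies below the curve.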

found to be high with persistence of natural selection on the translational efficiency of these Hb genes.

4.3.2.4. Functional annotation of actinobacterial haemoglobins:

Functional annotation revealed that the 121 identified bHbs were associated with different gene product features and were hence named accordingly.

It was also found that a single Hb gene could possess several molecular and biological functions. These included 6,7-dihydropteridine reductase, monooxygenase, nitric oxide dioxygenase, acyl-CoA oxidase, ATPase, cobalamin biosynthesis, electron carrier, flavin adenine dinucleotide binding, methionine synthase and signal transducer activities. The molecular functions also include heme binding, oxygen binding, iron ion binding, oxygen transporter, ATP binding, oxidoreductase activity, iron-sulphur cluster binding, hydrolase activity, metal ion binding and deoxygenase activity, while the biological functions of the studied bHbs are restricted to oxygen transport or NO binding (Jokipii-Lukkari et al., 2009).

The activities of bHbs were highly dependent on their associated functional cofactors (Vinogradov et al., 2006).

4.3.2.4.1. Functional phylogeny of actinobacterial haemoglobins:

A binary matrix, followed by a phylogenetic tree, was prepared on the basis of the molecular and biological functions of the studied bHbs.

Figure 4.9 revealed that actinobacteria belonging to the same genus may have different Hbs, which clustered according to their specified functions. Only the heme, oxygen and iron binding functions were common to all bHbs.

The dendrogram constructed on the basis of Hbs segregated Frankia into three clusters. The Hb genes of Frankia sp. CcI3, Frankia sp. EAN1pec and Frankia sp. EuI1c were involved in oxygen transport and oxidoreductase activity in addition to the common heme binding, oxygen binding and iron ion binding activities, and were therefore positioned in the same cluster.

In the case of F. alni ACN14a, amongst its 4 copies of Hb genes, a single Hb gene was found to be associated with NOD factor along with a signal molecule. It has been reviewed in the literature that Frankia might have developed a unique pathway for the synthesis of a novel chitin-based signal molecule, which functions as NOD factor and is independent in its infection mechanism (Pawlowski et al., 2011). The other 3 copies of Hbs of F. alni ACN14a were involved in 6,7-dihydropteridine reductase activity, nitric oxide dioxygenase activity, heme binding, oxygen binding, iron ion binding and oxidoreductase activity, and hence clustered separately from the other Frankia strains.

These different activities of Frankia Hbs might have arisen because of their different host specificities as well as their distinct ecological and geographical distributions.

Figure 4.9: Functional phylogeny of actinobacterial haemoglobins (Frankia strains are marked with red stars)

4.4. Comparative study amongst actinohaemoglobins at the gene level:

4.4.1. Identification of common motifs:

Sequence comparison revealed a distinct homology amongst pHbs according to their classes. A total of fifteen conserved motifs were observed, along with some short conserved regions. The fifteen recognized motifs, of varying width, were submitted to protein BLAST for motif confirmation and annotation. The results revealed that eleven of the fifteen identified motifs had similarities with the globin family: seven of them were functionally active stretches similar to pHbs, two were similar to bHbs, and another two motifs were found to be functionally active in both pHbs and bHbs. Table 4.12 shows the motifs of the pHb and bHb sequences from the various actinoHbs (actinorhizal pHbs and bHbs).

One of the seven motifs, a 29 amino acid stretch (NPKLKPHAMKVFVMTCESAVQLRKAGKVT), was found in class II sHb/LHb and nsHb proteins but was totally absent in other pHbs. The sHb of Casuarina possesses this motif. Similarly, another two motifs, of 29 (LGLKFFLKIFEIAPSAKKLFSFLRDSDVP) and 11 (ALLETIKEAVP) residues, were found only in class I sHb and nsHbs. Those two stretches were totally absent in class II pHbs and ptHbs. The nsHbs of A. firma, C. glauca and M. gale also possess those two stretches in their amino acid sequences.

These three distinct motifs of 29, 29 and 11 amino acid residues further point towards the functional similarity of class II nsHb proteins with sHb/LHbs, and of class I nsHb proteins with sHb.


TrHb2_O nol_dh, Mb like Class1 like nsHb Bac_globin, YjbI Bac_globin, YjbI Class2nsHb_LHbs Mot annoMot by Blast Class 1-2_nsHb_LHbs 22 85 no of sites Presence in 32 21 acid length

1.7e-808 21 1.0e-361 68 Class 1-2_nsHb_LHbs 1.1e-883 2.6e-1064 23 70 Class1-2_nsHb_LHbs

nsHb nsHb nsHb nsHb ptHb All pHb Present inPresent E-value Amino

I I pHb Class II 5.5e-1367 29 29 II sHb/LHb, III pHb Class I 2.4e-1227 29 44 VI pHb Class I 4.4e-381 11 40 Class1 like, nsHb Histidi- XI bHb 8.1e-495 15 64 TrHb2_Mt-trHbO-like_O IV No. No. VII VIII ptHb, bHb 1.1e-3009 50 86 TrHb2_Mt-trHbO-like_O, Motif

AIK SFLRDSDVP QLRKAGKVT Motif sequence ALLETIKEAVP WLRHMRAAV AIDETNLFEKLG VLRPMYPEEDLGPAE LGLKFFLKIFEIAPSAKKLF NPKLKPHAMKVFVMTCESAV FTEEQEALVVKSWEAMKKNSA V sHb/LHb, DPEHRAXLWDYLERAAHSLVN X bHb 6.0e-724 21 89 YjbI TrHb2_Mt-trHbO-like_O, EMWSPEMKNAWGEAYDQLVA MQSLQDKASEWSGVAAADAF RLRMFLEQYWGGPRTYSERR GHPRLRMRHAPFPIGPAARDR GGEETFRRLVDRFYERVAADP IX ptHb, bHb 2.6e-1061 21 118 TrHb2_Mt-trHbO-like_O, TLKRLGAVHFKKGVVDEHFEV Table4.12: Motifs identified from actinohaemoglobinsvarious

I also observed that another motif of approximately 21 amino acid residues (TLKRLGAVHFKKGVVDEHFEV) was widely distributed amongst all pHbs, including sHbs, nsHbs and ptHbs. Another two motifs, of approximately 23 (EMWSPEMKNAWGEAYDQLVAAIK) and 21 (FTEEQEALVVKSWEAMKKNSA) amino acid residues, were distributed only in the sHb/LHb and nsHb proteins of plants.

A distinct stretch of 32 amino acid residues (MQSLQDKASEWSGVAAADAFAIDETNLFEKLG) was present only in ptHbs and totally absent in the other pHbs and even in bHbs. However, another two motifs, approximately 50 (RLRMFLEQYWGGPRTYSERRGHPRLRMRHAPFPIGPAARDRWLRHMRAAV) and 21 (GGEETFRRLVDRFYERVAADP) residues long, were present in ptHbs as well as bHbs. Two motifs of approximately 21 (DPEHRAXLWDYLERAAHSLVN) and 15 (VLRPMYPEEDLGPAE) amino acids were present only in bHbs.

Some short stretches, such as EMKPSS (distributed only in class I nsHbs) and VRESTL (distributed in class I and II nsHbs), were also found.

BLAST annotation revealed that motifs 8 and 9 (Table 4.12) possess "Truncated haemoglobin YjbI"-like activity, which is responsible for inorganic ion transport and metabolism (Mori, 1999). This character was found only in ptHbs and bHbs and was totally absent in the other pHbs. This finding suggests that ptHbs share characteristics with bHbs, which makes them distinct from other pHbs.

Careful observation revealed that conservative amino acid changes had occurred within each functional stretch across its sites in the actinoHbs.

4.4.2. Motif based phylogeny:

A binary matrix was constructed by scoring the presence of motifs in pHbs (sHb, nsHb and ptHb) and bHbs (Table 4.13), and similarities were calculated as a similarity coefficient matrix using NTSYSpc2 software (Table 4.14). The similarity between ptHb and plant sHb/nsHb was found to be 35.71%, whereas ptHb showed 50% similarity with bHbs on the basis of the functionally active stretches present


Table 4.13: Distribution of motifs present in plants and actinobacterial haemoglobins (A-K are the motif correspondence to motif number I-XI in Table 4.12) Organism name Type A B C D E F G H I J K Organism name Type A B C D E F G H I J K Organism name Type A B C D E F G H I J K Acidothermus cellulolyticus 11B BHb √ √ G. max LHb-2 √ √ √ √ P. sitchensis NsHb-1 √ √ √ √ Actinoplanes missouriensis NBRC 102363 BHb √ √ √ G. max LHb-2 √ √ √ √ P. sativum LHb-2 √ √ A.missouriensis NBRC 102363 BHb √ √ √ Gordonia polyisoprenivorans VH2, DSM 44266 BHb √ √ Populus tremula × Populus tremuloides PtHb √ √ √ √ Actinosynnema mirum 101, DSM 43827 BHb √ √ G. polyisoprenivorans VH2, DSM 44266 BHb √ √ P. tremula × P. tremuloides NsHb-1 √ √ √ √ √ Alnus firma NsHb-1 √ √ √ √ Gossypium hirsutum NsHb-2 √ √ √ √ Populus trichocarpa PtHb √ √ √ √ Arabidopsis thaliana PtHb √ √ √ √ G. hirsutum NsHb-1 √ √ √ √ √ P. trichocarpa NsHb-1 √ √ √ √ √ A. thaliana NsHb-2 √ √ √ √ Hordeum vulgare PtHb √ √ Pseudonocardia dioxanivorans CB1190 BHb √ √ √ A. thaliana NsHb-1 √ √ √ √ √ H. vulgare NsHb-1 √ √ √ √ √ Psophocarpus tetragonolobus LHb √ √ √ Arthrobacter arilaitensis re117, CIP 108037 BHb √ √ √ Intrasporangium calvum 7KIP, DSM 43043 BHb √ √ √ Pyrus communis NsHb-1 √ √ √ √ Arthrobacter aurescens TC1 BHb √ √ √ Jonesia denitrificans 55134, DSM 20603 BHb √ √ √ Quercus petraea NsHb-1 √ √ √ √ Arthrobacter chlorophenolicus A6 BHb √ √ √ Kineococcus radiotolerans SRS30216 BHb √ √ √ √ Raphanus sativus NsHb-1 √ √ √ √ √ Arthrobacter phenanthrenivorans Sphe3 BHb √ √ √ √ Kitasatospora setae KM-6054, NBRC 14216 BHb √ √ √ Rheum australe NsHb-1 √ √ √ √ A. phenanthrenivorans Sphe3 BHb √ √ √ K. setae KM-6054, NBRC 14216 BHb √ √ Rhodococcus equi 103S BHb √ √ Astragalus sinicus LHb-2 √ √ √ √ Kocuria rhizophila DC2201 BHb √ √ √ Rhodococcus erythropolis PR4 BHb √ Beutenbergia cavernae HKI 0122, DSM 12333 BHb √ √ √ Kribbella flavida IFO 14399, DSM 17836 BHb √ √ Rhodococcus jostii RHA1 BHb √ √ B. 
cavernae HKI 0122, DSM 12333 BHb √ √ √ K. flavida IFO 14399, DSM 17836 BHb √ √ Rhodococcus opacus B4 BHb √ √ Brachybacterium faecium 6-10, DSM 4810 BHb √ √ √ K. flavida IFO 14399, DSM 17836 BHb √ √ √ Ricinus communis PtHb √ √ √ B. faecium 6-10, DSM 4810 BHb √ √ Kytococcus sedentarius 541, DSM 20547 BHb √ √ √ √ R. communis PtHb √ √ √ Brachypodium distachyon PtHb √ √ √ √ Leifsonia xyli xyli CTCB07 BHb √ √ √ R. communis PtHb √ √ √ √ B. distachyon NsHb-1 √ √ √ √ √ Lotus japonicus NsHb-1 √ √ √ √ √ R. communis NsHb-1 √ √ √ √ Brassica napus NsHb-2 √ √ √ √ L. japonicus LHb-2 √ √ √ √ Rothia dentocariosa ATCC 17931 BHb √ √ √ Canavalia lineata PtHb √ √ √ √ L. japonicus LHb-2 √ √ √ √ Rothia mucilaginosa DY-18 BHb √ √ C. lineata LHb-2 √ √ √ √ L. japonicus LHb-2 √ √ √ √ R. mucilaginosa DY-18 BHb √ Candidatus Frankia datiscae Dg1 BHb √ √ Lupinus luteus LHb-2 √ √ √ √ Saccharomonospora viridis P101, DSM 43017 BHb √ C. Frankia datiscae Dg1 BHb √ √ √ Malus hupehensis NsHb-1 √ √ √ √ √ Saccharopolyspora erythraea NRRL 2338 BHb √ √ √ √ Casuarina glauca NsHb-1 √ √ √ √ √ Malus domestica NsHb-1 √ √ √ Salinispora tropica CNB-440 BHb √ √ √ C. glauca SHb-2 √ √ √ √ Medicago sativa NsHb-1 √ √ √ √ √ S. tropica CNB-440 BHb √ √ √ Catenulispora acidiphila ID139908, DSM 44928 BHb √ √ √ M. sativa LHb-2 √ √ √ √ S. tropica CNB-440 BHb √ √ √ C. acidiphila ID139908, DSM 44928 BHb √ √ √ Medicago truncatula PtHb √ √ √ √ Sanguibacter keddieii ST-74, DSM 10542 BHb √ √ Cellulomonas flavigena 134, DSM 20109 BHb √ √ √ √ M. truncatula LHb-2 √ √ √ Segniliparus rotundus CDC 1076, DSM 44985 BHb √ √ Ceratodon purpureus NsHb-1 √ √ √ √ √ M. truncatula LHb-2 √ √ √ √ Selaginella moellendorffii PtHb √ √ √ √ Chamaecrista fasciculata NsHb-1 √ √ √ √ √ Micrococcus luteus NCTC 2665 BHb √ √ S. moellendorffii NsHb-1 √ √ √ √ √ Cichorium intybus × Cichorium endivia NsHb-2 √ √ √ M. luteus NCTC 2665 BHb √ √ Sesbania rostrata LHb-2 √ √ √ √ Citrus unshiu NsHb-1 √ √ √ √ √ Micromonospora aurantiaca ATCC 27029 BHb √ √ S. 
rostrata LHb-2 √ √ √ Clavibacter michiganensis michiganensis NCPPB 382 BHb √ √ √ M. aurantiaca ATCC 27029 BHb √ Solanum lycopersicum NsHb-2 √ √ √ √ Conexibacter woesei ID131577, DSM 14684 BHb √ √ √ M. aurantiaca ATCC 27029 BHb √ √ S. lycopersicum NsHb-1 √ √ √ √ √ Corynebacterium aurimucosum CN-1, ATCC 700975 BHb √ √ √ M. aurantiaca ATCC 27029 BHb √ √ Sorghum bicolor PtHb √ √ √ √ Corynebacterium diphtheriae 241 BHb √ √ √ M. aurantiaca ATCC 27029 BHb √ √ √ Stackebrandtia nassauensis LLR-40K-21, DSM 44728 BHb √ √ √ √ Corynebacterium glutamicum Kalinowski ATCC 13032 BHb √ √ M. aurantiaca ATCC 27029 BHb √ √ √ S. nassauensis LLR-40K-21, DSM 44728 BHb √ √ Corynebacterium jeikeium K411 BHb √ √ √ Mycobacterium gilvum PYR-GCK BHb √ √ Streptomyces avermitilis MA-4680 BHb √ √ √ √ Corynebacterium resistens DSM 45100 BHb √ √ √ M. gilvum PYR-GCK BHb √ √ √ Streptomyces bingchenggensis BCW-1 BHb √ √ √ Corynebacterium ulcerans 0102 BHb √ √ √ √ Mycobacterium intracellulare ATCC 13950 BHb √ √ Streptomyces cattleya DSM 46488 BHb √ √ √ Corynebacterium variabile DSM 44702 BHb √ √ √ M. intracellulare ATCC 13950 BHb √ √ Streptomyces coelicolor A3(2) BHb √ √ Datisca glomerata PtHb √ √ √ M. intracellulare ATCC 13950 BHb √ √ Streptosporangium roseum NI 9100, DSM 43021 BHb √ √ √ Euryale ferox NsHb-2 √ √ √ Mycobacterium leprae Br4923 BHb √ √ √ Thermobifida fusca YX BHb √ √ E. ferox NsHb-1 √ √ √ √ Mycobacterium massiliense GO 06 BHb √ √ Thermobispora bispora R51, DSM 43833 BHb √ √ √ Eutrema halophilum PtHb √ √ √ √ Mycobacterium tuberculosis CCDC5079 BHb √ √ Thermomonospora curvata DSM 43183 BHb √ √ √ E. halophilum NsHb-2 √ √ √ √ M. tuberculosis CCDC5079 BHb √ √ T. curvata DSM 43183 BHb √ √ √ Frankia alni ACN14a BHb √ √ √ Mycobacterium vanbaalenii PYR-1 BHb √ √ Trema orientalis NsHb-1 √ √ √ √ √ F. alni ACN14a BHb √ √ √ M. vanbaalenii PYR-1 BHb √ Trema tomentosa NsHb-1 √ √ √ √ √ F. alni ACN14a BHb √ √ √ √ Myrica gale NsHb-1 √ √ √ √ √ Trema virgata NsHb-1 √ √ √ √ F. 
alni ACN14a BHb √ √ √ √ Nakamurella multipartita Y-104, DSM 44233 BHb Triticum aestivum PtHb √ √ √ √ Frankia sp. CcI3 BHb √ √ √ N. multipartita Y-104, DSM 44233 BHb √ T. aestivum NsHb-1 √ √ √ √ √ F. sp. CcI3 BHb √ √ N. multipartita Y-104, DSM 44233 BHb Tsukamurella paurometabola 33, DSM 20162 BHb √ √ √ √ Frankia sp. CN3 BHb √ √ √ N. multipartita Y-104, DSM 44233 BHb √ Verrucosispora maris AB-18-032 BHb √ √ F. sp. CN3 BHb √ √ √ √ Nocardioides sp. JS614 BHb √ √ √ V. maris AB-18-032 BHb √ F. sp. CN3 BHb √ √ √ Nocardiopsis alba ATCC BAA-2165 BHb √ √ √ V. maris AB-18-032 BHb √ √ √ F. sp. CN3 BHb √ √ √ √ N. alba ATCC BAA-2165 BHb √ √ √ V. maris AB-18-032 BHb √ √ √ √ F. sp. CN3 BHb √ √ √ Nocardiopsis dassonvillei dassonvillei DSM 43111 BHb √ Vicia faba LHb-2 √ √ √ √ Frankia sp. EAN1pec BHb √ √ √ √ N. dassonvillei dassonvillei DSM 43111 BHb √ √ √ Vicia sativa LHb-2 √ √ √ √ F. sp. EAN1pec BHb √ √ √ √ Oryza sativa PtHb √ √ √ Vigna unguiculata LHb-2 √ √ √ √ F. sp. EAN1pec BHb √ √ √ O. sativa NsHb-1 √ √ √ √ √ V. unguiculata LHb-2 √ √ √ √ F. sp. EuI1c BHb √ √ O. sativa NsHb-1 √ √ √ √ √ Vitis vinifera PtHb √ √ Frankia sp. EuI1c BHb √ √ √ √ O. sativa NsHb-1 √ √ √ √ √ V. vinifera NsHb-1 √ √ √ √ √ F. sp. EuI1c BHb √ √ O. sativa NsHb-1 √ √ √ √ √ V. vinifera NsHb-1 √ √ √ √ F. sp. EuI1c BHb √ √ √ √ Parasponia andersonii SHb-1 √ √ √ √ √ Wolffia arrhiza NsHb-1 √ √ √ √ √ F. sp. EuI1c BHb √ √ P. andersonii NsHb-1 √ √ √ √ √ Xylanimonas cellulosilytica XIL07, DSM 15894 BHb √ √ √ √ F. sp. EuI1c BHb √ √ Parasponia rigida NsHb-1 √ √ √ √ √ Zea mays ssp. Mays PtHb √ √ √ Geodermatophilus obscurus G-20, DSM 43160 BHb √ √ √ Phaseolus vulgaris LHb-2 √ √ √ √ Z. mays ssp. Mays NsHb-1 √ √ √ √ Glycine max PtHb √ √ √ √ Physcomitrella patens PtHb √ √ √ Z. mays ssp. Mays NsHb-1 √ √ √ √ √ G. max NsHb-1 √ √ √ √ P. patens PtHb √ √ √ √ Zea mays ssp. Parviglumis NsHb-1 √ √ √ G. max LHb-2 √ √ √ √ P. patens NsHb-1 √ √ √ √ G. max LHb-2 √ √ √ √ Picea sitchensis PtHb √ √ √ √


Table 4.14: Similarity coefficient matrix of motifs present in plant and bacterial haemoglobins (sHb=symbiotic haemoglobin; nsHb=non-symbiotic haemoglobin; ptHb=plant truncated haemoglobin; bHb=actinobacterial haemoglobin)

        sHb     nsHb    ptHb    bHb
sHb     1.00
nsHb    1.00    1.00
ptHb    0.35    0.35    1.00
bHb     0.14    0.14    0.50    1.00

within them.

I carried the analysis forward by constructing a dendrogram based on the similarity coefficient matrix. The resulting dendrogram (Figure 4.10) showed that plant sHbs and plant nsHbs were grouped together, whereas ptHbs grouped separately with bHbs.

Motif finding and annotation supported the physicochemical parameter results, in that class II nsHbs showed a remarkable resemblance to class II sHbs/LHbs, and these properties distinguish them from class I Hbs.

Motif finding also indicated that ptHbs were more similar to bHbs than to other pHbs, and that they may have evolved in the plant system to serve inorganic ion transport and related metabolism in plants.

4.4.3. Amino acid sequence comparison:

Sequence comparison of pHbs along with bHbs yielded some interesting observations. A representative sequence based comparison amongst actinorhizal Hb sequences (A. firma, M. gale, C. glauca and D. glomerata) along with Vitreoscilla Hb and bHbs of F. alni ACN14a is shown in Figure 4.11. Sequence comparison revealed the

Figure 4.10 Motif based phylogeny amongst actinohaemoglobins
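The motif based comparison of section 4.4.2 rests on a presence/absence matrix and a pairwise similarity coefficient. The sketch below illustrates that calculation in plain Python. It is not the NTSYSpc procedure itself: the choice of coefficients (simple matching and Jaccard) is an assumption, since the coefficient actually used is not stated, and the profiles are hypothetical placeholders rather than the scores of Table 4.13.

```python
# Hypothetical presence (1) / absence (0) profiles over eleven motifs (A-K).
# These are illustrative placeholders, NOT the data of Table 4.13.
profiles = {
    "group_1": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "group_2": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "group_3": [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
    "group_4": [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],
}


def simple_matching(a, b):
    """Fraction of motifs whose state (present or absent) agrees in both profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def jaccard(a, b):
    """Shared motifs divided by motifs present in at least one profile."""
    if len(a) != len(b):
        raise ValueError("profiles must be of equal length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0


def similarity_matrix(profiles, coefficient):
    """Pairwise similarity for every ordered pair of groups."""
    return {
        (p, q): coefficient(profiles[p], profiles[q])
        for p in profiles
        for q in profiles
    }


if __name__ == "__main__":
    matrix = similarity_matrix(profiles, simple_matching)
    names = list(profiles)
    for p in names:
        print(p, " ".join(f"{matrix[(p, q)]:.2f}" for q in names))
```

A matrix of this kind can then be passed to any clustering routine (for example UPGMA) to obtain a dendrogram; which clustering method was used for Figure 4.10 is not specified here.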

Figure 4.11: Amino acid sequence of actinorhizal symbiotic, non-symbiotic and truncated haemoglobins along with some selected plant and bacterial haemoglobins (Blue bar indicates conserved apolar residues and orange bar indicates conserved polar residues)

presence of tyrosine at the B10 position in actinorhizal Datisca ptHb as well as in Frankia bHbs, whereas phenylalanine in place of tyrosine-B10 was noted in actinorhizal sHb and nsHbs. It has been suggested that tyrosine-B10 assists haem ligand stabilization in globin proteins of bacterial systems (Smagghe et al., 2006). However, Gardner et al. (2000) reported that in E. coli the O2 dissociation constant increases strongly when tyrosine-B10 is replaced by phenylalanine. Therefore, the actinorhizal nsHb and sHb may have replaced tyrosine-B10 with phenylalanine to raise the O2 dissociation constant, whereas Datisca ptHb did not.

The distal histidine-E7, which is responsible for ligand binding (Gupta et al., 2011), was found to be highly conserved in all studied actinoHbs along with the Hb of Frankia, but was not found in Vitreoscilla Hb. Frey and Kallio (2003) reported the presence of lysine at the F7 position, which was said to be responsible for electron transfer from FADH in flavoHb, but the findings of this study revealed that this F7 lysine was missing in actinoHbs. It was replaced by valine in the actinorhizal nsHbs of A. firma and M. gale, whereas in C. glauca it was replaced by serine in nsHb and isoleucine in sHb. In Datisca ptHb, arginine replaced the F7 lysine, similar to Frankia bHb.

Smerdon et al. (1993) reported that valine at the F7 position gives higher ligand binding affinity and takes part in a single hydrogen bond network that fixes the proximal histidine, whereas serine at the F7 position decreases the ligand affinity. This may indicate that the residue present at the F7 position in actinoHbs governs the stereochemistry of the proximal histidine conserved at the F8 position. Moreover, this association suggests that the actinoHb proteins might not be structurally related to the ferredoxin-NADP+ type reductase and may involve a different mechanism for the reduction of oxidized haem iron to the ferrous form (Karplus et al., 1991).

Phenylalanine in the CD1 region was found to be highly conserved in actinorhizal sHb and nsHb along with other pHbs for reversible oxygen binding (Hargrove et al., 2000), whereas Datisca ptHbs showed replacement of phenylalanine with tyrosine. In Frankia bHb, the replacement was by arginine. It has been reported in various cases that the phenylalanine in the CD1 region was

replaced by other residues such as valine (Honig et al., 1990; Ogata et al., 1986) or a leucyl residue (Keeling et al., 1971), which lowered the affinity of the Hb towards oxygen and made that Hb unstable. So, the replacement of phenylalanine in the CD1 region probably took place to lower the oxygen affinity of actinorhizal ptHb.

The results also revealed conserved polar and apolar residues in the core region of actinoHb proteins. Apolar isoleucine-46 was found to be conserved in all plant sHb/LHbs and nsHbs, whereas in Datisca ptHb and Frankia bHb it was replaced by polar aspartate. Polar serine-49 was found to be conserved only in actinorhizal nsHb.

The polar glutamate-119 residue was found to be conserved in actinorhizal sHb and nsHbs along with other plant nsHbs. Apolar valine-120 was found to be conserved in all pHbs, whereas apolar phenylalanine-123 was found to be conserved only in actinorhizal nsHbs. The residues glutamate-119, valine-120 and phenylalanine-123 were found to be missing in Datisca ptHb and Frankia bHb, as this region is absent in truncated Hbs.

In rice Hb, the presence of polar residues in the core region was reported by Hargrove et al. (2000). This may indicate that the core region of actinorhizal sHb and nsHb proteins is formed by the polar serine and glutamate residues and covered by the apolar side chains of isoleucine-46, valine-120 and phenylalanine-123, which makes electrostatic interaction within the core region possible. However, this type of cover-up arrangement by apolar residues was found to be absent in Datisca ptHb, similar to Frankia bHb, which suggests that the packing of the protein structure in Datisca ptHb is not as good as in other actinorhizal pHbs.

4.5. Homology modeling of different actinorhizal haemoglobins (in-silico):

4.5.1. Template selection and Model Building:

The 3D structures of actinorhizal Hb proteins were predicted by the homology modeling technique to understand their structural details. A position specific iterative BLAST (PSI-BLAST) search found the crystal structure of the Trema tomentosa Hb protein (PDB ID-3QQQ) to be the best template for the A. firma, C. glauca and M. gale class I nsHb proteins. T. tomentosa nsHb is a homodimeric protein consisting of 161


Table 4.15: Template selection of query modeled haemoglobin proteins on the basis of e-value and percentages of sequence similarity

Query protein (accession)         Type   Template               PDB ID   E-value   Similarity
Alnus firma (BAE75956.1)          nsHb   Trema tomentosa        3QQQ     2e-90     83%
Casuarina glauca (CAA37898.1)     nsHb   Trema tomentosa        3QQQ     2e-91     84%
Myrica gale (ABN49927.1)          nsHb   Trema tomentosa        3QQQ     2e-88     80%
Casuarina glauca (P08054.2)       sHb    Lupinus luteus         1GDI     2e-47     54%
Datisca glomerata (CAD33536.1)    ptHb   Arabidopsis thaliana   4CON     2e-94     78%

residues in each chain, with e-values of 2e-96 to 2e-90 and sequence similarities of 80% to 84% with the respective query proteins. Table 4.15 shows the sequence similarity of the query proteins with their templates along with their accession numbers and BLAST e-values.

The Hb protein from Lupinus luteus (PDB ID-1GDI), with an e-value of 2e-47, was found to be 54% similar to C. glauca (P08054.2) sHb. I did not find a fully satisfactory template for this protein, so I also used the threading technique. However, the result of homology modeling was found to be better than the threading result (data not shown). Similarly, A. thaliana (4CON) ptHb (Reeder et al., 2014) was found to be the best match for actinorhizal D. glomerata ptHb. The template was found to be 175 amino acids long and showed 78% sequence similarity with the query protein (Table 4.15).

The modeled structures of actinorhizal sHb (C. glauca), nsHbs (A. firma, C. glauca and M. gale) and ptHb (D. glomerata) were found to be homodimers, with each subunit comprising a haem group as hetero atom containing Fe (iron) at the centre, which is the core functional region of the protein (Figure 4.12).

4.5.2. Validation of the crude modelled structure:

The crude models were validated with various software tools and algorithms to estimate their accuracy. Table 4.16 depicts the detailed information of the actinorhizal Hb


Figure 4.12: 3D structure of A. firma class I non-symbiotic haemoglobin

Table 4.16: Validation of crude actinorhizal haemoglobin protein models by various algorithms

Algorithm                 nsHb          nsHb           nsHb         sHb            ptHb
                          Alnus firma   Casuarina      Myrica gale  Casuarina      Datisca
                                        glauca                      glauca         glomerata
ProSA                     -7.35         -7.72          -7.57        -7.01          -6.62
VERIFY 3D (%)             81            80             81           83             89.41
Errat (%)                 98.83         98.68          97.98        99.34          99.33
Ramachandran plot (%)     94.1          93.4           92.7         92.9           96.3

protein structure validation results. Figure 4.13 (A, B) shows the validation results of the modeled A. firma Hb protein.

Figure 4.13: (A) Ramachandran plot of the A. firma modeled protein; (B) Z-score plot of the modeled A. firma non-symbiotic haemoglobin against Z-scores of structures in the PDB determined by X-ray crystallography (light blue) or NMR spectroscopy (dark blue) with respect to their length. The Z-score of the A. firma non-symbiotic haemoglobin protein is highlighted with a large dot
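Besides the scores in Table 4.16, the validation in this section compares each model with its template through the root mean square deviation (RMSD) of equivalent atoms. The following is a minimal sketch of that quantity, assuming the two coordinate sets have already been superposed and paired atom by atom; the coordinates are hypothetical and the superposition step performed by modelling software is not reproduced.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superposed and that atom i of
    coords_a is the equivalent of atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates in angstrom, for illustration only.
    template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.3, 0.4, 0.0), (4.1, 0.4, 0.0), (7.9, 0.4, 0.0)]
    print(f"C-alpha RMSD: {rmsd(template, model):.2f} angstrom")
```

Restricting the input to C-alpha atoms or to all backbone atoms gives the two kinds of RMSD values quoted for the models.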


The Ramachandran plot (Ramachandran et al., 1963) of the modeled actinorhizal nsHbs of A. firma, M. gale and C. glauca showed that 92.7-94.1% of the total non-glycine and non-proline residues were present in the most favored regions, while C. glauca sHb exhibited 92.9% and D. glomerata ptHb showed 96.3% of the total residues within the most favored regions. The refined models were analyzed by different algorithms to evaluate model quality. The z-score obtained from ProSA analysis specifies the overall quality and the extent to which the total energy of the modeled structure deviates from the energy distribution of random conformations (Wiederstein and Sippl 2007). The modeled nsHbs of A. firma, M. gale and C. glauca, together with C. glauca sHb and D. glomerata ptHb, had z-scores of -7.35, -7.57, -7.72, -7.01 and -6.62 respectively. These results indicate that the protein models were of good quality, and the outcome of the energy plot signifies that the 3D models of the Hb proteins are reliable.

The modeled proteins were further analyzed by VERIFY 3D. This analysis revealed that 80-81% of the residues of the actinorhizal nsHb proteins had an average 3D-1D score >0.2, whereas actinorhizal sHb and ptHb showed 83% and 89.41% of residues with a 3D-1D score >0.2. ERRAT evaluation revealed a quality factor ranging between 97.98% and 99.33% for all studied proteins. The normally accepted range for a high quality model is >50 (Colovos and Yeates, 1993). In the current cases, the ERRAT score was well within the range of high quality models. PROVE, VERIFY 3D and ERRAT results for all modelled proteins illustrated that the overall quality of the models was good enough.

RMSD calculations established that the actinorhizal Hb proteins had a deviation of approximately 0.51-0.53 Å in the Cα atoms and 0.55-0.59 Å in the backbone atoms from the template proteins. This suggests that there was no significant deviation between the template proteins and the modelled proteins. These results imply that the stereochemical properties and quality of the modelled structures were quite consistent.

Modeled protein structures were submitted to the Protein Model DataBase (PMDB) repository for further study and the accession ids generated are given below:


A. firma (nsHb): PM0080995; C. glauca (nsHb): PM0080996; M. gale (nsHb): PM0080997; C. glauca (sHb): PM0080998 and D. glomerata (ptHb): PM0080999.

4.5.3. Characterization of modeled actinorhizal haemoglobin proteins:

To understand the characteristics of the modeled proteins, I considered the monomer of each homodimeric modeled protein structure. The analyses of the actinorhizal modeled proteins are tabulated in Table 4.17.

Fewer helix-helix interactions and the absence of gamma turns made the Casuarina sHb distinct from the studied actinorhizal modeled nsHbs and from Datisca ptHb. Datisca ptHb showed more helix-helix interactions and more gamma turns in its structure compared to the modeled nsHbs. The molecular volumes of sHb and ptHb were found to be 25747 and 30272 Å³ respectively, whereas in the modeled nsHb proteins the molecular volume was found to be 51371-51878 Å³. Similarly, the surface area was found to be comparatively lower in Casuarina sHb and Datisca ptHb, i.e. 5781 and 7479 Å², than in A. firma, M. gale and C. glauca nsHb, whose surface areas were within 10910-11058 Å². A common beta turn was identified in the modeled nsHb protein structures, i.e. Asn57-Leu60 with the sequence NVSL/PL, which was totally absent in the sHb and ptHb protein structures. Structure analysis also revealed that Casuarina sHb does not contain any major channel in its structure. It was reported that dissociation of ligand

Table 4.17: Characteristics of modeled actinorhizal haemoglobin proteins

                           nsHb       nsHb        nsHb      sHb         ptHb
                           A. firma   C. glauca   M. gale   C. glauca   D. glomerata
Helices                    10         11          9         9           11
Helix-Helix interaction    17         16          17        12          24
Beta turn                  3          1           3         5           3
Gamma turn                 1          1           1         -           6
Molecular volume (Å³)      51371      51714       51878     25747       30272
Surface area (Å²)          10993      11058       10910     5781        7479
Sphericity (Ψ)             0.61       0.61        0.62      0.73        0.63
Effective radius (Å)       14.2       14.03       14.26     13.36       12.14
Nest                       2          3           2         1           2
Channel                    2          2           2         -           2
Pockets                    23         29          24        22          31
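The sphericity and effective radius in Table 4.17 can be related to the tabulated molecular volume and surface area. The sketch below assumes the usual definitions, sphericity Ψ = π^(1/3)(6V)^(2/3)/A and effective radius 3V/A; the thesis does not state which formulas its software applied, so these are assumptions, although they are consistent with the C. glauca sHb and D. glomerata ptHb rows of the table.

```python
import math


def sphericity(volume, area):
    """Surface area of the equal-volume sphere divided by the actual surface area."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0) / area


def effective_radius(volume, area):
    """Radius of a sphere having the same volume-to-surface ratio (3V/A)."""
    return 3.0 * volume / area


if __name__ == "__main__":
    # Molecular volume (cubic angstrom) and surface area (square angstrom)
    # as listed in Table 4.17 for two of the modeled proteins.
    models = {
        "C. glauca sHb": (25747.0, 5781.0),
        "D. glomerata ptHb": (30272.0, 7479.0),
    }
    for name, (volume, area) in models.items():
        print(
            f"{name}: sphericity = {sphericity(volume, area):.2f}, "
            f"effective radius = {effective_radius(volume, area):.2f} angstrom"
        )
```

Because both quantities are simple functions of volume and area, they offer a quick consistency check on the tabulated values.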

makes the channel open and allows ion transport (Gupta 2014). The channels of nsHbs were found to be larger than those of the truncated globin protein.

Binding site analysis revealed the presence of various clefts and cavities in the surface of the modeled proteins. These clefts and cavities are very important, as the size of the clefts in a protein surface determines the interaction of the protein with other molecules. The active site is usually characterized by a large and deep cleft with conserved amino acid residues (Laskowski et al., 2005). Structure analysis of actinorhizal Hb proteins revealed the presence of 23, 29, 24, 22 and 31 pockets in A. firma, C. glauca and M. gale nsHb, Casuarina sHb and Datisca ptHb respectively, with varying area and volume. A common nest was identified amongst the modeled nsHb proteins, i.e. Ala87(A), Gly88(A), Lys89(A). These nests are structurally essential for forming the concave depressions that serve as binding sites for an atom or a cluster of atoms (Watson and Milner-White, 2002). Another common nest of His-Gly-Val was identified, though its position was not conserved among actinorhizal pHbs. No DNA binding templates or enzyme active site templates were recognized in those protein structures. The modeled structures also contain a HEM ligand and an OH ligand at positions 172 and 171 respectively. A CL ligand was also found to be present at residue position 170 of the modelled proteins.

4.5.4. Conformational dynamics study:

The nsHbs of A. firma and C. glauca showed a strong resemblance to each other in the atomic displacement plot (Figure 4.14), where the highest value corresponds to the most displaced regions of the subjected protein model. In contrast, the normalized atomic displacement plots of Casuarina sHb and Datisca ptHb clearly show dissimilarities with those of the actinorhizal nsHb proteins. NMA indicated the thermal and vibrational properties of the proteins at the atomic level. The A. firma, C. glauca and M. gale nsHbs had deformation energies of 1717.73, 1701.18 and 1713.89, whereas the Casuarina sHb protein showed a deformation energy of 1779.21 and Datisca ptHb a deformation energy of 1175.93, which was the lowest among the actinorhizal Hb proteins and signifies that this protein can be deformed very easily. The deformation energies also implied that the seventh mode of


Figure 4.14: Normalized atomic displacement plots of actinorhizal haemoglobins (A=A. firma nsHb, B=C. glauca nsHb, C=C. glauca sHb, D=D. glomerata ptHb)

Casuarina sHb had comparatively larger rigid regions than the nsHbs and ptHbs.

Overall B factor analysis revealed very low negative correlations between the computed and observed B factors of the A. firma, C. glauca and M. gale nsHb proteins.

Solvent accessibility analysis of the modelled actinorhizal Hb proteins


Figure 4.15: Solvent accessibility plot of the A. firma non-symbiotic haemoglobin protein

pointed out that the accessible residues were present on the outermost region. Figure 4.15 shows the solvent accessibility plot of the actinorhizal A. firma nsHb protein. It was also found that the A. firma and C. glauca nsHbs had positively charged residues predominantly on their outermost surface. However, the majority of negatively charged residues and polar uncharged residues were present on the outermost surface of Casuarina sHb, and hydrophobic residues were confined to the inner rings of the spiral. Myrica nsHb was also found to possess negatively charged residues on its outer surface, and Datisca ptHb showed both positive and negative charges at its outermost surface. It was also found that the core region was constructed of conserved apolar and polar residues and was hydrophobic in nature.

4.6. Evolutionary trend of plant truncated haemoglobins:

4.6.1. Structural similarity analysis amongst plant and actinobacterial haemoglobins:

To explore the similarity between the 3D structures of the modelled proteins and those of the randomly selected Hb

proteins already present in the PDB, I used the multiple structural alignment algorithm MUSTANG (Konagurthu et al., 2006) to carry out the structural superposition. A Lesk-Hubbard plot was constructed by plotting the number of alpha carbon atoms against their corresponding RMSD values (Figure 4.16). The resulting plot revealed two prominent clusters. One cluster contained all of the pHbs (LHbs, nsHbs and ptHbs) and the other cluster included bacterial, algal and protozoan Hbs. This result may indicate that the core structure of all Hbs is similar and that they may subsequently have altered their pathways, leading to different functionality. To validate this idea, a structure based phylogeny followed by functional divergence analysis was performed.

4.6.2. Structure based phylogenetic analysis:

The resulting phylogenetic tree (Figure 4.17), based on the PDB structures, revealed that ptHbs shared the same clade with bHbs, whereas plant sHbs and nsHbs were placed in a different clade. This finding suggests that ptHbs are structurally different from other pHbs, and that structural modification in the side chain region led to functionality different from that of other pHbs.

4.6.3. Functional divergent analysis:

4.6.3.1. Phylogenetic tree construction on the basis of amino acid sequences:

The phylogenetic tree comprising 217 Hb proteins (both pHbs and bHbs present in the public domain) showed

Figure 4.16: Lesk-Hubbard plot to identify the similarity between plant and bacterial haemoglobins


Figure 4.17: Structure based phylogenetic tree amongst selected actinohaemoglobins

Figure 4.18: The phylogenetic tree comprised both plant and actinobacterial haemoglobins (Grey portions are some actinobacterial haemoglobin proteins which are not considered for study but had taken to construct the phylogenetic tree)

some distinct patterns (Figure 4.18). Plant nsHb class II and sHb grouped in one clade. Plant nsHb class I proteins were found to be very close to the above mentioned clade, which revealed that these three sets of Hbs shared a common ancestor. However, the ptHb group was found to be placed at a distance from the rest of the pHbs. PtHbs and bHbs clustered adjacent to each other, and some ptHb proteins also merged with bHb proteins outside the concerned grouped area of Figure 4.18. Hence, it may be assumed that ptHbs are sequentially more related to bHbs than to the other sets of pHbs.

4.6.3.2. Functional divergent analysis:

In this study, I tried to analyze the reason behind the different functionality among group members of pHbs compared to bHb proteins. I considered five distinct sub-clusters, ptHbs, bHbs, nsHb I, nsHb II and LHbs, from the sequence based phylogenetic tree (Figure 4.18).

Table 4.18: Functional divergent analysis of member proteins of plant haemoglobin in contrast to bacterial haemoglobins (MFE θ=Estimate of θ by the model-free method; MFE z-score=The z-score for the model-free estimate of θ after Fisher's transformation; θ ML=Maximum likelihood estimate of θ; α ML=Maximum likelihood estimate of α)

Clusters to be compared                        ptHb/bHb     ptHb/LHb     ptHb/nsHbI   ptHb/nsHbII

TYPE I: FUNCTIONAL DIVERGENT ANALYSIS
MFE θ                                          0.334903     0.668729     0.846163     0.784613
MFE z-score                                    -1.830687    -3.316707    -3.189672    -3.543235
θ ML                                           0.381600     0.704800     0.673600     0.842400
α ML                                           0.896813     0.905488     0.888218     0.714678

TYPE II: FUNCTIONAL DIVERGENT ANALYSIS
Average no. of amino acid (aa) substitution
among two clusters                             4.390572     4.594138     3.223357     2.870381
Number of conservative residues among
two clusters                                   12.000000    6.000000     6.000000     5.000000
Percentage of conservative amino acid changes  10.000000    10.000000    12.000000    11.000000
Percentage of radical amino acid changes       19.000000    25.000000    23.000000    25.000000
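The type II comparison contrasts the ptHb/bHb pair with the ptHb/pHb pairs taken together. As a small worked illustration, the sketch below averages the tabulated substitution numbers of the three ptHb/pHb comparisons in Table 4.18 and sets the result beside the ptHb/bHb value. Treating the pooled value as an unweighted arithmetic mean is an assumption; the thesis does not say how the pooled figure was obtained.

```python
# Average number of amino acid substitutions between clusters (Table 4.18).
substitutions = {
    "ptHb/bHb": 4.390572,
    "ptHb/LHb": 4.594138,
    "ptHb/nsHbI": 3.223357,
    "ptHb/nsHbII": 2.870381,
}

# Comparisons that pair ptHbs with another plant haemoglobin group.
plant_pairs = ("ptHb/LHb", "ptHb/nsHbI", "ptHb/nsHbII")


def mean(values):
    """Unweighted arithmetic mean of a non-empty sequence."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)


pooled_plant_mean = mean(substitutions[pair] for pair in plant_pairs)

if __name__ == "__main__":
    print(f"ptHb/bHb          : {substitutions['ptHb/bHb']:.2f}")
    print(f"ptHb/pHb (pooled) : {pooled_plant_mean:.2f}")
```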


Pair-wise estimates from the type I and type II functional divergence analyses of pHbs and bHbs are shown in Table 4.18.

In the type I analysis (Table 4.18), p-values <0.05 were obtained from the z-scores with θ>0, which argues against the null hypothesis and signifies that the evolutionary rate of the amino acid residues has shifted between gene clusters.

For the type II analysis I considered four parameters: the number of conservative residues among two clusters, the average number of amino acid (aa) substitutions, radical amino acid changes and conservative amino acid changes. I found that the aa substitution number for ptHb/bHb (4.36) was relatively higher than the average aa substitution number for ptHb/pHb (3.55) (ptHb/pHb includes ptHb/LHb, ptHb/nsHbI and ptHb/nsHbII). Considering the number of conservative residues among the studied clusters, ptHb/bHb showed the highest value (12) compared to the others. Next, the percentages of conservative and radical amino acid changes were considered. This study revealed that the ptHb/bHb cluster showed lower percentages of conservative as well as radical changes than the others. The percentage of conservative changes was highest in ptHb/nsHbI, followed by ptHb/nsHbII and ptHb/LHb. However, in the case of radical changes the value was the same for ptHb/nsHbII and ptHb/LHb, followed by ptHb/nsHbI.

To determine the substitution types among the subgroups, a pair-wise comparison was visualized among the ptHb/bHb, ptHb/LHb, ptHb/nsHbI and ptHb/nsHbII sub-clusters, and all of the clusters were compared separately

Figure 4.19: Substitution rate among plant and bacterial haemoglobin clusters


(Figure 4.19). Both radical and conservative replacements were noticed among group members. The comparison of ptHbs/bHbs produced a lower divergence rate, while a higher divergence rate was found in ptHb/pHb. From the results (Table 4.18), radical replacement was found to be dominant over conservative replacement at an early stage.

In type II functional divergence, gene duplication may occur at an early stage, which leads to radical amino acid changes. This occurrence may suggest that ptHbs were much more similar to bHbs. The functional divergence analysis also suggested protein level functional comparability of ptHbs with bHbs. So, ptHbs probably originated in a bacterial system and their subsequent evolution led to their transfer into the plant system.

4.7. Expression study of haemoglobins of A. nepalensis in different plant region:

4.7.1: Preparation of seedlings for plant infectivity test:

The test showed that 80-90% of the seeds germinated successfully in a BOD incubator at 31°C within 5-7 days (Figure 4.20). It was found from the plant infectivity test that seeds soaked overnight in aerated water gave the best germination result.

4.7.2. Plant infectivity tests:

One month old seedlings were transferred into plant culture containers with different concentrations of Hoagland solution. Hoagland solution (1/6th concentration; at pH-7) gave the best result for growth of A. nepalensis seedlings.

Three sets, i.e. naturally grown mature A. nepalensis (set-I), seedlings inoculated with surface sterilized

Figure 4.20: (A)=Microscopic view of A. nepalensis winged seed; (B)=Germinated A. nepalensis seeds in BOD incubator


Table 4.19: Detail result of nodulation during plant infectivity test

           % of Nodulation   Time required for deformation   Time required for nodule initiation
Set II     60-70             2-3 days                        17-21 days
Set III    Nil               Not applicable                  Not applicable

crushed nodules (set-II) under test, and un-inoculated seedlings (set-III), were used for the infectivity tests. Table 4.19 shows the nodulation results; 4-5 pre-nodules were formed per A. nepalensis seedling, of which only 1-2 developed into complete nodules. Root hair deformation took place within 2-3 days and the nodules were observable with the naked eye at a minimum of 17-21 days after inoculation. A. nepalensis seedlings inoculated with crushed nodules showed 60-70% nodulation in their root hairs. Therefore, the plant infectivity results showed that A. nepalensis seedlings with crushed nodules were good for root hair infection followed by nodulation in a laboratory environment.

Early development of nodules was reported in Rhizobium-induced root nodules of Parasponia rigida (Lancelle et al., 1985), whereas C. cunninghamiana Miq. showed nodulation from 19 to 24 days after inoculation (Torrey, 1978).

The newly developed nodules were brownish in colour and 1-1.5 mm in diameter. On the other hand, negative controls (un-inoculated seedlings) showed stunted growth in comparison to inoculated seedlings, and at no stage of the experiment were nodules formed in them.

4.7.3. Expression study of haemoglobin genes by Real Time-Polymerase Chain Reaction:

A partial mRNA gene sequence of 483 nucleotides, corresponding to 160 amino acid residues followed by a stop codon, was identified from A. nepalensis. The BLAST result revealed that the identified partial gene sequence showed the highest similarity with A. firma nsHb, both at the gene (74%) and at the protein (97%) level. The identified sequence is given below.

Nucleotide sequence:

ATGAACACCCTGGAGGGCAGGG
GCTTCACCGAGGAGCAGGAGGC
CCTGGTGGTGAAGAGCTGGAAC
GCCATGAAGCCCAACGCCGGCG
AGCTGGGCCTGAAGTTCTTCCTG
AAGATCTTCGAGATCGCCCCCAG
CGCCCAGAAGCTGTTCAGCTACC
TGAAGGACAGCCCCATCCCCCTG
GAGAGGAACCCCAAGCTGAAGA
GCCACGCCATGACCGTGTTCCTG
ATGACCTGCGAGAGCGCCGTGCA
GCTGAGGAAGGCCGGCAAGGTG
ACCGTGAGGGAGAGCAGCCTGA
AGAAGCTGGGCGCCGTGCACTTC
AAGCACGGCGTGGTGGACGAGC
ACTACGAGGTGACCAAGTTCGCC
CTGCTGGAGACCATCAAGGAGG
CCGTGCCCGAGATGTGGAGCCCC
GAGATGAAGATCGCCTGGGGCG
AGGCCTACGACCAGCTGGTGGCC
GCCATCAAGAGCGCCATGAAGC
CTTCTTCTTAG

Amino acid sequence:

MNTLEGRGFTEEQEALVVKSWNA
MKPNAGELGLKFFLKIFEIAPSAQK
LFSYLKDSPIPLERNPKLKSHAMT
VFLMTCESAVQLRKAGKVTVRES
SLKKLGAVHFKHGVVDEHYEVTK
FALLETIKEAVPEMWSPEMKIAWG
EAYDQLVAAIKSAMKPSS

The result of the Real Time-Polymerase Chain Reaction (RT-PCR) is shown in Figure 4.21, which revealed that the relative amount of transcript obtained in different plant parts is extremely variable. It was found that the expression level of the nsHb gene was highest in nodules compared to other plant parts in sets I and II. The relative amount of

A B C

Figure 4.21: Expression of A. nepalensis non-symbiotic haemoglobin in different plant parts (the numbers on the bar indicate the amounts of relative abundance in respect to stem); (A): The amount of transcript estimated by quantitative RT-PCR in different plant parts of naturally occurring mature A. nepalensis; (B): The amount of transcript of young A. nepalensis, after 15 months of inoculation with crushed nodules of wild mature A. nepalensis; (C): Amount of transcript of 15 months old un- inoculated A. nepalensis

RESULTS AND DISCUSSION 118 transcript of Hb (in set I), in naturally case is almost two fold than that of the occurring mature A. nepalensis nodules set II and even higher than naturally and root region was enhanced occurring plant roots. significantly, up to approximately 310 High expression of Hb in nodule is times and 37 times respectively than quiet natural since nodules are the main that of stem region. However no sights where Frankia recites and it was significant enhancement of Hbs in leaf said that nsHb helps, the interaction of region was found (7 times relative to Frankia in root-nodules (Sasakura et stem region). Similarly in set II, al., 2006). relative amount of transcript of Hb Functional annotation revealed that Hb enhanced significantly in nodules of F. alni contain NOD factor which followed by root region, i.e. subsequently develop a unique approximately 220 times and 24 times pathway to synthesise chitine based respectively than that of Hb gene signal molecule and help in the transcript of stem region, but no interaction of host and microsymbiont significant enhancement in leaf region to facilitate symbiosis. was found. Since in the set III, it was not In set III, since the seedlings were un- inoculated with any Frankia source, inoculated with crushed nodules, no the expression of nsHb in the root is nodules were formed, hence I could not higher since they may vigorously measure the expression of Hb in this searching for their microsymbiont. case. However significantly the expression level of Hbs in root in this
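The nucleotide and amino acid sequences reported in section 4.7.3 can be cross-checked against each other with a short script. The sketch below is not part of the original analysis. It assumes the standard genetic code, and the names `CODON_TABLE`, `NSHB_CDS`, `REPORTED_PROTEIN` and `translate` are illustrative choices made here, not identifiers from the thesis workflow.

```python
# Illustrative check (an assumption-laden sketch, not the thesis pipeline):
# translate the partial A. nepalensis nsHb coding sequence with the standard
# genetic code and compare it with the amino acid sequence given in 4.7.3.

BASES = "TCAG"
# Standard genetic code in TCAG order for first, second and third base.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Nucleotide sequence exactly as printed in section 4.7.3 (chunks joined).
NSHB_CDS = (
    "ATGAACACCCTGGAGGGCAGGG" "GCTTCACCGAGGAGCAGGAGGC" "CCTGGTGGTGAAGAGCTGGAAC"
    "GCCATGAAGCCCAACGCCGGCG" "AGCTGGGCCTGAAGTTCTTCCTG" "AAGATCTTCGAGATCGCCCCCAG"
    "CGCCCAGAAGCTGTTCAGCTACC" "TGAAGGACAGCCCCATCCCCCTG" "GAGAGGAACCCCAAGCTGAAGA"
    "GCCACGCCATGACCGTGTTCCTG" "ATGACCTGCGAGAGCGCCGTGCA" "GCTGAGGAAGGCCGGCAAGGTG"
    "ACCGTGAGGGAGAGCAGCCTGA" "AGAAGCTGGGCGCCGTGCACTTC" "AAGCACGGCGTGGTGGACGAGC"
    "ACTACGAGGTGACCAAGTTCGCC" "CTGCTGGAGACCATCAAGGAGG" "CCGTGCCCGAGATGTGGAGCCCC"
    "GAGATGAAGATCGCCTGGGGCG" "AGGCCTACGACCAGCTGGTGGCC" "GCCATCAAGAGCGCCATGAAGC"
    "CTTCTTCTTAG"
)

# Amino acid sequence exactly as printed in section 4.7.3 (chunks joined).
REPORTED_PROTEIN = (
    "MNTLEGRGFTEEQEALVVKSWNA" "MKPNAGELGLKFFLKIFEIAPSAQK"
    "LFSYLKDSPIPLERNPKLKSHAMT" "VFLMTCESAVQLRKAGKVTVRES"
    "SLKKLGAVHFKHGVVDEHYEVTK" "FALLETIKEAVPEMWSPEMKIAWG"
    "EAYDQLVAAIKSAMKPSS"
)


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    protein = translate(NSHB_CDS)
    print("nucleotides:", len(NSHB_CDS))
    print("codons:", len(protein))
    print("ends with stop codon:", protein.endswith("*"))
    print("matches reported peptide:", protein.rstrip("*") == REPORTED_PROTEIN)
```

Read this way, the 483 nucleotides form 161 codons, the last of which is the stop codon TAG, so the translated peptide has 160 residues and agrees with the printed amino acid sequence. The figure of 161 quoted in the text therefore appears to count the stop codon.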

CONCLUSION Conclusion

I started the present work in the spring of 2013. My prime idea was to study Alnus nepalensis, a plant that forms a symbiotic association with Frankia and in turn increases the fertility of the soil. I was particularly interested because, having grown up in the sub-Himalayan region of West Bengal, I had always noticed that A. nepalensis is one of the important species that arrives early in landslide regions. Alnus is therefore an excellent example of a successional plant. My particular interest, however, was the haemoglobin (Hb) management of Alnus. From the beginning I wondered how the Alnus-Frankia symbiosis manages to protect nitrogenase from oxygen, since nitrogenase is known to be oxygen labile, and whether the Hb present in the plant and the bacterium plays any role in this regard.

Here I studied the Hb genes and proteins with respect to the Alnus-Frankia symbiosis, and the conclusions derived are given below:

I found that A. nepalensis is distributed within a geographical boundary of 26°87′78″ to 27°34′71″N and 88°25′00″ to 88°61′76″E, at altitudes ranging from 3616 ft to 7598 ft, in sub-Himalayan West Bengal and Sikkim. Each micro-ecological factor was found to be equally important for proper growth of this plant in fragile landslide areas, specifically 19% to 31% relative humidity and acidic soil conditions. Study of the Alnus-Frankia symbiosis showed that it can exist under extremely variable soil nutrient conditions but favours moderate to high soil carbon.

The population genetics study revealed that the entire population of A. nepalensis in sub-Himalayan West Bengal and Sikkim clustered according to geographical distribution. The A. nepalensis plants within the selected area were found to be genetically close to each other, with minor exceptions. A. nepalensis collected from population I showed a high, and population III a low, percentage of polymorphism among the study locations.

It was also found that A. nepalensis collected from the eastern part (population I) of the Darjeeling hills was genetically more similar to that from the western part (population II) of the Darjeeling hills, while genetically distant from that of the Kalimpong and Gangtok hills (population III), which are situated on the other side of the river Teesta. The Teesta separates the collection sites of populations I and II from those of population III, keeping populations I and II on its right bank and population III on its left bank, and may act as a geographical barrier to the dispersal of A. nepalensis germplasm in the studied region.

In silico analysis of the physicochemical properties and functionally active motif annotation revealed that class II non-symbiotic Hbs (nsHbs) showed a remarkable resemblance to class II symbiotic Hb (sHb)/legHbs (Lhbs), and these properties distinguish them from class I Hbs and ptHbs. Results also showed that actinorhizal Hbs (Alnus firma, Casuarina glauca, Myrica gale and Datisca glomerata) followed the general trend of other plant Hbs (pHbs), whereas the nsHb of M. gale had a higher oxygen binding capacity than other actinorhizal class I nsHbs.

Sequence-based analysis of actinoHbs (actinorhizal pHbs and actinobacterial Hbs (bHbs)) revealed that Datisca ptHb shared two functionally active stretches with bHbs, consisting of 50 and 21 amino acids, with “YjbI”-like activity, which is responsible for inorganic ion transport and metabolism. This unique character was found only in ptHbs, as in bHbs, and was totally absent in other pHbs. Sequence comparison also revealed that the residue present at the F7 position in actinoHbs governs the stereochemistry of the proximal histidine conserved at the F8 position. Moreover, this association revealed that the actinoHb proteins might be structurally unrelated to the ferredoxin-NADP+ type reductase and ion transport mechanism, and instead involved in a different mechanism for reducing oxidized haem iron to the ferrous form. This finding may suggest that actinorhizal pHbs are associated with a mechanism that is not related to inorganic ion transport, and that, to supply this property, ptHbs may have evolved into the plant system from bacteria.

Sequence comparison of actinoHbs also revealed that a single amino acid replacement of tyrosine-B10 by phenylalanine is responsible for the higher O2 dissociation constant in actinorhizal nsHb and sHb than in ptHbs and bHbs, and that replacement of phenylalanine in the CD1 region lowers the oxygen affinity in actinorhizal ptHb.

Codon usage properties revealed that Frankia Hbs are codon biased and expressed at a moderate to high level, depending upon 1) GC compositional constraints and 2) natural selection on their translational efficiency, whereas functional annotation revealed that the different functions associated with bHbs within the same genera depend on 1) their host specificity and 2) their eco-geographical habitat.

Homology modelling revealed that the protein structures of actinorhizal Hbs (sHb, nsHb and ptHb) are homodimeric, with a heme cluster of hetero atoms in the centre of each monomer. The core region of actinorhizal sHb and nsHb proteins was formed by polar serine and glutamate residues and covered by the apolar side-chain residues isoleucine-46, valine-120 and phenylalanine-123, which made electrostatic interaction possible within the core region. However, this type of cover-up arrangement by apolar residues was found to be absent in Datisca ptHb, as in Frankia bHb, which suggests that the packing of the protein structure in Datisca ptHb is not as good as in other actinorhizal pHbs (sHb, nsHbs).

Structure-based analysis revealed a common nest among the modelled nsHb proteins, i.e. Ala87(A), Gly88(A), Lys89(A), which substantiated their functionality and made the protein stable, whereas conformational dynamics revealed that Casuarina sHb and Datisca ptHb show dissimilarities with actinorhizal nsHb proteins. Datisca ptHb showed the lowest deformation energy among the actinorhizal Hb proteins, signifying that the protein can be deformed very easily, and therefore confirming the result obtained from the physicochemical data that the Datisca ptHb protein is less stable than other actinorhizal Hbs.

Phylogenetic analysis based on structural elucidation, along with motif analysis and functional divergence analysis, indicated that plant truncated Hbs (ptHbs) showed a lower divergence rate with bHbs than other pHbs did. The evolution of ptHbs might have taken place to bring inorganic ion transport and related metabolism into the plant system.

A partial mRNA Hb gene from A. nepalensis was identified, which showed 96% similarity with the class I nsHb of A. firma. The expression study of Hb genes showed that Hb is expressed at an elevated level in nodules when the plant is inoculated with Frankia, but the expression level is significantly high in the root region of untreated plants. This might be due to the plant's search for its microsymbiont Frankia for interaction.

Therefore, from this study I may conclude that by improving the efficiency of Hb genes in both the host plant and its microsymbiont, the global nitrogen balance may be improved, which will in turn benefit society and ecological wellbeing.

BIBLIOGRAPHY Bibliography

Agarwal, G., Rajavel, M., Gopal, B., & Srinivasan, N. (2009). Structure-based phylogeny as a diagnostic for functional characterization of proteins with a cupin fold. PloS one, 4(5), e5736.
Ahmad, S., Gromiha, M., Fawareh, H., & Sarai, A. (2004). ASAView: database and tool for solvent accessibility representation in proteins. BMC bioinformatics, 5(1), 51.
Akkermans, A. D. L., & Hirsch, A. M. (1997). A reconsideration of terminology in Frankia research: a need for congruence. Physiologia Plantarum, 99(4), 574-578.
Alexandrov, V., Lehnert, U., Echols, N., Milburn, D., Engelman, D., & Gerstein, M. (2005). Normal modes for predicting protein motions: a comprehensive database assessment and associated Web tool. Protein science, 14(3), 633-643.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research, 25(17), 3389-3402.
Anderson, C. R., Jensen, E. O., Llewellyn, D. J., Dennis, E. S., & Peacock, W. J. (1996). A new hemoglobin gene from soybean: a role for hemoglobin in all plants. Proceedings of the National Academy of Sciences, 93(12), 5682-5687.
Appleby, C. A., Tjepkema, J. D., & Trinick, M. J. (1983). Hemoglobin in a nonleguminous plant, Parasponia: possible genetic origin and function in nitrogen fixation. Science, 220(4600), 951-953.
Appleby, C. A. (1984). Leghemoglobin and Rhizobium respiration. Annual review of plant physiology, 35(1), 443-478.
Appleby, C. A., Bogusz, D., Dennis, E. S., & Peacock, W. J. (1988). A role for haemoglobin in all plant roots?. Plant, Cell & Environment, 11(5), 359-367.
Appleby, C. A. (1992). The origin and functions of haemoglobin in plants. Science Progress (1933-), 365-398.
Arechaga-Ocampo, E., Saenz-Rivera, J., Sarath, G., Klucas, R. V., & Arredondo-Peter, R. (2001). Cloning and expression analysis of hemoglobin genes from maize (Zea mays ssp. mays) and teosinte (Zea mays ssp. parviglumis). Biochimica et Biophysica Acta (BBA)-Gene Structure and Expression, 1522(1), 1-8.
Arredondo-Peter, R., Hargrove, M. S., Sarath, G., Moran, J. F., Lohrman, J., Olson, J. S., & Klucas, R. V. (1997). Rice hemoglobins (gene cloning, analysis, and O2-binding kinetics of a recombinant protein synthesized in Escherichia coli). Plant Physiology, 115(3), 1259-1266.
Arredondo-Peter, R. (2011). Evolutionary rates of land plant hemoglobins at the protein level. Global J Biochem, 2(2), 81-95.
Ascenzi, P., Bolognesi, M., Milani, M., Guertin, M., & Visca, P. (2007). Mycobacterial truncated hemoglobins: from genes to functions. Gene, 398(1), 42-51.
Ascenzi, P., Ciaccio, C., Gasperi, T., Pesce, A., Caporaso, L., & Coletta, M. (2017). Hydroxylamine-induced oxidation of ferrous carbonylated truncated hemoglobins from Mycobacterium tuberculosis and Campylobacter jejuni is limited by carbon monoxide dissociation. JBIC Journal of Biological Inorganic Chemistry, 1-10.
Bailey, T. L., Johnson, J., Grant, C. E., & Noble, W. S. (2015). The MEME suite. Nucleic acids research, 43(W1), W39-W49.
Basistha, B. C., Sharma, N. P., Lepcha, L., Arrawatia, M. L., & Sen, A. (2010). Ecology of Hippophae salicifolia D. Don of temperate and sub-alpine forests of North Sikkim Himalayas: a case study. Symbiosis, 50(1-2), 87-95.
Baudouin, E. (2011). The language of nitric oxide signalling. Plant Biology, 13(2), 233-242.
Becana, M., Dalton, D. A., Moran, J. F., Iturbe-Ormaetxe, I., Matamoros, M. A., & Rubio, M. C. (2000). Reactive oxygen species and antioxidants in legume nodules. Physiologia plantarum, 109(4), 372-381.

Beckwith, J., Tjepkema, J. D., Cashon, R. E., Schwintzer, C. R., & Tisa, L. S. (2002). Hemoglobin in five genetically diverse Frankia strains. Canadian journal of microbiology, 48(12), 1048-1055.
Benoit, L. F., & Berry, A. M. (1990). Methods for production and use of actinorhizal plants in forestry, low maintenance landscapes, and revegetation. The Biology of Frankia and Actinorhizal Plants. Eds. CR Schwintzer and JD Tjepkema, 281-297.
Benson, D. R., & Silvester, W. B. (1993). Biology of Frankia strains, actinomycete symbionts of actinorhizal plants. Microbiological reviews, 57(2), 293-319.
Benson, D. R., & Clawson, M. L. (2000). Evolution of the actinorhizal plant symbiosis. Prokaryotic nitrogen fixation: a model system for the analysis of a biological process, 207-224.
Becking, J. H., de Boer, W. E., & Houwink, A. L. (1964). Electron microscopy of the endophyte of Alnus glutinosa. Antonie van Leeuwenhoek, 30(1), 343-376.
Berg, R. H., & McDowell, L. (1987). Endophyte differentiation in Casuarina actinorhizae. Protoplasma, 136(2), 104-117.
Bargali, K. (2011). Actinorhizal plants of Kumaun Himalaya and their ecological significance. African Journal of Plant Science, 5(7), 401-406.
Berry, A. M., & Torrey, J. G. (1983). Root hair deformation in the infection process of Alnus rubra. Canadian journal of botany, 61(11), 2863-2876.
Berry, A. M., Mendoza-Herrera, A., Guo, Y. Y., Hayashi, J., Persson, T., Barabote, R., Demchenko, K., Zhang, S., & Pawlowski, K. (2011). New perspectives on nodule nitrogen assimilation in actinorhizal symbioses. Functional Plant Biology, 38(9), 645-652.
Bhattacharya, S., Sen, A., Thakur, S., & Tisa, L. S. (2013). Characterization of haemoglobin from actinorhizal plants–an in-silico approach. Journal of biosciences, 38(4), 777-787.
Bidon-Chanal, A., Martí, M. A., Estrin, D. A., & Luque, F. J. (2007). Dynamical regulation of ligand migration by a gate-opening molecular switch in truncated hemoglobin-N from Mycobacterium tuberculosis. Journal of the American Chemical Society, 129(21), 6782-6788.
Bogusz, D., Appleby, C. A., Landsmann, J., Dennis, E. S., Trinick, M. J., & Peacock, W. J. (1988). Functioning haemoglobin genes in non-nodulating plants. Nature, 331(6152), 178-180.
Bogusz, D., Llewellyn, D. J., Craig, S., Dennis, E. S., Appleby, C. A., & Peacock, W. J. (1990). Nonlegume hemoglobin genes retain organ-specific expression in heterologous transgenic plants. The Plant Cell, 2(7), 633-641.
Bond, G. (1983). Taxonomy and distribution of non-legume nitrogen-fixing systems. Biological nitrogen fixation in forest ecosystems: foundations and applications, 9, 55.
Bose, D., & Sen, A. (2006). Isolation and heavy metal resistance pattern of Frankia from Casuarina equisetifolia nodules. Indian Journal of Microbiology, 46(1), 9.
Bottomley, W. B. (1912). The root-nodules of Myrica gale. Annals of Botany, 26(101), 111-117.
Botzman, M., & Margalit, H. (2011). Variation in global codon usage bias among prokaryotic organisms is associated with their lifestyles. Genome biology, 12(10), R109.
Boyd, E. S., Hamilton, T. L., & Peters, J. W. (2011). An alternative path for the evolution of biological nitrogen fixation. Frontiers in microbiology, 2.
Breitling, R., Laubner, D., & Adamski, J. (2001). Structure-based phylogenetic analysis of short-chain alcohol dehydrogenases and reclassification of the 17beta-hydroxysteroid dehydrogenase family. Molecular biology and evolution, 18(12), 2154-2161.
Bremner, J. M. (1996). Nitrogen-total. Methods of Soil Analysis Part 3: Chemical Methods, (methodsofsoilan3), 1085-1121.
Brooks, B., & Karplus, M. (1983). Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proceedings of the National Academy of Sciences, 80(21), 6571-6575.
Browne, W. J., North, A. C. T., Phillips, D. C., Brew, K., Vanaman, T. C., & Hill, R. L. (1969). A possible three-dimensional structure of bovine α-lactalbumin based on that of hen's egg-white lysozyme. Journal of molecular biology, 42(1), 65-86.
Brunchorst, J. Ueber einige Wurzelanschwellungen, besonders diejenigen von Alnus und den Elaeagnaceen. Unters. a. d. bot. Inst. z. Tübingen, Bd. 2, 151-177.

Bruno, S., Faggiano, S., Spyrakis, F., Mozzarelli, A., Cacciatori, E., Dominici, P., Grandi, E., Abbruzzetti, S., & Viappiani, C. (2007). Different roles of protein dynamics and ligand migration in non-symbiotic hemoglobins AHb1 and AHb2 from Arabidopsis thaliana. Gene, 398(1), 224-233.
Brunchorst, T. (1885). Ueber die Knöllchen an den Leguminosen Wurzeln, Ber. d. deutsch. Bot. Gesellsch, 3, 241.
Burris, R. H., & Haas, E. (1944). The red pigment of leguminous root nodules. Journal of Biological Chemistry, 155(1), 227-229.
Burmester, T., Ebner, B., Weich, B., & Hankeln, T. (2002). Cytoglobin: a novel globin type ubiquitously expressed in vertebrate tissues. Molecular biology and evolution, 19(4), 416-421.
Bykova, N. V., Igamberdiev, A. U., Ens, W., & Hill, R. D. (2006). Identification of an intermolecular disulfide bond in barley hemoglobin. Biochemical and biophysical research communications, 347(1), 301-309.
Callaham, D., Deltredici, P., & Torrey, J. G. (1978). Isolation and cultivation in vitro of the actinomycete causing root nodulation in Comptonia. Science, 199(4331), 899-902.
Callaham, D., Newcomb, W., Torrey, J. G., & Peterson, R. L. (1979). Root hair infection in actinomycete-induced root nodule initiation in Casuarina, Myrica, and Comptonia. Botanical Gazette, 140, S1-S9.
Cantrel, C., Vazquez, T., Puyaubert, J., Rezé, N., Lesch, M., Kaiser, W. M., Dutilleul, C., Guillas, I., Zachowski, A., & Baudouin, E. (2011). Nitric oxide participates in cold-responsive phosphosphingolipid formation and gene expression in Arabidopsis thaliana. New Phytologist, 189(2), 415-427.
Case, D. A., & Karplus, M. (1979). Dynamics of ligand binding to heme proteins. Journal of molecular biology, 132(3), 343-368.
Centeno, N. B., Planas-Iglesias, J., & Oliva, B. (2005). Comparative modelling of protein structure and its impact on microbial cell factories. Microbial cell factories, 4(1), 20.
Ceremonie, H., Debellé, F., & Fernandez, M. P. (1999). Structural and functional comparison of Frankia root hair deforming factor and rhizobia Nod factor. Canadian Journal of Botany, 77(9), 1293-1301.
Chothia, C., Lesk, A. M., Levitt, M., Amit, A. G., Mariuzza, R. A., Phillips, S. E., & Poljak, R. J. (1986). The predicted structure of immunoglobulin D1.3 and its comparison with the crystal structure. Science, 233(4765), 755-758.
Ciaccio, C., Ocaña-Calahorro, F., Droghetti, E., Tundo, G. R., Sanz-Luque, E., Polticelli, F., Visca, P., Smulevich, G., Ascenzi, P., & Coletta, M. (2015). Functional and spectroscopic characterization of Chlamydomonas reinhardtii truncated hemoglobins. PloS one, 10(5), e0125005.
Clawson, M. L., Bourret, A., & Benson, D. R. (2004). Assessing the phylogeny of Frankia-actinorhizal plant nitrogen-fixing root nodule symbioses with Frankia 16S rRNA and glutamine synthetase gene sequences. Molecular phylogenetics and evolution, 31(1), 131-138.
Colovos, C., & Yeates, T. O. (1993). Verification of protein structures: patterns of nonbonded atomic interactions. Protein science, 2(9), 1511-1519.
Couture, M., Chamberland, H., St-Pierre, B., Lafontaine, J., & Guertin, M. (1994). Nuclear genes encoding chloroplast hemoglobins in the unicellular green alga Chlamydomonas eugametos. Molecular and General Genetics MGG, 243(2), 185-197.
Couture, M., Das, T. K., Lee, H. C., Peisach, J., Rousseau, D. L., Wittenberg, B. A., Wittenberg, J. B., & Guertin, M. (1999). Chlamydomonas Chloroplast Ferrous Hemoglobin heme pocket structure and reactions with ligands. Journal of Biological Chemistry, 274(11), 6898-6910.
D'Alessio, G. (1999). The evolutionary transition from monomeric to oligomeric proteins: tools, the environment, hypotheses. Progress in biophysics and molecular biology, 72(3), 271-298.
Das, A. P., & Ghosh, C. (2011). Plant wealth of Darjiling and Sikkim Himalayas vis-a-vis conservation. NBU J. Pl. Sci, 5(1), 25-33.
Das, T. K., Lee, H. C., Duff, S. M., Hill, R. D., Peisach, J., Rousseau, D. L., Wittenberg, B. A., & Wittenberg, J. B. (1999). The heme environment in barley hemoglobin. Journal of Biological Chemistry, 274(7), 4207-4212.
Dawson, J. O. (1990). Interactions among actinorhizal and associated plant species. The biology of Frankia and actinorhizal plants, 299-316.
Den Camp, R. O., Streng, A., De Mita, S., Cao, Q., Polone, E., Liu, W., Ammiraju, J. S., Kudrna, D., Wing, R., Untergasser, A., & Bisseling, T. (2011). LysM-type mycorrhizal receptor recruited for rhizobium symbiosis in nonlegume Parasponia. Science, 331(6019), 909-912.
Dermitzakis, E. T., & Clark, A. G. (2001). Differential selection after duplication in mammalian developmental genes. Molecular Biology and Evolution, 18(4), 557-562.
Diem, H. G., Gauthier, D., & Dommergues, Y. R. (1982). Isolation of Frankia from nodules of Casuarina equisetifolia. Canadian Journal of Microbiology, 28(5), 526-530.
Diem, H. G., & Dommergues, Y. (1983). The isolation of Frankia from nodules of Casuarina. Canadian journal of botany, 61(11), 2822-2825.
Diem, H. G., & Dommergues, Y. R. (1990). Current and potential uses and management of Casuarinaceae in the tropics and subtropics. The biology of Frankia and actinorhizal plants, 317-342.
Dikshit, K. L., Dikshit, R. P., & Webster, D. A. (1990). Study of Vitreoscilla globin (vgb) gene expression and promoter activity in E. coli through transcriptional fusion. Nucleic Acids Research, 18(14), 4149-4155.
Dordas, C., Hasinoff, B. B., Igamberdiev, A. U., Manac'h, N., Rivoal, J., & Hill, R. D. (2003a). Expression of a stress-induced hemoglobin affects NO levels produced by alfalfa root cultures under hypoxic stress. The Plant Journal, 35(6), 763-770.
Dordas, C., Rivoal, J., & Hill, R. D. (2003b). Plant haemoglobins, nitric oxide and hypoxic stress. Annals of Botany, 91(2), 173-178.
Dordas, C., Hasinoff, B. B., Rivoal, J., & Hill, R. D. (2004). Class-1 hemoglobins, nitrate and NO levels in anoxic maize cell-suspension cultures. Planta, 219(1), 66-72.
Doyle, J. J., & Doyle, J. L. (1987). CTAB DNA extraction in plants. Phytochemical Bulletin, 19, 11-15.
Doyle, J. J. (1998). Phylogenetic perspectives on nodulation: evolving views of plants and symbiotic bacteria. Trends in Plant Science, 3(12), 473-478.
Dundas, J., Ouyang, Z., Tseng, J., Binkowski, A., Turpaz, Y., & Liang, J. (2006). CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic acids research, 34(suppl_2), W116-W118.
Duff, S. M., Wittenberg, J. B., & Hill, R. D. (1997). Expression, purification, and properties of recombinant barley (Hordeum sp.) hemoglobin optical spectra and reactions with gaseous ligands. Journal of Biological Chemistry, 272(27), 16746-16752.
Eisenberg, D., Lüthy, R., & Bowie, J. U. (1997). [20] VERIFY3D: Assessment of protein models with three-dimensional profiles. Methods in enzymology, 277, 396-404.
Fago, A., Hundahl, C., Malte, H., & Weber, R. E. (2004). Functional properties of neuroglobin and cytoglobin. Insights into the ancestral physiological roles of globins. IUBMB life, 56(11-12), 689-696.
Felsenstein, J. (1989). PHYLIP 3.2 manual. University of California Herbarium, Berkeley.
Fiuczek, M. (1959). Wiazanie azotu atmosferycznego w czystych kulturach Streptomyces alni. Acta Microbiol. Polonica, 8, 283-287.
Franche, C., Laplaze, L., Duhoux, E., & Bogusz, D. (1998). Actinorhizal symbioses: recent advances in plant molecular and genetic transformation studies. Critical Reviews in Plant Sciences, 17(1), 1-28.
Freitas, T. A. K., Saito, J. A., Hou, S., & Alam, M. (2005). Globin-coupled sensors, protoglobins, and the last universal common ancestor. Journal of inorganic biochemistry, 99(1), 23-33.
Frey, A. D., & Kallio, P. T. (2003). Bacterial hemoglobins and flavohemoglobins: versatile proteins and their impact on microbiology and biotechnology. FEMS microbiology reviews, 27(4), 525-545.
Froussart, E., Bonneau, J., Franche, C., & Bogusz, D. (2016). Recent advances in actinorhizal symbiosis signaling. Plant molecular biology, 90(6), 613-622.
Garau, G., Di Guilmi, A. M., & Hall, B. G. (2005). Structure-based phylogeny of the metallo-β-lactamases. Antimicrobial agents and chemotherapy, 49(7), 2778-2784.
Gardner, A. M., Martin, L. A., Gardner, P. R., Dou, Y., & Olson, J. S. (2000). Steady-state and transient kinetics of Escherichia coli nitric-oxide dioxygenase (flavohemoglobin) The B10 tyrosine hydroxyl is essential for dioxygen binding and catalysis. Journal of Biological Chemistry, 275(17), 12581-12589.
Garrocho-Villegas, V., Gopalasubramaniam, S. K., & Arredondo-Peter, R. (2007). Plant hemoglobins: what we know six decades after their discovery. Gene, 398(1), 78-85.
Garrocho-Villegas, V., & Arredondo-Peter, R. (2008). Molecular cloning and characterization of a moss (Ceratodon purpureus) nonsymbiotic hemoglobin provides insight into the early evolution of plant nonsymbiotic hemoglobins. Molecular biology and evolution, 25(7), 1482-1487.
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S. E., Wilkins, M. R., Appel, R. D., & Bairoch, A. (2005). Protein identification and analysis tools on the ExPASy server (pp. 571-607). Humana Press.
Gherbi, H., Markmann, K., Svistoonoff, S., Estevan, J., Autran, D., Giczey, G., Auguy, F., Peret, B., Laplaze, L., Franche, C., & Parniske, M. (2008). SymRK defines a common genetic basis for plant root endosymbioses with arbuscular mycorrhiza fungi, rhizobia, and Frankia bacteria. Proceedings of the National Academy of Sciences, 105(12), 4928-4932.
Goodman, M., Pedwaydon, J., Czelusniak, J., Suzuki, T., Gotoh, T., Moens, L., Shishikura, F., Walz, D., & Vinogradov, S. (1988). An evolutionary tree for invertebrate globin sequences. Journal of molecular evolution, 27(3), 236-249.
Gopalasubramaniam, S. K., Kovacs, F., Violante-Mota, F., Twigg, P., Arredondo-Peter, R., & Sarath, G. (2008). Cloning and characterization of a caesalpinoid (Chamaecrista fasciculata) hemoglobin: the structural transition from a nonsymbiotic hemoglobin to a leghemoglobin. Proteins: Structure, Function, and Bioinformatics, 72(1), 252-260.
Gribaldo, S., Casane, D., Lopez, P., & Philippe, H. (2003). Functional divergence prediction from evolutionary analysis: a case study of vertebrate hemoglobin. Molecular biology and evolution, 20(11), 1754-1759.
Gu, X. (1999). Statistical methods for testing functional divergence after gene duplication. Molecular biology and evolution, 16(12), 1664-1674.
Gu, X., & Vander Velden, K. (2002). DIVERGE: phylogeny-based analysis for functional–structural divergence of a protein family. Bioinformatics, 18(3), 500-501.
Gujral, T. S., Singh, V. K., Jia, Z., & Mulligan, L. M. (2006). Molecular mechanisms of RET receptor–mediated oncogenesis in Multiple Endocrine Neoplasia 2B. Cancer research, 66(22), 10741-10749.
Guldner, E., Godelle, B., & Galtier, N. (2004). Molecular adaptation in plant hemoglobin, a duplicated gene involved in plant–bacteria symbiosis. Journal of molecular evolution, 59(3), 416-425.
Gunther, C., Schlereth, A., Udvardi, M., & Ott, T. (2007). Metabolism of reactive oxygen species is attenuated in leghemoglobin-deficient nodules of Lotus japonicus. Molecular plant-microbe interactions, 20(12), 1596-1603.
Gupta, C. P. (2014). Role of iron (Fe) in body. IOSR Journal of Applied Chemistry (IOSR-JAC), 7, 38-46.
Gupta, K. J., Hebelstrup, K. H., Mur, L. A., & Igamberdiev, A. U. (2011). Plant hemoglobins: important players at the crossroads between oxygen and nitric oxide. FEBS letters, 585(24), 3843-3849.
Guruprasad, K., Reddy, B. B., & Pandit, M. W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering, Design and Selection, 4(2), 155-161.
Halder, P., Trent, J. T., & Hargrove, M. S. (2007). Influence of the protein matrix on intramolecular histidine ligation in ferric and ferrous hexacoordinate hemoglobins. PROTEINS: Structure, Function, and Bioinformatics, 66(1), 172-182.
Hardison, R. (1998). Hemoglobins from bacteria to man: evolution of different patterns of gene expression. Journal of Experimental Biology, 201(8), 1099-1117.
Hargrove, M. S. (2000). A flash photolysis method to characterize hexacoordinate hemoglobin kinetics. Biophysical Journal, 79(5), 2733-2738.

BIBLIOGRAPHY 128

Hargrove, M. S., Brucker, E. A., Stec, B., Sarath, G., Arredondo-Peter, R., Klucas, R.V., Olson, J.S. & Phillips, G. N. (2000). Crystal structure of a nonsymbiotic plant hemoglobin. Structure, 8(9), 1005-1014.
Hebelstrup, K. H., Igamberdiev, A. U., & Hill, R. D. (2007). Metabolic effects of hemoglobin gene expression in plants. Gene, 398(1), 86-93.
Hebelstrup, K. H., Shah, J. K., & Igamberdiev, A. U. (2013). The role of nitric oxide and hemoglobin in plant development and morphogenesis. Physiologia plantarum, 148(4), 457-469.
Heckmann, A. B., Hebelstrup, K. H., Larsen, K., Micaelo, N. M., & Jensen, E. A. (2006). A single hemoglobin gene in Myrica gale retains both symbiotic and non-symbiotic specificity. Plant molecular biology, 61(4), 769-779.
Henikoff, S., Pietrokovski, S., & Henikoff, J. G. (1998). Superior performance in protein homology detection with the blocks database servers. Nucleic Acids Research, 26(1), 309-312.
Herold, S., & Puppo, A. (2005). Oxyleghemoglobin scavenges nitrogen monoxide and peroxynitrite: a possible role in functioning nodules? JBIC Journal of Biological Inorganic Chemistry, 10(8), 935-945.
Hibbs, D. E., & Cromack Jr, K. (1990). Actinorhizal plants in pacific northwest forests. The biology of Frankia and actinorhizal plants, 343-363.
Hill, R. D. (2012). Non-symbiotic haemoglobins—What's happening beyond nitric oxide scavenging? AoB Plants, 2012.
Hirsch, A. M. (1992). Tansley review no. 40. Developmental biology of legume nodulation. New phytologist, 211-237.
Hirsch, A. M. (2009). Brief history of the discovery of nitrogen fixing organisms. Available at Web site http://www.mcdb.ucla.edu/Research/Hirsch/imagesb/HistoryDiscoveryN2fixingOrganisms.pdf (accessed November 2010).
Hoagland, D. R., & Arnon, D. I. (1950). The water-culture method for growing plants without soil. Circular. California Agricultural Experiment Station, 347(2nd edit).
Hollup, S. M., Salensminde, G., & Reuter, N. (2005). WEBnm@: a web application for normal mode analyses of proteins. BMC bioinformatics, 6(1), 52.
Holm, L., & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. Journal of molecular biology, 233(1), 123-138.
Honig, G. R., Vida, L. N., Rosenblum, B. B., Perutz, M. F., & Fermi, G. (1990). Hemoglobin Warsaw (Phe beta 42 (CD1)→Val), an unstable variant with decreased oxygen affinity. Characterization of its synthesis, functional properties, and structure. Journal of Biological Chemistry, 265(1), 126-132.
Horchani, F., Prévot, M., Boscari, A., Evangelisti, E., Meilhoc, E., Bruand, C., Raymond, P., Boncompagni, E., Aschi-Smiti, S., Puppo, A., & Brouquisse, R. (2011). Both plant and bacterial nitrate reductases contribute to nitric oxide production in Medicago truncatula nitrogen-fixing nodules. Plant Physiology, 155(2), 1023-1036.
Hoy, J. A., Robinson, H., Trent, J. T., Kakar, S., Smagghe, B. J., & Hargrove, M. S. (2007). Plant hemoglobins: a molecular fossil record for the evolution of oxygen transport. Journal of molecular biology, 371(1), 168-179.
Hoy, J. A., & Hargrove, M. S. (2008). The structure and function of plant hemoglobins. Plant Physiology and Biochemistry, 46(3), 371-379.
Hunt, P. W., Watts, R. A., Trevaskis, B., Llewelyn, D. J., Burnell, J., Dennis, E. S., & Peacock, W. J. (2001). Expression and evolution of functionally distinct haemoglobin genes in plants. Plant molecular biology, 47(5), 677-692.
Hunt, P. W., Klok, E. J., Trevaskis, B., Watts, R. A., Ellis, M. H., Peacock, W. J., & Dennis, E. S. (2002). Increased level of hemoglobin 1 enhances survival of hypoxic stress and promotes early growth in Arabidopsis thaliana. Proceedings of the National Academy of Sciences, 99(26), 17197-17202.
Hubbard, T. J. P., & Blundell, T. L. (1987). Comparison of solvent-inaccessible cores of homologous proteins: definitions useful for protein modelling. Protein Engineering, Design and Selection, 1(3), 159-171.
Igamberdiev, A. U., & Hill, R. D. (2004). Nitrate, NO and haemoglobin in plant adaptation to hypoxia: an alternative to classic fermentation pathways. Journal of Experimental Botany, 55(408), 2473-2482.
Igamberdiev, A. U., Baron, K., Manac'H-Little, N., Stoimenova, M., & Hill, R. D. (2005). The haemoglobin/nitric oxide cycle: involvement in flooding stress and effects on hormone signalling. Annals of Botany, 96(4), 557-564.
Igamberdiev, A. U., Bykova, N. V., & Hill, R. D. (2006). Nitric oxide scavenging by barley hemoglobin is facilitated by a monodehydroascorbate reductase-mediated ascorbate reduction of methemoglobin. Planta, 223(5), 1033-1040.
Igamberdiev, A. U., Stoimenova, M., Seregélyes, C., & Hill, R. D. (2006). Class-1 hemoglobin and antioxidant metabolism in alfalfa roots. Planta, 223(5), 1041-1046.
Igamberdiev, A. U., Bykova, N. V., & Hill, R. D. (2011). Structural and functional properties of class 1 plant hemoglobins. IUBMB life, 63(3), 146-152.
Igamberdiev, A. U., Stasolla, C., & Hill, R. D. (2014). Low oxygen stress, nonsymbiotic hemoglobins, NO, and programmed cell death. In Low-Oxygen Stress in Plants (pp. 41-58). Springer Vienna.
Ioanitescu, A. I., Dewilde, S., Kiger, L., Marden, M. C., Moens, L., & Van Doorslaer, S. (2005). Characterization of non-symbiotic tomato hemoglobin. Biophysical journal, 89(4), 2628-2639.
Ikai, A. (1980). Thermostability and aliphatic index of globular proteins. The Journal of Biochemistry, 88(6), 1895-1898.
Jacobsen-Lyon, K., Jensen, E. O., Jorgensen, J. E., Marcker, K. A., Peacock, W. J., & Dennis, E. S. (1995). Symbiotic and non-symbiotic hemoglobin genes of Casuarina glauca. The Plant Cell, 7(2), 213-223.
Jeong, S. C., Ritchie, N. J., & Myrold, D. D. (1999). Molecular phylogenies of plants and Frankia support multiple origins of actinorhizal symbioses. Molecular Phylogenetics and Evolution, 13(3), 493-503.
Jeong, H. S., & Jouanneau, Y. (2000). Enhanced nitrogenase activity in strains of Rhodobacter capsulatus that overexpress the rnf genes. Journal of bacteriology, 182(5), 1208-1214.
Ji, L., Becana, M., Sarath, G., & Klucas, R. V. (1994). Cloning and sequence analysis of a cDNA encoding ferric leghemoglobin reductase from soybean nodules. Plant physiology, 104(2), 453-459.
Johnson, M. S., Srinivasan, N., Sowdhamini, R., & Blundell, T. L. (1994). Knowledge-based protein modeling. Critical reviews in biochemistry and molecular biology, 29(1), 1-68.
Jokipii-Lukkari, S., Frey, A. D., Kallio, P. T., & Häggman, H. (2009). Intrinsic non-symbiotic and truncated haemoglobins and heterologous Vitreoscilla haemoglobin expression in plants. Journal of experimental botany, 60(2), 409-422.
Jokipii-Lukkari, S., Kastaniotis, A. J., Parkash, V., Sundström, R., Leiva-Eriksson, N., Nymalm, Y., Blokhina, O., Kukkola, E., Fagerstedt, K.V., Salminen, T.A., & Laara, E. (2016). Dual targeted poplar ferredoxin NADP+ oxidoreductase interacts with hemoglobin 1. Plant Science, 247, 138-149.
Jokipii, S., Häggman, H., Brader, G., Kallio, P. T., & Niemi, K. (2008). Endogenous PttHb1 and PttTrHb, and heterologous Vitreoscilla vhb haemoglobin gene expression in hybrid aspen roots with ectomycorrhizal interaction. Journal of experimental botany, 59(9), 2449-2459.
Kakar, S., Hoffman, F. G., Storz, J. F., Fabian, M., & Hargrove, M. S. (2010). Structure and reactivity of hexacoordinate hemoglobins. Biophysical chemistry, 152(1), 1-14.
Kakar, S., Sturms, R., Tiffany, A., Nix, J. C., DiSpirito, A. A., & Hargrove, M. S. (2011). Crystal structures of Parasponia and Trema hemoglobins: differential heme coordination is linked to quaternary structure. Biochemistry, 50(20), 4273-4280.
Kaplan, W., & Littlejohn, T. G. (2001). Swiss-PDB viewer (deep view). Briefings in bioinformatics, 2(2), 195-197.
Karplus, P., Daniels, M., & Herriogi, J. (1991). Atomic structure of ferredoxin-NADP+ reductase: prototype for a structurally novel flavoenzyme family. Enzyme, 10, 20.
Kavanaugh, J. S., Rogers, P. H., Case, D. A., & Arnone, A. (1992). High-resolution x-ray study of deoxyhemoglobin rothschild 37β Trp→Arg: a mutation that creates an intersubunit chloride-binding site. Biochemistry, 31(16), 4111-4121.
Keeling, M. M., Ogden, L. L., Wrightstone, R. N., Wilson, J. B., Reynolds, C. A., Kitchens, J. L., & Huisman, T. H. J. (1971). Hemoglobin louisville (β42 (CD1) Phe→Leu): an unstable variant causing mild hemolytic anemia. Journal of Clinical Investigation, 50(11), 2395.
Kennedy, P. G., Weber, M. G., & Bluhm, A. A. (2010). Frankia bacteria in Alnus rubra forests: genetic diversity and determinants of assemblage structure. Plant and soil, 335(1-2), 479-492.
Klein, J. B., & Thongboonkerd, V. (2004). Overview of proteomics. In Proteomics in Nephrology (Vol. 141, pp. 1-10). Karger Publishers.
Knowlton, S., Berry, A., & Torrey, J. G. (1980). Evidence that associated soil bacteria may influence root hair infection of actinorhizal plants by Frankia. Canadian journal of microbiology, 26(8), 971-977.
Konagurthu, A. S., Whisstock, J. C., Stuckey, P. J., & Lesk, A. M. (2006). MUSTANG: a multiple structural alignment algorithm. Proteins: Structure, Function, and Bioinformatics, 64(3), 559-574.
Krassilnikov, N. A. (1949). Determination of bacteria and actinomycetes. Akad. Nauk USSR.
Kubo, H. (1939). Uber hamoprotein aus den wurzelknollchen von leguminosen. Acta Phytochim, 11, 195-200.
Kundu, S., & Hargrove, M. S. (2003). Distal heme pocket regulation of ligand binding and stability in soybean leghemoglobin. Proteins: Structure, Function, and Bioinformatics, 50(2), 239-248.
Kundu, S., Trent, J. T., & Hargrove, M. S. (2003). Plants, humans and hemoglobins. Trends in plant science, 8(8), 387-393.
Lalonde, M. (1978). Confirmation of the infectivity of a free-living actinomycete isolated from Comptonia peregrina root nodules by immunological and ultrastructural studies. Canadian Journal of Botany, 56(21), 2621-2635.
Lancelle, S. A., & Torrey, J. G. (1985). Early development of Rhizobium-induced root nodules of Parasponia rigida. II. Nodule morphogenesis and symbiotic development. Canadian Journal of Botany, 63(1), 25-35.
Laskowski, R. A., Watson, J. D., & Thornton, J. M. (2005). ProFunc: a server for predicting protein function from 3D structure. Nucleic acids research, 33(suppl_2), W89-W93.
Lechevalier, M. P., & Lechevalier, H. A. (1984). Taxonomy of Frankia. Biological, Biochemical and Biomedical Aspects of Actinomycetes, 575-582.
Lechevalier, M. P. (1994). Taxonomy of the genus Frankia (Actinomycetales). International Journal of Systematic and Evolutionary Microbiology, 44(1), 1-8.
Levitt, M., & Sharon, R. (1988). Accurate simulation of protein dynamics in solution. Proceedings of the National Academy of Sciences, 85(20), 7557-7561.
Li, Y. D., Xie, Z. Y., Du, Y. L., Zhou, Z., Mao, X. M., Lv, L. X., & Li, Y. Q. (2009). The rapid evolution of signal peptides is mainly caused by relaxed selection on non-synonymous and synonymous sites. Gene, 436(1), 8-11.
Maillet, F., Poinsot, V., Andre, O., Puech-Pagès, V., Haouy, A., Gueunier, M., Cromer, L., Giraudet, D., Formey, D., Niebel, A., & Martinez, E. A. (2011). Fungal lipochitooligosaccharide symbiotic signals in arbuscular mycorrhiza. Nature, 469(7328), 58.
Maire, R. C. J. E., & Tison, A. (1909). La cytologie des Plasmodiophoracées et la classe des Phytomyxinae. Friedlaender & Sohn.
Markmann, K., Giczey, G., & Parniske, M. (2008). Functional adaptation of a plant receptor-kinase paved the way for the evolution of intracellular root symbioses with bacteria. PLoS biology, 6(3), e68.
Marti, M. A., Crespo, A., Capece, L., Boechi, L., Bikiel, D. E., Scherlis, D. A., & Estrin, D. A. (2006). Dioxygen affinity in heme proteins investigated by computer simulation. Journal of inorganic biochemistry, 100(4), 761-770.
Marti-Renom, M. A., Stuart, A. C., Fiser, A., Sanchez, R., Melo, F., & Sali, A. (2000). Comparative protein structure modeling of genes and genomes. Annual review of biophysics and biomolecular structure, 29(1), 291-325.
Mathieu, C., Moreau, S., Frendo, P., Puppo, A., & Davies, M. J. (1998). Direct detection of radicals in intact soybean nodules: presence of nitric oxide-leghemoglobin complexes. Free Radical Biology and Medicine, 24(7), 1242-1249.
McCammon, J. A., Gelin, B. R., & Karplus, M. (1977). Dynamics of folded proteins. Nature, 267(5612), 585-590.
Meakin, G. E., Bueno, E., Jepson, B., Bedmar, E. J., Richardson, D. J., & Delgado, M. J. (2007). The contribution of bacteroidal nitrate and nitrite reduction to the formation of nitrosylleghaemoglobin complexes in soybean root nodules. Microbiology, 153(2), 411-419.


Mendelson, Y., Wang, Y., & Gross, B. D. (1994). U.S. Patent No. 5,277,181. Washington, DC: U.S. Patent and Trademark Office.
Meyen, J. (1829). Uber das hervorwachsen parasitischer gebilde aus wurzeln anderer pflanzen. Flora (Jena), 12, 49-64.
Milani, M., Ouellet, Y., Ouellet, H., Guertin, M., Boffi, A., Antonini, G., Bocedi, A., Mattu, M., Bolognesi, M., & Ascenzi, P. (2004). Cyanide binding to truncated hemoglobins: a crystallographic and kinetic study. Biochemistry, 43(18), 5213-5221.
Milani, M., Pesce, A., Ouellet, Y., Dewilde, S., Friedman, J., Ascenzi, P., Guertin, M., & Bolognesi, M. (2004). Heme-ligand tunneling in group I truncated hemoglobins. Journal of Biological Chemistry, 279(20), 21520-21525.
Milenkovic, V. M., Brockmann, M., Stöhr, H., Weber, B. H., & Strauss, O. (2010). Evolution and functional divergence of the anoctamin family of membrane proteins. BMC evolutionary biology, 10(1), 319.
Miller, I. M., & Baker, D. D. (1986). Nodulation of actinorhizal plants by Frankia strains capable of both root hair infection and intercellular penetration. Protoplasma, 131(1), 82-91.
Mirza, B. S., Mirza, M. S., Bano, A., & Malik, K. A. (2007). Coinoculation of chickpea with Rhizobium isolates from roots and nodules and phytohormone-producing Enterobacter strains. Australian Journal of Experimental Agriculture, 47(8), 1008-1015.
Moens, L., Vanfleteren, J., Van de Peer, Y., Peeters, K., Kapp, O., Czeluzniak, J., Goodman, M., Blaxter, M., & Vinogradov, S. (1996). Globins in nonvertebrate species: dispersal by horizontal gene transfer and evolution of the structure-function relationships. Molecular biology and evolution, 13(2), 324-333.
Moller, H. (1885). Plasmodiophora alni. Ber. dtsch. bot. Ges, 3, 101-105.
Mori, S. (1999). Iron acquisition by plants. Current opinion in plant biology, 2(3), 250-253.
Mousavi, S. A., Österman, J., Wahlberg, N., Nesme, X., Lavire, C., Vial, L., Paulin, L., De Lajudie, P., & Lindström, K. (2014). Phylogeny of the Rhizobium–Allorhizobium–Agrobacterium clade supports the delineation of Neorhizobium gen. nov. Systematic and applied microbiology, 37(3), 208-215.
Mukherjee, D. (2009). Medicinal plant in Darjeeling hills. Krishi Sandesh - Miazik International Volunteer Center, Japan, 118-121.
Myrold, D. D. (1994). Frankia and the Actinorhizal Symbiosis. Methods of Soil Analysis: Part 2—Microbiological and Biochemical Properties, 291-328.
Myrold, D. D., & Huss-Danell, K. (1994). Population dynamics of Alnus-infective Frankia in a forest soil with and without host trees. Soil Biology and Biochemistry, 26(5), 533-540.
Nadler, S. A. (1995). Advantages and disadvantages of molecular phylogenetics: A case study of ascaridoid nematodes. Journal of Nematology, 27(4), 423.
Nagata, M., Murakami, E. I., Shimoda, Y., Shimoda-Sasakura, F., Kucho, K. I., Suzuki, A., Abe, M., Higashi, S., & Uchiumi, T. (2008). Expression of a class 1 hemoglobin gene and production of nitric oxide in response to symbiotic and pathogenic bacteria in Lotus japonicus. Molecular plant-microbe interactions, 21(9), 1175-1183.
Nakajima, S., Alvarez-Salgado, E., Kikuchi, T., & Arredondo-Peter, R. (2005). Prediction of folding pathway and kinetics among plant hemoglobins using an average distance map method. Proteins: Structure, Function, and Bioinformatics, 61(3), 500-506.
Nei, M., & Li, W. H. (1979). Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 76(10), 5269-5273.
Newcomb, W., & Wood, S. M. (1987). Morphogenesis and fine structure of Frankia (Actinomycetales): the microsymbiont of nitrogen-fixing actinorhizal root nodules. International review of cytology, 109, 1-88.
Nie, X., & Hill, R. D. (1997). Mitochondrial respiration and hemoglobin gene expression in barley aleurone tissue. Plant Physiology, 114(3), 835-840.
Nie, X., Durnin, D. C., Igamberdiev, A. U., & Hill, R. D. (2006). Cytosolic calcium is involved in the regulation of barley hemoglobin gene expression. Planta, 223(3), 542-549.
Nienhaus, K., Dominici, P., Astegno, A., Abbruzzetti, S., Viappiani, C., & Nienhaus, G. U. (2010). Ligand migration and binding in nonsymbiotic hemoglobins of Arabidopsis thaliana. Biochemistry, 49(35), 7448-7458.
Normand, P., & Bouquet, J. (1989). Phylogeny of nitrogenase sequences in Frankia and other nitrogen-fixing microorganisms. Journal of molecular evolution, 29(5), 436-447.
Normand, P., Orso, S., Cournoyer, B., Jeannin, P., Chapelon, C., Dawson, J., Evtushenko, L., & Misra, A. K. (1996). Molecular phylogeny of the genus Frankia and related genera and emendation of the family Frankiaceae. International Journal of Systematic and Evolutionary Microbiology, 46(1), 1-9.
Normand, P., & Fernandez, M. P. (2008). Evolution and diversity of Frankia. In Prokaryotic symbionts in plants (pp. 103-125). Springer Berlin Heidelberg.
Ochman, H., & Wilson, A. C. (1987). Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. Journal of molecular evolution, 26(1-2), 74-86.
Ogata, K., Ito, T., Okazaki, T., Dan, K., Nomura, T., Nozawa, Y., & Kajita, A. (1986). Hemoglobin sendagi (β42 Phe→Val): A new unstable hemoglobin variant having an amino acid substitution at CD1 of the β-chain. Hemoglobin, 10(5), 469-481.
Oh, C. J., Kim, H. B., Kim, J., Kim, W. J., Lee, H., & An, C. S. (2012). Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata. Archives of microbiology, 194(1), 29-34.
Ohwaki, Y., Kawagishi-Kobayashi, M., Wakasa, K., Fujihara, S., & Yoneyama, T. (2005). Induction of class-1 non-symbiotic hemoglobin genes by nitrate, nitrite and nitric oxide in cultured rice cells. Plant and Cell Physiology, 46(2), 324-331.
Oldroyd, G. E., Harrison, M. J., & Paszkowski, U. (2009). Reprogramming plant cells for endosymbiosis. Science, 324(5928), 753-754.
Oliveira, A. A., Rennó, M. N., de Matos, C. A., Bertuzzi, M. D., Ramalho, T. C., Fraga, C. A., & França, T. C. (2011). Molecular modeling studies of Yersinia pestis dihydrofolate reductase. Journal of Biomolecular Structure and Dynamics, 29(2), 351-367.
O'Malley, L. (1907). Bengal District Gazetteers-Darjeeling.
Ott, T., van Dongen, J. T., Gu, C., Krusell, L., Desbrosses, G., Vigeolas, H., Bock, V., Czechowski, T., Geigenberger, P., & Udvardi, M. K. (2005). Symbiotic leghemoglobins are crucial for nitrogen fixation in legume root nodules but not for general plant growth and development. Current biology, 15(6), 531-535.
Pawlowski, K., & Bisseling, T. (1996). Rhizobial and actinorhizal symbioses: what are the shared features? The Plant Cell, 8(10), 1899.
Pawlowski, K., Jacobsen, K. R., Alloisio, N., Denison, R. F., Klein, M., Tjepkema, J.D., Winzer, T., Sirrenberg, A., Guan, C., & Berry, A. M. (2007). Truncated hemoglobins in actinorhizal nodules of Datisca glomerata. Plant Biology, 9(06), 776-785.
Pawlowski, K., Bogusz, D., Ribeiro, A., & Berry, A. M. (2011). Progress on research on actinorhizal plants. Functional Plant Biology, 38(9), 633-638.
Pawlowski, K., & Demchenko, K. N. (2012). The diversity of actinorhizal symbiosis. Protoplasma, 249(4), 967-979.
Peden, J. F. (1999). Analysis of codon usage [thesis]. [Nottingham (United Kingdom)]: University of Nottingham. CodonW: Correspondence analysis of codon usage.
Peklo, J. (1910). Die pflanzlichen Aktinomykosen: ein Beitrag zur Physiologie der pathogenen Mikroorganismen. Gustav Fischer.
Perazzolli, M., Dominici, P., Romero-Puertas, M. C., Zago, E., Zeier, J., Sonoda, M., Lamb, C. & Delledonne, M. (2004). Arabidopsis nonsymbiotic hemoglobin AHb1 modulates nitric oxide bioactivity. The Plant Cell, 16(10), 2785-2794.
Perazzolli, M., Romero-Puertas, M. C., & Delledonne, M. (2005). Modulation of nitric oxide bioactivity by plant haemoglobins. Journal of Experimental Botany, 57(3), 479-488.
Perutz, M. F. (1979). Molecular adaptation in haemoglobin and thermophile bacteria. Differentiation, 13(1), 47-50.
Pesce, A., Dewilde, S., Nardini, M., Moens, L., Ascenzi, P., Hankeln, T., Burmester, T., & Bolognesi, M. (2003). Human brain neuroglobin structure reveals a distinct mode of controlling oxygen affinity. Structure, 11(9), 1087-1095.
Peters, J. W., Fisher, K., & Dean, D. R. (1995). Nitrogenase structure and function: a biochemical-genetic perspective. Annual Reviews in Microbiology, 49(1), 335-366.
Pommer, E. H. (1956). Beiträge zur anatomie und biologie der wurzelknöllchen von Alnus glutinosa gaertn. Flora oder Allgemeine Botanische Zeitung, 143(4), 603-634.
Pratt, C. J. (1968). World Trade in Fertiliser Raw Materials. Fertiliser Production, Technology and Use.
Qi, J., Wang, B., & Hao, B. I. (2004). Whole proteome prokaryote phylogeny without sequence alignment: a K-string composition approach. Journal of molecular evolution, 58(1), 1-11.
Qu, Z. L., Zhong, N. Q., Wang, H. Y., Chen, A. P., Jian, G. L., & Xia, G. X. (2006). Ectopic expression of the cotton non-symbiotic hemoglobin gene GhHbd1 triggers defense responses and increases disease tolerance in Arabidopsis. Plant and Cell Physiology, 47(8), 1058-1068.
Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., & Lopez, R. (2005). InterProScan: protein domains identifier. Nucleic acids research, 33(suppl_2), W116-W120.
Quillin, M. L., Arduini, R. M., Olson, J. S., & Phillips, G. N. (1993). High-resolution crystal structures of distal histidine mutants of sperm whale myoglobin. Journal of molecular biology, 234(1), 140-155.
Quispel, A. (1988). Hellriegel and Wilfarth's discovery of (symbiotic) nitrogen fixation hundred years ago. In Nitrogen fixation: hundred years after: proceedings of the 7th International Congress on N [Triple-bond] Nitrogen Fixation, Koln (Cologne), FRG, March 13-20, 1980/edited by H. Bothe, FJ de Bruijn and WE Newton. Stuttgart: G. Fischer, 1988.
Quispel, A. (1990). Discoveries, discussions, and trends in research on actinorhizal root nodule symbioses before 1978. The biology of Frankia and actinorhizal plants, 15-33.
Raes, J., & Van de Peer, Y. (2003). Gene duplication, the evolution of novel gene functions, and detecting functional divergence of duplicates in silico. Applied bioinformatics, 2(2), 91-101.
Ramachandran, G. N., Ramakrishnan, C., & Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. Journal of molecular biology, 7(1), 95-99.
Raman, N., & Elumalai, S. (1991, March). A survey on actinorhizal nodulation status and mycorrhizal association in Casuarina equisetifolia in coastal region of Madras, India. In Proceedings of Second Asian Conference on Mycorrhiza, Bogor (pp. 11-15).
Raymond, J., Siefert, J. L., Staples, C. R., & Blankenship, R. E. (2004). The natural history of nitrogen fixation. Molecular biology and evolution, 21(3), 541-554.
Reddy, D. M. (2007). Evolutionary trace analysis of plant haemoglobins: implications for site-directed mutagenesis. Bioinformation, 1(9), 370-375.
Reeder, B. J., & Hough, M. A. (2014). The structure of a class 3 nonsymbiotic plant haemoglobin from Arabidopsis thaliana reveals a novel N-terminal helical extension. Acta Crystallographica Section D: Biological Crystallography, 70(5), 1411-1418.
Rogers, S., Wells, R., & Rechsteiner, M. (1986). Amino acid sequences common to rapidly degraded proteins: The PEST hypothesis. Science, 234, 364-369.
Roesner, A., Fuchs, C., Hankeln, T., & Burmester, T. (2004). A globin gene of ancient evolutionary origin in lower vertebrates: evidence for two distinct globin families in animals. Molecular Biology and Evolution, 22(1), 12-20.
Ross, E. J. H., Shearman, L., Mathiesen, M., Zhou, Y. J., Arredondo-Peter, R., Sarath, G., & Klucas, R. V. (2001). Nonsymbiotic hemoglobins in rice are synthesized during germination and in differentiating cell types. Protoplasma, 218(3), 125-133.
Ross, E. J., Stone, J. M., Elowsky, C. G., Arredondo-Peter, R., Klucas, R. V., & Sarath, G. (2004). Activation of the Oryza sativa non-symbiotic haemoglobin-2 promoter by the cytokinin-regulated transcription factor, ARR1. Journal of experimental botany, 55(403), 1721-1731.
Sainz, M., Perez-Rontomé, C., Ramos, J., Mulet, J. M., James, E. K., Bhattacharjee, U., Petrich, J.W., & Becana, M. (2013). Plant hemoglobins may be maintained in functional form by reduced flavins in the nuclei, and confer differential tolerance to nitro-oxidative stress. The Plant Journal, 76(5), 875-887.
Sakamoto, A., Sakurao, S. H., Fukunaga, K., Matsubara, T., Ueda-Hashimoto, M., Tsukamoto, S., Takahashi, M., & Morikawa, H. (2004). Three distinct Arabidopsis hemoglobins exhibit peroxidase-like activity and differentially mediate nitrite-dependent protein nitration. FEBS letters, 572(1-3), 27-32.
Santi, C., Bogusz, D., & Franche, C. (2013). Biological nitrogen fixation in non-legume plants. Annals of botany, 111(5), 743-767.
Sarma, G., Sen, A., Varghese, R., & Misra, A. K. (1998). A novel technique for isolation of Frankia and generation of single-spore cultures. Canadian journal of microbiology, 44(5), 490-492.
Sasakura, F., Uchiumi, T., Shimoda, Y., Suzuki, A., Takenouchi, K., Higashi, S., & Abe, M. (2006). A class 1 hemoglobin gene from Alnus firma functions in symbiotic and nonsymbiotic tissues to detoxify nitric oxide. Molecular Plant-Microbe Interactions, 19(4), 441-450.
Schubert, M., Melnikova, A. N., Mesecke, N., Zubkova, E. K., Fortte, R., Batashev, D.R., Barth, I., Sauer, N., Gamalei, Y.V., Mamushina, N.S., & Tietze, L. F. (2010). Two novel disaccharides, rutinose and methylrutinose, are involved in carbon metabolism in Datisca glomerata. Planta, 231(3), 507-521.
Schubert, M., Koteyeva, N. K., Wabnitz, P. W., Santos, P., Büttner, M., Sauer, N., Demchenko, K., & Pawlowski, K. (2011). Plasmodesmata distribution and sugar partitioning in nitrogen-fixing root nodules of Datisca glomerata. Planta, 233(1), 139-152.
Schwintzer, C. R. (2012). The biology of Frankia and actinorhizal plants. Elsevier.
Sen, A. (1996). Electron microscopy and molecular biology of Frankia (Doctoral dissertation).
Sen, A., Daubin, V., Abrouk, D., Gifford, I., Berry, A. M., & Normand, P. (2014). Phylogeny of the class Actinobacteria revisited in the light of complete genomes. The orders ‘Frankiales’ and ‘Micrococcales’ should be split into coherent entities: proposal of Frankiales ord. nov., Geodermatophilales ord. nov., Acidothermales ord. nov. and Nakamurellales ord. nov. International journal of systematic and evolutionary microbiology, 64(11), 3821-3832.
Seregelyes, C., Mustárdy, L., Ayaydin, F., Sass, L., Kovács, L., Endre, G., Lukács, N., Kovács, I., Vass, I., Kiss, G.B., & Horváth, G. V. (2000). Nuclear localization of a hypoxia-inducible novel non-symbiotic hemoglobin in cultured alfalfa cells. FEBS letters, 482(1-2), 125-130.
Seregelyes, C., Igamberdiev, A. U., Maassen, A., Hennig, J., Dudits, D., & Hill, R. D. (2004). NO-degradation by alfalfa class 1 hemoglobin (Mhb1): a possible link to PR-1a gene expression in Mhb1-overproducing tobacco plants. FEBS letters, 571(1-3), 61-66.
Sharma, E., & Ambasht, R. S. (1986). Root nodule age-class transition, production and decomposition in an age sequence of Alnus nepalensis plantation stands in the Eastern Himalayas. Journal of Applied Ecology, 689-701.
Sharma, R. P. (2012). Modelling dry matter allocation within Alnus nepalensis D. Don trees in Nepal. International Journal of Biodiversity and Conservation, 4(2), 47-53.
Sharp, P. M., & Li, W. H. (1987). The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic acids research, 15(3), 1281-1295.
Shimoda, Y., Nagata, M., Suzuki, A., Abe, M., Sato, S., Kato, T., Tabata, S., Higashi, S. & Uchiumi, T. (2005). Symbiotic Rhizobium and nitric oxide induce gene expression of non-symbiotic hemoglobin in Lotus japonicus. Plant and Cell Physiology, 46(1), 99-107.
Silver, W. S. (1964). Root nodules symbiosis I. L endophyte of Myrica cerifera. Journal of Bacteriology, 87(2), 416-421.
Sims, G. E., Jun, S. R., Wu, G. A., & Kim, S. H. (2009). Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions. Proceedings of the National Academy of Sciences, 106(8), 2677-2682.
Smagghe, B. J., Kundu, S., Hoy, J. A., Halder, P., Weiland, T. R., Savage, A., Venugopal, A., Goodman, M., Premer, S. & Hargrove, M. S. (2006). Role of phenylalanine B10 in plant nonsymbiotic hemoglobins. Biochemistry, 45(32), 9735-9745.
Smagghe, B. J., Sarath, G., Ross, E., Hilbert, J. L., & Hargrove, M. S. (2006). Slow ligand binding kinetics dominate ferrous hexacoordinate hemoglobin reactivities and reveal differences between plants and other species. Biochemistry, 45(2), 561-570.
Smagghe, B. J., Trent III, J. T., & Hargrove, M. S. (2008). NO dioxygenase activity in hemoglobins is ubiquitous in vitro, but limited by reduction in vivo. PloS one, 3(4), e2039.
Smagghe, B. J., Hoy, J. A., Percifield, R., Kundu, S., Hargrove, M. S., Sarath, G., Hilbert, J.L., Watts, R.A., Dennis, E.S., Peacock, W.J., & Dewilde, S. (2009). Correlations between oxygen affinity and sequence classifications of plant hemoglobins. Biopolymers, 91(12), 1083-1096.
Smerdon, S. J., Krzywda, S., Wilkinson, A. J., Brantley Jr, R. E., Carver, T. E., Hargrove, M. S., & Olson, J. S. (1993). Serine92 (F7) contributes to the control of heme reactivity and stability in myoglobin. Biochemistry, 32(19), 5132-5138.
Soltis, D. E., Soltis, P. S., Morgan, D. R., Swensen, S. M., Mullin, B. C., Dowd, J. M., & Martin, P. G. (1995). Chloroplast gene sequence data suggest a single origin of the predisposition for symbiotic nitrogen fixation in angiosperms. Proceedings of the National Academy of Sciences, 92(7), 2647-2651.
Sowa, A. W., Duff, S. M., Guy, P. A., & Hill, R. D. (1998). Altering hemoglobin levels changes energy status in maize cells under hypoxia. Proceedings of the National Academy of Sciences, 95(17), 10317-10321.
Sprent, J. I., & Sprent, P. (1990). Nitrogen fixing organisms: pure and applied aspects. Nitrogen fixing organisms: pure and applied aspects.
Spyrakis, F., BidonChanal, A., Barril, X., & Javier Luque, F. (2011). Protein flexibility and ligand recognition: challenges for molecular modeling. Current topics in medicinal chemistry, 11(2), 192-210.
Spyrakis, F., Bruno, S., Bidon-Chanal, A., Luque, F. J., Abbruzzetti, S., Viappiani, C., Dominici, P., & Mozzarelli, A. (2011). Oxygen binding to Arabidopsis thaliana AHb2 nonsymbiotic hemoglobin: evidence for a role in oxygen transport. IUBMB life, 63(5), 355-362.
Sturms, R., Kakar, S., Trent III, J., & Hargrove, M. S. (2010). Trema and Parasponia hemoglobins reveal convergent evolution of oxygen transport in plants. Biochemistry, 49(19), 4085-4093.
Sturms, R., DiSpirito, A. A., Fulton, D. B., & Hargrove, M. S. (2011). Hydroxylamine reduction to ammonium by plant and cyanobacterial hemoglobins. Biochemistry, 50(50), 10829-10835.
Sturms, R., DiSpirito, A. A., & Hargrove, M. S. (2011). Plant and cyanobacterial hemoglobins reduce nitrite to nitric oxide under anoxic conditions. Biochemistry, 50(19), 3873-3878.
Suhre, K., & Sanejouand, Y. H. (2004). ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic acids research, 32(suppl_2), W610-W614.
Swensen, S. M. (1996). The evolution of actinorhizal symbioses: evidence for multiple origins of the symbiotic association. American Journal of Botany, 1503-1512.
Takano, T. (1977). Structure of myoglobin refined at 2.0 Å resolution: I. crystallographic refinement of metmyoglobin from sperm whale. Journal of molecular biology, 110(3), 537-568.
Takeda, S., & Fukawa, M. (2005). Role of surface OH groups in surface chemical properties of metal oxide films. Materials Science and Engineering: B, 119(3), 265-267.
Tamura, K., Dudley, J., Nei, M., & Kumar, S. (2007). MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular biology and evolution, 24(8), 1596-1599.
Tarricone, C., Galizzi, A., Coda, A., Ascenzi, P., & Bolognesi, M. (1997). Unusual structure of the oxygen-binding site in the dimeric bacterial hemoglobin from Vitreoscilla sp. Structure, 5(4), 497-507.
Tavares, F., Santos, C. L., & Sellstedt, A. (2007). Reactive oxygen species in legume and actinorhizal nitrogen-fixing symbioses: the microsymbiont’s responses to an unfriendly reception. Physiologia Plantarum, 130(3), 344-356.
Taylor, E. R., Nie, X. Z., MacGregor, A. W., & Hill, R. D. (1994). A cereal haemoglobin gene is expressed in seed and root tissues under anaerobic conditions. Plant molecular biology, 24(6), 853-862.
Thiel, J., Rolletschek, H., Friedel, S., Lunn, J. E., Nguyen, T. H., Feil, R., Tschiersch, H., Müller, M., & Borisjuk, L. (2011). Seed-specific elevation of non-symbiotic hemoglobin AtHb1: beneficial effects and underlying molecular networks in Arabidopsis thaliana. BMC Plant Biology, 11(1), 48.
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research, 22(22), 4673-4680.
Torrey, J. G. (1978). Nitrogen fixation by actinomycete-nodulated angiosperms. Bioscience, 28(9), 586-592.
Trent, J. T., Watts, R. A., & Hargrove, M. S. (2001). Human neuroglobin, a hexacoordinate hemoglobin that reversibly binds oxygen. Journal of Biological Chemistry, 276(32), 30106-30110.
Trent, J. T., & Hargrove, M. S. (2002). A ubiquitously expressed human hexacoordinate hemoglobin. Journal of Biological Chemistry, 277(22), 19538-19545.
Trevaskis, B., Watts, R. A., Andersson, C. R., Llewellyn, D. J., Hargrove, M. S., Olson, J.S., Dennis, E.S., & Peacock, W. J. (1997). Two hemoglobin genes in Arabidopsis thaliana: the evolutionary origins of leghemoglobins. Proceedings of the National Academy of Sciences, 94(22), 12230-12234.
Trevisan, S., Manoli, A., Begheldo, M., Nonis, A., Enna, M., Vaccaro, S., Caporale, G., Ruperti, B., & Quaggiotti, S. (2011). Transcriptome analysis reveals coordinated spatiotemporal regulation of hemoglobin and nitrate reductase in response to nitrate in maize roots. New Phytologist, 192(2), 338-352.
Truog, E. (1930). The determination of the readily available phosphorus of soils. Agronomy Journal, 22(10), 874-882.
Uchiumi, T., Shimoda, Y., Tsuruta, T., Mukoyoshi, Y., Suzuki, A., Senoo, K., Sato, S., Kato, T., Tabata, S., Higashi, S. & Abe, M. (2002). Expression of symbiotic and nonsymbiotic globin genes responding to microsymbionts on Lotus japonicus. Plant and cell physiology, 43(11), 1351-1358.
Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B. C., Remm, M., & Rozen, S. G. (2012). Primer3—new capabilities and interfaces. Nucleic acids research, 40(15), e115-e115.
Valverde, C., & Wall, L. G. (1999). Regulation of nodulation in Discaria trinervis (Rhamnaceae)-Frankia symbiosis. Canadian Journal of Botany, 77(9), 1302-1310.
Vazquez-Limón, C., Hoogewijs, D., Vinogradov, S. N., & Arredondo-Peter, R. (2012). The evolution of land plant hemoglobins. Plant science, 191, 71-81.
Vieweg, M. F., Hohnjec, N., & Küster, H. (2005). Two genes encoding different truncated hemoglobins are regulated during root nodule and arbuscular mycorrhiza symbioses of Medicago truncatula. Planta, 220(5), 757-766.
Vigeolas, H., Hühn, D., & Geigenberger, P. (2011). Nonsymbiotic hemoglobin-2 leads to an elevated energy state and to a combined increase in polyunsaturated fatty acids and total oil content when overexpressed in developing seeds of transgenic Arabidopsis plants. Plant physiology, 155(3), 1435-1444.
Vinogradov, S. N., Hoogewijs, D., Bailly, X., Arredondo-Peter, R., Gough, J., Dewilde, S., Moens, L., & Vanfleteren, J. R. (2006). A phylogenomic profile of globins. BMC Evolutionary Biology, 6(1), 31.
Vinogradov, S. N., Hoogewijs, D., Bailly, X., Mizuguchi, K., Dewilde, S., Moens, L., & Vanfleteren, J. R. (2007). A model of globin evolution. Gene, 398(1), 132-142.
Vinogradov, S. N., & Moens, L. (2008). Diversity of globin function: enzymatic, transport, storage, and sensing. Journal of Biological Chemistry, 283(14), 8773-8777.
Vinogradov, S. N., Tinajero-Trejo, M., Poole, R. K., & Hoogewijs, D. (2013). Bacterial and archaeal globins—a revised perspective. Biochimica et Biophysica Acta (BBA)-Proteins and Proteomics, 1834(9), 1789-1800.
Von Tubeuf, K. F. (1895). Pflanzenkrankheiten durch kryptogame parasiten verursacht: eine einführung in das studium der parasitären pilze, schleimpilze, spaltpilze und algen. zugleich eine anleitung zur bekämpfung von krankheiten der kulturpflanzen. J. Springer.
Voronin, M. (1866). Uber die bei der schwarzerle (Alnus glutinosa) und der gewöhnlichen garten-Lupine Lupinus mutabilis auftretenden Wurzelanschwellungen (Vol. 10).
Waksman, S. A. (1950). The Actinomycetes-their nature, occurrence, activities, and importance. The Actinomycetes-their nature, occurrence, activities, and impor-

BIBLIOGRAPHY 137

tance. norhizal plants,365, 389. Walkley, A. (1947). A critical examination of a Wiederstein, M., & Sippl, M. J. (2007). ProSA- rapid method for determining organic car- web: interactive web service for the recog- bon in soils-effect of variations in diges- nition of errors in three-dimensional struc- tion conditions and of inorganic soil con- tures of proteins. Nucleic acids research, stituents. Soil Science, 63(4), 251-264. 35(suppl_2), W407-W410. Wall, L. G. (2000). The actinorhizal symbiosis. Wittenberg, J. B., Bolognesi, M., Wittenberg, Journal of plant growth regulation, 19(2), B. A., & Guertin, M. (2002). Truncated 167-182. hemoglobins: a new family of hemoglo- Watts, R. A., Hunt, P. W., Hvitved, A. N., bins widely distributed in bacteria, unicel- lular eukaryotes, and plants. Journal of Hargrove, M. S., Peacock, W. J., & Den- nis, E. S. (2001). A hemoglobin from Biological Chemistry, 277(2), 871-874. plants homologous to truncated hemoglo- Wright, F. (1990). The ‘effective number of bins of microorganisms. Proceedings of codons’ used in a gene. Gene, 87(1), 23- the National Academy of Sciences, 98(18), 29. 10119-10124. Wu, G., & Freeland, S. (2005). Quantifying Watson, J. D., & Milner-White, E. J. (2002). A unequal patterns of synonymous codon novel main-chain anion-binding site in usage. The CAI calculator. proteins: The nest. A particular combina- Yang, J., & Zhang, Y. (2015). Protein structure tion of φ, ψ values in successive residues and function prediction using I‐TASSER. gives rise to anion-binding sites that occur Current protocols in bioinformatics, 5-8. commonly and are found often at function- ally important regions. Journal of molecu- Yang, L. X., Wang, R. Y., Ren, F., Liu, J., lar biology, 315(2), 171-182. Cheng, J., & Lu, Y. T. (2005). AtGLB1 enhances the tolerance of Arabidopsis to Webb, B., & Sali, A. (2014). Protein structure hydrogen peroxide stress.Plant and cell modeling with MODELLER. Protein physiology, 46(8), 1309-1316. 
Structure Prediction, 1-15. Zheng, Y., Xu, D., & Gu, X. (2007). Func- Werner, D. (1992). Physiology of nitrogen- tional divergence after gene duplication fixing legume nodules: compartments and and sequence–structure relationship: a case functions. Biological nitrogen fixation, study of G‐protein alpha subunits. Journal 399-431. of Experimental Zoology Part B: Molecu- Wheeler, C. T., & Miller, I. M. (1990). Current lar and Developmental Evolution, 308(1), and potential uses of actinorhizal plants in 85-96. Europe. The biology of Frankia and acti-


Appendix A

List of publications

Published:

Bhattacharya, S., Sen, A., Thakur, S., & Tisa, L. S. (2013). Characterization of haemoglobin from Actinorhizal plants – An in silico approach. Journal of biosciences, 38(4), 777-787.

Pal, S., Bhattacharya, S., Sen, A., Pati, B. R., Mondal, K. C., & DasMohapatra, P. K. (2015). Functional elucidation and structure prediction of certain hypothetical proteins in Candida glabrata CBS 138: an in silico approach. Journal of advanced microbiology, 2(1), 32-53.

Roy, A., Bhattacharya, S., Bothra, A. K., & Sen, A. (2013). A Database for Mycobacterium Secretome Analysis: ‘MycoSec’ to Accelerate Global Health Research. Omics: a journal of integrative biology, 17(10), 502-509.

Submitted:

Bhattacharya, S., Tisa, L. S., & Sen, A. (2017). Characterization of actino-haemoglobins with reference to evolution of plant truncated haemoglobins. (Manuscript submitted)

Manuscript under preparation:

Bhattacharya, S., Bhattacharya, M., & Sen, A. Population genetics and ecological study of Alnus nepalensis in sub-Himalayan West Bengal and Sikkim. (Manuscript under preparation)

Bhattacharya, S., Bhattacharya, M., & Sen, A. Expression study of Alnus nepalensis haemoglobin, in different plant region. (Manuscript under preparation)

Bhattacharya, S., Bhattacharya, M., & Sen, A. Functional annotation of actinobacterial haemoglobins depicted their host specificity. (Manuscript under preparation)

Bhattacharya, S., Bhattacharya, M., & Sen, A. Multipurpose nature of plant haemoglobins – a short review. (Manuscript under preparation)

Appendix B

List of abbreviations

%         Percent / Per
°C        Degree Celsius
N         North
E         East
′         Minute
″         Second
3’→5’     3 prime to 5 prime
5’→3’     5 prime to 3 prime
α         Alpha
μg        Microgram
μl        Microlitre
μmol      Micromole
gm        Gram(s)
gm/l      Gram(s) per litre
hr        Hour(s)
mg        Milligram
ml        Millilitre(s)
mM        Millimolar
mm        Millimeter
µM        Micromolar
cm        Centimeter
kb        Kilo base pair
M         Molar
rpm       Revolutions per minute
pH        Negative logarithm of hydrogen ion concentration (measurement of acidity and basicity)
O.D.      Optical Density
CTAB      Cetyl Trimethyl Ammonium Bromide
DNA       Deoxyribonucleic acid
RNA       Ribonucleic acid
PCR       Polymerase chain reaction
RT-PCR    Real Time PCR
RAPD      Random Amplified Polymorphic DNA
NCBI      National Center for Biotechnology Information
IMG       Integrated Microbial Genomes
CAI       Codon Adaptation Index
BLAST     Basic Local Alignment Search Tool
sqKm      Square kilometre
GPS       Global positioning system
EDTA      Ethylenediaminetetraacetic acid
TE        Tris EDTA
Å         Angstrom
ft        Foot
HgCl2     Mercuric chloride
H2O2      Hydrogen peroxide
-ve       Negative
+ve       Positive
PDB       Protein Data Bank
3D        Three dimensional
kD        Kilo dalton
Etc       Et cetera
i.e.      That is
Nc        Effective number of codons
GC3       Guanine and Cytosine content at the third codon position
COG       Cluster of orthologous groups of proteins
RMSD      Root-mean-square deviation
>         Greater than
Hb        Haemoglobin
SHb       Symbiotic haemoglobin
NsHb      Non-symbiotic haemoglobin
PHb       Plant haemoglobin
BHb       Actinobacterial haemoglobin
PtHb      Plant truncated haemoglobin

Appendix-C

Buffers and chemicals used for DNA fingerprinting studies

CTAB buffer:
100 mM Trizma Base (Sigma, Cat# T1503) (pH 8.0)
20 mM EDTA (Merck India, Cat# 60841801001730) (pH 8.0)
1.4 M NaCl (Merck India, Cat# 60640405001730)
2% (w/v) CTAB (hexadecyl (cetyl) trimethyl ammonium bromide) (Sigma, Cat# H6269)
12.11 g of molecular grade Trizma base was dissolved in 400 ml double distilled water, the pH was adjusted to 8.0 and the solution was divided into two parts of equal volume. To one part 7.44 g EDTA was added, and to the other part 81.8 g NaCl and 20 g CTAB. Both parts were then mixed and the final volume was made up to 1000 ml with double distilled water prior to autoclaving. The buffer was autoclaved at 121°C and 15 psi for 20 minutes and stored at room temperature for further use.
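The weighed amounts in the CTAB recipe follow from the target concentrations and the 1000 ml final volume. The sketch below reproduces that arithmetic. The molar masses are assumptions supplied for illustration: in particular, the EDTA value assumes the disodium salt dihydrate, which the recipe does not name, and the function names are hypothetical.

```python
# Molar masses in g/mol (assumed values; the EDTA entry assumes the
# disodium salt dihydrate, which the recipe does not state explicitly).
MOLAR_MASS = {
    "Trizma base": 121.14,
    "EDTA": 372.24,
    "NaCl": 58.44,
}


def grams_for_molarity(molar_mass, molarity, volume_l):
    """Mass (g) needed for a solution of the given molarity (mol/l) and volume (l)."""
    return molar_mass * molarity * volume_l


def grams_for_percent_wv(percent, volume_ml):
    """Mass (g) needed for a percent (w/v) solution, i.e. g per 100 ml."""
    return percent / 100.0 * volume_ml


def ctab_buffer_masses(volume_l=1.0):
    """Reagent masses (g) for the CTAB buffer at the stated concentrations."""
    return {
        "Trizma base": grams_for_molarity(MOLAR_MASS["Trizma base"], 0.100, volume_l),
        "EDTA": grams_for_molarity(MOLAR_MASS["EDTA"], 0.020, volume_l),
        "NaCl": grams_for_molarity(MOLAR_MASS["NaCl"], 1.4, volume_l),
        "CTAB": grams_for_percent_wv(2.0, volume_l * 1000.0),
    }


if __name__ == "__main__":
    for reagent, grams in ctab_buffer_masses(1.0).items():
        print(f"{reagent}: {grams:.2f} g")
```

Passing a different `volume_l` scales every mass proportionally, which is convenient when only a small batch of buffer is needed.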

1X TE:

Tris-Cl (pH 8.0) (i.e. 10 mM) = 0.6055 gm
EDTA (pH 8.0) (i.e. 1 mM) = 0.186 gm
Final volume = 1000 ml
Both reagents were dissolved separately and then mixed together, and the final volume was made up to 1000 ml with sterile distilled water prior to autoclaving. The buffer was autoclaved at 121°C and 15 psi for 20 minutes and stored at room temperature for further use.

5X TBE (Tris-borate-EDTA) buffer:

Trizma base (Sigma, Cat# T1503) = 27 gm
Boric acid (Sigma, Cat# 15663) = 13.75 gm
0.5 M EDTA (pH 8.0) = 1.86 gm
Final volume = 1000 ml
All the reagents were dissolved separately and then mixed together, and the final volume was made up to 1000 ml with sterile distilled water prior to autoclaving. The buffer was autoclaved at 121°C and 15 psi for 20 minutes and stored at room temperature for further use.

Sodium acetate:

Sodium acetate (Sigma, Cat# S-9513) = 12.3045 gm
Final volume = 50 ml
In 50 ml of sterile distilled water, 12.3045 gm of sodium acetate was dissolved, autoclaved at 121°C and 15 psi for 20 minutes, and stored at room temperature for further use.

SDS (10%):

SDS (Sigma, USA, Cat# L4390) = 5 gm
Final volume = 50 ml
In 20 ml of sterile distilled water, 5 gm of SDS was added and heated to dissolve. The final volume was made up to 50 ml and the solution was autoclaved for further use.

Sodium chloride (NaCl) (5 M):

NaCl = 14.61 gm
Final volume = 50 ml
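The stock recipes above give a weighed mass and a final volume, so their concentrations can be checked by running the calculation in reverse. A minimal sketch follows, with hypothetical helper names and an assumed molar mass for NaCl (58.44 g/mol).

```python
NACL_MOLAR_MASS = 58.44  # g/mol (assumed value, not given in the recipe)


def molarity(mass_g, molar_mass, volume_ml):
    """Molar concentration (mol/l) of mass_g dissolved to a final volume of volume_ml."""
    moles = mass_g / molar_mass
    return moles / (volume_ml / 1000.0)


def percent_wv(mass_g, volume_ml):
    """Percent (w/v): grams of solute per 100 ml of solution."""
    return mass_g / volume_ml * 100.0


if __name__ == "__main__":
    # NaCl stock: 14.61 gm made up to 50 ml
    print(f"NaCl: {molarity(14.61, NACL_MOLAR_MASS, 50):.2f} M")
    # SDS stock: 5 gm made up to 50 ml
    print(f"SDS: {percent_wv(5, 50):.1f} % (w/v)")
```

Under these assumptions the NaCl stock works out to 5 M and the SDS stock to 10% (w/v), matching the section headings.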

6X gel loading dye:

TYPE 3:
0.25% Bromophenol blue (Sigma, Cat# B0126)
0.25% Xylene cyanol FF (Sigma, Cat# X4126)
30% Glycerol (Merck India, Cat# 61756005001730) in water
Stored at 4°C

RNase A:

The RNase A enzyme (Sigma, Cat# R4875) was dissolved at a concentration of 10 mg/ml in 0.01 M sodium acetate (Sigma, Cat# S9513) (pH 5.2). The solution was heated at 100°C for 15 minutes in a water bath and allowed to cool slowly to room temperature. The pH was adjusted by adding 1/10 volume of 1 M Tris-Cl (pH 7.4) and the solution was stored at -20°C for further use. Note: Both 0.01 M sodium acetate and Tris-Cl were prepared and autoclaved at 121°C and 15 psi for 20 minutes prior to use.

Other chemicals used for the molecular work:

ReadyMix™ Taq PCR Reaction (Sigma, Cat# P4600)
Chloroform (Merck India, Cat# 822265)
Isoamyl alcohol (Merck India, Cat# 8.18969.1000)
Phenol (Sigma, Cat# P4557)
Isopropanol (Merck India, Cat# 17813)
Absolute ethyl alcohol (BDH, Cat# 10107)
Agarose (Sigma, Cat# A9539)
Ethidium bromide (10 mg/ml) (Hi Media, Cat# RM813)
Lambda DNA/EcoRI/HindIII double digest (Promega, Cat# PR-G1731)
100 bp ladder (Sigma, Cat# 1473)
GoTaq Green Master Mix (Promega, Cat# M7122)
HgCl2
Hoagland solution
TRIzol reagent (Invitrogen)
SYBR Green PCR master mix (Applied Biosystems)

Appendix-D

Software used in the study

Name Executable Description

Codon W Windows Program for codon and amino acid usage

Modeller Windows Standalone program for homology modeling

XLSTAT Windows Statistical and data analysis software package

MEGA Windows Tool for sequence alignment and phylogeny

ClustalW Windows Multiple sequence alignment program

Phylip Windows Phylogeny Inference Package computer program

Figtree Windows Graphical viewer of phylogenetic trees

Divergent Windows Program for detection of functional divergence

Swiss-PDB viewer Windows Environment for comparative protein modeling

CMG-biotools Windows Standalone OS for comparative microbial genomics

MUSTANG Windows/Linux Algorithm for protein structure alignment

DSSP Linux Program for secondary structure assignment

NTSYSpc2 Windows Program for analyzing binary matrix

Appendix-E

Web servers used in the present study

Name    Web address    Description

JGI-IMG    www.img.jgi.doe.gov    Integrated Microbial Genomes system
NCBI    www.ncbi.nlm.nih.gov/    For molecular biology information
PDB    www.rcsb.org/    Repository of 3D protein structure
ProFunc    http://www.ebi.ac.uk/thornton-srv/databases/ProFunc/    Identifies biochemical function of protein from its 3D structure
VERIFY3D    http://nihserver.mbi.ucla.edu/Verify_3D/    Aids in the refinement of 3D structures
ERRAT    http://nihserver.mbi.ucla.edu/ERRATv2/    Protein structure verification algorithm
ProSa    https://prosa.services.came.sbg.ac.at/prosa.php    Refinement and validation of crystallographic structures
DALI    http://ekhidna.biocenter.helsinki.fi/dali_server/    Comparing protein structures in 3D
CASTp server    http://sts.bioengr.uic.edu/castp/    Identification of protein pockets
CAI Calculator 2    http://userpages.umbc.edu/~wug1/codon/cai/cais.php    Calculation of codon adaptation index
ASAview    http://gibk26.bio.kyutech.ac.jp/jouhou/shandar/netasa/asaview/    Provides graphical representation of solvent accessibility
WEBnm@    http://www.bioinfo.no/tools/normalmodes    Web application for NMA of proteins
ElNémo    http://igs-server.cnrs-mrs.fr/elnemo/index.html    Tool for prediction of protein movements
ExPASy    https://www.expasy.org/    Bioinformatics resource portal
ProtParam tool    http://web.expasy.org/protparam/    Tool for computation of various physical and chemical parameters
I-TASSER    http://zhanglab.ccmb.med.umich.edu/I-TASSER/    Algorithm for protein structure and function prediction
MolMovDB    http://www.molmovdb.org/    Tools for measuring protein flexibility and geometric analysis
Pfam    http://pfam.xfam.org/    Database of a large collection of protein families
MEME Suite    http://meme-suite.org/    Tools for motif-based analysis of DNA, RNA and protein sequences
BacMap    http://wishart.biology.ualberta.ca/BacMap/    Tools for exploring bacterial genomes
BLAST    https://blast.ncbi.nlm.nih.gov/Blast.cgi    Basic Local Alignment Search Tool
3V website    http://3vee.molmovdb.org/    Volume calculation and extraction procedures
EST database    https://www.ncbi.nlm.nih.gov/nucest/    Collection of short single-read transcript sequences from GenBank

Characterization of haemoglobin from Actinorhizal plants – An in silico approach

SANGHATI BHATTACHARYA¹, ARNAB SEN²,*, SUBARNA THAKUR² and LOUIS S TISA³
¹Department of Botany, University of North Bengal, Raja Rammohunpur, Siliguri, India
²Bioinformatics Facility, Department of Botany, University of North Bengal, Siliguri 734013, India
³Department of Cellular, Molecular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA

*Corresponding author (Email, [email protected])

Plant haemoglobins (Hbs), found in both symbiotic and non-symbiotic plants, are heme proteins and members of the globin superfamily. Hb genes of actinorhizal plants mostly belong to the non-symbiotic type of haemoglobin; however, along with the non-symbiotic Hb, Casuarina sp. possesses a symbiotic one (symCgHb), which is expressed specifically in infected cells of nodules. A thorough sequence analysis of 26 plant Hb proteins currently available in the public domain revealed a consensus motif of 29 amino acids. This motif is present in all members of the symbiotic class II Hbs, including symCgHb, and of the non-symbiotic class II Hbs, but is totally absent in class I symbiotic and non-symbiotic Hbs. Further, we constructed 3D structures of Hb proteins from Alnus and Casuarina through homology modelling and examined their structural properties. Structure-based studies revealed that the Casuarina symbiotic haemoglobin protein has stereochemical properties distinct from those of the other Casuarina and Alnus Hb proteins. It also showed considerable structural similarity to the leghemoglobin structure from yellow lupin (PDB ID 1GDI). Therefore, sequence and structure analyses indicate that the symCgHb protein shows significant resemblance to the symbiotic haemoglobin found in legumes and may thus play a similar role in shielding the nitrogenase from oxygen, as seen in the case of leghemoglobin.

[Bhattacharya S, Sen A, Thakur S and Tisa LS 2013 Characterization of haemoglobin from Actinorhizal plants – An in silico approach. J. Biosci. 38 777–787] DOI 10.1007/s12038-013-9357-0

Keywords. Actinorhizal plant; haemoglobin (Hb); homology modelling; vesicle

Abbreviations used: CG, conjugate gradient; non-symCgHb, non-symbiotic haemoglobin of Casuarina glauca; ns-Hb, non-symbiotic haemoglobins; s-Hb, symbiotic haemoglobins; symCgHb, symbiotic haemoglobin of Casuarina glauca; t-Hb, truncated haemoglobins

http://www.ias.ac.in/jbiosci J. Biosci. 38(4), November 2013, 777–787, © Indian Academy of Sciences

Published online: 1 October 2013

1. Introduction

Haemoglobins (Hbs) are heme proteins with distinguished roles in oxygen transport and respiration in animals. They are found ubiquitously in eukaryotes and in many bacteria (Dordas et al. 2003). Plant Hbs, structurally similar to animal Hbs and myoglobins, were first characterized from the root nodules of leguminous plants (Kubo 1939). Initially, plant Hb proteins were thought to be restricted to plant species carrying out symbiotic nitrogen fixation, but further analysis revealed their presence in non-nodulating plants (Arredondo-Peter et al. 1997). Symbiotic Hbs (s-Hbs) are found mostly in the root nodules of leguminous plants, while non-symbiotic Hbs (ns-Hbs) are expressed in both leguminous and non-leguminous plants. In plants, three distinct types of haemoglobins have been characterized: symbiotic (s-Hb), non-symbiotic (ns-Hb) and truncated (t-Hb) haemoglobins (Duff et al. 1997; Arredondo-Peter et al. 1998). s-Hbs serve to facilitate oxygen diffusion to the bacterial endosymbiont and to buffer the free oxygen concentration at a low tension to protect the nitrogenase from oxygen inactivation (Appleby 1992). ns-Hbs are mainly involved in NO scavenging (Gupta et al. 2011) and their expression is directly associated with protection against hypoxic challenge (Hunt et al. 2002). They are further categorized into two classes: class I, with dramatically different oxygen-binding properties compared to s-Hbs, and class II, with oxygen-binding properties similar to those of s-Hbs (Dordas et al. 2003). t-Hbs share some characteristics with ns-Hbs, but the exact role of t-Hbs remains largely unknown (Watts et al. 2001). Amongst actinorhizal plants, Casuarina glauca contains both symbiotic (symCgHb) and non-symbiotic haemoglobin (non-symCgHb). In fact, it is the only actinorhizal species known to contain an s-Hb, which is expressed at a high level in the nodules formed in its nitrogen-fixing symbiosis with Frankia. The symCgHb protein is similar to other nodule-expressed sym-Hb proteins in having an unusually low pH-sensitive oxygen off-rate (Dordas et al. 2003).

In the present study, we have carried out a thorough in silico sequence-based study to identify different trends in the physical and chemical properties of haemoglobin proteins from different plants. Analysis of physicochemical parameters revealed that class II ns-Hb proteins shared some features with class II s-Hb proteins. A characteristic motif of 29 amino acid residues, present in all of the class II s-Hb and class II ns-Hb proteins, was also identified. Further, 3D structures of Hb proteins from Alnus and Casuarina (both non-symCgHb and symCgHb) have been derived by homology modelling approaches and have been analysed to explore the structural features that influence the functional differences of these proteins.

2. Materials and methods

2.1 Sequence retrieval

Sequences of symbiotic and non-symbiotic types of Hb genes were retrieved from the NCBI (http://www.ncbi.nlm.nih.gov/) database. Accession numbers and the details of the retrieved sequences are listed in table 1.

2.2 Physicochemical parameter analysis

Physicochemical data were generated with the ProtParam tool (http://web.expasy.org/protparam/) on the ExPASy server.

2.3 Identification of domains and motifs

The amino acid sequences were aligned and subjected to BLOCK MAKER for domain analysis. Accordingly, the blocks were fed to the MEME suite for motif elicitation followed by a MAST (http://meme.ebi.edu.au/meme/cgi-bin/meme.cgi) search. A BLAST search was carried out on the screened motifs for identification of conserved protein motifs.

2.4 Construction of 3D models

Three-dimensional protein models of Alnus and C. glauca ns-Hbs along with s-Hbs were constructed by a homology modelling approach. A suitable template was identified by PSI-BLAST (Altschul et al. 1997) against the Protein Data Bank (PDB) proteins available at the NCBI web server, and an appropriate template was selected on the basis of sequence and phylogenetic similarity (Centeno et al. 2005). Multiple sequence alignments were performed using ClustalW2 (Thompson et al. 1994). Three-dimensional models of the proteins were constructed using the MODELLER 9v6 program (Sali and Blundell 2009) based on their alignment with the template protein (Centeno et al. 2005). The constructed models were subjected to energy minimization for refinement of the structures while keeping a harmonic constraint of 100 kJ/mol/Å². The steepest descent (SD) and conjugate gradient (CG) methods (both 100 steps) were used to remove existing bad contacts between the protein atoms and to regularize the protein structure geometry. The refined models were submitted to ProFunc (http://www.ebi.ac.uk/thornton-srv/databases/ProFunc) (Laskowski et al. 2005) to garner information about functionally important regions of the protein.

2.5 Evaluation of refined models

Refinements of the modelled structures were performed using a variety of web-based servers. A stringent refinement policy was adopted in order to ensure that the modelled structures were free of structural errors. To ensure accuracy and reliability, the refined protein models were evaluated via ProSA (Wiederstein and Sippl 2007) and VERIFY3D (Eisenberg et al. 1997). The Ramachandran plot (Ramachandran et al. 1963) was used to assess the backbone conformation of each constructed model and to inspect the favourable and unfavourable regions of the modelled structure. We used ERRAT (Colovos and Yeates 1993) (http://nihserver.mbi.ucla.edu/ERRATv2) and SAVES (Structure analysis and verification server) (http://nihserver.mbi.ucla.edu/SAVES/) for other verifications.

2.6 Studying intrinsic dynamics of the protein model

The structural dynamics task was accomplished using various web-based strategies. WEBnm (http://www.bioinfo.no/tools/normalmodes) (Hollup et al. 2005) was used to calculate the slowest modes and related deformation energies. ElNemo (http://igs-server.cnrs-mrs.fr/elnemo/index.html) (Suhre and Sanejouand 2004) was used to compute the normal modes of the proteins that contribute to the corresponding protein movements. Normal mode analyses
predict the probable movements of the proteins and aid in selection of the slowest motions of the proteins (Hollup et al. 2005). Determination of the lowest frequency modes was performed using MolMovDB (http://molmovdb.org/) (Alexandrov et al. 2005). Solvent accessibility of the amino acid residues in the modelled proteins was determined using ASA-view (http://gibk26.bse.kyutech.ac.jp/~shandar/netasa/asaview/) (Ahmad et al. 2004).

Table 1. Accession numbers of the symbiotic and non-symbiotic types of Hb protein sequences

Sample no.  Source organism          Type                      Accession No.
1           Parasponia andersonii    Symbiotic class I         1212354A
2           Casuarina glauca         Symbiotic class II        P08054.2
3           Lupinus luteus           Symbiotic class II        AAC04853.1
4           Sesbania rostrata        Symbiotic class II        CAA31859.1
5           Vigna unguiculata        Symbiotic class II        AAA86756.1
6           Phaseolus vulgaris       Symbiotic class II        AAA33767.1
7           Glycine max              Symbiotic class II        CAA23729.1
8           Medicago sativa          Symbiotic class II        AAB48005.1
9           Vicia faba               Symbiotic class II        CAA90869.1
10          Pisum sativum            Symbiotic class II        BAA31155.1
11          Lotus japonicus          Symbiotic class II        BAE46738.1
12          Casuarina glauca         Non-symbiotic class I     P23244.1
13          Alnus firma              Non-symbiotic class I     BAE75956.1
14          Glycine max              Non-symbiotic class I     AAA97887.1
15          Medicago sativa          Non-symbiotic class I     AAG29748.1
16          Gossypium hirsutum       Non-symbiotic class I     AAX86687.1
17          Lotus japonicus          Non-symbiotic class I     BAE46739.1
18          Arabidopsis thaliana     Non-symbiotic class I     AEC06463.1
19          Trema tomentosa          Non-symbiotic class I     CAA68405.1
20          Oryza sativa             Non-symbiotic class I     AAK72231.1
21          Zea mays                 Non-symbiotic class I     NP_001104966.1
22          Hordeum vulgare          Non-symbiotic class I     AAB70097.1
23          Brassica napus           Non-symbiotic class II    AAK07741.1
24          Gossypium hirsutum       Non-symbiotic class II    AAK21604.1
25          Cichorium intybus        Non-symbiotic class II    CAA07547.1
26          Arabidopsis thaliana     Non-symbiotic class II    AEE74919.1

3. Results and discussion

3.1 Physicochemical parameter analysis

Physicochemical features of haemoglobin protein sequences from various plants are provided in table 2. The total number of amino acid residues ranged from 145 to 167 with variable molecular weight. The pI values of class II s-Hb ranged from 5.29 to 6.97, while those of class II ns-Hb ranged from 5.4 to 5.89. The pI value of the only known class I s-Hb, from Parasponia, was found to be 8.59 (Wittenberg et al. 1986). In the case of class I ns-Hb, the pI value ranged from 7.84 to 9.3. The low pI value of class II s-Hb and ns-Hb can be attributed to the dominance of surface metal-OH species (Satoshi and Makoto 2005) and explains the higher oxygen binding capacity of metal cofactors of class II s-Hb and ns-Hb members relative to class I members. Results also reveal that the surfaces of class II ns-Hb and s-Hb proteins are rich in negatively charged residues, while class I ns-Hb and s-Hb proteins contain more positively charged residues on their surface.

Table 2. Physicochemical features of symbiotic and non-symbiotic types of Hb protein sequences from various plants
(AA, no. of amino acids; MW, molecular weight; pI, theoretical pI; Neg. and Pos., total no. of negative and positive residues; II, instability index; AI, aliphatic index)

Organism (type)                 AA    MW (kDa)  pI     Neg.  Pos.  II      AI       GRAVY
Parasponia andersonii (S-I)     162   18.18     8.59   19    21    38.32   86.05    −0.098
Pisum sativum (S-II)            146   15.94     5.57   19    15    29.85   92.88    0.019
Vicia faba (S-II)               146   15.85     5.92   18    16    33.24   98.22    0.085
Medicago sativa (S-II)          147   15.93     5.33   18    14    36.41   86.26    −0.012
Lotus japonicus (S-II)          147   15.75     5.29   18    13    19.45   87.69    −0.007
Glycine max (S-II)              151   16.26     6.07   18    16    22.94   95.7     0.055
Phaseolus vulgaris (S-II)       146   15.62     6.09   15    14    21.08   95.07    0.009
Vigna unguiculata (S-II)        145   15.36     6.11   17    16    14      97.03    0.108
Sesbania rostrata (S-II)        148   15.9      5.61   19    16    18.47   92.97    −0.007
Lupinus luteus (S-II)           154   16.75     5.79   19    16    19.34   101.95   0.066
Casuarina glauca (S-II)         152   17.24     6.97   20    20    32.91   91.91    −0.37
Cichorium intybus (ns-II)       161   18.02     5.51   25    21    28.61   90.87    −0.188
Gossypium hirsutum (ns-II)      159   18.1      5.44   28    21    36.67   82.2     −0.473
Arabidopsis thaliana (ns-II)    158   17.87     5.4    26    19    41.4    92.59    −0.312
Brassica napus (ns-II)          161   18.32     5.89   26    22    30.63   87.2     −0.363
Hordeum vulgare (ns-I)          162   18.04     7.84   21    22    27.97   86.23    −0.096
Zea mays (ns-I)                 165   18.28     6.32   23    22    22.38   88.24    −0.016
Oryza sativa (ns-I)             167   18.61     9.3    18    22    21.03   83.05    −0.093
Trema tomentosa (ns-I)          161   18.15     8.59   20    22    31.76   82.98    −0.22
Arabidopsis thaliana (ns-I)     160   18.03     8.46   20    22    32.47   85.31    −0.148
Lotus japonicus (ns-I)          161   18.04     9      18    22    33.4    80.56    −0.201
Gossypium hirsutum (ns-I)       163   18.38     8.77   21    24    20.56   86.2     −0.08
Medicago sativa (ns-I)          160   17.96     9.08   19    23    23.79   81.69    −0.255
Glycine max (ns-I)              161   18.05     8.97   19    22    27.05   79.94    −0.245
Alnus firma (ns-I)              160   17.9      8.95   19    22    26.72   84.12    −0.156
Casuarina glauca (ns-I)         160   17.85     8.93   20    23    24.18   81.12    −0.22

The in vivo half-life of a protein is estimated from the instability index (Guruprasad et al. 1990), which indicates the extent of stability of the protein. Previously, it was reported that proteins with an instability index of more than 40 have an in vivo half-life of less than 5 h, while proteins with an instability index of less than 40 have a longer in vivo half-life of 16 h (Rogers et al. 1986). In our study, the instability index values of most of the studied haemoglobin protein sequences were found to be lower than 40, except for class II ns-Hb from Arabidopsis thaliana.

The thermostability of the proteins was assessed by the aliphatic index. The aliphatic index is directly proportional to the thermostability, and is defined as the relative volume occupied by the aliphatic side chains (Ikai 1980). For Parasponia Hb, the aliphatic index was found to be 86.05. For class II s-Hb the values ranged from 86.26 to 101.95, while the values for class I and class II ns-Hbs ranged from 79.94 to 88.24 and 82.2 to 92.59, respectively. Thus, the class II members of both s-Hb and ns-Hb were more thermostable than class I members, based on their higher aliphatic index values.

GRAVY (grand average of hydropathicity) values reflect the hydrophobicity of the amino acids. An increase in the GRAVY value indicates that a protein tends to be more hydrophobic in nature (Klein and Thongboonkerd 2004). GRAVY values were found to be highest for class II s-Hb proteins, followed by class II ns-Hb members. Class I members were found to possess comparatively low GRAVY scores. Investigation of the various physicochemical parameters revealed that class II members of both s-Hb and ns-Hb possess similar characteristics, which are quite distinct from those of the class I members.
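The aliphatic index and GRAVY values discussed above were obtained from ProtParam. As an illustration of what these two indices measure, the sketch below computes them directly from a sequence using the published definitions: the aliphatic index of Ikai (1980), with the commonly quoted coefficients 2.9 for valine and 3.9 for isoleucine plus leucine, and the Kyte-Doolittle hydropathy scale for GRAVY. It is a minimal sketch under those assumptions, not ProtParam's own code, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    sequence = sequence.upper()
    length = len(sequence)

    def mole_percent(residue):
        return 100.0 * sequence.count(residue) / length

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    # The 29-residue class II motif reported in table 3, used only as sample input.
    motif = "PQNNPKLQAHAEKVFGMTCDSAIQLRANG"
    print(f"Aliphatic index: {aliphatic_index(motif):.2f}")
    print(f"GRAVY: {gravy(motif):.3f}")
```

Both functions expect sequences made up of the 20 standard one-letter codes; an ambiguous code such as X would raise a `KeyError` in `gravy`, which is a deliberate choice to surface unexpected input.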

Table 3. Distinct motif identified in various symbiotic and non-symbiotic types of plant Hb protein sequences

Types | Motif No. | Motif length | Motif sequence
Class II (symbiotic and non-symbiotic) | 1 | 29 | PQNNPKLQAHAEKVFGMTCDSAIQLRANG
All | 1 | 50 | CFTEEQEALVVKSWEVMKQNIPNYGLRFYTKIFEIAPSARNMFSFLRDSN
All | 2 | 49 | HFQYGVVDPHFEVTKFALLRTIKEAVPDMWSPEMMNAWWQAYDQLVAAI
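To check whether a given Hb sequence carries the class II motif from Table 3, one simple option is an ungapped sliding-window comparison. The sketch below is an illustration only: it is not the motif-discovery procedure used in the paper, the function name `best_window` is hypothetical, and the test sequence is an artificial construct rather than a real Hb protein.

```python
MOTIF = "PQNNPKLQAHAEKVFGMTCDSAIQLRANG"  # 29-residue class II motif from Table 3


def best_window(seq, motif):
    """Return (start, identity) of the ungapped window in seq most similar to motif.

    identity is the fraction of positions that match exactly. If seq is
    shorter than motif, or no position matches, (-1, 0.0) is returned.
    """
    width = len(motif)
    best = (-1, 0.0)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        matches = sum(a == b for a, b in zip(window, motif))
        identity = matches / width
        if identity > best[1]:
            best = (start, identity)
    return best


# Artificial example: the motif embedded between two short flanking stretches.
toy_sequence = "MGA" + MOTIF + "KLV"
start, identity = best_window(toy_sequence, MOTIF)
print(start, identity)
```

A real analysis would normally score windows with a substitution matrix or a position-specific profile instead of exact identity, so that conservative replacements are not counted as mismatches.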

J. Biosci. 38(4), November 2013 Characterization of haemoglobin from Actinorhizal plants 781

Figure 1. Different motifs identified in various plant Hb protein sequences.

3.2 Motif finding

It is evident from the results of multiple sequence alignments that there is a distinct homology between plant class II s-Hb and ns-Hb sequences. A total of seven conserved motifs were observed and some short-length conserved regions were also found. The seven recognized motifs of varying width were subjected to protein BLAST for confirmation of the motif annotations. Results revealed that only three out of seven motifs had similarities with the GLOBIN family. One out of the three motifs had a distinct 29 amino acid stretch (PQNNPKLQAHAEKVFGMTCDSAIQLRANG). This motif was found in class II ns-Hb and s-Hb proteins, but was totally absent in class I Hb proteins. The symCgHb also possesses this unique motif. The distinct motif of 29 amino acid residues further points towards the functional similarity of class II ns-Hb proteins with that of s-Hb. We also found other motifs of approximately 49 and 50 amino acid residues for all of the Hb protein sequences. Table 3 shows the motifs of plant Hb protein sequences from

Figure 2. Three-dimensional structure of non-symbiotic Casuarina glauca Hb protein constructed by homology modelling technique.


Figure 3. Normalized atomic displacement plot calculated for modes 7 to 12 in Hb proteins of (A) Alnus and (B) Ns-Casuarina. The figure shows the plot of the normalized square atomic displacements, which represents the square of the displacement of each C-alpha atom. The highest values corresponded to the most displaced regions and residues with maximum displacements associated with functional sites. X and Y-axis denote residue index in sequence and normal mode of square atomic displacement, respectively.

various actinorhizal plants. The positions of these motifs in the various protein sequences are provided in figure 1.

3.3 Three-dimensional models of Hb proteins

To infer the structural details of the Hb from actinorhizal plants, the 3D structures of the proteins were predicted by the homology modelling technique. The PSI-BLAST search found the crystal structure of the Trema tomentosa Hb protein [3QQQ - PDB ID] to be the best template for A. firma and C. glauca class I ns-Hb proteins. T. tomentosa ns-Hb is a homodimeric protein and this template had a match with e-values of 2e−90 and 2e−91 with A. firma and C. glauca ns-Hb proteins, respectively. The Trema Hb protein also had 83% and 84% sequence similarities with the A. firma and C. glauca Hb proteins, respectively. The haemoglobin protein from Lupinus luteus [PDB ID-1GDI] had an e-value of 2e−47 and 54% similarity to symCgHb and was found to be the best template for this protein. The modelled structures of Alnus and Casuarina Hb proteins were also homodimers. Each subunit of the modelled proteins has a hetero atom of the heme group containing Fe at the center, which is the core functional region of the protein. Figure 2 displays the modelled structure of the non-symCgHb protein. Functional analysis of the Hb proteins revealed the presence of nests (Watson and Milner-White 2002) in each chain. These nests are structurally important motifs found in functionally important regions. The non-symCgHb protein showed 3 nests in its structure, whereas its counterpart symCgHb showed 1 nest. CASTp analysis revealed the presence of pockets for ligand interactions on the



protein surface. Alnus and Casuarina ns-Hb proteins showed 23 and 29 pockets, respectively, whereas s-Hb displayed 22 pockets. No DNA binding templates and enzyme active site templates were recognized in the protein structures.

The constructed models were further assessed to estimate their accuracy. The z-score obtained from ProSA analysis specifies the overall quality of the models and determines the extent to which the total energy of the modelled structure drifts from the energy distributions of random conformations (Wiederstein and Sippl 2007). The Alnus Hb, non-symCgHb and symCgHb proteins had z-scores of −7.35, −7.72 and −7.01, respectively. These results specify that our models are very much within the range of scores normally found for proteins of comparable size, and the outcome of the energy plot signifies that the 3D models of Hb proteins are reliable and precise. VERIFY 3D analysis revealed that 80.063% of the residues had an average 3D-1D score >0.2. The Ramachandran plot for the Alnus and Casuarina ns-Hb proteins revealed that 95.5% of the total residues were present in the most favoured regions, while symCgHb exhibited 92.5%. A good quality model is expected to have more than 90% in the most favoured regions of the Ramachandran plot (Ramachandran et al. 1963). PROVE, VERIFY 3D and ERRAT results for all three proteins illustrated that the overall quality of the models is good. These results imply that the stereochemical properties and quality of the modelled structures are quite consistent.

In normal mode analysis (NMA), the first six modes, which correspond to global rotation and translation of the system, are generally ignored (Hollup et al. 2005) and hence the lowest frequency mode of concern is the seventh one. NMA of the Hb protein showed that low deformation energies were associated with relatively rigid regions in the protein. From figures 3 and 4, it can easily be recognized that the normalized atomic displacement plot of symCgHb


Figure 4. Normalized atomic displacement plot calculated for modes 7 to 12 in Hb proteins of (A) Lupin and (B) s-Casuarina. The figure shows the plot of the normalized square atomic displacements, which represents the square of the displacement of each C-alpha atom. The highest values corresponded to the most displaced regions and residues with maximum displacements associated with functional sites. X and Y-axis denote residue index in sequence and normal mode of square atomic displacement, respectively.

shows dissimilarities with non-symCgHb from the seventh mode onwards. NMA indicated the vibrational and thermal properties of the protein at the atomic level. Alnus and Casuarina ns-Hb proteins had the lowest deformation energies of 2699.93 and 2700.92. The non-symCgHb protein showed higher deformation energy of 3869.35, which is significantly diverse from ns-Hb proteins in the seventh mode. B factors calculated from ElNemo analysis were based on the first 100 normal modes and were scaled to match the overall B factors. In the case of Alnus and Casuarina Hb proteins, very low negative correlations were obtained between the computed and observed B factors. The deformation energies implied that the seventh mode of non-symCgHb had comparatively larger rigid regions than symCgHb. The B factors calculated from ElNemo analysis signify that the models of Alnus and Casuarina ns-Hb proteins contain enough rigid regions and are less flexible, while the symCgHb protein model was more flexible.

ASAVIEW analysis of the solvent accessibility for the modelled proteins pointed out that the accessible residues were present on the outermost ring of the spiral. Figure 5 shows the solvent accessibility plot of the symCgHb protein. The majority of negatively charged residues and polar uncharged residues were present on the outermost surface and hydrophobic residues were confined to the inner rings of the spiral. However, ns-Hb had positively charged residues predominantly present on the outermost surface (data not shown).
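The convention of discarding the first six normal modes can be illustrated with a toy elastic network. The sketch below is written under stated assumptions: it builds the Hessian of a generic all-pairs spring network (not the actual force field of WEBnm@ or ElNemo) for five hypothetical C-alpha positions, and checks that the three translations and three infinitesimal rotations cost no energy, which is why they appear as zero-frequency modes. All coordinates and function names are invented for illustration.

```python
import itertools


def spring_network_hessian(coords, gamma=1.0):
    """3N x 3N Hessian (nested lists) of harmonic springs between all atom pairs."""
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i, j in itertools.combinations(range(n), 2):
        d = [coords[j][k] - coords[i][k] for k in range(3)]
        r2 = sum(c * c for c in d)
        for a in range(3):
            for b in range(3):
                v = gamma * d[a] * d[b] / r2
                hessian[3 * i + a][3 * j + b] -= v  # off-diagonal block (i, j)
                hessian[3 * j + a][3 * i + b] -= v  # off-diagonal block (j, i)
                hessian[3 * i + a][3 * i + b] += v  # diagonal block of atom i
                hessian[3 * j + a][3 * j + b] += v  # diagonal block of atom j
    return hessian


def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rigid_body_vectors(coords):
    """Three translations and three infinitesimal rotations as 3N displacement vectors."""
    axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    translations = [[c for _ in coords for c in axis] for axis in axes]
    rotations = [[c for r in coords for c in cross(axis, r)] for axis in axes]
    return translations + rotations


# Hypothetical C-alpha coordinates (angstroms); not taken from any model in the paper.
coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
          [8.1, 4.2, 2.0], [9.0, 7.5, 3.1]]
H = spring_network_hessian(coords)

# Largest restoring-force component for each rigid-body motion (expected to be ~0).
residuals = [max(abs(x) for x in matvec(H, v)) for v in rigid_body_vectors(coords)]

# A uniform stretch deforms the springs, so it produces a non-zero restoring force.
stretch = [c for r in coords for c in r]
stretch_res = max(abs(x) for x in matvec(H, stretch))

print(residuals)
print(stretch_res)
```

Because rigid-body motions leave every inter-atomic distance unchanged to first order, they carry no information about internal flexibility, and the analysis starts from mode 7.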



4. Conclusion

An investigation of the physicochemical parameters, including theoretical pI, aliphatic index and GRAVY, found that class II s-Hb and ns-Hb proteins share many features, which set them apart from those belonging to class I. Members of class II haemoglobin also have sequence-based similarity and were found to share a common motif consisting of 29 amino acids. The symCgHb was also found to bear this characteristic motif, which is totally absent in non-symCgHb. These distinctive sequence features could further be utilized for designing a strategy for cloning the putative genes based on PCR amplification using degenerate primers.

The 3D structures presented here are the first ever reported models of Alnus and Casuarina Hb proteins. The protein models are homo-dimeric with the heme group controlling the conformational reaction of the protein. Clefts and cavities present on the protein surface represent the active sites, which are responsible for the inherent physicochemical features and have a vital role in protein functioning. NMA of the Hb protein demonstrated that low deformation energies are associated with relatively rigid regions in the protein and revealed that symCgHb and non-symCgHb have different motional properties.

The combined results from sequence analysis, motif analysis and comparison of 3D protein models indicate that symCgHb and non-symCgHb have distinctively different properties. In many aspects, the symCgHb protein is similar to the nodule-expressed symbiotic haemoglobin of legumes. Thus, it may play a similar role in the nitrogenase protection mechanism and provide a plausible explanation for the absence of vesicles (the site for N2 fixation in Frankia) in Frankia–Casuarina symbiosis.


Figure 5. Solvent accessibility plot of the s-Casuarina Hb protein.

References

Ahmad S, Gromiha M, Fawareh H and Akinori S 2004 ASAview: database and tools for solvent accessibility representation in proteins. BMC Bioinform. 5 51
Alexandrov V, Lehnert U, Echols N, Milburn D, Engelman D and Gerstein M 2005 Normal modes for predicting protein motions: a comprehensive database assessment and associated web tool. Protein Sci. 14 633–643
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ 1997 Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25 3389–3402
Appleby CA 1992 The origin and functions of haemoglobin in plants. Sci. Prog. 76 365–398
Arredondo-Peter R, Hargrove MS, Moran JF, Sarath G, Moran JF, Lohrman J, Olson JS and Klucas RV 1997 Rice hemoglobins. Plant Physiol. 115 1259–1266
Arredondo-Peter R, Hargrove MS, Moran JF, Sarath G and Klucas RV 1998 Plant hemoglobins. Plant Physiol. 118 1121–1126
Centeno NB, Planas-Iglesias J and Oliva B 2005 Comparative modeling of protein structure and its impact on microbial cell factories. Microbial Cell Factories 4 20
Colovos C and Yeates TO 1993 Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci. 2 1511–1519


Dordas C, Rivoal J and Hill RD 2003 Plant haemoglobins, nitric oxide and hypoxic stress. Ann. Bot. 91 172–178
Duff SMG, Wittenberg JB and Hill RD 1997 Expression, purification, and properties of recombinant barley (Hordeum sp.) hemoglobin. J. Biol. Chem. 272 16746–16752
Eisenberg D, Luthy R and Bowie JU 1997 VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol. 277 396–404
Gupta KJ, Hebelstrup KH, Mur LAJ and Igamberdiev AU 2011 Plant hemoglobins: important players at the crossroads between oxygen and nitric oxide. FEBS Lett. 585 3843–3849
Guruprasad K, Reddy BVP and Pandit MW 1990 Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Prot. Eng. 4 155–164
Hollup SM, Salensminde G and Reuter N 2005 WEBnm@: a web application for normal mode analysis of proteins. BMC Bioinform. 6 1–8
Hunt PW, Klok EJ, Trevaskis B, Watts RA, Ellis MH, Peacock WJ and Dennis ES 2002 Increased level of hemoglobin 1 enhances survival of hypoxic stress and promotes early growth in Arabidopsis thaliana. Proc. Nat. Acad. Sci. USA 99 17197–17202
Ikai A 1980 Thermostability and aliphatic index of globular proteins. J. Biochem. 6 1895–1898
Klein JB and Thongboonkerd V 2004 Overview of proteomics. Proteomics Nephrol. 141 1–10
Kubo H 1939 Uber hamaprotein as den wurzelknollchen von leguminosen. Acta Phytochim. 11 195–200
Laskowski RA, Watson JD and Thornton JM 2005 ProFunc: a server for predicting protein function from structure. Nucleic Acids Res. 33 89–93
Ramachandran GN, Ramakrishnan C and Sasisekharan V 1963 Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7 95–99
Rogers S, Wells R and Rechsteiner M 1986 Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science 234 364–368
Sali A and Blundell TL 2009 Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234 283–291
Satoshi T and Makoto F 2005 Role of surface OH groups in surface chemical properties of metal oxide films. Mater. Sci. Engr. 119 265–267
Suhre K and Sanejouand YH 2004 ElNemo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res. 32 610–614
Thompson JD, Higgins DG and Gibson T 1994 CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22 4673–4680
Watson JD and Milner-White EJ 2002 The conformations of polypeptide chains where the main-chain parts of the successive residues are enantiomeric. Their occurrence in cation and anion-binding regions of proteins. J. Mol. Biol. 315 187–198
Watts RA, Hunt PW, Hvitved AN, Hargrove MS, Peacock WJ and Dennis ES 2001 A hemoglobin from plants homologous to truncated hemoglobins of microorganisms. Proc. Nat. Acad. Sci. USA 98 10119–10124
Wiederstein M and Sippl MJ 2007 ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 35 407–410
Wittenberg JB, Wittenberg BA, Gibson QH, Trinick MJ and Appleby CA 1986 The kinetics of the reactions of Parasponia andersonii hemoglobin with oxygen, carbon monoxide, and nitric oxide. J. Biol. Chem. 261 13624–13631

JAM 2 (1) 2015 pp 32 - 53 ISSN Online: 2349-7785

Functional Elucidation and Structure Prediction of Certain Hypothetical Proteins in Candida glabrata CBS 138: an In silico Approach

Shilpee Pal 1, Sanghati Bhattacharya 2, Arnab Sen 2, Bikash Ranjan Pati 1, Keshab Chandra Mondal 1, Pradeep Kumar DasMohapatra 1*

1 Bioinformatics Infrastructure Facility Centre, Department of Microbiology, Vidyasagar University, Midnapore 721 102, India
2 Department of Botany, School of Life Sciences, University of North Bengal, Darjeeling 734 013, India

Received 27 October 2015; accepted in revised form 20 November 2015

Abstract: The transition from an opportunistic pathogen to a deadly disease-causing one depends on a number of factors that activate the production and function of causative proteins. Most proteins are still functionally unknown, i.e. hypothetical proteins (HPs), and cannot be neglected. In the present study, functional categorization and structure prediction of 20 HPs of Candida glabrata were performed. Their physicochemical properties, functional domains and interacting proteins were predicted and analyzed. Signal peptides, subcellular localization and transmembrane regions were determined to explore the physical characteristics of the selected HPs. They were involved in various important functions such as nuclear-vacuolar junction, metabolic pathways, ATPase execution, DNA binding, mitochondrial transportation, amino acid biosynthesis and catabolism, vesicular transportation, extracellular, DNA repair and cell cycle control, nuclear functions, mitochondrial RNA synthesis and translation, RNA synthesis, glucose transportation and Ca2+ ion exchange. The 3D structure of each HP was predicted and energy minimization was done using the GROMOS96 force field implemented in Swiss-Pdb Viewer. Ligand binding sites of the HPs showed the active regions involved in functional modulation. Thus, in silico analysis of HPs makes it easier to reveal structures and functions whose experimental determination is very expensive and tedious. It would also be helpful in recognizing the mechanism of pathogenesis and in developing therapeutic drug molecules and their docking studies.

Key words: Candida glabrata, hypothetical proteins, domains, subcellular localizer, functional categorization, structure prediction, active site.

Introduction

Candida glabrata, earlier known as Cryptococcus glabrata, identified by Anderson in 1917 and renamed as Torulopsis glabrata in 1938 [1], is becoming the second most frequent cause of 'candidiasis' after Candida albicans [2] and shows an equal mortality rate [3]. The genus Candida consists of more than 150 species [4], among which C. glabrata is a normal commensal of the human gastrointestinal tract, oral cavity, alimentary tract and respiratory tract [5]. Phylogenetically it is closely related to Saccharomyces cerevisiae [6] and is a comparatively less pathogenic agent than C. albicans. Naturally, Cryptococcus glabrata does not infect its host but takes advantage of an impaired host immune system and causes fungal infections (candidiasis) of skin, oropharyngeal, esophageal, bloodstream infections, etc. [7]. Transition from a normal commensal to a pathogenic one is an outcome of various aspects such as environmental conditions, host immune system, host-pathogen interactions, the pathogen's adaptation efficiency, etc. [8]. These situations actually modulate the pathogenesis-causing proteins. From this point of view we cannot neglect hypothetical proteins (HPs), as their functions are still unknown [9]. Revelation of these proteins by analyzing their structure and functions may improve the knowledge about the pathogen.

An interesting fact is that, in the living world, almost half of the proteins belong to HPs [10]. According to the IMG-JGI database (https://img.jgi.doe.gov/), the complete genome of C. glabrata has already been sequenced but very few proteins have been characterized. During the study of pathogenesis, characterization of HPs can explore their roles in a number of functional pathways. Structure determination will help to reveal the active sites where ligands bind and accelerate protein function. It will also help to predict drug molecules against the pathogen as well as in docking studies.

*Corresponding author (Pradeep Kumar DasMohapatra) E-mail: < [email protected] > © 2015, Har Krishan Bhalla & Sons

Materials and methods

Sequence retrieval

A total of 20 hypothetical protein sequences of C. glabrata CBS 138 were randomly selected and retrieved from the IMG-JGI database (https://img.jgi.doe.gov). Structural as well as functional properties of the selected HPs were analyzed by using several bioinformatics tools and databases.

Physicochemical properties analysis

Physicochemical properties such as amino acid length, molecular weight, iso-electric point (pI), instability index, aliphatic index and hydrophobicity were calculated by the ProtParam tool of the ExPASy server (web.expasy.org/protparam/) and are shown in Table 1.

Motifs and domains prediction

The conserved domains and motifs of HPs were predicted by using several databases like Conserved Domain Database (CDD-BLAST),

Table 1. Physiochemical properties of selected HPs

Accession No. | Amino acid number | Molecular weight | pI | Positively charged residues | Negatively charged residues | Instability index | Aliphatic index | Grand average of hydropathy (GRAVY)
XP_444880 | 1049 | 119916.6 | 9.30 | 163 | 138 | 53.57 | 73.85 | -0.878
XP_444843 | 568 | 62918.1 | 8.09 | 48 | 45 | 36.85 | 92.31 | 0.357
XP_444844 | 563 | 62315.2 | 7.81 | 43 | 41 | 33.35 | 89.15 | 0.318
XP_444845 | 533 | 59869.7 | 8.47 | 45 | 39 | 35.96 | 93.21 | 0.322
XP_444846 | 934 | 104760.8 | 6.15 | 91 | 101 | 37.96 | 86.81 | -0.321
XP_444847 | 794 | 85613.7 | 4.28 | 40 | 80 | 29.30 | 80.52 | -0.096
XP_445514 | 868 | 96648.3 | 6.27 | 92 | 100 | 41.25 | 91.54 | -0.452
XP_444860 | 552 | 61546.4 | 8.09 | 46 | 43 | 38.58 | 90.04 | 0.321
XP_444851 | 675 | 75493.7 | 8.17 | 92 | 89 | 34.73 | 101.01 | -0.334
XP_444829 | 707 | 76529.7 | 5.59 | 70 | 91 | 30.53 | 85.64 | -0.190
XP_444861 | 549 | 61215.1 | 8.27 | 46 | 42 | 37.65 | 89.65 | 0.337
XP_444859 | 605 | 69296.2 | 5.88 | 78 | 87 | 33.22 | 83.17 | -0.396
XP_444820 | 801 | 88290.7 | 8.90 | 92 | 83 | 40.51 | 96.99 | -0.100
XP_444856 | 548 | 61736.6 | 5.57 | 71 | 82 | 46.59 | 83.30 | -0.385
XP_444788 | 784 | 86523.0 | 5.74 | 91 | 108 | 32.67 | 81.21 | -0.368
XP_444794 | 902 | 98376.1 | 4.86 | 77 | 116 | 34.67 | 103.22 | 0.179
XP_444780 | 756 | 83646.4 | 8.67 | 91 | 85 | 57.28 | 59.30 | -0.894
XP_444808 | 851 | 98674.3 | 9.19 | 108 | 82 | 33.72 | 103.49 | -0.185
XP_444842 | 605 | 69153.8 | 9.05 | 82 | 71 | 34.98 | 94.38 | -0.307
XP_444795 | 1122 | 122964.7 | 6.42 | 128 | 133 | 33.55 | 98.94 | -0.033

Pfam, ScanProsite and TIGRFAMs.

CDD-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) identifies domains in a query protein sequence by searching CDD, which is linked with other databases such as PubMed, Entrez Protein and NCBI BioSystems, etc. The search algorithm is based on Reverse Position-Specific BLAST (RPS-BLAST), a variation of the Position Specific Iterated BLAST (PSI-BLAST) method, to scan the query sequence for conserved domains [11].

Pfam (http://pfam.sanger.ac.uk/) is an assembly of protein families, superfamilies, domains, Hidden Markov Models (HMMs), repeats, etc. [12].

ScanProsite (http://www.expasy.org/tools/scanprosite/) identifies motifs of a query sequence by scanning remote homologues from the PROSITE database. It provides signature sequences built by manually derived alignments and also delivers intra-domain topologies [13].

TIGRFAMs (http://www.jcvi.org/cgi-bin/tigrfams/index.cgi) is a collection of manually curated protein families as well as superfamilies defined by HMMs and provides structurally as well as functionally conserved domains of a full-length query protein sequence [14].

Domains with 100 % confidence level, i.e. delivered by all four tools above, were tabulated in Table 2.

Signal peptide identification

SignalP 4.1 Server [15] was used to detect signal peptides among the selected HPs.

Subcellular localization site determination

PSORT II (http://psort.hgc.jp/form2.html) was used to predict the subcellular localization of selected HPs. It is a new version of PSORT that takes a eukaryotic amino acid sequence as input [16]. Reinhardt's method was used for cytoplasmic/nuclear discrimination.

Transmembrane region prediction

Transmembrane Hidden Markov Models [TMHMM (at http://www.cbs.dtu.dk/services/TMHMM)] was used to predict transmembrane protein topologies in selected HPs. It is constructed by HMM and can predict 97-98 % of transmembrane helices correctly [17].

Protein-protein interaction prediction

Search Tool for the Retrieval of Interacting Genes/Proteins [STRING (http://string-db.org/)] was used to evaluate protein-protein interactions of the nominated HPs. In this database the co-occurrences and associations between proteins are derived from statistical analyses [18].

Protein structure determination

The Iterative Threading ASSEmbly Refinement [I-tasser (http://zhanglab.ccmb.med.umich.edu/I-tasser)] server was used to predict 3D structures of selected HPs. It is an automated protein structure prediction tool which depends on threading analysis [19]. The total energy of each predicted structure was calculated using the GROMOS96 (GROningen molecular dynamics simulation) force field implemented in Swiss-Pdb Viewer [20] and energy minimization was performed to get the optimum structure.

Active site determination

DogSiteScorer (http://dogsite.zbh.uni-hamburg.de/) was accessed to determine pockets in predicted structures. Pockets with the highest scores were considered as active sites of selected HPs.

Results

Physicochemical properties of the selected 20 HPs of C. glabrata CBS 138 revealed that the length of the proteins varied from 533 to 1122 amino acids (Table 1). The pI of proteins XP_444880, XP_444808, XP_444842, XP_444820, XP_444780, XP_444845, XP_444861, XP_444843, XP_444860 and XP_444844 was higher. The surfaces of proteins XP_444880, XP_444808, XP_444851, XP_444820, XP_444780, XP_444842, XP_444843, XP_444861, XP_444860, XP_444845 and XP_444844 were more positively charged than those of other HPs (Table 1). In vitro stability of proteins was determined by calculating the instability index, which is less than 40 (<40) for stable proteins. In the present study, proteins XP_444880, XP_444849, XP_444856 and XP_444780 showed an instability index above 40. The volume occupied by aliphatic side chains (Ala, Val, Ile and Leu) was measured by calculating the aliphatic index. Majority of the HPs

Table 2. Domain description of selected HPs

Functional domains | Descriptions | Hypothetical proteins
ZF_FYVE | The FYVE zinc finger was first found in Fab1, YOTB/ZK632.12, Vac1 and EEA1. The FYVE finger has eight potential zinc coordinating cysteine positions. | XP_444780
Aconitase | Aconitase (aconitate hydratase; EC: 4.2.1.3) is an iron-sulphur protein that contains a [4Fe-4S]-cluster and catalyses the interconversion of isocitrate and citrate via a cis-aconitate intermediate. | XP_444788
E1-E2_ATPase | Transmembrane ATPases are membrane-bound enzyme complexes/ion transporters that use ATP hydrolysis to drive the transport of protons across a membrane. | XP_444794; XP_444795
RVT_1 | Reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. | XP_444808
ABC_membrane | Water-soluble domain of transmembrane ABC transporters | XP_444820
Trp_syntA | Tryptophan synthase catalyses the last step in the biosynthesis of tryptophan | XP_444829
AA_TRNA_LIGASE | Aminoacyl-tRNA synthetase (also known as aminoacyl-tRNA ligase) catalyses the attachment of an amino acid to its cognate transfer RNA molecule | XP_444842; XP_444856; XP_444859
Sugar_tr | ATP-binding cassette (ABC) superfamily and major facilitator superfamily (MFS), also called the uniporter-symporter-antiporter family. | XP_444845; XP_444844; XP_444843; XP_444860; XP_444861
GATA_ZN_FINGER | GATA-type zinc fingers (Znf), transcription factors (including erythroid-specific transcription factor and nitrogen regulatory proteins), specifically bind the DNA sequence (A/T) GATA (A/G) in the regulatory regions of genes. | XP_444846
Collagen | Generally extracellular structural proteins involved in formation of connective tissue structure. The sequence is predominantly repeats of the G-X-Y and the polypeptide chains form a triple helix. | XP_444847
LRR_1 | Composed of repeating 20–30 amino acid stretches that are unusually rich in the hydrophobic amino acid leucine. | XP_445514
IF2 | Translation initiation factor 2 (IF2) promotes 30S initiation complex (IC) formation and 50S subunit joining, which produces the 70S IC. | XP_444851
SANT | SANT domain is a motif of ~50 amino acids present in proteins involved in chromatin-remodelling and transcription regulation. | XP_444880

displayed aliphatic index from 70 to 103, except protein XP_444780 (Table 1). GRAVY revealed the hydrophobicity of a protein. The grand average of hydropathy (GRAVY) of the studied HPs varied between -0.878 and 0.357. A significant positive correlation was observed between aliphatic index and GRAVY (r = 0.583, p<0.01). For functional analysis of selected HPs, a number of parameters such as protein domains, cleavage sites, transmembrane regions, subcellular localizations and protein-protein interactions were predicted and analyzed. Functional domains determined by all four databases were ZF_FYVE, Aconitase, E1-E2_ATPase, RVT_1, ABC_membrane, Trp_syntA, AA_TRNA_LIGASE, Sugar_tr, GATA_ZN_FINGER, Collagen, LRR_1, IF2 and SANT (Table 2). Protein XP_444847 revealed a signal peptide and the cleavage site was between the 17th (Ala) and 18th (Phe) amino acids (Fig. 1). According to PSORT II, proteins XP_444880, XP_444846, XP_445514, XP_444780 and XP_444808 were nuclear; XP_444829, XP_444859, XP_444856 and XP_444788 were cytoplasmic; XP_444843, XP_444844, XP_444845, XP_444860, XP_444861, XP_444794 and XP_444795 were membrane proteins; XP_444851 and XP_444842 were mitochondrial and protein XP_444820 was localized in the endoplasmic reticulum (Table 3). Proteins XP_444843, XP_444844, XP_444845, XP_444860 and XP_444861 showed a similar number of transmembrane regions (Table 4), localized in the plasma membrane (Table 3) and possessed the Sugar_tr domain (Table 2); proteins XP_444794 and XP_444795, also localized in the plasma membrane (Table 3), showed the same number of transmembrane regions (Table 4) and comprised the E1-E2_ATPase domain (Table 2). Another transmembrane protein, XP_444820 (Table 4), contained the ABC_membrane domain (Table 2). Protein-protein interactions of the selected 20 HPs were shown in Fig. 2. Associating the above results, functional categorization was done (Table 5). HPs were involved in nuclear–vacuolar junction, metabolic pathways, ATPase execution, DNA binding, mitochondrial transport, amino acid

Figure 1. Protein XP_444847 showed max. C at position 18, max. Y at position 18 and mean S value between position 1-17. Cleavage site between positions 17 (Ala) and 18 (Phe)

Table 3. Subcellular localization of the HPs

Protein id | Localization
XP_444880 | Nuclear
XP_444843 | Plasma membrane
XP_444844 | Plasma membrane
XP_444845 | Plasma membrane
XP_444846 | Nuclear
XP_444847 | Extracellular
XP_445514 | Nuclear
XP_444860 | Plasma membrane
XP_444851 | Mitochondrial
XP_444829 | Cytoplasmic
XP_444861 | Plasma membrane
XP_444859 | Cytoplasmic
XP_444820 | Endoplasmic reticulum
XP_444856 | Cytoplasmic
XP_444788 | Cytoplasmic
XP_444794 | Plasma membrane
XP_444780 | Nuclear
XP_444808 | Nuclear
XP_444842 | Mitochondrial
XP_444795 | Plasma membrane
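The correlation between aliphatic index and GRAVY reported in the Results (r = 0.583) can be rechecked from the Table 1 values. The sketch below assumes a plain Pearson correlation coefficient, since the paper does not state which statistic or software was used; the two lists are copied from the aliphatic index and GRAVY columns of Table 1 in row order.

```python
import math

# Aliphatic index and GRAVY columns of Table 1, in the same row order.
aliphatic = [73.85, 92.31, 89.15, 93.21, 86.81, 80.52, 91.54, 90.04, 101.01, 85.64,
             89.65, 83.17, 96.99, 83.30, 81.21, 103.22, 59.30, 103.49, 94.38, 98.94]
gravy = [-0.878, 0.357, 0.318, 0.322, -0.321, -0.096, -0.452, 0.321, -0.334, -0.190,
         0.337, -0.396, -0.100, -0.385, -0.368, 0.179, -0.894, -0.185, -0.307, -0.033]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(aliphatic, gravy)
print(f"r = {r:.3f}")
```

The coefficient comes out positive and close to the reported value, driven largely by the two proteins with the lowest aliphatic index (XP_444780 and XP_444880), which also have the most negative GRAVY scores.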

Table 4. Transmembrane regions of selected HPs

Sequence Id N-terminal C-terminal Transmembrane region Length

XP_444843  60  82  ASAYVTVSIFCLFIAFGGFVFGW  23
XP_444843  116  135  NGTHYLSKVRTGLVVSIFNIGCAIGGVILS  20
XP_444843  145  164  PGLIIVVVIYVVGIIIQIAT  20
XP_444843  171  193  YFIGRIISGLGVGGIAVLSPMLI  23
XP_444843  203  225  ATLVACYQLMITLGIFLGYCTNF  23
XP_444843  238  257  VPLGLCFAWAIFMISGMTFV  20
XP_444843  361  383  FETSIVIGVVNFFSTFVGIFLVG  23
XP_444843  390  410  CLLWGAATMTACMVVFASVGV  21
XP_444843  430  452  MIVFTCFYIFCFATTWAPLAFVI  23
XP_444843  465  487  CMALAQASNWIWGFLISFFTPFI  23
XP_444843  491  513  INFNYGYVFMGCLCFSYFYVFFF  23
XP_444844  54  73  AFVGVIISCFMVAFGGFVFG  20
XP_444844  110  127  LIVSIFNIGCAIGGIILS  18
XP_444844  137  156  MGLVVVVVIYIVGIIIQIAS  20
XP_444844  163  185  YFIGRIISGLGVGGISVLSPMLI  23
XP_444844  195  217  GSLVSCYQLMITLGIFLGYCTNF  23
XP_444844  230  249  VPLGLCFAWALFMIGGMTFV  20
XP_444844  352  374  SFETSIVFGVVNFFSTCCSLLTV  23
XP_444844  381  403  NCLLYGAIGMVCCYVVYASVGVT  23
XP_444844  423  445  IVFSCFYIFCFATTWAPIAYVII  23
XP_444844  457  479  AMSIATAANWMWGFLIAFFTPFI  23
XP_444844  483  505  INFYYGYVFMGCMVFAYFYVFFF  23
XP_444845  25  44  LIFVSLCCIMVAFGGFVFGW  20
XP_444845  78  100  TGLIVAIFNIGCAIGGITLSKLG  23
XP_444845  107  124  LGLVTVVVVYTIGIVIQI  17
XP_444845  134  156  FIGRIISGLGVGGIAVLSPMLIS  23
XP_444845  163  185  LRGTLVSCYQLMITCGIFLGYCT  23
XP_444845  200  219  VPLGLCFAWALFMIFGVMCV  20
XP_444845  300  319  LTGANYFFYYGTTIFRAVGL  20
XP_444845  323  342  FQTAIVLGVVNFVSTFYALY  20
XP_444845  355  377  WGCVGMVCCYVVYASVGVTRLWP  23
XP_444845  392  414  MIVFACFFIFCFATTWAPIAYVI  23
XP_444845  427  449  AMSIAIAANWIWGFLIAFFTPFI  23
XP_444845  453  475  INFYYGYVFMGCMVFAYFYVFFF  23
XP_444860  44  66  ASAYVSISIFCLFIAFGGFVFGW  23
XP_444860  100  119  TGLVVSIFNIGCAIGGIVLS  20
XP_444860  129  148  IGLISVVVIYIVGIVIQIAT  20
XP_444860  155  177  YFIGRIISGLGVGGIAVLSPMLI  23
XP_444860  187  209  GSLVSCYQLMITCGIFLGYCTNY  23
XP_444860  222  241  VPLGLCFAWALFMIGGMTFV  20
XP_444860  344  366  SFETSIVIGIVNFASTFVALYVV  23
XP_444860  375  394  LLWGAAAMTACMVVFASVGV  20
XP_444860  414  436  MIVFTCFYIFCFATTWAPIPFVV  23
XP_444860  449  471  CMAIAQASNWIWGFLIAFFTPFI  23
XP_444860  475  497  INFYYGYVFMGCLCFSYFYVFFF  23
XP_444861  41  63  ASAYVAISIFCLFIAFGGFVFGW  23
XP_444861  97  116  TGLIVSIFNIGCAIGGVVLS  20
XP_444861  126  145  IGLISVVVIYIVGIVIQIAT  20
XP_444861  152  174  YFIGRIISGLGVGGIAVLSPMLI  23
XP_444861  184  206  GSLVSCYQLMITCGIFLGYCTNY  23
XP_444861  219  238  VPLGLCFAWALFMIGGMTFV  20
XP_444861  341  363  SFETSIVIGIVNFASTFVALYVV  23
XP_444861  372  391  LLWGAAAMTACMVVFASVGV  20
XP_444861  411  433  MIVFTCFYIFCFATTWAPIPFVV  23
XP_444861  446  468  CMAIAQASNWIWGFLIAFFTPFI  23
XP_444861  472  494  INFYYGYVFMGCLCFSYFYVFFF  23
XP_444820  133  155  LLLTAVGLLTISCSIGMTIPKVI  23
XP_444820  193  215  FLGGFALALLVGIAANYGRIILL  23
XP_444820  272  294  GFKALICGSVGIGMMLALSPQLS  23
XP_444794  98  120  VKFVMFFVGPIQFVMEAAAILAA  23
XP_444794  125  144  WVDFGVICGLLMLNACVGFI  20
XP_444794  273  295  VLNGIGILLLVLVIVTLLGVWAA  23
XP_444794  315  337  IIGVPVGLPAVVTTTMAVGAAYL  23
XP_444794  675  697  YVVYRIALSLHLELFLGLWIIIL  23
XP_444794  703  722  IELIVFIAIFADVATLAIAY  20
XP_444794  742  764  MSIILGIVLAIGTWICLTTMFLP  23
XP_444794  807  829  WQLAGAVFAVDIIATMFTLFGWF  23
XP_444794  842  861  IYIWSIGVFCVLGGFYYIMS  20
XP_444795  83  102  LLLTGAAVVSFTLGIYEVLT  20
XP_444795  117  139  VDWIEGLAIMMAVLVVVLVSAAN  23
XP_444795  307  329  ISVYGCVAAITLFVVLFARYLSY  23
XP_444795  352  374  IFITAITVIVVAVPEGLPLAVTL  23
XP_444795  854  876  FIQFQLIVNVTAVLLTFVTSVISS  23
XP_444795  886  905  VQLLWVNLIMDTLAALALAT  20
XP_444795  1022  1044  YFIFIMSLIAVLQVLIMFFGGAP  23
XP_444795  1054  1076  MWLVSVSSGILAIPVGALIRICP  23
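As an illustrative aside, the Length column of Table 4 is expected to equal C-terminal minus N-terminal plus one, and to match the number of residues in the listed transmembrane region. A minimal Python sketch of such a consistency check is given below. The `TMSegment` type and the `check_segment` helper are hypothetical names introduced here for illustration; they are not part of TMHMM or of the original analysis. The sample row is the first XP_444843 segment of Table 4.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TMSegment:
    """One row of a transmembrane-region table (hypothetical helper type)."""
    protein_id: str
    start: int      # N-terminal residue position (1-based)
    end: int        # C-terminal residue position (1-based, inclusive)
    sequence: str   # residues of the predicted transmembrane region
    length: int     # length as reported in the table


def check_segment(seg: TMSegment) -> list[str]:
    """Return a list of inconsistencies; an empty list means the row is consistent."""
    problems = []
    span = seg.end - seg.start + 1
    if span != seg.length:
        problems.append(f"span {span} != reported length {seg.length}")
    if len(seg.sequence) != seg.length:
        problems.append(
            f"sequence has {len(seg.sequence)} residues, reported {seg.length}"
        )
    return problems


# First XP_444843 row of Table 4.
row = TMSegment("XP_444843", 60, 82, "ASAYVTVSIFCLFIAFGGFVFGW", 23)
print(row.protein_id, check_segment(row))
```

Running the same check over every row is one way to flag rows in which the reported length and the listed segment disagree.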

Table 5. Putative function of selected HPs

Functional categories                         Hypothetical proteins

Nuclear–vacuolar junction                     XP_444780
Metabolic pathways                            XP_444788
ATPase execution                              XP_444794
DNA binding                                   XP_444808
Mitochondrial transporter                     XP_444820
Amino acid biosynthesis and catabolism        XP_444829
Vesicular transport                           XP_444846
Extracellular                                 XP_444847
DNA repair and cell cycle control             XP_444880
Nuclear functions                             XP_445514
Mitochondrial RNA synthesis and translation   XP_444851; XP_444842
Cellular RNA synthesis                        XP_444856; XP_444859
Glucose transporter                           XP_444845; XP_444844; XP_444843; XP_444860; XP_444861
Ca2+ ion exchanger                            XP_444795

biosynthesis and catabolism, vesicular transport, extracellular, DNA repair and cell cycle control, nuclear functions, mitochondrial RNA synthesis and translation, cellular RNA synthesis, glucose transport and Ca2+ ion exchange. Template structures of the predicted protein models are listed in Table 6. Optimized 3D structures were predicted and are shown in Fig. 3. Among them, the transmembrane proteins XP_444843, XP_444844, XP_444845, XP_444860, XP_444861, XP_444820, XP_444794 and XP_444795 showed a large amount of helical structure (Fig. 3). The signal-peptide-containing protein XP_444847 revealed a large quantity of β-barrel (Fig. 3). The active sites of each HP are shown in Table 7.

Discussion

Physicochemical property analysis provides basic knowledge about the nature of proteins. A very important property, pI, is the pH

Table 6. Template structures of the HPs

Accession No   Template

XP_445514      4mn8
XP_444780      3gav
XP_444794      1mhs
XP_444795      3ba6
XP_444808      4c0o
XP_444842      1f7u
XP_444843      4gc0
XP_444844      4pyp
XP_444845      4gby
XP_444846      3efo
XP_444847      2pff
XP_444860      4pyp
XP_444880      3gaw
XP_444859      1f7u
XP_444861      4gby
XP_444820      4f4c
XP_444856      2xti
XP_444829      1k8z
XP_444788      1c96
XP_444851      3j4j


Figure 2. Protein-protein interaction of selected hypothetical proteins


Figure 3. Predicted structures of selected hypothetical proteins

at which they are stable, least soluble and immobile in an electrofocusing system, i.e. carry no net charge 21. In the present study, proteins XP_444880, XP_444808, XP_444842, XP_444820, XP_444780, XP_444845, XP_444861, XP_444851, XP_444843 and XP_444860 showed higher pI, so their purification by isoelectric focusing requires a basic buffer. In vitro protein stability is estimated by the instability index (II), which depends on the primary sequence of the protein; values below 40 indicate a stable protein 22. In the current study, proteins XP_444780, XP_444880, XP_444856, XP_444849 and XP_444820 displayed an instability index (II) greater than 40 and are thus predicted to be unstable, whereas the rest appeared to be stable. Ligand-binding residues on the protein surface require flexible side chains to undergo conformational changes. Thus positively charged residues (Arg, Lys and His) as well as negatively charged residues (Glu and Asp) reside in the active sites of proteins 23. The surfaces of the selected proteins XP_444880, XP_444808, XP_444851, XP_444820, XP_444780, XP_444842, XP_444843, XP_444861, XP_444860,

Table 7. Predicted active sites of selected HPs

Sequence Id   Active site volume   Residues at active site

XP_444780  208  ARG 673; ASN 684, 692, 699 and 701; ASP 628; CYS 664; GLU 667 and 703; GLY 650, 651, 652, 653, 688 and 693; HIS 665; ILE 694; LEU 671 and 700; LYS 633, 685, 686 and 697; MET 704; PHE 690; PRO 683 and 689; SER 675, 679, 698 and 705; THR 677; TYR 668 and 687; VAL 648, 649, 654, 680, 682, 695 and 702.
XP_444794  153  ALA 94 and 95; ASN 96; GLU 97 and 98; GLY 100 and 101; ILE 102, 104, 105 and 108; LEU 146, 149, 271, 274 and 275; MET 278, 279 and 281; PHE 282 and 285; PRO 311 and 314; SER 315; THR 318, 319 and 320; VAL 321, 322, 323, 324, 326, 327 and 330.
XP_444795  189  ALA 65, 76, 137 and 377; ARG 64 and 80; ASP 79, 140 and 293; GLN 148 and 297; GLU 75 and 144; LEU 72, 145, 300, 368, 370, 376 and 393; LYS 67 and 143; MET 82; PHE 69 and 147; PRO 66 and 369; SER 68, 136 and 301; THR 81, 304, 373 and 380; TYR 141; VAL 73, 372 and 390.
XP_444808  397  ALA 70, 75 and 92; ASN 77, 88, 90, 107 and 126; ASP 37, 52 and 152; CYS 86; GLN 60, 132 and 155; GLU 87, 102, 128 and 156; GLY 71, 109 and 176; HIS 105 and 120; ILE 34, 51, 55, 83, 108, 117, 148, 149, 179 and 183; LEU 68, 74, 84, 96, 103, 104, 116, 118 and 138; LYS 38, 73, 76, 94, 97, 140 and 180; PHE 41, 113, 123, 129 and 177; PRO 141, 144 and 145; SER 91, 98, 99, 100, 110, 121 and 137; THR 79, 95, 106 and 124; TRP 101 and 147; TYR 72 and 153; VAL 67, 69, 93, 114 and 133.
XP_444842  192  ALA 353, 354 and 371; ARG 357; ASN 153 and 187; ASP 350; GLN 295 and 374; GLU 148, 356 and 397; GLY 161, 190 and 292; HIS 159, 162, 377 and 398; ILE 351; LEU 146, 189, 381, 385 and 396; MET 186, 366 and 378; PHE 149 and 401; PRO 152; SER 150, 151, 165 and 291; TYR 188, 346, 368 and 382; VAL 296, 369 and 370.
XP_444788  266  ALA 220, 228, 462, 502 and 503; ARG 364 and 477; ASN 223, 234, 362 and 370; ASP 184, 195, 367 and 549; CYS 360; GLU 219, 227, 366, 500 and 501; GLY 185, 190, 294 and 363; HIS 122 and 199; ILE 191, 292, 297 and 298; LEU 187, 196, 224 and 499; LYS 186, 194, 293 and 373; MET 230 and 548; PHE 551; PRO 189, 461 and 550; SER 188, 193, 225, 231, 295 and 460; THR 123, 146, 192, 300 and 361; TYR 504; VAL 183, 251, 374 and 553.
XP_444829  400  ALA 407, 438, 525, 553, 565, 598, 622 and 644; ARG 436, 443, 462 and 674; ASN 440 and 532; ASP 433, 592, 601, 625, 676 and 678; CYS 526; GLN 409 and 437; GLU 404, 552 and 646; GLY 379, 406, 408, 484, 488, 528, 529, 530, 550, 554, 555, 599, 604, 606 and 673; HIS 381, 410, 563, 576, 594 and 649; ILE 487 and 621; LEU 439, 461, 567, 600 and 645; LYS 382 and 677; PHE 469, 575 and 620; PRO 489, 603 and 607; SER 380, 485, 531, 564, 595, 597, 672 and 681; THR 378, 405, 460, 465, 566, 568, 580, 593 and 624; TYR 481 and 602; VAL 412, 483, 527, 551, 578, 582, 596 and 605.
XP_445514  519  ALA 394, 402, 429, 446, 494 and 509; ARG 413, 473, 481 and 505; ASN 428, 448, 449, 471 and 545; ASP 407, 458, 492 and 512; CYS 410 and 421; GLN 397, 436 and 454; GLU 405, 406, 425, 456, 501, 502 and 514; GLY 478; HIS 442, 460 and 507; ILE 412, 433, 453, 463, 493 and 524; LEU 378, 395, 396, 409, 419, 439, 440, 441, 479, 485, 498, 543 and 544; LYS 398, 401, 443, 444, 452, 467, 482, 503 and 526; MET 414, 417 and 426; PHE 400, 403, 404, 411, 415, 416 and 432; PRO 377, 425, 483, 491 and 525; SER 418, 422, 431, 445, 447, 450, 457, 466, 468, 480, 511, 515, 516, 522 and 542; THR 424, 434 and 438; TRP 430; TYR 423, 427 and 506; VAL 420, 477, 488, 496, 510 and 517.
XP_444880  294  ALA 737; ARG 510, 715, 733 and 744; ASN 528 and 714; ASP 505 and 515; GLN 509, 534, 710 and 711; GLU 502, 506, 507, 527 and 560; HIS 707; ILE 503 and 520; LEU 500, 523, 530, 538 and 568; LYS 516, 517, 519 and 708; MET 511 and 743; PHE 521, 541 and 740; PRO 501, 542, 563 and 571; SER 504, 514, 518, 531, 535, 539 and 567; THR 524 and 526; TRP 736; TYR 508 and 544; VAL 543 and 564.
XP_444861  203  ALA 167, 359, 423, 427 and 450; ASN 105, 323, 352, 434 and 454; ASP 322 and 364; CYS 108; GLN 191, 314, 317, 318 and 451; GLY 62 and 184; ILE 194, 198 and 461; LEU 360; MET 447; PHE 61, 104, 326, 422 and 431; PRO 171, 428 and 430; SER 188, 355 and 435; THR 65, 195 and 356; TRP 455; TYR 190 and 327; VAL 163, 168, 187, 363 and 432.
XP_444860  629  ALA 76, 170, 430 and 465; ARG 82, 83, 159 and 396; ASN 180, 208, 216, 326, 355, 412, 457 and 473; ASP 67, 79, 325, 343 and 403; CYS 199, 206, 227 and 419; GLN 77, 145, 194, 219, 317, 320, 321 and 454; GLY 65, 69, 73, 101, 141, 200, 204, 332, 340, 352, 411 and 461; HIS 93; ILE 71, 142, 197, 201, 349, 351, 353, 415 and 422; LEU 84 and 224; LYS 97, 212, 337 and 408; MET 414; PHE 64, 74, 80, 107, 202, 328, 329, 336, 345, 356, 389, 425, 434, 466, 470, 477 and 483; PRO 223 and 469; SER 72, 96, 105, 162, 217 and 348; THR 68, 70, 100, 198, 211, 333, 334, 347 and 418; TRP 66, 220, 398, 429 and 458; TYR 94, 138, 155, 205, 209, 327, 330, 331, 421 and 479; VAL 75, 98, 104, 335 and 339.
XP_444859  240  ALA 146, 147 and 148; ARG 149 and 151; ASN 152; ASP 153, 154 and 159; GLN 160, 163 and 186; GLU 187, 188 and 189; GLY 190, 191, 192 and 193; HIS 194 and 197; ILE 222 and 250; LEU 251, 254, 292 and 338; LYS 341, 342, 343 and 344; MET 345; PHE 348, 349, 356, 367, 368 and 369; SER 370, 373 and 374; THR 377, 378 and 380; TRP 381; TYR 395, 396, 397 and 400; VAL 402.
XP_444851  557  ALA 142, 143, 144, 145 and 146; ARG 147, 150 and 151; ASN 152, 153, 154 and 155; ASP 156, 159, 160, 163, 164, 169, 173 and 175; GLN 178, 179 and 180; GLU 181, 182, 192 and 193; GLY 194, 195, 196, 197, 198, 199 and 201; HIS 202, 203, 204, 205, 206 and 207; ILE 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219 and 220; LEU 223, 224, 225, 226, 227, 228, 229 and 230; LYS 231, 232, 233, 234, 235, 236, 237, 238 and 239; MET 241, 243, 244 and 245; PHE 248, 249 and 251; PRO 252, 266 and 269; SER 270, 273, 274, 275, 284, 290 and 291; THR 292, 326, 327, 328, 329, 330, 331, 372 and 373; TRP 394; VAL 395, 396, 397, 398, 399, 475, 477, 478, 479, 506, 507, 508 and 510.
XP_444847  159  ALA 22, 662 and 733; ASN 21, 26 and 731; ASP 706 and 713; GLN 718; GLU 29, 31 and 724; GLY 57, 583, 664, 716 and 723; ILE 708 and 720; LEU 663; LYS 586; PHE 719; PRO 20 and 712; SER 24, 30, 32, 585, 709, 714 and 715; THR 23, 28, 710, 711 and 722; TRP 60; TYR 56 and 584; VAL 707 and 735.
XP_444846  315  ALA 59, 60 and 61; ARG 62, 63 and 64; ASN 65, 66, 67, 68, 69 and 70; ASP 71, 72 and 73; CYS 75; GLN 76 and 80; GLU 81 and 82; GLY 83, 85, 86, 87, 96 and 97; HIS 98 and 99; ILE 102; LEU 103, 105, 106, 107 and 108; LYS 109, 118, 119 and 120; MET 121; PHE 122, 123, 124 and 685; PRO 686, 687, 689, 694, 695, 697 and 698; SER 699, 701, 702, 740, 741, 742, 779, 780 and 781; THR 782, 783 and 784; TYR 785, 883, 884 and 885; VAL 886, 893, 894, 895 and 898.
XP_444845  211  ALA 36; ASP 24; CYS 32, 170 and 177; GLN 10; GLU 12, 14 and 16; GLY 38 and 39; ILE 2, 33 and 155; LEU 9, 17, 25, 163, 167, 173, 181 and 341; LYS 20, 21 and 22; MET 1 and 174; PHE 37, 40, 180, 334 and 338; PRO 5 and 152; SER 3, 29, 151 and 169; THR 166; TYR 171; VAL 13, 28, 35, 331 and 335.
XP_444844  462  ALA 72, 73 and 74; ASN 76, 77, 78, 79, 80 and 81; ASP 83; CYS 84, 85 and 102; GLN 105, 106, 109 and 110; GLY 112, 113, 115, 116, 119, 174, 198, 199 and 201; ILE 202, 205, 206, 207, 208 and 209; LEU 210, 211, 212 and 213; LYS 214, 215 and 216; PHE 217, 220, 231, 234, 235, 328, 329, 333, 334, 336, 337, 338, 340 and 341; PRO 342, 344 and 345; SER 352, 353, 355, 356 and 357; THR 359, 360, 361, 363, 364 and 429; TRP 430, 433 and 437; TYR 438, 439, 442, 443, 465 and 466; VAL 469, 470, 473, 474, 477, 478 and 481.
XP_444843  553  ALA 92, 186, 406, 446 and 490; ARG 175 and 412; ASN 124, 224, 342, 371, 473 and 492; ASP 83, 95, 341 and 489; CYS 222, 243 and 435; GLN 93, 161, 210, 235, 333, 336, 337 and 470; GLY 81, 85, 89, 117, 131, 179, 183, 216, 220, 348, 368, 409, 419, 477 and 496; ILE 87, 122, 158, 213, 217, 350, 365, 367, 438, 480 and 491; LEU 215, 240 and 338; LYS 228 and 353; MET 430; PHE 80, 90, 123, 218, 225, 344, 345, 352, 361, 372, 405, 441, 450, 478, 482, 486, 493 and 499; PRO 239 and 485; SER 88, 121, 178, 360, 364 and 481; THR 84, 86, 214, 227, 349, 434, 484 and 488; TRP 82, 236, 445 and 474; TYR 154, 221, 343, 346, 347, 437 and 495; VAL 119, 120, 187, 351, 369, 408 and 498.
XP_444856  304  ARG 245, 248, 269, 270, 271, 277, 278 and 296; ASP 297; CYS 298 and 299; GLN 300, 301 and 304; GLU 315, 317, 318, 319, 321, 323, 327, 328, 329 and 330; GLY 331, 332, 333, 334 and 335; HIS 336 and 337; ILE 338, 340 and 421; LEU 445, 446, 447, 448 and 449; LYS 461; MET 463 and 471; PHE 472 and 473; PRO 474, 475 and 476; SER 478, 479, 485, 502, 506, 515, 516 and 517; THR 518, 519, 520, 521 and 522; TYR 533, 537, 538 and 539; VAL 540 and 541.
XP_444820  269  ALA 7, 8 and 11; ARG 12 and 13; ASN 14, 15, 16, 17 and 18; GLN 20, 49 and 50; GLU 51, 52, 53, 56 and 57; GLY 63 and 67; HIS 70 and 71; LEU 72 and 73; LYS 74, 75, 76, 77, 79, 80, 81 and 82; PRO 83, 84 and 85; SER 86, 87, 89, 90, 92, 93, 98 and 99; THR 101, 102, 103, 105 and 106; TRP 107; TYR 108 and 109; VAL 111, 316, 319, 320 and 323.

XP_444845 and XP_444844 contained a greater number of positively charged amino acids, which might be involved in ligand binding. On the other hand, proteins XP_444795, XP_444794, XP_444788, XP_444846, XP_445514, XP_444829, XP_444859, XP_444856 and XP_444847 comprised more negatively charged amino acids on their surface, which might form ligand-binding sites.

Hydrophobicity of a protein depends on its hydrophobic amino acids 24. Proteins XP_444843, XP_444861, XP_444845, XP_444860, XP_444844 and XP_444794 were more hydrophobic than the others (Table 1). Moreover, a significant positive correlation was found between the aliphatic index of the selected HPs and their GRAVY (r = 0.583, p < 0.01), i.e. hydrophobic proteins contained long-chain aliphatic side chains 24.

Functional categorization of the proteins was performed by determining protein domains, signal peptides, subcellular localization, transmembrane regions, protein network analysis, etc. 21. According to evolutionary conservation theory, domains are the most conserved parts as well as the functional regions of proteins 25. They can modulate the function of a protein by changing its arrangement (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi/). So, sequence homology based methods alone may not predict the actual function of proteins 26. Furthermore, hypothetical proteins do not have any known homologous protein sequences. So, a number of domain/superfamily prediction tools were used to predict accurate domains of the 20 selected hypothetical proteins of C. glabrata CBS 138.

In the present study, protein XP_444780 showed the Zn2+ ion binding domain ZF_FYVE, involved in TGFβ signalling 27. XP_444780 showed interaction (Fig. 2) with GTP-binding protein (gtr1); PRA1-like (Prenylated Rab acceptor 1) protein (SPCC306.02c-1), involved in cell trafficking 28; and vacuolar protein 8 (vac8), important for vacuolar sorting 29. These functions are generally carried out by NVJ (nuclear–vacuolar junction) proteins, which reside in the nuclear–vacuolar junctions of the yeast cell and help in engulfment of the nucleus into the vacuole during carbon and nitrogen depletion 30. Consistently, protein XP_444780 was localized in the nucleus (Table 3), so it might be concluded that protein XP_444780 functions as a nuclear–vacuolar junction protein.

Protein XP_444788 showed the Aconitase domain, involved in catalysis of the interconversion between isocitrate and citrate via a cis-aconitate intermediate 31, and interacted (Fig. 2) with 3-isopropylmalate dehydrogenase (LEU2); branched-chain-amino-acid aminotransferase (BAT1); 2-isopropylmalate synthase (LEU4); mitochondrial oxaloacetate transport protein (OAC1); and dihydroxy-acid dehydratase (ILV3), which are involved in cellular metabolic pathways 32. These functions occur in the cytoplasm 33, which was the subcellular localizer of protein XP_444788.

Protein XP_444808 showed the RVT_1 domain, a reverse transcriptase gene usually present in mobile elements such as retrotransposons 34. The protein interacted (Fig. 2) with telomere elongation proteins (EST1; EST3); high affinity DNA-binding factor subunit 2 (YKU70, YKU80); and DNA repair and recombination protein (RAD52). Thus protein XP_444808 might function as a DNA binding protein in the nucleus.

The ABC_membrane domain, which acts as an ABC transporter 35, was present in protein XP_444820. It was connected (Fig. 2) with para-hydroxybenzoate-polyprenyl transferase (COQ2), an integral membrane protein involved in ubiquinone biosynthesis 36; fatty acid elongation protein 3 (SUR4); mitochondrial inner membrane translocase subunit TIM16 (PAM16); and mitochondrial carrier (MTM1). These functions generally take place in mitochondria and the endoplasmic reticulum, which was the subcellular localization of protein XP_444820 (Table 3). Moreover, the protein revealed transmembrane regions (Table 4) and thus might be involved in mitochondrial transportation.

Protein XP_444829 comprised the Trp_syntA domain, responsible for tryptophan biosynthesis 37. XP_444829 interacted with N-(5'-phosphoribosyl) anthranilate isomerase (TRP1); anthranilate synthase components (TRP2; TRP3); anthranilate phosphoribosyltransferase (TRP4); and threonine dehydratase (ILV1), all involved in amino acid biosynthesis and catabolism 38-41, which occur in the cytoplasm 33, the subcellular localizer of protein XP_444829 (Table 3). Thus it might be concluded that protein XP_444829 was associated with amino acid biosynthesis and catabolism.

Protein XP_444846 showed the GATA_ZN_FINGER domain, a DNA binding domain that also acts in vesicular trafficking 42. It was associated (Fig. 2) with transport proteins (SEC23, SEC24, SEC31, SEC13) and GRASP65 homolog protein 1 (GRH1), a coat protein complex II (COPII) that promotes formation of transport vesicles from the endoplasmic reticulum 43 and is localized above the nucleus, which was also a localizer of XP_444846 (Table 3). Thus protein XP_444846 might be involved in vesicular transportation.

Protein XP_444847 revealed a collagen domain, involved in formation of connective tissue 44. XP_444847 showed interaction with nuclear receptor subfamily 6 group A member 1 (NR6A1), involved in integrin-mediated cell-matrix interaction 45; integrin (ITGA10); laminin (LAMB3); and fibronectin, a component of the extracellular matrix 46, which was the subcellular localizer of the studied protein. A cleavage site was present in this protein between the 17th (Ala) and 18th (Phe) amino acids (Fig. 1). So it could be concluded that XP_444847 was an extracellular protein and might be involved in host-pathogen associations 47.

Protein XP_445514 contained the LRR_1 domain, a structural basis for various purposes such as formation of protein-protein interactions 48, tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins 49. It interacted (Fig. 2) with protein HYM1, which helps in cell cycle regulation 50; serine/threonine-protein kinase (KIC1), required for cell integrity, cellular polarity and morphogenesis 51; nicotinamide riboside kinase 1 (NRK1), a coenzyme of oxidoreductase that acts as a source of ADP-ribosyl groups used in various reactions 52; serine/threonine-protein kinase (CBK1), which seems to play a role in regulation of cell morphogenesis and proliferation 53; and autophagy-related protein 17 (ATG17), responsible for pexophagy and nucleophagy 54. All the above functions typically occur in the nucleus, which was the subcellular localizer of protein XP_445514 (Table 3). Thus it might be involved in nuclear functions.

The IF2 domain of translation initiation factor 2 (IF2), which forms the 30S initiation complex (IC) 55, was found in protein XP_444851. Moreover, phenylalanyl-tRNA synthetase (MSF1); RNA exonuclease (NGL2); 37S ribosomal protein (RSM28), involved in mitochondrial protein translation 56; ATP-dependent RNA helicase (PRP22), a pre-mRNA-splicing factor; and the mitochondrial precursor required for initiation of translation of the COX1 coding region (PET309) interacted with the protein (Fig. 2). The above functions are involved in gene translation in mitochondria, which was the subcellular localizer of protein XP_444851 (Table 3). So, the protein might be involved in mitochondrial gene translation.

Protein XP_444880 showed the SANT domain, involved in chromatin remodelling and transcription regulation 57. It interacted (Fig. 2) with chromatin modification-related proteins (EAF5, YNG2, EAF7) and with components of the SWR1 complex, which mediates ATP-dependent exchange of histone H2A (YAF9, SWC4). These are involved in centromere functions, DNA damage control and cell cycle control 58 in the nucleus, the subcellular localizer of protein XP_444880 (Table 3). Protein XP_444880 might be involved in DNA repair and cell cycle control of C. glabrata CBS 138.

Proteins XP_444794 and XP_444795 both contained the E1-E2_ATPase domain, a transmembrane ATPase that transports membrane-bound enzyme complexes or ions 34. Protein XP_444794 was associated (Fig. 2) with ubiquitin (UBI4); serine/threonine-protein kinase PTK2/STK2 (PTK2); general amino-acid permease (GAP1); and yeast elongation factor 3 (YEF3), which require ATPase for functioning 59-61. XP_444794 and XP_444795 showed transmembrane regions (Table 4) and plasma membrane as subcellular localizer (Table 3). They might be involved in ATPase execution. On the other hand, protein XP_444795 showed interaction with vacuolar Ca2+/H+ exchanger (VCX1); Golgi Ca2+-ATPase (PMR1); calcium-channel protein (CCH1); and vacuolar v-SNARE (NYV1), which are involved in voltage-gated Ca2+ channels 62. The subcellular localizer of the HP was the plasma membrane (Table 3) and transmembrane regions were present (Table 4). Thus protein XP_444795 might be involved in Ca2+ ion exchange.

The AA_TRNA_LIGASE domain, which catalyses aminoacyl-tRNA ligation 63, was observed in three proteins: XP_444842, XP_444856 and XP_444859. Protein XP_444842 showed interaction (Fig. 2) with methionine-tRNA ligase (MES1); GMP synthase (GUA1); glutamyl-tRNA synthetase (SES1); and isoleucyl-tRNA synthetase (ILS1). These proteins are involved in RNA synthesis, and the subcellular localizer of protein XP_444842 was mitochondria (Table 3). So, it might be concluded that protein XP_444842 was involved in mitochondrial RNA synthesis. On the other hand, proteins XP_444856 and XP_444859 interacted with cytoplasmic tRNA synthetase (Fig. 2), and PSORT II showed cytoplasm as the subcellular localizer (Table 3) of these HPs. Thus proteins XP_444856 and XP_444859 might be involved in RNA synthesis.

Proteins XP_444845, XP_444844, XP_444843, XP_444860 and XP_444861 showed the Sugar_tr domain, which plays a role in uptake of sugar 64. They interacted with glucose transporters (HXT 1-7) (Fig. 2) and their subcellular localizer was the plasma membrane (Table 3). They also revealed transmembrane regions (Table 4). Thus the above HPs might be involved in glucose transportation.

Of the two types of transmembrane proteins, helical proteins are more abundant than β-barrel proteins 65. In the present study, proteins XP_444843, XP_444844, XP_444845, XP_444860, XP_444861, XP_444820, XP_444794 and XP_444795 were membrane proteins; they contained a large quantity of helical structure (Fig. 3), and their hydrophobicity as well as aliphatic index was higher (Table 1). On the other hand, outer membrane proteins generally contain a larger amount of β-barrel 65, and the formerly established outer membrane protein XP_444847 contained a large amount of β-barrel (Fig. 3). Active sites were also identified and are shown in Table 7. Determination of the tertiary structure of the hypothetical proteins (Fig. 3), their templates (Table 6) and active sites might be helpful to study the conformations and to identify molecular docking sites that would help in in silico drug designing.

Conclusion

An in silico approach to revealing protein structure and function is less time consuming and more cost effective than experimental investigation. Functional annotation may distinguish required proteins from others. Screening significant proteins from a number of hypothetical proteins by experimental analysis is a tedious and very expensive job. Since C. glabrata CBS 138 is a newly emerging pathogen, it is very important to become acquainted with each protein. In the present study, randomly selected HPs have been categorized into different important intracellular as well as extracellular functions. Protein structures and their active site prediction would help in drug designing and docking studies.

List of abbreviations

CDD, Conserved Domain Database; RPS-BLAST, Reverse Position-Specific BLAST; PSI-BLAST, Position Specific Iterated BLAST; HMM, Hidden Markov Models; TMHMM, Transmembrane Hidden Markov Models; STRING, Search Tool for the Retrieval of Interacting Genes/Proteins; I-TASSER, Iterative Threading ASSEmbly Refinement; GROMOS, GROningen molecular dynamics simulation; GRAVY, Grand Average of Hydropathy.

Acknowledgement

The Department of Biotechnology (DBT), Government of India, New Delhi is acknowledged gratefully for creation of the Bioinformatics Infrastructure Facility Centre at Vidyasagar University, Midnapore, West Bengal, India.
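The Discussion relates two sequence-derived hydrophobicity measures, GRAVY and the aliphatic index, through a correlation coefficient. The following Python sketch shows one way such values can be computed. It is not the authors' pipeline: it assumes the standard Kyte-Doolittle hydropathy scale 24 for GRAVY and Ikai's formula for the aliphatic index, and the function names are hypothetical. It is applied here to three transmembrane segments from Table 4 purely for illustration; the reported r = 0.583 was computed over whole-protein values (Table 1), which are not reproduced here.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values per residue (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    n = len(seq)

    def pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Illustrative input only: three transmembrane segments listed in Table 4.
segments = [
    "YFIGRIISGLGVGGIAVLSPMLI",
    "VPLGLCFAWALFMIGGMTFV",
    "LLLTAVGLLTISCSIGMTIPKVI",
]
gravy_values = [gravy(s) for s in segments]
ai_values = [aliphatic_index(s) for s in segments]
print(gravy_values, ai_values, pearson_r(ai_values, gravy_values))
```

With whole-protein sequences in place of the short segments, the same three functions would give per-protein GRAVY and aliphatic index values and their correlation; the significance test (the p value) is not covered by this sketch.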

References

1. Bethea, E.K., Carver, B.J., Montedonico, A.E., Reynolds, T.B. (2009). The inositol regulon controls viability in Candida glabrata. Microbiology 156: 452-462.
2. Paul, L.F., Jose, A.V., Jack, D.S. (1999). Candida glabrata: review of epidemiology, pathogenesis, and clinical disease with comparison to C. albicans. Clinical Microbiology Reviews 12: 80-96.
3. Wingard, J.R. (1995). Importance of Candida species other than C. albicans as pathogens in oncology patients. Clinical Infectious Diseases 20: 115-125.
4. Mandell, G.L., Bennett, J.E., Dolin, R. (2010). Mandell, Douglas, and Bennett's principles and practice of infectious diseases. 7th ed. Philadelphia.
5. Corno, F., Caldart, M., Toppino, M., Tapparo, A., Capozzi, M.P., Goitre, M., Cervetti, O., Forte, M., Forcheri, V. (1989). Ano-rectal candidiasis. Minerva Chirurgica 44: 2251-2253.
6. Roetzer, A., Gabaldon, T., Schuller, C. (2011). From Saccharomyces cerevisiae to Candida glabrata in a few easy steps: important adaptations for an opportunistic pathogen. FEMS Microbiology Letters 314: 1-9.
7. Papon, N., Courdavault, V., Clastre, M., Bennett, R.J. (2013). Emerging and emerged pathogenic Candida species: beyond the Candida albicans paradigm. PLOS Pathogens 9: e1003550.
8. Seider, K., Heyken, A., Luttich, A., Miramon, P., Hube, B. (2010). Interaction of pathogenic yeasts with phagocytes: survival, persistence and escape. Current Opinion in Microbiology 13: 392-400.
9. Zarembinski, T.I., Hung, L.W., Mueller-Dieckmann, H.J., Kim, K.K., Yokota, H., Kim, R., Kim, S.H. (1998). Structure-based assignment of the biochemical function of a hypothetical protein: A test case of structural genomics. Proceedings of the National Academy of Sciences of the United States of America 95: 15189-15193.
10. Nimrod, G., Schushan, M., Steinberg, D.M., Ben-Tal, N. (2008). Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 16: 1755-1763.
11. Marchler-Bauer, A., Lu, S., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., Gwadz, M., Hurwitz, D.I., Jackson, J.D., Ke, Z., Lanczycki, C.J., Lu, F., Marchler, G.H., Mullokandov, M., Omelchenko, M.V., Robertson, C.L., Song, J.S., Thanki, N., Yamashita, R.A., Zhang, D., Zhang, N., Zheng, C., Bryant, S.H. (2011). CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Research 39: D225-D229.
12. Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., Sonnhammer, E.L.L., Tate, J., Punta, M. (2014). Pfam: the protein families database. Nucleic Acids Research 42: D222-D230.
13. Castro, E., Sigrist, C.J.A., Gattiker, A., Bulliard, V., Langendijk-Genevaux, P.S., Gasteiger, E., Bairoch, A., Hulo, N. (2006). ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Research 34: W362-W365.
14. Haft, D.H., Selengut, J.D., White, O. (2003). The TIGRFAMs database of protein families. Nucleic Acids Research 31: 371-373.
15. Petersen, T.N., Brunak, S., Heijne, G., Nielsen, H. (2011). SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods 8: 785-786.
16. Nakai, K., Kanehisa, M. (1992). A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14: 897-911.
17. Krogh, A., Larsson, B., Heijne, G., Sonnhammer, E.L. (2001). Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305: 567-580.
18. Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., Lin, J., Minguez, P., Bork, P., von Mering, C., Jensen, L.J. (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research 41: D808-D815.
19. Roy, A., Kucukural, A., Zhang, Y. (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 5: 725-738.
20. Guex, N., Peitsch, M.C. (1997). SWISS-MODEL and the Swiss-Pdb Viewer: an environment for comparative protein modelling. Electrophoresis 18: 2714-2723.
21. Mohan, R., Venugopal, S. (2012). Computational structural and functional analysis of hypothetical proteins of Staphylococcus aureus. Bioinformation 8: 722-728.
22. Guruprasad, K., Reddy, B.V., Pandit, M.W. (1990). Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence. Protein Engineering 4: 155-161.
23. Najmanovich, R., Kuttner, J., Sobolev, V., Edelman, M. (2000). Side-chain flexibility in proteins upon ligand binding. Proteins 39: 261-268.
24. Kyte, J., Doolittle, R.F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157: 105-132.
25. Pettit, F.K., Bare, E., Tsai, A., Bowie, J.U. (2007). HotPatch: a statistical approach to finding biologically relevant features on protein surfaces. Journal of Molecular Biology 369: 863-879.
26. Schnoes, A.M., Brown, S.D., Dodevski, I., Babbitt, P.C. (2009). Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLOS Computational Biology 5: e1000605.
27. Tsukazaki, T., Chiang, T.A., Davison, A.F., Attisano, L., Wrana, J.L. (1998). SARA, a FYVE domain protein that recruits Smad2 to the TGFbeta receptor. Cell 95: 779-791.
28. Kamei, C.L.A., Boruc, J., Vandepoele, K., Van den Daele, H. (2008). The PRA1 gene family in Arabidopsis. Plant Physiology 147: 1735-1749.
29. Chen, Y.J., Stevens, T.H. (1996). The VPS8 gene is required for localization and trafficking of the CPY sorting receptor in Saccharomyces cerevisiae. European Journal of Cell Biology 70: 289-297.
30. Honscher, C., Ungermann, C. (2014). A close-up view of membrane contact sites between the endoplasmic reticulum and the endolysosomal system: From yeast to man. Critical Reviews in Biochemistry and Molecular Biology 49: 262-268.
31. Juang, H.H. (2004). Modulation of iron on mitochondrial aconitase expression in human prostatic carcinoma cells. Molecular and Cellular Biochemistry 265: 185-194.
32. Smith, E., Morowitz, H. (2004). Universality in intermediary metabolism. Proceedings of the National Academy of Sciences of the United States of America 101: 13168-13173.
33. Goodsell, D.S. (1991). Inside a living cell. Trends in Biochemical Sciences 16: 203-206.
34. Auer, M., Scarborough, G.A., Kühlbrandt, W. (1998). Three-dimensional map of the plasma membrane H+-ATPase in the open conformation. Nature 392: 840-843.
35. Hung, L.W., Wang, I.X., Nikaido, K., Liu, P.Q., Ames, G.F., Kim, S.H. (1998). Crystal structure of the ATP-binding subunit of an ABC transporter. Nature 396: 703-707.
36. Uchida, N., Suzuki, K., Saiki, R., Kainou, T., Tanaka, K., Matsuda, H., Kawamukai, M. (2000). Phenotypes of fission yeast defective in ubiquinone production due to disruption of the gene for p-hydroxybenzoate polyprenyl diphosphate transferase. Journal of Bacteriology 182: 6933-6939.
37. Crawford, I.P. (1989). Evolution of a biosynthetic pathway: the tryptophan paradigm. Annual Review of Microbiology 43: 567-600.
38. Baker, T.I., Crawford, I.P. (1966). Anthranilate synthetase: Partial purification and some kinetic studies on the enzyme from Escherichia coli. The Journal of Biological Chemistry 241: 5577-5584.
39. He, U., Brown, B. (1957). Threonine deamination in Escherichia coli. II. Evidence for two L-threonine deaminases. Journal of Bacteriology 73: 105-112.
40. Creighton, T.E., Yanofsky, C. (1970). Chorismate to tryptophan (Escherichia coli) - Anthranilate synthetase, PR transferase, PRA isomerase, InGP synthetase, tryptophan synthetase. Methods in Enzymology 17A: 365-380.
41. Mayans, O., Ivens, A., Nissen, L.J., Kirschner, K., Wilmanns, M. (2002). Structural analysis of two enzymes catalysing reverse metabolic reactions implies common ancestry. The EMBO Journal 21: 3245-3254.
42. Matthews, J., Sunde, M. (2002). Zinc fingers - folds for many occasions. IUBMB Life 54: 351-355.
43. Jensen, D., Schekman, R. (2011). COPII-mediated vesicle formation at a glance. Journal of Cell Science 124: 1-4.
44. Sundaramoorthy, M., Meiyappan, M., Todd, P., Hudson, B.G. (2002). Crystal structure of NC1 domains. Structural basis for type IV collagen assembly in basement membranes. The Journal of Biological Chemistry 277: 31142-31153.
45. Barreto, G., Reintsch, W., Kaufmann, C., Dreyer, C. (2003). The function of Xenopus germ cell nuclear factor (xGCNF) in morphogenetic movements during neurulation. Developmental Biology 257: 329-342.
46. Xiong, Y., Eickbush, T.H. (1990). Origin and evolution of retro elements based upon their reverse transcriptase sequences. The EMBO Journal 9: 3353-3362.
47. Sandini, S., La Valle, R., De Bernardis, F., Macri, C., Cassone, A. (2007). The 65 kDa mannoprotein gene of Candida albicans encodes a putative β-glucanase adhesin required for hyphal morphogenesis and experimental pathogenicity. Cellular Microbiology 9: 1223-1238.
48. Kobe, B., Kajava, A.V. (2001). The leucine-rich repeat as a protein recognition motif. Current Opinion in Structural Biology 11: 725-732.
49. Rothberg, J.M., Jacobs, J.R., Goodman, C.S., Artavanis-Tsakonas, S. (1990). Slit: an extracellular protein necessary for development of midline glia and commissural axon pathways contains both EGF and LRR domains. Genes & Development 4: 2169-2187.
50. Hsu, J., Weiss, E.L. (2013). Cell cycle regulated interaction of a yeast hippo kinase and its activator MO25/Hym1. PLoS ONE 8: e78334.
51. Sullivan, D.S., Biggins, S., Rose, M.D. (1998). The yeast centrin, cdc31p, and the interacting protein kinase, Kic1p, are required for cell integrity. The Journal of Cell Biology 143: 751-765.
52. Bieganowski, P., Brenner, C. (2004). Discoveries of nicotinamide riboside as a nutrient and conserved NRK genes establish a Preiss-Handler independent route to NAD+ in fungi and humans. Cell 117: 495-502.
53. Du, L.L., Novick, P. (2002). Pag1p, a novel protein associated with protein kinase Cbk1p, is required for cell morphogenesis and proliferation in Saccharomyces cerevisiae. Molecular Biology of the Cell 13: 503-514.
54. Kamada, Y., Funakoshi, T., Shintani, T., Nagano, K., Ohsumi, M., Ohsumi, Y. (2000). Tor-mediated induction of autophagy via an Apg1 protein kinase complex. The Journal of Cell Biology 150: 1507-1513.
55. Simonetti, A., Marzi, S., Billas, I.M., Tsai, A., Fabbretti, A., Myasnikov, A.G., Roblin, P., Vaiana, A.C., Hazemann, I., Eiler, D., Steitz, T.A., Puglisi, J.D., Gualerzi, C.O., Klaholz, B.P. (2013). Involvement of protein IF2 N domain in ribosomal subunit joining revealed from architecture and function of the full-length initiation factor. Proceedings of the National Academy of Sciences of the United States of America 110: 15656-15661.
56. Williams, E.H., Bsat, N., Bonnefoy, N., Butler, C.A., Fox, T.D. (2005). Alteration of a novel dispensable mitochondrial ribosomal small-subunit protein, Rsm28p, allows translation of defective COX2 mRNAs. Eukaryotic Cell 4: 337-345.
57. Horton, J.R., Elgar, S.J., Khan, S.I., Zhang, X., Wade, P.A., Cheng, X. (2007). Structure of the SANT domain from the Xenopus chromatin remodeling factor ISWI. Proteins: Structure, Function, and Bioinformatics 67: 1198-1202.
58. Krogan, N.J., Baetz, K., Keogh, M.C., Datta, N., Sawa, C., Kwok, T.C., Thompson, N.J., Davey, M.G., Pootoolal, J., Hughes, T.R., Emili, A., Buratowski, S., Hieter, P., Greenblatt, J.F.
(2004). Regulation of chromosome stability by the histone H2A variant Htz1, the Swr1 chromatin remodeling complex, and the histone acetyltransferase NuA4. Proceedings of the National Academy of Sciences of the United States of America 101: 13513-13518. 59. Peters, J.M., Franke, W.W., Kleinschmidt, J.A. (1994). Distinct 19 S and 20 S subcomplexes of the 26 S proteasome and their distribution in the nucleus and the cytoplasm. The Journal of Biological Chemistry 269: 7709-7718. 60. Goossens, A., Fuente, N., Forment, J., Serrano, R., Portillo, F. (2000). Regulation of Yeast H+- ATPase by protein kinases belonging to a family dedicated to activation of plasma membrane transporters. Molecular and Cellular Biology 20: 7654-7661. 61. Dasmahapatra, B., Chakraburtty, K. (1981). Protein synthesis in yeast. I. Purification and properties of elongation factor 3 from Saccharomyces cerevisiae. The Journal of Biological Chemistry 256: 9999-10004. 62. Ramakrishnan, N., Drescher, M., Drescher, D. (2012). The SNARE complex in neuronal and sensory cells. Molecular and Cellular Neuroscience 50: 58-69. 63. Eriani, G., Delarue, M., Poch, O., Gangloff, J., Moras, D. (1990). Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347: 203-206. 64. Pao, S.S., Paulsen, I.T., Saier, M.H. Jr. (1998). Major facilitator superfamily. Microbiology and Molecular Biology Reviews 62: 1-34. 65. Schulz, G.E. (2002).The structure of bacterial outer membrane proteins. Biochimica et Biophysica Acta 1565: 308-317. OMICS A Journal of Integrative Biology Volume 17, Number 10, 2013 Original Articles ª Mary Ann Liebert, Inc. DOI: 10.1089/omi.2013.0015

A Database for Mycobacterium Secretome Analysis: ‘MycoSec’ to Accelerate Global Health Research

Ayan Roy,1 Sanghati Bhattacharya,1 Asim K Bothra,2 and Arnab Sen1

Abstract

Members of the genus Mycobacterium are notorious for their pathogenesis. Investigations from various perspectives have identified the pathogenic strategies employed by these lethal pathogens. Secretomes are believed to play crucial roles in host cell recognition and cross-talks, in cellular attachment, and in triggering other functions related to host pathogen interactions. However, a proper idea of the mycobacterial secretomes and their mechanism of functionality still remains elusive. In the present study, we have developed a comprehensive database of potential mycobacterial secretomes (MycoSec) using pre-existing algorithms for secretome prediction for researchers interested in this particular field. The database provides a platform for retrieval and analysis of identified secretomes in all finished genomes of the family Mycobacteriaceae. The database contains valuable information regarding secretory signal peptides (Sec type), lipoprotein signal peptides (Lipo type), and Twin arginine (RR/KR) signal peptides (TAT type), prevalent in mycobacteria. Information pertaining to COG analysis, codon usage, and gene expression of the predicted secretomes has also been incorporated in the database. MycoSec promises to be a useful repertoire providing a plethora of information regarding mycobacterial secretomes and may well be a platform to speed global health research. MycoSec is freely accessible at http://www.bicnbu.in/mycosec.

1Bioinformatics Facility, Department of Botany, University of North Bengal, Siliguri, India.
2Cheminformatics Bioinformatics Laboratory, Department of Chemistry, Raiganj College (University College), Raiganj, India.

Introduction

Mycobacterium is one of the oldest known disease-causing microorganisms associated with human and bovine pathogenesis. Mycobacterium tuberculosis and Mycobacterium leprae are notorious obligate pathogens (Cole et al., 2001; De Voss et al., 2000) that have posed a serious menace to human health from antiquity. Opportunistic pathogens such as Mycobacterium abscessus and Mycobacterium ulcerans also have a significant impact on mycobacterial pathogenesis (Zumla and Grange, 2002). However, several members of the genus are nonpathogenic, saprophytic, and eco-friendly; strains such as Mycobacterium vanbaalenii and Mycobacterium sp. strains JLS, KMS, and MCS help in bioremediation by degrading environmentally toxic polycyclic aromatic hydrocarbons (Miller et al., 2004). Thus, the genus Mycobacterium, which includes lethal pathogens such as M. tuberculosis and M. leprae as well as biofriendly strains like M. vanbaalenii, draws the interest of researchers not only from the pathogenic perspective but also from the eco-friendly angle.

Bacterial pathogenesis and its impact on human health have always been a sensitive field of biomedical research. Bacterial communities exhibit a variety of pathogenic strategies to infect the human host. However, every mode of infection shares a common scenario: bacterial adhesion to the host receptor and secretion of toxins, which pave the way for successful insertion of the virulence factors (Lee and Schneewind, 2001). Secretory proteins hold the key to interaction with the host and inception of the disease (Bonin-Debs et al., 2004). Secretomes are often found to be linked crucially with virulence and thus promise to be striking drug targets for the remedy of bacterial infections (Niederweis et al., 2010).

Secretomes have been defined as the complete set of proteins secreted by a cell (Ranganathan and Garg, 2009) and are associated with a broad range of functions and critical biological processes, such as cell-to-cell communication and cross talks, cell migration, and, most importantly, virulence and potential infective strategies in disease mechanism (Tjalsma et al., 2004).

The signal peptide part of the secreted protein, which is generally composed of around thirty amino acid residues, transports the newly synthesized protein to the protein-conducting SecE and SecY channels associated with the plasma membrane (Leversen et al., 2009). Signal peptides in most cases are reported to possess three domains: a positively charged n-terminus (n-region), a stretch of hydrophobic residues (H-region), and a region of mostly small uncharged residues containing a characteristic cleavage site recognized by a specific signal peptidase (SPase) (von Heijne, 1984, 1989, 1990a,b). It is this characteristic site that holds the key in cleavage of a secretory protein by either of the two SPases, Type I or Type II.

Various types of signal peptides are reported in bacterial systems, among which secretory signal peptides (Sec type), Twin arginine signal peptides (TAT type), lipoprotein signal peptides (Lipo type), pseudopilin-like signal peptides, and bacteriocin and pheromone type signal peptides are most prevalent (Tjalsma et al., 2004). However, mainly the first three types of signal peptides (i.e., Sec type, TAT type, and Lipo type) are common in gram-positive bacteria (Pallen et al., 2003). Sec type and TAT type signal peptides are cleaved by Type I SPase, whereas Lipo types are cleaved by Type II SPase (Storf et al., 2010).

The tremendous advancement in genome sequencing technology has yielded complete genome sequences of a broad range of bacterial populations. Automated prediction of the secretome has generated a lot of interest. Prediction of the signal peptide-containing genes, along with their cleavage sites in the finished bacterial genomes, has been achieved by employing various algorithms such as Hidden Markov Model (HMM), Neural Network (NN) (Bendtsen et al., 2004), and Support Vector Machines (SVM) (Vert, 2002).

Various web-based servers employ these algorithms and use Perl scripts to predict the secretomes in a given genome, such as SignalP, Signal-CF, SIGCLEAVE, PrediSi, SPEPLip, SecretomeP, and Phobius. Bioinformatics-based analysis and comparison of secretomes have been performed in a few cases; however, extensive information pertaining to the features and behavior of secretomes in various bacterial genomes remains to be plowed from the depths of secretomic study. The codon usage patterns, expressional behavior, and functional classification of the predicted secretomes, and also the evolutionary constraints on these secretomes in many bacterial genomes, still remain elusive.

Significant progress has been made in the field of mycobacterial secretome analysis and the possible role of secretomes in infections. Secretory proteins were reported to be crucial for the efficiency of BCG vaccines (Heimbeck, 1948). Various other novel findings by Gomez et al. (2000), Rosenkrands et al. (2000), and McDonough et al. (2005, 2008) have provided considerable information about mycobacterial secretory systems and their varying types. However, predicting secretomes in mycobacterial genomes appears to be a difficult chore due to the unusual nature of mycobacterial cell membranes (Leversen et al., 2009). Recently, Leversen et al. (2009) reported a set of confirmed signal peptides in the Mycobacterium tuberculosis H37Rv strain by validating the putative signal peptides that they and previous researchers had analyzed, employing various algorithms and finally matching them with high accuracy MS data. However, a complete schema of signal peptides in Mycobacterium is yet to be reported.

Keeping this in mind, we have developed the database MycoSec, a repository of potential mycobacterial secretomes. The database has a plethora of information pertaining to secretome analysis in mycobacterial strains and presents information in an organized manner. The putative secretomes that have been included in the database promise to be strong candidates for being confirmed signal peptides once experimental validation is accomplished.

Materials and Methods

Software

The database has been designed on an HTML platform using the Macromedia Dreamweaver database development software version 8.

Strategies for identification of Mycobacterial secretomes

Identifying the initial pool of secretomes: Complete genome sequences of Mycobacterium strains were retrieved from the IMG website (http://img.jgi.doe.gov/cgi-bin/w/main.cgi) (Markowitz et al., 2006). Initially, forty-one strains representing twenty-five species were taken for analysis. However, more species will be added as and when available. Predictions of signal peptides were done with SignalP (version 3.0). Although several other algorithms, including a newer version of SignalP (SignalP 4.0), are available, we used SignalP 3.0 because, as per Leversen et al. (2009) and Leversen and Wiker (2012), SignalP 3.0 is more suitable for accurate prediction of signal peptides in Mycobacteria than any other web-server, including SignalP 4.0. Gram-positive bacteria have been found to secrete proteins in the external environment by virtue of three important pathways (Pallen et al., 2003). These include the Sec (general secretion) pathway, the Twin arginine transporter (TAT) pathway, the ESAT-6 pathway (Champion and Cox, 2007), the Type VII secretion system (Abdallah et al., 2007), and, most importantly, the Lipo signaling pathway (Rezwan et al., 2007). Since mycobacteria are gram-positive bacteria, we use these three types of signaling systems (Sec, TAT, and Lipo) for identification and analysis. The primary pool was processed in three different ways for the classification of signal peptides.

Identification of Sec Type signal peptides. The initial pool of secretomes was fed to the TMHMM (version 2.0) server in order to fish out the Sec type of signal peptides from the transmembrane proteins. We have considered protein sequences with 0 to 2 transmembrane helices as potential Sec type signal peptides, as per Mastronunzio et al. (2008) and Gore (2011).

Filtering lipoprotein-type signal peptides. For filtering lipoprotein-type signal peptides, two algorithms (Pred-Lipo and LipoP) are widely used. However, we used Pred-Lipo, which operates on the Hidden Markov Model and has been reported to be the most efficient in terms of prediction accuracy, with the lowest false positives (Bagos et al., 2008). The SignalP predicted data set was fed to the Pred-Lipo server (http://www.compgen.org/tools/PRED-LIPO) for lipoprotein prediction.

TAT-type signal peptide prediction. Among the three widely used prediction servers (Pred-Tat, TatP, and TatFind), TatFind has a slight edge over the others as it executes a combined approach of regular expression search (searching the twin arginine RR/KR pattern) and hydrophobicity analysis (Rose et al., 2002). Moreover, TatFind results are more specific when matched against an experimentally validated set of proteins. As in the previous section, the SignalP predicted data set was fed to the TatFind server (http://signalfind.org/tatfind.html) for TAT-type signal peptide prediction.

A complete flowchart depicting our method of in silico identification of signal peptides is illustrated in Figure 1.

Comparison with experimentally validated data

In silico prediction of any kind always demands experimental validation. Scarcity of experimental wet lab data is a major bottleneck in the field of mycobacterial secretomic research. However, Leversen et al. (2009) identified fifty-seven signal peptides and confirmed them by experimental validation in Mycobacterium tuberculosis H37Rv. We have matched our results with those of Leversen et al. (2009) and found that around eighty-one percent of the signal peptides identified by Leversen and co-workers were present in our identified dataset of the H37Rv strain. Leversen et al. (2009) also identified around sixty-one proteins that had the potential to be signal peptides, but were experimentally validated to be nonsignal peptides. None of these proteins was picked up by our identification method. This shows that our method is quite accurate. However, eleven proteins identified by Leversen et al. (2009) as signal peptides in strain H37Rv were not retrieved by our method of identification (Locus tags: Rv0519c, Rv0744c, Rv0999, Rv1845c, Rv2693c, Rv3484, Rv3717, Rv0129c, Rv0285, Rv2576c, Rv2878c). This may be accounted for by the stringency of our identification schema. These particular proteins have also been incorporated in our database, marked with an asterisk (*).

Database description

The database main page consists of the following interfaces:

HOME. The homepage provides a general introduction to the genus Mycobacterium and pathogenesis of different mycobacterial strains. It also discusses the characteristics of secretomes and utility of research in the area of secretomics. The page also provides an idea as to why the database has been developed, thus sketching the advantages of the database in

mycobacterial research. There is a "QUICK SEARCH" drop-down menu, listing the mycobacterial species/strains analyzed, on all major pages. Clicking on a species/strain takes the user to the specific page displaying all pertinent information regarding the secretome classification and properties of that particular strain.

ORGANISMS. The ORGANISMS page displays the list of all the mycobacterial species/strains on which we have executed our analysis.

ANALYSIS. The analysis page describes the general scheme adopted for prediction and analysis of secretomes.

USEFUL LINKS. For the benefit of the users, this page has links to all major web-servers and tools used in MycoSec.

FUTURE PLANS. This includes our future plans as to how to improve the database and update the contents with the availability of more finished sequences of Mycobacterium at publicly available domains.

ABOUT US. Provides an insight into the field of research being carried out by our group, recent developments, and activities.

CONTACT. Contact information of the corresponding author and our group members who have been instrumental in developing the database.

FIG. 1. Flowchart displaying the method of in silico identification of signal peptides. Completely sequenced Mycobacterium genomes were used as input data. The genomes were fed to the SignalP 3.0 web-server for initial prediction of secretomes. Sequences that were predicted by both the Hidden Markov Model and Neural Network algorithms of SignalP 3.0 were screened and used as the initial set of secretomes. The initial pool was fed to three different web-based servers (i.e., TMHMM 2.0, PRED-LIPO, and TATFIND 1.4) for the prediction of secretory signal peptides (Sec type), lipoprotein signal peptides (Lipo type), and Twin arginine (RR/KR) signal peptides (TAT type), respectively, allowing us to identify the final set of signal peptides of each type.

ORGANISM specific page/analysis page for a specific strain. These species/strain specific pages contain general information about the respective species/strain. These pages also contain icons which lead the users to TAT-TYPE, LIPO-TYPE, and SEC-TYPE specific analysis.

Each specific analysis page presents the general information regarding the predicted secretomes in tabular form in various columns: GenBank Accession, Locus Tag, COG Categories, GC3%, and Nc & CAI values. The indexes Twin Arginine and Hydrophobic Region are specific to the TAT-type pages, as the twin arginine region is a characteristic pattern of TAT-type signal peptides. Similarly, the LIPO-TYPE page contains the parameters Most likely cleavage site (the predicted cleavage site), Cleavage at (the position of cleavage by Type II SPase), and Reliability score (the reliability score for cleavage prediction by the PRED-LIPO server).

Each specific secretome type page contains five icons on the top:

GC3/CAI-Nc plot. GC (frequency of guanine and cytosine), GC3s (frequency of guanine or cytosine in the third position of the codon), and Nc (effective number of codons) of the mycobacterial genomes and secretomes were calculated using CodonW (Ver. 1.4.2) software (http://www.molbiol.ox.ac.uk/cu) (Peden, 1999). The effective number of codons has always been an important index in understanding the extent of codon preference in the codon usage of a genome (Wright, 1990). It is a quantitative measure reflecting the frequency of a subset of codons used by a gene, and its value ranges from 20 (on usage of one codon per amino acid) to 61 (on usage of all the codons with equal frequency, excluding the termination codons).

Codon adaptation index (CAI) is a well-established parameter for determining the extent of codon usage bias of a gene relative to a reference set of genes (usually ribosomal proteins) (Sharp and Li, 1987). CAI values have been employed extensively as a measure of gene expression level (Ikemura, 1981; Naya et al., 2001; Wright and Bibb, 1992). Higher CAI values signify higher expression levels of genes in a genome (Sen et al., 2008), and highly expressed genes are generally more biased than lowly expressed ones (Dos Reis et al., 2003; Lafay et al., 2000; Sharp and Li, 1986, 1987). It is hypothesized that, in a genome, the codon usage of highly expressed genes is governed by selection pressure for translational efficiency, whereas mutational bias influences the codon usage of the lower expressed ones (Sharp and Li, 1987). The CAI values for the mycobacterial secretomes were calculated to explore their expression tendencies. CAI values have been calculated using the CAI Calculator 2 server (http://userpages.umbc.edu/~wug1/codon/cai/cais.php) (Wu et al., 2005).

The upper plot in each page represents the GC3 versus Nc values for the whole genome of a strain under scrutiny, with an insight into the ribosomal proteins and predicted secretomes. The lower plot represents the CAI versus Nc values for the secretomes with respect to the ribosomal proteins and the whole genome.

CoA graph. Correspondence analysis (CoA), a type of multivariate statistical analysis, has been very instrumental in studying the codon usage patterns in a single genome and between different genomes (Ghosh et al., 2000; Greenacre, 1984). Relative synonymous codon usage (RSCU) is a simple measure of the heterogeneity in the usage pattern of synonymous codons (Sharp and Li, 1986). RSCU values represent the number of times a particular codon is observed relative to the number of times it would have been expected in case of uniform synonymous codon usage. Correspondence analysis on the basis of RSCU and amino acid usage of the secretomes with respect to the ribosomal protein coding genes and whole genomes was also performed using CodonW (Ver. 1.4.2) software.

In the plot, the upper figure displays the correspondence analysis on amino acid usage of the predicted secretomes in contrast to the whole genome and the ribosomal proteins. The lower figure depicts the correspondence analysis on RSCU of the secretomes in reference to the whole genome and ribosomal proteins.

COG analysis. This page has the graphical representation of the Cluster of Orthologous Groups (COG) categories of the predicted secretomes. COGs comprise the collection of orthologous proteins from similar phylogenetic lineage (Tatusov et al., 2003). Information regarding the COG categories of the potential secretomes was obtained from the IMG database. The genes encoding the three different types of signal peptide containing proteins were sorted into different COG categories such as Information Storage and Processing, Cellular Processes and Signaling, Metabolism, and Poorly characterized, in accordance with the classification scheme followed by Hsiao et al. (2005).

Sequences in FASTA. Users can retrieve and download all the gene and protein sequences predicted as secretomes for a particular mycobacterial strain by clicking on the 'Genes in FASTA' and 'Proteins in FASTA' icons, respectively. An overall description of MycoSec is illustrated in Figure 2. Figure 3a, b, c depicts snapshots of various pages of the database.

Results

MycoSec contains the predicted secretomes and various bioinformatic analyses related to secretomes of almost all 'finished' mycobacterial genomes. We have generated all relevant information regarding the codon usage indices, expressional patterns (using CAI values), and codon usage bias in the mycobacterial genomes. The results are given in both tabular and graphical form, which may provide the users with general information about the forces that have been instrumental in shaping the codon usage patterns in the genomes as well as the secretomes. The COG (cluster of orthologous groups) analysis gives a brief overview of the functional classification of predicted secretory protein genes.

GC3 versus Nc plots

The effective number of codons (Nc) versus GC3s plot has been recommended as an efficient way of investigating the extent of heterogeneity in a given genome. The Nc versus GC3s plots in our case show that the majority of the genes, along with the signal peptide coding genes, in all the genomes concerned, fall well below the expected curve. A few genes, however, remain on or just below the curve, as evident from the plots.

CAI versus Nc plots

The CAI versus Nc plots have also been generated to provide a clear understanding of the expressional pattern of the secretomes. The CAI values for the secretomes range from 0.4 to at most 0.8 for all strains under investigation.

Correspondence analysis on the basis of RSCU and amino acid usage

Multivariate statistical analysis performed on the basis of RSCU and amino acid usage can also be employed to explore the codon bias in genes and genomes (Sen et al., 2008). Results from the CoA plots (on the basis of both RSCU and amino acid usage) show that the ribosomal proteins cluster at one extreme end of the major principal axis, and secretome-related genes were found to merge somewhat with this cluster, on plotting Axis 1 versus Axis 2, the two major principal axes of separation.
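As a rough illustration of the codon usage indices discussed above, the following minimal Python sketch computes GC3s and RSCU for a single coding sequence and evaluates the expected Nc curve of Wright (1990), Nc = 2 + s + 29/(s² + (1 − s)²) with s = GC3s. It is not the CodonW implementation: the function names are hypothetical, the standard genetic code is assumed, and GC3s is taken over synonymous codons only (Met, Trp, and stop codons excluded).

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (assumption: no alternative code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc3s(seq):
    """Fraction of G or C at the third position of synonymous codons."""
    third = [c[2] for c in codons(seq) if CODON_TABLE.get(c, "*") not in "MW*"]
    return sum(b in "GC" for b in third) / len(third) if third else float("nan")


def rscu(seq):
    """Observed codon count divided by the count expected under uniform synonymous usage."""
    counts = Counter(c for c in codons(seq) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0 or len(family) == 1:
            continue  # amino acid absent, or no synonymous alternatives
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def expected_nc(s):
    """Expected Nc when codon bias is due to GC3s composition alone (Wright, 1990)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


if __name__ == "__main__":
    demo = "ATGGCCGCCGCGGCTTGGTAA"  # toy sequence, not a real gene
    print(gc3s(demo), rscu(demo)["GCC"], expected_nc(0.5))
```

A gene whose observed Nc lies well below `expected_nc(gc3s(gene))` would, under the interpretation used in the text, be read as influenced by translational selection rather than by mutation bias alone.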

FIG. 2. Flowchart describing the database MycoSec. The database comprises seven major interfaces, listed in green color code in the figure. Three interfaces (HOME, ORGANISMS, and ANALYSIS) lead to the ORGANISM Specific Page. An ORGANISM Specific Page has information about a particular strain of Mycobacteria and has links to its Sec, Lipo, and TAT type signal peptides. Users can use these links to visit and retrieve specific information pertaining to each type of signal peptide.
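For reference, the in silico identification scheme of Figure 1 (SignalP 3.0 consensus followed by the three type-specific filters) can be sketched in Python as below. This is only an illustration under stated assumptions: the `ToolCalls` record and `classify` function are hypothetical, the web-server outputs are assumed to have been parsed into booleans and a helix count beforehand, and because the text does not state how overlaps between the three sets were resolved, the sketch simply reports every label that applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolCalls:
    """Hypothetical per-protein summary of the web-server outputs."""
    locus_tag: str
    signalp_hmm: bool   # SignalP 3.0 HMM predicts a signal peptide
    signalp_nn: bool    # SignalP 3.0 NN predicts a signal peptide
    tm_helices: int     # number of helices reported by TMHMM 2.0
    pred_lipo: bool     # PRED-LIPO predicts a lipoprotein signal peptide
    tatfind: bool       # TATFIND reports a twin-arginine signal peptide


def classify(p: ToolCalls) -> List[str]:
    """Return the signal peptide types assigned to one protein."""
    # Initial pool: both SignalP 3.0 algorithms must agree.
    if not (p.signalp_hmm and p.signalp_nn):
        return []
    labels = []
    # Sec type: 0 to 2 predicted transmembrane helices.
    if 0 <= p.tm_helices <= 2:
        labels.append("Sec")
    if p.pred_lipo:
        labels.append("Lipo")
    if p.tatfind:
        labels.append("TAT")
    return labels


if __name__ == "__main__":
    # Locus tags below are placeholders, not real annotations.
    for protein in (
        ToolCalls("demo_0001", True, False, 0, True, True),
        ToolCalls("demo_0002", True, True, 1, False, True),
        ToolCalls("demo_0003", True, True, 5, True, False),
    ):
        print(protein.locus_tag, classify(protein))
```

Requiring agreement between the HMM and NN predictions mirrors the consensus step described in the Figure 1 caption and trades some sensitivity for fewer false positives.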

FIG. 3. Snapshot of various pages of MycoSec: a) Homepage; b) Genome information page; c) Analysis page.
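The codon adaptation index used for the CAI versus Nc plots is, following Sharp and Li (1987), the geometric mean of the relative adaptiveness values of the codons in a gene, where each value is a codon's frequency in the reference set divided by that of the most frequent synonymous codon. The short Python sketch below illustrates the calculation only; it is not the CAI Calculator 2 server's code, the function names and toy counts are hypothetical, and codons with zero reference count are simply skipped rather than given a pseudo-count.

```python
import math


def relative_adaptiveness(reference_counts, families):
    """w = codon count in the reference set / count of the most used synonymous codon."""
    w = {}
    for family in families:
        top = max(reference_counts.get(c, 0) for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            w[c] = reference_counts.get(c, 0) / top
    return w


def cai(gene_codons, w):
    """Geometric mean of w over the codons of a gene (zero-weight codons skipped)."""
    logs = [math.log(w[c]) for c in gene_codons if w.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    # Toy reference set with a single synonymous family (alanine); counts are invented.
    alanine = ["GCT", "GCC", "GCA", "GCG"]
    reference = {"GCC": 8, "GCG": 4, "GCT": 2, "GCA": 0}
    weights = relative_adaptiveness(reference, [alanine])
    print(cai(["GCC", "GCT"], weights))
```

In practice the reference counts would come from a set of highly expressed genes such as those encoding ribosomal proteins, as described in the text.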

COG graphs

The COG graphs reveal that the majority of the secretomes fall under the categories 'Cellular Processes and Signaling' and 'Metabolism', while very few lie in the 'Information Storage and Processing' category. Among them, COG M (cell wall/membrane/envelope biogenesis) was found to be most abundant in all types of secretomes for all strains studied.

Discussion

Synonymous codon usage bias in prokaryotic genomes has been inferred to be shaped by the effects of translation efficiency and mutation bias. The effective number of codons (Nc) versus GC3s plots can be employed as a tool to determine the forces that govern the codon usage patterns. Genes whose codon bias is entirely governed by mutation bias must lie on or just below the curve in a GC3 versus Nc plot, and genes lying well below the expected curve are considered to be under the influence of translational selection (Peden, 1999). It can be deduced from the GC3 versus Nc plots of the present study that a majority of the genes encoding the signal peptides are under the effect of selection for translational efficiency. However, a few genes also display the influence of mutation bias. This trend has been found in all the strains under study.

Focusing on the expressional behavior, it is quite evident from the CAI versus Nc plots that the secretomes are moderately expressed.

Correspondence analysis (CoA) is a technique that highlights the major tendencies in the variation of data and places them along continuous axes according to the variations observed (Banerjee et al., 2012). Selection force due to translational efficiency can be inferred to be acting on the genomes when the ribosomal proteins cluster at any extreme end of the major principal axis in a CoA plot based on RSCU and amino acid usage (Peden, 1999). A similar trend was noticed for all the mycobacterial genomes on plotting Axis 1 versus Axis 2, the two major principal axes of separation. Correspondence analysis reveals the crucial role of translational selection pressure in shaping the codon usage pattern of the whole genome as well as the secretomes, along with a subtle effect of mutation bias.

Conclusion

Research on mycobacterial pathogenesis has always been a topic of immense interest in biomedical sciences and has taken a giant leap with the advancement of genome sequencing programs. Numerous genomes of Mycobacterium have been sequenced and the number is increasing day by day. It is now a daunting task to cluster and analyze the huge amount of data being generated from these genome sequencing programs and bring it to a meaningful conclusion in a reasonable time. It was therefore a humble effort from our group to analyze at least the secretome-related information of all sequenced mycobacterial genomes and bring the information into one specific platform (MycoSec) for researchers. MycoSec is freely accessible at http://www.bicnbu.in/mycosec and will be updated and expanded regularly. MycoSec, a repository of potential mycobacterial signal peptides, can divulge much information underlying pathogenic infections at the molecular level and promises to provide ample avenues for developing novel therapeutics for eradication of the mycobacteria-related diseases.

Acknowledgments

The authors are grateful to the Department of Biotechnology, Government of India, for providing financial help in setting up the Bioinformatics Infrastructural facility at University of North Bengal. A. Sen acknowledges the receipt of the DBT-CREST Award. Early findings were presented as an abstract in the International Interdisciplinary Science Conference held at Jamia Malia University, Delhi, India, in 2011.

Author Disclosure Statement

No competing financial interests exist.

References

Abdallah AM, van Pittius NCG, Champion PADG, et al. (2007). Type VII secretion - Mycobacteria show the way. Nat Rev Microbiol 5, 883–891.
Bagos PG, Nikolaou EP, Liakopoulos TD, and Tsirigos KD. (2010). Combined prediction of Tat and Sec signal peptides with hidden Markov models. Bioinformatics 26, 2811–2817.
Bagos PG, Tsirigos KD, Liakopoulos TD, and Hamodrakas SJ. (2008). Prediction of lipoprotein signal peptides in Gram-positive bacteria with a Hidden Markov Model. J Proteome Res 7, 5082–5093.
Banerjee R, Roy A, Ahmad F, Das S, and Basak S. (2012). Evolutionary patterning of hemagglutinin gene sequence of 2009 H1N1 pandemic. J Biomol Struct Dyn 29, 733–742.
Bendtsen JD, Nielsen H, Von Heijne G, and Brunak S. (2004). Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340, 783–795.
Bonin-Debs AL, Boche I, Gille H, and Brinkmann U. (2004). Development of secreted proteins as biotherapeutic agents. Expert Opin Biol Ther 4, 551–558.
Champion PA, and Cox JS. (2007). Protein secretion systems in Mycobacteria. Cell Microbiol 9, 1376–1384.
Cole ST, Eiglmeier K, Parkhill J, et al. (2001). Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011.
De Voss JJ, Rutter K, Schroeder BG, Su H, Zhu YQ, and Barry CE. (2000). The salicylate-derived mycobactin siderophores of Mycobacterium tuberculosis are essential for growth in macrophages. Proc Natl Acad Sci USA 97, 1252.
Dos Reis M, Wernisch L, and Savva R. (2003). Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res 31, 6976–6985.
Ghosh TC, Gupta SK, and Majumdar S. (2000). Studies on codon usage in Entamoeba histolytica. Int J Parasitol 30, 715–722.
Gomez M, Johnson S, and Gennaro ML. (2000). Identification of secreted proteins of Mycobacterium tuberculosis by a bioinformatic approach. Infect Immun 68, 2323–2327.
Gore D. (2011). In silico identification of cell surface antigens in Neisseria meningitidis. Biomirror 2, 1–5.
Greenacre MJ. (1984). Theory and Applications of Correspondence Analysis. Academic Press, London.
Heimbeck J. (1948). BCG vaccination of nurses. Tubercle 29, 84–88.
Hsiao WW, Ung K, Aeschliman D, Bryan J, Finlay BB, and Brinkman FS. (2005). Evidence of a large novel gene pool associated with prokaryotic genomic islands. PLoS Genet 1, e62.

Ikemura T. (1981). Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: A proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151, 389–409.
Lafay B, Atherton JC, and Sharp PM. (2000). Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology 146, 851–860.
Lee VT, and Schneewind O. (2001). Protein secretion and the pathogenesis of bacterial infections. Genes Dev 15, 1725–1752.
Leversen NA, de Souza GA, Malen H, Prasad S, Jonassen I, and Wiker HG. (2009). Evaluation of signal peptide prediction algorithms for identification of mycobacterial signal peptides using sequence data from proteomic methods. Microbiology 155, 2375–2383.
Leversen NA, and Wiker HG. (2012). Improved signal peptide predictions in mycobacteria? Tuberculosis 92, 291–292.
Markowitz VM, Ivanova N, Palaniappan K, et al. (2006). An experimental metagenome data management and analysis system. Bioinformatics 22, e359–e367.
Mastronunzio JE, Tisa LS, Normand P, and Benson DR. (2008). Comparative secretome analysis suggests low plant cell wall degrading capacity in Frankia symbionts. BMC Genomics 9, 47.
McDonough JA, Hacker KE, Flores AR, Pavelka MS, and Braunstein M. (2005). The twin-arginine translocation pathway of Mycobacterium smegmatis is functional and required for the export of mycobacterial beta-lactamases. J Bacteriol 187, 7667–7679.
McDonough JA, McCann JR, Tekippe EME, Silverman JS, Rigel NW, and Braunstein M. (2008). Identification of functional Tat signal sequences in Mycobacterium tuberculosis proteins. J Bacteriol 190, 6428–6438.
Miller CD, Hall K, Liang YN, et al. (2004). Isolation and characterization of polycyclic aromatic hydrocarbon-degrading mycobacterium isolates from soil. Microbial Ecol 48, 230–238.
Naya H, Romero H, Carels N, Zavala A, and Musto H. (2001). Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett 501, 127–130.
Niederweis M, Danilchanka O, Huff J, Hoffmann C, and Engelhardt H. (2010). Mycobacterial outer membranes: In search of proteins. Trends Microbiol 18, 109–116.
Pallen MJ, Chaudhuri RR, and Henderson IR. (2003). Genomic analysis of secretion systems. Curr Opin Microbiol 6, 519–527.
Peden J. (1999). Analysis of codon usage. PhD Thesis, The University of Nottingham, UK.
Ranganathan S, and Garg G. (2009). Secretome: Clues into pathogen infection and clinical applications. Genome Med 1, 113.
Rezwan M, Grau T, Tschumi A, and Sander P. (2007). Lipoprotein synthesis in mycobacteria. Microbiology 153, 652–658.
Rose RW, Bruser T, Kissinger JC, and Pohlschroder M. (2002). Adaptation of protein secretion to extremely high-salt conditions by extensive use of the twin-arginine translocation pathway. Mol Microbiol 45, 943–950.
Rosenkrands I, Weldingh K, Jacobsen S, et al. (2000). Mapping and identification of Mycobacterium tuberculosis proteins by two-dimensional gel electrophoresis, microsequencing and immunodetection. Electrophoresis 21, 935–948.
Sen A, Sur S, Bothra AK, Benson DR, Normand P, and Tisa LS. (2008). The implication of life style on codon usage patterns and predicted highly expressed genes for three Frankia genomes. Antonie Leeuwenhoek 93, 335–346.
Sharp PM, and Li WH. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24, 28–38.
Sharp PM, and Li WH. (1987). The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.
Storf S, Pfeiffer F, Dilks K, Chen ZQ, and Imam S. (2010). Mutational and bioinformatics analysis of haloarchaeal lipobox-containing proteins. Archaea 2010, 1–11.
Tatusov RL, Fedorova ND, Jackson JD, et al. (2003). The COG database: An updated version includes eukaryotes. BMC Bioinformatics 4, 41.
Tjalsma H, Antelmann H, Jongbloed JD, et al. (2004). Proteomics of protein secretion by Bacillus subtilis: Separating the "secrets" of the secretome. Microbiol Mol Biol Rev 68, 207–233.
Vert JP. (2002). Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings. Proc Pacific Sympos Biocomput Citeseer, pp. 649–660.
Von Heijne G. (1984). How signal sequences maintain cleavage specificity. J Mol Biol 173, 243–251.
Von Heijne G. (1989). The structure of signal peptides from bacterial lipoproteins. Protein Eng 2, 531–534.
Von Heijne G. (1990a). Protein targeting signals. Curr Opin Cell Biol 2, 604.
Von Heijne G. (1990b). The signal peptide. J Membr Biol 115, 195–201.
Wright F. (1990). The 'effective number of codons' used in a gene. Gene 87, 23–29.
Wright F, and Bibb MJ. (1992). Codon usage in the G + C-rich Streptomyces genome. Gene 113, 55–65.
Wu G, Culley DE, and Zhang W. (2005). Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism. Microbiology 151, 2175–2187.
Zumla AI, and Grange J. (2002). Non-tuberculous mycobacterial pulmonary infections. Clin Chest Med 23, 369–376.

Address correspondence to:
Arnab Sen
Bioinformatics Facility
Department of Botany
University of North Bengal
Siliguri 734013
India

E-mail: [email protected]

Characterization of actino-haemoglobins with reference to evolution of plant truncated haemoglobins
Bhattacharya, S., Tisa, L. S., & Sen, A. (2017)
(Abstract of communicated paper)

Plant haemoglobins (pHb) constitute a diverse group of haem proteins and evolutionarily belong to three different classes - symbiotic, non-symbiotic and truncated. Since truncated pHbs have a 2-on-2 structure, they are structurally different from the other two haemoglobin groups. Here, we carried out an in silico analysis of different types of pHbs along with actinobacterial haemoglobins (bHb) in order to elucidate the inherent functional diversity and the underlying evolutionary mechanism among intra- and inter-specific members. To analyze functional divergence, we assessed land pHbs along with bHbs and examined altered evolutionary rates among all types of member proteins. Our assessment revealed that plant truncated haemoglobin (ptHb) has functionally diverged from the other pHbs (class I & II), while sharing some properties with non-symbiotic (ns) class I Hb. A percentile calculation of codon usage amongst bHbs and a phylogenetic approach were undertaken to understand the ancient lineage of the protein family. The backbone structures of pHbs (including truncated Hb) and bHbs were totally different, but the side chain modifications of ptHbs and bHbs were much more similar. This result supports the hypothesis that ptHbs have a structural arrangement similar to that of bHbs. However, the following evidence - the similar backbone structural arrangements of all pHbs, the similar side chain arrangements of ptHbs and bHbs, and several similarities between ptHbs and class I nsHbs - suggests that divergence may have occurred between plant and bacterial haemoglobin proteins, with ptHbs occupying an intermediate, linking position between them.