Isolation and Characterization of Cellulase-Producing Microorganisms in the Red Sea

Dissertation by

Siham Kamal Fatani

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

King Abdullah University of Science and Technology, Thuwal,

Kingdom of Saudi Arabia

September, 2019

Examination Committee Page

The Dissertation of Siham Fatani is Examined by the Committee Members

Committee Chairperson: Prof. Takashi Gojobori

Committee Members: Prof. Vladimir Bajic, Prof. Susana Agusti, Prof. Shugo Watabe

© September, 2019

Siham Kamal Fatani

All Rights Reserved

ACKNOWLEDGMENTS

This work is the result of great help and guidance from many people: faculty, family and friends. I am truly happy to have had these people by my side while undertaking my PhD dissertation.

First, I would like to express my profound gratitude and respect to my supervisor, Prof. Takashi Gojobori, Distinguished Professor of Bioscience and Associate Director of the Computational Bioscience Research Center, for his professional guidance and for his regular encouragement and motivation at various stages of this work. I would also like to thank Dr. Katsuhiko Mineta for his support and advice during my research. Moreover, I would like to express my deepest appreciation to Dr. Yoshimoto Saito for his assistance and suggestions throughout my project. I also thank Mr. Mohammad Al-Arawi for his support and technical advice. Without them, I could not have completed this work.

In addition, I would like to thank my committee members, Prof. Vladimir Bajic, Prof. Susana Agusti and Prof. Shugo Watabe, for giving their time to review my Ph.D. thesis and for offering their insight and suggestions.

Finally, I would like to thank my family, who always encouraged me to pursue higher education and pushed me to do my best during my Ph.D.; my KAUSTian friends, who made the five-year journey at KAUST truly enjoyable; and the CGG lab members, who always supported me.

ABSTRACT

Isolation and Characterization of Cellulase-Producing Microorganisms in the Red Sea

Siham Kamal Fatani

Cellulase-producing microorganisms are considered key players in the degradation of plant material in various environments, and they have been isolated from environments such as soils, mangroves and oceans. The Red Sea is a unique environment in terms of its high seawater temperature, high salinity and low nutrient levels. This study aims to examine whether the Red Sea is a potential resource for cellulase-producing microorganisms and cellulase genes.

First, I investigated the types of microbial cellulase genes in the Red Sea based on public metagenomic datasets. The analysis revealed 3,383 microbial cellulase genes, which were more abundant in shallow than in deep seawater and were classified into 16 sub-GH orthologous groups. These results suggest that the Red Sea environment is potentially an excellent gene resource of microbial cellulases due to its high diversity.

Next, cellulase-producing microorganisms were isolated from the Red Sea and screened. Three bacterial strains and one fungal strain were successfully obtained. The MLST analysis showed that the three bacterial strains belong to Bacillus paralicheniformis. The 18S rRNA of the fungal strain showed 99% similarity to Aspergillus ustus, and the enzymatic assay of the four strains showed high cellulase activity. These results suggest that these four isolates secrete active cellulases.

Next, I sought to identify the cellulase genes that actually function during cellulolysis by conducting comparative transcriptome analysis of the candidate genes, and I identified cellulase genes that are highly expressed during cellulolysis.

To my knowledge, this is the first attempt to determine which of the distinct cellulase genes on the genomes of microorganisms function during cellulolysis. The results showed that although all the candidate genes were generally upregulated, only a limited number of cellulase genes were highly expressed; these genes are expected to play a crucial role in cellulolysis. I also identified operon structures composed of genes including cellulases. This will provide information to elucidate the cellular mechanisms that accompany cellulolysis in the bacterial strains. The Red Sea can therefore be expected to be a potential resource for new cellulase genes applicable to industry. This information can be highly useful for bio-prospecting research on microbial cellulases in the Red Sea.

Table of Contents

Examination Committee Page
ACKNOWLEDGMENTS
ABSTRACT
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
Chapter 1. General Introduction
1.1. Cellulose and Cellulases
1.2. Cellulase-Producing Microorganisms
1.3. Carbohydrate-Active Enzymes Database (CAZy)
1.4. The Red Sea Metagenome
1.5. The Aim of This Study
Chapter 2. A Survey of the Red Sea Metagenome for Cellulase-Producing Microorganisms
2.1. Introduction
2.2. Material and Methods
2.2.1. Collection and Analysis of the Red Sea Metagenomes
2.2.2. Hierarchical Clustering Analysis
2.3. Results and Discussion
2.3.1 Relative Abundance and Composition of Cellulase Orthologues in the Red Sea
2.3.2 Hierarchical Clustering Analysis of Cellulase Orthologous Composition in Red Sea Metagenomes
2.4. Conclusion
Chapter 3. Isolation, Screening, and Characterization of Cellulase-Producing Microorganisms from the Red Sea
3.1. Introduction
3.2. Material and Methods
3.2.1 Collection of Samples from the Red Sea
3.2.2 Isolation and Screening of Cellulase-Producing Microorganisms
3.2.3 Taxonomy Identification
3.2.4 DNA Preparation for Whole Genome Sequencing of Bacterial Isolates
3.2.5 De novo Assembly of the Genome Sequencing Data
3.2.6 Genome Annotation
3.2.7 Multilocus Sequence Typing (MLST) Analysis of Bacterial Isolates
3.2.8 Preparation of Extracellular Cellulase Enzymes
3.2.9 Measuring the Cellulase Activity
3.3. Results and Discussion
3.3.1 Isolation of Cellulase-Producing Microorganisms from the Red Sea Samples
3.3.2. Taxonomic Identification of Cellulase-Active Strains
3.3.3 Whole Genome Sequencing and Assembly
3.3.4. Multilocus Sequence Typing (MLST) Analysis
3.3.5. Activity Assay
3.3.6. Conclusion
Chapter 4. Characterization of Cellulase Genes by Transcriptome Analysis
4.1. Introduction
4.2. Materials and Methods
4.2.1. RNA Extraction and Sequencing
4.2.2. De Novo Assembly of RNA-Seq
4.2.3. Expression Analysis of Cellulase Genes in the Isolates
4.2.4 Operon Structure Identification
4.3. Results and Discussion
4.3.1. A Search of Cellulase Gene from the Complete Genomes
4.3.2. Identification of Operon Structure
4.4 Conclusion
Chapter 5. General Discussion
References
APPENDIX

LIST OF ABBREVIATIONS

Abbreviation Description

BLAST Basic Local Alignment Search Tool

CAZy Carbohydrate-Active Enzymes

CMC Carboxymethyl Cellulose

CBM Carbohydrate-Binding Module

DNS 3,5-Dinitrosalicylic acid

EMBOSS European Molecular Biology Open Software Suite.

FPU Filter Paper Unit

FPA Filter Paper Assay

GH Glycoside Hydrolase

MLST Multilocus Sequence Typing

NM Nutrient Media

mg Milligram

μg Microgram

ml Milliliter

NCBI National Center for Biotechnology Information

Pfam Collection of protein families, each represented by multiple sequence alignments and hidden Markov models

PTS Phosphotransferase System

PacBio Pacific Biosciences

RNAseq Ribonucleic acid Sequencing

rpm Revolutions Per Minute

RPKM Reads Per Kilobase per Million

SRA Sequence Read Archive

TE Tris EDTA

U Unit

LIST OF FIGURES

Figure 1.1 Diagrammatic overview of cellulose hydrolysis by the cellulase system.
Figure 2.1 The relative abundance of microbial cellulase genes in the metagenome samples at all six depths of the Red Sea. The X-axis represents the different depths of the Red Sea and the Y-axis represents the relative abundance of microbial cellulase genes.
Figure 2.2 The abundance and diversity of cellulase ortholog genes in two representative locations of the shallow seawater.
Figure 2.3 The abundance and diversity of cellulase ortholog genes in two representative locations of the shallow seawater.
Figure 2.4 Hierarchical cluster analysis of Red Sea metagenomes of different depths.
Figure 2.5 The abundance of cellulase orthologs at 10 m depth in the Red Sea metagenome.
Figure 2.6 The abundance of cellulase orthologs at 25 m depth in the Red Sea metagenome.
Figure 2.7 The abundance of cellulase orthologs at 50 m depth in the Red Sea metagenome.
Figure 2.8 The abundance of cellulase orthologs at 100 m depth in the Red Sea metagenome.
Figure 2.9 The abundance of cellulase orthologs at 200 m depth in the Red Sea metagenome.
Figure 2.10 The abundance of cellulase orthologs at 500 m depth in the Red Sea metagenome.
Figure 3.1 Red Sea samples from the coastal region. (A) Plankton fraction, (B) Seawater, and (C) Seaweeds.
Figure 3.2 Screening of cellulase-producing microorganisms by Congo red on CMC agar plate. (A) Fungal isolate from plankton (PF1), (B) Bacterial isolate from plankton sample (PB4), (C and D) Bacterial isolates from the seaweeds (SB4 and SB5) and (E) E. coli as a negative control.
Figure 3.3 Neighbor-joining phylogenetic tree based on the 18S rRNA of strain PF1. The taxa clustered together in the bootstrap test (1000 replicates). The evolutionary distance bar shows the unit of the number of base substitutions per site.
Figure 3.4 Neighbor-joining phylogenetic tree based on MLST housekeeping genes in PB1, SB2, SB3 and other Bacillus strains. The value at each node represents the bootstrap value (1000 replicates). The evolutionary distance bar shows the unit of the number of base substitutions per site.
Figure 3.5 Filter paper assay. The degradation of the filter paper by the four strains. (A) negative control, (B) PF1, (C) PB1, (D) SB2 and (E) SB3.
Figure 3.6 Filter paper assay activity measurement of the four strains. (A) PF1, (B) PB1, (C) SB2, (D) SB3 and (E) activity measurements for the four isolates cultured in non-inducing media as a control.
Figure 4.1 Operon structure of cellulase genes in PB1 strain.
Figure 4.2 Operon structure of cellulase genes in SB2 strain.
Figure 4.3 Operon structure of cellulase genes in SB3 strain.

LIST OF TABLES

Table 2.1 Glycoside hydrolase families including cellulases in CAZy.
Table 2.2 Samples of the Red Sea metagenomes used in this study with different locations and depths.
Table 2.3 Number of cellulase orthologs detected from all the metagenome samples.
Table 2.4 The relative abundance of cellulase orthologs in 10 m metagenome samples.
Table 2.5 The relative abundance of cellulase orthologs in 25 m metagenome samples.
Table 2.6 The relative abundance of cellulase orthologs in 50 m metagenome samples.
Table 2.7 The relative abundance of cellulase orthologs in 100 m metagenome samples.
Table 2.8 The relative abundance of cellulase orthologs in 200 m metagenome samples.
Table 2.9 The relative abundance of cellulase orthologs in 500 m metagenome samples.
Table 3.1 Reference species genome number for multilocus sequence typing (MLST).
Table 3.2 Types of samples and the number of actual and active isolates.
Table 3.3 Isolates taxonomy based on 16S rRNA sequence.
Table 3.4 Whole genome size of the three bacterial isolates.
Table 4.1 Relative expression of cellulase genes in cellulase-inducing and non-inducing conditions of PB1. Highlighted values represent genes upregulated 10 times.
Table 4.2 Relative expression of cellulase genes in cellulase-inducing and non-inducing conditions of SB2. Highlighted values represent genes upregulated 10 times. NC stands for not calculated.
Table 4.3 Relative expression of cellulase genes in cellulase-inducing and non-inducing conditions of SB3. Highlighted values represent genes upregulated 10 times. NC stands for not calculated.
Table 4.4 Relative expression of cellulase genes in cellulase-inducing and non-inducing conditions of PF1. Highlighted values represent genes upregulated 10 times. NC stands for not calculated.
Table 4.5 Most upregulated cellulase genes and the abundance of gene expression in PB1.
Table 4.6 Most upregulated cellulase genes and the abundance of gene expression in SB2.
Table 4.7 Most upregulated cellulase genes and the abundance of gene expression in SB3.
Table 4.8 Most upregulated cellulase genes and the abundance of gene expression in PF1.
Table 4.9 Expression rate of genes forming operons with cellulase genes in PB1.
Table 4.10 Expression rate of genes forming operons with cellulase genes in SB2. NC stands for not calculated.
Table 4.11 Expression rate of genes forming operons with cellulase genes in SB3.

Chapter 1. General Introduction

1.1. Cellulose and Cellulases

Cellulose is the most abundant plant biomass and renewable bio-resource produced on earth (100 billion dry tons/year) (Zhang & Lynd 2004). It comprises around 35-50% of plant dry weight and is found in association with other lignocellulosic components such as hemicellulose and lignin, which make up 20-35% and 5-30% of plant dry weight, respectively (Behera et al. 2017). Cellulose is a linear homopolymeric chain of up to 10,000 D-glucose residues linked by β-1,4-glycosidic bonds (Teeri 1997; Fukuoka & Dhepe 2006).

Biodegradation of the β-1,4-glycosidic bonds in cellulose biomass is accomplished by enzymes such as cellulases and by a multi-enzyme complex called the cellulosome, which catalyze the hydrolysis of cellulose into sugars. These enzymes are produced by various microorganisms such as bacteria and fungi (Behera et al. 2017).

All cellulases are classified as glycoside hydrolase (GH) orthologues and have two common types of cellulase active site. One is an endocellulolytic active site (endo-β-1,4-D-glucanase, EC 3.2.1.4), which binds anywhere along the cellulose fiber. The other is an exocellulolytic active site (cellobiohydrolase, EC 3.2.1.91), which binds at the ends of a cellulose chain (usually 2-4 residues) and produces unit-length products (Davies & Henrissat 1995). The cellobiose units produced are then hydrolyzed to glucose from the non-reducing end by β-glucosidase (EC 3.2.1.21) (Figure 1.1).

Around 60% of cellulase enzymes are multi-domain proteins. In addition to the catalytic domain, they have accessory domains such as carbohydrate-binding modules (CBM) that help the cellulases bind to cellulose fibers (Sadhu & Maiti 2013).

The glucose monomers then undergo fermentation to yield products such as methane and ethanol, which can be used for bio-fuel production (Sukharnikov et al. 2011). Moreover, because cellulases can be used in a wide range of applications, they are now used in industries such as food and beverage, pulp and paper, textile, animal feed, detergent and agriculture (Kuhad et al. 2011).

[Figure 1.1 diagram: the polysaccharide (cellulose) is cleaved by endoglucanase and exoglucanase into the disaccharide (cellobiose), which β-glucosidase converts into the monosaccharide (glucose).]

Figure 1.1 Diagrammatic overview of cellulose hydrolysis by the cellulase system.

1.2. Cellulase-Producing Microorganisms

Various bacterial and fungal strains secrete cellulase enzymes to utilize cellulosic materials as a carbon source. A large number of microbial cellulases have so far been isolated, classified and characterized from different environments such as soils, sediments, oceans and animal excreta (Wang et al. 2009). For example, Acetivibrio, Bacillus, Bacteroides, Cellulomonas, Thermomonospora, Ruminococcus and Erwinia species have been reported as cellulase-producing bacterial strains (Robson & Chambliss 1989), and Aspergillus, Penicillium and Fusarium species have been reported as cellulase-producing fungal strains (Galante et al. 2014).

Marine microorganisms that produce cellulase enzymes might offer several features and advantages over cellulase-producing microorganisms isolated from other environments (Das et al. 2006). There are many reports on the isolation of such strains from marine environments. For example, Samira et al. (2011) isolated cellulase-active strains of Streptomyces variabilis, Kocuria rosea and Stenotrophomonas maltophilia from the Persian Gulf. To our knowledge, there is no report regarding microbial cellulases from the Red Sea environment.

1.3. Carbohydrate-Active Enzymes Database (CAZy)

The Carbohydrate-Active Enzymes (CAZy) database (Cantarel et al. 2008) provides a classification of enzymes based on sequence comparison, in which glycoside hydrolases are divided into 153 GH orthologues. Following this classification, all known cellulases fall into 16 GH orthologues corresponding to two Enzyme Commission (EC) numbers: EC 3.2.1.4 (endoglucanase) and EC 3.2.1.91 (cellobiohydrolase). The GH5 and GH9 orthologues have the largest numbers of biochemically characterized cellulase sequences (Sukharnikov et al. 2011).

Many studies have reported cellulases belonging to different GH orthologues. Crennell et al. (2002) reported a cellulase of the GH12 orthologues from Rhodothermus marinus. Malburg et al. (1997) reported endoglucanase F of the GH51 orthologues from Fibrobacter succinogenes S85. A GH7 cellulase with endo-β-1,4-glucanase and cellobiohydrolase activities has been reported from Fusicoccum sp. (Kanokratana et al. 2008). Cellulase Egl-257 from Bacillus circulans belongs to GH9 (Hakamada et al. 2002). Cel44A from Clostridium thermocellum, which belongs to GH44, has also been reported as an active cellulase (Kitago et al. 2007).

1.4. The Red Sea Metagenome

The Red Sea environment is unique among the seas and oceans on earth, with high salinity, low nutrient concentrations and high temperature. It is described as an oligotrophic environment that encompasses some of the warmest and saltiest waters in the world, with year-round high UV radiation (Behzad et al. 2016a). These characteristics are thought to have shaped the evolution and diversity of microbial life in the Red Sea. Ngugi et al. (2012) revealed that the community structure of bacteria and pico-eukaryotic plankton in Red Sea surface water is quite different from that of other marine environments, and that many of these organisms can apparently adapt to harsher conditions and higher temperatures. They also stated that numerous species can be found only in the Red Sea environment. However, cellulolytic microorganisms and cellulases in the Red Sea have remained unstudied.

The metagenomic approach has advanced the field of microbial ecology by revealing a great diversity of unknown microbial life in different environments. The huge amount of data generated has allowed the identification of a large number of microbial genes, their community interactions, their adaptation mechanisms and their potential use in different biotechnological and industrial applications (Behzad et al. 2016a). So far, genes involved in DNA repair, high-intensity light responses and osmoregulation have been found in the Red Sea metagenomic databases (Behzad et al. 2016b), suggesting environmental adaptation of the Red Sea microbiota.

1.5. The Aim of This Study

The aim of this study is to understand cellulase-producing microorganisms and their cellulase genes by isolating and characterizing them from the Red Sea.

This thesis is composed of five chapters. In Chapter 1, I summarize the background on cellulose, cellulase enzymes and cellulase-producing microorganisms. In Chapter 2, I survey microbial cellulases in Red Sea metagenomes to reveal whether the Red Sea is a possible resource of cellulases. In Chapter 3, I isolate cellulase-producing microorganisms from the Red Sea and characterize their cellulase activities. In Chapter 4, I conduct comparative transcriptome analysis to determine the cellulase genes that actually function during cellulolysis. In addition, I identify the types of genes that are co-expressed and form operons with the highly expressed cellulase genes. Finally, in Chapter 5, I combine the results from Chapters 2, 3 and 4 to discuss the significance of my study and its possible contribution to industry.

Chapter 2. A Survey of the Red Sea Metagenome for Cellulase-Producing Microorganisms

2.1. Introduction

As mentioned in Chapter 1, metagenomic sequencing is a technique that provides high-throughput information on the genome sequences of microorganisms living in an environment, including unculturable microbial species (Vartoukian et al. 2010). A metagenomic survey of environments where plant materials are effectively utilized and decomposed is the most promising way to find novel microbial cellulases that carry out cellulolysis more efficiently than currently available cellulases (Sukharnikov et al. 2011). Many metagenomic datasets, called “metagenomes”, have recently been generated from various environments such as insect guts (Warnecke et al. 2007), freshwater (Debroas et al. 2009), the ocean (Yooseph et al. 2007), mammals (Brulc et al. 2009) and the human intestine (Qin et al. 2010), and these can be used to find cellulase-producing microorganisms.

Cellulases are classified as glycoside hydrolases (GH), a large group of enzymes that catalyze the hydrolysis of glycosidic bonds. CAZy is a database containing approximately 660,000 GH sequences (Cantarel et al. 2008). In CAZy, these GH sequences are further classified into 153 orthologous groups (GH1-GH153) based on their characteristic functional domains. Cellulase enzymes are classified into 16 sub-GH orthologous groups (Table 2.1). These GH functional domains have different enzymatic activities for cellulose depolymerization, such as β-glucosidase, endoglucanase and cellobiohydrolase (Table 2.1) (Sukharnikov et al. 2011; Berlemont & Martiny 2013), suggesting that microorganisms may use different cellulase orthologues in various environments.

The Red Sea has a number of interesting features, such as warm seawater and high salinity. It is reasonable to speculate that novel cellulase genes can be found in such a seawater environment. To our knowledge, however, microbial cellulases and their gene products have not yet been studied in the Red Sea metagenomes.

The aim of this chapter is to reveal the abundance, diversity and distribution of microbial cellulases in the Red Sea environment. In this chapter, I conducted a comprehensive survey of microbial cellulase genes in 42 publicly available Red Sea whole-genome shotgun metagenomes. In addition, I classified the cellulases obtained based on their functional domains and investigated the differences in cellulase orthologues among different environments (i.e., locations and depths).

Table 2.1 Glycoside hydrolase families including cellulases in CAZy

GH family | Pfam ID | Enzyme name (Activity)
GH1 | PF00232.17 | β-glucosidase; β-galactosidase; β-; β-glucuronidase; β-xylosidase; β-D-; phlorizin hydrolase; exo-β-1,4-glucanase; 6-phospho-β-galactosidase; 6-phospho-β-glucosidase; strictosidine β-glucosidase; amygdalin β-glucosidase; prunasin β-glucosidase; vicianin hydrolase; raucaffricine β-glucosidase; thioglucosidase; β-primeverosidase; isoflavonoid 7-O-β-apiosyl-β-glucosidase; ABA-specific β-glucosidase; DIMBOA β-glucosidase; β-glycosidase; hydroxyisourate hydrolase
GH3 | PF00933.20 | β-glucosidase; xylan 1,4-β-xylosidase; β-; β-N-acetylhexosaminidase; α-L-arabinofuranosidase; glucan 1,3-β-glucosidase; glucan 1,4-β-glucosidase; isoprimeverose-producing oligoxyloglucan hydrolase; coniferin β-glucosidase; exo-1,3-1,4-glucanase; β-N-acetylglucosaminide phosphorylases
GH5 | PF00150.17 | Endo-β-1,4-glucanase / cellulase; endo-β-1,4-; β-glucosidase; β-mannosidase; β-glucosylceramidase; glucan β-1,3-glucosidase; licheninase; exo-β-1,4-glucanase / cellodextrinase; glucan endo-1,6-β-glucosidase; mannan endo-β-1,4-mannosidase; cellulose β-1,4-cellobiosidase; steryl β-glucosidase; endoglycoceramidase; β-primeverosidase; xyloglucan-specific endo-β-1,4-glucanase; endo-β-1,6-galactanase; hesperidin 6-O-α-L-rhamnosyl-β-glucosidase; β-1,3-mannanase; arabinoxylan-specific endo-β-1,4-xylanase; mannan transglycosylase
GH6 | PF01341.16 | Endoglucanase; cellobiohydrolase
GH7 | PF00840.19 | Endo-β-1,4-glucanase; reducing end-acting cellobiohydrolase; chitosanase; endo-β-1,3-1,4-glucanase
GH8 | PF01270.16 | Chitosanase; cellulase; licheninase; endo-1,4-β-xylanase; reducing-end-xylose releasing exo-oligoxylanase
GH9 | PF00759.18 | Endoglucanase; endo-β-1,3(4)-glucanase / -laminarinase; β-glucosidase; lichenase / endo-β-1,3-1,4-glucanase; exo-β-1,4-glucanase / cellodextrinase; cellobiohydrolase; xyloglucan-specific endo-β-1,4-glucanase / endo-xyloglucanase; exo-β-glucosaminidase
GH12 | PF01670.15 | Endoglucanase; xyloglucan hydrolase; β-1,3-1,4-glucanase; xyloglucan endotransglycosylase
GH26 | PF02156.14 | β-mannanase; exo-β-1,4-mannobiohydrolase; β-1,3-xylanase; lichenase / endo-β-1,3-1,4-glucanase; mannobiose-producing exo-β-mannanase
GH30 | PF02055.15 | Endo-β-1,4-xylanase; β-glucosidase; β-glucuronidase; β-xylosidase; β-fucosidase; glucosylceramidase; β-1,6-glucanase; glucuronoarabinoxylan endo-β-1,4-xylanase; endo-β-1,6-galactanase; [reducing end] β-xylosidase
GH44 | PF12891.6 | Endoglucanase; xyloglucanase
GH45 | PF02015.15 | Endoglucanase
GH48 | PF02011.14 | Reducing end-acting cellobiohydrolase; endo-β-1,4-glucanase
GH51 | No Pfam ID | Endoglucanase; endo-β-1,4-xylanase; β-xylosidase; α-L-arabinofuranosidase
GH61 | PF03443 | Copper-dependent lytic polysaccharide monooxygenases (LPMOs) that catalyze oxidative cleavage of cellulose
GH74 | No Pfam ID | Endoglucanase; oligoxyloglucan reducing end-specific cellobiohydrolase; xyloglucanase

** Red, blue and green letters indicate β-glucosidase, endoglucanase and cellobiohydrolase activities, respectively.

2.2. Material and Methods

2.2.1. Collection and Analysis of the Red Sea Metagenomes

Forty-two metagenome datasets from the Red Sea were retrieved from the Sequence Read Archive (SRA) at NCBI (http://www.ncbi.nlm.nih.gov/sra/?term=SAR) (Table 2.2). The SRA files obtained were converted to FASTQ format using the fastq-dump program in the SRA Toolkit (Oki et al. 2014). Trimming and quality control were performed using prinseq-lite.pl (Schmieder & Edwards 2011a). The trimming parameters were set as follows: removal of low-quality sequence from the 3' end of each read (Phred score threshold of 30), removal of ambiguous nucleotides (at most 1% of nucleotides allowed), removal of sequences with a mean Phred score lower than 30, and removal of short sequences (minimum length of 30 nucleotides) (Vartoukian et al. 2010). The resulting reads were assembled using A5-miseq, an updated pipeline for assembling microbial genomes from Illumina MiSeq data (Coil et al. 2015).
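The read-filtering criteria listed above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds and not the prinseq-lite.pl implementation: the function name `filter_read` and its parameters are hypothetical, the ambiguity rule is read here as "discard reads with more than 1% N", and the order and exact behavior of the real tool's trimming steps may differ.

```python
def filter_read(seq, quals, min_q=30, min_len=30, max_n_frac=0.01):
    """Apply the stated QC thresholds to one read (hypothetical helper).

    seq   -- nucleotide string
    quals -- list of per-base Phred scores, same length as seq
    Returns the trimmed (seq, quals) pair, or None if the read is discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")

    # 1. Trim low-quality bases from the 3' end (Phred < min_q).
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # 2. Discard reads that are too short after trimming.
    if len(seq) < min_len:
        return None

    # 3. Discard reads with too many ambiguous nucleotides (N).
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return None

    # 4. Discard reads whose mean Phred score is below the threshold.
    if sum(quals) / len(quals) < min_q:
        return None

    return seq, quals


# Illustrative read: 40 good bases followed by 5 low-quality bases.
example = filter_read("ACGT" * 10 + "ACGTA", [35] * 40 + [10] * 5)
```

In this made-up example the five trailing low-quality bases are trimmed and the remaining 40-base read is kept.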

Genes on the contigs were predicted using the MetaGene software (http://metagene.nig.ac.jp/metagene/metagene.html). The predicted gene sequences were translated into amino acid sequences with transeq in the European Molecular Biology Open Software Suite (EMBOSS) package (Rice et al. 2000). The sequences were then annotated against the Pfam database (Pfam-A, http://pfam.xfam.org/) using hmmscan in the HMMER 3 package (http://hmmer.org/) (Eddy 1998). Sequences annotated with the 16 Pfam IDs shown in Table 2.1 were extracted as candidate cellulase sequences. The relative abundance of cellulase orthologues in each metagenome sample was obtained by counting the short reads mapped onto the contigs, using the sort.bam files generated by the A5-miseq pipeline, and normalizing the counts to read numbers per 1,000,000 reads.
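The extraction and normalization steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the helper `relative_abundance`, its input format (pairs of Pfam accession and mapped read count per predicted gene) and the example numbers are all assumptions. Only the Pfam accessions, taken from Table 2.1 with their version suffixes dropped, and the per-1,000,000-reads normalization come from the text.

```python
from collections import Counter

# Pfam accessions listed in Table 2.1 (version suffix dropped), mapped to GH family.
PFAM_TO_GH = {
    "PF00232": "GH1", "PF00933": "GH3", "PF00150": "GH5", "PF01341": "GH6",
    "PF00840": "GH7", "PF01270": "GH8", "PF00759": "GH9", "PF01670": "GH12",
    "PF02156": "GH26", "PF02055": "GH30", "PF12891": "GH44", "PF02015": "GH45",
    "PF02011": "GH48", "PF03443": "GH61",
}


def relative_abundance(gene_hits, total_reads):
    """Sum mapped reads per GH family and scale to reads per 1,000,000 reads.

    gene_hits   -- iterable of (pfam_accession, mapped_read_count) pairs,
                   one per annotated gene (hypothetical input format)
    total_reads -- total number of reads in the metagenome sample
    """
    counts = Counter()
    for accession, n_reads in gene_hits:
        family = PFAM_TO_GH.get(accession.split(".")[0])
        if family is not None:  # keep candidate cellulase families only
            counts[family] += n_reads
    return {fam: n * 1_000_000 / total_reads for fam, n in counts.items()}


# Made-up hits for illustration; PF00005 is not in Table 2.1 and is ignored.
hits = [("PF00232.17", 120), ("PF00150.17", 30),
        ("PF00232.17", 80), ("PF00005.27", 500)]
abundance = relative_abundance(hits, total_reads=2_000_000)
```

With these illustrative inputs, the two GH1 genes contribute 200 mapped reads out of 2,000,000, which gives 100 reads per million for GH1.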

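Table 2.2 Samples of the Red Sea metagenomes used in this study with different locations and depths

Geographical location: Saudi Arabia: Red Sea

SRR accession  Depth (m)  Latitude  Longitude
SRR2103027     10         27.53 N   34.30 E
SRR2103029     25         27.53 N   34.30 E
SRR2103030     50         27.53 N   34.30 E
SRR2103031     100        27.53 N   34.30 E
SRR2103032     200        27.53 N   34.30 E
SRR2103033     500        27.53 N   34.30 E
SRR2103021     10         25.46 N   36.6 E
SRR2103022     25         25.46 N   36.6 E
SRR2103023     50         25.46 N   36.6 E
SRR2103024     100        25.46 N   36.6 E
SRR2103025     200        25.46 N   36.6 E
SRR2103026     500        25.46 N   36.6 E
SRR2103015     25         23.36 N   37.3 E
SRR2103016     50         23.36 N   37.3 E
SRR2103017     100        23.36 N   37.3 E
SRR2103018     200        23.36 N   37.3 E
SRR2103019     500        23.36 N   37.3 E
SRR2103008     10         22.2 N    37.55 E
SRR2103009     25         22.2 N    37.55 E
SRR2103010     50         22.2 N    37.55 E
SRR2103012     200        22.2 N    37.55 E
SRR2103001     10         20.31 N   38.46 E
SRR2103002     25         20.31 N   38.46 E
SRR2103003     50         20.31 N   38.46 E
SRR2103004     100        20.31 N   38.46 E
SRR2103005     200        20.31 N   38.46 E
SRR2103007     500        20.31 N   38.46 E
SRR2103038     10         18.34 N   40.44 E
SRR2102996     25         18.34 N   40.44 E
SRR2102997     50         18.34 N   40.44 E
SRR2102998     100        18.34 N   40.44 E
SRR2102999     200        18.34 N   40.44 E
SRR2103000     258        18.34 N   40.44 E
SRR2103017     10         17.59 N   39.47 E
SRR2103028     25         17.59 N   39.47 E
SRR2103034     50         17.59 N   39.47 E
SRR2103035     100        17.59 N   39.47 E
SRR2103036     200        17.59 N   39.47 E
SRR2103037     500        17.59 N   39.47 E
SRR2102994     10         17.39 N   40.54 E
SRR2102995     25         17.39 N   40.54 E
SRR2103006     50         17.39 N   40.54 E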

2.2.2. Hierarchical Clustering Analysis

The data matrix was constructed by combining vectors composed of the relative abundances of cellulase orthologues in each metagenome sample in this study. Distances were calculated by the Euclidean method with the dist function in R 3.1.1 (RStudio Version 0.99.486, © 2009-2015 RStudio, Inc.), and the clustering was conducted with the hclust function in R using Ward's method (ward.D2).
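The clustering procedure can be illustrated with a small pure-Python sketch of agglomerative clustering under Ward's criterion on Euclidean distances. It is not the R code used in this study (there the analysis was run with dist and hclust); the function `ward_cluster` is hypothetical, and the reported merge height, sqrt(2 x increase in within-cluster sum of squares), is intended to correspond to the ward.D2 convention. The three example profiles are the GH1, GH3 and GH5 values of three 10 m samples from Table 2.4.

```python
import math
from itertools import combinations


def ward_cluster(profiles):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    profiles -- dict mapping sample name to a list of relative abundances
    Returns a list of merges (members_a, members_b, height) in merge order.
    """
    # Each cluster is stored as (member names, centroid vector).
    clusters = {i: ([name], list(vec))
                for i, (name, vec) in enumerate(profiles.items())}
    next_id = len(clusters)
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(list(clusters), 2):
            (ma, ca), (mb, cb) = clusters[a], clusters[b]
            na, nb = len(ma), len(mb)
            sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
            # Increase in within-cluster sum of squares if a and b merge.
            cost = na * nb / (na + nb) * sq_dist
            if best is None or cost < best[0]:
                best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        next_id += 1
        # For two single samples this height equals their Euclidean distance.
        merges.append((ma, mb, math.sqrt(2 * cost)))
    return merges


# GH1, GH3 and GH5 relative abundances of three 10 m samples (Table 2.4).
profiles = {
    "SRR2102994": [0.00128162, 0.00709615, 0.00026204],
    "SRR2103001": [0.00084221, 0.00394056, 0.00022263],
    "SRR2103027": [0.03692978, 0.16855872, 0.03451342],
}
merges = ward_cluster(profiles)
```

In this small example SRR2102994 and SRR2103001 are merged first, and SRR2103027, whose abundances are an order of magnitude higher, joins last.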

2.3. Results and Discussion

2.3.1 Relative Abundance and Composition of Cellulase Orthologues in the Red Sea

To obtain the nucleotide sequences of microbial cellulase genes present in the Red Sea, I conducted a large-scale survey of cellulase genes using 42 metagenome samples from different locations and depths (Table 2.2). Cellulase genes were detected in all of the metagenomes tested, and the total number of detected sequences was 3,383 (Table 2.3). However, the abundance of cellulase genes (i.e., the sum of the relative abundances of the cellulase orthologues detected in each metagenome sample, averaged over the samples) varied greatly among locations and depths (Tables 2.4-2.9). In particular, the abundance of cellulase genes was much higher in the metagenomes from 10 m depth than in those from greater depths (Figure 2.1). A plausible explanation is that more photosynthetic organisms such as algae, whose bodies contain cellulose, live at shallow depths in marine environments. On the other hand, the 500 m depth also had a relatively larger number of cellulase orthologs than the other depths apart from 10 m.

Table 2.3 Number of cellulase orthologs detected from all the metagenome samples

Metagenome   GH1  GH3  GH5  GH6  GH7  GH8  GH9 GH12 GH26 GH30 GH44 GH45 GH48 GH61
SRR2103027    51   62   15    4    3    0    1    1    2    2    3    1    0    3
SRR2103029    25   28   11    6    2    2    0    0    2    2    2    2    1    0
SRR2103030    34   58   17    5    2    2    1    0    2    2    2    1    1    0
SRR2103031    18   20    8    7    0    0    0    0    1    2    0    1    1    0
SRR2103032    12   16    9    5    0    0    0    0    2    1    1    1    1    1
SRR2103033    13   38   17    5    1    0    1    0    3    2    0    1    2    0
SRR2103021    29   81   25    3    5    2    0    0    1    6    0    1    1    1
SRR2103022    36   89   20    7    1    1    2    0    2    2    0    1    1    3
SRR2103023    33   49    0    5    2    2    0    0    0    1    4    0    1    3
SRR2103024    31   41    9    4    1    3    0    0    2    1    0    1    0    0
SRR2103025    10   20   19    7    1    1    0    1    4    3    0    1    0    0
SRR2103026    14   52   13    5    1    1    8    1    3    2    2    1    1    0
SRR2103015    25   41    8    5    0    1    0    0    1    4    1    1    2    1
SRR2103016    34   41    8    5    1    1    0    0    2    5    0    2    2    3
SRR2103017    34   34   10    0    2    0    0    0    0    0    0    1    1    0
SRR2103018    13    8    3    4    1    0    1    0    0    1    0    1   11    0
SRR2103019    12   26   16   10    1    0    2    3    2    3    0    1    1    0
SRR2103008    29   44    9    5    0    0    0    0    2    3    0    0    1    0
SRR2103009    23   47    6    2    0    0    0    1    0    0    0    0    1    0
SRR2103010    26   46   12    5    0    0    0    0    5    1    2    1    1    0
SRR2103012     4   19   10    1    0    2    1    0    1    2    1    2    1    1
SRR2103001     9   29    2    0    0    2    0    0    0    2    2    2    1    0
SRR2103002    18   19   10    0    2    0    0    0    0    0    0    1    1    0
SRR2103003    14   17    4    0    0    3    0    0    1    0    0   11    2    0
SRR2103004    21   37   16    5    0    1    2    0    2    4    1    1    2    0
SRR2103005     8   31   12    0    0    2    4    0    0    0    0    1    2    0
SRR2103007    10   35   22    1    0    6    4    0    1    4    0    1    0    0
SRR2103038    33   48   17    2    4    2    0    2    0    3    1    1    1    1
SRR2102996    18   23    0    0    2    1    0    1    2    1    0    0    0    0
SRR2102997    10   25    0    0    0    1    0    1    0    1    1    0    2    0
SRR2102998     7   27    7    1    0    2    1    0    2    0    0    1    1    0
SRR2102999    15   39   12    3    1    4    6    0    0    0    0    1    1    1
SRR2103000    18   30   17    1    0    5    3    0    2    0    0    1    1    0
SRR2103017    34   34   10    0    2    0    0    0    0    0    0    1    0    0
SRR2103028    30   27    7    0    2    0    0    1    0    1    2    0    0    0
SRR2103034    23   32    8    1    1    2    0    0    0    2    0    0    1    0
SRR2103035     9   17   13    2    0    0    0    0    1    1    0    1    1    2
SRR2103036     6   20    9    2    0    1    4    0    0    1    0    1    1    0
SRR2103037    12   27   23    1    0    6    4    0    1    0    0    1    1    0
SRR2102994    22   31    6    0    5    1    0    0    0    0    2    0    0    0
SRR2102995    39   54   16    0    6    1    1    0    2    1    3    0    0    1
SRR2103006    11   35    4    0    3    0    0    1    0    1    0    0    1    0
TOTAL        873 1497  460  119   52   58   46   13   51   67   30   46   50   21

Figure 2.1 The relative abundance of microbial cellulase genes in the metagenome samples at the six depths of the Red Sea. The X-axis represents the depth and the Y-axis represents the relative abundance of microbial cellulase genes.

Table 2.4 The relative abundance of cellulase orthologs in 10 m metagenome samples

GH group  SRR2102994  SRR2103001  SRR2103008  SRR2103017  SRR2103021  SRR2103027  SRR2103038
Depth     10 m        10 m        10 m        10 m        10 m        10 m        10 m
GH1       0.00128162  0.00084221  0.00239303  0.00092809  0.00171472  0.03692978  0.00142392
GH3       0.00709615  0.00394056  0.04266848  0.00244232  0.05073702  0.16855872  0.00376035
GH5       0.00026204  0.00022263  0.00349899  0.00047501  0.00212533  0.03451342  0.00021356
GH6       0.00000000  0.00000000  0.00090341  0.00000000  0.00014929  0.00211808  0.00001527
GH7       0.00352370  0.00000000  0.00000000  0.00007465  0.00017044  0.03319493  0.00072667
GH8       0.00036356  0.00039354  0.00000000  0.00000000  0.00061253  0.00000000  0.00000459
GH9       0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000157  0.00000000
GH12      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.02252956  0.00110827
GH26      0.00000000  0.00000000  0.00030388  0.00000000  0.00003141  0.00000513  0.00000000
GH30      0.00000000  0.00000796  0.00027332  0.00000000  0.00005961  0.00017078  0.00000000
GH44      0.00014110  0.00002330  0.00000000  0.00000000  0.00000000  0.00089427  0.00000656
GH45      0.00000000  0.00000000  0.00000000  0.00000466  0.00000000  0.00000000  0.00000000
GH48      0.00000000  0.00000000  0.00119158  0.00021567  0.00876228  0.02696555  0.00000768
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00000000  0.00000000  0.00000000  0.00000000  0.00001980  0.00001552  0.00000225
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

Table 2.5 The relative abundance of cellulase orthologs in 25 m metagenome samples

GH group  SRR2102995  SRR2102996  SRR2103002  SRR2103009  SRR2103015  SRR2103022  SRR2103028  SRR2103029
Depth     25 m        25 m        25 m        25 m        25 m        25 m        25 m        25 m
GH1       0.00139359  0.00138800  0.00169661  0.00098630  0.00175904  0.00177950  0.00162020  0.00142843
GH3       0.00437896  0.00236222  0.00221410  0.00345228  0.00254485  0.00813019  0.00397081  0.00857295
GH5       0.00074770  0.00048656  0.00028408  0.00019410  0.00100585  0.00164265  0.00008176  0.00067030
GH6       0.00000000  0.00000000  0.00000000  0.00008073  0.00012344  0.00050191  0.00000000  0.00040531
GH7       0.00078686  0.00019388  0.00004078  0.00000000  0.00000000  0.00002266  0.00003256  0.00001307
GH8       0.00000878  0.00000997  0.00000000  0.00000000  0.00037836  0.00000157  0.00000000  0.00011932
GH9       0.00001446  0.00000000  0.00000000  0.00000000  0.00000000  0.00002188  0.00000000  0.00000000
GH12      0.00000000  0.00000390  0.00000000  0.00000138  0.00000000  0.00000000  0.00001179  0.00000000
GH26      0.00003572  0.00005626  0.00000000  0.00000000  0.00000282  0.00009113  0.00000000  0.00004649
GH30      0.00000422  0.00039555  0.00000000  0.00000000  0.00007683  0.00000196  0.00002445  0.00018937
GH44      0.00000422  0.00000000  0.00000000  0.00000000  0.00000353  0.00000000  0.00011226  0.00001007
GH45      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000621
GH48      0.00000000  0.00000000  0.00000000  0.00000000  0.00000240  0.00001725  0.00000000  0.00001457
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00001635  0.00000000  0.00000000  0.00000000  0.00000297  0.00000486  0.00000000  0.00000000
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

Table 2.6 The relative abundance of cellulase orthologs in 50 m metagenome samples

GH group  SRR2102997  SRR2103003  SRR2103006  SRR2103010  SRR2103016  SRR2103023  SRR2103030  SRR2103034
Depth     50 m        50 m        50 m        50 m        50 m        50 m        50 m        50 m
GH1       0.00095909  0.00105383  0.00111488  0.00094858  0.00239904  0.00172002  0.00477425  0.00100264
GH3       0.00308742  0.00101660  0.00396578  0.00186933  0.00208304  0.00860793  0.01094486  0.00263856
GH5       0.00008486  0.00001363  0.00062717  0.00018435  0.00073550  0.00147496  0.00108748  0.00078533
GH6       0.00000000  0.00000000  0.00000000  0.00000845  0.00032784  0.00001709  0.00340671  0.00000259
GH7       0.00000000  0.00000000  0.00018292  0.00000000  0.00000943  0.00030150  0.00001503  0.00014282
GH8       0.00003580  0.00000871  0.00000000  0.00000000  0.00057739  0.00062439  0.00043966  0.00001830
GH9       0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000617  0.00000000
GH12      0.00001061  0.00000000  0.00001206  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH26      0.00000000  0.00000927  0.00000000  0.00009989  0.00005066  0.00000000  0.00023024  0.00000000
GH30      0.00000486  0.00000000  0.00049056  0.00000197  0.00006721  0.00000146  0.00000530  0.00141052
GH44      0.00000508  0.00000000  0.00000000  0.00001551  0.00000000  0.00068564  0.00044900  0.00000000
GH45      0.00044507  0.00000000  0.00000319  0.00000378  0.00000888  0.00000000  0.00000000  0.00000000
GH48      0.00002033  0.00000000  0.00000000  0.00000000  0.00000000  0.00005205  0.00050134  0.00000000
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00000000  0.00000000  0.00000000  0.00000000  0.00001327  0.00000447  0.00000000  0.00000000
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

Table 2.7 The relative abundance of cellulase orthologs in 100 m metagenome samples

GH group  SRR2102994  SRR2102998  SRR2103004  SRR2103018  SRR2103024  SRR2103031  SRR2103035
Depth     100 m       100 m       100 m       100 m       100 m       100 m       100 m
GH1       0.00128162  0.00006762  0.00182664  0.00093810  0.00146263  0.00216494  0.00146250
GH3       0.00709615  0.00038734  0.00033182  0.00010534  0.00222912  0.00436494  0.00039405
GH5       0.00026204  0.00163028  0.00107867  0.00132520  0.00037156  0.00379710  0.00377528
GH6       0.00000000  0.00001011  0.00004120  0.00006356  0.00002811  0.00047240  0.00006937
GH7       0.00352370  0.00000000  0.00000000  0.00006106  0.00000449  0.00000000  0.00000000
GH8       0.00036356  0.00001836  0.00000255  0.00000000  0.00027408  0.00000000  0.00000000
GH9       0.00000000  0.00002454  0.00001417  0.00001071  0.00000000  0.00000000  0.00000000
GH12      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH26      0.00000000  0.00001442  0.00000806  0.00000000  0.00003460  0.00001049  0.00001026
GH30      0.00000000  0.00000000  0.00003416  0.00010106  0.00000516  0.00002252  0.00000944
GH44      0.00000000  0.00000000  0.00001948  0.00000000  0.00000000  0.00000000  0.00000000
GH45      0.00000000  0.00001161  0.00002651  0.00000000  0.00000000  0.00181359  0.00000000
GH48      0.00000000  0.00012587  0.00000000  0.00003642  0.00011487  0.00037669  0.00033145
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00029184
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

Table 2.8 The relative abundance of cellulase orthologs in 200 m metagenome samples

GH group  SRR2102999  SRR2103000  SRR2103005  SRR2103012  SRR2103019  SRR2103025  SRR2103032  SRR2103036
Depth     200 m       200 m       200 m       200 m       200 m       200 m       200 m       200 m
GH1       0.00140881  0.00071863  0.00007832  0.00014804  0.00069868  0.00045298  0.00066062  0.00020265
GH3       0.00063223  0.00054866  0.00036815  0.00046218  0.00114824  0.00114035  0.00225119  0.00092436
GH5       0.00196231  0.01553139  0.00086393  0.00310517  0.00089208  0.00144085  0.00083614  0.00278771
GH6       0.00002326  0.00000217  0.00000366  0.00000589  0.00064939  0.00071745  0.00677569  0.00002303
GH7       0.00000286  0.00000000  0.00000000  0.00000000  0.00000340  0.00000709  0.00000000  0.00000000
GH8       0.00001730  0.00004515  0.00002295  0.00002984  0.00000000  0.00002265  0.00000000  0.00000411
GH9       0.00003793  0.00005258  0.00003241  0.00000602  0.00150225  0.00000000  0.00000000  0.00005696
GH12      0.00000000  0.00000000  0.00000000  0.00000000  0.00001141  0.00000263  0.00000000  0.00000000
GH26      0.00000000  0.00001120  0.00003922  0.00000576  0.00008838  0.00043354  0.00111601  0.00000000
GH30      0.00000000  0.00000000  0.00000000  0.00000641  0.00038024  0.00008270  0.00000138  0.00002125
GH44      0.00000000  0.00000000  0.00000000  0.00000484  0.00000000  0.00000000  0.00002367  0.00000000
GH45      0.00014446  0.00027250  0.00000706  0.00010131  0.00000000  0.00000286  0.00000967  0.00000000
GH48      0.00003817  0.00027399  0.00000504  0.00038731  0.00012238  0.00015362  0.00027416  0.00047423
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00002958  0.00000000  0.00000000  0.00000366  0.00000000  0.00000000  0.00000328  0.00000000
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

Table 2.9 The relative abundance of cellulase orthologs in 500 m metagenome samples

GH group  SRR2103007  SRR2103020  SRR2103026  SRR2103033  SRR2103037
Depth     500 m       500 m       500 m       500 m       500 m
GH1       0.00009129  0.00281865  0.00593254  0.00302166  0.00420788
GH3       0.00063976  0.00507000  0.00885068  0.00264915  0.01365982
GH5       0.00202261  0.00380316  0.00399714  0.00498573  0.00942957
GH6       0.00000859  0.00292039  0.00498548  0.00123110  0.00000460
GH7       0.00000000  0.00000000  0.00000242  0.00000808  0.00000000
GH8       0.00001967  0.00000518  0.00000219  0.00000000  0.00119598
GH9       0.00001977  0.00003670  0.00006393  0.00000363  0.00027568
GH12      0.00000000  0.00000000  0.00000181  0.00000000  0.00000000
GH26      0.00000344  0.00094451  0.00157294  0.00088914  0.00000106
GH30      0.00237181  0.00166879  0.00289422  0.00107937  0.00000000
GH44      0.00000000  0.00000235  0.00000755  0.00000000  0.00000000
GH45      0.00042836  0.00041212  0.00015202  0.00002213  0.00152840
GH48      0.00005930  0.00009411  0.00008099  0.00043725  0.00051565
GH51      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH61      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000
GH74      0.00000000  0.00000000  0.00000000  0.00000000  0.00000000

To understand the diversity of cellulase genes in the Red Sea, I also examined the orthologous composition of cellulase genes in the metagenomic samples. Six to ten orthologous groups were generally detected in each metagenome sample tested (Figure 2.2). GH3 was the most dominant group, and GH1 was also highly represented in the orthologous composition of cellulases at the shallow depths of 10 m, 25 m and 50 m (Figure 2.3). On the other hand, the compositions of samples from the deep depths of 100 m, 200 m and 500 m differed largely from those of the shallow depths. The proportion of GH5 generally increased, and GH45 and GH48 also became dominant in the composition of cellulase orthologs (Figures 2.5-2.10 in the supplementary data).
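As a rough illustration of what an orthologous composition is, the sketch below turns per-family ortholog counts into proportions and reports the dominant family and the number of families detected. It assumes, for illustration only, that composition is computed from the raw counts of Table 2.3 (two rows are copied here); the thesis figures may instead be based on the normalized relative abundances, and the function names are hypothetical.

```python
# GH families in the column order of Table 2.3.
GH_FAMILIES = ["GH1", "GH3", "GH5", "GH6", "GH7", "GH8", "GH9",
               "GH12", "GH26", "GH30", "GH44", "GH45", "GH48", "GH61"]

# Two rows copied from Table 2.3 (ortholog counts per family).
COUNTS = {
    "SRR2103027": [51, 62, 15, 4, 3, 0, 1, 1, 2, 2, 3, 1, 0, 3],
    "SRR2103019": [12, 26, 16, 10, 1, 0, 2, 3, 2, 3, 0, 1, 1, 0],
}


def composition(counts):
    """Return each family's share of the sample's total ortholog count."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def dominant_family(sample_id):
    """Name of the GH family with the largest share in a sample."""
    shares = composition(COUNTS[sample_id])
    return GH_FAMILIES[shares.index(max(shares))]


for sample_id, counts in COUNTS.items():
    detected = sum(1 for c in counts if c > 0)  # families with at least one ortholog
    shares = [round(s, 3) for s in composition(counts)]
    print(sample_id, dominant_family(sample_id), detected, shares)
```

Working with proportions rather than counts makes samples with different sequencing depths comparable, which is presumably why the composition plots are drawn this way.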

Figure 2.2 The abundance and diversity of cellulase orthologous genes in two representative locations of the shallow seawater

Figure 2.3 The abundance and diversity of cellulase orthologous genes in two representative locations of the shallow seawater

2.3.2 Hierarchical Clustering Analysis of Cellulase Orthologous Composition in Red Sea Metagenomes

To confirm the difference in the orthologous composition of cellulases between the shallow and deep metagenomes described above, I conducted a hierarchical clustering analysis (Figure 2.4). The resulting clustering tree contained a large cluster of 16 metagenomes, most of which were obtained from samples at 10 m to 50 m depth. In addition, metagenome samples obtained from 100 m to 500 m depth tended to group in clusters separate from those of the shallow samples. These results suggest that the orthologous composition of cellulases differs at least between shallow and deep depths.
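The text does not state which distance measure or linkage rule was used for the clustering, so the following is only a minimal sketch under assumed choices: Bray-Curtis dissimilarity between profiles and average linkage. The four profiles are the GH1, GH3 and GH5 values of two 25 m samples (Table 2.5) and two 200 m samples (Table 2.8); restricting the profiles to three families is done purely to keep the example short.

```python
from itertools import combinations

# GH1, GH3, GH5 relative abundances copied from Tables 2.5 (25 m) and 2.8 (200 m).
PROFILES = {
    "SRR2102995": (0.00139359, 0.00437896, 0.00074770),  # 25 m
    "SRR2103029": (0.00142843, 0.00857295, 0.00067030),  # 25 m
    "SRR2103005": (0.00007832, 0.00036815, 0.00086393),  # 200 m
    "SRR2103012": (0.00014804, 0.00046218, 0.00310517),  # 200 m
}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two non-negative profiles."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den if den else 0.0


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample names.
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def dist(a, b):
        # Mean of all pairwise sample distances between the two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(bray_curtis(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


merges = average_linkage(PROFILES)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), round(d, 3))
```

A different metric (for example Euclidean distance) or linkage rule could change the tree, so the choices above should be replaced by whatever settings were actually used for Figure 2.4.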

Figure 2.4 Hierarchical cluster analysis of Red Sea metagenomes of different depths

2.4. Conclusion

Cellulase genes were detected in all the metagenomes tested, suggesting that microbial cellulase genes are present in most environments of the Red Sea. In addition, the detected cellulase genes were classified into 14 orthologous groups based on GH functional domains. The distribution of these orthologues appears to be determined by the depth of the environment.

GH functional domains are known to have their own catalytic characteristics. For example, GH3 and GH1 have been reported to have β-glucosidase activity, while GH5 is commonly known as an endoglucanase (endo-β-1,4-glucanase) (Duan & Feng 2010). Microorganisms may therefore use cellulase orthologues that act on different substrates, depending on whether they live at shallow or deep depths.

Thus, the results in this chapter strongly suggest that the Red Sea can be an excellent resource of cellulase genes in terms of both abundance and functional diversity.

Chapter 3. Isolation, Screening, and Characterization of Cellulase-Producing Microorganisms from the Red Sea

3.1. Introduction

In Chapter 2, I conducted a metagenomic survey of cellulase genes in microorganisms from different locations and different depths of the Red Sea environment. I then continued a search for potential cellulase-producing microorganisms with keen interest in discovering cellulase genes applicable in industry.

Several reports have been published on the isolation of cellulase-producing microorganisms obtained from various environments by culture-based experiments. Leis et al. (2015) and Nishida et al. (2011) reported the isolation of cellulase genes of the GH9 family from the Japanese purple sea urchin Strongylocentrotus nudus. El-Morsy (2000) isolated cellulase-producing fungi such as Aspergillus niger, Cladosporium cladosporioides and Penicillium chrysogenum from the mangrove environment of the Red Sea coast of Egypt. To our knowledge, however, no reports have been published on the isolation of microorganisms producing active cellulases from marine environments of the Red Sea.

The aim of this chapter is to isolate active cellulase-producing microorganisms from the Red Sea by culturing and screening, and to characterize their enzymatic activity.

3.2. Materials and Methods

3.2.1 Collection of Samples from the Red Sea

Marine samples were collected from the coastal region of the Red Sea at Thuwal, Saudi Arabia, on August 26 and September 30, 2015. The seawater samples were obtained from the sea surface at the site (22°17.444'N, 39°03.183'E) using a Niskin bottle. Seaweed samples were obtained at the KAUST coastal marina (22°18'16.7"N, 39°06'12.1"E), and plankton samples were collected by towing a net with a mesh size of 0.63 μm at 1 knot for ten minutes. All samples were kept in sterile tubes and stored at 4˚C until use (Figure 3.1).

Figure 3.1 Red Sea samples from the coastal region. (A) plankton fraction, (B) seawater, and (C) seaweeds.

3.2.2 Isolation and Screening of Cellulase-Producing Microorganisms

The isolation of cellulase-producing microorganisms followed the standard microbiological method of Kasana et al. (2008). The seawater and plankton fraction samples were vortexed for 15 min in sterilized 50 ml tubes, and the tubes were then left for 5 min to let the debris settle. The supernatants were collected and serially diluted from 10⁻¹ to 10⁻⁷ with sterilized water, and 100 μl of each diluted sample was used as an inoculum on nutrient media (NM) containing 0.3% beef extract, 0.5% peptone, 0.5% NaCl and 1.7% agar (Kasana et al. 2008). For the seaweeds, 1 g of seaweed was weighed and added to 10 ml sterilized water. One gram of glass beads (425-600 µm) was dipped in 1 M HCl for 1 hr and then rinsed with distilled water. The resulting acid-washed glass beads were added to the seaweed tubes, and the mixture was vortexed for 10 min. The tubes were left for 5 min to let the solids settle. The supernatant was collected and diluted for use as an inoculum; a 100 μl aliquot of each dilution was spread on the nutrient media and incubated at 30˚C for 48 hrs (Patagundi et al. 2014).
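The arithmetic behind the dilution series can be sketched as follows. The helper names are hypothetical, and the colony count in the usage line is an invented number for illustration, not a measurement from this study; only the 10⁻¹ to 10⁻⁷ range and the 100 μl plated volume come from the protocol above.

```python
def dilution_series(first_exp=-1, last_exp=-7):
    """Tenfold dilution factors from 10^first_exp down to 10^last_exp."""
    return [10.0 ** e for e in range(first_exp, last_exp - 1, -1)]


def cfu_per_ml(colonies, dilution_factor, plated_volume_ml=0.1):
    """Estimate colony-forming units per ml of the undiluted sample."""
    return colonies / (dilution_factor * plated_volume_ml)


factors = dilution_series()
print(len(factors), factors[0], factors[-1])

# Hypothetical plate: 42 colonies counted on the 10^-3 dilution, 100 µl plated.
print(cfu_per_ml(42, 1e-3))
```

Plating several dilutions ensures that at least one plate carries well-separated colonies that can be picked individually for the CMC screen.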

A carboxymethyl cellulose (CMC) plate assay was used to detect cellulase activity by observing the formation of a halo zone around cellulase-producing colonies. The colonies obtained on the NM media were inoculated on CMC media (0.02% peptone, 0.2% CMC, 0.2% K2HPO4, 0.05% MgSO4·7H2O, 0.2% NaNO3 and 1.7% agar) at pH 7.0. The plates were incubated at 30˚C for 72 hrs (Chantarasiri 2014). To detect cellulase-producing microorganisms, the plates were stained with 1 ml of 1% Congo red solution and incubated for 15 min. After the Congo red solution was removed, the plates were washed with 1% sodium hydroxide solution for 15 min. Strains forming clear halo zones on the CMC media were selected as cellulase-producing strains.

3.2.3 Taxonomy Identification

Bacterial and fungal genomic DNAs were extracted and purified for sequencing analysis following the protocol described by Nakada et al. (1994). In brief, the isolates were inoculated in nutrient broth (0.06 g beef extract, 0.1 g peptone and 0.1 g NaCl) and incubated for 48 hrs at 30˚C for the bacteria and at 28˚C for the fungal isolate. The cells were collected by centrifugation at 5000 rpm for 5 min at 4˚C. For the bacterial cells, the collected pellets were resuspended in extraction buffer [100 mM NaH2PO4 (pH 8), 100 mM EDTA, 100 mM NaCl, 500 mM Tris (pH 8), 10% (wt/vol) and 1% mercaptoethanol]. The fungal mycelia were collected by centrifugation under the same conditions and then separated from the media using filter paper. The mycelia were frozen with liquid nitrogen and ground to a fine powder with a pestle and mortar, and the powdered mycelia were resuspended in the extraction buffer.

The suspension was then heated at 70˚C for 30 min with shaking at 30 rpm, and DNA was extracted by the phenol/chloroform method. The DNA pellets obtained were dissolved in 20 μl TE buffer and stored at -20˚C. DNA concentrations were measured using the Qubit® dsDNA BR Assay Kit (500 assays) (ThermoFisher Scientific®) (Saitoh et al. 2009).

To amplify the bacterial 16S rRNA gene, PCR was performed using the QIAGEN Multiplex PCR kit with the primers Bac27F (5′-AGAGTTTGGATCMTGGCTCAG-3′) and Univ1492R (5′-CGGTTACCTTGTTACGACTT-3′). Fungal 18S rRNA genes were amplified using the primer pair Euk1A (5′-CTGGTTGATCCTGCCAG-3′) and Euk516r (5′-ACCAGACTTGCCCTCC-3′). All PCRs were carried out in 25 µl reactions with an initial denaturation at 94˚C for 15 min, annealing at 55˚C for 30 sec, extension at 72˚C for 30 sec, and a final extension at 72˚C for 10 min. The amplicons obtained were purified using the WIZARD cleaning kit (Promega) and cloned into the plasmid pCR™ 2.1 using the TA Cloning® Kit with pCR™ 2.1.

The 16S and 18S rRNA gene sequences were determined with the ABI sequencing system (3730xl DNA analyzer) using the primers M13F (5′-GTAAAACGACGGCCAG-3′) and M13R (5′-CAGGAAACAGCTATGAC-3′). The resulting sequences were used to infer taxonomy with BLASTN at the NCBI web service (https://blast.ncbi.nlm.nih.gov/Blast.cgi).

Phylogenetic trees were constructed using the MEGA7 software (Tamura et al. 2013). Alignments were constructed with ClustalW as implemented in the MEGA package.

3.2.4 DNA Preparation for Whole Genome Sequencing of Bacterial Isolates

DNA samples for genome sequencing were prepared by culturing the bacterial isolates in nutrient broth overnight at 30˚C with shaking at 150 rpm. DNA was extracted from the isolates using the Qiagen DNeasy Blood & Tissue Kit following the manufacturer's instructions (Qiagen, Valencia, CA). The DNA obtained was quantified with the Qubit dsDNA BR assay kit (ThermoFisher Scientific) and examined by electrophoresis on a 1% agarose gel to confirm that the DNA fragments were longer than 40 kb. For each strain, 50 µg of DNA was used for sequencing on the Pacific Biosciences (PacBio) RSII platform with the support of the Bioscience Core Lab at KAUST (Woo et al. 2014; Quail et al. 2012).

3.2.5 De novo Assembly of the Genome Sequencing Data

Whole genome sequencing of four isolates was conducted on the PacBio RSII platform with the support of the Bioscience Core Lab at KAUST (Quail et al. 2012; Woo et al. 2014). De novo assembly of the nucleotide sequences of the bacterial isolates was conducted using the PacBio software HGAP3/Quiver (Qiu et al. 2016). Circular closure was done with Minimus2 (http://amos.sourceforge.net/wiki/index.php/Minimus2) to trim the ends and permute the genome to begin at the dnaA gene (identified by BLAST), followed by Quiver-based error correction to obtain a final closed genome. The sequencing coverage was approximately 50x for each bacterial strain.

3.2.6 Genome Annotation

FGENESB_annotator was used to predict genes in the genomes of the bacterial isolates (http://www.softberry.com/berry.phtml?topic=fgenesb) (Solovyev et al. 2011). The nucleotide sequences of the predicted genes were extracted directly from the genomes using the gene location information provided by FGENESB. All known cellulases have been classified by sequence comparison into 16 GH orthologous groups defined in the CAZy database (Cantarel et al. 2008), and each GH group in the CAZy database is known to have a corresponding Pfam domain, as described in Table 2.1 in Chapter 2 (Park et al. 2010).

The deduced amino acid sequences of the predicted genes were annotated against the Pfam-A database using the hmmscan program in HMMER (v3.0). The annotations of the GH families shown in Table 2.1 were used to extract candidate cellulase genes with an E-value cutoff of < 1.0e-60 (Punta et al. 2012).
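The filtering step described above can be sketched as a small parser, assuming the hmmscan results were saved in HMMER's per-target table format (the `--tblout` option), in which the first column is the Pfam model name, the third is the query protein and the fifth is the full-sequence E-value. The family set below is an illustrative subset, the gene identifiers and hit lines are invented, and the function name is hypothetical; the real family list is the one in Table 2.1.

```python
import io

# Illustrative subset of Pfam model names; the full list is in Table 2.1.
CELLULASE_PFAM = {"Glyco_hydro_1", "Glyco_hydro_3", "Glyco_hydro_9"}
EVALUE_CUTOFF = 1.0e-60


def cellulase_candidates(tblout_handle, families=CELLULASE_PFAM, cutoff=EVALUE_CUTOFF):
    """Yield (gene_id, pfam_family, evalue) for hits below the E-value cutoff."""
    for line in tblout_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        family, gene_id, evalue = fields[0], fields[2], float(fields[4])
        if family in families and evalue < cutoff:
            yield gene_id, family, evalue


# Invented example lines in tblout column order (not results from this study).
EXAMPLE = io.StringIO(
    "# target name  accession  query name  accession  E-value  score  bias\n"
    "Glyco_hydro_1 PF00232.18 gene_0012 - 3.2e-150 498.1 0.0 3.6e-150 497.9 0.0 1.0 1 0 0 1 1 1 1 Glycosyl hydrolase family 1\n"
    "Glyco_hydro_3 PF00933.21 gene_0345 - 4.0e-20 70.2 0.0 5.1e-20 69.9 0.0 1.1 1 0 0 1 1 1 1 Glycosyl hydrolase family 3\n"
    "ABC_tran PF00005.27 gene_0400 - 1.0e-80 268.0 0.0 1.2e-80 267.8 0.0 1.0 1 0 0 1 1 1 1 ABC transporter\n"
)

hits = list(cellulase_candidates(EXAMPLE))
print(hits)
```

The strict cutoff of 1.0e-60 trades sensitivity for specificity: weak or partial domain matches are dropped, so the resulting list is a conservative set of candidates.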

For the fungal isolate, quality control of the raw RNA-Seq reads was conducted with PrinSeq (Schmieder & Edwards 2011b) so that each read had a mean Phred score above 30 and contained less than 1% N bases. The quality-controlled data were then used for de novo assembly with Trinity (Haas et al. 2013), and the assembled sequences were translated into amino acid sequences using the EMBOSS transeq program (Schmieder & Edwards 2011b) with the six-frame option. The resulting amino acid sequences were annotated with Pfam functional domains using HMMER v3.1b2 (Eddy 2011), and the Glyco_hydro families described in Table 2.1 in Chapter 2 were extracted as cellulases. An E-value smaller than 1.0e-60 was used as the threshold.
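To make the six-frame translation step concrete, here is a minimal pure-Python sketch of the idea, not a reimplementation of EMBOSS transeq. It uses the standard genetic code, and the frame labels (+1 to +3, -1 to -3) are a convention chosen for this sketch that may not match transeq's own frame numbering.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate codon by codon; codons with unknown bases become X."""
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame(seq):
    """Return the three forward and three reverse-complement translations."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for label, protein in sorted(six_frame("ATGGCCTAA").items()):
    print(label, protein)
```

Translating all six frames is needed because assembled transcripts carry no guarantee about strand or reading frame, so the Pfam search has to see every possible protein product.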

3.2.7 Multilocus Sequence Typing (MLST) Analysis of Bacterial Isolates

The phylogenetic relationship between the bacterial isolates and other Bacillus species was determined by multilocus sequence typing (MLST). The amino acid sequences of fourteen housekeeping genes in total were obtained from the genome sequences of Bacillus licheniformis strain TAB7 (adk, ccpA, recF, rpoB, sucC and spo0A) (Jeong et al. 2018), Bacillus subtilis strain WB800N (glpF, ilvD, pta, purH, pycA, rpoD and tpiA) (Bóka et al. 2019) and Bacillus anthracis strain CZC5 (gmk) (Bartoszewicz & Marjańska 2017) in the MLST database (PubMLST; http://pubmlst.org/bsubtilis/), and these were used as query sequences.

Protein sequences (protein FASTA format) of twenty-eight different Bacillus species (Table 3.1) were obtained from the NCBI Assembly database and converted to BLAST database format with the makeblastdb program of the BLAST+ package version 2.2.31 (Tsimpidis et al. 2017). The amino acid sequences of the genes predicted with FGENESB were likewise converted to a BLASTP database for each isolate. The housekeeping genes of each species were extracted by BLASTP searches using the fourteen housekeeping gene sequences as queries against the protein databases built from the genomes of the isolates and the Bacillus species. The top-scoring hits with an E-value lower than 1.06e-90 and 100% query coverage were extracted as orthologues of the fourteen genes.
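The hit selection described above can be sketched as a filter over tabular BLASTP output. This assumes, as an illustration, that the search was written with a custom tabular layout of five columns (qseqid, sseqid, evalue, bitscore, qcovs); the actual output format used in the study is not stated, and the function name, protein identifiers and hit lines below are hypothetical.

```python
def best_orthologs(blast_lines, max_evalue=1.06e-90, min_qcov=100.0):
    """Pick the top-scoring hit per query that passes both thresholds.

    Expects tab-separated lines with the columns
    qseqid, sseqid, evalue, bitscore, qcovs (an assumed custom layout).
    """
    best = {}
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, sseqid, evalue, bitscore, qcovs = line.rstrip("\n").split("\t")
        evalue, bitscore, qcovs = float(evalue), float(bitscore), float(qcovs)
        if evalue >= max_evalue or qcovs < min_qcov:
            continue  # fails the E-value or query-coverage threshold
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {query: hit[0] for query, hit in best.items()}


# Invented hits for illustration only.
EXAMPLE_HITS = [
    "rpoB\tprot_101\t0.0\t2300\t100",
    "rpoB\tprot_202\t1e-120\t400\t100",
    "adk\tprot_303\t1e-50\t150\t100",    # E-value too high
    "gmk\tprot_404\t1e-100\t300\t98",    # query coverage below 100%
]

orthologs = best_orthologs(EXAMPLE_HITS)
print(orthologs)
```

Requiring full query coverage in addition to a low E-value guards against picking a paralog or a fragment that only matches part of the housekeeping gene.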

Each of the 14 genes was aligned using MAFFT version 7.394, and the resulting alignments were concatenated manually into one (Katoh & Standley 2013). The phylogenetic tree was constructed by the neighbor-joining method with MEGA7 (Tamura et al. 2013). Branch support was evaluated by a bootstrap test with 1,000 replicates (Akamatsu et al. 2019).
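The concatenation was done manually in this work; the sketch below only illustrates what that step amounts to if scripted. It assumes each per-gene alignment is available as a mapping from taxon name to aligned sequence, and it pads a missing taxon with gaps, which is an assumption of this sketch rather than something the protocol above required (all genes were reported with full-coverage hits). The toy sequences are invented.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end.

    A taxon missing from a gene alignment is padded with gaps so that all
    concatenated sequences keep the same length.
    """
    concatenated = {taxon: [] for taxon in taxa}
    for gene_alignment in alignments:
        lengths = {len(seq) for seq in gene_alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            concatenated[taxon].append(gene_alignment.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in concatenated.items()}


# Toy alignments with invented sequences for two isolates.
adk = {"PB1": "MNL-", "SB2": "MNLV"}
gmk = {"PB1": "AR", "SB2": "AK"}

supermatrix = concatenate_alignments([adk, gmk], taxa=["PB1", "SB2"])
print(supermatrix)
```

Keeping the gene order identical for every taxon is what makes the concatenated columns homologous, so the order of the input list should be fixed once and reused.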

Table 3.1 Reference species and genome assembly accession numbers used for multilocus sequence typing (MLST).

Reference species                              Accession number
Bacillus paralicheniformis Bac48               ASM299394v1
Bacillus paralicheniformis Bac84               ASM299392v1
Bacillus paralicheniformis ATCC 9945a          ASM40888v1
Bacillus paralicheniformis B4123               ASM192503v1
Bacillus paralicheniformis KJ-16               ASM104248v2
Bacillus paralicheniformis MKU3                ASM196956v1
Bacillus paralicheniformis NCIMB 8874          ASM169151v1
Bacillus paralicheniformis 167/2               ASM193954v1
Bacillus licheniformis DSM 13 ATCC 14580       ASM1164v1
Bacillus sonorensis SRCM101395                 ASM220201v1
Bacillus atrophaeus SRCM101359                 ASM217349v1
Bacillus subtilis subsp. inaquosorum DE111     ASM153478v1
Bacillus siamensis SCSIO 05746                 ASM285053v1
Bacillus vallismortis                          ASM211380v1
Bacillus velezensis FZB42                      ASN1578v1
Bacillus altitudinis B388                      ASM78942v2
Bacillus litoralis Bac94                       ASM366782v1
Bacillus luciferensis CH01                     ASM171275v1
Bacillus sp. FDAARGOS_527                      ASM381212v1
Bacillus anthracis str. Ames                   AMS784v1
Bacillus methanolicus MGA3                     ASM72448v1
Bacillus thermoamylovorans A1A
Bacillus macauensis ZFHKF-1                    ASM26986v1
Bacillus cellulosilyticus DSM 2522             ASM17723v2
Bacillus selenitireducens MLS10                ASM9308v1
Bacillus clausii KSM-K16                       ASM982v1
Bacillus patagoniensis                         ASM201970v1
Staphylococcus aureus NCTC 8325                ASM1342v1

3.2.8 Preparation of Extracellular Cellulase Enzymes

The cellulase-producing isolates obtained on CMC plates were inoculated into 100 ml nutrient broth in 500 ml Erlenmeyer flasks and pre-cultured at 30˚C for 48 hrs with shaking at 200 rpm. After the two days of pre-culture, 2 ml aliquots were inoculated into 200 ml liquid culture (0.2% NaNO3, 0.1% K2HPO4, 0.05% MgSO4, 0.05% KCl, 0.02% peptone) supplemented with a 1.0 x 6.0 cm (= 50 mg) strip of Whatman no. 1 filter paper as a carbon source. The flasks were further incubated at 30˚C with shaking at 200 rpm for four days (Kim et al. 2012a). Cell growth was monitored every 24 hr by measuring the optical density at 600 nm. The culture was centrifuged at 5000 rpm at 4˚C for 5 min, and the supernatant was used as a cell-free crude enzyme. The cell-free enzyme was used immediately for measuring cellulase activity by the filter paper assay (Gopinath et al. 2014).

3.2.9 Measuring the Cellulase Activity

The filter paper assay (Hankin and Anagnostakis 1977) was conducted to measure the total cellulase activity in the culture (Behera et al. 2017). Total cellulase activity was determined by measuring the amount of reducing sugars formed by the degradation of a filter paper strip. A 0.5 ml aliquot of the culture supernatant was incubated in 1.0 ml of 0.05 M sodium citrate buffer (pH 4.8) with a 1.0 x 6.0 cm (= 50 mg) strip of Whatman no. 1 filter paper for 60 min at 50˚C (Shanmugapriya et al. 2012). The reaction was stopped by adding 3 ml of dinitrosalicylic acid (DNS) reagent to the reaction mixture. The reagent was prepared by mixing 1.50 g DNS and 2.80 g sodium hydroxide in 200 ml distilled water and then adding 5.33 g NaOH, 43.2 g sodium potassium tartrate (Rochelle salt) and 1.0 ml phenol (Shareef et al. 2015). The reducing sugars generated by cellulase hydrolysis were estimated spectrophotometrically at 540 nm and quantified using a calibration curve for D-glucose, which should give absorbances in the range from 0.1 to 1.0 A. Two kinds of controls, one with enzyme and without substrate and the other with substrate and without enzyme, were prepared in all enzyme assays.
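The conversion from absorbance to glucose via the calibration curve can be sketched with an ordinary least-squares line. The standard amounts and absorbances below are invented round numbers used only to show the calculation, and the function names are hypothetical; real standards and readings would replace them.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def glucose_mg(absorbance, slope, intercept, blank=0.0):
    """Convert a blank-corrected A540 reading into mg glucose."""
    return (absorbance - blank - intercept) / slope


# Hypothetical glucose standards (mg) and their A540 readings.
standards_mg = [0.5, 1.0, 1.5, 2.0]
standards_a540 = [0.2, 0.4, 0.6, 0.8]

slope, intercept = fit_line(standards_mg, standards_a540)
print(glucose_mg(0.5, slope, intercept))
```

The `blank` argument is where the readings of the enzyme-only and substrate-only controls would be subtracted before conversion.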

The concentration of cellulase enzyme that would have released exactly 2.0 mg of glucose was estimated by plotting the glucose liberated against the logarithm of the enzyme concentration (Adney & Baker 1996; Gupta et al. 2012).

The activities were calculated as filter paper units (FPU) following the formula of Adney & Baker (1996):

$$\frac{2.0\ \text{mg glucose} \,/\, 0.18016\ \text{mg glucose}/\mu\text{mol}}{0.5\ \text{ml enzyme dilution} \times 60\ \text{min}} = 0.37\ \mu\text{mol}\ \text{min}^{-1}\ \text{ml}^{-1}$$

$$\text{Filter paper activity} = \frac{0.37}{[\text{enzyme}]\ \text{releasing 2.0 mg glucose}}\ \text{FPU/ml}$$
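The calculation above can be sketched in a few lines. The enzyme concentration is assumed here to be expressed as the fraction of original enzyme solution in the dilution, and the concentration releasing 2.0 mg glucose is interpolated linearly on the logarithm of concentration between two dilutions that bracket the target. The two data points in the usage example are invented for illustration, and the function names are hypothetical.

```python
import math

GLUCOSE_TARGET_MG = 2.0


def fpu_constant(glucose_mg=2.0, mg_per_umol=0.18016, enzyme_ml=0.5, minutes=60):
    """µmol glucose per minute per ml corresponding to 2.0 mg glucose released."""
    return glucose_mg / mg_per_umol / (enzyme_ml * minutes)


def concentration_for_target(point_a, point_b, target_mg=GLUCOSE_TARGET_MG):
    """Interpolate on log10(concentration) between two bracketing dilutions.

    Each point is (enzyme_concentration, glucose_mg_released).
    """
    (c1, g1), (c2, g2) = point_a, point_b
    if g1 == g2 or not (min(g1, g2) <= target_mg <= max(g1, g2)):
        raise ValueError("the two points must bracket the glucose target")
    frac = (target_mg - g1) / (g2 - g1)
    log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
    return 10 ** log_c


def fpu_per_ml(enzyme_concentration):
    """Filter paper activity of the undiluted enzyme in FPU/ml."""
    return fpu_constant() / enzyme_concentration


# Hypothetical dilutions: (fraction of original enzyme, mg glucose released).
conc = concentration_for_target((0.01, 1.5), (0.1, 2.5))
print(round(fpu_constant(), 2), conc, fpu_per_ml(conc))
```

Because the filter paper assay is not linear in enzyme amount, activity is defined at a fixed extent of conversion (2.0 mg glucose), which is why the concentration is interpolated rather than the glucose yield being scaled directly.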

3.3. Results and Discussion

3.3.1 Isolation of Cellulase-Producing Microorganisms from the Red Sea Samples

The survey in Chapter 2 led me to expect that microbial cellulolysis actively occurs in the Red Sea. I therefore tried to isolate microbial strains with cellulase activity from the Red Sea samples.

Surface seawater samples were collected from the coastal region of the Red Sea at Thuwal in Saudi Arabia. The samples were diluted and spread on NM plates. Four hundred and fifty-six colonies were isolated on NM plates and subsequently streaked on media containing CMC as the sole carbon source to screen for cellulase-producing microorganisms. No isolates showed cellulase activity on the CMC plates (Table 3.2).

Previous research by Trivedi et al. (2011a, 2011b, 2013) reported the isolation of Bacillus aquimaris, B. flexus NT and Pseudoalteromonas CSMCRI-5 with cellulolytic potential from green seaweed. I therefore next tried to isolate cellulase-producing bacteria from the surface of the green seaweed, which was predicted to be Sargassum weed from its morphological features (Figure 3.1).

One hundred and twenty-six strains were obtained on NM plates, and two strains (SB2, SB3) showed a clear halo zone on CMC plates after staining (Figure 3.2). This result indicates that cellulase-producing microorganisms live on cellulose-containing substrates such as seaweeds in the Red Sea, as has been suggested for other marine environments (Joint, Mühling, & Querellou, 2010).

I also tried to isolate cellulase-producing microorganisms from the plankton fraction, i.e., samples in which phytoplankton cells were collected and concentrated. Sixty strains were isolated from it on NM plates, and one bacterial isolate (PB1) and one fungal isolate (PF1) formed zones of clearance around their colonies on CMC plates after staining with Congo red (Table 3.2 and Figure 3.2). This result suggests that cellulase-producing bacteria live in the seawater, probably on the surface of phytoplankton cells in the Red Sea. This study failed to isolate cellulase-producing microorganisms directly from seawater, probably because very few phytoplankton cells were present there. So far, there has been no report of the isolation of cellulase-producing microorganisms from a plankton fraction. Our strains are the first examples of isolated cellulase-producing microorganisms associated with marine plankton.

Table 3.2 Types of samples and the numbers of total and cellulase-positive isolates

Samples                        Seawater   Seaweed      Plankton
No. isolates                   456        126          60
Positive cellulase isolates    None       2 bacteria   1 bacterium, 1 fungus

Figure 3.2 Screening of cellulase-producing microorganisms by Congo red on CMC agar plate. (A) fungal isolate from plankton (PF1), (B) bacterial isolate from plankton sample (PB4), (C and D) bacterial isolate from the seaweeds (SB4 and SB5) and (E) Escherichia coli as a negative control

3.3.2. Taxonomic Identification of Cellulase-Active Strains

To predict the taxonomy of the isolates, I determined the sequence of the small subunit rRNA gene from each isolate. PCR was performed using a primer pair encompassing about 1,500 bp of the 16S rRNA gene region for the bacterial strains and a pair targeting around 420 bp of the 18S rRNA gene region for the fungal strain. Amplicons of the expected sizes were obtained from each strain. The BLASTN search revealed that all the bacterial strains were classified in the genus Bacillus. PB1, SB2 and SB3 showed 92.78% identity to Bacillus sp. G5-6b, 92.76% identity to Bacillus sp. B5-4b and 99.57% identity to Bacillus sp. strain B10, respectively. The fungal strain PF1 was predicted to be an Aspergillus fungus closely related to A. ustus, with 99% identity (Figure 3.3 and Table 3.3).

Table 3.3 Taxonomy of the isolates based on small subunit (16S/18S) rRNA gene sequences

Isolate   Identity   Query coverage   Close homology
PB1       92.78%     100%             Bacillus sp. G5-6b
SB2       92.76%     99%              Bacillus sp. B5-4b
SB3       99.57%     100%             Bacillus sp. strain B10
PF1       99%        99%              Aspergillus ustus

[Figure 3.3: phylogenetic tree image. Penicillium clade: P. brevicompactum AF548082.1, Penicillium sp. KBS3 GQ228447.1, P. malachiteum FJ358346.1, P. decumbens EU667998.1, P. oxalicum GU078431.1, P. viridicatum JN939249.1, P. verrucosum JN938976.1, Penicillium sp. Y12 EG-2010 HM161749.1, P. griseofulvum EF608151.1, P. expansum AB028137.1, P. janthinellum AB293968.1. Aspergillus clade: A. ustus AB002072.1, PF1, A. ustus AB008410.1, A. ustus GU573851.1, A. keveii MF004311.1, A. niger AF548064.1, A. oryzae D63698.1, A. fumigatus AB008401.1, A. awamori D63695.1, A. terreus GU573850.1, AB002063.1, A. flavus AF548060.1, A. penicillioides DQ985959.1. Scale bar: 0.005.]

Figure 3.3 Neighbor-joining phylogenetic tree based on the 18S rRNA gene of strain PF1. The values at the nodes are from the bootstrap test (1000 replicates) and indicate how often the associated taxa clustered together. The evolutionary distance bar shows the unit of the number of base substitutions per site

3.3.3. Whole Genome Sequencing and Assembly

Whole genome sequencing of the three bacterial isolates was conducted on the Pacific Biosciences (PacBio) RSII platform with the support of the Bioscience Core Lab at KAUST (Quail et al. 2012; Woo et al. 2014). De novo assembly of the reads from each bacterial isolate was conducted using the PacBio software HGAP3/Quiver (Qiu et al. 2016). I found that each complete genome is composed of one circular chromosome. The genome sizes of PB1, SB2 and SB3 were 4,318,221 bp, 4,318,038 bp and 4,317,481 bp, and 4362, 4441 and 4675 genes, respectively, were predicted on the genomes (Table 3.4). The sequence coverage was approximately 50X in each bacterial strain.

Table 3.4 Whole genome size of the three bacterial isolates.

Sample                         PB1           SB2           SB3
Genome size                    4,318,221 bp  4,318,038 bp  4,317,481 bp
No. genes                      4362          4441          4675
No. candidate cellulase genes  10 genes      10 genes      11 genes

3.3.4. Multilocus Sequence Typing (MLST) Analysis

To determine the phylogenetic positions of the three strains within the genus Bacillus precisely, I conducted MLST using fourteen housekeeping genes (adk, ccpA, recF, rpoB, spo0A, sucC, glpF, ilvD, pta, purH, pycA, rpoD, tpiA and gmk). The three bacterial strains were all included in the cluster of Bacillus paralicheniformis strains with a bootstrap value of 99%. Furthermore, PB1 and SB2 formed a cluster with 81% bootstrap support, whereas SB3 branched independently from the other isolates and formed a cluster with B. paralicheniformis strains MKU3 and NCIMB 8874 with a bootstrap value of 100% (Figure 3.4).

Figure 3.4 Neighbor-joining phylogenetic tree based on the MLST housekeeping genes of PB1, SB2, SB3 and other Bacillus strains. The value at each node represents the bootstrap value (1000 replicates). The evolutionary distance bar shows the unit of the number of base substitutions per site.

3.3.5. Enzyme Activity Assay

The total cellulase activity of the four strains PF1, PB1, SB2 and SB3 was measured using the reducing sugar assay. After cultivation in 200 ml of liquid medium supplemented with filter paper as the cellulose substrate, the filter papers were degraded in all cultures, whereas the control (i.e., the same medium incubated without inoculum) showed only slight degradation (Figure 3.5), suggesting that the degradation of the filter paper was due to microbial cellulolysis. I then took aliquots of the culture supernatant and measured their cellulase activities. The cellulase activity peaked after 72 h and then fell sharply. The highest cellulase activities were 0.71, 0.75, 0.70 and 0.59 FPU/ml in strains PF1, PB1, SB2 and SB3, respectively (Figure 3.6).

On the other hand, little cellulase activity was detected in the broth when the isolates were cultured in NM media, indicating that these isolates secrete cellulases when cellulose is present as the sole carbon source. Although there is no common criterion (unit) for evaluating cellulase activity, Samira et al. (2011) measured the cellulase activities of three marine bacterial isolates obtained from the Persian Gulf with the same method as ours and reported activities of 0.079, 0.074 and 0.072 FPU/ml. Our isolates showed roughly seven to ten times higher cellulase activity than their strains.
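The fold difference between the two sets of FPU/ml values quoted above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the numbers are copied from the text, while the variable and function names are hypothetical and introduced here for illustration.

```python
# Peak activities (FPU/ml) of the Red Sea isolates, as quoted in the text.
red_sea = {"PF1": 0.71, "PB1": 0.75, "SB2": 0.70, "SB3": 0.59}
# Activities (FPU/ml) quoted from Samira et al. (2011) for three Persian Gulf isolates.
persian_gulf = [0.079, 0.074, 0.072]


def fold_range(activity, references):
    """Return the (smallest, largest) fold difference of one activity
    against a list of reference activities."""
    ratios = [activity / ref for ref in references]
    return min(ratios), max(ratios)


fold_ranges = {strain: fold_range(a, persian_gulf) for strain, a in red_sea.items()}
for strain, (low, high) in fold_ranges.items():
    print(f"{strain}: {low:.1f}x to {high:.1f}x")
```

Because the reference values differ slightly from each other, the comparison gives a range of fold differences per isolate rather than a single number.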


Figure 3.5 Filter paper assay. The degradation of the filter paper by the four strains. (A) negative control, (B) PF1, (C) PB1, (D) SB2 and (E) SB3

Figure 3.6 Filter paper assay activity measurement of the four strains. (A) PF1, (B) PB1, (C) SB2, (D) SB3 and (E) activity measurements of the four isolates using non-inducing media as a control

3.3.6. Conclusion

In this Chapter, I isolated cellulase-producing microorganisms from the Red Sea. All isolates performed cellulolysis strong enough to degrade filter papers completely, which raises the expectation that they can be applied in industry as biological catalysts for the degradation of plant biomass in the future.

In this study, I successfully isolated strains from the seaweed surface and the plankton fraction, whereas no isolates were obtained directly from the seawater sample. These results suggest an important ecological trait of these strains: they live in association with organisms composed of cellulose materials in the Red Sea. The three bacterial strains were all predicted to be Bacillus paralicheniformis. This species has been isolated from agricultural soil, mangrove ecosystems and plant bodies. Taken together with our isolates, the life of this species may depend highly on the cellulolysis of plant biomass. On the other hand, the fungal isolate was predicted to be close to Aspergillus ustus. This fungus has been reported as a cellulase-producing species by several studies (for example, Shamala & Sreekantiah, 1986). It has been isolated from various environments, including marine ones, but this study is the first example of its isolation from the Red Sea. My results suggest that this species is also well adapted to the Red Sea. In recent research, the isolation of cellulase-producing microorganisms has increased, while information on microbial cellulase genes is still limited. Toward the application of these isolates in industry, the next step should be the identification of the cellulase genes actually working during their cellulolysis.

Chapter 4. Characterization of Cellulase Genes by Transcriptome Analysis

4.1. Introduction

In Chapter 3, four cellulase-producing microorganisms (three bacterial strains and one fungal strain) were isolated from the Red Sea. As a next step, considering their application in industry, it is very important to know which cellulase genes are actually working during their cellulolysis. It has been shown that microorganisms generally possess several cellulase genes in their genomes (Feng et al. 2007). It is therefore particularly important to investigate which cellulase genes are actively expressed and working during cellulolysis.

For this purpose, I used the genome sequences of the three bacterial isolates determined in Chapter 3 and identified genes predicted to be cellulases (candidate cellulases). I then compared the expression of these genes between the cellulase-inducing and non-inducing conditions. Regarding the fungal isolate, I predicted the cellulase genes from the de novo assembly of RNA-Seq reads and then conducted the subsequent analyses.

The aim of this Chapter is to reveal the set of cellulase genes highly expressed during cellulolysis in each strain, and to predict what kind of cellulases the isolates are actually using for their cellulolysis. In addition, I also compared their cellulase gene sets and discussed the diversity of cellulolysis mechanisms among isolates.


4.2. Materials and Methods

4.2.1. RNA Extraction and Sequencing

The isolates were cultured under two different conditions: cellulase-inducing and non-inducing. Under the inducing condition, the isolates were cultured in media composed of 0.02% peptone, 0.2% K2HPO4, 0.05% MgSO4·7H2O, 0.2% NaNO3 and Whatman No. 1 filter paper, while under the non-inducing condition the filter paper was excluded from the media. Total RNA was extracted from the bacterial isolates using the QIAGEN RNeasy Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's protocol. RNA extraction from the fungal strain was conducted by the phenol-chloroform extraction method (Saitoh et al. 2009; Urakawa et al. 2010). Total RNA quality and concentration were determined using the Agilent RNA 6000 Pico kit (Agilent, Santa Clara, CA) with a 2100 Bioanalyzer (Agilent). Paired-end libraries with approximate average insert lengths of 200 bp were synthesized using the Genomic Sample Prep kit (Illumina, San Diego, CA) according to the manufacturer's instructions. Libraries were sequenced on the Illumina HiSeq 4000 platform (Illumina, San Diego, CA) with the support of the KAUST Bioscience Core Lab (Mizrachi et al. 2010).

4.2.2. De Novo Assembly of RNA-Seq

The quality of the RNA-Seq raw data was controlled with PrinSeq (Schmieder & Edwards 2011b) so that, in each read, the mean Phred score was more than 30 and the proportion of N was smaller than 1%. The quality-controlled data were then used for de novo assembly with the software Trinity (Haas et al. 2013), and the obtained sequences were translated into amino acid sequences using the EMBOSS transeq program (Schmieder & Edwards 2011b) with the 6-frame option. The resultant amino acid sequences were annotated with Pfam functional domains using the HMMER v3.1b2 program (Eddy 2011), and the sequences annotated as GH families were extracted.
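The two read-level quality criteria stated above (mean Phred score above 30, proportion of N below 1%) can be illustrated with a minimal Python sketch. This is not the PrinSeq implementation; the function names are hypothetical, and the Phred+33 quality encoding is an assumption that should be checked against the actual FASTQ files.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one FASTQ quality string.
    The offset of 33 (Phred+33 encoding) is an assumption."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes_qc(sequence, quality_string, min_mean_q=30.0, max_n_fraction=0.01):
    """Keep a read only if its mean Phred score is above min_mean_q
    and its fraction of N bases is below max_n_fraction."""
    if not sequence or len(sequence) != len(quality_string):
        return False
    n_fraction = sequence.upper().count("N") / len(sequence)
    return mean_phred(quality_string) > min_mean_q and n_fraction < max_n_fraction


# Hypothetical 100-base reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
good_read = passes_qc("ACGT" * 25, "I" * 100)
low_quality_read = passes_qc("ACGT" * 25, "5" * 100)
n_rich_read = passes_qc("NN" + "A" * 98, "I" * 100)
print(good_read, low_quality_read, n_rich_read)
```

Both thresholds are exposed as parameters so that the same filter can be reused with different cut-offs.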

4.2.3. Expression Analysis of Cellulase Genes in the Isolates

The nucleotide sequences of the cellulase genes predicted from the genome sequences of the three bacterial strains by the method described above were used to build an index using the bowtie2-build program in the Bowtie2 package (Langmead & Salzberg 2012). Since the whole genome sequencing of the fungal strain was not complete, gene prediction was conducted using the sequences generated by de novo assembly of the RNA-Seq reads.

For each isolate, only one side of the paired-end reads generated by the RNA-Seq experiments described above was aligned to the sequence index using the Bowtie2 alignment program. The gene expression rate was expressed in RPKM, calculated by the following equation: the number of short reads mapped on each cellulase gene region (r_g) is multiplied by 10^9 and divided by the product of the feature length (fl_g) and the total number of reads from the sequencing run (R) (Wagner et al. 2012).

$$\mathrm{RPKM}_g = \frac{r_g \times 10^{9}}{fl_g \times R}$$
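The RPKM equation above translates directly into code. The sketch below is illustrative only: the function name and the example read counts are hypothetical and are not taken from the sequencing runs of this study.

```python
def rpkm(reads_on_gene, feature_length_bp, total_reads):
    """RPKM_g = (r_g * 10^9) / (fl_g * R), where r_g is the number of reads
    mapped on the gene, fl_g the gene length in bp and R the total number
    of reads from the sequencing run."""
    if feature_length_bp <= 0 or total_reads <= 0:
        raise ValueError("feature length and total read count must be positive")
    return reads_on_gene * 1e9 / (feature_length_bp * total_reads)


# Hypothetical example: 500 reads on a 1,000 bp gene in a run of 10 million reads.
example_value = rpkm(500, 1000, 10_000_000)
print(example_value)
```

The factor of 10^9 combines the per-kilobase (10^3) and per-million-reads (10^6) normalizations.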

Moreover, the cellular localization of the products of these cellulase genes was determined using the Cell-PLoc 2.0 web server (http://www.csbio.sjtu.edu.cn/bioinf/Cell-PLoc-2/).

4.2.4. Operon Structure Identification

The FGENESB_annotator web server was used to predict the operon structures in the genomes of the bacterial isolates. The expression rates of the genes in the same operons as the cellulase genes were determined by the same method as described above.

4.3. Results and Discussion

4.3.1. A Search of Cellulase Gene from the Complete Genomes

To determine the cellulase genes in the three bacterial isolates, I sequenced their genomes in Chapter 3. I further surveyed the genomes for cellulase genes, revealing the presence of eleven candidate cellulase genes in PB1 and SB3, and ten genes in SB2. In general, cellulolysis is accomplished by three types of cellulases: β-glucosidases, endoglucanases and cellobiohydrolases. The genes identified included all of these cellulases. However, all the bacterial isolates possess multiple genes with different GH domains for each of these three cellulase types (Table 4.1).

In strain PB1, five candidate genes were annotated as the GH1 family (Table 4.1). Another four genes were annotated as GH3, GH5, GH9 and GH26, respectively, and the remaining two were predicted to belong to the GH48 family. Strain SB2 had five cellulase genes of GH1, and the remaining five genes were predicted to belong to the GH5, GH9, GH26, GH44 and GH48 families (Table 4.2). Strain SB3 also had five cellulase genes classified as GH1 and two genes belonging to GH48, with four single genes of GH3, GH5, GH9 and GH26. The repertoires of candidate cellulase genes conserved in the bacterial strains were very similar to each other (Table 4.3). In the fungal isolate (PF1), three candidate cellulase genes belonging to the GH1, GH3 and GH61 families were found in the de novo assembly of the RNA-Seq sequences (Table 4.4).

Table 4.1 Relative expression of cellulase genes under cellulase-inducing and non-inducing conditions in PB1. Highlighted values represent genes upregulated 10 times.

Gene in PB1  GH family  RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing  Cellulase activity
GENE_718     GH1        258.8642905             8.231543015                 31.44784519            β-glucosidase/exo-β-1,4-glucanase
GENE_743     GH1        23.32948924             19.25093414                 1.211862711            β-glucosidase/exo-β-1,4-glucanase
GENE_769     GH1        17.53974653             6.052009981                 2.898168804            β-glucosidase/exo-β-1,4-glucanase
GENE_1298    GH1        1.490646353             0.9172980486                1.625040362            β-glucosidase/exo-β-1,4-glucanase
GENE_3516    GH1        2.165987817             0.404207467                 5.358604167            β-glucosidase/exo-β-1,4-glucanase
GENE_1100    GH3        11.3140458              4.030814985                 2.806887897            β-glucosidase
GENE_3008    GH5        31.01714162             0.6238110449                49.72201418            Endo-β-1,4-glucanase
GENE_2740    GH9        0.1985041987            0.07072038714               2.806887897            Endo-β-1,4-glucanase/β-glucosidase/Cellobiohydrolase
GENE_1657    GH26       31.19677691             4.788465659                 6.514983949            Endo-β-1,4-glucanase
GENE_2741    GH48       7.457844982             3.255620545                 2.290759896            Endo-β-1,4-glucanase/Cellobiohydrolase

** Red, blue and green letters indicate β-glucosidase, endoglucanase and cellobiohydrolase activities, respectively.

Table 4.2 Relative expression of cellulase genes under cellulase-inducing and non-inducing conditions in SB2. Highlighted values represent genes upregulated 10 times. NC stands for not calculated value.

Gene in SB2  GH family  RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing  Cellulase activity
GENE_958     GH1        73.69366732             0.04475867257               1646.4667758           β-glucosidase/exo-β-1,4-glucanase
GENE_2557    GH1        0.7994501766            0.7277255574                1.0985599838           β-glucosidase/exo-β-1,4-glucanase
GENE_2582    GH1        5.154397266             6.386958486                 0.8070190651           β-glucosidase/exo-β-1,4-glucanase
GENE_2609    GH1        14.3125962              12.02389362                 1.1903462105           β-glucosidase/exo-β-1,4-glucanase
GENE_3145    GH1        1.065933569             1.433398825                 0.7436406044           β-glucosidase/exo-β-1,4-glucanase
GENE_2942    GH3        2.259563993             0.03280449294               68.87971099            β-glucosidase
GENE_443     GH5        3.861339218             39.19928323                 0.098505353            Endo-β-1,4-glucanase
GENE_173     GH9        0.05855300223           0.06460579038               0.906311987            Endo-β-1,4-glucanase/β-glucosidase/Cellobiohydrolase
GENE_3511    GH26       1.591149451             1.228941725                 1.294731410            Endo-β-1,4-glucanase
GENE_174     GH48       0.398326523             0.014983045                 26.585151609           Endo-β-1,4-glucanase/Cellobiohydrolase

Table 4.3 Relative expression of cellulase genes under cellulase-inducing and non-inducing conditions in SB3. Highlighted values represent genes upregulated 10 times. NC stands for not calculated value.

Gene in SB3  GH family  RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing  Cellulase activity
GENE_288     GH1        16.03403525             4.051116106                 3.957930316            β-glucosidase/exo-β-1,4-glucanase
GENE_2665    GH1        509.7408188             0.3505671194                1454.046288            β-glucosidase/exo-β-1,4-glucanase
GENE_4343    GH1        47.85326444             1.067736028                 44.81750469            β-glucosidase/exo-β-1,4-glucanase
GENE_4369    GH1        68.32428719             3.021224223                 22.61476877            β-glucosidase/exo-β-1,4-glucanase
GENE_4396    GH1        378.23733               4.704787042                 80.39414464            β-glucosidase/exo-β-1,4-glucanase
GENE_74      GH3        5.109965896             0.04671588942               109.383894             β-glucosidase
GENE_2102    GH5        23.56539853             74.05323581                 0.3182224014           Endo-β-1,4-glucanase
GENE_1821    GH9        0.4833728245            0.02300078959               21.01548829            Endo-β-1,4-glucanase/β-glucosidase/Cellobiohydrolase
GENE_679     GH26       1.464276571             0.9167184506                1.597302389            Endo-β-1,4-glucanase
GENE_1822    GH48       0.07074935986           0.0000000000                NC                     Endo-β-1,4-glucanase/Cellobiohydrolase
GENE_1823    GH48       2.855602989             0.3752101595                7.610676089            Endo-β-1,4-glucanase/Cellobiohydrolase

** Red, blue and green letters indicate β-glucosidase, endoglucanase and cellobiohydrolase activities, respectively.

Table 4.4 Relative expression of cellulase genes under cellulase-inducing and non-inducing conditions in PF1. Highlighted values represent genes upregulated 10 times. NC stands for not calculated value.

Gene in PF1  GH family  RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing  Cellulase activity
GENE_5091    GH1        3.37151714              0.51607108                  6.53304797             β-glucosidase/exo-β-1,4-glucanase
GENE_3612    GH3        160.77389072            0.51607108                  311.5343911            β-glucosidase
GENE_17375   GH61       29.98470238             0.08989827                  333.5403715            Copper-dependent lytic polysaccharide monooxygenases

** Red, blue and green letters indicate β-glucosidase, endoglucanase and cellobiohydrolase activities, respectively.

Each strain possessed multiple candidate cellulase genes in its genome. I therefore investigated which candidate genes are actually working during cellulolysis by comparing the expression of the candidate cellulase genes between two conditions, the cellulase-inducing and non-inducing ones. The results showed that, in each bacterial strain, the cellulase genes were upregulated under the inducing condition in general (Tables 4.1 to 4.4). However, each strain had some cellulase genes that were particularly highly upregulated under the inducing condition (i.e., at least ten times higher than under the non-inducing condition).
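The ten-fold criterion described above can be expressed as a short Python sketch. The RPKM pairs are copied from Table 4.1; the function names are hypothetical, and treating "NC" entries (a non-inducing RPKM of zero) as not classifiable is an assumption made for this illustration only.

```python
def induction_ratio(rpkm_inducing, rpkm_non_inducing):
    """Inducing/non-inducing expression ratio. Returns None, shown as NC
    in the tables, when the non-inducing RPKM is zero."""
    if rpkm_non_inducing == 0:
        return None
    return rpkm_inducing / rpkm_non_inducing


def is_highly_upregulated(ratio, threshold=10.0):
    """True when the ratio reaches the ten-fold threshold; NC ratios are
    left unclassified here (an assumption of this sketch)."""
    return ratio is not None and ratio >= threshold


# (RPKM inducing, RPKM non-inducing) for three PB1 genes from Table 4.1.
pb1 = {
    "GENE_718": (258.8642905, 8.231543015),
    "GENE_743": (23.32948924, 19.25093414),
    "GENE_3008": (31.01714162, 0.6238110449),
}
pb1_ratios = {gene: induction_ratio(*values) for gene, values in pb1.items()}
pb1_high = sorted(g for g, r in pb1_ratios.items() if is_highly_upregulated(r))
print(pb1_high)
```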

In PB1, GENE_718 of the GH1 family (β-glucosidase) and GENE_3008 and GENE_1657 of the GH5 and GH26 families (endo-β-1,4-glucanases) were highly upregulated. In strain SB2, GENE_958 (GH1) and GENE_2942 of the GH3 family (β-glucosidases), as well as GENE_174 of the GH48 family, which possesses both endo-β-1,4-glucanase and cellobiohydrolase activities, were also highly upregulated. In strain SB3, four genes belonging to the GH1 family, GENE_2665, GENE_4343, GENE_4369 and GENE_4396, were highly upregulated. The other three highly upregulated genes, GENE_74, GENE_1821 and GENE_1822, belonged to GH3, GH9 and GH48, respectively.

In PF1, GENE_3612, encoding a β-glucosidase of the GH3 family, was upregulated. GENE_17375, which was annotated as a copper-dependent lytic polysaccharide monooxygenase (LPMO), was also highly upregulated. Although this gene had a GH61 functional domain, it was not annotated as any kind of cellulase in the CAZy database. However, Merino et al. (2007) and Eijsink et al. (2019) reported that this enzyme catalyzes the oxidative cleavage of glycosidic bonds in cellulose, suggesting that this gene also encodes a cellulase, probably an endo-β-1,4-glucanase.

The results in Chapter 3 revealed that the three bacterial isolates were taxonomically very close to each other. The results here also suggest that the candidate cellulase genes conserved in their genomes are very similar among the bacterial strains. Nevertheless, our comparative expression analysis revealed that the sets of cellulase genes highly expressed during cellulolysis clearly differed among the strains (Tables 4.1 to 4.4).

Upregulation of GH1-type β-glucosidase genes was commonly observed among the three bacterial strains during cellulolysis. However, only one GH1-type gene was upregulated in PB1 and SB2, while four GH1-type genes were upregulated in SB3. β-glucosidase genes of the GH3 family were upregulated only in SB2 and SB3, although GH3-type genes were commonly found in the genomes of all the isolates. A β-glucosidase gene of the GH3 family was also highly upregulated in PF1, suggesting that β-glucosidases of the GH3 family are commonly used in the cellulolysis of bacteria and fungi. Regarding endo-β-1,4-glucanases, the genes used for cellulolysis were totally different among the isolates. Endo-β-1,4-glucanase genes of the GH5 and GH26 families were highly expressed in PB1, and single genes of the GH48, GH9 and GH61 families were upregulated in SB2, SB3 and PF1, respectively. These results suggest that microorganisms use different sets of cellulase genes for their cellulolysis, probably at the strain level.

4.3.2. Identification of Operon Structure

Operon structures play important roles in prokaryotes by regulating multiple genes at the same time so that various biological reactions are conducted effectively. I conducted an operon search using the genomes of the isolates and found that some of the cellulase genes highly expressed during cellulolysis formed operons with other genes in their genomes.

In PB1, GENE_718 (GH1) formed an operon with a gene encoding a lichenan permease IIC component. GENE_3008 (GH5) also formed an operon with two genes, encoding YnfE and a hypothetical protein, both of which are still uncharacterized (Figure 4.1). In SB2, GENE_958 (GH1) formed an operon with three contiguous genes (Figure 4.2). These genes encode the cellobiose-specific components of the phosphotransferase system (PTS), EIIBC, EIIBA and EIIBB. GENE_2942 (GH3) formed an operon with three genes, which encode a hypothetical protein, a D-aminopeptidase and a PTS EIIBC component. GENE_174 (GH48) formed an operon with a gene encoding an endo-β-1,4-glucanase (GH9) (Figure 4.2).

Although strain SB3 has four highly upregulated GH1-type genes, GENE_2665, GENE_4343, GENE_4369 and GENE_4396, they formed operons with different structures (Figure 4.3). GENE_2665 was grouped together with three contiguous genes encoding the PTS components EIIBC, EIIBA and EIIBB, an arrangement very similar to that of GENE_958 in SB2. GENE_4343 formed an operon with a gene encoding a lichenan permease IIC component. GENE_4369 was included in an operon with two genes encoding a DNA gyrase inhibitor and a PTS beta-glucoside-specific EIIBCA component. GENE_4396 formed an operon with a gene encoding a PTS beta-glucoside-specific EIIBCA component. GENE_74, which encodes a β-glucosidase of the GH3 family, formed an operon with seven genes. Three of these genes encode hypothetical proteins, and the others encode a PTS EIIBC component, a putative HTH-type transcriptional regulator YbbH and N-acetylmuramic acid 6-phosphate etherases. GENE_1821 and GENE_1822, which encode endo-β-1,4-glucanases of the GH9 and GH48 families, respectively, clustered in the same operon with the other endoglucanase gene of GH48 (Figure 4.3). I also measured the expression levels of the genes included in the operons with the cellulase genes. Most genes included in the operons were upregulated. While the cellulase genes in the operons were upregulated by at least ten-fold, the upregulation of the adjacent genes ranged from 7.2 to about 5,200 times (Tables 4.9 to 4.11).
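The range of upregulation quoted for the adjacent (non-cellulase) operon genes can be recomputed from the ratios listed in Tables 4.9 to 4.11. The sketch below simply copies those ratios; entries marked NC are left out, and the variable names are hypothetical.

```python
# Inducing/non-inducing ratios of the non-cellulase genes in the operons,
# copied from Tables 4.9 to 4.11 (NC entries omitted).
adjacent_gene_ratios = {
    "PB1": [14.64316216, 103.4116594, 52.65041237],
    "SB2": [552.2461039, 16.38074998, 113.5911023, 7.250495893],
    "SB3": [5212.874646, 24.16996477, 24.11613411, 90.00849691,
            19.01729433, 150.8980963, 8.383227571, 12.27007702,
            17.53900662, 14.91755153, 10.68000225],
}

# Flatten the per-strain lists and take the extremes.
all_ratios = [r for ratios in adjacent_gene_ratios.values() for r in ratios]
lowest, highest = min(all_ratios), max(all_ratios)
print(f"{lowest:.2f} to {highest:.2f} times")
```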


Table 4.5 Most upregulated cellulase genes and the abundance of gene expression in PB1

Cellulase gene in PB1  GH family  Cellulase activity                                             Gene expression ratio (inducing/non-inducing)  Cellular localization
GENE_718               GH1        β-glucosidase/exo-β-1,4-glucanase                              31.44784519                                    Cytosol
GENE_3008              GH5        Endo-β-1,4-glucanase/cellulase/cellulose β-1,4-cellobiosidase  49.72201418                                    Secreted
GENE_1657              GH26       endo-β-1,3-1,4-glucanase                                       6.514983949                                    Secreted

Table 4.6 Most upregulated cellulase genes and the abundance of gene expression in SB2

Cellulase gene in SB2  GH family  Cellulase activity                                          Gene expression ratio (inducing/non-inducing)  Cellular localization
GENE_958               GH1        β-glucosidase/exo-β-1,4-glucanase                           1646.4667758                                   Cytosol
GENE_2942              GH3        β-glucosidase                                               68.87971099                                    Cytosol/Secreted
GENE_174               GH48       Reducing end-acting cellobiohydrolase/endo-β-1,4-glucanase  26.585151609                                   Secreted

Table 4.7 Most upregulated cellulase genes and the abundance of gene expression in SB3

Cellulase gene in SB3  GH family  Cellulase activity                 Gene expression ratio (inducing/non-inducing)  Cellular localization
GENE_2665              GH1        β-glucosidase/exo-β-1,4-glucanase  1454.046288                                    Cytosol
GENE_4343              GH1        β-glucosidase/exo-β-1,4-glucanase  44.81750469                                    Cytosol
GENE_4369              GH1        β-glucosidase/exo-β-1,4-glucanase  22.61476877                                    Cytosol
GENE_4396              GH1        β-glucosidase/exo-β-1,4-glucanase  80.39414464                                    Cytosol
GENE_74                GH3        β-glucosidase                      109.383894                                     Cytosol/Secreted
GENE_1821              GH9        Endo-β-1,4-glucanase               21.01548829                                    Secreted

Table 4.8 Most upregulated cellulase genes and the abundance of gene expression in PF1

Cellulase gene in PF1  GH family  Cellulase activity  Gene expression ratio (inducing/non-inducing)  Cellular localization
GENE_3612              GH3        β-glucosidase       311.5343911                                    Secreted
GENE_17375             GH61       LPMOs               333.5403715                                    Secreted

Table 4.9 Expression rates of genes forming operons with cellulase genes in PB1

Operon  Gene in PB1  GH family or gene product        RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing
OP1     GENE_718     GH1                              258.8642905             8.231543015                 31.44784519
OP1     GENE_719     Lichenan permease IIC component  15.96955                1.090580697                 14.64316216
OP2     GENE_3008    GH5                              31.01714162             0.6238110449                49.72201418
OP2     GENE_3009    putative protein YnfE            132.0860788             1.277284201                 103.4116594
OP2     GENE_3010    hypothetical protein             59.43761891             1.128910795                 52.65041237

Table 4.10 Expression rates of genes forming operons with cellulase genes in SB2

Operon  Gene in SB2  GH family or gene product   RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing
OP1     GENE_958     GH1                         73.69366732             0.04475867257               1646.4667758
OP1     GENE_959     PTS system EIIBA component  39.86379089             0.07218482957               552.2461039
OP1     GENE_960     PTS system EIIBC component  12.53456251             0                           NC
OP1     GENE_961     PTS system EIIBB component  34.20404892             0                           NC
OP2     GENE_2941    hypothetical protein        3.798233286             0.2318717574                16.38074998
OP2     GENE_2942    GH3                         2.259563993             0.03280449294               68.87971099
OP2     GENE_2943    D-aminopeptidase            2.666373604             0.02347343717               113.5911023
OP2     GENE_2944    PTS system EIIBC component  1.867983583             0.257635286                 7.250495893
OP3     GENE_173     GH9                         0.05855300223           0.06460579038               0.9063119867
OP3     GENE_174     GH48                        0.398326523             0.014983045                 26.585151609

**NC is for not calculated values.

Table 4.11 Expression rates of genes forming operons with cellulase genes in SB3

Operon  Gene in SB3  GH family or gene product                            RPKM, inducing culture  RPKM, non-inducing culture  Inducing/non-inducing
OP1     GENE_2665    GH1                                                  509.7408188             0.3505671194                1454.046288
OP1     GENE_2666    PTS system EIIBC component                           328.4384318             0                           NC
OP1     GENE_2667    PTS system EIIBA component                           194.9699386             0                           NC
OP1     GENE_2668    PTS system EIIBB component                           761.3082749             0.1460438485                5212.874646
OP2     GENE_4343    GH1                                                  47.85326444             1.067736028                 44.81750469
OP2     GENE_4344    Lichenan permease IIC component                      26.44197027             1.094001192                 24.16996477
OP3     GENE_4368    DNA gyrase inhibitor                                 1.197939538             0                           NC
OP3     GENE_4369    GH1                                                  68.32428719             3.021224223                 22.61476877
OP3     GENE_4370    PTS system beta-glucoside-specific EIIBCA component  29.41356833             1.219663491                 24.11613411
OP4     GENE_4396    GH1                                                  378.23733               4.704787042                 80.39414464
OP4     GENE_4396    PTS system beta-glucoside-specific EIIBCA component  311.6069842             3.461972979                 90.00849691
OP5     GENE_73      hypothetical protein                                 6.977267358             0.3668906438                19.01729433
OP5     GENE_74      GH3                                                  5.109965896             0.04671588942               109.383894
OP5     GENE_75      hypothetical protein                                 6.695832115             0.0443732047                150.8980963
OP5     GENE_76      hypothetical protein                                 3.535649668             0.4217527961                8.383227571
OP5     GENE_77      PTS system EIIBC component                           5.261944852             0.4288436691                12.27007702
OP5     GENE_78      putative HTH-type transcriptional regulator YbbH     10.14733826             0.5785583228                17.53900662
OP5     GENE_79      N-acetylmuramic acid 6-phosphate etherase            11.56688213             0.775387443                 14.91755153
OP5     GENE_80      N-acetylmuramic acid 6-phosphate etherase            10.47744189             0.9810336779                10.68000225
OP6     GENE_1821    GH9                                                  0.4833728245            0.02300078959               21.01548829
OP6     GENE_1822    GH48                                                 0.07074935986           0                           NC
OP6     GENE_1823    GH48                                                 2.855602989             0.3752101595                7.610676089

**NC is for not calculated values.

Our results showed that β-glucosidase genes of the GH1 and GH3 families commonly form operons with genes encoding the phosphotransferase system (PTS) in SB2 and SB3. β-glucosidases hydrolyze cellobiose (a disaccharide) to glucose in the cellulolysis process. The PTS is required to transport soluble cellobiose into bacterial cells (Milton & Saier 2015). The gene products of the GH-type β-glucosidases are predicted to be localized in the cytosol, while endoglucanases and cellobiohydrolases are generally secreted outside the cells (Tables 4.5 to 4.10), suggesting that the co-transcription of the PTS carbohydrate transport system with GH1- and GH3-type β-glucosidases may make the utilization of cellulose for bacterial growth and other cellular mechanisms more effective.

I also found that some β-glucosidase genes formed operons with genes encoding proteins other than PTS-related ones. These genes encode a wide range of proteins, such as lichenan permease IIC components, D-aminopeptidase, DNA gyrase inhibitor, the putative HTH-type transcriptional regulator YbbH, N-acetylmuramic acid 6-phosphate etherase and hypothetical proteins. These results suggest that, through the co-regulation of genes included in the same operons as β-glucosidase genes, cellulolysis interacts with various kinds of cellular mechanisms in the bacterial strains, such as the metabolism of compounds other than cellulose and aerobic respiration.

Compared with the case of β-glucosidases, a large difference was observed among the three isolates in the sets of endoglucanase and cellobiohydrolase genes used for cellulolysis, in particular between isolates SB2 and SB3 on the one hand and isolate PB1 on the other. An endoglucanase of the GH48 family was highly expressed in SB2, while in SB3 a GH9 endoglucanase was upregulated as well as a GH48 one. Interestingly, genes encoding GH9 and GH48 endoglucanases formed an operon in SB3, and a similar operon was also found in SB2 (Tables 4.10 and 4.11), suggesting that this endoglucanase operon plays important roles in the cellulolysis of SB2 and SB3. However, in PB1, although an operon composed of GH9 and GH48 endoglucanase genes was also found in the genome, endoglucanases of the GH5 and GH26 families were upregulated instead during cellulolysis.

[Figure 4.1 diagram. Operon 1: GENE_719 (lichenan permease IIC component) and GENE_718 (beta-glucosidase BglC, GH1). Operon 2: GENE_3008 (endo-β-1,4-glucanase, GH5), GENE_3009 (putative protein YnfE) and GENE_3010 (hypothetical protein), the latter two uncharacterized.]

Figure 4.1 Operon structure of cellulase genes in PB1 strain

Figure 4.2 Operon structure of cellulase genes in SB2 strain


Figure 4.3 Operon structure of cellulase genes in SB3 strain

4.4. Conclusion

In summary, although multiple cellulase genes were found in the genomes of the microorganisms, only a certain set of cellulase genes was working for cellulolysis. The sets of cellulase genes differed among the isolates with regard to GH families. These differences were large even among the bacterial isolates, although our MLST analysis revealed that the three strains are taxonomically very close to each other and are therefore expected to have similar homologous genes in their genomes.

The operon structures of the three bacterial strains showed that cellulase genes are co-regulated with other genes involved in cellular mechanisms that occur along with cellulolysis. The genes included in the same clusters as the cellulase genes were predicted to be involved in various kinds of cellular reactions, such as protein metabolism, cell-wall recycling and aerobic respiration. These results suggest the presence of cross-talk between cellulolysis and other metabolic pathways in the bacterial isolates. Further characterization of the genes inside these operons will provide useful information for applying the isolates in industry.

Chapter 5. General Discussion

This study aimed to isolate and characterize cellulase-producing microorganisms from the Red Sea. To this end, Chapter 2 first surveyed microbial cellulases in the Red Sea by comparing metagenomes from a wide range of locations and depths. Cellulase genes were detected in all of the metagenomes tested, suggesting that this enzyme is ubiquitous in the Red Sea. In addition, the detected cellulases were highly diverse with regard to the types of GH functional domains present in their sequences. The analysis also suggests that the distribution of these cellulase orthologues depends on depth. At shallow depths (10 m to 100 m), GH3 orthologues dominated, whereas GH5 orthologues were detected mostly at greater depths (200 m to 500 m). GH3 functional domains have β-1,4-glucosidase activity, which hydrolyzes short oligosaccharides such as cellobiose into individual glucose units in the later stage of cellulolysis. GH5 domains, on the other hand, are known to have endo-1,4-β-D-glucanase activity, which breaks down long cellulose chains by cleaving internal bonds at amorphous sites (Lee & Koo 2001). In addition to this diversity, this study also revealed that cellulases are much more abundant at 10 m than at the other depths. These results suggest that microbial cellulolysis differs largely according to the environment, in particular with depth, in the Red Sea. This is essential information for understanding the cycling of carbohydrates in the Red Sea environment.
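The depth comparison described above can be illustrated with a minimal sketch that converts per-sample GH family hits into percentages and reports the dominant family at each depth. The function names and the toy hit list are hypothetical and are not data or code from this study; the sketch only assumes that each detected cellulase has already been assigned a depth and a GH family.

```python
from collections import Counter, defaultdict


def gh_family_percentages(hits):
    """Return {depth_m: {gh_family: percent}} from (depth_m, gh_family) records."""
    counts = defaultdict(Counter)
    for depth_m, family in hits:
        counts[depth_m][family] += 1
    percentages = {}
    for depth_m, family_counts in counts.items():
        total = sum(family_counts.values())
        percentages[depth_m] = {
            family: 100.0 * n / total for family, n in family_counts.items()
        }
    return percentages


def dominant_family(percentages):
    """Return {depth_m: GH family with the largest share at that depth}."""
    return {depth: max(shares, key=shares.get) for depth, shares in percentages.items()}


# Hypothetical toy hits (assumption for illustration only, not thesis data).
toy_hits = [
    (10, "GH3"), (10, "GH3"), (10, "GH3"), (10, "GH5"),
    (500, "GH5"), (500, "GH5"), (500, "GH3"), (500, "GH1"),
]
toy_percentages = gh_family_percentages(toy_hits)
toy_dominant = dominant_family(toy_percentages)
print(toy_dominant)
```

Working with percentages rather than raw counts keeps samples of different sequencing depth comparable, which is the same view presented in the appendix figures for Chapter 2.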

To understand the cellulase-producing microorganisms in detail, this study isolated them from the Red Sea. No isolates were obtained directly from the seawater (1 m). This result appears to contradict the high abundance of cellulase genes at shallow depths revealed in Chapter 2. Chapter 2 showed that the most dominant cellulases in shallow water are β-1,4-glucosidase genes of GH3 orthologues. For the screening I used CMC, which can be cleaved by cellulase-producing microorganisms, and cleavage of this compound requires endo-1,4-β-D-glucanase. This may be one of the reasons why it was difficult to isolate cellulase-producing microorganisms from the seawater.

However, endo-1,4-β-D-glucanases (GH5) are important enzymes for industrial applications because they break long cellulose chains into shorter ones at the initial stage of cellulolysis. I successfully obtained isolates from seaweeds growing in the Red Sea. I also obtained isolates growing on the plankton fraction from the surface seawater and used it as the seed for subsequent inoculation. These results suggest that microorganisms producing endo-1,4-β-D-glucanase-type cellulases live in association with the bodies of organisms composed of cellulose. These isolates secreted cellulases with remarkably high activity, degrading cellulose-based materials such as filter paper in a short period (3 days). This allows us to expect them to be promising microbial catalysts for cellulolysis in industry. Our MLTS analysis classified the three bacterial isolates as B. paralicheniformis. The remaining isolate was a fungus shown to be closely related to A. ustus based on the 18S rRNA. Although several studies have reported cellulase activities from Bacillus and Aspergillus species (Rey et al. 2004; Bischoff et al. 2006), cellulase activities have not been reported in B. paralicheniformis or A. ustus.

Our genome and RNA sequencing revealed that the isolates have multiple cellulase genes in their genomes. Recent microbial genome sequencing projects have likewise revealed that microorganisms often carry several cellulase genes. It is therefore important to know which cellulases are actually used for cellulolysis. In Chapter 4, I tried to identify the cellulase genes that are actually working during cellulolysis by comparative expression analysis. To my knowledge, this is the first attempt to determine which of the distinct cellulase genes in a microbial genome actually function in cellulose hydrolysis.

Indeed, my results showed that although all the candidate genes were generally upregulated, a limited number of cellulase genes were particularly highly expressed during cellulolysis. These cellulases are expected to play a crucial role in cellulolysis. In each isolate, the gene set includes β-1,4-glucosidase, endo-1,4-β-D-glucanase and cellobiohydrolase genes. However, the expressed genes, especially the endo-1,4-β-D-glucanase genes, varied among the three isolates, indicating that the set of cellulase genes used may differ between bacterial strains. It is therefore important to characterize cellulase-producing bacteria at the strain level when applying them in industry.
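The idea of singling out a few particularly highly expressed genes can be sketched as follows, assuming a TPM-style normalization (the measure discussed by Wagner et al. 2012 in the reference list) and a log2 fold change between a cellulose-induced and a control condition. The exact normalization and thresholds used in Chapter 4 are not restated here, so the function names, gene labels, counts and cut-offs below are all hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp."""
    rate = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    scale = sum(rate.values())
    return {gene: 1e6 * r / scale for gene, r in rate.items()}


def log2_fold_change(induced, control, pseudocount=1.0):
    """log2(induced / control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((induced[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in induced
    }


def highly_expressed(induced_tpm, lfc, min_tpm, min_lfc):
    """Genes that pass both an abundance and an induction threshold."""
    return sorted(
        gene for gene in induced_tpm
        if induced_tpm[gene] >= min_tpm and lfc[gene] >= min_lfc
    )


# Hypothetical toy data (assumption for illustration only, not thesis data).
lengths = {"gh1_a": 1000, "gh5_a": 2000, "gh48_a": 1000}
control_counts = {"gh1_a": 10, "gh5_a": 20, "gh48_a": 10}
induced_counts = {"gh1_a": 10, "gh5_a": 160, "gh48_a": 10}

control_tpm = tpm(control_counts, lengths)
induced_tpm = tpm(induced_counts, lengths)
lfc = log2_fold_change(induced_tpm, control_tpm)
selected = highly_expressed(induced_tpm, lfc, min_tpm=500000, min_lfc=1.0)
print(selected)
```

Requiring both a high absolute level and a clear induction separates genes that dominate the transcript pool during cellulolysis from genes that are merely upregulated from a very low baseline.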

Our operon structure analysis also provided information on the cellular mechanisms that occur along with cellulolysis in the bacterial strains. Although operon structures containing β-1,4-glucosidase and PTS system component genes, which probably help the degradation of cellulose polymers into monosaccharides proceed efficiently, were commonly found in the three bacterial isolates, cellulase genes were also clustered with genes involved in various cellular processes such as protein metabolism, cell-wall recycling (Uehara et al., 2006) and aerobic respiration. These results suggest the presence of cross-talk between cellulolysis and other metabolic pathways in the bacterial isolates. Further characterization of the genes inside these operons will provide useful information for applying the isolates in industry.
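To make the notion of genes clustered with cellulase genes concrete, the sketch below groups neighbouring genes into putative operons using a simple rule: same strand and a short intergenic gap. This distance-based heuristic is an assumption chosen for illustration and is not the operon prediction method used in this study; the gene labels, coordinates and the 50 bp threshold are hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate on the contig
    end: int     # end coordinate (inclusive)
    strand: str  # "+" or "-"


def group_putative_operons(genes, max_gap_bp=50):
    """Group adjacent same-strand genes whose intergenic gap is at most max_gap_bp."""
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters:
            previous = clusters[-1][-1]
            same_strand = gene.strand == previous.strand
            close_enough = gene.start - previous.end <= max_gap_bp
            if same_strand and close_enough:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return clusters


# Hypothetical toy annotation (assumption for illustration only).
toy_genes = [
    Gene("bgl_like", 100, 1500, "+"),
    Gene("pts_b_like", 1520, 1900, "+"),
    Gene("pts_c_like", 1910, 2600, "+"),
    Gene("orf_x", 3000, 3500, "-"),
    Gene("orf_y", 3510, 4000, "-"),
]
toy_clusters = [[g.name for g in cluster] for cluster in group_putative_operons(toy_genes)]
print(toy_clusters)
```

Listing the non-cellulase members of each cluster that contains a cellulase gene is then a direct way to enumerate candidate partners for the cross-talk discussed above.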

This study strongly suggests that cellulase-producing bacteria actively conduct cellulolysis using a variety of cellulase genes in the Red Sea environment. The Red Sea exhibits high salinity (36–41 p.s.u.) and temperature (24 °C in spring, and up to 35 °C in summer) (Othoum et al., 2018). Further comprehensive analysis is therefore expected to find cellulase-producing microorganisms, or microbial cellulases, that are stable under harsh conditions of high temperature and salinity. The findings of this research can serve as a foundation for new research that uses the Red Sea as an attractive source of cellulases, which will contribute to the development of an excellent biorefinery system in the future.


References

Adney, B. & Baker, J., 1996. Measurement of cellulase activities. Laboratory analytical procedure, 6(465), p.1996.

Akamatsu, R. et al., 2019. Novel Sequence Type in Bacillus cereus Strains Associated with Nosocomial Infections and Bacteremia, Japan. Emerging Infectious Diseases, 25(5), pp.883–890.

Bartoszewicz, M. & Marjańska, P.S., 2017. Milk-originated Bacillus cereus sensu lato strains harbouring Bacillus anthracis-like plasmids are genetically and phenotypically diverse. Food Microbiology, 67, pp.23–30.

Behera, B.C. et al., 2017. Microbial cellulases–Diversity & biotechnology with reference to mangrove environment: A review. Journal of Genetic Engineering and Biotechnology, 15(1), pp.197–210.

Behzad, H. et al., 2016a. Metagenomic studies of the Red Sea. Gene, 576(2), pp.717–723.

Behzad, H. et al., 2016b. Metagenomic studies of the Red Sea. Gene, 576(2), pp.717–723. Available at: http://dx.doi.org/10.1016/j.gene.2015.10.034.

Berlemont, R. & Martiny, A.C., 2013. Phylogenetic distribution of potential cellulases in bacteria. Appl. Environ. Microbiol., 79(5), pp.1545–1554.

Bischoff, K.M. et al., 2006. Purification and characterization of a family 5 endoglucanase from a moderately thermophilic strain of Bacillus licheniformis. Biotechnology Letters, 28(21), pp.1761–1765.

Bóka, B. et al., 2019. Genome analysis of a Bacillus subtilis strain reveals genetic mutations determining biocontrol properties. World Journal of Microbiology and Biotechnology, 35(3), pp.1–14. Available at: http://dx.doi.org/10.1007/s11274-019-2625-x.

Brulc, J.M. et al., 2009. Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proceedings of the National Academy of Sciences, p.pnas-0806191105.

Cantarel, B.L. et al., 2008. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic acids research, 37(suppl_1), pp.D233–D238.

Chantarasiri, A., 2014. Novel halotolerant cellulolytic Bacillus methylotrophicus RYC01101 isolated from feces in Thailand and its application for bioethanol production. King Mongkut’s University of Technology North Bangkok International Journal of Applied Science and Technology, 7(3), pp.63–68.

Coil, D., Jospin, G. & Darling, A.E., 2015. A5-miseq: An updated pipeline to assemble microbial genomes from Illumina MiSeq data. Bioinformatics, 31(4), pp.587–589.


Crennell, S.J., Hreggvidsson, G.O. & Karlsson, E.N., 2002. The structure of Rhodothermus marinus Cel12A, a highly thermostable family 12 endoglucanase, at 1.8 Å resolution. Journal of molecular biology, 320(4), pp.883–897.

Das, S., Lyla, P.S. & Khan, S.A., 2006. Marine microbial diversity and ecology: importance and future perspectives. Current Science, pp.1325–1335.

Davies, G. & Henrissat, B., 1995. Structures and mechanisms of glycosyl hydrolases. Structure, 3(9), pp.853–859.

Debroas, D. et al., 2009. Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a mesotrophic lake (Lac du Bourget–France). Environmental microbiology, 11(9), pp.2412–2424.

Duan, C.-J. & Feng, J.-X., 2010. Mining metagenomes for novel cellulase genes. Biotechnology letters, 32(12), pp.1765–1775.

Eddy, S.R., 1998. Profile hidden Markov models. Bioinformatics (Oxford, England), 14(9), pp.755–763.

Eddy, S.R., 2011. Accelerated profile HMM searches. PLoS computational biology, 7(10), p.e1002195.

El-Morsy, E.M., 2000. Fungi isolated from the endorhizosphere of halophytic plants from the Red Sea Coast of Egypt. Fungal Diversity, 5, pp.43–54.

Feng, Y. et al., 2007. Cloning and identification of novel cellulase genes from uncultured microorganisms in rabbit cecum and characterization of the expressed cellulases. Applied microbiology and biotechnology, 75(2), pp.319–328.

Fukuoka, A. & Dhepe, P.L., 2006. Catalytic conversion of cellulose into sugar alcohols. Angewandte Chemie International Edition, 45(31), pp.5161–5163.

Galante, Y.M., De Conti, A. & Monteverdi, R., 2014. Application of Trichoderma enzymes in the textile industry. Trichoderma & Gliocladium, 2, pp.311–325.

Gopinath, S.M., Shareef, I. & Ranjit, S., 2014. Isolation, Screening and Purification of Cellulase from Cellulase Producing Klebsiella variicola. , 3(6), pp.1398–1403.

Gupta, P., Samant, K. & Sahu, A., 2012. Isolation of cellulose-degrading bacteria and determination of their cellulolytic potential. International Journal of Microbiology, 2012.

Haas, B.J. et al., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nature Protocols, 8(8), pp.1494–1512.

Hakamada, Y. et al., 2002. Enzymatic properties, crystallization, and deduced amino acid sequence of an alkaline endoglucanase from Bacillus circulans. Biochimica et Biophysica Acta (BBA)-General Subjects, 1570(3), pp.174–180.

Jeong, D.W. et al., 2018. Urease characteristics and phylogenetic status of Bacillus paralicheniformis. Journal of Microbiology and Biotechnology, 28(12), pp.1992–1998.

Kanokratana, P. et al., 2008. Identification and expression of cellobiohydrolase (CBHI) gene from an endophytic fungus, Fusicoccum sp. (BCC4124) in Pichia pastoris. Protein expression and purification, 58(1), pp.148–153.

Kasana, R.C. et al., 2008. A rapid and easy method for the detection of microbial cellulases on agar plates using Gram’s iodine. Current microbiology, 57(5), pp.503–507.

Katoh, K. & Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution, 30(4), pp.772–780.

Kim, Y.-K. et al., 2012a. Isolation of cellulolytic Bacillus subtilis strains from agricultural environments. ISRN microbiology, 2012.

Kitago, Y. et al., 2007. Crystal structure of Cel44A, a glycoside hydrolase family 44 endoglucanase from Clostridium thermocellum. Journal of Biological Chemistry, 282(49), pp.35703–35711.

Kuhad, R.C., Gupta, R. & Singh, A., 2011. Microbial cellulases and their industrial applications. Enzyme research, 2011.

Langmead, B. & Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p.357.

Lee, S.-M. & Koo, Y.-M., 2001. Pilot-Scale Production of Cellulase Using Trichoderma reesei Rut C-30 in Fed-Batch Mode. Journal of Microbiology and Biotechnology, 11(2), pp.229–233.

Leis, B. et al., 2015. Functional Screening of Hydrolytic Activities Reveals an Extremely Thermostable Cellulase from a Deep-Sea Archaeon. Frontiers in Bioengineering and Biotechnology, 3(July), p.95. Available at: http://www.ncbi.nlm.nih.gov/pubmed/26191525.

Malburg, S.R. et al., 1997. Catalytic properties of the cellulose-binding endoglucanase F from Fibrobacter succinogenes S85. Applied and environmental microbiology, 63(6), pp.2449–2453.

Milton, H. & Saier, J., 2015. Years After Its Discovery. J Mol Microbiol Biotechnol, 25(0), pp.73–78.

Mizrachi, E. et al., 2010. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq. BMC Genomics, 11(1).

Nakada, M. et al., 1994. RFLP analysis for species separation in the genera Bipolaris and Curvularia. Mycoscience, 35(3), pp.271–278.


Ngugi, D.K. et al., 2012. Biogeography of pelagic bacterioplankton across an antagonistic temperature–salinity gradient in the Red Sea. Molecular ecology, 21(2), pp.388–405.

Nishida, Y. et al., 2007. Isolation and primary structure of a cellulase from the Japanese sea urchin Strongylocentrotus nudus. Biochimie, 89(8), pp.1002–1011.

Oki, S. et al., 2014. SraTailor: Graphical user interface software for processing and visualizing ChIP-seq data. Genes to Cells, 19(12), pp.919–926.

Park, B.H. et al., 2010. CAZymes Analysis Toolkit (cat): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database. Glycobiology, 20(12), pp.1574–1584.

Patagundi, B.I., Shivasharan, C.T. & Kaliwal, B.B., 2014. Isolation and characterization of cellulase producing bacteria from soil. International Journal of Current Microbiology and Applied Sciences, 3(5), pp.59–69.

Punta, M. et al., 2012. The Pfam protein families database. Nucleic Acids Research, 40(D1), pp.290–301.

Qin, J. et al., 2010. A human gut microbial gene catalogue established by metagenomic sequencing. Nature, 464(7285), p.59.

Qiu, J. et al., 2016. The complete genome sequence of the nicotine-degrading bacterium Shinella sp. HZN7. Frontiers in microbiology, 7, p.1348.

Quail, M.A. et al., 2012. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC genomics, 13(1), p.341.

Rey, M.W. et al., 2004. Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species. Genome biology, 5(10), p.r77.

Rice, P., Longden, I. & Bleasby, A., 2000. EMBOSS: the European molecular biology open software suite. Trends in genetics, 16(6), pp.276–277.

Robson, L.M. & Chambliss, G.H., 1989. Cellulases of bacterial origin. Enzyme and Microbial Technology, 11(10), pp.626–644.

Sadhu, S. & Maiti, T.K., 2013. Cellulase production by bacteria: a review. British Microbiology Research Journal, 3(3), p.235.

Saitoh, Y., Izumitsu, K. & Tanaka, C., 2009. Phylogenetic analysis of heavy-metal ATPases in fungi and characterization of the copper-transporting ATPase of Cochliobolus heterostrophus. Mycological research, 113(6–7), pp.737–745.

Samira, M., Mohammad, R. & Gholamreza, G., 2011. Carboxymethyl-cellulase and filter-paperase activity of new strains isolated from Persian Gulf. Microbiology Journal, 1(1), pp.8–16.

Schmieder, R. & Edwards, R., 2011a. Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6), pp.863–864.

Schmieder, R. & Edwards, R., 2011b. Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6), pp.863–864.

Shamala, T.R. & Sreekantiah, K.R., 1986. Production of cellulases and d-xylanase by some selected fungal isolates. Enzyme and Microbial Technology, 8(3), pp.178–182.

Shanmugapriya, K. et al., 2012. Isolation, screening and partial purification of cellulase from cellulase producing bacteria. International Journal of Advanced Biotechnology and Research, 3(1), pp.509–514.

Shareef, I., Satheesh, M. & Christopher, S.X., 2015. Isolation and Identification of Cellulose Degrading Microbes. International Journal of Innovative Research in Science, Engineering and Technology, 8(4), pp.6788–6793.

Solovyev, V. V et al., 2011. Automatic Annotation of Bacterial Community Sequences and Application To Infections Diagnostic. , (March 2014), pp.346–353.

Sukharnikov, L.O. et al., 2011. Cellulases: ambiguous nonhomologous enzymes in a genomic perspective. Trends in biotechnology, 29(10), pp.473–479.

Tamura, K. et al., 2013. MEGA6: molecular evolutionary genetics analysis version 6.0. Molecular biology and evolution, 30(12), pp.2725–2729.

Teeri, T.T., 1997. Crystalline cellulose degradation: new insight into the function of cellobiohydrolases. Trends in biotechnology, 15(5), pp.160–167.

Tsimpidis, M. et al., 2017. T-RECs: Rapid and large-scale detection of recombination events among different evolutionary lineages of viral genomes. BMC Bioinformatics, 18(1), pp.1–8. Available at: http://dx.doi.org/10.1186/s12859-016-1420-z.

Urakawa, H., Martens-Habbena, W. & Stahl, D.A., 2010. High abundance of ammonia-oxidizing Archaea in coastal waters, determined using a modified DNA extraction method. Applied and environmental microbiology, 76(7), pp.2129–2135.

Vartoukian, S.R., Palmer, R.M. & Wade, W.G., 2010. Strategies for culture of ‘unculturable’ bacteria. FEMS microbiology letters, 309(1), pp.1–7.

Wagner, G.P., Kin, K. & Lynch, V.J., 2012. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory in biosciences, 131(4), pp.281–285.

Wang, F. et al., 2009. Isolation and characterization of novel cellulase genes from uncultured microorganisms in different environmental niches. Microbiological research, 164(6), pp.650–657.

Warnecke, F. et al., 2007. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature, 450(7169), p.560.

Woo, H.L. et al., 2014. Complete genome sequence of the lignin-degrading bacterium Klebsiella sp. strain BRL6-2. Standards in genomic sciences, 9(1), p.19.

Yooseph, S. et al., 2007. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS biology, 5(3), p.e16.

Zhang, Y.P. & Lynd, L.R., 2004. Toward an aggregated understanding of enzymatic hydrolysis of cellulose: noncomplexed cellulase systems. Biotechnology and bioengineering, 88(7), pp.797–824.

APPENDIX

Chapter 2


[Figure panels showing the percentage of each cellulase GH family per sample. Samples (all at 10 m depth): SRR2103001, SRR2102994, SRR2103008, SRR2103017, SRR2103021, SRR2103027, SRR2103038. Legend: GH1, GH3, GH5, GH6, GH7, GH9, GH26, GH30, GH12, GH44, GH48, GH61.]

Figure 2.5 The abundance of cellulase orthologs in 10 m depth of Red Sea metagenome.


[Figure panels showing the percentage of each cellulase GH family per sample. Samples (all at 25 m depth): SRR2102996, SRR2102995, SRR2103002, SRR2103009, SRR2103015, SRR2103022, SRR2103028, SRR2103029. Legend: GH1, GH3, GH5, GH6, GH7, GH8, GH9, GH26, GH30, GH48, GH61.]

Figure 2.6 The abundance of cellulase orthologs in 25 m depth of Red Sea metagenome.

[Figure panels showing the percentage of each cellulase GH family per sample. Samples (all at 50 m depth): SRR2102997, SRR2103003, SRR2103006, SRR2103010, SRR2103016, SRR2103023, SRR2103030, SRR2103034. Legend: GH1, GH3, GH5, GH6, GH7, GH8, GH30, GH44, GH48, GH61.]

Figure 2.5 The abundance of cellulase orthologs in 50 m depth of Red Sea metagenome.


[Figure panels showing the percentage of each cellulase GH family per sample. Samples (all at 100 m depth): SRR2102998, SRR2103004, SRR2103018, SRR2103024, SRR2103035, SRR2103031. Legend: GH1, GH3, GH5, GH6, GH26, GH30, GH45, GH48.]

Figure 2.6 The abundance of cellulase orthologs in 100 m depth of Red Sea metagenome.


[Figure panels showing the percentage of each cellulase GH family per sample. Panel titles as printed: SRR2102999-Depth 200m, SRR2103000-Depth 258m, SRR2103005-Depth 200m, SRR2103012-Depth 200m, SSR2103025-Depth 200m, SRR2102999-Depth 200m, SRR2103032-Depth 200m, SRR2103036-Depth 200m. Legend: GH1, GH3, GH5, GH6, GH7, GH9, GH26, GH30, GH12, GH48.]

Figure 2.7 The abundance of cellulase orthologs in 200 m depth of Red Sea metagenome.

[Figure panels showing the percentage of each cellulase GH family per sample. Samples (all at 500 m depth): SRR2103007, SRR2103026, SRR2103033, SRR2103037, SRR2103020. Legend: GH1, GH3, GH5, GH6, GH8, GH9, GH26, GH30, GH44, GH45, GH48.]

Figure 2.8 The abundance of cellulase orthologs in 500 m depth of Red Sea metagenome.


Sequences of highly expressed cellulase genes

>GENE_718_PB1_GH1 MARQTWKIPADFILGAAASAWQTEGWAGKRPTQDSYLDMWYKNDPHVWHNGYGPAVATDFYNRYKED IHHMKEIGLTHYRTSINWSRFLIDYETAEVDEVYAGYIDDVINELIASGVEPMICLEHYEIPAVLMEKYGGWGS KHVIDLFAAYAKKVFERYGDRVKYWFTFNEPIVPQTRIFLDAIRYPYEQNTKKWMQWNFNKALATAKCVRLF HARKSDCLEGAKIGVILNPEVTYARSSAPHDQKAARIYDLFFNRVFLDPSIKGAYPDELIELLVKHDILFDHDEAE LDIIKQHTVDFAGINLYYPRRVKAPSRQWNDSTPFHPAYYYEYFELPGRKMNPFRGWEIYPQIVYDMAMRLK HEYGNIEWLIAENGMGVEHEERFKNEEGVIQDDYRIDFISAHLREAMKGIADGANCKGYMLWAFTDNVSP MNAFKNRYGLVEIQLEDNRSRALKKSACFYRDIIKNRQFETEEFRYK

>GENE_3008_PB1_GH5 MSYMKRSISVFIACFMVAALGISGIIAPKASAASQTPVAVNGQLTLKGTQLVNQKGKAVQLKGISSHGLQWY GDYVNKDSLKWLRDDWGINVFRAAMYTGEGGYIDNPSVKNKVKEAVEAAKELGIYVIIDWHILSDGNPNQ NKAKAKEFFNEMSRLYGKTPNVIFEIANEPNGDVNWNRDIKPYAEDILSVIRKNSPKNIVIVGTGTWSQDVN DAADNQLKDGNVMYALHFYAGTHGQSLRDKADYALSKGAPIFVTEWGTSDASGNGGVYLDQSREWLKYL DSKKISWVNWSLCDKQESSAALNSGASKKGGWSQSDLSSSGKFVRENIRSGSNGSSGDSGSDSKGSDQKDQ KKDQDKPGQDSGAAANTIAVQYRAGDNNVNGNQIRPQLNIKNNSKKTVSLNRITVRYWYKTNHKGQNFD CDYAQIGCSKLTHKFVQLKKAVNGADTYLEVGFKNGTLAPGASTGEIQIRLHNDNWSNYAQIGDYSFSSGSN TFKNTKKITLYENGKLIWGAEPK

>GENE_1657_PB1_GH26 VKKSIVCSIFALLLAFAVSQPSYAHTVSPVNPNAQPTTKAVMNWLAHLPNRTENRVMSGAFGGYSLDTFSLA EADRIKQATGQLPAVYGCDYARGWLEPEEIADTIDYSCNSDLIAYWKSGGIPQISLHLANPAFTSGHYKTQISN SQYERILDSSTPEGKRLEAMLSKIADGLQELENEGVPVLFRPLHEMNGEWFWWGLTQYNQKDSVRISLYKRL YVKIYDYMTKTRGLDHLLWVYAPDANRDFKTDFYPGASYVDIVGLDAYFDDPYAIDGYDQLTSLNKPFAFTEV GPQTTNGGLDYARFIHAIKEKYPNTTYFLAWNDEWSPTVNKGAGALYLHPWTLNKGDIWDGGSLTPVVE


>GENE_958_SB2_GH1 MSKTELQLEQIQYRFPKGFWWGSAASATQTEGAAAEGGKGKNIWDHWYEKEPNRFFDGVG PEKTSRFYETYREDIQLMKELGHNSFRFSISWARLFPEGKGRLNEEGAAFYNQVIDELLA AGIEPFVNLYHFDMPLALQYIGGWENRQVVDHFVSYAETCFRLYGDRVKKWFTHNEPIVP AEGGYLYDFHYPNIIDFQKAVQVAYHEILSNAKAVEAYRRLGGDGKIGIILNLTPSYPRS QHPADVRASEIADAFFNRSFLDPAVKGEFPQLLTDILKEKGYLPVMEEGDLELIKNHTVD LLGINYYQPRRVKAKEHMPHPDAPFMPERFFDHYEMPGRKMNRHRGWEIYEKGIYDILMN VKENYGNFECFISENGMGVEGEERFRGEDGIIRDDYRIAFIEEHLKWVHRAIQEGANVKG YHLWTFMDNWSWTNAYKNRYGFVSVNLDKNGERTIKKSGYWFKKLAENNGF


>GENE_2942_SB2_GH3 MKRFLQCALIALLLSSLALQPAAREAEAKQQPEQHLKQMVSSMSLEEKIGQMLMPDFRNW KKKGESNAKGLTKMNDEVAGIIQKYRLGGVILFAENVTGTEQTVRLTDGLQKASPDIPLF ITIDQEGGIVTRLESGTNLPGNMAVGASRSSKNAFKSGKIIGKELASLGINVNFSPVLDV NNNPGNPVIGVRSFSSKPELTSKLGIQMMKGLQDEQMIATAKHFPGHGDTAVDSHYGLPL VPHDEKRLRSIELAPFQKAIDAGIDMIMTAHVQFPAFDNTTYKSKKDGEDIMVPATLSKK VMTDLLRKDLGFKGVVVTDALNMKAISDNFGQEEAVVMAVKAGVDIALMPAQVTSLETEK NLARVFEALLTAVKKGEIPIEQIDQSVERILQLKINRGIIDHTGSEPLQKKIKYALKTVG SNKHMKSERKMARESVTILKNEKSTLPFKPKKGDTVLILSPYEEQTAAIAKTISKIKKNI KVVEYRFAEKTFDEEIQKKIDEADYVITGSYVVKNDPVVNDGVIDDSIQDSSKWATAFPR AAMKYAQANGKKFVLMSLRNPYDTANFEEAAAVIAVYGFKGYANGRFRQPNIPAGVEVIF GKAKPKGTLPVDIPSVTRPGETLYPFGYGLNIKNGKPLHKGGS

>GENE_174_SB2_GH48 MYNKTRFMQLYEQIKNPQNGYFSPEGIPYHSVETLICEAPDYGHMTTSEAYSYWLWLEAM YGRYTQDWSKLEAAWDNMEKYIIPVNEGDGNEEQPTMNYYNPSSPATYAAEHRYPDLYPS ALTGQYPAGNDPLDSELRSTYGSNETYLMHWLLDVDNWYGFGNLLNPSHTAVYVNTYQRG EQESVWETVPHPSQDNQTFGKPNEGFMSLFTKENQAPAPQWRYTNATDADARAVQAMYWA MQWGYSNTKYLEKAKKMGDFLRYGMYDKYFQEIGSAADGSPSRGTGKNACHYLMAWYTAW GGGLGQYANWAWRIGASHVHQGYQNPVASYALSTAEGGLVPNSSTARSDWEQALKRQLEL YTWLLSSEGAVAGGATNSWNGSYSAYPQNVSTFYGMAYTEAPVYHDPPSNNWFGMQVWPL ERVAELYYIFAEKGDKSSENFQMAKHVIEKWIAYSLDYVFVGERPVTDEEGYYLNEAGER VLGGQNPQIAVQSDPGEFWIPANLEWSGQPDPWKGFDSFTGNPGLHVTTKNPSQDVGVLG SYIKTLVFFAAGTKAETGGFTALGNKAKNVAKELLDAAWNKNDGIGIAAEEEHEDYIRYF TKEVYFPNGWSGKNGQGNTIPGSNTVPSDPAKGGNGVYISHADLRPKIKNDPMWPYLENK YQTSWNPNTGKWENGLPTFVYHRFWSQVDMATAYAEYDRLIGNA

>GENE_2665_SB3_GH1


MSKTELQLEQIQYRFPKGFWWGSAASATQTEGAAAEGGKGKNIWDHWYEKEPNRFFDGVG PEKTSRFYETYREDIQLMKELGHNSFRFSISWARLFPEGKGRLNEEGAAFYNQVIDELLA AGIEPFVNLYHFDMPLALQYIGGWENRQVVDHFVSYAETCFRLYGDRVKKWFTHNEPIVP AEGGYLYDFHYPNIIDFQKAVQVAYHEILSNAKAVEAYRRLGGDGKIGIILNLTPSYPRS QHPADVRASEIADAFFNRSFLDPAVKGEFPQLLTDILKEKGYLPVMEEGDLELIKNHTVD LLGINYYQPRRVKAKEHMPHPDAPFMPERFFDHYEMPGRKMNRHRGWEIYEKGIYDILMN VKENYGNFECFISENGMGVEGEERFRGEDGIIRDDYRIAFIEEHLKWVHRAIQEGANVKG YHLWTFMDNWSWTNAYKNRYGFVSVNLDKNGERTIKKSGYWFKKLAENNGF

>GENE_4343_SB3_GH1 MARQTWKIPADFILGAAASAWQTEGWAGKRPTQDSYLDMWYKNDPHVWHNGYGPAVATDF YNRYKEDIHHMKEIGLTHYRTSINWSRFLIDYETAEVDEVYAGYIDDVINELIASGVEPM ICLEHYEIPAVLMEKYGGWGSKHVIDLFAAYAKKVFERYGDRVKYWFTFNEPIVPQTRIF LDAIRYPYEQNTKKWMQWNFNKALATAKCVRLFHARKSDCLEGAKIGVILNPEVTYARSS APHDQKAARIYDLFFNRVFLDPSIKGAYPDELIELLVKHDILFDHDEAELDIIKQHTVDF AGINLYYPRRVKAPSRQWNDSTPFHPAYYYEYFELPGRKMNPFRGWEIYPQIVYDMAMRL KHEYGNIEWLIAENGMGVEHEERFKNEEGVIQDDYRIDFISAHLREAMKGIADGANCKGY MLWAFTDNVSPMNAFKNRYGLVEIQLEDNRSRALKKSACFYRDIIKNRQFETEEFRYK

>GENE_4309_SB3_GH1 LQYKGMKRILAALTVLTCMQGAAIIMQAEWLAEAVTRLFNGERVGSLVPLIILFTAAFLF RHAVTLVRQRLIFDYAAKTGADLRKKFLEKLFQSGPGLARKEGTGHVVTLAMEGIAQFRR YLELFLPKMISMAVIPPAVVCYVFFKDTSSAAVLMITLPILIAFMILLGYAAKRKADSQW KTYEMLSNHFTDSLRGLETLKVLGMSRSHTKNIFHVSERYRKATMGTLKIAFLSSFALDF FTMLSVATVAVFLGLGLVDGTIILEPALAILILAPEFFLPVREVGNDYHATLNGREAGKA IKAILESPGFKDEAPLDLERWSDDDQIEFKDVEVRHDEEENSSLSGISLSFKGKKKIGII GESGAGKSTLIDVLGGFLETKSGVIKVGGKERTHLQTDSWQNQLLYIPQHPYIFPDTLGA NIRFYHPGASDEEVEQAARAAGLTELIDQLPSGLEERIGEGGRALSGGQAQRTAVARAFL GNRPILLLDEPTAHLDIETEYELKKTMLKLFEDKLVFMATHRLHWMLDMDEIIVLKNGQV


AETGTHQELIEKRGVYYELVQAQSFGGAS

>GENE_4396_SB3_GH1 MTEQTKKFPDGFLWGGAVAANQVEGAYNVGGKGLSTADVSPNGVMYPFDESMKSLNLYHK GIDFYHRYKEDIALFAEMGFKAFRTSIAWTRIFPNGDESEPNEEGLEFYDRLFDELLKYN IEPVVTISHYEMPLGLIKKYGGWKNRKVIECYEHYAKTVFTRYKDKVKYWMTFNEINMVL HAPFTGGGLVFEEGENQLNAMYQAAHHLFVASALAVKAGKDIIPDAKIGCMIAATTTYPM TPKPEDVLAAMENERRTLFFSDVQARGAYPGYMKRFFKENGIAIEMAEGDEDILKENTVD YIGFSYYMSMVASTNPEDLAKTGGNLLGGVKNPYLESSEWGWQIDPKGIRITLNTLYDRY QKPLFIVENGLGAVDVVEEDGSIQDDYRINYLRDHLKEVREAIADGVDLIGYTSWGPIDL VSASTAEMKKRYGYIYVDRDNEGNGTFARTRKKSFYWYKKVIETNGESL

>GENE_74_SB3_GH3 MKRFLQCALIALLLSSLALQPAAREAEAKQQPEQHLKQMVSSMSLEEKIGQMLMPDFRNW KKKGESNAKGLTKMNDEVAGIIQKYRLGGVILFAENVTGTEQTVRLTDGLQKASPDIPLF ITIDQEGGIVTRLESGTNLPGNMAVGASRSSKNAFKSGKIIGKELASLGINVNFSPVLDV NNNPGNPVIGVRSFSSKPELTSKLGIQMMKGLQDEQMIATAKHFPGHGDTAVDSHYGLPL VPHDEKRLRSIELAPFQKAIDAGIDMIMTAHVQFPAFDNTTYKSKKDGEDIMVPATLSKK VMTDLLRKDLGFKGVVVTDALNMKAISDNFGQEEAVVMAVKAGVDIALMPAQVTSLETEK NLARVFEALLTAVKKGEIPIEQIDQSVERILQLKINRGIIDHTGSEPLQKKIKYALKTVG SNKHMKSERKMARESVTILKNEKSTLPFKPKKGDTVLILSPYEEQTAAIAKTISKIKKNI KVVEYRFAEKTFDEEIQKKIDEADYVITGSYVVKNDPVVNDGVIDDSIQDSSKWATAFPR AAMKYAQANGKKFVLMSLRNPYDTANFEEAAAVIAVYGFKGYANGRFRQPNIPAGVEVIF GKAKPKGTLPVDIPSVTRPGETLYPFGYGLNIKNGKPLHKGGS

>GENE_1821_SB3_GH9 LKEKAFWKMKAFFFVLLLTFAMLFMPVSGKADIASAKESQNYAELLQKSILFYEAQRSGK LPESSRLNWRGDSALEDGKDVGHDLTGGWYDAGDHVKFGLPMAYSAAVLSWSVYEYRDAY EAAGQLDAILDNIRWATDYFIKAHTDRYEFWGQVGHGAQDHAWWGPAEVMPMKRPAYKID


AACPGSDLAGGTAAALASASIIFKPTDASYSNKLLAHAKELYDFADRYRGKYSDCITDAQ QYYNSWSGYKDELTWGAVWLYLATGDQKYLDKALASVSDWGDPANWPYRWTLSWDDVTYG AQLLLARLTNESRFTTSVERNLDYWSTGYNHNGSTERITYTPGGLAWLEQWGSLRYASNA AFLAFVYSDWVKDAGKAKRYRDFAVQQMNYMLGDNPQQRSFIVGYGTNPPKHPHHRTAHG SWADHMNVPENHRHTLYGALVGGPGKDDSYRDETNDYVSNEVAIDYNAAFTGNAAKMFQL YGAGQSPLPHFPEKETPEDEFFAEASINSSGNNYSEIRVQLNNRSGWPAKKTDKLSFRYY VDLTEAVNAGFSSEDIKISTGYNEGASVSQLKPYHIGEHIYYTEVSFSGVMIYPGGQSAH KKEVQFRLAAPAGTSFWNPKNDHSYRGLSHTLAKTRYIPVYDEGRLVFGNEPD

>GENE_1822_SB3_GH48 MYNKTRFMQLYEQIKNPQNGYFSPEGIPYHSVETLICEAPDYGHMTTSEAYSYWLWLEAM YGRYTQDWSKLEAAWDNMEKYIIPVNEGDGNEEQPTMNYYNPSSPATYAAEHRYPDLYPS ALTGQYPAGNDPLDSELRSTYGSNETYLMHWLLDVDNWYGFGNLLNPSHTAVYVNTYQRG EQESVWETVPHPSQDNQTFGKPNEGFMSLFTKENQAPAPQWRYTNATDADARAVQAMYWA MQWGYSNTKYLEKAKKMGTFSVTACMTNTFKRLEALLTARLPAELEKMPVII

>TRINITY_DN3612_c0_g2_i1_5_F_GH3 QMLLAITWSGARFPVSQTRAMVMSLLXAALLSIIRLFGLYSLPREWSWSGSHSIKLSAXYLRNVIRPTVVAHLP GLDSTVLHEPPNLVLSSRELKPTAGAPCCQLHRRRRSPLTFHAPLXHGFSTPTEVRGPSHQALAWHSSQEPRA IMPLKEDMLPPAWDNLDRQMGQLFMMGFDGTTVSPQIRSLIENYHLGSILLTAKNLKSAEDATQLVLELQTI ARNAGHPVPLLIALDQENGGVNSLYDEIFIRQFPSAMGITATGSKTLAHDVAYATAQELKAVGVNWILGPVL DVLNNVRNQLMGVRTCGDDPQEVSQYGVEFVKGYQEGGLVTCGKHFPSYGNLEFLGSQTDVPIITESLEQLS LTALVPFRNAIINGLDAMMVGGVSMSSAGMNVMHACLSEQVVDDLLRKDLKFDGVVVSECLEMEALTHNI GVGGGTVMAKNAGCDIVLLCRSFQVQQEAINGLKLGVENGILSRTQIEQSLKRVLALKSKCTSWEQALNPGG LPSLTQMQPSHTSLSTRAYSNSISVVRDRKNLLPLSNLVSANEELLLLTPLVKPLPASAVSRSVTEQLEMSIDAV AWDRTASVLSGESVFKEMGRSLSRHRNGRVLHTSYTSNGVRPIHESLIDRASAVIVVTADAVRNMYQQGFT KHVSMICRSQFTPSGEPREKPMVVVAASSPYDFVMDTTIGTYICTYDFTDTALEALVRVLYGESAPKGSLPGSF NRSQKLHQARQHWLVENWNEERDSDALDTLLTTVREDSRGQRSELLGVTPSSFLLKRDDIDEAHFVVRNSTT RALYGFCSTYFFRASGTGVIGSLIVDPSRRKLSIGNSLHNRAIRTLLQRKGMKRFQLGSRLPGIYLGIPAANPVER


KQLRQWFANLGWNTALSRPVCSAVLRTLQTWQPPEGLVHSLQSAEVTYDLVHGWDYADSIIDHVKTNSRQ GVIDIYKVALGGAPHCGIIRARRPQDGAILGSVVIYNERASLAEHMPAMKATHASTGGISSPVISPSVGEYATL LQGLILLGIKQIRRQGAEAVIVDCVSVLHFWPLFPLLFHPPCMMHVLFFWGSARKQRTPARIPPCANHDPPC RWMRIAISTGXRSWALARCIALKRLTVMLRHGRWCRGLDLSPLESKSLVRVTRXYFTIFLYTPIFFLPFGLSISVI YLSTSKRAMARWRSLTVGTVHIQXMISRDIYWVXFXICEAIPIRSEERRVGKECRSRWSPYHX

>TRINITY_DN17375_c0_g1_i1_6_F_GH61 GLIDGSSAPGTWASDELIANNNSWSTTIPTGIAAGNYVLRHEIIALHSAGNENGAQNYPQCFNLEITGGGSDA PSGVLGTELYTPTDEGILFNIYQPMESYPIPGPALYSGGSSGGSQPTSSAPATTATSSAVPTSAPTATATATTTT PPVTVTTTPPAETQVVVPSETAVPTTTSTPEPSETPSVPDDSTSLSDYFDSLSAEEFLSFLKETLSWLVTDKVHA RSLNXIDXVKRTTFFLLDFFFLSFDGQIGRA
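The records above are printed with each header followed by whitespace-separated sequence chunks, so a standard FASTA reader may not accept them as they stand. The following is a minimal sketch, assuming that headers contain no internal spaces and that any all-digit token is a stray page number, of how such text could be read back into sequences and checked for identical proteins shared between isolates. The function names and toy records are hypothetical.

```python
def parse_flattened_fasta(text):
    """Parse '>header CHUNK CHUNK ...' records into {header: sequence}."""
    records = {}
    current = None
    for token in text.split():
        if token.startswith(">"):
            current = token[1:]
            records[current] = []
        elif token.isdigit():
            continue  # assumed to be a page number, not sequence
        elif current is not None:
            records[current].append(token)
    return {name: "".join(chunks) for name, chunks in records.items()}


def identical_groups(records):
    """Return sorted groups of record names that share exactly the same sequence."""
    by_sequence = {}
    for name, sequence in records.items():
        by_sequence.setdefault(sequence, []).append(name)
    return sorted(sorted(names) for names in by_sequence.values() if len(names) > 1)


# Hypothetical toy records (assumption for illustration only).
toy_text = """>toy_A MKT LLV

12

AAG
>toy_B MKTLLVAAG
>toy_C MKTW
"""
toy_records = parse_flattened_fasta(toy_text)
toy_identical = identical_groups(toy_records)
print(toy_records)
print(toy_identical)
```

Applied to the appendix, a check of this kind would show directly whether genes listed for different isolates encode the same protein, which is relevant to the strain-level comparison in Chapter 5.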
