Identification of the Chromosomal Origins of Replication (oriCRSI and oriCRSII) in R. sphaeroides 2.4.1

Tim Johnson, Randi Harbour, Kristina Hernandez, Lin Lin, and Madhusudan Choudhary
Department of Biological Sciences, Sam Houston State University, Huntsville, Texas 77341

INTRODUCTION

Rhodobacter sphaeroides belongs to the α-3 subdivision of the Proteobacteria. This organism is metabolically versatile and grows under a variety of conditions, including aerobic, semi-aerobic, and photosynthetic growth. R. sphaeroides possesses a complex genome comprising two chromosomes (CI and CII) and five endogenous plasmids (1). CI and CII are ~3.0 Mbp and ~0.9 Mbp in size, respectively (2). Analysis of the R. sphaeroides genome reveals that genes for a wide variety of essential functions are dispersed between the two chromosomes. It has also recently been demonstrated that CI and CII have been both essential and ancient partners within the R. sphaeroides genome since its separation from its ancestral lineage (3).

Unlike eukaryotic cells, prokaryotic cells lack mitosis or a mitosis-like apparatus. The existence of multiple chromosomes in bacteria may require well-coordinated chromosomal replication and segregation to distribute the chromosomes equally between the two daughter cells. Therefore, to understand the process of DNA replication, the origin of chromosomal replication must first be identified. The origin of replication (referred to as oriC in E. coli) is the specific region of the chromosome where the DNA double helix begins to denature, allowing replication of the chromosome to initiate (4). This region varies from 40 to 80 base pairs in length among bacterial species and is usually very AT rich (70 to 80%), because adenine-thymine base pairs, held together by two hydrogen bonds, are more easily denatured than guanine-cytosine base pairs, held together by three. Cis-elements located within and around this region are recognized by a set of proteins, including DnaA, RepABC, and other replication-associated proteins, which bind to these specific DNA sequences and facilitate the initiation of chromosomal replication.

In order to identify the putative origins on CI and CII in R. sphaeroides, an in silico approach was employed to search CI- and CII-specific genomic sequences with variable sequence length and %GC composition. Two different computer programs, which search either overlapping or discrete segments of the DNA sequence, were used to search the entire sequence of each chromosome. All sequences 50 to 100 nucleotides in length with >65% AT content were selected for further analysis. These sequences were then analyzed for the presence of cis-elements using the conserved consensus sequences found in Sinorhizobium meliloti (5), a species closely related to R. sphaeroides that also belongs to the α-3 subgroup of the Proteobacteria.

RESULTS AND DISCUSSION

The advantage of the program used in this study is that it offers a progressive search and allows an entire genome to be processed at once, whereas many currently available web-based programs accept only a small number of sequences, which is time consuming and yields limited information. An alternative approach using Artemis further validated the results, as shown in Figure 2. Furthermore, the efficacy and accuracy of this program were tested on the entire genomic sequences of Caulobacter crescentus and Sinorhizobium meliloti, and the program identified all putative origins, including the one that is biologically functional.

The output of the two search programs comprised 3013 and 336 nucleotide sequence files for Chromosome I and Chromosome II, respectively. After the overlapping regions were matched, a total of 125 CI- and 16 CII-specific sequences remained. Following the protein database search, 37 CI- and 9 CII-specific sequences were chosen for further analysis. These putative origin regions were then analyzed for the presence of known cis-elements to which DnaA and other replication proteins bind, as shown in Figure 3. Each of the 13 DNA regions (shown in Table 1), together with 300 nucleotides of upstream and downstream sequence, was then analyzed for 21 different conserved boxes for oriC, DnaA, RepABC1, and RepABC2 (5). Many of these sequences contain 2 to 5 of these conserved binding boxes, as shown in Figure 3. Based on the %AT content and the number of binding boxes, 13 possible origins of replication were identified in the R. sphaeroides chromosomes.

Comparison of the genome sequences of Caulobacter crescentus and Rickettsia prowazekii revealed that both species share a conserved cluster of genes in the hemE-hemH region that overlaps the established origin of replication in C. crescentus and the putative origin of replication in R. prowazekii (6). The origin of replication of the S. meliloti chromosome has also been predicted, and experimentally confirmed, to lie approximately 400 kb from dnaA and adjacent to hemE (5). A putative origin of replication of CI in R. sphaeroides is located ~40 kb from hemE, but this remains uncertain until it is confirmed experimentally. Like R. sphaeroides, Vibrio cholerae possesses two chromosomes, and the origins of replication of its two chromosomes (oriCIvc and oriCIIvc) have been studied experimentally (7). Thus, the identification of the chromosomal origins in R. sphaeroides may further our understanding of the mechanism of chromosomal replication in bacterial species that possess multiple chromosomes.
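The overlapping, progressive A+T search (windows of 50 to 100 nucleotides, A+T thresholds between 65% and 80%), followed by the step that combines overlapping output windows before further analysis, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the 10-nucleotide increment between window lengths, the single 65% threshold, and the rule that abutting windows are merged as well as overlapping ones are hypothetical choices. Prefix sums are used so that each window's A+T count is a constant-time lookup.

```python
from itertools import accumulate


def at_prefix(seq: str) -> list[int]:
    """p[i] = number of A or T bases in seq[:i] (prefix sums, case-insensitive)."""
    return [0] + list(accumulate(1 if base in "ATat" else 0 for base in seq))


def scan_at_rich(seq: str, min_len: int = 50, max_len: int = 100,
                 len_step: int = 10, threshold: float = 0.65):
    """Hypothetical scanner: try every start position with window lengths
    min_len, min_len + len_step, ..., max_len and return (start, end, fraction)
    for each window whose A+T fraction is strictly above `threshold`.
    Coordinates are 0-based and half-open."""
    p = at_prefix(seq)
    hits = []
    for start in range(len(seq) - min_len + 1):
        for length in range(min_len, max_len + 1, len_step):
            end = start + length
            if end > len(seq):
                break  # longer windows would also run past the end
            fraction = (p[end] - p[start]) / length
            if fraction > threshold:
                hits.append((start, end, fraction))
    return hits


def merge_overlaps(hits):
    """Combine overlapping or abutting windows into non-redundant
    (start, end) regions so that no stretch is analyzed twice."""
    merged = []
    for start, end, _ in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Toy sequence: G+C background with one 60-nt A/T-rich stretch at 200-260.
toy = "GC" * 100 + "AT" * 30 + "GC" * 100
print(merge_overlaps(scan_at_rich(toy)))  # [(169, 291)]
```

In the toy sequence the merged region extends 31 nucleotides beyond each side of the A/T stretch, because a 90-nucleotide window needs only 59 A/T bases to exceed 65%. Raising `threshold` toward 0.80 or narrowing the length range shrinks the reported regions.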
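The discrete-segment G+C calculation, which the study performed in Artemis using 120-nucleotide segments shaded below or above average, reduces to the plain-Python sketch below. It is a hypothetical stand-in, not Artemis's own code: the function names are illustrative, the average is taken over the whole input sequence, ambiguous bases are ignored, and segments exactly at the average are labeled "above".

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_windows(seq: str, window: int = 120):
    """Split `seq` into consecutive, non-overlapping windows (the last one may
    be shorter) and label each relative to the whole-sequence G+C average."""
    average = gc_fraction(seq)
    rows = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        fraction = gc_fraction(seq[start:end])
        label = "below" if fraction < average else "above"  # ties count as above
        rows.append((start, end, round(fraction, 3), label))
    return average, rows


# Toy sequence: G+C background with one 60-nt A/T-rich stretch at 200-260.
toy = "GC" * 100 + "AT" * 30 + "GC" * 100
average, rows = gc_windows(toy)
for row in rows:
    print(row)
```

Because the windows are discrete rather than overlapping, an A/T-rich stretch that straddles a window boundary is split between two segments, which is one reason to cross-check this view against an overlapping search.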
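The consensus-box search on both strands, applied to each candidate region plus 300 nucleotides of flanking sequence, could look like the sketch below. The 21 S. meliloti consensus boxes from reference (5) are not listed in this poster, so the example uses the well-known E. coli DnaA-box consensus TTATNCACA purely as a stand-in. The function names and the use of Python regular expressions (rather than DNADynamo) are assumptions made for illustration.

```python
import re

# IUPAC nucleotide codes as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def reverse_complement(seq: str) -> str:
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def with_flanks(genome: str, start: int, end: int, flank: int = 300):
    """Hypothetical helper: return (offset, subsequence) for genome[start:end]
    extended by `flank` nucleotides on each side, clipped at the sequence ends."""
    offset = max(0, start - flank)
    return offset, genome[offset:end + flank]


def box_hits(region: str, consensus: str):
    """Return sorted (position, strand) pairs for every match of an IUPAC
    consensus box on either strand. Reverse-strand boxes are found by scanning
    the forward strand for the reverse complement of the consensus, so each
    position is the 0-based forward-strand start of the box."""
    seq = region.upper()
    hits = []
    for strand, motif in (("+", consensus.upper()), ("-", reverse_complement(consensus))):
        # The zero-width lookahead reports overlapping occurrences as well.
        pattern = re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")
        hits.extend((m.start(), strand) for m in pattern.finditer(seq))
    return sorted(hits)


DNAA_BOX = "TTATNCACA"  # E. coli DnaA-box consensus, used here only as a stand-in
region = "GG" + "TTATCCACA" + "GG" + "TGTGGATAA" + "GG"
print(box_hits(region, DNAA_BOX))  # [(2, '+'), (13, '-')]
```

Summing the hits over every consensus box for a flanked region gives a per-region box tally of the kind that, together with %AT content, underlies the shortlist of candidate origins.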
METHODS

In silico approach for the identification of the origin of replication: To identify the chromosomal origins of replication in Rhodobacter sphaeroides 2.4.1, a computer program was designed to search for A-T rich regions within the CI and CII sequences. The candidate sequences were then analyzed for the presence of the consensus cis-elements that the initiator proteins require to start replication. The algorithm searches over both variable nucleotide lengths (50-100 nucleotides) and varying %AT composition (65% to 80%) in an overlapping and progressive manner, as shown in Figure 1. The program was applied to each chromosomal sequence of R. sphaeroides in FASTA format, obtained directly from the NCBI server. For efficient use of memory and input-output loading, each sequence is analyzed sequentially in a buffer. The analysis calculates the %AT of each candidate sequence and checks whether it is above a chosen threshold value; any sequence above the threshold is written to the output data files. In addition, Artemis was used to calculate the %GC composition of each discrete 120-nucleotide segment along each of the two chromosomal sequences, as shown in Figure 2.

Identification of the conserved DNA sequence boxes in the origin region: Because the search is overlapping, the output sequences overlapped one another and had to be combined so that the same region was not analyzed twice. The assembled sequences were searched against the R. sphaeroides protein database to determine whether any of them encode a protein. Finally, the remaining sequences were analyzed with DNADynamo to determine whether they contain the consensus boxes previously identified in the chromosomal origin of S. meliloti (5). The program was downloaded from its publicly available website. The program performs the searches on both the forward and reverse-complement strands.

[Figure panel labels: Chromosome I (~69% GC); Chromosome II (~69% GC)]

Figure 2. The G+C content and possible sites for the origin of replication in CI and CII in R. sphaeroides 2.4.1 (purple, below average; yellow, above average). a) G+C content and two possible sites for the origin of replication in Chromosome I. b) G+C content and 9 possible sites for the origin of replication in Chromosome II.

Figure 3. DnaA and RepABC box binding sites for the origin of replication. a) A G+C content graph of a ~6 kb region encompassing the possible origin of replication in R. sphaeroides. b) The sequences of possible regions for the origin of replication. c) DnaA and RepABC binding sites that match the DnaA and RepABC box consensus sequences. d) The sequences of the putative DnaA and RepABC boxes. (* Binding sites for multiple box consensus sequences.)

Table 1. The possible regions for the origin of replication in Chromosome I and Chromosome II (in the original table, the A-T rich region within each sequence is marked in red).

Coordinates | A+T content of A-T rich region | Location | Sequence
2380028-2380181 | 69.32% | CI | TCGCATCGCCCCTCCCGCTTCGTTGAACATTTTGGCCGATTAAATTCATTTTTTTGCCGACCATCAACGTTTATTTTCTTTTTGATGAAGATTTCCAGATTTACTTTCAGTTTTTCCATGCTTATGCCTTGGAAACTGGCAGTTTCCCGTTGGC
1700865-1701165 | 72.58% | CI | GGAGTGACTGAATGAAAGGCAACGATGTATCAATCATGAGATCGGAACATGAGTCTGCTCTCGAATAGAGTGAGATCAGGATTTAAGACAAAGTAAACATTTTTGGTATTCTTAAGTGATTGATTTTATTGAATAAATCAAGGGTGTCATATGGATTTGTTTTTCTTAAGAAATCGTTTAATGATTGATTTATTGATTTATTAAGAAATGGATGAATCGAGATTTGATGTTCATGGTTCTTGAATG

FUTURE WORK

All thirteen putative chromosomal origins of R. sphaeroides 2.4.1 will be cloned into a suicide vector (pLO1 or pSUP202). The resulting recombinant plasmids will be tested to determine whether any of these origins allows the suicide plasmid to replicate autonomously in R. sphaeroides. This work is currently in progress.

REFERENCES