700 the Gene. a Gene Is a Sequence of Nucleotides in a Molecule Of

[Frontiers in Bioscience, Landmark, 24, 700-711, March 1, 2019] Speckle-interferometry and speckle-correlometry of GB-speckles Onega V. Ulianova1, Sergey S. Zaytsev1, Yury V. Saltykov1, Anna Lyapina1, Irina Subbotina1, Nadezhda Filonova1, Sergey S. Ulyanov1, Valentina A. Feodorova1 1Federal Research Center for Virology and Microbiology, Branch in Saratov, Saratov, 410028, Russia TABLE OF CONTENTS 1. Abstract 2. Introduction 3. Transformation of sequence of nucleotides in gene-based speckle pattern 3.1. Algorithm of re-coding of a nucleotide sequence 3.2. Algorithm of generating of 2D speckle pattern, based on a nucleotide sequence 3.3. Generating of gene-based speckles 4. Comparison of GB-speckles, based on similar genovars: cross-correlation technique 5. Comparison of GB-speckles, based on similar С. trachomatis genovars of different subtypes: speckle-interferometry 6. Optical processing of GB-speckles: detection of genetic mutations in a gene of micro-organisms 7. Conclusions 8. Acknowledgements 9. References 1. ABSTRACT A new method of coding of genetic the gene. A gene is a sequence of nucleotides information using laser speckles has been in a molecule of nucleic acids. Typically, DNA developed. Specific technique of transforming consists of four types of nucleotides. These the nucleotide of gene into a speckle pattern nucleotides contain, correspondingly, four (gene-based speckles or GB-speckles) is nitrogen bases, namely, adenine (A), thymine suggested. Reference speckle patterns of (T), guanine (G) and cytosine (C). omp1 gene of typical wild strains of Chlamydia trachomatis of genovars D, E, F, G, and J are The nucleotide sequence can be generated. This is the first report in which determined using the specific procedure of perspectives of the proposed technique in sequencing (1, 2) which allows presenting the the bacterial gene identification and detection primary structure of analyzing a macromolecule of natural genetic mutations in bacteria as a in the form of linear sequence of monomers in single nucleotide polymorphism (SNP) are a text format. demonstrated. The usage of GB-speckles can be viewed as the next step on the way to the All arbitrary sequences of nucleotides era of digital biology. can be synthetized artificially, even if such sequence did not exist in nature before. This 2. INTRODUCTION unique approach has been developed in pioneer investigations, which have been conducted As it is well-known, the nucleic acids recently by Microsoft Corporation (Microsoft (DNA or RNA) transmit and store the genetic Research). Papers (3, 4) are dedicated to the information in living organisms. The structural recording and storage of big numerical data and functional unit of hereditary information is using genetic methods and the development 700 Bacterial typing based on GB-speckles of DNA-based archival storage system. The Biology Will Reinvent Nature and Ourselves and above-mentioned studies devoted to the stored it on DNA-carrier (12). Later, in 2013, the recording of digital information using synthetic European Bioinformatics Institute copied 739 strands of DNA are extremely important from kilobytes of sound, images, and text, including the viewpoint of long-time storage of big data. a 26-second audio clip of Martin Luther King, “I Have a Dream” speech. More recently, Harvard In the opinion of experts (5), we are Medical School and a Technicolor research currently facing a serious risk of data storage group reported storing and retrieving back crisis. Thus, for example, the total amount of 22 Mb of information that included a French digital data generated in 2013 amounted to 3.5 silent film A Trip to the Moon (13). zettabytes (Zb, 1 Zb = 1021 bytes), and 92 percent of these data in the world was generated only However, the real essential progress in 2012-2013. The exact information about the has been achieved in very recent research volume of data generated since 2014 up to published in Ref. (14) seria of images (five present time has not been found in the open sequential frames from a digital movie) sources by the authors of this paper, but it is have been archived in the genome of living expected (5) the world will generate 40 Zb of bacteria, keeping the ability of reproduction. data annually by 2020. It will not be possible, It is also expedient to mention the in principle, to provide the storage of such big papers (15, 16, 17, 18) directly referring to the data using traditional data media. However, problem of encoding numerical data using DNA as mentioned, such a problem can be easily carries of information. resolved using data storage on DNA-carriers. Using synthetic DNA for the storage of numerical So, at the moment, the main trend information makes it possible to achieve and the most perspective approach to the potential information density of 109 Gb/mm3 at storage of big data is the usage of DNA-based the potential durability of thousands of years (6). carriers. However, at present time, genetic Thus, the researchers of Microsoft Corporation data themselves (including the big data) are (Microsoft Research) in co-operation with sequenced and stored then atavistically in scientists from University of Washington have the text or numerical format. Definitely, the stored more than 200 Mb numerical data in the DNA sequencing is absolutely necessary for form of DNA (4, 6, 7, 8, 9). In particular, these obtaining and pre-processing primary genetic data contain the encoding of high definition information. However, further processing of video (10), copies of the Universal Declaration very long nucleotide sequences is forestalling of Human Rights in different languages, the a serious problem in bioinformatics, especially top 100 books from Project Gutenberg, and when the critically fast growth of gene database the Crop Trust seed database (6). As it has is taken into consideration. The matter is been mentioned (9), the suggested algorithm of that finding similar or identical fragments in encoding a file containing digital data in DNA is nucleotide sequence of two different genes robust to outliers and high levels of noise and is requires a lot of computations, especially in characterized by higher accuracy in comparison the case of large size genes. In order to find to similar methods. either highly similar or identical fragments in the nucleotide sequences of two different Remarkably, those studies in the field genes being compared, the sequences need of encoding numerical data on synthetic strands to be analyzed in depth using several routine of DNA have also been performed by several steps, which, however, require considerable other scientific teams, in parallel with Microsoft time. In fact, it is necessary to conduct Corporation. Thus, in 2012, Harvard Medical additional repeated sequencing/re-sequencing School researchers encoded (11) in DNA a (sometimes up to 3-5 times) of the original digital full-text book Regenesis: How Synthetic sample with a repeated multi-stage computer 701 © 1996-2019 Bacterial typing based on GB-speckles analysis by several Genomic packages. Nevertheless, to the knowledge of This procedure is time-consuming (from a the authors of this paper, at the present time, few days to several weeks), requires costly interdisciplinary research in the field of coherent equipment, special facility and highly-qualified optics and molecular biology has not been personnel. However, it is critical to avoid a carried out. The authors of this paper tried to significant amount (up to 20 percent) of false fill that gap. Earlier, a nucleotide sequence of nucleotide polymorphisms at the sequencing omp1 gene of bacteria Chlamydia trachomatis stage when the primary nucleotide reads are (genovars D, E, F, G, J and K) has been being prepared. Moreover, in this case there successfully converted into 2D speckle-patterns. is a certain difficulty related to interpreting the A special term, GB-speckles (gene-based results obtained. Apparently, the algorithm speckles) has been introduced in Ref. (21, 22, 23) used could produce a serious inconvenience for the definition of a principally new class of and complicate significantly the study of the speckles. Using GB-speckles as a new excellent data even when small nucleotide sequences tool for diagnostics is the next step to the era (250-500 bp in size) are analyzed. Working of digital biology. GB-speckles possess unique with nucleotide sequences of individual genes statistical properties, habitual for the speckles of a larger size (1000-1300 kb and more) is with a small number of scatterers (24, 25, 26) or considered to be the greatest challenge despite so-called small-N speckles (27). The statistics of the availability of a large number of different GB-speckles has been particularly investigated in Ref. (22). As it has been shown in Ref. (21), sequencing strategies for extended DNA using such methods of speckle-optics, as fragments. Numerous errors of reading and speckle-correlometry, speckle-interferometry analysis are overcome by repeatedly reading and subtraction of speckle-images allows different fragments of the gene with the same defining the presence of natural mutations in package of multi-stage computer calculations. comparing strains even in the case of a single Working with such large nucleotide sequences SNP. It has been demonstrated in Ref. (21) obviously is 10-20 times more difficult and time- that the appearance of any type of mutations consuming than with the short ones (19). leads to the formation of system of interferential fringes in the interference pattern in the case of But, if nucleotide sequence is recorded using speckle-interferometric technique, which in analog format on diffraction optical element may serve as the basis for the operation of (DOE) or on hologram (HOE), then such optical processor of genetic information. The artificial optical element can be used in the optimization of the algorithm of encoding a design of optical processor (20). In other words, nucleotide sequence of bacteria C.

700 the Gene. a Gene Is a Sequence of Nucleotides in a Molecule Of

Exploring the Structure of Long Non-Coding Rnas, J

Upstream Sequences Other Than AAUAAA Are Required for Efficient Messenger RNA 3’-End Formation in Plants

POST-TRANSCRIPTIONAL REGULATION of AFP and Igm GENES

The Nucleotide Sequence of the Gene for Human Protein C (DNA Sequence Analysis/Vitamin K-Dependent Proteins/Blood Coagulation) DONALD C

Classification and Function of Small Open-Reading Frames Abstract

Eukaryotic and Prokaryotic Gene Structure Thomas Shafee*, Rohan Lowe

Gene Structure 1

Mrna Editing, Processing and Quality Control in Caenorhabditis Elegans

Computational Gene Finding

What Is a Gene, Post-ENCODE? History and Updated Definition

MODULE 5: ALTERNATIVE OPEN READING FRAME Module 5: Alternative Open Reading Frame

The Human Genome: Structure and Function of Genes and Chromosomes