<<

International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894

On the Application of Space on DNA Hybridisation

Zigli David Delali 1*, Otoo Henry 2 and Agbenyeavu Vincent 3 1, 2, 3 Department of Mathematical Sciences, University of Mines and Technology, Tarkwa, Ghana.

Date of publication (dd/mm/yyyy): 04/05/2020 Abstract – This study constructed a special metric by doing a comparative study of two strings, human and chimpanzee DNA strand. The two cords were paralleled leading to the construction of a different metric. One DNA strand (human) is converted to another (chimpanzee) by employing the Levenshtein technique. The work then established the between the hybridized filament formed.

Keywords – Metric Space, , Hamming Distance, Hybridisation.

I. INTRODUCTION

In 1953 Rosalind Franklin [13] and her student R.G. Gosling published their crystal structure of DNA upon which our modern fashion of DNA is based “DNA is a helical structure” with “two co-axial molecules”. The co- axial having a common central axis molecule, shown (Figure 1) as green ribbons, refer to a chain of phosphate groups connected through sugar groups. The sugar groups are shown in blue. Each sugar group is connected to a base: A, T, G, or C. The four base pair are called Adenine (A), Guanine (G), Thymine (T), Cytosine (C) [13]. These bases spell outour genetic information; i.e., they form our DNA sequence.

In most cases, the base A pairs with the base T while the base G pairs with thebase C. Consequently, understanding the sequence of one strand of double stranded DNA is a way of understanding the series of both strands. However, the two strands making up DNA are examined in opposite directions [1], [2], [15].

Fig. 1. DNA helical structure.

We also know that there are two basic categories of nitrogenous bases: the purines (adenine [A] and guanine [G]), each with two fused rings, and the pyrimidines (cytosine [C], thymine [T]), each with a single ring [2].

Copyright © 2020 IJASM, All right reserved 70 International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894

Furthermore, it is now widely accepted that RNA contains only A, G, C, and U (no T), whereas DNA contains only A, G, C, and T (no U) [3].

A string metric (also known as a string similarity metric or string distance function) is a metric that measures distance between two text strings for approximate string matching. String matching is a method to find out pattern from the specified input string. String matching algorithms are used to find the fit among between the pattern and specified string [9].

String metrics are used heavily in information integration and are currently used in areas including fraud detection, fingerprint analysis, plagiarism detection, merging, DNA, RNA analysis, image analysis, evidence- based machine learning, data deduplication, , incremental search, data integration, and semantic knowledge [14].

In this paper we have established and explained the application of Hamming and Levenshtein Distance on DNA hybridity.

II. PRELIMINARIES

1.1. Metric Space

2.1.1. Definition

Let X be a non-empty set. A metric on X, or distance function, associates toeachpair of elements x, y ∈ X a real number d (x, y) such that

(a) d (x, y) ≥ 0; and d (x, y) = 0, x = y (positive definite);

(b) d (x, y) = d (y, x) (symmetric);

(c) d (x, z) ≤ d (x, y) + d (y, z) (triangle inequality), [The triangle inequality since it reflects the geometrical fact that the length of one side of the triangle is less than or equal to the sum of the length of the other two side] since for all x, y and z in X. The d is said to be metric on X, (푋, 푑) is called a metric space and (푥, 푦) is reffered to as the distance between x and y. Usually, we say X is the metric space but if we want to specify the metric we write (푥, 푑) as the metric space.

2.1.2. Example

(a) Both R and C are metric spaces when endowed with the distance function 푑 푥, 푦 = 푥 − 푦 . We will often refer to “the metric space R”, without explicit mention of the metric.

(b) Let 퐹 ∶ 푅 → 푅 be any strictly monotonic function (say increasing). Then, R endowed with the distance function 푑(x, y) = 푓 푥 − 푓(푦) is a metric space. Symmetry and positivity are immediate. It only remains to verify the triangle inequality.

For x, y, z ∈ R, 푑 푥, 푦 = 푓 푥 − 푓(푦) ≤ 푓 푥 − 푓(푦) + 푓 푦 − 푓(푧) = 푑 푥, 푦 + 푑(푥, 푦).

2.1.3. Theorem

Every metric space is Hausdorff

2.2. String Metric

Copyright © 2020 IJASM, All right reserved 71 International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894

2.2.1. Definition

String metrics are the class of textual based metrics resulting in a similarity or dissimilarity (distance) score between two text strings. A string metric provides a floating-point number indicating the level of similarity based on plain lexicographic match.

2.2.2. Example

Similarity between the strings orange and range can be considered to be much more than the string apple and orange by using string metrics.

2.3. Hamming Distance

2.3.1. Definition

Let A be an alphabet of symbols and C a subset of 푨풏, the set of words of length n over A. Let u = (풖ퟏ,…,풖풏) and v = (풗ퟏ,…,풗풏) be words in C. The Hamming distance d(u, v) is defined as the number of places in which u and v differ: that is, {i: 풖풊 ≠ 풗풊, i = 1,…,n}.

The Hamming distance satisfies d(u, v) ≥ 0 and d(u, v) = 0 if and only if u = v; d(u, v) = d(v, u) d(u, v) ≤ d(u, w) + d(w, v)

Hamming distance is thus a metric on C.

2.3.2. Theorem

풅−ퟏ If a code C of length n is chosen so that its minimum distance is d, then every message of length n with ퟐ or fewer errors can be corrected.

Proof.

Suppose c is the original code word that was sent and f is the word that arrives. Since f has no more than

풅−ퟏ 풅−ퟏ errors, we know that 푫 풇, 풄 ≤ suppose that there is a second codeword c' such that 푫 풇,풄′ ≤ ퟐ 푯 ퟐ 푯 풅−ퟏ 풅−ퟏ 풅−ퟏ 풅−ퟏ then we find that 푫 풄, 풄′ ≤ 푫 풄, 풇 + 푫 풇,풄′ ≤ + = ퟐ = 풅 − ퟏ a contradicti ퟐ 푯 푯 푯 ퟐ ퟐ ퟐ 풅−ퟏ -on to our minimum distance of d. Hence, c is the only code word that is within of f, and therefore must ퟐ be the code word that was sent.

2.4. Levenshtein Distance

2.4.1. Definition

푚𝑖푛 The Levenshtein distance between sequence x and y is given by 퐷퐿 푥, 푦 = 푆 𝑖푠 + 푑푠 + 푟푠 where the minimum is taken over all sequences S that turn x to y

2.4.2. Example

Let a code x = ΓΩλλδΩΓΓλδδ and code y = ΓΩδλδ ΓΩ Ω ΓΓλδ. Then we can get from x to y by the following process: x: Γ Ω λ λ δ Ω Γ Γ λ δ δ

Copyright © 2020 IJASM, All right reserved 72 International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894

Replace λ: Γ Ω δ λ δ Ω Γ Γ λ δ δ

Insert Γ: Γ Ω δ λ δ Γ Ω Γ Γ λ δ δ

Insert Ω: Γ Ω δ λ δ Γ Ω Ω ΓΓ λ δ δ

Delete δ: Γ Ω δ λ δ Γ Ω Ω Γ Γ λ δ

We can check, by examining all possibilities with three or fewer operations, that the fewest number of insertions, deletions, and replacements to get us from x = Γ Ω λ λ δ Ω Γ Γ λ δ δ to y = Γ Ω δ λδ ΓΩ Ω ΓΓλδ is four, as seen above. Therefore 퐷퐿 푥, 푦 = 4.

2.5. At the DNA Level

There are four classes of gene mutation at the level of DNA: substitutions, deletions, insertions and. A substitutional mutation, also called a point mutation, changes a single nucleotide base pair inside DNA molecule. There are two purine bases, adenine (A) and guanine (G), and two pyrimidine bases, cytosine (C) and thymine (T). In the normal double-stranded DNA molecule, A on one strand is paired with T on the complementary strand, and C with G. Deletion mutations simply remove one or more base pairs, while insertion mutations add one or more base pairs [5], [8], [11].

Fig. 2. Gene mutation.

III. MAIN RESULT

3.1. Hybridization

As we have already seen, under ordinary situation, the DNA molecule is composed of two strands. These two strands are connected by hydrogen bonds, and collectively form the well-known double helix structure. When a solution containing DNA is heated, these hydrogen bonds disappear, and the two strands drift apart. This single stranded DNA is called denatured DNA (or, surprisingly enough single-stranded DNA). When the solution is cooled, hydrogen bonds form between matching bases in the strands. These bonds are formed in places where a match (or at least a partial match) exists. If these bonds begin to form in corresponding parts of two strands, they will quickly completely join and the double-helix will reappear. However, this is not guaranteed to happen. Bonds can form even between strands of different DNA molecules or strands of different length [10], [11], [12]. The genome is the sum overall of double stranded DNA present in the nucleus of the fertilized embryo, and that is subsequently duplicated in every other cell in the body [6]. Mammalian nuclear genomes are roughly three

Copyright © 2020 IJASM, All right reserved 73 International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894 billion nucleotides (“letters”) in length, duplicated into pairs of chromosomes. Whereas the human genome is divided into 23 pairs of chromosomes, the chimpanzee and different wonderful apes carry 24 pairs [14].

Fig. 3. DNA hybridization.

Let us take short single-stranded chains of nucleotides, called DNA of chimpanzee that we have synthesized and add them to the heated DNA solution. Each DNA of chimpanzee is a known nucleotide sequence between 10 and 12 bases long [4]. Now, when the solution is cooled, the DNA of chimpanzee will stick to parts of the DNA of human which contain a DNA sequence complementary to that of the DNA of chimpanzee. The resulting composition is called hybrid DNA.

Fig. 4. Insertion and deletion errors.

This new string formed is the hybrid DNA. It has a Hamming distance of 7 and a Levenshtein 푚𝑖푛 distance,퐷퐿 ℎ, 푐 = 푆 𝑖푠 + 푑푠 + 푟푠 = 5. This is minimum insertions, deletions and replacement taken over all sequences in Fig. 4, that turn human DNA to Chimpanzee DNA.

IV. CONCLUSION

In this study, we tried to make a comparative study and analysis on two algorithms of string metric, human and chimpanzee DNA strand. We have also implemented the two algorithm techniques, namely - Hamming and Levenshtein distance on the DNA hybridisation and studied their behavior.

Copyright © 2020 IJASM, All right reserved 74 International Journal of Applied Science and Mathematics Volume 7, Issue 2, ISSN (Online): 2394-2894

REFERENCES

[1] Alberts, B., Bray, D., Hopkin, K., Johnson, A.D., Lewis, J., Raff, M., Roberts, K. and Walter, P., (2013). Essential cell biology. Garland Science. [2] An Introduction to DNA and Chromosomes (Text and Audio) [www Document], 2011. HOPES Huntington’s Disease Information. URL https://hopes.stanford.edu/an-introduction-to-dna-and-chromosomes-text-and-audio/ (accessed 4.19.20). [3] Darcy, I.K., Levene, S.D. and Scharein, R.G., (2014). Introduction to DNA Topology. In Discrete and Topological Models in Molecular Biology (pp. 327-345). Springer, Berlin, Heidelberg. [4] DNA Basics 1 [www Document], n.d. URL http: //employees.csbsju.edu/ hjakubowski/classes/ SrSemMedEthics/Human%20Genome %20Project/DNA1.html (accessed 4.19.20). [5] DNA is constantly changing through the process of mutation | Learn Science at Scitable [www Document], n.d. URL https:// www.nature.com/scitable/topicpage/dna-is-constantly-changing-through-the-process-6524898/ (accessed 4.19.20). [6] Genomic comparisons of humans and chimpanzees [www Document], n.d. research gate. URL https://www.researchgate.net/ publication/228173328_Genomic_Comparisons_of_Humans_and_Chimpanzees (accessed 4.19.20). [7] Hamming distance - encyclopedia of mathematics [www Document], n.d. URL https://www.encyclopediaofmath.org/index.php/ Hamming_distance (accessed 3.4.20). [8] Johnston, M.O., (2001). Mutations and new variation: Overview. e LS. [9] Levenshtein distance, 2020. Wikipedia. [10] Maiti, P.K., Pascal, T.A., Vaidehi, N., Heo, J., Goddard, W.A., 2006. Atomic-Level Simulations of Seeman DNA Nanostructures: The Paranemic Crossover in Salt Solution. Bio-phys. J 90, 1463–1479. https://doi.org/10.1529/biophysj.105.064733 [11] Micheal, O., Seacoid, (2006). Undergraduate Mathematics, Metric space, springer, USA. [12] Pray, L., (2008). Discovery of DNA structure and function: Watson and Crick. Nature Education, 1(1), p.100. [13] Rosalind Franklin :: DNA from the Beginning [WWW Document], n.d. URL http://www.dnaftb.org/19/bio-3.html (accessed 4.19.20). [14] Shamir, R., (1999). Algorithms for Molecular Biology-Lecture 10. [15] Sheikh, R.H., Raghuwanshi, M.M. and Jaiswal, A.N., (2008). Genetic algorithm based clustering: a survey. In 2008 First International Conference on Emerging Trends in Engineering and Technology (pp. 314-319). IEEE. AUTHOR’S PROFILE

First Author Zigli David Delali is a Lecturer in the Department of Mathematical Sciences, University of Mines and Technology, Ghana. His research interest lies in Algebraic Topology and more specifically Fundamental Groups.

Second Author Dr. Henry Otoo is a Lecturer in the Department of Mathematical Sciences, University of Mines and Technology, Ghana.

Third Author Agbenyeavu Vincent (2020), is a student in the Department of Mathematical Sciences, University of Mines and Technology, Ghana.

Copyright © 2020 IJASM, All right reserved 75