Drawing DNA Sequence Networks

Julia Olivieri

Motivation

Datasets
● Caryophyllaceae (16): Arenaria congesta. Source: S. Fior et al. 2006 [4]
● Castilleja (19): Castilleja applegatei. Source: J. R. Bennett and S. Mathews 2006 [3]
● Cactaceae (82): Mammillaria ???. Source: R. Nyffeler 2002 [5]

Method for Comparing Sequences
● Alignment cost: $a_{ij}$
● $\sigma$ = substitution penalty
● $g$ = gap penalty

The Problem
● A drawing $D: S \to P$ assigns each sequence in $S$ to a point in $P$
● Ideally, $a_{ij} = d_{D(i),D(j)}$

Ideal isn't always possible
1. AAAAA
2. ATATA
3. TTTTT
4. AAATT

Quadratic Assignment Problem
QAP:
● n facilities
● n locations to place those facilities
● Flow between every pair of facilities
Our Problem:
● n sequences
● n locations to place those sequences
● Nucleotide sequence similarity between every pair of sequences

Ordering Cost
For each $s \in S$, with $|S| = n$:
● $l_s = u_1, \ldots, u_n$, ordered so that $a_{s,u_i} \le a_{s,u_{i+1}}$
● $L_s = v_1, \ldots, v_n$, ordered so that $d_{D(s),D(v_i)} \le d_{D(s),D(v_{i+1})}$
[Figure: diagram relating $u_i$ and $v_j$]

Point Placement

Heuristic: Random Assignment
● Choose random drawings and save the drawing with the lowest cost
● r is the number of runs (r = 10,000)
● Assuming one optimal solution, the chance of finding it is $1 - (1 - 1/|P|!)^r$
● If |P| = 10 and r = 100,000, the chance of finding the optimum is 2.72%

Heuristic: Greedy Assignment
● Create a drawing by assigning sequences to points one at a time, each time to the lowest-cost point by the Euclidean cost
● Ordering method

Heuristic: Hill Climbing
● One run: find the minimum-cost 2-swap
● For n = 16: 120 2-swaps, 1120 3-swaps
[Figure: example of a 2-swap on items 1 to 4]

Heuristic: Simulated Annealing
● Parameters: temperature T > 1, ratio q ∈ (0,1), runs r
● Apply a 2-swap to the current drawing D (cost cur) to obtain D' (cost new)
● If new < cur, D = D'
● Else, choose b ∈ (0,1). If $b < e^{(\mathrm{cur} - \mathrm{new})/T}$, D = D'. If not, D does not change
● T = qT
● Continue until T < 1

Results: Caryophyllaceae
[Figure panels: Random: 1 run; Greedy: ordered; Hill climbing; Random: 10,000 runs; Greedy: unordered; Simulated annealing]

Results: Castilleja
[Figure panels: Random: 1 run; Greedy: ordered; Hill climbing; Random: 10,000 runs; Greedy: unordered; Simulated annealing]

Results: Cactaceae
[Figure panels: Random: 1 run; Greedy: ordered; Hill climbing; Random: 10,000 runs; Greedy: unordered; Simulated annealing]

Results Table

Time Crunch: 1000 seconds
175432 99412

Time Crunch: 60 seconds
120136

Further Directions
● Larger datasets
● Combining heuristics
● Figuring out the dimension of graphs
● Continuous technique
● Biological interpretations of images

Acknowledgements
● Math Department
● Bob Bosch
● Friends and family
● Audience!

Works Cited
[1] Rodney J. Dyer and John D. Nason. Population graphs: the graph theoretic shape of genetic structure. Molecular Ecology, 13:1713–1727, 2004.
[2] Ashesh Nandy, Marissa Harle, and Subhash C. Basak. Mathematical descriptors of DNA sequences: development and applications. Archive of Organic Chemistry, 9:211–238, 2006.
[3] Jonathan R. Bennett and Sarah Mathews. Phylogeny of the parasitic plant family Orobanchaceae inferred from phytochrome A. American Journal of Botany, 93(7):1039–1051, 2006.
[4] Simone Fior, Per Ola Karis, Gabriele Casazza, Luigi Minuto, and Francesco Sala. Molecular phylogeny of the Caryophyllaceae (Caryophyllales) inferred from chloroplast matK and nuclear rDNA ITS sequences. American Journal of Botany, 93(3):399–411, 2006.
[5] Reto Nyffeler. Phylogenetic relationships in the cactus family (Cactaceae) based on evidence from trnK/matK and trnL-trnF sequences. American Journal of Botany, 89(2):312–326, 2002.
[6] Harold William Rickett. Wild Flowers of the United States, volume 1. Hinkhouse Inc, New York, New York, 1966.
[7] Samuel F. Brockington, Ya Yang, Fernando Gandia-Herrero, Sarah Covshoff, Julian M. Hibberd, Rowan F. Sage, Gane K. S. Wong, Michael J. Moore, and Stephen A. Smith. Lineage-specific gene radiations underlie the evolution of novel betalain pigmentation in Caryophyllales. New Phytologist, 207(4):1170–1180, 2015.
[8] National Library of Medicine. BLAST, 2016. http://blast.ncbi.nlm.nih.gov/Blast.cgi.
[9] István Miklós. Introduction to Algorithms in Bioinformatics. Budapest, Hungary, 2016. http://www.renyi.hu/ miklosi/AlgorithmsOfBioinformatics.pdf.
[10] Eranda Cela. The Quadratic Assignment Problem: Theory and Algorithms. Springer Science+Business Media, B.V., Dordrecht, Holland, 1998.
[11] Rainer E. Burkard. Handbook of Combinatorial Optimization. Springer Reference, Media, New York, 2013.
[12] Zbigniew Michalewicz and David B. Fogel. How to Solve it: Modern Heuristics. Springer, Berlin, Germany, 2002.
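
Sketch: alignment cost
The "Method for Comparing Sequences" slide names an alignment cost with a substitution penalty σ and a gap penalty g, but the transcript does not give the recurrence or the penalty values. The following is a minimal sketch that assumes a standard global (Needleman-Wunsch style) alignment; the function name alignment_cost and the defaults sigma=1, g=2 are illustrative assumptions, not values from the talk. It is applied to the four sequences of the "Ideal isn't always possible" slide.

```python
def alignment_cost(x: str, y: str, sigma: int = 1, g: int = 2) -> int:
    """Minimum cost of globally aligning x and y.

    sigma is charged per mismatched column and g per gap column.
    Both default values are illustrative assumptions.
    """
    # prev[j] holds the cost of aligning the previous prefix of x
    # with the first j letters of y (row 0: j gaps).
    prev = [j * g for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * g]  # first i letters of x against an empty prefix of y
        for j in range(1, len(y) + 1):
            substitute = prev[j - 1] + (0 if x[i - 1] == y[j - 1] else sigma)
            gap_in_y = prev[j] + g
            gap_in_x = cur[j - 1] + g
            cur.append(min(substitute, gap_in_y, gap_in_x))
        prev = cur
    return prev[-1]


# The four example sequences from the slides.
sequences = ["AAAAA", "ATATA", "TTTTT", "AAATT"]

# a[i][j] is the pairwise alignment cost matrix.
a = [[alignment_cost(s, t) for t in sequences] for s in sequences]
for row in a:
    print(row)
```

Under these assumed penalties, sequences 1 and 2 differ by two substitutions and sequences 1 and 3 by five, so the matrix entries for those pairs are 2 and 5.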
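
Sketch: chance of finding the optimum by random assignment
The 2.72% figure on the "Heuristic: Random Assignment" slide can be checked numerically. This sketch assumes that a drawing is a one-to-one assignment of |P| sequences to |P| points (so there are |P|! drawings), that each run draws one of them uniformly and independently, and that exactly one drawing is optimal. The function name chance_of_optimum is hypothetical.

```python
import math


def chance_of_optimum(num_points: int, runs: int) -> float:
    """Probability that at least one of `runs` uniform random drawings
    is the single optimal drawing, out of num_points! possible drawings."""
    total_drawings = math.factorial(num_points)
    miss_once = 1.0 - 1.0 / total_drawings  # one run misses the optimum
    return 1.0 - miss_once ** runs


# |P| = 10 and r = 100,000, as on the slide.
print(f"{chance_of_optimum(10, 100_000):.2%}")  # prints 2.72%
```

With 10! = 3,628,800 drawings, even 100,000 runs cover only a small fraction of the search space, which is why the slides move on to guided heuristics.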
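
Sketch: hill climbing with 2-swaps
The "Heuristic: Hill Climbing" slide describes a run as finding the minimum-cost 2-swap. The sketch below repeats that step until no 2-swap lowers the cost. The cost function is an assumption: the transcript does not define the Euclidean cost, so drawing_cost here sums the squared difference between each alignment cost and the Euclidean distance between the assigned points. The toy data (four points on a line) is also made up for illustration. The swap counts for n = 16 are checked at the end; the 3-swap count matches the slide if each triple contributes its two cyclic rearrangements, which is an interpretation rather than something the slide states.

```python
from itertools import combinations
from math import comb, dist


def drawing_cost(a, points, drawing):
    """Assumed cost: squared gap between alignment cost a[i][j] and the
    Euclidean distance of the points assigned to i and j, over all pairs."""
    return sum(
        (a[i][j] - dist(points[drawing[i]], points[drawing[j]])) ** 2
        for i, j in combinations(range(len(drawing)), 2)
    )


def hill_climb(a, points, drawing):
    """Apply the minimum-cost 2-swap repeatedly until none improves."""
    drawing = list(drawing)
    best = drawing_cost(a, points, drawing)
    while True:
        best_swap = None
        for i, j in combinations(range(len(drawing)), 2):
            drawing[i], drawing[j] = drawing[j], drawing[i]  # try the swap
            cost = drawing_cost(a, points, drawing)
            drawing[i], drawing[j] = drawing[j], drawing[i]  # undo it
            if cost < best:
                best, best_swap = cost, (i, j)
        if best_swap is None:
            return drawing, best  # local optimum: no 2-swap helps
        i, j = best_swap
        drawing[i], drawing[j] = drawing[j], drawing[i]


# Toy instance (hypothetical): four collinear points, costs equal to |i - j|.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
a = [[abs(i - j) for j in range(4)] for i in range(4)]
start = [1, 0, 2, 3]  # drawing[i] is the index of the point given to sequence i

final_drawing, final_cost = hill_climb(a, points, start)
print(final_drawing, final_cost)

# Neighbourhood sizes for n = 16.
print(comb(16, 2), 2 * comb(16, 3))
```

Hill climbing stops at the first drawing that no single 2-swap improves, so on harder instances it can end in a local optimum rather than the best drawing.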
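
Sketch: simulated annealing
The steps on the "Heuristic: Simulated Annealing" slide can be sketched as follows. Several details are assumptions: the cost function (squared difference between alignment cost and Euclidean distance, as the transcript does not define the Euclidean cost), the starting temperature and ratio, the reading of "runs r" as independent restarts from the same starting drawing, and the choice to return the best drawing seen rather than the last one. The toy data is made up for illustration.

```python
import math
import random
from itertools import combinations


def drawing_cost(a, points, drawing):
    """Assumed cost: squared gap between alignment cost and drawn distance."""
    return sum(
        (a[i][j] - math.dist(points[drawing[i]], points[drawing[j]])) ** 2
        for i, j in combinations(range(len(drawing)), 2)
    )


def simulated_annealing(a, points, start, T0=100.0, q=0.95, runs=5, seed=0):
    """Anneal with random 2-swaps; T0, q, runs and seed are illustrative."""
    rng = random.Random(seed)
    best_drawing = list(start)
    best = drawing_cost(a, points, best_drawing)
    for _ in range(runs):  # assumed meaning of "runs r": restarts
        D = list(start)
        cur = drawing_cost(a, points, D)
        T = T0
        while T >= 1:  # continue until T < 1
            i, j = rng.sample(range(len(D)), 2)
            D_new = list(D)
            D_new[i], D_new[j] = D_new[j], D_new[i]  # random 2-swap
            new = drawing_cost(a, points, D_new)
            # Accept improvements always; accept a worse drawing when
            # b < e^((cur - new) / T) for a uniform random b in [0, 1).
            if new < cur or rng.random() < math.exp((cur - new) / T):
                D, cur = D_new, new
                if cur < best:
                    best_drawing, best = list(D), cur
            T *= q  # cool down: T = qT
    return best_drawing, best


# Toy instance (hypothetical): four collinear points, costs equal to |i - j|.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
a = [[abs(i - j) for j in range(4)] for i in range(4)]
start = [2, 0, 3, 1]

best_drawing, best_cost = simulated_annealing(a, points, start)
print(best_drawing, best_cost)
```

Accepting some cost-increasing swaps while T is large lets the search leave local optima that would trap hill climbing; as T shrinks, the acceptance probability for worse drawings falls toward zero.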
