Drawing DNA Sequence Networks
Julia Olivieri

Motivation

Datasets
● Caryophyllaceae (16): Arenaria congesta
● Castilleja (19): Castilleja applegatei
● Cactaceae (82): Mammillaria ???

Sources:
● Caryophyllaceae: S Fior et al. 2006 [4]
● Castilleja: J R Bennett and S Mathews 2006 [3]
● Cactaceae: R Nyffeler 2002 [5]

Method for Comparing Sequences
● Alignment cost, $a_{ij}$
● $\sigma$ = substitution penalty
● $g$ = gap penalty

The Problem
● $D: S \to P$
● Ideally, $a_{ij} = d_{D(i),D(j)}$

Ideal isn’t always possible
1. AAAAA
2. ATATA
3. TTTTT
4. AAATT

Quadratic Assignment Problem
QAP:
● n facilities
● n locations to place those facilities
● Flow between every pair of facilities
Our Problem:
● n sequences
● n locations to place those sequences
● Nucleotide sequence similarity between every pair of sequences

Ordering Cost
For each $s \in S$, with $|S| = n$:
● $l_s = u_1, \ldots, u_n$, where $a_{s,u_i} \le a_{s,u_{i+1}}$
● $L_s = v_1, \ldots, v_n$, where $d_{D(s),D(v_i)} \le d_{D(s),D(v_{i+1})}$
[Diagram: labels $u_i$ and $v_j$]

Point Placement

Heuristic: Random Assignment
● Choose random drawings and save the drawing with the lowest cost
● r is the number of runs (r = 10,000)
● Assuming one optimal solution, the chance of finding it is $1 - \left(1 - \frac{1}{|P|!}\right)^{r}$
● If |P| = 10 and r = 100,000, the chance of finding the optimum is 2.72%

Heuristic: Greedy Assignment
Create a drawing by assigning sequences to points one at a time, each time to the lowest-cost point by the Euclidean cost.
Ordering method

Heuristic: Hill Climbing
● 2-swap: [Diagram: the order 1, 2, 3, 4 becomes 1, 2, 4, 3]
● One run: find the minimum-cost 2-swap
● For n = 16: 120 2-swaps, 1120 3-swaps

Heuristic: Simulated Annealing
● Temperature T > 1, ratio q ∈ (0,1), runs r
● Apply a 2-swap to D to get D’, with costs cur and new
● If new < cur, D = D’
● Else, choose b ∈ (0,1) at random; if $b < e^{(cur - new)/T}$, D = D’; if not, D does not change
● T = qT
● Continue until T < 1

Results
[Figures: one slide per dataset (Caryophyllaceae, Castilleja, Cactaceae), each showing six drawings with the panel titles Random: 1 run; Random: 10,000 runs; Greedy: ordered; Greedy: unordered; Hill climbing; Simulated annealing.]
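The numbers quoted on the Random Assignment and Hill Climbing slides can be checked with a few lines of Python. This is a sketch under stated assumptions: a drawing is taken to be a one-to-one assignment of |P| sequences to |P| points (so there are |P|! drawings), each run is an independent uniform draw, and a "3-swap" is read as cyclically rotating the points of three sequences (two rotations per triple). The function names are illustrative and do not come from the slides.

```python
from math import comb, factorial


def chance_of_optimum(num_points: int, runs: int) -> float:
    """Chance that at least one of `runs` random drawings is the single optimum.

    Assumes as many sequences as points, so there are num_points! drawings,
    and that every run picks one of them uniformly and independently.
    """
    drawings = factorial(num_points)
    return 1.0 - (1.0 - 1.0 / drawings) ** runs


def count_2_swaps(n: int) -> int:
    """Number of ways to pick two sequences and exchange their points."""
    return comb(n, 2)


def count_3_swaps(n: int) -> int:
    """Number of 3-swaps, assuming each triple can be rotated in two ways."""
    return 2 * comb(n, 3)


print(f"{chance_of_optimum(10, 100_000):.2%}")   # prints 2.72%
print(count_2_swaps(16), count_3_swaps(16))      # prints 120 1120
```

Under these assumptions the sketch reproduces the 2.72% figure and the counts of 120 and 1120 for n = 16.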
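The hill-climbing step (repeatedly take the minimum-cost 2-swap) might be sketched as follows, using the four example sequences from the "Ideal isn’t always possible" slide. Several pieces are assumptions made only for illustration: Hamming distance stands in for the alignment cost $a_{ij}$, the point coordinates in `POINTS` are invented, and the cost is taken to be the sum of squared differences between $a_{ij}$ and the Euclidean distance $d_{D(i),D(j)}$. The slides do not spell out the exact cost they call "Euclidean cost".

```python
from itertools import combinations
from math import dist

SEQS = ["AAAAA", "ATATA", "TTTTT", "AAATT"]
# Hypothetical point set; the slides do not give coordinates.
POINTS = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0), (5.0, 0.0)]


def hamming(s: str, t: str) -> int:
    """Stand-in for the alignment cost a_ij (equal lengths, no gaps)."""
    return sum(a != b for a, b in zip(s, t))


A = [[hamming(s, t) for t in SEQS] for s in SEQS]


def cost(drawing):
    """Assumed cost: squared gap between a_ij and the drawn distance, summed over pairs.

    drawing[i] is the index of the point assigned to sequence i.
    """
    return sum(
        (A[i][j] - dist(POINTS[drawing[i]], POINTS[drawing[j]])) ** 2
        for i, j in combinations(range(len(drawing)), 2)
    )


def best_2_swap(drawing):
    """Return the cheapest drawing reachable by one 2-swap (or the input itself)."""
    best, best_cost = list(drawing), cost(drawing)
    for i, j in combinations(range(len(drawing)), 2):
        candidate = list(drawing)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best, best_cost


def hill_climb(drawing):
    """Apply the minimum-cost 2-swap until no 2-swap lowers the cost."""
    current, current_cost = list(drawing), cost(drawing)
    while True:
        nxt, nxt_cost = best_2_swap(current)
        if nxt_cost >= current_cost:
            return current, current_cost
        current, current_cost = nxt, nxt_cost


final, final_cost = hill_climb([0, 1, 2, 3])
print(final, round(final_cost, 3))
```

The loop stops at a local minimum: a drawing that no single 2-swap improves, which need not be the global optimum.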
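The simulated annealing rule from the slide (accept a worse 2-swap when $b < e^{(cur - new)/T}$, then cool with T = qT until T < 1) can be sketched generically over any cost function. The assumptions here are: b is drawn uniformly, the 2-swap is chosen uniformly at random, a single cooling schedule is run (the slides list "runs r" but do not say how runs are combined), and the best drawing seen is remembered, which is an addition not shown on the slide. `toy_cost` is a hypothetical stand-in for the drawing cost.

```python
import math
import random


def simulated_annealing(drawing, cost, T=100.0, q=0.99, rng=None):
    """One cooling schedule of simulated annealing over 2-swaps.

    drawing: list where drawing[i] is the point index of sequence i.
    cost:    function mapping a drawing to a number (lower is better).
    T, q:    starting temperature (> 1) and cooling ratio in (0, 1).
    """
    rng = rng or random.Random()
    current = list(drawing)
    cur = cost(current)
    best, best_cost = list(current), cur  # bookkeeping added in this sketch
    while T >= 1:
        # Random 2-swap: exchange the points of two distinct sequences.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        new = cost(candidate)
        # Always accept improvements; accept worse drawings with
        # probability exp((cur - new) / T).
        if new < cur or rng.random() < math.exp((cur - new) / T):
            current, cur = candidate, new
            if cur < best_cost:
                best, best_cost = list(current), cur
        T *= q
    return best, best_cost


def toy_cost(drawing):
    """Hypothetical cost: how far each sequence sits from point index i."""
    return sum(abs(p - i) for i, p in enumerate(drawing))


best, best_cost = simulated_annealing([3, 1, 0, 2, 4], toy_cost, rng=random.Random(0))
print(best, best_cost)
```

With T = 100 and q = 0.99 the schedule runs for a few hundred steps; early on, almost any swap is accepted, and as T approaches 1 the rule behaves more like hill climbing.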
Results Table

Time Crunch: 1000 seconds
175432
99412

Time Crunch: 60 seconds
120136

Further Directions
● Larger datasets
● Combining heuristics
● Figuring out the dimension of graphs
● Continuous technique
● Biological interpretations of images

Acknowledgements
● Math Department
● Bob Bosch
● Friends and family
● Audience!

Works Cited
[1] Rodney J. Dyer and John D. Nason. Population graphs: the graph theoretic shape of genetic structure. Molecular Ecology, 13:1713–1727, 2004.
[2] Ashesh Nandy, Marissa Harle, and Subhash C. Basak. Mathematical descriptors of DNA sequences: development and applications. Archive of Organic Chemistry, 9:211–238, 2006.
[3] Jonathan R. Bennett and Sarah Mathews. Phylogeny of the parasitic plant family Orobanchaceae inferred from phytochrome A. American Journal of Botany, 93(7):1039–1051, 2006.
[4] Simone Fior, Per Ola Karis, Gabriele Casazza, Luigi Minuto, and Francesco Sala. Molecular phylogeny of the Caryophyllaceae (Caryophyllales) inferred from chloroplast matK and nuclear rDNA ITS sequences. American Journal of Botany, 93(3):399–411, 2006.
[5] Reto Nyffeler. Phylogenetic relationships in the cactus family (Cactaceae) based on evidence from trnK/matK and trnL-trnF sequences. American Journal of Botany, 89(2):312–326, 2002.
[6] Harold William Rickett. Wild Flowers of the United States, volume 1. Hinkhouse Inc, New York, New York, 1966.
[7] Samuel F. Brockington, Ya Yang, Fernando Gandia-Herrero, Sarah Covshoff, Julian M. Hibberd, Rowan F. Sage, Gane K. S. Wong, Michael J. Moore, and Stephen A. Smith. Lineage-specific gene radiations underlie the evolution of novel betalain pigmentation in Caryophyllales. New Phytologist, 207(4):1170–1180, 2015.
[8] National Library of Medicine. BLAST, 2016. http://blast.ncbi.nlm.nih.gov/Blast.cgi.
[9] István Miklós. Introduction to Algorithms in Bioinformatics. Budapest, Hungary, 2016. http://www.renyi.hu/ miklosi/AlgorithmsOfBioinformatics.pdf.
[10] Eranda Cela. The Quadratic Assignment Problem: Theory and Algorithms. Springer Science+Business Media, B.V., Dordrecht, Holland, 1998.
[11] Rainer E. Burkard. Handbook of Combinatorial Optimization. Springer Reference, Media, New York, 2013.
[12] Zbigniew Michalewicz and David B. Fogel. How to Solve it: Modern Heuristics. Springer, Berlin, Germany, 2002.