bioRxiv preprint doi: https://doi.org/10.1101/2021.06.03.446950; this version posted June 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Discovery, Article

A fast, general synteny detection engine

Joseph B. Ahrens1*, Kristen J. Wade1 and David D. Pollock1*

1Department of Biochemistry and Molecular Genetics, University of Colorado Denver, Anschutz Medical Campus, Denver, Colorado, USA

* Corresponding authors

Joseph B. Ahrens E-mail: [email protected]
David D. Pollock E-mail:
[email protected]

Abstract

The increasingly widespread availability of genomic data has created a growing need for fast, sensitive and scalable comparative analysis methods. A key aspect of comparative genomic analysis is the study of synteny: co-localized gene clusters shared among genomes due to descent from common ancestors. Synteny can provide unique insight into the origin, function, and evolution of genome architectures, but methods to identify syntenic patterns in genomic datasets are often inflexible and slow, and use diverse definitions of what counts as likely synteny. Moreover, the reliability of putatively syntenic regions (i.e., whether they are truly indicative of homology) with different lengths and signal-to-noise ratios can be difficult to quantify.