White Paper White paper on CLC read mapper October 10, 2012 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com
[email protected] White paper: White paper on CLC read mapper 2 Contents 1 Abstract 3 2 Introduction 3 3 Performance Benchmark of CLC Genomics Workbench/Server 5 4 Computational Resource Requirements of CLC Assembly Cell 9 5 Accuracy Benchmark of CLC Assembly Cell 12 6 Algorithmic Details 18 7 Conclusions 19 8 Materials & Methods 19 8.1 Performance Benchmark of CLC Genomics Workbench/Server ........... 19 8.2 Computational Resource Requirements of CLC Assembly Cell ............ 20 8.3 Accuracy Benchmark of CLC Assembly Cell ...................... 20 8.4 Read Mappers ..................................... 22 8.5 Data Sets ....................................... 22 8.6 Benchmarking Hardware ............................... 23 8.7 bash related ...................................... 23 References 25 White paper: White paper on CLC read mapper 3 1 Abstract After decades of biological research relying on Sanger sequencing, massively parallel high throughput sequencing (HTS) techniques have created a broad range of new and exciting research applications by increasing the output sequencing data dramatically. Although allowing for hitherto unseen DNA and RNA sequencing and binding study designs, HTS technologies have created remarkable bioinformatic challenges. In typical resequencing studies read mappers are used to align sequence reads to reference genomes. Read mapping, while not too time consuming in the Sanger-century, quickly evolved into one of the most insistent bottlenecks. Nowadays, researchers can resort to a broad range of read mapping solutions and it becomes increasingly challenging to choose software that optimally meets given requirements.