Seqan Manual Release 1.4.2
Total Page:16
File Type:pdf, Size:1020Kb
SeqAn Manual Release 1.4.2 The SeqAn Team October 13, 2014 Contents 1 Table Of Contents 3 2 API Documentation 5 2.1 Tutorial..................................................5 437paragraph*.389 437paragraph*.390 437paragraph*.391 437paragraph*.392 437paragraph*.393 437paragraph*.394 438paragraph*.395 438paragraph*.396 438paragraph*.397 438paragraph*.398 438paragraph*.399 438paragraph*.400 439paragraph*.401 439paragraph*.402 439paragraph*.403 439paragraph*.404 440paragraph*.405 440paragraph*.406 2.2 How Tos................................................. 440 2.3 Infrastructure............................................... 536 2.4 Build Manual............................................... 542 2.5 SeqAn Style Guides........................................... 560 2.6 Glossary................................................. 597 2.7 References................................................ 598 Bibliography 601 i ii SeqAn Manual, Release 1.4.2 This is the manual of the SeqAn library. SeqAn is the C++ template library for the analysis of biological sequences. It contains algorithms and data structures for • string representation and their manipluation, • online and indexed string search, • efficient I/O of bioinformatics file formats, • sequence alignment, and • much more. Contents 1 SeqAn Manual, Release 1.4.2 2 Contents CHAPTER 1 Table Of Contents Tutorial Each tutorial takes 30 to 60 minutes of your time for learning how to use SeqAn. Jump right into using SeqAn using our tutorials! • Detailed installation descriptions are available in the Getting Started Tutorial. How Tos Use these short and target-oriented articles to solve common, specific problems. Build Manual These articles describe how to use the SeqAn build system and integrate into your Makefiles, for example. Infrastructure These pages describe the SeqAn infrastructure and are mostly interesting to SeqAn developers. SeqAn Style Guides Please follow these style guides for SeqAn library and application code. Glossary These pages contain definitions of various terms. 3 SeqAn Manual, Release 1.4.2 4 Chapter 1. Table Of Contents CHAPTER 2 API Documentation The API documentation can be found on here. 2.1 Tutorial 2.1.1 Getting Started This chapter gives you the necessary steps to get started with SeqAn: • Necessary Prerequisites • Installing SeqAn from Subversion • Creating a first build. • Creating your own first application. Use the following links to select your target operating system and IDE/build system. The bold items show the recom- mended build system for the given platforms. Linux using • Makefiles • Eclipse Mac Os X • Makefiles • Xcode Windows • Visual Studio Click “more” for details on the supported development platforms. Note: In-Depth Information: Supported OS, Build Systems, and Compilers The content of this box is meant as additional information. You do not need to understand it to use SeqAn or follow the tutorials. There are three degrees of freedom when selecting a SeqAn development platform. The degrees of free- dom are: 5 SeqAn Manual, Release 1.4.2 1. The operating system. We support Linux, Mac Os X and Windows. 2. The build system. This is partially orthogonal to the operating system, although each build system is only available on some platforms (e.g. Visual Studio is only supported on Windows). We use CMake to generate the actual build files and the build system maps to “CMake generators”. A CMake generator creates either build files for a build system (e.g. GNU Make) or a project file for an IDE (e.g. for Visual Studio 2008). 3. The compiler. This is partially orthogonal to the operating system and build system, although only some combinations of each are possible. For example, Visual Studio projects of a particular version can only use the Visual Studio compiler of the same version. The SeqAn team offers support for the following operating systems, build systems, and compilers: • Operating System: Linux, Mac Os X, Windows. • Build System: Makefiles, Visual Studio projects, XCode projects, Eclipse CDT projects. • Compilers: GNU g++ from version 4.1, LLVM/Clang from version 3.0, Visual C++ from Version 8. We are told that SeqAn also works on FreeBSD. It should work with all generators available in CMake that work with the supported compilers (e.g. the CodeBlocks generator will probably work as long as you use it on a operating system with a supported compiler, although we cannot offer any support for CodeBlocks). Relevant How-Tos Although slightly more advanced than “getting started”, the following How-Tos apply to setting up your build envi- ronment: • Using Parallel Build Directories • Installing Contribs On Windows • Integration with your own Build System ToC Contents • Getting Started With SeqAn On Linux Using Makefiles – Prerequisites – Install – A First Build – Hello World! – Further Steps Getting Started With SeqAn On Linux Using Makefiles This tutorial explains how to get started with SeqAn on Linux using Makefiles. We assume that you are using the Debian or a Debian-like Linux distributions such as Ubuntu. The only difference to other distributions is the name of the packages and the package management system in the Prerequisites section. It should be very simple for you to tailor these instructions to your requirements. 6 Chapter 2. API Documentation SeqAn Manual, Release 1.4.2 Prerequisites Use the following command line to install the required dependencies: the Subversion client, the GNU C++ compiler, CMake for the build system and the Python script interpreter for running helper scripts. ~ # sudo apt-get install subversion g++ cmake python The following command line installs optional dependencies: developer versions of zlib and libbzip2 (for compressed I/O support) and the Boost library (required by a few apps). ~ # sudo apt-get install zlib1g-dev libbz2-dev libboost-dev Install Now, go to the directory you want to keep your SeqAn install in (e.g. Development in your home folder). ~ # cd $HOME/Development Then, use Subversion to retrieve the current SeqAn trunk: Development # svn co https://github.com/seqan/seqan/branches/master seqan-trunk You can now find the whole tree with the SeqAn library and applications in $HOME/Development/seqan-trunk. A First Build Next, we will use CMake to create Makefiles for building the applications, demo programs (short: demos), and tests. For this, we create a separate folder seqan-trunk-build on the same level as the folder seqan-trunk. Development # mkdir seqan-trunk-build When using Makefiles, we have to create separate Makefiles for debug builds (including debug symbols with no optimization) and release builds (debug symbols are stripped, optimization is high). Thus, we create a subdirectory for each build type. We start with debug builds since this is best for learning: debug symbols are enabled and assertions are active. Warning: Compiling debug mode yields very slow binaries since optimizations are disabled. Compile your programs in release mode if you want to run them on large data sets. The reason for disabling optimizations in debug mode is that the compiler performs less inlining and does not optimize variables away. This way, debugging your programs in a debugger becomes much easier. Development # mkdir seqan-trunk-build/debug Development # cd seqan-trunk-build/debug The resulting directory structure will look as follows. ~/Development - seqan-trunk source directory - seqan-trunk-build - debug build directory with debug symbols Within the build directory debug, we call CMake to generate Makefiles in Debug mode. debug # cmake ../../seqan-trunk -DCMAKE_BUILD_TYPE=Debug We can then build one application, for example RazerS 2: debug # make razers2 Optionally, we could also use “make” instead of “make razers2”. However, this can take a long time and is not really necessary. 2.1. Tutorial 7 SeqAn Manual, Release 1.4.2 Hello World! Now, let us create a sandbox for you. This sandbox will be your local workspace and you might want to have it versionized on your own Subversion repository at a later point. All of your development will happen in your sandbox. We go back to the source directory and then use the SeqAn code generator to create a new sandbox. debug # cd ../../seqan-trunk seqan-trunk # ./util/bin/skel.py repository sandbox/my_sandbox Now that you have your own working space, we create a new application first_app. seqan-trunk # ./util/bin/skel.py app first_app sandbox/my_sandbox Details about the code generator are explained in Using The Code Generator. Now, we go back into the build directory and call CMake again: seqan-trunk # cd ../seqan-trunk-build/debug debug # cmake . Tip: When and where do you have to call CMake? CMake is a cross-platform tool for creating and updating build files (IDE projects or Makefiles). When you first create the build files, you can configure things such as the build mode or the type of the project files. Whenever you add a new application, a demo or a test or whenever you make changes to CMakeLists.txt you need to call CMake again. Since CMake remembers the settings you chose the first time you called CMake in a file named CMakeCache.txt, all you have to do is to switch to your debug or release build directory and call “cmake .” in there. ~ # cd $HOME/Development/seqan-trunk-build/debug debug # cmake . Do not try to call “cmake .” from within the seqan-trunk directory but only from your build directory. The step above creates the starting point for a real-world application, including an argument parser and several other things that are a bit too complicated to fit into the Getting Started