Handbook of Open Source Tools
Total Page:16
File Type:pdf, Size:1020Kb
Handbook of Open Source Tools Sandeep Koranne Handbook of Open Source Tools Sandeep Koranne SW Boeckman Road 8005 97070 Wilsonville Oregon USA [email protected] ISBN 978-1-4419-7718-2 e-ISBN 978-1-4419-7719-9 DOI 10.1007/978-1-4419-7719-9 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2010938855 © Springer Science+Business Media, LLC 2011 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) Preface The constant and speedy progress made by humankind in the industrial revolution, and more recently in the information technology era can be directly attributed to sharing of knowledge between various disciplines, reuse of the knowledge as sci- ence and technology advanced, and inclusion of this knowledge in the curriculum. The phrases “do not reinvent the wheel” and “to stand upon the shoulders of giants” come to our mind as representative of this thought process of using existing solu- tions and building upon existing knowledge, but at the same time contributing to the society as a whole. It was with this intention of documenting existing (circa 2010) Open Source Tools for Scientists and Engineers, that I set about to write this book. Computer technology has progressed at such a fast pace that it is difficult (nary impossible) to catalog all of the existing software systems which are available to us. To simplify our task I have chosen a representative software to solve a class of problem. Where space and time permitted I have provided alternatives as well. A key benefit of using open-source applications is that the code can be compiled on a system which is non-standard. Or, it can be compiled with CPU specific opti- mizations which a general purpose binary released from an ISV cannot assume. As CPU technology advances rapidly, and software has a longer lifespan, the ability to recompile the source code becomes more and more important. The same can be said with open-source implementations of data-standard in image processing, and docu- mentation retrieval. In this media focused era, more and more content is being stored as digital data. Unlike, paper, whose archival properties have been refined over cen- turies, digital media has not gone through the same process of archival management. In situations where data archival is necessary, a key component is the persistence of the key software components which read and write the digital data files. Since no one can predict the computers of the next century, open-source software is essential to long term archival of information. Rather than duplicate the fine documentation for each package, this book is orga- nized in sections related to solve a particular problem. This book should be treated as an “existential quantifier” 9, rather than 8, on the information provided for each task. Once the existence of a solution or software tool to address the problem is v vi Preface known, more details about the solution can be researched. Each software system or tool is presented in a simple to read manner describing the main problem the system addresses and the tasks performed. Each chapter is presented in a similar manner to ease referencing. Although many of the software mentioned in this book are routinely used in science and engineering tasks, more and more I have found that students and gen- eral practitioners from other fields, such as liberal arts, music, statistics, are using these in their work and study. This book contains references to artificial intelligence programs and tools which are being widely used in cognitive sciences. Many of the software tools use libraries and development tools which are themselves open- source; this synergy is representative of the open-source concept, and a key driver to its proliferation. As such, any large open-source software is a good learning ex- ample to study the use of its components. For example the GRASS GIS software (presented in Section 14.9.1) is itself developed using a number of libraries such as (i) PROJ4, (ii) HDF5, (iii) MySQL, (iv) FFTW, and many others. To learn how to use these libraries in a real world example, one only has to study the GRASS GIS source code. This is a key advantage of open-source tools. Another argument (made mostly in the context of mathematical proofs) stems from the scientific validity and acceptance of computer generated, or computer as- sisted proofs. For such proofs to be included as standard material, the software sys- tem used to arrive at the result must also be available to researchers, as well as its own correctness be verified. These goals are readily achieved by open-source math- ematical software as presented in Chapter 16. I present a short summary of the book contents: This book is divided in six parts. The first part describes the open-source operat- ing systems and user interaction as well as introspection tools. Chapter 1 includes a discussion on Bash shell, POSIX compliant libraries and open-source programming languages (including Erlang, Lua and Smalltalk). External utilities (such as tar, find and rsync) as well as OpenSSH are discussed to ease the users interaction with a modern GNU/Linux type operating system. Chapter 2 presents several text processing and document creation and management tools. These include OpenOf- fice and various LATEXprocessing tools. Software for Wiki management as well as graphical page layout are also described. Part II of this book focuses on the process of open-source software creation. For the reader who wants to know more about the methods and systems used by the authors to create the open-source tools, this part provides information on the GNU build system, version control, compilers, APIs and much more. In Chapter 3 I present the GNU Compiler Collection. Commonly used command-line options, pragmas, pre-processor defines as well as GCC intrinsics are explained. Examples of using GCC to compile Java and Ada are presented, as well as recent features of GCC including OpenMP support and C++ advice features. Source code version control with CVS and SVN is presented with the help of examples; GUI front-ends for version control (TkCVS) is presented and used in many examples. The GNU Build system is discussed with the help of examples in Section 3.3. GNU Make as Preface vii well as SConstruct are described in Section 3.4. Both GNU make and SConstruct are also used in many examples in the sequel of this book. Chapter 3 also contains a description of Bugzilla for defect tracking (Section 3.5) and a section on various editors and IDEs available on GNU/Linux for editing source code. Static code checking and analysis of source code is presented in Section 3.8, while the use of GNU debugger GDB is shown with examples in Section 3.9.1. Graphical front-ends to GDB (including GDB Insight) are shown in Section 3.9.2. Code optimization using profiling and cache measurement with GNU profiler and Valgrind is discussed with examples in Section 3.12. The C standard library and the C++ standard library (including STL) are discussed in Chapter 4. In Chapter 5 I describe the Apache Portable Runtime (APR) library. In particular APR memory pools, APR process library, APR thread and thread pool library, APR file information and memory mapping library are explained with the help of short examples. Advanced APR concepts, dealing with the use of Memcache library are also demonstrated. All examples have been compiled and run on GNU/Linux system running Fedora Core 12, and thus are known to work. Using the examples presented in this part of the book, alongwith the documentation for the library, simplifies the learning process of the API. Chapter 6 contains a description of the most useful parts of the Boost C++ API. For lack of space, I had to choose only a small portion of Boost for demonstra- tion. I depict the design and usage of Boost library with the help of examples in- cluding Boost smart pointer and memory pool, the Boost asynchronous IO (asio) framework, Boost data-structures. Boost Graph Library (BGL) is an almost com- plete graph representation library, which also includes an implementation of many graph algorithms. BGL is presented with the help of examples in Section 6.4. Boost multi-threading (like APR) is a portable and integrated (with C++) threading sys- tem. I have presented examples which the reader can contrast with APR threading and POSIX pthread examples presented in this book. Python language integration with C++ can be achieved using SWIG as well as using Boost Python integration framework, an example of which is presented in Section 6.7. Boost generic im- age processing library is presented in Section 6.8, while Boost parsing framework (SPIRIT) is presented in Section 13.4.1 of Chapter 13 on Compiler systems. Performance optimization of programs using Google perftools memory alloca- tion and profiler is shown in Chapter 7.