Hands-on document for Code Reuse

Matthieu J. Verstraete (1,2)
(1) Physics Dept., University of York, Heslington, YO10 5DD, United Kingdom
(2) European Theoretical Spectroscopy Facility (ETSF)

This document describes and accompanies the hands-on session on code reuse, part of the CECAM workshop on development and maintenance of software (February 2008). The aim of the hands-on is to become familiar with the concepts and practice of modularizing and reusing code efficiently and systematically.

1 Introduction

This short hands-on session deals with the basics of code reuse, i.e. the encapsulation of source code in subroutines, modules, or libraries, so that it can be used efficiently several times in a given program without copying code or duplicating features. The examples used are very simple, and students are asked to resist immediately writing modularized code (as their instincts should rightly push them to!), in order to see the evolution of the code structure over the whole exercise. Any coding language may be used, as long as the hands-on-er masters it sufficiently well to use loops and basic maths, and can call linear algebra functions from it (C or Fortran are recommended). In the following I will refer to the F90 source, but a C translation is provided as well.

2 Concepts

This section reviews very briefly some of the concepts presented in the lecture on code reuse. This is only a refresher, and not much detail will be provided. Most users with some programming experience will find much of this familiar (if not tedious!).

Code reuse: copying or calling a single portion of source code several times within one program or between independent programs. This avoids duplication of effort, allows features to be debugged or implemented only once, improves maintainability, and gives the code overall structure, simplifying its conceptual structure as well.

Module: the general definition of a module is a portion of source code (a subroutine, a function, or a collection of these, possibly together with data) which has some degree of independence.

Interface/API: the module interface is the calling sequence (arguments etc.) of the subroutines which are callable by the outside world, together with the data which is directly accessible to it. It is very important that this interface be as clear and general as possible: the more generic it is, the easier the module will be to debug and, especially, to reuse in other circumstances or by someone else.

Module independence: there are a number of criteria defining how independent a module is. The simplest is whether it can be compiled alone; this is physical modularity. Other criteria include conceptual modularity: whether all routines and data related to a given topic or concept are bundled together. On the more purely software side, modularity can be defined by the small size of the module, or by the minimal interface it has with its environment (the calling program). These different criteria can in general not all be maximized simultaneously, and one must choose, as a function of the specific needs and usage of the module, to favor the most appropriate one.
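As a concrete (and deliberately trivial) F90 illustration of the last three points, here is a minimal sketch, not taken from the exercise code and with purely illustrative names (stats_m, mean, total): the module compiles on its own (physical modularity), exposes a single public routine as its interface, and keeps its helper private.

module stats_m
  implicit none
  private                                  ! hide everything by default
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
  public :: mean                           ! the interface/API: one public routine

contains

  ! Public routine: arithmetic mean of a 1D array.
  function mean(x) result(m)
    real(dp), intent(in) :: x(:)
    real(dp) :: m
    m = total(x) / real(size(x), dp)
  end function mean

  ! Private helper: invisible to calling programs, so its implementation can
  ! be changed freely without touching any code that uses the module.
  function total(x) result(s)
    real(dp), intent(in) :: x(:)
    real(dp) :: s
    s = sum(x)
  end function total

end module stats_m

A calling program only ever sees mean; everything behind it can be rewritten or optimized without any change on the caller's side, which is exactly what a clear and minimal interface buys you.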
Module interdependence: in a program, different modules will necessarily interact with each other. This interaction can be implemented in good or bad ways, depending on how modular the code remains, how easy it is to maintain, expand or improve, and so on. The coupling between modules can occur through the passing of data in a function call (always necessary, and well controlled in principle), through the use of common data (external to both of the modules that use it), or through the passing of flags for the conditional execution of different functions inside a module (which requires knowledge of its internals), known as control coupling. Two further forms of interdependence are usually worse for code reusability, but sometimes unavoidable: external coupling, which requires the use of the internals of another module (the calling code needs to know about module A, which itself needs to know about module B to function correctly), and content coupling, which is the spreading of conceptually related objects between different modules (e.g. the routine for the initialization of an object is in module A whereas the routine to write it is in module B, or, even worse, the routine to destroy it!).

Libraries: libraries are packaged modules or sets of modules on related topics, usually precompiled and/or optimized. Ideally they can be used as black boxes; they represent an optimum in code reuse, being totally external to the program being written.

3 Physical system and motivation

A first program is provided, with a source file called tbchain_f.F90, which calculates a few quantities for a tight-binding chain of electrons in periodic boundary conditions (see Fig. 1).

Figure 1: Cartoon of a 6-site tight-binding chain with periodic boundary conditions.

The parameters are quite straightforward: the onsite energy for an electron on a site (atom, orbital, whatever), a hopping integral β for passing between neighboring sites, and the number of sites, which is 6 in the figure.

In the program, the Hamiltonian is set up in matrix form, and a candidate eigenvector (a simple Bloch vector with a given wavevector) is tested. The commutator of the Hamiltonian and the position operator is then evaluated, as is its effect on the eigenvector. Physically, one expects the commutator to be proportional to the momentum operator:

[H, x] ∝ p    (1)

This is indeed the case here, as can be seen by plotting the result:

[H, x] φ = α ∂φ/∂x    (2)

with a constant factor α. The hands-on-er should compile the program, run it, and use xmgrace or gnuplot to plot the data saved in the files eigvec1.dat and commuteigvec1.dat, for φ and [H, x]φ respectively (note: the plot must be rescaled to see the data in the central, interesting region).

4 Back to reality

This is, of course, just a pretext. The aim of this tutorial is to modularize the program, which is initially written as a single block of code. The students should

1. examine the program and make sure they understand it,
2. identify the parts which can be profitably reused,
3. encapsulate them in subroutines and in a separate module,
4. and finally isolate a minimal main program.

In a word: clean my code (one possible target structure is sketched below).
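To fix ideas, here is a hedged F90 sketch of the kind of structure one might aim for. The names (tb_chain_m, build_hamiltonian, apply_matrix) are illustrative and do not correspond to those in the provided tbchain_f.F90, and the sketch covers only part of what the real program does (no position operator, no commutator, no output files).

module tb_chain_m
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)
  public :: build_hamiltonian, apply_matrix

contains

  ! Tight-binding Hamiltonian of an n-site periodic chain: onsite energy eps
  ! on the diagonal, hopping beta between nearest neighbours, with wrap-around
  ! between sites n and 1 (periodic boundary conditions).
  subroutine build_hamiltonian(n, eps, beta, h)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: eps, beta
    real(dp), intent(out) :: h(n, n)
    integer :: i, inext
    h = 0.0_dp
    do i = 1, n
      inext = modulo(i, n) + 1      ! right neighbour, wrapping n -> 1
      h(i, i) = eps
      h(i, inext) = beta
      h(inext, i) = beta
    end do
  end subroutine build_hamiltonian

  ! y = A*x: a real matrix acting on a complex vector, written as plain loops
  ! on purpose: this is the sort of routine a BLAS call can later replace.
  subroutine apply_matrix(n, a, x, y)
    integer,     intent(in)  :: n
    real(dp),    intent(in)  :: a(n, n)
    complex(dp), intent(in)  :: x(n)
    complex(dp), intent(out) :: y(n)
    integer :: i, j
    y = (0.0_dp, 0.0_dp)
    do j = 1, n
      do i = 1, n
        y(i) = y(i) + a(i, j) * x(j)
      end do
    end do
  end subroutine apply_matrix

end module tb_chain_m

! With the physics gathered in the module, the main program reduces to
! parameter setup plus a handful of calls.
program tbchain_modular
  use tb_chain_m, only : dp, build_hamiltonian, apply_matrix
  implicit none
  integer,  parameter :: nsite = 6
  real(dp), parameter :: eps = 0.0_dp, beta = -1.0_dp
  real(dp), parameter :: pi = 3.14159265358979_dp
  real(dp)    :: h(nsite, nsite)
  complex(dp) :: phi(nsite), hphi(nsite)
  integer :: i

  call build_hamiltonian(nsite, eps, beta, h)
  ! Candidate eigenvector: a Bloch wave exp(i k x_i) with k = 2*pi/nsite.
  do i = 1, nsite
    phi(i) = exp(cmplx(0.0_dp, 2.0_dp*pi*real(i-1, dp)/real(nsite, dp), dp))
  end do
  call apply_matrix(nsite, h, phi, hphi)
  write(*, '(a,2f12.6)') ' first component of H*phi: ', hphi(1)
end program tbchain_modular

Whether a routine like apply_matrix belongs in the same module as the Hamiltonian builder, or in a separate linear-algebra module, is exactly the kind of question Section 5 asks you to consider.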
5 Questions about the resulting programs and modules

The following questions should be considered during the hands-on and once the programming part is finished:

• How much of the program can be modularized?
• How much of the program can be usefully modularized?
• Are there (dis)advantages to putting code into a module if it is used only once?
• What are the relations between the different modules (in the general sense of subroutines, functions, and Fortran .mod files) that you have created?
• Can you further reduce the coupling between modules, or make several isolated sub-modules?
• How much of this could be reused in a completely different context/program?

6 Extra credit: Use library calls

• Use linear algebra libraries (BLAS) to replace the most time-consuming subroutines (a possible sketch is given at the end of this section).
• Study the scaling of the plain and BLAS-based routines. On a single processor, running with up to 1000 sites takes less than a minute in either case; for larger systems, it is up to your skill.
• What is the relation between this case study and the modern theory of polarization?

My results for the scaling of the computing time with respect to the number of sites are shown in Fig. 2. The BLAS call is always faster, and up to several thousand sites it even scales better than O(N³), which should be the limiting behavior.

Figure 2: Performance of the TB routine with and without the use of BLAS linear algebra for the matrix operations.
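To indicate what the first bullet above has in mind, here is a hedged sketch which assumes that the dominant O(N³) cost is building the commutator matrix C = H*X - X*H from the Hamiltonian H and the position operator X; the routine name commutator_blas is purely illustrative, and the triple loops it would replace are the ones in your own modularized code. Two calls to the standard BLAS routine DGEMM (which computes C := alpha*A*B + beta*C) do the job; link with -lblas or a vendor equivalent.

! Form C = H*X - X*H with two DGEMM calls instead of hand-written loops.
subroutine commutator_blas(n, h, x, c)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer,  intent(in)  :: n
  real(dp), intent(in)  :: h(n, n), x(n, n)
  real(dp), intent(out) :: c(n, n)
  external dgemm

  ! C := 1.0 * H * X + 0.0 * C
  call dgemm('N', 'N', n, n, n, 1.0_dp, h, n, x, n, 0.0_dp, c, n)
  ! C := -1.0 * X * H + 1.0 * C, so that C now holds H*X - X*H
  call dgemm('N', 'N', n, n, n, -1.0_dp, x, n, h, n, 1.0_dp, c, n)
end subroutine commutator_blas

If your matrices are complex, the corresponding routine is ZGEMM; for the matrix-vector products (applying H or the commutator to φ), the level-2 routine DGEMV (or ZGEMV) plays the same role.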