Matrix Computations in Basic on a Microcomputer*
NICHOLAS J. HIGHAM, GIMA
Department of Mathematics, University of Manchester

* The views expressed in this paper are not necessarily those of the Institute.

We consider the efficient implementation of matrix computations in standard interpreted Basic on a microcomputer. Linear equations routines SGEFA and SGESL from the LINPACK library of Fortran programs are translated into Basic and run on four microcomputers: the Commodore 64, the Amstrad CPC 464, the BBC Microcomputer and the BBC with a Z-80 second processor. The computational cost of the routines is found to be dominated by subscripting calculations rather than by floating point arithmetic. For the BBC Microcomputer and the Commodore 64, the BLAS routines which constitute the inner loops of SGEFA and SGESL are coded in assembly language; speed increases of factors 2.8 (BBC) and 5.3 (Commodore 64) accrue, and the improved execution times are comparable to ones which have been quoted for the more powerful and expensive IBM PC running under a Fortran compiler. The computational cost of the routines using coded BLAS is found to be dominated by floating point arithmetic, subscripting calculations and other overheads having been reduced to a negligible level, and it is concluded that these hybrid Basic/assembly language routines extract near optimum performance from their host machines. Our findings are shown to be applicable to any matrix routine whose computational cost can be measured in "flops."

1. Introduction

Stewart1 describes his experiences in implementing a linear equations solver on three hand-held calculators. His routine for the Hewlett Packard HP-41C, coded in the machine's low level programming language, solved a system of linear equations of order 10 in 250 seconds. Dongarra2 gives a list of the times taken by various micro-, mini- and mainframe computers to solve a linear system of order 100 using standard linear equations software written in Fortran. The timings include one for the IBM PC microcomputer: this machine solved the 100 x 100 problem in 20 minutes.

For several years I have used in my research the Commodore Pet and Commodore 64 microcomputers,3,4,5 which in terms of cost and computing power lie between the hand-held calculators and the more powerful microcomputers such as the IBM PC. Unlike the calculators used by Stewart,1 my microcomputers run a high level programming language, Basic, but they are not equipped to run Fortran, the language of choice for scientific computation on large computers. Consideration of the papers of Stewart and Dongarra led me to ask the following questions.

(1.1) How should algorithms for matrix computations be implemented on a microcomputer in order to make the best possible use of the machine's processing power, if Basic is the only available high-level language?

(1.2) What will be the dominant computational costs in implementations that answer question (1.1)?

(1.3) How can one make use of the rich supply of high quality Fortran software when coding algorithms in Basic?

These questions are investigated in this paper.

In this work we experimented with four microcomputers: the Commodore 64, the Amstrad CPC 464, the standard BBC Microcomputer and the BBC with a Torch Z-80 second processor (we shall regard the last two configurations as different machines). All the machines were used in their standard interpreted Basic programming environment (in the case of the BBC with Z-80 second processor we used the Z-80 version of BBC Basic, "BBCBASIC Z80"); in addition the Commodore 64 was used with the Basic-related Comal programming language. For details of Basic the reader is recommended to consult Lientz,6 Alcock,7 ANSI,8 Brown,9 Genz and Hopkins,10 and Kemeny and Kurtz.11 Good references for Comal are Atherton12 and Lindsay.13 The technical specifications of the four machines and of their particular language implementations are described in Higham.14

Here we define two terms that will be used frequently in the following sections. Machine code (or machine language) is the collection of instructions that a microprocessor recognises and can execute as fundamental operations. To the microprocessor, a machine code instruction is simply a binary bit pattern that specifies an action to be performed. Assembly language is a low level language bearing a one to one relationship to machine code; it allows the use of mnemonics to refer to machine code instructions, and symbolic names (or labels) to refer to numeric values and addresses. The translation from assembly language to machine code is carried out by an assembler. Programming in assembly language is easier, less prone to error, and much less tedious than programming in machine code.
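The relationship between the two levels can be seen directly on one of our test machines, since BBC Basic contains a built-in 6502 assembler, entered with "[" and left with "]". The following fragment is a minimal illustrative sketch, not code from our routines: it assembles a trivial two-instruction routine into memory and then executes it with CALL. (Commodore 64 Basic has no built-in assembler; there a separate assembler must be used.)

    10 REM BBC Basic sketch: assemble and run a trivial machine code routine
    20 DIM code% 20       : REM reserve 20 bytes of memory for the code
    30 P% = code%         : REM P% holds the address at which to assemble
    40 [ OPT 2            \ "[" enters the built-in 6502 assembler
    50 LDA #0             \ this mnemonic assembles to the two bytes &A9 &00
    60 RTS                \ return; assembles to the single byte &60
    70 ]
    80 CALL code%         : REM execute the freshly assembled machine code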
In sections 2 and 3 we describe the development of efficient hybrid Basic/assembly language translations of two standard Fortran subroutines for solving systems of linear equations. Section 4 presents and analyses the results of timing experiments carried out on the four test machines using the hybrid routines and, for comparison, the equivalent purely Basic versions. In section 5 we introduce a set of benchmarks for interpreted Basics and apply them to the four test machines. The results obtained are used to gain insight into the results of section 4. Finally, in section 6 we summarise our findings in relation to questions (1.1), (1.2) and (1.3).

The view taken in this work is that one wishes to use the fastest and most accurate special-purpose algorithms available for solving on a microcomputer the problem at hand (cf. K. Stewart15). This is the view that is naturally taken by a numerical analysis researcher who uses a microcomputer as a more convenient, easy-to-use substitute for a mainframe computer. An alternative approach, taken by Nash,16,17 is to develop compact, versatile routines for small computers that are easy to implement and to maintain, and that can be used to solve a variety of computational problems; some loss of efficiency is accepted in return for the economies achieved. We believe that our findings concerning the efficiency of interpreted Basic programs could usefully be employed in enhancing the efficiency of the compact routines, such as those in Nash,17 albeit with loss of machine independence.

2. Translating two LINPACK subroutines into Basic

To investigate questions (1.1), (1.2) and (1.3), and to enable us to compare our experiments with those of Stewart and Dongarra, we decided to focus on the problem of solving a system of linear equations, probably the most fundamental and widely occurring problem in numerical linear algebra. We took as our starting point the routines SGEFA and SGESL in the LINPACK library of Fortran programs for analysing and solving linear systems.18 SGEFA performs LU factorisation of a matrix A, using a column oriented version of Gaussian elimination with partial pivoting, and SGESL uses the factorisation to solve a linear system Ax = b (reference 18, Ch. 1).

Consider the following outline of the factorisation algorithm used by SGEFA. Here A = (a_ij) is an n x n real matrix.

Algorithm 2.1

For k = 1, ..., n-1
    (2.1) Find the smallest r >= k such that
              |a_rk| = max{ |a_ik| : i = k, ..., n }
          Swap a_kk and a_rk
    (2.2) For i = k+1, ..., n
              m_ik = -a_ik / a_kk
          Endfor i
    For j = k+1, ..., n
        Swap a_kj and a_rj
        (2.3) For i = k+1, ..., n
                  a_ij = a_ij + m_ik * a_kj
              Endfor i
    Endfor j
Endfor k.

In the Fortran code SGEFA the loops (2.2) and (2.3), and the search (2.1), are executed by the Basic Linear Algebra Subprograms (BLAS).
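To fix ideas, here is one possible direct rendering of Algorithm 2.1 in line-numbered interpreted Basic. It is an illustrative, untuned sketch with variable names of our own choosing, and it omits the singularity test that SGEFA performs; in LINPACK terms the search (2.1), the multiplier loop (2.2) and the update loop (2.3) are the work done by the BLAS routines ISAMAX, SSCAL and SAXPY respectively.

    100 REM Illustrative sketch of Algorithm 2.1: column oriented
    110 REM Gaussian elimination with partial pivoting on A(1..N,1..N).
    120 REM SGEFA's test for a zero pivot A(K,K) is omitted here.
    130 N = 4 : DIM A(N,N), M(N)
    140 FOR I = 1 TO N : FOR J = 1 TO N : A(I,J) = RND(1) : NEXT J : NEXT I
    150 FOR K = 1 TO N-1
    160 REM (2.1) find the pivot row R (the BLAS search ISAMAX)
    170 R = K
    180 FOR I = K+1 TO N
    190 IF ABS(A(I,K)) > ABS(A(R,K)) THEN R = I
    200 NEXT I
    210 T = A(K,K) : A(K,K) = A(R,K) : A(R,K) = T
    220 REM (2.2) form the multipliers (the BLAS scaling SSCAL)
    230 FOR I = K+1 TO N
    240 M(I) = -A(I,K) / A(K,K)
    250 NEXT I
    260 FOR J = K+1 TO N
    270 T = A(K,J) : A(K,J) = A(R,J) : A(R,J) = T
    280 REM (2.3) update column J (the BLAS inner loop SAXPY)
    290 FOR I = K+1 TO N
    300 A(I,J) = A(I,J) + M(I) * A(K,J)
    310 NEXT I
    320 NEXT J
    330 NEXT K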
One approach to translating SGEFA and SGESL into Basic, then, is simply to translate the BLAS as well, expanding the BLAS calls in-line, as is done in some Fortran implementations of LINPACK (references 20 and 18, p. 1.23). An alternative approach is to write the BLAS in assembly language; the BLAS calls can then be replaced by machine-specific Basic statements that pass control to the specially written machine code routines. This approach promises increased efficiency on two counts: machine code generally runs much faster than interpreted Basic code, and the bulk of the computation in SGEFA is done inside the BLAS. In fact it is true for most of the LINPACK routines that if the total number of assignments, array element references and floating point additions and multiplications is O(n^q) (q = 2, 3), then only O(n^(q-1)) of these operations are performed outside the BLAS. For SGEFA, for example, q = 3: the elimination requires O(n^3) operations in total, of which only O(n^2) fall outside the BLAS.

We have tried both approaches towards translating the BLAS. In section 4 we compare the performances of programs based on the two approaches. But first, in the next section, we examine in detail the theoretical and practical aspects of coding the BLAS in assembly language for use with a Basic interpreter on a microcomputer.

3. Assembly language BLAS

3.1. Theoretical gains in efficiency

Before describing the details of coding the BLAS in assembly language we first consider what we can hope to achieve by using these special BLAS with an interpreted Basic.

One of the characteristics of the 6502 and Z-80 central processing units (CPUs) of our test machines is that their instruction sets do not contain a multiply operation; therefore all four machines must carry out floating point arithmetic in software. The four Basic interpreters contain individual collections of floating point arithmetic subroutines and, under the reasonable assumption that these routines are efficiently coded, it is sensible to attempt to make use of these routines in the assembly language BLAS. In addition to simplifying the program