KOLUMNE 22 CHIMIA 47 (1993) Nr. 1/2 (Januar/Februar)

COMPUTATIONAL CHEMISTRY COLUMN

Column Editors:
Prof. Dr. J. Weber, University of Geneva
Prof. Dr. H. Huber, University of Basel
Dr. H. P. Weber, Sandoz AG, Basel

Chimia 47 (1993) 22-23
© Neue Schweizerische Chemische Gesellschaft
ISSN 0009-4293

Your Speedup is a Megaflop!

'I just bought a new personal computer, the LAST, with a speed of 50 MHz, whereas your old MADOS has only a 25 MHz frequency.' 'You are joking, the LAST makes only 2 Mflops, whereas mine makes 25 Mips.' 'Never mind, your old scalar processors will not do it for long, I just read about a [...] of the Teraflop generation with a speedup of over 100!'

Not far from reality - perhaps virtual reality. Such a conversation shows how confusing the mixture of abbreviations and misconceptions in the field of new developments in computing is. As many applications in chemical research are real number-crunching problems, the measures of speed of a computer are generally of great importance, e.g. if one has to decide whether it pays to port a program to a certain computer, or whether one should buy a workstation or rather try to get computer time on a so-called supercomputer. These reasons prompted us to publish in previous Columns two comparisons of computer performance with two different typical applications in the field of chemistry, quantum chemical calculations [1] and simulations [2]. In this Column, we discuss some of the measures often used for scalar, vector, and parallel processors and show how carefully one has to deal with them.

One of the most basic comparisons is performed by clock rate. Each computer has a clock, which gives the working frequency of the computer. If we compare two processors of the same type, e.g. the Motorola 68030 found in many Macintoshes, one with a frequency of 25 MHz and one with 40 MHz, then the second one is normally 1.6 times faster. This might not be true for the whole computer, as other components may not be scaled by the same ratio! The clock rate is also a bad measure for a comparison between different processor types, as different processors might need a different number of clock cycles.

Another common measure is the millions of instructions performed per second (Mips). Again this is not a very useful measure for number crunching, as different processors need a different number of instructions to perform a useful calculation, e.g. a multiplication. For example, workstations usually have RISC processors (reduced instruction set computer), which on average need slightly more instructions to perform floating point operations than the 'classical' mainframe processors.

After all, if we are interested in number crunching, why do we not use the number of floating point operations per second as a measure? Indeed, the Mflops (Mega floating point operations per second) are quite a good measure for most comparisons of scalar computers. The problems start with some of the new RISC architectures, as well as with the vector and parallel computers! This is the point where we should perhaps say something about vector and parallel processing for readers who are not familiar with the subject.

Vector processing is perhaps best compared with the work done at an assembly line. Let us assume you have in your program a loop with 20 instructions to be performed and the loop is repeated 100 times with different data. If you have one worker (scalar processor), you give him the first instruction, which he applies to the first number, then the second instruction is applied to the same number (or its result), and so on until the last of the 20 instructions is performed with the first number. Then the whole [...] starts again with the second number etc., i.e. you wait 100*20 or 2000 steps until your loop is finished. For simplicity let us assume now that you have instead of one worker an assembly line with 20 workers (a vector processor of vector length twenty). At the first step the first number enters the assembly line and is handled by the first worker as in a scalar processor. However, in the second step the second worker performs the second instruction on the first number at the same time as the first worker performs the first instruction on the second number, etc. For the first 20 steps no product is leaving your assembly line (no result is leaving your vector processor), but after that you get a product (result) in each step, i.e. after a total of 120 steps your loop is processed! You gain a factor of about 17, i.e. if your scalar processor performs 1 Mflops, your vector processor yields 17 Mflops, which would roughly correspond to the performance advertised by the computer manufacturer. However, any scientific program has instructions other than loops, which do not vectorize, and therefore the efficiency is in practice much lower! Depending on your problem and on the specific architecture of the processor, the computing power might be quite different.

Moreover, believe it or not, by making the program run slower, you can raise the Mflops rate. Just add to your code a large loop which vectorizes excellently but does not produce any useful numbers for your problem, and you will see that your program uses more computer time but shows (due to the additional well vectorized part) a higher Mflops rate. This might look strange to you, but such things happen often when comparing programs based on different algorithms and containing such portions of code which are unknown to the tester. A typical example is a scalar program where some tests (if) are present in a loop, which might tell the computer to perform the loop only if a condition is satisfied. To make the loop vectorize, one usually has to eliminate the test (if), which raises the Mflops rate due to vectorizing, but also makes the program perform unnecessary operations.

As vector processors have probably already passed their climax of success, and parallel or massively parallel processing (MPP) are the key words for the near future, we would like to say also a few words about the speedup measure used in parallel computing. Parallel computing means that your workers are not standing at an assembly line, but do the same operation at the same time on different data (single instruction multiple data; SIMD) or even do different things with different data (multiple instructions multiple data (MIMD); we are not going into more details here). The 'speedup' is the factor you gain in speed if you use n processors instead of one to solve a problem. If you have for example 20 workers (processors) working together, they will usually lose some time for communication, i.e. exchange of their working pieces (data), and therefore reach a 'speedup' of less than 20, let us assume only 12. Such a low number would result if they need to communicate significantly and are often blocking each other's way. Again you could 'cheat' by giving each worker (processor) additional work which is not needed, but which forces him to sit longer at his table and do relatively less communication, making the process slower, but the speedup higher!

Finally, we would like to give an example from the real world of quantum chemistry, where people are not 'cheating' their processors, but nevertheless similar effects can be found. Lüthi et al. [3] reported results from a calculation on a Cray Y-MP/8-128 supercomputer with the DISCO program for bis(2,6-dimethylphenyl) carbonate (C17O3H18), a molecule with 38 atoms (314 contracted / 610 primitive basis functions), in which one iteration took about 400 s (this number is different from the one in [3], which was wrong, due to an input error [4]) and a performance of 1531 Mflops was achieved. The speedup for 8 processors was 7.65. Brode [5] has carried out a very similar calculation on the same molecule (356 contracted / 592 primitive basis functions) with the TURBOMOLE program on a workstation cluster of 14 machines performing to a maximum rate of 660 Mflops. The speedup was only 11.6. Although the loss in parallelization was higher and the Mflops rate was much smaller (the formal rate for the 8 Cray processors would even be 2660 Mflops), the time for the first iteration (usually taking most time) was only 524 s, i.e. slightly more than with DISCO. Similar experiences have been made by Vogel et al. [6] with a version of DISCO on a network of workstations.

Summarizing, we can state that the Mflops and the speedup measure are often not very useful criteria for the real world of vector and parallel computers, but that a comparison between programs solving problems as similar as possible is the best way to estimate the performance of computers for a specific task. Policies such as that of CSCS in Manno, enforcing that only programs yielding 275 Mflops should run on the NEC, are questionable in view of the difficulty of accurately estimating the efficiency of application programs running on vector processors. An alternative policy would be to supply a supplementary national cluster of workstations or a super parallel computer for codes performing badly on a [...], making the scientists choose the most efficient facility for their purpose.

We thank Dr. Stefan Brode, BASF Aktiengesellschaft, Ludwigshafen/Rhein, and Dr. Hans Peter Lüthi, Interdisziplinäres Projektzentrum für Supercomputing, ETH, Zürich, for the example.

[1] Th. Bally, P.-A. Carrupt, J. Weber, Chimia 1991, 45, 352.
[2] R. Eggenberger, H. Huber, Chimia 1992, 46, 227.
[3] H.P. Lüthi, J.E. Mertz, M.W. Feyereisen, J.E. Almlöf, J. Comput. Chem. 1992, 13, 160.
[4] H.P. Lüthi, private communication.
[5] S. Brode, private communication.
[6] S. Vogel, J. Hutter, T.H. Fischer, H.P. Lüthi, Int. J. Quantum Chem., accepted for publication.
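The step counts of the assembly-line picture and the effects of 'cheating' described in this Column can be checked with a few lines of Python. This is only a sketch under stated assumptions: the function names are ours, the pipeline model is the idealised one from the text (one worker per instruction, one step per instruction), and the 100 Mflop of useful work and the 1700 Mflop of padding are invented purely for illustration. The idealised model gives 119 steps, which the text rounds to 120.

```python
def scalar_steps(n_instructions, n_items):
    """Steps for one worker who applies every instruction to every number."""
    return n_instructions * n_items


def pipeline_steps(n_instructions, n_items):
    """Steps for an assembly line with one worker per instruction.

    The first result leaves the line after n_instructions steps; every
    further step delivers one more result.
    """
    return n_instructions + (n_items - 1)


def mflops_rate(work_mflop, seconds):
    """Average rate in Mflops (millions of floating point operations per second)."""
    return work_mflop / seconds


def parallel_efficiency(speedup, n_processors):
    """Fraction of the ideal speedup (n_processors) that is actually reached."""
    return speedup / n_processors


# Loop from the text: 20 instructions, repeated 100 times.
scalar = scalar_steps(20, 100)      # 2000 steps
vector = pipeline_steps(20, 100)    # 119 steps (about 120 in the text)
gain = scalar / vector              # roughly 17

# Hypothetical program: 100 Mflop of useful work that does not vectorize,
# executed at the scalar rate of 1 Mflops (rates taken from the text).
useful_work, scalar_rate, vector_rate = 100.0, 1.0, 17.0
honest_time = useful_work / scalar_rate
honest_rate = mflops_rate(useful_work, honest_time)

# Same program padded with 1700 Mflop of useless but well vectorized work.
padding = 1700.0
padded_time = honest_time + padding / vector_rate
padded_rate = mflops_rate(useful_work + padding, padded_time)

# Speedups quoted in the text: 7.65 on 8 Cray processors,
# 11.6 on a cluster of 14 workstations.
cray_efficiency = parallel_efficiency(7.65, 8)
cluster_efficiency = parallel_efficiency(11.6, 14)

print(f"scalar {scalar} steps, vector {vector} steps, gain {gain:.1f}")
print(f"honest: {honest_time:.0f} s at {honest_rate:.1f} Mflops")
print(f"padded: {padded_time:.0f} s at {padded_rate:.1f} Mflops")
print(f"efficiency: Cray {cray_efficiency:.2f}, cluster {cluster_efficiency:.2f}")
```

With these assumed numbers the padded program needs twice the computer time but reports nine times the Mflops rate, although it produces exactly the same useful results.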

Chimia 47 (1993) 23-24
© Neue Schweizerische Chemische Gesellschaft
ISSN 0009-4293

How Computer Science is Taught to Our Students in Chemistry. Part II

Heiner G. Bührer*

In a recent survey [1], the present state of the compulsory introduction courses in computer science taught in different Swiss universities and federal institutes of technology was compared with a report called 'Recommendation for the Introduction of Computer Science in the Chemistry Curriculum' [2]. The survey did not take into account the seven chemistry departments of the Swiss Schools of Engineering (Ingenieurschulen, HTL), which confer about one third of all diplomas in chemistry in Switzerland [3]. In the following Table and list of contents this gap has been filled. Some conclusions can be drawn with regard to [1]:

- All Schools of Engineering offer their chemistry students compulsory courses in the first two semesters. Most of them propose advanced courses in higher semesters. As many schools emphasise chemical engineering, automation and electronics are additional elements of training. The School of Engineering at Geneva is different inasmuch as its curriculum lasts five years and is meant for chemical engineering students only.
- The total number of hours per week varies among the different Schools of Engineering. Most of them devote a significant part of the time to computer applications in the laboratory or pilot plant (simulations, data comprehension, processing and modelling), which appears only partially in the Table.
- A first introduction to computer architecture and programming is often given by a computer specialist or mathematician. Chemical applications are usually taught by a chemist.
- As is the case with university students, PASCAL is here too the preferred programming language. However, many Schools of Engineering tend to reduce the number of lessons devoted to programming for the benefit of computer applications in chemistry.

In general, the differences between the university students' results [1] and those of students at Schools of Engineering are small. There may be less time for programming for the latter but more for chemical and technical applications; this reflects the difference in interest and requirements of the two groups of chemists.

*Correspondence: Prof. H.G. Bührer, Technikum Winterthur Ingenieurschule TWI, Postfach 805, CH-8401 Winterthur