
TN-91-6
June 1991
(TN)

The Comparison and Selection of Programming Languages for High Energy Physics Applications

Bebo White
Stanford Linear Accelerator Center
P.O. Box 4349, Bin 97
Stanford, California 94309 USA

This paper discusses the issues surrounding the comparison and selection of a programming language to be used in high energy physics applications. The evaluation method used was specifically devised to address the issues of particular importance to high energy physics (HEP) applications, not just the technical features of the languages considered. The method assumes a knowledge of the requirements of current HEP applications, the data-processing environments expected to support these applications and relevant non-technical issues. The languages evaluated were Ada, C, FORTRAN 77, FORTRAN 90 (formerly 8X), Pascal and PL/I. Particular emphasis is placed upon the past, present and anticipated future role of FORTRAN in HEP software applications. Upon examination of the technical and practical issues, conclusions are reached and some recommendations are made regarding the role of FORTRAN and other programming languages in the current and future development of HEP software.

I. Introduction

The programming language to be used for any software application is a critical determinant of the speed of software development, the ease of software maintenance and the portability of software to other systems. Many language comparisons have appeared in computer science and programming literature. A large portion of these comparisons have been conducted on the languages in situ. Little, it seems, has been written about how languages should be evaluated and assessed with respect to specific software projects.

Physicists have long been recognized as among the most knowledgeable of natural scientists with respect to the technical aspects of computing. Advancements in high energy physics have been driven by advancements in computer hardware and software technology and vice versa. There is significant cross-over of physicists into computer-related tasks. Physicists and developers of physics software generally have the expertise to make a choice of development programming languages.

This paper describes an exercise in which a number of languages were evaluated specifically for high energy physics (HEP) applications. The method devised for this evaluation is based upon a knowledge of the requirements of the applications involved, the data-processing environments expected to support those applications and additional surrounding technical and non-technical issues. This method, coincidentally, closely parallels the feasibility and requirements analysis common to many software engineering methodologies. The goals of this study were

• to systematically evaluate candidate programming languages specifically for high energy physics applications; and
• to evaluate the role of programming languages in future HEP computing environments.

Contributed to the International Workshop on , and Expert System for High Energy and Nuclear Physics, Lyon Villeurbanne, France, March 19-24, 1990. Presented at the Conference on Computing in High-Energy Physics, Oxford, April 1989.

II. Methodology

In order to perform a meaningful language comparison, it was necessary to define specific language evaluation elements. These evaluation elements define the language comparison environment and ensure that only the meaningful features of the candidate languages are considered. A concise definition of these elements should reduce the possibility of “programming language bigotry” and the comparison of irrelevant language features. For this study, the following elements were identified:

• Technical specification of a generic HEP programming application;
• Identification of candidate programming language features necessary to satisfy these specifications;
• Candidate “fit” in HEP software environments;
• Anticipated growth and future development of the candidate programming language.

III. Identification and Comparison of Relevant Language Features

High energy physics data reduction and analysis programs provide good examples of highly numerical-intensive, batch-oriented scientific programming applications. An analysis of author-solicited, “typical” off-line programs used at SLAC and CERN was used to compile a list of common processes required by such applications. This list included

• input/output of binary files
• 64-bit floating-point arithmetic
• operations using complex data types
• vector and matrix arithmetics
• data structure manipulation
• access of common blocks of data
• direct addressing of dimensioned variables (as distinct from matrix operations and including non-zero lower bounds for arrays)
• separate or independent compilation of subprograms and cross-module checking
• subprogram parameter passing by value and reference and the ability to pass routines as parameters
• access to libraries of mathematical functions
• access to histogramming services
• access to graphics services
• extended file or database services (e.g., particle tables).
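Two of the processes listed above, arrays with non-zero lower bounds and routines passed as parameters, can be sketched in C, one of the candidate languages. This is only an illustration under stated assumptions: the names `BoundedArray`, `ba_at`, `ba_map` and `square` are hypothetical and do not come from the programs surveyed.

```c
#include <assert.h>

/* Hypothetical descriptor for a 1-D array with an arbitrary lower bound,
   comparable to the FORTRAN 77 declaration  DOUBLE PRECISION A(-3:3).
   C arrays are always zero-based, so the bound is carried explicitly. */
typedef struct {
    double *data;   /* zero-based storage supplied by the caller */
    int     lo;     /* lowest valid index  */
    int     hi;     /* highest valid index */
} BoundedArray;

/* Return a pointer to element i, checking the declared bounds. */
static double *ba_at(const BoundedArray *a, int i)
{
    assert(i >= a->lo && i <= a->hi);
    return &a->data[i - a->lo];
}

/* A routine received as a parameter: apply f to every element in place. */
static void ba_map(BoundedArray *a, double (*f)(double))
{
    for (int i = a->lo; i <= a->hi; ++i)
        *ba_at(a, i) = f(*ba_at(a, i));
}

/* Example routine that can be handed to ba_map. */
static double square(double x) { return x * x; }
```

A caller would wrap a seven-element buffer as `BoundedArray a = { store, -3, 3 };` and invoke `ba_map(&a, square);`. The index arithmetic that FORTRAN performs implicitly has to be written by hand in C, which is the kind of difference the feature comparison is meant to expose.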

This list of processes was used to identify the candidate programming languages to be evaluated in the study. It was expected that the candidate languages would either have features which would map directly to the processes on the list or would have access to toolsets which would supplement their functionality. It was also decided that the candidate languages would be well-established (i.e., standardized to some degree), scientific and algorithmic. The languages which met these criteria were Ada, C, FORTRAN 77, FORTRAN 90 (formerly 8X), Pascal and PL/I.

The next step was to derive a “wishlist” of language features from the list of typical HEP software processes. Specific candidate language features were then compared against this list. The result is illustrated in Table 1, “The Detailed Language Features Comparison.” References in this table provide clarification and indicate alternatives to deprecated features. In this way each candidate language can be realistically evaluated in terms of its standard and non-standard features, implementation-dependent features and extensions and toolsets commonly available in HEP computing environments.

* Ada has the broadest base of features of the languages being discussed. It has an extensive range of array definitions and permits subprograms to use arbitrarily sized arrays. Ada does not permit the user to define the internal representation needed for multi-dimensional arrays. Intrinsic array operations are not defined within the language and are viewed as an extension in an Ada package if required. Ada offers extended capabilities in the area of bit manipulation on the order of enumerated types. Real data types are defined within the language and support for extended precision and complex arithmetic can be provided through packages in a reasonable way, since overloading of operators allows the mathematical syntax to be preserved. However, Ada does not recognize the need for at least two (single and extended precision) structurally different floating point arithmetics. Ada was obviously designed with conscious emphasis on sound software engineering. It has the best ability of the candidate languages to hide data and routines. It is especially strong in demanding a clarity of exposition of the data and routines which may be imported to, and exported from, any program unit by providing cross-module checking. Ada provides a wide range of I/O facilities through a set of predefined packages, such as DIRECT_IO, TEXT_IO, or LOW_LEVEL_IO. The with and use constructs can be used for computation.

* C is a language with functionality similar to Pascal, but it is much less strict in the consistency checking of program units. LINT must be used to achieve the kind of cross-module checking that Ada and Pascal offer. Data hiding is available at the same level as Ada. C defines I/O through a standard library (stdio). C defines real and double precision data types in the language, and support for complex arithmetic can be provided using the typedef and struct facilities, but without retaining the mathematical syntax. C manages fairly well in the area of bit manipulation. Bit fields of variable widths can be incorporated into user-defined structures, and a pointer type can be used to locate the structure at a given address in memory. Sets are not defined. Enumerated types are not defined in Kernighan and Ritchie but are in the ANSI draft standard. The language provides a very useful preprocessor for performing such operations as macro definition and library inclusion.

* FORTRAN 77 provides reasonable support for array handling and manipulation. It is the only one of the candidate languages which defines real, double precision and complex data types as part of the language. FORTRAN 77 contains a very clear definition for formatted and unformatted I/O to terminal, printer and mass storage. However, it is clear that FORTRAN 77 lacks a great deal of the functionality that the other languages offer, and it provides little type checking and no cross-module checking. There is no data hiding beyond local variables. In realistic HEP applications, FORTRAN 77 must be supplemented by additional software tools. At CERN, the whole concept of ZEBRA is a manifestation of one of FORTRAN 77’s needs (i.e., data structure manipulation). FORTRAN 90 has directly addressed many of these deficiencies of FORTRAN 77.

* Pascal, as implemented by both DEC and IBM, is very similar to Ada from the standpoint of computing itself. It offers the ability to hide data and code at the same level and cross-module checking. It does not offer multi-programming as part of the language. Pascal provides rather simplistic I/O facilities in the Jensen and Wirth definition which are not as comprehensive as those of FORTRAN 77 and do not provide for unformatted (binary) data. Extended I/O facilities are common in implementation-dependent compilers. Extensions in the area of array manipulation alleviate many of the problems which exist in the Jensen and Wirth definition. Bit manipulation is possible in an arcane, poorly documented manner. Real data types are defined in the language, but double precision and complex must be emulated in user-defined data structures.

* PL/I is a very rich language in terms of its constructs, nearly on a par with Ada. It offers data hiding at the same level as Pascal. Like Ada, PL/I offers a language feature for exception handling. It offers rather little, however, in terms of type checking and no cross-module checking. According to the language definition, concurrent computation is part of the language. However, this feature was not available in the IBM and DEC compilers used. Complex and double precision data types are defined as part of the language. Bit functions are also supported. Sets and enumerated types are not defined in the language. PL/I does provide a very powerful preprocessor which provides such capabilities as including text from an external library, conditional compilation of sections of the source program, macro development and variable name replacement.
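The C entry above notes that complex arithmetic can be built from typedef and struct "but without retaining the mathematical syntax", and that bit fields can be placed in user-defined structures. A minimal sketch of both points follows. All type, function and field names (`Complex`, `c_add`, `c_mul`, `quad_step`, `ChannelWord`, `make_channel_word`) and the field widths are invented for illustration.

```c
#include <assert.h>

/* Hypothetical complex type built from struct and typedef. */
typedef struct { double re, im; } Complex;

static Complex c_add(Complex a, Complex b)
{
    Complex r = { a.re + b.re, a.im + b.im };
    return r;
}

static Complex c_mul(Complex a, Complex b)
{
    Complex r = { a.re * b.re - a.im * b.im,
                  a.re * b.im + a.im * b.re };
    return r;
}

/* The expression z*z + c has to be spelled as nested calls, because C
   has no operator overloading; FORTRAN would accept Z*Z + C directly. */
static Complex quad_step(Complex z, Complex c)
{
    return c_add(c_mul(z, z), c);
}

/* Bit fields of different widths inside a user-defined structure. */
typedef struct {
    unsigned int crate   : 4;
    unsigned int channel : 8;
    unsigned int valid   : 1;
} ChannelWord;

static ChannelWord make_channel_word(unsigned int crate, unsigned int channel)
{
    assert(crate < 16u && channel < 256u);  /* must fit the field widths */
    ChannelWord w = { crate, channel, 1u };
    return w;
}
```

The layout of bit fields within a word is implementation-defined in C, so a structure such as `ChannelWord` is convenient inside one program but is not by itself a portable description of a hardware data word.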

The technical issues alone would appear to suggest the superiority of any of the candidate languages over FORTRAN 77. However, this would clearly not be a realistic conclusion. From a historical perspective no programming language evaluation could be conducted with the prospect of completely replacing FORTRAN in HEP applications. This presumption is substantiated in the evaluation of practical issues.

IV. Programming Language “Fit” (Practical and Non-Technical Issues)

The following categories of practical issues relevant to collaborative HEP software applications have been identified:

• The Historical Role of FORTRAN

“For all its inelegance, and lack of safety features, it seems certain that FORTRAN will remain the main language for HEP code well into the 1990s....” Computing at CERN in the 1990s

The widespread adoption of FORTRAN by the physics community probably came about because it was the best approximation to the general purpose language, capable of abstracting most of the computer-oriented ideas that one physicist wished to express to another (i.e., the lingua franca). Millions of lines of program code written in FORTRAN are a valuable foundation of the role of computing in high energy physics. Much of this code has proven itself over the course of many years to be accurate, reliable, efficient and flexible enough to be used as the requirements of experiments, and even physics itself, have changed. Concurrently, the shortcomings of the language have led to the production of software libraries that alleviate many of its defects, and represent a huge investment in accumulated expertise. From a historical perspective it is more realistic to evaluate programming languages in the HEP environment in such a way as to complement FORTRAN applications. To fit Ada, C, Pascal or PL/I into a HEP application of any consequence will depend on the ability of that language to communicate effectively with modules, libraries and other program units which were developed in FORTRAN.

• Code Portability and

Code portability is one of the major practical issues in a HEP experimental environment. Experiment collaborators are literally “seconds away” from one another via network. Programs, subprograms, libraries, etc. are easy and necessary to share. Many of the advantages of a collaborative environment would be lost if software portability presented a major problem. The ideal programming language would operate independently of the hardware and the operating system within which it functions. Programs written in that language would execute with minimum modification on all collaborative systems and yield identical results. The portability of software written in a high level language depends upon the availability of the appropriate compilers on the target machines or on the existence of a compiler that is itself portable and which can be moved easily to new machines at a cost far less than that required to produce it initially. Moreover, for portability to be successful, a common subset (or dialect) of the high level language must exist among the compilers. Given this, the programmer has a maximum degree of independence and the functionality allowed by the portability of software.

• Inter-Language Communication

It is not uncommon in the evolution of a software system to want to program a new application in one programming language while still maintaining the use of existing libraries that have been programmed in a different language. In most cases the recoding of existing software solely for compatibility cannot be cost justified. This concept of inter-language communication has played a very important historical role. Subprograms and libraries programmed in assembler language offer greater efficiency to applications programs written in higher level languages. The increasing demand for program portability across multiple computer systems has pushed the demand for inter-language communication from the assembly language level to the higher level language level. Separate compilation of program modules allows for compatibility at the source code level if the module interfaces are precisely defined. Effective inter-language communication removes the need to “re-invent the wheel” and allows the old and the new to be integrated in a satisfactory manner. In a HEP environment, this is a measure of the fit to existing FORTRAN libraries, graphics and database facilities.

• Language Standardization

Language standardization activity must be considered for projects with lengthy life cycles. The technical direction assumed by the standardization committee for a programming language is of critical importance. Standards activities should attempt to preserve investments in software written in the language and to create new standards with as high a degree of compatibility as possible with previous standards. At issue is the question of object code compatibility and source code compatibility.
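The inter-language communication issue discussed above can be made concrete with a small C sketch of what a call into an existing FORTRAN routine typically looks like from the C side. Everything here is an assumption for illustration: the routine `VSUM` is hypothetical, the external name `vsum_` (lower case with a trailing underscore) is a common but compiler-dependent convention, and the FORTRAN routine is stood in for by a C body so that the sketch compiles on its own.

```c
#include <assert.h>

/* Stand-in for a FORTRAN 77 routine
 *       DOUBLE PRECISION FUNCTION VSUM(N, X)
 *       INTEGER N
 *       DOUBLE PRECISION X(N)
 * In a real system this body would live in the FORTRAN library and only
 * the prototype would appear on the C side. FORTRAN passes arguments by
 * reference, so every parameter is a pointer. */
double vsum_(const int *n, const double *x)
{
    double s = 0.0;
    assert(*n >= 0);
    for (int i = 0; i < *n; ++i)
        s += x[i];
    return s;
}

/* C-friendly wrapper that hides the by-reference calling convention. */
static double vector_sum(const double *x, int n)
{
    return vsum_(&n, x);
}
```

Keeping the convention-dependent details in one thin wrapper such as `vector_sum` is one way to limit the cost of moving to a compiler with a different naming or argument-passing scheme.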

V. Preliminary Conclusions

A survey of the technical and practical features of the candidate languages led to the following preliminary conclusions:

• All of the candidate languages are mature;
• Ada has the broadest base of standardized features;
• C has a broad base of features coupled with high portability and availability factors;
• FORTRAN 77 has all the HEP data types native; functionality has a strong dependence on extensions and additional software tools, especially with data structures;
• FORTRAN 90 holds great promise, but when will reliable compilers be available?
• Many important Pascal features are implementation-dependent;
• PL/I is very rich in features; availability could be a major concern.

It is obvious from these conclusions that not one of the candidate languages is clearly superior for HEP applications when considering technical and practical issues. Yet, it does become clear that the size and complexity of HEP software systems is forcing the HEP community to confront the notorious “Software Crisis.” It may have been more realistic to address the question “Can an evaluation of programming languages for HEP applications help improve programmer/physicist productivity and increase the reliability and maintainability of HEP software systems?”

VI. What About Multi-Language Systems?

Instead of expecting (or hoping) that a single programming language would be an obvious choice for HEP applications, perhaps a more realistic future lies in multi-language systems. As was indicated earlier, FORTRAN pioneered multi-language programs with Assembler routines. However, portability requirements have pushed the issue from assembler to higher order languages. All of the candidate languages have some capability for inter-language communication. Opening the door for multi-language systems allows complex and collaborative HEP software projects to:

• choose the best tool (language, toolset, library, software product, etc.) for the job at hand;
• choose the best programmer/software designer for the tool;
• take better advantage of the technology and expertise in current computing.

VII. Conclusions

The following general conclusions were reached as a result of this programming language comparison and evaluation:

• The common assumption that FORTRAN “will remain the main language for HEP code well into the 90’s” is a valid, but conservative, one;
• HEP software development in alternative languages and multi-language systems should be encouraged if it can be proven to be a sound decision.

Goal of This Study

• To systematically evaluate candidate programming languages specifically for high energy physics applications; and
• To evaluate the role of programming languages in future HEP computing environments

“For all its inelegance, and lack of safety features, it seems certain that FORTRAN will remain the main language for HEP code well into the 1990s....” Computing at CERN in the 1990s

“FORTRAN is probably the only perennial standard which will never be questioned.” Trends in Computing for HEP

“I don’t know what the language of the year 2000 will look like but I know it will be called FORTRAN.”

“If HEP wishes to keep to its level of achievement, credibility and excellence, then it needs an injection of bright young computer-wise scientists and engineers.” Paolo Zanella

The Tower of Babel, Not a New Accelerator Design

Language Evaluation Elements

• Technical specification of the programming application
• Analysis of candidate programming language features
• Programming language fit in the application environment
• Growth and future development

Typical/Required Processes in HEP Applications

• input/output of binary files
• 64-bit floating-point arithmetic
• operations using complex data types
• vector and matrix arithmetics
• data structure manipulation
• access of common blocks of data
• direct addressing of dimensioned variables (as distinct from matrix operations and including non-zero lower bounds for arrays)
• separate or independent compilation of subprograms
• subprogram parameter passing by value and reference and the ability to pass routines as parameters
• access to libraries of mathematical functions
• access to histogramming services
• access to graphics services
• extended file or database services (e.g., particle tables).

Wishlist and Detailed Language Feature Comparison

Features compared for Ada, C, FORTRAN 77, FORTRAN 90, Pascal and PL/I:

address arithmetic
arbitrary array bounds
arbitrary function return value
argument by name or position
array bound checking
array of structures
array operations
binary I/O
bit logic
call by reference
call by value
complex variables
concurrent computation
cross module argument checking
data hiding
descriptive variable names
double precision
dynamic memory
enumerated data types
exception handling
formatted I/O
inter-language communication
local procedures
multiple subprogram entries
pass arbitrary matrix
pass arbitrary 1-dimensional array
pass structure to subprogram
pointers to functions
pointers to variables
preprocessor
recursion
routine hiding
sets
static variables
strings
strong type checking
subprogram argument checking
user generic functions
variable equivalence
variable initialization
variable range checking

References

(1) unknown
(2) bar-0 s; in Kernighan and Ritchie only simple variables or pointers; ANSI draft standard also includes structures [8]
(4) user generic functions may be written
(5) sulus bit string, string, at* pointer, some expressions
(6) DEC - yes; IBM - with IAD facility
(7) DEC - yes; IBM - no
(8) use operator overloading (@wrod in a package)
(9) can assign an array to another, but no operations on arrays as units
(11) implementation dependent
(12) the INTENT of a dummy argument may be specified
(13) can use typedef and struct, but without mathematical syntax
(14) no, must be emulated with records
(15) defined in the language definition; not implemented by DEC or IBM
(16) no, necessary to use LINT
(17) yes, if INTERFACE blocks are used
(18) no, only in local procedures
(19) DEC - yes; IBM - no (only 8 characters allowed)
(20) no, must be emulated
(21) no, use ZEBRA
(22) no in Kernighan and Ritchie; yes in the ANSI draft standard [8]
(23) implementation dependent
(24) yes for I/O
(25) implementation dependent
(26) DEC - yes; IBM - no, use ZEBRA
(27) defined via INTERFACE PRAGMA; not always implemented
(28) DEC - yes; IBM - yes; standard definition - unknown
(29) EXTERNAL declaration
(30) defined via EXTERNAL and FORTRAN declarations; implementation dependent
(31) yes, with parameters of the DECLARATION and PROCEDURE statements and only with specific languages
(32) unknown
(33) unknown
(34) pass array of pointers to arrays; both must start at 0
(35) pass array of pointers to arrays; turn off range checking
(36) lower bound must be 0; no bound checking possible
(37) DEC (ISO) - yes; IBM (ANSI) - no; turn off range checking
(38) DEC - yes; IBM - no
(39) not in the sense of C, pass routine at FORTRAN level
(40) no, use ZEBRA
(41) conditional compilation, declaration import, and simple lexical substitution of constants with arithmetic are part of the language
(42) yes, MORTRAN is one example
(43) substitution of constants with arithmetic is supported by DEC and IBM
(44) no, only as local procedures
(45) implementation dependent
(46) can be coded as a MODULE
(47) implementation dependent
(48) yes for simple variables, no for routines
(49) if IMPLICIT NONE is used
(50) no, must use LINT
(51) unchecked conversion or variant record
(52) variant record or union
(53) variant record or union
(54) implementation dependent

Practical Issues

• The FORTRAN legacy
• Portability
• Maintainability
• Standardization (features vs. extensions)
• Availability of knowledgeable users
• Fit with current technology, available tools and projected hardware environments

Preliminary Conclusions I

• All of the candidate languages are mature
• Ada has the broadest base of standardized features
• C has a broad base of features coupled with high portability and availability factors
• FORTRAN 77 has all the HEP data types native; functionality has a strong dependence on extensions and additional software tools, esp. with data structures
• FORTRAN 90 holds great promise, but when??
• Many important Pascal features are implementation-dependent
• PL/I is very rich in features; availability could be a major concern

HEP Programming Languages and the “Software Crisis”

Can an evaluation of programming languages for HEP applications

• Improve programmer productivity?
• Increase the reliability and maintainability of HEP software systems?

Can Software Engineering Help?

The software project life-cycle:

1) Feasibility and Requirements Analysis
2) Logical Design
3) Detailed Design
4) Coding
5) Implementation
6) Maintenance

Structure Charts

• A tool often used in the Detailed Design phase
• Defines the skeleton of the final system in terms of subprogram structures, data structures, etc.
• Helps to separate interface specification from implementation details
• Demonstrates the “maintenance view” of the system

MAIN

Level 1 Routines

Level 2 Routines

What About Multi-Language Systems??

• FORTRAN pioneered multi-language programs with Assembler routines
• Portability requirements have pushed the issue from Assembler to Higher Order Languages
• Allows:
  • The best tool for the job
  • The best designer for the tool

Inter-Language Communication

Ada: defined via INTERFACE PRAGMA; not always implemented
C: DEC - yes; IBM - yes; standard definition - unknown
FORTRAN 77: EXTERNAL declaration
FORTRAN 90: EXTERNAL declaration
Pascal: defined via EXTERNAL and FORTRAN declarations; implementation-dependent
PL/I: yes, with parameters of the DECLARATION and PROCEDURE statements and only with specific languages

ILC Compatibility Issues

data type conventions: it is imperative that data type matching occurs with parameters passed to modules;

array conventions: it is necessary to know how multi-dimensional arrays are mapped into memory by the called and calling languages;

calling conventions: it is necessary to know how parameters are passed to modules, how values are returned and how register and memory management is handled.
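The array convention issue can be illustrated with a short C sketch. FORTRAN stores multi-dimensional arrays in column-major order with indices that start at 1 by default, while C stores them in row-major order with indices that start at 0. The helper names `fortran_offset`, `c_offset` and `to_column_major` are hypothetical and serve only to show the index arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), 1-based, in a FORTRAN array A(nrow, ncol):
   column-major, so the first index varies fastest in memory. */
static size_t fortran_offset(size_t i, size_t j, size_t nrow)
{
    assert(i >= 1 && j >= 1 && i <= nrow);
    return (i - 1) + (j - 1) * nrow;
}

/* Offset of element [i][j], 0-based, in a C array a[nrow][ncol]:
   row-major, so the last index varies fastest in memory. */
static size_t c_offset(size_t i, size_t j, size_t ncol)
{
    assert(j < ncol);
    return i * ncol + j;
}

/* Copy a C matrix into column-major order, so that a FORTRAN routine
   would find the C element [i][j] at A(i+1, j+1). */
static void to_column_major(const double *c_mat, double *f_mat,
                            size_t nrow, size_t ncol)
{
    for (size_t i = 0; i < nrow; ++i)
        for (size_t j = 0; j < ncol; ++j)
            f_mat[fortran_offset(i + 1, j + 1, nrow)] =
                c_mat[c_offset(i, j, ncol)];
}
```

An alternative that avoids the copy is to pass the C array unchanged and treat it as its own transpose on the FORTRAN side, at the cost of swapping the roles of the two indices in every call.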

FORTRAN ↔ C Data Typing

C Type                                            FORTRAN Type
int *                                             INTEGER*2
long int *                                        INTEGER*4
float *                                           REAL*4
double *                                          REAL*8
float [2]                                         COMPLEX
struct Complex *                                  COMPLEX
double [2]                                        COMPLEX*16
struct Double Complex *                           COMPLEX*16
char *                                            LOGICAL*1
struct CHARACTER { char *text; int length; }      CHARACTER*(*)
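The correspondences in the table above can be written down as C declarations. The sketch below is a modern restatement under stated assumptions, not the author's code: the table pairs `int` with INTEGER*2 and `long int` with INTEGER*4, which holds only where those C types are 16 and 32 bits wide, so fixed-width types from `<stdint.h>` are substituted here. All `f_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int16_t f_integer2;   /* INTEGER*2 */
typedef int32_t f_integer4;   /* INTEGER*4 */
typedef float   f_real4;      /* REAL*4, assuming a 32-bit float  */
typedef double  f_real8;      /* REAL*8, assuming a 64-bit double */

/* COMPLEX and COMPLEX*16 as adjacent real and imaginary parts. */
typedef struct { f_real4 re, im; } f_complex;
typedef struct { f_real8 re, im; } f_complex16;

/* CHARACTER*(*) descriptor as in the table: text pointer plus length. */
typedef struct {
    char *text;
    int   length;
} f_character;

/* Wrap a C string; a FORTRAN string carries an explicit length rather
   than relying on a terminating NUL character. */
static f_character f_character_from(char *s)
{
    f_character c;
    assert(s != NULL);
    c.text = s;
    c.length = (int)strlen(s);
    return c;
}
```

Because FORTRAN passes arguments by reference, it is pointers to these types that cross the language boundary, which matches the pointer forms shown in the C column of the table.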

Conclusions II

• The assumption that FORTRAN “will remain the main language for HEP code well into the 90’s” is a valid, but conservative, one;
• HEP software development in alternative languages should be encouraged if it can be proven to be a sound software design decision.