The Comparison and Selection of Programming Languages for High Energy Physics Applications
TN-91-6
June 1991
(TN)

Bebo White
Stanford Linear Accelerator Center
P.O. Box 4349, Bin 97
Stanford, California 94309 USA

This paper discusses the issues surrounding the comparison and selection of a programming language to be used in high energy physics software applications. The evaluation method used was specifically devised to address the issues of particular importance to high energy physics (HEP) applications, not just the technical features of the languages considered. The method assumes a knowledge of the requirements of current HEP applications, the data-processing environments expected to support these applications and relevant non-technical issues. The languages evaluated were Ada, C, FORTRAN 77, FORTRAN 90 (formerly 8X), Pascal and PL/I. Particular emphasis is placed upon the past, present and anticipated future role of FORTRAN in HEP software applications. Upon examination of the technical and practical issues, conclusions are reached and some recommendations are made regarding the role of FORTRAN and other programming languages in the current and future development of HEP software.

I. Introduction

The programming language to be used for any software application is a critical determinant of the speed of software development, the ease of software maintenance and the portability of software to other systems. Many language comparisons have appeared in computer science and programming literature. A large portion of these comparisons have been conducted on the languages in situ. Little, it seems, has been written about how languages should be evaluated and assessed with respect to specific software projects.

Physicists have long been recognized as among the most knowledgeable of natural scientists with respect to the technical aspects of computing. Advancements in high energy physics have been driven by advancements in computer hardware and software technology and vice versa.
There is significant cross-over of physicists into computer-related tasks. Physicists and developers of physics software generally have the expertise to make a choice of development programming languages.

This paper describes an exercise in which a number of computer programming languages were evaluated specifically for high energy physics (HEP) applications. The method devised for this evaluation is based upon a knowledge of the requirements of the applications involved, the data-processing environments expected to support those applications and additional surrounding technical and non-technical issues. This method, coincidentally, closely parallels the feasibility and requirements analysis common to many software engineering methodologies. Therefore, the goals of this study were

o to systematically evaluate candidate programming languages specifically for high energy physics applications; and
o to evaluate the role of programming languages in future HEP computing environments.

Contributed to the International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, Lyon Villeurbanne, France, March 19-24, 1990. Presented at the Conference on Computing in High-Energy Physics, Oxford, April 1989.

II. Methodology

In order to perform a meaningful language comparison, it was necessary to define specific language evaluation elements. These evaluation elements define the language comparison environment and ensure that only the meaningful features of the candidate languages are considered. A concise definition of these elements hopefully reduces the possibility of “programming language bigotry” and the comparison of irrelevant language features.
For this study, the following elements were identified:

o Technical specification of a generic HEP programming application;
o Identification of candidate programming language features necessary to satisfy these specifications;
o Candidate programming language “fit” in HEP software environments;
o Anticipated growth and future development of the candidate programming language.

III. Identification and Comparison of Relevant Language Features

High energy physics data reduction and analysis programs provide good examples of highly numerical-intensive, batch-oriented scientific programming applications. An analysis of author-solicited, “typical” off-line programs used at SLAC and CERN was used to compile a list of common processes required by such applications. This list included

o input/output of binary files
o 64-bit floating-point arithmetic
o operations using complex data types
o vector and matrix arithmetic
o data structure manipulation
o access of common blocks of data
o direct addressing of dimensioned variables (as distinct from matrix operations and including non-zero lower bounds for arrays)
o separate or independent compilation of subprograms and cross-module checking
o subprogram parameter passing by value and reference and the ability to pass routines as parameters
o access to libraries of mathematical functions
o access to histogramming services
o access to graphics services
o extended file or database services (e.g., particle tables).

This list of processes was used to identify the candidate programming languages to be evaluated in the study. It was expected that the candidate languages would either have features which would map directly to the processes on the list or would have access to toolsets which would supplement their functionality. It was also decided that the candidate languages would be well-established (i.e., standardized to some degree), scientific and algorithmic. The languages which met these criteria were Ada, C, FORTRAN 77, FORTRAN 90 (formerly 8X), Pascal and PL/I.
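
Three of the processes listed above (operations using complex data types, non-zero lower bounds for arrays, and passing routines as parameters) can be made concrete with a short sketch in C, one of the candidate languages. The sketch is illustrative only and is not drawn from the SLAC or CERN programs surveyed; every name in it (complex_t, bounded_array, element, apply_to_all, square, LOWER, UPPER) is hypothetical. It shows the kind of supplementary code a language needs when a listed process does not map directly onto a built-in feature.

```c
#include <assert.h>

/* Hypothetical complex type: C (as evaluated here) has no built-in
   complex type, so one is assembled from typedef and struct. */
typedef struct {
    double re;
    double im;
} complex_t;

/* Complex arithmetic is done by function call, not by operators. */
complex_t complex_add(complex_t a, complex_t b)
{
    complex_t r;
    r.re = a.re + b.re;
    r.im = a.im + b.im;
    return r;
}

complex_t complex_mul(complex_t a, complex_t b)
{
    complex_t r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Hypothetical array with a non-zero lower bound, comparable to the
   FORTRAN 77 declaration DOUBLE PRECISION X(-2:2). C arrays always
   start at 0, so the index is shifted by hand. */
#define LOWER (-2)
#define UPPER 2

typedef struct {
    double data[UPPER - LOWER + 1];
} bounded_array;

/* Returns a pointer to element i, so callers can read or modify it
   by reference. The bounds check is an addition for safety. */
double *element(bounded_array *a, int i)
{
    assert(i >= LOWER && i <= UPPER);
    return &a->data[i - LOWER];
}

/* A routine passed as a parameter: f is applied to every element.
   The array is passed by reference (pointer); each value handed to
   f is passed by value. */
typedef double (*real_function)(double);

void apply_to_all(bounded_array *a, real_function f)
{
    int i;
    for (i = LOWER; i <= UPPER; i++) {
        double *x = element(a, i);
        *x = f(*x);
    }
}

double square(double x)
{
    return x * x;
}
```

With these definitions, calling apply_to_all(&a, square) squares every element of a in place, and complex_mul applied to 1 + 2i and 3 + 4i yields -5 + 10i. The function-call notation for complex arithmetic is the cost of building the type from struct rather than having it in the language.
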
The next step was to derive a “wishlist” of language features from the list of typical HEP software processes. Specific candidate language features were then compared against this list. The result is illustrated in Table 1, “The Detailed Language Features Comparison.” References in this table provide clarification and indicate alternatives to deprecated features. In this way each candidate language can be realistically evaluated in terms of its standard and non-standard features, implementation-dependent features and extensions and toolsets commonly available in HEP computing environments.

* Ada has the broadest base of features of the languages being discussed. It has an extensive range of array definitions and permits subprograms to use arbitrarily sized arrays. Ada does not permit the user to define the internal representation needed for multi-dimensional arrays. Intrinsic array operations are not defined within the language and are viewed as an extension in an Ada package if required. Ada offers extended capabilities in the area of bit manipulation on the order of enumerated types. Real data types are defined within the language and support for extended precision and complex arithmetic can be provided through packages in a reasonable way, since overloading of operators allows the mathematical syntax to be preserved. However, Ada does not recognize the need for at least two (single and extended precision) structurally different floating point arithmetics. Ada was obviously designed with conscious emphasis on sound software engineering. It has the best ability of the candidate languages to hide data and routines. It is especially strong in demanding a clarity of exposition of the data and routines which may be imported to, and exported from, any program unit by providing cross-module checking. Ada provides a wide range of I/O facilities through a set of predefined packages, such as DIRECT_IO, TEXT_IO, or LOW_LEVEL_IO. The with and use constructs can be used for computation.

* C is a language with functionality similar to Pascal, but it is much less strict in the consistency checking of program units. LINT must be used to achieve the kind of cross-module checking that Ada and Pascal offer. Data hiding is available at the same level as Ada. C defines I/O through a standard library (library stdio). C defines real and double precision data types in the language, and support for complex arithmetic can be provided using the typedef and struct facilities, but without retaining the mathematical syntax. C manages fairly well in the area of bit manipulation. Bit fields of variable widths can be incorporated into user-defined structures, and a pointer type can be used to locate the structure at a given address in memory. Sets are not defined. Enumerated types are not defined in Kernighan and Ritchie but are in the ANSI draft standard. The language provides a very useful preprocessor for performing such operations as macro definition and library inclusion.

* FORTRAN 77 provides reasonable support for array handling and manipulation. It is the only one of the candidate languages which defines real, double precision and complex data types as part of the language. FORTRAN 77 contains a very clear definition for formatted and unformatted I/O to terminal, printer and mass storage. However, it is clear that FORTRAN 77 lacks a great deal of the functionality that the other languages offer, with little type checking and no cross-module checking. There is no data hiding beyond local variables. In realistic HEP applications, FORTRAN 77 must be supplemented by additional software tools. At CERN, the whole concept of ZEBRA is a manifestation of one of FORTRAN 77’s needs (i.e., data structure manipulation). FORTRAN 90 has directly addressed many of these deficiencies of FORTRAN 77.

* Pascal, as implemented by both DEC and IBM, is very similar to Ada from the standpoint of computing itself. It offers the ability to hide data and code at the same level and cross-module checking.
It does not offer multi-programming