Computing from my PhD onwards
All the way from ALGOL68 to C++, Python and the Grid, by way of Fortran, CERNLIB and RAL/CERN batch systems.
Joe Foster, 2008-11-27

Itinerary
• Leeds (late 1970s): PhD computing - the simple life.
• CERN (1980s): Heroic days of HEP computing.
• All over: Distributed computing before the WWW and the Grid.
• 1992: "A World Wide Web?": my response.
• After the millennium: The Grid.

Leeds: PhD Computing (late 1970s)
• Stand-alone program to analyze spark chamber photos.
  – Small group working on cosmic rays.
  – Simple detector.
  – Small data volumes.
• Chose ALGOL68.
  – Recommended by computer scientists.
  – Modular: procedures and variables have scope.
  – Can define new data types.
  – A precursor of C, Ada, Python...
  – Example:
      ... IF a = b THEN x := y ELSE x := z FI; ...
  – Nice!

Leeds: PhD Computing (late 1970s)
• ICL 1906A mainframe computer at Leeds.
  – 128K(?) of 24-bit core memory using 6-bit 'characters'.
  – ?MB fixed disc + 8M-character exchangeable disc store.
  – Drum store for paging.
  – Tape drives for archive storage.
  – User input:
    • Line-mode VDU (later on)
    • Teletype terminal
    • IBM card punch
  – George 4 operating system.
  – Huge power consumption (~0.5MW?).
  – Slow, limited local area network.
  – Email?

Some Early Hardware
[Photo: ICL 1906A at RAL]

Storage in the 1970s
[Photos: 8M-character exchangeable disc; core memory; tape drives]

User Data Entry
[Photos: Teletype terminal with tape punch; card punch]

CERN (1980s): HEP computing
• HEP computing was very different.
  – Data:
    • Large, complex datasets, resident on tape.
    • Event based. Events too big for memory - need memory management!
    • Time-dependent alignment and calibration data.
  – Software:
    • Large bodies of code with many contributors.
      – Code management system needed.
    • FORTRAN
      – Fast, well understood.
      – Initially only a FORTRAN 4 compiler.
      – Code management needed for COMMON blocks, etc.
      – Memory management from separate libraries.
    • Much functionality provided by the CERN Program Library.
    • IBM JCL for scripting.
    • WYLBUR user interface.
  – Hardware:
    • IBM mainframe. PDP11 for DAQ in EMC.
    • Line-mode terminals in the common user area.
    • Fast CERN-wide network. Email, with limitations.

PDP 11
[Photo: DEC PDP11 minicomputer like the one used in EMC for data acquisition.]
Note the programming switches on the front panel. They were used to set up the start address for the initial program load when booting.

IBM JCL
• Unix:
      cp OldFile NewFile
• IBM JCL:
      //IS198CPY JOB (IS198T30500),'COPY JOB',CLASS=L,MSGCLASS=X
      //COPY01   EXEC PGM=IEBGENER
      //SYSPRINT DD SYSOUT=*
      //SYSUT1   DD DSN=OLDFILE,DISP=SHR
      //SYSUT2   DD DSN=NEWFILE,
      //            DISP=(NEW,CATLG,DELETE),
      //            SPACE=(CYL,(40,5),RLSE),
      //            DCB=(LRECL=115,BLKSIZE=1150)
      //SYSIN    DD DUMMY

FORTRAN
• FORTRAN 77 has some nice features, but there was much legacy code in FORTRAN 4.
• FORTRAN 77:
      IF (A .EQ. B) THEN
        X = Y
      ELSE
        X = Z
      END IF
• FORTRAN 4:
         IF (A .EQ. B) GOTO 10
         X = Z
         GOTO 30
   10    X = Y
   30    CONTINUE

FORTRAN
• Standard FORTRAN was well understood and fast, but had serious limitations for HEP computing:
  – Subroutine, function and common block names have global scope and are limited to six upper-case characters. It is hard to generate meaningful names that don't collide with library functions, etc.
  – Rigid code format based on IBM punch cards.
  – Arrays can't be allocated at run time, but fixed arrays sized for HEP event data would be too big.
  – Common blocks must be declared in full at the start of every program unit that uses them, and they can have many variables. There is no FORTRAN equivalent of the C preprocessor to manage this.
  – It is hard to write well-structured code with features like GOTO.
  – Limited ability to define new complex data types.
  – No keyed I/O access in the style of the Python dictionary type.
• Proprietary extensions were nice, but not portable.
• Most of the missing features were supplied by the CERN Program Library (CERNLIB).

Some (old) CERNLIB Add-ons
• PATCHY code management library
  – Let many users work on a common body of code.
  – Select particular versions of program units.
  – Conditionally include platform-dependent code.
  – Managed FORTRAN COMMON block inclusion in a platform-independent way.
  – Superseded by CMZ, and now by CVS and Subversion.
• HBOOK histogram package
  – Used to fill and display histograms in batch mode.
  – Superseded by PAW, and now by ROOT, JAS, etc.
• KAPACK, FFREAD keyed random access packages
  – Useful for calibration databases, read from text files.
  – Now we have SQL databases and XML.

Memory Management and FORTRAN
• ZBOOK was an early FORTRAN-based memory management package.
  – Declare a large array in a COMMON block which can contain either REALs or INTEGERs.
  – 'Book' space (banks) in the array and access data there via bank pointers and offsets.
  – 'Drop' the banks after use and book new ones.
  – Garbage collection occurs when the array gets too full: current banks are shuffled into the space left by dropped ones. Only then do old pointers become invalid.
• More recent packages (ZEBRA, HYDRA, BOS) work in essentially the same way.
• Now we use malloc in C, new in C++.
  – A continuing source of memory leaks!
• Python does it all for us.

ZBOOK Example
(Slide annotation: ID1 may be used in another subroutine. Invalid there? No type checking possible!)
      SUBROUTINE ZEXAM1
      COMMON/ZCOMM/Z(1), ID1, ID2, ID3, WS(1)
      DIMENSION IZ(1000), IWS(1)
      EQUIVALENCE (IZ(1),Z(1)), (IWS(1),WS(1))
C
C     INITIALIZATION
      CALL ZINIT (Z,WS,1000)
C
C     BOOKING
      NWORDS = 23
      CALL ZBOOK (Z,ID1,NWORDS)
C     CHECK IF BANK EXISTS
      IF (ID1.EQ.0) GOTO 999
C     FILLING
      DO 10 I=1, NWORDS
   10 IZ(ID1 + I) = I
C
C     ACCESS TO DATA STORED IN THE BANK
      ILAST = IZ(ID1 + NWORDS)
C
  999 RETURN
      END

1980s Batch Analysis
• Typical 1980s batch analysis job:
      Compile → Debug → Link → Debug → Process → Debug → Visualize → Debug
  – Could take hours to run even a small job.
  – Rerun the whole job just to add one histogram.
  – Errors caught late in the cycle.

Modern Job with PAW or ROOT
• Typical modern analysis chain using PAW or ROOT:
      Compile → Debug → Link → Debug → Process → Debug → Visualize → Debug
  – Break the job into separate steps - easy with Unix scripting.
  – Debug each step separately - faster error catching.
  – Easily add new histograms using a PAW ntuple or a ROOT TTree.

Cost of Fixing Bugs (from memory)
[Chart: cost of fixing a bug, on a log scale, rising steeply across the stages Design, Code, Test, Deploy.]

1980s Distributed Computing
By 1984 most UK universities were linked to SERCNET (later JANET) - some via sites like Manchester or Daresbury. Connection to CERN and US sites was possible via gateways. Services included remote login, remote job submission and email.

1980s Networks
• Many UK HEP users in the 1980s had three network protocols to learn:
  – SERCNET/JANET - e.g. '[email protected]'
  – IBM-based BITNET (USA) and EARN (Europe) - e.g. 'JF2 AT RALVM'
  – VAX-based DECNET - 'RALMVS:JFS'
  – Others, too...
• Login, job submission and email were easy within any one network.
• These networks were incompatible, but connected via gateways.
• You could get information from remote computers anywhere in the world, but only if you had an account there and could navigate the gateways.
• Connectivity could be lost for hours or days at a time.

1980s Attitudes to Network Use
"2.
ARPANET ACCESS
SERC users wishing to use ARPANET to access machines in the USA are reminded that formal approval is required from SERC and the ARPANET Governing Committee. Approved users are provided with identifiers and password and must observe the strict rule that this is only for their use. Anyone who allows their identifier and password to be used by someone not authorised to access ARPANET will have their approval withdrawn immediately. Use of this International Network is carefully scrutinised by British Telecom who have a representative on the Governing Committee. Abuse of the system could seriously prejudice the future for ARPANET."

Hot Topics from CHEP '87
• Physics Analysis Workstation (PAW).
• Supercomputing and parallel computing (e.g. transputers).
• Vectorization of HEP software.
• Relational databases for HEP.
• Use of structured analysis and software engineering.
• Working towards object-oriented software.
• Compatibility: networks, languages, graphics.
• Optical storage for HEP.
• Dream of a terabyte-scale disk farm somewhere...

HEP Computing in the 1990s
• The 1990s brought many improvements to HEP computing.
  – C and C++ began to replace FORTRAN.
  – Object-oriented techniques began to be widely adopted.
  – Open source and 'free' software became important.
  – Most HEP computing was done under some form of Unix.
  – Farms of cheap PCs running Linux replaced mainframes as workhorses.
  – Many new productivity tools became available:
    • Interactive development environments, e.g. LabView.
    • Powerful text editors with features like syntax colouring.
    • Database management systems for relational databases.
    • Interpreted languages like Perl and Python made scripting easy and powerful.
    • Tools like CMT (Configuration Management Tool) made it easy to define standard releases of large applications with many components - e.g. ATLAS reconstruction.
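As an aside on why interpreted languages changed day-to-day work: the keyed random access that KAPACK and FFREAD had to supply as add-on packages, and that standard FORTRAN lacked entirely, is built into Python's dictionary type. A minimal sketch (the detector names, run numbers and calibration constants here are invented purely for illustration):

```python
# Hypothetical calibration table, keyed by (detector, run) tuples.
# KAPACK/FFREAD offered this style of keyed access to FORTRAN programs;
# in Python it is a one-line literal rather than a library call.
calib = {
    ("muon_chamber", 1001): 0.987,
    ("muon_chamber", 1002): 0.991,
}

# Keyed random access: no linear search, no fixed-size COMMON block.
scale = calib[("muon_chamber", 1001)]

# Time-dependent data: fall back to a default when a run is not present.
scale_1003 = calib.get(("muon_chamber", 1003), 1.0)
```

The same pattern scales from a few constants read from a text file up to the point where a real SQL database takes over.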
1990s Distributed Computing
• Big changes came in the 1990s.
  – The UK joined the rest of the Internet and adopted TCP/IP protocols, ditching the old 'big-endian' SERCNET addresses, as well as BITNET and DECNET.
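'Big-endian' here refers to the JANET convention of writing domain components most-significant first (country code on the left), the mirror image of DNS order, so gateways of the era had to flip names between the two worlds. The conversion is just a reversal of the dot-separated labels; a sketch in Python, with an invented example host name:

```python
def flip_endianness(name: str) -> str:
    """Reverse the dot-separated components of a host name, converting
    between JANET 'big-endian' order (country code first) and DNS
    'little-endian' order (country code last)."""
    return ".".join(reversed(name.split(".")))

# Example host name invented for illustration:
print(flip_endianness("uk.ac.man.ph"))  # prints "ph.man.ac.uk"
```

Applying the function twice returns the original name, which is why a single gateway rule sufficed in both directions.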