
Swift Parallel Scripting: Workflows for Simulations and Data Analytics at Extreme Scale

Michael Wilde [email protected]

http://swift-lang.org
The Swift Team

§ Timothy Armstrong, Yadu Nand Babuji, Ian Foster, Mihael Hategan, Daniel S. Katz, Ketan Maheshwari, Michael Wilde, Justin Wozniak, Yangxinye Yang
§ 2015 REU Summer Collaborators: Jonathan Burge, Mermer Dupres, Basheer Subei, Jacob Taylor
§ Contributions by Ben Clifford, Luiz Gadelha, Yong Zhao, Scott Krieder, Ioan Raicu, Tiberius Stef-Praun, Matthew Shaxted
§ Our sincere thanks to the entire Swift user community

http://swift-lang.org

Increasing capabilities in computational science

[Conceptual chart: the "Domain of Interest" grows along axes of "Complexity" and "Time".]

http://swift-lang.org

Workflow needs

§ Application drivers
  – Applications that are many-task in nature: parameter sweeps, UQ, inverse modeling, and data-driven applications
  – Analysis of capability application outputs
  – Analysis of stored or collected data
  – Increase productivity at major research instrumentation
  – Urgent computing
  – These applications are all many-task in nature
§ Requirements
  – Usability and ease of workflow expression
  – Ability to leverage the complex architecture of HPC and HTC systems (fabric, scheduler, hybrid node and programming models), individually and collectively
  – Ability to integrate high-performance data services and volumes
  – Make use of the system task-rate capabilities from clusters to extreme-scale systems
§ Approach
  – A programming model for programming in the large

When do you need HPC workflow? Example application: protein-ligand docking for drug screening

O(10) proteins implicated in a disease  x  O(100K) drug candidates

… = 1M docking tasks…

…then hundreds of detailed MD models to find 10-20 fruitful candidates for wet-lab & APS crystallography

Expressing this many-task workflow in Swift

For the protein docking workflow:

foreach p, i in proteins {
  foreach c, j in ligands {
    (structure[i,j], log[i,j]) = dock(p, c, minRad, maxRad);
  }
}
scatter_plot = analyze(structure);

To run: swift -site tukey,blues dock.swift
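A script like this relies on an app() declaration that wraps the docking executable. The following is a minimal sketch: the executable name dock_exe and its flags are illustrative assumptions, not the actual docking tool's interface.

type file;
type structFile;
type logFile;

app (structFile s, logFile l) dock (file protein, file ligand, float minRad, float maxRad)
{
  // hypothetical docking program and flags; substitute the real tool here
  dock_exe "-p" @protein "-l" @ligand "-rmin" minRad "-rmax" maxRad "-o" @s stdout=@l;
}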

Swift enables execution of simulation campaigns across multiple HPC and cloud resources

[Architecture figure: Swift scripts, apps, and local data on the Swift host (a login node, laptop, …) drive application runs on campus systems, national infrastructure, petascale systems, and cloud resources, each backed by data servers.]

The Swift runtime system has drivers and algorithms to efficiently support and aggregate diverse runtime environments.

Comparison of parallel programming models

§ MPI
  – Each process has a persistent state for the life of the application
  – Data is exchanged via messages
  – Program an app by "domain decomposition"
§ Shared memory
  – Tasks need not be persistent
  – Tasks are initiated explicitly (e.g., pthreads) or implicitly (e.g., by OpenMP directives)
  – The challenge is to assure mutual exclusion and avoid deadlock
  – Limited to running within a single node, or use a hybrid approach for multi-node
§ Workflow
  – Tasks run to completion
  – Data is exchanged on task initiation and completion
  – Used to coordinate execution of multiple serial or parallel apps
§ Hybrid
  – MPI and/or OpenMP apps within a many-task workflow (see the sketch below)
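A minimal sketch of the hybrid pattern, assuming a simulation binary simulate_exe whose wrapper handles its own parallel launch; the names, flags, and file patterns are illustrative. In Swift/T, the @par annotation shown later with LAMMPS can instead request multiple MPI ranks per task.

type file;

app (file out) simulate (file config)
{
  // simulate_exe is a hypothetical (possibly MPI) simulation program
  simulate_exe "-c" @config "-o" @out;
}

file configs[] <filesys_mapper; pattern="*.conf">;    // map existing config files
file results[] <simple_mapper; prefix="result_">;     // name the outputs

foreach cfg, i in configs {
  // each invocation is an independent task; Swift runs them concurrently
  results[i] = simulate(cfg);
}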

Swift in a nutshell

§ Data types
  string s = "hello world";
  int i = 4;
  int A[];

§ Structured data
  image A[];

§ Loops
  foreach f, i in A {
    B[i] = convert(A[i]);
  }

§ Mapped data types
  type image;
  image file1 <"snapshot.jpg">;

§ Mapped functions
  app (file o) myapp (file f, int i)
  { mysim "-s" i @f @o; }

§ Data flow
  analyze(B[0], B[1]);
  analyze(B[2], B[3]);

§ Conventional expressions
  if (x == 3) {
    y = x + 2;
    s = @strcat("y: ", y);
  }
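Putting these pieces together, a complete script might look like the sketch below; the convert_exe and analyze_exe executables and the file-name patterns are assumptions for illustration, not part of the original slides.

type image;
type stats;

app (image o) convert (image i)
{
  convert_exe @i @o;              // hypothetical image conversion program
}

app (stats o) analyze (image a, image b)
{
  analyze_exe @a @b stdout=@o;    // hypothetical analysis program
}

image A[] <filesys_mapper; pattern="*.jpg">;       // map existing input images
image B[] <simple_mapper; prefix="converted_">;    // name the outputs
stats summary <"summary.txt">;

foreach f, i in A {
  B[i] = convert(f);              // all conversions may run in parallel
}
summary = analyze(B[0], B[1]);    // runs as soon as B[0] and B[1] exist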

Swift: A language for distributed parallel scripting, J. Parallel Computing, 2011

Encapsulation enables distributed parallelism

[Figure: a Swift app() function encapsulates an application program; typed Swift data objects are mapped to the files the program expects or produces, and the app() declaration serves as the interface definition.]

Encapsulation is the key to transparent distribution, parallelization, and automatic provenance capture. This is critical in a world of scientific, engineering, technical, and analytical applications.

app() functions specify command-line argument passing

To run: psim -s 1ubq.fas -pdb p -t 100.0 -d 25.0 >log

In Swift code:

app (PDB pg, Text log) predict (Protein seq, Float t, Float dt)
{
  psim "-s" @seq "-pdb" @pg "-t" t "-d" dt stdout=@log;
}

Protein p <"1ubq.fas">;
PDB structure;
Text log;

(structure, log) = predict(p, 100., 25.);

[Figure: the PSim command line above is generated by the Swift app function predict(), which maps its inputs (seq, t, dt) and outputs (pg, log) onto the application's arguments and files.]

Implicitly parallel

§ Swift is an implicitly parallel functional programming language for clusters, grids, clouds, and supercomputers
§ All expressions evaluate when their data inputs are "ready"

(int r) myproc (int i)
{
  int f = F(i);
  int g = G(i);
  r = f + g;
}

§ F() and G() are computed in parallel
  – They can be Swift functions, or leaf tasks (executables or scripts in shell, Python, R, Octave, MATLAB, ...)
§ r is computed when they are done
§ This parallelism is automatic
§ It works recursively throughout the program's call graph

Pervasive parallel data flow

Functional composition in Swift enables powerful parallel loops:

Sweep (Protein pSet[])
{
  int nSim = 1000;
  int maxRounds = 3;
  float startTemp[] = [ 100.0, 200.0 ];
  float delT[] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
  foreach p, pn in pSet {
    foreach t in startTemp {
      foreach d in delT {
        IterativeFixing(p, nSim, maxRounds, t, d);
      }
    }
  }
}

10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 deltas = 300K tasks

Data-intensive example: Processing MODIS land-use data

[Pipeline figure: landUse and colorize are applied to ~317 input tiles (Swift loops process hundreds of images in parallel), followed by analyze, mark, and assemble.]

Image processing pipeline for land-use data from the MODIS satellite instrument…

Processing MODIS land-use data

foreach raw, i in rawFiles {
  land[i] = landUse(raw, 1);
  colorFiles[i] = colorize(raw);
}
(topTiles, topFiles, topColors) = analyze(land, landType, nSelect);
gridMap = mark(topTiles);
montage = assemble(topFiles, colorFiles, webDir);
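Each stage above is an app() function wrapping an image-processing executable. A sketch of two of them follows; the executable names getlanduse and color_tiles are illustrative assumptions rather than the actual MODIS tools.

type imagefile;
type landuse;

app (landuse o) landUse (imagefile raw, int sortfield)
{
  // hypothetical tool that histograms land-use codes in one tile
  getlanduse @raw sortfield stdout=@o;
}

app (imagefile o) colorize (imagefile raw)
{
  // hypothetical tool that renders a false-color image of one tile
  color_tiles @raw @o;
}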


http://swift-lang.org

Dataset mapping example: deep fMRI directory tree

type Study   { Group g[]; }
type Group   { Subject s[]; }
type Subject { Volume anat; Run run[]; }
type Run     { Volume v[]; }
type Volume  { Image img; Header hdr; }

[Figure: a mapping function or script connects the study's on-disk data layout to Swift's in-memory data model.]
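To populate such a structure, a mapper is attached to the top-level variable. The external-mapper form below is a sketch: the mapper script name fmri_mapper.sh and its location parameter are assumptions.

// Map the on-disk study tree into the nested Study structure.
// fmri_mapper.sh is a hypothetical external mapper script that emits a
// (member-path, file-path) pair for every field in the structure.
Study study <ext; exec="fmri_mapper.sh", location="./study_data">;

// After mapping, nested members behave like ordinary Swift file variables,
// e.g. study.g[0].s[0].anat or study.g[0].s[0].run[1].v[3].img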

Spatial normalization of functional MRI runs

[Workflow graph figure: the dataset-level workflow and its expanded (10-volume) form, with stages reorientRun, reorient, random_select, alignlinearRun, resliceRun, softmean, alignlinear, combine_warp, reslice_warpRun, strictmean, binarize, and gsmoothRun.]

http://swift-lang.org

Complex scripts can be well-structured programming in the large: fMRI spatial normalization script example

(Run or) reorientRun (Run ir, string direction)
{
  foreach Volume iv, i in ir.v {
    or.v[i] = reorient(iv, direction);
  }
}

(Run snr) functional (Run r, NormAnat a, Air shrink)
{
  Run yroRun = reorientRun( r, "y" );
  Run roRun = reorientRun( yroRun, "x" );
  Volume std = roRun[0];
  Run rndr = random_select( roRun, 0.1 );
  AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );
  Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );
  Volume meanRand = softmean( reslicedRndr, "y", "null" );
  Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );
  Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );
  Run nr = reslice_warp_run( boldNormWarp, roRun );
  Volume meanAll = strictmean( nr, "y", "null" );
  Volume boldMask = binarize( meanAll, "y" );
  snr = gsmoothRun( nr, boldMask, "6 6 6" );
}
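The leaf stages called above (reorient, alignlinear, softmean, and so on) are typically app() functions wrapping external image-processing utilities. A sketch of one such wrapper follows; the executable name reorient_exe and its argument order are assumptions.

app (Volume ov) reorient (Volume iv, string direction)
{
  // hypothetical wrapper: reads the input image/header pair and writes a
  // reoriented image/header pair
  reorient_exe @iv.img @iv.hdr direction @ov.img @ov.hdr;
}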

Provenance graph from Swift log

Work of Luiz Gadelha and Maria Luiza Mondelli, LNCC

Provenance data from log into SQL DB

Work of Luiz Gadelha and Maria Luiza Mondelli, LNCC Use SQL to mine insight from provenance

Work of Luiz Gadelha and Maria Luiza Mondelli, LNCC Domain-specific metadata on workflow results

Work of Luiz Gadelha and Maria Luiza Mondelli, LNCC Swift’s distributed architecture is based on a client/worker mechanism (internally named “coasters”)

[Architecture figure, as shown earlier: the Swift host (login node, laptop, …) with scripts, apps, and local data dispatches work to campus systems, national infrastructure, petascale systems, and cloud resources, each backed by data servers.]

The Swift runtime system has drivers and algorithms to efficiently support and aggregate diverse runtime environments.

Worker architecture handles diverse environments

[Figure: on the submit site, Swift scripts are compiled and executed by the swift command (Karajan engine, local API), whose Coaster Client connects over a socket to a Coaster Service on the remote site; the service submits a scheduler job that starts Workers on the compute site and communicates with them over sockets.]

Summary of Swift main benefits

§ Makes parallelism more transparent: implicitly parallel functional dataflow programming
§ Makes computing location more transparent: runs your script on multiple distributed sites and diverse computing resources (desktop to petascale)
§ Makes basic failure recovery transparent: retries/relocates failing tasks; can restart failing runs from the point of failure
§ Enables provenance capture: tasks have recordable inputs and outputs

BUT: Centralized evaluation can be a bottleneck at extreme scales

[Figure: we had this (Swift/K, centralized evaluation); for extreme scale, we need this (Swift/T, distributed evaluation).]

Two Swift implementations

§ Swift/K: classic Java Swift with the "Karajan" engine
  – Portable, mature (2006)
  – Runs the script from a single node
  – Installs anywhere, instantly: just untar and run
  – Runs any POSIX app, serial or parallel
  – 500-1000 tasks/second
  – Use for irregular workloads and flexible MPI app invocation
§ Swift/T: HPC Swift (in C) with the "Turbine" engine
  – Faster, newer (2011)
  – Runs the script and apps from a multinode MPI program
  – Runs anywhere that MPI runs
  – Runs both POSIX apps and library functions (C/C++, Fortran, Python, R, Julia)
  – 1.5 billion tasks/s on 512K Blue Waters cores
  – Use for fine-grained tasking, in-memory workflow, and single-core MPI apps
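For orientation, the two implementations are invoked differently. The Swift/K command line below appears earlier in this talk; the swift-t line is a sketch of typical Swift/T launcher usage (the -n flag selects the number of MPI processes; exact options may vary by version).

  To run with Swift/K:  swift -site tukey,blues dock.swift
  To run with Swift/T:  swift-t -n 128 dock.swift   (compiles with STC, then runs under Turbine on 128 MPI processes)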

http://swift-lang.org

Swift/T: productive extreme-scale scripting

[Figure: Swift scripts are evaluated by parallel Swift control processes (an MPI-based evaluator, control layer, and data store) that dispatch work to many Swift worker processes running C, C++, and Fortran leaf functions.]

§ Script-like programming with "leaf" tasks
  – In-memory function calls in C++, Fortran, Python, R, … passing in-memory objects (see the sketch below)
  – More expressive than master-worker for "programming in the large"
  – Leaf tasks can be MPI programs, etc.; they can be separate processes if the OS permits
§ A distributed, scalable runtime manages tasks, load balancing, and data movement
§ User function calls to external code run on thousands of worker nodes
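A minimal sketch of an in-memory leaf call, assuming a Swift/T build with the python package enabled; the exact signature of the built-in python() function is an assumption, so check the Swift/T documentation for your version.

import io;
import string;
import python;

foreach i in [0:9] {
  // python(code, expr): run the code, then return the value of expr as a string
  string result = python("import math", sprintf("repr(math.sqrt(%i))", i));
  printf("sqrt(%i) = %s", i, result);
}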

Parallel tasks in Swift/T

§ Swift expression: z = @par=32 f(x,y);
§ The ADLB server finds 8 available workers
  – Workers receive ranks from the ADLB server
  – It performs comm = MPI_Comm_create_group()
§ Workers perform f(x,y), communicating on comm

LAMMPS parallel tasks

foreach i in [0:20] {
  t = 300 + i;
  sed_command = sprintf("s/_TEMPERATURE_/%i/g", t);
  lammps_file_name = sprintf("input-%i.inp", t);
  lammps_args = "-i " + lammps_file_name;
  file lammps_input = sed(filter, sed_command) =>
    @par=8 lammps(lammps_args);
}

§ LAMMPS provides a convenient C++ API
§ It is easily used by Swift/T parallel tasks

[Figure: tasks of varying sizes packed into one big MPI run (black: compute, blue: message, white: idle).]

Swift/T-specific features

§ Task locality: the ability to send a task to a specific process
  – Allows for big-data-type applications
  – Allows stateful objects to remain resident in the workflow
  – location L = find_data(D);
    int y = @location=L f(D, x);
§ Data broadcast
§ Task priorities: the ability to set task priority
  – Useful for tweaking load balancing
§ Updateable variables
  – Allow data to be modified after its initial write
  – Consumer tasks may receive original or updated values when they emerge from the work queue

Wozniak et al. Language features for scalable distributed-memory dataflow computing. Proc. Dataflow Execution Models at PACT, 2014.

Swift/T: scaling of a trivial foreach { } loop with 100 microsecond to 10 millisecond tasks on up to 512K integer cores of Blue Waters
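The benchmark loop is essentially the following shape (a sketch; in the measured runs the body is a synthetic leaf task of 100 microseconds to 10 milliseconds rather than the trivial function used here):

// placeholder leaf function standing in for the timed synthetic task
(int o) f (int i)
{
  o = i * i;
}

foreach i in [0:999999] {
  // every iteration is an independent task; Turbine/ADLB load-balances
  // the resulting task graph across all available workers
  int r = f(i);
}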

Large-scale applications using Swift

§ (A) Simulation of super-cooled glass materials
§ (B) Protein and biomolecule structure and interaction
§ (C) Climate model analysis and decision making for global food production & supply
§ (D) Materials science at the Advanced Photon Source
§ (E) Multiscale subsurface flow modeling
§ (F) Modeling of power grid for OE applications

All have published science results obtained using Swift.

Benefit of implicit pervasive parallelism: Analysis & visualization of high-resolution climate models powered by Swift

§ Diagnostic scripts for each climate model component (ocean, atmosphere, land, ice) were expressed in complex shell scripts
§ Recoded in Swift, the CESM community has benefited from significant speedups and more modular scripts

Work of: J Dennis, M Woitasek, S Mickelson, R Jacob, M Vertenstein

AMWG Diagnostic Package results:

Rewriting the AMWG Diagnostic Package in Swift created a 3X speedup.

(a) The AMWG Diagnostic Package was used to calculate the climatological mean files for 5 years of 0.10 degree up-sampled data from a 0.25 degree CAM-SE cubed sphere simulation. This was run on fusion, a cluster at Argonne, on 4 nodes using one core on each. (b) The AMWG Diagnostic Package was used to compare two data sets. The data consisted of two sets of 30 year, 1 degree monthly average CAM files. This was run on one data analysis cluster node on mirage at NCAR. (c) The AMWG Package was used to compare 10 years of 0.5 degree resolution CAM monthly output files to observational data. This comparison was also run on one node on mirage.

Boosting Light Source Productivity with Swift: ALCF Data Analysis
H Sharma, J Almer (APS); J Wozniak, M Wilde, I Foster (MCS)

Impact and Approach
• HEDM imaging and analysis trial shows granular material structure, non-destructively!
• APS Sector 1 scientists use Mira to process data from live HEDM experiments, providing real-time feedback to correct or improve in-progress experiments
• Scientists working with the Discovery Engines LDRD developed new Swift analysis workflows to process APS data from Sectors 1, 6, and 11

Accomplishments
• Mira analyzes an experiment in 10 mins vs. 5.2 hours on the APS cluster: > 30X improvement
• Scaling up to ~128K cores (driven by data features)
• A cable flaw was found and fixed at the start of an experiment, saving an entire multi-day experiment and valuable user time and APS beam time
• In press: High-Energy Synchrotron X-ray Techniques for Studying Irradiated Materials, J-S Park et al, J. Mat. Res.
• Big data staging with MPI-IO for interactive X-ray science, J Wozniak et al, Big Data Conference, Dec 2014

ALCF Contributions
• Design, develop, support, and user engagement to make the Swift workflow solution on ALCF systems a reliable, secure, and supported production service
• Creation and support of the Petrel data server
• Reserved resources on Mira for the APS HEDM experiment at the Sector 1-ID beamline (8/10/2014 and future sessions in APS 2015 Run 1)

[Figure: live analysis loop at the beamline: (1) Analyze, (2) Assess, (3) Fix, (4) Re-analyze, (5) Valid data; red indicates higher statistical confidence in the data.]

Conclusion: parallel workflow scripting is practical, productive, and necessary, at a broad range of scales

§ The Swift programming model has been demonstrated to be feasible and scalable on XSEDE, Blue Waters, OSG, and DOE systems
§ Applied to numerous MTC and HPC application domains; attractive for data-intensive applications and several hybrid programming models
§ Proven productivity enhancement in materials, genomics, biochemistry, earth systems science, …
§ Deep integration of workflow in progress at XSEDE, ALCF

Workflow through implicitly parallel dataflow is productive for applications and systems at many scales, including the highest-end systems.

What's next?

§ Programmability
  – New patterns a la Van Der Aalst et al (workflowpatterns.org)
§ Fine-grained dataflow: programming in the smaller?
  – Run leaf tasks on accelerators (CUDA GPUs, Intel Phi)
  – How low/fast can we drive this model?
§ PowerFlow
  – Applies dataflow semantics to manage and reduce energy usage
§ Extreme-scale reliability
§ Embed Swift semantics in Python, R, Java, shell, make
  – Can we make Swift "invisible"? Should we?
§ Swift-Reduce
  – Learning from map-reduce
  – Integration with map-reduce

GeMTC: GPU-enabled Many-Task Computing

Motivation: support for MTC on all accelerators!

Goals: 1) MTC support, 2) Programmability, 3) Efficiency, 4) MPMD on SIMD, 5) Increase concurrency to the warp level

Approach: design & implement the GeMTC middleware, which 1) manages the GPU, 2) spreads work across host/device, 3) integrates with a workflow system (Swift/T)

Further research directions

§ Deeply in-situ processing for extreme-scale analytics
§ Shell-like Read-Evaluate-Print Loop a la IPython
§ Debugging of extreme-scale workflows

Deeply in-situ analytics of a climate simulation

The Swift Team
§ Timothy Armstrong, Yadu Nand Babuji, Ian Foster, Mihael Hategan, Daniel S. Katz, Ketan Maheshwari, Michael Wilde, Justin Wozniak, Yangxinye Yang
§ 2015 REU Summer Collaborators: Jonathan Burge, Mermer Dupres, Basheer Subei, Jacob Taylor
§ Contributions by Zhao Zhang, Ben Clifford, Luiz Gadelha, Yong Zhao, Scott Krieder, Ioan Raicu, Tiberius Stef-Praun, Matthew Shaxted
§ Sincere thanks to the entire Swift user community

Parallel Computing 37 (2011) 633–652

Swift: A language for distributed parallel scripting

Michael Wilde, Mihael Hategan, Justin M. Wozniak, Ben Clifford, Daniel S. Katz, Ian Foster
Computation Institute, University of Chicago and Argonne National Laboratory; Mathematics and Computer Science Division, Argonne National Laboratory; Department of Computer Science, University of Chicago; Department of Astronomy and Astrophysics, University of Chicago

Abstract: Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their use further increases programming complexity. The Swift parallel scripting language reduces these complexities by making file system structures accessible via language constructs and by allowing ordinary application programs to be composed into powerful parallel scripts that can efficiently utilize parallel and distributed resources. We present Swift's implicitly parallel and deterministic programming model, which applies external applications to file collections using a functional style that abstracts and simplifies distributed parallel execution.

Keywords: Swift; Parallel programming; Scripting; Dataflow. Available online 12 July 2011. Parallel Computing, Sep 2011. doi:10.1016/j.parco.2011.05.005

http://swift-lang.org

1. Introduction

Swift is a scripting language designed for composing application programs into parallel applications that can be executed on multicore processors, clusters, grids, clouds, and supercomputers. Unlike most other scripting languages, Swift focuses on the issues that arise from the concurrent execution, composition, and coordination of many independent (and, typically, distributed) computational tasks. Swift scripts express the execution of programs that consume and produce file-resident datasets. Swift uses a C-like syntax consisting of function definitions and expressions, with dataflow-driven semantics and implicit parallelism. To facilitate the writing of scripts that operate on files, Swift mapping constructs allow file system objects to be accessed via Swift variables. Many parallel applications involve a single message-passing parallel program: a model supported well by the Message Passing Interface (MPI). Others, however, require the coupling or orchestration of large numbers of application invocations: either many invocations of the same program or many invocations of sequences and patterns of several programs. Scaling up requires the distribution of such workloads among cores, processors, computers, or clusters and, hence, the use of parallel or grid computing. Even if a single large parallel cluster suffices, users will not always have access to the same system (e.g., big machines may be congested or temporarily unavailable because of maintenance). Thus, it is desirable to be able to use whatever resources happen to be available or economical at the moment when the user needs to compute, without the need to continually reprogram or adjust execution scripts.


New Book on Programming Models
Editor: Pavan Balaji
Chapter contributions:
• MPI: W. Gropp and R. Thakur
• GASNet: P. Hargrove
• OpenSHMEM: J. Kuehn and S. Poole
• UPC: K. Yelick and Y. Zheng
• GA: S. Krishnamoorthy, J. Daily, A. Vishnu, and B. Palmer
• Chapel: B. Chamberlain
• Charm++: L. Kale, N. Jain, and J. Lifflander
• ADLB: E. Lusk, R. Butler, and S. Pieper
• Scioto: J. Dinan
• Swift: T. Armstrong, J. M. Wozniak, M. Wilde, and I. Foster
• CnC: K. Knobe, M. Burke, and F. Schlimbach
• OpenMP: B. Chapman, D. Eachempati, and S. Chandrasekaran
• Cilk Plus: A. Robison and C. Leiserson
• Intel TBB: A. Kukanov
• CUDA: W. Hwu and D. Kirk
• OpenCL: T. Mason
Pre-order at https://mitpress.mit.edu/models
Discount code: MBALAJI30 (valid till 12/31/2015)

Join the Swift Community: explore, use, engage

http://swift-lang.org

Learning Swift

[This slide reproduces the first page of the Parallel Computing 2011 Swift paper cited above.]

Try Swift from your browser: http://swift-lang.org/tryswift

http://swift-lang.org

Appendix:

Examples of Swift Science Applications (~ Sep 2014)

http://swift-lang.org

Glass Structure Modeling powered by Swift

This project models aspects of glass structure at the level of theoretical chemistry. (Hocky/Reichman)

Recent studies of the glass transition in model systems have focused on calculating, from theory or simulation, what is known as the "mosaic length". This project evaluated a new "cavity method" for measuring this length scale. Correlation functions are calculated at the interior of cavities of varying sizes and averaged over many independent simulations to determine a thermodynamic length. Using Swift on Beagle, Hocky investigated whether this thermodynamic length causes variations among seemingly identical systems. ~1M Beagle CPU hours were used.

Results: Three simple models of glassy behavior were studied. All appear the same (top, abc), but only two of them have particles relaxing at the same rate for the same temperature (top, d). This would imply that the glass structure does not dictate the dynamics. A new computational technique was used to extract a length scale on which the liquid is ordered in an otherwise undetectable way. Results (bottom) showed that using this length we can distinguish the two systems which have the same dynamics as separate from the third, which has faster dynamics than the other two.

Published in Physical Review Letters.

Powder diffraction experiment analysis workflow: making a notable difference to APS users!

[Figure: powder diffraction analysis pipeline. Detector characterization / instrument parameters from LaB6/CeO2 and diffraction data from the sample pass through batch correction, binning/caking, and peak fitting, to learn something about the material (strain/texture) and the error bar associated with the data. Data volumes range from 10s of images, 100s of spectra, 1000s of peaks, and ~10 parameters up to 10000s of images, 100000s of spectra, and 1000000s of peaks.]

§ The background-removal step was extracted into a separate step for the Powder Diffraction beamline (Sector 1)
§ Used 210+ times by 30 users to process 50 TB (90% of PD data at Sector 1) in the past 6 months
§ Enables Sector 1 users to test data quality at beam time, and to leave APS with all their data, ready to analyze

SwiftSeq: Fast parallel annotation of next-generation sequence data powered by Swift

Efficiently and dynamically balances RAM and CPU on 5K-15K cores

Work of Jason Pitt and Kevin White, UChicago IGSB

http://swift-lang.org

Protein structure prediction powered by Swift

The laboratories of Karl Freed and Tobin Sosnick use Beagle to develop and validate methods to predict protein structure using homology-free approaches. [Figure: target T0623, 25 residues, improved from 8.2 Å to 6.3 Å (excluding the tail region).]

In this lab, Aashish Adhikari (now UC Berkeley) has developed new structure prediction techniques based on Monte Carlo simulated annealing, which employ novel, compact molecular representations and innovative "moves" of the protein backbone to achieve accurate prediction with far less computation than previous methods. One of the applications of the method involves rebuilding local regions in protein structures, called "loop modeling", a problem which the group tackled with considerable success in the CASP protein folding tournament (shown at right). [Figure: target T0585, 45 residues, improved from 15.5 Å to 9.1 Å.] They are now testing further algorithmic innovations using the computational power of Beagle.

Results: The group developed a new iterative algorithm for predicting protein structure and folding pathway starting only from the amino acid sequence. [Figure: initial OOPS modeling vs. the native structure. Protein loop modeling, courtesy A. Adhikari.]

Protein-RNA interaction modeling powered by Swift

M. Parisien (with T. Sosnick, T. Pan, and K. Freed) developed a novel algorithm for the prediction of the RNA-protein interactome, on the UChicago Beagle Cray XE6, powered by Swift.

Non-coding RNAs often function in cells through specific interactions with their protein partners. Experiments alone cannot provide a complete picture of the RNA-protein interactome. To complement experimental methods, computational approaches are highly desirable. No existing method, however, can provide genome-wide predictions of docked RNA-protein complexes. The application of computational predictions, together with experimental methods, provides a more complete understanding of cellular networks and the function of RNPs. The approach makes use of a rigid-body docking algorithm and a scoring function custom-tailored for protein-tRNA interactions. Using Swift, Beagle screened ~300 proteins per day on 1920 cores.

Results: the scoring function can identify the native docking conformation in large sets of decoys (100,000) for many known protein-tRNA complexes (4TRA shown here). (left) Scores for true positive complexes (●) (N=28) are compared to true negative ones of low (▼) (N=40) and high (▲) (N=40) isoelectric points. (right) Because the density curve of the true positives, which have pI < 7, has minimal overlap with the curve of the low-pI true negatives (blue area), the scoring function has the specificity to identify tRNA-binding proteins.

Docked complexes: (L) tRNA docked at many positions.

(R) Many conformations in a docking site, testing site robustness.

Systematic prediction and validation of the RNA-protein interactome. Parisien M, Sosnick TR, Pan T. Kyoto, June 12-19, 2011, RNA Society. Protein-RNA interaction, courtesy M. Parisien.

RDCEP: Center for Robust Decision Making on Climate and Energy Policy, powered by Swift

The RDCEP project operates a large-scale integrated modeling framework for decision makers in climate and energy policy, powered by Swift. (Ian Foster, Joshua Elliott, et al.)

UChicago's Midway research cluster is used to study land use, land cover, and the impacts of climate change on agriculture and the global food supply. Using a DSSAT 4.0 ("decision support system for agrotechnology transfer") crop systems model, a parallel simulation framework was implemented using Swift. Simulation campaigns measure, e.g., yield and climate impact for a single crop (maize) across the conterminous USA with daily weather data and climate model output for 120 years (1981-2100) and 16 different configurations of fertilizer, irrigation, and cultivar.

The top two maps show maize yields across the USA with intensive nitrogen application and full irrigation; the bottom two maps show results with no irrigation. Each map is an RDCEP model run of ~120,000 DSSAT invocations.

DSSAT models of corn yield. Courtesy J. Elliott and K. Maheshwari.

Modeling climate impact on hydrology powered by Swift

Projecting biofuel production impact on hydrology (E. Yan)

This project studies the impact of global temperature increase on the Upper Mississippi River Basin's water and plant productivity. It is in the process of combining future climate data obtained from a statistically downscaled global circulation model (GCM) into the Upper Mississippi River Basin model. The results from these models will be used in the proposed study to evaluate the relative performance of the proposed coupling of climate and hydrology models.

Results of this research demonstrate that plausible changes in temperature and precipitation caused by increases in atmospheric greenhouse gas concentrations could have major impacts on both the timing and magnitude of runoff, soil moisture, water quality, water availability, and crop yield (including energy crops) in important agricultural areas. [Figure: visualization of multiple layers of the SWAT hydrology model, courtesy E. Yan.]

Benefit of implicit pervasive parallelism: Analysis & visualization of high-resolution climate models powered by Swift (work of J Dennis, M Woitasek, S Mickelson, R Jacob, M Vertenstein; AMWG Diagnostic Package results shown earlier in this talk)

Swift gratefully acknowledges support from:

U.S. DEPARTMENT OF ENERGY

http://swift-lang.org
