Final Program and Abstracts

This conference is sponsored by the SIAM Activity Group on Supercomputing, and is co-sponsored by Inria and Université Pierre et Marie Curie.

The SIAM Activity Group on Supercomputing provides a forum for computational mathematicians, computer scientists, computer architects, and computational scientists to exchange ideas on mathematical algorithms and computer architecture needed for high-performance computer systems. The activity group promotes the exchange of ideas by focusing on the interplay of analytical methods, numerical analysis, and efficient computation. The activity group organizes a biennial SIAM Conference on Parallel Processing, awards the SIAG/Supercomputing Career Prize, the SIAG/Supercomputing Early Career Prize, and the SIAG/Supercomputing Best Paper Prize, and maintains a member directory and an electronic mailing list.

Society for Industrial and Applied Mathematics
3600 Market Street, 6th Floor
Philadelphia, PA 19104-2688 USA
Telephone: +1-215-382-9800
Fax: +1-215-386-7999
Conference Email: [email protected]
Conference Web: www.siam.org/meetings/pp16
Membership and Customer Service: (800) 447-7426 (US & Canada) or +1-215-382-9800 (worldwide)


Table of Contents

Program-at-a-Glance ...... Separate piece
General Information ...... 2
Get-togethers ...... 4
Invited Plenary Presentations ...... 6
Prizes ...... 8
Program Schedule ...... 9
Poster Session ...... 34
Abstracts ...... 65
Speaker and Organizer Index ...... 149
Campus Map ...... Back Cover

Organizing Committee Co-Chairs

Laura Grigori, Inria and Université Pierre et Marie Curie, France
Rich Vuduc, Georgia Institute of Technology, USA

Organizing Committee

Rosa M. Badia, Barcelona Supercomputing Center, Spain
Costas Bekas, IBM Research - Zurich, Switzerland
Brad Chamberlain, Cray Inc., USA
Ron Dror, Stanford University, USA
Mary Hall, University of Utah, USA
Bruce Hendrickson, Sandia National Laboratories, USA
Naoya Maruyama, RIKEN, Japan
Philippe Ricoux, Total, France
Olaf Schenk, Università della Svizzera italiana (USI), Switzerland
John Shalf, Lawrence Berkeley National Laboratory, USA
Peter Tang, Intel Corporation, USA
Sivan Toledo, Tel Aviv University, Israel
Weichung Wang, National Taiwan University, Taiwan
Barbara Wohlmuth, Technische Universität München, Germany

Local Organizing Committee

Xavier Claeys, Université Pierre et Marie Curie and Inria, France
Pierre Fortin, Université Pierre et Marie Curie, France
Jean-Frederic Gerbeau, Inria and Université Pierre et Marie Curie, France
Chantal Girodon, Inria, France
Frederic Hecht, Université Pierre et Marie Curie and Inria, France
Yvon Maday, Université Pierre et Marie Curie and IUF, France, and DAM, Brown University, USA
Frederic Nataf, CNRS, Université Pierre et Marie Curie and Inria, France

SIAM Registration Desk

Registration and all sessions will be located at Université Pierre et Marie Curie, Cordeliers Campus. The SIAM registration desk is located in the Entrance Hall. It is open during the following hours:

Tuesday, April 12: 10:00 AM - 7:00 PM
Wednesday, April 13: 7:45 AM - 5:00 PM
Thursday, April 14: 7:45 AM - 5:15 PM
Friday, April 15: 8:15 AM - 5:15 PM

Conference Location

All sessions and on-site registration will take place at the Université Pierre et Marie Curie, Cordeliers Campus. The entrance is at 21 rue de l’Ecole de Médecine.

Université Pierre et Marie Curie, Cordeliers Campus
21 rue de l’Ecole de Médecine
75006 Paris

Child Care

Please visit http://paris.angloinfo.com/af/705/baby-sitting-nanny-and-au-pair-services.html for information about baby-sitting services in Paris and Ile-de-France.

Corporate Members and Affiliates

SIAM corporate members provide their employees with knowledge about, access to, and contacts in the applied mathematics and computational sciences community through their membership benefits. Corporate membership is more than just a bundle of tangible products and services; it is an expression of support for SIAM and its programs. SIAM is pleased to acknowledge its corporate members and sponsors. In recognition of their support, non-member attendees who are employed by the following organizations are entitled to the SIAM member registration rate.

Corporate/Institutional Members

The Aerospace Corporation
Air Force Office of Scientific Research
Aramco Services Company
AT&T Laboratories - Research
Bechtel Marine Propulsion Laboratory
The Boeing Company
CEA/DAM
Department of National Defence (DND/CSEC)
DSTO - Defence Science and Technology Organisation
ExxonMobil Upstream Research
Hewlett-Packard
IBM Corporation
IDA Center for Communications Research, La Jolla
IDA Center for Communications Research, Princeton
Institute for Computational and Experimental Research in Mathematics (ICERM)
Institute for Defense Analyses, Center for Computing Sciences
Lawrence Berkeley National Laboratory
Lockheed Martin
Los Alamos National Laboratory
Max-Planck-Institute for Dynamics of Complex Technical Systems
Mentor Graphics
National Institute of Standards and Technology (NIST)
National Security Agency (DIRNSA)
Oak Ridge National Laboratory, managed by UT-Battelle for the Department of Energy
Sandia National Laboratories
Schlumberger-Doll Research
Tech X Corporation
U.S. Army Corps of Engineers, Engineer Research and Development Center
Department of Energy

List current March 2016.

Funding Agencies

SIAM and the Conference Organizing Committee wish to extend their thanks and appreciation to the U.S. National Science Foundation and the DOE Office of Advanced Scientific Computing Research for their support of this conference.

Leading the applied mathematics community
Join SIAM and save!

SIAM members save up to 100 € on full registration for the 2016 SIAM Conference on Parallel Processing for Scientific Computing (PP16)! Join your peers in supporting the premier professional society for applied mathematicians and computational scientists. SIAM members receive subscriptions to SIAM Review, SIAM News, and SIAM Unwrapped, and enjoy substantial discounts on SIAM books, journal subscriptions, and conference registrations.

For a limited time, nonmember registrants to PP16 receive a 25% discount off of their annual membership dues.* Please complete and return the application you received with your registration materials or sign up online at https://my.siam.org using code MBPP16 when prompted.

*This discount is available to new SIAM members only.

Standard Audio/Visual Set-Up in Meeting Rooms

SIAM does not provide computers for any speaker. When giving an electronic presentation, speakers must provide their own computers. SIAM is not responsible for the safety and security of speakers’ computers.

Each meeting room will have one (1) screen and one (1) data projector. The data projectors support VGA connections only. Presenters requiring an alternate connection must provide their own adaptor.

Registration Fee Includes

• Admission to all technical sessions
• Business Meeting (open to SIAG/SC members)
• Coffee breaks daily
• Poster Session
• Room set-ups and audio/visual equipment
• Welcome Reception

Optional Conference Dinner Banquet

A Conference Dinner Banquet will be held on Thursday, April 14, from 7:30 PM - 10:30 PM at Salons Hoche (http://www.salons-hoche.fr/), 9 Avenue Hoche, 75008, Paris. A separate fee of 82.50 Euros applies. Dinner includes appetizer, main dish, dessert, and coffee.

Salons Hoche is 1/2 mile from Metro Charles de Gaulle Etoile (Arc de Triomphe). From Les Cordeliers: Metro line 4 from station Odeon (Direction Porte de Clignancourt) to station Chatelet, then change to line 1 (Direction La Defense) to station Charles de Gaulle-Etoile, and then a 10-minute walk on Avenue Hoche. ETA: 35 minutes.

Internet Access

Wireless internet details will be distributed to participants onsite. Complimentary wireless Internet access will be available in the meeting rooms. Visit the registration desk for more information.

Job Postings

Visit http://jobs.siam.org.

Conference Sponsors

This conference is sponsored by the SIAM Activity Group on Supercomputing and co-sponsored by Inria and Université Pierre et Marie Curie.

Statement on Inclusiveness

As a professional society, SIAM is committed to providing an inclusive climate that encourages the open expression and exchange of ideas, that is free from all forms of discrimination, harassment, and retaliation, and that is welcoming and comfortable to all members and to those who participate in its activities. In pursuit of that commitment, SIAM is dedicated to the philosophy of equality of opportunity and treatment for all participants regardless of gender, gender identity or expression, sexual orientation, race, color, national or ethnic origin, religion or religious belief, age, marital status, disabilities, veteran status, field of expertise, or any other reason not related to scientific merit. This philosophy extends from SIAM conferences, to its publications, and to its governing structures and bodies. We expect all members of SIAM and participants in SIAM activities to work towards this commitment.

Important Notice to Poster Presenters

The poster session is scheduled for Wednesday, April 13, 7:15 PM - 9:15 PM. Poster presenters are requested to set up their poster material on the provided 157 cm x 115 cm poster boards in the Exhibition Space. All materials must be posted by Wednesday, April 13 at 7:15 PM, the official start time of the session. Poster displays must be removed by the published time. Posters remaining after this time will be discarded. SIAM is not responsible for discarded posters.

SIAM Books and Journals

Display copies of books and complimentary copies of journals are available on site. SIAM books are available at a discounted price during the conference. If a SIAM books representative is not available, orders may be placed according to the instructions on the Titles on Display form.

Social Media

SIAM is promoting the use of social media, such as Facebook and Twitter, in order to enhance scientific discussion at its meetings and enable attendees to connect with each other prior to, during and after conferences. If you are tweeting about a conference, please use the designated hashtag to enable other attendees to keep up with the Twitter conversation and to allow better archiving of our conference discussions. The hashtag for this meeting is #SIAMPP16.

Comments?

Comments about SIAM meetings are encouraged! Please send to:
Cynthia Phillips, SIAM Vice President for Programs ([email protected]).

Please Note

The local organizers are not responsible for the safety and security of attendees’ computers. Do not leave your laptop computers unattended. Please remember to turn off your cell phones, pagers, etc. during sessions.

Get-togethers

Welcome Reception
Tuesday, April 12, 2016
7:00 PM - 9:00 PM
Room: Exhibition Space

Poster Session
Wednesday, April 13, 2016
7:15 PM - 9:15 PM

Business Meeting (open to SIAG/SC members)
Wednesday, April 13, 2016
6:30 PM - 7:15 PM

Dinner Banquet (separate fees apply)
Thursday, April 14, 2016
7:30 PM - 10:30 PM
Salons Hoche (http://www.salons-hoche.fr/)

Recording of Presentations

Audio and video recording of presentations at SIAM meetings is prohibited without the written permission of the presenter and SIAM.

SIAM Activity Group on Supercomputing (SIAG/SC)

www.siam.org/activity/supercomputing

A GREAT WAY TO GET INVOLVED!
Collaborate and interact with mathematicians and applied scientists whose work involves mathematical algorithms and computer architecture.

ACTIVITIES INCLUDE:
• Special sessions at SIAM Annual Meetings
• Biennial conference
• SIAG/SC Career Prize
• SIAG/SC Early Career Prize
• SIAG/SC Best Paper Prize

BENEFITS OF SIAG/SC MEMBERSHIP:
• Listing in the SIAG’s online-only membership directory
• Electronic communications about recent developments in your specialty
• Eligibility for candidacy for SIAG/SC office
• Participation in the selection of SIAG/SC officers

ELIGIBILITY:
• Be a current SIAM member.

COST:
• $10 per year
• Student members can join two activity groups for free!

2016-17 SIAG/SC OFFICERS
• Chair: Laura Grigori, Inria
• Vice Chair: Richard Vuduc, Georgia Institute of Technology
• Program Director: Olaf Schenk, Università della Svizzera italiana
• Secretary: Mark Hoemmen, Sandia National Laboratories

TO JOIN:
SIAG/SC: my.siam.org/forms/join_siag.htm
SIAM: www.siam.org/joinsiam

Invited Plenary Speakers

All Invited Plenary Presentations will take place in Farabeuf.

Tuesday, April 12

5:15 PM - 6:00 PM
IP1 Scalability of Sparse Direct Codes
Iain Duff, Science & Technology Facilities Council, United Kingdom and CERFACS, Toulouse, France

Wednesday, April 13

8:15 AM - 9:00 AM
IP2 Towards Solving Coupled Flow Problems at the Extreme Scale
Ulrich J. Ruede, University of Erlangen-Nuremberg, Germany

1:45 PM - 2:30 PM
IP3 Why Future Systems should be Data Centric
Paul Coteus, IBM Research, USA


Thursday, April 14

8:15 AM - 9:00 AM
IP4 Next-Generation AMR
Ann S. Almgren, Lawrence Berkeley National Laboratory, USA

1:45 PM - 2:30 PM
IP5 Beating Heart Simulation Based on a Stochastic Biomolecular Model
Takumi Washio, University of Tokyo, Japan

Friday, April 15

8:45 AM - 9:30 AM
IP6 Conquering Big Data with Spark
Ion Stoica, University of California, Berkeley, USA

1:45 PM - 2:30 PM
IP7 A Systems View of High Performance Computing
Simon Kahan, Institute for Systems Biology and University of Washington, USA

SIAG Supercomputing Prizes

Thursday, April 14

9:25 AM - 9:50 AM
SP1 SIAG/Supercomputing Career Prize: Supercomputers and Superintelligence
Horst D. Simon, Lawrence Berkeley National Laboratory, USA

9:55 AM - 10:10 AM
SIAG/Supercomputing Early Career Prize
Tobin Isaac, University of Chicago, USA

Friday, April 15

SP2 SIAG/Supercomputing Best Paper Prize: Communication-Optimal Parallel and Sequential QR and LU Factorizations

James W. Demmel, University of California, Berkeley, USA
Laura Grigori, Inria, France
Mark Hoemmen, Sandia National Laboratories, USA
Julien Langou, University of Colorado, Denver, USA

PP16 Program

Tuesday, April 12

Registration
10:00 AM-7:00 PM
Room: Entrance Hall

MS1 Parallel Programming Models, Algorithms and Frameworks for Extreme Computing - Part I of III
1:10 PM-2:50 PM
Room: Farabeuf
For Part 2 see MS9

Multicore/manycore processors and accelerators are universally available, both as collections of homogeneous standard microprocessors and as attached heterogeneous co-processors. Application and library software developers can often use these processors effectively, and some general approaches have emerged. It is widely recognized that careful design of software and data structures, together with effective memory management, is the most critical issue for obtaining scalable, optimized performance on those systems. In these minisymposia we discuss current experiences and development of applications, libraries and frameworks using a variety of hardware. Speakers will address performance results and software design.

Organizer: Kengo Nakajima, University of Tokyo, Japan
Organizer: Michael Heroux, Sandia National Laboratories, USA
Organizer: Serge G. Petiton, Université Lille 1 and CNRS, France

1:10-1:30 Opportunities and Challenges in Developing and Using Scientific Libraries on Emerging Architectures
Michael Heroux, Sandia National Laboratories, USA
1:35-1:55 Linear Algebra for Data Science Through Examples
Nahid Emad, University of Versailles, France
2:00-2:20 Coarse Grid Aggregation for SA-AMG Method with Multiple Kernel Vectors
Akihiro Fujii, Naoya Nomura, and Teruo Tanaka, Kogakuin University, Japan; Kengo Nakajima, University of Tokyo, Japan; Osni A. Marques, Lawrence Berkeley National Laboratory, USA
2:25-2:45 Evaluation of Energy Profile for Krylov Iterative Methods on Cluster of Accelerators
Langshi Chen, CNRS, France; Serge G. Petiton, Université Lille 1 and CNRS, France


Tuesday, April 12

MS2 Improving Performance, Throughput, and Efficiency of HPC Centers through Full System Data Analytics - Part I of II
1:10 PM-2:50 PM
Room: Pasquier
For Part 2 see MS10

As the complexity of HPC centers increases, understanding how to optimize full system performance will require big data analytics. HPC centers comprise parallel supercomputing systems, application codes, and facility systems such as power and cooling, and applications compete for shared compute, storage, networking, and power resources. Techniques to monitor, manage, and analyze these resources to optimize HPC center efficiency are in their infancy. This minisymposium will bring together experts on: 1) measurement and monitoring; 2) data integration and storage; and 3) visualization and analytics, to develop parallel analysis techniques for optimizing the performance, throughput, and efficiency of HPC centers. LLNL-ABS-677080.

Organizer: Abhinav Bhatele, Lawrence Livermore National Laboratory, USA
Organizer: Todd Gamblin, Lawrence Livermore National Laboratory, USA
Organizer: James Brandt, Sandia National Laboratories, USA
Organizer: Ann Gentile, Sandia National Laboratories, USA

1:10-1:30 Correlating Facilities, System and Application Data Via Scalable Analytics and Visualization
Abhinav Bhatele and Todd Gamblin, Lawrence Livermore National Laboratory, USA; Alfredo Gimenez and Yin Yee Ng, University of California, Davis, USA
1:35-1:55 A Monitoring Tool for HPC Operations Support
R. Todd Evans, James C. Browne, John West, and Bill Barth, University of Texas at Austin, USA
2:00-2:20 Developing a Holistic Understanding of I/O Workloads on Future Architectures
Glenn Lockwood and Nicholas Wright, Lawrence Berkeley National Laboratory, USA
2:25-2:45 MuMMI: A Modeling Infrastructure for Exploring Power and Execution Time Tradeoffs in Parallel Applications
Valerie Taylor and Xingfu Wu, Texas A&M University, USA

Tuesday, April 12

MS3 Combinatorial Scientific Computing - Part I of III
1:10 PM-2:50 PM
Room: Roussy
For Part 2 see MS11

Combinatorial algorithms and tools are used in enabling parallel scientific computing applications. The general approach is to identify performance issues in an application and to design, analyze, and implement combinatorial algorithms to tackle the identified issues. The proposed minisymposium gathers 12 talks, covering applications in bioinformatics, solvers of linear systems, and data analysis, and graph algorithms for those applications. The objective is to summarize the latest combinatorial algorithmic developments and the needs of the applications. The goal is to cross-fertilize both domains: the applications will raise new challenges to the combinatorial algorithms, and the combinatorial algorithms will address some of the existing problems of the applications.

Organizer: Aydin Buluc, Lawrence Berkeley National Laboratory, USA
Organizer: Alex Pothen, Purdue University, USA

1:10-1:30 Parallel Algorithms for Automatic Differentiation
Alex Pothen and Mu Wang, Purdue University, USA
1:35-1:55 Parallel Combinatorial Algorithms in Sparse Matrix Computation?
Mathias Jacquelin and Esmond G. Ng, Lawrence Berkeley National Laboratory, USA; Barry Peyton, Dalton State College, USA; Yili Zheng and Kathy Yelick, Lawrence Berkeley National Laboratory, USA
2:00-2:20 Parallel Graph Matching Algorithms Using Matrix Algebra
Ariful Azad and Aydin Buluc, Lawrence Berkeley National Laboratory, USA
2:25-2:45 Randomized Linear Algebra in Large-Scale Computational Environments
Michael Mahoney, Alex Gittens, and Farbod Roosta-Khorasani, University of California, Berkeley, USA

Tuesday, April 12

MS4 Minimizing Communication in Numerical Algorithms - Part I of II
1:10 PM-2:50 PM
Room: Salle des theses
For Part 2 see MS12

The growing gap between the costs of computation and communication, in terms of both time and energy, necessitates the development of new algorithms which minimize data movement, both across the network and within the memory hierarchy, in order to make efficient use of today’s and future hardware. This minisymposium discusses recent progress both in the practice of designing and implementing dense and sparse linear algebra algorithms and in the theory of analyzing lower bounds on their communication costs.

Organizer: Oded Schwartz, Hebrew University of Jerusalem, Israel
Organizer: Erin C. Carson, New York University, USA

1:10-1:30 Communication-Avoiding Krylov Subspace Methods in Theory and Practice
Erin C. Carson, New York University, USA
1:35-1:55 Enlarged GMRES for Reducing Communication When Solving Reservoir Simulation Problems
Hussam Al Daas, UPMC-Inria-TOTAL, France; Laura Grigori, Inria, France; Pascal Henon, Total E&P, France; Philippe Ricoux, TOTAL SA, France
2:00-2:20 CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems
Yang You, University of California, Berkeley, USA
2:25-2:45 Outline of a New Roadmap to Permissive Communication and Applications That Can Benefit
James A. Edwards and Uzi Vishkin, University of Maryland, USA

Tuesday, April 12

MS5 Parallel-in-Time Integration Methods - Part I of III
1:10 PM-2:50 PM
Room: Marie Curie
For Part 2 see MS13

With the expected further increase in available computing power, new parallelisation strategies have to be developed to tackle the scalability challenges arising from the trend towards massively parallel architectures. In association with upcoming Exascale computing, simulations with real-time requirements face particular challenges in HPC. These challenges have led to a growing interest in parallel-in-time methods over the last decade. Adding temporal parallelisation has been shown to improve or extend the scalability of programs relying on traditional parallelisation-in-space methods. We invite speakers to present their current research progress related to parallel-in-time methods from all areas.

Organizer: Martin Schreiber, University of Exeter, United Kingdom
Organizer: Tobias Neckel, Technische Universität München, Germany
Organizer: Daniel Ruprecht, University of Leeds, United Kingdom

1:10-1:30 Time Parallel and Space Time Solution Methods - From Heat Equation to Fluid Flow
Rolf Krause, Università della Svizzera Italiana, Italy; Pietro Benedusi, USI, Switzerland; Peter Arbenz and Daniel Hupp, ETH Zürich, Switzerland
1:35-1:55 Time Parallelizing Edge Plasma Simulations Using the Parareal Algorithm
Debasmita Samaddar, UK Atomic Energy Authority, United Kingdom; David Coster, Max-Planck-Institut für Plasmaphysik, Germany; Xavier Bonnin, ITER Organization, France; Eva Havlickova, UK Atomic Energy Authority, United Kingdom; Wael R. Elwasif, Donald B. Batchelor, and Lee A. Berry, Oak Ridge National Laboratory, USA; Christoph Bergmeister, University of Innsbruck, Austria
2:00-2:20 Micro-Macro Parareal Method for Stochastic Slow-Fast Systems with Applications in Molecular Dynamics
Giovanni Samaey and Keith Myerscough, KU Leuven, Belgium; Frederic Legoll, Ecole Nationale des Ponts et Chaussées, France; Tony Lelievre, École des Ponts ParisTech, France
2:25-2:45 A Parallel-in-Time Solver for Time-Periodic Navier-Stokes Problems
Daniel Hupp and Peter Arbenz, ETH Zürich, Switzerland; Dominik Obrist, University of Bern, Switzerland
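A note for readers new to the topic: the Parareal algorithm named in several of the MS5 talks combines a cheap coarse propagator G with an accurate fine propagator F so that all fine solves within one iteration become independent. The following minimal Python sketch is an illustrative toy example only, not code from any of the speakers:

    import math

    def parareal(y0, t, G, F, n_iter):
        # Parareal iteration for y' = f(y) on time slices t[0..N].
        # G(y, t0, t1): cheap coarse propagator; F(y, t0, t1): accurate
        # fine propagator. In a real implementation, the F calls within
        # one iteration are independent and run concurrently, one per slice.
        N = len(t) - 1
        y = [y0]
        for n in range(N):                      # initial guess: coarse sweep
            y.append(G(y[n], t[n], t[n + 1]))
        for _ in range(n_iter):
            fine = [F(y[n], t[n], t[n + 1]) for n in range(N)]  # parallel part
            y_new = [y0]
            for n in range(N):                  # sequential coarse correction
                y_new.append(G(y_new[n], t[n], t[n + 1])
                             + fine[n] - G(y[n], t[n], t[n + 1]))
            y = y_new
        return y

    # Toy problem dy/dt = -y: coarse = 1 Euler step, fine = 100 Euler steps.
    def euler(y, t0, t1, steps):
        h = (t1 - t0) / steps
        for _ in range(steps):
            y -= h * y
        return y

    t = [0.25 * i for i in range(9)]            # 8 time slices on [0, 2]
    approx = parareal(1.0, t, lambda y, a, b: euler(y, a, b, 1),
                      lambda y, a, b: euler(y, a, b, 100), n_iter=3)
    print(approx[-1], math.exp(-2.0))           # Parareal result vs. exact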


Tuesday, April 12

MS6 The Strong Scaling Limit: From Industrial Applications to New Algorithms - Part I of III
1:10 PM-2:50 PM
Room: Leroux
For Part 2 see MS14

There is a strong interest in running parallel applications in the strong scaling limit, where, however, communication costs increase while compute costs decrease. In this minisymposium, we analyse how various applications behave in this limit and how novel communication hiding/avoiding algorithms can help. We start with industrial and scientific applications and discuss both the software and algorithmic challenges. This minisymposium reports work of the Exa2ct project, an EU-supported project.

Organizer: Wim I. Vanroose, University of Antwerp, Belgium

1:10-1:30 Toward Large-Eddy Simulation of Complex Burners with Exascale Super-Computers: A Few Challenges and Solutions
Vincent R. Moureau, CORIA, France; Pierre Bénard, Université de Rouen, France
1:35-1:55 Extraction of the Compute and Communications Patterns from an Industrial Application
Thomas Guillet, Intel Corporation, France
2:00-2:20 A Tiny Step Toward a Unified Task Scheduler for Large Scale Heterogeneous Architectures
Clement Fontenaille, PRiSM - UVSQ, France
2:25-2:45 Parallel GMRES Using a Variable S-Step Krylov Basis
David Imberti and Jocelyne Erhel, Inria-Rennes, France

Tuesday, April 12

MS7 Task-based Scientific, High Performance Computing on Top of Runtime Systems - Part I of II
1:10 PM-2:50 PM
Room: Dejerine
For Part 2 see MS15

The extreme complexity of hardware platforms makes them harder and harder to program. To fully exploit such machines, the High Performance Computing community often uses an MPI + X (X being pthreads, OpenMP, CUDA, ...) programming model. In this minisymposium, we overview an alternative solution consisting of programming at a higher level of abstraction, describing a scientific, high performance computing application as a sequence of tasks whose execution is delegated to a runtime system. We discuss the potential of numerical libraries that have followed that design as well as recent advances made by fully-featured runtime systems to support them.

Organizer: Emmanuel Agullo, Inria, France
Organizer: Alfredo Buttari, CNRS, France
Organizer: Abdou Guermouche, LaBRI, France

1:10-1:30 Programming Linear Algebra with Python: A Task-Based Approach
Rosa M. Badia, Barcelona Supercomputing Center, Spain
1:35-1:55 Parsec: A Distributed Runtime System for Task-Based Heterogeneous Computing
Aurelien Bouteiller and George Bosilca, University of Tennessee, Knoxville, USA
2:00-2:20 Optimizing Numerical Simulations of Elastodynamic Wave Propagation Thanks to Task-Based Parallel Programming
Lionel Boillot, Inria Bordeaux Sud-Ouest, France; Corentin Rossignon, Total, France; George Bosilca, University of Tennessee, Knoxville, USA; Emmanuel Agullo, Inria, France; Henri Calandra, Total, France
2:25-2:45 Controlling the Memory Subscription of Applications with a Task-Based Runtime System
Marc Sergent, Inria, France; David Goudin, CEA/CESTA, France; Samuel Thibault, University of Bordeaux, France; Olivier Aumage, Inria, France
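As a concrete, if drastically simplified, picture of the task-based style discussed in MS7: the sketch below uses Python's standard-library thread pool as a stand-in "runtime system", with dependencies carried by futures. It is a toy illustration only, not the API of StarPU, PaRSEC, OmpSs, or any library presented in this session:

    from concurrent.futures import ThreadPoolExecutor

    # The application is expressed as tasks; the "runtime" (here a thread
    # pool) decides when and where each task runs. Dependencies are
    # expressed by passing a producer's future into its consumers.
    def factorize(block):          # stand-in for a numerical kernel
        return block * 0.5

    def update(fut_a, fut_b):      # waits on its inputs, then computes
        return fut_a.result() + fut_b.result()

    with ThreadPoolExecutor(max_workers=4) as runtime:
        a = runtime.submit(factorize, 2.0)   # independent tasks may run
        b = runtime.submit(factorize, 4.0)   # concurrently
        c = runtime.submit(update, a, b)     # runs once a and b complete
        print(c.result())                    # 3.0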


Tuesday, April 12

MS8 Extreme Scale Solvers for Coupled Systems
1:10 PM-2:50 PM
Room: Delarue

Exascale computers will exhibit billion-way parallelism. Computing at such an extreme scale needs methods that scale perfectly and have optimal complexity. This minisymposium combines talks on crucial aspects of extreme scale solving. The solver must be of optimal complexity, a requirement that becomes more and more severe with increasing problem size, and must scale efficiently to extreme levels of parallelism. To that end, the minisymposium brings together talks on parallel adaptive multigrid methods in space and time, as well as on optimization and uncertainty quantification techniques. Reducing power consumption will also be a topic of the minisymposium.

Organizer: Gabriel Wittum, University of Frankfurt, Germany

1:10-1:30 Uncertainty Quantification Using Tensor Methods and Large Scale HPC
Lars Grasedyck and Christian Löbbert, RWTH Aachen University, Germany
1:35-1:55 Parallel Methods in Space and Time and Their Application in Computational Medicine
Rolf Krause, Università della Svizzera Italiana, Italy; Andreas Kreienbühl, USI, Switzerland; Dorian Krause, University of Lugano, Switzerland; Gabriel Wittum, University of Frankfurt, Germany
2:00-2:20 Scalable Shape Optimization Methods for Structured Inverse Modeling Using Large Scale HPC
Volker H. Schulz and Martin Siebenborn, University of Trier, Germany
2:25-2:45 Parallel Adaptive and Robust Multigrid
Gabriel Wittum, University of Frankfurt, Germany

Tuesday, April 12

CP1 Advances in Material Science
1:10 PM-2:10 PM
Room: Danton
Chair: Masha Sosonkina, Old Dominion University, USA

1:10-1:25 Parallel Recursive Density Matrix Expansion in Electronic Structure Calculations
Anastasia Kruchinina, Emanuel H. Rubensson, and Elias Rudberg, Uppsala University, Sweden
1:30-1:45 Comparison of Optimization Techniques in Parallel Ab Initio Nuclear Structure Calculations
Masha Sosonkina, Old Dominion University, USA
1:50-2:05 Scale-Bridging Modeling of Material Dynamics: Petascale Assessments on the Road to Exascale
Timothy C. Germann, Los Alamos National Laboratory, USA

Coffee Break
2:50 PM-3:20 PM
Room: Exhibition Space

Tuesday, April 12

MS9 Parallel Programming Models, Algorithms and Frameworks for Extreme Computing - Part II of III
3:20 PM-5:00 PM
Room: Farabeuf
For Part 1 see MS1
For Part 3 see MS17

Multicore/manycore processors and accelerators are universally available, both as collections of homogeneous standard microprocessors and as attached heterogeneous co-processors. Application and library software developers can often use these processors effectively, and some general approaches have emerged. It is widely recognized that careful design of software and data structures, together with effective memory management, is the most critical issue for obtaining scalable, optimized performance on those systems. In these minisymposia we discuss current experiences and development of applications, libraries and frameworks using a variety of hardware. Speakers will address performance results and software design.

Organizer: Kengo Nakajima, University of Tokyo, Japan
Organizer: Michael Heroux, Sandia National Laboratories, USA
Organizer: Serge G. Petiton, Université Lille 1 and CNRS, France

3:20-3:40 Thread-Scalable FEM
Kengo Nakajima, University of Tokyo, Japan
3:45-4:05 Iterative ILU Preconditioning
Hartwig Anzt, University of Tennessee, USA; Edmond Chow, Georgia Institute of Technology, USA
4:10-4:30 ICB: IC Preconditioning with a Fill-in Strategy Considering SIMD Instructions
Takeshi Iwashita, Hokkaido University, Japan; Naokazu Takemura, Akihiro Ida, and Hiroshi Nakashima, Kyoto University, Japan
4:35-4:55 Kokkos: Manycore Programmability and Performance Portability
H. Carter Edwards and Christian Trott, Sandia National Laboratories, USA

Tuesday, April 12

MS10 Improving Performance, Throughput, and Efficiency of HPC Centers through Full System Data Analytics - Part II of II
3:20 PM-5:00 PM
Room: Pasquier
For Part 1 see MS2

As the complexity of HPC centers increases, understanding how to optimize full system performance will require big data analytics. HPC centers comprise parallel supercomputing systems, application codes, and facility systems such as power and cooling, and applications compete for shared compute, storage, networking, and power resources. Techniques to monitor, manage, and analyze these resources to optimize HPC center efficiency are in their infancy. This minisymposium will bring together experts on: 1) measurement and monitoring; 2) data integration and storage; and 3) visualization and analytics, to develop parallel analysis techniques for optimizing the performance, throughput, and efficiency of HPC centers. LLNL-ABS-677080.

Organizer: Abhinav Bhatele, Lawrence Livermore National Laboratory, USA
Organizer: Todd Gamblin, Lawrence Livermore National Laboratory, USA
Organizer: James Brandt, Sandia National Laboratories, USA
Organizer: Ann Gentile, Sandia National Laboratories, USA

3:20-3:40 Smart HPC Centers: Data, Analysis, Feedback, and Response
James Brandt and Ann Gentile, Sandia National Laboratories, USA; Cindy Martin, Los Alamos National Laboratory, USA; Benjamin Allan and Karen D. Devine, Sandia National Laboratories, USA
3:45-4:05 Low-Overhead Monitoring of Parallel Applications
John Mellor-Crummey and Mark Krentel, Rice University, USA
4:10-4:30 Data Management and Analysis for an Energy Efficient HPC Center
Ghaleb Abdulla, Anna Maria Bailey, and John Weaver, Lawrence Livermore National Laboratory, USA
4:35-4:55 Discovering Opportunities for Data Analytics Through Visualization of Large System Monitoring Datasets
Michael Showerman, University of Illinois at Urbana-Champaign, USA

Tuesday, April 12

MS11 Combinatorial Scientific Computing - Part II of III
3:20 PM-5:00 PM
Room: Roussy
For Part 1 see MS3
For Part 3 see MS19

Combinatorial algorithms and tools are used in enabling parallel scientific computing applications. The general approach is to identify performance issues in an application and to design, analyze, and implement combinatorial algorithms to tackle the identified issues. The proposed minisymposium gathers 12 talks, covering applications in bioinformatics, solvers of linear systems, and data analysis, and graph algorithms for those applications. The objective is to summarize the latest combinatorial algorithmic developments and the needs of the applications. The goal is to cross-fertilize both domains: the applications will raise new challenges to the combinatorial algorithms, and the combinatorial algorithms will address some of the existing problems of the applications.

Organizer: Aydin Buluc, Lawrence Berkeley National Laboratory, USA
Organizer: Alex Pothen, Purdue University, USA

3:20-3:40 A Partitioning Problem for Load Balancing and Reducing Communication from the Field of Quantum Chemistry
Edmond Chow, Georgia Institute of Technology, USA
3:45-4:05 Community Detection on GPU
Md Naim and Fredrik Manne, University of Bergen, Norway
4:10-4:30 Scalable Parallel Algorithms for De Novo Assembly of Complex Genomes
Evangelos Georganas, University of California, Berkeley, USA
4:35-4:55 Talk Title To Be Announced
Aydin Buluc, Lawrence Berkeley National Laboratory, USA


Tuesday, April 12

MS12 Minimizing Communication in Numerical Algorithms - Part II of II
3:20 PM-5:00 PM
Room: Marie Curie
For Part 1 see MS4

The growing gap between the costs of computation and communication, in terms of both time and energy, necessitates the development of new algorithms which minimize data movement, both across the network and within the memory hierarchy, in order to make efficient use of today’s and future hardware. This minisymposium discusses recent progress both in the practice of designing and implementing dense and sparse linear algebra algorithms and in the theory of analyzing lower bounds on their communication costs.

Organizer: Oded Schwartz, Hebrew University of Jerusalem, Israel
Organizer: Erin C. Carson, New York University, USA

3:20-3:40 Communication-Efficient Evaluation of Matrix Polynomials
Sivan A. Toledo, Tel Aviv University, Israel
3:45-4:05 Communication-Optimal Loop Nests
Nicholas Knight, New York University, USA
4:10-4:30 Write-Avoiding Algorithms
Harsha Vardhan Simhadri, Lawrence Berkeley National Laboratory, USA
4:35-4:55 Lower Bound Techniques for Communication in the Memory Hierarchy
Gianfranco Bilardi, University of Padova, Italy

Tuesday, April 12

MS13 Parallel-in-Time Integration Methods - Part II of III
3:20 PM-5:00 PM
Room: Salle des theses
For Part 1 see MS5
For Part 3 see MS21

With the expected further increase in available computing power, new parallelisation strategies have to be developed to tackle the scalability challenges arising from the trend towards massively parallel architectures. In association with upcoming Exascale computing, simulations with real-time requirements face particular challenges in HPC. These challenges have led to a growing interest in parallel-in-time methods over the last decade. Adding temporal parallelisation has been shown to improve or extend the scalability of programs relying on traditional parallelisation-in-space methods. We invite speakers to present their current research progress related to parallel-in-time methods from all areas.

Organizer: Martin Schreiber, University of Exeter, United Kingdom
Organizer: Tobias Neckel, Technische Universität München, Germany
Organizer: Daniel Ruprecht, University of Leeds, United Kingdom

3:20-3:40 Implementing Parareal - OpenMP or MPI?
Daniel Ruprecht, University of Leeds, United Kingdom
3:45-4:05 Fault-Tolerance of the Parallel-in-Time Integration Method PFASST
Robert Speck and Torbjörn Klatt, Jülich Supercomputing Centre, Germany; Daniel Ruprecht, University of Leeds, United Kingdom
4:10-4:30 Parallel-in-Time Coupling of Models and Numerical Methods
Franz Chouly, Université de Franche-Comté & CNRS, France; Matteo Astorino, École Polytechnique Fédérale de Lausanne, Switzerland; Alexei Lozinski, CNRS, France; Alfio Quarteroni, École Polytechnique Fédérale de Lausanne, Switzerland
4:35-4:55 Parareal in Time and Fixed-Point Schemes
Olga Mula, CEREMADE, Université Paris 9 Dauphine, France


Tuesday, April 12

MS14 The Strong Scaling Limit: From Industrial Applications to New Algorithms - Part II of III
3:20 PM-5:00 PM
Room: Leroux
For Part 1 see MS6
For Part 3 see MS22

There is a strong interest in running parallel applications in the strong scaling limit, where, however, communication costs increase while compute costs decrease. In this minisymposium, we analyse how various applications behave in this limit and how novel communication hiding/avoiding algorithms can help. We start with industrial and scientific applications and discuss both the software and algorithmic challenges. This minisymposium reports work of the Exa2ct project, an EU-supported project.

Organizer: Wim I. Vanroose, University of Antwerp, Belgium

3:20-3:40 Rounding Errors in Pipelined Krylov Solvers
Siegfried Cools, University of Antwerp, Belgium
3:45-4:05 Combining Software Pipelining with Numerical Pipelining in the Conjugate Gradient Algorithm
Emmanuel Agullo, Luc Giraud, and Stojce Nakov, Inria, France
4:10-4:30 Notification Driven Collectives in GASPI
Christian Simmendinger, T-Systems International, Germany
4:35-4:55 ESPRESO: ExaScale PaRallel FETI Solver
Lubomir Riha, Tomas Brzobohaty, and Alexandros Markopoulos, VSB-Technical University Ostrava, Czech Republic

Tuesday, April 12

MS15 Task-based Scientific, High Performance Computing on Top of Runtime Systems - Part II of II
3:20 PM-5:00 PM
Room: Dejerine
For Part 1 see MS7

The extreme complexity of hardware platforms makes them harder and harder to program. To fully exploit such machines, the High Performance Computing community often uses an MPI + X (X being pthreads, OpenMP, CUDA, ...) programming model. In this minisymposium, we overview an alternative solution consisting of programming at a higher level of abstraction, describing a scientific, high performance computing application as a sequence of tasks whose execution is delegated to a runtime system. We discuss the potential of numerical libraries that have followed that design as well as recent advances made by fully-featured runtime systems to support them.

Organizer: Emmanuel Agullo, Inria, France
Organizer: Alfredo Buttari, CNRS, France
Organizer: Abdou Guermouche, LaBRI, France

3:20-3:40 Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures
Florent Lopez, Universite Paul Sabatier, France
3:45-4:05 An Effective Methodology for Reproducible Research on Dynamic Task-Based Runtime Systems
Arnaud Legrand, Inria Grenoble, France; Luka Stanisic, Inria Grenoble Rhône-Alpes, France
4:10-4:30 High Performance Task-Based QDWH-eig on Manycore Systems
Dalal Sukkari, Hatem Ltaief, and David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia
4:35-4:55 Exploiting Two-Level Parallelism by Aggregating Computing Resources in Task-Based Applications Over Accelerator-Based Machines
Terry Cojean, Inria, France


Tuesday, April 12

MS16 Numerical Reproducibility
3:20 PM-5:00 PM
Room: Danton

Numerical reproducibility is the property of getting a bitwise identical floating-point result from multiple runs of the same code. Floating-point operations are not associative, which hinders numerical reproducibility, especially on large-scale heterogeneous platforms. This property of floating-point arithmetic causes validation and debugging issues and may even lead to deadlock. These problems occur especially in parallel implementations of linear algebra libraries that exchange floating-point results. This minisymposium will address some of those problems, and speakers will present both theoretical and practical solutions.

Organizer: David Defour, University of Perpignan, France
Organizer: Stef Graillat, University Pierre and Marie Curie (UPMC), France

3:20-3:40 Reproducible and Accurate Algorithms for Numerical Linear Algebra
Roman Iakymchuk, KTH Royal Institute of Technology, Sweden; David Defour, University of Perpignan, France; Sylvain Collange, Inria Rennes, France; Stef Graillat, University Pierre and Marie Curie (UPMC), France
3:45-4:05 Summation and Dot Products with Reproducible Results
Siegfried M. Rump, Hamburg University of Technology, Germany
4:10-4:30 Bit-Reproducible Climate and Weather Simulations: From Theory to Practice
Andrea Arteaga, ETH Zürich, Switzerland; Christophe Charpilloz and Olive Fuhrer, MeteoSwiss, Switzerland; Torsten Hoefler, ETH Zürich, Switzerland
4:35-4:55 Parallel Interval Algorithms, Numerical Guarantees and Numerical Reproducibility
Nathalie Revol, Inria - LIP, Ecole Normale Superieure de Lyon, France

Tuesday, April 12

CP2 Eigensolvers for High-Performance Computing
3:20 PM-5:00 PM
Room: Delarue
Chair: Aaron Melman, Santa Clara University, USA

3:20-3:35 Parallel Computation of Recursive Bounds for Eigenvalues of Matrix Polynomials
Aaron Melman, Santa Clara University, USA
3:40-3:55 Exploring OpenMP Task Priorities on the MR3 Eigensolver
Jan Winkelmann and Paolo Bientinesi, RWTH Aachen University, Germany
4:00-4:15 Increasing the Arithmetic Intensity of a Sparse Eigensolver for Better Performance on Heterogeneous Clusters
Jonas Thies, German Aerospace Center (DLR), Germany
4:20-4:35 FEAST Eigenvalue Solver Using Three Levels of MPI Parallelism
James Kestyn and Eric Polizzi, University of Massachusetts, Amherst, USA; Peter Tang, Intel Corporation, USA
4:40-4:55 Highly Scalable Selective Inversion Using Sparse and Dense Matrix Algebra
Fabio Verbosio and Drosos Kourounis, Università della Svizzera italiana, Switzerland; Olaf Schenk, Università della Svizzera Italiana, Italy

Welcome Remarks
5:05 PM-5:15 PM
Room: Farabeuf
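The floating-point non-associativity that motivates MS16 is easy to demonstrate; a minimal Python illustration (an example of the general phenomenon, not drawn from any of the talks):

    import math

    # Floating-point addition is not associative, so the result of a
    # parallel reduction can depend on the order in which partial sums
    # are combined:
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))   # False in IEEE 754 double precision

    # One remedy is correctly rounded summation, whose result is
    # independent of the summation order and hence bitwise reproducible:
    values = [0.1] * 10
    print(sum(values))                  # 0.9999999999999999
    print(math.fsum(values))            # 1.0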


Tuesday, April 12

IP1 Scalability of Sparse Direct Codes
5:15 PM-6:00 PM
Room: Farabeuf
Chair: Sivan A. Toledo, Tel Aviv University, Israel

As part of the H2020 FET-HPC Project NLAFET, we are studying the scalability of algorithms and software for using direct methods for solving large sparse equations. We examine how techniques that work well for dense system solution, for example communication avoiding strategies, can be adapted for the sparse case. We also study the benefits of using standard runtime systems to assist us in developing codes for extreme scale computers. We will look at algorithms for solving both sparse symmetric indefinite systems and unsymmetric systems.

Iain Duff, Science & Technology Facilities Council, United Kingdom and CERFACS, Toulouse, France

Concurrent Panels
6:00 PM-7:00 PM

PD1 Towards Exascale Memory Systems
Room: Farabeuf

PD2 What Do You Need to Know About Task-Based Programming for Future Systems?
Room: Pasquier

Welcome Reception
7:00 PM-9:00 PM
Room: Exhibition Space

Wednesday, April 13

Registration
7:45 AM-5:00 PM
Room: Entrance Hall

IP2 Towards Solving Coupled Flow Problems at the Extreme Scale
8:15 AM-9:00 AM
Room: Farabeuf
Chair: Philippe Ricoux, TOTAL SA, France

Scalability is only a necessary prerequisite for extreme scale computing. Beyond scalability, we propose here an architecture-aware co-design of the models, algorithms, and data structures. The benefit will be demonstrated for the Lattice Boltzmann method as an example of an explicit time stepping scheme and for a finite element multigrid method as an implicit solver. In both cases, systems with more than ten trillion (10^13) degrees of freedom can be solved already on current peta-scale machines.

Ulrich J. Ruede, University of Erlangen-Nuremberg, Germany

Wednesday, April 13

CP3 Data Analysis
9:10 AM-9:50 AM
Room: Pasquier
Chair: Robert Robey, Los Alamos National Laboratory, USA

9:10-9:25 Fast Mesh Operations Using Spatial Hashing: Remaps and Neighbor Finding
Robert Robey, David Nicholaeff, and Gerald Collom, Los Alamos National Laboratory, USA; Redman Colin, Arizona State University and Los Alamos National Laboratory, USA
9:30-9:45 Traversing Time-Dependent Networks in Parallel
Weijian Zhang, University of Manchester, United Kingdom

Wednesday, April 13

CP4 Uncertainty Quantification
9:10 AM-10:10 AM
Room: Roussy
Chair: Michael McKerns, California Institute of Technology, USA

9:10-9:25 “Mystic”: Highly-Constrained Non-Convex Optimization and Uncertainty Quantification
Michael McKerns, California Institute of Technology, USA
9:30-9:45 A Resilient Domain Decomposition Approach for Extreme Scale UQ Computations
Paul Mycek and Andres Contreras, Duke University, USA; Olivier P. Le Maître, LIMSI-CNRS, France; Khachik Sargsyan, Francesco Rizzi, Karla Morris, Cosmin Safta, and Bert J. Debusschere, Sandia National Laboratories, USA; Omar M. Knio, Duke University, USA
9:50-10:05 Scalability Studies of Two-Level Domain Decomposition Algorithms for Stochastic PDEs Having Large Random Dimensions
Abhijit Sarkar, Ajit Desai, and Mohammad Khalil, Carleton University, Canada; Chris Pettit, United States Naval Academy, USA; Dominique Poirel, Royal Military College, Canada

Wednesday, April 13

CP5 Parallel Software Tools
9:10 AM-10:10 AM
Room: Salle des theses
Chair: Elmar Peise, RWTH Aachen University, Germany

9:10-9:25 The ELAPS Framework: Experimental Linear Algebra Performance Studies
Elmar Peise and Paolo Bientinesi, RWTH Aachen University, Germany
9:30-9:45 HiPro: An Interactive Programming Tool for Parallel Programming
Li Liao and Aiqing Zhang, Institute of Applied Physics and Computational Mathematics, China; Zeyao Mo, CAEP Software Center for High Performance Numerical Simulations, China
9:50-10:05 Reliable Bag-of-Tasks Dispatch System
Tariq L. Alturkestani, King Abdullah University of Science and Technology (KAUST), Saudi Arabia and University of Pittsburgh, USA; Esteban Meneses, Costa Rica Institute of Technology, Costa Rica; Rami Melhem, University of Pittsburgh, USA

Wednesday, April 13

PD3 Forward-Looking Panel: Supercomputers Beyond the Exascale
9:10 AM-10:10 AM
Room: Farabeuf
Chair: Bruce Hendrickson, Sandia National Laboratories, USA
Chair: Sivan A. Toledo, Tel Aviv University, Israel

One-hundred petaflops machines are being built today. Exascale machines are clearly feasible, and the design of their hardware and software is emerging from the haze of uncertainty. But although current technologies look likely to be able to get us to the exascale, further performance improvements will require solutions to enormous challenges in power, resiliency and other areas. Now is the time to ask what the first 10-exaflops machine might look like. Will it require post-CMOS technologies? Will it exploit reconfigurable hardware (FPGAs)? Perhaps most importantly, how will we program it? Our expert panelists will lead a discussion that will aim to help attendees formulate high-impact long-term research agendas in high-performance scientific computing.

Coffee Break
10:10 AM-10:35 AM
Room: Exhibition Space

Wednesday, April 13

MS17 Parallel Programming Models, Algorithms and Frameworks for Extreme Computing - Part III of III
10:35 AM-12:15 PM
Room: Farabeuf
For Part 2 see MS9

Multicore/manycore processors and accelerators are universally available, both as collections of homogeneous standard microprocessors and as attached heterogeneous co-processors. Application and library software developers can often use these processors effectively, and some general approaches have emerged. It is widely recognized that careful design of software and data structures, together with effective memory management, is the most critical issue for obtaining scalable, optimized performance on those systems. In these minisymposia we discuss current experiences and development of applications, libraries and frameworks using a variety of hardware. Speakers will address performance results and software design.

Organizer: Kengo Nakajima, University of Tokyo, Japan
Organizer: Michael Heroux, Sandia National Laboratories, USA
Organizer: Serge G. Petiton, Université Lille 1 and CNRS, France

10:35-10:55 Hierarchical Distributed and Parallel Programming for Auto-Tuned Extreme Computing
Serge G. Petiton, Université Lille 1 and CNRS, France
11:00-11:20 Scalable Sparse Solvers on GPUs
Timothy A. Davis and Wissam M. Sid-Lakhdar, Texas A&M University, USA
11:25-11:45 Parallel Generator of Non-Hermitian Matrices with Prescribed Spectrum and Shape Control for Extreme Computing
Herve Galicher, King Abdullah University of Science & Technology (KAUST), Saudi Arabia; France Boillod, CNRS, France; Serge G. Petiton, Université Lille 1 and CNRS, France
11:50-12:10 Monte Carlo Algorithms and Stochastic Processes on Manycore Architectures
Christophe Calvin, CEA Saclay, France

Wednesday, April 13

MS18 Challenges in Parallel Adaptive Mesh Refinement - Part I of III
10:35 AM-12:15 PM
Room: Pasquier
For Part 2 see MS26

Parallel adaptive mesh refinement (AMR) is a key technique when simulations are required to capture multiscale features. Frequent re-adaptation and repartitioning of the mesh during the simulation can impose significant overhead, particularly in large-scale parallel environments. Further challenges arise due to the availability of accelerated or special-purpose hardware, and the trend toward hierarchical and hybrid compute architectures. Our minisymposium addresses algorithms, scalability, and software issues of parallel AMR on HPC and multi-/manycore platforms. It will discuss novel techniques and applications that demonstrate particular use cases for AMR.

Organizer: Michael Bader, Technische Universität München, Germany
Organizer: Martin Berzins, University of Utah, USA
Organizer: Carsten Burstedde, Universität Bonn, Germany

10:35-10:55 Nonconforming Meshes and Mesh Adaptivity with PETSc
Toby Isaac, University of Texas at Austin, USA; Matthew G. Knepley, Rice University, USA
11:00-11:20 Space-Filling Curves and Adaptive Meshes for Oceanic and Other Applications
Michael Bader, Technische Universität München, Germany
11:25-11:45 A Tetrahedral Space-Filling Curve Via Bitwise Interleaving
Johannes Holke and Carsten Burstedde, Universität Bonn, Germany
11:50-12:10 ADER-DG on Spacetrees in the ExaHyPE Project
Dominic E. Charrier, Durham University, United Kingdom; Angelika Schwarz, Technische Universität München, Germany; Tobias Weinzierl, Durham University, United Kingdom
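Space-filling curves, which appear in two of the MS18 talks, linearize a multidimensional mesh so that partitioning reduces to splitting a one-dimensional sequence. A minimal Python sketch of the idea for a 2D Z-order (Morton) curve; this is a toy example only, far simpler than the tetrahedral curve presented in the session:

    def morton2d(x, y, bits=16):
        # Interleave the bits of integer cell coordinates (x, y) to get
        # the cell's position along a Z-order (Morton) space-filling curve.
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
        return code

    # Sorting cells by Morton code keeps spatially nearby cells close in
    # the linear order, so contiguous chunks make good partitions:
    cells = [(x, y) for x in range(4) for y in range(4)]
    print(sorted(cells, key=lambda c: morton2d(*c)))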

Wednesday, April 13

MS19 Combinatorial Scientific Computing - Part III of III
10:35 AM-12:15 PM
Room: Roussy
For Part 2 see MS11

Combinatorial algorithms and tools are used in enabling parallel scientific computing applications. The general approach is to identify performance issues in an application and to design, analyze, and implement combinatorial algorithms to tackle the identified issues. The proposed minisymposium gathers 12 talks, covering applications in bioinformatics, solvers of linear systems, and data analysis, and graph algorithms for those applications. The objective is to summarize the latest combinatorial algorithmic developments and the needs of the applications. The goal is to cross-fertilize both domains: the applications will raise new challenges to the combinatorial algorithms, and the combinatorial algorithms will address some of the existing problems of the applications.

Organizer: Aydin Buluc, Lawrence Berkeley National Laboratory, USA
Organizer: Alex Pothen, Purdue University, USA

10:35-10:55 Directed Graph Partitioning
Umit V. Catalyurek, The Ohio State University, USA; Kamer Kaya, Sabanci University, Turkey; Bora Ucar, LIP-ENS Lyon, France
11:00-11:20 Parallel Approximation Algorithms for b-Edge Covers and Data Privacy
Arif Khan and Alex Pothen, Purdue University, USA
11:25-11:45 Clustering Sparse Matrices with Information from Both Numerical Values and Pattern
Iain Duff, Science & Technology Facilities Council, United Kingdom and CERFACS, Toulouse, France; Philip Knight, University of Strathclyde, United Kingdom; Sandrine Mouysset, Universite de Toulouse, France; Daniel Ruiz, ENSEEIHT, Toulouse, France; Bora Ucar, LIP-ENS Lyon, France
11:50-12:10 Parallel Graph Coloring on Manycore Architectures
Mehmet Deveci, Erik G. Boman, and Siva Rajamanickam, Sandia National Laboratories, USA

Wednesday, April 13

MS20 Algorithm Comparison, Selection, and Tuning for Large Scale Network and Graph Problems
10:35 AM-12:15 PM
Room: Salle des theses

This minisymposium focuses on using multiple algorithms for analyzing large-scale networks. The topics include methods for selecting among the rapidly growing set of available algorithms and their implementations, and comparing the accuracy, performance, and power of different algorithms for static as well as dynamic networks.

Organizer: Boyana Norris, University of Oregon, USA
Organizer: Sanjukta Bhowmick, University of Nebraska, Omaha, USA

10:35-10:55 Parallel Algorithms for Analyzing Dynamic Networks
Sanjukta Bhowmick and Sriram Srinivasan, University of Nebraska, Omaha, USA; Sajal Das, Missouri University of Science and Technology, USA
11:00-11:20 The Structure of Communities in Social Networks
Sucheta Soundarajan, Syracuse University, USA
11:25-11:45 Generating Random Hyperbolic Graphs in Subquadratic Time
Moritz von Looz, Henning Meyerhenke, and Roman Prutkin, Karlsruhe Institute of Technology, Germany
11:50-12:10 Performance and Power Characterization of Network Algorithms
Boyana Norris, University of Oregon, USA; Sanjukta Bhowmick, University of Nebraska, Omaha, USA


Wednesday, April 13

MS21 Parallel-in-Time Integration Methods - Part III of III
10:35 AM-12:15 PM
Room: Marie Curie
For Part 2 see MS13

With the expected further increase in available computing power, new parallelisation strategies have to be developed to tackle the scalability challenges arising from the trend towards massively parallel architectures. In association with upcoming Exascale computing, simulations with real-time requirements face particular challenges in HPC. These challenges have led to a growing interest in parallel-in-time methods over the last decade. Adding temporal parallelisation has been shown to improve or extend the scalability of programs relying on traditional parallelisation-in-space methods. We invite speakers to present their current research progress related to parallel-in-time methods from all areas.

Organizer: Martin Schreiber, University of Exeter, United Kingdom
Organizer: Tobias Neckel, Technische Universität München, Germany
Organizer: Daniel Ruprecht, University of Leeds, United Kingdom

10:35-10:55 Improving Applicability and Performance for the Multigrid Reduction in Time (MGRIT) Algorithm
Robert D. Falgout, Veselin Dobrev, and Tzanio V. Kolev, Lawrence Livermore National Laboratory, USA; Ben O’Neill, University of Colorado Boulder, USA; Jacob B. Schroder, Lawrence Livermore National Laboratory, USA; Ben Southworth, University of Colorado Boulder, USA; Ulrike Meier Yang, Lawrence Livermore National Laboratory, USA
11:00-11:20 Multigrid Methods with Space-Time Concurrency
Stephanie Friedhoff, University of Cologne, Germany; Robert D. Falgout and Tzanio V. Kolev, Lawrence Livermore National Laboratory, USA; Scott Maclachlan, Memorial University, Newfoundland, Canada; Jacob B. Schroder, Lawrence Livermore National Laboratory, USA; Stefan Vandewalle, KU Leuven, Belgium
11:25-11:45 PinT for Climate and Weather Simulation
Martin Schreiber, University of Exeter, United Kingdom; Pedro Peixoto, University of Sao Paulo, Brazil; Terry Haut, Los Alamos National Laboratory, USA; Beth Wingate, University of Exeter, United Kingdom
11:50-12:10 Convergence Rate of Parareal Method with Modified Newmark-Beta Algorithm for 2nd-Order ODE
Mikio Iizuka and Kenji Ono, RIKEN, Japan

Wednesday, April 13

MS22 The Strong Scaling Limit: From Industrial Applications to New Algorithms - Part III of III
10:35 AM-12:15 PM
Room: Leroux
For Part 2 see MS14

There is a strong interest in running parallel applications in the strong scaling limit, where, however, communication costs increase while compute costs decrease. In this minisymposium, we analyse how various applications behave in this limit and how novel communication hiding/avoiding algorithms can help. We start with industrial and scientific applications and discuss both the software and algorithmic challenges. This minisymposium reports work of the Exa2ct project, an EU-supported project.

Organizer: Wim I. Vanroose, University of Antwerp, Belgium

10:35-10:55 Increasing Arithmetic Intensity Using Stencil Compilers
Simplice Donfack and Patrick Sanan, Università della Svizzera italiana, Switzerland; Olaf Schenk, Università della Svizzera Italiana, Italy; Bram Reps and Wim Vanroose, University of Antwerp, Belgium
11:00-11:20 Toward Exascale QP Solvers
D. Horak, Technical University Ostrava, Czech Republic
11:25-11:45 Many Core Acceleration of the Boundary Element Library BEM4I
Michal Merta, Technical University Ostrava, Czech Republic; Jan Zapletal, VSB-Technical University Ostrava, Czech Republic
11:50-12:10 HPGMG-FV Benchmark with GASPI/GPI-2: First Steps
Dimitar M. Stoyanov, Fraunhofer Institute for Industrial Mathematics, Germany


Wednesday, April 13

MS23
Extreme Scale Implicit PDE Solvers: Parallel Algorithms and Applications
10:35 AM-12:15 PM
Room: Dejerine

Many PDE problems in computational science and engineering are characterized by highly nonlinear phenomena, a wide range of length and time scales, and extreme variability in material properties. These in turn often necessitate aggressive adaptivity and advanced discretizations. Achieving scalability at the extreme core counts that are necessary for problems of such complexity often requires novel implementation strategies as well as new algorithmic developments. This minisymposium brings together researchers who have developed solvers capable of scaling to O(10^5) cores and have demonstrated them on challenging problems in mantle convection, subsurface flows, ice sheet dynamics, materials processing, and others.

Organizer: Toby Isaac, University of Texas at Austin, USA
Organizer: Omar Ghattas, University of Texas at Austin, USA
Organizer: Georg Stadler, Courant Institute of Mathematical Sciences, New York University, USA
Organizer: Johann Rudi, University of Texas at Austin, USA

10:35-10:55 Extreme-Scale Spectral/Geometric/Algebraic Multigrid Solver for Earth's Mantle -- Nonlinear, Heterogeneous Stokes Flow
Johann Rudi, University of Texas at Austin, USA; A. Cristiano I. Malossi, IBM Research-Zurich, Switzerland; Tobin Isaac, University of Chicago, USA; Georg Stadler, Courant Institute of Mathematical Sciences, New York University, USA; Michael Gurnis, California Institute of Technology, USA; Peter W. J. Staar, Yves Ineichen, Costas Bekas, and Alessandro Curioni, IBM Research-Zurich, Switzerland; Omar Ghattas, University of Texas at Austin, USA

11:00-11:20 FE2TI - Computational Scale Bridging for Dual-Phase Steels
Martin Lanser, University of Cologne, Germany; Axel Klawonn, Universitaet zu Koeln, Germany; Oliver Rheinbach, Technische Universitaet Bergakademie Freiberg, Germany

11:25-11:45 3D Parallel Direct Elliptic Solvers Exploiting Hierarchical Low Rank Structure
Gustavo Chavez, Hatem Ltaief, George M. Turkiyyah, and David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

11:50-12:10 Title Not Available
Steffen Muething, Universität Heidelberg, Germany

MS24
Massively Parallel Phase-field Simulations of Phase Transformation Processes
10:35 AM-12:15 PM
Room: Delarue

The minisymposium focuses on massively parallel phase-field simulations to study pattern formation during solidification or solid-solid phase transformations in alloys. The various material and process parameters influence the evolving microstructure and hence the macroscopic material behavior. To minimize boundary effects and obtain statistically reliable results, simulations on large domains are necessary. The computational study of phase transformation processes in real systems under real processing conditions requires the implementation of highly parallel algorithms. This minisymposium shows the use of highly optimized parallel methods for engineering applications related to material development and microstructure design.

Organizer: Harald Köstler, Universität Erlangen-Nürnberg, Germany
Organizer: Martin Bauer, University of Erlangen-Nuremberg, Germany

10:35-10:55 Performance Optimization of a Massively Parallel Phase-Field Method Using the HPC Framework waLBerla
Martin Bauer, University of Erlangen-Nuremberg, Germany; Harald Köstler, Universität Erlangen-Nürnberg, Germany; Ulrich Rüde, University of Erlangen-Nuremberg, Germany

11:00-11:20 Large Scale Phase-Field Simulations of Directional Ternary Eutectic Solidification
Johannes Hötzer, Hochschule Karlsruhe Technik und Wirtschaft, Germany; Philipp Steinmetz and Britta Nestler, Karlsruhe Institute of Technology, Germany


Wednesday, April 13

MS24
Massively Parallel Phase-field Simulations of Phase Transformation Processes (continued)

11:25-11:45 Building High Performance Mesoscale Simulations using a Modular Multiphysics Framework
Cody Permann, Idaho National Laboratory, USA

11:50-12:10 Equilibrium and Dynamics in Multiphase-Field Theories: A Comparative Study and a Consistent Formulation
Gyula I. Toth, University of Bergen, Norway; Laszlo Granasy and Tamas Pusztai, Wigner Research Centre for Physics, Hungary; Bjorn Kvamme, University of Bergen, Norway

CP6
Matrix Factorization
10:35 AM-12:15 PM
Room: Danton
Chair: F. Sukru Torun, Bilkent University, Turkey

10:35-10:50 Solving Sparse Underdetermined Linear Least Squares Problems on Parallel Computing Platforms
F. Sukru Torun, Bilkent University, Turkey; Murat Manguoglu, Middle East Technical University, Turkey; Cevdet Aykanat, Bilkent University, Turkey

10:55-11:10 A New Approximate Inverse Technique Based on the GLT Theory
Ali Dorostkar, Uppsala University, Sweden

11:15-11:30 Sparse Matrix Factorization on GPUs
Mohamed Gadou, University of Florida, USA; Tim Davis, Texas A&M University, USA; Sanjay Ranka, University of Florida, USA

11:35-11:50 Locality-Aware Parallel Sparse Matrix-Matrix Multiplication
Emanuel H. Rubensson and Elias Rudberg, Uppsala University, Sweden

11:55-12:10 Designing an Efficient and Scalable Block Low-Rank Direct Solver for Large Scale Clusters
Xavier Lacoste, Cédric Augonnet, and David Goudin, CEA/DAM, France

Lunch Break
12:15 PM-1:45 PM
Attendees on their own

IP3
Why Future Systems should be Data Centric
1:45 PM-2:30 PM
Room: Farabeuf
Chair: John Shalf, Lawrence Berkeley National Laboratory, USA

We are immersed in data. Classical high performance computing often generates enormous structured data through simulation, but a vast amount of unstructured data is also being generated and is growing at exponential rates. To meet the data challenge, we expect computing to evolve in two fundamental ways: first, through a focus on workflows, and second, by explicitly accommodating the impact of big data and complex analytics. As an example, HPC systems should be optimized to perform well on modeling and simulation, but should also address other important elements of the overall workflow, including data management and manipulation coupled with associated cognitive analytics. At a macro level, workflows will take advantage of different elements of the systems hierarchy, at dynamically varying times, in different ways, and with different data distributions throughout the hierarchy. This leads us to a data centric design point that has the flexibility to handle these data-driven demands. I will explain the attributes of a data centric system and discuss challenges we see for future computing systems. It is highly desirable that the data centric architecture and implementation lead on their own to commercially viable Exascale-class systems on premise and in the cloud.

Paul Coteus, IBM Research, USA

Intermission
2:30 PM-2:40 PM

Wednesday, April 13

MS25
Middleware in the Era of Extreme Scale Computing - Part I of II
2:40 PM-4:20 PM
Room: Farabeuf
For Part 2 see MS33

The unprecedented complexity of future exascale systems will make the role of resource management increasingly important for allocating system resources to users at policy-driven levels of quality of service while meeting the power and reliability requirements of the entire system. The key challenge lies in maintaining scalability while managing and monitoring allocations targeting multiple, often conflicting optimization criteria. In this minisymposium, we will cover several R&D activities on HPC resource management systems in both academia and industry. In particular, we will discuss their design and implementation to identify how to improve sustained throughput, system availability, and power efficiency at scale.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Martin Schulz, Lawrence Livermore National Laboratory, USA

2:40-3:00 Middleware for Efficient Performance in HPC
Larry Kaplan, Cray, Inc., USA

3:05-3:25 PMI-Exascale: Enabling Applications at Exascale
Ralph Castain, Intel Corporation, USA

3:30-3:50 Co-Scheduling: Increasing Efficiency for HPC
Josef Weidendorfer, Technische Universität München, Germany

3:55-4:15 Power Aware Resource Management
Tapasya Patki, Lawrence Livermore National Laboratory, USA

MS26
Challenges in Parallel Adaptive Mesh Refinement - Part II of III
2:40 PM-4:20 PM
Room: Pasquier
For Part 1 see MS18
For Part 3 see MS34

Parallel adaptive mesh refinement (AMR) is a key technique when simulations are required to capture multiscale features. Frequent re-adaptation and repartitioning of the mesh during the simulation can impose significant overhead, particularly in large-scale parallel environments. Further challenges arise due to the availability of accelerated or special-purpose hardware, and the trend toward hierarchical and hybrid compute architectures. Our minisymposium addresses algorithms, scalability, and software issues of parallel AMR on HPC and multi-/manycore platforms. It will discuss novel techniques and applications that demonstrate particular use cases for AMR.

Organizer: Michael Bader, Technische Universität München, Germany
Organizer: Martin Berzins, University of Utah, USA
Organizer: Carsten Burstedde, Universität Bonn, Germany

2:40-3:00 Recent Parallel Results Using Dynamic Quad-Tree Refinement for Solving Hyperbolic Conservation Laws on 2D Cartesian Meshes
Donna Calhoun, Boise State University, USA

3:05-3:25 Basilisk: Simple Abstractions for Octree-Adaptive Schemes
Stephane Popinet, CNRS / UPMC, France

3:30-3:50 Fast Solvers for Complex Fluids
George Biros and Dhairya Malhotra, University of Texas at Austin, USA

3:55-4:15 Simulation of Incompressible Flows on Parallel Octree Grids
Arthur Guittet and Frederic G. Gibou, University of California, Santa Barbara, USA


Wednesday, April 13

MS27
To Thread or Not To Thread - Part I of II
2:40 PM-3:55 PM
Room: Roussy
For Part 2 see MS35

Emerging architectures, in particular the large, hybrid machines now being built by the Department of Energy, will have smaller local memory per core, more cores per socket, multiple types of user-addressable memory, increased memory latencies -- especially to global memory -- and expensive synchronization. Two popular strategies, MPI+threads and flat MPI, need to be contrasted in terms of performance metrics, code complexity and maintenance burdens, and interoperability between libraries, applications, and operating systems. This session will present talks aimed at clarifying the tradeoffs involved for developers and users.

Organizer: Matthew G. Knepley, Rice University, USA
Organizer: Barry F. Smith, Argonne National Laboratory, USA
Organizer: Richard T. Mills, Intel Corporation, USA
Organizer: Jed Brown, Argonne National Laboratory and University of Colorado Boulder, USA

2:40-3:00 Tradeoffs in Domain Decomposition: Halos Vs Shared Contiguous Arrays
Jed Brown, Argonne National Laboratory and University of Colorado Boulder, USA

3:05-3:25 Firedrake: Burning the Thread at Both Ends
Gerard J. Gorman, Imperial College London, United Kingdom

3:30-3:50 Threads: the Wrong Abstraction and the Wrong Semantic
Jeff R. Hammond and Timothy Mattson, Intel Corporation, USA

MS28
Extreme-Scale Linear Solvers - Part I of II
2:40 PM-4:20 PM
Room: Salle des theses
For Part 2 see MS36

This minisymposium highlights the challenges of solving sparse linear systems at extreme scale. Both scalable algorithms and approaches that can exploit manycore and massively parallel architectures will be discussed. Topics of interest include (but are not limited to) hierarchical solvers, low-rank compression, fine-grain parallelism, asynchronous iterative methods, random sampling, and communication-avoiding methods.

Organizer: Erik G. Boman, Sandia National Laboratories, USA
Organizer: Siva Rajamanickam, Sandia National Laboratories, USA

2:40-3:00 Recent Directions in Extreme-Scale Solvers
Erik G. Boman, Sandia National Laboratories, USA

3:05-3:25 Preconditioning Using Hierarchically Semi-Separable Matrices and Randomized Sampling
Pieter Ghysels, Francois-Henry Rouet, and Xiaoye Sherry Li, Lawrence Berkeley National Laboratory, USA

3:30-3:50 Preconditioning Communication-Avoiding Krylov Methods
Ichitaro Yamazaki, University of Tennessee, Knoxville, USA; Siva Rajamanickam, Andrey Prokopenko, Erik G. Boman, Mark Hoemmen, and Michael Heroux, Sandia National Laboratories, USA; Stan Tomov, University of Tennessee, USA; Jack J. Dongarra, University of Tennessee and Oak Ridge National Laboratory, USA

3:55-4:15 A New Parallel Hierarchical Preconditioner for Sparse Matrices
Chao Chen and Eric F. Darve, Stanford University, USA; Siva Rajamanickam and Erik G. Boman, Sandia National Laboratories, USA

MS29
The Present and Future Bulk Synchronous Parallel Computing - Part I of III
2:40 PM-4:20 PM
Room: Marie Curie
For Part 2 see MS37

The Bulk Synchronous Parallel (BSP) bridging model, proposed by Valiant in 1990, has had great influence on algorithm and software design over the past decades. It has inspired programming frameworks and the structured parallel programming of communication-minimising algorithms. In the current era of post-petascale HPC and of big data computing, BSP has again led to successful programming frameworks such as MapReduce, Pregel, and Giraph. This minisymposium explores the use of BSP in contemporary high performance and big data computing, highlighting new BSP algorithms and frameworks, new BSP-like models, and BSP-inspired auto-generation and verification tools. (The standard BSP cost formula is recalled after this listing.)

Organizer: Albert-Jan N. Yzelman, Huawei Technologies, France
Organizer: Rob H. Bisseling, Utrecht University, The Netherlands

2:40-3:00 Towards Next-Generation High-Performance BSP
Albert-Jan N. Yzelman, Huawei Technologies, France

3:05-3:25 Computing Almost Asynchronously: What Problems Can We Solve in O(log P) Supersteps?
Alexander Tiskin, University of Warwick, United Kingdom

3:30-3:50 Deductive Verification of BSP Algorithms with Subgroups
Frédéric Gava, Universite de Paris-Est, France

3:55-4:15 A BSP Scalability Analysis of 3D LU Factorization with Row Partial Pivoting
Antoine Petitet, Huawei Technologies, France
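As background for the BSP sessions above, here is the model's per-superstep cost formula. This is standard textbook material for Valiant's model, not drawn from any particular talk; w, h, g, and l are the usual machine and algorithm parameters.

% BSP cost model: in each superstep, every process computes locally,
% exchanges data, and then synchronizes at a global barrier.
%   w = maximum local work, h = maximum number of words sent or
%   received by any process, g = per-word communication cost,
%   l = cost of the barrier synchronization.
\[
  T_{\text{superstep}} = w + h\,g + l,
  \qquad
  T_{\text{total}} = \sum_{s=1}^{S} \left( w_s + h_s\,g + l \right)
\]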

Wednesday, April 13

MS30
Scientific Computing towards Ultrascale Systems - Part I of II
2:40 PM-4:20 PM
Room: Leroux
For Part 2 see MS38

Ultrascale computing systems are large-scale complex computing systems combining technologies from high performance computing, distributed systems, and cloud computing. In this minisymposium we discuss challenges and experiences with the development of applications for Ultrascale systems. The speakers are involved in the EU COST action 'Network for Sustainable Ultrascale Computing' (NESUS) and will address design and performance aspects of algorithms and applications from scientific computing for Ultrascale systems, including numerical and computational efficiency and energy aspects, as well as implementations for heterogeneous architectures. The minisymposium consists of two parts.

Organizer: Svetozar Margenov, Bulgarian Academy of Science, Bulgaria
Organizer: Maya Neytcheva, Uppsala University, Sweden
Organizer: Thomas Rauber, Universität Bayreuth, Germany
Organizer: Gudula Rünger, Chemnitz University of Technology, Germany

2:40-3:00 Simulations for Optimizing Lightweight Structures on Heterogeneous Compute Resources
Robert Dietze, Michael Hofmann, and Gudula Rünger, Chemnitz University of Technology, Germany

3:05-3:25 Balanced Dense-Sparse Parallel Solvers for Nonlinear Vibration Analysis of Elastic Structures
Svetozar Margenov, Bulgarian Academy of Science, Bulgaria

3:30-3:50 High Performance Computing in Micromechanics
Radim Blaheta, O. Jakl, and J. Stary, Academy of Sciences of the Czech Republic, Ostrava, Czech Republic; I. Georgiev, Institute of Information Technologies, Bulgaria

3:55-4:15 Parallel Block Preconditioning Framework and Efficient Iterative Solution of Discrete Multi-Physics Problems
Milan D. Mihajlovic, University of Manchester, United Kingdom

MS31
Novel Educational Programs for CSE and Big Data
2:40 PM-4:20 PM
Room: Dejerine

People skilled in the principles of Computational Science and Engineering (CSE) and Big Data are the most important asset in bringing these technologies to bear on science and industry. We showcase four educational programs that, in various ways, all differ from standard faculty-oriented university degree programs in order to address the cross-disciplinary and practical requirements of computational and data-driven science. Novelties include a blend of Germany's vocational training concept with academic instruction (Aachen), a strong project orientation with business participation (Maastricht), the tie-in of non-university research institutions (Aachen), and a strong algorithmic orientation (Jena).

Organizer: Christian H. Bischof, Technische Universität Darmstadt, Germany

2:40-3:00 Exploiting Synergies with Non-University Partners in CSE Education
Marek Behr, RWTH Aachen University, Germany

3:05-3:25 Supporting Data-Driven Businesses Through Talented Students in KnowledgeEngineering@Work
Carlo Galuzzi, Maastricht University, The Netherlands

3:30-3:50 Blending the Vocational Training and University Education Concepts in the Mathematical Technical Software Developer (MATSE) Program
Benno Willemsen, RWTH Aachen University, Germany

3:55-4:15 Designing a New Master's Degree Program in Computational and Data Science
Martin Buecker, Friedrich Schiller Universität Jena, Germany

Wednesday, April 13

MS32
High Performance Computing in Optimization
2:40 PM-4:20 PM
Room: Delarue

High performance computing (HPC) is becoming increasingly relevant for solving large-scale optimization problems in many application areas such as operations research, energy systems, industrial engineering, advanced manufacturing, and others. Despite being in its infancy, HPC optimization has been used successfully to solve problems of unprecedented sizes. The mainstream computational approaches are parallelization of linear algebra, optimization decomposition algorithms, and branch-and-bound searches. This minisymposium on HPC optimization will host a series of presentations on parallel methods and software implementations and is aimed at fostering interdisciplinary collaborations between the optimization and parallel processing communities. (The generic two-stage formulation that such decomposition methods exploit is recalled after the listings below.)

Organizer: Kibaek Kim, Argonne National Laboratory, USA
Organizer: Cosmin G. Petra, Argonne National Laboratory, USA

2:40-3:00 Parallel Decomposition Methods for Structured MIP
Kibaek Kim, Argonne National Laboratory, USA

3:05-3:25 Parallel Optimization of Complex Energy Systems on High-Performance Computing Platforms
Cosmin G. Petra, Argonne National Laboratory, USA

3:30-3:50 PIPS-SBB: A Parallel Distributed Memory Solver for Stochastic Mixed-Integer Optimization
Deepak Rajan, Lawrence Livermore National Laboratory, USA; Lluis Munguia, Georgia Institute of Technology, USA; Geoffrey M. Oxberry, Lawrence Livermore National Laboratory, USA; Cosmin G. Petra, Argonne National Laboratory, USA; Pedro Sotorrio and Thomas Edmunds, Lawrence Livermore National Laboratory, USA

3:55-4:15 How to Solve Open MIP Instances by Using Supercomputers with Over 80,000 Cores
Yuji Shinano, Zuse Institute Berlin, Germany; Tobias Achterberg, Gurobi GmbH, Germany; Timo Berthold and Stefan Heinz, Fair Isaac Europe Ltd, Germany; Thorsten Koch, Zuse Institute Berlin, Germany; Stefan Vigerske, GAMS Software GmbH, Germany; Michael Winkler, Gurobi GmbH, Germany

CP7
Preconditioning and Multigrid
2:40 PM-4:20 PM
Room: Danton
Chair: Paul Lin, Sandia National Laboratories, USA

2:40-2:55 Towards Exascalable Algebraic Multigrid
Amani Alonazi and David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

3:00-3:15 A Matrix-Free Preconditioner for Elliptic Solvers Based on the Fast Multipole Method
Huda Ibeid, King Abdullah University of Science & Technology (KAUST), Saudi Arabia; Rio Yokota, Tokyo Institute of Technology, Japan; David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

3:20-3:35 Efficiency Through Reuse in Algebraic Multigrid
Andrey Prokopenko and Jonathan J. Hu, Sandia National Laboratories, USA

3:40-3:55 A Dynamical Preconditioning Strategy and Its Applications in Large-Scale Multi-Physics Simulations
Xiaowen Xu, Institute of Applied Physics and Computational Mathematics, China

4:00-4:15 Performance of Smoothers for Algebraic Multigrid Preconditioners for Finite Element Variational Multiscale Incompressible Magnetohydrodynamics
Paul Lin and John Shadid, Sandia National Laboratories, USA; Paul Tsuji, Lawrence Livermore National Laboratory, USA; Jonathan J. Hu, Sandia National Laboratories, USA
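As flagged in the MS32 description above, here is the generic two-stage stochastic program whose block structure parallel decomposition solvers exploit. This is standard textbook notation (x, y_s, and the data matrices are generic symbols), not drawn from any particular talk.

% Two-stage stochastic program: the first-stage decision x is shared,
% while each scenario s (with probability p_s) contributes a recourse
% subproblem Q_s that decouples once x is fixed. Decomposition methods
% parallelize over the S independent subproblems.
\[
  \min_{x \in X} \; c^{\mathsf{T}} x + \sum_{s=1}^{S} p_s\, Q_s(x),
  \qquad
  Q_s(x) = \min_{y_s \in Y_s} \left\{ q_s^{\mathsf{T}} y_s \;:\; W_s y_s = h_s - T_s x \right\}
\]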

Coffee Break
4:20 PM-4:50 PM
Room: Exhibition Space

Wednesday, April 13

MS33
Middleware in the Era of Extreme Scale Computing - Part II of II
4:50 PM-6:05 PM
Room: Farabeuf
For Part 1 see MS25

The unprecedented complexity of future exascale systems will make the role of resource management increasingly important for allocating system resources to users at policy-driven levels of quality of service while meeting the power and reliability requirements of the entire system. The key challenge lies in maintaining scalability while managing and monitoring allocations targeting multiple, often conflicting optimization criteria. In this minisymposium, we will cover several R&D activities on HPC resource management systems in both academia and industry. In particular, we will discuss their design and implementation to identify how to improve sustained throughput, system availability, and power efficiency at scale.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Martin Schulz, Lawrence Livermore National Laboratory, USA

4:50-5:10 Operation Experiences of the K Computer
Atsuya Uno, RIKEN Advanced Institute for Computational Science, Japan

5:15-5:35 Improving Resiliency Through Dynamic Resource Management
Suraj Prabhakaran and Felix Wolf, Technische Universität Darmstadt, Germany

5:40-6:00 The Influence of Data Center Power and Energy Constraints on Future HPC Scheduling Systems
Torsten Wilde, LRZ: Leibniz Supercomputing Centre, Germany

MS34
Challenges in Parallel Adaptive Mesh Refinement - Part III of III
4:50 PM-6:30 PM
Room: Pasquier
For Part 2 see MS26

Parallel adaptive mesh refinement (AMR) is a key technique when simulations are required to capture multiscale features. Frequent re-adaptation and repartitioning of the mesh during the simulation can impose significant overhead, particularly in large-scale parallel environments. Further challenges arise due to the availability of accelerated or special-purpose hardware, and the trend toward hierarchical and hybrid compute architectures. Our minisymposium addresses algorithms, scalability, and software issues of parallel AMR on HPC and multi-/manycore platforms. It will discuss novel techniques and applications that demonstrate particular use cases for AMR.

Organizer: Michael Bader, Technische Universität München, Germany
Organizer: Martin Berzins, University of Utah, USA
Organizer: Carsten Burstedde, Universität Bonn, Germany

4:50-5:10 A Simple Robust and Accurate a Posteriori Subcell Finite Volume Limiter for the Discontinuous Galerkin Method on Space-Time Adaptive Meshes
Olindo Zanotti and Michael Dumbser, University of Trento, Italy

5:15-5:35 Adaptive Mesh Refinement in Radiation Problems
Alan Humphrey and Todd Harman, University of Utah, USA; Daniel Sunderland, Sandia National Laboratories, USA; Martin Berzins, University of Utah, USA

5:40-6:00 Resiliency in AMR Using Rollback and Redundancy
Anshu Dubey, Argonne National Laboratory, USA

6:05-6:25 Resolving a Vast Range of Spatial and Dynamical Scales in Cosmological Simulations
Kevin Schaal and Volker Springel, Heidelberg Institute for Theoretical Studies and Karlsruhe Institute of Technology, Germany


Wednesday, April 13

MS35
To Thread or Not To Thread - Part II of II
4:50 PM-6:30 PM
Room: Roussy
For Part 1 see MS27

Emerging architectures, in particular the large, hybrid machines now being built by the Department of Energy, will have smaller local memory per core, more cores per socket, multiple types of user-addressable memory, increased memory latencies -- especially to global memory -- and expensive synchronization. Two popular strategies, MPI+threads and flat MPI, need to be contrasted in terms of performance metrics, code complexity and maintenance burdens, and interoperability between libraries, applications, and operating systems. This session will present talks aimed at clarifying the tradeoffs involved for developers and users. (A minimal code sketch contrasting the two strategies follows these listings.)

Organizer: Matthew G. Knepley, Rice University, USA
Organizer: Barry F. Smith, Argonne National Laboratory, USA
Organizer: Richard T. Mills, Intel Corporation, USA
Organizer: Jed Brown, Argonne National Laboratory and University of Colorado Boulder, USA

4:50-5:10 MPI+MPI: Using MPI-3 Shared Memory As a MultiCore Programming System
William D. Gropp, University of Illinois at Urbana-Champaign, USA

5:15-5:35 MPI+X: Opportunities and Limitations for Heterogeneous Systems
Karl Rupp, Josef Weinbub, Florian Rudolf, Andreas Morhammer, Tibor Grasser, and Ansgar Juengel, Vienna University of Technology, Austria

5:40-6:00 How I Learned to Stop Worrying and Love MPI: Prospering with an MPI-Only Model in PFLOTRAN + PETSc in the Manycore Era
Richard T. Mills, Intel Corporation, USA

6:05-6:25 The OCCA Abstract Threading Model: Implementation and Performance for High-Order Finite-Element Computations
Tim Warburton, Virginia Tech, USA; David Medina, Rice University, USA

MS36
Extreme-Scale Linear Solvers - Part II of II
4:50 PM-6:30 PM
Room: Salle des theses
For Part 1 see MS28

This minisymposium centers on the challenges of solving linear systems at extreme scale. Both scalable algorithms and approaches that can exploit manycore and massively parallel architectures will be discussed. Topics of interest include (but are not limited to) hierarchical solvers, low-rank compression, fine-grain parallelism, asynchronous iterative methods, random sampling, and communication-avoiding methods.

Organizer: Erik G. Boman, Sandia National Laboratories, USA
Organizer: Siva Rajamanickam, Sandia National Laboratories, USA

4:50-5:10 Task and Data Parallelism Based Direct Solvers and Preconditioners in Manycore Architectures
Siva Rajamanickam, Erik G. Boman, Joshua D. Booth, and Kyungjoo Kim, Sandia National Laboratories, USA

5:15-5:35 Parallel ILU and Iterative Sparse Triangular Solves
Thomas K. Huckle and Juergen Braeckle, Technische Universität München, Germany

5:40-6:00 Aggregation-Based Algebraic Multigrid on Massively Parallel Computers
Yvan Notay, Université Libre de Bruxelles, Belgium

6:05-6:25 Revisiting Asynchronous Linear Solvers: Provable Convergence Rate Through Randomization
Haim Avron, IBM T.J. Watson Research Center, USA; Alex Druinsky, Lawrence Berkeley National Laboratory, USA; Anshul Gupta, IBM T.J. Watson Research Center, USA
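As promised in the MS35 description above, here is a minimal, hypothetical C sketch of the MPI+threads strategy debated in these sessions; flat MPI would simply call MPI_Init and run one rank per core. It is illustrative background only and is not taken from any talk.

/* Minimal MPI+OpenMP sketch (illustrative only; not from the program).
 * Compile with e.g.: mpicc -fopenmp hybrid_sketch.c -o hybrid_sketch */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;

    /* MPI+threads: request a threading level instead of plain MPI_Init.
     * MPI_THREAD_FUNNELED means only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Thread-parallel local work: each rank sums its slice with OpenMP. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (double)(i + 1 + rank);

    /* Communication is funneled through the main thread only. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nranks, omp_get_max_threads(), global);
    MPI_Finalize();
    return 0;
}

The tradeoff discussed in MS27/MS35 is visible even here: the hybrid version shares each node's memory among threads and sends fewer, larger messages, at the cost of thread-safety and library-interoperability concerns that flat MPI avoids.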


Wednesday, April 13

MS37
The Present and Future Bulk Synchronous Parallel Computing - Part II of III
4:50 PM-6:30 PM
Room: Marie Curie
For Part 1 see MS29
For Part 3 see MS45

The Bulk Synchronous Parallel (BSP) bridging model, proposed by Valiant in 1990, has had great influence on algorithm and software design over the past decades. It has inspired programming frameworks and the structured parallel programming of communication-minimising algorithms. In the current era of post-petascale HPC and of big data computing, BSP has again led to successful programming frameworks such as MapReduce, Pregel, and Giraph. This minisymposium explores the use of BSP in contemporary high performance and big data computing, highlighting new BSP algorithms and frameworks, new BSP-like models, and BSP-inspired auto-generation and verification tools.

Organizer: Albert-Jan N. Yzelman, Huawei Technologies, France
Organizer: Rob H. Bisseling, Utrecht University, The Netherlands

4:50-5:10 From Cilk to MPI to BSP: The Value of Structured Parallel Programming
Gaetan Hains, Huawei Technologies, France

5:15-5:35 Systematic Development of Functional Bulk Synchronous Parallel Programs
Frédéric Loulergue, Universite d'Orleans, France

5:40-6:00 Scalable Parallel Graph Matching for Big Data
Rob H. Bisseling, Utrecht University, The Netherlands

6:05-6:25 Parallel Data Distribution for a Sparse Eigenvalue Problem Based on the BSP Model
Dirk Roose, Karl Meerbergen, and Joris Tavernier, KU Leuven, Belgium

MS38
Scientific Computing towards Ultrascale Systems - Part II of II
4:50 PM-6:30 PM
Room: Leroux
For Part 1 see MS30

Ultrascale computing systems are large-scale complex computing systems combining technologies from high performance computing, distributed systems, and cloud computing. In this minisymposium we discuss challenges and experiences with the development of applications for Ultrascale systems. The speakers are involved in the EU COST action 'Network for Sustainable Ultrascale Computing' (NESUS) and will address design and performance aspects of algorithms and applications from scientific computing for Ultrascale systems, including numerical and computational efficiency and energy aspects, as well as implementations for heterogeneous architectures. The minisymposium consists of two parts.

Organizer: Svetozar Margenov, Bulgarian Academy of Science, Bulgaria
Organizer: Maya Neytcheva, Uppsala University, Sweden
Organizer: Thomas Rauber, Universität Bayreuth, Germany
Organizer: Gudula Rünger, Chemnitz University of Technology, Germany

4:50-5:10 Parallelization of Numerical Methods for Time Evolution Processes, Described by Integro-Differential Equations
Ali Dorostkar and Maya Neytcheva, Uppsala University, Sweden

5:15-5:35 Optimizing the Use of GPU and Intel Xeon Phi Clusters to Accelerate Stencil Computations
Roman Wyrzykowski, Krzysztof Rojek, and Lukasz Szustak, Czestochowa University of Technology, Poland

5:40-6:00 Efficient Parallel Algorithms for New Unidirectional Models of Nonlinear Optics
Raimondas Ciegas, University Vilnius, Lithuania

6:05-6:25 Online Auto-Tuning for Parallel Solution Methods for ODEs
Thomas Rauber, Universität Bayreuth, Germany; Natalia Kalinnik and Matthias Korch, University Bayreuth, Germany

Wednesday, April 13

MS39
High Performance Computing for Astronomical Instrumentation
4:50 PM-6:30 PM
Room: Delarue

Due to the ever-increasing complexity of astronomical instruments, research in high performance computing has become a major vector of scientific discovery. Indeed, the design of instruments for large telescopes requires efficient numerical simulations of observing conditions and instrument components. The latter can include large computing facilities for operations and data storage, and the extraction of observables from these data also requires significant computational capabilities. We will cover most of these aspects through examples of the use of parallel processing to design instruments, operate them, and process the obtained data in the context of the largest projects in astronomy, namely the European Extremely Large Telescope and the Square Kilometre Array.

Organizer: Damien Gratadour, Observatoire de Paris, Meudon, France

4:50-5:10 Adaptive Optics Simulation for the European Extremely Large Telescope on Multicore Architectures with Multiple GPUs
Hatem Ltaief, King Abdullah University of Science & Technology (KAUST), Saudi Arabia; Damien Gratadour, Observatoire de Paris, Meudon, France; Ali Charara, King Abdullah University of Science & Technology (KAUST), Saudi Arabia; Eric Gendron, Observatoire de Paris, Meudon, France; David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

5:15-5:35 High Performance Computing for Astronomical Instrumentation on Large Telescopes
Nigel Dipper, Durham University, United Kingdom

5:40-6:00 Parallel Processing for the Square Kilometer Array Telescope
David De Roure, University of Oxford, United Kingdom

6:05-6:25 The SKA (Square Kilometre Array) and Its Computational Challenges
Stefano Salvini, University of Oxford, United Kingdom

MS40
Parallel Methodologies for Large Data Assimilation and Inverse Problems - Part I of II
4:50 PM-6:30 PM
Room: Dejerine

Data assimilation is the process of fusing information from priors, models, and observations in order to obtain best estimates of the state of a physical system such as the atmosphere. Data assimilation relies on comprehensive physical models with large numbers of variables, and on complex observational data sets. The resulting large scale inverse problems need to be solved faster than real time in order to, for example, meet the demands of weather forecasting. This minisymposium discusses the latest developments in parallel methodologies for solving large data assimilation problems. (The variational formulation referenced in the 4D-Var talks below is recalled after this listing.)

Organizer: Adrian Sandu, Virginia Polytechnic Institute & State University, USA
Organizer: Marc Bocquet, Ecole Nationale des Ponts et Chaussées, France

4:50-5:10 Parallel Implementations of Ensemble Kalman Filters for Huge Geophysical Models
Jeffrey Anderson, Helen Kershaw, Jonathan Hendricks, and Nancy Collins, National Center for Atmospheric Research, USA

5:15-5:35 Parallel 4D-Var and Particle Smoother Data Assimilation for Atmospheric Chemistry and Meteorology
Hendrik Elbern, University of Cologne, Germany; Jonas Berndt and Philipp Franke, Forschungszentrum Jülich, Germany

5:40-6:00 Parallel in Time Strong-Constraint 4D-Var
Adrian Sandu, Virginia Polytechnic Institute & State University, USA

6:05-6:25 Development of 4D Ensemble Variational Assimilation at Météo-France
Gerald Desroziers, Météo-France, France
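For reference, and as flagged in the MS40 description above, the cost function minimized in standard strong-constraint 4D-Var is the following. This is textbook material, not drawn from any particular talk.

% Strong-constraint 4D-Var. Symbols: x_0 = initial state (the control
% variable), x_b = background estimate, B = background-error covariance,
% y_i = observations at time t_i, H_i = observation operator,
% R_i = observation-error covariance, M_{0->i} = model propagation
% from t_0 to t_i.
\[
  J(x_0) = \tfrac{1}{2}\,(x_0 - x_b)^{\mathsf{T}} B^{-1} (x_0 - x_b)
         + \tfrac{1}{2} \sum_{i=0}^{N} d_i^{\mathsf{T}} R_i^{-1} d_i,
  \qquad
  d_i = H_i\!\left(\mathcal{M}_{0 \to i}(x_0)\right) - y_i
\]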


Wednesday, April 13

CP8
Numerical Optimization
4:50 PM-5:50 PM
Room: Danton
Chair: Scott P. Kolodziej, Texas A&M University, USA

4:50-5:05 Vertex Separators with Mixed-Integer Linear Optimization
Scott P. Kolodziej and Tim Davis, Texas A&M University, USA

5:10-5:25 Large-Scale Parallel Combinatorial Optimization -- the Role of Estimation and Ordering the Sub-Problems
Bogdan Zavalnij and Sandor Szabo, University of Pécs, Hungary

5:30-5:45 Parallel Linear Programming Using Zonotopes
Max Demenkov, Institute of Control Sciences, Russia

Business Meeting
6:30 PM-7:15 PM
Room: Farabeuf

PP1
Poster Session and Reception
7:15 PM-9:15 PM
Room: Exhibition Space

Improving Scalability of 2D Sparse Matrix Partitioning Models Via Reducing Latency Cost
Cevdet Aykanat and Oguz Selvitopi, Bilkent University, Turkey

Hybrid Parallel Simulation of Helicopter Rotor Dynamics
Melven Roehrig-Zoellner, Achim Basermann, and Johannes Hofmann, German Aerospace Center (DLR), Germany

An Impact of Tuning the Kernel of the Structured QR Factorization in the TSQR
Takeshi Fukaya, Hokkaido University, Japan; Toshiyuki Imamura, RIKEN, Japan

Filters for the Improvement of Multiscale Data from Atomistic Simulations
David J. Gardner, Lawrence Livermore National Laboratory, USA; Daniel R. Reynolds, Southern Methodist University, USA

Parallel Multigrid and FFT-Type Algorithms for High Order Compact Discretization of the Helmholtz Equation
Yury A. Gryazin, Idaho State University, USA

Computational Modeling and Aerodynamic Analysis of Simplified Automobile Body Using Open Source CFD (OpenFOAM)
Javeria Hashmi, National University of Science and Technology, Zimbabwe

Performance Evaluation of a Numerical Quadrature Based Eigensolver Using a Real Pole Filter
Yuto Inoue, Yasunori Futamura, and Tetsuya Sakurai, University of Tsukuba, Japan

Synthesis and Verification of BSP Programs
Arvid Jakobsson and Thibaut Tachon, Huawei Technologies, France

ChASE: A Chebyshev Accelerated Subspace Iteration Eigensolver
Edoardo A. Di Napoli, Jülich Supercomputing Centre, Germany; Mario Berljafa, University of Manchester, United Kingdom; Paul Springer and Jan Winkelmann, RWTH Aachen University, Germany

A Parallel Fast Sweeping Method on Nonuniform Grids
Miles L. Detrixhe, University of California, Santa Barbara, USA

Evaluation of the Intel Xeon Phi and Nvidia K80 As Accelerators for Two-Dimensional Panel Codes
Lukas Einkemmer, University of Innsbruck, Austria

Poisson Solver for Large-Scale Simulations with One Fourier Diagonalizable Direction
Ricard Borrell, Guillermo Oyarzun, Jorje Chiva, Oriol Lehmkuhl, Ivette Rodríguez, and Assensi Oliva, Universitat Politecnica de Catalunya, Spain

High Performance Algorithms and Software for Seismic Simulations
Alexander Breuer, University of California, San Diego, USA; Alexander Heinecke, Intel Corporation, USA; Michael Bader, Technische Universität München, Germany

Large Scale Parallel Computations in R Through Elemental
Rodrigo Canales, Elmar Peise, and Paolo Bientinesi, RWTH Aachen University, Germany

Parallel Programming Using HPCmatlab
Mukul Dave, Arizona State University, USA; Xinchen Guo, Purdue University, USA; Mohamed Sayeed, Arizona State University, USA

Using the Verificarlo Tool to Assess the Floating Point Accuracy of Large Scale Digital Simulations
Christophe Denis, CMLA, ENS de Cachan, France; Pablo De Oliveira Castro and Eric Petit, PRiSM - UVSQ, France


A Reconstruction Algorithm for Dual Energy Computed Tomography and Its Parallelization
Kiwan Jeon, National Institute for Mathematical Sciences, Korea; Sungwhan Kim, Hanbat National University, Korea; Chi Young Ahn and Taeyoung Ha, National Institute for Mathematical Sciences, Korea

NLAFET - Parallel Numerical Linear Algebra for Future Extreme-Scale Systems
Bo T. Kågström, Lennart Edblom, and Lars Karlsson, Umeå University, Sweden; Jack Dongarra, University of Manchester, United Kingdom; Iain Duff, Science & Technology Facilities Council, United Kingdom and CERFACS, Toulouse, France; Jonathan Hogg, Rutherford Appleton Laboratory, United Kingdom; Laura Grigori, Inria Paris-Rocquencourt, France

An Interior-Point Stochastic Approximation Method on Massively Parallel Architectures
Juraj Kardoš and Drosos Kourounis, Università della Svizzera italiana, Switzerland; Olaf Schenk, Università della Svizzera italiana, Switzerland

Malleable Task-Graph Scheduling with a Practical Speed-Up Model
Bertrand Simon, ENS Lyon, France; Loris Marchal, CNRS, France; Oliver Sinnen, University of Auckland, New Zealand; Frédéric Vivien, Inria and ENS Lyon, France

Overview of Intel® Math Kernel Library (Intel® MKL) Solutions for Eigenvalue Problems
Irina Sokolova, Peter Tang, and Alexander Kalinkin, Intel Corporation, USA

Multilevel Approaches for FSAI Preconditioning in the Iterative Solution of Linear Systems of Equations
Victor A. Paludetto Magri, Andrea Franceschini, Carlo Janna, and Massimiliano Ferronato, University of Padova, Italy

Large Scale Computation of Long Range Interactions in Molecular Dynamics Simulations
Pyeong Jun Park, Korea National University of Transportation, Korea

Accelerated Subdomain Smoothing for Geophysical Stokes Flow
Patrick Sanan, Università della Svizzera italiana, Switzerland

ExaHyPE - Towards an Exascale Hyperbolic PDE Engine
Angelika Schwarz, Vasco Varduhn, and Michael Bader, Technische Universität München, Germany; Michael Dumbser, University of Trento, Italy; Dominic E. Charrier and Tobias Weinzierl, Durham University, United Kingdom; Luciano Rezzolla, Frankfurt Institute for Advanced Studies, Germany; Alice Gabriel and Heiner Igel, Ludwig-Maximilians-Universität München, Germany

Seigen: Seismic Modelling Through Code Generation
Michael Lange, Christian Jacobs, Fabio Luporini, and Gerard J. Gorman, Imperial College London, United Kingdom

High-Performance Tensor Algebra Package for GPU
Ian Masliah, University of Paris-Sud, France; Ahmad Abdelfattah, University of Tennessee, Knoxville, USA; Marc Baboulin, University of Paris-Sud, France; Jack J. Dongarra, University of Tennessee and Oak Ridge National Laboratory, USA; Joel Falcou, University of Paris-Sud, France; Azzam Haidar and Stanimire Tomov, University of Tennessee, Knoxville, USA

Achieving Robustness in Domain Decomposition Methods: Adaptive Spectral Coarse Spaces
Nicole Spillane, Ecole Polytechnique, France; Victorita Dolean, University of Nice, France; Patrice Hauret, Ecole Polytechnique, France; Frédéric Nataf, CNRS and UPMC, France; Clemens Pechstein, University of Linz, Austria; Daniel J. Rixen, Technische Universität München, Germany; Robert Scheichl, University of Bath, United Kingdom

A Proposal for a Batched Set of BLAS
Pedro Valero-Lara, University of Manchester, United Kingdom; Jack J. Dongarra, University of Tennessee and Oak Ridge National Laboratory, USA; Azzam Haidar, University of Tennessee, Knoxville, USA; Samuel Relton, University of Manchester, United Kingdom; Stanimire Tomov, University of Tennessee, Knoxville, USA; Mawussi Zounon, University of Manchester, United Kingdom

Spectrum Slicing Solvers for Extreme-Scale Eigenvalue Computations
Eugene Vecharynski, Lawrence Berkeley National Laboratory, USA; Ruipeng Li, Lawrence Livermore National Laboratory, USA; Yuanzhe Xi, University of Minnesota, USA; Chao Yang, Lawrence Berkeley National Laboratory, USA; Yousef Saad, University of Minnesota, USA

Compressed Hierarchical Schur Linear System Solver for 3D Photonic Device Analysis with GPU Accelerations
Cheng-Han Du and Weichung Wang, National Taiwan University, Taiwan

Parallel Adaptive and Robust Multigrid
Gabriel Wittum, University of Frankfurt, Germany

High Performance Eigenvalue Solver for Hubbard Model on CPU-GPU Hybrid Platform
Susumu Yamada, Japan Atomic Energy Agency, Japan; Toshiyuki Imamura, RIKEN, Japan; Masahiko Machida, Japan Atomic Energy Agency, Japan


Thursday, April 14

Registration
7:45 AM-5:15 PM
Room: Entrance Hall

IP4
Next-Generation AMR
8:30 AM-9:15 AM
Room: Farabeuf
Chair: Costas Bekas, IBM Research-Zurich, Switzerland

Block-structured adaptive mesh refinement (AMR) is a powerful tool for improving the computational efficiency and reducing the memory footprint of structured-grid numerical simulations. AMR techniques have been used for over 25 years to solve increasingly complex problems. I will give an overview of what we are doing in Berkeley Lab's AMR framework, BoxLib, to address the challenges of next-generation multicore architectures and the complexity of multiscale, multiphysics problems, including new ways of thinking about multilevel algorithms and new approaches to data layout and load balancing, in situ and in transit visualization and analytics, and run-time performance modeling and control.

Ann S. Almgren, Lawrence Berkeley National Laboratory, USA

Intermission
9:15 AM-9:25 AM

SP1
SIAG/Supercomputing Career Prize: Supercomputers and Superintelligence
9:25 AM-9:50 AM
Room: Farabeuf
Chair: Laura Grigori, Inria, France

In recent years the idea of emerging superintelligence has been discussed widely in the popular media, and many experts have voiced grave warnings about its possible consequences. This talk will use an analysis of progress in supercomputer performance to examine the gap between current technology and the capabilities of the human brain. In spite of good progress in high performance computing (HPC) and techniques such as machine learning, this gap is still very large. I will then explore two related topics through a discussion of recent examples: what can we learn from the brain and apply to HPC, e.g., through recent efforts in neuromorphic computing? And how much progress have we made in modeling brain function? The talk will conclude with my perspective on the true dangers of superintelligence, and on our ability to ever build self-aware or sentient computers.

Horst D. Simon, Lawrence Berkeley National Laboratory, USA

SIAG/Supercomputing Early Career Prize
9:55 AM-10:00 AM
Room: Farabeuf

Coffee Break
10:10 AM-10:35 AM
Room: Exhibition Space

MS41
Resilience Toward Exascale Computing - Part I of VI
10:35 AM-12:15 PM
Room: Farabeuf
For Part 2 see MS49

Future extreme scale systems are expected to suffer more frequent system faults resulting from the unprecedented scale of parallelism. Combined with technology trends such as the imbalance between computing and I/O throughput and a tight power budget in system operations, traditional hardware-level redundancy and checkpoint restart may not be a feasible solution. Instead, every layer of the system should address these faults to mitigate their impact in a holistic manner. In this minisymposium, we will address the deficiencies of current practice from a spectrum of high performance computing research spanning hardware, systems, runtime, algorithms, and applications.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Luc Giraud, Inria, France
Organizer: Emmanuel Agullo, Inria, France
Organizer: Francesco Rizzi, Sandia National Laboratories, USA
Organizer: Michael Heroux, Sandia National Laboratories, USA

10:35-10:55 Resilience Research is Essential but Failure is Unlikely
Michael Heroux, Sandia National Laboratories, USA

11:00-11:20 Memory Errors in Modern Systems
Steven Raasch and Vilas Sridharan, AMD, USA

11:25-11:45 Characterizing Faults on Production Systems
Martin Schulz, Lawrence Livermore National Laboratory, USA

11:50-12:10 The Missing High-Performance Computing Fault Model
Christian Engelmann, Oak Ridge National Laboratory, USA

Thursday, April 14

MS42
High-End Computing for Next-Generation Scientific Discovery - Part I of III
10:35 AM-12:15 PM
Room: Pasquier
For Part 2 see MS50

Computational modeling and simulation at large scale has become indispensable for science and engineering. However, the recent end of Dennard scaling is causing significant disruption to computing systems, resulting in increasingly complex and heterogeneous HPC platforms. In this minisymposium, we discuss the critical components for effectively building and leveraging HPC systems. We begin by examining the world's most powerful supercomputers and discussing their architectural tradeoffs. Next we present key application drivers, whose science depends on the successful deployment of future exascale systems. Finally, we explore emerging technology solutions that have the potential to enable unprecedented next-generation computing capability.

Organizer: Leonid Oliker, Lawrence Berkeley National Laboratory, USA
Organizer: Rupak Biswas, NASA Ames Research Center, USA
Organizer: David Donofrio, Lawrence Berkeley National Laboratory, USA

10:35-10:55 From Titan to Summit: Hybrid Multicore Computing at ORNL
Arthur S. Bland, Oak Ridge National Laboratory, USA

11:00-11:20 Sequoia to Sierra: The LLNL Strategy
Bronis R. de Supinski, Lawrence Livermore National Laboratory, USA

11:25-11:45 Hazelhen – Leading HPC Technology and its Impact on Science in Germany and Europe
Thomas Bönisch and Michael Resch, University of Stuttgart, Germany

11:50-12:10 Lessons Learned from Development and Operation of the K Computer
Fumiyoshi Shoji, RIKEN, Japan

MS43
DOE FASTMath Numerical Software on Next-generation Computers - Part I of II
10:35 AM-12:15 PM
Room: Roussy
For Part 2 see MS51

This two-part minisymposium series focuses on algorithms and software developed by the DOE FASTMath SciDAC team to improve the reliability and robustness of application codes. We describe advances in the scalable implementation of unstructured mesh techniques and partitioning libraries, as well as linear and nonlinear solvers. We describe our experiences implementing advanced numerical software on the latest computer platforms, including techniques for achieving million-way parallelism, the use of heterogeneous nodes, and dragonfly networks. We give specific examples of the use of these libraries in large-scale scientific computing applications.

Organizer: Lori A. Diachin, Lawrence Livermore National Laboratory, USA

10:35-10:55 Lessons Learned in the Development of FASTMath Numerical Software for Current and Next Generation HPC
Lori A. Diachin, Lawrence Livermore National Laboratory, USA

11:00-11:20 Parallel Unstructured Mesh Generation/Adaptation Tools Used in Automated Simulation Workflows
Mark S. Shephard, Cameron Smith, Dan A. Ibanez, Brian Granzow, and Onkar Sahni, Rensselaer Polytechnic Institute, USA; Kenneth Jansen, University of Colorado Boulder, USA

11:25-11:45 Partitioning and Task Placement with Zoltan2
Mehmet Deveci, Karen D. Devine, Erik G. Boman, Vitus Leung, Siva Rajamanickam, and Mark A. Taylor, Sandia National Laboratories, USA

11:50-12:10 Scalable Nonlinear Solvers and Preconditioners in Climate Applications
David J. Gardner, Lawrence Livermore National Laboratory, USA; Richard Archibald and Katherine J. Evans, Oak Ridge National Laboratory, USA; Paul Ullrich, University of California, Davis, USA; Carol S. Woodward, Lawrence Livermore National Laboratory, USA; Patrick H. Worley, Oak Ridge National Laboratory, USA

Thursday, April 14

MS44
Parallel Algorithms for Tensor Computations - Part I of II
10:35 AM-12:15 PM
Room: Salle des theses
For Part 2 see MS52

Tensors, or multidimensional arrays, are a natural way to represent high-dimensional data arising in a multitude of applications. Tensor decompositions, such as the CANDECOMP/PARAFAC and Tucker models, help to identify latent structure, achieve data compression, and enable other tools of data analysis. This minisymposium identifies the key operations in related algorithms for tensor decomposition. The focus of the minisymposium is on recent developments in efficient, parallel algorithms and implementations for handling large tensors. The eight talks cover three key operations in different shared-memory and distributed-memory parallel computing environments and investigate applications in scientific computing and recommender systems. (The CP model named above is recalled after the listings below.)

Organizer: Bora Ucar, LIP-ENS Lyon, France
Organizer: Grey Ballard, Sandia National Laboratories, USA
Organizer: Tamara G. Kolda, Sandia National Laboratories, USA

10:35-10:55 Parallel Tensor Compression for Large-Scale Scientific Data
Woody N. Austin, University of Texas at Austin, USA; Grey Ballard and Tamara G. Kolda, Sandia National Laboratories, USA

11:00-11:20 Parallelizing and Scaling Tensor Computations
Muthu M. Baskaran, Benoit Meister, and Richard Lethin, Reservoir Labs, USA

11:25-11:45 Coupled Matrix and Tensor Factorizations: Models, Algorithms, and Computational Issues
Evrim Acar, University of Copenhagen, Denmark

11:50-12:10 Parallel Algorithms for Tensor Completion and Recommender Systems
Lars Karlsson, Umeå University, Sweden; Daniel Kressner, EPFL, Switzerland; Andre Uschmajew, University of Bonn, Germany

MS45
The Present and Future Bulk Synchronous Parallel Computing - Part III of III
10:35 AM-12:15 PM
Room: Marie Curie
For Part 2 see MS37

The Bulk Synchronous Parallel (BSP) bridging model, proposed by Valiant in 1990, has had great influence on algorithm and software design over the past decades. It has inspired programming frameworks and the structured parallel programming of communication-minimising algorithms. In the current era of post-petascale HPC and of big data computing, BSP has again led to successful programming frameworks such as MapReduce, Pregel, and Giraph. This minisymposium explores the use of BSP in contemporary high performance and big data computing, highlighting new BSP algorithms and frameworks, new BSP-like models, and BSP-inspired auto-generation and verification tools.

Organizer: Albert-Jan N. Yzelman, Huawei Technologies, France
Organizer: Rob H. Bisseling, Utrecht University, The Netherlands

10:35-10:55 New Directions in BSP Computing
Bill McColl, Huawei Technologies, France

11:00-11:20 Multi-ML: Functional Programming of Multi-BSP Algorithms
Victor Allombert, Université Paris-Est Créteil, France; Julien Tesson, Université Paris-Est, France; Frédéric Gava, Universite de Paris-Est, France

11:25-11:45 Parallelization Strategies for Distributed Machine Learning
Eric Xing, Carnegie Mellon University, USA

11:50-12:10 Panel: The Future of BSP Computing
Albert-Jan N. Yzelman, Huawei Technologies, France
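As flagged in the MS44 description above, here for reference is the CANDECOMP/PARAFAC (CP) model for a three-way tensor. This is standard notation (the factor vectors a_r, b_r, c_r are generic symbols), not drawn from any particular talk.

% CP decomposition: a rank-R approximation of a 3-way tensor X as a
% sum of outer products of factor vectors a_r, b_r, c_r; elementwise,
% each entry is a sum of triple products of factor-matrix entries.
\[
  \mathcal{X} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r,
  \qquad
  x_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr}
\]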

Thursday, April 14

MS46
Parallel Solvers for High Performance Computing - Part I of II
10:35 AM-12:15 PM
Room: Leroux
For Part 2 see MS54

Large-scale scientific computing is demanded by the most complex applications, which are generally multi-scale, multi-physics, non-linear, and/or transient in nature. In the past, improving performance was only a matter of waiting for the next generation of processors; as a consequence, research on solvers would take second place to the search for new discretisation schemes. But since approximately 2005, the increase in performance is almost entirely due to the increase in the number of cores per processor: to keep doubling performance, parallelism must double. Tailored algorithms for numerical simulations are therefore of paramount importance.

Organizer: Victorita Dolean, University of Nice, France

10:35-10:55 Linear and Non-Linear Domain Decomposition Methods for Coupled Problems
Rolf Krause, Università della Svizzera Italiana, Italy; Erich Foster, USI, Switzerland; Marco Favino, University of Lugano, Switzerland; Patrick Zulian, USI, Switzerland

11:00-11:20 Improving Strong Scalability of Linear Solver for Reservoir Simulation
Tom B. Jonsthovel, Schlumberger, United Kingdom

11:25-11:45 PCBDDC: Robust BDDC Preconditioners in PETSc
Stefano Zampini, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

11:50-12:10 Solving Large Systems for the Elastodynamics Equation to Perform Acoustic Imaging
Dimitri Komatitsch, CNRS & Universite de Marseille, France; Paul Cristini, CNRS, France

Thursday, April 14

MS47
Parallel Methodologies for Large Data Assimilation and Inverse Problems
10:35 AM-12:15 PM
Room: Dejerine

Data assimilation is the process of fusing information from priors, models, and observations in order to obtain best estimates of the state of a physical system such as the atmosphere. Data assimilation relies on comprehensive physical models with large numbers of variables, and on complex observational data sets. The resulting large-scale inverse problems need to be solved faster than real time in order to, for example, meet the demands of weather forecasting. This minisymposium discusses the latest developments in parallel methodologies for solving large data assimilation problems.

Organizer: Adrian Sandu, Virginia Polytechnic Institute & State University, USA
Organizer: Marc Bocquet, Ecole Nationale des Ponts et Chaussées, France

10:35-10:55 From Data to Prediction with Quantified Uncertainties: Scalable Parallel Algorithms and Applications to the Flow of Ice Sheets
Omar Ghattas, University of Texas at Austin, USA; Tobin Isaac, University of Chicago, USA; Noemi Petra, University of California, Merced, USA; Georg Stadler, Courant Institute of Mathematical Sciences, New York University, USA

11:00-11:20 Parallel Aspects of Diffusion-Based Correlation Operators in Ensemble-Variational Data Assimilation
Anthony Weaver and Jean Tshimanga, CERFACS, France

11:25-11:45 Title Not Available
Mike Fisher, European Centre for Medium-Range Weather Forecasts, United Kingdom

11:50-12:10 Using Adjoint Based Numerical Error Estimates and Optimized Data Flows to Construct Surrogate Models
Abani Patra, State University of New York, Buffalo, USA

Thursday, April 14

MS48
Particle Method Based Applications and their Parallel Optimizations towards Exascale - Part I of II
10:35 AM-12:15 PM
Room: Delarue
For Part 2 see MS56

A large number of industrial problems can be modeled using particle-based methods. The recently published DOE exascale report identified particle-based methods as “well suited for exascale computing”: they provide extremely fine-grained parallelism and allow the exploitation of asynchrony. Efficient parallel particle method applications require the efficient implementation of domain decomposition, dynamic load balancing, optimized structured and unstructured communication, parallel I/O, neighbour-list searching, particle-to-mesh and mesh-to-particle interpolation, and sparse linear solvers. This minisymposium aims to identify common efficient implementations of these kernels and their optimizations across different applications for future exascale systems.

Organizer: Xiaohu Guo, Science and Technology Facilities Council, United Kingdom
Organizer: Tao Cui, Chinese Academy of Sciences, China

10:35-10:55 Applications of Smoothed Particle Hydrodynamics (SPH) for Real Engineering Problems: Particle Parallelisation Requirements and a Vision for the Future
Benedict Rogers, University of Manchester, United Kingdom
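One of the kernels named in the MS48 abstract above is neighbour-list searching. The following is a minimal sketch, assuming NumPy, of the standard cell-list technique: particles are binned into cells whose side equals the cutoff radius, so neighbours can only lie in adjacent cells. The point data and cutoff are made-up illustrations, not material from the session.

    # Cell-list neighbour search sketch in 2D (NumPy assumed).
    import numpy as np
    from collections import defaultdict

    def neighbour_pairs(pos, cutoff):
        """Return index pairs (i, j), i < j, with |pos[i] - pos[j]| < cutoff."""
        cells = defaultdict(list)
        for i, p in enumerate(pos):
            cells[tuple((p // cutoff).astype(int))].append(i)
        pairs = []
        for (cx, cy), members in cells.items():
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for j in cells.get((cx + dx, cy + dy), ()):
                        for i in members:
                            # i < j ensures each pair is counted exactly once.
                            if i < j and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                                pairs.append((i, j))
        return pairs

    rng = np.random.default_rng(0)
    points = rng.random((1000, 2))          # illustrative particle positions
    print(len(neighbour_pairs(points, 0.05)), "pairs within cutoff")

Compared with the naive all-pairs scan, the cost drops from O(n^2) to roughly O(n) for quasi-uniform particle distributions, which is why cell lists (and related Verlet lists) are a standard building block in parallel SPH codes.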

Thursday, April 14

MS48
Particle Method Based Applications and their Parallel Optimizations towards Exascale - Part I of II
10:35 AM-12:15 PM
Room: Delarue
continued

11:00-11:20 Parallel Application of Monte Carlo Particle Simulation JMCT in Nuclear Reactor Analysis
Gang Li, Institute of Applied Physics and Computational Mathematics, Beijing, P.R. China

11:25-11:45 Parallel PIC Simulation Based on Fast Electromagnetic Field Solver
Tao Cui, Chinese Academy of Sciences, China

11:50-12:10 A Hybrid OpenMP-MPI Parallelization for Variable Resolution Adaptive SPH with Fully Object-Oriented SPHysics
Renato Vacondio, University of Parma, Italy

Thursday, April 14

CP9
System-level Scalability
10:35 AM-12:15 PM
Room: Danton
Chair: Courtenay T. Vaughan, Sandia National Laboratories, USA

10:35-10:50 On the Path to Trinity - Experiences Bringing Code to the Next Generation ASC Capability Platform
Courtenay T. Vaughan and Simon D. Hammond, Sandia National Laboratories, USA

10:55-11:10 Implicitly Parallel Code Interfaces to Operational Quantum Computation
Steve P. Reinhardt, D-Wave Systems, Inc., USA

11:15-11:30 Investigating IBM Power8 System Configuration To Minimize Energy Consumption During the Execution of CoreNeuron Scientific Application
Fabien Delalondre and Timothee Ewart, École Polytechnique Fédérale de Lausanne, Switzerland; Pramod Shivaj and Felix Schuermann, EPFL, Switzerland

11:35-11:50 Scalable Parallel I/O Using HDF5 for HPC Fluid Flow Simulations
Christoph M. Ertl and Ralf-Peter Mundani, Technische Universität München, Germany; Jérôme Frisch, RWTH Aachen University, Germany; Ernst Rank, Technische Universität München, Germany

11:55-12:10 Quantifying Performance of a Resilient Elliptic PDE Solver on Uncertain Architectures Using SST/Macro
Karla Morris, Francesco Rizzi, Khachik Sargsyan, and Kathryn Dahlgren, Sandia National Laboratories, USA; Paul Mycek, Duke University, USA; Cosmin Safta, Sandia National Laboratories, USA; Olivier P. Le Maître, LIMSI-CNRS, France; Omar M. Knio, Duke University, USA; Bert J. Debusschere, Sandia National Laboratories, USA

Lunch Break
12:15 PM-1:45 PM
Attendees on their own

IP5
Beating Heart Simulation Based on a Stochastic Biomolecular Model
1:45 PM-2:30 PM
Room: Farabeuf
Chair: Naoya Maruyama, RIKEN, Japan

In my talk, I will introduce a new parallel computational approach indispensable to simulate a beating human heart driven by stochastic biomolecular dynamics. In this approach, we start from molecular-level modeling and directly simulate individual molecules in the sarcomere by the Monte Carlo (MC) method, thereby naturally handling the cooperative and stochastic molecular behaviors. We then embed these sarcomere MC samples into contractile elements in a continuum material model discretized by the finite element method. The clinical applications of our heart simulator are also introduced.

Takumi Washio, University of Tokyo, Japan

Intermission
2:30 PM-2:40 PM

Thursday, April 14

MS49
Resilience Toward Exascale Computing - Part II of VI
2:40 PM-4:20 PM
Room: Farabeuf
For Part 1 see MS41
For Part 3 see MS57

Future extreme-scale systems are expected to suffer more frequent system faults resulting from the unprecedented scale of parallelism. Combined with technology trends such as the imbalance between computing and I/O throughput and the tight power budget of system operations, traditional hardware-level redundancy and checkpoint/restart may not be a feasible solution. Instead, every layer of the system should address these faults to mitigate their impact in a holistic manner. In this minisymposium, we address the deficiencies of current practice from a spectrum of high-performance computing research: hardware, systems, runtime, algorithms and applications.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Luc Giraud, Inria, France
Organizer: Emmanuel Agullo, Inria, France
Organizer: Francesco Rizzi, Sandia National Laboratories, USA
Organizer: Michael Heroux, Sandia National Laboratories, USA

2:40-3:00 A Soft and Hard Faults Resilient Solver for 2D Elliptic PDEs Via Server-Client-Based Implementation
Francesco Rizzi and Karla Morris, Sandia National Laboratories, USA; Paul Mycek, Duke University, USA; Olivier P. Le Maître, LIMSI-CNRS, France; Andres Contreras, Duke University, USA; Khachik Sargsyan and Cosmin Safta, Sandia National Laboratories, USA; Omar M. Knio, Duke University, USA; Bert J. Debusschere, Sandia National Laboratories, USA

3:05-3:25 Fault-Tolerant Multigrid Algorithms Based on Error Confinement and Full Approximation
Mirco Altenbernd, Universität Stuttgart, Germany

3:30-3:50 Soft Error Detection for Finite Difference Codes on Regular Grids
Peter Arbenz and Manuel Kohler, ETH Zürich, Switzerland

3:55-4:15 Resilience for Multigrid at the Extreme Scale
Ulrich J. Ruede, University of Erlangen-Nuremberg, Germany; Markus Huber, Technische Universität München, Germany; Bjoern Gmeiner, University of Erlangen-Nürnberg, Germany; Barbara Wohlmuth, Technische Universität München, Germany

Thursday, April 14

MS50
High-End Computing for Next-Generation Scientific Discovery - Part II of III
2:40 PM-4:20 PM
Room: Pasquier
For Part 1 see MS42
For Part 3 see MS58

Computational modeling and simulation at large scale has become indispensable for science and engineering. However, the recent end of Dennard scaling is causing significant disruption to computing systems, resulting in increasingly complex and heterogeneous HPC platforms. In this minisymposium, we discuss the critical components for effectively building and leveraging HPC systems. We begin by examining the world's most powerful supercomputers and discussing their architectural tradeoffs. Next, we present key application drivers whose science depends on the successful deployment of future exascale systems. Finally, we explore emerging technology solutions that have the potential to enable unprecedented next-generation computing capability.

Organizer: Leonid Oliker, Lawrence Berkeley National Laboratory, USA
Organizer: Rupak Biswas, NASA Ames Research Center, USA

2:40-3:00 Extreme-Scale Genome Assembly
Leonid Oliker, Aydin Buluc, and Rob Egan, Lawrence Berkeley National Laboratory, USA; Evangelos Georganas, University of California, Berkeley, USA; Steven Hofmeyr, Lawrence Berkeley National Laboratory, USA; Daniel Rokhsar and Katherine Yelick, Lawrence Berkeley National Laboratory and University of California Berkeley, USA


Thursday, April 14

MS50
High-End Computing for Next-Generation Scientific Discovery - Part II of III
2:40 PM-4:20 PM
Room: Pasquier
continued

3:05-3:25 Research Applications and Analysis of a Fully Eddying Virtual Ocean
Chris Hill, Massachusetts Institute of Technology, USA

3:30-3:50 Progress in Performance Portability of the XGC Plasma Microturbulence Codes: Heterogeneous and Manycore Node Architectures
Eduardo F. D'Azevedo, Oak Ridge National Laboratory, USA; Mark Adams, Lawrence Berkeley National Laboratory, USA; Choong-Seock Chang, Stephane Ethier, Robert Hager, Seung-Hoe Ku, and Jianying Lang, Princeton Plasma Physics Laboratory, USA; Mark Shepard, Rensselaer Polytechnic Institute, USA; Patrick H. Worley, Oak Ridge National Laboratory, USA; Eisung Yoon, Rensselaer Polytechnic Institute, USA

3:55-4:15 Challenges in Adjoint-Based Aerodynamic Design for Unsteady Flows
Eric Nielsen, NASA Langley Research Center, USA

Thursday, April 14

MS51
DOE FASTMath Numerical Software on Next-generation Computers - Part II of II
2:40 PM-4:20 PM
Room: Roussy
For Part 1 see MS43

This two-part minisymposium series focuses on algorithms and software developed by the DOE FASTMath SciDAC team to improve the reliability and robustness of application codes. We describe advances in the scalable implementation of unstructured mesh techniques and partitioning libraries, as well as linear and nonlinear solvers. We describe our experiences implementing advanced numerical software on the latest computer platforms, including techniques for achieving million-way parallelism, the use of heterogeneous nodes, and dragonfly networks. We give specific examples of the use of these libraries in large-scale scientific computing applications.

Organizer: Lori A. Diachin, Lawrence Livermore National Laboratory, USA

2:40-3:00 An Extension of Hypre's Structured and Semi-Structured Matrix Classes
Ulrike M. Yang and Robert D. Falgout, Lawrence Livermore National Laboratory, USA

3:05-3:25 Unstructured Mesh Adaptation for MPI and Accelerators
Dan A. Ibanez, Rensselaer Polytechnic Institute, USA

3:30-3:50 Comparing Global Link Arrangements for Dragonfly Networks
Vitus Leung, Sandia National Laboratories, USA; David Bunde, Knox College, USA

3:55-4:15 Scaling Unstructured Mesh Computations
Kenneth Jansen, University of Colorado Boulder, USA; Jed Brown, Argonne National Laboratory and University of Colorado Boulder, USA; Michel Rasquin and Benjamin Matthews, University of Colorado Boulder, USA; Cameron Smith, Dan A. Ibanez, and Mark S. Shephard, Rensselaer Polytechnic Institute, USA


Thursday, April 14

MS52
Parallel Algorithms for Tensor Computations - Part II of II
2:40 PM-4:20 PM
Room: Salle des theses
For Part 1 see MS44

Tensors, or multidimensional arrays, are a natural way to represent high-dimensional data arising in a multitude of applications. Tensor decompositions, such as the CANDECOMP/PARAFAC and Tucker models, help to identify latent structure, achieve data compression, and enable other tools of data analysis. This minisymposium identifies the key operations in related algorithms for tensor decomposition. The focus of the minisymposium is on recent developments in efficient, parallel algorithms and implementations for handling large tensors. The eight talks cover three key operations in different shared-memory and distributed-memory parallel computing environments and investigate applications in scientific computing and recommender systems. (A schematic CP-ALS code sketch follows the MS53 listing below.)

Organizer: Grey Ballard, Sandia National Laboratories, USA
Organizer: Tamara G. Kolda, Sandia National Laboratories, USA
Organizer: Bora Ucar, LIP-ENS Lyon, France

2:40-3:00 High Performance Parallel Sparse Tensor Decompositions
Oguz Kaya, Inria and ENS Lyon, France; Bora Ucar, LIP-ENS Lyon, France

3:05-3:25 Efficient Factorization with Compressed Sparse Tensors
Shaden Smith, University of Minnesota, USA; George Karypis, University of Minnesota and Army HPC Research Center, USA

3:30-3:50 Low Rank Bilinear Algorithms for Symmetric Tensor Contractions
Edgar Solomonik, ETH Zürich, Switzerland

3:55-4:15 An Input-Adaptive and In-Place Approach to Dense Tensor-Times-Matrix Multiply
Jiajia Li, Casey Battaglino, Ioakeim Perros, Jimeng Sun, and Richard Vuduc, Georgia Institute of Technology, USA

Thursday, April 14

MS53
Large-scale Electronic Structure Calculation: Algorithms, Performance and Applications - Part I of III
2:40 PM-4:20 PM
Room: Marie Curie
For Part 2 see MS61

Electronic structure calculations and their applications are among the most challenging and computationally demanding science and engineering problems. This minisymposium aims at presenting and discussing new numerical and parallel processing avenues that are suitable for modern computing architectures, for achieving ever higher levels of accuracy and scalability in DFT, TDDFT and other types of ground- and excited-state simulations. We propose to bring together physicists and chemists who are involved in improving the numerical development of widely known quantum chemistry and solid-state physics application software packages, with mathematicians and computer scientists who are focusing on advancing the required state-of-the-art mathematical algorithms and parallel implementations.

Organizer: Chao Yang, Lawrence Berkeley National Laboratory, USA
Organizer: Eric Polizzi, University of Massachusetts, Amherst, USA

2:40-3:00 DGDFT: A Massively Parallel Electronic Structure Calculation Tool
Chao Yang, Lawrence Berkeley National Laboratory, USA
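As promised in the MS52 description above, here is a minimal sketch of the CANDECOMP/PARAFAC model fitted by alternating least squares (CP-ALS), assuming only NumPy. The tensor sizes, rank, and iteration count are illustrative assumptions; production codes such as those in this session use sparse data structures and parallel kernels.

    # Minimal dense CP-ALS sketch (NumPy assumed); no convergence checks.
    import numpy as np

    def unfold(T, mode):
        """Mode-n matricization of a 3-way tensor."""
        return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

    def khatri_rao(A, B):
        """Column-wise Kronecker product: (I x R) and (J x R) -> (I*J x R)."""
        R = A.shape[1]
        return (A[:, None, :] * B[None, :, :]).reshape(-1, R)

    def cp_als(T, rank, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        U = [rng.standard_normal((n, rank)) for n in T.shape]
        for _ in range(iters):
            for mode in range(3):
                a, b = [U[m] for m in range(3) if m != mode]
                KR = khatri_rao(a, b)              # ordering matches unfold()
                G = (a.T @ a) * (b.T @ b)          # R x R Gram matrix
                U[mode] = unfold(T, mode) @ KR @ np.linalg.pinv(G)
        return U

    # Sanity check on a random exactly-rank-3 tensor.
    rng = np.random.default_rng(1)
    A, B, C = (rng.standard_normal((n, 3)) for n in (6, 5, 4))
    T = np.einsum('ir,jr,kr->ijk', A, B, C)
    U = cp_als(T, rank=3)
    approx = np.einsum('ir,jr,kr->ijk', *U)
    # Error should become small after a few dozen sweeps (ALS can stall).
    print("relative error:", np.linalg.norm(T - approx) / np.linalg.norm(T))

The update for each factor solves a linear least-squares problem whose normal-equations matrix is the Hadamard product of the other factors' Gram matrices, which is what makes the per-iteration cost dominated by the tensor-times-Khatri-Rao product, one of the key kernels discussed in this session.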


Thursday, April 14

MS53
Large-scale Electronic Structure Calculation: Algorithms, Performance and Applications - Part I of III
2:40 PM-4:20 PM
Room: Marie Curie
continued

3:05-3:25 Design and Implementation of the PEXSI Solver Interface in SIESTA
Alberto Garcia, Institut de Ciencia de Materials de Barcelona (ICMAB-CSIC), Spain; Lin Lin, University of California, Berkeley and Lawrence Berkeley National Laboratory, USA; Georg Huhs, Barcelona Supercomputing Center, Spain; Chao Yang, Lawrence Berkeley National Laboratory, USA

3:30-3:50 Wavelets as a Basis Set for Electronic Structure Calculations: The BigDFT Project
Stefan Goedecker, University of Basel, Switzerland

3:55-4:15 BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions
Thierry Deutsch, CEA, France

Thursday, April 14

MS54
Parallel Solvers for High Performance Computing - Part II of II
2:40 PM-4:20 PM
Room: Leroux
For Part 1 see MS46

Large-scale scientific computing is demanded by the most complex applications, which are generally multi-scale, multi-physics, non-linear, and/or transient in nature. In the past, improving performance was only a matter of waiting for the next generation of processors; as a consequence, research on solvers would take second place to the search for new discretisation schemes. But since approximately 2005, the increase in performance is almost entirely due to the increase in the number of cores per processor: to keep doubling performance, parallelism must double. Tailored algorithms for numerical simulations are therefore of paramount importance.

Organizer: Victorita Dolean, University of Nice, France

2:40-3:00 Preconditioning of High Order Edge Elements Type Discretizations for the Time-Harmonic Maxwell's Equations
Victorita Dolean, University of Nice, France

3:05-3:25 Parallel Preconditioners for Problems Arising from Whole Microwave System Modelling for Brain Imaging
Pierre-Henri Tournier, Université Pierre et Marie Curie and Inria, France; Pierre Jolivet, CNRS, France; Frédéric Nataf, CNRS and UPMC, France

3:30-3:50 Block Iterative Methods and Recycling for Improved Scalability of Linear Solvers
Pierre Jolivet, CNRS, France; Pierre-Henri Tournier, Université Pierre et Marie Curie and Inria, France

3:55-4:15 Reliable Domain Decomposition Methods with Multiple Search
Nicole Spillane, Ecole Polytechnique, France; Christophe Bovet, CNRS, Université Paris-Saclay, France; Pierre Gosselet, LMT-Cachan, France; Daniel J. Rixen, Technische Universität München, Germany; François-Xavier Roux, Université Pierre et Marie Curie, France


Thursday, April 14 Thursday, April 14 3:30-3:50 Parallelization of Smoothed Particle Hydrodynamics for MS55 MS56 Engineering Applications with the Heterogeneous Architectures Auto-Tuning for the Post Particle Method Based Jose M. Domínguez Alonso, Moore’s Era - Part I of II Applications and their Universidade de Vigo, Spain 2:40 PM-4:20 PM Parallel Optimizations 3:55-4:15 Towards a Highly Scalable Particles Method Based Toolkit: Room:Dejerine towards Exascale - Part II of II Optimization for Real Applications For Part 2 see MS63 Xiaohu Guo, Science and Technology It is expected that the progress of 2:40 PM-4:20 PM Facilities Council, United Kingdom semiconductor technology will be Room:Delarue saturated by the end of 2020 due to physical limitations in device For Part 1 see MS48 miniaturization. As a result, no gains in A large number of industrial problems FLOPS computations will be achieved can be modeled using particle-based with a constant power budget. This methods, The recent published change is referred to as “Post Moore’s DOE exascale report identified the Era”. We anticipate that auto-tuning particles based methods as “well- (AT) technologies will still have suited for exascale computing”. The the ability of providing optimized, reason is that particles based methods high performance optimization for provide extremely fine-grained architectures of the Post Moore’s Era. parallelism and allow the exploitation This minisymposium will discuss AT of asynchrony. Efficient parallel frameworks and technologies that target particle method applications require the core of most large applications, the efficient implementation: domain including linear solvers, eigensolvers decomposition, dynamic load balancing, and stencil computations for new optimized structured and unstructured algorithm towards the Post Moore’s Era. communication, parallel I/O, neighbour lists searching, particle-to-mesh, Organizer: Takahiro Katagiri mesh-to-particle interpolation, sparse University of Tokyo, Japan linear solvers. The present proposal Organizer: Toshiyuki Imamura aims to identify the common efficient RIKEN, Japan implementation of the above kernels and their optimizations among different Organizer: Osni A. Marques applications for the future exascale Lawrence Berkeley National systems Laboratory, USA Organizer: Xiaohu Guo 2:40-3:00 Auto-Tuning of Hierarchical Science and Technology Facilities Computations with ppOpen-AT Council, United Kingdom Takahiro Katagiri, Masaharu Matsumoto, and Satoshi Ohshima, Organizer: Tao Cui University of Tokyo, Japan Chinese Academy of Sciences, China 3:05-3:25 Scaling Up Autotuning 2:40-3:00 H5hut: a High Performance Jeffrey Hollingsworth, University of Hdf5 Toolkit for Particle Simulations Maryland, USA Achim Gsell, PSI, USA 3:30-3:50 User-Defined Code 3:05-3:25 Weakly Compressible Transformation for High Performance Smooth Particle Hydrodynamics Portability Performance on the Intel Xeon Phi Hiroyuki Takizawa, Shoichi Hirasawa, Sergi-Enric Siso, Science and and Hiroaki Kobayashi, Tohoku Technology Facilities Council, United University, Japan Kingdom 3:55-4:15 Auto-Tuning for the SVD of Bidiagonals Osni A. Marques, Lawrence Berkeley National Laboratory, USA continued in next column 46 SIAM Conference on Parallel Processing for Scientific Computing

Thursday, April 14

CP10
Scalable Applications
2:40 PM-4:20 PM
Room: Danton
Chair: Axel Modave, Virginia Tech, USA

2:40-2:55 Efficient Solver for High-Resolution Numerical Weather Prediction Over the Alps
Zbigniew P. Piotrowski, Damian Wójcik, Andrzej Wyszogrodzki, and Milosz Ciznicki, Poznan Supercomputing and Networking Center, Poland

3:00-3:15 An Adaptive Multi-Resolution Data Model for Feature-Rich Environmental Simulations on Multiple Scales
Vasco Varduhn, Technische Universität München, Germany

3:20-3:35 An Overlapping Domain Decomposition Preconditioner for the Helmholtz Equation
Wei Leng, Chinese Academy of Sciences, China; Lili Ju, University of South Carolina, USA

3:40-3:55 Wave Propagation in Highly Heterogeneous Media: Scalability of the Mesh and Random Properties Generator
Regis Cottereau, CNRS, CentraleSupelec, Université Paris-Saclay, France; José Camata, Federal University of Rio de Janeiro, Brazil; Lucio de Abreu Corrêa and Luciano de Carvalho Paludo, CNRS, CentraleSupelec, Université Paris-Saclay, France; Alvaro Coutinho, Federal University of Rio de Janeiro, Brazil

4:00-4:15 GPU Performance Analysis of Discontinuous Galerkin Implementations for Time-Domain Wave Simulations
Axel Modave, Rice University, USA; Jesse Chan and Tim Warburton, Virginia Tech, USA

Thursday, April 14

MS57
Resilience Toward Exascale Computing - Part III of VI
4:50 PM-6:30 PM
Room: Farabeuf
For Part 2 see MS49
For Part 4 see MS65

Future extreme-scale systems are expected to suffer more frequent system faults resulting from the unprecedented scale of parallelism. Combined with technology trends such as the imbalance between computing and I/O throughput and the tight power budget of system operations, traditional hardware-level redundancy and checkpoint/restart may not be a feasible solution. Instead, every layer of the system should address these faults to mitigate their impact in a holistic manner. In this minisymposium, we address the deficiencies of current practice from a spectrum of high-performance computing research: hardware, systems, runtime, algorithms and applications.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Luc Giraud, Inria, France
Organizer: Emmanuel Agullo, Inria, France
Organizer: Francesco Rizzi, Sandia National Laboratories, USA
Organizer: Michael Heroux, Sandia National Laboratories, USA

4:50-5:10 Resilience for Asynchronous Runtime Systems and Algorithms
Marc Casas, Barcelona Supercomputing Center, Spain

5:15-5:35 Silent Data Corruption and Error Propagation in Sparse Matrix Computations
Luke Olson, Jon Calhoun, and Marc Snir, University of Illinois at Urbana-Champaign, USA

5:40-6:00 Experiments with FSEFI: The Fine-Grained Soft Error Fault Injector
Nathan A. DeBardeleben and Qiang Guan, Los Alamos National Laboratory, USA

6:05-6:25 What Error to Expect When You Are Expecting a Bit Flip
James Elliott, North Carolina State University, USA; Mark Hoemmen, Sandia National Laboratories, USA; Frank Mueller, North Carolina State University, USA

Coffee Break
4:20 PM-4:50 PM
Room: Exhibition Space

Thursday, April 14

MS58
High-End Computing for Next-Generation Scientific Discovery - Part III of III
4:50 PM-6:30 PM
Room: Pasquier
For Part 2 see MS50

Computational modeling and simulation at large scale has become indispensable for science and engineering. However, the recent end of Dennard scaling is causing significant disruption to computing systems, resulting in increasingly complex and heterogeneous HPC platforms. In this minisymposium, we discuss the critical components for effectively building and leveraging HPC systems. We begin by examining the world's most powerful supercomputers and discussing their architectural tradeoffs. Next, we present key application drivers whose science depends on the successful deployment of future exascale systems. Finally, we explore emerging technology solutions that have the potential to enable unprecedented next-generation computing capability.

Organizer: Leonid Oliker, Lawrence Berkeley National Laboratory, USA
Organizer: Rupak Biswas, NASA Ames Research Center, USA
Organizer: David Donofrio, Lawrence Berkeley National Laboratory, USA

4:50-5:10 Quantum Computing: Opportunities and Challenges
Rupak Biswas, NASA Ames Research Center, USA

5:15-5:35 Emerging Hardware Technologies for Exascale Systems
David Donofrio, Lawrence Berkeley National Laboratory, USA

5:40-6:00 Silicon Photonics for Extreme Scale Computing
Keren Bergman and Sebastien Rumley, Columbia University, USA

6:05-6:25 Task-Based Execution Models and Co-Design
Romain Cledat and Joshua Fryman, Intel Corporation, USA

Thursday, April 14

MS59
Nonlinear Preconditioning
4:50 PM-6:30 PM
Room: Roussy

Increasing local computational work and reducing communication are key ingredients for efficiently using future exascale machines. In Newton-Krylov methods using domain decomposition as a preconditioner, these aspects can mainly be treated at the level of solving the linear systems. Computational work can be localized and communication reduced by a reordering of operations: first decomposing the nonlinear problem, then linearizing it, leading to nonlinear domain decomposition. Nonlinear preconditioning can also be seen as a globalization strategy. Recent approaches to nonlinear domain decomposition methods and the composition of algebraic nonlinear solvers, as well as results on several hundred thousand cores, will be presented. (A toy code sketch of this reordering follows the MS60 listing below.)

Organizer: Axel Klawonn, Universitaet zu Koeln, Germany
Organizer: Oliver Rheinbach, Technische Universitaet Bergakademie Freiberg, Germany

4:50-5:10 A Framework for Nonlinear FETI-DP and BDDC Domain Decomposition Methods
Axel Klawonn, Universitaet zu Koeln, Germany; Martin Lanser, University of Cologne, Germany; Oliver Rheinbach, Technische Universitaet Bergakademie Freiberg, Germany

5:15-5:35 Scaling Nonlinear FETI-DP Domain Decomposition Methods to 786 452 Cores
Axel Klawonn, Universitaet zu Koeln, Germany; Martin Lanser, University of Cologne, Germany; Oliver Rheinbach, Technische Universitaet Bergakademie Freiberg, Germany

5:40-6:00 Convergence Estimates for Composed Iterations
Matthew G. Knepley, Rice University, USA

6:05-6:25 Additive and Multiplicative Nonlinear Schwarz
David E. Keyes and Lulu Liu, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

Thursday, April 14

MS60
Advances in Parallel Dense and Sparse Linear Algebra
4:50 PM-6:30 PM
Room: Salle des theses

Dense and sparse linear algebra are important kernels for many scientific problems. Over the past decade, important techniques, such as the use of runtime systems, have been developed for basic dense linear algebra problems, aiming to manage the increasing complexity of HPC resources. This minisymposium addresses recent progress on dense and sparse linear algebra issues that must be solved as we move towards ever more extreme-scale systems.

Organizer: Jonathan Hogg, Rutherford Appleton Laboratory, United Kingdom
Organizer: Lars Karlsson, Umeå University, Sweden

4:50-5:10 The Challenge of Strong Scaling for Direct Methods
Jonathan Hogg, Rutherford Appleton Laboratory, United Kingdom

5:15-5:35 NUMA-Aware Hessenberg Reduction in the Context of the QR Algorithm
Mahmoud Eljammaly and Lars Karlsson, Umeå University, Sweden

5:40-6:00 Low Rank Approximation of Sparse Matrices for LU Factorization Based on Row and Column Permutations
Laura Grigori and Sebastien Cayrols, Inria, France; James W. Demmel, University of California, Berkeley, USA

6:05-6:25 Assessing Recent Sparse Matrix Storage Formats for Parallel Sparse Matrix-Vector Multiplication
Weifeng Liu, University of Copenhagen, Denmark

Thursday, April 14

MS61
Large-scale Electronic Structure Calculation: Algorithms, Performance and Applications - Part II of III
4:50 PM-6:30 PM
Room: Marie Curie
For Part 1 see MS53
For Part 3 see MS69

Electronic structure calculations and their applications are among the most challenging and computationally demanding science and engineering problems. This minisymposium aims at presenting and discussing new numerical and parallel processing avenues that are suitable for modern computing architectures, for achieving ever higher levels of accuracy and scalability in DFT, TDDFT and other types of ground- and excited-state simulations. We propose to bring together physicists and chemists who are involved in improving the numerical development of widely known quantum chemistry and solid-state physics application software packages, with mathematicians and computer scientists who are focusing on advancing the required state-of-the-art mathematical algorithms and parallel implementations.

Organizer: Chao Yang, Lawrence Berkeley National Laboratory, USA
Organizer: Eric Polizzi, University of Massachusetts, Amherst, USA

4:50-5:10 High-Performance Real-Time TDDFT Excited-States Calculations
Eric Polizzi, University of Massachusetts, Amherst, USA

5:15-5:35 MOLGW: A Parallel Many-Body Perturbation Theory Code for Finite Systems
Fabien Bruneval, CEA, DEN, SRMP, France; Samia M. Hamed, Tonatiuh Rangel-Gordillo, and Jeffrey B. Neaton, University of California, Berkeley, USA

5:40-6:00 New Building Blocks for Scalable Electronic Structure Theory
Volker Blum, Duke University, USA

6:05-6:25 GPU-Accelerated Simulation of Nano-Devices from First Principles
Mathieu Luisier, Mauro Calderara, Sascha Brueck, Hossein Bani-Hashemian, and Joost VandeVondele, ETH Zürich, Switzerland

Thursday, April 14

MS62
Scalable Task-based Programming Models - Part I of II
4:50 PM-6:30 PM
Room: Leroux
For Part 2 see MS70

Programming models that enable users to write sequential code that can be efficiently executed in parallel, for example by a dedicated run-time system, on heterogeneous architectures are becoming ubiquitous. Such programming models provide an easy interface to developers while exploiting the performance of the underlying infrastructure. In this minisymposium, we explore the current state of the art of task-based programming models, the potential for unification, user interfaces, and special issues such as energy or resource awareness and resilience. The minisymposium is organized in two parts, with contributions from key players in task-based programming.

Organizer: Elisabeth Larsson, Uppsala University, Sweden
Organizer: Rosa M. Badia, Barcelona Supercomputing Center, Spain

4:50-5:10 A Single Interface for Multiple Task-Based Parallel Programming Paradigms/Schemes
Afshin Zafari and Elisabeth Larsson, Uppsala University, Sweden

5:15-5:35 Composition and Compartmentalisation As Enabling Features for Data-Centric, Extreme Scale Applications: An MPI-X Approach
Marco Aldinucci, Maurizio Drocco, and Claudia Misale, University of Torino, Italy; Massimo Torquati, University of Pisa, Italy

5:40-6:00 Task Dataflow Scheduling and the Quest for Extremely Fine-Grain Parallelism
Hans Vandierendonck and Mahwish Arif, Queen's University, Belfast, United Kingdom

6:05-6:25 Recent Advances in the Task-Based StarPU Runtime Ecosystem
Samuel Thibault, University of Bordeaux, France; Marc Sergent and Suraj Kumar, Inria, France; Luka Stanisic, INRIA Grenoble Rhône-Alpes, France
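To make the MS62 description concrete, the following is a minimal sketch, using only Python's standard concurrent.futures, of the task-based style: sequential-looking code submits tasks, and a runtime (here a simple thread pool) runs independent ones concurrently. The tiny sum-of-squares example is an illustrative assumption; dedicated runtimes such as those in this session build a full dependency graph instead of blocking on results.

    # Task-based programming sketch: tasks are submitted sequentially, and the
    # runtime extracts parallelism from their independence.
    from concurrent.futures import ThreadPoolExecutor

    def square(x):
        return x * x

    def add(a, b):
        return a + b

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Four independent tasks: the pool may run all of them concurrently.
        squares = [pool.submit(square, i) for i in range(4)]
        # Dependent tasks: consuming .result() expresses a data dependency
        # (full task runtimes track dependencies without blocking the
        # submitting thread, which this simple sketch does not).
        left = pool.submit(add, squares[0].result(), squares[1].result())
        right = pool.submit(add, squares[2].result(), squares[3].result())
        print("sum of squares 0..3 =", add(left.result(), right.result()))  # 14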

Thursday, April 14

MS63
Auto-Tuning for the Post Moore's Era - Part II of II
4:50 PM-6:30 PM
Room: Dejerine
For Part 1 see MS55

It is expected that progress in semiconductor technology will saturate by the end of 2020 due to physical limitations in device miniaturization. As a result, no gains in FLOPS will be achieved within a constant power budget. This change is referred to as the “Post Moore's Era”. We anticipate that auto-tuning (AT) technologies will still be able to provide optimized, high-performance code for architectures of the Post Moore's Era. This minisymposium will discuss AT frameworks and technologies that target the core of most large applications, including linear solvers, eigensolvers and stencil computations, with a view towards new algorithms for the Post Moore's Era.

Organizer: Toshiyuki Imamura, RIKEN, Japan
Organizer: Takahiro Katagiri, University of Tokyo, Japan
Organizer: Osni A. Marques, Lawrence Berkeley National Laboratory, USA

4:50-5:10 Auto-Tuning for Eigenvalue Solver on the Post Moore's Era
Toshiyuki Imamura, RIKEN, Japan

5:15-5:35 Hardware/Algorithm Co-Optimization and Co-Tuning in the Post Moore Era
Franz Franchetti, Carnegie Mellon University, USA

5:40-6:00 Automatic Tuning for Parallel FFTs on Intel Xeon Phi Clusters
Daisuke Takahashi, University of Tsukuba, Japan

6:05-6:25 On Guided Installation of Basic Linear Algebra Routines in Nodes with Manycore Components
Javier Cuenca, University of Murcia, Spain; Luis-Pedro García, Technical University of Cartagena, Spain; Domingo Giménez, University of Murcia, Spain

Thursday, April 14

MS64
Multi-Resolution Models for Environmental Simulations
4:50 PM-6:30 PM
Room: Delarue

Many important scientific and technological developments in the area of environmental simulations have resulted in an enormous increase in the availability of models on different scales. While such tremendous amounts of information become available, pure visual data exploration is not sufficient to gain insight. Hence, coupling multi-resolution models with numerical simulations is an indispensable step towards meaningful predictions of physical phenomena within modern engineering tasks. Important questions to be discussed in this minisymposium include the efficient coupling of numerical models on different scales and how a predictive quality of such coupled simulations can be obtained.

Organizer: Jérôme Frisch, RWTH Aachen University, Germany
Organizer: Ralf-Peter Mundani, Technische Universität München, Germany

4:50-5:10 Thermal Comfort Evaluation for Indoor Airflows Using a Parallel CFD Approach
Jérôme Frisch, RWTH Aachen University, Germany; Ralf-Peter Mundani, Technische Universität München, Germany

5:15-5:35 High-Performance Computing in Free Surface Fluid Flow Simulations
Nevena Perovic, Technische Universität München, Germany; Jérôme Frisch, RWTH Aachen University, Germany; Ralf-Peter Mundani and Ernst Rank, Technische Universität München, Germany

5:40-6:00 Fast Contact Detection for Granulates
Tomasz Koziara, Konstantinos Krestenitis, Jon Trevelyan, and Tobias Weinzierl, Durham University, United Kingdom

6:05-6:25 LB Particle Coupling on Dynamically-Adaptive Cartesian Grids
Michael Lahnert and Miriam Mehl, Universität Stuttgart, Germany

Thursday, April 14

CP11
Domain-specific Programming Tools
4:50 PM-5:50 PM
Room: Danton
Chair: Paul Springer, RWTH Aachen University, Germany

4:50-5:05 Maximizing CoreNeuron Application Performance on All CPU Architectures Using Cyme Library
Timothee Ewart, École Polytechnique Fédérale de Lausanne, Switzerland; Kai Langen, Pramod Kumbhar, and Felix Schuermann, EPFL, Switzerland; Fabien Delalondre, École Polytechnique Fédérale de Lausanne, Switzerland

5:10-5:25 Fresh: A Framework for Real-World Structured Grid Stencil Computations on Heterogeneous Platforms
Yang Yang, Aiqing Zhang, and Zhang Yang, Institute of Applied Physics and Computational Mathematics, China

5:30-5:45 TTC: A Compiler for Tensor Transpositions
Paul Springer and Paolo Bientinesi, RWTH Aachen University, Germany

Conference Dinner (separate fees apply)
7:30 PM-10:30 PM
Room: Offsite (Salons Hoche)

Friday, April 15

Registration
8:15 AM-5:15 PM
Room: Entrance Hall

IP6
Conquering Big Data with Spark
8:45 AM-9:30 AM
Room: Farabeuf
Chair: Rosa M. Badia, Barcelona Supercomputing Center, Spain

Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: extract “value” through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. To address this challenge, we have developed the Berkeley Data Analytics Stack (BDAS), an open source data analytics stack for big data processing.

Ion Stoica, University of California, Berkeley, USA

Intermission
9:30 AM-9:40 AM

SP2
SIAG/Supercomputing Best Paper Prize: Communication-Optimal Parallel and Sequential QR and LU Factorizations
9:40 AM-10:05 AM
Room: Farabeuf
Chair: Olaf Schenk, Università della Svizzera Italiana, Italy

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform and just as stable as Householder QR. We prove optimality by deriving new lower bounds for the number of multiplications done by “non-Strassen-like” QR, and using these in the known communication lower bounds that are proportional to the number of multiplications. We not only show that our QR algorithms attain these lower bounds (up to polylogarithmic factors), but that existing LAPACK and ScaLAPACK algorithms perform asymptotically more communication. We derive analogous communication lower bounds for LU factorization and point out recent LU algorithms in the literature that attain at least some of these lower bounds. The sequential and parallel QR algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example, up to 6.7 times over ScaLAPACK. A performance model for the parallel algorithm for general rectangular matrices predicts significant speedups over ScaLAPACK. Joint work with James W. Demmel, University of California, Berkeley, USA; Mark Hoemmen, Sandia National Laboratories, USA; Julien Langou, University of Colorado, Denver, USA.

Laura Grigori, Inria, France
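To make the communication-avoiding idea in the prize abstract above concrete, here is a minimal sketch (NumPy assumed) of the TSQR reduction for tall-skinny matrices: factor row blocks independently, then combine the small R factors pairwise up a binary tree. In a parallel run only the small R factors would ever be communicated. This is a schematic illustration under made-up sizes, not the prize-winning implementation itself.

    # TSQR sketch: independent local QRs, then a binary-tree reduction of R's.
    import numpy as np

    def tsqr_R(A, nblocks):
        """Return the R factor of A via tree reduction of block R factors."""
        Rs = [np.linalg.qr(blk, mode='r') for blk in np.array_split(A, nblocks)]
        while len(Rs) > 1:                     # one tree level per iteration
            pairs = zip(Rs[0::2], Rs[1::2])
            merged = [np.linalg.qr(np.vstack(p), mode='r') for p in pairs]
            if len(Rs) % 2:                    # odd block passes through
                merged.append(Rs[-1])
            Rs = merged
        return Rs[0]

    A = np.random.default_rng(0).standard_normal((4096, 8))  # tall and skinny
    R = tsqr_R(A, nblocks=8)
    R_ref = np.linalg.qr(A, mode='r')
    # R is unique up to the sign of each row, hence the abs comparison.
    print(np.allclose(np.abs(R), np.abs(R_ref), atol=1e-10))

With P blocks, the tree has log2(P) combine steps, each exchanging only an n-by-n triangle for an m-by-n matrix with m >> n, which is the source of the communication savings described in the abstract.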

Coffee Break
10:10 AM-10:35 AM
Room: Exhibition Space

Friday, April 15

MS65
Resilience Toward Exascale Computing - Part IV of VI
10:35 AM-12:15 PM
Room: Farabeuf
For Part 3 see MS57
For Part 5 see MS79

Future extreme-scale systems are expected to suffer more frequent system faults resulting from the unprecedented scale of parallelism. Combined with technology trends such as the imbalance between computing and I/O throughput and the tight power budget of system operations, traditional hardware-level redundancy and checkpoint/restart may not be a feasible solution. Instead, every layer of the system should address these faults to mitigate their impact in a holistic manner. In this minisymposium, we address the deficiencies of current practice from a spectrum of high-performance computing research: hardware, systems, runtime, algorithms and applications.

Organizer: Keita Teranishi, Sandia National Laboratories, USA
Organizer: Luc Giraud, Inria, France
Organizer: Emmanuel Agullo, Inria, France
Organizer: Francesco Rizzi, Sandia National Laboratories, USA
Organizer: Michael Heroux, Sandia National Laboratories, USA

10:35-10:55 Will Burst Buffers Save Checkpoint/Restart?
Kathryn Mohror, Lawrence Livermore National Laboratory, USA

11:00-11:20 From System-Level Checkpointing to User-Level Failure Mitigation in MPI-3
Camille Coti, Université Paris 13, France

11:25-11:45 A Flexible “Area-Time” Model for Analyzing and Managing Application Resilience
Andrew A. Chien, University of Chicago and Argonne National Laboratory, USA; Aiman Fang, University of Chicago, USA

11:50-12:10 Local Recovery at Extreme Scales with FenixLR: Implementation and Evaluation
Marc Gamell, Rutgers University, USA; Keita Teranishi and Michael Heroux, Sandia National Laboratories, USA; Manish Parashar, Rutgers University, USA

Friday, April 15

MS66
In Situ Methods and Infrastructure: Answers Without All the I/O
10:35 AM-12:15 PM
Room: Pasquier

Due to the widening gap between the FLOP and I/O capacity of HPC platforms, it is increasingly impractical for computer simulations to save full-resolution computations to disk for subsequent analysis. “In situ” methods offer hope for managing this increasingly acute problem by performing as much analysis, visualization, and related processing as possible while data is still resident in memory. While in situ methods are not new, they are presently the subject of much active R&D, which aims to produce novel algorithms and production-quality infrastructure deployed at HPC facilities for use by the scientific community.

Organizer: Wes Bethel, Lawrence Berkeley National Laboratory, USA

10:35-10:55 In Situ Processing Overview and Relevance to the HPC Community
E. Wes Bethel, Lawrence Berkeley National Laboratory, USA

11:00-11:20 Overview of Contemporary In Situ Infrastructure Tools and Architectures
Julien Jomier and Patrick O'Leary, Kitware, Inc., USA

11:25-11:45 Complex In Situ and In Transit Workflows
Matthew Wolf, Georgia Institute of Technology, USA

11:50-12:10 In Situ Algorithms and Their Application to Science Challenges
Venkatram Vishwanath, Argonne National Laboratory, USA


Friday, April 15

MS67
Parallel Programming Frameworks: Technologies, Performance and Applications - Part I of III
10:35 AM-12:15 PM
Room: Roussy
For Part 2 see MS75

Parallel programming frameworks play an important role in computational sciences. This minisymposium will focus on aspects of parallel programming frameworks, including key technologies, software design and applications. Major topics include programming models, data structures, load balancing algorithms, performance optimization techniques, and application developments. Both structured and unstructured grid frameworks are covered, with an emphasis on AMR capabilities. It brings together researchers from Europe, the US and China to share experiences and exchange ideas on designing, developing, optimizing and applying parallel programming frameworks.

Organizer: Ulrich J. Ruede, University of Erlangen-Nuremberg, Germany
Organizer: Zeyao Mo, CAEP Software Center for High Performance Numerical Simulations, China
Organizer: Zhang Yang, Institute of Applied Physics and Computational Mathematics, China

10:35-10:55 Exascale Software
Hans-Joachim Bungartz, Technische Universität München, Germany; Ulrich J. Ruede, University of Erlangen-Nuremberg, Germany

11:00-11:20 Progress on High Performance Programming Frameworks for Numerical Simulations
Zeyao Mo, CAEP Software Center for High Performance Numerical Simulations, China

11:25-11:45 Addressing the Future Needs of Uintah on Post-Petascale Systems
Martin Berzins, University of Utah, USA

11:50-12:10 Can We Really Taskify the World? Challenges for Task-Based Runtime Systems Designers
Raymond Namyst, University of Bordeaux, France

Friday, April 15

MS68
Scalable Network Analysis: Tools, Algorithms, Applications - Part I of II
10:35 AM-12:15 PM
Room: Salle des theses
For Part 2 see MS76

Graph analysis provides tools for analyzing the irregular data sets common in health informatics, computational biology, climate science, sociology, security, finance, and many other fields. These graphs possess different structures than typical finite element meshes. Scaling graph analysis to the scales of data being gathered and created has spawned many directions of exciting new research. This minisymposium includes talks on massive graph generation for testing and evaluating parallel algorithms, novel streaming techniques, and parallel graph algorithms for new and existing problems. It also covers existing parallel frameworks and interdisciplinary applications, e.g. the analysis of climate networks. (A schematic sparse-matrix BFS sketch follows the MS70 listing below.)

Organizer: Henning Meyerhenke, Karlsruhe Institute of Technology, Germany
Organizer: Jason Riedy, Georgia Institute of Technology, USA
Organizer: David A. Bader, Georgia Institute of Technology, USA

10:35-10:55 (Generating And) Analyzing Large Networks with NetworKit
Henning Meyerhenke, Elisabetta Bergamini, Moritz von Looz, Roman Prutkin, and Christian L. Staudt, Karlsruhe Institute of Technology, Germany

11:00-11:20 Generating and Traversing Large Graphs in External-Memory
Ulrich Meyer, Goethe University Frankfurt am Main, Germany


Friday, April 15

MS68
Scalable Network Analysis: Tools, Algorithms, Applications - Part I of II
10:35 AM-12:15 PM
Room: Salle des theses
continued

11:25-11:45 The GraphBLAS Effort: Kernels, API, and Parallel Implementations
Aydin Buluc, Lawrence Berkeley National Laboratory, USA

11:50-12:10 Parallel Tools to Reconstruct and Analyze Large Climate Networks
Hisham Ihshaish, University of the West of England, Bristol, United Kingdom; Alexis Tantet, University of Utrecht, Netherlands; Johan Dijkzeul, Delft University of Technology, Netherlands; Henk A. Dijkstra, Utrecht University, The Netherlands

Friday, April 15

MS69
Large-scale Electronic Structure Calculation: Algorithms, Performance and Applications - Part III of III
10:35 AM-12:15 PM
Room: Marie Curie
For Part 2 see MS61

Electronic structure calculations and their applications are among the most challenging and computationally demanding science and engineering problems. This minisymposium aims at presenting and discussing new numerical and parallel processing avenues that are suitable for modern computing architectures, for achieving ever higher levels of accuracy and scalability in DFT, TDDFT and other types of ground- and excited-state simulations. We propose to bring together physicists and chemists who are involved in improving the numerical development of widely known quantum chemistry and solid-state physics application software packages, with mathematicians and computer scientists who are focusing on advancing the required state-of-the-art mathematical algorithms and parallel implementations.

Organizer: Chao Yang, Lawrence Berkeley National Laboratory, USA
Organizer: Eric Polizzi, University of Massachusetts, Amherst, USA

10:35-10:55 Advancing Algorithms to Increase Performance of Electronic Structure Simulations on Many-Core Architectures
Wibe A. De Jong, Lawrence Berkeley National Laboratory, USA

11:00-11:20 A Parallel Algorithm for Approximating the Single Particle Density Matrix for O(N) Electronic Structure Calculations
Daniel Osei-Kuffuor and Jean-Luc Fattebert, Lawrence Livermore National Laboratory, USA

11:25-11:45 Parallel Eigensolvers for Density Functional Theory
Antoine Levitt, Inria and Ecole des Ponts ParisTech, France

11:50-12:10 Real-Valued Technique for Solving Linear Systems Arising in Contour Integral Eigensolver
Yasunori Futamura and Tetsuya Sakurai, University of Tsukuba, Japan

Friday, April 15

MS70
Scalable Task-based Programming Models - Part II of II
10:35 AM-12:15 PM
Room: Leroux
For Part 1 see MS62

Programming models that enable users to write sequential code that can be efficiently executed in parallel, for example by a dedicated run-time system, on heterogeneous architectures are becoming ubiquitous. Such programming models provide an easy interface to developers while exploiting the performance of the underlying infrastructure. In this minisymposium, we explore the current state of the art of task-based programming models, the potential for unification, user interfaces, and special issues such as energy or resource awareness and resilience. The minisymposium is organized in two parts, with contributions from key players in task-based programming.

Organizer: Elisabeth Larsson, Uppsala University, Sweden
Organizer: Rosa M. Badia, Barcelona Supercomputing Center, Spain

10:35-10:55 Developing Task-Based Applications with PyCOMPSs
Javier Conejero, Barcelona Supercomputing Center, Spain

11:00-11:20 Resilience in Distributed Task-Based Runtimes (2)
George Bosilca, University of Tennessee, Knoxville, USA

11:25-11:45 DuctTeip: A Framework for Distributed Hierarchical Task-Based Parallel Computing
Elisabeth Larsson and Afshin Zafari, Uppsala University, Sweden

11:50-12:10 Heterogeneous Computing with GPUs and FPGAs Through the OpenMP Task-Parallel Model
Artur Podobas, KTH Royal Institute of Technology, Sweden
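As promised in the MS68 description above, and in the spirit of its GraphBLAS talk, here is a minimal sketch of breadth-first search expressed as repeated sparse matrix-vector products, assuming SciPy. The toy graph is a made-up illustration; GraphBLAS itself defines these operations over general semirings rather than plain integer arithmetic.

    # BFS as sparse matrix-vector products (GraphBLAS-style), SciPy assumed:
    # the frontier is a Boolean vector; A.T @ frontier reaches one hop out.
    import numpy as np
    import scipy.sparse as sp

    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]   # toy directed graph
    n = 5
    rows, cols = zip(*edges)
    A = sp.csr_matrix((np.ones(len(edges), dtype=np.int8), (rows, cols)),
                      shape=(n, n))                    # A[u, v] = 1 for u -> v

    def bfs_levels(A, source):
        level = np.full(A.shape[0], -1)
        frontier = np.zeros(A.shape[0], dtype=bool)
        frontier[source] = True
        depth = 0
        while frontier.any():
            level[frontier] = depth
            # Next frontier: vertices one hop away, masked by "not visited".
            frontier = (np.asarray(A.T @ frontier) > 0) & (level == -1)
            depth += 1
        return level

    print(bfs_levels(A, source=0))   # expected levels: [0 1 1 2 3]

Casting graph traversal this way lets existing parallel sparse linear algebra kernels (and hardware tuned for them) do the heavy lifting, which is the core argument of the GraphBLAS effort.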

Friday, April 15

MS71
Scientific Computing Simulations for Subsurface Applications
10:35 AM-12:15 PM
Room: Dejerine

Simulation of underground flow and transport has several important applications, such as oil reservoirs, deep geological storage of nuclear waste or carbon dioxide, and geothermal energy. Common to these areas are the complexity of the coupled phenomena involved (multi-phase flow, chemistry, geomechanics, thermics, surface flow) and the heterogeneity of the geological medium over several scales. High performance computing has thus been recognized as one important way of increasing the realism of these simulations. This minisymposium will gather researchers whose collective experience covers a large part of the above domains, or who have developed tools enabling better, faster, or more detailed numerical simulations.

Organizer: Michel Kern, Inria & Maison de la Simulation, France
Organizer: Roland Masson, University of Nice, France
Organizer: Feng Xing, University of Nice Sophia-Antipolis and Inria, France

10:35-10:55 Hybrid Dimensional Multiphase Compositional Darcy Flows in Fractured Porous Media and Parallel Implementation in Code ComPASS
Feng Xing, University of Nice Sophia-Antipolis and Inria, France; Roland Masson, University of Nice, France

11:00-11:20 PFLOTRAN: Practical Application of High Performance Computing to Subsurface Simulation
Glenn Hammond, Sandia National Laboratories, USA

11:25-11:45 Parallel Performance of a Reservoir Simulator in the Context of Unstable Waterflooding
Romain De Loubens, Total, France; Pascal Henon, Total E&P, France; Pavel Jiranek, CERFACS, France

11:50-12:10 Ongoing Development and HPSC Applications with the ParFlow Hydrologic Model
Stefan J. Kollet, Agrosphere Institute, Germany; Klaus Goergen, Jülich Supercomputing Centre, Germany; Carsten Burstedde and Jose Fonseca, Universität Bonn, Germany; Fabian Gasper, Agrosphere Research Center Jülich, Germany; Reed M. Maxwell, Colorado School of Mines, USA; Prabhakar Shresta and Mauro Sulis, Universität Bonn, Germany

Friday, April 15

MS72
High Performance Parallel Adaptive Mesh Refinement Methods
10:35 AM-12:15 PM
Room: Delarue

In the past years, the wider availability of large parallel computational resources has helped spread parallel Adaptive Mesh Refinement (AMR) across a large range of applications that use scientific computing. We propose in this minisymposium to present a sample of these applications. We shall focus on the specific AMR method that was chosen, its implementation in a parallel framework, and how these choices impact the discretization strategy.

Organizer: Samuel Kokh, Maison de la Simulation, France

10:35-10:55 BB-AMR and Numerical Entropy Production
Frédéric Golay, Universite de Toulon, France

11:00-11:20 Title Not Available
Pierre Kestener, Maison de la Simulation, France

11:25-11:45 Title Not Available
Donna Calhoun, Boise State University, USA

11:50-12:10 Title Not Available
Thierry Coupez, Ecole Centrale de Nantes, France


Friday, April 15

CP12
Advanced Simulations - I of III
10:35 AM-12:15 PM
Room: Danton
Chair: Sever Hirstoaga, Inria, France

10:35-10:50 Parallel in Time Solvers for DAEs
Matthieu Lecouvez, Robert D. Falgout, and Carol S. Woodward, Lawrence Livermore National Laboratory, USA

10:55-11:10 Implicitly Dealiased Convolutions for Distributed Memory Architectures
Malcolm Roberts, University of Strasbourg, France; John C. Bowman, University of Alberta, Canada

11:15-11:30 A Massively Parallel Simulation of the Coupled Arterial Cells in Branching Network
Mohsin A. Shaikh, Pawsey Supercomputing Center, Australia; Constantine Zakkaroff, University of Canterbury, New Zealand; Stephen Moore, University of Melbourne, Australia; Tim David, University of Canterbury, New Zealand

11:35-11:50 Load Balancing Issues in Particle In Cell Simulations
Arnaud Beck, Ecole Polytechnique, France; Julien Derouillat, Maison de la Simulation, France

11:55-12:10 Particle-in-Cell Simulations for Nonlinear Vlasov-Like Models
Sever Hirstoaga, Inria, France; Edwin Chacon-Golcher, Institute of Physics of the ASCR, Prague, Czech Republic

Lunch Break
12:15 PM-1:45 PM
Attendees on their own

Friday, April 15

IP7
A Systems View of High Performance Computing
1:45 PM-2:30 PM
Room: Farabeuf
Chair: Bruce Hendrickson, Sandia National Laboratories, USA

HPC systems comprise increasingly many fast components, all working together on the same problem concurrently. Yet, HPC efficient-programming dogma dictates that communicating is too expensive and should be avoided. Program ever more components to work together with ever less frequent communication?!? Seriously? I object! In other disciplines, deriving efficiency from large Systems requires an approach qualitatively different from that used to derive efficiency from small Systems. Should we not expect the same in HPC? Using Grappa, a latency-tolerant runtime for HPC systems, I will illustrate through examples that such a qualitatively different approach can indeed yield efficiency from many components without stifling communication.

Simon Kahan, Institute for Systems Biology and University of Washington, USA

Intermission
2:30 PM-2:40 PM

Friday, April 15

MS73
How Will Next-generation Architectures Change How We Reason About and Write Applications?
2:40 PM-4:20 PM
Room: Farabeuf

Both European and US research programs have ambitious energy goals for extreme-scale computing - 200 petaFLOPS at 10 MW to exaFLOPS at 20 MW. Achieving performance within energy constraints requires fundamentally new architectures, potentially including heterogeneous CPUs with network-on-chip, multi-level memories, and optical interconnects. While accelerators, for example, have started to influence programming models, many applications remain agnostic to energy and hardware trends. To what extent will achieving energy efficiency and utilizing new architectures be an algorithms problem? What can be achieved automatically at the runtime level versus the application level? Leading architecture researchers demystify emerging hardware trends into specific consequences for algorithm design.

Organizer: Jeremiah Wilke, Sandia National Laboratories, USA
Organizer: Arun Rodrigues, Sandia National Laboratories, USA

2:40-3:00 Helping Applications Reason About Next-Generation Architectures Through Abstract Machine Models
John Shalf and David Donofrio, Lawrence Berkeley National Laboratory, USA

3:05-3:25 How Will Application Developers Navigate the Cost/Performance Trade-Offs of New Multi-Level Memory Architectures?
Arun Rodrigues, Sandia National Laboratories, USA

3:30-3:50 When Is Energy Not Equal to Time? Understanding Application Energy Scaling with EAudit
Sudhakar Yalamanchili and Eric Anger, Georgia Institute of Technology, USA

3:55-4:15 Performance-Energy Trade-Offs in Reconfigurable Optical Data-Center Networks Using Silicon Photonics
Sebastien Rumley and Keren Bergman, Columbia University, USA

Friday, April 15

MS74
Parallel Tree-Code Algorithms for Large Scale and Multiphysics Simulations - Part I of II
2:40 PM-4:20 PM
Room: Pasquier
For Part 2 see MS82

This minisymposium aims to bring together computational scientists who work on large-scale tree-codes and their applications. Tree-codes are hierarchical data structures used for nonuniform discretization of partial differential equations; they are also used for integral-equation formulations and N-body methods. Designing efficient parallel tree-code algorithms that scale to thousands of processors is challenging. It is even harder to account for multiple scales and multiple physics, due to the different resolution requirements of the underlying spatiotemporal fields. As a result, load balancing, minimization of communication, and efficient memory utilization become even harder than in the single-field case (see also the complexity note following the MS75 listing below).

Organizer: Arash Bakhtiari
Technische Universitaet Muenchen, Germany

Organizer: George Biros
University of Texas at Austin, USA

2:40-3:00 A Distributed-Memory Advection-Diffusion Solver with Dynamic Adaptive Trees
Arash Bakhtiari, Technische Universitaet Muenchen, Germany; Dhairya Malhotra and George Biros, University of Texas at Austin, USA

3:05-3:25 A Volume Integral Equation Solver for Elliptic PDEs in Complex Geometries
Dhairya Malhotra and George Biros, University of Texas at Austin, USA

3:30-3:50 Efficient Parallel Simulation of Flows, Particles, and Structures
Miriam Mehl and Michael Lahnert, Universität Stuttgart, Germany; Benjamin Uekermann, Technische Universität München, Germany

3:55-4:15 Communication-Reducing and Communication-Avoiding Techniques for the Particle-in-Dual-Tree Storage Scheme
Tobias Weinzierl, Durham University, United Kingdom

MS75
Parallel Programming Frameworks: Technologies, Performance and Applications - Part II of III
2:40 PM-4:20 PM
Room: Roussy
For Part 1 see MS67
For Part 3 see MS83

Parallel programming frameworks play an important role in the computational sciences. This minisymposium will focus on aspects of parallel programming frameworks, including key technologies, software design and applications. Major topics include programming models, data structures, load balancing algorithms, performance optimization techniques, and application development. Both structured and unstructured grid frameworks are covered, with an emphasis on AMR capabilities. The minisymposium brings together researchers from Europe, the US and China to share experiences and exchange ideas on designing, developing, optimizing and applying parallel programming frameworks.

Organizer: Ulrich J. Ruede
University of Erlangen-Nuremberg, Germany

Organizer: Zeyao Mo
CAEP Software Center for High Performance Numerical Simulations, China

Organizer: Zhang Yang
Institute of Applied Physics and Computational Mathematics, China

2:40-3:00 Cello: An Extreme Scale AMR Framework, and Enzo-P, An Application for Astrophysics and Cosmology Built on Cello
Michael L. Norman, University of California, San Diego, USA

3:05-3:25 Dune - A Parallel PDE Framework
Christian Engwer, University of Münster, Germany; Steffen Müthing and Peter Bastian, Universität Heidelberg, Germany

3:30-3:50 Peta-Scale Simulations with the HPC Software Framework Walberla
Florian Schornbaum and Martin Bauer, University of Erlangen-Nuremberg, Germany; Christian Godenschwager, University of Erlangen-Nuremberg, Germany; Ulrich Rüde, University of Erlangen-Nuremberg, Germany

3:55-4:15 PHG: A Framework for Parallel Adaptive Finite Element Method
Deng Lin, Wei Leng, and Tao Cui, Chinese Academy of Sciences, China
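To put the scalability challenge of MS74 (and its continuation, MS82) in context, the complexity gain that motivates tree-codes can be stated in one line; this is a generic textbook estimate, not taken from any of the abstracts:

\[
O(N^2) \;\longrightarrow\; O(N \log N) \ \text{(Barnes-Hut)} \quad \text{or} \quad O(N) \ \text{(fast multipole method)},
\]

i.e., hierarchically clustering the N-body interactions reduces a force evaluation from quadratic to near-linear cost, at the price of a controllable far-field approximation error.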

Friday, April 15

MS76
Scalable Network Analysis: Tools, Algorithms, Applications - Part II of II
2:40 PM-4:20 PM
Room: Salle des theses
For Part 1 see MS68

Graph analysis provides tools for analyzing the irregular data sets common in health informatics, computational biology, climate science, sociology, security, finance, and many other fields. These graphs possess different structures than typical finite element meshes. Scaling graph analysis to the scale of the data being gathered and created has spawned many directions of exciting new research. This minisymposium includes talks on massive graph generation for testing and evaluating parallel algorithms, novel streaming techniques, and parallel graph algorithms for new and existing problems. It also covers existing parallel frameworks and interdisciplinary applications, e.g. the analysis of climate networks.

Organizer: Henning Meyerhenke
Karlsruhe Institute of Technology, Germany

Organizer: Jason Riedy
Georgia Institute of Technology, USA

Organizer: David A. Bader
Georgia Institute of Technology, USA

2:40-3:00 Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Jason Riedy and David A. Bader, Georgia Institute of Technology, USA

3:05-3:25 Communication Avoiding ILU Preconditioner
Sebastien Cayrols and Laura Grigori, Inria, France

3:30-3:50 Parallel Approximate Graph Matching
Fanny Dufossé, Inria Lille, France; Kamer Kaya, Sabanci University, Turkey; Bora Ucar, LIP-ENS Lyon, France

3:55-4:15 Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search
Grey Ballard, C. Seshadhri, Tamara G. Kolda, and Ali Pinar, Sandia National Laboratories, USA

MS77
Low-Rank Approximations for Fast Linear Algebra Computations - Part I of II
2:40 PM-4:20 PM
Room: Marie Curie
For Part 2 see MS85

Low-rank representations are a popular way of speeding up matrix algorithms. They can be used for designing fast matrix-vector products, direct solvers with linear or near-linear complexity, and robust preconditioners, in both the dense and the sparse case. Many different approaches, such as H-matrices, HSS representations, or the BLR format, are currently under study by different research groups. The minisymposium features researchers working on all these techniques. They will present recent algorithmic ideas as well as software packages.

Organizer: Francois-Henry Rouet
Lawrence Berkeley National Laboratory, USA

Organizer: Hatem Ltaief
King Abdullah University of Science & Technology (KAUST), Saudi Arabia

2:40-3:00 A Comparison of Parallel Rank-Structured Solvers
Francois-Henry Rouet, Lawrence Berkeley National Laboratory, USA; Patrick Amestoy, Université de Toulouse, France; Cleve Ashcraft, Livermore Software Technology Corporation, USA; Alfredo Buttari, CNRS, France; Pieter Ghysels, Lawrence Berkeley National Laboratory, USA; Jean-Yves L'Excellent, Inria-LIP-ENS Lyon, France; Xiaoye Sherry Li, Lawrence Berkeley National Laboratory, USA; Theo Mary, Universite de Toulouse, France; Clément Weisbecker, Livermore Software Technology Corporation, USA

3:05-3:25 Complexity and Performance of Algorithmic Variants of Block Low-Rank Multifrontal Solvers
Patrick Amestoy, ENSEEIHT-IRIT, France; Alfredo Buttari, CNRS, France; Jean-Yves L'Excellent, Inria-LIP-ENS Lyon, France; Theo Mary, Universite de Toulouse, France

3:30-3:50 Hierarchical Matrix Arithmetics on GPUs
Wajih Halim Boukaram, Hatem Ltaief, George M Turkiyyah, and David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

3:55-4:15 A Block Low-Rank Multithreaded Factorization for Dense BEM Operators
Julie Anton, Cleve Ashcraft, and Clément Weisbecker, Livermore Software Technology Corporation, USA
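To make the complexity claims of MS77 concrete (a generic illustration, not drawn from any particular talk): if a dense matrix admits an accurate rank-k factorization, its matrix-vector product can be applied through the factors,

\[
A \approx U V^{\top}, \quad U, V \in \mathbb{R}^{n \times k} \;\Longrightarrow\; Ax \approx U\,(V^{\top}x),
\]

which reduces the cost from $O(n^2)$ to $O(nk)$ for $k \ll n$. Hierarchical formats such as H-matrices, HSS and BLR apply this idea blockwise to obtain the linear or near-linear complexity solvers discussed in this session.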

Friday, April 15 3:30-3:50 Traversing a BLR Friday, April 15 Factorization Task Dag Based on a MS78 Fan-All Wraparound Map MS79 Julie Anton, Cleve Ashcraft, and Scheduling Techniques Clément Weisbecker, Livermore Resilience Toward Exascale in Parallel Sparse Matrix Software Technology Corporation, Computing - Part V of VI Solvers USA 2:40 PM-4:20 PM 2:40 PM-4:20 PM 3:55-4:15 Scheduling Sparse Room:Dejerine Symmetric Fan-Both Cholesky Room:Leroux Factorization For Part 4 see MS65 For Part 6 see MS87 In this minisymposium, we focus on Mathias Jacquelin, Esmond G. Ng, Future extreme scale systems are expected to the scheduling problem in parallel and Yili Zheng, Lawrence Berkeley suffer more frequent system faults resulting sparse matrix solvers. Scheduling is National Laboratory, USA; Katherine from the unprecedented scale of parallelism. likely to play a crucial role towards Yelick, Lawrence Berkeley National Combined with the technology trend such as solvers that are scalable on extremely Laboratory and University of the imbalance between computing and I/O large platforms. On such platforms, California Berkeley, USA throughput and a tight power budget in the communications will be relatively system operations, the traditional hardware- expensive and subject to performance level redundancy and checkpoint restart may variation. Overlapping them with not be a feasible solution. Instead, every layer computations becomes therefore of of the systems should address these faults prime importance. Efficient scheduling to mitigate their impact in holistic manners. frameworks are likely to be extremely In this minisymposium, we will address the valuable with respect to this objective, deficiencies of the current practice from a as well as tolerating with performance spectrum of high performance computing variability. This minisymposium will research from hardware, systems, runtime, focus on both static and dynamic algorithm and applications. scheduling techniques in the context of sparse direct solvers. Organizer: Keita Teranishi Organizer: Mathias Jacquelin Sandia National Laboratories, USA Lawrence Berkeley National Laboratory, Organizer: Luc Giraud USA Inria, France Organizer: Esmond G. Ng Organizer: Emmanuel Agullo Lawrence Berkeley National Laboratory, Inria, France USA Organizer: Francesco Rizzi 2:40-3:00 Optimizing Memory Usage Sandia National Laboratories, USA When Scheduling Tree-Shaped Task Graphs of Sparse Multifrontal Solvers Organizer: Michael Heroux Guillaume Aupy, ENS Lyon, France; Sandia National Laboratories, USA Lionel Eyraud-Dubois, Inria, France; 2:40-3:00 Soft Errors in PCG: Detection Loris Marchal, Centre National de and Correction la Recherche Scientifique, France; Emmanuel Agullo, Inria, France; Siegfried Frédéric Vivien, ENS Lyon, France; Cools, Antwerp University, Belgium; Luc Oliver Sinnen, University of Giraud, Inria, France; Wim Vanroose, Auckland, New Zealand Antwerp University, Belgium; Fatih 3:05-3:25 Impact of Blocking Yetkin, Inria, France Strategies for Sparse Direct Solvers on 3:05-3:25 Fault Tolerant Sparse Iterative Top of Generic Runtimes Solvers Gregoire Pichon, Inria, France; Mathieu Simon McIntosh-Smith, University of Faverge, Bordeaux INP, Inria, LaBRI, Bristol, United Kingdom France; Pierre Ramet, Université de Bordeaux, Inria, LaBRI, France; Jean 3:30-3:50 Silent Data Corruption in BLAS Roman, Inria, France Kernels Wilfried N. 
Gansterer, University of Vienna, Austria 3:55-4:15 Algorithm-Based Fault Tolerance of the Sakurai-Sugiura Method continued in next column Tetsuya Sakurai, Akira Imakura, and Yasunori Futamura, University of Tsukuba, Japan SIAM Conference on Parallel Processing for Scientific Computing 59
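The algorithm-based fault tolerance (ABFT) appearing in the last MS79 talk can be illustrated with the classical checksum construction of Huang and Abraham (a textbook sketch, not taken from the abstract): checksum rows carried through a matrix product remain consistent,

\[
\begin{pmatrix} A \\ e^{\top} A \end{pmatrix} B \;=\; \begin{pmatrix} C \\ e^{\top} C \end{pmatrix}, \qquad e = (1, \dots, 1)^{\top},
\]

so a silent error in the computed $C$ violates the column-sum identity $e^{\top} C = (e^{\top} A)\,B$ and can be detected; combining row and column checksums additionally allows a single corrupted entry to be located and corrected.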

Friday, April 15

MS80
Towards Reliable and Efficient Scientific Computing at Exascale - Part I of II
2:40 PM-4:20 PM
Room: Delarue
For Part 2 see MS88

Exascale supercomputers will bring huge computational power. Unfortunately, their architectures will have to be complex and heterogeneous to counteract the memory and power walls, so scientific codes designed for homogeneous supercomputers will perform poorly on them. Moreover, intensive floating-point computation and massive concurrency can degrade accuracy and reproducibility. There is therefore a strong need for tools that support the efficient and reliable porting of scientific codes. The aim of this minisymposium is to present research on performance characterization and on the verification of floating-point accuracy (a schematic sketch of the Monte Carlo arithmetic idea follows the MS81 listing below).

Organizer: Christophe Denis
CMLA, ENS de Cachan, France

2:40-3:00 Checking the Floating Point Accuracy of Scientific Code Through the Monte Carlo Arithmetic
Christophe Denis, CMLA, ENS de Cachan, France

3:05-3:25 Code Verification in CFD
Daniel Chauveheid, Bruyères-le-Châtel, France

3:30-3:50 Performance Study of Industrial Hydrocodes: Modeling, Analysis and Aftermaths on Innovative Computational Methods
Florian De Vuyst, Ecole Normale Supérieure de Cachan, France

3:55-4:15 Code Modernization by Means of Proto-Applications
Eric Petit, PRiSM - UVSQ, France

CP13
Advanced Simulations - II of III
2:40 PM-3:40 PM
Room: Danton
Chair: Francois-Xavier Roux, ONERA, France

2:40-2:55 Efficient Two-Level Parallel Algorithm Based on Domain Decomposition Method for Antenna Simulation
Francois-Xavier Roux, ONERA, France

3:00-3:15 GPU Acceleration of Polynomial Homotopy Continuation
Jan Verschelde and Xiangcheng Yu, University of Illinois, Chicago, USA

3:20-3:35 Adaptive Transpose Algorithms for Distributed Multicore Processors
John C. Bowman, University of Alberta, Canada; Malcolm Roberts, University of Strasbourg, France

Coffee Break
4:20 PM-4:50 PM
Room: Exhibition Space

MS81
Evolution of Co-Design and Scientific HPC Applications
4:50 PM-6:30 PM
Room: Farabeuf

Modernizing scientific codes to run performantly on multiple emerging architectures is a task of great complexity, and a major challenge for computational scientists today. Typically, the set of performance optimizations necessary for a code on a given platform will be distinct from those for another. To ameliorate this difficulty, we turn to abstraction layers and alternative programming models. This minisymposium will present experiments with both proxy applications and full applications, towards the goal of developing strategies for performance portability on current and future HPC platforms.

Organizer: Geoff Womeldorff
Los Alamos National Laboratory, USA

Organizer: Ian Karlin
Lawrence Livermore National Laboratory, USA

4:50-5:10 On the Performance and Productivity of Abstract C++ Programming Models?
Simon D. Hammond, Christian Trott, H. Carter Edwards, Jeanine Cook, Dennis Dinge, Mike Glass, Robert J. Hoekstra, Paul Lin, Mahesh Rajan, and Courtenay T. Vaughan, Sandia National Laboratories, USA

5:15-5:35 Demonstrating Advances in Proxy Applications Through Performance Gains and/or Performance Portable Abstractions: CoMD and Kripke with RAJA
Richard Hornung, Olga Pearce, Adam Kunen, Jeff Keasler, Holger Jones, Rob Neely, and Todd Gamblin, Lawrence Livermore National Laboratory, USA

5:40-6:00 Kokkos and Legion Implementations of the SNAP Proxy Application
Geoff Womeldorff, Ben Bergen, and Joshua Payne, Los Alamos National Laboratory, USA

6:05-6:25 Future Architecture Influenced Algorithm Changes and Portability Evaluation Using BLAST
Christopher Earl, Barna Bihari, Veselin Dobrev, Ian Karlin, Tzanio V. Kolev, Robert Rieben, and Vladimir Tomov, Lawrence Livermore National Laboratory, USA
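The Monte Carlo arithmetic named in the first MS80 talk can be sketched in a few lines of Python (a simplified illustration under our own assumptions; production tools instrument the compiler or binary rather than the source): each floating-point result is randomly perturbed within its rounding error, and the spread of the output over many runs reveals how many digits are actually significant.

import random

def mca(x, rel_eps=2.0**-24):
    # Perturb a result uniformly within its relative rounding error
    # (a simplified "random rounding" model of Monte Carlo arithmetic).
    return x * (1.0 + random.uniform(-rel_eps, rel_eps))

def dot_mca(xs, ys):
    # Dot product in which every intermediate operation is perturbed.
    s = 0.0
    for x, y in zip(xs, ys):
        s = mca(s + mca(x * y))
    return s

# A cancellation-prone example: the exact answer is 1.0, but the large
# intermediate values carry the perturbation, so the spread of the
# samples over repeated runs exposes the loss of significant digits.
samples = [dot_mca([1e8, 1.0, -1e8], [1.0, 1.0, 1.0]) for _ in range(1000)]
print(min(samples), max(samples))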


MS82
Parallel Tree-Code Algorithms for Large Scale and Multiphysics Simulations - Part II of II
4:50 PM-6:30 PM
Room: Pasquier
For Part 1 see MS74

This minisymposium aims to bring together computational scientists who work on large-scale tree-codes and their applications. Tree-codes are hierarchical data structures used for nonuniform discretization of partial differential equations; they are also used for integral-equation formulations and N-body methods. Designing efficient parallel tree-code algorithms that scale to thousands of processors is challenging. It is even harder to account for multiple scales and multiple physics, due to the different resolution requirements of the underlying spatiotemporal fields. As a result, load balancing, minimization of communication, and efficient memory utilization become even harder than in the single-field case.

Organizer: Arash Bakhtiari
Technische Universitaet Muenchen, Germany

Organizer: George Biros
University of Texas at Austin, USA

4:50-5:10 Recent Developments on Forests of Linear Trees
Carsten Burstedde and Johannes Holke, Universität Bonn, Germany

5:15-5:35 Exploiting Octree Meshes with a High Order Discontinuous Galerkin Method
Sabine P. Roller, Harald Klimach, and Jens Zudrop, Universitaet Siegen, Germany

5:40-6:00 Parallel Level-Set Algorithms on Tree Grids
Mohammad Mirzadeh, Massachusetts Institute of Technology, USA; Arthur Guittet, University of California, Santa Barbara, USA; Carsten Burstedde, Universität Bonn, Germany; Frederic G. Gibou, University of California, Santa Barbara, USA

6:05-6:25 A Sharp Numerical Method for the Simulation of Diphasic Flows
Maxime Theillard, University of California, San Diego, USA; Frederic Gibou, University of California, Santa Barbara, USA; David Saintillan, University of California, San Diego, USA; Arthur Guittet, University of California, Santa Barbara, USA


Friday, April 15

MS83
Parallel Programming Frameworks: Technologies, Performance and Applications - Part III of III
4:50 PM-6:05 PM
Room: Roussy
For Part 2 see MS75

Parallel programming frameworks play an important role in the computational sciences. This minisymposium will focus on aspects of parallel programming frameworks, including key technologies, software design and applications. Major topics include programming models, data structures, load balancing algorithms, performance optimization techniques, and application development. Both structured and unstructured grid frameworks are covered, with an emphasis on AMR capabilities. The minisymposium brings together researchers from Europe, the US and China to share experiences and exchange ideas on designing, developing, optimizing and applying parallel programming frameworks.

Organizer: Ulrich J. Ruede
University of Erlangen-Nuremberg, Germany

Organizer: Zeyao Mo
CAEP Software Center for High Performance Numerical Simulations, China

Organizer: Zhang Yang
Institute of Applied Physics and Computational Mathematics, China

4:50-5:10 Design and Implementation of Adaptive SpMV Library for Multicore and Manycore Architecture
Guangming Tan, Chinese Academy of Sciences, China

5:15-5:35 Towards Efficient Data Communications for Frameworks on Post-petascale NUMA-Multicore Systems
Zhang Yang and Aiqing Zhang, Institute of Applied Physics and Computational Mathematics, China

5:40-6:00 Optimal Domain Decomposition Strategies for Parallel Geometric Multigrid
Gaurav Saxena, Peter K. Jimack, and Mark A. Walkley, University of Leeds, United Kingdom

MS84
Accelerating HPC and Datacenter Workloads with FPGAs
4:50 PM-6:30 PM
Room: Salle des theses

The importance of power efficiency has been changing the architectural landscape of high performance computing systems. Heterogeneous computing with accelerators such as GPUs gained popularity thanks to its superior power efficiency for typical HPC workloads. Recent technological developments such as hardened floating point units, as well as the end of Moore's Law, however, are making reconfigurable chips such as FPGAs increasingly promising for a wide range of workloads in HPC and datacenter systems. In this minisymposium, we will discuss performance and programmability, from single FPGAs to large-scale multi-FPGA systems.

Organizer: Naoya Maruyama
RIKEN, Japan

4:50-5:10 Benchmarking FPGAs with OpenCL
Naoya Maruyama, RIKEN, Japan; Hamid Zohouri, Tokyo Institute of Technology, Japan; Aaron Smith, Microsoft Research, USA; Motohiko Matsuda, RIKEN Advanced Institute for Computational Science, Japan; Satoshi Matsuoka, Tokyo Institute of Technology, Japan

5:15-5:35 Architectural Possibilities of FPGA-Based High-Performance Custom Computing
Kentaro Sano, Tohoku University, Japan

5:40-6:00 Novo-G#: Strong Scaling Using FPGA-Centric Clusters with Direct and Programmable Interconnects
Martin Herbordt, Boston University, USA

6:05-6:25 Are Datacenters Ready for Reconfigurable Logic?
Aaron Smith, Microsoft Research, USA

MS85
Low-Rank Approximations for Fast Linear Algebra Computations - Part II of II
4:50 PM-6:30 PM
Room: Marie Curie
For Part 1 see MS77

Low-rank representations are a popular way of speeding up matrix algorithms. They can be used for designing fast matrix-vector products, direct solvers with linear or near-linear complexity, and robust preconditioners, in both the dense and the sparse case. Many different approaches, such as H-matrices, HSS representations, or the BLR format, are currently under study by different research groups. The minisymposium features researchers working on all these techniques. They will present recent algorithmic ideas as well as software packages.

Organizer: Francois-Henry Rouet
Lawrence Berkeley National Laboratory, USA

Organizer: Hatem Ltaief
King Abdullah University of Science & Technology (KAUST), Saudi Arabia

4:50-5:10 Exploiting H-Matrices in Sparse Direct Solvers
Gregoire Pichon, Inria, France; Eric F. Darve, Stanford University, USA; Mathieu Faverge, Bordeaux INP, Inria, LaBRI, France; Pierre Ramet, Université de Bordeaux, Inria, LaBRI, France; Jean Roman, Inria, France

5:15-5:35 Fast Updating of Skeletonization-Based Hierarchical Factorizations
Victor Minden, Anil Damle, Ken Ho, and Lexing Ying, Stanford University, USA

5:40-6:00 Algebraic and Geometric Low-Rank Approximations
Rio Yokota, Tokyo Institute of Technology, Japan; François-Henry Rouet and Xiaoye S. Li, Lawrence Berkeley National Laboratory, USA; David E. Keyes, King Abdullah University of Science & Technology (KAUST), Saudi Arabia

6:05-6:25 Parallel Performance of the Inverse FMM and Application to Mesh Deformation for Fluid-Structure Problems
Pieter Coulier, Chao Chen, and Eric F. Darve, Stanford University, USA

MS86
Code Generation for High Performance Geoscientific Simulation
4:50 PM-6:30 PM
Room: Leroux

The demands of ever-more realistic simulations, more sophisticated numerical techniques, and parallelism at multiple granularities lead to an explosion in the complexity of simulation software. Code generation approaches enable the complexity of these three layers to be decoupled, so that sophisticated techniques developed by experts in domain science, numerical analysis and computer science can be composed to create ever-more capable simulation toolchains. This minisymposium focuses on the development and use of these code generation techniques to achieve high performance in simulation, with a particular focus on the computational issues raised by geoscience applications.

Organizer: David Ham
Imperial College London, United Kingdom

Organizer: William Sawyer
Swiss Centre of Scientific Computing, Switzerland

4:50-5:10 Generating High Performance Finite Element Kernels Using Optimality Criteria
Fabio Luporini, David Ham, and Paul Kelly, Imperial College London, United Kingdom

5:15-5:35 Architecture-Agnostic Numerics in GridTools
Paolo Crosetto, École Polytechnique Fédérale de Lausanne, Switzerland

5:40-6:00 Kernel Optimization for High-Order Dynamic Rupture Simulation - Flexibility Vs. Performance
Carsten Uphoff and Michael Bader, Technische Universität München, Germany

6:05-6:25 Source-to-Source Translation for Code Validation and Acceleration of the ICON Atmospheric General Circulation Model
William Sawyer, Swiss Centre of Scientific Computing, Switzerland

MS87
Resilience Toward Exascale Computing - Part VI of VI
4:50 PM-6:30 PM
Room: Dejerine
For Part 5 see MS79

Future extreme-scale systems are expected to suffer more frequent system faults resulting from the unprecedented scale of parallelism. Combined with technology trends such as the imbalance between computing and I/O throughput and the tight power budget of system operations, traditional hardware-level redundancy and checkpoint restart may not be a feasible solution. Instead, every layer of the system should address these faults to mitigate their impact in a holistic manner. In this minisymposium, we will address the deficiencies of current practice from a spectrum of high performance computing research spanning hardware, systems, runtimes, algorithms and applications.

Organizer: Keita Teranishi
Sandia National Laboratories, USA

Organizer: Luc Giraud
Inria, France

Organizer: Emmanuel Agullo
Inria, France

Organizer: Francesco Rizzi
Sandia National Laboratories, USA

Organizer: Michael Heroux
Sandia National Laboratories, USA

4:50-5:10 FA-MPI: Fault-Aware and QoS-Oriented Scalable Message Passing
Anthony Skjellum, Auburn University, USA

5:15-5:35 Spare Node Substitution of Failed Node
Atsushi Hori, Kazumi Yoshinaga, and Yutaka Ishikawa, RIKEN Advanced Institute for Computational Science, Japan

5:40-6:00 Reinit: A Simple and Scalable Fault-Tolerance Model for MPI Applications
Ignacio Laguna, Lawrence Livermore National Laboratory, USA

6:05-6:25 Title Not Available
George Bosilca, University of Tennessee, Knoxville, USA

Friday, April 15

MS88
Towards Reliable and Efficient Scientific Computing at Exascale - Part II of II
4:50 PM-6:30 PM
Room: Delarue
For Part 1 see MS80

Exascale supercomputers will bring huge computational power. Unfortunately, their architectures will have to be complex and heterogeneous to counteract the memory and power walls, so scientific codes designed for homogeneous supercomputers will perform poorly on them. Moreover, intensive floating-point computation and massive concurrency can degrade accuracy and reproducibility. There is therefore a strong need for tools that support the efficient and reliable porting of scientific codes. The aim of this minisymposium is to present research on performance characterization and on the verification of floating-point accuracy.

Organizer: Christophe Denis
CMLA, ENS de Cachan, France

4:50-5:10 Cere: LLVM Based Codelet Extractor and Replayer for Piecewise Benchmarking and Optimization
Pablo De Oliveira Castro, PRiSM - UVSQ, France

5:15-5:35 Using Uncertainties in High Level Performance Prediction
Philippe Thierry, Intel Corporation, France

5:40-6:00 Performance Modeling of a Compressible Hydrodynamics Solver on Multicore CPUs
Mathieu Peybernes, CEA Saclay, France

6:05-6:25 Topology-Aware Performance Modeling of Parallel SpMVs
Amanda Bienz, Luke Olson, and William D. Gropp, University of Illinois at Urbana-Champaign, USA

CP14
Advanced Simulations - III of III
4:50 PM-6:10 PM
Room: Danton
Chair: Ioan C. Hadade, Imperial College London, United Kingdom

4:50-5:05 Robust Domain Decomposition Methods for Structural Mechanics Computations
Christophe Bovet, CNRS, Université Paris-Saclay, France; Augustin Parret-Fréaud, Safran Tech, France; Pierre Gosselet, LMT-Cachan, France; Nicole Spillane, Ecole Polytechnique, France

5:10-5:25 A Parallel Contact-Impact Simulation Framework
Jie Cheng and Qingkai Liu, Institute of Applied Physics and Computational Mathematics, China

5:30-5:45 Optimization of the Photon's Simulation on GPU
Clement P. Rey, Benjamin Auer, Emmanuel Medernach, and Ziad El Bitar, CNRS, Université de Strasbourg, France

5:50-6:05 Turbomachinery CFD on Modern Multicore and Manycore Architectures
Ioan C. Hadade and Luca Di Mare, Imperial College London, United Kingdom

Conference Adjourns
6:30 PM



Abstracts

Abstracts are printed as submitted by the authors.



Speaker and Organizer Index

Italicized names indicate session organizers

A Ballard, Grey, MS44, 10:35 Thu Boman, Erik G., MS28, 2:40 Wed Abdulla, Ghaleb, MS10, 4:10 Tue Ballard, Grey, MS44, 10:35 Thu Boman, Erik G., MS28, 2:40 Wed Acar, Evrim, MS44, 11:25 Thu Ballard, Grey, MS52, 2:40 Thu Boman, Erik G., MS36, 4:50 Wed Adams, Mark, MS50, 3:30 Thu Basermann, Achim, PP1, 7:15 Wed Bonisch, Thomas, MS42, 11:25 Thu Agullo, Emmanuel, MS7, 1:10 Tue Baskaran, Muthu M., MS44, 11:00 Thu Booth, Joshua D., MS36, 4:50 Wed Agullo, Emmanuel, MS15, 3:20 Tue Bauer, Martin, MS24, 10:35 Wed Borrell, Ricard, PP1, 7:15 Wed Agullo, Emmanuel, MS14, 3:45 Tue Bauer, Martin, MS24, 10:35 Wed Bosilca, George, MS70, 11:00 Fri Agullo, Emmanuel, MS41, 10:35 Thu Beck, Arnaud, CP12, 11:35 Fri Bosilca, George, MS87, 6:05 Fri Agullo, Emmanuel, MS49, 2:40 Thu Behr, Marek, MS31, 2:40 Wed Bouteiller, Aurelien, MS7, 1:35 Tue Agullo, Emmanuel, MS57, 4:50 Thu Benedusi, Pietro, MS5, 1:10 Tue Bovet, Christophe, CP14, 4:50 Fri Agullo, Emmanuel, MS65, 10:35 Fri Bergman, Keren, MS58, 5:40 Thu Brandt, James, MS2, 1:10 Tue Agullo, Emmanuel, MS79, 2:40 Fri Berzins, Martin, MS18, 10:35 Wed Brandt, James, MS10, 3:20 Tue Agullo, Emmanuel, MS87, 4:50 Fri Berzins, Martin, MS26, 2:40 Wed Brandt, James, MS10, 3:20 Tue Al Daas, Hussam, MS4, 1:35 Tue Berzins, Martin, MS34, 4:50 Wed Breuer, Alexander, PP1, 7:15 Wed Aldinucci, Marco, MS62, 5:15 Thu Berzins, Martin, MS67, 11:25 Fri Brown, Jed, MS27, 2:40 Wed Allombert, Victor, MS45, 11:00 Thu Bethel, E. Wes, MS66, 10:35 Thu Brown, Jed, MS27, 2:40 Wed Almgren, Ann S., IP4, 8:30 Thu Bethel, Wes, MS66, 10:35 Fri Brown, Jed, MS35, 4:50 Wed Alonazi, Amani, CP7, 2:40 Wed Bhatele, Abhinav, MS2, 1:10 Tue Bruneval, Fabien, MS61, 5:15 Thu Altenbernd, Mirco, MS49, 3:05 Thu Bhatele, Abhinav, MS2, 1:10 Tue Buecker, Martin, MS31, 3:55 Wed Alturkestani, Tariq L., CP5, 9:50 Wed Bhatele, Abhinav, MS10, 3:20 Tue Buluc, Aydin, MS3, 1:10 Tue Anderson, Jeffrey, MS39, 4:50 Wed Bhowmick, Sanjukta, MS20, 10:35 Wed Buluc, Aydin, MS11, 3:20 Tue Anton, Julie, MS77, 3:55 Fri Bienz, Amanda, MS88, 6:05 Fri Buluc, Aydin, MS11, 4:35 Tue Anzt, Hartwig, MS9, 3:45 Tue Bilardi, Gianfranco, MS12, 4:35 Tue Buluc, Aydin, MS19, 10:35 Wed Arbenz, Peter, MS49, 3:30 Thu Biros, George, MS26, 3:30 Wed Buluc, Aydin, MS68, 11:25 Fri Arteaga, Andrea, MS16, 4:10 Tue Biros, George, MS74, 2:40 Fri Bungartz, Hans-Joachim, MS67, 10:35 Fri Ashcraft, Cleve, MS78, 3:30 Fri Biros, George, MS82, 4:50 Fri Burstedde, Carsten, MS18, 10:35 Wed Avron, Haim, MS36, 6:05 Wed Bischof, Christian H., MS31, 2:40 Wed Burstedde, Carsten, MS26, 2:40 Wed Aykanat, Cevdet, PP1, 7:15 Wed Bisseling, Rob H., MS29, 2:40 Wed Burstedde, Carsten, MS34, 4:50 Wed Azad, Ariful, MS3, 2:00 Tue Bisseling, Rob H., MS37, 4:50 Wed Bisseling, Rob H., MS37, 5:40 Wed Burstedde, Carsten, MS82, 4:50 Fri B Bisseling, Rob H., MS45, 10:35 Thu Buttari, Alfredo, MS7, 1:10 Tue Bader, David A., MS68, 10:35 Fri Biswas, Rupak, MS42, 10:35 Thu Buttari, Alfredo, MS15, 3:20 Tue Bader, David A., MS76, 2:40 Fri Biswas, Rupak, MS50, 2:40 Thu Bader, Michael, MS18, 10:35 Wed C Biswas, Rupak, MS58, 4:50 Thu Calhoun, Donna, MS26, 2:40 Wed Bader, Michael, MS18, 11:00 Wed Biswas, Rupak, MS58, 4:50 Thu Calhoun, Donna, MS72, 11:25 Fri Bader, Michael, MS26, 2:40 Wed Blaheta, Radim, MS30, 3:30 Wed Calvin, Christophe, MS17, 11:50 Wed Bader, Michael, MS34, 4:50 Wed Bland, Arthur S., MS42, 10:35 Thu Canales, Rodrigo, PP1, 7:15 Wed Badia, Rosa M., MS7, 1:10 Tue Blum, Volker, MS61, 5:40 Thu Carson, Erin C., MS4, 1:10 Tue Badia, Rosa M., MS62, 4:50 Thu Bocquet, Marc, MS39, 4:50 Wed Carson, Erin 
C., MS4, 1:10 Tue Badia, Rosa M., MS70, 10:35 Fri Bocquet, Marc, MS47, 10:35 Thu Carson, Erin C., MS12, 3:20 Tue Bakhtiari, Arash, MS74, 2:40 Fri Boillot, Lionel, MS7, 2:00 Tue Casas, Marc, MS57, 4:50 Thu Bakhtiari, Arash, MS74, 2:40 Fri Castain, Ralph, MS25, 3:05 Wed Bakhtiari, Arash, MS82, 4:50 Fri


Catalyurek, Umit V., MS19, 10:35 Denis, Christophe, MS88, 4:50 Fri Fontenaille, Clement, MS6, 2:00 Tue Wed Desai, Ajit, CP4, 9:50 Wed Foster, Erich, MS46, 10:35 Thu Cayrols, Sebastien, MS76, 3:05 Fri Desroziers, Gerald, MS39, 6:10 Wed Franchetti, Franz, MS63, 5:15 Thu Charrier, Dominic E., MS18, 11:50 Detrixhe, Miles L., PP1, 7:15 Wed Friedhoff, Stephanie, MS21, 11:00 Wed Wed Deutsch, Thierry, MS53, 3:55 Thu Frisch, Jérôme, MS64, 4:50 Thu Chauveheid, Daniel, MS80, 3:05 Fri Deveci, Mehmet, MS19, 11:50 Wed Frisch, Jérôme, MS64, 4:50 Thu Chen, Chao, MS28, 3:55 Wed Deveci, Mehmet, MS43, 11:25 Thu Fujii, Akihiro, MS1, 2:00 Tue Chen, Langshi, MS1, 2:25 Tue Di Napoli, Edoardo A., PP1, 7:15 Wed Fukaya, Takeshi, PP1, 7:15 Wed Cheng, Jie, CP14, 5:10 Fri Diachin, Lori A., MS43, 10:35 Thu Futamura, Yasunori, MS69, 11:50 Fri Chien, Andrew A., MS65, 11:25 Fri Diachin, Lori A., MS43, 10:35 Thu Chouly, Franz, MS13, 4:10 Tue Diachin, Lori A., MS51, 2:40 Thu G Chow, Edmond, MS11, 1:10 Tue Gadou, Mohamed, CP6, 11:15 Wed Dipper, Nigel, MS40, 5:15 Wed Ciegas, Raimondas, MS38, 5:40 Wed Galicher, Herve, MS17, 11:25 Wed Dolean, Victorita, MS46, 10:35 Thu Cledat, Romain, MS58, 6:05 Thu Galuzzi, Carlo, MS31, 3:05 Wed Dolean, Victorita, MS54, 2:40 Thu Cojean, Terry, MS15, 4:35 Tue Gamblin, Todd, MS2, 1:10 Tue Dolean, Victorita, MS54, 2:40 Thu Conejero, Javier, MS70, 10:35 Fri Gamblin, Todd, MS10, 3:20 Tue Domínguez Alonso, Jose M., MS56, 3:30 Cools, Siegfried, MS14, 3:20 Tue Gamblin, Todd, MS81, 5:15 Fri Thu Coteus, Paul, IP3, 1:45 Wed Gamell, Marc, MS65, 11:50 Fri Donfack, Simplice, MS22, 10:35 Wed Coti, Camille, MS65, 11:00 Fri Gansterer, Wilfried N., MS79, 3:30 Fri Donofrio, David, MS42, 10:35 Thu Cottereau, Regis, CP10, 3:40 Thu Garcia, Alberto, MS53, 3:05 Thu Donofrio, David, MS58, 4:50 Thu Coulier, Pieter, MS85, 6:05 Fri Gardner, David J., PP1, 7:15 Wed Donofrio, David, MS58, 5:15 Thu Coupez, Thierry, MS72, 11:50 Fri Gardner, David J., MS43, 11:50 Thu Dorostkar, Ali, CP6, 10:55 Wed Crosetto, Paolo, MS86, 5:15 Fri Gava, Frédéric, MS29, 3:30 Wed Dubey, Anshu, MS34, 5:40 Wed Cui, Tao, MS48, 10:35 Thu Gentile, Ann, MS2, 1:10 Tue Duff, Iain, IP1, 5:15 Tue Cui, Tao, MS48, 11:25 Thu Gentile, Ann, MS10, 3:20 Tue Cui, Tao, MS56, 2:40 Thu E Georganas, Evangelos, MS11, 4:10 Tue Edwards, H. Carter, MS9, 4:35 Tue Germann, Timothy C., CP1, 1:50 Tue D Edwards, H. Carter, MS81, 4:50 Fri Ghattas, Omar, MS23, 10:35 Wed Dave, Mukul, PP1, 7:15 Wed Edwards, James A., MS4, 2:25 Tue Ghattas, Omar, MS47, 10:35 Thu De Jong, Wibe A., MS69, 10:35 Fri Einkemmer, Lukas, PP1, 7:15 Wed Ghysels, Pieter, MS28, 3:05 Wed De Oliveira Castro, Pablo, MS88, 4:50 Elbern, Hendrik, MS39, 5:30 Wed Fri Giménez, Domingo, MS63, 6:05 Thu Eljammaly, Mahmoud, MS60, 5:15 Thu De Roure, David, MS40, 5:40 Wed Giraud, Luc, MS41, 10:35 Thu Elliott, James, MS57, 6:05 Thu de Supinski, Bronis R., MS42, 11:00 Giraud, Luc, MS49, 2:40 Thu Thu Emad, Nahid, MS1, 1:35 Tue Giraud, Luc, MS57, 4:50 Thu De Vuyst, Florian, MS80, 3:30 Fri Engelmann, Christian, MS41, 11:50 Thu Giraud, Luc, MS65, 10:35 Fri DeBardeleben, Nathan A., MS57, 5:40 Engwer, Christian, MS75, 3:05 Fri Giraud, Luc, MS79, 2:40 Fri Thu Erhel, Jocelyne, MS6, 2:25 Tue Giraud, Luc, MS87, 4:50 Fri Defour, David, MS16, 3:20 Tue Ertl, Christoph M., CP9, 11:35 Thu Goedecker, Stefan, MS53, 3:30 Thu Delalondre, Fabien, CP9, 11:15 Thu Evans, R. 
Todd, MS2, 1:35 Tue Golay, Frédéric, MS72, 10:35 Fri Demenkov, Max, CP8, 5:30 Wed Ewart, Timothee, CP11, 4:50 Thu Gorman, Gerard J, MS27, 3:05 Wed Denis, Christophe, PP1, 7:15 Wed Graillat, Stef, MS16, 3:20 Tue Denis, Christophe, MS80, 2:40 Fri F Falgout, Robert D., MS21, 10:35 Wed Grasedyck, Lars, MS8, 1:10 Tue Denis, Christophe, MS80, 2:40 Fri Fisher, Mike, MS47, 11:25 Thu Gratadour, Damien, MS40, 4:50 Wed


Grigori, Laura, MS60, 5:40 Thu Huckle, Thomas K., MS36, 5:15 Wed Keyes, David E., MS59, 6:05 Thu Gropp, William D., MS35, 4:50 Wed Humphrey, Alan, MS34, 5:15 Wed Khan, Arif, MS19, 11:00 Wed Gryazin, Yury A., PP1, 7:15 Wed Hupp, Daniel, MS5, 2:25 Tue Kim, Kibaek, MS32, 2:40 Wed Gsell, Achim, MS56, 2:40 Thu Kim, Kibaek, MS32, 2:40 Wed Guermouche, Abdou, MS7, 1:10 Tue I Klatt, Torbjörn, MS13, 3:45 Tue Iakymchuk, Roman, MS16, 3:20 Tue Guermouche, Abdou, MS15, 3:20 Tue Klawonn, Axel, MS59, 4:50 Thu Ibanez, Dan A., MS51, 3:05 Thu Guillet, Thomas, MS6, 1:35 Tue Klawonn, Axel, MS59, 4:50 Thu Ibeid, Huda, CP7, 3:00 Wed Guittet, Arthur, MS26, 3:55 Wed Knepley, Matthew G., MS27, 2:40 Wed Ihshaish, Hisham, MS68, 11:50 Fri Guo, Xiaohu, MS48, 10:35 Thu Knepley, Matthew G., MS35, 4:50 Wed Iizuka, Mikio, MS21, 11:50 Wed Guo, Xiaohu, MS56, 2:40 Thu Knepley, Matthew G., MS59, 5:40 Thu Imamura, Toshiyuki, MS55, 2:40 Thu Guo, Xiaohu, MS56, 3:55 Thu Knight, Nicholas, MS12, 3:45 Tue Imamura, Toshiyuki, MS63, 4:50 Thu Kokh, Samuel, MS72, 10:35 Fri H Imamura, Toshiyuki, MS63, 4:50 Thu Kolda, Tamara G., MS44, 10:35 Thu Hadade, Ioan C., CP14, 5:50 Fri Inoue, Yuto, PP1, 7:15 Wed Kolda, Tamara G., MS52, 2:40 Thu Hains, Gaetan, MS37, 4:50 Wed Isaac, Toby, MS23, 10:35 Wed Kollet, Stefan J., MS71, 11:50 Fri Ham, David, MS86, 4:50 Fri Isaac, Toby, MS18, 10:35 Wed Kolodziej, Scott P., CP8, 4:50 Wed Hammond, Glenn, MS71, 11:00 Fri Iwashita, Takeshi, MS9, 4:10 Tue Hammond, Jeff R., MS27, 3:30 Wed Komatitsch, Dimitri, MS46, 11:50 Thu Hashmi, Javeria, PP1, 7:15 Wed J Köstler, Harald, MS24, 10:35 Wed Jacquelin, Mathias, MS78, 2:40 Fri Hendrickson, Bruce, PD0, 9:10 Wed Krause, Rolf, MS8, 1:35 Tue Jacquelin, Mathias, MS78, 3:55 Fri Henon, Pascal, MS71, 11:25 Fri Kressner, Daniel, MS44, 11:50 Thu Jakobsson, Arvid, PP1, 7:15 Wed Herbordt, Martin, MS84, 5:40 Fri Krestenitis, Konstantinos, MS64, 5:40 Jansen, Kenneth, MS51, 3:55 Thu Thu Heroux, Michael, MS1, 1:10 Tue Jeon, Kiwan, PP1, 7:15 Wed Kruchinina, Anastasia, CP1, 1:10 Tue Heroux, Michael, MS1, 1:10 Tue Jolivet, Pierre, MS54, 3:30 Thu Heroux, Michael, MS9, 3:20 Tue Jomier, Julien, MS66, 11:00 Thu L Heroux, Michael, MS17, 10:35 Wed Lacoste, Xavier, CP6, 11:55 Wed Jonsthovel, Tom B., MS46, 11:00 Thu Heroux, Michael, MS41, 10:35 Thu Laguna, Ignacio, MS87, 5:40 Fri Heroux, Michael, MS41, 10:35 Thu K Lahnert, Michael, MS64, 6:05 Thu Heroux, Michael, MS49, 2:40 Thu Kågström, Bo T., PP1, 7:15 Wed Lange, Michael, PP1, 7:15 Wed Heroux, Michael, MS57, 4:50 Thu Kahan, Simon, IP7, 1:45 Fri Lanser, Martin, MS23, 11:00 Wed Heroux, Michael, MS65, 10:35 Fri Kaplan, Larry, MS25, 2:40 Wed Larsson, Elisabeth, MS62, 4:50 Thu Heroux, Michael, MS79, 2:40 Fri Kardoš, Juraj, PP1, 7:15 Wed Larsson, Elisabeth, MS70, 10:35 Fri Heroux, Michael, MS87, 4:50 Fri Karlin, Ian, MS81, 4:50 Fri Larsson, Elisabeth, MS70, 11:25 Fri Hill, Chris, MS50, 3:05 Thu Karlin, Ian, MS81, 6:05 Fri Lecouvez, Matthieu, CP12, 10:35 Fri Hirstoaga, Sever, CP12, 11:55 Fri Karlsson, Lars, MS60, 4:50 Thu Leng, Wei, CP10, 3:20 Thu Hogg, Jonathan, MS60, 4:50 Thu Katagiri, Takahiro, MS55, 2:40 Thu Leung, Vitus, MS51, 3:30 Thu Hogg, Jonathan, MS60, 4:50 Thu Katagiri, Takahiro, MS55, 2:40 Thu Levitt, Antoine, MS69, 11:25 Fri Holke, Johannes, MS18, 11:25 Wed Katagiri, Takahiro, MS63, 4:50 Thu Li, Gang, MS48, 11:00 Thu Hollingsworth, Jeffrey, MS55, 3:05 Thu Kaya, Kamer, MS76, 3:30 Fri Li, Jiajia, MS52, 3:55 Thu Horak, D., MS22, 11:00 Wed Kaya, Oguz, MS52, 2:40 Thu Liao, Li, CP5, 9:30 Wed Hori, Atsushi, MS87, 5:15 Fri Kern, Michel, MS71, 
10:35 Fri Lin, Deng, MS75, 3:55 Fri Hötzer, Johannes, MS24, 11:00 Wed Kestener, Pierre, MS72, 11:00 Fri Lin, Paul, CP7, 4:00 Wed Huber, Markus, MS49, 3:55 Thu Kestyn, James, CP2, 4:20 Tue Liu, Weifeng, MS60, 6:05 Thu


Lockwood, Glenn, MS2, 2:00 Tue Mills, Richard T., MS35, 5:40 Wed Olson, Luke, MS57, 5:15 Thu Lopez, Florent, MS15, 3:20 Tue Minden, Victor, MS85, 5:15 Fri Osei-Kuffuor, Daniel, MS69, 11:00 Fri Loulergue, Frédéric, MS37, 5:15 Wed Mirzadeh, Mohammad, MS82, 5:40 Fri Mo, Zeyao, MS67, 10:35 Fri P Ltaief, Hatem, MS40, 4:50 Wed Paludetto Magri, Victor A., PP1, 7:15 Ltaief, Hatem, MS77, 2:40 Fri Mo, Zeyao, MS67, 11:00 Fri Wed Ltaief, Hatem, MS85, 4:50 Fri Mo, Zeyao, MS75, 2:40 Fri Park, Pyeong Jun, PP1, 7:15 Wed Luisier, Mathieu, MS61, 6:05 Thu Mo, Zeyao, MS83, 4:50 Fri Patki, Tapasya, MS25, 3:55 Wed Luporini, Fabio, MS86, 4:50 Fri Modave, Axel, CP10, 4:00 Thu Patra, Abani, MS47, 11:50 Thu Mohror, Kathryn, MS65, 10:35 Fri Peise, Elmar, CP5, 9:10 Wed M Morris, Karla, CP9, 11:55 Thu Mahoney, Michael, MS3, 2:25 Tue Permann, Cody, MS24, 11:25 Wed Moureau, Vincent R., MS6, 1:10 Tue Malhotra, Dhairya, MS74, 3:05 Fri Perovic, Nevena, MS64, 5:15 Thu Muething, Steffen, MS23, 11:50 Wed Marchal, Loris, MS78, 2:40 Fri Petit, Eric, MS80, 3:55 Fri Mula, Olga, MS13, 4:35 Tue Margenov, Svetozar, MS30, 2:40 Wed Petitet, Antoine, MS29, 3:55 Wed Mundani, Ralf-Peter, MS64, 4:50 Thu Margenov, Svetozar, MS30, 3:05 Wed Petiton, Serge G., MS1, 1:10 Tue Mycek, Paul, CP4, 9:30 Wed Margenov, Svetozar, MS38, 4:50 Wed Petiton, Serge G., MS9, 3:20 Tue Marques, Osni A., MS55, 2:40 Thu N Petiton, Serge G., MS17, 10:35 Wed Marques, Osni A., MS55, 3:55 Thu Naim, Md, MS11, 3:45 Tue Petiton, Serge G., MS17, 10:35 Wed Marques, Osni A., MS63, 4:50 Thu Nakajima, Kengo, MS1, 1:10 Tue Petra, Cosmin G., MS32, 2:40 Wed Maruyama, Naoya, MS84, 4:50 Fri Nakajima, Kengo, MS9, 3:20 Tue Petra, Cosmin G., MS32, 3:05 Wed Maruyama, Naoya, MS84, 4:50 Fri Nakajima, Kengo, MS9, 3:20 Tue Peybernes, Mathieu, MS88, 5:40 Fri Mary, Theo, MS77, 3:05 Fri Nakajima, Kengo, MS17, 10:35 Wed Pichon, Gregoire, MS85, 4:50 Fri Masliah, Ian, PP1, 7:15 Wed Namyst, Raymond, MS67, 11:50 Fri Pinar, Ali, MS76, 3:55 Fri Masson, Roland, MS71, 10:35 Fri Neckel, Tobias, MS5, 1:10 Tue Piotrowski, Zbigniew P., CP10, 2:40 Thu McColl, Bill, MS45, 10:35 Thu Neckel, Tobias, MS13, 3:20 Tue Podobas, Artur, MS70, 11:50 Fri McIntosh-Smith, Simon, MS79, 3:05 Neckel, Tobias, MS21, 10:35 Wed Polizzi, Eric, MS53, 2:40 Thu Fri Neytcheva, Maya, MS30, 2:40 Wed Polizzi, Eric, MS61, 4:50 Thu McKerns, Michael, CP4, 9:10 Wed Neytcheva, Maya, MS38, 4:50 Wed Polizzi, Eric, MS61, 4:50 Thu Mehl, Miriam, MS74, 3:30 Fri Neytcheva, Maya, MS38, 4:50 Wed Polizzi, Eric, MS69, 10:35 Fri Mellor-Crummey, John, MS10, 3:45 Ng, Esmond G., MS3, 1:35 Tue Tue Popinet, Stephane, MS26, 3:05 Wed Ng, Esmond G., MS78, 2:40 Fri Melman, Aaron, CP2, 3:20 Tue Pothen, Alex, MS3, 1:10 Tue Nielsen, Eric, MS50, 3:55 Thu Merta, Michal, MS22, 11:25 Wed Pothen, Alex, MS3, 1:10 Tue Norman, Michael L., MS75, 2:40 Fri Meyer, Ulrich, MS68, 11:00 Fri Pothen, Alex, MS11, 3:20 Tue Norris, Boyana, MS20, 10:35 Wed Meyerhenke, Henning, MS68, 10:35 Pothen, Alex, MS19, 10:35 Wed Norris, Boyana, MS20, 11:50 Wed Fri Prabhakaran, Suraj, MS33, 5:15 Wed Notay, Yvan, MS36, 5:40 Wed Meyerhenke, Henning, MS68, 10:35 Prokopenko, Andrey, CP7, 3:20 Wed Fri O Meyerhenke, Henning, MS76, 2:40 Fri Oliker, Leonid, MS42, 10:35 Thu R Raasch, Steven, MS41, 11:00 Thu Mihajlovic, Milan D., MS30, 3:55 Oliker, Leonid, MS50, 2:40 Thu Wed Rajamanickam, Siva, MS28, 2:40 Wed Oliker, Leonid, MS50, 2:40 Thu Mills, Richard T., MS27, 2:40 Wed Rajamanickam, Siva, MS36, 4:50 Wed Oliker, Leonid, MS58, 4:50 Thu Mills, Richard T., MS35, 4:50 Wed Rajan, Deepak, MS32, 3:30 Wed 
Ramet, Pierre, MS78, 3:05 Fri


Rauber, Thomas, MS30, 2:40 Wed Rumley, Sebastien, MS73, 3:55 Fri Sid-Lakhdar, Wissam M., MS17, 11:00 Rauber, Thomas, MS38, 4:50 Wed Rump, Siegfried M., MS16, 3:45 Tue Wed Rauber, Thomas, MS38, 6:05 Wed Rünger, Gudula, MS30, 2:40 Wed Siebenborn, Martin, MS8, 2:00 Tue Reinhardt, Steve P., CP9, 10:55 Thu Rünger, Gudula, MS30, 2:40 Wed Simhadri, Harsha Vardhan, MS12, 4:10 Tue Revol, Nathalie, MS16, 4:35 Tue Rünger, Gudula, MS38, 4:50 Wed Simmendinger, Christian, MS14, 4:10 Rey, Clement P., CP14, 5:30 Fri Rupp, Karl, MS35, 5:15 Wed Tue Rheinbach, Oliver, MS59, 4:50 Thu Ruprecht, Daniel, MS5, 1:10 Tue Simon, Bertrand, PP1, 7:15 Wed Rheinbach, Oliver, MS59, 5:15 Thu Ruprecht, Daniel, MS13, 3:20 Tue Simon, Horst D., SP1, 9:25 Thu Riedy, Jason, MS68, 10:35 Fri Ruprecht, Daniel, MS13, 3:20 Tue Siso, Sergi-Enric, MS56, 3:05 Thu Riedy, Jason, MS76, 2:40 Fri Ruprecht, Daniel, MS21, 10:35 Wed Skjellum, Anthony, MS87, 4:50 Fri Riedy, Jason, MS76, 2:40 Fri Smith, Aaron, MS84, 6:05 Fri Riha, Lubomir, MS14, 4:35 Tue S Sakurai, Tetsuya, MS79, 3:55 Fri Smith, Barry F., MS27, 2:40 Wed Rizzi, Francesco, MS41, 10:35 Thu Salvini, Stefano, MS40, 6:05 Wed Smith, Barry F., MS35, 4:50 Wed Rizzi, Francesco, MS49, 2:40 Thu Samaddar, Debasmita, MS5, 1:35 Tue Smith, Shaden, MS52, 3:05 Thu Rizzi, Francesco, MS49, 2:40 Thu Samaey, Giovanni, MS5, 2:00 Tue Sokolova, Irina, PP1, 7:15 Wed Rizzi, Francesco, MS57, 4:50 Thu Sanan, Patrick, PP1, 7:15 Wed Solomonik, Edgar, MS52, 3:30 Thu Rizzi, Francesco, MS65, 10:35 Fri Sandu, Adrian, MS39, 4:50 Wed Sosonkina, Masha, CP1, 1:30 Tue Rizzi, Francesco, MS79, 2:40 Fri Sandu, Adrian, MS39, 5:50 Wed Soundarajan, Sucheta, MS20, 11:00 Rizzi, Francesco, MS87, 4:50 Fri Sandu, Adrian, MS47, 10:35 Thu Wed Roberts, Malcolm, CP12, 10:55 Fri Sano, Kentaro, MS84, 5:15 Fri Spillane, Nicole, PP1, 7:15 Wed Roberts, Malcolm, CP13, 3:20 Fri Sawyer, William, MS86, 4:50 Fri Spillane, Nicole, MS54, 3:55 Thu Robey, Robert, CP3, 9:10 Wed Sawyer, William, MS86, 6:05 Fri Springer, Paul, CP11, 5:30 Thu Rodrigues, Arun, MS73, 2:40 Fri Saxena, Gaurav, MS83, 5:40 Fri Srinivasan, Sriram, MS20, 10:35 Wed Rodrigues, Arun, MS73, 3:05 Fri Schaal, Kevin, MS34, 6:05 Wed Stadler, Georg, MS23, 10:35 Wed Rogers, Benedict, MS48, 10:35 Thu Schornbaum, Florian, MS75, 3:30 Fri Stanisic, Luka, MS15, 3:45 Tue Roller, Sabine P., MS82, 5:15 Fri Schreiber, Martin, MS5, 1:10 Tue Stoica, Ion, IP6, 8:45 Fri Roose, Dirk, MS37, 6:05 Wed Schreiber, Martin, MS13, 3:20 Tue Stoyanov, Dimitar M., MS22, 11:50 Rouet, Francois-Henry, MS77, 2:40 Wed Schreiber, Martin, MS21, 10:35 Wed Fri Sukkari, Dalal, MS15, 4:10 Tue Rouet, Francois-Henry, MS77, 2:40 Schreiber, Martin, MS21, 11:25 Wed Fri Schulz, Martin, MS25, 2:40 Wed T Rouet, Francois-Henry, MS85, 4:50 Schulz, Martin, MS33, 4:50 Wed Takahashi, Daisuke, MS63, 5:40 Thu Fri Schulz, Martin, MS41, 11:25 Thu Takizawa, Hiroyuki, MS55, 3:30 Thu Roux, Francois-Xavier, CP13, 2:40 Fri Schwartz, Oded, MS4, 1:10 Tue Tan, Guangming, MS83, 4:50 Fri Rubensson, Emanuel H., CP6, 11:35 Schwartz, Oded, MS12, 3:20 Tue Taylor, Valerie, MS2, 2:25 Tue Wed Schwarz, Angelika, PP1, 7:15 Wed Teranishi, Keita, MS25, 2:40 Wed Rudi, Johann, MS23, 10:35 Wed Sergent, Marc, MS7, 2:25 Tue Teranishi, Keita, MS33, 4:50 Wed Rudi, Johann, MS23, 10:35 Wed Shaikh, Mohsin A., CP12, 11:15 Fri Teranishi, Keita, MS41, 10:35 Thu Ruede, Ulrich J., IP2, 8:15 Wed Shalf, John, MS73, 2:40 Fri Teranishi, Keita, MS49, 2:40 Thu Ruede, Ulrich J., MS67, 10:35 Fri Shephard, Mark S., MS43, 11:00 Thu Teranishi, Keita, MS57, 4:50 Thu Ruede, Ulrich J., 
MS75, 2:40 Fri Shinano, Yuji, MS32, 3:55 Wed Teranishi, Keita, MS65, 10:35 Fri Ruede, Ulrich J., MS83, 4:50 Fri Shoji, Fumiyoshi, MS42, 11:50 Thu Teranishi, Keita, MS79, 2:40 Fri Ruiz, Daniel, MS19, 11:25 Wed Showerman, Michael, MS10, 4:35 Tue Teranishi, Keita, MS87, 4:50 Fri


Theillard, Maxime, MS82, 6:05 Fri Weinzierl, Tobias, MS74, 3:55 Fri Yzelman, Albert-Jan N., MS45, 11:50 Thibault, Samuel, MS62, 6:05 Thu Wilde, Torsten, MS33, 5:40 Wed Thu Thierry, Philippe, MS88, 5:15 Fri Wilke, Jeremiah, MS73, 2:40 Fri Z Thies, Jonas, CP2, 4:00 Tue Willemsen, Benno, MS31, 3:30 Wed Zafari, Afshin, MS62, 4:50 Thu Tiskin, Alexander, MS29, 3:05 Wed Winkelmann, Jan, CP2, 3:40 Tue Zampini, Stefano, MS46, 11:25 Thu Toledo, Sivan A., MS12, 3:20 Tue Wittum, Gabriel, MS8, 1:10 Tue Zanotti, Olindo, MS34, 4:50 Wed Toledo, Sivan A., PD0, 9:10 Wed Wittum, Gabriel, MS8, 2:25 Tue Zavalnij, Bogdan, CP8, 5:10 Wed Torun, F. Sukru, CP6, 10:35 Wed Wittum, Gabriel, PP1, 7:15 Wed Zhang, Weijian, CP3, 9:30 Wed Toth, Gyula I., MS24, 11:50 Wed Wolf, Matthew, MS66, 11:25 Thu Tournier, Pierre-Henri, MS54, 3:05 Womeldorff, Geoff, MS81, 4:50 Fri Thu Womeldorff, Geoff, MS81, 5:40 Fri Turkiyyah, George M, MS23, 11:25 Wyrzykowski, Roman, MS38, 5:15 Wed Wed Turkiyyah, George M, MS77, 3:30 Fri X Xing, Eric, MS45, 11:25 Thu U Xing, Feng, MS71, 10:35 Fri Ucar, Bora, MS44, 10:35 Thu Xing, Feng, MS71, 10:35 Fri Ucar, Bora, MS52, 2:40 Thu Xu, Xiaowen, CP7, 3:40 Wed Uno, Atsuya, MS33, 4:50 Wed Uphoff, Carsten, MS86, 5:40 Fri Y Yalamanchili, Sudhakar, MS73, 3:30 Fri V Yamada, Susumu, PP1, 7:15 Wed Vacondio, Ranato, MS48, 11:50 Thu Yamazaki, Ichitaro, MS28, 3:30 Wed Valero-Lara, Pedro, PP1, 7:15 Wed Yang, Chao, MS53, 2:40 Thu Vandierendonck, Hans, MS62, 5:40 Yang, Chao, MS53, 2:40 Thu Thu Yang, Chao, MS61, 4:50 Thu Vanroose, Wim I., MS6, 1:10 Tue Yang, Chao, MS69, 10:35 Fri Vanroose, Wim I., MS14, 3:20 Tue Yang, Ulrike M., MS51, 2:40 Thu Vanroose, Wim I., MS22, 10:35 Wed Yang, Yang, CP11, 5:10 Thu Varduhn, Vasco, CP10, 3:00 Thu Yang, Zhang, MS67, 10:35 Fri Vaughan, Courtenay T., CP9, 10:35 Thu Yang, Zhang, MS75, 2:40 Fri Vecharynski, Eugene, PP1, 7:15 Wed Yang, Zhang, MS83, 4:50 Fri Verbosio, Fabio, CP2, 4:40 Tue Yang, Zhang, MS83, 5:15 Fri Verschelde, Jan, CP13, 3:00 Fri Yetkin, Fatih, MS79, 2:40 Fri Vishwanath, Venkatram, MS66, 11:50 Yokota, Rio, MS85, 5:40 Fri Thu You, Yang, MS4, 2:00 Tue von Looz, Moritz, MS20, 11:25 Wed Yzelman, Albert-Jan N., MS29, 2:40 Wed W Yzelman, Albert-Jan N., MS29, 2:40 Wang, Weichung, PP1, 7:15 Wed Wed Warburton, Tim, MS35, 6:05 Wed Yzelman, Albert-Jan N., MS37, 4:50 Washio, Takumi, IP5, 1:45 Thu Wed Weaver, Anthony, MS47, 11:00 Thu Yzelman, Albert-Jan N., MS45, 10:35 Weidendorfer, Josef, MS25, 3:30 Wed Thu

