A Thesis Submitted for the Degree of PhD at the University of Warwick

Permanent WRAP URL: http://wrap.warwick.ac.uk/102343/

Copyright and reuse: This thesis is made available online and is protected by original copyright. Please scroll down to view the document itself. Please refer to the repository record for this item for information to help you to cite it. Our policy information is available from the repository home page.

For more information, please contact the WRAP Team at: [email protected]

warwick.ac.uk/lib-publications

The Readying of Applications for Heterogeneous Computing

by Andy Herdman

A thesis submitted to The University of Warwick

in partial fulfilment of the requirements

for admission to the degree of

Doctor of Philosophy

Department of Computer Science

The University of Warwick

September 2017

Abstract

High performance computing is approaching a potentially significant change in architectural design. With pressures on cost and the sheer amount of power consumed, additional architectural features are emerging which require a re-think of the programming models deployed over the last two decades.

Today’s emerging high performance computing (HPC) systems are maximising performance per unit of power consumed, resulting in the constituent parts of the system being made up of a range of different specialised building blocks, each with their own purpose. This heterogeneity is not just limited to the hardware components, but extends to the mechanisms that exploit those components.

These multiple levels of parallelism, instruction sets and memory hierarchies result in truly heterogeneous computing in all aspects of the global system.

These emerging architectural solutions will require the software to exploit tremendous amounts of on-node parallelism, and indeed programming models to address this are emerging. In theory, the application developer can design new software using these models to exploit emerging low power architectures.

However, in practice, real industrial scale applications last the lifetimes of many architectural generations and therefore require a migration path to these next generation supercomputing platforms.

Identifying that migration path is non-trivial: with applications spanning many decades, consisting of many millions of lines of code and multiple scientific algorithms, any changes to the programming model will be extensive and invasive, and may turn out to be the incorrect model for the application in question.

This makes exploration of these emerging architectures and programming models using the applications themselves problematic. Additionally, the source code of many industrial applications is not available, due to either commercial or security sensitivity constraints.

This thesis highlights this problem by assessing current and emerging hardware with an industrial strength code, and demonstrating the issues described.

In turn it looks at the methodology of using proxy applications in place of real industry applications, to assess their suitability on the next generation of low power HPC offerings. It shows there are significant benefits to be realised in using proxy applications, in that fundamental issues inhibiting exploration of a particular architecture are easier to identify and hence address.

Evaluations of the maturity and performance portability of a number of alternative programming methodologies are explored on a number of architectures, highlighting the broader adoption of these proxy applications, both within the author’s own organisation and across the industry as a whole.

Acknowledgements

This thesis, and the supporting research, was made possible by the support of a number of people; their academic, professional and personal support throughout the years spent on this work has been invaluable.

Firstly, I’d like to thank my employer, AWE, which has not only invested financially but has also allowed dedicated time for me to carry out this research and the subsequent write up. In particular, thanks go to Paul Tomlinson, Paul Ellicott and Andrew Randewich. A special thank you also goes to Wayne Gaudin for his technical input and general willingness as a sounding board.

Colleagues and peers across the HPC industry have also been of assistance: LANL, LLNL, SNL, PGI, Intel and IBM have either provided access to hardware or given technical input, without which this work would not have been possible. I would like to thank Sriram Swaminarayan, Todd Gamblin, Scott Futral, Si Hammond, Jim Ang, Mike Glass, Doug Miles, Michael Wolfe, Craig Toepfer, Dave Norton, Alistair Hart, John Levesque, Fiona Burgess, Andy Mason, Andy Mallinson, Victor Gamayunov, Stephen Blair-Chappell and Peter Mayes.

I’d also like to express my thanks to the PhD students in the High Performance and Scientific Systems Group throughout my time at Warwick: Dr Si Hammond, Dr Andy Mallinson, Dr Olly Perks, Dr David Beckingsale, Dr James Davis, Dr Steve Wright, Dr John Pennycook and Dr Bob Bird, for their advice and technical steer. I’d also like to express a big thank you to Prof. Stephen Jarvis for acceptance onto the PhD programme and for his supervision.

Finally, I dedicate this PhD to my family, Sarah, Katie, Jack, Mam and sister; and to my Dad - hope you would have been proud - I love you and I miss you.

Andy Herdman - September 2017

Declarations

This thesis is submitted to the University of Warwick in support of the author’s application for the degree of Doctor of Philosophy. It has been composed by the author and has not been submitted in any previous application for any degree. Parts of this thesis have been previously published by the author in peer-reviewed conferences and technical papers:

• Herdman, J. A., et al. “Benchmarking and modelling of POWER7, Westmere, BG/P, and GPUs: an industry case study.” In: 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), New Orleans, LA, USA. ACM SIGMETRICS Performance Evaluation Review 38.4 (2011): 16-22 [103].

• Herdman, J. A., et al. “Accelerating hydrocodes with OpenACC, OpenCL and CUDA.” In: SC Companion: High Performance Computing, Networking Storage and Analysis, Salt Lake City, UT, 10-16 Nov 2012. IEEE, 2012 [101].

• Herdman, J. A., et al. “Achieving portability and performance through OpenACC.” Proceedings of the First Workshop on Accelerator Programming using Directives (WACCPD ’14), held as part of SC14: The International Conference for HPC, Networking, Storage and Analysis, New Orleans, LA, USA. IEEE Press, 2014 [102].

In addition, developments made as part of this thesis’s contributions were among the integrated collection of mini-applications included in the Mantevo test suite [104], winner of a 2013 R&D 100 “Oscars of Innovation” award [172]. Also, Chapter 5 has been selected for inclusion in the forthcoming book “Parallel Programming with OpenACC” [88]. Other related publications by the author, although not reproduced in this thesis, include:

• Davis, J. A., Mudalige, G. R., Hammond, S. D., Herdman, J. A., Miller, I., and Jarvis, S. A. “Predictive analysis of a hydrodynamics application on large-scale CMP clusters.” Computer Science-Research and Development 26, no. 3-4 (2011): 175-185. [79]

• Mallinson, A. C., Beckingsale, D. A., Gaudin, W. P., Herdman, J. A., Levesque, J. M., and Jarvis, S. A. “CloverLeaf: Preparing hydrodynamics codes for Exascale.” In: A New Vintage of Computing: CUG2013, Napa, CA, 6-9 May 2013. Published in: A New Vintage of Computing: Preliminary Proceedings (2013) [137].

• Pennycook, S. J., Hammond, S. D., Wright, S. A., Herdman, J. A., Miller, I., and Jarvis, S. A. “An investigation of the performance portability of OpenCL.” Journal of Parallel and Distributed Computing 73, no. 11 (2013): 1439-1450 [162].

• Mallinson, A. C., Beckingsale, D. A., Gaudin, W. P., Herdman, J. A., and Jarvis, S. A. “Towards portable performance for explicit hydrodynamics codes.” (2013). [138]

• Gaudin, W. P., Mallinson, A. C., Perks, O., Herdman, J. A., Beckingsale, D. A., Levesque, J. M., and Jarvis, S. A. “Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite.” The Cray User Group (2014): 4-8 [94].

• Mudalige, G. R., Reguly, I. Z., Giles, M. B., Mallinson, A. C., Gaudin, W. P., and Herdman, J. A. “Performance Analysis of a High-Level Abstraction Based Hydrocode on Future Computing Systems.” In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, pp. 85-104. Springer International Publishing, 2014 [149].

• Mallinson, A. C., Jarvis, S. A., Gaudin, W. P., and Herdman, J. A. “Experiences at scale with PGAS versions of a Hydrodynamics application.” In Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, p. 9. ACM, 2014 [139].

In addition, research conducted during the period of registration has also led to the following publications, which are not directly linked to the contents of this thesis:

• Hammond, S. D., Mudalige, G. R., Smith, J. A., Davis, J. A., Jarvis, S. A., Holt, J., Miller, I., Herdman, J. A., and Vadgama, A. “To upgrade or not to upgrade? Catamount vs. Cray Linux Environment.” In Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on, pp. 1-8. IEEE, 2010 [99].

• Wright, S. A., Hammond, S. D., Pennycook, S. J., Miller, I., Herdman, J. A., and Jarvis, S. A. “LDPLFS: improving I/O performance without application modification.” In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International, pp. 1352-1359. IEEE, 2012 [200].

• Perks, O., Beckingsale, D. A., Hammond, S. D., Miller, I., Herdman, J. A., Vadgama, A., Bhalerao, A. H., He, L., and Jarvis, S. A. “Towards Automated Memory Model Generation via Event Tracing.” The Computer Journal 56(2), June 2012 [165].

• Perks, O., Beckingsale, D. A., Dawes, A. S., Herdman, J. A., Mazauric, C., and Jarvis, S. A. “Analysing the influence of InfiniBand choice on OpenMPI memory consumption.” In High Performance Computing and Simulation (HPCS), 2013 International Conference on, pp. 186-193. IEEE, 2013 [164].

• Wright, S. A., Hammond, S. D., Pennycook, S. J., Bird, R. F., Herdman, J. A., Miller, I., Vadgama, A., Bhalerao, A., and Jarvis, S. A. “Parallel file system analysis through application I/O tracing.” The Computer Journal 56, no. 2 (2013): 141-155 [199].

• Beckingsale, D. A., Perks, O., Gaudin, W. P., Herdman, J. A., and Jarvis, S. A. “Optimisation of patch distribution strategies for AMR applications.” In Computer Performance Engineering, pp. 210-223. Springer Berlin Heidelberg, 2013 [61].

• Bird, R. F., Gillies, P., Bareford, M. R., Herdman, J. A., and Jarvis, S. A. “Mini-app driven optimisation of inertial confinement fusion codes.” In Cluster Computing (CLUSTER), 2015 IEEE International Conference on, pp. 768-776. IEEE, 2015. [64]

• Beckingsale, D. A., Gaudin, W. P., Herdman, J. A., and Jarvis, S. A. “Resident block-structured adaptive mesh refinement on thousands of graphics processing units.” In Parallel Processing (ICPP), 2015 44th International Conference on, pp. 61-70. IEEE, 2015 [58].

• Dickson, J., Wright, S. A., Maheswaran, S., Herdman, J. A., Miller, M. C., and Jarvis, S. A. “Replicating HPC I/O workloads with proxy applications.” (2016). [83]

• Dickson, J., Wright, S. A., Maheswaran, S., Herdman, J. A., Harris, D., Miller, M. C., and Jarvis, S. A. “Enabling portable I/O analysis of commercially sensitive HPC applications through workload replication.” Cray User Group 2017 Proceedings (2017). [82]

Sponsorship and Grants

The research presented in this thesis was made possible by the support of the following benefactors and sources:

• UK Atomic Weapons Establishment, Further Education Funding.

Abbreviations

AIX Advanced Interactive eXecutive
ALU Arithmetic Logic Unit
AMD® Advanced Micro Devices
ANL Argonne National Laboratory
APEX Alliance for application Performance at EXtreme scale
API Application Programming Interface
APU Accelerated Processing Unit
ARC Advanced Research Computing
ARM Advanced RISC Machines
ASCI Accelerated Strategic Computing Initiative
ASC Advanced Simulation and Computing Program
AVX Advanced Vector eXtensions
AWE Atomic Weapons Establishment
BGK Bhatnagar-Gross-Krook
CAAR Center for Application Accelerator Readiness
CAF Co-Array Fortran
CAPS Compiler and Architecture for Embedded and Superscalar Processors
CCE Cray Compiling Environment
CEA Commissariat à l'énergie atomique et aux énergies alternatives
CFD Computational Fluid Dynamics
CORAL Collaboration of Oak Ridge, Argonne, and Lawrence Livermore
COTS Commercial Off-The-Shelf
CPU Central Processing Unit
CSCS Centro Svizzero di Calcolo Scientifico
CTBT Comprehensive Test Ban Treaty
CUDA Compute Unified Device Architecture
DARPA Defence Advanced Research Projects Agency
DoE Department of Energy
DDR Double Data Rate
DP Double Precision
DRAM Dynamic Random Access Memory
DSL Domain Specific Language
ECC Error Correcting Code
EDR Enhanced Data Rate (InfiniBand)
ECP Exascale Computing Plan
EFLOP/s 10^18 FLOP/s
FLC Federal Laboratory Consortium

FDR Fourteen Data Rate (InfiniBand)
FLOP/s Floating-Point Operations per Second
FMA Fused Multiply Add
FPU Floating-Point Unit
FSB Front Side Bus
FYP Five-Year Plan
GFLOP/s 10^9 FLOP/s
GPC Graphics Processing Cluster
GPGPU General Purpose GPU
GPU Graphics Processing Unit
GTC GPU Technology Conference
GT/s gigatransfers per second
HAAPs Heterogeneous Advanced Architecture Platforms
HBM High Bandwidth Memory
HMC Hybrid Memory Cube
HMW High Memory Watermark
HPC High-Performance Computing
HSA Heterogeneous System Architecture
HT Hyper Threading
IB InfiniBand
IBM International Business Machines®
ISC International Supercomputing Conference
IDE Integrated Development Environment
ILP Instruction-Level Parallelism
ISA Instruction Set Architecture
ISV Independent Software Vendor
IWOCL International Workshop on OpenCL
KNC Knights Corner: 1st generation MIC architecture
KNF Knights Ferry: prototype MIC architecture
KNH Knights Hill: 3rd generation MIC architecture
KNL Knights Landing: 2nd generation MIC architecture
LANL Los Alamos National Laboratory
LEO Language Extensions for Offload
LLNL Lawrence Livermore National Laboratory
LOC Lines of Code
MCDRAM Multi-Channel DRAM
MEXT Ministry of Education, Culture, Sports, Science and Technology
MIC Many-Integrated Core
MMX Multimedia Extension
MPI Message Passing Interface

MPICH MPI CHameleon
MPP Massively Parallel Programming
NAS NASA Advanced Supercomputing
NASA National Aeronautics and Space Administration
NCSA National Centre for Supercomputing Applications
NERSC National Energy Research Scientific Computing Center
NNSA National Nuclear Security Administration
NPB NAS Parallel Benchmarks
NRE Non-Recurring Engineering
NSCI National Strategic Computing Initiative
NUMA Non-Uniform Memory Access
NVRAM Non-Volatile Random Access Memory
OO Object Oriented
ORNL Oak Ridge National Laboratory
OS Operating System
PCI Peripheral Component Interconnect
PCIe PCI Express
PFLOP/s 10^15 FLOP/s
PGAS Partitioned Global Address Space
PGI The Portland Group®
PGO Profile Guided Optimisation
POWER Performance Optimized With Enhanced RISC
PRACE Partnership for Advanced Computing in Europe
PtAC Path to Agile Coding
PtMC Path to Many Core
PTX Parallel Thread eXecution
QDR Quad Data Rate (InfiniBand)
QCD Quantum Chromodynamics
QCM Quad Chip Module
QPI QuickPath Interconnect
QPX Quad Processing eXtension
QUDA QCD on CUDA
RAM Random Access Memory
RISC Reduced Instruction Set Computer
SAMRAI Structured Adaptive Mesh Refinement Application Infrastructure
SCC Single-chip Cloud Computer
SDK Software Development Kit
SDRAM Synchronous Dynamic Random Access Memory
SIMD Single Instruction, Multiple Data
SPEC Standard Performance Evaluation Corporation

SPH Smoothed Particle Hydrodynamics
SKU Stock Keeping Unit
SM Streaming Multiprocessor
SMP Symmetric Multi Processor
SMT Simultaneous Multi-Threading
SNL Sandia National Laboratories
SoC System-on-a-Chip
SP Single Precision
SRAM Static Random Access Memory
SSE Streaming SIMD Extensions
STL Standard Template Library
TBB Threading Building Blocks
TDP Thermal Design Power
TFLOP/s 10^12 FLOP/s
TLP Thread-Level Parallelism
UDel University of Delaware
UPC Unified Parallel C
VMX Vector Multimedia Extension
VPU Vector Processing Unit
WOC Words of Code

Definitions

Amdahl’s Law

Amdahl’s Law [47] states the theoretical limit for the speedup of a fixed-size, strong scaling problem as the number of executing processors is increased. It is shown in Equation 1, where S is the speedup of the whole problem, s is the speedup of the parallel section of the task, and p is the proportion of the problem that is parallel (i.e. the non-serial section).

S = 1 / ((1 − p) + p/s)    (1)

It shows that the theoretical speedup of the whole problem increases with the number of processors, and that regardless of the magnitude of the improvement, the theoretical speedup is always limited by the serial part of the task.
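
As a worked illustration (with assumed, purely illustrative numbers): if 95% of a task can be parallelised (p = 0.95) and the parallel section is sped up by a factor of s = 8, then

S = 1 / ((1 − 0.95) + 0.95/8) = 1 / 0.16875 ≈ 5.9

and even in the limit s → ∞ the overall speedup is bounded by 1 / (1 − p) = 20.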

Bandwidth Bound

A bandwidth bound algorithm is one that has reached the physical limits of the underlying hardware in terms of access to global memory.
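
As an illustrative sketch (not drawn from the thesis), the STREAM-style triad below is a classic example of a bandwidth bound kernel: each iteration moves three array elements to or from memory but performs only two floating-point operations, so for large n its runtime is set by memory bandwidth rather than compute capability.

    /* STREAM-style triad: two loads and one store per iteration, but only
     * two floating-point operations, so performance is governed by the
     * rate at which data can be streamed from global memory. */
    void triad(double *a, const double *b, const double *c,
               double scalar, long n)
    {
        for (long i = 0; i < n; i++)
            a[i] = b[i] + scalar * c[i];
    }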

Computational Kernel

A collection of application program code, such as multiple loop-block struc- tures, which has been logically co-located within the same program function or subroutine, and collectively performs a particular well-defined task or operation.

Concurrency

A system consisting of multiple streams of independent operations active at one time.

Compute Bound

A computational operation whose time is primarily decided by the time taken to operate on the data, rather than the time to load the data into memory. In such a scenario the use of a faster processor will afford a proportional gain in overall performance.

Data Parallel

The distribution of data across multiple hardware processing elements, enabling the same task to be carried out in unison on multiple data points.
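
A minimal C sketch of data parallelism, assuming a compiler with OpenMP enabled (e.g. built with -fopenmp): the same operation is applied independently to every element of the array, so the iterations can be distributed across the available processing elements.

    /* Data parallel loop: each element receives the same operation, and
     * the iterations are shared out across the available threads. */
    void scale_and_shift(double *x, long n, double a, double b)
    {
        #pragma omp parallel for
        for (long i = 0; i < n; i++)
            x[i] = a * x[i] + b;
    }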

Deep Copy

A copy of a data structure duplicating not only the structure itself, but all associated sub-structures.
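
A minimal C sketch (the Field type is hypothetical): copying the structure alone would leave both copies sharing the same values buffer, whereas a deep copy duplicates the associated sub-structure as well.

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        size_t  n;
        double *values;   /* separately allocated sub-structure */
    } Field;

    /* Deep copy: duplicate the struct *and* the buffer it points to, so
     * the two copies can subsequently be modified independently. */
    Field deep_copy(const Field *src)
    {
        Field dst;
        dst.n = src->n;
        dst.values = malloc(src->n * sizeof(double));   /* error check omitted */
        memcpy(dst.values, src->values, src->n * sizeof(double));
        return dst;
    }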

ECC RAM

Error-Correcting Code (ECC) RAM is a type of memory with built-in error detection and correction through the use of parity bits. This allows single-bit errors to be detected and corrected transparently, increasing reliability.

Front Side Bus

A Front Side Bus (FSB) is a legacy communication interface that was used in Intel processors, carrying data between the central processing unit (CPU) and the memory controller.

Ghost/Halo Cells

With parallel grid-based computations it is frequently necessary to access data which resides in another processor’s memory space. Such a situation usually occurs at the boundary of a processor’s computational region. To improve the performance of fetching this data, a buffer is used to replicate the neighbouring boundary region on the local processor. This replicated data is rarely computed locally, but is used as input to the computation of other cells.
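
A hypothetical MPI sketch of a one-dimensional halo exchange: each rank stores its n interior cells in u[1..n], with u[0] and u[n+1] acting as halo (ghost) cells that replicate the neighbouring ranks' boundary data. The left and right arguments are the neighbouring ranks (or MPI_PROC_NULL at the domain edges).

    #include <mpi.h>

    /* Exchange single-cell halos with the left and right neighbours. */
    void exchange_halos(double *u, int n, int left, int right, MPI_Comm comm)
    {
        /* Send my leftmost interior cell left; receive the right
         * neighbour's leftmost cell into my right halo. */
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                     &u[n + 1], 1, MPI_DOUBLE, right, 0,
                     comm, MPI_STATUS_IGNORE);

        /* Send my rightmost interior cell right; receive the left
         * neighbour's rightmost cell into my left halo. */
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left,  1,
                     comm, MPI_STATUS_IGNORE);
    }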

Hackathon

An event in which a large number of people meet to engage in collaborative software development, focusing on a particular problem, platform, application or language.

Heterogeneous Computing

The description of a computer system consisting of one or more of the following: multiple instruction set architectures (ISA), multiple processor types, multiple mechanisms to exploit system parallelism, or multiple memory or data hierarchies.

Latency Bound

A latency bound algorithm is one whose performance is inhibited due to the time to carry out memory fetches on dependent data.

Memory Bound

A computational operation whose time is primarily governed by the rate at which data can be moved from memory to the processor, rather than by the actual computational operation. There are two distinct classes of memory bound algorithms: latency bound and bandwidth bound. In both scenarios the use of a faster processor will not afford a gain in overall performance. Improvements in performance will only be afforded by improvements to the memory system, such as faster RAM, more memory bandwidth, or enhancements to the cache within the processor.

Memory Wall

The much faster improvement of processor speed as compared with dynamic random access memory (DRAM) speed, resulting in processor speed improvements being masked, for particular classes of algorithms, by the relatively slow improvements to DRAM speed [202].

Moore’s Law

A prediction, rather than a fundamental law: Gordon Moore of Intel predicted a doubling in the number of a microchip’s components every two years. As this increase in components was directly proportional to the speed of a chip, this was frequently expressed as a doubling in the performance of a chip/processor. Although the original prediction by Moore is still realised, the increase in processor frequency, or clock speed, is not.

Power Wall

The increasingly significant loss of efficiency, due to overheating, as the clock frequency of a CPU increases.

Ragged Array

An array with rows of non-uniform length.
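
A minimal C sketch of a ragged array, where row i is given i + 1 elements so the row lengths are non-uniform:

    #include <stdlib.h>

    /* Allocate a ragged (jagged) array: nrows rows of differing lengths. */
    double **alloc_ragged(int nrows)
    {
        double **rows = malloc(nrows * sizeof(double *));
        for (int i = 0; i < nrows; i++)
            rows[i] = calloc((size_t)(i + 1), sizeof(double));  /* row i has i+1 entries */
        return rows;
    }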

Stencil Framework

A framework that performs cyclic updates to data according to a predefined pattern. Usually a particular stencil is associated with a particular algorithmic domain.

Stride One

Also known as “unit stride”; an access pattern over an array whose stride is exactly the size of each of its elements, so that consecutive accesses touch contiguous memory locations.
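
A small illustrative contrast in C: the first loop traverses the array with unit stride, touching contiguous memory, while the second uses a non-unit stride and so wastes most of each cache line it loads.

    /* Stride-one (unit stride) traversal: consecutive, contiguous accesses. */
    double sum_unit_stride(const double *a, long n)
    {
        double sum = 0.0;
        for (long i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }

    /* Strided traversal: each access jumps 'stride' elements, which is
     * far less cache and vector friendly. */
    double sum_strided(const double *a, long n, long stride)
    {
        double sum = 0.0;
        for (long i = 0; i < n; i += stride)
            sum += a[i];
        return sum;
    }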

Strong Scaling

The act of increasing the processing resources used to solve a problem of the same size, with the resultant performance continuing to scale.

Task Parallel

The enablement of each hardware processing element to execute a different execution thread on the same, or different data.

Thread

An independent stream of execution, with associated data and instructions.

Thread Safe

A property of a library or programme component which can have multiple threads executing it simultaneously in a manner which still produces correct results.
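
A minimal OpenMP sketch in C: an unprotected increment of the shared counter would be a data race and hence not thread safe; the atomic directive restores correctness (an OpenMP reduction clause would be the more idiomatic alternative).

    /* Count negative entries from multiple threads in a thread-safe manner. */
    long count_negatives(const double *a, long n)
    {
        long count = 0;
        #pragma omp parallel for
        for (long i = 0; i < n; i++) {
            if (a[i] < 0.0) {
                #pragma omp atomic
                count++;     /* protected update of the shared counter */
            }
        }
        return count;
    }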

Turnover

The point, in Strong Scaling, where increasing the number of processors used to solve a problem results in a slower solution time; that is, the problem no longer demonstrates Strong Scaling.

Weak Scaling

The act of increasing the processing resources used to solve a problem of increasing size, such that the problem size per unit of processing resource remains the same, with the time to solution remaining the same or decreasing.

Contents

Abstract ii

Acknowledgements iv

Declarations v

Sponsorship and Grants viii

Abbreviations ix

Definitions xiii

List of Figures xxiv

List of Tables xxv

1 Introduction 1
1.1 Architectural Changes ...... 3
1.1.1 Emergence of Multi-Core ...... 3
1.1.2 Instruction Level Parallelism ...... 4
1.1.3 Data Parallelism ...... 5
1.1.4 Deeper Memory Hierarchies ...... 6
1.1.5 System Interconnects ...... 7
1.2 Impact of Architectural Changes ...... 8
1.3 Sphere of Study and Research Questions ...... 8
1.4 Thesis Contributions ...... 10
1.4.1 Impracticalities of Using Production-Class Codes to Explore Architectures and Programming Models ...... 10
1.4.2 Introduction and Extension of the Mini-Application Approach ...... 11
1.4.3 Demonstrating Performance Portability ...... 12
1.4.4 Exploring Emerging Architectures ...... 12

I Background Research 13

2 Existing and Emerging Hardware 14
2.1 Established Hardware ...... 14
2.1.1 x86-64 ...... 14
2.1.2 IBM® POWER® ...... 15


2.2 The first “Many-Core” Architectures ...... 17
2.2.1 IBM® Blue Gene® ...... 17
2.2.2 IBM® “Roadrunner” ...... 19
2.3 Today’s “Many-Core” Architectures ...... 19
2.3.1 Graphics Processing Units (GPUs) ...... 19
2.3.2 Intel® Xeon Phi™ ...... 21
2.3.3 Accelerated Processing Units (APU) ...... 22
2.4 Emerging Technologies ...... 22
2.4.1 ARM® ...... 22
2.4.2 OpenPOWER® ...... 23
2.5 Common Architectural Traits ...... 23

3 Programming Methodologies 32
3.1 Low-Level Languages ...... 33
3.1.1 NVIDIA® CUDA® ...... 33
3.1.2 OpenCL™ ...... 34
3.1.3 Intel® Cilk™ Plus ...... 34
3.1.4 C++ AMP (Accelerated Massive Parallelism) ...... 34
3.1.5 Pthreads ...... 35
3.2 High-level target-based directives ...... 35
3.2.1 OpenMP ...... 36
3.2.2 OpenACC® ...... 36
3.2.3 Intel® Language Extensions for Offload (LEO) ...... 37
3.3 Abstraction Libraries ...... 37
3.3.1 Intel® Threading Building Blocks (TBB) ...... 37
3.3.2 RAJA ...... 38
3.3.3 Kokkos ...... 38
3.3.4 AMD® Bolt ...... 38
3.3.5 NVIDIA® Thrust ...... 39
3.3.6 Charm++ ...... 39
3.3.7 OP2 / OPS ...... 39

II Research Contributions 40

4 An Industry Case Study: Benchmarking and Modelling 41
4.1 Shamrock ...... 42
4.2 Architectures ...... 43
4.2.1 Assessment of System Upgrades and Software Environmental Changes ...... 43
4.2.2 HPC Benchmarking Platforms ...... 46


4.3 HPC Benchmark Performance ...... 47
4.3.1 Platform Comparisons ...... 47
4.3.2 Assessing Emerging Technologies: GPU Comparisons ...... 49
4.3.3 The “Re-Structure” ...... 50
4.4 Summary ...... 55

5 Mini-Applications: The OpenACC Development of CloverLeaf 57
5.1 The Development of CloverLeaf ...... 58
5.1.1 Hydrodynamics Scheme ...... 58
5.1.2 Test Case ...... 59
5.2 Development Platform: Cray® XK6™ ...... 59
5.3 Development of OpenACC CloverLeaf ...... 60
5.3.1 Hot Spots ...... 61
5.3.2 Acceleration of Individual Kernels ...... 61
5.3.3 Acceleration of Multiple Kernels ...... 63
5.3.4 Achieving Full Residency on the GPU ...... 64
5.3.5 Increasing the Problem Size ...... 65
5.3.6 Comparison of Hybrid MPI/OpenMP ...... 65
5.3.7 Hybrid MPI/OpenACC ...... 67
5.3.8 Version A: Initial Performance ...... 69
5.3.9 Version B: Inner Loop Dependencies ...... 71
5.3.10 Version C: Nested Loops and Global Variables ...... 72
5.3.11 Version D: Multi GPUs, Reducing Hidden Transfers ...... 75
5.3.12 Version E: “ACC SYNC WAITS” ...... 79
5.3.13 Impact of GPU Optimisations on CPU ...... 80
5.3.14 Multi-GPU ...... 82
5.4 Summary ...... 83
5.5 Chapter Summary ...... 85

6 Using mini-apps to Explore Performance Portability 88
6.1 Introduction ...... 88
6.2 Related Work ...... 89
6.3 Implementations ...... 90
6.3.1 OpenACC® ...... 91
6.3.2 OpenCL™ ...... 91
6.3.3 NVIDIA® CUDA® ...... 92
6.4 Results ...... 92
6.4.1 Experimental Setup ...... 92
6.4.2 Performance Analysis ...... 93
6.4.3 Productivity Analysis ...... 94


6.4.4 Portability Analysis ...... 95
6.5 Supporting Infrastructure ...... 95
6.6 Chapter Summary ...... 97

7 Extending the Study to a Range of Emerging Architectures 100
7.1 Related Work ...... 102
7.2 Test Cases ...... 103
7.3 Implementations ...... 103
7.4 Targeted Architectures ...... 103
7.5 Results ...... 104
7.5.1 Experimental Studies ...... 104
7.5.2 Analysis ...... 104
7.6 Chapter Summary ...... 116

III Theory into Practice 123

8 Conclusions and Further Developments 124
8.1 Lessons Learnt ...... 124
8.1.1 Kernelisation ...... 125
8.1.2 Data Parallelism and Thread Safety ...... 125
8.1.3 Data Residency ...... 126
8.1.4 Halo Data Transfer and Local Buffer Packing ...... 126
8.1.5 Limited Device Memory Capacity ...... 126
8.1.6 Memory Allocation Overheads ...... 127
8.1.7 Use of Global Variables ...... 127
8.2 Putting these Lessons into Practice ...... 127
8.2.1 CloverLeaf3D ...... 128
8.2.2 CleverLeaf ...... 128
8.2.3 TeaLeaf ...... 128
8.2.4 BookLeaf ...... 129
8.2.5 MINIO ...... 129
8.3 The Spread of Mini Apps ...... 129
8.4 Collaborative Research ...... 129
8.5 Beneficiaries and Future Work ...... 130
8.6 External Industry Take Up ...... 130
8.6.1 Intel® ...... 130
8.6.2 ARM® ...... 131
8.6.3 OpenACC® Organisation ...... 131
8.6.4 PGI® ...... 131
8.6.5 International Supercomputing Conference (ISC) ...... 132


8.6.6 SPEC ACCEL™ ...... 132
8.7 Evolution of Architectures and their Programming Models ...... 132
8.8 Thesis Contributions ...... 135
8.8.1 Impracticalities of Using Production-Class Codes to Explore Architectures and Programming Models ...... 135
8.8.2 Introduction and Extension of the Mini-Application Approach ...... 135
8.8.3 Demonstrating Performance Portability ...... 136
8.8.4 Exploring Emerging Architectures ...... 136

References 137

Appendices 155

A Hardware Platforms and Architectures 156
A.1 x86-64 ...... 156
A.1.1 Willow (AWE, Bull Distributed Commodity Capacity Cluster) ...... 156
A.1.2 Blackthorn (AWE, Bull Distributed Commodity Capability Cluster) ...... 157
A.1.3 Shepard (SNL, Penguin TestBed Cluster) ...... 157
A.1.4 Hera (LLNL, Appro Distributed Commodity Capacity Cluster) ...... 158
A.2 IBM® POWER® ...... 159
A.2.1 Gollum (AWE, IBM POWER5) ...... 159
A.2.2 Milano (IBM, IBM POWER6) ...... 159
A.2.3 v60 (IBM, IBM POWER6) ...... 159
A.2.4 p90 (IBM, IBM POWER7) ...... 159
A.2.5 White (SNL, IBM POWER8) ...... 160
A.3 IBM® Blue Gene® ...... 160
A.3.1 uBG/L (LLNL, IBM BlueGene/L) ...... 161
A.3.2 DawnDev (LLNL, IBM BlueGene/P) ...... 161
A.4 Graphics Processing Units (GPUs) ...... 162
A.4.1 Dexter (AWE, NVIDIA GPU Testbed) ...... 162
A.4.2 Shannon (SNL, NVIDIA GPU Cluster) ...... 162
A.4.3 Chilean Pine (AWE, Cray NVIDIA GPU Cluster) ...... 162
A.4.4 Swan (Cray, NVIDIA GPU Cluster) ...... 163
A.5 Intel® Xeon Phi™ ...... 163
A.5.1 PillowB (AWE, Intel Xeon Phi Cluster) ...... 163
A.5.2 Compton (SNL, Intel Xeon Phi Cluster) ...... 164
A.6 AMD APU ...... 165


A.6.1 Teller (SNL, AMD APU Cluster) ...... 165

List of Figures

2.1 Example of a contemporary Intel® Xeon® x86 Node ...... 16
2.2 Example of a contemporary IBM® POWER8® Node ...... 18
2.3 Intel® Xeon® Haswell Conceptual Diagram ...... 26
2.4 NVIDIA® Fermi™ GPU Conceptual Diagram ...... 27
2.5 NVIDIA® Kepler™ GPU Conceptual Diagram ...... 28
2.6 Intel® Xeon Phi™ KNC Conceptual Diagram ...... 30
2.7 IBM® POWER8® Block Diagram ...... 31

4.1 Shamrock: 1024²-cells on Intel® Xeon® E5405 ...... 45
4.2 Intel® Xeon® E5405 Compiler Memory Footprint ...... 45
4.3 Comparative runtimes for Shamrock 1024²-cells ...... 48
4.4 Kernel Performance ...... 52
4.5 Comparison, including GPUs, of runtimes for Shamrock 1024²-cells ...... 52
4.6 Comparison, including GPUs, of runtimes for Shamrock 1024²-cells ...... 54

5.1 Lagrangian-Eulerian hydrodynamics cycle ...... 59
5.2 Pseudo Code: Individual Kernels Accelerated ...... 62
5.3 Breakdown of Individual Kernel Times ...... 63
5.4 Breakdown of Multiple Kernel Times ...... 64
5.5 Pseudo Code: Achieving Residency with OpenACC ...... 65
5.6 Breakdown of Resident Kernel Times ...... 67
5.7 Increase in Problem Size ...... 68
5.8 Pseudo Code: Hybrid MPI/OpenACC ...... 69
5.9 Version A Initial Performance of Multi CPU and GPU ...... 70
5.10 Flat Profile Version A ...... 71
5.11 Compiler Listing advec_cell Version A ...... 71
5.12 Compiler Listing advec_cell Version B ...... 72
5.13 Flat Profile Version B ...... 73
5.14 Version B Performance Comparison ...... 74
5.15 Nested and Re-factored Pseudo Code for advec_mom ...... 75
5.16 Flat Profile Version B with Re-factored advec_mom ...... 75
5.17 Flat Profile Version C ...... 76
5.18 Version C Performance Comparison ...... 77
5.19 Version D Performance Comparison ...... 78
5.20 Version D: ACC SYNC WAIT Dominated Profile ...... 79
5.21 Pseudo Code for Pre and Post Pre-Allocations ...... 79
5.22 Flat Profile Version E ...... 80


5.23 Version E Performance Comparison ...... 81
5.24 Derived Metrics: Version A ...... 82
5.25 Derived Metrics: Version E ...... 82
5.26 0.5 µs, 3840² cells, Strong Scaled ...... 83
5.27 0.5 µs, 3840² cells, Weak Scaled ...... 84

6.1 Key differences between implementations of the advection computational kernel in CloverLeaf ...... 90
6.2 Total runtimes for 960² and 3840²-cell problems ...... 92

7.1 960² cells, 2955 Timesteps: XK6 Runtimes (s) ...... 106
7.2 3840² cells, 87 Timesteps: XK6 Runtimes (s) ...... 107
7.3 960² cells, 2955 Timesteps: XK7 Runtimes (s) ...... 107
7.4 3840² cells, 87 Timesteps: XK7 Runtimes (s) ...... 108
7.5 960² cells, 2955 Timesteps: K20x & K40 Runtimes (s) ...... 108
7.6 3840² cells, 87 Timesteps: K20x & K40 Runtimes (s) ...... 109
7.7 960² cells, 2955 Timesteps: A10 & HD-7600D Runtimes (s) ...... 109
7.8 960² cells, 2,955 Timesteps: Intel Xeon E5-2450 & Intel Xeon Phi 5110P Runtimes (s) ...... 111
7.9 3840² cells, 87 Timesteps: Intel Xeon E5-2450 & Intel Xeon Phi 5110P Runtimes (s) ...... 112
7.10 PGI OpenACC Source Listing ...... 115
7.11 Cray OpenACC Source Listing ...... 115
7.12 960² 2,955 TS, Relative Kernel Performance Normalised to CCE parallel Construct ...... 119
7.13 960² 2,955 TS, Relative Kernel Performance Normalised to CCE “parallel” Construct ...... 120
7.14 3840² 87 TS, Relative Kernel Performance Normalised to CCE “parallel” Construct ...... 121
7.15 3840² 87 TS, Relative Kernel Performance Normalised to CCE “parallel” Construct ...... 122

List of Tables

2.1 AMD’s APU Comparisons ...... 23
2.2 Relative Architecture Design Dimensions ...... 25

4.1 Local Compiler and MPI Build Versions ...... 44
4.2 Local Compiler Build Flags ...... 44
4.3 Relative Number of Cores and Power Consumption, for Equal Time to Solution ...... 49
4.4 Data Bandwidths for Shamrock Transfer Sizes ...... 53
4.5 Relative Number of Cores and Power Consumption, for Equal Time to Solution; Including GPU ...... 55

5.1 CloverLeaf CPU Profile ...... 61
5.2 Execution Breakdown Comparison for Changes in Problem Size (Seconds) and Relative Percentage (%) ...... 66
5.3 Summary of Performance Issues ...... 85

6.1 Key development metrics for the three versions ...... 94

7.1 System Software Stack ...... 105
7.2 OpenACC Performance Comparisons ...... 110
7.3 Xeon and Xeon Phi OpenACC Performance Comparisons ...... 113

A.1 x86 Platform Resources ...... 158
A.2 POWER Platform Resources ...... 160
A.3 BlueGene Platform Summary ...... 161
A.4 GPU Platform Summary ...... 163
A.5 Xeon Phi Platform Summary ...... 164
A.6 APU Platform Summary ...... 165

CHAPTER 1

Introduction

The drivers for ever increasing levels of compute performance are many: from the scientific viewpoint, pushing the boundaries of discovery in the fields of health, finance, science, industry and the military; to a purely commercial viewpoint, with the attraction of the world’s best students and technical minds to the industrial infrastructure of a nation.

Internationally, strategic drives from China, the USA, Japan and the European Union are striving to be the first to design and deploy exascale class machines, namely systems capable of performing one billion billion floating point operations (FLOP) per second (EFLOP/s).

China first laid claim to the world’s fastest supercomputer with the NVIDIA® Tesla® GPU accelerated, Intel® Xeon® based Tianhe-1A [144] in November 2010, and re-laid the claim in November of 2015 with the Intel Xeon / Intel® Xeon Phi™ based, 54.9 petaFLOP (PFLOP) Tianhe-2 [133]. However, with embargoes from the US restricting future imports of these processor technologies to the home of Tianhe-2, the National Supercomputer Center, and also to China’s Jiangnan Institute of Computer Technology, the country is actively pursuing its own micro-processor development via its ShenWei, FeiTeng and Loongson micro-processor technology [112]. Announced in June 2016, eclipsing Tianhe-2 at the top of the Top500 list, was Sunway TaihuLight. The building blocks of the system are the ShenWei SW26010 processor. With 40,960 nodes, comprising 10,649,600 computing cores, it is twice the speed and three times as efficient as Tianhe-2 [85]. China’s social, economic and political strategy is described in successive “Five-Year Plans” (FYP), the 12th of which (2011-2015) [72] instigated the programme resulting in Sunway TaihuLight. The subsequent 13th FYP (2016-2020) sets out the delivery of an exascale machine by the end of the decade.

Japan’s SPARC64™ based, Fujitsu manufactured K-computer [203] remains one of the top ten fastest machines in the world, having debuted as the fastest in June of 2011 [148] with a peak performance of 11.3 PFLOP/s. The country’s Ministry of Education, Culture, Sports, Science and Technology (MEXT) announced the Flagship2020 project in April 2014. Flagship2020’s goal is to deploy a machine with “100 times more application performance” than the K-computer at the RIKEN Advanced Institute for Computational Science by 2020. As announced at ISC 2016 [116], this platform will utilise ARM® based, rather than SPARC64, processors from Fujitsu [170].


A high level policy from the Office of Science and Technology Policy in the United States resulted in the US National Strategic Computing Initiative (NSCI) [108] to “create a cohesive, multi-agency strategic vision and investment strategy that assures the United States sustains or extends its historical lead and strategic advantage in High Performance Computing (HPC) technology for national security, economic prosperity and scientific discovery”, which has the goal of deploying an exascale class system by 2023 within the Department of Energy (DoE). The Exascale Computing Plan (ECP) is the delivery mechanism for this strategy, setting out the PathForward program to oversee hardware development, funding a range of vendors to carry out research and development towards a deployable exascale system.

A number of pre-exascale systems have been announced from the DoE under a phased Non-Recurring Engineering (NRE) route, which includes the Advanced Technology Systems (ATS). The first phase consists of the Trinity (ATS-1) [128] and Cori [151] systems at Los Alamos National Laboratory (LANL) and the National Energy Research Scientific Computing Center (NERSC) respectively. The second phase is under the auspices of the CORAL (Collaboration of Oak Ridge, Argonne, and Lawrence Livermore) project, consisting of two IBM® POWER9® based systems with NVIDIA GPUs, to be deployed at Lawrence Livermore National Laboratory (LLNL), Sierra (ATS-2) [136], and at Oak Ridge National Laboratory (ORNL), Summit [156], and a Cray®, Intel Xeon Phi based system, Aurora [50], to be located at Argonne National Laboratory (ANL). A third phase, APEX (Alliance for application Performance at EXtreme scale), will consist of systems at LANL (ATS-3, Crossroads [127]) and at NERSC (NERSC-9).

Under PRACE (Partnership for Advanced Computing in Europe) there is a large range of exascale research and development activities under way [52], with a number of national Tier-0 Class systems in production [171]. However, the only committed plans specifically targeting an exascale machine are from Atos®, through its technology brand Bull, based on its Sequana supercomputer [69]. An exascale machine is targeted for the Commissariat à l'énergie atomique et aux énergies alternatives (CEA) by the end of the decade, with a pre-exascale Intel Xeon Phi based machine, the TERA-1000, currently in construction.

Irrespective of whichever nation succeeds in the race to become the first to reach the exascale landmark, there is significant potential for the architectures of the first exascale platforms from ShenWei, ARM, Intel® and IBM® to be considerably different to those deployed in today’s supercomputers.


1.1 Architectural Changes

The increasing number of transistors on a micro-processor, as predicted by Moore’s law [184], has provided a continuous, dependable improvement in processor performance for several decades. As a prediction of the increase in the number of a chip’s components as a function of time, it initially forecast a doubling of a chip’s components every year, later revised to every two years. As this increase in components was directly proportional to the speed of a chip, it was frequently expressed as a doubling of a chip’s performance.

Although the original prediction by Moore is still realised, the increase in a processor’s frequency, or clock speed, is not; this stopped for two main reasons.

Firstly, the “memory wall”: the relative increase in processor frequency and that of the memory to “feed” the processing unit began to diverge. Irrespective of the speed at which a CPU can process data, if no data is present to be processed, then striving for ever faster processor speeds is a redundant exercise.

Secondly, the “power wall”: the thermal power, namely heat, that is generated in an active CPU needs to be dissipated. The power (P) generated by a CPU is directly proportional to the capacitance (C), clock frequency (f), and the square of the supply voltage (V), as expressed in Equation 1.1.

P ∝ fCV²    (1.1)

As any increase in clock frequency also requires an increase in the supply voltage, increasing the clock frequency results in an essentially cubic increase in power; this is evidently unsustainable.
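
As an illustrative, assumed example of Equation 1.1: raising the clock frequency by 20% typically requires a roughly proportional rise in supply voltage, so the power grows by a factor of approximately 1.2 × 1.2² ≈ 1.73; that is, a 20% gain in clock speed costs on the order of 73% more power.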

1.1.1 Emergence of Multi-Core

Because of these two contributing factors, from around 2003, instead of increasing the clock frequency of a single uni-processor, an increase in the number of processing cores became the dominant trend, resulting in over 20 cores in a current CPU micro-architecture. With the individual clock frequencies of these multiple cores plateauing, the only increase in the power envelope of the system is that of capacitance, which, as Equation 1.1 shows, results in a linear increase in power.

Although this strategy of increasing core counts looks set to continue, in isolation it will not meet the performance demands of future HPC whilst still operating within an acceptable power envelope. By design, a modern CPU core is a general purpose execution engine; that is, it is designed to do a wide range of tasks in a performant manner. However, there are a number of processors that have a very specific, or specialised, purpose at which they excel.


One such example is the Graphics Processing Unit (GPU), whose original design specification was the manipulation of the pixels that make up display screen images. Analogous to the methodology of making an individual CPU core less complex, adding complexity to the cores of a GPU makes general purpose execution a possibility. This was achieved through the addition of error correction codes (ECC), the addition of further floating point execution units, and the ability, through new languages, to program these floating point units’ capabilities.

It is predicted that as core counts increase, the cores themselves will not only reduce in individual clock frequency but also in complexity, with the possibility of heterogeneous systems containing not just multiple cores, but multiple different cores. Ultimately this will result in systems comprised of a much larger number of lower-power, lower-performance cores than seen today.

1.1.2 Instruction Level Parallelism

Heterogeneity is also present in the parallelism models required to maximise the utilisation of these multi-core systems.

Instruction level parallelism (ILP) describes the numerous mechanisms deployed in a modern microprocessor to implicitly execute machine instructions simultaneously, hence achieving a level of instruction parallelisation. By considering the order and repeatability of the instructions usually associated with a particular program’s flow, a “production line” of instructions can be created. Referred to as pipelining, staggered instructions can be overlapped to achieve parallelism. Coupled with pipelining is branch prediction, where the processor makes an educated guess as to the outcome of a branch and begins to carry out subsequent instructions in advance. The trade-offs between pipeline depth, complexity of branch prediction algorithms and penalties for mis-predictions improve with each generation of micro-architecture [90], increasing the throughput of generated instructions. However, with a greater number of simpler processing cores per CPU, the trend is for individual cores’ pipelines to become shallower and for branch prediction to become less speculative, hence moving the exploitation of parallelism from an implicit compiler-based approach, or hardware design, to one of explicit implementation by the application developer.

Analogous to achieving ILP through pipelining, today’s micro-architectures are also able to partition and duplicate their resources, resulting in the ability to issue multiple instructions at the same time. Known as superscalar architectures, they also require an aware compiler to generate an instruction mix amenable to the hardware. This type of hardware multithreading is extended in the concept of simultaneous multi-threading (SMT), which enables multiple threads to issue instructions per clock cycle, with the desired result being greater utilisation of pipelines. Implemented examples of this technology include Intel’s Hyper-Threading and IBM’s SMT.

1.1.3 Data Parallelism

Exploiting parallelism from data can be achieved in a number of ways and, to an extent, all of them need to be utilised to fully exploit hardware which provides a significant amount of its on-node parallelism through each mechanism.

Single instruction, multiple data (SIMD) is a technique in which a block, or vector, of data has the same instruction applied to all data entries in the vector. Beginning with the multimedia extensions (MMX™ technology), dedicated integer instructions could be executed on such vector registers. This was extended to cover floating point instructions with Streaming SIMD Extensions (SSE), along with a doubling in size of the vector to a 128-bit register. Over a number of SSE generations, additional new vector instructions were added. Advanced Vector Extensions (AVX) again doubled the register size to 256-bit with new instructions, and available today in Intel’s Xeon Phi architecture is AVX-512, which provides 512-bit registers.

There is an obvious trend towards increasing the vector sizes and the instructions that can be applied to data residing in these registers. To fully exploit these architectural features, an application’s data needs to be stored in appropriate structures to enable the compiler to recognise the potential to apply such vector instructions. The implementation of specialist gather/scatter operations has assisted with this issue, in that vector operations can now be applied to non-contiguous data, but there is still an onus on the application developer to be aware of how their application’s data is structured.

Data parallelism can also be applied to exploit the shared memory nature of hardware. Chapter 3 details a range of programming methodologies (OpenCL™ (Open Compute Language) software [125], OpenMP [21], OpenACC [23], pthreads [153]) that can be used to target such shared memory multi-processors. Through the use of these constructs the application programmer can split the workload within their application across physical cores within an SMT region of the underlying hardware.

Indeed, although hardware multi-threading, as described in Section 1.1.2, was conceived to exploit ILP by increasing the number of instructions carried out per clock cycle, it can also be targeted explicitly in the same way with a high-level directive approach. However, as multi-threading’s primary design is to cope with lightweight ILP instructions, the overheads of heavyweight threading such as the Message Passing Interface (MPI) or OpenMP, plus the additional overhead of the MPI/OpenMP management and runtime, mean that utilising all available hardware threads usually leads to a detrimental impact on performance, so this requires careful deployment.

Data parallelism can also be used to reduce the overheads associated with distributed parallelism. Although an MPI distributed application can execute on a multi-core CPU, by replacing the on-node parallelism with a data parallel model the overall memory footprint may be reduced by negating the need for memory hungry message passing buffers. Additionally, as detailed in Section 1.1.4, the overheads in moving data will become a significant contributor to the overall energy consumption of a system, hence a shared memory model capitalising on the locality of the data is desirable.
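
As an illustrative sketch of the vector registers described above (assuming an AVX-capable x86 processor and the intrinsics provided by immintrin.h; the function itself is hypothetical), the loop below processes four double-precision values per 256-bit register per iteration. Here n is assumed to be a multiple of four; a scalar remainder loop would otherwise be required.

    #include <immintrin.h>

    /* c[i] = a[i] + b[i] using 256-bit AVX registers: four doubles are
     * loaded, added and stored per loop iteration. */
    void vector_add(const double *a, const double *b, double *c, long n)
    {
        for (long i = 0; i < n; i += 4) {
            __m256d va = _mm256_loadu_pd(&a[i]);
            __m256d vb = _mm256_loadu_pd(&b[i]);
            _mm256_storeu_pd(&c[i], _mm256_add_pd(va, vb));
        }
    }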

1.1.4 Deeper Memory Hierarchies

Keeping power consumption under control is not only a problem for computation. Although dynamic memory technology has improved, and memory density has increased at a faster rate than processor speed, the speed of the memory has not. This has resulted in what is termed the “Memory Wall”: the limitation on an application’s ability to utilise the benefits available from increases in processor speed, due to the relatively slow speed of obtaining the data from memory to process.

Indeed, the relative gap between the energy used in the computation of data and that used in moving the same data is increasing. Forecasts for exascale predict the cost to move a double-precision data object from memory to the floating-point unit (FPU) to be two orders of magnitude higher than the cost of performing a floating point operation on the same data [124].

One approach to hide the impact of the “Memory Wall” is the introduction of memory hierarchies. Here, varying levels of memory technology are deployed, ranging from small, expensive but fast memory to larger, cheaper, slower memory. Consequently the temporal locality of data is becoming crucial, implying a trade-off of increasing FLOP/s whilst reducing memory accesses.

Not only are technological improvements in the capacity and bandwidth of memory increasing at slower rates than those in compute, those memory improvements are manifesting themselves through more than one technical solution. A number of new high bandwidth memory standards are emerging that ultimately aim to embed high bandwidth memory on the chip, realising over five times the memory bandwidth of current Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM). High Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC) are two such currently proposed standards, examples of which have already appeared in the NVIDIA® Pascal™ GPU and as Multi-Channel DRAM (MCDRAM) in Intel’s 2nd generation MIC architecture, the Knights Landing (KNL) Xeon Phi.

Similar developments in non-volatile memory (NVRAM) are enabling fast access to relatively small amounts of persistent memory. Primarily being considered as a mechanism to enable large HPC systems to checkpoint applications in acceptable times, the nature of such memory matches the requirement of writing large “bursts” of data followed by a slower constant write to traditional persistent memory on disk.

The outcome is a complex heterogeneous memory system. Such systems have a hierarchical structure, with cache-based static random access memory (SRAM) and high bandwidth memory on chip, off-chip dynamic random access memory (DRAM), and NVRAM sitting between main memory and remote disk. Where in the past most memory performance gains were implicitly handled by the system (in the hardware or software, at runtime or through the compiler), parts of the hierarchy will need explicit management from the application developer to obtain the significant performance gains available.

1.1.5 System Interconnects

Analogous to reducing the operating frequency of a processing core when not in use, energy saving strategies are under consideration for current interconnect networks to disable or reduce power in links that are not being utilised [106]. For next generation interconnects, the move from copper-based electrical connectors to optical fibres (silicon photonics) aims to significantly increase the density and performance (in terms of bandwidth) of connections, while also reducing the power required. Although it is not currently known whether these technologies will directly impact the application, the possibility of reduced reliability could indicate the requirement to build greater resilience awareness into applications.

Off chip, the network is the backbone of the modern supercomputer. The issue of power at large scale is resulting in methods such as dynamic throttling, the mapping of point to point messaging to the network, and dynamic routing of messages to avoid congestion on the network. Optimisation of communication operations, to enable the mapping of an application’s communication pattern to that of a changing network architecture [107], is a growing research area. This highlights the growing importance of an application being able to take advantage of such architectural features.

In summary, although tomorrow’s HPC is striving to attain ever increasing performance, the basic hardware building blocks are no longer contributing to this by increases in speed, but by increasing power efficiency, resulting in heterogeneity throughout all levels of the system, from hardware to software. The net result is that greater levels of parallelism are required to exploit future machines’ potential. Whereas once the techniques to exploit these levels of parallelism were implicitly utilised by the underlying hardware, or its compiler, these concepts are increasingly required to be understood by the application developer. The developer needs to explicitly ensure their application is programmed in a way that exposes the concurrency in their algorithms, to be able to exploit these hardware implementations.

1.2 Impact of Architectural Changes

With such a diverse range of heterogeneous architectures coming to the fore, the challenge for HPC is complex: not only in the range of possible hardware and software options, but also due to the huge range of application domains. More often than not, simulation of the natural world involves a combination of algorithmic domains. Colella [74] identified and characterised the seven most prolific of these in relation to the then contemporary Defence Advanced Research Projects Agency (DARPA) program: Dense linear algebra, Sparse linear algebra, Spectral methods, N-body methods, Structured grids, Unstructured grids and Monte Carlo methods. With the emerging computational requirement to apply high-level abstractions to large data sets, algorithmic areas such as Graph Traversal, Map Reduce, Combinational Logic, Machine Learning and Finite State Machines are also growing in their use.

The field of Life Sciences encompasses drug development and DNA sequencing. Material Science research depends heavily on the concept of molecular dynamics, a simulation method developed with the emergence of HPC computational capabilities. The Oil and Gas industry, along with vehicle and aircraft manufacturers, needing to simulate the operability space of their engines through 3D unstructured fluid dynamic studies, depend on numerical algorithms to model the flow of fluids through a range of mediums, primarily falling into the field of Computational Fluid Dynamics (CFD).

1.3 Sphere of Study and Research Questions

The Atomic Weapons Establishment (AWE) has played a central role in the defence of the United Kingdom for more than 60 years, providing and maintaining nuclear warheads for the United Kingdom’s (UK) Continuous At Sea Deterrent. In the absence of nuclear testing, and in adherence to the Comprehensive Test Ban Treaty (CTBT), work to maintain and support a nuclear deterrent relies on cutting-edge science and computational methodologies to verify the safety and effectiveness of the warhead stockpile. AWE’s ability to understand the performance of a warhead and underwrite its safety depends crucially on numerical simulations for modelling both physics and engineering aspects. Information from hydrodynamics and laser experiments, data from material ageing studies, plus previous nuclear test results are used in mathematical modelling. Hydrodynamic computational models, described by the compressible Euler equations, simulate the flow of fluids by describing their properties in terms of pressure, velocity, temperature and density as functions of space and time.

To explore the use of HPC for hydrodynamics, a suitable application was identified and its amenability was explored. Taking an industrial strength benchmark code, Shamrock (detailed in Section 4), an exploration was carried out to ascertain if its use case could be extended beyond that of a machine upgrade/procurement tool. This highlighted a number of shortcomings in the agility of a large benchmarking application for providing rapid answers to questions on the amenability of the hydrodynamic algorithm to a range of emerging architectures and programming paradigms.

The concept of the mini-application, or “mini-app”, is a compact, self-contained application that embodies the essential performance characteristics of the main application for which it aims to be a proxy. In Chapter 5 the process of developing an OpenACC-based performant version of the hydrodynamic mini-app CloverLeaf is described, detailing stage-by-stage the process required to enable the mini-app to utilise a Cray XK6, GPU-based supercomputer. It has been demonstrated that a range of emerging hardware and software options can be explored, with their relative performance and efforts highlighted.

With recognition of such changes in architectural designs and the subsequent implications for the applications that utilise HPC, this thesis aims to document a bounded study into this area (see Section 1.3). Each chapter in this thesis answers a set of research questions that guide its flow.

Chapter 1 describes the “state-of-the-nation” of HPC and covers why and how HPC architectures are changing, highlighting the increasing onus on the application developer to be architecturally aware and the need for applications to adapt to meet the changing HPC landscape.

Chapter 2 details those emerging hardware architectures that are beginning to exhibit such architectural changes. It considers these architectures in turn and then looks to identify commonalities in these emerging candidates.

Chapter 3 looks at the wide range of programming options available to the developer to exploit emerging hardware and (as with hardware) looks to see if there is any commonality across the programming choices.

there is any commonality across the programming choices. Chapter 4 looks to assess the suitability of emerging hardware and programming paradigms for a particular algorithmic domain through the use of existing software. By addressing the question "is it feasible to extend the use of an existing benchmark application to explore such suitability?", it indicates that a new approach is required. Chapter 5 introduces the idea of the mini-app and, by highlighting the step-by-step approach required to develop a particular variant of the mini-app, assesses whether it is a feasible approach for an emerging technology assessment. Chapters 6 and 7 then respectively demonstrate how both hardware and software comparisons can be made using such a mini-app based approach. Finally, Chapter 8 details how the research undertaken as part of this study has further extended to other academic research opportunities and how it has been applied in practice in industrial environments; it summarises the research and provides some thought-provoking speculative ideas on possible future directions in the field of study, concluding the thesis. Appendix A provides a dedicated, standalone description of all of the hardware systems utilised throughout this study. The research contained in this thesis spans multiple years, 2010 until early 2017, with chapters written throughout this period. Given such a relatively long period, coupled with a fast changing high performance computing industry, by specifying when a chapter was composed and (in Appendix A) when HPC systems were commissioned, the author aims to put the research into context at the particular time of writing.

1.4 Thesis Contributions

As part of this research, this thesis describes the following novel contributions from the author:

1.4.1 Impracticalities of Using Production-Class Codes to Explore Architectures and Programming Models

Demonstrates the lack of flexibility of an industrial-class benchmark code as a tool for the rapid exploration of emerging architectures and their associated programming models.

The standard practice of using benchmark codes to assess system upgrades to an incumbent platform and for comparisons for procurement of new platforms

was demonstrated using a representative, industrial-class benchmark application. This was utilised to explore the feasibility of using such a benchmark to assess emerging technologies, and reached the conclusion, due to time constraints and practicality, that such an assessment requires a different approach. The benchmark chosen is representative of all such benchmark applications, in that it contains non-localised compute sections of code, a lack of good data parallelism, loop carried dependencies and non-optimised memory access patterns. This work was published in the following research paper: Herdman, J. A., et al. "Benchmarking and Modelling of POWER7, Westmere, BG/P, and GPUs: An Industry Case Study", which was presented at the 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), held in conjunction with IEEE/ACM Supercomputing 2010 (SC'10), New Orleans, LA, USA, and subsequently published in ACM SIGMETRICS Performance Evaluation Review 38.4 (2011): 16-22 [103].

1.4.2 Introduction and Extension of the Mini-Application Approach

Introduction of the CloverLeaf mini-application with development details to achieve a fully functional and portable OpenACC implementation.

This work introduced the concept of the mini-app and described CloverLeaf, an explicit Eulerian hydrodynamics mini-app. The study detailed the step-by-step development process that produced a fully functional and portable version using the newly emerging OpenACC standard. The resultant OpenACC versions of the mini-app were a subset of those versions of CloverLeaf which were accepted as part of the R&D 100 [172] award winning Mantevo test suite [104]. The development process detailed was also the subject of a Cray hosted, two-day XK6 programming workshop at Oak Ridge National Laboratory (ORNL) in October 2012, bringing together users of XK6 systems around the world to share experiences [77]. More recently the step-by-step approach was selected as a dedicated chapter in the technical book "Parallel Programming with OpenACC" [88].


1.4.3 Demonstrating Performance Portability

Demonstration of the suitability of the mini-app as a tool for the exploration of emerging architectures, in the particular case of a GPU, using three programming methodologies, namely OpenACC, OpenCL and CUDA.

The mini-application was demonstrated fully utilising a GPU-based architecture, with direct comparisons of performance, development time and "words of code" (WoC) against the equivalent OpenCL and CUDA implementations. This work was presented at SC'12 and subsequently published in the IEEE companion volume: Herdman, J. A., et al. "Accelerating hydrocodes with OpenACC, OpenCL and CUDA." High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, IEEE, 2012 [101].

1.4.4 Exploring Emerging Architectures

Extending the use of the CloverLeaf mini-app to explore a range of emerging architectures, namely GPUs, co-processors, APUs and current CPUs, using OpenACC as a common baseline against which to compare the performance of the best alternative programming models on each of the platforms analysed.

Contribution 1.4.3 was extended to further programming methodologies, enabling direct comparisons with regards to development time, maintenance effort, portability and performance on a GPU architecture. Subsequently, using OpenACC as a common baseline, further emerging hardware was assessed (co-processors, AMD APUs, GPUs), which enabled the optimal native programming methodology to be compared and contrasted against the OpenACC baseline. This work was accepted and presented at the First Workshop on Accelerator Programming using Directives as part of SC14, winning the Best Paper award: Herdman, J. A., et al. "Achieving portability and performance through OpenACC." Proceedings of the First Workshop on Accelerator Programming using Directives. IEEE Press, 2014 [102].

Part I

Background Research

CHAPTER 2 Existing and Emerging Hardware

This chapter discusses those architectures that are at the forefront in today’s HPC marketplace, and how they are beginning to evolve to address the power challenges of tomorrow. Additionally, it introduces those contending emerging technologies that can be thought of as a more disruptive solution to power constraints.

2.1 Established Hardware

This section discusses a number of the most prolific High Performance Computing (HPC) architecture options available to high-end HPC customers today. These have evolved over the years, with x86 currently holding dominance.

2.1.1 x86-64

The x86-based instruction set architecture has existed since the late 1970s, originating from Intel but implemented by many manufacturers of microprocessors since. The first HPC machines to emerge based on x86 were clusters running the then emerging Linux operating system (OS). Spawning from the 1994 Beowulf project at NASA [57], which aimed to build a gigaflop (GFLOP/s) system for under $50,000, Beowulf systems, reflecting the low cost, commodity component build of that original project, began to emerge. These were more often than not based on standard building blocks connected together with a commodity interconnect, and were typically modest sized platforms built to fit a specific need and budget. Sandia National Laboratory's (SNL) ASCI Red became the first true x86 based supercomputer built on commodity-off-the-shelf (COTS) components in 1997, and the first machine of any architecture to break the 1 TFLOP/s barrier. Since this landmark, the x86 architecture has seen a dramatic increase and today is a dominant presence in the list of the world's top 500 supercomputers. Addressing the unsustainable increase in power demanded by increasing uni-core processor clock speeds, the x86 micro-architecture design evolved to multi-core processors. The first of these available in dual-core, server form appeared in 2005 with Intel's first Xeon branded dual-core processor, Paxville [4], and AMD's first dual-core Opteron, the Opteron 875 [1]. This multi-core strategy naturally evolved from dual-core to quad, hexa, octo, and deca-core CPUs.


In 2011 AMD released the 16-core Opteron 6272 Interlagos processors. However, these cores differed significantly from those of its predecessor, the 12-core Magny Cours processor, in that they shared a subset of resources on the micro-architecture: cores are paired into "modules", where each core pair shares a single FPU. As of early 2017 the highest core count found in a server class processor is in the Broadwell generation Intel® Xeon® E7-8894 v4 [9], where a CPU has up to 24 cores. These multi-core processors also support two-way simultaneous multi-threading (SMT). This is essentially a mechanism that aims to keep as many of the processor's functional and arithmetic logic units (ALUs) busy as possible, by exploiting thread level parallelism (TLP) through the use of multiple hardware threads; it also has the potential to hide memory access latency. Figure 2.1 is a node diagram of Shepard (Appendix A.1.3), a 16 core Intel Xeon E5-2698 v3 [7] Haswell system based at SNL, which depicts the node-level complexity of a contemporary x86 processor.

2.1.2 IBM® POWER®

Once the dominant supercomputing processor, accounting for a quarter of the Top 500 supercomputers in November 2003 [146], the IBM® POWER® (Performance Optimized With Enhanced RISC) architecture emerged in 1990. The first supercomputer built from the POWER architecture was the SP2, based on the IBM® POWER2® architecture. The IBM® POWER3® then saw the emergence of the SMP (Symmetric Multi Processor), with up to 16 processors on the Nighthawk2 node variant: the heart of the then number 1 system on the November 2000 Top 500 list, ASCI White [145]. IBM® POWER4® had the distinction of being the first multi-core processor of any micro-architecture to have two cores on a single die, in 2001. However, the relative gains in price to performance of mainstream x86 multiprocessors compared to the niche market of the POWER architecture saw its dominance wane as the main supercomputing processor. IBM® POWER5® introduced 2-way SMT (simultaneous multi-threading); the largest machine built from this generation of POWER was LLNL's ASC Purple in 2005 [135]. IBM® POWER6® again had 2-way SMT in a dual core format, but executed instructions in-order rather than out-of-order like previous POWER generations. The POWER6 also reached the highest clock speeds of any generation before or since, with variants available up to 5.0 GHz [71]. IBM® POWER7®, released in the last quarter of 2009 in its server variant, saw an increase in on die cores from 2 to 8, an increase in SMT from 2 to 4, and a switch back to out-of-order execution. However, the collapse of the planned

Figure 2.1: Example of a contemporary Intel® Xeon® x86 Node


2011, 10 PetaFLOP (PFLOP) POWER7 based p775 Blue Waters system at the National Center for Supercomputing Applications (NCSA) [13] struck a blow to the POWER architecture. Today IBM offers the IBM® POWER8® processor; it is available with up to 12 cores and can run in 2, 4, or 8-way SMT mode. Figure 2.2 is a node diagram of "White" (Appendix A.2.5), a dual socket, 10 core POWER8 system based at SNL. Comparing this with the x86 node in Figure 2.1 shows that the differences in architecture are primarily down to core count, the balance of memory hierarchies, and levels of hardware multi-threading. These differences are contrasted in further detail in Section 2.5, with additional comparisons to today's "many-core" architectures. Despite the relative differences, there is also commonality in the levels of complexity found in such established hardware. The architecture choices so far described can be thought of as mature, traditional, multi-core architectures.

2.2 The First "Many-Core" Architectures

2.2.1 IBM® Blue Gene®

The IBM® Blue Gene® architecture spans three generations: Blue Gene/L, Blue Gene/P and Blue Gene/Q. Blue Gene/Q is at the heart of the 1.6 million core Sequoia platform, a 20 PFLOP/s platform based at Lawrence Livermore National Laboratory (LLNL) which was qualified for production in 2012. Using the 1.6 GHz IBM® PowerPC® A2 processor, it has 4-way multi-threading and a quad vector double precision (DP) fused multiply add (FMA) unit, which IBM refer to as QPX (Quad Processing eXtension). Each chip has 18 cores, including one redundant core and one core which is used for OS functions; this presents itself to the end user as a 16 core processor. Blue Gene/Q follows the first and second Blue Gene generations: Blue Gene/L, which had two 700 MHz IBM® PowerPC® 440™ cache-incoherent cores, and Blue Gene/P, with four 850 MHz IBM® PowerPC® 450™ cores and 4-way SMT, now with cache coherency between cores. Initially the Blue Gene architecture was developed to target the specific field of protein folding, but also with a secondary purpose: to study extreme scale architectures and move towards incorporating many more parallel processing elements with a reduced power footprint compared to traditional CPUs. With the release of systems in 2012, Blue Gene/Q had the distinction of topping all three recognised supercomputing rating lists: the Top500 (fastest floating point), the Green500 (energy efficiency: FLOP/s/Watt) and the Graph500 (for data intensive loads).

Figure 2.2: Example of a contemporary IBM® POWER8® Node


2.2.2 IBM® "Roadrunner"

The first true hybrid, or accelerated, supercomputer, where applications executed on all the heterogeneous components, was IBM's Roadrunner platform [48], installed at LANL. It combined two different kinds of processors: 6,563 dual-core general purpose AMD LS21 Opteron blades, with each Opteron blade linked to two IBM BladeCenter QS22 PowerXCell 8i "Cell/B.E." blades. This resulted in 12,960 "Cell" processors, each an enhanced and adapted version of the specialised processor at the heart of the Sony PlayStation 3. Although a "one-off", the hybrid nature of Roadrunner was a forerunner to the hybrid systems, albeit not "Cell" based, that are emerging today.

2.3 Today’s “Many-Core” Architectures

Today’s conventional multi-core processors as described in Section 2.1 could be scaled up to build the exascale class machines that HPC is demanding. However, even when taking into account the predicted increases in technology, power demands on such a system would be in the region of 200 MW which equates to around $300M in annual operational electricity costs [84]; clearly this is not a feasible solution. To address this, the emergence of more simplistic, lower power highly parallel many-core processors are coming to the fore. These manifest themselves in various ways depending on the type of paral- lelism they seek to exploit. However, they broadly fall into the two categories that were observed in the pioneering systems in Section 2.2: namely systems with lots of relatively slow low power cores (Blue Gene) or traditional CPU based systems with some form of attached “accelerator” (Roadrunner). In many cases this results in a full system solution that contains different many-core devices resulting in heterogeneous architectures in which all hardware components are available to execute an application are not identical. This section describes these architectures that are emerging into the HPC market.

2.3.1 Graphics Processing Units (GPUs)

As their name suggests, a Graphics Processing Unit's (GPU's) primary role is to process vast numbers of individual pixels extremely quickly. However, through their evolution from fixed function devices, to configurable devices, to programmable devices for graphics, and ultimately to enabling the programmability of their floating point capabilities via NVIDIA® CUDA® (Compute Unified Device


Architecture) [78], the ability to utilise the GPU's computational resources can now be realised for today's scientific computing demands. Indeed, as attached devices via a PCIe (Peripheral Component Interconnect Express) interface, GPUs offer a way to build extremely large hybrid HPC platforms, with an increased FLOP/s/Watt ratio over a CPU only solution. Notable multi petaflop GPU-based HPC machines are Oak Ridge National Laboratory's (ORNL) Titan, Centro Svizzero di Calcolo Scientifico's (CSCS) Piz Daint, the Tokyo Institute of Technology's TSUBAME 2.5, and Tianjin's National Supercomputing Center's Tianhe-1A. Essentially GPUs are a large (relative to a CPU) collection of multi-threaded SIMD processors that use their multi-threading capability to hide the latencies of accessing memory, rather than the cache hierarchy used for the same means on a CPU. Although having significantly less memory than a traditional CPU, the GPU's memory has a much higher memory bandwidth. This is made possible, in part, due to the lack of constraints in its design that are required by the more general purpose memory found in the traditional CPU: a CPU needs to be able to access memory arbitrarily, while a GPU's memory can be laid out in a simpler way, with direct mapping of chunks of memory to blocks of ALUs. Furthermore, the CPU needs to manage OS interrupts, whilst the GPU does not. This results in wider memory buses and faster memory cycles than found on a CPU. The most prevalent GPUs to be utilised in production level HPC architectures aimed at scientific computing, with the necessary ECC (error correcting code) memory protection support, are NVIDIA's GPU cards. The first generation of these GPUs was based on the NVIDIA® Tesla® micro-architecture, consisting of a number (dependent on model) of what NVIDIA term its Streaming Multiprocessors (SMs). As hinted in the name, the SM is exactly that: it issues instructions to be executed and manages the data common to those instructions. In the case of the Tesla micro-architecture, each SM consisted of 8 GPU cores, where each GPU core is essentially a single precision Arithmetic Logic Unit (ALU). The next generation was based on the NVIDIA® Fermi™ architecture, where a GPU could have (depending on model) between 7 and 16 SMs; each SM has 32 GPU cores, each of which can configure its 64 KB local memory between shared memory and an L1 cache, and all 32 GPU cores on the SM share a unified L2 cache. The NVIDIA® Kepler™ micro-architecture followed Fermi in the NVIDIA GPU range, introducing a new SM known as the SMX. Here a high end Tesla GPU could have up to 15 SMXs, where each of these contained 192 GPU cores. In addition to retaining the configurable 64 KB local memory as found in Fermi, a


48 KB read only data cache is utilised. Also, the shared L2 cache was double that available with Fermi. Following Kepler, the next micro-architectures with HPC variants planned are NVIDIA® Pascal™ and NVIDIA® Volta™. In the case of the latter, machines are already planned for operation in the 2018 time frame; these will be in the region of 150 PFLOP/s [136, 156].

2.3.2 Intel® Xeon Phi™

The Intel Xeon Phi is a many-core multiprocessor system-on-a-chip design. It evolved from an initial research project at Intel to investigate improved performance and power efficiency using x86 cores, which led to the Larrabee processor [185], aimed at graphics applications. Intel's "TeraScale" processor program looked at the feasibility of, and resultant issues from, packing many cores onto a die; this resulted in the 80-core Teraflops "Polaris" concept processor [193], which realised over 1 TFLOP/s from a single chip, and the "Rock Creek" SCC (Single-chip Cloud Computer) concept [111], a processor containing 48 Intel IA cores connected via an on-chip mesh network, produced to explore the software needed to exploit such a many-core architecture. These research activities ultimately led to the Intel Many Integrated Core (MIC) architecture: in-order x86 cores, with wide vector units and greater SMT than the familiar Intel Xeon CPU, running a stripped down version of the Linux operating system (OS). The first, proof of concept, device using the MIC architecture was known as Knights Ferry (KNF). This prototype contained 32 in-order, 1.2 GHz cores with 4-way SMT, but only supported single precision instructions. The second generation based on the MIC architecture, Knights Corner (KNC), was also the first to be branded under the Intel Xeon Phi name, and was made commercially available in three variants, the 3120, 5110, and 7120, with 57 cores at 1.1 GHz, 60 cores at 1.05 GHz, and 61 cores at 1.24 GHz respectively. All three variants are available as either passively or actively cooled attached PCIe co-processor cards. Each core has a 32 KB L1 data cache and a local 512 KB L2 cache, with access to all other cores' L2 caches; unlike an Intel Xeon, there is no shared L3 on the Intel Xeon Phi. At 512 bits, the vector instructions on the Xeon Phi are double the width of those on the latest Intel Xeon CPUs. All cores on the coprocessor are connected via an on-chip bidirectional ring interconnect providing cache coherency. As the Intel Xeon Phi is an x86-based co-processor running a Linux OS, applications that compile and run under the standard x86 architecture will run "out-of-the-box" on the Intel Xeon Phi, requiring only a re-compilation to

account for the instruction incompatibility with the Intel Xeon. Released in 2016, the second generation Intel Xeon Phi, Knights Landing (KNL), comes in a self-bootable variant in addition to an attached co-processor variant. Both variants have an increased core count over the KNC. There are already confirmed large system procurements that will utilise the KNL: Trinity at Los Alamos National Laboratory (LANL) and Cori at the National Energy Research Scientific Computing Center (NERSC). Additionally, the third generation, Knights Hill (KNH), will be the basis for the "Aurora" machine, due to be delivered to Argonne National Laboratory (ANL) in 2018.

2.3.3 Accelerated Processing Units (APU)

Initially known as the Fusion Accelerated Processing Unit (APU) architecture, AMD's on-chip combined CPU and GPU architecture aims to integrate the two more closely, enabling the sharing of data between the CPU and GPU by way of a unified coherent memory space. Primarily aimed at the graphics market, these are analogous to the Intel® Core™ CPUs (rather than HPC's server class Intel Xeon building block) which, as introduced in their Sandy Bridge generation, also contain their own integrated GPUs. Although more tightly coupled than discrete GPUs linked via PCIe, this is at the expense of losing the high memory bandwidth available from those discrete devices. However, AMD's plans for future HPC architectures are via the building blocks of future generation APUs, hence there is merit in understanding their architectural characteristics, and the mapping of scientific applications to them. Beginning with the Kaveri generation, support for the Heterogeneous System Architecture (HSA) is adopted. This aims to make the heterogeneous nature of the APU opaque to the scientific programmer, allowing those traditional applications using high level programming languages such as Fortran and C to utilise both CPU and GPU cores, without the necessity to re-code in CUDA or OpenCL. The four generations of AMD's APUs are compared and contrasted in Table 2.1.

Table 2.1: AMD's APU Comparisons

    Generation          Llano      Trinity     Richland    Kaveri
    CPU                 A8-3850    A10-5800K   A10-6800K   A10-7850K
    Clock Speed (GHz)   2.9        3.8         4.1         3.7
    (Turbo)             (N/A)      (4.2)       (4.4)       (4.0)
    No. Cores           4          2           2           2
    GPU                 HD 6550    HD 7660D    HD 8670D    R7
    No. GPU "cores"     400        384         384         512
    Released            Aug 2011   Oct 2012    Jan 2013    Jan 2014

2.4 Emerging Technologies

2.4.1 ARM®

Although ARM is the most prevalent Instruction Set Architecture (ISA), running in billions of embedded devices worldwide, it is only the current 64-bit ARMv8 instruction set that will enable ARM's potential emergence into the server and HPC market.


At the time of writing, 64-bit ARM SoCs (systems-on-a-chip) are being touted by AMD, Applied Micro, Cavium Networks, Qualcomm and Texas Instruments, although, to be successful, full system and software eco-systems need to be developed in order to enable adoption by scientific applications.

2.4.2 OpenPOWER®

The OpenPOWER® Foundation, formed in 2013, is a group of technology members to which IBM have made available their POWER technology. The aim is to foster collaboration in the development of future systems based on the POWER CPU. The first of these technologies targeting HPC is the Firestone server, released at the end of 2015. This incorporates two POWER8 processors alongside two NVIDIA Kepler K80 GPUs. These 2U servers have the ability to be scaled out via Mellanox InfiniBand (IB). It is this technology which will deliver the Leadership Class HPC platforms based on GPU acceleration discussed in Section 2.3.1 [136, 156].

2.5 Common Architectural Traits

Although Section 2.3 details a diverse range of emerging technologies, in an HPC context these technologies all aim to achieve the same goal, which is to use their technology as the building blocks to deliver a high performing architecture, capable of enabling the largest scientific problems to be solved. Although each technology proposes a different solution to achieve this goal, each solution is essentially using different levels of components and parameters from a common design space. This section breaks down each of the architectures previously discussed into their constituent components. This helps to provide an

understanding of the balance placed on underlying hardware solutions. It introduces some common "language" to describe characteristics which are not consistent, and often conflicting, between architectures. This has previously been addressed by Gaster et al. [93], who describe the diverse range of hardware that OpenCL can operate on by describing the various methods to achieve performance on these architectures, and the trade-offs taken in the design space for each. Patterson and Hennessy [161] give a "GPU Rosetta Stone" which provides a quick guide to "translate" GPU terminology to that of CPUs, and vice versa. From a compiler point of view, the article from Leback et al. [130] presents the architectural aspects of an Intel Xeon CPU, NVIDIA Kepler™ GPU, Intel Xeon Phi, and an AMD Radeon™, firstly in their native terminology, and subsequently in that of a CPU or GPU. AMD [46] introduces the concept of a "compute core", irrespective of whether it is a CPU or GPU core, defining it as: "any core capable of running at least one process in its own context and virtual memory space, independently from other cores". As the focus of this thesis is from the point of view of an application developer wishing to understand how best to exploit a given architecture, the comparison in design space choices for hardware solutions is mapped to the varying levels of parallelism required from an application to exploit such features. One of the main issues when trying to compare and contrast different underlying hardware is the inconsistency in terminology. For instance, in GPU parlance a "core" is effectively a single precision SIMD processor, which in CPU terms is equivalent to a 32-bit vector lane in a modern CPU; while a core in CPU terms is a single processor, part of a microprocessor, made up of a control unit which issues instructions to integer and floating point vector (SIMD) processing units in order to execute those instructions. So, at a high level of abstraction, a CPU core can be thought of as equivalent to a GPU's SM. Architectures can be considered in terms of an integrated circuit, or chip, broken down into multiple processing cores (for brevity: cores). These cores consist of a control unit able to issue instructions, some vector, integer, and floating point processing units capable of carrying out the instructions, along with some memory hierarchy; effectively defining a core as an individual compute engine. When considered in this simplistic way, architectures can be compared and contrasted to show how their performance is achieved by using these basic building blocks in differing degrees. Although not drawn to scale, in the following figures these building blocks are indicative of the sizes relative to each architectural technology. Table 2.2 summarises the trade-offs in the architectural design space. Ultimately, they are trade-offs between performance and power, and what can physically

fit into the finite space on the underlying silicon. However, there is also the trade-off in productivity: can the end user of the hardware make good use of an architectural feature to make it worthwhile including.

Table 2.2: Relative Architecture Design Dimensions

    Architecture        Cores   Instruction Cycle   LLC (Shared)   VPU   SMT
    Intel x86 CPU       16      out-of-order        L3             8     2
    NVIDIA Fermi GPU    16      in-order            L2             32    N/A
    NVIDIA Kepler GPU   15      in-order            L2             192   N/A
    Intel Xeon Phi      60      in-order            N/A            16    4
    IBM POWER8          12      out-of-order        L3             4     8

Figure 2.3 shows a generic 16-core variant of an Intel Xeon Haswell CPU. Figure 2.3a shows that each Haswell chip has 16 out-of-order cores with a shared L3 cache, where each core, depicted in Figure 2.3b, has two hyper-threads targeting a complex control unit which issues instructions that can be carried out by a 256-bit wide (advanced vector extension (AVX) supported) vector processing unit (VPU), an integer ALU, and a floating point ALU, where the former is essentially eight 32-bit ALUs. For each processor, the data associated with these instructions has its own L1 and L2 cache. Compare this with Figures 2.4 and 2.5, depicting the NVIDIA Fermi 16 SM M2090 and the NVIDIA Kepler 15 SMX GK110 generations of GPU respectively. Although displaying a similar number of cores per chip (where "core" is the local definition above of a processing core, not to be confused with the GPU terminology of "core"), the cores themselves are very different from that of the Intel Xeon in Figure 2.3b. With a clock frequency of 1.1 GHz, around one third of that of the Intel Xeon, each core's computational units are primarily VPUs, on a much larger scale (192 32-bit ALUs in the case of the NVIDIA Kepler in Figure 2.5b) than in the Intel Xeon. Figure 2.6 shows this concept for the Intel Xeon Phi 5110P, a 1st generation Intel Xeon Phi, familiarly called the "Knights Corner" or KNC. Comparing Figure 2.6a with the x86 Intel Xeon in Figure 2.3a, it is immediately apparent that the number of cores per KNC chip is much higher. Although the 60 cores are simpler, in-order, and of lower frequency (1.0 GHz) than the Intel Xeon equivalent, they also differ in the balance of their computational units. Figure 2.6b shows that the VPUs on the KNC are twice the width of those on the Intel


Figure 2.3: Intel® Xeon® Haswell Conceptual Diagram: (a) 16-core Intel® Xeon® CPU; (b) Intel® Xeon® Haswell core


Figure 2.4: NVIDIA® Fermi™ GPU Conceptual Diagram: (a) NVIDIA® Fermi™ GPU; (b) NVIDIA® Fermi™ SM


Figure 2.5: NVIDIA® Kepler™ GPU Conceptual Diagram: (a) NVIDIA® Kepler™ GPU; (b) NVIDIA® Kepler™ SMX


Xeon, equating to 16 32-bit ALUs. Also, each core supports four hyper-threads, analogous to the two supported in the Intel Xeon. A 12-core IBM POWER8 is depicted in Figure 2.7. Here the 12 cores on a chip (Figure 2.7a) share a comparatively large L3 cache. Each core, shown in Figure 2.7b, can store the state of up to 8-way SMT (simultaneous multi-threading) (cf. the two threads in the Intel Xeon and the four in the Intel Xeon Phi in Figures 2.3b and 2.6b respectively), where the computational units consist of complex floating point and integer units. There is also support for 128-bit vector multimedia extension (VMX) instructions; that is, a POWER8 core can process four single precision floating point operations, depicted in Figure 2.7b as a VPU with four 32-bit ALUs. This diverse range of heterogeneous architectures represents a complex challenge for HPC. Understanding which hardware is best suited for an application mix involves a range of factors, from application performance and application re-coding, to effective scheduling and running costs (e.g. power and cooling). These diagrams notionally show the relative balance in the design space between the various "many-core" architectures previously described. With these conceptual comparisons in mind, Chapter 3 considers the alternative programming models available to the application developer and their potential suitability.


Figure 2.6: Intel® Xeon Phi™ KNC Conceptual Diagram: (a) 60-core Intel® Xeon Phi™ KNC; (b) Intel® Xeon Phi™ KNC core


Figure 2.7: IBM® POWER8® Block Diagram: (a) 12-core IBM® POWER8® CPU; (b) IBM® POWER8® core

CHAPTER 3 Programming Methodologies

Chapter 2 discussed the range of emerging technologies and their architectural traits. HPC application developers wishing to utilise these architectures are faced with a dilemma relating to how they should develop application software for these competing hardware devices whilst still maintaining scientific productivity. This is an important question, as outside the area of pure research the goal is to reduce time to solution for production level applications. This implies that there needs to be a balance between the time taken to port and maintain an application and any gains made in the solution time of the application. This problem is compounded in that there is not a single emerging hardware architecture, but multiple. Porting and maintaining a different version of a production level application for each targeted platform is not only undesirable, but impractical. Programming for the large number of lightweight cores and vector units offered by these devices means departing from the traditional distributed MPI approach to a tiered programming model which is designed to harness both coarse and fine grained parallelism. Accelerated programming platforms and application interfaces like NVIDIA CUDA [78] and the Khronos Group's OpenCL [125] require explicit parallel programming using library calls and specially written compute kernels, while directive-based solutions like the OpenACC Application Programming Interface [23] offer an approach, similar to that found in OpenMP [21], for describing how to manage data and execute sections of code on the device. These two approaches to exposing multi-level node parallelism within application programs have different effects on factors such as programmer productivity, the time required for modifying the code, programming language of choice, required application performance, and portability. Runtime environments such as SGPU2 [159] and XKAAPI [95] aim to implicitly manage the resources available in heterogeneous node systems. Additionally, there is the emergence of a myriad of "portability layer libraries" such as RAJA [109] from LLNL, Kokkos [87] from SNL, Bolt [183] from AMD, NVIDIA® Thrust® [105], the University of Illinois's Charm++ [121] and Intel® Threading Building Blocks (TBB), all of which are primarily aimed at the specific problem of porting monolithic C/C++ legacy applications in a non-disruptive manner whilst maintaining performance portability. This is achieved

by essentially providing a common interface to an application developer, while enabling the use of a range of programming abstractions within their lower level implementations or back-ends.

3.1 Low-Level Languages

Existing procedurally structured applications need to be restructured or broken down into computationally distinct, self-contained kernels written in the low-level language of choice. Some, like CUDA, are only viable for specific architectures, whilst others, such as OpenCL, are architecture agnostic but require architecture specific tuning to ensure performance portability.

3.1.1 NVIDIA® CUDA®

CUDA (Compute Unified Device Architecture) [78] is NVIDIA's parallel platform and programming model, and provides support for general purpose computing on NVIDIA GPUs [30]. It is based on the concept of a host and a device, where each CUDA-based computational kernel can be accelerated by being executed on the highly parallel device. This is achieved by treating the CUDA kernels as distinct entities from non-CUDA code, and compiling them using CUDA's C-based compiler, nvcc, for execution on the device, with non-CUDA code compiled for the host using its native compiler. At runtime, the procedural code begins execution on the host and, as each CUDA-based computational kernel is encountered, it is launched, moving its data and program code/instructions onto the device or devices attached to the host. Conversely, once computation of the kernel is completed, the associated data are transferred back to the host. In practice a host is usually a traditional x86 or ARM-based CPU, while a device is almost exclusively an NVIDIA-based GPU. This NVIDIA exclusivity results in the need to maintain duplicate code paths for applications: one for the GPU and another for other architectures. An additional restriction is that CUDA is implemented by extending the C language's function declaration syntax, and hence is heavily dependent on the use of the C language. For non-C based applications, such as those written in Fortran, to use CUDA, the computational kernels need to be C based, resulting in a Fortran host code calling C-based compute kernels; although there is a CUDA Fortran compiler [35], available exclusively from the Portland Group (PGI), who are in turn part of the NVIDIA Corporation.
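To make the host/device execution model concrete, the following self-contained sketch (a generic vector addition written for illustration here, not code taken from any of the applications or mini-apps studied in this thesis) shows the typical CUDA pattern: allocate device memory, copy data across the PCIe bus, launch a kernel over a grid of threads, and copy the result back. It assumes a system with the CUDA toolkit and an attached NVIDIA GPU.

    // vadd.cu - build with: nvcc vadd.cu -o vadd
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    // Each CUDA thread computes one element of the result vector.
    __global__ void vadd(const double *a, const double *b, double *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) c[i] = a[i] + b[i];
    }

    int main(void) {
      const int n = 1 << 20;
      size_t bytes = n * sizeof(double);
      double *a = (double *)malloc(bytes);
      double *b = (double *)malloc(bytes);
      double *c = (double *)malloc(bytes);
      for (int i = 0; i < n; i++) { a[i] = 1.0; b[i] = 2.0; }

      double *d_a, *d_b, *d_c;                          // device copies
      cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
      cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);

      int threads = 256;
      int blocks = (n + threads - 1) / threads;
      vadd<<<blocks, threads>>>(d_a, d_b, d_c, n);      // kernel launch on the device
      cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);

      printf("c[0] = %f\n", c[0]);
      cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
      free(a); free(b); free(c);
      return 0;
    }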


3.1.2 OpenCL™

OpenCL is an open standard enabling parallel programming of heterogeneous architectures. Managed by the Khronos Group and implemented by over ten vendors including AMD [25], Intel [29], IBM [33], and NVIDIA [34], OpenCL code can be run on many architectures without recompilation. Emerging developments such as pocl (a Performance-portable OpenCL Implementation) [117] aspire to an architecture-independent kernel-level compiler for OpenCL. With such support it is possible to develop a single source that is hardware agnostic (CPU, GPU, coprocessors, APU) and can utilise all parts of a heterogeneous system that contains a mixture of such hardware, although support may be dropped for future generations of some of these architectures. Also, the application developer is required to explicitly define the hardware and where to move the data, which requires programs to be tuned for each architecture in order to achieve optimal performance portability. This results in a significant amount of boilerplate coding to write even a relatively simple program, albeit this boilerplate code can be re-used between programs so it only needs to be written once. On inspection, OpenCL is simply an API to the C programming language; however, this restricts direct interaction from applications which are Fortran based.
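For comparison with the CUDA sketch above, the equivalent OpenCL version is outlined below: the device kernel mirrors the CUDA kernel, while the condensed host fragment illustrates the boilerplate referred to in the text, in which the platform, device, context, command queue, program build and buffers must all be managed explicitly. Error checking and resource release are omitted for brevity, and the function name run_vadd is simply an illustrative placeholder.

    #include <CL/cl.h>

    // Device-side OpenCL C kernel source, typically held in a string or separate file.
    static const char *src =
      "__kernel void vadd(__global const double *a,          \n"
      "                   __global const double *b,          \n"
      "                   __global double *c, const int n) { \n"
      "  int i = get_global_id(0);                           \n"
      "  if (i < n) c[i] = a[i] + b[i];                      \n"
      "}                                                     \n";

    // Condensed host-side boilerplate; a production code would check every return value.
    void run_vadd(const double *a, const double *b, double *c, int n) {
      cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
      cl_device_id   dev;   clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
      cl_context       ctx  = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
      cl_command_queue q    = clCreateCommandQueue(ctx, dev, 0, NULL);
      cl_program       prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
      clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
      cl_kernel        k    = clCreateKernel(prog, "vadd", NULL);

      size_t bytes = n * sizeof(double);
      cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, (void *)a, NULL);
      cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, bytes, (void *)b, NULL);
      cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);

      clSetKernelArg(k, 0, sizeof(cl_mem), &da);
      clSetKernelArg(k, 1, sizeof(cl_mem), &db);
      clSetKernelArg(k, 2, sizeof(cl_mem), &dc);
      clSetKernelArg(k, 3, sizeof(int), &n);

      size_t global = (size_t)n;
      clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);  // launch
      clEnqueueReadBuffer(q, dc, CL_TRUE, 0, bytes, c, 0, NULL, NULL);      // copy result back
    }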

3.1.3 Intel® Cilk™ Plus

Intel® Cilk™ Plus is a language extension for C and C++ codes to handle task-based parallelism and vectorisation, through parallel loops, array notation and vectorised loops; it is also coupled with the Cilk Plus runtime. Introduced in 2010 in version 12.0 of the Intel compiler, it consists of three main keywords (cilk_for, cilk_spawn and cilk_sync) which, respectively, enable loop iterations to execute in parallel, state that a function's caller can continue to execute without the need to wait for the function to return, and provide a synchronisation statement (cf. MPI's MPI_WAITALL) that synchronises on all locally spawned functions. Array notation enables sections of specified arrays to be vectorised; in a similar manner to the vectorisation of computational loops, this is enforced via a #pragma directive (cf. OpenMP 4.0's #pragma omp simd).
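The fragment below illustrates the three keywords and the array notation. It is a generic sketch that assumes a compiler with Cilk Plus support (such as the Intel compiler); the functions update_halos and compute_fluxes are hypothetical placeholders used only to show cilk_spawn and cilk_sync.

    #include <cilk/cilk.h>

    void update_halos(void);    // placeholder routines, declared only for illustration
    void compute_fluxes(void);

    // Parallel loop: iterations may execute concurrently on the available workers.
    void vadd(int n, const double *a, const double *b, double *c) {
      cilk_for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }

    // Task parallelism: spawn a call and later synchronise on all spawned work.
    void overlap_work(void) {
      cilk_spawn update_halos();   // may run in parallel with the caller
      compute_fluxes();
      cilk_sync;                   // wait for all locally spawned functions
    }

    // Array notation: whole-array-section form of the same vector update.
    void vadd_sections(int n, const double *a, const double *b, double *c) {
      c[0:n] = a[0:n] + b[0:n];
    }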

3.1.4 C++ AMP (Accelerated Massive Parallelism)

C++ AMP (Accelerated Massive Parallelism) [97] is a programming model from Microsoft® explicitly targeting GPUs for C++ applications running under the Microsoft® Windows® OS. Available as of the 2012 version of Microsoft® Visual


Studio™, it contains extensions to C++ to enable data movement between CPU and GPU hardware, and to express parallelism through the parallel_for_each function. Additionally, a parallel maths function library is also part of the model.
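A minimal sketch of the model is shown below (a generic vector addition, compilable only with a Microsoft Visual C++ compiler that supports C++ AMP): array_view objects wrap the host data and manage transfers, while the restrict(amp) lambda passed to parallel_for_each executes on the GPU.

    #include <amp.h>
    using namespace concurrency;

    // Vector add expressed with C++ AMP.
    void vadd(int n, const float *a, const float *b, float *c) {
      array_view<const float, 1> av(n, a);   // wraps host data for the accelerator
      array_view<const float, 1> bv(n, b);
      array_view<float, 1> cv(n, c);
      cv.discard_data();                     // no need to copy c's initial contents

      parallel_for_each(cv.extent, [=](index<1> i) restrict(amp) {
        cv[i] = av[i] + bv[i];               // executes on the GPU
      });

      cv.synchronize();                      // copy the result back to the host array
    }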

3.1.5 Pthreads

Pthreads (POSIX Threads) [153] is a portable threading standard available through a library for most C implementations. It provides a low level threading model, giving fine grained control over when tasks should fork or join via subroutine calls. Primarily aimed at managing system resources and carrying out individual tasks, it puts full responsibility onto the programmer to implement their own atomics and controls; hence it is not particularly applicable to an industrial strength scientific application.
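The sketch below shows the manual fork/join style this implies for even a simple data-parallel loop: the programmer must partition the iteration space, create the threads and join them explicitly. It is a generic illustration rather than a pattern taken from any code discussed in this thesis.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define N 1000000

    static double a[N], b[N], c[N];

    // Each thread is handed a contiguous slice of the arrays to update.
    typedef struct { int begin, end; } range_t;

    static void *vadd_slice(void *arg) {
      range_t *r = (range_t *)arg;
      for (int i = r->begin; i < r->end; ++i)
        c[i] = a[i] + b[i];
      return NULL;
    }

    int main(void) {
      pthread_t tid[NTHREADS];
      range_t ranges[NTHREADS];
      int chunk = N / NTHREADS;

      for (int i = 0; i < N; ++i) { a[i] = 1.0; b[i] = 2.0; }

      // Fork: create one worker thread per slice.
      for (int t = 0; t < NTHREADS; ++t) {
        ranges[t].begin = t * chunk;
        ranges[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, vadd_slice, &ranges[t]);
      }
      // Join: wait for all workers before using the result.
      for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], NULL);

      printf("c[0] = %f\n", c[0]);
      return 0;
    }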

3.2 High-Level Target-Based Directives

Directive-based solutions empower the application developer to explicitly parallelise their application through the introduction of directives, or pragmas, into their existing source code. Primarily, this takes the form of identifying suitably computationally intensive loops and parallelising each in turn by splitting their computation among the threads available on the targeted hardware. When heterogeneous hardware platforms are being targeted, directives are used to transfer the relevant code loops/sections, and their associated data, between the different processors from which the system is comprised. For over two decades the industry standard directive-based OpenMP [24] has dominated. However, with the emergence of heterogeneous hardware, a number of alternative solutions, from OpenHMPP (Hybrid Multicore Parallel Programming) [70] to OpenACC® and Intel® Language Extensions for Offload [152], are coming to the fore. Convergence between these models is, however, beginning to occur: OpenMP 4.0 introduces new directives to target heterogeneous platforms, and OpenACC, in the latest PGI implementation, provides the ability to execute on the cores of a CPU. Additionally, there are a number of niche solutions which extend the ideas developed in the previous models. StarSs [62] and OmpSs [86], for example, hide the synchronisation operations which are usually required to be carried out explicitly by the developer, by inferring when these operations should take place from the data dependencies inherent in the application code.


3.2.1 OpenMP

Since the mid-1990s, OpenMP [24] has become the industry accepted paradigm for a directive-based parallel programming model. By inserting these directives into existing Fortran or C/C++ applications, primarily targeting computationally intensive loops, the code developer can express the sharing of resources within the system and is thus able to express a shared memory parallel application. By its nature, its main target has been shared memory, CPU-based systems, taking advantage of the shared memory available within CPU nodes. With the 4.0 release [21], support has been added to specifically target attached accelerated devices. Although OpenMP is supported in nearly all commercial and open-source compilers, the features which specifically relate to accelerated devices have only been implemented by a smaller subset of compiler vendors' offerings at the time of writing. OpenMP allows an application to be parallelised in a gradual manner and does not require the whole application to be parallelised. However, care is needed to ensure any OpenMP regions are thread safe (i.e. data independent), that is, that modifications to global variables are appropriately synchronised and controlled. Limitations exist relating to the nesting of OpenMP directives, and there are issues in using OpenMP in some object oriented (OO) C++ applications. A lack of thread-safe interoperability with the C++ standard template library (STL) has seen the emergence of dedicated abstraction libraries (see Section 3.3) to support such applications via OpenMP.
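The two generic loops below illustrate the model: the first is the classic shared memory form that has dominated for two decades, and the second uses the OpenMP 4.0 target directives to offload the same loop to an attached device (assuming a compiler that implements the 4.0 accelerator features).

    // Classic shared-memory OpenMP: loop iterations are divided among host CPU threads.
    void vadd(int n, const double *a, const double *b, double *c) {
      #pragma omp parallel for
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }

    // OpenMP 4.0 device offload: the same loop mapped onto an attached accelerator,
    // with the data movement expressed through map clauses.
    void vadd_offload(int n, const double *a, const double *b, double *c) {
      #pragma omp target teams distribute parallel for \
              map(to: a[0:n], b[0:n]) map(from: c[0:n])
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }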

3.2.2 OpenACC®

The OpenACC Application Program Interface (API) [23] is a high level programming model based on the use of directives. By applying directives to original Fortran, C or C++ source code, it aims to provide increased architecture portability with minimal code modification. This increase in portability is offered without compromising code maintainability, a key consideration for existing complex industrial applications. At the time of writing, three compiler vendors, CAPS¹ [26], Cray [31], and PGI² [131], support the initial OpenACC release. Prior to the support of a common OpenACC standard, Cray, PGI and CAPS each had their own bespoke set of accelerator directives, from which their implementations of OpenACC are derived. A brief overview of each vendor's implementation, along with limitations, follows.

¹ As of June 27, 2014, CAPS Enterprise ceased trading, and the CAPS compiler is no longer available.
² As of July 2013, PGI was acquired by NVIDIA.


Cray originally proposed accelerator extensions to the OpenMP standard [21] to target GPGPUs, through their Cray Compiling Environment (CCE) compiler suite. These evolved into the "parallel" construct in the OpenACC standard. Rather than creating a CUDA source for the kernels, CCE translates them directly to NVIDIA's low-level Parallel Thread Execution (PTX) [155], a pseudo-assembly language subsequently compiled by the graphics driver into binary code. CCE is currently only available on Cray architectures, restricting portability. As of version 10.4 of their compiler, PGI supported the PGI Accelerator model [19] for NVIDIA GPUs. This provided their own bespoke directives for the acceleration of regions of source code; in particular, their "region" construct evolved into their implementation of the OpenACC "kernels" construct. Initially, CAPS (Compiler and Architecture for Embedded and Superscalar Processors) provided support for the OpenHMPP directive model [70], which served as the basis for their implementation of the OpenACC standard. A major difference with CAPS is the necessity to use a host compiler: here, code is directly translated into the application developer's choice of either CUDA or OpenCL [125]. In the case of the latter, this increases the range of architectures which can be targeted.
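The two generic sketches below show the resulting OpenACC constructs discussed above: the prescriptive parallel loop form that evolved from Cray's proposal, and the more descriptive kernels form that evolved from the PGI Accelerator region construct. The data clauses describe the traffic between host and device.

    // "parallel loop" form: the programmer asserts that the loop is parallel
    // and specifies the data movement explicitly.
    void vadd(int n, const double *restrict a, const double *restrict b,
              double *restrict c) {
      #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }

    // "kernels" form: the compiler analyses the enclosed region and generates
    // accelerator kernels where it can prove it is safe to do so.
    void vadd_kernels(int n, const double *restrict a, const double *restrict b,
                      double *restrict c) {
      #pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
      for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    }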

3.2.3 Intel® Language Extensions for Offload (LEO)

Intel's Language Extensions for Offload (LEO) [152] consist of directive-based pragmas for use in C/C++ and Fortran based applications. These constructs are Intel specific and were introduced into the Intel compilers in order to target the Intel Xeon Phi, as a way to run source code on a host Xeon CPU and "offload" marked sections, through the use of the offload pragma directive, onto the Xeon Phi co-processor.
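A minimal sketch of the offload model is shown below, assuming the Intel compiler and an attached first generation Xeon Phi (KNC): the offload pragma moves the marked block and its data to the co-processor, and conventional OpenMP is then used to spread the work across the co-processor's cores.

    // Offload the marked block to co-processor 0; the in/out clauses describe
    // the data copied across the PCIe bus for pointer variables.
    void vadd(int n, double *a, double *b, double *c) {
      #pragma offload target(mic:0) in(a, b : length(n)) out(c : length(n))
      {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
          c[i] = a[i] + b[i];
      }
    }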

3.3 Abstraction Libraries

Abstraction libraries, as their name suggests, are software layers which, through a one-time implementation of their API, aid in decoupling software applications from the hardware platforms on which they execute. Ultimately the goal is to "future-proof" an application, irrespective of future hardware platforms.

3.3.1 Intel® Threading Building Blocks (TBB)

Aimed solely at enabling task-based parallelism in C++ applications, Intel Threading Building Blocks (TBB) [177] is a template library extension to C++

that includes a global task scheduler and the ability to handle memory allocations and local thread storage. TBB consists of work-sharing expressions where everything is modelled as tasks, rather than using the fork/join paradigm prevalent in target-based directives or low-level programming language approaches.
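A minimal sketch of the TBB interface is shown below: the loop body becomes a C++ lambda applied to a blocked_range, and TBB's task scheduler decides how that range is split and mapped onto worker threads.

    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>

    // The iteration space is expressed as a blocked_range; TBB partitions it
    // into tasks and schedules those tasks across its worker threads.
    void vadd(int n, const double *a, const double *b, double *c) {
      tbb::parallel_for(tbb::blocked_range<int>(0, n),
        [=](const tbb::blocked_range<int> &r) {
          for (int i = r.begin(); i != r.end(); ++i)
            c[i] = a[i] + b[i];
        });
    }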

3.3.2 RAJA

Developed at LLNL, RAJA aims to enable fine grained multi-threading in their legacy, multi-physics, MPI-distributed C++ applications, in order to enable the rapid transition and adaptation to the emerging architectural trends described in Chapter 2. RAJA acts as an interface to the application code through the insertion of RAJA's forall loop template in place of C's traditional for loop. The RAJA form incorporates information regarding the execution policy and abstractions of the loop bounds to encapsulate loop execution details, thus decoupling specific hardware dependencies. With current back-end support for OpenMP and Cilk, RAJA is therefore able to interface with these back-end programming models transparently to the code developer. To enable this via C++, RAJA needs to utilise C++ lambda functions, which enable the creation of anonymous objects which resemble functions. Albeit part of the C++11 standard, lambda functions are not fully supported by all vendor compilers, hence currently limiting the portability of RAJA-enabled code.
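The sketch below shows the shape of the RAJA interface described above for a simple loop. It follows current RAJA releases, so the exact execution policy names may differ from those of the version of the library discussed here, and it assumes RAJA has been built with its OpenMP back-end enabled.

    #include "RAJA/RAJA.hpp"

    // The raw C loop "for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];" becomes a
    // RAJA::forall over a range segment; the execution policy template parameter
    // selects the back-end (here an OpenMP parallel-for policy) at compile time.
    void vadd(int n, const double *a, const double *b, double *c) {
      RAJA::forall<RAJA::omp_parallel_for_exec>(RAJA::RangeSegment(0, n),
        [=](int i) {
          c[i] = a[i] + b[i];
        });
    }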

3.3.3 Kokkos

Like RAJA, Kokkos is a library solution for C++ applications, from SNL. Its back-end implementation allows the mapping of existing C++ applications to a variety of alternative programming models. The choice of which programming model is enabled is based on the suitability of the model to the target architecture. Currently Kokkos supports OpenMP, CUDA and pthreads [153]. It also has the option to use the "hwloc" library [68] to assist with the optimal mapping of threads to hardware cores. Kokkos's interface to an application is through its library's API, which defines execution and memory spaces via dedicated C++ classes, imposed at compile time. Hence an array and its layout in memory can be determined depending upon the hardware for which it is built.
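The sketch below illustrates the Kokkos interface using the API of current Kokkos releases (which may differ in detail from the version described here): Views encapsulate the array layout and memory space, and parallel_for dispatches the lambda to whichever back-end the library was configured with.

    #include <Kokkos_Core.hpp>

    int main(int argc, char *argv[]) {
      Kokkos::initialize(argc, argv);
      {
        const int n = 1 << 20;
        // Views are arrays whose memory space and layout are chosen at compile
        // time to suit the enabled back-end (OpenMP, CUDA, pthreads, ...).
        Kokkos::View<double *> a("a", n), b("b", n), c("c", n);

        Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
          a(i) = 1.0; b(i) = 2.0;
        });
        Kokkos::parallel_for("vadd", n, KOKKOS_LAMBDA(const int i) {
          c(i) = a(i) + b(i);
        });
        Kokkos::fence();   // wait for the asynchronous dispatches to complete
      }
      Kokkos::finalize();
      return 0;
    }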

3.3.4 AMD® Bolt

Bolt [183] is a C++ standard template library (STL) designed to target GPUs. Developed by AMD, it contains a range of parallel primitive algorithms that

can be customised by the code developer using C++ function objects. These objects are ultimately implemented in an OpenCL back-end to exploit the underlying targeted GPU hardware, yet are abstracted away from the developer, thus enabling a single C++ source to be targeted at either CPU or GPU.

3.3.5 NVIDIA® Thrust

Thrust [105] is a template library from NVIDIA for CUDA-enabled C++ applications. Analogous to Bolt (see Section 3.3.4 above), it provides the application developer with a host of parallel primitives with which to abstractly describe their application.
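A minimal sketch, compiled with nvcc, is shown below: device_vector manages GPU memory and host/device transfers, and the transform primitive expresses the element-wise operation without any explicit kernel.

    #include <thrust/device_vector.h>
    #include <thrust/transform.h>
    #include <thrust/functional.h>
    #include <thrust/copy.h>

    // Vector add expressed through Thrust's parallel primitives.
    void vadd(int n, const double *a, const double *b, double *c) {
      thrust::device_vector<double> da(a, a + n);   // copy host data to the GPU
      thrust::device_vector<double> db(b, b + n);
      thrust::device_vector<double> dc(n);

      thrust::transform(da.begin(), da.end(), db.begin(), dc.begin(),
                        thrust::plus<double>());    // element-wise add on the device

      thrust::copy(dc.begin(), dc.end(), c);        // copy the result back to the host
    }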

3.3.6 Charm++

From the University of Illinois, Charm++ [121] extends C++ to provide an abstraction layer to application developers, hence hiding the underlying architecture.

3.3.7 OP2 / OPS

The OP2 stencil-based framework from the University of Oxford [96] provides an abstraction layer for the specific domain of unstructured mesh-based applications. Through the use of the OP2 API, an application can utilise the numerous back-ends which the OP2 developers have implemented; OP2 currently supports OpenCL, CUDA, OpenMP, MPI and OpenACC. OPS [176] is analogous to OP2 and is aimed at block-structured mesh applications.

Part II

Research Contributions

CHAPTER 4 An Industry Case Study: Benchmarking and Modelling

Part I detailed the evolutionary trends driving the development of HPC hardware technology (Chapter 2) and described the main programming methodologies emerging to exploit them (Chapter 3). By considering a current, production class application, this chapter examines the hurdles that are faced in determining the best way to migrate such an application to these emerging technologies. An industrial strength benchmark, Shamrock, is introduced. Developed at the Atomic Weapons Establishment (AWE), Shamrock is a two dimensional (2D) structured hydrocode. One of its aims is to assess the impact of a change in underlying compute hardware, and (in conjunction with a larger HPC Benchmark Suite) to provide guidance in the procurement of future systems. A representative test case and problem size is discussed, and subsequently executed on a local, high-end workstation for a range of compilers and MPI implementations. Based on these observations, specific configurations are built and executed on a selection of HPC processor generations. These include the Intel Xeon "Nehalem" and "Westmere" micro-architectures, IBM POWER5, POWER6, POWER7, Blue Gene/L, Blue Gene/P, and the AMD Opteron chip set. Comparisons are made between these architectures for the Shamrock benchmark, and the relative compute resources that deliver similar time to solution are specified, along with their associated power budgets. Additionally, performance comparisons are made for a port of the benchmark to an Intel Xeon X5550 "Nehalem" based cluster accelerated with C1060 GPUs. In addition to details of this port, this work also includes extrapolations to possible performance exploitation of the GPU. The remainder of this chapter is organised as follows: Section 4.1 provides background information on the Shamrock benchmark, the purpose of the benchmark, a description of the test case examined in this work, and a description of related work, and identifies the uniqueness of the benchmarking and predictions of this study. Section 4.2 introduces the performance of the benchmark on a local, high-end workstation, which is followed by a description of the HPC platforms the code has been ported to. Section 4.3 examines the performance characteristics on these HPC platforms, and also covers the port of the benchmark to a GPU cluster, and its relative performance. It concludes with a justification that the exploration of low power, emerging heterogeneous technologies via a traditional benchmarking application is time

It reaches the conclusion that to assess a range of architectures and associated programming methodologies a new approach is required.

In its original research paper form (Herdman, J. A., et al., “Benchmarking and Modelling of POWER7, Westmere, BG/P, and GPUs: An Industry Case Study”), this chapter was presented at the 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), held in conjunction with IEEE/ACM Supercomputing 2010 (SC’10), New Orleans, LA, USA, and subsequently published in ACM SIGMETRICS Performance Evaluation Review 38.4 (2011): 16-22 [103].

4.1 Shamrock

Prediction of the dynamic behaviour of materials as they flow under the influence of high pressure and stress is a key field of investigation at AWE. As a result, hydrodynamic simulations account for a large proportion of compute cycles on AWE’s HPC systems. Representative benchmarks have existed for many years: two dimensional (2D) hydrodynamic code fragments were part of the original Livermore Loops [143], and have been used in earlier performance studies [201]. More recently, primarily due to the large HPC resources required to execute them, the focus has been on three dimensional (3D) benchmark codes, such as SAGE from LANL [122] and Hydra from AWE [79]. However, although not in the capability regime, 2D hydrodynamics accounts for a significant amount of capacity computing, with resolution, and hence the number of CPU hours consumed, continually increasing.

To reflect this, Shamrock was developed at AWE as an industrial-strength, domain decomposed, multi-purpose benchmark. It is a 2D structured hydrocode, written predominantly in Fortran 90, using the Message Passing Interface (MPI) as its means of communication between sub-domains. The code has been designed for a number of purposes: (i) the assessment of the impact on code performance of system upgrades to an incumbent architecture; (ii) to be utilised as part of a larger HPC Benchmark Suite to assess application performance differences between alternative vendor offerings, primarily during machine procurement cycles; and (iii) to be used to assess current and emerging technologies. This chapter addresses each of these three categories of use and identifies problematic aspects in the use of a benchmark application for the latter category.

At the time of writing, the benchmarking and predictive modelling documented in this chapter differed significantly from earlier studies in that it was the first benchmarking presented in academic literature of the POWER7 platform using a 2D hydrodynamics benchmark.

It encompassed the most diverse range of current architectures, compilers, and MPI invocations for such a benchmark, and is distinct in its comparison of these. Several studies have investigated the use of GPUs as accelerators in the field of 2D hydrodynamics, and have reported speed-ups of factors of 70 over a single threaded CPU [98, 123]. However, this chapter not only looks at speed-ups of the GPU over a single threaded CPU, but presents comparisons of running in an accelerated distributed MPI mode.

A representative test case for this code is an interacting shock wave problem, consisting of square inner, middle, and outer regions of ideal gas at differing initial densities and energies, which cause the inner and outer regions to compress the middle region. This gives rise to shock fronts which collide and create a Rayleigh-Taylor instability [186]. Typical problem sizes are in the range of 300k to 5M cells. A representative problem of approximately 1.05M cells (1024 x 1024) was chosen as a problem size in the middle of this typical range, and by measuring the time to solution for 10 iteration time steps a workable turnaround time for benchmarking purposes was achieved. It is known that, as with many hydrodynamics applications, as the cell quantities are updated each timestep, data reuse is limited and therefore the code is predominantly memory bound, rather than CPU bound.

4.2 Architectures

Initially the code was developed and tested on a local, high-end, workstation for a range of compilers and MPI implementations. Based on these observations, specific configurations were subsequently built and executed on a selection of HPC architectures, including Intel Xeon “Nehalem” and “Westmere” micro-architectures; IBM POWER5, POWER6, POWER7, Blue Gene/P and Blue Gene/L; plus the AMD Opteron “Barcelona” chip set. The following sections describe these architectures, and observations from benchmarking.

4.2.1 Assessment of System Upgrades and Software Environmental Changes

An initial performance assessment was carried out on an Intel Xeon E5405 [8], dual socket, quad core desktop workstation. The hardware has 12MB L2 cache, a 2.00 GHz clock speed, and a Front Side Bus (FSB) speed of 1333 MHz. Numerous compiler and MPI combinations were chosen to represent possible system changes. These included four diverse compilers: Sun’s Sun Studio 12.1 [18], the Portland Group’s PGI 10.1.0 [22], Intel® Fortran 11.0.073 [20], and GNU’s g95 4.0.3 [3]; and two MPI variants: MPICH2 [2] and OpenMPI [17].


Table 4.1: Local Compiler and MPI Build Versions

Compiler           MPI
Sun Studio 12.1    MPICH2 1.1.1
Sun Studio 12.1    OpenMPI 1.3.3
g95 3.0.4          MPICH2 1.1.1
PGI 10.0.1         MPICH2 1.0.7
Intel 11.1.046     MPICH2 1.1.1
Intel 11.1.046     OpenMPI 1.3.3

Table 4.2: Local Compiler Build Flags

Compiler           Flags
Sun Studio         -g -fast² -xtypemap=real:64 -xipo=2 -fsimple=0 -fns=no
g95                -g -O3 -march=native -fdefault-real-8 -ffloat-store -funroll-loops
PGI                -gopt -fast³ -r8 -Kieee
Intel              -g -O3 -ip -xhost -r8 -fp-model strict -fp-model source -prec-div -prec-sqrt
XLF (Blue Gene)    -g -O4 -qunroll=yes -qipa=inline=auto -qipa=level=2 -qrealsize=8 -qfloat=nomaf -qstrict
XLF (Power)        -g -qfullpath -O3 -Q -qrealsize=8 -qfloat=nomaf -qstrict

Builds were not available for all compiler and MPI permutations; those that were available, and their respective versions, can be found in Table 4.1. The Shamrock build has some self-imposed restrictions on compilation options. These are in place to ensure that the results between architectures and compilers are as numerically comparable as possible. A full list of compiler options used in this study is specified in Table 4.2.

Comparing the compilers, with the same MPI implementation, Figure 4.1 shows that the GNU compiler gives the poorest performance, 83.9% slower than the fastest compiler, Sun Studio, for a single core run. Between these two, the Intel and PGI compilers are 26.5% and 15.1% slower than Sun Studio respectively. The discrepancies in compiler performance are due to how the restrictions, imposed through the build flags, affect how aggressive a particular compiler is in its level of optimisation.

² With the Sun Studio compiler, -fast is an alias for the following compiler options: -xtarget=native -O5 -libmil -fsimple=2 -dalign -xlibmopt -depend=yes -fns -ftrap=common -pad=local -xvector=yes -xprefetch=yes -xprefetch-level=2 -nofstore. Those in italics invalidate the IEEE Standard [189] and hence are manually disabled.
³ -fast with the PGI compiler is an alias for: -O2 -Munroll=c:1 -Mnoframe -Mlre


Figure 4.1: Shamrock: 1024²-cells on Intel® Xeon® E5405 (runtime in seconds against core count for each compiler and MPI combination: g95 MPICH2, Intel MPICH2, Intel OpenMPI, PGI MPICH2, Sun MPICH2 and Sun OpenMPI)

Figure 4.2: Intel® Xeon® E5405 Compiler Memory Footprint (memory utilisation in MB for the Sun, GNU, PGI and Intel builds)

This can be seen if the restrictions are lifted with the Sun and Intel compilers. Removing the -fp-model strict, -prec-div and -prec-sqrt options from the Intel compiler (which enables more aggressive floating-point optimisations), and adding the -prof-gen and subsequent -prof-use Profile-Guided Optimisation (PGO) options, gives the Intel compiler a 25% runtime improvement.

A performance metric that is often overlooked is the resultant memory footprint at runtime. Figure 4.2 shows the high memory watermark (HMW) of the application during its execution, for the four compilers tested. Sun and GNU show a 5.26% increase in memory over the PGI compiler; however, it would appear that Intel’s optimisations require a greater amount of memory, with a memory footprint 36.84% greater at 418MB, a not insignificant difference.

When considering the same compiler but a different MPI implementation, in both cases where this is possible (Sun and Intel), the OpenMPI build outperforms the equivalent MPICH2 build, by 4.5% and 8.5% respectively, when averaged over multi-core runs. To gain best performance, an MPI implementation should be optimally tuned for the system in question; this can be achieved by disabling error checking in the MPI builds, or by setting configuration options for a specific system. However, in this study both MPI implementations are default builds, so it would appear that OpenMPI is able to outperform MPICH2 in handling the shared memory MPI queues used by Shamrock when running locally on the workstation.

The study shows that significant differences are observed depending upon which compiler and MPI are used, indicating that a benchmark application, such as Shamrock, is a suitable tool for tracking and exploring systems software upgrades and environmental changes.

4.2.2 HPC Benchmarking Platforms

At the time of this study, a range of HPC platforms were available for the benchmarking of the Shamrock code:

• Intel Xeon (Details Table A.1)

– Nehalem (Willow A.1.1)
– Westmere (Blackthorn A.1.2)

• IBM POWER (Details Table A.2):

– POWER5 (Gollum A.2.1)
– POWER6 (Milano A.2.2)
– POWER6 (v60 A.2.3)


– POWER7 (p90 A.2.4)

• IBM BlueGene (Details Table A.3):

– Blue Gene/L (uBG/L A.3.1)
– Blue Gene/P (DawnDev A.3.2)

• AMD Opteron (Details Table A.1):

– Barcelona (Hera A.1.4)

Details of each platform, along with the compiler and MPI implementation of choice, are given in the relevant sections of Appendix A. Based on the performance gains observed with the Sun Studio and OpenMPI implementations on the E5405, where possible the same compiler and MPI implementation has been installed on the HPC platform. For future reference, each platform, its compiler and MPI implementation adopts the nomenclature: Platform:Compiler:MPI.

4.3 HPC Benchmark Performance

The problem set which was executed locally on the Intel Xeon E5405 workstation was run on the HPC architectures with compiler and MPI configurations as detailed in Section 4.2.2. As with the local builds, to ensure equality of results, restrictions were applied to the compilations. Table 4.2 again contains the compiler specific details which were employed in this part of the study.

4.3.1 Platform Comparisons

Figure 4.3 shows the runtimes, on a logarithmic scale, for all of the systems previously described. A number of observations can be deduced from this chart:

• The improvement seen, on the E5405, for the Sun Studio compiler over that of the Intel compiler, is also present on the Nehalem L5530. Single core runtimes for the L5530:Intel:BullX are 18.8% slower than the equivalent run on the L5530:Sun:OpenMPI.

• The use of IBM’s SMT is demonstrated on the Pwr6(4.2GHz):XL:IBM. Using all 16 of the possible SMT threads gives little gain over an 8 core, no SMT, run. However, enabling SMT but utilising only a sub-set of the total SMT threads, 12 threads on the 8 cores, gives a speedup factor of 1.42 over the 8 thread run, close to the 1.5 maximum.


Figure 4.3: Comparative runtimes for Shamrock 1024²-cells (application time, on a logarithmic scale, against core count for Pwr5, Pwr6 (4.2GHz), Pwr6 (4.7GHz), Pwr7, Blue Gene/L, Blue Gene/P, L5530 Intel BullX, L5530 Sun OpenMPI, X5660 Intel BullX, 8356 Intel, 8356 PGI, E5405 Intel and E5405 Sun)


Table 4.3: Relative Number of Cores and Power Consumption, for Equal Time to Solution

Equivalent # Cores    Architecture    Power Consumption (kW)
10                    POWER7          0.61
20                    Nehalem         0.72
20                    Westmere        0.59
30                    POWER6          4.79
64                    Barcelona       3.52
160                   Blue Gene/P     1.23

• Of most interest to the end-user is the fastest time to solution for the problem. In the case of all the architectures benchmarked in this study, this is achieved on the X5660:Intel:BullX Westmere, on 128 cores, for the 1024² cell problem size.

To enable additional platform comparisons a time to solution of 8.24 s was selected. This matches the execution time on 64 cores of the 8356:Intel:OpenMPI. The equivalent number of cores required from each of the architectures benchmarked, to match this 8.24 second turnaround, can be inferred from Figure 4.3. This is captured in Table 4.3, together with the respective power consumption for that number of cores on a given architecture.

The power consumption figures given are extrapolations based on full system runs of LINPACK [166] for the systems specified in the (contemporary at time of study) June 2010 Top500 [147]. In the case of the POWER7 this figure is based on the maximum possible power draw for a system, as specified in IBM’s p755 Redbook specification sheet [194]; this figure will almost certainly be higher than an equivalent LINPACK run.

The power consumption figures clearly show an improvement from the architectures of an older generation (e.g. POWER6 and Barcelona) to a Westmere or POWER7, with the Blue Gene solution sitting between these two ranges. The Shamrock benchmark thus allows comparison between currently available, production ready HPC machines. It enables assessments to be made on raw performance and time to solution, and, with the availability of power consumption figures, the option to provide an overall cost to solution.

4.3.2 Assessing Emerging Technologies: GPU Comparisons

AWE has a modest GPU test bed architecture, codenamed “Dexter”, consisting of four nodes: one master and three compute. Full specification details are given in Appendix A, under Section A.4.1; essentially it provides access to a small number of NVIDIA® GeForce GTX 285™ [14] GPUs.

To enable a port of the Shamrock benchmark to the GPU, compute intense sections of the code were identified and subsequently turned into kernels. In total eight such kernels were identified, accounting for approximately 95% of compute time.

4.3.3 The “Re-Structure”

The process of kernelising the computationally intense code sections required a level of re-structuring within each of the identified code sections. As described by Gaudin [190], this consisted of exposing the inherent concurrency in the algorithm and optimising memory access. In Shamrock, this was a case of finding concurrency at the loop level by re-factoring conditional heavy coding, which became a time consuming effort due to the requirement to identify variable scoping information, especially when variables were contained in shared data modules. Additionally, it was also found that objectives such as coding to save memory, heavy logic error checking and bookkeeping activities, although well intentioned, were restrictive and resulted in artificially sequential code. Removing a number of Fortran90 features, such as derived types, ragged arrays and linked lists, resulted in less elegant but simplified Fortran coding, enabling greater concurrency than the original implementation. Once these issues were resolved the kernels were then optimised for memory bandwidth, floating point performance and memory access patterns.

The next step used the F2C utility, based on Feldman’s Fortran to C converter [89], to convert the simplified Fortran kernels into C. Finally, through the use of AWE’s Acrylic wrapper code parser, data management and placeholder code for the ported C routines were generated. This enabled the fast utilisation of the OpenCL [125] framework, to create GPU executable kernels from the C routines.

Even with this structured approach, the process was time consuming, taking a number of months to achieve. In particular, the kernelisation of the existing compute intense code sections identified areas that would require major re-factoring of the entire code structure. However, a subset of the eight identified code regions were amenable to kernelisation in an incremental fashion. As part of this work, four of these potential kernels were developed:

(i) lagren calculates adiabatic heating on a cell, based on the volume change in the cell and its pressure, using a predictor/corrector method


(ii) lagrac calculates nodal accelerations due to pressure gradients and subsequently updates the nodal velocities

(iii) lagrqq calculates an artificial viscous pressure around shock waves, smoothing out discontinuities and reducing oscillations

(iv) lagrvf calculates the volume fluxes across cell faces, which are later used to carry out the advective remap

These four kernels account for 13.55%, 8.06%, 10.78%, and 1.5% of total runtime respectively. Although this gives 33.89% of the compute resident on the GPU, there are some caveats which restrict claiming a fully distributed GPU enabled version of the code. Currently, for each kernel call, all data associated with the kernel is transferred to and from the accelerator device. Although logic has been added enabling data to be shared between kernels, there still exists a significant overhead from the data copies. Additionally, a number of boundary conditions are assumed for GPU execution; this severely restricts the problem range that Shamrock is capable of running. However, due to careful problem selection, the test case described in Section 4.1, and subsequently analysed, is unaffected by this restriction. Hence some direct comparisons between non-accelerated and accelerated executions of the code can be made.

Figure 4.4 compares the average iteration time for each of the kernels when run in their original (Orig), simplified Fortran (F) and C versions on a single core of Dexter’s X5550 Xeon, and the OpenCL version on the hosted GTX285 GPU. By re-writing the original Fortran, performance gains are apparent for each of the kernels, with the C versions demonstrating similar performance gains. For the OpenCL version, despite no optimisations being implemented, further performance gains are observed, for the same three kernels, over the C equivalents. In the case of the OpenCL kernels, two average iteration times are given for each kernel: the former (OpenCL(1)) includes data transfer to and from the device, the latter (OpenCL(2)) covers compute time only of the kernel on the device. Comparing the compute-only OpenCL kernels against the original yields an average increase in performance of 18.97x, with speedups of 25.24x, 16.53x, 14.63x, and 19.47x respectively for the lagren, lagrac, lagrqq, and lagrvf routines.

The benchmark was run using MPI in a distributed fashion, in its entirety, in a non-accelerated and an accelerated mode on Dexter, using the Intel 11.0 compiler. In the case of the accelerated mode, this includes all the data transfer times between the GPU and the CPU host. The resultant figures are captured in Figure 4.5 as X5550 and X5550 Acc CT, where the former refers to the non-accelerated mode and the latter to the accelerated mode, with CT denoting the “current transfer scheme”.
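As a quick arithmetic check, added here for clarity (the figures are those quoted above), the 18.97x average speedup is simply the mean of the four per-kernel speedups:

\[
\frac{25.24 + 16.53 + 14.63 + 19.47}{4} = \frac{75.87}{4} \approx 18.97
\]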


Figure 4.4: Kernel Performance (average iteration time, in ms, of the lagren, lagrac, lagrqq and lagrvf kernels in their Orig, F, C, OpenCL(1) and OpenCL(2) forms)

Figure 4.5: Comparison, including GPUs, of runtimes for Shamrock 1024²-cells (application time against core count for X5550, X5550 Acc ST, X5550 Acc CT and X5550 95 Acc ST)


Table 4.4: Data Bandwidths for Shamrock Transfer Sizes

Cores    Transfer Size (MB)    Host to Device (MB/s)    Device to Host (MB/s)
1        460.80                4370.7                   3362.7
2        230.40                4373.8                   3086.0
4        115.20                4545.8                   3064.3
8        57.60                 4518.0                   3030.4
15       30.72                 4459.9                   3016.7
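As an illustrative check on the figures in Table 4.4 (an estimate added here under the stated bandwidths, not a calculation reported in the original study), a one-off copy of the full single-core data set to the device and back would cost approximately:

\[
\frac{460.80\ \mathrm{MB}}{4370.7\ \mathrm{MB/s}} + \frac{460.80\ \mathrm{MB}}{3362.7\ \mathrm{MB/s}} \approx 0.105\ \mathrm{s} + 0.137\ \mathrm{s} \approx 0.24\ \mathrm{s}
\]

That is, of the order of a quarter of a second for the single-core case, which is the scale of worst-case overhead that a single-transfer scheme would introduce.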

Also shown are the execution times if only a single data transfer, denoted ST, were to take place to and from the host; the assumptions taken to obtain these ST results are explained in the following paragraphs. These show no gain in performance from using the OpenCL kernels, and indeed performance is worse for the GPU runs when more than 8 GPUs are utilised. This is not unexpected; as previously stated, the data copies to and from the device each time a kernel is called begin to dominate performance. However, were the remaining four kernels of the original eight identified for kernelisation also developed and ported (providing they are reused in sequence and with no host processing in between), the need for the current data transfers would be negated; they would be replaced with a one-off initial copy to (and final copy from) the device.

A worst case overhead for such a copy can be calculated. The total amount of data necessary for the entire test case stands at 0.45GB/N per processing core, where N is the number of processing cores used. This is well within the memory constraints of today’s devices: NVIDIA’s Tesla C2070 has 5.25GB of user available memory.¹ The memory bandwidth test, oclBandwidthTest, which is shipped with NVIDIA’s CUDA Toolkit [78], can be executed to measure the data transfer speeds for a range of transfer sizes. Table 4.4 shows these transfer speeds, using direct access and paged memory,² averaged over ten runs. Using these bandwidth figures, an estimate of the data transfer overheads can be made for the code if the data transfer per kernel is replaced with an initial copy to, and final copy from, the device. X5550 Acc ST, in Figure 4.5, shows the estimated distributed runtime for the benchmark with this change of data transfer from the current transfer (CT) scheme to a single transfer (ST) scheme.

The four kernels already ported to the GPU account for 33.89% of the total code compute time, and their average speedup over the original coding is a factor of 18.97. If it is assumed that the remaining 61.11% of compute, contained in the remaining four kernels, achieves a similar average performance gain, Amdahl’s law can be used to calculate an estimated speedup for a distributed, 95% GPU resident, version of the benchmark (Equation 4.1).

¹ The C2070 has 6GB in total; however, 0.75GB is reserved for error checking and correcting (ECC).
² It is possible to increase bandwidth using mapped access and pinned memory.

Figure 4.6: Comparison, including GPUs, of runtimes for Shamrock 1024²-cells (application time against core count for Pwr6 (4.7GHz), Pwr7, Blue Gene/P, 8356 Intel, L5530 Intel BullX, X5660 Intel BullX, X5550 Acc CT and X5550 95 Acc ST)

Taking the 5% of the application which is not accelerated, each of the four ported kernels with their respective percentages of compute and performance gains, and the estimated gain for the remaining 61.11%, an overall runtime fraction of 0.1006 of the original, i.e. approximately an order of magnitude speedup, is calculated.

\[
\frac{0.05}{1} + \frac{0.1355}{25.24} + \frac{0.0806}{16.53} + \frac{0.1078}{14.63} + \frac{0.015}{19.47} + \frac{0.6111}{18.97} = 0.1006 \tag{4.1}
\]

Shown in Figure 4.5 as X5550 95 Acc ST, this gives an estimated order of magnitude gain over a non-accelerated X5550 run. Figure 4.6 adds X5550 Acc CT and X5550 95 Acc ST to a subset of those benchmark runs depicted in Figure 4.3. Based on the prior assumptions, this shows that an equal time to solution of 8.24 s could theoretically be achieved using two X5550 cores, each accelerated with an NVIDIA Tesla C1060 GPU card. The resultant power consumption, as detailed in Table 4.5, shows savings compared to the alternative platforms.


Table 4.5: Relative Number of Cores and Power Consumption, for Equal Time to Solution; Including GPU

Equivalent # Cores    Architecture                 Power Consumption (kW)
10                    POWER7                       0.61
20                    Nehalem                      0.72
20                    Westmere                     0.59
30                    POWER6                       4.79
64                    Barcelona                    3.52
160                   Blue Gene/P                  1.23
2                     Tesla Accelerated Nehalem    0.44 = 2x(0.036+0.1878)

4.4 Summary

This chapter introduced the industrial strength, multi-purpose 2D benchmark code Shamrock. Using a suitably defined test problem, it has been demonstrated as a suitable vehicle for assessing code performance variations due to system software changes, and also as a tool for benchmarking competing system offerings as part of an HPC procurement cycle.

To consider its suitability for the assessment of emerging hardware technologies, an OpenCL port to an NVIDIA Tesla C1060 accelerated Intel Xeon Nehalem based cluster was described for four of eight possible kernels within the benchmark, accounting for 33.89% of the total code compute time. Running with the accelerated kernels in a distributed mode, and comparing with the non-accelerated distributed code, experimental data showed increasing overheads due to the data transfer to and from the device. By applying an overhead based on the theoretical maximum data required to be transferred to and from an attached GPU device, a prediction was calculated for removing the current data transfer scheme and replacing it with a larger, single data transfer scheme. Additionally, by applying an average speedup factor of 18.97x to the remaining four kernels (based on the average speedup measured for the ported kernels), a predicted execution time of a distributed, 95% resident version of the benchmark was derived. This suggests that an equal time to solution of 8.24 s is achievable with two C1060 accelerated Nehalem cores.

However, the removal of the data transfers and the assumed speedup factors were artificial and not experimentally validated; hence it can be argued that this is an unreasonable comparison to the original application. For such verification in the Shamrock benchmark a significant implementation and code restructuring effort would be required; such an effort was determined to be too time consuming to be profitable.

As Shamrock is representative of a typical procurement benchmarking tool, containing error handling, bookkeeping, I/O routines, loop carried dependencies and non-optimised memory access patterns, this indicates that a benchmark application is too unwieldy for the evaluation of an emerging architecture. Moreover, so far the study has covered only one emerging technology, namely a GPU, and one programming methodology, OpenCL. If an industrial size benchmark is unsuitable for this selective case, then it is certainly unsuitable for the general case of interest: a rapid assessment of a range of programming methodologies on a range of emerging hardware. For such assessments a new approach is needed.

CHAPTER 5 Mini-Applications: The OpenACC Development of CloverLeaf

The previous chapter demonstrated the inefficiencies in using an industrial benchmark code as a mechanism to explore the changing HPC landscape. Due to the long turnaround time required to assess just one programming model on one possible architecture, a different approach is needed. This chapter introduces the concept of a mini-application, or mini-app, which is lighter-weight whilst still representative, and is written with the aim of keeping algorithms explicit in their nature and purpose. By ensuring the mini-app is a small, self-contained program that embodies the essential performance characteristics of key applications, it provides a viable way to trial new programming methodologies and new architectures. Specifically, this chapter makes the following key contributions:

• It provides detailed documentation of the CloverLeaf mini-application, which can be found as part of the Mantevo project [104], including its hydrodynamics scheme as well as the features which make it amenable to the utilisation of many-core technologies.

• In particular, it describes the step-by-step development process to achieve a fully distributed and accelerated hybrid MPI/OpenACC implementation of the mini-app, using a Cray® XK6™ as the development platform.

• Finally, it introduces the various other implementations of CloverLeaf that are currently available and how they can be used in conjunction with the OpenACC variant to begin to explore the suitability of emerging many-core architectures.

This work was previously presented at the Cray Technical Workshop on XK6 Programming, hosted at ORNL in October 2012 [77]. Additionally, it was the focus for an OpenACC Standards Organisation White Paper and Case Study [100]. It has also been replicated in a forthcoming contribution-based book that will focus on teaching practical techniques for OpenACC [88].


5.1 The Development of CloverLeaf

CloverLeaf, developed by the UK Mini-App Consortium (UK-MAC) [39], is part of the R&D 100 [172] award winning Mantevo test suite [104]. CloverLeaf has been written with the purpose of assessing emerging hardware and programming models. The simple hydrodynamics scheme is representative of the Shamrock benchmark, described in Chapter 4, but is written in such a way as to avoid unnecessary dependencies in key computational sections. This was achieved by encapsulating all scientific computation in small kernel functions, where the term “kernel” is used to refer to a self-contained function which carries out one specific aspect of the overall algorithm, thus making long, complex loops containing many subroutine calls unnecessary.

It incorporates the lessons learnt from the kernelisation of the four routines in the Shamrock benchmark application, as described in Chapter 4, Section 4.3.3. In particular, loop carried dependencies and deep loop logic were avoided, and the use of allocations and print statements, which cause synchronisation points within the kernels, was minimised. Additionally, the kernels were written to optimise memory access patterns and hence maximise the use of memory bandwidth. The resultant computationally intensive sections of CloverLeaf are implemented via twelve individual kernels.

5.1.1 Hydrodynamics Scheme

CloverLeaf is an explicit Eulerian hydrodynamic mini-app that solves the compressible Euler equations, a series of equations describing the conservation of energy, mass and momentum in a system (stated, for reference, after the two-step cycle below). The equations are solved on a Cartesian grid in two dimensions. Each grid cell stores three quantities: energy, density and pressure, and each cell corner, or node, stores a velocity vector. CloverLeaf solves the equations with second-order accuracy, using an explicit finite-volume method. As depicted in Figure 5.1, each cycle of the application consists of two steps:

(i) a Lagrangian step advances the solution in time using a predictor-corrector scheme, distorting the cells as they move with the fluid flow

(ii) an advection step is used to restore the cells to their original positions
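For reference, the compressible Euler equations can be written in conservative form as follows (a standard statement of the system, included here for completeness rather than reproduced from the original text):

\[
\frac{\partial \rho}{\partial t} + \nabla\!\cdot(\rho\,\mathbf{u}) = 0, \qquad
\frac{\partial (\rho\,\mathbf{u})}{\partial t} + \nabla\!\cdot(\rho\,\mathbf{u}\otimes\mathbf{u}) + \nabla p = 0, \qquad
\frac{\partial E}{\partial t} + \nabla\!\cdot\big[(E + p)\,\mathbf{u}\big] = 0,
\]

where \(\rho\) is the density, \(\mathbf{u}\) the velocity, and \(E\) the total energy per unit volume; the system is closed with an ideal-gas equation of state, \(p = (\gamma - 1)\rho e\), with \(e\) the specific internal energy.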

The initial implementation of CloverLeaf, developed by Gaudin, was in Fortran90 and was used to develop an optimised, highly vectorisable, hybrid MPI/OpenMP code. This chapter focuses on the step-by-step development, a joint effort by Gaudin and the author, of an OpenACC implementation using the parallel construct, with Cray’s CCE compiler, on an NVIDIA GPU accelerated Cray XK6.


Figure 5.1: Lagrangian-Eulerian hydrodynamics cycle. (a) Node movement during the Lagrangian step; (b) material movement during the advective remap.

5.1.2 Test Case

A simple yet representative asymmetric test problem is used throughout the study. It consists of two regions of idealised gas: one of high density and energy, adjacent to a region of lower density and energy. As the simulation proceeds a shock wave forms and penetrates the low density region. Initially a simulation time of 0.5 µs, on a problem size of 0.25 million cells (500²), was used. This gave a relatively quick turnaround time, yet one still long enough for compute to be the main workload. As the code was refined and improved, larger cell counts for the same simulation were used, to maximise the size able to fit onto a CPU core and subsequently a node. These are detailed at the relevant points throughout the chapter.

5.2 Development Platform: Cray® XK6™

As summarised in Appendix A.4.3, Chilean Pine is a Cray XK6 with 40 AMD 16-core Opteron 6272 “Interlagos” processors. Each compute node has one of these Opteron 6272 CPUs plus a companion NVIDIA X2090 GPU. Each node has 32 GB of 1600 MHz DDR3 memory, supplying the CPUs. The XK6 utilises the “Cray Gemini Network” (Gemini) as its interconnect. The Opteron 6272 CPU shares resources at the “Bulldozer module” level.

That is, the two cores that make up a “Bulldozer module” both have access to the shared floating point unit (FPU). This FPU has two 128-bit pipelines which can be combined into one 256-bit pipeline that can then execute a single 256-bit AVX instruction. This still only provides four double precision FLOPs per clock cycle; however, the AMD does have a 256-bit fused multiply-add (FMA) instruction, which can theoretically double the floating point performance to eight FLOPs per clock cycle. The Opterons in Chilean Pine have a 2.1 GHz clock frequency, which equates to a total CPU peak performance of 10.75 TFLOP/s. The default Fortran and C compilers are the Cray Compiling Environment (CCE) version 8.0.7 (although for this study a then contemporary beta release of CCE, 8.1.0.157, was made available by Cray), with the default MPI being MPICH2 via Cray’s xt-mpich2 version 5.5.1. Thermal design power (TDP) for the Opteron and X2090 are 115W and 225W respectively.
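As a check on the quoted peak (a reconstruction from the figures above, assuming all 40 nodes and the 8 FLOP/cycle FMA rate):

\[
40\ \text{nodes} \times 16\ \text{cores} \times 2.1\ \text{GHz} \times 8\ \tfrac{\text{FLOP}}{\text{cycle}} = 10.752\ \text{TFLOP/s} \approx 10.75\ \text{TFLOP/s}
\]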

5.3 Development of OpenACC CloverLeaf

This section describes the step-by-step approach, together with the incremental performance gains achieved and the issues which inhibit performance, of applying the OpenACC directive model to the CloverLeaf mini-app, resulting in a fully resident, multi-GPU version of the application.

CloverLeaf contains both C and Fortran implementations of the computationally intense code sections. The Fortran versions were targeted first and foremost as the basis of this study; the C implementations were subsequently produced once the fully optimal Fortran code was developed. This was for two reasons: firstly, the majority of applications developed within AWE are Fortran based, hence insights regarding the issues and processes required to take Fortran source code and accelerate it on a GPU architecture were of highest interest from an industry view point. Secondly, at the time of development, Cray had focused on the Fortran implementation of OpenACC in their CCE programming environment, which was therefore more mature, with greater support, than their C implementation.

The implementation of the OpenACC directives was greatly helped by the fact that the CloverLeaf code already had an OpenMP based shared memory parallelisation scheme implemented. This immediately identified those areas requiring the application of OpenACC directives to achieve acceleration. However, as will be demonstrated, this did not imply that simply adding or replacing OpenMP directives with OpenACC gives a suitably accelerated code. Ultimately, the OpenACC constructs applied to produce an efficient accelerated version are summarised as follows:


Subroutine     % Runtime
advec_mom      41.79
advec_cell     20.54
PdV            12.72
calc_dt        9.06
accelerate     5.32
viscosity      5.24

Table 5.1: CloverLeaf CPU Profile

• 12 unique kernels

• 25 ACC DATA constructs

• 121 ACC PARALLEL + LOOP regions

• 4 REDUCTION LOOPS

• 12 ASYNC

• 4 UPDATE HOST

• 4 UPDATE DEVICE

The following sections detail the progressive approach applied to the code to produce the efficient OpenACC accelerated version. The first step was to identify those subroutines, or kernels, that are computationally intensive, or hot spots.

5.3.1 Hot Spots

Using a simple profiler, Table 5.1 shows a flat, single CPU core profile for CloverLeaf. These six computationally intense subroutines, or kernels, account for almost 95% of the code’s execution time. On this basis, these kernels were targeted for initial acceleration.

5.3.2 Acceleration of Individual Kernels

Those six kernels identified were taken individually and OpenACC directives were applied. This was implemented by taking the existing OpenMP version of those kernels as a starting point, the advantage being that the development of the OpenMP version had necessitated the scoping of the kernels’ variables, something also required for an OpenACC implementation.


In the first instance, the !$acc data and !$acc parallel loop, in conjunction with their matching end directives, are all that is required to accelerate a particular computation loop within each kernel.
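A minimal, self-contained sketch of this first step is given below. It is illustrative only (the array names, loop body and problem size are placeholders, not the CloverLeaf source): an !$acc data region with explicit copyin/copyout clauses wraps an !$acc parallel loop, so only the arrays the loop actually needs are moved.

PROGRAM single_kernel_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: nx = 500, ny = 500
  REAL(KIND=8) :: density(nx,ny), energy(nx,ny), pressure(nx,ny)
  INTEGER :: j, k

  density = 1.0_8
  energy  = 2.5_8

  ! Move the inputs to the device once and bring only the result back,
  ! rather than relying on implicit whole-state copies per kernel call.
!$acc data copyin(density, energy) copyout(pressure)
!$acc parallel loop
  DO k = 1, ny
    DO j = 1, nx
      ! Ideal-gas style update; purely illustrative of a CloverLeaf kernel.
      pressure(j,k) = 0.4_8 * density(j,k) * energy(j,k)
    END DO
  END DO
!$acc end parallel loop
!$acc end data

  PRINT *, 'pressure(1,1) = ', pressure(1,1)
END PROGRAM single_kernel_sketch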

Figure 5.2: Pseudo Code: Individual Kernels Accelerated

Figure 5.2 schematically shows this applied to representative pseudo-code. However, this approach would implicitly copy all of the data to the device for execution, and subsequently copy all the data back to the host after completion of the computation of the kernel. This can be rectified by adding clauses to the OpenACC directives which describe the data dependencies between the CPU and GPU. With these clauses added, the code can be executed with one kernel running in its accelerated mode at a time. This approach was then repeated for each of the six kernels in turn. Figure 5.3 shows a stacked bar chart for the individual kernels, executing the 500² cell test case. Each bar in the plot depicts the totality of execution time for that kernel, while each of the stacked segments represents one of the distinct categories the kernels spend their time in during execution, namely:

• “No ACC’ed” (code exclusively running on the CPU),

• “Device Compute” (pure execution time on the GPU),

• “Data H2D” (time taken to transfer data from the host to the device),

• “Data D2H ” (data transfer time from device to host),

• “Sync” (synchronisation time on the device, that is, time spent on the device not including computation, e.g. allocations and waiting).


Figure 5.3: Breakdown of Individual Kernel Times (total runtime in seconds for the advec_mom, advec_cell, PdV, accel, calc_dt and viscosity kernels alongside the No OpenACC build, split into Non ACC’ed, Device Compute, Data H2D, Data D2H and Sync)

Irrespective of the kernel in question, it can be seen that the kernel time is dominated by the data transfer to the GPU from the CPU host. Also, synchronisation time is relatively high for some kernels. In the case of the latter, this will be addressed later in the optimisation steps. As for the former this is not unexpected; each time the kernel is called, all the state data is copied over for the kernel to execute.

5.3.3 Acceleration of Multiple Kernels

Once each kernel was checked for numerical correctness on the GPU, all the kernels were executed in their accelerated mode. Figure 5.4 shows the comparison between the original, non-accelerated, version of the code executing on a single Opteron core and that with all six computationally “hot” kernels accelerated, broken down into the respective categories of Non ACC’ed, Device Compute, Data H2D, Data D2H and Sync. Although now less than 5% of the code is executed on the CPU, the overall execution time is significantly greater than that of the non-accelerated original, and the data transfer effects are now even more apparent. Every timestep, each kernel copies data (multiple times in the case of some kernels) to the device ready for computation. This data transfer can be reduced to a minimum by making the entire application resident on the GPU.


Figure 5.4: Breakdown of Multiple Kernel Times (total runtime in seconds for the No OpenACC and All OpenACC builds, split into Non ACC’ed, Device Compute, Data H2D, Data D2H and Sync)

5.3.4 Achieving Full Residency on the GPU

The pseudo-code in Figure 5.5 shows that by applying OpenACC data copy clauses at the highest level of CloverLeaf’s call tree, namely the very start of the main program, a one-off data transfer can be carried out. Subsequently, by use of the OpenACC present clause on the data construct of each kernel, it can be indicated that the data is already on the device and a copy is not required.

In addition to restricting the data transfer to an initial copy, for full code residency to take place, additional kernels beyond the six identified need to be placed on the GPU. These are not computationally intensive, but without placement on the GPU, every time these sections of code are invoked, implicit data transfers will occur to and from the host. With the exception of the initial set-up routine, a total of twelve unique kernels (including the previously identified six) are required to be accelerated. Figure 5.6 shows the impact of accelerating these additional kernels and applying the one-off data transfer.

With the extra kernels now executing on the GPU there is virtually no computation remaining to be carried out by the CPU. This is reflected in the visible increase in device compute that is now observed. The most marked difference is that the data transfer overhead is dramatically reduced by the implementation of the initial transfer. As this data transfer is a “one off” event, it would be reasonable to hypothesise that if the problem size were large enough, or the computation long enough, this data overhead would become a relatively smaller percentage of the overall execution time.


Figure 5.5: Pseudo Code: Achieving Residency with OpenACC
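A hedged sketch of the residency pattern just described follows, approximating the kind of structure Figure 5.5 depicts (the module, field and kernel names are illustrative, not the CloverLeaf source): a single !$acc data region at the top of the call tree performs the one-off transfer, and each kernel declares the arrays present so that no per-call copies occur.

MODULE fields
  IMPLICIT NONE
  REAL(KIND=8), ALLOCATABLE :: density(:,:), energy(:,:), pressure(:,:)
END MODULE fields

SUBROUTINE ideal_gas_sketch(nx, ny)
  USE fields
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nx, ny
  INTEGER :: j, k
  ! present() tells the compiler the arrays are already on the device.
!$acc data present(density, energy, pressure)
!$acc parallel loop
  DO k = 1, ny
    DO j = 1, nx
      pressure(j,k) = 0.4_8 * density(j,k) * energy(j,k)
    END DO
  END DO
!$acc end parallel loop
!$acc end data
END SUBROUTINE ideal_gas_sketch

PROGRAM resident_sketch
  USE fields
  IMPLICIT NONE
  INTEGER, PARAMETER :: nx = 960, ny = 960
  INTEGER :: step
  ALLOCATE(density(nx,ny), energy(nx,ny), pressure(nx,ny))
  density = 1.0_8; energy = 2.5_8; pressure = 0.0_8
  ! One-off transfer: the fields stay resident for the whole simulation.
!$acc data copy(density, energy, pressure)
  DO step = 1, 100
    CALL ideal_gas_sketch(nx, ny)
    ! ...the remaining kernels would be called here, all using present()...
  END DO
!$acc end data
  PRINT *, 'pressure(1,1) = ', pressure(1,1)
END PROGRAM resident_sketch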

5.3.5 Increasing the Problem Size

The modest 500² cell problem size realises a speedup factor of 3.01 when executed on the GPU over the equivalent CPU run. Figure 5.7 shows this along with problem sizes of 960², 2048² and 4096² cells, for which the relative performance gains over the CPU are 4.91x, 5.82x and 5.76x respectively. As expected, as the problem size is increased the percentage of runtime spent on the data transfer is reduced. Indeed, as can be seen in Table 5.2, for the 4096² cell problem the data transfer accounts for only 5.07%, compared to 18.24% for the 500² cell problem. For a sufficiently large problem, the GPU performance approaches 6x the performance of the non-accelerated, serial code on the Opteron CPU.

5.3.6 Comparison of Hybrid MPI/OpenMP

At face value, a factor of six improvement in the accelerated code over the original non-accelerated code sounds like a significant gain. However, this is comparing the performance of the entire X2090 GPU against that of a single Opteron core. A more realistic comparison is against the performance achievable by using all the cores available on the Opteron socket. At the time of the study, using the hybrid MPI/OpenMP version of CloverLeaf, the optimal performance was achieved by using eight MPI tasks and one OpenMP thread, with one core used per Bulldozer module.

Table 5.2: Execution Breakdown Comparison for Changes in Problem Size (Seconds) and Relative Percentage (%)

Problem Size    No OpenACC        Fully Resident
(Cells)         (Serial)          Non Accelerated   Device Compute   Data H2D         Data D2H    Sync               Total
500²            38.61 (100%)      0.52 (4.05%)      1.10 (8.57%)     2.35 (18.24%)    0.00 (0%)   8.87 (69.14%)      12.83 (100%)
960²            284.86 (100%)     0.81 (1.40%)      1.91 (3.30%)     7.61 (13.10%)    0.00 (0%)   47.71 (82.20%)     58.04 (100%)
2,048²          2800.50 (100%)    6.03 (1.25%)      3.80 (0.79%)     35.19 (7.31%)    0.00 (0%)   436.16 (90.65%)    481.18 (100%)
4,096²          23757.43 (100%)   15.20 (0.37%)     6.32 (0.15%)     209.28 (5.07%)   0.00 (0%)   3894.83 (94.40%)   4125.63 (100%)


Figure 5.6: Breakdown of Resident Kernel Times (total runtime in seconds for the No OpenACC, All OpenACC and Fully Resident builds, split into Non ACC’ed, Device Compute, Data H2D, Data D2H and Sync)

This gave a wall clock time of 43.52 s, in comparison to the 58.03 s for the GPU; a speedup factor of 0.88x. The factor of six over a serial implementation may have looked attractive, but with a distributed parallel implementation on the CPU outperforming the accelerated GPU variant, a GPU based architecture is no longer such an attractive proposition. Before returning to and addressing this performance, the OpenACC version of the code was extended to enable execution on multiple GPUs.

5.3.7 Hybrid MPI/OpenACC

To enable the extension of CloverLeaf to execute on multiple GPUs, the OpenACC build of CloverLeaf was extended to use MPI. This required each GPU to run fully resident on its section of the computational domain, with a traditional halo data exchange scheme implemented between each distributed sub-domain. In practice this equates to first updating the host CPU with the latest halo data from its associated GPU, then using MPI to communicate that data between neighbouring CPUs, and finally each associated GPU obtaining the updated halo data from its CPU host. Figure 5.8 shows how this is implemented using the OpenACC update directive.


Figure 5.7: Increase in Problem Size (total runtime for the No OpenACC and Fully Resident builds at (a) 500², (b) 960², (c) 2048² and (d) 4096² cells, split into Non ACC’ed, Device Compute, Data H2D, Data D2H and Sync)


This results in a fully distributed and accelerated version of the CloverLeaf mini-app.

Figure 5.8: Pseudo Code: Hybrid MPI/OpenACC
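The sketch below illustrates the halo-exchange pattern Figure 5.8 describes, under the simplifying assumptions of a depth-one halo and a one-dimensional decomposition in the second array index (the subroutine, array and neighbour names are illustrative, not the CloverLeaf source): the halo rows are brought back with !$acc update host, exchanged with MPI, and pushed back with !$acc update device.

SUBROUTINE exchange_halo_sketch(field, nx, ny, down, up, comm)
  ! Assumes "field" is already resident on the GPU via an enclosing
  ! !$acc data region, and that the domain is decomposed in the y direction.
  USE mpi
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: nx, ny, down, up, comm
  REAL(KIND=8), INTENT(INOUT) :: field(0:nx+1, 0:ny+1)
  INTEGER :: ierr, status(MPI_STATUS_SIZE)

  ! Copy only the halo rows that must be communicated back to the host.
!$acc update host(field(0:nx+1, 1:1), field(0:nx+1, ny:ny))

  ! Exchange with the neighbouring ranks (down/up may be MPI_PROC_NULL).
  CALL MPI_Sendrecv(field(0,1),    nx+2, MPI_DOUBLE_PRECISION, down, 1, &
                    field(0,ny+1), nx+2, MPI_DOUBLE_PRECISION, up,   1, &
                    comm, status, ierr)
  CALL MPI_Sendrecv(field(0,ny),   nx+2, MPI_DOUBLE_PRECISION, up,   2, &
                    field(0,0),    nx+2, MPI_DOUBLE_PRECISION, down, 2, &
                    comm, status, ierr)

  ! Push the freshly received halo rows back onto the GPU.
!$acc update device(field(0:nx+1, 0:0), field(0:nx+1, ny+1:ny+1))
END SUBROUTINE exchange_halo_sketch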

5.3.8 Version A: Initial Performance

Referred to as Version A, this initial, fully distributed and accelerated version can be taken as a baseline for analysing performance. Figure 5.9 shows the strong scaling performance characteristics of this initial multi-GPU version of the code. Here it is executing the 960² cell test case, but with the simulation time increased to 15.5 µs rather than the 0.5 µs originally depicted in Figure 5.7. Increasing the simulation time allows the performance characteristics to be more easily observed, especially once the problem is distributed over multiple CPUs and GPUs. Plots are for one Opteron CPU core, one Opteron socket, and one through six GPUs. As described in Section 5.3.6, the key points to take away are:

• One X2090 GPU is a factor of 5.97 faster than one Opteron CPU core

• One X2090 GPU is 0.88 times faster than one Opteron socket

• Multi-GPU scaling turns over¹ once six GPUs are utilised.

To see if these initial findings can be improved on, understanding these performance figures is crucial.

¹ The point where increasing the number of processors used to solve a problem results in a slower solution time; that is, the problem no longer demonstrates strong scaling.


Figure 5.9: Version A Initial Performance of Multi CPU and GPU (total runtime in seconds for 1 CPU core, 16 CPU cores, and 1, 2, 4 and 6 GPUs)


5.3.9 Version B: Inner Loop Dependencies

On the XK6, Cray’s analysis tool suite, Perftools, has been extended to measure GPU performance, including the production of a flat profile, which shows the total execution time for each function within the application. Figure 5.10 shows a profile for Version A, which indicates that advec_cell and advec_mom are the two routines that dominate runtime. This is not entirely unexpected; indeed, running the code exclusively on the CPU results in a not dissimilar profile.

Figure 5.10: Flat Profile Version A

However, understanding exactly how the underlying code is executing on the GPU is vital to establish whether this performance is optimal or not. CCE provides, by way of the -r option, the generation of a listing file. Depending on the sub-options invoked with this option, a set of compiler reports is concatenated to the listing file, detailing compiler listings, loopmark listings, source code listings and cross references. With the loopmark and source listings enabled, an understanding of how the compiler has translated the application can be ascertained. Figure 5.11 shows the loopmark listing for Version A’s advec_cell, but with pseudo code in place of the actual source for clarity.

Figure 5.11: Compiler Listing advec_cell Version A

The G indicates that the code block enclosed by the !$acc parallel loop and the !$acc end parallel loop is accelerated; this is also detailed in the associated message: “A region starting at line 93 and ending at 99 was placed on the accelerator”. Additionally, the outer loop has the g loopmarking. This indicates that the loop has been distributed across the threadblocks and subsequently across the threads within those blocks. Again, the associated message indicates this: “A loop starting at line 94 was partitioned across the threadblocks and the 128 threads within a threadblock”. Both of these statements and loopmarkings are the desired result, indicating that each thread is working on its own instance of k.

However, on inspection of the inner loop, the loopmarking specifies a numerical value (in this case “3”), along with the optimisation message: “A loop starting at line 95 will be serially executed”. This indicates that the j loop has not been partitioned: every thread iterates over the same values of j at the same time, which is not the intended result. To remedy this, the dependencies, or at least the dependencies the compiler perceives, in the code need to be addressed. In the case of advec_cell, for each iteration of the inner loop a value is calculated for the pre and post mass, energy and volume, and these updated values are then used in the same loop. This dependency is easily rectified by splitting the loop into two separate loops: one to calculate the values and a second to use these updated values.
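The sketch below illustrates the style of this fix (the array names and the arithmetic are illustrative, not the actual advec_cell source): the fused compute-and-use loop, which the compiler conservatively serialised, becomes two independent loops, both of which can be partitioned across the GPU threads.

SUBROUTINE advec_split_sketch(x_min, x_max, y_min, y_max, &
                              density, volume, mass_flux, post_mass)
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: x_min, x_max, y_min, y_max
  REAL(KIND=8), INTENT(INOUT) :: density(x_min:x_max+1, y_min:y_max)
  REAL(KIND=8), INTENT(IN)    :: volume(x_min:x_max+1, y_min:y_max)
  REAL(KIND=8), INTENT(IN)    :: mass_flux(x_min:x_max+1, y_min:y_max)
  REAL(KIND=8), INTENT(OUT)   :: post_mass(x_min:x_max+1, y_min:y_max)
  INTEGER :: j, k

  ! Loop 1: calculate the updated (post) quantities only.
!$acc parallel loop present(density, volume, mass_flux, post_mass)
  DO k = y_min, y_max
    DO j = x_min, x_max
      post_mass(j,k) = density(j,k) * volume(j,k) &
                     + mass_flux(j,k) - mass_flux(j+1,k)
    END DO
  END DO
!$acc end parallel loop

  ! Loop 2: use the updated quantities; nothing computed in this loop is
  ! also consumed within it, so both loop levels partition cleanly.
!$acc parallel loop present(density, volume, post_mass)
  DO k = y_min, y_max
    DO j = x_min, x_max
      density(j,k) = post_mass(j,k) / volume(j,k)
    END DO
  END DO
!$acc end parallel loop
END SUBROUTINE advec_split_sketch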

Figure 5.12: Compiler Listing advec_cell Version B

Once implemented, an updated listing file (Figure 5.12) shows that the inner loop is now being correctly partitioned across the threads. The updated profile in Figure 5.13 shows advec_cell dropping from over 27 s to under 12 s. This modified version of the code, denoted Version B, is compared to Version A in Figure 5.14.

5.3.10 Version C: Nested Loops and Global Variables

Figure 5.13 shows that what now dominates the runtime is advec_mom. Using the loopmarking in the listing file shows that loops containing multiple levels of nesting are not being accelerated. By removing these nested loops, all but one are now partitioned as desired. In the case of this non-partitioned loop, it can be forced to be scheduled across all the threads by the addition of the OpenACC vector clause to the !$acc loop construct.


Figure 5.13: Flat Profile Version B

Figure 5.15 demonstrates the original nested loop structure alongside its re-factored, accelerated pseudo code. With the re-factored code the advec_mom kernel time drops from 19 s to 8 s, as highlighted in the profile presented in Figure 5.16.

What Figure 5.16 also shows is that the timestep routine is now dominating. Consulting the associated listing file for timestep reveals that the kernel is only executing on a single GPU thread. The cause was traced to the use of global variables. As all threads have the potential to write to these global variables, the compiler takes a conservative approach and only allows the scheduling of the kernel on a single thread. The global variables in question are used in the timestep kernel to return the (i,j) coordinate of the cell which contains the minimum timestep value for the iteration. This functionality can be retained, without the need for global variables, by use of the Fortran intrinsic MINLOC.

Analysing the profile of Version C (Figure 5.17) shows the timestep kernel’s execution time is now reduced from 13 s to less than 1 s. Figure 5.18 shows the performance improvements delivered by Version C of the code, which includes the re-factored nested loops and the removal of the global variables. In comparison to Version B it can be seen that significant gains are now observed on comparative executions using a single GPU. However, although some gains are also observed on multiple GPU runs, scaling still turns over once six GPUs are utilised.
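A hedged sketch of the timestep fix follows (the names are illustrative and the handling of the location is simplified relative to the real code): the minimum dt is found with an OpenACC min reduction, so no global variables are written from the device, and the coordinate of the controlling cell is recovered with the Fortran intrinsic MINLOC.

SUBROUTINE calc_dt_sketch(x_min, x_max, y_min, y_max, dt_local, dt_min, jk_min)
  IMPLICIT NONE
  INTEGER, INTENT(IN)  :: x_min, x_max, y_min, y_max
  REAL(KIND=8), INTENT(IN)  :: dt_local(x_min:x_max, y_min:y_max)
  REAL(KIND=8), INTENT(OUT) :: dt_min
  INTEGER, INTENT(OUT) :: jk_min(2)
  INTEGER :: j, k

  dt_min = HUGE(0.0_8)

  ! Parallel minimum reduction on the device; no global state is written,
  ! so the kernel is free to run across all the GPU threads.
!$acc parallel loop reduction(min:dt_min) present(dt_local)
  DO k = y_min, y_max
    DO j = x_min, x_max
      dt_min = MIN(dt_min, dt_local(j,k))
    END DO
  END DO
!$acc end parallel loop

  ! Recover the controlling cell on the host copy with MINLOC (updated here
  ! for illustration; in practice the location is only needed for occasional
  ! diagnostic output).
!$acc update host(dt_local)
  jk_min = MINLOC(dt_local) + (/ x_min - 1, y_min - 1 /)
END SUBROUTINE calc_dt_sketch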


Figure 5.14: Version B Performance Comparison (total runtime in seconds for 1 CPU core, 16 CPU cores, and 1, 2 and 6 GPUs, for Versions A and B)


Figure 5.15: Nested and Re-factored Pseudo Code for advec_mom: (a) Version B; (b) Version C

Figure 5.16: Flat Profile Version B with Re-factored advec_mom

5.3.11 Version D: Multi GPUs, Reducing Hidden Transfers

To understand why the scaling on the accelerator is limited to a few GPUs, attention is needed on the bottlenecks affecting multi-GPU execution. The first area investigated was that of data transfers between the host CPU and GPU. Depending on the variable in question, different depths of halo exchange cells are required for the spatial domain decomposition. By default, the MPI distributed code sets the halo cell depth consistently for all variables; that default depth matches the maximum depth required for the worst case variable. Although implementation dependent, because MPI communication overheads behave as a stepping function rather than a linear progression, the overhead of communicating one or two extra layers of data is negligible, if present at all.


Figure 5.17: Flat Profile Version C

However, any halo exchange data, when running on distributed GPUs, first needs to be transferred from the GPU to the host prior to the CPU to CPU MPI communication, and likewise the receiving CPU needs to transfer the new data to its attached GPU ready for computation. As previously detailed, PCIe data transfers to and from the GPU are a major bottleneck, so any reduction should prove beneficial. With this in mind the code was modified to exchange only the data genuinely required.

The CRAY_ACC_DEBUG runtime environment variable, documented from version 8.1.0.165 of CCE, assisted in this analysis. The variable has three informational setting levels, each providing increasing amounts of information relating to what is being transferred to and from the GPU, and how large each transfer is, providing an informative runtime analysis. Enabling this showed unexpected data transfers in the accelerate kernel, not associated with the halo exchange data. Although relatively small in size (2,376 bytes), the output showed that a Fortran derived type was being transferred and allocated on the GPU. On investigation, it became evident that scalar components of that particular derived type were being utilised in the accelerate kernel, and hence an implicit copy of the entire derived type was occurring to place it on the GPU. Creating local scalars, and copying the appropriate fields from the derived type into these, stopped the implicit copy.

The resultant performance of Version D of the mini-app, with these two changes implemented, is shown in Figure 5.19. Although this is a significant improvement, multiple GPU scalability is still relatively modest, with two GPUs taking 73.99 s, while six take 67.01 s.
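The sketch below illustrates the derived-type fix (the type, its fields and the kernel arithmetic are illustrative, not the CloverLeaf source): the scalar component the kernel needs is copied into a local variable on the host, so the compiler no longer generates an implicit transfer and allocation of the entire derived type on every call.

MODULE sim_state_sketch
  IMPLICIT NONE
  TYPE :: control_type
    REAL(KIND=8) :: dt
    INTEGER      :: step
  END TYPE control_type
  TYPE(control_type) :: control
END MODULE sim_state_sketch

SUBROUTINE accelerate_sketch(x_min, x_max, y_min, y_max, xarea, density, pressure, xvel)
  USE sim_state_sketch
  IMPLICIT NONE
  INTEGER, INTENT(IN) :: x_min, x_max, y_min, y_max
  REAL(KIND=8), INTENT(IN)    :: xarea(x_min:x_max, y_min:y_max)
  REAL(KIND=8), INTENT(IN)    :: density(x_min:x_max, y_min:y_max)
  REAL(KIND=8), INTENT(IN)    :: pressure(x_min:x_max, y_min:y_max)
  REAL(KIND=8), INTENT(INOUT) :: xvel(x_min:x_max, y_min:y_max)
  REAL(KIND=8) :: dt_local
  INTEGER :: j, k

  ! Referencing control%dt inside the accelerated region caused an implicit
  ! copy of the whole derived type to the device; a local scalar avoids it.
  dt_local = control%dt

!$acc parallel loop present(xarea, density, pressure, xvel)
  DO k = y_min, y_max
    DO j = x_min, x_max
      xvel(j,k) = xvel(j,k) - dt_local * xarea(j,k) * pressure(j,k) / density(j,k)
    END DO
  END DO
!$acc end parallel loop
END SUBROUTINE accelerate_sketch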


Figure 5.18: Version C Performance Comparison (total runtime in seconds for Versions A, B and C on 1 CPU core, 16 CPU cores, and 1, 2 and 6 GPUs)


Figure 5.19: Version D Performance Comparison (total runtime in seconds for Versions A to D on 1 CPU core, 16 CPU cores, and 1, 2 and 6 GPUs)


5.3.12 Version E: “ACC_SYNC_WAITs”

A profile of Version D is shown in Figure 5.20. The profile is now dominated by a number of routines, all carrying the overhead of ACC_SYNC_WAIT. On further investigation, what all of these routines have in common is that they allocate data on the GPU.

Figure 5.20: Version D: ACC_SYNC_WAIT Dominated Profile

By pre-allocating temporary arrays at the same high level of the calling tree as described in Section 5.3.4, and carrying out an initial data copy to the device, device memory can be re-used multiple times by passing the relevant arrays through subroutine arguments. This removes the need to check if the data is present and hence negates the need to create it on the device via an allocation.
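The pre-allocation scheme can be sketched as follows; the routine and array names (hydro_driver, work1, advec_cell_sketch) are illustrative placeholders, not CloverLeaf's actual routines:

! Hedged sketch: temporary work arrays are allocated once, high in the
! call tree, copied to the device a single time, and then passed down
! through subroutine arguments so every kernel reuses the same device
! memory (no per-call allocation, and hence no ACC_SYNC_WAIT overhead).
SUBROUTINE hydro_driver(x_max, y_max, n_steps)
  INTEGER, INTENT(IN) :: x_max, y_max, n_steps
  REAL(KIND=8), ALLOCATABLE :: work1(:,:), work2(:,:)
  INTEGER :: step

  ALLOCATE(work1(x_max, y_max), work2(x_max, y_max))
  work1 = 0.0_8
  work2 = 0.0_8

!$acc data copyin(work1, work2)
  DO step = 1, n_steps
    CALL advec_cell_sketch(x_max, y_max, work1, work2)
    ! ... further kernels reuse work1/work2 via their arguments ...
  ENDDO
!$acc end data

  DEALLOCATE(work1, work2)
END SUBROUTINE hydro_driver

SUBROUTINE advec_cell_sketch(x_max, y_max, work1, work2)
  INTEGER, INTENT(IN) :: x_max, y_max
  REAL(KIND=8), INTENT(INOUT) :: work1(x_max, y_max), work2(x_max, y_max)
  INTEGER :: j, k
  ! The data is already on the device, so only a presence assertion is
  ! expressed here; nothing is created or transferred per call.
!$acc parallel loop present(work1, work2)
  DO k = 1, y_max
!$acc loop vector
    DO j = 1, x_max
      work2(j,k) = 0.5_8 * work1(j,k)
    ENDDO
  ENDDO
END SUBROUTINE advec_cell_sketch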

Figure 5.21: Pseudo Code for Pre and Post Pre-Allocations ((a) Version D; (b) Version E)

Figure 5.21 shows pseudo code for both the original allocation method and the pre-allocation implementation. A new code profile of Version E (containing the implemented pre-allocations) is displayed in Figure 5.22, and shows that the pre-allocation has removed the ACC_SYNC_WAIT overheads. Adding the Version E runtimes to the plot of previous versions (Figure 5.23) shows a significantly improved accelerated version of the code.


Figure 5.22: Flat Profile Version E

Indeed, Version E gives a 67% improvement in total turnaround time over that of the initial Version A on a single GPU. It is also apparent, comparing Versions A and E in Figure 5.23, that the optimisations applied to achieve GPU performance also have a significant impact on CPU performance; these are addressed in Section 5.3.13. Recall from Section 5.3.8 the performance figures for Version A:

• One X2090 GPU is a factor of 5.97 faster than one Opteron CPU core

• One X2090 GPU is 0.88 times faster than one Opteron socket

• Multi-GPU scaling turns over once six GPUs are utilised.

These same metrics for Version E are now:

• One X2090 GPU is a factor of 19.34 faster than one Opteron CPU core

• One X2090 GPU is 4.91 times faster than one Opteron socket

• Multi-GPU performance is still scaling when six GPUs are utilised.

With a TDP of 115 W for the Opteron and 225 W for the X2090, this implies a performance advantage of over 2.5x for the X2090 at the same power footprint.
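A quick check of that figure, using the socket-level comparison above and assuming TDP is a fair proxy for the power actually drawn:

    4.91 / (225 W / 115 W) = 4.91 / 1.96 ≈ 2.5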

5.3.13 Impact of GPU Optimisations on CPU

An interesting, yet relevant, aside is to look at the relative performance of the final, GPU-optimal, incarnation of the code (Version E) against that of the initial version (Version A) when running exclusively on the CPU. Recall that all optimisations carried out between these two versions were aimed at improving and fixing issues detrimental to performance on the GPU. Yet, comparing single CPU-only performance, Version E gives over a 17% improvement over Version A. The main contributors to this improvement are the optimisations to the allocate, deallocate and subsequently re-allocate pattern applied to the arrays, as described in Section 5.3.12.


Figure 5.23: Version E Performance Comparison (total runtime in seconds for Versions A to E on 1 CPU core, 16 CPU cores, and 1, 2 and 6 GPUs)


Additionally, this is further emphasised when the derived metrics for each version's performance are compared. Although Version A realises a respectable 652 MFLOP/s (Figure 5.24), this increases to over 1 GFLOP/s for Version E (Figure 5.25). The increase comes from improved L1 cache utilisation (up by a factor of 2.3), which now averages 4.164 uses per operand.

Figure 5.24: Derived Metrics: Version A

Figure 5.25: Derived Metrics: Version E

5.3.14 Multi-GPU Scalability

To better observe the strong scaling of the code, the 960² cell test problem executing on one Opteron core was scaled out to all 16 cores, resulting in a test size of 3840² cells. To reduce wall clock times during experiments, the simulation time of the system was also reduced back to 0.5 µs. Results for scaling this larger case on the GPU are presented in Figure 5.26. Utilising all 32 GPUs on Chilean Pine, parallel efficiency dips to just under 50%, with a speed-up of 15.42 over one GPU. This is not entirely unexpected, as the data sizes per GPU at this point (as a fraction of GPU memory) are relatively small. With a higher ratio of communication relative to computation, the parallel efficiency would not be expected to be large. Weak scaling figures based on average cell execution time (micro-seconds per cell) are presented in Figure 5.27.
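The quoted figures are consistent: scaling the 960² mesh by the node's 16 cores gives 960² x 16 = 3840² cells, and the strong-scaling efficiency on 32 GPUs follows directly from the speed-up:

    efficiency = 15.42 / 32 ≈ 0.48, i.e. just under 50%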


Figure 5.26: 0.5 µs, 3840² cells, Strong Scaled (scaling relative to 1 GPU against number of GPUs)

Once nine GPUs are deployed, communication is instigated in both directions, in both dimensional planes; that is to say there is a “central” GPU which communicates in all four directions. Once this occurs, the cost per cell per timestep stays constant as the number of GPUs is increased. At over 96% parallel efficiency, this shows good scaling across the whole of the machine. Comparing the strong and weak scaling performance demonstrates that the GPU is very good at compute as long as it is kept “filled”, that is, the ratio of computation to communication is kept high.

5.4 Summary

Once the kernels were identified, OpenACC was applied to each on an individual basis, so the breakdown of compute and of data transfer to and from the CPU could be assessed kernel by kernel. Once all the main compute kernels were accelerated, they were then executed in unison. This showed the overall performance being achieved from execution on the GPU; however, as each kernel was still transferring data to and from the host, it also highlighted the overhead of the data transfers. To remove the impact of the data transfers, the code was made fully resident on the GPU device. This removed all but an initial copy to the GPU, the copies required by the MPI communications, and a final copy back to the host CPU. However, to make the code fully resident and run exclusively on the GPU, not only did all of the data need to be identified for the initial transfer, but those parts of the code that were not necessarily computationally intense also needed to execute on the GPU; that is, extra kernels needed to have OpenACC applied.


Figure 5.27: 0.5 µs, 3840² cells, Weak Scaled (average µs per cell against number of GPUs)

Once the entire application was resident and running exclusively on the GPU, the performance was then investigated. Firstly, the impact of increasing the problem size was explored and, once a case large enough to suitably occupy the GPU was identified, the performance was compared against the best CPU performance available from the most optimal variant of the code (MPI, OpenMP, or MPI/OpenMP hybrid) at the time of study. Based on these comparisons, a range of optimisations was applied by analysing how each kernel was utilising the GPU's threads. Finally, having attained a robust, optimal, fully resident OpenACC version of the code, the application was extended to use multiple GPUs by developing a hybrid MPI/OpenACC implementation. This resulted in a performance increase of 5.82x over a single Opteron CPU core, but showed detrimental performance compared to an entire CPU socket. A study was subsequently carried out which identified those areas inhibiting performance, as summarised in Table 5.3. Once rectified, a single GPU realises a factor of 19.34 over a single Opteron CPU core, and a factor of 4.91 over an entire socket. These optimisations also benefited CPU-only performance, with an increase from 652 MFLOP/s to in excess of 1 GFLOP/s. Increasing the problem size, strong and weak scaling showed almost 50% parallel efficiency for the former, and over 96% for the latter, when executed across the entirety of the Chilean Pine XK6 platform.


Issue: Inner loop dependencies between “calculating” and “use”
Remedy: Split loops into “calculating” and “using”

Issue: Multiple nested loops not accelerating due to outer loop fusion
Remedy: Re-factor loops into their own entity and use of the loop vector directive

Issue: Global variables causing single-thread execution
Remedy: Remove the global variables

Issue: Hidden data transfers
Remedy: Identified and removed/minimised

Issue: “ACC_SYNC_WAITs”
Remedy: Temporary array pre-allocation and reuse

Table 5.3: Summary of Performance Issues

5.5 Chapter Summary

OpenACC is a directive-based programming model that allows the code developer to identify areas of code to be accelerated on a hosted device. Adding directives to an existing code base is an attractive proposition compared to re-writing in a new language, and realises some of the benefits of an open, non-disruptive approach. Vendors are developing backends that can still utilise their proposed methodology, yet have the OpenACC standard as a common interface. By investigating the OpenACC programming model through the hydrodynamic mini-app CloverLeaf, an idea of the steps and understanding needed to accelerate an application has been gained. It has been shown that it is not just a case of working on the “hot spots” of an application and applying directives: time has to be taken to understand what is happening at the hardware level in those targeted subroutines and kernels. The re-factoring of compute kernels to be data parallel is key. As discussed in Chapter 2, the number of processing elements on accelerated hardware can run into the thousands, so it is essential to make sure all kernels are scheduled across all the device threads. In some cases this will be at the expense of the additional memory usage required to remove dependencies. Data transfers severely limit achievable performance; it has been shown that this can be true even for the smallest pieces of data, which can instigate a much larger implicit data transfer. Optimisations required for a performant GPU implementation can also give rise to significant benefits when applied to a CPU implementation. These optimisations enabled the kernels to be expressed in a data parallel manner, which not only meant they threaded well on the GPU, but also when running multi-threaded via OpenMP on a CPU. Additionally, the re-factoring of the code also allowed the compiler to perform better scalar and vector optimisations.
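A minimal sketch of trading memory for data parallelism, with placeholder names rather than CloverLeaf's actual kernels: the scalar carried between iterations in the commented-out form serialises the loop, whereas staging the intermediate values in a temporary array lets every iteration run independently.

! Dependent form (cannot be spread across device threads):
!   DO j = 1, n-1
!     flux      = 0.5_8 * (b(j) + b(j+1))
!     a(j)      = a(j) + flux - prev_flux
!     prev_flux = flux
!   ENDDO
!
! Data-parallel form, at the cost of a temporary flux array.
SUBROUTINE flux_update(n, a, b)
  INTEGER, INTENT(IN)         :: n
  REAL(KIND=8), INTENT(INOUT) :: a(n)
  REAL(KIND=8), INTENT(IN)    :: b(n)
  REAL(KIND=8) :: flux(0:n)
  INTEGER :: j

!$acc data copy(a) copyin(b) create(flux)

  ! "Calculate" pass: every face value is independent of the others.
!$acc parallel loop
  DO j = 0, n
    IF (j == 0 .OR. j == n) THEN
      flux(j) = 0.0_8
    ELSE
      flux(j) = 0.5_8 * (b(j) + b(j+1))
    ENDIF
  ENDDO

  ! "Use" pass: each cell update reads only pre-computed face values.
!$acc parallel loop
  DO j = 1, n
    a(j) = a(j) + flux(j) - flux(j-1)
  ENDDO

!$acc end data
END SUBROUTINE flux_update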

A number of discoveries made through the process of developing a performant OpenACC mini-app are applicable to the physics domain of structured 2D hydrodynamics applications in general. The technique used to minimise the data communicated during halo cell updates, something that is commonplace in all structured hydrodynamics schemes, makes a significant difference on the GPU compared to the CPU. In the case of multiple nested loops not accelerating due to outer loop fusion, this is a point of focus for all applications with similar loop structures, not only those in the 2D hydrodynamics domain; the application developer should take the time to verify that their code is indeed behaving optimally in any nested loop regions when executing on the GPU. Likewise, issues limiting performance were identified with the allocation, deallocation and re-allocation style of programming that is prevalent in many applications. This should be a point of focus for any application using this structure that wishes to execute on a GPU. This development of a performant OpenACC implementation has subsequently formed the basis for a number of further research activities. The initial OpenACC version was used as the basis of an implementation that would compile and perform under the PGI and CAPS OpenACC compilers. This ultimately led to two OpenACC implementations, the “parallel” and “kernel” versions. The “kernel” version has been adopted by The PGI Compiler Group for internal regression testing of their compiler. Also, as part of the inaugural release of the Standard Performance Evaluation Corporation's (SPEC) High-Performance Group (HPG) SPEC ACCEL™ benchmark [44, 119], both the C and Fortran OpenACC versions of CloverLeaf [43] are among the 15 OpenACC benchmark applications used to provide a comparative performance measure across a range of accelerator hardware. Work led by Mallinson [137] performed a number of MPI-specific optimisations: pre-posting receives, MPI rank re-ordering and overlapping of communications and computation, resulting in weak scaling the MPI+OpenACC hybrid CloverLeaf code to all 16,384 GPUs on ORNL's Titan [65] supercomputer. Although this chapter has demonstrated the development of an optimal OpenACC version of the mini-app, it does not address the question of whether there are better options for utilising accelerated hardware. How the performance of alternative programming methodologies compares, and what the relative costs are in terms of development and intrusiveness, remain open questions. The following chapters take alternative variants and begin to explore their performance and productivity on a particular architecture, and their performance portability on a range of many-core architectures.

The current releases of CloverLeaf's OpenACC versions discussed in this chapter are available for download from dedicated GitHub web pages [75].

CHAPTER 6 Using mini-apps to Explore Performance Portability

6.1 Introduction

The previous chapter introduced the idea of the mini-app, in particular CloverLeaf, and highlighted the detailed step-by-step approach taken to produce a performant OpenACC version of the code. Analysis of the code during this process identified a number of key factors in achieving that performance: re-factoring, data residency, local buffer packing, minimal data transfer, and focusing on code sections containing multiple nested loops. Reference was also made to the whole range of implementations of CloverLeaf in various programming models and paradigms. This chapter presents a comparative study of the CloverLeaf hydrodynamics mini-app ported to GPUs using three of those technologies: OpenACC, OpenCL and CUDA. Specifically, it makes the following key contributions:

• In the context of the CloverLeaf mini-application, the first direct comparison between OpenACC, OpenCL and CUDA, the three dominant programming models for GPU architectures, is given.

• A quantitative and qualitative comparison of these three approaches with regard to code development, maintenance, portability and performance on two problem sizes of interest on a GPU-based cluster (Cray XK6), is given.

The remainder of this chapter is organised as follows: Section 6.2 discusses related work in this field; Section 6.3 provides details on each of the three implementations used in this study, as well as the changes needed to make the overall algorithm more amenable to parallelisation on the GPU architecture; the results of the study, together with a description of the experimental set-up, are presented in Section 6.4; and a description of the current ecosystems for each of the programming paradigms, covering tool support, developer communities and the future directions of the standards, is given in Section 6.5. Finally, Section 6.6 concludes the chapter by discussing the relative merits of the alternative programming methodologies in each of the categories described: performance, development, maintenance and future.


6.2 Related Work

NVIDIA CUDA is currently the most mature and widely used technology for developing applications for GPUs. However, directive-based approaches such as OpenACC, driven by the work from the Center for Application Acceleration Readiness (CAAR) team at Oak Ridge National Laboratory (ORNL), are becoming increasingly used [65]. Fewer studies have been carried out on the assessment of OpenCL. At the time of study, the only direct comparison between OpenACC and OpenCL available was that of Wienke et al. [195]. Their work, however, is focused on two applications from significantly different domains: the simulation of bevel gear cutting, and a neuromagnetic inverse problem. A number of studies have evaluated the applicability of non-mesh based hydrodynamical methods (unlike CloverLeaf's structured grid) to GPU-accelerated hardware [92, 120, 169, 182]. These studies, however, limited their scope to CUDA and did not compare performance, productivity or portability with alternative approaches such as OpenCL or OpenACC, which is a key focus of this work. An OpenCL implementation of a compressible gas dynamics code was developed and described by Bergen [63], and similarly an OpenCL version of a single code-based structured library for multi-science applications was developed by Shukla [188]. However, these both focus solely on OpenCL implementations and do not address performance or alternative approaches. The ports of a Euler-based solver application and a Boltzmann-based solver application were described by Brook [67]. Although there is some analogy between the former application and CloverLeaf, Brook's focus was exclusively on an OpenMP implementation specifically targeting an Intel Xeon Phi based architecture. Lattice Quantum Chromodynamics (QCD) is an additional domain which has seen numerous applications successfully ported to GPUs. However, a number of these studies employ the QUDA library [36, 53, 187]. This library is based on NVIDIA CUDA technology and therefore these studies do not examine alternative approaches such as OpenCL or OpenACC. While all the above works primarily discuss the development of the application in question, the work described in this chapter takes three alternative programming methodologies and compares their relative merits.


Figure 6.1: Key differences between implementations of the advection computational kernel in CloverLeaf.

6.3 Implementations

The profiling of CloverLeaf described in Chapter 5 shows that approximately 95% of the execution time is contained in six computationally intense kernels (Table 5.1). However, to achieve full GPU residency, i.e. the physics algorithm executing exclusively on the GPU with the necessary data residing in device rather than host memory, all twelve unique kernels are required to be ported to the accelerator device, leaving only control code to execute on the host CPU. The “data-parallel” nature of the OpenMP implementation of the mini-application was an ideal basis from which to create each of the new implementations. Exposing the loop-level parallelism required by OpenMP took a combination of loop splitting and the addition of extra temporary data storage so that intermediate data could be reused. Whilst porting the code to the accelerator, the data parallelism within each kernel was improved, and these changes were applied back to the OpenMP version to increase CPU performance. The development of the advection kernel in each of the three programming models is shown in Figure 6.1; a similar approach was used for each of the remaining kernels. The original Fortran code was first modified in order to remove dependencies between loop iterations. The loops, however, must still be completed sequentially, as each loop uses data calculated by the previous loop.


6.3.1 OpenACC®

As fully detailed in Chapter 5, in order to convert the data-parallel version of each kernel to OpenACC, loop-level pragmas were added to specify how the loops should be run on the GPU and to describe their data dependencies. For effective use of the GPU, data transfers between the host processor and the accelerator must be kept to a minimum. CloverLeaf is fully resident on the device; this was achieved by applying OpenACC data “copy” clauses at the start of the program, which results in a one-off initial data transfer to the device. The computational kernels exist at the lowest level within the application's call-tree and therefore no data copies are required; this is achieved by employing the OpenACC “present” clause to indicate that all input data is already available on the device. As in any block-structured, distributed MPI application, there is a requirement for halo data to be exchanged between MPI tasks. In the accelerated versions, however, this data resides on the GPU local to the host CPU, hence the data to be exchanged is transferred from the accelerator to the host via the OpenACC “update host” clause. MPI send/receive pairs exchange data in the usual manner, and the updated data is then transferred from the host to its local accelerator using the OpenACC “update device” clause. A key point to note is that the explicit data packing (for sending) and unpacking (for receiving) is carried out on the device for maximum performance.
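A minimal sketch of this halo-exchange pattern, with illustrative names (exchange_left_right, depth, left/right ranks); the real CloverLeaf communication routines differ:

! Hedged sketch: pack on the device, move only the packed buffer across
! PCIe, exchange with MPI on the host, then update the device copy.
SUBROUTINE exchange_left_right(field, nx, ny, depth, left, right, comm)
  USE mpi
  INTEGER, INTENT(IN) :: nx, ny, depth, left, right, comm
  REAL(KIND=8), INTENT(INOUT) :: field(1-depth:nx+depth, ny)
  REAL(KIND=8) :: snd(depth*ny), rcv(depth*ny)
  INTEGER :: j, k, ierr, status(MPI_STATUS_SIZE)

!$acc data present(field) create(snd, rcv)

  ! Pack the outgoing halo on the device.
!$acc parallel loop collapse(2)
  DO k = 1, ny
    DO j = 1, depth
      snd((k-1)*depth + j) = field(nx-depth+j, k)
    ENDDO
  ENDDO

  ! Only the packed buffer crosses the PCIe bus.
!$acc update host(snd)
  CALL MPI_Sendrecv(snd, depth*ny, MPI_DOUBLE_PRECISION, right, 0, &
                    rcv, depth*ny, MPI_DOUBLE_PRECISION, left,  0, &
                    comm, status, ierr)
!$acc update device(rcv)

  ! Unpack the received halo on the device.
!$acc parallel loop collapse(2)
  DO k = 1, ny
    DO j = 1, depth
      field(j-depth, k) = rcv((k-1)*depth + j)
    ENDDO
  ENDDO

!$acc end data
END SUBROUTINE exchange_left_right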

6.3.2 OpenCL™

An OpenCL [125] port of CloverLeaf was developed by Mallinson, then of the University of Warwick. The C bindings that form the interface to the functionality described by the OpenCL standard mean that integrating directly with the Fortran codebase of CloverLeaf is difficult. To ease programmability, a C++ header file is provided by the Khronos Group which allows access to the OpenCL routines in a more object-oriented manner [32]. This header file was used by a static C++ class to manage the interaction between the original Fortran code and the new OpenCL kernels. The class holds details of all the buffers and kernels used by the application, allowing C functions (which are easily callable from Fortran) to be written that initiate kernels and transfer data as needed. As with the OpenACC version of the code, data transfers between the host processor and the device must be minimised in order to maximise performance. This is achieved by creating and initialising all data items on the device, and allowing these to reside on the GPU throughout execution. Data is only copied back to the host in order to write out visualisation files, and for MPI communications.


Figure 6.2: Total runtimes (s) for the 960²- and 3840²-cell problems under the OpenACC, OpenCL and CUDA versions

6.3.3 NVIDIA® CUDA®

Developed in conjunction with the University of Bristol (Michael Boulton and Simon McIntosh-Smith) and the NVIDIA Corporation (Tom Bradley), the CUDA [78] implementation of CloverLeaf is almost identical in design to the OpenCL implementation. It was implemented using a global class that coordinated data transfer and computation on the GPU, with helper functions to handle interoperability between the CUDA and Fortran code.

6.4 Results

6.4.1 Experimental Setup

All experiments were conducted on Chilean Pine, a Cray XK6 hosted at the Atomic Weapons Establishment (AWE) (see Appendix A, Section A.4.3 for details).

The default Fortran and C compilers are provided by the Cray Compiling Environment (CCE), and the choice of MPI is MPICH2. The OpenCL version, however, was built with the GNU compiler environment, as utilisation of the Cray compiler with the C++ OpenCL constructs proved unsuccessful. The CUDA kernels were compiled with the appropriate flags to enable double precision calculation capability on the Fermi architecture (-gencode arch=compute_20,code=sm_21). Cray's CrayPAT profiling tool was used to produce the timing profile for the OpenACC version, whereas for the OpenCL and CUDA versions kernel timings were derived by querying the event objects returned by each kernel invocation.

6.4.2 Performance Analysis

The performance of the three accelerated implementations of CloverLeaf was tested using the test case described in Chapter 5, Section 5.1.2: a representative asymmetric shock wave problem. Two test configurations, described in terms of the number of cells in the computational mesh, were used in the experiments: a smaller 960²-cell problem (introduced in Chapter 5, Section 5.3.5) and a large 3840²-cell problem (introduced in Chapter 5, Section 5.3.14). The performance of these two problems was analysed using one node of Chilean Pine containing a single NVIDIA X2090 GPU. Figure 6.2 shows the overall runtimes for the OpenACC, OpenCL and CUDA versions for each problem set. Although a similar pattern is observed for both test cases, in that CUDA outperforms OpenCL, which in turn outperforms OpenACC, the relative differences are larger for the larger problem. OpenCL is 6.5% faster than OpenACC for the 960²-cell problem and 16.1% faster for the 3840²-cell problem, while CUDA is 14.19% and 24.11% faster than the equivalent OpenACC for the 960²- and 3840²-cell problems respectively. The CUDA version has been targeted at the specific Fermi hardware as part of compilation and as such should be better optimised for the architecture. The increase in data in the larger problem set could also be a contributing factor in the different relative performance, due to the reduction operations inherent in each programming model. OpenACC reductions are implemented within Cray's CCE compiler, and the expectation is that these are highly optimised. The CUDA version of the code uses a reduction coded partially by hand and partially provided by the Thrust library [105], while the OpenCL version uses a hand-coded reduction. To investigate this hypothesis, the runtime of the code would need to be broken down into its kernel components, enabling the relative performance differences to be assessed on a per-kernel basis to determine whether there is a correlation with those kernels containing reduction operations; this is explored further in Chapter 7.


Version    WOC Total   WOC Device   WOC Host   Kernel Language   Tools     Portability
OpenACC    1,510       -            -          Fortran           Good      Average
OpenCL     17,930      4,327        13,608     OpenCL C          Poor      Good
CUDA       13,628      5,830        7,798      CUDA C            Average   Poor

Table 6.1: Key development metrics for the three versions.

6.4.3 Productivity Analysis

In order to assess the programmer productivity offered by each approach, a number of factors were taken into consideration: the number of words of code (WOC) added for each version, whether the computational kernels needed rewriting, and the tool support available for each version. By considering WOC (not including symbols such as braces and parentheses) a metric was derived that overcame the variations caused by different programming styles, something which affects the lines of code (LOC) metric. In terms of programmer productivity, OpenACC proved superior to both OpenCL and CUDA, requiring the addition of only 184 OpenACC pragmas (1,510 WOC). The OpenCL and CUDA versions of the code required an additional 17,930 and 13,085 WOC respectively. However, of the 17,930 words required by OpenCL, 3,958 can be attributed to the static class created to manage the OpenCL objects. This static class could be used in other similar applications with little modification, meaning the total amount of OpenCL code unique to CloverLeaf is 13,972 words. Additionally, OpenCL and CUDA both required extra work to re-write the computational kernels in C-style code; however, the simple design of the Fortran kernels eliminated much of the work that might be required in a legacy code. Developing the OpenACC version in an incremental manner (i.e. one kernel at a time) proved to be a straightforward process, which made validating and debugging the code considerably easier. Whilst it was also possible to develop the OpenCL and CUDA versions of the code in an incremental manner, the significantly larger code volumes required for each increment, and the immaturity of the tool support, particularly for OpenCL, made debugging problems harder and more time consuming. CloverLeaf requires the use of several reduction operations. Under OpenACC, reductions were described by pragmas and implemented by the OpenACC compiler. In CUDA, the reductions were implemented using a simple two-stage approach: the first stage reduces within each block (a block being the CUDA equivalent of an OpenCL work-group), producing an array of partial reductions (i.e. one per block).

The second stage then combines the elements of the resulting partial-reduction array; this was implemented using the Thrust C++ library. At the time of the study an optimised library within the OpenCL “ecosystem” providing equivalent functionality was unavailable, and Mallinson therefore implemented a hand-crafted reduction using a similar multi-stage approach.
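As an illustration of how little source change the OpenACC reductions require, a minimal sketch with placeholder names (not CloverLeaf's exact kernels):

! Hedged sketch: the whole reduction is expressed through a clause and
! generated by the OpenACC compiler, in contrast to the multi-stage
! hand-coded reductions needed under CUDA and OpenCL.
SUBROUTINE min_timestep(nx, ny, dt_field, dt_min)
  INTEGER, INTENT(IN)       :: nx, ny
  REAL(KIND=8), INTENT(IN)  :: dt_field(nx, ny)
  REAL(KIND=8), INTENT(OUT) :: dt_min
  INTEGER :: j, k

  dt_min = HUGE(dt_min)
!$acc parallel loop reduction(min:dt_min) copyin(dt_field)
  DO k = 1, ny
!$acc loop vector reduction(min:dt_min)
    DO j = 1, nx
      dt_min = MIN(dt_min, dt_field(j,k))
    ENDDO
  ENDDO
END SUBROUTINE min_timestep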

6.4.4 Portability Analysis

At the time of study, the Cray OpenACC implementation, using OpenACC's “parallel” construct, was the only implementation to have been utilised extensively in this work. Experience at the time indicated that the full OpenACC standard was not implemented in all vendor compiler offerings, with different vendors focused on implementing different aspects of the standard; only Cray had a full “parallel” construct implementation. Therefore, the then current implementation was constrained to the Cray platform, under the CCE compiler environment, using OpenACC's “parallel” construct. This limitation is described further in Section 3.2.2 of Chapter 3. Chapter 7 subsequently addresses this by developing a fully compiler-agnostic OpenACC code, for both OpenACC's “parallel” and “kernel” constructs, following greater vendor compiler adoption of the standard. Similarly, utilising CUDA as a mechanism to take advantage of accelerator devices limits the choice of officially supported hardware platforms available to an organisation, as NVIDIA only supports CUDA on their own hardware. The Ocelot project [27] and PGI's CUDA Fortran compiler [35] do, however, provide alternatives for other languages and hardware. The OpenCL version of the code exhibited the highest portability; using this version it was possible to execute the application on both AMD and NVIDIA GPU devices, AMD and Intel CPUs, and also a pre-production Intel Xeon Phi. These observations are picked up and discussed in Chapter 7 with a more detailed investigation into OpenACC and alternative programming methodologies, including a direct comparison with OpenCL on an Intel Xeon Phi.

6.5 Supporting Infrastructure

Another consideration of the relative merits of each implementation is the supporting infrastructure and user communities that exist. Strong communities enable the development of supporting tools and ancillary software that can assist in the development and subsequent maintainability of an application.

The two major commercial debugging platforms are Rogue Wave's TotalView and Allinea's DDT. Both CUDA and OpenACC are supported in TotalView; beginning with version 8.10 of the debugger, released mid-2012 [37], support was limited to CUDA and OpenACC (only under Cray's CCE) and only on Cray's XK6 architecture. Indeed, beta versions of v8.10 were instrumental in the development stages detailed in Chapter 5. At the time of writing, CUDA [41] and OpenACC [42] support from PGI, in addition to CCE, is available on Linux x86-64. Likewise, DDT fully supports CUDA and all OpenACC compiler providers [40]. A CUDA debugger is one component of NVIDIA's Nsight development platform, enabling debugging of CUDA code, albeit restricted to NVIDIA hardware. To have similar levels of debugging for OpenCL, that is, the ability to step through source code, set breakpoints and inspect variables, only hardware-specific tools are available. For Intel hardware, debugging is possible through plug-ins to Intel's Visual Studio IDE (Integrated Development Environment), while for AMD hardware there is AMD's developer tool suite, CodeXL, one component of which is a graphical debugger. At the time of writing there is no equivalent available for debugging OpenCL code on NVIDIA hardware. To profile the resulting code, with the aim of identifying code hot spots or areas of poor performance, there are a range of options depending on the programming implementation. In the case of OpenCL, the AMD-specific CodeXL developer tool suite again offers a profiling tool which gathers performance data at runtime. PGI's OpenACC runtime, 15.7 or later, contains the ability to profile OpenACC code with GPROF, which allows a breakdown of the performance of an OpenACC instrumented application into its kernel components. Such a breakdown can be tailored to give details of the number of calls to, and the minimum, maximum, average and total time spent in, each kernel. As demonstrated in Chapter 5, Cray's analysis tool suite, Perftools, contains functionality to measure the performance of OpenACC instrumented code, giving profiles and information analogous to GPROF. For the CAPS compiler, the environment variable HMPPRT_LOG_LEVEL is available, which gives overall execution time on an accelerated-region basis; additionally, the CUDA_PROFILE environment variable will give execution time on a kernel by kernel basis. The standard CUDA toolkit comes with a visual profiler (nvvp) plus the command-line driven nvprof tool, which is analogous to GPROF for OpenACC. As with the debugger component, NVIDIA's Nsight contains a profiler tool, plus additional features such as a replay mode to gain finer profiling information, and the ability to visualise the concurrency of kernel execution on the device, which provides a visual insight into the utilisation of the device.

The user/developer communities also vary for each standard. Since 2013, the International Workshop on OpenCL (IWOCL) [38] has been held annually, rotating between North America and Europe; a non-profit, community-led workshop, it consists of a technical program of peer-reviewed research papers. Supported by NVIDIA, the GPU Technology Conference (GTC) series encompasses all related NVIDIA GPU activities, with strong coverage of developments in CUDA and OpenACC; it has been held annually in San Jose, California since 2009, with a European version beginning in 2016. In addition to GTC, OpenACC Hackathons are held annually, allowing teams of developers to port their applications to accelerated devices using OpenACC under the supervision of industry and academic experts. These have been held in both North America (Oak Ridge National Laboratory (ORNL), the National Centre for Supercomputing Applications (NCSA) and the University of Delaware (UDel)) and Europe (the Swiss National Computing Centre (CSCS) and Forschungszentrum Jülich (Research Centre Jülich)).

6.6 Chapter Summary

This chapter took the performant OpenACC version of the CloverLeaf mini-app and performed a direct comparison with two alternative programming methodologies (OpenCL and CUDA) on a GPU accelerated architecture (a Cray XK6). A quantitative and qualitative comparison was carried out assessing not just raw performance, but portability, ease of maintenance, and the respective supporting environments. In all cases the key to improving the performance of the code on the GPU architecture was to maximise data parallelism within each of the main computational kernels, restructuring the loops to remove data dependencies between iterations. Whilst time consuming, this activity was necessary regardless of the programming model employed and was therefore constant across the three implementations. Feedback from the Cray compiler (CCE) proved to be crucial in understanding the partitioning of the threaded code on the accelerator. Use of the generated listing files and runtime debugging options, to capture which data items were actually being transferred, was also vital. Producing functionally equivalent CUDA and OpenCL versions of CloverLeaf required considerably more programmer effort than the OpenACC version, both needing an order of magnitude more “words of code” than OpenACC to produce compliant implementations.

This revealed OpenACC to be superior to both OpenCL and CUDA, requiring the addition of only 184 OpenACC pragmas (1,510 words of code), compared to the OpenCL and CUDA versions of the code which required an additional 17,930 and 13,085 words of code respectively. However, of the 17,930 words required by OpenCL, 3,958 can be attributed to the static class created to manage OpenCL objects. This static class could be reused in other similar applications with little modification, meaning the total amount of OpenCL code unique to CloverLeaf is 13,972 words. OpenCL and CUDA also required the computational kernels to be written in C-style code, which demanded additional effort for a Fortran-based application, while OpenACC could be applied directly to the original Fortran. In terms of performance, the findings shown in Figure 6.2 show around a 10% improvement for OpenCL over the equivalent OpenACC implementation and around 20% for CUDA. Comparable levels of infrastructure exist for CUDA and OpenACC, both having significant industry (NVIDIA) and national laboratory (ORNL) support, plus the availability of debugging and profiling tools for both of these standards across hardware platforms. OpenCL is more community led and is restricted in its development tool support. Although there is new adoption from NVIDIA for Kepler, and AMD have announced v3.0 of their SDK, other vendors are removing support; Intel have indicated that OpenCL will not be supported on the 2nd generation Xeon Phi, the Knights Landing. The findings indicate that OpenACC is an extremely attractive and viable programming model for accelerator devices going forward. It offers an acceptable level of performance when traded off against a significant gain in programmer productivity compared to both CUDA and OpenCL. It also has a relatively active user community and acceptance from major tool providers, which see value in supporting the standard in their compiler, profiler and debugging offerings. The future plans for OpenACC [157] look positive, with indications of growing tool support, adoption from academia with free compiler access, and plans for features based on community requests. The question of whether OpenACC will be made redundant should its key functionalities become implemented in OpenMP is often mooted. If this were the case then the efforts involved in developing a fully performant OpenACC application would not have been in vain, as the real difficulties, as demonstrated in Chapter 7, lie in understanding the data parallelism inherent in the code. Thus replacing OpenACC directives with OpenMP equivalents would be a relatively easy exercise. However, there are limitations of this chapter's study, in that the outcomes are restricted to the use of one compiler (CCE), one possible construct (“parallel”), and one architecture: an NVIDIA GPU-based Cray XK6.


These are addressed in Chapter 7, which extends this work to include the OpenACC “kernel” construct in addition to its “parallel” construct. It also compares compiler portability via PGI, and extends the hardware architectures that execute the resulting OpenACC code to more than just a GPU system. The latter also allows comparison not just with OpenCL and CUDA, but also with the most performant programming methodology available on each of the hardware platforms.

CHAPTER 7 Extending the Study to a Range of Emerging Architectures

Chapter 5 introduced the concept of mini-applications and demonstrated the development of a performant OpenACC mini-app deployed on a GPU based system. Subsequently, Chapter 6 assessed the alternative programming methodologies on the GPU architecture, namely CUDA and OpenCL, by comparing metrics such as development effort, performance and supporting infrastructure with OpenACC. Chapter 7 extends the analysis to examine the use of the mini-app for exploring a range of emerging architectures, using OpenACC as the baseline with which to compare and contrast the alternative methodologies available. As detailed in Section 3.2.2, OpenACC is a directive-based programming model designed to allow easy access to emerging advanced architecture systems for existing production-class codes based on Fortran, C and C++. It also provides an approach to utilising contemporary technologies without the need to learn complex vendor-specific languages, or to understand the hardware at the deepest level. Portability and performance are the key features of this programming model, which are essential to productivity for real scientific applications. OpenACC support is provided by a number of vendors and is defined by an open standard. However, the standard is relatively new, and the implementations are relatively immature. This chapter experimentally evaluates the currently available compilers by assessing the two approaches to the OpenACC programming model: the parallel and kernels constructs. The implementations of both of these constructs are compared for each vendor, showing performance differences of up to 84%; additionally, performance differences of up to 13% between the best vendor implementations were observed. OpenACC features which appear to cause performance issues in certain compilers are identified and linked to differing default vector length clauses between vendors. These studies are carried out over a range of hardware including NVIDIA GPU, AMD APU, Intel Xeon and Intel Xeon Phi based architectures. Finally, OpenACC performance and productivity are compared against the alternative native programming approaches on each targeted platform, including CUDA, OpenCL, OpenMP 4.0 and Intel Offload, in addition to MPI and OpenMP. In this chapter both a portability and a performance study of OpenACC, using the hydrodynamic mini-application (mini-app) CloverLeaf, are presented.

To study portability, an evaluation of the relative performance across a wide range of supported architectural platforms for each OpenACC-compliant compiler is carried out. For the performance study, each compliant compiler's implementation of the OpenACC parallel and kernels constructs is compared and contrasted, and each vendor's highest performing construct is in turn compared against the highest performing construct from the other vendors. Performance differences are investigated by analysing the application at the kernel level. Performance comparisons are presented and discussed, comparing the OpenACC implementation against alternative programming methodologies for those particular architectures; this is demonstrated on a kernel by kernel basis where appropriate. This work differs from other studies in the open literature in that it provides a comprehensive and objective evaluation of the current implementations of commercially available OpenACC compilers, using both OpenACC's parallel and kernels constructs, with these evaluations carried out on a range of diverse and competing hardware architectures. Additionally, the OpenACC implementations available are compared against the best established native programming alternatives. No other studies have compared all such variations. In addition, performance portability is discussed; that is, whether OpenACC provides a level of abstraction that is essential for enabling existing large code bases to exploit emerging many-core architectures, whilst being sufficiently simple and non-intrusive to be viable in a production-class environment. The remainder of this chapter is organised as follows: Section 7.1 discusses related published work in this field; Section 7.2 gives a brief overview of the CloverLeaf test cases utilised; Section 7.3 gives further details on the various commercially available OpenACC implementations and the differences between the two OpenACC compute constructs, namely kernels and parallel; Section 7.4 details the architectural specifics of the hardware platforms used in the study; the results of the study are then presented in Section 7.5; and finally, Section 7.6 concludes the chapter and outlines future research. This work would not have been possible without the considerable assistance of each of the three compiler vendors: Cray, PGI and CAPS; in particular John Levesque and Alistair Hart of Cray Inc.; Doug Miles, Craig Toepfer and Michael Wolfe of The Portland Group (PGI); and Romain Dolbeau of CAPS Enterprise. This work was previously presented at WACCPD '14, the first Workshop on Accelerator Programming using Directives, held in conjunction with SC14: The International Conference for High Performance Computing, Networking, Storage, and Analysis [102], where it was awarded the best workshop paper.


7.1 Related Work

Chapter 6 described a first comparison between OpenACC, OpenCL and CUDA for the CloverLeaf mini-application in terms of its performance over multiple GPUs. Productivity was also considered by analysing several development metrics for the three programming models. However, the OpenACC comparison was only made under a single vendor implementation, Cray's CCE, and a single hardware architecture, a Cray XK6 with NVIDIA X2090 “Fermi” GPUs. This was extended in [137] to investigate extreme scale across four generations of Cray platforms, showing the utility of hybrid MPI with OpenMP, CUDA, OpenCL and OpenACC under both the PGI and Cray compilers. An increasing number of application code developers are utilising the OpenACC directive approach to allow established industrial codes to take advantage of accelerator-enabled architectures. Levesque et al. demonstrate the approach employed with S3D, a current MPI application, and expose greater levels of parallelism using OpenACC as the vehicle to exploit the GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer [132]. Whilst illustrative of OpenACC's capabilities, they do not provide a comparative performance analysis of alternative OpenACC implementations or of alternative approaches to acceleration. Baker et al. [54] look at a hybrid approach of OpenSHMEM and OpenACC for the BT-MZ benchmark application. The focus of the research is on hybridising the application rather than an assessment of the OpenACC implementation used (a beta version of CAPS 3.3), and results are focused on a single architecture, namely ORNL's Titan. There are a small number of case studies presenting direct comparisons of OpenACC against alternative programming models. Reyes et al. present a direct comparison between hiCUDA [191], PGI's Accelerator model and OpenACC using their own novel implementation of OpenACC, accULL [178]. Again, this focuses on a single type of accelerator, and a single instance of an architecture: the NVIDIA Tesla 2050. A comparison of OpenCL against OpenACC can be found in [195] by Wienke et al. The paper compares OpenACC against PGI Accelerator and OpenCL for two real-world applications, demonstrating that OpenACC can achieve 80% of the performance of a best-effort OpenCL implementation for moderately complex kernels, dropping to around 40% for more complex examples. The study only uses Cray's CCE compiler and OpenACC's parallel construct. Additionally, it is also limited to a single hardware architecture, an NVIDIA Tesla C2050 GPU.


7.2 Test Cases

This study uses the same test case described in Chapter 5, Section 5.1.2, and utilises the large 3840²-cell problem introduced in Chapter 5, Section 5.3.14, and the smaller 960²-cell problem introduced in Chapter 5, Section 5.3.5. The test cases compute 87 and 2,955 iterations respectively which, for their respective sizes, gives sufficient compute time to ensure reliable timing measurements. Although CloverLeaf is capable of multiple accelerated-node runs, this study is focused on a single accelerator's performance.

7.3 Implementations

The CUDA and OpenCL implementations of the code, as described in Sections 6.3.3 and 6.3.2, were used in this study. OpenACC provides two constructs, parallel and kernels, to launch, or execute, accelerated regions. The main differences come down to how they map the parallelism in the region to be accelerated onto the underlying hardware [197]. In the case of the parallel construct this is explicit, requiring the programmer to additionally highlight loops within the region, while in the case of the kernels construct parallelisation is carried out implicitly. In the case of single loops, as in CloverLeaf's kernels, the two constructs are virtually interchangeable, although the compiler is able to automatically generate vector code for the parallel construct variant. With the simple re-structuring required for the PGI and CAPS implementations, this produced a single source code capable of compilation with both the parallel and kernels construct implementations from all three compiler vendors. This enabled the additional comparison of how the two alternative constructs are implemented by the different compiler vendors. Additionally, a version of the code using Intel's Language Extensions for Offload (LEO) [152], [114], developed by Victor Gamayunov and Stephen Blair-Chappell of Intel [91], and an OpenMP variant (also by Gamayunov) using the accelerator constructs available in version 4.0 of the OpenMP standard [115], enabled a further comparison on those architectures which supported this standard.
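As an illustration of how close the two constructs are for CloverLeaf-style single loop nests, a minimal sketch with placeholder names; the arrays are assumed to be already resident on the device (as they are in CloverLeaf), and the ideal-gas style loop body is a stand-in:

! Hedged sketch: the same loop nest expressed with each OpenACC construct.
SUBROUTINE eos_parallel(nx, ny, density, energy, pressure)
  INTEGER, INTENT(IN)       :: nx, ny
  REAL(KIND=8), INTENT(IN)  :: density(nx,ny), energy(nx,ny)
  REAL(KIND=8), INTENT(OUT) :: pressure(nx,ny)
  INTEGER :: j, k
  ! "parallel": the programmer asserts the region is parallel and marks
  ! the loops (and, here, the vector level) explicitly.
!$acc parallel loop present(density, energy, pressure)
  DO k = 1, ny
!$acc loop vector
    DO j = 1, nx
      pressure(j,k) = 0.4_8 * density(j,k) * energy(j,k)   ! gamma = 1.4
    ENDDO
  ENDDO
END SUBROUTINE eos_parallel

SUBROUTINE eos_kernels(nx, ny, density, energy, pressure)
  INTEGER, INTENT(IN)       :: nx, ny
  REAL(KIND=8), INTENT(IN)  :: density(nx,ny), energy(nx,ny)
  REAL(KIND=8), INTENT(OUT) :: pressure(nx,ny)
  INTEGER :: j, k
  ! "kernels": the compiler is left to discover and schedule the
  ! parallelism within the enclosed region itself.
!$acc kernels present(density, energy, pressure)
  DO k = 1, ny
    DO j = 1, nx
      pressure(j,k) = 0.4_8 * density(j,k) * energy(j,k)   ! gamma = 1.4
    ENDDO
  ENDDO
!$acc end kernels
END SUBROUTINE eos_kernels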

7.4 Targeted Architectures

Five test architectures were deployed in this study: Chilean Pine, a Cray XK6 with NVIDIA “Fermi” X2090 GPUs; Swan, a Cray XK7 with NVIDIA Kepler K20X GPUs; Shannon, a cluster with both NVIDIA Kepler K20X and K40 GPUs; Teller, an AMD APU cluster; and PillowB, an Intel-based cluster with 1st generation MIC Intel Xeon Phi (KNC) coprocessors. Full architectural details can be found in Appendix A, but for convenience the targeted architectures and their system software stacks are summarised in Table 7.1.



7.5 Results

7.5.1 Experimental Studies

With the exception of the Teller AMD APU cluster (where memory constraints prevented execution of the larger test case), results are presented for each platform for both the 960²-cell and 3840²-cell test cases. For each OpenACC implementation available on the targeted platforms, results are presented for both the OpenACC parallel and kernels constructs. This allows a comparison of not only each vendor's best OpenACC implementation, but also any differences between a vendor's best and worst implementation; additionally, it highlights those implementations where collaborative development efforts with vendors should be focused, or which implementations are to be avoided. For each alternative programming model available on the target architecture, comparable performance figures are presented, giving an OpenACC performance baseline against which to compare alternative approaches. Additionally, the comparative performance of the hand-coded CUDA and OpenCL implementations is presented on the XK6, XK7, K20x and K40, and the AMD APU. Finally, performance at the compute kernel level is examined. The aim here is to identify whether particular kernels, rather than the application as a whole, exhibit good or poor performance, and to determine whether certain OpenACC features have performance issues under a particular vendor's implementation. Subsequent analysis of these kernels will likely identify particular OpenACC features that require vendor attention within their respective implementations. For all cases in this chapter, the hardware was run in dedicated mode; that is, the author had exclusive access to the hardware and its resources. This is reflected in the resultant execution times showing insignificant differences across the multiple runs measured (in the region of 10 to 100, depending on test case and hardware device).

7.5.2 Analysis

Figures 7.1 through 7.7 show the comparative performance of each OpenACC implementation, the alternative programming models available and the hand-coded CUDA and OpenCL implementations on the XK6, XK7, K20x & K40, and the AMD APU respectively; these results are summarised in Table 7.2.

Chilean Pine (XK6, X2090): OpenACC: CCE 8.1.7, PGI 13.7, CAPS 3.3.2; CUDA: 5.0; OpenCL: AMD APP SDK v2.7; OpenMP 4.0: N/A; Offload: N/A; Hybrid (MPI/OMP): N/A
Swan (XK7, K20x): OpenACC: CCE 8.3.0, PGI 13.10; CUDA: 5.5; OpenCL: CUDA 5.5; OpenMP 4.0: N/A; Offload: N/A; Hybrid (MPI/OMP): N/A
Shannon (K20x/K40): OpenACC: PGI 13.9.0, CAPS 3.3.4; CUDA: 6.0; OpenCL: CUDA 5.0.0; OpenMP 4.0: N/A; Offload: N/A; Hybrid (MPI/OMP): N/A
Teller (A10/7660D): OpenACC: CAPS 3.3.3; CUDA: N/A; OpenCL: AMD APP SDK 2.8.0; OpenMP 4.0: N/A; Offload: N/A; Hybrid (MPI/OMP): N/A
Pillow B (E5-2450): OpenACC: CAPS 3.3.2; CUDA: N/A; OpenCL: Intel OpenCL SDK 1.2.3.0; OpenMP 4.0: N/A; Offload: N/A; Hybrid (MPI/OMP): Intel MPI 4.1.1
Pillow B (5110P): OpenACC: CAPS 3.3.2; CUDA: N/A; OpenCL: Intel OpenCL SDK 1.2.3.0; OpenMP 4.0: Intel 13.1.3; Offload: Intel 13.1.3; Hybrid (MPI/OMP): Intel MPI 4.1.1

Table 7.1: System Software Stack


Figure 7.1: 960² cells, 2,955 Timesteps: XK6 Runtimes (s) (total runtime for the CCE and PGI “parallel”/“kernels” OpenACC variants, the CAPS “parallel”/“kernels” variants with CUDA and OpenCL backends, and the hand-coded CUDA and OpenCL versions)

With the exception of the XK6, which uses older compiler revisions from each vendor, comparing each vendor's best OpenACC implementation (be it the parallel or kernels construct, with either the CUDA or OpenCL backend) shows performance differences within 13%. However, choosing the least optimal OpenACC implementation can result in significant differences, of up to 84% when using the parallel construct via the CAPS compiler on a K40. The raw CUDA and OpenCL versions do outperform any OpenACC implementation: by 15% to 20% for CUDA and by 10% to 20% for OpenCL. The exception to this is raw OpenCL on the AMD APU, where differences are only around 1%; this is attributed to the raw OpenCL not being optimised for either the APU's A10 CPU or its HD-7600D GPU. When considered in the overall context of productivity, portability and maintainability, including the number of words of code (WOC) metric devised in Chapter 6 and whether computational kernels needed to be rewritten, 10% to 20% lower performance from an OpenACC implementation is an acceptable trade-off. Figures 7.8 and 7.9 depict the same comparisons for the Intel Xeon E5-2450 and the Intel Xeon Phi 5110P, for the 960²-cell and 3840²-cell test cases respectively. For both platforms a traditional MPI/OpenMP hybrid version of the code (running natively on the card in the case of the Intel Xeon Phi 5110P) is used as a baseline. Additionally, in the case of the Intel Xeon Phi 5110P, the offload models of OpenMP 4.0 and Intel's LEO are also presented.


Figure 7.2: 3840² cells, 87 Timesteps: XK6 Runtimes (s) (total runtime for the CCE and PGI “parallel”/“kernels” OpenACC variants, the CAPS “parallel”/“kernels” variants with CUDA and OpenCL backends, and the hand-coded CUDA and OpenCL versions)

Figure 7.3: 960² cells, 2,955 Timesteps: XK7 Runtimes (s) (total runtime for the CCE and PGI “parallel”/“kernels” OpenACC variants, and the hand-coded CUDA and OpenCL versions)


Figure 7.4: 3,840² cells, 87 Timesteps: XK7 Runtimes (s) (same set of implementations as Figure 7.3).

Figure 7.5: 960² cells, 2,955 Timesteps: K20x & K40 Runtimes (s) (total runtime for CUDA, OpenCL, and the CAPS and PGI OpenACC kernels and parallel variants).


Figure 7.6: 3,840² cells, 87 Timesteps: K20x & K40 Runtimes (s) (same set of implementations as Figure 7.5).

Figure 7.7: 960² cells, 2,955 Timesteps: A10 & HD-7600D Runtimes (s) (total runtime for OpenCL and the CAPS OpenCL kernels and parallel variants).

Table 7.2: OpenACC Performance Comparisons (for each platform and problem size: the best and slowest OpenACC implementations, the percentage difference between them, the best alternative, and the percentage differences against the hand-coded CUDA and OpenCL implementations).


Figure 7.8: 960² cells, 2,955 Timesteps: Intel Xeon E5-2450 & Intel Xeon Phi 5110P Runtimes (s) (total runtime for the OpenMP 4.0, LEO offload, hybrid MPI/OpenMP, OpenCL and CAPS OpenACC versions).

Additionally, in the case of the Intel Xeon Phi 5110P, offload models using OpenMP 4.0 and Intel's LEO are also presented. Although the study focuses on OpenACC directives applied to Fortran source code, the perceived poor performance of the Fortran-based hybrid MPI/OpenMP version on the Intel Xeon Phi 5110P led to the development of a comparative C-based implementation. This showed significant differences between C and Fortran, which have been attributed to the Intel compiler not vectorising the Fortran source code when compiling for the Intel Xeon Phi 5110P, despite accomplishing this when compiling for the Intel Xeon E5-2450. This is also true of Intel's LEO model when using Fortran; hence, in the summary of the results, the C implementations on the Intel Xeon Phi 5110P are taken as reference points rather than the original Fortran implementations.¹

Table 7.3 shows an overview of the results. The only OpenACC implementation available on the Intel chipsets is via CAPS using its OpenCL backend. Hence the only OpenACC-to-OpenACC comparison that can be made is between the parallel and kernels constructs. This shows a significant performance difference for both test cases, on both architectures. On the Intel Xeon E5-2450 the CAPS OpenCL kernels version is 23% more performant for the 960² cell problem, rising to 30% for the 3,840² cell case.

¹ This issue has been reported as resolved in Intel 15.0.


Figure 7.9: 3,840² cells, 87 Timesteps: Intel Xeon E5-2450 & Intel Xeon Phi 5110P Runtimes (s) (same set of implementations as Figure 7.8).

On the Intel Xeon Phi 5110P the performance differences are 56% and 94% for the 960² cell and 3,840² cell problems respectively. As previously observed on other platforms, the performance of the CAPS parallel construct is significantly worse; as this is the only available comparison, these differences are to be expected.

Comparing the optimal OpenACC implementation against the raw OpenCL code on the Intel chipsets shows a very different pattern between the Intel Xeon E5-2450 and the Intel Xeon Phi 5110P. On the Intel Xeon E5-2450 the raw OpenCL gives a gain of 26% to 34%, depending on problem size, while on the Intel Xeon Phi 5110P the OpenCL gives a performance degradation of over 60%. In the case of the Intel Xeon Phi 5110P this is attributable to the performance portability of the hand-coded OpenCL, in that it has not been optimised for the Xeon Phi. However, given that the OpenACC is a single source and no code modifications have taken place there either, this points to CAPS producing an efficient Intel Xeon Phi OpenCL backend.

All offload models perform poorly when compared against a hybrid MPI/OpenMP implementation. On the Intel Xeon E5-2450 the hybrid version is between 52% and 57% more performant, depending on problem size, while the gains on the Intel Xeon Phi 5110P are 52% to 72% when using the C implementation as a reference point.

Table 7.3: Xeon and Xeon Phi OpenACC Performance Comparisons (for each platform and problem size: the best and slowest OpenACC implementations and the percentage difference between them, the best alternative offload model, and the percentage differences against the hand-coded OpenCL and the hybrid MPI/OpenMP implementations).

All of the above analysis considers the code as a whole; that is, the overall execution time with each of its twelve kernels accelerated. Further insight is gained if the performance of each kernel is analysed in isolation. What is observed is that the relative performance of each kernel can vary substantially depending on construct and vendor implementation.

The methods available to obtain a kernel-by-kernel breakdown are programming paradigm dependent. As described in Section 6.5 of Chapter 6, for the OpenACC implementations GPROF was used to measure the kernel times for the PGI parallel and kernels constructs, Cray's Perftool utility was used to extract the same information from the CCE compiled kernels, while the CUDA_PROFILE environment variable provided the information when executing the CAPS compiler builds. For CUDA, nvprof, which is provided as part of the CUDA Toolkit, was used to record the kernel breakdowns. A kernel-by-kernel breakdown was not possible for the OpenCL builds; although AMD's CodeXL has the functionality to achieve this, it is only available on AMD hardware and hence not compatible with the NVIDIA GPUs. Direct hand profiling was implemented for the OpenCL kernels; however, the timing routines adversely affected the performance and hence rendered them useless.

Figures 7.12 and 7.13 show the percentage difference of the cumulative runtimes of each of the twelve CloverLeaf kernels: Timestep, IdealGas, Viscosity, PdV, Revert, Accelerate, Fluxes, CellAdvection, MomAdvection, Reset, Halo, and Summary. These are recorded on the 960² cell problem running on one of Chilean Pine's X2090 GPUs and are normalised against the time of the Cray CCE parallel construct for that particular kernel. Hence, a positive percentage value indicates a shorter time spent in the kernel, and a negative value a longer execution time, for that particular compiler/construct combination. Data is also available for the larger 3,840² cell problem, presented in Figures 7.14 and 7.15, which show similar behaviour.

This closer analysis backs up some of the previous observations. In particular, CAPS' implementation of the parallel construct underperforms on all twelve kernels irrespective of whether the backend is CUDA or OpenCL, and Cray's parallel implementation almost always equals or outperforms its kernels construct for all CloverLeaf kernels. However, it also reveals that the CAPS kernels implementation, which is the best performing OpenACC on all of the non-Cray architectures, makes these gains because it performs well on the most computationally intense kernels, MomAdvection, PdV, CellAdvection and Accelerate, which account for 60% of overall execution time, while underperforming on those with a lesser contribution.

PGI kernels outperforms all implementations in seven of the twelve kernels on the 960² test case, and six on the 3,840² cell problem.


However, these kernels (Timestep, IdealGas, Revert, Fluxes, CellAdvection, Reset, Halo and Summary) account for less than a quarter of execution time. There is a subset of kernels where PGI performs badly: Viscosity, PdV, Accelerate and MomAdvection, which account for over 85% of overall execution time, hence explaining PGI's underperformance when the code is considered as a whole. Indeed, on some of these kernels the relative performance is surprisingly low, in particular the Viscosity kernel, where PGI's kernels construct shows more than a 180% decrease compared to CCE's parallel implementation. Such a disparity warrants further investigation.

Both PGI and CCE provide a means of reporting what the compiler has generated for the OpenACC accelerated regions of code. Use of the -Minfo=accel compile time flag with PGI, and the -ra flag with CCE, sends information to stderr and to a source listing file respectively. Figures 7.10 and 7.11 present output snippets from the PGI and CCE compilers respectively for CloverLeaf's Viscosity kernel.

Figure 7.10: PGI OpenACC Source Listing

Figure 7.11: Cray OpenACC Source Listing

The output indicates a difference in the maximum vector length over which the loop iterations are executed. The vector clause is an optional clause allowable on the OpenACC kernels construct; however, in the case of the CloverLeaf kernels this clause is absent, hence the values reported by the compilers are default, assumed values. In the case of the Viscosity kernel, the compiler information indicated a difference in the assignment of vector length. Further investigation across all twelve kernels gives a similar account for the six kernels on which the PGI compiler performs disproportionately poorly. Explicitly adding the vector clause to the kernels construct in these cases brings the compiler outputs into line, although this change only delivers an improvement in kernel execution time.
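To make this concrete, the sketch below shows how an explicit vector clause might be added to a loop inside a kernels region. It is illustrative only: the routine, array names and the vector length of 128 are hypothetical choices and are not taken from the CloverLeaf source or from this study.

  SUBROUTINE eos_sketch(x_min, x_max, y_min, y_max, d, e, p)
    IMPLICIT NONE
    INTEGER :: x_min, x_max, y_min, y_max, j, k
    REAL(KIND=8), DIMENSION(x_min:x_max, y_min:y_max) :: d, e, p

  !$ACC KERNELS
  !$ACC LOOP INDEPENDENT
    DO k = y_min, y_max
  !$ACC LOOP INDEPENDENT VECTOR(128)  ! explicit vector length, overriding the compiler default
      DO j = x_min, x_max
        p(j,k) = (1.4_8 - 1.0_8) * d(j,k) * e(j,k)
      ENDDO
    ENDDO
  !$ACC END KERNELS
  END SUBROUTINE eos_sketch

With the clause present, both compilers report a matching vector length for the loop, as described above.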

7.6 Chapter Summary

This chapter set out to evaluate the performance and portability of OpenACC. This was achieved by comparing OpenACC's parallel and kernels constructs in each vendor's implementation, which enabled performance comparisons between vendors as well as a comparison of OpenACC against alternative programming methodologies, whilst balancing performance against programmer productivity.

By evaluating both the parallel and kernels constructs, the results indicate that each vendor has primarily focused its efforts on one of the two constructs. This is most apparent in CAPS' implementation of the parallel construct, with an 84% difference in runtime over the kernels implementation. When comparing each vendor's quickest construct, differences in execution time are within 13% when the latest compiler versions are invoked. This indicates that, as long as the application developer is aware of a vendor's relative performance on the parallel and kernels constructs, the choice of vendor is not particularly crucial.

Each vendor does however have its relative merits. CAPS' implementation of the parallel construct is deficient, its focus having concentrated on the kernels construct; however, the CAPS compiler is the most widely supported on the diverse range of architectures tested, and in some cases is the only available option to compile and execute OpenACC applications. CCE is the best performing compiler, in all but one case, on all architectures where it is available; however, it is only available on Cray systems, limiting its accessibility. Looking at overall runtimes, PGI trails the other vendors' compilers. However, when broken down into the individual kernels, PGI's kernels construct outperforms the rest on more occasions than any other implementation. This is counterbalanced by its significantly poor performance on one particular CloverLeaf kernel (Viscosity), which is 175% and 225% slower than the best implementation for the 960² cell and 3,840² cell problems respectively, severely impacting the code when it is considered as a whole.

At the time of writing, the "deep copy" required to support Fortran 90 derived types was not supported in either the PGI or CAPS compilers. Hence some minor code reconstruction was required to remove derived types and explicitly pass data through the relevant Fortran routines. This also required relocating the directives to locations higher up in the call tree to reflect this reconstruction. Also, under the PGI compiler, the compile time for the parallel construct variant is significantly longer than that for the kernels construct.

The different basis of each vendor's implementation manifested itself most notably in the varying levels of support for OpenACC's two compute constructs, kernels and parallel. Cray targeted the parallel construct with a direct mapping from its OpenMP parallel extension, while PGI and CAPS targeted the kernels construct with a direct mapping from the PGI Accelerator region construct. It is worthy of note that the two constructs are not mutually exclusive and can be mixed and matched depending on individual accelerator region performance, so the optimal construct could be used depending on the compiler and architecture as part of an optimisation strategy. A caveat exists, however, that automatic compiler analysis and compilation of the parallel construct is only applicable for non-nested loops.

An artefact of this study highlighted issues with the Intel Fortran compiler on the Intel Xeon Phi, which required the use of a C-based implementation to give a more realistic comparison. Also, all Xeon Phi offload models perform poorly when compared to hybrid MPI/OpenMP. When compared to alternative offload models on the Intel Xeon Phi and Intel Xeon architectures, OpenACC outperforms both Intel's LEO model and their current OpenMP 4.0 implementation in all but the large 3,840² cell test case when using the former; although, in the case of these two architectures, a hybrid MPI/OpenMP implementation outperforms all offload based programming models. On the three NVIDIA GPU based architectures a native CUDA implementation outperforms the best OpenACC by 15% to 20%.

OpenCL is the only alternative programming model to OpenACC that spans all the target architectures in this study. Comparing performance, OpenCL outperforms OpenACC by 10% to 20%, with the exception of the AMD APU, where performance is on par, and the Intel Xeon Phi, where it delivers a significant under-performance of over 60% against OpenACC. Empirically, these performance differences are more than acceptable when offset against programmer productivity measured in the number of words of code. While performance is important, for a new programming standard like OpenACC the convergence of the standard on a range of compilers is the most important factor for portability.
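As an illustration of the two compute constructs compared throughout this chapter, the sketch below expresses the same simple, hypothetical loop nest in both forms; the array names and loop bounds are illustrative and not taken from the CloverLeaf source. With kernels the compiler analyses the region and decides how to map the loops to the device, whereas with parallel the programmer asserts independence and requests the worksharing explicitly.

  SUBROUTINE update_sketch(x_min, x_max, y_min, y_max, dt, e0, rate, e1)
    IMPLICIT NONE
    INTEGER :: x_min, x_max, y_min, y_max, j, k
    REAL(KIND=8) :: dt
    REAL(KIND=8), DIMENSION(x_min:x_max, y_min:y_max) :: e0, rate, e1

    ! kernels form: the compiler is free to choose the loop schedule
  !$ACC KERNELS
    DO k = y_min, y_max
      DO j = x_min, x_max
        e1(j,k) = e0(j,k) + dt * rate(j,k)
      ENDDO
    ENDDO
  !$ACC END KERNELS

    ! parallel form: the programmer specifies the gang/vector mapping
  !$ACC PARALLEL
  !$ACC LOOP GANG
    DO k = y_min, y_max
  !$ACC LOOP VECTOR
      DO j = x_min, x_max
        e1(j,k) = e0(j,k) + dt * rate(j,k)
      ENDDO
    ENDDO
  !$ACC END PARALLEL
  END SUBROUTINE update_sketch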


CloverLeaf now has a portable single source for both the parallel and the kernels construct versions, which works without modification on all compilers and across a diverse range of architectures. Indeed, PGI have adopted the kernels OpenACC version of CloverLeaf as part of their compiler regression testing suite, and it has been accepted as one of the OpenACC applications [43] that make up the SPEC ACCEL™ benchmark [44]. This is a major step forward and would indicate that OpenACC provides a good level of abstraction.

CAPS Enterprise announced that, as of June 27, 2014, they were to cease trading. This loss of one of the three vendors, and hence of future CAPS OpenACC compiler support, reduces the options for OpenACC developers. However, OpenACC v1.0 support has been implemented in a branch of GCC Fortran [168], which, as an open source compiler, bodes well for the standard.

Irrespective of OpenACC's future, the development effort expended in producing a portable OpenACC implementation of an application is easily translated into alternative pragma based offload models, such as OpenMP 4.0's accelerator offload support constructs, should this become the de facto standard. OpenACC has matured significantly in both its portability and performance. The ability to use a higher level language like C, C++ or Fortran on CPUs, attached co-processors, GPUs and APUs is a major step forward in future proofing production-class scientific applications.

Figure 7.12: 960² cells, 2,955 Timesteps: relative performance of the Timestep, IdealGas, Viscosity, PdV, Revert and Accelerate kernels, normalised to the CCE "parallel" construct (percentage difference for CUDA and each compiler/construct combination).

Figure 7.13: 960² cells, 2,955 Timesteps: relative performance of the Fluxes, CellAdvection, MomAdvection, Reset, Halo and Summary kernels, normalised to the CCE "parallel" construct (percentage difference for CUDA and each compiler/construct combination).

Figure 7.14: 3,840² cells, 87 Timesteps: relative performance of the Timestep, IdealGas, Viscosity, PdV, Revert and Accelerate kernels, normalised to the CCE "parallel" construct (percentage difference for CUDA and each compiler/construct combination).

Figure 7.15: 3,840² cells, 87 Timesteps: relative performance of the Fluxes, CellAdvection, MomAdvection, Reset, Halo and Summary kernels, normalised to the CCE "parallel" construct (percentage difference for CUDA and each compiler/construct combination).

Part III

Theory into Practice

CHAPTER 8 Conclusions and Further Developments

This chapter concludes the thesis with the lessons learnt through this research and how these have been applied in practice within an industrial setting, both within the author's own institution and through similar independent approaches elsewhere across the HPC industry and academia. It shows the independent adoption of mini-apps as the tool of choice, not just for exploring emerging HPC architectures and programming languages, but also in how they are beginning to play their part in future architectural design. The numerous collaborative research activities that have been spawned as a result of this research are also highlighted. An overview of some of the architectures which are at present becoming available to the market is provided, with an emphasis on the changes relative to those utilised throughout this study, together with a hypothesis on how potential longer term systems may look. Finally, in conclusion, a recap of the main contributions which this research has delivered is presented.

8.1 Lessons Learnt

Given the constraints of limited access to suitably large disruptive systems, and their inflexibility for experimenting with applications due to commercial or proliferation reasons, an alternative approach is needed to assess emerging HPC architectures.

Chapter 4 investigated the hypothesis that an existing procurement benchmark application could be used to explore emerging architectures and the range of associated programming methodologies. It demonstrated that the time required for substantial code restructuring to explore one programming methodology, on one potential emerging architecture, was not efficient or cost effective, and hence a new approach was required.

Chapter 5 introduced the concept of the mini-app: a lightweight, but representative, application written with a particular focus on algorithmic solutions. It demonstrated, through a step-by-step approach, how a directive-based version of the mini-app, able to execute on a GPU accelerated system, could be developed.

Chapter 6 examined how performant the resultant directive-based mini-app was compared to the alternative approaches on the GPU accelerator, and

holistically considered the benefits when taking into account relative development time and the supporting infrastructures of the alternative methods.

Chapter 7 then expanded the mini-app with an additional OpenACC directive construct and compared both OpenACC construct implementations across a range of emerging architectures. This demonstrated acceptable levels of trade-off between performance and the portability of a single source code capable of execution on multiple hardware environments, indicating that a directive-based solution is amenable for a hydrodynamic computational model.

It also demonstrated the usefulness of the mini-app based approach in enabling rapid prototyping and assessment of an algorithm's ability to map onto emerging hardware. As part of this process, a number of key lessons have been learnt that are not limited to hydrodynamic schemes, but generalise across a large number of scientific domains. These key lessons are listed below.

8.1.1 Kernelisation

Those areas of the application performing computation need to be appropriately broken down and isolated. They need to be placed at the lowest level and dedicated solely to compute. Ensuring that they do not call subroutines or perform additional functions, minimising pointers, and excluding the use of derived types and Fortran array syntax generally makes everything as explicit as possible for the compiler. This also structures the application for any architecture requiring the launch of independent chunks of work, such as a GPU.
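The following is a minimal sketch of such a kernel, assuming a simple ideal-gas style update; the routine and variable names are illustrative and are not taken from the CloverLeaf source. The routine operates only on plain arrays passed as arguments and contains no subroutine calls, derived types or array syntax, making it straightforward for a compiler to analyse and offload.

  SUBROUTINE eos_kernel(x_min, x_max, y_min, y_max, density, energy, pressure)
    IMPLICIT NONE
    INTEGER :: x_min, x_max, y_min, y_max, j, k
    REAL(KIND=8), DIMENSION(x_min:x_max, y_min:y_max) :: density, energy, pressure

    ! Lowest-level compute only: explicit loops over plain arrays.
    DO k = y_min, y_max
      DO j = x_min, x_max
        pressure(j,k) = (1.4_8 - 1.0_8) * density(j,k) * energy(j,k)
      ENDDO
    ENDDO
  END SUBROUTINE eos_kernel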

8.1.2 Data Parallelism and Thread Safety

Achieving good data parallelism is helped by striving for stride-one data accesses, or at least data accesses in a regular pattern which is recognisable by the compiler, enabling optimised gather/scatter operations to be deployed. Even moderately computationally intense kernels need good data parallelism and threading when executed on accelerated devices, otherwise they become the bottlenecks for performance. Good data parallel code also needs to be thread safe; that is, the actions or order of execution of one thread should not impact those of another. Data parallel regions of kernels which can show different results when executed in threaded mode will need to be restructured, or re-factored, in order to make them thread safe. Care is also needed when implicit synchronisation activities occur in data parallel models: these can have little impact, and hence go undetected, on familiar hardware such as a CPU; however, when executed on highly parallel devices, they soon become bottlenecks.
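The sketch below illustrates both points for a simple, hypothetical five-point smoothing loop; none of the names are from CloverLeaf. The inner loop runs over the first array index, which is contiguous in Fortran, giving stride-one accesses, and each thread writes a disjoint set of rows of a separate output array, so the loop is thread safe.

  SUBROUTINE smooth(x_min, x_max, y_min, y_max, field, field_new)
    IMPLICIT NONE
    INTEGER :: x_min, x_max, y_min, y_max, j, k
    REAL(KIND=8), DIMENSION(x_min-1:x_max+1, y_min-1:y_max+1) :: field
    REAL(KIND=8), DIMENSION(x_min:x_max, y_min:y_max) :: field_new

    ! Threads share the k loop; reads come from field, writes go to field_new,
    ! so no two threads ever touch the same element.
  !$OMP PARALLEL DO PRIVATE(j)
    DO k = y_min, y_max
      DO j = x_min, x_max   ! stride-one: j is the contiguous index
        field_new(j,k) = 0.25_8 * (field(j-1,k) + field(j+1,k) &
                                 + field(j,k-1) + field(j,k+1))
      ENDDO
    ENDDO
  !$OMP END PARALLEL DO
  END SUBROUTINE smooth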


8.1.3 Data Residency

Achieving optimal computation on an attached device is one concern; however, for computation to take place it needs data on which to operate. The mechanisms for transferring data to and from an attached device are significantly slower than passing data within a device, and slower still relative to the computation itself. Hence, the continued passing of data to and from any hosted device becomes a significant bottleneck, quickly swamping any computational gains that are achieved. Ensuring that as much as possible of the data necessary for the compute always resides on the computing device enables the data transfers to be minimised. An ideal situation would be for all of the data to be fully resident on a device for the duration of all the computation; however, this would only occur if a single device were being utilised. In practice an application needs to span multiple devices, in which case limiting the transfers to the neighbouring devices' boundary data is the optimal minimum. Although this is framed in the context of minimising data transfers to an attached device, it is equally applicable to the changes in the memory hierarchy appearing on emerging many-core self-hosted devices. That is, to achieve optimal performance, data needs to reside in high bandwidth memory, which is analogous to keeping data resident on an attached processor.
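A minimal sketch of this pattern is shown below, assuming a single OpenACC data region wrapped around a hypothetical timestep loop; the names and the trivial kernel are illustrative only. The arrays are copied to the device once, remain resident across every timestep, and only the result array is copied back at the end.

  SUBROUTINE hydro_cycle(nsteps, n, density, energy, pressure)
    IMPLICIT NONE
    INTEGER :: nsteps, n, step, j
    REAL(KIND=8), DIMENSION(n) :: density, energy, pressure

    ! One data region around the whole timestep loop: no host<->device
    ! traffic occurs inside the loop itself.
  !$ACC DATA COPYIN(density, energy) COPYOUT(pressure)
    DO step = 1, nsteps
  !$ACC PARALLEL LOOP PRESENT(density, energy, pressure)
      DO j = 1, n
        pressure(j) = 0.4_8 * density(j) * energy(j)
      ENDDO
      ! ... further accelerated kernels would follow here, all operating
      ! on the resident device copies ...
    ENDDO
  !$ACC END DATA
  END SUBROUTINE hydro_cycle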

8.1.4 Halo Data Transfer and Local Buffer Packing

In the case of halo exchange, a common approach in distributed algorithms, minimising the data which needs to be transferred between devices is crucial. Whilst this is good practice for a distributed algorithm which executes on an MPP (Massively Parallel Processing) system, even small excesses can have significant impacts on performance on systems utilising attached accelerator devices. Small excesses in an MPP distributed mode may still keep MPI messages within the same MPI protocol boundary, while, due to the overhead of the data transfers, the same excesses soon become the bottleneck on hosted devices. Additionally, the process of packing the MPI halo buffers ready for exchange should be carried out locally on the device, rather than transferring the data back to the host to perform the packing together with the exchange itself.
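The sketch below illustrates on-device packing of a left-edge halo buffer, assuming hypothetical array layouts and a halo of width depth; it is not taken from CloverLeaf's communication routines. The packing loop runs on the device, and only the packed buffer, rather than the whole field, is updated on the host for the subsequent MPI exchange.

  SUBROUTINE pack_left_halo(x_min, x_max, y_min, y_max, depth, field, buffer)
    IMPLICIT NONE
    INTEGER :: x_min, x_max, y_min, y_max, depth, j, k
    REAL(KIND=8), DIMENSION(x_min-depth:x_max+depth, y_min-depth:y_max+depth) :: field
    REAL(KIND=8), DIMENSION(depth*(y_max-y_min+1)) :: buffer

    ! Pack the left-edge cells into a contiguous buffer on the device ...
  !$ACC PARALLEL LOOP COLLAPSE(2) PRESENT(field, buffer)
    DO k = y_min, y_max
      DO j = 1, depth
        buffer(j + (k - y_min)*depth) = field(x_min + j - 1, k)
      ENDDO
    ENDDO

    ! ... and transfer only that buffer to the host for the MPI exchange.
  !$ACC UPDATE HOST(buffer)
    ! CALL MPI_Send(buffer, ...) would follow here.
  END SUBROUTINE pack_left_halo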

8.1.5 Limited Device Memory Capacity

Although memory capacity is increasing, the high bandwidth memory necessary for performance is in limited supply on accelerated devices. This requires careful management by the developer to ensure it is utilised effectively. If a particular simulation demands more memory than exists, then the application needs to be able to scale over a greater number of devices, thereby reducing the memory requirement per device.

8.1.6 Memory Allocation Overheads

The cycle of memory allocation and deallocation, familiar in many applications, was identified as the cause of synchronisation overheads that were detrimental to performance. These were effectively serialised memory allocation operating system calls, in which one thread was able to allocate memory whilst the remaining threads had to wait until the original thread had completed its operation. Although the solution devised in the CloverLeaf mini-app employed the creation of temporary arrays that were subsequently re-used for multiple purposes, this may not be applicable in all situations. Such allocation/deallocation cycles should be a point of investigation in any application aiming to achieve performance on an accelerated device.
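The sketch below shows the general allocate-once, re-use pattern in the form of a small work-space module; the module and array names are hypothetical and the real CloverLeaf implementation differs in its details. The temporaries are allocated a single time at start-up and then re-used by the compute kernels, avoiding repeated allocate/deallocate calls inside threaded regions.

  MODULE work_space
    IMPLICIT NONE
    REAL(KIND=8), ALLOCATABLE, DIMENSION(:,:) :: work1, work2
  CONTAINS
    SUBROUTINE init_work_space(nx, ny)
      INTEGER :: nx, ny
      ! Allocate the temporaries once, up front, rather than inside every
      ! kernel call where repeated allocation serialises the threads.
      ALLOCATE(work1(nx, ny), work2(nx, ny))
    END SUBROUTINE init_work_space

    SUBROUTINE free_work_space()
      DEALLOCATE(work1, work2)
    END SUBROUTINE free_work_space
  END MODULE work_space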

8.1.7 Use of Global Variables

The presence of globally addressable variables in computational kernels implies an atomic update, as the compiler is forced to protect these data values from concurrent accesses by different threads. A solution to this problem is to create separate instances which are private to each thread, and to combine them with a separate reduction operation on those variables.
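A minimal sketch of this approach using an OpenMP reduction is shown below; the routine and variable names are illustrative rather than taken from CloverLeaf. Each thread accumulates into its own private copy of the result, and the copies are combined once at the end, avoiding an atomic update on a shared variable.

  SUBROUTINE total_mass(n, density, volume, mass)
    IMPLICIT NONE
    INTEGER :: n, j
    REAL(KIND=8), DIMENSION(n) :: density, volume
    REAL(KIND=8) :: mass

    mass = 0.0_8
    ! Private per-thread partial sums are combined by the reduction clause.
  !$OMP PARALLEL DO REDUCTION(+:mass)
    DO j = 1, n
      mass = mass + density(j) * volume(j)
    ENDDO
  !$OMP END PARALLEL DO
  END SUBROUTINE total_mass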

8.2 Putting these Lessons into Practice

Although these lessons were discovered through the examination of a hydrodynamics application, they are equally valid for any algorithmic domain. Whilst this has been extremely useful as a research investigation, it is also of interest how it relates back to the practical problem facing institutions with industrial strength HPC applications. As a direct result of this research, a new application design strategy has been developed at AWE to explore and ready applications for emerging architectures, and as a means to explore suitable programming paradigms. The "Path to Many Core" (PtMC) is the resultant project to meet this strategy. At its heart is the concept of the mini-app, proven through this research as a suitable methodology for hydrodynamical computational models, and it expands the method to cover a range of algorithmic areas of interest. These algorithmic areas are broken down into dedicated workstreams and resourced appropriately, assigning a mix of computer scientists, specific domain specialists and mathematicians.


The research contained herein has evolved from assessing those methods and rapidly exploring emerging HPC hardware and programming methods, to a formal strategy with an associated programme of work. The PtMC project has identified a number of algorithmic areas on which to focus efforts in order to enable their performant use on emerging many-core architectures. The strategy is to explore these areas through the development of new, or the use of existing, mini-apps. Complementing the PtMC is the "Path to Agile Coding" (PtAC); this provides the means, by way of restructuring real applications into a framework, to make the application of the lessons learnt in PtMC amenable. Following the success of CloverLeaf, the UK Mini-App Consortium (UK MAC) [39] has grouped together a number of mini-apps developed as part of collaborations with a number of UK based institutions. Although a non-exhaustive list, a number of the mini-apps complementary to CloverLeaf are:

8.2.1 CloverLeaf3D

CloverLeaf3D is a 3D implementation of CloverLeaf, solving the compressible Euler equations in three dimensions, using an explicit, second-order method. Released into Version 3.0 of the Mantevo Suite, it has been extended beyond its original use and has been the focal tool to “roadtest” a batch-based in-situ visualisation infrastructure for multi-physics simulation codes [129].

8.2.2 CleverLeaf

CleverLeaf [61] is an adaptive mesh refinement implementation of CloverLeaf using the Structured Adaptive Mesh Refinement Application Infrastructure (SAMRAI) [110] toolkit library. It was developed and extensively used as part of Beckingsale's PhD thesis assessing the scalability of AMR on future parallel architectures [60], and was introduced as a mini-driver as part of the Version 2.0 release of the Mantevo Suite.

8.2.3 TeaLeaf

TeaLeaf is a mini-app that solves the linear heat conduction equation on a spatially decomposed regular grid using a five point stencil with implicit solvers. Released into Version 3.0 of the Mantevo project, it has been used as a research vehicle by the University of Bristol to evaluate its suitability for emerging programming models such as Kokkos, RAJA, OpenACC, OpenMP, CUDA and OpenCL [140], [141].

128 Chapter 8. Conclusions and Further Developments

8.2.4 BookLeaf

BookLeaf is an unstructured Lagrangian hydrodynamics mini-app to which the University of Oxford have applied their stencil based framework OP2 [174]. The resultant version has been used to perform one of the first assessments of performance on IBM POWER8 CPUs, using a software toolchain of Little-Endian Ubuntu, the GNU and XL compilers, and their OpenMP runtimes [173]. It has also been taken as a proxy, with added I/O patterns, to be used as a tool for investigating I/O library paradigms [83].

8.2.5 MINIO

Developed by Dickson at the University of Warwick, MINIO [81] is a mini-app to enable the investigation of the overheads of high-level IO libraries. By inputting representative IO patterns of production-class scientific applications it can explore the relative benefits and/or inefficiencies of a range of common IO data format libraries.

8.3 The Spread of Mini Apps

The idea of isolating the dominant numerical kernels contained in a multi-million line source code application has emerged independently across the spectrum of HPC research. Whether for the purposes described in this study, or for hardware co-design with hardware vendors, their purpose is varied and growing. The drivers for exascale systems, as detailed in Chapter 1, have seen nearly all of the international states which are developing an exascale strategy include the idea of the mini-app: RIKEN's Application Development Team [179] from Japan, Europe's CEA Hydrobench [28], and the US led Mantevo [104] suite.

8.4 Collaborative Research

Although initially independent, some of these efforts are pooling resources. Mantevo pioneered the idea of an integrated collection of mini-apps. Its initial Version 1.0 contained seven mini-apps, including CloverLeaf, from the three US National Laboratories: SNL, LANL and LLNL, plus NVIDIA and AWE (in collaboration with the University of Bristol and the University of Warwick). A common format, size of code, build procedure and collection of results is stipulated for all mini-apps wishing to be part of Mantevo. This has delivered the HPC user community a tool with which to track and predict

application performance on existing and emerging hardware with minimal time and effort. This cooperative effort saw Mantevo receive a 2013 R&D 100 award [172] and an FLC (Federal Laboratory Consortium) Regional Technology Transfer award. The project puts out annual releases of the suite prior to each US Supercomputing Conference in November. At the time of writing there are over 250 papers related to Mantevo, with over 20 specifically referencing CloverLeaf [142], [94], [101], [139], [56], [102], [49], [204], [175], [140], [192], [137], [76], [126], [149], [150], [59], [58], [60], [160], [55].

8.5 Beneficiaries and Future Work

Subsequent to the programming models and architectures explored through CloverLeaf, additional versions of the code have been developed and explored. Work led by Mallinson [137] performed a number of MPI specific optimisations: pre-posting receives, MPI rank re-ordering, and overlapping of communications and computation, resulting in weak scaling of the hybrid MPI+OpenACC CloverLeaf code to all 16,384 GPUs of ORNL's Titan [65] supercomputer. Additionally, a Co-Array Fortran (CAF) version of CloverLeaf was developed and compared against the MPI+OpenACC version.

In addition to the OpenMP and OpenACC variants, Gaudin's original Fortran/MPI parallel mini-app now boasts numerous instances covering a wide range of programming models and paradigms, developed by a wide range of collaborators: CUDA (Boulton and McIntosh-Smith (University of Bristol) and Bradley (NVIDIA)), CUDA Fortran [181] (Toepfer, Miles and Wolfe of The Portland Group (PGI)), Partitioned Global Address Space (PGAS) programming models [167]: CAF and SHMEM (Mallinson, formerly of the University of Warwick and now at Intel), OpenCL (Mallinson), LEO (Gamayunov and Blair-Chappell (Intel)), and OpenMP 4 (Gamayunov).

8.6 External Industry Take Up

In addition to collaborative efforts with similar research activities, the research has been an enabler for building collaborations with HPC hardware and software industrial partners.

8.6.1 Intel®

The porting and performance optimisations implemented in CloverLeaf when targeting Intel's first generation Intel Xeon Phi, the KNC (Appendix A.5.1 and A.5.2), led to collaborations with Intel and to requests to present and showcase the work. The Intel Xeon Phi performance details of Chapter 7 were presented at SC12 in November 2012 and subsequently at an invitation event from Intel in March 2013, "Intel Xeon Phi Programming Methods and Tools" [154]. This was followed in issue 14 of Intel's "The Parallel Universe", June 2013, which featured an article on the development of CloverLeaf and the results achieved on their Intel Xeon Phi coprocessor [73].

8.6.2 ARM®

The readiness and potential of Cavium® ThunderX® ARM based platforms for High Performance Computing, in particular CFD, has been explored, on behalf of ARM, in a White Paper from the University of Cambridge [51]. CloverLeaf was chosen as one of two computational fluid dynamics applications that demonstrated scalability and competitive performance using the open-source toolchain based on GCC.

8.6.3 OpenACC® Organisation

In addition to presenting at ORNL [77], an OpenACC Standards Organisation White Paper and Case Study [100] were developed. These documented the step-by-step development process used to achieve a fully distributed and accelerated hybrid MPI/OpenACC implementation of CloverLeaf, as described in Chapter 5, and were adapted by the author as a chapter in the book Parallel Programming With OpenACC [88]. This aims to be a practical guide showing how to use OpenACC with CPUs, GPUs and other accelerators to improve application performance without significant programming effort.

8.6.4 PGI®

Following introductions to PGI during the ORNL workshop [77], efforts to produce a single source OpenACC version of the CloverLeaf mini-app were carried out, resulting in error-free compilation under all commercially available OpenACC compilers. Subsequently, PGI have adopted the OpenACC version of CloverLeaf as part of their compiler regression testing suite and as a common benchmark comparison when discussing their OpenACC enabled compiler [198]. As the vendor of the only CUDA Fortran compiler, PGI were keen to have applications demonstrating a performant implementation; with a number of alternative reference versions available, they chose to develop a CUDA Fortran version of the mini-app and make it generally available [180].


8.6.5 International Supercomputing Conference (ISC)

The ISC Student Cluster Competition [163] used CloverLeaf as the mystery application in the 2016 edition of its annual challenge [196]. The goal was to run the application using the lowest peak power, while achieving completion to solution within one hour.

8.6.6 SPEC ACCEL™

The SPEC ACCEL benchmark suite contains 19 application benchmarks running under OpenCL and 15 under OpenACC [119]. The OpenCL suite is derived from the well-respected Parboil benchmark from the IMPACT Research Group of the University of Illinois at Urbana-Champaign and the Rodinia benchmark from the University of Virginia. The OpenACC suite includes tests from the NAS Parallel Benchmarks (NPB), SPEC OMP2012, and others derived from high-performance computing (HPC) applications. The OpenACC kernels construct version of CloverLeaf's Fortran implementation has been accepted as one of the initial 15 OpenACC applications [43] that make up the Standard Performance Evaluation Corporation's (SPEC) High-Performance Group (HPG) SPEC ACCEL™ benchmark [44].

8.7 Evolution of Architectures and their Programming Models

Heterogeneous computing is changing the HPC architecture landscape; indeed, a number of the emerging technologies introduced in Chapter 2 are now becoming established. Bottlenecks currently associated with the attached accelerator model include data transfers, via PCIe, between the host and device; however, this is being addressed in a number of ways depending on the accelerated system in question. Through the OpenPOWER Foundation [158], NVIDIA and IBM have coupled their respective technologies with the release of the S822LC for HPC ("Minsky") system [113]. Two POWER8 CPUs and up to four NVIDIA Tesla P100 GPUs are connected via NVIDIA's bespoke NVLink interconnect, which significantly reduces the time required for data transfers, being over five times faster than PCIe. As documented in Chapter 2, Section 2.3.2, Intel have also released the second generation Intel Xeon Phi, Knights Landing (KNL), as a self bootable processor no longer requiring a "host", hence entirely removing the attached co-processor bottleneck. However, the addition of high bandwidth memory, in the form

of MCDRAM, brings with it the additional complication of a more complex memory hierarchy. The MCDRAM can be utilised in a number of different ways depending on the memory mode to which the system is configured; however, fully exploiting this technology will place an additional burden on application developers.

The OpenMP standard has been further extended since the OpenMP work presented in Chapter 7, with directives enabling accelerator devices to be targeted. This provides greater control for mapping and unmapping variables to and from a device's data environment, including unstructured data mapping functionality as well as the ability to specify the private and firstprivate clauses. Although this increase in accelerator device support is welcomed, OpenMP still relies heavily on the developer to describe the specific details for the device in question.

Meanwhile, OpenACC has extended its support to cover more architectures, including ARM CPUs, Sunway CPUs and OpenPOWER CPUs. Additionally, following the demise of CAPS, support returns for x86 CPUs and Intel Xeon Phi processors, although not all of these architectures are as yet supported by all OpenACC compilers. Additional compiler support is now available from PathScale (ENZO 2015), and full OpenACC 2.0 support will be available in GCC 6. New features added to OpenACC since this research was undertaken include nested parallelism, the ability to target multiple devices, and asynchronous data movement. Future plans for OpenACC include a manual deep copy to assist with moving deeply nested data structures, routine error callbacks, array reductions and unstructured data regions (the latter directly addressing the deep copy issue observed in Section 7.6). OpenCL is no longer supported on the second generation Intel Xeon Phi (the KNL), reducing its main benefit, that of portability.

New features in existing high-level programming languages offer some of the functionality found in the low-level languages and directive based approaches. As of Fortran 2008, the DO CONCURRENT construct enables the developer to specify that individual loop iterations have no interdependencies, hence in theory providing enough information to expose loop parallelism to the compiler (a short sketch is given below).

Looking beyond what is available today, both hardware and software continue to evolve, with the trends outlined in Chapter 1 persisting. The core count available on Intel's latest generation of x86 processors has increased to a 24 core variant [9]. The latest GPU from NVIDIA, the P100 [15], based on the "Pascal" micro-architecture, consists of 6 Graphics Processing Clusters (GPC), each with 10 SMs of 64 GPU cores, delivering an overall 3,840 GPU cores, up from the 2,880 found in "Kepler".
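As a brief illustration of the DO CONCURRENT construct mentioned above, the following is a minimal sketch; the routine and its arguments are hypothetical. The construct asserts that the iterations are independent, leaving the compiler free to vectorise or parallelise the loop.

  SUBROUTINE scale_field(n, field, factor)
    IMPLICIT NONE
    INTEGER :: n, j
    REAL(KIND=8), DIMENSION(n) :: field
    REAL(KIND=8) :: factor

    ! Fortran 2008: iterations carry no dependencies on one another.
    DO CONCURRENT (j = 1:n)
      field(j) = factor * field(j)
    END DO
  END SUBROUTINE scale_field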

While these different aspects of many-core devices still exist as separate entities, future heterogeneous systems will most likely have a mixture of these "large" and "small" cores on the same silicon. They will also be able to dynamically adjust their clock speeds depending on workload, and possess the ability to reconfigure the system on the fly; keeping the power required to a minimum will increasingly become an important goal. Research efforts and proof of concept devices already exist, such as the 1,000 core KiloCore from the University of California, Davis [66], in which each of the one thousand cores can be independently clocked and is able to shut down completely when idle. The overall mix of these cores will likely not be all general purpose, with specialist types becoming available, catering for bespoke tasks onto which sections of an algorithm or application will map.

Further developments in hardware design will see greater integration within the die, with optical fabrics driving down the costs of interconnects. Additionally, on-package high bandwidth memory, with sufficient capacity to provide greatly improved performance for applications that can utilise it, will become common on a range of hardware, not just that seen today. Power consumption costs for moving data to the sites of computation will become increasingly dominant compared to the costs of the computations themselves. This will drive the need to co-locate the computation and the data, potentially bringing the compute to the data rather than, as is common today, moving the data to the compute engine.

Looking even further into the future will likely see revolutionary rather than evolutionary changes. Different materials which complement silicon, together with more flexible materials such as those which can stretch whilst still conducting, will enable advances in micro-architecture design. The 3D stacking of processors is also proposed, which will reduce power consumption for multiple chips whilst making them more dense.

Non-traditional Turing/Von Neumann machines may emerge, such as neuromorphic computing systems, in which the system is presented with training data from which it ultimately learns to identify and classify new data it is presented with. Quantum computing, which has been mooted for many years, is beginning to become more than theory, with experimental hardware existing in the laboratories of IBM [45] and the University of Maryland [80], together with a small but growing number of identified quantum algorithms [118], enabling early comparisons of the two different technologies [134]. Although unlikely to fully replace Turing/Von Neumann based machines, they may emerge as systems that can significantly reduce problem spaces prior to processing on traditional computer systems. Currently, these are speculative designs and would most likely require a complete rethink of algorithmic inputs and of the languages used to program such systems.


8.8 Thesis Contributions

To conclude, this research has focused on a study of the impact on legacy scientific applications, and potential mitigation options, of the emerging changes in high performance computing architectural design. It has set out the drivers for technological hardware change, discussed the prominent architectures that are emerging and, by identifying common architectural traits, enabled a high level conceptual comparison between those architectures. Similarly from a software point of view, those nascent programming methodologies have been categorised and their relative status and maturity assessed. In particular, this thesis makes the following contributions:

8.8.1 Impracticalities of Using Production-Class Codes to Explore Architectures and Programming Models

Demonstrates lack of flexibility of an industrial-class benchmark code as a tool for the rapid exploration of emerging architectures and their associated programming models.

Chapter 4 demonstrated, via the use of a representative, industrial-class benchmark application, the standard practice of using benchmark codes to assess system upgrades to an incumbent platform and to compare platforms for procurement. It subsequently showed, through the shortcomings in development time and practicality, that a different approach is needed for the assessment of emerging technologies.

8.8.2 Introduction and Extension of the Mini-Application Approach

Introduction of the CloverLeaf mini-application with development details to achieve a fully functional and portable OpenACC implementation.

Chapter 5 introduced the concept of the mini-app, and described CloverLeaf, an explicit Eulerian hydrodynamics mini-app. The study detailed the step-by-step development process that produced a fully functional and portable version using a newly emerging standard, OpenACC.


8.8.3 Demonstrating Performance Portability

Demonstration of the suitability of the mini-app as a tool for exploration of emerging architectures in the particular case of a GPU using three programming methodologies, namely OpenACC, OpenCL and CUDA.

In Chapter 6 the mini-application was demonstrated fully utilising a GPU-based architecture, with direct comparisons of performance, development time and "words of code" (WoC) against the equivalent OpenCL and CUDA implementations. Its findings indicated that a directive based approach, such as OpenACC, is an attractive programming model for accelerator devices from both a productivity and a performance perspective.

8.8.4 Exploring Emerging Architectures

Extending the use of the CloverLeaf mini-app to explore a range of emerging architectures, namely GPUs, co-processors, APUs and current CPUs, using OpenACC as a common baseline against which to compare the performance of the best alternative programming models on each of the platforms analysed.

Contribution 8.8.3 was extended in Chapter 7 to assess further programming methodologies, enabling direct comparisons with regard to development time, maintenance effort, portability and performance on a GPU architecture. Subsequently, using OpenACC as a common baseline, further emerging hardware was assessed (coprocessors, AMD APUs, GPUs), enabling the optimal native programming methodology to be compared and contrasted against the OpenACC baseline. This work showed the good portability of the directive based method with acceptable levels of performance, but highlighted the need to be wary of variances between compiler implementations.

References

[1] AMD® Opteron 875. http://www.cpu-world.com/CPUs/K8/AMD-Dual-Core%20Opteron%20875%20-%20OST875FAA6CC.html.

[2] Argonne National Laboratory: MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.

[3] GNU Fortran. http://gcc.gnu.org/fortran/.

[4] Intel® Xeon® Paxville. http://ark.intel.com/products/codename/6191/Paxville?q=Paxville.

[5] Intel® Xeon® Processor E5-2450. http://ark.intel.com/products/64611/Intel-Xeon-Processor-E5-2450-20M-Cache-2_10-GHz-8_00-GTs-Intel-QPI.

[6] Intel® Xeon® Processor E5-2670. http://ark.intel.com/products/64595/Intel-Xeon-Processor-E5-2670-20M-Cache-2_60-GHz-8_00-GTs-Intel-QPI.

[7] Intel® Xeon® Processor E5-2698v3. https://ark.intel.com/products/81060/Intel-Xeon-Processor-E5-2698-v3-40M-Cache-2_30-GHz.

[8] Intel® Xeon® Processor E5405. http://ark.intel.com/Product.aspx?id=33079.

[9] Intel® Xeon® Processor E7-8894v4. https://ark.intel.com/products/96900/Intel-Xeon-Processor-E7-8894-v4-60M-Cache-2_40-GHz.

[10] Intel® Xeon® Processor L5530. http://ark.intel.com/Product.aspx?id=41755.

[11] Intel® Xeon® Processor X5550. http://ark.intel.com/Product.aspx?id=37106.

[12] Intel® Xeon® Processor X5660. http://ark.intel.com/Product.aspx?id=47921.

[13] National Center for Supercomputing Applications (NCSA) at the University of Illinois. http://www.ncsa.illinois.edu/.

[14] NVIDIA® GeForce™ GTX 285. http://www.nvidia.com/object/product-geforce-gtx-285-us.html.

[15] NVIDIA® Tesla P100. http://www.nvidia.com/object/tesla-p100.html.


[16] NVIDIA® Tesla™ C1060 Computing Processor. http://www.nvidia.com/object/product-tesla-c1060-us.html.

[17] OpenMPI: OpenSource High Performance Computing. http://www.open-mpi.org/.

[18] Oracle Solaris Studio. http://www.oracle.com/technetwork/server-storage/solarisstudio/documentation/ss12u1-241645.html.

[19] PGI Fortran & C Accelerator Compilers and Programming Model. http://www.pgroup.com/lit/pgiwhitepaperaccpre.pdf.

[20] The Intel Fortran Compiler. http://software.intel.com/en-us/articles/intel-fortran-compiler-professional-edition-for-linux-documentation/.

[21] The OpenMP API Specification for Parallel Programming. http://openmp.org/wp/.

[22] The Portland Group Fortran Compiler. http://www.pgroup.com/resources.

[23] The OpenACC Application Programming Interface version 1.0. http://www.openacc.org/sites/default/files/OpenACC.1.0_0.pdf, November 2011.

[24] The OpenMP Application Program Interface version 3.1. http://www.openmp.org/mp-documents/OpenMP3.1.pdf, July 2011.

[25] AMD OpenCL Accelerated Parallel Processing (APP) SDK. http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/, November 2012.

[26] CAPS OpenACC Compiler: The fastest way to many-core programming. http://www.caps-entreprise.com, November 2012.

[27] gpuocelot - A dynamic compilation framework for PTX. http://code.google.com/p/gpuocelot/, November 2012.

[28] hydrobench. https://github.com/HydroBench/Hydro, Oct 2012.

[29] Intel SDK for OpenCL Applications 2012. http://software.intel.com/en-us/vcsource/tools/opencl-sdk, November 2012.

[30] NVIDIA® CUDA API Reference Manual version 4.2. https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_Toolkit_Reference_Manual.pdf, April 2012.


[31] OpenACC accelerator directives. http://www.training.prace-ri.eu/uploads/tx_pracetmo/OpenACC.pdf, November 2012.

[32] OpenCL 1.1 C++ Bindings Header File. http://www.khronos.org/registry/cl/api/1.2/cl.hpp, November 2012.

[33] OpenCL Lounge. https://www.ibm.com/developerworks/community/alphaworks/tech/, November 2012.

[34] OpenCL NVIDIA Developer Zone. https://developer.nvidia.com/opencl, November 2012.

[35] PGI — Resources — CUDA Fortran. http://www.pgroup.com/resources/cudafortran.htm, November 2012.

[36] QUDA: A library for QCD on GPUs. http://lattice.github.com/quda/, Oct 2012.

[37] Rogue Wave Releases TotalView 8.10. http://www.roguewave.com/company/news/2012/rogue-wave-releases-totalview-8-10, May 2012.

[38] The OpenCL Conference. http://www.iwocl.org/, May 2013.

[39] UK Mini-App consortium. http://uk-mac.github.io/, October 2013.

[40] CUDA Debugger and Profiler - Advanced Debugging and Performance Optimization Tools for CUDA and OpenACC. https://www.allinea.com/cuda-debugger-and-profiler-advanced-debugging-and-performance-optimization-tools--and-#features, March 2014.

[41] Rogue Wave CUDA Debugger. http://www.roguewave.com/products-services/totalview/features/cuda-debugging, March 2014.

[42] Rogue Wave OpenACC Debugger. http://www.roguewave.com/products-services/totalview/features/openacc-debugging, March 2014.

[43] SPEC ACCEL: 353.clvrleaf. https://www.spec.org/auto/accel/Docs/353.clvrleaf.html, March 2014.

[44] SPEC ACCEL Benchmark Suite. https://www.spec.org/accel/, March 2014.

[45] IBM Q: Building the first universal quantum computers for business and science. http://www.research.ibm.com/ibm-q/, March 2017.

[46] Advanced Micro Devices Inc. AMD White Paper: Compute Cores. https://www.amd.com/Documents/Compute_Cores_Whitepaper.pdf, 2014.


[47] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483–485. ACM, 1967.

[48] D. L. Andrew Komornicki, Gary Mullen-Schulz. Roadrunner: Hardware and Software Overview. IBM Redbooks, 2009.

[49] J. A. Ang, R. F. Barrett, S. D. Hammond, and A. F. Rodrigues. Emerging high performance computing systems and next generation engineering analysis applications. SAND-2013-0054P, 2013.

[50] ANL. Aurora. http://aurora.alcf.anl.gov/.

[51] ARM. HPC Case Study: CFD Applications on ARM. https://community.arm.com/processors/b/blog/posts/arm-hpc-case-study-university-of-cambridge/, 2017.

[52] N. Attig, P. Gibbon, and T. Lippert. Trends in supercomputing: The european path to exascale. Computer Physics Communications, 182(9):2041–2046, 2011.

[53] R. Babich, M. Clark, and B. Joo. Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics. IEEE, 2010.

[54] M. Baker, S. Pophale, J.-C. Vasnier, H. Jin, and O. Hernandez. Hybrid Programming using OpenSHMEM and OpenACC. OpenSHMEM, Annapolis, Maryland. March 4-6, 2014, 2014.

[55] D. W. Barnette, R. F. Barrett, S. D. Hammond, J. Jayaraj, and J. H. Laros III. Using miniapplications in a Mantevo framework for optimizing Sandia's CFD code on multi-core, many-core and GPU-accelerated compute platforms. In 51st AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, page 1126, 2012.

[56] R. F. Barrett, P. S. Crozier, D. Doerfler, M. A. Heroux, P. T. Lin, H. Thornquist, T. Trucano, and C. T. Vaughan. Assessing the role of mini-applications in predicting key performance characteristics of scientific and engineering applications. Journal of Parallel and Distributed Computing, 75:107–122, 2015.

[57] D. J. Becker, T. Sterling, D. Savarese, J. E. Dorband, U. A. Ranawak, and C. V. Packer. Beowulf: A parallel workstation for scientific computation. In Proceedings, International Conference on Parallel Processing, volume 95, pages 11–14, 1995.


[58] D. Beckingsale, W. Gaudin, J. A. Herdman, and S. Jarvis. Resident block-structured adaptive mesh refinement on thousands of graphics processing units. In Parallel Processing (ICPP), 2015 44th International Conference on, pages 61–70. IEEE, 2015.

[59] D. Beckingsale, W. Gaudin, R. Hornung, B. Gunney, T. Gamblin, J. Herdman, and S. Jarvis. Parallel block structured adaptive mesh refinement on graphics processing units. Technical report, Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States), 2014.

[60] D. A. Beckingsale. Towards scalable adaptive mesh refinement on future parallel architectures. PhD thesis, University of Warwick, 2015.

[61] D. A. Beckingsale, O. Perks, W. Gaudin, J. Herdman, and S. A. Jarvis. Optimisation of patch distribution strategies for AMR applications. In M. Tribastone and S. Gilmore (eds.), Computer Performance Engineering, Lecture Notes in Computer Science, Volume 7587, pages 210–223. Springer, 2013.

[62] P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta. Cellss: a programming model for the cell be architecture. In SC 2006 Conference, Proceedings of the ACM/IEEE, pages 5–5. IEEE, 2006.

[63] B. Bergen, M. Daniels, and P. Weber. A Hybrid Programming Model for Compressible Gas Dynamics using OpenCL. 39th International Conference on Parallel Processing Workshops, 2010.

[64] R. F. Bird, P. Gillies, M. Bareford, J. Herdman, and S. A. Jarvis. Mini-app driven optimisation of inertial confinement fusion codes. In Cluster Computing (CLUSTER), 2015 IEEE International Conference on, pages 768–776. IEEE, 2015.

[65] A. Bland, J. Wells, O. Messer, O. Hernandez, and J. Rogers. Titan: Early experience with the Cray XK6 at Oak Ridge National Laboratory. In Cray User Group, 2012.

[66] B. Bohnenstiehl, A. Stillmaker, J. Pimentel, T. Andreas, B. Liu, A. Tran, E. Adeagbo, and B. Baas. A 5.8 pj/op 115 billion ops/sec, to 1.78 trillion ops/sec 32nm 1000-processor array. In VLSI Circuits (VLSI-Circuits), 2016 IEEE Symposium on, pages 1–2. IEEE, 2016.

[67] R. Brook, B. Hadri, V. Betro, R. Hulguin, and R. Braby. Early Application Experiences with the Intel MIC Architecture in a Cray CX1. In Cray User Group, 2012.


[68] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, and R. Namyst. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. In 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pages 180–186. IEEE, 2010.

[69] BULL. Bull sequana supercomputers. http://www.bull.com/sequana.

[70] CAPS Enterprise. HMPP: A Hybrid Multicore Parallel Programming Platform. http://www.caps-entreprise.com/en/documentation/caps hmpp product brief.pdf.

[71] C. Cler. IBM Power 595 Technical Overview and Introduction. IBM Redbooks, 2008.

[72] K. China. China's 12th five-year plan: overview. China: KPMG Advisory, 2011.

[73] N. Clemons. A complete development solution for intelligent systems, 2013.

[74] P. Colella. Defining software requirements for scientific computing, 2004.

[75] Performance Computing and Visualisation Group, University of Warwick. CloverLeaf: A Lagrangian-Eulerian hydrodynamics mini-app. http://warwick-pcav.github.io/CloverLeaf.

[76] J. Cownie and S. McIntosh-Smith. Leverage your OpenCL investment on Intel® architectures. Graduate from MIT to GCC Mainline, page 42, 2014.

[77] Cray Technical Workshop on XK6 Programming. XK6 Workshop. https://www.olcf.ornl.gov/training-event/cray-technical-workshop-on-xk6-programming/, 2012.

[78] CUDA C Programming Guide v5.5. NVIDIA Corporation, July 2013.

[79] J. Davis, G. R. Mudalige, S. D. Hammond, J. Herdman, I. Miller, and S. A. Jarvis. Predictive analysis of a hydrodynamics application on large-scale CMP clusters. Computer Science-Research and Development, 26(3-4):175–185, 2011.

[80] S. Debnath, N. Linke, C. Figgatt, K. Landsman, K. Wright, and C. Monroe. Demonstration of a small programmable quantum computer with atomic qubits. Nature, 536(7614):63–66, 2016.


[81] J. Dickson, S. Maheswaran, S. A. Wright, J. Herdman, and S. A. Jarvis. Minio: an i/o benchmark for investigating high level parallel libraries. 27th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, Texas, USA, Nov. 2015.

[82] J. Dickson, S. A. Wright, S. Maheswaran, J. Herdman, D. Harris, M. C. Miller, and S. A. Jarvis. Enabling portable i/o analysis of commercially sensitive hpc applications through workload replication. Cray User Group 2017 Proceedings, 2017.

[83] J. Dickson, S. A. Wright, S. Maheswaran, J. Herdman, M. C. Miller, and S. A. Jarvis. Replicating hpc i/o workloads with proxy applications. 1st Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS’16), Salt Lake City, Utah, USA, Nov. 2016.

[84] DOE. Exascale Strategy: Report to Congress June 2013. Technical report, Department of Energy , 2013.

[85] J. Dongarra. Report on the Sunway TaihuLight System. http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-report-2016.pdf, June 2016.

[86] A. Duran, E. Ayguadé, R. M. Badia, J. Labarta, L. Martinell, X. Martorell, and J. Planas. Ompss: a proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters, 21(02):173–193, 2011.

[87] H. C. Edwards and D. Sunderland. Kokkos Array performance-portable manycore programming model. In Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, pages 1–10. ACM, 2012.

[88] R. Farber. Parallel Programming with OpenACC, 1st Edition. Morgan Kaufmann, 2016.

[89] S. I. Feldman. A Fortran to C converter. In ACM SIGPLAN Fortran Forum, volume 9, pages 21–22. ACM, 1990.

[90] A. Fog. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers, 2012.

[91] V. Gamayunov and S. Blair-Chappell. Shockwaves! CloverLeaf Meets the Intel Xeon Phi Coprocessor. http://www.sos-software.com/download-sos/Intel-Parallel-Universe-Magazin-Ausgabe-14 web.pdf, 2012.

[92] X. Gao, Z. Wang, H. Wan, and X. Long. Accelerate Smoothed Particle Hydrodynamics Using GPU. IEEE, 2010.

[93] B. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa. Heterogeneous Computing with OpenCL: Revised OpenCL 1. Newnes, 2012.

[94] W. Gaudin, A. C. Mallinson, O. F. J. Perks, J. A. Herdman, D. A. Beckingsale, J. Levesque, M. Boulton, S. McIntosh-Smith, and S. A. Jarvis. Optimising Hydrodynamics applications for the Cray XC30 with the application tool suite. Cray User Group 2014, Lugano, Switzerland. CUG2014 Final Proceedings pp. 4-8., May 2014.

[95] T. Gautier, J. V. Lima, N. Maillard, and B. Raffin. Xkaapi: A runtime system for data-flow task programming on heterogeneous architectures. In Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on, pages 1299–1308. IEEE, 2013.

[96] M. B. Giles, G. R. Mudalige, Z. Sharif, G. Markall, and P. H. Kelly. Performance analysis of the op2 framework on many-core architectures. ACM SIGMETRICS Performance Evaluation Review, 38(4):9–15, 2011.

[97] K. Gregory and A. Miller. C++ amp: accelerated massive parallelism with microsoft visual c++, 2014.

[98] T. Hahn, V. Heuveline, and B. Rocker. GPU accelerated scientific computing: Fluid and particulate flows with CUDA.

[99] S. D. Hammond, G. R. Mudalige, J. Smith, J. A. Davis, S. A. Jarvis, J. Holt, I. Miller, J. Herdman, and A. Vadgama. To upgrade or not to upgrade? catamount vs. cray linux environment. In Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on, pages 1–8. IEEE, 2010.

[100] J. A. Herdman. Applying OpenACC to the Cloverleaf Hydrodynamics Mini-App. http://www.cray.com/sites/default/files/resources/OpenACC 213462.7 OpenACC Cloverleaf CS FNL.pdf.

[101] J. A. Herdman, W. Gaudin, S. McIntosh-Smith, M. Boulton, D. Beckingsale, A. Mallinson, and S. A. Jarvis. Accelerating hydrocodes with OpenACC, OpenCL and CUDA. In SC Companion 2012 High Performance Computing, Networking Storage and Analysis, Salt Lake City, UT, 10-16 Nov 2012. Published in: SC Companion, pages 465–471. IEEE, 2012.

[102] J. A. Herdman, W. P. Gaudin, O. Perks, D. A. Beckingsale, A. C. Mallinson, and S. A. Jarvis. Achieving Portability and Performance through OpenACC. In Proceedings of the First Workshop on Accelerator Programming using Directives (WACCPD '14), held as part of SC14, The International Conference for HPC, Networking, Storage and Analysis, New Orleans, LA, USA, pages 19–26. IEEE Press, Nov. 2014.

[103] J. A. Herdman, W. P. Gaudin, D. Turland, and S. D. Hammond. Benchmarking and Modelling of POWER7, Westmere, BG/P, and GPUs: An Industry Case Study. In: 1st International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS 10), New Orleans, LA, USA. ACM SIGMETRICS Performance Evaluation Review, 38(4):16–22, 2011.

[104] M. A. Heroux, D. W. Doerfler, P. S. Crozier, J. M. Willenbring, H. C. Edwards, A. Williams, M. Rajan, E. R. Keiter, H. K. Thornquist, and R. W. Numrich. Improving performance via mini-applications. Sandia National Laboratories, Tech. Rep. SAND2009-5574, 3, 2009.

[105] J. Hoberock and N. Bell. Thrust: A parallel template library, 2010.

[106] T. Hoefler. Software and hardware techniques for power-efficient hpc networking. Computing in Science & Engineering, 12(6):30–37, 2010.

[107] T. Hoefler, C. Siebert, and A. Lumsdaine. Group operation assembly language-a flexible way to express collective communication. In 2009 International Conference on Parallel Processing, pages 574–581. IEEE, 2009.

[108] J. P. Holdren. National strategic computing initiative strategic plan. https://www.whitehouse.gov/sites/whitehouse.gov/files/images/NSCI%20Strategic%20Plan.pdf.

[109] R. Hornung and J. Keasler. The RAJA portability layer: overview and status. Lawrence Livermore National Laboratory, Livermore, USA, 2014.

[110] R. D. Hornung and S. R. Kohn. Managing application complexity in the samrai object-oriented framework. Concurrency and Computation: Practice and Experience, 14(5):347–368, 2002.


[111] J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, et al. A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010 IEEE International, pages 108–109. IEEE, 2010.

[112] W. Hu, Y. Zhang, and J. Fu. An introduction to cpu and dsp design in china. Science China Information Sciences, 59(1):1–8, 2016.

[113] IBM. IBM S822LC for High Performance Computing. https://public.dhe.ibm.com/common/ssi/ecm/po/en/pod03117usen/POD03117USEN.PDF, 2016.

[114] Intel Developer Zone. The Heterogeneous Programming Model. http://software.intel.com/en-us/articles/the-heterogeneous-programming-model, 2012.

[115] Intel Developer Zone. OpenMP 4.0 Features in Intel Fortran Composer XE 2013. http://software.intel.com/en-us/articles/openmp-40-features-in-intel-fortran-composer-xe-2013, 2013.

[116] ISC. Isc high performance. http://www.isc-hpc.com/id-2016.html, 2016.

[117] P. Jääskeläinen, C. S. de La Lama, E. Schnetter, K. Raiskila, J. Takala, and H. Berg. pocl: A performance-portable OpenCL implementation. International Journal of Parallel Programming, 43(5):752–785, 2015.

[118] S. Jordan. The Quantum Algorithm Zoo. http://math.nist.gov/quantum/zoo/.

[119] G. Juckeland, W. Brantley, S. Chandrasekaran, B. Chapman, S. Che, M. Colgrove, H. Feng, A. Grund, R. Henschel, W.-M. W. Hwu, et al. SPEC ACCEL: A standard application suite for measuring hardware accelerator performance. In International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, pages 46–67. Springer, 2014.

[120] J. Junior, E. Clua, A. Montenegro, and P. Pagliosa. Fluid simulation with two-way interaction rigid body using a heterogeneous GPU and CPU environment. Brazilian Symposium on Games and Digital Entertainment, 2010.

[121] L. V. Kale and S. Krishnan. CHARM++: a portable concurrent object oriented system based on C++, volume 28. ACM, 1993.


[122] D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings. Predictive performance and scalability modeling of a large- scale application. In Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM), pages 37–37. ACM, 2001.

[123] P. Kestener, F. Château, and R. Teyssier. Accelerating Euler equations numerical solver on graphics processing units. In Algorithms and Architectures for Parallel Processing, pages 281–288. Springer, 2010.

[124] G. Kestor, R. Gioiosa, D. J. Kerbyson, and A. Hoisie. Quantifying the energy cost of data movement in scientific applications. In 2013 IEEE international symposium on workload characterization (IISWC), 2013.

[125] Khronos OpenCL Working Group. The OpenCL Specification Version 1.2. http://www.khronos.org/opencl/, 2008.

[126] H. Kim, H. Kim, S. Yalamanchili, and A. F. Rodrigues. Understanding energy aspects of processing-near-memory for hpc workloads. In Proceedings of the 2015 International Symposium on Memory Systems, pages 276–282. ACM, 2015.

[127] LANL. Crossroads: A critical element for improved predictive capability. http://www.lanl.gov/projects/crossroads.

[128] LANL. Trinity: Advancing predictive capability for stockpile stewardship. http://www.lanl.gov/projects/trinity/.

[129] M. Larsen, E. Brugger, H. Childs, J. Eliot, K. Griffin, and C. Harrison. Strawman: A batch in situ visualization and analysis infrastructure for multi-physics simulation codes. In Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, pages 30–35. ACM, 2015.

[130] B. Leback, D. Miles, and M. Wolfe. Tesla vs. Xeon Phi vs. Radeon: A Compiler Writer's Perspective. Technical News from The Portland Group, 2013.

[131] B. Lebacki, M. Wolfe, and D. Miles. The PGI Fortran and C99 OpenACC Compilers. In Cray User Group, 2012.

[132] J. Levesque, R. Sankaran, and R. Grout. Hybridizing S3D into an Exascale Application using OpenACC. SC12, November 10-16, 2012, Salt Lake City, Utah, USA (2012), 2012.


[133] X. Liao, L. Xiao, C. Yang, and Y. Lu. Milkyway-2 supercomputer: system and application. Frontiers of Computer Science, 8(3):345–356, 2014.

[134] N. M. Linke, D. Maslov, M. Roetteler, S. Debnath, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe. Experimental comparison of two quantum computing architectures. Proceedings of the National Academy of Sciences, page 201618020, 2017.

[135] LLNL. 100 teraflops dedicated to capability computing. https://asc.llnl.gov/computing resources/purple/.

[136] LLNL. Lawrence Livermore National Laboratory Machine Catalog: Sierra. http://computation.llnl.gov/computers/sierra-advanced-technology-system.

[137] A. Mallinson, D. Beckingsale, W. Gaudin, J. Herdman, J. Levesque, and S. Jarvis. CloverLeaf: Preparing hydrodynamics codes for Exascale. In: A New Vintage of Computing : CUG2013, Napa, CA, 6 - 9 May 2013. Published in: A New Vintage of Computing : Preliminary Proceedings, 2013.

[138] A. Mallinson, D. A. Beckingsale, W. Gaudin, J. Herdman, and S. A. Jarvis. Towards Portable Performance for Explicit Hydrodynamics Codes. Proceedings of the 1st International Workshop on OpenCL (IWOCL 2013). ACM (May 2013), 2013.

[139] A. Mallinson, S. A. Jarvis, W. Gaudin, and J. Herdman. Experiences at scale with PGAS versions of a Hydrodynamics application. In PGAS '14: 8th International Conference on Partitioned Global Address Space Programming Models, Eugene, Oregon, USA, 7-10 Oct 2014. Published in: Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, pp. 1-11. ACM, 2014.

[140] M. Martineau, S. McIntosh-Smith, M. Boulton, and W. Gaudin. An evaluation of emerging many-core parallel programming models. In Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, pages 1–10. ACM, 2016.

[141] M. Martineau, S. McIntosh-Smith, M. Boulton, W. Gaudin, and D. Beckingsale. A performance evaluation of Kokkos & RAJA using the TeaLeaf mini-app. Poster session presented at IEEE/ACM SuperComputing, Austin, United States, 2015.


[142] S. McIntosh-Smith, M. Boulton, D. Curran, and J. Price. On the performance portability of structured grid codes on many-core computer architectures. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)., pages 53–75. Springer, 2014.

[143] F. McMahon. Livermore Fortran Kernels: A computer test of numerical performance range. Technical report, Lawrence Livermore National Laboratory, 1986.

[144] H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Tianhe-1A. https://www.top500.org/system/176929.

[145] H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. TOP500 November 2000. http://www.top500.org/lists/2000/11/, 2000.

[146] H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. TOP500 November 2003. http://www.top500.org/lists/2003/11/, 2003.

[147] H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. TOP500 June 2010. http://www.top500.org/lists/2010/06/, 2010.

[148] H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. TOP500 June 2011. http://www.top500.org/lists/2011/06/, 2011.

[149] G. Mudalige, I. Reguly, M. Giles, A. Mallinson, W. Gaudin, and J. Herdman. Performance Analysis of a High-Level Abstractions-Based Hydrocode on Future Computing Systems. In High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation: 5th International Workshop, PMBS 2014, held as part of SC14, The International Conference for HPC, Networking, Storage and Analysis, New Orleans, LA, USA. Revised Selected Papers, pages 85–104. Springer, Nov 2014.

[150] J. Nelson. Analyzing papi performance on virtual machines, 2013.

[151] NERSC. Cori. http://www.nersc.gov/users/computational-systems/cori/.

[152] C. J. Newburn, S. Dmitriev, R. Narayanaswamy, J. Wiegert, R. Murty, F. Chinchilla, R. Deodhar, and R. McGuire. Offload compiler runtime for the Intel® Xeon Phi coprocessor. In Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, pages 1213–1225. IEEE, 2013.


[153] B. Nichols, D. Buttlar, and J. Farrell. Pthreads Programming: A POSIX Standard for Better Multiprocessing. O'Reilly Media, Inc., 1996.

[154] M. Nicholson. Grey Matter: HardCopy, Issue 60, Summer 2013. http://www.greymatter.com/corporate/wp-content/uploads/HardCopy-PDF/HC-60.pdf, 2013.

[155] NVIDIA Corporation. Parallel Thread Execution ISA Application Guide v3.2, July 2013.

[156] Oak Ridge National Laboratory. Summit. https://www.olcf.ornl.gov/2014/11/14/oak-ridge-to-acquire-next-generation-supercomputer/.

[157] OpenACC Org. OpenACC Reviews Latest Developments and Future Plans. https://www.hpcwire.com/2015/11/11/openacc-reviews-latest-developments-and-future-plans/, 2015.

[158] OpenPOWER Foundation. The OpenPOWER Foundation. https://openpowerfoundation.org/, 2016.

[159] M. Ospici, D. Komatitsch, J.-F. Mehaut, T. Deutsch, et al. Sgpu 2: a runtime system for using of large applications on clusters of hybrid nodes. In Second Workshop on Hybrid Multi-core Computing, held in conjunction with HiPC, pages 1–8, 2011.

[160] J. Ouyang, B. Kocoloski, J. R. Lange, and K. Pedretti. Achieving performance isolation with lightweight co-kernels. In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, pages 149–160. ACM, 2015.

[161] D. A. Patterson and J. L. Hennessy. Computer Organization and Design. Morgan Kaufmann, pages 474–476, 2007.

[162] S. J. Pennycook, S. D. Hammond, S. A. Wright, J. Herdman, I. Miller, and S. A. Jarvis. An Investigation of the Performance Portability of OpenCL. Journal of Parallel and Distributed Computing, 73(11):1439– 1450, 2013.

[163] I. H. Performance. ISC 2016 Student Cluster Competition. http: //www.hpcadvisorycouncil.com/events/2016/isc16-student-cluster- competition/index.php, 2016.

[164] O. Perks, D. A. Beckingsale, A. Dawes, J. Herdman, C. Mazauric, and S. A. Jarvis. Analysing the influence of infiniband choice on openmpi memory consumption. In High Performance Computing and Simulation (HPCS), 2013 International Conference on, pages 186–193. IEEE, 2013.


[165] O. Perks, D. A. Beckingsale, S. D. Hammond, I. Miller, J. Herdman, A. Vadgama, A. H. Bhalerao, L. He, and S. A. Jarvis. Towards Automated Memory Model Generation via Event Tracing. The Computer Journal, Volume 56 (Number 2), pp. 156-174. ISSN 0010-4620, 2012.

[166] A. Petitet. HPL - A portable implementation of the high-performance Linpack benchmark for distributed-memory computers. http://www.netlib.org/benchmark/hpl/, 2004.

[167] PGAS Working Group. Partitioned Global Address Space. http://www.pgas.org/, 2015.

[168] Phoronix. Samsung Brings OpenACC 1.0+ Support To GCC Fortran. http://www.phoronix.com/scan.php?page=news item&px=MTU4MjQ, 2014.

[169] J. Pier, I. Figueroa, and J. Huegel. CUDA-enabled Particle-based 3D Fluid Haptic Simulation. Electronics, Robotics and Automotive Mechanics Conference, 2011.

[170] The Next Platform. Inside Japan's future exascale ARM supercomputer. http://www.nextplatform.com/2016/06/23/inside-japans-future-exaflops-arm-supercomputer/.

[171] PRACE. PRACE Resources. http://www.prace-ri.eu/prace-resources/.

[172] R&D Magazine. The 2013 R&D 100 Award Winners. http://www.rdmag.com/award-winners/2013/07/2013-r-d-100-award-winners, 2013.

[173] I. Z. Reguly, A.-K. Keita, R. Zurob, and M. B. Giles. High performance computing on the ibm power8 platform. In International Conference on High Performance Computing, pages 235–254. Springer, 2016.

[174] I. Z. Reguly, G. R. Mudalige, C. Bertolli, M. B. Giles, A. Betts, P. H. Kelly, and D. Radford. Acceleration of a full-scale industrial cfd application with op2. IEEE Transactions on Parallel and Distributed Systems, 27(5):1265–1278, 2016.

[175] I. Z. Reguly, G. R. Mudalige, and M. B. Giles. Design and development of domain specific active libraries with proxy applications. In 2015 IEEE International Conference on Cluster Computing, pages 738–745. IEEE, 2015.


[176] I. Z. Reguly, G. R. Mudalige, M. B. Giles, D. Curran, and S. McIntosh- Smith. The ops domain specific abstraction for multi-block structured grid computations. In Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), 2014 Fourth International Workshop on, pages 58–67. IEEE, 2014.

[177] J. Reinders. Intel threading building blocks: outfitting C++ for multi-core processor parallelism. O'Reilly Media, Inc., 2007.

[178] R. Reyes, I. Lopez, J. Fumero, and F. de Sande. Directive-based Programming for GPUs: A Comparative Study. High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012.

[179] RIKEN. RIKEN Advanced Institute for Computational Science: Application Development Team. http://www.riken.jp/en/research/labs/aics/exascale comput proj/app dev/, 2013.

[180] G. Ruetsch. S5379 - Porting CloverLeaf to CUDA Fortran. In GPU Technology Conference, March 2015.

[181] G. Ruetsch and M. Fatica. Cuda fortran for scientists and engineers. NVIDIA Corporation, 2701, 2011.

[182] E. Rustico, G. Bilotta, G. Gallo, and A. Herault. Smoothed Particle Hydrodynamics simulations on multi-GPU systems. 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, 2012.

[183] B. Sander. Bolt: A C++ template library for HSA. AMD Fusion Developer Summit, 12:18–64, 2012.

[184] R. R. Schaller. Moore’s law: past, present and future. IEEE spectrum, 34(6):52–59, 1997.

[185] L. Seiler, D. Carmean, E. Sprangle, T. Forsyth, M. Abrash, P. Dubey, S. Junkins, A. Lake, J. Sugerman, R. Cavin, et al. Larrabee: a many-core x86 architecture for visual computing. In ACM Transactions on Graphics (TOG), volume 27, page 18. ACM, 2008.

[186] D. H. Sharp. An overview of Rayleigh-Taylor instability. Physica D: Nonlinear Phenomena, 12(1-3):3–18, 1984.

[187] G. Shi, S. Gottlieb, and M. Showerman. Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters. In Cray User Group, 2012.


[188] H. Shukla, T. Woo, H. Schive, and T. Chiueh. Multi-Science Applications with Single Codebase - GAMER - for Massively Parallel Architectures. IEEE, 2011.

[189] D. Stevenson et al. An American National Standard: IEEE standard for binary floating point arithmetic. ACM SIGPLAN Notices, 22(2):9–25, 1987.

[190] S. Swaminarayan, J. A. Herdman, and M. W. Glass. Migrating Legacy Applications to Emerging Hardware. http://sc15.supercomputing.org/schedule/event detail-evid=bof138.html, 2015.

[191] T. D. Han and T. S. Abdelrahman. hiCUDA: High-level GPGPU programming. IEEE Transactions on Parallel and Distributed Systems, 22(1):78–90, 2011.

[192] E. Totoni, J. Torrellas, and L. V. Kale. Using an adaptive hpc runtime system to reconfigure the cache hierarchy. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1047–1058. IEEE Press, 2014.

[193] S. R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, et al. An 80-tile sub-100-w teraflops processor in 65-nm cmos. Solid-State Circuits, IEEE Journal of, 43(1):29–41, 2008.

[194] S. Vetter, G. Anselmi, B. Blanchard, Y. Cho, C. Hales, M. Quezada, et al. IBM Power 750 and 755 (8233-E8B, 8236-E8C) technical overview and introduction. IBM Redbooks, 2012.

[195] S. Wienke, P. Springer, C. Terboven, and D. Mey. OpenACC - First Experiences with Real-World Applications. Euro-Par 2012, LNCS, Springer Berlin/Heidelberg(2012), 2012.

[196] H. Wire. South Africa Team Claims Third ISC Student Cluster Championship. https://www.hpcwire.com/2016/06/23/south-africa- team-scores-third-isc-scc-win/, 2016.

[197] M. Wolfe. PGI Insider: OpenACC Kernels and Parallel Constructs. http://www.pgroup.com/lit/articles/insider/v4n2a1.htm, 2012.

[198] M. Wolfe. Programming NVIDIA GPUs with OpenACC Directives. http://on-demand.gputechconf.com/supercomputing/2013/presentation/SC3106-Accelerated-Computing-OpenACC.pdf, March 2013.


[199] S. A. Wright, S. D. Hammond, S. J. Pennycook, R. F. Bird, J. Herdman, I. Miller, A. Vadgama, A. Bhalerao, and S. A. Jarvis. Parallel File System Analysis through Application I/O Tracing. The Computer Journal, 56(2):141–155, 2013.

[200] S. A. Wright, S. D. Hammond, S. J. Pennycook, I. Miller, J. A. Herdman, and S. A. Jarvis. LDPLFS: Improving I/O Performance without Application Modification. In 13th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing, Shanghai, China, 21-25 May 2012. Published in: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), pp. 1352-1359, 2012.

[201] X. Wu. Parallel Performance Measurement and Analysis. In Performance Evaluation, Prediction and Visualization of Parallel Systems, pages 103–162. Springer, 1999.

[202] W. A. Wulf and S. A. McKee. Hitting the memory wall: implications of the obvious. ACM SIGARCH computer architecture news, 23(1):20–24, 1995.

[203] M. Yokokawa, F. Shoji, A. Uno, M. Kurokawa, and T. Watanabe. The k computer: Japanese next-generation supercomputer development project. In Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, pages 371–372. IEEE Press, 2011.

[204] Y. Zhou, B. Subramaniam, K. Keahey, and J. Lange. Comparison of virtualization and containerization techniques for high-performance computing. Proceedings of the 2015 ACM/IEEE conference on Supercomputing, 2015.

Appendices

APPENDIX A: Hardware Platforms and Architectures

Chapter 2 gave an overview of the current HPC architecture options available, or becoming available, to the high-end HPC application user. This appendix provides specifics of the supercomputing resources utilised throughout this thesis, which range from local Linux based workstations to some of the largest distributed supercomputers in the world, and categorises and describes these resources. As this work has spanned a number of years, the rapid pace of HPC technological progress is reflected in the numerous generations of hardware utilised in each category; to assist the reader, the year of commission of each system is supplied.

A.1 x86-64

A.1.1 Willow (AWE, Bull Distributed Commodity Capacity Cluster)

Commissioned: 2010

“Willow” is an Intel based cluster from Bull, based at the Atomic Weapons Establishment (AWE). It is built on Bull’s BullX B500 Extreme Computing blade technology. “Willow” is actually two distinct platforms, “Willow A” and “Willow B”, each of which consists of 468 blades, or nodes, housed in 26 chassis. Each node contains two quad-core Intel Xeon Nehalem processors, giving a total of 3,744 cores. Willow employs the Nehalem L5530 [10], which has a clock speed of 2.4 GHz, 8 MB of cache and a theoretical peak of four floating-point operations per clock cycle. As is common with the Nehalem micro-architecture, it utilises Intel’s QuickPath Interconnect (QPI); the L5530 has a QPI speed of 5.86 GigaTransfers per second (GT/s). Willow’s interconnect is Quad Data Rate (QDR) InfiniBand (IB), with 40 Gb/s full-duplex bandwidth, providing an MPI bisectional bandwidth of 748.8 GB/s. Each node has 24 GB of DDR3 memory, which equates to 3 GB/core. The total peak performance of each Willow platform is 35.942 TFLOP/s. The default Fortran compiler and MPI implementation on the system are Intel 11.0.073 and BullXMPI 1.0.1 respectively. Test builds of SunStudio12u1 and OpenMPI 1.4.1, for SunStudio, have also been installed on the “Willow” systems.
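The quoted peak follows directly from the figures above; as a worked check (using the stated four floating-point operations per core per clock cycle):

\[
R_{\text{peak}} = N_{\text{nodes}} \times N_{\text{cores/node}} \times f_{\text{clock}} \times \text{FLOPs/cycle}
                = 468 \times 8 \times 2.4\,\text{GHz} \times 4 \approx 35.94\,\text{TFLOP/s}.
\]

The same product reproduces, to rounding, the Blackthorn and Hera peaks quoted later in this section.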

A.1.2 Blackthorn (AWE, Bull Distributed Commodity Capability Cluster)

Commissioned: 2010

“Blackthorn” is a large capability platform based at AWE. The cluster is a Bull platform consisting of 1,080 compute nodes, each containing two Intel Xeon X5660 hex-core “Westmere” processors. The X5660 [12] has a 2.8 GHz clock (this can be raised to 3.2 GHz via Intel’s TurboBoost, although TurboBoost is currently disabled on this platform), 12 MB of cache, and a QPI speed of 6.40 GT/s. Each node has 48 GB of DDR3 memory, which equates to 4 GB/core. Blackthorn’s interconnect is QDR IB, with 40 Gb/s full-duplex bandwidth, providing 1,728 GB/s of MPI bisectional bandwidth. With each of the X5660 cores providing a theoretical four floating-point operations per clock cycle, Blackthorn has a peak of 145.1 TFLOP/s. Intel 11.0.073 is the default Fortran compiler, with BullXMPI 1.0.2 the default MPI implementation.

A.1.3 Shepard (SNL, Penguin TestBed Cluster)

Commissioned: 2014

“Shepard” is a Penguin® integrated testbed based at SNL. It consists of 36 nodes, each with dual Intel Xeon Haswell E5-2698 v3 [7] processors at 2.30 GHz (16 cores and 2 SMT hardware threads per socket) and 128 GB of DDR4-2133 memory (64 GB per socket). The interconnect is Mellanox FDR IB, and the operating system is Red Hat 6.5.


                     Willow          Blackthorn      PillowB          Hera
Integrator           BULL            BULL            BULL             Appro
Processor Type       Intel L5530     Intel X5660     Intel E5-2450    AMD Opteron 8356
Clock Speed (GHz)    2.4             2.8             2.1              2.3
(Turbo Freq)         (2.66)          (3.2)           (2.9)            N/A
Compute Nodes        486             1,080           40               847
CPUs/Node            1x Nehalem      1x Westmere     2x SandyBridge   4x Barcelona
                     8-core          12-core         8-core           4-core
Interconnect         IB              IB              IB               IB
Compilers            Intel 11.0,     Intel 11.0      Intel 13.1       Intel 11.0,
                     Sun 12.1                                         PGI 8.0.1
MPI                  BullX 1.0.1,    BullX 1.0.2     IntelMPI 5.0.3   OpenMPI 1.3.2
                     OpenMPI 1.4.1

Table A.1: x86 Platform Resources


A.1.4 Hera (LLNL, Appro Distributed Commodity Capacity Cluster)

Commissioned: 2009

“Hera” is an Appro integrated cluster based at LLNL. Consisting of 790 nodes of AMD quad-core Opteron 8356 (Barcelona) processors, it contains 13,824 cores with a clock speed of 2.3 GHz and 32 GB of memory per node, or 2 GB/core. Hera’s peak performance equates to 127.2 TFLOP/s. MPI is provided by a build of OpenMPI 1.3.2, and the available Fortran compilers are Intel 11.1 and PGI 8.0-1.

All x86 based platforms utilised during this research are detailed in Table A.1.


A.2 IBM® POWER®

A.2.1 Gollum (AWE, IBM POWER5)

Commissioned: 2006

“Gollum” is an IBM 550 POWER5 based server. It consists of a single node with 4 processors clocked at 1.65 GHz. Fortran is provided by XL Fortran 13.1.0.2.

A.2.2 Milano (IBM, IBM POWER6)

Commissioned: 2008

“Milano” is an IBM Power 550 Express system. The four socket system is populated with dual-core POWER6 processors, each core having a clock speed of 4.2 GHz. The system runs AIX (Advanced Interactive eXecutive) 5.3 ML11, and the Fortran compiler is IBM’s XL Fortran, Version 12.1.0.6. Milano has two-way simultaneous multithreading (SMT) enabled.

A.2.3 v60 (IBM, IBM POWER6)

Commissioned: 2006

“v60” is an IBM Power 575. With 16 dual-core sockets, this gives rise to 32 cores, each clocked at 4.7 GHz, and 128 GB of memory. The operating system (OS) is AIX 5.3 ML10, and the XL Fortran compiler is Version 12.1.0.4. As with “Milano”, two-way simultaneous multithreading (SMT) is enabled.

A.2.4 p90 (IBM, IBM POWER7)

Commissioned: 2010

“p90” is an IBM POWER7 p755 server; it is packaged as a 4-way quad-chip-module (QCM) with 32 physical cores and 128 SMT threads. Each core has a clock speed of 3.3 GHz, and the node has 128 GB of accessible memory. The system has 4-way SMT enabled by default.


                   Gollum      Milano          v60         p90         White
Processor          POWER5      POWER6          POWER6      POWER7      POWER8
Type               p550        p550 Express    p575        p755        S822L / S824L
Clock Speed (GHz)  1.65        4.2             4.7         3.6         3.42
Compute Nodes      1           1               1           1           8 / 9
CPUs/Node          2x 2-core   4x 2-core       16x 2-core  4x 8-core   2x 8-core / 2x 10-core
Compilers          XLF 13.1    XLF 12.1        XLF 12.1    XLF 13.1    XLF 13.1.3
MPI                IBM MPI     IBM MPI         IBM MPI     IBM MPI     OpenMPI

Table A.2: POWER Platform Resources

For all the above POWER platforms, MPI is provided via IBM MPI.

A.2.5 White (SNL, IBM POWER8)

Commissioned: 2015

“White” is a POWER8 based system consisting of one S822L and nine S824L two-socket servers. Each socket is subdivided into two NUMA (Non-Uniform Memory Access) regions, each with its own memory controller. MPI is served via OpenMPI.

All POWER based platforms utilised during this research are detailed in Table A.2.

A.3 IBM® Blue Gene®

Results in this thesis were obtained on examples of first and second generation IBM Blue Gene supercomputers: Blue Gene/L and Blue Gene/P respectively. The instances of these machines were located at LLNL. Table A.3 displays the characteristics detailed below.


                    uBG/L               DawnDev
Processor Type      PowerPC 450         PowerPC 450(d)
Clock Speed (MHz)   700                 850
Compute Nodes       40,960              1,024
Cores/Node          2                   4
Total Cores         81,920              4,096
Memory/Node (GB)    0.5                 4
Interconnect        Proprietary         Proprietary
Peak TFLOP/s        229.4               13.9
Compilers           XLF 11.0            XLF 11.0
MPI                 IBM BlueGene MPI    IBM BlueGene MPI

Table A.3: BlueGene Platform Summary

A.3.1 uBG/L (LLNL, IBM BlueGene/L)

Commissioned: 2007

“uBG/L” is based at the Lawrence Livermore National Laboratory (LLNL); its building block is the 700 MHz, 32-bit PowerPC 450. 40,960 dual-core compute nodes combine to provide 81,920 cores, with 256 MB of memory per core. This gives “uBG/L” a peak performance of 229.4 TFLOP/s. Fortran is provided by XL Fortran 10.1.0.4.

A.3.2 DawnDev (LLNL, IBM BlueGene/P)

Commissioned: 2009

“DawnDev” is a BG/P architecture, again based at LLNL. BG/P is built from the 850 MHz, 32-bit PowerPC 450d processor, with four cores per node and 1 GB of memory per core. As a test system, DawnDev contains 1,024 nodes, giving a peak of 13.9 TFLOP/s. Fortran is provided by XL Fortran 11.1.0.5. For both Blue Gene systems, IBM BlueGene MPI is the MPI implementation.


A.4 Graphics Processing Units (GPUs)

A.4.1 Dexter (AWE, NVIDIA GPU Testbed)

Commissioned: 2010

AWE has a modest GPU test bed, codenamed “Dexter”, consisting of four nodes: one master and three compute nodes. The master node contains two quad-core Nehalem X5550 [11] 2.67 GHz processors and four NVIDIA Tesla C1060 [16] GPUs, where each C1060 contains 240 streaming processor cores with a frequency of 1.3 GHz. The compute nodes each contain one quad-core X5550; two of the three have four NVIDIA GeForce GTX 285 [14] GPUs, and the third has three GTX 285s and one AMD Radeon™ HD 5870. For the purposes of this study, only the NVIDIA cards were considered, and the C1060 and GTX 285s were treated as equivalent. Dexter runs OpenSUSE as its OS, and the GNU compiler and OpenMPI provide the Fortran and MPI environments respectively.

A.4.2 Shannon (SNL, NVIDIA GPU Cluster)

Commissioned: 2013

“Shannon” is a 32-node cluster of dual-socket, oct-core Intel Xeon E5-2670 nodes, each with either two NVIDIA Kepler K20X GPUs (2,688 cores clocked at 732 MHz) or two NVIDIA K40 GPUs (2,880 cores clocked at 745 MHz). As part of the NNSA’s Advanced Simulation and Computing (ASC) project, it is one of a number of advanced test-bed architectures based at Sandia National Laboratories (SNL). The OpenACC implementations available on Shannon are PGI 13.9.0 and CAPS 3.3.4; the alternative approach to application acceleration on the system is a native implementation in CUDA.

A.4.3 Chilean Pine (AWE, Cray NVIDIA GPU Cluster)

Commissioned: 2011

“Chilean Pine” is a 40 node Cray XK6 hosted at the Atomic Weapons Establishment (AWE), with each node consisting of one 16-core AMD Opteron 6272 CPU and one NVIDIA “Fermi” X2090 GPU with 512 cores clocked at 1.15 GHz. Although an earlier technology generation than the contemporary systems detailed in the rest of this Appendix, “Chilean Pine” is the only Cray architecture available to the author containing a full range of OpenACC compilers.

                    Dexter            Chilean Pine   Shannon       Swan
GPU                 GeForce GTX 285   Tesla X2090    Tesla K20X    Tesla K20X
Architecture        Fermi             Fermi          Kepler        Kepler
GPU Chip            GT200B            GF110          GK110         GK110
GPU Clock (MHz)     648               1,150          732           732
# Active SMs        30                16             14            14
# SPs               8                 32             192           192
Total CUDA Cores    240               512            2,688         2,688
Memory/Node (GB)    4                 6              6             6

Table A.4: GPU Platform Summary

A.4.4 Swan (Cray, NVIDIA GPU Cluster)

Commissioned: 2013

“Swan” is primarily a Cray XC series system, provided through Cray’s Marketing Partner Network. A subset is configured as an XK7 consisting of 8 nodes, each with an 8-core Intel Xeon E5-2670 and an attached NVIDIA Kepler K20X with 2,688 cores clocked at 732 MHz and 6 GB of memory. OpenACC is available on Swan via CCE 8.3.0 and PGI 13.10. Alternative approaches to application acceleration on the system are native implementations in OpenCL and CUDA.
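To illustrate the directive-based route referred to above, the fragment below is a minimal, self-contained Fortran sketch of the kind of OpenACC loop offload accepted by compilers such as CCE and PGI; it is illustrative only (the program and array names are hypothetical) and is not taken from any of the applications discussed in this thesis.

   ! Minimal OpenACC offload sketch: update a vector on the attached GPU.
   ! Illustrative only; names and sizes are hypothetical.
   program acc_sketch
     implicit none
     integer, parameter :: n = 1000000
     real(kind=8), allocatable :: a(:), b(:)
     integer :: i

     allocate(a(n), b(n))
     b = 1.0_8

     ! Copy b to the device, compute a there, and copy a back on exit.
     !$acc parallel loop copyin(b) copyout(a)
     do i = 1, n
       a(i) = 2.0_8 * b(i)
     end do
     !$acc end parallel loop

     print *, 'a(1) =', a(1)
     deallocate(a, b)
   end program acc_sketch

The directives are ignored when compiled without OpenACC support, which is what makes this style attractive for the portability studies described in this thesis.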

A.5 Intel® Xeon Phi™

A.5.1 PillowB (AWE, Intel Xeon Phi Cluster)

Commissioned: 2010

Hosted at AWE, “PillowB” consists of 40 nodes of dual 2.1 GHz Intel Xeon E5-2450 [5] processors, each node having two Intel Xeon Phi 5110P cards.


                           PillowB               Compton
Integrator                 BULL                  Intel
Chip Manufacturer          Intel                 Intel
Processor Type             E5-2450               E5-2670
Clock Speed (GHz) (Turbo)  2.1 (2.9)             2.6 (3.3)
CPUs/Node                  2x SandyBridge        2x SandyBridge
                           8-core                8-core
Compute Nodes              40                    42
Accelerator                KNC 60-core 5110P     KNC 57-core C0
Interconnect               IB                    IB
Compilers                  Intel 13.1            Intel 13.1
MPI                        IntelMPI 5.0.3        IntelMPI 5.1.0

Table A.5: Xeon Phi Platform Summary

OpenACC can be used on both the Intel Xeon and the Intel Xeon Phi via CAPS 3.3.2, using Intel OpenCL SDK v1.2.3.0. Likewise, both the Intel Xeon and the Intel Xeon Phi support OpenCL, MPI and OpenMP. Specifically on the Intel Xeon Phi, with the release of Intel’s Fortran Composer XE 2013 Update 2 (compiler version 13.1), Intel’s heterogeneous LEO offload model and Intel’s implementation of OpenMP 4.0’s new features for controlling execution on coprocessors can also be assessed.
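As a concrete illustration of the OpenMP 4.0 coprocessor-control features mentioned above, the fragment below is a minimal Fortran sketch that offloads a loop to an attached device with the target construct; it is illustrative only (names and sizes are hypothetical) and is not drawn from the thesis’s application codes.

   ! Minimal OpenMP 4.0 device-offload sketch: run a loop on an attached
   ! coprocessor (e.g. a Xeon Phi card) via the TARGET construct.
   ! Illustrative only; names and sizes are hypothetical.
   program omp_target_sketch
     implicit none
     integer, parameter :: n = 1000000
     real(kind=8), allocatable :: a(:), b(:)
     integer :: i

     allocate(a(n), b(n))
     b = 1.0_8

     ! Map b to the device, compute a there, and map a back to the host.
     !$omp target map(to: b) map(from: a)
     !$omp parallel do
     do i = 1, n
       a(i) = 2.0_8 * b(i)
     end do
     !$omp end parallel do
     !$omp end target

     print *, 'a(1) =', a(1)
     deallocate(a, b)
   end program omp_target_sketch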

A.5.2 Compton (SNL, Intel Xeon Phi Cluster)

Commissioned: 2012

Part of SNL’s Heterogeneous Advanced Architecture Platforms (HAAPs), “Compton” is a 42 node Intel Xeon Sandy Bridge based cluster in which each compute node additionally contains two pre-production (stepping C0) Intel Xeon Phi co-processor cards. The Sandy Bridge SKUs (Stock Keeping Units) are dual 2.6 GHz Intel Xeon E5-2670 [6] processors.


A.6 AMD APU

A.6.1 Teller (SNL, AMD APU Cluster)

Commissioned: 2012

Also based at SNL is “Teller”: a cluster of AMD second generation “Trinity” Fusion Accelerated Processing Unit (APU) processors. Each APU consists of an AMD A10-5800K (Piledriver) 3.8 GHz quad-core with one Radeon HD-7660D (Northern Islands) integrated on-die, containing 384 cores clocked at 800 MHz. OpenACC is provided via CAPS 3.3.3; the only alternative for exploiting the Radeon is raw OpenCL using AMD’s APP SDK 2.8.0.

                              Teller
Integrator                    Penguin
CPU                           AMD A10-5800K
CPU: # cores                  1 x quad
CPU: Clock Speed (GHz)        3.8
GPU                           Radeon HD-7660D
GPU: Clock Speed (MHz)        800
GPU: # Active Compute Units   6
GPU: # SPs                    64
GPU: Total cores              384
Compute Nodes                 104
Memory/Node (GB)              16
Interconnect                  Qlogic QSFP QDR IB
Compilers                     PGI 13.4.0, CAPS 3.3.3, GNU 4.8.1
OpenCL                        AMD APP SDK 2.8.0
MPI                           OpenMPI 1.6.4

Table A.6: APU Platform Summary
