
Welcome to the 2012-13 issue of our Scientific Highlights - The First Annual Report from STFC’s new Scientific Computing Department (SCD)

This has been an excellent year for the Department, with too many highlights to cover here. We have selected a single contribution from each group within the Department which will hopefully provide a flavour of the incredible breadth of exciting activities underway within the new Department.

We were delighted to welcome George Osborne to the Laboratory on 1st February 2013 to officially open the Hartree Centre. During his visit he also announced an additional £19M investment into the Department for "Energy Efficient Computing", and work is underway to develop this theme. Traditionally, high performance computing has concentrated on minimising wall clock time to solution. With increasing electricity prices it is now recognised that Watts to solution is another important metric, and the new investment will concentrate on the design of algorithms and software that will minimise this quantity. On the same day we also signed a major research agreement with Unilever and are mapping out a range of scientific projects that will lead to direct impact on that company.

We were also honoured to welcome Douglas Hartree's daughter and one of his sons to an open day in April. This event was held as a Campus wide activity and was a really enjoyable day featuring tours of the machine room and demonstrations of the new visualisation facilities, as well as presentations on the purpose and scope of the Hartree Centre.

We have also been working to strengthen our links to international centres of excellence and are very pleased to have developed MOUs with EPCC in Edinburgh and LLNL in the USA. Hopefully these new links will enable the Department to benefit from the skills and resources available at these famous laboratories. In addition we have signed a research agreement with NVIDIA and are now working with them to port codes to GPGPU based systems. We are also hard at work on a number of other procurements, including a major upgrade of the JASMIN super data cluster.

There is also a specific highlight from the Tier 1 team: a small note on one of the data racks in our machine room announces that the Higgs boson is here. A clear example of the excellence of the services we provide leading to world class science.

Hopefully you will find something of interest in this issue. We are always delighted to meet new people and establish new interactions, so why not come and visit us at one of the many events, shows and workshops that we hold and attend, or contact us via our website: www.stfc.ac.uk/scd

Adrian Wander
Director of the Scientific Computing Department


HSL: 50 years of getting the right answer

Domain decomposition of reaction-diffusion equation on a brain.


In March 2013 the Numerical Analysis Group released a new version of the HSL Mathematical Software Library. HSL 2013 is particularly significant because it is 50 years since the first release in 1963 of what was then the Harwell Subroutine Library. The library was originally developed and maintained by the Numerical Analysis Group of the Theoretical Physics Division of A.E.R.E., Harwell. It was initially used on the Harwell IBM 7030 (STRETCH) machine; today it is used worldwide, on a wide variety of computing platforms, from supercomputers to modern desktop machines and laptops.

The majority of HSL users are academics. They represent a wide range of disciplines and come from universities and research labs in the UK, Europe and beyond. HSL routines are also used by a number of commercial organisations working in diverse areas, including software companies, oil and gas engineering and distribution, animation, and car design for Formula 1.

Although originally a general-purpose mathematical library, in recent times HSL has specialised in providing state-of-the-art routines for handling large-scale sparse matrix calculations efficiently and reliably. In particular, HSL includes some of the best-known codes for solving large sparse linear systems of equations

Ax = b

where the matrix A and the right-hand side b are given and it is required to find the solution x. Such systems arise in many applications, including fluid flow problems, electromagnetic scattering, structural analysis and financial modelling. In many situations, including the industrial processing of complex non-Newtonian fluids, the solution of large sparse linear systems is the single most computationally expensive step. Consequently, reducing the solve time can result in significant savings in the total simulation time, while novel approaches allow ever larger systems to be tackled. Thus new algorithms and packages are constantly being developed by the Numerical Analysis Group for inclusion within HSL.

Sparse linear systems are normally solved using either a direct method or an iterative method (such as conjugate gradients or GMRES), with the use of the latter being generally dependent upon the existence of a suitable preconditioner. Most sparse direct methods are variants of Gaussian elimination and involve the explicit factorization of A into the product of lower and upper triangular matrices L and U. In the symmetric case, for positive-definite problems U = L^T (Cholesky factorization) or, more generally, U = DL^T, where D is a (block) diagonal matrix. Forward elimination

Ly = b

followed by backward substitution

Ux = y

completes the solution process for each b. Such methods are important because of their generality and robustness. Indeed, black-box direct solvers are frequently the method of choice because finding and implementing a good preconditioner for an iterative method can be expensive in terms of developer time, with no guarantee that it will significantly outperform a direct method. For the tough linear systems arising from some applications, direct methods are currently the only feasible solution methodology as no reliable preconditioner is available. However, for some problems, especially large three-dimensional applications, iterative methods have to be employed because of the memory demands of direct methods, which generally grow rapidly with problem size.
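The direct-method workflow just described (factorize once, then a cheap forward elimination and backward substitution for every right-hand side) can be sketched in a few lines. This is not HSL itself; the sketch below uses SciPy's general sparse LU routine purely to illustrate the A = LU, Ly = b, Ux = y pattern.

```python
# Illustrative sketch of the direct-method workflow (not an HSL package):
# factorize A once, then reuse the factors for many right-hand sides.
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# A small sparse symmetric positive-definite test matrix.
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))

lu = splu(A)                 # explicit factorization A = LU (with pivoting)

for _ in range(3):           # many right-hand sides, one cheap solve each
    b = np.random.rand(3)
    x = lu.solve(b)          # internally: forward elimination Ly = b,
                             # then backward substitution Ux = y
    assert np.allclose(A @ x, b)
```

In a real application the factorization of a large sparse matrix is by far the dominant cost, which is why reducing it, and reusing the factors across right-hand sides, pays off.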

Figure 1: HSL users by location. Figure 2: HSL users by discipline.


Figure 3: Structural analysis of a helicopter using HSL_MA97.

An important new code within HSL 2013 is HSL_MA97 [1]. This direct solver is designed for multicore machines. It is able to solve both symmetric positive-definite and indefinite systems. A key feature of HSL_MA97 is that it computes bit-compatible solutions. In some applications, users want bit compatibility (reproducibility) in the sense that two runs of the solver on the same machine with identical input data should produce identical output. Not only is this an important aid in debugging and correctness checking, some industries (including nuclear, aerospace and finance) can require reproducible results to satisfy regulatory requirements. For sequential solvers, achieving bit compatibility is not a problem. But enforcing bit compatibility can limit dynamic parallelism and, when designing a parallel sparse direct solver, the goal of efficiency potentially conflicts with that of bit compatibility. HSL_MA97 has tackled this issue without a serious degradation of performance [2].

Details of HSL 2013 and how to obtain a licence to use the packages are available at www.hsl.rl.ac.uk, or contact the HSL team: [email protected]
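The reproducibility problem exists because floating-point addition is not associative: change the order in which parallel partial sums are combined (for example because the thread count or dynamic schedule changes) and the last bits of the result can change. A minimal illustration of the underlying effect, unrelated to HSL_MA97's actual implementation:

```python
# Floating-point addition is not associative, so summing the same numbers
# in a different order (as a dynamically scheduled parallel reduction might)
# can give results that are not identical bit-for-bit.
import random

random.seed(1)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100_000)]

one_order = sum(values)                   # one fixed summation order
shuffled = values[:]
random.shuffle(shuffled)                  # a different combination order
another_order = sum(shuffled)

print(one_order == another_order)         # typically False
print(abs(one_order - another_order))     # tiny, but not zero
```

A bit-compatible parallel solver therefore has to fix the combination order (or use another order-independent summation strategy), which is exactly what constrains dynamic parallelism.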

Authors

J.D. Hogg and J.A. Scott. STFC Rutherford Appleton Laboratory

Acknowledgements and References

The work of the Numerical Analysis Group is largely funded by a research grant from EPSRC (EP/I013067/1).

[1] J.D. Hogg and J.A. Scott. HSL_MA97: a bit-compatible multifrontal code for sparse symmetric systems. Technical Report RAL-TR-2011-024.
[2] J.D. Hogg and J.A. Scott. Achieving bit compatibility in sparse direct solvers. Technical Report RAL-P-2012-005.


The Hartree Centre: Open for Business


The Hartree Centre was established as a result of the Tildesley report [1], published in July 2011, which identified the importance of applying modelling and simulation to industrially relevant problems in order to maintain the competitiveness of the UK within global innovation, manufacturing and the service sectors. The subsequent Government investment in e-Infrastructure followed in October 2011, with the centre officially opening on 1st February 2013. This article will track some of the significant milestones in the development of the Hartree Centre, based at STFC Daresbury, as it became officially open for business.

Following the signing of the contract with IBM as the technology provider, the Hartree systems Blue Joule (IBM BlueGene/Q) and Blue Wonder (Intel based IBM iDataplex system) were procured. Part of this contract was not only to provide hardware and systems but to set up a "Collaboratory in association with IBM". This is a commitment on both sides to drive the development, deployment and demonstration of new software to accelerate innovation. The Collaboratory enables cooperation on both the technical and business development elements of building a successful Hartree Centre. David Moss joined the Hartree team on secondment from IBM to act as an interface into the large IBM sales forces within the UK and to progress the IBM opportunities within the Hartree Centre.

Following the commissioning of the High Performance Computing (HPC) systems in September 2012, the first commercial contracts were signed with end users, together with an agreement for delivery of HPC on Demand services through OCF EnCore, giving organisations access to the HPC systems BlueJoule and BlueWonder on a pay-as-you-go basis via commercial contract.

HPC as a Service to Industry event
To launch the Hartree Centre and publicise the opportunity for industry to collaborate with the newly formed Hartree Centre, we arranged a two day event (29th and 30th January 2013) at the STFC Daresbury Campus: HPC as a Service to Industry. The aims of this event were to:

• Have industry describe their views on how HPC, Big Data and Visualisation could benefit their business in the future.

• Launch the Hartree Centre and the capabilities and services that the centre would provide.

• Capture best practice around industrial and academic collaboration in the area of HPC, modelling and simulation.

The event was a great success, attended by over 120 scientists and industrial representatives in the field of modelling and simulation. Presentations included: Unilever's Vision for HPC (Massimo Noro – Unilever), Manufacturing Industry Trends Driving HPC Innovation (Wim Slagter – ANSYS) and Up Front Simulation and CAE Driven Design (Tayeb Zeguer – Land Rover).

The Hartree Centre demonstrated its five methods of engagement:

Applications and optimisation – applying existing modelling and simulation codes to industrial problems, or optimising existing codes to run on next generation hardware.

Software Development – developing and optimising new applications.

HPC on demand – providing access to HPC systems on a pay-as-you-go basis.

Collaboration – partnering in open innovation projects or jointly applying for European or UK research funding grants.

Training and Education – Hartree is looking to help create the next generation of researchers trained to use HPC, data analytics and visualisation techniques.

Figure 1: BlueJoule, BlueWonder, IBM Tape store, DDN disc store.


First significant Industrial Partner
Following the industrial launch event, the Hartree Centre signed its first significant industrial engagement, with Unilever. This was a five year agreement to provide modelling and simulation expertise in areas like computer aided formulation and engineering. These engagements take two potential forms:
1) Directly contracted projects with Unilever.
2) Collaborative research projects, which as well as the Hartree Centre and Unilever can include additional organisations or SMEs.

For a fast moving consumer goods company, speed is all that matters, especially when it needs to put hundreds of new products on the market every year. Today, "to out-compute is to out-compete" [2]. Speed is what gives a company like Unilever the competitive advantage.

"The Unilever R&D strategy commits us to a digitally enabled future of eScience and big data. Our partnership with STFC will give our R&D community a powerful competitive edge. When we have the HPC computing capabilities of the Hartree Centre fully integrated with our global strategic science partnerships, we'll be able to tackle even bigger scientific challenges and unlock breakthrough innovations faster." – Jim Crilly, SVP Strategic Science Group, Unilever.

Creating World Class Collaboration
To accelerate the opportunities in High Performance Computing, Big Data and Visualisation and the application of modelling and simulation, it is important to build collaborative relationships with the best researchers in this area. The Hartree Centre was delighted to announce its first significant collaboration with a US National Laboratory with a world class reputation in this field. Lawrence Livermore National Laboratory's High Performance Computing Innovation Center (HPCIC) and the Science and Technology Facilities Council (STFC) in the United Kingdom will collaborate to expand industry's use of supercomputing to boost economic competitiveness in the two countries. The U.S. Department of Energy and the Department for Business, Innovation & Skills (BIS) in the United Kingdom jointly announced the collaborative agreement on 29th August 2013.

From left: Fred Streitz, HPCIC; Dona Crawford, LLNL; John Bancroft, STFC; John Womersley (seated), chief executive, STFC; Priya Guha, UK Consulate; Bill Goldstein, LLNL; Parney Albright (seated), LLNL director; Doug East, HPCIC; Betsy Cantwell, LLNL; Jeff Wolf, HPCIC; and Emily Keir, British Consulate.

Open Service Innovation
As well as the contracts working directly with clients, there are other opportunities to create value through open innovation. This is defined as the ability for knowledge to flow in and out of the organisation to accelerate the development of intellectual property and innovation by delivering new value. A good exemplar of this type of interaction is the collaboration with Optis. Using their software in conjunction with the Hartree visualisation equipment and the expertise of the Virtual Engineering Centre (University of Liverpool), we are able to offer companies like Bentley a unique service combining software applications and the facilities, helping Bentley to design the next generation of car interiors [3].

The Hartree Centre has initially identified nine target areas:
• Life sciences
• Engineering
• Materials and chemistry
• Environment
• Nuclear science
• Power
• Data analytics
• Small and medium-sized enterprises (SMEs)
• Government

If you would like to discover more about the Hartree Centre and how it could work in collaboration with your company, contact the Hartree Centre on Tel: +44 (0)1925 603 444 or Email: [email protected]

Authors

M. Gleaves and D. Moss, STFC Daresbury Laboratory

Figure 2. From left to right: Michael Gleaves, Adrian Wander, John Womersley, Jim Crilly, Sue Smith, Jim Sexton and Massimo Noro.

References
[1] https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/32499/12-517-strategic-vision-for-uk-e-infrastructure.pdf
[2] Dr Cynthia McIntyre, Senior Vice President, Council on Competitiveness NA, 2009.
[3] http://www.stfc.ac.uk/hartree/43872.aspx


LHC Data Processing at the GRIDPP Tier-1 - High Throughput Computing in Search of the Higgs Particle


Introduction
On the 16th February 2013 at 08:25 the CERN Large Hadron Collider (LHC) dumped its beams for the very last time during "Run 1" data taking. Since March 2010 CERN has delivered a deluge of data to its twelve Tier-1 computing centres, which provide the processing backbone supporting the LHC experiments' search for the Higgs Particle and other new physics. The GRIDPP Tier-1 at RAL is one of the LHC's largest computing centres globally, delivering a highly responsive, high availability, Grid Computing service. By the end of Run 1, disk and tape storage capacity at the Tier-1 had each reached 10 Petabytes, and our Tier-1 service uptime met the challenging 98% target for the run.

The Computing Challenge
Data arrives from CERN via a dedicated 10Gb/s Optical Private Network (OPN), before being placed on one of the Tier-1's 500 disk servers. Here it can be processed and analysed on the 10,000 core batch computing farm before being written to its SL8500 tape robot for long term storage.

Stored and reprocessed data is delivered to any of the 15 Tier-2 centres in the UK via the national JANET network, and may be redistributed to any other Tier-1 centre around the world via the OPN to ensure resilient copies are held. Data processing and shipping rates have been huge; in 2012 alone the Tier-1 moved 218 petabytes of data internally (about the equivalent of playing 40 million DVDs).

Physics Results
LHC Run 1 has of course been hugely successful. On the 14th March 2013, CERN announced that the ATLAS and CMS collaborations had found a particle whose properties were consistent with those predicted for the Higgs Particle, although its exact nature remains to be understood until further data is collected when the LHC restarts in 2015.

The Future
The LHC is being upgraded as it prepares to restart data taking for "Run 2" in early 2015. When it restarts it will deliver higher energy beams than ever before, and data taking will then continue at much higher rates than we have seen to date. Data rates will then continue to increase until the next long shutdown in 2017. The Tier-1 is taking the opportunity of this year's slightly quieter times to push forward essential developments in order to meet the challenges these higher data rates will bring. In particular, the network backbone and robotics are being upgraded for increased capacity and bandwidth, and the disk system is planned to grow annually until, by the end of Run 2 in 2017, it will exceed 20 petabytes. By providing more generalised "cloud" interfaces to the service, the Tier-1 team hope to be able to make this high throughput computing service more accessible for other large projects in the future.
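As a rough sanity check on the 218-petabyte figure quoted above (the DVD capacity of about 4.7 GB is an assumption; the article does not state which value was used):

```python
# Rough check of "218 petabytes ~ 40 million DVDs".
# Assumes ~4.7 GB per single-layer DVD, which is not stated in the article.
petabyte = 10 ** 15          # bytes (decimal convention)
dvd_bytes = 4.7e9            # bytes per single-layer DVD

moved_bytes = 218 * petabyte
print(moved_bytes / dvd_bytes / 1e6, "million DVDs")   # roughly 46 million,
                                                       # the same order as the quoted 40 million
```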

Figure 1. Figure 2.

Author

A. Sansum, STFC Rutherford Appleton Laboratory


Agile Data Services


Figure 1. Figure 2.

The data services group runs a variety of science data services, to support the facilities' work and to enable STFC to participate in important large scale projects – big instruments with lots of data, or sensor networks delivering a lot of data between them. At a lower level, there are data storage services based on CERN's CASTOR, DMF, and databases (Oracle and MySQL); at a higher level, we have services which move, copy, aggregate, and preserve data. Finally, services are provided to other customers: e.g. other research councils, or other research groups at universities. In addition, the group is participating in European (FP7) data projects such as EUDAT, contributing expertise and services, and benefiting from the outcome of the projects.

Storage Systems
The group runs the CASTOR storage system for both WLCG, the global LHC computing grid, and STFC's facilities. During the year we have performed several minor and one major upgrade of CASTOR, which have all gone smoothly. Careful testing with the pre-production and certification systems proved invaluable in minimising the impact of the upgrade: CASTOR is a rather complicated system, and no two CASTOR sites are identical. The advantages of being close to CERN in terms of the deployed release are evident, with significant support when required.

We contribute to the international CASTOR community as well: currently, we are working on a new logging mechanism using Hadoop and HBASE, as well as a WebDAV interface.

We have also been the first European Tier 1 to implement xrootd redirection for both the ATLAS and CMS experiments. xroot is a data access protocol used widely in high energy physics, and the purpose of this redirection is to ensure that the protocol automatically picks up other replicas of a file if one site or file is, possibly temporarily, unavailable. This is a major achievement and we continue to take a leading role in the evolution, deployment and testing of services in collaboration with the experiments.

During the past year we have also been looking at alternatives to CASTOR for disk-only storage. While there is currently no significant issue with disk-only performance in CASTOR, most deployments of CASTOR are now purely as a front end to tape systems, and most development effort focuses on tape. We evaluated a number of solutions including HDFS, CEPH, EOS, and Lustre, evaluating against WLCG requirements, simplicity of deployment, ease of debugging, and performance compared to CASTOR under stress test conditions. At the current time no system gives significant performance enhancements over CASTOR, and the two most likely candidates (CEPH and HDFS) are not yet production ready for WLCG usage, so the decision was taken to remain with CASTOR but continue to monitor other solutions with a view to moving in the future.

StorageD is one of the higher level services providing features "on top of" CASTOR. StorageD provides a scalable, high-volume pipe between the user's data and the storage (supporting multiple storage systems). The data moves through a series of "defined states" until the data is stored at its destination. Features include:
• Automatic aggregation of small files according to defined rules, to speed data going to and, more importantly, from tape; a full history of data presented for ingestion, successful or not, and real-time tracking of where every file is in the system, complete with history;
• A robust and simple process to recover from errors by rolling back to a previous state, a process which can be automated where the error is understood.

The system utilises a client next to the user data, talking to a StorageD "commodity server", which in turn talks to the storage resource. In our setup, a single server is capable of moving 25TB of data to tape in a day. Once written to tape, data is often copied back, at a measured speed of up to 95% of the speed at which it was copied to tape – the limiting factor seems to be mainly the receiving disk.
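StorageD's internal design is not detailed here, but the "defined states" idea, with a full transition history and the ability to roll back to a previous state when an error is understood, can be pictured as a simple state machine. The sketch below is illustrative only; the state names and transitions are hypothetical, not StorageD's actual ones.

```python
# Illustrative "defined states" ingest pipeline with history and rollback.
# State names are hypothetical, not StorageD's actual states.
STATES = ["received", "aggregated", "copied_to_tape", "verified", "complete"]

class IngestItem:
    def __init__(self, name):
        self.name = name
        self.state = STATES[0]
        self.history = [self.state]          # full history of every transition

    def advance(self):
        i = STATES.index(self.state)
        if i + 1 < len(STATES):
            self.state = STATES[i + 1]
            self.history.append(self.state)

    def roll_back(self):
        # Recover from an understood error by returning to the previous state.
        i = STATES.index(self.state)
        if i > 0:
            self.state = STATES[i - 1]
            self.history.append(self.state)

item = IngestItem("run_0421.tar")
item.advance(); item.advance()               # received -> aggregated -> copied_to_tape
item.roll_back()                             # e.g. a failed tape write: retry from "aggregated"
item.advance(); item.advance(); item.advance()
print(item.state)                            # complete
print(item.history)                          # every state the item passed through
```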


Figure 3.

To provide scalability, each server can have as many clients as is felt necessary, and each system can include as many servers as necessary.

"Elastic Tape" for JASMIN/CEMS
Building on the StorageD platform, a service called "Elastic Tape" will be offered as part of the user storage allocation in JASMIN2. It provides cost-effective, high volume "scratch space" storage, allowing Group Workspace Managers to swap data out to tape when they want to free up high performance disk. These Group Workspaces range in size from terabytes up to over a petabyte, as required by the climate datasets with which the groups work.

CEDA (Centre for Environmental Data Archiving)
SCD provides a "backup of last resort" for CEDA's archived data. This data is not only scientifically important, some of it is also the only available copy, so data safety is a major consideration. These data sets typically lie in the range 30GB-30TB, with file numbers ranging from tens to ten million.

The data for archiving is selected by CEDA, based on web services which allow them to compare the current archive with their holdings. New data is released for ingestion into StorageD when the data scientist approves the data for archiving. The initial SLA allowed for CEDA to back up 2PB of data, and this is in the process of being extended as the current limit is neared.

Diamond Light Source Data Archiving
The archive for DLS is also based on StorageD. The plot here shows the historical plot of the holdings.

EUDAT
EUDAT is a European FP7 project which aims at building a data e-Infrastructure to support other EU projects from very diverse user communities. Within EUDAT, the group is involved in the assessment of the scalability of the system and providing an operational node within the EUDAT framework. Specifically we have been working within the metadata and 'research data store' task forces, looking at the scalability of the chosen technologies (CKAN and Invenio, respectively). In addition, we have taken on a role in the documentation team, reviewing both user and system level documentation. Finally, we are leading the task force on the use of existing authentication and authorisation infrastructures to provide federated identity management.

As an EUDAT data site, we have set up a preproduction instance of iRODS, which is currently still undergoing certification within the project. In terms of the scalability of the Common Data Infrastructure (CDI), we have recently investigated xroot and the DDN WOS box as alternatives, leading to an update to the report.
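The selection step just described (compare the current archive with CEDA's holdings and release only the new data) is essentially a set difference between two manifests. A minimal sketch with made-up file names; in production the comparison is done through web services rather than in-memory sets:

```python
# Minimal sketch of "compare the current archive with their holdings":
# candidates for ingestion are the holdings not yet present in the archive.
# File names are invented for illustration.
holdings = {"cmip5/tas_2012.nc", "cmip5/pr_2012.nc", "obs/ghcn_2013.nc"}
archived = {"cmip5/tas_2012.nc"}

to_release = sorted(holdings - archived)   # new data to release for ingestion
print(to_release)                          # ['cmip5/pr_2012.nc', 'obs/ghcn_2013.nc']
```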


Based on this, while xroot meets several of the requirements and is well supported, the effort to meet all the requirements, such as rules-based replication and use of a Persistent-ID service, is significant and outside the scope of EUDAT. The DDN WOS boxes do meet all the requirements and would provide a good alternative, but would require significant investment from sites. However, if a suitable collaborative project comes up within the department, these would represent a very good investment.

Digital Preservation and Access
Over the last two years, the group has become increasingly involved in data preservation services. We run the ISIS data backup store using Tessella's Safety Deposit Box (SDB) product, which provides means of preserving data in ways that ensure it will always remain accessible. The group worked closely with Tessella in developing the SDB service to production quality based on open source software. The service moved into full production in 2012 and has become significantly more reliable over the past year, with good feedback from the user community. The group is currently looking for other users for SDB who can also benefit from digital preservation, as well as building expertise in new ways of accessing data by making use of Persistent Identifiers. In turn, we are providing feedback to the SDB community – most of them are archives or libraries, or commercial organisations with archiving requirements, and we are unusual in being a science archive.

Figure 4. Figure 5.

Authors

J. Jensen, S. de Witt, K. O’Neill, M. Viljoen, STFC Rutherford Appleton Laboratory


Supporting Laser Science by Keeping Data Safe


Over the last decade, SCD has been working with the Central Laser Facility, as well as ISIS and DLS, to provide state of the art computing to support their work. One of the most active areas is data management, which covers the life of data from the moment of capture by a scientific instrument until the last time it is used by a researcher. The purpose of data management is to store and catalogue data so that the data are secure and can be easily retrieved for analysis. This is essential so that scientists can carry out their research knowing that their data is kept safe, secure and easily accessible.

In 2012, SCD completed the development of a new data processing system for the Central Laser Facility. The system takes data from the laser, stores and catalogues it; the data are then available for further processing. The system also provides tools which allow the operators and users of Astra-Gemini to monitor the status of the laser.

The work with the Astra-Gemini Laser started in 2006, with an earlier system introduced in 2008. Since then, new requirements have emerged: to provide data to a researcher from anywhere in the world using a standard web browser; to ensure that the system can process the data more quickly than the instrument can generate it; and to implement the data management policy. Data management policies provide a set of access rules for the data and may include embargo periods when access is restricted to a particular group of researchers.

Figure 1.

The Astra-Gemini Laser collects "shot data", with images and traces of the laser's performance, and "environmental data", recording the environmental conditions in which the laser is operating. These are vital measurements to determine the laser's performance, and allow the operation of the laser to be developed and tuned. Past data needs to be kept so that new data can be compared to old and the performance of the laser analysed over time. These requirements, especially the volume of data and the speed of its collection, were challenging, stretched the current system and called for an innovative data management solution.

SCD also wanted to bring the CLF system into line with other developments, with a set of common tools shared with other facilities. This makes supporting the tools and sharing the data easier and more cost-effective. SCD has developed a core component for data management called the Information Catalogue (ICAT), which is in use at DLS, ISIS and CLF. ICAT is also in use at ILL (France) and SNS (USA), and experimental ICATs are in development at ten large European laboratories in Germany, France, Italy, Spain, Switzerland, Sweden and the United Kingdom. It makes sense to develop software in a collaborative environment involving other laboratories, to share code wherever possible between collaborating projects. Thus ICAT has been released as open source, hosted as a Google Code collaborative project.

Consequently, SCD has been developing a new data management system for Astra-Gemini using ICAT. The work with Astra-Gemini has gone through several phases. During 2010, the conceptual design, the proof of concept, and the detailed requirements were completed. During 2011, the new system was developed and tested using existing data. During 2012, the new system was run alongside the older system, showing that the new system is reliable and accurate. Since the beginning of 2013, the new system has been in continuous use and the earlier system has been decommissioned.

When writing the specifications for the software, two requirements were dominant: to avoid making changes to the software on Astra-Gemini; and to provide a tool to view and analyse the data, with comprehensive functionality for the user to interact with the data. The requirements document contained about 100 requirements for data management, and about 100 for data analysis. The resulting implementation has two main components: the CLF ICAT Chain (CIC) and eCAT.


The data management pipeline begins with raw data files (XML files, custom DAT files, images) being copied from the Data Acquisition computers in CLF onto the production machine in SCD. The CIC reads the files, extracting metadata on the parameters in the files, which it writes directly into ICAT. Copies of trace and image data files are passed directly to the ICAT Data Store (IDS), which registers the file in ICAT and creates two copies. One copy is kept locally for immediate retrieval upon request, and a second copy is moved, with a copy of the original raw data files, to DMF, a hierarchical storage management system supported by SCD which provides archival storage. Thus a copy of all the original data files is safely archived.

Figure 2.

Two web based applications have been created to enable the data to be displayed in a web browser. CLF's own "Penguin" application provides an up-to-the-minute status display to the operators whilst the laser is in operation. SCD has also developed the "eCAT" application, which provides search and filtering tools for the data, along with tools for analysis of images and trace data. Both tools use ICAT to locate the required datasets and then use the IDS to access image and trace data files.

In terms of architecture, CIC is a standalone multi-threaded Java application, ICAT is a Java Enterprise application running as a web service, IDS is a lightweight web service, and eCAT is a Java Enterprise application created using Google Web Toolkit widgets to provide a rich user interface. ICAT, IDS and eCAT run on the Glassfish Application Server and use the Oracle databases provided by the SCD Database Services team. The software is platform, application server and database independent and could be moved to other operating systems, application servers and databases.

The system holds all of the diagnostic data from January 2008 to the present day. So far 2.5 million environmental data datasets have been catalogued, along with 150,000 shot data datasets containing a total of over 4 million image and trace data files. Almost 50,000 daily data files are processed every day, in addition to shot data when the laser is in operation. Each shot produces around 30 image and trace data files, and the system is designed to cope with shots being fired at 20 second intervals. As well as being archived for safe storage, all of this data is available online via the eCAT web application and is protected by the relevant access permissions.

The success of the project has been the result of the close working of a team with members from both CLF and SCD. Early in the project, a complete and detailed set of requirements was agreed, and a well-defined set of innovative components designed and implemented. The system uses extensively the core computing infrastructure supplied by SCD, including storage, monitoring and system support, to provide a sound foundation for the service. Intensive testing on large volumes of legacy data, and extensive periods of parallel running to verify the operation of the system, have ensured the reliability and robustness of the result.

The system delivered to CLF for managing the Astra-Gemini data has been successful. The CIC has been able to cope with the speed and volume of data, and eCAT has provided the necessary responsiveness to study those data. The system also provides the flexibility and extensibility to roll out support to the other lasers within CLF. Thus the system allows the researchers in CLF to concentrate on the science of the laser, safe in the knowledge that their data is being looked after properly, and with the right tools for its analysis.
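The CIC's job, as described above, is essentially: read each raw file, extract parameter metadata, write the metadata to the catalogue, and hand the bulk data to the data store for replication. The sketch below mimics that flow with placeholder functions and an assumed XML layout; it does not use the real ICAT or IDS APIs (the production CIC is a multi-threaded Java application).

```python
# Sketch of the ingest flow described above: extract metadata from raw XML,
# register it in a catalogue, and keep two copies of each bulk data file.
# register_in_catalogue / store_with_two_copies are placeholders, not ICAT/IDS
# calls, and the <parameter name=... value=...> layout is an assumption.
import xml.etree.ElementTree as ET
from pathlib import Path
from shutil import copy2

def extract_parameters(xml_path):
    """Pull parameter name/value pairs out of a raw shot-data XML file."""
    root = ET.parse(xml_path).getroot()
    return {p.get("name"): p.get("value") for p in root.iter("parameter")}

def register_in_catalogue(dataset_name, parameters):
    # Placeholder for writing metadata into the catalogue (ICAT in production).
    print(f"catalogued {dataset_name}: {parameters}")

def store_with_two_copies(data_file, local_dir, archive_dir):
    # Placeholder for the data store keeping one local copy for fast retrieval
    # and one archival copy (DMF in production).
    for target in (local_dir, archive_dir):
        Path(target).mkdir(parents=True, exist_ok=True)
        copy2(data_file, target)

def ingest(shot_dir):
    shot_dir = Path(shot_dir)
    for xml_file in shot_dir.glob("*.xml"):
        register_in_catalogue(xml_file.stem, extract_parameters(xml_file))
    for data_file in list(shot_dir.glob("*.dat")) + list(shot_dir.glob("*.png")):
        store_with_two_copies(data_file, "local_store", "archive_store")
```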

Authors

A. Mills, K. Phipps, S. Fisher, V. Marshall, B. Matthews, STFC Rutherford Appleton Laboratory


Behind the scenes of the Hartree Centre - A hidden world of plant and infrastructure


Introduction
The Hartree Centre high-performance computing and storage facilities are fully described elsewhere [1]. In summary, they comprise:

1. A seven-rack IBM iDataplex compute cluster, based on Intel Sandybridge processors. This system has a theoretical performance of 200Tflop/s and features water-cooled rear doors.

2. A seven-rack IBM BlueGene/Q supercomputer. This system has a theoretical performance of 1.2Pflop/s and is 90% water-cooled.

3. Six racks of DataDirect Networks (DDN) high-performance disk storage, with associated filesystem and management servers, providing almost 6PB of capacity.

4. An IBM TS3500 tape library that holds 10,000 tapes, providing 15PB capacity.

There is a considerable amount of plant and infrastructure required to support such systems, and this article attempts to highlight this normally unseen area of high-performance computing.

Cooling
Both of the large compute systems are in some way water-cooled, because the amount of heat generated is such that air-cooling alone is impractical. The iDataplex system typically emits approximately 100kW of heat under high user load, although we did see up to 180kW during testing at installation time. The BlueGene/Q currently generates approximately 450kW of heat under high user load, and we saw up to 620kW during testing.

Water cooling is provided by two underfloor water loops that between them can remove 2.5MW of heat. We have two loops because one of them is used to cool the BlueGene/Q, which requires pure water. This loop can dispel up to 1.5MW of heat. The main loop has an antifreeze additive to protect against winter temperatures, and supplies the remaining cooling capacity up to 2.5MW. This loop is used by the iDataplex system. At the time of writing, the BlueGene/Q loop is running at 22degC, and the main loop at 16.5degC, although we have plans to raise the latter temperature to improve efficiency.

Cooling of the water in the loops is primarily by large chillers in the plant compound behind the server rooms. We require three chillers to be in operation to supply the full 2.5MW of cooling; however we actually have four in order to provide resilience. These units can operate throughout the year, in ambient temperatures down to -18degC. However, when outside temperatures permit, most of the cooling effect is provided by three cooling towers, which use low power fans to draw ambient air past vanes through which the water is drawn. Typically these towers provide enough cooling for the chillers to lie idle from autumn through to spring. The advantage, of course, is that the cooling towers and their fans require much less electrical power than the chiller units.

Electrical
The Hartree Centre power distribution is based on infrastructure originally implemented for the HPCx national supercomputing facility. We have two 2.2MW 11kV-415V transformers feeding a large facility distribution board in the plant compound. The distribution system was modified with the addition of uninterruptible power supplies (UPS) in preparation for the arrival of the Hartree Centre compute equipment. We have four 500kVA battery UPS running in a full-parallel configuration delivering 2MW for compute and storage systems. We also have a single 250kVA battery UPS for most of the water cooling systems: pumps, cooling towers, air-handling units and the control system. The water chillers themselves are not UPS-backed.

All the UPS systems are specified to provide sufficient power to the compute and storage systems to allow 11 minutes of operation at full load. In the event of a major supply failure, this is enough time to allow us to gracefully power down all critical systems. Electrical supplies to the Hartree Centre systems are linked to Emergency Power-Off (EPO) circuits in the building, so that in the event of an emergency a simple button-push will kill all the systems immediately.

We are often asked how much power the BlueGene/Q actually draws. It requires about the same amount of electrical power to run as the heat that is dissipated from it; so approximately 450kW under high load. Similarly, the iDataplex requires approximately 100kW.

Power Usage Effectiveness
Power Usage Effectiveness (PUE) is a commonly-used measure of how much electrical power is used to actually run the computer systems themselves, compared against the total power drawn, which includes the cooling system and UPS losses [2]. It is calculated as [total facility power / IT equipment power]. Older data centres could easily have a PUE in excess of 2.0. Modern commercial data centres typically aim to run at values in the range 1.4-1.7.


Our first set of power usage data, covering a winter through to a summer season, gives a figure of 1.2. This is an excellent start, which matches the design goal, and we have plans to improve this figure through tuning of operational parameters.

BMS
We have a Building Management System (BMS) which monitors all critical temperature and flow parameters within the two water cooling loops. It controls the speed of all pumps and fan motors so that the correct cooling is applied, without incurring extra power costs in running at greater capacity than required. The BMS effectively balances the "free cooling" from the cooling towers with chiller operation when required. It also reports on power consumption at key locations in the distribution network and logs all data to disk for subsequent analysis.

Computer systems UPS switchgear.

Other systems
Fire detection is provided by a network of interlinked sensors throughout the machine rooms. In order to trigger a fire alarm, at least two sensors must be triggered. This avoids false alarms from individual faulty sensors.

We also have water leak detection under the machine room raised floor, which triggers an alarm in the event of a leak or burst of the water cooling pipes. The under-floor surface is specially prepared with sealant so that it can act as a tank to contain any water spillage. The external plant compound has to provide rainwater drainage, but must also be capable of preventing the escape of antifreeze-doped water into the drains, should a leak occur. Both cooling loops incorporate pressure sensors which trigger the BMS to close a safety shut-off valve if a major water pipe burst occurs. The contaminated water is then held within the bunded compound until appropriate disposal can be arranged. The BMS will also stop all pumps and close all valves to further contain and isolate any major leak.

Security
The High Performance Systems Group takes the physical (and logical) security of our customers' data very seriously. For that reason, there are a number of security systems in place around the Hartree Centre machine and plant rooms. We cannot provide details, for obvious reasons, but it is safe to say that there are a wide variety of physical barriers, electronics and foot patrols protecting our equipment.
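To put the PUE figure into numbers: using the loads quoted in this article (approximately 450 kW for the BlueGene/Q and 100 kW for the iDataplex under high load) as an approximate IT load, a PUE of 1.2 implies the following overhead. The 550 kW IT figure is assembled from those two numbers for illustration; it ignores storage and servers and is not a measured total.

```python
# PUE = total facility power / IT equipment power.
# IT load approximated from the article's ~450 kW (BlueGene/Q) plus ~100 kW
# (iDataplex); storage and ancillary servers are ignored for simplicity.
it_load_kw = 450 + 100
pue = 1.2

total_facility_kw = pue * it_load_kw
overhead_kw = total_facility_kw - it_load_kw

print(f"total facility power ~ {total_facility_kw:.0f} kW")  # ~660 kW
print(f"cooling/UPS overhead ~ {overhead_kw:.0f} kW")         # ~110 kW
```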

Authors

D. Cable and S. Hill, STFC Daresbury Laboratory

References

[1] http://community.hartree.stfc.ac.uk/wiki/site/admin/resources.html
[2] http://en.wikipedia.org/wiki/Power_usage_effectiveness


Exploiting Multi-core Processors for Scientific Applications Using Hybrid MPI-OpenMP


Introduction
The current trend for supercomputer architectures is towards systems with a large number of energy efficient computational units (cores). They are aggregated in shared memory nodes that, in full generality, could contain multiple processors and accelerators, each with multiple cores that share caches at different levels and are connected to multiple memory controllers with affinities to subsets of cores. Decreasing memory bandwidth per core and non-uniform memory access times are critical hardware features that must be handled to gain efficient parallel performance. This situation imposes a strong evolutionary pressure on numerical algorithms and coding techniques, where execution speed increases only if applications efficiently utilise the available memory and bandwidth in order to keep the computing cores busy.

For these supercomputer architectures a 'hybrid' or 'mixed-mode' programming approach can be used: threaded (OpenMP) parallelism is exploited at the node level and MPI is used for inter-node communications. Portability across different HPC systems is also very important for application software packages, and a directives-based approach often expresses parallelism in a highly portable manner. Directives-based OpenMP is expanding its scope to embedded systems and accelerators, and therefore offers the potential to use the same code base to explore accelerator and non-accelerator enabled systems.

The hybrid approach may yield other significant benefits. First of all, the memory footprint of the application may be decreased in comparison with a pure MPI approach. Secondly, in grid based programs, the memory footprint is further decreased through the removal of 'halo' regions, which would otherwise be required within the node with pure MPI. The total size of the mesh halo increases with the number of partitions (i.e. the number of MPI tasks); it can be shown empirically that the size of the vertex halo in a linear tetrahedral mesh grows as O(P^1.5), where P is the number of partitions. Finally, only one task per node will be involved in I/O (in contrast to the pure MPI case, where potentially 32 tasks per compute node could be performing I/O). This significantly reduces the number of metadata operations on the file system at large process counts.

In this report we highlight our progress in implementing hybrid OpenMP-MPI versions of several software packages: Fluidity-ICOM, DL_MG and RAD. Overall we show that the hybrid codes can use more computational power with better efficiency if the coupling between MPI and OpenMP is implemented carefully. In particular, when using a relatively large number of threads, using non-blocking algorithms and libraries with proper NUMA optimization is paramount to hybrid performance scalability. All results shown here were collected from the 'HECToR' UK national supercomputing service Cray XE6 machine. All programs were compiled using the Cray compiler.

Fluidity/ICOM: Developing hybrid MPI/OpenMP parallelism for next generation ocean modelling software
Fluidity-ICOM is built on top of Fluidity¹, an adaptive unstructured finite element code for computational fluid dynamics. It consists of a three-dimensional non-hydrostatic parallel multiscale ocean model, which implements various finite element and finite volume discretisation methods on unstructured anisotropic adaptive meshes, so that a very wide range of coupled solution structures may be accurately and efficiently represented in a single numerical simulation without the need for nested grids. It is used in a number of different scientific areas including geophysical fluid dynamics, computational fluid dynamics, ocean modelling and mantle convection. Fluidity-ICOM uses state-of-the-art and standardised third party software components whenever possible. For example, PETSc² is used for solving sparse linear systems, while Zoltan³ is used for many critical parallel data-management services, both of which have compatible open source licenses. Fluidity-ICOM is coupled to a mesh optimisation library allowing for dynamic mesh adaptivity.

Previous performance analyses [1] have already shown that the two dominant simulation costs are sparse matrix assembly (30%-40% of total computation) and solving the sparse linear systems defined by these equations. Therefore, the sparse matrix assembly kernels and the sparse linear solvers are the most important components to be parallelised using OpenMP.

¹ http://amcg.ese.ic.ac.uk/index.php?title=Fluidity
² http://www.mcs.anl.gov/petsc
³ http://www.cs.sandia.gov/Zoltan


The non-blocking finite element matrix assembly approach
Non-blocking finite element matrix assembly can be realised through well-established graph colouring techniques. This is implemented by first forming a graph, where the nodes of the graph correspond to mesh elements, and the edges of the graph define data dependencies arising from the matrix assembly between elements. Each colour then defines an independent set of elements whose terms can be added to the global matrix concurrently. This approach removes data contention (so-called critical sections in OpenMP), allowing very efficient parallelisation.

To parallelise matrix assembly using graph colouring techniques [2], a loop over colours is first added around the main assembly loop. The main assembly loop over elements is parallelised using OpenMP parallel directives with a static schedule. This divides the loop into chunks of size ceiling(number_of_elements/number_of_threads) and assigns a thread to each separate chunk. Within this loop an element is only assembled into the matrix if it has the same colour as the current colour iteration.

Figure 1 shows that matrix assembly scales well up to 32768 cores. The speedup of the mixed-mode implementation is 107.1, compared with 99.3 for pure MPI, when using 256 nodes (8192 cores). This is due to the use of local assembly, which makes this part of the code essentially a local process. The hybrid mode performs slightly faster than pure MPI, which can scale well up to 32768 cores. The details can be found in the references [3, 4].

The sparse linear systems defined by the underlying partial differential equations are solved using threaded PETSc, and HYPRE is also utilised as a threaded preconditioner through the PETSc interface. Since unstructured finite element codes are well known to be memory bound, particular attention has to be paid to ccNUMA architectures, where data locality is particularly important to achieve good intra-node scaling characteristics.

Figure 2 shows the total Fluidity-ICOM run-time and parallel efficiency. Clearly pure MPI runs faster up to 2048 cores. However, because the halo size increases rapidly with the number of MPI tasks, the cost of MPI communication becomes dominant from 4096 cores onwards, where the mixed mode begins to outperform the pure MPI version.

With a full implementation of mixed mode MPI/OpenMP, and building upon several years of optimization projects [1][2], Fluidity-ICOM can now run well above 32K core jobs, which offers Fluidity-ICOM the capability to solve "grand-challenge" problems.
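A minimal sketch of the colouring step, assuming a toy triangular mesh: two elements conflict (share a graph edge) if they touch a common mesh node, since both would add contributions to that node's entries of the global matrix. Each colour is then an independent set that can be assembled concurrently, which is the property the OpenMP loop over colours exploits. (The production code uses Fortran with OpenMP; Python is used here only to show the algorithm.)

```python
# Greedy colouring of the element-conflict graph for concurrent assembly.
# Elements sharing a mesh node conflict and must receive different colours;
# each colour class can then be assembled without critical sections.
from collections import defaultdict
from itertools import count

# Toy mesh: each element is the tuple of mesh-node ids it touches.
elements = {0: (0, 1, 2), 1: (1, 2, 3), 2: (2, 3, 4), 3: (4, 5, 6)}

# Build the conflict graph: elements sharing any node are neighbours.
node_to_elems = defaultdict(set)
for elem, nodes in elements.items():
    for n in nodes:
        node_to_elems[n].add(elem)

neighbours = defaultdict(set)
for elems in node_to_elems.values():
    for e in elems:
        neighbours[e] |= elems - {e}

# Greedy colouring: give each element the smallest colour unused by its neighbours.
colour = {}
for e in elements:
    used = {colour[n] for n in neighbours[e] if n in colour}
    colour[e] = next(c for c in count() if c not in used)

# Assembly then loops over colours; elements of one colour are independent
# and could be processed by different OpenMP threads without contention.
by_colour = defaultdict(list)
for e, c in colour.items():
    by_colour[c].append(e)
print(dict(by_colour))   # {0: [0, 3], 1: [1], 2: [2]} (colour -> independent set)
```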

Figure 1: Matrix Assembly Performance Comparison on HECToR XE6-Interlagos.


Figure 2: Strong scaling results for the whole Fluidity-ICOM up to 256 XE6 nodes (8192 cores). All hybrid modes use 4 MPI ranks per node and 8 threads per rank.

DL_MG: a hybrid parallel multigrid solver for the Poisson-Boltzmann equation

Introduction
Multigrid solvers combine highly flexible techniques for optimal, or near optimal, solution of large sets of linear and non-linear equations, typically derived from several classes of partial differential equations [5]. Another important feature for today's architectures is that multigrid can use algorithms with multiple levels of parallelism, e.g. the grid over which the solution is sought can be partitioned over MPI tasks, which exchange only halos with their topological neighbours, while the update operations for smoothing, restriction and prolongation can be parallelised further with OpenMP threads. Amongst the equations of interest that can be solved with multigrid techniques, the Poisson Equation (PE) and the Poisson-Boltzmann Equation (PBE) have a central position. For example, in classical and quantum mechanical simulations the solution of the PE or PBE is needed to model systems with electrostatic interaction. For this kind of problem, speed and scalability are fundamental requirements for a suitable multigrid solver, as the electrostatic potential needs to be computed repeatedly during MD simulation steps or quantum mechanical iterative solution steps.

DL_MG description
DL_MG is a multigrid solver for the PE and PBE under development at SCD as a HECToR dCSE project, with the main purpose of supporting the CASTEP and ONETEP quantum density functional codes for calculations which include solvent effects [6]. DL_MG's main components were selected from the standard choices recommended in the literature. Coarsening is achieved by doubling the lattice constant in all dimensions, whilst smoothing uses a Gauss-Seidel red-black method. Inter-grid transfers are performed with half-weight restriction and bilinear interpolation. With the above components one can build a close to optimal solver in the case of the PE and PBE, provided that the models used for the permittivity and charge density are smooth and without strong anisotropies [5].

The MPI parallelism is used for data decomposition. Here the cuboid global grid is distributed amongst MPI ranks using a 3D topology. As the coarse grids are derived from the fine grids by removing the points with even coordinates in all directions, no data is exchanged between MPI ranks during inter-grid transfers. The number of MPI ranks can vary across multigrid levels in the following way: if, at a coarsening step, a rank gets zero grid points in any direction, it is left inactive in the current level and the subsequent levels. MPI communication at each level is performed in a separate communicator that includes only the active ranks. Domain halos are exchanged with non-blocking send/receives.

OpenMP is implemented using only one parallel region that covers the V-cycles loop. The local domain is decomposed into thread blocks with an algorithm that ensures equal work for all threads, even in the case of very thin local domains (a situation met in ONETEP). Block sizes can also be tuned for better cache utilisation. A first-touch policy is used to ensure optimal memory access on NUMA architectures. MPI communication inside the OpenMP region is handled by a master thread (the so-called funnelled mode), mainly for reasons of portability. Data transfers between MPI buffers and halos are performed using one thread per local grid side with the help of "single" directives.
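For readers unfamiliar with the components named above, the sketch below is a bare-bones V-cycle for the 1D Poisson equation with odd-even ("red-black") Gauss-Seidel smoothing and simple linear transfer operators. It is a model of the algorithm only, not DL_MG: the real solver works on 3D grids with half-weight restriction and bilinear interpolation, handles the Poisson-Boltzmann case, and is parallelised with MPI and OpenMP in Fortran.

```python
# Model 1D multigrid V-cycle for -u'' = f with zero Dirichlet boundaries:
# odd-even ("red-black") Gauss-Seidel smoothing, full-weighting restriction,
# linear interpolation. A didactic sketch, not DL_MG's 3D implementation.
import numpy as np

def smooth(u, f, h, sweeps=2):
    for _ in range(sweeps):
        for start in (1, 2):                       # odd ("red") points, then even ("black")
            u[start:-1:2] = 0.5 * (u[start - 1:-2:2] + u[start + 1::2]
                                   + h * h * f[start:-1:2])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):                                   # fine -> coarse
    return np.concatenate(([0.0],
                           0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2],
                           [0.0]))

def prolong(e):                                    # coarse -> fine
    fine = np.zeros(2 * (len(e) - 1) + 1)
    fine[::2] = e                                  # coarse points sit on even fine points
    fine[1::2] = 0.5 * (e[:-1] + e[1:])            # interpolate the points in between
    return fine

def v_cycle(u, f, h):
    if len(u) <= 3:                                # coarsest grid: one interior point, solve exactly
        u[1] = 0.5 * h * h * f[1]
        return u
    u = smooth(u, f, h)                            # pre-smoothing
    coarse_residual = restrict(residual(u, f, h))  # transfer residual to the coarse grid
    correction = v_cycle(np.zeros_like(coarse_residual), coarse_residual, 2.0 * h)
    u += prolong(correction)                       # coarse-grid correction
    return smooth(u, f, h)                         # post-smoothing

n = 2 ** 7 + 1                                     # grid points including the boundaries
x = np.linspace(0.0, 1.0, n)
f = np.pi ** 2 * np.sin(np.pi * x)                 # exact solution is sin(pi * x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, 1.0 / (n - 1))
print(np.max(np.abs(u - np.sin(np.pi * x))))       # small (discretisation-level) error
```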


Scaling performance
We present scaling performance data for DL_MG for a representative Poisson problem extracted from a solvent computation currently done with ONETEP [7]. This application uses a 1D MPI domain decomposition of its global grid; furthermore, the thickness of the local domain cannot be reduced to size 1 without significant performance loss, because of the high order discretisation needed for certain derivatives in the defect correction computation. In this context, the OpenMP parallelism inside each MPI domain remains one of the options to increase computational parallelism.

Figure 3 shows the total solver speed (1/time) for 3 MPI grid partitions to which OpenMP threads are added. One can see that OpenMP scaling is very good up to 4 threads and acceptable with 8 threads, and it stays positive across the whole NUMA node. One can also see that OpenMP scales better if the local domains are larger.

A DL_MG internal timer was used to measure the time spent in the multigrid components (computation and MPI communication) at each multigrid level. Figure 4 shows the scaling of computation and communication for the smoother component, which takes the largest share of the DL_MG run time. It is worth noticing that the communication time also scales with the number of threads inside a NUMA node. This is due to the reduced data transfer inside the compute node. It is also to be noted that the degradation of the total speed scaling in Figure 3 is correlated with the loss of scaling for the communication sectors in Figure 4. This points to where one has to look to improve OpenMP scalability further.

RAD: R-matrix Inner Region code
We show results for the hybrid MPI/OpenMP inner region 'RAD' code, which has been developed as part of a recent HECToR dCSE project [8]. This application code is part of a suite of programs used to calculate electron-atom scattering data, which is essential in the analysis of important physical phenomena in many scientific and technological areas, such as the interpretation of astrophysical data, diagnostics of impurities in nuclear fusion reactors and the development of safer alternatives to mercury vapour lighting. In particular, the codes are used on HECToR by the UK-RAMP project and the 'Atoms for Astrophysics' project.

The original code had been parallelised for shared memory architectures only and is therefore labelled 'Pure OpenMP' in the figure below. Consequently, the problem size was limited by the availability of memory on a single node (32GB). The other performance results refer to the new hybrid MPI/OpenMP code following the introduction of the MPI communications, where problem size is now only limited by the memory available from the whole machine (90 TB).

Figure 3. DL_MG speedup vs number of threads for 3 MPI domain partitions (32, 64 and 128 ranks). Global grid size 449x545x609. Runs were performed on HECToR Cray XE6 with default Cray compiler optimisation.

Figure 4. Time spent in the smoother components of DL_MG for the top 3 levels (highest has the finest grid). In the key the digit represents the multigrid level, P stands for compute time (Processing), T for communication (Transfer). Runs were performed on HECToR Cray XE6 with default Cray compiler optimisation.


The main code developments for implementing this new level of parallelization in RAD involve (i) distributing input configurations to all MPI tasks at the start of the run, (ii) the replacement of temporary intermediate files with memory-based storage and (iii) a distribution of outer-loop angular momentum l values amongst MPI tasks for both basis function and radial integral calculations. Note that this option not only allows for the use of more than one node, but can also improve overall performance by determining optimal MPI task/OpenMP thread combinations within a node.

The results presented in Figure 5 show that the introduction of the MPI tasks, complementary to the OpenMP thread-based parallelization, has significantly improved performance within one node, and the performance scales acceptably to four nodes, whilst increasing the memory available per run from 32GB to 128GB. Performance analyses have shown that it becomes more difficult to load-balance MPI tasks to iterations efficiently for this dataset as the number of MPI tasks increases, and this affects parallel scaling efficiency at higher core counts.

The iterations were assigned to MPI tasks in round-robin fashion and, while the overall time to completion was stable for repeated testing, the load-balancing between tasks becomes progressively worse: from about 10% difference for 2 MPI tasks, to some tasks taking nearly twice as long as others for 16 MPI tasks. These timing differences reflect the different computational loads associated with the different l values assigned to the different MPI tasks. These issues are being addressed by the introduction of passive remote memory access, which has proved to be very efficient at removing load imbalance in the related R-matrix code ANG.
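The effect of the round-robin assignment can be sketched as follows. The cost model here is purely illustrative (an arbitrary cost growing with l), and the function names are invented for the example; it only shows why imbalance grows as the iterations per task shrink.

def round_robin(l_values, ntasks):
    """Assign outer-loop l values to MPI tasks in round-robin order."""
    buckets = [[] for _ in range(ntasks)]
    for i, l in enumerate(l_values):
        buckets[i % ntasks].append(l)
    return buckets

def imbalance(buckets, cost):
    """Ratio of the slowest task's workload to the mean workload."""
    loads = [sum(cost(l) for l in b) for b in buckets]
    return max(loads) / (sum(loads) / len(loads))

# Hypothetical per-iteration cost growing with l: with more tasks each task
# holds fewer iterations, so a single expensive l value dominates it.
cost = lambda l: (l + 1) ** 2
for ntasks in (2, 4, 8, 16):
    print(ntasks, round(imbalance(round_robin(range(8), ntasks), cost), 2))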

Figure 5. Performance results from RAD mixed-mode tests (8 iterations). Runs performed on Cray XE6 with 32 cores per shared memory node, -O3 level of optimisation (Cray compiler).

Authors

X. Guo, L. Anton, A. Sunderland, STFC Daresbury Laboratory

Acknowledgements and References

[1] Xiaohu Guo, G. Gorman, M. Ashworth, S. Kramer, M. Piggott, A. Sunderland, "High performance computing driven software development for next-generation modelling of the World's oceans", Cray User Group 2010: Simulation Comes of Age (CUG2010), Edinburgh, UK, 24th-27th May 2010
[2] Xiaohu Guo, G. Gorman, M. Ashworth, A. Sunderland, "Developing hybrid OpenMP/MPI parallelism for Fluidity-ICOM - next generation geophysical fluid modelling technology", Cray User Group 2012: Greengineering the Future (CUG2012), Stuttgart, Germany, 29th April-3rd May 2012
[3] Xiaohu Guo, Gerard Gorman, Michael Lange, Lawrence Mitchell, Michele Weiland, "Exploring the Thread-level Parallelisms for the Next Generation Geophysical Fluid Modelling Framework Fluidity-ICOM", Procedia Engineering, 2013, vol. 6, pages 251-257
[4] Xiaohu Guo, Gerard Gorman, Michael Lange, Lawrence Mitchell, Michele Weiland, "Developing the multi-level parallelisms for Fluidity-ICOM - Paving the way to exascale for the next generation geophysical fluid modelling technology", submitted to Advances in Engineering Software
[5] U. Trottenberg, C. W. Oosterlee, A. Schuller, "Multigrid", Academic Press, 2001.
[6] Jacek Dziedzic et al, "Large-Scale DFT Calculation in Implicit Solvent—A Case Study on the T4 Lysozyme L99A/M102Q Protein", International Journal of , 2012.
[7] Jacek Dziedzic, private communication; http://www2.tcm.phy.cam.ac.uk/onetep.
[8] C.J. Noble, A.G. Sunderland, M. Plummer, "Combined-Multicore Parallelism for the UK electron-atom scattering Inner Region R-matrix codes on HECToR", dCSE Report, HECToR website, http://www.hector.ac.uk/cse/distributedcse/reports/prmat2/


Computational Biology goes HPC


Biology is facing a revolution as improvements in instrumentation lead to an explosion in the amount of experimental data generated, data which require interpretation and linking together. Historically, much of the available software consists of small programs and scripts suitable for desktop computing. While these programs run well, and are user friendly for the lab biologist, they are not designed to cope with the large datasets being produced today. There is an urgent need for a next generation of software, appropriate to the data-orientated research environment that now exists.

Within this context, the computational biology group in SCD is looking at a range of advanced computing solutions for areas of modern biology. As an activity of the Hartree Centre, we are looking at next generation sequencing. The rapid advancement of sequencing technologies and the tremendous decrease in the cost of biological sequencing have brought large-scale projects within the grasp of more labs. Sequencing is involved in a wide range of techniques, with the well-known genome assembly being only one. One major application is RNA-Seq, in which RNA from expressed genes is sequenced in order both to identify gene products and to estimate their abundance in different tissues and samples. RNA-Seq has advantages over the traditional microarray technology in that it can identify novel transcripts and has a larger dynamic range for expression levels. However, the data analysis can be more challenging due to the alternative splicing present in higher organisms, leading to multiple transcripts for each gene, which need to be distinguished.

We are currently working to understand the computational requirements of RNA-Seq, and to determine how HPC resources such as those provided by the Hartree Centre can help. Scientific drivers come from collaborators at Rothamsted Research [1] and the University of Liverpool. Rothamsted are working on the wheat transcriptome, as part of their 20:20 Wheat® initiative to increase wheat productivity to yield 20 tonnes per hectare in 20 years. In the first instance, we are working with the Trinity RNA-seq de novo assembler, which consists of three independent modules applied sequentially: Inchworm, Chrysalis and Butterfly. While some parts of the pipeline have been parallelised (via OpenMP or task farming of parallel processes), it is assumed to run on a shared memory architecture.

The runtime and memory characteristics of Trinity are heterogeneous, as illustrated in Figure 1. The first module has a large memory requirement, with the rule of thumb being 1 GB of RAM for every million reads (see Figure 1a). Datasets of several hundred million reads are not uncommon, and so the memory requirement can become a limiting factor. We are exploring ScaleMP software [2] as one possible solution. ScaleMP can aggregate memory from a large number of compute nodes into a single addressable image. In Memory Expansion mode, the processors on all but one node are disabled, with only the memory of those nodes being used.
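The rule of thumb above translates into a very quick feasibility check; the only numbers used below are those quoted in the text, and the function name is invented for the example.

def trinity_memory_gb(n_reads):
    """Rule-of-thumb Inchworm memory estimate: ~1 GB of RAM per million reads."""
    return n_reads / 1.0e6

# ~130 million reads (the sugar beet dataset discussed below) -> ~130 GB,
# which fits on a 256 GB node; 500 million reads (~500 GB) would not.
print(trinity_memory_gb(130e6), trinity_memory_gb(500e6))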
GraphFromFasta and ReadsToTranscripts in the Chrysalis module are typically the most time-consuming steps. Figure 1b shows an example of an input dataset consisting of ~130 million RNA-seq reads (from a sugar beet crop) processed on a single node with 16 cores and 256 GB RAM, for which the overall runtime was ~2.5 days. We are beginning to look at parallelising parts of Chrysalis for a distributed memory architecture so that the compute-intensive stages can be distributed across large compute clusters.

Finally, many steps of the Trinity pipeline produce large numbers of small ASCII-based files as intermediate outputs.

Figure 1: Memory and CPU characteristics of the Trinity pipeline. a) The physical memory usage as a function of runtime. b) The number of processes/threads as a function of runtime. c) Legend for the different stages of Trinity. Figures were produced by the collectl utility.


Figure 2: The visualisation wall at Daresbury, showing imaging at different scales. From left to right: electron microscopy of a bacterial needle, electron tomography of HIV virions, and a crystallographic model of the kinase domain of EGFR.

Although recent efforts by the Trinity developers are reducing the I/O requirements, it remains an issue. We are currently looking into alternatives to traditional disk-based files for intermediate storage.

While software analysis tools provide a basis for interpretation, visualisation and comparison of datasets remains an important step in developing an understanding of the biology. We have therefore begun exploring how the Hartree Centre visualisation suites can add value for scientists. Existing desktop programs already provide good visualisation; for example, molecular graphics programs such as CCP4mg and Coot allow a detailed exploration of protein structure, including stereo views. Large visualisation suites provide a more immersive environment. While this may not be crucial for experienced computational scientists, it is likely to be useful for explaining results to collaborators.

The other key feature of visualisation suites is the large real estate available. This may be useful for single large datasets, for example a complete genome consisting of millions or billions of nucleotides. More importantly, it is useful for comparing and cross-referencing datasets. Structural biology is increasingly multi-disciplinary. Figure 2 shows three example datasets representing different levels of detail.

Macromolecular crystallography software has traditionally run on desktops and laptops, and after initial data reduction this is usually not a problem. However, in recent years a number of automation schemes have arisen which rely on many trials in which workflows or parameters are varied. While the problem is usually embarrassingly parallel, access to compute clusters is usually required. One such approach is AMPLE [3], described in last year's Highlights.

With advances in computer hardware, the CPU requirements of these automated approaches should not be an issue, and yet access to local compute clusters can be a problem for individual scientists. Public web services backed by good compute resources can be valuable. The CCP4 team within the computational biology group have recently set up such a web service, with an initial focus on molecular replacement pipelines. Figure 3 shows the interface for the Balbes pipeline [4]. The webserver is a Java servlet application. It makes use of some of the modern Java frameworks, such as Tapestry, Hibernate and Shiro, and will implement APIs following the RESTful design model. Submitted jobs are run on a separate Linux cluster, with results transferred back to the webserver when the job is finished.

Figure 3: Balbes server front end.

However, public web services are not ideal where data are sensitive. Secure cloud services are preferable, and this has been identified as a priority area for the CCP4 project.
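The submit-then-poll pattern described above can be sketched from the client side as follows. This is only an illustration of the RESTful design model: the base URL, endpoint paths and response fields are entirely hypothetical and are not the actual Balbes/CCP4 API.

import time
import requests  # third-party HTTP client

BASE = "https://example.org/mr-service"   # hypothetical service URL

def submit_and_wait(mtz_path, seq_path, poll=30):
    """Submit a molecular-replacement job and poll until the back-end cluster finishes."""
    with open(mtz_path, "rb") as mtz, open(seq_path, "rb") as seq:
        r = requests.post(f"{BASE}/jobs", files={"reflections": mtz, "sequence": seq})
    r.raise_for_status()
    job_id = r.json()["id"]                # hypothetical response field
    while True:
        status = requests.get(f"{BASE}/jobs/{job_id}").json()["status"]
        if status in ("finished", "failed"):
            return status                  # results are copied back when the job ends
        time.sleep(poll)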

Authors

C. Sik Kim and M. Winn, STFC Daresbury Laboratory

Acknowledgements and References

[1] http://www.rothamsted.ac.uk/
[2] http://www.scalemp.com/
[3] J. Bibby, R. M. Keegan, O. Mayans, M. D. Winn and D. J. Rigden, Acta Cryst. D68, 1622-1631 (2012)
[4] http://www.ccp4.ac.uk/BALBESSERV


Parallel hypersonic CFD simulations using FLASH code


FLASH [1] is an open-source code maintained by the Flash Center at the University of Chicago, designed to solve compressible reactive flow, originally in the context of astrophysics problems. It consists of several inter-operable modules, such as hydrodynamic, material property and nuclear physics solvers, that can be combined to generate different applications for solving problems in cosmology, high energy density physics, etc. Although FLASH has a compressible hydrodynamics module based on the finite volume method, it has not yet been applied to practical high-speed computational fluid dynamics (CFD) simulations. This is mainly due to the fact that FLASH employs a Cartesian grid structure, which means that special schemes need to be devised to embed geometries of arbitrary shape. In addition, as FLASH was originally developed as an astrophysics code, it is not designed to handle any viscous-wall effects.

The extensions to the FLASH code reported here have enabled us to simulate high-speed compressible viscous flows past arbitrary stationary rigid bodies. We have implemented computer graphics algorithms in FLASH to generate grids around arbitrary complex 3D polyhedral shapes. The body shape is represented in a block-structured Cartesian adaptive mesh refinement grid using PARAMESH [2], provided in FLASH.

Figure 1: AMR Cartesian grid demarcating the solid domain (denoted in red color) and the fluid domain.

The algorithm is based on a point-in-polyhedron test using spherical polygons, proposed by Carvalho and Cavalcanti [3]. As input to the algorithm, one needs to define all faces of the 3D geometry by specifying the co-ordinate points of each face, listed according to the orientation with respect to the outward normal at that face. Each face of the polyhedron is then projected onto a unit sphere and the resulting signed area of the spherical polygon determines whether the grid point falls inside the fluid or the solid domain.
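One common way to realise a test of this kind is to sum the signed solid angles subtended at the query point by a triangulation of the closed surface: the total is about 4*pi for an interior point and about 0 for an exterior one. The sketch below illustrates that idea with the van Oosterom-Strackee formula; it is an assumption-laden stand-in for, not the exact formulation of, the spherical-polygon scheme of [3] used in the FLASH extension.

import numpy as np

def solid_angle(tri, p):
    """Signed solid angle of triangle tri (3x3 array of vertices) seen from point p."""
    r = tri - p
    n = np.linalg.norm(r, axis=1)
    numer = np.linalg.det(r)                      # scalar triple product r1 . (r2 x r3)
    denom = (n[0] * n[1] * n[2] + np.dot(r[0], r[1]) * n[2]
             + np.dot(r[1], r[2]) * n[0] + np.dot(r[2], r[0]) * n[1])
    return 2.0 * np.arctan2(numer, denom)

def point_inside(triangles, p):
    """True if p lies inside the closed surface given as outward-oriented triangles."""
    total = sum(solid_angle(t, p) for t in triangles)
    return abs(total) > 2.0 * np.pi               # ~4*pi inside, ~0 outside

# For a unit cube triangulated into 12 outward-oriented faces,
# point_inside(cube_triangles, np.array([0.5, 0.5, 0.5])) would return True.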

Figure 2: Computed pressure contours past a blunted cone-cylinder-flare geometry at a Mach number (Ma) of 5.


Figure 3: Flow past a double-cone geometry at Ma=13 (a) velocity contours and (b) computed numerical Schlieren.

An illustration of the demarcation between fluid and solid regions in an Adaptive Mesh Refinement Cartesian grid is shown in Figure 1. A major advantage of this scheme is that mesh generation is fairly simple and extremely quick, as there is no time-consuming phase associated with surface mesh generation and the associated volume mesh. The limitation is that, as the grid is non-body-fitted, the body shape is represented by stair steps. A high level of refinement is therefore needed for an accurate body shape representation, which can result in very large grid sizes, especially for 3D simulations. This can be resolved by resorting to high performance parallel computing. We also invoke the material property-viscosity module in FLASH to solve the Navier-Stokes equations and implement appropriate boundary conditions at the fluid-solid interface to handle viscous wall effects [4].

We have applied these modifications to carry out CFD simulations past several 2D and 3D body shapes which are characteristic of hypersonic flight applications. Two-dimensional axi-symmetric Navier-Stokes simulations for hypersonic flow past blunted-cone-cylinder (HB2) and double-cone geometries have been done and some representative computed flow results are shown in Figures 2 and 3.

We have also carried out a 3D Navier-Stokes simulation of a recent experimental study [5] on micro vortex generators (MVG) for hypersonic flow control, carried out by the aerospace research institute at the University of Manchester. It is well known that shock-boundary layer interactions (SBLI) cause adverse pressure gradients and boundary layer separation, leading to reduced propulsive efficiency in hypersonic vehicles. The micro-ramp is a recently introduced passive flow control device which has been shown to suppress the adverse effects of SBLI. The computed flow-field in the vicinity of the MVG for a Mach number of Ma=5 is shown in Figure 4. The flow results indicate the generation of a pair of counter-rotating primary vortices which remain in the boundary layer for a relatively long distance before being lifted off further downstream.


These vortices improve the health of the boundary layer by improving flow transport between the high-shear and low-shear regions near the wall, thereby reducing the chance of flow separation. Additionally, the computed flow patterns confirm the presence of horseshoe vortices and a pair of secondary vortices. These trends observed in our simulations are very much consistent with the experimental findings.

A parallel performance study of the FLASH code has been carried out on Blue Joule [6], which is STFC's IBM Blue Gene/Q system. The variation of the CPU time taken with increasing cells per block, as a function of the number of cores, for a FLASH 3D MVG simulation is shown in Figure 5. The scalability of the code is shown for up to 16,384 cores. The code performs well with increasing load and increasing numbers of processors. Our flow simulation and code scalability results demonstrate that FLASH can be used as an effective tool to carry out high-speed compressible Navier-Stokes flow simulations past arbitrary 2D and 3D bodies.

Figure 4. Flow in the vicinity of MVG (a) streamlines overlaid on velocity contours and (b) density contours downstream of MVG


Figure 5: Scalability of FLASH code on Blue Gene/Q

Authors

B. John, D.R. Emerson and X.J. Gu, STFC Daresbury Laboratory

Acknowledgements and References

The authors would like to thank the Engineering and Physical Sciences Research Council (EPSRC) for their support of Collaborative Computational Project 12 (CCP12).
[1] FLASH user guide, version 4.0, 2012
[2] Olson K, 2006, "PARAMESH: A Parallel Adaptive Grid Tool", in Parallel Computational Fluid Dynamics: Theory and Applications - Proceedings of the Parallel CFD 2005 Conference, U.S.A., eds. A. Deane et al., Elsevier, p. 341.
[3] Carvalho P and Cavalcanti P, 1995, "Point in Polyhedron Testing Using Spherical Polygons", in "Graphics Gems V", A. Paeth, Editor, Academic Press, p. 42.
[4] John B, Emerson DR and Gu XJ, 2013, "Parallel compressible viscous flow simulations using FLASH code: Implementation for arbitrary three dimensional geometries", Procedia Engineering, Vol 61, p. 52.
[5] Mohd R, Hossein Z, Azam C, Konstantinos K, 2012, "Micro-Ramps for Hypersonic Flow Control", Micromachines, 3(2), p. 364.
[6] Blue Joule - IBM Blue Gene/Q, http://community.hartree.stfc.ac.uk/wiki/site/admin/resources.html#bgq


Construction of a ‘tethered’ cytochrome- cupredoxin electron transfer complex model using DL_FIELD - a model development tool for DL_POLY


Introduction

DL_FIELD is a computer program package written in C that primarily serves as a support application software tool for the DL_POLY molecular dynamics simulation package. It was developed at Daresbury Laboratory under the auspices of the Engineering and Physical Sciences Research Council (EPSRC) for the EPSRC's Collaborative Computational Project No. 5 (CCP5) for the Computer Simulation of Condensed Phases.

The program is intended to serve as an important application tool to enhance the usability of the DL_POLY molecular dynamics (MD) simulation package and to facilitate the use of the wide range of advanced features included in the MD program suite. DL_FIELD is intended to serve as a user-friendly tool that automatically processes the molecular information with minimum intervention from users. The philosophy behind the program development is to minimise the requirement for users to understand the detailed inner workings of force field descriptions and model set-up procedures for DL_POLY. However, options are also given for more advanced users to edit and modify the existing standard force field schemes in order to improve the simulation models.

DL_FIELD is designed to handle a wide range of molecular systems of varying complexity: from simple ionic compounds and small covalent molecules to systems with complex topologies such as biomolecules, carbohydrates and organic cages. It is also capable of constructing force field models for random structures such as hydrogels, networked systems and random polymers. In addition, the generalisation of file formats and data structures, all within the DL_FIELD framework, facilitates migration from one class of system model to another with a minimum learning curve. In this way, DL_FIELD not only speeds up research efforts and scientific output for DL_POLY users but also encourages researchers to tackle new classes of molecular systems spanning multidisciplinary fields, from material sciences to biological and pharmaceutical areas.

Functionality

The DL_FIELD program has three main functions:

(i) Force field model convertor. This basically involves the conversion of a user's atomic configuration in simple xyz coordinates into identifiable atom types, based on user-selectable force field (FF) schemes and molecular templates, and then automatically generating the DL_POLY configuration file (CONFIG), the force field file (FIELD) and a generic control file (CONTROL). The available popular FF schemes for conversion are CHARMM, AMBER, OPLS-AA etc. (A minimal sketch of this template-matching idea is shown after this list.)

(ii) Force field editor: DL_FIELD allows users to edit or modify a particular FF scheme to produce a customised scheme that is specific to a particular molecular model. For example, the introduction of core-shell models and the insertion of pseudo points to model atom and bond polarisability effects.

(iii) Force field model repertoire: DL_FIELD has a consistent file structure format for all FF schemes and molecular structure definitions. It also allows users to easily expand the existing standard model library to include user-defined molecular models.
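The atom-typing step in (i) amounts to matching each atom record against a residue template. The toy template below is hypothetical (the atom types and charges are merely in the spirit of a CHARMM-like scheme) and is meant only to illustrate the idea, not DL_FIELD's actual library format or code.

# Hypothetical template: (residue name, atom name) -> force-field atom type and charge.
TEMPLATE = {
    ("ALA", "N"):  ("NH1", -0.47),
    ("ALA", "CA"): ("CT1",  0.07),
    ("ALA", "CB"): ("CT3", -0.27),
}

def assign_types(pdb_atoms):
    """Map raw (residue, atom, xyz) records from a PDB-like structure to typed atoms."""
    typed = []
    for res, atom, xyz in pdb_atoms:
        atom_type, charge = TEMPLATE[(res, atom)]   # raises KeyError for unknown atoms
        typed.append((atom_type, charge, xyz))
    return typed

# assign_types([("ALA", "CA", (0.0, 0.0, 0.0))]) -> [("CT1", 0.07, (0.0, 0.0, 0.0))]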
The DL_FIELD program package consists of three main components: the DL_FIELD main program and two data library files, the Molecular Structure file (.sf file) and the Potential Parameter file (.par file). In addition, DL_FIELD also reads a user-defined force field file (the udff) that can contain information from both the .sf and .par files. The udff file allows users to edit or include new molecular structures and potential parameters without making any physical change to the standard library files. Users interact with the main program via a control file (the dl_field.control file) where all user-selectable options are located. Since the program is developed with the aim of modelling complex molecules such as biomolecules, the current version of DL_FIELD only reads molecular structures in PDB file format.

The capability and robustness of DL_FIELD have been tested against a number of complex systems with sizes ranging from several hundred thousand to the order of 10^7 atoms, all running on a typical one-processor PC. DL_FIELD can handle system models containing a mixture of different classes of molecules, for instance proteins and membranes, without having to call separate library files. The following case study illustrates the capability of DL_FIELD.

Case study: MD simulations of a 'tethered' cytochrome-cupredoxin electron transfer complex, Copper nitrite reductases (CuNiRs)

CuNiRs are enzymes that perform the proton-coupled one-electron reduction of NO2- to NO, a key step in the denitrification pathway of the global nitrogen cycle that returns fixed nitrogen to the atmosphere: NO2- + 2H+ + e- → NO + H2O.


These multi-domain complex proteins usually exist as trimeric structures with each unit consisting of two cupredoxin-like domains. A type 1 Cu (T1Cu) is located in one of these domains and a type 2 Cu (T2Cu) is located at each trimer intersubunit interface. The catalytic behaviour [1,2] of CuNiRs is known at the T2Cu site, where NO2- is attached and reduced, a process that requires electron transfer from the T1Cu site and proton delivery to the T2Cu site. However, the catalytic process also involves an electron donor redox partner protein, which is a c-type cytochrome (containing Fe-haem) or a cupredoxin (Cu containing).

However, the interactions between CuNiRs and their individual electron donor proteins are by nature very transient and difficult to capture, and the structure of such a complex was reported only recently, for the bacterial CuNiR-Cytc551 complex from Alcaligenes xylosoxidans. At Liverpool, we have recently discovered [3] and structurally characterised [4] (Figure 1) a variant from Ralstonia pickettii. This new family of CuNiRs contains a cytochrome c domain C-terminal extension, representing a 'tethered' electron transfer complex, comprising the Fe-haem and the T1Cu and T2Cu sites contained in separate domains of one protein. The ultra-high resolution (1Å) structure of R. pickettii CuNiR (RpNiR) provides a unique basis for further detailed computational studies of what is normally found as a transient electron transfer protein-protein complex.

The RpNiR structure was converted using DL_FIELD to an all-atom, fully solvated CHARMM force field model running in DL_POLY_4, to examine the fluidity of the interaction between the cytochrome and cupredoxin domains. Figure 2 shows the rmsd values of these protein domains. The initial results show that the protein structures are markedly different from the initial experimental X-ray structure.
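The rmsd values in Figure 2 are the usual root-mean-square deviation of atomic positions from the reference X-ray model. A minimal sketch of the quantity (assuming each frame has already been superposed on the reference, which production analysis tools normally do with a least-squares fit) is:

import numpy as np

def rmsd(coords, ref):
    """Root-mean-square deviation between two (N, 3) coordinate sets,
    assumed to be already superposed on one another."""
    diff = coords - ref
    return np.sqrt((diff * diff).sum() / len(ref))

# typical use: rmsd of a domain's atoms in each MD frame against the same
# atoms in the X-ray model, plotted as a function of simulation time.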

Figure 1: Ribbon diagram of the RpNiR trimer [4], showing the cupredoxin-like domains (green), where the T1Cu and T2Cu sites are located, and the cytochrome domains (red), where the Fe-haem groups (purple) are located.


The interaction between these two domains must be effective in 'tuning' the electron transfer from the Fe-haem centre to the T1Cu site. The structure shows that water-mediated electron transfer is a possible feature of this process, and we have employed the more sophisticated TIP5P water model to observe the detailed solvent diffusion in the channels between the cupredoxin and cytochrome domains.

Program release

The first version, DL_FIELD v1.0, was made available to the public in March 2010 and since then later versions have been released biannually. The latest program, version 3.0, contains a range of all-atom force field schemes: CHARMM, AMBER (including Glycam) and OPLS-AA, and also general FF schemes such as PCFF and DREIDING. In addition, DL_FIELD 3.0 also includes inorganic FF schemes for ionic solids such as metal oxides and minerals.

DL_FIELD is supplied to individuals under an academic licence, which is free of cost to academic scientists pursuing scientific research of a non-commercial nature. For more information, please visit the web site: www.CCP5.ac.uk/DL_FIELD

Figure 2: Average root mean-square differences (rmsd) of the cupredoxin domains (pink), cytochrome haem-c domains (red) and the overall variation (black) in the RpNiR trimer, with respect to the X-ray structure, obtained from MD trajectories up to 6 ns.

Authors

C. Yong, P. Sherwood, STFC Daresbury Laboratory R. Strange, University of Liverpool

References

[1] Strange, R.W. et al, J Mol Biol, 1999, 287, 1001-1009.
[2] Leferink, N. et al, Biochemistry, 2011, 50, 4121-4131.
[3] Han, C. et al, Biochem J, 2012, 444, 219-226.
[4] Antonyuk, S.V. et al, Nature, 2013, 496, 123-126.


Optimising Hydrogen Fuel Cells - Some Insights from First Principles Calculations


A major concern of the international community is the provision of energy to meet a demand that is always increasing, under the constraints of a limited fossil fuel supply. The development of energy generation methods that are not limited by fuel supply has, therefore, been widely investigated for many years. The urgency of such research has further increased since the environmental issues associated with fossil fuel combustion (global warming, ocean acidification, smog, low air quality) have become well recognised [1, 2].

One solution to this energy challenge is the hydrogen powered fuel cell. This solution operates on a potentially limitless supply of fuel through, for example, solar water splitting, while only producing water as a by-product of the electrochemical reaction that generates electricity (Fig. 2) [3]. Hydrogen fuel cells can be used for a variety of applications, but perhaps their biggest role is in vehicles which currently use fossil fuels via the internal combustion engine.

Figure 2: The water splitting and hydrogen fuel cell reactions.

Commercialisation of fuel cells for application in the automotive industry, however, faces a big obstacle: the high cost of the Pt that is used to catalyse the fuel cell reaction [4]. In order to overcome this, the focus of most research in this field has centred on how to retain high performance in the fuel cell while reducing the loading of Pt [4].

The fuel cell that is typically adopted for automotive applications is known as the polymer electrolyte membrane fuel cell (PEMFC). An alternative approach to solving the catalyst cost problem, which has gained traction only in the last few years, involves the use of alkaline fuel cells (AFCs). These fuel cells, which also use hydrogen fuel, are able to operate at reasonable performance levels without the use of expensive noble metal catalysts (such as Pt) [5]. This is attributed to their alkaline electrolyte, which makes the reduction of oxygen on the cathode, for the fuel cell reaction (Eq. 2), more facile than in the acidic environment of a conventional automotive fuel cell (PEMFC).

The performance of AFCs using non-noble metal catalysts is not yet at the level of state-of-the-art PEMFCs; however, this is due to the early stage of AFC research, where the most promising catalytic materials are still being discovered [5]. The morphology of the catalysts in these studies is not well understood, since they are typically polycrystalline powders. Fine characterisation of AFC catalysis on such powders is also, understandably, very difficult, since the fundamental reaction steps occur on unknown facets at unknown reaction sites. The reliability of predictive modelling based on ab initio quantum theory means that the catalyst-reactant system can be characterised theoretically at the atomic scale, in order to provide an understanding of the mechanics behind AFC catalysis.

In recent studies of a range of non-noble metal catalysts, LaMnO3 has shown some of the highest performance [5, 6, 7]. A collaboration between SCD's Theoretical and Computational Physics Group and Imperial College's Thomas Young Centre and Department of Chemistry has, therefore, carried out state-of-the-art hybrid density functional theory calculations of LaMnO3, using the B3LYP functional as implemented in the CRYSTAL09 code [8]. The computed bulk and surface energies can be used to predict the crystallite morphology and the facet composition and structure, and thus allow the identification of likely surface reaction sites. The investigation can be separated into three main components that lead to a better understanding of non-noble metal catalysts and optimisation of their crystal morphology for AFC catalysis:

1. The ground state of the material for the temperature range in which it operates must be identified and its thermodynamic stability established, allowing for future manipulation of crystal morphology depending on the chemical environment.

2. The stable surfaces of the material must be identified, allowing for a prediction of the equilibrium crystal morphology and characterisation of the reaction (adsorption) sites available on each surface.

3. The interaction of the available adsorption sites with the molecular species that are present in the fuel cell reaction must be studied, in order to reveal the most favourable adsorption sites (and the surfaces on which they exist).

Figure 1: The unrelaxed (100) and (001) surfaces (left and right) of orthorhombic LaMnO3. Arrows indicate the direction of the apical J-T distortion (elongated Mn-O bonds).


In the temperature range of an AFC (25-75°C), LaMnO3 is observed to adopt an orthorhombic (Pnma) structure with the Mn-localised spins arranged in an A-type antiferromagnetic configuration [10]. This is correctly computed to be the ground state structure in the calculations carried out here [9]. The thermodynamics of the La-Mn-O system, which are calculated using a novel methodology, are found to be more accurately predicted (a mean error of only 1.6%) than in any previous studies [9]. The high accuracy of these calculations assures us that the system we have modelled is a good representation of the actual material. Additionally, it also means that the phase diagram of the La-Mn-O system correctly identifies the chemical potential limits for the stability of the bulk phase, and thus the range of environmental conditions for which the surface formation free energies need to be considered when predicting the in situ LaMnO3 crystal morphology. These limits are designated by the edges of the dark grey area in Fig. 3 and are most relevant to surfaces of LaMnO3 that are non-stoichiometric, since their stability strongly correlates with changes in the chemical potential of reactive gases in equilibrium with the crystallite. For instance, a surface that has an excess of oxygen will be stabilised by a high oxygen chemical potential (oxidising conditions); however, if such a high oxygen chemical potential is beyond the limit of LaMnO3 stability identified in the phase diagram, the surface may in fact be unstable [11].

Figure 3: Three-dimensional phase diagram constructed from calculated Gibbs formation energies at standard conditions. The stability region of LaMnO3 is represented by the dark grey area [9].

The low-index non-polar stoichiometric surfaces of LaMnO3 have been studied to establish an initial approximation to the surface structures and crystallite morphology. The order of stability for these surfaces from low to high energy was revealed to be (100) < (101) < (001) < (110). Using the formation energies of these surfaces, an equilibrium crystal morphology has been determined (Fig. 4), which indicates that the (100) surface is dominant; it makes up 32% of the total surface area of a crystallite. The most widely available adsorption site is predicted to be an MnO5 motif that is created by cleaving the apical Mn-O bond of an MnO6 octahedron; 61.7% of sites adopt this geometry. The apical Mn-O bond refers to one of the elongated bonds of the J-T distorted MnO6 octahedra (as depicted in Fig. 1). Careful analysis of the surface formation energies indicates that there is a correlation between the J-T distortion of bond lengths and the contribution of Mn-O bond cleavage to the surface formation energy. The calculations also allow us to understand the relative formation energies of the surfaces and to draw some general conclusions about the structure of surfaces that are likely to be present even in non-equilibrium conditions.

Figure 4: Wulff plot showing the equilibrium crystal morphology based on the relaxed surface formation energies [12].

The computed formation energies of the surfaces are found to be determined by: (1) the strength and number of cleaved Mn-O bonds, (2) the compensation of undercoordinated ions after cleavage and (3) relaxation from the bulk geometry [12]. These factors are relevant to all transition metal oxides and form the basis for rationalising their surface formation energies. Equally, identifying the distribution of adsorption sites by analysis of the equilibrium crystal morphology is a prerequisite for the optimisation of non-noble metal catalyst morphology.
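For reference, the surface formation energy behind the stability ordering quoted above is the usual slab expression, gamma = (E_slab - n*E_bulk) / (2A) for a symmetric stoichiometric slab. The sketch below uses placeholder arguments, not the values computed in this work.

def surface_energy(e_slab, n_bulk_units, e_bulk, area, sides=2):
    """Surface formation energy per unit area of a stoichiometric, symmetric slab:
    gamma = (E_slab - n * E_bulk) / (sides * A)."""
    return (e_slab - n_bulk_units * e_bulk) / (sides * area)

# With hypothetical energies (eV) and areas (Angstrom^2) for each facet,
# ranking the resulting gamma values gives an ordering such as
# (100) < (101) < (001) < (110), which then feeds the Wulff construction.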


Figure 5: The (100) surface with an O2 molecule (blue) adsorbed on an apical Mn site (MnO5Ap), after relaxation. The lattice Mn, O and La are represented by the small black, medium red and large grey spheres respectively.

The interaction of the available adsorption sites with reactant species is being explored in the final stages of this investigation by the simulation of molecular O2 adsorption on typical reaction centres (Fig. 5). The full range of Mn adsorption sites identified in the previous part of the investigation is therefore evaluated in terms of the O2 binding energies and adsorption modes. Binding energies, in particular, are a good indicator of the reactivity of any particular adsorption site. It has been proposed that LaMnO3 is an ideal AFC catalyst since the Mn site adsorbs O2 neither too strongly (so that O2 permanently adsorbs, hindering further reactions), nor too weakly (so that O2 is not reduced) [6].

Using models of O2 adsorption, it is therefore possible to add a further dimension to the investigation of AFC catalysts by characterising the strength of binding at each site on a candidate material. Initial results for the studied LaMnO3 surfaces show that a range of strong and weak binding exists across the adsorption sites, indicating that an optimal (intermediate) adsorption site could be identified. The optimisation of LaMnO3 as an AFC catalyst hinges on identifying the surfaces corresponding to such sites and maximising their coverage on the equilibrium crystal morphology. This, as discussed above, can be achieved by manipulation of the environmental conditions (chemical potentials), in order to stabilise the optimal surfaces during the preparation of catalyst powders. Work on fully characterising the adsorption sites is, therefore, of utmost importance and currently underway.
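The binding energies used to classify the sites follow the usual definition, E_bind = E(surface + O2) - E(surface) - E(O2). The sketch below uses that definition with purely illustrative thresholds; the numbers are placeholders, not the computed results of this study.

def binding_energy(e_surface_o2, e_surface, e_o2):
    """E_bind = E(surface + O2) - E(surface) - E(O2); more negative = stronger binding."""
    return e_surface_o2 - e_surface - e_o2

def classify(e_bind, strong=-1.5, weak=-0.2):
    """Crude classification of an adsorption site (thresholds in eV are illustrative only)."""
    if e_bind < strong:
        return "too strong"    # O2 would poison the site
    if e_bind > weak:
        return "too weak"      # O2 unlikely to be reduced
    return "intermediate"      # candidate optimal site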

Authors

E. A. Ahmad [1,2], G. Mallia [1,2], D. Kramer [1,3], A. R. Kucernak [1], N. M. Harrison [1,2,4]
[1] Imperial College London; [2] Imperial College London; [3] University of Southampton; [4] STFC Daresbury Laboratory

Acknowledgements

This work made use of the high performance computing facilities of Imperial College London and, via membership of the UK’s HPC Materials Chemistry Consortium funded by EPSRC (EP/F067496), of HECToR, the UK’s national high-performance computing service, which is provided by UoE HPCx Ltd at the University of Edinburgh, Cray Inc and NAG Ltd, and funded by the Office of Science and Technology through EPSRC’s High End Computing Programme.


Accelerating research into the impacts of clouds on the global climate


Clouds are fundamental components of the Earth's atmosphere and play a vital role in the energy balance of the Earth, reflecting solar radiation (tending to cool the surface), but also reducing the extent to which thermal radiation escapes to space (tending to warm the surface). Globally, clouds are responsible for about two thirds of the planetary albedo. Hence, even relatively small changes in the amount, type and vertical distribution of cloud could have a profound impact on climate.

RAL Space has embarked on an ambitious project to process over 15 years of satellite data from the AATSR (Advanced Along Track Scanning Radiometer) instrument as part of the ESA CCI (Climate Change Initiative) programme. This imager provides measurements in a number of spectral bands spanning the visible to mid-infrared spectral ranges, from which it is possible to infer information on the amount and altitude of cloud, as well as the cloud phase (ice or liquid), the optical thickness of the cloud and information on the size of the cloud particles.

The datasets they needed to process were part of the CEDA archive that was hosted on the JASMIN/CEMS super data cluster. To handle the huge volume of processing required, RAL Space chose to use the SCARF HPC cluster run by STFC Scientific Computing. The advantages of SCARF were its high-speed access to JASMIN/CEMS, plus its own high-speed backend storage, and its 1500 core compute capacity.

The RAL Space team worked closely with the SCARF team to migrate their code, and SCARF quickly became an essential component of the workflow, as the data could be processed without the I/O problems associated with their existing compute cluster, allowing them to regularly reprocess the data on short timescales.

The storage technology used by JASMIN/CEMS and SCARF is provided by Panasas and is based on a concept known as Object RAID, which splits files into objects and distributes these objects over several of the individual storage blades of the Panasas system. This allows the bandwidth to scale with the number of clients, as multiple blades can be transferring several parts of the file simultaneously.
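The Object RAID idea, a file split into objects spread across blades so that many blades can serve one file at once, can be sketched as a simple placement rule. Everything below (blade counts, sizes, the linear bandwidth model) is illustrative, not a description of the actual Panasas system.

def place_objects(file_size, object_size, nblades):
    """Round-robin placement of a file's objects over storage blades."""
    nobjects = -(-file_size // object_size)             # ceiling division
    return [(i, i % nblades) for i in range(nobjects)]  # (object index, blade index)

def aggregate_bandwidth(nclients, per_blade_bw, nblades):
    """Crude model: read bandwidth grows with clients until every blade is busy."""
    return min(nclients, nblades) * per_blade_bw

# With many concurrent SCARF jobs reading different parts of the archive,
# the aggregate rate approaches nblades * per_blade_bw rather than the
# throughput of a single server.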

RAL Space were able to run more than 800 jobs concurrently on SCARF and quickly became one of the largest users of SCARF, running over 75,000 jobs and using more than 400,000 hours of compute time over the last twelve months. They achieved peak data rates exceeding 19 gigabytes per second transferring data between the SCARF compute nodes processing the cloud data and the storage systems where the data was located.

Figure 1: Pie charts showing the percentage of time on SCARF and the number of jobs run by RAL Space compared to other user groups in the past year.

Figure 2: Network diagram showing almost 20Gb/s being read from Panasas by groups of SCARF nodes.

Authors

D. Ross, STFC Rutherford Appleton Laboratory

Acknowledgements and References

Caroline Poulsen and Barry Latter of the Remote Sensing Group, RAL Space
Jonathan Churchill, Ahmed Sajid and Cristina del Cano Novales of the Research Infrastructure Group, STFC Scientific Computing
http://www.esa-cci.org/
https://earth.esa.int/web/guest/missions/esa-operational-eo-missions/envisat/instruments/aatsr
http://www.ceda.ac.uk
http://www.jasmin.ac.uk
https://sa.catapult.org.uk/climate-and-environmental-monitoring-from-space
http://sct.esc.rl.ac.uk/SCARF
http://www.panasas.com


CCPi - A Tomographic Imaging roadmap; from image capture, to reconstruction, to visualisation, back to simulation


Introduction: The CCPi network

Founded in Autumn 2011, the aim of the CCP in Tomographic Imaging is to support the emerging UK community with a toolbox of algorithms to increase the quality and level of information that can be extracted by computed tomography (CT). CT requires the computational process of reconstructing 3D volume data from a set of 2D projection data; the most well-known form is reconstruction from multiple X-ray image projections within medical scanners. This is one of a group of Collaborative Computational Projects (CCPs) that bring expertise in key fields of computational research together to tackle large-scale scientific software development, maintenance and distribution. This particular CCP is a collaboration between researchers across the HE sector and STFC SCD staff across both campuses, involving multiple groups within SCD.

The CCP's target community is specifically materials scientists, and its size has grown sharply, with many academic groups around the UK taking up tomographic imaging and purchasing new lab-based X-ray CT scanners (this includes the Universities of Bath, Swansea and Nottingham as well as the RCaH and the NPL). There are now over 250 members on the distribution list covering different specialities. On the Harwell Campus there is a set of specific Tomographic Imaging beamlines; for example, at the Diamond Light Source there are I12-JEEP and I13. In addition, David Willetts announced in 2013 further funding for a new £5m materials science imaging beamline (DIAD, Dual Imaging and Diffraction) at the DLS. Our aim is to bring together this imaging community, maximise the return on investment in software development and ensure longevity, sustainability and re-use of code.

When the CCPi started it prioritised the two areas of reconstruction and quantitative analysis, both of which aim to provide features that are over and above the software provided by manufacturers – combining new mathematical algorithms as well as computational challenges. Adding the final link with visualisation and computational analysis makes them ideal emerging CCP problems which STFC facilities can assist in tackling. SCD, acting in a similar manner to a facility, has the opportunity to create a whole greater than the sum of the parts.

In the following sections we follow the data flow illustrated in Figure 1, from the scanned data through reconstruction and image quantification to large-scale visualisation and data analysis.

Figure 1: X-ray CT reconstruction can be considered as a large data-flow problem requiring a cluster of GPGPUs to reconstruct projected data from imaging facilities to create a 3D volume of data. The figure shows a screen-shot (from the processing on one of the large Hartree visualisation projection rooms) of 1500 raw projected image files to reconstruction and volume visualisation.

Reconstruction Algorithms

The first step in the analysis process is the 3D reconstruction of the object from the data. The standard method, provided with lab-based systems and used at DLS, is Filtered Back Projection. This is reliable when a large number of projections (>1000) is available. In experiments where you need to reduce the number of projections in order to lower the radiation dose or to get better time resolution of sample changes, iterative algorithms can be a better choice.

We have been working with iterative image reconstruction algorithms developed at the University of Manchester to optimise their performance and apply them to data collected at DLS. We currently have Conjugate Gradient Least Squares (CGLS) and Total Variation Regularisation (TV) algorithms working in both parallel beam and cone beam geometries, and it will be possible in the near future to analyse data collected at DLS and archived on SCD data-stores on systems such as the Emerald GPGPU cluster and the Hartree Centre machines.

Each iteration in these algorithms typically involves a forward and a backward projection step, and each of these is at least as computationally expensive as the single back projection in the standard method. The code is currently parallelised within a single compute node using OpenMP. An image reconstruction at the full resolution of a Diamond dataset from beamline I12, run on a 4TB ScaleMP node at the Hartree Centre, is currently estimated to take about 20 hours per iteration. Future work will involve parallelising the algorithms across multiple nodes and exploiting GPGPU and Intel many-core accelerators.
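For readers unfamiliar with CGLS, the iteration has the following minimal dense-matrix form, where A plays the role of the forward projector and its transpose the back projector. This is a textbook sketch for min ||Ax - b||; the production codes use matched projection operators rather than an explicit matrix, and the function name is invented for the example.

import numpy as np

def cgls(A, b, n_iter=20):
    """Conjugate Gradient Least Squares for min ||A x - b||_2."""
    x = np.zeros(A.shape[1])
    r = b - A @ x
    s = A.T @ r                      # back projection of the residual
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iter):
        q = A @ p                    # forward projection
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x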


Quantitative Visualisation

While it is useful to be able to see the 3D reconstructions, and the images produced can show a lot, hard data can only be produced by quantification of the images themselves. For example, this can involve measuring simple lengths and angles of features in the image, or pore sizes in bio-glass materials, or involve a complex study of tumour size over time; quantitative visualisation is an important element in tomographic imaging.

Work in this area identified algorithms that had been developed for a specific purpose but which could have a broader application to a wider audience. This was done by packaging them as add-on modules to existing image analysis applications such as Avizo and VolView and distributing them freely for others to use (http://ccpforge.cse.rl.ac.uk/gf/project/iqa/). The design of the packages is such that the calculation is separated from the interface to the supporting application, which means that the heart of the implementation can be re-used in other software.

To date five algorithms have been implemented:

1. 3D quantification for labelled images, based on a Matlab code. Applying the algorithm to data is now easier thanks to the re-implementation, and it can run in parallel on multi-core machines.

2. Accessible volume calculation. Calculates the volume accessible to a series of spheres with user-defined sizes and produces a segmented image according to the largest sphere that can reach each voxel.

3. Ellipsoid fitting. Calculates the parameters of ellipsoids fitting particles in an image. This algorithm was produced at the request of an attendee at a quantification workshop who was using 3D images as a basis for further computational modelling work.

4. Calculating connected components. This gives a simple way to create a labelled image from a binary image following thresholding. It is particularly useful for porous media where we want to label the pores before quantification (a minimal sketch of this operation is shown after this list).

5. Particle tracking. Tracking labelled particles in a time series of 3D images using a flexible algorithm for comparing the particles from one frame to the next. This has been used to analyse leaching of particles in fractured rock samples.
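A minimal version of the connected-component labelling in algorithm 4, using scipy as a stand-in; the CCPi module is a separate parallel implementation, and the tiny volume below is only there to show the operation.

import numpy as np
from scipy import ndimage

def label_pores(binary_volume):
    """Label connected regions of a thresholded 3D image.
    Returns a labelled volume (0 = background) and the number of components."""
    labels, n = ndimage.label(binary_volume)
    return labels, n

vol = np.zeros((4, 4, 4), dtype=int)
vol[0, 0, 0] = vol[0, 0, 1] = 1      # one pore of two face-connected voxels
vol[3, 3, 3] = 1                     # a second, separate pore
labels, n = label_pores(vol)
print(n)                             # -> 2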

Figure 2: 3D quantification of pore volumes.

Figure 3: Calculating accessible volume for a scaffold. Slices coloured for accessibility by spheres of varying sizes.


Visualisation nodes and Training for the Tomographic community

The next stage for this data-flow pipeline is presenting and discussing the volume data. To this end, the Atlas Visualisation Facility at RAL has been exploited: a new large screen room in the Atlas building, modified from the original Hartree visualisation specifications. Combining this with two high-memory (½ TB RAM) STFC workstations, the facility aids fast and interactive re-reconstruction and quantitative visualisation. This area is being installed with all the major software products used by the tomography community, as well as the CCPi toolkits.

Roadmap combining capture to re-capture, and a Vision for Supercomputing-assisted Imaging Capture Facilities

The final stage is to realise that this process is non-linear, and actually cyclic, and SCD can play a large role in this, providing data-flow, storage, HPC compute and meta-data archiving. The image capture facilities within STFC (the Diamond Light Source, ISIS and the high-powered lasers), as well as the many laboratory-based systems, should not be considered as just a starting point. Once the images are captured, the relevant HPC clusters are used for reconstruction and then analysis can occur in the method described. This is only part one, and the next step is to guide the follow-on HPC computation and image capture from these quantitative values. Examples include image-based modelling techniques, where finite element analysis may be extracted from interactively selected volume parts and then steered by the visualisation; further iterative comparative reconstruction methods using the major computational STFC HPCs can be carried out, and the next image capture experiment can be guided by the visualisation, creating a cyclic research process. The process is about integrating the human in the imaging-computation loop.

Figure 4: Monthly informal Tomography and Coffee sessions are being held at the AVF (Atlas Visualisation Facility) for RAL users and SCD developers. The last Friday of the month from 3pm features a show-and-tell of new data and effects as well as a place to discuss data-flow issues and informal training.

Authors

B. Searle, D. Worth, M. Turner, STFC Daresbury Laboratory
S. Nagella, R. Fowler, STFC Rutherford Appleton Laboratory

Acknowledgements and References

www.ccpi.ac.uk main CCPi website, including YouTube Channel at: http://www.youtube.com/channel/UCGB578xcyXNQyiBsufEFIeQ
http://www.stfc.ac.uk/1808.aspx Harwell Imaging Partnership at STFC
http://www.stfc.ac.uk/hartree Hartree Centre at STFC
David Worth, Developing an Image Analysis Plugin for VolView, RAL Technical Reports, RAL-TR 2013-007, (2013) http://purl.org/net/epubs/manifestation/9412
Robert Atwood et al, Scripta Mat, 51 (11), 1029-33, 2004. doi:10.1016/j.scriptamat.2004.08.014
Thanks to the many individuals and groups who have been guiding this process, notably: Nicola Wadeson (post-doc), Will Thompson (post-doc), W. Lionheart (University of Manchester, Department of Mathematics), Phil Withers, Peter Lee, Tristan Lowe, K. J. Dobson, Sam Mcdonald (University of Manchester, Department of Material Science), Sam Hawker (Nikon Xtek), Michael Fesre (Xradia), Ed Morten (Rapsican systems), A. P. Kaestner (Spallation Neutron Source Division, Paul Scherrer Institut), as well as Robert Atwood and Mark Basham (Diamond Light Source), Mike Curtis-Rouse (Harwell Imaging Partnership) and Robert Allan (SCD staff).


The Software Engineering Support Centre


Software Engineering is coming to research software, and the Software Engineering Support Centre (SESC) is here to help. If you cringe at the thought of software engineering, don't worry, SESC can help you get started. If you already use some software engineering tools or processes, that's great, SESC is here to help you go further.

In this article we'll say why we think software engineering is a good thing for research computing and give some of the ways SESC can help you with your software development process both now and in the future.

Software Engineering – A Good Thing

First of all we should say what software engineering is. This is not a hard-and-fast, textbook definition, it's just one we can work with for now. Software Engineering is: the use of processes and associated tools to write good quality software that produces excellent scientific results.

The processes we are thinking of need not be tedious and long-winded, but having a process or processes for different tasks will help organise your work and keep development under control. Using tools to help with the processes saves time and means it is easy to keep a consistent method of working for yourself and new developers. There are also tools to measure the quality of your software and help you build and test its "correctness". Figure 1 shows the elements that are traditionally included in software engineering, and these form a basis for our thinking. It looks like a waterfall, with one stage leading neatly to the next, but we know that's not how things happen in practice. Development is an iterative process with a variety of time scales and different levels of detail required at each stage.

Figure 1: Elements in the software engineering process.

Now, why is software engineering a good thing? It's good for you as a scientist/developer and it's good for funding bodies. Let's get the latter out of the way first. Funders increasingly need more impact (uugh) from less cash. They don't want to be paying people to re-implement existing software; they want new stuff built on what's already available. Of course research software development provides many of these building blocks, and for them to be of good quality and sustainable requires software engineering. It doesn't have to be heavy, but enough to demonstrate that you have done a good job and to make your software easy for others to re-use or extend.

Now we come to the selfish part! I want my software to be used by as many people as possible and I want it to help me build my career. To achieve these aims I need to have confidence that it will work when others use it, and be ready to extend and adapt the software or let others do it. These are not claims I could make about the software I wrote for my PhD. Having confidence in software means knowing it does what it was designed to do, knowing that it gets the correct answers, and knowing its limitations. All this is down to software engineering – requirements, design, implementation, testing, and documentation in whatever form they occur. Just so long as they are available to others. Neither last-minute brain dumps nor ESP work in my experience – write things up as you go along. All this information will be invaluable when it comes to re-using your software, whether as a component in some larger scheme or by extending its functionality. And, even if you're the only one working on the software, don't be fooled into thinking that you'll remember all the details. Painful experience says that's not true.

Software Engineering Support Centre

The Software Engineering Support Centre (SESC) is a £1M, 5 year, EPSRC funded programme with the aim of promoting software engineering in the UK research software community. We provide information, advice and training on practical software engineering tools and techniques through reports, tutorials, hands-on workshops and seminars.

We have experience in helping developers in computational science improve their development process, testing and documentation, with the use of simple tools such as revision control for their files, coverage testing and systems to extract code comments as documentation. There are several useful reports and posts on these topics on our web site.
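As a small example of the "extract code comments as documentation" idea mentioned above, generic Python such as the following will pull the docstrings out of a source file; this is an illustration only, not one of the specific tools SESC supports, and the file name is hypothetical.

import ast

def module_docs(path):
    """Collect the module docstring and every function/class docstring from a source file."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    docs = {"<module>": ast.get_docstring(tree)}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            docs[node.name] = ast.get_docstring(node)
    return docs

# for name, text in module_docs("my_solver.py").items():   # hypothetical file
#     print(name, ":", (text or "").splitlines()[:1])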


Figure 2: CCPForge project summary page.

Figure 3: CCPForge git repository page.

One of the major components of SESC is the CCPForge service, a collaborative development environment for the UK scientific software community and their collaborators. CCPForge provides the following for registered projects:

• Revision control for files with CVS, Subversion, git, bazaar and Mercurial
• Software release downloads for source or pre-compiled versions
• Flexible licensing of software
• User forums for help from other users and developer discussions
• Tracking of support and feature requests
• Traditional mailing lists
• Task management
• Documentation repository
• Flexible access to information based on "roles"

Training and dissemination of ideas is important to us and we:
• Run hands-on workshops for tools to give users an insight into what the tools offer
• Give seminars on SESC, practical software engineering, software quality and testing, among other topics
• Prepare tutorials, HowTo guides and tool review reports, which are available from the web site
• Hold meetings so that people interested in the software engineering of their code can get together, share experiences and learn from others

Future plans

The stand-out features for the future are:
• Enhanced CCPForge service – on-line access to QA tools for projects – share results with colleagues to aid collaborative software improvement.
• Buildbot service – run tests of your code on a number of platforms (including your own) with a range of compilers, libraries etc. and get a summary of the test results on-line.
• QA tools server – access to advanced software engineering quality assurance tools to check your software and identify areas for improvement.

We'll also be canvassing the views of the research software development community as we build up the work that SESC does.

Contacts

For more information you can visit the SESC web site http://softeng-support.ac.uk or the CCPForge web site http://ccpforge.cse.rl.ac.uk. To discuss how we can support you please contact [email protected].

Authors

C. Greenough, D. Worth, STFC Rutherford Appleton Laboratory

Figure 4: Buildbot console view
