Proceedings of the 2021 Improving Scientific Software Conference

Editors Weiming Hu Davide Del Vento Shiquan Su

NCAR Technical Notes
NCAR/TN-567+PROC

National Center for Atmospheric Research
P. O. Box 3000
Boulder, Colorado 80307-3000
www.ucar.edu

NCAR IS SPONSORED BY THE NSF

NCAR TECHNICAL NOTES
http://library.ucar.edu/research/publish-technote

The Technical Notes series provides an outlet for a variety of NCAR manuscripts that contribute in specialized ways to the body of scientific knowledge but that are not yet at the point of formal journal, monograph, or book publication. Reports in this series are issued by the NCAR scientific divisions, serviced by OpenSky, and operated through the NCAR Library. Designation symbols for the series include:

EDD – Engineering, Design, or Development Reports Equipment descriptions, test results, instrumentation, and operating and maintenance manuals.

IA – Instructional Aids Instruction manuals, bibliographies, film supplements, and other research or instructional aids.

PPR – Program Progress Reports Field program reports, interim and working reports, survey reports, and plans for experiments.

PROC – Proceedings Documentation of symposia, colloquia, conferences, workshops, and lectures. (Distribution may be limited to attendees.)

STR – Scientific and Technical Reports Data compilations, theoretical and numerical investigations, and experimental results.

The National Center for Atmospheric Research (NCAR) is operated by the nonprofit University Corporation for Atmospheric Research (UCAR) under the sponsorship of the National Science Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

National Center for Atmospheric Research P. O. Box 3000 Boulder, Colorado 80307-3000 NCAR/TN-567+PROC NCAR Technical Note

2021-08

Proceedings of the 2021 Improving Scientific Software Conference

Editors Weiming Hu Davide Del Vento Shiquan Su

NATIONAL CENTER FOR ATMOSPHERIC RESEARCH P. O. Box 3000 BOULDER, COLORADO 80307-3000 ISSN Print Edition 2153-2397 ISSN Electronic Edition 2153-2400

How to Cite this Document:

Hu, Weiming, Davide Del Vento, Shiquan Su, (Eds.). (2021). Proceedings of the 2021 Improving Scientific Software Conference (No. NCAR/TN-567 +PROC). doi:10.26024/p6mv-en77

Information about future workshops and other SEA news can be found on our website: https://sea.ucar.edu/sea

The website of the 2021 workshop is https://sea.ucar.edu/conference/2021

To be added to the workshop mailing list, please send an email to [email protected].

Proceedings of the 2021 Improving Scientific Software Conference

Table of Contents

Organizing Committee ...... iii

SEA 2021 Peer-Reviewed Papers ...... 1-49

PyELM-MME: A Python Platform For Extreme Learning Machine Based Multi-Model Ensemble...... 1-4 Nachiketa Acharya and Kyle Joseph Chen Hall

Anomaly Detection in Particle Accelerators using Autoencoders...... 5-11 Jonathan P. Edelen and Nathan M. Cook

Empirical Inverse Transform Function for Ensemble Forecast Calibration...... 12-22 Weiming Hu, Laura Clemente, George S. Young, and Guido Cervone

Expanding Impact Metrics Contexts With Software Citation*...... 23-30 Keith E. Maull and Matt Mayernik

A Portable Framework for Multidimensional Spectral-like Transforms At Scale...... 31-39 Dmitry Pekurovsky

Organizing Committee

Conference Chairs
Davide Del Vento, National Center for Atmospheric Research (NCAR)
Shiquan Su, National Center for Atmospheric Research (NCAR)

Program Committee Chairs
Davide Del Vento, National Center for Atmospheric Research (NCAR)
Shiquan Su, National Center for Atmospheric Research (NCAR)
Weiming Hu, The Pennsylvania State University

Proceedings Committee
Weiming Hu, The Pennsylvania State University
Davide Del Vento, National Center for Atmospheric Research
Shiquan Su, National Center for Atmospheric Research
Michael Flanagan, National Center for Atmospheric Research
Taysia Peterson, National Center for Atmospheric Research

Steering Committee
Andrew Younge, Sandia National Laboratories
Brian Vanderwende, National Center for Atmospheric Research (NCAR)
Davide Del Vento, National Center for Atmospheric Research (NCAR)
Edward Hartnett, National Oceanic and Atmospheric Administration (NOAA)
Guido Cervone, The Pennsylvania State University
Joseph Schoonover, Fluid Numerics
Julia Collins, University of Colorado
Keith Maull, National Center for Atmospheric Research (NCAR)
Maggie Sleziak, National Center for Atmospheric Research (NCAR)
Mick Coady, National Center for Atmospheric Research (NCAR)
Sheri Mickelson, National Center for Atmospheric Research (NCAR)
Shiquan Su, National Center for Atmospheric Research (NCAR)
Srinath Vadlamani, Arm
Weiming Hu, The Pennsylvania State University

Workshop Administrator
Taysia Peterson, National Center for Atmospheric Research (NCAR)


PyELM-MME: A Python Platform For Extreme Learning Machine Based Multi-Model Ensemble

Nachiketa Acharya, Center for Earth System Modeling, Analysis, & Data (ESMAD), Department of Meteorology and Atmospheric Science, The Pennsylvania State University, University Park, PA, [email protected]
Kyle Joseph Chen Hall, International Research Institute for Climate & Society, The Earth Institute at Columbia University, Palisades, NY, USA, [email protected]

Abstract—The generation of a multi-model ensemble (MME) is a well-accepted way to improve the skill of the climate forecasts produced by individual general circulation models. Recently, there has been significant interest in exploring the potential to improve climate prediction using Machine Learning based MME. One such Machine Learning method is the Extreme Learning Machine (ELM), which is a state-of-the-art non-linear regression method based on single-hidden-layer feed-forward neural networks. We developed PyELM-MME, a Python platform for producing ELM-based MME climate predictions and comparing them with those produced by other traditional MME methods like ensemble mean and multiple linear regression. PyELM-MME also includes a forecast verification module, which allows one to assess the relative prediction skill of the different methodologies. In this study, we describe the co-design, co-development, and skill assessment of PyELM-MME.

Keywords—Multi-Model Ensemble, Extreme Learning Machine, Python Platform

I. INTRODUCTION

The generation of a multi-model ensemble (MME) is a well-accepted way to improve the skill of forecasts generated by individual general circulation models (GCMs). There are two common approaches to making an MME: one either combines the individual forecasts with equal weights, or weights them according to their prior performance [1,2]. Numerous studies have shown that multi-model ensembles, regardless of which methods have been used, exhibit increased prediction skill when compared to single-model forecasting [3,4]. After previous pioneering work, there is strong interest in exploring the potential of Machine Learning (ML) based MME for improving seasonal forecasts [5]. This previous work proposed generating MME forecasts using a state-of-the-art non-linear regression method based on single-hidden-layer feed-forward neural networks (SLFN) called Extreme Learning Machine (ELM) [5]. We developed PyELM-MME, a Python platform for generating ELM-based MME climate predictions, with the goal of facilitating innovation in forecasting and exploring the use of ELM in MME generation. PyELM-MME implements ELM, as well as traditional MME methods like the ensemble mean (EM) and multiple linear regression (MLR), as benchmarks. One can compare and contrast the prediction skill of the different methods using PyELM-MME's forecast verification module. In this study, we describe the co-design, co-development, and skill assessment of PyELM-MME.

II. BACKGROUND AND GOAL

The SLFN, a simple form of ML, has been extensively studied from both theoretical and practical perspectives for its learning capacity and fault tolerance. However, the efficacy of SLFN-based methods is highly dependent on appropriate tuning of their adjustable hyperparameters, e.g., transfer function, learning rate, and the number of nodes in the hidden layer. Additionally, there are several disadvantages to traditional SLFN-based methods, including long computation time, over-fitting, and vanishing gradient.

To overcome such shortcomings, a novel learning algorithm for SLFN called Extreme Learning Machine (ELM) has been proposed [6]. In the proposed algorithm, the network's input weights and hidden biases are randomly chosen, and its output weights are determined analytically using the Moore-Penrose generalized inverse. The basic principle distinguishing ELM from the traditional neural network methodology is that the parameters of the feedforward network are not trained through backpropagation during ELM model fitting. Implementing an MME using ELM requires the following steps:

• Selecting input and output neurons (training dataset)
• Scaling the input dataset
• Selecting the activation function
• Training and testing the model

This entire methodology is summarized in the flow chart presented in Fig. 1.
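The ELM principle described above — randomly chosen input weights and hidden biases, with output weights computed analytically using the Moore-Penrose generalized inverse — can be illustrated with a short NumPy sketch. This is a minimal, hypothetical example rather than the hpelm or PyELM-MME implementation; the function names, the tanh activation, and the data shapes are assumptions made for illustration.

```python
import numpy as np

def fit_elm(X, T, n_hidden=50, seed=0):
    # X: (n_samples, n_features) scaled predictors, e.g. individual GCM forecasts
    # T: (n_samples, n_targets) observed values the MME should reproduce
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                 # output weights via the Moore-Penrose pseudo-inverse
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because no backpropagation is involved, fitting reduces to a single pseudo-inverse computation, which is what makes ELM fast compared to a conventionally trained SLFN.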

Fig. 1. Flow chart illustrating the steps of the procedure for implementing the Extreme Learning Machine based Multi-Model Ensemble approach.

Fig. 2. PyELM-MME Data Flow, implemented through Reader, MME, Cast, SPM, Scaler, and Plotter classes.

III. PYELM-MME DESIGN

In implementing the PyELM-MME software package, we aim not just to explore making ELM-based MME climate predictions, but also to make the process as accessible as possible to novice programmers and climate forecasters. PyELM-MME takes care of Python file input and output, data preprocessing, model training, and plotting of results using industry-standard Python libraries like SciKit-Learn [7], hpelm [8], NumPy [9], and Xarray [10]. It handles data formatting and storage so the user can build complex statistical models without getting bogged down in the details.

PyELM-MME supports two different use cases: one which handles spatial (4D) input data, and one which handles spatially aggregated (2D) input data. The 4D case involves data for multiple GCMs, at many points in space, across time, while the 2D case handles data for multiple GCMs across time, but at a single point in space. Since PyELM-MME's operations are spatially independent, and evaluated on a gridpoint-by-gridpoint basis, the 4D case can be treated as a repetition of the 2D case. This greatly simplifies the software implementation.

PyELM-MME is flexible and robust, and its component classes and objects can be used like any other Python library by an advanced programmer. However, to facilitate the tool's use by novice programmers, samples of both the 2D and 4D cases have been implemented and made available. All a user need do to create ELM-based MME climate predictions is provide data files, customize hyperparameter settings, and hit the 'run' button.

PyELM-MME is distributed through Anaconda.org's package manager [11], which significantly decreases the user's initial time investment. The package can be installed with the following terminal command:

conda install -c hallkjc01 pyelmmme

PyELM-MME's full source code is available at https://github.com/kjhall01/PyELM-MME, and the PyELM-MME workflow is summarized by the flowchart provided in Fig. 2.

A. Input Files & Data Management

PyELM-MME's "Reader" class handles file input for both the 2D and 4D cases. For the 2D case, the user must supply a comma-separated-values (CSV) file, which can easily be produced in Microsoft Excel or another spreadsheet application. The 4D case requires the user to supply one or more NetCDF files. Depending on the case, the "Reader" object will dynamically reshape the data and prepare it for use in model training.

After intake, the data is stored as an instance of PyELM-MME's "Cast" data structure. The "Cast" structure stores the data alongside its metadata, models that have been trained on it, and instances of preprocessing tools that it was manipulated with, for consistency. The training dataset, test dataset, and forecast dataset are all stored in their own instances of the "Cast" data structure in order to prevent data leakage and model overfitting. The "Cast" datatype also facilitates leave-n-out cross-validation during model training.

The user is also expected to provide hyperparameter settings in the form of a Python dictionary with specific keys. Defaults are set so that if a hyperparameter dictionary is not provided, no problems arise other than potentially poor performance. More details about the PyELM-MME API and usage examples are available on the project's GitHub page.

B. Data Preprocessing

The "MME" class is the top-level data structure that a user interacts with the most. It coordinates access to the different "Cast" objects, organizes data preprocessing, and controls model training. Generally, the data is scaled to a standard range using either the MinMax scaling methodology or the Standardized Anomaly scaling methodology. The MinMax scaling method is defined by subtracting the data minimum, dividing by the data range, then multiplying by the range of the desired target interval and adding the minimum of the desired target interval. The Standardized Anomaly method is defined by subtracting the data mean and dividing by the data standard deviation.
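For illustration, the two scaling methods can be written directly in NumPy. This is a generic sketch under the definitions given above, not PyELM-MME's own Scaler class.

```python
import numpy as np

def minmax_scale(x, target_min=0.0, target_max=1.0):
    # Subtract the minimum, divide by the range, then map onto the target interval.
    x_std = (x - x.min()) / (x.max() - x.min())
    return x_std * (target_max - target_min) + target_min

def standardized_anomaly(x):
    # Subtract the mean and divide by the standard deviation.
    return (x - x.mean()) / x.std()

data = np.array([3.1, 4.7, 2.8, 5.0, 3.9])
print(minmax_scale(data, -1.0, 1.0))
print(standardized_anomaly(data))
```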

C. Model Construction

One statistical model is constructed at each geographical point in both the 4D case and the 2D case, since the 2D case is treated as a single geographical point. The "Single-Point Modeler" (SPM) class provides flexible access to numerous statistical methods, including hpelm's ELM [8], the Ensemble Mean in NumPy [9], and SciKit-Learn's MLR [7]. It can easily be extended to include new desirable methods. The SPM class implements a "fit" method, which tunes the underlying mathematical structure to the training set, and a "predict" method, which uses a tuned statistical model to make predictions for previously unseen data.

D. MME Skill Evaluation

Generally, once the underlying statistical models are trained, and leave-n-out cross-validated climate predictions are produced, it is desirable to evaluate them by comparing them to ground-truth observations. The MME class coordinates the calculation of a number of skill metrics on a point-by-point basis, including:

• Pearson Correlation
• Spearman Correlation
• Mean Absolute Error
• Mean Squared Error
• Root Mean Squared Error
• Index of Agreement

E. Plotting Results

PyELM-MME provides versatile plotting utilities that can be used to examine both skill metrics and deterministic forecasts across time and space. PyELM-MME's "Plotter" class implements mapping functions for the 4D case, as well as time series, box plot, bar graph, and skill matrix functions that can be used in the 2D case, or for a selected point in the 4D case. Geospatial visualizations are implemented using the Cartopy library [12].

IV. CASE STUDY: PREDICTION OF MONSOON RAINFALL OVER BANGLADESH

To examine the performance of the proposed method compared with the traditional MME method, including the simple arithmetic mean (EM) of the individual ensemble forecasts (combining with equal weights) as a benchmark, we use the summer (Jun-Jul-Aug-Sep) monsoon rainfall over Bangladesh as a case study during 1982-2018. The lead-1 hindcasts from seven General Circulation Models belonging to the North American Multi-Model Ensemble (NMME) project phase 2 [13] were selected along with the ENACTS-BMD dataset for observational reference [14] for this study. The year-to-year rainfall time series of observations and two cross-validated MME methods have been plotted using PyELM-MME in Fig. 3. It is clearly visible that the EM underestimates the observed rainfall and has a substantial mean bias. Not only does ELM have less mean bias, but it also captures the inherent variability of observed rainfall.

Fig. 3. Precipitation time series (mm) from observation and two cross-validated MME schemes: simple arithmetic mean (EM) and extreme learning machine (ELM), from 1982–2008.

Further, to examine the performance of ELM and EM-based MME, root mean square error (RMSE) and index of agreement (IOA), along with mean (climatology) and standard deviation (inter-annual variability), are computed and presented in Table I. The observed mean and standard deviation are also presented in the same table. ELM out-performed EM by all skill metrics. Notably, when compared to EM, ELM's inter-annual variability is significantly closer to that of the observed rainfall, with a much smaller RMSE score and a higher IOA.

TABLE I. NUMERICAL SKILL ASSESSMENT OF ELM AND ENSEMBLE MEAN

Metric          Mean     Standard Deviation   RMSE     IOA
Observations    1679.55  225.06               --       --
Ensemble Mean   1044.40  37.70                677.21   0.07
ELM             1687.40  139.96               273.80   0.35

V. CONCLUSION

This study focuses on developing an improved multi-model ensemble (MME) scheme using machine learning. For this purpose, the extreme learning machine (ELM) technique acts as a fast, efficient substitute for a single-hidden-layer feed-forward neural network, and the PyELM-MME package makes the approach easy to use. Case study results strongly indicate that, when compared with the traditional MME scheme, the ELM method significantly enhances forecast skill.

REFERENCES

[1] Acharya, N., Kar, S. C., Kulkarni, M. A., Mohanty, U. C., and Sahoo, L. N. (2011a). Multi-model ensemble schemes for predicting northeast monsoon rainfall over peninsular India. J. Earth Syst. Sci., 120(5), 795–805.
[2] Acharya, N., Kar, S. C., Mohanty, U. C., Kulkarni, M. A., and Dash, S. K. (2011b). Performance of GCMs for seasonal prediction over India – a case study for 2009 monsoon. Theor. Appl. Climatol., 105(3-4), 505–520.
[3] Casanova, S. and Ahrens, B. (2009). On the weighting of multimodel ensembles in seasonal and short-range weather forecasting. Monthly Weather Review, 137, 3811–3822.
[4] Weigel, A. P., Liniger, M. A., and Appenzeller, C. (2008). Can multimodel combination really enhance the prediction skill of probabilistic ensemble forecasts? Quart. J. Roy. Meteor. Soc., 134, 241–260.
[5] Acharya, N., Srivastava, N. A., Panigrahi, B. K., and Mohanty, U. C. (2014). Development of an artificial neural network based multi-model ensemble to estimate the northeast monsoon rainfall over south peninsular India: an application of extreme learning machine. Climate Dynamics, 43(5), 1303–1310.
[6] Huang, G. B., Li, M. B., Chen, L., and Siew, C. K. (2008). Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing, 71, 576–583.
[7] Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825–2830.
[8] Akusok, A., Björk, K., Miche, Y., and Lendasse, A. (2015). High-Performance Extreme Learning Machines: A Complete Toolbox for Big Data Applications. IEEE Access, 3, 1011–1025. doi:10.1109/ACCESS.2015.2450498.
[9] Harris, C. R., Millman, K. J., van der Walt, S. J., et al. (2020). Array programming with NumPy. Nature, 585, 357–362. doi:10.1038/s41586-020-2649-2.
[10] Hoyer, S. and Hamman, J. (2017). xarray: N-D labeled Arrays and Datasets in Python. Journal of Open Research Software, 5(1), 10. doi:10.5334/jors.148.
[11] Anaconda Software Distribution. (2020). Anaconda Documentation. Anaconda Inc. Retrieved from https://docs.anaconda.com/
[12] Cartopy, v0.11.2. 22-Aug-2014. UK. https://github.com/SciTools/cartopy/archive/v0.11.2.tar.gz
[13] Kirtman, B. P., and Coauthors. (2014). The North American Multi-Model Ensemble (NMME): Phase-1 Seasonal to Interannual Prediction, Phase-2 Toward Developing Intra-Seasonal Prediction. Bull. Amer. Meteor. Soc. doi:10.1175/BAMS-D-12-00050.1.
[14] Acharya, N., Faniriantsoa, R., Rashid, B., Sultana, R., Montes, C., Dinku, T., and Hassan, S. (2020). Developing High-Resolution Gridded Rainfall and Temperature Data for Bangladesh: The ENACTS-BMD Dataset. Preprints 2020, 2020120468. doi:10.20944/preprints202012.0468.v1.

Anomaly Detection in Particle Accelerators using Autoencoders

Jonathan P. Edelen, RadiaSoft LLC, Boulder, USA, [email protected]
Nathan M. Cook, RadiaSoft LLC, Boulder, USA, [email protected]

Abstract—The application of machine learning techniques for anomaly detection in particle accelerators has gained popularity in recent years. These efforts have ranged from the analysis of quenches in radio frequency cavities and superconducting magnets to anomalous beam position monitors, and even losses in rings. Using machine learning for anomaly detection can be challenging owing to the inherent imbalance in the amount of data collected during normal operations as compared to during faults. Additionally, the data are not always labeled and therefore supervised learning is not possible. Autoencoders, neural networks that form a compressed representation and reconstruction of the input data, are a useful tool for such situations. Here we explore the use of autoencoder reconstruction analysis for the prediction of magnet faults in the Advanced Photon Source (APS) storage ring at Argonne National Laboratory.

Index Terms—machine learning, autoencoder, particle accelerator, light source, anomaly detection, semisupervised learning, unsupervised learning

I. INTRODUCTION

In recent years machine learning (ML) has been identified as having the potential for significant impact on the modeling, operation, and control of particle accelerators [1], [2]. These techniques are attractive due to their ability to model nonlinear behavior, interpolate on complicated surfaces, and adapt to system changes over time. This has led to a number of dedicated efforts to apply ML and early efforts have shown promise.

For example, neural networks have been used as surrogates for traditional accelerator diagnostics to generate non-interceptive predictions of beam parameters [3], [4]. Neural networks have been used for a range of machine tuning problems utilizing inverse models [5], [6]. When used in conjunction with optimization algorithms these have demonstrated improved switching times between operational configurations [7]. Neural networks have also been demonstrated to significantly speed up multi-objective optimization of accelerators by using them as surrogate models [8].

Anomaly detection has also been specifically highlighted as an area where machine learning can significantly impact operational accelerators. These algorithms work by identifying subtle behaviors of key variables prior to negative events. There have been many efforts to apply ML tools for anomaly detection across accelerators and accelerator subsystems. For example, understanding and predicting faulty behavior in superconducting radio frequency (RF) cavities and magnets is of interest due to the potential catastrophic nature of a failure of these devices. ML tools have been applied to detect anomalies in superconducting magnets at CERN [9] and RF cavities at DESY [10]–[12]. Additionally, machine learning has been used to identify and remove malfunctioning beam position monitors in the Large Hadron Collider (LHC), prior to application of standard optics correction algorithms [13]. Other efforts have sought to use ML for detection of errors in hardware installation [14].

While many of these efforts have shown success, results for global fault prediction have been limited. A recent effort at J-PARC utilized the System Invariant Analysis Technique to develop an operational fault prediction algorithm [15]. However, these results are preliminary, and loss classification and fault prediction remain active areas of research. Loss prediction has been studied at the LHC [16] but the use of autoencoders for fault prediction has yet to be fully explored. In this paper we evaluate the use of autoencoders to identify precursors to magnet failures in the Advanced Photon Source (APS) storage ring. We begin with an overview of autoencoders and our methodologies, followed by a discussion of the APS accelerator chain and our dataset. We then build autoencoder models and demonstrate their efficacy using both unsupervised learning and semisupervised learning.

II. AUTOENCODERS

Autoencoders are a class of neural networks that seek to reconstruct an input dataset while simultaneously reducing its dimensionality. The two main characteristics of the autoencoder are that the inputs are equal to the outputs, and the waist of the network is smaller in dimension than the input dataset. Figure 1 shows a schematic of an autoencoder. Here the number of nodes is steadily decreased in the encoder section (blue). The encoded dimension (orange), also referred to as the latent space, is the minimum number of nodes. The number of nodes per layer then increases in the decoder section (green) to reproduce the input data. The base dimensionality of the dataset is determined by the number of nodes at the encoded dimension. Typically the encode and decode sections of the network are symmetric. While many types of autoencoders can be constructed using feed-forward layers, convolutional layers, recurrent layers, or a combination of

layers, the focus of this paper is on vanilla feed-forward neural networks.

Fig. 1. Schematic diagram of an autoencoder. The inputs are noted by x_n and the outputs by x′_n. The encoder section is highlighted in blue, the latent space in orange, and the decoder in green. In this study, we consider fully connected layers, but omit the connections to enhance figure clarity.

Autoencoders are commonly used in two configurations. The first is the direct analysis of the latent space. This is accomplished by removing the decoder section from the network and analyzing the output of the latent space nodes directly. The second configuration is used to quantify the relationship between a training dataset and a test dataset. Here one evaluates the ability of the autoencoder to reconstruct a given input data set. This provides a quantifiable metric for how similar a new dataset is to the training data, either with respect to individual input parameters or in aggregate by computing the root mean squared (RMS) reconstruction error, sqrt( (1/n) Σ_i (x_i − x′_i)² ).

III. THE ADVANCED PHOTON SOURCE

The Advanced Photon Source consists of an electron accelerator chain that produces a bunched beam at 7 GeV energy, which traverses a periodic storage ring to generate focused radiation for light source end users. The main storage ring contains over one thousand different magnets that are all individually powered. The primary goal of these magnets is to maintain the optical properties of the beam in order to ensure proper delivery of photons to users. The ring is broken down into forty numbered sectors, each with A and B sub-sectors, resulting in a total of 80 individual sectors. Figure 2 shows a schematic diagram of two sectors in the storage ring.

The data used for our study were curated over a three year period of running. The data are broken down into two categories, the reference data and the fault data. For the reference data the storage ring was operating under normal conditions, without any faults or beam loss. The fault data contains information on the magnets leading up to a magnet fault. Here, one of the magnets in the ring fails, resulting in beam loss and an unexpected down time. For the purposes of our analysis the actual seconds leading up to and including the fault are excluded. This allows us to verify the identification of precursors that can be used for fault prediction in the future.

For each magnet there are current, voltage, and temperature measurements. These measurements are logged at 1 Hz. Because there are a large number of magnets and each magnet has multiple diagnostics, we simplified the input dataset in order to narrow the focus of our initial investigation. For each sector we computed the sum of the currents over all magnets in the sector at each time-step. This results in 80 input parameters corresponding to the effective magnet current for each sector. Figure 3 shows a heat map of the magnet current values for each of the 80 sectors for the reference dataset. Figure 4 shows a heat map of the magnet current values for each of the 80 sectors for the test dataset. Visual comparison of Figures 3 and 4 shows clear differences in the datasets that should be well characterized by the autoencoder. The bulk structure for these two datasets is similar and the variation from sector to sector is small. This indicates that while the autoencoder will likely be able to identify anomalies in the aggregate, differentiating between sector-specific anomalies will be more challenging. For our initial analysis, we only consider aggregate detection, and consider individual sector analysis in Section VI.

IV. MODEL LEARNING AND EVALUATION

The autoencoder was trained and validated on the reference data using an 80/20 split. The fault data was split into two test sets using roughly a 50/50 split. While the training and validation split was conducted randomly, the test set was split by run to ensure robustness in the semisupervised learning studies. Because our dataset is not uniformly sampled, we utilized a robust scaler to scale the data prior to training, validation, and testing. The robust scaler removes the median value and scales the data by the interquartile range. Note that the scaler is fit to the reference data and applied to the test data before sending it to the model. Due to the fact that the inputs and outputs are not between -1 and 1, rectified linear units will be the most appropriate choice of activation function. The network architecture consisted of 5 layers in the encoder starting from 60 nodes and decreasing to 10 nodes in increments of 10. The latent space dimension was 5 nodes and the decoder section was a mirror of the encoder section. Figure 5 shows the training and validation loss as a function of training epoch (top) and the R2 value for the reconstruction of each parameter (bottom).

Figure 6 shows the predicted input parameter against the ground truth input parameter for nine of the sectors using the validation data. The linear relationship between the model output and the input ground truth here shows that our autoencoder is well trained.

In order to identify fault precursors we computed the reconstruction error in two ways. The first uses the squared error of each sector and the second uses the root mean squared error (RMS) over all 80 sectors. We begin with a comparison of the two different error calculations on the reference data and the fault data. Figure 7 shows the squared reconstruction error as a function of sample position for the reference data and the fault data. The shaded region depicts the variance in

Fig. 2. Schematic of the APS storage ring. Here we show the A and B sectors for one of the larger numbered sectors. The delineation between A and B sectors is noted by the yellow and blue shading. The magnet types are color coded, with quadrupoles in blue, sextupoles in yellow, and correctors in brown. The dipole magnets, noted in red, are not included in our study.

Fig. 3. Heat map of the reference data used to train the autoencoder. The vertical axis denotes the sector number and the horizontal axis denotes the sample magnitude. The color scale indicates the frequency of those magnitudes aggregated over the entire run period.

Fig. 4. Heat map of the test data used to evaluate the autoencoder. The vertical axis denotes the sector number and the horizontal axis denotes the sample magnitude. The color scale indicates the frequency of those magnitudes aggregated over the entire run period.

the squared reconstruction error across all 80 sectors. Note that the runs are concatenated in time and the samples span many different runs over the aforementioned three year period. The sample positions are scaled to be between zero and one to allow for each dataset to be plotted simultaneously while they may cover different timespans.

Figure 8 shows the RMS reconstruction error as a function of sample number for the reference data and the fault data. As with Figure 7, the RMS reconstruction error on the fault data is on average two orders of magnitude larger than the reference data. This indicates that the machine is in a fundamentally different state leading up to the magnet faults. Our autoencoder is able to detect this state change using either of the two evaluation metrics discussed, by being trained only on data during normal operation.

V. REGION OF CONVERGENCE AND SEMISUPERVISED EVALUATION

We determine the threshold for categorizing the anomalies using two different methods. The first method is entirely unsupervised, and the autoencoder assumes none of the reference data should be flagged as anomalous. The second method is semisupervised, and determines the threshold for anomalies based upon the performance of the autoencoder on the fault data. In this way, the semisupervised method aims to maximize the number of true positives while minimizing false positives. Half of the fault data runs were used for tuning the threshold while the other half were held aside for final testing to ensure robustness of our threshold choice.

We define a false positive as any datapoint in the reference set that is flagged as anomalous. Conversely, a true positive is defined as a sequence within the fault data that contains at least a single point that is flagged as anomalous. This definition

7 Fig. 7. Mean squared error across all sectors as a function of sample position for the reference data (training and validation) and fault data (test and final test). The green dataset is used later for tuning detection thresholds in the semisupervised case while the red data are held aside for final testing. The shaded region indicates the variance in the squared error across the 80 sectors. Because the squared error is always positive the variance is only shown in one direction.

Fig. 5. Top: Training and validation loss for the autoencoder as a function of epoch. The network is trained in mini-batches where the batch size was 1000. Bottom: R2 value as a function of input parameter for the validation set. For a perfectly trained model the relationship between the predicted value and the input value should be linear giving an R2 of 1. Here we see very good agreement between the input data and the output data for the autoencoder.
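For concreteness, the architecture and preprocessing described in Section IV (an encoder stepping down from 60 to 10 nodes in increments of 10, a 5-node latent space, a mirrored decoder, ReLU activations, a robust scaler fit only on the reference data, and mini-batches of 1000 as in Fig. 5) could be assembled as sketched below. The paper does not state which deep learning framework was used, so Keras and scikit-learn are assumptions here, and the synthetic data is a placeholder for the logged magnet currents.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from tensorflow import keras
from tensorflow.keras import layers

n_sectors = 80                                   # one effective magnet current per sector
reference = np.random.default_rng(0).normal(size=(10000, n_sectors))  # placeholder data

scaler = RobustScaler().fit(reference)           # removes the median, scales by the IQR
X = scaler.transform(reference)                  # the same scaler is reused on test data

inputs = keras.Input(shape=(n_sectors,))
x = inputs
for n in (60, 50, 40, 30, 20, 10):               # encoder layers
    x = layers.Dense(n, activation="relu")(x)
latent = layers.Dense(5, activation="relu")(x)   # latent space ("waist")
x = latent
for n in (10, 20, 30, 40, 50, 60):               # decoder mirrors the encoder
    x = layers.Dense(n, activation="relu")(x)
outputs = layers.Dense(n_sectors)(x)             # reconstruct the scaled inputs

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, validation_split=0.2, epochs=20, batch_size=1000, verbose=0)
```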

Fig. 8. RMS error as a function of sample position for the reference data (training and validation) and fault data (test and final test) . The green dataset is used later for tuning detection thresholds in the semisupervised case while the red data are held aside for final testing.

permits us to optimize the threshold for selection such that true positives are maximized while maintaining low rates of false positives.

The fully unsupervised routine, using the squared reconstruction error, flags 12% of the fault data as anomalous while correctly identifying 17 of 26 anomalous runs. Using the RMS error metric, the unsupervised autoencoder flags 65% of the fault data as anomalous with 19 of 26 of the runs correctly identified as containing anomalous datapoints. In the unsupervised case we tested on all fault data.

Fig. 6. Model prediction as a function of the ground truth for nine randomly chosen sectors.

For the semisupervised case, we vary the reconstruction threshold in order to optimize the number of true positives while minimizing the number of false positives. Figure 9

shows the region of convergence (ROC) plot for both the squared error metric and the RMS error metric. More rapid convergence is obtained for the squared error metric than for the RMS metric.

Fig. 10. Number of faulty sectors for a given fault run. The data are sorted by the number of faulty sectors identified in the semisupervised case.
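To make the two threshold strategies of Section V concrete, the sketch below shows one way they could be computed from reconstruction errors. It is an assumed illustration, not the authors' code: the unsupervised threshold simply takes the largest error observed on the reference data, and the semisupervised sweep scores candidate thresholds with a simple true-positive-rate minus false-positive-rate criterion, which stands in for the actual selection rule used in the paper.

```python
import numpy as np

def rms_errors(x, x_hat):
    # RMS reconstruction error per sample over all 80 sectors
    return np.sqrt(((x - x_hat) ** 2).mean(axis=1))

def unsupervised_threshold(ref_errors):
    # Assume no reference point is anomalous: flag anything worse than the worst reference error
    return ref_errors.max()

def semisupervised_threshold(ref_errors, fault_runs, n_steps=200):
    # fault_runs: list of 1-D arrays of per-sample errors, one array per fault run
    best_t, best_score = None, -np.inf
    for t in np.linspace(ref_errors.min(), ref_errors.max() * 2, n_steps):
        fpr = np.mean(ref_errors > t)                              # reference points flagged
        tpr = np.mean([np.any(run > t) for run in fault_runs])     # runs with at least one flagged point
        if tpr - fpr > best_score:
            best_t, best_score = t, tpr - fpr
    return best_t
```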

Fig. 9. Region of convergence plot for the RMS error and squared error evaluation metrics. The main plot shows the true positive rate vs the false positive rate as a function of anomaly threshold. Inset a) shows the true positive rate as a function of the error threshold and inset b) shows the false positive rate as a function of the error threshold. Note that the threshold is normalized to the peak value of the reconstruction error computed on the reference data.

VI. FAULT DETECTION AND FORECASTING

Our previous analysis used the aggregate sector data to identify an anomalous machine state, with no indication of where the anomaly took place. We next examine the performance of the autoencoder in identifying sector-specific faults. As with our previous efforts, we consider both unsupervised and semisupervised approaches. To quantify our results, we make use of the fact that for each run in the test dataset, only a single sector experiences a magnet failure. Therefore, if the autoencoder identifies more than one sector as anomalous then either more data or a different evaluation metric will be required to improve the accuracy of the prediction. Because the RMS error metric cannot distinguish between sectors, we only consider the squared error metric. Figure 10 shows the number of anomalous sectors identified using the squared error metric, sorted for the semisupervised case. As can be seen, multiple sectors are flagged across both scenarios, indicating a failure of the model to accurately identify the faulty sector, although the unsupervised routine appears to perform better. This suggests that additional information is necessary to accurately forecast specific magnet failures from these data.

However, the identification of precursors is much more promising. Figures 11 and 12 show analysis plots that compare unsupervised learning with semisupervised learning for the identification of precursors that result in magnet failures. Dashed lines depict the results from unsupervised learning and the solid lines from semisupervised learning. Because the fault runs are not the same length, the prediction is normalized to the total time of that run. For the RMS error metric, Figure 11 shows that precursors can be easily identified by the autoencoder for a large fraction of the runs; in some cases anomalous behavior is identified hours before the fault occurs. When applying the semisupervised threshold, the vast majority of the runs are correctly identified as anomalous at the start of the run, independent of run length.

Fig. 11. First indication of an anomaly as a function of the run time for the fault data using the RMS error metric. Red is the data used to tune the detection threshold while blue is the final test data that is not used in any of the training or parameter tuning. The dashed lines represent the unsupervised case while the solid line is the semisupervised case.

Figure 12 shows the same data as Figure 11 but for the squared error metric. Here we also see that many of the runs have precursors that are detectable long before the fault occurs. Moreover, when we use the semisupervised thresholds, all but one of the runs has the majority of the data in a fault condition. This shows that the autoencoder is well suited to identify anomalous states in the machine and that precursors to the

faults are detectable hours before the fault occurs even using aggregate parameters for a small subset of the dataset.

Fig. 12. First indication of an anomaly as a function of the run time for the fault data using the squared error metric. Red is the data used to tune the detection threshold while blue is the final test data that is not used in any of the training or parameter tuning. The dashed lines represent the unsupervised case while the solid line is the semisupervised case.

VII. CONCLUSIONS

We have demonstrated the use of autoencoders to detect precursors to faults in the APS storage ring using both unsupervised and semisupervised learning. Using autoencoders we can reliably detect a change in the machine state independent of time before the fault occurs for our dataset. It is likely that the machine state change is detectable much sooner than our longest available dataset of just over 120 minutes. Analysis of more data over a wider range of operating configurations is necessary to test this hypothesis.

As part of this work we also studied using supervised learning on a sequence of data to predict whether a fault will occur and the expected time to the fault. While the network was easily able to learn the binary output of whether or not the fault would occur, our models were unsuccessful in accurately predicting the timing of the fault. This is likely due to the fact that the test data largely samples machine states that were already heading towards a fault, without providing measurements during the earlier transition towards a faulty state. This conclusion is consistent with the data presented in Figures 11 and 12, which show that the machine is in a different state during virtually all of the fault data. As a result, the data do not appear to provide the necessary indicators of the change from normal to faulty state, and subsequently cannot capture the total time-to-fault. Thus, while the autoencoder can clearly differentiate between the two machine states, more data are needed to take the next step in predicting when a fault will occur.

We conclude that a) one can correctly identify a non-standard or anomalous machine state without directly sampling it (i.e. using unsupervised learning) and b) these anomalous states are detectable hours before a fault occurs. The ability to reliably distinguish between normal and anomalous states is critical to the operations of modern accelerators. Our results show that autoencoders are a promising tool for this application.

ACKNOWLEDGMENT

The authors wish to thank Dr. Michael Borland and the Advanced Photon Source for contributing the data that allowed us to complete this study. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Nuclear Physics under Award Number DE-SC0019682.

REFERENCES

[1] A. L. Edelen, S. G. Biedron, B. E. Chase, D. Edstrom, S. V. Milton, and P. Stabile, "Neural networks for modeling and control of particle accelerators," IEEE Transactions on Nuclear Science, vol. 63, no. 2, pp. 878–897, April 2016.
[2] A. Edelen et al., "Opportunities in Machine Learning for Particle Accelerators," 2018.
[3] C. Emma, A. Edelen, M. J. Hogan, B. O'Shea, G. White, and V. Yakimenko, "Machine learning-based longitudinal phase space prediction of particle accelerators," Phys. Rev. Accel. Beams, vol. 21, p. 112802, Nov 2018.
[4] A. L. Edelen, S. G. Biedron, S. V. Milton, and J. P. Edelen, "First Steps Toward Incorporating Image Based Diagnostics Into Particle Accelerator Control Systems Using Convolutional Neural Networks," in Proceedings of the 2016 North American Particle Accelerator Conference, 2016.
[5] A. L. Edelen, J. P. Edelen, S. G. Biedron, S. V. Milton, and P. J. M. van der Slot, "Using Neural Network Control Policies For Rapid Switching Between Beam Parameters in a Free Electron Laser," in Proceedings of the 2017 Deep Learning for Physical Sciences workshop at the Neural Information Processing Systems Conference, 2017.
[6] J. P. Edelen, N. M. Cook, K. A. Brown, and D. P. M, "Optimal Control for Rapid Switching of Beam Energies for the ATR Line at BNL," in Proceedings of the International Conference on Accelerator and Large Experimental Physics Control Systems, 2019.
[7] A. Scheinker, A. Edelen, D. Bohler, C. Emma, and A. Lutman, "Demonstration of model-independent control of the longitudinal phase space of electron beams in the linac-coherent light source with femtosecond resolution," Phys. Rev. Lett., vol. 121, p. 044801, Jul 2018.
[8] A. Edelen, N. Neveu, M. Frey, Y. Huber, C. Mayes, and A. Adelmann, "Machine learning for orders of magnitude speedup in multiobjective optimization of particle accelerator systems," Phys. Rev. Accel. Beams, vol. 23, p. 044601, Apr 2020.
[9] M. Wielgosz, A. Skoczea, and M. Mertik, "Using lstm recurrent neural networks for monitoring the lhc superconducting magnets," Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 867, pp. 40–50, 2017.
[10] A. S. Nawaz, S. Pfeiffer, G. Lichtenberg, and H. Schlarb, "Self-organized critical control for the european xfel using black box parameter identification for the quench detection system," in 2016 3rd Conference on Control and Fault-Tolerant Systems (SysTol), Sep. 2016, pp. 196–201.
[11] A. Nawaz, S. Pfeiffer, G. Lichtenberg, and P. Rostalski, "Anomaly detection for the european xfel using a nonlinear parity space method," IFAC-PapersOnLine, vol. 51, no. 24, pp. 1379–1386, 2018, 10th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes SAFEPROCESS 2018.
[12] A. Nawaz, G. Lichtenberg, S. Pfeiffer, and P. Rostalski, "Anomaly detection for cavity signals - results from the european xfel," in 9th International Particle Accelerator Conference, 2018, p. WEPMF058.
[13] E. Fol, J. M. Coello de Portugal, and R. Tomas, "Unsupervised machine learning for detection of faulty beam position monitors," p. WEPGW081. 4 p, 2019.
[14] T. Dewitte, W. Meert, E. Van Wolputte, and P. Van Trappen, "Anomaly detection for cern beam transfer installations using machine learning." JACoW, 2019.

10 [15] T. Soma, M. Takagi, K. Ishii, and M. Yoshioka, “Predictive detection and diagnosis of accelerator system using system invariant analysis technology (siat),” Japan, 2017, p. 1427. [16] G. Valentino, R. Bruce, S. Redaelli, R. Rossi, P. Theodoropoulos, and S. Jaster-Merz, “Anomaly detection for beam loss maps in the large hadron collider,” Journal of Physics: Conference Series, vol. 874, p. 012002, jul 2017.

Empirical Inverse Transform Function for Ensemble Forecast Calibration

Weiming Hu, Department of Geography, The Pennsylvania State University, Pennsylvania, U.S.A., [email protected]
Laura Clemente, Geospatial Research Laboratory, U.S. Army Engineer Research and Development Center, Virginia, U.S.A., [email protected]

George S. Young, Department of Meteorology and Atmospheric Science, The Pennsylvania State University, Pennsylvania, U.S.A., [email protected]
Guido Cervone, Department of Geography, The Pennsylvania State University, Pennsylvania, U.S.A., [email protected]

Abstract—Ensemble models have become the backbone for uncertainty quantification for weather and climate simulations. The level of agreement among ensemble members, usually calculated as the ensemble spread, can be a quantifiable measure of the prediction uncertainty. However, ensemble models are usually found to be subject to under- and over-dispersion, indicating a mismatch between the observed variability and the predicted uncertainty. This mismatch can jeopardize the reliability of such ensemble forecasts and relate to suboptimal uncertainty assessment. Another aspect of the problem lies within the relationship between ensemble spread and its predictive error. Normally, a well-calibrated forecast ensemble demonstrates a strong positive correlation between the ensemble spread and the predictive error, so that the ensemble spread can be a reliable measure of the ensemble quality. For an under- or over-dispersive ensemble, this relationship is usually not fulfilled.

This work proposes the Empirical Inverse Transform function (EITrans) as an ensemble calibration technique. The calibration to a specific forecast ensemble is derived based on its historical weather analogs and the corresponding observations. Experiments have been carried out for surface temperature predictions with both under- and over-dispersive ensembles. Results show that EITrans is capable of matching the forecasted and observed distribution and improving model reliability.

Index Terms—Ensemble Calibration, Analog Ensemble, Inverse Function

I. INTRODUCTION

Probabilistic forecasts, in the form of ensemble predictions, can be generated with Numerical Weather Prediction (NWP) models, such as the Global Ensemble Forecast System (GEFS) and the European Centre for Medium-Range Weather Forecasts (ECMWF). Instead of providing a single prediction for the event of interest, an ensemble is generated in order to provide a range of possible future states of the atmosphere. This ensemble can be summarized using the mean of the members, which converts it back to the deterministic form [1, 2]. With a well-constructed ensemble, the ensemble mean is on average more accurate than each member [3, 4] due to the ability to remove random errors within the ensemble members. This accuracy improvement sometimes leads to the preference of using a summary of the ensemble when a deterministic form is desired. However, the characteristics of the ensemble provide a representation of the forecast uncertainty, which can be lost when calculating the ensemble mean. The forecast uncertainty is usually quantified as the standard deviation of the ensemble members. Likewise, a probability distribution can be calculated from the ensemble to make forecasters or users better informed of the risks associated with a particular forecast.

There are generally two types of forecast ensembles, dynamical and statistical. Dynamical modeling is typically performed using the multi-model and multi-initialization approach [5]. An ensemble of models with different implementations of core components, e.g. the parameterization of physical processes and the dynamic solvers [6], are run to generate different predicted values for a particular weather event. Since these models take slightly different approaches, forecasts generated have different random errors. Thus, the average of all ensemble members, with the resultant error cancellation, outperforms any individual member.

Dynamical ensembles can also be generated with multi-initialization to account for the highly chaotic characteristic of the weather continuum. Weather processes are sensitive to initial conditions in that a slight change in the starting condition used to initialize a weather model can result in different evolution of weather processes [7]. These slight changes can be caused by observation errors from imprecise equipment, a limited precision in modern computers, a finite spatial resolution of weather models and observing networks, and other factors. In this approach, the same physical model is run multiple times with each run utilizing slightly perturbed initial conditions in an effort to capture the model's sensitivity to these varying initial conditions.

Statistical ensembles are usually generated from postprocessing

a single run from a deterministic NWP model. Ensembles generated via the statistical approach are sometimes referred to as "daughter" ensembles [8]. The uncertainty of forecasts required to determine the spread between members of the daughter ensemble can be estimated from the statistical properties of historical errors.

There is, however, a third type of ensemble. The Analog Ensemble (AnEn) is generated based on similar historical weather forecasts and the associated observations. The AnEn is a computationally efficient technique used to generate forecast ensembles because, given a historical simulation archive of a deterministic model, it does not require additional model simulations. The deterministic model is only run once and then the AnEn can generate ensembles based on the single run and the associated observations. This is different from, e.g., the Global Ensemble Forecast System, where 21 separate models are run simultaneously to generate forecast ensembles. Furthermore, the AnEn is highly parallelizable because the computation is processed individually at each grid cell [9, 10].

Forecast ensembles help assess prediction uncertainty. When ensemble members disagree, meaning the ensemble has a large spread, the prediction has a larger uncertainty. It is preferable to produce ensembles with a small spread if the ensemble is well calibrated. A well-calibrated forecast ensemble with a small spread indicates high confidence in the prediction, and is generally associated with higher predictive skill.

Calibrating a forecast ensemble, however, is a difficult task. One problem is the histogram miscalibration. The rank histogram, also known as the Talagrand diagram, is a diagnostic visualization tool to spot under-/over-dispersion and bias. It was designed by several scholars [11, 12, 13] based on familiarity with and inspiration from tools such as the Q-Q plot [14] and the probability integral transformation [15]. Although caveats exist [16] for interpreting rank histograms, it remains a popular and powerful tool to visually check the reliability of the forecast ensemble.

Figure 1 shows the rank histogram for the AnEn forecasts of surface temperature with different numbers of ensemble members ranging from 10 (Figure 1a) to 100 (Figure 1d). Predictions are generated for State College, Pennsylvania.

A good rule of thumb with respect to the number of ensemble members is to use the square root of the size of the search data repository [9]. Therefore, given daily forecasts over 7 years, a practical choice for the number of ensemble members is √(365 × 7) ≈ 50. However, all panels of Figure 1 show a convex rank histogram, which indicates an over-dispersed forecast ensemble. If the ensemble spread is too large, one traditional solution is to reduce the number of ensemble members. Figure 1, however, shows that, by reducing the number of members from 100 to 10, the shape of the rank histogram barely changes. In fact, it maintains the generally convex shape and shows that the forecasted distribution does not match the observed distribution. This is a typical manifestation of the rank histogram miscalibration problem.

This study proposes the Empirical Inverse Transform function (EITrans) that aims to improve the ensemble quality by addressing the rank histogram miscalibration problem. The new ensemble calibration method works with both under- and over-dispersive ensembles.

The rest of the paper is organized as follows: Section II introduces an NWP model and the collection process of the research data; Section III describes, in depth, the EITrans and the AnEn technique; Section IV compares and discusses results from applying the EITrans on two types of ensembles, the AnEn and the persistence ensemble; Section V summarizes the study with conclusions and future directions.

II. DATA

The North American Mesoscale (NAM) model is an operational weather model run by the National Centers for Environmental Prediction (NCEP). The parent domain of the NAM model, which has a 12 km spatial resolution covering the Continental United States (CONUS), is used for this study. The model is run four times per day, therefore having four initialization cycles, at 00, 06, 12, and 18 UTC. For each cycle, it provides hourly forecasts up until 36 hours and then three-hour forecasts up until 84 hours into the future. Model analyses are available at the Forecast Lead Times (FLT) 000, 001, 002, 003, and 006 for each cycle. These model analyses are deemed the most realistic source of data for forecast verification.

Historical NAM forecasts [17] and analyses [18] have been collected from January 1, 2010, to July 31, 2018. Since the spatial resolution of NAM is 12 km, there are a total of 262,792 grid points. The total data size reaches approximately 8 TB. For this study, therefore, the geographic domain has been subset down to six key locations that lie within various climatic zones. Although an analysis over a larger spatial domain is desired, this study serves as a proof of concept. The main objective is to demonstrate that such a calibration process works for both over- and under-dispersion. Figure 2 shows the location and the climatic characterization of the area of study. There are 50 grid points for each of the study sites. Figure 2b shows the daily cumulative precipitation and the daily mean temperature for each site averaged using the model analyses from 2010 to 2018.

Data preprocessing has been carried out for NAM forecasts and analyses. First, relative humidity values are strictly confined within the range of [0, 100]. Additionally, since the model analyses are only available for a subset of the FLTs, they are collected from multiple model cycle times while forecasts are only collected from the 00 UTC cycle so that more FLTs can be verified. For example, forecasts initialized at 00 UTC have 84 FLTs. However, the analyses are only available for the first several FLTs at 000, 001, 002, 003, and 006 h. To verify forecasts at the 012 h valid time, the verifying analysis is collected from the 000 h valid time from the model run initialized at 12 UTC. After aligning the model forecast and analysis, there are still some FLTs that do not have a verifying analysis field. They are removed from the verification process.

Fig. 1. Rank histograms of AnEn forecasts for surface temperature in 2017 with varying numbers of ensemble members. The vertical axis, the observed frequency, shows how often an observed value lies between the predicted values of two adjacent ranked ensemble members. The search period is from January 1, 2010, to December 31, 2016.

Data preprocessing took half an hour on the National Center for Atmospheric Research (NCAR) supercomputer, Cheyenne [19], with 360 cores. The time was mostly spent on converting data from GRIB2 to NetCDF and carrying out spatial queries to extract data on the domain of interest. The ensemble generation and the EITrans calibration together took 1 (hour) × 108 (cores) × 6 (locations) = 648 core-hours, and the computational cost of verification was minimal, carried out on a single core.

In this work, two forecast ensembles are generated using deterministic NAM predictions. The first technique is the AnEn, described in Section III-B; the second technique is the persistence ensemble, described in Section III-C. Since the AnEn is over-dispersive and the persistence ensemble is under-dispersive, the EITrans is used to calibrate both of them to demonstrate its ability to improve model reliability regardless of the situation.

Fig. 2. Study sites are identified in panel (a); their climatic characterization with daily cumulative precipitation and daily mean temperature averaged from 2010 to 2018 is shown in panel (b).

III. METHOD

This section introduces the EITrans technique for calibrating forecast ensembles. The key problem at hand, as demonstrated in Figure 1, is that the forecasted and observed distributions do not match. As previously discussed in Section I, a flat histogram is desired to show that the forecasted and the observed distributions match. However, Figure 1 indicates that the AnEn has an over-dispersion problem and that changing the number of ensemble members does not contribute to any visual improvement of the rank histogram because the convex shape of the histograms barely changes. This suggests that the forecast ensemble needs additional calibration to achieve statistical reliability.

The core idea of the EITrans technique is to calculate a series of correction values, each of which corrects a ranked ensemble member (from the lowest to the highest). The number of correction values depends on the number of ensemble members. These correction values are meant to correct the forecast ensemble so that the forecasted distribution better matches the observed distribution. For example, if an ensemble is known to be over-dispersive, the values range from a positive value (for the first ranked member) to a negative value (for the last ranked member). When applied to the ranked members, this correction shrinks the ensemble spread, bringing the tail values closer to their distribution means. If an ensemble is known to be under-dispersive, the correction tends to move the ensemble tails further away from the distribution mean, seeking to enlarge the ensemble spread.

Therefore, there are two parts to the correction technique:
1) Characterization of forecast ensembles (under- or over-dispersive);
2) Calculation of correction values.

It is important to point out that correction values are calculated for a period of historical forecasts, not individually for each forecast ensemble. This is because observations are deterministic and a period of observations is needed to generate the observed distribution and, consequently, to verify the forecasted distribution.

In this work, one set of correction values is generated for the entire test period from January 1, 2017, to July 31, 2018, at each of the six locations. Forecasts for each of the six locations are corrected independently.

A. Empirical Inverse Transform Function

Given a period of forecast ensembles and the associated observations, the goal of the EITrans is to calculate a series of additive correction values, one for each of the ranked ensemble members, so that the forecast distribution matches the observed distribution. The goal is to calculate a correction series for ranked ensemble members in order to achieve a uniform rank histogram.

To begin with, we assume a rank histogram with m + 1 bins, where m is the number of ensemble members. The goal is to identify a set of correction values {c_i, i = 1...m} to be added to each of the ranked members so that the new rank histogram would have a uniform distribution.

A rank histogram is analogous to a Probability Mass Function (PMF), a discretized version of the Probability Density Function (PDF). The bins are created with ensemble members as bin cuts, and the bar heights correspond to the frequency of observations lying within a particular bin. The correction values applied to the ranked members change the location of the bin cuts and, consequently, the bin sizes, so that some bins include more observations and other bins include fewer observations.

Let y = h(x) be the distribution function of a rank histogram, where x, the rank number, takes discrete values from 1 to m + 1 and y is the relative frequency of observations lying within the (x − 1)-th and x-th ranked members. For example, h(1) is the frequency of observations lying outside, to the left of, the first ranked member, and h(m + 1) is the frequency of observations lying outside, to the right of, the last ranked member. Since bin heights can be different (because observations are not equally distributed), the original rank histogram might not be flat. To flatten the rank histogram, we need to find a new set of bin cuts. This goal can be achieved with an inverse function.

The Cumulative Distribution Function (CDF) of h(x) can be found empirically as the cumulative sum of frequencies from ranked members, denoted as Y = H(x) = Σ_{i=1}^{x} h(i), where Y, ranging within (0, 1], is the probability of observations lying to the left of the x-th ranked ensemble member. Let x′ = H^{−1}(Y) be the inverse function and let Y take values from 0 to 1 with an interval of 1/m; the resulting x′ is the set of desired bin cuts that generates a rank histogram with a uniform distribution.

However, the horizontal axis of a rank histogram is ordinal, rather than continuous. This means the calculated x′ is only meaningful when it is an integer. For example, x′ = 1 indicates the first member in the ensemble, and x′ = m indicates the last member in the ensemble. If 1 < x′ < 2, this only indicates that the desired bin cut should be between the first and the second ensemble members, yet it does not convey where the bin cut should be on a continuous scale, for example, where the bin cut should be for a continuous variable such as temperature. A linear transformation is therefore introduced to translate from an ordinal scale to a continuous scale. We do not assume linearity across all ensemble members, but instead only assume linearity between two adjacent ensemble members when applying this technique. Specifically, the following equation is used to calculate a correction value:

(v_{i+1} − v_i) / ((i + 1) − i) = k_i = (v − v_i) / (x′ − i),    (1)

where x′ is the calculated member rank that lies between the ranks i and i + 1; v_i and v_{i+1} correspond to the i-th and the (i+1)-th ensemble forecast values, for example, temperature; v is the linearly interpolated ensemble forecast value at the ordinal location x′; and k_i is the calculated coefficient between the ordinal scale and the continuous scale over the space between ranks i and i + 1.

It is important to note that i is an integer from {0, ..., m + 1} given m ensemble members. However, v_0 and v_{m+1} are not defined. If v_1 indicates the first ranked member, v_0 indicates −∞, and the bar height (Figure 3a) between v_0 and v_1 indicates the relative frequency of observations outside of the ensemble to the left. Similarly, if v_m indicates the last ranked member, v_{m+1} indicates +∞, and the bar height between v_m and v_{m+1} indicates the relative frequency of observations outside of the ensemble to the right. If these two bars at both ends are much higher than the other bars in the middle, it is an indication of under-dispersion.

To address this issue, k_i needs to be predefined a priori based on how under-dispersive the ensembles are, since it can no longer be calculated with Equation (1). This predefined slope quantifies how aggressive the EITrans should be when moving the tail members away from the mean of the ensemble distribution. In practice, this parameter is estimated using a grid search algorithm with historical data.
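The procedure described above can be condensed into a short sketch. The code below is not the authors' R implementation (linked in Section V) but a minimal Python/NumPy illustration under stated assumptions: a rank histogram has already been computed over a training period, corrections are derived from a representative set of ranked members, bin cuts falling outside the first and last members are simply clamped rather than handled with the predefined tail slope k_i, and all names are hypothetical.

import numpy as np

def eitrans_corrections(rank_counts, ranked_members):
    # rank_counts    : length m + 1 array of rank-histogram counts, h(x)
    # ranked_members : length m array of representative ranked ensemble values
    # Returns m additive correction values, one per ranked member.
    m = len(ranked_members)
    freq = np.asarray(rank_counts, dtype=float)
    freq = freq / freq.sum()                          # relative frequencies h(x)
    cdf = np.concatenate(([0.0], np.cumsum(freq)))    # H(x) at ordinal cuts 0..m+1
    ordinal = np.arange(m + 2)                        # ordinal bin-cut locations
    # Equally spaced probabilities sampled from (0, 1), one per member.
    targets = np.arange(1, m + 1) / (m + 1.0)
    # Invert the empirical CDF to obtain new bin cuts on the ordinal scale (x').
    new_cuts = np.interp(targets, cdf, ordinal)
    # Translate ordinal cuts to the physical scale with the piecewise-linear
    # interpolation of Equation (1); values outside [1, m] are clamped here.
    new_values = np.interp(new_cuts, np.arange(1, m + 1), ranked_members)
    return new_values - ranked_members

The resulting corrections would then be added to the sorted members of each operational forecast ensemble for the same location and FLT.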
Figure 3 is a pictorial representation of how the EITrans is implemented on 20-member forecast ensembles for surface temperature. Figure 3a shows the rank histogram of the forecasts for 2017. The convex shape of the bins indicates the ensembles are over-dispersive. Figure 3b shows the CDF calculated using the bins from Figure 3a. To calculate the correction values for the 20 ensemble members, 20 values are equally sampled from (0, 1). Each of the bins that contain these values can then be determined using the CDF. Note that while the CDF is discretized, linearity is assumed between the adjacent bin cuts to determine the exact bin cut location in the ordinal scale for the sampled values. Figure 3c shows the same rank histogram as in Figure 3a, but with the new set of bin cuts in black dashed lines. Finally, the rank histogram is recalculated using the bin cuts in black dashed lines, and the new rank histogram is shown in Figure 3d. Visually, it has significantly improved compared to the original rank histogram (Figure 3a), now having a uniform bin shape.

At this point, a correction value can be calculated for each of the m members using a set of forecasts and the associated observations. The corresponding additive corrections can, therefore, be applied to each of the ranked members. In operational forecasts, however, observations are not available when calculating rank histograms. The next section, Section III-B, describes how to use the AnEn technique to generate the rank histogram, which is vital to the calculation of correction values.

Fig. 3. A pictorial representation of the process of determining the correction values for a 10-member forecast ensemble. The predictand is surface temperature and the test time is 2017. (a, c) show the rank histogram; (b) shows the CDF; (d) shows the calibrated rank histogram. The black dashed lines represent the corrected bin cut locations.

B. Analog Ensemble

The AnEn technique generates ensemble forecasts from a single deterministic forecast model run. This technique requires a historical forecast archive that is generated with a static weather model. Similar historical forecasts can be identified within this archive, and AnEn members consist of the observations corresponding to the most similar historical forecasts. The following similarity metric is used to identify similar weather analogs, based on a multi-variate Euclidean distance function [20]:

‖F_t, A_{t′}‖ = Σ_{i=1}^{N_v} (ω_i / σ_{f_i}) √( Σ_{j=−w}^{w} (F_{i,t+j} − A_{i,t′+j})² ),    (2)

where F_t is the NWP model forecast valid at the model initialization time stamp t at a specific location and FLT; A_{t′} is the historical repository of NWP deterministic forecasts from the search space at the same location and FLT, but with a different model initialization time t′ from within the historical repository of deterministic multi-variate predictions; N_v is the number of physical variables used during forecast similarity calculation; ω_i is the weight for each physical variable, which suggests the relative importance of the physical variable with respect to the others; σ_{f_i} is the standard deviation for the physical variable i calculated from the historical forecasts at the same location and FLT; w equals half of the time window size of the FLTs to be compared, so that weather analogs are identified within a very small time window, usually equivalent to the length of three FLTs; F_{i,t+j} is the value of the current forecast for the physical variable i at the valid time t + j; and A_{i,t′+j} is the value of the historical forecast for the physical variable i at the valid time t′ + j.

Typically, the generation of the AnEn includes four steps. A graphical representation of the process can be found in [20, 21].
1) A current multi-variate forecast is generated from a deterministic NWP model;
2) To identify similar weather forecasts from the historical archive, pair-wise similarity distances between the current forecast and each of the historical forecasts are calculated using Equation (2);
3) Four historical forecasts with the smallest distances (most similar) are identified and their corresponding observations are selected;
4) The AnEn is finally constructed using the selected observations.

The AnEn is a technique that generates accurate forecast ensembles and utilizes computing resources efficiently [9]. Predictor weights and the number of ensemble members are among the most important parameters for the AnEn. Their determination usually involves an extensive grid search process [22, 23, 24, 25, 26, 27] to optimize a predefined error metric, e.g., Mean Absolute Error (MAE) or Continuous Ranked Probability Score (CRPS).

The AnEn has been used to generate forecast ensembles from NAM, but it can also be used to characterize forecast ensembles. The EITrans requires a set of forecasts and the associated observations to generate a rank histogram, and then to calculate the set of correction values. In an operational forecast where future observations are not available, the EITrans can instead use rank histograms generated from the historical forecasts found to be the most similar to the operational forecasts. Assuming the most similar historical forecasts have a similar dispersion signature to the operational forecast, the correction values estimated from similar forecasts can then be applied to calibrate operational forecast ensembles.
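For readers who prefer code, the following is a minimal Python/NumPy sketch of the similarity distance in Equation (2) and the member selection in the steps above. It is an illustration, not the operational Parallel Analog Ensemble implementation [10]; the array layout and function names are assumptions made here.

import numpy as np

def analog_distance(current, historical, weights, sigma):
    # current, historical : arrays of shape (n_variables, 2 * w + 1), the multi-variate
    #                       forecasts over the FLT window centered on the target FLT
    # weights             : length n_variables array of predictor weights (omega_i)
    # sigma               : length n_variables array of historical standard deviations
    sq_diff = ((current - historical) ** 2).sum(axis=1)   # inner sum over the window
    return float((weights / sigma * np.sqrt(sq_diff)).sum())

def analog_ensemble(current, archive, archive_obs, weights, sigma, n_members):
    # Steps 2-4: compute distances to every historical forecast, keep the most
    # similar ones, and return their paired observations as the ensemble members.
    d = np.array([analog_distance(current, hist, weights, sigma) for hist in archive])
    best = np.argsort(d)[:n_members]
    return archive_obs[best]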
C. Persistence Ensemble

Another commonly used technique to generate forecast ensembles is the persistence ensemble [28, 23]. It assumes the atmosphere always evolves following the current trend. The steps are defined as follows:
1) To generate a forecast ensemble at noon with 20 members, the observed data from the previous 20 days, also at noon, are selected as ensemble members;
2) Repeat the previous step independently for each day in the test period, each grid cell, and each FLT.
These procedures are considered to be operational in the sense that the historical repository grows relative to the forward progress of test days.

IV. RESULTS AND DISCUSSIONS

The EITrans has been tested with two types of forecast ensembles that are generated from the AnEn and persistence, for a two-meter temperature forecast.

Both ensembles are generated with the deterministic forecasts from NAM and both contain 20 ensemble members. The test period is from January 1, 2017, to July 31, 2018. For persistence, the search period is the immediate 20 historical days preceding the target forecasts. For the AnEn, the search period is from January 1, 2010, to December 31, 2016.

The following four variables are selected as predictors for identifying weather analogs for predicting 2-meter temperature: 2-meter relative humidity (0.1), 2-meter temperature (0.7), surface wind speed (0.1), and surface wind direction (0.1). The associated predictor weights are listed in parentheses. These weights are determined with an extensive grid search algorithm that aims to optimize the predictive skill over training data. Details on tuning the number of ensemble members are presented in Section IV-A, followed by statistical verification of the original and the calibrated ensembles in Section IV-B.

Fig. 4. A sensitivity study of the AnEn forecasts with the number of ensemble members on 2-meter temperature. (a, b, c) show the CRPS, bias, and rank histograms respectively. Rank histograms are shown as line plots and the horizontal axis is normalized with the number of ensemble members to simplify the comparison.

A. Parameter Tuning

Figure 4 shows a sensitivity study of the AnEn with the number of ensemble members. 2-meter temperature forecast ensembles have been generated using the AnEn technique for all six locations. A subset of the time period is used to carry out the sensitivity analysis. The search period spans from January 1, 2010, to December 31, 2016. The analysis time is selected as the first day from each of the months in the search period. This avoids directly optimizing parameters for the test period in 2017 and 2018. 11 AnEn simulations have been run with the number of ensemble members ranging from 5 to 600.

Figures 4a, b show the CRPS [29] and bias for each of the AnEn forecasts, as a function of FLTs. NAM has 84 FLTs available but only the day-ahead period (24 h) is used to complete the sensitivity analysis. When the number of ensemble members increases from 5 to 20, the CRPS shows a decreasing trend, indicating that the AnEn requires a sufficient number of ensemble members to generate accurate forecasts. This pattern is then reversed when the number of ensemble members is further increased from 20 to 600. The average CRPS climbs from 0.625°C (20 members) to 1.18°C (600 members), equivalent to an 88.8% increase in the prediction error. The average bias is monotonically increasing from 0.014°C (5 members) to 0.271°C (600 members). Typically, when the number of ensemble members is smaller than 50 (0.040°C), the systematic bias only accounts for 5% of the prediction error, leaving the remaining 95% as a random error.

Figure 4c shows the rank histograms associated with different numbers of ensemble members. The horizontal axis has been rescaled from the original member rank to the normalized rank by dividing the member rank by the total number of ensemble members. After normalization, rank histograms can be plotted on top of each other for easy comparison of shapes. Rank histograms are plotted as lines, instead of rectangular bins, to simplify the visualization. Frequency is shown on the vertical axis, and it is also normalized with the average bin count for a particular number of ensemble members. Figure 4c and Figure 1 are similar in that varying the number of ensemble members does not affect the general shape of the rank histogram. While tuning the number of ensemble members can improve the prediction accuracy in general, it falls short of calibrating the forecast distribution against the observed distribution.

Based on the results shown in Figure 4, 20 ensemble members are generated from the AnEn, and 20 members are also used for the persistence ensemble to be consistent with the AnEn.

B. Ensemble Verification

Rank histograms for the AnEn and the persistence ensembles containing 20 members are shown in Figure 5. Each row represents a spatial domain of 120 km by 60 km (a total of 50 grid points) centered at the specified location. Each column represents an ensemble type, from left to right, being the AnEn, the AnEn calibrated by the EITrans, the persistence ensemble, and the persistence ensemble calibrated by the EITrans.

Fig. 5. Rank histograms shown at six locations for (from left to right) the AnEn, the AnEn calibrated by the EITrans, the persistence ensemble, and the persistence ensemble calibrated by the EITrans. All ensembles have 20 members. The associated MRE is shown on the top left of each panel.

Fig. 6. The spread-skill correlation diagram at six locations for the AnEn, the calibrated AnEn, the persistence ensembles, and the calibrated persistence ensembles. Ensemble spread is shown on the horizontal axis and the RMSE of the ensemble mean is shown on the vertical axis. A diagonal line is shown on each panel for reference to perfect correlation.

The Missing Rate Error (MRE) is shown on the top left of each panel of Figure 5 [30]. It is calculated as the fraction of observations higher/lower than the highest-/lowest-ranked prediction. A positive/negative MRE usually indicates under-/over-dispersion in the ensemble predictions. A perfect MRE would be zero. The calculation is as follows:

MRE = f_1 + f_m − 2 / (m + 1),    (3)

where f_1 is the relative frequency of the first bin, indicating the frequency of observations lying outside to the left of the ensemble; f_m is the relative frequency of the last bin, indicating the frequency of observations lying outside to the right of the ensemble; and m is the number of ensemble members.

The AnEn typically shows an over-dispersive ensemble spread, indicated by the convex shape of rank histograms and the slightly negative MRE across locations. After the EITrans calibration, most locations, except for Miami, FL, show improvement in both the rank histogram diagrams and the magnitude of the MRE. The EITrans achieves the best performance at Denver, CO, as indicated by the near-zero MRE and a visually flat rank histogram. The calibration over other locations typically improves the rank histogram by flattening the curve and reducing the magnitude of the MRE, but introduces a small low bias, represented by the higher bins to the right of the rank histograms. This is a typical issue with the AnEn technique for modeling extreme values and can be potentially solved with a bias correction technique [31] or by using a machine learning similarity metric for weather analogs [32].

The persistence ensembles are shown to be under-dispersive, with "U"-shaped rank histograms and positive MRE. The majority of bins are flat but the tails are visibly too short (small spread), having higher bins at both ends of the rank histograms, compared to the observed distribution. This is a more difficult situation for the EITrans to calibrate because, on one hand, the EITrans seeks to enlarge the spread, and on the other hand, it needs to maintain the already flat rank histogram. The EITrans generally improves the forecast distribution at Denver, New York City, and Sacramento, showing relatively flat rank histograms and closer-to-zero MRE values. The slight over-dispersion at Chicago and State College can be caused by a large k_i (Equation (1)), such that the ensemble spread has been overly stretched. When k_i is too large for the two tails, correction values applied at the tails tend to over-predict the ensemble spread, therefore causing potentially over-dispersive calibrated ensembles.

Results show that the EITrans performs poorly at Miami for both the AnEn and the persistence ensembles. Rather than having issues with ensemble dispersion, the AnEn has a slight low bias. The EITrans is not able to calibrate for this bias but rather changes the characteristics of the ensembles from being over-dispersive to under-dispersive (the MRE from -0.009 to 0.014).
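As with the rank histogram sketch earlier, the MRE of Equation (3) is straightforward to compute from histogram counts; the helper below is a minimal, hypothetical Python illustration rather than code from the paper.

def missing_rate_error(rank_counts):
    # rank_counts : length m + 1 list/array of rank-histogram counts
    m = len(rank_counts) - 1              # number of ensemble members
    total = sum(rank_counts)
    f_first = rank_counts[0] / total      # observations below the lowest member
    f_last = rank_counts[-1] / total      # observations above the highest member
    return f_first + f_last - 2.0 / (m + 1)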

Therefore, the EITrans is able to detect a potential over-dispersion, but it does not perform as well when the dispersion issue is not the major cause of a distribution mismatch.

The persistence ensembles at Miami, after calibration, are shown to be over-dispersive. The EITrans again detects the under-dispersion issue of the persistence ensembles and seeks to increase the overall spread of the ensemble. The persistence ensembles, however, misrepresent the ranked members at the middle of the distribution, resulting in too many observations falling between the bins at the middle.

Figure 6 shows the spread-skill correlation at six locations for the four types of ensembles. Ensemble spread is shown on the horizontal axis and the RMSE of the ensemble mean is shown on the vertical axis. Spread-skill correlation quantifies the reliability of ensembles. A high correlation between the ensemble spread and the ensemble predictive skill is desired so that the ensemble uncertainty can be approximated without using the observations, but only using the ensemble spread.

The AnEn ensembles are to the left of the persistence ensembles on the correlation diagrams because the AnEn typically has a lower prediction error and a smaller ensemble spread. This result is expected because persistence ensembles always assume the same pattern of atmospheric evolution and use the immediate historical observations for predictions. The AnEn is superior because, instead of relying on a continuous time series, it selects observations based on similar weather patterns and operational weather forecasts. Similar forecasts tend to have similar errors, and therefore, by using the observations, these errors can be effectively corrected.

In Figure 6, the AnEn (green) lies below the diagonal line, meaning the ensemble spread is larger than the RMSE, being over-dispersive. This is consistent with the negative MRE and the convex shape of rank histograms from Figure 5.
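A spread-skill point such as those plotted in Figures 6 and 7 can be derived directly from the ensembles and observations; the sketch below is a minimal Python/NumPy illustration with hypothetical array names, not the verification code used in the study.

import numpy as np

def spread_skill(ensemble, obs):
    # ensemble : array of shape (n_forecasts, n_members) for one location and FLT
    # obs      : array of shape (n_forecasts,)
    spread = ensemble.std(axis=1, ddof=1).mean()                  # mean ensemble spread
    rmse = np.sqrt(np.mean((ensemble.mean(axis=1) - obs) ** 2))   # RMSE of ensemble mean
    return spread, rmse

Points below the diagonal (spread larger than RMSE) indicate over-dispersion; points above it indicate under-dispersion.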

The calibrated AnEn (brown) moves the AnEn to the left, reducing the ensemble spread and bringing the correlation line closer to the diagonal line.

Results are similar for the persistence ensembles. The persistence (purple) lies above the diagonal line, meaning the ensemble spread is smaller than the RMSE, being under-dispersive. The EITrans seeks to calibrate the ensembles by increasing the ensemble spread, as indicated by the horizontal shift from left (persistence) to right (calibrated). Similar problems can be observed in Chicago and State College, where the increment of spread seems to be too large, so that the correlation line has already been moved over the diagonal line, causing over-dispersion.

Fig. 7. The RMSE (green) of the ensemble mean and the ensemble spread (orange) are shown by FLTs at six locations. Mean difference is calculated by taking the average of the RMSE minus the ensemble spread across all FLTs.

Figure 7 shows the RMSE (aquamarine) and the ensemble spread (orange) by FLT at six locations. Having the two lines overlapped would demonstrate a high correlation between the ensemble spread and the predictive skill. The AnEn ensembles, in general, have a good correlation but with a small offset between the ensemble spread and the RMSE. The offset also tends to vary at different FLTs and locations. Typically, the RMSE is smaller than the ensemble spread up until 40 hours into the future (Chicago, New York City, Sacramento, and State College). After 40 hours, the RMSE increases to a level similar to the ensemble spread. Therefore, most of the improvement from the EITrans can be observed for the first 40 hours, indicated by the compressed ensemble spread and the overlapping lines.

Improvement of the EITrans on the persistence ensembles is visibly more prominent. Throughout all FLTs, the persistence has a high predictive error (aquamarine) but a relatively small ensemble spread (orange), causing under-dispersion. The EITrans successfully approximates the difference and increases the ensemble spread to match the predictive error (Chicago, Denver, Miami, and Sacramento). In New York City and State College, the correction appears to be too aggressive, so that the ensemble spread becomes larger than desired. This can, again, be related to the determination of the slope parameter, k_i, in Equation (1).

V. CONCLUSIONS

This work proposes an ensemble calibration method, EITrans, that seeks to improve the reliability of ensemble forecasts. By characterizing ensemble dispersion from the most similar historical forecasts, a set of correction values can be calculated for each of the ranked members and then applied to ensemble forecasts. If historical ensembles tend to be under-dispersive, the correction values will range from a negative value to a positive value, moving ensemble members away from the distribution mean; if historical ensembles tend to be over-dispersive, the correction values will range from a positive value to a negative value, moving ensemble members closer to the distribution mean. The EITrans can also be applied to NWP ensembles without changes in the methodology and procedures.

The EITrans has been tested with six locations across the CONUS using a dataset spanning from January 1, 2017, to July 31, 2018. Two types of forecast ensembles have first been generated from the NAM forecasts, then calibrated with the EITrans. The reason for choosing these two types of ensembles is that the AnEn is over-dispersive, while persistence is under-dispersive. These two scenarios serve as a well-rounded test for the EITrans on ensembles with different characteristics. The EITrans performs well on both ensembles in most of the places by flattening the rank histogram and increasing the spread-skill correlation. It has been found that the EITrans can be used to specifically calibrate ensembles with distribution mismatch issues, e.g., being over-/under-dispersive. It does not, however, perform as well for a biased distribution. The EITrans is not a replacement for an existing bias correction technique.

Finally, the algorithm and an R implementation of the EITrans technique can be found at https://weiming-hu.github.io/EITrans/. Future directions on ensemble calibration and the EITrans would include testing the technique on other dynamical or statistical ensembles and over a larger spatial domain. Specific attention should be directed to calibrating ensembles that are under-dispersive. Under-dispersion is a more common problem for ensemble forecasts than over-dispersion. An under-dispersive forecast ensemble is usually associated with the risk of missing extreme events because the ground truth might lie outside of the scope of all members. Therefore, its calibration is critical to the weather and climate modeling communities.

REFERENCES

[1] T. M. Hopson. Assessing the Ensemble Spread–Error Relationship. Monthly Weather Review, 142(3):1125–1142, 2013. ISSN 0027-0644. doi: 10.1175/mwr-d-12-00111.1.
[2] Jeffrey S. Whitaker and Andrew F. Loughe. The Relationship between Ensemble Spread and Ensemble Mean Skill. Monthly Weather Review, 126(12):3292–3302, dec 1998. ISSN 0027-0644. doi: 10.1175/1520-0493(1998)126<3292:TRBESA>2.0.CO;2.
[3] Daniel S. Wilks. Statistical Methods in the Atmospheric Sciences, volume 100. Academic Press, 2011.
[4] Bruce A. Veenhuis. Spread calibration of ensemble MOS forecasts. Monthly Weather Review, 141(7):2467–2482, 2013.
[5] Sue Ellen Haupt, Pedro A. Jiménez, Jared A. Lee, and Branko Kosović. Principles of meteorology and numerical weather prediction. In Renewable Energy Forecasting, pages 3–28. Elsevier, 2017.
[6] Alessandro Fanfarillo, Behrooz Roozitalab, Weiming Hu, and Guido Cervone. Probabilistic forecasting using deep generative models. GeoInformatica, 25(1):127–147, 2021.
[7] Julia Slingo and Tim Palmer. Uncertainty in weather and climate prediction. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 369(1956):4751–4767, 2011.
[8] Mark S. Roulston and Leonard A. Smith. Combining dynamical and statistical ensembles. Tellus A: Dynamic Meteorology and Oceanography, 55(1):16–30, 2003.

[9] Guido Cervone, Laura Clemente-Harding, Stefano Alessandrini, and Luca Delle Monache. Short-term photovoltaic power forecasting using Artificial Neural Networks and an Analog Ensemble. Renewable Energy, 108:274–286, aug 2017. ISSN 18790682. doi: 10.1016/j.renene.2017.02.052.
[10] Weiming Hu, Guido Cervone, Laura Clemente-Harding, and Martina Calovi. Parallel Analog Ensemble – the power of weather analogs. NCAR Technical Notes NCAR/TN-564+PROC, page 1, 2020.
[11] Jeffrey L. Anderson. A method for producing and evaluating probabilistic forecasts from ensemble model integrations. Journal of Climate, 9(7):1518–1530, 1996.
[12] Thomas M. Hamill. Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12(4):736–741, 1997. ISSN 08828156. doi: 10.1175/1520-0434(1997)012<0736:RDFMPF>2.0.CO;2.
[13] Guillem Candille and O. Talagrand. Evaluation of probabilistic prediction systems for a scalar variable. Quarterly Journal of the Royal Meteorological Society, 131(609):2131–2150, 2005. ISSN 00359009. doi: 10.1256/qj.04.71.
[14] Geoff Bohling. Introduction to Geostatistics and Variogram Analysis. Earth, pages 1–20, October 2005.
[15] John E. Angus. Probability integral transform and related results. SIAM Review, 36(4):652–654, dec 1994. ISSN 00361445. doi: 10.1137/1036146.
[16] Thomas M. Hamill. Interpretation of Rank Histograms for Verifying Ensemble Forecasts. Monthly Weather Review, 129(3):550–560, 2001. ISSN 0027-0644. doi: 10.1175/1520-0493(2001)129<0550:IORHFV>2.0.CO;2.
[17] Unidata, University Corporation for Atmospheric Research; National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce; and European Centre for Medium-Range Weather Forecasts. Historical Unidata Internet Data Distribution (IDD) Gridded Model Data, 2003. URL http://rda.ucar.edu/datasets/ds335.0/.
[18] National Centers for Environmental Prediction, National Weather Service, NOAA, U.S. Department of Commerce. NCEP North American Mesoscale (NAM) 12 km Analysis, 2015. URL https://doi.org/10.5065/G4RC-1N91.
[19] Computational and Information Systems Laboratory. Cheyenne: HPE/SGI ICE XA System (University Community Computing). Boulder, CO: National Center for Atmospheric Research, 2019.
[20] Luca Delle Monache, F. Anthony Eckel, Daran L. Rife, Badrinath Nagarajan, and Keith Searight. Probabilistic Weather Prediction with an Analog Ensemble. Monthly Weather Review, 141(10):3498–3516, oct 2013. ISSN 0027-0644. doi: 10.1175/mwr-d-12-00281.1.
[21] Weiming Hu and Guido Cervone. Dynamically Optimized Unstructured Grid (DOUG) for Analog Ensemble of numerical weather predictions using evolutionary algorithms. Computers and Geosciences, 133:104299, 2019. ISSN 00983004. doi: 10.1016/j.cageo.2019.07.003.
[22] Emilie Vanvyve, Luca Delle Monache, Andrew J. Monaghan, and James O. Pinto. Wind resource estimates with an analog ensemble approach. Renewable Energy, 74:761–773, feb 2015. ISSN 09601481. doi: 10.1016/j.renene.2014.08.060.
[23] S. Alessandrini, L. Delle Monache, S. Sperati, and J. N. Nissen. A novel application of an analog ensemble for short-term wind power forecasting. Renewable Energy, 76:768–781, apr 2015. ISSN 18790682. doi: 10.1016/j.renene.2014.11.061.
[24] Stefano Alessandrini, Luca Delle Monache, Christopher M. Rozoff, and William E. Lewis. Probabilistic Prediction of Tropical Cyclone Intensity with an Analog Ensemble. Monthly Weather Review, 146(6):1723–1744, 2018. ISSN 0027-0644. doi: 10.1175/mwr-d-17-0314.1.
[25] Simone Sperati, Stefano Alessandrini, and Luca Delle Monache. Gridded probabilistic weather forecasts with an analog ensemble. Quarterly Journal of the Royal Meteorological Society, 143(708):2874–2885, oct 2017. ISSN 1477870X. doi: 10.1002/qj.3137.
[26] Maria E. B. Frediani, Thomas M. Hopson, Joshua P. Hacker, Emmanouil N. Anagnostou, Luca Delle Monache, and Francois Vandenberghe. Object-Based Analog Forecasts for Surface Wind Speed. Monthly Weather Review, 145(12):5083–5102, 2017. ISSN 0027-0644. doi: 10.1175/mwr-d-17-0012.1.
[27] Laura Clemente-Harding. Extension of the Analog Ensemble Technique to the Spatial Domain. PhD thesis, Pennsylvania State University, 2019.
[28] Simone Sperati, Stefano Alessandrini, and Luca Delle Monache. An application of the ECMWF Ensemble Prediction System for short-term solar power forecasting. Solar Energy, 133:437–450, 2016. ISSN 0038092X. doi: 10.1016/j.solener.2016.04.016.
[29] Hans Hersbach. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570, 2000.
[30] S. Alessandrini, L. Delle Monache, S. Sperati, and G. Cervone. An analog ensemble for short-term probabilistic solar power forecast. Applied Energy, 157:95–110, nov 2015. ISSN 03062619. doi: 10.1016/j.apenergy.2015.08.011.
[31] Stefano Alessandrini, Simone Sperati, and Luca Delle Monache. Improving the analog ensemble wind speed forecasts for rare events. Monthly Weather Review, 147(7):2677–2692, 2019.
[32] Weiming Hu, Guido Cervone, George Young, and Luca Delle Monache. Weather analogs with a machine learning similarity metric for renewable resource forecasting. arXiv preprint arXiv:2103.04530, 2021.

Expanding Impact Metrics Contexts With Software Citation*

Keith E. Maull
NCAR, NCAR Library
Boulder, CO USA
[email protected]
(ORCID: 0000-0002-3459-5810)

Matt Mayernik
NCAR, NCAR Library
Boulder, CO USA
[email protected]
(ORCID: 0000-0002-4122-0910)

Abstract—The development of impact metrics is an important tool for scientific research organizations and universities to understand the scope and scale of their scholarly output. Traditionally in academic contexts, "impact" has been measured by the numbers of papers published, along with citation counts for those papers. But research organizations like NCAR produce much more than just publications. In the past decade, it has been widely acknowledged that a primary input to much research, scientific data, should be lifted to a first-class citable resource alongside publications. The uptake of data citation has grown robustly in the past 5 years, and efforts to continue to improve data citation are ongoing.

Today, a parallel effort is rapidly evolving to bring citing scientific software resources into parity with data and publications. For some time, software has been acknowledged as fundamental to the engine of science, yet citation strategies and practices have yet to achieve widespread adoption. Much like data citation, practicing researchers need to understand community norms and be given guidance to embrace software citation practices. Last year at NCAR, the Library set about to develop and implement a software citation recommendation, and with the help of community feedback and leadership support, this recommendation was publicly deployed.

This paper will explore the development and deployment of this recommendation, from the data-driven approach used to understand the existing public software repositories of NCAR, to a survey to understand the needs of the community, to the application of best practices for minting persistent identifiers and DOIs to NCAR-maintained software. We share our experiences and challenges implementing an organization-wide software citation recommendation and how both data and software citation will bear on the future of impact metrics.

Index Terms—software citation, bibliometrics, scientometrics, research impact metrics

I. INTRODUCTION

The development of impact metrics for scholarly work has been built largely on peer-reviewed publications. Over many decades, publishers and third party platforms have grown to provide an impressive array of bibliometrics about these publications, which have expanded beyond mere citation count metrics. These bibliometrics now include metrics such as the specific details of authorship and institutional collaborations, the breadth of research topic areas and the scope of journal utilization. While these and other bibliometrics provide important data to assess the impact of contribution, because the focus is squarely aimed at peer-reviewed publications, they lack any connection to other important inputs and outputs of scholarly activity, including data and software.

It has been acknowledged in the past decade or so that the contribution of data and software to the scientific enterprise has been greatly underrepresented [2]. Consequently, important efforts have been undertaken to lift these assets to first-class citizens so that their contribution to the progress of scientific research does not continue to go unnoticed. Several strategies have been critical to these efforts, chief among which have been (a) to educate and provide broad resources about the importance of data and software, (b) to provide infrastructure, tools and techniques for standardizing data and software citation and communication, and (c) to develop the necessary mechanisms for measuring the research utilization of software and data. It will be noted that without (b) and (c) the probability of success will be low, since without a standard mechanism for consistently creating, discovering and resolving citations, measurement is not possible and would remain a high effort, low reward activity. Thus, success must necessarily include (a), since communities without an understanding of the value of data and software citation will not take interest, undermining any progress.

We now are at an important turning point as there is growing acceptance across the entire scientific enterprise, including scientific communities, universities and research institutions, that data and software assets are necessary to more comprehensively measure research impact. Still lacking are the necessary guidelines, recommendations and best practices to usher in this new era. This paper describes such an effort at the National Center for Atmospheric Research (NCAR), with the focus on developing a flexible organizational software citation strategy rooted in community-driven contribution. First, we discuss the backdrop of scholarly impact metrics, including the history of digital object identifiers (DOIs) and the important efforts in data and software citation over the past decade. Subsequently, we describe the organizational software citation strategy implemented at NCAR, from the initial motivation to understand the needs of the scientific software community within the organization, to developing, implementing and publishing a concrete recommendation for the community to use. Finally, we describe the challenges, opportunities and next steps for the NCAR community, organization and broader metrics impact efforts within NCAR.

II. SCHOLARLY IMPACT METRICS

For many decades, the cornerstone of scholarly impact assessment has been the peer-reviewed publication. Since the inception of scholarly writing, citing and acknowledging the contributions of other researchers has been crucial to the scientific enterprise, not only for appropriate attribution, but for establishing the basis for how a body of research developed. How often such attribution occurs promptly leads to citation counting and the genesis of bibliometrics. Though the first formal explorations into metrics for scholarly works can be traced to around the first quarter of the 1900s [3]–[7], the entire field of bibliometric science did not evolve in earnest until the 1960s [8] with the confluence of several factors: the rise of library sciences as an important research discipline, the expansive commercialization of scientific journals, the formal development of information storage, retrieval and statistical analysis techniques, and the bewildering evolution of digital computing technology. The first widely used citation index, the Web of Science, served as the foundation for much of the development of the field for the first few decades [9].

It was already acknowledged that publications were fundamental to research impact, and that measuring those publications would lead to important insights not only about the strength of a publication as judged by its citation counts, but also by the strength of the publications citing it. The field of bibliometrics has grown dramatically since its early days into a hugely important tool for measuring the scholarly impact of the research enterprise – universities judge faculty by the number of papers they publish, as well as the number of citations they have accumulated; research grant decisions are often made on the basis of prior research contributions to a field, as judged by the breadth and depth of a researcher's publications, as well as the reputation of the journals in which they appear; collaboration partnerships are evaluated on the scope and intensity of the publications resulting from collaborating institutions and authors; and so on.

A. Persistent Identifiers

For some years, citation tracking was done by a variety of mechanisms, including manual text matching validation of citations, publisher-specific tracking identifiers or other non-standard and often inconsistent identifiers which had low or uneven adoption across the research enterprise. As the number of publications dramatically increased and with the prominent rise of the Internet and World Wide Web, a standard, consistent and reliable system for identifying publications, and indeed any digital object that might appear on the Internet, became increasingly more important. The need for such identifiers was formally identified as early as 1994 [10], but after the creation of the International DOI Foundation [11] in 1998, progress toward systematic implementation of persistent identifiers was rapid, and the first DOI resolution applications were launched in 2000, with academic publications dominating the initial uptake of DOIs. The DOI system became an ISO standard in 2012 (ISO 26324:2012) and today over 230 million DOIs have been assigned.

Though DOIs are intended to provide "actionable, interoperable [and] persistent" links to any entity (physical, digital or abstract), academic publications enjoyed the earliest benefits of the system because of the zeal among publishers to support interoperability and rights management, as well as to exploit the mature and robust metadata schemas already developed for such objects. Consequently, the DOI provided innumerable benefits that supported the growth of complex bibliometrics and research impact analyses that expanded into large scale publication tracking, authorship attribution and collaboration network analysis. The inceptive days of DOI minting were not exclusive to publications, but uptake was very slow for other non-publication digital objects in the research enterprise, especially for datasets.

B. Dataset and Software Citation

Data is now acknowledged as a significant and necessary contributor to evaluating research impact, yet minting DOIs for datasets did not gain widespread attention until 2009, when DataCite was founded to facilitate creating, finding, citing, connecting and using research by (a) minting research output DOIs, (b) promoting and developing best practices, services and metadata schemas, and (c) tracking research outputs. DataCite has seen an increase of over 2000% in the number of DOIs minted over the past decade, and in the same amount of time the number of journal publishers that have formal data sharing and citation policies has proliferated [12], with a corresponding growth in the number of cited dataset DOIs [1].

Datasets are not the only digital objects supported by DataCite, and they acknowledge software as "a critical part of modern research . . . [that has] little support across the scholarly ecosystem for its acknowledgement and citation." Important efforts to develop formal approaches to software citation are already in place [2] and frameworks such as the FAIR principles [13] have garnered the support and recognition necessary to lift software to first-class citizenship in the research enterprise. Furthermore, platforms like Zenodo [14] have eliminated the friction between writing scientific software and obtaining a persistent identifier (DOI) by building a seamless bridge to software repositories hosted in Github that integrates the practice of minting DOIs for research software, large or small. By further encouraging researchers to use the Zenodo platform to link software associated with scholarly publications alongside relevant datasets, a clearer path toward open reproducibility is emerging. Barriers that once existed between a publication and its corresponding inputs (and outputs) are slowly being reduced, making way for greater transparency and traceability within and across research disciplines.

C. Toward Data and Software Metrics

The growth of dataset citation has encouraged the development of metrics to track citation counts and associated publication networks, and the benefits are becoming clearer to dataset providers and individual scholars whose primary outputs are data — the growth of formal citation through persistent identifiers (DOIs) provides a way towards proper measurement of the impact of their data.

Data and software are often strongly coupled within research since software often requires specific datasets as inputs, and datasets often require specific software to elicit their value. Unsurprisingly, they share similar issues when demonstrating their benefits within the scope of impact metrics (arguably, some of the same issues also plague publication citation). First, there is often no way to demonstrate the context of a citation, even where a persistent identifier is used. A citation does not prove use; it merely provides an associative link between the citing and the cited. Additionally, bibliometrics do not contextualize the class of the citation — whether the work is being cited as an affirmation, refutation, extension, challenge or other context. This makes metrics based on citation counts a challenge that requires the usual caveats and appropriate disclaimers when publicised. Second, there are domain factors that are important when measuring data and software citation. For example, for some disciplines such as the atmospheric sciences, model input and output data are both important, and it is appropriate and expected to track input data as well as output data. Finally, datasets are frequently adjusted, updated or otherwise appended, which requires version tracking, a problem that complicates metrics since it is not fully established how to track and count attribution to the top level or any intermediate datasets through a chain of derived datasets. A similar dilemma exists with software versioning, since version information is not only an important prerequisite to the correct execution environment, but also to correct attribution of parent versions.

Regardless of the issues that emerge because of the nascence and immaturity of current software and dataset metrics, it is clear DOIs play a distinct role in tracking and attribution measurement. DOIs also demonstrate that standardization has innumerable benefits for the growth of bibliometrics beyond citation tracking, and they are an important feature of expanding scientific impact well beyond publications.

A number of lingering questions still remain: How do we get to the place where we have robust citation data to build good metrics upon? What are the necessary preconditions to build confidence in those metrics? What role does community education and training play in promoting best practices among researchers so that citation practices are robust and consistent? The remainder of this paper will explore these questions specifically within the scope of software citation, realizing data citation shares similar issues and would benefit from any solutions that emerge from software citation (the inverse of which is also true). We will describe and expand on a specific effort within the National Center for Atmospheric Research to develop and implement an organization-wide software citation strategy. We methodically explore the state of software citation within the organization, the community perceptions about and behaviors around citation, the motivations and benefits of a citation policy, and the evidence supporting the need for a formally publicised software citation recommendation. Through this work we demonstrate one path towards an organization-centric strategy and implementation of software citation with two parallel goals of first improving the uptake of software citation, alongside the expansion of metrics capabilities to demonstrate the value of that uptake as an important component of impact metrics.

III. SOFTWARE LANDSCAPE OF NCAR

To arrive at a broad understanding of the software citation practices within NCAR, our attention focused on the software repositories maintained by the organization for the expressed purpose of providing a centralized platform for sharing and disseminating software. While there are a number of internal repositories, many of which are not public, Github contains the bulk of public NCAR repositories of software in active development for wider public and community benefit. Initiated in 2013, the NCAR organization Github repository now houses over 600 public open source repositories, and an unknown (but probably large) number of private repositories. We note that while many repositories were newly created after the inception of the Github organization, a number of projects that had a mature and wide software base with extensive and established user communities were ported to the platform from prior software management systems. Though many of the repositories contain tightly focused research software packages or libraries under active community development, a large number of the repositories represent software of narrowly specialized or personal interest, and some are experimental, exploratory, incomplete, empty or even abandoned.

Our initial aim was to characterize all public Github repositories as of April 1, 2020 under the NCAR organization (https://github.com/NCAR) to find evidence of existing patterns of software citation. We automated this by analyzing the repository-root README.md text of all of the public repositories for DOIs to explore the hypothesis that a software repository that either cites relevant DOIs or promotes its own DOI would do so in an expected and prominent location. In addition to README.md files, we explored the explicit usage of licenses within Github via license metadata associated with the repository directly from within Github. Such license information is an important feature of repository initialization that signals how end users may modify, transmit and otherwise use the software. Further, this license information may also be important in the context of citation, since there are reasoned arguments that license information could (or even should) be transmitted within a citation, but most certainly it is integral to the complete software citation metadata record.

The initial analysis revealed that only 14 repositories (3.3% of the total analyzed repository set) exposed or otherwise mentioned a DOI. Of those 14, only 5 (35.7%) included explicit attribution information in the same README.md file. The projects that had attribution information were also some of the more mature software projects within the organization, including some with large user bases such as WRF-Python, NCAR Command Language (NCL) and the Intermediate Complexity Atmospheric Research (ICAR) model.
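The repository characterization described above can be reproduced with a short script against the public GitHub REST API. The sketch below is not the authors' actual pipeline; it is a minimal Python illustration, and the DOI regular expression, authentication handling and output format are assumptions made here.

import base64
import re
import requests

API = "https://api.github.com"
ORG = "NCAR"
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def org_repos(session):
    # Yield the organization's public repositories, page by page.
    page = 1
    while True:
        r = session.get(f"{API}/orgs/{ORG}/repos",
                        params={"type": "public", "per_page": 100, "page": page})
        r.raise_for_status()
        batch = r.json()
        if not batch:
            return
        yield from batch
        page += 1

def readme_dois(session, repo_name):
    # Return DOI-like strings found in the repository-root README, if any.
    r = session.get(f"{API}/repos/{ORG}/{repo_name}/readme")
    if r.status_code == 404:          # repository has no README
        return []
    r.raise_for_status()
    text = base64.b64decode(r.json()["content"]).decode("utf-8", errors="ignore")
    return DOI_RE.findall(text)

if __name__ == "__main__":
    # An authenticated session (e.g., a GITHUB_TOKEN header) is needed in practice
    # to stay within API rate limits for several hundred repositories.
    with requests.Session() as s:
        for repo in org_repos(s):
            license_id = (repo.get("license") or {}).get("spdx_id")
            dois = readme_dois(s, repo["name"])
            if dois or not license_id:
                print(repo["name"], license_id, dois)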

Explicit license metadata was missing from 235 (55%) of the repositories. A basic citation count analysis was performed against the 14 DOIs using the DataCite metrics API. The analysis revealed that most of the DOIs had low citation counts, but when measured against the age of the DOI, unsurprisingly, DOIs minted further in the past had higher citation counts than those that were more recently minted.
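A citation count lookup of this kind can be made against the DataCite REST API. The minimal Python sketch below is an illustration only; it assumes the citationCount and registered attributes returned by that API and uses a hypothetical list of harvested DOIs.

import requests

DATACITE = "https://api.datacite.org/dois/"

def citation_count(doi):
    # Fetch citation metadata for a single DOI from the DataCite REST API.
    r = requests.get(DATACITE + doi)
    r.raise_for_status()
    attrs = r.json()["data"]["attributes"]
    # Counts only reflect citations known to DataCite.
    return attrs.get("citationCount", 0), attrs.get("registered")

# Hypothetical usage over the DOIs harvested from the repository scan:
# for doi in harvested_dois:
#     count, registered = citation_count(doi)
#     print(doi, registered, count)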
do the recommendation was crafted to emphasize four key areas: community members understand where they can learn about (a) demonstrating the benefits of minting DOIs for software, software citation or its importance?). The goal of the sur- (b) understanding how to mint a software DOI (or who to vey was to identify behavioral motivators for, and barriers contact to mint one), (c) crafting clear citation language and impeding, adoption of formal software citation by NCAR (d) finding appropriate resources for selecting an appropriate software developers. This behavioral inquiry was necessary software license.

26 D. Publish Recommendation The open-ended responses revealed a few areas of con- Once completed, the survey recommendation was publi- fusion about the use of DOIs specifically to cite software. cised at https://ncar.github.io/software-citation, and an orga- A few responses indicated that DOIs were of dubious value nizational announcement was posted along with a resource for software, specifically when compared to their use for link directly from the NCAR Library, where a wide array of publications. This clearly demonstrated the misunderstanding resources exist for public retrieval. of the DOI platform, its origins, intent and broad use cases. It further highlighted the respondents’ devaluation of the V. SURVEY DETAILS AND OUTCOMES DOI as an important and necessary tool in tracking and citation counting, which would confer equivalent benefits in The survey was administered in Google Forms and sent to a tracking and counting software uses within publications using sample consisting of the contributors of the most popular and both software and data. This was elaborated further in other active 10% of Github repositories in the NCAR organization responses that suggested “software citations don’t seem to (approximately 40 repositories and 53 unique email addresses). have a place yet” in professional review, promotion and tenure The survey contained 15 questions across the four dimensions processes, which thus far have relied largely on books and peer listed in section IV.B. The response rate was 45% (N = 53), reviewed journal publications likely tracked through DOIs and with a high number of respondents (33%) also leaving open ISBNs. Similarly, other responses suggested directly citing ended write-in responses and additional comments about soft- software and software repositories were undesirable when ware citation. Those responses revealed a number of useful compared to citing the “refereed journal articles that describe data points that were used to inform the recommendation [the software].” This, of course, might be a more appropriate strategy and focus. Questions varied by dimension, but a way to cite software if indeed such journal articles describing sampling is given here: policies: End users are aware of the software existed. For a variety of reasons, a considerable my (formal or informal) citation or acknowledgement policy.; amount of research software is never described in journal tools: I have used Zenodo for creating a persistent publications, often simply because there is no time to do so, or identifier (DOI) for my software releases. ; licenses: How in many cases because creators are not aware of venues such as do end users find the license of your software? (Choose The Journal of Open Source Software [19] to quickly publish all that apply); community behaviors: How much are your such articles, which will also have a DOI minted for it. None software feature enhancements driven by groups external to of these cases, however, obviate the need for a software DOI to NCAR/UCAR? be minted and perceived as a first class citable research object Some of the important results of the survey are highlighted — linked directly to the source code repository or to a journal below: article. Lacking from some of these responses was a realization • 91.6% of respondents strongly or somewhat agreed that that in the future, the integrity of reproducible scientific results software citation was important; is lost if references to source code and data are unavailable. 
• 79.2% strongly or somewhat agreed that a DOI or other One respondent accurately captured this sentiment by stating persistent identifier for software was important; that such citation “would provide a needed and significant step • 50% of respondents did not have a formal citation policy in supporting repeatability for published science that relies for their software development project, while 25% were on software” and that without published and discoverable uncertain whether they had a citation policy or not; versions of software-based analyses, repetition of results is • 73.9% of respondents placed their software license in a “nearly impossible.” Fortunately, the DOI platform is explicitly LICENSE file at the root of their Github repository. designed to facilitate preserving the integrity of research, and The survey results reinforced much of the anecdotal infor- works best if all important research inputs can be appreciated mation we had amassed before the formal survey. Specifically, equally within the DOI ecosystem. that most of the community understood the importance of software citation, but the lack of direct guidance prevented VI.RECOMMENDATION more deliberate action toward publishing their own formal Using the survey results as the focal point for the soft- citation instructions. One contributor to this behavior might ware citation recommendation, an initial recommendation was have been the inconsistent and incomplete software citation drafted. It was refined after a round of iterative feedback input, guidance at the organizational level. Another contributor to where important enhancements were made, especially those this might have been the absence of any formal guidelines for that included concrete examples and templates. The recom- creating repositories under the Github organization. Perhaps if mendation was then designed as a web accessible resource there were guidelines at repository creation time (or specific and deployed onto an organizational website as mentioned in templates, which are now becoming more common within the section IV.D. It was intended to be easy to navigate with clear, Github ecosystem), then a creator might have the necessary concise language and instructions. prompts to make a better decision about the kind of reposi- The recommendation was broken into three major areas to tory being created, allowing them to understand options that help the community (1) mint a DOI, (2) provide clear citation might encourage or remind them that for certain repositories, instructions to software end users, and (3) select an appropriate software citation would be recommended. software license. These areas were designed to be accessed

27 within a single link to reduce confusion and disorientation, included in an academic publication. Fig 3 shows the example and to facilitate rapid access to the most relevant information page and a concrete example of an acceptable citation is in the recommendation. provided: Unidata, (2012): Integrated Data Viewer (IDV) Ver- A. Minting a DOI sion 3.1 [software]. Boulder, CO: UCAR/Unidata. Perhaps the most prominent and important feature of the (http://doi.org/10.5065/d6rn35xm) recommendation is for DOI minting. DOI minting is per- Supplemental to the citation language recommendation is formed directly through the NCAR Library, and clear instruc- the suggested use of a software DOI badge directly in the tions are provided on who to contact and the requirements for README.md file at the root of the repository. Badges are a properly minting the DOI (see Fig 2). Since DOIs must have simple and effective way to provide a quick visual indicator a permanently resolvable URL, instructions are provided to that the software has a DOI, what that DOI is and where that advise end users about an appropriate location for resolution DOI resolves (which may, but is not required to be the very where the software can be obtained, such as the Github repos- page the user finds the badge). DOI badges are easy to produce itory or a landing page for the software (including historical with the popular, shields.io service and with a single line versions) might be found. While Github.com is owned and of markdown can be integrated directly in the README.md. operated by Microsoft, and is generally not advised as a long- term preservation-quality repository, NCAR does ongoing local back-ups of all repositories within the NCAR Github account. Thus we are confident that any change to the Github platform in the future would not result in lost software or DOIs that fail to resolve to the correct software project. Furthermore, and as a requirement of DOI management, the user is advised to the minimal metadata required to properly mint the DOI which includes the creator, software title, publisher and year. Zenodo is also included as a secondary option for those who may need to take advantage of its advanced integration with Github releases and automatic management of release version DOIs. Fig. 3. Screenshot of citation instructions recommendation webpage
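As an illustration of the badge idea described above, a repository's README.md could carry a single markdown line of roughly the following form (shown here with the IDV DOI from the citation example; the exact badge URL is illustrative, since shields.io supports several equivalent ways to compose a static badge):

[![DOI](https://img.shields.io/badge/DOI-10.5065%2Fd6rn35xm-blue)](https://doi.org/10.5065/d6rn35xm)

Rendered in the repository root, such a badge gives an immediate visual cue that a DOI exists and links the reader to the resolver.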

C. Selecting a License The final and perhaps most open-ended aspect of the recom- mendation is the license selection as shown in Fig 4. While NCAR does not have an organization-wide license require- ment, three open source licenses are recommended by the organization: Apache-2.0, BSD3-Clause and MIT. Software creators are encouraged to seek additional information and education about all the licenses that exist, both open source and non-open source, to make an informed decision about the most appropriate license for their software that encompasses Fig. 2. Screenshot of DOI minting instructions recommendation webpage the needs of the software creators, funders, stakeholders and end users. B. Providing User Citation Instructions Fig. 4. Screenshot of license selection recommendation webpage The second most important feature of the recommenda- tion is the citation instruction information. In this part of the recommendation, which builds on the software citation guidelines from the Earth Science Information Partners (ESIP) [20], guidance is provided to promote clear and obvious placement of software citation information, as well as example attribution language. The recommendation informs the end users that the citation should minimally include the name of the software package, the version, creator and DOI. While the recommendation does not enforce any particular citation style, such styling should be considered within the context of the required style of the target journal, when the citation is

28 VII.CONCLUSIONSAND NEXT STEPS citations for display in the NCAR Institutional Repository, OpenSky. Software citation is a growing area of interest for impact Software citation metrics is emerging as an important fea- metrics, and in this paper we have presented a strategy for ture of scholarly impact and in due time, as uptake of creating developing an organizational software recommendation. It is persistent identifiers and robust citing behaviors take root, the clear citation is one of the most effective methods to improve full benefit of metrics will be realized and, like data, software the breadth of understanding around where and how software is will become fully recognized for its contribution to scientific being used. Encouraging the discipline of appropriately citing research and the broader scientific enterprise. software within academic publications is one of the primary tools we have to move toward the software metrics that expose the full contribution of software’s impact on research. VIII.ACKNOWLEDGEMENTS In developing this strategy several critical lessons were In addition to thanking the respondents to our survey, we learned. First, background understanding and evaluation of would like to thank Sara Byrd for her invaluable contribution the landscape of software and existing citation practices is in refining, developing and deploying the survey. paramount. Appreciable insights were gained by developing automated mining tools to understand the existing software packages through the lens of organizational Github reposito- REFERENCES ries. Furthermore, details of the practice of citation among those repositories provided the empirical evidence necessary [1] N. Robinson-Garcia, P. Mongeon, W. Jeng, and R. Costas, “DataCite as a novel bibliometric source: Coverage, strengths to inform survey development and target gaps in the data. and limitations,” Journal of Informetrics, vol. 11, no. 3, pp. As a side effect, this evaluation unveiled a comprehensive 841–854, 2017. view of the breadth and depth of open source software [2] A. M. Smith, D. S. Katz, and K. E. Niemeyer, “Software repositories within the organization — which will be valuable citation principles,” PeerJ Computer Science, vol. 2, p. e86, to administrative and development groups within the wider 2016. NCAR community. Second, through community survey instru- [3] F. B. F. Campbell, “The Theory of the National and Interna- ments, our insights deepened around the value, knowledge, tional Bibliography: With Special Reference to the Introduc- tion of System in the Record of Modern Literature. London: perceptions and attitudes of and about software citation. We Library Bureau. Cited by: Hood, William W.; & Wilson, learned about the challenge of how to promote the benefits Conception S.(2001). The Literature of Bibliometrics,” Sci- of software citation through DOIs and to refocus on the entometrics, and Informetrics. Scientometrics, vol. 52, no. 2, traceability value proposition, just as such traceability is seen pp. 291–314, 1896. as an essential benefit of publication DOIs. The strategy of [4] J. M. Cattell, “Statistics of American psychologists,” The surveying members of the top 10% most active and popular American Journal of Psychology, vol. 14, no. 3/4, pp. 310– repositories requires further exploration, since the unusually 328, 1903. high survey response rate could be an anomaly, an indication [5] E. W. Hulme, “Statistical bibliography in relation to the growth of modern civilization,” 1923. 
of high interest, unexpected demand for software citation, [6] A. J. Lotka, “The frequency distribution of scientific produc- a survey population bias, a legitimately effective sampling tivity,” Journal of the Washington academy of sciences, vol. strategy for yielding high responses or some other undiscov- 16, no. 12, pp. 317–323, 1926. ered phenomenon. Finally, we learned that a low complexity, [7] S. C. Bradford, “Sources of information on specific subjects,” direct recommendation that can be quickly and easily revised Engineering, vol. 137, pp. 85–86, 1934. promotes faster development, deployment and more robust [8] F. Osareh, “Bibliometrics, citation analysis and co-citation user feedback. analysis: A review of literature I,” Libri, vol. 46, no. 3, pp. The next steps of this work involve a review of the current 149–158, 1996. recommendation at the one year point in the lifespan (early [9] E. Garfield, “Citation Analysis as a Tool in Journal Evalua- summer 2021). That review will include data on how many tion,” Science, vol. 178, no. 4060, pp. 471–479, 1972. new software DOIs have been minted since the inception of [10] L. A. ;. D. Davidson, “Digital Object Identifiers: Promise and Journal of Electronic the recommendation, the number of new repositories that have Problems for Scholarly Publishing,” Publishing, vol. 4, no. 2, Dec. 1998. been added to the organizational Github repository (with and [11] N. Paskin, “Toward unique identifiers,” Proceedings of the without DOIs), as well any relevant change to software citation IEEE, vol. 87, no. 7, pp. 1208–1227, 1999. best practices that may need to be considered for integration. [12] N. A. Vasilevsky, J. Minnier, M. A. Haendel, and R. E. The review will also include a refresh of the DOI citation Champieux, “Reproducible and reusable research: Are jour- counts and will culminate in a new short survey to understand nal data sharing policies meeting the mark?” PeerJ, vol. 5, changes to the use of the recommendation, as well as any p. e3208, 2017. changes in the perceptions of software citation and the general [13] M. D. Wilkinson, M. Dumontier, Ij. J. Aalbersberg, G. use of DOIs for software. In addition to concrete changes to Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, the recommendation, a formalized workflow for software cita- L. B. da Silva Santos, and P. E. Bourne, “The FAIR Guiding Principles for scientific data management and stewardship,” tion tracking will be considered, as well as integrating software Scientific data, vol. 3, no. 1, pp. 1–9, 2016.

29 [14] I. Peters, P. Kraker, E. Lex, C. Gumpenberger, and J. I. [18] Y. AlNoamany and J. A. Borghi, “Towards computational re- Gorraiz, “Zenodo in the spotlight of traditional and new producibility: Researcher perspectives on the use and sharing metrics,” Frontiers in Research Metrics and Analytics, vol. of software,” PeerJ Computer Science, vol. 4, p. e163, 2018. 2, p. 13, 2017. [19] A. M. Smith, K. E. Niemeyer, D. S. Katz, L. A. Barba, [15] L. Soito and L. J. Hwang, “Citations for software: Providing G. Githinji, M. Gymrek, K. D. Huff, C. R. Madan, A. C. identification, access and recognition for research software,” Mayes, and K. M. Moerman, “Journal of Open Source Soft- 2017. ware (JOSS): Design and first-year review,” PeerJ Computer [16] D. S. Katz and N. P. C. Hong, “Software citation in theory Science, vol. 4, p. e147, 2018. and practice,” in International Congress on Mathematical [20] J. Hausman, S. Stall, J. Gallagher, and M. Wu, “Software Software, 2018, pp. 289–296. and Services Citation Guidelines and Examples,” pp. 100947 [17] R. C. Jimenez,´ M. Kuzak, M. Alhamdoosh, M. Barker, B. Bytes, 2019. Batut, M. Borg, S. Capella-Gutierrez, N. C. Hong, M. Cook, and M. Corpas, “Four simple recommendations to encourage best practices in research software,” F1000Research, vol. 6, 2017.

A Portable Framework for Multidimensional Spectral-like Transforms At Scale

Dmitry Pekurovsky
San Diego Supercomputer Center
University of California at San Diego
La Jolla, California, USA
[email protected]

Abstract— We report progress in an ongoing effort to develop issue on high-end HPC systems today and, as we approach the a versatile and portable software framework for computing muti- Exascale, this limitation is only going to become worse and will dimensional spectral-like transforms at large scale. The design become a significant bottleneck for many applications. A covers Fast Fourier Transforms and other algorithms that can be number of past-decade implementations of Multi-dimensional broken down into line operations. This class of algorithms covers (M-D) FFT aim to optimize performance at large scale, utilizing a wide range of scientific applications and are notoriously challenging to scale on largest supercomputers. Another challenge strategies such as two-dimensional (2D) decomposition [4-11]. addressed by this project is the fast pace of change in the field of While this was a significant step forward compared to one- High Performance Computing, with new systems and paradigms dimensional (1D) decomposition, now there is a greater need appearing every few years, demanding great adaptability and than ever to work on evolving M-D FFT algorithms and effort on behalf of software developers. To this end we have software to keep pace with the system evolution and application developed a flexible software framework as an open source requirements. package named P3DFFT++. It is written in C++ in a highly object- Fourier Transforms come in many flavors. For example, oriented fashion, with interfaces for Fortran and C. The goal is to there are real-to-complex, complex-to-complex, and real-to- shield the user from details of low-level mechanisms of real (such as sine/cosine) transforms that are commonly used in communication and computation by providing a portable high- level API for commonly used algorithms. The framework will scientific and engineering applications. In addition, there are incorporate many modern HPC programming features, such as algorithms sufficiently similar to FFT. They can be placed in GPU implementation, overlapping communication with the same category, which we call spectral transforms. Some computation and GPU data transfer, as well as algorithm examples of these include wavelets and high-order finite autotuning. We cover design choices of the package and early difference schemes – in fact, anything that involves heavy use results. of all elements of a given dimension in a 3D array, one dimension at a time. We can target all such transforms in a Keywords—Fast Fourier Transforms, Parallel Programming, single package, without loss of generality or performance. Scientific Computation, Numerical Algorithms, Open Source Software Many existing spectral transforms implementations have a rigid or specialized user interface, limiting their usability. Past I. INTRODUCTION packages (such as P3DFFT, the predecessor of P3DFFT++ in terms of key ideas) have been created in response to the largest demand at the time, namely Direct Numerical Simulations of Fast Fourier Transforms (FFTs) is a ubiquitous algorithm in turbulence (DNST) field [13-27], where distinct applications computational science and engineering, second only perhaps to typically have a fairly consistent set of data structures. These linear algebra in terms of the universality of impact. 
Codes from packages have focused on achieving high performance, while a variety of disciplines rely on FFTs, often through the use of staying within the narrow range of data structures and problems third-party libraries, to simulate a wide range of phenomena. of a typical DNST application. The natural next step of This paper deals with a challenging case of multidimensional evolution might encompass other possible use scenarios, (3D and 4D) FFTs computed repeatedly during simulations including data structures, feature sets and opportunities for running on the medium to high end of High Performance application-specific optimizations. This is where we perceive Computing (HPC) architectures, in terms of size and power. the need for a universal approach that would allow a high This is the typical way they are used for many applications, degree of flexibility in terms of features and usage, at the same including (but not limited to) Direct Numerical Simulations of time maximizing scalable performance by incorporating turbulence, simulations of the ocean and atmosphere, lessons learned from earlier work on M-D FFT packages. This astrophysics, acoustics, seismic simulations, material science, is precisely what the new framework P3DFFT++ aims to medical imaging and molecular dynamics. accomplish. It provides an adaptable, portable The challenge of FFTs at large scale is well-studied and has implementation of M-D FFT and related spectral-like to do with dependence on the system’s bisection bandwidth, algorithms as an open source package. This is reminiscent of as well as on-node memory bandwidth [1-3]. This is a known

31 the role FFTW [28] has played for spectral transforms in the past, and the way packages like BLAS and LAPACK have been the standards for interface and implementation in the field of linear algebra computations. P3DFFT++ provides both a software package and a universal API for using M-D FFT and related spectral-like transforms. It allows for user’s choice of data layout in terms of memory storage and processor space decomposition. It supplies an abstraction level that hides optimization from the user, thus making it easily adaptable to a number of architectures. P3DFFT++ is built in a modular manner, providing opportunities for expansion, for example by adding more transform types in addition to FFTs (e.g. sine/cosine, wavelets, high-order finite difference schemes etc). It also aims to streamline calculations of constructs that rely on FFTs (such as derivatives and convolutions), in order to improve performance and maximize usability. In short, P3DFFT++ is designed to be Figure 1. 3D FFT implementation with 2D decomposition typically involves a a universal toolbox for a broad range of uses of spectral sequence of 1D transforms in X, Y and Z dimensions, interspersed with two sub- transforms, while maintaining competitive performance at communicator all-to-all exchanges. scale. transposes. The end result consists of data distributed as a Z- In this paper we start by defining the problem in the context pencil. It is assumed that after performing necessary operations of previous work in the field. We proceed to describe design in Fourier space (such as taking derivatives, convolution etc) a elements of P3DFFT++, then to demonstrate its performance user might want to do an inverse transform from Z- to X- and ease of use. pencils. Thus only X-pencils are supported in physical space and Z pencils in the Fourier space, out of many other potentially useful configurations. In contrast, as we shall see below, II. PREVIOUS WORK AND THE SCOPE OF THE PROJECT P3DFFT++ envisions many other possible choices for data Throughout this paper we refer to spectral-like transforms layout. In addition, the way 1D FFTs and the transposes are when we mention any multi-dimensional transform algorithm combined in P3DFFT is fixed and the user has no control in the on a structured grid that has the following properties: process. Finally, P3DFFT provides only real-to-complex transforms, as well as a few related real-to-real transforms such 1. It can be reduced to a sequence of 1D transforms for an entire as sine and cosine. The implementation of the algorithm is fixed array, one for each dimension, independent of other dimensions. and thus not adaptable to multiple architectures. 2. Each such 1D transform is compute and memory-bandwidth While performance of a package like P3DFFT on a suitable intensive. In terms of data decomposition, it is best to have all problem and hardware may be impressive (see Fig. 2), even such data in that dimension to reside locally in memory for each solutions will not be sufficient as we step into exascale core/task. The reason for the above is avoidance of heavy computing, in view of both problem and architectures communication that would be necessary to exchange data adaptability. Exascale computing is very much a moving target, between the stages of the 1D algorithm if the data were not all in terms of both hardware and software paradigms. Therefore local. one hopes for a framework portable enough to avoid rewriting scientific software every 1-2 years. 
Clearly, all flavors of 3D FFT and sine/cosine transforms fall under this category. In addition, high-order finite difference A number of other 3D FFT solutions have been published, schemes and wavelets can also be considered spectral-like both as open source third-party libraries [5-12] and as parts of transforms and supported by P3DFFT++ without loss of proprietary codes [29-35]. Although a thorough review of 3D generality. FFT packages is out of the scope of this paper, it is fair to say that most existing solutions implement an approach similar to As mentioned in the introduction, P3DFFT package [4] has P3DFFT, with some variations of features and specialization for been an important cornerstone in the evolution of this software certain use cases and platforms. Among CPU libraries, there is niche. P3DFFT was written in Fortran90 and MPI/OpenMP, a number of open source libraries implementing 2D encapsulating a well-performing formulation of spectral decomposition, with various other useful features. For example, algorithms suitable for extreme scale computation. In particular, PFFT [9] and OpenFFT [8] support 4D FFTs and 3D P3DFFT implements 2D decomposition in processor space, decomposition. AccFFT [7], FFTE [6] and heFFTe [12] provide which allows in principle to scale a N3 problem up to N2 tasks. GPU implementation. PFFT, OpenFFT and heFFTe include P3DFFT follows the most common sense and efficient path of elements of autotuning. NB3DFFT [10] and P3DFFT [4] computing 3D FFT (for details see [4] and Fig. 1): starting with support pruned transforms and overlap of computation with data distributed as pencils local in X dimension, we do 1D FFT communication. None of the existing implementations, to the in X, Y and Z dimensions, in turn over the entire array, author’s knowledge, provide flexible data layout options. interspersed with two each of local and inter-processor

32 2048^3 1024^3 4096^3 8192^3 Ideal

[Figure 2 plot area: wall-clock time in seconds versus number of cores (roughly 256 to 524,288) for the problem sizes listed above, with an ideal-scaling reference line.]

Figure 2. Strong scaling of P3DFFT on Mira (IBM BlueGene/Q at Argonne National Lab). The abundance of packages may be daunting for a new user, level language and has a flexible interface, its performance is on especially without a clear information contrasting them. It takes par with that of existing packages like P3DFFT, for cases that substantial time and commitment to thoroughly evaluate ten or both can handle. Continued work is aimed at achieving even so libraries and compare their performance to make an informed higher performance by utilizing modern optimizations with choice. Once committed to a library, the user is unlikely to potential to make a difference at exascale level. switch. Therefore, they may be missing useful features and/or performance and in some cases may not even realize it. In addition to the low-level functions (“building blocks” mentioned above), P3DFFT++ provides high level 3D (and, in P3DFFT++ aims to combine most of the useful features the future, higher dimensional) transform functions, both for mentioned above under a “one roof” approach. In addition, it convenience and optimization. In particular, an autotuning expands the context of a spectral transform in modern framework is going to be used for the planning stage, choosing computing by giving more choice about what the user can do. the best execution path for a given combination of platform and Using C++ object-oriented features, it encapsulates many problem. The framework also includes utilities for derivative options in a convenient interface. The goal is to have enough and convolution calculations. flexibility to cover as many as possible use cases, platforms and languages. In this way it is different from most of the existing libraries mentioned above, with possible exception of heFFTe III. P3DFFT++ FRAMEWORK AND DESIGN [12]. The latter is a very recent work with goals seemingly overlapping those of P3DFFT++. A detailed comparison between the two packages goes outside the scope of this paper. P3DFFT++ is implemented in C++ and uses MPI for inter- processor communication. 1D transforms are delegated to P3DFFT++ building blocks are 1D transforms, local standard libraries such as MKL[36], FFTW[28], ESSL[37], transposes and interprocessor transposes. By combining these CUFFT[38], or alternatively to a user-defined implementation. blocks in any desired way the user has a high degree of control The package includes C++, C and Fortran interfaces, and over the execution of higher-dimensional transform. The user documented through examples, tutorials and reference pages. can do any combination of 1D transforms and transposes, going The home page for this package is all the way to 3D FFT. This framework is extremely flexible and http://www.p3dfft.net. The package is released can be customized, for example, for cases such as de-aliasing in through github.com with an open source license. computational fluid dynamics, where only parts of the 3D spectrum are needed for the computation, and the rest can be P3DFFT++ uses an object-oriented design to encapsulate discarded, with resulting savings in compute time. P3DFFT++ various data structures and transforms into classes and class can be thought of as a FFTW-like standard for multidimensional templates, providing a clear and concise interface for the user. It spectral transforms on pre-exascale and exascale machines. presently supports four datatypes: single-precision real and complex, and double-precision real and complex. 
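To make the building-block structure concrete, the sketch below illustrates, in plain C++ with MPI and FFTW rather than the P3DFFT++ API itself, one stage of the pattern described in Section II and Fig. 1: a batch of 1D FFTs over the locally stored dimension, followed by an all-to-all exchange within a row sub-communicator so that the next dimension becomes local. The function and variable names and the divisibility assumption are illustrative only, and the packing/unpacking needed to form the next pencil layout is elided.

// Sketch only -- not the P3DFFT++ API.  One stage of a 2D-decomposed 3D FFT:
// batched 1D complex FFTs along the locally stored X dimension, followed by
// an all-to-all exchange within a row sub-communicator.  Assumes nx is
// divisible by the row size p.
#include <mpi.h>
#include <fftw3.h>

void fft_x_then_exchange(fftw_complex* data, int nx, int ny_loc, int nz_loc,
                         MPI_Comm row_comm)
{
    int p;
    MPI_Comm_size(row_comm, &p);
    const int nlines = ny_loc * nz_loc;        // independent 1D transforms

    // Batched 1D FFTs of length nx, unit stride, lines spaced nx elements apart.
    fftw_plan plan = fftw_plan_many_dft(1, &nx, nlines,
                                        data, nullptr, 1, nx,
                                        data, nullptr, 1, nx,
                                        FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);
    fftw_destroy_plan(plan);

    // All-to-all exchange within the row sub-communicator.  A real
    // implementation first packs the buffer so that the nx/p portion destined
    // for each task is contiguous; packing is omitted here, so this call only
    // indicates the volume and pattern of the communication.
    fftw_complex* recv = fftw_alloc_complex((size_t)nlines * nx);
    int count = (nx / p) * nlines * 2;         // 2 doubles per complex element
    MPI_Alltoall(data, count, MPI_DOUBLE, recv, count, MPI_DOUBLE, row_comm);
    // ... unpack 'recv' into the pencil layout expected by the next stage ...
    fftw_free(recv);
}

In P3DFFT++ these two operations correspond to the 1D transform and interprocessor transpose building blocks introduced above, which the library encapsulates and combines on the user's behalf.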
Most classes In designing P3DFFT++, special attention is paid to are defined as C++ templates in terms of input and output minimizing expensive operations, such as inter-processor datatypes, for example: communication, local memory transposes and other operations leading to cache misses. Even though P3DFFT++ uses a higher-

33 template A Z-pencil could be defined by dmap[]={1,2,0}, or class transform3D; alternatively, {2,1,0}. Both of these choices map Z onto the P3DFFT++ borrows the idea of transform planners from local processor grid dimension, in our definition. Z-pencils are FFTW. Namely, each transform (be it 1D or 3D) has a planner commonly used in applications involving pseudospectral function (usually contained in a C++ class constructor) that gets solution of Navier-Stokes and Poisson equations, where the called once when the transform is initialized, and contains any workflow goes from physical to spectral and then back to setup arrangements for execution (as well as possibly trial physical space. Assigning an X-pencil data layout to physical execution runs within an autotuning framework). Once a space and Z-pencil to spectral space is the most economical way, transform has been “planned”, it can be executed multiple times since it saves us from having to do extra inter-processor in a fast call. Using C++ classes is a convenient way to transposes in case we were to bring the data back to X-pencil encapsulate all the information and functions referring to a shape. The algorithm in the spectral space typically does not transform. involve any data shuffling between processors, and can thus be easily implemented in Z-pencils, taking into account that the Z A. Data layout descriptors dimension is local. 1) Processor Grid mapping 2) Memory storage mapping

As mentioned above, P3DFFT++ is intended to support very The above description describes the size and location of each general data layout, both in terms of grid decomposition and local portion of the grid in the global grid, as well as assigns each memory arrangement. We begin with discussion of 3D grids, MPI task its own position in the multidimensional processor with generalization for 4D straightforward. A 3D data grid can grid. The local portion of the grid for each task has dimensions be mapped onto a 1D, 2D or 3D processor grid. (Most 3D FFT easily computed as Li = Gi/PDi, where Gi is the global implementations use 1D or 2D decomposition, which is well- grid dimensions. Next consider how a three-dimensional array suited for the algorithm as it preserves at least one entirely local is stored in memory. The simplest case would be to store the dimension. However, some applications (for example those array simply following the logical dimensions ordering, namely from the field of Molecular Dynamics) have inherent 3D first dimension (X) is stored with stride-1 access, followed by Y decomposition, and it is necessary for a package like P3DFFT++ and Z (this is sometimes called Fortran, or row-major, storage, to deal with such cases, so this layout choice will be included in although the name is misleading, as it can be used both in future versions). Fortran and C). However, more generally, each logical dimension i (range 0 through 2) can be stored as rank M in the Processor grid is constructed via MPI Cartesian i communicators. To provide the user the most choice, processor mapping of the 3D local array onto the one-dimensional RAM grid is constructed in the most general way. Let PDi (i=0,1,2) of the node. Each Mi can have values from 0 to 2, with 0 being be dimensions of the processor grid. (In case of 2D the stride-1 access dimension and 2 being the largest-stride decomposition one of these will be equal to 1, and in case of 1D dimension. A basic ”Fortran” ordering is M[]={0,1,2}. decomposition two of these will be equal to 1.) Our convention Other examples of storage mapping are M[]={1,2,0}, or is such that i=0 dimension implies the fastest-changing index, M[]={2,1,0}. In both of the the latter examples, the logical i.e. adjacent MPI tasks, i=1 dimension has stride of PD0, and i=2 Z dimension of the array is stored first (stride-1). Such choice is dimension has stride PD0 x PD1 (“Fortran” ordering). Such a good fit for a Z-pencil decomposition, though by no means topological choices may in some cases be embedded in the required. application calling P3DFFT++. In other cases choosing a The reason we include a generalized memory ordering is to different topological mapping may potentially make a give the user more choice in defining how their data is stored. performance difference, depending on the topology of the The thinking here is two-fold: hardware. 1. The calling application may have its own storage scheme, Having defined the processor grid, there are multiple ways and the library can match those without the user having to the user’s data grid can be mapped onto it, and these can be change the code. defined by three integers DMAP[i]. 2. If starting from scratch, the user may want to choose Let us give an example of how a user may define an X- mappings that lead to optimal performance (such as Z- pencil, a common data layout pattern with 2D decomposition, pencil examples above). 
In many cases it might be with the X logical dimension local and Y and Z dimensions worthwhile to run a few simple numerical experiments to decomposed and mapped onto a 2D processor grid P1 x P2. The determine the best mapping, since a lot depends on the processor grid is defined by PD[] = {1,P1,P2}. Now we problem definition and system characteristics. map the data grid dimensions onto processor grid: dmap[] = The above descriptions of data decomposition in MPI space {0,1,2}. This maps the X logical dimension onto the first and local storage scheme are enough to avoid ambiguities in data P (local) processor grid dimension, the Y dimension onto 1-sized layout representation. Every word in memory location for every subcommunicator of adjacent MPI tasks, and finally, Z MPI rank is assigned its place in the global data and processor dimension onto a P2-sized subcommunicator with slowest- grids, and vice versa. The original 3D array is assumed to be changing MPI task index. contiguous, although in the future it is possible to expand the

34 description to include non-contiguous arrays, such as subarrays dimensions, and buffer sizes for the all-to-all exchange. The embedded in larger arrays. exchange itself may be implemented via MPI_Alltoall, or an alternative method such as pairwise exchange. Autotuning B. Base classes mechanism is planned in future versions to help establish the The ProcGrid class describes processor grid, as detailed best performing option for a given platform and problem. above. The DataGrid class includes all information about data layout for a given variable. It includes global and local (per MPI 3. Another class combines interprocessor transpose with task) grid dimensions G and L , the mapping of the data grid 1D transform, providing opportunities for optimization by i i minimizing memory access. onto the processor grid, and finally, the local memory storage ordering Mi. In short, this class describes all relevant aspects of These classes form the backbone of P3DFFT++. An data layout, and is used as metadata for array variables. For arbitrary sequence of these three types of stage classes defines example, when defining a 3D FFT, the DataGrid objects for an execution path for a multidimensional transform (with the input and output arrays may have the same global grid limitation that exactly three 1D transforms are included, for dimensions but different local dimensions, distribution among example, in the case of 3D transform). A user can arrange them MPI tasks and local storage layout. Different DataGrid in any manner (as long as the data descriptors are consistent objects can use the same ProcGrid. The DataGrid class is between the consecutive stages). oblivious to datatype of the data array, as that information is C. Higher-level classes encoded in the class templates for transform classes. P3DFFT++ also provides a higher-level class transform3D, P3DFFT++ defines a number of the most common spectral which combines individual stages needed for a given 3D 1D transform types, such as complex-to-complex FFT, real-to- transform in an optimized fashion. This class takes as input the complex FFT, cosine transform etc. This list includes an empty metadata descriptors (DataGrid objects) for input and output transform, implying simply a copy from input to output, as a arrays, as well as the three transform types to be used in X, Y convenience feature. Each 1D transform type is defined as a and Z dimensions. Constructor for this class includes planning class containing basic information such as the size of the of the 3D transform algorithm, including necessary calls to 1D transform N, number of transforms in a bunch m, data types for transform planners. Since there are multiple execution paths input and output, a pointer to the planner for this transform (such possible for a given 3D transform, this class also includes an as fftw_plan_dft_many in FFTW), and a pointer to the autotuning framework (in a preliminary version at the time of execution function (such as fftw_execute_dft). writing) to choose the best possible execution path (this is described in more detail below). The execution member function A general multidimensional transform is defined as a executes the path that the autotuner found to be the best. In sequence of 1D transforms of suitable types as well as local and addition, a derivative execution function is provided, in order to interprocessor transposes. In P3DFFT++ this is expressed as a combine the spectral transform with derivative calculation in the linked list of classes of type stage. 
Each stage can be of the Fourier space (such as multiplication by the suitable following three varieties (programmed through derived classes wavenumber). in C++): Future work will include more higher-order classes, for 1. The 1D transform class includes necessary information example a 4D transform, more derivative options (e.g. to execute 1D transform for a 3D array, including the size of the Laplacian, divergence, curl) and convolutions. transform, number of elements needing to be transformed independently, strides, and the transform type. The number of elements m can control whether we transform just one line IV. PERFORMANCE CONSIDERATIONS (m=1), one plane (m=L2, dimension of local array in the plane NL2) or the entire volume (m=L2L3). Notice that in contrast to the transform type class, which only describes the type of While some might expect a loss in performance of transform and is oblivious of the data, this class includes all P3DFFT++ due to the use of C++, as well as expanded feature relevant information about the data, such as DataGrid set, this loss turns out to be negligible. According to its design, metadata descriptors. The constructor for this class includes a the overhead from higher-level C++ features is small compared call to planning functions (for example if FFTW is used for 1D to the bulk of the computation, which is done either by FFTs, then a call to fftw_plan_dft_many or a similar specialized libraries such as FFTW, or a C-style code in critical planner is included). An execution member function executes portions of the software. In addition, ongoing work includes a the 1D transform that has been planned by the constructor and GPU interface. can be called multiple times. This class also includes local A. Interprocessor communication memory transposes, called independently or in conjunction with the transform to optimize memory access. Interprocessor communication is the main bottleneck for performance of spectral algorithms at large scale. It involves all- 2. The interprocessor transpose implements an exchange to-all exchanges, done repeatedly, either within the global equivalent to MPI_Alltoall in a Cartesian communicator, or within Cartesian sub-communicators. In subcommunicator, including all the needed packing and either case, this is an expensive operation with high data volume, unpacking, as well as local memory transposes. Its constructor that tends to stress the system interconnect’s bisection initializes fields such as the MPI communicator handle and bandwidth. As the size of high-end HPC architectures grows,

35 bisection bandwidth is typically not growing at the same rate, so transpose the input array, then to do a transform (in the performance of all-to-all communication is likely to become an dimension of the first index j, for all i), writing results into the even bigger bottleneck as time goes on. k’s space of the output array. The transposition inherently is not P3DFFT++ employs several strategies to minimize the cache-friendly (and could be further optimized by loop blocking, impact of this problem. Firstly, MPI implementations tend to as in other parts of the code), however compared to the separate differ in their implementation of collective and point-ot-point transpose and transform operations, we save one read+write communication calls. There are several ways to achieve the equivalent of the entire array. same result, e.g. one can use MPI_Alltoall for an all-to-all for all k: exchange, or implement this through a series of pairwise exchanges. One could also consider different shapes of the for all j and i: processor grid. To facilitate this optimization, an autotuner (see temp(j,i)=In(i,j,k) Sec. IV D below) can select the best path of execution, minimizing the cost of such all-to-all exchanges. In addition, an transform1D(Temp,Out(1,1,k)) overlap of communication with computation will be C. GPU capability implemented for certain types of transforms. Earlier results [10, 39-42] suggest this can partially hide the cost of the expensive Most modern pre-exascale computers include GPUs as part all-to-all exchanges. Finally, providing the pruned transforms of system design, and therefore it is important for competitive option (where only a part of the Fourier spectrum is kept) helps software packages to make use of this capability. Since GPU reduce the volume of data in such exchanges, with proportionate technology changes rapidly and different vendors often have decrease in cost. incompatible interfaces, it is imperative for any long-term package to include an interface that is portable and general B. Memory access enough. In addition to bisection bandwidth, spectral algorithms Work on GPU interface in P3DFFT++ is ongoing. Currently typically stress local memory bandwidth on each compute node. a trial version using CUDA implementation for NVIDIA GPUs In fact, some authors predict this bottleneck may even is in place for evaluation. It uses CUFFT and CUTENSOR overshadow the bisection bandwidth limits in future systems libraries from NVIDIA for 1D FFTs and memory transposes, [1,2]. In particular, this arises in three types of situations: respectively. Ongoing work includes asynchronous transfers, 1. Executing local 1D transforms, such as FFT. which will partially overlap compute and communication time with the time of data transfer to/from the GPU. In addition, it 2. Transposing the data locally in memory between 1D will include wrapper functions that are blind to the underlying transforms. GPU programming semantics. A good candidate for this is HIP 3. Packing/unpacking send/receive buffers for an interface from AMD, which bridges AMD and NVIDIA GPU interprocessor exchange. syntaxes. P3DFFT++ design is concerned with minimizing the D. Autotuning 3D transforms number of memory reads and writes. Especially concerning are non-stride-1 reads and writes. Such patterns of access are a Consider the case of 3D transform with 2D decomposition. known source of inefficiency, due to a high number of cache Input and output arrays are defined according to DataGrid misses involved. 
Unfortunately, such operations are an integral object descriptors, as explained above. This includes both part of spectral algorithms. P3DFFT++ follows the design decomposition in processor space and a mapping to memory choice common to most MD FFT implementations, namely storage. The algorithm must go through three stages of 1D calling an established 1D transforms via established library such transform, with two or more inter-processor transposes and two as FFTW. Since these transforms are cache-intensive, for best or more local memory transposes. The course of the algorithm performance these calls are done for data arranged in stride-1 consists of an assembled sequence of basic P3DFFT++ building pattern. P3DFFT++’s task, therefore, is to rearrange the data in blocks (see different types of stage described in Sec. III B). stride-1 pattern before calling each of the 1D transforms. This is These blocks have to be assembled in a way respecting the done by reorder functions implementing the loop blocking consistency of data layout between them, i.e. the output of one method to minimize the price for cache misses. Since no stage must be the same as the input for the next one. assumptions are made about the input and output data layouts, such reordering may be needed in the beginning and in the end In general, there is more than one combination of such of the run as well. assembly paths yielding the needed output (as was observed in [43]). Finding the best path is an optimization problem, in the P3DFFT++ takes advantage of opportunities for space of intermediate steps. One approach is to minimize the optimization in transition between the three main situations number of inter-processor exchanges, followed by minimizing listed above, by combining two of them in a way maximizing local memory transposes and/or their cost. This cost is not cache reuse. Below is an example (in pseudocode) of a known beforehand. Although at present a heuristic algorithm is combined call to 1D transform with memory transpose. The used, representing the author’s human experience, this may not code in this example transposes memory ordering from (0,1,2) always be optimal, considering the variety of problems and to (1,0,2), meaning that only the first two indices are architectures. Therefore, in the future P3DFFT++ will adopt an interchanged. A temporary array for each k value is used first to autotuning framework for measuring each of the best candidate

36 P3DFFT++ 2048^3 P3DFFT 2048^3 P3DFFT++ 1024^3 P3DFFT 1024^3

4.00E+00 2.00E+00 1.00E+00 5.00E-01 2.50E-01 1.25E-01 Time Time (sec) 6.25E-02 3.13E-02 1.56E-02 256 512 1024 2048 4096 8192 16384 N cores

Figure 3. Performance comparison of P3DFFT++ and P3DFFT v. 2.7.9 on Stampede2 at TACC. Reported on the vertical axis is timing for a forward-inverse pair of real-to-complex/complex-to-real 3D FFT in seconds. assembly paths. Each path consists of a linked list of stages, and using namespace p3dfft; these paths are stored in a vector list. The autotuner goes through setup(); each path, measuring its execution time for a given number of repetitions. This is done in the planning call (constructor) of 3D transform class. The best-performing path is saved and used in the execution step. Next, we need to define our processor grid. In this case we will use 2D decomposition. E. Performance experiment int pdims[] = {1, p1, p2};

Here we compare performance of the latest CPU version of ProcGrid Pgr(pdims, MPI_COMM_WORLD); P3DFFT++ with P3DFFT v. 2.7.9. We used Stampede2 Now we can set up input and output DataGrid descriptors. platform at TACC (using Intel’s Knights Landing Nodes with For this we will need global grid dimensions, processor grid 68 cores per node, of which 64 were used, and 100 Gb/sec Intel mapping information and memory ordering map. Omni-Path network with a fat tree topology employing six core switches). We have used a pair of real-to-complex and complex- int gdims[] = {nx,ny,nz}; to-real 3D FFT, which is relevant to many applications. We int m1[] = {0,1,2}; tested grid sizes 10243 and 20483, with 2D processor decomposition. Reported numbers in Fig. 3 are the timing for int dmap1[] = {0,1,2}; the forward/inverse transform pair, with the optimal processor Construct the input DataGrid object, in this case X-pencil: grid dimensions for each case. We see comparable performance of P3DFFT++ and P3DFFT, with P3DFFT++ slightly winning DataGrid Xpencil(gdims,-1,Pgr,dmap1,m1); in most cases. Note that these results were obtained with a version of the code without the autotuner, and the latter can be Now define and construct the output grid object: expected to further improve performance. int m2[] = {2,1,0}; int dmap2[] = {1,2,0};

V. USING P3DFFT++ DataGrid Zpencil(gdims,-1,dmap2,m2); Now define which type of 3D transform we want. In this In this section we demonstrate the use of P3DFFT++ for case, all three dimensions will have complex-to-complex calculation of complex-to-complex 3D FFT. In this case, 2D forward FFT in double precision: decomposition is used. The input is in X-pencils, with the most int type_ids[3] = {CFFT_FORWARD_D, basic memory ordering, while the output is in Z-pencil, with CFFT_FORWARD_D,CFFT_FORWARD_D}; memory ordering such that Z dimension is stored with stride 1. We will use C++ code for this demonstration, but C and Fortran trans_type3D type_cft_forward(type_ids); interfaces are also available, and example programs are provided in the distribution. Now find local dimensions of the input array and allocate space for it (note that mem_order mapping is used to translate First, call P3DFFT++ initialization function setup once from logical to physical storage indices): before any use, remembering to use p3dfft namespace:

37 int sdims1[3]; complex_double *IN=new complex_double[size1]; for(i=0;i<3;i++) sdims1[m1[i]] = Xpencil.ldims[i]; Now do the same for output array: int size1 = sdims1[0]*sdims1[1]*sdims1[2]; int sdims2[3]; for(i=0;i<3;i++) Ongoing work includes integrating more performance- sdims2[m2[i]] = Zpencil.ldims[i]; improvement features, such as asynchronous GPU operations, pruned transforms and overlap of communication with int size2 = sdims2[0]*sdims2[1]*sdims2[2]; computation, in order to make the package practical for exascale platforms. 4D transforms and 3D decomposition are features of complex_double *OUT=new interest to the community and will also be incorporated into the complex_double[size2]; package. Next we construct the 3D transform, including planning and finding the best execution path through autotuning, as described ACKNOWLEDGMENT above. This work used the Extreme Science and Engineering transform3D Discovery Environment (XSEDE), which is supported by trans_f(Xpencil,Zpencil,&type_cft_forward) National Science Foundation grant number ACI-1548562. In ; particular, it has used Comet platform at San Diego Supercomputer Center/UCSD, and Stampede2 platform at Now the input can be initialized. Then we are ready to TACC/U. Texas at Austin. execute the transform, as many times as necessary. This research used resources of the Oak Ridge Leadership for(i=0;i < nRep;i++) { Computing Facility at the Oak Ridge National Laboratory, … trans_f.exec(IN,OUT); … } which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05- After P3DFFT++ is done, call cleanup() to deallocate 00OR22725. temporary variables P3DFFT++ uses: This work was supported by U.S. NSF grant OAC-1835885. cleanup(); REFERENCES Using the same testing framework, many other kinds of transforms could have been defined, such as real-to-complex, [1] K. Czechowski, C. Battaglino, C. McClanahan, K. Iyer, P.K. Yeung and cosine, sine, or a user-defined transform, in any reasonable R. Vuduc, “On the communication complexity of 3D FFTs and its combination for three dimensions. Also various alternative data implications for exascale”, Proceedings of the 26th ACM international layout options are possible simply by changing pgrid, conference on Supercomputing, pp. 205-214, June 2012. proc_ordering and mem_order. More details can be found in the [2] C. McClanahan, K. Czechowski, C. Battaglino, K. Iyer, P.K. Yeung and user guide and tutorial at http://www.p3dfft.net. R. Vuduc, “Prospects for scalable 3D FFTs on heterogeneous exascale systems”, ACMIEEE conference on supercomputing, SC. 2011 [3] H. Gahvari, and W. Gropp, “An introductory exascale feasibility study for FFTs and multiautotune”, IEEE International Symposium on Parallel & VI. CONCLUSIONS AND FUTURE WORK Distributed Processing (IPDPS), pp.1-9, April 2010. [4] D. Pekurovsky, “P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions”, SIAM Journal on Scientific We have provided a motivation for a new adaptable software Computing, 34(4), pp.C192-C209, 2012. framework for multidimensional FFTs and other spectral [5] N. Li, and S. Laizet, “2decomp & fft-a highly scalable 2d decomposition transforms. We have listed the desirable characteristics of such library and fft interface”, Cray User Group 2010 conference, pp. 1-13, framework, such as adaptability in terms of problem scope and May 2010. architecture features, extending far beyond the existing [6] D. 
VI. CONCLUSIONS AND FUTURE WORK

We have provided the motivation for a new adaptable software framework for multidimensional FFTs and other spectral transforms. We have listed the desirable characteristics of such a framework, such as adaptability in terms of problem scope and architecture features, extending far beyond the existing multidimensional FFT libraries. This list formed the basis for the creation of the open-source P3DFFT++ library package. We have provided an overview of its design choices, discussed performance features, and demonstrated the use of the package for a common 3D FFT case. P3DFFT++ (available at http://www.p3dfft.net) is written in C++ with interfaces for C and Fortran. It has been documented and tested on a variety of problems. At present the functionality of P3DFFT++ includes most features present in P3DFFT and other comparable libraries, while far surpassing them in terms of the data layout options it supports. Certain advanced features are still being developed and tested.

Performance considerations are of primary importance in this work. At this time P3DFFT++ performance is comparable to that of P3DFFT for the cases we have tested. Ongoing work includes integrating more performance-improvement features, such as asynchronous GPU operations, pruned transforms, and overlap of communication with computation, in order to make the package practical for exascale platforms. 4D transforms and 3D decomposition are features of interest to the community and will also be incorporated into the package.

ACKNOWLEDGMENT

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. In particular, it used the Comet platform at the San Diego Supercomputer Center/UCSD and the Stampede2 platform at TACC/U. Texas at Austin. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was supported by U.S. NSF grant OAC-1835885.

REFERENCES

[1] K. Czechowski, C. Battaglino, C. McClanahan, K. Iyer, P.K. Yeung and R. Vuduc, "On the communication complexity of 3D FFTs and its implications for exascale", Proceedings of the 26th ACM International Conference on Supercomputing, pp. 205-214, June 2012.
[2] C. McClanahan, K. Czechowski, C. Battaglino, K. Iyer, P.K. Yeung and R. Vuduc, "Prospects for scalable 3D FFTs on heterogeneous exascale systems", ACM/IEEE Conference on Supercomputing (SC), 2011.
[3] H. Gahvari and W. Gropp, "An introductory exascale feasibility study for FFTs and multigrid", IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1-9, April 2010.
[4] D. Pekurovsky, "P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions", SIAM Journal on Scientific Computing, 34(4), pp. C192-C209, 2012.
[5] N. Li and S. Laizet, "2DECOMP&FFT - a highly scalable 2D decomposition library and FFT interface", Cray User Group 2010 Conference, pp. 1-13, May 2010.
[6] D. Takahashi, "An implementation of parallel 3-D FFT with 2-D decomposition on a massively parallel cluster of multi-core processors", International Conference on Parallel Processing and Applied Mathematics, pp. 606-614, Springer, Berlin, Heidelberg, September 2009.
[7] A. Gholami, J. Hill, D. Malhotra and G. Biros, "AccFFT: A library for distributed-memory FFT on CPU and GPU architectures", arXiv preprint arXiv:1506.07933, 2015, unpublished.
[8] T.V.T. Duy and T. Ozaki, "Hybrid and 4-D FFT implementations of an open-source parallel FFT package OpenFFT", The Journal of Supercomputing, 72(2), pp. 391-416, 2016.
[9] M. Pippig, "An efficient and flexible parallel FFT implementation based on FFTW", Competence in High Performance Computing, pp. 125-134, Springer, Berlin, Heidelberg, 2011.
[10] J.H. Göbbert, H. Iliev, C. Ansorge and H. Pitsch, "Overlapping of communication and computation in nb3dfft for 3D fast Fourier transformations", Jülich Aachen Research Alliance (JARA) High-Performance Computing Symposium, pp. 151-159, Springer, Cham, September 2016.
[11] S. Plimpton, R. Pollock and M. Stevens, "Particle-Mesh Ewald and rRESPA for Parallel Molecular Dynamics Simulations", PPSC, March 1997.
[12] A. Ayala, S. Tomov, A. Haidar and J. Dongarra, "heFFTe: Highly efficient FFT for exascale", International Conference on Computational Science, pp. 262-275, Springer, Cham, 2020.
[13] D. Donzis, P.K. Yeung and K.R. Sreenivasan, "Dissipation and enstrophy in isotropic turbulence: resolution effects and scaling in direct numerical simulations", Physics of Fluids, 20(4), p. 045108, 2008.
[14] H. Homann, O. Kamps, R. Friedrich and R. Grauer, "Bridging from Eulerian to Lagrangian statistics in 3D hydro- and magnetohydrodynamic turbulent flows", New Journal of Physics, 11(7), p. 073020, 2009.
[15] S. Laizet, E. Lamballais and J.C. Vassilicos, "A numerical strategy to combine high-order schemes, complex geometry and parallel computing for high resolution DNS of fractal generated turbulence", Computers & Fluids, 39(3), pp. 471-484, 2010.
[16] L. Thais, A.E. Tejada-Martínez, T.B. Gatski, G. Mompean and H. Naji, "Direct and Large Eddy Numerical Simulations of Turbulent Viscoelastic Drag Reduction", Wall Turbulence: Understanding and Modeling, pp. 421-428, Springer, Dordrecht, 2011.
[17] P.K. Yeung and C.A. Moseley, "A message-passing, distributed memory parallel algorithm for direct numerical simulation of turbulence with particle tracking", Parallel Computational Fluid Dynamics 1995, pp. 473-480, 1996.
[18] T. Engels, D. Kolomenskiy, K. Schneider, F.O. Lehmann and J. Sesterhenn, "Bumblebee flight in heavy turbulence", Physical Review Letters, 116(2), p. 028103, 2016.
[19] A. Beresnyak, "Spectra of strong magnetohydrodynamic turbulence from high resolution simulations", The Astrophysical Journal Letters, 784(2), p. L20, 2014.
[20] S. Lange and F. Spanier, "Evolution of plasma turbulence excited with particle beams", Astronomy & Astrophysics, 546, p. A51, 2012.
[21] L. Arnold, C. Beetz, J. Dreher, H. Homann, C. Schwarz and R. Grauer, "Massively Parallel Simulations of Solar Flares and Plasma Turbulence", Parallel Computing: Architectures, Algorithms and Applications, Bd. 15, pp. 467-474, 2008.
[22] N. Peters, L. Wang, J.P. Mellado, J.H. Gobbert, M. Gauding, P. Schafer and M. Gampert, "Geometrical properties of small scale turbulence", Proceedings of the John von Neumann Institute for Computing NIC Symposium, Juelich, Germany, pp. 365-371, February 2010.
[23] J. Schumacher and M. Pütz, "Turbulence in Laterally Extended Systems", PARCO, pp. 585-592, 2007.
[24] P.J. Ireland, T. Vaithianathan, P.S. Sukheswalla, B. Ray and L.R. Collins, "Highly parallel particle-laden flow solver for turbulence research", Computers & Fluids, 76, pp. 170-177, 2013.
[25] P. Fede and O. Simonin, "Numerical study of the subgrid fluid turbulence effects on the statistics of heavy colliding particles", Physics of Fluids, 18(4), p. 045103, 2006.
[26] S. Banerjee and A.G. Kritsuk, "Energy transfer in compressible magnetohydrodynamic turbulence for isothermal self-gravitating fluids", Phys. Rev. E, 97(2), p. 023107, 2018.
[27] J. Bodart, L. Joly and J.B. Cazalbou, "Large scale simulation of turbulence using a hybrid spectral/finite difference solver", Parallel Computational Fluid Dynamics: Recent Advances and Future Directions, pp. 473-482, 2009.
[28] M. Frigo and S.G. Johnson, "The design and implementation of FFTW3", Proceedings of the IEEE, 93(2), pp. 216-231, 2005.
[29] E.J. Bylaska et al., "Transitioning NWChem to the Next Generation of Manycore Machines", Exascale Scientific Applications: Scalability and Performance Portability, ch. 8, 2017.
[30] A. Canning, J. Shalf, N.J. Wright, S. Anderson and M. Gajbe, "A hybrid MPI/OpenMP 3D FFT for plane wave first-principles materials science codes", Proceedings of the International Conference on Scientific Computing (CSC), p. 1, The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp), 2012.
[31] M. Gajbe, A. Canning, L.W. Wang, J. Shalf, H. Wasserman and R. Vuduc, "Auto-tuning distributed-memory 3-dimensional fast Fourier transforms on the Cray XT4", Proc. Cray User's Group (CUG) Meeting, May 2009.
[32] M. Lee, N. Malaya and R.D. Moser, "Petascale direct numerical simulation of turbulent channel flow on up to 786k cores", Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 61, September 2013.
[33] S. Song and J.K. Hollingsworth, "Computation-communication overlap and parameter auto-tuning for scalable parallel 3-D FFT", Journal of Computational Science, 14, pp. 38-50, 2016.
[34] J. Jung, C. Kobayashi, T. Imamura and Y. Sugita, "Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations", Computer Physics Communications, 200, pp. 57-65, 2016.
[35] A.G. Chatterjee, M.K. Verma, A. Kumar, R. Samtaney, B. Hadri and R. Khurram, "Scaling of a Fast Fourier Transform and a pseudo-spectral fluid solver up to 196608 cores", Journal of Parallel and Distributed Computing, 113, pp. 77-91, 2018.
[36] "Developer Reference for Intel® oneAPI Math Kernel Library", https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top.html
[37] "IBM Engineering and Scientific Subroutine Library for Linux on POWER", https://www.ibm.com/docs/en/SSFHY8_6.1/reference/essl_reference_pdf.pdf
[38] "The API reference guide for cuFFT, the CUDA Fast Fourier Transform library", https://docs.nvidia.com/cuda/cufft/index.html
[39] K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, S. Sur and D.K. Panda, "High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT", Computer Science - Research and Development, 26(3-4), p. 237, 2011.
[40] K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky and D.K. Panda, "A novel functional partitioning approach to design high-performance MPI-3 non-blocking alltoallv collective on multi-core systems", 2013 42nd International Conference on Parallel Processing (ICPP), pp. 611-620, IEEE, October 2013.
[41] H. Subramoni, A.A. Awan, K. Hamidouche, D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko and D.K. Panda, "Designing non-blocking personalized collectives with near perfect overlap for RDMA-enabled clusters", International Conference on High Performance Computing, pp. 434-453, Springer, Cham, July 2015.
[42] D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko and D.K. Panda, "Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters", High Performance Computing: 30th International Conference, ISC High Performance, Frankfurt, Germany, Proceedings, Vol. 9137, p. 434, Springer, July 2015.
[43] T.V.T. Duy and T. Ozaki, "A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs", Computer Physics Communications, 185(1), pp. 153-164, 2014.
