ON THE PHENOMENON OF UNACCOUNTED FOR GAS
Baseline formulation and error detection techniques

Thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Engineering and Physical Sciences

2021

Lubomir Botev
Supervisor: Dr. Paul Johnson
School of Mathematics

Contents

List of Tables 8

List of Figures 11

Abstract 16

Declaration 17

Copyright Statement 18

Acknowledgements 19

1 Introduction 20
1.1 Transportation ...... 22
1.2 The NTS ...... 24
1.2.1 Operating Conditions ...... 25
1.2.2 Nodes ...... 27
1.2.3 Linepack ...... 29
1.3 Grid Balancing and Introducing UAG ...... 29
1.4 Defining UAG ...... 32
1.5 Literature Review ...... 34
1.6 Scope ...... 36
1.7 Contributions ...... 38

2 Sources of Uncertainty 40
2.1 Classifying Errors ...... 42
2.1.1 Random Error ...... 42
2.1.2 Systematic Error ...... 42
2.1.3 Decomposing UAG ...... 43
2.2 Atmospheric Emissions ...... 43
2.3 Metering Errors ...... 45
2.3.1 The Measurement Chain ...... 45
2.3.2 Random Error ...... 47
2.3.3 Systematic Error ...... 49
2.3.4 A Note on Error Cases ...... 50
2.3.5 Uncertainty and Bias Drift ...... 53
2.3.6 Estimating Total Measurement Uncertainty ...... 53
2.4 Attributed Measurements ...... 63
2.4.1 Failure Modes ...... 64
2.4.2 Error Estimation at Attributable Sites ...... 65
2.5 Linepack ...... 71
2.5.1 Calculation ...... 71
2.5.2 Temperature ...... 72
2.5.3 Multiphase Flow ...... 76
2.5.4 Linepack Uncertainty ...... 76
2.6 Non-integrated Energy ...... 78
2.7 Accounting Errors ...... 79
2.8 Closeout Period ...... 80
2.8.1 The Dataset ...... 81
2.8.2 Analysis ...... 81
2.8.3 Mitigating Closeout Period Errors ...... 82
2.9 Additional Factors ...... 84
2.9.1 Non-technical Losses (NTL) ...... 84
2.9.2 Conflicting Interests ...... 85
2.9.3 Billing Cycle Discrepancies ...... 85
2.10 Analysis ...... 86
2.11 Conclusion ...... 88

3 Daily Baseline 89
3.1 Measures of UAG ...... 90
3.1.1 Expressing UAG ...... 90
3.1.2 Aggregation Frequency ...... 91
3.1.3 Aggregation Function ...... 91
3.2 NTS and UAG Statistical Analysis ...... 94
3.2.1 UAG ...... 94
3.2.2 UAG Predictors ...... 101
3.2.3 Exploratory Regression ...... 106
3.2.4 Nodes ...... 112
3.3 Baseline Model ...... 116
3.3.1 Uncertainty Based Approach ...... 116
3.3.2 Statistical Approach ...... 118
3.3.3 Aggregate Baseline Model ...... 123
3.3.4 Decision Intervals ...... 124
3.4 Model Performance Results ...... 125
3.4.1 Additional Diagnostics ...... 127
3.4.2 Baseline Calculation Methodology ...... 130
3.5 Efficacy of Baseline Method ...... 130
3.5.1 Nature of Error Identification ...... 131
3.5.2 Historic Errors ...... 131
3.5.3 Results ...... 135
3.5.4 Extreme Values ...... 136
3.5.5 Analytic Performance Estimation ...... 136
3.6 Conclusion ...... 142

4 Systematic Error Detection 143
4.1 Motivation and Case Study ...... 143
4.2 Methodology ...... 146
4.2.1 Algorithms ...... 148
4.2.2 Metrics ...... 149
4.3 Detection Problem ...... 151
4.3.1 Offline ...... 152
4.3.2 Online ...... 152
4.3.3 Reconciled UAG ...... 153
4.4 Effectiveness and Limitations ...... 155
4.4.1 Simulation Procedure ...... 156
4.4.2 Results ...... 157
4.5 Discussion ...... 159
4.5.1 Conclusion ...... 161

5 UAG-led Node Error Detection 162
5.1 Joint Energy Balancing ...... 162
5.1.1 Distribution Grids ...... 163
5.1.2 Power Stations ...... 168
5.1.3 Industrial Customers ...... 184
5.1.4 LNG Terminals ...... 184
5.1.5 Compressor Station Metering ...... 185
5.1.6 Interpretation ...... 185
5.2 Predicted vs Actual Flow Analysis ...... 187
5.2.1 Methods of Prediction ...... 187
5.2.2 Classification ...... 189
5.2.3 Statistical Prediction in Literature ...... 190
5.2.4 NG Load Forecast ...... 191
5.2.5 Interpretation ...... 192
5.2.6 Predicted UAG ...... 192
5.3 Comprehensive Error-reducing Process ...... 193
5.3.1 Baseline Contingent Error Minimisation ...... 193
5.3.2 Independent Error Minimisation ...... 194
5.3.3 Investigative threshold ...... 195
5.3.4 Discussion ...... 196
5.4 Conclusion ...... 197

6 UAGMS – Industrial Integration 198
6.1 Solution Architecture ...... 198
6.2 UAGMS Backend ...... 199
6.2.1 Data Provision ...... 199
6.2.2 Hosting ...... 201
6.3 Features ...... 202
6.3.1 UAG Monitor Tab ...... 202
6.3.2 Causality Detection Tab ...... 203
6.3.3 LDZ Weather Model Tab ...... 204
6.3.4 Changepoint Analysis Tab ...... 205
6.3.5 Reporting Tab ...... 205
6.3.6 Data Configuration Tab ...... 205
6.3.7 Help Page ...... 206
6.4 Summary and Future Development ...... 206

7 Summary and Recommendations 215
7.1 Recommendations ...... 215
7.1.1 Monitoring and Statistical Control ...... 215
7.1.2 Increasing UAG Calculation Frequency ...... 216
7.1.3 Data Confidence ...... 216
7.1.4 Standardisation of Reporting ...... 217
7.1.5 Data Sharing ...... 217
7.1.6 Cross-departmental approach to UAG management ...... 218
7.1.7 Documentation of Sources of Uncertainty ...... 218
7.1.8 Volumetric Balancing ...... 218
7.1.9 Adoption of an Independent Error Reducing Process ...... 219
7.1.10 Development of an Action Plan ...... 219
7.1.11 Linepack Modelling Evaluation ...... 219
7.1.12 Automation of the UAG Calculation ...... 219
7.2 Summary ...... 220
7.2.1 Limitations ...... 221
7.2.2 Future Research Direction ...... 221
7.2.3 Closing Remarks ...... 222

A Glossary 223

B CWV calculation 225

Bibliography 226

List of Tables

1.1 Typical gas composition ...... 26
1.2 Node type and aggregate daily flows by group, 2019. Flows are shown both as pure energy and as percentage throughput, in terms of the mean and standard deviation ...... 28
1.3 Yearly UAG and OUG ...... 32

2.1 Relevant expanded uncertainties at the 95% confidence level in key measurement chain components ...... 47
2.2 Mean daily CV, mean daily CV standard deviation for select offtakes, 2017 ...... 66
2.3 Select UK weather station soil data for 2017, from the MIDAS database. Subscript denotes soil depth in cm, for the indicated function: µ, σ, min, max denote the mean, standard deviation, minimum and maximum temperature respectively. idsrc indicates the MIDAS id ...... 75
2.4 Meter vs Data error statistics, 2011-2016 ...... 79
2.5 Summary statistics by correction group. All energy values are in GWh. Mean and Sum are based on absolute values ...... 81
2.6 UAG error sources in the NTS. M.Cap refers to the overall capacity of the error type to be modelled, and accounted for in a composite model. Impact refers to the potential impact on UAG. UAG ± indicates whether the error has a strictly positive, negative or mixed impact on UAG ...... 87

3.1 Shapiro-Wilk normality test for UAG by year, 2012-2020 ...... 99
3.2 Multivariate regression model, 2015-2020, for predictors in Section 3.2.2. Estimates, standard errors and significance levels for predictors are shown, in addition to model fit statistics ...... 109

3.3 Percentage throughput multivariate regression model, 2015-2020, for normalised predictors in Section 3.2.2. Estimates, standard errors and significance levels for predictors are shown, in addition to model fit statistics ...... 110
3.4 Regression models compared, with goodness of fit statistics ...... 110
3.5 Forecast metrics for 1-step ahead UAG forecasts, with model parameters constant. Values in GWh ...... 122
3.6 Forecast metrics for 1-step ahead UAG forecasts, with model re-fitting with each new data point. Values in GWh ...... 122
3.7 Number of days exceeding upper (+) and lower (-) decision intervals, over the period 16-02-2015 to 23-05-2018. Bias indicates the difference between percentage upper and lower interval incursions. All decision intervals are at the 95% level ...... 126
3.8 Number of days exceeding upper (+) and lower (-) decision intervals, over the period 16-02-2015 to 23-05-2018, for aggregate models. Bias indicates the difference between percentage upper and lower interval incursions. All decision intervals are at the 95% level ...... 126
3.9 Diagnostic measures for prediction intervals. Interval width and MAE are in GWh. Excess UAG is the summed quantity across the interval. Key columns used to assess the intervals are S-score, where a lower value is better, and RF, where ideally the value should be as close to 0.95 as possible ...... 129
3.10 Performance of baseline methods against historic errors, with metrics as defined in the previous section. P & R refers to predicted and reported days containing errors. The latter three columns are of chief importance, where a high IS, IES and INPV are desirable. Due to low sample size, many models have identical performance. The best model will represent a trade-off between IS, IES and INPV – in this case, the Interior trim and Weighted mean have good historical performance ...... 134

3.11 Mean detection probabilities, standard deviation of detection probabilities and EaR by node type, NTS nominal flow values, 2016-2020. We can see that increasing error sizes result in higher detection probabilities, and lower EaR values. Ideally, EaR should be as low as possible at a given error level. Node groups with larger mean flows had higher detection probabilities across the board ...... 140

4.1 Systematic errors in exit nodes in the NTS, 2011-2016. Source: Ofgem ...... 144
4.2 Offline analysis detected changepoint locations, and performance metrics ...... 152
4.3 Results of online methods. The first four columns tabulate whether the respective changepoints were detected. Astart, Bstart, Bend, and Aend denote the Aberdeen Start, Braishfield B Start, Braishfield B End, and Aberdeen End changepoints respectively. A 1 indicates a successfully identified changepoint, and a 0 the contrary ...... 154

5.1 Downstream energy flow distribution by grid size, with flow as a percentage of total LDZ demand ...... 165
5.2 Mean node detection probabilities, standard deviation of node detection probabilities and EaR for the 3 test scenarios of the baseline method, in LDZ offtake nodes ...... 167
5.3 Power stations, 2019. Type of power train is shown, alongside number of days active, mean and standard deviation of efficiency, and R2 coefficient of model fit as per Equation (5.5) ...... 171
5.4 Error detection probabilities for 5% and 10% of nominal flow, as compared to the baseline method ...... 183
5.5 Actions recommended depending on upstream and downstream system energy conservation state, respective to baseline ...... 186
5.6 Forecast metrics for NG linear regression daily LDZ energy flow prediction methodology, 2017-2019. MAE, MSE, RMSE are given in GWh ...... 191

List of Figures

1.1 New pipelines in capacity and kilometres ...... 21
1.2 Transmission and distribution grid connections and components ...... 24
1.3 UK National Transmission System ...... 25
1.4 Natural Gas Daily Volume, Kirkstead Offtake, 2019-01-08 ...... 30
1.5 NTS UAG in addition to Shrinkage terms, for the years 2016-2020. Rolling 7-day average is overlaid in black. Note the different y-axis scales ...... 33

2.1 NTS data flow from metering at node to UAG calculation ...... 41
2.2 Measurement chain ...... 46
2.3 Typical error curve vs flow rate for an ultrasonic meter. Relative uncertainty is shown on the y-axis, and flow rate on the x-axis. Qmin and Qmax denote the minimum and maximum calibrated flow ranges. Qs denotes the minimum registered flow, and Qt a nominal flow one would expect through the meter. The dashed lines indicate the meter's quoted uncertainty limit, in this example ±1% ...... 50
2.4 Examples of various random errors, and their impact on the UAG. The red line indicates the corrected flow and UAG, whilst the black line indicates the pre-reconciled amounts. Node flow is on the left-hand side, and UAG on the right ...... 52
2.5 System uncertainty surface as defined by Equation (2.15) ...... 56
2.6 UAG Uncertainty Model 1, r=0.15, 95% confidence ...... 59
2.7 UAG Uncertainty Model 1, r=0.15, plotted as a percentage of system throughput, 95% confidence. Notice the confidence interval appears to be almost constant ...... 59

2.8 95% confidence intervals as suggested by Models 1-4. 7-day rolling averages are plotted in order to improve the visualisation and allow for comparison. Models are symmetrical on the x-axis and thus only the positive half of the y-axis is shown ...... 62
2.9 Daily Calorific Value, Kirkstead, 2019-01-08 ...... 66
2.10 Mean standard error for the CV of an attributable site, NTS, based on the 11 nodes seen in Table 2.2, with 95% confidence interval for 18.55 m/s gas transit speed. Red line indicates the model fitted in Equation (2.23) ...... 69
2.11 Mean standard error for the CV of an attributable site, NTS, against a range of gas transit speeds ...... 70
2.12 2017 mean daily 9am temperature across 131 MIDAS weather stations for varying soil depths ...... 76
2.13 NTS billing errors throughout the period 2013-2018, grouped according to data system and meter errors, with 3 outliers removed ...... 80
2.14 Number of corrections ...... 82
2.15 Number of corrections ...... 83

3.1 Monthly UAG, plotted using the various aggregation functions discussed in Section 3.1.3 ...... 93
3.2 UAG Boxplots, years 2014-2019, 7 outliers removed ...... 95
3.3 Scatter plot of US yearly UAG percentage by state, against system throughput. Note that UAG is here measured in volume rather than energy, in millions of cubic feet. Data sourced from Natural Gas Annual [85] ...... 96
3.4 Greek transmission network monthly aggregate UAG in GWh ...... 97
3.5 (Partial) Autocorrelation, UAG, 2014-2018 ...... 98
3.6 Density plot of U_t and U_t^p, 2012-2018. The normal distribution is overlaid in red on both graphs. The higher peak and shorter tails are evident, as is the non-zero mean ...... 100
3.7 Normal quantile-quantile plot of U_t and U_t^p, 2012-2018. Points should theoretically lie on the red line, if normally distributed; we can see evidence of heavy tails, especially on the right-hand side for both plots. This indicates more extreme values present in the data than would be expected ...... 101

3.8 Monthly UAG and absolute UAG totals, year vs month, 2013-2020. First row is simple summation, second row is absolute sum ...... 102
3.9 UAG plotted with yearly aggregate demand and North LDZ's CWV ...... 103
3.10 FWMU for S/D bias and total throughput; proportion of network throughput by meter; FWMU by meter type, and FWMU* by meter type. 2014-2020, NTS ...... 107
3.11 LMG calculated for rolling windows of sizes 100 (R) and 200 days (L), 2015-2020. Normalised view can be seen on the bottom row ...... 113
3.12 Depiction of site flow profile for a variety of site types ...... 115
3.13 Mean daily non-zero flow from demand (right) and supply (left) nodes. Note the difference in scales, along with the exponential pattern displayed by the demand nodes' distribution ...... 116
3.14 UAG Baseline evaluation process ...... 130
3.15 UAG vs identified error size ...... 132
3.16 Error detection probabilities for 5, 10, 20 and 50% scaling errors applied to nominal node flows in the NTS, 2018. Nominal node size is represented as a % of throughput on the y-axis ...... 138
3.17 Percentage cumulative nominal network energy plotted against node detection probability, 2018 ...... 139
3.18 EaR vs uncertainty specified in the baseline model. The dashed blue line indicates levels in the NTS ...... 141

4.1 Top (4.1a): Historic UAG over 2009-2010, the period including the meter errors. Note the high UAG variance; the error is more evident when looking at the average line. Bottom (4.1b): Reconciled vs historic UAG average lines. The magnitude, along with the timing of the errors, is evident. In both plots, the highlighted periods (pink/yellow) cover the error durations ...... 145
4.2 Top (4.2a): Isolated error magnitudes for Aberdeen and Braishfield B. Bottom (4.2b): Individual integrated daily flows for the above sites. Note Aberdeen is a continuous, highly seasonal series, while Braishfield B appears much more stochastic in nature ...... 146
4.3 Top: NTS total demand, 2009-2010. Bottom: active (flowing) nodes, 2009-2010. Correlation coefficient 0.867 ...... 147

4.4 UAG against detected changepoints for both historic (left, 4.4a) and reconciled datasets (right, 4.4b) ...... 153
4.5 Simulated detection rate curve depicting the probability of detecting true positives for increasing levels of implied shock ...... 158
4.6 Smoothed GAM curve of lag against the implied shock level S on the left (4.6a), and of absolute location error against implied shock S on the right (4.6b) ...... 159

5.1 Example of downstream distribution grids in two LDZs; Scotland is seen on the left, and the South West on the right ...... 166
5.2 Typical power station daily efficiency profile. In this case, values for Rocksavage power station are plotted, with the linear model line of best fit overlaid in blue and prediction intervals shaded in grey ...... 172
5.3 Linear model for efficiency ratio, for a range of power stations. Red points indicate values removed prior to model fitting ...... 174
5.4 Top: Linear model for efficiency ratio, with 95% confidence intervals. Bottom: Efficiency vs load factor for single-unit Seabank B power station. Asymptotic curve and 95% confidence band overlaid in red. Red points correspond to the same days on both plots ...... 176
5.5 Simulated efficiency/load profile for a power station consisting of two fully independent power trains. Composite optimal efficiency curve is overlaid in black ...... 179
5.6 Efficiency vs load factor for two-unit Langage power station. Composite asymptotic curve derived from single-unit efficiency overlaid in red – poor fit in dual-turbine operation is evident ...... 180
5.7 Efficiency vs load factor for a set of power stations. All stations excluding Seabank B and Epping Green are of the combined power train variety ...... 181
5.8 Flowchart indicating control flow of a baseline contingent error minimisation process ...... 194
5.9 Flowchart indicating control flow of an independent node-flow error minimisation process ...... 195

6.1 Ideal scenario...... 200

6.2 Proposed data flow ...... 200
6.3 UAGMS: UAG Monitor Tab ...... 207
6.4 UAGMS: LDZ Weather Model Tab ...... 208
6.5 UAGMS: Causality Detection Tab ...... 209
6.6 UAGMS: Changepoint Analysis Tab ...... 210
6.7 UAGMS: Reporting Tab ...... 211
6.8 UAGMS: Report Output ...... 212
6.9 UAGMS: Data Configuration Tab ...... 213
6.10 UAGMS: Help Page Tab ...... 214

Abstract

The phenomenon of Unaccounted-for-Gas (UAG) in natural gas transmission networks can be summarised as the failure to account for a percentage of network throughput – typically around 0.3% per annum – resulting in an increased transmission cost. This thesis represents the first holistic approach to UAG management in the literature. The chief areas of focus are threefold: identifying the causes of UAG within the transmission system, formulating a baseline for UAG under normal operating conditions, and examining the significance of UAG and its role in detecting errors in flow measurement. Aside from these key topics, a statistical analysis of UAG in the UK is carried out and compared to international counterparts. Our research into sources of uncertainty uncovers errors in linepack estimation as significant contributors to UAG in the UK, and we propose measures to rectify them. Regarding the baseline, we combine uncertainty models with statistical methods to produce a hybrid model, which under certain conditions can be expected to account for all sources of uncertainty in a transmission system. We conclude that sole reliance on baseline methods to uncover large errors in daily flow measurement is insufficient, and propose additional statistical monitoring processes with unique considerations for the different node types. Special attention is paid to energy conservation within power stations, and to the calculation of downstream distribution UAG as an additional error detection technique. We examine appropriate statistical process control methodologies, in particular changepoint analysis, in detecting systematic flow metering errors in the transmission grid. The thesis presents a case study implementing the developed statistical methods into National Grid's highly regulated operating environment. We provide recommendations for the reduction of UAG and the improvement of analytic processes, which are applicable to most transmission grids across the world.

Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s Policy on Presentation of Theses.

Acknowledgements

I’d like to thank my parents, Albena and Dobrin, who have supported me throughout my educational endeavours and adventures. Likewise, I would like to thank my supervisor, Dr Paul Johnson, for his support and National Grid (David, Zoe, Matt and Amardip) for their cooperation. Importantly, this research would not have been possible without the EPSRC funding through a CASE award. Moreover, all the past and present inhabitants of office 2.127 have been a constant source of encouragement and light-hearted banter, and immeasurably improved the experience - thanks Zubier, Lalo, Angeliki, Karolis, Pui, and Rajenki!

I’d like to also state my gratitude to the examining committee, Dr Georgi Boshnakov and Professor Chris Dent for their helpful feedback, which has made the thesis a better work.

These acknowledgements wouldn’t be complete without mentioning the people who contributed to making my 8 years at Manchester a fantastic experience – Adam, Jason, Falah, Basset, Ryan... to name just a few!

And finally, I’d like to thank you, dear reader, for taking an interest in this work and I hope you will find it useful.

December 2020 Manchester

Chapter 1

Introduction

Today, natural gas is ubiquitous in the everyday life of millions of people around the globe. At least 86% of UK households in 2019 used gas heating, with a majority also employing it for cooking. In many countries, it is a key component of the energy generation mix – in the UK, it typically accounts for around 30%. Around the world, it continues to be adopted as a cleaner alternative and a compromise between renewable energy and more polluting hydrocarbons like oil and coal. Indeed, whilst little room for growth is left in some mature gas transmission grids, many developing countries have yet to build robust grids. This, coupled with new innovations, including those allowing the profitable extraction of previously inaccessible shale gas reserves, has led to a construction boom in gas infrastructure. We can see in Figure 1.1 that 2018 saw the highest capacity of new pipelines ever launched, and there are currently over 129,086 km of pipeline under construction globally. Modern gas transmission grids are, along with electrical transmission grids, the largest man-made machines on the planet. These highly complex and nationally critical systems are operated continuously, with millions relying on gas for their everyday needs.

Evidence exists of society using natural gas as early as 500 BC, when it was transported via a primitive bamboo pipeline to boil water and extract salt in China. However, it was not until the early 19th century that it began to be utilised at scale in modern society. This came following the invention of the gas lamp by Scottish engineer William Murdoch, who illuminated his own house in Cornwall using the contraption in 1792. The date is no coincidence, with the Industrial Revolution taking place in the United Kingdom at the time – over the following years, developments in the extraction of gas from coal through high pressure carbonisation (coking) and storage technologies allowed

for large scale gas night-time illumination in cities like London and Paris. Innovators like William Henry of the Manchester Mechanics Institute, a forerunner to the University of Manchester Institute of Science and Technology (UMIST), were key in the development of techniques and understanding at this stage.

However, applications for gas came slowly. The Bunsen burner was not invented until 1855. Indeed, gas was frequently vented or combusted at wellheads across the world throughout this era; without the pipeline infrastructure and Liquefied Natural Gas (LNG) shipping we have today, and the overall demand, there was no incentive to capture natural gas. Unlike oil, natural gas is difficult to transport and store. Even harder is the process of accurately measuring the rate, or amount, of gas flowing through a pipe.

[Figure 1.1: New pipelines in capacity and kilometres, by year of first operation, worldwide. Top panel: barrels-a-day equivalent of new oil and gas pipeline capacity; bottom panel: kilometres of new pipeline, 1980-2020. Source: GGON [31].]

Whilst today the industry is in the midst of a digital revolution, with smart meters finding their way into end customers’ homes, the rotary gas meter was not invented until 1817; before that, users of gas lamps were charged by the number of lamps they had, rather than the amount of gas they burned.

In the UK, it was not until the discovery of natural gas in the North Sea in 1967 that the fuel ubiquitous throughout homes today was adopted in place of coke gas. This was a much cleaner fuel, and eliminated the need for the polluting and unhealthy gas works in major cities. It was also almost twice as efficient as coke gas; whilst the latter’s typical Calorific Value (CV) – the energy released upon combustion – is only in the region of 20 MJ/m³, natural gas has one of around 39 MJ/m³. The National Transmission System (NTS) as we know it today was subsequently constructed, allowing for a North-South flow. In the 90’s, during the aptly named ‘dash for gas’, a shift occurred in the electrical generation mix. It marked a move away from coal and towards gas generation, due to its cost efficiency, lower pollution, and a new competitive market following the privatisation of the gas industry.

We are currently in the next big shift in the energy industry: there is a strong movement towards decarbonisation, decentralisation and digitisation. The latter in particular is allowing ever increasing datasets, from technical operating data to user consumption, to be made available for real-time analysis on a massive scale. Moreover, this has also generated much potential for statistical, mathematical and machine-learning techniques to be applied to the field, as will be explored.

In the next section, we will briefly familiarise the reader with the modern natural gas transmission grid, and its place within the larger gas transportation industry. Our focus will then shift toward the NTS, and a brief technical overview will be presented. Afterwards, we will introduce the quantity known as Unaccounted for Gas, before discussing a variety of interesting problems that can be posed in the domain.
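The "almost twice as efficient" claim above is a simple volume-times-calorific-value calculation. A minimal sketch, using the approximate CV figures quoted in the text (the volume figure is purely illustrative, not from the thesis):

```python
# Energy delivered on combustion is volume multiplied by calorific value (CV).
# CV figures are the approximate values quoted in the text, in MJ per cubic metre.
COKE_GAS_CV = 20.0     # MJ/m^3, approximate
NATURAL_GAS_CV = 39.0  # MJ/m^3, approximate

def energy_mj(volume_m3: float, cv_mj_per_m3: float) -> float:
    """Energy released on combustion for a given gas volume and CV."""
    return volume_m3 * cv_mj_per_m3

# Illustrative daily consumption of 1,000 cubic metres.
volume = 1000.0
ratio = energy_mj(volume, NATURAL_GAS_CV) / energy_mj(volume, COKE_GAS_CV)
print(f"Natural gas delivers {ratio:.2f}x the energy of coke gas per unit volume")
```

The ratio of 39/20 = 1.95 is what underlies the "almost twice as efficient" statement; the same volume-times-CV conversion is how metered gas volumes are turned into the energy quantities balanced throughout the thesis.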

1.1 Natural Gas Transportation

The gas delivery system in many countries can typically be separated into two components: the transmission system and the distribution system. We will now summarise their main features.

• Transmission: The transmission system transports large volumes of high-pressure gas across long distances, typically between a relatively small number of input and output points. The former include offshore pipeline terminals, LNG terminals, and interconnectors linking to other transmission grids. Its end points are large power stations, heavy industry, and offtakes at city gates connecting to distribution grids. It also encompasses all the pipeline infrastructure, such as the pipes themselves, compressor and heating stations, and metering points. Also included are storage facilities, which are both input and output points. The transmission system is operated and controlled exclusively by National Grid (NG) in the UK, who is in turn accountable to the national regulator, Ofgem.

• Distribution: The primary purpose of the distribution grid is to transport gas from the transmission grid offtakes to smaller customers like housing, commercial areas and light industry. Whilst there is usually only one transmission system in a country, there will typically be multiple distinct distribution grids, which may or may not be interconnected downstream, serving distinct geographic regions (e.g., the North East). Furthermore, several of these distribution systems have their own gas storage facilities. Whilst traditionally the only input into these grids was the transmission system, increasingly gas is being injected directly into such systems via biomethane, much like small-scale electrical generation via renewables. This will, however, still account for only a small fraction of throughput, and the vast majority can still be attributed to the transmission grid. Distribution and transmission grids are sometimes operated by the same entity – however, this is not the case in the UK, where distribution grids are independent of the NTS, with operators including Cadent, Gas Networks Ireland and Northern Gas Networks amongst others.

Importantly, both distribution and transmission grids transport two key quantities: gas, their primary function and raison d’être, but also data. The multitude of sensors and actuators across the network are all connected back to the control centre operated by NG. Likewise, equipment can be remotely controlled from the said centre. This computerised process control system is referred to as SCADA (Supervisory Control and Data Acquisition), and every pipeline network around the world today is controlled in similar fashion. A typical transmission grid is illustrated in Figure 1.2, as are the interactions between the distribution, transmission and information grids and their respective start and end points. An important aspect to note is the downstream connection between separate distribution networks – the implications of this relate mainly to the ability to forecast demand at individual NTS offtakes.

Figure 1.2: Transmission and distribution grid connections and components. [Diagram components include: power station, storage, compressor, control centre, junction, heavy industry, interconnector/offshore pipeline, LNG terminal, cities, light industry, housing and commercial offtakes; the key distinguishes transmission and distribution pipelines, uni- and bi-directional gas flow, information flow and gas flow meters.]

Next, we will examine the NTS in greater detail, focusing on key technical characteristics and some important concepts in gas transmission. Grids are not directly comparable across countries, as they operate in geographically distinct parts of the globe, face different challenges, utilise different equipment and stem from different eras. Therefore, it is important to put the forthcoming work in its specific industrial context, which is the NTS in the UK.

1.2 The NTS

The 8,780 km of welded steel pipeline illustrated in Figure 1.3 constitutes the key component of the NTS. Of course, this is not a single continuous pipeline but rather a system of separate pipelines, linking up with each other at junctions to form the transmission grid. The grid was engineered with an initial North-South flow in mind; thus, southbound pipes originating at the seaside town of St Fergus in Scotland were the first to be constructed. Additional capacity has been added throughout the decades, through the installation of new pipelines. With the decline of North Sea production, the North-South flow is no longer the prevailing supply pattern in the NTS. In recent times a significant proportion of gas coming onshore at major southerly terminals like Bacton, and LNG terminals like the Isle of Grain, has been pumped northward.

Figure 1.3: UK National Transmission System pipelines overlaid on map of the UK. Image from Google Earth, pipeline locations provided by NG.

When considered internationally, the NTS in the UK is amongst the larger grids, especially considering the UK land mass; it is dwarfed only by grids of countries such as Russia (50,000 km) and the USA (300,000 km) [31].

1.2.1 Operating Conditions

Whilst input pressure is sometimes sufficient to transport gas to its destination, in large pipelines and grids such as the NTS it is supplemented by strategically placed compressor stations, which either burn natural gas itself as a fuel or run electrically powered turbines. This facilitates the transfer of gas across the pipeline, and ensures the delivery pressures at exit points stipulated by contractual agreements are maintained. This pressure ranges between 40 and

Table 1.1: Typical gas composition. Cassidy, Richard (1972) [12]

  Component            Volume %
  Methane                 93.63
  Ethane                   3.25
  Propane                  0.69
  Butane                   0.27
  Other Hydrocarbons       0.20
  Nitrogen                 1.78
  Carbon Dioxide           0.13
  Helium                   0.05

90 bar. Compressor utilisation is one of the main tools at the control centre's disposal, allowing for the control of flow across the network. However, this comes at the cost of fuel and is thus carefully monitored. Pipelines are made from welded steel with a diameter in the range of 460-1050 mm, and are buried underground at typical depths of 1.1-2 m. Whilst this means they are not exposed to above-ground weather, they are still subject to temperature fluctuations due to changing ground temperature, which affects the rate of cooling of the gas inside. Gas temperature varies between 5-45°C within the pipeline, and gas will typically move within the pipeline at speeds of 18-24 m/s. The energy content of natural gas is not homogeneous throughout the network. Indeed, natural gas itself is a mixture of hydrocarbons that varies in composition - typically, this will resemble the profile seen in Table 1.1. As a result, gas extracted from different sources will have different compositions, which importantly also equate to different energy contents. This energy is quantified by a gas composition's calorific value, which usually ranges from 37 to 44.5 MJ/m³ in the NTS, and is typically measured across input and output points by a gas chromatograph. The lack of oxygen means the mixture is non-ignitable within the environment of the pipeline under standard conditions - this is what allows natural gas to be transported safely, and internal maintenance operations like welding to be carried out without depressurisation. Importantly, different gas compositions mix within the pipeline. Therefore, the total energy within a pipe section can only be indirectly estimated from meter readings at nearby entry and exit points, combined with pipeline pressure measurements.
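The conversion between metered volume and energy described above is a straightforward product of volume and calorific value; the sketch below illustrates it with invented figures (the function name and inputs are assumptions, not NTS data).

```python
# Energy delivered at a metering point: metered volume times calorific value.
# Illustrative sketch only; the figures below are invented for illustration.

def energy_gwh(volume_mcm: float, cv_mj_per_m3: float) -> float:
    """Convert a metered volume (million cubic metres) and a calorific
    value (MJ/m^3) into energy in GWh (1 GWh = 3600 GJ = 3.6e6 MJ)."""
    volume_m3 = volume_mcm * 1e6
    energy_mj = volume_m3 * cv_mj_per_m3
    return energy_mj / 3.6e6

# A day's offtake of 10 mcm at a calorific value of 39.5 MJ/m^3,
# within the typical NTS range of 37-44.5 MJ/m^3:
print(round(energy_gwh(10, 39.5), 1))  # → 109.7
```

Because calorific value varies from source to source within the quoted range, the same metered volume can correspond to appreciably different energies, which is why energy rather than volume is the natural accounting unit.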

1.2.2 Nodes

Throughout this work, we will refer to NTS input and output points as nodes. These take the forms listed below, where they have been categorised as being either supply side (input), or demand side (output).

Supply

• Offshore Pipeline: Pipelines coming from offshore rigs around the North Sea oil reserves come aground at a terminal, where some short-term storage may be available, and link directly to the NTS.

• Shipping Terminals: Ships transporting LNG unload here – a recently constructed example is the Isle of Grain terminal, near London.

• Interconnectors: Undersea pipelines to mainland Europe, namely the Bacton-Zeebrugge (Belgium) and Bacton-Balgzand (Netherlands) interconnectors. The Moffat interconnector connects the UK to the Republic of Ireland, and is at the current time export only.

• Withdrawal from Storage: Storage takes the form of salt mines, cavities and depleted gas/oil fields, which are injected with gas at high pressures (up to 200 bar). Gas can be withdrawn from storage as and when needed. Traditionally, gas was stored in gasometers within the city gates; however, this is no longer the case, and these once common sights have been phased out since the advent of natural gas. Anecdotally [89], a gasometer's pressure gauge in modern-day Jakarta recorded a 0.085 bar (1.23 PSI) spike on the day of the 1883 Krakatoa explosion, 160 km away.

Demand

• Interconnectors: Flow can occur in both directions, and hence when gas is exported it can be viewed as demand.

• Injection into Storage: As above, at times of low prices gas can be injected into storage.

Table 1.2: Node type and aggregate daily flows by group, 2019. Flows are shown both as pure energy and as percentage of throughput, in terms of the mean and standard deviation.

  Demand Type      Number  Mean (GWh)  SD (GWh)  Mean (%)  SD (%)
  Industrial           18       108.8      19.4      3.9%    1.0%
  Power Stations       40       582.0     179.0     21.8%   10.4%
  Storage              13       216.3     270.7      8.2%    9.9%
  Interconnector        3       510.9     320.4     18.6%   12.0%
  LDZ Offtakes        122      1437.0     726.7     47.4%   16.7%

  Supply Type      Number  Mean (GWh)  SD (GWh)  Mean (%)  SD (%)
  Offshore             17      1715.5     374.5     70.8%   12.4%
  LNG                   4       498.6     304.7     18.9%    9.8%
  Interconnector        3       122.1     286.3      3.4%    6.6%
  Storage              13       193.0     220.0      7.0%    6.9%

• Power Stations: A large proportion (∼42% in 2016 [20]) of the UK power generation mix is accounted for by gas-fired power stations. These stations consume large amounts of gas, and have dedicated connections to the NTS.

• Heavy Industry: This includes both individual plants with a large demand for gas, such as nitrile manufacturing, and larger industrial complexes where multiple factories may be based.

• LDZ (Local Distribution Zone) Offtakes: These offtakes feed the distribution system, and hence a large amount of variability can be expected.

• Compressors: Compressors may be considered a demand node, as they consume gas directly from the grid.

We have summarised the number of each type of node in Table 1.2. It is immediately evident that demand nodes (196) far outnumber supply nodes (37). Overall, the majority of daily exit flow is accounted for by LDZ offtakes (47.4%), followed by power stations (21.8%) and interconnectors (18.6%). The majority of supply is processed at offshore gas terminals (70.8%), with a smaller proportion being imported LNG (18.9%). Overall, storage injections and withdrawals will each typically account for around 8% of daily demand and supply respectively.

1.2.3 Linepack

Linepack (LP) is the amount of gas - and therefore energy - stored within the pipelines at a given time t. Thus, it is a form of ultra-short-term storage; in fact, it has replaced certain storage methods such as gas holders and LNG storage. It can also be seen as a buffer that allows gas delivery to be viewed as instantaneous from point to point. Due to the high variability and time dependence of the demand curve (see Figure 1.4), linepack is used as compensation: it is increased during hours of low demand (i.e. night-time), allowing sudden jumps in demand during peak hours to be met. Managing the linepack is perhaps the most important factor when balancing the network on a day-to-day basis in order to ensure demand is met. We will go into more detail regarding how it is calculated at a later stage.

1.3 Grid Balancing and Introducing UAG

In general, balancing in terms of transmission grids means ensuring there is enough gas availability throughout the day so as to guarantee delivery at contractual pressures for all offtakes. This is in itself a challenging task, as can be seen by examining a typical demand curve in Figure 1.4. The classic 'double peak' profile is seen, corresponding to the early morning and evening demand peaks. The overall daily demand curve will have a very similar shape. Therefore, controllers make use of advance planning, forecasting and the various operational tools available to them (compressors, linepack, injection or withdrawal from storage) in order to maintain pressures across the grid. The equivalent in electrical grids is maintaining line frequency and voltage, with a very similar demand curve - indeed, the two are intertwined, as gas power still accounts for a large share of generation in the UK. Balancing also relates to monitoring whether mass and energy are conserved whilst transiting through the system, as required by the laws of physics. Of course, energy is always conserved, but due to measurement imperfections and other factors to be discussed in this work, perfect balance is not achieved when inputs and outputs are equated.

Figure 1.4: Natural Gas Daily Volume (mcm) by hour, Kirkstead Offtake, 2019-01-08.

Equation (1.1) is the conservation equation governing a transmission grid, and is applicable both in terms of volume and energy.

E + S_w + I_i = D + ΔLP + S_i + I_e + Shrinkage,    (1.1)

where the terms on the left-hand side constitute the Inputs and those on the right-hand side the Outputs, and where

• E represents the total amount of gas flow from NG entry points such as plants, ports and local production

• S_w represents the amount of gas withdrawn from storage

• I_i represents the amount of gas imported from interconnectors

• D represents the amount of gas exiting the system through offtakes to the distribution network and sites such as plants

• ΔLP represents the change in linepack during the time period in question - also written DLP (Delta Linepack)

• S_i represents the amount of gas injected into storage

• I_e represents the amount of gas exported via interconnectors.

Shrinkage is a composite quantity, consisting of three components, two of which are discussed below:

• OUG: Own Use Gas is the total amount of gas used by the compressors which help to maintain pressure and flow in the NTS. CFU (Compressor Fuel Usage) is made up of the gas used by the turbine-powered compressors.

• NTS/LDZ CV Shrinkage: Unbilled energy, normally referred to as Calorific Value Shrinkage (CVS), which is the difference between delivered and billed energy of a charging zone as a consequence of applying the Flow Weighted Average Calorific Value (FWACV) process in accordance with the Gas (Calculation of Thermal Energy) Regulations 1996. Clearly, this is specific to the NTS.

Rearranging Equation (1.1) by moving all terms to one side, typically as Inputs - Outputs, in practice results in an error term. This error term is called Unaccounted for Gas (UAG) within the NTS:

UAG = Inputs − Outputs.    (1.2)
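The balance in Equations (1.1)-(1.2) can be sketched directly; the argument names follow the notation above, and the figures are invented for illustration only (all in GWh).

```python
# Unaccounted for Gas per Equations (1.1)-(1.2): inputs minus outputs.
# Argument names follow the thesis notation; figures are illustrative only.

def uag(E, Sw, Ii, D, dLP, Si, Ie, shrinkage):
    """UAG = (E + Sw + Ii) - (D + dLP + Si + Ie + Shrinkage), in GWh."""
    inputs = E + Sw + Ii
    outputs = D + dLP + Si + Ie + shrinkage
    return inputs - outputs

# With perfectly consistent figures UAG would be zero; any imbalance
# in the metered terms appears directly in the result:
print(uag(E=2400.0, Sw=200.0, Ii=120.0,
          D=2550.0, dLP=50.0, Si=100.0, Ie=15.0, shrinkage=3.0))  # → 2.0
```

In practice, of course, every term on the right-hand side is a metered or estimated quantity, so a non-zero result reflects measurement and accounting imperfections rather than gas genuinely appearing or vanishing.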

In the UK, UAG is the third component of Shrinkage, which represents the operational costs of the network. Positive UAG results in a direct cost to the transporter, as they must recoup this lost gas by purchasing more in order to honour their transportation agreement with the shipper. Negative UAG does not incur any cost to the transporter, as it implies free excess gas has materialised in the system - however, this cost may ultimately be absorbed by the shippers. Positive UAG is much more common, and is usually the norm across the world. As such, the related costs absorbed by the transporter will usually be governed by the transport licence issued by the transmission regulatory body. In the UK this is Ofgem, and the licence stipulates that such costs are passed down to the end customer as part of the transmission fee - meaning they in effect purchase more gas than they use, to compensate for accounting and metering errors. Therefore, minimising this quantity is in the public interest, and as such transporters are typically financially incentivised, or mandated, to do so. Figure 1.5 illustrates the daily UAG for the period 2012-2020 for the UK grid, in addition to the OUG, CV Shrinkage and DLP. Recent total yearly values for UAG and OUG can be seen in Table 1.3. In general, the international industry standard for UAG is seen as ±0.3% of system throughput, and the UK's National Grid performs roughly within this range. In recent years UAG has been as low as 0.06%, although the absolute percentage of UAG has stayed constant. This might indicate that a systematic overestimation of certain system constituents has been rectified, resulting in a mean more evenly centred around 0. The proportion of UAG relative to total demand has remained fairly constant throughout.

Table 1.3: Yearly UAG and OUG as publicly recorded by NG in the UK, as both (absolute) percentage and total physical values. All physical values given in GWh. Source: National Grid.

  Year               2013     2014     2015     2016     2017     2018
  Gas Transported  967253   862791   910667   956274   955702   920990
  UAG             2138.74  2494.11  2557.00  1779.07   594.59   939.71
  UAG %            0.221%   0.289%   0.281%   0.186%   0.062%   0.102%
  Absolute UAG    3112.89  3596.93  4074.61  3785.13  3300.38  3873.02
  Abs. UAG %       0.322%   0.417%   0.447%   0.396%   0.345%   0.421%
  OUG             1261.89  1273.53  1541.21  2196.31  2599.84  1660.57
  OUG %            0.130%   0.148%   0.169%   0.230%   0.272%   0.180%
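The percentage rows of Table 1.3 are simply the physical values divided by gas transported; a quick check for a few of the years:

```python
# Reproducing the "UAG %" row of Table 1.3 as 100 * UAG / Gas Transported
# (both in GWh), for a subset of years taken from the table.

transported = {2013: 967253, 2016: 956274, 2017: 955702}
uag_gwh = {2013: 2138.74, 2016: 1779.07, 2017: 594.59}

for year in sorted(transported):
    pct = 100 * uag_gwh[year] / transported[year]
    print(year, f"{pct:.3f}%")  # → 0.221%, 0.186%, 0.062% respectively
```

The same calculation applied to the Absolute UAG row shows why the signed and absolute percentages diverge: daily positive and negative UAG partially cancel in the yearly sum but not in the yearly sum of absolute values.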

Whilst UAG is of a small relative order, at about 0.06%-0.28% of transported gas, this equates to about £60 million per annum at spot prices for the UK figures, for the average volume transported in recent years. However, if large proportions of this error were to arise out of a few nodes, significant billing errors could occur, and therefore it is important to take UAG seriously - moreover, UAG could be an indicator of serious leakage, representing both potential safety and environmental concerns. Considering that mass and energy balances are always conserved in practice, UAG can be viewed as a term quantifying the accounting and measurement efficiency across the network. Prior to 1817 and the invention of the rotary gas meter, UAG would have been equal to the network throughput - 100% in percentage terms - as no way of measuring the quantity of gas delivered to customers existed! In the next section, we will define UAG as calculated for the Gas Day (defined as 05:00 to 05:00 UTC the following day, in the UK and the EU) within the NTS.
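Since the Gas Day runs 05:00 to 05:00 UTC, mapping a timestamp to its gas day amounts to shifting back five hours and taking the date; a minimal sketch (the function name is ours, not an NG convention):

```python
# Map a timestamp to its Gas Day (05:00 UTC to 05:00 UTC the next day):
# shift back five hours, then take the calendar date.
from datetime import datetime, timedelta, timezone, date

def gas_day(ts: datetime) -> date:
    """Return the Gas Day to which a timezone-aware timestamp belongs."""
    return (ts.astimezone(timezone.utc) - timedelta(hours=5)).date()

# 04:59 UTC still belongs to the previous gas day; 05:00 starts a new one.
print(gas_day(datetime(2019, 1, 8, 4, 59, tzinfo=timezone.utc)))  # 2019-01-07
print(gas_day(datetime(2019, 1, 8, 5, 0, tzinfo=timezone.utc)))   # 2019-01-08
```

This convention matters when aggregating hourly flow data: the first five hours of a calendar day are accounted against the previous gas day.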

1.4 Defining UAG

Now we introduce some notation to express the calculation of UAG which must be carried out by the grid operator, in the context of nodes as discussed in an earlier section. Let n and m respectively be the number of input and output nodes across the transmission system.

We denote the unobservable true daily total flow in terms of energy for the i-th node by ν_{i,t}, where the first n nodes are inputs to the system, and ν_{i,t} corresponds to an output node when i > n.

Figure 1.5: NTS UAG in addition to Shrinkage terms (DLP, OUG, CV Shrinkage), in GWh, for the years 2016-2020. Rolling 7-day average is overlaid in black. Note the different y-axis scales.

Further, we introduce the following true values; let λ_t be the linepack at time t, o_t – the total daily compressor fuel usage, s_t – the CV shrinkage, and a_t – network-specific losses such as leaks. Then, following (1.1) and (1.2), the UAG U_t for day t can be defined as:

U_t = Σ_{i=1}^{n} ν_{i,t} − Σ_{i=n+1}^{n+m} ν_{i,t} − (λ_t − λ_{t−1}) − (o_t + s_t + a_t),    (1.3)

where the two sums are the node flows, (λ_t − λ_{t−1}) is the delta linepack, and (o_t + s_t + a_t) collects the additional terms. Note here that the unobservable true daily flow totals sum to 0. However, only the observed (metered) flows are available, and thus we denote an observed quantity by adding a hat to the corresponding variable, i.e. ν̂_{i,t}, λ̂_t, ô_t, ŝ_t and â_t respectively. Then the observed UAG Û_t is given by

Û_t = Σ_{i=1}^{n} ν̂_{i,t} − Σ_{i=n+1}^{n+m} ν̂_{i,t} − (λ̂_t − λ̂_{t−1}) − (ô_t + ŝ_t + â_t).    (1.4)

Note that one of the reasons UAG is only calculated daily by the network operator is that not all terms are available at more frequent intervals. It is unlikely that a higher granularity will be implemented in the future, as the reporting intervals of the many different accounting data fields are governed by regulations - this, rather than data availability itself, is the limiting factor. Moreover, thus far there has been no research on the benefit of reporting UAG at more frequent intervals. As the observed flows are the only dataset we will work with, for simplicity we will henceforth use the notation U_t to denote observed UAG, unless otherwise stated. Before we proceed, it is worth noting that the definition of UAG is not internationally agreed upon, and nor is the nomenclature - the literature frequently also refers to it as UfG, or Lost and Unaccounted For (LAUF) gas in the US, and sometimes the term includes other components which are grouped under the broader Shrinkage term in the UK.
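Equation (1.4) can be sketched directly from arrays of metered node flows; the figures below are invented for illustration (all in GWh), and the node labels in the comments are assumptions rather than real NTS sites.

```python
# Observed daily UAG per Equation (1.4): metered input flows minus metered
# output flows, minus delta linepack, minus the additional shrinkage terms.
# All figures are illustrative only; units are GWh.

def observed_uag(inputs, outputs, lp_today, lp_yesterday, oug, cvs, other):
    """Compute U-hat_t for one gas day from metered quantities."""
    delta_lp = lp_today - lp_yesterday
    return sum(inputs) - sum(outputs) - delta_lp - (oug + cvs + other)

u = observed_uag(
    inputs=[1700.0, 500.0, 120.0, 190.0],          # e.g. offshore, LNG, import, storage
    outputs=[1440.0, 580.0, 250.0, 110.0, 80.0],   # e.g. LDZ, power, export, industry, injection
    lp_today=3510.0, lp_yesterday=3470.0,          # linepack levels at day boundaries
    oug=6.0, cvs=2.0, other=0.0)                   # shrinkage terms o, s, a
print(round(u, 1))  # → 2.0
```

In the true (unobservable) system the result would be exactly zero; any non-zero value here reflects the measurement and accounting errors discussed in Chapter 2.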

1.5 Literature Review

Few in the literature have specifically focused on the issues relating to UAG in transmission networks; still, there exist a few relevant works dealing with error identification and UAG modelling and analysis. We will consider works specific to UAG in this section, and elaborate on specific statistical literature as needed within the thesis. NG issues a bi-annual report on the state of UAG in its network. A discussion pertinent to detection and attribution can be found in one such report [57], in which applications of CUSUM, CART and a number of heuristic techniques to aid detection and identification are considered. However, no satisfactory solution was reached at the time. We further advance the methods relating to power stations in Chapter 5. A model for the uncertainty of UAG relating system size and throughput was proposed in [2], along with a review of typical metrological performance focusing on the Italian transmission network - this work is expanded upon in Chapter 2. A similar work [91] also expands on the concept of theoretical loss arising out of random meter error in transmission grids. Contrary to results in that work, we have found no evidence of seasonality when analysing the UK grid, as discussed in Section 3.2.1. The study in [91] further looks at a model for a trunk gas pipeline, quantifying a theoretical expression for network loss as a percentage of network throughput - however, this cannot be readily extended to a complex system such as a transmission grid. Insight into practical approaches used by American pipeline companies to minimise lost and unaccounted for gas, along with recommendations, is given in [26]. Best practices for the mitigation and control of UAG with a focus on distribution grids, along with a review of common causes, can be found in [35].
A review of atmospheric emissions and UAG from the ex-Soviet transmission grid can be found in [72], although metrological performance from the time is unlikely to be comparable to today's site instrumentation. Identification of faulty distribution grid meters was approached based on an individual load index in [60] - however, this is not easily transferable to transmission, as the scope is entirely different when comparing household and region-serving meters. From a regulatory perspective, [17] considers actor motivation and optimal incentives to aid UAG minimisation for the US market, with the points being elaborated further in the report [16]. In 2020 we added to the literature with a paper titled 'Statistical process control in the management of unaccounted for gas' [10], exploring changepoint methods in particular - see Chapter 4. Data reconciliation (DR) techniques have been applied to the UAG problem; notably in [4] to improve the accuracy of metered values in distribution networks. DR in a transmission pipeline with a view to minimising UAG was applied in [19]; however, this was in the case of a single pipeline constrained upon a known proportion of the constituent energies. Gross error detection, a sub-field of DR, has been applied specifically in the gas distribution industry to detect faulty meters and limit UAG in [77]. [5] considers data reconciliation in a gas pipeline system of limited complexity; however, pressure measurements are used for redundancy and calculations are done in terms of volume, not energy. Another caveat is that bi-directional flow, which frequently occurs in the NTS, is not considered in these examples. For more applications of DR in the natural gas industry, see [63].
The aforementioned DR approaches are, however, not easily applicable in an NTS, as redundancy does not usually exist, and thus no constraints can be applied on flows. We will investigate the potential for DR use in a transmission grid in Chapter 5. Finally, it is worth noting that whilst at first glance the academic literature may seem somewhat limited, a large number of white papers and reports, typically authored or commissioned by utilities, exist on the subject - for example, see [39] and [82] for a full report with a focus on the link between emissions and UAG, and [82] for an executive summary of prevailing UAG concerns in the US. A large amount of similar work may remain hidden behind confidentiality restrictions, as companies are often hesitant to release information pertaining directly to their operational efficiency and losses. For example, after a study commissioned by Sui Northern Gas Pipelines Ltd from KPMG was leaked to the Pakistani press [64], the company shunned the report's conclusions and claimed its own views were not taken into account.

1.6 Scope

The UAG problem and gas transmission data infrastructure allow for a number of research and practical problems to be posed. The specific industrial aims of this thesis were stipulated by the iCASE partnership sponsor, National Grid, and are laid out below:

• Development of a baseline methodology: This encompasses the examination of error sources, quantification of such errors where possible and, ultimately, the construction of an appropriate model for UAG given the system operating state. In other words, this can be described as scientific modelling of the UAG's behaviour. The aforementioned baseline will aid National Grid both in overall performance metrics and in the development of a sound process control methodology, but also enables UAG management techniques to be refined and targeted in specific areas, greatly improving response and

holistic management with all stakeholders. The overall ability to draw inference from UAG to aid in network operation will be assessed.

• Mathematical analysis techniques: The mathematical techniques will build on the results of the baseline analysis to provide a suite of tools and models that will be embedded into National Grid's UAG management processes. In particular, systematic and random error detection techniques will be examined, and their efficacy to solve the specific scenario existing within the transmission grid data infrastructure assessed. Special focus is paid to the development of a real-time data analysis system, pertaining to all matters concerning UAG inference.

• Creation of an integrated real-time analytics suite, implementing the techniques utilised above in a practical way allowing for their use by NG's Balancing team, in order to aid fault detection and tracking.

More directly, some of the questions we will ask and answer throughout this thesis are as follows:

• What causes UAG?

• What is a normal level of UAG?

• Is the system currently operating in an error-free state?

• Why is UAG centred around a positive figure and not 0?

• To what extent can both random and systematic errors in measurement be detected through UAG?

• To what extent can both random and systematic errors in measurement be attributed back to a faulty node?

• How can UAG be minimised?

Indeed, we have taken a holistic approach to the analysis of UAG in a transmission grid, covering topics ranging from analysing sources of uncertainty and using this to formulate baseline models, to providing guidelines for the high-level error detection processes implemented by a transmission grid operator. As can be seen in the literature review, no work of as broad a scope has been conducted before in this domain. Whilst our main focus is the UK's NTS, it is hoped that the study will serve as a reference for establishing better control and management practices of UAG in transmission and distribution grids, both large and small, across the world.

1.7 Contributions

Our contributions are chiefly in the domain of operational research in energy systems, and specifically in the quantification of UAG and the use of statistical methods on gas transmission data to aid in decision making. The highlights of the thesis are summarised below, with the general structure also outlined:

• A comprehensive study of factors contributing to UAG within the NTS is laid out in Chapter 2, and errors stemming from several practices are quantified where possible. We have identified practices which result in unnecessary and avoidable errors; this, coupled with our recommendations posited in Chapter 7, can help reduce future levels of UAG in the UK, which will have an appreciable real-life impact through the reduction of gas transmission costs.

• A baseline level for UAG within the NTS has been identified as a function of operational parameters, which is able to replace the arbitrary constant baseline currently in use. Whilst this is not immediately generalisable across international grids due to their own idiosyncrasies and peculiarities, the methodology in general can be adapted to suit any grid. The methodology in question combines the results of the study on metering and accounting uncertainty in Chapter 2 alongside a data-orientated statistical approach. Moreover, a statistical analysis has been carried out of historic UAG and other relevant time series, and the findings have been compared to other national transmission grids where possible. This can be found in Chapter 3.

• We reviewed appropriate methods of identifying systematic errors in transmission grids through the use of changepoint analysis. We quantify the minimum detectable error size as a function of UAG variance; the findings appear in detail in Chapter 4, which mostly follows our journal paper 'Applications of statistical process control in the

management of unaccounted for gas’ [10], published in the Journal of Natural Gas Science and Engineering.

• Two unique approaches to identifying errors in node flows were developed in Chapter 5, both of which make use of UAG. In particular, these are the dual baseline approach, which combines monitoring UAG with monitoring energy conservation downstream or upstream of the NTS, and the analysis of predicted UAG. Gas-fired power station energy conservation was studied in detail, and several models pertaining to gas use and efficiency were proposed and verified against historic data. This chapter concludes by discussing the place of UAG within a transmission grid's greater analytic process, and how it is best utilised in decision making and error detection. In particular, we found that error-reducing processes should be UAG guided rather than UAG led.

• A real-time monitoring application, UAGMS (Unaccounted for Gas Management Suite), has been developed, tested and deployed within National Grid by us towards fulfilling the project requirements. This implements some of the above methods into the NG work process, and is currently in use by the energy balancing team. Whilst the UAGMS does not represent a novelty in terms of computing, we include it as a case study of an efficient integration of complex analytical techniques within an existing system, in a highly regulated environment such as National Grid. The above can be found in Chapter 6.

• Recommendations were posed towards the reduction of UAG, and overall steps that can be taken by gas transmission operators to implement better process control - this can be found in Chapter 7. Whilst in some instances these may be targeted primarily towards National Grid, the majority are generalisable to typical transmission or, in some cases, distribution grids. We also summarise our findings and conclusions in this chapter.

2 Sources of Uncertainty

It is important to understand the causes behind UAG, and that is why we begin with an overview of the metering, data transmission and processing functions involved in its calculation. Following this, we identify sources of errors in the entire process, from the point of metering to the calculation of UAG, and then attempt to quantify and model both existing and potential errors. We will not, however, construct a composite model for UAG; this will be carried out in the next chapter, alongside historical back-testing. Mapping out the flow of data from the metering process to the final calculation yielding the UAG figure is of vital importance, as it enables sources of uncertainty to be identified and quantified. Figure 2.1 illustrates the process, and has been compiled following a thorough investigation into information flow from meter to the final UAG calculation within the NTS. Not all grids will have an identical process, and the list of possible sources of error is not exhaustive. The complexity involved is apparent; in particular, the process is not fully automated from the point of metering to the UAG calculation. Multiple points of human input and interaction exist, along with transmission across different platforms. This is typical for older gas transmission systems, which have adapted to, rather than been designed around, modern information technology. In an idealised scenario, a gas transmission telemetry system would be able to calculate UAG through an entirely automated process, removing many of the identified potential error sources. The next section considers the components of Figure 2.1 from the standpoint of random and systematic error.


Figure 2.1: NTS data flow from metering at node to UAG calculation. [Diagram components include: metering over [t, t+k] (random and systematic error); flow computers FC1/FC2; SCADA; Gemini (UAG data entry and correction); GES data entry; CV/DLP estimation; modelling and calculations; batch updating of select OUG and CVS; Tableau; storage of data in n-hourly intervals.]

2.1 Classifying Errors

2.1.1 Random Error

Random error is typically defined in terms of random measurement errors. Metering equipment can be expected to take measurements of a quantity which include a certain amount of random error, arising from random phenomena and/or minutely fluctuating environments (e.g. power supply instabilities, signal noise). In complex systems such as a gas metering station, this error propagates through the calculation. Provided the metering instrument is unbiased and properly calibrated, random error can usually be modelled by a probability distribution centred on an expected value of 0. Random error can never be completely removed, so it is important to ensure it is minimised. Random errors are anticipated, and instrumentation therefore comes with quoted uncertainty ranges; such errors are present in every reading a meter takes. Typically, random error can be reduced by taking more measurements and averaging.

In terms of the UAG calculation process, however, random error has a broader definition. It can also occur in the data transmission process, for example when an operator makes a typographical data input mistake, or when data is corrupted during transmission or storage.

2.1.2 Systematic Error

In metering, a systematic error is a persistent and ultimately predictable error across multiple readings of the same quantity, arising out of faulty metering equipment or experimental design. Common types of systematic error with a linear response are offset and multiplier errors. If the systematic effect is known, that is, the influence is quantifiable, it can be removed through the use of a correction factor. An example of a systematic error is the use of an incorrect unit conversion ratio. Unlike random error, systematic errors can be, and often are, transient in nature, contingent on the underlying conditions. This, coupled with the fact that their magnitude is not necessarily constant, can make their detection extremely challenging without the use of statistical control techniques. Systematic error is present in many other forms in the UAG calculation process. Quantities which have to be indirectly estimated, rather than metered, will all contain an element of systematic error, the magnitude of which depends on the soundness of the underlying estimation methodology. Linepack calculation and the use of attributable sites are two key sources of systematic error, which will be discussed in this chapter.

2.1.3 Decomposing UAG

Whilst measurement error is a substantial part of the daily UAG, it is not the sole component. Indeed, UAG can be regarded as the sum of multiple distinct error types, which can be categorised by their origin. Hence, the UAG at a time t can also be expressed as follows:

U_t = U_t^{\text{measurement}} + U_t^{\text{data}} + U_t^{\text{emissions}} + \ldots \quad (2.1)

Each component of the above can, in most cases, be further sub-categorised into a random and a systematic component, as in the case of measurement error:

U_t^{\text{measurement}} = \epsilon_{\text{random}} + \epsilon_{\text{systematic}} \quad (2.2)

The following sections examine where UAG originates by considering specific sources of error, quantifying them as above. We will attempt to quantify the net contribution of each category to UAG, and discuss factors that can aid in error mitigation.

2.2 Atmospheric Emissions

The first natural hypothesis one might posit to explain UAG is leakage; this is particularly attractive as transmission leakage always has a positive effect on UAG, which might explain the slight positive bias. However, as will be discussed, emissions account for only a small part of UAG. Emissions will always result in an increase of UAG, provided the leakage occurs within the transmission grid. There are many ways in which natural gas may escape from the transmission system, and the quantity of these emissions ranges from well-estimable to largely unknown. Leaks can be either operational, where they are anticipated and accounted for, or accidental and unplanned. We review and categorise the key causes of emissions below:

1. Venting gas: Defined as operational emissions arising from compressor start-up, purging and depressurisation, in addition to leakage around compressor seals and shafts. Approximately 55.5 GWh of venting gas was released into the atmosphere over 2017-2018. This accounts for just over 1.8% of the absolute UAG over the same period, or approximately 0.0018% of system throughput. Since this is distributed approximately uniformly over the year, it is unlikely to result in a statistically appreciable effect.

2. Operating emissions: Emissions from all remaining connectors and instrumentation in the transmission system. The EPA [86] provides the following definition for such 'leak' emissions: 'Potential sources of leak emissions include agitator seals, connectors, pump diaphragms, flanges, hatches, instruments, meters, open-ended lines, pressure relief devices, pump seals and valves. Leak emissions occur through many types of connection points (e.g., flanges, seals, threaded fittings) or through moving parts of valves, pumps, compressors, and other types of process equipment.' See [39] and [82] for a detailed review of UAG and emissions in the US.

Although prudent preventative maintenance schedules can lower the total level of emissions, they are by and large unavoidable at the scale of transmission and distribution grids, and can be considered operational emissions. Such emissions can be estimated in a number of ways, as discussed in [40]. The simplest, least costly and most commonly used estimation procedure is the application of an emissions factor to throughput at either a facility, equipment or component level. Methodologies providing greater accuracy come at increasingly greater cost. For the year 2012-2013, the NTS reported total emissions as 0.01% of throughput.

3. Significant single-point failures: These are by far the least common, and are the result of pipeline failure due to corrosion, ground shifting, external influence or production defects, amongst others. They are always unplanned and accidental in nature. A large amount of gas may be released, coupled with significant damage to property and human life if there is a resulting explosion, although the latter is exceedingly rare (only around 5% of leaks result in ignition). The European Gas Pipeline Incident Data Group maintains a database of gas release incidents in transmission grids. Over the years 2012-2016, 56 incidents were reported; this equates to a failure frequency of 0.136 per 1000 km of steel high-pressure (> 15 barg) pipeline per year [23]. Fortuitously, this frequency is exceedingly low and does not have a practical effect on UAG. In cases where such leakages are identified, losses can be numerically approximated via models such as [21].
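The emissions-factor approach described in point 2 can be sketched in a few lines. The default factor below mirrors the 0.01%-of-throughput figure reported for 2012-2013; the throughput value is purely illustrative, not NTS data:

```python
# Hypothetical emissions-factor sketch: the simplest estimation method
# applies a relative factor to throughput. The 0.01% default mirrors the
# NTS 2012-2013 reported total; the throughput figure is illustrative.

def estimate_operating_emissions(throughput_gwh: float,
                                 emissions_factor: float = 0.0001) -> float:
    """Estimated operating emissions (GWh) = factor * throughput (GWh)."""
    return throughput_gwh * emissions_factor

# e.g. roughly 900 TWh of annual throughput -> about 90 GWh of emissions
annual_emissions = estimate_operating_emissions(900_000.0)
```

In practice, published factors differ by facility, equipment and component class; a single grid-wide factor is only the coarsest variant of the method.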

In summary, emissions effects on UAG can be minimised by operating prudent maintenance and assurance schemes on both pipelines and related equipment, and by accurately estimating the emissions that do occur. Regulatory oversight often governs these activities. The effect of emissions is much larger in distribution grids, due both to the much greater length of pipeline and to lower overall construction and maintenance standards. See [77] for further discussion of emissions from distribution grids.

2.3 Metering Errors

In this section, we will discuss errors arising out of the metering process; we begin with a discussion on the measurement stations, followed by an examination of the two types of measurement errors and the modelling of such errors.

2.3.1 The Measurement Chain

Modern gas metering sites are complex above-ground installations, consisting of a number of meters working in unison to form a measurement chain. In a typical volumetric installation commonly found in transmission grids, this will consist of a primary volumetric flow device, a pressure transmitter and a temperature transmitter, in addition to a gas chromatograph, all feeding into a flow computer. The information flow schematic can be seen in Figure 2.2. Volume is corrected to base conditions for billing purposes as a function of temperature, pressure and compressibility:

V_b = V \frac{P}{P_b} \frac{T_b}{T} \frac{Z_b}{Z}, \quad (2.3)

where V is the gas volume (m³), T is the gas temperature (K) and P is the pressure (Pa), all at operating conditions, and Z is the dimensionless gas compressibility factor. V_b (Sm³), T_b (K), P_b (Pa) and Z_b denote the corresponding quantities at base conditions, which are taken to be 15°C and 101325 Pa within the National Grid.

For a venturimetric flow configuration, that is to say, a meter measuring flow via a pressure differential resulting from an obstacle placed in the existing flow vessel, such as an orifice

Figure 2.2: Measurement chain. The temperature transmitter (T), pressure transmitter (P), volumetric or venturimetric device (V) and gas chromatograph (C, Z) all feed into the flow computer, which transmits to the NTS data infrastructure.

plate, the following calculation is required to obtain V_b, as published in ISO 5167-1:2003:

V_b = \frac{C' \varepsilon \pi d^2}{4\sqrt{1-\beta^4}} \sqrt{\frac{2\Delta P}{\rho_b} \frac{P}{P_b} \frac{T_b}{T} \frac{Z_b}{Z}}, \quad (2.4)

where C' is the dimensionless flow coefficient, \varepsilon is the dimensionless gas expandability factor, d (mm) is the internal orifice plate diameter, \beta is the dimensionless diameter ratio of the orifice plate, \Delta P (Pa) is the differential pressure across the orifice plate sections and \rho_b (kg/m³) is the gas density at base conditions. For a review of uncertainty in gas transmission measurement chains, see [27].

Natural gas quantities are measured in energy, which is a function of volume and calorific value. As a result, the following equation is applied to determine the billable energy:

E = V_b C, \quad (2.5)

where E is the energy measured in MJ, V_b is the volume at base conditions and C is the calorific value measured in MJ/Sm³. As all the above equations refer to an instantaneous rate of volume or energy flow, they must be integrated with respect to time to determine total daily flow values. This is achieved through an integrator, typically part of the flow computer. Instantaneous flow values, alongside daily totals, are transmitted to the control centre and logged in the SCADA system.
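As a hedged illustration of the flow computer's role, Equations (2.3) and (2.5) can be sketched as follows. The operating values, compressibility factor and calorific value are illustrative placeholders; real installations derive Z from measured gas composition:

```python
# Illustrative sketch of the flow computer's corrections: Equation (2.3)
# (volume to base conditions) and Equation (2.5) (volume to energy).
# Operating values, Z and CV below are placeholders, not real site data.

T_B = 288.15     # base temperature in kelvin (15 degrees C)
P_B = 101_325.0  # base pressure in Pa

def volume_at_base(v: float, p: float, t_k: float, z: float,
                   z_b: float = 1.0) -> float:
    """Equation (2.3): correct a metered volume to base conditions."""
    return v * (p / P_B) * (T_B / t_k) * (z_b / z)

def billable_energy(v_b: float, cv: float) -> float:
    """Equation (2.5): energy (MJ) = base volume (Sm3) * CV (MJ/Sm3)."""
    return v_b * cv

# 1000 m3 metered at 70 bar, 10 C and Z = 0.85 (illustrative values)
v_base = volume_at_base(v=1000.0, p=7.0e6, t_k=283.15, z=0.85)
energy_mj = billable_energy(v_base, cv=39.5)
```

The high-pressure correction dominates: at 70 bar, each cubic metre at operating conditions corresponds to roughly 80 Sm³, which is why small relative errors in pressure or compressibility propagate into appreciable energy errors.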

Table 2.1: Relevant expanded uncertainties at the 95% confidence level in key measurement chain components

    Metering component        Maximum permissible uncertainty (%)
    Density computation       0.02
    Analogue converter        0.03
    Pressure transmitter      0.2
    Temperature transmitter   0.2
    Volume                    0.1
    CV                        0.25
    Energy                    1.1

2.3.2 Random Error

Measurement uncertainty at metering stations is regulated both by individual agreements between the asset owners and National Grid, and by the regulator (Ofgem). Whilst exceptions exist, instantaneous energy measurements are required to have a relative expanded uncertainty of no greater than 1.1% at both supply and demand side nodes, as referenced in [61]. Multiple standards apply to metering instrumentation used in such critical industrial capacities, such as ISO 5168, BS EN 1776 and ISO 6976; a complete list of governing standards can be found in [61]. Individual meter uncertainty for the UK grid is specified in [41]. However, exceptions do exist; for example, LDZ offtakes using older orifice plate or turbine metering systems usually have wider permitted uncertainty levels stated in their Supplemental (connection) Agreements, e.g. ±2.0% for volume measurement and ±4.3% for energy measurement. It is important to maintain accurate databases of these uncertainties to form the groundwork of a baseline model. Assurance schemes validate the operating state of metering sites on an annual basis, ensuring the contractual maximum uncertainty is not exceeded and carrying out necessary repairs as needed.

The maximum uncertainties at test conditions for the individual components of the measurement chain are laid out in Table 2.1. From a modelling standpoint, the final maximum contractual energy uncertainty of 1.1% remains the best guess of actual operating uncertainty. The true value is in practice unobservable by the transporter, due to the following factors:

• Metering assets are not always owned by the transporter, and thus access is limited;

• Likewise, technical information (asset age, manufacturer, etc.) about site instrumentation may be limited or not readily accessible;

• Uncertainty is not constant; indeed, degradation rates of metering equipment may result in non-linear uncertainty growth.

Now consider that the true daily flow for day t at an (input) site i is \nu_{i,t}, and that we make an observation via the meter as \hat{\nu}_{i,t} = f(\nu_{i,t}). We can model the described random error as a (proportional) normal error, which gives:

\hat{\nu}_{i,t} = (1 + \epsilon_i)\nu_{i,t}, \qquad \epsilon_i \sim N(\mu_i, \sigma_i^2), \quad (2.6)

where \mu_i, \sigma_i \in \mathbb{R} are constants corresponding to individual nodes i. If the meter has recently been calibrated, it seems appropriate to assume that \mu_i \approx 0. In fact, together these random metering errors can account for the majority of the UAG variance. If additional non-metered UAG components from Equation (1.1) are discounted, then by combining Equations (1.3), (1.4) and (2.6) we get:

\hat{U}_t - U_t = \sum_{i=1}^{n} \epsilon_i \nu_{i,t} - \sum_{i=n+1}^{n+m} \epsilon_i \nu_{i,t}, \quad (2.7)

where U_t and \hat{U}_t are the true and observed flows respectively. Furthermore, given that sensor polling frequencies are relatively high (most often between 5 ms and 5 min at gas metering sites) and that total throughput is split across a large number of sites, then provided no consistent systematic bias exists, by the Law of Large Numbers and the additive properties of the normal distribution the metered component of the aggregate daily UAG can be expected to follow

\epsilon_{\text{random}} \sim N(\bar{\mu}, \bar{\sigma}^2), \quad (2.8)

where \bar{\mu} \approx 0 and \bar{\sigma}^2 are both constants to be calculated. In fact, both can be approximated using a methodology such as throughput-based estimation, where a relative uncertainty is assigned to similar groups of nodes; this is further discussed in [2]. We will revisit methods of estimating this variance later in Section 2.3.6. UAG values exceeding this estimate by a large amount (e.g. \bar{\mu} + 3\bar{\sigma}) are unlikely to be ascribable to random fluctuations, and could signal that either a data system or systematic error is the underlying cause.

In some industrial applications involving input-output measured systems, Data Reconciliation (DR) has been used as a statistical technique that allows measurement noise resulting from random error to be minimised. It is applicable in systems where measurement

2 ˆrandom ∼ N(¯µ, σ¯ ), (2.8) whereµ ¯ ≈ 0 andσ ¯2 are both constants to be calculated. In fact, both can be approximated by using the methodology such as a throughput-based estimation, where a relative uncertainty is assigned to similar groups of nodes which is further discussed in [2]. We will revisit methods of estimating this variance later in Section 2.3.6. UAG values exceeding this estimate by a large amount (e.g.µ ¯ + 3¯σ) are unlikely to be ascribable to random fluctuations, and could signal that either a data system or systematic error may be the underlying cause. In some industrial applications involving input-output measured systems, Data Recon- ciliation (DR) has been used as a statistical technique that allows for measurement noise resulting from random error to be minimised. It is applicable in systems where measurement CHAPTER 2. SOURCES OF UNCERTAINTY 49 redundancy exists. It also requires additional equations governing conservation and balance of flows as this allows the formulation of an optimisation problem. We further investigate the possible use of DR in a transmission network, as well as further reviewing the applications in literature in Chapter4.

2.3.3 Systematic Error

Metering stations should, under ideal operating conditions, be free of systematic errors. However, they may accidentally be introduced in a number of ways. For instance, equipment may be misconfigured following a maintenance procedure, either mechanically or in software (e.g. incorrect constants). Meters may also operate outside their rated performance envelopes. For example, orifice plate meters have lower bounds for gas flow under which readings are inaccurate and, in extreme cases, not registered at all, so unless careful control is exercised when routing gas through the network such errors may occur. Further, something as simple as natural degradation may erode key components, resulting in unacceptable margins. All these cases may lead to a systematic error, but prudent maintenance schedules can minimise or eliminate most of these causes. The most commonly encountered systematic error is a constant scaling of the metered flow:

\hat{\nu}_i = c f(\nu_i) = c(1 + \epsilon_i)\nu_i, \qquad \epsilon_i \sim N(\mu_i, \sigma_i^2), \quad (2.9)

where c \geq 0. This would encompass under-reads and over-reads, although in the UK most of the recorded systematic errors are a result of under-reading, i.e. c < 1. In the literature, such errors are also referred to as gross errors, usually in the context of data reconciliation. Other possible forms of error include the following:

• Fixed Error (Offset): f(\nu_i) = \nu_i + \delta, \delta \in \mathbb{R}.

• Sensitivity: f(\nu_i) = \nu_i \mathbb{1}_{(\nu_i \in A_i)}, where A_i is the operational range of the meter i and \mathbb{1}_{(\nu_i \in A_i)} is the indicator function, taking the value 1 if \nu_i \in A_i and 0 otherwise. All meters have an operating range, and will be better suited to measuring certain types of flow. Operating the meter outside of this range may result in entirely null, or wildly inaccurate, readings.

Figure 2.3: Typical error curve against flow rate for an ultrasonic meter. Relative uncertainty is shown on the y-axis, and flow rate on the x-axis. Q_min and Q_max denote the minimum and maximum calibrated flow rates, Q_s the minimum registered flow, and Q_t a nominal flow one would expect through the meter. The dashed lines indicate the meter's quoted uncertainty limit, in this example ±1%.


• A mixture of the above: f(\nu_i) = c\nu_i(1 + \epsilon_i) + \delta.
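The error forms above can be sketched as simple transformations of a true flow \nu. All parameter values below are illustrative:

```python
# Hedged sketches of the systematic-error forms listed above, applied
# to a true flow nu. All parameter values are illustrative only.

def scaling_error(nu: float, c: float = 0.97) -> float:
    """Constant scaling (Equation 2.9, noise omitted): under-read when c < 1."""
    return c * nu

def offset_error(nu: float, delta: float = 5000.0) -> float:
    """Fixed offset: the observed flow is shifted by a constant delta."""
    return nu + delta

def sensitivity_error(nu: float, lo: float, hi: float) -> float:
    """Meter registers flow only inside its operating range [lo, hi]."""
    return nu if lo <= nu <= hi else 0.0

def mixed_error(nu: float, c: float = 0.97, eps: float = 0.0,
                delta: float = 5000.0) -> float:
    """Mixture of the above: c * nu * (1 + eps) + delta."""
    return c * nu * (1.0 + eps) + delta
```

Applied to a daily site flow, the scaling form shifts UAG by a roughly constant fraction of that site's throughput, whereas the sensitivity form produces intermittent step-like errors whenever flow drops below the meter's registered minimum.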

Systematic errors of sufficient magnitude can result in perceptible shifts in both the mean and variance of UAG. Furthermore, a deterministic trend can be introduced into U_t if the error and the flow through the metered site are large enough (resulting in UAG no longer being stationary). As previously mentioned, meter faults are not the only cause of systematic error. In particular, systematic errors may be correlated across metering chains through latent variables such as linepack (affecting pressure) or temperature. Another source of correlated systematic error might be low (or high) flow regimes relative to the calibration point; see Figure 2.3. Such systematic errors are addressed in the discussion below.

2.3.4 A Note on Error Cases

We will now consider three distinct categories of error. The underlying error-generating processes are fundamentally different in nature, and so it is useful to consider them separately when modelling errors. Instances of the different types are illustrated in Figure 2.4, where the red line indicates the corrected flow and UAG, whilst the black line indicates the pre-reconciled amounts. Node flow is on the left, and UAG on the right, demonstrating the impact of these errors. An error of each type is included; these types are elaborated upon below, assuming the observed flow \hat{\nu}_{i,t} is expressed as f(\nu_{i,t}) = c\nu_{i,t}(1 + \epsilon_{i,t}) + \delta.

c = 0, \delta = 0: Missing Data

For a multitude of reasons, for example an offline meter or a data outage, flow data may not be entered into the system at all, resulting in a null observed flow \hat{\nu}_{i,t} = 0. The subsequent errors may be large in magnitude relative to the UAG. Typically, such cases are the result of accounting processes. The first and last rows in Figure 2.4 illustrate an example of a missing data correction.

\nu_{i,t} = 0, \hat{\nu}_{i,t} = \delta \neq 0: Zero Out

Conversely, a value \delta \in \mathbb{R} can sometimes appear where no actual physical flow takes place: the opposite of missing data. These default values may be simulated or forecasted flows, which are automatically populated as substitutes for real flows in the database until real flows are received from metering stations. Whilst the scenario seems contrived, it will be shown that there are numerous instances of this occurring within the NG data storage system; it is likely similar practices are followed across the industry. Likewise, this also has the potential for significant impact on the UAG, as the error magnitudes are identical to those caused by missing data. Such errors may be impossible to spot through statistical methods if they are forecasted by an unknown process (conversely, values that are precisely equal to a known forecast can be automatically flagged as suspicious). The second row of Figure 2.4 depicts this case.

Standard correction

Flow errors which do not fall into the above two categories correspond to situations where c \neq 0, \nu_{i,t} \neq 0 and \hat{\nu}_{i,t} \neq 0. These cases will henceforth be referred to as standard corrections; an example can be seen in row 3 of Figure 2.4. Standard corrections are both the most common and the hardest to detect.
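A minimal sketch of how these three categories could be told apart, given a pre-reconciled (observed) flow and its corrected value. The labels and exact comparisons are our own illustrative choices, not National Grid's reconciliation logic:

```python
# Illustrative classifier for the three correction categories of
# Section 2.3.4, comparing an observed flow with its corrected value.
# The category names and comparison logic are our own sketch.

def classify_correction(observed: float, corrected: float) -> str:
    """Label a flow correction as one of the categories described above."""
    if observed == 0.0 and corrected != 0.0:
        return "missing data"          # c = 0: nothing was recorded
    if corrected == 0.0 and observed != 0.0:
        return "zero out"              # a default value where no flow occurred
    if observed != corrected:
        return "standard correction"   # rescaling, offset, etc.
    return "no correction"
```

Run over a history of (observed, corrected) pairs, such a rule gives a first-pass census of which error-generating process dominates a given node, before any statistical modelling is attempted.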

Figure 2.4: Examples of various random errors, and their impact on the UAG. The red line indicates the corrected flow and UAG, whilst the black line indicates the pre-reconciled amounts. Node flow is on the left-hand side, and UAG on the right. Panels (top to bottom): Blackbridge Power Station, Humbly Grove Storage, Grain Power Station and Coryton Power Station, each paired with the corresponding UAG series.

2.3.5 Uncertainty and Bias Drift

It must be noted that measurement uncertainty and bias are not constant in time. Through wear and tear, performance degradation can be expected, in terms of both increased bias and greater uncertainty. However, given the yearly assurance schedule, we will assume for the remainder of this work that these processes have a negligible effect. This is in part due to the difficulty of acquiring empirical evidence with which to model the degradation process, as equipment and instrumentation differ between sites, as do the features reported by assurance schemes.

2.3.6 Estimating Total Measurement Uncertainty

We now consider methods for estimating the measurement error, U_t^{\text{measurement}}. The paper by Arpino et al. (2014) [2] presents a model for the joint estimation of the standard error of the terms \epsilon_{\text{random}} and \epsilon_{\text{systematic}}, thereby providing a method to estimate the uncertainty of UAG. It must be pointed out that the paper presents this as a model for UAG, ignoring all other contributing elements; in reality, it is strictly a measurement error model.

Before proceeding, we define some notation: to avoid confusion with the UAG process U_t, we denote the standard uncertainty of a quantity x as u(x), its expanded uncertainty as u_e(x) and its relative uncertainty as u_r(x). Expanded uncertainty will assume an implied 95% coverage factor of 1.96. The equations in the next two subsections closely follow the paper, brought in line with our earlier notation and framework. In this section, \nu_i will denote the observed aggregate daily flow from the i-th node for an arbitrary day, and u(\nu_i) the uncertainty of the daily flow for the i-th node.

Law of Uncertainty Propagation

We begin by considering the measurement equation, which relates a measurand Y to N measured variables X_1, X_2, \ldots, X_N:

Y = f(X_1, X_2, \ldots, X_N).

An estimate of the measurand Y, denoted y, is obtained as a function of measured estimates x_1, x_2, \ldots, x_N of the quantities X_1, X_2, \ldots, X_N:

y = f(x_1, x_2, \ldots, x_N).

Taking the Taylor expansion of the above up to the second term gives the Law of Uncertainty Propagation [81], expressed below in terms of the estimated variance u^2(y):

u^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j} v(x_i, x_j), \quad (2.10)

where u(x_i) is the standard uncertainty associated with the value x_i, and v(x_i, x_j) is the estimated covariance associated with x_i and x_j. The combined standard uncertainty of y is obtained by taking the positive square root of (2.10). Note that the second term vanishes if X_1, X_2, \ldots, X_N are independent.
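As a numerical illustration of Equation (2.10) with independent inputs, the sketch below propagates uncertainty through an arbitrary measurement function using central-difference partial derivatives. The energy example (E = V_b C, as in Equation (2.5)) and its input uncertainties are illustrative:

```python
import math

# Numerical sketch of Equation (2.10) for independent inputs: combined
# standard uncertainty of y = f(x_1, ..., x_N) via central-difference
# partial derivatives. The energy example and its input uncertainties
# are illustrative.

def combined_uncertainty(f, x, u, h=1e-6):
    """Return sqrt(sum_i (df/dx_i)^2 u(x_i)^2), covariances assumed zero."""
    total = 0.0
    for i in range(len(x)):
        step = h * max(abs(x[i]), 1.0)
        xp, xm = list(x), list(x)
        xp[i] += step
        xm[i] -= step
        dfdx = (f(xp) - f(xm)) / (2.0 * step)  # central difference
        total += (dfdx * u[i]) ** 2
    return math.sqrt(total)

def energy(x):
    """E = Vb * C, as in Equation (2.5)."""
    return x[0] * x[1]

# Vb = 80,000 Sm3 +/- 80 (0.1%), C = 39.5 MJ/Sm3 +/- 0.099 (0.25%)
u_energy = combined_uncertainty(energy, [80_000.0, 39.5], [80.0, 0.099])
```

For a product such as E = V_b C, the relative uncertainties add in quadrature, so u_r(E) = \sqrt{u_r(V_b)^2 + u_r(C)^2}; the numerical result above reproduces this.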

Modelling \epsilon_{\text{random}}

We can apply Equation (2.10) to Equation (1.3) to obtain an uncertainty term u(U_t) for the UAG process. Let x_1, x_2, \ldots, x_N represent the measured estimates of the node flows \nu_1, \nu_2, \ldots, \nu_{n+m} on the demand and supply sides, together with the summed correction factors, which we mark as 'lesser terms'. Further assuming that errors are independent, we split up the summations according to supply (n nodes) and demand (m nodes) to express the combined standard uncertainty of UAG as:

u_e(U_t) = \sqrt{ \sum_{i=1}^{n} u_e^2(\nu_i) + \sum_{i=n+1}^{m+n} u_e^2(\nu_i) + u_e^2(\text{lesser terms}) } \quad (2.11)

In this case, the lesser terms include uncertainties stemming from linepack, OUG and, in the case of the NTS, CV Shrinkage. Arpino et al. argue that these are of much lower magnitude and can be ignored for the estimation. However, this assumption is untrue, as will be shown in Section 2.5, due to the magnitude of the LP term and its significantly greater uncertainty.

For a set of nodes (in this instance, supply side nodes indexed i \in [1, n]) with homogeneous relative uncertainty u_r(\nu_1), we can find their total relative uncertainty as follows, assuming independence of errors:

u_r(\nu_{1:n}) = \sqrt{ \frac{1}{\left( \sum_{i=1}^{n} \nu_i \right)^2} \sum_{i=1}^{n} \nu_i^2 u_r^2(\nu_1) }. \quad (2.12)

Next, Arpino et al. go on to assume the following simplifying hypotheses:

v u n u 1 X 2 2 ur(ν1:n) = t ν u (ν1). (2.12) (Pn ν )2 i r i=1 i i=1 Next, Arpino et al. go on to assume the following simplifying hypothesis: CHAPTER 2. SOURCES OF UNCERTAINTY 55

• Uncertainty is independent of flow. In practice, uncertainty varies with flow rate, although it is not possible to keep accurate track of the rate of such variation for each meter.

• Homogeneous uncertainty across groups, which could be split according to factors such as node type or meter type.

• Constant NG volume is measured across plants within the aggregation period (i.e. a day), along with constant uncertainty.

Whilst the reality is clearly different, these hypotheses allow us to express Equation (2.12) simply as follows:

u_r(\nu_{1:n}) \approx \frac{u_r(\nu_1)}{\sqrt{n}} \quad (2.13)

Note that, in this idealised scenario, uncertainty is reduced in proportion to the square root of the network size. For instance, a group of 200 nodes (roughly the total number in the NTS) with homogeneous uncertainty will have a total uncertainty approximately 1/14 that of a single node.
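Equation (2.13) is easy to verify numerically; the sketch below evaluates Equation (2.12) for 200 equal, illustrative flows and recovers the approximate 1/14 reduction quoted above:

```python
import math

# Quick numerical check of Equation (2.13): with n homogeneous,
# independent nodes carrying equal (illustrative) flows, the group
# relative uncertainty falls as 1/sqrt(n).

def group_relative_uncertainty(flows, u_r):
    """Equation (2.12) with a common relative uncertainty u_r."""
    total = sum(flows)
    return math.sqrt(sum((f * u_r) ** 2 for f in flows)) / total

u_single = 0.011                               # 1.1% per node
u_group = group_relative_uncertainty([1.0] * 200, u_single)
ratio = u_single / u_group                     # approximately sqrt(200)
```

With unequal flows, the reduction is weaker than 1/\sqrt{n}, since the largest nodes dominate both the numerator and the denominator of Equation (2.12).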

Modelling U_t^{\text{measurement}}

Next, [2] addresses the term \epsilon_{\text{systematic}}, the second component of U_t^{\text{measurement}}. As previously mentioned, systematic errors may be correlated through variables such as operating pressure, temperature, gas composition, or other common external factors. Such correlation amongst nodes results in a smaller uncertainty reduction, and therefore needs to be addressed. Correlation can be accounted for by again referring to the Law of Uncertainty Propagation; including the second term in Equation (2.12) yields:

u_r(\nu_{1:n}) = \sqrt{ \frac{1}{\left( \sum_{i=1}^{n} \nu_i \right)^2} \left( \sum_{i=1}^{n} \nu_i^2 u_r^2(\nu_1) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} r_{i,j} \nu_i \nu_j u_r(\nu_i) u_r(\nu_j) \right) }, \quad (2.14)

where r_{i,j} is the correlation coefficient between the i-th and j-th nodes. Equation (2.14) can again be simplified by applying the same hypotheses as above, and using an averaged correlation term for the group; this yields

u_r(\nu_{1:n}) \approx u_r(\nu_1) \sqrt{ \frac{1}{n} + \frac{(n-1)}{n} r }, \quad (2.15)

with r constant for the network. Note that u \propto \sqrt{1/n} when r = 0; uncertainty decreases with the square root of the reciprocal of the number of nodes in the network. Ideally, the r values would be experimentally calculated for each grid, as they vary with the different metering installation set-ups. Such experimental studies carried out at the scale of the NTS can be costly and time consuming. However, r can also be approximated from estimates provided in [2]; in a medium to large sized network such as NG, r \in [0.15, 0.2]. Furthermore, for the same network the paper gives an estimated relative total uncertainty of 0.78% p.a. A visualisation of the surface defined by the approximation (2.15), for varying r, n and u_r(\nu_1) (the 'uncertainty multiplier'), is shown in Figure 2.5. It is evident that u_r(\nu_{1:n}) declines sharply with the number of nodes, and follows a similar yet more gradual pattern with correlation. Indeed, as the number of nodes tends to infinity and the correlation tends to 0, the uncertainty multiplier also tends to 0.
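The approximation of Equation (2.15) can be evaluated directly; the helper below computes the multiplier applied to the single-node uncertainty, with the node count n and average correlation r as the only inputs (the values used are illustrative):

```python
import math

# Sketch of the approximation in Equation (2.15): the factor multiplying
# the single-node relative uncertainty u_r(nu_1), as a function of the
# node count n and the averaged correlation r. Values are illustrative.

def uncertainty_multiplier(n: int, r: float) -> float:
    """sqrt(1/n + r*(n-1)/n), the group factor of Equation (2.15)."""
    return math.sqrt(1.0 / n + r * (n - 1) / n)

# r = 0 recovers the 1/sqrt(n) reduction of Equation (2.13); for r > 0
# the multiplier floors at roughly sqrt(r) as n grows.
m_uncorrelated = uncertainty_multiplier(200, 0.0)   # about 0.071
m_correlated = uncertainty_multiplier(200, 0.15)    # about 0.39
```

This makes the practical message of Figure 2.5 explicit: once correlation is present, adding nodes cannot push the group uncertainty below roughly \sqrt{r} times the single-node uncertainty.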

Figure 2.5: System uncertainty surface as defined by Equation (2.15)

In practical terms, the most important factor is segmenting nodes into groups of homogeneous uncertainty. Usually, this can be achieved by grouping according to node type (terminal, LDZ, power station, etc.), or according to meter type: orifice plate (101), ultrasonic (64) or turbine (46). In the case of the NTS, it was only possible to segment according to supply and demand, using the regulatory maximum permissible uncertainty; assurance schedules typically report only a Pass/Fail for individual metering processes, not a current uncertainty.

Finally, we can combine the above to write down a confidence interval (CI) for U_t^{\text{measurement}}, in the absence of meter faults. For a grouping into only supply and demand, as seen in Equation (2.11), we have the following standard uncertainty, where u_r(\nu_2) denotes the homogeneous demand-side relative uncertainty:

u(U_t^{\text{measurement}}) \approx \sqrt{ \sum_{i=1}^{n} \left( \nu_i u_r(\nu_1) \sqrt{\frac{1}{n} + \frac{(n-1)}{n} r} \right)^2 + \sum_{i=n+1}^{m+n} \left( \nu_i u_r(\nu_2) \sqrt{\frac{1}{m} + \frac{(m-1)}{m} r} \right)^2 } \quad (2.16)

We can use this value both as an estimate of the standard deviation, if we wish to model the process as normally distributed, and to calculate confidence intervals using the expanded uncertainty. Hence,

CI = 0 \pm u(U_t^{\text{measurement}}) \alpha, \quad (2.17)

where \alpha is the coverage factor (1.96 in the case of 95% confidence intervals).
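Putting Equations (2.15)-(2.17) together, the following sketch computes the CI half-width for a two-group (supply/demand) network. The node counts, flows, the 1.1% expanded uncertainty and r = 0.15 are illustrative assumptions, not NTS data:

```python
import math

# Hedged sketch of Equations (2.16)-(2.17): a symmetric confidence
# interval for the daily metered UAG component, with nodes grouped into
# a supply and a demand side of homogeneous relative uncertainty.
# All flows and parameters below are illustrative.

def group_term(flows, u_r, r):
    """One summation of Equation (2.16) for a homogeneous group."""
    n = len(flows)
    mult = math.sqrt(1.0 / n + r * (n - 1) / n)   # Equation (2.15)
    return sum((f * u_r * mult) ** 2 for f in flows)

def uag_ci_half_width(supply, demand, u_r=0.011 / 1.96, r=0.15,
                      coverage=1.96):
    """Half-width of the CI of Equation (2.17) around zero."""
    u = math.sqrt(group_term(supply, u_r, r) + group_term(demand, u_r, r))
    return coverage * u

# 120 supply and 200 demand nodes with equal illustrative flows
half_width = uag_ci_half_width([1e7] * 120, [6e6] * 200)
```

A daily UAG observation falling outside 0 ± half_width would then be treated as unlikely to stem from random metering error alone, which is the basis of the numerical evaluation below.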

Numerical Evaluation

In this section, we aim to evaluate the UAG model according to Equation (2.17) by generating confidence intervals. Unfortunately, it is not possible to segment nodes within the NTS by uncertainty, as discussed previously; all nodes within the NTS have the same regulatory requirement of 1.1% relative expanded uncertainty at the 95% level. Using an r value of 0.15, we plotted the model's confidence intervals for the NTS UAG from 2014 to 2020. The results can be seen in Figures 2.6 and 2.7. In the latter case, we observe that the confidence interval remains roughly constant at around 0.9% of throughput. Visually, the following takeaways can be made:

• The model exhibits seasonality, as opposed to the observed UAG. We will cover the topic of seasonality in detail later on, but it is evident from the construction of the model that, being a function of demand, seasonality can be expected. This seasonality can be observed in two ways: through the obvious pattern in Figure 2.6, and through the straight lines in Figure 2.7.

• The term $\sqrt{\tfrac{1}{n} + \tfrac{(n-1)}{n}r}$ is approximately constant in time; this is due both to its magnitude in large networks and to the fact that greater demand results in roughly proportionally more active offtakes. In this case, at an r of 0.15, the uncertainty was ±0.0044.

• In general, UAG remains within the 95% confidence interval. Indeed, the number of values outside the intervals was 205/2191 = 9.36% for positive values and 1.28% for negative, a total of 10.62% outside the confidence intervals. This would suggest a positive bias; something further substantiated by a mean of 5,168,114, or 0.22% of throughput.

• Experimental studies are required to determine an accurate r value; however, the takeaways of conflicting seasonality and mean bias remain.

• There appear to be systematic periods of values outside the confidence intervals, most evident in late 2016 and 2019.

Furthermore, using the regulatory maximum should result in a worst-case model, and we would expect very few values to fall outside the 95% confidence intervals; this is not the case. In practice, we should assume meters perform considerably better than the maximum. The fit of these models is examined more closely in the next chapter. In conclusion, it is apparent that the model does not fully explain the behaviour of UAG within the NTS, but it does suggest that random and small-scale systematic meter error is highly likely to be within nominal bounds. We have integrated the above random error baseline model into the industrial real-time monitoring system, to be discussed in Chapter 6.

Simulating $U_t^{\text{measurement}}$

Simulation can allow for a more realistic estimation of UAG uncertainty, as it enables the individual node uncertainty to be modelled as a probability distribution, rather than a

Figure 2.6: UAG Uncertainty Model 1, r=0.15, 95% confidence


Figure 2.7: UAG Uncertainty Model 1, r=0.15, plotted as a percentage of system throughput, 95% confidence. Notice that the confidence interval is almost constant.


fixed value. This also allows for the modelling of more complex matters, such as uncertainty drift. We follow the Monte Carlo (MC) methodology described below. The results can then be compared with those of the model governed by Equation (2.16), hereafter referred to as Model 1.

1. Assign each node i a relative uncertainty $g_u(i)$

2. Assign each node i an error distribution $D_i$

3. Simulate an error term $e_i$ for each node by sampling from $D_i$

4. Calculate the simulated UAG, $U_t^*$, as:

U_t^* = \sum_{i=1}^{n} e_i - \sum_{i=n+1}^{n+m} e_i.

5. Repeat steps 1-4 b times, where b is large. Confidence intervals can be calculated through the standard formula

0 \pm t_{\nu=b-1}^{a=0.95}\, \sigma_{U_t^*}

where $t_{\nu=b-1}^{a=0.95}$ represents the Student-t 95% critical value with $\nu$ degrees of freedom, as the true $\sigma$ is unknown. As $\nu$ is high, the statistic is approximately normal.
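The five steps above can be sketched in a few lines. This is an illustrative re-implementation under assumed inputs (hypothetical node flows, and a single $g_u(i) = u_{max}$ for every node, in the style of Model 2 below), not the thesis code.

```python
import random
import statistics

def simulate_uag_sd(supply, demand, u_max, b=2000, seed=1):
    """Steps 1-5: draw an error e_i ~ N(0, u_max * flow_i) for every node,
    form U_t* = (sum of supply errors) - (sum of demand errors), repeat
    b times, and return the sample standard deviation of U_t*."""
    rng = random.Random(seed)
    samples = []
    for _ in range(b):
        e_sup = sum(rng.gauss(0, u_max * f) for f in supply)
        e_dem = sum(rng.gauss(0, u_max * f) for f in demand)
        samples.append(e_sup - e_dem)
    return statistics.stdev(samples)

# Hypothetical example; for large b the Student-t critical value is ~1.96,
# so the interval is approximately 0 ± 1.96 * sd.
sd = simulate_uag_sd([100] * 4, [200] * 2, u_max=0.011 / 1.96)
```

With uncorrelated errors the analytic standard deviation is $\sqrt{\sum_i (u_{max}\nu_i)^2}$, which the sampled value approaches as b grows.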

The above methodology is chiefly dependent on the choice of $g_u(i)$ and error distribution $D_i$. Some choices for $g_u(i)$ and $D_i$ are discussed below, in increasing order of complexity.

• Model 2: $g_u(i) = u_{max}$, $D_i \sim N(0, u_{max}\nu_{t,i})$, where $u_{max}$ is the maximum permissible uncertainty. In this case, uncertainty is an equal constant for all nodes. This is a worst-case model under the maximum regulatory uncertainty, and is provided as a reference, as it offers no advantages over simply using Model 1.

• Model 3: $g_u(i) \sim U(0, u_{max})$, $D_i \sim N(0, U(0, u_{max})\nu_{t,i})$. Here we assume uncertainties are randomly assigned to each node, distributed uniformly on $[0, u_{max}]$. This is a more realistic approach, although it introduces variance depending on the flow magnitude distribution.

• Model 4: $g_u(i) \sim U(0, u_{max})$, $D_i \sim MN(0, \Sigma)$, where $\Sigma$ is an n-by-n covariance matrix, with n being the number of nodes:

\Sigma = \begin{bmatrix} (\nu_{t,1}f_u(1))^2 & \nu_{t,1}\nu_{t,2}f_u(1)f_u(2)r_{1,2} & \cdots & \nu_{t,1}\nu_{t,n}f_u(1)f_u(n)r_{1,n} \\ \nu_{t,2}\nu_{t,1}f_u(2)f_u(1)r_{2,1} & (\nu_{t,2}f_u(2))^2 & \cdots & \nu_{t,2}\nu_{t,n}f_u(2)f_u(n)r_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \nu_{t,n}\nu_{t,1}f_u(n)f_u(1)r_{n,1} & \nu_{t,n}\nu_{t,2}f_u(n)f_u(2)r_{n,2} & \cdots & (\nu_{t,n}f_u(n))^2 \end{bmatrix}

and MN denotes the multivariate normal distribution. Correlation of errors is determined by the off-diagonal elements. This allows for greater flexibility in accounting for suspected correlation in errors. Factors that can be considered in assigning the daily $r_{i,j}$ could include, for example, local weather conditions, operating pressure or percentage of maximum installation throughput. In our case, we used the assumption that nodes in the same LDZ will share numerous characteristics; often, instrumentation will be common throughout an LDZ, as will operating pressure, local weather and pipeline specifications. Therefore, assigning a fixed value to the LDZ, in this case the experimentally derived value of 0.15 from [2], can be considered appropriate.
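To make the construction of $\Sigma$ concrete, the sketch below builds the Model 4 covariance matrix under the LDZ assumption just described ($r_{i,j} = 0.15$ when two nodes share an LDZ, 0 otherwise). The function and variable names are ours, not from the thesis.

```python
def model4_cov(flows, rel_u, ldz, r_intra=0.15):
    """Covariance matrix with (nu_i f_u(i))^2 on the diagonal and
    nu_i nu_j f_u(i) f_u(j) r_ij off it; r_ij is r_intra when nodes
    i and j share an LDZ, 0 otherwise."""
    n = len(flows)
    s = [f * u for f, u in zip(flows, rel_u)]  # per-node standard deviations
    return [[s[i] * s[j] * (1.0 if i == j else
                            (r_intra if ldz[i] == ldz[j] else 0.0))
             for j in range(n)] for i in range(n)]

# Hypothetical three-node example: the first two nodes share an LDZ.
cov = model4_cov([100.0, 200.0, 300.0], [0.01, 0.01, 0.01], ["A", "A", "B"])
```

Daily errors can then be drawn from $MN(0, \Sigma)$, for example via `numpy.random.Generator.multivariate_normal`, completing step 3 of the MC methodology.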

Meter Error Model Comparison

From Figure 2.8 we can draw conclusions as to the fit of the different models compared to the true UAG. We first note that Model 2 has in general the widest interval, and Model 3 the smallest, as expected. Model 4 aligns very closely with Model 1, especially when also setting r = 0.15 universally. However, Model 4 allows for easier control over both uncertainty and covariance across groups; indeed, much finer modelling of known correlations can be accounted for using the MC approach. We also note that the same seasonality is exhibited by the MC models. In conclusion, we have been able to verify Model 1 using a new MC approach, which allows for even finer control. Next, we consider other possible approaches to the allocation of covariances.

Covariance Allocation

More sophisticated allocations for $r_{i,j}$ between nodes i, j, where $i \neq j$, can be crafted; however, their accuracy will depend on undertaking experimental measurements individually across an array of operating conditions, and this may well be prohibitive when considered from a

Figure 2.8: 95% confidence intervals as suggested by Models 1-4. 7-day rolling averages are plotted in order to improve the visualisation and allow for comparison. Models are symmetric about the x-axis and thus only the positive half of the y-axis is shown.


cost-benefit viewpoint. A certain amount of guesstimation may be required. We present an example allocation function r(i, j, t) of the form:

r(i, j, t) = \alpha T^*(i, j, t) + \beta V^*(i, j, t) + \gamma K^*(i, j, t), \qquad (2.18)

where $\alpha, \beta, \gamma$ are dimensionless weighting constants. The functions $T^*$, $V^*$ and $K^*$ are bounded in the range [0, 1], and indicate the amount of error correlation resulting from meters jointly operating close to the limits of the temperature, volume and C.V. calibration ranges respectively. Such a model can be used for meters of common types. As all three functions can be expressed in a similar manner, we will only define $V^*$.

Let $v_i^{opt}$ be the meter calibration point for node i; performance degrades on either side, to the ends of the calibrated range $(v_i^{min}, v_i^{max})$. As a simplifying assumption, we take the uncertainty curve (Figure 2.3) to be symmetric on either side of the calibration point, with the limits also equidistant from said calibration point; i.e. $v_i^{opt} = v_i^{min} + 0.5(v_i^{max} - v_i^{min})$. Let $v_{i,t}$ represent the reading at time t for node i. Then we can define $v_{i,t}^*$ as a measure of systematic error bias:

v_{i,t}^* = \frac{|v_i^{opt} - v_{i,t}|}{|v_i^{opt} - v_i^{max}|} \qquad (2.19)

The above value is bounded on [0, 1]. Finally, we define $V^*(i, j, t)$:

V^*(i, j, t) = v_{i,t}^*\, v_{j,t}^*\, (1 - |v_{i,t}^* - v_{j,t}^*|) \qquad (2.20)

where the reasoning for the terms is twofold: $(1 - |v_{i,t}^* - v_{j,t}^*|)$ ensures that similar flow regimes result in higher correlation, whilst $v_{i,t}^* v_{j,t}^*$ accounts for the combined distances from the optimal operating point. Note that through the multiplicative property, (2.20) is bounded on [0, 1]. This is desirable, as we wish to bound our final covariance between 0 and a maximal value, $r_{max}$, through the constants $\alpha, \beta, \gamma$. The following equality must hold:

\alpha + \beta + \gamma = r_{max}. \qquad (2.21)
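A minimal sketch of Equations (2.18)-(2.21) follows; the calibration values are hypothetical, and only $V^*$ is computed, with $T^*$ and $K^*$ passed in as given.

```python
def v_star(v, v_opt, v_max):
    """Normalised distance from the calibration point, Equation (2.19)."""
    return abs(v_opt - v) / abs(v_opt - v_max)

def V_star(vi_star, vj_star):
    """Pairwise flow-regime similarity term, Equation (2.20)."""
    return vi_star * vj_star * (1.0 - abs(vi_star - vj_star))

def r_alloc(T_s, V_s, K_s, alpha, beta, gamma):
    """Equation (2.18); alpha + beta + gamma must equal r_max (2.21)."""
    return alpha * T_s + beta * V_s + gamma * K_s

# Hypothetical meters calibrated at 50 over the range (0, 100),
# with alpha = beta = gamma = 0.05 so that r_max = 0.15:
vi = v_star(80.0, 50.0, 100.0)  # 0.6 of the way to the range limit
vj = v_star(75.0, 50.0, 100.0)  # 0.5
r = r_alloc(1.0, V_star(vi, vj), 1.0, 0.05, 0.05, 0.05)
```

Since each of $T^*$, $V^*$, $K^*$ lies in [0, 1], the resulting r is bounded by $r_{max}$, as required.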

Meter error aside, literature such as [17] and [35] often fails to identify errors arising from calculated or modelled terms as a source of UAG, yet there is evidence to suggest a significant proportion of UAG can be attributed to them. We next move on to consider errors arising from calculating the non-metered constituents of Equation (1.4).

2.4 Attributed Measurements

An attributed reading is used as an approximation to the true value of a quantity at some point along a pipeline when metering at the desired location is impractical, impossible or simply not cost effective. In general, this may result in minimal error when the quantity measured has minimal variation between the measurement point and the point where said quantity is being estimated. In this section we discuss issues that may arise, identify where in the NTS this practice is used, and quantify the amount of error resulting from it. We begin by listing all quantities which use attributed readings across the NTS:

1. Pipe segment CV, used for the linepack calculation

2. Pipe segment compressibility, used for the linepack calculation

3. Offtake CV, at certain sites where such metering equipment is not present or is not operational. Typically, around 10 sites per day will not meter CV and will have to use attributed values, although this number fluctuates.

4. Offtake compressibility. Usually, these offtakes will be the same as those not metering CV, as a chromatogram can be used to measure both values. A dedicated density meter can be used to measure the specific gravity independently.

5. Compressor station CV: CV is not typically measured at compressor stations, again for cost reasons.

6. Compressor station compressibility, as above.

Next, we examine situations where the attributable site estimate may not be accurate, and discuss how this can be mitigated.

2.4.1 Failure Modes

An attributable site is chosen because it is assumed that the flow characteristics metered at that location will be a good approximation under normal operating conditions. An example of a valid attributable site would be a metering station downstream of a pipe segment, in a single pipeline with no junctions between the segment and metering station. Then, provided there is continuous flow across the segment and metering station, the approximation's only errors will be the time delay due to the transit time of the gas, and the unavoidable measurement error. Attributable sites typically have a backup, and will hence be referred to as the primary and secondary attributable site. The site making use of the approximation will be referred to as the target site. Below, we list cases where the primary or secondary site's readings may not be a valid approximation to the true values:

• Operational practicalities often result in sites being switched on and off. This can introduce a possible lag (up to 8 hours in the NTS) between a primary site stopping flow, and the target site's attributed values being updated to the secondary. [49] suggests that, because multiple segments will typically have the same primary and secondary, switching between them often creates large fluctuations in linepack, in cases where this switch is delayed and the former estimate is no longer representative of reality. Such practical issues can be overcome with adequate control software.

• In recent years, flow directions have not been constant in the NTS. As mentioned previously, the system was initially designed with a North-to-South flow in mind. Today, this is not the case for the entire network, with LNG gas being pumped further and further north. Due to such long-term flow changes, or more abrupt operational decisions, attribution mappings can become invalidated, as the flow of the primary site may simply not be representative of the target site. This is because the gas source at that site is fundamentally different due to changes in the directed network representing the grid. These issues can be overcome by implementing dynamic mappings, which are calculated in real time according to pipe flow directions. Dynamic mappings can also allow for flow-weighted average CVs to be calculated, in cases where a segment's source is a junction of two pipes with varying pressures and CVs.

• Finally, significant meter error at the primary site may result in an inaccurate estimate. This can cause a domino effect, as the error is then amplified in the UAG when it is subsequently used to calculate more than one energy value. The same measurement error mitigation strategies discussed throughout this work apply here. Ideally, sites which function as primary or secondary attributable sites should be inspected more frequently, due to this increased liability.

2.4.2 Error Estimation at Attributable Sites

The error resulting from the use of attributed sites is again hard to quantify without undertaking an extensive experimental study in real-life conditions. However, we can create two estimates of attribution error uncertainty, $A_1$ and $A_2$ respectively, depending on whether or not the attributable site is valid; that is to say, whether none of the aforementioned failure modes are present. For reference, Table 2.2 shows the mean CV, and mean daily standard deviation, for a series of offtakes selected for their placement on different and distant pipelines on the grid, being fed from different sources. It is evident that, overall, there is a relatively small amount of daily variation in the composition of natural gas across the network. The standard deviation of the mean estimates was 0.16, whilst the overall mean of daily standard deviations was 0.0782. Figure 2.9 depicts the daily CV fluctuation for the Kirkstead offtake, a typical single-feed system, giving an idea of the CV profile that can be seen throughout the day.

Figure 2.9: Daily Calorific Value, Kirkstead, 2019-01-08


Table 2.2: Mean daily CV, mean daily CV standard deviation for select offtakes, 2017

Offtake            Mean Daily CV   Mean Daily CV SD
Braishfield B      39.3            0.068
Careston           39.5            0.0864
Drointon           39.4            0.0837
Farningham A       39.2            0.0974
Farningham B       39.3            0.0473
Kirkstead          38.9            0.0549
Littleton Drew     39.1            0.108
Ross WM            39.3            0.0744
Roudham Heath      39.3            0.111
Rugby              39.3            0.0854
Silk Willoughby    39.1            0.0446

Errors Under Normal Conditions

In this section, we will develop an estimate for the additional uncertainty that is added onto an energy measurement at a node using an attributed CV reading. The following simplifying assumptions are made:

• No failure mode is present; that is to say, both the target and primary site are flowing, the underlying mappings are representative of the flow profile, and there is no significant measurement error at the target site outside of the calibrated range.

• The target site is directly downstream of the primary site.

• Gas moves continuously at a constant speed v throughout the pipe segment separating the target and primary site.

To estimate the uncertainty, we must analyse the differences between consecutive CV readings at individual nodes. Let $C_t$ and $C_{t+1}$ be CV readings taken at times t and t+1 respectively. Then

C_t - C_{t+1} \qquad (2.22)

is an estimate of the attribution error at a distance of $v\,\Delta t$. The same principle is extended below to arrive at an average value: for a set speed v expressed in metres per second, we can take differences at increasing time intervals to estimate the uncertainty at varying distances from the primary site. The standard error is then estimated as the standard deviation of the errors. Complicating matters is the fact that readings are not taken at precisely equal intervals, and therefore the methodology below was used to facilitate the calculation.

Let $Q_i$ be the set of measurement times, in seconds, for node i. Likewise, let $C_i$ be the set of CV readings, taken at times $Q_i$, for node i. Let nodes be indexed by $i \in [1, N]$, and time by $t \in [1, T]$. Let $\kappa_i(t)$ be a function that maps a measurement time $t \in Q_i$ to a CV reading $w \in C_i$, for a given node i. We then define the functions $S(i, t)$ and $\omega(i, t_1, t_2)$ as follows:

S(i, t) = \operatorname{argmin}_{q \in Q_i} (t - q) \quad \text{s.t.} \quad t - q > 0

The function S returns the latest time in $Q_i$ that precedes the input time t. This is required as readings are taken at discrete and unknown times, and thus the CV value at an exact time has to be estimated by the last known reading.

\omega(i, t_1, t_2) = \kappa_i(S(i, t_1)) - \kappa_i(S(i, t_2))

The function $\omega$ returns the difference in estimated CV readings for a pair of times, using the estimation provided by S.

1. Define a time step m in seconds, e.g. m = 60. This will determine the distance, d = mv, at which uncertainty is being estimated.

2. For each node $i \in [1, N]$, let $D_i = \{\omega(i, (t-1)m, tm) \mid t \in [2, 3, \ldots, T/m]\}$

3. Approximate the standard error as

A_1^d = \frac{1}{N}\sum_{i=1}^{N} \sigma(D_i),

where $\sigma$ is the sample standard deviation function and $A_1^d$ is the standard error for an attributable site d metres away. In our analysis, we found that the mean $D_i$ was consistently of magnitude $10^{-5}$ or lower, which given a typical CV is negligible; therefore, we can assume the estimate to be unbiased.
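The three steps, together with the S and $\omega$ lookups, can be sketched as follows. This is our illustrative implementation; the toy reading times and CVs are invented.

```python
from statistics import stdev

def last_reading(times, cvs, t):
    """kappa(S(i, t)): the most recent reading taken strictly before t."""
    best = None
    for q, w in zip(times, cvs):
        if q < t and (best is None or q > best[0]):
            best = (q, w)
    return None if best is None else best[1]

def attribution_std_error(nodes, m, T):
    """A_1^d: the average over nodes of the standard deviation of CV
    differences taken m seconds apart (steps 1-3). `nodes` maps a node
    name to a (times, cvs) pair of equal-length sequences."""
    sigmas = []
    for times, cvs in nodes.values():
        diffs = []
        for t in range(2, T // m + 1):
            a = last_reading(times, cvs, (t - 1) * m)
            b = last_reading(times, cvs, t * m)
            if a is not None and b is not None:
                diffs.append(a - b)
        if len(diffs) > 1:
            sigmas.append(stdev(diffs))
    return sum(sigmas) / len(sigmas)

# Toy example: one node, readings at irregular times over 240 seconds.
nodes = {"Kirkstead": ([0, 30, 90, 150, 210],
                       [39.0, 39.1, 39.0, 39.2, 39.1])}
a1_est = attribution_std_error(nodes, m=60, T=240)
```

The `last_reading` helper handles the irregular measurement times exactly as S does: each difference uses the last reading taken before the nominal time step.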

Applying the above procedure to a series of time intervals results in Figure 2.10. The figure depicts the estimated curve from the aforementioned sites, along with a fitted exponential model illustrated by the red line. The model is presented below:

A_1^x = \frac{V_x}{18.55}\left(-0.07260\,\exp(-0.01538x) + 0.08603\right), \qquad (2.23)

where $A_1^x$ is the standard error estimate for an attributable site x kilometres away from the primary, and $V_x$ is the average speed in metres per second. We note the model presents a very good fit to the data, with an RSS of 0.0001473. The only disparity can be seen at very low distances, namely under 5 kilometres; however, the error at such distances can be assumed to be 0. Indeed, the overall trend is that, provided the assumptions are valid, using attributed sites results in very little additional error. For reference, for a typical average CV of 39.6, if every pipe segment were 50 km away from an attributable site, the uncertainty penalty at 95% would only be $A_1^{50} = \pm 0.25\%$ for a point estimate. Figure 2.11 plots the same curve for the range of speeds at which natural gas typically transits through a transmission pipeline.
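Equation (2.23) is cheap to evaluate numerically; the sketch below (ours, using the coefficients quoted above) reproduces the 50 km worked example.

```python
import math

def a1_model(x_km, v_ms=18.55):
    """Standard error estimate of Equation (2.23) for an attributable
    site x_km kilometres from the primary, at gas speed v_ms (m/s)."""
    return (v_ms / 18.55) * (-0.07260 * math.exp(-0.01538 * x_km) + 0.08603)

# Expanded (95%) uncertainty at 50 km, as a percentage of a typical CV of 39.6:
penalty_pct = 1.96 * a1_model(50) / 39.6 * 100
```

At 50 km this comes out at roughly a quarter of a percent, in line with the ±0.25% figure quoted above, and the curve plateaus near 0.086 at large distances.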

Errors Under Fallacious Attribution

As previously discussed, there are a number of reasons why the assumptions behind the attributable site selection may become invalid. We will focus on the second failure mode discussed above, where due to operational practicalities the primary site ceases to be representative of the flow at the target site. In this instance, to calculate the standard uncertainty we simply have to assess the homogeneity of CV composition within the grid. Since the flows are assumed to be inherently different, the temporal alignment is less relevant, and therefore we only investigate the instantaneous case. Our choice of topologically distant nodes is relevant here, as this guarantees an 'error state'.

Figure 2.10: Mean standard error for the CV of an attributable site, NTS, based on the 11 nodes seen in Table 2.2, with 95% confidence interval, for an 18.55 m/s gas transit speed. The red line indicates the model fitted in Equation (2.23).


1. Define a time step m in seconds, e.g. m = 60. In this instance, m is arbitrary; the value chosen should, however, be slightly larger than the typical measurement frequency.

2. For each node i:

3. For each node j, $j \neq i$:

4. For each time step $t \in [1, T/m]$, let $G_{j,t} = \kappa_i(S(i, tm)) - \kappa_j(S(j, tm))$

5. Let $B_i = \sigma(G_{j,t})$

6. Estimate the standard error as

A_2^d = \frac{1}{N}\sum_{i=1}^{N} B_i
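A compact sketch of steps 1-6, assuming the $S(\cdot)$ time alignment has already been applied so that each node's readings share common time steps; the names and toy data are illustrative.

```python
from statistics import stdev

def a2_estimate(nodes_cv):
    """A_2^d: for each node i, B_i is the standard deviation of the
    instantaneous CV differences against every other node j; the
    final estimate is the mean of the B_i."""
    names = list(nodes_cv)
    B = []
    for i in names:
        diffs = [ci - cj
                 for j in names if j != i
                 for ci, cj in zip(nodes_cv[i], nodes_cv[j])]
        B.append(stdev(diffs))
    return sum(B) / len(B)

# Toy example with three nodes aligned on two time steps:
a2 = a2_estimate({"a": [39.0, 39.2], "b": [39.4, 39.6], "c": [38.8, 39.0]})
```

Unlike the $A_1$ procedure, the differences here are taken across nodes at the same instant, since under this failure mode the two flows are assumed unrelated.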

Applying the above methodology to the nodes in question yields an average standard error of $A_2^5 = 0.4536$, or, in perspective, typically 2.3% of the CV value. This estimate was calculated for a distance of 5 km, and should be invariant with respect to distance, considering

Figure 2.11: Mean standard error for the CV of an attributable site, NTS, against a range of gas transit speeds


flow should not be representative of the primary site at any point in time under the assumed failure mode. Nodes which use attributed CV can simply have the attributed uncertainty added onto the meter error component of their final energy measurement. This can be directly accounted for through the meter error models discussed above. An improvement to the attributed reading methodology would be the use of appropriately lagged measurements, depending on the distance and prevailing gas transit speed between the primary and target site. Further improvements could be achieved by interpolating between successive readings at the primary site to more precisely estimate the value at the target site. This would theoretically remove the aforementioned uncertainties entirely, under the idealised assumptions previously discussed. Finally, we note that all the above attribution errors relate to calorific value measurements, which are required only for energy values. Therefore, they would not be present in the UAG were it to be calculated for volume. Attributed CV plays an important role in the linepack calculation, which is the topic of the next section.

2.5 Linepack

The largest term in UAG which must be computed numerically instead of measured, in any pipeline system, is the linepack, as the grid is fundamentally a complex-shaped container with continuously flowing inputs and outputs at any given time. Moreover, temperature, pressure, CV and compressibility are not constant throughout the pipeline. [2] do not include a linepack uncertainty term in their UAG model, arguing it is negligible. However, as will be shown, this is erroneous on two counts:

• The mean absolute magnitude of DLP (as seen in Figure 1.5) is 20 GWh for the period 2015-2020. When compared against the empirical cumulative distribution of mean non-zero flow of nodes in the same period (see Figure 3.13), linepack is in the top 78%. This suggests that even if its uncertainty is as low as that of a metered node, it should still be included in a UAG model.

• Linepack is difficult to calculate in complex transmission grids, due to practical issues which will be discussed. Estimated values are often used for certain terms, which suggests that the uncertainty is probably much higher than that of a fully metered offtake.

Therefore, it is important to understand the linepack calculation methodology, and quan- tify the uncertainty. We begin with its calculation.

2.5.1 Calculation

In the case of the NTS, linepack is calculated by directly estimating the quantity of gas contained within the grid, by splitting the pipeline system at strategic points into smaller segments, where measurements for pressure (P) are available at either end; these are denoted $P_1$ and $P_2$, and likewise for temperature (T), compressibility (Z) and calorific value (C).

Free volume Vf for each segment is calculated as

V_f = \frac{\pi}{4} D^2 L, \qquad (2.24)

where D is the pipe diameter, and L is the segment length. A possible source of error, as identified in [67], is the use of inaccurate pipeline dimensions in the above equation.

The volume of gas within the pipe segment can then be calculated by combining (2.24) and (2.3):

V_b = V_f\, \frac{P_{avg}}{P_b}\, \frac{T_b}{T_{avg}}\, \frac{Z_b}{Z_{avg}}, \qquad (2.25)

where

P_{avg} = \frac{1}{2}(P_1 + P_2), \qquad (2.26)

T_{avg} = \frac{1}{2}(T_1 + T_2), \qquad (2.27)

Z_{avg} = \frac{1}{2}(Z_1 + Z_2). \qquad (2.28)

However, the gradient of each variable within the pipe is non-linear, and therefore the segment length L directly affects uncertainty, with lower values being better. The pressure estimate can be improved by using the formula from [55]:

P_{avg} = \frac{2}{3}\left(P_1 + P_2 - \frac{P_1 P_2}{P_1 + P_2}\right). \qquad (2.29)

  2 P1P2 Pavg = P1 + P2 − . (2.29) 3 P1 + P2 Finally, linepack needs to be calculated in energy and thus Equation (2.5) is applied again, using volume as per Equation (2.25).

2.5.2 Temperature

Specifically in the NTS, a fixed value of temperature is used for the entire system; that is to say, $T_{avg} = T^*$, with $T^*$ constant, which is clearly a source of bias and uncertainty. This approximation is made for operational reasons, and similar compromises can be expected in all large and complex control systems. In the case of the NTS, the above approximation is used because temperatures are metered at above-ground installations, where the sensors can be exposed to direct or indirect sunlight; this results in large energy fluctuations being observed in the control room, hindering the operator's capacity to react properly to system demands.

Non-isothermal behaviour

Within a given segment whose inlet and outlet temperatures are defined as $T_1$, $T_2$ respectively, and assuming no external influences between the two points such as compressors, the gas temperature will usually tend to stabilise just under the surrounding ambient soil temperature ($T_g$), due to the Joule-Thomson effect. Therefore, in segments where $T_1 \approx T_g$, using the respective soil temperature estimate for $T_1$, $T_2$ is a sound approximation. Of course, soil temperature does vary along the pipeline; however, as will be discussed, such errors are unlikely to be drastic at common transmission pipeline depths, particularly when considering a relatively small segment length L. However, in segments adjacent or close to compressor or input node outlets, the assumption that $T_1 = T_g$ is unlikely to hold. In the case of compressor stations, gas is heated to high temperatures and subsequently cooled to around 45-50°C, to prevent pipeline damage [55]. This is substantially higher than typical UK soil temperatures, and therefore in the described scenario soil temperature estimates cannot be used as a proxy. Likewise, the NTS temperature input range is typically 0-30°C, and thus temperature will ultimately depend on the source (e.g. subsea pipe vs LNG regasification). However, the temperature in this case will be known, as it is one of the variables always measured at metering stations. Compressor outlet temperature will also typically be either measured, or derivable from the operating parameters. Gas can also cool within the pipeline as it flows through devices causing pressure changes, e.g. regulators. An important consequence of the fixed temperature assumption is that there exists a direct causal relationship between OUG, DLP error and consequently UAG, as compressor operation increases the amount of gas at a temperature greater than the fixed value, thus increasing the amount of total daily error.
The rate at which gas converges to a point just below ambient temperature will again depend on multiple factors, including pipe diameter, pipe insulation, pipe thickness, gas composition and ground composition. Models for predicting temperatures of non-isothermal flow in pipelines exist, both in closed analytic form [22] and through DE-based modelling [94], [51]. Provided the inlet temperature is known, and this should be the case for both scenarios discussed above, the distance d at which $T_d \approx T_g$ can be calculated, where $T_d$ is the pipe temperature at a distance d from an arbitrary point. This method should be used to calculate

$T_1$, $T_2$ for all segments until the temperature is approximately equal to, or slightly lower than, the ambient soil temperature. For all other segments, ambient soil temperature can be used as an approximation to the gas temperature. Next we consider how this varies in the UK, and some models one can use to estimate it.

Soil temperature

Soil temperature up to a metre in depth experiences fluctuations due to the diurnal cycle. In general, the greater the depth, the less influence the earth's atmospheric conditions have. Beyond a depth of 9-12 metres, the annual weather cycle has no effect, and the temperature remains constant due to the high thermal inertia of soil [28]. This can further be observed from Table 2.3; the standard deviation of temperature at 10 cm is on average around 1.5 degrees higher than that at 100 cm. The wider variation can also be seen when comparing minimum and maximum values. However, the mean at 100 cm is consistently higher. Figure 2.12 also illustrates this. Pipelines are buried at typical depths of 1.1-2 metres, and therefore, whilst not exposed to diurnal fluctuations, they can be expected to exhibit seasonality. Models exist for estimating the temperature at given soil depths; for example, that of Kusuda [48], which only requires measurements at ground level. In our scenario, this may be the easiest to implement, as there are existing AGL measurements. However, this model is inaccurate up to about 1 metre in depth, which is close to the minimum pipe depth, and therefore cannot be recommended for the shallowest pipes. Another model [7] can extrapolate temperature at depth via a sinusoidal model, and is recommended for estimating temperature at lower depths. In practical terms, model selection is often limited by data availability. It must be pointed out that the soil above a transmission pipeline does not necessarily have the same qualities, most importantly thermal conductivity, as unperturbed soil; different fill materials may be used, and soil may be placed under different levels of compaction.
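To illustrate the sinusoidal depth-extrapolation idea, below is our sketch of a Kusuda-style model: an annual surface temperature wave damped and phase-lagged with depth. The parameter values (mean, amplitude, diffusivity, phase) are illustrative placeholders, not fitted UK values.

```python
import math

def soil_temp(z, day, T_mean=9.9, A_surf=7.0, alpha=0.05, day_min=35):
    """Soil temperature (deg C) at depth z (m) on a given day of year.
    alpha: soil thermal diffusivity (m^2/day); day_min: day of the
    annual surface minimum. The annual wave's amplitude decays as
    exp(-z/d) and its phase lags by z/d radians, with damping depth
    d = sqrt(2 * alpha / omega)."""
    omega = 2 * math.pi / 365
    d = math.sqrt(2 * alpha / omega)
    return (T_mean
            - A_surf * math.exp(-z / d)
            * math.cos(omega * (day - day_min) - z / d))
```

At a typical 1.1-2 m burial depth this reproduces the qualitative behaviour of Table 2.3: a damped, lagged seasonal cycle about the deep-soil mean.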

Improved methodology

To conclude, in the absence of temperature readings for each pipe segment, we suggest that the best approach is to calculate linepack through existing commercial pipeline simulation software, capable of transient-state non-isothermal modelling and factoring in the transmission grid's exact specifications and operating conditions. Such an analysis would come at a considerable computational cost; however, it is only required at 24h

Table 2.3: Select UK weather station soil data for 2017, from the MIDAS database. Subscripts denote soil depth in cm, for the indicated function: µ, σ, min, max denote the mean, standard deviation, minimum and maximum temperature respectively. idsrc indicates the MIDAS id.

idsrc   µ100   µ10    σ100   σ10    min100   min10   max100   max10   County
54      9.6    9.0    3.1    4.0    5.3      1.5     14.1     16.2    W.Isles
393     11.8   10.5   3.3    5.2    6.6      1.6     16.3     21.9    Lincolnshire
658     12.4   11.6   3.8    5.3    6.6      2       17.8     21.2    Herefordshire
723     13.3   11.8   4.1    5.4    6.1      1.5     19.4     22.9    Greater London
1066    11.2   10.7   3.1    5.1    7        0.9     15.2     18.6    Cumbria
1105    10.9   10.5   3.1    4.7    6.5      1       15       20.1    Lancashire
1352    11.5   10.2   2.6    4.4    7.2      1.1     15.3     20.7
19188   11.5   10.7   3.9    5.8    5.1      0.3     17.4     23.7
30690   10.5   8.8    3.3    4.6    5.4      0.8     15.2     19.2    Staffordshire

intervals. Alternatively, we suggest that the following methodology should provide a good approximation:

1. The distance from compressors and input points at which isothermal flow with respect to ambient ground temperatures occurs must be calculated, using models such as [22], [94], [51]. Segments within this distance must use those models for temperature estimates, in combination with known T1 values from inputs and compressors, and known T2 values from ambient ground temperature models.

2. For segments not included in the above, the ambient ground temperature at the pipe centre depth can be used to approximate pipe temperature when calculating T1 and T2. Depending on reading availability, models [48] or [7] can be used. Specifically in the UK, due to the availability of soil temperature data at various depths, the latter is recommended. Point estimates can be obtained using a distance-weighted average of the n closest metering stations.
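The distance-weighted point estimate in step 2 can be sketched as a simple inverse-distance-weighted average; the function name, coordinates and readings below are purely illustrative:

```python
import math

def idw_estimate(stations, point, n=3, power=2.0):
    """Inverse-distance-weighted average of the n closest station readings.

    stations : list of ((x, y), reading) tuples, coordinates in km
    point    : (x, y) location of the pipe segment
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    nearest = sorted(stations, key=lambda s: dist(s[0], point))[:n]
    # If the point coincides with a station, return its reading directly
    for pos, reading in nearest:
        if dist(pos, point) == 0.0:
            return reading
    weights = [1.0 / dist(pos, point) ** power for pos, _ in nearest]
    readings = [r for _, r in nearest]
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)
```

Squared-distance weighting (power=2) is a common default; the appropriate power and number of stations n would need to be chosen empirically against held-out soil probe data.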

Error estimates

It is difficult to quantify the current error present in the LP calculation arising from the fixed temperature assumption, for a number of reasons: compressor usage and flow directions change on a daily basis, so it is not feasible to retrospectively calculate non-isothermal temperature estimates. Moreover, accurate pipe depth measurements for the entirety of the NTS are required, as are pipeline pressures for all segments. The average yearly standard deviation at 100 cm depth across 131 stations was 3.35 °C in the UK, with a mean of 9.9 °C. This suggests potential for serious bias in the LP estimate, no matter what value is used as the fixed estimate. However, it must be noted that due to the relatively minor inter-day variation of soil temperatures, the effect of this error is reduced, as it is the difference between consecutive days that is of interest; therefore, the bias of both estimates is likely to be very similar.

Figure 2.12: 2017 mean daily 9am temperature across 131 MIDAS weather stations for varying soil depths (10 cm, 30 cm and 100 cm).
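The cancellation argument can be checked with a toy calculation: linepack is approximated as proportional to P/T (ideal gas, fixed volume and composition), and a constant temperature bias is applied to two consecutive days. All figures are invented for illustration; this is not the NTS linepack calculation:

```python
# Sketch: a fixed temperature bias largely cancels in day-over-day linepack
# differences. Linepack is taken as proportional to P/T; the pressures,
# temperatures and the +3 deg C bias are illustrative values only.

def linepack(pressure_bar, temp_c):
    return pressure_bar / (temp_c + 273.15)  # arbitrary proportional units

true_temps = [8.0, 8.3]    # consecutive-day ground temperatures (deg C)
pressures = [60.0, 58.0]   # average pipeline pressures (bar)

biased_temps = [t + 3.0 for t in true_temps]  # a shared constant bias

true_delta = linepack(pressures[1], true_temps[1]) - linepack(pressures[0], true_temps[0])
biased_delta = linepack(pressures[1], biased_temps[1]) - linepack(pressures[0], biased_temps[0])

# Error in the level estimate vs error in the day-over-day difference
abs_error_level = abs(linepack(pressures[0], biased_temps[0]) - linepack(pressures[0], true_temps[0]))
abs_error_delta = abs(biased_delta - true_delta)
print(abs_error_level, abs_error_delta)  # the delta error is far smaller
```

The bias shifts both days' linepack estimates by almost the same amount, so DLP, the quantity entering UAG, is affected far less than the linepack level itself.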

2.5.3 Multiphase Flow

The calculations in Section 2.5.1 are valid for single-phase flow, where the contents of the pipeline are assumed to be uniformly gaseous. Multiphase flow refers to flow where an amount of liquid has condensed and is flowing through the pipe simultaneously. This takes up free volume which would otherwise be available to the gas, therefore introducing another error. In practice, there will usually be an element of multiphase flow.

2.5.4 Linepack Uncertainty

We propose two procedures to be used in estimating linepack uncertainty whilst factoring in the use of attributable sites for CV, labelled PLU1 and PLU2 (Procedures for calculating Linepack Uncertainty). Compressibility uncertainty is not explicitly factored in, but as this is also an attributed value, the steps taken to account for CV attribution can simply be repeated in the case of compressibility uncertainty, following a similar data analysis and model fit. The first procedure is exact, and should be implemented in the SCADA system, as manual calculation is not realistic.

PLU1.1 For each segment, individually calculate A^x_1, using data from the primary site in the previous 24h interval, or alternatively the model specified in Equation (2.23).

PLU1.2 If it is known that the attributable site was not reliable, use the estimate A^x_2.

PLU1.3 Calculate the standard uncertainty of the energy value contained within each segment. This should also include the uncertainty in Equation (2.25), calculated by applying Equation (2.10). Note that instrument uncertainty must be added onto A^x_1 and A^x_2.

PLU1.4 Add the above uncertainties for each segment in quadrature. Thus, if we let u(si) represent the standard uncertainty for the ith segment, linepack standard uncertainty u(L) can be expressed as follows:

u(L) = √(u(s1)² + u(s2)² + … + u(sn)²).   (2.30)
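Equation (2.30) translates directly into code; the function name and example segment uncertainties below are illustrative:

```python
import math

def linepack_uncertainty(segment_uncertainties):
    """Combine per-segment standard uncertainties in quadrature (Equation 2.30)."""
    return math.sqrt(sum(u * u for u in segment_uncertainties))

# e.g. three segments with standard uncertainties of 3, 4 and 12 GWh
print(linepack_uncertainty([3.0, 4.0, 12.0]))  # -> 13.0
```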

Whilst the above methodology will give a more accurate estimate, the cost of its integration into an existing SCADA system where an implementation is not already present may outweigh the marginal value it can provide in terms of uncertainty quantification. Therefore, we also provide a more practical rough estimation methodology (PLU2), which can be implemented more easily and with minimal development cost, described below:

PLU2.1 Estimate what proportion, if any, of linepack is being fallaciously attributed a CV value. Let 0 ≤ p1, p2 ≤ 1 denote the proportions of normal and fallacious attribution respectively. In our case, we make the arbitrary choice of 0.8 and 0.2 respectively.

PLU2.2 Calculate an average attribution uncertainty as follows:

A = p1 A^x_1 + p2 A^x_2.   (2.31)

An average attribution distance must be used. In our case, we take this as 50 km, giving us A ≈ 0.66%. Here, we also approximate the relative uncertainty from the standard uncertainty by using an average CV value of 39.6, in order to enable us to add this value in the next step.

PLU2.3 Add the relative uncertainties of Equation (2.5) together, as the underlying terms are multiplied, assuming constant instrument uncertainty for all segments. In our case, this yields a relative uncertainty of ur(L) = 1.76%, using the values in Table 2.1 (1.0% for volume measurement, and an additional 0.1% of energy instrumentation error). Alternatively, standard rules of uncertainty propagation can be applied to Equations (2.24)-(2.29).
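The PLU2 arithmetic can be reproduced as follows. The attribution uncertainties a1 and a2 are hypothetical placeholders chosen so that their weighted average matches the A ≈ 0.66% quoted above for a 50 km attribution distance; in practice these would come from the fitted attribution models:

```python
# PLU2 worked through with the proportions and totals used in the text.
p1, p2 = 0.8, 0.2      # proportions of normal / fallacious attribution (PLU2.1)
a1, a2 = 0.45, 1.50    # relative attribution uncertainties (%), hypothetical

avg_attribution = p1 * a1 + p2 * a2   # Equation (2.31)

volume_u = 1.0             # volume measurement uncertainty (%), Table 2.1
energy_instrument_u = 0.1  # energy instrumentation error (%), Table 2.1

# PLU2.3: relative uncertainties of multiplied terms are summed
relative_lp_uncertainty = volume_u + energy_instrument_u + avg_attribution
print(round(avg_attribution, 2), round(relative_lp_uncertainty, 2))
```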

These findings contradict those of [49], where it was suggested that the majority of linepack error is due to CV uncertainty; our own calculations have shown this is not the case. Further, a similar investigation could be carried out in exactly the same fashion with compressibility data, to estimate the uncertainty of that term. Finally, it is worth pointing out that DLP is inherently a zero-sum process, as it represents changes in energy in a fixed-volume container; errors, and as such their influence, can therefore be mitigated by working with UAG values calculated at a higher aggregation level, e.g. monthly.

2.6 Non-integrated Energy

In practice, Equation (2.5) is integrated with respect to time throughout the day, in order to arrive at an accurate final energy value. This is because both terms vary throughout the day, as can be seen in Figures 1.4 and 2.9. These calculations are carried out automatically on-site, by the measuring equipment. However, situations may arise where this is not done due to operational difficulties. For example, when using an attributed site for energy, a simple average CV value will typically be used alongside the integrated daily volume, as calculations will be performed manually rather than within the SCADA system. Likewise, some small sites such as compressors will report only a totalised end-of-day volume figure; in this instance, integrated energy cannot be calculated. Using a non-integrated energy value fails to reflect fluctuations arising from the varying nature of both volumetric flow and CV, i.e. the difference between a simple and a weighted average. It is not known how large an influence this might have on UAG; however, if CV telemetry fails at a large site and this coincides with a large intra-day CV variation, the result might be a large daily UAG, which will not be rectifiable without making assumptions as to the true CV profile for that day. This type of error can be rectified completely by integrating Equation (2.5) with respect to time, as opposed to totalising volume and multiplying by an average CV.

Table 2.4: Meter vs data error statistics, 2011-2016

                Meter Error   Data Error
Mean            0.0068        -0.0169
Std. Dev        0.6154        5.0399
Sum             15.5782       -38.6230
Absolute Sum    353.7460      1997.5908
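The non-integration error can be illustrated with synthetic hourly profiles, comparing the integrated energy (summing CV × flow per hour) against the totalised volume multiplied by an average CV; the profiles are invented and deliberately correlate high flow with high CV:

```python
# Synthetic illustration of the non-integration error: daily energy computed
# by integrating CV x flow in hourly steps vs. totalised volume x average CV.
# Both profiles are invented for illustration only.

hours = range(24)
flow = [140.0 if 6 <= h < 18 else 60.0 for h in hours]   # m^3/h profile
cv = [40.5 if 6 <= h < 18 else 39.0 for h in hours]      # MJ/m^3 profile

integrated = sum(f * c for f, c in zip(flow, cv))        # MJ, weighted
non_integrated = sum(flow) * (sum(cv) / len(cv))         # MJ, simple average

print(integrated, non_integrated, integrated - non_integrated)
```

Because high flow coincides with high CV here, the non-integrated figure understates the true energy; with anti-correlated profiles it would overstate it.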

2.7 Accounting Errors

Data flow from the point of measurement to the final reporting cycle is not always fully automated, and may involve some element of human-supervised data transfer. Indeed, between 110,000 and 150,000 manual data inputs relating to flow values were performed in the financial year 2018-19. This creates an opportunity for mistakes to happen. The points at which such an error can be introduced into the system can be seen in Figure 2.1, indicated by the data input and transmission within the orange rectangles, and the subsequent database processing and manipulation processes. These kinds of errors are hard to quantify and are also irregular: for example, omitting a zero or leading digit when inputting data will result in an entirely different order of magnitude, whilst mistyping the last digit will result in a minuscule error. Furthermore, they are dependent on the complexity of each network, and the degree of automation. Missing data, and forecasted flows automatically inserted on non-flowing days, also present a unique form of accounting error. A fully digital accounting system should in theory be immune to such errors.

Figure 2.13 illustrates the magnitude of total daily fiscal errors, grouped by meter or data system error. We note that data system errors account for the vast majority of large magnitude events. From Table 2.4, we note that whilst there is only a small discrepancy in means, with meter errors being mostly positive compared to data errors, the latter have a significantly higher standard deviation, and account for a considerably larger amount of UAG, as seen from the absolute sum.

Accounting errors can prove to be both systematic and random; a misconfiguration of a database process will result in a systematic error until rectified, whilst a user mistyping a figure is a random event. The random component of accounting errors can be modelled as a Poisson process. From historical data, the rate was estimated to be 0.1756 per Gas Day, or 0.006595 per flowing node, with 95% confidence intervals of (0.1570, 0.1926) and (0.003212, 0.009978) respectively. The individual severity of errors is difficult to assess, as already discussed in Section 2.3.4. A more thorough statistical analysis is presented in the next section, due to richer underlying data. As mentioned above, such errors can be mitigated through investment in data infrastructure, ensuring all telemetry transfer is entirely automated from the point of metering to both control and fiscal data storage. Next, we will consider a special kind of accounting error in the NTS: errors resulting from delayed information imputation.

Figure 2.13: NTS billing errors throughout the period 2013-2018, grouped according to data system and meter errors, with 3 outliers removed (y-axis: Energy (kWh)).
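The Poisson occurrence model can be sketched using the per-Gas-Day rate estimated above (the helper function and derived quantities are illustrative):

```python
import math

# Accounting-error occurrence modelled as a Poisson process, using the rate
# estimated from historical data (0.1756 errors per Gas Day).
rate_per_day = 0.1756

p_at_least_one = 1.0 - math.exp(-rate_per_day)  # P(>= 1 error on a given day)
expected_per_year = rate_per_day * 365

def p_n_errors(n, lam=rate_per_day):
    """Poisson probability of exactly n accounting errors in one Gas Day."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

print(round(p_at_least_one, 3), round(expected_per_year, 1))
```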

2.8 Closeout Period

In the NTS, not all end-of-day energy values are based on telemetry; in some cases, final values are received from operators (for example, this is the case with supply terminals). In other cases, due to operational factors, the initial value is deemed to be inaccurate. Whatever the cause, flow data received at time t, in days, is subject to updates within a time window [t, t + w]; on the demand side w = 5, and on the supply side w = λ(t), where

λ(t) = 15 + Month(t),

and Month(t) is the number of days remaining until the end of the current month. This period is known as the closeout period. If discrepancies are made known beyond this date, the case is treated as a reconciliation and goes through a formal regulated process, which is inefficient and costly; such errors are then classified as the aforementioned accounting errors. Therefore, there is significant motivation on both sides to acquire the true values within [t, t + w]. Throughout the next sections, we will use j to denote the time lag in days at which the flow value for a node is finalised.

Table 2.5: Summary statistics by correction group. All energy values are in GWh. Mean and Sum are based on absolute values.

Group                 j̄      Mean    Sum       N     σ      Days   No. nodes
Missing Data          2.70   34.32   78323.05  2282  69.30  347    284
Standard Correction   3.02   0.31    1888.41   6191  1.80   622    178
Zero Out              3.11   3.83    436.38    114   11.45  98     43
All                   2.93   9.39    80647.84  8587  38.87  650    184
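The closeout windows above can be sketched as follows; interpreting Month(t) as the number of days from the flow day to the end of its calendar month is an assumption about the exact day-counting convention:

```python
import calendar
from datetime import date

def supply_closeout_window(t):
    """Supply-side closeout window w = lambda(t) = 15 + Month(t), where
    Month(t) is taken as the days remaining until the end of t's month."""
    days_in_month = calendar.monthrange(t.year, t.month)[1]
    return 15 + (days_in_month - t.day)

# The demand-side window is fixed at w = 5 days
print(supply_closeout_window(date(2019, 3, 1)))   # 15 + 30 = 45
print(supply_closeout_window(date(2019, 3, 31)))  # 15 + 0  = 15
```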

2.8.1 The Dataset

Once updated, the prior flow data is replaced and can no longer be obtained from the billing API (MIPI); hence, daily snapshots were collected at 23:00 GMT commencing on 15-02-2018 through to 15-02-2020, for a total of 730 days. Only demand-side nodes were monitored in this instance, for a total of 193 nodes. The focus of this section will primarily be the demand side. The MIPI data is presumed to be equivalent to billing data.

2.8.2 Analysis

Table 2.5 contains summary statistics for the 730 days in question. The table categorises errors into 3 groups depending on their nature, as discussed in Section 2.3.4. By far the most common type of correction was the standard correction (6191), followed by instances of missing data (2282) and a relatively small number of zeroing-out events (114). Moreover, as can be expected, instances of missing data resulted in errors of the greatest magnitude and standard deviation on average, followed by cases of zeroing out. This justifies the grouping levels applied, as such errors must be treated differently.

Figure 2.14 illustrates the error magnitudes by group, across the aforementioned time frame. Once again, it is evident that missing data is largely of a different magnitude to the other error types. Moreover, there is obvious seasonality; this is a result of the larger number of active nodes in summer, increasing the potential for error. We can also observe that the majority of standard corrections are small (< 10 GWh), as are cases of zeroing out.

Figure 2.14: Corrections by group (panels: Missing Data, Standard Correction, Zero Out; y-axis: Energy (GWh); period 2018-2020).

From Figure 2.15, it is evident that the overall trend is for corrections to decrease as j increases. However, we note that large corrections, and in particular missing data, are still input as late as j = 4. Whilst the original intention was to produce confidence intervals for UAG given varying levels of j, it is obvious that this would not be useful due to the magnitude of the errors. Therefore, it can only be concluded that in situations where errors of such magnitude persist, UAG analysis cannot take place before the end of the closeout period.

2.8.3 Mitigating Closeout Period Errors

The closeout period and its implications were one of the key investigative pathways specified in the project brief. Therefore, we discuss factors that can improve database completeness and accuracy prior to closeout.

Figure 2.15: Corrections by finalisation lag (panels: j = 2, 3, 4, 5; y-axis: Energy (GWh); period 2018-2020).

• Missing data checks: Automated checks can verify whether data fields pertaining to metering stations reporting data manually have been populated. A simple daily process would allow operators to follow up on missing data promptly, provided this information is available and easy to access.

• Zeroing out: The causes of such errors need to be investigated further, as this is an altogether atypical scenario. Data fields should never be populated in advance with simulated or forecasted flows, with the expectation that these will be corrected once data has been received.

• Statistical monitoring of forecasted vs actual flows: This can be extended to the monitoring of shipper nominations vs actual flows. Specifically, this can be used to identify cases of missing data easily, without having to create new models for node flow rates.

Methods for identifying simple corrections will be discussed later in Chapter 3.

2.9 Additional Factors

The following sources of error can be considered minor in comparison to the aforementioned, and in some cases are not applicable to transmission grids; yet they still warrant discussion.

Calorific Value Shrinkage

CV shrinkage is a fiscal requirement arising from the Gas (Calculation of Thermal Energy) Regulations 1996. Because LDZs are assigned a common daily CV for downstream billing purposes, individual nodes within each LDZ cannot be billed for energy exceeding the LDZ Flow Weighted Average CV (FWACV) by an absolute value greater than 1 MJ. This energy is compensated via the shrinkage term in the UAG. These situations are normally avoided by the control room. In cases where such an event does occur, a systematic, albeit compensated, error is introduced into the individual node flows, as the FWACV is used instead of the metered CV. Therefore, whilst CV shrinkage has no impact on UAG, it can affect individual node flows, which can in turn impact forecast accuracy and any systematic error detection process focusing on those flows. This highlights the importance of understanding all underlying modelling processes in detail. Fortunately, situations where the FWACV is used in the NTS are few, and as can be seen in Figure 1.5, the magnitude of CV shrinkage relative to other balancing terms is low. CV shrinkage can be classified as a form of calculated value error.

2.9.1 Non-technical Losses (NTL)

NTL encompasses losses arising from gas theft, which is carried out via either illegal connection or unauthorised meter manipulation, either through the metering process itself or its data storage and transmission function in the case of smart meters. This is not a factor in transmission grids, as the large volumes and high pressures involved make illegal connections prohibitively unwieldy. Likewise, meter theft would require a commercial fraud operation of unprecedented scale, so this potentiality can be discounted. However, this is an issue in some distribution grids [62], particularly in developing countries where enforcement and regulation may be more lax, and where the rate of introduction of smart meters is low. In distribution grids, theft will manifest itself as a systematic under-read if it is done at the point of metering, or will only be apparent through observation of a balancing quantity such as UAG if it is due to an illegal connection. We will consider the gas theft scenario, which is identical to a large systematic meter fault, in Chapter 4.

2.9.2 Conflicting Interests

The transporter's main motivation is to minimise transportation costs and therefore, by extension, to ensure the metering and accounting processes are as accurate as possible, so as to avoid absorbing UAG costs. Yet actors on either side of the NTS wish to either maximise or minimise the chargeable energy, within the legal framework; thus, situations where billing disagreements arise may lead to a conflict of interest. Disagreements may occur as a result of technical factors such as asynchronous timings on flow computers, meaning that slightly different energy totals for aggregate periods (daily) are received by the transporter and counterparty. Although the net results of such differences may be small, these conflicts of interest pose a risk of systematic error to the balancing process. The magnitude and frequency of the aforementioned errors are impossible to quantify without the relevant data. Moreover, since demand-side transactions are likely to be underestimated, and supply-side transactions are more likely to be overestimated, the combined effect doubles rather than cancels out the error. Such issues can be eliminated by, for example, using a mutually agreed-upon single-point meter and flow computer, and giving the transporter or regulator an arbitration role, as they have no financial motivation to misreport flows, as opposed to allowing the counterparty to settle billing arrangements.

2.9.3 Billing Cycle Discrepancies

A significant source of error in distribution networks arises from unsynchronised billing cycles for end users (both with the calendar billing period and with each other), along with estimated consumption. It is important to note this is not a factor in transmission grids, where the billing cycle is strictly the 5-to-5 Gas Day. However, it does affect the ability of DNOs (Distribution Network Operators) to check for errors against their own reported flows, and to calculate an accurate downstream UAG figure; this has implications for the transmission operator, as will be discussed in Section 5.1.1.

2.10 Analysis

The above errors have been identified by a systematic investigation into all parts of the metering and data processing ecosystem. Identifying error contributors and their respective magnitudes is a time-consuming and difficult process to carry out in large and complex systems; often, factors can easily go unnoticed and only be detected incidentally. Therefore, it is important that potential errors are thoroughly described, quantified and logged whenever any part of the system is modified, removed or expanded. This principle should be upheld from the start when building new systems, and would remove the need for retrospective studies like the present one. If these error contributors are not systematically documented in the large organisations operating such grids, the information eventually becomes highly decentralised and may even be lost. Maintaining a database of error contributors allows for the following key processes:

• The creation of baseline models built from the ground up: whilst not all factors can be accounted for, this is completely impossible if the underlying error generation processes are not understood, quantified and documented.

• Active management and monitoring of risks.

• Contingency planning in cases of abnormal behaviour: potential sources are known, and can be systematically re-evaluated.

• Scope for improvement: where errors can be minimised through the implementation of either better methodology or instrumentation, this could be an area of future investment. Having such quantitative data will allow decisions to be made on a sound financial basis.

We have considered the vast array of factors that contribute to UAG in the NTS, and in Table 2.6 we present a summary thereof, with an appraisal of the potential impact and modelling capacity. The most important factors for the NTS in terms of UAG impact were found to be linepack, non-integration of flows, accounting errors and the unavoidable measurement error. We have presented mitigation strategies, summarised in the table, for all factors. Leakage is the only instance where the effect was found to always have a numerically positive impact on UAG.

Table 2.6: UAG error sources in the NTS. Impact refers to the potential impact on UAG. M.Cap refers to the overall capacity of the error type to be modelled, and accounted for in a composite model. ± indicates whether the error has a strictly positive, negative or mixed impact on UAG. The error sources listed are Theft, Meter Error, Non-Integration, Accounting Error, Attributable Sites, Conflict of Interest, Leakage (Operational and Single Point) and Linepack Temperature; the corresponding mitigations include Integrating Energy, Pipeline Inspection, Immediate Investigation, Maintenance Procedures, Modelling and Assurance Schedules, Regulatory/Legal Intervention, Removal of Potential for Conflict, Pipeline Simulation and Temperature Probes, Dynamic Mappings and Flow Weighted Streams, and Digital Infrastructure with Data Input/Output Cross Checks.

2.11 Conclusion

Specifically in the cases of attributed measurements and linepack calculation, we have proposed methodologies for calculating their uncertainty. It has been suggested that calculating UAG in terms of volume may remove a large amount of uncertainty from the calculation. Existing meter error models for gas transmission systems have been examined, and we have proposed our own methodology based on Monte Carlo simulation, which offers greater flexibility. We have also proposed methods for allocating covariance estimates between individual nodes. We have explored temperature variation within the pipeline and soil, and have concluded that the use of a constant fixed value for grid temperature is not a good alternative to using metered within-pipe values.

It has been shown that throughout the closeout period, UAG cannot be accurately calculated, and therefore action should not be taken until values are finalised. It is important to note that in general, steps should be taken to alleviate the source of error where possible, rather than to compensate through a wider composite baseline. In some instances, careful study may conclude that the additional uncertainty is acceptable given the cost savings, as is the case for certain attributable sites, where uncertainty is low (< 1%). The next chapter will combine several of the factors contributing to UAG identified here, and produce a composite baseline, which will then be compared against a variety of statistical models, alongside the pure meter error models.

Chapter 3

Daily Baseline

A baseline for UAG quantifies the expected amount of variation resulting from unavoidable measurement and estimation error. Values beyond this baseline can be interpreted as suggesting the presence of a large error, which indicates that a non-zero number of the components in Equation (1.4) contain an error significantly outside their uncertainty range. The baseline used by National Grid at the time of writing to determine whether a daily UAG value, Ut, is abnormal and warrants investigation is a fixed constant with a value of ±20 GWh. This legacy value represents a 'best-by-test' approximation of a reasonable prediction interval for the true value of UAG, and has been settled upon based on the results of historic investigations. In effect, the baseline functions as a decision interval:

Dt = [dt⁻, dt⁺],   (3.1)

as action is taken when

Ut ≤ dt⁻ or Ut ≥ dt⁺.

The action triggers a series of data verifications and checks to be carried out. Therefore, by considering historic values of UAG, and combining the results of Chapter 2, we will now attempt to create robust and theoretically sound intervals for D. However, we will also investigate whether UAG exceeding this interval is sufficient for us to infer the presence of significant errors in billing. We first begin with a discussion of the appropriateness of different measures of UAG, followed by a statistical analysis of UAG values and daily system energy flows. Subsequently, we will attempt to model UAG using both univariate and multivariate techniques, so that we can determine appropriate decision intervals. This modelling approach will then be verified against historical data.
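As a minimal sketch, the legacy decision rule described above amounts to the following check; the function name and sample values are illustrative:

```python
def is_abnormal(uag_gwh, lower=-20.0, upper=20.0):
    """Flag a daily UAG value falling outside the decision interval
    Dt = [dt-, dt+]; the +/-20 GWh defaults mirror the legacy
    fixed baseline described above."""
    return uag_gwh <= lower or uag_gwh >= upper

daily_uag = [3.2, -18.9, 25.4, -21.0, 7.7]  # invented daily values (GWh)
flagged = [u for u in daily_uag if is_abnormal(u)]
print(flagged)  # -> [25.4, -21.0]
```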

3.1 Measures of UAG

Reporting standards for UAG calculation, frequency, and level of aggregation are not uniform amongst the international community of gas transmission and distribution operators. It is often entirely up to the transmission grid operator to decide upon such standards, and to set up Key Performance Indicators (KPIs) accordingly. Likewise, regulatory bodies may use equally arbitrary forms of UAG when setting targets. Therefore, it is important to explore the frequently used standards, and discuss their pros and cons.

3.1.1 Expressing UAG

In addition to aggregate energy or volume, UAG is also sometimes expressed as a percentage of the total energy transported across the system over a given time period, also known as the throughput Tt^put. The definition of this varies; for our calculations, we define it as:

Tt^put = Σ_{i=1}^{n} νi,t − (λt − λt−1) − ct,   (3.2)

where the terms are as defined in Section 1.4: λt and ct denote LP and OUG respectively. We can then express percentage UAG:

Ut^p = Ut / Tt^put.   (3.3)

The motivation for this is that, naturally, one expects the magnitude of UAG to be directly related to total throughput; using the percentage should therefore transform the UAG to be demand-invariant, as seen in Figure 2.7. The presence or absence of seasonality will be investigated in depth in a later section. Using the term Ut^p also has the advantage of allowing comparisons across different transmission networks. This form of UAG can be a useful metric when expanding or modifying the network; whilst the UAG in energy terms can be expected to undergo distributional change, if similar metering equipment is used then theoretically Ut^p should remain constant. Moreover, this allows historical and present-day cross-grid comparisons to be made, with a view to monitoring performance improvement or degradation. It must be noted, however, that if the UAG was not seasonal prior to this transformation, a periodic pattern will be introduced, and vice versa.

Absolute values of both Ut and Ut^p are also useful, especially so in the context of aggregation. Furthermore, a baseline specified as a percentage of throughput can remain constant in time, and is also meaningful without context, as opposed to a value in energy. Indeed, according to [16], 'The best benchmark would seem to come from tracking an individual utility's LAUF gas percentage over time'.
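Equations (3.2) and (3.3) can be sketched as follows, with illustrative GWh figures rather than NTS data:

```python
def throughput(inputs, lp_today, lp_yesterday, oug):
    """Total throughput Tt^put = sum of inputs - delta linepack - OUG (Eq 3.2)."""
    return sum(inputs) - (lp_today - lp_yesterday) - oug

def percentage_uag(uag, inputs, lp_today, lp_yesterday, oug):
    """Percentage UAG Ut^p = Ut / Tt^put (Eq 3.3), returned in percent."""
    return 100.0 * uag / throughput(inputs, lp_today, lp_yesterday, oug)

# Illustrative GWh figures for one gas day (invented, not NTS data)
print(percentage_uag(uag=4.0, inputs=[900.0, 600.0, 500.0],
                     lp_today=4805.0, lp_yesterday=4800.0, oug=15.0))
```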

3.1.2 Aggregation Frequency

Thus far in this thesis, UAG has been examined over day-long intervals. Theoretically, UAG can even be thought of in continuous time. However, in practical terms it is often the case that discretisations finer than daily granularity are impossible due to system complexity, and the multilayer interactions between different data infrastructures and accounting processes. On the other hand, it is trivial to aggregate UAG into intervals spanning multiple days; but whether this is appropriate, and whether the interpretations stemming from such quantities should be used in decision making, are questions which do not have a clear answer. In business practice, quantities are often aggregated into standard calendar intervals, i.e. weekly, monthly, quarterly, yearly (n = 7, 31, 91, 365), according to some aggregation function f([U1, U2, ..., Un]). KPIs are often set based on these aggregations rather than on daily values. This is also the case for UAG analysis at NG. Aggregating at higher levels reduces noise, making it easier to interpret data. However, at very high levels of aggregation, important trends and features may be obfuscated. That is why it is important to consider data at a range of aggregation levels prior to decision making. Throughout the rest of this thesis, we will only consider daily data. However, the baseline models presented herein can be extended to provide estimates at frequencies of n > 1.

3.1.3 Aggregation Function

We begin our discussion by considering a simple summation across arbitrary time intervals. Of greatest consequence is the fact that errors 'cancel out' under simple summation, which results in a much lower average UAG, as the process is centred close to 0. It is evident that such a feature is not appropriate, as the financial implications of errors to shippers are independent of one another; in other words, a systematic over-read in one meter might not be offset by an under-read in another. However, one particularly useful result of taking a summation is that the effect of DLP, and therefore of any systematic linepack estimation error, is negated in the long term. In single pipelines, it can be appropriate to set targets by simple sums, as there are no financial implications to error cancellation. For similar reasons, taking a mean,

(1/n) Σ_{i=1}^{n} Ui,

is also inappropriate when assessing target performance, as it allows independent errors to cancel out. The absolute summation of UAG prevents positive and negative errors from cancelling out, and is calculated as:

Σ_{i=1}^{n} |Ui|.

Therefore, a more useful mean to take is the mean of absolute values:

(1/n) Σ_{i=1}^{n} |Ui|.

The root mean square is another useful statistic that can help when comparing trends across intervals, e.g. weekly. It provides the benefit of negating the sign, and gives greater weight to large values. Its calculation is:

√( (1/n) Σ_{i=1}^{n} Ui² ).
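The aggregation functions above can be implemented directly; the sample week of daily UAG values is invented to show how errors cancel under simple summation but not under the absolute variants:

```python
import math

# The four aggregation functions discussed above, applied to a window of
# daily UAG values.

def agg_sum(u):      return sum(u)
def agg_abs_sum(u):  return sum(abs(x) for x in u)
def agg_abs_mean(u): return sum(abs(x) for x in u) / len(u)
def agg_rms(u):      return math.sqrt(sum(x * x for x in u) / len(u))

week = [5.0, -4.0, 3.0, -6.0, 2.0, -1.0, 1.0]  # invented daily UAG (GWh)
print(agg_sum(week))       # errors cancel: 0.0
print(agg_abs_sum(week))   # 22.0
print(agg_abs_mean(week))
print(agg_rms(week))
```

Note that the RMS is always at least as large as the absolute mean, reflecting the extra weight it gives to large values.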

Figure 3.1 illustrates UAG under the various aggregation functions at a monthly frequency, which provides a good compromise between removing noise and not obfuscating inter-year trends. The first plot illustrates the disadvantages of relying solely on a simple summation. Following late 2016, the simple sum would suggest that there was a marked decrease in UAG, lasting until 2019. However, observing the absolute sum reveals there was no apparent change. Indeed, during early 2018 the simple sum is close to 0 - taken alone, this would suggest accounting efficiency is excellent, which, considering the absolute sum was around 400 GWh over the same period, would be a wholly misleading conclusion. When simple and absolute summations are expressed as a percentage of throughput, the exact same pattern is evident. Interestingly, there is no visible structural change in

Figure 3.1: Monthly UAG, plotted using the various aggregation functions discussed in Section 3.1.3

[Figure 3.1 comprises three panels over 2015-2020: simple vs absolute sum in GWh; simple vs absolute sum as a percentage of throughput; and mean vs absolute mean.]

either aggregation when considering percentage throughput, as compared to those performed on pure energy flows. The same pattern is also reflected when taking the simple and absolute means of UAG, seen in the bottom plot of Figure 3.1. In conclusion, the simple summation of UAG is not indicative of accounting efficiency and efficacy. A focus on minimising this value for a given aggregation frequency can lead to operators ignoring evidence suggestive of significant issues, or indeed finding such cancellation beneficial to their goals. Therefore, KPIs should not be based on this metric. However, monitoring it may highlight a bias in the accounting and measurement process, as can be seen in Figure 3.1, where there is a clear positive bias in the period 2015-2017, and from mid 2018 onward. We recommend that a metric based on absolute values, as a percentage of throughput during the aggregation period in question, be used when setting UAG targets.

3.2 NTS and UAG Statistical Analysis

In this section, we will examine some key statistical features exhibited by both UAG and daily node energy values within the NTS, for the years 2012-2020. Data is publicly available as far back as 2007; however, the earlier data includes significant systematic errors prior to 2012, which will be examined in Chapter 4. Throughout the 2012-2020 period, no significant changes or extensions to the network have been carried out. We will also make some comparisons to international transmission grids where appropriate. The R statistical programming language [70] was used to carry out the analysis below, and for all further computational work in this thesis.

3.2.1 UAG

The daily UAG time series for 2012-2019 has previously been depicted in Figure 1.5, with summary statistics in Table 1.3. Figure 3.2 contains boxplots of daily values by year; it is evident that the overall trend has remained fairly constant, with a minor dip in 2017. The number of outliers appears to have decreased in recent years, compared to 2015-2016, and the IQR does not fluctuate by large amounts either. Finally, the UAG mean has been consistently positive, mostly between 5 and 15 GWh, or between roughly 0.062% and 0.221% of throughput.

Figure 3.2: UAG Boxplots, years 2014-2019, 7 outliers removed

Non-zero mean

A discussion of the fact that UAG has a consistently positive mean is warranted, as this is a strong deviation from the theoretical mean of 0. Under ideal conditions, the mean UAG can be expected to oscillate between positive and negative. Intuitively, the consistently positive mean suggests some element of systematic bias. Such a hypothesis is further evidenced by the aggregate sum being typically positive in Figure 3.1. It has been speculated in [2] that this could be due to higher flow rates resulting in out-of-regime flow, outside the calibration range of meters - or, conversely, low flow rates during summer. This has been accounted for in the modelling section below. An alternative explanation may be found in any number of the remaining sources of uncertainty in Chapter 2, as bias could be present at any point. It is highly unlikely that a single meter error is causing this feature, due to strict assurance schedules. Component emissions, as discussed in Section 2.2, are estimated at a fraction 0.0001 of throughput - this corresponds to a mean of 255,156 kWh, with a standard deviation of 68,061 kWh. As discussed previously, venting gas accounts for 0.0018% of throughput. Taking a ballpark estimate of 0.3 GWh average total daily emissions, the error this would account for is still an order of magnitude lower than the typical daily mean of U_t, and thus does not explain the consistent positive bias. The fact remains that no concrete explanation for this significant positive mean can be proffered, even in the case of National Grid, where extensive investigations have taken place. Indeed, whilst internationally the mean yearly UAG is typically positive, it is not unheard of for it to be negative; it is likely that such relatively small biases are inherent to the complexity of large transmission grids, with the proximal cause in each instance specific to each grid.
As an example of the wide range of UAG values observed internationally, Figure 3.3 plots percentage UAG against throughput for all US states and the District of Columbia, for the year 2017. The mean UAG was 0.48%, with a standard deviation of 1.77%, highlighting the high variation. We note no visible relationship between throughput and percentage UAG in this case; the Pearson correlation coefficient is 0.11. Considering the above in combination with Table 1.3, we can again conclude that the NTS's performance is on par with, if not better than, the international norm.

Figure 3.3: Scatter plot of US yearly UAG percentage by state, against system throughput. Note that UAG is here measured in volume rather than energy, in millions of cubic feet. Data sourced from Natural Gas Annual [85]


Stationarity

Theoretically, under idealised operating conditions U_t should be a stationary process, with no trend or seasonal component. The Augmented Dickey-Fuller (ADF) test checks for the presence of a unit root, and hence non-stationarity; in the case of U_t, each individual year, in addition to the dataset as a whole, yields a p-value of < 0.01, giving strong evidence that UAG is a stationary process. This is to be expected. However, it is not always the case in grids around the world, especially in new grids with developing infrastructure. An example can be seen in Figure 3.4, which depicts the monthly aggregate UAG under simple summation for the Greek transmission network. Although a direct comparison is not possible, as only monthly summations are available, there has clearly been a substantial trend over the past years; indeed, the ADF test yields a p-value of 0.4954. A long-standing trend is indicative of ongoing system modification, poor UAG controls or malfunctioning equipment; it is therefore important to establish the presence of a unit root early in an analysis. For the NTS, we can conclude that it is not necessary to apply transformations such as differencing to attain a stationary time series.

Figure 3.4: Greek transmission network monthly aggregate UAG in GWh


Autocorrelation

UAG should also not be expected to exhibit any serial correlation, and the presence of significant autocorrelation can indicate a systematic error in the terms comprising U_t. Therefore, we examine the autocorrelation and partial autocorrelation of UAG - plots of each statistic with 95% significance intervals can be seen in Figure 3.5, for both U_t and U_t^p. All plots indicate that there is a small negative correlation at the first lag. The overall pattern exhibited by U_t and U_t^p is almost identical, in terms of both autocorrelation and partial autocorrelation - suggesting that the correlation structure is preserved when scaling by throughput. No clear geometric or decay pattern can be observed; the remaining significant lags on both types of plot are only borderline significant, and do not occur at identifiable seasonal cycles. Therefore, we can conclude that UAG is free of significant autocorrelation, with the exception of lag 1, which is slightly negative.

Figure 3.5: Autocorrelation and partial autocorrelation of UAG and percentage UAG, 2014-2018, with 95% significance intervals

Normality

By far the largest component of UAG is U_t^measurement, as discussed in Chapter 2. The random component of U_t^measurement is normally distributed, although with parameters that vary from day to day. Therefore, whilst U_t is not theoretically normal, it is worth investigating whether it is approximately normal over longer time periods, and examining the nature of its distributional shape. We tested the UAG dataset for normality, both as a whole and in 1-year intervals. The results can be seen in Table 3.1, where the Shapiro-Wilk [79] test for normality was used, as justified by the weak autocorrelation observed in the previous section. In this case, the null hypothesis H0 is that the time series is normally distributed; p-values under the 0.05 significance level therefore indicate that the data is not normal. The dataset as a whole had a p-value of 0.0667, giving no evidence to reject normality, although by a narrow margin. However, this is contradicted by Table 3.1, where for the majority of years the data was non-normal. Figure 3.6 examines the shape of the overall UAG distribution, compared to a normal curve with parameters estimated from the data, for both U_t and U_t^p. In both cases, we can see that whilst UAG roughly has the familiar bell-shaped curve, it has a steeper peak than the normal distribution. Indeed, U_t has an excess kurtosis of 1.59, indicative of a leptokurtic distribution. The same analysis was repeated using U_t^p, and the results were almost identical. Finally, we examined the behaviour of the distributional tails more closely, with the aid of normalised quantile-quantile plots, seen in Figure 3.7. We can see evidence of heavy tails, especially on the right-hand side of both plots, indicating more extreme values in the data than would be expected under a normal distribution. This, combined with the above results, provides strong evidence that UAG is non-normal.

Table 3.1: Shapiro-Wilk normality test for UAG by year, 2012-2020

Year     2012    2013    2014    2015    2016    2017    2018    2019   2020
p-value  0.0672  0.0916  <0.001  <0.001  <0.001  <0.001  0.0072  0.092  0.00425
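The per-interval testing procedure can be sketched as follows (illustrative Python via scipy; the year labels and distributions are synthetic stand-ins, not NTS data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
samples = {
    2018: rng.normal(8.0, 10.0, 365),           # Gaussian year
    2019: rng.standard_t(df=3, size=365) * 10,  # heavy-tailed (leptokurtic) year
}

# Shapiro-Wilk: H0 is normality, so p < 0.05 rejects normality for that year
pvals = {year: stats.shapiro(x).pvalue for year, x in samples.items()}
excess_kurtosis = {year: stats.kurtosis(x) for year, x in samples.items()}
```

The heavy-tailed year would typically fail the test and show positive excess kurtosis, the pattern observed for most years of NTS data.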

Seasonality

We further verify whether seasonality has an effect on the series, via visual inspection and a few statistical checks. These checks were performed both on the monthly aggregate summed series, to better capture any seasonal effect, and on the daily series. Aggregating by month will

Figure 3.6: Density plot of U_t and U_t^p, 2012-2018. The normal distribution is overlaid in red on both graphs. The higher peak is evident, as is the non-zero mean.

eliminate the potential noise contribution of DLP present in the daily series, as DLP is a zero-sum process. From Figure 3.8, which depicts the same monthly simple and absolute summations previously discussed, but plotted as year versus month, we cannot observe any clear seasonal pattern. Next, an ETS (Error, Trend, Seasonal) [37] model was fitted to the daily data, following the methodology of Hyndman [38]; the optimal model identified via Akaike information criterion minimisation did not include an additive seasonal component. Furthermore, when monthly dummy variables were regressed against the daily data, they were found not to be significant. Therefore, we can conclude that the UK grid does not exhibit detectable seasonality in trend, at either the monthly or daily aggregation level. It must be noted that the lack of seasonality is surprising. It is logical to expect either a mean or variance change between seasons, especially given the amount of variation in flow rates; this can be seen in Figure 3.9, where we plot the total demand (green), a single offtake's Composite Weather Variable (CWV - red) and the corresponding UAG (blue). We see large peaks in winter in both the red and green lines, but the blue line shows hardly any difference at all. The lack of a seasonal trend is a positive indicator with respect to the accuracy and precision of the metering and accounting process; seasonality in errors could be indicative of serious underlying meter or model bias, exacerbated during periods of high flow. If such seasonality in trend does exist, the underlying variation is sufficient to mask its effects on UAG.

Figure 3.7: Normal quantile-quantile plots of U_t and U_t^p, 2012-2018. Points should theoretically lie on the red line if normally distributed. We can see evidence of heavy tails, especially on the right-hand side of both plots, indicating more extreme values in the data than would be expected.

In the next sections, we will investigate whether a statistical relationship exists between UAG and several variables which are speculated to be potential predictors. This is an important investigative process, and as such should be carried out prior to creating a baseline.

3.2.2 UAG Predictors

Itemised below are potential features which may serve as explanatory variables for UAG, when used in a multivariate model such as linear regression. They have been identified either through the discovery of causative relationships between them and UAG, as per the discussions in Chapter 2, or because a relationship is speculated to be highly likely.

• DLP: This term is known to contain large errors due to temperature estimation, and attributable CV & compressibility to a lesser extent. We also consider lagged values

Figure 3.8: Monthly UAG and absolute UAG totals, year vs month, 2013-2020. First row is simple summation, second row is absolute sum.


of this term, as the error is conditional on past DLP estimates. Alternatively, λ_t and λ_{t-1} (linepack values on either side of the Gas Day) could be used.

• Aggregate demand curve: Captures the general seasonal trend.

• Dummy seasonality variables for day and month: Although no evidence of seasonality has been found, these can potentially allow finer daily/monthly effects to be captured, and serve as an additional diagnostic check.

• CWV: Composite Weather Variable, which is a function of the temperature and wind speed at t and t-1, thereby providing a comprehensive weather predictor on a national or per-LDZ basis. This variable also includes a transformation to linearise the demand response - for a detailed calculation, see Appendix B. System performance can be expected to differ according to ambient and equipment temperature, and environmental conditions where instrumentation is exposed to the diurnal and weather cycles -

Figure 3.9: UAG plotted with yearly aggregate demand and North LDZ’s CWV.


perhaps resulting in a synchronous systematic error.

• OUG: OUG is known to have errors due to CV & compressibility attribution, and non-integration of energy.

• UAG: Lagged values may increase predictive power, although there is little evidence to suggest a strong autocorrelation effect; this is therefore again included as a diagnostic check.

• Dummy holiday variable: Demand can differ vastly across UK bank holidays, which may result in atypical operating conditions, so these are specifically accounted for via a dummy variable.

Next, we move on to some calculated predictors.

Flow-weighted meter utilisation

The metering accuracy and bias are not constant across all certified flow rates in a metering system. We therefore propose that a representation of a meter's utilisation, as a percentage of its maximum certified flow, could partially explain UAG. Indeed, this is postulated as a primary cause of UAG in [2]. We would like to extend this concept to the multivariate case by taking a Flow-Weighted average of the Meter Utilisation (FWMU). However, working with daily data poses a problem: we are not able to calculate the flow-weighted average from the recorded aggregate flows. Such an operation would be data intensive, and would have to be performed within the SCADA system. Since this data is not available, the daily aggregate flow rate was used as an approximation, with true meter maximums estimated using recorded historic maximums. This approach may warrant further exploration in the future. Below we describe the steps currently used to calculate the approximate flow-weighted meter utilisation for a group of nodes. Note that in this instance, we calculated separate estimates for the supply and demand sides, as not only are these on opposite sides of the balancing equation, but significant differences in instrumentation, tolerances and volumes exist between large-scale input points and the generally smaller exit points. We begin by performing a min-max normalisation for each node i across the desired time frame, [1, T], which maps each flow time series to the range 0 to 1. This step serves to estimate the meter utilisation at each time step t, and is calculated as:

$$\hat{\nu}'_{i,t} = \frac{\hat{\nu}_{i,t} - \min_\tau \hat{\nu}_{i,1:T}}{\max_\tau \hat{\nu}_{i,1:T} - \min_\tau \hat{\nu}_{i,1:T}}. \tag{3.4}$$

We then calculate a daily flow-weighted mean utilisation separately for the supply and demand sides:

$$\mathrm{FWMU}^s_t = \frac{\sum_{i=1}^{n} w^s_{i,t}\,\hat{\nu}'_{i,t}}{\sum_{i=1}^{n} w^s_{i,t}}, \qquad \mathrm{FWMU}^d_t = \frac{\sum_{i=n+1}^{n+m} w^d_{i,t}\,\hat{\nu}'_{i,t}}{\sum_{i=n+1}^{n+m} w^d_{i,t}}, \tag{3.5}$$

where the weights $w^s_{i,t}$ and $w^d_{i,t}$ are calculated as:

$$w^s_{i,t} = \frac{\hat{\nu}'_{i,t}}{\sum_{i=1}^{n} \hat{\nu}'_{i,t}}, \qquad w^d_{i,t} = \frac{\hat{\nu}'_{i,t}}{\sum_{i=n+1}^{n+m} \hat{\nu}'_{i,t}}, \tag{3.6}$$

for the supply and demand sides respectively. We can then calculate the FWMU bias between the supply and demand sides as

$$\mathrm{FWMU}^{\mathrm{bias}}_t = \mathrm{FWMU}^s_t - \mathrm{FWMU}^d_t. \tag{3.7}$$

Of course, the FWMU can also be calculated as a single figure for the entire network:

$$\mathrm{FWMU}^{\mathrm{total}}_t = \frac{\sum_{i=1}^{n+m} w^q_{i,t}\,\hat{\nu}'_{i,t}}{\sum_{i=1}^{n+m} w^q_{i,t}}, \quad \text{with} \quad w^q_{i,t} = \frac{\hat{\nu}'_{i,t}}{\sum_{i=1}^{n+m} \hat{\nu}'_{i,t}}. \tag{3.8}$$

If we make the additional assumption, as we did in Section 2.3.6, that measurement accuracy degrades closer to the extremes of the certified measurement range, we can modify Equation (3.4) to:

$$\hat{\nu}'_{i,t} = \left( \frac{\hat{\nu}_{i,t} - \min_\tau \hat{\nu}_{i,1:T}}{\max_\tau \hat{\nu}_{i,1:T} - \min_\tau \hat{\nu}_{i,1:T}} - \alpha \right)^2, \tag{3.9}$$

where α is the mean optimal measurement accuracy as a proportion of the measurement range - we take α = 0.5. This penalizes extreme values to a greater extent on either side of the flow range, with higher values indicating greater uncertainty and potential for high UAG. We then denote the FWMU metrics calculated using Equation (3.9) as FWMU*.
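The steps in Equations (3.4)-(3.7) can be sketched in a few lines (illustrative Python with random flows; the node counts and values are hypothetical, and the thesis computation itself was carried out in R):

```python
import numpy as np

def minmax_norm(flows):
    """Eq. (3.4): per-node min-max normalisation over the window [1, T].

    flows: array of shape (T, nodes) holding daily aggregate flows."""
    lo, hi = flows.min(axis=0), flows.max(axis=0)
    return (flows - lo) / (hi - lo)

def fwmu(util):
    """Eqs. (3.5)-(3.6): daily flow-weighted mean utilisation for one side,
    with weights proportional to the normalised utilisations themselves."""
    w = util / util.sum(axis=1, keepdims=True)      # eq. (3.6)
    return (w * util).sum(axis=1) / w.sum(axis=1)   # eq. (3.5)

rng = np.random.default_rng(4)
supply = minmax_norm(rng.uniform(10, 100, (30, 5)))  # 30 days, 5 supply nodes
demand = minmax_norm(rng.uniform(1, 20, (30, 8)))    # 30 days, 8 demand nodes

fwmu_bias = fwmu(supply) - fwmu(demand)              # eq. (3.7)
```

Since the normalised utilisations lie in [0, 1], each daily FWMU is a weighted average in [0, 1] and the bias lies in [-1, 1].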

Meter Type

Three types of meters are used throughout the NTS: ultrasonic, orifice plate and turbine. Let h = 1, 2, 3 indicate the meter type, in the order listed above. These meter types are covered by the ISO 17089, ISO 5167 and ISO 9951 standards respectively. The first two account for the majority of nodes and flow in the network. However, due to the variability of demand and seasonality, the proportion of flow passing through each meter type is subject to a high amount of fluctuation, resulting in possible out-of-regime flow, or flow close to the certified range extremes. Each meter type will in general behave differently at the extremes compared to the other types, but in a similar manner within-group. We calculated two sets of predictors to estimate a potential group effect arising from the use of different meter types. Let daily node energy flow readings be indexed by their

meter type, as ν̂^h_{i,t}. Further, let n_h correspond to the number of nodes in each category of meters. Then, we define the first set of predictors as the proportion of flow by meter type:

$$P^h_t = \frac{\sum_{i=1}^{n_h} \hat{\nu}^h_{i,t}}{\sum_{i=1}^{n+m} \hat{\nu}_{i,t}}. \tag{3.10}$$

The second set of predictors extends FWMU as per Equations (3.5) and (3.6), but instead

$$\mathrm{FWMU}^h_t = \frac{\sum_{i=1}^{n_h} w^h_{i,t}\,\hat{\nu}'^h_{i,t}}{\sum_{i=1}^{n_h} w^h_{i,t}}, \quad \text{with} \quad w^h_{i,t} = \frac{\hat{\nu}'^h_{i,t}}{\sum_{i=1}^{n_h} \hat{\nu}'^h_{i,t}}. \tag{3.11}$$

Finally, FWMU^{h*}_t represents FWMU* calculated according to the meter grouping above. Figure 3.10 illustrates the time series of the predictors discussed above, for the period 2013 to mid-2019 in the NTS. We begin the discussion with the top row, where FWMU^bias_t (red) and FWMU^total_t (blue) are depicted. It is apparent that the yearly seasonality typically seen in demand curves is present in FWMU^total_t, which is to be expected, as more gas flows through the system overall. FWMU^bias_t oscillates around 0 without any obvious seasonal effect, suggesting the variation is due to operational practicalities rather than demand; the bias is usually within ±10%. We note that when including the supply, demand, total or bias FWMU in a statistical model, care must be taken to avoid linear dependence. Looking at the proportion of network throughput by meter type, we can observe that the vast majority of flow is accounted for by orifice plate (red) and ultrasonic meters (blue), with the two series showing mirrored seasonality, and orifice plates flowing at a higher rate on average during the winter months. Turbine meters (green) account for only a small proportion (< 10%) of flow. From the third row, where FWMU^h_t is depicted, we can once again see the familiar seasonal pattern, reflected in all meter types; there does not appear to be significant deviation across groups following 2015. When examining FWMU^{h*}_t we get a clearer picture of the extreme ranges, and we note that ultrasonic meters (blue) more frequently operate closer to their limits. A clear seasonal pattern is less apparent here.
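Equation (3.10) amounts to a grouped sum over nodes; a minimal sketch (illustrative Python, with a hypothetical meter-type assignment per node):

```python
import numpy as np

rng = np.random.default_rng(5)
flows = rng.uniform(5, 50, (30, 10))  # 30 days, 10 nodes (synthetic)
# Hypothetical meter-type labels per node: 0=ultrasonic, 1=orifice plate, 2=turbine
meter_type = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])

# Eq. (3.10): proportion of total network flow passing through each meter type
total = flows.sum(axis=1, keepdims=True)
P = np.stack([flows[:, meter_type == h].sum(axis=1) for h in range(3)],
             axis=1) / total
```

By construction the per-day proportions are non-negative and sum to one across meter types.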

3.2.3 Exploratory Regression

The use of multivariate OLS regression is of a diagnostic nature. We will apply it to further investigate whether or not the predictors in Section 3.2.2 hold significant explanatory power

with respect to UAG. The model takes the standard form: for a set of predictors x^1_t, x^2_t, ..., x^p_t of size p, the response variable U_t is expressed as a linear combination of the predictors scaled

Figure 3.10: FWMU for S/D bias and total throughput; proportion of network throughput by meter; FWMU by meter type; and FWMU* by meter type. 2014-2020, NTS.

by a set of coefficients, $\beta_1, \dots, \beta_p$:

$$U_t = \beta_1 x^1_t + \beta_2 x^2_t + \cdots + \beta_p x^p_t + \varepsilon_t, \tag{3.12}$$

with the error term normally distributed, $\varepsilon_t \sim N(0, \sigma^2)$. Three models were fitted:

1. Model Reg1: The first set of prediction variables specified in Section 3.2.2, with U_t as the response. No seasonal variables were used. The results can be seen in Table 3.2.

2. Model Reg2: A model where the response variable was U_t^put, with OUG and DLP also expressed as a percentage of throughput. The remaining variables were normalised between 0 and 1. This model is shown in Table 3.3.

3. Model Reg3: Model Reg1 with the addition of seasonal dummy variables representing the week day, the month, and whether or not the day is a UK bank holiday. The full output is not shown due to the number of predictors; an overall summary can be seen in Table 3.4.

The immediate impression from Table 3.4 is that, when considering the entire dataset, only around 17% of the variance can be explained by regression. This is very low, implying the predictors have little explanatory power over U_t, although all F-statistics were significant, suggesting the models do improve upon a model with no independent variables. Importantly, we have further evidence pointing toward seasonality not being present in U_t, as the seasonal model Reg3 failed to attain a higher adjusted R2 than the non-seasonal one, despite having 73 fewer degrees of freedom. Overall, there was little difference between the models in terms of explanatory power, suggesting U_t is close to random and therefore free of significant systematic error. However, upon closer inspection of Reg1 and Reg2 in Tables 3.2 and 3.3, we can observe that the terms DLP and lagged DLP, along with opening and closing linepack, were significant across all models and all had the same (negative) sign, as did OUG. Moreover, the fact that opening and closing linepack (calculated in volume) are significant in addition to DLP in energy suggests that an error exists both in energy and in volume. This backs up the evidence from Chapter 2 that a causative relationship does exist.

To better investigate the above statement, a rolling regression will be performed in the next section.

Table 3.2: Multivariate regression model, 2015-2020, for the predictors in Section 3.2.2. Estimates, standard errors and significance levels are shown, in addition to model fit statistics.

Regressor               β Estimate      β Std. Error    Significance
OUG                     -1.43           0.12            ***
DLP                     -0.13           0.01            ***
Closing Linepack, mcm   -22717.03       8117.00         ***
Opening Linepack, mcm   -18498.69       7550.44         **
Orifice Plate FWMU      12933411.00     19813378.00
Turbine FWMU            -486687.30      4663787.00
Ultrasonic FWMU         3637674.00      12249648.00
Orifice Plate FWMU*     130014360.00    46779147.00     ***
Turbine FWMU*           -170518.80      26589831.00
Ultrasonic FWMU*        -52492880.00    21682473.00     **
Supply                  -16064552.00    33035047.00
S/D Bias                17929265.00     17151729.00     **
Global CWV              330901.10       162026.90
Throughput              0.003           0.001           ***
DLP_{t-1}               0.026           0.001           ***
U_{t-1}                 -0.02           0.02
Constant Term           6779262.00      6708836.00

Observations            1,789
R2                      0.18
Adjusted R2             0.18
Residual Std. Error     10513550 (df = 1772)
F-Statistic             24.683*** (df = 16; 1772)
Significance Key        *p<0.1; **p<0.05; ***p<0.01

Rolling regression

From the previous section, we can conclude that there is weak to no evidence supporting a constant linear effect over the course of 2015-2020. However, there is also evidence to suggest certain variables, in particular DLP and its related predictors, alongside OUG, are statistically significant. To explore this further, and account for the possibility of transient interactions between the above, we employ rolling regression, with window sizes of 100 and 200 days across the same time frame. Smaller windows may allow short-term effects to be captured which are transient with respect to the larger 5-year interval.

Table 3.3: Percentage throughput multivariate regression model, 2015-2020, for the normalised predictors in Section 3.2.2. Estimates, standard errors and significance levels are shown, in addition to model fit statistics.

Regressor                       β Estimate   β Std. Error   Significance
% DLP                           -0.193       0.011          ***
% OUG                           -1.069       0.152          ***
% DLP_{t-1}                     0.035        0.012          ***
Normalised CWV                  0.002        0.001          **
Normalised Opening Linepack     -0.003       0.002          **
Normalised Closing Linepack     -0.003       0.001          **
U_{t-1}                         0.066        0.023          ***
FWMU Supply                     0.004        0.004
S/D Bias                        0.0003       0.004
Total FWMU                      2.346        2.067
Orifice Plate FWMU*             0.037        0.021          *
Turbine FWMU                    -0.001       0.002
Ultrasonic FWMU*                -0.03        0.011          ***
Constant                        0.007        0.003          **

Observations            1,859
R2                      0.17
Adjusted R2             0.17
Residual Std. Error     0.005 (df = 1845)
F-Statistic             29.382*** (df = 13; 1845)
Significance Key        *p<0.1; **p<0.05; ***p<0.01

Table 3.4: Regression models compared, with goodness-of-fit statistics

                Reg1        Reg2       Reg3
Multiple R2     0.1823      0.1715     0.1856
Adjusted R2     0.1749      0.1657     0.1703
F-Statistic     24.68       29.38      12.12
p-Value         <0.01       <0.01      <0.01
DF              1772        1845       1755
RSE             10510000    0.005477   12500000

LMG, an importance metric first described in [52] and named after the authors of said work, was chosen as a measure of variable importance. This is a decomposition of the R2 coefficient, calculated on a per-variable basis. It has the property of summing up to the total R2. We will briefly cover its derivation below, following that in [32]. Let S be a set of predictors, and let SS indicate the sum of squares. Then,

$$R^2(S) = \frac{\text{Model SS}}{\text{Total SS}}. \tag{3.13}$$

The additional R2 when adding the regressors in set M to a model containing the regressors in set S is then given as:

$$\mathrm{seqR}^2(M|S) = R^2(M \cup S) - R^2(S). \tag{3.14}$$

Let r = (r_1, r_2, ..., r_p) denote the indices of the regressors x_1, x_2, ..., x_p, and let S_k(r) denote the set of regressors contained within the model before regressor x_k, in the order r. Then the portion of R2 allocated to the regressor x_k in the order r is:

$$\mathrm{seqR}^2(\{x_k\}|S_k(r)) = R^2(\{x_k\} \cup S_k(r)) - R^2(S_k(r)). \tag{3.15}$$

Finally, the LMG statistic can be defined as follows, with n(S) returning the size of set S:

$$\mathrm{LMG}(x_k) = \frac{1}{p!} \sum_{S \subseteq \{x_1,\dots,x_p\} \setminus \{x_k\}} n(S)!\,(p - n(S) - 1)!\; \mathrm{seqR}^2(\{x_k\}|S). \tag{3.16}$$
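For a small number of predictors, Equation (3.16) can be evaluated exactly by enumerating all regressor subsets. A minimal sketch (illustrative Python; the thesis computation used the relaimpo R package):

```python
from itertools import combinations
from math import factorial

import numpy as np

def r_squared(X, y, idx):
    """R^2 of an OLS fit of y on an intercept plus the columns of X in idx."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in idx])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return 1.0 - resid.var() / y.var()

def lmg(X, y, k):
    """Eq. (3.16): weighted average over subsets S of the sequential R^2
    gained by adding regressor x_k (eq. 3.14)."""
    p = X.shape[1]
    others = [j for j in range(p) if j != k]
    total = 0.0
    for size in range(p):
        for S in combinations(others, size):
            gain = r_squared(X, y, list(S) + [k]) - r_squared(X, y, list(S))
            total += factorial(size) * factorial(p - size - 1) * gain
    return total / factorial(p)

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)

shares = [lmg(X, y, k) for k in range(3)]
```

By construction the per-variable shares are non-negative (adding a regressor never decreases R2) and sum exactly to the full-model R2, which is the decomposition property exploited in Figure 3.11.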

Reg2 was used for the rolling regression model, as this avoided computational errors in the evaluation of LMG due to singular matrices. The computation was carried out in R using the implementation provided in the relaimpo package [32]. The results can be seen in Figure 3.11. One key observation is that there is significant fluctuation in the R2 coefficient, indicated by the total level on the graph, across the time range. Indeed, entering 2020, R2 was almost 0.4, coinciding with a large increase in UAG as seen in Figure 1.5. This can be explained by the fact that systematic errors are transient in time depending on their source, and their contribution to UAG also varies.

The effect of DLP is the most important in terms of explaining the variance of $U_t$. This is more evident when we combine all the variables related to linepack, as seen in the second row where they are merged into the red colour. This consistently accounts for an excess of 50% of the variance explained by R², and can be considered overwhelming evidence that the linepack calculation is the leading source of error. OUG and the different FWMU metrics are much less prominent. The fact that the term $U_{t-1}$ is significant, particularly in 2016, can also be interpreted as DLP related, as this term contains information about any errors within the previous day's closing linepack, which is equivalent to the next opening linepack. Figure 3.11 also suggests that either the approximation to the true daily FWMU is inaccurate to an extent that it does not capture the real effect, or that there are no significant systematic errors stemming from meters operating close to their calibrated limits. This remains an avenue for future research. Before moving on to consider the applications of univariate time series models in modelling UAG, it must be once again emphasised that multivariate models making use of UAG constituents such as DLP and OUG should be solely of diagnostic value. If a link is discovered, then this should be remedied in the first instance via an operational solution to the underlying cause, where it is possible to do so - rather than by using the multivariate model to create wider confidence intervals for the UAG. The exception to the above would be using predictors such as FWMU, which account for systematic error in much the same sense that we discussed in the meter error models of Chapter 2.

3.2.4 Nodes

We briefly present a discussion of the various types and behaviours of time series representing aggregate daily flows found in a gas transmission network. In Figure 3.12 the plots are labelled as corresponding to one of the following categories, starting with single feed in the top left and moving in a clockwise order:

1. Single feed distribution network offtakes: The downstream distribution system in this case is exclusively fed by the site. This results in an uninterrupted flow which will follow the aggregate demand closely (as can be seen, for example, in Figure 3.9), and is predictable to some extent through variables such as temperature and sunlight.

2. Switching offtakes: Such offtakes feed complex distribution grids, which can accept flows through multiple sites. Hence, flow proportions can vary between sites depending on operational constraints and demand – for example, flow can be transferred due to

[Figure 3.11: LMG calculated for rolling windows of sizes 100 (R) and 200 days (L), 2015-2020. Normalised view can be seen on the bottom row.]

demand dipping below the minimum flow rate required by a site's instrumentation. Another example is nodes which are only active during the winter months, or hours of high demand. This results in unpredictable, discontinuous behaviour which cannot be explained out of context. A smoother time series can be obtained by combining the flows of such grouped sites, and this approach will be further examined in Chapter 5.

3. Industrial sites: These vary; some exhibit fairly constant flow, whereas others fluctuate to a large degree. The underlying industrial production processes cannot be predicted, nor can their behaviour. Seasonality is not normally a factor, as demand from heavy industry is typically constant throughout the year.

4. Storage sites: Such sites exhibit flows dictated by the network's operational requirements and investment strategies based on stochastic control. This will be largely unpredictable, and unexplainable a posteriori through purely quantitative means.

5. Supply terminals: These will either supply a steady flow in the case of gas originating from offshore platforms, or exhibit some seasonality in the case of LNG imports.

6. Power stations: Seasonal as a group, but individually discontinuous and unpredictable – their behaviour will correspond to fluctuations in the electrical grid demand along with corresponding operational practicalities.

It is worth taking a moment to consider the distribution of flow across nodes. This is explored in Figure 3.13, where the mean non-zero flow of demand and supply nodes is plotted in order. Most striking is the plot of demand nodes; the distribution approaches that of an exponential. The vast majority of daily flow is accounted for by a relatively small number of nodes, or key sites.

[Figure 3.12: Depiction of site flow profile for a variety of site types. Panels: Single feed offtake: Lockerbie; Switching Offtake: Faringham B; Industrial: Indeos Nitrile Plant; Storage Withdrawal: Hornsea; Terminal: St Fergus Shell; Power Station: Winnigton. Vertical axes in energy (GWh), horizontal axes 2014-2020.]

Figure 3.13: Mean daily non-zero flow from demand (Right) and supply (Left) nodes. Note the difference in scales, along with the exponential pattern displayed by the demand nodes’ distribution.


3.3 Baseline Model

In this section, we will initially specify two different approaches to calculating a daily baseline for UAG. The first combines all elements uncovered in the analysis of Chapter 2 into a composite model. The second fits standard time series models to historic UAG. These two approaches will then be combined, formulating the UAG baseline. We will then test these baselines against historic data, and also investigate whether UAG can be accurately forecast.

3.3.1 Uncertainty Based Approach

Having identified – with the exception of the unknown unknowns – all significant sources of uncertainty in the UAG calculation, in this section we combine them to produce a comprehensive model for UAG uncertainty. The following components make up the Composite Model:

• Meter error component, accounting for both random and small-scale systematic errors. These models are as described in Section 2.3.6.

• Additional uncertainty penalty due to the use of attributed sites. We implement this in two ways:

1. With the inclusion of a penalising term based on total throughput when using meter error Model 1.

2. Through assigning attributed status to a fixed proportion (p = 0.05) of sites when simulating through Monte Carlo with meter error Model 4, and penalising this by increasing their uncertainty as described in Section 2.4.2.

In practice, this approximation is not necessary, as the exact sites using attributed values will be known.

• DLP uncertainty term, accounting for both instrument uncertainty and attributed CV, as discussed in Section 2.5.

• OUG uncertainty, estimated as equal to linepack uncertainty, as attributed measurements are also used.

• Emissions compensation term, correcting the model mean.

The above components are added in quadrature, as they can be considered independent, to estimate the UAG uncertainty. Furthermore, the model includes a centring correction term φ, accounting for emissions, with a value of 0.001 times the total system throughput, as estimated by NG. We present three estimates of the daily standard deviation for the NTS, defined below.

The first estimate, $\sigma_{c1}$, uses Meter Error Model 1 and includes a penalty term based on the total metered flow. The second estimate, $\sigma_{c2}$, uses Meter Error Model 4, and the last estimate, $\sigma_{c3}$, is as above but introduces an additional penalty based on the LP temperature estimation issues.

$$\sigma_{c1} = \sqrt{u^2(\text{Model 1}) + u^2(\hat\lambda_t) + u^2(\hat\lambda_{t-1}) + (0.0176\,\hat o_t)^2 + (0.0176)(0.05)\sum_{i=1}^{n+m} \hat\nu_{i,t}^2}, \qquad (3.17)$$

$$\sigma_{c2} = \sqrt{u^2(\text{Model 4}) + u^2(\hat\lambda_t) + u^2(\hat\lambda_{t-1}) + (0.0176\,\hat o_t)^2}, \qquad (3.18)$$

$$\sigma_{c3} = \sqrt{u^2(\text{Model 4}) + u^{*2}(\hat\lambda_t) + u^{*2}(\hat\lambda_{t-1}) + (0.0176\,\hat o_t)^2}, \qquad (3.19)$$

where $u^*$ in $\sigma_{c3}$ includes a constant penalty accounting for temperature-related DLP uncertainty – 0.3 is used as a ballpark figure in the absence of a more accurate estimate, as discussed in Section 2.5.1.

The decision interval D in the case of the composite error approach is then calculated as follows:

$$D = \phi \pm t_{a=0.95}\,\sigma_{ci}, \qquad (3.20)$$

with

$$\phi = h\,T_t^{\text{put}}, \qquad (3.21)$$

where $h = 0.001$, accounting for leak emissions, and $T_t^{\text{put}}$ is the total system throughput. From here on, the baselines generated by the above method will be referred to as Composite Models 1-3 respectively, calculated by substituting Equations (3.17)-(3.19) into (3.20) as needed.
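A minimal sketch of the quadrature combination and the resulting decision interval, in the style of Eqs. (3.18) and (3.20). The argument names and the example figures are illustrative assumptions, not NG values, and a normal critical value of 1.96 stands in for $t_{a=0.95}$.

```python
import math

def composite_sigma(u_meter, u_lp_open, u_lp_close, oug):
    """Quadrature sum of independent uncertainty components (Eq. (3.18) style):
    meter error model, opening/closing linepack, and the 1.76% OUG term."""
    return math.sqrt(u_meter**2 + u_lp_open**2 + u_lp_close**2 + (0.0176 * oug)**2)

def decision_interval(throughput, sigma, t_crit=1.96, h=0.001):
    """Decision interval of Eq. (3.20), centred on the emissions correction
    phi = h * throughput, with half-width t_crit * sigma."""
    phi = h * throughput
    return (phi - t_crit * sigma, phi + t_crit * sigma)
```

For example, components of 3 and 4 GWh combine to 5 GWh, and with a throughput of 1000 GWh the interval is centred on 1 GWh rather than zero.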

3.3.2 Statistical Approach

Univariate time series models can be used to model UAG and predict decision intervals, strictly under the condition that they are trained on data free of significant error. In other words, the training dataset should ideally feature only the effects of random meter error and small-scale systematic meter error. This provides a convenient, fast and easy method of capturing the characteristics of $U_t$ without having to carry out time-consuming investigations, such as those in Chapter 2. In the real world, it is not realistic to expect data to conform to the above idealised requirements. For this reason, training data should be pre-processed to remove outliers and known historic errors. The historic errors referred to in this instance include any individual error that can be attributed to a specific cause, e.g. accounting errors that are identified and reconciled. The models should additionally be constantly validated using new data. In our case, following the removal of known days containing errors, we further remove outliers as identified by the approach in [14], as implemented in R in the tsoutliers package.

Models

We will provide only a short introduction to some suitable models, as they are all commonly used in forecasting practice today.

• ARIMA: These models, as described in [11] and [53], have been a stalwart of forecasting and modelling for the past few decades, and remain a highly popular class of models. AIC minimisation was used to identify the best fitting model specification. This was identified as ARIMA(1,1,2).

• NNETAR: A feed-forward neural network, as described in [93]. This class of neural networks was one of the first, and simplest, developed. The optimal parameters for the univariate model were NNAR(13,7). Exogenous variables can also be included as inputs. In our case, a model with DLP as the exogenous variable was also specified, and the fitted parameters in that case were NNAR(13,9). This was done to investigate whether the inclusion of DLP can improve prediction, rather than to provide a robust baseline model.

• ETS: An exponential smoothing state space model, as described in [37]. This is a univariate model capable of accounting for complex seasonality, and can be viewed as an extension of the ubiquitous exponential smoothing models.

• TBATS: This is another popular univariate algorithm, based on the exponential smoothing method in addition to a Box-Cox transformation, ARMA modelling of residuals, and a trigonometric expression of seasonality terms [18]. It has found much traction in recent times, and is typically seen as superior to the above.

The decision interval D in the case of the statistical models is equivalent to the one-step-ahead prediction interval. A prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. The probability, or confidence level, used remains at 95%. It is important to note that these intervals are not to be confused with confidence intervals, which relate to the probability of a parameter such as the mean falling within a given range. The prediction intervals for all models but NNETAR are calculated using well-known analytic formulae, whilst simulation is used in the latter case.
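As a toy illustration of how an analytic one-step-ahead prediction interval arises, the sketch below fits an AR(1) by a simple lag-1 correlation and builds a normal interval from the in-sample residuals. This is a stand-in for, not a reproduction of, the forecast-package machinery used for the models above; it also omits the intercept term for brevity.

```python
import numpy as np

def ar1_prediction_interval(u, z=1.96):
    """One-step-ahead ~95% prediction interval from a crude AR(1) fit:
    centre = phi * u[-1], half-width = z * std of one-step residuals."""
    u = np.asarray(u, dtype=float)
    phi = np.corrcoef(u[:-1], u[1:])[0, 1]   # lag-1 correlation as AR(1) coefficient
    resid = u[1:] - phi * u[:-1]             # in-sample one-step residuals
    centre = phi * u[-1]
    sigma = resid.std(ddof=1)
    return centre - z * sigma, centre + z * sigma
```

On a perfectly autocorrelated series the residual spread collapses and the interval degenerates to a point, which makes the role of residual variance in the interval width explicit.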

Point Forecasts

We will next assess the models' capability of producing accurate point forecasts for $U_t$. This also allows us to easily compare across the various fitted models, as using the AIC or BIC is not possible for all of them. The testing methodology used is discussed below, followed by a review of the results.

Test and Train Data

The training, or in-sample, set was defined as the UAG and predictors from 28-01-2015 to 23-05-2018, for a total of 1212 days. This is the maximum available data allowing for a well-sized test set. The test data was defined as the UAG and its predictors from 24-05-2018 to 02-06-2019, for a total of 376 days. Importantly, this is greater than a year, thus allowing for testing of the models across the entire demand pattern.

Benchmarks

To benchmark the predictive power of the aforementioned models, we will use the following test forecasts:

• Naive Forecast: $U_{t+1}$ is predicted as $U_t$.

• Historic Forecast: $U_{t+1}$ is set to $U_{t-365}$.

• Fast/Slow Mean: $U_{t+1}$ is set to $\frac{1}{n}\sum_{i=t-a}^{t} U_i$, where $n$ is the number of terms in the window, with $a \in \{7, 31\}$.

• Theoretical Forecast: $U_{t+1} = 0$, as per the expected value.

It is postulated that if complex models cannot improve upon the metrics provided by the above, then there is evidence to suggest no additional information about Ut can be extracted from the system information set.
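The benchmark forecasts above can be sketched as follows; the function name and dictionary layout are our own, and the rolling mean stands in for both the fast and slow variants via the window parameter.

```python
import numpy as np

def benchmark_forecast(u, t, a=7):
    """One-step-ahead benchmark forecasts of U_{t+1}, given the history u[0..t].
    a is the rolling-window length (7 for the fast mean, 31 for the slow mean)."""
    return {
        "naive": u[t],                                        # last observed value
        "historic": u[t - 365] if t >= 365 else float("nan"), # same day last year
        "rolling_mean": float(np.mean(u[t - a + 1 : t + 1])), # trailing mean
        "theoretical": 0.0,                                   # expected value of UAG
    }
```

These are the reference points: a fitted model only carries information if it beats them on the out-of-sample metrics below.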

Metrics

The metrics used to assess predictive power will be based on 1-step ahead forecasts. Whilst it can be difficult to compare the quality of fit across different types of models on the training data, out-of-sample testing allows us to leverage standard forecasting metrics. The following standard metrics will be used, where $y$ denotes the observed values and $\hat{y}$ the fitted values:

• Mean Square Error (MSE): $\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

• Root Mean Square Error (RMSE): $\sqrt{\mathrm{MSE}}$.

• Mean Absolute Percentage Error (MAPE): $\frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$.

• Mean Absolute Error (MAE): $\frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$.

• Relative Standard Deviation (RSTD): $\frac{\sigma_{\hat{y}}}{\mu_{\hat{y}}}$.

• Normalised Mean Square Error (NMSE): $\frac{\mathrm{MSE}}{\sigma_{\hat{y}}}$.

In all cases, a lower value indicates a better model fit.
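A minimal implementation of the first four metrics (RSTD and NMSE follow the same pattern from the fitted-value moments); this is an illustrative sketch rather than the evaluation code used in the thesis.

```python
import numpy as np

def forecast_metrics(y, y_hat):
    """Standard point-forecast metrics; y are observed values, y_hat the forecasts."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    e = y - y_hat
    mse = np.mean(e ** 2)
    return {
        "MAE": np.mean(np.abs(e)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAPE": np.mean(np.abs(e / y)),  # undefined when any y_i is zero
    }
```

Note that MAPE is unstable when observed values are near zero, which is a real concern for a quantity like UAG whose expected value is zero; this is visible in the extreme MAPE entries of Tables 3.5 and 3.6.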

Results: 1-step ahead forecast

The aforementioned metrics, calculated for the test period across the discussed models, can be seen in Tables 3.5 and 3.6. We summarise the main conclusions that can be drawn:

• All models performed significantly better than the naive and historic forecasts when considering the MAE, MSE, and RMSE. However, their performance improvement was only minor when compared to that of forecasts based on both the fast and slow mean. Indeed, the best performing model in terms of MAE was only around 10% better, with most models scoring almost equally with the benchmarks. This trend continues when taking other metrics into account.

• Re-evaluating model fit parameters to include all previous observations resulted in only a minor improvement (3% MAE reduction) in the case of the best model.

• Overall, it cannot be claimed that any model is capable of forecasting the UAG to any great accuracy, as evidenced by both the MAE and the RMSE, which was at 11 GWh for the best model.

• Although multivariate models attained the lowest MAE and MSE, they did not fare considerably better than univariate ones, especially considering their higher degrees of freedom.

• Considering the performance of the 1-step forecasts, it does not appear necessary to trial multiple-step-ahead forecasts, as they are highly unlikely to be of any practical value.

These results are encouraging, and they suggest that the UAG exhibited by the National Grid is unpredictable, as it should be. Furthermore, they are in line with the low R² values calculated with the regression models.

Table 3.5: Forecast metrics for 1-step ahead UAG forecasts, with model parameters constant. Values in GWh.

Model            MAE    MSE   RMSE    MAPE   NMSE   RSTD
NNETAR (Exg.)   9.01    139   11.8    8.83   4.76   2.67
Reg1            9.81    163   12.8    3.74   4.07   1.99
Reg3           10.0     164   12.8    3.80   4.10   1.9
TBATS          10.1     162   12.7   37.38  15.68   2.93
ARIMA          10.1     162   12.7  480.38  12.00   3.04
NNETAR         10.1     166   12.9    8.66  11.55   2.75
ETS            10.1     167   12.9    4.95  24.47   3.03
Slow Mean      10.3     168   13.0   17.10  15.53   2.88
Fast Mean      10.8     187   13.7    7.63   8.42   2.95
Theoretical    11.0     192   13.8     Inf    Inf    Inf
Historic       14.4     334   18.2    7.98   2.15   9.52
Naïve          15.3     377   19.4    4.22   2.21   4.17

Table 3.6: Forecast metrics for 1-step ahead UAG forecasts, with model re-fitting with each new data point. Values in GWh.

Model            MAE    MSE   RMSE   MAPE   NMSE   RSTD
NNETAR (Exg.)   8.76    130   11.4  15.60   4.41   2.69
Reg1            9.18    141   11.9   3.25   4.33   2.00
Reg3            9.19    137   11.7  46.1    4.46   2.00
NNETAR          9.96    164   12.8  14.9   10.87   2.84
TBATS          10.0     162   12.7  10.2   15.39   2.89
ARIMA          10.1     162   12.7  24.92  11.55   3.01
ETS            10.2     167   12.9   4.70  24.75   3.04
Slow Mean      10.3     168   13.0  17.10  15.53   2.88
Fast Mean      10.8     187   13.7   7.63   8.42   2.95
Theoretical    11.1     192   13.8    Inf    Inf    Inf
Historic       14.5     334   18.3   7.99   2.15   9.52
Naïve          15.4     377   19.4   4.22   2.21   4.17

3.3.3 Aggregate Baseline Model

We will now discuss heuristic methods for combining the decision intervals as calculated by the composite and statistical models above, with a focus on the practical justification specific to our scenario for each method. It has previously been shown [66] that combining (in that instance, prediction) intervals can improve accuracy, often reducing overconfidence. Moreover, by aggregating the uncertainty models with the univariate time series models, we are able to bridge the gap between the theoretical expectation of UAG and the observed reality, creating a well-rounded baseline which should account for all aspects of the behaviour of $U_t$. The heuristics used have previously been considered by numerous studies, including [33] and [30]. To summarise, we aggregate two key classes of models:

(i) Composite error models, based on the uncertainty approach

(ii) Univariate models, trained on error-free data – these encompass all error types and can account for seasonality

Listed below are the heuristics applied. Given a set of lower and upper decision intervals of size $n$, $L = (L_1, \dots, L_n)$ and $U = (U_1, \dots, U_n)$, we define the aggregate intervals as $L^*$ and $U^*$ respectively. Note that this notation applies only for the following two sections, and is not to be confused with UAG.

• Simple Average: $L^* = \frac{1}{n}\sum_{i=1}^{n} L_i$, $U^* = \frac{1}{n}\sum_{i=1}^{n} U_i$. All prediction intervals are equally weighted, and a simple average is taken. This heuristic is frequently used as it is the simplest aggregation, with overall good performance and robustness.

• Weighted Average: $L^* = \sum_{i=1}^{n} P_i L_i$, $U^* = \sum_{i=1}^{n} P_i U_i$, where $P$ represents a vector of weights summing to one. This allows the forecaster to place greater importance on certain models, enabling them to insert their own intuition and experience into the calculation. In our case, we used a weight vector of P = [0.1, 0, 0.1, 0.2, 0.2, 0, 0, 0, 0.4, 0, 0, 0, 0, 0], placing greater weight on the random meter error models and the exogenous NNETAR, which can specifically account for DLP error, with less weight given to the remaining univariate models.

• Median: $L^* = \mathrm{Median}(L)$, $U^* = \mathrm{Median}(U)$. Again, a simple method, which is more desirable in situations where extreme values may have a significant influence.

• Envelope: $L^* = \min(L)$, $U^* = \max(U)$. This takes the strongest possible approach to reducing overconfidence. In our scenario, values outside this aggregation can be assumed, with a very high degree of certainty, to be anomalous and to warrant further investigation.

• Exterior Trimming: Here, $L^* = \frac{1}{n-k}\sum_{i=k+1}^{n} L_{(i)}$, $U^* = \frac{1}{n-k}\sum_{i=1}^{n-k} U_{(i)}$, with $k = \mathrm{int}(\beta n)$ and endpoints sorted in ascending order. This is similar to averaging, with the exception that the $k$ outermost lower and upper endpoints are trimmed, removing the effect of outliers and narrowing the aggregate interval. The constant $\beta$ sets the proportion of forecasts to be trimmed: as suggested by [30], we set $k$ at 2, but also calculate the heuristic at $k = 3$.

• Interior Trimming: $L^* = \frac{1}{n-k}\sum_{i=1}^{n-k} L_{(i)}$, $U^* = \frac{1}{n-k}\sum_{i=k+1}^{n} U_{(i)}$. The same principles as above apply, with the exception that trimming happens on the opposite, interior ends, widening the aggregate interval. This is done to address overconfidence in an average forecast. Both interior and exterior trimming have a greater effect on the aggregate interval when $k$ is large.
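The heuristics above can be sketched as follows. The trimming orientation here (exterior narrows the aggregate interval, interior widens it) and the ascending sort of the endpoints are interpretive assumptions, chosen to match the behaviour reported in Table 3.8; the function is our own illustration.

```python
import numpy as np

def aggregate_intervals(L, U, method="mean", k=1):
    """Aggregate n lower/upper decision-interval endpoints.
    Endpoints are sorted in ascending order before any trimming."""
    L, U = np.sort(np.asarray(L, dtype=float)), np.sort(np.asarray(U, dtype=float))
    if method == "mean":
        return L.mean(), U.mean()
    if method == "median":
        return float(np.median(L)), float(np.median(U))
    if method == "envelope":
        return L.min(), U.max()
    if method == "exterior":  # drop the k outermost endpoints -> narrower interval
        return L[k:].mean(), U[:-k].mean()
    if method == "interior":  # drop the k innermost endpoints -> wider interval
        return L[:-k].mean(), U[k:].mean()
    raise ValueError(f"unknown method: {method}")
```

Running all methods on the same endpoint sets makes the ordering visible: envelope is the widest, exterior trimming the narrowest, with mean and median in between.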

3.3.4 Decision Intervals

In this section, we evaluate the decision intervals generated by all approaches specified above, in addition to the pure meter error models from Chapter 2. The intervals will be compared against historic values of $U_t$, and also against the following two benchmarks:

1. Legacy 20 GWh Decision Interval: The current practice is to investigate all UAG values outside ±20 GWh. Clearly, this does not factor in seasonality or any form of underlying variation. However, it has been developed following years of practical results, and it is therefore interesting to see how the advanced methods compare – rule of thumb methods lack sophistication, but if they cannot be improved upon through advanced methods, then such methods are unnecessary.

2. Bollinger Bands: This can also be described as rolling standard deviations centred around the rolling mean:

$$\mathrm{BB} = \frac{1}{k}\sum_{i=t}^{t+k} x_i \pm n\sqrt{\frac{1}{k-1}\sum_{i=t}^{t+k}(x_i - \bar{x})^2}, \qquad (3.22)$$

where $k$ is the rolling window size and $n$ the number of standard deviations required; we used $n = 2, 3$, corresponding roughly to 95% and 99% confidence levels. Whilst this will only be a valid decision interval if the data within the window is error-free, and cannot account for long-term trends or seasonality, it is another simplistic method which will provide a useful comparison to more complex methods.
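A sketch of the rolling-band calculation of Eq. (3.22); the use of trailing (rather than centred) windows is an assumption, and the loop-based form favours clarity over speed.

```python
import numpy as np

def bollinger_bands(x, k=30, n=2):
    """Rolling mean of each trailing window of size k, plus/minus
    n sample standard deviations, per Eq. (3.22)."""
    x = np.asarray(x, dtype=float)
    mids = np.array([x[i : i + k].mean() for i in range(len(x) - k + 1)])
    sds = np.array([x[i : i + k].std(ddof=1) for i in range(len(x) - k + 1)])
    return mids - n * sds, mids + n * sds
```

Because the bands are driven entirely by the window contents, a single large error inside the window inflates them for the next k days, which is one reason this benchmark cannot serve as a standalone decision interval.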

As an aside, it must be noted that the test and train datasets are reversed in this section, i.e. models are trained on what was previously defined as the test set. This is due to data availability limitations with respect to the historical errors which are analysed later.

3.4 Model Performance Results

The performance of individual models and benchmarks against historic UAG can be seen in Table 3.7, whilst that of the aggregate methods is seen in Table 3.8. Interestingly, all models exhibited a strong positive or very small negative bias, including the composite models, which correct for emissions. The positive mean UAG may explain this in the case of the random error models, which are centred around 0. Model 3 was clearly too sensitive, with the no-correlation assumption making a significant difference. Indeed, when compared to the statistical methods, the pure meter error and composite approaches were much more overconfident. This is to be expected, as they cannot account for all error sources. An average of 10.3% of days proved to be outside the decision intervals. Surprisingly, the legacy benchmark proved comparable in general to the 95% decision intervals, despite being entirely arbitrary – this suggests the operators' years-long experience of investigating errors and their 'feel' for the system may have resulted in a simple, yet effective strategy. Of the aggregate methods, all except the envelope and interior trim methods showed overconfidence. This is to be expected, as these heuristics specifically guard against overconfidence.

Table 3.7: Number of days exceeding upper (+) and lower (-) decision intervals, over the period 16-02-2015 to 23-05-2018. Bias indicates the difference between percentage upper and lower interval incursions. All decision intervals are at the 95% level.

Model            +     %+     -     %-    Bias   Total  % Total
ARIMA           31  0.026    43  0.036  -0.010      74    0.062
TBATS           79  0.066    45  0.038   0.028     124    0.104
ETS             65  0.054    41  0.034   0.020     106    0.089
Met. Model 1   106  0.089    28  0.023   0.065     134    0.112
Met. Model 4   150  0.126    42  0.035   0.091     192    0.161
Met. Model 3   246  0.206    88  0.074   0.132     334    0.280
Met. Model 2    82  0.069    24  0.020   0.049     106    0.089
NNETAR          80  0.067    89  0.075  -0.008     169    0.142
NNETAR (exo)    83  0.070    95  0.080  -0.010     178    0.149
Reg1            25  0.021    24  0.020   0.001      49    0.041
Reg3            26  0.022    22  0.018   0.003      48    0.040
BB95            32  0.027    12  0.010   0.017      44    0.037
BB99             6  0.005     1  0.001   0.004       7    0.006
20 GWh         110  0.092    39  0.033   0.060     149    0.125
Composite 1    106  0.089    28  0.023   0.065     134    0.112
Composite 2    148  0.124    42  0.035   0.089     190    0.159
Composite 3    125  0.105    31  0.026   0.079     156    0.131

Table 3.8: Number of days exceeding upper (+) and lower (-) decision intervals, over the period 16-02-2015 to 23-05-2018 for aggregate models. Bias indicates the difference between percentage upper and lower interval incursions. All decision intervals are at the 95% level.

Model                  +     %+    -     %-    Bias  Total  % Total
Mean                  54  0.045   29  0.024   0.021     83    0.070
Weighted Mean         57  0.048   36  0.030   0.018     93    0.078
Median                69  0.058   33  0.028   0.030    102    0.085
Envelope              16  0.013    8  0.007   0.007     24    0.020
Interior, β = 0.1     48  0.040   27  0.023   0.018     75    0.063
Interior, β = 0.3     36  0.030   19  0.016   0.014     55    0.046
Exterior, β = 0.1     63  0.053   31  0.026   0.027     94    0.079
Exterior, β = 0.3     88  0.074   41  0.034   0.039    129    0.108

3.4.1 Additional Diagnostics

Table 3.9 contains some further diagnostics, for both individual models and aggregate heuristics. These are as follows; (1)-(4) are as reviewed in [33], and (5) is specific to our scenario:

(1) Average S-score:

$$S(L, U, x) = \frac{1}{T}\sum_{t=1}^{T}\left[-\frac{\alpha}{2}(U_t - L_t) - (L_t - x_t)^+ - (x_t - U_t)^+\right], \qquad (3.23)$$

where $x_t$ is the realised value of the forecast quantity at time $t$, $\alpha$ the confidence level, and the notation $(x)^+$ is equivalent to $\max(x, 0)$. This is a quantile scoring rule, developed in [42]. The measure is always negative, with a higher score being better. It is higher when the interval is as narrow as possible whilst still retaining a capture rate equal to the confidence level.

(2) Relative frequency (RF):

$$\mathrm{RF} = \frac{1}{T}\sum_{t=1}^{T} I(L_t \le x_t \le U_t), \qquad (3.24)$$

where $T$ is the number of predictions and $I$ is the indicator function. Ideally, RF should be equal to the confidence level $100\alpha\%$. It can be seen as an overall measure of calibration: if $\mathrm{RF} < 100\alpha\%$, the model is overconfident, and the converse indicates underconfidence.

(3) Average interval width (IW):

$$\mathrm{IW} = \frac{1}{T}\sum_{t=1}^{T}(U_t + |L_t|). \qquad (3.25)$$

This is a good measure of the degree of uncertainty around the variable, and can serve to explain the RF.

(4) Mean absolute error: In terms of intervals, the MAE is defined as

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}|M_t - x_t|, \qquad (3.26)$$

where $M_t$ is the interval midpoint at time $t$. This provides similar information to the interval width, telling us how well-located the intervals are, with a lower score being better.

(5) Excess UAG: The amount of UAG outside the baseline. Following the notation in Section 3.1, we define excess UAG as:

$$U_t^E = I(U_t > d_t^+)(U_t - d_t^+) + I(U_t < d_t^-)(U_t - d_t^-), \qquad (3.27)$$

where $I$ is the indicator function. Using $U_t^E$, we can also define $U_t^{E+}$ and $U_t^{E-}$ as the positive and negative parts, alongside the absolute excess $|U_t^E|$ and the percentage throughput absolute excess $\frac{|U_t^E|}{T_t^{\text{put}}}$. Across a time interval, the above quantities are summed with respect to time to allow for a comparison across baselines. Excess UAG can be interpreted as the amount of UAG that cannot be accounted for by the baseline model.
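The diagnostics above can be computed together on a sequence of intervals. Two interpretive choices are made explicit here: RF is implemented as a capture rate (consistent with the ideal value of 0.95 used in Table 3.9), and excess UAG is returned as a signed net sum; the function itself is our own sketch.

```python
import numpy as np

def interval_diagnostics(x, L, U, alpha=0.95):
    """Diagnostics (1)-(5) for intervals [L_t, U_t] against realised values x_t."""
    x, L, U = (np.asarray(a, dtype=float) for a in (x, L, U))
    # (1) average S-score, Eq. (3.23): penalises width and exceedances
    s_score = np.mean(-(alpha / 2) * (U - L)
                      - np.clip(L - x, 0, None)
                      - np.clip(x - U, 0, None))
    rf = np.mean((x >= L) & (x <= U))        # (2) relative frequency (capture rate)
    iw = np.mean(U - L)                      # (3) average interval width
    mae = np.mean(np.abs((U + L) / 2 - x))   # (4) distance from interval midpoint
    # (5) signed net excess beyond the interval endpoints, Eq. (3.27)
    excess = np.sum(np.where(x > U, x - U, 0.0) + np.where(x < L, x - L, 0.0))
    return {"S": s_score, "RF": rf, "IW": iw, "MAE": mae, "excess_UAG": excess}
```

Splitting the excess term by sign recovers the $U_t^{E+}$ and $U_t^{E-}$ columns of Table 3.9.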

From Table 3.9 we can see that, once again, linear regression performed better than both the statistical and composite error models, having the highest S-score and an RF close to 95%. This is not surprising: despite the low R², the explanatory variables have significant predictive value (in particular, DLP), and this information is not available to be encoded into the intervals generated by the other classes of models. However, as previously discussed, it must be emphasised that these regression models must not be used as a foundation for baseline intervals, but only as a diagnostic tool. Excluding linear regression, the mean alongside the interior trim with β = 0.3 performed the best, also having relative frequencies close to 0.93. ARIMA was the best performing statistical model in terms of S-score and RF, with NNETAR being vastly overconfident. Looking at the positive and negative parts of excess UAG, we again note a clear positive bias, this time across all models. Excess UAG as per the mean baseline was at 0.034% of throughput. Unfortunately, it is hard to put this value in context, as it has not been calculated previously for other grids. However, it represents an exceedingly small fraction of throughput, and should therefore be interpreted in a positive light with respect to the NTS metering efficiency and accounting process. The key interpretation of the baseline results with regards to UAG, then, is that the overconfidence displayed by the uncertainty models, coupled with the slight underconfidence of the regression models which use independent variables to predict UAG, indicates once again that UAG contains systematic error, most likely originating in DLP, which cannot be entirely accounted for.

Table 3.9: Diagnostic measures for prediction intervals. Interval width and MAE are in GWh. Excess UAG is the summed quantity across the interval. Key columns used to assess the intervals are S-score, where a higher (less negative) value is better, and RF, where ideally the value must be as close to 0.95 as possible.

Model                     S-score     RF     IW    MAE   U_E+  U_E-  |U_E|  |U_E|/T_put
Uncertainty
Met. Error Model 1       -2305023  0.888  45.30  20.19   1127   271   1399     0.000450
Met. Error Model 2       -2196580  0.911  48.71  21.54    915   252   1168     0.000376
Met. Error Model 3       -3271107  0.720  27.00  13.13   2377   720   3097     0.000997
Met. Error Model 4       -2553373  0.839  36.95  16.70   1539   405   1944     0.000626
Composite Model 1        -2276560  0.891  45.95  20.4    1075   269   1345     0.000433
Composite Model 2        -2178363  0.916  49.36  21.82    875   250   1126     0.000363
Composite Model 3        -2109222  0.929  52.83  23.37    766   174    940     0.000303
Statistical
ARIMA                    -2053757  0.938  54.16  23.81    486   349    835     0.000269
TBATS                    -2320781  0.896  47.05  21.18    934   431   1365     0.000439
ETS                      -2223747  0.911  47.55  21.24    808   427   1235     0.000397
NNETAR (exo)             -2517667  0.851  43.70  19.14    853   825   1678     0.000540
NNETAR                   -2413404  0.858  42.64  18.70    862   688   1550     0.000499
Regression
Reg1                     -1723346  0.959  50.09  21.96    443   119    562     0.000181
Reg3                     -1721900  0.960  49.73  21.80    451   118    569     0.000183
Aggregate
Mean                     -2009093  0.918  44.81  19.72    749   312   1060     0.000341
Weighted Mean            -1983954  0.922  45.31  19.93    720   296   1016     0.000327
Median                   -2081898  0.909  43.99  19.42    827   345   1172     0.000377
Envelope                 -1965990  0.980  65.33  29.09    340    57    397     0.000128
Interior Trim β = 0.1    -1975611  0.929  46.84  20.61    680   280    960     0.000309
Interior Trim β = 0.3    -1932014  0.942  50.37  22.17    580   223    803     0.000258
Exterior Trim β = 0.1    -2062206  0.909  42.76  18.85    826   359   1185     0.000381
Exterior Trim β = 0.3    -2208589  0.881  38.93  17.30   1022   451   1474     0.000474
Benchmark
BB95                     -1754132  0.963  59.89  26.02    252    55    306     0.000099
BB99                     -2183680  0.994  85.48  38.47     51     5     56     0.000018
Legacy 20 GWh            -2250973  0.875  40.00  17.61   1172   320   1492     0.000480

3.4.2 Baseline Calculation Methodology

To summarise, our recommended UAG baseline calculation methodology combines statistical UAG models, trained on UAG data that is free of 'abnormal' errors, with uncertainty models calculated following a comprehensive study of the system. Based on our results for the NTS, we recommend using the envelope method to provide concrete evidence for unacceptable daily UAG, and the mean or interior trim methods to serve as an overall indicator. This baseline methodology is summarised in Figure 3.14.
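As a rough illustration of the aggregation step, the sketch below combines per-model upper bounds into a single decision bound. The function name and the trimming convention (interior trim discards the β fraction of narrowest bounds before averaging, which is consistent with its wider intervals in Table 3.9) are assumptions for illustration, not the thesis' exact definitions.

```python
# Hedged sketch of the aggregation step: combining upper baseline bounds from
# several models into one decision bound. Assumed conventions: envelope takes
# the most extreme bound; interior trim drops the beta fraction of narrowest
# bounds on the upper side before averaging.
import math

def aggregate_upper(bounds, method="mean", beta=0.0):
    b = sorted(bounds)                    # ascending per-model upper bounds
    if method == "envelope":
        return b[-1]                      # widest model wins
    if method == "interior_trim":
        k = math.floor(beta * len(b))     # discard the k narrowest bounds
        b = b[k:]
    return sum(b) / len(b)

upper_bounds = [18.0, 20.0, 22.0, 40.0]   # per-model daily upper bounds, GWh
print(aggregate_upper(upper_bounds, "mean"))          # 25.0
print(aggregate_upper(upper_bounds, "envelope"))      # 40.0
print(aggregate_upper(upper_bounds, "interior_trim", beta=0.3))  # drops 18.0
```

The same function would be applied symmetrically to the lower bounds (taking the minimum for the envelope and discarding the least negative bounds for the interior trim).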

Figure 3.14: UAG Baseline evaluation process

[Flowchart: node flows and operational data feed the meter error models, which together with additional sources of error and uncertainty (linepack, emissions, etc.) form the composite error model; historic UAG data, after known error and outlier removal, feed the statistical model fit; aggregation methods combine both into the UAG baseline model, which is then validated and calibrated.]

3.5 Efficacy of Baseline Method

In this section, we explore historic known errors during the same interval over which the prediction intervals above were fitted, and determine to what extent decision making based on these models improves upon the legacy baselines, and whether reliance on a decision interval methodology is sufficient for identifying reconcilable errors, which we define as errors due to a single ascertainable and quantifiable cause, for example a large accounting error. We will also discuss the interpretation of a daily UAG value in relation to the baseline.

3.5.1 Nature of Error Identification

Before we proceed to analyse suitable investigation decision intervals, we will discuss how errors are identified in a modern transmission system, and statistically analyse those experienced by the NTS in recent years. Note that here we explicitly focus on single-day errors, caused by meter or data error. Errors in daily flow volumes are discovered via two pathways:

(i) Investigation following an abnormal UAG day, defined as one whose total energy flow is outside the decision interval. This investigation is often inconclusive, because attributing an error to a node whilst limited to daily data is a difficult task, as will be discussed later. Presently, NG carries out further investigations into possible node flow errors only when |U_t| > 20 GWh. These investigations consist of recalculating the UAG from source, followed by numerous data validation checks. From 20-07-2016 through to 10-01-2018 (a total of 548 days), there were 58 UAG days flagged as suspicious according to this selection process. Four cases were resolved, meaning investigations led to UAG being reduced. It must be noted that these 4 cases were all instances of extremely high UAG for the NTS, namely above 30 GWh.

(ii) Retrospective updates stemming from outside the balancing environment, e.g. a supplier submitting corrected flow data, or simply submitting previously unavailable flow data. Reconciliation following identification of a meter or data error also falls into this category.

3.5.2 Historic Errors

A dataset of historic errors in the NTS, discovered via the two pathways above, was compiled. The errors included were of any operational origin; that is to say, they included both meter and data errors. This data was further filtered to include only errors of a magnitude greater than 0.5 GWh (equivalent to roughly 0.02% of throughput), as individually, errors of lower magnitudes are unlikely to have any appreciable effect on UAG. They spanned from 24-03-2015 to 17-05-2018, and were 53 in number. These errors are plotted against the respective daily UAG, in terms of pure daily energy flow, in Figure 3.15. Initially, we can observe that the majority lie within the historic baseline, their signs are evenly distributed, and there does not appear to be a strong positive correlation between them. The errors had a mean of -0.31 GWh, an absolute mean of 9.87 GWh, and a standard deviation of 16.45 GWh. According to our previous classification, 9 were cases of missing data, 7 of 'zeroing out', and 39 were standard corrections.

Interestingly, 27 (52%) of the errors resulted in an improvement to UAG, defined as a reduction in absolute value. Conversely, this means that the remaining discovered errors worsened the financial implications of UAG. This is somewhat counter-intuitive, as reconciling an error typically suggests an overall improvement to a given predicament.
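The improvement classification used here is simple to state in code. A hedged sketch, with illustrative values rather than the actual NTS dataset:

```python
# Sketch of the 'improvement' classification used above: a reconciled error
# improves UAG on its day if applying the correction reduces |UAG|.
# The (daily UAG, correction) pairs below are illustrative, not thesis data.

def improves_uag(uag_before, correction):
    """True if adding the correction to daily UAG reduces its absolute value."""
    return abs(uag_before + correction) < abs(uag_before)

errors = [(15.0, -10.0), (2.0, 8.0), (-12.0, 5.0)]  # (daily UAG, correction), GWh
improved = sum(improves_uag(u, e) for u, e in errors)
print(improved, "of", len(errors), "errors improved UAG")  # 2 of 3
```

Note that a correction can worsen UAG even when it is legitimate: if the original error happened to offset another error or bias, removing it increases the residual imbalance.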

Figure 3.15: UAG vs identified error size


Next we consider how many of the errors fell outside of the baselines defined in the previous section. It is important to note that it is obviously impossible to discover all reconcilable errors within a time frame. As discussed in the previous section, the investigations into high UAG days are not conclusive; failure to identify a cause does not rule out the presence of an ascertainable error. Therefore, a complete database of positive and negative days cannot be compiled: even though positives can be identified on occasion, negatives cannot, because the presence of an error can never definitively be excluded. This is further confounded

by the fact that only days where |U_t| > 20 GWh are ever investigated, introducing a source of bias. As a result, it is not possible to use traditional classification metrics with their usual interpretations, as the true negatives are unknown. In Table 3.10, we can see the total number of days predicted by the algorithms, and the number of days which were both predicted to be erroneous and found to contain an error (predicted and reported). Also included are the following metrics, derived from their traditional counterparts:

\[
\text{Indicated Sensitivity (IS)} = \frac{\text{Pred. \& Reported}}{\text{Reported}}, \tag{3.28}
\]

and

\[
\text{Indicated Positive Prediction Value (INPV)} = \frac{\text{Pred. \& Reported}}{\text{Predicted}}. \tag{3.29}
\]

Although both metrics correspond to their values as calculated by the said terms in a traditional confusion matrix classification setting, they are prefaced by 'indicated'. This is because the real proportion of true negatives and true positives is not known; we have assumed that any day not reported as having an error is a true negative. INPV is interpreted as the rate at which predicted errors are discovered to contain error. A perfect discovery rate would be 95%, under the prediction interval significance and uncertainty level coverage factors used in the baseline model. In practice, however, this can be expected to be much lower. The indicated sensitivity can be seen as an approximation of the efficacy of the baseline methodology, as it represents the fraction of reported errors that were also predicted. Efficacy can also be measured in terms of reconciled energy, calculated by dividing the absolute sum of reconciled energy on correctly predicted days by the sum of total absolute reconciled energy within the time period. We define reconciled energy as the difference arising out of the error correction, and assume this is the amount a successful investigation into a predicted error would have yielded. We call this metric Indicated Energy Sensitivity (IES), defined below as:

\[
\text{IES} = \frac{\sum \text{Abs. Reconciled Energy, Predicted \& Reported}}{\sum \text{Abs. Reconciled Energy, Reported}}. \tag{3.30}
\]
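The three indicated metrics (3.28)-(3.30) can be computed from two sets of day indices and a map of reconciled energies. A small illustrative sketch (the day labels and energy values are invented):

```python
# Sketch of the indicated metrics (3.28)-(3.30) from sets of day indices and
# a map of absolute reconciled energies. All values are illustrative.

predicted = {1, 3, 5, 7}             # days flagged by the baseline
reported = {3, 7, 9}                 # days with a reported (reconciled) error
energy = {3: 12.0, 7: 4.0, 9: 20.0}  # |reconciled energy| per reported day, GWh

both = predicted & reported          # predicted & reported days
IS = len(both) / len(reported)                              # (3.28)
INPV = len(both) / len(predicted)                           # (3.29)
IES = sum(energy[d] for d in both) / sum(energy.values())   # (3.30)

print(IS, INPV, IES)   # IS = 2/3, INPV = 1/2, IES = 4/9
```

Because the true-negative count never appears, none of these quantities require the (unknowable) full confusion matrix.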

Table 3.10: Performance of baseline methods against historic errors, with metrics as defined in the previous section. P & R refers to days both predicted and reported as containing errors. The latter three columns are of chief importance, where a high IS, IES and INPV are desirable. Due to the low sample size, many models have identical performance. The best model will represent a trade-off between IS, IES and INPV; on this basis, the Interior Trim and Weighted Mean have good historical performance.

Model                    Predicted   P&R    IS     INPV    IES
BB99                          6       5    0.096  0.833   0.127
Envelope                     16       9    0.173  0.562   0.204
Reg1                         25       9    0.173  0.360   0.262
Reg2                         26       9    0.173  0.346   0.275
ARIMA                        31      10    0.192  0.323   0.228
BB95                         32      13    0.250  0.406   0.275
Interior Trim, β = 0.3       42       9    0.173  0.214   0.410
Interior Trim, β = 0.1       53       8    0.154  0.151   0.444
Weighted Mean                57      10    0.192  0.175   0.444
Mean                         61       4    0.077  0.066   0.444
ETS                          65       5    0.096  0.077   0.444
Exterior Trim, β = 0.3       68       5    0.096  0.074   0.444
Median                       69       2    0.038  0.029   0.444
TBATS                        79       9    0.173  0.114   0.444
NNETAR                       80       9    0.173  0.112   0.410
Met. Model 2                 82       9    0.173  0.110   0.444
NNETAR (exo)                 83       3    0.058  0.036   0.440
Exterior Trim, β = 0.1       90       9    0.173  0.100   0.444
Met. Model 1                106       8    0.154  0.075   0.444
Legacy 20 GWh               110       9    0.173  0.082   0.444
Met. Model 4                150       9    0.173  0.060   0.447
Met. Model 3                246       9    0.173  0.037   0.483

3.5.3 Results

The performance of baseline methods against historic errors, as seen in Table 3.10, shows a mean indicated sensitivity of 14.7%, which tells us that overall only a small fraction of reported errors fell outside of the baseline. This suggests that the baseline method has a low efficacy. There was not much variance between the algorithms in terms of the number of days both predicted and reported, with a mean of 7.9 predicted & reported errors and a standard deviation of 2. The legacy baseline P&R was 9, which was also the median at n = 14. When considering indicated sensitivity in terms of reconciled energy, the average energy correctly identified was 197 GWh, as compared to 316 GWh falling below the baseline. This corresponds to an IES rate of 38.33%, which is greater than the indicated sensitivity of 14.7%, but still comparatively low. The mean INPV was 13.5%, suggesting that few of the errors predicted by the baseline decision intervals ever get resolved; this can be attributed to the difficulty in identifying the source of an error.

When assessing and comparing the performance of individual baselines, one can consider either the indicated sensitivity or the IES in relation to the total number of predicted days. This is because a greater number of predicted days equates to a greater investigative cost, and thus the number of such investigations must be minimised whilst maximising sensitivity. In this instance, the envelope method was superior in terms of indicated sensitivity, and the interior trim method with β = 0.3 was the best performing in terms of IES. In comparison to the legacy baseline, the latter identified an equal amount of reconciled energy at less than half the number of predicted error days.
Considering the exponentially distributed scale of nominal node flow seen in Figure 3.13, relatively large errors at individual low-throughput stations can go undetected if reliance is placed solely on the UAG baseline methodology. Just 14% of identified error instances were detected by baseline methods on average, and in energy terms this corresponded to only 38%. This suggests that in order to maximise the number of errors detected, the applied techniques must also utilise the individual energy flow time series.

3.5.4 Extreme Values

Extreme values, which would normally have raised alarms or seemed suspicious in a real-time physical control scenario, accounted for 30% of errors. Therefore, simple control mechanisms monitoring for extreme values, such as

\[
\hat{\nu}_{i,t} > \nu_i^{\max}, \qquad
\hat{\nu}_{i,t} = 0, \qquad
Q(\hat{\nu}_{i,t}) > 0.99, \qquad
Q(\hat{\nu}_{i,t}) < 0.01,
\]

or even preset bounds

\[
L_1 < \hat{\nu}_{i,t} < L_2,
\]

can be recommended. It must be noted that such control mechanisms are standard when dealing with real-time physical flow control, as this is critical operational and safety information; accounting systems, however, do not face such issues and hence such monitoring does not take place. Indeed, an alarm in this instance will almost surely represent not a safety issue but a data verification one. Of course, the calculation of alarm bounds in this setting may be complicated, as the quantities represent daily aggregates rather than instantaneous flows.
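The checks above translate directly into a simple screening routine. The sketch below is illustrative: the rule names, the empirical quantile estimate, and all numeric thresholds are assumptions, not an operational specification.

```python
# Sketch of the simple extreme-value checks listed above, applied to a daily
# integrated node flow. The quantile Q is estimated empirically from the
# node's own history; all numbers are illustrative.

def flag_flow(flow, history, nu_max, l1=None, l2=None):
    """Return a list of triggered rule names for one daily flow value."""
    hist = sorted(history)
    q = sum(1 for h in hist if h <= flow) / len(hist)  # empirical quantile
    flags = []
    if flow > nu_max:
        flags.append("exceeds_physical_max")
    if flow == 0:
        flags.append("zero_flow")
    if q > 0.99:
        flags.append("above_q99")
    if q < 0.01:
        flags.append("below_q01")
    if l1 is not None and not (l1 < flow < l2):
        flags.append("outside_preset_bounds")
    return flags

history = [float(v) for v in range(100, 200)]  # 100 past daily flows, GWh
print(flag_flow(0.0, history, nu_max=250.0, l1=50.0, l2=220.0))
# → ['zero_flow', 'below_q01', 'outside_preset_bounds']
```

In practice the quantile thresholds would be estimated per node (and plausibly per season), since daily aggregates at demand nodes vary strongly with temperature.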

3.5.5 Analytic Performance Estimation

Finally, we approximate the sensitivity of the baseline method to the presence of a single large flow error, which is the usual assumption when a large UAG value is calculated. To facilitate this, we take daily UAG to be normally distributed, with zero mean and variance as approximated by (2.16). This assumption is justified by our statistical analysis, and furthermore it facilitates the analytic evaluation of sensitivity, as it can be expressed as:

\[
P(U_t + \epsilon_t > d_t^{+}), \tag{3.31}
\]

where $\epsilon_t$ represents the size of the error. Meter Error Model 1 was employed to calculate confidence intervals, as it has a low RF as seen in Table 3.9, and therefore the detection probabilities are likely to be an upper bound for the other models. Moreover, the variation in RF between the most appropriate models was minimal, and thus the conclusions from our subsequent analysis are universally applicable. Under our assumptions, (3.31) is precisely equivalent to:

\[
P(U_t > d_t^{+} - \epsilon_t) = 1 - P(U_t \le d_t^{+} - \epsilon_t) = 1 - F_{U_t}(d_t^{+} - \epsilon_t), \tag{3.32}
\]

where $F_{U_t}$ is the normal cumulative distribution function of $U_t$.

We define $\epsilon_{i,t}$ individually for each node $i$:

\[
\epsilon_{i,t} = (c - 1)\hat{\nu}_{i,t}, \tag{3.33}
\]

where $c$ is the scaling error proportion as defined in (2.9). This allows us to calculate a mean detection probability across a series of nodes, or the entire network, as

\[
\frac{1}{n} \sum_{i=1}^{n} \left( 1 - F_{U_t}(d_t^{+} - \epsilon_{i,t}) \right). \tag{3.34}
\]

The above measures all concern a single day $t$, whilst it is often a summary metric over a time frame that is required. To overcome the fact that node flow is highly variable and discontinuous, we calculate nominal flows for each node $i$. This is done by averaging non-zero flow; for each node, define

the time subset $T'_i$ as:

\[
T'_i = \{ t \in \{1, 2, \ldots, T\} \mid \hat{\nu}_{i,t} \neq 0 \}. \tag{3.35}
\]

Then, nominal flow can be calculated as:

\[
\hat{\nu}_i^{\text{nominal}} = \frac{1}{|T'_i|} \sum_{t \in T'_i} \hat{\nu}_{i,t}. \tag{3.36}
\]

In Figure 3.16, the sorted detection probabilities for each node are plotted against the node's nominal flow as a percentage of throughput, for scaling errors of increasing size. As expected, larger errors correspond to greater detection probabilities, as do nodes with larger nominal flows. However, the overall detection rates of a single error are low. To further visualise the sensitivity of the baseline method to single-meter error, Figure 3.17 plots the same detection probabilities, but this time the y-axis represents cumulative percentage

Figure 3.16: Error detection probabilities for 5, 10, 20 and 50% scaling errors applied to nominal node flows in the NTS, 2018. Nominal node size is represented as a % of throughput on the y-axis.


network throughput. As an example, we can see that in the nodes accounting for the top 50% of flow, there is an approximately 95% chance a 50% error will result in UAG exceeding the baseline. From the two aforementioned figures, it is apparent that for the vast majority of nodes, even relatively large single-node scaling errors will not usually cause UAG to exceed the bounds defined by the baseline model, and hence will not trigger an alarm.
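Under the normality assumption, the detection probability (3.32) and the network mean (3.34) reduce to evaluations of the normal CDF. A self-contained sketch using only the standard library; the flows, UAG standard deviation, and decision bound are illustrative, not NTS values.

```python
# Sketch of the single-error detection probability (3.32)-(3.34) under the
# N(0, sigma^2) assumption on daily UAG. All numeric inputs are illustrative.
import math

def norm_cdf(x, sigma):
    """CDF of a zero-mean normal with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def detection_prob(nu, c, d_plus, sigma_uag):
    """P(U_t + eps > d+) for a scaling error factor c at a node with flow nu."""
    eps = (c - 1.0) * nu                  # (3.33)
    return 1.0 - norm_cdf(d_plus - eps, sigma_uag)   # (3.32)

sigma_uag = 10.0           # GWh, assumed std of daily UAG
d_plus = 1.96 * sigma_uag  # upper decision bound at ~95% coverage
flows = [5.0, 50.0, 200.0]  # nominal node flows, GWh/day

probs = [detection_prob(nu, c=1.2, d_plus=d_plus, sigma_uag=sigma_uag) for nu in flows]
mean_prob = sum(probs) / len(probs)        # (3.34) over this node subset
print([round(p, 3) for p in probs], round(mean_prob, 3))
```

The pattern matches Figure 3.16: a 20% error at a small node is almost invisible against the UAG noise, while the same relative error at a large node is detected with near certainty.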

EaR

A weakness of the network mean detection probability (3.34) is that it is not weighted by flow; thus, low-volume nodes have a disproportionate effect. Therefore, we introduce the EaR (Energy-at-Risk) metric, calculated for a single day $t$, following on from the definition of $\epsilon_{i,t}$ in Equation (3.33):

\[
\mathrm{EaR}(t, c) = \sum_{i=1}^{n} \left( 1 - \left[ 1 - F_{U_t}(d_t^{+} - \epsilon_{i,t}) \right] \right) \frac{\hat{\nu}_{i,t}}{\sum_{j=1}^{n} \hat{\nu}_{j,t}}. \tag{3.37}
\]

The range of the metric is $0 \le \mathrm{EaR} \le 1$, with 0 being a perfect value, and 1 indicating total exposure to errors at the calculated level. Importantly however, this figure does

Figure 3.17: Percentage cumulative nominal network energy plotted against node detection probability, 2018.


not indicate the proportion of energy that can be simultaneously unaccounted for at the given error level; rather, it is the overall proportion of energy derived from nodes where an error of such magnitude could occur without $U_t$ crossing the baseline, ceteris paribus. Equation (3.37) provides an instantaneous value for the metric. Of greater use, especially when considering overall grid structure and performance, is EaR averaged over the time period $t \in [1, T]$. We define a weighted average EaR as follows:

\[
\overline{\mathrm{EaR}}(T, c) = \sum_{t=1}^{T} \mathrm{EaR}(t, c) \, \frac{\sum_{i=1}^{n} \hat{\nu}_{i,t}}{\sum_{t=1}^{T} \sum_{j=1}^{n} \hat{\nu}_{j,t}}. \tag{3.38}
\]

Alternatively, nominal values as defined in (3.36) could be substituted into (3.37) for a nominal operating EaR. A further feature of EaR is that it does not have to be calculated over the entire network; rather, a subset of nodes can be utilised. It is this latter nominal EaR that we subsequently use in our analysis of the NTS. When the NTS was taken as a whole, nominal EaR at the 5%, 10%, 20%, and 50% error levels for the period 2016-2019 was 63.4%, 48.4%, 34.0%, and 16.2% respectively. This leads to

Table 3.11: Mean detection probabilities, standard deviation of detection probabilities, and EaR by node type, NTS nominal flow values, 2016-2020. Increasing error sizes result in higher detection probabilities and lower EaR values; ideally, EaR should be as low as possible at a given error level. Node groups with larger mean flows had higher detection probabilities across the board.

                      Error Scale Factor c
Node type           5%      10%     20%     50%
Mean Detection Probability
  Industrial       16.7%   17.7%   19.7%   26.3%
  IC               36.4%   55.5%   69.3%   72.0%
  LNG              30.4%   46.4%   64.9%   81.1%
  Power            17.8%   20.0%   24.8%   40.0%
  Offshore         38.2%   55.5%   75.9%   94.1%
  LDZ Offtakes     19.9%   23.7%   30.3%   44.6%
  Storage          20.6%   26.3%   36.7%   56.2%
Standard Deviation of Detection Probability
  Industrial        1.4%    3.1%    6.9%   18.1%
  IC               19.3%   34.4%   41.5%   43.3%
  LNG              14.5%   28.6%   34.9%   36.0%
  Power             1.9%    4.1%    9.3%   21.3%
  Offshore         21.3%   27.0%   23.8%   18.0%
  LDZ Offtakes      8.8%   14.3%   20.7%   29.6%
  Storage           6.1%   13.8%   24.1%   32.2%
EaR
  Industrial       81.2%   77.8%   70.1%   46.8%
  IC               53.3%   24.6%    4.0%    0.0%
  LNG              58.3%   31.6%   11.8%    1.5%
  Power            80.5%   76.3%   66.6%   40.9%
  Offshore         44.9%   24.7%   10.0%    0.8%
  LDZ Offtakes     86.0%   82.5%   74.4%   53.6%
  Storage          75.6%   65.2%   48.3%   26.7%

a similar conclusion as that obtained from Figures 3.16 and 3.17, in that small- to medium-sized errors are undetectable for a large proportion of nodes. A tabulated calculation of EaR by group, along with the mean and standard deviation of detection probabilities, can be seen in Table 3.11. In general, we observe the trend that demand-side nodes have lower detection probabilities and higher EaR; this is explained by the fact that they individually account for a lower amount of energy throughput, on average.
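Equation (3.37) likewise reduces to a flow-weighted sum of normal CDF evaluations, since $1 - [1 - F_{U_t}(\cdot)]$ is simply the non-detection probability. A minimal sketch with illustrative inputs:

```python
# Sketch of the single-day EaR metric (3.37): the flow-weighted probability
# that a scaling error of factor c at each node would NOT push UAG over the
# decision bound. All numeric inputs are illustrative.
import math

def norm_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def ear(flows, c, d_plus, sigma_uag):
    total = sum(flows)
    # weight each node's non-detection probability F(d+ - eps) by its flow share
    return sum(norm_cdf(d_plus - (c - 1.0) * nu, sigma_uag) * nu / total
               for nu in flows)

flows = [5.0, 50.0, 200.0]   # node flows, GWh/day
val = ear(flows, c=1.5, d_plus=19.6, sigma_uag=10.0)
print(round(val, 3))
```

With a 50% error (c = 1.5), the large node is almost certain to trip the bound, so most of the network's energy is not at risk and EaR is small; at smaller error factors the value rises sharply.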

EaR vs measurement uncertainty

Before concluding, we append to this chapter a brief discussion on the variation of EaR against measurement uncertainty, and how this can come into use from an operational or regulatory standpoint. EaR can be seen as a measure of the baseline method's efficacy, and an overall measure of the ease of error detection in general in a transmission grid. Therefore, considering the variation of EaR against uncertainty can be helpful when determining the maximum uncertainties for nodes from a regulatory and operational perspective. For example, rather than specifying a maximum uncertainty, a regulator could specify a minimum EaR value that must be attained. Moreover, a regulator could specify an EaR in addition to a maximum uncertainty; this would naturally result in operators installing more sensitive equipment at higher throughput nodes due to their greater influence on EaR. Figure 3.18 depicts EaR evaluated for a range of measurement uncertainties, with the blue line indicating the current maximum permissible uncertainty in the NTS. We note that the curves have an exponential form, and therefore the concept of diminishing returns on investment must be considered.
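The trade-off depicted in Figure 3.18 can be illustrated by sweeping a common relative meter uncertainty u and recomputing the baseline bound at each level. The propagation sigma_U = u * sqrt(sum of nu_i^2) used below is a deliberately simplified stand-in for the full uncertainty model (2.16), purely for illustration.

```python
# Hedged sketch of the EaR-vs-uncertainty trade-off: tighter (smaller) meter
# uncertainty u narrows the UAG baseline and lowers EaR. The propagation
# sigma_U = u * sqrt(sum nu_i^2) is a simplifying assumption, not (2.16).
import math

def norm_cdf(x, sigma):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def ear_at_uncertainty(flows, c, u):
    sigma_u = u * math.sqrt(sum(nu * nu for nu in flows))  # assumed propagation
    d_plus = 1.96 * sigma_u                                # ~95% decision bound
    total = sum(flows)
    return sum(norm_cdf(d_plus - (c - 1.0) * nu, sigma_u) * nu / total
               for nu in flows)

flows = [5.0, 50.0, 200.0]
for u in (0.01, 0.005, 0.002):          # 1%, 0.5%, 0.2% meter uncertainty
    print(u, round(ear_at_uncertainty(flows, c=1.1, u=u), 3))
```

The diminishing-returns shape appears naturally: once the bound is already tight enough to catch errors at the large nodes, further uncertainty reductions mostly chase the small residual exposure at low-flow nodes.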

Figure 3.18: EaR vs uncertainty specified in the baseline model. The dashed blue line indicates levels in the NTS.

3.6 Conclusion

We have considered appropriate aggregation frequencies and functions for UAG, and examined the interpretations that can be derived from each. This has highlighted the inappropriateness of focusing on the simple sum of UAG over a time frame in cases where the objective is to assess the operational efficiency of the metering process. We can recommend focusing on metrics derived from absolute UAG.

We have conducted a thorough statistical analysis of UAG, and the key finding is that it does not exhibit any quantifiable seasonality, which is contrary to trends in other international grids [2]. A baseline evaluation process has been proposed, which is an aggregation approach combining the estimated baseline derived from the uncertainty study approach with a baseline calculated through time series models trained on 'normal' UAG data. We have investigated the fit exhibited by these baselines to real UAG data from the NTS, and also evaluated their error detection rate with respect to historic NTS single-day errors.

We have found further evidence suggesting that LP estimation errors are a significant contributor to UAG in the UK, following variable importance testing on rolling linear regression using the LMG statistic. Composite uncertainty models, in addition to historic statistical models, were in general overconfident with regard to the UAG baseline when applied to NTS data (mean RF was 90%); however, this is a relatively small deviation, and as such UAG can be assumed to be performing roughly within the expected boundaries. Our analysis of historic errors showed that a majority of identified errors, both in number and in energy, are not predicted as such by the baseline error detection methodology, suggesting additional monitoring tools are needed.

Finally, the mean detection probability and EaR metrics were defined, providing a summary of the overall sensitivity of the baseline method.
Calculation of these metrics for nominal flow values further reinforced the above conclusions. We also suggested how EaR may be used to set measurement uncertainty requirements from a regulatory perspective.

Chapter 4

Systematic Error Detection

The following chapter addresses the problem of identifying a systematic error by focusing purely on the univariate UAG time series, through the use of online and offline changepoint methods. This chapter largely follows from our paper, 'Applications of statistical process control in the management of unaccounted for gas' [10], which was published in the Journal of Natural Gas Science and Engineering. Further, we note that 'systematic' in this chapter refers to errors that have a persistent effect over multiple days.

4.1 Motivation and Case Study

Whilst unlikely, large systematic errors at metering stations resulting from instrumentation faults are possible and may even go unnoticed. Over the period 2009-2010, two large metering errors developed in the UK transmission grid, namely at the Aberdeen and Braishfield B NTS distribution offtakes. The former took place over nearly 12 months, commencing following annual routine maintenance on 21-07-2009. The resulting metering errors were not identified until the subsequent scheduled maintenance on 10-08-2010, equating to a duration of 323 days. The cause was an incorrectly positioned orifice plate, which resulted in the meter recording an under-read of around 25% from 21-07-2009 to 20-07-2010, and around 70% from then until 10-08-2010, as per the final technical report [43]. The magnitude of the missing energy ranged from 2.5 GWh to 7.5 GWh per Gas Day at peak demand levels, adding up to a total of 1617 GWh. In the case of the Braishfield B error, the affected period was from 26-01-2010 through


Table 4.1: Systematic errors in exit nodes in the NTS, 2011-2016. Source: Ofgem

Year   Instances of error   Absolute total (GWh)
2011          10                   27.36
2012          10                   46.25
2013           8                  182.70
2014           7                   58.33
2015           9                   23.96
2016           5                   73.32

to 26-04-2010, giving a total of 90 days. The cause in this instance was again the incorrect configuration of the metering installation following maintenance: a differential pressure transmitter equalising valve was left open, as detailed in [44]. This resulted in an estimated systematic under-read of 40.735%. The error measured in terms of energy per day varied in magnitude between 10 GWh and 15 GWh, resulting in an increase in observed UAG of around 100% when compared to normal UAG values. The total energy lost was 1141 GWh. To put the above errors in perspective, Table 4.1 lays out the number and total magnitude of systematic errors discovered in the grid during the period 2011-2016. It is clear that there is a large disparity between the severity of the 2009-2010 errors and those one might typically expect. Figure 4.1 illustrates U_t along with the reconciled UAG throughout the aforementioned time period, with moving averages added for clarity. The error sizes during the time period are depicted in Figure 4.2, along with the historic integrated daily flows for the two sites. Visually, the following takeaways are evident:

• On inspection of Figure 4.1a, the start time of the Aberdeen error (pink shaded area) cannot be easily identified from the UAG or its 7-day average. However, both the start and end times of the Braishfield B error (yellow shaded area) are easy to spot visually using the 7-day average (blue line).

• Systematic shifts in the UAG time series are clearly evident upon visual inspection of both UAG (red line) and its 7-day average (blue line) in Figure 4.1a; it can be deduced that the system underwent a change in state during the period of the errors.

• Inspecting Figure 4.1b with the post-reconciliation flows (blue line), there are clearly unexplained changes in state around 09-2009 to 10-2009 that could cause false positives.

• From Figure 4.2b, the two sites exhibit wildly different flows; whereas Aberdeen (blue

Figure 4.1: Top (4.1a): Historic UAG over 2009-2010, the period including the meter errors. Note the high UAG variance; error is more evident when looking at the average line. Bottom (4.1b): Reconciled vs historic UAG average lines. The magnitude, along with timing of the errors is evident. In both plots, the highlighted periods (pink/yellow) cover the error durations.


line) exhibits stable flow with steady increases and decreases reflecting temperatures, the Braishfield B site (red line) fluctuates widely.

• If we use the actual error (blue line) in Figure 4.2a as a reference, we can visually identify the shift caused by the Braishfield B error in all other graphs; however, similar discontinuities exist in Figure 4.2b which cannot be explained by an error.

During this time, there was no statistical monitoring of the sites and UAG with a view to detecting such changes. In the following sections, methods to identify these errors, namely changepoint analysis and process control, are explored. The motivation behind quick identification of the error is obvious: the reconciliation process involved in rectifying such large billing errors is costly. Furthermore, in cases where the UAG remains unattributed, it is absorbed by the grid operator, and in most cases passed down to the domestic consumer. Recovering and redistributing misplaced funds can be difficult if not impossible in some

Figure 4.2: Top (4.2a): Isolated error magnitudes for Aberdeen and Braishfield B. Bottom (4.2b): Individual integrated daily flows for the above sites. Note Aberdeen is a continuous, highly seasonal series, while Braishfield B appears much more stochastic in nature.


0e+00 2009−07 2010−01 2010−07 Date circumstances. Hence, a large financial stimulus exists to minimise the time to detection of a metering fault, both as a direct result of a lower UAG and in the form of regulator performance based incentives.

4.2 Methodology

The field of changepoint analysis has, particularly recently, been the subject of much study from the statistical and data science communities. Numerous reviews have been published (see for example [1], [84]) with a great number of demonstrated applications in a broad variety of fields, including geophysical [50], aeronautical [59] and cybersecurity [69] amongst others. There exist a multitude of competing algorithms, which can all be broadly categorised along the following axes: online/offline, univariate/multivariate, parametric/non-parametric, supervised/unsupervised and single/multiple changepoint. The underpinnings of these algorithms are broad [1]; examples include likelihood ratio, subspace models,

Figure 4.3: Top: NTS total demand, 2009-2010. Bottom: active (flowing) nodes, 2009-2010. Correlation coefficient 0.867.

probabilistic methods and kernel-based methods. Clearly, the practitioner has a multitude of choices; however, it can be unclear which is the best method to choose in a particular case. As such, several algorithms deemed to be suitable for the scenario at hand have been compared against each other.

Following the introduction of a changepoint resulting from a systematic error at a node $i^*$, the calculated UAG process $\tilde{U}$ takes the form

\[
\tilde{U}_t = \hat{U}_t + \frac{1 - c}{c} \hat{\nu}_{i^*,t}, \tag{4.1}
\]

where $\hat{U}$ and $\hat{\nu}_{i^*,t}$ denote respectively the UAG and metered node flow prior to the application of the systematic scaling error $c$, which is as defined in (2.9), and where we have excluded the random noise component for clarity. The process is no longer purely stochastic in nature, due to the introduction of a deterministic trend. As a result, the following features of the UAG process can be expected to undergo a change:

• Mean: Provided a node is flowing constantly above a non-zero amount that would have been measured at $\hat{\nu}_{\min}$, this would result in a mean shift of at least that amount multiplied by the error scaling factor $c$, ceteris paribus. In the case of Aberdeen, $\hat{\nu}_{\min} c = 442067.5$ kWh post-error, and in the case of the Braishfield B error we have $\hat{\nu}_{\min} c = 7288481$ kWh. Variations above these levels will also affect the respective long-term means.

• Variance: Likewise, the variance can be expected to change due to the introduction of the deterministic trend, which will exhibit its own day-to-day variance.

• Independence and correlation structure: Node flow series will usually exhibit a high degree of autocorrelation, and observations will not usually be independent. Provided the scaling factor $c$ is large enough, these features should become observable in the UAG process.
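The deformation described by (4.1) can be simulated by injecting the deterministic term into a synthetic UAG series. In the sketch below, the node flow is taken as the metered (post-error) flow, consistent with the algebraic form of (4.1); all values are synthetic, not NTS data.

```python
# Sketch of how a persistent scaling error at one node deforms the UAG series,
# per (4.1): observed UAG = true UAG + ((1 - c) / c) * metered node flow.
# All series are synthetic.
import random

random.seed(1)
true_uag = [random.gauss(0.0, 5.0) for _ in range(10)]          # GWh, N(0, 5^2)
node_flow = [20.0 + random.gauss(0.0, 2.0) for _ in range(10)]  # metered flow, GWh

c = 0.75   # meter under-reads: it records 75% of the true flow
observed_uag = [u + (1.0 - c) / c * nu for u, nu in zip(true_uag, node_flow)]

shift = sum(observed_uag) / 10 - sum(true_uag) / 10
print(round(shift, 2), "GWh mean shift")  # shift = (1 - c) / c * mean node flow
```

With c = 0.75 the induced mean shift is one third of the node's metered flow, illustrating why errors at low-flow nodes barely move UAG while the same relative error at a large offtake produces an obvious level change.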

We will first consider the UAG detection problem from the perspective of offline, or retrospective, changepoint analysis, followed by that of online, or streamed, changepoint analysis. The goal in each case is slightly different: offline methods can assess the overall detectability of an error, and also set a benchmark for online performance given they have access to the unrestricted dataset. Conversely, online methods treat the time series as a stream, thus replicating the real-life scenario of using UAG to draw inference on system state. We will now give a brief overview of the algorithms used, and the key metrics used to assess and compare their performance when applied to the aforementioned historical data. This includes two known errors, reflected by four changepoints, as both the start and end of each will result in a change to the process. Results are presented in Section 4.3.1.

4.2.1 Algorithms

Notation used henceforth is now introduced. True changepoints, located at time t of a time series and numbering k = 1 through to n, are denoted τ_k. Predicted changepoints, i.e. those estimated by algorithms, are denoted τ̂_k. In the online case, we let τ̂_k^d denote the detection time of a predicted changepoint, with τ̂_k^d ≥ τ_k.

Four offline multiple changepoint algorithms are hereafter applied to the dataset: two parametric methods (PELT [46] and Binary Segmentation (BinSeg)) based on likelihood-ratio tests, one non-parametric method (ECP) [54] and finally, a Bayesian method (BCP) based upon the methodology outlined by Hartigan [6]. Implementations were obtained from the R packages bcp [25] and changepoint [45].

In the case of online analysis, algorithms based on the online univariate changepoint detection framework (CPM) as described and implemented by [74] were utilised. These methods work by adding data points to the stream until a change is detected based on a test statistic being exceeded, whereupon the process is reinitialised with the new starting point at the true changepoint τ_k. The difference between algorithms in this case is the test statistic used to determine when a change has occurred. Since prior evidence suggested the UAG is weakly Gaussian, three parametric test statistics were chosen: Student-t, GLR and Bartlett. The latter two are able to detect changes reflected in the variance, in contrast to Student-t, which focuses solely on shifts in the mean of the process. Cumulative sums (CUSUM) were used as a further benchmark, seeing as they were one of the earliest techniques for statistical process control, having been described as early as 1931 [80], and have since found widespread use in industrial applications, remaining in use to this day. Indeed, advanced methods may be superfluous if they do not exceed established methodologies performance-wise, making this an important comparison.

Finally, the previously mentioned offline methods were also run sequentially, in an attempt to gauge their performance. This was not their original intended functionality, but did provide some interesting results.
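As a minimal illustration of the streamed setting, the sketch below implements a one-sided tabular CUSUM (the benchmark named above, not the CPM implementation of [74]) with the reinitialise-on-detection behaviour described; the parameter names k and h and the toy stream are our own:

```python
def cusum_stream(stream, k=0.5, h=5.0):
    """Tabular CUSUM for upward mean shifts on a roughly standardised stream.

    k is the reference value (allowance) and h the decision interval; after
    each alarm the statistic resets, mimicking the restart used by CPM methods.
    """
    alarms, s = [], 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + x - k)
        if s > h:
            alarms.append(t)  # detection time tau_hat^d
            s = 0.0           # reinitialise monitoring
    return alarms

# A mean shift of 2 units is introduced at t = 100; first alarm at t = 103 (lag 3).
stream = [0.0] * 100 + [2.0] * 50
print(cusum_stream(stream))
```

Because the shifted stream keeps exceeding the threshold after every reset, repeated alarms follow the first; in practice monitoring would pause for investigation after the initial detection.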

4.2.2 Metrics

Next we define the metrics we use in the results section to compare one technique against another.

Offline

Let each time value (day) be classified as either erroneous (positive) or normal (negative). This classification itself holds one of two states: true or false. As a result we have four possible combinations: let TN represent True Negatives, TP – True Positives, FN – False Negatives and FP – False Positives. Typically, metrics based upon ratios of the above classifications are used to compare performance across competing algorithms. The following are amongst the most common, especially when focusing on changepoint algorithms.

Accuracy = (TN + TP) / (TN + TP + FP + FN)    (4.2)

Precision = TP / (TP + FP)    (4.3)

Sensitivity = TP / (TP + FN)    (4.4)

Other metrics such as the Rand index [71] are also often used to compare across changepoint algorithms. However, we will restrict ourselves to the former three metrics when discussing offline errors.
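These ratios are straightforward to compute from per-day labels; a small sketch, where the day ranges are invented for illustration:

```python
def classification_metrics(truth, pred):
    """Accuracy, precision and sensitivity per Eqs (4.2)-(4.4).

    truth/pred are per-day booleans: True marks an erroneous (positive) day."""
    tp = sum(t and p for t, p in zip(truth, pred))
    tn = sum((not t) and (not p) for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    return {
        "accuracy": (tn + tp) / (tn + tp + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }

# Ten-day toy example: days 3-6 are truly erroneous, days 4-7 were flagged.
truth = [d in range(3, 7) for d in range(10)]
pred = [d in range(4, 8) for d in range(10)]
print(classification_metrics(truth, pred))  # accuracy 0.8, precision/sensitivity 0.75
```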

Online

Whilst a plethora of metrics exist to gauge the performance of offline methods, this task is somewhat harder in the online case, as two additional factors need to be considered: the location of the identified changepoints and the detection lag. Detection lag represents the amount of time it would have taken before a true positive (changepoint) is detected, and the alarm can be raised leading to human intervention. The location refers to its position within a time series, relative to the true changepoint. Classifying predicted changepoints can be a challenge; for a predicted changepoint to qualify as a true positive, it must be detected at a lag

Λ = τ̂_k^d − τ_k ≥ 0,

and within a certain location tolerance ζ such that

−ζ < τ̂_k − τ_k < ζ.

When true changepoints are close together and within the given tolerance,

τ_{k+1} − τ_k < ζ,

the interpretation of resulting predicted changepoints can be questionable. The value of ζ is therefore heuristic in nature; whereas in our case a difference in the location estimate greater than 20 observations can be unacceptable, this can be perfectly agreeable when working with high-frequency sensor data. As a result, the validity of accuracy metrics is contingent upon a good selection of ζ. In our case, ζ was taken to be 25 days (which can still be seen as excessive given the nature of the errors under investigation).

Another challenge when comparing methods is that predicted changepoints may or may not converge closer to the true location as more data is acquired. As two algorithms may exhibit contradictory performance at different points in time, the comparison between them becomes tricky. To overcome this, we will only consider the location estimate at the first instance a changepoint is picked up. In a practical setting, this will be the initial value to be considered by human analysis and the basis upon which system maintenance decisions will be made. Over a dataset including multiple changepoints, the average lag Λ̄ and mean absolute location error (ML) can be used in addition to FP/TP derived metrics to determine the quality of the segmentation.

Λ̄ = (1/n) Σ_{k=1}^{n} (τ̂_k^d − τ_k)    (4.5)

ML = (1/n) Σ_{k=1}^{n} |τ̂_k − τ_k|    (4.6)

These metrics can be used to improve an algorithm's practical performance by increasing or decreasing sensitivity to achieve the desired compromise between detection lag and accuracy. In the online case, sensitivity is usually specified in the form of an Average Run Length (ARL) parameter. The ARL specifies the number of expected observations before a false positive is detected. For algorithms following the framework of [74], ARLs of 500 and 1000 were trialled. These values were determined to be reasonable when taking the nature of the system into account. For the non-changepoint techniques, a threshold of 5 standard deviations of the summary statistic was used when running cumulative sums.
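Combining the qualification rules above with (4.5) and (4.6) gives a simple scoring routine. The matching of predictions to true changepoints used here (greedy, first qualifying detection per true changepoint) is one reasonable convention on our part rather than a prescribed one:

```python
def online_location_metrics(true_cps, detections, zeta=25):
    """Mean detection lag (4.5) and mean absolute location error (4.6).

    detections: list of (tau_hat, tau_hat_d) pairs, i.e. first location
    estimate and detection time. A detection counts as a true positive when
    tau_hat_d >= tau_k and |tau_hat - tau_k| < zeta.
    """
    lags, loc_errs = [], []
    for tau in true_cps:
        for tau_hat, tau_hat_d in detections:
            if tau_hat_d >= tau and abs(tau_hat - tau) < zeta:
                lags.append(tau_hat_d - tau)
                loc_errs.append(abs(tau_hat - tau))
                break
    n = len(lags)
    return (sum(lags) / n, sum(loc_errs) / n) if n else (None, None)

# Two true changepoints, detected with lags 5 and 9 and location errors 2 and 4.
print(online_location_metrics([100, 200], [(98, 105), (204, 209)]))  # -> (7.0, 3.0)
```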

4.3 Detection Problem

We now move on to the results of applying the aforementioned changepoint algorithms to the actual historical data from 2009-2010, and discuss their implications.

Table 4.2: Offline analysis detected changepoint locations, and performance metrics.

                       Detected Changepoint Location, +/- days
                       BinSeg   ECP   PELT   BCP
Aberdeen Start         N/A      N/A   N/A    N/A
Braishfield B Start    -1       0     -1     0
Braishfield B End      0        3     1      0
Aberdeen End           1        0     0      1
False Positives        2        1     2      1

4.3.1 Offline

The results of the offline analysis of the observed data can be seen in Table 4.2. We note that the Braishfield B error is picked up and located with greater accuracy by all algorithms. Conversely, the Aberdeen error start is not picked up at all by any algorithm – however, the end point is accurately located by all algorithms, with a maximum location error of one day. As predicted, all of the algorithms pick up at least one false positive, around two months after the start of the Aberdeen error.

4.3.2 Online

Table 4.3 contains detailed results of the performance of the algorithms (listed in Section 4.2.1) against the test dataset. The results are similar to the offline setting, in that most algorithms correctly identified all changepoints bar the Aberdeen error start. Typical detection lag was about 5 days for the best performing algorithms, whilst the location estimates were predictably worse than in the offline case, being an average of around 10 days off the true value. However, the location estimate can be expected to improve with the addition of more data (which in a streamed setting equates to waiting for more time units to pass). Algorithms from the CPM framework performed best at an ARL of 1000, with drastically better results when compared to those using an ARL of 500. Parametric methods focusing on mean shifts, and in particular the Student-t statistic, fared better than those also accounting for variance (Bartlett and GLR). Online PELT fared surprisingly well, and indeed may be the best performing algorithm; whilst it had the lowest mean lag, its location estimates were further out than those produced by CPM Student-t. The CUSUM benchmark was exceeded by GLR (1000) and Student-t (1000).

Figure 4.4a illustrates the results of both online and offline methods against the historic data. From here, it is clear that the performance of offline methods is generally comparable. Conversely, online methods vary considerably between themselves, and there are more false positives.

Figure 4.4: UAG against detected changepoints for both the historic (left, 4.4a) and reconciled (right, 4.4b) datasets.

4.3.3 Reconciled UAG

It is important to benchmark this corrected dataset with the same algorithms, to determine whether or not the changepoints introduced by the errors have been removed, and therefore whether the correction factors used are appropriate. This highlights an interesting retrospective use of changepoint analysis.

By applying the correction factors as recommended in the independent reports [43] and [44], we can rearrange Equation (4.1) to find the reconciled UAG time series U_t, using the correction factor in place of c. The reconciled time series is presented in Figure 4.1b, indicated by the blue line. On first inspection, the shocks in the mean are still visible following the reconciliation. The mean of the reconciled data is around 15.2 GWh, with a relatively high standard deviation of 10.3 GWh per day. After running some initial tests on the data set, the value of p = 0.435 from a Box-Pierce test does not reject the hypothesis that observations are independent. It is clear even from a visual inspection that Û is not stationary (Figure 4.1), and this observation is substantiated by a changepoint analysis, the results of which are summarised in Figure 4.4b.

Table 4.3: Results of online methods. The first four columns tabulate whether the respective changepoints were detected: A start, B start, B end and A end denote the Aberdeen Start, Braishfield B Start, Braishfield B End and Aberdeen End changepoints respectively, where a 1 indicates a successfully identified changepoint, and a 0 the contrary. The remaining columns report FP, FN, TP, Accuracy, Precision, Sensitivity, ML and Λ̄ for the algorithms CUSUM, Student-t(500/1000), Bartlett(500/1000), GLR(500/1000), KS(500/1000), ECP and PELT.
Two changepoints are consistently identified by most methods: all detected a mean/variance shift in mid-September 2009, and most detected a similar shift in the opposite direction around July 2010. The cause remains unknown. However, given that this coincides with a sudden increase in both demand and the number of active offtakes, as seen in Figure 4.3, there is evidence suggesting the presence of an additional undocumented metering error. Importantly, we note that there are no detected changepoints coinciding with the known error start and end dates, indicating that there is no evidence to suggest the correction factors used are erroneous.

A further extension of the principle of verifying correction factors through changepoint analysis is its use in the estimation of said correction factors. Whilst this has not been explored further, it is an interesting consideration for future research.
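The reconciliation itself amounts to rearranging (4.1); a sketch, where nu_hat denotes the pre-error node flow and c stands in for the estimated correction/scaling factor:

```python
def reconcile_uag(u_tilde, nu_hat, c):
    """Rearranged Eq (4.1): recover the reconciled UAG from the erroneous
    series u_tilde, given the pre-error node flow nu_hat and scaling factor c."""
    return [u - (1 - c) / c * v for u, v in zip(u_tilde, nu_hat)]

# Round trip on toy numbers: applying then removing the error is the identity.
u_hat = [0.3, -0.1, 0.7]                 # error-free UAG (invented)
nu_hat = [50.0, 52.0, 48.0]              # pre-error node flow (invented)
c = 0.95
u_tilde = [u + (1 - c) / c * v for u, v in zip(u_hat, nu_hat)]  # apply Eq (4.1)
recovered = reconcile_uag(u_tilde, nu_hat, c)
print(max(abs(a - b) for a, b in zip(recovered, u_hat)) < 1e-9)  # prints True
```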

4.4 Effectiveness and Limitations

Penultimately, we will consider the limitations of univariate online changepoint detection in a gas transmission setting, via a simulation study. It is important to note that we make no attempt at solving the attribution problem through multivariate changepoint analysis; this is an interesting area of causal inference research outside the scope of the thesis.

Gauging the practical effectiveness of univariate stream monitoring with real data is challenging. Two key obstacles are the inherently variable and unknown nature of both the error profile and the underlying gas flow time series of future errors. Analytic values for key metrics such as mean lag and detection rate can be calculated only for highly contrived scenarios, such as a mean shift in a normal distribution. This contrasts with the setting of UAG, where U_t is only weakly normal and the potential error profile is unknown, as it depends on the deterministic trend of the underlying node time series. Furthermore, we typically cannot, as we will do in our simulation, assume a multiplicative error to be constant in time.

Detection times are inherently linked with false positive rates and the ARL₀. Lowering the ARL₀, or equivalent parameter, will greatly reduce lags at the expense of false positives. In addition, when working with real data such as historic UAG we run into the same issues as those faced in Section 3.5.2. Namely, false positives are hard to interpret as they cannot be definitively proven to be such, especially when they are in the distant past. To overcome some of these challenges, we will focus purely on the CPM framework. We can explicitly specify the theoretical ARL₀ (a value of 1000 was used) for all test statistics, making the results directly comparable, and the impact of false positives can be discounted as monitoring is reinitialised after every positive. Hence, we just need to ensure the sampled time interval is free of positives before adding in our simulated error.

We will now proceed to describe the sampling and testing methodology used to produce the results seen in Figures 4.5 and 4.6. The data used was the UAG and metering stations' time series for the period 2013-2019. No large known systematic errors are present in this time interval. Nodes with under 50% flow rates for each sampled period were excluded (mean = 146 nodes), as this can introduce unnecessary variance. UAG was recalculated separately so as to account for a simulated multiplicative error in uniform steps between 1%-50%, for each gas flow time series. Online distributional change testing was then performed sequentially at 1-day intervals on this simulated UAG time series.

Whilst no single statistic can encapsulate the unique error profile introduced by each time series, we used the ratio of the mean error to the standard deviation of the prior UAG time series to indicate the implied shock level S, seen below as a function of a given multiplicative error m_e and node i:

S(m_e, i) = ( (m_e / T) Σ_{t=τ}^{T} ν_{i,t} ) / σ_{U_{1:T}},    (4.7)

where T is the total time interval length, τ is the changepoint location within T, ν_{i,t} is the flow value of node i at time t, and σ_{U_{1:T}} represents the prior standard deviation of the UAG. One would expect that a larger value of S will result in shorter detection times. Note that S is indirectly controlled via incrementally increasing the multiplicative error m_e, and therefore S is a calculated value. Predicted location tolerance was set at ζ = 30.
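Equation (4.7) can be computed directly; the sketch below uses the sample standard deviation for σ_{U_{1:T}}, which is an assumption on our part, and an invented toy series:

```python
def implied_shock(nu, u_prior, m_e, tau):
    """Implied shock level S(m_e, i) per Eq (4.7)."""
    T = len(nu)
    mean_err = m_e * sum(nu[tau:]) / T  # (m_e / T) * sum_{t=tau}^{T} nu_{i,t}
    mu = sum(u_prior) / len(u_prior)
    sd = (sum((x - mu) ** 2 for x in u_prior) / (len(u_prior) - 1)) ** 0.5
    return mean_err / sd

# A 10% error over the second half of a constant flow, against unit-scale UAG noise.
S = implied_shock(nu=[10.0] * 100, u_prior=[-1.0, 1.0] * 50, m_e=0.1, tau=50)
print(round(S, 3))  # -> 0.497
```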

4.4.1 Simulation Procedure

The simulation procedure used in our study is presented below.

Step (1) Normalise both the UAG and node flow time series by scaling between 0 and 1, to allow for better control of S.

Step (2) Sample N = 50 random time intervals during the period in question, of length T = 100. For each time interval:

Step (3) Resample U_{1:T} without replacement.

Step (4) For each gas node ν_i with i ∈ [1, N], introduce a multiplicative error at time τ ∈ [1, T]. In our simulation, τ = 50 was used.

Step (5) Calculate the UAG including the error, and run the changepoint algorithm.

Step (6) Stop if a change is detected or the range of m_e ∈ [0.01, 0.5] is exhausted, recording the estimated location and lag in the case of a detected change. Recursively calculate S.

Step (7) Incrementally increase the magnitude of m_e by a uniform step size, in our case 0.01, and return to Step (5).

We resample the UAG time series at Step (3) to remove the correlation and dependence structure. Whilst the implication is that we do not ultimately use exact historical data, we do retain the prior distribution of the UAG for the given interval, and prevent the algorithm initialising into a state where the test statistic may be close to the decision boundary, which would result in both fallacious results and high variance. Resampling will also remove the influence of any deterministic trends stemming from existing systematic errors. Theoretically, UAG should be independent and uncorrelated, providing additional justification. This is key for making these results generalisable to other transmission grids, as it removes any underlying idiosyncratic effects.
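Steps (4)-(7) can be sketched as follows. The change detector used here is a crude 3-sigma mean-shift rule standing in for the CPM test statistics, and both series are invented for illustration:

```python
def detect_change(series, tau=50):
    """Stand-in detector (not a CPM statistic): flag a change when the
    post-tau mean exceeds the pre-tau mean by three pre-tau standard deviations."""
    pre, post = series[:tau], series[tau:]
    mu = sum(pre) / len(pre)
    sd = (sum((v - mu) ** 2 for v in pre) / (len(pre) - 1)) ** 0.5
    return sum(post) / len(post) - mu > 3 * sd

def min_detectable_error(u, nu, tau=50):
    """Steps (4)-(7): grow the multiplicative error m_e in 1% steps until the
    simulated error is detected; None if the range [0.01, 0.5] is exhausted."""
    for step in range(1, 51):
        m_e = step / 100.0
        u_err = [ut + (m_e * vt if t >= tau else 0.0)
                 for t, (ut, vt) in enumerate(zip(u, nu))]  # Step (5)
        if detect_change(u_err, tau):                       # Step (6)
            return m_e
    return None                                             # Step (7) loops otherwise

# Deterministic toy series: alternating-sign UAG noise, constant normalised flow.
u = [0.01 * (-1) ** t for t in range(100)]
nu = [0.5] * 100
print(min_detectable_error(u, nu))  # -> 0.07
```

Normalisation and resampling (Steps 1-3) are omitted here for brevity; in the full procedure they would precede the loop.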

4.4.2 Results

Figures 4.5 and 4.6 are presented and discussed in order of importance for our online sequential changepoint detection scenario. The detection probability against shock level is key, as this ultimately represents whether or not a change is likely to be detected; a well-chosen algorithm will rapidly converge to a detection probability of 1 as the implied shift increases.

These results are depicted in Figure 4.5. Immediately apparent is the significant difference between the performance of the competing test statistics. Most striking is the poor detection probability exhibited by the Bartlett statistic; for a shift of 1.25S, changes are identified at only the 10% level, compared to almost 100% for all other algorithms. This suggests that relying solely upon a change in variance is not suitable in our scenario. It also serves to highlight the importance of carrying out such simulated testing prior to implementing monitoring in a production environment. The best performing test statistic is Kolmogorov-Smirnov (KS), yielding significantly better detection rates, particularly at lower shift levels. Indeed, lower implied shifts (S < 1) are of greater importance, as this interval typically contains a higher proportion of errors.

Figure 4.5: Simulated detection rate curve depicting the probability of detecting true positives for increasing levels of implied shock

Furthermore, these limits are in line with the conclusions from the a posteriori analysis. The start of the Aberdeen error, at 2 GWh, is only about 0.2S of the then UAG, which correlates with a poor detection rate for all algorithms, and it was not detected initially. Conversely, Braishfield B averaged around 12 GWh, or 1S, and was almost immediately detected, also as suggested by the detection rates in Figure 4.5.

Figure 4.6a plots a smoothed GAM [90] curve of detection lag against implied shock. Minimising the lag allows the operator to act more quickly, in turn minimising both financial and reputational loss. Performance roughly mirrors that depicted in Figure 4.5. The initial slight positive gradient exhibited by GLR and Bartlett in Figure 4.6a is explained by a very low number of correctly identified changes at low shock levels, as evidenced by Figure 4.5. Change detection based on the KS statistic is superior roughly between shocks of 0 and 1S. The Student-t statistic provides lower detection lags for higher shifts. Based on these observations, both the KS and Student-t statistic should be employed to monitor for

Figure 4.6: Smoothed GAM curve of lag against the implied shock level S on the left (4.6a), and of absolute location error against implied shock S on the right (4.6b).

changes. Smoothed absolute location error versus implied shift is seen in Figure 4.6b. In terms of online detection, this is of lower priority: a high location accuracy will aid in the attribution of an error, but it is conditional upon the error being detected in the first place. The same trend as in Figure 4.5 is seen here; KS and Student-t are the best performing algorithms. High shifts result in estimates generally accurate to under a week, whereas the location estimate is generally unreliable for S < 0.5.

4.5 Discussion

The above results have two significant implications for gas transport network operators. Firstly, it is key to strive to minimise UAG variance: not only does this minimise overall transportation cost, it also allows for higher detection rates in the presence of a fault. Secondly, nodes with a low nominal flow relative to the UAG variance will not cause a detectable fault in the UAG, even at significant systematic error rates. Therefore, it is important to identify such sites, and subsequently investigate systematic deviations from either overall flow expectation or forecasts.

Next we provide recommendations for the steps involved in the implementation of an effective changepoint monitoring process on UAG, in a complex transmission grid such as the NTS. Our method contains 5 steps:

1. Carry out analysis on normal data (in this context, normal refers to data resulting from operations where the system is in an error-free, or normal, state), and determine its key statistical properties such as normality, stationarity, and heteroscedasticity. If possible, detection can be based on filtered data (first differences, model residuals, etc.).

2. Analyse past cases of anomalous data, and determine what kind of errors may be likely to occur. Carrying out a posteriori analysis on past errors can also be helpful to develop a better understanding of detection limitations.

3. Determine an action plan in the case of a changepoint being identified. Factor in the cost of a false positive, along with the cost of missing a true positive.

4. Based on the results of the analysis in 1. and 2., the type of statistical process control can be identified. The algorithms in place should ideally be tailored to the data and the expected changes: e.g., looking for a shift in mean in a normally distributed series. Sensitivity should be as high as can be tolerated for critical processes, to avoid missing crucial events whilst minimising the cost of false positives.

5. If persistent change is identified that cannot be attributed to a known error and is perhaps due to new operational conditions, this should be considered the new normal and monitoring reconfigured to adapt to this change. However, if no recent operational changes have occurred, a thorough investigation of potential causes would be advisable.

Despite taking these precautions, we have shown that relatively large errors at individual nodes can have little effect on the UAG, especially in times of low overall demand. This is in part due to the instantaneous gas flow distribution across nodes tending to be exponential in nature (most gas is delivered to and removed from the network by just a few points), and hence a fault is more likely to go unnoticed at a lower-flow site.

Therefore, it is important that these control processes are also carried out downstream of the transmission network, by the distribution network operators. This is because, in the case of a systematic error in a transmission grid, downstream UAG in a single-feed distribution system will also exhibit an error of equal magnitude but opposite sign. By frequent comparison of results, fault detection times are highly likely to be reduced. Data sharing and availability are also important factors to be considered by the operators.

4.5.1 Conclusion

We have shown that an application of online statistical control techniques to UAG can prove valuable in diagnosing the presence of meter faults – indeed, for large faults, detection times can number just a few days whilst still producing accurate estimates of the exact error location in the time domain.

However, the UAG process U_t in large transmission grids is prone to sustaining transient shocks in both mean and variance. Often, detection of these shocks will not be possible due to the system size and complexity, coupled with a lack of redundancy in metering and the effect of further errors idiosyncratic to each grid. Conflicting evidence exists regarding seasonality in UAG, and no effective models could be identified. Therefore, whilst monitoring for systematic shifts in mean and variance can be beneficial, interpreting results and determining a course of action should not be based solely on these quantitative results; rather, the operator will need to combine them with more practical knowledge about the grid's behaviour. Referencing Figures 4.5 and 4.6 and comparing them to the current UAG variance should provide further insight into whether a predicted changepoint should be acted upon. Maintaining a good working relationship with downstream balancing control will allow for a quicker diagnosis of faults.

The importance of identifying and testing a suitable algorithm for online monitoring has been highlighted. In the case of the UK's National Grid, we can recommend that monitoring be carried out through parametric online detection. We suggest the use of the CPM framework based on both the Student-t and Kolmogorov-Smirnov test statistics. This should be done on a weekly to bi-weekly basis, considering the daily nature of the UAG. Additionally, we demonstrated another use of changepoint analysis: as a further check for the validity of estimated correction factors.

Chapter 5

UAG-led Node Error Detection

In this chapter, we consider two unique error-detection techniques for daily node flows within the NTS which make use of the UAG balancing term, and the principle of energy conservation in general. These techniques are aimed at identifying significant errors in node flow which do not arise out of pure measurement error. The discovery of such errors is important to a transmission grid operator: firstly, they can in some cases be rectified through simple data validation and verification techniques, thereby providing accurate measurement to network participants; in other cases, future errors can be prevented by physical maintenance procedures where the source of the error is in the metering installation itself. These techniques can also be viewed through the lens of causality detection, as they search for the presence of an error term resulting in a significant causal relationship between the node flow and UAG. Importantly, we also discuss how these error detection techniques, alongside previously discussed procedures such as the baseline method, can be incorporated into a greater error-reducing process within the transmission grid.

5.1 Joint Energy Balancing

Conservation of energy is not only monitored within transmission grids. Indeed, the exact same energy balancing processes take place in downstream distribution grids, where UAG is also calculated. Moreover, the monitoring of energy conservation can be carried out for a variety of both supply and demand node types, and is not limited to distribution grids. By combining the results of daily energy balancing both upstream and downstream of a meter, a highly sensitive error detection and attribution methodology can be formulated.

An automatic consequence of carrying out energy conservation monitoring either downstream or upstream of the NTS is that this fulfils the goals not only of error detection, but also attribution. Moreover, since such energy conservation monitoring is already carried out in many instances, this removes the need for statistical and data-based error attribution carried out by the transporter. Furthermore, as will be demonstrated, this method provides very high sensitivity. Both systematic and random errors can be identified in this manner, as the form of error detection is simply contingent on the type of process control used: changepoint analysis can be combined to identify systematic errors, and the baseline method to identify random ones.

In general, relatively large errors at small- to medium-sized nodes can go undetected when relying purely on monitoring the NTS UAG, as shown in Chapter 3. However, such errors are much more easily detected when viewed from the perspective of the downstream energy process. This is because, in all cases barring distribution grids, the energy offtaken is the sole entry point into the downstream process.

Finally, a very important property is that when a metering error exists, this will always result in opposite-signed effects in the downstream and upstream energy conservation calculations. Before reviewing how energy conservation monitoring results can be combined, we first examine the individual balancing methods. Energy conservation methods differ according to node type, and therefore we consider these separately, starting with gas distribution grids.
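The opposite-sign property follows directly from the balance equation: the shared boundary meter appears as an output in the transmission balance and as an input in the downstream balance. A toy sketch with invented flows:

```python
def uag(supplies, demands):
    """Energy balance: UAG = sum(inputs) - sum(outputs), as in Eq (1.1)."""
    return sum(supplies) - sum(demands)

true_flow = 1000.0
metered = 950.0  # the shared boundary meter under-reads by 50

# Transmission grid: the meter is a demand (offtake); downstream: a supply.
u_transmission = uag(supplies=[true_flow], demands=[metered])
u_downstream = uag(supplies=[metered], demands=[true_flow])

print(u_transmission, u_downstream)  # equal magnitude, opposite sign
```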

5.1.1 Distribution Grids

Distribution network offtakes account for the largest share of daily demand flow, at roughly 50% as per Table 1.2. Therefore, having a precise attribution strategy for these nodes is crucial.

The natural energy balancing process employed in a distribution grid is the same as that in a transmission grid. Inputs and outputs are balanced, as in Equation (1.1), only with a different set of nodes. This allows for the calculation of a distribution grid UAG value. We denote this downstream UAG as U_t^D.

Exactly the same baseline control method can be employed in a distribution grid, following an uncertainty study and the development of an appropriate baseline. To evaluate the efficiency of the dual baseline approach across transmission and distribution grids, the downstream topology of NTS offtakes needs to be more closely considered.

Connected Components

As shown in Figure 1.2, certain NTS offtake nodes will feed grids that are further linked downstream between each other through medium-pressure pipelines, as opposed to purely single-feed systems. This is why the connectivity of grids downstream of NTS distribution offtakes must be analysed.

We define the greater distribution supergraph by an n × n connectivity matrix M, where the (i, j)th element M_{i,j} represents a unidirectional link from node i to node j within the downstream distribution grids. Connected components of this supergraph are defined as subgraphs in which any two vertices are linked to each other by paths, and which are connected to no additional vertices in the supergraph. Connected components should be treated as individual distribution grids, and UAG should be calculated independently for each component by the DNO, rather than on an LDZ basis.

The reasoning for this is illustrated by Figure 5.1, where connected components found within two NTS LDZs are shown. UAG values for connected components are entirely independent of each other, and there is no interaction between different connected components. This allows the narrowing down of a potential fault to only the small number of supply nodes found within each subgraph, rather than within the greater LDZ.

The breakdown of independent components by size, nominal flow and percentage throughput nominal flow can be seen in Table 5.1. We note that whilst the majority of connected components are single-feed systems, these account for only a small proportion of the total energy outflow. Indeed, upwards of 80% of the distribution energy outflow is into components consisting of 3 or more nodes. This is an important factor when considering the sensitivity of the dual baseline approach, and is discussed in the next section.
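Extracting the connected components from M is a standard graph traversal; a sketch, treating links as undirected for the purposes of component membership (which we believe matches the definition above), on a toy supergraph:

```python
from collections import deque

def connected_components(M):
    """Connected components of the distribution supergraph, given an n x n
    connectivity matrix M; a link in either direction joins two nodes."""
    n = len(M)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            i = queue.popleft()
            comp.append(i)
            for j in range(n):
                if j not in seen and (M[i][j] or M[j][i]):
                    seen.add(j)
                    queue.append(j)
        components.append(sorted(comp))
    return components

# Toy supergraph: nodes 0-1 linked, node 2 isolated (a single-feed system).
M = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]
print(connected_components(M))  # -> [[0, 1], [2]]
```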

Table 5.1: Downstream energy flow distribution by grid size, with flow as a percentage of total LDZ demand

Size | Grids | % Demand | % LDZ Demand | σ % Demand | σ % LDZ Demand
  1  |  26   |  0.0276  |    0.0521    |  0.00752   |    0.0094
  2  |   4   |  0.0322  |    0.0634    |  0.007     |    0.0206
  3  |   4   |  0.0546  |    0.103     |  0.014     |    0.0126
  5  |   2   |  0.0955  |    0.17      |  0.0389    |    0.0283
  6  |   1   |  0.0594  |    0.11      |  0.0168    |    0.00814
  7  |   2   |  0.0787  |    0.143     |  0.0265    |    0.0121
  8  |   2   |  0.051   |    0.0945    |  0.0148    |    0.012
  9  |   2   |  0.1     |    0.184     |  0.0314    |    0.0134
 10  |   1   |  0.0433  |    0.0808    |  0.0117    |    0.00969

Error Detection Sensitivity

The dual baseline approach is then the combination of baseline monitoring on U_t and U_t^D. The error detection sensitivity of relying on a baseline approach has already been approximated in Section 3.5.5. We have discussed that this is dependent upon two key factors: the variance of the UAG, and the size of the error.

The size of a detectable error at a given probability is dependent on the proportion of total throughput in the connected component a node is responsible for. The best case is found in single-feed systems, as 100% of supply flows through the only node in the subgraph. This means sensitivity is purely determined by UAG variance, and can be directly approximated as in (3.31). However, this scenario accounts for only 5% of energy throughput. It is therefore important to consider the within-group nominal energy flow distributions, in order to appraise the overall sensitivity of the approach. On average, the mean flow through a node across all connected components was 40.0% (σ = 0.37) of the throughput. In contrast, for offtakes in the LDZ this figure was 0.575% (σ = 0.006597). This stark contrast immediately suggests that the baseline method is much more powerful when focusing on the supply side in distribution network connected components.

As previously discussed, detection probability is dependent on UAG variance. Unfortunately, no data on UAG at either a distribution network or individual distribution grid level was available from within NG, and neither is this currently published online. Therefore, downstream offtakes were simulated so as to calculate UAG limits. 400 demand offtakes were assigned random normally distributed proportions of the overall supply. We tested two

Figure 5.1: Example of downstream distribution grids in two LDZs; Scotland is seen on the left, and the South West on the right.

scenarios:

• Equal Conditions: Distribution UAG behaviour and underlying uncertainties mirror those of the transmission grid. Therefore, UAG limits were calculated as in Equation (2.16).

• Inferior Distribution Performance: U_t^D variance is considerably higher. In this instance, we kept the supply node uncertainty at 1.1% as this is known, but set distribution exit nodes at 5%.

To determine the overall sensitivity of the baseline method in the distribution networks, we proceed as in Chapter 3 and examine the mean detection probabilities, the standard deviation thereof, and the EaR. The results of the sensitivity analysis can be seen in Table 5.2. We note that there is a drastic increase in mean detection probability for errors in all nodes, particularly at lower error levels, as compared to the transmission application of the baseline method. As expected, increasing UAG levels through measurement uncertainty

Table 5.2: Mean node detection probabilities, standard deviation of node detection probabilities and EaR for the 3 test scenarios of the baseline method, in LDZ offtake nodes.

% error | Mean Detection Probability | SD, Detection Probability | EaR
Transmission Baseline Method
  5%    |          19.9%             |           8.8%            | 86.0%
 10%    |          23.7%             |          14.3%            | 82.5%
 20%    |          30.3%             |          20.7%            | 74.4%
 50%    |          44.6%             |          29.6%            | 53.6%
Distribution Baseline Method: Equal Measurement Uncertainty
  5%    |          49.5%             |          31.5%            | 22.5%
 10%    |          55.3%             |          27.3%            | 15.7%
 20%    |          60.8%             |          21.3%            |  9.6%
 50%    |          65.9%             |          13.2%            |  3.8%
Distribution Baseline Method: Inferior Measurement Uncertainty
  5%    |          33.5%             |          32.1%            | 41.0%
 10%    |          45.4%             |          32.3%            | 27.6%
 20%    |          52.5%             |          29.8%            | 18.8%
 50%    |          60.0%             |          22.3%            | 10.6%

results in lower sensitivity, although there are still significant gains compared to the transmission results. Most importantly, at the c = 5% level when considering EaR, applying the baseline method on the distribution side yielded roughly 65% and 45% better results under the equal and inferior measurement uncertainty assumptions respectively. Whilst the overall detection probabilities indicate that applying the downstream baseline detection method will significantly improve sensitivity to upstream meter errors, it still cannot be used to definitively exclude the presence of significant errors, particularly in the case of smaller nodes. Finally, we consider the improvement to nominal network EaR when also making use of downstream detection probabilities. In this case, the calculation is expressed as follows:

EaR(t, c) = 1 - \sum_{i=1}^{n} \left(1 - \min\left(F_{U_t}(d^{+}_{t,i}),\ F_{U_t^{D_i}}(d^{+}_{t,i})\right)\right) \frac{\hat{\nu}_{i,t}}{\sum_{j=1}^{n} \hat{\nu}_{j,t}},    (5.1)

where F_{U_t^{D_i}} is defined as the cumulative distribution function of U_t^{D_i} for the distribution grid containing node i, and is taken as equal to 1 when i is not a node with a downstream distribution grid. The above calculation was evaluated for the nominal NTS flows over 2016-2019, and values of 55.5%, 37.3%, 23.2%, and 10.9% were obtained for the usual error increments. This represents an improvement of 7.9, 11.2, 10.8, and 5.3 percentage points respectively.
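The combination of transmission and distribution baselines in (5.1) can be sketched numerically. The snippet below is a simplified illustration under an assumed zero-mean normal UAG distribution, considering only the upper decision interval and a single shared distribution interval; all flows, decision intervals and standard deviations are invented for the example, not NTS values:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Normal cumulative distribution function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def energy_at_risk(flows, err_frac, d_trans, sigma_trans, d_dist, sigma_dist):
    """Weighted energy-at-risk for an err_frac error at each node, combining
    a transmission baseline (upper decision interval d_trans, UAG sd
    sigma_trans) with a downstream distribution baseline per node. A node
    with no downstream distribution grid has sigma_dist[i] = None, in which
    case its distribution CDF term is taken as 1 (no extra information)."""
    total = sum(flows)
    ear = 0.0
    for i, f in enumerate(flows):
        err = err_frac * f                        # error size at node i
        undetected_t = norm_cdf(d_trans - err, 0.0, sigma_trans)
        if sigma_dist[i] is None:
            undetected = undetected_t
        else:
            undetected_d = norm_cdf(d_dist - err, 0.0, sigma_dist[i])
            undetected = min(undetected_t, undetected_d)
        ear += undetected * f / total             # weight by share of flow
    return ear

flows = [5e6, 2e7, 1e7]                           # kWh/day, illustrative
print(energy_at_risk(flows, 0.10, 3e6, 1.5e6, 5e5, [2e5, None, 3e5]))
```

Larger errors shrink the undetected probability at both baselines, so the energy at risk falls as the error fraction grows, mirroring the pattern in Table 5.2.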

Issues

There are several practical obstacles that can prevent the implementation of this approach or diminish its efficacy, listed below:

• The DNO may not calculate UAG on a component level.

• U_t^D variance may be exceedingly large, or a consistent bias may be present. This is likely, due to the additional issues not present in the transmission grid, such as billing cycle discrepancies or theft.

• Data, in this instance U_t^D, may not be shared by the DNO.

• Crucially, it may not be possible to accurately assess daily component UAG at all, as traditionally consumer gas meters are analogue and not linked back electronically to the distributor. However, this issue is bound to be overcome with the implementation of smart meter technology, and this new data availability will doubtless be an avenue for further research.

• A high U_t^D value does not necessarily imply an error on the transmission side, as it could originate on the demand side too. However, distribution grids will typically have thousands of downstream meters of very low individual throughput, and therefore they are unlikely to individually be the cause of high UAG days.

Despite these issues, we believe the evidence above presents a compelling argument for DNOs to carry out this analysis, and work towards daily UAG calculation for connected components in the future. Next, we consider energy conservation monitoring in power stations.

5.1.2 Power Stations

As per Table 1.2, gas fired power stations account for a large component (∼22%) of daily throughput in the NTS. Today, these are all typically CCGT (Combined Cycle Gas Turbine) plants. The plants integrate two cycles: a primary gas turbine, where gas is combusted alongside pressurised air (Brayton cycle); following this, heat is recovered in a steam generator, and the resulting steam is used to spin a steam turbine (Rankine cycle), generating further electricity. This results in an efficiency much greater than that of the single gas turbine, which is typically only 28% to 33% as per [34]. Indeed, a modern CCGT's efficiency at nominal load approaches 60%.

The combination of a gas turbine, a steam generator and a steam turbine is referred to as the power train. Multiple configurations of power trains exist; for example, two gas turbines may share a single steam turbine. A power station can house multiple power trains. The electricity generated is usually directly connected to the national transmission grid, or in some cases fed to large industrial users. As this electricity output is also metered, it provides a form of metering redundancy and allows us to perform energy conservation balancing.

Energy Conservation

In constructing an energy conservation methodology, we employ the following facts. Power stations use gas for the sole purpose of electricity generation, with the exception of heat co-generation plants: these plants generate steam which is directly used in other industrial processes, and therefore the methods discussed herein do not apply to them. Furthermore, electricity generated is always metered prior to being transferred onto the electricity transmission grid. Finally, power stations will typically not store statistically significant quantities of gas on site, meaning gas usage can be assumed to result in nearly instantaneous electricity generation.

The efficiency is a function of a plant's load factor; optimal efficiency is achieved at the design nominal load, and there is a significant financial incentive not to deviate considerably from this level, as the resulting generation will be less profitable. However, the latest generation CCGT plants can operate at loads as low as 40-50% per unit, with only minimal efficiency penalties, as claimed in [76].

It has previously been suggested in [56] that monitoring the efficiency ratio of power stations for excessive deviation can be used in detecting meter error. Of course, this does not definitively identify the cause of the error, as the said error could be in either the gas or electricity energy values. Moreover, efficiency deviation as a result of load factor was not considered in the above report. There it is also argued that CCGT plants will operate within a sufficiently narrow efficiency band, making efficiency modelling unnecessary for the purposes of error detection. Therefore, in the next sections we will investigate the linear relationship between electricity output and gas input, as well as the relationship between load factor and efficiency in modern CCGT plants. This is done with a view to assessing the potential of using these models to infer errors at either meter.

Data

An analysis of 16 gas-fired power stations connected to the NTS, all of which were relatively modern combined cycle plants, was conducted. Power output was derived from historic generation data [24], available as instantaneous power output in 30-minute intervals, which is the lowest tradeable time unit in the UK. Power output was interpolated between these points to calculate total energy generated. Let O_t indicate the instantaneous electricity generation at time t, and O_{t:t+x} the total energy generated between t and t + x. Setting x = 1 yields the daily electricity generation. In the UK, for 30-minute instantaneous power intervals the daily energy total is calculated by the following sum:

O_{t:t+1} = \sum_{i=0}^{47} \frac{O_{t+\frac{i}{48}} + O_{t+\frac{i+1}{48}}}{4},    (5.2)

which introduces an element of uncertainty and error, as the true start-up time within an interval is unknown. For convenience, we will subsequently denote the daily measured generated power by the i-th node as Ô_{i,t}. However, the error can be seen to be minimal during periods of constant operation. This limitation of our study was a consequence of data availability. Nevertheless, even this low granularity of power output data is sufficient for a high error detection rate to be achieved, as will be shown. Power output was subsequently aggregated into daily intervals, as per the Gas Day, by summing (5.2).
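The trapezoidal aggregation in (5.2) can be sketched as follows, assuming 49 half-hourly instantaneous readings covering both boundaries of the Gas Day (a simplifying assumption; the true start-up time within an interval remains unknown):

```python
def daily_energy(power_kw):
    """Total daily energy (kWh) from 49 half-hourly instantaneous power
    readings (kW) spanning one Gas Day, by trapezoidal interpolation:
    each half-hour contributes (P_i + P_{i+1})/2 * 0.5 h = (P_i + P_{i+1})/4."""
    assert len(power_kw) == 49, "need readings at both day boundaries"
    return sum((a + b) / 4.0 for a, b in zip(power_kw, power_kw[1:]))

# A constant 1000 kW over the whole day integrates to 24,000 kWh
print(daily_energy([1000.0] * 49))  # 24000.0
```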

Daily efficiency ei,t for a node i can then be calculated as follows:

e_{i,t} = \frac{\hat{O}_{i,t}}{\hat{\nu}_{i,t}}    (5.3)

Efficiency Ratio

First, we assessed the overall variation in efficiency of power plants, and the ability to model gas usage irrespective of load factor. Obvious outliers outside of realistic efficiencies, which we set as those

e_{i,t} \notin [0.1, 0.7],    (5.4)

were removed from the data, as such values indicate either data error, meter error or test flows, and are not reflective of normal operation. The realistic efficiency bounds were set according to

Table 5.3: Power stations, 2019. Type of power train is shown, alongside number of days active, mean and standard deviation of efficiency, and the R² coefficient of model fit as per Equation (5.5).

Name            | Power Train | N   | Mean e | SD e  | R²    | Mean l | SD l
Coryton         | Combined    | 271 | 0.450  | 0.039 | 0.991 | 0.584  | 0.218
Damhead         | Combined    | 223 | 0.440  | 0.035 | 0.995 | 0.681  | 0.164
Epping Green    | Single      | 299 | 0.464  | 0.035 | 0.975 | 0.720  | 0.132
Great Yarmouth  | Combined    | 284 | 0.506  | 0.019 | 0.989 | 0.910  | 0.077
Keadby          | Combined    | 225 | 0.448  | 0.052 | 0.998 | 0.703  | 0.207
Kings           | Single      | 111 | 0.403  | 0.093 | 0.904 | 0.549  | 0.187
Langage         | Combined    | 225 | 0.463  | 0.041 | 0.998 | 0.564  | 0.209
Marchwood       | Combined    | 299 | 0.500  | 0.053 | 0.998 | 0.749  | 0.205
Medway          | Combined    | 242 | 0.434  | 0.041 | 0.996 | 0.596  | 0.218
Rocksavage      | Combined    | 326 | 0.467  | 0.024 | 0.998 | 0.589  | 0.214
Seabank B       | Single      | 199 | 0.475  | 0.044 | 0.900 | 0.709  | 0.199
Seabank         | Combined    | 211 | 0.462  | 0.086 | 0.883 | 0.701  | 0.173
Spalding        | Combined    | 324 | 0.472  | 0.041 | 0.971 | 0.684  | 0.182
Stallingborough | Combined    | 266 | 0.467  | 0.031 | 0.998 | 0.694  | 0.201
Stallingborough | Combined    | 298 | 0.468  | 0.041 | 0.995 | 0.692  | 0.200
Sutton Bridge   | Combined    | 259 | 0.459  | 0.037 | 0.999 | 0.518  | 0.197
Mean            |             |     | 0.454  | 0.047 | 0.974 | 0.657  | 0.184

[76]. A simple linear model (5.5) was fitted:

\hat{\nu}_{i,t} = \alpha_i + \beta_i \hat{O}_{i,t} + \epsilon_{i,t},    \epsilon_{i,t} \sim N(0, \sigma^2).    (5.5)

In Table 5.3 we can see the R² coefficient of the above fitted linear models, alongside the calculated mean and standard deviation of efficiency. Mean efficiency was 0.507, slightly lower than the usual nominal efficiencies of modern plants. This is explained by the fact that operating load is not consistently at optimal levels, and that performance can only be expected to degrade with age. The standard deviation of operating efficiency varied between 0.019 and 0.053, with a mean of 0.047. A plot of electricity output vs gas input for the Rocksavage power station can be seen in Figure 5.2, with the linear model defined above overlaid in blue. The fit displayed by this figure was also common to the vast majority of power stations. We observe that a straight line appears to be a good model for the data. This is further backed up by the high R² coefficient exhibited by the model fit for all power stations; indeed, in 13 of the 16 cases the fit had an R² > 0.97. Further visual evidence for the suitability of the linear model is present in Figure 5.3, where the gas use vs electricity generation for 12 stations is plotted.

Figure 5.2: Typical power station daily efficiency profile. In this case, values for Rocksavage power station are plotted, with the linear model line of best fit overlaid in blue and prediction intervals shaded in grey.

(Axes: Electricity Generated (kWh) on the horizontal; Gas Flow (kWh) on the vertical.)

For all power stations, we visually note a good fit with the model. On the plot, many cases of either missing data or zeroing out are immediately evident. These cases, which are all identified by the filtration in (5.4), are highlighted in red. Relatively few points exist that are both outside the prediction intervals and not clear-cut errors picked up by the filtration. It is these errors which linear modelling may prove most useful in identifying. However, we note that in the cases of Epping Green and Seabank B, the prediction interval is wider due to the influence of outliers.

Since it is especially important to guard against outlier influence, in addition to outlier removal it was decided to use robust regression instead of ordinary least squares. Importantly, we require that the robust regression used is resistant to outliers in both the explanatory and response data, as evidence in Figure 5.3 suggests these are present. Therefore, the fit of model (5.5) used for the calculation of prediction intervals was carried out through iteratively reweighted least squares, as implemented in the MASS package in R. There has been much discussion in the literature regarding competing robust estimators – which specify the function of residuals to be minimised in robust regression – and their comparative advantages; we applied the MM estimator as recommended in [87], which was first put forth in [92]. It must be noted here that there are several appropriate choices given our requirements of outlier resistance; a popular and viable alternative is least trimmed squares regression, as recommended in [88].

Automated detection of anomalous values can be implemented by flagging values outside the linear model prediction intervals for a given electricity output. Moreover, the predicted value can be used as a substitute for the measured value in assessing predicted UAG; this is discussed in more detail in the next sections.
Finally, the use of robust regression itself allows for outlier detection. These outliers can subsequently be removed from the dataset prior to considering the non-linear fits which we discuss next.
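The reweighting idea can be sketched as follows. This is a plain M-estimation sketch with Tukey bisquare weights, not the full MM estimator of MASS::rlm used in this work, and the synthetic data (including zeroed-out "meter" days) are invented for illustration:

```python
import numpy as np

def irls_line(x, y, k=4.685, iters=50):
    """Fit y = a + b*x by iteratively reweighted least squares with
    Tukey's bisquare weights; returns coefficients and a robust scale."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS start
    s = 1.0
    for _ in range(iters):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
        u = r / (k * s)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # bisquare weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta, s

rng = np.random.default_rng(1)
x = rng.uniform(0, 1.5e7, 300)                   # electricity generated (kWh)
y = 1e5 + 2.1 * x + rng.normal(0, 2e5, 300)      # gas flow; slope ~ 1/efficiency
y[:10] = 0.0                                     # zeroed-out meter days
beta, scale = irls_line(x, y)
outliers = np.abs(y - (beta[0] + beta[1] * x)) > 3 * scale
print(round(beta[1], 2), int(outliers.sum()))
```

The bisquare weights drive the influence of the gross zeroed-out days to zero, so the fitted slope stays close to the bulk of the data, and the `3 * scale` residual rule flags those days automatically.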

Efficiency vs Load Factor

Next, we consider the load factor's effect on the efficiency ratio, and investigate whether modelling these two quantities can achieve superior detection compared to the constant efficiency assumption. Theoretically, once this source of efficiency variation is accounted for, the only remaining uncertainty will be that of the gas and electricity meters.

Instantaneous plant load factor l_t at time t was calculated as:

l_t = \frac{O_t}{O_{max}}    (5.6)

where O_{max} is the power output at 100% load, the level a plant is certified to operate at, and where optimal efficiency is attained. A simple average of the non-zero half-hourly load factors was calculated for each day. We note that the models subsequently discussed ideally model instantaneous load, rather than the daily average. However, we apply them to daily data, and acknowledge that using this averaged value for load factor is an additional source of error. To facilitate notation, we will continue to use l_t to represent load, and furthermore we will provide all subsequent equations without indexing over the set of power stations, for clarity.

CCGT power station efficiency curves cannot be generalised, as they depend upon the plant power train configuration. Different configurations result in vastly different efficiency profiles. We will discuss three distinct categories, each of which was represented in our dataset. It must be noted that efficiency profiles at given loads will obviously be known by the power station and specified in its design, either in a closed form (see [76], [36]) or as a numerical model. However, we as always consider this problem from the gas transporter's

Figure 5.3: Linear model for efficiency ratio, for a range of power stations. Red points indicate values removed prior to model fitting. (Panels: Coryton, Keadby, Seabank B, Damhead Creek, Langage, Stallingborough 1, Epping Green, Marchwood, Stallingborough 2, Great Yarmouth, Rocksavage, Sutton Bridge; axes Electricity Generated (kWh) vs Gas Flow (kWh).)

perspective, where only daily flow data on a per-plant basis is available. To accurately calculate efficiency, more involved quantities are required, such as individual turbine inlet/outlet pressures and temperatures – realistically, this will not be available to the transporter. We continue by discussing single-unit stations:

Single Power Train Stations

We define a single power train CCGT plant as one where a single gas turbine is mated to a heat recovery module, usually made up of a steam generator and a steam turbine. These are generally uncommon in the UK, due to the convenience of constructing multiple units alongside each other; three such power stations were contained in our dataset. From a modelling perspective, this presents the simplest structure. In this instance, the efficiency of the combined gas and steam generation can be modelled by a non-linear least squares asymptotic regression. The following parametrisation was used, expressing efficiency as a function of the load factor:

e(l_t) = e_{max} + (e_{min} - e_{max}) \exp\left(-\frac{l_t}{c}\right),    (5.7)

where e_{max} is the maximum efficiency of the power train and e_{min} is the minimum efficiency, with c being a constant to be fitted, interpreted as indicative of the rate at which efficiency increases with load. A good fit using non-linear least squares on (5.7) was obtained for the single-unit stations considered. A typical example can be seen in Figure 5.4, where the linear model of efficiency is also included for reference. Firstly, we observe that the majority of values lie within the prediction intervals of the model, and visually the curve appears to capture the behaviour of the data very well. Almost all values lying outside of the curve correspond to low-flow days, as can be seen on the top plot, where they do not appear as obvious outliers. This suggests that for instances of only short periods of electrical generation, the high level of aggregation (30-minute load and daily gas flow) can result in anomalous efficiency/load values.
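A sketch of fitting the asymptotic curve (5.7) by non-linear least squares on synthetic data; the function `asymp_eff` and all parameter values are illustrative assumptions, not fitted NTS values:

```python
import numpy as np
from scipy.optimize import curve_fit

def asymp_eff(l, e_min, e_max, c):
    """Asymptotic efficiency curve, one parametrisation of Equation (5.7):
    efficiency rises from e_min towards e_max at a rate governed by c."""
    return e_max + (e_min - e_max) * np.exp(-l / c)

rng = np.random.default_rng(0)
load = rng.uniform(0.05, 1.0, 200)                        # daily load factors
eff = asymp_eff(load, 0.30, 0.50, 0.25) + rng.normal(0, 0.01, 200)

# Non-linear least squares fit; p0 gives plausible starting values
popt, pcov = curve_fit(asymp_eff, load, eff, p0=[0.3, 0.5, 0.3])
print(np.round(popt, 2))
```

In practice the fit would be run on the filtered efficiency/load pairs for each single-train station, with prediction intervals then derived for anomaly flagging.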

Multi-power Train Stations

As previously mentioned, power stations will usually consist of multiple power trains, due to efficiencies of scale gained in construction and convenience. These units will not always operate at equal load factors; efficiency maximisation will be prioritised. This usually means a single

Figure 5.4: Top: Linear model for efficiency ratio, with 95% confidence intervals. Bottom: Efficiency vs load factor for single-unit Seabank B power station. Asymptotic curve and 95% confidence band overlaid in red. Red points correspond to the same days on both plots.

power train will be utilised up to its optimal load before a second unit is activated, barring other operational requirements such as maintaining loads at levels allowing for a quick demand response. This latter mode of operation may be required as a cold start takes considerably more time.

Cases where gas turbines do not share a heat recovery module or steam turbine are hereby referred to as multi-power train stations. Such a plant's total efficiency can be written down

as follows, assuming it contains n power trains, with l_t^i representing the load factor of the i-th power train at time t, and e(l_t^i) its efficiency:

e^{tot}_t = \sum_{i=1}^{n} \frac{l^i_t}{\sum_{j=1}^{n} l^j_t}\, e(l^i_t),    (5.8)

assuming that units have the same maximum power output:

O^1_{max} = O^2_{max} = \dots = O^n_{max}.

Under no external demand or operational constraints, power stations will aim to operate along the curve defined by the following non-linear programming problem, thereby maximising their efficiency for a given plant power output, expressed as a total load factor l^{tot}:

e^{opt}|l^{tot} = \max_{l^1, l^2, \dots, l^n} \sum_{i=1}^{n} \frac{l^i}{\sum_{j=1}^{n} l^j}\, e(l^i),    (5.9)

subject to the following constraints:

l^1, l^2, \dots, l^n \in \{0, [l^{min}, l^{max}]\},    \frac{1}{n} \sum_{i=1}^{n} l^i = l^{tot},    (5.10)

and where e(l^i) is defined by the single-power train efficiency function, which is in this instance non-linear, as defined in (5.7). To model the overall efficiency profile, (5.7) must first be fitted to the data by separating out the region of historic data points corresponding to optimal single-unit flow, prior to calculating the curve corresponding to (5.9).

It must be noted that (5.9) defines the optimal efficiency curve. Beneath this curve there is a region where generation is possible, but does not maximise efficiency. Therefore, values that fall outside the confidence intervals of the optimal curve, but inside those of the feasible generation region, can only be interpreted as follows:

• Indicative of meter/data error

• Indicative of less than optimally efficient operation

The latter possibility, coupled with the lack of additional operational knowledge on the transporter's side and the low overall variation in efficiency, means that in the case of multi-unit stations abnormal values should only be defined as those outside the prediction intervals of the feasible region. A simulated load/efficiency profile, based upon Equation (5.9), can be seen in Figure 5.5, with the single-power train operation curve depicted in blue and the multi-power train feasible region depicted in red. A composite, optimal efficiency curve has been overlaid in black.

Once the coefficients of the regression function (5.7) have been estimated, the boundaries of the feasible region of multi-power train operation can be calculated directly by evaluating (5.9) subject to both maximisation (as stated, in the case of optimal efficiency) and minimisation, yielding the upper and lower bounds respectively. Anomalous values in the case of multi-power train stations can then be identified as those falling outside the simple confidence interval defined by the single operation curve, and those outside the confidence intervals of the feasible region, obtained by combining the upper interval of the upper bound and the lower interval of the lower bound.

The multi-power train configuration is by far the least common in the UK, and no such power stations were present in our data set. This is because it is usually more efficient to combine the steam generated by two gas turbine heat recovery modules for use in a single steam turbine, rather than install steam turbines on a one-to-one basis. This method is most suitable for modelling open cycle power stations, as these do not have heat recovery modules. However, studying this case is necessary prior to considering the next, and most common, scenario: combined power train stations.
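The feasible region described by (5.9)-(5.10) can be traced numerically. The sketch below brute-forces the best and worst achievable efficiency for a hypothetical two-train plant; the curve parameters and minimum stable load are invented for illustration, mirroring the simulated profile of Figure 5.5:

```python
import numpy as np

def eff_single(l, e_min=0.30, e_max=0.50, c=0.25):
    """Single power train efficiency (asymptotic form as in Equation (5.7));
    parameter values are illustrative, not fitted."""
    return e_max + (e_min - e_max) * np.exp(-l / c)

def eff_envelope(l_tot, l_min=0.4, n_grid=2001):
    """Best and worst achievable two-train efficiency at total load l_tot,
    by brute-force search over feasible unit load splits: each unit is
    either off (load 0) or loaded in [l_min, 1], with mean load = l_tot."""
    best, worst = -np.inf, np.inf
    l1 = np.linspace(0.0, 1.0, n_grid)
    l2 = 2.0 * l_tot - l1                  # constraint: (l1 + l2) / 2 = l_tot
    for a, b in zip(l1, l2):
        feasible = all(x == 0.0 or l_min <= x <= 1.0 for x in (a, b))
        if not feasible or a + b == 0.0:
            continue
        # weight each unit's efficiency by its share of the total load
        e = (a * eff_single(a) + b * eff_single(b)) / (a + b)
        best, worst = max(best, e), min(worst, e)
    return best, worst

print(eff_envelope(0.5))   # optimal vs least-efficient split at 50% total load
```

At 50% total load the optimum is to run one unit at full load with the other off, while splitting the load equally across both units sits at the bottom of the feasible band; the gap between the two is the region in which sub-optimal but legitimate operation can occur.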

Combined Power Train Stations

The most common design in current CCGT plants in the UK (reflected by 11 power stations in our data) is to share heat recovery modules between gas turbines. This presents a further problem in our modelling scenario, as the load of the steam turbine is no longer equal to the gas turbine load. Moreover, the power output of the steam turbine is not equal to that of the gas turbines. As such, the weighting has to be based on power output rather than load factor as in (5.9). We cannot jointly model the gas and steam turbine efficiency, and therefore we use e^{steam}(l^{steam}) to denote the efficiency of the steam turbine at a given set of gas turbine load factors l^i. Moreover, we have that the steam turbine load factor is always

Figure 5.5: Simulated efficiency/load profile for a power station consisting of two fully independent power trains. Composite optimal efficiency curve is overlaid in black.


equal to:

l^{steam} = \frac{1}{n} \sum_{i=1}^{n} l^i.

Also let e^i_{gas} denote the efficiency of the i-th gas turbine. Therefore, for n gas turbines sharing a steam turbine (the most pervasive design), the optimal efficiency given a total plant load factor l^{tot} can be expressed as the following non-linear maximisation problem:

e^{opt}|l^{tot} = \max_{l^1, l^2, \dots, l^n} \left( \sum_{i=1}^{n} \frac{O^i}{\sum_{j=1}^{n} O^j + O^{steam}}\, e^i_{gas}(l^i) + \frac{O^{steam}}{O^{tot}}\, e^{steam}\!\left(\frac{\sum_{i=1}^{n} l^i}{n}\right) \right),    (5.11)

where O^i corresponds to the power output of the i-th gas turbine, O^{steam} to the steam turbine power output, and O^{steam}_{max} to the maximum power output of the steam turbine. The above problem is also subject to the following constraints:

l^1, l^2, \dots, l^n \in \{0, [l^{min}, l^{max}]\},    \frac{\sum_{i=1}^{n} l^i O_{max} + \frac{O^{steam}_{max}}{n} \sum_{i=1}^{n} l^i}{n O_{max} + O^{steam}_{max}} = l^{tot}.    (5.12)

Unfortunately, the same approach of fitting an asymptotic regression based on single-unit efficiency/load from historic data cannot be adopted in this instance. The reason is

due to the impossibility of evaluating e^{steam}(l^{steam}) or e^i_{gas}(l^i) independently from the data. We demonstrate this by adapting the multi-power train approach in the modelling of the efficiency profile of Langage power station, which consists of two gas turbines mated to a single steam turbine. The data, with the curve of optimal efficiency overlaid in red, is seen in Figure 5.6. It is evident that whilst points of single turbine operation present a good fit with the curve, the subsequent region exhibits a much greater efficiency than is predicted. It is also evident that there is a greater amount of variation in the points corresponding to dual-turbine operation, explained by them lying in a region rather than on a curve.

Figure 5.6: Efficiency vs load factor for the two-unit Langage power station. Composite asymptotic curve derived from single-unit efficiency overlaid in red; the poor fit in dual-turbine operation is evident.


This is all the more apparent when taking into consideration the general efficiency profiles displayed by such stations, evidenced in Figure 5.7. Moreover, we note that the sawtooth pattern is not discernible for all combined power train stations, and in some cases there is no clear structure at all. The high overall variation, coupled with the difficulty in formulating a parametric model, therefore suggests that in practical terms, purely data-based modelling of efficiency vs load in the case of combined power train stations is typically infeasible.

Figure 5.7: Efficiency vs load factor for a set of power stations. All stations excluding Seabank B and Epping Green are of the combined power train variety. [Panels: Coryton, Keadby, Seabank B; Damhead Creek, Langage, Stallingborough 1; Epping Green, Marchwood, Stallingborough 2; Great Yarmouth, Rocksavage, Sutton Bridge.]

Error Detection Sensitivity

Finally, we attempt to quantify the sensitivity provided by the use of the linear model methodology for identifying anomalous gas flows. This sensitivity is contingent on three variables: the measurement uncertainty of the gas metering installation, that of the electricity metering installation, and the efficiency ratio maintained throughout the interval. If calculating the sensitivity to percentage errors, the volume also plays a deciding role. Prediction interval width can also be taken as a direct measure of sensitivity, although it is difficult to interpret directly. To obtain a representative value of sensitivity, and also to allow us to compare the detection probability with that of the baseline method, we make the following assumptions: a power station consumes its nominal amount of gas, as defined in (3.36), and its energy efficiency is constant throughout the periods of operation, which we evaluate at the 30th and 90th quantiles of the historic efficiency. We evaluate the probability

$$P\left((1 + c)\,\hat{\nu}_i^{\text{nominal}} > W_i\right), \quad (5.13)$$

where $W_i$ is the upper prediction limit as calculated from robust regression of the linear model specified in (5.5), and $c$ is the scaling error. $W_i$ is a random variable, as it is a function of the measured quantity $\hat{O}_i$. Due to this, we evaluate (5.13) via simulation through the procedure detailed below:

1. Calculate the nominal node flow, $\hat{\nu}_i^{\text{nominal}}$. Assume this is the true gas usage.

2. Using the selected efficiency ratio, calculate the true electricity generation as $\hat{O}_i = e_i \hat{\nu}_i^{\text{nominal}}$.

3. Subject to meter uncertainty, simulate Gaussian measurement noise for both the gas and electricity values. We used an uncertainty of 1.1% for the gas and electricity values alike.

4. Add the scaling error to the measured gas value at the required level; in this instance, 5%.

5. Calculate the prediction interval based on the linear model described in (5.5), through robust regression.

6. Note whether (5.13) holds; if so, increment the iterator $j$.

7. Repeat steps 2–6 $n$ times, where $n$ is a sufficiently large sample size. (5.13) is then approximated by the ratio $j/n$. Here we used $n = 10000$.
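The procedure above can be sketched as a compact Monte Carlo routine. This is a simplified stand-in, not a reproduction of the thesis calculation: the robust-regression prediction limit of (5.5) is replaced by a closed-form two-sigma combined-uncertainty allowance, and the nominal flow and efficiency values passed in are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def detection_probability(nu_nominal, efficiency, unc=0.011, c=0.05, n=10_000):
    """Monte Carlo estimate of (5.13), following steps 1-7.  Simplification:
    the upper prediction limit W_i is taken as the gas flow implied by the
    metered power plus a two-sigma combined-uncertainty allowance, standing
    in for the robust-regression interval of model (5.5)."""
    O_true = efficiency * nu_nominal                     # step 2: true generation
    O_meas = O_true * (1 + rng.normal(0, unc, n))        # step 3: metered power
    nu_meas = nu_nominal * (1 + rng.normal(0, unc, n))   # step 3: metered gas
    nu_err = (1 + c) * nu_meas                           # step 4: add scaling error
    W = (O_meas / efficiency) * (1 + 2 * np.sqrt(2) * unc)   # step 5 (simplified)
    return np.mean(nu_err > W)                           # steps 6-7

print(detection_probability(1000.0, 0.45, c=0.05))
print(detection_probability(1000.0, 0.45, c=0.10))
```

Even this crude limit reproduces the qualitative pattern of Table 5.4: near-certain detection at a 10% scaling error, with a noticeable drop at 5%.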

Table 5.4: Error detection probabilities for 5% and 10% of nominal power station flow, as compared to the baseline method.

Name                False Positive   Efficiency Q90    Efficiency Q30    Baseline Method
                                     5%       10%      5%       10%      5%       10%
Coryton             0.05             0.94     1.00     0.70     1.00     0.17     0.19
Damhead Creek       0.01             0.76     1.00     0.45     1.00     0.17     0.23
Epping Green        0.02             0.90     1.00     0.13     0.98     0.16     0.18
Great Yarmouth      0.03             0.90     1.00     0.10     0.98     0.17     0.19
Keadby              0.01             0.74     1.00     0.43     1.00     0.16     0.18
Langage             0.02             0.89     1.00     0.50     1.00     0.17     0.20
Marchwood           0.06             0.96     1.00     0.28     1.00     0.17     0.23
Medway              0.02             0.89     1.00     0.99     1.00     0.17     0.19
Rocksavage          0.06             0.95     1.00     0.41     1.00     0.17     0.20
Spalding            0.00             0.75     1.00     0.10     0.98     0.17     0.21
Stallingborough 1   0.04             0.94     1.00     0.71     1.00     0.16     0.18
Stallingborough 2   0.02             0.88     1.00     0.21     0.99     0.16     0.18
Sutton Bridge       0.23             0.99     1.00     0.83     1.00     0.17     0.19

Table 5.4 contains the results following the application of the above procedure for a range of mean efficiencies, and compares the energy-conservation based detection probabilities with those attained by using the baseline method. We can see the following takeaways: at 10% and above, detection probability is close to 100% for both tested efficiencies when using regression, as compared to on average 19% for the baseline method. However, there is a sharp decline in 5% detection probabilities when using the Q30(e) value for efficiency, with detection averaging 45%. Nonetheless, this is still a significant improvement over the baseline method, where detection probability is at 17%.

Summary

We have found that using robust linear regression to model gas usage as a function of electricity generation provides an effective method of detecting errors. In particular, practically all zeroing-out/missing-data errors can be eliminated through this method, and the detection probability of errors larger than 10% at nominal loads for the studied power stations is close to 100%. It must be noted that simple filtration based on the daily efficiency ratio was also found to be effective in detecting large errors. We have also found that asymptotic regression provides a good fit to the historic load vs efficiency curve of single power train stations. More complex stations of the CCGT type may require extraneous knowledge about their performance parameters for a parametric model to be formulated.

The variation between the load/efficiency profiles of these stations may indicate that this modelling task is not suited to a generalised model, and is best performed by the stations themselves rather than the transmission grid operator. However, in cases where such models are available, using them to detect errors based on load/efficiency modelling will prove yet more sensitive, as the only source of uncertainty will be the measurement uncertainty of the two meters. The main disadvantage of both methods is their exposure to the risk of false positives due to electricity measurement or electricity reporting/data errors. Finally, we can conclude that simple linear models, and even flagging values based on a pre-set efficiency ratio bound, can be powerful tools for transmission grid operators.

5.1.3 Industrial Customers

Indeed, the joint energy balancing method can be extended to apply to any form of mass or energy balancing process downstream, provided this accounts for the entirety of gas flow through the offtake. However, centralising and analysing the data may prove challenging in certain scenarios; for example, whilst mass balances may be monitored in a chemical manufacturing plant, these may cover multiple processes and installations, with varying production levels throughout the day. Further compounding factors could be gas pumped into on-site storage, or held within site linepack. An aggregate mass balance figure would then have to be computed and relayed to NG. Standardisation of such a process for the large variety of industrial customers would not be realistic. Moreover, unlike in the power station scenario, the transmission grid operator is unable to independently infer mass or energy conservation from available data. Therefore, industrial users should be encouraged to carry out such analysis themselves and to immediately notify the transmission operator of instances where mass or energy conservation within measurement tolerances is not attained.

5.1.4 LNG Terminals

In the case of LNG terminals, daily energy conservation can be estimated by equating the energy delivered into the transmission grid to the change in shore tank storage plus the quantity of imported gas offloaded from tankers during the time period. The uncertainty and calculation methods involved will vary on a site-by-site basis. This is not a function that can be carried out by the transporter in cases where they are not also the operator of the LNG facility.

5.1.5 Compressor Station Metering

One of the key problems faced in the error detection setting of energy flows in complex transmission grids is the lack of redundancy in measurement. Energy is metered at input and output points, but only once; due to the cost of such installations, there are no additional meters at transit points within the network – typically, with one exception. In many networks, and within the NTS, volume is metered at compressor stations (energy is not metered, as it is irrelevant from an operational standpoint). This sometimes provides the opportunity to subdivide the transmission grid into two connected components. The condition for this is that there exist no links between any of the nodes of the two components aside from the compressor station itself. In cases where this holds, two UAG components can be calculated. This allows the region accountable for a significant error to be narrowed down in cases of high UAG. However, in complex grids such as the NTS, the above scenario may not be applicable, as compressor stations do not always function continuously. Moreover, operation must be continuous throughout the UAG calculation period (in our case, a day). Nonetheless, we include this possibility as a potential consideration for international grids. Data reconciliation techniques can also be applied in this instance, provided the right conditions hold.
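The subdivision condition can be checked programmatically: remove the compressor node from the network graph and count the connected components. A sketch on a toy topology (all node names are hypothetical):

```python
from collections import defaultdict, deque

def split_at_compressor(edges, compressor):
    """Remove the compressor node from the pipeline graph and return the
    resulting connected components; exactly two components means the grid
    can be balanced as two independent UAG regions."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = set(adj) - {compressor}
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            if v in seen or v == compressor:
                continue
            seen.add(v)
            comp.add(v)
            queue.extend(adj[v] - seen - {compressor})
        components.append(comp)
    return components

# Illustrative toy topology: terminals T1, T2 feed demands D1..D3 via compressor C.
edges = [("T1", "D1"), ("D1", "C"), ("C", "D2"), ("D2", "D3"), ("T2", "D3")]
print(split_at_compressor(edges, "C"))   # two components -> two UAG regions
```

If more than two components result, or the components are not separated solely by the compressor, the per-region UAG decomposition described above does not apply.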

5.1.6 Interpretation

Finally, we discuss the interpretations and suggested actions for the four possible combinations of balancing states (normal/abnormal) of the upstream and downstream energy conservation terms. Here, the normal and abnormal states are determined by whether or not the value is within the calculated baseline. For instance, in our scenario the upstream term is always $U_t$, and in the case of distribution grids the downstream term is $U_t^{D}$, and their states are determined as per the respectively calculated UAG baselines. The four eventualities are presented in Table 5.5. Whilst cases 2 and 4 are straightforward in their interpretation, cases 1 and 3 warrant further elaboration, and will be discussed separately:

Table 5.5: Actions recommended depending on upstream and downstream system energy conservation state, respective to baseline.

Case   Upstream   Downstream   Interpretation/Action
1      Abnormal   Abnormal     Significant evidence for data or metering error; data validation/verification / physical inspection required
2      Abnormal   Normal       The node can be excluded from the list of possible causes
3      Normal     Abnormal     Additional investigations required
4      Normal     Normal       No further action necessary

Case 1

In case one, where both the upstream and downstream energy balancing terms are outside the baseline, there are two possible variations, each of which must be interpreted differently. The variations are differentiated by whether or not the balances err in synchrony.

1. Opposing Errors: In this instance, the errors observed in the downstream and upstream terms are in opposition to each other. In the distribution grid scenario, this can be interpreted as $U_t^{D}$ being negative and $U_t$ being positive, or vice versa. This is an important feature, as a consistent measurement or data error at the metering point will always result in errors of opposing signs observed on either side, regardless of whether it is an under- or over-read. Combined, these errors suggest that the presence of an error at one of the nodes joining the transmission and distribution networks is highly likely. In this instance, it is recommended that data validation and verification checks be immediately carried out. In the case that this pattern is repeated on consecutive days, this should be taken as sufficient evidence that physical site inspection is warranted.

It can also be beneficial to consider the magnitude of the indicated error; in the case of distribution, the excess UAG $U_t^{E,D}$. It could be posited that if the energy conservation abnormalities are due to the same error, then the transmission and distribution excess UAG should be roughly equivalent:

$$U_t^{E} \approx U_t^{E,D} \quad (5.14)$$

However, this may not necessarily be the case, as the true UAGs for both the transmission and distribution grids are unknown and will typically exhibit significant variation. Therefore, (5.14) should not be taken as a requirement for further action, but if it is satisfied it can be interpreted as additional evidence pointing toward an error.

2. Synchronous Errors: Interpreted as equal-signed UAG terms when considering the dual-grid case. Under this eventuality, the likelihood of a significant error existing at the nodes joining the upstream grid with the specified distribution network is exceedingly low. The abnormal UAG is in all likelihood due to an error at a node that does not account for gas transfer between the specified downstream and upstream networks.
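The sign logic separating the two variations can be stated compactly; a minimal sketch, with illustrative function and return-string names:

```python
def interpret_case_one(U_up, U_down):
    """Case 1 (both excess terms abnormal): opposing signs implicate a node
    joining the two grids; equal signs point away from the boundary.
    Sketch only -- recommended actions follow Table 5.5 and the discussion."""
    if U_up * U_down < 0:
        return "opposing: validate boundary-node data; inspect site if repeated"
    return "synchronous: boundary nodes unlikely; error lies elsewhere"

print(interpret_case_one(+8.0, -7.5))   # opposing signs
print(interpret_case_one(+8.0, +7.5))   # synchronous signs
```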

Case 3

When the downstream energy balance of a node is anomalous, yet the upstream is within limits, further investigations may be warranted. A data or meter error in the values $\hat{\nu}_{i,t}$ of the nodes composing the supply of the distribution network cannot be excluded, as evidence in Section 3.5.5 has shown. Therefore, additional numerical analysis should be undertaken, such as that discussed in the next section. The cause of the abnormality in this case may rest in any one of the other demand nodes composing the specified downstream distribution network. When considering power station efficiency, the equivalent is an error in the electricity metering. Therefore, if insufficient evidence to further suggest the presence of a transmission-side error is found, this case need not be investigated.

5.2 Predicted vs Actual Flow Analysis

Predictions of node flows are numerical approximations of the node flow $\nu_{i,t}$, made without knowledge of the observed value $\hat{\nu}_{i,t}$. We denote such predictors as $\bar{\nu}_{i,t}$. Transmission grid operators will endeavour to produce or acquire accurate predictions of node flows for a multitude of reasons; as indicated by NG [58], these include safety, security of supply, regulatory requirements, investment decisions and operational planning. A reason not typically considered is the application of our method: the analysis of predicted versus actual flows. This analysis can help classify measured flows as anomalous and flag them for further investigation. Unlike all the other reasons, this method makes retrospective use of predictions, rather than using them to guide or predict future behaviour.

5.2.1 Methods of Prediction

Before we consider the aforementioned analysis in more detail, we discuss typical methods of predicting node flow.

1. Forecasted values: Transmission grid operators will typically create statistical forecasts for daily flows so as to aid grid balancing and planning. However, forecasts may not be made for all node types, and are usually created at higher aggregation levels

(in the NTS, these are at LDZ-level). Importantly, forecasts for a flow $\hat{\nu}_{i,t}$ are created ahead of time. If we denote the last known time used in predicting flows as $t_0$, then in the forecast setting $t_0 < t$. Predictors of forecasted flows will vary according to node type. For instance, in the case of LDZ offtakes these might be weather-related variables.

2. Modelled values: Prediction of $\hat{\nu}_{i,t}$ can be improved by including data points at and beyond the prediction time $t$; therefore, in this instance $t_0 \geq t$. This is impossible in a forecast setting, as these data are not yet available. However, in our scenario they can be included, as the analysis takes place after the fact. The inclusion of this information results in prediction accuracy greater than or equal to that of forecasted predictions. Predictors such as closely correlated nodes can be used in certain cases. The simplest modelled predictions are those obtained by using existing forecast methods with measured, rather than forecast, predictors. Another example is using the linear model (5.5) alongside power generation measurements to estimate gas flow.

3. Commercial Nominations: In the NTS, shippers are required to nominate values ahead of time regarding the amount of gas they intend to either input or offtake at each separate entry or exit point on the system. Summation of nominations for each node should therefore result in a rough prediction of flow.

4. External Predictions: NG receives information on daily and hourly expected flow from entry terminals, storage facilities, interconnection points and large end consumers, such as gas-fired power stations, to aid in linepack planning. As above, aggregating this information can yield a predicted daily flow value. These predictions will be made using information available only to the counterparty receiving/supplying gas to the NTS; for example, an industrial plant can accurately provide predicted values as its production schedule is known in advance.

Whilst the last two points discussed may be specific to the NTS, all transmission grids will have some equivalent procedures for obtaining estimates of loads, as this is necessary for day-to-day network balancing. Indeed, it is only the second method that does not take place under normal circumstances: it serves no purpose when accurate measurements are available, and the analytic cost of model creation and maintenance is not justified outside of the error detection and model validation scenario. The prediction methods can be grouped into statistical approaches (methods 1 and 2) and extraneous-information based approaches (3 and 4). Hybrid predictions, where historical and expected load profiles are combined with statistically generated forecasts by an analyst, are also common. The availability of predicted flow values will therefore vary across transmission grid operators and across node types. In instances where more than one prediction is available, considering the realised measurement against multiple predicted values will allow for even greater confidence in decision making.

5.2.2 Classification

We use predictions in order to classify a node as anomalous or not; this classification guides the decision as to whether investigative action is taken. The method of classification differs according to the nature of the predictor. For statistical predictors generated from a known model, standard prediction intervals, which we denote as $PI = [\bar{\nu}_{i,t}^{-}, \bar{\nu}_{i,t}^{+}]$, can usually be calculated either analytically or through simulation. Anomalous nodes in this case are those whose observed flows $\hat{\nu}_{i,t}$ are outside the prediction interval, i.e. $\hat{\nu}_{i,t} \notin PI$. Conversely, extraneous-information based approaches will typically only provide a point prediction $\bar{\nu}_{i,t}$, as is the case when considering nominations in the NTS. In such scenarios, classification can be carried out based purely on the absolute difference,

$$|\bar{\nu}_{i,t} - \hat{\nu}_{i,t}| > L_i, \quad (5.15)$$

with $L_i$ being a decision limit to be determined. In all cases, the past performance of the predictor must be considered. Reliance on poor predictors may result in alert fatigue, and the development of mistrust in the method's ability to yield actionable information. As such, we will next consider what predictor performance can be expected by focusing on one particular class: forecasted values. We review the academic literature, and the results stemming from NG's own forecasting methodology.
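Both classification rules can be expressed in a few lines; a sketch with illustrative flow values and decision limits:

```python
def classify_node(nu_hat, prediction=None, limit=None, interval=None):
    """Classify a node flow as anomalous.  Statistical predictors supply a
    prediction interval (flow outside [lo, hi] is anomalous); point
    predictors such as nominations use the absolute-difference rule (5.15)
    with decision limit L_i.  Argument names are illustrative."""
    if interval is not None:
        lo, hi = interval
        return not (lo <= nu_hat <= hi)
    return abs(prediction - nu_hat) > limit

print(classify_node(118.0, prediction=100.0, limit=15.0))   # True: |diff| > L_i
print(classify_node(104.0, interval=(95.0, 110.0)))         # False: inside PI
```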

5.2.3 Statistical Prediction in Literature

Gas transmission load forecasting is a topic that has received much attention, both in practice by transmission operators and in academia. A large number of models have been applied to demand time series, including wavelet transforms [65], sigmoidal and spline regression [29], non-linear grey models [78] and artificial neural networks [83]. Most often, models are applied to either aggregate demand or city-level offtakes where weather-related predictors are paramount, and good fits are generally achieved. In this section, we do not aim to improve upon forecasting methodology, but to assess the forecast error that can likely be expected. As discussed in Section 3.2.4 and as evidenced in Figure 3.12, the transmission setting provides time series of varying structure. Some series, in particular single-feed LDZ offtakes, have highly predictable continuous time series, allowing for the creation of high-accuracy forecasts. Other series, like storage, can be expected to be mostly stochastic and unpredictable. In some cases, such as power stations, large bodies of work exist relating to forecasting electricity demand (e.g. [8], [9]), although this is typically at an aggregate regional level. For other nodes, such as industrial customers, there may be few or no predictors available, and only extraneous information may allow for predictions to be made. Indeed, in the case of high-energy consuming industries, forecasting is usually carried out on annual time scales, as seen in [3]. Next, we consider two examples from the literature that demonstrate both the varying nature of forecast error in transmission and the overall efficacy of such forecasts. The work in [68] combines several state-of-the-art machine learning methods to create a hybrid model for hourly energy flow in the German transmission grid. In that instance, the mean MAPE for the best hybrid algorithm was still 15%, with a range of 3% to 32%, across a number of industrial, power station and residential offtake nodes. Moreover, there was significant variation of the MAPE between groups. Likewise, a very similar pattern can be observed in [65], where demand nodes in the Greek transmission grid were studied. City-wide offtakes had a low MAPE, whereas industrial and power generation units did not. Of the distribution points covered in the latter study, 14 out of 41 nodes had a MAPE greater than 100%. The remaining nodes had a mean MAPE of 14%. When the nodes were refined to only large cities, the MAPE fell to 9%.

Table 5.6: Forecast metrics for NG linear regression daily LDZ energy flow prediction methodology, 2017-2019. MAE, MSE, RMSE are given in GWh.

LDZ             MAE     MSE      RMSE    MAPE    NMSE     RSTD
East Anglia     5.79    66.14    8.13    5.33%   0.0153   0.0701
East Midlands   7.50    111.10   10.54   5.07%   0.0172   0.0672
North East      6.22    73.32    8.56    5.85%   0.0318   0.0820
North England   4.91    43.91    6.63    5.52%   0.0286   0.0745
North Thames    6.23    71.58    8.46    5.06%   0.0111   0.0604
North West      11.84   248.73   15.77   6.20%   0.0305   0.0843
Scotland        6.82    83.07    9.11    5.14%   0.0213   0.0674
South East      6.12    75.00    8.66    4.68%   0.0103   0.0594
Southern        4.94    47.93    6.92    4.94%   0.0140   0.0658
South West      4.85    45.10    6.72    5.96%   0.0208   0.0823
West Midlands   6.09    72.99    8.54    5.26%   0.0144   0.0687

5.2.4 NG Load Forecast

In the NTS, demand forecasting is primarily done on an aggregate basis at the network and LDZ level, but is also carried out for certain large nodes. Demand is modelled through a linear regression, with terms accounting for standard daily seasonal demand, weekend or weekday status, and a weather-based predictor in the CWV variable. A detailed calculation of the CWV can be found in Appendix B. In brief, this variable is formed by combining a mean of the present and previous days' temperatures, factoring in a wind correction and limiting the variable at the extreme ends of temperature fluctuation, beyond which gas demand no longer increases or decreases linearly. This amalgamation of demand response behaviour, wind and temperature data into a single linearly dependent variable is an alternative to using a more convoluted parametric model, such as sigmoid regression [29], which is the agreed-upon standard in German national transmission. For a detailed discussion of the modelling procedure, see [58]. We calculated the standard metrics for LDZ forecasts made by National Grid; these can be seen in Table 5.6. The MAPE for all nodes ranged between 4.68% and 6.2%, with a mean of 5.23%. This performance compares favourably with the results presented in the academic literature. Aggregate forecasts such as those at the LDZ level represent a trade-off between granularity and accuracy. In general forecasting practice, by aggregating data and therefore modelling at a less granular level, higher accuracy and precision can be attained. Each subsequent level of aggregation diminishes the ability to localise an error to a specific node.

Rather, the error can only be localised to the aggregation level. However, at the very least this can be used to reduce the subset of nodes to which more time-consuming analytic techniques are applied, thereby reducing analytic cost.
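The metrics reported in Table 5.6 can be computed as follows. This is a sketch: the NMSE normalisation used here (MSE divided by the variance of the actuals) is one common convention and may differ from that behind the table, and the input series are toy values.

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Standard forecast error metrics (MAE, MSE, RMSE, MAPE, NMSE).
    NMSE is taken as MSE normalised by the variance of the actuals --
    an assumption, as conventions vary."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    err = p - a
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": np.sqrt(mse),
            "MAPE": 100 * np.mean(np.abs(err / a)),
            "NMSE": mse / np.var(a)}

m = forecast_metrics([100, 120, 90], [105, 114, 93])   # toy daily flows, GWh
print({k: round(v, 3) for k, v in m.items()})
```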

5.2.5 Interpretation

A significant difference between the predicted and actual flow values can be taken as indicative of a metering or data error. However, even cases of extreme model error, such as a non-zero prediction against a zero actual flow, can be explained by a plethora of reasons: a poor forecasting model, an erroneous nomination, or operational considerations are just a few examples. Suspicion can be strengthened when the historic metrics of a predictor, such as accuracy or MAPE, are examined. Likewise, large differences with respect to multiple predictors can also provide additional evidence. However, what distinguishes predicted vs actual flow analysis in this application is the ability to examine a predicted value's effect on the balancing term UAG – thereby providing a unique dimension to the method.

5.2.6 Predicted UAG

Predicted UAG, denoted as $\bar{U}_{i,t}$, is UAG recalculated with its $i$th constituent substituted for a predictor of its value, rather than the measured term. It is calculated simply as

$$\bar{U}_{i,t} = U_t + I_i$$

Identifying Large Error Candidates

Particularly in cases where daily UAG is outside of the baseline, predicted UAG can be used to identify predictors which, if they transpired to be representative of the true flow, would result in a considerable decrease in UAG – preferably to within the baseline. Conversely, it can also be used to dismiss predictors that would otherwise suggest investigation, in instances where the resulting predicted UAG is considerably worse.

Prioritising Inspection Order

$\bar{U}_{i,t}$ is subsequently useful as a method of prioritising the inspection order of multiple suspicious nodes when the objective is purely to minimise absolute UAG. That is to say, inspections are conducted in order from lowest to highest $|\bar{U}_{i,t}|$.
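The prioritisation rule can be sketched directly; the impact terms $I_i$ below are hypothetical values chosen for illustration:

```python
def inspection_order(U_t, impacts):
    """Order suspicious nodes for inspection by the absolute predicted UAG
    |U_bar_{i,t}| = |U_t + I_i| that would result if each node's predictor
    were correct, lowest first.  `impacts` maps node -> I_i."""
    return sorted(impacts, key=lambda i: abs(U_t + impacts[i]))

impacts = {"A": -9.5, "B": 3.0, "C": -2.0}   # hypothetical I_i values, GWh
print(inspection_order(10.0, impacts))        # node A first: |10 - 9.5| = 0.5
```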

5.3 Comprehensive Error-reducing Process

Thus far, we have considered a wide range of topics surrounding UAG – namely, uncertainty analysis, baseline model formulation, metrics and error detection techniques. However, we have not yet discussed how a transmission grid operator can use UAG in the most meaningful way, and how to structure daily control processes around this statistic. In other words, we tie the results of the preceding chapters together from the perspective of a transmission grid operator. This is the focus of the current section, and we will demonstrate that adopting a well-reasoned, high-level UAG process methodology is critical. We present two error-reducing processes: one where the monitoring of node flow is contingent on UAG, and one where it is not. This is followed by a discussion of the guiding motivations to be considered when choosing or developing an error-reducing process.

5.3.1 Baseline Contingent Error Minimisation

An example of such a process is given in Figure 5.8. Here, analytic techniques on node flow are carried out only when UAG exceeds the baseline, in which instance nodes are investigated sequentially until either an error is discovered which returns UAG to below the baseline, or the investigative threshold (discussed in Section 5.3.3) is exceeded. The described method minimises analytic cost, whilst maximising the percentage of time UAG

Figure 5.8: Flowchart indicating control flow of a baseline contingent error minimisation process.

[Flowchart: the node measurement stream and operational data (linepack, etc.) feed the UAG calculation and baseline analysis; if excess UAG > 0, node flow analysis (joint balancing techniques, predicted flow analysis, additional techniques) and sequential investigation with UAG recalculation follow, until excess UAG is no longer positive or the investigative threshold is exceeded, at which point the analytic function ends.]

is within the baseline. This minimisation is evidenced by the contingent nature of the node flow analysis, and by the fact that anomalous flows are investigated sequentially – if an error resulting in UAG returning to baseline bounds is uncovered, the analytic process is terminated. This goal can be furthered by strictly investigating only those nodes i where ¯ |Ui,t| < |Ut|. By investigating nodes we refer to the data validation and verification checks already mentioned, and in extreme cases, physical meter inspection.

5.3.2 Independent Error Minimisation

Figure 5.9 illustrates an error minimisation process that is not contingent on – and is therefore independent of – UAG. Unlike in Figure 5.8, the node flow analysis is undertaken regardless of the results of the baseline model. However, the baseline model serves to trigger the use of ‘contingency techniques’. These can be viewed as analytic functions which come at a greater time or financial cost, and are only carried out when there is a high suspicion of a large error. Another difference is that flows are not inspected sequentially; rather, all flows classified as anomalous are investigated.

Figure 5.9: Flowchart indicating control flow of an independent node-flow error minimisation process. [Flowchart omitted: node flow analysis proceeds unconditionally; positive excess UAG triggers the contingency techniques, with high-suspicion flows investigated first and lower-suspicion flows investigated while the investigative threshold permits.]

However, the stop conditions are the same as when investigation is triggered in the UAG-contingent model: the analytic function is terminated either when UAG is below the baseline, or when the investigative threshold has been reached. The primary objective of this process is to minimise flow errors; UAG serves only as a guide to achieving this. As a consequence, however, UAG is also minimised, subject to the natural underlying variation.

5.3.3 Investigative Threshold

The investigative threshold is the stop condition ensuring the processes we have considered are well-defined in terms of maximum cost. Factors that influence it include, but are not limited to:

• Non-tangible deterioration of relationships with stakeholders. This may result from, for example, persistently requesting meter inspections or verification of data from the meter installation owners.

• Alert fatigue, caused by a constant stream of low-probability investigations which do not yield results; as a consequence, less care or attention may be devoted to future alerts.

• The opportunity cost of analysts’ time spent on other functions.

• The overall human resources available to the analytic function.

The investigative threshold is therefore not analytically derived, but rather set according to operational and business considerations. It can be specified in a number of ways; for example, as a maximum time spent investigating, or as a maximum number of node flows investigated.

5.3.4 Discussion

To determine which process is superior, we must first consider the motivation behind calculating UAG. Typically, it is calculated as a tool to aid in minimising metering errors for all participants across the network. It is crucial that UAG is not taken as the only indicator of metering error, nor elevated to a position where all other investigative processes are contingent on the current UAG position. This may happen out of a desire to minimise UAG, or to keep UAG within the limits set out by a baseline model. As we have already discussed, the baseline method alone cannot be relied upon to ensure no metering errors are present in large and complex transmission grids with uncertainty parameters similar to those of the NTS. A further reason to avoid relying purely on the baseline method for investigative impetus is the following scenario: whilst we have so far worked under the assumption that only a single error is present, two errors of opposing sign but equal magnitude $m$ will have no effect on UAG, yet the total resulting absolute error will be $2m$. In such instances, the baseline method is wholly inadequate.

Fundamentally, minimising UAG and minimising error are not equivalent; they are distinct objectives, and it is these different objectives which respectively give rise to the two processes described above. It is therefore important that the objective of minimising error is prioritised, and low UAG viewed as a consequence rather than a goal. Nonetheless, UAG analysis in the form of uncertainty quantification and baseline modelling still acts as an important pillar of the error detection process. In light of this perspective, we believe that UAG measures, such as those discussed in Section 3.1, should not in isolation form the basis of KPIs or of regulatory or operational performance targets. Doing so can motivate the adoption of processes that purely minimise UAG and cost, such as the one depicted in Figure 5.8.

5.4 Conclusion

In this chapter we have considered two practical methods, specific to the metered network scenario, of identifying the source of an error – namely, the joint balancing approach and the predicted UAG method. In particular, our analysis of power station efficiency from published data may find use in the broader domain of energy systems modelling. We have demonstrated that a simple linear model, whilst controlling for outliers through robust modelling and efficiency ratio filtering, can capture virtually all errors above 10% of nominal daily flow. We have proposed that a source of model uncertainty can be removed by considering efficiency against load factor, through the use of asymptotic regression. However, specifically in the common case of combined power train stations, it has been shown that fitting a parametric method may not be feasible due to the overall variation across stations.

Energy conservation has also been examined for most of the other node types; in particular, distribution networks can be expected to be able to calculate daily UAG in the near future with the arrival of smart meters, allowing the implementation of the monitoring we suggest. We discussed the use of predicted versus measured values, and their potential use in an error detection setting. We identified the usefulness of calculating a predicted UAG term, and proposed two applications of its use: to filter out nodes for investigation, and to generate an order of investigation.

Finally, we discussed the high-level error reduction processes that can be employed by the transmission system operator, and considered how UAG is best utilised within them. We suggested that these processes should be UAG-guided, rather than UAG-contingent.

Chapter 6

UAGMS – Industrial Integration

A principal component of the project has been the integration of techniques, models and findings stemming from the research into the operational processes of National Grid. As stipulated in the deliverables, a real-time monitoring system of the UAG ecosystem was required. We accomplished this through the development of a solution based on the R language and the Shiny [13] framework – the resulting application is the eponymously named Unaccounted for Gas Management Suite (UAGMS). In this chapter, we document the features of this application, and discuss some additional techniques and heuristics regarding error detection. Moreover, we have included abridged versions of preliminary development discussions concerning the practicalities of the implementation within NG’s existing structure. The reasons for this are threefold: firstly, it may function as a case study for future implementations of a similar scope; secondly, it represents the state of open source statistical software at the time of writing; finally, it is in line with the scope of iCASE PhDs.

6.1 Solution Architecture

Discussions were held over the optimal delivery method for the data-based solution. Due to the frequency and dimensionality of the data, the R computing environment was deemed suitable for any algorithmic implementation. Its key merit is that it allows us to take advantage of the vast number of already available third-party resources and libraries, such as those containing changepoint methods. Hence, discussions focused on the manner in which an R interface would be provided to the team. Several solutions were proposed:

(i) Tableau front-end with R-bridge: NG recently implemented a new data system (GCS). The front end to this system is Tableau, with all operational billing data instantly available. If R is installed locally, it is possible to create Tableau worksheets which interface with the R environment, albeit with some severe data transfer restrictions, resulting in extended computation time for even the most basic tasks. Furthermore, being entirely restricted to Tableau worksheets for information output is quite limiting. This is illustrated in Figure 6.1.

(ii) Local R Installation: R could simply be installed, with the developed solutions provided as script files. Data could be loaded as .csv files. However, the balancing team does not employ data scientists with R expertise, and would therefore require training in the use and maintenance of the software and in the interpretation of its results; this is beyond the scope of the project, and such an arrangement also lacks robustness and resilience.

(iii) Shiny App: This was proposed early on in the project. Indeed, delivery of R content through Shiny apps has been gaining traction in the data science community in recent years. The user interface and content are delivered through a JavaScript-powered webpage, allowing for reactive controls. The backend is an R shell running server-side, with access to local databases. Hence, this scenario is ideal, save for the following drawbacks: data needs to either be manually uploaded or regularly synchronised with an online source, and a server is required to host the application.

Ultimately, the Shiny App (henceforth referred to as the UAGMS) was selected as the most suitable candidate. We will now go into the details of certain technicalities relating to the implementation (the backend).

6.2 UAGMS Backend

6.2.1 Data Provision

The algorithms and models used necessitate the availability of the multivariate time series consisting of all terms in the balancing Equation (1.1). Further, as NG will use the solution for process control, the database is required to update at a daily frequency. In this subsection, we review options for the provisioning of this data.

Figure 6.1: Ideal scenario

Figure 6.2: Proposed data flow

(i) Manual .csv upload as needed. Since the total available data is in the region of 2000×250 entries, it amounts to no more than a few MB, making this solution feasible. However, it would require the user to prepare updated .csv files before every use, resulting in an additional step. Nevertheless, due to its simplicity and resilience, the use of such external data has been implemented as an alternative data source.

(ii) Updating a local database through the MIPI API. Fortunately, NG provides an API to its MIPI data dissemination interface (which in turn communicates with Gemini, the current source of billing data). Using this API, a database containing all required flow data, in addition to further quantities included in the balancing equation such as linepack, can be automatically maintained and updated daily. The update process can be run automatically at a predetermined time, or manually, depending on whether a dedicated hosting solution is adopted. This scenario is illustrated in Figure 6.2.

6.2.2 Hosting

The method of hosting the application instance is the most crucial aspect of the implementation. The ideal scenario would involve a Linux machine hosting the Shiny server on-site (internally), with direct access to the required databases. However, such an involved implementation is beyond the scope of the project for the following reasons: firstly, the NG HQ is a security-sensitive information technology environment; secondly, it is highly regulated. These two factors combined make the deployment of custom and third-party codebases within the system problematic, and subject to extensive vetting processes. However, it is worth pointing out some benefits of this solution: data collection could be incorporated to monitor, record and summarise the effect of corrections, as seen in Section 2.8. Further, weekly emails containing alerts and the results of monitoring functions could be sent to the team, removing the need for active monitoring. Another potential benefit is the ability to pre-process and store results, thereby enabling immediate content delivery and allowing for algorithms of greater complexity.

The alternative option is hosting externally – there are numerous ways this can be achieved. The solution currently employed for testing purposes is the official, RStudio-provided deployment method, http://www.shinyapps.io/. It offers various levels of service, ranging from free to enterprise-level subscriptions. A simple subscription should be sufficient, as usage is metered on computation time and it is unlikely this would be substantial. Whilst an external solution cannot interface with the SCADA system, node flows can be retrieved via the publicly available MIPI API. The key advantage of external hosting is that it mitigates concerns relating to security and regulation, as code is not executed on NG data or within its computing environment.

6.3 Features

In this subsection we present the features of the finished software. The current iteration can be found live, at the time of writing, at this link. The application is presented in a tabbed dashboard format, with the navigation pane and tab-specific options on the left-hand side. Tabs represent distinct functionalities. Screenshots are illustrated in the figures below; the functionality and usage scenarios shall now be described.

6.3.1 UAG Monitor Tab

The UAG Monitor Tab, seen in Figure 6.3, is the first tab to be initialised and displayed once the application is opened. The top of the page displays three key statistics: the number of days exceeding the selected baseline in both absolute and percentage terms, and whether or not a systematic error has been detected. Below is a plot of UAG and the selected confidence intervals, with days exceeding the calculated baseline highlighted in red. The models and filtration rules presently implemented are as follows:

• Meter Error Model 1

• Composite Model 1

• ETS, TBATS, ARIMA one-step forecasts

• Bollinger Bands

• Outlier detection, based on the GESD method [73]. The GESD (Generalized Extreme Studentized Deviate) test progressively eliminates outliers by comparing a Student’s t-based test statistic to a critical value; each time an outlier is removed, the test statistic is recomputed. Once the test statistic drops below the critical value, all outliers are considered removed. Provided previous data is free of error, this is yet another method of detecting potentially anomalous UAG values.

• Legacy intervals, including a constant percentage of throughput, or a pure constant.
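A sketch of the GESD procedure referred to above is given below; it assumes SciPy is available for the Student's t quantile, and the example data are illustrative rather than real UAG values.

```python
import numpy as np
from scipy import stats  # assumed available for the t quantile


def gesd_outliers(x, max_outliers, alpha=0.05):
    """Generalized ESD test (Rosner): repeatedly remove the most extreme
    value, recomputing the test statistic R_i on each pass, and compare it
    with the critical value lambda_i. The number of outliers is the largest
    i for which R_i exceeds lambda_i; returns their indices in x."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    results = []
    for _ in range(max_outliers):
        n = len(x)
        mean, sd = x.mean(), x.std(ddof=1)
        j = int(np.argmax(np.abs(x - mean)))
        R = abs(x[j] - mean) / sd
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        lam = (n - 1) * t / np.sqrt((n - 2 + t ** 2) * n)
        results.append((int(idx[j]), R > lam))
        x, idx = np.delete(x, j), np.delete(idx, j)
    last = max((k for k, (_, sig) in enumerate(results) if sig), default=-1)
    return [results[k][0] for k in range(last + 1)]
```

Note that the test requires approximately normal data once outliers are removed, which is consistent with its use here on a UAG series assumed error-free in the past.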

Moreover, aggregation heuristics for the above models – the mean, median, minimum and maximum – have also been implemented. Models utilising MC sampling from the multivariate normal distribution have not been implemented for the time being, purely due to their higher computational time. Contextual tuning options appear on the left-hand side, depending on which model has been selected. The user can change other options such as the plot type, date window, and data column (necessary if using a .csv file). The table shows the values of UAG and the limits for days exceeding either bound. Selecting a day will highlight it on the chart, and vice versa. Once a day has been selected, pressing ’Explore Day’ will take the user to the Causality Detection Tab. Selecting batch analysis will queue up all days outside of the given range for causality analysis. The UAG Monitor Tab also provides a visualisation which aggregates UAG over longer time frames, i.e. monthly or yearly. The aggregation function can also be varied according to those proposed in Chapter 3.

6.3.2 Causality Detection Tab

In the Causality Detection Tab, depicted in Figure 6.4, attention is first drawn to the info boxes at the top of the tab, which convey key information relating to the user’s previous selection in a sleek and modern fashion. Below, the results of analysis aiming to discover the cause of individual high UAG days are delivered to the user. This is currently done through the assignment of flags to nodes. The user first specifies which flags to check for, as the analysis process can take upwards of 20 s. The flags implemented are:

• Outliers (Fast/Slow): Currently, two algorithms are used to detect outliers in node flows. In both cases, the series are first decomposed using local polynomial regression fitting [15], and the residuals extracted. The fast algorithm then applies a simple outlier classification rule to the residuals, identifying values outside 1.5 times the interquartile range. The slower algorithm is the GESD method described above [73], again applied to the residuals. Neither method should be relied upon for highly discontinuous or irregular flow profiles.

• Interrupts: nodes which recorded a null flow for the indicated day, having been active just prior.

• High/Low: The user can specify a quantile, thus flagging nodes that are flowing at an unusually high or low rate.

• Extreme: As above, but only raised if the value is an absolute historic maximum or minimum for the given node.

• Linear Models: Nodes flowing outside of their prediction interval, as calculated by linear regression, are flagged.

• Percentage change: Nodes are flagged if they exceed a given absolute day-on-day percentage increase.
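The ‘fast’ outlier flag above can be sketched as follows. For self-containment, a centred rolling median stands in for the loess decomposition used in the application, so this is illustrative rather than a faithful reimplementation; the flow values are hypothetical.

```python
import numpy as np


def fast_outlier_flags(flow, window=7, k=1.5):
    """Flag days whose detrended flow lies outside k times the interquartile
    range of the residuals. A centred rolling median approximates the trend
    (the application itself uses local polynomial regression)."""
    flow = np.asarray(flow, dtype=float)
    half = window // 2
    trend = np.array([np.median(flow[max(0, i - half):i + half + 1])
                      for i in range(len(flow))])
    resid = flow - trend
    q1, q3 = np.percentile(resid, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return np.where((resid < lo) | (resid > hi))[0]
```

On a mostly flat series with a single spike, only the spiked day is flagged; for heavily interrupted flow profiles the residuals are no longer well behaved, which is why the caveat above applies.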

Flagged nodes can be sorted by predicted UAG, which uses the linear demand model prediction in the case of demand nodes, and a midpoint value corresponding to the two temporally adjacent flow measurements for all other node types. The user is presented with a map indicating the physical location of a selected node with a pulsating red dot. Summary statistics of the selected node are also tabulated, indicating the percentage online time, the percentile of the selected day, the mean, the standard deviation, and the maximum and minimum. Finally, the node’s group (e.g. LDZ, power stations) can be overlaid on the time series plot by selecting the relevant group in the right-hand box. Previous analyses can be accessed through the drop-down to save time. Both the UAG Monitor and Causality Detection Tabs have been designed so as to easily allow for the development of additional functionality, either through a baseline model or an alternative flagging methodology – the existing visualisations and data structures need not be modified. We have not implemented the power station efficiency monitoring methods discussed in Chapter 5, due to the technical complexities involved in the necessary data acquisition, and time constraints.

6.3.3 LDZ Weather Model Tab

The LDZ Weather Model Tab is a visual and numerical tool that aims to help the user identify a faulty LDZ demand meter by modelling flow through a linear model, as specified in National Grid’s methodology [58]. This approach yields models with an $R^2$ greater than 0.9 for over 50 nodes, with only a minority exhibiting a poor fit. The intention here is simple: for nodes whose behaviour can be modelled effectively by regression, days where their flow exceeds the prediction intervals can be used as evidence to flag up the node for review, especially on high UAG days. The predictors used by the linear models include CWV, seasonality, demand, local demand (LDZ), and holiday dummy variables. These are automatically fitted following two-way AIC selection for each node. The user may view a particular model in more detail by clicking on the relevant button. Demand as a percentage of LDZ is also displayed, as is correlation with other nodes.
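The automatic predictor selection can be sketched as below. This is a forward-only simplification of the two-way stepwise AIC selection used in the application, with synthetic data in place of real CWV and demand series; the predictor names are illustrative.

```python
import numpy as np


def ols_aic(y, X):
    """AIC of an OLS fit under a Gaussian likelihood (up to a constant);
    the intercept column is assumed to already be in X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * k


def forward_aic(y, candidates):
    """Greedy forward selection: repeatedly add the candidate predictor that
    most reduces the AIC, stopping when no addition improves it.
    'candidates' maps predictor name -> column vector."""
    n = len(y)
    chosen, X = [], np.ones((n, 1))  # start from the intercept-only model
    best = ols_aic(y, X)
    improved = True
    while improved:
        improved = False
        scores = {name: ols_aic(y, np.column_stack([X, col]))
                  for name, col in candidates.items() if name not in chosen}
        if scores:
            name = min(scores, key=scores.get)
            if scores[name] < best:
                best, improved = scores[name], True
                chosen.append(name)
                X = np.column_stack([X, candidates[name]])
    return chosen
```

With a response driven almost entirely by one predictor, that predictor is selected first; whether weak candidates enter afterwards depends on the AIC penalty, mirroring the behaviour of stepwise selection on real demand data.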

6.3.4 Changepoint Analysis Tab

Figure 6.6 illustrates the changepoint analysis functionality implemented in the UAGMS. Three R packages used for changepoint detection and process monitoring are delivered in this tab: cpm [74], changepoint [47], and qcc [75]. The changepoint package is used to provide offline detection, in the form of PELT or binary segmentation. Online changepoint analysis is implemented through the cpm package, with the default test statistics and ARL set to those determined most suitable in Chapter 4. The qcc package provides an implementation of the CUSUM control chart, the simplest way of monitoring for a change in mean. This variety is provided as it allows for quick visual verification and redundancy. Identified changepoints, along with their means, are tabulated on the top right-hand side for both the online and offline cases. The box in the bottom right allows the algorithm sensitivity settings to be altered for all three packages. Specifically, in the case of the changepoint package, the user can select between the PELT and BinSeg algorithms; if using BinSeg, the maximum number of segmentations can be specified.
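The tabular CUSUM underlying such control charts is simple to state. The sketch below uses illustrative parameters rather than those of the qcc implementation: deviations from a target mean beyond an allowance k accumulate on each side, and an alarm is raised when either sum exceeds the decision interval h.

```python
def cusum_alarm(x, target, k, h):
    """Tabular CUSUM: accumulate positive and negative deviations from
    'target' beyond the allowance k; return the index of the first sample
    at which either cumulative sum exceeds the decision interval h, or
    None if no alarm is raised."""
    hi = lo = 0.0
    for i, v in enumerate(x):
        hi = max(0.0, hi + (v - target) - k)  # drift above target
        lo = max(0.0, lo + (target - v) - k)  # drift below target
        if hi > h or lo > h:
            return i
    return None
```

For a series on target followed by a sustained shift of two units, with k = 0.5 and h = 4, the alarm fires a few samples after the shift begins, which is precisely the small-persistent-shift sensitivity that motivates using CUSUM alongside the other detectors.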

6.3.5 Reporting Tab

The Reporting Tab, seen in Figure 6.7 allows users to export the results of in-app analysis into a PDF format. The user first selects the time frame of interest, followed by the required components (e.g. changepoints, daily errors, historic visualisations, etc.). Once the download button is pressed, the results will be computed and a download will be made available in the PDF format. A sample output is depicted in Figure 6.8.

6.3.6 Data Configuration Tab

The Data Configuration Tab, as seen in Figure 6.9, allows the user to individually choose the data source for the UAG, entry and exit flows. A .csv file can either be uploaded, or the online databases used. This tab also enables the user to update the App’s database with that from the MIPI and the UAG source. The default source is the online data; on toggling the material switches, an option to upload a .csv file will appear. Once a suitable file has been uploaded, it will be used for the analysis. Note that not all features will be available in this state, due to the complexity of matching names in the relational database used for qualitative information. The current source is displayed, along with the available time frame for each dataset. On pressing the Sync button, the App will download and consolidate the new information into its database. This process should take no longer than a few seconds; a progress bar reassures the user and indicates the remaining time. Upon launch of the UAGMS, if any of the databases are identified as being out of date, the user is presented with a pop-up alert allowing them to proceed to the Data Configuration Tab, where they can update the database. This tab also contains an option to enable auto-updating, which will always run the update process on launch.

6.3.7 Help Page

The Help Page, as seen in Figure 6.10, helps users achieve their goals within the UAGMS. The documentation here is split between the FAQ, the feature documentation and workflow examples, and has been developed alongside the users at National Grid so as to be as clear and concise as possible.

6.4 Summary and Future Development

UAGMS delivers all the objectives set out in the project manifest, and moreover is flexible in both its deployment and its features. As discussed, adding new models is trivial, and there is a range of hosting options going forward, both internal and external. Future additions to the App could include a more robust integration with the data structures at National Grid, and features such as email alerts, authentication and persistent environments.

Figure 6.3: UAGMS: UAG Monitor Tab
Figure 6.4: UAGMS: LDZ Weather Model Tab
Figure 6.5: UAGMS: Causality Detection Tab
Figure 6.6: UAGMS: Changepoint Analysis Tab
Figure 6.7: UAGMS: Reporting Tab

Figure 6.8: UAGMS: Report Output
Figure 6.9: UAGMS: Data Configuration Tab
Figure 6.10: UAGMS: Help Page Tab

Chapter 7

Summary and Recommendations

To conclude, we briefly review the work completed directly concerning the industrial goals, and assess whether a solution to the intended problems has been provided. Importantly, we also outline recommendations to National Grid and the broader gas transmission community regarding both operations and accounting processes.

7.1 Recommendations

The following recommendations are based on observations and literature that have become available over the duration of the project. Whilst some may be specific to National Grid, an effort has been made to generalise where possible, so as to benefit the wider gas transportation community. It is understood that some of these may not be feasible due to their wider implications for the business; however, they may be of use in future decision making when opportunities arise to redesign systems and infrastructure, or when building new systems from the ground up.

7.1.1 Monitoring and Statistical Control

It is important to monitor gas balancing terms, including UAG, regularly, with the aim of identifying systematic shifts or large one-off imbalances. Statistical control techniques should be used to help ascertain whether significant changes have taken place, both in the short and the long term. It is essential to stress that this statistical control should be applied not only to the physical real-time control telemetry (where it is often present), but also to the aggregate terms (daily flows) in the accounting record-keeping. In the case of National Grid, these statistical control techniques have been incorporated into the App. Simple checks, such as extreme and abnormal value monitoring, can prove surprisingly effective and should be present as standard in any record-keeping system. Specific recommendations for both online and offline monitoring, tuned to the UAG typically found in the NG, along with generic guidelines for setting up a monitoring scheme, can be found in Section 4.4.

7.1.2 Increasing UAG Calculation Frequency

The current UK UAG calculation frequency is daily. However, if accurate real-time energy flows for all components can be reliably established, there are no limitations on the UAG calculation frequency. A natural progression would be to calculate UAG over 12-hour intervals, or even hourly. Such improvements could lead to faster identification of errors; indeed, it would allow a much greater depth of analysis, as UAG could be tracked across the day as both volume and the number of active nodes fluctuate. This could permit a faster ’narrowing down’ of suspect nodes. However, the amount of data involved would require analysis to be entirely automated, with only anomalies being investigated further by operators. Moreover, no transmission or distribution grid presently calculates UAG at such a granularity; the patterns that could be uncovered by future research in this area are unknown unknowns. In reality, the limiting factor for UAG monitoring is the largest sensor polling interval. We have not explored a continuous-time formulation of the UAG problem in this work, as there is little obvious practical benefit; however, this is another future possibility.

7.1.3 Data Confidence

It is important that at all stages of the data transmission process, users have confidence in both the veracity and the robustness of their data. This can only be achieved by a thorough understanding of data transmission pathways, data systems and the underlying data processing by key users within the organisation. A possible impediment to this is the presence of conflicting data regarding system operation. Such situations need to be handled with thorough investigations into the underlying cause as a matter of urgency, and remedial action – either in the form of system maintenance or of process control – taken where necessary, so as to prevent future occurrences.

In mature transmission systems, like the NTS, the eventual complexity is such that it is unlikely all billing, control and monitoring systems will be unified from a data perspective. Often, legacy systems and workarounds are used in parallel with, or as a substitute for, newly implemented systems. This can lead to exceedingly complex workflow processes developing, which can over time amount to decreased efficiency and an increased potential for the introduction of errors. Such errors may end up undocumented, and their discovery becomes exceedingly difficult. Furthermore, this complexity hinders the development of the aforementioned understanding of data structures. A lack of confidence in data can limit the analytic scope to which ideas are developed; the belief that data is not of sufficient quality will prevent more complex techniques from being used. Indeed, this may also contribute to the non-sharing of data with interested parties and the broader community. Only three European transmission grid operators, including the UK’s, provided UAG data freely online at the time of writing.

It is therefore important that transmission grid operators strive both to simplify data structures, limiting the number of data sources requiring alternative systems and processes, and to ultimately unify their data processing needs in a single digital ecosystem.

7.1.4 Standardisation of Reporting

UAG is best reported as a percentage of throughput, as discussed previously in Chapter 3. When aggregating across a longer period of time, it is important to report statistics based on both absolute and standard values, so as to provide a fair and unbiased picture. Reporting intervals should be no greater than yearly, and optimally quarterly. Furthermore, the quantity and units should both preferably be standardised; as a recommendation, we would suggest reporting in terms of both volume and energy. This can raise accountability, as any problems resulting from UAG become public.

7.1.5 Data Sharing

As aforementioned, few transmission grid operators share any kind of operational data, let alone UAG. Data quality and perceived confidentiality issues may be obstacles to this effect. However, in the particular case of UAG, data and concerns should at a minimum be shared with all downstream and upstream connections; this may help in the diagnosis of faults, as it will provide a level of measurement redundancy, perhaps allowing for techniques such as data reconciliation. In the UK, this data is available through the website; in addition, as previously mentioned, a large variety of operational data can be accessed through the MIPI. Furthermore, it is important to share raw rather than top-level, highly summarised data, so as to allow external analysis to take place, thus enabling the possible creation of new insights. In the case of UAG, this will usually be daily balancing and node flow data, allowing the user to balance Equation (1.3) themselves.

7.1.6 Cross-departmental approach to UAG management

We have shown, in particular, that the gas flow modelling and forecasting functions are closely related to UAG analysis. A holistic approach to UAG requires continuous communication of relevant results and models between the organisation's modelling and energy balancing functions. The UAG implications of activities which may result in either error introduction or detection, even incidentally, must be communicated to the relevant parties.

7.1.7 Documentation of Sources of Uncertainty

We recommend the systematic documentation and quantification of all factors which may introduce uncertainty or potential for error into the flow accounting and measurement processes, as described in Chapter 2. Existing networks should create such a database following a comprehensive study, and update it after any operational or accounting change. New networks should find this easier, as the database can be created during the design and planning stages and calibrated once operation commences.
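A minimal sketch of what one record in such a database might look like; the schema, field names and example values here are illustrative assumptions, not NG's actual documentation format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class UncertaintySource:
    """One entry in a hypothetical uncertainty-source register."""
    location: str           # site or process where the term enters the balance
    description: str
    error_type: str         # "random" or "systematic"
    uncertainty_pct: float  # estimated standard uncertainty, % of measured energy
    last_reviewed: date

register = [
    UncertaintySource("linepack model", "CV and temperature assumptions",
                      "systematic", 1.0, date(2021, 1, 1)),
    UncertaintySource("offtake meter X", "orifice plate drift between calibrations",
                      "random", 0.5, date(2020, 6, 1)),
]

# Flag sources not reviewed since a given operational change, so the
# register can be kept current as recommended above.
stale = [s.location for s in register if s.last_reviewed < date(2021, 1, 1)]
```

Keeping the register machine-readable makes the "update after operational or accounting changes" step a simple query rather than a manual audit.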

7.1.8 Volumetric Balancing

UAG should be calculated in both volume and energy. Whilst the volume equation will clearly not represent the true fiscal picture, it eliminates many errors arising from CV modelling inaccuracies. By comparing the two time series, one can therefore deduce whether an error results from volumetric or energy measurement. Furthermore, a volumetric UAG would be more sensitive to errors like those experienced at Aberdeen and Braishfield B, as the time series would be centred more closely around zero, with less variance under normal operation.
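The comparison described above can be sketched as follows; the node names, flows and CVs are entirely hypothetical, and the balance is simplified to nodes and linepack only:

```python
# Hypothetical node data for one gas day: measured volume (mcm) and the
# calorific value (MJ/m^3) assigned to that node. Positive = input,
# negative = offtake.
nodes = {
    "terminal_A": (+45.0, 39.2),
    "terminal_B": (+30.0, 39.5),
    "offtake_C":  (-40.0, 39.3),
    "offtake_D":  (-34.8, 39.4),
}

def daily_uag(node_data, delta_linepack_vol=0.0, delta_linepack_energy=0.0):
    """Return (volumetric UAG, energy UAG) for one day.

    A large energy UAG alongside a near-zero volumetric UAG points
    towards CV-modelling error rather than a volume measurement fault.
    """
    vol = sum(v for v, _ in node_data.values()) - delta_linepack_vol
    energy = sum(v * cv for v, cv in node_data.values()) - delta_linepack_energy
    return vol, energy

vol_uag, energy_uag = daily_uag(nodes)
print(f"volumetric UAG: {vol_uag:.2f} mcm, energy UAG: {energy_uag:.2f}")
```

Tracking both series daily makes the volumetric/energy attribution above a simple visual or statistical comparison.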

7.1.9 Adoption of an Independent Error Reducing Process

By this, we refer to the discussion in Section 5.3.4, briefly summarised here: structuring error reduction to be guided by, rather than contingent on, UAG results in a greater number of errors being discovered, and improved long-term metering accuracy for all network participants. As a consequence, it is also recommended that KPIs are not based purely on UAG. This applies almost universally, with the exception of very small transmission systems or individual pipelines, where the baseline method may be deemed sufficient for error detection purposes. This can be ascertained by examining the EaR statistic.

7.1.10 Development of an Action Plan

An action plan must be in place for the identification of either a large random or a systematic error. This will speed up reaction times, minimising the cost incurred and allowing decision making to focus on only the most critical aspects. The action plan will often be structured according to the financial cost of the actions taken to rectify the error; note that this may include intangible quantities such as relationships with external stakeholders.

7.1.11 Linepack Modelling Evaluation

As discussed in Chapters 2 and 3, there exists both substantial statistical evidence and practical knowledge suggesting a large error in NTS linepack. This error can be attributed specifically to CV and temperature modelling within the linepack estimation process. Such evidence should be acted upon, through a cost-benefit analysis of possible solutions and their subsequent implementation.

7.1.12 Automation of the UAG Calculation

Ideally, UAG should not be a hand-calculated term, as collating data from multiple distinct sources manually introduces further potential error sources. Automatic calculation would not only reduce errors, but also log UAG changes as delayed daily flows come in. Confidence in automated telemetry readings, integrated physical flow monitoring, and accounting systems is crucial.

7.2 Summary

Our work has focused on a wide range of considerations pertaining to the management and control of UAG in a transmission network. We carried out a comprehensive investigation into sources of uncertainty, and areas where errors resulting in UAG are likely to be introduced, specifically in the NTS. It was concluded that the majority of uncertainty originates from measurement uncertainty at nodes. However, unlike prior models in the literature, we found that including the uncertainties of additional terms (linepack, OUG) is crucial: although they represent less energy than nodes, their uncertainty is likely to be much higher. In particular, it was discovered that significant errors arise from difficulties in the accurate assessment of linepack. We provided an easily implementable procedure aimed at reducing these errors.

Sources of uncertainty were combined to form a baseline model; we postulated that a superior model can be attained by combining the uncertainty model with purely statistical time series models based on error-free statistical data. We concluded, however, that this model cannot be used as a basis for error detection for the majority of node flows, when considering the distribution of such nominal flows in the NTS. We studied the distribution of these nominal flows, and also conducted a statistical analysis of UAG. A surprising finding was that there was no discernible seasonality.

We considered a historical case of systematic error, and investigated the extent to which such errors can be identified from UAG using an assortment of changepoint models. This portion of our research has been published in the Journal of Natural Gas Science and Engineering [10]. In it, we reach a conclusion similar to that for the baseline models: only relatively large errors in the majority of nodes can be detected through statistical control based on UAG. We provided specific recommendations for the type of statistical control to adopt in the setting of the NTS.
The results of the above work were partially implemented in the application developed as part of the industrial collaboration with National Grid, known as UAGMS. This application is already in use by the energy balancing team at NG. The application also implements the baseline models discussed, and some heuristics to aid in error attribution.

Advanced detection techniques, which utilise node flows in addition to UAG data and aim not only to detect errors but also to determine their source within the network, were reviewed in our final chapter. In particular, we considered the monitoring of energy conservation in power stations, and the combined analysis of upstream and downstream UAG across transmission and distribution networks – both novel techniques, insofar as such practice has not previously been reported in the literature. We also considered other methods of identifying error, such as forecasting error and predicted UAG inference. We outlined the high-level monitoring processes that should be implemented, ensuring a comprehensive approach to error detection.

7.2.1 Limitations

The key limitations faced by the project derived from the use of daily data. Focusing analysis on detailed operational data, for example individual CV, volume and temperature readings at the minute-by-minute level, may allow a greater variety of statistical techniques to be employed. However, the complex setting of transmission grids means that access to such data – specifically, a direct view into the SCADA system – is limited in an academic setting.

7.2.2 Future Research Direction

As we have taken a holistic approach to the management of UAG, we have touched upon numerous distinct domains. As such, we highlight only some of the areas where future research might be directed, and which were not already reviewed at length. Our study considered, but did not examine in any detail, the potential use of data reconciliation techniques in transmission grids, with redundancy acquired through the use of intermediate flows at compressor stations. Whilst we carried out some comparison of UAG across international grids, we did not conduct a systematic review of the differences and similarities of key statistics; this was due to the lack of data availability for grids outside the UK. With regard to error detection via efficiency/load modelling of power stations, non-parametric methods and ML techniques may provide an alternative pathway for combined-cycle power stations, and might be an avenue of future research. More granular data would allow a greater range of statistical techniques to be used, in particular when trying to attribute sources of systematic error; future research could focus on the examination of techniques that can achieve this goal in our scenario. Finally, it may in future be possible to analyse UAG at shorter intervals – such as hourly – which will doubtless provide plenty of analytic potential.

7.2.3 Closing Remarks

Whilst the future of hydrocarbon fuel use is in doubt given the current climate crisis, and reserves of such hydrocarbons are limited, pipelines are and will remain a key component of global infrastructure. Once constructed, they represent the most efficient method of transporting gases and liquids across long distances. Indeed, current studies are investigating the potential of one day converting the NTS to a hydrogen transmission grid, which would serve to transport a net zero emission fuel. Therefore, the topics discussed will remain relevant, and it is hoped the methodologies, techniques and recommendations can be adapted and used around the world, today and in the future.

Appendix A

Glossary

Common abbreviations and terms used in this thesis are defined below:

• NG: National Grid, owner and operator of the UK gas and electricity transmission networks.

• NTS: National Transmission System, the gas transmission system owned and operated by NG in the UK.

• LDZ: Local Distribution Zone, the networks that distribute gas from the NTS offtakes to domestic consumers and small/medium industrial users.

• UAG: Unaccounted for Gas.

• OUG: Own Use Gas.

• DNO: Distribution Network Operator, the entity operating distribution networks, which are supplied by the NTS.

• LP: Linepack.

• KPI: Key Performance Indicator.

• (D)LP: (Delta) Linepack, the day-to-day difference in linepack, the amount of gas contained within the pipeline.

• Shrinkage: An umbrella term for UAG, OUG and calorific value shrinkage.

• DR: Data reconciliation.

• MIPI: National Grid's data API.

• CV: Calorific Value, the total energy released as heat when a substance undergoes complete combustion with oxygen under standard conditions.

• AGL: Above Ground Level.

• MC: Monte Carlo.

• Ofgem: The Office of Gas and Electricity Markets, the government regulator for the electricity and downstream natural gas markets in Great Britain.

• CWV: Composite Weather Variable, a weather variable that is linearly related to non-daily metered gas demand.

• FWMU: Flow weighted meter utilisation.

• INPV: Indicated positive prediction value.

• IES: Indicated energy sensitivity.

• CUSUM: Cumulative Sum.

• ECP: E-divisive Changepoint.

• PELT: Pruned Exact Linear Time.

• BCP: Bayesian changepoint.

• BinSeg: Binary segmentation.

• CCGT: Combined Cycle Gas Turbine.

• UAGMS: Unaccounted for Gas Management Suite.

Appendix B

CWV calculation

The CWV is calculated as follows in the UK, for each set of measurements (ATt, Wt), as per NG methodology [58]:

$$E_t = 0.5E_{t-1} + 0.5\,AT_t$$

where ATt is the actual temperature at time t, in degrees Celsius.

$$CW_t = l_1 E_t + (1.0 - l_1)S_t - l_2 \max(0, W_t - W_0)\max(0, T_0 - AT_t)$$

CWt is an intermediate term, prior to linearisation. Wt is the wind speed at time t, in knots. l1, l2, l3 and q are dimensionless constants.

$$CWV_t = \begin{cases} V_1 + q(V_2 - V_1) & \text{if } V_2 \le CW_t \\ V_1 + q(CW_t - V_1) & \text{if } V_1 < CW_t < V_2 \\ CW_t & \text{if } V_0 \le CW_t \le V_1 \\ CW_t + l_3(CW_t - V_0) & \text{if } CW_t < V_0 \end{cases}$$

V0, V1, V2 are the linearity bounds on CWt; in cases where they are exceeded, a correction is made to maintain a linear demand response. St is the seasonal normal effective temperature. The national CWV used in the UAG regression is calculated as a weighted average of LDZ CWVs.
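The recursion above can be sketched in Python for a single time step; the parameter values in the example call are arbitrary placeholders, not the published per-LDZ constants:

```python
def cwv_step(E_prev, AT, W, S, l1, l2, l3, q, W0, T0, V0, V1, V2):
    """One step of the CWV calculation: smooth the effective temperature,
    apply the wind-chill correction, then linearise piecewise."""
    E = 0.5 * E_prev + 0.5 * AT                      # smoothed effective temperature
    CW = l1 * E + (1.0 - l1) * S - l2 * max(0.0, W - W0) * max(0.0, T0 - AT)
    if V2 <= CW:                                     # warm-weather cutoff
        return E, V1 + q * (V2 - V1)
    if V1 < CW < V2:                                 # damped warm response
        return E, V1 + q * (CW - V1)
    if V0 <= CW <= V1:                               # linear region
        return E, CW
    return E, CW + l3 * (CW - V0)                    # cold-weather correction

# Illustrative call with placeholder constants:
E, cwv = cwv_step(6.0, 4.0, 20.0, 8.0,
                  l1=0.5, l2=0.01, l3=0.5, q=0.5,
                  W0=10.0, T0=15.0, V0=-2.0, V1=14.0, V2=18.0)
```

Iterating this step over a daily series of (AT, W, S) values, then taking the flow-weighted average over LDZs, would reproduce the national CWV used in the UAG regression.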

Bibliography

[1] Aminikhanghahi, S., and Cook, D. J. A survey of methods for time series change point detection. Knowledge and information systems 51, 2 (2017), 339–367.

[2] Arpino, F., Dell’Isola, M., Ficco, G., and Vigo, P. Unaccounted for gas in transmission networks: Prediction model and analysis of solutions. Journal of Natural Gas Science and Engineering 17 (2014), 58–70.

[3] Azadeh, A., Ghaderi, S. F., and Sohrabkhani, S. Annual electricity consumption forecasting by neural network in high energy consuming industrial sectors. Energy Conversion and Management 49, 8 (2008), 2272–2278.

[4] Badillo-Herrera, J.-D., Chaves, A., and Fuentes-Osorio, J.-A. Computational tool for material balances control in natural gas distribution network. CT&F - Ciencia, Tecnología y Futuro 5, 2 (2013), 31–46.

[5] Bagajewicz, M. J., and Cabrera, E. Data reconciliation in gas pipeline systems. Industrial & Engineering Chemistry Research 42, 22 (2003), 5596–5606.

[6] Barry, D., and Hartigan, J. A. A bayesian analysis for change point problems. Journal of the American Statistical Association 88, 421 (1993), 309–319.

[7] Beardsmore, G. R., Cull, J. P., Cull, J. P., et al. Crustal heat flow: a guide to measurement and modelling. Cambridge University Press, 2001.

[8] Bedi, J., and Toshniwal, D. Deep learning framework to forecast electricity demand. Applied Energy 238 (2019), 1312–1326.

[9] Bianco, V., Manca, O., and Nardini, S. Electricity consumption forecasting in italy using linear regression models. Energy 34, 9 (2009), 1413–1421.


[10] Botev, L., and Johnson, P. Applications of statistical process control in the man- agement of unaccounted for gas. Journal of Natural Gas Science and Engineering (2020).

[11] Box, G. E., Jenkins, G. M., and Reinsel, G. C. Time series analysis: forecasting and control, vol. 734. John Wiley & Sons, 2011.

[12] Cassidy, R. Gas: Natural energy. Frederick Muller Limited, London, 1979, p. 14.

[13] Chang, W., Cheng, J., Allaire, J., Xie, Y., and McPherson, J. shiny: Web Application Framework for R, 2019. R package version 1.3.2.

[14] Chen, C., and Liu, L.-M. Joint estimation of model parameters and outlier effects in time series. Journal of the American Statistical Association 88, 421 (1993), 284–297.

[15] Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I. Stl: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics 6, 3 (1990), 235–248.

[16] Costello, K. Lost and unaccounted-for gas: Practices of state utility commissions. Tech. rep., 2013.

[17] Costello, K. W. Lost and unaccounted-for gas: Challenges for public utility regulators. Utilities Policy 29 (2014), 17–24.

[18] De Livera, A. M., Hyndman, R. J., and Snyder, R. D. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American statistical association 106, 496 (2011), 1513–1527.

[19] de Oliveira, E. C., Frota, M. N., and de Oliveira Barreto, G. Use of data reconciliation: A strategy for improving the accuracy in gas flow measurements. Journal of Natural Gas Science and Engineering 22 (2015), 313–320.

[20] Department for Business, Energy and Industrial Strategy. Digest of United Kingdom Energy Statistics. Tech. rep., 2017.

[21] Ebrahimi-Moghadam, A., Farzaneh-Gord, M., and Deymi-Dashtebayaz, M. Correlations for estimating natural gas leakage from above-ground and buried urban distribution pipelines. Journal of Natural Gas Science and Engineering 34 (2016), 185–196.

[22] Edalat, M., and Mansoori, G. A. Buried gas transmission pipelines: temperature profile prediction through the corresponding states principle. Energy sources 10, 4 (1988), 247–252.

[23] EGIG. Gas pipeline incidents - 10th report of the European gas pipeline incident data group, (period 1970-2016). Tech. rep., March 2018.

[24] Elexon. Balancing mechanism reporting agent. https://www.elexon.co.uk/data/balancing-mechanism-reporting-agent/, 2020.

[25] Erdman, C., and Emerson, J. W. bcp: An R package for performing a Bayesian analysis of change point problems. Journal of Statistical Software 23, 3 (December 2007).

[26] Feldman, R. J. The lost and unaccounted-for gas: Chasing the “silver bullet“. Pipeline and Gas Journal 225 (07 1998).

[27] Ficco, G., Dell’Isola, M., Vigo, P., and Celenza, L. Uncertainty analysis of energy measurements in natural gas transmission networks. Flow Measurement and Instrumentation 42 (2015), 58–68.

[28] Florides, G. A., and Kalogirou, S. A. Annual ground temperature measurements at various depths.

[29] Friedl, H., Mirkov, R., and Steinkamp, A. Modelling and forecasting gas flow on exits of gas transmission networks. International statistical review 80, 1 (2012), 24–39.

[30] Gaba, A., Tsetlin, I., and Winkler, R. L. Combining interval forecasts. Decision Analysis 14, 1 (2017), 1–20.

[31] Global Gas & Oil Network. Global fossil infrastructure tracker. Tech. rep., May 2020.

[32] Grömping, U., et al. Relative importance for linear regression in R: the package relaimpo. Journal of Statistical Software 17, 1 (2006), 1–27.

[33] Grushka-Cockayne, Y., and Jose, V. R. R. Combining prediction intervals in the m4 competition. International Journal of Forecasting 36, 1 (2020), 178 – 185. M4 Competition.

[34] Hasan, N., Rai, J. N., and Arora, B. B. Optimization of ccgt power plant and performance analysis using matlab/simulink with actual operational data. SpringerPlus 3, 1 (2014), 275.

[35] Haydell, M. Unaccounted-for gas. 2001 Proceedings, American School of Gas Measurement Technology (2001), 148–153.

[36] Horlock, J. H. Chapter 5 - Full Calculations of Plant Efficiency. In Advanced Gas Turbine Cycles. Pergamon, Oxford, 2003, pp. 71 – 84.

[37] Hyndman, R., Koehler, A. B., Ord, J. K., and Snyder, R. D. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008.

[38] Hyndman, R. J., Koehler, A. B., Snyder, R. D., and Grose, S. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting 18, 3 (2002), 439–454.

[39] ICF International. Lost and unaccounted-for gas. Tech. rep., December 2014.

[40] Innogate. Energy (electricity and gas) sector performance assessment and improve- ment under the regulatory perspective. Tech. rep., 2015.

[41] Joint Office of Gas Transporters. Work procedure for validation of equipment associated with measurement systems for the calculation of mass, volume and energy flow rate of gas. Tech. rep., 2011.

[42] Jose, V. R. R., and Winkler, R. L. Evaluating quantile assessments. Operations research 57, 5 (2009), 1287–1297.

[43] Kelton Engineering. Aberdeen measurement error review. Independent expert significant meter error (SMER), document ref.: Nk3177-001. Tech. rep., 2010.

[44] Kelton Engineering. Braishfield "B" measurement error review. Independent expert significant meter error (SMER) - draft report, document ref.: Nk3173-003. Tech. rep., 2010.

[45] Killick, R., and Eckley, I. A. changepoint: An R package for changepoint analysis. Journal of Statistical Software 58, 3 (2014), 1–19.

[46] Killick, R., Fearnhead, P., and Eckley, I. A. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association 107, 500 (2012), 1590–1598.

[47] Killick, R., Fearnhead, P., and Eckley, I. A. Optimal detection of changepoints with a linear computational cost. 2012.

[48] Kusuda, T., and Achenbach, P. R. Earth temperature and thermal diffusivity at selected stations in the united states. Tech. rep., National Bureau of Standards Gaithersburg MD, 1965.

[49] Lane, T. Linepack calculation analysis. Tech. rep., National Grid, 2016.

[50] Lee, S., and Kim, S. U. Comparison between change point detection methods with synthetic rainfall data and application in south korea. KSCE Journal of Civil Engineering 20, 4 (2016), 1558–1571.

[51] Liang, G. C., Tan, L. L., and Tian, H. A model for predicting flowing gas temperature and pressure profiles in buried pipeline. In Advanced Materials Research (2012), vol. 463, Trans Tech Publ, pp. 1065–1068.

[52] Lindeman, R. H., Merenda, P., and Gold, R. Z. Introduction to Bivariate and Multivariate Analysis. Scott, Foresman and Company, Glenview, IL, 1980.

[53] Lütkepohl, H. Introduction to multiple time series analysis. Springer Science & Business Media, 2013.

[54] Matteson, D. S., and James, N. A. A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association 109, 505 (2014), 334–345.

[55] Menon, S. Chapter Seven - Thermal Hydraulics. In Transmission Pipeline Calculations and Simulations Manual. Gulf Professional Publishing, Boston, 2015, pp. 273 – 316.

[56] National Grid. Unaccounted for gas (UAG) report. https://www.nationalgrid.com/uk/gas-transmission/document/79151/download, July 2012.

[57] National Grid. Unaccounted for gas (UAG) report. https://www.nationalgrid.com/uk/gas-transmission/document/79146/download, October 2013.

[58] National Grid. Gas demand forecasting methodology. https://www.nationalgrid.com/sites/default/files/documents/8589937808-Gas%20Demand%20Forecasting%20Methodology.pdf, November 2016.

[59] Niculita, O., Skaf, Z., and Jennions, I. K. The application of bayesian change point detection in uav fuel systems. Procedia CIRP 22 (2014), 115–121.

[60] Nilsson, U. R. A new method for finding inaccurate gas flow meters using billing data: Finding faulty meters using billing data. Flow Measurement and Instrumentation 9, 4 (1998), 237–242.

[61] Ofgem. Uniform Network Code - Offtake Arrangements Document. Section D. Measurements, 29-41. Tech. rep., 2005.

[62] Ofgem. Gas Theft Consultation. Tech. rep., 2011.

[63] Oliveira, E. C., and Aguiar, P. F. Data reconciliation in the natural gas industry: Analytical applications. Energy & Fuels 23, 7 (2009), 3658–3664.

[64] Pakistan Today. SNGPL Shuns KPMG report on UFG losses. https://profit.pakistantoday.com.pk/2017/09/16/sngpl-shuns-kpmg-report-on-ufg-losses/, September 2017.

[65] Panapakidis, I. P., and Dagoumas, A. S. Day-ahead natural gas demand forecasting based on the combination of wavelet transform and anfis/genetic algorithm/neural network model. Energy 118 (2017), 231–245.

[66] Park, S., and Budescu, D. V. Aggregating multiple probability intervals to improve calibration. Judgment and Decision Making 10, 2 (2015), 130.

[67] Petherick, L. M., Pietsch, F. U., et al. Effects of errors in linepack calculations on real-time computational pipeline monitoring. In PSIG Annual Meeting (1994), Pipeline Simulation Interest Group.

[68] Petkovic, M., Chen, Y., Gamrath, I., Gotzes, U., Hadjidimitriou, N. S., Zittel, J., Xu, X., and Koch, T. A hybrid approach for high precision prediction of gas flows. Tech. Rep. 19-26, ZIB, Takustr. 7, 14195 Berlin, 2019.

[69] Polunchenko, A. S., Tartakovsky, A. G., and Mukhopadhyay, N. Nearly optimal change-point detection with an application to cybersecurity. Sequential Analysis 31, 3 (2012), 409–435.

[70] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2019.

[71] Rand, W. M. Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association 66, 336 (1971), 846–850.

[72] Reshetnikov, A. I., Paramonova, N. N., and Shashkov, A. An evaluation of historical methane emissions from the soviet gas industry. Journal of Geophysical Research: Atmospheres 105, D3 (2000), 3517–3529.

[73] Rosner, B. Percentage points for a generalized esd many-outlier procedure. Techno- metrics 25, 2 (1983), 165–172.

[74] Ross, G. J. Parametric and nonparametric sequential change detection in R: The cpm package. Journal of Statistical Software 66, 3 (2015), 1–20.

[75] Scrucca, L. qcc: an R package for quality control charting and statistical process control. R News 4/1 (2004), 11–17.

[76] Seebregts, A. J. Gas-fired power. Technology Brief E02 (April 2010).

[77] Shafiq, M., Nisar, W. B., Savino, M. M., Rashid, Z., and Ahmad, Z. Monitoring and controlling of unaccounted for gas (ufg) in distribution networks: A case study of sui northern gas pipelines limited pakistan. IFAC-PapersOnLine 51, 11 (2018), 253–258.

[78] Shaikh, F., Ji, Q., Shaikh, P. H., Mirjat, N. H., and Uqaili, M. A. Forecasting China’s natural gas demand based on optimised nonlinear grey models. Energy 140 (2017), 941–951.

[79] Shapiro, S. S., Wilk, M. B., and Chen, H. A comparative study of various tests for normality. Journal of the American Statistical Association 63, 324 (1968), 1343–1372.

[80] Shewhart, W. A. Economic control of quality of manufactured product. ASQ Quality Press, 1931.

[81] Taylor, B. N., and Kuyatt, C. E. NIST technical note 1297. https://www.nist.gov/pml/nist-technical-note-1297.

[82] The House Natural Resources Committee Democratic Staff. America pays for gas leaks. Tech. rep., August 2013.

[83] Tian, Y., Zhao, Y., and Zhao, X. Study of load forecasting for urban gas supply. GAS & HEAT 4 (1998).

[84] Truong, C., Oudre, L., and Vayatis, N. A review of change point detection methods. arXiv preprint arXiv:1801.00718 (2018).

[85] U.S. Energy Information Administration. Natural gas annual. Tech. rep., 2017.

[86] U.S. EPA Office of Air Quality Planning and Standards. Oil and natural gas sector leaks. Tech. rep., April 2014.

[87] Venables, W. N., and Ripley, B. D. Modern applied statistics with S-PLUS. Springer Science & Business Media, 2013.

[88] Wilcox, R. Modern statistics for the social and behavioral sciences: A practical introduction. CRC press, 2011.

[89] Winchester, S. Krakatoa: The day the world exploded. Penguin UK, 2004.

[90] Wood, S. N. Generalized additive models: an introduction with R. Chapman and Hall/CRC, 2017.

[91] Xie, Y., Wang, X., and Mai, F. Calculation of theoretical transmission loss in trunk gas pipeline. Advances in Mechanical Engineering 11, 12 (2019), 1687814019895440.

[92] Yohai, V. J., Stahel, W. A., and Zamar, R. H. A procedure for robust estimation and inference in linear regression. In Directions in robust statistics and diagnostics. Springer, 1991, pp. 365–374.

[93] Zhang, G., Patuwo, B. E., and Hu, M. Y. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 1 (1998), 35–62.

[94] Zhou, J., Adewumi, M. A., et al. Predicting flowing gas temperature and pressure profiles in buried pipelines.