
Mälardalen University Press Dissertations No. 293

EXTREME POINTS OF THE VANDERMONDE DETERMINANT AND PHENOMENOLOGICAL MODELLING WITH POWER EXPONENTIAL FUNCTIONS

Karl Lundengård

2019

School of Education, Culture and Communication

Copyright © Karl Lundengård, 2019 ISBN 978-91-7485-431-2 ISSN 1651-4238 Printed by E-Print AB, Stockholm, Sweden

Mälardalen University Press Dissertations No. 293

EXTREME POINTS OF THE VANDERMONDE DETERMINANT AND PHENOMENOLOGICAL MODELLING WITH POWER EXPONENTIAL FUNCTIONS

Karl Lundengård

Akademisk avhandling som för avläggande av filosofie doktorsexamen i matematik/tillämpad matematik vid Akademin för utbildning, kultur och kommunikation kommer att offentligen försvaras torsdagen den 26 september 2019, 13.15 i Delta, Mälardalens högskola, Västerås.

Fakultetsopponent: Professor Palle Jorgensen, University of Iowa

Akademin för utbildning, kultur och kommunikation

Abstract

This thesis discusses two topics, finding the extreme points of the Vandermonde determinant on various surfaces and phenomenological modelling using power-exponential functions. The relation between these two problems is that they are both related to methods for curve-fitting. Two applications of the mathematical models and methods are also discussed: modelling of electrostatic discharge currents for use in electromagnetic compatibility and modelling of mortality rates for humans. Both the construction and evaluation of models are discussed.

In the first chapter the basic theory for later chapters is introduced. First the Vandermonde matrix, a matrix whose rows (or columns) consist of sequential powers, its history and some of its properties are discussed. Next, some considerations and typical methods for a common class of curve fitting problems are presented, as well as how to analyse and evaluate the resulting fit. In preparation for the later parts of the thesis the topics of electromagnetic compatibility and mortality rate modelling are briefly introduced.

The second chapter discusses some techniques for finding the extreme points of the determinant of the Vandermonde matrix on various surfaces including spheres, ellipsoids and cylinders. The discussion focuses on low dimensions, but some results are given for arbitrary (finite) dimensions.

In the third chapter a particular model called the p-peaked Analytically Extended Function (AEF) is introduced and fitted to data taken either from a standard for electromagnetic compatibility or from experimental measurements. The discussion here is entirely focused on currents originating from lightning or electrostatic discharges.

The fourth chapter consists of a comparison of several different methods for modelling mortality rates, including a model constructed in a similar way to the AEF found in the third chapter. The models are compared with respect to how well they can be fitted to estimated mortality rates for several countries and several years, and the results when using the fitted models for mortality rate forecasting are also compared.

ISBN 978-91-7485-431-2 ISSN 1651-4238

Acknowledgements

Many thanks to all my coauthors and supervisors. My main supervisor, Professor Sergei Silvestrov, introduced me to the Vandermonde matrix and frequently suggested new problems and research directions throughout my time as a doctoral student. I have learned many lessons about mathematics and academia from him and my co-supervisor Professor Anatoliy Malyarenko. My other co-supervisor Dr. Milica Rančić played a crucial role and she is a role model with regards to conscientiousness, work ethic, communication and patience. I have learned invaluable lessons about interdisciplinary research, communication and time and resource management from her. I also want to thank Dr. Vesna Javor for her regular input that improved the research on electromagnetic compatibility considerably.

Cooperating with other doctoral students was very valuable. Jonas Österberg and Asaph Keikara Muhumuza (with support from his supervisors Dr. John M. Mango and Dr. Godwin Kakuba) made important contributions to the research on the Vandermonde determinant and Samya Suleiman's understanding of mortality rate forecasting and other aspects of actuarial mathematics was necessary for the work to progress. I am also glad that I had the opportunity to take part in the supervision of the talented master students Andromachi Boulougari and Belinda Strass and use the foundations they laid in their degree projects for further research.

Many thanks to all my coworkers at Mälardalen University, especially to Dr. Christopher Engström, Dr. Johan Richter and Docent Linus Carlsson for managing the bachelor's and master's programmes in Engineering mathematics together with me.

Perhaps most importantly, I thank my family for all the support, encouragement and assistance you have given me. A special mention to my sister for help with translating from 18th century French; it is perfectly understandable that you decided to move to the other side of the Earth after that. I will wonder my whole life how my father, whose entire mathematics career consisted of unsuccessfully solving a single problem on the blackboard in 9th grade, would have reacted to this dissertation if he were still with us. Fortunately my mother continues to be an endless source of support and encouragement. I am continually surprised and delighted over how much of her work ethic, sense of quality and unhealthy work habits I seem to have inherited from her.

Without the ideas, requests, remarks, questions, encouragements and patience of those around me this work would not have been completed.

Karl Lundengård, Västerås, September 2019


Populärvetenskaplig sammanfattning

Det finns många företeelser i världen som det är önskvärt att beskriva med en matematisk modell. I bästa fall kan modellen härledas ifrån lämplig grundläggande teori men ibland är det inte möjligt att göra det, antingen därför att det inte finns någon välutvecklad teori eller för att den teori som finns kräver information som inte är tillgänglig. I detta fall så behövs en modell som, i någon mån, stämmer överens med teori och empiriska observationer men som inte är härledd från den grundläggande teorin. Sådana modeller kallas för fenomenologiska modeller. I denna avhandling konstrueras fenomenologiska modeller av två olika fenomen, strömmen i elektrostatiska urladdningar och dödsrisk.

Elektrostatiska urladdningar sker när laddning snabbt flödar från ett objekt till ett annat. Välbekanta exempel är blixtnedslag eller små stötar orsakade av statisk elektricitet. För ingenjörer är det viktigt att kunna beskriva denna typ av elektriska strömmar för att se till att elektroniska system inte är för känsliga för elektromagnetisk påverkan utifrån och att de inte stör andra system då de används. Dödsrisken beskriver sannolikheten för död vid en viss ålder. Den kan användas för att uppskatta livskvaliteten i ett land eller andra demografiska eller försäkringsrelaterade ändamål.

En egenskap hos både elektrostatiska urladdningar och dödsrisk som kan vara utmanande att modellera är områden där en brant ökning följs av en långsam sänkning. Sådana mönster förekommer ofta i elektrostatiska urladdningar och i många länder ökar dödsrisken kraftigt vid övergången från barn till vuxen och förändras sedan långsamt fram till tidig medelålder.

I denna avhandling används en matematisk funktion som kallas potensexponentialfunktionen som en byggsten för att konstruera fenomenologiska modeller av strömmen i elektrostatiska urladdningar samt dödsrisk utifrån empiriska data för respektive fenomen. För elektrostatiska urladdningar föreslås en metod som kan konstruera modeller med olika noggrannhet och komplexitet. För dödsrisker föreslås några enkla modeller som sedan jämförs med tidigare föreslagna modeller.

I avhandlingen diskuteras också extrempunkterna hos Vandermondedeterminanten. Detta är ett matematiskt problem som förekommer inom flera olika områden men för avhandlingen är den mest relevanta tillämpningen att extrempunkterna kan hjälpa till att välja lämpliga data att använda när man konstruerar modeller med hjälp av en teknik som kallas för optimal design. Några allmänna resultat för hur extrempunkterna kan hittas på diverse ytor, t.ex. sfärer och kuber, presenteras och det ges exempel på hur resultaten kan tillämpas.


Popular science summary

There are many phenomena in the world that it is desirable to describe using a mathematical model. Ideally the mathematical model is derived from the appropriate fundamental theory but sometimes this is not feasible, either because the fundamental theory is not well understood or because the theory requires a lot of information to be applicable. In these cases it is necessary to create a model that, to some degree, matches the fundamental theory and the empirical observations, but is not derived from the fundamental theory. Such models are called phenomenological models. In this thesis phenomenological models are constructed for two phenomena, electrostatic discharge currents and mortality rates.

Electrostatic discharge currents are rapid flows of electric charge from one object to another. Well-known examples are lightning strikes or small electric shocks caused by static electricity. Describing such currents is important when engineers want to ensure that electronic systems are not disturbed too much by external electromagnetic disturbances and do not disturb other systems when used. Mortality rate describes the probability of dying at a certain age. It can be used to assess the quality of life in a country or for other demographical or actuarial purposes.

For electrostatic discharge currents and mortality rates an important feature that can be challenging to model is a steep increase followed by a slower decrease. This pattern is often observed in electrostatic discharge currents and in many countries the mortality rate increases rapidly in the transition from childhood to adulthood and then changes slowly until the beginning of middle age.

In this thesis a mathematical function called the power-exponential function is used as a building block to construct phenomenological models of electrostatic discharge currents and mortality rates based on empirical data for the respective phenomena. For electrostatic discharge currents a methodology for constructing models with different accuracy and complexity is proposed. For the mortality rates a few simple models are suggested and compared to previously suggested models.

The thesis also discusses the extreme points of the Vandermonde determinant. This is a mathematical problem that appears in many areas but for this thesis the most relevant application is that it helps choosing the appropriate data to use when constructing a model using a technique called optimal design. Some general results for finding the extreme points of the Vandermonde determinant on various surfaces, e.g. spheres or cubes, and applications of these results are discussed.


Notation

Matrix and vector notation

$\mathbf{v}$, $\mathbf{M}$ - Bold, roman lower- and uppercase letters denote vectors and matrices respectively.
$M_{i,j}$ - Element on the $i$th row and $j$th column of $\mathbf{M}$.
$M_{\cdot,j}$, $M_{i,\cdot}$ - Column (row) vector containing all elements from the $j$th column ($i$th row) of $\mathbf{M}$.
$[a_{ij}]_{i,j}^{n,m}$ - $n \times m$ matrix with element $a_{ij}$ in the $i$th row and $j$th column.
$V_{nm}$, $V_n = V_{nn}$ - $n \times m$ Vandermonde matrix.
$G_{nm}$, $G_n = G_{nn}$ - $n \times m$ generalized Vandermonde matrix.
$A_{nm}$, $A_n = A_{nn}$ - $n \times m$ alternant matrix.

Standard sets

$\mathbb{Z}$, $\mathbb{N}$, $\mathbb{R}$, $\mathbb{C}$ - Sets of all integers, natural numbers (including 0), real numbers and complex numbers.
$S^n_p$, $S^n = S^n_2$ - The $n$-dimensional sphere defined by the $p$-norm,
$$S^n_p(r) = \left\{ x \in \mathbb{R}^{n+1} \;\middle|\; \sum_{k=1}^{n+1} |x_k|^p = r \right\}.$$
$C^k[K]$ - All functions on $K$ with continuous $k$th derivative.

Special functions

Definitions can be found in standard texts. Suggested sources use notation consistent with the thesis.
$H_n$, $P_n^{(\alpha,\beta)}$ - Hermite and Jacobi polynomials, see [2].
$\Gamma(x)$, $\gamma(x, y)$, $\psi(x)$ - The Gamma, incomplete Gamma and Digamma functions, see [2].
${}_2F_2(a, b; c; x)$ - The hypergeometric function, see [2].
$G^{m,n}_{p,q}\!\left(z \,\middle|\, \begin{smallmatrix} a \\ b \end{smallmatrix}\right)$ - The Meijer G-function, see [236].
$\mathrm{Ei}(x)$ - The exponential integral, see [2].


Probability theory and statistics

$\Pr[A]$ - Probability of event $A$.
$\Pr[A|B]$ - Conditional probability of event $A$ given $B$.
$\mathrm{E}_X[Y]$ - Expected value of quantity $Y$ with respect to $X$.
$\mathrm{Var}(X)$ - Variance of $X$.
AIC - Akaike Information Criterion, see Definition 1.14.
AICC - Second order correction of the AIC, see Remark 1.9.
$I(f, g)$ - Kullback–Leibler divergence, see Definition 1.15.

Mortality rate

$S_x(\Delta x)$ - Survival function, see Definition 1.19.
$T_x$ - Remaining lifetime for an individual of age $x$.
$\mu(x)$ - Mortality rate at age $x$, see Definition 1.20.
$m_{x,t}$ - Central mortality rate at age $x$, year $t$, see page 66.

Other

$\frac{df}{dx} = f'(x)$ - Derivative of the function $f$ with respect to $x$.
$\frac{d^k f}{dx^k} = f^{(k)}(x)$ - $k$th derivative of the function $f$ with respect to $x$.
$\frac{\partial f}{\partial x}$ - Partial derivative of the function $f$ with respect to $x$.
$a^{\overline{b}}$ - Rising factorial, $a^{\overline{b}} = a(a + 1) \cdots (a + b - 1)$.


Contents

List of Papers 13

1 Introduction 15
1.1 The Vandermonde matrix ...... 19
1.1.1 Who was Vandermonde? ...... 19
1.1.2 The Vandermonde determinant ...... 21
1.1.3 Inverse of the Vandermonde matrix ...... 25
1.1.4 The alternant matrix ...... 26
1.1.5 The generalized Vandermonde matrix ...... 29
1.1.6 The Vandermonde determinant in systems with Coulombian interactions ...... 30
1.1.7 The Vandermonde determinant in random matrix theory ...... 33
1.2 Curve fitting ...... 37
1.2.1 Linear interpolation ...... 37
1.2.2 Generalized divided differences and interpolation ...... 42
1.2.3 Least squares fitting ...... 45
1.2.4 Linear least squares fitting ...... 45
1.2.5 Non-linear least squares fitting ...... 46
1.2.6 The Marquardt least squares method ...... 47
1.3 Analysing how well a curve fits ...... 50
1.3.1 Regression ...... 50
1.3.2 Quantile-Quantile plots ...... 52


1.3.3 The Akaike information criterion ...... 53
1.4 D-optimal experiment design ...... 57
1.5 Electromagnetic compatibility and electrostatic discharge currents ...... 60
1.5.1 Electrostatic discharge modelling ...... 62
1.6 Modelling mortality rates ...... 65
1.6.1 Lee–Carter method for forecasting ...... 68
1.7 Summaries of papers ...... 71

2 Extreme points of the Vandermonde determinant 75
2.1 Extreme points of the Vandermonde determinant and related determinants on various surfaces in three dimensions ...... 77
2.1.1 Optimization of the generalized Vandermonde determinant in three dimensions ...... 77
2.1.2 Extreme points of the Vandermonde determinant on the three-dimensional unit sphere ...... 81
2.1.3 Optimisation using Gröbner bases ...... 82
2.1.4 Extreme points on the ellipsoid in three dimensions ...... 83
2.1.5 Extreme points on the cylinder in three dimensions ...... 85
2.1.6 Optimizing the Vandermonde determinant on a surface defined by a homogeneous polynomial ...... 87
2.2 Extreme points of the Vandermonde determinant on the sphere ...... 89
2.2.1 The extreme points on the sphere given by roots of a polynomial ...... 89
2.2.2 Further visual exploration on the sphere ...... 96
2.3 Extreme points of the Vandermonde determinant on some surfaces implicitly defined by a univariate polynomial ...... 103
2.3.1 Critical points on surfaces given by a first degree univariate polynomial ...... 104
2.3.2 Critical points on surfaces given by a second degree univariate polynomial ...... 105
2.3.3 Critical points on the sphere defined by a p-norm ...... 107
2.3.4 The case p = 4 and n = 4 ...... 107


2.3.5 Some results for even n and p ...... 110
2.3.6 Some results for cubes and intersections of planes ...... 118
2.3.7 Optimising the probability density function of the eigenvalues of the Wishart matrix ...... 120

3 Approximation of electrostatic discharge currents using the analytically extended function 123
3.1 The analytically extended function (AEF) ...... 125
3.1.1 The p-peak analytically extended function ...... 126
3.2 Approximation of lightning discharge current functions ...... 133
3.2.1 Fitting the AEF ...... 133
3.2.2 Estimating parameters for underdetermined systems ...... 134
3.2.3 Fitting with data points as well as charge flow and specific energy conditions ...... 135
3.2.4 Calculating the η-parameters from the β-parameters ...... 138
3.2.5 Explicit formulas for a single-peak AEF ...... 139
3.2.6 Fitting to lightning discharge currents ...... 140
3.3 Approximation of electrostatic discharge currents using the AEF by interpolation on a D-optimal design ...... 143
3.3.1 D-optimal approximation for exponents given by a class of arithmetic sequences ...... 145
3.3.2 D-optimal interpolation on the rising part ...... 146
3.3.3 D-optimal interpolation on the decaying part ...... 148
3.3.4 Examples of models from applications and experiments ...... 150
3.3.5 Modelling of ESD currents ...... 150
3.3.6 Modelling of lightning discharge currents ...... 152
3.3.7 Summary of ESD modelling ...... 159

4 Comparison of models of mortality rate 161
4.1 Modelling and forecasting mortality rates ...... 162
4.2 Overview of models ...... 162
4.3 Power-exponential mortality rate models ...... 164


4.3.1 Multiple humps ...... 165
4.3.2 Single hump model ...... 165
4.3.3 Split power-exponential model ...... 166
4.3.4 Adjusted power-exponential model ...... 166
4.4 Fitting and comparing models ...... 167
4.4.1 Some comments on fitting ...... 168
4.4.2 Results and discussion ...... 174
4.5 Comparison of parametric models applied to mortality rate forecasting ...... 178
4.5.1 Comparison of models ...... 180
4.5.2 Results, discussion and further work ...... 180

References 185

Index 209

List of Figures 211

List of Tables 215

List of Definitions 216

List of Theorems 217

List of Lemmas 218


List of Papers

Paper A Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Extreme points of the Vandermonde determinant on the sphere and some limits involving the generalized Vandermonde determinant. Accepted for publication in Algebraic Structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper B Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Optimization of the determinant of the Vandermonde matrix on the sphere and related surfaces. Methodology and Computing in Applied Probability, Volume 20, Issue 4, pages 1417 – 1428, 2018.

Paper C Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Extreme points of the Vandermonde determinant on surfaces implicitly determined by a univariate polynomial. Accepted for publication in Algebraic Structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper D Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic Structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper E Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Rančić (Eds), Springer International Publishing, pages 151–176, 2016.

Paper F Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Estimation of parameters for the multi-peaked AEF current functions. Methodology and Computing in Applied Probability, Volume 19, Issue 4, pages 1107 – 1121, 2017.

Paper G Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Electrostatic discharge currents representation using the analytically extended function with p peaks by interpolation on a D-optimal design. Facta Universitatis Series: Electronics and Energetics, Volume 32, Issue 1, pages 25 – 49, 2019.

Paper H Karl Lundengård, Milica Rančić and Sergei Silvestrov. Modelling mortality rates using power-exponential functions. Submitted to journal, 2019.

Paper I Andromachi Boulougari, Karl Lundengård, Milica Rančić, Sergei Silvestrov, Belinda Strass and Samya Suleiman. Application of a power-exponential function based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, Volume 5, Issue 1, pages 3 – 10, 2019.

Parts of the thesis have been presented at the following international conferences:

• ASMDA 2015 - 16th Applied Stochastic Models and Data Analysis International Conference with 4th Demographics 2015 Workshop, Piraeus, Greece, June 30 – July 4, 2015.

• SPLITECH 2017 - 2nd International Multidisciplinary Conference on Computer and Energy Science, Split, Croatia, July 12 – 14, 2017.

• EMC+SIPI 2017 - IEEE International Symposium on Electromagnetic Compatibility, Signal and Power Integrity, Washington DC, USA, August 7 – 11, 2017.

• SPAS 2017 - International Conference on Stochastic Processes and Algebraic Structures, Västerås, Sweden, October 4 – 6, 2017.

• SMTDA 2018 - 5th Stochastic Modelling Techniques and Data Analysis International Conference, Chania, Crete, Greece, June 12 – 15, 2018.

• IWAP 2018 - 9th International Workshop on Applied Probability, Budapest, Hungary, June 18–21, 2018.

Summaries of papers A-I with a brief description of the thesis author's contributions to each paper can be found in Section 1.7.


Chapter 1

Introduction

This chapter is partially based on Papers D, E, H, and I

Paper D Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic Structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper E Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Rančić (Eds), Springer International Publishing, pages 151–176, 2016.

Paper H Karl Lundengård, Milica Rančić and Sergei Silvestrov. Modelling mortality rates using power-exponential functions. Submitted to journal, 2019.

Paper I Andromachi Boulougari, Karl Lundengård, Milica Rančić, Sergei Silvestrov, Belinda Strass and Samya Suleiman. Application of a power-exponential function based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, Volume 5, Issue 1, pages 3 – 10, 2019.


Two topics are discussed in this thesis, finding the extreme points of the Vandermonde determinant and phenomenological modelling using power-exponential functions. Several of the methods and approaches that are discussed are also applied to modelling of electrical current for use in electromagnetic compatibility, or to modelling of mortality rate of humans for actuarial or demographical purposes. The topics are related since the extreme points of the Vandermonde determinant are relevant for certain curve fitting problems that can appear in the construction of the phenomenological models. An overview of the major relations between the different parts of the thesis is illustrated in Figure 1.1. The relations are of many kinds, common definitions and dependent results, conceptual connections as well as similarities in proof techniques and problem formulations.

This thesis is based on the nine papers listed on pages 13–14. The contents of the papers have been rearranged (and in some cases parts have been omitted) to avoid repetition and improve cohesion, but the original text and structure of the papers have been largely preserved. Significant parts of Chapters 1-3 have also appeared in [180]. If a section is based on a paper this is specified at the beginning of the section and unless otherwise specified any subsections are from the same source. A section that is based on a paper contains text from the paper that is unchanged except for modifications to correct misprints and ensure consistency within the thesis.

Chapter 1 introduces concepts used in later chapters. The Vandermonde matrix, its history, applications, generalizations and some of its properties are introduced in Section 1.1. Section 1.2 discusses a few different approaches to curve fitting. Section 1.3 discusses a few methods for evaluating the result. Basic optimal design is discussed in Section 1.4. Sections 1.5 and 1.6 introduce electromagnetic compatibility and mortality rate modelling.

Chapter 2 discusses the optimisation of the Vandermonde determinant over various surfaces. First the extreme points on a few different surfaces in three dimensions are examined, see Section 2.1. In Section 2.2 the determinant is optimised on the sphere in higher dimensions and some results for surfaces defined by a univariate polynomial are discussed in Section 2.3.

Chapter 3 discusses fitting a piecewise non-linear model to data. The particular model is introduced in Section 3.1 and a general framework for fitting it to data using the Marquardt least squares method is described in Sections 3.2.1–3.2.5. The framework is then applied to lightning discharge currents in Section 3.2.6. An alternate curve fitting method based on D-optimal interpolation (found analogously to the results in Section 2.2) is described and applied to electrostatic discharge currents in Section 3.3.

Chapter 4 compares several different mathematical models of mortality rate for humans. The comparison is done by fitting the models to central mortality rates from several different countries and then analysing how well the model fits and what happens when the results of the fitting are used for mortality rate forecasting (using the so called Lee–Carter method).


Figure 1.1: Illustration of the most significant connections in the thesis. [Diagram showing three groups of linked boxes: curve fitting (linear interpolation, the least squares method, linear and non-linear least squares fitting, the Marquardt least squares method and D-optimal design; Sections 1.2.1–1.2.6 and 1.4); extreme points of the Vandermonde determinant (the Vandermonde matrix, extreme points on various surfaces in 3D, optimization on a sphere, optimization on a surface defined by a univariate polynomial and interpolation on a D-optimal design; Sections 1.1, 2.1–2.3 and 3.3); and phenomenological modelling with power-exponential functions (the power-exponential function, electromagnetic compatibility, evaluation of curve fit, the analytically extended function, mortality rate modelling, lightning discharge current modelling, mortality rate models fitted to data and applied to forecasting; Sections 1.3, 1.5, 1.6, 3.1, 3.2, 4.1 and 4.5).]


1.1 The Vandermonde matrix

The Vandermonde matrix is a well-known matrix with a very special form that appears in many different circumstances, a few examples are polynomial interpolation (see Sections 1.2.1 and 1.2.2), least squares curve fitting (see Section 1.2.3), optimal experiment design (see Section 1.4), construction of error-detecting and error-correcting codes (see [31, 124, 242] as well as more recent work such as [28]), determining if a market with a finite set of traded assets is complete [62], calculation of the discrete Fourier transform [241] and related transforms such as the fractional discrete Fourier transform [215], the quantum Fourier transform [70], and the Vandermonde transform [11, 12], solving systems of differential equations with constant coefficients [213], various problems in mathematical physics [283], nuclear physics [51], and quantum physics [249, 271], systems of Coulombian interactions (see Section 1.1.6) and describing properties of the matrix of stationary stochastic processes [158] and in various places in random matrix theory (see Sections 1.1.7 and 2.3.7).

In this section we will review some of the basic properties of the Vandermonde matrix, starting with its definition.

Definition 1.1. A Vandermonde matrix is an $n \times m$ matrix of the form
$$V_{mn}(\mathbf{x}_n) = \left[x_j^{i-1}\right]_{i,j}^{m,n} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{m-1} & x_2^{m-1} & \cdots & x_n^{m-1} \end{bmatrix} \tag{1}$$
where $x_i \in \mathbb{C}$, $i = 1, \ldots, n$. If the matrix is square, $n = m$, the notation $V_n = V_{nm}$ will be used.

Remark 1.1. Note that in the literature the term Vandermonde matrix is often used for the transpose of the matrix given above.
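As a quick illustration of Definition 1.1 (an added sketch, not part of the original text), the matrix in equation (1) can be generated with numpy. Note that numpy.vander places the points along rows and by default uses decreasing powers, so the increasing=True flag and a transpose are needed to match the convention used here; the helper name vandermonde is ad hoc.

import numpy as np

def vandermonde(x, m):
    # m x n matrix with entries V[i, j] = x[j]**i, i = 0, ..., m-1, as in equation (1)
    return np.vander(np.asarray(x), N=m, increasing=True).T

x = [2.0, 3.0, 5.0]
print(vandermonde(x, 3))
# rows: [1, 1, 1], [2, 3, 5], [4, 9, 25]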

1.1.1 Who was Vandermonde?

The matrix is named after Alexandre Théophile Vandermonde (1735–1796) who had a varied career that began with law studies and some success as a concert violinist, transitioned into work in science and mathematics in the beginning of the 1770s that gradually turned into administrative and leadership positions at various Parisian institutions as well as work in politics and economics in the end of the 1780s [86]. His entire mathematical career consisted of four published papers, first presented to the French Academy of Sciences in 1770 and 1771 and published a few years later.

The first paper, Mémoire sur la résolution des équations [279], discusses some properties of the roots of polynomial equations, more specifically formulas for the sum of the roots and a sum of symmetric functions of the powers of the roots.


This paper has been mentioned as important since it contains some of the fundamental ideas of group theory (see for instance [168]), but generally this work is overshadowed by the works of the contemporary Joseph Louis Lagrange (1736–1813) [166]. He also notices the equality $a^2b + b^2c + ac^2 - a^2c - ab^2 - bc^2 = (a - b)(a - c)(b - c)$, which is a special case of the formula for the determinant of the Vandermonde matrix, but this connection is not discussed in the paper.

The second paper, Remarques sur des problèmes de situation [280], discusses the problem of the knight's tour (what sequence of moves allows a knight to visit all squares on a chessboard exactly once). This paper is considered the first mathematical paper that uses the basic ideas of what is now called knot theory [237].

The third paper, Mémoire sur des irrationnelles de différents ordres avec une application au cercle [281], is a paper on combinatorics and the most well-known result from the paper is the Chu–Vandermonde identity,
$$\sum_{k=0}^{n} \left(\prod_{j=1}^{k} \frac{r+1-j}{j}\right)\left(\prod_{j=1}^{n-k} \frac{s+1-j}{j}\right) = \prod_{j=1}^{n} \frac{r+s+1-j}{j},$$
where $r, s \in \mathbb{R}$ and $n \in \mathbb{Z}$. The identity was first found by Chu Shih-Chieh (ca 1260 – ca 1320, traditional Chinese: 朱世傑) in 1303 in The precious mirror of the four elements (四元玉鑒) and was rediscovered (apparently independently) by Vandermonde [8, 223].

In the fourth paper, Mémoire sur l'élimination [282], Vandermonde discusses some ideas for what we today call determinants, which are functions that can tell us if a linear equation system has a unique solution or not. The paper predates the modern definitions of determinants but Vandermonde discusses a general method for solving linear equation systems using alternating functions, which has a strong relation to determinants. He also notices that exchanging exponents for indices in a class of expressions from his first paper will give a class of expressions that he discusses in his fourth paper [300]. This relation is mirrored in the relationship between the determinant of the Vandermonde matrix and the determinant of a general matrix described in Theorem 1.3.

While Vandermonde's papers can be said to contain many important ideas they do not bring any of them to maturity and he is therefore usually considered a minor scientist and mathematician compared to well-known contemporary mathematicians such as Étienne Bézout (1730–1783) and Pierre-Simon de Laplace (1749–1827) or scientists such as the chemist Antoine Lavoisier (1743–1794) that he worked with for some time after his mathematical career. The Vandermonde matrix does not appear in any of Vandermonde's published works, which is not surprising considering that the modern matrix concept did not really take shape until almost a hundred years later in the works of Sylvester and Cayley [43, 268].


It is therefore strange that the Vandermonde matrix was named after him, a thorough discussion on this can be found in [300], but a possible reason is that the simple formula for the determinant that Vandermonde briefly discusses in his fourth paper can be generalized to a Vandermonde matrix of any size. One of the main reasons that the Vandermonde matrix has become known is that it has an exceptionally simple expression for its determinant that in turn has a surprisingly fundamental relation to the determinant of a general matrix. We will be taking a closer look at the determinant of the Vandermonde matrix and related matrices several times in this thesis so the next section will introduce it and some of its properties.

1.1.2 The Vandermonde determinant

Often it is not the Vandermonde matrix itself that is useful, instead it is the multivariate polynomial given by its determinant that is examined and used. The determinant of the Vandermonde matrix is usually called the Vandermonde determinant (or Vandermondian [283]) and can be written using an exceptionally simple formula. But before we discuss the Vandermonde determinant we will discuss the general determinant.

Definition 1.2. The determinant is a function of square matrices over a field $\mathbb{F}$ to the field $\mathbb{F}$, $\det : M_{n \times n}(\mathbb{F}) \to \mathbb{F}$, such that if we consider the determinant as a function of the columns, $\det(M) = \det(M_{\cdot,1}, M_{\cdot,2}, \ldots, M_{\cdot,n})$, of the matrix the determinant must have the following properties

• The determinant must be multilinear,
$$\det(M_{\cdot,1}, \ldots, aM_{\cdot,k} + bN_{\cdot,k}, \ldots, M_{\cdot,n}) = a\det(M_{\cdot,1}, \ldots, M_{\cdot,k}, \ldots, M_{\cdot,n}) + b\det(M_{\cdot,1}, \ldots, N_{\cdot,k}, \ldots, M_{\cdot,n}).$$

• The determinant must be alternating, that is if $M_{\cdot,i} = M_{\cdot,j}$ for some $i \neq j$ then $\det(M) = 0$.

• If $I$ is the identity matrix then $\det(I) = 1$.

Remark 1.2. Defining the multilinear and alternating properties from the rows of the matrix will give the same determinant. The name of the alternating property comes from the fact that it combined with multilinearity implies that switching places between two columns changes the sign of the determinant. This definition of the determinant is quite abstract but it is sufficient to define a unique function.


Theorem 1.1 (Leibniz formula for determinants). A standard result from linear algebra says that the determinant is unique and that it is given by the following formula
$$\det(M) = \sum_{\sigma \in S_n} (-1)^{I(\sigma)} \prod_{i=1}^{n} m_{i,\sigma(i)} \tag{2}$$
where $S_n$ is the set of all permutations of the set $\{1, 2, \ldots, n\}$, that is all lists that contain the numbers $1, 2, \ldots, n$ exactly once, if $\sigma$ is a permutation then $\sigma(i)$ is the $i$th element of that permutation, and $I(\sigma)$ is the number of inversions of $\sigma$, so that $(-1)^{I(\sigma)}$ is the sign of the permutation.

Remark 1.3. Often formula (2) is used immediately as the definition of the determinant of a matrix, see for instance [9]. The formula is usually attributed to Gottfried Wilhelm Leibniz (1646–1716), probably due to a letter that he wrote to Guillaume de l'Hôpital (1661–1704) in 1693 where he describes a method of solving linear equation systems that is closely related to Cramer's rule [218], the particular letter was published in [173] and a translation can be found in [263].

The determinant has several uses and interpretations, for example

• If $\det(M) \neq 0$ then the vectors corresponding to the columns (or rows) are linearly independent. Compare this to the properties of the matrix described on page 27.

• If the columns (or rows) of $M$ are interpreted as sides defining an $n$-dimensional parallelepiped the absolute value of $\det(M)$ will give the volume of this parallelepiped. Compare this to the interpretation of D-optimality on page 58. The sign of the determinant is also important when considering the orientation of the surface which is highly relevant in geometric algebra and integration over several variables, see [123, 246] for examples in geometric algebra, physics and analysis.

We will now discuss the Vandermonde determinant specifically.
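Before doing so, formula (2) can be checked directly in code (an added illustration, not from the thesis): the permutation sum is evaluated by brute force and compared with numpy's determinant routine. The helper leibniz_det is ad hoc and only practical for small n.

import numpy as np
from itertools import permutations
from math import prod

def leibniz_det(M):
    # Evaluate formula (2): sum over all permutations of sign times product of entries
    n = len(M)
    total = 0.0
    for sigma in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if sigma[i] > sigma[j])          # I(sigma)
        total += (-1) ** inversions * prod(M[i][sigma[i]] for i in range(n))
    return total

M = np.random.rand(4, 4)
print(np.isclose(leibniz_det(M), np.linalg.det(M)))       # expect True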

Theorem 1.2. The Vandermonde determinant, $v_n(x_1, \ldots, x_n)$, is given by
$$v_n(x_1, \ldots, x_n) = \det(V_n(x_1, \ldots, x_n)) = \prod_{1 \leq i < j \leq n} (x_j - x_i).$$


Proof of Theorem 1.2. There are many versions of this proof, see for example [18, 36, 42, 126], with focus on different aspects of the proof. Here we will provide a fairly concise version that still makes all the steps of the proof clear.

We start by only considering one of the variables $x_k$, which gives a single variable function $v_n(x_k)$. From the general expression for the determinant, expression (2), it is clear that $v_n(x_k)$ must be a polynomial of degree $n-1$ in $x_k$. We also know that if we let $x_k = x_i$ for any $1 \leq i \leq n$, $i \neq k$, the determinant will be equal to zero since the corresponding matrix will have two identical columns. Thus, since $v_n(x_i) = 0$, we can write
$$v_n(x_k) = P(x_k) \prod_{\substack{i=1 \\ i \neq k}}^{n} (x_k - x_i)$$
where $P(x_k)$ is a polynomial. If we repeat this argument for all the variables, and ensure that no roots appear twice in the factorization, we get
$$\begin{aligned} v_n(x_1, \ldots, x_n) &= P_n(x_1, \ldots, x_n) \prod_{i=1}^{n-1} (x_n - x_i) \\ &= P_{n-1}(x_1, \ldots, x_n) \prod_{i=1}^{n-2} (x_{n-1} - x_i) \prod_{i=1}^{n-1} (x_n - x_i) \\ &= P_0(x_1, \ldots, x_n)(x_2 - x_1)(x_3 - x_2)(x_3 - x_1) \cdots \prod_{i=1}^{n-1} (x_n - x_i) \end{aligned}$$
and since in this factorization each $x_k$ appears as a root $n-1$ times, matching the degree of $v_n$ in each variable, we can conclude that
$$v_n(x_1, \ldots, x_n) = \det(V_n(x_1, \ldots, x_n)) = C \prod_{1 \leq i < j \leq n} (x_j - x_i)$$
where $C$ is a constant; comparing the coefficient of the monomial $x_2 x_3^2 \cdots x_n^{n-1}$ on both sides shows that $C = 1$.
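Theorem 1.2 is also easy to verify numerically. The sketch below (added here, not part of the thesis) compares the product of differences with the determinant of the matrix from Definition 1.1 at randomly chosen points; vandermonde_det_product is an ad hoc helper.

import numpy as np

def vandermonde_det_product(x):
    # Theorem 1.2: v_n(x_1, ..., x_n) = prod_{i < j} (x_j - x_i)
    n = len(x)
    return np.prod([x[j] - x[i] for i in range(n) for j in range(i + 1, n)])

x = np.random.rand(5)
V = np.vander(x, increasing=True).T     # V[i, j] = x[j]**i as in Definition 1.1
print(np.isclose(vandermonde_det_product(x), np.linalg.det(V)))   # expect True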

Theorem 1.3. There is a relationship between the exponents of the expanded Vandermonde determinant and the indices in the expression for a general determinant, more specifically

$$\left(\prod_{i=1}^{n} x_i\right) v_n(x_1, \ldots, x_n) = \left(\prod_{i=1}^{n} x_i\right) \prod_{1 \leq i < j \leq n} (x_j - x_i) \tag{3}$$


Proof. We will prove this theorem by showing that replacing exponents with indices will give a function that by Definition 1.2 is a determinant. In Definition 1.2 we interpreted the determinant as a function of the columns of the matrix; for the Vandermonde determinant this corresponds to a function of the $x_i$ since they define the columns. Here we will interpret each part of Definition 1.2 as a statement about the $x_i$ and then show how it is implied by the Vandermonde determinant.

• Alternating: The alternating property is easy to interpret in terms of the $x_i$ since if $x_i = x_j$ for some $i \neq j$ then we have two identical columns. Consider the product form of the Vandermonde determinant given in Theorem 1.2. Switching places between $x_i$ and $x_j$ with $i < j$ in the Vandermonde determinant is equal to switching sign in all factors that contain either $x_i$ or $x_j$ together with some $x_k$ with $i < k < j$. There will be $j - i - 1$ factors that contain $x_i$ and satisfy $i < k < j$, $j - i - 1$ factors that contain $x_j$ and satisfy $i < k < j$, and one factor $(x_j - x_i)$. This means that in total we will change sign in $2(j - i) - 1$ factors which means the sign of the whole product will change.

• Multilinearity: If we denote the left hand side in (3) with $w$,
$$w = \left(\prod_{k=1}^{n} x_k\right) v_n(x_1, \ldots, x_n),$$
then multiplying the $k$th column by a scalar can be interpreted as follows
$$M_{\cdot,k} \to aM_{\cdot,k} \quad \Leftrightarrow \quad w = \sum_{i=1}^{n} x_k^i c_i \to \sum_{i=1}^{n} a x_k^i c_i$$
and addition of columns as
$$M_{\cdot,k} \to M_{\cdot,k} + N_{\cdot,k} \quad \Leftrightarrow \quad w = \sum_{i=1}^{n} x_k^i c_i \to \sum_{i=1}^{n} (x_k^i + y_k^i) c_i$$
and multilinearity follows immediately from this.

• $\det(I) = 1$: For the identity matrix we have
$$x_{i,j} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
which for the expanded Vandermonde determinant corresponds to the transformation
$$x_i^j \to \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$


when expanding the Vandermonde determinant we get
$$\begin{aligned} v_n(x_1, \ldots, x_n) &= v_{n-1}(x_1, \ldots, x_{n-1}) \prod_{i=1}^{n-1} (x_n - x_i) \\ &= x_n^{n-1} v_{n-1}(x_1, \ldots, x_{n-1}) + P(n) \\ &= x_n^{n-1} v_{n-2}(x_1, \ldots, x_{n-2}) \prod_{i=1}^{n-2} (x_{n-1} - x_i) + P(n) \\ &= x_n^{n-1} x_{n-1}^{n-2} v_{n-2}(x_1, \ldots, x_{n-2}) + P(n, n-1) \\ &= \prod_{k=1}^{n} x_k^{k-1} + P(n, n-1, \ldots, 1) \end{aligned}$$
where $P(I)$, $I \subset \mathbb{Z}_{>0}$, does not contain any terms of the form $x_k^{k-1}$ for all $k \in I$. Thus applying the transformation corresponding to the identity matrix we get

$$\left(\prod_{i=1}^{n} x_i\right) v_n(x_1, \ldots, x_n) = \prod_{k=1}^{n} x_k^k + \left(\prod_{k=1}^{n} x_k\right) P(n, \ldots, 1) \to 1 + 0 = 1.$$

Thus if we take the right hand side in equation (3) and exchange exponents for indices we get a determinant by Definition 1.2 and since the determinant is unique by Theorem 1.1 and $x_{i,j} = x_i^j$ in the Vandermonde matrix this must be equal to

$$\left(\prod_{i=1}^{n} x_i\right) v_n(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} (-1)^{I(\sigma)} \prod_{i=1}^{n} x_{\sigma(i)}^{i}.$$

1.1.3 Inverse of the Vandermonde matrix

The inverse of the Vandermonde matrix has been known for a long time, especially since the solution to a Lagrange interpolation problem (see Section 1.2.1) gives the inverse indirectly. Here we will only give a short overview of the work on expressing the inverse as an explicit matrix. An explicit expression for the inverse matrix has been known since at least the end of the 1950s, see [199].

Theorem 1.4. The elements of the inverse of an $n$-dimensional Vandermonde matrix $V$ can be calculated by
$$\left[V_n^{-1}\right]_{ij} = \frac{(-1)^{j-1}\, \sigma_{n-j,i}}{\displaystyle\prod_{\substack{k=1 \\ k \neq i}}^{n} (x_k - x_i)} \tag{4}$$


where $\sigma_{j,i}$ is the $j$th elementary symmetric polynomial with the variable $x_i$ set to zero,
$$\sigma_{j,i} = \sum_{1 \leq m_1 < \ldots < m_j \leq n} \prod_{k=1}^{j} x_{m_k}(1 - \delta_{m_k,i}), \qquad \delta_{a,b} = \begin{cases} 1, & a = b \\ 0, & a \neq b \end{cases} \tag{5}$$

We will not give the proof of this theorem here, but the general outline of a proof will be given in Section 1.2.1. In the literature there are many cases where the inverse is instead written as a product of several simpler matrices, usually triangular or diagonal [214, 225, 226, 277]. There is also a lot of literature that takes a more algorithmic approach and tries to find fast ways of computing the elements, classical examples include the Parker–Traub algorithm [274] and the Björck–Pereyra algorithm [23], and more recent results can be found in [84].
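Formulas (4)–(5) translate directly into code. The sketch below (an added illustration, not from the thesis; the helpers elem_sym and vandermonde_inverse are ad hoc names) builds the inverse from elementary symmetric polynomials of the nodes with $x_i$ excluded and compares the result with a general-purpose matrix inverse.

import numpy as np
from itertools import combinations
from math import prod

def elem_sym(vals, j):
    # j-th elementary symmetric polynomial of the numbers in vals (e_0 = 1)
    return 1.0 if j == 0 else sum(prod(c) for c in combinations(vals, j))

def vandermonde_inverse(x):
    # Theorem 1.4 with the 1-based indices i, j translated to 0-based storage
    n = len(x)
    Vinv = np.empty((n, n))
    for i in range(1, n + 1):
        others = [x[k - 1] for k in range(1, n + 1) if k != i]   # x_k with k != i
        denom = prod(xk - x[i - 1] for xk in others)             # prod (x_k - x_i)
        for j in range(1, n + 1):
            Vinv[i - 1, j - 1] = (-1) ** (j - 1) * elem_sym(others, n - j) / denom
    return Vinv

x = np.array([0.5, 1.3, 2.0, 3.7])
V = np.vander(x, increasing=True).T           # V[i, j] = x[j]**i
print(np.allclose(vandermonde_inverse(x), np.linalg.inv(V)))     # expect True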

1.1.4 The alternant matrix

Many generalizations of the Vandermonde matrix have been proposed and studied in the literature. An early generalization is the alternant matrix, which is a matrix that exchanges the powers in the Vandermonde matrix with other functions [219].

Definition 1.3. An alternant matrix is a matrix of the form
$$A_{mn}(\mathbf{f}_m; \mathbf{x}_n) = \left[f_i(x_j)\right]_{i,j}^{m,n} = \begin{bmatrix} f_1(x_1) & f_1(x_2) & \cdots & f_1(x_n) \\ f_2(x_1) & f_2(x_2) & \cdots & f_2(x_n) \\ \vdots & \vdots & \ddots & \vdots \\ f_m(x_1) & f_m(x_2) & \cdots & f_m(x_n) \end{bmatrix} \tag{6}$$
where $f_i : \mathbb{F} \to \mathbb{F}$ and $\mathbb{F}$ is a field. If the matrix is square, $n = m$, the notation $A_n = A_{nm}$ will be used.

Remark 1.4. Sometimes the alternant matrix is used as an alternative name for the Vandermonde matrix or the Vandermonde matrix multiplied by a diagonal matrix [276].

Interpolation and curve fitting

Just like the Vandermonde matrix can be used for polynomial interpolation, the alternant matrix can be used to describe interpolation with other sets of functions, see Sections 1.2.1 and 1.2.2, as well as approximate curve fitting, for example using the least squares method described in Section 1.2.3.
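To make the connection to interpolation concrete (an added sketch, not from the thesis), the alternant matrix of an arbitrary set of basis functions can serve as the collocation matrix of an interpolation problem: an interpolant $p(x) = \sum_i c_i f_i(x)$ satisfying $p(x_j) = y_j$ is found by solving $A^\top c = y$. The basis functions and data below are chosen arbitrarily for illustration.

import numpy as np

def alternant(funcs, points):
    # A[i, j] = f_i(x_j) as in Definition 1.3
    return np.array([[f(x) for x in points] for f in funcs])

funcs = [lambda t: 1.0, np.sin, np.exp]       # illustrative basis {1, sin, exp}
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([1.0, 3.0, -2.0])

A = alternant(funcs, xs)
c = np.linalg.solve(A.T, ys)                  # coefficients of the interpolant
p = lambda t: sum(ci * f(t) for ci, f in zip(c, funcs))
print(np.allclose([p(t) for t in xs], ys))    # expect True: interpolant hits the data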


Alternant codes

As mentioned on page 19 there are several different error-detecting and error-correcting codes that can be described using the Vandermonde matrix. These and some related codes can also be categorized as alternant codes, a term introduced in [121]. For a survey on these codes see [295].

Jacobian matrix

One of the most well-known examples of an alternant matrix is the Jacobian matrix. Let $f : \mathbb{F}^n \to \mathbb{F}^n$ be a vector-valued function that is $n$ times differentiable with respect to each variable, then the Jacobian matrix is the matrix $J$ given by
$$J = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_n}{\partial x_n} \end{bmatrix}$$
where $\mathbf{y} = f(\mathbf{x})$. The most common application of the Jacobian matrix is to use its determinant to describe how volume elements are deformed when changing variables in multivariate calculus [246]. The numerous applications and generalizations that follow from this alone are too numerous to list so here we will only note that it holds a central role in many methods for multivariate optimization, such as the Marquardt least squares method described in Section 1.2.6.
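A small symbolic example (added for illustration): for the change to polar coordinates the determinant of the Jacobian matrix is the familiar factor r. Note that sympy's jacobian method uses the transposed convention relative to the matrix displayed above, but the determinant is unaffected by transposition.

import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

J = sp.Matrix([x, y]).jacobian([r, theta])   # entries are partial derivatives of (x, y) w.r.t. (r, theta)
print(sp.simplify(J.det()))                  # r, the volume scaling factor in dx dy = r dr dtheta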

Wronskian matrix

If $\mathbf{f}_n = (f_1, \ldots, f_n)$, $f_i = \frac{d^{i-1}}{dx^{i-1}}$, and $\mathbf{g}_n = (g_1, \ldots, g_n)$, $g_i \in C^{n-1}[\mathbb{C}]$, then the alternant matrix $A_n(\mathbf{f}_n; \mathbf{g}_n)$ will be the Wronskian matrix. The Wronskian matrix has a long history [125] and is commonly used to test if a set of functions are linearly independent as well as for finding solutions to ordinary differential equations [101]. If the determinant of the Wronskian matrix is non-zero then the functions are linearly independent, see [27, 32], but proving linear dependence requires further conditions, see [25, 26, 230, 231, 293].

A classical application of the Wronskian is confirming that a set of solutions to a linear differential equation are linearly independent, or, if $n-1$ linearly independent solutions are known, constructing the remaining linearly independent solution using Abel's identity (for $n = 2$) or a generalisation of it [34].

If $L_i$ is a linear partial differential operator of order $i$, then the alternant matrix $A_n(\mathbf{L}_n; \mathbf{g}_n)$, where $\mathbf{L}_n = (L_1, \ldots, L_n)$, is the generalized Wronskian matrix [227], which has been used in for example diophantine geometry [82, 244] and for solving Korteweg–de Vries equations, see [197] and the references therein.


The generalized Wronskian matrix has similar properties with respect to the linear dependence of the functions it is created from as the standard Wronskian [294]. Both the Wronskian and the generalized Wronskian are also useful in algebraic geometry, see [101] for several examples.
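As an added illustration of how the Wronskian matrix is used (a sketch, not from the thesis; the helper wronskian_matrix is an ad hoc name), the alternant matrix with rows given by successive derivatives can be built directly in sympy, and a non-zero determinant certifies that the chosen functions are linearly independent.

import sympy as sp

x = sp.symbols('x')

def wronskian_matrix(funcs):
    # Alternant matrix with f_i = d^(i-1)/dx^(i-1) applied to the functions g_j
    n = len(funcs)
    return sp.Matrix(n, n, lambda i, j: sp.diff(funcs[j], x, i))

g = [sp.exp(x), sp.sin(x), sp.cos(x)]
W = wronskian_matrix(g)
print(sp.simplify(W.det()))   # -2*exp(x): non-zero for all x, so the functions are linearly independent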

Bell matrix

Alternant matrices can be used to convert function composition into matrix multiplication. By letting $D_i = \frac{d^{i-1}}{dx^{i-1}}$ and $g_j(x) = (f(x))^j$, where $f$ is infinitely differentiable, the alternant matrix $B[f] = A_n(\mathbf{D}_n, \mathbf{g}_n)$ is called a Bell matrix (its transpose is known as the Carleman matrix). Some authors, for instance [159], refer to Bell matrices as Jabotinsky matrices due to a special case of Bell matrices considered in [137].

That Bell matrices convert function composition into matrix multiplication can be seen by noting that the power series expansion of the $j$th power of $f$ can be written as

$$(f(x))^j = \sum_{i=1}^{\infty} B[f]_{ij}\, x^i$$
and from this equality it follows that $B[f \circ g] = B[g]B[f]$. This is the basic property behind a popular technique called Carleman linearisation or Carleman embedding that has seen wide use in the theory of non-linear dynamical systems. The literature on the subject is vast but a systematic introduction is offered in [165].
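The composition property can be verified on truncated power series (an added sympy sketch, not part of the original text; the functions f and g and the truncation order N are arbitrary choices). Since f(0) = g(0) = 0, the N x N truncations satisfy the identity exactly.

import sympy as sp

x = sp.symbols('x')
N = 5   # truncation order

def bell_matrix(f, N):
    # B[f][i-1, j-1] = coefficient of x**i in (f(x))**j for i, j = 1, ..., N
    return sp.Matrix(N, N, lambda i, j:
                     sp.series(f**(j + 1), x, 0, N + 1).removeO().coeff(x, i + 1))

f = sp.sin(x)
g = x * sp.exp(x)
Bf, Bg, Bfg = bell_matrix(f, N), bell_matrix(g, N), bell_matrix(f.subs(x, g), N)
print((Bfg - Bg * Bf).is_zero_matrix)   # expect True: B[f o g] = B[g] B[f]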

Moore matrix

When working in a finite field with prime characteristic $p$ an analogue of the Vandermonde and Wronskian matrix can be constructed by taking an alternant matrix where the rows are given by powers of the Frobenius automorphism, $F(\omega) = \omega^p$. This matrix is called the Moore matrix and is named after its originator E. H. Moore, who also calculated its determinant,

$$\begin{vmatrix} \omega_1 & \cdots & \omega_n \\ \omega_1^p & \cdots & \omega_n^p \\ \vdots & \ddots & \vdots \\ \omega_1^{p^{n-1}} & \cdots & \omega_n^{p^{n-1}} \end{vmatrix} = \prod_{i=1}^{n} \prod_{k_{i-1}=0}^{p-1} \cdots \prod_{k_1=0}^{p-1} \left(\omega_i + k_{i-1}\omega_{i-1} + \ldots + k_1\omega_1\right) \pmod{p},$$
and showed that if this determinant is not equal to zero then $\omega_1, \ldots, \omega_n$ are linearly independent [211]. There are several uses for the determinant of the Moore matrix in function field arithmetic, see for instance [113], a classical example is finding the modular invariants of the general linear group over a finite field [72, 224]. The determinant also plays an important role in the theory of Drinfeld modules [221].


1.1.5 The generalized Vandermonde matrix

There are several types of matrices (or determinants) that have been referred to as generalized Vandermonde matrices, for example the confluent Vandermonde matrix is sometimes referred to as the generalized Vandermonde matrix [149, 150, 175, 194, 265], this matrix and its role in interpolation problems is briefly described on page 40. Other examples include modified versions of confluent Vandermonde matrices [91], as well as matrices with elements given by multivariate monomials of increasing multidegree [39], or similarly over the algebraic closure of a field [61], and matrices with elements given by multivariate polynomials with univariate terms [283].

In this thesis we call the alternant matrix $A_{mn}(x^{\alpha_1}, \ldots, x^{\alpha_m};\, x_1, \ldots, x_n)$ the generalized Vandermonde matrix.

Definition 1.4. A generalized Vandermonde matrix is an $n \times m$ matrix of the form
$$G_{mn}(\mathbf{x}_n) = \left[x_j^{\alpha_i}\right]_{i,j}^{m,n} = \begin{bmatrix} x_1^{\alpha_1} & x_2^{\alpha_1} & \cdots & x_n^{\alpha_1} \\ x_1^{\alpha_2} & x_2^{\alpha_2} & \cdots & x_n^{\alpha_2} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{\alpha_m} & x_2^{\alpha_m} & \cdots & x_n^{\alpha_m} \end{bmatrix} \tag{7}$$
where $x_i \in \mathbb{C}$, $\alpha_i \in \mathbb{C}$, $i = 1, \ldots, n$. If the matrix is square, $n = m$, the notation $G_n = G_{nm}$ will be used.

This name has been used for quite some time, see [120] for instance. The main reason to study this matrix seems to be its connection to Schur polynomials, see below, and thus the research on the matrix is primarily focused on its determinant. Many of the results are algorithmic in nature [47, 66–68, 157] but there are also more algebraic examinations [85, 97, 250, 296]. There are several examples where the determinant of generalized Vandermonde matrices is interesting or useful.

Schur polynomials

Given an integer partition $\lambda = (\lambda_1, \ldots, \lambda_n)$, that is $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n \geq 0$ and each $\lambda_i \in \mathbb{N}$, we can define
$$a_{(\lambda_1+n-1,\, \lambda_2+n-2,\, \ldots,\, \lambda_n)}(x_1, \ldots, x_n) = \det(G_n(\lambda_1+n-1, \lambda_2+n-2, \ldots, \lambda_n;\, x_1, \ldots, x_n)).$$
Note that $a_{(\lambda_1+n-1,\, \lambda_2+n-2,\, \ldots,\, \lambda_n)}(x_1, \ldots, x_n)$ is a polynomial that always has $a_{(n-1,n-2,\ldots,0)}(x_1, \ldots, x_n) = v_n(x_1, \ldots, x_n)$ as a factor. The polynomials given by expressions of the form
$$s_\lambda(x_1, \ldots, x_n) = \frac{a_{(\lambda_1+n-1,\, \lambda_2+n-2,\, \ldots,\, \lambda_n)}(x_1, \ldots, x_n)}{a_{(n-1,n-2,\ldots,0)}(x_1, \ldots, x_n)}$$


are called the Schur functions or Schur polynomials and were introduced by Cauchy [42] but named after Issai Schur (1875–1941), who showed that they are highly useful in invariant theory and representation theory. For instance they can be used to determine the character of conjugacy classes of representations of the symmetric group [98]. They have also been used in other areas, for instance to describe the generating function of many classes of plane partitions, see for instance [36] for several examples. The literature on Schur polynomials is vast and so are the applications so there will be no attempt to summarise them here.
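As an added illustration (not from the thesis), the bialternant expressions above can be evaluated symbolically: for n = 3 and the partition λ = (2, 1, 0) the shifted exponents are (4, 2, 0), and the ratio of the two generalized Vandermonde determinants simplifies to a symmetric polynomial, the Schur polynomial s_(2,1). The helper gen_vandermonde_det is an ad hoc name.

import sympy as sp

xs = sp.symbols('x1 x2 x3')

def gen_vandermonde_det(exponents, xs):
    # det of G with entries G[i, j] = xs[j] ** exponents[i]
    n = len(xs)
    return sp.Matrix(n, n, lambda i, j: xs[j] ** exponents[i]).det()

lam = (2, 1, 0)                                          # partition, n = 3
shifted = tuple(l + 2 - i for i, l in enumerate(lam))    # (lambda_i + n - i) = (4, 2, 0)
s = sp.cancel(gen_vandermonde_det(shifted, xs) / gen_vandermonde_det((2, 1, 0), xs))
print(sp.expand(s))
# the Schur polynomial s_(2,1): sum of x_i**2*x_j over i != j, plus 2*x1*x2*x3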

Integration of an exponential function over a unitary group

If we let $U(n)$ be the $n$-dimensional unitary group and $dU$ a Haar measure normalised to 1, then the Harish-Chandra–Itzykson–Zuber integral formula [116, 136] says that if $A$ and $B$ are Hermitian matrices with eigenvalues $\lambda_1(A) \leq \ldots \leq \lambda_n(A)$ and $\lambda_1(B) \leq \ldots \leq \lambda_n(B)$ then

$$\int_{U(n)} e^{t\,\operatorname{tr}(AUBU^*)}\, dU = \frac{\det\left[\exp(t\lambda_j(A)\lambda_k(B))\right]_{j,k}^{n,n}}{t^{\frac{n(n-1)}{2}}\, v_n(\lambda(A))\, v_n(\lambda(B))} \prod_{i=1}^{n-1} i! \tag{8}$$
where $v_n$ is the determinant of the Vandermonde matrix. If $t = 1$ and $A$ and $B$ are chosen as diagonal matrices
$$A_{ij} = \begin{cases} a_i & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \qquad B_{ij} = \begin{cases} b_i & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$
then formula (8) reduces to an expression involving the determinants of a generalized Vandermonde matrix and two Vandermonde matrices,

$$\int_{U(n)} e^{\operatorname{tr}(AUBU^*)}\, dU = \left(\prod_{i=1}^{n-1} i!\right) \frac{\begin{vmatrix} e^{a_1b_1} & e^{a_1b_2} & \cdots & e^{a_1b_n} \\ e^{a_2b_1} & e^{a_2b_2} & \cdots & e^{a_2b_n} \\ \vdots & \vdots & \ddots & \vdots \\ e^{a_nb_1} & e^{a_nb_2} & \cdots & e^{a_nb_n} \end{vmatrix}}{v_n(a_1, \ldots, a_n)\, v_n(b_1, \ldots, b_n)}.$$
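The reduced formula can be checked by Monte Carlo integration over the unitary group (an added numerical sketch; it assumes scipy's Haar-distributed unitary_group sampler and uses the prefactor from formula (8) with t = 1; the eigenvalues below are arbitrary). Agreement is only up to sampling error.

import numpy as np
from scipy.stats import unitary_group
from math import factorial

n = 3
a = np.array([0.3, 0.9, 1.7])                 # eigenvalues of the diagonal matrix A
b = np.array([0.2, 0.5, 1.1])                 # eigenvalues of the diagonal matrix B
A, B = np.diag(a), np.diag(b)

# Monte Carlo estimate of the integral over U(n) with Haar measure
samples = [np.exp(np.trace(A @ U @ B @ U.conj().T).real)
           for U in unitary_group.rvs(dim=n, size=20000, random_state=0)]
mc = np.mean(samples)

def vdet(x):
    # Vandermonde determinant as the product of differences (Theorem 1.2)
    return np.prod([x[j] - x[i] for i in range(len(x)) for j in range(i + 1, len(x))])

prefactor = np.prod([factorial(i) for i in range(1, n)])
exact = prefactor * np.linalg.det(np.exp(np.outer(a, b))) / (vdet(a) * vdet(b))
print(mc, exact)    # the two values should agree to within Monte Carlo error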

1.1.6 The Vandermonde determinant in systems with Coulombian interactions

Several interesting mathematical problems that feature Vandermonde matrices and the Vandermonde determinant can be described as questions about systems with Coulombian interactions. The name Coulombian interaction comes from Charles-Augustin Coulomb (1736–1806) who is probably most well-known for quantifying the force between two charged particles (what is today known as Coulomb's law) in 1785 [59]. Coulomb's law states that


the force between two charged particles is proportional to the product of the charges and the inverse of the square of the distance between the two charges. When talking about Coulombian interactions in mathematics or mathematical physics it usually refers to a system described by an energy given by
$$H_N(x_1, \ldots, x_N) = \frac{1}{2}\sum_{i \neq j} g(x_i - x_j) + N\sum_{i=1}^{N} V(x_i) \tag{9}$$
where the interaction kernel, $g(x)$, can take a few different forms, more on this later, and $V(x)$ is an external potential that can behave in many different ways. The points $x_i$ usually belong to $\mathbb{R}^d$ (or some subset thereof) but there is also research that involves more general manifolds. A common goal is to minimize this energy or find some other extreme points. There are many areas where this kind of problem, or closely related problems, appear. See the extended version of [255] for a recent review of the field. In this section we will mention a few examples of interesting systems with Coulombian interactions that are connected to the Vandermonde determinant and the properties of the Vandermonde determinant we discuss in this thesis.

Fekete points

In Section 1.2.1 interpolation of a finite number of points using a polynomial will be discussed. When a function is approximated by a polynomial using interpolation the approximation error depends on the chosen interpolation points. The Fekete points are a set of points that provide an almost optimal choice of interpolation points [248] and they are given by maximizing the Vandermonde determinant. Taking the logarithm of the expression for the Vandermonde determinant given in Theorem 1.2 gives

$$\log(v_n(x_1, \ldots, x_n)) = \sum_{1 \leq i < j \leq n} \log(x_j - x_i)$$

and thus $-\frac{1}{2}\log(v_n(x_1, \ldots, x_n))$ gives the same as setting $g(x) = \log(x)$ and $V(x) \equiv 0$ in (9). Finding the Fekete points is also of interest in complexity theory and would help with finding an appropriate starting polynomial for a homotopy algorithm for realizing the Fundamental Theorem of Algebra [258, 262]. In Chapter 2 we will discuss how to find the maximum points of the Vandermonde determinant for certain special cases. A common generalisation of the Fekete points is the case where multivariate polynomials are used, see for example [30, 37, 203]. The case where points in $\mathbb{C}^d$ are interpolated has also been examined, an example of a recent significant result is [20] and a review can be found in [24].
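A numerical sketch (added here, with arbitrary settings) of how Fekete points can be computed on an interval: minimise the logarithmic energy, i.e. the negative logarithm of the absolute value of the Vandermonde determinant, over [-1, 1]^n with a general-purpose optimiser. For the interval the resulting points cluster towards the endpoints, as expected for good interpolation nodes.

import numpy as np
from scipy.optimize import minimize

def neg_log_vandermonde(x):
    # -log|v_n(x)| = -sum_{i<j} log|x_j - x_i|, the energy minimised by Fekete points
    n = len(x)
    return -sum(np.log(abs(x[j] - x[i])) for i in range(n) for j in range(i + 1, n))

n = 7
x0 = np.linspace(-0.95, 0.95, n)                       # distinct starting points
res = minimize(neg_log_vandermonde, x0, bounds=[(-1.0, 1.0)] * n)
print(np.sort(res.x))   # sorted Fekete points; the outermost points end up at the interval endpoints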


Distribution of electrical charges

The most classical example of a system with Coulombian interactions is a system of charged particles confined to some volume, even if it was not studied (from a mathematical point of view) until almost a hundred years after Coulomb's law was introduced [119, 267]. The classical mathematical formulation of this problem considers $p+1$ charges fixed at points $a_0, \ldots, a_p \in \mathbb{C}$ with weights $\eta_0, \ldots, \eta_p$ and $n$ moveable charges $x_1, \ldots, x_n$. The question is then what $x$-values give the extreme points of $L(x_1, \ldots, x_n)$ given by
$$L(x_1, \ldots, x_n) = \sum_{k=1}^{n} \sum_{j=0}^{p} \eta_j \log\left(\frac{1}{|a_j - x_k|}\right) + \sum_{1 \le i < k \le n} \log\left(\frac{1}{|x_k - x_i|}\right).$$

More background on this type of problem together with a collection of recent results can be found in [73]. If there are no fixed charges the problem becomes equivalent to maximising the absolute value of the Vandermonde determinant, similar to finding the Fekete points. The problems discussed in Chapter 2 belong to the class of equations that are called Schrödinger-like in [73].

Sphere packing

There are several different interaction kernels apart from the logarithmic interaction kernel, $g(x) = -\log(x)$, that are interesting in mathematical physics, especially statistical mechanics and quantum mechanics. One important class of interaction kernels are those given by $g(x) = \frac{1}{|x|^s}$ where $s$ is a positive integer. When this interaction kernel is used the value given by formula (9) is called the Riesz $s$-energy. There is a large body of significant literature; in [255] over 30 references are listed as an introduction to different related problems. It is worth noting that
$$\lim_{s \to 0} \frac{1}{s}\left(\frac{1}{|x|^s} - 1\right) = -\log(|x|)$$
which connects minimising the Riesz $s$-energy to the Fekete points. If we instead let $s \to \infty$ the problem of minimising the Riesz $s$-energy formally corresponds to the optimal sphere-packing problem, that is finding the arrangement of non-overlapping identical spheres that covers as much of a space as possible. This is a classical problem where extensive effort has gone into finding optimal packings, but for many years the problem was only fully solved in one, two and three dimensions, until recently when surprisingly simple proofs were found for 8 and 24 dimensions (seemingly without giving any results for any number of dimensions in-between). For a thorough collection of classical results see [58] and for the recent results see [52–54, 284].



Coulomb gas

In mathematical physics a system of particles whose energy can be described by (9) is often called a Coulomb gas [93, 207, 255]. One of the most wide-reaching results in the analysis of Coulomb gases was that many gas systems can be described using random matrices that belong to a so-called β-ensemble, which is defined by matrices with random elements. The foundational results were found in the early 1960s and applied to the cases where β = 1, β = 2 and β = 4 [78–81]. These cases will be briefly discussed in Section 1.1.7, which describes where the Vandermonde determinant appears in the probability density functions for the eigenvalues of the random matrices. If the same theory is extended to other values of β it can also be connected to equations similar to the Harish-Chandra–Itzykson–Zuber integral formula described on page 30 [93].

1.1.7 The Vandermonde determinant in random matrix theory

This section is based on Sections 1, 3 and 4 in Paper D.

Random matrix theory is a large research area with many applications, primarily in quantum mechanics and statistical mechanics [93, 207, 255], but also in wireless communication and finance [13], and random matrices appear as an important tool for analysing and evaluating algorithms in numerical linear algebra [83]. One class of random matrices that have been analysed extensively are the so-called β-ensembles; for a brief motivation see the section on Coulomb gases above. Here we will define some well-known β-ensembles and describe where the Vandermonde determinant appears in their probability density functions.

Definition 1.5. Let $X = (X_1, \cdots, X_n)$, where $X_i \sim N(\mu_i, \Sigma)$ and $X_i$ is independent of $X_j$ for $i \neq j$. The matrix $W : p \times p$ is said to be Wishart distributed [292] if and only if $W = XX^{\top}$ for some matrix $X$ in a family of Gaussian matrices $G_{m \times n}$, $m \le n$, that is, $X \sim N_{m,n}(\mu, \Sigma, I)$ where $\Sigma \ge 0$.

Next we will look at the expression for the probability density distribution of the eigenvalues of a Wishart distributed matrix, taken from [7].

Theorem 1.5. If $X$ is distributed as $N(\mu, \Sigma)$, then the probability density distribution of the eigenvalues of $XX^{\top}$, denoted $\lambda = (\lambda_1, \ldots, \lambda_m)$, is given by:

$$P(\lambda) = \frac{\pi^{-\frac{1}{2}n}\, \det(\Sigma)^{-\frac{1}{2}n}\, \det(D)^{\frac{1}{2}(n-p-1)}}{2^{\frac{1}{2}np}\, \Gamma_p\!\left(\tfrac{1}{2}n\right) \Gamma_p\!\left(\tfrac{1}{2}p\right)} \prod_{i<j}^{p} (\lambda_i - \lambda_j)\, \exp\!\left(-\frac{1}{2}\operatorname{Tr}\!\left(\Sigma^{-1}D\right)\right)$$



It will prove useful that the formula given in Theorem 1.5 contains the term $\prod_{i<j}(\lambda_i - \lambda_j)$, which we recognize from Theorem 1.2 as the determinant of a Vandermonde matrix.

Lemma 1.1. Let $P$ be a polynomial and $A$ be a symmetric $n \times n$ matrix. If the eigenvalues of $A$, $\{\lambda_k,\ k = 1, \ldots, n\}$, are all distinct then
$$\sum_{k=1}^{n} P(\lambda_k) = \operatorname{Tr}(P(A)).$$

Proof. By definition, for any eigenvalue $\lambda$ and eigenvector $v$ we must have $Av = \lambda v$ and thus

$$P(A)v = \left(\sum_{k=0}^{m} c_k A^k\right) v = \sum_{k=0}^{m} c_k (A^k v) = \sum_{k=0}^{m} c_k \lambda^k v$$
and thus $P(\lambda)$ is an eigenvalue of $P(A)$. For any matrix, $A$, the sum of the eigenvalues is equal to the trace of the matrix,
$$\sum_{k=1}^{n} \lambda_k = \operatorname{Tr}(A),$$
when multiplicities are taken into account. For the matrices considered in Lemma 1.1 all eigenvalues are distinct. Thus applying this property to the matrix $P(A)$ gives the desired statement.
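As an informal check of Lemma 1.1, the following sketch (an illustration only, assuming NumPy is available; the matrix and polynomial are generated at random) compares the two sides of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric matrix A and polynomial coefficients c_0, ..., c_m
n, m = 5, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # symmetric, eigenvalues distinct a.s.
c = rng.standard_normal(m + 1)          # P(x) = c_0 + c_1 x + ... + c_m x^m

# Left-hand side: sum of P(lambda_k) over the eigenvalues of A
eigvals = np.linalg.eigvalsh(A)
lhs = sum(np.polyval(c[::-1], lam) for lam in eigvals)

# Right-hand side: trace of P(A), evaluating the matrix polynomial directly
P_A = sum(ck * np.linalg.matrix_power(A, k) for k, ck in enumerate(c))
rhs = np.trace(P_A)

print(lhs, rhs)   # the two values agree up to floating point error
```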

Lemma 1.2. A Wishart distributed matrix W as defined in Definition 1.5 will be a symmetric n × n matrix.

Proof. From the definition $W$ is a $p \times p$ matrix such that $W = XX^{\top}$. Then
$$W^{\top} = (XX^{\top})^{\top} = (X^{\top})^{\top} X^{\top} = XX^{\top} = W$$
and thus $W$ is symmetric.

The Gaussian Orthogonal Ensembles (GOE), the Gaussian Unitary Ensembles (GUE), the Gaussian Symplectic Ensembles (GSE) and the Wishart Ensembles (WE) are well-known classical ensembles. More detailed discussions on these ensembles can be found in [6, 7, 75, 163, 207, 220, 292]; here we will only give their definitions and look at how the Vandermonde determinant appears in the probability density function for their eigenvalues.



Definition 1.6. The Gaussian Orthogonal Ensemble (GOE) is characterised by a random matrix X with real elements. The diagonal entries of X are independent and identically distributed (i.i.d.) with a standard normal distribution $N_1(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_1(0, 1/2)$. That is, a random matrix X gives a GOE if it is symmetric and real-valued ($X_{ij} = X_{ji}$) and has
$$X_{ij} = \begin{cases} \sqrt{2}\,\xi_{ii} \sim N_1(0, 1), & \text{if } i = j, \\ \xi_{ij} \sim N_1(0, 1/2), & i < j. \end{cases} \tag{10}$$

Definition 1.7. The Gaussian Unitary Ensemble (GUE) is characterised by Hermitian (that is $H^{\top *} = H$ where $\top *$ denotes the conjugate transpose) complex-valued matrices $H$. The diagonal entries of $H$ are independent and identically distributed (i.i.d.) with a standard normal distribution $N_2(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_2(0, 1/2)$. In other words, a random matrix $H$ belongs to the GUE if it is complex-valued, Hermitian, and the entries satisfy
$$H_{ij} = \begin{cases} \sqrt{2}\,\xi_{ii} \sim N_2(0, 1), & \text{if } i = j, \\ \frac{1}{\sqrt{2}}(\xi_{ij} + i\eta_{ij}) \sim N_2(0, 1/2), & i < j, \end{cases} \tag{11}$$
where $i$ is the imaginary unit.

Definition 1.8. The Gaussian Symplectic Ensemble (GSE) is characterised by a matrix, $S$, with quaternion elements that is self-dual (that is $S^{\top *} = S$ where $\top *$ denotes the conjugate transpose of a quaternion). The diagonal entries of $S$ are independent and identically distributed (i.i.d.) with a standard normal distribution $N(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_4(0, 1/2)$. In other words, a random matrix $S$ belongs to the GSE if it is quaternion-valued, self-dual, and the entries satisfy
$$S_{ij} = \begin{cases} \sqrt{2}\,\xi_{ii} \sim N(0, 1), & \text{if } i = j, \\ \frac{1}{\sqrt{2}}(\xi_{ij} + i\alpha_{ij} + j\beta_{ij} + k\gamma_{ij}) \sim N_4(0, 1/2), & i < j, \end{cases} \tag{12}$$
where $i$, $j$ and $k$ are the fundamental quaternion units.

Definition 1.9. The Wishart Ensembles (WE), $W_\beta(m, n)$, $m \ge n$, are characterised by the symmetric, Hermitian or self-dual matrix $W = W_\beta(N, N)$ obtained as $W = AA^{\top}$, $W = HH^{\top}$, or $W = SS^{\top}$ where $\top$ represents the appropriate transpose as given in the definitions of the GOE, GUE and GSE respectively.

To obtain the joint eigenvalue densities for random matrices, we apply the principle of matrix factorization; for instance, if the random matrix $X$ is expressed as $X = Q\Lambda Q^{\top}$, then $\Lambda$ directly gives the eigenvalues of $X$ [138].



Applying the Jacobian technique for joint density transformation, see for example [7], yields the joint densities of eigenvalues and eigenvectors.

Lemma 1.3. The three Gaussian ensembles have a joint eigenvalue probability density function given by

$$\text{Gaussian:}\quad \mathcal{P}_\beta(\lambda) = C_N^{\beta} \prod_{i<j}^{N} |\lambda_i - \lambda_j|^{\beta} \exp\left(-\frac{1}{2} \sum_{i} \lambda_i^2\right) \tag{13}$$

where the normalizing constant is given by
$$C_N^{\beta} = (2\pi)^{-N/2} \prod_{j=1}^{N} \frac{\Gamma(1 + \beta/2)}{\Gamma(1 + j\beta/2)}.$$

Lemma 1.4. The Wishart ensembles have a joint eigenvalue probability density distribution given by

$$\text{Wishart:}\quad \mathcal{P}_\beta(\lambda) = C_N^{\beta,\alpha} \prod_{i<j}^{N} |\lambda_i - \lambda_j|^{\beta} \prod_{i} \lambda_i^{\alpha - p} \exp\left(-\frac{1}{2} \sum_{i} \lambda_i^2\right) \tag{14}$$

where $\alpha = \frac{\beta}{2} m$ and $p = 1 + \frac{\beta}{2}(N - 1)$. The β parameter is decided by what type of elements are in the Wishart matrix: real-valued elements correspond to β = 1, complex-valued elements correspond to β = 2 and quaternion elements correspond to β = 4. The normalizing constant $C_N^{\beta,\alpha}$ is given by
$$C_N^{\beta,\alpha} = 2^{-N\alpha} \prod_{j=1}^{N} \frac{\Gamma(1 + \beta/2)}{\Gamma(1 + j\beta/2)\, \Gamma\!\left(\alpha - \frac{\beta}{2}(n - j)\right)}. \tag{15}$$
More information on Lemma 1.3 and 1.4 can be found in standard texts on random matrix theory, see for example [138, 207, 220].

Thus the joint eigenvalue probability density distribution for all the ensembles can be summarized in the following theorem (for more detail see for example [83, 163, 207]).

Theorem 1.6. Suppose that $X$ belongs to one of the ensembles given by Definitions 1.6–1.9. Then the distribution of the eigenvalues of $X_N$ is given by
$$\mathcal{P}_X(x_1, \cdots, x_N) = \bar{C}_N^{\beta} \prod_{i<j} |x_i - x_j|^{\beta} \exp\left(-\frac{\beta}{4} \sum_{i} x_i^2\right) \tag{16}$$

where $\bar{C}_N^{(\beta)}$ are normalizing constants that can be computed explicitly and β is determined by the elements of $X$ as in Lemma 1.4.



From (16) it should be noted that the properties of a probability density function, that is,

$$0 \le \mathcal{P}(x) \le 1 \quad \text{and} \quad \int_{\mathbb{R}^N} \mathcal{P}(x) \prod_{i=1}^{N} dx_i = 1$$

do hold, as verified in [207]. We also notice that the term $\prod_{i<j} |x_i - x_j|^{\beta}$ in (16) is the absolute value of the Vandermonde determinant raised to the power β.
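As a small illustration of Definition 1.6 and the role of the factor $\prod_{i<j} |x_i - x_j|^{\beta}$, the sketch below (an informal example assuming NumPy is available, not taken from the papers referenced above) samples a GOE matrix and evaluates the Vandermonde factor of its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4

# Sample a GOE matrix following Definition 1.6:
# diagonal entries ~ N(0, 1), off-diagonal entries ~ N(0, 1/2), symmetric.
X = np.zeros((N, N))
for i in range(N):
    for j in range(i, N):
        if i == j:
            X[i, i] = rng.normal(0.0, 1.0)
        else:
            X[i, j] = X[j, i] = rng.normal(0.0, np.sqrt(0.5))

# The eigenvalues of X; their joint density is proportional to
# |Vandermonde(eigenvalues)|^beta * exp(-beta/4 * sum of squares), beta = 1.
eigs = np.linalg.eigvalsh(X)
vandermonde_factor = np.prod([eigs[j] - eigs[i]
                              for i in range(N) for j in range(i + 1, N)])
print(eigs, abs(vandermonde_factor))
```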

1.2 Curve fitting

The process of constructing a mathematical curve so that it has the best possible fit to some series of data points is usually referred to as curve fitting. Exactly what fit means and what constraints are put on the constructed curve varies depending on context. In this section we will discuss a few different scenarios and methods that are related to the Vandermonde matrix and the methods used in later chapters to construct phenomenological mathematical models. We will give an introduction to a few different interpolation methods in Sections 1.2.1–1.2.2, which give a curve that passes exactly through a finite set of points. If we cannot make a curve that passes through the points exactly we will need to choose how to measure the distance between the curve and the points in order to determine what curve fits the data points best. In Sections 1.2.3–1.2.6 the so-called least squares approach to this kind of problem is presented.

1.2.1 Linear interpolation

The problem of finding a function that generates a given set of points is usually referred to as an interpolation problem and the function generating the points is called an interpolating function. A common type of interpolation problem is to find a continuous function, $f$, such that the given set of points $\{(x_1, y_1), (x_2, y_2), \ldots\}$ can be generated by calculating the set $\{(x_1, f(x_1)), (x_2, f(x_2)), \ldots\}$. Often the interpolating function is also a linear combination of elementary functions, but interpolation can also be done in other ways, for instance with fractals (the classical texts on this are [15, 16]) or parametrised curves. For some examples, see Figure 1.2.



Figure 1.2: Some examples of different interpolating curves. The set of red points are interpolated by a polynomial (left), a self-affine fractal (middle) and a Lissajous curve (right).

In the case of the interpolating function being a linear combination of other functions, where the interpolation is achieved by changing the coefficients of the linear combination, this is said to be a linear model (not to be confused with linear interpolation, that is interpolation with piecewise straight lines). For linear models the interpolation problem can be described using alternant matrices. Suppose we want to find a function
$$f(x) = \sum_{i=1}^{m} a_i g_i(x) \tag{17}$$
that fits as well as possible to the data points $(x_i, y_i)$, $i = 1, \ldots, n$. We then get an interpolation problem described by the linear equation system $Aa = y$ where $a$ are the coefficients of $f$, $y$ are the data values and $A$ is the appropriate alternant matrix,
$$A = \begin{pmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1) \\ g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{pmatrix}, \quad a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$
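A concrete sketch of (17) (illustrative only, assuming NumPy; the basis functions and data points are arbitrary choices made for this example) assembles the alternant matrix and solves for the coefficients:

```python
import numpy as np

# Example basis g_1(x) = 1, g_2(x) = sin(x), g_3(x) = exp(x)  (arbitrary choice)
basis = [lambda x: np.ones_like(x), np.sin, np.exp]

# Three data points to interpolate (n = m = 3)
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.5, 0.3])

# Alternant matrix A with A[i, j] = g_j(x_i)
A = np.column_stack([g(x) for g in basis])

# Solve A a = y for the coefficients of f(x) = sum_i a_i g_i(x)
a = np.linalg.solve(A, y)

f = lambda t: sum(ai * g(t) for ai, g in zip(a, basis))
print(a, f(x))   # f reproduces y at the interpolation points
```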

Polynomial interpolation

A classical form of interpolation is polynomial interpolation, where n data points are interpolated by a polynomial of degree at most n − 1. The Vandermonde matrix can be used to describe this type of interpolation problem simply by rewriting the equation system given by $p(x_k) = y_k$, $k = 1, \ldots, n$ as a matrix equation
$$\begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ 1 & x_2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$



That the polynomial is unique (if it exists) is easy to see when considering the determinant of the Vandermonde matrix
$$\det(V_n(x_1, \ldots, x_n)) = \prod_{1 \le i < j \le n} (x_j - x_i).$$

Clearly this determinant is non-zero whenever all $x_i$ are distinct, which means that the matrix is invertible whenever all $x_i$ are distinct. If not all $x_i$ are distinct there is no function of the $x$ coordinate that can interpolate all the points. There are several ways to construct the interpolating polynomial without explicitly inverting the Vandermonde matrix. The most straightforward is probably Lagrange interpolation, named after Joseph-Louis Lagrange (1736–1813) [167] who independently discovered it a few years after Edward Waring (1736–1798) [288]. The idea behind Lagrange interpolation is simple: construct a set of $n$ polynomials $\{p_1, p_2, \ldots, p_n\}$ such that
$$p_i(x_j) = \begin{cases} 0, & i \neq j, \\ 1, & i = j, \end{cases}$$
and then construct the final interpolating polynomial as the sum of these $p_i$ weighted by the corresponding $y_i$. The $p_i$ polynomials are called Lagrange basis polynomials and can easily be constructed by placing the roots appropriately and then normalizing the result such that $p_i(x_i) = 1$, which gives the expression

$$p_i(x) = \frac{(x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)}{(x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}.$$
The explicit formula for the full interpolating polynomial is
$$p(x) = \sum_{k=1}^{n} y_k \frac{(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}$$
and from this formula the expression for the inverse of the Vandermonde matrix can be found by noting that the jth row of the inverse will consist of the coefficients of $p_j$; the resulting expression for the elements is given in Theorem 1.4.

Polynomial interpolation is mostly used when the data set we wish to interpolate is small. The main reason for this is the instability of the interpolation method. One example of this is Runge's phenomenon, which shows that when certain functions are approximated by polynomial interpolation fitted to equidistantly sampled points the approximation will sometimes lose accuracy when the number of interpolation points is increased, see Figure 1.4 for an example.
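A small numerical sketch of this effect (illustrative only, assuming NumPy; the Runge-type function $1/(1+x^2)$ on $[-5, 5]$ is a classical choice and only stands in for the example shown in Figure 1.4):

```python
import numpy as np

def runge(x):
    # Classical Runge-type example function (the thesis figure uses a
    # similar function on a different interval).
    return 1.0 / (1.0 + x**2)

for n_points in (7, 14, 19):
    # Equidistant interpolation nodes on [-5, 5]
    x_nodes = np.linspace(-5, 5, n_points)
    y_nodes = runge(x_nodes)

    # Solve the Vandermonde system for the interpolating polynomial
    V = np.vander(x_nodes, increasing=True)
    a = np.linalg.solve(V, y_nodes)

    # Maximum deviation on a fine grid; it grows near the interval ends
    x_fine = np.linspace(-5, 5, 1001)
    max_err = np.max(np.abs(np.polyval(a[::-1], x_fine) - runge(x_fine)))
    print(n_points, max_err)
```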




Figure 1.3: Illustration of Lagrange interpolation of 4 data points. The red dots are the data set and $p(x) = \sum_{k=1}^{4} y_k p_k(x)$ is the interpolating polynomial.

One way to predict this instability of polynomial interpolation is to note that the condition number of the Vandermonde matrix can be very large for equidistant points [108]. There are different ways to mitigate the issue of stability, for example choosing data points that minimize the condition number of the relevant matrix [106, 108] or choosing a polynomial basis that is more stable for the given set of data points, such as Bernstein polynomials in the case of equidistant points [222]. Other polynomial schemes can also be considered, for instance interpolating with different basis functions in different intervals, for example using polynomial splines. Naturally another choice is to, instead of polynomials, choose basis functions that are more suitable to the problem at hand. For an example of this see Section 3.3. While the instability of polynomial interpolation does not prevent it from being useful for analytical examinations it is generally considered impractical when there is noise present or when calculations are performed with limited precision. Often interpolating polynomials are not constructed by inverting the Vandermonde matrix or calculating the Lagrange basis polynomials; instead a more computationally efficient method such as Newton interpolation or Neville's algorithm is used [235]. There are some variants of Lagrange interpolation, such as barycentric Lagrange interpolation, that have good computational performance [21]. In applications where the data is noisy it is often suitable to use least squares fitting, which is discussed in Section 1.2.3, instead of interpolation.




Figure 1.4: Illustration of Runge's phenomenon. Here we attempt to approximate a function (dashed line) by polynomial interpolation (solid line). With 7 equidistant sample points (left figure) the approximation is poor near the edges of the interval and increasing the number of sample points to 14 (center) and 19 (right) clearly reduces accuracy at the edges further.

Finally we will discuss an interesting and important (but for the rest of the thesis irrelevant) form of polynomial interpolation called Hermite interpolation where it is not only required that $p(x_k) = y_k$ but also that the derivatives up to a certain order (sometimes allowed to vary per point) are given. This requires a higher degree polynomial that can be found by solving the equation system

$$\begin{cases} p(x_k) = y_{k0} \\ p'(x_k) = y_{k1} \\ \quad\vdots \\ p^{(i)}(x_k) = y_{ki} \end{cases}$$
for all $k = 1, 2, \ldots, n$ where $k_i$ are integers that define the order of the derivative that needs to match at the point given by $x_k$. When this equation system is written as a matrix equation the resulting matrix, $C$, will have dimension $m \times m$ with $m = \sum_{i=1}^{n} k_i$ and rows given by

$$C_{a,b} = \begin{cases} 0, & b \le k_j, \\ \dfrac{(b-1)!}{(b-c-1)!}\, x_k^{\,b-c-1}, & b > k_j, \end{cases} \qquad \text{with } c = a - \sum_{i=1}^{j} k_i \text{ and } c < a \le c + k_{j+1}.$$

The matrix $C$ is called a confluent Vandermonde matrix and has been studied extensively since Hermite interpolation is important both for numerical and analytical purposes. For example the confluent Vandermonde


matrix also has a very elegant formula for the determinant [3]

$$\det(C) = \prod_{1 \le i < j \le n} (x_j - x_i)^{(k_i+1)(k_j+1)}.$$

There are also many results related to its inverse and numerical properties; classical examples are [104, 105, 107], and some further examples are mentioned on page 29, but this is a vanishingly small part of the total literature on the subject.

1.2.2 Generalized divided differences and interpolation

In Section 1.2.1 we saw how the coefficients of an interpolating polynomial could be computed by inverting the Vandermonde matrix or using the Lagrange basis polynomials. Another method for computing the coefficients of the polynomial is based on a computation called divided differences.

Definition 1.10. Let $x_0, \ldots, x_n$ be given points, then the divided differences operator that acts on a function $f(x)$ is defined as
$$[x_0, \ldots, x_n]f(x) = \begin{cases} f(x_0), & n = 0, \\ \dfrac{[x_1, \ldots, x_n]f(x) - [x_0, \ldots, x_{n-1}]f(x)}{x_n - x_0}, & n > 0. \end{cases}$$
The reason that the divided difference operator is interesting in polynomial interpolation is that if we apply it to two distinct points, $x_0$ and $x_1$, and a function $f(x)$ then the result is the slope of the line that passes through the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$,

$$[x_0, x_1]f(x) = \frac{[x_1]f(x) - [x_0]f(x)}{x_1 - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$
A line that passes through the two points can then be constructed as
$$p(x) = f(x_0) + (x - x_0)[x_0, x_1]f(x).$$

It can similarly be shown that a polynomial that interpolates a set of points

$(x_0, f(x_0)), \ldots, (x_n, f(x_n))$ can be written
$$\begin{aligned} p(x) = {}& f(x_0) + (x - x_0)[x_0, x_1]f(x) + (x - x_0)(x - x_1)[x_0, x_1, x_2]f(x) + \ldots \\ & + (x - x_0)\cdots(x - x_{n-1})[x_0, \ldots, x_n]f(x). \end{aligned}$$

This method for interpolation is usually referred to as Newton interpolation and is probably the most well-known application of divided differences. In


some literature, e.g. [65], this property is even used as a definition for divided differences. Since we expect to find the same polynomial whether we use the Lagrange interpolation method described in Section 1.2.1 or the Newton interpolation method described above we also expect there to be some relation between the divided difference operator and the Vandermonde determinant. It turns out that there is a fairly simple relation, see [253] for details.

Lemma 1.5. The divided difference operator defined in Definition 1.10 can also be written as
$$[x_0, \ldots, x_n]f(x) = \frac{\begin{vmatrix} 1 & x_0 & \cdots & x_0^{n-1} & f(x_0) \\ 1 & x_1 & \cdots & x_1^{n-1} & f(x_1) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} & f(x_n) \end{vmatrix}}{v_n(x_0, \ldots, x_n)}, \tag{18}$$
where $v_n(x_0, \ldots, x_n)$ denotes the Vandermonde determinant.

Remark 1.5. Sometimes, see for example [253], the relation in Lemma 1.5 is used as the definition of the divided difference operator.

The divided differences operator can also be used to describe the error that one gets when a function is approximated by interpolating with a polynomial; the following lemma is from [156].

Lemma 1.6. Let $p(x)$ be a polynomial of degree smaller than or equal to $n$ that interpolates the points $\{(x_i, f(x_i)),\ i = 0, \ldots, n\}$. For any $x \neq x_i$, $i = 0, \ldots, n$ the error $f(x) - p(x)$ is given by
$$f(x) - p(x) = [x_0, \ldots, x_n, x]f(x) \prod_{i=0}^{n} (x - x_i).$$

Combining Lemma 1.5 and Lemma 1.6 gives

$$f(x) - p(x) = \frac{\begin{vmatrix} 1 & x_0 & \cdots & x_0^{n} & f(x_0) \\ 1 & x_1 & \cdots & x_1^{n} & f(x_1) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & x_n & \cdots & x_n^{n} & f(x_n) \\ 1 & x & \cdots & x^{n} & f(x) \end{vmatrix}}{v_n(x_0, \ldots, x_n, x)} \prod_{i=0}^{n} (x - x_i),$$
which gives some insight into why the value of the Vandermonde determinant is important when choosing interpolation points.

Another popular application of the divided differences operator is the construction of so-called B-splines, piecewise polynomial functions that allow


for very efficient storage and computation of a variety of shapes. The concept of (mathematical) splines first appeared in the 1940s [251, 252] and B-splines were developed in the 1960s and 1970s [22, 63, 64]. We can define a B-spline using the divided differences as follows.

Definition 1.11. Given a sequence, $\cdots \le t_{-1} \le t_0 \le t_1 \le t_2 \le \cdots$, we can define the kth B-spline of order m as

$$B_{k,m}(x) = \begin{cases} (-1)^m [t_k, \ldots, t_{k+m}] g_k(x, t), & t_k \le x < t_{k+1}, \\ 0, & \text{otherwise}, \end{cases}$$
where
$$g_k(x, t) = \begin{cases} (x - t)^{k-1}, & x \ge t, \\ 0, & \text{otherwise}, \end{cases}$$
and the divided difference operator acts with respect to $t$.

Remark 1.6. There are several different ways to define B-splines; above we followed the definition in [253]. In modern literature it is more common that B-splines and their computation are described from the perspective of so-called blossoms [99, 239, 240] rather than the divided difference description. B-splines can be used for many things, for example approximation theory [204], geometric modelling [99] and wavelet construction [49]. We will not discuss their use further in this thesis.

If we want to do linear interpolation and use some other set of basis functions than the monomials, as in (17), then we need to define a generalized version of the divided difference operator.

Definition 1.12. Given a set of m linearly independent functions, $G = \{g_i\}$, and n values, $x_0, \ldots, x_n$, the generalized divided differences operator that acts on a function $f(x)$ is defined as

$$[x_1, \ldots, x_n]_G f(x) = \frac{\begin{vmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_{n-1}(x_1) & f(x_1) \\ g_1(x_2) & g_2(x_2) & \cdots & g_{n-1}(x_2) & f(x_2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ g_1(x_n) & g_2(x_n) & \cdots & g_{n-1}(x_n) & f(x_n) \end{vmatrix}}{\begin{vmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_n(x_1) \\ g_1(x_2) & g_2(x_2) & \cdots & g_n(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(x_n) & g_2(x_n) & \cdots & g_n(x_n) \end{vmatrix}}.$$

Remark 1.7. We mentioned previously that the divided difference operator can be used to construct B-splines; using the generalized divided difference operator similar tools can be constructed using other sets of functions than polynomials as a basis, see for example [196].
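As a small illustration of Definition 1.10 and the Newton interpolation formula above (an informal sketch assuming NumPy; the data points are arbitrary and not taken from the thesis papers), the divided-difference coefficients can be computed recursively and used to evaluate the interpolating polynomial:

```python
import numpy as np

def divided_differences(x, y):
    """Return the coefficients [x_0]f, [x_0,x_1]f, ..., [x_0,...,x_n]f."""
    table = np.array(y, dtype=float)
    coeffs = [table[0]]
    for level in range(1, len(x)):
        # Each pass replaces table[i] by [x_i, ..., x_{i+level}]f
        table = (table[1:] - table[:-1]) / (x[level:] - x[:-level])
        coeffs.append(table[0])
    return coeffs

def newton_eval(x_nodes, coeffs, t):
    """Evaluate p(t) = c_0 + c_1(t - x_0) + c_2(t - x_0)(t - x_1) + ..."""
    result, product = 0.0, 1.0
    for c, xk in zip(coeffs, x_nodes):
        result += c * product
        product *= (t - xk)
    return result

x = np.array([0.0, 1.0, 2.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
c = divided_differences(x, y)
print([newton_eval(x, c, xi) for xi in x])   # reproduces y at the nodes
```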



1.2.3 Least squares fitting

If it is not necessary to exactly reproduce the series of data points, a commonly applied alternative to interpolation is least squares fitting. A least squares fitting of a mathematical model to a set of data points $\{(x_i, y_i),\ i = 1, \ldots, n\}$ is the choice of parameters of the model, here denoted β, such that the sum of the squares of the residuals
$$S(\beta) = \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2$$
is minimized. This choice is appropriate if the data series is affected by independent and normally distributed noise, see Section 1.3.1. The most wide-spread form of least squares fitting is linear least squares fitting where, analogously to linear interpolation, the function $f(\beta; x)$ depends linearly on β. This case has a unique solution that is simple to find. It is commonly known as the least squares method and we describe it in detail in the next section. With a non-linear $f(\beta; x)$ it is usually much more difficult to find the least squares fitting and often numerical methods are used, e.g. the Marquardt least squares method described in Section 1.2.6. In Section 3.2 we present a scheme for approximating electrostatic discharges to ensure electromagnetic compatibility (see Section 1.5) that uses both the least squares method and the Marquardt least squares method. In Chapter 4 we fit several models to estimated mortality rates using non-linear least squares fitting and compare the results in various ways described in Sections 1.3.1–1.3.3.

1.2.4 Linear least squares fitting

Suppose we want to find a function
$$f(x) = \sum_{i=1}^{m} \beta_i g_i(x) \tag{19}$$
that fits as well as possible in the least squares sense to the data points $(x_i, y_i)$, $i = 1, \ldots, n$, $n > m$. We then get a curve fitting problem described by the linear equation system $A\beta = y$ where β are the coefficients of $f$, $y$ is the vector of data values and $A$ is the appropriate alternant matrix,
$$A = \begin{pmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1) \\ g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$
This is an overdetermined version of the linear interpolation problem described in Section 1.2.1.



How can we actually find the coefficients that minimize the sum of the squares of the residuals? First we can define the square of the length of the residual vector, $e = A\beta - y$, as a function
$$S(e) = e^{\top} e = \sum_{i=1}^{n} |e_i|^2 = (A\beta - y)^{\top} (A\beta - y).$$
This kind of function is a positive second degree polynomial with no mixed terms and thus has a global minimum where $\frac{\partial S}{\partial e_i} = 0$ for all $1 \le i \le n$. We can find the global minimum by looking at the derivative of the function; $e_i$ is determined by $\beta_i$ and $\frac{\partial e_i}{\partial \beta_j} = A_{i,j}$, thus
$$\frac{\partial S}{\partial \beta_j} = \sum_{i=1}^{n} 2 e_i \frac{\partial e_i}{\partial \beta_j} = \sum_{i=1}^{n} 2 (A_{i,\cdot}\, \beta - y_i) A_{i,j} = 0 \iff A^{\top} A \beta = A^{\top} y.$$
This gives
$$A^{\top} A \beta = A^{\top} y \iff \beta = (A^{\top} A)^{-1} A^{\top} y$$
and by the Gauss–Markov theorem ([102, 103, 201], see for instance [208] for a more modern description), if $(A^{\top} A)^{-1}$ exists then (19) gives the linear, unbiased estimator with the lowest variance possible for any linear, unbiased estimator. The matrix given by $(A^{\top} A)^{-1} A^{\top}$ is sometimes referred to as the Moore–Penrose pseudoinverse of $A$.

Clearly a linear curve fitting model with $g_i(x) = x^{i-1}$ gives an equation system described by a rectangular Vandermonde matrix.
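A brief sketch of the normal equations above (illustrative only, assuming NumPy; in practice np.linalg.lstsq or a QR factorization is numerically preferable to forming $A^{\top}A$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy data points roughly following a quadratic trend
x = np.linspace(0, 5, 30)
y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(0, 0.1, x.size)

# Rectangular Vandermonde matrix for the basis g_i(x) = x^(i-1), m = 3
A = np.vander(x, 3, increasing=True)

# Normal equations: beta = (A^T A)^{-1} A^T y
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

# The same fit through a more numerically stable routine
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_normal, beta_lstsq)   # the two estimates agree
```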

1.2.5 Non-linear least squares fitting

So far we have only considered models that are linear with respect to the parameters that specify them. If we relax the linearity condition and simply consider fitting a function with m parameters, $f(\beta_1, \ldots, \beta_m; x)$, to n data points in the least squares sense, this is usually referred to as non-linear least squares fitting. There is no general analogue to the Gauss–Markov theorem for non-linear least squares fitting and therefore finding the appropriate estimator requires more knowledge about the specifics of the model. In practice non-linear least squares fittings are often found using some numerical method for non-linear optimization, of which there are many (see for instance [247] for an overview). In the next section we will give an overview of a standard method called the Marquardt least squares method. In Section 3.2.2 we will use a combination of the Marquardt least squares method and methods for linear least


squares fitting to fit a non-linear model described by

$$G(\beta; t)\, \eta = y$$
where β, η are vectors of parameters to be fitted, $y$ is the data we wish to fit the model to and $G(\beta; t)$ is the generalized Vandermonde matrix

$$G(\beta; t) = \begin{pmatrix} (t_1 e^{1-t_1})^{\beta_1} & (t_1 e^{1-t_1})^{\beta_2} & \cdots & (t_1 e^{1-t_1})^{\beta_m} \\ (t_2 e^{1-t_2})^{\beta_1} & (t_2 e^{1-t_2})^{\beta_2} & \cdots & (t_2 e^{1-t_2})^{\beta_m} \\ \vdots & \vdots & \ddots & \vdots \\ (t_n e^{1-t_n})^{\beta_1} & (t_n e^{1-t_n})^{\beta_2} & \cdots & (t_n e^{1-t_n})^{\beta_m} \end{pmatrix}.$$

1.2.6 The Marquardt least squares method

This section is based on Section 3.1 of Paper E.

The Marquardt least squares method, also known as the Levenberg–Marquardt algorithm or damped least squares, is an efficient method for least squares estimation for functions with non-linear parameters that was developed in the middle of the 20th century (see [174], [202]). The least squares estimation problem for functions with non-linear parameters arises when a function of m independent variables, described by k unknown parameters, needs to be fitted to a set of n data points such that the sum of squares of residuals is minimized. The vector containing the independent variables is $x = (x_1, \cdots, x_n)$, the vector containing the parameters is $\beta = (\beta_1, \cdots, \beta_k)$ and the data points are

$$(Y_i, X_{1i}, X_{2i}, \cdots, X_{mi}) = (Y_i, \mathbf{X}_i), \quad i = 1, 2, \cdots, n.$$

Let the residuals be denoted by $E_i = f(\mathbf{X}_i; \beta) - Y_i$; the sum of squares of the $E_i$ is then written as
$$S = \sum_{i=1}^{n} (f(\mathbf{X}_i; \beta) - Y_i)^2,$$
which is the function to be minimized with respect to β. The Marquardt least squares method is an iterative method that gives approximate values of β by combining the Gauss–Newton method (also known as the inverse Hessian method) and the steepest descent (also known as the gradient) method to minimize $S$. The method is based around solving the linear equation system
$$\left(A^{*(r)} + \lambda^{(r)} I\right) \delta^{*(r)} = g^{*(r)}, \tag{20}$$

where $A^{*(r)}$ is a modified Hessian of $E(b)$ (or $f(\mathbf{X}_i; b)$), $g^{*(r)}$ is a rescaled version of the gradient of $S$, $r$ is the number of the current iteration


of the method, and λ is a real positive number sometimes referred to as the fudge factor [235]. The Hessian, the gradient and their modifications are defined as follows:
$$A = J^{\top} J, \qquad J_{ij} = \frac{\partial f_i}{\partial b_j} = \frac{\partial E_i}{\partial b_j}, \quad i = 1, 2, \cdots, m;\ j = 1, 2, \cdots, k,$$
and
$$(A^{*})_{ij} = \frac{a_{ij}}{\sqrt{a_{ii}}\,\sqrt{a_{jj}}},$$
while
$$g = J^{\top}(Y - f_0), \qquad f_{0i} = f(\mathbf{X}_i, b, c), \qquad g_i^{*} = \frac{g_i}{a_{ii}}.$$

∂fi ∂Ei Jij = = , i = 1, 2, ··· , m; j = 1, 2, ··· , k, ∂bj ∂bj and ∗ aij (A )ij = √ √ , aii ajj while > ∗ gi g = J (Y − f0), f0i = f(Xi, b, c), gi = . aii Solving (20) gives a vector which, after some scaling, describes how the parameters b should be changed in order to get a new approximation of β,

$$b^{(r+1)} = b^{(r)} + \delta^{(r)}, \qquad \delta_i^{(r)} = \frac{\delta_i^{*(r)}}{\sqrt{a_{ii}}}. \tag{21}$$

It is obvious from (20) that $\delta^{(r)}$ depends on the value of the fudge factor λ. Note that if λ = 0, then (20) reduces to the regular Gauss–Newton method [202], and if λ → ∞ the method will converge towards the steepest descent method [202]. The reason that the two methods are combined is that the Gauss–Newton method often has faster convergence than the steepest descent method, but is also an unstable method [202]. Therefore, λ must be chosen appropriately in each step. In the Marquardt least squares method this amounts to increasing λ by a chosen factor $v$ whenever an iteration increases $S$, and if an iteration reduces $S$ then λ is reduced by a factor $v$ as many times as possible. Below follows a detailed description of the method using the following notation:
$$S^{(r)} = \sum_{i=1}^{n} \left(Y_i - f(\mathbf{X}_i, b^{(r)}, c)\right)^2, \tag{22}$$
$$S\!\left(\lambda^{(r)}\right) = \sum_{i=1}^{n} \left(Y_i - f(\mathbf{X}_i, b^{(r)} + \delta^{(r)}, c)\right)^2. \tag{23}$$
The iteration step of the Marquardt least squares method can be described as follows:

• Input: $v > 1$ and $b^{(r)}$, $\lambda^{(r)}$.

• Compute $S(\lambda^{(r)})$.

• If $\lambda^{(r)} \ll 1$ then compute $S\!\left(\lambda^{(r)}/v\right)$, otherwise go directly to the test of whether $S(\lambda^{(r)}) \le S^{(r)}$ below.

• If $S\!\left(\lambda^{(r)}/v\right) \le S^{(r)}$ let $\lambda^{(r+1)} = \lambda^{(r)}/v$.

• If $S(\lambda^{(r)}) \le S^{(r)}$ let $\lambda^{(r+1)} = \lambda^{(r)}$.

• If $S(\lambda^{(r)}) > S^{(r)}$ then find the smallest integer $\omega > 0$ such that $S(\lambda^{(r)} v^{\omega}) \le S^{(r)}$, and set $\lambda^{(r+1)} = \lambda^{(r)} v^{\omega}$.

• Output: $b^{(r+1)} = b^{(r)} + \delta^{(r)}$, $\delta^{(r)}$.

This iteration procedure is also described in Figure 1.5. Naturally, some condition for what constitutes an acceptable fit for the function must also be chosen. If this condition is not satisfied the new values for $b^{(r+1)}$ and $\lambda^{(r+1)}$ will be used as input for the next iteration and if the condition is satisfied the algorithm terminates. The quality of the fitting, in other words the value of $S$, is determined by the stopping condition and the initial values for $b^{(0)}$. The initial value of $\lambda^{(0)}$ affects the performance of the algorithm to some extent since after the first iteration $\lambda^{(r)}$ will be self-regulating. Suitable values for $b^{(0)}$ are challenging to find for many functions $f$ and they are often, together with $\lambda^{(0)}$, found using heuristic methods.


Figure 1.5: The basic iteration step of the Marquardt least squares method, definitions of computed quantities are given in (21), (22) and (23).

In Section 3.2 the Marquardt least squares method will be used for least squares fitting with power-exponential functions.
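A compact sketch of a damped least squares iteration in the spirit of the method above (an illustration only, assuming NumPy; the two-parameter exponential model is a hypothetical choice for demonstration, and the unscaled form $(J^{\top}J + \lambda I)\delta = J^{\top}(Y - f)$ is used instead of the rescaled quantities in (20)):

```python
import numpy as np

def model(b, x):
    # Hypothetical two-parameter model used only for this demonstration
    return b[0] * np.exp(-b[1] * x)

def jacobian(b, x):
    # Partial derivatives of the model with respect to b[0] and b[1]
    return np.column_stack([np.exp(-b[1] * x), -b[0] * x * np.exp(-b[1] * x)])

def damped_least_squares(x, y, b0, lam=1e-3, v=10.0, iterations=50):
    b = np.array(b0, dtype=float)
    S = np.sum((y - model(b, x))**2)
    for _ in range(iterations):
        J = jacobian(b, x)
        g = J.T @ (y - model(b, x))
        delta = np.linalg.solve(J.T @ J + lam * np.eye(len(b)), g)
        S_new = np.sum((y - model(b + delta, x))**2)
        if S_new <= S:            # step accepted: move and soften the damping
            b, S, lam = b + delta, S_new, lam / v
        else:                     # step rejected: increase the damping factor
            lam *= v
    return b

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 40)
y = model([2.0, 1.3], x) + rng.normal(0, 0.02, x.size)
print(damped_least_squares(x, y, b0=[1.0, 0.5]))
```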



1.3 Analysing how well a curve fits

In this thesis we will discuss several ways to construct mathematical models. With several mathematical models available it is necessary to have some method for comparing the models and choosing the most suitable one. When a model is constructed with a certain application in mind there is often a set of required or desired properties given by the application and choosing the best model is a matter of seeing which model matches the requirements best. In many cases this process is not straightforward and often there is not one model that is better than the other candidate models in all aspects; a common example is the trade-off between accuracy and complexity of the model. It is often easy to improve the model by increasing its complexity (either by introducing more general and flexible mathematical concepts that are more difficult to analyse or less well understood, or by extending the model in a way that increases the cost of computations and simulations using the model), but finding the best compromise between accuracy and complexity can be difficult. In this section we will discuss how to compare models primarily with respect to accuracy and the number of required parameters.

1.3.1 Regression

Regression is similar to interpolation except that the presence of noise in the data is taken into consideration. The typical regression problem assumes that the data points $\{(x_i, y_i),\ i = 1, \ldots, n\}$ are samples from a stochastic variable of the form
$$Y_i = f(\beta; x_i) + \epsilon_i$$
where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are samples of a random variable with expected value zero, called the errors or the noise for the data set. There are many different classes of regression problems defined by the type of function $f(\beta_1, \ldots, \beta_m; x)$ and the distribution of errors. Here we will only consider the situation where the $\epsilon_i$ variables are independent and normally distributed with identical variance, the parameter space $B$ is a compact subset of $\mathbb{R}^k$ and, for all $x_i$, the function $f(\beta; x_i)$ is a continuous function of $\beta \in B$. Suppose we want to choose the appropriate set of parameters for $f$ based on some set of observed data points. A common approach to this is so-called maximum likelihood estimation.

Definition 1.13. The likelihood function, $L$, is the function that gives us the probability that a certain observation, $x$, of a stochastic variable $X$ is made given a certain set of parameters, β,

$$L_x(\beta) = \Pr(X = x \mid \beta).$$



Thus choosing parameters that maximize the likelihood function gives the set of parameters that seem to be most likely based on available information. Typically these parameters cannot be calculated exactly and must be estimated; this estimation is called the Maximum Likelihood Estimation (MLE). To find the MLE we need to find the maximum of the likelihood function. Note that here we will only consider the case where the noise variables, $\epsilon_i$, are independent and normally distributed with mean zero.

Lemma 1.7. For the stochastic variables $Y_i = f(\beta; x_i) + \epsilon_i$ where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are independent normally distributed random variables with expected value zero and standard deviation σ, the likelihood function is given by the joint probability density function for the noise,
$$L_y(\beta) = (2\pi)^{-\frac{n}{2}} \sigma^{-n} \prod_{i=1}^{n} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right).$$

Proof. Since each $\epsilon_i$ is normally distributed with mean zero and standard deviation σ, the difference between the observed value and the given function, $y_i - f(\beta; x_i)$, is normally distributed with mean zero and standard deviation σ. Since all the errors are independent, the joint probability density function is just the product of the $n$ probability density functions of the form
$$p_i(\beta; (x_i, y_i)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right), \quad i = 1, \ldots, n.$$

For the MLE we only care about what parameters give the maximum of the likelihood function, not the actual value of the likelihood function, so we can ignore the constant factor, and in practice it is also often simpler to consider the maximum of the logarithm of the likelihood function. This leads to the following lemma.

Lemma 1.8. Consider a regression problem described by a set of data points $(x_i, y_i)$, $i = 1, \ldots, n$ and the stochastic variables $Y_i = f(\beta; x_i) + \epsilon_i$ where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are independent normally distributed random variables with expected value zero and standard deviation σ. The MLE for the parameters β will minimize the sum of the squares of the residuals,
$$S(\beta) = \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2.$$

Proof. Since the natural logarithm is a monotonically increasing function, $-\ln(L_y(\beta))$ will have a minimum point where $L_y(\beta)$ has a maximum point.



By Lemma 1.7
$$\begin{aligned} -\ln(L_y(\beta)) &= -\ln\left((2\pi)^{-\frac{n}{2}} \sigma^{-n} \prod_{i=1}^{n} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right)\right) \\ &= -\ln\left((2\pi)^{-\frac{n}{2}} \sigma^{-n}\right) + \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2 \\ &= -\ln\left((2\pi)^{-\frac{n}{2}} \sigma^{-n}\right) + \frac{1}{2\sigma^2} S(\beta). \end{aligned}$$
Since the first term and the factor in front of $S(\beta)$ do not depend on β the minimum point of $S(\beta)$ will coincide with the maximum of the likelihood function.

Here we can see that finding the MLE is equivalent to using the curve fitting technique described in Section 1.2.3. In Section 1.2.4 we saw that solving this problem in the case when the model was linear with respect to its parameters was relatively straightforward and in Section 1.2.5 we saw that when the model was nonlinear with respect to its parameters the problem was considerably harder.

1.3.2 Quantile-Quantile plots

In the regression problem discussed in Section 1.3.1 it is assumed that the noise of the model follows some distribution; for the purpose of this thesis we only considered the case of normally distributed noise but the essential problem formulation is the same regardless of noise distribution. Testing this assumption can be done in different ways and in Section 4.4.1 we will demonstrate some cases where the assumption is not entirely true using a quantile-quantile plot (Q-Q plot). Q-Q plots are a common tool for graphically analysing how close sampled data is to a given distribution [273]. Suppose we have $n$ samples of a stochastic variable $X$ from an unknown distribution, and suppose we want to make a Q-Q plot that tests if $X$ belongs to a distribution with cumulative distribution function $F$. First we sort the samples in ascending order, $x_1 \le x_2 \le \ldots \le x_n$, and then choose what we expect to be the corresponding probability for each sample. A common approach is computing the $\frac{k - 0.5}{n}$-th quantile for sample $x_k$, $k = 1, \ldots, n$, i.e. finding $F^{-1}\left(\frac{k - 0.5}{n}\right)$. It is then expected that the points given by $\left(F^{-1}\left(\frac{k - 0.5}{n}\right), x_k\right)$ should mostly follow a straight line in the Q-Q plot apart from some random noise. If the residuals show some other pattern or some points lie very far from the line this indicates that the residuals would be better described by some other distribution or that there is a significant number of outliers.
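A minimal sketch of this construction (illustrative only, assuming NumPy and SciPy; the sample data is generated here purely for demonstration) computes the theoretical normal quantiles $F^{-1}((k - 0.5)/n)$ and pairs them with the sorted samples:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
samples = rng.normal(loc=2.0, scale=1.5, size=200)   # data to be tested

# Sorted samples paired with the theoretical quantiles F^{-1}((k - 0.5) / n)
x_sorted = np.sort(samples)
n = len(x_sorted)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = norm.ppf(probs)

# If the samples are (approximately) normal the points should lie close to a
# straight line; the slope and intercept estimate the scale and location.
slope, intercept = np.polyfit(theoretical, x_sorted, 1)
print(slope, intercept)
```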



There are many versions of this kind of tool and many alternatives for which quantiles to choose, see for example Table 2.1 in [273], but here we will only use the quantiles given above. This kind of tool does not provide a rigorous test; rather it is up to whoever analyses the sample to determine if the sample points are close enough to linear or not. These types of plots can also help identify another distribution that might be a more reasonable assumption. For example, if we use a Q-Q plot to compare to a normal distribution and the lower end of the curve turns downwards and the higher tail turns upwards, this indicates that the samples come from a relatively long-tailed distribution, while if the ends of the curve turn in the opposite directions this indicates a relatively short-tailed distribution [273].

1.3.3 The Akaike information criterion

When constructing a mathematical model of an observable process without describing the underlying causes of the process, i.e. a phenomenological model, there are many tools available for creating a model that can recreate a finite set of points with arbitrarily high precision, for example the interpolation methods described in Sections 1.2.1–1.2.2 or the least squares methods in Sections 1.2.3–1.2.6. Regardless of what method is chosen the accuracy (unless already exact) can be improved by adding more (free) parameters to the model (exactly how this is accomplished depends on the model). There is a well-known anecdote, see [77], where Freeman Dyson describes how Enrico Fermi dissuaded him from pursuing a research project in particle physics where the model had many free parameters by saying

I remember my friend Johnny von Neumann used to say, ’with four parameters I can fit an elephant, and with five I can make him wiggle his trunk’.

Sometimes, especially when working with data that has significant noise, you can have a model that is 'too accurate' in the sense that the model describes some of the noise as well as the underlying process in a curve fitting or regression problem. For an example see Figure 4.4 where some models designed to reproduce a certain pattern in data give unreasonable results when noise makes the pattern indistinct. This phenomenon is called overfitting and can cause problems with extrapolation based on, and interpretation of, the model. A common sign of overfitting is that the model gives unreasonable results for points not in the original data set, similar to Runge's phenomenon, illustrated in Figure 1.4. One way to detect possible overfitting is to compare the model to a similar model with fewer parameters and see if the improvement in accuracy is sufficiently large to warrant the use of the more complex model.


A formalised way of doing this is using the Akaike Information Criterion (AIC), originally developed in the 1970s [4, 5].

Definition 1.14. Let $g$ be a model of some data, $y$, with $k$ estimated parameters and let $\hat{L}(g|y)$ be the maximum value of the likelihood function for the model. Then the Akaike Information Criterion is given by
$$\mathrm{AIC}(g|y) = 2(k + 1) - 2\log\left(\hat{L}(g|y)\right).$$
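For least squares fitting with Gaussian noise the maximized log-likelihood can be expressed through the residual sum of squares, so the AIC is easy to compute once a model has been fitted. The sketch below (an informal example assuming NumPy; it uses the common formula $\mathrm{AIC} = 2(k+1) + n\ln(S/n)$, which differs from Definition 1.14 only by an additive constant shared by all candidate models) compares polynomial fits of different degrees to the same data:

```python
import numpy as np

def aic_least_squares(n, k, ssr):
    # AIC for a least squares fit with Gaussian noise, dropping the additive
    # constant n*(ln(2*pi) + 1) that is identical for all candidate models.
    return 2 * (k + 1) + n * np.log(ssr / n)

rng = np.random.default_rng(5)
x = np.linspace(0, 3, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, x.size)     # data with a linear trend

for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)
    ssr = np.sum((y - np.polyval(coeffs, x))**2)
    print(degree, aic_least_squares(len(x), degree + 1, ssr))
# The lowest AIC typically points to the linear model; the higher-degree fits
# reduce the residuals slightly but are penalised for their extra parameters.
```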

The AIC is used for comparing models to each other and a lower AIC indicates a model that describes the data better but without overfitting or needless complexity, essentially a model that contains more information. The key concept for explaining why the AIC works is the Kullback–Leibler divergence, also known as Kullback–Leibler information or relative entropy.

Definition 1.15. The Kullback–Leibler divergence of two probability distributions over the real numbers with probability density functions $f$ and $g$ is defined as
$$I(f, g) = \int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx.$$

Remark 1.8. There are definitions of the Kullback–Leibler divergence for discrete and multivariate distributions as well, but for the purposes of this thesis this is the only definition we will need.

The Kullback–Leibler divergence has two properties that make it useful for evaluating mathematical models.

Lemma 1.9. Let $f$ and $g$ be the probability density functions of two probability distributions. Then $I(f, g) \ge 0$, and $I(f, g) = 0$ if and only if $f = g$.

Proof. Since $f(x)$ and $g(x)$ are probability density functions they must be non-negative and thus $\frac{g(x)}{f(x)}$ must also be non-negative. Then the following inequality holds
$$\ln\left(\frac{g(x)}{f(x)}\right) \le \frac{g(x)}{f(x)} - 1.$$
If we multiply by $f(x)$ and integrate on both sides of the inequality, and note that $\ln\left(\frac{g(x)}{f(x)}\right) = -\ln\left(\frac{f(x)}{g(x)}\right)$, then we get
$$-I(f, g) = -\int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx \le \int_{-\infty}^{\infty} f(x)\left(\frac{g(x)}{f(x)} - 1\right) dx.$$
Since $f$ and $g$ are probability density functions we can conclude that
$$\int_{-\infty}^{\infty} f(x)\left(\frac{g(x)}{f(x)} - 1\right) dx = \int_{-\infty}^{\infty} g(x) - f(x)\, dx = \int_{-\infty}^{\infty} g(x)\, dx - \int_{-\infty}^{\infty} f(x)\, dx = 1 - 1 = 0$$


and thus
$$-I(f, g) = -\int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx \le 0 \iff I(f, g) \ge 0.$$

To prove that $I(f, g) = 0$ if and only if $f = g$ we can argue similarly as before and get
$$I(f, g) \ge \int_{-\infty}^{\infty} f(x) - g(x)\, dx$$
and since $f$ and $g$ are probability density functions the right hand side will only be zero if $f = g$.

A common interpretation of $I(f, g)$ is that the larger the Kullback–Leibler divergence is, the more information is lost by using $g$ as an approximation of $f$ [40, 164]. If we have a distribution with probability density function $f$ and a number of candidates for approximating this distribution that have probability density functions $g_1, g_2, \ldots, g_n$, then the best candidate for approximation would be the candidate with $I(f, g_k)$ closest to zero. Often this is not useful when trying to model a process based on observations since the true distribution is unknown. One solution to this is to estimate the Kullback–Leibler divergence from the true model. This is the main idea behind the AIC. Fully deriving the AIC is somewhat complicated, see for example [35], and here we will only give a short motivation (based on Section 7.2 in [40]). First we must consider the situation in which we can apply the AIC. We will have taken some set of observations from a stochastic variable $Y$ with an unknown probability density function $f$ and based on those constructed our model (using for example a curve fitting technique). Let us call the observations $y$ and the model we construct based on the data $g(\cdot|y)$. This will not necessarily give the best possible version of the model, so simply looking at $I(f, g(\cdot|y))$ can be misleading; it is better to consider the expected value of $I(f, g(\cdot|y))$ with respect to $y$, which has the property

$$E_Y[I(f, g(\cdot|Y))] > I(f, g(\cdot|y^*))$$
where $y^*$ is the set of observations that gives the best possible version of the candidate model. Since $f$ is unknown we cannot estimate this expected value directly but it can be rewritten as follows

$$E_Y[I(f, g(\cdot|Y))] = \int_{-\infty}^{\infty} f(x) \ln(f(x))\, dx - E_Y\left[\int_{-\infty}^{\infty} f(x) \ln(g(x|Y))\, dx\right]$$

$$= c - E_Y E_X[\ln(g(X|Y))]$$


where $X$ is an independent stochastic variable with the same distribution as $Y$. Since $c$ is a constant value that is independent of our candidate model we can ignore it and focus on $E_Y E_X[\ln(g(X|Y))]$. We saw in Section 1.3.1 that when we construct the model from the data we use the MLE for the parameters. If we denote the parameters given by the MLE by $\hat{\beta}$ then instead of $g(x|y)$ we can denote the chosen model by $g(\hat{\beta}; x)$. Thus
$$E_Y E_X[\ln(g(X|Y))] = E_{\hat{\beta}} E_X\left[\ln(g(\hat{\beta}; X))\right].$$
Using a Taylor expansion we can see that

$$\begin{aligned} \ln(g(\beta; X)) \approx {}& \ln(g(\hat{\beta}; X)) + \begin{pmatrix} \frac{\partial}{\partial \beta_1} \ln(g(\hat{\beta}; X)) \\ \vdots \\ \frac{\partial}{\partial \beta_n} \ln(g(\hat{\beta}; X)) \end{pmatrix}^{\!\top} \begin{pmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{pmatrix} \\ & + \frac{1}{2} \begin{pmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{pmatrix}^{\!\top} \left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n} \begin{pmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{pmatrix}. \end{aligned} \tag{24}$$

Since $\hat{\beta}$ was given by the MLE the first order derivatives will disappear, $\frac{\partial}{\partial \beta_i} \ln(g(\hat{\beta}; X)) = 0$. Taking the expectation of the remaining part of the expression with respect to $X$ gives
$$\begin{aligned} E_X[\ln(g(\beta; X))] \approx {}& E_X[\ln(g(\hat{\beta}; X))] \\ & + \frac{1}{2} \begin{pmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{pmatrix}^{\!\top} E_X\!\left[\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n}\right] \begin{pmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{pmatrix} \\ = {}& E_X[\ln(g(\hat{\beta}; X))] + T(\beta). \end{aligned}$$

Next we take the expectation with respect to $\hat{\beta}$ and get
$$E_{\hat{\beta}} E_X\left[\ln(g(\hat{\beta}; X))\right] \approx E_{\hat{\beta}}\left[E_X[\ln(g(\hat{\beta}; X))] + T(\beta)\right] = E_X[\ln(g(\hat{\beta}; X))] + E_{\hat{\beta}}[T(\beta)].$$
It can be shown that the first term is given by the maximum of the likelihood function, $E_X[\ln(g(\hat{\beta}; X))] = \hat{L}$, and that the second term is approximately equal to the number of free parameters, which is the number of parameters of the model plus one for the standard deviation of the noise, $E_{\hat{\beta}}[T(\beta)] \approx k + 1$. Combining this gives the expression for the AIC given in Definition 1.14 apart from the factor $-2$ which is used as a matter of convention [40].



Remark 1.9. For models with many parameters and small sample sizes it is recommended to add a second order correction to the AIC, called the AIC$_C$ [40, 129], which in the case of least squares fitting is given by the following expression
$$\mathrm{AIC}_C = \mathrm{AIC} + \frac{2(k + 1)(k + 2)}{n - k - 2}.$$
There are several other information criteria that could be used in a similar way to the AIC, for example the Takeuchi information criterion [270] or the Bayesian information criterion [254]. Here we will use the AIC since it is considered a reliable criterion that is simple to calculate [40, 164] and is asymptotically optimal for selecting the model with the least mean square errors [297].

1.4 D-optimal experiment design

For the class of linear non-weighted regression problems described in Section 1.3.1 minimizing the sum of the squares of the residuals gives the maximum-likelihood estimation of the parameters that specify the fitted function. This estimation naturally has a variance as well and minimizing this variance can be interpreted as improving the reliability of the fitted function by minimizing its sensitivity to noise in measurements. This minimization is usually done by choosing where to sample the data carefully. In other words, given the regression problem defined by $y_i = f(\beta; x_i) + \epsilon_i$ for $i = 1, \ldots, n$ with the same conditions on $f(\beta; x)$ and $\epsilon_i$ as in Section 1.3.1, we want to choose a design $\{x_i,\ i = 1, \ldots, n\}$ that minimizes the variance of the values predicted by the regression model. This is usually referred to as G-optimality. To give a proper definition of G-optimality we will need the concept of the Fisher information matrix. When motivating the expression for the AIC in Section 1.3.3 the matrix
$$\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n}$$
appeared, see expression (24). In that context we were interested in how much information was lost when the model $g$ was used instead of the data. If the model $g$ is the true distribution, twice differentiable, and has only one parameter, β, it is possible to describe how much information about the model is contained in the parameter using the Fisher information

$$I(\beta) = -E_X\left[\frac{\partial^2 \log(g(\beta; X))}{\partial \beta^2}\right].$$



Essentially this expression measures the probability of a particular outcome being observed for a known value of β, so if the Fisher information is only large near certain points it is easy to tell which parameter value is the true parameter value, and if the Fisher information does not have a clear peak it is difficult to estimate the correct value of β. When the model has several parameters the Fisher information is replaced by the Fisher information matrix.

Definition 1.16. For a finite design $x \in \mathcal{X} \subseteq \mathbb{R}^n$ the Fisher information matrix, $M$, is the matrix defined by

$$M(\beta) = -E_X\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n}.$$

Remark 1.10. The concept of information in the AIC and the concept of information here are two different but related concepts; for a discussion of this relation see Section 7.7.8 in [40].

There is a lot of literature on the Fisher information matrix, but in the context of the least squares problems discussed here we have a fairly simple expression for its elements, see [208] for details.

Lemma 1.10. For a finite design $x \in \mathcal{X} \subseteq \mathbb{R}^n$ the Fisher information matrix for the type of least squares fitting problem considered in this section can be computed by
$$M(x) = \sum_{i=1}^{n} f(x_i) f(x_i)^{\top}$$
where $f(x) = \begin{pmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \end{pmatrix}^{\top}$.

Definition 1.17 (The G-optimality criterion). A design ξ is said to be G-optimal if it minimizes the maximum variance of any predicted value

$$\operatorname{Var}(y(\xi)) = \min_{x_i,\ i=1,2,\ldots,n}\ \max_{x \in \mathcal{X}} \operatorname{Var}(y(x)) = \min_{z \in \mathcal{X}}\ \max_{x \in \mathcal{X}} f(x)^{\top} M(z) f(x).$$

The G-optimality condition was first introduced in [264] (the name G-optimality comes from later work by Kiefer and Wolfowitz where they describe several different types of optimal design using alphabetical letters [153], [154]) and is an example of a minimax criterion, since it minimizes the maximum variance of the values given by the regression model [208]. There are many kinds of optimality conditions related to G-optimality. One which is suitable for us to consider is D-optimality. This type of optimality was first introduced in [285] and instead of focusing on the variance of the predicted values of the model it minimizes the volume of the confidence ellipsoid for the parameters (for a given confidence level).



Definition 1.18 (The D-optimality criterion). A design ξ is said to be D-optimal if it maximizes the determinant of the Fisher information matrix
$$\det(M(\xi)) = \max_{x \in \mathcal{X}} \det(M(x)).$$
The D-optimal designs are often good designs with respect to other types of criteria (see for example [112] for a brief discussion on this) and are often practical to consider due to being invariant with respect to linear transformations of the parameters. A well-known theorem called the Kiefer–Wolfowitz equivalence theorem shows that under certain conditions G-optimality is equivalent to D-optimality.

Theorem 1.7 (Kiefer–Wolfowitz equivalence theorem). For any linear regression model with independent, uncorrelated errors and continuous and linearly independent basis functions $f_i(x)$ defined on a fixed compact topological space $\mathcal{X}$ there exists a D-optimal design and any D-optimal design is also G-optimal.

This equivalence theorem was originally proven in [155] but the formulation above is taken from [208]. Thus maximizing the determinant of the Fisher information matrix corresponds to minimizing the variance of the estimated β. Interpolation can be considered a special case of regression where the sum of the squares of the residuals can be reduced to zero. Thus we can speak of D-optimal design for interpolation as well; in fact optimal experiment design is often used to find the minimum number of points needed for a certain model. For a linear interpolation problem defined by the alternant matrix $A(f; x)$ the Fisher information matrix is $M(x) = A(f; x)^{\top} A(f; x)$ and since $A(f; x)$ is an $n \times n$ matrix $\det(M(x)) = \det(A(f; x)^{\top}) \det(A(f; x)) = \det(A(f; x))^2$. Thus the maximization of the determinant of the Fisher information matrix is equivalent to finding the extreme points of the determinant of an alternant matrix in some volume given by the set of possible designs. A standard case of this is polynomial interpolation where the $x$-values are in a limited interval, for instance $-1 \le x_i \le 1$ for $i = 1, 2, \ldots, n$. In this case the regression problem can be written as $V_n(x)^{\top} \beta = y$ where $V_n(x)$ is a Vandermonde matrix as defined in equation (1) and the constraints on the $x$-values mean that the volume we want to optimize over is a cube in $n$ dimensions. There are a number of classical results that describe how to find the D-optimal designs for weighted univariate polynomials with various efficiency functions, e.g. [87], and in Section 2.3.3 we will demonstrate one way to optimize the Vandermonde determinant over a cube. The shape of the volume to optimize the determinant in is given by constraints on the data points. For example, if there is a cost associated with each data point that increases quadratically with $x$ and there is a total budget, $C$, for the experiment that cannot be exceeded, the constraint on the


$x$-values becomes $x_1^2 + x_2^2 + \cdots + x_n^2 \leq C$ and the determinant needs to be optimized in a ball. In Chapter 2 we examine the optimization of the Vandermonde determinant over several different surfaces in several dimensions. In Section 3.3 we use a D-optimal design to improve the stability of an interpolation problem as an alternative to the non-linear fitting from Section 3.2. Note that while choosing a D-optimal design can give an approximation method that is more stable, since it minimizes the variance of the parameters, the approximating function can still be highly sensitive to changes in the parameters (the variance of the predicted values can be minimized and still be high), so it does not necessarily maximize stability or prevent instability phenomena similar to Runge's phenomenon for polynomial interpolation.
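To make the criterion concrete, the following minimal Python sketch (not part of the thesis) evaluates $\det(M(\mathbf{x})) = \det(V_n(\mathbf{x}))^2$ for two candidate interpolation designs of five nodes on $[-1, 1]$; larger values of the criterion correspond to better designs in the D-optimality sense. The particular node sets are only illustrative choices.

```python
import numpy as np

def d_criterion(x):
    """det(M(x)) = det(V^T V) = det(V)^2 for the Vandermonde matrix of the design x."""
    V = np.vander(x, increasing=True)   # rows are (1, x_i, x_i^2, ..., x_i^{n-1})
    return np.linalg.det(V.T @ V)

n = 5
equispaced = np.linspace(-1.0, 1.0, n)
# Chebyshev-Lobatto nodes, a common alternative set of interpolation points on [-1, 1]
lobatto = np.cos(np.pi * np.arange(n) / (n - 1))

print(f"equispaced design: det(M) = {d_criterion(equispaced):.4e}")
print(f"Lobatto design:    det(M) = {d_criterion(lobatto):.4e}")
```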

1.5 Electromagnetic compatibility and electrostatic discharge currents

There are many examples of electromagnetic phenomena that involve two objects influencing each other without touching. Almost everyone is familiar with magnets that attract or repel other objects, sparks that bridge physical gaps and radio waves that send messages across the globe. While this action-at-a-distance can be very useful it can also cause unintended interactions between different systems. This is usually referred to as electromagnetic disturbance or electromagnetic interference, and the field of electromagnetic compatibility (EMC) is the study and design of systems that are not susceptible to disturbances from other systems and do not cause interference with other systems or themselves [228, 290].

There are many possible causes of electromagnetic disturbance, from a multitude of sources. Examples of man-made sources are broadcasting and receiving devices, power generators and converters, power conversion and ignition systems for combustion engines, manufacturing equipment like ovens, saws, mills, welders, blenders and mixers, and other equipment such as fans, heaters, coolers, lights, computers, and instruments for measurement and control; examples of natural sources are atmospheric, solar and cosmic noise, static discharges and lightning [212].

Mathematical modelling is an important tool for EMC [212]. Computers have been used for electromagnetic analysis since the 1950s [115] and this rapidly became more useful and important over time [234]. In practice many different types of models and methods are used, all with their own advantages and disadvantages, and the design process often involves a combination of analytical and numerical techniques [94]. The sources of electromagnetic disturbances are not always well understood or cannot be well described, and deriving all parts of the model from first principles requires a combination of many different techniques, numerical, stochastic and analytical alike; see [76, 229] for examples. In practice it is often reasonable to use


phenomenological models reproducing typical patterns based on statistical data [45, 148].

Requirements for a product or system to be considered electromagnetically compatible can be found in standards such as the IEC 61000-4-2 [132] and IEC 62305-1 [133]. In several of these standards, approximations of typical currents for various phenomena are given and electromagnetic compatibility requirements are based on the effects of the system being exposed to these currents, such as the radiated electromagnetic fields. Ideally the descriptions of these currents should give an accurate description of the observed behaviour that the standard is based on, as well as being computationally efficient (since computer simulations replacing construction of prototypes can save both time and resources) and compatible with the mathematical tools that are commonly used in electromagnetic calculations, for instance Laplace and Fourier transforms. In this thesis we will discuss approximations of electrostatic discharge currents, either from a standard or based on experimental data. In Section 1.5.1 a review of models in the literature can be found and in Chapter 3 we propose a new function, the analytically extended function (AEF), for modelling these currents that has some advantages compared to the commonly used models and can be applied to many different cases, typically at the cost of some extra manual work in fitting the model.

Electrostatic discharge (ESD) is a common phenomenon where a sudden flow of electricity between two charged objects occurs; examples include sparks and lightning strikes. The main mechanism behind it is usually said to be contact electrification; this phenomenon is due to all materials occasionally emitting electrons, usually at a higher rate when they are heated. Typically the emission and absorption balance out, but since the rate of emission varies between different materials an imbalance can occur when two materials come sufficiently close to each other. When the materials are separated this charge imbalance might remain for some time; it can be restored by the charged objects slowly emitting electrons to the surrounding objects but in the right conditions, for example if the charged object comes near a conductive material with an opposite charge, the restoration of the charge balance can be very rapid, resulting in an electrostatic discharge. The reader is likely to be familiar with the case of two materials rubbing against each other building up a charge imbalance and one of the objects generating a spark when moved close to a metal object. This case is common since friction between objects typically means a larger contact area where charges can transfer, and movement is necessary for charge separation. For this reason this mechanism is often referred to as friction charging or the triboelectric effect. Contact charging can happen between any materials, including liquids and gases, and can also be affected by many other types of phenomena, such as ion transfer or energetic charged particles colliding with other objects [100]. Therefore the exact mechanisms behind electrostatic discharges


can be difficult to understand and describe, even when the circumstances under which an electrostatic discharge is likely are well known [195]. In this thesis we focus on two types of electrostatic discharge, lightning discharge and human-to-object (human-to-metal or human-to-human) discharge.

Lightning discharges can cause electromagnetic disturbances in three ways: by passing through a system directly, by passing through a nearby object which then radiates electrical fields that disturb the system, or by indirectly inducing transient currents in systems when the electrical field associated with a thundercloud disappears as the lightning discharge removes the charge imbalance between cloud and ground [45]. We discuss modelling of some lightning discharges from standards and experimental data in Section 3.2.

Electrostatic discharges from humans are very common and are typically just a nuisance, but they can damage sensitive electronics and can cause severe accidents, either by the shock from the discharge causing a human error or by directly causing gas or dust explosions [148, 195]. We discuss modelling of a simulated human-to-object electrostatic discharge in Section 3.3.

1.5.1 Electrostatic discharge modelling

A well-defined representation of real electrostatic discharge currents is needed in order to establish realistic requirements for ESD generators used in testing equipment and devices, as well as to provide and improve the repeatability of tests. It should be able to approximate the current for various test levels, test set-ups and procedures, and also for various ESD conditions such as approach speeds, types of electrodes, relative arc length, humidity, etc. A mathematical function is necessary for computer simulation of such phenomena, for verification of test generators and for improving standard waveshape definitions.

A number of current functions, mostly based on exponential functions, have been proposed in the literature to model ESD currents, [44, 95, 96, 142, 144, 152, 266, 278, 286, 287, 301, 302]. Here we will give a brief presentation of some of them, and in Section 3.1 we will propose an alternative function and a scheme for fitting it to a waveshape. A number of mathematical expressions have been introduced in the literature for the purpose of representing ESD currents, either the IEC 61000-4-2 Standard one [132], or experimentally measured ones, e.g. [95]. In this section we give an overview of the most commonly applied ESD current approximations.

A double-exponential function has been proposed by Cerri et al. [44] for representation of ESD currents for commercial simulators in the form
$$i(t) = I_1 e^{-\frac{t}{\tau_1}} - I_2 e^{-\frac{t}{\tau_2}},$$


this type of function is also applied in other types of engineering, see Section 3.1 for some examples. This model was also extended with a four-exponential version by Keenan and Rossi [152]:

$$i(t) = I_1\left(e^{-\frac{t}{\tau_1}} - e^{-\frac{t}{\tau_2}}\right) - I_2\left(e^{-\frac{t}{\tau_3}} - e^{-\frac{t}{\tau_4}}\right). \qquad (25)$$

The Pulse function was proposed in [89],

$$i(t) = I_0\left(1 - e^{-\frac{t}{\tau_1}}\right)^{p} e^{-\frac{t}{\tau_2}},$$
and has been used for representation of lightning discharge currents both in its single-term form [181] as well as in linear combinations of two [266], three or four Pulse functions [301]. The Heidler function [117] is one of the most commonly used functions for lightning discharge modelling,
$$i(t) = \frac{I_0}{\eta}\,\frac{\left(\frac{t}{\tau_1}\right)^{n}}{1 + \left(\frac{t}{\tau_1}\right)^{n}}\, e^{-\frac{t}{\tau_2}}.$$

Wang et al. [286] proposed an ESD model in the form of a sum of two Heidler functions:
$$i(t) = \frac{I_1}{\eta_1}\,\frac{\left(\frac{t}{\tau_1}\right)^{n}}{1 + \left(\frac{t}{\tau_1}\right)^{n}}\, e^{-\frac{t}{\tau_2}} + \frac{I_2}{\eta_2}\,\frac{\left(\frac{t}{\tau_3}\right)^{n}}{1 + \left(\frac{t}{\tau_3}\right)^{n}}\, e^{-\frac{t}{\tau_4}}, \qquad (26)$$
with
$$\eta_1 = \exp\left(-\frac{\tau_1}{\tau_2}\left(\frac{n\tau_2}{\tau_1}\right)^{1/n}\right) \quad \text{and} \quad \eta_2 = \exp\left(-\frac{\tau_3}{\tau_4}\left(\frac{n\tau_4}{\tau_3}\right)^{1/n}\right)$$
being the peak correction factors. The function has been used to fit different electrostatic discharge currents using different methods [95, 286, 302]. Berghe and Zutter [278] proposed an ESD current model constructed as a sum of two Gaussian functions in the form:
$$i(t) = A \exp\left(-\left(\frac{t - \tau_1}{\sigma_1}\right)^2\right) + B t \exp\left(-\left(\frac{t - \tau_2}{\sigma_2}\right)^2\right). \qquad (27)$$

The following approximation using exponential polynomials is presented in [287],
$$i(t) = A t e^{-Ct} + B t e^{-Dt}, \qquad (28)$$
and has been used for the design of simple electric circuits which can be used to simulate ESD currents. One of the most commonly used ESD standard currents is the IEC 61000-4-2 current that represents a typical electrostatic discharge generated by the human body [132]. In the IEC 61000-4-2 standard [132] this current


is given by a graphical representation, see Figure 3.10, together with some constraints, see page 150. In Figure 1.6 the models discussed in this section have been fitted to the graph given in the standard. The data from the standard is not included in this figure since some features, notably the initial delay visible in the standard, are not reproduced by any of the models. The different models give quite different quantitative behaviour in the region 2.5 − 25 ns. In Section 3.1 we propose a new scheme for modelling this type of function and in Section 3.3 we fit this model to the IEC 61000-4-2 standard current and some experimental data.

[Figure 1.6 shows curves for the models: Two Heidler [95], Two Heidler [302], Pulse binomial [266], Exponential polynomial [287], Two Gaussians [278] and Four exponential [152]; axes: $i(t)$ [A] versus $t$ [ns], for $0 \leq t \leq 55$ ns.]

Figure 1.6: Comparison of different functions representing the Standard ESD current waveshape for 4kV.

The model given in Section 3.1 is also fitted to both lightning discharge current from the standard and from measured data in Section 3.2.
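As a concrete illustration of how the functions listed above are evaluated in practice, the following Python sketch (not from the thesis) implements the Heidler function and the two-Heidler sum (26) with its peak correction factors; the parameter values are arbitrary placeholders, not fitted values from the standard or from any paper.

```python
import numpy as np

def heidler(t, I0, tau1, tau2, n):
    """Heidler function with its peak correction factor eta."""
    eta = np.exp(-(tau1 / tau2) * (n * tau2 / tau1) ** (1.0 / n))
    x = (t / tau1) ** n
    return (I0 / eta) * x / (1.0 + x) * np.exp(-t / tau2)

def two_heidler(t, I1, tau1, tau2, I2, tau3, tau4, n):
    """Sum of two Heidler functions, as in equation (26)."""
    return heidler(t, I1, tau1, tau2, n) + heidler(t, I2, tau3, tau4, n)

# Arbitrary illustrative parameters (times in ns, currents in A)
t = np.linspace(1e-3, 55.0, 500)
i = two_heidler(t, I1=15.0, tau1=1.0, tau2=2.0, I2=8.0, tau3=10.0, tau4=30.0, n=2)
print(f"peak current ~ {i.max():.2f} A at t ~ {t[np.argmax(i)]:.2f} ns")
```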


1.6 Modelling mortality rates

This section is based on Paper H

Understanding the probability of surviving to or beyond a certain age can be an important question for insurers, actuaries, demographers and policy makers. For some purposes a simple mathematical model can be desirable. Here we will discuss some basic mathematical concepts useful for this type of understanding.

Definition 1.19. Consider an individual whose current age is $x$ and whose remaining lifetime is denoted $T_x > 0$. Then the survival function, $S_x(\Delta x)$, is defined as
$$S_x(\Delta x) = \Pr[T_x > \Delta x].$$

It is typically assumed [71] that the remaining lifetime Tx obeys the relation

$$\Pr[T_x > \Delta x] = \Pr[T_0 > x + \Delta x \,|\, T_0 > x] = \frac{\Pr[T_0 > x + \Delta x]}{\Pr[T_0 > x]} \qquad (29)$$

or, phrased in terms of the survival function, $S_x(\Delta x) = \dfrac{S_0(x + \Delta x)}{S_0(x)}$. There are three conditions a survival function must satisfy [71] in order to have a reasonable interpretation in terms of lifespan:

• Only individuals with positive remaining lifetime are considered, thus an individual must survive at least 0 units of time, $S_x(0) = 1$.

• There are no immortal individuals, $\lim_{\Delta x \to \infty} S_x(\Delta x) = 0$.

• Since the definition only contains an upper bound on remaining lifetime, $S_x(\Delta x)$ must be non-increasing.

Here we will not work with the survival function directly; instead we will model the mortality rate.

Definition 1.20. The mortality rate, $\mu$, (also known as force of mortality, death rate or hazard rate) for an individual of age $x$ is defined as

$$\mu(x) = \lim_{dx \to 0^+} \frac{\Pr[T_0 \leq x + dx \,|\, T_0 > x]}{dx}. \qquad (30)$$
We can express the mortality rate using the survival function and vice versa using the following lemma.

Lemma 1.11. If $S_x(\Delta x)$ is a survival function whose derivative exists when $x$ and $\Delta x$ are both non-negative and $\mu(x)$ is the corresponding mortality rate, then
$$\mu(x) = -\frac{\dfrac{dS_0}{dx}}{S_0(x)}$$


and
$$S_x(\Delta x) = \exp\left(-\int_x^{x+\Delta x} \mu(t)\, dt\right).$$

Proof. Using (29) and that the derivative of $S_0(x)$ exists we can rewrite the definition of the mortality rate as
$$\mu(x) = \lim_{dx \to 0^+} \frac{1 - \Pr[T_x > dx]}{dx} = -\frac{\dfrac{dS_0}{dx}}{S_0(x)} = -\frac{d}{dx}\ln(S_0(x)).$$
Thus we have expressed the mortality rate in terms of the survival function, and using some calculus we can express the survival function in terms of the mortality rate. First note that if the derivative of $S_0(x)$ exists then
$$\mu(x) = -\frac{\dfrac{dS_0}{dx}}{S_0(x)} = -\frac{d}{dx}\ln(S_0(x))$$
and integrating on both sides gives

$$\int_x^{x+\Delta x} \mu(t)\, dt = -\left(\ln(S_x(\Delta x)) - \ln(S_x(0))\right)$$
and since $S_x(0) = 1$ then $\ln(S_x(0)) = 0$, so that

$$S_x(\Delta x) = \exp\left(-\int_x^{x+\Delta x} \mu(t)\, dt\right).$$
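As a sanity check of Lemma 1.11 (not part of the thesis), the following Python sketch uses a Gompertz-type mortality rate $\mu(x) = A e^{cx}$, for which the survival function is known in closed form, and compares it with the numerically integrated expression above; the parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

A, c = 1e-4, 0.09                     # arbitrary Gompertz parameters
mu = lambda x: A * np.exp(c * x)      # mortality rate mu(x)

def survival_closed_form(x, dx):
    # S_x(dx) = exp(-A/c * (e^{c(x+dx)} - e^{cx})) for the Gompertz rate
    return np.exp(-(A / c) * (np.exp(c * (x + dx)) - np.exp(c * x)))

def survival_from_integral(x, dx):
    # S_x(dx) = exp(-int_x^{x+dx} mu(t) dt), as in Lemma 1.11
    integral, _ = quad(mu, x, x + dx)
    return np.exp(-integral)

x, dx = 40.0, 25.0
print(survival_closed_form(x, dx), survival_from_integral(x, dx))  # the two values should agree
```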

In Chapter 4 we will apply these concepts to models of human lifespans in different countries and different years. Mortality rates are typically estimated from demographic data for a country using the central mortality rate, which is defined differently than the mortality rate given by (30). The central mortality rate of a group of individuals of age $x$ at time $t$ is denoted $m_{x,t}$ and defined as $m_{x,t} = \dfrac{d_x}{L_x}$, where $d_x$ is the number of deaths at age $x$ during some time period and $L_x$ is the average number of living individuals of age $x$ during that same interval. In this thesis we will only consider time intervals of one year, and thus the estimate of the central mortality rate $m_{x,t}$ is comparable to $\mu(x)$, so for any given year $t$ we can assume that $m_{x,t} \approx \mu(x)$.

When examining the mortality rate for developed countries there are three patterns that recur all over the world: an increased mortality rate for infants that decreases rapidly, in other words $\mu(x) \sim \frac{1}{x}$ for small $x$; exponential growth of the mortality rate for higher ages, $\mu(x) \sim e^{cx}$ for large $x$; and a 'hump' for young adults where the mortality rate first increases quickly and then remains constant or slowly decreases for some years. Some examples of mortality rates that demonstrate these patterns can be seen


in Figure 1.7. The typical explanation for the rapid decrease in mortality rate is that small children are sensitive to disease, disorders and accidents but become more resilient as they mature. The 'hump' for young adults is usually attributed to a lifestyle change: starting in their early to mid teens, individuals tend to become more independent and take more risks, especially young men. Sometimes this phenomenon is known as the accident hump since accidents (often vehicular accidents) are believed to explain a large part of the shape of the hump; e.g. in the USA in 2017 accidents accounted for approximately 40% of the deaths in the age range 15–35 [161]. The increase in mortality rate for higher ages is explained by the increased risk of health issues that follows naturally from aging. In some countries there is also a visible trend that the growth of the mortality rate starts to slow down for very high ages; whether this trend is generally present and at which age it should be taken into consideration is still being debated, see [14, 17, 88, 109, 177] for examples of varying views.

[Figure 1.7 shows panels for the USA, Sweden, Switzerland, Ukraine, Japan, Taiwan, Australia and Chile, all for 1992; each panel plots $\ln(m_{x,t})$ against age $x$ in years.]

Figure 1.7: Examples of central mortality rate curves for men demonstrating the typical patterns of rapidly decreasing mortality rate for very young ages followed by a 'hump' for young adults and a rapid increase for high ages.

In Section 4.2 an overview of models in the literature will be given and three new models introduced. The different models will then be compared to each other by fitting the models to the central mortality rate for men in various countries and computing the corresponding AIC values. It is not only a model's ability to reproduce observed patterns in data that determines its usefulness. Choosing the appropriate model depends on the intended application. For many applications it is not just desirable to understand what the mortality rate is now and how it has changed


historically, but also how it can be predicted. In Section 4.5 we will see what effect replacing historical data on mortality rates with values given by a few different fitted models has when using a forecasting method called the Lee–Carter method. The Lee–Carter forecasting method is described in the next section.

1.6.1 Lee–Carter method for forecasting

This section is based on Paper I

For many applications it is also important to be able to forecast how the mortality rate will change in the future. There are several methods of producing such forecasts, but the method proposed by Lee and Carter in 1992 [170] seems to be generally accepted, because it produces satisfactory fits and forecasts of mortality rates for various countries. Secondly, the structure of the Lee–Carter (L–C) method allows for easy computation of confidence intervals related to mortality projections. Lee and Carter developed their approach specifically for U.S. mortality data, 1900-1989, and forecasted (over a 50 year forecast horizon) 1990-2065. However, the method has now been applied to mortality data from many other countries and time periods, e.g. Chile [172], China [147], Japan [291], the seven most economically developed nations (G7) [275], India [46], the Nordic countries [162], Sri Lanka [1] and Thailand [298].

Lee and Carter assumed [170] that the central mortality rate for a given age changes as a log-normal random walk with drift,
$$\ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t}, \qquad (31)$$
where $m_{x,t}$ is the central mortality rate at age $x$ in year $t$, $a_x$ is the average pattern of mortality at age $x$, $b_x$ represents how mortality at each age varies when the general level of mortality changes, $k_t$ is a mortality index that captures the evolution of rates over time and $\varepsilon_{x,t}$ is an error term which causes the deviation of the model from the observed mortality rates, assumed to be normally distributed, $N(0, \sigma_t^2)$.

The parametrization given in (31) is not unique. For example, if we have a solution $a_x$, $b_x$ and $k_t$, then for any non-zero constant $c \in \mathbb{R}$ the parameters $a_x - cb_x$, $b_x$ and $k_t + c$ form another solution, and these transformations produce identical forecasts. In order to get a unique solution when fitting an L–C model, constraints must be imposed. The constraints can be chosen in different ways but here we will use the following: $b_x$ is constrained to sum to 1, $\sum_x b_x = 1$, and $k_t$ to sum to 0, $\sum_t k_t = 0$, which gives $a_x$ as the average over time of $\ln(m_{x,t})$,
$$a_x = \frac{1}{T}\sum_t \ln(m_{x,t}).$$


The parameters $a_x$, $b_x$ and the mortality indices $k_t$ are found as follows. Given a set of ages (or age ranges), $\{x_i,\ i = 1, \ldots, n\}$, and a set of years, $\{t_j,\ j = 1, \ldots, T\}$, first estimate $\hat{a}_x = \frac{1}{T}\sum_t \ln(m_{x,t})$. Then construct the matrix given by $Z_{ij} = \ln(m_{x_i,t_j}) - \hat{a}_{x_i}$. From the conditions imposed on $b_x$ and $k_t$ we now know that $Z_{ij} = b_{x_i} k_{t_j}$, and thereby the values of $b_x$ and $k_t$ can be found using the singular value decomposition (SVD) of $Z$. Finding the standard SVD $Z = USV^\top$ gives $\hat{b}_x$ as the first column of $U$, and $\hat{k}_t$ is given by the largest singular value multiplied by the first column of $V$.

Forecasting future mortality indices can be done in different ways, but in practice the random walk with drift (RWD) model for $\hat{k}_t$ is common because of its simplicity and straightforward interpretation, so we will also use the RWD model to estimate $\hat{k}_t$ by $\hat{k}_t = \hat{k}_{t-1} + \theta + \varepsilon_t$. In this specification, $\theta$ is the drift term, and $\hat{k}_t$ is forecast to decline linearly with increments of $\theta$, while deviations from this path, $\varepsilon_t$, are permanently incorporated in the trajectory. The drift term $\theta$ is estimated as below, which shows that $\hat{\theta}$ only depends on the first and last values of the $\hat{k}_t$ estimates,
$$\hat{\theta} = \frac{\hat{k}_T - \hat{k}_1}{T - 1}.$$

We can now forecast the mortality index with the formula $\hat{k}_{t+\Delta t} = \hat{k}_t + \theta\Delta t + \varepsilon_t$ and then predict the logarithm of the central mortality rate as $\ln(m_{x,t+\Delta t}) = \hat{a}_x + \hat{b}_x\hat{k}_{t+\Delta t}$. To accompany this prediction we also want a confidence interval for the forecast at time $\Delta t$. This can be done by computing a confidence interval for the mortality index: compute the standard deviation of the mortality indices compared with the RWD model and then multiply the result by the square root of $\Delta t$. Thus if we used the central mortality rate for $T$ different years we can, with confidence level $\alpha$, say that
$$\hat{k}_{t+\Delta t} - \lambda_{\frac{\alpha}{2}}\,\sigma\,\sqrt{\Delta t} \;\leq\; k_{t+\Delta t} \;\leq\; \hat{k}_{t+\Delta t} + \lambda_{\frac{\alpha}{2}}\,\sigma\,\sqrt{\Delta t},$$
where $\lambda_\beta$ is the inverse of the cumulative normal distribution function at $\beta$ and
$$\sigma = \left(\frac{1}{T-1}\sum_{t=1}^{T-1}\left(k_{t+1} - k_t - \hat{\theta}\right)^2\right)^{1/2}.$$
In Section 4.5 examples of forecasts are illustrated, see Figure 4.7 for central mortality rates and Figure 4.8 for mortality indices.

The L–C model is not without flaws; a common remark is that the assumptions on how the mortality rate changes are quite restrictive. It cannot capture age-specific changes of pattern, for example medical breakthroughs reducing a specific cause of death that is common in a certain age range [169]. It also often fails when applied to specific causes of mortality; for example, motor vehicle accidents showed a rising trend initially as the availability of motor vehicles increased, but over time this has decreased due to improved


safety of vehicles and roads as well as increased urbanisation [111]. The Lee–Carter model has also been criticized for giving age-profiles that evolve in implausible ways for long-run forecasts [111] as well as for extrapolation backwards in time [169]. It can also misrepresent the temporal dependence between age groups [210]. Several variants and extensions of the L–C approach that improve performance have been suggested, see [29, 57, 162, 169, 171, 176] for examples. These models extend the L–C approach by including additional period effects and in some cases cohort effects. In Section 4.5 we will fit a few models to central mortality rate data for several countries and then use the fitted models to produce values for the L–C method and examine the differences in the predictions based on the different sets of data.
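The following Python sketch (not from the thesis) illustrates the estimation and forecasting steps described above on a synthetic log-mortality matrix; the data are randomly generated placeholders, so the numbers themselves carry no demographic meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ages, T = 20, 30
# Synthetic log central mortality rates ln(m_{x,t}) (placeholder data)
log_m = (-6.0 + 0.05 * np.arange(n_ages)[:, None] - 0.01 * np.arange(T)[None, :]
         + 0.02 * rng.standard_normal((n_ages, T)))

a_hat = log_m.mean(axis=1)                    # a_x: average of ln(m_{x,t}) over time
Z = log_m - a_hat[:, None]                    # Z_ij = ln(m_{x_i,t_j}) - a_{x_i}
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
b_hat = U[:, 0]                               # first left singular vector
k_hat = S[0] * Vt[0, :]                       # largest singular value times first right singular vector
scale = b_hat.sum()                           # impose the constraint sum(b_x) = 1
b_hat, k_hat = b_hat / scale, k_hat * scale

theta = (k_hat[-1] - k_hat[0]) / (T - 1)      # drift of the random walk with drift
dt = 10
k_forecast = k_hat[-1] + theta * dt           # point forecast of the mortality index
log_m_forecast = a_hat + b_hat * k_forecast   # forecast of ln(m_{x,t+dt})
print(theta, k_forecast)
```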


1.7 Summaries of papers

Paper A [187]
This paper examines the extreme points of the Vandermonde determinant on the sphere in three or more dimensions. A few different ways to analyse the three-dimensional case are shown in Section 2.1.2, and a detailed description of the method used to solve the n-dimensional problem from [269] can be found in Section 2.2.1. The extreme points are given in terms of roots of rescaled Hermite polynomials. For dimensions three to seven explicit expressions are given and the results are visualized by using symmetries of the answers to project all the extreme points onto a two-dimensional surface, see Section 2.2.2. The thesis author contributed primarily to the derivation of some of the recursive properties of the Vandermonde determinant and its derivatives, and to a lesser extent to the visualisation aspects of the problem.

Paper B [186]
The Vandermonde determinant is optimized over the ellipsoid and cylinder in three dimensions, see Sections 2.1.4 and 2.1.5. Lagrange multipliers are used to find a system of polynomial equations which give the local extreme points. Using Gröbner bases and other techniques the extreme points are given either explicitly or as roots of univariate polynomials. The results are also presented visually for some special cases. The method is also extended to surfaces defined by homogeneous polynomials, see Section 2.1.6. The extreme points on the sphere defined by the p-norm (primarily p = 4) are also discussed. The thesis author primarily contributed to the examination of the ellipsoid, cylinder and surfaces defined by homogeneous polynomials.

Paper C [216]
The sphere in n dimensions with respect to a p-norm can be thought of as a surface defined implicitly by a univariate polynomial. Here it is shown that the extreme points of the Vandermonde determinant on a bounded surface defined by a univariate polynomial are given by the zeroes of the polynomial solution of a differential equation with polynomial coefficients. Expressions for polynomials whose roots give the coordinates of the extreme points are given for the cases of a surface given by a general first or second-degree polynomial, some higher degree monomials and cubes (Sections 2.3.1–2.3.3 and 2.3.6). Some results that can be used to reduce the dimension of the problem, but not solve it entirely, for even n and p are also discussed. The thesis author contributed by extending previous results to cubes and the general polynomials of low degree and, based on contributions from the other authors, he found how the Newton–Girard formulae can be used to compactly express and simplify the equation system corresponding to the case where the surface is a sphere defined by a p-norm for even n and p.


Paper D [217]
This paper reviews the role of the Vandermonde matrix in random matrix theory and shows how the problem of finding the extreme points of the probability distribution of the eigenvalues of a Wishart matrix can be rewritten as a problem of finding the extreme points of the Vandermonde determinant on a sphere with a radius given by the trace of the square of the Wishart matrix. The thesis author's contribution was showing that the extreme points of the probability distribution of the eigenvalues must lie on a sphere with a particular radius and how to use the properties of the Vandermonde determinant to find a polynomial whose roots give the coordinates of the extreme points of the probability distribution of the eigenvalues, see Section 2.3.7.

Paper E∗ [185]
This paper is a detailed description and derivation of some properties of the analytically extended function (AEF) and a scheme for how it can be used in approximation of lightning discharge currents, see Sections 3.1.1 and 3.2.2. Lightning discharge currents are classified in the IEC 62305-1 Standard into waveshapes representing important observed phenomena. These waveshapes are approximated with mathematical functions in order to be used in lightning discharge models for ensuring electromagnetic compatibility. A general framework for estimating the parameters of the AEF using the Marquardt least squares method (MLSM), for a waveform with an arbitrary (finite) number of peaks as well as for a given charge transfer and specific energy, is described, see Sections 1.2.6, 3.2 and 3.2.3. This framework is used to find parameters for some single-peak waveshapes and advantages and disadvantages of the approach are discussed, see Section 3.2.6. The thesis author contributed with the p-peak formulation of the AEF, modifications to the MLSM and basic software for fitting the AEF to data.

Paper F∗ [184]
In this paper it is examined how the analytically extended function (AEF) can be used to approximate multi-peaked lightning current waveforms. A general framework for estimating the parameters of the AEF using the Marquardt least squares method (MLSM) for a waveform with an arbitrary (finite) number of peaks is presented, see Section 3.2. This framework is used to find parameters for some waveforms, such as lightning currents from the IEC 62305-1 Standard and recorded lightning current data, see Section 3.2.6. The thesis author contributed with improved software for fitting the AEF to the more complicated waveforms (compared to Paper E).

∗The model and techniques in Papers E and F are applied to various waveforms in [144, 145, 188–190].


Paper G [191]
The multi-peaked analytically extended function (AEF) is used in this paper for representation of electrostatic discharge (ESD) currents. In order to minimize unstable behaviour and the number of free parameters the exponents of the AEF are chosen from an arithmetic sequence. The function is fitted by interpolating data chosen according to a D-optimal design. ESD current modelling is illustrated through two examples: an approximation of the IEC Standard 61000-4-2 waveshape, and a representation of some measured ESD current. The contents of this paper are presented in Section 3.3. The thesis author contributed with the derivation of the D-optimal design, motivating its use, as well as software for fitting the AEF to the example currents.

Paper H [192]
There are many models for the mortality rates for various years and countries. A phenomenon that complicates the modelling of human mortality rates is a rapid increase in mortality rate for young adults (in many developed countries this is especially pronounced at the age of 25). In this paper a model for mortality rates based on power-exponential functions is introduced and compared to empirical data for mortality rates from several countries and to other mathematical models for mortality rate. The thesis author's contribution is the formulation of the model and writing software for fitting the various models to empirical data and computing the Akaike Information Criterion to facilitate comparison between the models.

Paper I [33]
Mortality rate forecasting is important in actuarial science and demography. There are many models for mortality rates with different properties and varying complexity. In this paper several models are used to produce mortality rate listings by fitting the models to empirical data using non-linear least squares fitting. These listings are then used to forecast the mortality rate using the Lee–Carter method and the results for the different models are compared. The thesis author's contribution was assisting with writing software that computed the mortality rate listings as well as devising the method for comparing the reliability of the forecasts in a simple manner.


Chapter 2

Extreme points of the Vandermonde determinant

This chapter is based on Papers A, B, C, and D

Paper A
Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Extreme points of the Vandermonde determinant on the sphere and some limits involving the generalized Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4–6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper B
Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Optimization of the determinant of the Vandermonde matrix on the sphere and related surfaces. Methodology and Computing in Applied Probability, Volume 20, Issue 4, pages 1417–1428, 2018.

Paper C
Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Extreme points of the Vandermonde determinant on surfaces implicitly determined by a univariate polynomial. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4–6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper D
Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4–6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.


2.1 Extreme points of the Vandermonde determinant and related determinants on various surfaces in three dimensions

In this chapter we will discuss how to optimize the determinant of the Vandermonde matrix and some related determinants over various surfaces in three dimensions, and the results will be visualized.

2.1.1 Optimization of the generalized Vandermonde determinant in three dimensions

This section is based on Section 1.1 of Paper A

In this section we plot the values of the determinant
$$v_3(\mathbf{x}_3) = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1),$$
and also the generalized Vandermonde determinant $g_3(\mathbf{x}_3, \mathbf{a}_3)$ for three different choices of $\mathbf{a}_3$, over the unit sphere $x_1^2 + x_2^2 + x_3^2 = 1$ in $\mathbb{R}^3$. Our plots are over the unit sphere but the determinant exhibits the same general behavior over centered spheres of any radius. This follows directly from (1.4) and the fact that exactly one element from each row appears in each term of the determinant. For any scalar $c$ we get

$$g_n(c\mathbf{x}_n, \mathbf{a}_n) = c^{\left(\sum_{i=1}^{n} a_i\right)} g_n(\mathbf{x}_n, \mathbf{a}_n),$$
which for $v_n$ becomes
$$v_n(c\mathbf{x}_n) = c^{\frac{n(n-1)}{2}} v_n(\mathbf{x}_n), \qquad (32)$$
and so the values over different radii differ only by a constant factor. In Figure 2.1 the value of $v_3(\mathbf{x}_3)$ has been plotted over the unit sphere and the curves where the determinant vanishes are traced as black lines. The coordinates in Figure 2.1 (b) are related to $\mathbf{x}_3$ by
$$\mathbf{x}_3 = \begin{pmatrix} 2 & 0 & 1 \\ -1 & 1 & 1 \\ -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 1/\sqrt{6} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{3} \end{pmatrix}\mathbf{t}, \qquad (33)$$
where the columns in the product of the two matrices are the basis vectors in $\mathbb{R}^3$. The unit sphere in $\mathbb{R}^3$ can also be described using spherical coordinates. In Figure 2.1 (c) the following parametrization was used.

$$\mathbf{t}(\theta, \phi) = \begin{pmatrix} \cos(\phi)\sin(\theta) \\ \sin(\phi) \\ \cos(\phi)\cos(\theta) \end{pmatrix}. \qquad (34)$$


(a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to the parametrization (34).

Figure 2.1: Plot of v3(x3) over the unit sphere.

We will use this t-basis and spherical parametrization throughout this section. From the plots in Figure 2.1 it can be seen that the number of extreme points for $v_3$ over the unit sphere seems to be $6 = 3!$. It can also be seen that all extreme points seem to lie in the plane through the origin that is orthogonal to an apparent symmetry axis in the direction $(1, 1, 1)$, the direction of $\mathbf{t}_3$. We will see later that the extreme points for $v_n$ indeed lie in the hyperplane $\sum_{i=1}^{n} x_i = 0$ for all $n$, see Theorem 2.2, and that the total number of extreme points for $v_n$ equals $n!$, see Remark 2.1.

The black lines where $v_3(\mathbf{x}_3)$ vanishes are actually the intersections between the sphere and the three planes $x_3 - x_1 = 0$, $x_3 - x_2 = 0$ and $x_2 - x_1 = 0$, as these differences appear as factors in $v_3(\mathbf{x}_3)$. We will see later on that the extreme points are the six points acquired by permuting the coordinates in
$$\mathbf{x}_3 = \frac{1}{\sqrt{2}}(-1, 0, 1).$$
For reasons that will become clear in Section 2.2.1 it is also useful to think of these coordinates as the roots of the polynomial
$$P_3(x) = x^3 - \frac{1}{2}x.$$
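A minimal Python sketch (not part of the thesis) of how plots like Figure 2.1 (c) can be produced: evaluate $v_3$ on a grid in the spherical parametrization (34) and colour the parameter plane by the value of the determinant.

```python
import numpy as np
import matplotlib.pyplot as plt

def v3(x1, x2, x3):
    """Vandermonde determinant in three variables."""
    return (x3 - x2) * (x3 - x1) * (x2 - x1)

theta, phi = np.meshgrid(np.linspace(-np.pi, np.pi, 400),
                         np.linspace(-np.pi / 2, np.pi / 2, 200))
# Parametrization (34) of the unit sphere
x1 = np.cos(phi) * np.sin(theta)
x2 = np.sin(phi)
x3 = np.cos(phi) * np.cos(theta)

plt.pcolormesh(theta, phi, v3(x1, x2, x3), shading="auto", cmap="coolwarm")
plt.xlabel(r"$\theta$"); plt.ylabel(r"$\phi$")
plt.colorbar(label=r"$v_3$")
plt.show()
```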

So far we have only considered the behavior of $v_3(\mathbf{x}_3)$, that is $g_3(\mathbf{x}_3, \mathbf{a}_3)$ with $\mathbf{a}_3 = (0, 1, 2)$. We now consider three generalized Vandermonde determinants, namely $g_3$ with $\mathbf{a}_3 = (0, 1, 3)$, $\mathbf{a}_3 = (0, 2, 3)$ and $\mathbf{a}_3 = (1, 2, 3)$. These three determinants show increasingly more structure and they all have a neat formula in terms of $v_3$ and the elementary symmetric polynomials
$$e_{kn} = e_k(x_1, \cdots, x_n) = \sum_{1 \leq i_1 < i_2 < \cdots < i_k \leq n} x_{i_1} x_{i_2} \cdots x_{i_k},$$


(a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to the angles given in (34).

Figure 2.2: Plot of g3(x3, (0, 1, 3)) over the unit sphere.

where we will simply use ek whenever n is clear from the context. In Figure 2.2 we see the determinant

$$g_3(\mathbf{x}_3, (0, 1, 3)) = \begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ x_1^3 & x_2^3 & x_3^3 \end{vmatrix} = v_3(\mathbf{x}_3)e_1,$$
plotted over the unit sphere. The expression $v_3(\mathbf{x}_3)e_1$ is easy to derive: the factor $v_3(\mathbf{x}_3)$ is there since the determinant must vanish whenever any two columns are equal, which is exactly what the Vandermonde determinant expresses, and the factor $e_1$ follows by a simple polynomial division. As can be seen in the plots we have an extra black circle where the determinant vanishes compared to Figure 2.1. This circle lies in the plane $e_1 = x_1 + x_2 + x_3 = 0$ where we previously found the extreme points of $v_3(\mathbf{x}_3)$, and it thus doubles the number of extreme points to $2 \cdot 3!$. A similar treatment can be made of the remaining two generalized determinants that we are interested in, plotted in the following two figures.

(a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to the angles given in (34).

Figure 2.3: Plot of g3(x3, (0, 2, 3)) over the unit sphere.


(a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to the angles given in (34).

Figure 2.4: Plot of $g_3(\mathbf{x}_3, (1, 2, 3))$ over the unit sphere.

$\mathbf{a}_3$ | $g_3(\mathbf{x}_3, \mathbf{a}_3)$
$(0, 1, 2)$ | $v_3(\mathbf{x}_3)e_0 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)$
$(0, 1, 3)$ | $v_3(\mathbf{x}_3)e_1 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)(x_1 + x_2 + x_3)$
$(0, 2, 3)$ | $v_3(\mathbf{x}_3)e_2 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)(x_1x_2 + x_1x_3 + x_2x_3)$
$(1, 2, 3)$ | $v_3(\mathbf{x}_3)e_3 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)\,x_1x_2x_3$

Table 2.1: Table of some determinants of generalized Vandermonde matrices.

The four determinants treated so far are collected in Table 2.1. Derivation of these determinants is straightforward. We note that all but one of them vanish on a set of planes through the origin. For $\mathbf{a}_3 = (0, 2, 3)$ we have the usual Vandermonde planes, but the intersection of $e_2 = 0$ and the unit sphere occurs at two circles:
$$x_1x_2 + x_1x_3 + x_2x_3 = \frac{1}{2}\left((x_1 + x_2 + x_3)^2 - (x_1^2 + x_2^2 + x_3^2)\right) = \frac{1}{2}\left((x_1 + x_2 + x_3)^2 - 1\right) = \frac{1}{2}(x_1 + x_2 + x_3 + 1)(x_1 + x_2 + x_3 - 1),$$
and so $g_3(\mathbf{x}_3, (0, 2, 3))$ vanishes on the sphere on two circles lying in the planes $x_1 + x_2 + x_3 + 1 = 0$ and $x_1 + x_2 + x_3 - 1 = 0$. These circles can be seen in Figure 2.3 as the two black circles perpendicular to the direction $(1, 1, 1)$. Note also that while $v_3$ and $g_3(\mathbf{x}_3, (0, 1, 3))$ have the same absolute value at all their respective local extreme points (by symmetry), both $g_3(\mathbf{x}_3, (0, 2, 3))$ and $g_3(\mathbf{x}_3, (1, 2, 3))$ have different absolute values at some of their respective extreme points.
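The factorizations in Table 2.1 are easy to check with a computer algebra system; the following SymPy sketch (not from the thesis) verifies the case $\mathbf{a}_3 = (0, 1, 3)$.

```python
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
v3 = (x3 - x2) * (x3 - x1) * (x2 - x1)
e1 = x1 + x2 + x3

# Generalized Vandermonde matrix with exponents a3 = (0, 1, 3)
G = sp.Matrix([[1, 1, 1],
               [x1, x2, x3],
               [x1**3, x2**3, x3**3]])

print(sp.simplify(G.det() - v3 * e1))  # prints 0, confirming g3(x3, (0,1,3)) = v3 * e1
```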


2.1.2 Extreme points of the Vandermonde determinant on the three-dimensional unit sphere

This section is based on Section 2.2 of Paper A

It is fairly simple to describe $v_3(\mathbf{x}_3)$ on the circle that is formed by the intersection of the unit sphere and the plane $x_1 + x_2 + x_3 = 0$. Using Rodrigues' rotation formula to rotate a point, $\mathbf{x}$, around the axis $\frac{1}{\sqrt{3}}(1, 1, 1)$ by the angle $\theta$ will give the rotation matrix
$$R_\theta = \frac{1}{3}\begin{pmatrix} 2\cos(\theta) + 1 & 1 - \cos(\theta) - \sqrt{3}\sin(\theta) & 1 - \cos(\theta) + \sqrt{3}\sin(\theta) \\ 1 - \cos(\theta) + \sqrt{3}\sin(\theta) & 2\cos(\theta) + 1 & 1 - \cos(\theta) - \sqrt{3}\sin(\theta) \\ 1 - \cos(\theta) - \sqrt{3}\sin(\theta) & 1 - \cos(\theta) + \sqrt{3}\sin(\theta) & 2\cos(\theta) + 1 \end{pmatrix}.$$
A point which already lies on $S^2$ can then be rotated to any other point on $S^2$ by letting $R_\theta$ act on the point. Choosing the point $\mathbf{x} = \frac{1}{\sqrt{2}}(-1, 0, 1)$ gives the Vandermonde determinant a convenient form on the circle since
$$R_\theta\mathbf{x} = \frac{1}{\sqrt{6}}\begin{pmatrix} -\sqrt{3}\cos(\theta) + \sin(\theta) \\ -2\sin(\theta) \\ \sqrt{3}\cos(\theta) + \sin(\theta) \end{pmatrix},$$
which gives
$$v_3(R_\theta\mathbf{x}) = \frac{1}{6\sqrt{6}}\left(\sqrt{3}\cos(\theta) + 3\sin(\theta)\right)\left(2\sqrt{3}\cos(\theta)\right)\left(\sqrt{3}\cos(\theta) - 3\sin(\theta)\right) = \frac{1}{\sqrt{2}}\left(4\cos(\theta)^3 - 3\cos(\theta)\right) = \frac{1}{\sqrt{2}}\cos(3\theta).$$

Note that the final equality follows from $\cos(n\theta) = T_n(\cos(\theta))$ where $T_n$ is the $n$th Chebyshev polynomial of the first kind. From formula (55) it follows that $P_3(x) = T_3(x)$, but for higher dimensions the relationship between the Chebyshev polynomials and $P_n$ is not as simple. Finding the maximum points for $v_3(\mathbf{x}_3)$ in this form is simple. The Vandermonde determinant will be maximal when $3\theta = 2n\pi$ where $n$ is some integer. This gives three local maxima corresponding to $\theta_1 = 0$, $\theta_2 = \frac{2\pi}{3}$ and $\theta_3 = \frac{4\pi}{3}$. These points correspond to cyclic permutations of the coordinates of $\mathbf{x} = \frac{1}{\sqrt{2}}(-1, 0, 1)$. Analogously the minima for $v_3(\mathbf{x}_3)$ can be shown to be a transposition followed by a cyclic permutation of the coordinates of $\mathbf{x}$. Thus any permutation of the coordinates of $\mathbf{x}$ corresponds to a local extreme point, just as stated in Section 2.1.1.
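The identity $v_3(R_\theta\mathbf{x}) = \frac{1}{\sqrt{2}}\cos(3\theta)$ is easy to confirm numerically; the following Python sketch (not part of the thesis) applies Rodrigues' rotation about $(1,1,1)/\sqrt{3}$ to the point $(-1,0,1)/\sqrt{2}$ and compares the two sides.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """Rodrigues' rotation formula for a rotation by theta around a unit axis."""
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def v3(x):
    return (x[2] - x[1]) * (x[2] - x[0]) * (x[1] - x[0])

axis = np.ones(3) / np.sqrt(3)
x = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)
for theta in np.linspace(0, 2 * np.pi, 7):
    lhs = v3(rotation_matrix(axis, theta) @ x)
    rhs = np.cos(3 * theta) / np.sqrt(2)
    print(f"theta = {theta:5.2f}:  v3 = {lhs: .6f},  cos(3*theta)/sqrt(2) = {rhs: .6f}")
```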


2.1.3 Optimisation using Gröbner bases

This section is based on Section 4 of Paper B

In this section we will find the extreme points of the Vandermonde determinant on a few different surfaces. This will be done using Lagrange multipliers and Gröbner bases, but first we will make an observation about the Vandermonde determinant that will be useful later.

Lemma 2.1. The Vandermonde determinant is a homogeneous polynomial of degree $\frac{n(n-1)}{2}$.

Proof. Considering the expression for the Vandermonde determinant in Theorem 1.2, the number of factors of $v_n(\mathbf{x})$ is $\sum_{i=1}^{n}(i - 1) = \frac{n(n-1)}{2}$. Thus
$$v_n(c\mathbf{x}) = c^{\frac{n(n-1)}{2}} v_n(\mathbf{x}). \qquad (35)$$

Gröbner bases, together with algorithms to find them and algorithms for solving polynomial equations, are an important tool that arises in many applications. One such application is the optimization of polynomials over affine varieties through the method of Lagrange multipliers. We will here give some main points and an informal discussion of these methods as an introduction, and describe some notation.

Definition 2.1. ([60]) Let $f_1, \cdots, f_m$ be polynomials in $\mathbb{R}[x_1, \cdots, x_n]$. The affine variety $V(f_1, \cdots, f_m)$ defined by $f_1, \cdots, f_m$ is the set of all points $(x_1, \cdots, x_n) \in \mathbb{R}^n$ such that $f_i(x_1, \cdots, x_n) = 0$ for all $1 \leq i \leq m$.

When $n = 3$ we will sometimes use the variables $x, y, z$ instead of $x_1, x_2, x_3$. Affine varieties are thus the common zeros of a set of multivariate polynomials. Such a set of polynomials will generate a greater set of polynomials [60] by

$$\langle f_1, \cdots, f_m \rangle \equiv \left\{ \sum_{i=1}^{m} h_i f_i \;:\; h_1, \cdots, h_m \in \mathbb{R}[x_1, \cdots, x_n] \right\},$$
and this larger set will define the same variety. But it will also define an ideal (a set of polynomials that contains the zero polynomial, is closed under addition, and absorbs multiplication by any other polynomial) by $I(f_1, \cdots, f_m) = \langle f_1, \cdots, f_m \rangle$. A Gröbner basis for this ideal is then a finite set of polynomials $\{g_1, \cdots, g_k\}$ such that the ideal generated by the leading terms of the polynomials $g_1, \cdots, g_k$ is the same ideal as that generated by all the leading terms of polynomials in $I = \langle f_1, \cdots, f_m \rangle$.


In this paper we consider the optimization of the Vandermonde determinant $v_n(\mathbf{x})$ over surfaces defined by a polynomial equation of the form
$$s_n(x_1, \cdots, x_n;\, p;\, a_1, \cdots, a_n) \equiv \sum_{i=1}^{n} a_i|x_i|^p = 1, \qquad (36)$$
where we will select the constants $a_i$ and $p$ to get ellipsoids in three dimensions, cylinders in three dimensions, and spheres under the $p$-norm in $n$ dimensions. The cases of the ellipsoids and the cylinders are suitable for solution by Gröbner basis methods, but due to the existing symmetries for the spheres other methods are more suitable, as provided in Section 2.3.3.

From (35) and the convexity of the interior of the sets defined by (36), under a suitable choice of the constant $p$ and non-negative $a_i$, it is easy to see that the optimal value of $v_n$ on $\sum_{i=1}^{n} a_i|x_i|^p \leq 1$ will be attained on $\sum_{i=1}^{n} a_i|x_i|^p = 1$. And so, by the method of Lagrange multipliers, the minimal/maximal values of $v_n(x_1, \cdots, x_n)$ on $s_n(x_1, \cdots, x_n) \leq 1$ will be attained at points such that $\frac{\partial v_n}{\partial x_i} - \lambda\frac{\partial s_n}{\partial x_i} = 0$ for $1 \leq i \leq n$ and some constant $\lambda$, and $s_n(x_1, \cdots, x_n) - 1 = 0$, [243]. For $p = 2$ the resulting set of equations will form a set of polynomials in $\lambda, x_1, \cdots, x_n$. These polynomials will define an ideal over $\mathbb{R}[\lambda, x_1, \cdots, x_n]$, and by finding a Gröbner basis for this ideal we can use the especially nice properties of Gröbner bases to find analytical solutions to these problems, that is, to find roots of the polynomials in the computed basis.

2.1.4 Extreme points on the ellipsoid in three dimensions

This section is based on Section 5 of Paper B

In this section we will find the extreme points of the Vandermonde determinant on the three-dimensional ellipsoid given by
$$ax^2 + by^2 + cz^2 = 1 \qquad (37)$$
where $a > 0$, $b > 0$, $c > 0$. Using the method of Lagrange multipliers together with (37) and some rewriting gives that all stationary points of the Vandermonde determinant lie in the variety

$$V = V\Big(ax^2 + by^2 + cz^2 - 1,\;\; ax + by + cz,\;\; ax(z - x)(y - x) - by(z - y)(y - x) + cz(z - y)(z - x)\Big).$$


Computing a Gröbner basis for $V$ using the lexicographic order $x > y > z$ gives the following three basis polynomials:
$$g_1(z) = (a + b)(a - b)^2 - \left[4(a + b)^2(a + c)(b + c) + 3c^2(a^2 + ab + b^2) + 3c(a^3 + b^3)\right]z^2$$
$$\qquad + 3c(a + b + c)\left[4(a + b)(a + c)(b + c) + (a^2 + b^2)c + (a + b)c^2\right]z^4 - c^2(b + c)(a + c)(a + b + c)^2 z^6, \qquad (38)$$
$$g_2(y, z) = \left[2(a + b)^2(a + c)(b + c) + c(a^2 + 2b^2)(a + b + c) + 2bc^2(a + b)\right]z + q_1 z^5 - q_2 z^3 - b(a - b)(a + b)(a + b + 3c)y, \qquad (39)$$
$$g_3(x, z) = \left[2(a + b)^2(a + c)(b + c) + c(2a^2 + b^2)(a + b + c) + 2ac^2(a + b)\right]z - q_1 z^5 + q_2 z^3 - a(a - b)(a + b)(a + b + 3c)x, \qquad (40)$$
$$q_1 = 9c^2(b + c)(a + c)(a + b + c)^2,$$
$$q_2 = 3c(a + b + c)(3a^2b + 4a^2c + 3ab^2 + 6abc + 4ac^2 + 4b^2c + 4bc^2).$$
This basis was calculated using software for symbolic computation [200]. Since $g_1$ only depends on $z$, and $g_2$ and $g_3$ are first degree polynomials in $y$ and $x$ respectively, the stationary points can be found by finding the roots of $g_1$ and then calculating the corresponding $x$ and $y$ coordinates. A general formula can be found in this case (since $g_1$ only contains even powers of $z$ it can be treated as a third degree polynomial) but it is quite cumbersome and we will therefore not give it explicitly.
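The computation can be reproduced for concrete values of a, b, c with a computer algebra system. The following SymPy sketch (not from the thesis) sets up the Lagrange conditions for the ellipsoid $\frac{x^2}{9} + \frac{y^2}{4} + z^2 = 1$ of Figure 2.5 (the coefficient values are taken from the reconstructed caption and are assumptions) and computes a lexicographic Gröbner basis.

```python
import sympy as sp

x, y, z, lam = sp.symbols("x y z lambda")
a, b, c = sp.Rational(1, 9), sp.Rational(1, 4), 1   # ellipsoid coefficients (assumed, from Figure 2.5)

v3 = (z - y) * (z - x) * (y - x)
s = a * x**2 + b * y**2 + c * z**2 - 1

# Lagrange conditions grad(v3) = lambda * grad(s), together with the constraint s = 0
eqs = [sp.diff(v3, w) - lam * sp.diff(s, w) for w in (x, y, z)] + [s]
G = sp.groebner(eqs, lam, x, y, z, order="lex")
for g in G.exprs:
    print(g)
```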

Lemma 2.2. The extreme points of $v_3$ on an ellipsoid will have real coordinates.

Proof. The discriminant is a useful tool for determining how many real roots low-degree polynomials have. Following Irving [135], the discriminant, $\Delta(p)$, of a third degree polynomial $p(x) = c_0 + c_1x + c_2x^2 + c_3x^3$ is
$$\Delta = 18c_0c_1c_2c_3 - 4c_2^3c_0 + c_2^2c_1^2 - 4c_1^3c_3 - 27c_0^2c_3^2,$$
and if $\Delta(p)$ is non-negative then all roots will be real (but not necessarily distinct). Since the first basis polynomial $g_1$ only contains terms with even exponents and is of degree 6, the polynomial $\tilde{g}_1$ defined by $\tilde{g}_1(z^2) = g_1(z)$ will be a polynomial of degree 3 whose roots are the squares of the roots of $g_1$. Calculating the discriminant of $\tilde{g}_1$ gives
$$\Delta(\tilde{g}_1) = 9(a - b)^2(a + b + 3c)^2(a + b + c)^4\, abc^3\left[32(a^3b^2 + a^3c^2 + a^2b^3 + a^2c^3 + b^3c^2 + b^2c^3) + 61abc(a + b + c)^2\right].$$

Since $a$, $b$ and $c$ are all positive numbers it is clear that $\Delta(\tilde{g}_1)$ is non-negative. Furthermore, since $a$, $b$ and $c$ are positive numbers, all terms in $\tilde{g}_1$ with odd powers have negative coefficients and all terms with even powers have positive coefficients. Thus if $w < 0$ then $\tilde{g}_1(w) > 0$, and therefore all roots must be positive.


Figure 2.5: Illustration of the ellipsoid defined by $\frac{x^2}{9} + \frac{y^2}{4} + z^2 = 1$ with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left.

An illustration of an ellipsoid and the extreme points of the Vandermonde determinant on its surface is shown in Figure 2.5.

2.1.5 Extreme points on the cylinder in three dimensions

This section is based on Section 6 of Paper B

In this section we will examine the local extreme points on an infinitely long cylinder aligned with the x-axis in 3 dimensions. In this case we do not need to use Gröbner basis techniques since the problem can be reduced to a one-dimensional polynomial equation. The cylinder is defined by
$$by^2 + cz^2 = 1, \quad \text{where } b > 0,\ c > 0. \qquad (41)$$
Using the method of Lagrange multipliers gives the equation system
$$\frac{\partial v_3}{\partial x} = 0, \qquad \frac{\partial v_3}{\partial y} = 2\lambda b y, \qquad \frac{\partial v_3}{\partial z} = 2\lambda c z.$$
Taking the sum of the expressions gives
$$by + cz = 0 \;\Leftrightarrow\; y = -\frac{c}{b}z. \qquad (42)$$
Combining (41) and (42) gives
$$\left(\frac{c}{b} + 1\right)cz^2 = 1 \;\Rightarrow\; z = \pm\sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}} \;\Rightarrow\; y = \mp\sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}}.$$


Figure 2.6: Illustration of the cylinder defined by $\frac{16}{25}y^2 + z^2 = 1$ with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in cylindrical coordinates on the left.

Thus the plane defined by (42) intersects with the cylinder along the lines
$$\ell_1 = \left\{ \left(x,\; \sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}},\; -\sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}}\right) \;\middle|\; x \in \mathbb{R} \right\} = \{(x, r, -s) \,|\, x \in \mathbb{R}\},$$
$$\ell_2 = \left\{ \left(x,\; -\sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}},\; \sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}}\right) \;\middle|\; x \in \mathbb{R} \right\} = \{(x, -r, s) \,|\, x \in \mathbb{R}\}.$$

Finding the stationary points for $v_3$ along $\ell_1$:
$$v_3(x, r, -s) = -\left(x^2 + \frac{1}{\sqrt{b + c}}\left(\sqrt{\frac{b}{c}} - \sqrt{\frac{c}{b}}\right)x - \frac{1}{b + c}\right)(r + s),$$
$$\frac{\partial v_3}{\partial x}(x, r, -s) = -\left(2x + \frac{1}{\sqrt{b + c}}\left(\sqrt{\frac{b}{c}} - \sqrt{\frac{c}{b}}\right)\right)(r + s).$$

From this it follows that
$$\frac{\partial v_3}{\partial x}(x, r, -s) = 0 \;\Leftrightarrow\; x = \frac{1}{2\sqrt{b + c}}\left(\sqrt{\frac{c}{b}} - \sqrt{\frac{b}{c}}\right).$$

Thus
$$\mathbf{x}_1 = \frac{1}{\sqrt{b + c}}\left(\frac{1}{2}\left(\sqrt{\frac{c}{b}} - \sqrt{\frac{b}{c}}\right),\; \sqrt{\frac{c}{b}},\; -\sqrt{\frac{b}{c}}\right) \qquad (43)$$
is the only stationary point on $\ell_1$. It can similarly be shown that $\mathbf{x}_2 = -\mathbf{x}_1$ is the only stationary point on $\ell_2$. The location of these points on the cylinder is shown in Figure 2.6.
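A quick numerical cross-check of (43) (not part of the thesis): for a cylinder with b = 16/25 and c = 1 (the values that appear to be used in Figure 2.6, as reconstructed above), the point below should satisfy the constraint and make the gradient of v3 parallel to the gradient of the constraint.

```python
import numpy as np

b, c = 16 / 25, 1.0

def v3_grad(p):
    x, y, z = p
    return np.array([-(z - y) * ((y - x) + (z - x)),
                     (z - x) * ((z - y) - (y - x)),
                     (y - x) * ((z - x) + (z - y))])

# Stationary point x1 from equation (43)
x1 = (1 / np.sqrt(b + c)) * np.array([
    0.5 * (np.sqrt(c / b) - np.sqrt(b / c)),
    np.sqrt(c / b),
    -np.sqrt(b / c),
])

constraint = b * x1[1]**2 + c * x1[2]**2           # should equal 1
grad_v = v3_grad(x1)
grad_s = np.array([0.0, 2 * b * x1[1], 2 * c * x1[2]])
print(constraint, np.cross(grad_v, grad_s))        # 1 and approximately the zero vector
```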


2.1.6 Optimizing the Vandermonde determinant on a surface defined by a homogeneous polynomial

This section is based on Section 7 of Paper B

When using Lagrange multipliers it can be desirable not to have to consider the λ-parameter (the scaling between the gradient and the direction given by the constraint). We demonstrate a simple way to remove this parameter when the surface is defined by a homogeneous polynomial.

Lemma 2.3. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a homogeneous polynomial such that $g(c\mathbf{x}) = c^k g(\mathbf{x})$ with $k \neq \frac{n(n-1)}{2}$. If $g(\mathbf{x}) = 1$, $\mathbf{x} \in \mathbb{R}^n$, defines a continuous bounded surface then any point on the surface that is a stationary point for the Vandermonde determinant, $\mathbf{z} \in \mathbb{R}^n$, can be written as $\mathbf{z} = c\mathbf{y}$ where

$$\left.\frac{\partial v_n}{\partial x_i}\right|_{\mathbf{x}=\mathbf{y}} = \left.\frac{\partial g}{\partial x_i}\right|_{\mathbf{x}=\mathbf{y}}, \quad i \in \{1, 2, \ldots, n\} \qquad (44)$$

and $c = g(\mathbf{y})^{-\frac{1}{k}}$.

Proof. By the method of Lagrange multipliers the point $\mathbf{y} \in \{\mathbf{x} \in \mathbb{R}^n \,|\, g(\mathbf{x}) = 1\}$ is a stationary point for the Vandermonde determinant if

$$\left.\frac{\partial v_n}{\partial x_k}\right|_{\mathbf{x}=\mathbf{y}} = \lambda\left.\frac{\partial g}{\partial x_k}\right|_{\mathbf{x}=\mathbf{y}}, \quad k \in \{1, 2, \ldots, n\}$$
for some $\lambda \in \mathbb{R}$. The stationary points on the surface given by $g(c\mathbf{x}) = c^k$ are given by
$$c^{\frac{n(n-1)}{2}}\left.\frac{\partial v_n}{\partial x_k}\right|_{\mathbf{x}=\mathbf{y}} = c^k\lambda\left.\frac{\partial g}{\partial x_k}\right|_{\mathbf{x}=\mathbf{y}}, \quad k \in \{1, 2, \ldots, n\}$$
and if $c$ is chosen such that $\lambda = c^{\frac{n(1-n)}{2}}c^{k}$ then the stationary points are defined by
$$\frac{\partial v_n}{\partial x_k} = \frac{\partial g}{\partial x_k}, \quad k \in \{1, 2, \ldots, n\}.$$
Suppose that $\mathbf{y} \in \{\mathbf{x} \in \mathbb{R}^n \,|\, g(\mathbf{x}) = c^k\}$ is a stationary point for $v_n$; then the point given by $\mathbf{z} = c\mathbf{y}$ where $c = g(\mathbf{y})^{-\frac{1}{k}}$ will be a stationary point for the Vandermonde determinant and will lie on the surface defined by $g(\mathbf{x}) = 1$.

Lemma 2.4. If z is a stationary point for the Vandermonde determinant on the surface g(x) = 1 where g(x) is a homogeneous polynomial then −z is either a stationary point or does not lie on the surface.


Proof. Since $g(-\mathbf{x}) = (-1)^k g(\mathbf{x})$, where $(-1)^k$ is either 1 or −1, and $|v_n(\mathbf{x})| = |v_n(-\mathbf{x})|$ for any point, including $\mathbf{z}$ and the points in a neighbourhood around it, it follows that if $g(-\mathbf{x}) = g(\mathbf{x})$ then the stationary points are preserved, and otherwise the point will lie on the surface defined by $g(\mathbf{x}) = -1$ instead of $g(\mathbf{x}) = 1$.

A well-known example of homogeneous polynomials are the quadratic forms. If we let $g(\mathbf{x}) = \mathbf{x}^\top S\mathbf{x}$ then $g(\mathbf{x})$ is a quadratic form, which in turn is a homogeneous polynomial with $k = 2$. If $S$ is a positive definite matrix then $g(\mathbf{x}) = 1$ defines an ellipsoid. Here we will demonstrate the use of Lemma 2.3 to find the extreme points on a rotated ellipsoid. Consider the ellipsoid defined by
$$\frac{1}{9}x^2 + \frac{5}{8}y^2 + \frac{3}{4}yz + \frac{5}{8}z^2 = 1; \qquad (45)$$
then by Lemma 2.3 we can instead consider the points in the variety
$$V = V\left(-2xy + 2xz + y^2 - z^2 - \frac{2}{9}x,\;\; -x^2 + 2xy - 2yz + z^2 - \frac{5}{4}y - \frac{3}{4}z,\;\; x^2 - y^2 - 2xz + 2yz - \frac{3}{4}y - \frac{5}{4}z\right).$$

2 g1(z) = z(6z + 1)(260642z − 27436z + 697), 3 2 g2(y, z) = − 1138484256z − 127275604z + 16689841z + 6277879y, 3 2 g3(x, z) = 10246358304z + 1145480436z − 93707658z + 6277879x.

This system is not difficult to solve and the resulting points are: p0 = (0, 0, 0),  1 1 p = 0, , − , 1 6 6 √ √ √ ! 45 2 1 5 2 1 5 2 p = , − − , − , 2 361 19 722 19 722 √ √ √ ! 45 2 1 5 2 1 5 2 p = , − + , + . 3 361 19 722 19 722

The point p0 is an artifact of the rewrite and does not lie on any ellipsoid and can therefore be discarded. By Lemma 2.4 there are also three more

88 91

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Figure 2.7: Illustration of the ellipsoid defined by (45) with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left.

stationary points p4 = −p1, p5 = −p2 and p6 = −p3. Rescaling each of p these points according to Lemma 2.3 gives qi = g(pi) which are all points on the ellipsoid defined by g(x) = 1. The result is illustrated in Figure 2.7. Note that this example gives a simple case with a Gr¨obnerbasis that is small and easy to find. Using this technique for other polynomials and in higher dimensions can require significant computational resources.

2.2 Extreme points of the Vandermonde determinant on the sphere

In this section we will consider the extreme points of the Vandermonde determinant on the n-dimensional unit sphere in $\mathbb{R}^n$. We want both to find an analytical solution and to identify some properties of the determinant that can help us to visualize it in some area around the extreme points in dimensions $n > 3$.

2.2.1 The extreme points on the sphere given by roots of a polynomial

This section is based on Section 2.1 of Paper A

The extreme points of the Vandermonde determinant on the unit sphere in $\mathbb{R}^n$ are known and given by Theorem 2.3, where we present a special case of Theorem 6.7.3 in [269]. We will also provide a proof that is more explicit than the one in [269] and that exposes more of the rich symmetric properties of the Vandermonde determinant. For the sake of convenience, some properties related to the extreme points of the Vandermonde determinant defined by real vectors $\mathbf{x}_n$ will be presented before Theorem 2.3.

89 92

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Theorem 2.1. For any 1 ≤ k ≤ n n ∂vn X vn(xn) = . (46) ∂x x − x k i=1 k i i6=k This theorem will be proven after introducing the following useful lemma: Lemma 2.5. For any 1 ≤ k ≤ n − 1 "n−1 # ∂vn vn(xn) Y ∂vn−1 = − + (x − x ) (47) ∂x x − x n i ∂x k n k i=1 k and n−1 ∂vn X vn(xn) = . (48) ∂x x − x n i=1 n i Proof. Note that the determinant can be described recursively "n−1 # Y Y vn(xn) = (xn − xi) (xj − xi) i=1 1≤i

90 93

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Supposing that formula (46) is true for n − 1 results in

"n−1 # n−1 ∂vn vn(xn) Y X vn−1(xn−1) = − + (x − x ) ∂x x − x n i x − x k n k i=1 i=1 k i i6=k n−1 n vn(xn) X vn(xn) X vn(xn) = + = . x − x x − x x − x k n i=1 k i i=1 k i i6=k i6=k Showing that (46) is true for n = 2 completes the proof

2 ∂v2 ∂ x2 − x1 X v2(x2) = (x − x ) = −1 = = ∂x ∂x 2 1 x − x x − x 1 1 1 2 i=1 1 i i6=1 2 ∂v2 ∂ x2 − x1 X v2(x2) = (x − x ) = 1 = = . ∂x ∂x 2 1 x − x x − x 2 2 2 1 i=1 2 i i6=2

Theorem 2.2. The extreme points of vn(xn) on the unit sphere can all be found in the hyperplane defined by n X xi = 0. (50) i=1 This theorem will be proved after the introduction of the following useful lemma:

Lemma 2.6. For any n ≥ 2 the sum of the partial derivatives of vn(xn) will be zero. n X ∂vn = 0. (51) ∂xk k=1 Proof. This lemma is easily proven using Lemma 2.5 and induction: n n−1 "n−1 # ! n−1 X ∂vn X vn(xn) Y ∂vn−1 X vn(xn) = − + (xn − xi) + ∂xk xn − xk ∂xk xn − xi k=1 k=1 i=1 i=1 "n−1 # n−1 Y X ∂vn−1 = (xn − xi) . ∂xk i=1 k=1 Thus if equation (51) is true for n − 1 it is also true for n. Showing that the equation holds for n = 2 is very simple ∂v ∂v 2 + 2 = −1 + 1 = 0. ∂x1 ∂x2

91 94

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Proof of Theorem 2.2. Using the method of Lagrange multipliers it follows that any xn on the unit sphere that is an extreme point of the Vandermonde determinant will also be a stationary point for the Lagrange function

n ! X 2 Λn(xn, λ) = v(xn) − λ xi − 1 i=1 for some λ. Explicitly this requirement becomes

∂Λ n = 0 for all 1 ≤ k ≤ n, (52) ∂xk ∂Λ n = 0. (53) ∂λ

Equation (53) corresponds to the restriction to the unit sphere and is there- fore immediately satisfied. Since all the partial derivatives of the Lagrange function should be equal to zero it is obvious that the sum of the partial derivatives will also be equal to zero. Combining this with Lemma 2.6 gives n n   n X ∂Λn X ∂vn X = − 2λxk = −2λ xk = 0. (54) ∂xk ∂xk k=1 k=1 k=1 n X There are two ways to satisfy condition (54) either λ = 0 or xk = 0. k=1 When λ = 0 equation (52) reduces to

∂v n = 0 for all 1 ≤ k ≤ n, ∂xk and by equation (32) this can only be true if vn(xn) = 0, which is of no n X interest to us, and so all extreme points must lie in the hyperplane xk = k=1 0.

n Theorem 2.3. A point on the unit sphere in R , xn = (x1, x2, . . . xn), is an extreme point of the Vandermonde determinant if and only if all xi, i ∈ {1, 2, . . . n}, are distinct roots of the rescaled Hermite polynomial

r ! − n n(n − 1) P (x) = (2n(n − 1)) 2 H x . (55) n n 2

92 95

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Remark 2.1. Note that if xn = (x1, x2, . . . xn) is an extreme point of the Vandermonde determinant then any other point whose coordinates are a permutation of the coordinates of xn is also an extreme point. This follows from the determinant function being, by definition, alternating with respect to the columns of the matrix and the xis defines the columns of the Vander- monde matrix. Thus any permutation of the xis will give the same value for |vn(xn)|. Since there are n! permutations there will be at least n! extreme points. The roots of the polynomial (55) define the set of xis fully and thus there are exactly n! extreme points, n!/2 positive and n!/2 negative.

Remark 2.2. All terms in Pn(x) are of even order if n is even and of odd order when n is odd. This means that the roots of Pn(x) will be symmetrical in the sense that if xi is a root then −xi is also a root.

Proof of Theorem 2.3. By the method of Lagrange multipliers condition (52) must be satisfied for any extreme point. If xn is a fixed extreme point so that vn(xn) = vmax, then (52) can be written explicitly, using (46), as n ∂Λn X vmax = − 2λx = 0 for all 1 ≤ k ≤ n, ∂x x − x k k i=1 k i i6=k or alternatively by introducing a new multiplier ρ as n X 1 2λ ρ = x = x for all 1 ≤ k ≤ n. (56) x − x v k n k i=1 k i max i6=k

By forming the polynomial f(x) = (x − x1)(x − x2) ··· (x − xn) and noting that n n n 0 X Y Y f (xk) = (x − xi) = (xk − xi), j=1 i=1 x=xk i=1 i6=j i6=k n n n n n n n 00 X X Y X Y X Y f (xk) = (x − xi) = (xk − xi) + (xk − xi) l=1 j=1 i=1 x=xk j=1 i=1 l=1 i=1 j6=l i6=j j6=k i6=j l6=k i6=l i6=l i6=k i6=k n n X Y = 2 (xk − xi), j=1 i=1 j6=k i6=j i6=k

93 96

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions we can rewrite (56) as

00 1 f (xk) ρ 0 = xk, 2 f (xk) n or 2ρ f 00(x ) − x f 0(x ) = 0. k n k k And since the last equation must vanish for all k we must have

2ρ f 00(x) − xf 0(x) = cf(x), (57) n for some constant c. To find c the xn-terms of the right and left part of equation (57) are compared to each other,

2ρ c · c xn = − xnc xn−1 = −2ρ · c xn ⇒ c = −2ρ. n n n n Thus the following differential equation for f(x) must be satisfied

2ρ f 00(x) − xf 0(x) + 2ρf(x) = 0. (58) n Choosing x = az gives

2ρ f 00(az) − a2zf 0(az) + 2ρf(az) (n − 1) 1 d2f 2ρ 1 df = (az) − az (az) + 2ρf(az) = 0. a2 dz2 n a dz q n By setting g(z) = f(az) and choosing a = ρ a differential equation that matches the definition for the Hermite polynomials is found: g00(z) − 2zg0(z) + 2ng(z) = 0. (59)

By definition the solution to (59) is g(z) = bHn(z) where b is a constant. An exact expression for the constant a can be found using Lemma 2.7 (for the sake of convenience the lemma is stated and proved after this theorem). We get n n X X n(n − 1) x2 = a2z2 = 1 ⇒ a2 = 1, i i 2 i=1 i=1 and so s 2 a = . n(n − 1)

94 97

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Thus condition (52) is satisfied when xi are the roots of r ! n(n − 1) P (x) = bH (z) = bH x . n n n 2

− n Choosing b = (2n(n − 1)) 2 gives Pn(x) with leading coefficient 1. This can be confirmed by calculating the leading coefficient of P (x) using the explicit expression for the Hermite polynomial (61). This completes the proof.

Lemma 2.7. Let xi, i = 1, 2, . . . , n be roots of the Hermite polynomial Hn(x). Then n X n(n − 1) x2 = . i 2 i=1

Proof. By letting ek(x1, . . . xn) denote the elementary symmetric polynomi- als Hn(x) can be written as

Hn(x) = An(x − x1) ··· (x − xn) n n−1 n−2 = An(x − e1(x1, . . . , xn)x + e2(x1, . . . , xn)x + q(x)) where q(x) is a polynomial of degree n − 3. Noting that n X 2 2 X xi = (x1 + ... + xn) − 2 xixj i=1 1≤i

Comparing the coefficients in the two expressions for Hn(x) gives n An = 2 ,

Ane1(x1, . . . , xn) = 0, n−2 Ane2(x1, . . . , xn) = −n(n − 1)2 . Thus by (60) n X n(n − 1) x2 = . i 2 i=1

95 98

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

k Theorem 2.4. The coefficients, ak, for the term x in Pn(x) given by (55) are given by the following relations 1 a = 1, a = 0, a = , n n−1 n−2 2 (k + 1)(k + 2) a = − a , 1 ≤ k ≤ n − 3. (62) k n(n − 1)(n − k) k+2

Proof. Equation (58) tells us that

1 1 P (x) = P 00(x) − xP 0 (x). (63) n 2ρ n n n

That an = 1 follows from the definition of Pn and an−1 = 0 follows from the Hermite polynomials only having terms of odd powers when n is odd and 1 even powers when n is even. That an−2 = 2 can be easily shown using the definition of Pn and the explicit formula for the Hermite polynomials (61). The value of the ρ can be found by comparing the xn−2 terms in (63)

1 1 a = n(n − 1)a + (n − 2)a . n−2 2ρ n n n−2

From this follows 1 −1 = . 2ρ n2(n − 1) Comparing the xn−l terms in (63) gives the following relation

1 1 a = (n − l + 2)(n − l)a + (n − l)a n−l 2ρ n−l+2 n−l n which is equivalent to

−(n − l + 2)(n − l + 1) a = a . n−l n−l+2 ln2(n − 1)

Letting k = n − l gives (62).

2.2.2 Further visual exploration on the sphere This section is based on Section 2.4 of Paper A

Visualization of the determinant v3(x3) on the unit sphere is straightforward, as well as visualizations for g3(x3, a) for different a. In three dimensions all points on the sphere can be mapped to the plane. In higher dimensions we need to reduce the set of visualized points somehow. In this section we provide visualizations for v4, . . . , v7 by using symmetry properties of the Vandermonde determinant.

96 99

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Four dimensions

By Theorem 2.2 we know that the extreme points of v4(x4) on the sphere all lie in the hyperplane x1 + x2 + x3 + x4 = 0. The intersection of this 4 hyperplane with the unit sphere in R can be described as a unit sphere in 3 R , under a suitable basis, and can then be easily visualized. This can be realized using the transformation

−1 −1 0  √ 1/ 4 0 0  −1 1 0  √ x =    0 1/ 2 0  t. (64)  1 0 −1 √ 0 0 1/ 2 1 0 1

(a) Plot with t-basis given by (64). (b) Plot with θ and φ given by (34).

Figure 2.8: Plot of v4(x4) over points on the unit sphere.

The results of plotting the v4(x4) after performing this transformation can be seen in Figure 2.8. All 24 = 4! extreme points are clearly visible. From Figure 2.8 we see that whenever we have a local maxima we have a local maxima at the opposite side of the sphere as well, and the same for minima. This is due to the occurrence of the exponents in the rows of Vn. From equation (32) we have n(n−1) vn((−1)xn) = (−1) 2 vn(xn), and so opposite points are both maxima or both minima if n = 4k or + n = 4k + 1 for some k ∈ Z and opposite points are of different types + if n = 4k − 2 or n = 4k − 1 for some k ∈ Z . By Theorem 2.3 the extreme points on the unit sphere for v4(x4) is described by the roots of this polynomial 1 1 P (x) = x4 − x2 + . 4 2 48

97 100

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

The roots of P4(x) are: s s 1 r2 1 r2 x = − 1 + , x = − 1 − , 41 2 3 42 2 3 s s 1 r2 1 r2 x = 1 − , x = 1 + . 43 2 3 44 2 3

Five dimensions By Theorem 2.3 or 2.4 we see that the polynomials providing the coordinates of the extreme points have all even or all odd powers. From this it is easy to see that all coordinates of the extreme points must come in pairs xi, −xi. Furthermore, by Theorem 2.2 we know that the extreme points of v5(x5) on the sphere all lie in the hyperplane x1 + x2 + x3 + x4 + x5 = 0. 5 We use this to visualize v5(x5) by selecting a subspace of R that contains all points that have coordinates which are symmetrically placed on the real line, (x1, x2, 0, −x2, −x1). The coordinates in Figure 2.9 (a) are related to x5 by

−1 0 1  √   0 −1 1 1/ 2 0 0   √ x5 =  0 0 1  0 1/ 2 0  t. (65)   √  0 1 1 0 0 1/ 5 1 0 1

(a) Plot with t-basis given by (65). (b) Plot with θ and φ given by (34).

Figure 2.9: Plot of v5(x5) over points on the unit sphere.

The result, see Figure 2.9, is a visualization of a subspace containing 8 of the 120 extreme points. Note that to satisfy the condition that the

98 101

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE coordinates should be symmetrically distributed pairs can be fulfilled in two other subspaces with points that can be described in the following ways: (x1, x2, 0, −x1, −x2) and (x2, −x2, 0, x1, −x1). This means that a transfor- mation similar to (65) can describe 3 · 8 = 24 different extreme points.

The transformation (65) corresponds to choosing x3 = 0. Choosing 5 another coordinate to be zero will give a different subspace of R which behaves identically to the visualized one. This multiplies the number of extreme points by five to the expected 5 · 4! = 120. By Theorem 2.3 the extreme points on the unit sphere for v5(x5) is described by the roots of this polynomial 1 3 P (x) = x5 − x3 + x. 5 2 80

The roots of P5(x) are: x51 = −x55, x52 = −x54, x53 = 0, s s 1 r2 1 r2 x = 1 − , x = 1 + . 54 2 5 55 2 5

Six dimensions

As for v5(x5) we use symmetry to visualize v6(x6). We select a subspace of 6 R with all symmetrical points (x1, x2, x3, −x3, −x2, −x1) on the sphere. The coordinates in Figure 2.10 (a) are related to x6 by

−1 0 0   0 −1 0   √    1/ 2 0 0  0 0 −1 √ x6 =    0 1/ 2 0  t. (66)  0 0 1  √   0 0 1/ 2  0 1 0  1 0 0

In Figure 2.10 there are 48 visible extreme points. The remaining ex- treme points can be found using arguments analogous the five-dimensional case.

By Theorem 2.3 the extreme points on the unit sphere for v6(x6) is described by the roots of this polynomial

1 1 1 P (x) = x6 − x4 + x2 − . 6 2 20 1800

99 102

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

(a) Plot with t-basis given by (66). (b) Plot with θ and φ given by (34).

Figure 2.10: Plot of v6(x6) over points on the unit sphere.

The roots of P6(x) are: x61 = − x66, x62 = −x65, x63 = −x64, 3 1 4  √  1 1  2 (−1) 3 3 3 x64 = √ 10i − 10 z6w + z6w 2 15 6 6 1 r √ √  = √ 10 − 2 10 3l6 − k6 , (67) 2 15 1 1 4  √  1 1  2 (−1) 3 3 3 x65 = √ −10i − 10 z6w + z6w 2 15 6 6 1 r √ √  = √ 10 − 2 10 3l6 + k6 , (68) 2 15 1  1 √  1 1   2 x = 3 10 w 3 + w 3 + 5 66 30 6 6 r 1  √  = 2 10 · k + 5 , (69) 30 6 √ √ z6 = 3 + i, w6 = 2 + i 6 !! !! 1 r3 1 r3 k = cos arctan , l = sin arctan . 6 3 2 6 3 2

100 103

2.2. EXTREME POINTS OF THE VANDERMONDE DETERMINANT ON THE SPHERE

Seven dimensions

As for v6(x6) we use symmetry to visualize v7(x7). We select a subspace of 7 R that contains all symmetrical points (x1, x2, x3, 0, −x3, −x2, −x1) on the sphere.

The coordinates in Figure 2.11 (a) are related to x7 by

−1 0 0   0 −1 0     √   0 0 −1 1/ 2 0 0   √ x7 =  0 0 0   0 1/ 2 0  t. (70)   √  0 0 1  0 0 1/ 2    0 1 0  1 0 0

(a) Plot with t-basis given by (70). (b) Plot with θ and φ given by (34).

Figure 2.11: Plot of v7(x7) over points on the unit sphere.

In Figure 2.11 48 extreme points are visible just like it was for the six- dimensional case. This is expected since the transformation corresponds 7 to choosing x4 = 0 which restricts us to a six-dimensional subspace of R which can then be visualized in the same way as the six-dimensional case. The remaining extreme points can be found using arguments analogous the five-dimensional case.

By Theorem 2.3 the extreme points on the unit sphere for v4 is described by the roots of this polynomial

1 5 5 P (x) = x7 − x5 + x3 − x. 7 2 84 3528

101 104

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

The roots of P7(x) are: x71 = − x77, x72 = −x76, x73 = −x75, x74 = 0, 3 1 4  √  1 1  2 (−1) 3 3 3 x75 = √ 14i − 14 z6w + z6w 2 21 6 6 1 r √ √  = √ 14 − 2 14 3l6 − k6 , (71) 2 21 1 1 4  √  1 1  2 (−1) 3 3 3 x76 = √ −14i − 14 z6w + z6w 2 21 6 6 1 r √ √  = √ 14 − 2 14 3l7 + k7 , (72) 2 21 r 1 1 √  1 1   2 x = 3 14 w 3 + w 3 + 5 77 42 6 6 r 1  √  = 2 14k + 5 , (73) 42 7 √ √ z6 = 3 + i, w6 = 2 + i 10 !! 1 r5 k = cos arctan , 7 3 2 !! 1 r5 l = sin arctan . 7 3 2

102 105

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

2.3 Extreme points of the Vandermonde determi- nant on some surfaces implicitly defined by a univariate polynomial

This section is based on Paper C

In this section the objective is to find the extreme points of the Vander- monde determinant on a surface implicitly defined by n m X X i gR(x) = R(xi) = 0, where R(x) = rix , ri ∈ R. (74) i=1 i=0

Lemma 2.8. The problem of finding the extreme points of the Vandermonde determinant on the surface defined by gR(x) = 0 can be rewritten as an ordinary differential equation of the form f 00(x) − 2ρR0(x)f 0(x) − P (x)f(x) = 0 (75) that has a unique (up to a multiplicative constant) polynomial solution, f, and any permutation of the roots of f will give the coordinates of a critical point of the Vandermonde determinant. Proof. Using the method of Lagrange multipliers we get n ∂vn ∂gR X vn(x) = λ ⇔ = λR0(x ) ∂x ∂x x − x j j j i=1 j i i6=j for some λ ∈ R. If we only consider this expression in a single point we can consider vn(x) as a constant value and then the expression can be rewritten as n X 1 = ρR0(x ) (76) x − x j i=1 j i i6=j where ρ is some unknown constant. Consider the polynomial n Y f(x) = (x − xi) i=1 and note that 00 n 1 f (xj) X 1 = . (77) 2 f 0(x ) x − x j i=1 j i i6=j

103 106

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

In each critical point we can combine (76) and (77) thus in each of the extreme points we will have the relation 00 0 0 f (xj) − 2ρR (xj)f (xj) = 0, j = 1, 2, . . . , n for some ρ ∈ R. Since each xj is a root of f(x) we see that the left hand side in the differential equation must be a polynomial with the same roots as f(x), thus we can conclude that for any x ∈ R f 00(x) − 2ρR0(x)f 0(x) − P (x)f(x) = 0 (78) where P (x) is a polynomial of degree m − 2. Using this technique it is also easy to find the coordinates on a sphere translated in the (1,..., 1) direction.

Corollary 2.1. If x = (x1, x2, . . . , xn) is a critical point of the Vander- n monde determinant on a surface S ⊂ C then (x1 + a, x2 + a, . . . , xn + a) is a critical point of the Vandermonde determinant on the surface {x + a1 ∈ n C |x ∈ S}. Proof. Follows immediately from Y vn (x1 + a, x2 + a, . . . , xn + a) = (xj + a − xi − a) 1≤i

In several cases it is possible to find the extreme points by identifying the unknown parameters, ρ and the coefficients of P (x), by comparing the terms in (75) with different degrees and solving the resulting equation system. We will discuss the cases in the upcoming sections.

2.3.1 Critical points on surfaces given by a first degree uni- variate polynomial n X When R(x) = r1x + r0 the surface defined by R(xi) = 0 will always be i=1   a plane with normal (1, 1,..., 1) through the point r0 , r0 ,..., r0 . r1 r1 r1 Since     r0 r0 r0 Y r0 r0 vn x1 + , x2 + , . . . , xn + = xj + − xi − r1 r1 r1 r1 r1 1≤i

104 107

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

2.3.2 Critical points on surfaces given by a second degree univariate polynomial

1 2 1 2 2  Surfaces defined by letting R(x) = 2 x + r1x + r0 = 2 (x + r1) − r1 + 2r0 r  2  r1 will all be spheres around (−r1, −r1,..., −r1) with radius n 2 − r0 . Thus the critical points can be found by a small modification of the technique used on the unit sphere described in Section 2.2. n X 1 Theorem 2.5. On the surface defined by g(x) = x2 + r x + r the 2 i 1 i 0 i=1 coordinates of the critical points of the Vandermonde determinant are given by the roots of

  1 ! n − 1 2 (x + r1) f(x) = Hn 2 2(r1 − 2r0) 2 n n−2i b 2 c i   n−2i X (−1) n − 1 2 (x + r1) = n! i! 2(r2 − 2r ) (n − 2i)! i=0 1 0 where Hn denotes the nth (physicist) Hermite polynomial.

1 2 Proof. Since R(x) = 2 x + r1x + r0 the differential equation (75) will be of the form 00 0 f (x) − 2ρ(x + r1)f (x) − p0f(x) = 0.

By considering the terms with degree n it is easy to see that p0 = −2ρn and thus we get 00 0 f (x) − 2ρ(x + r1)f (x) + 2ρnf(x) = 0.

1 y Setting y = ρ 2 (x + r1) gives x = 1 − r1 and by considering the function ρ 2   y g(y) = f 1 − r1 we can rewrite the differential equation as follows ρ 2 d2g ry  dg − 2ρ − r + r + 2 ρ n g(y) = 0 dx2 ρ 1 1 dx y 1 00 2 0 ⇔ ρ g (y) − 2 ρ 1 ρ g (x) + 2 ρ n g(y) = 0 ρ 2 ⇔ g00(y) − 2 y g0(x) + 2 n g(y) = 0. (79)

Equation (79) defines a class of called the Hermite 1 polynomials [2], Hn(y). Thus f(x) = cHn(ρ 2 (x + r1)) for some arbitrary constant c. To find the value of ρ we can exploit some properties of the roots of the Hermite polynomials.

105 108

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

If we let yi, i = 1, . . . , n be the roots of Hn(y). On page 95 we show that these roots have the following properties n X yi = 0, (80) i=1 n X n(n − 1) y2 = . (81) i 2 i=1 y We now take the change of variables x = 1 − r1 into consideration and ρ 2 get n n !2 n ! X X yi 1 X xi = 1 − r1 = 1 yi − nr1, i=1 i=1 ρ 2 ρ 2 i=1 n n !2 n n ! n ! X X yi X 1 X 2r1 X x2 = − r = y2 − y + nr2. i 1 1 ρ i 1 i 1 i=1 i=1 ρ 2 i=1 i=1 ρ 2 i=1

Using (80) and (81) we can simplify these expression n X xi = −nr1, i=1 n X n(n − 1) x2 = + nr2. i 2ρ 1 i=1

This allow us to rephrase the constraint g(x) = 0 as follows n X 1 n(n − 1) nr2 g(x) = x2 + r x + r = − 1 + nr = 0 2 i 1 i 0 4ρ 2 0 i=1 and from this it is easy to find an expression for ρ n − 1 ρ = 2 . 8(r1 − 2r0)

Thus the coordinates of the extreme points are the roots of the polynomial given in Theorem 2.5.

Remark 2.3. Note that Remarks 2.1 and 2.2 apply in this case as well. For more details and demonstrations of how to visualize this result see Section 2.2.2.

106 109

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

2.3.3 Critical points on the sphere defined by a p-norm n Definition 2.2. The p−norm of x ∈ R denoted kxkp is defined as

1 n ! p X p kxkp = |xi| , for p > 0. (82) i=1 n Definition 2.3. The infinity norm of x ∈ R denoted kxk∞ is defined as kxk∞ = sup{|xi| : 1 ≤ i ≤ n}. (83) n−1 Definition 2.4. The sphere defined by the p-norm, denoted Sp (r), for n positive integer p, is the set of all x ∈ R such that n X p p p |xi| = kxkp = r . (84) i=1 When r = 1 this is the unit sphere defined by the p-norm, denoted simply n−1 n−1 Sp . When p increases the points on Sp approach the points on the cube n−1 so for convenience we define S∞ as the cube defined by the boundary of [−1, 1]n. Spheres defined by p-norms describes many well-known geometric shapes. 1 2 2 2 2 For instance when n = 2, p = 2, then S2 (r) = {(x1, x2) ∈ R : x1 + x2 = r } is a circle and when n = 3, p = 2, then

1 2 2 2 2 2 S3 (r) = {(x1, x2, x3) ∈ R : x1 + x2 + x3 = r } is the standard 2-sphere with radius r. In the previous section we discussed how the extreme points of the Vandermonde determinant are distributed for the case p = 2 and n ≥ 2. In this section we will examine how the extreme points of the Vandermonde determinant are distributed on the sphere defined by the p-norm for the cases p ∈ {4, 6, 8} for a few different values of n. In n−1 Figure 2.12 Sp for p = 2, p = 4, p = 6, p = 8, and p = ∞ with a section cut out are illustrated. Similarly to the previous section we will construct a polynomial whose roots give the coordinates of the extreme points of the Vandermonde deter- minant. First we will consider the case p = 4, n = 4.

2.3.4 The case p = 4 and n = 4 We will illustrate the construction of a polynomial that has the coordinates of the points as roots with the case p = 4, n = 4. If we denote the poly- 4 nomial whose roots give the coordinates with P4 (x) and use the same type of argument that was used to get equation (75). Taking P (x) to be of the form: n n−2 n−4 P (x) = x + cn−2x + cn−4x + ··· (85)

107 110

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

2 Figure 2.12: Illustration of Sp for p = 2, p = 4, p = 6, p = 8, and p = ∞ with a section cut out. The outer cube corresponds to p = 0 and p = 2 corresponds to the sphere in the middle. with every other coefficient zero, when n is even of we have even powers and when n is odd we have odd powers. By identifying the powers in the differential equation (75) for the case p = 4:

00 3 0 2 P (x) + ρnx P (x) + (σnx + τnx + νn)P (x) = 0, (86) we obtain that τnxP (x) does not share any powers with any other part of the equation and thus τn = 0. Similarly, identifying the coefficients we obtain pρn + σn = 0. This leads us to the differential equation

00 3 0 2 P (x) + ρnx P (x) + (−pρnx + νn)P (x) = 0. (87) Basing on (85) and (87), and setting n = 4, p = 4 we get to generate the system of n n−1 X p Sp = xi = 1, i=1 4 n n−2 n−4 P4 (x) = x + cn−2x + cn−4x + ··· , 00 0 4 3 4 2 4 P4 (x) + ρnx P4 (x) + (−pρnx + νn)P4 (x) = 0. It follows that 4 4 X 4 4 4 4 4 S4 = xi = x1 + x2 + x3 + x4 = 1, i=1 0 00 4 4 2 4 3 4 2 P4 (x) = x + c2x + c0 ⇒ P4 (x) = 4x + 2c2x ⇒ P4 (x) = 12x + 2c2, and by substitution into the differential equation

2 3 3 2 4 2 (12x + 2c2) + ρnx (4x + 2c2x) + (−pρnx + νn)(x + c2x + c0) = 0, 4 2 (ν − 2ρc2)x + (νc2 − 4ρc0 + 12)x + (2c2 + c0ν) = 0.

108 111

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

Equating corresponding coefficients as in P (x) we get:

ν − 2ρc2 = 1,

νc2 − 4ρc0 + 12 = c2,

2c2 + c0ν = c0.

2 4 Setting t = x we can express S3 and P (x) as follows:

4 4 2 2 X 2 S3 = 2t1 + 2t2 = 2 ti = 1 i=1 4 4 2 2 P4 (x) = x + c2x + c0 = t − (t0 + t1)t + t0t1 = 0. 4 Equating coefficient in P4 (x) gives t0 + t1 = c2, t0t1 = c0 2 2 2 ⇒t0t1 + t1 = c2t1 ⇒ c0 + t1 = c2t1 ⇒ t1 = c2t1 − c0 2 2 2 ⇒t0 + t0t1 = c2t0 ⇒ t0 + c0 = c2t0 ⇒ t0 = c2t0 − c0 4 2 2 2 X 2 2 ⇒t0 + t1 = c2(t0 + t1) − 2c0 = c2 − 2c0 ⇒ 2 ti = 2(c2 − 2c0) = 1 i=1 This now gives a fourth equation so as to solve the system:

ν − 2ρc2 = 1, (88)

νc2 − 4ρc0 + 12 = c2, (89)

2c2 + c0ν = c0, (90) 2 2(c2 − 2c0) = 1. (91)

From (88) we obtain ν = 1 + 2ρc2 and substituting this into (89) gives 2  c2(1 + 2ρc2) − 4ρc0 + 12 = c2 ⇒ ρ 2(c2 − 2c0) = −12 ⇒ ρ = −12.

To get the last equality use (91) and the fact that c2 6= 0. Using this value in the expression for ν we obtain ν = 1 − 24c2 and substituting this value into (90) gives 1 2c + c (1 − 24c ) = c ⇒ 2c (1 − 12c ) = 0 ⇒ 1 − 12c = 0 ⇒ c = , 2 0 2 0 2 0 0 0 12 where the last equality follows from c2 6= 0. Now with ρ = −12, c0 = 1/12, using (91) we obtain

2 1 2 8 2 2(c − 2c0) = 1 ⇒ c2 = + = ⇒ c2 = √ 2 2 12 12 6

Therefore we obtain P 4(x) = x4 − √2 x2 + 1 . 4 6 12 In Section 2.3.5 we will generalise this technique somewhat.

109 112

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

2.3.5 Some results for even n and p In this section we will discuss the case when n and p are positive and even integers, and n > p. We will discuss a method that can give the coordinates n−1 extreme points of the Vandermonde determinant constrained to Sp , as defined in (84), as the roots of a polynomial. First we will examine how this optimisation problem can be rewritten as a differential equation similar to (86).

Lemma 2.9. Let n and p be even positive integers. Consider the unit sphere given by the p - norm, in other words the surface given by ( ) n p n X p Sn = (x1, . . . , xn) ∈ R xi = 1 . i=1 There exists a second order differential equation a P p00(x) − p−2 xp−1P p0(x) + Qp (x)P p(x) = 0, (92) n n n n n p p where Pn (x) and Qn(x) are of the forms

1 2 n−1 p 2n X 2i Pn (x) = x + c2ix i=0 and 1 2 p−2 p p−2 X i 2i Qn(x) = −ap−2x + (−1) a2ix . i=0 p p There is also a relation between the coefficients of Pn and Qn given by

j−1 ! X n + p − 2j 2j(2j − 1)c + a c + a c = 0 (93) 2j 2k 2(j−k−1) n p−2 2j−p k=0 n+p−2 for 1 ≤ j ≤ 2 where cn = 1, ck = 0 for k 6∈ {0, 2, 4, . . . , n} and ak = 0 for k 6∈ {0, 2, 4, . . . , p − 2}.

Proof. This result is proved analogously to how (75) is found. Define n p Y Pn (x) = (x − xi) i=1 and note that p00 n 1 Pn (x) X 1 = . 2 p0 x − x Pn (x) i=1 i

110 113

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

Now apply the method of Lagrange multipliers and see that in the critical points n X 1 = ρR0(x ) x − x j i=1 j i i6=j where ρ is some unknown constant. In each critical point we can combine the two expressions and get p00 0 p0 Pn (xj) − 2ρR (xj)Pn (xj) = 0, j = 1, 2, . . . , n for some ρ ∈ R. Since each xj is a root of f(x) we see that the left hand side in the differential equation must be a polynomial with the same roots p as Pn (x), thus we can conclude that for any x ∈ R p00 0 p0 Pn (x) − 2ρR (x)Pn (x) − Q(x)f(x) = 0 where Q(x) is a polynomial of degree p − 2. By applying the principles of polynomial solutions to linear second order differential equation [10, 50], expanding the expression and matching the coefficients of the terms with different powers of x you can see that the coefficients of P (x) and Q(x) must obey the relation given in (93).

Noting that the relations between the two sets of coefficients are lin- ear we will consider the equations given by (93) corresponding to j ∈ n n−2 n n+p−2 o 2 , 2 ,..., 2 , the corresponding system of equations in matrix form becomes

 p      cn−2 cn−4 cn−6 ··· c4 n cn−p−2 a0 −n(n − 1) p−2  1 cn−2 cn−4 ··· c6 n cn−p   a2   0   p−4       0 1 cn−2 ··· c8 cn−p+2  a4   0   n    =   .  ......   .   .   ......   .   .   4       0 0 0 ··· cn−2 n cn−4  ap−4  0  2 0 0 0 ··· 1 n cn−2 ap−2 0 (94) n+p−2 By solving this system we can reduce the 2 equations given by n−2 matching the terms to 2 equations that together with the condition given by (84) gives a system of polynomial equations that determines all the un- known coefficients of P (x). To describe how we can express the solution to (94) we will use a few well- known relations between elementary symmetric polynomials and power sums often referred to as the Newton–Girard formulae (Theorem 2.7), and Vieta’s formula (Theorem 2.6) that describes the relation between the coefficients of a polynomial and its roots. Here we will give some useful properties of elementary symmetric poly- nomials and power sums and relations between them.

111 114

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Definition 2.5. The elementary symmetric polynomials are defined by n X e1(x1, . . . , xn) = xi, i=1 X e2(x1, . . . , xn) = xi1 xi2 , 1≤i1

Theorem 2.6 (Vieta’s formula). Suppose x1, . . . , xn are the n roots of a polynomial n n−1 x + c1x + ... + cn. k Then ck = (−1) ek(x1, . . . , xn). Definition 2.6. A power sum is an expression of the form n X k pk(x1, . . . , xn) = xi . i=1 Theorem 2.7 (Newton–Girard formulae). The Newton–Girard formulae can be expressed in many ways. For us the most useful version is the de- terminantal expressions. Let ek = ek(x1, . . . , xn) and pk = pk(x1, . . . , xn) denote the elementary symmetric polynomials and the power sums as in Definitions 2.5 and 2.6. Then the power sum can be expressed in terms of elementary symmetric polynomials in this way

e1 1 0 ··· 0 0

2e2 e1 1 ··· 0 0

3e3 e2 e1 ··· 0 0 p = . k ......

(p − 1)en−1 en−2 en−3 ··· e1 1

pen en−1 en−2 ··· e2 e1 Proof. See for example [198].

112 115

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

Lemma 2.10. Using the following notation 2m cm cm−1 cm−2 ··· c2 n c1 2m−2 1 cm cm−1 ··· c3 n c2 2m−4 0 1 cm ··· c4 c3 t (c , c , . . . , c ) = n (95) n 1 2 m ...... 4 0 0 0 ··· cm n cm−1 2 0 0 0 ··· 1 n cm 2c and tn(c) = then tn can be written n p−1 X n(r1 + r2 + ··· + rn − 1) Y ri tn(c1, . . . , cp) = cp−i r1!r2! ··· rn! r1+2r2+3r3+···+nrn=n i=0 r1≥0, ..., rn≥0 and it obeys the recursive relation p 2p X t (c , . . . , c ) = c − c t (c , . . . , c ). n 1 p n 1 i+1 n p−i+2 p i=2

Proof. Comparing the expression for tn with the relations given in Theo- rem 2.7 it is clear that these relations are equivalent to the Newton-Girard formulae with some minor modifications.

Lemma 2.11. For even n and p the condition (84) can be rewritten as

−n tn(cn−p−2, cn−p, . . . , cn−2) = 1 where tn is defined by (95). n X p Proof. Note that the expression gp(x1, . . . , xn) = xi = 1 is a power sum. 1 By Theorem 2.7 the following relation holds:

e1 1 0 ··· 0 0

2e2 e1 1 ··· 0 0

3e3 e2 e1 ··· 0 0 g (x) = p ......

(p − 1)en−1 en−2 en−3 ··· e1 1

pen en−1 en−2 ··· e2 e1 where ek is the k:th elementary symmetric polynomial of x1, ... , xn. Using Vieta’s formula we can relate the elementary symmetric polynomials to the coefficients of P (x) by noting that n 2 −1 n 2n X 2j X k n−k P (x) = x + c2jx = (−1) ekx j=1 k=1 or more compactly e2k = cn−2k.

113 116

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

With e2k = cn−2k and e2k+1 = 0 we get

0 1 0 ··· 0 0

2cn−2 0 1 ··· 0 0

0 cn−2 0 ··· 0 0

4cn−4 0 cn−2 ··· 0 0 gp(x) = ......

0 cn−p−2 0 ··· 0 1

pcn−p 0 cn−p−2 ··· cn−2 0

Using on every other row gives

0 1 0 0 ··· 0 0 2cn−2 1 0 ··· 0 0 2cn−2 0 1 0 ··· 0 0 0 0 1 ··· 0 0 0 cn−2 0 1 ··· 0 0 4cn−4 cn−2 0 ··· 0 0 g (x) = 4cn−4 0 cn−2 0 ··· 0 0 = − p ...... 0 0 c4 ··· 0 1 0 c2 0 c4 ··· 0 1 pc0 c2 0 ··· cn−2 0 pcn−p 0 c2 0 ··· cn−2 0

2cn−2 1 0 0 ··· 0 0 2cn−2 1 0 ··· 0 0

4cn−4 cn−2 1 0 ··· 0 0 4cn−4 cn−2 1 ··· 0 0

0 0 0 1 ··· 0 0 6cn−6 cn−4 cn−2 ··· 0 0 = = − ......

0 0 0 0 ··· 0 1 (p − 2)c2 c4 c6 ··· cn−2 1

pc0 c2 c4 c6 ··· cn−2 0 pc0 c2 c4 ··· en−4 cn−2 p cp cp−1 cp−2 ··· c2 n c1 p−2 1 cp cp−1 ··· c3 n c2 p−4 0 1 cp ··· c4 n c3 p 2 = −n ...... = (−1) n tn(c2, c4, . . . , cp) ...... 4 0 0 0 ··· cp n cp−1 2 0 0 0 ··· 1 n cp

Thus gp(x1, . . . , xn) = 1 is equivalent to −ntn(c2, c4, . . . , cp) = 1.

Lemma 2.12. The coefficients of the polynomial Q(x) in (92) can be ex- pressed using the coefficients of P (x) as follows p a = (−1)k+1n2(n − 1)t (c , . . . , c ), k = 1, 2,..., . (96) 2k−2 n n−p+2k+2 n−2 2

114 117

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

Proof. By (94) we can write    p −1   a0 cn−2 cn−4 cn−6 ··· cn−p−4 n cn−p−2 −n(n − 1) p−2  a2   1 cn−2 cn−4 ··· cn−p−6 n cn−p   0     p−4     a4   0 1 cn−2 ··· cn−p−8 cn−p+2  0    =  n    .  .   ......   .   .   ......   .     4    ap−4  0 0 0 ··· cn−2 n cn−4   0  2 ap−2 0 0 0 ··· 1 n cn−2 0 det(Tn,p,k) and using Cramer’s rule we get ap−2k = where tn(cn−p−2, . . . , cn−2)  p  cn−2 cn−4 ··· cn−2k+2 −n(n − 1) cn−2k−2 ··· n cn−p−2 p−2  1 cn−2 ··· cn−2k 0 cn−2k−4 ··· cn−p   n   0 1 ··· c 0 c ··· p−4 c   n−2k−2 n−2k−6 n n−p+2 Tn,p,k =  ......  .  ...... ··· .   4   0 0 ··· 0 0 0 ··· n cn−4  2 0 0 ··· 0 0 0 ··· n cn−2 | {z } M By moving the kth column to the first column and using Laplace expansion det(Tk) can be rewritten on the form

1 cn−2 ··· cn−2k

0 1 ··· cn−2k−2

......

0 0 ··· 1 k det(Tn,p,k) =(−1) n(n − 1) 0 0 ··· 0 M = −n(n − 1)|M|

0 0 ··· 0

......

0 0 ··· 0

0 0 ··· 0 p−2k cn−2 ··· cn−p+2k cn−p+2k+2 n 1 ··· c p−2k−2 c n−p+2k+2 n n−p+2k+4 . . . . = − n(n − 1) . .. . . 4 0 ··· cn−2 n cn−4 2 0 ··· 1 n cn−2 k =(−1) n(n − 1)tn(cn−p+2k+2, . . . , cn−2) −1 We can use Lemma 2.11 to see that t (c , . . . , c ) = and thus n n−p−2 n−2 n det(Tn,p,k) k+1 2 ap−2k = = (−1) n (n − 1)tn(cn−p+2k+2, . . . , cn−2) tn(cn−p−2, . . . , cn−2)

115 118

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

p Theorem 2.8. The non-zero coefficients, c2k, in Pn that solves (92) can be found by solving the polynomial equation system given by

j−1 ! X p−2k+1 2 2j(2j − 1)c2j + (−1) n (n − 1)tn(cn−2k+2, . . . , cn−2) k=0

+ n(n − 1)(n + p − 2j)tn(cn−p+4, . . . , cn−2) = 0, n for j = 0,..., 2 − 1.

Proof. The equation system is found by using (96) to substitute ak in (93).

Using Lagrange multipliers directly gives a polynomial equation system n with n equations while Theorem 2.8 gives 2 equations. As an example we can consider the case n = 8, p = 4. Matching the coefficients for (92) gives the system  a c + 2c = 0,  0 0 0   a0c2 + a2c0 + 12c4 = 0,   3 30c + a c + a c = 0, 6 0 4 4 2 2  1  56 + a c + a c = 0,  0 6 2 2 4   1  a + a c = 0, 0 4 2 6 7 2 and rewriting the constraint that the points lie on S4 gives 2c6 − 4c4 = 0. In this case the expressions for a0 and a2 becomes quite simple ( a0 = −112c6, a2 = 448. By resubstituting the expressions into the system, or using Theorem 2.8 directly an equation systems for the c0, c2, c4 and c6 is given by   112c0c6 + 2c0 = 0,  −112c2c6 + 448c0 + 12c4 = 0, −112c c + 332c + 30c = 0,  4 6 2 6  2  −2c6 + 4c4 + 1 = 0. The authors are not aware of any method that can be used to easily and reliably solve the system given by Theorem 2.8. In Table 2.2 results for a number of systems, both with even and odd n and various values for p are given. These were found by manually experimentation combined with computer aided symbolic computations.

116 119

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

n = 2 √ 2 3 2 2 1 2 2 1 2 2 2 3 2 2 2 4 P2 (x) = x − 2 , P4 (x) = x − 2 2, P6 (x) = x − 2 , P8 (x) = x − 2 n = 3 √ 2 3 3 3 1 4 3 1 3 3 2 3 3 2 2 4 P2 (x) = x − 2 x, P3 (x) = x − 2 2x, P6 (x) = x − 2 x, P8 (x) = x − 2 n = 4 √ P 4(x) = x4 − 1 x2 + 1 , P 4(x) = x4 − 6 x2 + 1 , 2 2 √ 48 4 √3 √12 4 4 1 1 2 1  2 P (x) = x − ( 33 + 1) 3 x + 9 − 33 ( 33 + 1) 3 6 √4 96 √ 1 √ p √ 4 4 3 4 2 1  P8 (x) = x − 6 (30 5 − 30) x + 120 5 − 5 30 5 − 30 n = 5

√ 1 2 P 5(x) = x5 − 1 x, P 5(x) = x5 − 2 5 x3 + 3 x, P 5(x) = x5 − 10 3 x3 + 10 3 x 2 √4 4 5 20 6 2 20 √ 1 √ p √ 5 5 10 4 3 1  P8 (x) = x − 10 (50 13 + 10) x + 1800 5 13 − 55 50 13 + 10 n = 6 6 6 1 4 1 2 1 P2 (x) = x − 2 x + 20 x − 1800 √ √ √ √ √ √ 6 6 50+20 5 4 5 2 (−4+2 5) 50+20 5 P4 (x) = x − 10 x + 10 x − 600 n = 7 7 7 1 5 5 3 5 P2 (x) = x − 2 x + 84 x − 3528 √ √ √ √ √ √ 7 7 1050+84 109 5  1 109  3 (−16+2 109) 105+84 109 P4 (x) = x − 42 x + 21 + 42 x − 10584 n = 8 8 8 1 6 15 4 15 2 15 P2 (x) = x − 2 x + 224 x − 6272 x + 1404928 , √ √ √ 8 8 140+42 6 6  3 3 6  4 P4 (x) = x − 14 x + 28 + 28 x  √ 3 √ √  √ −(140+42 6) 2 29 140+42 6 2 3 6 − 16464 + 2352 x − 3136 + 1568 n Table 2.2: Polynomials, Pp , whose roots give the coordinates of the extreme points of the Vandermonde determinant on the sphere defined by the p-norm in n dimensions.

117 120

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

2.3.6 Some results for cubes and intersections of planes n−1 It can be noted that when p → ∞ then Sp as defined in the previous section will converge towards the cube. A similar technique to the described technique for surfaces implicitly defined by a univariate polynomial can be employed on the cube. The maximum value for the Vandermonde determinant on the cube [−1, 1]n has been known for a long time (at least since [90]). Here we will show a short derivation.

Theorem 2.9. The coordinates of the critical points of vn(x) on the cube n xn ∈ [−1, 1] are given by x1 = −1, xn = 1 and xi equal to the ith root of Pn−2(x) where Pn are the Legendre polynomials n X n n+k−1  P (x) = 2n xk 2 n k n k=0 or some permutation of them. Proof. It is easy to show that the coordinates −1 and +1 must be present in the maxima points, if they were not then we could rescale the point so that the value of vn(x) is increased, which is not allowed. We may thus assume the ordered sequence of coordinates

−1 = x1 < ··· < xn = +1. The Vandermonde determinant then becomes n−1 Y Y vn(x) = 2 (1 + xi)(1 − xi) (xj − xi). i=2 1

118 121

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL and thus the left hand side of the expression must form a polynomial that can be expressed as some multiple of f(x) (1 − x2)f 00(x) − 2xf 0(x) − σf(x) = 0. (97) The constant σ is found by considering the coefficient for xn−2: (n − 2)(n − 3) + 2(n − 2) − σ = 0 ⇔ σ = (n − 2)(n − 1). This gives us the differential equation that defines the Legendre polynomial Pn−2(x) [2]. The technique above can also easily be used to find critical points on the intersection of two planes given by x1 = a and xn = b, b > a.

Theorem 2.10. The coordinates of the critical points of vn(x) on the in- tersection of two planes given by x1 = a and xn = b are given by xn−1 = a,  x−a  xn = b and xi is the ith root of Pn−2 b−a where Pn are the Legendre polynomials n X n n+k−1  P (x) = 2n xk 2 n k n k=0 or some permutation of them. Proof. We assume the ordered sequence of coordinates

−1 = x1 < ··· < xn = +1. The Vandermonde determinant then becomes n−1 Y Y vn(x) = (b − a) (a − xi)(b − xi) (xj − xi). i=2 1

119 122

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions and thus the left hand side of the expression must form a polynomial that can be expressed as some multiple of f(x) (x − a)(x − b)f 00(x) + (2x − a − b)f 0(x) − σf(x) = 0. The constant σ is found by considering the coefficient for xn−2: (n − 2)(n − 3) + 2(n − 2) − σ = 0 ⇔ σ = (n − 2)(n − 1). The resulting differential equation is (x − a)(x − b)f 00(x) + (2x − a − b)f 0(x) − (n − 2)(n − 1)f(x) = 0. x−a If we change variables according to y = b−a and let g(y) = f(y(b − a) + a) then the differential equation becomes y(y − 1)g00(y) + (2y − 1)g0(y) − (n − 1)(n − 2)g(y) = 0 which we can recognize as a special case of Euler’s hypergeometric differen- tial equation whose solution can be expressed as g(y) = c ·2 F1(1 − n, n + 2; 1; y), for some arbitrary c ∈ R, where 2F1 is the hypergeometric function [2]. In this case the hypergeometric function is a polynomial and relates to the Legendre polynomials as follows

2F1(1 − n, n + 2; 1; y) = n!Pn−2(y)

 x−a  thus it is sufficient to consider the roots of Pn−2 b−a .

2.3.7 Optimising the probability density function of the eigenvalues of the Wishart matrix This section is based on Section 5 of Paper D

Here we will show an example of how the results in Section 2.2 can be applied to find the extreme points of the eigenvalue distribution of the ensembles discussed in Section 1.1.7. Lemma 2.13. Suppose we have a Wishart distributed matrix W with the probability density function of its eigenvalues given by n ! β X (λ) = C v (λ)m exp − P (λ ) (98) P n n 2 k k=1 where Cn is a normalising constant, m is a positive integer, β > 1 and P is a polynomial with real coefficients. Then the vector of eigenvalues of W will lie on the surface defined by n X P (λk) = Tr(P (W)). (99) k=1

120 123

2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL

Proof. Since W is symmetric by Lemma 1.2 then it will also have real eigen- values. By Lemma 1.1 n X P (λk) = Tr(P (W)) k=1 and thus the point given by λ = (λ1, λ2, . . . , λn) will be on the surface defined by n X P (λk) = Tr(P (W)). k=1

To find the maximum values we can use the method of Lagrange multi- pliers and find eigenvectors such that

n ! ∂P ∂ X dP (λk) = η Tr(P (W)) − P (λk) = −η , k = 1, . . . , n, ∂λk ∂λk dλk k=1 where η is some real-valued constant. Computing the left-hand side gives   (β) n ∂P β dP (λk) X m = P(λ) − +  . ∂λk  2 dλk λk − λi  i=1 i6=k Thus the stationary points of (98) on the surface given by (99) are the solution to the equation system   n β dP (λk) X m dP (λk) P(λ) − +  = −η , k = 1, . . . , n.  2 dλk λk − λi  dλk i=1 i6=k

If we denote the value of P in a stationary point with Ps then the system above can be rewritten as n   X 1 1 β η dP (λk) dP (λk) = − = ρ , k = 1, . . . , n. (100) λk − λi m 2 Ps dλk dλk i=1 i6=k The equation system described by (100) appears when one tries to op- timize the Vandermonde determinant on a surface defined by a univariate polynomial. This equation system can be rewritten as an ordinary differen- tial equation. For more details see Section 2.3 Consider the polynomial n Y f(λ) = (λ − λi) i=1

121 124

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions and note that 00 n 1 f (λj) X 1 = . 2 f 0(λ ) λ − λ j i=1 j i i6=j Thus in each of the extreme points we will have the relation

2 d f dP df − 2ρ = 0, j = 1, 2, . . . , n dλ2 dλ dλ λ=λj λ=λj λ=λj for some ρ ∈ R. Since each λj is a root of f(λ) we see that the left hand side in the differential equation must be a polynomial with the same roots as f(λ), thus we can conclude that for any λ ∈ R d2f dP df − 2ρ − Q(λ)f(λ) = 0 (101) dλ dλ dλ where Q is a polynomial of degree (deg(p) − 2). Consider the β ensemble described by (16). For this ensemble the poly- nomial that defines the surface that the eigenvalues will be on is p(λ) = λ2. Thus by Lemma 2.13 the surface becomes a sphere with radius pTr(W2). The solution to the equation system given by (100) was found in Section 2.2. The solution is given as the roots of a polynomial, in this case the solution can be written as the roots of the rescaled Hermite polynomials, the explicit expression for the polynomial whose roots give the maximum points is

  1 ! n − 1 2 (x + r1) f(x) = Hn 2 2(r1 − 2r0) 2 n n−2i b 2 c i   n−2i X (−1) n − 1 2 (x + r1) = n! (102) i! 2(r2 − 2r ) (n − 2i)! i=0 1 0 where Hn denotes the nth (physicist) Hermite polynomial [2]. The solution on the unit sphere can then be used to find the vector of eigenvalues that maximizes the probability density function P(λ) given by (16). Since rescaling the vector of eigenvalues affects the probability density depending on the length of the original vector in the following way   n(n−1)m β 2 2 (cλ) = c 2 exp (1 − c )|λ| (λ) P 2 P the unit sphere solution can be rescaled so that it ends up on the appropriate sphere.

122 125

Chapter 3

Approximation of electrostatic discharge currents using the analytically extended function

This chapter is based on Papers E, F and G

Paper E Karl Lundeng˚ard,Milica Ranˇci´c,Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Ranˇci´c(Eds), Springer International Publishing, pages 151–176, 2016.

Paper F Karl Lundeng˚ard,Milica Ranˇci´c,Vesna Javor and Sergei Silvestrov. Estimation of parameters for the multi-peaked AEF current functions. Methodology and Computing in Applied Probability, Volume 19, Issue 4, pages 1107 – 1121, 2017.

Paper G Karl Lundeng˚ard,Milica Ranˇci´c,Vesna Javor and Sergei Silvestrov. Electrostatic discharge currents representation using the analytically extended function with p peaks by interpolation on a D-optimal design. Facta Universitatis Series: Electronics and Energetics, Volume 32, Issue 1, pages 25 – 49, 2019. 126 127

3.1. THE ANALYTICALLY EXTENDED FUNCTION (AEF)

3.1 The analytically extended function (AEF)

In this section we consider least square approximation using a particular function we call the power-exponential function, as a basic component. Definition 3.1. Here we will refer to the function defined by (103) as the power-exponential function,

β x(β; t) = te1−t , 0 ≤ t. (103)

For non-negative values of t and β the power-exponential function has a steeply rising initial part followed by a more slowly decaying part, see Figure 3.1. This makes it qualitatively similar to several functions that are popular for approximation of important phenomena in different fields such as approximation of lightning discharge currents and pharmacokinetics. Examples include the biexponential function [38], [256], the Heidler function [117] and the Pulse function [299].

Figure 3.1: An illustration of how the steepness of the power exponential func- tion varies with β.

The power-exponential function has been used in other applications, for example to model attack rate of predatory fish, see [232, 233]. Here we examine linear combinations of piecewise power exponential functions that will be used in later sections to approximate electrostatic discharge current functions.

125 128

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

3.1.1 The p-peak analytically extended function

This section is based on Section 2 of Paper E

The p -peaked AEF is constructed using the power exponential function given in Definition 3.1. In order to get a function with multiple peaks and where the steepness of the rise between each peak as well as the slope of the decaying part is not dependent on each other, we define the analyti- cally extended function (AEF) as a function that consist of piecewise linear combinations of the power exponential function that has been scaled and translated so that the resulting function is continuous. With a given differ- ence in height between subsequent peaks Im1 , Im2 ,..., Imp , corresponding times tm1 , tm2 ,..., tmp , integers nq > 0, real values βq,k, ηq,k, 1 ≤ q ≤ p+1, 1 ≤ k ≤ nq such that the sum over k of ηq,k is equal to one, the p -peaked AEF i(t) is given by (104).

Definition 3.2. Given Imq ∈ R and tmq ∈ R, q = 1, 2, . . . , p such that tm0 = 0 < tm1 < tm2 < . . . < tmp along with ηq,k, βq,k ∈ R and 0 < nq ∈ Z nq X for q = 1, 2, . . . , p + 1, k = 1, 2, . . . , nq such that ηq,k = 1. k=1 The analytically extended function (AEF), i(t), with p peaks is defined as

 q−1 ! nq  2  X X βq,k+1  Imk +Imq ηq,kxq(t) , tmq−1 ≤ t ≤ tmq , 1≤q ≤p,  i(t)= k=1 k=1 (104) p ! np+1  X X β2  I η x (t) p+1,k , t ≤ t,  mk p+1,k p+1 mp  k=1 k=1 where t − t t − t  mq−1 mq  exp , 1 ≤ q ≤ p,  ∆tmq ∆tmq xq(t) = t  t   exp 1 − , q = p + 1,  tmq tmq and ∆tmq = tmq − tmq−1 . Sometimes the notation i(t; β, η) with

  β = β1,1 β1,2 . . . βq,k . . . βp+1,np+1 ,   η = η1,1 η1,2 . . . ηq,k . . . ηp+1,np+1 , will be used to clarify what the particular parameters for a certain AEF are. Remark 3.1. The p -peak AEF can be written more compactly if we intro-

126 129

3.1. THE ANALYTICALLY EXTENDED FUNCTION (AEF) duce the vectors

> ηq = [ηq,1 ηq,2 . . . ηq,nq ] , (105)  h 2 2 2 i> β +1 β +1 βq,n +1  xq(t) q,1 xq(t) q,2 . . . xq(t) q , 1 ≤ q ≤ p, xq(t) = (106) h 2 2 2 i> β β βq,n  xq(t) q,1 xq(t) q,2 . . . xq(t) q , q = p + 1. The more compact form is  q−1 ! X  I + I · η>x (t), t ≤ t ≤ t , 1 ≤ q ≤ p,  mk mq q q mq−1 mq  k=1 i(t) = q ! (107)  X  I · η>x (t), t ≤ t, q = p + 1.  mk q q mq k=1 If the AEF is used to model an electrical current, than the derivative of the AEF determines the induced electrical voltage in conductive loops in the lightning field. For this reason it is desirable to guarantee that the first derivative of the AEF is continuous. Since the AEF is a linear function of elementary functions its derivative can be found using standard methods. Theorem 3.1. The derivative of the p -peak AEF is  t − t x (t)  mq q > Imq ηq Bq xq(t), tmq−1 ≤ t ≤ tmq , 1 ≤ q ≤ p, di(t)  t − tm ∆tm = q−1 q dt x (t) tm − t I q q η>B x (t), t ≤ t, q = p + 1,  mq q q q mq t tmq (108) where  2  βq,1 + 1 0 ... 0 2  0 βq,2 + 1 ... 0  B =   , q  . . .. .   . . . .  2 0 0 . . . βq,nq + 1  2  βp+1,1 0 ... 0  0 β2 ... 0   p+1,2  Bp+1 =  . . .. .  ,  . . . .  0 0 . . . β2 p+1,np+1 for 1 ≤ q ≤ p. Proof. From the definition of the AEF (see (104)) and the derivative of the power exponential function (103) given by d x(β; t) = β(1 − t)tβ−1eβ(1−t), dt

127 130

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions expression (108) can easily be derived since differentiation is a linear oper- ation and the result can be rewritten in the compact form analogously to (107). Illustration of the AEF function and its derivative for various values of βq,k-parameters is shown in Figure 3.2.

Figure 3.2: Illustration of the AEF (solid line) and its derivative (dashed line) with different βq,k-parameters but the same Imq and tmq . (a) 0 < βq,k < 1, (b) 4 < βq,k < 5, (c) 12 < βq,k < 13, (d) a mixture of large and small βq,k- parameters.

Lemma 3.1. The AEF is continuous and at each tmq the derivative is equal to zero.

Proof. Within each interval tmq−1 ≤ t ≤ tmq the AEF is a linear combination of continuous functions and at each tmq the function will approach the same value from both directions unless all ηq,k ≤ 0, but if all ηq,k ≤ 0 then nq X ηq,k 6= 1. k=1 Noting that for any diagonal matrix B the expression nq 2 > X βq,k+1 ηq B xq(t) = ηq,kBkkxq(t) , 1 ≤ q ≤ p, k=1

128 131

3.1. THE ANALYTICALLY EXTENDED FUNCTION (AEF) is well-defined and that the equivalent statement holds for q = p and consid- ering (108) it is easy to see that the factor (tmq −t) in the derivative ensures that the derivative is zero every time t = tmq . When interpolating a waveform with p peaks it is natural to require that there will not appear new peaks between the chosen peaks. This corresponds to requiring monotonicity in each interval. One way to achieve this is given in Lemma 3.2.

Lemma 3.2. If ηq,k ≥ 0, k = 1, . . . , nq the AEF, i(t), is strictly monotonic on the interval tmq−1 < t < tmq . Proof. The AEF will be strictly monotonic in an interval if the derivative has the same sign everywhere in the interval. That this is the case follows > from (108) since every term in ηq Bq xq(t) is non-negative if ηq,k ≥ 0, k =

1, . . . , nq, so the sign of the derivative it determined by Imq .

If we allow some of the ηq,k-parameters to be negative, the derivative can change sign and the function might get an extra peak between two other peaks, see Figure 3.3.

Figure 3.3: An example of a two-peaked AEF where some of the ηq,k- parameters are negative, so that it has points where the first deriva- tive changes sign between two peaks. The solid line is the AEF and the dashed lines is the derivative of the AEF.

The integral of the electric current represents the charge flow. Unlike the Heidler function the integral of the AEF is relatively straightforward to find. How to do this is detailed in Lemma 3.3, Lemma 3.4, Theorem 3.2, and Theorem 3.3.

Lemma 3.3. For any tmq−1 ≤ t0 ≤ t1 ≤ tmq , 1 ≤ q ≤ p,

Z t1 β   β e t1 − tmq t0 − tmq xq(t) dt = β+1 ∆γ β + 1, , (109) t0 β β∆tmq β∆tmq

129 132

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

with ∆tmq = tmq − tmq−1 and

∆γ(β, t0, t1) = γ (β + 1, βt1) − γ (β + 1, βt0) , where Z t γ(β, t) = τ β−1e−τ dτ 0 is the lower incomplete Gamma function [2].

If t0 = tmq−1 and t1 = tmq then

Z tmq eβ x (t)β dt = γ (β + 1, β) . (110) q ββ+1 tmq−1

Proof.

Z t1 Z t1   β β t − tmq t − tmq xq(t) dt = exp 1 − dt t0 t0 ∆tmq ∆tmq β−1 Z t1  β   e t − tmq t − tmq = β β exp 1 − β dt. β t0 ∆tmq ∆tmq t−t Changing variables according to τ = β mq gives ∆tmq

Z t1 β Z τ1 β e β −τ xq(t) dt = β+1 τ e dt = t0 β τ0 eβ = (γ(β + 1, τ ) − γ(β + 1, τ )) ββ+1 1 0 eβ = ∆γ(β + 1, τ , τ ) ββ+1 1 0 β   e t1 − tmq t0 − tmq = β+1 ∆γ β + 1, β , β . β ∆tmq ∆tmq

When t0 = tmq−1 and t1 = tmq then

Z t1 β β e xq(t) dt = β+1 ∆γ (β + 1, β) t0 β and with γ(β + 1, 0) = 0 we get (110).

Lemma 3.4. For any tmq−1 ≤ t0 ≤ t1 ≤ tmq , 1 ≤ q ≤ p,

q−1 ! nq Z t1 X X i(t) dt = (t1 − t0) Imk + Imq ηq,k gq(t1, t0), (111) t0 k=1 k=1

130 133

3.1. THE ANALYTICALLY EXTENDED FUNCTION (AEF) where

β2 e q,k  t1 − tm t0 − tm  g (t , t ) = ∆γ β2 + 2, q−1 , q−1 q 1 0 β2 +1 q,k  2  q,k ∆tmq ∆tmq βq,k + 1 with ∆γ(β, t0, t1) defined as in (109). Proof. t t q−1 ! nq Z 1 Z 1 2 X X βq,k+1 i(t) dt = Imk + Imq ηq,kxq(t) dt t0 t0 k=1 k=1 q−1 ! nq t Z 1 2 X X βq,k+1 = (t1 − t0) Imk + Imq ηq,k xq(t) dt k=1 k=1 t0 q−1 ! nq X X = (t1 − t0) Imk + Imq ηq,k gq(t0, t1). k=1 k=1

Theorem 3.2. If tma−1 ≤ ta ≤ tma , tmb−1 ≤ tb ≤ tmb and 0 ≤ ta ≤ tb ≤ tmp then

a−1 ! n Z tb X Xa i(t) dt = (tma − ta) Imk + Ima ηa,k ga(ta, tma ) ta k=1 k=1 b−1 q−1 ! nq ! X X X 2  + ∆tmq Imk + Imq ηq,k gˆ βq,k + 1 q=a+1 k=1 k=1 b−1 ! n X Xb + (tb − tmb ) Imk + Imb ηb,k gb(tmb , tb), (112) k=1 k=1 where gq(t0, t1) is defined as in Lemma 3.4 and eβ gˆ(β) = γ (β + 1, β) . ββ+1 Proof. This theorem follows from integration being linear and Lemma 3.4.

Theorem 3.3. For tmp ≤ t0 < t1 < ∞ the integral of the AEF is

p ! np+1 Z t1 X X i(t) dt = Imk ηp+1,k gp+1(t1, t0), (113) t0 k=1 k=1 where gq(t0, t1) is defined as in Lemma 3.4.

131 134

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

When t0 = tmp and t1 → ∞ the integral becomes

Z ∞ p ! np+1 X X 2  i(t) dt = Imk ηp+1,k g˜ βp+1,k , (114) tmp k=1 k=1 where eβ g˜(β) = (Γ(β + 1) − γ (β + 1, β)) ββ+1 with Z ∞ Γ(β) = tβ−1e−t dt 0 is the Gamma function [2].

Proof. This theorem follows from integration being linear and Lemma 3.4.

In the next section we will estimate the parameters of the AEF that gives the best fit with respect to some data and for this the partial derivatives with respect to the βmq parameters will be useful. Since the AEF is a linear function of elementary functions these partial derivatives can easily be found using standard methods.

Theorem 3.4. The partial derivatives of the p-peak AEF with respect to the β parameters are

0, 0 ≤ t ≤ t ,  mq−1 ∂i  β2 +1 = 2 I η β h (t)x (t) q,k , t ≤ t ≤ t , 1 ≤ q ≤ p, ∂β mq q,k q,k q q mq−1 mq q,k  0, tmq ≤ t, (115) ( ∂i 0, 0 ≤ t ≤ tmp , = β2 ∂β p+1,k p+1,k 2 Imp+1 ηp+1,k βp+1,k hp+1(t)xp+1(t) , tmp ≤ t, (116) where  t − t  t − t  mq−1 mq−1 ln − + 1, 1 ≤ q ≤ p,  ∆tmq ∆tmq hq(t) =  t  t ln − + 1, q = p + 1.  tmq tmq

Proof. Since the βq,k parameters are independent, differentiation with re- spect to βq,k will annihilate all terms but one in each linear combination. The expressions (115) and (116) then follow from the standard rules for differentiation of composite functions and products of functions.

132 135

3.2. APPROXIMATION OF LIGHTNING DISCHARGE CURRENT FUNCTIONS

3.2 Approximation of lightning discharge current functions

This section is based on Section 3 of Paper F

Many different types of systems, objects and equipment are susceptible to damage from lightning discharges. Lightning effects are usually anal- ysed using lightning discharge models. Most of the engineering and electro- magnetic models imply channel-base current functions. Various single and multi-peaked functions are proposed in the literature for modelling lightning channel-base currents, examples include [117, 118, 140, 141, 146]. For engi- neering and electromagnetic models, a general function that would be able to reproduce desired waveshapes is needed, such that analytical solutions for its derivatives, integrals, and integral transformations, exist. A multi- peaked channel-base current function has been proposed by Javor [140] as a generalization of the so-called TRF (two-rise front) function from [141], which possesses such properties. In this paper we analyse a modification of such multi-peaked function, a so-called p -peak analytically extended function (AEF). The possibility of application of the AEF to approximation of various multi-peaked wave- shapes is investigated. Estimation of its parameters has been performed using the Marquardt least squares method (MLSM), an efficient method for the estimation of non-linear function parameters, see Section 1.2.6. It has been applied in many fields, including lightning research, e.g. for optimizing parameters of the Heidler function [178], or the Pulse function [181, 182]. Some numerical results are presented, including those for the Standard IEC 62305 [133] current of the first-positive strokes, and an example of a fast- decaying lightning current waveform. Fitting a p-peaked AEF to recorded current data (from [257]) is also illustrated.

3.2.1 Fitting the AEF

Suppose that we have kq points (tq,k, iq,k) ordered with respect to tq,k, tmq−1 < tq,1 < tq,2 < . . . < tq,kq < tmq , and we wish to choose parame- ters ηq,k and βq,k such that the sum of the squares of the residuals, kq X 2 Sq = (i(tq,k) − iq,k) , (117) k=1 is minimized. One way to estimate these parameters is to use the Marquardt least squares method described in Section 1.2.6. In order to fit the AEF it is sufficient that kq ≥ nq. Suppose we have some estimate of the β-parameters which is collected in the vector b. It is then fairly simple to calculate an estimate for the η-parameters, see Section 3.2.4,

133 136

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

which we collect in h. We define a residual vector by (e)k = i(tq,k; b, h)−iq,k where i(t; b, h) is the AEF with the estimated parameters. The Jacobian matrix, J, can in this case be described as

  ∂i ∂i ... ∂i ∂βq,1 t=t ∂βq,2 t=t ∂βq,nq t=t  q,1 q,1 q,1   ∂i ∂i ... ∂i   ∂βq,1 ∂βq,2 ∂βq,nq  J =  t=tq,2 t=tq,2 t=tq,2  (118)  . . . .   . . .. .     ∂i ∂i ... ∂i  ∂βq,1 ∂βq,2 ∂βq,nq t=tq,kq t=tq,kq t=tq,kq where the partial derivatives are given by (115) and (116).

3.2.2 Estimating parameters for underdetermined systems

This section is based on Section 3.2 of Paper E

For the Marquardt least squares method to work at least one data point per unknown parameter is needed, m ≥ k. It can still be possible to estimate all unknown parameters if there is insufficient data, m < k if we know some further relations between the parameters.

Suppose that k − m = p and let γj = βm+j, j = 1, 2, ··· , p. If there are at least p known relations between the unknown parameters such that γj = γj(β1, ··· , βm) for j = 1, 2, ··· , p then the Marquardt least squares method can be used to give estimates on β1, ··· , βm and the still unknown parameters can be estimated from these. Denoting the estimated parameters b = (b1, ··· , bm) and c = (c1, ··· , cp) the following algorithm can be used:

• Input: v > 1 and initial values b(0), λ(0).

• r = 0

/ Find c(r) using b(r) together with extra relations.

• Find b(r+1) and δ(r) using MLSM.

• Check chosen termination condition for MLSM, if it is not satisfied go to /.

• Output: b, c.

The algorithm is illustrated in Figure 3.4.

134 137

3.2. APPROXIMATION OF LIGHTNING DISCHARGE CURRENT FUNCTIONS

Input: choose v and r = 0 initial values for b(0) and λ(0)

Find b(r+1) and δ(r) Find h(r) using b(r) using MLSM together with extra relations

Termination condition NO r = r + 1 satisfied

YES Output: b, h

Figure 3.4: Schematic description of the parameter estimation algorithm.

3.2.3 Fitting with data points as well as charge flow and specific energy conditions

By considering the charge flow at the striking point, Q0, unitary resistance R and the specific energy, W0, we get two further conditions:

Z ∞ Q0 = i(t) dt, (119) 0 Z ∞ 2 W0 = i(t) dt. (120) 0

First we will define

Z ∞ Q(b, h) = i(t; b, h) dt 0 Z ∞ W (b, h) = i(t; b, h)2 dt. 0

These two quantities can be calculated as follows.

Theorem 3.5. p q−1 ! nq ! X X X 2 Q(b, h) = ∆tmq Imk + Imq ηq,k gˆ(βq,k + 1) q=1 k=1 k=1 p ! np+1 X X 2 + Imk ηp+1,k g˜(βp+1,k), (121) k=1 k=1

135 138

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

 2 p q−1 ! q−1 ! nq X X X X 2 W (b, h) =  Imk + Imk Imq ηq,k gˆ(βq,k + 1) q=1 k=1 k=1 k=1 nq 2 X 2 2  + Imq ηq,k gˆ 2 βq,k + 2 k=1 nq−1 nq  2 X X 2 2  + 2 Imq ηq,r ηq,s gˆ βq,r + βq,s + 2  r=1 s=r+1

p !2 np X X 2 2  + Imk ηp,k g˜ 2 βp,k k=1 k=1 np+1−1 np+1  X X 2 2  + 2 ηp+1,r ηp+1,s g˜ βp+1,r + βp+1,s  r=1 s=r+1 (122) where gˆ(β) and g˜(β) are defined in Theorems 3.2 and 3.3.

Proof. Formula (121) is found by combining (112) and (113). Formula (122) is found by noting that

n !2 n n−1 n X X 2 X X ak = ak + ar as k=1 k=1 r=1 s=r+1 and then reasoning analogously to the proofs for (112) and (113).

We can calculate the charge flow and specific energy given by the AEF with formulas (121) and (122), respectively, and get two additional residual terms EQ0 = Q(b, h) − Q0 and EW0 = W (b, h) − W0. Since these are global conditions this means that the parameters η and β no longer can be fitted separately in each interval. This means that we need to consider all data points simultaneously. The resulting J-matrix is

  J1 ... 0  . .. .   . . .    J =  0 ... Jp+1  (123)  ∂E ∂E ∂E ∂E   Q0 ... Q0 ... Q0 ... Q0   ∂β1,1 ∂β1,n1 ∂βp+1,1 ∂βp+1,np+1   ∂E ∂E ∂E ∂E  W0 ... W0 ... W0 ... W0 ∂β1,1 ∂β1,n1 ∂βp+1,1 ∂βp+1,np+1

136 139

3.2. APPROXIMATION OF LIGHTNING DISCHARGE CURRENT FUNCTIONS where   ∂i ∂i ... ∂i ∂βq,1 t=t ∂βq,2 t=t ∂βq,nq t=t  q,1 q,1 q,1   ∂i ∂i ... ∂i   ∂βq,1 ∂βq,2 ∂βq,nq  J =  t=tq,2 t=tq,2 t=tq,2  q  . . . .   . . .. .     ∂i ∂i ... ∂i  ∂βq,1 ∂βq,2 ∂βq,nq t=tq,kq t=tq,kq t=tq,kq and the partial derivatives in the last two rows are given by  dˆg 2 Imq ηq,s βq,s , 1 ≤ q ≤ p,  dβ 2 ∂Q β=βq,s+1 = d˜g ∂βq,s  2 Imp ηp+1,s βp+1,s , q = p + 1.  dβ 2 β=βp+1,s For 1 ≤ q ≤ p

q−1 ! ∂W X dˆg = 2 Imk Imq ηq,s βq,s ∂βq,s dβ 2 k=1 β=βq,s+1   nq 2  dˆg X dˆg  + 4 Imq ηq,sβq,s ηq,s + ηq,k   dβ 2 dβ 2 2  β=2βq,s+2 k=1 β=βq,s+βq,k+2 k6=s and p ! ∂W X = 4 Imk ηp+1,sβp+1,s ∂βp+1,s k=1   nq  d˜g X d˜g  ηp+1,s + ηp+1,k  .  dβ 2 dβ 2 2  β=2βp+1,s k=1 β=βp+1,s+βp+1,k k6=s

The derivatives ofg ˆ(β) andg ˜(β) are dˆg 1  eβ  = 1 + Γ(β + 1)Ψ(β) − ln(β) − G(β) , (124) dβ e ββ d˜g 1  eβ  = G(β) − 1 , (125) dβ e ββ where Γ(β) is the Gamma function, Ψ(β) is the digamma function, see for example [2], and G(β) is a special case of the Meijer G-function and can be defined as   3,0 1, 1 G(β) = G β 2,3 0, 0, β + 1

137 140

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions using the notation from [236]. When evaluating this function it might be more practical to rewrite G using other   β+1 3,0 1, 1 β G(β) = G β = 2F2(β + 1, β + 1; β + 2, β + 2; −β) 2,3 0, 0, β + 1 (β + 1)2 π csc (πβ) − Ψ(β) + π cot(πβ) + ln(β) Γ(−β) where ∞ X (β + 1)2 F (β + 1, β + 1; β + 2, β + 2; −β) = (−1)kβk 2 2 (β + k + 1)2 k=0 is a special case of the hypergeometric function. These partial derivatives were found using software for symbolic computation [200]. Note that all η-parameters must be recalculated for each step and how this is done is detailed in Section 3.2.4.

3.2.4 Calculating the η-parameters from the β-parameters

Suppose that we have nq − 1 points (tq,k, iq,k) such that tmq−1 < tq,1 < tq,2 < . . . < tq,nq−1 < tmq . For an AEF that interpolates these points it must be true that q−1 nq X X βq,s Imk + Imq ηq,sxq(tq,k) = iq,k, k = 1, 2, . . . , nq − 1. (126) k=1 s=1

Since ηq,1 + ηq,2 + ... + ηq,nq = 1 equation (126) can be rewritten as n −1 q−1 q   X βq,s βq,nq βq,nq X Imq ηq,s xq(tq,k) − xq(tq,k) = iq,k − xq(tq,k) − Ims s=1 s=1 (127) for k = 1, 2, . . . , nq − 1. This can be written as a matrix equation ˜ ˜ Imq Xqη˜q = iq, (128) q−1 >     ˜ βq,nq X η˜q = ηq,1 ηq,2 . . . ηq,nq−1 , iq = iq,k − xq(tq,k) − Ims , k s=1   βq,s βq,n X˜q =x ˜q(k, s) = xq(tq,k) − xq(tq,k) q , k,s and xq(t) given by (105). When all βq,k, k = 1, 2, . . . , nq are known then ηq,k, k = 1, 2, . . . , nq − 1 can nq−1 X be found by solving equation (128) and ηq,nq = 1 − ηq,k. k=1

138 141

3.2. APPROXIMATION OF LIGHTNING DISCHARGE CURRENT FUNCTIONS

If we have kq > nq − 1 data points then the parameters can be estimated with the least squares solution to (128), more specifically the solution to 2 ˜ > ˜ ˜ >˜ Imq Xq Xqη˜q = Xq iq.

3.2.5 Explicit formulas for a single-peak AEF t Consider the case where p = 1, n1 = n2 = 2 and τ = . Then the explicit tm1 formula for the AEF is ( β2 +1 (β2 +1)(1−τ) β2 +1 (β2 +1)(1−τ) i(τ) η1,1 τ 1,1 e 1,1 + η1,2 τ 1,2 e 1,2 , 0≤τ ≤1, = 2 2 2 2 (129) β2,1 β2,1(1−τ) β2,2 β2,2(1−τ) Im1 η2,1 τ e + η2,2 τ e , 1≤τ.

Assume that four datapoints, (ik, τk), k = 1, 2, 3, 4, as well as the charge flow Q0 and specific energy W0, are known. If we want to fit the AEF to this data using MLSM equation (123) gives   f1(τ1) f2(τ1) 0 0  f1(τ2) f2(τ2) 0 0     0 0 g (τ ) g (τ )   1 3 2 3   0 0 g (τ ) g (τ )  J =  1 4 2 4  ,  ∂ ∂ ∂ ∂   Q(β, η) Q(β, η) Q(β, η) Q(β, η)   ∂β1,1 ∂β1,2 ∂β2,1 ∂β2,2   ∂ ∂ ∂ ∂   W (β, η) W (β, η) W (β, η) W (β, η) ∂β1,1 ∂β1,2 ∂β2,1 ∂β2,2 β2 +1 (β2 +1)(1−τ)  fk(τ) = 2 η1,k β1,kτ 1,k e 1,k ln(τ) + 1 − τ , i1 β2 2 1,2 (β1,2+1)(1−τ1) η1,1 = − τ1 e , η1,2 = 1 − η1,1, Im1 β2 β2 (1−τ)  gk(τ) = 2 η2,k β2,kτ 2,k e 2,k ln(τ) + 1 − τ , i3 β2 2 2,2 β1,2(1−τ3) η2,1 = − τ3 e , η2,2 = 1 − η2,1, Im1  2  2  2 2  β = β1,1 + 1 β1,2 + 1 β2,1 β2,2 ,   η = η1,1 η1,2 η2,1 η2,2 , 2 2 β1,s Q(β, η) X e 2 2  = η1,s 2 γ β1,s + 2, β2,s + 1 I  β1,s+1 m1 s=1 2 β1,s + 1 2 2 β2,s X e 2  2 2  + η2,s 2 Γ β2,s + 1 − γ β2,s + 1, β2,s , 2β2,s+1 s=1 β2,s  dˆg 2 I η β , q = 1,  m1 1,s 1,s ∂Q  dβ β=β2 +1 = 1,s d˜g ∂βq,s  2 Imq ηp,s β2,s , q = 2,  dβ 2 β=β2,s

139 142

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions with derivatives ofg ˆ(β) andg ˜(β) given by (124) and (125),

 2 2  2 2  2 2 2 2  βe = β1,1 + β1,2 + 2 β1,1 + β1,2 + 2 (β2,1 + β2,2)(β2,1 + β2,2) ,  2 2 2 2  ηb = η1,1 η1,2 η2,1 η2,2 ,   ηe = (η1,1η1,2)(η1,1η1,2)(η2,1η2,2)(η2,1η2,2) , ∂ ∂ ∂   W (β, η) = 2 βq,s Q (2β, ηb) + β  Q βe, ηe . ∂βq,s ∂βq,s q, (s−1 mod 2)+1 ∂βq,s

3.2.6 Fitting to lightning discharge currents This section is based on Section 4 of Paper F

In this section some results of fitting the AEF to a few different waveforms will be demonstrated. Some single-peak waveforms given by Heidler func- tions in the IEC 62305-1 standard [133] will be approximated using the AEF, and furthermore, fitting the multi-peaked waveform to experimental data will be presented.

Single-peak waveforms In this section some numerical results of fitting the AEF function to single- peak waveshapes are presented and compared with the corresponding fitting of the Heidler function. The AEF given by (129) is used to model few light- ning current waveshapes whose parameters (rise/decay time ratio, T1/T2, peak current value, Im1, time to peak current, tm1, charge flow at the strik- ing point, Q0, specific energy, W0, and time to 0.1Im1, t1) are given in Table 3.1. Data points were chosen as follows:

(i1, τ1) = (0.1 Im1 , t1),

(i2, τ2) = (0.9 Im1 , t2 = t1 + 0.8 T1),

(i3, τ3) = (0.5 Im1 , th = t1 − 0.1 T1 + T2),

(i4, τ4) = (i(1.5 th), 1.5 th).

The AEF representation of the waveshape denoted as the first positive stroke current in IEC 62305 standard [133], is shown in Figure 3.5. Ris- ing and decaying parts of the first negative stroke current from IEC 62305 standard [133] are shown in Figure 3.6, left and right, respectively. β and η parameters of both waveshapes optimized by the MLSM are given in Ta- ble 3.1. We have also observed a so-called fast-decaying waveshape whose pa- rameters are given in Table 3.1. It’s representation using the AEF function is shown in Figure 3.7, and corresponding β and η parameter values in Ta- ble 3.1.

140 143

3.2. APPROXIMATION OF LIGHTNING DISCHARGE CURRENT FUNCTIONS

Figure 3.5: First-positive stroke represented by the AEF function. Here it is fitted with respect to both the data points as well as Q0 and W0.

Figure 3.6: First-negative stroke represented by the AEF function. Here it is fitted with the extra constraint 0 ≤ η ≤ 1 for all η-parameters.

Apart from the AEF function (solid line), the Heidler function represen- tation of the same waveshapes (dashed line), and used data points (red solid circles) are also shown in the figures.

Multi-peaked AEF waveforms for measured data

In this section the AEF will be constructed by fitting to measured data rather than approximation of the Heidler function. We will use data based on the measurements of flash number 23 in [257]. Two AEFs have been constructed, one by choosing peaks corresponding to local maxima, see Fig- ure 3.8, and one by choosing peaks corresponding to local maxima and local minima, see Figure 3.9. For both AEFs there are two terms in each interval which means that for each peak there are two parameters that are chosen manually (time and current for each peak) and for each interval there are two parameters that are fitted using the MLSM. The AEF in Figure 3.8 demonstrates that the AEF can handle cases where the function is not constant or monotonically increasing/decreasing between peaks. This is only possible if the AEF has more than one term in the interval.

141 144

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Figure 3.7: Fast-decaying waveshape represented by the AEF function. Here it is fitted with the extra constraint 0 ≤ η ≤ 1 for all η-parameters.

Figure 3.8: AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima in the measured data.

Conclusions

This section investigated the possibility to approximate, in general, multi- peaked lightning currents using an AEF function. Furthermore, existence of the analytical solution for the derivative and the integral of such function has been proven, which is needed in order to perform lightning electromagnetic field (LEMF) calculations based on it. Two single-peak Standard IEC 62305-1 waveforms, and a fast-decaying one, have been represented using a variation of the proposed AEF function (129). The estimation of their parameters has been performed applying the MLS method using two pairs of data points for each function part (rising and decaying). The results show that there are several factors that need to be taken into consideration to get the best possible approximation of a given waveform. The accuracy of the approximation varies with the chosen data points and the number of terms in the AEF. In several cases the two-term sum converged towards a single term sum. This can probably be improved by choosing the number of terms and the number and placement of data points in other ways which the authors intend to examine further. Further

142 145

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

First-positive First-negative Fast-decaying stroke stroke T1/T2 10/350 1/200 8/20 tm1 [µs] 31.428 3.552 15.141 Im1 [kA] 200 100 0.001 Q0 [C] 100 // W0 [MJ/Ω] 10 // t1 [µs] 14.5 1.47 6.34 β1,1 0.114 1.84 7.666 β1,2 2.17 9.99 2.626 β2,1 0.284 0.099 0.925 β2,2 0 0.127 2.420 η1,1 −0.197 1 0 η1,2 1.197 0 1 η2,1 1 0.401 0.2227 η2,2 0 0.599 0.7773

Table 3.1: AEF function’s parameters for some current waveshapes examples of fitted (single- and multi-peaked) waveforms can be found in [189] and [143].

3.3 Approximation of electrostatic discharge cur- rents using the AEF by interpolation on a D- optimal design

This section is based on Paper G

In this section we analyse the applicability of the AEF with p peaks to representation of ESD currents by interpolation of data points chosen ac- cording to a D-optimal design. This is illustrated through examples from two applications. The first application is modelling of ESDs from IEC stan- dards commonly used in electrostatic discharge immunity testing, and the second modelling of lightning discharges. For the ESD immunity testing application we model the IEC Standard 61000-4-2 waveshape, [131, 132] and an experimentally measured ESD cur- rent from [151]. For the lightning discharge application we model the IEC 61312-1 stan- dard waveshape [117,134] and a more complex measured lightning discharge current from [69]. We also use the same method to approximate a measured derivative of a lightning discharge current derivative from [130].

143 146

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Figure 3.9: AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima and minima in the measured data.

In both applications the basic properties of the current (or current deriva- tive) are the same, these properties and how they are modelled with the AEF is discussed in the next section.

Multi-peaked analytically extended function A so-called multi-peaked analytically extended function (AEF) has been proposed and applied to lightning discharge current modelling in Section 3.1 and [183]. Initial considerations on applying the function to ESD currents have also been made in [189]. The AEF consists of scaled and translated power-exponential functions, that is functions of the form x(β; t) = te1−tβ, see Definition 3.1. Here we define the AEF with p peaks as q−1 nq X X i(t) = Imk + Imq ηq,kxq,k(t), (130) k=1 k=1

for tmq−1 ≤ t ≤ tmq , 1 ≤ q ≤ p, and p np+1 X X Imk ηp+1,kxp+1,k(t), (131) k=1 k=1

for tmp ≤ t.

The current value of the first peak is denoted by Im1 , the difference between each pair of subsequent peaks by Im2 ,Im3 ,...,Imp , and their cor- responding times by tm1 , tm2 , . . . , tmp . In each time interval q, with 1 ≤ q ≤ p + 1, the number of terms is given by nq, 0 < nq ∈ Z. Parameters ηq,k are nq X such that ηq,k ∈ R for q = 1, 2, . . . , p + 1, k = 1, 2, . . . , nq and ηq,k = 1. k=1

144 147

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

Furthermore xq,k(t), 1 ≤ q ≤ p + 1 is given by

  t−tmq−1  x βq,k; , 1 ≤ q ≤ p, tmq −tmq−1 xq,k(t) = (132)  t  x βq,k; , q = p + 1. tmq

Remark 3.2. When previously applying the AEF, see Section 3.1.1, all exponents (β-parameters) of the AEF were set to β2+1 in order to guarantee that the derivative of the AEF is continuous. Here this condition will be satisfied in a different manner. Since the AEF is a linear function of elementary functions its derivative and integral can be found using standard methods. For explicit formulae please refer to Theorems 3.1–3.3. Previously, the authors have fitted AEF functions to lightning discharge currents and ESD currents using the Marquardt least square method but have noticed that the obtained result varies greatly depending on how the waveforms are sampled. This is problematic, especially since the methodol- ogy becomes computationally demanding when applied to large amounts of data. Here we will try one way to minimize the data needed but still enough to get an as good approximation as possible. The method examined here will be based on D-optimality of a regression model. A D-optimal design is found by choosing sample points such that the determinant of the Fisher information matrix of the model is minimized. For a standard linear regression model this is also equivalent, by the so- called Kiefer-Wolfowitz equivalence criterion, to G-optimality which means that the maximum of the prediction variance will be minimized. These are standard results in the theory of optimal experiment design and a summary can be found in for example [208]. Minimizing the prediction variance will in our case mean maximizing the robustness of the model. This does not guarantee a good approximation but it will increase the chances of the method working well when working with limited precision and noisy data and thus improve the chances of finding a good approximation when it is possible.

3.3.1 D-optimal approximation for exponents given by a class of arithmetic sequences It can be desirable to minimize the number of points used when construct- ing the approximation. One way of doing this is choosing the D-optimal sampling points. In this section we will only consider the case where in each interval the n exponents, β1,..., βn, are chosen according to k + m − 1 β = , m = 1, 2, . . . , n m c

145 148

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions where k is a non-negative integer and c a positive real number. Note that in order to guarantee continuity of the AEF derivative the condition k > c has to be satisfied. In each interval we want an approximation of the form n X βi βi(1−t) y(t) = ηit e i=1

1−t l and by setting z(t) = (te ) c we obtain n X k+i−1 y(t) = ηiz(t) . i=1

If we have n sample points, ti, i = 1, . . . , n, then the Fisher information matrix, M, of this system is M = U >U where

 k k k  z(t1) z(t2) . . . z(tn) k+1 k+1 k+1  z(t1) z(t2) . . . z(tn)  U =   .  . . .. .   . . . .  k+n−1 k+n−1 k+n−1 z(t1) z(t2) . . . z(tn)

Thus if we want to maximize det(M) = det(U)2 it is sufficient to maximize 1−t l or minimize the determinant det(U). Set z(ti) = (tie i ) c = xi then

n !   Y k Y un(t1, . . . , tn) = det(U) = xi  (xj − xi) . (133) i=1 1≤i

To find ti we will use the Lambert W function. Formally the Lambert W function is the function W that satisfies t = W (tet). Using W we can invert z(t) in the following way te1−t = xc ⇔ −te−t = −e−1xc ⇔ t = −W (−e−1xc). (134)

The Lambert W is multivalued but since we are only interested in real- valued solutions we are restricted to the main branches W0 and W−1. Since W0 ≥ −1 and W−1 ≤ −1 the two branches correspond to the rising and decaying parts of the AEF respectively. We will deal with the details of finding the correct points for the two parts separately.

3.3.2 D-optimal interpolation on the rising part The D-optimal points on the rising part can be found using Theorem 3.6.

146 149

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

Theorem 3.6. The determinant

n !   Y k Y un(k; x1, . . . , xn) = xi  (xj − xi) i=1 1≤i

Proof. Without loss of generality we can assume 0 < x1 < x2 < . . . < xn−1 < xn ≤ 1. Fix all xi except xn. When xn increases all factors of un that contain xn will also increase, thus un will reach its maximum value on the edge of the cube where xn = 1. Using the method of Lagrange multipliers in the plane given by xn = 1 gives   n ∂un k X 1 = u (k; x , . . . , x )  +  = 0, ∂x n 1 n x x − x  j j i=1 j i i6=j n Y for j = 1, . . . , n − 1. By setting f(x) = (x − xi) we get i=1 n 00 k X 1 k 1 f (xj) + = 0 ⇔ + = 0 x x − x x 2 f 0(x ) j i=1 j i j j i6=j 00 0 ⇔ xjf (xj) + 2kf (xj) = 0 (135) for j = 1, . . . , n − 1. Since f(x) is a polynomial of degree n that has x = 1 as a root then equation (135) implies f(x) xf 00(x) + 2kf 0(x) = c x − 1 where c is some constant. Set f(x) = (x−1)g(x) and the resulting differential equation is x(x − 1)g00(x) + ((2k + 2)x − 2k)g0(x) + (2k − c)g(x) = 0. The constant c can be found by examining the terms with degree n − 1 and is given by c = 2k + (n − 1)(2k + n), thus x(1 − x)g00(x) + (2k − (2k + 2)x)g0(x) +(n − 1)(2k + n)g(x) = 0. (136)

147 150

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Comparing (136) with the standard form of the hypergeometric function [2] x(1 − x)g00(x) + (c − (a + b + 1)x)g0(x) − abg(x) = 0 shows that g(x) can be expressed as follows g(x) = C · 2F1(1 − n, 2k + n; 2k, x) n−1 (2k)n−1 X n − 1(2k + n)i = C · (−1)i xi (n − 1)! i i i=0 (2k) where C is an arbitrary constant and since we are only interested in the roots of the polynomial we can chose C so that it gives the desired form of the expression. The connection to the Jacobi polynomial is given by [2] m! F (−m, 1 + α + β + n; α + 1; x) = P (α,β)(1 − 2x), 2 1 (α + 1)m m and α = 2k − 1, β = 0, m = n − 1 gives the expression in Theorem 3.6. (a,b) Note that the Pn (x) are orthogonal polynomials on the interval [−1, 1] with respect to the weight function (1 − x)a(1 + x)b and thus all of its zeros will be real, distinct and located in [−1, 1], see [48]. Thus all zeros of the polynomial given in Theorem 3.6 will be real, distinct and located in the interval [0, 1].

We can now find the D-optimal t-values using the upper branch of the Lambert W function as described in equation (134),

−1 c ti = −W0(−e xi ), where xi are the roots of the Jacobi polynomial given in Theorem 3.6. Since −1 −1 ≤ W0(x) ≤ 0 for −e ≤ x ≤ 0 this will always give 0 ≤ ti ≤ 1.

Remark 3.3. Note that xn = 1 means that tn = tq and also is equivalent nq X to the condition ηq,r = 1. In other words, we are interpolating the peak r=1 and p − 1 points inside each interval.

3.3.3 D-optimal interpolation on the decaying part Finding the D-optimal points for the decaying part is more difficult than it is for the rising part. Suppose we denote the largest value for time that can reasonably be used (for computational or experimental reasons) with tmax. 1 This corresponds to some value xmax = (tmax exp(1 − tmax)) c . Ideally we n would want a corresponding theorem to Theorem 3.6 over [1, xmax] instead n of [0, 1] . It is easy to see that if xi = 0 or xi = 1 for some 1 ≤ xi ≤ n − 1 then wn(k; x1, . . . , xn) = 0 and thus there must exist some local extreme

148 151

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

point such that 0 < x1 < x2 < . . . < xn−1 < 1. This is no longer guaranteed n when considering the volume [1, xmax] instead. Therefore we will instead n extend Theorem 3.6 to the volume [0, xmax] and give an extra constraint on the parameter k that guarantees 1 < x1 < x2 < . . . < xn−1 < xn = xmax.

Theorem 3.7. Let y1 < y2 < . . . < yn−1 be the roots of the Jacobi poly- (2k−1,0) nomial Pn−1 (1 − 2y). If k is chosen such that 1 < xmax · y1 then the determinant wn(k; x1, . . . , xn) given in Theorem 3.6 is maximized or min- n imized on the cube [1, xmax] (where xmax > 1) when xi = xmax · yi and xn = xmax, or some permutation thereof. Proof. This theorem follows from Theorem 3.6 combined with the fact that wn(k; x1, . . . , xn) is a homogeneous polynomial. Since wn(k; b · x1, . . . , c · k+ n(n−1) n xn) = b 2 ·wn(k; x1, . . . , xn) if (x1, . . . , xn) is an extreme point in [0, 1] n then (b·x1, . . . , b·xn) is an extreme point in [0, b] . Thus by Theorem 3.6 the points given by xi = xmax · yi will maximize or minimize wn(k; x1, . . . , xn) n on [0, xmax] . Remark 3.4. It is in many cases possible to ensure the condition 1 < (2k−1,0) xmax · y1 without actually calculating the roots of Pn−1 (1 − 2y). In the literature on orthogonal polynomials there are many expressions for upper and lower bounds of the roots of the Jacobi polynomials. For instance in [74] an upper bound on the largest root of a Jacobi polynomial is given and can be, in our case, rewritten as 3 y > 1 − 1 4k2 + 2kn + n2 − k − 2n + 1 and thus 3 1 1 − 2 2 > 4k + 2kn + n − k − 2n + 1 xmax guarantees that 1 < xmax · y1. If a more precise condition is needed there are expressions that give tighter bounds of the largest root of the Jacobi polynomials, see [179]. We can now find the D-optimal t-values using the lower branch of the Lambert W function as in equation (134),

−1 c ti = −W−1(−e xi ), where xi are the roots of the Jacobi polynomial given in Theorem 3.6. Since −1 −1 ≤ W−1(x) < −∞ for −e ≤ x ≤ 0 this will always give 1 ≤ ti < tmax = −1 −W−1(−e xmax) so xmax is given by the highest feasible t.

Remark 3.5. Note that here just like in the rising part tn = tp which means that we will interpolate to the final peak as well as p − 1 points in the decaying part.

149 152

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

3.3.4 Examples of models from applications and experiments Here we will apply the described scheme to two different applications, mod- elling of ESD currents commonly used in electrostatic discharge immunity testing and modelling of lightning discharge currents. The values of n and peak-times have been chosen manually, and k and c have been chosen by first fixing k and then numerically finding a c that gave a close approximation. For this purpose we used software for numerical computing [205], based on the interior reflective Newton method described in [55, 56]. This is then repeated for k = 1,..., 10 and the best fitting set of parameters is chosen. Note that this methodology uses all available data points to evaluate fitting but could probably be simplified further. For example, by using a simpler method for choosing c given k, only use a subset of available points to asses accuracy or, with sufficient experimentation find some suitable heuristic for choosing the appropriate value of k. Since the waveforms are given as data rather than explicit functions the D-optimal points have been calculated and then the closest available data points have been chosen. In these examples the coefficients in the linear sums can be negative.

3.3.5 Modelling of ESD currents The IEC 61000-4-2 standard current waveshape All ESD generators used in testing of equipment and devices must be able to reproduce the same ESD current waveshape. The requirements for this waveshape are given in the IEC 61000-4-2 Standard, [132]. The IEC 61000-4-2 Standard gives a graphical representation of the typ- ical ESD current, Figure 3.10, and also defines, for a given test level voltage, required values of ESD current’s key parameters. The values of the ESD currents key parameters are listed in Table 3.2 for the case of the contact discharge, where Ipeak is the ESD current initial peak, tr is the rising time defined as the difference between time moments corresponding to 10% and 90% of the current peak Ipeak, I30 and I60 are the ESD current values calculated for time periods of 30 and 60 ns, respectively, starting from the time point corresponding to 10% of Ipeak. In this section we present the results of fitting a 2-peak AEF to the Standard ESD current given in IEC 61000-4-2. Data points which are used in the optimization procedure are manually sampled from the graphically given Standard [132] current function, Figure 3.10. The peak currents and corresponding times are also extracted, and the results of D-optimal inter- polation with two peaks are illustrated, see Figure 3.11. The parameters are listed in Table 3.3. In the illustrated examples a fairly good fit is found but typically areas with steeply rising and decaying parts are somewhat more difficult to fit with good accuracy than the other parts of the waveform.

150 153

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

U [kV] Ipeak [A] tr [ns] 2 7.5 ± 15% 0.8 ± 25% 4 15.0 ± 15% 0.8 ± 25% 6 22.5 ± 15% 0.8 ± 25% 8 30.0 ± 15% 0.8 ± 25%

U [kV] I30 [A] I60 [A] 2 4.0 ± 30% 2.0 ± 30% 4 8.0 ± 30% 4.0 ± 30% 6 12.0 ± 30% 6.0 ± 30% 8 16.0 ± 30% 8.0 ± 30%

Figure 3.10: IEC 61000-4-2 Standard ESD Table 3.2: IEC 61000-4-2 current waveform with parameters, [132] standard ESD current parame- (image slightly modified for clarity). ters [132].

Local maxima and minima and corresponding times extracted from IEC 61000-4-2, [132] 15 IEC 61000-4-2 Peak current [A] Peak time [ns] 2-peaked AEF Peaks Imax1 = 15 tmax1 = 6.89 Interpolated points 10 Imin1 = 7.1484 tmin1 = 12.85

) Imax2 = 9.0921 tmax2 = 25.54 t ( i Parameters of interpolated AEF 5 Interval n k c

0 ≤ t ≤ tmax1 3 1 0.01385

0 tmax1 ≤ t ≤ tmax2 3 4 2.025 0 2 4 6 8 t < t 5 10 2.395 t [s] #10-8 max2 Figure 3.11: 2-peaked AEF interpo- Table 3.3: Parameters’ values of lated on a D-optimal design represent- 2-peaked AEF representing the IEC ing the IEC 61000-4-2 Standard ESD 61000-4-2 Standard ESD current current waveshape for 4 kV. waveshape for 4 kV.

151 154

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

3-peaked AEF representing measured current from ESD In this section we present the results of fitting a 3-peaked AEF to a waveform from experimental measurements from [151]. The result is also compared to a common type of function used for modelling ESD current, also from [151]. In Figures 3.12 and 3.13 the results of the interpolation of D-optimal points are shown together with the measured data, as well as a sum of two Heidler functions that was fitted to the experimental data in [151]. This function is given by n n  t  H  t  H τ1 − t τ3 − t i(t) = I e τ2 + I e τ4 , 1  nH 2  nH 1 + t 1 + t τ1 τ3

I1 = 31.365 A,I2 = 6.854 A, nH = 4.036,

τ1 = 1.226 ns, τ2 = 1.359 ns,

τ3 = 3.982 ns, τ4 = 28.817 ns.

Note that this function does not reproduce the second local minimum but that all three AEF functions can reproduce all local minima and maxima (to a modest degree of accuracy) when suitable values for the n, k and c parameters are chosen. In Figure 3.13 we can see that even small bumps in the rising part are successfully reproduced.

3.3.6 Modelling of lightning discharge currents IEC 61312-1 standard current waveshape In this section we use the scheme to represent the IEC 61312-1 Standard current wave shape as it is described in [117]. Rather than being given graphically, as the IEC 61000-4-2 Standard current waveform, the shape is described using a Heidler function,

t n Ipeak T − t i(t) = e τ (137) η t n 1 + T whose parameters are chosen according to Table 3.4. In Figures 3.14 and 3.15 the results of fitting an AEF by interpolating on a D-optimal design to the first stroke of a protection level I IEC 61312-1 Standard waveshape are shown. The parameters of the fitted AEF are given in Table 3.6. In this case the waveshape can be reproduced fairly well but gives a relatively complicated expression compared to (137).

152 155

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

Modelling a measured lightning discharge current In this section we fit a 13-peaked AEF function both with free parameters (as in [145]) and using interpolation on a D-optimal design, to data extracted from [130] that comes from measurements of a lightning strike on Mount S¨antis in Switzerland [245]. The results are shown in Figures 3.16 (a)–3.16 (d). It can be seen that in most cases the AEF with free parameters gives a closer fit but the version interpolated on a D-optimal design is often comparable. Parameters for the D-optimal fitting can be found in Table 3.7.

Modelling the lightning discharge current derivative Here we present some results when attempting to reproduce the derivative of the waveshape of the lightning discharge current using the AEF interpolated on a D-optimal design. We also compare the result of this fitting scheme to the results in [190] where the parameters of the AEF are chosen freely and fitted using the Marquardt Least Squares Method. The method for fitting an AEF described here is applied to the modelling of lightning current derivative signals measured at the CN Tower [130]. The results of the fitting can be seen in Figure 3.17. From these figures it is clear that in this case of several peaks and few terms in each interval the two schemes for fitting the AEF are often similar in quality but sometimes the extra flexibility offered when letting all the exponents in the AEF be chosen individually can give a significantly better fit, an example of this can be seen in Figure 3.17. A possible explanation for this in this case is that in the scheme for D-optimal fitting you need many terms in order to have both small and large exponents. In Figure 3.18 we examine how well the different fitting schemes model the current when they are integrated. Here we can see that the free parameter version gives a considerably better matching to the numerically integrated measured values than the D-optimal fitting version.

Protection level Parameter First stroke Subsequent stroke n 10 10 T 19.0 µs 0.454 µs τ 485 µs 143 µs η 0.930 0.993

I Ipeak 200 kA 50 kA

II Ipeak 150 kA 37.5 kA

III-IV Ipeak 100 kA 25 kA

Table 3.4: IEC 61312-1 standard current key parameters, [134].

153 156

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Measured data 6 3-peaked AEF Two Heidler function

) Peaks t 4 ( i Interpolated points 2

0 0 1 2 3 4 5 6 7 8 9 t [s] #10-8 Figure 3.12: 3-peaked AEF interpolated to a D-optimal design from measured ESD current from [151, Figure 3] compared with an approxima- tion suggested in [151]. Parameters are given in Table 3.5.

6 Figure 3.13: Close-up of the rising

) part of a 3-peaked AEF interpolated t 4 ( i to a D-optimal design from measured Measured data ESD current from [151, Figure 3]. 3-peaked AEF 2 Two Heidler function Parameters are given in Table 3.5. Peaks Interpolated points 0 0 2 4 t [s] #10-9

Local maxima and corresponding times extracted from [151, Figure 3]

Imax1 = 7.37 [A] Imax2 = 5.02 [A] Imax3 = 3.82 [A] tmax1 = 1.23 [ns] tmax2 = 6.39 [ns] tmax3 = 15.5 [ns] Parameters of interpolated AEF shown in Figure 3.12 Interval n k c

0 ≤ t ≤ tmax1 5 5 0.05750 tmax1 ≤ t ≤ tmax2 3 1 0.4920 tmax2 ≤ t ≤ tmax3 4 2 0.5967 tmax3 < t 6 1 1.019

Table 3.5: Parameters’ values of AEF with 3 peaks representing measured ESD current from [151, Figure 3].

154 157

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

IEC 61312-1 D-optimal AEF 80 Peak Interpolated sample points 60 ) t ( i 40

20

0 0 0.5 1 1.5 2 2.5 3 t [s] #10-3 Figure 3.14: AEF with 1 peak fitted by interpolating D-optimal points sampled from the Heidler function describing the IEC 61312-1 waveshape given by (137). Parameters are given in Table 3.6.

IEC 61312-1 D-optimal AEF 80 Peak Interpolated points Figure 3.15: Close-up of the rising part of 60 the AEF with 1 peak fitted by interpolating ) t

( D-optimal points samples from the Heidler i 40 function describing the IEC61312-1 wave- shape given by (137). 20 Parameters are given in Table 3.6.

0 0 1 2 3 t [s] #10-5

Chosen peak time and current tmax = 28.14 [µs] I = 92.54 [kA] Parameters of interpolated AEF shown in Figure 3.14 Interval n k c

0 ≤ t ≤ tmax 4 10 0.7565 tmax < t 5 1 41.82

Table 3.6: Parameters’ values of AEF representing the IEC 61312-1 standard waveshape.

155 158

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Measured data Residual D-optimal AEF 10 D-optimal AEF 1.2 Residual MLSM AEF . Free parameter AEF 8 Peaks 1 Interpolated sample points 6 0.8 ) ) t t ( ( i i 4 0.6

2 0.4

0 0.2 0 0 2 4 6 8 0 2 4 6 8 t [s] #10-4 t [s] #10-4 (a) Comparison of data and AEFs from (b) Residuals for the AEFs and data from t = −0.3437 µs to t = 888.1 µs. t = −0.3437 µs to t = 888.1 µs.

Residual D-optimal AEF 10 1.2 Residual MLSM AEF .

8 1

6 0.8 ) ) t t ( ( i i 4 0.6 Measured data D-optimal AEF 0.4 2 Free parameter AEF Peaks 0.2 0 Interpolated sample points 0 0 2 4 6 8 0 2 4 6 8 t [s] #10-6 t [s] #10-6 (c) Comparison of data and AEFs from (d) Residuals for the AEFs and data from t = −0.3437 µs to t = 9.280 µs. t = −0.3437 µs to t = 9.280 µs.

Figure 3.16: Comparison of two AEFs with 13 peaks and 2 terms in each interval fitted to measured lightning discharge current derivative from [69]. One is fitted by interpolation on D-optimal points and the other is fitted with free parameters using the MLSM method. Parameters of the D-optimal version are given in Table 3.7.

156 159

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

Peak times and currents Parameters of fitted AEF t [µs] I [µs] Interval n k c t1 = 0.3998 I1 = 8.159 0 ≤ t ≤ t1 2 2 0.4773 t2 = 0.9468 I2 = 10.96 t1 ≤ t ≤ t2 2 10 2.148 t3 = 1.458 I3 = 11.14 t2 ≤ t ≤ t3 2 1 0.3964 t4 = 1.873 I4 = 10.26 t3 ≤ t ≤ t4 2 1 0.2210 t5 = 2.475 I5 = 10.07 t4 ≤ t ≤ t5 2 10 1.695 t6 = 2.904 I6 = 9.819 t5 ≤ t ≤ t6 2 1 0.4591 t7 = 3.533 I7 = 8.519 t6 ≤ t ≤ t7 2 1 0.3503 t8 = 3.985 I8 = 9.097 t7 ≤ t ≤ t8 2 10 3.716 t9 = 5.036 I9 = 8.485 t8 ≤ t ≤ t9 2 1 0.6963 t10 = 6.168 I10 = 8.310 t9 ≤ t ≤ t10 2 1 0.2954 t11 = 8.472 I11 = 8.413 t10 ≤ t ≤ t11 2 6 3.074 t12 = 20.48 I12 = 8.576 t11 ≤ t ≤ t12 2 1 0.2784 t13 = 137.5 I13 = 4.178 t12 ≤ t ≤ t13 2 1 0.6456 t13 < t 4 1 0.3559

Table 3.7: Parameters’ values of AEF with 13 peaks representing measured data for a lightning discharge current from [245]. Local maxima and corresponding times extracted from [69, Figures 6, 7 and 8] are denoted t and I and other parameters correspond to the fitted AEF shown in Figures 3.16 (a), 3.16 (b) and 3.16 (c).

157 160

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

Measured data D-optimal 12-peaked AEF 30 Free parameter AEF Peaks 10 Interpolated sample points

20 ] kA [ [kA/s] i t 5 d i/ 10 d Measured data D-optimal AEF 0 0 Free parameter AEF

0 2 4 6 0 2 4 6 t [s] #10-6 t [s] #10-6 Figure 3.17: Comparison of two AEFs Figure 3.18: Comparison of results of with 12 peaks and 2 terms in each in- integrating the approximating function terval fitted to measured lightning dis- shown in Figure 3.17. charge current derivative from [130]. Parameters are given in Table 3.8.

Peak times and currents Parameters of fitted AEF t [µs] I [µs] Interval n k c t0 = −0.3437 I0 = 0 t0 ≤ t ≤ t1 2 10 0.06099 t1 = 0.9468 I1 = 36.65 t1 ≤ t ≤ t2 2 1 0.4506 t2 = 0.5001 I2 = −2.208 t2 ≤ t ≤ t3 3 1 0.04772 t3 = 0.9215 I3 = 6.89 t3 ≤ t ≤ t4 2 1 0.4502 t4 = 1.212 I4 = −7.322 t4 ≤ t ≤ t5 3 1 0.2590 t5 = 1.714 I5 = 3.402 t5 ≤ t ≤ t6 3 2 0.9067 t6 = 2.103 I6 = 1.319 t6 ≤ t ≤ t7 3 1 0.3333 t7 = 2.730 I7 = −1.844 t7 ≤ t ≤ t8 3 1 0.03732 t8 = 3.416 I8 = 16.08 t8 ≤ t ≤ t9 2 4 3.3793 t9 = 4.005 I9 = −5.787 t9 ≤ t ≤ t10 2 1 1.4912 t10 = 4.216 I10 = −0.1268 t10 ≤ t ≤ t11 2 2 0.09448 t11 = 4.875 I11 = 1.972 t11 ≤ t ≤ t12 2 6 2.288 t12 = 5.538 I12 = 1.683 t13 < t 3 1 0.001705

Table 3.8: Parameters’ value of AEF with 12 peaks representing measured data for a lightning discharge current derivative from [130]. Chosen peak times are denoted t and I and other parameters correspond to the fitted AEF shown in Figure 3.17.

158 161

3.3. APPROXIMATION OF ELECTROSTATIC DISCHARGE CURRENTS USING THE AEF BY INTERPOLATION ON A D-OPTIMAL DESIGN

3.3.7 Summary of ESD modelling Here we examined a mathematical model for representation of ESD currents, either from the IEC 61000-4-2 Standard [132], or experimentally measured ones. The model has been proposed and successfully applied to lightning current modelling in Section 3.2 and [183] and named the multi-peaked analytically extended function (AEF). It conforms to the requirements for the ESD current and its first deriva- tive, which are imposed by the Standard [132] stating that they must be equal to zero at moment t = 0. Furthermore, the AEF function is time- integrable, see Section 3.1.1, which is necessary for numerical calculation of radiated fields originating from the ESD current. We also consider how the model can be fitted to a waveform by restricting the exponents in the AEF to an arithmetic sequence and then interpolate points of the function we wish to approximate chosen according to a D- optimal design. This makes the modelling less flexible than the case where all exponents can be chosen freely but gives a scheme for fitting the function that scales better to many data points than the MLSM fitting scheme used in [145, 183, 184, 188]. We apply the resulting methodology to some realistic cases, either taken from standards, see Section 3.3.5 and 3.3.6, or measured data, see Sections 3.3.5, 3.3.6 and 3.3.6. The methodology can give fairly accurate results even with a modest number of interpolated points but strategies for choosing some of the involved parameters should be further investigated. The decaying part of the waveforms are consistently difficult to fit and if the models are used in a context where significant error propagation appears a more flexible approach can be desirable.

159 162 163

Chapter 4

Comparison of models of mortality rate

This chapter is based on Papers H, and I

Paper H Karl Lundeng˚ard,Milica Ranˇci´cand Sergei Silvestrov. Modelling mortality rates using power-exponential functions. Submitted to journal, 2019.

Paper I Andromachi Boulougari, Karl Lundeng˚ard,Milica Ranˇci´c, Sergei Silvestrov, Belinda Strass and Samya Suleiman. Application of a power-exponential function based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, Volume 5, Issue 1, pages 3 – 10, 2019. 164

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

4.1 Modelling and forecasting mortality rates

In this chapter an overview of models of mortality rate found in literature will be given and three new models introduced. The different models will then be compared to each other by fitting the models to the central mortality rate for men in various countries and computing the corresponding AIC values. After that we will examine what happens when the central mortality rate is replaced with mortality rates given by a fitted model in the Lee– Carter method of forecasting described in Section 1.6.1. We will attempt to characterize the behaviours of a few models using data from a few different countries by looking at how well the values produced by the different mod- els match the assumptions of the Lee–Carter method and how reliable the parameters in the forecasting model are when based on different data sets.

4.2 Overview of models

Modelling all the factors that can affect the lifespan of an individual is not feasible, instead highly simplified models are used. We will consider some previously introduced models, see Table 4.1. Many of these models appear under different names, can be written in different ways and have several variants. For the comparison only the form written in the table will be considered. For all models all parameters are real-valued and non-negative except for Hannerz model where the parameters can be negative. In Section 1.6 the basic properties of survival functions and mortality rate were discussed. Here we will quickly summarize them for convenience. The survival function Sx(∆x) gives the probability that an individual survives another ∆x units of time. It is related to µ as follows dS0  Z x+∆x  dx µx = − ,Sx(∆x) = exp − µ(t) dt . S0(x) x There are three conditions that a survival function must satisfy to have a reasonable interpretation in terms of lifespan, Sx(0) = 1, lim Sx(∆x) = 0 ∆x→∞ and Sx(∆x) must be non-increasing. See page 65 for motivation. There are four patterns that are commonly observed when examining central mortality rate for developed countries: 1 1. Rapid decrease in mortality rate for young ages, µ(x) ∼ x for small x. 2. A ’hump’ for young adults with a rapid increase that levels off and remains constant or slowly decreases for some years. 3. Exponential growth for higher ages, µ(x) ∼ ecx for large x. 4. Deceleration of mortality rate growth for the highest ages. In Table 4.1 the Heligman–Pollard 1 model and all subsequent models can describe all these patterns to some extent. The models that cannot has either been important historically or can describe some age interval well.

162 165

4.2. OVERVIEW OF MODELS

Gompertz–Makeham [92] µ(x) = a + becx a xa−1 Weibull [289] µ(x) = b b aebx Logistic [29] µ(x) = ac 1 + (ebx − 1) b a Modified Perks [41] µ(x) = + d 1 + eb−cx ea−bx Gompertz inverse Gaussian [41] µ(x) = √ 1 + e−c+bx x x Double Geometric [92] µ(x) = a + b1b2 + c1c2 (x−c)2 −b x −b2 b x Thiele [272] µ(x) = a1e 1 + a2e 2 + a3e 3  2 a3 x (x+a2) −b2 ln b x Heligman–Pollard 1 [122] µ(x) = a1 + b1e 3 + c1c2 2 a  x  x (x+a2) 3 −b2 ln c1c b3 2 Heligman–Pollard 2 [122] µ(x) = a1 + b1e + x 1 + c1c2 2 a  x  x (x+a2) 3 −b2 ln c1c b3 2 Heligman–Pollard 3 [122] µ(x) = a1 + b1e + x 1 + c3c1c2 2 c3 a  x  x (x+a2) 3 −b2 ln c1c b3 2 Heligman–Pollard 4 [122] µ(x) = a1 + b1e + xc3 1 + c1c2 Hannerz [114] G (x) G (x) f(x) g1(x)e 1 g2(x)e 2 µ(x) = with f(x) = α 2 + (1 − α) , 1 + F (x) (1 + eG1(x)) (1 + eG2(x))2 eG1(x) eG2(x) F (x) = α + (1 − α) , 1 + eG1(x) 1 + eG2(x) a a a x2 a g (x) = 1 + a x + a ecx, G (x) = a − 1 + 2 + 3 ecx, 1 x2 2 3 1 0 x 2 c a a a x2 a g (x) = 5 + a x + a ecx and G (x) = a − 5 + 2 + 3 ecx 2 x2 2 3 2 4 x 2 c First Time Exit Model: SKI-6 [139] g(x)  2  √ k Hx 4 2 3 µ(x) = with g(x) = √ exp − , H(x) = a1 + ax − b x + lx − cx Z ∞ x3 2x g(t) dt x First Time Exit Model: Fractional 1st order approximation [261] g(x) 2|l + (c − 1)(bx)c|  −(l − (bx)c)2  µ(x) = where g(x) = √ exp − Z ∞ 3 2σ2x g(t) dt σ 2πx x First Time Exit Model: Fractional 2nd order approximation [261] g(x) µ(x) = Z ∞ where g(t) dt x 2 2|l + (c − 1)(bx)c| c(c − 1)(bx)c   −(l − (bx)c)2  g(x) = √ √ + k exp − σ 2πx σ 2πx 2|l + (c − 1)(bx)c| 2σ2x

Table 4.1: List of the models of mortality rate previously suggested in literature that are considered in this paper. The references gives a source with a more detailed description of the model, not necessarily the original source of the model.

163 166

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

4.3 Power-exponential mortality rate models

This section is based on Paper H

In this section we will construct a phenomenological model of the mor- tality rate and show how to easily interpret its parameters that in terms of patterns 1–3 in Section 4.2 and uses the power-exponential function defined in Section 4.3. 3 Let A ⊂ R with n nonnegative elements. If a ∈ A we write a = (a1, a2, a3).

We will use the following expression to approximate the mortality rate c 1 X −a2xa3 µ(x) = + a1 xe . xe−c2x a∈A

 Z t  The survival function is given by Sx(t) = exp − µ(x + s) ds . Con- 0 sider the change of variable u = a2a3(x + s) t Z c  a3 1 X −a2(x+s) − ln(Sx(t)) = ds + a1 (x + s)e ds (x + s)e−c2(x+s) 0 a∈A Z t Z a2a3(x+t) c1 X a1 = ds + ua3 e−a3u ds −c (x+s) a +1 (x + s)e 2 (a2a3) 3 0 a∈A a2a3x    = c1 Ei c2(x + t) − Ei(c2 x) X a1 + a +1 (γ(a3 + 1, a2a3(x + t)) − γ(a3 + 1, a2a3x)) (a2a3) 3 a∈A

Z t where γ(a, t) = xa−1e−x dx is the lower incomplete Gamma function and 0 Z ∞ e−s Ei(x) = − ds is the exponential integral [2]. Thus −x s

  Sx(t) = exp c1 Ei(c2x) − Ei c2 (x + t) ! X a1    · exp − a +1 γ a3 + 1, a2a3(x + t) − γ(a3 + 1, a2a3x) (a2a3) 3 a∈A   = exp c1 Ei(c2x) − Ei c2 (x + t)   Y a1   · exp a +1 γ(a3 + 1, a2a3x) − γ a3 + 1, a2a3(x + t) . (a2a3) 3 a∈A

164 167

4.3. POWER-EXPONENTIAL MORTALITY RATE MODELS

Since exp (γ(a, x) − γ(a, x + t)) is non-increasing with respect to t the  difference Ei(c2x) − Ei c2 (x + t) is also non-increasing with respect to t and thus Sx(t) is non-increasing with respect to t. Thus Sx(t) is non-decreasing and continuous, Sx(0) = 1 and lim Sx(t) = t→∞ 1 so this mortality rate models gives a reasonable survival function. In Sections 4.3.3 and 4.3.4 two more models that use similar expressions will be introduced, that these models also have the desired properties can be shown in an analogous way.

4.3.1 Multiple humps The model is defined so that there can be an arbitrary number of humps by adjusting the number of terms. See Figure 4.1 for an illustration of this. There are instances where several humps have been observed in mortality rate data but the examples that the authors are aware of are all in the early 19th century and therefore of low interest for the type of modelling considered here.

Sweden 1808 Sweden 1751 -2 -2 central mortality rate central mortality rate first term model mortality rate -3 second term -3 third term model mortality rate (x)) -4 (x)) -4 ln( ln(

-5 -5

-6 -6 10 20 30 40 50 60 70 10 20 30 40 50 60 70 age, x, years age, x, years

Figure 4.1: Examples of mortality rate curves with multiple humps. These models are hand-fitted and are intended to illustrate that they can replicate multiple humps, not show the best possible fit for multiple humps.

4.3.2 Single hump model We can model a single hump using c 1 −a2xa3 µ(x) = + a1 xe . (138) xe−c2x

The parameters c1, c2 and (a1, a2, a3) can easily be interpreted in terms of qualitative properties of the curve. c1 To interpret the effects of c1 note that c1 µ(x) → x when x → 0. d Looking at ln(µ(x)) instead we can note that dx ln(µ(x)) → c2 when t → ∞

165 168

Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions

so c2 gives the approximate slope of ln(µ(x)) for large x. The effects of the two terms are illustrated in Figure 4.2 where some examples of fitting the model to data from various countries are shown. In the simplest case where A = {(a1, a2, a3)} we get the expected hump a1 when ai > 0 for i = 1, 2, 3. In this case a3 gives the maximum height of a2 the hump, 1 gives the time for the humps maximum and a determines a2 3 the steepness of the hump (the larger a3 the steeper rise and the faster the decay).

4.3.3 Split power-exponential model

In the power-exponential model the parameter c1 affects the shape of the curve both for high and low ages and a3 affects both the increasing and decreasing part of the hump. To construct a model where this coupling is avoided we can split the two terms in the model at their respective local extreme points and adjust the values so that µ(x) is continuous (since the terms are split at local extreme points the derivative will also be continuous). We will refer to such a model as the split power-exponential model and it will give the following expression for the mortality rate. c˜  1  −a2xa˜ µ(x) = −c x + a1 xe +θ x − · c2 · e · (c1 − c3) (139) xe 2 c2 where

( 1 ( 1 ( c1, x ≤ a3, x ≤ 0, x ≤ 0 c˜ = c2 , a˜ = a2 , θ(x) = . c , x > 1 a , x > 1 1, x > 0 3 c2 4 a2 This increases the total number of parameters from five to seven. To in- terpret the parameters in this model we can reason similarly to the power- exponential model and note that c1 largely determines the slope of the mor- tality rate for low ages, c2 and c3 largely determines the slope for high ages, a1, a2 and a3 determine the maximum height of the hump, a2 determine the location of the maximum while a3 and a4 determine the slope before and after the maximum of the hump.

4.3.4 Adjusted power-exponential model The split power-exponential model above can give some adjustments to the shape but when comparing the model to data it was found that it still has some issues, primarily with matching the infant mortality rate. For this reason we suggest another modification that needs eight parameters. c˜ ec2x  −a2xa˜ µ(x) = c1 + a1 xe (140) c2x

166 169

4.4. FITTING AND COMPARING MODELS where

( 1 ( 1 c3, x ≤ , a3, x ≤ , c˜ = c2 anda ˜ = c2 c , x > 1 , a , x > 1 . 4 c2 4 c2

In this model the c2 parameter can be interpreted as the position of the minimum mortality rate if there is no hump. The slope before the minimum is controlled by c3 and after the minimum the slope is controlled by c4. The remaining parameter in the first term, c1, is an overall scale factor for the non-hump part of the model. The parameters in the second factor, a1, a2, a3 and a4 can be interpreted the same ways as for the split power-exponential model.

4.4 Fitting and comparing models

This section is based on Paper H

In order to compare the different models each model will be fitted to central mortality rates, defined on page 66, taken from the Human Mortality Database (HMD) [127]. The central mortality rate at age x for year t is denoted by mx,t and here we will only consider time intervals of one year, so mx,t is estimated in the same way as µ(x) and for any given year t we can assume that mx,t ≈ µ(x).

What properties of a model are the most important depends on the intended application. Here we will try to examine how well the different models can describe the mortality rate over the entire human lifespan. To do this we will fit the models to the ages 1–100 years. This age range is chosen since it covers all the common patterns discussed on page 67 and the HMD has reliable data for this age range. One of the features that is most challenging to model is the hump that appears for young adults. Since this effect is usually much more pronounced for men we will only consider mortality rate data for men in this analysis.

Since the mortality rate for high ages is several orders of magnitude higher than the mortality rate for low ages, in order to recreate features like the adolescence hump we will find a least squares fit of the natural logarithm of the model to the natural logarithm of the data. Some details on the methods used for fitting the different models will be discussed in Section 4.4.1.

After the fitting is performed we will compare the quality of the fit of the different models using Akaike's Information Criterion (AIC) described in Section 1.3.3. If we denote the estimated mortality rate at age x in year t ∈ {1, . . . , n} with mx,t as before and let µ(x) be the fitted model, in other


words we have chosen the parameters such that
$$\sum_{x=1}^{n}\left(\ln(m_{x,t}) - \ln(\mu(x))\right)^2$$
is minimized, then the maximum of the likelihood function is given by
$$\hat{L} = -\frac{n}{2}\ln\left(\sum_{x=1}^{n}\left(\ln(m_{x,t}) - \ln(\mu(x))\right)^2\right) + c,$$
where c is a constant that only depends on the dataset used, not on the fitted model [40]. Since we will only use the AIC to compare different models we can ignore the constant term and simply use the following expression for the AIC

$$\mathrm{AIC} = 2k + 2 + n\ln\left(\sum_{x=1}^{n}\left(\ln(m_{x,t}) - \ln(\mu(x))\right)^2\right). \qquad (141)$$

Remark: For models with many parameters and small sample sizes it is recommended to use the AICC, discussed in Section 1.9, given by the expression
$$\mathrm{AIC}_C = \mathrm{AIC} + \frac{2(k+1)(k+2)}{n-k-2}.$$
In the current investigation the sample size will be n = 100 and the number of parameters varies between 2 and 9. Thus the second order correction term will vary between 1/4 and 220/89 ≈ 2.47. For the analysis done in this paper (see Section 4.4.2 and the supplementary material) this correction is usually much smaller than the differences between the AIC from year to year for the different models, and using the AICC rarely changes which model has the lowest information criterion. Since we are mainly looking for trends in when the different models are suitable, this means that for this analysis using the AIC or the AICC makes no difference in practice. In the supplementary materials [193] the computed AICC values can be found as well as the AIC.
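As an illustrative sketch, (141) and the AICC correction can be computed as follows; the log-residuals in the example are synthetic and only demonstrate the calculation.

import numpy as np

def aic_from_log_residuals(log_m, log_mu, k):
    """AIC as in (141): 2k + 2 + n*ln(sum of squared log-residuals)."""
    r = np.asarray(log_m) - np.asarray(log_mu)
    n = r.size
    return 2 * k + 2 + n * np.log(np.sum(r ** 2))

def aicc(aic, k, n):
    """Second-order corrected AIC: AIC + 2(k+1)(k+2)/(n-k-2)."""
    return aic + 2 * (k + 1) * (k + 2) / (n - k - 2)

# Synthetic example: n = 100 ages and a model with k = 5 parameters.
rng = np.random.default_rng(0)
log_m = np.linspace(-6, -1, 100) + 0.1 * rng.standard_normal(100)
log_mu = np.linspace(-6, -1, 100)
a = aic_from_log_residuals(log_m, log_mu, k=5)
print(round(a, 1), round(aicc(a, k=5, n=100), 1))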

4.4.1 Some comments on fitting

The models will be fitted to central mortality rates for men from 36 countries taken from the HMD. For each country the most recently available central mortality rate data was taken for the ages 1–100 years. If more than 100 years of data was available for a country the most recent 100 years were chosen, otherwise all available data was used. In some cases a few data points were missing, usually the mortality rate for a single age in a single year; in these cases the missing data was interpolated as the mean value of the neighbouring values. The models were fitted using software designed for numerical computations ([206]) and the majority of the models were fitted using an algorithm based on the interior reflective Newton method described in [55, 56].
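As a rough sketch of this fitting step (the model function, starting values and bounds are placeholders, and scipy's trust-region reflective solver 'trf' is used here as a stand-in for, and is related to but not identical with, the interior reflective Newton method of [55, 56]):

import numpy as np
from scipy.optimize import least_squares

def fill_gaps(m_x):
    """Replace isolated missing values (NaN) by interpolating between the
    neighbouring ages, matching the mean-of-neighbours rule described above."""
    m = np.array(m_x, dtype=float)
    idx = np.arange(m.size)
    bad = np.isnan(m)
    m[bad] = np.interp(idx[bad], idx[~bad], m[~bad])
    return m

def fit_log_mortality(model, ages, m_x, p0, bounds=(-np.inf, np.inf)):
    """Least-squares fit of ln(model) to ln(central mortality rate)."""
    log_m = np.log(fill_gaps(m_x))

    def residuals(p):
        return np.log(model(ages, *p)) - log_m

    return least_squares(residuals, p0, bounds=bounds, method="trf")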


The three first time exit models (SKI-6, Fractional 1st order and Fractional 2nd order) were fitted using the methodology described in their original sources: instead of fitting the logarithm of µ(x) directly, g(x) is first approximated from the central mortality rates mx,t, the parameters of the g(x) function are fitted to this approximation (using the interior reflective Newton method and σ = 1) and the mortality rate is then computed as
$$\mu(x) = g(x)\left(\int_x^{\infty} g(t)\,dt\right)^{-1}$$
using high accuracy numerical integration.

The power-exponential model given by (138) caused numerical instabilities in the fitting procedure so it was rewritten on the form
$$\mu(x) = \frac{c_1}{xe^{-c_2x}} + e^{a_3 - a_1}\left(a_2\,xe^{-a_2x}\right)^{a_3}$$
instead, which resulted in much more stable behaviour. The two variations of the power-exponential model given by (139) and (140) had similar issues and could not be reliably fitted using the interior reflective Newton method. In order to fit them they were rewritten on the following forms:

Split power-exponential model
$$\mu(x) = \frac{\tilde{c}^2}{xe^{-c_2^2x}} + e^{a_1^2 - \tilde{a}^2}\left(a_2^2\,xe^{-a_2^2x}\right)^{\tilde{a}^2} + \theta\!\left(x - \frac{1}{c_2^2}\right)\cdot c_2^2\cdot e\cdot\left(c_1^2 - c_3^2\right)$$
where
$$\tilde{c}^2 = \begin{cases} c_1^2, & x \leq \frac{1}{c_2^2} \\ c_3^2, & x > \frac{1}{c_2^2} \end{cases}, \qquad \tilde{a}^2 = \begin{cases} a_3^2, & x \leq \frac{1}{a_2^2} \\ a_4^2, & x > \frac{1}{a_2^2} \end{cases}, \qquad \theta(x) = \begin{cases} 0, & x \leq 0 \\ 1, & x > 0, \end{cases}$$

Adjusted power-exponential model
$$\mu(x) = e^{-c_1^2 - \tilde{c}^2}\left(\frac{e^{c_2^2x}}{c_2^2x}\right)^{\tilde{c}^2} + e^{a_1^2 - \tilde{a}^2}\left(a_2^2\,xe^{-a_2^2x}\right)^{\tilde{a}^2}$$
where
$$\tilde{c}^2 = \begin{cases} c_3^2, & x \leq \frac{1}{c_2^2} \\ c_4^2, & x > \frac{1}{c_2^2} \end{cases} \qquad \text{and} \qquad \tilde{a}^2 = \begin{cases} a_3^2, & x \leq \frac{1}{c_2^2} \\ a_4^2, & x > \frac{1}{c_2^2}. \end{cases}$$

The parameters for these forms were then found using the Marquardt–Levenberg method [174, 202]. This fitting can produce negative parameters but since they are squared in the formulation of the models they can be replaced with their absolute value. For both the split power-exponential model and the adjusted power-exponential model the parameters can be chosen such that the resulting model is equivalent to the power-exponential model. To get a good fit with these models it is a good idea to first fit the power-exponential model and then use those parameters as initial values for the fitting of the two modified models.
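A sketch of this two-stage strategy is given below; scipy's 'lm' solver, a Levenberg–Marquardt implementation, stands in for the Marquardt–Levenberg routine of [174, 202], and the model functions and the map extend() between the two parametrisations are placeholders.

import numpy as np
from scipy.optimize import least_squares

def two_stage_fit(simple_model, extended_model, ages, m_x, p0_simple, extend):
    """Fit the power-exponential model first and reuse the result as the
    starting point for one of the rewritten (squared-parameter) variants."""
    log_m = np.log(m_x)

    stage1 = least_squares(
        lambda p: np.log(simple_model(ages, *p)) - log_m, p0_simple, method="trf")

    # extend() maps the fitted simple parameters to a starting point for the
    # extended model.  Because the parameters enter the rewritten forms squared,
    # negative values returned by the optimiser are harmless and can simply be
    # replaced by their absolute values afterwards.
    stage2 = least_squares(
        lambda p: np.log(extended_model(ages, *p)) - log_m, extend(stage1.x),
        method="lm")
    return np.abs(stage2.x)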


When performing the least squares fitting it is assumed that the residuals are normally distributed with mean zero and that the variance does not depend on age. This is not always an accurate set of assumptions, for several reasons. One reason is that some models cannot recreate all expected features in the data, e.g. the Gompertz–Makeham, Weibull or Modified Perks models cannot describe infant mortality accurately. Another reason is that some age intervals are consistently noisier than others. This is most noticeable in the intervals with the lowest and the highest mortality rates in countries with small to medium-sized populations. For these countries the ages with the lowest mortality rates are estimated from few deaths and the highest mortality rates are estimated from a small population.

In Figure 4.3 some examples of quantile-quantile plots (Q-Q plots, see Section 1.3.2) for the residuals of models fitted to data are shown. We expect the residuals to mostly follow a straight line in the Q-Q plot apart from some random noise. For the models and data sets we have examined in this paper there is some deviation for all models and data sets. In most cases this is a combination of some difference in trend and outliers. The patterns also vary from model to model and country to country, so when applying a model it might be worth considering adapting the fitting method to account for this. Here we still use the simple least squares fitting since choosing a more accurate distribution for the residuals would require more information about the considered countries (e.g. population numbers), and evaluating how well the models perform in a very simple setting can still be informative about the practicality of the models.

Anyone who intends to use models like these should evaluate them based on the particular data set they intend to use them for and on what properties of the fitted model are considered the most important for the intended application. The purpose of a systematic comparison like the one in this paper is to help identify a small number of appropriate models and to evaluate and compare newly constructed models to existing models.

In some cases models that allow a high degree of plasticity for the location and height of the hump (the Hannerz, Heligman–Pollard and power-exponential models) are susceptible to overfitting when the central mortality rates are noisy; for examples see Figure 4.4. Choosing the model with the lowest AIC often helps avoid overfitting [40], but if the location and size of the hump is important extra care must be taken when fitting these models. For these models it is also possible to identify a specific part of the expression that controls the shape of the hump, so adding appropriate constraints to the parameters of the model can help mitigate overfitting.
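A Q-Q plot of the log-scale residuals, of the kind shown in Figure 4.3, can be produced along the following lines (a sketch; the residuals are assumed to come from a fit such as the ones above):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def qq_plot_of_residuals(log_m, log_mu, ax=None):
    """Normal Q-Q plot of the log-scale residuals, in the spirit of Figure 4.3."""
    residuals = np.asarray(log_m) - np.asarray(log_mu)
    ax = ax if ax is not None else plt.gca()
    stats.probplot(residuals, dist="norm", plot=ax)
    ax.set_xlabel("Standard normal quantiles")
    ax.set_ylabel("Quantiles of residuals")
    return ax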


[Figure 4.2: nine panels (USA, Canada, Sweden, Switzerland, Ukraine, Japan, Taiwan, Australia and Chile, all 1995) showing ln(µ(x)) against age x (years) with the central mortality rate, the first and second terms of the model and the model mortality rate.]

Figure 4.2: Examples of the power-exponential model fitted to the central mortality rate for various countries with the role of the two terms illustrated.


[Figure 4.3: four Q-Q plots (quantiles of residuals against standard normal quantiles) for the Hannerz model, the First Time Exit Fractional 2nd order approximation, the adjusted power-exponential model and the Heligman-Pollard 4 model, all fitted to the central mortality rate for USA 2017.]

Figure 4.3: Examples of quantile-quantile plots for the residuals of some models that fit the central mortality rate for USA 2017 well. The closer the residuals are to the dashed line the better the residuals match the expected result from a normal distribution. All models considered in this chapter show some degree of deviation, but more complicated models generally deviate less.


[Figure 4.4: four panels showing ln(mx,t) against age (years) together with a fitted model: Heligman-Pollard 3 for Estonia 1960, Heligman-Pollard 4 for Denmark 2015, the power-exponential model for Estonia 2015 and the adjusted power-exponential model for Canada 1965.]

Figure 4.4: Examples of instances of overfitting with a few different models. Overfitting around the hump happens occasionally for most of the models where the hump is controlled by a separate term in the expression for the mortality rate. Here mx,t refers to the central mortality rate for men taken from the Human Mortality Database.


4.4.2 Results and discussion

In this section we will present and discuss some representative examples of the results of fitting the models to central mortality rates for men in 36 countries. The parameter values for the fitted models and the computed AICs can be found in the supplementary material.

The three models introduced in Section 4.3 fit the data to varying degrees. Usually the models can achieve a good fit past the peak of the hump, but the power-exponential model often has issues fitting the mortality at low ages and sometimes also the rising part of the hump. The split power-exponential model often improves the fit on the rising part of the hump but still cannot properly fit the mortality at low ages. The adjusted power-exponential model consistently outperforms the other two models when fitting the mortality for young ages. See Figure 4.5 for some representative examples.

[Figure 4.5: six panels showing ln(µ) against age (years) with the central mortality rate and a fitted model: the power-exponential model for Sweden 1970 and 2014, the split power-exponential model for Japan 1970 and 2014, and the adjusted power-exponential model for USA 1970 and 2014.]

Figure 4.5: Some examples of the three models introduced in Section 4.3 fitted to central mortality rate for men taken from the Human Mortality Database.


Comparing the AIC of the models to each other shows that the models with eight or more parameters (Hannerz, Heligman–Pollard 1–4 and adjusted power-exponential) often perform better than the other models. In Table 4.2 we show the AIC for all the different models fitted to the central mortality rate for Switzerland for eleven different years. In Figure 4.6 the computed AIC values are shown for seven different countries and all models except the Weibull model, which has been omitted since it consistently gives much larger AIC values than the other models. The values in Table 4.2 and the graphs in Figure 4.6 show trends that are observed to some extent in all of the examined countries.

                       1916   1926   1936   1946   1956   1966   1976   1986   1996   2006   2016
Weibull                520.5  523.3  529.8  542.4  572.6  573.3  570.4  564.1  560.8  570.7  574.6
Gompertz–Makeham       289.7  296.4  285.9  287.0  323.0  319.1  301.3  299.9  300.2  328.6  320.2
Gompertz Inv. G.       401.0  399.8  400.1  411.4  428.6  432.8  421.5  413.6  396.8  412.8  411.3
Logistic               388.5  387.5  387.6  399.3  417.3  421.9  410.6  402.9  385.6  402.8  401.3
Modified Perks         290.0  295.2  283.7  286.2  323.4  315.1  302.7  301.9  302.2  330.6  322.2
Thiele                 270.3  301.9  311.8  157.8  297.9  291.8  209.4  343.4  351.3  384.1  319.0
Double Geometric       171.3  352.1  341.1  376.9  470.2  438.1  351.8  485.8  416.6  475.5  504.3
Hannerz                144.3  106.3  144.5  103.0  137.0  150.9  177.9  132.3  159.5  242.6  230.2
Power-exponential      240.7  237.2  237.2  256.5  286.8  298.0  277.5  272.8  260.6  291.6  293.2
Split power-exp.       215.9  220.2  210.6  217.8  268.7  272.3  245.6  240.3  242.5  286.3  281.7
Adjusted power-exp.     89.5   74.9  102.9   99.2  134.4  165.0  151.8  134.3  168.5  246.1  220.1
SKI-6                  170.1  197.3  179.4  152.7  135.0  176.6  216.3  193.9  201.7  288.8  285.5
Fractional 1st order   291.7  306.3  305.3  309.1  268.9  304.9  344.4  400.1  397.4  402.7  404.5
Fractional 2nd order   228.7  225.6  224.8  230.2  201.2  245.1  279.1  284.7  293.5  331.8  337.9
Heligman–Pollard 1     123.2   98.3  129.8  123.5  124.9  180.2  168.0  145.7  174.3  280.4  230.3
Heligman–Pollard 2     121.5   91.4  117.7  106.3  153.5  169.4  168.1  155.0  177.7  290.4  239.7
Heligman–Pollard 3     122.9   76.6   87.7  103.0  159.1  156.7  140.5  151.1  174.6  231.1  213.4
Heligman–Pollard 4     108.4   74.7   98.4   95.0  150.5  164.6  138.2  142.1  172.6  226.1  207.8

Table 4.2: Computed AIC values for the different models fitted to the central mortality rate for men for Switzerland for eleven different years. In each column the lowest AIC indicates the best-performing model for that year.

The Gompertz–Makeham model and the modified Perks model are usually very similar and outperform most other models with four parameters or less. The exception is the Fractional 2nd order approximation of the First Time Exit model, which has four (free) parameters and does considerably better than the classical models in many cases. The performance of the Double Geometric and Thiele models is inconsistent


but usually relatively poor. For the other models with five to seven parameters the power-exponential and split power-exponential models usually do better than models with fewer parameters but generally not as well as the SKI-6 model. The performance of the SKI-6 model also varies significantly; this might be a consequence of the different fitting scheme applied to the First Time Exit models. In fact, the SKI-6 model occasionally performs just as well as the best models in the set of tested models.

Among the models with eight parameters or more there is no model that is clearly superior to the others, but sometimes there is a model that clearly works better for a specific country, e.g. the adjusted power-exponential performs very well for the USA after 1950 and the Hannerz model performs very well for Japan after 1960. Based on the countries and models considered here a good default method for describing the entire lifespan would be Heligman–Pollard 3 or Heligman–Pollard 4 since they consistently perform well and were comparatively easy to fit.

For many countries the AIC for the best performing models tends to rise over time, thus decreasing the difference between the best performing models and the other models. This effect is more pronounced in countries with smaller populations. A possible explanation is that overall the logarithm of the central mortality rate is decreasing and getting more noisy. This change can be seen in Figure 4.5: the effect is strongest in Sweden, where the population was approximately eight to ten million in 1970–2014, the upwards trend and the changes in noise are less noticeable in Japan (population 100–126 million), and in the USA (population 209–320 million), where no upwards trend for the AIC could be seen in Figure 4.6, there is no increase in noise level. As the noise level increases the best possible fit is reduced and the advantages of the more complicated models are reduced. For some countries the hump also becomes less distinct over time, e.g. due to a decreasing number of lethal traffic accidents (one of the major causes of death for young men), and when the hump becomes less distinct the accuracy of the model near the hump matters less.

There are several ways that this work can be continued. Since different models give mortality rates with different qualitative properties it could be valuable to do a similar analysis where the models are only fitted to a shorter age interval, for instance only higher ages or only lower ages. The fitting procedure can probably also be improved, for instance by taking into account how the noisiness of the central mortality rates varies with age and population size, or by finding a fitting procedure that is based on a more accurate assumption on the properties of the residuals.


[Figure 4.6: seven panels (USA, Canada, Sweden, Switzerland, Japan, Taiwan and Australia) showing the AIC against year for each model; the legend lists the seventeen compared models.]

Figure 4.6: AIC for seven countries and seventeen models.


The models considered here are simple parametric models. There are other ways to mathematically model mortality rates, e.g. the dynamical model methods described in [260], and comparing them to the parametric models considered here could be interesting.

How well the models fit data is only one of the properties that are important for a model. There are many other properties that could determine which model is most suitable depending on the intended application. Analysing how the same list of models interacts with different methods for forecasting mortality rates (see [33] for an example of this), with the computation of estimates like life expectancy or other demography-related key values, e.g. the method of estimating the expected healthy lifetime described in [259], how the parameters of the models can be interpreted and measured independently of the mortality rate, the simplicity of expressions for survival functions and similar theoretical concerns, and so on, could all be useful complements when considering what model to choose in a given situation.

When identifying that a certain model works better for certain countries than others, it could also be very interesting to try to explain why a certain model works well for a certain data set, and to see if those properties could be used to create helpful indicators for which model is suitable for a data set that has not been examined before, or to identify what feature of the group that the data set describes makes it behave in a particular way.

4.5 Comparison of parametric models applied to mortality rate forecasting

This section is based on Paper I

In Section 1.6.1 we described the Lee–Carter method for forecasting the logarithm of central mortality rates. In this section we will examine how the results of using this method are affected if the central mortality rate is replaced by mortality rates given by a mathematical model.

The Lee–Carter method is based on the assumption that central mortality rates can be fairly accurately approximated by
$$\ln(m_{x,t}) = a_x + b_xk_t + \varepsilon_{x,t},$$
where ax, bx and kt are computed from historical central mortality rates as described in Section 1.6.1. In this section we will discuss two ways of characterising the reliability of the forecast by examining the mortality indices kt. We will then compare the results of applying the Lee–Carter method to data either taken from the Human Mortality Database (HMD) or given by a simple mathematical model fitted to the central mortality rate from the HMD.
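The quantities ax, bx and kt can be estimated in the standard way via a singular value decomposition of the centred log mortality rates; the sketch below follows that common recipe and is only an illustration, since the exact identification constraints used in Section 1.6.1 are not repeated here.

import numpy as np

def lee_carter(log_m):
    """Lee-Carter decomposition of a matrix of ln(m_{x,t}); rows are ages,
    columns are years.  Returns the age profile a_x, the age sensitivities
    b_x (normalised to sum to one) and the mortality index k_t."""
    a_x = log_m.mean(axis=1)
    centred = log_m - a_x[:, None]
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    b_x = U[:, 0] / U[:, 0].sum()
    k_t = s[0] * Vt[0, :] * U[:, 0].sum()
    return a_x, b_x, k_t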


First we will consider the errors of the forecasted mortality index, εt, which represent noise that causes deviation from the expected linear change. This term is normally distributed with mean 0 and variance σ^2_error. Note that the k̂t are not independent; they have successive innovations that are independent, and the variance of the error between them is estimated as

$$(\mathrm{see})^2 = \frac{1}{T-2}\sum_{t=2}^{T}\left(\hat{k}_t - \hat{k}_{t-1} - \hat{\theta}\right)^2 \qquad (142)$$
which is used to calculate the uncertainty in forecasting k̂t over any given horizon. Then the associated error variance of the forecasted values is given as

$$\mathrm{var}(\hat{k}_t) = (\mathrm{see})^2\cdot\Delta t \qquad (143)$$
and the square root of this gives the standard error estimate for the forecast,
$$\mathrm{SD}(\hat{k}_t) = \mathrm{see}\cdot\sqrt{\Delta t},$$
where Δt is the forecast horizon. This shows that the standard deviation increases with the square root of the distance to the forecast, and from the standard deviation of k̂t we can calculate the confidence band, using a 95% confidence interval with a t-factor of 1.96, as

$$\hat{k}_t \pm 1.96\,\mathrm{SD}(\hat{k}_t).$$

To forecast two periods ahead, we just substitute the definition of k̂t−1 moved back in time one period and plug in the estimate of the drift parameter θ̂,
$$\hat{k}_{t-1} = \hat{k}_{t-2} + \hat{\theta} + \varepsilon_{t-1},$$
so that k̂t becomes
$$\hat{k}_t = \hat{k}_{t-1} + \hat{\theta} + \varepsilon_t = \hat{k}_{t-2} + \hat{\theta} + \varepsilon_{t-1} + (\hat{\theta} + \varepsilon_t) = \hat{k}_{t-2} + 2\hat{\theta} + (\varepsilon_{t-1} + \varepsilon_t).$$

To forecast kˆt at time T + ∆t with the data available up to T , we follow the same procedure iteratively ∆t times and obtain

$$\hat{k}_{T+\Delta t} = \hat{k}_T + \Delta t\,\hat{\theta} + \sum_{n=1}^{\Delta t}\varepsilon_{T+n-1} = \hat{k}_T + \Delta t\,\hat{\theta} + \sqrt{\Delta t}\,\varepsilon_t \qquad (144)$$


Ignoring the error term, since its mean is 0 and it is assumed to be independent with the same variance, we get forecast point estimates which follow a straight line as a function of Δt with slope θ̂,

$$\hat{k}_{T+\Delta t} = \hat{k}_T + \Delta t\,\hat{\theta}.$$

The forecast of the logarithm of the mortality rate can then be computed using the forecasted mortality index, $\ln(\hat{\mu}_{x,T+\Delta t}) = \hat{a}_x + \hat{b}_x\hat{k}_{T+\Delta t}$. In Figure 4.7 an example is shown where the L–C method was applied using the original data and data given by two different models. A thirty year period (1970–2000) was used to compute mortality indices and a prediction was made 10 years into the future and compared to the measured mortality rate at that time.
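The calculations in (142)–(144) can be collected into a short routine; as an assumption not stated in this section, the drift θ̂ is estimated as the mean one-step change of the mortality index, which is the usual estimator for a random walk with drift.

import numpy as np

def forecast_mortality_index(k_hat, horizon):
    """Random-walk-with-drift forecast of the mortality index following
    (142)-(144): point forecast, lower and upper 95% confidence bands."""
    k_hat = np.asarray(k_hat, dtype=float)
    T = k_hat.size
    theta = (k_hat[-1] - k_hat[0]) / (T - 1)      # drift = mean one-step change
    innovations = np.diff(k_hat) - theta          # k_t - k_{t-1} - theta
    see2 = np.sum(innovations ** 2) / (T - 2)     # equation (142)
    dt = np.arange(1, horizon + 1)
    point = k_hat[-1] + dt * theta                # straight line with slope theta
    sd = np.sqrt(see2 * dt)                       # square root of (143)
    return point, point - 1.96 * sd, point + 1.96 * sd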

4.5.1 Comparison of models

To compare the different models to each other the mortality indices will be computed over a period of time and then we will use the same RWD to forecast kt in the interval t ∈ [1, T] and use the result to estimate the variance of εt in (144). The variance of εt will be estimated by computing the quantity l̂Δt that is found by removing the drift term, the constant term and the scaling of the stochastic term in (144),

$$\hat{l}_{\Delta t} = \frac{\hat{k}_{1+\Delta t} - \hat{k}_1 - \Delta t\,\hat{\theta}}{\sqrt{\Delta t}}.$$
We can then consider the l̂Δt to be measurements of lΔt = εt ∈ N(0, var(εt)), and var(εt) can then be estimated using a standard maximum likelihood estimation based on the l̂Δt. Since different models will give different kt they will also give different var(εt), and a lower variance indicates a more suitable model. The second way to characterise the suitability of a model for forecasting is to compare the associated error variance of the forecasted values, given by (143). This is characterised by the standard error estimate given by (142). Thus computing and comparing the standard error estimates also gives some idea of the comparative suitability of the different models for forecasting. An example of how the mortality indices can change for different models is illustrated in Figure 4.8.
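A compact sketch of this variance diagnostic, assuming the definition of l̂Δt written above and with the drift estimate θ̂ supplied by the caller:

import numpy as np

def innovation_variance(k_hat, theta):
    """Estimate var(eps_t) from the rescaled deviations l_hat defined above
    (drift and constant removed, stochastic part rescaled by sqrt(dt))."""
    k_hat = np.asarray(k_hat, dtype=float)
    dt = np.arange(1, k_hat.size)
    l_hat = (k_hat[1:] - k_hat[0] - dt * theta) / np.sqrt(dt)
    # Maximum likelihood estimate of the variance of a zero-mean normal sample.
    return np.mean(l_hat ** 2)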

4.5.2 Results, discussion and further work

Here we will present the results of applying the methodology described in the previous sections using five models and data from six countries. The models are the Modified Perks, Logistic and Heligman–Pollard HP4 models given in Table 4.1, as well as the power-exponential model and the split power-exponential model described


[Figure 4.7: three panels (Original data, Logistic, Power-exponential) showing ln(µ) against age (years) with the central mortality rates for 2000 and 2010, the forecasted mortality rate for 2010 and the lower and upper 95% confidence intervals.]

Figure 4.7: Example of central and forecasted mortality rates for Australia with original data and two different models. The mortality indices were computed using data generated in the period 1970–2000 and the logarithm of the mortality was forecasted 10 years into the future. The forecasted mortality rate 2010 is compared to the initial mortality rate (measured mortality rate 2000) and the measured value (measured mortality rate 2010). The three models demonstrate how the quality of the prediction can depend on the model. When using the original data the forecast differs relatively much in the age range 20–60 years. When using the logistic model the prediction and the central mortality rate are very similar but the model does not describe the actual shape of the mortality rate curve well. When using the power-exponential model the prediction and central mortality rate are very similar except around the peak of the hump.


[Figure 4.8: mortality index kt for Australia plotted against year (1980–2050) for the original data, the logistic model and the power-exponential model, with 95% confidence intervals.]

Figure 4.8: Example of estimated and forecasted mortality indices for Australia with three different models along with their 95% confidence intervals. Note that the three different models forecast slightly different trendlines and that the confidence intervals have slightly different widths. In Section 4.5.1, two ways of characterising the reliability in the measured interval (1970–2010) and the forecasted interval (2011–2050), respectively, are described.

in Section 4.3. The data is taken from the Human Mortality Database (HMD) [128], and gives the mortality rate for ages 1–100 in 1970–2010 for USA, Canada, Switzerland, Japan, Taiwan and Australia. The choice of years and countries is primarily driven by practical considerations: we wanted a set of countries with varying properties with respect to geographical position, population size and population density while also being developed enough to be qualitatively similar and have reliable data. The year range was chosen so that there would be no obvious major trend changes in mortality in any of the countries; this is necessary for the assumptions of the L–C model to be considered reasonable. We also wanted a scenario where the fitting method worked efficiently and reliably and the assumptions behind L–C forecasting were relatively reasonable. Note: for some of the countries data was missing for certain years and ages. In these cases the missing data was replaced with an average of neighbouring values.

The results are shown in Tables 4.3 and 4.4. In both tables a lower value indicates a more reliable forecasting (assuming there are no major developments that significantly affect the mortality of certain parts of the population). Examining Table 4.3 we see that most of the time using the measured data for forecasting is the most desirable, the exceptions being Switzerland and Australia where the split power-exponential model gives the best results


Estimated variance of εt    USA    Canada  Switzerland  Japan  Taiwan  Australia
Measured data               0.111  0.123   0.123        0.143  0.113   0.0607
Logistic                    0.124  0.131   0.140        0.154  0.125   0.0704
Modified Perks              0.122  0.128   0.132        0.149  0.118   0.0695
Power-exponential           0.123  0.130   0.129        0.149  0.141   0.0615
Split power-exp.            0.115  0.125   0.120        0.143  0.135   0.0602
HP4                         0.116  0.134   0.128        0.142  0.120   0.0647

Table 4.3: Estimated variance of εt found in the way described on page 180. The lowest value in each column is the best result for that country.

Standard error estimate     USA    Canada  Switzerland  Japan  Taiwan  Australia
Measured data               0.151  0.199   0.398        0.244  0.299   0.209
Logistic                    0.158  0.201   0.345        0.239  0.277   0.238
Modified Perks              0.160  0.204   0.371        0.247  0.294   0.243
Power-exponential           0.157  0.210   0.359        0.235  0.308   0.226
Split power-exp.            0.156  0.209   0.385        0.244  0.297   0.222
HP4                         0.152  0.209   0.356        0.245  0.298   0.216

Table 4.4: Standard error estimates of forecasted mortality indices.

(for Japan slightly better results are obtained by HP4). Note that using a simple parametrized model can bring advantages compared to using measured data, so it is also interesting to compare only the different models with each other. Here the results vary significantly from country to country, but usually the estimated variance is lower for the more accurate models using more parameters.

Examining Table 4.4 we can see that for Switzerland and Taiwan the models tend to give a smaller standard error estimate than the measured data, but otherwise the measured data gives standard error estimates that are as good as or better than the other models. There is also greater variation in which model seems most reliable with respect to forecasting compared to Table 4.3. To understand if the variations here are indicative of trends for different classes of models, a larger number of models should be applied to a larger number of datasets; note that more sophisticated fitting and forecasting methods should probably also be employed, to ensure that the comparison actually compares scenarios where each method is used in an appropriate way.

There are also other aspects and applications that require a different


kind of analysis. For instance, if we wanted to compare the models' suitability for pricing life insurance or modelling pensions we would use different age ranges, since life insurance is mostly intended for those who die at a relatively young age (but not children) while suitable pension planning requires an understanding of how many individuals will live to a high age. We have also only considered a few simple parametrized models and not taken the relative explanatory power of the models into account. More of the models from Table 4.1 could be considered, as well as other types of mortality modelling, for instance dynamical model methods [260]. In other words, every aspect of the comparison could be improved, but we believe that some sort of systematic comparison of these types of models on an easily available but relevant corpus of data could be a useful and informative tool for researchers and professionals.


References

[1] Wasana Aberathna, Lakshman Alles, W. N. Wickremasinghe, and Isuru Hewapathirana. Modeling and forecasting mortality in Sri Lanka. Sri Lankan Journal of Applied Statistics, 15(3):141–170, 2014.

[2] Milton Abramowitz and Irene Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, 1964.

[3] Alexander Craig Aitken. Determinants and Matrices. Interscience Publishers, Inc., 3rd edition, 1944.

[4] Hirotugu Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and Frigyes Csáki, editors, 2nd International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, September 2–8, 1971, pages 267–281, 1973.

[5] Hirotugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.

[6] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices. Cambridge University Press, 2010.

[7] Theodore W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Interscience, 2003.

[8] Richard Askey. Orthogonal Polynomials and Special Functions. Society for Industrial and Applied Mathematics, 1975.

[9] Sheldon Axler. Linear Algebra Done Right. Springer International Publishing, 3rd edition, 2015.

[10] Hassan Azad, M. Tahir Mustafa, and Abdallah Laradji. Polynomial solutions of certain differential equations arising in physics. Mathematical Methods in Applied Science, 36(12):1615–1624, 2013.

[11] Tom Bäckström. Vandermonde factorization of Toeplitz matrices and applications in filtering and warping. IEEE Transactions on Signal Processing, 61(24):6257–6263, 2013.

[12] Tom Bäckström, Johannes Fischer, and Daniel Boley. Implementation and evaluation of the Vandermonde transform. In 22nd European Signal Processing Conference (EUSIPCO), pages 71–75, 2014.

[13] Zhidong Bai, Zhaoben Fang, and Ying-Chang Liang. Spectral Theory of Large Dimensional Random Matrices and Its Application to Wireless Communication and Finance: Random Matrix Theory and Its Applications. World Scientific, 2014.


[14] Elisabetta Barbi, Francesco Lagona, Marco Marsili, James W. Vaupel, and Kenneth W. Wachter. The plateau of human mortality: Demog- raphy of longevity pioneers. Science, 360(6396):1459–1461, 2018.

[15] Michael Fielding Barnsley. Fractal functions and interpolation. Con- structive Approximation, 2(1):303–329, 1986.

[16] Michael Fielding Barnsley. Fractals Everywhere. Academic Press, Inc., 1988.

[17] Mark Bebbington, Rebecca Green, Chin-Diew Lai, and Ričardas Zitikis. Beyond the Gompertz law: exploring the late-life mortality deceleration phenomenon. Scandinavian Actuarial Journal, 2014(3):189–207, 2014.

[18] Richard Bellman. Introduction to Matrix Analysis. McGraw-Hill Book Company, New York, 2nd edition, 1970.

[19] Arthur T. Benjamin and Gregory P. Dresden. A combinatorial proof of Vandermonde’s determinant. The American Mathematical Monthly, 114(4):338–341, April 2007.

[20] Robert Berman, Sébastien Boucksom, and David Witt Nyström. Fekete points and convergence towards equilibrium measures on complex manifolds. Acta Mathematica, 207(1):1–27, 2011.

[21] Jean-Paul Berrut and Lloyd N. Trefethen. Barycentric Lagrange in- terpolation. SIAM Review, 46(3):501–517, 2004.

[22] Garrett Birkhoff and Carl de Boor. Piecewise polynomial interpolation and approximation. In Henry. L. Garabedian, editor, Approximation of functions - Proceeding of the General Motors Symposium of 1964, pages 164–190, 1965.

[23] Åke Björck and Victor Pereyra. Solution of Vandermonde systems of equations. Mathematics of Computation, 24(112):893–903, 1970.

[24] Thomas F. Bloom, Len P. Bos, Jean-Paul Calvi, and Norman Levenberg. Polynomial interpolation and approximation in C^d. arXiv:1111.6418.

[25] Maxime Bôcher. Certain cases in which the vanishing of the Wronskian is a sufficient condition for linear dependence. Transactions of the American Mathematical Society, 2(2):139–149, April 1900.

[26] Maxime Bôcher. On linear dependence of functions of one variable. Bulletin of the American Mathematical Society, pages 120–121, December 1900.


[27] Maxime Bôcher. The theory of linear dependence. Annals of Mathematics, 2(1):81–96, January 1900.

[28] Nicholas Bonello, Sheng Chen, and Lajos Hanzo. Construction of regular quasi-cyclic protograph LDPC codes based on Vandermonde matrices. IEEE transactions on vehicular technology, 57(8):2583–2588, July 2008.

[29] Heather Booth and Leonie Tickle. Mortality modelling and forecast- ing: a review of methods. Annals of Actuarial Science, 3(1–2):3–43, 2008.

[30] Leonard P. Bos, Stefano De Marchi, Alvise Sommariva, and Marco Vianello. Computing multivariate Fekete and Leja points by numerical linear algebra. SIAM Journal of Numerical Analysis, 48(5):1984–1999, 2010.

[31] Ray Chandra Bose and Dwijendra Kumar Ray-Chaudhuri. On a class of error correcting binary group codes. Information and Control, 1960.

[32] Alin Bostan and Philippe Dumas. Wronskians and linear independence. American Mathematical Monthly, 117(8):722–727, 2010.

[33] Andromachi Boulougari, Karl Lundengård, Milica Rančić, Sergei Silvestrov, Samya Suleiman, and Belinda Strass. Application of a power-exponential function-based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, 5(1):3–10, 2019.

[34] William E. Boyce and Richard C. DiPrima. Elementary Differential Equations and Boundary Value Problems. John Wiley & Sons, Inc., 7th edition, 2001.

[35] Hamparsum Bozdogan. Model selection and Akaike's information criterion (AIC): the general theory and analytical extensions. Psychometrika, 52(3):345–370, 1987.

[36] David Marius Bressoud. Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture. Spectrum. Cambridge University Press, 1999.

[37] Matteo Briani, Alvise Sommariva, and Marco Vianello. Computing Fekete and Lebesgue points: Simplex, square, disk. Journal of Com- putational and Applied Mathematics, 236(9):2477–2486, 2012.

[38] Charles Edward Rhodes Bruce and R. H. Golde. The lightning dis- charge. The Journal of the Institution of Electrical Engineers - Part II: Power Engineering, 88(6):487 – 505, December 1941.


[39] Marshall W. Buck, Raymond A. Coley, and David P. Robbins. A generalized Vandermonde determinant. Journal of Algebraic Combi- natorics, 1:105–109, 1992.

[40] Kenneth P. Burnham and David R. Anderson. Model Selection and Multimodel Inference. Springer-Verlag New York Inc., 2002.

[41] Zoltan Butt and Steven Haberman. Application of frailty-based mortality models using generalized linear models. Astin Bulletin, 34(1):175–197, 2004.

[42] Augustin-Louis Cauchy. Mémoire sur les fonctions qui ne peuvent obtenir que deux valeurs égales et de signes contraires par suite des transpositions opérées entre les variables qu'elles renferment. Journal de l'École Polytechnique, 10(17):29–112, 1815. Reprinted in Œuvres complètes d'Augustin Cauchy, series 2, volume 1, pp. 91–161, Gauthier-Villars, Paris (1899).

[43] Arthur Cayley. A memoir on the theory of matrices. Philosophical Transactions of the Royal Society of London, 148:17–37, 1858.

[44] Graziano Cerri, Roberto De Leo, and Valter Mariani Primian. ESD indirect coupling modelling. IEEE Transactions on Electromagnetic Compatibility, 38(3):274–281, 1996.

[45] Paul A. Chatterton and Michael A. Houlden. EMC Electromagnetic Theory to Practical Design. John Wiley & Sons, Inc., 1992.

[46] Rajendra N. Chavhan and Ramkrishna L. Shinde. Modeling and fore- casting mortality using the Lee-Carter Model for Indian population based on decade-wise data. Sri Lankan Journal of Applied Statistics, 17(1):51–68, 2016.

[47] Young-Min Chen, Hsuan-Chu Li, and Eng-Tjioe Tan. An explicit fac- torization of totally positive generalized Vandermonde matrices avoid- ing Schur functions. Applied Mathematics E-Notes, 8:138–147, 2008.

[48] Theodore S. Chihara. An Introduction to Orthogonal Polynomials. Dover Publications Inc., New York, Dover edition, 2011.

[49] Charles K. Chui. An Introduction To Wavelets. Academic Press, 1992.

[50] Hakan Ciftci, Richard L. Hall, Nasser Saad, and Ebubekir Dogu. Phys- ical applications of second-order linear differential equations that ad- mit polynomial solutions. Journal of Physics A: Mathematical and Theoretical, 43, October 2010.


[51] Michele Cirafici, Annamaria Sinkovics, and Richard J. Szabo. Cohomological gauge theory, quiver matrix models and Donaldson–Thomas theory. Nuclear Physics, Section B, 809(3):452–518, 2009.

[52] Henry Cohn. A conceptual breakthrough in sphere packing. Notices of the American Mathematical Society, 64(2):102–15, 2017.

[53] Henry Cohn, Abhinav Kumar, Stephen D. Miller, Danylo Radchenko, and Maryna Viazovska. The sphere packing problem in dimension 24. Annals of Mathematics, 185:1017–1033, 2017.

[54] Henry Cohn and Stephen D. Miller. Some properties of optimal functions for sphere packing in dimensions 8 and 24. arXiv:1603.04759, 2016.

[55] Thomas F. Coleman and Yujing Li. On the convergence of reflective Newton methods for large-scale nonlinear minimization subject to bounds. Mathematical Programming, 67(2):189–204, 1994.

[56] Thomas F. Coleman and Yujing Li. An interior, trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6:418–445, 1996.

[57] Colin O'Hare and Youwei Li. Explaining young mortality. Insurance: Mathematics and Economics, 50(1):12–25, January 2012.

[58] John H. Conway and Neil J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag Berlin Heidelberg New York, 3rd edition, 1999.

[59] Charles-Augustin Coulomb. Premier mémoire sur l'électricité et le magnétisme. Histoire de l'Académie royale des sciences avec les mémoires de mathématiques et de physique pour la même année tirés des registres de cette académie. Année MDCCLXXXV, pages 569–577, 1785.

[60] David A. Cox, John Little, and Donal O'Shea. Ideals, Varieties, and Algorithms. Undergraduate Texts in Mathematics. Springer-Verlag New York, 3rd edition, 2007.

[61] Carlos D'Andrea and Luis Felipe Tabera. Tropicalization and irreducibility of generalized Vandermonde determinants. Proceedings of the American Mathematical Society, 137:3647–3656, 2009.

[62] Mark Herbert Ainsworth Davis. Martingale Representation and All That, pages 57–68. Birkhäuser Boston, Boston, MA, 2005.

[63] Carl de Boor. On calculating with B-splines. Journal of Approximation Theory, 1:46–69, 1972.


[64] Carl de Boor. A Practical Guide to Splines. Springer Verlag, 1978.

[65] Carl de Boor. Divided differences. Surveys in Approximation Theory, 6:50–62, 2005.

[66] Stefano De Marchi. Polynomials arising in factoring generalized Van- dermonde determinants: an algorithm for computing their coefficients. Mathematical and Computer Modelling, 34(3):271–281, 2001.

[67] Stefano De Marchi. Polynomials arising in factoring generalized Van- dermonde determinants II: A condition for monicity. Applied Mathe- matics Letters, 15(5):627–632, 2002.

[68] Stefano De Marchi and Maria Morandi Cecchi. Polynomials arising in factoring generalized Vandermonde determinants III: Computations and their roots. Neural, Parallel and Scientific Computations, 14:25– 38, 2006.

[69] Federico Delfino, Renato Procopio, Mansueto Rossi, and Farhad Rachidi. Prony series representation for the lightning channel base current. IEEE Transactions on Electromagnetics Compatibility, 54(2):308–315, April 2012.

[70] Emmanuel Desurvire. Classical and Quantum Information Theory. Cambridge University Press, 2009.

[71] David C. Dickinson, Mary C. Hardy, and Howard R. Waters. Actu- arial Mathematics for Life Contingent Risks. International Series on Actuarial Science. Cambridge University Press, 2nd edition, 2009.

[72] Leonard Eugene Dickson. Linear Groups with an Exposition of the Galois Theory. B. G. Teubner, 1901.

[73] Dimitar K. Dimitrov and Boris Shapiro. Electrostatic problems with a rational constraint and degenerate Lamé operators. Potential Analysis, 2018.

[74] Kathy Driver and Kerstin Jordan. Bounds for extreme zeros of some classical orthogonal polynomials. Journal of Approximation Theory, 164(9):1200–1204, September 2012.

[75] Ioana Dumitriu and Alan Edelman. Matrix models for beta ensembles. Journal of Mathematical Physics, 45(11):5830–5847, 2002.

[76] Joseph R. Dwyer and Martin A. Uman. The physics of lightning. Physics Report, 534(4):147–241, 2014.

[77] Freeman Dyson. A meeting with Enrico Fermi. Nature, 427(6972):297, 2004.


[78] Freeman J. Dyson. Statistical theory of energy levels of complex sys- tems I. Journal of Mathematical Physics, 3(1):140–156, 1962.

[79] Freeman J. Dyson. Statistical theory of energy levels of complex sys- tems II. Journal of Mathematical Physics, 3(1):157–165, 1962.

[80] Freeman J. Dyson. Statistical theory of energy levels of complex sys- tems III. Journal of Mathematical Physics, 3(1):166–175, 1962.

[81] Freeman J. Dyson. Statistical theory of energy levels of complex sys- tems III. Journal of Mathematical Physics, 3(6):1199–1215, 1962.

[82] Freeman John Dyson. The approximation to algebraic numbers by rationals. Acta Mathematica, 79(1):225–240, December 1947.

[83] Alan Edelman and N. Raj Rao. Random matrix theory. Acta Numer- ica, 14:233–297, 2005.

[84] Alfredo Eisinberg and Guiseppe Fedele. On the inversion of the Vandermonde matrix. Applied Mathematics and Computation, 174(2):1384–1397, 2006.

[85] Thomas Ernst. Generalized Vandermonde determinants. U. U. D. M. Report 2000:6, 2000.

[86] Gilbert Faccarello. Du conservatoire à l'École normale: Quelques notes sur A. T. Vandermonde (1735–1796). Cahiers d'Histoire du CNAM, 2/3:17–57, 1993.

[87] Valerii V. Fedorov. Theory of Optimal Experiments. Academic Press, Inc., 1972.

[88] Dennis M. Feehan. Separating the signal from the noise: Evidence for deceleration in old-age death rates. Demography, 55(6):2025–2044, 2018.

[89] Zhang Feizhou and Liu Shanghe. A new function to represent the lightning return-stroke currents. IEEE Transactions on Electromag- netic Compatibility, 44(4):595–597, 2002.

[90] Leopold Fejér. Bestimmung derjenigen Abszissen eines Intervalles, für welche die Quadratsumme der Grundfunktionen der Lagrangeschen Interpolation im Intervalle ein möglichst kleines Maximum besitzt. Annali della Scuola Normale Superiore di Pisa - Classe di Scienze, Série 2, 1(3):263–276, 1932.

[91] Randolph P. Flowe and Gary A. Harris. A note on generalized Van- dermonde determinants. SIAM Journal on Matrix Analysis and Ap- plications, 14(4):1146–1151, October 1993.


[92] David O. Forfar. Encyclopedia of Actuarial Science, volume 2, chapter Mortality Laws, pages 1–6. Wiley, 2006.

[93] Peter J. Forrester. Log-Gases and Random Matrices. Princeton Uni- versity Press, 2010.

[94] Patricia R. Foster. CAD for antenna systems. Electronics & Commu- nications Engineering Journal, 12(1):3–14, February 2000.

[95] G. P. Fotis and L. Ekonomou. Parameters’ optimization of the elec- trostatic discharge current equation. International Journal on Power System Optimization, 3(2):75–80, 2011.

[96] G. P. Fotis, Ioannis F. Gonos, and Ioannis A. Stathopulos. Determi- nation of discharge current equation parameters of ESD using genetic algorithms. Electronics Letters, 42(14):797–799, 2006.

[97] Ralf Fröberg and Boris Shapiro. On Vandermonde varieties. Mathematica Scandinavica, 119(1):73–91, 2016.

[98] William Fulton and Joe Harris. Representation Theory: a first course. Springer-Verlag, 1991.

[99] Jean H. Gallier. Curves and Surfaces in Geometric Modelling: Theory and Algorithms. Morgan Kaufmann, 1999.

[100] Henry B. Garrett and Albert C. Whittlesey. Spacecraft charging, an update. IEEE Transactions on Plasma Science, 28(6):2017–2028, De- cember 2000.

[101] Letterio Gatto and Inna Scherbak. On generalized Wronskians. In P. Pragacz, editor, Contributions to Algebraic Geometry, pages 257– 296. EMS Congress Series Report, 2012. Longer version available at http://arxiv.org/abs/1310.4683.

[102] Carl Friedrich Gauss. Theoria combinationis observationum erroribus minimis obnoxiae, Pars Prior. Gottingae, 1821.

[103] Carl Friedrich Gauss. Theoria combinationis observationum erroribus minimis obnoxiae, Pars Posterior. Gottingae, 1823.

[104] Walter Gautschi. On inverses of Vandermonde and confluent Vander- monde matrices. Numerische Mathematik, 4:117–123, 1962.

[105] Walter Gautschi. On inverses of Vandermonde and confluent Vander- monde matrices II. Numerische Mathematik, 5:425–430, 1963.

[106] Walter Gautschi. Optimally conditioned Vandermonde matrices. Nu- merische Mathematik, 24(1):1–12, 1975.


[107] Walter Gautschi. On inverses of Vandermonde and confluent Vander- monde matrices III. Numerische Mathematik, 29(4):445–450, 1978.

[108] Walter Gautschi. Optimally scaled and optimally conditioned Vander- monde and Vandermonde-like matrices. BIT Numerical Mathematics, 51(1):103–125, 2011.

[109] Leonid A. Gavrilov and Natalia S. Gavrilova. Late-life mortality is underestimated because of data errors. PLoS Biology, 17(2), 2019.

[110] Ira Gessel. Tournaments and Vandermonde’s determinant. Journal of , 3(3):305–307, 1979.

[111] Frederico Girosi and Gary King. Demographic forecasting. Princeton University Press, 2008.

[112] Peter Goos. The Optimal Design of Blocked and Split-Plot Experi- ments. Number 164 in Lecture Notes In Statistics. Springer Verlag, New York, Inc., 2002.

[113] David Goss. Basic Structures of Function Field Arithmetic. Springer- Verlag Berlin Heidelberg, 1996.

[114] Harald Hannerz. An extension of relational methods in mortality es- timation. Demographic Research, 4:337–368, June 2001.

[115] Robert C. Hansen. Early computational electromagnetics. IEEE An- tennas and Propagation Magazine, 38(3):60–61, 1996.

[116] Harish-Chandra. Differential operators on a semisimple Lie algebra. American Journal of Mathematics, 79(1):87–120, January 1957.

[117] Fridolin Heidler. Travelling current source model for LEMP calcula- tion. In Proceedings of papers, 6th Int. Zurich Symp. EMC, Zurich, pages 157–162, 1985.

[118] Fridolin Heidler and Jovan Cvetić. A class of analytical functions to study the lightning effects associated with the current front. Transactions on Electrical Power, 12(2):141–150, 2002.

[119] Eduard Heine. Handbuch der Kugelfunctionen, Theorie und Anwen- dungen. G. Reimer, Berlin, 1878.

[120] E. R. Heineman. Generalized Vandermonde determinants. Transac- tions of the American Mathematical Society, 31(3):464–476, July 1929.

[121] Hermann J. Helgert. Alternant codes. Information and Control, 26:369–380, 1974.


[122] L. Heligman and J. H. Pollard. The age pattern of mortality. Journal of the Institute of Actuaries, 107(1):49–80, 1980.

[123] David Hestenes. New Foundations for Classical Mechanics. Kluwer Academic Publishers, 2nd edition, 2002.

[124] Alexis Hocquenghem. Codes correcteurs d’erreurs. Chiffres, 2:147– 156, September 1959.

[125] J’ozef Maria Hoene-Wro´nski. R´efutationde la Th´eoriedes fonctions analytiques de Lagrange. , 1812.

[126] Anders Holst and Victor Ufnarovski. Matrix Theory. Studentlitteratur AB, Lund, 2014.

[127] Human Mortality Database. Death rates. University of Cal- ifornia, Berkeley (USA), and Max Planck Institute for Demo- graphic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 2019-06-01).

[128] Human Mortality Database. Death rates. University of Cal- ifornia, Berkeley (USA), and Max Planck Institute for Demo- graphic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 2017-06-14).

[129] Clifford M. Hurvich and Chih-Ling Tsai. Regression and time series model selection in small samples. Biometrika, 76(2):297–307, 1989.

[130] Ali M. Hussein, Marius Milewski, and Wasyl Janischewskyj. Correlat- ing the characteristics of the CN tower lightning return-stroke current with those of its generated electromagnetic pulse. IEEE Transactions on Electromagnetics Compatibility, 50(3):642–650, August 2008.

[131] IEC 61000. Electromagnetic compatibility (EMC) - part 4-2: Testing and measurement techniques - electrostatic discharge immunity test, 2000.

[132] IEC 61000. Electromagnetic compatibility (EMC) - part 4-2: Testing and measurement techniques - electrostatic discharge immunity test, 2009.

[133] IEC 62305-1 Ed.2. Protection Against Lightning - Part I: General Principles, 2010.

[134] IEC International Standard 61312-1. Protection against lightning elec- tromagnetic impulse - electrostatic discharge immunity test, 1995.

[135] Ronald S. Irving. Integers, Polynomials, and Rings. Undergraduate Texts in Mathematics. Springer-Verlag New York, 1st edition, 2004.


[136] Claude Itzykson and Jean-Bernard Zuber. The planar approximation II. Journal of Mathematical Physics, 21(3):411–421, 1980.

[137] Eri Jabotinsky. Analytic iteration. Transactions of the American Mathematical Society, 108(3):457–477, September 1963.

[138] Alan T. James. The distribution of latent roots of the . The Annals of Mathematical Statistics, 31(1):151–158, 1960.

[139] J. Janssen and Christos. H. Skiadas. Dynamic modelling of life table data. Applied Stochastic Models and Data Analysis, 11:35–49, 1995.

[140] Vesna Javor. Multi-peaked functions for representation of lightning channel-base currents. In Proceedings of papers, 2012 International Conference on Lightning Protection - ICLP, Vienna, Austria, pages 1–4, 2012.

[141] Vesna Javor. New functions for representing IEC 62305 standard and other typical lightning stroke currents. Journal of Lightning Research, 4(Suppl 2: M2):50–59, 2012.

[142] Vesna Javor. New function for representing IEC 61000-4-2 standard electrostatic discharge current. Facta Universitatis, Series: Electron- ics and Energetics, 27(4):509–520, 2014.

[143] Vesna Javor. Representing measured lightning discharge currents by the multi-peaked function. In Software, Telecommunications and Computer Networks (SoftCOM), 2015 23rd International Conference on, Split, Croatia, pages 56–59, 2015.

[144] Vesna Javor, Karl Lundengård, Milica Rančić, and Sergei Silvestrov. Measured electrostatic discharge currents modeling and simulation. In Proceedings of TELSIKS 2015, Niš, Serbia, pages 209–212, 2015.

[145] Vesna Javor, Karl Lundengård, Milica Rančić, and Sergei Silvestrov. Analytical representation of measured lightning currents and its application to electromagnetic field estimation. IEEE Transactions on Electromagnetic Compatibility, 60(5):1415–1426, 2018.

[146] Vesna Javor and Predrag D. Rančić. A channel-base current function for lightning return-stroke modelling. IEEE Transactions on EMC, 53(1):245–249, 2011.

[147] Lin Jiang. Changing kinship structure and its implications for old-age support in urban and rural China. Population Studies, 49(1):127–145, June 1995.

[148] Kenneth L. Kaiser. Electrostatic Discharges. CRC Press, 2006.


[149] Dan Kalman. The generalized Vandermonde matrix. Mathematics Magazine, 57(1):15–21, January 1984.

[150] E. P. F. Kan. An inversion procedure of the generalized Vandermonde matrix. IEEE Transactions on Automatic Control, 16(5):492–493, October 1971.

[151] Pavlos S. Katsivelis, Ioannis F. Gonos, and Ioannis A. Stathopulos. Estimation of parameters for the electrostatic discharge current equation with real human discharge events reference using genetic algorithms. Measurement Science and Technology, 21(10), October 2010.

[152] R. K. Keenan and L. K. A. Rossi. Some fundamental aspects of ESD testing. In Proceedings of IEEE International Symposium on Electromagnetic Compatibility, pages 236–241, 1991.

[153] Jack Kiefer and Jacob Wolfowitz. On the nonrandomized optimality and randomized nonoptimality of symmetrical designs. The Annals of Mathematical Statistics, 29(3):675–699, September 1958.

[154] Jack Kiefer and Jacob Wolfowitz. Optimum designs in regression problems. The Annals of Mathematical Statistics, 30(2):271–294, June 1959.

[155] Jack Kiefer and Jacob Wolfowitz. The equivalence of two extremum problems. Canadian Journal of Mathematics, 12:363–366, 1960.

[156] David Kincaid and Ward Cheney. Numerical Analysis: Mathematics of Scientific Computing. American Mathematical Society, 3rd edition, 2002.

[157] Takuya Kitamoto. On the computation of the determinant of a generalized Vandermonde matrix. In Computer Algebra in Scientific Computing: 16th International Workshop, CASC 2014, pages 242–255, 2014.

[158] André Klein. Matrix algebraic properties of the Fisher information matrix of stationary processes. Entropy, 16:2013–2055, 2014.

[159] Donald Ervin Knuth. Convolution polynomials. The Mathematica Journal, 4:67–78, 1992.

[160] Donald Ervin Knuth. The Art of Computer Programming: Volume 1: Fundamental Algorithms. Addison-Wesley Professional, 1997.

[161] Kenneth D. Kochanek, Sherry L. Murphy, Jiaquan Xu, and Elizabeth Arias. Deaths: Final data for 2017. National Vital Statistics Report, 68(9), 2019. National Center for Health Statistics.


[162] Marie-Claire Koissi, Arnold F. Shapiro, and Göran Högnäs. Evaluating and extending the Lee–Carter model for mortality forecasting: Bootstrap confidence interval. Insurance: Mathematics and Economics, 38(1):1–20, February 2006.

[163] Wolfgang König. Orthogonal polynomial ensembles in probability theory. Probability Surveys, 2:385–447, 2005.

[164] Sadanori Konishi and Genshiro Kitagawa. Information Criteria and Statistical Modeling. Springer, 2008.

[165] Krzysztof Kowalski and Willi-Hans Steeb. Nonlinear Dynamical Systems and Carleman Linearization. World Scientific Publishing, 1991.

[166] Joseph Louis Lagrange. Réflexions sur la résolution algébrique des équations. In Œuvres de Lagrange, volume 3, pages 205–421. J. A. Serret, 1869.

[167] Joseph Louis Lagrange. Leçons élémentaires sur les mathématiques données à l’école normale. In Œuvres de Lagrange, volume 7, pages 183–287. J. A. Serret, 1877.

[168] Henri Lebesgue. L’œuvre mathématique de Vandermonde. In Notices d’Histoire des Mathématiques. Université de Genève, 1958.

[169] Ronald Lee. The Lee–Carter method for forecasting mortality, with various extensions and applications. North American Actuarial Journal, 4(1):80–91, 2000.

[170] Ronald D. Lee and Lawrence Carter. Modelling and forecasting U.S. mortality. Journal of the American Statistical Association, 87(419):659–671, 1992.

[171] Ronald D. Lee and Timothy Miller. Evaluating the Lee–Carter method for forecasting mortality. Demography, 38(4):537–549, November 2001.

[172] Ronald D. Lee and R. Rofman. Modeling and forecasting mortality in Chile. Notas Poblacion, 22(59):183–213, June 1994.

[173] Gottfried Wilhelm Leibniz. VI. Leibniz an de l’Hospital. In Leibniz Gesammelte Werk. A. Asher & Comp., 1850.

[174] Kenneth Levenberg. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics, 2(2):164–168, July 1944.

[175] Hsuan-Chu Li and Eng-Tjioe Tan. On a special generalized Vandermonde matrix and its LU factorization. Taiwanese Journal of Mathematics, 12(7):1651–1666, October 2008.


[176] Nan Li, Ronald D. Lee, and Shripad Tuljapurkar. Using the Lee–Carter method to forecast mortality for populations with limited data. International Statistical Review, 72(1):19–36, 2007.

[177] Ting Li, Yang Claire Yang, and James J. Anderson. Mortality increase in late-middle and early-old age: Heterogeneity in death processes as a new explanation. Demography, 50(5):1563–1591, 2013.

[178] Dino Lovrić, Slavko Vujević, and Tonći Modrić. On the estimation of Heidler function parameters for reproduction of various standardized and recorded lightning current waveshapes. International Transactions on Electrical Energy Systems, 23:290–300, 2013.

[179] Fábio R. Lucas. Limits for zeros of Jacobi and Laguerre polynomials. Proceeding Series of the Brazilian Society of Applied and Computational Mathematics, 3(1), 2015.

[180] Karl Lundengård. Generalized Vandermonde matrices and determinants in electromagnetic compatibility. Licentiate thesis, Mälardalen University, 2017.

[181] Karl Lundengård, Vesna Javor, Milica Rančić, and Sergei Silvestrov. Application of the Marquardt least-squares method to the estimation of pulse function parameters. In AIP Conference Proceedings 1637, ICNPAA, Narvik, Norway, pages 637–646, 2014.

[182] Karl Lundengård, Vesna Javor, Milica Rančić, and Sergei Silvestrov. Estimation of pulse function parameters for approximating measured lightning currents using the Marquardt least-squares method. In Conference Proceedings, EMC Europe, Gothenburg, Sweden, pages 571–576, 2014.

[183] Karl Lundengård, Vesna Javor, Milica Rančić, and Sergei Silvestrov. Application of the multi-peaked analytically extended function to representation of some measured lightning currents. Serbian Journal of Electrical Engineering, 13(2):1–11, 2016.

[184] Karl Lundengård, Vesna Javor, Milica Rančić, and Sergei Silvestrov. Estimation of parameters for the multi-peaked AEF current function. Methodology and Computing in Applied Probability, pages 1–15, 2016.

[185] Karl Lundengård, Vesna Javor, Milica Rančić, and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. In Sergei Silvestrov and Milica Rančić, editors, Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, volume 178 of Springer Proceedings in Mathematics & Statistics, chapter 10. Springer International Publishing, 2016.


[186] Karl Lundengård, Jonas Österberg, and Sergei Silvestrov. Optimization of the determinant of the Vandermonde matrix and related matrices. Methodology and Computing in Applied Probability, 20:1417–1428, 2018.

[187] Karl Lundengård, Jonas Österberg, and Sergei Silvestrov. Extreme points of the Vandermonde determinant on the sphere and some limits involving the generalized Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

[188] Karl Lundengård, Milica Rančić, Vesna Javor, and Sergei Silvestrov. An examination of the multi-peaked analytically extended function for approximation of lightning channel-base currents. In Proceedings of Full Papers, PES 2015, Niš, Serbia, 2015. Electronic, arXiv:1604.06517.

[189] Karl Lundengård, Milica Rančić, Vesna Javor, and Sergei Silvestrov. Multi-peaked analytically extended function representing electrostatic discharge (ESD) currents. ICNPAA2016 Proceedings, 2016.

[190] Karl Lundengård, Milica Rančić, Vesna Javor, and Sergei Silvestrov. Novel approach to modelling of lightning current derivative. Facta Universitatis, Series: Electronics and Energetics, 30(2):245–256, June 2017.

[191] Karl Lundengård, Milica Rančić, Vesna Javor, and Sergei Silvestrov. Electrostatic discharge currents representation using the multi-peaked analytically extended function by interpolation on a D-optimal design. Facta Universitatis, Series: Electronics and Energetics, 32(1):25–49, 2019.

[192] Karl Lundengård, Milica Rančić, and Sergei Silvestrov. Modelling mortality rates using power exponential functions. Submitted to journal, 2019.

[193] Karl Lundengård, Milica Rančić, and Sergei Silvestrov. Supplementary material for “Modelling mortality rates using power exponential functions”. Downloadable dataset, 2019. DOI: 10.6084/m9.figshare.8956838.

[194] Laurenţiu Lupaş. On the computation of the generalized Vandermonde matrix inverse. IEEE Transactions on Automatic Control, 20(4):559–561, August 1975.


[195] Günter Lüttgens and Norman Wilson. Electrostatic Hazards. Butterworth–Heinemann, 1997.

[196] Tom Lyche and Larry L. Schumaker. L-spline wavelets. Wavelet Analysis and Its Applications, 5:197–212, 1994.

[197] Wen-Xiu Ma. Wronskians, generalized Wronskians and solutions to the Korteweg–de Vries equation. Chaos, Solitons and Fractals, 19:163–170, 2004.

[198] Ian Grant Macdonald. Symmetric Functions and Hall Polynomials. Oxford University Press, 2nd edition, 1979.

[199] Nathaniel Macon and Abraham Spitzbart. Inverses of Vandermonde matrices. The American Mathematical Monthly, 65(2):95–100, 1958.

[200] Maple 18.02. Maplesoft, a division of Waterloo Maple Inc., Waterloo, Ontario.

[201] Andrey Andreyevich Markov. Rasprostranenie predel’nyh teorem ischisleniya veroyatnostej na summu velichin svyazannyh v cep’. Zapiski Akademii Nauk po Fiziko-matematicheskomu otdeleniyu, 25(3), 1908.

[202] Donald W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2):431–441, 1963.

[203] Jordi Marzo and Joaquim Ortega-Cerdà. Equidistribution of Fekete points on the sphere. Constructive Approximation, 32(3):513–521, 2010.

[204] Peter Massopust. Interpolation and approximation with splines and fractals. Oxford University Press, 2010.

[205] MATLAB. Release 2015a, Optimization Toolbox. The Mathworks, Inc., Natick, Massachusetts, United States.

[206] MATLAB. Release 2019a, Optimization Toolbox. The Mathworks, Inc., Natick, Massachusetts, United States.

[207] Madan Lal Mehta. Random Matrices. Elsevier, 3rd edition, 2004.

[208] Viatcheslav B. Melas. Functional Approach to Optimal Experimental Design, volume 184 of Lecture Notes in Statistics. Springer Science+Business Media, Inc, 2006.

[209] Leon Mirsky. An Introduction To Linear Algebra. Oxford University Press, 1955.


[210] Daniel Mitchell, Patrick Brockett, Rafael Mendoza-Arriaga, and Kumar Muthuraman. Modeling and forecasting mortality rates. Insurance: Mathematics and Economics, 52:275–285, 2013.

[211] Eliakim Hastings Moore. A two-fold generalization of Fermat’s theorem. Bulletin of the American Mathematical Society, 2:189–199, 1896.

[212] David Morgan. A Handbook for EMC Testing and Measurement, volume 8 of IEEE Electrical Measurement series. Peter Peregrinus Ltd. on behalf of the Institution of Electrical Engineers, London, United Kingdom, 1994.

[213] Héctor Manuel Moya-Cessa and Francisco Soto-Eguibar. Differential Equations: An Operational Approach. Rinton Press, 2011.

[214] Héctor Manuel Moya-Cessa and Francisco Soto-Eguibar. Inverse of the Vandermonde and Vandermonde confluent matrices. Applied Mathematics and Information Sciences, 5(3):361–366, 2011.

[215] Héctor Manuel Moya-Cessa and Francisco Soto-Eguibar. Discrete fractional Fourier transform: Vandermonde approach. arXiv:1604.06686v1 [math.GM], 2016.

[216] Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango, and Godwin Kakuba. Extreme points of the Vandermonde determinant on surfaces implicitly determined by a univariate polynomial. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

[217] Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango, and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

[218] Thomas Muir. The Theory of Determinants in the Historical Order of its Development: Part I: Determinants in General, Leibniz (1693) to Cayley (1841). MacMillan and Co., London, 1890.

[219] Thomas Muir and William Henry Metzler. A Treatise on the Theory of Determinants. Dover Publications Inc., New York, 1966.

[220] Robb J. Muirhead. Aspects of Multivariate Statistical Theory. Wiley Interscience, 1982.


[221] Gary L. Mullen and Daniel Panario. Handbook of Finite Fields. CRC Press, 2013.

[222] Isidor Pavlovich Natanson. Constructive Function Theory, Volume 1: Uniform Approximation. Frederick Ungar Publishing Co., Inc., 1964.

[223] Joseph Needham and Wang Ling. Science and Civilisation in China, Volume 3: Mathematics and the Sciences of the Heavens and the Earth. Cambridge University Press, 1959.

[224] Øystein Ore. On a special class of polynomials. Transactions of the American Mathematical Society, 35(3):559–584, July 1933.

[225] Halil Oruç. LU factorization of the Vandermonde matrix and its applications. Applied Mathematics Letters, 20:982–987, 2007.

[226] Halil Oruç and Hakan K. Akmaz. Symmetric functions and the Vandermonde matrix. Journal of Computational and Applied Mathematics, pages 49–64, 2004.

[227] Alexander Ostrowski. Über ein Analogon der Wronskischen Determinante bei Funktionen mehrerer Veränderlicher. Mathematische Zeitschrift, 4(3):223–230, September 1919.

[228] Clayton R. Paul. Introduction to Electromagnetic Compatibility. John Wiley & Sons, Inc., 1992.

[229] Dragan Pavlović, Gradimir V. Milovanović, and Jovan Cvetić. Calculation of the channel discharge function for the generalized lightning traveling current source return stroke model. Filomat, 32(20):6937–6951, 2018.

[230] Giuseppe Peano. Sur le déterminant Wronskien. Mathesis, 9:75–76, 1889.

[231] Giuseppe Peano. Sur les Wronskiens. Mathesis, 9:110–112, 1889.

[232] Lennart Persson. Handbook of Fish Biology and Fisheries, volume 1, chapter 15 Community Ecology of Freshwater Fishes, pages 321–340. Blackwell Publishing, 2002.

[233] Lennart Persson, Kjell Leonardsson, André M. de Roos, Mats Gyllenberg, and Bent Christensen. Ontogenetic scaling of foraging rates and the dynamics of a size-structured consumer-resource model. Theoretical Population Biology, 54:270–293, 1998.

[234] Dragan Poljak. Advanced Modeling in Computational Electromagnetic Compatibility. John Wiley & Sons, Inc., 2007.


[235] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 3rd edition, 2007.

[236] Anatoli Prudnikov, Jurij Aleksandrovič Bryčkov, and Oleg Igorevič Maričev. Integrals and Series: More Special Functions, volume 3. Gordon and Breach Science Publishers, 1990.

[237] Józef H. Przytycki. History of the knot theory from Vandermonde to Jones. In XXIVth National Congress of the Mexican Mathematical Society (Spanish) (Oaxtepec, 1991), pages 173–185, 1991.

[238] Jennifer J. Quinn. Visualizing Vandermonde’s determinant through nonintersecting lattice paths. Journal of Statistical Planning and Inference, 140(8):2346–2350, 2010.

[239] Lyle Ramshaw. Blossoming: A connect-the-dots approach to splines. Research Report 19, Digital Systems Research Center, 1987.

[240] Lyle Ramshaw. Blossoms are polar forms. Computer Aided Geometric Design, 6(4):323–358, 1989.

[241] Kamisetti Ramamohan Rao, Do Nyeon Kim, and Jae-Jong Hwang. Fast Fourier Transform - Algorithms and Applications. Springer, 2010.

[242] Irving Stoy Reed and Gustave Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2):300–304, 1960.

[243] Ralph Tyrrell Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35(2):183–238, 1993.

[244] Klaus Friedrich Roth. Rational approximations to algebraic numbers. Mathematika, 2(1):1–20, 1955.

[245] Abraham Rubinstein, Carlos Romero, Mario Paolone, Farhad Rachidi, Marcos Rubinstein, Pierre Zweiacker, and Bertrand Daout. Lightning measurement station on Mount Säntis in Switzerland. In Proceedings of X International Symposium on Lightning Protection, Curitiba, Brazil, pages 463–468, 2009.

[246] Walter Rudin. Real and Complex Analysis. WCB/McGraw-Hill Book Company, 3rd edition, 1987.

[247] Andrzej Ruszczyński. Nonlinear Optimization. Princeton University Press, 2006.

[248] Edward B. Saff and Vilmos Totik. Logarithmic Potentials with External Fields. Springer-Verlag Berlin Heidelberg, 1997.


[249] Thomas Scharf, Jean-Yves Thibon, and Brian Garner Wybourne. Powers of the Vandermonde determinant and the quantum Hall effect. Journal of Physics A: General Physics, 27(12):4211–4219, 1994.

[250] Hans Peter Schlickewei and Carlo Viola. Generalized Vandermonde determinants. Acta Arithmetica, XCV(2):123–137, 2000.

[251] Isaac Jacob Schoenberg. Contributions to the problem of approximation of equidistant data by analytic functions. Part A: On the problem of smoothing or graduation. A first class of analytic approximation formulae. Quart. Appl. Math., 4:45–99, 1946.

[252] Isaac Jacob Schoenberg. Contributions to the problem of approximation of equidistant data by analytic functions. Part B: On the problem of osculatory interpolation. A second class of analytic approximation formulae. Quart. Appl. Math., 4:112–141, 1946.

[253] Larry L. Schumaker. Spline Functions: Basic Theory. Cambridge University Press, 3rd edition, 2007.

[254] Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.

[255] Sylvia Serfaty. Systems of points with Coulombian interactions. European Mathematical Society Newsletter, 12(110):16–21, 2018. Extended version available on arXiv:1712.04095.

[256] Lewis B. Sheiner and Stuart L. Beal. Evaluation of methods for estimating population pharmacokinetic parameters II. Biexponential model and experimental pharmacokinetic data. Journal of Pharmacokinetics and Biopharmaceutics, 9(5):635–651, 1981.

[257] Takatoshi Shindo, Toru Miki, Mikihisa Saito, Daiki Tanaka, Akira Asakawa, Hideki Motoyama, Masaru Ishii, Takeo Sonehara, Yusuke Suzuhigashi, and Hiroshi Taguchi. Lightning observations at Tokyo Skytree: Observation systems and observation results in 2012 and 2013. In Proceedings of the 2014 International Symposium on Electromagnetic Compatibility (EMC Europe 2014), Gothenburg, Sweden, pages 583–588, 2014.

[258] Michael Shub and Steve Smale. Complexity of Bezout’s theorem, III: condition number and packing. Journal of Complexity, 9:4–14, 1993.

[259] Christos H. Skiadas and Maria Felice Arezzo. Estimation of the healthy life expectancy in Italy through a simple model based on mortality rate. In Christos H. Skiadas and Charilaos Skiadas, editors, Demography and Health Issues, volume 46 of The Springer Series on Demographic Methods and Population Analysis, pages 41–47. Springer Nature, 2018.

[260] Christos H. Skiadas and Charilaos Skiadas. Exploring the Health State of a Population by Dynamic Modeling Methods, volume 45 of The Springer Series on Demographic Methods and Population Analysis. Springer Nature, 2018.

[261] Christos H. Skiadas and Charilaos Skiadas. The Fokker–Planck equation and the first exit time problem. A fractional second order approximation. In Christos H. Skiadas, editor, Fractional Dynamics, Anomalous Transport and Plasma Science, pages 67–75. Springer Nature, 2018.

[262] Steve Smale. Mathematical problems for the next century. The Mathematical Intelligencer, 20(2):7–15, 1998.

[263] David Eugene Smith. Leibniz on determinants. In A Source Book in Mathematics, volume 1. Dover Publications Inc., New York, 1959.

[264] Kirstine Smith. On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of the observations. Biometrika, 12(1/2):1–85, 1918.

[265] Garrett Sobczyk. Generalized Vandermonde determinants and applications. Aportaciones Matemáticas, 30:41–53, 2002.

[266] S. Songlin, B. Zengjun, T. Minghong, and L. Shange. A new analytical expression of current waveform in standard IEC 61000-4-20. High Power Laser and Particle Beams, 5:464–466, 2003.

[267] Thomas Joannes Stieltjes. Sur certains polynômes qui vérifient une équation différentielle linéaire du second ordre et sur la théorie des fonctions de Lamé. Acta Mathematica, 6:321–326, 1885.

[268] James Joseph Sylvester. Additions to the articles in the September number of this journal, “on a new class of theorems,” and on Pascal’s theorem. Philosophical Magazine Series 3, 37(251):363–370, 1850.

[269] Gabor Szegő. Orthogonal Polynomials. American Mathematical Society, 1975.

[270] Kei Takeuchi. Distribution of informational statistics and a criterion of model fitting. 数理科学 (Sūri kagaku – Mathematical Sciences), 153:12–18, 1976. (In Japanese).


[271] Gabriel Téllez and Peter J. Forrester. Exact finite-size study of the 2D OCP at Γ = 4 and Γ = 6. Journal of Statistical Physics, 97(3):489–521, November 1999.

[272] T. N. Thiele. On a mathematical formula to express the rate of mortality throughout life. Journal of the Institute of Actuaries, 16:313–329, 1872.

[273] Henry C. Thode. Testing Normality. Marcel Dekker, Inc. New York, 2002.

[274] Joseph F. Traub. Associated polynomials and uniform methods for the solution of linear problems. SIAM Review, 8(3):277–301, 1966.

[275] Shripad Tuljapurkar, Nan Li, and Carl Boe. A universal pattern of mortality decline in the G7 countries. Nature, 405(6788):789–792, 2000.

[276] Herbert Westren Turnbull and Alexander Craig Aitken. An Introduction to the Theory of Canonical Matrices. Dover Publications, Inc., 1961.

[277] L. Richard Turner. Inverse of the Vandermonde matrix with applications. Technical report, National Aeronautics and Space Administration, Lewis Research Center, Cleveland Ohio, 1966.

[278] Steve Van den Berghe and Daniel De Zutter. Study of ESD signal entry through coaxial cable shields. Journal of Electrostatics, 44(3–4):135–148, September 1998.

[279] Alexandre-Théophile Vandermonde. Mémoire sur la résolution des équations. Histoire de l’Académie royale des sciences avec les mémoires de mathématiques et de physique pour la même année tirés des registres de cette académie. Année MDCCLXXI, pages 365–416, 1774.

[280] Alexandre-Théophile Vandermonde. Remarques sur des problèmes de situation. Histoire de l’Académie royale des sciences avec les mémoires de mathématiques et de physique pour la même année tirés des registres de cette académie. Année MDCCLXXI, pages 566–574, 1774.

[281] Alexandre-Théophile Vandermonde. Mémoire sur des irrationnelles de différents ordres avec une application au cercle. Histoire de l’Académie royale des sciences avec les mémoires de mathématiques et de physique pour la même année tirés des registres de cette académie. Année MDCCLXXII Première Partie, pages 489–498, 1775.

[282] Alexandre-Théophile Vandermonde. Mémoire sur l’élimination. Histoire de l’Académie royale des sciences avec les mémoires de mathématiques et de physique pour la même année tirés des registres de cette académie. Année MDCCLXXII Seconde Partie, pages 516–532, 1776.

[283] Robert Vein and Paul Dale. Determinants and Their Applications in Mathematical Physics. Springer-Verlag New York, 1999.

[284] Maryna S. Viazovska. The sphere packing problem in dimension 8. Annals of Mathematics, 185(2):991–1015, 2017.

[285] Abraham Wald. On the efficient design of statistical investigations. The Annals of Mathematical Statistics, 14(2):134–140, June 1943.

[286] Kai Wang, D. Pommerenke, R. Chundru, T. Van Doren, J. L. Drewniak, and A. Shashindranath. Numerical modeling of electrostatic discharge generators. IEEE Transactions on Electromagnetic Compatibility, 45(2):258–271, 2003.

[287] Ke Wang, Jinshan Wang, and Xiaodong Wang. Four order electrostatic discharge circuit model and its simulation. TELKOMNIKA, 10(8):2006–2012, 2012.

[288] Edward Waring. Problems concerning interpolations. Philosophical Transactions of the Royal Society of London, 69:59–67, 1779.

[289] Waloddi Weibull. A statistical distribution function of wide applicability. Journal of Applied Mechanics, 18:293–297, 1951.

[290] Tim Williams. EMC for Product Designers. Newnes, 3rd edition, 2001.

[291] J. R. Wilmoth. Mortality projections for Japan: A comparison of four methods. In Graziella Caselli and Alan D. Lopez, editors, Health and mortality among elderly populations. Clarendon Press, 1996.

[292] John Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, 20A(1/2):32–52, July 1928.

[293] Kenneth Wolsson. A condition equivalent to linear dependence for functions with vanishing Wronskian. Linear Algebra and its Applications, 116:1–8, 1989.

[294] Kenneth Wolsson. Linear dependence of a function set of m variables with vanishing generalized Wronskians. Linear Algebra and its Applications, 117:73–80, 1989.


[295] Sebastià Xambó-Descamps. Block Error-Correcting Codes. Springer-Verlag Berlin Heidelberg, 1st edition, 2003.

[296] Shang-Jun Yang, Hua-Zhang Wu, and Quan-Bing Zhang. Generalization of Vandermonde determinants. Linear Algebra and its Applications, 336:201–204, October 2001.

[297] Yuhong Yang. Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika, 92(4):937–950, 2005.

[298] Natthasurang Yasungnoen and P. Sattayatham. Forecasting Thai mortality by using the Lee–Carter model. Asia-Pacific Journal of Risk and Insurance, 10(1):91–105, 2015.

[299] Chen Yazhou, Liu Shanghe, Wu Xiaorong, and Zhang Feizhou. A new kind of channel-base current function. In 3rd International symposium on Electromagnetic Compatibility, pages 304–646, 2002.

[300] Bernard Ycart. A case of mathematical eponymy: the Vandermonde determinant. Revue d’Histoire des Mathématiques, 9(1):43–77, 2013.

[301] Zhiyong Yuan, Tun Li, Jinliang He, Shuiming Chen, and Rong Zeng. New mathematical descriptions of ESD current waveform based on the polynomial of pulse function. IEEE Transactions on Electromagnetic Compatibility, 48(3):589–591, 2006.

[302] Changqing Zhu, Sanghe Liu, and Ming Wei. Analytic expression and numerical solution of ESD current. High Voltage Engineering, 31(7):22–24, 2005. (In Chinese).


Index

accident hump, 69
AEF, see analytically extended function
AIC, 56, 171, 176
  second order correction, 59, 169
AICC, see second order correction of the AIC
Akaike Information Criterion, see AIC
alternant matrix, 28
analytically extended function, 128, 135, 141
central mortality rate, 68
Coulomb gas, 35
Coulombian interaction, 32
curve fitting, 39
D-optimal design, 61
death rate, see mortality rate
determinant, 23
  Vandermonde, 24
digamma function, 139
divided differences, 44
electromagnetic compatibility, 62
electromagnetic disturbance, 62
electromagnetic interference, see electromagnetic disturbance
electrostatic discharge, 63
EMC, see electromagnetic compatibility
ESD, see electrostatic discharge
exponential integral, 165
Fisher information matrix, 60
force of mortality, see mortality rate
G-optimal design, 60
Gamma function, 35, 134, 139
  incomplete, 132, 165
generalized divided differences, 46
Gröbner basis, 84, 86
hazard rate, see mortality rate
Heidler function, 65
Hermite polynomial, 96
interpolation, 39
  Hermite, 43
  Newton, 44
  polynomial, 40
Jacobi polynomial, 149
Jacobian matrix, 29, 136
Kullback–Leibler divergence, 56
Lagrange interpolation, 41
Lagrange multipliers, 84, 85, 87, 89, 94, 95, 105, 113, 118, 120, 121, 123, 149
least squares method, 47
Lee–Carter method, 70, 179
lightning discharge, 64, 142
likelihood function, 52
linear model, 40
Marquardt least squares method, 49, 135, 141
maximum likelihood estimation, 52
Meijer G-function, 139
MLE, see maximum likelihood estimation
MLSM, see Marquardt least squares method


mortality rate, 67
  central, 68
  models, 163
orthogonal polynomial
  Hermite, 96
  Jacobi, 149
overfitting, 55, 171
power-exponential function, 127, 165
Q-Q plot, 54, 171
quantile-quantile plot, see Q-Q plot
regression, 52
Runge’s phenomenon, 41, 62
Schur polynomials, 32
survival function, 67, 165
Vandermonde
  Alexandre Théophile, 21
  determinant, 23, 24
  matrix, 21
    generalized, 31
    inverse, 27, 41

Wronskian matrix, 29


List of Figures

1.1 Illustration of the most significant connections in the thesis. ...... 18
1.2 Some examples of different interpolating curves. The set of red points are interpolated by a polynomial (left), a self-affine fractal (middle) and a Lissajous curve (right). ...... 38
1.3 Illustration of Lagrange interpolation of 4 data points. The red dots are the data set and p(x) = Σ_{k=1}^{4} y_k p_k(x) is the interpolating polynomial. ...... 40
1.4 Illustration of Runge’s phenomenon. Here we attempt to approximate a function (dashed line) by polynomial interpolation (solid line). With 7 equidistant sample points (left figure) the approximation is poor near the edges of the interval and increasing the number of sample points to 14 (center) and 19 (right) clearly reduces accuracy at the edges further. ...... 41
1.5 The basic iteration step of the Marquardt least squares method, definitions of computed quantities are given in (21), (22) and (23). ...... 49
1.6 Comparison of different functions representing the Standard ESD current waveshape for 4 kV. ...... 64
1.7 Examples of central mortality rate curves for men demonstrating the typical patterns of rapidly decreasing mortality rate for very young ages followed by a ’hump’ for young adults and a rapid increase for high ages. ...... 67
2.5 Illustration of the ellipsoid defined by x^2/9 + y^2/4 + z^2 = 1 with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left. ...... 85
2.6 Illustration of the cylinder defined by (16/25) y^2 + z^2 = 1 with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in cylindrical coordinates on the left. ...... 86


2.7 Illustration of the ellipsoid defined by (45) with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left. ...... 89
2.12 Illustration of S_p^2 for p = 2, p = 4, p = 6, p = 8, and p = ∞ with a section cut out. The outer cube corresponds to p = ∞ and p = 2 corresponds to the sphere in the middle. ...... 108

3.1 An illustration of how the steepness of the power exponential function varies with β. ...... 125
3.2 Illustration of the AEF (solid line) and its derivative (dashed line) with different β_{q,k}-parameters but the same I_{m_q} and t_{m_q}. (a) 0 < β_{q,k} < 1, (b) 4 < β_{q,k} < 5, (c) 12 < β_{q,k} < 13, (d) a mixture of large and small β_{q,k}-parameters. ...... 128
3.3 An example of a two-peaked AEF where some of the η_{q,k}-parameters are negative, so that it has points where the first derivative changes sign between two peaks. The solid line is the AEF and the dashed line is the derivative of the AEF. ...... 129
3.4 Schematic description of the parameter estimation algorithm. ...... 135
3.5 First-positive stroke represented by the AEF function. Here it is fitted with respect to both the data points as well as Q_0 and W_0. ...... 141
3.6 First-negative stroke represented by the AEF function. Here it is fitted with the extra constraint 0 ≤ η ≤ 1 for all η-parameters. ...... 141
3.7 Fast-decaying waveshape represented by the AEF function. Here it is fitted with the extra constraint 0 ≤ η ≤ 1 for all η-parameters. ...... 142
3.8 AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima in the measured data. ...... 142
3.9 AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima and minima in the measured data. ...... 144
3.10 IEC 61000-4-2 Standard ESD current waveform with parameters, [132] (image slightly modified for clarity). ...... 151
3.11 2-peaked AEF interpolated on a D-optimal design representing the IEC 61000-4-2 Standard ESD current waveshape for 4 kV. ...... 151
3.12 3-peaked AEF interpolated to a D-optimal design from measured ESD current from [151, Figure 3] compared with an approximation suggested in [151]. Parameters are given in Table 3.5. ...... 154


3.13 Close-up of the rising part of a 3-peaked AEF interpolated to a D-optimal design from measured ESD current from [151, Figure 3]. Parameters are given in Table 3.5. ...... 154
3.14 AEF with 1 peak fitted by interpolating D-optimal points sampled from the Heidler function describing the IEC 61312-1 waveshape given by (137). Parameters are given in Table 3.6. ...... 155
3.15 Close-up of the rising part of the AEF with 1 peak fitted by interpolating D-optimal points sampled from the Heidler function describing the IEC 61312-1 waveshape given by (137). Parameters are given in Table 3.6. ...... 155
3.16 Comparison of two AEFs with 13 peaks and 2 terms in each interval fitted to measured lightning discharge current derivative from [69]. One is fitted by interpolation on D-optimal points and the other is fitted with free parameters using the MLSM method. Parameters of the D-optimal version are given in Table 3.7. ...... 156
3.17 Comparison of two AEFs with 12 peaks and 2 terms in each interval fitted to measured lightning discharge current derivative from [130]. Parameters are given in Table 3.8. ...... 158
3.18 Comparison of results of integrating the approximating function shown in Figure 3.17. ...... 158

4.1 Examples of mortality rate curves with multiple humps. These models are hand-fitted and are intended to illustrate that they can replicate multiple humps, not show the best possible fit for multiple humps. ...... 165
4.2 Examples of the power-exponential model fitted to the central mortality rate for various countries with the role of the two terms illustrated. ...... 171
4.3 Examples of quantile-quantile plots for the residuals of some models that fit the central mortality rate for USA 2017 well. The closer the residuals are to the dashed line the better the residuals match the expected result from a normal distribution. All models considered in this chapter show some degree of deviation, but more complicated models generally deviate less. ...... 172
4.4 Examples of instances of overfitting with a few different models. Overfitting around the hump happens occasionally for most of the models where the hump is controlled by a separate term in the expression for the mortality rate. Here m_{x,t} refers to the central mortality rate for men taken from the Human Mortality Database. ...... 173


4.5 Some examples of the three models introduced in Section 4.3 fitted to central mortality rate for men taken from the Human Mortality Database. ...... 174
4.6 AIC for seven countries and seventeen models. ...... 177
4.7 Example of central and forecasted mortality rates for Australia with original data and two different models. The mortality indices were computed using data generated in the period 1970–2000 and the logarithm of the mortality was forecasted 10 years into the future. The forecasted mortality rate 2010 is compared to the initial mortality rate (measured mortality rate 2000) and the measured value (measured mortality rate 2010). The three models demonstrate how the quality of the prediction can depend on the model. When using the original data the forecast differs relatively much in the age range 20–60 years. When using the logistic model the prediction and the central mortality rate are very similar but the model does not describe the actual shape of the mortality rate curve well. When using the power-exponential model the prediction and central mortality rate are very similar except around the peak of the hump. ...... 181
4.8 Example of estimated and forecasted mortality indices for Australia with three different models along with their 95% confidence intervals. Note that the three different models forecast slightly different trendlines and that the confidence intervals have slightly different widths. In Section 4.5.1, two ways of characterising the reliability in the measured interval (1970–2010) and the forecasted interval (2011–2050), respectively, are described. ...... 182


List of Tables

2.1 Table of some determinants of generalized Vandermonde matrices. ...... 80
2.2 Polynomials, P_p^n, whose roots give the coordinates of the extreme points of the Vandermonde determinant on the sphere defined by the p-norm in n dimensions. ...... 117

3.1 AEF function’s parameters for some current waveshapes. ...... 143
3.2 IEC 61000-4-2 standard ESD current parameters [132]. ...... 151
3.3 Parameters’ values of 2-peaked AEF representing the IEC 61000-4-2 Standard ESD current waveshape for 4 kV. ...... 151
3.4 IEC 61312-1 standard current key parameters, [134]. ...... 153
3.5 Parameters’ values of AEF with 3 peaks representing measured ESD current from [151, Figure 3]. ...... 154
3.6 Parameters’ values of AEF representing the IEC 61312-1 standard waveshape. ...... 155
3.7 Parameters’ values of AEF with 13 peaks representing measured data for a lightning discharge current from [245]. Local maxima and corresponding times extracted from [69, Figures 6, 7 and 8] are denoted t and I and other parameters correspond to the fitted AEF shown in Figures 3.16 (a), 3.16 (b) and 3.16 (c). ...... 157
3.8 Parameters’ values of AEF with 12 peaks representing measured data for a lightning discharge current derivative from [130]. Chosen peak times are denoted t and I and other parameters correspond to the fitted AEF shown in Figure 3.17. ...... 158

4.1 List of the models of mortality rate previously suggested in literature that are considered in this paper. The references give a source with a more detailed description of the model, not necessarily the original source of the model. ...... 163


4.2 Computed AIC values for the different models fitted to the central mortality rate for men for Switzerland for ten different years. In each column the lowest AIC for that year is marked in bold. ...... 175
4.3 Estimated variance of t found in the way described on page 180. The bold values are the lowest values in each column. ...... 183
4.4 Standard error estimates of forecasted mortality indices. ...... 183

List of Definitions

Definition 1.1 ...... 19
Definition 1.2 ...... 21
Definition 1.3 ...... 26
Definition 1.4 ...... 29
Definition 1.5 ...... 33
Definition 1.6 ...... 35
Definition 1.7 ...... 35
Definition 1.8 ...... 35
Definition 1.9 ...... 35
Definition 1.10 ...... 42
Definition 1.11 ...... 44
Definition 1.12 ...... 44
Definition 1.13 ...... 50
Definition 1.14 ...... 54
Definition 1.15 ...... 54
Definition 1.16 ...... 58
Definition 1.17 (The G-optimality criterion) ...... 58
Definition 1.18 (The D-optimality criterion) ...... 59
Definition 1.19 ...... 65
Definition 1.20 ...... 65

Definition 2.1 ...... 82
Definition 2.2 ...... 107
Definition 2.3 ...... 107
Definition 2.4 ...... 107
Definition 2.5 ...... 112
Definition 2.6 ...... 112

Definition 3.1 ...... 125
Definition 3.2 ...... 126


List of Theorems

Theorem 1.1 (Leibniz formula for determinants) ...... 22
Theorem 1.2 ...... 22
Theorem 1.3 ...... 23
Theorem 1.4 ...... 25
Theorem 1.5 ...... 33
Theorem 1.6 ...... 36
Theorem 1.7 (Kiefer–Wolfowitz equivalence theorem) ...... 59

Theorem 2.1 ...... 90
Theorem 2.2 ...... 91
Theorem 2.3 ...... 92
Theorem 2.4 ...... 96
Theorem 2.5 ...... 105
Theorem 2.6 (Vieta’s formula) ...... 112
Theorem 2.7 (Newton–Girard formulae) ...... 112
Theorem 2.8 ...... 116
Theorem 2.9 ...... 118
Theorem 2.10 ...... 119

Theorem 3.1 ...... 127
Theorem 3.2 ...... 131
Theorem 3.3 ...... 131
Theorem 3.4 ...... 132
Theorem 3.5 ...... 135
Theorem 3.6 ...... 147
Theorem 3.7 ...... 149


List of Lemmas

Lemma 1.1 ...... 34
Lemma 1.2 ...... 34
Lemma 1.3 ...... 36
Lemma 1.4 ...... 36
Lemma 1.5 ...... 43
Lemma 1.6 ...... 43
Lemma 1.7 ...... 51
Lemma 1.8 ...... 51
Lemma 1.9 ...... 54
Lemma 1.10 ...... 58
Lemma 1.11 ...... 65

Lemma 2.1 ...... 82
Lemma 2.2 ...... 84
Lemma 2.3 ...... 87
Lemma 2.4 ...... 87
Lemma 2.5 ...... 90
Lemma 2.6 ...... 91
Lemma 2.7 ...... 95
Lemma 2.8 ...... 103
Lemma 2.9 ...... 110
Lemma 2.10 ...... 113
Lemma 2.11 ...... 113
Lemma 2.12 ...... 114
Lemma 2.13 ...... 120

Lemma 3.1 ...... 128
Lemma 3.2 ...... 129
Lemma 3.3 ...... 129
Lemma 3.4 ...... 130
