Mälardalen University Doctoral Dissertation 293
Karl Lundengård
EXTREME POINTS OF THE VANDERMONDE DETERMINANT AND PHENOMENOLOGICAL MODELLING WITH POWER EXPONENTIAL FUNCTIONS
2019
ISBN 978-91-7485-431-2
ISSN 1651-4238
Address: P.O. Box 883, SE-721 23 Västerås, Sweden
Address: P.O. Box 325, SE-631 05 Eskilstuna, Sweden
E-mail: [email protected]
Web: www.mdh.se
Mälardalen University Press Dissertations No. 293
EXTREME POINTS OF THE VANDERMONDE DETERMINANT AND PHENOMENOLOGICAL MODELLING WITH POWER EXPONENTIAL FUNCTIONS
Karl Lundengård
2019
School of Education, Culture and Communication
Copyright © Karl Lundengård, 2019
ISBN 978-91-7485-431-2
ISSN 1651-4238
Printed by E-Print AB, Stockholm, Sweden
Academic dissertation which, for the degree of Doctor of Philosophy in mathematics/applied mathematics at the School of Education, Culture and Communication, will be publicly defended on Thursday, 26 September 2019, at 13.15 in Delta, Mälardalen University, Västerås.
Faculty opponent: Professor Palle Jorgensen, University of Iowa
Abstract

This thesis discusses two topics: finding the extreme points of the Vandermonde determinant on various surfaces, and phenomenological modelling using power-exponential functions. The two problems are related in that both concern methods for curve fitting. Two applications of the mathematical models and methods are also discussed: modelling of electrostatic discharge currents for use in electromagnetic compatibility, and modelling of mortality rates for humans. Both the construction and the evaluation of models are discussed.

In the first chapter the basic theory for later chapters is introduced. First the Vandermonde matrix, a matrix whose rows (or columns) consist of monomials of sequential powers, its history and some of its properties are discussed. Next, some considerations and typical methods for a common class of curve fitting problems are presented, as well as how to analyse and evaluate the resulting fit. In preparation for the later parts of the thesis the topics of electromagnetic compatibility and mortality rate modelling are briefly introduced.

The second chapter discusses some techniques for finding the extreme points of the determinant of the Vandermonde matrix on various surfaces including spheres, ellipsoids and cylinders. The discussion focuses on low dimensions, but some results are given for arbitrary (finite) dimensions.

In the third chapter a particular model called the p-peaked Analytically Extended Function (AEF) is introduced and fitted to data taken either from a standard for electromagnetic compatibility or from experimental measurements. The discussion here is entirely focused on currents originating from lightning or electrostatic discharges.

The fourth chapter consists of a comparison of several different methods for modelling mortality rates, including a model constructed in a similar way to the AEF found in the third chapter. The models are compared with respect to how well they can be fitted to estimated mortality rates for several countries and several years, and the results of using the fitted models for mortality rate forecasting are also compared.
Acknowledgements
Many thanks to all my coauthors and supervisors. My main supervisor, Professor Sergei Silvestrov, introduced me to the Vandermonde matrix and frequently suggested new problems and research directions throughout my time as a doctoral student. I have learned many lessons about mathematics and academia from him and my co-supervisor Professor Anatoliy Malyarenko. My other co-supervisor Dr. Milica Rančić played a crucial role and she is a role model with regard to conscientiousness, work ethic, communication and patience. I have learned invaluable lessons about interdisciplinary research, communication and time and resource management from her. I also want to thank Dr. Vesna Javor for her regular input that improved the research on electromagnetic compatibility considerably.

Cooperating with other doctoral students was very valuable. Jonas Österberg and Asaph Keikara Muhumuza (with support from his supervisors Dr. John M. Mango and Dr. Godwin Kakuba) made important contributions to the research on the Vandermonde determinant, and Samya Suleiman's understanding of mortality rate forecasting and other aspects of actuarial mathematics was necessary for the work to progress. I am also glad that I had the opportunity to take part in the supervision of talented master students Andromachi Boulougari and Belinda Strass and use the foundations they laid in their degree projects for further research.

Many thanks to all my coworkers at Mälardalen University, especially to Dr. Christopher Engström, Dr. Johan Richter and Docent Linus Carlsson for managing the bachelor's and master's programmes in Engineering mathematics together with me.

Perhaps most importantly, I thank my family for all the support, encouragement and assistance you have given me. A special mention to my sister for help with translating from 18th century French; it is perfectly understandable that you decided to move to the other side of the Earth after that. I will wonder my whole life how my father, whose entire mathematics career consisted of unsuccessfully solving a single problem on the blackboard in 9th grade, would have reacted to this dissertation if he were still with us. Fortunately my mother continues to be an endless source of support and encouragement. I am continually surprised and delighted over how much of her work ethic, sense of quality and unhealthy work habits I seem to have inherited from her.

Without the ideas, requests, remarks, questions, encouragements and patience of those around me this work would not have been completed.

Karl Lundengård
Västerås, September, 2019
Popular science summary (translated from Swedish)

There are many phenomena in the world that it is desirable to describe with a mathematical model. In the best case the model can be derived from suitable fundamental theory, but sometimes this is not possible, either because there is no well-developed theory or because the theory that exists requires information that is not available. In this case a model is needed that, to some degree, agrees with theory and empirical observations but is not derived from the fundamental theory. Such models are called phenomenological models. In this thesis phenomenological models are constructed for two different phenomena, the current in electrostatic discharges and the mortality rate.

Electrostatic discharges occur when charge flows rapidly from one object to another. Well-known examples are lightning strikes or small shocks caused by static electricity. For engineers it is important to be able to describe this type of electric current in order to ensure that electronic systems are not too sensitive to external electromagnetic influence and that they do not disturb other systems when they are used.

The mortality rate describes the probability of death at a certain age. It can be used to estimate the quality of life in a country or for other demographic or insurance-related purposes.

A property of both electrostatic discharges and mortality rates that can be challenging to model is regions where a steep increase is followed by a slow decrease. Such patterns often occur in electrostatic discharges, and in many countries the mortality rate increases sharply at the transition from child to adult and then changes slowly until early middle age.

In this thesis a mathematical function called the power-exponential function is used as a building block to construct phenomenological models of the current in electrostatic discharges and of the mortality rate, based on empirical data for the respective phenomena. For electrostatic discharges a method is proposed that can construct models with different accuracy and complexity. For mortality rates a few simple models are proposed and then compared with previously proposed models.

The thesis also discusses the extreme points of the Vandermonde determinant. This is a mathematical problem that appears in several different areas, but for the thesis the most relevant application is that the extreme points can help in choosing suitable data to use when constructing models with a technique called optimal design. Some general results on how the extreme points can be found on various surfaces, e.g. spheres and cubes, are presented, and examples are given of how the results can be applied.
Popular science summary
There are many phenomena in the world that it is desirable to describe using a mathematical model. Ideally the mathematical model is derived from the appropriate fundamental theory but sometimes this is not feasible, either because the fundamental theory is not well understood or because the theory requires a lot of information to be applicable. In these cases it is necessary to create a model that, to some degree, matches the fundamental theory and the empirical observations, but is not derived from the fundamental theory. Such models are called phenomenological models. In this thesis phenomenological models are constructed for two phenomena, electrostatic discharge currents and mortality rates.

Electrostatic discharge currents are rapid flows of electric charge from one object to another. Well-known examples are lightning strikes or small electric shocks caused by static electricity. Describing such currents is important when engineers want to ensure that electronic systems are not disturbed too much by external electromagnetic disturbances and do not disturb other systems when used.

The mortality rate describes the probability of dying at a certain age. It can be used to assess the quality of life in a country or for other demographical or actuarial purposes.

For electrostatic discharge currents and mortality rates an important feature that can be challenging to model is a steep increase followed by a slower decrease. This pattern is often observed in electrostatic discharge currents, and in many countries the mortality rate increases rapidly in the transition from childhood to adulthood and then changes slowly until the beginning of middle age.

In this thesis a mathematical function called the power-exponential function is used as a building block to construct phenomenological models of electrostatic discharge currents and mortality rates based on empirical data for the respective phenomena. For electrostatic discharge currents a methodology for constructing models with different accuracy and complexity is proposed. For the mortality rates a few simple models are suggested and compared to previously suggested models.

The thesis also discusses the extreme points of the Vandermonde determinant. This is a mathematical problem that appears in many areas, but for this thesis the most relevant application is that it helps in choosing appropriate data to use when constructing a model using a technique called optimal design. Some general results for finding the extreme points of the Vandermonde determinant on various surfaces, e.g. spheres or cubes, and applications of these results are discussed.
Notation
Matrix and vector notation

$\mathbf{v}$, $\mathbf{M}$ - Bold, roman lower- and uppercase letters denote vectors and matrices respectively.
$M_{i,j}$ - Element on the $i$th row and $j$th column of $\mathbf{M}$.
$M_{\cdot,j}$, $M_{i,\cdot}$ - Column (row) vector containing all elements from the $j$th column ($i$th row) of $\mathbf{M}$.
$[a_{ij}]_{ij}^{nm}$ - $n \times m$ matrix with element $a_{ij}$ in the $i$th row and $j$th column.
$V_{nm}$, $V_n = V_{nn}$ - $n \times m$ Vandermonde matrix.
$G_{nm}$, $G_n = G_{nn}$ - $n \times m$ generalized Vandermonde matrix.
$A_{nm}$, $A_n = A_{nn}$ - $n \times m$ alternant matrix.

Standard sets

$\mathbb{Z}$, $\mathbb{N}$, $\mathbb{R}$, $\mathbb{C}$ - Sets of all integers, natural numbers (including 0), real numbers and complex numbers.
$S_p^n$, $S^n = S_2^n$ - The $n$-dimensional sphere defined by the $p$-norm,
\[ S_p^n(r) = \left\{ x \in \mathbb{R}^{n+1} \,\middle|\, \sum_{k=1}^{n+1} |x_k|^p = r \right\}. \]
$C^k[K]$ - All functions on $K$ with continuous $k$th derivative.

Special functions

Definitions can be found in standard texts. The suggested sources use notation consistent with the thesis.

$H_n$, $P_n^{(\alpha,\beta)}$ - Hermite and Jacobi polynomials, see [2].
$\Gamma(x)$, $\gamma(x, y)$, $\psi(x)$ - The Gamma, incomplete Gamma and Digamma functions, see [2].
${}_2F_2(a, b; c; x)$ - The hypergeometric function, see [2].
$G_{p,q}^{m,n}\left( z \,\middle|\, \begin{matrix} a \\ b \end{matrix} \right)$ - The Meijer G-function, see [236].
$\mathrm{Ei}(x)$ - The exponential integral, see [2].
Probability theory and statistics

$\Pr[A]$ - Probability of event $A$.
$\Pr[A|B]$ - Conditional probability of event $A$ given $B$.
$E_X[Y]$ - Expected value of quantity $Y$ with respect to $X$.
$\mathrm{Var}(X)$ - Variance of $X$.
AIC - Akaike Information Criterion, see Definition 1.14.
$\mathrm{AIC}_C$ - Second order correction of the AIC, see Remark 1.9.
$I(f, g)$ - Kullback–Leibler divergence, see Definition 1.15.

Mortality rate

$S_x(\Delta x)$ - Survival function, see Definition 1.19.
$T_x$ - Remaining lifetime for an individual of age $x$.
$\mu(x)$ - Mortality rate at age $x$, see Definition 1.20.
$m_{x,t}$ - Central mortality rate at age $x$, year $t$, see Section 1.6.

Other

$\frac{df}{dx} = f'(x)$ - Derivative of the function $f$ with respect to $x$.
$\frac{d^k f}{dx^k} = f^{(k)}(x)$ - $k$th derivative of the function $f$ with respect to $x$.
$\frac{\partial f}{\partial x} = f'(x)$ - Partial derivative of the function $f$ with respect to $x$.
$a^{\bar{b}}$ - Rising factorial, $a^{\bar{b}} = a(a + 1) \cdots (a + b - 1)$.
Contents
List of Papers

1 Introduction
  1.1 The Vandermonde matrix
    1.1.1 Who was Vandermonde?
    1.1.2 The Vandermonde determinant
    1.1.3 Inverse of the Vandermonde matrix
    1.1.4 The alternant matrix
    1.1.5 The generalized Vandermonde matrix
    1.1.6 The Vandermonde determinant in systems with Coulombian interactions
    1.1.7 The Vandermonde determinant in random matrix theory
  1.2 Curve fitting
    1.2.1 Linear interpolation
    1.2.2 Generalized divided differences and interpolation
    1.2.3 Least squares fitting
    1.2.4 Linear least squares fitting
    1.2.5 Non-linear least squares fitting
    1.2.6 The Marquardt least squares method
  1.3 Analysing how well a curve fits
    1.3.1 Regression
    1.3.2 Quantile-Quantile plots
    1.3.3 The Akaike information criterion
  1.4 D-optimal experiment design
  1.5 Electromagnetic compatibility and electrostatic discharge currents
    1.5.1 Electrostatic discharge modelling
  1.6 Modelling mortality rates
    1.6.1 Lee–Carter method for forecasting
  1.7 Summaries of papers

2 Extreme points of the Vandermonde determinant
  2.1 Extreme points of the Vandermonde determinant and related determinants on various surfaces in three dimensions
    2.1.1 Optimization of the generalized Vandermonde determinant in three dimensions
    2.1.2 Extreme points of the Vandermonde determinant on the three-dimensional unit sphere
    2.1.3 Optimisation using Gröbner bases
    2.1.4 Extreme points on the ellipsoid in three dimensions
    2.1.5 Extreme points on the cylinder in three dimensions
    2.1.6 Optimizing the Vandermonde determinant on a surface defined by a homogeneous polynomial
  2.2 Extreme points of the Vandermonde determinant on the sphere
    2.2.1 The extreme points on the sphere given by roots of a polynomial
    2.2.2 Further visual exploration on the sphere
  2.3 Extreme points of the Vandermonde determinant on some surfaces implicitly defined by a univariate polynomial
    2.3.1 Critical points on surfaces given by a first degree univariate polynomial
    2.3.2 Critical points on surfaces given by a second degree univariate polynomial
    2.3.3 Critical points on the sphere defined by a p-norm
    2.3.4 The case p = 4 and n = 4
    2.3.5 Some results for even n and p
    2.3.6 Some results for cubes and intersections of planes
    2.3.7 Optimising the probability density function of the eigenvalues of the Wishart matrix

3 Approximation of electrostatic discharge currents using the analytically extended function
  3.1 The analytically extended function (AEF)
    3.1.1 The p-peak analytically extended function
  3.2 Approximation of lightning discharge current functions
    3.2.1 Fitting the AEF
    3.2.2 Estimating parameters for underdetermined systems
    3.2.3 Fitting with data points as well as charge flow and specific energy conditions
    3.2.4 Calculating the η-parameters from the β-parameters
    3.2.5 Explicit formulas for a single-peak AEF
    3.2.6 Fitting to lightning discharge currents
  3.3 Approximation of electrostatic discharge currents using the AEF by interpolation on a D-optimal design
    3.3.1 D-optimal approximation for exponents given by a class of arithmetic sequences
    3.3.2 D-optimal interpolation on the rising part
    3.3.3 D-optimal interpolation on the decaying part
    3.3.4 Examples of models from applications and experiments
    3.3.5 Modelling of ESD currents
    3.3.6 Modelling of lightning discharge currents
    3.3.7 Summary of ESD modelling

4 Comparison of models of mortality rate
  4.1 Modelling and forecasting mortality rates
  4.2 Overview of models
  4.3 Power-exponential mortality rate models
    4.3.1 Multiple humps
    4.3.2 Single hump model
    4.3.3 Split power-exponential model
    4.3.4 Adjusted power-exponential model
  4.4 Fitting and comparing models
    4.4.1 Some comments on fitting
    4.4.2 Results and discussion
  4.5 Comparison of parametric models applied to mortality rate forecasting
    4.5.1 Comparison of models
    4.5.2 Results, discussion and further work

References
Index
List of Figures
List of Tables
List of Definitions
List of Theorems
List of Lemmas
List of Papers
Paper A Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Extreme points of the Vandermonde determinant on the sphere and some limits involving the generalized Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.
Paper B Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Optimization of the determinant of the Vandermonde matrix on the sphere and related surfaces. Methodology and Computing in Applied Probability, Volume 20, Issue 4, pages 1417 – 1428, 2018.
Paper C Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Extreme points of the Vandermonde determinant on surfaces implicitly determined by a univariate polynomial. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.
Paper D Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.
Paper E Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Rančić (Eds), Springer International Publishing, pages 151–176, 2016.
Paper F Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Estimation of parameters for the multi-peaked AEF current functions. Methodology and Computing in Applied Probability, Volume 19, Issue 4, pages 1107 – 1121, 2017.
Paper G Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Electrostatic discharge currents representation using the analytically extended function with p peaks by interpolation on a D-optimal design. Facta Universitatis Series: Electronics and Energetics, Volume 32, Issue 1, pages 25 – 49, 2019.
Paper H Karl Lundengård, Milica Rančić and Sergei Silvestrov. Modelling mortality rates using power-exponential functions. Submitted to journal, 2019.
Paper I Andromachi Boulougari, Karl Lundengård, Milica Rančić, Sergei Silvestrov, Belinda Strass and Samya Suleiman. Application of a power-exponential function based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, Volume 5, Issue 1, pages 3 – 10, 2019.
Parts of the thesis have been presented at the following international conferences:
• ASMDA 2015 - 16th Applied Stochastic Models and Data Analysis International Conference with 4th Demographics 2015 Workshop, Piraeus, Greece, June 30 – July 4, 2015.
• SPLITECH 2017 - 2nd International Multidisciplinary Conference on Computer and Energy Science, Split, Croatia, July 12 – 14, 2017.
• EMC+SIPI 2017 - IEEE International Symposium on Electromagnetic Compatibility, Signal and Power Integrity, Washington DC, USA, August 7 – 11, 2017.
• SPAS 2017 - International Conference on Stochastic Processes and Algebraic Structures, Västerås, Sweden, October 4 – 6, 2017.
• SMTDA 2018 - 5th Stochastic Modelling Techniques and Data Analysis International Conference, Chania, Crete, Greece, June 12 – 15, 2018.
• IWAP 2018 - 9th International Workshop on Applied Probability, Budapest, Hungary, June 18–21, 2018.
Summaries of Papers A–I with a brief description of the thesis author's contributions to each paper can be found in Section 1.7.
Chapter 1
Introduction
This chapter is partially based on Papers D, E, H, and I
Paper D Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.
Paper E Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents. Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Rančić (Eds), Springer International Publishing, pages 151–176, 2016.
Paper H Karl Lundengård, Milica Rančić and Sergei Silvestrov. Modelling mortality rates using power-exponential functions. Submitted to journal, 2019.
Paper I Andromachi Boulougari, Karl Lundengård, Milica Rančić, Sergei Silvestrov, Belinda Strass and Samya Suleiman. Application of a power-exponential function based model to mortality rates forecasting. Communications in Statistics: Case Studies, Data Analysis and Applications, Volume 5, Issue 1, pages 3 – 10, 2019.
Two topics are discussed in this thesis: finding the extreme points of the Vandermonde determinant, and phenomenological modelling using power-exponential functions. Several of the methods and approaches that are discussed are also applied to modelling of electrical current for use in electromagnetic compatibility, or to modelling of the mortality rate of humans for actuarial or demographical purposes. The topics are related since the extreme points of the Vandermonde determinant are relevant for certain curve fitting problems that can appear in the construction of the phenomenological models. An overview of the major relations between the different parts of the thesis is given in Figure 1.1. The relations are of many kinds: common definitions and dependent results, conceptual connections, as well as similarities in proof techniques and problem formulations.

This thesis is based on the nine papers listed in the List of Papers. The contents of the papers have been rearranged (and in some cases parts have been omitted) to avoid repetition and improve cohesion, but the original text and structure of the papers have been largely preserved. Significant parts of Chapters 1–3 have also appeared in [180]. If a section is based on a paper this is specified at the beginning of the section, and unless otherwise specified any subsections are from the same source. A section that is based on a paper contains text from the paper that is unchanged except for modifications to correct misprints and ensure consistency within the thesis.

Chapter 1 introduces concepts used in later chapters. The Vandermonde matrix, its history, applications, generalizations and some of its properties are introduced in Section 1.1. Section 1.2 discusses a few different approaches to curve fitting. Section 1.3 discusses a few methods for evaluating the result. Basic optimal design is discussed in Section 1.4. Sections 1.5 and 1.6 introduce electromagnetic compatibility and mortality rate modelling.

Chapter 2 discusses the optimisation of the Vandermonde determinant over various surfaces. First the extreme points on a few different surfaces in three dimensions are examined, see Section 2.1. In Section 2.2 the determinant is optimised on the sphere in higher dimensions, and some results for surfaces defined by a univariate polynomial are discussed in Section 2.3.

Chapter 3 discusses fitting a piecewise non-linear regression model to data. The particular model is introduced in Section 3.1 and a general framework for fitting it to data using the Marquardt least squares method is described in Sections 3.2.1–3.2.5. The framework is then applied to lightning discharge currents in Section 3.2.6. An alternate curve fitting method based on D-optimal interpolation (found analogously to the results in Section 2.2) is described and applied to electrostatic discharge currents in Section 3.3.

Chapter 4 compares several different mathematical models of mortality rate for humans. The comparison is done by fitting the models to central mortality rates from several different countries and then analysing how well the models fit and what happens when the results of the fitting are used for mortality rate forecasting (using the so-called Lee–Carter method).
[Figure 1.1 is a diagram of connected boxes. Box labels, by heading:
Curve fitting: Least squares method (Section 1.2.3); Linear least squares fitting (Section 1.2.4); Non-linear least squares fitting (Section 1.2.5); The Marquardt least squares method (Section 1.2.6); Linear interpolation (Section 1.2.1); D-optimal design (Section 1.4).
Extreme points of the Vandermonde determinant: Vandermonde matrix (Section 1.1); Extreme points on various surfaces in 3D (Section 2.1); Optimization on a sphere (Section 2.2); Optimization on a surface defined by a univariate polynomial (Section 2.3).
Phenomenological modelling with power-exponential functions: Power exponential function; Electromagnetic compatibility (Section 1.5); Evaluation of curve fit (Section 1.3); The Analytically Extended Function (Section 3.1); Mortality rate modelling (Section 1.6); Lightning discharge current modelling (Section 3.2); Mortality rate models fitted to data (Section 4.1); Interpolation on a D-optimal design (Section 3.3); Mortality rate models applied to forecasting (Section 4.5).]

Figure 1.1: Illustration of the most significant connections in the thesis.
1.1 The Vandermonde matrix
The Vandermonde matrix is a well-known matrix with a very special form that appears in many different circumstances. A few examples are polynomial interpolation (see Sections 1.2.1 and 1.2.2), least squares curve fitting (see Section 1.2.3), optimal experiment design (see Section 1.4), construction of error-detecting and error-correcting codes (see [31, 124, 242] as well as more recent work such as [28]), determining if a market with a finite set of traded assets is complete [62], calculation of the discrete Fourier transform [241] and related transforms such as the fractional discrete Fourier transform [215], the quantum Fourier transform [70], and the Vandermonde transform [11, 12], solving systems of differential equations with constant coefficients [213], various problems in mathematical physics [283], nuclear physics [51], and quantum physics [249, 271], systems of Coulombian interactions (see Section 1.1.6), describing properties of the Fisher information matrix of stationary stochastic processes [158], and various places in random matrix theory (see Sections 1.1.7 and 2.3.7).

In this section we will review some of the basic properties of the Vandermonde matrix, starting with its definition.

Definition 1.1. A Vandermonde matrix is an $m \times n$ matrix of the form
\[
V_{mn}(\mathbf{x}_n) = \left[ x_j^{i-1} \right]_{i,j}^{m,n} =
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
x_1 & x_2 & \cdots & x_n \\
\vdots & \vdots & \ddots & \vdots \\
x_1^{m-1} & x_2^{m-1} & \cdots & x_n^{m-1}
\end{bmatrix}
\tag{1}
\]
where $x_i \in \mathbb{C}$, $i = 1, \ldots, n$. If the matrix is square, $n = m$, the notation $V_n = V_{nm}$ will be used.

Remark 1.1. Note that in the literature the term Vandermonde matrix is often used for the transpose of the matrix given above.
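As a minimal sketch of Definition 1.1, the layout in (1) can be built numerically. The example below is written in Python with NumPy, which is an assumption of this illustration and not something used in the thesis; the helper name `vandermonde` and the sample nodes are hypothetical. NumPy's `np.vander` places the powers of each node along a row, that is, it follows the transposed convention mentioned in Remark 1.1, so the result is transposed to match (1).

```python
import numpy as np


def vandermonde(nodes, m):
    """Return the matrix with m rows whose entry in row i, column j
    is nodes[j]**i for i = 0, ..., m - 1 (zero-based indices)."""
    nodes = np.asarray(nodes)
    # np.vander(..., increasing=True) has rows [1, x, x^2, ...], one row per node,
    # so the transpose gives one column per node, as in display (1).
    return np.vander(nodes, N=m, increasing=True).T


# Hypothetical sample nodes x_1, x_2, x_3.
V = vandermonde([2.0, 3.0, 5.0], m=3)
print(V)
```

With these nodes the rows of `V` hold the zeroth, first and second powers of 2, 3 and 5. Passing a value of `m` different from the number of nodes gives the rectangular case of the definition.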
1.1.1 Who was Vandermonde? The matrix is named after Alexandre Th´eophileVandermonde (1735–1796) who had a varied career that began with law studies and some success as a concert violinist, transitioned into work in science and mathematics in the beginning of the 1770s that gradually turned into administrative and leadership positions at various Parisian institutions as well as work in politics and economics in the end of the 1780s [86]. His entire mathematical career consisted of four published papers, first presented to the French Academy of Sciences in 1770 and 1771 and published a few years later. The first paper, M´emoire sur la r´esolutiondes ´equations [279], discusses some properties of the roots of polynomial equations, more specifically for- mulas for the sum of the roots and a sum of symmetric functions of the pow-
19 22
Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions ers of the roots. This paper has been mentioned as important since it con- tains some of the fundamental ideas of group theory (see for instance [168]), but generally this work is overshadowed by the works of the contempo- rary Joseph Louis Lagrange (1736–1813) [166]. He also notices the equality a2b + b2c + ac2 − a2c − ab2 − bc2 = (a − b)(a − c)(b − c), which is a special case of the formula for the determinant of the Vandermonde matrix, but this connection is not discussed in the paper. The second paper, Remarques sur des probl`emesde situation [280], dis- cusses the problem of the knight’s tour (what sequence of moves allows a knight to visit all squares on a chessboard exactly once). This paper is con- sidered the first mathematical paper that uses the basic ideas of what is now called knot theory [237]. The third paper, M´emoire sur des irrationnelles de diff´erents ordres avec une application au cercle [281], is a paper on combinatorics and the most well-known result from the paper is the Chu–Vandermonde identity, n k n−k n X Y r + 1 − j Y s + 1 − j Y r + s + 1 − j = , j j j k=1 j=1 j=1 j=1 where r, s ∈ R and n ∈ Z. The identity was first found by Chu Shih-Chieh ca 1260 – ca 1320, traditional chinese: 朱世傑 in 1303 in The precious mirror of the four elements 四元玉 and was rediscovered (apparently independently) by Vandermonde [8, 223]. In the fourth paper M´emoire sur l’´elimination [282] Vandermonde dis- cusses some ideas for what we today call determinants, which are functions that can tell us if a linear equation system has a unique solution or not. The paper predates the modern definitions of determinants but Vander- monde discusses a general method for solving linear equation systems using alternating functions, which has strong relation to determinants. 
He also notices that exchanging exponents for indices in a class of expressions from his first paper will give a class of expressions that he discusses in his fourth paper [300]. This relation is mirrored in the relationship between the determinant of the Vandermonde matrix and the determinant of a general matrix described in Theorem 1.3.

While Vandermonde's papers can be said to contain many important ideas, they do not bring any of them to maturity, and he is therefore usually considered a minor scientist and mathematician compared to well-known contemporary mathematicians such as Étienne Bézout (1730–1783) and Pierre-Simon de Laplace (1749–1827), or scientists such as the chemist Antoine Lavoisier (1743–1794), with whom he worked for some time after his mathematical career.

The Vandermonde matrix does not appear in any of Vandermonde's published works, which is not surprising considering that the modern matrix concept did not really take shape until almost a hundred years later in the works of Sylvester and Cayley [43, 268]. It is therefore strange that the Vandermonde matrix was named after him. A thorough discussion of this can be found in [300], but a possible reason is that the simple formula for the determinant that Vandermonde briefly discusses in his fourth paper can be generalized to a Vandermonde matrix of any size. One of the main reasons that the Vandermonde matrix has become known is that it has an exceptionally simple expression for its determinant, which in turn has a surprisingly fundamental relation to the determinant of a general matrix. We will take a closer look at the determinant of the Vandermonde matrix and related matrices several times in this thesis, so the next section introduces it and some of its properties.
1.1.2 The Vandermonde determinant

Often it is not the Vandermonde matrix itself that is useful; instead it is the multivariate polynomial given by its determinant that is examined and used. The determinant of the Vandermonde matrix is usually called the Vandermonde determinant (or Vandermonde polynomial or Vandermondian [283]) and can be written using an exceptionally simple formula. But before we discuss the Vandermonde determinant we will discuss the general determinant.

Definition 1.2. The determinant is a function from square matrices over a field $\mathbb{F}$ to the field $\mathbb{F}$, $\det : M_{n \times n}(\mathbb{F}) \to \mathbb{F}$, such that if we consider the determinant as a function of the columns of the matrix, $\det(M) = \det(M_{\cdot,1}, M_{\cdot,2}, \ldots, M_{\cdot,n})$, the determinant must have the following properties:
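• The determinant must be multilinear,
\[ \det(M_{\cdot,1}, \ldots, a M_{\cdot,k} + b N_{\cdot,k}, \ldots, M_{\cdot,n}) = a \det(M_{\cdot,1}, \ldots, M_{\cdot,k}, \ldots, M_{\cdot,n}) + b \det(M_{\cdot,1}, \ldots, N_{\cdot,k}, \ldots, M_{\cdot,n}). \]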
• The determinant must be alternating, that is, if $M_{\cdot,i} = M_{\cdot,j}$ for some $i \neq j$ then $\det(M) = 0$.
• If I is the identity matrix then det(I) = 1.
Remark 1.2. Defining the multilinear and alternating properties from the rows of the matrix will give the same determinant. The name of the alternating property comes from the fact that, combined with multilinearity, it implies that switching places between two columns changes the sign of the determinant. This definition of the determinant is quite abstract but it is sufficient to define a unique function.
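The three properties in Definition 1.2, and the sign change mentioned in Remark 1.2, can be checked on a small example. The sketch below is only an illustration: the helper names `det` and `with_column`, the cofactor expansion used to evaluate the determinant, and the example matrix are assumptions made here, not part of the text above.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 0:
        return Fraction(1)
    total = Fraction(0)
    for j in range(n):
        # Minor: remove row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def with_column(m, k, col):
    """Copy of m with column k replaced by the entries of col."""
    return [row[:k] + [c] + row[k + 1:] for row, c in zip(m, col)]

# Hypothetical example data with exact rational entries.
M = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
M_col = [row[1] for row in M]                    # column k = 1 of M
N_col = [Fraction(5), Fraction(-1), Fraction(2)]
a, b = Fraction(3), Fraction(-2)

# Multilinearity in column k = 1.
mixed = [a * u + b * v for u, v in zip(M_col, N_col)]
multilinear_ok = det(with_column(M, 1, mixed)) == (
    a * det(M) + b * det(with_column(M, 1, N_col))
)

# Alternating: copying column 0 into column 2 gives two equal columns.
alternating_ok = det(with_column(M, 2, [row[0] for row in M])) == 0

# Normalisation: det(I) = 1.
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
identity_ok = det(identity) == 1

# Consequence noted in Remark 1.2: swapping two columns flips the sign.
swapped = [[row[1], row[0], row[2]] for row in M]
swap_ok = det(swapped) == -det(M)
```

Exact fractions are used so that the equalities hold without rounding tolerances.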
Theorem 1.1 (Leibniz formula for determinants). A standard result from linear algebra says that the determinant is unique and that it is given by the following formula
\[ \det(M) = \sum_{\sigma \in S_n} (-1)^{I(\sigma)} \prod_{i=1}^{n} m_{i,\sigma(i)} \tag{2} \]
where $S_n$ is the set of all permutations of the set $\{1, 2, \ldots, n\}$, that is all lists that contain the numbers $1, 2, \ldots, n$ exactly once, and if $\sigma$ is a permutation then $\sigma(i)$ is the $i$th element of that permutation.

Remark 1.3. Often formula (2) is used immediately as the definition of the determinant of a matrix, see for instance [9]. The formula is usually attributed to Gottfried Wilhelm Leibniz (1646–1716), probably due to a letter that he wrote to Guillaume de l'Hôpital (1661–1704) in 1693 where he describes a method of solving linear equation systems that is closely related to Cramer's rule [218]. The letter was published in [173] and a translation can be found in [263].

The determinant has several uses and interpretations, for example:
• If $\det(M) \neq 0$ then the vectors corresponding to the columns (or rows) are linearly independent. Compare this to the properties of the Wronskian matrix described on page 27.
• If the columns (or rows) of $M$ are interpreted as sides defining an $n$-dimensional parallelepiped, the absolute value of $\det(M)$ gives the volume of this parallelepiped. Compare this to the interpretation of D-optimality on page 58.
The sign of the determinant is also important when considering the orientation of the surface, which is highly relevant in geometric algebra and integration over several variables, see [123, 246] for examples in geometric algebra, physics and analysis.

We will now discuss the Vandermonde determinant specifically.
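Formula (2) translates almost directly into code. The following is a minimal sketch under one assumption that should be stated plainly: $I(\sigma)$ is read here as the number of inversions of $\sigma$, which yields the usual sign of a permutation. The function names and example matrices are chosen for illustration only, and indices are 0-based.

```python
from itertools import permutations
from math import prod

def inversions(sigma):
    """Number of pairs (i, j) with i < j and sigma[i] > sigma[j]."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])

def leibniz_det(m):
    """Determinant as the signed sum over all permutations, as in formula (2)."""
    n = len(m)
    return sum(
        (-1) ** inversions(sigma) * prod(m[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

# Columns (1,0,0), (1,2,0), (1,1,3) span a parallelepiped; the absolute value
# of the determinant is its volume.
P = [[1, 1, 1],
     [0, 2, 1],
     [0, 0, 3]]
volume = abs(leibniz_det(P))

# A 2x2 example: ad - bc.
Q = [[1, 2],
     [3, 4]]
```

The sum has $n!$ terms, so this is only practical for very small matrices; it is meant to mirror the formula, not to be an efficient algorithm.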
Theorem 1.2. The Vandermonde determinant, $v_n(x_1, \ldots, x_n)$, is given by
\[ v_n(x_1, \ldots, x_n) = \det(V_n(x_1, \ldots, x_n)) = \prod_{1 \leq i < j \leq n} (x_j - x_i). \]

Proof of Theorem 1.2. There are many versions of this proof, see for example [18, 36, 42, 126], with focus on different aspects of the proof. Here we will provide a fairly concise version that still makes all the steps of the proof clear. We start by only considering one of the variables $x_k$, which gives a single variable function $v_n(x_k)$. From the general expression for the determinant, expression (2), it is clear that $v_n(x_k)$ must be a polynomial of degree at most $n - 1$ in $x_k$. We also know that if we let $x_k = x_i$ for any $1 \leq i \leq n$, $i \neq k$, the determinant will be equal to zero since the corresponding matrix will have two identical columns. Thus $v_n(x_i) = 0$ and we can write
\[ v_n(x_k) = P(x_k) \prod_{\substack{i=1 \\ i \neq k}}^{n} (x_k - x_i) \]
where $P(x_k)$ is a polynomial. If we repeat this argument for all the variables, and ensure that no roots appear twice in the factorization, we get
\begin{align*}
v_n(x_1, \ldots, x_n) &= P_n(x_1, \ldots, x_n) \prod_{i=1}^{n-1} (x_n - x_i) \\
&= P_{n-1}(x_1, \ldots, x_n) \prod_{i=1}^{n-2} (x_{n-1} - x_i) \prod_{i=1}^{n-1} (x_n - x_i) \\
&= P_0(x_1, \ldots, x_n)(x_2 - x_1)(x_3 - x_2)(x_3 - x_1) \cdots \prod_{i=1}^{n-1} (x_n - x_i)
\end{align*}
and since this factorization has each $x_k$ appear as a root $n - 1$ times we can conclude that
\[ v_n(x_1, \ldots, x_n) = \det(V_n(x_1, \ldots, x_n)) = C \prod_{1 \leq i < j \leq n} (x_j - x_i) \]
where $C$ is a constant. Comparing the coefficient of the term $x_2 x_3^2 \cdots x_n^{n-1}$ on both sides gives $C = 1$.

Theorem 1.3. There is a relationship between the exponents of the expanded Vandermonde determinant and the indices in the expression for a general determinant, more specifically
\[ \left( \prod_{i=1}^{n} x_i \right) v_n(x_1, \ldots, x_n) = \left( \prod_{i=1}^{n} x_i \right) \prod_{1 \leq i < j \leq n} (x_j - x_i) \tag{3} \]
and exchanging exponents for indices in the expanded right-hand side, $x_i^j \to x_{i,j}$, gives the expression for the determinant of a general matrix.

Proof. We will prove this theorem by showing that replacing exponents with indices will give a function that by Definition 1.2 is a determinant.
In Definition 1.2 we interpreted the determinant as a function of the columns of the matrix. For the Vandermonde determinant this corresponds to a function of the $x_i$ since they define the columns. Here we will interpret each part of Definition 1.2 as a statement about the $x_i$ and then show how it is implied by the Vandermonde determinant.

• Alternating: The alternating property is easy to interpret in terms of the $x_i$, since if $x_i = x_j$ for some $i \neq j$ then we have two identical columns. Consider the product form of the Vandermonde determinant given in Theorem 1.2. Switching places between $x_i$ and $x_j$ with $i < j$ in the Vandermonde determinant is equal to switching sign in all factors that contain $x_i$ or $x_j$ together with some $x_k$, $i \leq k \leq j$. There are $j - i - 1$ factors that contain $x_i$ and some $x_k$ with $i < k < j$, $j - i - 1$ factors that contain $x_j$ and some $x_k$ with $i < k < j$, and the single factor $(x_j - x_i)$. In total the sign changes in $2(j - i) - 1$ factors, an odd number, which means the sign of the whole product changes.

• Multilinearity: If we denote the left hand side in (3) with $w$,
\[ w = \left( \prod_{k=1}^{n} x_k \right) v_n(x_1, \ldots, x_n), \]
then multiplying the $k$th column by a scalar can be interpreted as
\[ M_{\cdot,k} \to a M_{\cdot,k} \quad \Leftrightarrow \quad w = \sum_{i=1}^{n} x_k^i c_i \to \sum_{i=1}^{n} a x_k^i c_i \]
and addition of columns as
\[ M_{\cdot,k} \to M_{\cdot,k} + N_{\cdot,k} \quad \Leftrightarrow \quad w = \sum_{i=1}^{n} x_k^i c_i \to \sum_{i=1}^{n} (x_k^i + y_k^i) c_i \]
and multilinearity follows immediately from this.

• $\det(I) = 1$: For the identity matrix we have
\[ x_{i,j} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \]
which for the expanded Vandermonde determinant corresponds to the transformation
\[ x_i^j \to \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases} \]
When expanding the Vandermonde determinant we get
\begin{align*}
v_n(x_1, \ldots, x_n) &= v_{n-1}(x_1, \ldots, x_{n-1}) \prod_{i=1}^{n-1} (x_n - x_i) \\
&= x_n^{n-1} v_{n-1}(x_1, \ldots, x_{n-1}) + P(n) \\
&= x_n^{n-1} v_{n-2}(x_1, \ldots, x_{n-2}) \prod_{i=1}^{n-2} (x_{n-1} - x_i) + P(n) \\
&= x_n^{n-1} x_{n-1}^{n-2} v_{n-2}(x_1, \ldots, x_{n-2}) + P(n, n-1) \\
&= \prod_{k=1}^{n} x_k^{k-1} + P(n, n-1, \ldots, 1)
\end{align*}
where $P(I)$, $I \subset \mathbb{Z}_{>0}$, does not contain any terms of the form $x_k^{k-1}$ for all $k \in I$.
Thus applying the transformation corresponding to the identity matrix we get
\[ \left( \prod_{i=1}^{n} x_i \right) v_n(x_1, \ldots, x_n) = \prod_{k=1}^{n} x_k^k + \left( \prod_{k=1}^{n} x_k \right) P(n, \ldots, 1) \to 1 + 0 = 1. \]
Thus if we take the right hand side in equation (3) and exchange exponents for indices we get a determinant by Definition 1.2, and since the determinant is unique by Theorem 1.1 and $x_{i,j} = x_i^j$ in the Vandermonde matrix this must be equal to
\[ \left( \prod_{i=1}^{n} x_i \right) v_n(x_1, \ldots, x_n) = \sum_{\sigma \in S_n} (-1)^{I(\sigma)} \prod_{i=1}^{n} x_{\sigma(i)}^i. \]

1.1.3 Inverse of the Vandermonde matrix

The inverse of the Vandermonde matrix has been known for a long time, especially since the solution to a Lagrange interpolation problem (see Section 1.2.1) gives the inverse indirectly. Here we will only give a short overview of the work on expressing the inverse as an explicit matrix. An explicit expression for the inverse matrix has been known since at least the end of the 1950s, see [199].

Theorem 1.4. The elements of the inverse of an $n$-dimensional Vandermonde matrix $V_n$ can be calculated by
\[ \left( V_n^{-1} \right)_{ij} = \frac{(-1)^{j-1} \sigma_{n-j,i}}{\prod_{\substack{k=1 \\ k \neq i}}^{n} (x_k - x_i)} \tag{4} \]
where $\sigma_{j,i}$ is the $j$:th elementary symmetric polynomial with variable $x_i$ set to zero,
\[ \sigma_{j,i} = \sum_{1 \leq m_1 < m_2 < \ldots < m_j \leq n} \prod_{k=1}^{j} x_{m_k} (1 - \delta_{m_k,i}), \qquad \delta_{a,b} = \begin{cases} 1, & a = b \\ 0, & a \neq b. \end{cases} \tag{5} \]
We will not give the proof of this theorem here, but the general outline of a proof will be given in Section 1.2.1.

In the literature there are many cases where the inverse is instead written as a product of several simpler matrices, usually triangular or diagonal [214, 225, 226, 277]. There is also a lot of literature that takes a more algorithmic approach and tries to find fast ways of computing the elements. Classical examples include the Parker–Traub algorithm [274] and the Björck–Pereyra algorithm [23], and more recent results can be found in [84].
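Formula (4) can be tried out on a small example. The sketch below assumes the convention that row $j$ of the Vandermonde matrix holds the powers $x_k^{j-1}$, so that columns correspond to the points $x_k$. The function names and test points are hypothetical, and exact fractions are used so the product with the original matrix can be compared with the identity without rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def vandermonde(xs):
    """V[j][k] = xs[k]**j, i.e. row j holds the j-th powers (assumed convention)."""
    return [[Fraction(x) ** j for x in xs] for j in range(len(xs))]

def elem_sym(values, degree):
    """Elementary symmetric polynomial of the given degree in the given values."""
    return sum((prod(c) for c in combinations(values, degree)), Fraction(0))

def vandermonde_inverse(xs):
    """Inverse from formula (4); code indices i, j are 0-based."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    inverse = []
    for i in range(n):
        # Setting x_i to zero in sigma is the same as leaving x_i out.
        others = xs[:i] + xs[i + 1:]
        denominator = prod(xk - xs[i] for xk in others)
        # With 1-based column j+1: sign (-1)^j and sigma_{n-(j+1), i}.
        row = [(-1) ** j * elem_sym(others, n - 1 - j) / denominator for j in range(n)]
        inverse.append(row)
    return inverse

def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

xs = [1, 2, 4, 7]                      # distinct points, chosen arbitrarily
product = matmul(vandermonde_inverse(xs), vandermonde(xs))
identity = [[Fraction(int(i == j)) for j in range(len(xs))] for i in range(len(xs))]
```

Each row of the inverse contains the coefficients of one Lagrange basis polynomial, which is the connection to interpolation mentioned in the text.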
1.1.4 The alternant matrix

Many generalizations of the Vandermonde matrix have been proposed and studied in the literature. An early generalization is the alternant matrix, which is a matrix that exchanges the powers in the Vandermonde matrix with other functions [219].

Definition 1.3. An alternant matrix is a matrix of the form
\[ A_{mn}(f_m; x_n) = \left[ f_i(x_j) \right]_{i,j}^{m,n} = \begin{pmatrix} f_1(x_1) & f_1(x_2) & \cdots & f_1(x_n) \\ f_2(x_1) & f_2(x_2) & \cdots & f_2(x_n) \\ \vdots & \vdots & \ddots & \vdots \\ f_m(x_1) & f_m(x_2) & \cdots & f_m(x_n) \end{pmatrix} \tag{6} \]
where $f_i : \mathbb{F} \to \mathbb{F}$ and $\mathbb{F}$ is a field. If the matrix is square, $n = m$, the notation $A_n = A_{nm}$ will be used.

Remark 1.4. Sometimes the alternant matrix is used as an alternative name for the Vandermonde matrix or the Vandermonde matrix multiplied by a diagonal matrix [276].

There are several special cases of alternant matrices that are useful or interesting in various mathematical fields:

Interpolation and curve fitting

Just like the Vandermonde matrix can be used for polynomial interpolation, the alternant matrix can be used to describe interpolation with other sets of functions, see Sections 1.2.1 and 1.2.2, as well as approximate curve fitting, for example using the least squares method described in Section 1.2.3.

Alternant codes

As mentioned on page 19 there are several different error-detecting and error-correcting codes that can be described using the Vandermonde matrix. These and some related codes can also be categorized as alternant codes, a term introduced in [121]. For a survey on these codes see [295].

Jacobian matrix

One of the most well-known examples of an alternant matrix is the Jacobian matrix. Let $f : \mathbb{F}^n \to \mathbb{F}^n$ be a vector-valued function that is $n$ times differentiable with respect to each variable, then the Jacobian matrix is the matrix $J$ given by
\[ J = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix} \]
where $y = f(x)$.
The most common application of the Jacobian matrix is to use its determinant to describe how volume elements are deformed when changing variables in multivariate calculus [246]. The applications and generalizations that follow from this alone are too numerous to list, so here we will only note that it holds a central role in many methods for multivariate optimization, such as the Marquardt least squares method described in Section 1.2.6.

Wronskian matrix

If $f_n = (f_1, \ldots, f_n)$, $f_i = \frac{d^{i-1}}{dx^{i-1}}$, and $g_n = (g_1, \ldots, g_n)$, $g_i \in C^{n-1}[\mathbb{C}]$, then the alternant matrix $A_n(f_n; g_n)$ will be the Wronskian matrix. The Wronskian matrix has a long history [125] and is commonly used to test if a set of functions is linearly independent as well as for finding solutions to ordinary differential equations [101]. If the determinant of the Wronskian matrix is non-zero then the functions are linearly independent, see [27, 32], but proving linear dependence requires further conditions, see [25, 26, 230, 231, 293].

A classical application of the Wronskian is confirming that a set of solutions to a linear differential equation is linearly independent or, if $n - 1$ linearly independent solutions are known, constructing the remaining linearly independent solution using Abel's identity (for $n = 2$) or a generalisation of it [34].

If $L_i$ is a linear partial differential operator of order $i$, then the alternant matrix $A_n(L_n; g_n)$, where $L_n = (L_1, \ldots, L_n)$, is the generalized Wronskian matrix [227], which has been used in for example diophantine geometry [82, 244] and for solving Korteweg-de Vries equations, see [197] and the references therein. The generalized Wronskian matrix has similar properties with respect to the linear dependence of the functions it is created from as the standard Wronskian [294].
Both the Wronskian and the generalized Wronskian are also useful in algebraic geometry, see [101] for several examples.

Bell matrix

Alternant matrices can be used to convert function composition into matrix multiplication. By letting $D_i = \frac{d^{i-1}}{dx^{i-1}}$ and $g_j(x) = (f(x))^j$, where $f$ is infinitely differentiable, the alternant matrix $B[f] = A_n(D_n, g_n)$ is called a Bell matrix (its transpose is known as the Carleman matrix). Some authors, for instance [159], refer to Bell matrices as Jabotinsky matrices due to a special case of Bell matrices considered in [137].

That Bell matrices convert function composition into matrix multiplication can be seen by noting that the power series expansion of the $j$th power of $f$ can be written as
\[ (f(x))^j = \sum_{i=1}^{\infty} B[f]_{ij} x^i \]
and from this equality it follows that $B[f \circ g] = B[g] B[f]$. This is the basic property behind a popular technique called Carleman linearisation or Carleman embedding that has seen wide use in the theory of non-linear dynamical systems. The literature on the subject is vast but a systematic introduction is offered in [165].

Moore matrix

When working in a finite field with prime characteristic $p$, an analogue of the Vandermonde and Wronskian matrix can be constructed by taking an alternant matrix where the rows are given by powers of the Frobenius automorphism, $F(\omega) = \omega^p$. This matrix is called the Moore matrix and is named after its originator E. H. Moore, who also calculated its determinant,
\[ \begin{vmatrix} \omega_1 & \cdots & \omega_n \\ \omega_1^p & \cdots & \omega_n^p \\ \vdots & \ddots & \vdots \\ \omega_1^{p^{n-1}} & \cdots & \omega_n^{p^{n-1}} \end{vmatrix} = \prod_{i=1}^{n} \prod_{k_{i-1}=0}^{p-1} \cdots \prod_{k_1=0}^{p-1} (\omega_i + k_{i-1} \omega_{i-1} + \ldots + k_1 \omega_1) \pmod{p}, \]
and showed that if this determinant is not equal to zero then $\omega_1, \ldots, \omega_n$ are linearly independent [211]. There are several uses for the determinant of the Moore matrix in function field arithmetic, see for instance [113]. A classical example is finding the modular invariants of the general linear group over a finite field [72, 224].
The determinant also plays an important role in the theory of Drinfeld modules [221].

1.1.5 The generalized Vandermonde matrix

There are several types of matrices (or determinants) that have been referred to as generalized Vandermonde matrices. For example, the confluent Vandermonde matrix is sometimes referred to as the generalized Vandermonde matrix [149, 150, 175, 194, 265]; this matrix and its role in interpolation problems is briefly described on page 40. Other examples include modified versions of confluent Vandermonde matrices [91], matrices with elements given by multivariate monomials of increasing multidegree [39], or similarly over the algebraic closure of a field [61], and matrices with elements given by multivariate polynomials with univariate terms [283].

In this thesis we call the alternant matrix $A_{mn}(x^{\alpha_1}, \ldots, x^{\alpha_m}; x_1, \ldots, x_n)$ the generalized Vandermonde matrix.

Definition 1.4. A generalized Vandermonde matrix is an $m \times n$ matrix of the form
\[ G_{mn}(x_n) = \left[ x_j^{\alpha_i} \right]_{i,j}^{m,n} = \begin{pmatrix} x_1^{\alpha_1} & x_2^{\alpha_1} & \cdots & x_n^{\alpha_1} \\ x_1^{\alpha_2} & x_2^{\alpha_2} & \cdots & x_n^{\alpha_2} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{\alpha_m} & x_2^{\alpha_m} & \cdots & x_n^{\alpha_m} \end{pmatrix} \tag{7} \]
where $x_i \in \mathbb{C}$, $\alpha_i \in \mathbb{C}$, $i = 1, \ldots, n$. If the matrix is square, $n = m$, the notation $G_n = G_{nm}$ will be used.

This name has been used for quite some time, see [120] for instance. The main reason to study this matrix seems to be its connection to Schur polynomials, see below, and thus the research on the matrix is primarily focused on its determinant. Many of the results are algorithmic in nature [47, 66–68, 157] but there are also more algebraic examinations [85, 97, 250, 296].

There are several examples where the determinant of a generalized Vandermonde matrix is interesting or useful.

Schur polynomials

Given an integer partition $\lambda = (\lambda_1, \ldots, \lambda_n)$, that is $0 < \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ and each $\lambda_i \in \mathbb{N}$, we can define
\[ a_{(\lambda_1 + n - 1, \lambda_2 + n - 2, \ldots, \lambda_n)}(x_1, \ldots, x_n) = \det(G_n(\lambda_1 + n - 1, \lambda_2 + n - 2, \ldots, \lambda_n; x_1, \ldots, x_n)). \]
Note that $a_{(\lambda_1 + n - 1, \lambda_2 + n - 2, \ldots, \lambda_n)}(x_1, \ldots, x_n)$ is a polynomial that always has $a_{(n-1, n-2, \ldots, 0)}(x_1, \ldots, x_n) = v_n(x_1, \ldots, x_n)$ as a factor. The polynomials given by expressions of the form
\[ s_\lambda(x_1, \ldots, x_n) = \frac{a_{(\lambda_1 + n - 1, \lambda_2 + n - 2, \ldots, \lambda_n)}(x_1, \ldots, x_n)}{a_{(n-1, n-2, \ldots, 0)}(x_1, \ldots, x_n)} \]
are called the Schur functions or Schur polynomials and were introduced by Cauchy [42] but named after Issai Schur (1875–1941), who showed that they were highly useful in invariant theory and representation theory. For instance they can be used to determine the character of conjugacy classes of representations of the symmetric group [98]. They have also been used in other areas, for instance to describe the generating function of many classes of plane partitions, see for instance [36] for several examples. The literature on Schur polynomials is vast and so are the applications, so there will be no attempt to summarise them here.

Integration of an exponential function over a unitary group

If we let $U(n)$ be the $n$-dimensional unitary group and $dU$ a Haar measure normalised to 1, then the Harish-Chandra–Itzykson–Zuber integral formula [116, 136] says that if $A$ and $B$ are Hermitian matrices with eigenvalues $\lambda_1(A) \leq \ldots \leq \lambda_n(A)$ and $\lambda_1(B) \leq \ldots \leq \lambda_n(B)$ then
\[ \int_{U(n)} e^{t \operatorname{tr}(A U B U^*)} \, dU = \frac{\det \left[ \exp(t \lambda_j(A) \lambda_k(B)) \right]_{j,k}^{n,n}}{t^{\frac{n(n-1)}{2}} v_n(\lambda(A)) v_n(\lambda(B))} \prod_{i=1}^{n-1} i! \tag{8} \]
where $v_n$ is the determinant of the Vandermonde matrix. If $t = 1$ and $A$ and $B$ are chosen as diagonal matrices
\[ A_{ij} = \begin{cases} a_i & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \qquad B_{ij} = \begin{cases} b_i & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \]
then formula (8) reduces to an expression involving determinants of a generalized Vandermonde matrix and two Vandermonde matrices,
\[ \int_{U(n)} e^{\operatorname{tr}(A U B U^*)} \, dU = \frac{\begin{vmatrix} e^{a_1 b_1} & e^{a_1 b_2} & \cdots & e^{a_1 b_n} \\ e^{a_2 b_1} & e^{a_2 b_2} & \cdots & e^{a_2 b_n} \\ \vdots & \vdots & \ddots & \vdots \\ e^{a_n b_1} & e^{a_n b_2} & \cdots & e^{a_n b_n} \end{vmatrix}}{v_n(a_1, \ldots, a_n) v_n(b_1, \ldots, b_n)} \prod_{i=1}^{n-1} i!. \]

1.1.6 The Vandermonde determinant in systems with Coulombian interactions

Several interesting mathematical problems that feature Vandermonde matrices and Vandermonde determinants can be described as questions about systems with Coulombian interactions. The name Coulombian interaction comes from Charles-Augustin Coulomb (1736–1806), who is probably most well-known for quantifying the force between two charged particles (what is today known as Coulomb's law) in 1785 [59]. Coulomb's law states that the force between two charged particles is proportional to the product of the charges and the inverse of the square of the distance between the two charges. In mathematics and mathematical physics, Coulombian interactions usually refer to a system described by an energy given by
\[ H_N(x_1, \ldots, x_N) = \frac{1}{2} \sum_{i \neq j} g(x_i - x_j) + N \sum_{i=1}^{N} V(x_i) \tag{9} \]
where the interaction kernel, $g(x)$, can take a few different forms (more on this later) and $V(x)$ is an external potential that can behave in many different ways. The points $x_i$ usually belong to $\mathbb{R}^d$ (or some subset thereof) but there is also research that involves more general manifolds. A common goal is to minimize this energy or find some other extreme points. There are many areas where this kind of problem, or closely related problems, appear. See the extended version of [255] for a recent review of the field.

In this section we will mention a few examples of interesting systems with Coulombian interactions that are connected to the Vandermonde determinant and the properties of the Vandermonde determinant we discuss in this thesis.

Fekete points

In Section 1.2.1 interpolation of a finite number of points using a polynomial will be discussed. When a function is approximated by a polynomial using interpolation, the approximation error depends on the chosen interpolation points.
The Fekete points are a set of points that provide an almost optimal choice of interpolation points [248] and they are given by maximizing the Vandermonde determinant. Taking the logarithm of the expression for the Vandermonde determinant given in Theorem 1.2 gives
\[ \log(v_n(x_1, \ldots, x_n)) = \sum_{1 \leq i < j \leq n} \log(x_j - x_i) \]
and thus $-\frac{1}{2} \log(v_n(x_1, \ldots, x_n))$ gives the same as setting $g(x) = \log(x)$ and $V(x) \equiv 0$ in (9). Finding the Fekete points is also of interest in complexity theory and would help with finding an appropriate starting polynomial for a homotopy algorithm for realizing the Fundamental Theorem of Algebra [258, 262]. In Chapter 2 we will discuss how to find the maximum points of the Vandermonde determinant for certain special cases. A common generalisation of the Fekete points is the case where multivariate polynomials are used, see for example [30, 37, 203]. The case where points in $\mathbb{C}^d$ are interpolated has also been examined; an example of a recent significant result is [20] and a review can be found in [24].

Distribution of electrical charges

The most classical example of a system with Coulombian interactions is a system of charged particles confined to some volume, even if it was not studied (from a mathematical point of view) until almost a hundred years after Coulomb's law was introduced [119, 267]. The classical mathematical formulation of this problem considers $p + 1$ charges fixed at points $a_0, \ldots, a_p \in \mathbb{C}$ with weights $\eta_0, \ldots, \eta_p$ and $n$ moveable charges $x_1, \ldots, x_n$. The question is then what $x$-values give the extreme points of $L(x_1, \ldots, x_n)$ given by
\[ L(x_1, \ldots, x_n) = \sum_{k=1}^{n} \sum_{j=0}^{p} \eta_j \log \frac{1}{|a_j - x_k|} + \sum_{1 \leq i < k \leq n} \log \frac{1}{|x_k - x_i|}. \]
More background on this type of problem together with a collection of recent results can be found in [73].
If there are no fixed charges, the problem becomes equivalent to maximising the absolute value of the Vandermonde determinant, similar to finding the Fekete points. The problems discussed in Chapter 2 belong to the class of equations that are called Schrödinger-like in [73].

Sphere packing

There are several different interaction kernels apart from the logarithmic interaction kernel, $g(x) = -\log(x)$, that are interesting in mathematical physics, especially statistical mechanics and quantum mechanics. One important class of interaction kernels are those given by $g(x) = \frac{1}{|x|^s}$ where $s$ is a positive integer. When this interaction kernel is used, the value given by formula (9) is called the Riesz $s$-energy. There is a large body of significant literature; in [255] over 30 references are listed as introductions to different related problems.

It is worth noting that
\[ \lim_{s \to 0} \frac{1}{s} \left( \frac{1}{|x|^s} - 1 \right) = -\log(|x|), \]
which connects minimising the Riesz $s$-energy to the Fekete points. If we instead let $s \to \infty$, the problem of minimising the Riesz $s$-energy formally corresponds to the optimal sphere-packing problem, that is finding the arrangement of non-overlapping identical spheres that cover as much of a space as possible. This is a classical problem where extensive effort has gone into finding optimal packings, but for many years the problem was only fully solved in one, two and three dimensions, until recently when surprisingly simple proofs were found for 8 and 24 dimensions (seemingly without giving any results for any number of dimensions in-between). For a thorough collection of classical results see [58] and for the recent results see [52–54, 284].

Coulomb gas

In mathematical physics a system of particles whose energy can be described by (9) is often called a Coulomb gas [93, 207, 255].
One of the most wide-reaching results in the analysis of Coulomb gases was that many gas systems can be described using random matrices that belong to a so-called β-ensemble, which is defined by matrices with random elements. The foundational results were found in the early 1960s and applied to the cases where β = 1, β = 2 and β = 4 [78–81]. These cases will be briefly discussed in Section 1.1.7, which describes where the Vandermonde determinant appears in the probability density functions for the eigenvalues of the random matrices. If the same theory is extended to other values of β it can also be connected to equations similar to the Harish-Chandra–Itzykson–Zuber integral formula described on page 30 [93].

1.1.7 The Vandermonde determinant in random matrix theory

This section is based on Sections 1, 3 and 4 in Paper D.

Random matrix theory is a large research area with many applications, primarily in quantum mechanics and statistical mechanics [93, 207, 255] but also in wireless communication and finance [13], and random matrices appear as an important tool for analysing and evaluating algorithms in numerical linear algebra [83].

One class of random matrices that has been analysed extensively is the so-called β-ensembles; for a brief motivation see the section on Coulomb gas above. Here we will define some well-known β-ensembles and describe where the Vandermonde determinant appears in their probability density functions.

Definition 1.5. Let $X = (X_1, \cdots, X_n)$, where $X_i \sim N(\mu_i, \Sigma)$ and $X_i$ is independent of $X_j$, where $i \neq j$. The matrix $W : p \times p$ is said to be Wishart distributed [292] if and only if $W = X X^\top$ for some matrix $X$ in a family of Gaussian matrices $G_{m \times n}$, $m \leq n$, that is, $X \sim N_{m,n}(\mu, \Sigma, I)$ where $\Sigma \geq 0$.

Next we will look at the expression for the probability density distribution of the eigenvalues of a Wishart distributed matrix taken from [7].

Theorem 1.5.
If $X$ is distributed as $N(\mu, \Sigma)$, then the probability density distribution of the eigenvalues of $X X^\top$, denoted $\lambda = (\lambda_1, \ldots, \lambda_m)$, is given by:
\[ P(\lambda) = \frac{\pi^{-\frac{1}{2} n} \det(\Sigma)^{-\frac{1}{2} n} \det(D)^{\frac{1}{2}(n - p - 1)}}{2^{\frac{1}{2} n p} \, \Gamma_p\!\left( \frac{1}{2} n \right) \Gamma_p\!\left( \frac{1}{2} p \right)} \prod_{i < j} (\lambda_i - \lambda_j) \exp\left( -\frac{1}{2} \operatorname{Tr}(\Sigma^{-1} D) \right) \]

It will prove useful that the formula given in Theorem 1.5 contains the term $\prod_{i < j} (\lambda_i - \lambda_j)$, which we recognize from Theorem 1.2 as the determinant of a Vandermonde matrix.

Lemma 1.1. Let $P$ be a polynomial and $A$ be a symmetric $n \times n$ matrix. If the eigenvalues of $A$, $\{\lambda_k, k = 1, \ldots, n\}$, are all distinct then
\[ \sum_{k=1}^{n} P(\lambda_k) = \operatorname{Tr}(P(A)). \]

Proof. By definition, for any eigenvalue $\lambda$ and eigenvector $v$ we must have $A v = \lambda v$ and thus
\[ P(A) v = \left( \sum_{k=0}^{m} c_k A^k \right) v = \sum_{k=0}^{m} c_k (A^k v) = \sum_{k=0}^{m} c_k \lambda^k v \]
and thus $P(\lambda)$ is an eigenvalue of $P(A)$. For any matrix $A$, the sum of the eigenvalues is equal to the trace of the matrix,
\[ \sum_{k=1}^{n} \lambda_k = \operatorname{Tr}(A), \]
when multiplicities are taken into account. For the matrices considered in Lemma 1.1 all eigenvalues are distinct. Thus applying this property to the matrix $P(A)$ gives the desired statement.

Lemma 1.2. A Wishart distributed matrix $W$ as defined in Definition 1.5 will be a symmetric $p \times p$ matrix.

Proof. From the definition $W$ is a $p \times p$ matrix such that $W = X X^\top$. Then
\[ W^\top = (X X^\top)^\top = (X^\top)^\top X^\top = X X^\top = W \]
and thus $W$ is symmetric.

The Gaussian Orthogonal Ensembles (GOE), the Gaussian Unitary Ensembles (GUE), the Gaussian Symplectic Ensembles (GSE) and the Wishart Ensembles (WE) are well-known classical ensembles. More detailed discussions on these ensembles can be found in [6, 7, 75, 163, 207, 220, 292]; here we will only give their definitions and look at how the Vandermonde determinant appears in the probability density function for their eigenvalues.

Definition 1.6. The Gaussian Orthogonal Ensemble (GOE) is characterised by a symmetric matrix $X$ with real elements.
The diagonal entries of $X$ are independent and identically distributed (i.i.d.) with a standard normal distribution $N(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_1(0, 1/2)$. That is, a random matrix $X$ gives a GOE if it is symmetric and real-valued ($X_{ij} = X_{ji}$) and has
\[ X_{ij} = \begin{cases} \sqrt{2} \, \xi_{ii} \sim N_1(0, 1), & \text{if } i = j \\ \xi_{ij} \sim N_1(0, 1/2), & i < j. \end{cases} \tag{10} \]

Definition 1.7. The Gaussian Unitary Ensemble (GUE) is characterised by Hermitian (that is $H^{\top *} = H$ where $\top *$ denotes the conjugate transpose) complex-valued matrices $H$. The diagonal entries of $H$ are independent and identically distributed (i.i.d.) with a standard normal distribution $N(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_2(0, 1/2)$. In other words, a random matrix $H$ belongs to the GUE if it is complex-valued, Hermitian, and the entries satisfy
\[ H_{ij} = \begin{cases} \sqrt{2} \, \xi_{ii} \sim N_2(0, 1), & \text{if } i = j \\ \frac{1}{\sqrt{2}} (\xi_{ij} + i \eta_{ij}) \sim N_2(0, 1/2), & i < j, \end{cases} \tag{11} \]
where $i$ is the imaginary unit.

Definition 1.8. The Gaussian Symplectic Ensemble (GSE) is characterised by a matrix, $S$, with quaternion elements that is self-dual (that is $S^{\top *} = S$ where $\top *$ denotes the conjugate transpose of a quaternion). The diagonal entries of $S$ are independent and identically distributed (i.i.d.) with a standard normal distribution $N(0, 1)$ while the off-diagonal entries are i.i.d. with a normal distribution $N_4(0, 1/2)$. In other words, a random matrix $S$ belongs to the GSE if it is quaternion-valued, self-dual, and the entries satisfy
\[ S_{ij} = \begin{cases} \sqrt{2} \, \xi_{ii} \sim N_2(0, 1), & \text{if } i = j \\ \frac{1}{\sqrt{2}} (\xi_{ij} + i \alpha_{ij} + j \beta_{ij} + k \gamma_{ij}) \sim N_4(0, 1/2), & i < j, \end{cases} \tag{12} \]
where $i$, $j$ and $k$ are the fundamental quaternion units.

Definition 1.9. The Wishart Ensembles (WE), $W_\beta(m, n)$, $m \geq n$, are characterised by the symmetric, Hermitian or self-dual matrix $W = W_\beta(N, N)$ obtained as $W = A A^\top$, $W = H H^\top$, or $W = S S^\top$ where $\top$ represents the appropriate transpose as given in the definition of the GOE, GUE and GSE respectively.
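As an illustration of Definition 1.6, the sketch below draws a real symmetric matrix with the stated entry distributions. It rests on assumptions that are not fixed by the text: the second parameter of $N(0, 1/2)$ is read as a variance, the prose description (diagonal $N(0, 1)$, off-diagonal $N(0, 1/2)$) is followed rather than any particular scaling of equation (10), and the helper names and the seed are hypothetical. For the $2 \times 2$ case the eigenvalues are available in closed form, which also allows a check of Lemma 1.1 with $P(x) = x^2$.

```python
import math
import random

def sample_goe(n, rng):
    """Real symmetric matrix: diagonal ~ N(0, 1), off-diagonal ~ N(0, 1/2) (variance assumed)."""
    x = [[0.0] * n for _ in range(n)]
    for i in range(n):
        x[i][i] = rng.gauss(0.0, 1.0)
        for j in range(i + 1, n):
            # Standard deviation sqrt(1/2) gives variance 1/2; mirror to keep symmetry.
            x[i][j] = x[j][i] = rng.gauss(0.0, math.sqrt(0.5))
    return x

def eigenvalues_2x2(x):
    """Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = x
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

rng = random.Random(0)                 # fixed seed only for reproducibility
X = sample_goe(2, rng)
lam1, lam2 = eigenvalues_2x2(X)

# Lemma 1.1 with P(x) = x^2: the sum of squared eigenvalues equals Tr(X^2).
trace_of_square = X[0][0] ** 2 + 2 * X[0][1] ** 2 + X[1][1] ** 2
sum_of_squared_eigenvalues = lam1 ** 2 + lam2 ** 2
```

For larger matrices a numerical eigenvalue routine would be needed; the closed form is used here only to keep the sketch free of external dependencies.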
To obtain the joint eigenvalue densities for random matrices, we apply the principle of matrix factorization. For instance, if the random matrix $X$ is expressed as $X = Q\Lambda Q^\top$, then $\Lambda$ directly gives the eigenvalues of $X$ [138]. Applying the Jacobian technique for joint density transformation, see for example [7], this yields the joint densities of eigenvalues and eigenvectors.

Lemma 1.3. The three Gaussian ensembles have a joint eigenvalue probability density function given by

$$\text{Gaussian:}\quad P_\beta(\lambda) = C_N^{\beta} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \exp\left(-\frac{1}{2}\sum_{i=1}^{N} \lambda_i^2\right) \tag{13}$$

where

$$C_N^{\beta} = (2\pi)^{-N/2} \prod_{j=1}^{N} \frac{\Gamma(1 + \beta/2)}{\Gamma(1 + j\beta/2)}.$$

Lemma 1.4. The Wishart ensembles have a joint eigenvalue probability density distribution given by

$$\text{Wishart:}\quad P_\beta(\lambda) = C_N^{\beta,\alpha} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta} \prod_{i} \lambda_i^{\alpha - p} \exp\left(-\frac{1}{2}\sum_{i=1}^{N} \lambda_i^2\right) \tag{14}$$

where $\alpha = \frac{\beta}{2} m$ and $p = 1 + \frac{\beta}{2}(N - 1)$. The $\beta$ parameter is decided by what type of elements are in the Wishart matrix: real-valued elements correspond to $\beta = 1$, complex-valued elements correspond to $\beta = 2$ and quaternion elements correspond to $\beta = 4$. The normalizing constant $C_N^{\beta,\alpha}$ is given by

$$C_N^{\beta,\alpha} = 2^{-N\alpha} \prod_{j=1}^{N} \frac{\Gamma(1 + \beta/2)}{\Gamma(1 + j\beta/2)\, \Gamma\left(\alpha - \frac{\beta}{2}(n - j)\right)}. \tag{15}$$

More information on Lemmas 1.3 and 1.4 can be found in standard texts on random matrix theory, see for example [138, 207, 220].

Thus the joint eigenvalue probability density distribution for all the ensembles can be summarized in the following theorem (for more detail see for example [83, 163, 207]).

Theorem 1.6. Suppose that $X$ belongs to one of the ensembles given by Definitions 1.6–1.9. Then the distribution of eigenvalues of $X_N$ is given by

$$P_X(x_1, \cdots, x_N) = \bar{C}_N^{\beta} \prod_{i<j} |x_i - x_j|^{\beta} \exp\left(-\frac{\beta}{4}\sum_{i} x_i^2\right) \tag{16}$$

where $\bar{C}_N^{(\beta)}$ are normalizing constants that can be computed explicitly and $\beta$ is determined by the elements of $X$ as in Lemma 1.4.
From (16) it should be noted that the properties of a probability density function, that is,

$$0 \leq P(x) \leq 1 \quad \text{and} \quad \int_{\mathbb{R}^N} P(x) \prod_{i=1}^{N} \mathrm{d}x_i = 1,$$

do hold as verified in [207]. We also notice that the term $\prod_{i<j} |x_i - x_j|^{\beta}$ in (16) is the absolute value of the Vandermonde determinant raised to the power $\beta$.

1.2 Curve fitting

The process of constructing a mathematical curve so that it has the best possible fit to some series of data points is usually referred to as curve fitting. Exactly what fit means and what constraints are put on the constructed curve varies depending on context. In this section we will discuss a few different scenarios and methods that are related to the Vandermonde matrix and the methods used in later chapters to construct phenomenological mathematical models.

In Sections 1.2.1–1.2.2 we will give an introduction to a few different interpolation methods, that is, methods that give a curve that passes exactly through a finite set of points. If we cannot make a curve that passes through the points exactly we will need to choose how to measure the distance between the curve and the points in order to determine what curve fits the data points best. In Sections 1.2.3–1.2.6 the so-called least squares approach to this kind of problem is presented.

1.2.1 Linear interpolation

The problem of finding a function that generates a given set of points is usually referred to as an interpolation problem and the function generating the points is called an interpolating function. A common type of interpolation problem is to find a continuous function, $f$, such that the given set of points $\{(x_1, y_1), (x_2, y_2), \ldots\}$ can be generated by calculating the set $\{(x_1, f(x_1)), (x_2, f(x_2)), \ldots\}$. Often the interpolating function is also a linear combination of elementary functions, but interpolation can also be done in other ways, for instance with fractals (the classical texts on this are [15,16]) or parametrised curves. For some examples, see Figure 1.2.
Figure 1.2: Some examples of different interpolating curves. The set of red points are interpolated by a polynomial (left), a self-affine fractal (middle) and a Lissajous curve (right).

When the interpolating function is a linear combination of other functions and the interpolation is achieved by changing the coefficients of the linear combination, this is said to be a linear model (not to be confused with linear interpolation, that is, interpolation with piecewise straight lines).

For linear models the interpolation problem can be described using alternant matrices. Suppose we want to find a function

$$f(x) = \sum_{i=1}^{m} a_i g_i(x) \tag{17}$$

that fits as well as possible to the data points $(x_i, y_i)$, $i = 1, \ldots, n$. We then get an interpolation problem described by the linear equation system $Xa = y$ where $a$ are the coefficients of $f$, $y$ are the data values and $X$ is the appropriate alternant matrix,

$$X = \begin{bmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1)\\ g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2)\\ \vdots & \vdots & \ddots & \vdots\\ g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{bmatrix}, \quad a = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix}, \quad y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}.$$

Polynomial interpolation

A classical form of interpolation is polynomial interpolation where $n$ data points are interpolated by a polynomial of degree at most $n - 1$. The Vandermonde matrix can be used to describe this type of interpolation problem simply by rewriting the equation system given by $p(x_k) = y_k$, $k = 1, \ldots, n$ as a matrix equation

$$\begin{bmatrix} 1 & x_1 & \cdots & x_1^{n-1}\\ 1 & x_2 & \cdots & x_2^{n-1}\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_n & \cdots & x_n^{n-1} \end{bmatrix} \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}.$$

That the polynomial is unique (if it exists) is easy to see when considering the determinant of the Vandermonde matrix

$$\det(V_n(x_1, \ldots, x_n)) = \prod_{1 \leq i < j \leq n} (x_j - x_i).$$

Clearly this determinant is non-zero whenever all $x_i$ are distinct, which means that the matrix is invertible whenever all $x_i$ are distinct.
If not all $x_i$ are distinct there is no function of the $x$ coordinate that can interpolate all the points.

There are several ways to construct the interpolating polynomial without explicitly inverting the Vandermonde matrix. The most straightforward is probably Lagrange interpolation, named after Joseph-Louis Lagrange (1736–1813) [167] who independently discovered it a few years after Edward Waring (1736–1798) [288]. The idea behind Lagrange interpolation is simple: construct a set of $n$ polynomials $\{p_1, p_2, \ldots, p_n\}$ such that

$$p_i(x_j) = \begin{cases} 0, & i \neq j,\\ 1, & i = j, \end{cases}$$

and then construct the final interpolating polynomial as the sum of these $p_i$ weighted by the corresponding $y_i$.

The $p_i$ polynomials are called Lagrange basis polynomials and can easily be constructed by placing the roots appropriately and then normalizing the result such that $p_i(x_i) = 1$, which gives the expression

$$p_i(x) = \frac{(x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_n)}{(x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_n)}.$$

The explicit formula for the full interpolating polynomial is

$$p(x) = \sum_{k=1}^{n} y_k \frac{(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}$$

and from this formula the expression for the inverse of the Vandermonde matrix can be found by noting that the $j$th row of the inverse will consist of the coefficients of $p_j$. The resulting expression for the elements is given in Theorem 1.4.

Polynomial interpolation is mostly used when the data set we wish to interpolate is small. The main reason for this is the instability of the interpolation method. One example of this is Runge's phenomenon: when certain functions are approximated by polynomial interpolation fitted to equidistantly sampled points, the approximation will sometimes lose precision when the number of interpolating points is increased, see Figure 1.4 for an example.
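The Lagrange construction and Runge's phenomenon described above can be sketched numerically as follows. The test function $1/(1 + 25x^2)$ on $[-1, 1]$ is the classical textbook example and is an assumption here; it is not necessarily the function shown in Figure 1.4. The names `lagrange_basis`, `lagrange_interpolate` and `runge` are illustrative.

```python
import numpy as np

def lagrange_basis(nodes, i, x):
    """Evaluate the i-th Lagrange basis polynomial p_i at the points x."""
    value = np.ones_like(x, dtype=float)
    for j, xj in enumerate(nodes):
        if j != i:
            # One factor (x - x_j) / (x_i - x_j) per remaining node.
            value = value * (x - xj) / (nodes[i] - xj)
    return value

def lagrange_interpolate(nodes, values, x):
    """Evaluate p(x) = sum_k y_k p_k(x)."""
    x = np.asarray(x, dtype=float)
    return sum(y * lagrange_basis(nodes, k, x) for k, y in enumerate(values))

def runge(x):
    """Classical test function for Runge's phenomenon (assumed example)."""
    return 1.0 / (1.0 + 25.0 * x**2)

grid = np.linspace(-1.0, 1.0, 2001)
max_errors = {}
for n in (7, 14, 19):                       # node counts as in Figure 1.4
    nodes = np.linspace(-1.0, 1.0, n)       # equidistant sample points
    approx = lagrange_interpolate(nodes, runge(nodes), grid)
    max_errors[n] = np.max(np.abs(approx - runge(grid)))
print(max_errors)
```

For this particular function the largest error occurs near the ends of the interval and is larger with 19 equidistant nodes than with 7, which is the behaviour the text refers to.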
Figure 1.3: Illustration of Lagrange interpolation of 4 data points. The red dots are the data set and $p(x) = \sum_{k=1}^{4} y_k p_k(x)$ is the interpolating polynomial.

One way to predict this instability of polynomial interpolation is that the condition number of the Vandermonde matrix can be very large for equidistant points [108].

There are different ways to mitigate the issue of stability, for example choosing data points that minimize the condition number of the relevant matrix [106, 108] or choosing a polynomial basis that is more stable for the given set of data points, such as Bernstein polynomials in the case of equidistant points [222]. Other polynomial schemes can also be considered, for instance interpolating with different basis functions in different intervals, for example using polynomial splines.

Naturally another choice is to choose, instead of polynomials, basis functions that are more suitable to the problem at hand. For an example of this see Section 3.3.

While the instability of polynomial interpolation does not prevent it from being useful for analytical examinations, it is generally considered impractical when there is noise present or when calculations are performed with limited precision. Often interpolating polynomials are not constructed by inverting the Vandermonde matrix or calculating the Lagrange basis polynomials; instead a more computationally efficient method such as Newton interpolation or Neville's algorithm is used [235]. There are some variants of Lagrange interpolation, such as barycentric Lagrange interpolation, that have good computational performance [21].

In applications where the data is noisy it is often suitable to use least squares fitting, which is discussed in Section 1.2.3, instead of interpolation.
Figure 1.4: Illustration of Runge's phenomenon. Here we attempt to approximate a function (dashed line) by polynomial interpolation (solid line). With 7 equidistant sample points (left figure) the approximation is poor near the edges of the interval and increasing the number of sample points to 14 (center) and 19 (right) clearly reduces accuracy at the edges further.

Finally we will discuss an interesting and important (but for the rest of the thesis irrelevant) form of polynomial interpolation called Hermite interpolation, where it is not only required that $p(x_k) = y_k$ but the derivatives up to a certain order (sometimes allowed to vary per point) are also given. This requires a higher degree polynomial that can be found by solving the equation system

$$\begin{aligned} p(x_k) &= y_{k0},\\ p'(x_k) &= y_{k1},\\ &\ \ \vdots\\ p^{(i)}(x_k) &= y_{ki} \end{aligned}$$

for all $k = 1, 2, \ldots, n$ where $k_i$ are integers that define the order of the derivative that needs to match at the point given by $x_k$. When this equation system is written as a matrix equation the resulting matrix, $C$, will have dimension $m \times m$ with $m = \sum_{i=1}^{n} k_i$ and rows given by

$$C_{a,b} = \begin{cases} 0, & b \leq k_j,\\ \dfrac{(b-1)!}{(b-c-1)!}\, x_k^{b-c-1}, & b > k_j, \end{cases} \quad \text{with } c = a - \sum_{i=1}^{j} k_i \text{ and } c < a \leq c + k_{j+1}.$$

The matrix $C$ is called a confluent Vandermonde matrix and has been studied extensively since Hermite interpolation is important both for numerical and analytical purposes. For example the confluent Vandermonde matrix also has a very elegant formula for the determinant [3]

$$\det(C) = \prod_{1 \leq i < j \leq n} (x_j - x_i)^{(k_i+1)(k_j+1)}.$$

There are also many results related to its inverse and numerical properties; classical examples are [104, 105, 107] and some further examples are mentioned on page 29, but this is a vanishingly small part of the total literature on the subject.
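To make the confluent Vandermonde matrix concrete, the following sketch treats the smallest non-trivial case: two points where both the value and the first derivative are prescribed, so that a cubic is needed. It builds the $4 \times 4$ matrix directly (rather than from the general index formula), checks that its determinant equals $(x_2 - x_1)^4$, which matches the exponent $(k_i+1)(k_j+1)$ with first-order derivatives at both points, and solves for the cubic. The helper name, the sample points and the test function $f(x) = x^3 - 2x$ are illustrative assumptions.

```python
import numpy as np

def hermite_matrix_two_points(x1, x2):
    """Confluent Vandermonde matrix for value and first derivative
    at two points, acting on coefficients of 1, x, x^2, x^3."""
    rows = []
    for x in (x1, x2):
        rows.append([1.0, x, x**2, x**3])            # row for p(x)
        rows.append([0.0, 1.0, 2.0 * x, 3.0 * x**2])  # row for p'(x)
    return np.array(rows)

x1, x2 = 0.5, 2.0
C = hermite_matrix_two_points(x1, x2)
det_C = np.linalg.det(C)

# Values and slopes of the assumed test function f(x) = x^3 - 2x.
rhs = np.array([x1**3 - 2 * x1, 3 * x1**2 - 2,
                x2**3 - 2 * x2, 3 * x2**2 - 2])
coeffs = np.linalg.solve(C, rhs)   # coefficients of 1, x, x^2, x^3
print(det_C, coeffs)
```

Since the test function is itself a cubic, the Hermite interpolant reproduces it and the solved coefficients are those of $x^3 - 2x$ up to rounding.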
1.2.2 Generalized divided differences and interpolation

In Section 1.2.1 we saw how the coefficients of an interpolating polynomial could be computed by inverting the Vandermonde matrix or using the Lagrange basis polynomials. Another method for computing the coefficients of the polynomial is based on a computation called divided differences.

Definition 1.10. Let $x_0, \ldots, x_n$ be given points. Then the divided differences operator that acts on a function $f(x)$ is defined as

$$[x_0, \ldots, x_n]f(x) = \begin{cases} f(x_0), & n = 0,\\ \dfrac{[x_1, \ldots, x_n]f(x) - [x_0, \ldots, x_{n-1}]f(x)}{x_n - x_0}, & n > 0. \end{cases}$$

The reason that the divided difference operator is interesting in polynomial interpolation is that if we apply it to two distinct points, $x_0$ and $x_1$, and a function $f(x)$ then the result is the slope of a line that passes through the two points $(x_0, f(x_0))$ and $(x_1, f(x_1))$,

$$[x_0, x_1]f(x) = \frac{[x_1]f(x) - [x_0]f(x)}{x_1 - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$

A line that passes through the two points can then be constructed like this:

$$p(x) = f(x_0) + (x - x_0)[x_0, x_1]f(x).$$

It can similarly be shown that a polynomial that interpolates a set of points $(x_0, f(x_0)), \ldots, (x_n, f(x_n))$ can be written

$$\begin{aligned} p(x) = {} & f(x_0) + (x - x_0)[x_0, x_1]f(x) + (x - x_0)(x - x_1)[x_0, x_1, x_2]f(x)\\ & + \ldots + (x - x_0) \cdots (x - x_{n-1})[x_0, \ldots, x_n]f(x). \end{aligned}$$

This method for interpolation is usually referred to as Newton interpolation and is probably the most well-known application of divided differences. In some literature, e.g. [65], this property is even used as a definition for divided differences.

Since we expect to find the same polynomial whether we use the Lagrange interpolation method described in Section 1.2.1 or the Newton interpolation method described above, we also expect there to be some relation between the divided difference operator and the Vandermonde determinant. It turns out there is a fairly simple relation, see [253] for details.

Lemma 1.5.
The divided difference operator defined in Definition 1.10 can also be written as

$$[x_0, \ldots, x_n]f(x) = \frac{\begin{vmatrix} 1 & x_0 & \cdots & x_0^{n-1} & f(x_0)\\ 1 & x_1 & \cdots & x_1^{n-1} & f(x_1)\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 1 & x_n & \cdots & x_n^{n-1} & f(x_n) \end{vmatrix}}{v_n(x_0, \ldots, x_n)}, \tag{18}$$

where $v_n(x_0, \ldots, x_n)$ denotes the Vandermonde determinant.

Remark 1.5. Sometimes, see for example [253], the relation in Lemma 1.5 is used as the definition of the divided difference operator.

The divided differences operator can also be used to describe the error that one gets when a function is approximated by interpolating with a polynomial. The following lemma is from [156].

Lemma 1.6. Let $p(x)$ be a polynomial of degree smaller than or equal to $n$ that interpolates the points $\{(x_i, f(x_i)), i = 0, \ldots, n\}$. For any $x \neq x_i$, $i = 0, \ldots, n$ the error $f(x) - p(x)$ is given by

$$f(x) - p(x) = [x_0, \ldots, x_n, x]f(x) \prod_{i=0}^{n} (x - x_i).$$

Combining Lemma 1.5 (applied to the $n + 2$ points $x_0, \ldots, x_n, x$) and Lemma 1.6 gives

$$f(x) - p(x) = \frac{\begin{vmatrix} 1 & x_0 & \cdots & x_0^{n} & f(x_0)\\ 1 & x_1 & \cdots & x_1^{n} & f(x_1)\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 1 & x_n & \cdots & x_n^{n} & f(x_n)\\ 1 & x & \cdots & x^{n} & f(x) \end{vmatrix}}{v_{n+1}(x_0, \ldots, x_n, x)} \prod_{i=0}^{n} (x - x_i),$$

which gives some insight into why the value of the Vandermonde determinant is important when choosing interpolation points.

Another popular application of the divided differences operator is the construction of so-called B-splines, piecewise polynomial functions that allow for very efficient storage and computation of a variety of shapes. The concept of (mathematical) splines first appeared in the 1940s [251,252] and B-splines were developed in the 1960s and 1970s [22,63,64]. We can define a B-spline using the divided differences as follows.

Definition 1.11. Given a sequence $\cdots \leq t_{-1} \leq t_0 \leq t_1 \leq t_2 \leq \cdots$ we can define the $k$th B-spline of order $m$ as

$$B_{k,m}(x) = \begin{cases} (-1)^m [t_k, \ldots, t_{k+m}]g_k(x, t), & t_k \leq x < t_{k+1},\\ 0, & \text{otherwise}, \end{cases}$$

where

$$g_k(x, t) = \begin{cases} (x - t)^{k-1}, & x \geq t,\\ 0, & \text{otherwise}, \end{cases}$$
and the divided difference operator acts with respect to $t$.

Remark 1.6. There are several different ways to define B-splines; above we followed the definition in [253]. In modern literature it is more common that B-splines and their computation are described from the perspective of so-called blossoms [99,239,240] rather than the divided difference description.

B-splines can be used for many things, for example approximation theory [204], geometric modelling [99] and wavelet construction [49]. We will not discuss their use further in this thesis.

If we want to do linear interpolation and use a set of basis functions other than the monomials, as in (17), then we need to define a generalized version of the divided difference operator.

Definition 1.12. Given a set of $n$ linearly independent functions, $G = \{g_i\}$, and $n$ values, $x_1, \ldots, x_n$, the generalized divided differences operator that acts on a function $f(x)$ is defined as

$$[x_1, \ldots, x_n]_G f(x) = \frac{\begin{vmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_{n-1}(x_1) & f(x_1)\\ g_1(x_2) & g_2(x_2) & \cdots & g_{n-1}(x_2) & f(x_2)\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ g_1(x_n) & g_2(x_n) & \cdots & g_{n-1}(x_n) & f(x_n) \end{vmatrix}}{\begin{vmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_n(x_1)\\ g_1(x_2) & g_2(x_2) & \cdots & g_n(x_2)\\ \vdots & \vdots & \ddots & \vdots\\ g_1(x_n) & g_2(x_n) & \cdots & g_n(x_n) \end{vmatrix}}.$$

Remark 1.7. We mentioned previously that the divided difference operator can be used to construct B-splines. Using the generalized divided difference operator, similar tools can be constructed using other sets of functions than polynomials as a basis, see for example [196].

1.2.3 Least squares fitting

If it is not necessary to exactly reproduce the series of data points, a commonly applied alternative to interpolation is least squares fitting. A least squares fitting of a mathematical model to a set of data points $\{(x_i, y_i), i = 1, \ldots, n\}$ is the choice of parameters of the model, here denoted $\beta$, such that the sum of the squares of the residuals

$$S(\beta) = \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2$$

is minimized.
This choice is appropriate if the data series is affected by independent and normally distributed noise, see Section 1.3.1.

The most widespread form of least squares fitting is linear least squares fitting where, analogously to linear interpolation, the function $f(\beta; x)$ depends linearly on $\beta$. This case has a unique solution that is simple to find. It is commonly known as the least squares method and we describe it in detail in the next section.

With a non-linear $f(\beta; x)$ it is usually much more difficult to find the least squares fitting and often numerical methods are used, e.g. the Marquardt least squares method described in Section 1.2.6.

In Section 3.2 we present a scheme for approximating electrostatic discharges to ensure electromagnetic compatibility (see Section 1.5) that uses both the least squares method and the Marquardt least squares method. In Chapter 4 we fit several models to estimated mortality rates using non-linear least squares fitting and compare the results in various ways described in Sections 1.3.1–1.3.3.

1.2.4 Linear least squares fitting

Suppose we want to find a function

$$f(x) = \sum_{i=1}^{m} \beta_i g_i(x) \tag{19}$$

that fits as well as possible in the least squares sense to the data points $(x_i, y_i)$, $i = 1, \ldots, n$, $n > m$. We then get a curve fitting problem described by the linear equation system $A\beta = y$ where $\beta$ are the coefficients of $f$, $y$ is the vector of data values and $A$ is the appropriate alternant matrix,

$$A = \begin{bmatrix} g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1)\\ g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2)\\ \vdots & \vdots & \ddots & \vdots\\ g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_m \end{bmatrix}, \quad y = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}.$$

This is an overdetermined version of the linear interpolation problem described in Section 1.2.1.

How can we actually find the coefficients that minimize the sum of the squares of the residuals?
First we can define the square of the length of the residual vector, $e = A\beta - y$, as a function

$$S(e) = e^\top e = \sum_{i=1}^{n} |e_i|^2 = (A\beta - y)^\top (A\beta - y).$$

This kind of function is a positive second degree polynomial with no mixed terms and thus has a global minimum where $\frac{\partial S}{\partial e_i} = 0$ for all $1 \leq i \leq n$. We can find the global minimum by looking at the derivative of the function. Each $e_i$ is determined by $\beta$ and

$$\frac{\partial e_i}{\partial \beta_j} = A_{i,j},$$

thus

$$\frac{\partial S}{\partial \beta_j} = \sum_{i=1}^{n} 2 e_i \frac{\partial e_i}{\partial \beta_j} = \sum_{i=1}^{n} 2 (A_{i,\cdot}\,\beta - y_i) A_{i,j} = 0 \quad \Leftrightarrow \quad A^\top A \beta = A^\top y.$$

This gives

$$A^\top A \beta = A^\top y \quad \Leftrightarrow \quad \beta = (A^\top A)^{-1} A^\top y$$

and by the Gauss–Markov theorem ([102,103,201], see for instance [208] for a more modern description), if $(A^\top A)^{-1}$ exists then (19) with these coefficients gives the linear, unbiased estimator that has the lowest variance possible for any linear, unbiased estimator. The matrix given by $(A^\top A)^{-1} A^\top$ is sometimes referred to as the Moore–Penrose pseudoinverse of $A$.

Clearly a linear curve fitting model with $g_i(x) = x^{i-1}$ gives an equation system described by a rectangular Vandermonde matrix.

1.2.5 Non-linear least squares fitting

So far we have only considered models that are linear with respect to the parameters that specify them. If we relax the linearity condition and simply consider fitting a function with $m$ parameters, $f(\beta_1, \ldots, \beta_m; x)$, to $n$ data points in the least squares sense, it is usually referred to as a non-linear least squares fitting.

There is no general analogue to the Gauss–Markov theorem for non-linear least squares fitting and therefore finding the appropriate estimator requires more knowledge about the specifics of the model. In practice non-linear least squares fittings are often found using some numerical method for non-linear optimization, of which there are many (see for instance [247] for an overview).

In the next section we will give an overview of a standard method called the Marquardt least squares method. In Section 3.2.2 we will use a combination of the Marquardt least squares method and methods for linear least
squares fitting to fit a non-linear model described by $G(\beta; t)\,\eta = y$ where $\beta$, $\eta$ are vectors of parameters to be fitted, $y$ is the data we wish to fit the model to and $G(\beta; t)$ is the generalized Vandermonde matrix

$$G(\beta; t) = \begin{bmatrix} (t_1 e^{1-t_1})^{\beta_1} & (t_1 e^{1-t_1})^{\beta_2} & \cdots & (t_1 e^{1-t_1})^{\beta_m}\\ (t_2 e^{1-t_2})^{\beta_1} & (t_2 e^{1-t_2})^{\beta_2} & \cdots & (t_2 e^{1-t_2})^{\beta_m}\\ \vdots & \vdots & \ddots & \vdots\\ (t_n e^{1-t_n})^{\beta_1} & (t_n e^{1-t_n})^{\beta_2} & \cdots & (t_n e^{1-t_n})^{\beta_m} \end{bmatrix}.$$

1.2.6 The Marquardt least squares method

This section is based on Section 3.1 of Paper E.

The Marquardt least squares method, also known as the Levenberg-Marquardt algorithm or damped least squares, is an efficient method for least squares estimation for functions with non-linear parameters that was developed in the middle of the 20th century (see [174], [202]).

The least squares estimation problem for functions with non-linear parameters arises when a function of $m$ independent variables and described by $k$ unknown parameters needs to be fitted to a set of $n$ data points such that the sum of squares of residuals is minimized.

The vector containing the independent variables is $x = (x_1, \cdots, x_m)$, the vector containing the parameters is $\beta = (\beta_1, \cdots, \beta_k)$ and the data points are

$$(Y_i, X_{1i}, X_{2i}, \cdots, X_{mi}) = (Y_i, X_i), \quad i = 1, 2, \cdots, n.$$

Let the residuals be denoted by $E_i = f(X_i; \beta) - Y_i$. The sum of squares of $E_i$ is then written as

$$S = \sum_{i=1}^{n} (f(X_i; \beta) - Y_i)^2,$$

which is the function to be minimized with respect to $\beta$.

The Marquardt least squares method is an iterative method that gives approximate values of $\beta$ by combining the Gauss-Newton method (also known as the inverse Hessian method) and the steepest descent (also known as the gradient) method to minimize $S$.
The method is based around solving the linear equation system

$$\left(A^{*(r)} + \lambda^{(r)} I\right) \delta^{*(r)} = g^{*(r)}, \tag{20}$$

where $A^{*(r)}$ is a modified Hessian matrix of $E(b)$ (or $f(X_i; b)$), $g^{*(r)}$ is a rescaled version of the gradient of $S$, $r$ is the number of the current iteration of the method, and $\lambda$ is a real positive number sometimes referred to as the fudge factor [235]. The Hessian, the gradient and their modifications are defined as follows:

$$A = J^\top J, \qquad J_{ij} = \frac{\partial f_i}{\partial b_j} = \frac{\partial E_i}{\partial b_j}, \quad i = 1, 2, \cdots, n;\ j = 1, 2, \cdots, k,$$

and

$$(A^*)_{ij} = \frac{a_{ij}}{\sqrt{a_{ii}}\sqrt{a_{jj}}},$$

while

$$g = J^\top (Y - f_0), \qquad f_{0i} = f(X_i, b, c), \qquad g_i^* = \frac{g_i}{\sqrt{a_{ii}}}.$$

Solving (20) gives a vector which, after some scaling, describes how the parameters $b$ should be changed in order to get a new approximation of $\beta$,

$$b^{(r+1)} = b^{(r)} + \delta^{(r)}, \qquad \delta_i^{(r)} = \frac{\delta_i^{*(r)}}{\sqrt{a_{ii}}}. \tag{21}$$

It is obvious from (20) that $\delta^{(r)}$ depends on the value of the fudge factor $\lambda$. Note that if $\lambda = 0$, then (20) reduces to the regular Gauss-Newton method [202], and if $\lambda \to \infty$ the method will converge towards the steepest descent method [202]. The reason that the two methods are combined is that the Gauss-Newton method often has faster convergence than the steepest descent method, but is also an unstable method [202]. Therefore, $\lambda$ must be chosen appropriately in each step. In the Marquardt least squares method this amounts to increasing $\lambda$ by a chosen factor $v$ whenever an iteration increases $S$, and if an iteration reduces $S$ then $\lambda$ is reduced by a factor $v$ as many times as possible. Below follows a detailed description of the method using the following notation:

$$S^{(r)} = \sum_{i=1}^{n} \left(Y_i - f(X_i, b^{(r)}, c)\right)^2, \tag{22}$$

$$S\left(\lambda^{(r)}\right) = \sum_{i=1}^{n} \left(Y_i - f(X_i, b^{(r)} + \delta^{(r)}, c)\right)^2. \tag{23}$$

The iteration step of the Marquardt least squares method can be described as follows:

• Input: $v > 1$ and $b^{(r)}$, $\lambda^{(r)}$.
◁ Compute $S\left(\lambda^{(r)}\right)$.
• If $\lambda^{(r)}$ … $1$ then compute $S\left(\frac{\lambda^{(r)}}{v}\right)$, else go to ▷.
• If $S\left(\frac{\lambda^{(r)}}{v}\right) \leq S^{(r)}$ let $\lambda^{(r+1)} = \frac{\lambda^{(r)}}{v}$.
▷ If $S\left(\lambda^{(r)}\right) \leq S^{(r)}$ let $\lambda^{(r+1)} = \lambda^{(r)}$.
• If $S\left(\lambda^{(r)}\right) > S^{(r)}$ then find the smallest integer $\omega > 0$ such that $S\left(\lambda^{(r)} v^{\omega}\right) \leq S^{(r)}$, and set $\lambda^{(r+1)} = \lambda^{(r)} v^{\omega}$.
• Output: $b^{(r+1)} = b^{(r)} + \delta^{(r)}$, $\delta^{(r)}$.

This iteration procedure is also described in Figure 1.5. Naturally, some condition for what constitutes an acceptable fit for the function must also be chosen. If this condition is not satisfied the new values for $b^{(r+1)}$ and $\lambda^{(r+1)}$ will be used as input for the next iteration, and if the condition is satisfied the algorithm terminates. The quality of the fitting, in other words the value of $S$, is determined by the stopping condition and the initial values for $b^{(0)}$. The initial value of $\lambda^{(0)}$ affects the performance of the algorithm only to some extent, since after the first iteration $\lambda^{(r)}$ will be self-regulating. Suitable values for $b^{(0)}$ are challenging to find for many functions $f$ and they are often, together with $\lambda^{(0)}$, found using heuristic methods.

Figure 1.5: The basic iteration step of the Marquardt least squares method; definitions of computed quantities are given in (21), (22) and (23).

In Section 3.2 the Marquardt least squares method will be used for least squares fitting with power-exponential functions.

1.3 Analysing how well a curve fits

In this thesis we will discuss several ways to construct mathematical models. With several mathematical models available, some method is needed for comparing the models and choosing the most suitable one.
When the model is constructed with a certain application in mind there is often a set of required or desired properties given by the application, and choosing the best model is a matter of seeing which model matches the requirements best.

In many cases this process is not straightforward and often there is not one model that is better than the other candidate models in all aspects. A common example is the trade-off between accuracy and complexity of the model. It is often easy to improve the model by increasing its complexity (either by introducing more general and flexible mathematical concepts that are more difficult to analyse or less well understood, or by extending the model in a way that increases the cost of computations and simulations using the model), but finding the best compromise between accuracy and complexity can be difficult. In this section we will discuss how to compare models primarily with respect to accuracy and the number of required parameters.

1.3.1 Regression

Regression is similar to interpolation except that the presence of noise in the data is taken into consideration. The typical regression problem assumes that the data points $\{(x_i, y_i), i = 1, \ldots, n\}$ are samples from a stochastic variable of the form

$$Y_i = f(\beta; x_i) + \epsilon_i$$

where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are samples of a random variable with expected value zero, called the errors or the noise for the data set.

There are many different classes of regression problems defined by the type of function $f(\beta_1, \ldots, \beta_m; x)$ and the distribution of errors.

Here we will only consider the situation where the $\epsilon_i$ variables are independent and normally distributed with identical variance, the parameter space $B$ is a compact subset of $\mathbb{R}^k$, and for all $x_i$ the function $f(\beta; x_i)$ is a continuous function of $\beta \in B$.
Suppose we want to choose the appropriate set of parameters for $f$ based on some set of observed data points. A common approach to this is so-called maximum likelihood estimation.

Definition 1.13. The likelihood function, $L$, is the function that gives us the probability that a certain observation, $x$, of a stochastic variable $X$ is made given a certain set of parameters, $\beta$,

$$L_x(\beta) = \Pr(X = x \mid \beta).$$

Thus choosing parameters that maximize the likelihood function gives the set of parameters that seem to be most likely based on available information. Typically these parameters cannot be calculated exactly and must be estimated; this estimation is called the Maximum Likelihood Estimation (MLE).

To find the MLE we need to find the maximum of the likelihood function. Note that here we will only consider the case where the noise variables, $\epsilon_i$, are independent and normally distributed with mean zero.

Lemma 1.7. For the stochastic variables $Y_i = f(\beta; x_i) + \epsilon_i$ where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are independent normally distributed random variables with expected value zero and standard deviation $\sigma$, the likelihood function is given by the joint probability density function for the noise,

$$L_y(\beta) = (2\pi)^{-\frac{n}{2}} \sigma^{-n} \prod_{i=1}^{n} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right).$$

Proof. Since each $\epsilon_i$ is normally distributed with mean zero and standard deviation $\sigma$, the difference between the observed value and the given function, $y_i - f(\beta; x_i)$, is normally distributed with mean zero and standard deviation $\sigma$. Since all the errors are independent, the joint probability density function is just the product of $n$ probability density functions of the form

$$p_i(\beta; (x_i, y_i)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right) \quad \text{for } i = 1, \ldots, n.$$

For the MLE we only care about what parameters give the maximum of the likelihood function, not the actual value of the likelihood function, so we can ignore the constant factor. In practice it is also often simpler to consider the maximum of the logarithm of the likelihood function. This leads to the following lemma.

Lemma 1.8. Consider a regression problem described by a set of data points $(x_i, y_i)$, $i = 1, \ldots, n$ and the stochastic variables $Y_i = f(\beta; x_i) + \epsilon_i$ where $f(\beta; x)$ is a given function with a fixed number of undetermined parameters $\beta \in B$ and $\epsilon_i$ for $i = 1, \ldots, n$ are independent normally distributed random variables with expected value zero and standard deviation $\sigma$. The MLE for the parameters $\beta$ will minimize the sum of the squares of the residuals,

$$S(\beta) = \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2.$$

Proof. Since the natural logarithm is a monotonically increasing function, $-\ln(L_y(\beta))$ will have a minimum point where $L_y(\beta)$ has a maximum point. By Lemma 1.7

$$\begin{aligned} -\ln(L_y(\beta)) &= -\ln\left((2\pi)^{-\frac{n}{2}} \sigma^{-n} \prod_{i=1}^{n} \exp\left(-\frac{(y_i - f(\beta; x_i))^2}{2\sigma^2}\right)\right)\\ &= \ln\left((2\pi)^{\frac{n}{2}} \sigma^{n}\right) + \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - f(\beta; x_i))^2\\ &= \ln\left((2\pi)^{\frac{n}{2}} \sigma^{n}\right) + \frac{1}{2\sigma^2} S(\beta). \end{aligned}$$

Since the first term and the positive factor in front of $S(\beta)$ do not depend on $\beta$, the minimum point of $S(\beta)$ will coincide with the maximum of the likelihood function.

Here we can see that finding the MLE is equivalent to using the curve fitting technique described in Section 1.2.3. In Section 1.2.4 we saw that solving this problem in the case when the model was linear with respect to its parameters was relatively straightforward and in Section 1.2.5 we saw that when the model was non-linear with respect to its parameters the problem was considerably harder.
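The equivalence between the MLE and least squares fitting can be checked numerically for a linear model. The sketch below is an illustration under assumed settings (a quadratic model, synthetic data, a known noise level `sigma` and an arbitrary seed): it solves the normal equations $A^\top A\beta = A^\top y$ from Section 1.2.4, compares the result with NumPy's least squares solver, and confirms that moving away from the solution increases both $S(\beta)$ and the negative log-likelihood computed from the standard Gaussian density.

```python
import numpy as np

rng = np.random.default_rng(1)             # arbitrary seed
sigma = 0.1                                # assumed known noise level
x = np.linspace(0.0, 1.0, 50)
A = np.vander(x, N=3, increasing=True)     # columns 1, x, x^2 (rectangular Vandermonde)
beta_true = np.array([1.0, -2.0, 0.5])     # illustrative parameters
y = A @ beta_true + rng.normal(scale=sigma, size=x.size)

def sum_of_squares(beta):
    """S(beta): sum of squared residuals."""
    return np.sum((y - A @ beta) ** 2)

def neg_log_likelihood(beta):
    """-ln L_y(beta) for independent N(0, sigma^2) noise."""
    n = y.size
    return n * np.log(np.sqrt(2.0 * np.pi) * sigma) + sum_of_squares(beta) / (2.0 * sigma**2)

# Normal equations A^T A beta = A^T y.
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
# Reference solution from an SVD-based solver.
beta_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

perturbed = beta_hat + np.array([0.05, 0.0, -0.05])
print(beta_hat, sum_of_squares(beta_hat), sum_of_squares(perturbed))
```

Solving the normal equations directly is used here only because it mirrors the derivation; for ill-conditioned alternant matrices a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice.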
1.3.2 Quantile-Quantile plots

In the regression problem discussed in Section 1.3.1 it is assumed that the noise of the model follows some distribution. For the purpose of this thesis we only considered the case of normally distributed noise, but the essential problem formulation is the same regardless of noise distribution. Testing this assumption can be done in different ways, and in Section 4.4.1 we will demonstrate some cases where the assumption is not entirely true using a quantile-quantile plot (Q-Q plot). Q-Q plots are a common tool for graphically analysing how close sampled data is to a given distribution [273].

Suppose we have $n$ samples of a stochastic variable $X$ from an unknown distribution, and that we want to make a Q-Q plot that tests if $X$ belongs to a distribution with cumulative distribution function $F$. First we sort the samples in ascending order, $x_1 \le x_2 \le \ldots \le x_n$, and then choose what we expect to be the corresponding probability for each sample. A common approach is computing the $\frac{k - 0.5}{n}$-th quantile for sample $x_k$, $k = 1, \ldots, n$, i.e. finding $F^{-1}\left(\frac{k - 0.5}{n}\right)$. It is then expected that the points given by
$$\left(F^{-1}\left(\frac{k - 0.5}{n}\right),\, x_k\right)$$
should mostly follow a straight line in the Q-Q plot, apart from some random noise. If the points show some other pattern, or some points lie very far from the line, this indicates that the samples would be better described by some other distribution or that there is a significant number of outliers.

There are many versions of this kind of tool and many alternatives for which quantiles to choose, see for example Table 2.1 in [273], but here we will only use the quantiles given above. This kind of tool does not provide a rigorous test; rather it is up to whoever analyses the sample to determine if the sample points are close enough to linear or not.
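The construction of the Q-Q points described above can be sketched in a few lines. This is only an illustration: the function name `qq_points` and the sample values are made up, and the reference distribution is taken to be the standard normal distribution via `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

def qq_points(samples, dist=NormalDist()):
    """Return the points (F^{-1}((k - 0.5)/n), x_k) for the sorted samples x_k."""
    n = len(samples)
    ordered = sorted(samples)
    return [(dist.inv_cdf((k - 0.5) / n), x_k)
            for k, x_k in enumerate(ordered, start=1)]

# Made-up residuals, for illustration only.
residuals = [0.3, -1.2, 0.1, 0.8, -0.4]
points = qq_points(residuals)
for theoretical, observed in points:
    print(f"{theoretical: .4f}  {observed: .4f}")
```

Plotting the second coordinate against the first gives the Q-Q plot; whether the points are close enough to a straight line is still a judgement call, as noted above.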
These types of plot can also help identify another distribution that might be a more reasonable assumption. For example, if we use a Q-Q plot to compare to a normal distribution and the lower end of the curve turns downwards while the upper end turns upwards, this indicates that the samples come from a relatively long-tailed distribution, while if the ends of the curve turn in the opposite directions this indicates a relatively short-tailed distribution [273].

1.3.3 The Akaike information criterion

When constructing a mathematical model of an observable process without describing the underlying causes of the process, i.e. a phenomenological model, there are many tools available for creating a model that can recreate a finite set of points with arbitrarily high precision, for example the interpolation methods described in Sections 1.2.1 – 1.2.2 or the least squares methods in Sections 1.2.3 – 1.2.6. Regardless of what method is chosen, the accuracy (unless already exact) can be improved by adding more (free) parameters to the model (exactly how this is accomplished depends on the model). There is a well-known anecdote, see [77], where Freeman Dyson describes how Enrico Fermi dissuaded him from pursuing a research project in particle physics where the model had many free parameters by saying: "I remember my friend Johnny von Neumann used to say, 'with four parameters I can fit an elephant, and with five I can make him wiggle his trunk'."

Sometimes, especially when working with data that has significant noise, you can have a model that is 'too accurate' in the sense that the model describes some of the noise as well as the underlying process in a curve fitting or regression problem. For an example see Figure 4.4, where some models designed to reproduce a certain pattern in data give unreasonable results when noise makes the pattern indistinct. This phenomenon is called overfitting and can cause problems with extrapolation based on, and interpretation of, the model.
A common sign of overfitting is that the model gives unreasonable results for points not in the original data set, similar to Runge's phenomenon, illustrated in Figure 1.4.

One way to detect possible overfitting is to compare the model to a similar model with fewer parameters and see if the improvement in accuracy is sufficiently large to warrant the use of the more complex model. A formalised way of doing this is using the Akaike Information Criterion (AIC), originally developed in the 1970s [4, 5].

Definition 1.14. Let $g$ be a model of some data, $y$, with $k$ estimated parameters and let $\hat{L}(g|y)$ be the maximum value of the likelihood function for the model. Then the Akaike Information Criterion is given by
$$\mathrm{AIC}(g|y) = 2(k + 1) - 2 \log\left(\hat{L}(g|y)\right).$$

The AIC is used for comparing models to each other, and a lower AIC indicates a model that describes the data better but without overfitting or needless complexity, essentially a model that contains more information. The key concept for explaining why the AIC works is the Kullback–Leibler divergence, also known as Kullback–Leibler information or relative entropy.

Definition 1.15. The Kullback–Leibler divergence of two probability distributions over the real numbers with probability density functions $f$ and $g$ is defined as
$$I(f, g) = \int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx.$$

Remark 1.8. There are definitions of the Kullback–Leibler divergence for discrete and multivariate distributions as well, but for the purposes of this thesis this is the only definition we will need.

The Kullback–Leibler divergence has two properties that make it useful for evaluating mathematical models.

Lemma 1.9. Let $f$ and $g$ be the probability density functions of two probability distributions. Then $I(f, g) \ge 0$, and $I(f, g) = 0$ if and only if $f = g$.

Proof.
Since $f(x)$ and $g(x)$ are probability density functions they must be non-negative, and thus $\frac{g(x)}{f(x)}$ must also be non-negative wherever $f(x) > 0$. Then the following inequality holds
$$\ln\left(\frac{g(x)}{f(x)}\right) \le \frac{g(x)}{f(x)} - 1.$$
If we multiply by $f(x)$ and integrate on both sides of the inequality, and note that $\ln\left(\frac{g(x)}{f(x)}\right) = -\ln\left(\frac{f(x)}{g(x)}\right)$, then we get
$$-I(f, g) = -\int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx \le \int_{-\infty}^{\infty} f(x)\left(\frac{g(x)}{f(x)} - 1\right) dx.$$
Since $f$ and $g$ are probability density functions we can conclude that
$$\int_{-\infty}^{\infty} f(x)\left(\frac{g(x)}{f(x)} - 1\right) dx = \int_{-\infty}^{\infty} g(x) - f(x)\, dx = \int_{-\infty}^{\infty} g(x)\, dx - \int_{-\infty}^{\infty} f(x)\, dx = 1 - 1 = 0$$
and thus
$$-I(f, g) = -\int_{-\infty}^{\infty} f(x) \ln\left(\frac{f(x)}{g(x)}\right) dx \le 0 \iff I(f, g) \ge 0.$$
To prove that $I(f, g) = 0$ if and only if $f = g$, note that equality in $\ln(u) \le u - 1$ holds only for $u = 1$. Thus $I(f, g) = 0$ requires $\frac{g(x)}{f(x)} = 1$ (almost everywhere where $f(x) > 0$), that is $f = g$. Conversely, if $f = g$ then the integrand is $f(x) \ln(1) = 0$ and $I(f, g) = 0$.

A common interpretation of $I(f, g)$ is that the larger the Kullback–Leibler divergence is, the more information is lost by using $g$ as an approximation of $f$ [40, 164]. If we have a distribution with probability density function $f$ and a number of candidates for approximating this distribution that have probability density functions $g_1, g_2, \ldots, g_n$, then the best candidate for approximation would be the candidate with $I(f, g_k)$ closest to zero. Often this is not useful when trying to model a process based on observations since the true distribution is unknown. One solution to this is to estimate the Kullback–Leibler divergence from the true model. This is the main idea behind the AIC.

Fully deriving the AIC is somewhat complicated, see for example [35], and here we will only give a short motivation (based on Section 7.2 in [40]). First we must consider the situation in which the AIC can be applied.
We will have taken some set of observations from a stochastic variable $Y$ with an unknown probability density function $f$ and, based on those, constructed our model (using for example a curve fitting technique). Let us call the observations $y$ and the model we construct based on the data $g(\cdot|y)$. This will not necessarily give the best possible version of the model, so simply looking at $I(f, g(\cdot|y))$ can be misleading. It is better to consider the expected value of $I(f, g(\cdot|y))$ with respect to $y$, which has the property
$$E_Y[I(f, g(\cdot|Y))] > I(f, g(\cdot|y^*))$$
where $y^*$ is the set of observations that gives the best possible version of the candidate model. Since $f$ is unknown we cannot estimate this expected value directly, but it can be rewritten as follows
$$E_Y[I(f, g(\cdot|Y))] = \int_{-\infty}^{\infty} f(y) \ln(f(y))\, dy - E_Y\left[\int_{-\infty}^{\infty} f(x) \ln(g(x|Y))\, dx\right] = c - E_Y E_X[\ln(g(X|Y))]$$
where $X$ is an independent stochastic variable with the same distribution as $Y$. Since $c$ is a constant value that is independent of our candidate model we can ignore it and focus on $E_Y E_X[\ln(g(X|Y))]$.

We saw in Section 1.3.1 that when we construct the model from the data we use the MLE for the parameters. If we denote the parameters given by the MLE by $\hat{\beta}$, then instead of $g(x|y)$ we can denote the chosen model by $g(\hat{\beta}; x)$. Thus
$$E_Y E_X[\ln(g(X|Y))] = E_{\hat{\beta}} E_X\left[\ln(g(\hat{\beta}; X))\right].$$
Using a Taylor expansion around $\hat{\beta}$ we can see that
$$\ln(g(\beta; X)) \approx \ln(g(\hat{\beta}; X)) + \begin{bmatrix} \frac{\partial}{\partial \beta_1} \ln(g(\hat{\beta}; X)) \\ \vdots \\ \frac{\partial}{\partial \beta_n} \ln(g(\hat{\beta}; X)) \end{bmatrix}^{\top} \begin{bmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{bmatrix}^{\top} \left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n} \begin{bmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{bmatrix}. \tag{24}$$
Since $\hat{\beta}$ was given by the MLE the first order derivatives will disappear, $\frac{\partial}{\partial \beta_i} \ln(g(\hat{\beta}; X)) = 0$ for $i = 1, \ldots, n$. Taking the expectation of the remaining part of the expression with respect to $X$ gives
$$E_X[\ln(g(\beta; X))] \approx E_X[\ln(g(\hat{\beta}; X))] + \frac{1}{2} \begin{bmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{bmatrix}^{\top} E_X\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n} \begin{bmatrix} \beta_1 - \hat{\beta}_1 \\ \vdots \\ \beta_n - \hat{\beta}_n \end{bmatrix} = E_X[\ln(g(\hat{\beta}; X))] + T(\hat{\beta}).$$
Next we rearrange, take the expectation with respect to $\hat{\beta}$ and get
$$E_{\hat{\beta}} E_X\left[\ln(g(\hat{\beta}; X))\right] \approx E_{\hat{\beta}}\left[E_X[\ln(g(\beta; X))] - T(\hat{\beta})\right] = E_X[\ln(g(\beta; X))] - E_{\hat{\beta}}[T(\hat{\beta})].$$
It can be shown that the first term is given by the logarithm of the maximum of the likelihood function, $E_X[\ln(g(\beta; X))] = \ln \hat{L}$, and that the expectation in the second term is approximately equal to the number of free parameters, which is the $k$ parameters of the model plus one for the standard deviation of the noise, $E_{\hat{\beta}}[T(\hat{\beta})] \approx k + 1$. Combining this gives $\ln \hat{L} - (k + 1)$, which is the expression for the AIC given in Definition 1.14 apart from the factor $-2$, which is used as a matter of convention [40].

Remark 1.9. For models with many parameters and small sample sizes it is recommended to add a second order correction to the AIC. The corrected criterion is called the AIC$_C$ [40, 129] and in the case of least squares fitting it is given by the following expression
$$\mathrm{AIC}_C = \mathrm{AIC} + \frac{2(k + 1)(k + 2)}{n - k - 2}.$$

There are several other information criteria that could be used in a similar way to the AIC, for example the Takeuchi information criterion [270] or the Bayesian information criterion [254]. Here we will use the AIC since it is considered a reliable criterion that is simple to calculate [40, 164] and is asymptotically optimal for selecting the model with the least mean square error [297].

1.4 D-optimal experiment design

For the class of linear non-weighted regression problems described in Section 1.3.1, minimizing the sum of the squares of the residuals gives the maximum-likelihood estimation of the parameters that specify the fitted function. This estimation naturally has a variance as well, and minimizing this variance can be interpreted as improving the reliability of the fitted function by minimizing its sensitivity to noise in measurements. This minimization is usually done by choosing where to sample the data carefully. In other words, given the regression problem defined by
$$y_i = f(\beta; x_i) + \epsilon_i \quad \text{for } i = 1, \ldots, n$$
with the same conditions on $f(\beta; x)$ and $\epsilon_i$ as in Section 1.3.1, we want to choose a design $\{x_i,\ i = 1, \ldots, n\}$ that minimizes the variance of the values predicted by the regression model. This is usually referred to as G-optimality.

To give a proper definition of G-optimality we will need the concept of the Fisher information matrix. When motivating the expression for the AIC in Section 1.3.3 the matrix
$$\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n}$$
appeared, see expression (24). In that context we were interested in how much information was lost when the model $g$ was used instead of the data. If the model $g$ is the true distribution, twice differentiable, and has only one parameter, $\beta$, it is possible to describe how much information about the model is contained in the parameter using the Fisher information
$$I(\beta) = -E_X\left[\frac{\partial^2 \log(g(\beta; X))}{\partial \beta^2}\right].$$
Essentially this expression measures how strongly the probability of observing a particular outcome depends on the value of $\beta$, so if the Fisher information is only large near certain points it is easy to tell which parameter value is the true parameter value, and if the Fisher information does not have a clear peak it is difficult to estimate the correct value of $\beta$. When the model has several parameters the Fisher information is replaced by the Fisher information matrix.

Definition 1.16. For a finite design $x \in X \subseteq \mathbb{R}^n$ the Fisher information matrix, $M$, is the matrix defined by
$$M(\beta) = -E_X\left[\frac{\partial^2}{\partial \beta_i \partial \beta_j} \ln(g(\hat{\beta}; X))\right]_{1,1}^{n,n}.$$

Remark 1.10. The concept of information in the AIC and the concept of information here are two different but related concepts; for a discussion of this relation see Section 7.7.8 in [40].

There is a lot of literature on the Fisher information matrix, but in the context of the least squares problems discussed here we have a fairly simple expression for its elements, see [208] for details.

Lemma 1.10.
For a finite design $x \in X \subseteq \mathbb{R}^n$ the Fisher information matrix for the type of least squares fitting problem considered in this section can be computed by
$$M(x) = \sum_{i=1}^{n} f(x_i) f(x_i)^{\top}$$
where $f(x) = \begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \end{bmatrix}^{\top}$.

Definition 1.17 (The G-optimality criterion). A design $\xi$ is said to be G-optimal if it minimizes the maximum variance of any predicted value
$$\mathrm{Var}(y(\xi)) = \min_{x_i,\ i = 1, 2, \ldots, n}\ \max_{x \in X} \mathrm{Var}(y(x)) = \min_{z \in X} \max_{x \in X} f(x)^{\top} M(z)^{-1} f(x).$$

The G-optimality condition was first introduced in [264] (the name G-optimality comes from later work by Kiefer and Wolfowitz where they describe several different types of optimal design using alphabetical letters [153], [154]) and is an example of a minimax criterion, since it minimizes the maximum variance of the values given by the regression model [208].

There are many kinds of optimality conditions related to G-optimality. One which is suitable for us to consider is D-optimality. This type of optimality was first introduced in [285], and instead of focusing on the variance of the predicted values of the model it minimizes the volume of the confidence ellipsoid for the parameters (for a given confidence level).

Definition 1.18 (The D-optimality criterion). A design $\xi$ is said to be D-optimal if it maximizes the determinant of the Fisher information matrix
$$\det(M(\xi)) = \max_{x \in X} \det(M(x)).$$

D-optimal designs are often good designs with respect to other types of criteria (see for example [112] for a brief discussion on this) and are often practical to consider since they are invariant with respect to linear transformations of the design matrix. A well-known theorem called the Kiefer–Wolfowitz equivalence theorem shows that under certain conditions G-optimality is equivalent to D-optimality.

Theorem 1.7 (Kiefer–Wolfowitz equivalence theorem).
For any linear regression model with independent, uncorrelated errors and continuous and linearly independent basis functions $f_i(x)$ defined on a fixed compact topological space $X$ there exists a D-optimal design, and any D-optimal design is also G-optimal.

This equivalence theorem was originally proven in [155] but the formulation above is taken from [208]. Thus maximizing the determinant of the Fisher information matrix corresponds to minimizing the variance of the estimated $\beta$. Interpolation can be considered a special case of regression when the sum of the squares of the residuals can be reduced to zero. Thus we can speak of D-optimal design for interpolation as well; in fact optimal experiment design is often used to find the minimum number of points needed for a certain model. For a linear interpolation problem defined by the alternant matrix $A(f; x)$ the Fisher information matrix is $M(x) = A(f; x)^{\top} A(f; x)$, and since $A(f; x)$ is an $n \times n$ matrix
$$\det(M(x)) = \det(A(f; x)^{\top}) \det(A(f; x)) = \det(A(f; x))^2.$$
Thus the maximization of the determinant of the Fisher information matrix is equivalent to finding the extreme points of the determinant of an alternant matrix in some volume given by the set of possible designs.

A standard case of this is polynomial interpolation where the $x$-values are in a limited interval, for instance $-1 \le x_i \le 1$ for $i = 1, 2, \ldots, n$. In this case the regression problem can be written as $V_n(x)^{\top} \beta = y$, where $V_n(x)$ is a Vandermonde matrix as defined in equation (1), and the constraints on the elements of $x$ mean that the volume we want to optimize over is a cube in $n$ dimensions. There are a number of classical results that describe how to find the D-optimal designs for weighted univariate polynomials with various efficiency functions, e.g. [87], and in Section 2.3.3 we will demonstrate one way to optimize the Vandermonde determinant over a cube.
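The relation $\det(M(x)) = \det(A(f; x))^2$ can be checked numerically for the monomial basis $1, x, \ldots, x^{n-1}$. The sketch below is only an illustration under assumptions: the helper names are made up, the determinant is computed with a plain Gaussian elimination, and the two four-point designs on $[-1, 1]$ (equidistant points and the points $\pm 1, \pm 1/\sqrt{5}$, known as Gauss–Lobatto nodes) are chosen here for comparison and are not taken from the thesis.

```python
import math

def vandermonde(xs):
    """Square Vandermonde matrix with entry (i, j) equal to xs[j] ** i."""
    return [[x ** i for x in xs] for i in range(len(xs))]

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def vandermonde_det_product(xs):
    """Product formula: prod over i < j of (xs[j] - xs[i])."""
    n = len(xs)
    return math.prod(xs[j] - xs[i] for i in range(n) for j in range(i + 1, n))

def information_det(xs):
    """det(M) with M = sum_i f(x_i) f(x_i)^T for f(x) = (1, x, ..., x^(n-1))^T."""
    v = vandermonde(xs)
    n = len(xs)
    m = [[sum(v[a][i] * v[b][i] for i in range(n)) for b in range(n)]
         for a in range(n)]
    return det(m)

equidistant = [-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0]
lobatto = [-1.0, -1.0 / math.sqrt(5.0), 1.0 / math.sqrt(5.0), 1.0]

for name, design in (("equidistant", equidistant), ("lobatto", lobatto)):
    print(name, information_det(design), vandermonde_det_product(design) ** 2)
```

For these two designs the second one gives the larger determinant, which illustrates that the placement of the sample points, and not only their number, affects the D-optimality criterion.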
The shape of the volume to optimize the determinant in is given by constraints on the data points. For example, if there is a cost associated with each data point that increases quadratically with $x$ and there is a total budget, $C$, for the experiment that cannot be exceeded, the constraint on the $x$-values becomes $x_1^2 + x_2^2 + \ldots + x_n^2 \le C$ and the determinant needs to be optimized in a ball. In Chapter 2 we examine the optimization of the Vandermonde determinant over several different surfaces in several dimensions.

In Section 3.3 we use a D-optimal design to improve the stability of an interpolation problem as an alternative to the non-linear fitting from Section 3.2.

Note that while choosing a D-optimal design can give an approximation method that is more stable since it minimizes the variance of the parameters, the approximating function can still be highly sensitive to changes in parameters (the variance of the predicted values can be minimized but still high), so it does not necessarily maximize stability or stop instability phenomena similar to Runge's phenomenon for polynomial interpolation.

1.5 Electromagnetic compatibility and electrostatic discharge currents

There are many examples of electromagnetic phenomena that involve two objects influencing each other without touching. Almost everyone is familiar with magnets that attract or repel other objects, sparks that bridge physical gaps and radio waves that send messages across the globe. While this action-at-a-distance can be very useful it can also cause unintended interactions between different systems.
This is usually referred to as electromagnetic disturbance or electromagnetic interference, and the field of electromagnetic compatibility (EMC) is the study and design of systems that are not susceptible to disturbances from other systems and do not cause interference with other systems or themselves [228, 290].

Electromagnetic disturbances can originate from a multitude of sources. Some examples of man-made sources are broadcasting and receiving devices, power generators and converters, power conversion and ignition systems for combustion engines, manufacturing equipment like ovens, saws, mills, welders, blenders and mixers, and other equipment such as fans, heaters, coolers, lights, computers, and instruments for measurement and control. Examples of natural sources are atmospheric, solar and cosmic noise, static discharges and lightning [212].

Mathematical modelling is an important tool for EMC [212]. Computers have been used for electromagnetic analysis since the 1950s [115] and this rapidly became more useful and important over time [234]. In practice many different types of models and methods are used, all with their own advantages and disadvantages, and the design process often involves a combination of analytical and numerical techniques [94]. The sources of electromagnetic disturbances are not always well understood or cannot be well described, and deriving all parts of the model from first principles requires a combination of many different techniques, numerical, stochastic and analytical, see [76, 229] for examples. In practice it is often reasonable to use phenomenological models reproducing typical patterns based on statistical data [45, 148].

Requirements for a product or system to be considered electromagnetically compatible can be found in standards such as the IEC 61000-4-2 [132] and IEC 62305-1 [133].
In several of these standards, approximations of typical currents for various phenomena are given, and electromagnetic compatibility requirements are based on the effects of the system being exposed to these currents, such as the radiated electromagnetic fields. Ideally the descriptions of these currents should give an accurate description of the observed behaviour that the standard is based on, as well as being computationally efficient (since computer simulations replacing construction of prototypes can save both time and resources) and compatible with the mathematical tools that are commonly used in electromagnetic calculations, for instance Laplace and Fourier transforms.

In this thesis we will discuss approximations of electrostatic discharge currents, either from a standard or based on experimental data. In Section 1.5.1 a review of models in the literature can be found, and in Chapter 3 we propose a new function, the analytically extended function (AEF), for modelling these currents. It has some advantages compared to the commonly used models and can be applied to many different cases, typically at the cost of some extra manual work in fitting the model.

Electrostatic discharge (ESD) is a common phenomenon where a sudden flow of electricity between two charged objects occurs; examples include sparks and lightning strikes. The main mechanism behind it is usually said to be contact electrification. This phenomenon is due to all materials occasionally emitting electrons, usually at a higher rate when they are heated. Typically the emission and absorption balance out, but since the rate of emission varies between different materials an imbalance can occur when two materials come sufficiently close to each other.
When the materials are separated this charge imbalance might remain for some time. It can be restored by the charged objects slowly emitting electrons to the surrounding objects, but in the right conditions, for example if the charged object comes near a conductive material with an opposite charge, the restoration of the charge balance can be very rapid, resulting in an electrostatic discharge. The reader is likely to be familiar with the case of two materials rubbing against each other, building up a charge imbalance, and one of the objects generating a spark when moved close to a metal object. This case is common since friction between objects typically means a larger contact area where charges can transfer, and movement is necessary for charge separation. For this reason this mechanism is often referred to as friction charging or the triboelectric effect. Contact charging can happen between any materials, including liquids and gases, and can also be affected by many other types of phenomena, such as ion transfer or energetic charged particles colliding with other objects [100]. Therefore the exact mechanisms behind electrostatic discharges can be difficult to understand and describe, even when the circumstances where an electrostatic discharge is likely are well known [195].

In this thesis we focus on two types of electrostatic discharge, lightning discharge and human-to-object (human-to-metal or human-to-human) discharge.

Lightning discharges can cause electromagnetic disturbances in three ways: by passing through a system directly, by passing through a nearby object which then radiates electrical fields that disturb the system, or by indirectly inducing transient currents in systems when the electrical field associated with a thundercloud disappears as the lightning discharge removes the charge imbalance between cloud and ground [45].
We discuss modelling of some lightning discharges from standards and experimental data in Section 3.2.

Electrostatic discharges from humans are very common and are typically just a nuisance, but they can damage sensitive electronics and can cause severe accidents, either by the shock from the discharge causing a human error or by directly causing gas or dust explosions [148, 195]. We discuss modelling of a simulated human-to-object electrostatic discharge in Section 3.3.

1.5.1 Electrostatic discharge modelling

A well-defined representation of real electrostatic discharge currents is needed in order to establish realistic requirements for ESD generators used in testing equipment and devices, as well as to provide and improve the repeatability of tests. It should be able to approximate the current for various test levels, test set-ups and procedures, and also for various ESD conditions such as approach speeds, types of electrodes, relative arc length, humidity, etc. A mathematical function is necessary for computer simulation of such phenomena, for verification of test generators and for improving standard waveshape definitions.

A number of current functions, mostly based on exponential functions, have been proposed in the literature to model the ESD currents [44, 95, 96, 142, 144, 152, 266, 278, 286, 287, 301, 302]. Here we will give a brief presentation of some of them, and in Section 3.1 we will propose an alternative function and a scheme for fitting it to a waveshape.

These expressions have been introduced for the purpose of representing either the ESD current of the IEC 61000-4-2 Standard [132] or experimentally measured ESD currents, e.g. [95]. In this section we give an overview of the most commonly applied ESD current approximations.

A double-exponential function has been proposed by Cerri et al. [44] for representation of ESD currents for commercial simulators in the form
$$i(t) = I_1 e^{-\frac{t}{\tau_1}} - I_2 e^{-\frac{t}{\tau_2}},$$
this type of function is also applied in other types of engineering, see Section 3.1 for some examples.

This model was also extended with a four-exponential version by Keenan and Rossi [152]:
$$i(t) = I_1\left(e^{-\frac{t}{\tau_1}} - e^{-\frac{t}{\tau_2}}\right) - I_2\left(e^{-\frac{t}{\tau_3}} - e^{-\frac{t}{\tau_4}}\right). \tag{25}$$

The Pulse function was proposed in [89],
$$i(t) = I_0\left(1 - e^{-\frac{t}{\tau_1}}\right)^{p} e^{-\frac{t}{\tau_2}},$$
and has been used for representation of lightning discharge currents both in its single term form [181] as well as in linear combinations of two [266], three or four Pulse functions [301].

The Heidler function [117] is one of the most commonly used functions for lightning discharge modelling,
$$i(t) = \frac{I_0}{\eta} \frac{\left(\frac{t}{\tau_1}\right)^{n}}{1 + \left(\frac{t}{\tau_1}\right)^{n}} e^{-\frac{t}{\tau_2}}.$$

Wang et al. [286] proposed an ESD model in the form of a sum of two Heidler functions:
$$i(t) = \frac{I_1}{\eta_1} \frac{\left(\frac{t}{\tau_1}\right)^{n}}{1 + \left(\frac{t}{\tau_1}\right)^{n}} e^{-\frac{t}{\tau_2}} + \frac{I_2}{\eta_2} \frac{\left(\frac{t}{\tau_3}\right)^{n}}{1 + \left(\frac{t}{\tau_3}\right)^{n}} e^{-\frac{t}{\tau_4}}, \tag{26}$$
with
$$\eta_1 = \exp\left(-\frac{\tau_1}{\tau_2}\left(\frac{n \tau_2}{\tau_1}\right)^{1/n}\right) \quad \text{and} \quad \eta_2 = \exp\left(-\frac{\tau_3}{\tau_4}\left(\frac{n \tau_4}{\tau_3}\right)^{1/n}\right)$$
being the peak correction factors. The function has been used to fit different electrostatic discharge currents using different methods [95, 286, 302].

Berghe and Zutter [278] proposed an ESD current model constructed as a sum of two Gaussian functions in the form:
$$i(t) = A \exp\left(-\left(\frac{t - \tau_1}{\sigma_1}\right)^2\right) + B t \exp\left(-\left(\frac{t - \tau_2}{\sigma_2}\right)^2\right). \tag{27}$$

The following approximation using exponential polynomials is presented in [287],
$$i(t) = A t e^{-Ct} + B t e^{-Dt}, \tag{28}$$
and has been used for design of simple electric circuits which can be used to simulate ESD currents.

One of the most commonly used ESD standard currents is the IEC 61000-4-2 current that represents a typical electrostatic discharge generated by the human body [132]. In the IEC 61000-4-2 standard [132] this current is given by a graphical representation, see Figure 3.10, together with some constraints, see page 150.
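To make the expressions above concrete, the following sketch evaluates the Heidler function, the two-Heidler sum (26) and the double-exponential function on a time grid. It is only an illustration: the function names are made up and the parameter values are hypothetical numbers chosen so that the code runs, they are not fitted values from the thesis, the standard or the cited papers.

```python
import math

def heidler(t, i0, tau1, tau2, n):
    """Heidler function with peak correction factor eta."""
    eta = math.exp(-(tau1 / tau2) * (n * tau2 / tau1) ** (1.0 / n))
    rise = (t / tau1) ** n
    return (i0 / eta) * rise / (1.0 + rise) * math.exp(-t / tau2)

def two_heidler(t, i1, tau1, tau2, i2, tau3, tau4, n):
    """Sum of two Heidler functions, as in expression (26)."""
    return heidler(t, i1, tau1, tau2, n) + heidler(t, i2, tau3, tau4, n)

def double_exponential(t, i1, tau1, i2, tau2):
    """Double-exponential function i(t) = I1 exp(-t/tau1) - I2 exp(-t/tau2)."""
    return i1 * math.exp(-t / tau1) - i2 * math.exp(-t / tau2)

# Hypothetical parameters (currents in A, times in ns), for illustration only.
params = dict(i1=15.0, tau1=1.0, tau2=2.0, i2=9.0, tau3=10.0, tau4=25.0, n=2)
times_ns = [0.5 * k for k in range(121)]  # 0 to 60 ns
current = [two_heidler(t, **params) for t in times_ns]

peak = max(current)
print(f"peak of the sketched waveshape: {peak:.3f} A "
      f"at t = {times_ns[current.index(peak)]:.1f} ns")
```

In an actual fitting problem the parameters would be chosen by one of the curve fitting techniques discussed in Section 1.2 rather than set by hand.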
In Figure 1.6 the models discussed in this section have been fitted to the graph given in the standard. The data from the standard is not included in this figure since some features, notably the initial delay visible in the standard, are not reproduced by any of the models. The different models give quite different quantitative behaviour in the region 2.5 to 25 ns. In Section 3.1 we propose a new scheme for modelling this type of function and in Section 3.3 we fit this model to the IEC 61000-4-2 standard current and some experimental data.

[Figure 1.6: Comparison of different functions representing the Standard ESD current waveshape for 4 kV. Curves shown: Two Heidler [95]; Two Heidler [302]; Pulse binomial [266]; Exponential polynomial [287]; Two Gaussians [278]; Four exponential [152]. Horizontal axis: t [ns]; vertical axis: i(t) [A].]

The model given in Section 3.1 is also fitted to lightning discharge currents, both from the standard and from measured data, in Section 3.2.

1.6 Modelling mortality rates

This section is based on Paper H.

Understanding the probability of surviving to or beyond a certain age can be important for insurers, actuaries, demographers and policy makers. For some purposes a simple mathematical model can be desirable. Here we will discuss some basic mathematical concepts useful for this type of understanding.

Definition 1.19. Consider an individual whose current age is $x$ and whose remaining lifetime is denoted $T_x > 0$. Then the survival function, $S_x(\Delta x)$, is defined as
$$S_x(\Delta x) = \Pr[T_x > \Delta x].$$

It is typically assumed [71] that the remaining lifetime $T_x$ obeys the relation
$$\Pr[T_x > \Delta x] = \Pr[T_0 > x + \Delta x \mid T_0 > x] = \frac{\Pr[T_0 > x + \Delta x]}{\Pr[T_0 > x]} \tag{29}$$
or, phrased in terms of the survival function,
$$S_x(\Delta x) = \frac{S_0(x + \Delta x)}{S_0(x)}.$$

There are three conditions a survival function must satisfy [71] in order to have a reasonable interpretation in terms of lifespan:

• Only individuals with positive remaining lifetime are considered, thus an individual must survive at least 0 units of time, $S_x(0) = 1$.
• There are no immortal individuals, $\lim_{\Delta x \to \infty} S_x(\Delta x) = 0$.
• Since the definition only contains an upper bound on remaining lifetime, $S_x(\Delta x)$ must be non-increasing.

Here we will not work with the survival function directly; instead we will model the mortality rate.

Definition 1.20. The mortality rate, $\mu$, (also known as force of mortality, death rate or hazard rate) for an individual of age $x$ is defined as
$$\mu(x) = \lim_{dx \to 0^+} \frac{\Pr[T_0 \le x + dx \mid T_0 > x]}{dx}. \tag{30}$$

We can express the mortality rate using the survival function and vice versa using the following lemma.

Lemma 1.11. If $S_x(\Delta x)$ is a survival function whose derivative exists when $x$ and $\Delta x$ are both non-negative and $\mu(x)$ is the corresponding mortality rate, then
$$\mu(x) = -\frac{\frac{dS_0}{dx}}{S_0(x)}$$
and
$$S_x(\Delta x) = \exp\left(-\int_{x}^{x + \Delta x} \mu(t)\, dt\right).$$

Proof. Using (29) and that the derivative of $S_0(x)$ exists we can rewrite the definition of the mortality rate as
$$\mu(x) = \lim_{dx \to 0^+} \frac{1 - \Pr[T_x > dx]}{dx} = -\frac{\frac{dS_0}{dx}}{S_0(x)} = -\frac{d}{dx} \ln(S_0(x)).$$
Thus we have expressed the mortality rate in terms of the survival function, and using some calculus we can express the survival function in terms of the mortality rate. Integrating both sides of $\mu(t) = -\frac{d}{dt} \ln(S_0(t))$ from $x$ to $x + \Delta x$ and using (29) gives
$$-\int_{x}^{x + \Delta x} \mu(t)\, dt = \ln(S_x(\Delta x)) - \ln(S_x(0))$$
and since $S_x(0) = 1$ we have $\ln(S_x(0)) = 0$ and
$$S_x(\Delta x) = \exp\left(-\int_{x}^{x + \Delta x} \mu(t)\, dt\right).$$

In Chapter 4 we will apply these concepts to models of the human lifespans in different countries and different years.
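The second relation in Lemma 1.11 can be illustrated numerically. The sketch below assumes an exponentially growing mortality rate $\mu(t) = a e^{bt}$ (a Gompertz-type rate, chosen here only because its integral has a closed form) and compares a trapezoidal approximation of the integral with the exact expression. The parameter values and function names are hypothetical and are not taken from the thesis.

```python
import math

def gompertz_rate(age, a=1e-4, b=0.09):
    """Assumed mortality rate mu(t) = a * exp(b * t); a and b are hypothetical."""
    return a * math.exp(b * age)

def survival(x, dx, rate, steps=1000):
    """S_x(dx) = exp(-integral of rate from x to x + dx), composite trapezoid rule."""
    if dx == 0:
        return 1.0
    h = dx / steps
    integral = 0.5 * (rate(x) + rate(x + dx))
    integral += sum(rate(x + k * h) for k in range(1, steps))
    return math.exp(-h * integral)

def gompertz_survival_exact(x, dx, a=1e-4, b=0.09):
    """Closed form of the same integral for the assumed rate."""
    return math.exp(-(a / b) * (math.exp(b * (x + dx)) - math.exp(b * x)))

# Probability that a 40-year-old survives another 0, 10, 20 and 30 years.
values = [survival(40.0, d, gompertz_rate) for d in (0.0, 10.0, 20.0, 30.0)]
print(values, gompertz_survival_exact(40.0, 30.0))
```

The computed values start at one and decrease with the length of the interval, in line with the conditions on a survival function listed above.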
Mortality rates are typically estimated from demographic data for a country using the central mortality rate, which is defined differently from the mortality rate given by (30). The central mortality rate of a group of individuals of age $x$ at time $t$ is denoted $m_{x,t}$ and defined as

\[ m_{x,t} = \frac{d_x}{L_x} \]

where $d_x$ is the number of deaths at age $x$ during some time period and $L_x$ is the average number of living individuals of age $x$ during that same interval. In this thesis we will only consider time intervals of one year, so the central mortality rate $m_{x,t}$ is estimated the same way as $\mu(x)$, and for any given year $t$ we can assume that $m_{x,t} \approx \mu(x)$.

When examining the mortality rate for developed countries there are three patterns that recur all over the world: an increased mortality rate for infants that decreases rapidly, in other words $\mu(x) \sim \frac{1}{x}$ for small $x$; exponential growth of the mortality rate for higher ages, $\mu(x) \sim e^{cx}$ for large $x$; and a 'hump' for young adults where the mortality rate first increases quickly and then remains constant or slowly decreases for some years. Some examples of mortality rates that demonstrate these patterns can be seen in Figure 1.7. The typical explanation for the rapid decrease in mortality rate is that small children are sensitive to disease, disorders and accidents but become more resilient as they mature. The 'hump' for young adults is usually attributed to a lifestyle change: starting in their early to mid teens, individuals tend to become more independent and take more risks, especially young men. Sometimes this phenomenon is known as the accident hump since accidents (often vehicular accidents) are believed to explain a large part of the shape of the hump, e.g. in the USA in 2017 accidents accounted for approximately 40% of the deaths in the age range 15–35 [161]. The increase in mortality rate for higher ages is explained by the increased risk of health issues that follows naturally from aging.
In some countries there is also a visible trend that the growth of the mortality rate starts to slow down for very high ages. Whether this trend is generally present, and at which age it should be taken into consideration, is still being debated, see [14, 17, 88, 109, 177] for examples of varying views.

[Figure 1.7 plot: eight panels (USA, Sweden, Switzerland, Ukraine, Japan, Taiwan, Australia and Chile, all for 1992) showing $\ln(m_{x,t})$ against age $x$ in years.]

Figure 1.7: Examples of central mortality rate curves for men demonstrating the typical patterns of rapidly decreasing mortality rate for very young ages followed by a 'hump' for young adults and a rapid increase for high ages.

In Section 4.2 an overview of models in the literature will be given and three new models introduced. The different models will then be compared to each other by fitting the models to the central mortality rate for men in various countries and computing the corresponding AIC values.

It is not only a model's ability to reproduce observed patterns in data that determines its usefulness. Choosing the appropriate model depends on the intended application. For many applications it is not just desirable to understand what the mortality rate is now and how it has changed historically, but also how it can be predicted. In Section 4.5 we will see what effects replacing historical data on mortality rates with values given by a few different fitted models will have when using a method of forecasting called the Lee–Carter method.
The Lee–Carter forecasting method is described in the next section.

1.6.1 Lee–Carter method for forecasting

This section is based on Paper I

For many applications it is also important to be able to forecast how the mortality rate will change in the future. There are several methods of producing such forecasts, but the method proposed by Lee and Carter in 1992 [170] seems to be generally accepted, first because it produces satisfactory fits and forecasts of mortality rates for various countries, and second because the structure of the Lee–Carter (L–C) method allows for easy computation of confidence intervals related to mortality projections.

Lee and Carter developed their approach specifically for U.S. mortality data, 1900-1989 and forecasted (over a 50 year forecast horizon), 1990-2065. However, the method has now been applied to mortality data from many other countries and time periods, e.g. Chile [172], China [147], Japan [291], the seven most economically developed nations (G7) [275], India [46], the Nordic countries [162], Sri Lanka [1] and Thailand [298].

Lee and Carter assumed [170] that the central mortality rate for a given age changes as a log-normal random walk with drift

\[ \ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t}, \tag{31} \]

where $m_{x,t}$ is the central mortality rate at age $x$ in year $t$, $a_x$ is the average pattern of mortality at age $x$, $b_x$ represents how mortality at each age varies when the general level of mortality changes, $k_t$ is a mortality index that captures the evolution of rates over time, and $\varepsilon_{x,t}$ is an error term which causes the deviation of the model from the observed mortality rates, assumed to be normally distributed $N(0, \sigma_t^2)$.

The parametrization given in (31) is not unique. For example, if $a_x$, $b_x$ and $k_t$ is a solution, then for any non-zero constant $c \in \mathbb{R}$ so are $a_x$, $c b_x$ and $k_t / c$, as well as $a_x - c b_x$, $b_x$ and $k_t + c$, and these transformations produce identical forecasts.
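The non-uniqueness of the parametrization in (31) is easy to see on a toy example. The sketch below is illustrative only: the numbers are made up, and the two transformations used (rescaling $b_x \to c b_x$, $k_t \to k_t / c$, and shifting $a_x \to a_x - c b_x$, $k_t \to k_t + c$) are the standard ones from the Lee–Carter literature, stated here as an assumption of the sketch.

```python
import math

# Made-up parameters for three ages and four years (not real mortality data).
a = [-5.0, -6.0, -4.0]
b = [0.5, 0.3, 0.2]
k = [2.0, 0.5, -1.0, -1.5]

def fitted(a, b, k):
    """Noise-free model values a_x + b_x * k_t for every age x and year t."""
    return [[ax + bx * kt for kt in k] for ax, bx in zip(a, b)]

def same(m1, m2, tol=1e-12):
    """True when two matrices agree entry by entry up to rounding."""
    return all(math.isclose(u, v, abs_tol=tol)
               for r1, r2 in zip(m1, m2) for u, v in zip(r1, r2))

c = 3.0
original = fitted(a, b, k)
# Rescaling: (a_x, c * b_x, k_t / c)
rescaled = fitted(a, [c * bx for bx in b], [kt / c for kt in k])
# Shifting: (a_x - c * b_x, b_x, k_t + c)
shifted = fitted([ax - c * bx for ax, bx in zip(a, b)], b, [kt + c for kt in k])
```

All three parameter sets give the same values of $a_x + b_x k_t$, which is why additional constraints are needed before the parameters can be identified.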
In order to get a unique solution when fitting an L–C model, constraints must be imposed. The constraints can be chosen in different ways but here we will use the following: $b_x$ is constrained to sum to 1, $\sum_x b_x = 1$, and $k_t$ to sum to 0, $\sum_t k_t = 0$, which gives $a_x$ as the average over time of $\ln(m_{x,t})$,

\[ a_x = \frac{1}{T} \sum_t \ln(m_{x,t}). \]

The parameters $a_x$, $b_x$ and the mortality indices $k_t$ are found as follows. Given a set of ages (or age ranges), $\{x_i,\ i = 1, \ldots, n\}$, and a set of years, $\{t_j,\ j = 1, \ldots, T\}$, first estimate $\hat{a}_x = \frac{1}{T} \sum_t \ln(m_{x,t})$. Then construct the matrix given by $Z_{ij} = \ln(m_{x_i,t_j}) - \hat{a}_{x_i}$. From the conditions imposed on $b_x$ and $k_t$ we now know that $Z_{ij} \approx b_{x_i} k_{t_j}$, and thereby the values of $b_x$ and $k_t$ can be found using the singular value decomposition (SVD) of $Z$. Finding the standard SVD $Z = U S V^\top$ gives $\hat{b}_x$ as the first column of $U$, and $\hat{k}_t$ is given by the largest singular value multiplied by the first column of $V$ (that is, the first row of $V^\top$).

Forecasting future mortality indices can be done in different ways, but in practice the random walk with drift model (RWD) for $\hat{k}_t$ is common because of its simplicity and straightforward interpretation, so we will also use the RWD model to estimate $\hat{k}_t$ by

\[ \hat{k}_t = \hat{k}_{t-1} + \theta + \varepsilon_t. \]

In this specification, $\theta$ is the drift term, and $\hat{k}_t$ is forecast to change linearly with increments of $\theta$, while deviations from this path, $\varepsilon_t$, are permanently incorporated in the trajectory. The drift term $\theta$ is estimated as below, which shows that $\hat{\theta}$ only depends on the first and last of the $\hat{k}_t$ estimates,

\[ \hat{\theta} = \frac{\hat{k}_T - \hat{k}_1}{T - 1}. \]

We can now forecast the mortality index with the formula $\hat{k}_{t+\Delta t} = \hat{k}_t + \hat{\theta} \Delta t + \varepsilon_t$ and then predict the logarithm of the central mortality rate as $\ln(m_{x,t+\Delta t}) = \hat{a}_x + \hat{b}_x \hat{k}_{t+\Delta t}$. To accompany this prediction we also want a confidence interval for the forecast at time $\Delta t$.
This can be done by computing the confidence interval for the mortality index: compute the standard deviation of the mortality indices compared with the RWD model and then multiply the result by the square root of $\Delta t$. Thus if we used the central mortality rate for $T$ different years we can with confidence level $\alpha$ say that

\[ \hat{k}_{t+\Delta t} - \lambda_{\alpha/2} \cdot \sigma \cdot \sqrt{\Delta t} \le k_{t+\Delta t} \le \hat{k}_{t+\Delta t} + \lambda_{\alpha/2} \cdot \sigma \cdot \sqrt{\Delta t}, \]

where $\lambda_\beta$ is the inverse of the cumulative normal distribution function for $\beta$ and $\sigma$ is the standard deviation computed from the squared residuals $\sum_{t=1}^{T-1} (k_{t+1} - k_t - \theta)^2$ of the RWD model. In Section 4.5 examples of forecasts are illustrated, see Figure 4.7 for central mortality rates and Figure 4.8 for mortality indices.

The L–C model is not without flaws. A common remark is that the assumptions on how the mortality rate changes are quite restrictive. It cannot capture age-specific changes of pattern, for example medical breakthroughs in reducing a specific cause of death that is common in a certain age range [169]. It also often fails when applied to specific causes of mortality, for example motor vehicle accidents showed a rising trend initially as the availability of motor vehicles increased, but over time the trend has decreased due to improved safety of vehicles and roads as well as increased urbanisation [111]. The Lee–Carter model has also been criticized for giving age-profiles that evolve in implausible ways for long-run forecasts [111] as well as for extrapolation backwards in time [169]. It can also misrepresent the temporal dependence between age groups [210]. Several variants and extensions of the L–C approach that improve performance have been suggested, see [29, 57, 162, 169, 171, 176] for examples. These models extend the L–C approach by including additional period effects and in some cases cohort effects.
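The fitting and forecasting steps of the L–C method described in this section can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the software used in the thesis: the leading singular triplet is obtained by power iteration rather than a full SVD, the singular vectors are rescaled so that $\sum_x b_x = 1$, and the standard deviation of the RWD residuals is normalised by $T - 2$ (one possible convention, chosen here as an assumption). The function names and the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist

def lee_carter_fit(log_m, iters=200):
    """Fit a_x, b_x, k_t to log_m[i][j] = ln(m_{x_i, t_j}); assumes Z is not identically zero."""
    n, T = len(log_m), len(log_m[0])
    a = [sum(row) / T for row in log_m]                      # a_x: average over time
    Z = [[log_m[i][j] - a[i] for j in range(T)] for i in range(n)]
    v = [float(j + 1) for j in range(T)]                     # arbitrary starting vector
    for _ in range(iters):                                   # power iteration on Z^T Z
        w = [sum(Z[i][j] * v[j] for j in range(T)) for i in range(n)]
        v = [sum(Z[i][j] * w[i] for i in range(n)) for j in range(T)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
    u = [sum(Z[i][j] * v[j] for j in range(T)) for i in range(n)]
    s = math.sqrt(sum(t * t for t in u))                     # largest singular value
    u = [t / s for t in u]
    scale = sum(u)                                           # rescale so that sum(b) = 1
    b = [t / scale for t in u]
    k = [s * scale * t for t in v]
    return a, b, k

def rwd_forecast(k, steps, alpha=0.05):
    """Random walk with drift: central forecast and interval for k, `steps` years ahead."""
    T = len(k)
    theta = (k[-1] - k[0]) / (T - 1)                         # drift from first and last value
    resid = [k[t + 1] - k[t] - theta for t in range(T - 1)]
    sigma = math.sqrt(sum(r * r for r in resid) / (T - 2))   # assumed normalisation
    lam = NormalDist().inv_cdf(1 - alpha / 2)
    centre = k[-1] + theta * steps
    half = lam * sigma * math.sqrt(steps)
    return centre - half, centre, centre + half

# Synthetic, noise-free data (made-up numbers) satisfying sum(b) = 1 and sum(k) = 0.
a_true = [-4.0, -6.0, -3.0]
b_true = [0.2, 0.5, 0.3]
k_true = [3.0, 1.0, 0.5, -1.5, -3.0]
log_m = [[ax + bx * kt for kt in k_true] for ax, bx in zip(a_true, b_true)]

a_hat, b_hat, k_hat = lee_carter_fit(log_m)
low, centre, high = rwd_forecast(k_hat, steps=2)
# Predicted ln(m) two years ahead for each age: a_x + b_x * forecast of k
log_m_forecast = [ax + bx * centre for ax, bx in zip(a_hat, b_hat)]
```

On the noise-free synthetic data the fit recovers `a_true`, `b_true` and `k_true` up to rounding. The constraint $\sum_t k_t = 0$ needs no extra step, because each row of $Z$ sums to zero and the leading right singular vector is therefore orthogonal to the constant vector. With real data a library SVD routine would normally replace the power iteration.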
In Section 4.5 we will fit a few models to central mortality rate data for several countries and then use the fitted models to produce values for the L–C method and examine the differences in the predictions based on the different sets of data.

1.7 Summaries of papers

Paper A [187]
This paper examines the extreme points of the Vandermonde determinant on the sphere in three or more dimensions. A few different ways to analyse the three-dimensional case are shown in Section 2.1.2, and a detailed description of the method used to solve the n-dimensional problem from [269] can be found in Section 2.2.1. The extreme points are given in terms of roots of rescaled Hermite polynomials. For dimensions three to seven explicit expressions are given, and the results are visualized by using symmetries of the answers to project all the extreme points onto a two-dimensional surface, see Section 2.2.2. The thesis author contributed primarily to the derivation of some of the recursive properties of the Vandermonde determinant and its derivatives and to a lesser extent to the visualisation aspects of the problem.

Paper B [186]
The Vandermonde determinant is optimized over the ellipsoid and cylinder in three dimensions, see Sections 2.1.4 and 2.1.5. Lagrange multipliers are used to find a system of polynomial equations which give the local extreme points. Using Gröbner bases and other techniques the extreme points are given either explicitly or as roots of univariate polynomials. The results are also presented visually for some special cases. The method is also extended to surfaces defined by homogeneous polynomials, see Section 2.1.6. The extreme points on spheres defined by the p-norm (primarily p = 4) are also discussed. The thesis author primarily contributed to the examination of the ellipsoid, cylinder and surfaces defined by homogeneous polynomials.
Paper C [216]
The sphere in n dimensions with respect to a p-norm can be thought of as a surface defined implicitly by a univariate polynomial. Here it is shown that the extreme points of the Vandermonde determinant on a bounded surface defined by a univariate polynomial are given by the zeros of the polynomial solution of a differential equation with polynomial coefficients. Expressions for polynomials whose roots give the coordinates of the extreme points are given for the cases of a surface given by a general first or second-degree polynomial, some higher degree monomials and cubes (Sections 2.3.1–2.3.3 and 2.3.6). Some results that can be used to reduce the dimension of the problem, but not solve it entirely, for even n and p are also discussed. The thesis author contributed by extending previous results to cubes and the general polynomials of low degree and, based on contributions from the other authors, he found how the Newton–Girard formulae can be used to compactly express and simplify the equation system corresponding to the case where the surface is a sphere defined by a p-norm for even n and p.

Paper D [217]
This paper reviews the role of the Vandermonde matrix in random matrix theory and shows how the problem of finding the extreme points of the probability distribution of the eigenvalues of a Wishart matrix can be rewritten as a problem of finding the extreme points of the Vandermonde determinant on a sphere with a radius given by the trace of the square of the Wishart matrix.
The thesis author's contribution was showing that the extreme points of the probability distribution of the eigenvalues must lie on a sphere with a particular radius and how to use the properties of the Vandermonde determinant to find a polynomial whose roots give the coordinates of the extreme points of the probability distribution of the eigenvalues, see Section 2.3.7.

Paper E∗ [185]
This paper is a detailed description and derivation of some properties of the analytically extended function (AEF) and a scheme for how it can be used in approximation of lightning discharge currents, see Sections 3.1.1 and 3.2.2. Lightning discharge currents are classified in the IEC 62305-1 Standard into waveshapes representing important observed phenomena. These waveshapes are approximated with mathematical functions in order to be used in lightning discharge models for ensuring electromagnetic compatibility. A general framework for estimating the parameters of the AEF using the Marquardt least squares method (MLSM) for a waveform with an arbitrary (finite) number of peaks as well as for the given charge transfer and specific energy is described, see Sections 1.2.6, 3.2 and 3.2.3. This framework is used to find parameters for some single-peak waveshapes, and advantages and disadvantages of the approach are discussed, see Section 3.2.6. The thesis author contributed with the p-peak formulation of the AEF, modification to the MLSM and basic software for fitting the AEF to data.

Paper F∗ [184]
In this paper it is examined how the analytically extended function (AEF) can be used to approximate multi-peaked lightning current waveforms. A general framework for estimating the parameters of the AEF using the Marquardt least squares method (MLSM) for a waveform with an arbitrary (finite) number of peaks is presented, see Section 3.2.
This framework is used to find parameters for some waveforms, such as lightning currents from the IEC 62305-1 Standard and recorded lightning current data, see Section 3.2.6. The thesis author contributed with improved software for fitting the AEF to the more complicated waveforms (compared to Paper E).

∗The model and techniques in Papers E and F are applied to various waveforms in [144, 145, 188–190].

Paper G [191]
The multi-peaked analytically extended function (AEF) is used in this paper for representation of electrostatic discharge (ESD) currents. In order to minimize unstable behaviour and the number of free parameters, the exponents of the AEF are chosen from an arithmetic sequence. The function is fitted by interpolating data chosen according to a D-optimal design. ESD current modelling is illustrated through two examples: an approximation of the IEC Standard 61000-4-2 waveshape, and a representation of some measured ESD current. The contents of this paper are found in Section 3.3. The thesis author contributed with the derivation of the D-optimal design, motivating its use, as well as software for fitting the AEF to the example currents.

Paper H [192]
There are many models for the mortality rates for various years and countries. A phenomenon that complicates the modelling of human mortality rates is a rapid increase in mortality rate for young adults (in many developed countries this is especially pronounced at the age of 25). In this paper a model for mortality rates based on power-exponential functions is introduced and compared to empirical data for mortality rates from several countries and to other mathematical models for mortality rate. The thesis author's contribution is the formulation of the model and writing software for fitting the various models to empirical data and computing the Akaike Information Criterion to facilitate comparison between the models.
Paper I [33]
Mortality rate forecasting is important in actuarial science and demography. There are many models for mortality rates with different properties and varying complexity. In this paper several models are used to produce mortality rate listings by fitting the models to empirical data using non-linear least squares fitting. These listings are then used to forecast the mortality rate using the Lee–Carter method and the results for the different models are compared. The thesis author's contribution was assisting with writing software that computed the mortality rate listings as well as devising the method for comparing the reliability of the forecasts in a simple manner.

Chapter 2

Extreme points of the Vandermonde determinant

This chapter is based on Papers A, B, C, and D

Paper A. Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Extreme points of the Vandermonde determinant on the sphere and some limits involving the generalized Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

Paper B. Karl Lundengård, Jonas Österberg and Sergei Silvestrov. Optimization of the determinant of the Vandermonde matrix on the sphere and related surfaces. Methodology and Computing in Applied Probability, Volume 20, Issue 4, pages 1417 – 1428, 2018.

Paper C. Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Extreme points of the Vandermonde determinant on surfaces implicitly determined by a univariate polynomial. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.
Paper D. Asaph Keikara Muhumuza, Karl Lundengård, Jonas Österberg, Sergei Silvestrov, John Magero Mango and Godwin Kakuba. Optimization of the Wishart joint eigenvalue probability density distribution based on the Vandermonde determinant. Accepted for publication in Algebraic structures and Applications. SPAS2017, Västerås and Stockholm, Sweden, October 4 – 6, 2017, Sergei Silvestrov, Anatoliy Malyarenko, Milica Rančić (Eds), Springer International Publishing, 2019.

2.1 Extreme points of the Vandermonde determinant and related determinants on various surfaces in three dimensions

In this chapter we will discuss how to optimize the determinant of the Vandermonde matrix and some related determinants over various surfaces in three dimensions and the results will be visualized.

2.1.1 Optimization of the generalized Vandermonde determinant in three dimensions

This section is based on Section 1.1 of Paper A

In this section we plot the values of the determinant

\[ v_3(\mathbf{x}_3) = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1), \]

and also the generalized Vandermonde determinant $g_3(\mathbf{x}_3, \mathbf{a}_3)$ for three different choices of $\mathbf{a}_3$ over the unit sphere $x_1^2 + x_2^2 + x_3^2 = 1$ in $\mathbb{R}^3$. Our plots are over the unit sphere but the determinant exhibits the same general behavior over centered spheres of any radius. This follows directly from (1.4) and the fact that exactly one element from each row appears in each term of the determinant. For any scalar $c$ we get

\[ g_n(c\mathbf{x}_n, \mathbf{a}_n) = \left( \prod_{i=1}^{n} c^{a_i} \right) g_n(\mathbf{x}_n, \mathbf{a}_n), \]

which for $v_n$ becomes

\[ v_n(c\mathbf{x}_n) = c^{\frac{n(n-1)}{2}} v_n(\mathbf{x}_n), \tag{32} \]

and so the values over different radii differ only by a constant factor.

In Figure 2.1 the value of $v_3(\mathbf{x}_3)$ has been plotted over the unit sphere and the curves where the determinant vanishes are traced as black lines.
The coordinates in Figure 2.1 (b) are related to $\mathbf{x}_3$ by

\[ \mathbf{x}_3 = \begin{pmatrix} 2 & 0 & 1 \\ -1 & 1 & 1 \\ -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{6} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{3} \end{pmatrix} \mathbf{t}, \tag{33} \]

where the columns in the product of the two matrices are the basis vectors in $\mathbb{R}^3$. The unit sphere in $\mathbb{R}^3$ can also be described using spherical coordinates. In Figure 2.1 (c) the following parametrization was used:

\[ \mathbf{t}(\theta, \phi) = \begin{pmatrix} \cos(\phi) \sin(\theta) \\ \sin(\phi) \\ \cos(\phi) \cos(\theta) \end{pmatrix}. \tag{34} \]

[Figure 2.1 plots: (a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to parametrization (34).]

Figure 2.1: Plot of $v_3(\mathbf{x}_3)$ over the unit sphere.

We will use this t-basis and spherical parametrization throughout this section.

From the plots in Figure 2.1 it can be seen that the number of extreme points for $v_3$ over the unit sphere seems to be $6 = 3!$. It can also be seen that all extreme points seem to lie in the plane through the origin that is orthogonal to an apparent symmetry axis in the direction $(1, 1, 1)$, the direction of $t_3$. We will see later that the extreme points for $v_n$ indeed lie in the hyperplane $\sum_{i=1}^{n} x_i = 0$ for all $n$, see Theorem 2.2, and the total number of extreme points for $v_n$ equals $n!$, see Remark 2.1.

The black lines where $v_3(\mathbf{x}_3)$ vanishes are actually the intersections between the sphere and the three planes $x_3 - x_1 = 0$, $x_3 - x_2 = 0$ and $x_2 - x_1 = 0$, as these differences appear as factors in $v_3(\mathbf{x}_3)$.

We will see later on that the extreme points are the six points acquired from permuting the coordinates in

\[ \mathbf{x}_3 = \frac{1}{\sqrt{2}} (-1, 0, 1). \]

For reasons that will become clear in Section 2.2.1 it is also useful to think about these coordinates as the roots of the polynomial

\[ P_3(x) = x^3 - \frac{1}{2} x. \]

So far we have only considered the behavior of $v_3(\mathbf{x}_3)$, that is $g_3(\mathbf{x}_3, \mathbf{a}_3)$ with $\mathbf{a}_3 = (0, 1, 2)$. We now consider three generalized Vandermonde determinants, namely $g_3$ with $\mathbf{a}_3 = (0, 1, 3)$, $\mathbf{a}_3 = (0, 2, 3)$ and $\mathbf{a}_3 = (1, 2, 3)$.
These three determinants show increasingly more structure and they all have a neat formula in terms of $v_3$ and the elementary symmetric polynomials

\[ e_{kn} = e_k(x_1, \cdots, x_n) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le n} x_{i_1} x_{i_2} \cdots x_{i_k}, \]

where we will simply use $e_k$ whenever $n$ is clear from the context.

[Figure 2.2 plots: (a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to angles given in (34).]

Figure 2.2: Plot of $g_3(\mathbf{x}_3, (0, 1, 3))$ over the unit sphere.

In Figure 2.2 we see the determinant

\[ g_3(\mathbf{x}_3, (0, 1, 3)) = \begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ x_1^3 & x_2^3 & x_3^3 \end{vmatrix} = v_3(\mathbf{x}_3) e_1, \]

plotted over the unit sphere. The expression $v_3(\mathbf{x}_3) e_1$ is easy to derive: the factor $v_3(\mathbf{x}_3)$ is there since the determinant must vanish whenever any two columns are equal, which is exactly what the Vandermonde determinant expresses. The factor $e_1$ follows by a simple polynomial division. As can be seen in the plots we have an extra black circle where the determinant vanishes compared to Figure 2.1. This circle lies in the plane $e_1 = x_1 + x_2 + x_3 = 0$ where we previously found the extreme points of $v_3(\mathbf{x}_3)$ and thus doubles the number of extreme points to $2 \cdot 3!$.

A similar treatment can be made of the remaining two generalized determinants that we are interested in, plotted in the following two figures.

[Figure 2.3 plots: (a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to angles given in (34).]

Figure 2.3: Plot of $g_3(\mathbf{x}_3, (0, 2, 3))$ over the unit sphere.

[Figure 2.4 plots: (a) Plot with respect to the regular x-basis. (b) Plot with respect to the t-basis, see (33). (c) Plot with respect to angles given in (34).]

Figure 2.4: Plot of $g_3(\mathbf{x}_3, (1, 2, 3))$ over the unit sphere.
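The factorisations of the generalized Vandermonde determinants into $v_3$ times an elementary symmetric polynomial, the scaling property (32) and the value of $v_3$ at the permutations of $\frac{1}{\sqrt{2}}(-1, 0, 1)$ can all be checked numerically. The following is a small sketch with hypothetical helper names; it is a numerical sanity check at sampled points, not a proof.

```python
import itertools
import math
import random

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def g3(x, a):
    """Generalized Vandermonde determinant; row j holds x_1^{a_j}, x_2^{a_j}, x_3^{a_j}."""
    return det3([[xi ** aj for xi in x] for aj in a])

def v3(x):
    """Vandermonde determinant in three variables."""
    x1, x2, x3 = x
    return (x3 - x2) * (x3 - x1) * (x2 - x1)

def elem_sym(k, x):
    """Elementary symmetric polynomial e_k(x); e_0 = 1."""
    return sum(math.prod(c) for c in itertools.combinations(x, k))

def random_unit_vector(rng):
    """A point on the unit sphere obtained by normalising a Gaussian sample."""
    while True:
        p = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(t * t for t in p))
        if r > 1e-12:
            return [t / r for t in p]

# Values of v3 at the six permutations of (-1, 0, 1) / sqrt(2)
base = (-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2))
extreme_values = sorted(v3(p) for p in itertools.permutations(base))

# Exponent vectors paired with the index k of the matching e_k
exponents = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

Evaluating `g3(x, a) - v3(x) * elem_sym(k, x)` for the `k`-th entry `a` of `exponents` at random points gives values that vanish up to rounding, and `extreme_values` consists of three copies of $-1/\sqrt{2}$ and three copies of $1/\sqrt{2}$.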
$\mathbf{a}_3$      $g_3(\mathbf{x}_3, \mathbf{a}_3)$
(0, 1, 2)    $v_3(\mathbf{x}_3) e_0 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)$
(0, 1, 3)    $v_3(\mathbf{x}_3) e_1 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)(x_1 + x_2 + x_3)$
(0, 2, 3)    $v_3(\mathbf{x}_3) e_2 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1)(x_1 x_2 + x_1 x_3 + x_2 x_3)$
(1, 2, 3)    $v_3(\mathbf{x}_3) e_3 = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1) x_1 x_2 x_3$

Table 2.1: Table of some determinants of generalized Vandermonde matrices.

The four determinants treated so far are collected in Table 2.1. Derivation of these determinants is straightforward. We note that all but one of them vanish on a set of planes through the origin. For $\mathbf{a}_3 = (0, 2, 3)$ we have the usual Vandermonde planes but the intersection of $e_2 = 0$ and the unit sphere occurs at two circles, since on the unit sphere

\[ x_1 x_2 + x_1 x_3 + x_2 x_3 = \frac{1}{2} \left( (x_1 + x_2 + x_3)^2 - (x_1^2 + x_2^2 + x_3^2) \right) = \frac{1}{2} \left( (x_1 + x_2 + x_3)^2 - 1 \right) = \frac{1}{2} (x_1 + x_2 + x_3 + 1)(x_1 + x_2 + x_3 - 1), \]

and so $g_3(\mathbf{x}_3, (0, 2, 3))$ vanishes on the sphere on two circles lying on the planes $x_1 + x_2 + x_3 + 1 = 0$ and $x_1 + x_2 + x_3 - 1 = 0$. These circles can be seen in Figure 2.3 as the two black circles perpendicular to the direction $(1, 1, 1)$.

Note also that while $v_3$ and $g_3(\mathbf{x}_3, (0, 1, 3))$ have the same absolute value on all their respective local extreme points (by symmetry), both $g_3(\mathbf{x}_3, (0, 2, 3))$ and $g_3(\mathbf{x}_3, (1, 2, 3))$ have different absolute values for some of their respective extreme points.

2.1.2 Extreme points of the Vandermonde determinant on the three-dimensional unit sphere

This section is based on Section 2.2 of Paper A

It is fairly simple to describe $v_3(\mathbf{x}_3)$ on the circle that is formed by the intersection of the unit sphere and the plane $x_1 + x_2 + x_3 = 0$. Using Rodrigues' rotation formula to rotate a point, $\mathbf{x}$, around the axis $\frac{1}{\sqrt{3}}(1, 1, 1)$ by the angle $\theta$ gives the rotation matrix

\[ R_\theta = \frac{1}{3} \begin{pmatrix} 2\cos(\theta) + 1 & 1 - \cos(\theta) - \sqrt{3}\sin(\theta) & 1 - \cos(\theta) + \sqrt{3}\sin(\theta) \\ 1 - \cos(\theta) + \sqrt{3}\sin(\theta) & 2\cos(\theta) + 1 & 1 - \cos(\theta) - \sqrt{3}\sin(\theta) \\ 1 - \cos(\theta) - \sqrt{3}\sin(\theta) & 1 - \cos(\theta) + \sqrt{3}\sin(\theta) & 2\cos(\theta) + 1 \end{pmatrix}. \]

A point which already lies on the circle can then be rotated to any other point on the circle by letting $R_\theta$ act on the point. Choosing the point $\mathbf{x} = \frac{1}{\sqrt{2}}(-1, 0, 1)$ gives the Vandermonde determinant a convenient form on the circle since

\[ R_\theta \mathbf{x} = \frac{1}{\sqrt{6}} \begin{pmatrix} -\sqrt{3}\cos(\theta) + \sin(\theta) \\ -2\sin(\theta) \\ \sqrt{3}\cos(\theta) + \sin(\theta) \end{pmatrix}, \]

which gives

\[ v_3(R_\theta \mathbf{x}) = \frac{1}{6\sqrt{6}} \left( \sqrt{3}\cos(\theta) + 3\sin(\theta) \right) \left( 2\sqrt{3}\cos(\theta) \right) \left( \sqrt{3}\cos(\theta) - 3\sin(\theta) \right) = \frac{1}{\sqrt{2}} \left( 4\cos(\theta)^3 - 3\cos(\theta) \right) = \frac{1}{\sqrt{2}} \cos(3\theta). \]

Note that the final equality follows from $\cos(n\theta) = T_n(\cos(\theta))$ where $T_n$ is the $n$th Chebyshev polynomial of the first kind. From formula (55) it follows that $P_3(x) = T_3(x)$ but for higher dimensions the relationship between the Chebyshev polynomials and $P_n$ is not as simple.

Finding the maximum points for $v_3(\mathbf{x}_3)$ on this form is simple. The Vandermonde determinant will be maximal when $3\theta = 2n\pi$ where $n$ is some integer. This gives three local maxima corresponding to $\theta_1 = 0$, $\theta_2 = \frac{2\pi}{3}$ and $\theta_3 = \frac{4\pi}{3}$. These points correspond to cyclic permutations of the coordinates of $\mathbf{x} = \frac{1}{\sqrt{2}}(-1, 0, 1)$. Analogously the minima for $v_3(\mathbf{x}_3)$ can be shown to be a transposition followed by cyclic permutation of the coordinates of $\mathbf{x}$. Thus any permutation of the coordinates of $\mathbf{x}$ corresponds to a local extreme point, just as was stated in Section 2.1.1.

2.1.3 Optimisation using Gröbner bases

This section is based on Section 4 of Paper B

In this section we will find the extreme points of the Vandermonde determinant on a few different surfaces. This will be done using Lagrange multipliers and Gröbner bases but first we will make an observation about the Vandermonde determinant that will be useful later.

Lemma 2.1. The Vandermonde determinant is a homogeneous polynomial of degree $\frac{n(n-1)}{2}$.

Proof.
Considering the expression for the Vandermonde determinant in Theorem 1.2, the number of factors of $v_n(\mathbf{x})$ is $\sum_{i=1}^{n} (i - 1) = \frac{n(n-1)}{2}$. Thus

\[ v_n(c\mathbf{x}) = c^{\frac{n(n-1)}{2}} v_n(\mathbf{x}). \tag{35} \]

Gröbner bases, together with algorithms to find them and algorithms for solving polynomial equations, are important tools that arise in many applications. One such application is the optimization of polynomials over affine varieties through the method of Lagrange multipliers. We will here give some main points and an informal discussion of these methods as an introduction and describe some notation.

Definition 2.1. ([60]) Let $f_1, \cdots, f_m$ be polynomials in $\mathbb{R}[x_1, \cdots, x_n]$. The affine variety $V(f_1, \cdots, f_m)$ defined by $f_1, \cdots, f_m$ is the set of all points $(x_1, \cdots, x_n) \in \mathbb{R}^n$ such that $f_i(x_1, \cdots, x_n) = 0$ for all $1 \le i \le m$.

When $n = 3$ we will sometimes use the variables $x, y, z$ instead of $x_1, x_2, x_3$. Affine varieties are in this way the common zeros of a set of multivariate polynomials. Such sets of polynomials will generate a greater set of polynomials [60] by

\[ \langle f_1, \cdots, f_m \rangle \equiv \left\{ \sum_{i=1}^{m} h_i f_i : h_1, \cdots, h_m \in \mathbb{R}[x_1, \cdots, x_n] \right\}, \]

and this larger set will define the same variety. But it will also define an ideal (a set of polynomials that contains the zero-polynomial, is closed under addition, and absorbs multiplication by any other polynomial) by $I(f_1, \cdots, f_m) = \langle f_1, \cdots, f_m \rangle$. A Gröbner basis for this ideal is then a finite set of polynomials $\{g_1, \cdots, g_k\}$ such that the ideal generated by the leading terms of the polynomials $g_1, \cdots, g_k$ is the same ideal as that generated by all the leading terms of polynomials in $I = \langle f_1, \cdots, f_m \rangle$.
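As a small worked illustration of these definitions (an example added here, not taken from Paper B), consider the intersection of the unit circle and a line in the plane with the lexicographic order $x > y$. The second generator is used to reduce the first, which leaves a univariate polynomial in $y$.

```latex
% Ideal generated by a circle and a line in R[x, y]
I = \langle x^2 + y^2 - 1,\; x - y \rangle

% Division of the first generator by the second (leading term x):
x^2 + y^2 - 1 = (x + y)(x - y) + (2y^2 - 1)

% Hence I = < x - y, 2y^2 - 1 >. The leading terms x and y^2 are
% relatively prime, so this set is a Groebner basis for I:
G = \{\, x - y,\; 2y^2 - 1 \,\}

% Solving the univariate polynomial first and back-substituting gives
V(I) = \left\{ \left( \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}} \right),\;
               \left( -\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}} \right) \right\}
```

The same pattern, one polynomial in the last variable only and the remaining polynomials determining the other variables, is what makes lexicographic Gröbner bases convenient for solving the Lagrange systems in the following sections.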
Here we consider the optimization of the Vandermonde determinant $v_n(\mathbf{x})$ over surfaces defined by a polynomial equation of the form

\[ s_n(x_1, \cdots, x_n; p; a_1, \cdots, a_n) \equiv \sum_{i=1}^{n} a_i |x_i|^p = 1, \tag{36} \]

where we will select the constants $a_i$ and $p$ to get ellipsoids in three dimensions, cylinders in three dimensions, and spheres under the p-norm in $n$ dimensions. The cases of the ellipsoids and the cylinders are suitable for solution by Gröbner basis methods, but due to the existing symmetries for the spheres other methods are more suitable, as provided in Section 2.3.3.

From (35) and the convexity of the interior of the sets defined by (36), under a suitable choice of the constant $p$ and non-negative $a_i$, it is easy to see that the optimal value of $v_n$ on $\sum_{i=1}^{n} a_i |x_i|^p \le 1$ will be attained on $\sum_{i=1}^{n} a_i |x_i|^p = 1$. And so, by the method of Lagrange multipliers, the minimal/maximal values of $v_n(x_1, \cdots, x_n)$ on $s_n(x_1, \cdots, x_n) \le 1$ will be attained at points such that $\frac{\partial v_n}{\partial x_i} - \lambda \frac{\partial s_n}{\partial x_i} = 0$ for $1 \le i \le n$ and some constant $\lambda$, and $s_n(x_1, \cdots, x_n) - 1 = 0$ [243].

For $p = 2$ the resulting set of equations will form a set of polynomials in $\lambda, x_1, \cdots, x_n$. These polynomials will define an ideal over $\mathbb{R}[\lambda, x_1, \cdots, x_n]$, and by finding a Gröbner basis for this ideal we can use the especially nice properties of Gröbner bases to find analytical solutions to these problems, that is, to find roots for the polynomials in the computed basis.

2.1.4 Extreme points on the ellipsoid in three dimensions

This section is based on Section 5 of Paper B

In this section we will find the extreme points of the Vandermonde determinant on the three-dimensional ellipsoid given by

\[ a x^2 + b y^2 + c z^2 = 1 \tag{37} \]

where $a > 0$, $b > 0$, $c > 0$.
Using the method of Lagrange multipliers together with (37) and some rewriting gives that all stationary points of the Vandermonde determinant lie in the variety

\[ V = V\big( a x^2 + b y^2 + c z^2 - 1,\; a x + b y + c z,\; a x (z - x)(y - x) - b y (z - y)(y - x) + c z (z - y)(z - x) \big). \]

Computing a Gröbner basis for $V$ using the lexicographic order $x > y > z$ gives the following three basis polynomials:

\[ \begin{aligned} g_1(z) ={}& (a + b)(a - b)^2 - \left( 4(a + b)^2 (a + c)(b + c) + 3c^2 (a^2 + ab + b^2) + 3c(a^3 + b^3) \right) z^2 \\ &+ 3c(a + b + c) \left( 4(a + b)(a + c)(b + c) + (a^2 + b^2)c + (a + b)c^2 \right) z^4 \\ &- c^2 (b + c)(a + c)(a + b + c)^2 z^6, \end{aligned} \tag{38} \]

\[ \begin{aligned} g_2(y, z) ={}& \left( 2(a + b)^2 (a + c)(b + c) + c(a^2 + 2b^2)(a + b + c) + 2bc^2 (a + b) \right) z \\ &+ q_1 z^5 - q_2 z^3 - b(a - b)(a + b)(a + b + 3c)\, y, \end{aligned} \tag{39} \]

\[ \begin{aligned} g_3(x, z) ={}& -\left( 2(a + b)^2 (a + c)(b + c) + c(2a^2 + b^2)(a + b + c) + 2ac^2 (a + b) \right) z \\ &- q_1 z^5 + q_2 z^3 - a(a - b)(a + b)(a + b + 3c)\, x, \end{aligned} \tag{40} \]

\[ q_1 = 9c^2 (b + c)(a + c)(a + b + c)^2, \]

\[ q_2 = 3c(a + b + c)(3a^2 b + 4a^2 c + 3ab^2 + 6abc + 4ac^2 + 4b^2 c + 4bc^2). \]

This basis was calculated using software for symbolic computation [200]. The sign of the first term of $g_3$ is chosen so that $g_2 = 0$ and $g_3 = 0$ together imply the constraint $ax + by + cz = 0$. Since $g_1$ only depends on $z$, and $g_2$ and $g_3$ are first degree polynomials in $y$ and $x$ respectively, the stationary points can be found by finding the roots of $g_1$ and then calculating the corresponding $x$ and $y$ coordinates. A general formula can be found in this case (since $g_1$ only contains even powers of $z$ it can be treated as a third degree polynomial) but it is quite cumbersome and we will therefore not give it explicitly.

Lemma 2.2. The extreme points of $v_3$ on an ellipsoid will have real coordinates.

Proof. The discriminant is a useful tool for determining how many real roots low-degree polynomials have.
Following Irving [135] the discriminant, $\Delta(p)$, of a third degree polynomial $p(x) = c_0 + c_1x + c_2x^2 + c_3x^3$ is
\[
\Delta(p) = 18c_0c_1c_2c_3 - 4c_1^3c_3 + c_1^2c_2^2 - 4c_0c_2^3 - 27c_0^2c_3^2
\]
and if $\Delta(p)$ is non-negative then all roots will be real (but not necessarily distinct). Since the first basis polynomial $g_1$ only contains terms with even exponents and is of degree 6, the polynomial $\tilde{g}_1$ defined by $\tilde{g}_1(z^2) = g_1(z)$ will be a polynomial of degree 3 whose roots are the squares of the roots of $g_1$. Calculating the discriminant of $\tilde{g}_1$ gives
\[
\Delta(\tilde{g}_1) = 9(a - b)^2(a + b + 3c)^2(a + b + c)^4\, abc^3 \big(32(a^3b^2 + a^3c^2 + a^2b^3 + a^2c^3 + b^3c^2 + b^2c^3) + 61abc(a + b + c)^2\big).
\]
Since $a$, $b$ and $c$ are all positive numbers it is clear that $\Delta(\tilde{g}_1)$ is non-negative. Furthermore, since $a$, $b$ and $c$ are positive numbers all terms in $\tilde{g}_1$ with odd powers have negative coefficients and all terms with even powers have non-negative coefficients. Thus if $w < 0$ then $\tilde{g}_1(w) > 0$ and thus all roots of $\tilde{g}_1$ must be non-negative. Hence all roots of $g_1$ are real, and by (39) and (40) so are the corresponding $x$ and $y$ coordinates.

Figure 2.5: Illustration of the ellipsoid defined by $\frac{x^2}{9} + \frac{y^2}{4} + z^2 = 1$ with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left.

An illustration of an ellipsoid and the extreme points of the Vandermonde determinant on its surface is shown in Figure 2.5.

2.1.5 Extreme points on the cylinder in three dimensions
This section is based on Section 6 of Paper B

In this section we will examine the local extreme points on an infinitely long cylinder aligned with the $x$-axis in 3 dimensions. In this case we do not need to use Gröbner basis techniques since the problem can be reduced to a one dimensional polynomial equation. The cylinder is defined by
\[
by^2 + cz^2 = 1, \quad \text{where } b > 0,\ c > 0. \tag{41}
\]
Using the method of Lagrange multipliers gives the equation system
\[
\frac{\partial v_3}{\partial x} = 0, \qquad \frac{\partial v_3}{\partial y} = 2\lambda b y, \qquad \frac{\partial v_3}{\partial z} = 2\lambda c z.
\]
Taking the sum of the three equations, and using that the partial derivatives of $v_3$ sum to zero, gives (for $\lambda \neq 0$)
\[
by + cz = 0 \;\Leftrightarrow\; y = -\frac{c}{b} z. \tag{42}
\]
Combining (41) and (42) gives
\[
\left(\frac{c}{b} + 1\right) c z^2 = 1 \;\Rightarrow\; z = \pm\sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}} \;\Rightarrow\; y = \mp\sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}}.
\]

Figure 2.6: Illustration of the cylinder defined by $y^2 + \frac{16}{25}z^2 = 1$ with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in cylindrical coordinates on the left.

Thus the plane defined by (42) intersects the cylinder along the lines
\[
\ell_1 = \left\{ \left(x,\ \sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}},\ -\sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}}\right) \,\middle|\, x \in \mathbb{R} \right\} = \{(x, r, -s) \mid x \in \mathbb{R}\},
\]
\[
\ell_2 = \left\{ \left(x,\ -\sqrt{\frac{c}{b}}\,\frac{1}{\sqrt{b + c}},\ \sqrt{\frac{b}{c}}\,\frac{1}{\sqrt{b + c}}\right) \,\middle|\, x \in \mathbb{R} \right\} = \{(x, -r, s) \mid x \in \mathbb{R}\},
\]
where $r = \sqrt{c/b}\,/\sqrt{b + c}$ and $s = \sqrt{b/c}\,/\sqrt{b + c}$. Finding the stationary points for $v_3$ along $\ell_1$:
\[
v_3(x, r, -s) = (r - x)(-s - x)(-s - r) = -\left( x^2 + \frac{1}{\sqrt{b + c}}\left(\sqrt{\frac{b}{c}} - \sqrt{\frac{c}{b}}\right) x - \frac{1}{b + c} \right)(r + s),
\]
\[
\frac{\partial v_3}{\partial x}(x, r, -s) = -\left( 2x + \frac{1}{\sqrt{b + c}}\left(\sqrt{\frac{b}{c}} - \sqrt{\frac{c}{b}}\right) \right)(r + s).
\]
From this it follows that
\[
\frac{\partial v_3}{\partial x}(x, r, -s) = 0 \;\Leftrightarrow\; x = \frac{1}{2\sqrt{b + c}}\left(\sqrt{\frac{c}{b}} - \sqrt{\frac{b}{c}}\right).
\]
Thus
\[
\mathbf{x}_1 = \frac{1}{\sqrt{b + c}}\left( \frac{1}{2}\left(\sqrt{\frac{c}{b}} - \sqrt{\frac{b}{c}}\right),\ \sqrt{\frac{c}{b}},\ -\sqrt{\frac{b}{c}} \right) \tag{43}
\]
is the only stationary point on $\ell_1$. It can similarly be shown that $\mathbf{x}_2 = -\mathbf{x}_1$ is the only stationary point on $\ell_2$. The locations of these points on the cylinder are shown in Figure 2.6.

2.1.6 Optimizing the Vandermonde determinant on a surface defined by a homogeneous polynomial
This section is based on Section 7 of Paper B

When using Lagrange multipliers it can be desirable to not have to consider the $\lambda$-parameter (the scaling between the gradient and direction given by the constraint). We demonstrate a simple way to remove this parameter when the surface is defined by a homogeneous polynomial.

Lemma 2.3. Let $g : \mathbb{R}^n \to \mathbb{R}$ be a homogeneous polynomial such that $g(c\mathbf{x}) = c^k g(\mathbf{x})$ with $k \neq \frac{n(n-1)}{2}$. If $g(\mathbf{x}) = 1$, $\mathbf{x} \in \mathbb{C}^n$, defines a continuous bounded surface then any point on the surface that is a stationary point for the Vandermonde determinant, $\mathbf{z} \in \mathbb{C}^n$, can be written as $\mathbf{z} = c\mathbf{y}$ where
\[
\left.\frac{\partial v_n}{\partial x_i}\right|_{\mathbf{x} = \mathbf{y}} = \left.\frac{\partial g}{\partial x_i}\right|_{\mathbf{x} = \mathbf{y}}, \quad i \in \{1, 2, \ldots, n\} \tag{44}
\]
and $c = g(\mathbf{y})^{-\frac{1}{k}}$.

Proof. By the method of Lagrange multipliers the point $\mathbf{y} \in \{\mathbf{x} \in \mathbb{R}^n \mid g(\mathbf{x}) = 1\}$ is a stationary point for the Vandermonde determinant if
\[
\left.\frac{\partial v_n}{\partial x_k}\right|_{\mathbf{x} = \mathbf{y}} = \lambda \left.\frac{\partial g}{\partial x_k}\right|_{\mathbf{x} = \mathbf{y}}, \quad k \in \{1, 2, \ldots, n\}
\]
for some $\lambda \in \mathbb{R}$. Since $v_n$ is homogeneous of degree $\frac{n(n-1)}{2}$, the stationary points on the surface given by $g(c\mathbf{x}) = c^k$ are given by
\[
c^{\frac{n(n-1)}{2}} \left.\frac{\partial v_n}{\partial x_k}\right|_{\mathbf{x} = \mathbf{y}} = c^k \lambda \left.\frac{\partial g}{\partial x_k}\right|_{\mathbf{x} = \mathbf{y}}, \quad k \in \{1, 2, \ldots, n\}
\]
and if $c$ is chosen such that $\lambda = c^{\frac{n(n-1)}{2}} c^{-k}$ (which is possible since $k \neq \frac{n(n-1)}{2}$) then the stationary points are defined by
\[
\frac{\partial v_n}{\partial x_k} = \frac{\partial g}{\partial x_k}, \quad k \in \{1, 2, \ldots, n\}.
\]
Suppose that $\mathbf{y}$ is a solution of this system. Then the point given by $\mathbf{z} = c\mathbf{y}$ where $c = g(\mathbf{y})^{-\frac{1}{k}}$ will be a stationary point for the Vandermonde determinant and will lie on the surface defined by $g(\mathbf{x}) = 1$, since $g(\mathbf{z}) = c^k g(\mathbf{y}) = 1$.

Lemma 2.4. If $\mathbf{z}$ is a stationary point for the Vandermonde determinant on the surface $g(\mathbf{x}) = 1$ where $g(\mathbf{x})$ is a homogeneous polynomial then $-\mathbf{z}$ is either a stationary point or does not lie on the surface.

Proof. Since $g(-\mathbf{x}) = (-1)^k g(\mathbf{x})$, the value $g(-\mathbf{z})$ is either $1$ or $-1$. Moreover $|v_n(\mathbf{x})| = |v_n(-\mathbf{x})|$ for any point, including $\mathbf{z}$ and the points in a neighbourhood around it. This means that if $g(-\mathbf{x}) = g(\mathbf{x})$ then the stationary points are preserved, and otherwise the point $-\mathbf{z}$ will lie on the surface defined by $g(\mathbf{x}) = -1$ instead of $g(\mathbf{x}) = 1$.

A well-known example of homogeneous polynomials are quadratic forms. If we let $g(\mathbf{x}) = \mathbf{x}^{\top} S \mathbf{x}$ then $g(\mathbf{x})$ is a quadratic form which in turn is a homogeneous polynomial with $k = 2$. If $S$ is a positive definite matrix then $g(\mathbf{x}) = 1$ defines an ellipsoid. Here we will demonstrate the use of Lemma 2.3 to find the extreme points on a rotated ellipsoid.
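Before the worked example, the rescaling step of Lemma 2.3 can be sanity-checked on the simplest quadratic form, the unit sphere in three dimensions. The sketch below uses SymPy (an assumption of this illustration); the point `y_pt` $= (-2/3, 0, 2/3)$ is a hand-picked solution of the $\lambda$-free system (44) for $g(\mathbf{x}) = x^2 + y^2 + z^2$, and all variable names are hypothetical.

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
v3 = (y - x) * (z - x) * (z - y)   # homogeneous of degree n(n-1)/2 = 3
g = x**2 + y**2 + z**2             # homogeneous of degree k = 2

def grad(f):
    """Gradient of f with respect to (x, y, z) as a column vector."""
    return sp.Matrix([sp.diff(f, t) for t in (x, y, z)])

# Candidate solution of the lambda-free system (44): grad v3 = grad g
y_pt = {x: sp.Rational(-2, 3), y: 0, z: sp.Rational(2, 3)}
residual_44 = (grad(v3) - grad(g)).subs(y_pt)

# Rescale with c = g(y)^(-1/k) so that the point lands on g = 1
k = 2
c = g.subs(y_pt) ** sp.Rational(-1, k)
z_pt = {s: sp.simplify(c * val) for s, val in y_pt.items()}

on_surface = sp.simplify(g.subs(z_pt))
# At the rescaled point the two gradients are parallel (cross product vanishes),
# which is the Lagrange condition with the multiplier left implicit
cross = grad(v3).subs(z_pt).cross(grad(g).subs(z_pt))
```

The rescaled point is $(-1/\sqrt{2}, 0, 1/\sqrt{2})$, which lies on the unit sphere and satisfies the Lagrange condition there.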
Consider the ellipsoid defined by
\[
\frac{1}{9}x^2 + \frac{5}{8}y^2 + \frac{3}{4}yz + \frac{5}{8}z^2 = 1 \tag{45}
\]
then by Lemma 2.3 we can instead consider the points in the variety
\[
V = V\Big({-2xy} + 2xz + y^2 - z^2 - \tfrac{2}{9}x,\;\; {-x^2} + 2xy - 2yz + z^2 - \tfrac{5}{4}y - \tfrac{3}{4}z,\;\; {-2xz} - y^2 + 2yz + x^2 - \tfrac{3}{4}y - \tfrac{5}{4}z\Big).
\]
Finding the Gröbner basis of $V$ gives
\[
\begin{aligned}
g_1(z) &= z(6z + 1)(260642z^2 - 27436z + 697), \\
g_2(y, z) &= -1138484256z^3 - 127275604z^2 + 16689841z + 6277879y, \\
g_3(x, z) &= 10246358304z^3 + 1145480436z^2 - 93707658z + 6277879x.
\end{aligned}
\]
This system is not difficult to solve and the resulting points are:
\[
\begin{aligned}
\mathbf{p}_0 &= (0, 0, 0), \\
\mathbf{p}_1 &= \left(0,\ \frac{1}{6},\ -\frac{1}{6}\right), \\
\mathbf{p}_2 &= \left(\frac{45\sqrt{2}}{361},\ -\frac{1}{19} - \frac{5\sqrt{2}}{722},\ \frac{1}{19} - \frac{5\sqrt{2}}{722}\right), \\
\mathbf{p}_3 &= \left(-\frac{45\sqrt{2}}{361},\ -\frac{1}{19} + \frac{5\sqrt{2}}{722},\ \frac{1}{19} + \frac{5\sqrt{2}}{722}\right).
\end{aligned}
\]
Here the sign of the first coordinate of $\mathbf{p}_3$ is fixed by the relation $x = -9(y + z)$, which follows from adding the three generators of $V$. The point $\mathbf{p}_0$ is an artifact of the rewrite and does not lie on any ellipsoid and can therefore be discarded. By Lemma 2.4 there are also three more stationary points $\mathbf{p}_4 = -\mathbf{p}_1$, $\mathbf{p}_5 = -\mathbf{p}_2$ and $\mathbf{p}_6 = -\mathbf{p}_3$. Rescaling each of these points according to Lemma 2.3 gives $\mathbf{q}_i = g(\mathbf{p}_i)^{-\frac{1}{2}}\,\mathbf{p}_i$, which are all points on the ellipsoid defined by $g(\mathbf{x}) = 1$. The result is illustrated in Figure 2.7.

Figure 2.7: Illustration of the ellipsoid defined by (45) with the extreme points of the Vandermonde determinant marked. Displayed in Cartesian coordinates on the right and in ellipsoidal coordinates on the left.

Note that this example gives a simple case with a Gröbner basis that is small and easy to find. Using this technique for other polynomials and in higher dimensions can require significant computational resources.

2.2 Extreme points of the Vandermonde determinant on the sphere

In this section we will consider the extreme points of the Vandermonde determinant on the $n$-dimensional unit sphere in $\mathbb{R}^n$. We want both to find an analytical solution and to identify some properties of the determinant that can help us to visualize it in some area around the extreme points in dimensions $n > 3$.
2.2.1 The extreme points on the sphere given by roots of a polynomial
This section is based on Section 2.1 of Paper A

The extreme points of the Vandermonde determinant on the unit sphere in $\mathbb{R}^n$ are known and given by Theorem 2.3 where we present a special case of Theorem 6.7.3 in [269]. We will also provide a proof that is more explicit than the one in [269] and that exposes more of the rich symmetric properties of the Vandermonde determinant. For the sake of convenience some properties related to the extreme points of the Vandermonde determinant defined by real vectors $\mathbf{x}_n$ will be presented before Theorem 2.3.

Theorem 2.1. For any $1 \le k \le n$
\[
\frac{\partial v_n}{\partial x_k} = \sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{v_n(\mathbf{x}_n)}{x_k - x_i}. \tag{46}
\]
This theorem will be proven after introducing the following useful lemma:

Lemma 2.5. For any $1 \le k \le n - 1$
\[
\frac{\partial v_n}{\partial x_k} = -\frac{v_n(\mathbf{x}_n)}{x_n - x_k} + \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] \frac{\partial v_{n-1}}{\partial x_k} \tag{47}
\]
and
\[
\frac{\partial v_n}{\partial x_n} = \sum_{i=1}^{n-1} \frac{v_n(\mathbf{x}_n)}{x_n - x_i}. \tag{48}
\]

Proof. Note that the determinant can be described recursively
\[
v_n(\mathbf{x}_n) = \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] \prod_{1 \le i < j \le n-1} (x_j - x_i) = \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] v_{n-1}(\mathbf{x}_{n-1}).
\]
Differentiating this expression with respect to $x_k$, $1 \le k \le n - 1$, using the product rule gives (47), and differentiating it with respect to $x_n$ gives (48).

Proof of Theorem 2.1. For $k = n$ formula (46) is the same as (48). For $1 \le k \le n - 1$ we use induction on $n$ together with (47). Supposing that formula (46) is true for $n - 1$ results in
\[
\begin{aligned}
\frac{\partial v_n}{\partial x_k} &= -\frac{v_n(\mathbf{x}_n)}{x_n - x_k} + \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] \sum_{\substack{i=1 \\ i \neq k}}^{n-1} \frac{v_{n-1}(\mathbf{x}_{n-1})}{x_k - x_i} \\
&= \frac{v_n(\mathbf{x}_n)}{x_k - x_n} + \sum_{\substack{i=1 \\ i \neq k}}^{n-1} \frac{v_n(\mathbf{x}_n)}{x_k - x_i} = \sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{v_n(\mathbf{x}_n)}{x_k - x_i}.
\end{aligned}
\]
Showing that (46) is true for $n = 2$ completes the proof:
\[
\begin{aligned}
\frac{\partial v_2}{\partial x_1} &= \frac{\partial}{\partial x_1}(x_2 - x_1) = -1 = \frac{x_2 - x_1}{x_1 - x_2} = \sum_{\substack{i=1 \\ i \neq 1}}^{2} \frac{v_2(\mathbf{x}_2)}{x_1 - x_i}, \\
\frac{\partial v_2}{\partial x_2} &= \frac{\partial}{\partial x_2}(x_2 - x_1) = 1 = \frac{x_2 - x_1}{x_2 - x_1} = \sum_{\substack{i=1 \\ i \neq 2}}^{2} \frac{v_2(\mathbf{x}_2)}{x_2 - x_i}.
\end{aligned}
\]

Theorem 2.2. The extreme points of $v_n(\mathbf{x}_n)$ on the unit sphere can all be found in the hyperplane defined by
\[
\sum_{i=1}^{n} x_i = 0. \tag{50}
\]
This theorem will be proved after the introduction of the following useful lemma:

Lemma 2.6. For any $n \ge 2$ the sum of the partial derivatives of $v_n(\mathbf{x}_n)$ will be zero,
\[
\sum_{k=1}^{n} \frac{\partial v_n}{\partial x_k} = 0. \tag{51}
\]

Proof. This lemma is easily proven using Lemma 2.5 and induction:
\[
\begin{aligned}
\sum_{k=1}^{n} \frac{\partial v_n}{\partial x_k} &= \sum_{k=1}^{n-1} \left( -\frac{v_n(\mathbf{x}_n)}{x_n - x_k} + \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] \frac{\partial v_{n-1}}{\partial x_k} \right) + \sum_{i=1}^{n-1} \frac{v_n(\mathbf{x}_n)}{x_n - x_i} \\
&= \left[\prod_{i=1}^{n-1} (x_n - x_i)\right] \sum_{k=1}^{n-1} \frac{\partial v_{n-1}}{\partial x_k}.
\end{aligned}
\]
Thus if equation (51) is true for $n - 1$ it is also true for $n$. Showing that the equation holds for $n = 2$ is very simple:
\[
\frac{\partial v_2}{\partial x_1} + \frac{\partial v_2}{\partial x_2} = -1 + 1 = 0.
\]

Proof of Theorem 2.2. Using the method of Lagrange multipliers it follows that any $\mathbf{x}_n$ on the unit sphere that is an extreme point of the Vandermonde determinant will also be a stationary point for the Lagrange function
\[
\Lambda_n(\mathbf{x}_n, \lambda) = v_n(\mathbf{x}_n) - \lambda\left(\sum_{i=1}^{n} x_i^2 - 1\right)
\]
for some $\lambda$. Explicitly this requirement becomes
\[
\frac{\partial \Lambda_n}{\partial x_k} = 0 \quad \text{for all } 1 \le k \le n, \tag{52}
\]
\[
\frac{\partial \Lambda_n}{\partial \lambda} = 0. \tag{53}
\]
Equation (53) corresponds to the restriction to the unit sphere and is therefore immediately satisfied. Since all the partial derivatives of the Lagrange function should be equal to zero it is obvious that the sum of the partial derivatives will also be equal to zero. Combining this with Lemma 2.6 gives
\[
\sum_{k=1}^{n} \frac{\partial \Lambda_n}{\partial x_k} = \sum_{k=1}^{n} \left(\frac{\partial v_n}{\partial x_k} - 2\lambda x_k\right) = -2\lambda \sum_{k=1}^{n} x_k = 0. \tag{54}
\]
There are two ways to satisfy condition (54): either $\lambda = 0$ or $\sum_{k=1}^{n} x_k = 0$. When $\lambda = 0$ equation (52) reduces to
\[
\frac{\partial v_n}{\partial x_k} = 0 \quad \text{for all } 1 \le k \le n,
\]
and by equation (32) this can only be true if $v_n(\mathbf{x}_n) = 0$, which is of no interest to us, and so all extreme points must lie in the hyperplane $\sum_{k=1}^{n} x_k = 0$.

Theorem 2.3. A point on the unit sphere in $\mathbb{R}^n$, $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$, is an extreme point of the Vandermonde determinant if and only if all $x_i$, $i \in \{1, 2, \ldots, n\}$, are distinct roots of the rescaled Hermite polynomial
\[
P_n(x) = (2n(n - 1))^{-\frac{n}{2}}\, H_n\!\left(\sqrt{\frac{n(n - 1)}{2}}\, x\right). \tag{55}
\]

Remark 2.1. Note that if $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$ is an extreme point of the Vandermonde determinant then any other point whose coordinates are a permutation of the coordinates of $\mathbf{x}_n$ is also an extreme point. This follows from the determinant function being, by definition, alternating with respect to the columns of the matrix, and the $x_i$ define the columns of the Vandermonde matrix. Thus any permutation of the $x_i$ will give the same value for $|v_n(\mathbf{x}_n)|$. Since there are $n!$ permutations there will be at least $n!$ extreme points. The roots of the polynomial (55) define the set of $x_i$ fully and thus there are exactly $n!$ extreme points, $n!/2$ positive and $n!/2$ negative.

Remark 2.2. All terms in $P_n(x)$ are of even order if $n$ is even and of odd order when $n$ is odd. This means that the roots of $P_n(x)$ will be symmetrical in the sense that if $x_i$ is a root then $-x_i$ is also a root.

Proof of Theorem 2.3. By the method of Lagrange multipliers condition (52) must be satisfied for any extreme point. If $\mathbf{x}_n$ is a fixed extreme point so that $v_n(\mathbf{x}_n) = v_{\max}$, then (52) can be written explicitly, using (46), as
\[
\frac{\partial \Lambda_n}{\partial x_k} = \sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{v_{\max}}{x_k - x_i} - 2\lambda x_k = 0 \quad \text{for all } 1 \le k \le n,
\]
or alternatively, by introducing a new multiplier $\rho$, as
\[
\sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{1}{x_k - x_i} = \frac{2\lambda}{v_{\max}} x_k = \frac{\rho}{n} x_k \quad \text{for all } 1 \le k \le n. \tag{56}
\]
By forming the polynomial $f(x) = (x - x_1)(x - x_2) \cdots (x - x_n)$ and noting that
\[
f'(x_k) = \left.\sum_{j=1}^{n} \prod_{\substack{i=1 \\ i \neq j}}^{n} (x - x_i)\right|_{x = x_k} = \prod_{\substack{i=1 \\ i \neq k}}^{n} (x_k - x_i),
\]
\[
\begin{aligned}
f''(x_k) &= \left.\sum_{l=1}^{n} \sum_{\substack{j=1 \\ j \neq l}}^{n} \prod_{\substack{i=1 \\ i \neq j,\, i \neq l}}^{n} (x - x_i)\right|_{x = x_k} = \sum_{\substack{j=1 \\ j \neq k}}^{n} \prod_{\substack{i=1 \\ i \neq j,\, i \neq k}}^{n} (x_k - x_i) + \sum_{\substack{l=1 \\ l \neq k}}^{n} \prod_{\substack{i=1 \\ i \neq l,\, i \neq k}}^{n} (x_k - x_i) \\
&= 2 \sum_{\substack{j=1 \\ j \neq k}}^{n} \prod_{\substack{i=1 \\ i \neq j,\, i \neq k}}^{n} (x_k - x_i),
\end{aligned}
\]
we can rewrite (56) as
\[
\frac{1}{2}\frac{f''(x_k)}{f'(x_k)} = \frac{\rho}{n} x_k,
\]
or
\[
f''(x_k) - \frac{2\rho}{n} x_k f'(x_k) = 0.
\]
And since the left-hand side of the last equation must vanish for all $k$ we must have
\[
f''(x) - \frac{2\rho}{n} x f'(x) = c f(x), \tag{57}
\]
for some constant $c$.
To find $c$ the $x^n$-terms of the right and left part of equation (57) are compared to each other. Writing $c_n$ for the leading coefficient of $f$,
\[
c \cdot c_n x^n = -\frac{2\rho}{n}\, x \cdot n c_n x^{n-1} = -2\rho \cdot c_n x^n \;\Rightarrow\; c = -2\rho.
\]
Thus the following differential equation for $f(x)$ must be satisfied
\[
f''(x) - \frac{2\rho}{n} x f'(x) + 2\rho f(x) = 0. \tag{58}
\]
Choosing $x = az$ gives
\[
f''(az) - \frac{2\rho}{n}\, az f'(az) + 2\rho f(az) = \frac{1}{a^2} \frac{d^2 f}{dz^2}(az) - \frac{2\rho}{n}\, az\, \frac{1}{a} \frac{df}{dz}(az) + 2\rho f(az) = 0.
\]
By setting $g(z) = f(az)$ and choosing $a = \sqrt{\frac{n}{\rho}}$ a differential equation that matches the definition for the Hermite polynomials is found:
\[
g''(z) - 2z g'(z) + 2n g(z) = 0. \tag{59}
\]
By definition the solution to (59) is $g(z) = bH_n(z)$ where $b$ is a constant. An exact expression for the constant $a$ can be found using Lemma 2.7 (for the sake of convenience the lemma is stated and proved after this theorem). We get
\[
\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} a^2 z_i^2 = 1 \;\Rightarrow\; a^2\, \frac{n(n - 1)}{2} = 1,
\]
and so
\[
a = \sqrt{\frac{2}{n(n - 1)}}.
\]
Thus condition (52) is satisfied when $x_i$ are the roots of
\[
P_n(x) = bH_n(z) = bH_n\!\left(\sqrt{\frac{n(n - 1)}{2}}\, x\right).
\]
Choosing $b = (2n(n - 1))^{-\frac{n}{2}}$ gives $P_n(x)$ with leading coefficient 1. This can be confirmed by calculating the leading coefficient of $P_n(x)$ using the explicit expression for the Hermite polynomial (61). This completes the proof.

Lemma 2.7. Let $x_i$, $i = 1, 2, \ldots, n$ be roots of the Hermite polynomial $H_n(x)$. Then
\[
\sum_{i=1}^{n} x_i^2 = \frac{n(n - 1)}{2}.
\]

Proof. By letting $e_k(x_1, \ldots, x_n)$ denote the elementary symmetric polynomials, $H_n(x)$ can be written as
\[
\begin{aligned}
H_n(x) &= A_n (x - x_1) \cdots (x - x_n) \\
&= A_n \big(x^n - e_1(x_1, \ldots, x_n) x^{n-1} + e_2(x_1, \ldots, x_n) x^{n-2} + q(x)\big)
\end{aligned}
\]
where $q(x)$ is a polynomial of degree $n - 3$. Noting that
\[
\sum_{i=1}^{n} x_i^2 = (x_1 + \ldots + x_n)^2 - 2 \sum_{1 \le i < j \le n} x_i x_j = e_1(x_1, \ldots, x_n)^2 - 2 e_2(x_1, \ldots, x_n), \tag{60}
\]
it is enough to find $e_1$ and $e_2$. The explicit expression for the Hermite polynomials is
\[
H_n(x) = n! \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{(-1)^i}{i!} \frac{(2x)^{n-2i}}{(n - 2i)!}. \tag{61}
\]
Comparing the coefficients in the two expressions for $H_n(x)$ gives
\[
A_n = 2^n, \qquad A_n e_1(x_1, \ldots, x_n) = 0, \qquad A_n e_2(x_1, \ldots, x_n) = -n(n - 1)2^{n-2}.
\]
Thus by (60)
\[
\sum_{i=1}^{n} x_i^2 = \frac{n(n - 1)}{2}.
\]

Theorem 2.4. The coefficients, $a_k$, for the term $x^k$ in $P_n(x)$ given by (55) are given by the following relations
\[
a_n = 1, \qquad a_{n-1} = 0, \qquad a_{n-2} = -\frac{1}{2},
\]
\[
a_k = -\frac{(k + 1)(k + 2)}{n(n - 1)(n - k)}\, a_{k+2}, \quad 0 \le k \le n - 3. \tag{62}
\]

Proof. Equation (58) tells us that
\[
P_n(x) = -\frac{1}{2\rho} P_n''(x) + \frac{1}{n} x P_n'(x). \tag{63}
\]
That $a_n = 1$ follows from the definition of $P_n$ and $a_{n-1} = 0$ follows from the Hermite polynomials only having terms of odd powers when $n$ is odd and even powers when $n$ is even. That $a_{n-2} = -\frac{1}{2}$ can be easily shown using the definition of $P_n$ and the explicit formula for the Hermite polynomials (61). The value of $\rho$ can be found by comparing the $x^{n-2}$ terms in (63):
\[
a_{n-2} = -\frac{1}{2\rho}\, n(n - 1) a_n + \frac{1}{n}(n - 2) a_{n-2}.
\]
From this follows
\[
\frac{1}{2\rho} = \frac{1}{n^2(n - 1)}.
\]
Comparing the $x^{n-l}$ terms in (63) gives the following relation
\[
a_{n-l} = -\frac{1}{2\rho}(n - l + 2)(n - l + 1)\, a_{n-l+2} + \frac{1}{n}(n - l)\, a_{n-l}
\]
which is equivalent to
\[
a_{n-l} = -\frac{(n - l + 2)(n - l + 1)}{l\, n(n - 1)}\, a_{n-l+2}.
\]
Letting $k = n - l$ gives (62).

2.2.2 Further visual exploration on the sphere
This section is based on Section 2.4 of Paper A

Visualization of the determinant $v_3(\mathbf{x}_3)$ on the unit sphere is straightforward, as well as visualizations for $g_3(\mathbf{x}_3, \mathbf{a})$ for different $\mathbf{a}$. In three dimensions all points on the sphere can be mapped to the plane. In higher dimensions we need to reduce the set of visualized points somehow. In this section we provide visualizations for $v_4, \ldots, v_7$ by using symmetry properties of the Vandermonde determinant.

Four dimensions

By Theorem 2.2 we know that the extreme points of $v_4(\mathbf{x}_4)$ on the sphere all lie in the hyperplane $x_1 + x_2 + x_3 + x_4 = 0$. The intersection of this hyperplane with the unit sphere in $\mathbb{R}^4$ can be described as a unit sphere in $\mathbb{R}^3$, under a suitable basis, and can then be easily visualized.
This can be realized using the transformation
\[
\mathbf{x} = \begin{pmatrix} -1 & -1 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & -1 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{4} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{2} \end{pmatrix} \mathbf{t}. \tag{64}
\]

Figure 2.8: Plot of $v_4(\mathbf{x}_4)$ over points on the unit sphere. (a) Plot with $\mathbf{t}$-basis given by (64). (b) Plot with $\theta$ and $\phi$ given by (34).

The results of plotting $v_4(\mathbf{x}_4)$ after performing this transformation can be seen in Figure 2.8. All $24 = 4!$ extreme points are clearly visible. From Figure 2.8 we see that whenever we have a local maximum we have a local maximum at the opposite side of the sphere as well, and the same for minima. This is due to the occurrence of the exponents in the rows of $V_n$. From equation (32) we have
\[
v_n((-1)\mathbf{x}_n) = (-1)^{\frac{n(n-1)}{2}} v_n(\mathbf{x}_n),
\]
and so opposite points are both maxima or both minima if $n = 4k$ or $n = 4k + 1$ for some $k \in \mathbb{Z}^+$, and opposite points are of different types if $n = 4k - 2$ or $n = 4k - 1$ for some $k \in \mathbb{Z}^+$.

By Theorem 2.3 the extreme points on the unit sphere for $v_4(\mathbf{x}_4)$ are described by the roots of the polynomial
\[
P_4(x) = x^4 - \frac{1}{2}x^2 + \frac{1}{48}.
\]
The roots of $P_4(x)$ are:
\[
\begin{aligned}
x_{41} &= -\frac{1}{2}\sqrt{1 + \sqrt{\frac{2}{3}}}, & x_{42} &= -\frac{1}{2}\sqrt{1 - \sqrt{\frac{2}{3}}}, \\
x_{43} &= \frac{1}{2}\sqrt{1 - \sqrt{\frac{2}{3}}}, & x_{44} &= \frac{1}{2}\sqrt{1 + \sqrt{\frac{2}{3}}}.
\end{aligned}
\]

Five dimensions

By Theorem 2.3 or 2.4 we see that the polynomials providing the coordinates of the extreme points have all even or all odd powers. From this it is easy to see that all coordinates of the extreme points must come in pairs $x_i, -x_i$. Furthermore, by Theorem 2.2 we know that the extreme points of $v_5(\mathbf{x}_5)$ on the sphere all lie in the hyperplane $x_1 + x_2 + x_3 + x_4 + x_5 = 0$.

We use this to visualize $v_5(\mathbf{x}_5)$ by selecting a subspace of $\mathbb{R}^5$ that contains all points that have coordinates which are symmetrically placed on the real line, $(x_1, x_2, 0, -x_2, -x_1)$. The coordinates in Figure 2.9 (a) are related to $\mathbf{x}_5$ by
\[
\mathbf{x}_5 = \begin{pmatrix} -1 & 0 & 1 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{5} \end{pmatrix} \mathbf{t}. \tag{65}
\]

Figure 2.9: Plot of $v_5(\mathbf{x}_5)$ over points on the unit sphere. (a) Plot with $\mathbf{t}$-basis given by (65). (b) Plot with $\theta$ and $\phi$ given by (34).

The result, see Figure 2.9, is a visualization of a subspace containing 8 of the 120 extreme points. Note that the condition that the coordinates should be symmetrically distributed pairs can also be fulfilled in two other subspaces, with points that can be described in the following ways: $(x_1, x_2, 0, -x_1, -x_2)$ and $(x_2, -x_2, 0, x_1, -x_1)$. This means that a transformation similar to (65) can describe $3 \cdot 8 = 24$ different extreme points.

The transformation (65) corresponds to choosing $x_3 = 0$. Choosing another coordinate to be zero will give a different subspace of $\mathbb{R}^5$ which behaves identically to the visualized one. This multiplies the number of extreme points by five to the expected $5 \cdot 4! = 120$.

By Theorem 2.3 the extreme points on the unit sphere for $v_5(\mathbf{x}_5)$ are described by the roots of the polynomial
\[
P_5(x) = x^5 - \frac{1}{2}x^3 + \frac{3}{80}x.
\]
The roots of $P_5(x)$ are:
\[
x_{51} = -x_{55}, \quad x_{52} = -x_{54}, \quad x_{53} = 0, \quad
x_{54} = \frac{1}{2}\sqrt{1 - \sqrt{\frac{2}{5}}}, \quad x_{55} = \frac{1}{2}\sqrt{1 + \sqrt{\frac{2}{5}}}.
\]

Six dimensions

As for $v_5(\mathbf{x}_5)$ we use symmetry to visualize $v_6(\mathbf{x}_6)$. We select a subspace of $\mathbb{R}^6$ with all symmetrical points $(x_1, x_2, x_3, -x_3, -x_2, -x_1)$ on the sphere. The coordinates in Figure 2.10 (a) are related to $\mathbf{x}_6$ by
\[
\mathbf{x}_6 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{2} \end{pmatrix} \mathbf{t}. \tag{66}
\]
In Figure 2.10 there are 48 visible extreme points. The remaining extreme points can be found using arguments analogous to the five-dimensional case.

By Theorem 2.3 the extreme points on the unit sphere for $v_6(\mathbf{x}_6)$ are described by the roots of the polynomial
\[
P_6(x) = x^6 - \frac{1}{2}x^4 + \frac{1}{20}x^2 - \frac{1}{1800}.
\]

Figure 2.10: Plot of $v_6(\mathbf{x}_6)$ over points on the unit sphere. (a) Plot with $\mathbf{t}$-basis given by (66). (b) Plot with $\theta$ and $\phi$ given by (34).
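The polynomials $P_4$, $P_5$ and $P_6$ quoted above can be cross-checked numerically from Theorem 2.3. The following is a minimal sketch using NumPy (an assumption of this illustration, not the software used in the thesis); the function names are hypothetical. It takes the roots of the physicists' Hermite polynomial $H_n$, rescales them by $\sqrt{2/(n(n-1))}$, and rebuilds the monic polynomial from the rescaled roots.

```python
import numpy as np
from numpy.polynomial import hermite as H

def extreme_point_coordinates(n):
    """Roots of P_n: roots of the physicists' Hermite polynomial H_n,
    rescaled so that the resulting point lies on the unit sphere."""
    hermite_roots = H.hermroots([0] * n + [1])  # Hermite series with only H_n
    return np.sort(hermite_roots * np.sqrt(2.0 / (n * (n - 1))))

def monic_coefficients(n):
    """Coefficients of the monic polynomial P_n, highest power first."""
    return np.poly(extreme_point_coordinates(n))

coords6 = extreme_point_coordinates(6)
p4, p5, p6 = (monic_coefficients(n) for n in (4, 5, 6))
```

Any permutation of `coords6` is then a candidate extreme point of $v_6$ on the unit sphere; the coordinates sum to zero, in agreement with Theorem 2.2, and their squares sum to one.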
The roots of $P_6(x)$ are:
\[
x_{61} = -x_{66}, \quad x_{62} = -x_{65}, \quad x_{63} = -x_{64},
\]
\[
x_{64} = \frac{1}{2\sqrt{15}} \sqrt{10 + 2\sqrt{10}\left(\sqrt{3}\, l_6 - k_6\right)}, \tag{67}
\]
\[
x_{65} = \frac{1}{2\sqrt{15}} \sqrt{10 - 2\sqrt{10}\left(\sqrt{3}\, l_6 + k_6\right)}, \tag{68}
\]
\[
x_{66} = \sqrt{\frac{1}{30}\left(2\sqrt{10} \cdot k_6 + 5\right)}, \tag{69}
\]
\[
k_6 = \cos\!\left(\frac{1}{3}\arctan\!\left(\sqrt{\frac{3}{2}}\right)\right), \qquad l_6 = \sin\!\left(\frac{1}{3}\arctan\!\left(\sqrt{\frac{3}{2}}\right)\right).
\]

Seven dimensions

As for $v_6(\mathbf{x}_6)$ we use symmetry to visualize $v_7(\mathbf{x}_7)$. We select a subspace of $\mathbb{R}^7$ that contains all symmetrical points $(x_1, x_2, x_3, 0, -x_3, -x_2, -x_1)$ on the sphere. The coordinates in Figure 2.11 (a) are related to $\mathbf{x}_7$ by
\[
\mathbf{x}_7 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1/\sqrt{2} & 0 \\ 0 & 0 & 1/\sqrt{2} \end{pmatrix} \mathbf{t}. \tag{70}
\]

Figure 2.11: Plot of $v_7(\mathbf{x}_7)$ over points on the unit sphere. (a) Plot with $\mathbf{t}$-basis given by (70). (b) Plot with $\theta$ and $\phi$ given by (34).

In Figure 2.11, 48 extreme points are visible, just as in the six-dimensional case. This is expected since the transformation corresponds to choosing $x_4 = 0$, which restricts us to a six-dimensional subspace of $\mathbb{R}^7$ which can then be visualized in the same way as the six-dimensional case. The remaining extreme points can be found using arguments analogous to the five-dimensional case.

By Theorem 2.3 the extreme points on the unit sphere for $v_7(\mathbf{x}_7)$ are described by the roots of the polynomial
\[
P_7(x) = x^7 - \frac{1}{2}x^5 + \frac{5}{84}x^3 - \frac{5}{3528}x.
\]
The roots of $P_7(x)$ are:
\[
x_{71} = -x_{77}, \quad x_{72} = -x_{76}, \quad x_{73} = -x_{75}, \quad x_{74} = 0,
\]
\[
x_{75} = \frac{1}{2\sqrt{21}} \sqrt{14 + 2\sqrt{14}\left(\sqrt{3}\, l_7 - k_7\right)}, \tag{71}
\]
\[
x_{76} = \frac{1}{2\sqrt{21}} \sqrt{14 - 2\sqrt{14}\left(\sqrt{3}\, l_7 + k_7\right)}, \tag{72}
\]
\[
x_{77} = \sqrt{\frac{1}{42}\left(2\sqrt{14}\, k_7 + 7\right)}, \tag{73}
\]
\[
k_7 = \cos\!\left(\frac{1}{3}\arctan\!\left(\sqrt{\frac{5}{2}}\right)\right), \qquad l_7 = \sin\!\left(\frac{1}{3}\arctan\!\left(\sqrt{\frac{5}{2}}\right)\right).
\]

2.3 Extreme points of the Vandermonde determinant on some surfaces implicitly defined by a univariate polynomial
This section is based on Paper C

In this section the objective is to find the extreme points of the Vandermonde determinant on a surface implicitly defined by
\[
g_R(\mathbf{x}) = \sum_{i=1}^{n} R(x_i) = 0, \quad \text{where } R(x) = \sum_{i=0}^{m} r_i x^i,\ r_i \in \mathbb{R}. \tag{74}
\]

Lemma 2.8. The problem of finding the extreme points of the Vandermonde determinant on the surface defined by $g_R(\mathbf{x}) = 0$ can be rewritten as an ordinary differential equation of the form
\[
f''(x) - 2\rho R'(x) f'(x) - P(x) f(x) = 0 \tag{75}
\]
that has a unique (up to a multiplicative constant) polynomial solution, $f$, and any permutation of the roots of $f$ will give the coordinates of a critical point of the Vandermonde determinant.

Proof. Using the method of Lagrange multipliers we get
\[
\frac{\partial v_n}{\partial x_j} = \lambda \frac{\partial g_R}{\partial x_j} \;\Leftrightarrow\; \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{v_n(\mathbf{x})}{x_j - x_i} = \lambda R'(x_j)
\]
for some $\lambda \in \mathbb{R}$. If we only consider this expression in a single point we can consider $v_n(\mathbf{x})$ as a constant value and then the expression can be rewritten as
\[
\sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i} = \rho R'(x_j) \tag{76}
\]
where $\rho$ is some unknown constant. Consider the polynomial
\[
f(x) = \prod_{i=1}^{n} (x - x_i)
\]
and note that
\[
\frac{1}{2}\frac{f''(x_j)}{f'(x_j)} = \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i}. \tag{77}
\]
In each critical point we can combine (76) and (77), thus in each of the extreme points we will have the relation
\[
f''(x_j) - 2\rho R'(x_j) f'(x_j) = 0, \quad j = 1, 2, \ldots, n
\]
for some $\rho \in \mathbb{R}$. Since each $x_j$ is a root of $f(x)$ we see that the left hand side in the differential equation must be a polynomial with the same roots as $f(x)$, thus we can conclude that for any $x \in \mathbb{R}$
\[
f''(x) - 2\rho R'(x) f'(x) - P(x) f(x) = 0 \tag{78}
\]
where $P(x)$ is a polynomial of degree $m - 2$.

Using this technique it is also easy to find the coordinates on a sphere translated in the $(1, \ldots, 1)$ direction.

Corollary 2.1. If $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is a critical point of the Vandermonde determinant on a surface $S \subset \mathbb{C}^n$ then $(x_1 + a, x_2 + a, \ldots, x_n + a)$ is a critical point of the Vandermonde determinant on the surface $\{\mathbf{x} + a\mathbf{1} \in \mathbb{C}^n \mid \mathbf{x} \in S\}$.

Proof. Follows immediately from
\[
v_n(x_1 + a, x_2 + a, \ldots, x_n + a) = \prod_{1 \le i < j \le n} (x_j + a - x_i - a) = \prod_{1 \le i < j \le n} (x_j - x_i) = v_n(x_1, x_2, \ldots, x_n).
\]

In several cases it is possible to find the extreme points by identifying the unknown parameters, $\rho$ and the coefficients of $P(x)$, by comparing the terms in (75) with different degrees and solving the resulting equation system. We will discuss the cases in the upcoming sections.

2.3.1 Critical points on surfaces given by a first degree univariate polynomial

When $R(x) = r_1 x + r_0$ the surface defined by $\sum_{i=1}^{n} R(x_i) = 0$ will always be a plane with normal $(1, 1, \ldots, 1)$ through the point $\left(-\frac{r_0}{r_1}, -\frac{r_0}{r_1}, \ldots, -\frac{r_0}{r_1}\right)$. Note that
\[
v_n\!\left(x_1 + \frac{r_0}{r_1}, x_2 + \frac{r_0}{r_1}, \ldots, x_n + \frac{r_0}{r_1}\right) = \prod_{1 \le i < j \le n} \left(x_j + \frac{r_0}{r_1} - x_i - \frac{r_0}{r_1}\right) = v_n(x_1, x_2, \ldots, x_n),
\]
so the Vandermonde determinant is unchanged by this translation.

2.3.2 Critical points on surfaces given by a second degree univariate polynomial

Surfaces defined by letting $R(x) = \frac{1}{2}x^2 + r_1 x + r_0 = \frac{1}{2}\left((x + r_1)^2 - r_1^2 + 2r_0\right)$ will all be spheres around $(-r_1, -r_1, \ldots, -r_1)$ with radius $\sqrt{n(r_1^2 - 2r_0)}$. Thus the critical points can be found by a small modification of the technique used on the unit sphere described in Section 2.2.

Theorem 2.5. On the surface defined by
\[
g(\mathbf{x}) = \sum_{i=1}^{n} \left(\frac{1}{2}x_i^2 + r_1 x_i + r_0\right) = 0
\]
the coordinates of the critical points of the Vandermonde determinant are given by the roots of
\[
f(x) = H_n\!\left(\left(\frac{n - 1}{2(r_1^2 - 2r_0)}\right)^{\frac{1}{2}} (x + r_1)\right)
= n! \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{(-1)^i}{i!} \left(\frac{2(n - 1)}{r_1^2 - 2r_0}\right)^{\frac{n - 2i}{2}} \frac{(x + r_1)^{n - 2i}}{(n - 2i)!}
\]
where $H_n$ denotes the $n$th (physicist) Hermite polynomial.

Proof. Since $R(x) = \frac{1}{2}x^2 + r_1 x + r_0$ the differential equation (75) will be of the form
\[
f''(x) - 2\rho(x + r_1) f'(x) - p_0 f(x) = 0.
\]
By considering the terms with degree $n$ it is easy to see that $p_0 = -2\rho n$ and thus we get
\[
f''(x) - 2\rho(x + r_1) f'(x) + 2\rho n f(x) = 0.
\]
Setting $y = \rho^{\frac{1}{2}}(x + r_1)$ gives $x = \frac{y}{\rho^{1/2}} - r_1$, and by considering the function $g(y) = f\!\left(\frac{y}{\rho^{1/2}} - r_1\right)$ we can rewrite the differential equation as follows
\[
\begin{aligned}
& \frac{d^2 f}{dx^2} - 2\rho\left(\frac{y}{\rho^{1/2}} - r_1 + r_1\right) \frac{df}{dx} + 2\rho n\, g(y) = 0 \\
\Leftrightarrow\;& \rho\, g''(y) - 2\rho\, \frac{y}{\rho^{1/2}}\, \rho^{\frac{1}{2}} g'(y) + 2\rho n\, g(y) = 0 \\
\Leftrightarrow\;& g''(y) - 2y\, g'(y) + 2n\, g(y) = 0.
\end{aligned} \tag{79}
\]
Equation (79) defines a class of orthogonal polynomials called the Hermite polynomials [2], $H_n(y)$. Thus $f(x) = cH_n(\rho^{\frac{1}{2}}(x + r_1))$ for some arbitrary constant $c$.

To find the value of $\rho$ we can exploit some properties of the roots of the Hermite polynomials. Let $y_i$, $i = 1, \ldots, n$ be the roots of $H_n(y)$. In Lemma 2.7 and its proof we show that these roots have the following properties
\[
\sum_{i=1}^{n} y_i = 0, \tag{80}
\]
\[
\sum_{i=1}^{n} y_i^2 = \frac{n(n - 1)}{2}. \tag{81}
\]
We now take the change of variables $x = \frac{y}{\rho^{1/2}} - r_1$ into consideration and get
\[
\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \left(\frac{y_i}{\rho^{1/2}} - r_1\right) = \frac{1}{\rho^{1/2}}\left(\sum_{i=1}^{n} y_i\right) - n r_1,
\]
\[
\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} \left(\frac{y_i}{\rho^{1/2}} - r_1\right)^2 = \frac{1}{\rho}\left(\sum_{i=1}^{n} y_i^2\right) - \frac{2r_1}{\rho^{1/2}}\left(\sum_{i=1}^{n} y_i\right) + n r_1^2.
\]
Using (80) and (81) we can simplify these expressions
\[
\sum_{i=1}^{n} x_i = -n r_1, \qquad \sum_{i=1}^{n} x_i^2 = \frac{n(n - 1)}{2\rho} + n r_1^2.
\]
This allows us to rephrase the constraint $g(\mathbf{x}) = 0$ as follows
\[
g(\mathbf{x}) = \sum_{i=1}^{n} \left(\frac{1}{2}x_i^2 + r_1 x_i + r_0\right) = \frac{n(n - 1)}{4\rho} - \frac{n r_1^2}{2} + n r_0 = 0
\]
and from this it is easy to find an expression for $\rho$
\[
\rho = \frac{n - 1}{2(r_1^2 - 2r_0)}.
\]
Thus the coordinates of the extreme points are the roots of the polynomial given in Theorem 2.5.

Remark 2.3. Note that Remarks 2.1 and 2.2 apply in this case as well. For more details and demonstrations of how to visualize this result see Section 2.2.2.

2.3.3 Critical points on the sphere defined by a p-norm

Definition 2.2. The $p$-norm of $\mathbf{x} \in \mathbb{R}^n$, denoted $\|\mathbf{x}\|_p$, is defined as
\[
\|\mathbf{x}\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{\frac{1}{p}}, \quad \text{for } p > 0. \tag{82}
\]

Definition 2.3. The infinity norm of $\mathbf{x} \in \mathbb{R}^n$, denoted $\|\mathbf{x}\|_\infty$, is defined as
\[
\|\mathbf{x}\|_\infty = \sup\{|x_i| : 1 \le i \le n\}. \tag{83}
\]

Definition 2.4. The sphere defined by the $p$-norm, denoted $S_p^{n-1}(r)$, for positive integer $p$, is the set of all $\mathbf{x} \in \mathbb{R}^n$ such that
\[
\sum_{i=1}^{n} |x_i|^p = \|\mathbf{x}\|_p^p = r^p. \tag{84}
\]
When $r = 1$ this is the unit sphere defined by the $p$-norm, denoted simply $S_p^{n-1}$. When $p$ increases the points on $S_p^{n-1}$ approach the points on the cube, so for convenience we define $S_\infty^{n-1}$ as the cube defined by the boundary of $[-1, 1]^n$.

Spheres defined by $p$-norms describe many well-known geometric shapes. For instance when $n = 2$, $p = 2$, then $S_2^1(r) = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = r^2\}$ is a circle, and when $n = 3$, $p = 2$, then
\[
S_2^2(r) = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1^2 + x_2^2 + x_3^2 = r^2\}
\]
is the standard 2-sphere with radius $r$.
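Definitions 2.2 to 2.4 translate directly into a few lines of code. The following is a minimal sketch using NumPy (an assumption of this illustration); `p_norm` and `on_p_sphere` are hypothetical helper names. It also illustrates numerically that the $p$-norm of a fixed point decreases towards the infinity norm as $p$ grows, which is why the spheres $S_p^{n-1}$ approach the cube.

```python
import numpy as np

def p_norm(x, p):
    """p-norm as in (82); p = np.inf gives the infinity norm (83)."""
    x = np.abs(np.asarray(x, dtype=float))
    if np.isinf(p):
        return x.max()
    return (x ** p).sum() ** (1.0 / p)

def on_p_sphere(x, p, r=1.0, tol=1e-12):
    """True if x lies on the sphere S_p^{n-1}(r), i.e. ||x||_p = r."""
    return abs(p_norm(x, p) - r) < tol

point = np.array([0.6, 0.8])  # lies on the ordinary unit circle (p = 2)
norms = {p: p_norm(point, p) for p in (2, 4, 8, 64, np.inf)}
```

For this point the values in `norms` decrease from 1 at $p = 2$ towards $0.8 = \|\mathbf{x}\|_\infty$.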
In the previous section we discussed how the extreme points of the Vandermonde determinant are distributed for the case p = 2 and n ≥ 2. In this section we will examine how the extreme points of the Vandermonde determinant are distributed on the sphere defined by the p-norm for the cases p ∈ {4, 6, 8} for a few different values of n. In n−1 Figure 2.12 Sp for p = 2, p = 4, p = 6, p = 8, and p = ∞ with a section cut out are illustrated. Similarly to the previous section we will construct a polynomial whose roots give the coordinates of the extreme points of the Vandermonde deter- minant. First we will consider the case p = 4, n = 4. 2.3.4 The case p = 4 and n = 4 We will illustrate the construction of a polynomial that has the coordinates of the points as roots with the case p = 4, n = 4. If we denote the poly- 4 nomial whose roots give the coordinates with P4 (x) and use the same type of argument that was used to get equation (75). Taking P (x) to be of the form: n n−2 n−4 P (x) = x + cn−2x + cn−4x + ··· (85) 107 110 Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions 2 Figure 2.12: Illustration of Sp for p = 2, p = 4, p = 6, p = 8, and p = ∞ with a section cut out. The outer cube corresponds to p = 0 and p = 2 corresponds to the sphere in the middle. with every other coefficient zero, when n is even of we have even powers and when n is odd we have odd powers. By identifying the powers in the differential equation (75) for the case p = 4: 00 3 0 2 P (x) + ρnx P (x) + (σnx + τnx + νn)P (x) = 0, (86) we obtain that τnxP (x) does not share any powers with any other part of the equation and thus τn = 0. Similarly, identifying the coefficients we obtain pρn + σn = 0. This leads us to the differential equation 00 3 0 2 P (x) + ρnx P (x) + (−pρnx + νn)P (x) = 0. 
(87)

Based on (85) and (87), and setting $n = 4$, $p = 4$, we obtain the system

$$S_p^{n-1}: \sum_{i=1}^{n} x_i^p = 1,$$
$$P_4^4(x) = x^n + c_{n-2} x^{n-2} + c_{n-4} x^{n-4} + \cdots,$$
$$P_4^{4\prime\prime}(x) + \rho_n x^3 P_4^{4\prime}(x) + (-p\rho_n x^2 + \nu_n) P_4^4(x) = 0.$$

It follows that

$$S_4^3: \sum_{i=1}^{4} x_i^4 = x_1^4 + x_2^4 + x_3^4 + x_4^4 = 1,$$
$$P_4^4(x) = x^4 + c_2 x^2 + c_0 \;\Rightarrow\; P_4^{4\prime}(x) = 4x^3 + 2c_2 x \;\Rightarrow\; P_4^{4\prime\prime}(x) = 12x^2 + 2c_2,$$

and by substitution into the differential equation (writing $\rho = \rho_n$ and $\nu = \nu_n$)

$$(12x^2 + 2c_2) + \rho x^3 (4x^3 + 2c_2 x) + (-p\rho x^2 + \nu)(x^4 + c_2 x^2 + c_0) = 0,$$
$$(\nu - 2\rho c_2) x^4 + (\nu c_2 - 4\rho c_0 + 12) x^2 + (2c_2 + c_0 \nu) = 0.$$

Equating these coefficients with the corresponding coefficients of $P(x)$ we get:

$$\nu - 2\rho c_2 = 1, \qquad \nu c_2 - 4\rho c_0 + 12 = c_2, \qquad 2c_2 + c_0 \nu = c_0.$$

Setting $t = x^2$ we can express $S_4^3$ and $P_4^4(x)$ as follows:

$$S_4^3: 2t_0^2 + 2t_1^2 = 2 \sum_{i=0}^{1} t_i^2 = 1,$$
$$P_4^4(x) = x^4 + c_2 x^2 + c_0 = t^2 - (t_0 + t_1) t + t_0 t_1 = 0.$$

Equating coefficients in $P_4^4(x)$ gives $t_0 + t_1 = -c_2$ and $t_0 t_1 = c_0$, so that

$$t_0^2 + t_1^2 = (t_0 + t_1)^2 - 2 t_0 t_1 = c_2^2 - 2c_0 \;\Rightarrow\; 2 \sum_{i=0}^{1} t_i^2 = 2(c_2^2 - 2c_0) = 1.$$

This gives a fourth equation, so that the system can be solved:

$$\nu - 2\rho c_2 = 1, \tag{88}$$
$$\nu c_2 - 4\rho c_0 + 12 = c_2, \tag{89}$$
$$2c_2 + c_0 \nu = c_0, \tag{90}$$
$$2(c_2^2 - 2c_0) = 1. \tag{91}$$

From (88) we obtain $\nu = 1 + 2\rho c_2$ and substituting this into (89) gives

$$c_2(1 + 2\rho c_2) - 4\rho c_0 + 12 = c_2 \;\Rightarrow\; \rho \cdot 2(c_2^2 - 2c_0) = -12 \;\Rightarrow\; \rho = -12,$$

where the last equality follows from (91). Using this value in the expression for $\nu$ we obtain $\nu = 1 - 24c_2$ and substituting this value into (90) gives

$$2c_2 + c_0(1 - 24c_2) = c_0 \;\Rightarrow\; 2c_2(1 - 12c_0) = 0 \;\Rightarrow\; 1 - 12c_0 = 0 \;\Rightarrow\; c_0 = \frac{1}{12},$$

where the step from the second to the third expression follows from $c_2 \neq 0$. Now with $\rho = -12$, $c_0 = 1/12$, using (91) we obtain

$$2(c_2^2 - 2c_0) = 1 \;\Rightarrow\; c_2^2 = \frac{1}{2} + \frac{2}{12} = \frac{8}{12} \;\Rightarrow\; c_2 = -\frac{2}{\sqrt{6}},$$

where the negative root is chosen since $t_0 + t_1 = -c_2$ is a sum of squares of real coordinates and must be positive. Therefore we obtain

$$P_4^4(x) = x^4 - \frac{2}{\sqrt{6}} x^2 + \frac{1}{12}.$$

In Section 2.3.5 we will generalise this technique somewhat.
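The result for p = 4, n = 4 can be checked numerically. The sketch below assumes the reading P_4^4(x) = x^4 - (2/sqrt(6)) x^2 + 1/12 of the final formula. It computes the four roots, verifies that they lie on the sphere defined by the 4-norm, and verifies the Lagrange condition that the sum of 1/(x_j - x_i) over i different from j is proportional to the derivative of the constraint, p x_j^(p-1), with the same constant for every j. The helper names are hypothetical.

```python
import math


def even_quartic_roots(c2, c0):
    """Real roots of x^4 + c2*x^2 + c0, found through the substitution t = x^2."""
    disc = math.sqrt(c2 * c2 - 4.0 * c0)
    roots = []
    for t in ((-c2 + disc) / 2.0, (-c2 - disc) / 2.0):
        roots.extend([math.sqrt(t), -math.sqrt(t)])
    return sorted(roots)


def lagrange_ratios(x, p):
    """For each j: (sum over i != j of 1/(x_j - x_i)) / (p * x_j^(p-1)).

    At a critical point of the Vandermonde determinant on sum x_i^p = 1
    all of these ratios equal the same Lagrange multiplier.
    """
    ratios = []
    for j, xj in enumerate(x):
        s = sum(1.0 / (xj - xi) for i, xi in enumerate(x) if i != j)
        ratios.append(s / (p * xj ** (p - 1)))
    return ratios


roots = even_quartic_roots(-2.0 / math.sqrt(6.0), 1.0 / 12.0)
constraint = sum(r ** 4 for r in roots)   # should be close to 1
ratios = lagrange_ratios(roots, 4)        # should all agree
print(roots, constraint, ratios)
```

Multiplying the Lagrange condition by x_j and summing over j shows that the common ratio must be n(n - 1)/(2p), which is 3/2 here, so the printed ratios give an independent check of the coefficients.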
2.3.5 Some results for even n and p

In this section we will discuss the case when $n$ and $p$ are positive and even integers, and $n > p$. We will discuss a method that can give the coordinates of the extreme points of the Vandermonde determinant constrained to $S_p^{n-1}$, as defined in (84), as the roots of a polynomial. First we will examine how this optimisation problem can be rewritten as a differential equation similar to (86).

Lemma 2.9. Let $n$ and $p$ be even positive integers. Consider the unit sphere given by the p-norm, in other words the surface given by

$$S_n^p = \left\{ (x_1, \ldots, x_n) \in \mathbb{R}^n \;\middle|\; \sum_{i=1}^{n} x_i^p = 1 \right\}.$$

There exists a second order differential equation

$$P_n^{p\prime\prime}(x) - \frac{a_{p-2}}{n} x^{p-1} P_n^{p\prime}(x) + Q_n^p(x) P_n^p(x) = 0, \tag{92}$$

where $P_n^p(x)$ and $Q_n^p(x)$ are of the forms

$$P_n^p(x) = x^n + \sum_{i=0}^{\frac{1}{2}n - 1} c_{2i} x^{2i}$$

and

$$Q_n^p(x) = -a_{p-2} x^{p-2} + \sum_{i=0}^{\frac{1}{2}p - 2} (-1)^i a_{2i} x^{2i}.$$

There is also a relation between the coefficients of $P_n^p$ and $Q_n^p$ given by

$$2j(2j-1) c_{2j} + \left( \sum_{k=0}^{j-1} a_{2k} c_{2(j-k-1)} \right) + \frac{n + p - 2j}{n} a_{p-2} c_{2j-p} = 0 \tag{93}$$

for $1 \le j \le \frac{n+p-2}{2}$, where $c_n = 1$, $c_k = 0$ for $k \notin \{0, 2, 4, \ldots, n\}$ and $a_k = 0$ for $k \notin \{0, 2, 4, \ldots, p-2\}$.

Proof. This result is proved analogously to how (75) is found. Define

$$P_n^p(x) = \prod_{i=1}^{n} (x - x_i)$$

and note that

$$\frac{1}{2} \frac{P_n^{p\prime\prime}(x_j)}{P_n^{p\prime}(x_j)} = \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i}.$$

Now apply the method of Lagrange multipliers and see that in the critical points

$$\sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i} = \rho R'(x_j),$$

where $\rho$ is some unknown constant. In each critical point we can combine the two expressions and get

$$P_n^{p\prime\prime}(x_j) - 2\rho R'(x_j) P_n^{p\prime}(x_j) = 0, \quad j = 1, 2, \ldots, n,$$

for some $\rho \in \mathbb{R}$.
Since each xj is a root of f(x) we see that the left hand side in the differential equation must be a polynomial with the same roots p as Pn (x), thus we can conclude that for any x ∈ R p00 0 p0 Pn (x) − 2ρR (x)Pn (x) − Q(x)f(x) = 0 where Q(x) is a polynomial of degree p − 2. By applying the principles of polynomial solutions to linear second order differential equation [10, 50], expanding the expression and matching the coefficients of the terms with different powers of x you can see that the coefficients of P (x) and Q(x) must obey the relation given in (93). Noting that the relations between the two sets of coefficients are lin- ear we will consider the equations given by (93) corresponding to j ∈ n n−2 n n+p−2 o 2 , 2 ,..., 2 , the corresponding system of equations in matrix form becomes p cn−2 cn−4 cn−6 ··· c4 n cn−p−2 a0 −n(n − 1) p−2 1 cn−2 cn−4 ··· c6 n cn−p a2 0 p−4 0 1 cn−2 ··· c8 cn−p+2 a4 0 n = . ...... . . ...... . . 4 0 0 0 ··· cn−2 n cn−4 ap−4 0 2 0 0 0 ··· 1 n cn−2 ap−2 0 (94) n+p−2 By solving this system we can reduce the 2 equations given by n−2 matching the terms to 2 equations that together with the condition given by (84) gives a system of polynomial equations that determines all the un- known coefficients of P (x). To describe how we can express the solution to (94) we will use a few well- known relations between elementary symmetric polynomials and power sums often referred to as the Newton–Girard formulae (Theorem 2.7), and Vieta’s formula (Theorem 2.6) that describes the relation between the coefficients of a polynomial and its roots. Here we will give some useful properties of elementary symmetric poly- nomials and power sums and relations between them. 111 114 Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions Definition 2.5. The elementary symmetric polynomials are defined by n X e1(x1, . . . , xn) = xi, i=1 X e2(x1, . . . , xn) = xi1 xi2 , 1≤i1 Theorem 2.6 (Vieta’s formula). 
Suppose x1, . . . , xn are the n roots of a polynomial n n−1 x + c1x + ... + cn. k Then ck = (−1) ek(x1, . . . , xn). Definition 2.6. A power sum is an expression of the form n X k pk(x1, . . . , xn) = xi . i=1 Theorem 2.7 (Newton–Girard formulae). The Newton–Girard formulae can be expressed in many ways. For us the most useful version is the de- terminantal expressions. Let ek = ek(x1, . . . , xn) and pk = pk(x1, . . . , xn) denote the elementary symmetric polynomials and the power sums as in Definitions 2.5 and 2.6. Then the power sum can be expressed in terms of elementary symmetric polynomials in this way e1 1 0 ··· 0 0 2e2 e1 1 ··· 0 0 3e3 e2 e1 ··· 0 0 p = . k ...... (p − 1)en−1 en−2 en−3 ··· e1 1 pen en−1 en−2 ··· e2 e1 Proof. See for example [198]. 112 115 2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL Lemma 2.10. Using the following notation 2m cm cm−1 cm−2 ··· c2 n c1 2m−2 1 cm cm−1 ··· c3 n c2 2m−4 0 1 cm ··· c4 c3 t (c , c , . . . , c ) = n (95) n 1 2 m ...... 4 0 0 0 ··· cm n cm−1 2 0 0 0 ··· 1 n cm 2c and tn(c) = then tn can be written n p−1 X n(r1 + r2 + ··· + rn − 1) Y ri tn(c1, . . . , cp) = cp−i r1!r2! ··· rn! r1+2r2+3r3+···+nrn=n i=0 r1≥0, ..., rn≥0 and it obeys the recursive relation p 2p X t (c , . . . , c ) = c − c t (c , . . . , c ). n 1 p n 1 i+1 n p−i+2 p i=2 Proof. Comparing the expression for tn with the relations given in Theo- rem 2.7 it is clear that these relations are equivalent to the Newton-Girard formulae with some minor modifications. Lemma 2.11. For even n and p the condition (84) can be rewritten as −n tn(cn−p−2, cn−p, . . . , cn−2) = 1 where tn is defined by (95). n X p Proof. Note that the expression gp(x1, . . . , xn) = xi = 1 is a power sum. 1 By Theorem 2.7 the following relation holds: e1 1 0 ··· 0 0 2e2 e1 1 ··· 0 0 3e3 e2 e1 ··· 0 0 g (x) = p ...... (p − 1)en−1 en−2 en−3 ··· e1 1 pen en−1 en−2 ··· e2 e1 where ek is the k:th elementary symmetric polynomial of x1, ... , xn. 
Using Vieta’s formula we can relate the elementary symmetric polynomials to the coefficients of P (x) by noting that n 2 −1 n 2n X 2j X k n−k P (x) = x + c2jx = (−1) ekx j=1 k=1 or more compactly e2k = cn−2k. 113 116 Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions With e2k = cn−2k and e2k+1 = 0 we get 0 1 0 ··· 0 0 2cn−2 0 1 ··· 0 0 0 cn−2 0 ··· 0 0 4cn−4 0 cn−2 ··· 0 0 gp(x) = ...... 0 cn−p−2 0 ··· 0 1 pcn−p 0 cn−p−2 ··· cn−2 0 Using Laplace expansion on every other row gives 0 1 0 0 ··· 0 0 2cn−2 1 0 ··· 0 0 2cn−2 0 1 0 ··· 0 0 0 0 1 ··· 0 0 0 cn−2 0 1 ··· 0 0 4cn−4 cn−2 0 ··· 0 0 g (x) = 4cn−4 0 cn−2 0 ··· 0 0 = − p ...... 0 0 c4 ··· 0 1 0 c2 0 c4 ··· 0 1 pc0 c2 0 ··· cn−2 0 pcn−p 0 c2 0 ··· cn−2 0 2cn−2 1 0 0 ··· 0 0 2cn−2 1 0 ··· 0 0 4cn−4 cn−2 1 0 ··· 0 0 4cn−4 cn−2 1 ··· 0 0 0 0 0 1 ··· 0 0 6cn−6 cn−4 cn−2 ··· 0 0 = = − ...... 0 0 0 0 ··· 0 1 (p − 2)c2 c4 c6 ··· cn−2 1 pc0 c2 c4 c6 ··· cn−2 0 pc0 c2 c4 ··· en−4 cn−2 p cp cp−1 cp−2 ··· c2 n c1 p−2 1 cp cp−1 ··· c3 n c2 p−4 0 1 cp ··· c4 n c3 p 2 = −n ...... = (−1) n tn(c2, c4, . . . , cp) ...... 4 0 0 0 ··· cp n cp−1 2 0 0 0 ··· 1 n cp Thus gp(x1, . . . , xn) = 1 is equivalent to −ntn(c2, c4, . . . , cp) = 1. Lemma 2.12. The coefficients of the polynomial Q(x) in (92) can be ex- pressed using the coefficients of P (x) as follows p a = (−1)k+1n2(n − 1)t (c , . . . , c ), k = 1, 2,..., . (96) 2k−2 n n−p+2k+2 n−2 2 114 117 2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL Proof. By (94) we can write p −1 a0 cn−2 cn−4 cn−6 ··· cn−p−4 n cn−p−2 −n(n − 1) p−2 a2 1 cn−2 cn−4 ··· cn−p−6 n cn−p 0 p−4 a4 0 1 cn−2 ··· cn−p−8 cn−p+2 0 = n . . ...... . . ...... . 4 ap−4 0 0 0 ··· cn−2 n cn−4 0 2 ap−2 0 0 0 ··· 1 n cn−2 0 det(Tn,p,k) and using Cramer’s rule we get ap−2k = where tn(cn−p−2, . . . 
, cn−2) p cn−2 cn−4 ··· cn−2k+2 −n(n − 1) cn−2k−2 ··· n cn−p−2 p−2 1 cn−2 ··· cn−2k 0 cn−2k−4 ··· cn−p n 0 1 ··· c 0 c ··· p−4 c n−2k−2 n−2k−6 n n−p+2 Tn,p,k = ...... . ...... ··· . 4 0 0 ··· 0 0 0 ··· n cn−4 2 0 0 ··· 0 0 0 ··· n cn−2 | {z } M By moving the kth column to the first column and using Laplace expansion det(Tk) can be rewritten on the form 1 cn−2 ··· cn−2k 0 1 ··· cn−2k−2 ...... 0 0 ··· 1 k det(Tn,p,k) =(−1) n(n − 1) 0 0 ··· 0 M = −n(n − 1)|M| 0 0 ··· 0 ...... 0 0 ··· 0 0 0 ··· 0 p−2k cn−2 ··· cn−p+2k cn−p+2k+2 n 1 ··· c p−2k−2 c n−p+2k+2 n n−p+2k+4 . . . . = − n(n − 1) . .. . . 4 0 ··· cn−2 n cn−4 2 0 ··· 1 n cn−2 k =(−1) n(n − 1)tn(cn−p+2k+2, . . . , cn−2) −1 We can use Lemma 2.11 to see that t (c , . . . , c ) = and thus n n−p−2 n−2 n det(Tn,p,k) k+1 2 ap−2k = = (−1) n (n − 1)tn(cn−p+2k+2, . . . , cn−2) tn(cn−p−2, . . . , cn−2) 115 118 Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions p Theorem 2.8. The non-zero coefficients, c2k, in Pn that solves (92) can be found by solving the polynomial equation system given by j−1 ! X p−2k+1 2 2j(2j − 1)c2j + (−1) n (n − 1)tn(cn−2k+2, . . . , cn−2) k=0 + n(n − 1)(n + p − 2j)tn(cn−p+4, . . . , cn−2) = 0, n for j = 0,..., 2 − 1. Proof. The equation system is found by using (96) to substitute ak in (93). Using Lagrange multipliers directly gives a polynomial equation system n with n equations while Theorem 2.8 gives 2 equations. As an example we can consider the case n = 8, p = 4. Matching the coefficients for (92) gives the system a c + 2c = 0, 0 0 0 a0c2 + a2c0 + 12c4 = 0, 3 30c + a c + a c = 0, 6 0 4 4 2 2 1 56 + a c + a c = 0, 0 6 2 2 4 1 a + a c = 0, 0 4 2 6 7 2 and rewriting the constraint that the points lie on S4 gives 2c6 − 4c4 = 0. In this case the expressions for a0 and a2 becomes quite simple ( a0 = −112c6, a2 = 448. 
By resubstituting the expressions into the system, or using Theorem 2.8 directly an equation systems for the c0, c2, c4 and c6 is given by 112c0c6 + 2c0 = 0, −112c2c6 + 448c0 + 12c4 = 0, −112c c + 332c + 30c = 0, 4 6 2 6 2 −2c6 + 4c4 + 1 = 0. The authors are not aware of any method that can be used to easily and reliably solve the system given by Theorem 2.8. In Table 2.2 results for a number of systems, both with even and odd n and various values for p are given. These were found by manually experimentation combined with computer aided symbolic computations. 116 119 2.3 OPTIMIZATION OF THE VANDERMONDE DETERMINANT ON SURFACES DEFINED BY A UNIVARIATE POLYNOMIAL n = 2 √ 2 3 2 2 1 2 2 1 2 2 2 3 2 2 2 4 P2 (x) = x − 2 , P4 (x) = x − 2 2, P6 (x) = x − 2 , P8 (x) = x − 2 n = 3 √ 2 3 3 3 1 4 3 1 3 3 2 3 3 2 2 4 P2 (x) = x − 2 x, P3 (x) = x − 2 2x, P6 (x) = x − 2 x, P8 (x) = x − 2 n = 4 √ P 4(x) = x4 − 1 x2 + 1 , P 4(x) = x4 − 6 x2 + 1 , 2 2 √ 48 4 √3 √12 4 4 1 1 2 1 2 P (x) = x − ( 33 + 1) 3 x + 9 − 33 ( 33 + 1) 3 6 √4 96 √ 1 √ p √ 4 4 3 4 2 1 P8 (x) = x − 6 (30 5 − 30) x + 120 5 − 5 30 5 − 30 n = 5 √ 1 2 P 5(x) = x5 − 1 x, P 5(x) = x5 − 2 5 x3 + 3 x, P 5(x) = x5 − 10 3 x3 + 10 3 x 2 √4 4 5 20 6 2 20 √ 1 √ p √ 5 5 10 4 3 1 P8 (x) = x − 10 (50 13 + 10) x + 1800 5 13 − 55 50 13 + 10 n = 6 6 6 1 4 1 2 1 P2 (x) = x − 2 x + 20 x − 1800 √ √ √ √ √ √ 6 6 50+20 5 4 5 2 (−4+2 5) 50+20 5 P4 (x) = x − 10 x + 10 x − 600 n = 7 7 7 1 5 5 3 5 P2 (x) = x − 2 x + 84 x − 3528 √ √ √ √ √ √ 7 7 1050+84 109 5 1 109 3 (−16+2 109) 105+84 109 P4 (x) = x − 42 x + 21 + 42 x − 10584 n = 8 8 8 1 6 15 4 15 2 15 P2 (x) = x − 2 x + 224 x − 6272 x + 1404928 , √ √ √ 8 8 140+42 6 6 3 3 6 4 P4 (x) = x − 14 x + 28 + 28 x √ 3 √ √ √ −(140+42 6) 2 29 140+42 6 2 3 6 − 16464 + 2352 x − 3136 + 1568 n Table 2.2: Polynomials, Pp , whose roots give the coordinates of the extreme points of the Vandermonde determinant on the sphere defined by the p-norm in n dimensions. 
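The p = 2 entries of Table 2.2 can be cross-checked independently. The sketch below assumes that on the unit sphere (p = 2) the coordinates are the roots of the rescaled physicists' Hermite polynomial H_n(sqrt(n(n-1)/2) x), which is how the unit-sphere case of the Hermite result quoted in Section 2.3.7 is read here. It builds H_n from the standard recurrence H_{k+1}(z) = 2z H_k(z) - 2k H_{k-1}(z) in exact rational arithmetic and normalises the polynomial to be monic in x. The function names are illustrative only.

```python
from fractions import Fraction


def hermite_coeffs(n):
    """Integer coefficients [a_0, ..., a_n] of the physicists' Hermite polynomial H_n."""
    prev, cur = [1], [0, 2]            # H_0 = 1, H_1 = 2z
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + [2 * c for c in cur]   # 2z * H_k
        for i, c in enumerate(prev):
            nxt[i] -= 2 * k * c            # - 2k * H_{k-1}
        prev, cur = cur, nxt
    return cur


def unit_sphere_polynomial(n):
    """Monic polynomial in x proportional to H_n(sqrt(n(n-1)/2) * x).

    Returns exact coefficients [b_0, ..., b_n] with b_n = 1.
    """
    scale_sq = Fraction(n * (n - 1), 2)    # square of the rescaling factor
    h = hermite_coeffs(n)
    coeffs = []
    for j, a in enumerate(h):
        if (n - j) % 2:                    # H_n has only every other power
            coeffs.append(Fraction(0))
        else:
            coeffs.append(Fraction(a, h[n]) / scale_sq ** ((n - j) // 2))
    return coeffs


for n in (4, 6, 8):
    print(n, unit_sphere_polynomial(n))
```

For n = 4 this gives x^4 - x^2/2 + 1/48 and for n = 6 it gives x^6 - x^4/2 + x^2/20 - 1/1800, in agreement with the corresponding table entries.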
2.3.6 Some results for cubes and intersections of planes

It can be noted that when $p \to \infty$ then $S_p^{n-1}$, as defined in the previous section, will converge towards the cube. A technique similar to the one described for surfaces implicitly defined by a univariate polynomial can be employed on the cube. The maximum value for the Vandermonde determinant on the cube $[-1, 1]^n$ has been known for a long time (at least since [90]). Here we will show a short derivation.

Theorem 2.9. The coordinates of the critical points of $v_n(x)$ on the cube $x \in [-1, 1]^n$ are given by $x_1 = -1$, $x_n = 1$ and the remaining coordinates $x_2, \ldots, x_{n-1}$ equal to the roots of $P_{n-2}(x)$, where $P_n$ are the Legendre polynomials

$$P_n(x) = 2^n \sum_{k=0}^{n} x^k \binom{n}{k} \binom{\frac{n+k-1}{2}}{n},$$

or some permutation of them.

Proof. It is easy to show that the coordinates $-1$ and $+1$ must be present in the maximum points: if they were not, we could rescale the point so that the value of $v_n(x)$ is increased, so the point could not be a maximum. We may thus assume the ordered sequence of coordinates $-1 = x_1 < \cdots < x_n = +1$. The Vandermonde determinant then becomes

$$v_n(x) = 2 \prod_{i=2}^{n-1} (1 + x_i)(1 - x_i) \prod_{1 < i < j < n} (x_j - x_i).$$

[…]

and thus the left hand side of the expression must form a polynomial that can be expressed as some multiple of $f(x)$:

$$(x^2 - 1) f''(x) + 2x f'(x) - \sigma f(x) = 0. \tag{97}$$

The constant $\sigma$ is found by considering the coefficient for $x^{n-2}$:

$$(n-2)(n-3) + 2(n-2) - \sigma = 0 \;\Leftrightarrow\; \sigma = (n-2)(n-1).$$

This gives us the differential equation that defines the Legendre polynomial $P_{n-2}(x)$ [2].

The technique above can also easily be used to find critical points on the intersection of two planes given by $x_1 = a$ and $x_n = b$, $b > a$.

Theorem 2.10.
The coordinates of the critical points of $v_n(x)$ on the intersection of two planes given by $x_1 = a$ and $x_n = b$ are given by $x_1 = a$, $x_n = b$ and the remaining coordinates $x_2, \ldots, x_{n-1}$ equal to the roots of $P_{n-2}\!\left(\frac{2(x-a)}{b-a} - 1\right)$, where $P_n$ are the Legendre polynomials

$$P_n(x) = 2^n \sum_{k=0}^{n} x^k \binom{n}{k} \binom{\frac{n+k-1}{2}}{n},$$

or some permutation of them.

Proof. We assume the ordered sequence of coordinates $a = x_1 < \cdots < x_n = b$. The Vandermonde determinant then becomes

$$v_n(x) = (b - a) \prod_{i=2}^{n-1} (x_i - a)(b - x_i) \prod_{1 < i < j < n} (x_j - x_i).$$

[…]

and thus the left hand side of the expression must form a polynomial that can be expressed as some multiple of $f(x)$:

$$(x - a)(x - b) f''(x) + (2x - a - b) f'(x) - \sigma f(x) = 0.$$

The constant $\sigma$ is found by considering the coefficient for $x^{n-2}$:

$$(n-2)(n-3) + 2(n-2) - \sigma = 0 \;\Leftrightarrow\; \sigma = (n-2)(n-1).$$

The resulting differential equation is

$$(x - a)(x - b) f''(x) + (2x - a - b) f'(x) - (n-2)(n-1) f(x) = 0.$$

If we change variables according to $y = \frac{x-a}{b-a}$ and let $g(y) = f(y(b-a) + a)$ then the differential equation becomes

$$y(y - 1) g''(y) + (2y - 1) g'(y) - (n-1)(n-2) g(y) = 0,$$

which we can recognize as a special case of Euler's hypergeometric differential equation whose solution can be expressed as

$$g(y) = c \cdot {}_2F_1(2 - n, n - 1; 1; y),$$

for some arbitrary $c \in \mathbb{R}$, where ${}_2F_1$ is the hypergeometric function [2]. In this case the hypergeometric function is a polynomial and relates to the Legendre polynomials as follows:

$${}_2F_1(2 - n, n - 1; 1; y) = P_{n-2}(1 - 2y).$$

Since $P_{n-2}(-z) = (-1)^{n} P_{n-2}(z)$, it is sufficient to consider the roots of $P_{n-2}\!\left(\frac{2(x-a)}{b-a} - 1\right)$. For $a = -1$, $b = 1$ this reduces to Theorem 2.9.

2.3.7 Optimising the probability density function of the eigenvalues of the Wishart matrix

This section is based on Section 5 of Paper D.

Here we will show an example of how the results in Section 2.2 can be applied to find the extreme points of the eigenvalue distribution of the ensembles discussed in Section 1.1.7.

Lemma 2.13.
Suppose we have a Wishart distributed matrix $W$ with the probability density function of its eigenvalues given by

$$\mathbb{P}(\lambda) = C_n v_n(\lambda)^m \exp\left( -\frac{\beta}{2} \sum_{k=1}^{n} P(\lambda_k) \right), \tag{98}$$

where $C_n$ is a normalising constant, $m$ is a positive integer, $\beta > 1$ and $P$ is a polynomial with real coefficients. Then the vector of eigenvalues of $W$ will lie on the surface defined by

$$\sum_{k=1}^{n} P(\lambda_k) = \mathrm{Tr}(P(W)). \tag{99}$$

Proof. Since $W$ is symmetric it will, by Lemma 1.2, have real eigenvalues. By Lemma 1.1

$$\sum_{k=1}^{n} P(\lambda_k) = \mathrm{Tr}(P(W))$$

and thus the point given by $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$ will be on the surface defined by (99).

To find the maximum values we can use the method of Lagrange multipliers and find vectors of eigenvalues such that

$$\frac{\partial \mathbb{P}}{\partial \lambda_k} = \eta \frac{\partial}{\partial \lambda_k} \left( \mathrm{Tr}(P(W)) - \sum_{k=1}^{n} P(\lambda_k) \right) = -\eta \frac{dP(\lambda_k)}{d\lambda_k}, \quad k = 1, \ldots, n,$$

where $\eta$ is some real-valued constant. Computing the left-hand side gives

$$\frac{\partial \mathbb{P}}{\partial \lambda_k} = \mathbb{P}(\lambda) \left( -\frac{\beta}{2} \frac{dP(\lambda_k)}{d\lambda_k} + \sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{m}{\lambda_k - \lambda_i} \right).$$

Thus the stationary points of (98) on the surface given by (99) are the solutions to the equation system

$$\mathbb{P}(\lambda) \left( -\frac{\beta}{2} \frac{dP(\lambda_k)}{d\lambda_k} + \sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{m}{\lambda_k - \lambda_i} \right) = -\eta \frac{dP(\lambda_k)}{d\lambda_k}, \quad k = 1, \ldots, n.$$

If we denote the value of $\mathbb{P}$ in a stationary point by $\mathbb{P}_s$ then the system above can be rewritten as

$$\sum_{\substack{i=1 \\ i \neq k}}^{n} \frac{1}{\lambda_k - \lambda_i} = \frac{1}{m} \left( \frac{\beta}{2} - \frac{\eta}{\mathbb{P}_s} \right) \frac{dP(\lambda_k)}{d\lambda_k} = \rho \frac{dP(\lambda_k)}{d\lambda_k}, \quad k = 1, \ldots, n. \tag{100}$$

The equation system described by (100) appears when one tries to optimize the Vandermonde determinant on a surface defined by a univariate polynomial. This equation system can be rewritten as an ordinary differential equation. For more details see Section 2.3.

Consider the polynomial

$$f(\lambda) = \prod_{i=1}^{n} (\lambda - \lambda_i)$$

and note that

$$\frac{1}{2} \frac{f''(\lambda_j)}{f'(\lambda_j)} = \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{\lambda_j - \lambda_i}.$$

Thus in each of the extreme points we will have the relation

$$\left. \frac{d^2 f}{d\lambda^2} \right|_{\lambda = \lambda_j} - 2\rho \left. \frac{dP}{d\lambda} \right|_{\lambda = \lambda_j} \left. \frac{df}{d\lambda} \right|_{\lambda = \lambda_j} = 0, \quad j = 1, 2, \ldots, n,$$

for some $\rho \in \mathbb{R}$. Since each $\lambda_j$ is a root of $f(\lambda)$ we see that the left hand side in the differential equation must be a polynomial with the same roots as $f(\lambda)$, thus we can conclude that for any $\lambda \in \mathbb{R}$

$$\frac{d^2 f}{d\lambda^2} - 2\rho \frac{dP}{d\lambda} \frac{df}{d\lambda} - Q(\lambda) f(\lambda) = 0, \tag{101}$$

where $Q$ is a polynomial of degree $\deg(P) - 2$.

Consider the $\beta$-ensemble described by (16). For this ensemble the polynomial that defines the surface that the eigenvalues will be on is $P(\lambda) = \lambda^2$. Thus by Lemma 2.13 the surface becomes a sphere with radius $\sqrt{\mathrm{Tr}(W^2)}$. The solution to the equation system given by (100) was found in Section 2.2. The solution is given as the roots of a polynomial, in this case the roots of the rescaled Hermite polynomials. The explicit expression for the polynomial whose roots give the maximum points is

$$f(x) = H_n\!\left( \left( \frac{n-1}{2(r_1^2 - 2r_0)} \right)^{\frac{1}{2}} (x + r_1) \right) = n! \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{(-1)^i}{i!} \left( \frac{n-1}{2(r_1^2 - 2r_0)} \right)^{\frac{n-2i}{2}} \frac{(2(x + r_1))^{n-2i}}{(n-2i)!}, \tag{102}$$

where $H_n$ denotes the $n$th (physicist) Hermite polynomial [2].

The solution on the unit sphere can then be used to find the vector of eigenvalues that maximizes the probability density function $\mathbb{P}(\lambda)$ given by (16). Since rescaling the vector of eigenvalues affects the probability density depending on the length of the original vector in the following way

$$\mathbb{P}(c\lambda) = c^{\frac{n(n-1)m}{2}} \exp\left( \frac{\beta}{2} (1 - c^2) |\lambda|^2 \right) \mathbb{P}(\lambda),$$

the unit sphere solution can be rescaled so that it ends up on the appropriate sphere.

Chapter 3

Approximation of electrostatic discharge currents using the analytically extended function

This chapter is based on Papers E, F and G.

Paper E. Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. On some properties of the multi-peaked analytically extended function for approximation of lightning discharge currents.
Chapter 10 in Engineering Mathematics I: Electromagnetics, Fluid Mechanics, Material Physics and Financial Engineering, Volume 178 of Springer Proceedings in Mathematics & Statistics, Sergei Silvestrov and Milica Rančić (Eds), Springer International Publishing, pages 151–176, 2016.

Paper F. Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Estimation of parameters for the multi-peaked AEF current functions. Methodology and Computing in Applied Probability, Volume 19, Issue 4, pages 1107–1121, 2017.

Paper G. Karl Lundengård, Milica Rančić, Vesna Javor and Sergei Silvestrov. Electrostatic discharge currents representation using the analytically extended function with p peaks by interpolation on a D-optimal design. Facta Universitatis Series: Electronics and Energetics, Volume 32, Issue 1, pages 25–49, 2019.

3.1 The analytically extended function (AEF)

In this section we consider least squares approximation using a particular function, which we call the power-exponential function, as a basic component.

Definition 3.1. Here we will refer to the function defined by (103) as the power-exponential function,

$$x(\beta; t) = \left( t e^{1-t} \right)^{\beta}, \quad 0 \le t. \tag{103}$$

For non-negative values of $t$ and $\beta$ the power-exponential function has a steeply rising initial part followed by a more slowly decaying part, see Figure 3.1. This makes it qualitatively similar to several functions that are popular for approximation of important phenomena in different fields such as approximation of lightning discharge currents and pharmacokinetics. Examples include the biexponential function [38], [256], the Heidler function [117] and the Pulse function [299].

Figure 3.1: An illustration of how the steepness of the power-exponential function varies with $\beta$.

The power-exponential function has been used in other applications, for example to model attack rate of predatory fish, see [232, 233].
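The qualitative behaviour described above is easy to reproduce. The following minimal sketch evaluates the power-exponential function (103), read here as (t e^(1-t))^beta, through logarithms; the function name and the choice to reject negative beta are assumptions made for this illustration.

```python
import math


def power_exponential(beta, t):
    """Power-exponential function (103): x(beta; t) = (t * e^(1 - t))^beta, t >= 0."""
    if t < 0 or beta < 0:
        raise ValueError("t and beta must be non-negative")
    if t == 0:
        return 1.0 if beta == 0 else 0.0
    # Evaluate through logarithms to avoid overflow for large beta.
    return math.exp(beta * (math.log(t) + 1.0 - t))


# The peak is at t = 1 for every beta > 0; a larger beta gives a steeper rise.
for beta in (0.5, 2.0, 10.0):
    print(beta, [round(power_exponential(beta, t), 4) for t in (0.25, 0.5, 1.0, 2.0, 4.0)])
```

Since t e^(1-t) is at most 1 with equality only at t = 1, every curve peaks at the value 1 at t = 1, and increasing beta sharpens the peak, which is the behaviour illustrated in Figure 3.1.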
Here we examine linear combinations of piecewise power-exponential functions that will be used in later sections to approximate electrostatic discharge current functions.

3.1.1 The p-peak analytically extended function

This section is based on Section 2 of Paper E.

The p-peaked AEF is constructed using the power-exponential function given in Definition 3.1. In order to get a function with multiple peaks, where the steepness of the rise between the peaks and the slope of the decaying part do not depend on each other, we define the analytically extended function (AEF) as a function that consists of piecewise linear combinations of the power-exponential function that has been scaled and translated so that the resulting function is continuous. With given differences in height between subsequent peaks $I_{m_1}, I_{m_2}, \ldots, I_{m_p}$, corresponding times $t_{m_1}, t_{m_2}, \ldots, t_{m_p}$, integers $n_q > 0$ and real values $\beta_{q,k}$, $\eta_{q,k}$, $1 \le q \le p+1$, $1 \le k \le n_q$, such that the sum over $k$ of $\eta_{q,k}$ is equal to one, the p-peaked AEF $i(t)$ is given by (104).

Definition 3.2. Given $I_{m_q} \in \mathbb{R}$ and $t_{m_q} \in \mathbb{R}$, $q = 1, 2, \ldots, p$, such that $t_{m_0} = 0 < t_{m_1} < t_{m_2} < \ldots < t_{m_p}$, along with $\eta_{q,k}, \beta_{q,k} \in \mathbb{R}$ and $0 < n_q \in \mathbb{Z}$ for $q = 1, 2, \ldots, p+1$, $k = 1, 2, \ldots, n_q$, such that $\sum_{k=1}^{n_q} \eta_{q,k} = 1$, the analytically extended function (AEF), $i(t)$, with $p$ peaks is defined as

$$i(t) = \begin{cases} \left( \displaystyle\sum_{k=1}^{q-1} I_{m_k} \right) + I_{m_q} \displaystyle\sum_{k=1}^{n_q} \eta_{q,k} \, x_q(t)^{\beta_{q,k}^2 + 1}, & t_{m_{q-1}} \le t \le t_{m_q}, \; 1 \le q \le p, \\[2ex] \left( \displaystyle\sum_{k=1}^{p} I_{m_k} \right) \displaystyle\sum_{k=1}^{n_{p+1}} \eta_{p+1,k} \, x_{p+1}(t)^{\beta_{p+1,k}^2}, & t_{m_p} \le t, \end{cases} \tag{104}$$

where

$$x_q(t) = \begin{cases} \dfrac{t - t_{m_{q-1}}}{\Delta t_{m_q}} \exp\left( \dfrac{t_{m_q} - t}{\Delta t_{m_q}} \right), & 1 \le q \le p, \\[2ex] \dfrac{t}{t_{m_p}} \exp\left( 1 - \dfrac{t}{t_{m_p}} \right), & q = p + 1, \end{cases}$$

and $\Delta t_{m_q} = t_{m_q} - t_{m_{q-1}}$.

Sometimes the notation $i(t; \beta, \eta)$ with

$$\beta = \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \ldots & \beta_{q,k} & \ldots & \beta_{p+1,n_{p+1}} \end{bmatrix}, \qquad \eta = \begin{bmatrix} \eta_{1,1} & \eta_{1,2} & \ldots & \eta_{q,k} & \ldots & \eta_{p+1,n_{p+1}} \end{bmatrix},$$

will be used to clarify what the particular parameters for a certain AEF are.

Remark 3.1.
The p-peak AEF can be written more compactly if we introduce the vectors

$$\eta_q = \begin{bmatrix} \eta_{q,1} & \eta_{q,2} & \ldots & \eta_{q,n_q} \end{bmatrix}^\top, \tag{105}$$

$$\mathbf{x}_q(t) = \begin{cases} \begin{bmatrix} x_q(t)^{\beta_{q,1}^2 + 1} & x_q(t)^{\beta_{q,2}^2 + 1} & \ldots & x_q(t)^{\beta_{q,n_q}^2 + 1} \end{bmatrix}^\top, & 1 \le q \le p, \\[1ex] \begin{bmatrix} x_q(t)^{\beta_{q,1}^2} & x_q(t)^{\beta_{q,2}^2} & \ldots & x_q(t)^{\beta_{q,n_q}^2} \end{bmatrix}^\top, & q = p + 1. \end{cases} \tag{106}$$

The more compact form is

$$i(t) = \begin{cases} \left( \displaystyle\sum_{k=1}^{q-1} I_{m_k} \right) + I_{m_q} \cdot \eta_q^\top \mathbf{x}_q(t), & t_{m_{q-1}} \le t \le t_{m_q}, \; 1 \le q \le p, \\[2ex] \left( \displaystyle\sum_{k=1}^{p} I_{m_k} \right) \cdot \eta_q^\top \mathbf{x}_q(t), & t_{m_p} \le t, \; q = p + 1. \end{cases} \tag{107}$$

If the AEF is used to model an electrical current, then the derivative of the AEF determines the induced electrical voltage in conductive loops in the lightning field. For this reason it is desirable to guarantee that the first derivative of the AEF is continuous. Since the AEF is a linear function of elementary functions its derivative can be found using standard methods.

Theorem 3.1. The derivative of the p-peak AEF is

$$\frac{di(t)}{dt} = \begin{cases} I_{m_q} \dfrac{t_{m_q} - t}{t - t_{m_{q-1}}} \dfrac{1}{\Delta t_{m_q}} \, \eta_q^\top B_q \, \mathbf{x}_q(t), & t_{m_{q-1}} \le t \le t_{m_q}, \; 1 \le q \le p, \\[2ex] \left( \displaystyle\sum_{k=1}^{p} I_{m_k} \right) \dfrac{t_{m_p} - t}{t \, t_{m_p}} \, \eta_q^\top B_q \, \mathbf{x}_q(t), & t_{m_p} \le t, \; q = p + 1, \end{cases} \tag{108}$$

where

$$B_q = \begin{bmatrix} \beta_{q,1}^2 + 1 & 0 & \ldots & 0 \\ 0 & \beta_{q,2}^2 + 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \beta_{q,n_q}^2 + 1 \end{bmatrix} \text{ for } 1 \le q \le p, \qquad B_{p+1} = \begin{bmatrix} \beta_{p+1,1}^2 & 0 & \ldots & 0 \\ 0 & \beta_{p+1,2}^2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \beta_{p+1,n_{p+1}}^2 \end{bmatrix}.$$

Proof. From the definition of the AEF (see (104)) and the derivative of the power-exponential function (103) given by

$$\frac{d}{dt} x(\beta; t) = \beta (1 - t) t^{\beta - 1} e^{\beta(1 - t)},$$

expression (108) can easily be derived since differentiation is a linear operation and the result can be rewritten in the compact form analogously to (107).

An illustration of the AEF and its derivative for various values of the $\beta_{q,k}$-parameters is shown in Figure 3.2.

Figure 3.2: Illustration of the AEF (solid line) and its derivative (dashed line) with different $\beta_{q,k}$-parameters but the same $I_{m_q}$ and $t_{m_q}$.
(a) $0 < \beta_{q,k} < 1$, (b) $4 < \beta_{q,k} < 5$, (c) $12 < \beta_{q,k} < 13$, (d) a mixture of large and small $\beta_{q,k}$-parameters.

Lemma 3.1. The AEF is continuous and at each $t_{m_q}$ the derivative is equal to zero.

Proof. Within each interval $t_{m_{q-1}} \le t \le t_{m_q}$ the AEF is a linear combination of continuous functions and at each $t_{m_q}$ the function will approach the same value from both directions unless all $\eta_{q,k} \le 0$, but if all $\eta_{q,k} \le 0$ then $\sum_{k=1}^{n_q} \eta_{q,k} \neq 1$.

Noting that for any diagonal matrix $B$ the expression

$$\eta_q^\top B \, \mathbf{x}_q(t) = \sum_{k=1}^{n_q} \eta_{q,k} B_{kk} \, x_q(t)^{\beta_{q,k}^2 + 1}, \quad 1 \le q \le p,$$

is well-defined, that the equivalent statement holds for $q = p + 1$, and considering (108), it is easy to see that the factor $(t_{m_q} - t)$ in the derivative ensures that the derivative is zero every time $t = t_{m_q}$.

When interpolating a waveform with $p$ peaks it is natural to require that no new peaks appear between the chosen peaks. This corresponds to requiring monotonicity in each interval. One way to achieve this is given in Lemma 3.2.

Lemma 3.2. If $\eta_{q,k} \ge 0$, $k = 1, \ldots, n_q$, the AEF, $i(t)$, is strictly monotonic on the interval $t_{m_{q-1}} < t < t_{m_q}$.

Proof. The AEF will be strictly monotonic in an interval if the derivative has the same sign everywhere in the interval. That this is the case follows from (108) since every term in $\eta_q^\top B_q \mathbf{x}_q(t)$ is non-negative if $\eta_{q,k} \ge 0$, $k = 1, \ldots, n_q$, so the sign of the derivative is determined by $I_{m_q}$.

If we allow some of the $\eta_{q,k}$-parameters to be negative, the derivative can change sign and the function might get an extra peak between two other peaks, see Figure 3.3.

Figure 3.3: An example of a two-peaked AEF where some of the $\eta_{q,k}$-parameters are negative, so that it has points where the first derivative changes sign between two peaks. The solid line is the AEF and the dashed line is the derivative of the AEF.

The integral of the electric current represents the charge flow.
Unlike the Heidler function the integral of the AEF is relatively straightforward to find. How to do this is detailed in Lemma 3.3, Lemma 3.4, Theorem 3.2, and Theorem 3.3. Lemma 3.3. For any tmq−1 ≤ t0 ≤ t1 ≤ tmq , 1 ≤ q ≤ p, Z t1 β β e t1 − tmq t0 − tmq xq(t) dt = β+1 ∆γ β + 1, , (109) t0 β β∆tmq β∆tmq 129 132 Extreme points of the Vandermonde determinant and phenomenological modelling with power-exponential functions with ∆tmq = tmq − tmq−1 and ∆γ(β, t0, t1) = γ (β + 1, βt1) − γ (β + 1, βt0) , where Z t γ(β, t) = τ β−1e−τ dτ 0 is the lower incomplete Gamma function [2]. If t0 = tmq−1 and t1 = tmq then Z tmq eβ x (t)β dt = γ (β + 1, β) . (110) q ββ+1 tmq−1 Proof. Z t1 Z t1 β β t − tmq t − tmq xq(t) dt = exp 1 − dt t0 t0 ∆tmq ∆tmq β−1 Z t1 β e t − tmq t − tmq = β β exp 1 − β dt. β t0 ∆tmq ∆tmq t−t Changing variables according to τ = β mq gives ∆tmq Z t1 β Z τ1 β e β −τ xq(t) dt = β+1 τ e dt = t0 β τ0 eβ = (γ(β + 1, τ ) − γ(β + 1, τ )) ββ+1 1 0 eβ = ∆γ(β + 1, τ , τ ) ββ+1 1 0 β e t1 − tmq t0 − tmq = β+1 ∆γ β + 1, β , β . β ∆tmq ∆tmq When t0 = tmq−1 and t1 = tmq then Z t1 β β e xq(t) dt = β+1 ∆γ (β + 1, β) t0 β and with γ(β + 1, 0) = 0 we get (110). Lemma 3.4. For any tmq−1 ≤ t0 ≤ t1 ≤ tmq , 1 ≤ q ≤ p, q−1 ! nq Z t1 X X i(t) dt = (t1 − t0) Imk + Imq ηq,k gq(t1, t0), (111) t0 k=1 k=1 130 133 3.1. THE ANALYTICALLY EXTENDED FUNCTION (AEF) where β2 e q,k t1 − tm t0 − tm g (t , t ) = ∆γ β2 + 2, q−1 , q−1 q 1 0 β2 +1 q,k 2 q,k ∆tmq ∆tmq βq,k + 1 with ∆γ(β, t0, t1) defined as in (109). Proof. t t q−1 ! nq Z 1 Z 1 2 X X βq,k+1 i(t) dt = Imk + Imq ηq,kxq(t) dt t0 t0 k=1 k=1 q−1 ! nq t Z 1 2 X X βq,k+1 = (t1 − t0) Imk + Imq ηq,k xq(t) dt k=1 k=1 t0 q−1 ! nq X X = (t1 − t0) Imk + Imq ηq,k gq(t0, t1). k=1 k=1 Theorem 3.2. If tma−1 ≤ ta ≤ tma , tmb−1 ≤ tb ≤ tmb and 0 ≤ ta ≤ tb ≤ tmp then a−1 ! n Z tb X Xa i(t) dt = (tma − ta) Imk + Ima ηa,k ga(ta, tma ) ta k=1 k=1 b−1 q−1 ! nq ! X X X 2 + ∆tmq Imk + Imq ηq,k gˆ βq,k + 1 q=a+1 k=1 k=1 b−1 ! 
$$\qquad + (t_b - t_{m_{b-1}}) \left( \sum_{k=1}^{b-1} I_{m_k} \right) + I_{m_b} \sum_{k=1}^{n_b} \eta_{b,k} \, g_b(t_{m_{b-1}}, t_b), \tag{112}$$

where $g_q(t_0, t_1)$ is defined as in Lemma 3.4 and

$$\hat{g}(\beta) = \frac{e^{\beta}}{\beta^{\beta+1}} \gamma(\beta + 1, \beta).$$

Proof. This theorem follows from integration being linear and Lemma 3.4.

Theorem 3.3. For $t_{m_p} \le t_0 < t_1 < \infty$ the integral of the AEF is

$$\int_{t_0}^{t_1} i(t) \, dt = \left( \sum_{k=1}^{p} I_{m_k} \right) \sum_{k=1}^{n_{p+1}} \eta_{p+1,k} \, g_{p+1}(t_1, t_0), \tag{113}$$

where $g_q(t_0, t_1)$ is defined as in Lemma 3.4. When $t_0 = t_{m_p}$ and $t_1 \to \infty$ the integral becomes

$$\int_{t_{m_p}}^{\infty} i(t) \, dt = \left( \sum_{k=1}^{p} I_{m_k} \right) \sum_{k=1}^{n_{p+1}} \eta_{p+1,k} \, \tilde{g}\!\left( \beta_{p+1,k}^2 \right), \tag{114}$$

where

$$\tilde{g}(\beta) = \frac{e^{\beta}}{\beta^{\beta+1}} \left( \Gamma(\beta + 1) - \gamma(\beta + 1, \beta) \right)$$

and

$$\Gamma(\beta) = \int_0^{\infty} t^{\beta - 1} e^{-t} \, dt$$

is the Gamma function [2].

Proof. This theorem follows from integration being linear and Lemma 3.4.

In the next section we will estimate the parameters of the AEF that give the best fit with respect to some data, and for this the partial derivatives with respect to the $\beta_{q,k}$ parameters will be useful. Since the AEF is a linear function of elementary functions these partial derivatives can easily be found using standard methods.

Theorem 3.4. The partial derivatives of the p-peak AEF with respect to the $\beta$ parameters are

$$\frac{\partial i}{\partial \beta_{q,k}} = \begin{cases} 0, & 0 \le t \le t_{m_{q-1}}, \\ 2 \, I_{m_q} \, \eta_{q,k} \, \beta_{q,k} \, h_q(t) \, x_q(t)^{\beta_{q,k}^2 + 1}, & t_{m_{q-1}} \le t \le t_{m_q}, \\ 0, & t_{m_q} \le t, \end{cases} \qquad 1 \le q \le p, \tag{115}$$

$$\frac{\partial i}{\partial \beta_{p+1,k}} = \begin{cases} 0, & 0 \le t \le t_{m_p}, \\ 2 \left( \displaystyle\sum_{j=1}^{p} I_{m_j} \right) \eta_{p+1,k} \, \beta_{p+1,k} \, h_{p+1}(t) \, x_{p+1}(t)^{\beta_{p+1,k}^2}, & t_{m_p} \le t, \end{cases} \tag{116}$$

where

$$h_q(t) = \begin{cases} \ln\left( \dfrac{t - t_{m_{q-1}}}{\Delta t_{m_q}} \right) - \dfrac{t - t_{m_{q-1}}}{\Delta t_{m_q}} + 1, & 1 \le q \le p, \\[2ex] \ln\left( \dfrac{t}{t_{m_p}} \right) - \dfrac{t}{t_{m_p}} + 1, & q = p + 1. \end{cases}$$

Proof. Since the $\beta_{q,k}$ parameters are independent, differentiation with respect to $\beta_{q,k}$ will annihilate all terms but one in each linear combination. The expressions (115) and (116) then follow from the standard rules for differentiation of composite functions and products of functions.
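The partial derivatives in Theorem 3.4 can be checked against finite differences. The sketch below is not taken from the thesis: it implements one reading of the AEF definition (104), assuming that the decaying part uses the time of the last peak as its time scale and the sum of all peak heights as its amplitude, together with the corresponding reading of (115) and (116). The function names, the 0-based interval index `q` and the example parameter values are all hypothetical.

```python
import math


def _x(u):
    """Scaled power-exponential base u * e^(1 - u)."""
    return u * math.exp(1.0 - u)


def aef(t, t_peaks, i_peaks, beta, eta):
    """Evaluate a p-peaked AEF, following one reading of (104).

    t_peaks = [t_m1, ..., t_mp], i_peaks = [I_m1, ..., I_mp];
    beta and eta are lists of p + 1 lists (each eta row sums to one).
    """
    p = len(t_peaks)
    if t >= t_peaks[-1]:                       # decaying part
        x = _x(t / t_peaks[-1])
        return sum(i_peaks) * sum(e * x ** (b * b) for e, b in zip(eta[p], beta[p]))
    q = next(k for k in range(p) if t < t_peaks[k])   # 0-based rising interval
    t_prev = t_peaks[q - 1] if q > 0 else 0.0
    x = _x((t - t_prev) / (t_peaks[q] - t_prev))
    rise = sum(e * x ** (b * b + 1.0) for e, b in zip(eta[q], beta[q]))
    return sum(i_peaks[:q]) + i_peaks[q] * rise


def d_aef_d_beta(t, t_peaks, i_peaks, beta, eta, q, k):
    """Partial derivative of the AEF with respect to beta[q][k], cf. (115)-(116)."""
    p = len(t_peaks)
    if q == p:                                 # decaying part
        if t < t_peaks[-1]:
            return 0.0
        u = t / t_peaks[-1]
        amplitude, power = sum(i_peaks), beta[q][k] ** 2
    else:                                      # rising part number q (0-based)
        t_prev = t_peaks[q - 1] if q > 0 else 0.0
        if not (t_prev < t <= t_peaks[q]):
            return 0.0
        u = (t - t_prev) / (t_peaks[q] - t_prev)
        amplitude, power = i_peaks[q], beta[q][k] ** 2 + 1.0
    h = math.log(u) - u + 1.0                  # h_q(t), the logarithm of _x(u)
    return 2.0 * amplitude * eta[q][k] * beta[q][k] * h * _x(u) ** power


# Hypothetical two-peaked example.
t_peaks, i_peaks = [1.0, 3.0], [2.0, 1.0]
beta = [[1.0, 2.0], [0.5], [1.5, 0.3]]
eta = [[0.4, 0.6], [1.0], [0.7, 0.3]]
print(aef(1.0, t_peaks, i_peaks, beta, eta), aef(3.0, t_peaks, i_peaks, beta, eta))
print(d_aef_d_beta(0.5, t_peaks, i_peaks, beta, eta, 0, 1))
```

The function h_q(t) is simply the logarithm of x_q(t), which is why the derivative of x_q(t) raised to the power beta^2 + 1 with respect to beta has the form 2 beta h_q(t) x_q(t)^(beta^2 + 1).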
3.2 Approximation of lightning discharge current functions

This section is based on Section 3 of Paper F.

Many different types of systems, objects and equipment are susceptible to damage from lightning discharges. Lightning effects are usually analysed using lightning discharge models. Most of the engineering and electromagnetic models imply channel-base current functions. Various single and multi-peaked functions have been proposed in the literature for modelling lightning channel-base currents, examples include [117, 118, 140, 141, 146]. For engineering and electromagnetic models, a general function that is able to reproduce desired waveshapes is needed, such that analytical solutions for its derivatives, integrals, and integral transformations exist. A multi-peaked channel-base current function has been proposed by Javor [140] as a generalization of the so-called TRF (two-rise front) function from [141], which possesses such properties.

In this paper we analyse a modification of such a multi-peaked function, a so-called p-peak analytically extended function (AEF). The possibility of applying the AEF to approximation of various multi-peaked waveshapes is investigated. Estimation of its parameters has been performed using the Marquardt least squares method (MLSM), an efficient method for the estimation of non-linear function parameters, see Section 1.2.6. It has been applied in many fields, including lightning research, e.g. for optimizing parameters of the Heidler function [178], or the Pulse function [181, 182].

Some numerical results are presented, including those for the Standard IEC 62305 [133] current of the first-positive strokes, and an example of a fast-decaying lightning current waveform. Fitting a p-peaked AEF to recorded current data (from [257]) is also illustrated.
3.2.1 Fitting the AEF

Suppose that we have $k_q$ points $(t_{q,k}, i_{q,k})$ ordered with respect to $t_{q,k}$, $t_{m_{q-1}} < t_{q,1} < t_{q,2} < \ldots < t_{q,k_q} < t_{m_q}$, and we wish to choose parameters $\eta_{q,k}$ and $\beta_{q,k}$ such that the sum of the squares of the residuals,
\[
S_q = \sum_{k=1}^{k_q} \bigl(i(t_{q,k}) - i_{q,k}\bigr)^2, \tag{117}
\]
is minimized. One way to estimate these parameters is to use the Marquardt least squares method described in Section 1.2.6.

In order to fit the AEF it is sufficient that $k_q \ge n_q$. Suppose we have some estimate of the β-parameters which is collected in the vector $b$. It is then fairly simple to calculate an estimate for the η-parameters, see Section 3.2.4, which we collect in $h$. We define a residual vector by $(e)_k = i(t_{q,k}; b, h) - i_{q,k}$ where $i(t; b, h)$ is the AEF with the estimated parameters. The Jacobian matrix, $J$, can in this case be described as
\[
J = \begin{bmatrix}
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,1}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,1}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,1}} \\[2ex]
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,2}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,2}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,k_q}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,k_q}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,k_q}}
\end{bmatrix} \tag{118}
\]
where the partial derivatives are given by (115) and (116).

3.2.2 Estimating parameters for underdetermined systems

This section is based on Section 3.2 of Paper E.

For the Marquardt least squares method to work at least one data point per unknown parameter is needed, $m \ge k$. It can still be possible to estimate all unknown parameters if there is insufficient data, $m < k$, if we know some further relations between the parameters.

Suppose that $k - m = p$ and let $\gamma_j = \beta_{m+j}$, $j = 1, 2, \ldots, p$. If there are at least $p$ known relations between the unknown parameters such that $\gamma_j = \gamma_j(\beta_1, \ldots, \beta_m)$ for $j = 1, 2, \ldots, p$ then the Marquardt least squares method can be used to give estimates of $\beta_1, \ldots, \beta_m$ and the still unknown parameters can be estimated from these.
Denoting the estimated parameters $b = (b_1, \ldots, b_m)$ and $c = (c_1, \ldots, c_p)$ the following algorithm can be used:

• Input: $v > 1$ and initial values $b^{(0)}$, $\lambda^{(0)}$.
• Set $r = 0$.
• (*) Find $c^{(r)}$ using $b^{(r)}$ together with extra relations.
• Find $b^{(r+1)}$ and $\delta^{(r)}$ using MLSM.
• Check chosen termination condition for MLSM; if it is not satisfied set $r = r + 1$ and go to (*).
• Output: $b$, $c$.

The algorithm is illustrated in Figure 3.4.

Figure 3.4: Schematic description of the parameter estimation algorithm.

3.2.3 Fitting with data points as well as charge flow and specific energy conditions

By considering the charge flow at the striking point, $Q_0$, unitary resistance $R$ and the specific energy, $W_0$, we get two further conditions:
\[
Q_0 = \int_0^{\infty} i(t)\,dt, \tag{119}
\]
\[
W_0 = \int_0^{\infty} i(t)^2\,dt. \tag{120}
\]
First we will define
\[
Q(b, h) = \int_0^{\infty} i(t; b, h)\,dt, \qquad W(b, h) = \int_0^{\infty} i(t; b, h)^2\,dt.
\]
These two quantities can be calculated as follows.

Theorem 3.5.
\[
Q(b, h) = \sum_{q=1}^{p} \left(\Delta t_{m_q}\left(\sum_{k=1}^{q-1} I_{m_k}\right) + I_{m_q} \sum_{k=1}^{n_q} \eta_{q,k}\, \hat g\!\left(\beta_{q,k}^2+1\right)\right)
+ \left(\sum_{k=1}^{p} I_{m_k}\right) \sum_{k=1}^{n_{p+1}} \eta_{p+1,k}\, \tilde g\!\left(\beta_{p+1,k}^2\right), \tag{121}
\]
\[
\begin{aligned}
W(b, h) ={}& \sum_{q=1}^{p} \Biggl( \left(\sum_{k=1}^{q-1} I_{m_k}\right)^{2} + \left(\sum_{k=1}^{q-1} I_{m_k}\right) I_{m_q} \sum_{k=1}^{n_q} \eta_{q,k}\, \hat g\!\left(\beta_{q,k}^2+1\right) \\
&\qquad + I_{m_q}^2 \sum_{k=1}^{n_q} \eta_{q,k}^2\, \hat g\!\left(2\beta_{q,k}^2+2\right)
+ 2\, I_{m_q}^2 \sum_{r=1}^{n_q-1} \sum_{s=r+1}^{n_q} \eta_{q,r}\,\eta_{q,s}\, \hat g\!\left(\beta_{q,r}^2+\beta_{q,s}^2+2\right) \Biggr) \\
&+ \left(\sum_{k=1}^{p} I_{m_k}\right)^{2} \Biggl( \sum_{k=1}^{n_{p+1}} \eta_{p+1,k}^2\, \tilde g\!\left(2\beta_{p+1,k}^2\right)
+ 2 \sum_{r=1}^{n_{p+1}-1} \sum_{s=r+1}^{n_{p+1}} \eta_{p+1,r}\,\eta_{p+1,s}\, \tilde g\!\left(\beta_{p+1,r}^2+\beta_{p+1,s}^2\right) \Biggr),
\end{aligned} \tag{122}
\]
where $\hat g(\beta)$ and $\tilde g(\beta)$ are defined in Theorems 3.2 and 3.3.

Proof. Formula (121) is found by combining (112) and (113).
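Formula (122) is found by noting that
\[
\left(\sum_{k=1}^{n} a_k\right)^{2} = \sum_{k=1}^{n} a_k^2 + 2 \sum_{r=1}^{n-1} \sum_{s=r+1}^{n} a_r a_s
\]
and then reasoning analogously to the proofs for (112) and (113).

We can calculate the charge flow and specific energy given by the AEF with formulas (121) and (122), respectively, and get two additional residual terms $E_{Q_0} = Q(b, h) - Q_0$ and $E_{W_0} = W(b, h) - W_0$. Since these are global conditions, the parameters η and β can no longer be fitted separately in each interval, so we need to consider all data points simultaneously. The resulting $J$-matrix is
\[
J = \begin{bmatrix}
J_1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & J_{p+1} \\[1ex]
\dfrac{\partial E_{Q_0}}{\partial \beta_{1,1}} \ \cdots\ \dfrac{\partial E_{Q_0}}{\partial \beta_{1,n_1}} & \cdots & \dfrac{\partial E_{Q_0}}{\partial \beta_{p+1,1}} \ \cdots\ \dfrac{\partial E_{Q_0}}{\partial \beta_{p+1,n_{p+1}}} \\[2ex]
\dfrac{\partial E_{W_0}}{\partial \beta_{1,1}} \ \cdots\ \dfrac{\partial E_{W_0}}{\partial \beta_{1,n_1}} & \cdots & \dfrac{\partial E_{W_0}}{\partial \beta_{p+1,1}} \ \cdots\ \dfrac{\partial E_{W_0}}{\partial \beta_{p+1,n_{p+1}}}
\end{bmatrix} \tag{123}
\]
where
\[
J_q = \begin{bmatrix}
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,1}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,1}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,1}} \\[2ex]
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,2}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,2}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,2}} \\
\vdots & \vdots & \ddots & \vdots \\
\left.\dfrac{\partial i}{\partial \beta_{q,1}}\right|_{t=t_{q,k_q}} & \left.\dfrac{\partial i}{\partial \beta_{q,2}}\right|_{t=t_{q,k_q}} & \cdots & \left.\dfrac{\partial i}{\partial \beta_{q,n_q}}\right|_{t=t_{q,k_q}}
\end{bmatrix}
\]
and the partial derivatives in the last two rows are given by
\[
\frac{\partial Q}{\partial \beta_{q,s}} =
\begin{cases}
2\, I_{m_q}\, \eta_{q,s}\, \beta_{q,s} \left.\dfrac{d\hat g}{d\beta}\right|_{\beta=\beta_{q,s}^2+1}, & 1 \le q \le p, \\[2ex]
2\, I_{m_p}\, \eta_{p+1,s}\, \beta_{p+1,s} \left.\dfrac{d\tilde g}{d\beta}\right|_{\beta=\beta_{p+1,s}^2}, & q = p+1.
\end{cases}
\]
For $1 \le q \le p$
\[
\frac{\partial W}{\partial \beta_{q,s}} = 2 \left(\sum_{k=1}^{q-1} I_{m_k}\right) I_{m_q}\, \eta_{q,s}\, \beta_{q,s} \left.\frac{d\hat g}{d\beta}\right|_{\beta=\beta_{q,s}^2+1}
+ 4\, I_{m_q}^2\, \eta_{q,s}\, \beta_{q,s} \left( \eta_{q,s} \left.\frac{d\hat g}{d\beta}\right|_{\beta=2\beta_{q,s}^2+2} + \sum_{\substack{k=1 \\ k\neq s}}^{n_q} \eta_{q,k} \left.\frac{d\hat g}{d\beta}\right|_{\beta=\beta_{q,s}^2+\beta_{q,k}^2+2} \right)
\]
and
\[
\frac{\partial W}{\partial \beta_{p+1,s}} = 4 \left(\sum_{k=1}^{p} I_{m_k}\right) \eta_{p+1,s}\, \beta_{p+1,s} \left( \eta_{p+1,s} \left.\frac{d\tilde g}{d\beta}\right|_{\beta=2\beta_{p+1,s}^2} + \sum_{\substack{k=1 \\ k\neq s}}^{n_{p+1}} \eta_{p+1,k} \left.\frac{d\tilde g}{d\beta}\right|_{\beta=\beta_{p+1,s}^2+\beta_{p+1,k}^2} \right).
\]
The derivatives of $\hat g(\beta)$ and $\tilde g(\beta)$ are
\[
\frac{d\hat g}{d\beta} = \frac{1}{e}\left(1 + \frac{e^{\beta}}{\beta^{\beta}} \Bigl(\Gamma(\beta+1)\bigl(\Psi(\beta) - \ln(\beta)\bigr) - G(\beta)\Bigr)\right), \tag{124}
\]
\[
\frac{d\tilde g}{d\beta} = \frac{1}{e}\left(\frac{e^{\beta}}{\beta^{\beta}}\, G(\beta) - 1\right), \tag{125}
\]
where $\Gamma(\beta)$ is the Gamma function, $\Psi(\beta)$ is the digamma function, see for example [2], and $G(\beta)$ is a special case of the Meijer G-function and can be defined as
\[
G(\beta) = G^{3,0}_{2,3}\!\left(\beta \,\middle|\, \begin{matrix} 1,\ 1 \\ 0,\ 0,\ \beta+1 \end{matrix}\right)
\]
using the notation from [236].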
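When evaluating this function it might be more practical to rewrite $G$ using other special functions:
\[
G(\beta) = G^{3,0}_{2,3}\!\left(\beta \,\middle|\, \begin{matrix} 1,\ 1 \\ 0,\ 0,\ \beta+1 \end{matrix}\right)
= \frac{\beta^{\beta+1}}{(\beta+1)^2}\; {}_2F_2(\beta+1, \beta+1; \beta+2, \beta+2; -\beta)
- \bigl(\Psi(\beta) + \pi\cot(\pi\beta) + \ln(\beta)\bigr)\, \frac{\pi\csc(\pi\beta)}{\Gamma(-\beta)},
\]
where
\[
{}_2F_2(\beta+1, \beta+1; \beta+2, \beta+2; -\beta) = \sum_{k=0}^{\infty} (-1)^k\, \frac{\beta^k}{k!}\, \frac{(\beta+1)^2}{(\beta+k+1)^2}
\]
is a special case of the hypergeometric function. These partial derivatives were found using software for symbolic computation [200].

Note that all η-parameters must be recalculated for each step; how this is done is detailed in Section 3.2.4.

3.2.4 Calculating the η-parameters from the β-parameters

Suppose that we have $n_q - 1$ points $(t_{q,k}, i_{q,k})$ such that $t_{m_{q-1}} < t_{q,1} < t_{q,2} < \ldots < t_{q,n_q-1} < t_{m_q}$. For an AEF that interpolates these points it must be true that
\[
\sum_{k=1}^{q-1} I_{m_k} + I_{m_q} \sum_{s=1}^{n_q} \eta_{q,s}\, x_q(t_{q,k})^{\beta_{q,s}} = i_{q,k}, \qquad k = 1, 2, \ldots, n_q - 1. \tag{126}
\]
Since $\eta_{q,1} + \eta_{q,2} + \ldots + \eta_{q,n_q} = 1$ equation (126) can be rewritten as
\[
I_{m_q} \sum_{s=1}^{n_q-1} \eta_{q,s} \left( x_q(t_{q,k})^{\beta_{q,s}} - x_q(t_{q,k})^{\beta_{q,n_q}} \right) = i_{q,k} - x_q(t_{q,k})^{\beta_{q,n_q}} - \sum_{s=1}^{q-1} I_{m_s} \tag{127}
\]
for $k = 1, 2, \ldots, n_q - 1$. This can be written as a matrix equation
\[
I_{m_q}\, \tilde X_q\, \tilde\eta_q = \tilde i_q, \tag{128}
\]
\[
\tilde\eta_q = \begin{bmatrix} \eta_{q,1} & \eta_{q,2} & \ldots & \eta_{q,n_q-1} \end{bmatrix}^{\top}, \qquad
\bigl(\tilde i_q\bigr)_k = i_{q,k} - x_q(t_{q,k})^{\beta_{q,n_q}} - \sum_{s=1}^{q-1} I_{m_s},
\]
\[
\bigl(\tilde X_q\bigr)_{k,s} = \tilde x_q(k, s) = x_q(t_{q,k})^{\beta_{q,s}} - x_q(t_{q,k})^{\beta_{q,n_q}},
\]
and $x_q(t)$ given by (105). When all $\beta_{q,k}$, $k = 1, 2, \ldots, n_q$ are known then $\eta_{q,k}$, $k = 1, 2, \ldots, n_q - 1$ can be found by solving equation (128) and
\[
\eta_{q,n_q} = 1 - \sum_{k=1}^{n_q-1} \eta_{q,k}.
\]
If we have $k_q > n_q - 1$ data points then the parameters can be estimated with the least squares solution to (128), more specifically the solution to
\[
I_{m_q}\, \tilde X_q^{\top} \tilde X_q\, \tilde\eta_q = \tilde X_q^{\top}\, \tilde i_q.
\]

3.2.5 Explicit formulas for a single-peak AEF

Consider the case where $p = 1$, $n_1 = n_2 = 2$ and $\tau = \dfrac{t}{t_{m_1}}$. Then the explicit formula for the AEF is
\[
\frac{i(\tau)}{I_{m_1}} =
\begin{cases}
\eta_{1,1}\, \tau^{\beta_{1,1}^2+1} e^{(\beta_{1,1}^2+1)(1-\tau)} + \eta_{1,2}\, \tau^{\beta_{1,2}^2+1} e^{(\beta_{1,2}^2+1)(1-\tau)}, & 0 \le \tau \le 1, \\[1ex]
\eta_{2,1}\, \tau^{\beta_{2,1}^2} e^{\beta_{2,1}^2(1-\tau)} + \eta_{2,2}\, \tau^{\beta_{2,2}^2} e^{\beta_{2,2}^2(1-\tau)}, & 1 \le \tau.
\end{cases} \tag{129}
\]
Assume that four data points, $(i_k, \tau_k)$, $k = 1, 2, 3, 4$, as well as the charge flow $Q_0$ and specific energy $W_0$, are known. If we want to fit the AEF to this data using MLSM, equation (123) gives
\[
J = \begin{bmatrix}
f_1(\tau_1) & f_2(\tau_1) & 0 & 0 \\
f_1(\tau_2) & f_2(\tau_2) & 0 & 0 \\
0 & 0 & g_1(\tau_3) & g_2(\tau_3) \\
0 & 0 & g_1(\tau_4) & g_2(\tau_4) \\[1ex]
\dfrac{\partial}{\partial \beta_{1,1}} Q(\beta, \eta) & \dfrac{\partial}{\partial \beta_{1,2}} Q(\beta, \eta) & \dfrac{\partial}{\partial \beta_{2,1}} Q(\beta, \eta) & \dfrac{\partial}{\partial \beta_{2,2}} Q(\beta, \eta) \\[2ex]
\dfrac{\partial}{\partial \beta_{1,1}} W(\beta, \eta) & \dfrac{\partial}{\partial \beta_{1,2}} W(\beta, \eta) & \dfrac{\partial}{\partial \beta_{2,1}} W(\beta, \eta) & \dfrac{\partial}{\partial \beta_{2,2}} W(\beta, \eta)
\end{bmatrix},
\]
\[
f_k(\tau) = 2\, \eta_{1,k}\, \beta_{1,k}\, \tau^{\beta_{1,k}^2+1} e^{(\beta_{1,k}^2+1)(1-\tau)} \bigl(\ln(\tau) + 1 - \tau\bigr),
\]
\[
\eta_{1,1} = \frac{i_1}{I_{m_1}} - \tau_1^{\beta_{1,2}^2}\, e^{(\beta_{1,2}^2+1)(1-\tau_1)}, \qquad \eta_{1,2} = 1 - \eta_{1,1},
\]
\[
g_k(\tau) = 2\, \eta_{2,k}\, \beta_{2,k}\, \tau^{\beta_{2,k}^2} e^{\beta_{2,k}^2(1-\tau)} \bigl(\ln(\tau) + 1 - \tau\bigr),
\]
\[
\eta_{2,1} = \frac{i_3}{I_{m_1}} - \tau_3^{\beta_{2,2}^2}\, e^{\beta_{2,2}^2(1-\tau_3)}, \qquad \eta_{2,2} = 1 - \eta_{2,1},
\]
\[
\beta = \begin{bmatrix} \beta_{1,1}^2+1 & \beta_{1,2}^2+1 & \beta_{2,1}^2 & \beta_{2,2}^2 \end{bmatrix}, \qquad
\eta = \begin{bmatrix} \eta_{1,1} & \eta_{1,2} & \eta_{2,1} & \eta_{2,2} \end{bmatrix},
\]
\[
\frac{Q(\beta, \eta)}{I_{m_1}} = \sum_{s=1}^{2} \eta_{1,s}\, \frac{e^{\beta_{1,s}^2}}{\left(\beta_{1,s}^2+1\right)^{\beta_{1,s}^2+1}}\, \gamma\!\left(\beta_{1,s}^2+2,\ \beta_{1,s}^2+1\right)
+ \sum_{s=1}^{2} \eta_{2,s}\, \frac{e^{\beta_{2,s}^2}}{\left(\beta_{2,s}^2\right)^{\beta_{2,s}^2+1}} \Bigl(\Gamma\!\left(\beta_{2,s}^2+1\right) - \gamma\!\left(\beta_{2,s}^2+1,\ \beta_{2,s}^2\right)\Bigr),
\]
\[
\frac{\partial Q}{\partial \beta_{q,s}} =
\begin{cases}
2\, I_{m_1}\, \eta_{1,s}\, \beta_{1,s} \left.\dfrac{d\hat g}{d\beta}\right|_{\beta=\beta_{1,s}^2+1}, & q = 1, \\[2ex]
2\, I_{m_1}\, \eta_{2,s}\, \beta_{2,s} \left.\dfrac{d\tilde g}{d\beta}\right|_{\beta=\beta_{2,s}^2}, & q = 2,
\end{cases}
\]
with derivatives of $\hat g(\beta)$ and $\tilde g(\beta)$ given by (124) and (125),
\[
\tilde\beta = \begin{bmatrix} \beta_{1,1}^2+\beta_{1,2}^2+2 & \beta_{1,1}^2+\beta_{1,2}^2+2 & \beta_{2,1}^2+\beta_{2,2}^2 & \beta_{2,1}^2+\beta_{2,2}^2 \end{bmatrix},
\]
\[
\hat\eta = \begin{bmatrix} \eta_{1,1}^2 & \eta_{1,2}^2 & \eta_{2,1}^2 & \eta_{2,2}^2 \end{bmatrix}, \qquad
\tilde\eta = \begin{bmatrix} \eta_{1,1}\eta_{1,2} & \eta_{1,1}\eta_{1,2} & \eta_{2,1}\eta_{2,2} & \eta_{2,1}\eta_{2,2} \end{bmatrix},
\]
\[
\frac{\partial}{\partial \beta_{q,s}} W(\beta, \eta) = 2\, \beta_{q,s}\, \frac{\partial}{\partial \beta_{q,s}} Q(2\beta, \hat\eta) + \beta_{q,\,(s-1 \bmod 2)+1}\, \frac{\partial}{\partial \beta_{q,s}} Q\bigl(\tilde\beta, \tilde\eta\bigr).
\]

3.2.6 Fitting to lightning discharge currents

This section is based on Section 4 of Paper F.

In this section some results of fitting the AEF to a few different waveforms will be demonstrated.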
Some single-peak waveforms given by Heidler functions in the IEC 62305-1 standard [133] will be approximated using the AEF, and furthermore, fitting the multi-peaked waveform to experimental data will be presented.

Single-peak waveforms

In this section some numerical results of fitting the AEF function to single-peak waveshapes are presented and compared with the corresponding fitting of the Heidler function. The AEF given by (129) is used to model a few lightning current waveshapes whose parameters (rise/decay time ratio, $T_1/T_2$, peak current value, $I_{m_1}$, time to peak current, $t_{m_1}$, charge flow at the striking point, $Q_0$, specific energy, $W_0$, and time to $0.1\,I_{m_1}$, $t_1$) are given in Table 3.1. Data points were chosen as follows:
\[
\begin{aligned}
(i_1, \tau_1) &= (0.1\, I_{m_1},\ t_1), \\
(i_2, \tau_2) &= (0.9\, I_{m_1},\ t_2 = t_1 + 0.8\, T_1), \\
(i_3, \tau_3) &= (0.5\, I_{m_1},\ t_h = t_1 - 0.1\, T_1 + T_2), \\
(i_4, \tau_4) &= (i(1.5\, t_h),\ 1.5\, t_h).
\end{aligned}
\]
The AEF representation of the waveshape denoted as the first positive stroke current in the IEC 62305 standard [133] is shown in Figure 3.5. Rising and decaying parts of the first negative stroke current from the IEC 62305 standard [133] are shown in Figure 3.6, left and right, respectively. The β and η parameters of both waveshapes optimized by the MLSM are given in Table 3.1.

We have also observed a so-called fast-decaying waveshape whose parameters are given in Table 3.1. Its representation using the AEF function is shown in Figure 3.7, and the corresponding β and η parameter values in Table 3.1.

Figure 3.5: First-positive stroke represented by the AEF function. Here it is fitted with respect to both the data points as well as $Q_0$ and $W_0$.

Figure 3.6: First-negative stroke represented by the AEF function. Here it is fitted with the extra constraint $0 \le \eta \le 1$ for all η-parameters.
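To make the evaluation of (129) concrete, the following Python sketch computes the normalised single-peak AEF and, as an example, plugs in the first-negative stroke column of Table 3.1. The function and constant names are illustrative assumptions; only the formula and the parameter values come from the text.

```python
import math

def aef_single_peak(tau, beta1, eta1, beta2, eta2):
    """Normalised single-peak AEF i(tau) / I_m1 following (129).

    tau   : normalised time t / t_m1
    beta1 : (beta_{1,1}, beta_{1,2}), rising part, exponents beta^2 + 1
    eta1  : (eta_{1,1}, eta_{1,2}), expected to sum to 1
    beta2 : (beta_{2,1}, beta_{2,2}), decaying part, exponents beta^2
    eta2  : (eta_{2,1}, eta_{2,2}), expected to sum to 1
    """
    if tau <= 0.0:
        return 0.0
    base = tau * math.exp(1.0 - tau)   # tau^b * e^(b (1 - tau)) equals base^b
    if tau <= 1.0:
        return sum(e * base ** (b * b + 1.0) for b, e in zip(beta1, eta1))
    return sum(e * base ** (b * b) for b, e in zip(beta2, eta2))

# First-negative stroke column of Table 3.1 (t_m1 in microseconds, I_m1 in kA)
BETA1, ETA1 = (1.84, 9.99), (1.0, 0.0)
BETA2, ETA2 = (0.099, 0.127), (0.401, 0.599)
T_M1_US, I_M1_KA = 3.552, 100.0

def current_kA(t_us):
    """Current in kA at time t_us (microseconds) for the parameters above."""
    return I_M1_KA * aef_single_peak(t_us / T_M1_US, BETA1, ETA1, BETA2, ETA2)

if __name__ == "__main__":
    for t_us in (1.0, 2.0, T_M1_US, 50.0, 200.0):
        print(f"t = {t_us:8.3f} us   i = {current_kA(t_us):8.3f} kA")
```

Because the η-parameters in each interval sum to one, both branches evaluate to $I_{m_1}$ at $\tau = 1$, which is what makes the function continuous at the peak.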
Apart from the AEF function (solid line), the Heidler function representation of the same waveshapes (dashed line) and the data points used (red solid circles) are also shown in the figures.

Multi-peaked AEF waveforms for measured data

In this section the AEF will be constructed by fitting to measured data rather than by approximation of the Heidler function. We will use data based on the measurements of flash number 23 in [257]. Two AEFs have been constructed, one by choosing peaks corresponding to local maxima, see Figure 3.8, and one by choosing peaks corresponding to local maxima and local minima, see Figure 3.9. For both AEFs there are two terms in each interval, which means that for each peak there are two parameters that are chosen manually (time and current for each peak) and for each interval there are two parameters that are fitted using the MLSM. The AEF in Figure 3.8 demonstrates that the AEF can handle cases where the function is not constant or monotonically increasing/decreasing between peaks. This is only possible if the AEF has more than one term in the interval.

Figure 3.7: Fast-decaying waveshape represented by the AEF function. Here it is fitted with the extra constraint $0 \le \eta \le 1$ for all η-parameters.

Figure 3.8: AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima in the measured data.

Conclusions

This section investigated the possibility of approximating general multi-peaked lightning currents using an AEF function. Furthermore, the existence of an analytical solution for the derivative and the integral of such a function has been proven, which is needed in order to perform lightning electromagnetic field (LEMF) calculations based on it.

Two single-peak Standard IEC 62305-1 waveforms, and a fast-decaying one, have been represented using a variation of the proposed AEF function (129).
The estimation of their parameters has been performed applying the MLS method using two pairs of data points for each function part (rising and decaying). The results show that there are several factors that need to be taken into consideration to get the best possible approximation of a given waveform. The accuracy of the approximation varies with the chosen data points and the number of terms in the AEF. In several cases the two-term sum converged towards a single-term sum. This can probably be improved by choosing the number of terms and the number and placement of data points in other ways, which the authors intend to examine further. Further examples of fitted (single- and multi-peaked) waveforms can be found in [189] and [143].

Table 3.1: AEF function's parameters for some current waveshapes

                First-positive stroke   First-negative stroke   Fast-decaying
T1/T2           10/350                  1/200                   8/20
tm1 [µs]        31.428                  3.552                   15.141
Im1 [kA]        200                     100                     0.001
Q0 [C]          100                     /                       /
W0 [MJ/Ω]       10                      /                       /
t1 [µs]         14.5                    1.47                    6.34
β1,1            0.114                   1.84                    7.666
β1,2            2.17                    9.99                    2.626
β2,1            0.284                   0.099                   0.925
β2,2            0                       0.127                   2.420
η1,1            −0.197                  1                       0
η1,2            1.197                   0                       1
η2,1            1                       0.401                   0.2227
η2,2            0                       0.599                   0.7773

3.3 Approximation of electrostatic discharge currents using the AEF by interpolation on a D-optimal design

This section is based on Paper G.

In this section we analyse the applicability of the AEF with p peaks to representation of ESD currents by interpolation of data points chosen according to a D-optimal design. This is illustrated through examples from two applications. The first application is modelling of ESDs from IEC standards commonly used in electrostatic discharge immunity testing, and the second is modelling of lightning discharges.

For the ESD immunity testing application we model the IEC Standard 61000-4-2 waveshape [131, 132] and an experimentally measured ESD current from [151].
For the lightning discharge application we model the IEC 61312-1 standard waveshape [117, 134] and a more complex measured lightning discharge current from [69]. We also use the same method to approximate a measured lightning discharge current derivative from [130].

Figure 3.9: AEF fitted to measurements from [257]. Here the peaks have been chosen to correspond to local maxima and minima in the measured data.

In both applications the basic properties of the current (or current derivative) are the same; these properties and how they are modelled with the AEF are discussed in the next section.

Multi-peaked analytically extended function

A so-called multi-peaked analytically extended function (AEF) has been proposed and applied to lightning discharge current modelling in Section 3.1 and [183]. Initial considerations on applying the function to ESD currents have also been made in [189].

The AEF consists of scaled and translated power-exponential functions, that is functions of the form $x(\beta; t) = \left(t\,e^{1-t}\right)^{\beta}$, see Definition 3.1. Here we define the AEF with p peaks as
\[
i(t) = \sum_{k=1}^{q-1} I_{m_k} + I_{m_q} \sum_{k=1}^{n_q} \eta_{q,k}\, x_{q,k}(t), \tag{130}
\]
for $t_{m_{q-1}} \le t \le t_{m_q}$, $1 \le q \le p$, and
\[
i(t) = \left(\sum_{k=1}^{p} I_{m_k}\right) \sum_{k=1}^{n_{p+1}} \eta_{p+1,k}\, x_{p+1,k}(t), \tag{131}
\]
for $t_{m_p} \le t$.

The current value of the first peak is denoted by $I_{m_1}$, the difference between each pair of subsequent peaks by $I_{m_2}, I_{m_3}, \ldots, I_{m_p}$, and their corresponding times by $t_{m_1}, t_{m_2}, \ldots, t_{m_p}$. In each time interval $q$, with $1 \le q \le p+1$, the number of terms is given by $n_q$, $0 < n_q \in \mathbb{Z}$. Parameters $\eta_{q,k}$ are such that $\eta_{q,k} \in \mathbb{R}$ for $q = 1, 2, \ldots, p+1$, $k = 1, 2, \ldots, n_q$ and $\sum_{k=1}^{n_q} \eta_{q,k} = 1$.
Furthermore $x_{q,k}(t)$, $1 \le q \le p+1$, is given by
\[
x_{q,k}(t) =
\begin{cases}
x\!\left(\beta_{q,k};\ \dfrac{t-t_{m_{q-1}}}{t_{m_q}-t_{m_{q-1}}}\right), & 1 \le q \le p, \\[2ex]
x\!\left(\beta_{q,k};\ \dfrac{t}{t_{m_q}}\right), & q = p+1.
\end{cases} \tag{132}
\]

Remark 3.2. When previously applying the AEF, see Section 3.1.1, all exponents (β-parameters) of the AEF were set to $\beta^2+1$ in order to guarantee that the derivative of the AEF is continuous. Here this condition will be satisfied in a different manner.

Since the AEF is a linear combination of elementary functions its derivative and integral can be found using standard methods. For explicit formulae please refer to Theorems 3.1–3.3.

Previously, the authors have fitted AEF functions to lightning discharge currents and ESD currents using the Marquardt least squares method but have noticed that the obtained result varies greatly depending on how the waveforms are sampled. This is problematic, especially since the methodology becomes computationally demanding when applied to large amounts of data. Here we examine one way to minimize the amount of data needed while still obtaining as good an approximation as possible.

The method examined here will be based on D-optimality of a regression model. A D-optimal design is found by choosing sample points such that the determinant of the Fisher information matrix of the model is maximized. For a standard linear regression model this is also equivalent, by the so-called Kiefer-Wolfowitz equivalence criterion, to G-optimality, which means that the maximum of the prediction variance will be minimized. These are standard results in the theory of optimal experiment design and a summary can be found in for example [208].

Minimizing the prediction variance will in our case mean maximizing the robustness of the model.
This does not guarantee a good approximation but it will increase the chances of the method working well when working with limited precision and noisy data and thus improve the chances of finding a good approximation when it is possible.

3.3.1 D-optimal approximation for exponents given by a class of arithmetic sequences

It can be desirable to minimize the number of points used when constructing the approximation. One way of doing this is choosing the D-optimal sampling points.

In this section we will only consider the case where in each interval the $n$ exponents, $\beta_1, \ldots, \beta_n$, are chosen according to
\[
\beta_m = \frac{k+m-1}{c}, \qquad m = 1, 2, \ldots, n,
\]
where $k$ is a non-negative integer and $c$ a positive real number. Note that in order to guarantee continuity of the AEF derivative the condition $k > c$ has to be satisfied.

In each interval we want an approximation of the form
\[
y(t) = \sum_{i=1}^{n} \eta_i\, t^{\beta_i} e^{\beta_i(1-t)}
\]
and by setting $z(t) = \left(t\,e^{1-t}\right)^{\frac{1}{c}}$ we obtain
\[
y(t) = \sum_{i=1}^{n} \eta_i\, z(t)^{k+i-1}.
\]
If we have $n$ sample points, $t_i$, $i = 1, \ldots, n$, then the Fisher information matrix, $M$, of this system is $M = U^{\top} U$ where
\[
U = \begin{bmatrix}
z(t_1)^{k} & z(t_2)^{k} & \cdots & z(t_n)^{k} \\
z(t_1)^{k+1} & z(t_2)^{k+1} & \cdots & z(t_n)^{k+1} \\
\vdots & \vdots & \ddots & \vdots \\
z(t_1)^{k+n-1} & z(t_2)^{k+n-1} & \cdots & z(t_n)^{k+n-1}
\end{bmatrix}.
\]
Thus if we want to maximize $\det(M) = \det(U)^2$ it is sufficient to maximize or minimize the determinant $\det(U)$. Set $z(t_i) = \left(t_i\,e^{1-t_i}\right)^{\frac{1}{c}} = x_i$; then
\[
u_n(t_1, \ldots, t_n) = \det(U) = \left(\prod_{i=1}^{n} x_i^{k}\right) \left(\prod_{1 \le i < j \le n} (x_j - x_i)\right). \tag{133}
\]
To find the $t_i$ that correspond to given $x_i$ we will use the Lambert W function. Formally the Lambert W function is the function $W$ that satisfies $t = W(t\,e^{t})$. Using $W$ we can invert $z(t)$ in the following way:
\[
t\,e^{1-t} = x^{c} \ \Leftrightarrow\ -t\,e^{-t} = -e^{-1} x^{c} \ \Leftrightarrow\ t = -W\!\left(-e^{-1} x^{c}\right). \tag{134}
\]
The Lambert W function is multivalued but since we are only interested in real-valued solutions we are restricted to the main branches $W_0$ and $W_{-1}$.
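The inversion in (134) can be sketched in a few lines of Python. The sketch below does not use a library implementation of the Lambert W function; instead it assumes that a plain bisection on $w\,e^{w} = y$ is accurate enough, which works because $w\,e^{w}$ is monotone on each of the two real branches for $-e^{-1} \le y < 0$. The names `lambert_w` and `t_from_x` are illustrative.

```python
import math

def lambert_w(y, branch=0):
    """Real Lambert W for -1/e <= y < 0, found by bisection on w * exp(w) = y.

    branch = 0  returns W_0(y), which lies in [-1, 0)
    branch = -1 returns W_{-1}(y), which lies in (-inf, -1]
    """
    if not (-math.exp(-1.0) <= y < 0.0):
        raise ValueError("this sketch only handles -1/e <= y < 0")

    def f(w):
        return w * math.exp(w) - y

    neg = -1.0                      # f(-1) = -1/e - y <= 0 on both branches
    if branch == 0:
        pos = 0.0                   # f(0) = -y > 0
    elif branch == -1:
        pos = -2.0
        while f(pos) <= 0.0:        # walk left until w * exp(w) exceeds y
            pos *= 2.0
    else:
        raise ValueError("branch must be 0 or -1")

    for _ in range(200):            # plain bisection, keeps f(neg) <= 0 < f(pos)
        mid = 0.5 * (neg + pos)
        if f(mid) <= 0.0:
            neg = mid
        else:
            pos = mid
    return 0.5 * (neg + pos)

def t_from_x(x, c, rising=True):
    """Invert x = (t * exp(1 - t))**(1 / c) as in (134): t = -W(-x**c / e)."""
    y = -math.exp(-1.0) * x ** c
    return -lambert_w(y, branch=0 if rising else -1)

if __name__ == "__main__":
    c = 2.0
    for t, rising in ((0.3, True), (2.5, False)):
        x = (t * math.exp(1.0 - t)) ** (1.0 / c)
        print(t, t_from_x(x, c, rising))
```

Choosing `rising=True` selects $W_0$ and returns a time in $[0, 1]$, while `rising=False` selects $W_{-1}$ and returns a time of at least 1.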
Since $W_0 \ge -1$ and $W_{-1} \le -1$ the two branches correspond to the rising and decaying parts of the AEF respectively. We will deal with the details of finding the correct points for the two parts separately.

3.3.2 D-optimal interpolation on the rising part

The D-optimal points on the rising part can be found using Theorem 3.6.

Theorem 3.6. The determinant
\[
u_n(k; x_1, \ldots, x_n) = \left(\prod_{i=1}^{n} x_i^{k}\right) \left(\prod_{1 \le i < j \le n} (x_j - x_i)\right)
\]
is maximized or minimized on the cube $[0, 1]^n$ when $x_1 < \ldots < x_{n-1}$ are the roots of the Jacobi polynomial $P_{n-1}^{(2k-1,0)}(1-2x)$ and $x_n = 1$, or some permutation thereof.

Proof. Without loss of generality we can assume $0 < x_1 < x_2 < \ldots < x_{n-1} < x_n \le 1$. Fix all $x_i$ except $x_n$. When $x_n$ increases all factors of $u_n$ that contain $x_n$ will also increase, thus $u_n$ will reach its maximum value on the edge of the cube where $x_n = 1$. Using the method of Lagrange multipliers in the plane given by $x_n = 1$ gives
\[
\frac{\partial u_n}{\partial x_j} = u_n(k; x_1, \ldots, x_n) \left(\frac{k}{x_j} + \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i}\right) = 0,
\]
for $j = 1, \ldots, n-1$. By setting $f(x) = \prod_{i=1}^{n} (x - x_i)$ we get
\[
\frac{k}{x_j} + \sum_{\substack{i=1 \\ i \neq j}}^{n} \frac{1}{x_j - x_i} = 0
\ \Leftrightarrow\
\frac{k}{x_j} + \frac{1}{2}\,\frac{f''(x_j)}{f'(x_j)} = 0
\ \Leftrightarrow\
x_j f''(x_j) + 2k f'(x_j) = 0 \tag{135}
\]
for $j = 1, \ldots, n-1$. Since $f(x)$ is a polynomial of degree $n$ that has $x = 1$ as a root, equation (135) implies
\[
x f''(x) + 2k f'(x) = c\,\frac{f(x)}{x-1}
\]
where $c$ is some constant. Set $f(x) = (x-1)g(x)$ and the resulting differential equation is
\[
x(x-1)g''(x) + \bigl((2k+2)x - 2k\bigr)g'(x) + (2k - c)g(x) = 0.
\]
The constant $c$ can be found by examining the terms with degree $n-1$ and is given by $c = 2k + (n-1)(2k+n)$, thus
\[
x(1-x)g''(x) + \bigl(2k - (2k+2)x\bigr)g'(x) + (n-1)(2k+n)g(x) = 0. \tag{136}
\]
Comparing (136) with the standard form of the hypergeometric differential equation [2]
\[
x(1-x)g''(x) + \bigl(c - (a+b+1)x\bigr)g'(x) - ab\,g(x) = 0
\]
shows that $g(x)$ can be expressed as follows
\[
g(x) = C \cdot {}_2F_1(1-n, 2k+n; 2k; x)
= C \cdot \frac{(2k)_{n-1}}{(n-1)!} \sum_{i=0}^{n-1} (-1)^i \binom{n-1}{i} \frac{(2k+n)_i}{(2k)_i}\, x^i
\]
where $C$ is an arbitrary constant, and since we are only interested in the roots of the polynomial we can choose $C$ so that it gives the desired form of the expression. The connection to the Jacobi polynomial is given by [2]
\[
{}_2F_1(-m, 1+\alpha+\beta+n; \alpha+1; x) = \frac{m!}{(\alpha+1)_m}\, P_m^{(\alpha,\beta)}(1-2x),
\]
and $\alpha = 2k-1$, $\beta = 0$, $m = n-1$ gives the expression in Theorem 3.6.

Note that the Jacobi polynomials $P_n^{(a,b)}(x)$ are orthogonal polynomials on the interval $[-1, 1]$ with respect to the weight function $(1-x)^a(1+x)^b$ and thus all of their zeros will be real, distinct and located in $[-1, 1]$, see [48]. Thus all zeros of the polynomial given in Theorem 3.6 will be real, distinct and located in the interval $[0, 1]$.

We can now find the D-optimal t-values using the upper branch of the Lambert W function as described in equation (134),
\[
t_i = -W_0\!\left(-e^{-1} x_i^{c}\right),
\]
where $x_i$ are the roots of the Jacobi polynomial given in Theorem 3.6. Since $-1 \le W_0(x) \le 0$ for $-e^{-1} \le x \le 0$ this will always give $0 \le t_i \le 1$.

Remark 3.3. Note that $x_n = 1$ means that $t_n = t_q$ and also is equivalent to the condition $\sum_{r=1}^{n_q} \eta_{q,r} = 1$. In other words, we are interpolating the peak and $p - 1$ points inside each interval.

3.3.3 D-optimal interpolation on the decaying part

Finding the D-optimal points for the decaying part is more difficult than it is for the rising part. Suppose we denote the largest value for time that can reasonably be used (for computational or experimental reasons) with $t_{\max}$. This corresponds to some value $x_{\max} = \left(t_{\max} \exp(1 - t_{\max})\right)^{\frac{1}{c}}$. Ideally we would want a corresponding theorem to Theorem 3.6 over $[1, x_{\max}]^n$ instead of $[0, 1]^n$. It is easy to see that if $x_i = 0$ or $x_i = 1$ for some $1 \le i \le n-1$ then $u_n(k; x_1, \ldots, x_n) = 0$ and thus there must exist some local extreme point such that $0 < x_1 < x_2 < \ldots < x_{n-1} < 1$.
This is no longer guaranteed when considering the volume $[1, x_{\max}]^n$ instead. Therefore we will instead extend Theorem 3.6 to the volume $[0, x_{\max}]^n$ and give an extra constraint on the parameter $k$ that guarantees $1 < x_1 < x_2 < \ldots < x_{n-1} < x_n = x_{\max}$.

Theorem 3.7. Let $y_1 < y_2 < \ldots < y_{n-1}$ be the roots of the Jacobi polynomial $P_{n-1}^{(2k-1,0)}(1-2y)$. If $k$ is chosen such that $1 < x_{\max} \cdot y_1$ then the determinant $u_n(k; x_1, \ldots, x_n)$ given in Theorem 3.6 is maximized or minimized on the cube $[1, x_{\max}]^n$ (where $x_{\max} > 1$) when $x_i = x_{\max} \cdot y_i$ and $x_n = x_{\max}$, or some permutation thereof.

Proof. This theorem follows from Theorem 3.6 combined with the fact that $u_n(k; x_1, \ldots, x_n)$ is a homogeneous polynomial. Since
\[
u_n(k; b \cdot x_1, \ldots, b \cdot x_n) = b^{\,nk + \frac{n(n-1)}{2}} \cdot u_n(k; x_1, \ldots, x_n),
\]
if $(x_1, \ldots, x_n)$ is an extreme point in $[0, 1]^n$ then $(b \cdot x_1, \ldots, b \cdot x_n)$ is an extreme point in $[0, b]^n$. Thus by Theorem 3.6 the points given by $x_i = x_{\max} \cdot y_i$ will maximize or minimize $u_n(k; x_1, \ldots, x_n)$ on $[0, x_{\max}]^n$.

Remark 3.4. It is in many cases possible to ensure the condition $1 < x_{\max} \cdot y_1$ without actually calculating the roots of $P_{n-1}^{(2k-1,0)}(1-2y)$. In the literature on orthogonal polynomials there are many expressions for upper and lower bounds of the roots of the Jacobi polynomials. For instance in [74] an upper bound on the largest root of a Jacobi polynomial is given and can be, in our case, rewritten as
\[
y_1 > 1 - \frac{3}{4k^2 + 2kn + n^2 - k - 2n + 1}
\]
and thus
\[
1 - \frac{3}{4k^2 + 2kn + n^2 - k - 2n + 1} > \frac{1}{x_{\max}}
\]
guarantees that $1 < x_{\max} \cdot y_1$. If a more precise condition is needed there are expressions that give tighter bounds of the largest root of the Jacobi polynomials, see [179].

We can now find the D-optimal t-values using the lower branch of the Lambert W function as in equation (134),
\[
t_i = -W_{-1}\!\left(-e^{-1} x_i^{c}\right),
\]
where $x_i$ are the roots of the Jacobi polynomial given in Theorem 3.6.
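As an illustration of how the points used in Theorems 3.6 and 3.7 can be computed, the following Python sketch works directly with the hypergeometric polynomial $g(x) = {}_2F_1(1-n, 2k+n; 2k; x)$ obtained from (136) in the proof of Theorem 3.6. It finds the roots by a sign scan and bisection, checks the stationarity condition in (135) numerically, and maps the points to the rising part by solving $t\,e^{1-t} = x^{c}$ with bisection, which stands in for the $W_0$ branch. The sketch assumes $k \ge 1$ and small $n$; the function names, the grid size and the example values of $n$, $k$ and $c$ are illustrative.

```python
import math

def hypergeom_poly_coeffs(n, k):
    """Coefficients c_0, ..., c_{n-1} of g(x) = 2F1(1 - n, 2k + n; 2k; x), assuming k >= 1."""
    coeffs = [1.0]
    for i in range(n - 1):
        # ratio of consecutive 2F1 terms: (a + i)(b + i) / ((c + i)(i + 1))
        ratio = (1 - n + i) * (2 * k + n + i) / ((2 * k + i) * (i + 1))
        coeffs.append(coeffs[-1] * ratio)
    return coeffs

def horner(coeffs, x):
    """Evaluate the polynomial with ascending coefficients at x."""
    acc = 0.0
    for coeff in reversed(coeffs):
        acc = acc * x + coeff
    return acc

def roots_in_unit_interval(coeffs, grid=4000):
    """Locate simple roots in (0, 1) by a sign scan followed by bisection."""
    roots = []
    xs = [j / grid for j in range(grid + 1)]
    for a, b in zip(xs, xs[1:]):
        fa, fb = horner(coeffs, a), horner(coeffs, b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            for _ in range(100):
                mid = 0.5 * (a + b)
                fm = horner(coeffs, mid)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append(0.5 * (a + b))
    return roots

def d_optimal_x(n, k):
    """Candidate extreme point of u_n on [0, 1]^n: the roots of g together with x_n = 1."""
    return roots_in_unit_interval(hypergeom_poly_coeffs(n, k)) + [1.0]

def stationarity_residuals(xs, k):
    """k / x_j + sum_{i != j} 1 / (x_j - x_i) for j = 1, ..., n - 1, cf. (135)."""
    return [k / xj + sum(1.0 / (xj - xi) for i, xi in enumerate(xs) if i != j)
            for j, xj in enumerate(xs[:-1])]

def rising_time(x, c):
    """Solve t * exp(1 - t) = x**c for 0 <= t <= 1 by bisection."""
    target = x ** c
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(1.0 - mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    n, k, c = 3, 1, 0.5
    xs = d_optimal_x(n, k)
    print("x:", xs)
    print("residuals:", stationarity_residuals(xs, k))
    print("t:", [rising_time(x, c) for x in xs])
```

For the decaying part the same roots would be scaled by $x_{\max}$ as in Theorem 3.7 before inverting with the $W_{-1}$ branch; that step is left out here.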
Since $-\infty < W_{-1}(x) \le -1$ for $-e^{-1} \le x < 0$ this will always give $1 \le t_i < t_{\max} = -W_{-1}\!\left(-e^{-1} x_{\max}\right)$ so $x_{\max}$ is given by the highest feasible $t$.

Remark 3.5. Note that here, just like in the rising part, $t_n = t_p$ which means that we will interpolate to the final peak as well as $p - 1$ points in the decaying part.

3.3.4 Examples of models from applications and experiments

Here we will apply the described scheme to two different applications, modelling of ESD currents commonly used in electrostatic discharge immunity testing and modelling of lightning discharge currents. The values of $n$ and the peak times have been chosen manually, and $k$ and $c$ have been chosen by first fixing $k$ and then numerically finding a $c$ that gave a close approximation. For this purpose we used software for numerical computing [205], based on the interior reflective Newton method described in [55, 56]. This is then repeated for $k = 1, \ldots, 10$ and the best fitting set of parameters is chosen. Note that this methodology uses all available data points to evaluate fitting but could probably be simplified further. For example, one could use a simpler method for choosing $c$ given $k$, use only a subset of the available points to assess accuracy or, with sufficient experimentation, find some suitable heuristic for choosing the appropriate value of $k$.

Since the waveforms are given as data rather than explicit functions the D-optimal points have been calculated and then the closest available data points have been chosen.

In these examples the coefficients in the linear sums can be negative.

3.3.5 Modelling of ESD currents

The IEC 61000-4-2 standard current waveshape

All ESD generators used in testing of equipment and devices must be able to reproduce the same ESD current waveshape. The requirements for this waveshape are given in the IEC 61000-4-2 Standard [132].
The IEC 61000-4-2 Standard gives a graphical representation of the typical ESD current, Figure 3.10, and also defines, for a given test level voltage, required values of the ESD current's key parameters. The values of the ESD current's key parameters are listed in Table 3.2 for the case of the contact discharge, where $I_{peak}$ is the ESD current initial peak, $t_r$ is the rising time defined as the difference between time moments corresponding to 10% and 90% of the current peak $I_{peak}$, and $I_{30}$ and $I_{60}$ are the ESD current values calculated for time periods of 30 and 60 ns, respectively, starting from the time point corresponding to 10% of $I_{peak}$.

In this section we present the results of fitting a 2-peak AEF to the Standard ESD current given in IEC 61000-4-2. Data points which are used in the optimization procedure are manually sampled from the graphically given Standard [132] current function, Figure 3.10. The peak currents and corresponding times are also extracted, and the results of D-optimal interpolation with two peaks are illustrated, see Figure 3.11. The parameters are listed in Table 3.3. In the illustrated examples a fairly good fit is found but typically areas with steeply rising and decaying parts are somewhat more difficult to fit with good accuracy than the other parts of the waveform.

Figure 3.10: IEC 61000-4-2 Standard ESD current waveform with parameters, [132] (image slightly modified for clarity).

Table 3.2: IEC 61000-4-2 standard ESD current parameters [132].

U [kV]   Ipeak [A]      tr [ns]       I30 [A]       I60 [A]
2        7.5 ± 15%      0.8 ± 25%     4.0 ± 30%     2.0 ± 30%
4        15.0 ± 15%     0.8 ± 25%     8.0 ± 30%     4.0 ± 30%
6        22.5 ± 15%     0.8 ± 25%     12.0 ± 30%    6.0 ± 30%
8        30.0 ± 15%     0.8 ± 25%     16.0 ± 30%    8.0 ± 30%
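As a small illustration of how the tolerances in Table 3.2 can be used, the Python sketch below checks whether a set of key parameters extracted from a waveform lies within the tabulated ranges. The numbers are copied from Table 3.2; the data layout and the function name `check_waveform` are assumptions made for this example and are not part of the standard.

```python
# Nominal contact-discharge values from Table 3.2:
# U [kV] -> (I_peak [A], t_r [ns], I_30 [A], I_60 [A])
NOMINAL = {
    2: (7.5, 0.8, 4.0, 2.0),
    4: (15.0, 0.8, 8.0, 4.0),
    6: (22.5, 0.8, 12.0, 6.0),
    8: (30.0, 0.8, 16.0, 8.0),
}
# Relative tolerances, in the same order as the tuples above
TOLERANCE = (0.15, 0.25, 0.30, 0.30)
NAMES = ("I_peak", "t_r", "I_30", "I_60")

def check_waveform(u_kv, i_peak, t_r, i_30, i_60):
    """Return {name: bool} telling which key parameters lie within the tabulated tolerance."""
    measured = (i_peak, t_r, i_30, i_60)
    return {
        name: abs(value - nominal) <= tol * nominal
        for name, value, nominal, tol in zip(NAMES, measured, NOMINAL[u_kv], TOLERANCE)
    }

if __name__ == "__main__":
    # Hypothetical measured values for a 4 kV contact discharge
    print(check_waveform(4, i_peak=15.0, t_r=0.9, i_30=7.0, i_60=4.5))
```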
Table 3.3: Parameters' values of 2-peaked AEF representing the IEC 61000-4-2 Standard ESD current waveshape for 4 kV.

Local maxima and minima and corresponding times extracted from IEC 61000-4-2 [132]

Peak current [A]         Peak time [ns]
I_max1 = 15              t_max1 = 6.89
I_min1 = 7.1484          t_min1 = 12.85
I_max2 = 9.0921          t_max2 = 25.54

Parameters of interpolated AEF

Interval                     n     k     c
0 ≤ t ≤ t_max1               3     1     0.01385
t_max1 ≤ t ≤ t_max2          3     4     2.025
t_max2 < t                   5     10    2.395

Figure 3.11: 2-peaked AEF interpolated on a D-optimal design representing the IEC 61000-4-2 Standard ESD current waveshape for 4 kV. (Plot of $i(t)$ against $t$ [s], axis scaled by $10^{-8}$; legend: IEC 61000-4-2, 2-peaked AEF, Peaks, Interpolated points.)

3-peaked AEF representing measured current from ESD

In this section we present the results of fitting a 3-peaked AEF to a waveform from experimental measurements from [151]. The result is also compared to a common type of function used for modelling ESD current, also from [151]. In Figures 3.12 and 3.13 the results of the interpolation of D-optimal points are shown together with the measured data, as well as a sum of two Heidler functions that was fitted to the experimental data in [151]. This function is given by

$$ i(t) = I_1 \frac{\left(\frac{t}{\tau_1}\right)^{n_H}}{1 + \left(\frac{t}{\tau_1}\right)^{n_H}} e^{-\frac{t}{\tau_2}} + I_2 \frac{\left(\frac{t}{\tau_3}\right)^{n_H}}{1 + \left(\frac{t}{\tau_3}\right)^{n_H}} e^{-\frac{t}{\tau_4}}, $$

$I_1 = 31.365$ A, $I_2 = 6.854$ A, $n_H = 4.036$, $\tau_1 = 1.226$ ns, $\tau_2 = 1.359$ ns, $\tau_3 = 3.982$ ns, $\tau_4 = 28.817$ ns.

Note that this function does not reproduce the second local minimum but that all three AEF functions can reproduce all local minima and maxima (to a modest degree of accuracy) when suitable values for the $n$, $k$ and $c$ parameters are chosen. In Figure 3.13 we can see that even small bumps in the rising part are successfully reproduced.
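For readers who wish to reproduce the comparison curve, the sum of two Heidler functions with the parameter values quoted from [151] can be evaluated numerically. The sketch below assumes the usual Heidler form without a peak-correction factor, that is, each term equals its amplitude times $(t/\tau)^{n_H} / (1 + (t/\tau)^{n_H})$ times a decaying exponential; the function names are hypothetical and the current is taken to be zero for $t \le 0$.

```python
import math

# Parameter values quoted from [151]; currents in A, time constants in ns.
I1, I2 = 31.365, 6.854
N_H = 4.036
TAU1, TAU2, TAU3, TAU4 = 1.226, 1.359, 3.982, 28.817


def heidler_term(t, amplitude, n, tau_rise, tau_decay):
    """One term: amplitude * x / (1 + x) * exp(-t / tau_decay), with x = (t / tau_rise)**n."""
    if t <= 0.0:
        return 0.0  # assumption: no current before the discharge starts
    x = (t / tau_rise) ** n
    return amplitude * x / (1.0 + x) * math.exp(-t / tau_decay)


def heidler_sum(t):
    """Sum of the two Heidler terms at time t (in ns); result in A."""
    return (heidler_term(t, I1, N_H, TAU1, TAU2)
            + heidler_term(t, I2, N_H, TAU3, TAU4))


# Sample the curve on a coarse grid from 0 to 100 ns.
samples = [(t, heidler_sum(t)) for t in range(0, 101, 5)]
for t, current in samples:
    print(f"t = {t:3d} ns, i(t) = {current:.4f} A")
```

Sampling on a finer grid and locating sign changes of the finite differences gives a quick numerical check of how many local extrema the fitted function has.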
3.3.6 Modelling of lightning discharge currents

IEC 61312-1 standard current waveshape

In this section we use the scheme to represent the IEC 61312-1 Standard current waveshape as it is described in [117]. Rather than being given graphically, as for the IEC 61000-4-2 Standard current waveform, the shape is described using a Heidler function,