Deep X-Valuation Adjustments (XVAs) Analysis
Presented by Bouazza Saadeddine. Joint work with Lokman Abbas-Turki and Stéphane Crépey.
Crédit Agricole CIB, France; LaMME, Université d'Evry/Paris-Saclay, France; LPSM, Sorbonne Université, France; LPSM, Université de Paris, France.

Plan of the presentation
1. XVA primer
2. General simulation and learning issues
3. From linear regressors to neural networks
4. Backward learning scheme with over-simulation
5. Numerical benchmark
6. GPU optimizations
7. Work in progress

Motivation
- The 2008-09 crisis led to major banking reforms aimed at securing the financial system: collateralization and capital requirements were raised.
- Unintended consequence: banks have to quantify market incompleteness, based on XVA metrics.
- XVAs are pricing add-ons meant to account for counterparty risk and its capital and funding implications. VA stands for Valuation Adjustment; X is a catch-all letter to be replaced by C (Credit), D (Debt), F (Funding), M (Margin) or K (Capital).
- During the financial crisis, roughly two-thirds of the losses attributed to counterparty credit risk were due to CVA losses and only about one-third to actual defaults (Basel Committee report, June 2011).
- In January 2014, JP Morgan recorded a USD 1.5 billion FVA loss.
- It is therefore essential to model the future evolution of XVAs.

$X_i$ and $Y_i$ are, respectively, the state of defaults and the market risk factors at time $t_i$, $i \in \{0, \dots, n\}$.
- CVA could be simulated with one layer of nested Monte Carlo (NMC), assuming an analytic MtM ($\tau$: counterparty default time):
$$\mathrm{CVA}_i := \mathbb{E}\Big[\sum_{j=i}^{n-1} \mathrm{MtM}_{j+1}^{+}\, \mathbb{1}_{\{t_j < \tau \le t_{j+1}\}} \,\Big|\, X_i, Y_i\Big]$$
- FVA would need $n$ layers of NMC ($\gamma$: bank funding spread):
$$\mathrm{FVA}_i := \mathbb{E}\Big[\sum_{j=i}^{n-1} \gamma_{j+1}\big(\mathrm{MtM}_{j+1}\mathbb{1}_{\{\tau > t_{j+1}\}} - \mathrm{CVA}_{j+1} - \mathrm{FVA}_{j+1}\big)^{+}(t_{j+1} - t_j) \,\Big|\, X_i, Y_i\Big]$$
i.e. exponential complexity in $n$, unless interpolation or regression is used to cut the recursion.
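The following minimal Python sketch shows what one layer of NMC for $\mathrm{CVA}_i$ looks like. The toy model is our own assumption, not the presentation's. The MtM is a driftless Brownian motion $\mathrm{MtM}_t = \sigma W_t$, standing in for an analytic MtM. The default time $\tau$ is exponential with constant intensity `lam` and independent of the MtM. All function names are hypothetical. The outer loop samples $(X_i, \mathrm{MtM}_{t_i})$, and the inner loop re-simulates the future from each outer scenario. The closed form `cva_exact` exists only to check the inner estimator in this toy case.

```python
import math
import random


def cva_inner_mc(x, t_i, times, sigma, lam, n_inner, rng):
    """Inner MC estimate of CVA_i given survival to t_i and MtM_{t_i} = x.

    Hypothetical toy model: MtM_t = x + sigma * (W_t - W_{t_i}); the default
    time tau is Exp(lam) and independent of the MtM.
    """
    total = 0.0
    for _ in range(n_inner):
        # Memorylessness: given survival to t_i, tau - t_i is again Exp(lam).
        tau = t_i + rng.expovariate(lam)
        mtm, prev = x, t_i
        for t_next in (t for t in times if t > t_i):
            mtm += sigma * math.sqrt(t_next - prev) * rng.gauss(0.0, 1.0)
            if prev < tau <= t_next:        # default in (t_j, t_{j+1}]
                total += max(mtm, 0.0)      # exposure MtM_{j+1}^+
                break
            prev = t_next
    return total / n_inner


def cva_exact(x, t_i, times, sigma, lam):
    """Closed form of the same toy CVA_i, used only to check the inner MC."""
    total, prev = 0.0, t_i
    for t_next in (t for t in times if t > t_i):
        s = sigma * math.sqrt(t_next - t_i)
        d = x / s
        # E[(x + s Z)^+] for Z ~ N(0, 1)
        call = (x * 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
                + s * math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi))
        p_default = math.exp(-lam * (prev - t_i)) - math.exp(-lam * (t_next - t_i))
        total += call * p_default
        prev = t_next
    return total


def cva_nested(times, sigma, lam, i, n_outer, n_inner, seed=0):
    """One NMC layer: outer draws of (X_i, MtM_{t_i}), then inner re-simulation.

    Costs n_outer * n_inner path simulations for a single time step t_i.
    """
    rng = random.Random(seed)
    t_i = times[i]
    values = []
    for _ in range(n_outer):
        x = sigma * math.sqrt(t_i) * rng.gauss(0.0, 1.0)  # outer MtM_{t_i}, MtM_0 = 0
        survived = rng.expovariate(lam) > t_i              # outer state X_i = 0
        values.append(cva_inner_mc(x, t_i, times, sigma, lam, n_inner, rng)
                      if survived else 0.0)
    return values
```

For FVA, every inner path would itself need CVA and FVA values at later dates, each obtained by another layer of this loop. This is where the exponential cost in $n$ comes from.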
General simulation and learning issues
- $(X_i)_{0 \le i \le n}$ and $(Y_i)_{0 \le i \le n}$ are jointly Markov processes evaluated at $0 = t_0 < \dots < t_n = T$.
- We want to learn
$$V_i := \mathbb{E}[\xi_{i,n} \mid X_i, Y_i], \qquad \text{where } \xi_{i,n} := f_i\big((X_j)_{i \le j \le n}, (Y_j)_{i \le j \le n}\big).$$
- The probability measure changes every day due to changes in market conditions.
- We are in a setting where $X$ contributes more to the variance of $\xi_{i,n}$ than $Y$, and simulating $X$ is much faster than simulating $Y$ → over-simulate $X$.

[Diagram: nested structure of the XVA layers by depth, from $\mathrm{KVA}_0$ down through $\mathrm{EC}_s$, FVA, CVA/MVA and IM to the MtM, with nested sample sizes $M_{kva}$, $M_{ec}$, $M_{fva}$, $M_{cva}$, $M_{im}$, $M_{mtm}$.]

- XVAs are hard to compute because of critical tail events and because of the presence of multiple XVA layers. EC is an Expected Shortfall of default losses and of fluctuations of the lower XVAs.
- In Abbas-Turki, Diallo and Crépey (2018), a benchmark approach involving multi-layer NMC and linear regressions, along with GPU optimization techniques, was developed. This benchmark, however, has exponential complexity in the number of layers.
- We focus here on an approach based on neural regressions, with linear complexity.

Regression setup
- In the same setting, learning $V_i = \mathbb{E}[\xi_{i,n} \mid X_i, Y_i]$ is a regression problem, a technique well known to the quant finance community (Longstaff and Schwartz (2001)):
  - Draw i.i.d. samples $(X_i^k, Y_i^k, \xi_{i,n}^k)_{1 \le k \le m}$ of $(X_i, Y_i, \xi_{i,n})$;
  - Estimate $\mathbb{E}[\xi_{i,n} \mid X_i, Y_i]$ by linear regression of $(\xi_{i,n}^k)_{1 \le k \le m}$ against $(\phi(X_i^k, Y_i^k))_{1 \le k \le m}$, where $\phi$ is a feature mapping.
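The regression step can be sketched in plain Python as ordinary least squares on the features $\phi(X_i^k, Y_i^k)$, solved through the normal equations. The feature mapping `phi(x, y) = (1, x, y, y^2)`, the toy labels and all function names are hypothetical choices for illustration. In practice a numerically stabler solver (QR or SVD) would likely be preferred over the normal equations.

```python
import random


def phi(x, y):
    """Hypothetical feature mapping: intercept, default state x, y and y^2."""
    return [1.0, x, y, y * y]


def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a z = b."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def fit_regression(samples):
    """Least-squares coefficients of labels xi against phi(x, y) via normal equations."""
    p = len(phi(0.0, 0.0))
    gram = [[0.0] * p for _ in range(p)]
    rhs = [0.0] * p
    for x, y, xi in samples:
        f = phi(x, y)
        for r in range(p):
            rhs[r] += f[r] * xi
            for c in range(p):
                gram[r][c] += f[r] * f[c]
    return solve_linear_system(gram, rhs)


def predict(coef, x, y):
    """Regression estimate of E[xi | X = x, Y = y]."""
    return sum(c * f for c, f in zip(coef, phi(x, y)))


def toy_sample(m, seed=0):
    """Hypothetical i.i.d. sample with E[xi | X, Y] = 1 + 3X + 2Y - 0.5Y^2."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        x = 1.0 if rng.random() < 0.2 else 0.0
        y = rng.gauss(0.0, 1.0)
        xi = 1.0 + 3.0 * x + 2.0 * y - 0.5 * y * y + rng.gauss(0.0, 0.5)
        out.append((x, y, xi))
    return out
```

The quality of the estimator hinges entirely on the hand-picked `phi`. This is the limitation that motivates learning the feature mapping with a neural network.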
Neural networks as regressors
- XVAs exhibit non-trivial dependencies on the (many) risk factors, so it is very hard to manually craft a good enough feature mapping $\phi$.
- Neural networks (NNs) offer a way to learn the feature mapping too:
$$\hat\theta_i \in \operatorname*{argmin}_{\theta \in \Theta} \mathbb{E}\big[(\xi_{i,n} - \varphi_\theta(X_i, Y_i))^2\big],$$
where $\varphi_\theta$ is an NN parametrized by $\theta$. $\varphi_{\hat\theta_i}(X_i, Y_i)$ is then an estimator of $\mathbb{E}[\xi_{i,n} \mid X_i, Y_i]$, valid only for the given market conditions.
- NNs are more flexible than linear regression, e.g. they allow enforcing positivity with a ReLU or SoftPlus activation at the output layer.

An over-simulation scheme
- Leverage the hierarchy between $X$ and $Y$: relax the i.i.d. setting and sample more realizations of $X$ than of $Y$:
  - Simulate i.i.d. paths $(Y^k)_{1 \le k \le M}$ of $Y$;
  - At $t_i$ and for every $1 \le k \le M$, simulate i.i.d. realizations $(X_i^{k,l})_{1 \le l \le \omega}$ of $X_i$ given $Y^k$, and set $\xi_{i,n}^{k,l} := f_i\big((X_j^{k,l})_{i \le j \le n}, (Y_j^k)_{i \le j \le n}\big)$;
  - This yields a sample $(X_i^{k,l}, Y_i^k, \xi_{i,n}^{k,l})_{1 \le k \le M,\, 1 \le l \le \omega}$ of size $m := M\omega$ of $(X_i, Y_i, \xi_{i,n})$.
- The generated sample is not i.i.d.
- The scheme is more efficient both in speed, since $Y$ is more costly to simulate than $X$, and in memory usage.

In the XVA setting, assuming $\lambda$ is the counterparty's default intensity process:
- $X_i$: default indicator; $Y_i$: market risk factor processes (rates, FX, default intensities);
- $\forall i \in \{0, \dots, n\}$, $\mathbb{P}(X_i = 0 \mid \{Y_j\}_{0 \le j \le i}) = \exp\big(-\sum_{j=1}^{i} \lambda_j \Delta t\big)$;
- Conditional on $\{Y_j\}_{0 \le j \le i}$, $X_i$ can be simulated very fast as $\mathbb{1}_{\{\sum_{j=1}^{i} \lambda_j \Delta t > \mathcal{E}\}}$ with $\mathcal{E} \sim \mathrm{Exp}(1)$.

Learning scheme
At every time step $t_i$:
1. Simulate $(X_i^{k,l}, Y_i^k, \xi_{i,n}^{k,l})_{1 \le k \le M,\, 1 \le l \le \omega}$ according to the over-simulation scheme above;
2. Train an NN to regress $(\xi_{i,n}^{k,l})_{1 \le k \le M,\, 1 \le l \le \omega}$ against $(X_i^{k,l}, Y_i^k)_{1 \le k \le M,\, 1 \le l \le \omega}$, i.e. find
$$\hat\theta_i \in \operatorname*{argmin}_{\theta \in \Theta} \sum_{k=1}^{M} \sum_{l=1}^{\omega} \big(\xi_{i,n}^{k,l} - \varphi_\theta(X_i^{k,l}, Y_i^k)\big)^2;$$
3. Use $(x, y) \mapsto \varphi_{\hat\theta_i}(x, y)$ as an estimator of $\mathbb{E}[\xi_{i,n} \mid X_i = x, Y_i = y]$.
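The sketch below illustrates the over-simulation scheme together with the $\mathrm{Exp}(1)$ trick for default indicators. Several pieces are hypothetical and chosen only for illustration: the lognormal intensity dynamics, the uniform step `dt` and all function names. The part that follows the slides is the structure. Each costly outer path $Y^k$ is simulated once and then reused for $\omega$ cheap inner draws $X_i^{k,l} = \mathbb{1}_{\{\sum_{j=1}^{i} \lambda_j \Delta t > \mathcal{E}\}}$.

```python
import math
import random


def simulate_intensity_paths(n_paths, n_steps, dt, lam0, vol, rng):
    """Outer (costly) paths of Y, reduced here to a hypothetical lognormal intensity.

    Returns, for each path k, the list [lambda_1, ..., lambda_n].
    """
    paths = []
    for _ in range(n_paths):
        w, lams = 0.0, []
        for j in range(1, n_steps + 1):
            w += math.sqrt(dt) * rng.gauss(0.0, 1.0)
            lams.append(lam0 * math.exp(vol * w - 0.5 * vol * vol * j * dt))
        paths.append(lams)
    return paths


def over_simulate_defaults(lam_paths, i, dt, omega, rng):
    """Draw omega default indicators X_i^{k,l} for every outer path Y^k.

    X_i = 1{sum_{j=1}^{i} lambda_j dt > E} with E ~ Exp(1), so that
    P(X_i = 0 | Y) = exp(-sum_{j=1}^{i} lambda_j dt). Returns (x, k) pairs,
    i.e. a non-i.i.d. sample of size m = M * omega.
    """
    sample = []
    for k, lams in enumerate(lam_paths):
        hazard = dt * sum(lams[:i])          # computed once per outer path
        for _ in range(omega):
            x = 1 if hazard > rng.expovariate(1.0) else 0
            sample.append((x, k))
    return sample
```

Each inner draw costs a single exponential variate, while the outer path and its cumulative hazard are computed only once per group of $\omega$ indicators. In this sketch, that asymmetry is what over-simulation exploits.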
Backward learning
- The learning at each time step $t_i$ is done independently of the other time steps, so the paths $(\varphi_{\hat\theta_i}(X_i, Y_i))_{0 \le i \le n}$ can be non-smooth even when the labels are smooth.
- Start the learning at $T$ and then, at every time step, reuse the previous solution as the initialization of the training algorithm.
- The local minima associated with successive time steps will then be close to each other → the paths $(\varphi_{\hat\theta_i}(X_i, Y_i))_{0 \le i \le n}$ are now smooth.
- This is a form of transfer learning, which also helps accelerate the convergence of the learning procedure.

Choosing the over-simulation factor
- Assume that simulating $Y_i$ costs $\kappa_i$ times more computation time than simulating $X_i$ given $Y_i$.
- $\omega_i$ is chosen so as to minimize, w.r.t. $\omega$, the variance of the loss $\frac{1}{m}\sum_{k=1}^{M}\sum_{l=1}^{\omega} f_\theta(X_i^{k,l}, Y_i^k)$, with $m = M\omega$, under a budget constraint $\mathcal{C} = M(\omega + \kappa_i)$. Here $f_\theta$ is such that $f_\theta(X_i, Y_i) = (\xi_{i,n} - \varphi_\theta(X_i, Y_i))^2$.
- Defining
$$\alpha_i := \mathbb{E}\big[(f_\theta(X_i^{1,1}, Y_i^1))^2\big] - \mathbb{E}\big[f_\theta(X_i^{1,1}, Y_i^1)\, f_\theta(X_i^{1,2}, Y_i^1)\big]$$
and
$$\beta_i := \mathbb{E}\big[f_\theta(X_i^{1,1}, Y_i^1)\, f_\theta(X_i^{1,2}, Y_i^1)\big] - \big(\mathbb{E}[f_\theta(X_i^{1,1}, Y_i^1)]\big)^2,$$
one can show, using $m = \mathcal{C}\omega/(\omega + \kappa_i)$ for the second equality, that
$$\mathrm{Var}\Big(\frac{1}{m}\sum_{k=1}^{M}\sum_{l=1}^{\omega} f_\theta(X_i^{k,l}, Y_i^k)\Big) = \frac{\alpha_i + \omega\beta_i}{m} = \frac{1}{\mathcal{C}}\Big(\big(\sqrt{\alpha_i} + \sqrt{\kappa_i\beta_i}\big)^2 + \beta_i\,\omega\Big(1 - \frac{\omega_i^\star}{\omega}\Big)^2\Big),$$
so the variance is minimized at $\omega = \omega_i^\star := \sqrt{\kappa_i\alpha_i/\beta_i}$.

Figure 1. Optimal over-simulation factor $\omega_i$ at different time steps and SGD iterations. [Panels: t = 9.5, 8.0, 6.5, 5.0, 4.5 and 3.0 years; x-axis: SGD iteration; each panel also shows a curve labeled "heuristic".]

Finite parameter space case
Assume $\Theta$ is finite, let $0 < \delta < \varepsilon$ and define:
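$$\vartheta^\star := \min_{\theta \in \Theta} \mathbb{E}[f_\theta(X_i, Y_i)], \qquad \hat\vartheta_{M,\omega} := \min_{\theta \in \Theta} \frac{1}{m}\sum_{k=1}^{M}\sum_{l=1}^{\omega} f_\theta(X_i^{k,l}, Y_i^k),$$
$$S^\varepsilon := \big\{\theta \in \Theta : \mathbb{E}[f_\theta(X_i, Y_i)] \le \vartheta^\star + \varepsilon\big\}, \qquad \hat S^\delta_{M,\omega} := \Big\{\theta \in \Theta : \frac{1}{m}\sum_{k=1}^{M}\sum_{l=1}^{\omega} f_\theta(X_i^{k,l}, Y_i^k) \le \hat\vartheta_{M,\omega} + \delta\Big\}.$$
Assume $\Theta \setminus S^\varepsilon \neq \emptyset$ and let $u : \Theta \setminus S^\varepsilon \to S^\varepsilon$ be such that $\mathbb{E}[f_{u(\theta)}(X_i, Y_i)] \le \mathbb{E}[f_\theta(X_i, Y_i)] - \varepsilon^\star$ for all $\theta \in \Theta \setminus S^\varepsilon$, for some $\varepsilon^\star \ge \varepsilon$. Define $g_\theta(X_i, Y_i) := f_{u(\theta)}(X_i, Y_i) - f_\theta(X_i, Y_i)$.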
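Once $\alpha_i$ and $\beta_i$ are estimated from an over-simulated sample, choosing the over-simulation factor $\omega_i$ is plain arithmetic. The sketch below uses hypothetical function names and toy losses $f = A_k + B_{k,l}$ with $\mathrm{Var}(A) = 1$ and $\mathrm{Var}(B) = 4$, chosen so that $\alpha_i = 4$ and $\beta_i = 1$. It estimates both quantities from within-path pairs $l \neq l'$ and returns $\omega_i^\star = \sqrt{\kappa_i\alpha_i/\beta_i}$. It also evaluates the loss variance $(\alpha_i + \omega\beta_i)/m$, with $m = M\omega$ and a budget $\mathcal{C} = M(\omega + \kappa_i)$.

```python
import math
import random


def estimate_alpha_beta(losses):
    """Estimate alpha_i and beta_i from losses[k][l] = f_theta(X_i^{k,l}, Y_i^k).

    Needs at least two inner draws per outer path to form pairs l != l'.
    """
    n = n_pairs = 0
    s = s_sq = s_pairs = 0.0
    for row in losses:
        w, total = len(row), sum(row)
        sq = sum(v * v for v in row)
        n += w
        s += total
        s_sq += sq
        s_pairs += total * total - sq        # sum over ordered pairs l != l'
        n_pairs += w * (w - 1)
    mean, e_sq, e_pair = s / n, s_sq / n, s_pairs / n_pairs
    return e_sq - e_pair, e_pair - mean * mean


def optimal_omega(alpha, beta, kappa):
    """Minimizer of (omega + kappa) * (alpha / omega + beta) over omega > 0."""
    return math.sqrt(kappa * alpha / beta)


def loss_variance(alpha, beta, omega, kappa, budget):
    """(alpha + omega * beta) / m with m = M * omega and M = budget / (omega + kappa)."""
    m = budget / (omega + kappa) * omega
    return (alpha + omega * beta) / m


def toy_losses(n_outer, omega, rng):
    """Hypothetical losses f = A_k + B_{k,l}, Var(A) = 1, Var(B) = 4: alpha = 4, beta = 1."""
    rows = []
    for _ in range(n_outer):
        a = rng.gauss(0.0, 1.0)
        rows.append([a + rng.gauss(0.0, 2.0) for _ in range(omega)])
    return rows
```

With $\alpha_i = 4$, $\beta_i = 1$ and $\kappa_i = 9$, the formula gives $\omega_i^\star = 6$. At that point the variance equals $(\sqrt{4} + \sqrt{9})^2/\mathcal{C} = 25/\mathcal{C}$, and neighbouring values of $\omega$ give a slightly larger variance.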