Evolving machine intelligence and its influence on risk landscapes & analyses

Dr. Jeffrey Bohn, Head, Swiss Re Institute

CDAR Symposium, 19th October 2018

Digital Society = People + Data + Networks + Machine Intelligence
What risks emerge?

2 Inspiration from Ludwig Boltzmann (Austrian physicist & philosopher who developed statistical mechanics; coined the term ergodic)

“In my view all salvation for [machine intelligence] may be expected to come from Darwin’s theory.”

Boltzmann’s ergodic hypothesis: For large systems of interacting particles in equilibrium, time averages are close to the ensemble average.
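In symbols (a standard statement of the hypothesis, not taken from the slide), for an observable f of a system in equilibrium with invariant measure μ:

```latex
% Time average along a single trajectory x(t) ...
\lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} f\bigl(x(t)\bigr)\, dt
  \;=\; \int f(x)\, d\mu(x)
% ... equals the ensemble average over the equilibrium measure \mu.
```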

3 Outline for presentation

• Digital society
• Changing risk landscapes
• Reinsurance
• Evolution of risk modeling
• Machine intelligence
• Machine intelligence’s fragility
• Matching the algorithm/system/process/data to risk-related use cases
• Final remarks

4 Digital society

5 The Digital Society is the result of technology abundance

Semiconductors enabled cheap and abundant computation
The internet enabled cheap and abundant connectivity
Big data is creating cheap and abundant information
IoT is enabling cheap and abundant sensors
…
Machine intelligence is enabling cheap and abundant predictions

In digital societies, if data is the oil, Machine Intelligence is the refinery

6 Rise of the machines

Early 1980’s – Overcrowding
• Description: Primitive AI that required many sets of rules to perform basic functions
• Models: Rules-based, pattern matching, linear models
• Objectives: Perform basic tasks normally done by humans (CPU chess players)

Early 1990’s – AI Winter
• Description: No substantial innovation, caused by weak computer hardware and lack of data
• Objectives: Stalled while awaiting more advanced computer systems and data

Early 2000’s – Machine Learning
• Description: Exploration into learning systems that are fed large amounts of data
• Models: Regressions, clustering, non-linear models
• Objectives: Create self-learning systems that feed on data (stock-trading algorithms)

Late 2000’s – Deep Learning
• Description: Innovating machine learning systems to work unsupervised and faster
• Models: Sigmoid/back-propagation, small ANNs, boosting, deep RNNs
• Objectives: Self-learning systems that improve unsupervised (self-driving cars)

Now–Future – Meta-learning
• Description: Current/ongoing development of learning systems that aim to mimic humans
• Models: Boltzmann machines, GANs, projective simulation, quantum-inspired, evolutionary
• Objectives: Systems that accurately mimic human behavior (chatbots)

Note: AI: Artificial intelligence; ANN: Artificial neural network; RNN: Recurrent neural network; GAN: Generative adversarial network

7 Rise of the networks

Early 1980’s – Government networks
• Description: Early networks were funded by a few governments such as the U.S.
• Examples: Arpanet
• Objectives: Connect researchers & military

Early 1990’s – Worldwide Web & client-servers
• Description: Migration to dial-up internet and proliferation of client-server networks
• Examples: Top-level domains, e.g., .com, take hold
• Objectives: Connect universities & companies

Early 2000’s – Ubiquitous internet
• Description: Internet penetration in the developed world exceeds 50%
• Examples: Appliances shipped with IP addresses
• Objectives: Connect companies, governments, universities & consumers

Late 2000’s – Cloud
• Description: Change the economics of networked computing in terms of fixed investment
• Examples: AWS, GCP, and Azure
• Objectives: Create more elastic networks that extend networked computing

Now–Future – Edge & fog / IoT
• Description: Further change networked computing to further reduce investment and add IoT
• Examples: Make appliances “smart”, e.g., Tesla
• Objectives: Move machine intelligence out to network edges to increase speed & create new “intelligence”

8 Data is the new oil: Crude product multiplies in value after refinement

Generate & collect → Organise & curate → Analyse & transform → Deliver & multiply value

IoT, 5G connectivity, big data, cloud computing, human + artificial intelligence, robotics, autonomous systems, platforms (e.g., smart city)

9 Changing risk landscapes

10 Digital society and new risk landscapes

• Insufficient investment in data collection & curation systems
• Digital “exhaust” and “hygiene”
• New networks overlaid on legacy networks create gaps and vulnerabilities
• Differing value/regulatory systems underlying society’s digitization create disconnects
• Over-dependence on automated/machine-intelligence-enabled systems leads to blind spots and unnecessary risk
• Vulnerabilities arise from IoT-enabled devices with outdated security features
• Mismatching digital tools with a given use case leads to poor implementations & vulnerabilities

Cyber risk and algorithmic malpractice are increasingly becoming the most important risk categories

11 Lessons from technological change in a different industry

Sale of Cameras

Source: Gartner Inc. (2015)

12 Changes arising from distributed ledger technology (DLT)

Four configurations, ranging from incremental efficiency gains to business model disruption:

• Internal improvements: Enhancing internal business process efficiencies by hosting shared services on internal DLT platforms
• Enhancement of the insurance value-chain: Organizing members of the existing insurance value-chain into a DLT cooperative for driving data standardization, aggregated data access and shared processes
• End-customer driven initiatives: End-customers in non-insurance verticals organizing DLT cooperatives for aggregation of shared services, including insurance
• New value-chains: New players creating a DLT-based alternative value-chain or market for distribution of re/insurance services

[Diagram: each configuration links end-customers, brokers, insurers, reinsurers, capital and new service players in a different arrangement]

13 What is Cyber Risk?

Accidental breaches of security / Human error.

Unauthorized and deliberate breach of computer security to access systems.

IT operational issues resulting from poor capacity planning, integration issues, system integrity etc.

14 How severe is this risk?

1.9B records stolen in first 3 quarters of 2017 compared to 600M in entire 2016

500M forms of malware (400k created every day)

1M identities breached per breach

Probability of breach in a year is close to 100%

15 The global cyber insurance market is still small…

Insurance premium in 2017 per LoB, in USD bn

North America: Motor 271, Property 196, Liability 97, Cyber 2.9
EMEA: Motor 184, Property 117, Liability 42, Cyber 0.4
Asia Pacific: Motor 199, Property 41, Liability 19, Cyber 0.2

Source: Swiss Re Institute

16 …but expected to grow rapidly

Expected global cyber insurance premium volume (2015–2025), in USD bn. CAGR estimates for 2015–2020 by source: Allianz 20% (and 32% for 2020–2025), Aon 44%, PWC 21%, Advisen 15%, ABI 37%.

[Companion chart: global insurance premium growth, 1960–2017]
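As a reading aid for the CAGR figures above (a standard definition, not taken from the sigma studies), a compound annual growth rate g over n years links start and end premium volumes by:

```latex
V_{t+n} \;=\; V_t\,(1+g)^{n}
\qquad\Longleftrightarrow\qquad
g \;=\; \left(\frac{V_{t+n}}{V_t}\right)^{1/n} - 1
% e.g., chaining the two stated Allianz rates: V_{2025} = V_{2015}\,(1.20)^{5}\,(1.32)^{5}
```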

Source: Swiss Re Sigma No 1/2017, Cyber: getting to grips with a complex risk
Source: Swiss Re Sigma No 3/2018, World insurance in 2017

The cyber-insurance market is expected to grow more rapidly than other lines of business. Currently, the market focus is on the SME segment. However, the first cyber insurance solutions for individuals are also entering the market.

17 Reinsurance

18 Reinsurance is a catalyst for economic growth.

Activity → Benefits

• Risk transfer function: Diversify risks on a global basis → Make insurance more broadly available and less expensive
• Capital market function: Invest premium income according to the expected pay-out → Provide long-term capital to the economy on a continuous basis
• Information function: Price risks and knowledge → Set incentives for risk-adequate behavior

Reinsurers absorb shocks, support risk prevention and provide capital for the real economy

19 Assets and Underwriting steering have similar basic building blocks and should be done jointly

• Current Portfolio — Assets: NAV by asset class and segment | Underwriting: expected premia & claims by UW portfolio segment

• Common Forward-looking Views (FLV) — Assets: expected-return drivers | Underwriting: forecasts on exposure, loss and premium-rate trends

• Management Actions (shaping the target) — Assets: shifts across asset classes, IR duration changes, hedging | Underwriting: organic growth, pruning, initiatives, large transactions

• Target Portfolio — Assets: long-term Strategic Asset Allocation (SAA) | Underwriting: target liability portfolio or BU plan

20 Forward-looking modeling is key to improving a reinsurer’s performance

• Capital allocation: Choose/avoid outperforming/underperforming insurance portfolio segments (IPS)

• Risk selection: Select risks within each IPS

• Strategic asset allocation: Choose/avoid outperforming/underperforming asset classes

21 More examples where machine intelligence changes insurance

• Forward-looking modeling of risk pools
• Incorporating unstructured data into business and capital steering
• Tracking natural catastrophe damage in real time
• Assessing damage
• Automated underwriting
• Improving customer targeting
• Parametric insurance contract implementation
• Intelligent automation & robotic process automation (RPA) for underwriting and claims processing
• Chatbots for customer support
• Natural language processing applied to contract review

22 Evolution of risk modeling

23 “Risk” definitions matter

• Symmetric return/loss distributions with parameterizable distributions based on existing data – “known unknowns”
  – Volatility measures may be sufficient
  – “Beta” risk should be the focus, i.e., allocation is more important than individual exposure selection
• Asymmetric return/loss distributions with changing distribution parameters and sparse data – “partially known unknowns”
  – Focus shifts to tail risk/expected tail loss
  – Diversification opportunities continue even as the portfolio becomes quite large
  – Active management may be better compensated given the huge savings from avoiding tail events
• Emerging risks, where data availability may be so sparse as to make any parameter estimation infeasible – “unknown unknowns”
  – Ambiguity creates risk that cannot be managed using traditional approaches
  – “Structural” models based on subject-matter experts can provide some guidance, casting some light on the unknown
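As a minimal numerical sketch of the first two categories (the simulated portfolios and function names below are illustrative assumptions, not Swiss Re's models): two return streams with identical volatility can carry very different expected tail losses.

```python
import numpy as np

def volatility(returns):
    """Symmetric, second-moment risk measure ('known unknowns')."""
    return float(np.std(returns, ddof=1))

def expected_tail_loss(returns, alpha=0.99):
    """Expected shortfall: average loss beyond the alpha-quantile of losses."""
    losses = -np.asarray(returns)               # a loss is a negative return
    threshold = np.quantile(losses, alpha)      # value-at-risk at level alpha
    return float(losses[losses >= threshold].mean())

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical portfolio A: symmetric, Gaussian daily returns.
ret_a = rng.normal(0.0, 0.02, n)

# Hypothetical portfolio B: mostly quiet, but a 1% chance of a large jump loss.
ret_b = rng.normal(0.0, 0.01, n) - (rng.random(n) < 0.01) * rng.exponential(0.15, n)
ret_b *= volatility(ret_a) / volatility(ret_b)  # rescale so volatilities match

for name, r in [("symmetric A", ret_a), ("skewed B", ret_b)]:
    print(f"{name}: vol={volatility(r):.4f}  ETL(99%)={expected_tail_loss(r):.4f}")
```

With matched volatility, the skewed portfolio's 99% expected tail loss comes out roughly twice that of the symmetric one, which is the point of separating "known unknowns" from "partially known unknowns".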

24 Risk contribution (RC) & expected-tail-loss contribution (TLC)

Risk contribution
• Exposure’s contribution to the portfolio’s volatility
• Relates to shorter-term risk

Expected tail-loss contribution
• Exposure’s contribution to extreme loss possibility (i.e., the “tail”)
• Relates to rare, but severe losses

[Diagram: loss-distribution curve indicating the probability of no loss, the expected loss, the risk contribution, and the expected tail loss (extreme loss)]
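One common way to formalize the two contribution measures (standard Euler-allocation notation; not necessarily the exact definitions behind the slide): with portfolio weights w, covariance matrix Σ, exposure losses L_i and portfolio loss L = Σ_i L_i,

```latex
\sigma_p = \sqrt{w^{\top}\Sigma\, w}, \qquad
RC_i = \frac{w_i\,(\Sigma w)_i}{\sigma_p}, \qquad
\sum_i RC_i = \sigma_p

TLC_i = \mathbb{E}\!\left[\, L_i \;\middle|\; L \ge \mathrm{VaR}_{\alpha}(L)\,\right], \qquad
\sum_i TLC_i = \mathrm{ETL}_{\alpha}(L)
```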

[Diagram: exposure-loss solution approaches ranging from simple to simulated to stratified, differentiated by volatility vs. ETL contribution and by multiple/flexible/dynamic/analytical treatment of individual exposures]

25 Black swans, gray rhinos, and perfect storms

• Defining extreme-downside scenario categories:
  – Black swans: Unknowable given the current information set and virtually impossible to predict
  – Gray rhinos: Highly probable and straightforwardly predictable given the current information set, but neglected
  – Perfect storms: Low probability and not straightforwardly predictable, given that the outcome results from the interaction of infrequent events
• Scenario-based analyses vs. forecasts
• Deeper analyses of underlying assumptions, relationships, and data
• More focus on tools/processes to manage multiple sets of scenarios and analyses across time
• Renewed efforts to enforce preproducibility, reproducibility, and out-of-sample testing
• Process management systems with robust audit logs are more important than ever

26 Challenges specific to financial market and insurance risk modeling

• Sources of non-stationarity/non-ergodicity
  – Structural changes in the real economy: 1. Digital ecosystems 2. Increased concentration within sectors 3. Information & communications technology
  – Structural changes in the financial economy: 1. Near-zero interest rates 2. Inflation 3. Globalization of capital markets
• Ergodic systems
  – Closed
  – Low dimensional
  – Not evolving, but can be cyclical
• Non-ergodic systems
  – Open
  – Generate new information all the time
  – Learn and adapt over time (e.g., Darwinian evolution)
  – Past data mostly not useful
  – Purely inductive models are mostly ineffective
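A small simulation of why non-ergodicity defeats purely inductive models (the multiplicative-wealth gamble is my illustrative example, not from the slide): the ensemble average grows every period, yet almost every individual time path shrinks, so cross-sectional averages of past data mislead about any single path's future.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 10_000, 100

# Multiplicative gamble: each period wealth is multiplied by 1.5 or 0.6 with
# equal probability. The ensemble (expected) growth factor is 1.05 per step,
# but the time-average growth factor is sqrt(1.5 * 0.6) ~= 0.95 per step.
factors = rng.choice([1.5, 0.6], size=(n_paths, n_steps))
terminal_wealth = factors.prod(axis=1)            # each path starts at wealth 1

time_avg_growth = np.exp(np.log(factors).mean())  # realized typical growth/step
print("expected (ensemble) growth per step :", 0.5 * 1.5 + 0.5 * 0.6)
print("realized time-average growth per step:", round(time_avg_growth, 3))
print("share of paths below starting wealth :", f"{(terminal_wealth < 1).mean():.1%}")
```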

It is far better to have absolutely no idea of where one is – and to know it – than to believe confidently that one is where one is not. -- Jean-Dominique Cassini, astronomer, 1770

27 Marrying qualitative and quantitative data

• Roughly 90% of available data are qualitative and unstructured, e.g., articles, blogs, e-mail, regulatory filings, slide presentations, social media, etc.
• Quantitative data may not reflect all forward-looking risks (e.g., Environmental, Social, Governance – ESG)
• Transforming qualitative data into indicators and combining them in some way (e.g., shading, weighted combinations, etc.) with quantitative data may be a path to improving existing models

28 Misusing Occam’s razor

William of Ockham’s actual quote (most likely): Numquam ponenda est pluralitas sine necessitate [Plurality must never be posited without necessity]

Simplicity in concept may belie complexity in reality – e.g., biological evolution, constructing an optimal portfolio, optimally allocating capital to insurance portfolio segments, etc.

29 Machine intelligence

30 How we define “artificial” and “intelligence” will influence research and development in machine intelligence

• Artificial: Human-made, contrived, not natural, not real
• Intelligence: Learn and apply knowledge or skills, solve problems, ability to reason/plan/adapt/respond/think abstractly
• Artificial intelligence: Ability to simulate human intelligence – is this all? Are these definitions adequate?

How we define machine intelligence materially influences the tools we create

“Viewed narrowly, there seem to be almost as many definitions of intelligence as there were experts asked to define it.” Sternberg

The future will be Human Intelligence augmented by some kind of Machine Intelligence

31 Machine intelligence taxonomy

Artificial intelligence: Mimic human intelligence and possibly go beyond
Artificial general intelligence: Possibly self-aware intelligence
Machine learning: Data-dependent calibration
Deep learning: Model-free, data-dependent calibration
Meta-learning: Learning how to learn
Cognitive computing: Simulate human-brain processes
Augmented intelligence: Human assistants
Expert systems: Advice systems using knowledge databases
Robotic process automation: Roboticized systems that replicate repetitive processes
Intelligent automation: Hybridized processes that use automation to better leverage human productivity

32 Six major categories for applying machine intelligence in business applications

• Machine Learning Platform (ML): Supervised learning (regression, classification, dimension reduction…); unsupervised learning (association rule learning, clustering…); reinforcement learning
• Deep Learning Platform (DL): Neural networks / convolutional neural networks; speech recognition / question answering
• Natural Language Processing (NLP): Content creation; natural language generation
• Internet of Things (IoT): Digital twin / AI modeling; peer-to-peer networks / edge computing
• Virtual Agents & Recognition Technology: Emotion recognition; image recognition
• Decision Management: Decision support systems; infotainment systems

33 Boltzmann machines

• Boltzmann machine: Stochastic recurrent neural network that can be seen as the stochastic, generative counterpart of Hopfield nets. Maximizes the log likelihood of given patterns exhibiting Hebbian engrams. Threshold is binary. Finds hidden relationship layers.
• Hopfield networks: Form of recurrent artificial neural network that is content-addressable with binary threshold nodes.
• Hebbian engram: Two cells (or systems of cells) that are repeatedly active at the same time tend to become “associated.”
• Dynamic Boltzmann machines (DyBM): Maximize the log likelihood of time series with the property of spike-timing-dependent plasticity (STDP), where the amount of synaptic strength change depends on when two neurons fired. Threshold is binary.
• Gaussian dynamic Boltzmann machines: Extend DyBM to deal with real values, capturing long-term dependencies (similar to VAR, which is a special case). Can add an RNN layer to induce hidden, high-dimensional non-linear relationship features.
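For reference, the standard energy-based formulation behind these models, written here for a restricted Boltzmann machine with visible units v, hidden units h, biases a, b and weights W (textbook form in the spirit of Ackley, Hinton & Sejnowski, 1985):

```latex
E(v, h) = -a^{\top} v - b^{\top} h - v^{\top} W h, \qquad
p(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad
Z = \sum_{v', h'} e^{-E(v', h')}

% Training maximizes the log likelihood of observed patterns v:
\log p(v) \;=\; \log \sum_{h} e^{-E(v, h)} \;-\; \log Z
```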

Ackley, David H., Geoffrey E. Hinton, and Terrence J. Sejnowski, 1985, “A learning algorithm for Boltzmann machines,” Cognitive Science, 9(1), 147-169.
Dasgupta, Sakyasingha and Takayuki Osogami, 2017, “Nonlinear dynamic Boltzmann machines for time-series prediction,” Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.
Hebb, Donald O., 1949, The Organization of Behavior, New York: Wiley & Sons.
Hopfield, John J., 1982, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, 79, 2554-2558.
Marks, T., and J. Movellan, 2001, “Diffusion networks, products of experts, and factor analysis,” Proceedings of the Third International Conference on Independent Component Analysis and Blind Source Separation.
Osogami, Takayuki and M. Otsuka, 2015, “Learning dynamic Boltzmann machines with spike-timing dependent plasticity,” Technical report RT0967, IBM Research.

34 “Big Picture” for machine intelligence– four schools of thought

Axes: sequential data (time series) vs. spatial data (cross-sectional); Bayesian (non-parametric) vs. non-Bayesian (parametric)

• Bayesian (non-parametric): Gaussian process family (GPK, ARGP, GPTS), Kalman filter (EKF, UKF), state space / dynamic Bayesian model, Bayesian linear regression / linear model
• Non-Bayesian (parametric): DyBM, G-DyBM, LSTM, RNN, CNN, neural networks, kernel regression, GARCH, ARCH, ARMA, AR, MA, linear time regression, linear regression, generalized linear model, linear smoothers

Connections shown: extend, influence, rival, related

35 Machine intelligence’s fragility

36 What is Meant by “Fragility of Machine Intelligence”?

Fragility of Machine Intelligence: When a machine intelligence-based solution fails to meet accuracy or performance criteria expected of the model’s application, the model outcome may adversely affect dependent systems. The unexpected output of the machine intelligence is a result of the model’s fragility, or inability to process data in relation to the expected performance criteria.

Sources of fragility
• Overlaying new systems on existing systems that are not likely to be seamlessly interoperable
• Mismatching systems/algorithms with particular use cases
• Interaction of humans and machines
• Poorly designed system architectures
• Poorly designed algorithms buried in a system architecture – algorithmic malpractice
• Overall exposure to cyber attacks

37 Major vulnerabilities exist across many modern machine intelligence methods

• “Blackbox” – Problems: difficulty in debugging and error tracking. Typical algorithms: neural networks.
• Computational cost – Problems: 1. lack of computing power; 2. high computational complexity (e.g., O(n^3)). Typical algorithms: neural networks, RNNs, search algorithms.
• Data – Problems: 1. biased/volatile/fat-tailed/arbitrary data; 2. requirement of well-labeled data; 3. insufficient data. Typical algorithms: neural networks, clustering, classification algorithms (RNN, SVM), assembly algorithms.
• Security – Problems: 1. cyber attacks; 2. privacy gets threatened. Typical algorithms: neural networks, IoT.
• Algorithm design – Problems: 1. impractical vocabulary bank in NLP; 2. complex hyperparameter selection; 3. not scalable; 4. cannot solve non-linear problems; 5. not interpretable; 6. fails to capture error in the learning process. Typical algorithms: neural networks, NLP, RNNs, Gaussian regression, Kalman filter, sorting algorithms, blockchain algorithms, robotics algorithms.
• Information loss – Problems: 1. memory loss in the training process; 2. feature loss in dimensionality reduction. Typical algorithms: neural networks, PCA, clustering algorithms.
• Regulation – Problems: lack of standardized regulations. Typical algorithms: IoT.

38 Impacts of Machine Intelligence Fragility

• Regulatory
• Impact to Peripheral Machine Intelligence Modules
• Impact to Transaction Flows
• Financial Impact to Business (resulting from Impact to Transaction Flows)
• Financial Impact to Business (as a “Residual Result”)
• Property Damage
• Human Injury or Loss of Life

39 Matching algorithm/system/process/ data to risk-related use cases

40 Matching techniques to objectives

Objectives/use cases:
• Identify trends (first-moment estimation, e.g., expected return)
• Identify short-term risk (second-moment estimation, e.g., volatility estimation)
• Identify long-term “risk” (higher moments, e.g., expected tail loss or expected shortfall from a skewed, non-normal [likely non-ergodic] distribution)

Techniques:
• OLS/factor modeling (explicit & implicit)
• Better regression techniques: random forests, lasso regression
• ARIMA/GARCH/Vector Auto Regression (VAR)
• Recursive Neural Networks (RNNs)
• Boltzmann machines: restricted, dynamic, and Gaussian dynamic Boltzmann machines
• LightGBM
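A hedged sketch of the first two pairings (the synthetic factor data and parameter choices are mine, not the deck's): a penalized regression for first-moment (trend) estimation, and an exponentially weighted estimate for second-moment (volatility) estimation in the spirit of GARCH-type conditional variance models.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n = 500

# Synthetic factor data: returns driven by two of five candidate factors + noise.
factors = rng.normal(size=(n, 5))
returns = 0.04 * factors[:, 0] - 0.02 * factors[:, 3] + rng.normal(0, 0.01, n)

# First moment (trend / expected return): lasso keeps only the relevant factors.
trend_model = LassoCV(cv=5).fit(factors, returns)
print("selected factor loadings:", np.round(trend_model.coef_, 3))

# Second moment (short-term risk): exponentially weighted variance recursion
# (decay lambda chosen ad hoc for illustration).
lam, var = 0.94, returns.var()
for r in returns:
    var = lam * var + (1 - lam) * r ** 2
print("latest annualized volatility estimate:", round(float(np.sqrt(252 * var)), 3))
```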

41 Modeling challenges

• Variable data availability & quality
• Changing underlying data generation processes
• Moving up the ladder of causation (Pearl, 2018, p. 28)
  – Level 1: Association (Seeing, observing)
  – Level 2: Intervention (Doing, intervening)
  – Level 3: Counterfactuals (Imagining, retrospection, understanding)
• Macroeconomic interaction
• Insurance market interaction (Game theory)
• Behavioral economics (Changing product design, marketing, distribution, strategy can change opportunities within a given segment)
• Digitizing society impact

42 How to persuade an individual to make a decision (Inspired by Koomey (2001) figure 19.1 p. 88)

• Agree on data + agree on objectives and criteria for acceptance → computational decision
• Agree on data + disagree on objectives and/or criteria for acceptance → negotiate
• Disagree on data + agree on objectives and criteria for acceptance → experiment
• Disagree on data + disagree on objectives and/or criteria for acceptance → paralysis or chaos

43 Ensemble modeling is the process by which multiple weaker systems are combined to create more complex systems

[Diagram: weaker systems stacked in layers 1–3, combined into a single output]

44 Example: highly complex datasets, such as the “donut problem”, can’t be learned without ensemble modeling (Source: XLP)

Linear model: When splitting the donut into binary classifications, a linear model would simply cut the data in the middle. This leads to a very high error rate of 50% and is no better than random guessing.

Non-linear model: A single, non-linear model would draw a curve that minimizes the error of binary classification. The error rate using a non-stacked model would still be extremely high.

45 Stacking to generate a more complex system: ensemble models learn from data with greater accuracy (Source: XLP)

[Diagram: first-level (weaker) algorithms each produce a prediction (y1, y2, y3) for classes A and B]

Ensemble models study the errors of weaker tests, creating more complex systems with higher predictive accuracy

[Diagram: a second-level algorithm combines y1, y2, y3 into a new prediction]
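A compact sketch of the stacking idea on a synthetic circles ("donut") dataset using scikit-learn (the specific first-level learners and parameters are illustrative choices, not the deck's):

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Two concentric classes: a single linear cut is no better than guessing.
X, y = make_circles(n_samples=2000, noise=0.1, factor=0.4, random_state=0)

linear = LogisticRegression()

# First-level ("weaker") learners feed their predictions into a second-level
# learner that studies where they disagree or err.
stack = StackingClassifier(
    estimators=[
        ("shallow_tree", DecisionTreeClassifier(max_depth=2, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=15)),
        ("logit", LogisticRegression()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

for name, model in [("linear only", linear), ("stacked ensemble", stack)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: accuracy ~ {acc:.2f}")
```

On this kind of data the linear model sits near 50% accuracy while the stacked ensemble is far higher, which is the point the two diagrams above illustrate.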

46 Weaknesses of Existing ML/AI Tools: the Gradient Boosting algorithm has high prediction accuracy using the original VIX data

[Chart: LightGBM predictions based on historical VIX data, 1990–2020; predicted and actual values almost align, as MSE = 0.00000106]

47 Weaknesses of Existing ML/AI Tools: however, a minor jump in the VIX can significantly alter the accuracy of the model predictions

[Chart: LightGBM predictions based on altered VIX data, 1990–2020; relatively larger residuals are seen, as MSE = 9.229]

48 Weaknesses of Existing ML/AI Tools: Gradient Boosting fails when data is limited, as the algorithm merely tries to find a similar pattern in past data

[Chart: predictions by linear regression vs. LightGBM on 13 data points in total, 2005–2021]

A vanilla linear regression manages to capture the upward trend. The booster acts as a classification tree here: when new data arrive, it can only find a similar pattern in the historical data and assign the same value. In this case, the booster believes the data points in the test data are similar to the last data point in the training data, so the predicted values always equal the last value in the historical line, failing to capture the overall trend.
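The limited-data extrapolation failure described above is easy to reproduce (a sketch assuming the lightgbm package is installed; the toy trend and parameters are mine): a tree-based booster can only return values seen in training, so it flat-lines beyond the training range, while a plain linear regression follows the trend.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.linear_model import LinearRegression

# A simple upward trend: 13 training points, then a forecast window.
t_train = np.arange(13).reshape(-1, 1)
y_train = 0.5 * t_train.ravel() + 1.0           # y grows linearly with t
t_test = np.arange(13, 21).reshape(-1, 1)

# min_child_samples relaxed so the booster can split on so few points.
booster = LGBMRegressor(n_estimators=200, min_child_samples=1).fit(t_train, y_train)
linear = LinearRegression().fit(t_train, y_train)

print("booster forecast:", np.round(booster.predict(t_test), 2))  # roughly flat near last training values
print("linear  forecast:", np.round(linear.predict(t_test), 2))   # continues the trend
```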

49 Weaknesses of Existing ML/AI Tools: Gradient Boosting fails when a proper window size is not selected for a cyclical time series

[Chart: predictions by LightGBM with different window sizes on a 3,999-row cyclical dataset with 3 data points per cycle. Legend: LightGBM with a 3-day back window; LightGBM with a 4-day back window; historical data]

50 Final remarks

51 Defining a research agenda at the intersection of machine intelligence and digitizing societies

• Focus more on…
  – Addressing the challenge of data curation
  – Communicating actionable insight
  – Testing algorithms on real, useful data at scale
  – Discussing how machine intelligence should integrate into a digitizing society (define algorithmic malpractice; shape the regulatory environment regarding data & machine intelligence)
• Focus less on…
  – Developing more sophisticated algorithms without a specific applied use case
  – Testing algorithms on toy datasets

52 Challenges

• Collecting & curating suitable & sufficient data
• Matching the type of machine intelligence with an objective or use case
• Making algorithms interpretable and diagnosable
• Defining what constitutes algorithmic malpractice in the machine intelligence arena
• Dealing with data sparsity and non-stationary data-generating processes
• Architecting and complying with data privacy regulations
• Addressing conflicts arising from human notions of law, fairness, and justice and machine-intelligence capabilities that can circumvent protections
• Addressing system fragility as interconnected and complex networks are infused with machine intelligence
• Addressing growing cyber-risk in digital ecosystems infused with machine intelligence

53 Inspiration from biology (Theodosius Dobzhansky)

Nothing in a digitizing society makes sense except in the light of the evolution of machine intelligence

55 Legal notice

©2018 Swiss Re. All rights reserved. You are not permitted to create any modifications or derivative works of this presentation or to use it for commercial or other public purposes without the prior written permission of Swiss Re.

The information and opinions contained in the presentation are provided as at the date of the presentation and are subject to change without notice. Although the information used was taken from reliable sources, Swiss Re does not accept any responsibility for the accuracy or comprehensiveness of the details given. All liability for the accuracy and completeness thereof or for any damage or loss resulting from the use of the information contained in this presentation is expressly excluded. Under no circumstances shall Swiss Re or its Group companies be liable for any financial or consequential loss relating to this presentation.
