The Diagram for Uncertainty Visualization

y V Carlos D. Correa and Peter Lindstrom p I a n r o f ia tr o n r t E m io a n t o io f Center for Applied Scientific Computing ‐ CASC n Lawrence Livermore National Lab acos(NMI)

This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE‐AC52‐07NA27344.

Visualization should help compare models with observations

ncar‐ccsm3 cnrm‐cm3 bccr‐bcm2 bccr‐bcm2

290 Average annual temperature 1900‐2000 as 289 predicted by various climate models.

288 Which model is more similar to a reference 287 model or observations? Temperature 286 Obs. Trend plots often do not expose these 285 aspects. 1900 1920 1940 1960 1980 2000 Year Visualization should help find correlations of similar outputs – important for uncertainty quantification Seasons Zones Annual Dec‐Jan‐Feb Jun‐Jul‐Aug Northern High Lats Northern Med. Lats Northern Low Lats Southern Low Lats Southern Med. Lats Southern High Lats

Divide ensembles in 6 latitude zones and 3 temporal averages •Are there correlations across seasons or latitudes? •Are there large discrepancies in the different outputs? Visualization should help find correlations of similar outputs – important for uncertainty quantification Seasons Zones Annual Dec‐Jan‐Feb Jun‐Jul‐Aug Northern High Lats Northern Med. Lats Northern Low Lats Southern Low Lats Find correlation Southern Med. Lats Find correlation betweenbetween (6+1)x3(6+1)x3 == Southern High Lats 2121 variabvariablesles

AA scattescatterplotrplot matrixmatrix bebeccoomemess impractiimpracticcalalforfor manymany outputsoutputs

Visualization should help find correlations of similar outputs – important for uncertainty quantification Seasons Zones Annual Dec‐Jan‐Feb Jun‐Jul‐Aug Northern High Lats Northern Med. Lats Northern Low Lats Southern Low Lats Southern Med. Lats Southern High Lats AA parallelparallel coocoorrdinatedinate vviisuasualizalizattioionn isis moremore practicalpractical ButBut onlyonly certaincertain papairwirwiisese comparisonscomparisonsareare popossiblessible

Visual Summaries Upper percentile • Represent directly summary quantities, e.g.,

Upper quartile mean, standard deviation, entropy. median Lower quartile Lower percentile • Box‐plots and their many variants

• One plot per ensemble may result in clutter

• Visualizing several statistics simultaneously in a metric space: Taylor diagram

Potter et al. (Eurovis’2010) The Taylor Diagram

Simultaneously plots Standard deviation, Root Mean Square Error and Correlation R. Applications of the Taylor Diagram

Taylor, 2005 (Taylor Diagram Primer) Anscombe’s Trio Variables B,C,D: same standard deviation and same correlation w.r.t. A

noise non‐linear outlier

InformationInformation TheoryTheory toto thethe Rescue!Rescue!

Information Theory Primer • Entropy H(X) – Measure of information H(X) H(Y) uncertainty of X • H(X,Y) – Uncertainty of X,Y H(X|Y) I(X;Y) H(Y|X) • H(X|Y) – Uncertainty of X given that I know Y • Mutual Information I(X;Y) H(X,Y) – How much knowing X reduces the uncertainty of Y Variation of Information VI=H(X | Y) + H(Y | X) XX andand YY areare thethe VI=0 samesame

H(X,Y) = H(X) = H(Y) = I(X;Y)

VI

XX andand YY areare VI=H(X;Y) independentindependent

H(X,Y) = H(X) + H(Y) I(X;Y) = 0 The Variation of Information VI: a measure of distance in The Variation of Information VI: a measure of distance in information theory

Normalized Mutual Information (NMI)

RVI Diagram RVIEquivalences Diagram Statistics Information Theory

VI Diagram Experiment of 2D distributions with outliers Beta (clean) Beta (outliers)

2 m D a h r is g to add outliers to gr is a h m D 2

uniform binomial MI diagram is more resilient to outliers

OutliersOutliers havehave aa significantsignificant TheThe informationinformation inin bothboth thethe impactimpact onon correlationcorrelation “clean”“clean”andand “dirty”“dirty”distributionsdistributions isis essentiallyessentially thethe samesame

Computing Entropy and Mutual Information may require estimation of underlying probability functions

N(sx=1.0, sy=0.5, R=0.99} N(sx=1.0, sy=1.5, R=0.99} AlthoughAlthough therethere areare differences,differences, N(sx=1.0, sy=0.5, R=0.95} relativerelativedistancesdistances areare consistentconsistentforfor N(sx=1.0, sy=1.5, R=0.95} eacheach choicechoice ofof kernelkernel N(sx=1.0, sy=0.5, R=0.90} N(sx=1.0, sy=1.5, R=0.90} Uncertainty Quantification in Climate Simulations Precipitation average 7 Zonal averages (color) 3 Temporal averages (shape) 3 different ensemble sets (size) 290

Intercomparison Studies 289 Annual mean temperature 1900‐2000 288 287 Temperature

286

285 1900 1920 1940 1960 1980 2000 Year 290

Intercomparison Studies 289 Annual mean temperature 1900‐2000 288 287 Temperature

286

285 1900 1920 1940 1960 1980 2000 Year 290

Intercomparison Studies 289 Annual mean temperature 1900‐2000 288 287 Temperature

286

285 1900 1920 1940 1960 1980 2000 Year MID applies to discrete data: useful when comparing Clustering Results

• Summarize study in clustering [Filippone et al. 2009] • 8 different methods • 4 classification problems Concluding Remarks • Taylor diagram: – easy to compute. – Well understood in geophysical sciences, climate.

• MI diagram: – Counterpart using information theory. – requires an estimation step that may introduce additional uncertainties. – extends nicely to categorical data, multi‐variate distributions. – exposes non‐linearities, difficult to see via (linear) correlation.

• More informed decisions when combining both diagrams. Thanks!

Questions?