<<

EULERIAN TOUR ALGORITHMS FOR DATA VISUALIZATION AND THE PAIRVIZ PACKAGE

Catherine Hurley R.W. Oldford NUI Maynooth U. Waterloo

July 8 2009 UseR!

Monday 13 July 2009 Graphics: Effect Ordering

• Packages: seriation, gclus, corrgram

• Example: PCP Flea data Standard order Correlation order

Tars2 Aede1 Aede2 Aede3 Tars1 Head Tars1 Tars2 Aede1 Aede2 Head Aede3 0.2 -0.6

Monday 13 July 2009 Pairviz: relationship ordering

• Statistical graphics are about comparisons

between variables, cases, groups, models

Flea data: correlation order

Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head 0.6 0.0 -0.6

Monday 13 July 2009 A graph model • Build a graph where nodes are statistical objects

C B • Edges are relationships

D A • Example: Node Vis Edge Vis E F Group Boxplot Two groups CI for mean diff Var Hist Two vars Scatterplot 2 vars Scat 4-d space Dynamic scat Model Resid 2 Models PCP

Monday 13 July 2009 Example: planned comparisons

Mice in 5 diet groups, response is lifetime Nodes are treatments, edges are planned comparisons Weights are p-values

Planned comparisons of diets 10 lopro 50 40 0.0147 5

30 0 Lifetime

NP 0 N/N85 0 N/R50 0.3111 R/R50 20 Differences -5

0.0083 10

N/R50 N/N85 NP lopro N/R50 N/R40 R/R50 N/R50

N/R40 Reducing calories and protein increases lifetime

Monday 13 July 2009 Graph Traversal

• Traverse all nodes: hamiltonian

C C D B D B

E A E A

C

F H F H B G G D

A Open Closed hamiltonian path E

G F

Closed eulerian path on K7 • Traverse all edges: eulerian path

• Use gclus, seriation: hamiltonian paths on complete graphs

• PairViz: eulerian paths

Monday 13 July 2009 Graph Structures

: all

comparisons are interesting Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head 0.6 0.0 • Edge-weighted graphs: low -0.6 weight edges are more Weight edges by 1-corr, eulerian interesting follows low weight edges

• Bipartite graph X1 eg only treatment-control Y1 X2 comparisons are of interest Y2

X3

Monday 13 July 2009 Graph Structures- cont’d

•Hypercube graph • B

Cube for factorial experiment transform G C A 110 111

D

010 011

AD AC 100 101 to L(G) BC AB

000 001

BD CD or model selection: eg Each node in G is a var, Each node in G is a predictor subset each node in L(G) is var pair, edge: add/drop predictor edge is 3-d transition

Monday 13 July 2009 Algorithms- Complete graph

• Closed eulerian path exists when each node has odd number of vertices: ie for K2n+1

• Hamiltonian decomposition of graph

• into hamiltonian cycles: eulerian for K2n+1

3 3 3 2 2 2

4 4 4

1 1 1

5 5 5

7 7 7 6 6 6

• into hamiltonian paths: approx eulerian for K2n

• classical algorithm: hpaths

• WHam: weighted_hpaths: pick best for H1, best orientaton and order for others.

Monday 13 July 2009 Algorithms-Complete graph cont’d

• Recursive algorithm: eseq:

• Start with eulerian on Kn, append edges to get eulerian on Kn+2

1 2 3 4 5

6 7

Monday 13 July 2009 Algorithms- general

• Eulerian graph: connected, all nodes have even number of edges

lopro • Otherwise, add edges, pairing up odd nodes 0.0147 Chinese postman does this in optimal way NP 0 N/N85 0 N/R50 0.3111 R/R50

0.0083

• Classical algorithm (Hierholzer, Fleury) N/R40

• Our version GrEul, (etour) follows weight increasing edges

Monday 13 July 2009 Algorithms comparison

Complete-no weights

Etour 9 Eseq 9 hpaths 9 8 8 8

6 6 6 4 4 4 2 2 2

0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 prefers low vertices prefers low edges 4 hamiltonians

Monday 13 July 2009 Algorithms: complete, weighted

Eurodist: 21 European cities

Algorithm eseq: Eurodist edge weights Weighted etour on Eurodist Weighted hamiltonians on Eurodist 4000 4000 4000 3000 3000 3000 2000 2000 2000 1000 1000 1000 0 0 0 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 ignores weights Starts in Geneva

1 2 3 4 5 6 7 8 9 10 hamiltonian decomp, with increasing path lengths

Monday 13 July 2009 Example: model selection

Mammal sleep data Y= log brain wt. Predictors A= non dreaming sleep, B=dreaming sleep, C=log body wt, D=life span

ABCD •Hypercube graph represents possible moves in a stepwise regression algorithm ABC ABD ACD BCD •Graph Qn is hamiltonian, and eulerian for even n •Edge weights: change in SSE

AB AC AD BC BD CD

A B C D

0 Sleep data: Model residuals.

•Eulerian starting with full model

•All models with C are good ABCD BCD CD ACD ABCD ABC BC C AC ABC AB A AD ABD BD D AD ACD AC A 0 D CD C 0 B BD BCD BC B AB ABD ABCD •Bar chart: change in SSE

Monday 13 July 2009 More variables

Sleep data: 10 vars (nodes) 45 edges Eulerian has length 50

Eulerian on scagnostics: Outlying

GP Bd L Br Bd SW PS TS SE PS TS D L P L PS Br P TS Bd TS PS P D D Br P D 0.6 0.3 0.0

Using outlying index from scagnostics package for eulerian traversal zoom on first half of display

Monday 13 July 2009 More variables-cont’d

Reduce the graph NN graph: eliminate edges with outlier index < .2

Bd NN Eulerian on scagnostics: Outlying Br

SW

L

GP L Bd SW L Br GP GP 0.6 0.3 0.0

Reduces graph from 10 to 5 nodes, and 45 to 5 edges Other nodes have no edges

Monday 13 July 2009 IN CONCLUSION..

• Pairviz package: relationship ordering for data visualisation

• Current version: algorithms presented here

• Thanks to graph, igraph

• Work in progress: ordering dynamic visualisations via ggobi.

with Adrian Waddell, UW

Monday 13 July 2009