EULERIAN TOUR ALGORITHMS FOR DATA VISUALIZATION AND THE PAIRVIZ PACKAGE
Catherine Hurley R.W. Oldford NUI Maynooth U. Waterloo
July 8 2009 UseR!
Monday 13 July 2009 Graphics: Effect Ordering
• Packages: seriation, gclus, corrgram
• Example: PCP Flea data Standard order Correlation order
Tars2 Aede1 Aede2 Aede3 Tars1 Head Tars1 Tars2 Aede1 Aede2 Head Aede3 0.2 -0.6
Monday 13 July 2009 Pairviz: relationship ordering
• Statistical graphics are about comparisons
between variables, cases, groups, models
Flea data: correlation order
Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head 0.6 0.0 -0.6
Monday 13 July 2009 A graph model • Build a graph where nodes are statistical objects
C B • Edges are relationships
D A • Example: Node Vis Edge Vis E F Group Boxplot Two groups CI for mean diff Var Hist Two vars Scatterplot 2 vars Scat 4-d space Dynamic scat Model Resid 2 Models PCP
Monday 13 July 2009 Example: planned comparisons
Mice in 5 diet groups, response is lifetime Nodes are treatments, edges are planned comparisons Weights are p-values
Planned comparisons of diets 10 lopro 50 40 0.0147 5
30 0 Lifetime
NP 0 N/N85 0 N/R50 0.3111 R/R50 20 Differences -5
0.0083 10
N/R50 N/N85 NP lopro N/R50 N/R40 R/R50 N/R50
N/R40 Reducing calories and protein increases lifetime
Monday 13 July 2009 Graph Traversal
• Traverse all nodes: hamiltonian path
C C D B D B
E A E A
C
F H F H B G G D
A Open hamiltonian path Closed hamiltonian path E
G F
Closed eulerian path on K7 • Traverse all edges: eulerian path
• Use gclus, seriation: hamiltonian paths on complete graphs
• PairViz: eulerian paths
Monday 13 July 2009 Graph Structures
• Complete graph: all
comparisons are interesting Aede3 Aede2 Aede1 Tars2 Aede2 Aede1 Aede3 Tars2 Tars1 Head Tars2 Tars1 Aede1 Head Aede2 Tars1 Aede3 Head 0.6 0.0 • Edge-weighted graphs: low -0.6 weight edges are more Weight edges by 1-corr, eulerian interesting follows low weight edges
• Bipartite graph X1 eg only treatment-control Y1 X2 comparisons are of interest Y2
X3
Monday 13 July 2009 Graph Structures- cont’d
•Hypercube graph •Line graph B
Cube for factorial experiment transform G C A 110 111
D
010 011
AD AC 100 101 to L(G) BC AB
000 001
BD CD or model selection: eg Each node in G is a var, Each node in G is a predictor subset each node in L(G) is var pair, edge: add/drop predictor edge is 3-d transition
Monday 13 July 2009 Algorithms- Complete graph
• Closed eulerian path exists when each node has odd number of vertices: ie for K2n+1
• Hamiltonian decomposition of graph
• into hamiltonian cycles: eulerian for K2n+1
3 3 3 2 2 2
4 4 4
1 1 1
5 5 5
7 7 7 6 6 6
• into hamiltonian paths: approx eulerian for K2n
• classical algorithm: hpaths
• WHam: weighted_hpaths: pick best for H1, best orientaton and order for others.
Monday 13 July 2009 Algorithms-Complete graph cont’d
• Recursive algorithm: eseq:
• Start with eulerian on Kn, append edges to get eulerian on Kn+2
1 2 3 4 5
6 7
Monday 13 July 2009 Algorithms- general
• Eulerian graph: connected, all nodes have even number of edges
lopro • Otherwise, add edges, pairing up odd nodes 0.0147 Chinese postman does this in optimal way NP 0 N/N85 0 N/R50 0.3111 R/R50
0.0083
• Classical algorithm (Hierholzer, Fleury) N/R40
• Our version GrEul, (etour) follows weight increasing edges
Monday 13 July 2009 Algorithms comparison
Complete-no weights
Etour 9 Eseq 9 hpaths 9 8 8 8
6 6 6 4 4 4 2 2 2
0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 0 5 10 15 20 25 30 35 prefers low vertices prefers low edges 4 hamiltonians
Monday 13 July 2009 Algorithms: complete, weighted
Eurodist: 21 European cities
Algorithm eseq: Eurodist edge weights Weighted etour on Eurodist Weighted hamiltonians on Eurodist 4000 4000 4000 3000 3000 3000 2000 2000 2000 1000 1000 1000 0 0 0 0 50 100 150 200 0 50 100 150 200 0 50 100 150 200 ignores weights Starts in Geneva
1 2 3 4 5 6 7 8 9 10 hamiltonian decomp, with increasing path lengths
Monday 13 July 2009 Example: model selection
Mammal sleep data Y= log brain wt. Predictors A= non dreaming sleep, B=dreaming sleep, C=log body wt, D=life span
ABCD •Hypercube graph represents possible moves in a stepwise regression algorithm ABC ABD ACD BCD •Graph Qn is hamiltonian, and eulerian for even n •Edge weights: change in SSE
AB AC AD BC BD CD
A B C D
0 Sleep data: Model residuals.
•Eulerian starting with full model
•All models with C are good ABCD BCD CD ACD ABCD ABC BC C AC ABC AB A AD ABD BD D AD ACD AC A 0 D CD C 0 B BD BCD BC B AB ABD ABCD •Bar chart: change in SSE
Monday 13 July 2009 More variables
Sleep data: 10 vars (nodes) 45 edges Eulerian has length 50
Eulerian on scagnostics: Outlying
GP Bd L Br Bd SW PS TS SE PS TS D L P L PS Br P TS Bd TS PS P D D Br P D 0.6 0.3 0.0
Using outlying index from scagnostics package for eulerian traversal zoom on first half of display
Monday 13 July 2009 More variables-cont’d
Reduce the graph NN graph: eliminate edges with outlier index < .2
Bd NN Eulerian on scagnostics: Outlying Br
SW
L
GP L Bd SW L Br GP GP 0.6 0.3 0.0
Reduces graph from 10 to 5 nodes, and 45 to 5 edges Other nodes have no edges
Monday 13 July 2009 IN CONCLUSION..
• Pairviz package: relationship ordering for data visualisation
• Current version: algorithms presented here
• Thanks to graph, igraph
• Work in progress: ordering dynamic visualisations via ggobi.
with Adrian Waddell, UW
Monday 13 July 2009