Informa on Visualiza on Where Design Meets Engineering
Nan Cao TongJi College of Design and Innova on Intelligent Big Data Visualiza on Lab h p://nancao.org
Data-Driven Social Media Management, Marke ng Breaking Bin Laden “The best way to capture the imagina on is to speak to the eyes.”
-- William Playfair Nan Cao (曹楠) h p://nancao.org
A Big Data Visualiza on Researcher A Professor of TongJi College of Design and Innova on Intelligent Big Data Vis Lab
Healthcare informa cs
Urban Compu ng
Behavior Analysis Outlier Detec on Informa on Security Informa on Visualiza on
The computer supported techniques that transform data into interac ve visual representa ons William Playfair (Sep. 22nd 1759 – Feb.11th 1823),a Sco sh engineer and Sta s cal Charts – 2.5 Centuries Ago poli cal economist, the founder of graphical methods of sta s cs. The process of Napoleon’s envision of Moscow
Charles Joseph Minard (27 March 1781 – 24 October 1870) was a French civil engineer recognized for his significant contribu on in the field of informa on Visual Story Telling – Two Centuries ago graphics in civil engineering and sta s cs. Minard was among other things noted for his representa on of numerical data on geographic maps. O o Neurath (December 10, 1882 – December 22, 1945) was an Austrian philosopher of science, sociologist, and poli cal economist.
Isotype (picture language) – One century ago Diamonds Were a Girl’s Best Friend.
Nigel Holmes (born 15 June 1942, Swanland, England) is a Bri sh/American graphic designer, author, and theorist, who focuses on informa on graphics and informa on design.
Infographics – 60 years ago Diamonds Were a Girl’s Best Friend.
Edward Tu e (1942-) is a sta s cian and ar st, and Professor Emeritus of Poli cal Science, Sta s cs, and Computer Science at Yale University.
Chartjunck – 1983 Ben Shneiderman (born August 21, 1947) is an American computer scien st, a Dis nguished Professor in CS department of the University of Maryland, and the founding director (1983-2000) of Human-Computer Interac on Lab.
TreeMap - 1992 A start point of modern informa on visualiza on Daniel Keim, a German computer scien st and full professor (Chair of Informa on Processing) in the Computer Science department at the University of Konstanz, Germany.
Visual Analysis Reference Model – 2006 1759-1882 1781-1870 1882-1945 1947- Graphic designer 1942-
Engineer Engineer philosopher Computer Scien st Computer Scien st of science
Sta s cian & Ar st This field is majorly driven by engineering and science since the very beginning Nowadays various visualiza on techniques have been developed to solve problems that require humans’ opinions and judgments for decision making
www.visualcomplexity.com Big Data
25 million miles of airline traffic in 2 minutes
Large Data Visualiza on
14.8 million tweets 25 million miles of airline traffic 500 million users
• Challenging Task : – Squeezing millions and even billions of records into million pixels (1600 X 1200 ≈ 2 million pixels)
26 Key Challenges
Visual clu er Performance issues Limited cogni on
How can we avoid visual How can we render the How can users understand clu ers like overlaps huge datasets in real me the visual representa on and crossings? with rich interac ons? when the informa on is overwhelming? 27 Reference Model
Raw Data Abstract Data Visual Form View filtering rendering User
Data Encoding and Display Mining Layout interac ons
Reference Model For Informa on Visualiza on and Visual Analysis
28 Case Study: Applica ons in the Field of Informa on Security InfoSec in Our Daily Life
Protect the low level communica on of signals
金融欺诈的检测与分析 金融欺诈的检测与分析金融欺诈的检测与分析 金融欺诈的检测与分析金融欺诈的检测与分析 Behavior Security The Growth of Social Media
http://www.jeffbullas.com/2014/01/17/20-social-media-facts-and-statistics-you-should-know-in-2014/ Do you trust your friends in social media ?
(ACM Conference on Online Social Networks, 2014) An adage since 1993 Anonymous users are poten al threats to the society Research Goal
Finding users with anomalous behaviors on social media (e.g. Twi er) Anomaly Detec on
Anomaly detec on (or outlier detec on) is the iden fica on of items, events or observa ons which do not conform to an expected pa ern or other items in a dataset -- Wikipedia Key Challenges
• It is difficult to define what is normal or abnormal
• Unavailable of ground truth or labelled data making results valida on difficult Informa on visualiza on can be a good aid to these challenges System Overview Live Monitoring
Preprocessing Analysis Visualiza ons
Sen ment Data Filtering Analysis Feature Extrac on Anomaly Detec on Data Store Hadoop Indexing Clustering
query Index Live Twi er Data Whisper (2012)
"Whisper: Tracing the Spa otemporal Process of Informa on Diffusion in Real Time", IEEE Transac ons on Visualiza on and Computer Graphics. TVCG, InfoVis 2012. SocialHelix (2013)
"SocialHelix: Visual Analysis of Sen ment Divergence in Social Media", Journal of Visualiza on, May 2015, Volume 18, Issue 2, pp 221-235 FluxFlow (2014)
#FluxFlow: Visual Analysis of Anomalous Informa on Spreading on Social Media, TVCG, IEEE VAST 2014 (Honorable Men on) Episogram (2015)
"Episogram: Visual Summariza on of Egocentric Social Interac ons” IEEE Computer Graphics and Applica ons TargetVue (2016)
TargetVue: Visual Analysis of Anomalous User Behaviors in Online Communica on Systems, IEEE TVCG 2016 User Behaviors
• Pos ng (Twee ng) – Create a message and post it to others
• Responding (Replying / Retwee ng) – Spread the messages posted by others
Capturing User Behaviors via Features
Sta s c Features # of Pos ng / Retwee ng Content Features Sen ments, Keywords Temporal Features Frequency of Posi ng Network Features Centrality of the user User User Profile Features # of followers T1 T2 T3 T4 feature vector me-series Anomaly Detec on
TLOF: Temporal Local Outlier Factor
The TLOF gives an anomaly measurement for every users by iden fying the features that are significantly different from other users in the test data and the past history of his own
Breunig, Markus M., et al. "LOF: iden fying density-based local outliers." ACM sigMOD record. Vol. 29. No. 2. ACM, 2000. Introduc on Visualiza on Design Evalua on User Interface The Inspec on View
Behaviors Features Social Interac ons Showing Pos ng and Retwee ng Ac vi es
Ac vity Thread
Time Raw Tweet Users who retweet the tweet at different me Showing Pos ng and Retwee ng Ac vi es
A user is visualized as circle sized by their importance and colored by their anomaly score 51 Showing Pos ng and Retwee ng Ac vi es
End Time Start Time
The centric user’s ac vity history is recorded in a circular meline 52 Showing Pos ng and Retwee ng Ac vi es
End Time Start Time
the ac vity’s t1 Ac vity Threads t2 Color: sen ments ini ate me Thickness: # of involving users t3
53 Showing Pos ng and Retwee ng Ac vi es
End Time Start Time
t1 t2
t3
54 Showing Pos ng and Retwee ng Ac vi es
End Time Start Time
t1 t2
t3
55 Who is the normal user? Context View 1: Showing Ac vity Dynamics