Information Visualization: Where Design Meets Engineering

Nan Cao
TongJi College of Design and Innovation
Intelligent Big Data Visualization Lab
http://nancao.org

Data-Driven Social Media Management, Marketing

Breaking Bin Laden

“The best way to capture the imagination is to speak to the eyes.”

-- Nan Cao (曹楠), http://nancao.org

A Big Data Visualization Researcher
A Professor at TongJi College of Design and Innovation
Intelligent Big Data Vis Lab

Healthcare Informatics

Urban Computing

Behavior Analysis, Outlier Detection, Information Security

Information Visualization

The computer-supported techniques that transform data into interactive visual representations.

Statistical Graphics – 2.5 Centuries Ago

William Playfair (Sep. 22nd 1759 – Feb. 11th 1823), a Scottish engineer and political economist, the founder of graphical methods of statistics.

The progress of Napoleon's march on Moscow

Visual Story Telling – Two Centuries Ago

Charles Joseph Minard (27 March 1781 – 24 October 1870) was a French civil engineer recognized for his significant contribution in the field of information graphics in civil engineering and statistics. Minard was, among other things, noted for his representation of numerical data on geographic maps.

Isotype (picture language) – One Century Ago

Otto Neurath (December 10, 1882 – December 22, 1945) was an Austrian philosopher of science, sociologist, and political economist.

Infographics – 60 Years Ago

Nigel Holmes (born 15 June 1942, Swanland, England) is a British/American graphic designer, author, and theorist, who focuses on information graphics and information design.

Diamonds Were a Girl’s Best Friend.

Chartjunk – 1983

Edward Tufte (1942-) is a statistician and artist, and Professor Emeritus of Political Science, Statistics, and Computer Science at Yale University.

TreeMap – 1992

A starting point of modern information visualization.

Ben Shneiderman (born August 21, 1947) is an American computer scientist, a Distinguished Professor in the CS department of the University of Maryland, and the founding director (1983-2000) of the Human-Computer Interaction Lab.

Daniel Keim is a German computer scientist and full professor (Chair of Information Processing) in the Computer Science department at the University of Konstanz, Germany.

Visual Analysis Reference Model – 2006

Timeline:
• 1759-1823: William Playfair, engineer
• 1781-1870: Charles Joseph Minard, engineer
• 1882-1945: Otto Neurath, philosopher of science
• 1942-: Nigel Holmes, graphic designer
• 1942-: Edward Tufte, statistician & artist
• 1947-: Ben Shneiderman, computer scientist
• Daniel Keim, computer scientist

This field has been driven mainly by engineering and science since the very beginning.

Nowadays, various visualization techniques have been developed to solve problems that require human opinions and judgments for decision making.

www.visualcomplexity.com

Big Data

25 million miles of airline traffic in 2 minutes

Large Data Visualization

14.8 million tweets, 25 million miles of airline traffic, 500 million users

• Challenging task:
  – Squeezing millions or even billions of records into roughly two million pixels (1600 × 1200 ≈ 2 million pixels)
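The arithmetic behind this challenge can be sketched in a few lines. The snippet below is only an illustration: it computes how many records must share one pixel on the 1600 × 1200 screen quoted above, and shows one common way to cope (aggregating records into per-pixel counts). The function names `records_per_pixel` and `bin_to_pixels` are hypothetical, and the slides do not prescribe this particular technique.

```python
from collections import Counter

WIDTH, HEIGHT = 1600, 1200  # screen size quoted on the slide


def records_per_pixel(n_records, width=WIDTH, height=HEIGHT):
    """Average number of records that must share one pixel."""
    return n_records / (width * height)


def bin_to_pixels(points, width=WIDTH, height=HEIGHT):
    """Count how many (x, y) points, with x and y in [0, 1), land on each pixel."""
    counts = Counter()
    for x, y in points:
        px = min(int(x * width), width - 1)
        py = min(int(y * height), height - 1)
        counts[(px, py)] += 1
    return counts


print(WIDTH * HEIGHT)                 # 1920000 pixels in total
print(records_per_pixel(14_800_000))  # roughly 7.7 tweets per pixel
```

Drawing the per-pixel counts (for example as a density map) instead of the individual records keeps the rendering cost bounded by the number of pixels rather than the number of records.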

Key Challenges

• Visual clutter: How can we avoid visual clutter such as overlaps and crossings?
• Performance issues: How can we render huge datasets in real time with rich interactions?
• Limited cognition: How can users understand the visual representation when the information is overwhelming?

Reference Model

[Figure: Reference Model for Information Visualization and Visual Analysis. Stages: Raw Data, Abstract Data, Visual Form, View, User. Transformations: data mining, encoding and layout, filtering, rendering, display, interactions.]
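One way to read the reference model is as a chain of small transformations. The sketch below is a toy illustration under stated assumptions: the stage order (raw data, abstract data, visual form, view) is assumed, and the function names `mine`, `encode`, and `render`, the `topic` field, and the text-bar output are all hypothetical rather than taken from the slides.

```python
from collections import Counter


def mine(raw_records):
    """Raw data -> abstract data: count records per topic."""
    return Counter(record["topic"] for record in raw_records)


def encode(abstract):
    """Abstract data -> visual form: one bar mark per topic, sorted by label."""
    return [{"label": topic, "length": count} for topic, count in sorted(abstract.items())]


def render(marks, min_length=1):
    """Visual form -> view: filter the marks, then draw each one as a text bar."""
    return [m["label"].ljust(8) + "#" * m["length"] for m in marks if m["length"] >= min_length]


raw = [{"topic": "health"}, {"topic": "urban"}, {"topic": "health"}]
view = render(encode(mine(raw)), min_length=1)
print("\n".join(view))
```

In this reading, a user interaction simply changes a parameter of one stage (here `min_length`) and re-runs the stages after it.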

Case Study: Applications in the Field of Information Security

InfoSec in Our Daily Life

Protect the low-level communication of signals

Detection and analysis of financial fraud

Behavior Security

The Growth of Social Media

http://www.jeffbullas.com/2014/01/17/20-social-media-facts-and-statistics-you-should-know-in-2014/

Do you trust your friends on social media?

(ACM Conference on Online Social Networks, 2014)

An adage since 1993

Anonymous users are potential threats to society

Research Goal

Finding users with anomalous behaviors on social media (e.g. Twitter)

Anomaly Detection

Anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset -- Wikipedia

Key Challenges

• It is difficult to define what is normal or abnormal

• Ground truth or labelled data is unavailable, which makes validating results difficult

Information visualization can be a good aid in meeting these challenges.

System Overview

Live Monitoring

[Figure: system pipeline with three stages: Preprocessing, Analysis, Visualizations. Labelled components: Live Twitter Data, Data Filtering, Feature Extraction, Sentiment Analysis, Anomaly Detection, Clustering, Data Store (Hadoop), Indexing, Index, query.]

Whisper (2012)

"Whisper: Tracing the Spatiotemporal Process of Information Diffusion in Real Time", IEEE Transactions on Visualization and Computer Graphics (TVCG), InfoVis 2012.

SocialHelix (2013)

"SocialHelix: Visual Analysis of Sentiment Divergence in Social Media", Journal of Visualization, May 2015, Volume 18, Issue 2, pp. 221-235.

FluxFlow (2014)

"#FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media", TVCG, IEEE VAST 2014 (Honorable Mention).

Episogram (2015)

"Episogram: Visual Summarization of Egocentric Social Interactions", IEEE Computer Graphics and Applications.

TargetVue (2016)

"TargetVue: Visual Analysis of Anomalous User Behaviors in Online Communication Systems", IEEE TVCG 2016.

User Behaviors

• Posting (Tweeting) – Create a message and post it to others

• Responding (Replying / Retweeting) – Spread the messages posted by others

Capturing User Behaviors via Features

• Statistic Features: # of postings / retweets
• Content Features: sentiments, keywords
• Temporal Features: frequency of posting
• Network Features: centrality of the user
• User Profile Features: # of followers

[Figure: a user's feature vectors at T1, T2, T3, T4 form a feature-vector time series.]

Anomaly Detection
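As input to anomaly detection, each user is described by one feature vector per time step. A minimal sketch of building such a series from a user's events follows. It covers only three illustrative features (post count, retweet count, mean sentiment); the event tuple layout, the window length, and the function name `feature_series` are assumptions made for this example, not the system's actual feature extraction.

```python
from collections import defaultdict


def feature_series(events, window):
    """Build one feature vector per time window for a single user.

    events: list of (timestamp, kind, sentiment), kind in {"post", "retweet"}.
    Returns {window_index: [n_posts, n_retweets, mean_sentiment]}.
    """
    buckets = defaultdict(list)
    for timestamp, kind, sentiment in events:
        buckets[int(timestamp // window)].append((kind, sentiment))

    series = {}
    for index in sorted(buckets):
        items = buckets[index]
        n_posts = sum(1 for kind, _ in items if kind == "post")
        n_retweets = sum(1 for kind, _ in items if kind == "retweet")
        mean_sentiment = sum(s for _, s in items) / len(items)
        series[index] = [n_posts, n_retweets, mean_sentiment]
    return series


# Hypothetical events: timestamps in seconds, sentiment in [-1, 1].
events = [(0, "post", 1.0), (30, "retweet", -1.0), (70, "post", 0.5)]
series = feature_series(events, window=60)
print(series)
```

Network and profile features (centrality, # of followers) would be appended to each vector in the same way once they are available per window.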

TLOF: Temporal Local Outlier Factor

TLOF gives an anomaly score for every user by identifying features that differ significantly from those of other users in the test data and from the user's own past history.
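The slides do not spell out how TLOF adds the temporal comparison, so the sketch below shows only the plain Local Outlier Factor from the LOF paper cited here, which TLOF builds on. It is a simplified version: each point uses exactly its k nearest neighbours (ties broken by index), and the function name `lof_scores` and the sample points are hypothetical.

```python
import math


def lof_scores(points, k=2):
    """Local Outlier Factor per point; values well above 1 suggest an outlier.

    Assumes more than k points and no cluster of more than k identical points.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # k-distance: distance from a point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Four points on a unit square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
print(scores)  # the isolated point receives the largest score
```

A temporal variant could, as an assumption, score a user's current feature vector twice: once against other users' vectors and once against the user's own earlier vectors.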

Breunig, Markus M., et al. "LOF: identifying density-based local outliers." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000.

Introduction, Visualization Design, Evaluation

User Interface

The Inspection View

Behaviors, Features, Social Interactions

Showing Posting and Retweeting Activities

Activity Thread

[Figure: an activity thread on a time axis, from the raw tweet to the users who retweet it at different times.]

A user is visualized as a circle sized by their importance and colored by their anomaly score.

The central user's activity history is recorded on a circular timeline running from Start Time to End Time.

[Figure: activity threads on the circular timeline; t1, t2, t3 mark each activity's initiation time. Color: sentiments. Thickness: # of involved users.]


Who is the normal user?

Context View 1: Showing Activity Dynamics

Aggregate all the activity threads and align all the involved users together along a timeline.

Showing a User’s Features

• Behavior Features
• Content Features
• Temporal Features
• Network Features
• User Profile Features

[Figure: a user's feature-vector time series (T1, T2, T3) and the mean feature vector.]

Visualizing a User’s Features

• A user is visualized as a circle sized by their importance and colored by their anomaly score.
• A baseline circle indicates the mean feature values over all the users.
• Feature axes (F1, F2, ..., Fn) are arranged radially.
• The user’s feature values are plotted along the feature axes surrounding the baseline.
• The data points are connected to produce the feature glyph.
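The geometry of the feature glyph can be sketched as follows. The slides do not give the exact scaling, so this example assumes that a value equal to the mean lands exactly on the baseline circle and that the radius grows in proportion to value / mean. The function name `glyph_vertices` and the sample numbers are hypothetical.

```python
import math


def glyph_vertices(values, means, baseline_radius=1.0):
    """Return one (x, y) vertex per feature, placed on radially arranged axes.

    Assumed scaling: radius = baseline_radius * value / mean, so every mean
    must be non-zero and a value equal to its mean sits on the baseline circle.
    """
    n = len(values)
    vertices = []
    for i, (value, mean) in enumerate(zip(values, means)):
        angle = 2 * math.pi * i / n  # axis i of n, evenly spaced
        radius = baseline_radius * value / mean
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


means = [1.0, 1.0, 1.0, 1.0]
user = [2.0, 1.0, 1.0, 1.0]  # first feature is twice the mean
outline = glyph_vertices(user, means)
closed = outline + outline[:1]  # repeat the first vertex to close the glyph
print(closed)
```

With this mapping, a glyph that bulges outside the baseline circle on some axis marks a feature above the mean, and a dent inside it marks a feature below the mean.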

Who is the normal user?

Context View 2: Showing Feature Dynamics

Showing the change of users’ features over time in a heatmap:

• Each user is shown as a heatmap
• Row: different features
• Column: different timestamps
• Cell color: feature values (red: larger than mean, blue: smaller than mean)
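The red/blue cell coloring of the feature-dynamics heatmap can be sketched as a simple diverging color map. The exact color scale is not given in the slides, so the white midpoint, the linear fade, and the names `cell_color`, `heatmap`, and `max_dev` are assumptions for illustration.

```python
def cell_color(value, mean, max_dev):
    """Map a feature value to (r, g, b): white at the mean, red above, blue below.

    max_dev is the deviation from the mean that gives full saturation; it must be > 0.
    """
    t = max(-1.0, min(1.0, (value - mean) / max_dev))  # clamp to [-1, 1]
    fade = int(round(255 * (1 - abs(t))))
    return (255, fade, fade) if t >= 0 else (fade, fade, 255)


def heatmap(series, means, max_devs):
    """series[t][f] is feature f at timestamp t.

    Returns a grid with one row per feature and one column per timestamp.
    """
    return [
        [cell_color(vector[f], means[f], max_devs[f]) for vector in series]
        for f in range(len(means))
    ]


# Two timestamps, two features; the means would come from all users.
grid = heatmap([[1, 5], [3, 5]], means=[2, 5], max_devs=[1, 1])
print(grid)
```

Each row of `grid` is one feature and each column one timestamp, matching the layout described in the list above.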

Showing the Social Interactions

• Nodes indicate users
• Links indicate their social interactions (retweet/following)

Evaluation: Bot Detection Challenge

Bot Influence Challenge

• The goal of the influence challenge was to
  – design social bots on Twitter to promote the advantages of vaccination
  – influence a target network of users who support the anti-vaccine position
• Lasted for a month during Dec 2014
• 8000 target users, 4 million tweets

Evaluation

Bot Detection Challenge

Iterative Model Tuning

Final Results

Conclusion

The History of Information Visualization

Timeline: engineer (1759-1823), engineer (1781-1870), philosopher of science (1882-1945), graphic designer and statistician & artist (1942-), computer scientist (1947-).

Examples of Big Data Visualization

A Case Study of Big Data Visual Analysis

Behaviors, Features, Social Interactions

Thank You
