Information Visualization
Information Visualization
Where Design Meets Engineering
Nan Cao
TongJi College of Design and Innovation
Intelligent Big Data Visualization Lab
http://nancao.org

Data-Driven Social Media Management, Marketing
Breaking Bin Laden

“The best way to capture the imagination is to speak to the eyes.” -- William Playfair

Nan Cao (曹楠)
http://nancao.org
A Big Data Visualization Researcher
A Professor of TongJi College of Design and Innovation
Intelligent Big Data Vis Lab
Healthcare Informatics, Urban Computing, Behavior Analysis, Outlier Detection, Information Security

Information Visualization
The computer-supported techniques that transform data into interactive visual representations

William Playfair (Sep. 22nd 1759 – Feb. 11th 1823), a Scottish engineer and political economist, the founder of graphical methods of statistics.
Statistical Charts – 2.5 Centuries Ago

The process of Napoleon’s invasion of Moscow
Charles Joseph Minard (27 March 1781 – 24 October 1870) was a French civil engineer recognized for his significant contribution in the field of information graphics in civil engineering and statistics. Minard was, among other things, noted for his representation of numerical data on geographic maps.
Visual Story Telling – Two Centuries Ago

Otto Neurath (December 10, 1882 – December 22, 1945) was an Austrian philosopher of science, sociologist, and political economist.
Isotype (picture language) – One Century Ago

Diamonds Were a Girl’s Best Friend.
Nigel Holmes (born 15 June 1942, Swanland, England) is a British/American graphic designer, author, and theorist, who focuses on information graphics and information design.
Infographics – 60 Years Ago

Diamonds Were a Girl’s Best Friend.
Edward Tufte (1942-) is a statistician and artist, and Professor Emeritus of Political Science, Statistics, and Computer Science at Yale University.
Chartjunk – 1983

Ben Shneiderman (born August 21, 1947) is an American computer scientist, a Distinguished Professor in the CS department of the University of Maryland, and the founding director (1983-2000) of the Human-Computer Interaction Lab.
TreeMap – 1992
A starting point of modern information visualization

Daniel Keim, a German computer scientist and full professor (Chair of Information Processing) in the Computer Science department at the University of Konstanz, Germany.
Visual Analysis Reference Model – 2006

[Timeline of the figures above: Engineer (1759-1823), Engineer (1781-1870), Philosopher of science (1882-1945), Graphic designer (1942-), Statistician & Artist (1942-), Computer Scientist (1947-), Computer Scientist]
This field has been driven mainly by engineering and science since the very beginning.

Nowadays various visualization techniques have been developed to solve problems that require human opinions and judgments for decision making.
www.visualcomplexity.com

Big Data
25 million miles of airline traffic in 2 minutes

Large Data Visualization
14.8 million tweets, 25 million miles of airline traffic, 500 million users
• Challenging Task:
– Squeezing millions and even billions of records into about two million pixels (1600 × 1200 ≈ 2 million pixels)

Key Challenges
• Visual clutter: How can we avoid visual clutter such as overlaps and crossings?
• Performance issues: How can we render huge datasets in real time with rich interactions?
• Limited cognition: How can users understand the visual representation when the information is overwhelming?
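
The pixel budget and the visual clutter challenge can be made concrete with a small sketch. One common way to fit more records than pixels on a screen is to count records per pixel (density binning) and draw the counts instead of individual marks. The slides do not say which technique was used, so this is an illustration only; the function name `bin_points` and its parameters are hypothetical.

```python
# Screen size taken from the slide: 1600 x 1200 pixels.
SCREEN_WIDTH, SCREEN_HEIGHT = 1600, 1200
PIXEL_BUDGET = SCREEN_WIDTH * SCREEN_HEIGHT  # 1,920,000, roughly 2 million


def bin_points(points, width, height, x_range, y_range):
    """Count how many (x, y) records fall into each cell of a width x height grid.

    Assumption: every point lies inside x_range and y_range (inclusive).
    Drawing one colored cell per count avoids overplotting individual marks.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Map the data coordinate to a cell index; clamp so that a value
        # exactly on the upper bound lands in the last cell.
        col = int((x - x_min) / (x_max - x_min) * width)
        row = int((y - y_min) / (y_max - y_min) * height)
        col = max(0, min(col, width - 1))
        row = max(0, min(row, height - 1))
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
    for row in bin_points(demo, 4, 2, (0.0, 1.0), (0.0, 1.0)):
        print(row)
```

The cost is one pass over the records and memory bounded by the grid size, so the rendering step no longer depends on the number of records.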
Reference Model
[Figure: Reference Model for Information Visualization and Visual Analysis. Stages: Raw Data, Abstract Data, Visual Form, View. Steps: filtering, Data Mining, Encoding and Layout, rendering, Display. Input: User interactions.]

Case Study: Applications in the Field of Information Security

InfoSec in Our Daily Life
Protect the low-level communication of signals
Detection and Analysis of Financial Fraud
Behavior Security

The Growth of Social Media
http://www.jeffbullas.com/2014/01/17/20-social-media-facts-and-statistics-you-should-know-in-2014/

Do you trust your friends in social media?
(ACM Conference on Online Social Networks, 2014)
An adage since 1993
Anonymous users are potential threats to society

Research Goal
Finding users with anomalous behaviors on social media (e.g. Twitter)

Anomaly Detection
Anomaly detection (or outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset -- Wikipedia

Key Challenges
• It is difficult to define what is normal or abnormal
• Ground truth or labelled data are unavailable, making result validation difficult
Information visualization can help address these challenges

System Overview
[Figure: system pipeline. Live Twitter Data, Live Monitoring, Preprocessing (Data Filtering, Feature Extraction), Analysis (Sentiment Analysis, Anomaly Detection, Clustering), Data Store (Hadoop, Indexing, Index, query), Visualizations.]

Whisper (2012)
"Whisper: Tracing the Spatiotemporal Process of Information Diffusion in Real Time", IEEE Transactions on Visualization and Computer Graphics (TVCG), InfoVis 2012.
SocialHelix (2013)
"SocialHelix: Visual Analysis of Sentiment Divergence in Social Media", Journal of Visualization, May 2015, Volume 18, Issue 2, pp 221-235

FluxFlow (2014)
"#FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media", TVCG, IEEE VAST 2014 (Honorable Mention)

Episogram (2015)
"Episogram: Visual Summarization of Egocentric Social Interactions", IEEE Computer Graphics and Applications

TargetVue (2016)
"TargetVue: Visual Analysis of Anomalous User Behaviors in Online Communication Systems", IEEE TVCG 2016

User Behaviors
• Posting (Tweeting)
– Create a message and post it to others
• Responding (Replying / Retweeting)
– Spread the messages posted by others

Capturing User Behaviors via Features
• Statistic Features: # of Posting / Retweeting
• Content Features: Sentiments, Keywords
• Temporal Features: Frequency of Posting
• Network Features: Centrality of the user
• User Profile Features: # of followers
[Figure: a user's feature vectors at time steps T1, T2, T3, T4 form a time series]

Anomaly Detection
TLOF: Temporal Local Outlier Factor
TLOF gives an anomaly measurement for every user by identifying the features that are significantly different from those of other users in the test data and from the user's own past history.
Breunig, Markus M., et al. "LOF: identifying density-based local outliers." ACM SIGMOD Record, Vol. 29, No. 2. ACM, 2000.
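
The slides name TLOF but do not give its formula, so the following is only a sketch of the underlying idea. It implements the standard Local Outlier Factor from Breunig et al. (simplified to exactly k neighbors, ignoring distance ties) and then, as a labeled assumption, scores a user's current feature vector twice: once against other users and once against the user's own history. The function `peer_and_history_lof` is hypothetical and is not the authors' TLOF definition.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor for each point (simplified: exactly k neighbors)."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        d = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in d[:k]])
        k_dist.append(d[k - 1][0])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], math.dist(points[i], points[j])) for j in neighbors[i]]
        lrd.append(1.0 / max(sum(reach) / len(reach), 1e-12))
    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


def peer_and_history_lof(current, peers, history, k=2):
    """Hypothetical temporal variant (an assumption, not the slides' formula):
    LOF of `current` among other users, and among the user's own past vectors."""
    peer_score = lof_scores(list(peers) + [current], k)[-1]
    history_score = lof_scores(list(history) + [current], k)[-1]
    return peer_score, history_score


if __name__ == "__main__":
    cluster = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(lof_scores(cluster + [(10, 10)], k=2))
```

Scores near 1 indicate a point whose local density matches that of its neighbors; scores well above 1 indicate an outlier. Both functions need more than k points to compare against.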
Introduction, Visualization Design, Evaluation

User Interface

The Inspection View: Behaviors, Features, Social Interactions

Showing Posting and Retweeting Activities
[Figure labels: Activity Thread, Time, Raw Tweet, users who retweet the tweet at different times]
• A user is visualized as a circle sized by their importance and colored by their anomaly score
• The centric user’s activity history is recorded in a circular timeline (from Start Time to End Time)
• Activity threads (t1, t2, t3) are placed at the activity’s initiation time
– Color: sentiments
– Thickness: # of involved users

Who is the normal user?

Context View 1: Showing Activity Dynamics
Aggregate all the activity threads and align all the involved users together along a timeline

The Inspection View: Behaviors, Features, Social Interactions

Showing a User’s Features
Behavior Features, Content Features, Temporal Features, Network Features, User Profile Features
[Figure: a user's feature vectors at T1, T2, T3 form a time series, shown with the mean feature vector]

Visualizing a User’s Features
• A user is visualized as a circle sized by their importance and colored by their anomaly score
• A baseline circle indicates the mean feature values over all the users
• Feature axes (F1, F2, ..., Fn) are radially arranged
• The user’s feature values are plotted along the feature axes surrounding the baseline
• The data points are connected to produce the feature glyph

Who is the normal user?
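
The glyph construction on the "Visualizing a User’s Features" slides (baseline circle, radial axes, plotted values, connected points) can be sketched as a small geometry routine. The slides do not specify how a value is mapped to a distance from the baseline, so the linear mapping below, the function name `feature_glyph`, and the parameters `baseline_radius` and `scale` are all assumptions for illustration.

```python
import math


def feature_glyph(values, means, baseline_radius=1.0, scale=0.5):
    """Return the polygon vertices of a radial feature glyph.

    Axis i points at angle 2*pi*i/n. A value equal to the mean sits on the
    baseline circle; larger values move outward and smaller values inward.
    Assumption: the offset is linear, scale * (value - mean), clamped at 0.
    """
    n = len(values)
    vertices = []
    for i, (value, mean) in enumerate(zip(values, means)):
        angle = 2 * math.pi * i / n
        radius = max(0.0, baseline_radius + scale * (value - mean))
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices


if __name__ == "__main__":
    # Four features; only the first one deviates from the mean.
    for x, y in feature_glyph([2.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]):
        print(round(x, 3), round(y, 3))
```

Connecting consecutive vertices, and the last back to the first, gives the glyph outline. A user whose features all equal the mean traces the baseline circle's inscribed polygon, so deviations stand out as bumps or dents.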
Context View 2: Showing Feature Dynamics
Showing the change of users’ features over time in a heatmap
• Each user is shown as a heatmap
• Row: different features
• Column: different timestamps
• Cell Color: feature values (red: larger than mean, blue: smaller than mean)

The Inspection View: Behaviors, Features, Social Interactions

Showing the Social Interactions
• Nodes indicate users
• Links indicate their social interactions (retweet/following)

Introduction, Visualization Design, Evaluation

Evaluation: Bot Detection Challenge
Bot Influence Challenge
• The goal of the influence challenge was to
– design social bots in Twitter to promote the advantages of vaccination
– influence a target network of users who are supporters of anti-vaccine
• Lasted for a month during Dec 2014
• 8000 target users, 4 million tweets

Evaluation: Bot Detection Challenge
Iterative Model Tuning
Final Results

Conclusion
The History of Information Visualization
[Timeline: Engineer (1759-1823), Engineer (1781-1870), Philosopher of science (1882-1945), Graphic designer (1942-), Statistician & Artist (1942-), Computer Scientist (1947-)]
Examples of Big Data Visualization
A Case Study of Big Data Visual Analysis: Behaviors, Features, Social Interactions

Thank You
Intelligent Big Data Visualization Lab: Healthcare Informatics, Urban Computing, Information Security
Nan Cao, TongJi College of Design and Innovation
http://nancao.org