Wright State University CORE Scholar

Kno.e.sis Publications

The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis)

2009

User-Generated Content on Social Media Challenges, Opportunities

Meenakshi Nagarajan Wright State University - Main Campus

Follow this and additional works at: https://corescholar.libraries.wright.edu/knoesis

Part of the Bioinformatics Commons, Communication Technology and New Media Commons, Databases and Information Systems Commons, OS and Networks Commons, and the Science and Technology Studies Commons

Repository Citation
Nagarajan, M. (2009). User-Generated Content on Social Media Challenges, Opportunities. https://corescholar.libraries.wright.edu/knoesis/92

This Presentation is brought to you for free and open access by The Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at CORE Scholar. It has been accepted for inclusion in Kno.e.sis Publications by an authorized administrator of CORE Scholar. For more information, please contact [email protected].

User-Generated Content on Social Media Challenges, Opportunities

Meena Nagarajan, KNO.E.SIS, Wright State [email protected], http://knoesis.wright.edu/researchers/meena/

Tuesday, October 27, 2009

The Shift.. in the rules of the game

• Online Media: from a Packaged Goods Media to a Conversational Media
• Variety of networked interactions, many in near real-time
• Information economy: from a dearth of signals to plenty!

http://gregverdino.typepad.com/greg_verdinos_blog/images/2007/07/09/web2_logos.jpg

Social Media Investigations

• Network: social structure emerges from the aggregate of relationships (ties)
• People: poster identities, the active effort of accomplishing interaction
• Content: studying the content of communication. "Who says what, to whom, why, to what extent and with what effect?" [Lasswell]

Effects of Networked Publics

• Certain social phenomena are admittedly more complex
  • begs for a people-content-network confluence
• Micro-level variations of content-people-network on macro-level features
  • “How do the topic of discussion, emotional charge of a conversation, poster characteristics & network connections affect ....?”

People-Content-Network - Possibilities

• Emerging social order in online conversations
• How are the people-content-network dynamics shaping online conversations?
• Can we understand the Influentials theory, information diffusion properties in networks (etc.) while taking people and content into account?

Stand on the shoulders of micro-giants

• The point is that we need a strong grasp on the micro-level variables of the content, people and network dimensions to begin explaining what they are doing to any social phenomenon...
• My focus is on the micro-level variables in the content dimension.

Mapping User-generated Content to Context

Dimensions Of Analysis

WHAT

• What are the Named Entities and topics that people are making references to?
• How are they interpreting any situation in local contexts and supporting them in their variable observations?

• Named Entity Identification and Disambiguation
• Cultural Named Entities
  • Music artist, track named entities (IBM) [ISWC09a, VLDB09], Movie named entities (MSR) [WWW2010]
• Summaries of user perceptions behind real-time events from Twitter

Dimensions Of Analysis

WHAT

• What are the Named Entities and topics that people are making references to?
• How are they interpreting any situation in local contexts and supporting them in their variable observations?

www.evri.com
http://memetracker.org/

Dimensions Of Analysis

WHY

• What are the diverse intentions that produce the diverse content on social media?
• Why we share, by looking at what we predominantly do with the medium. Value derived, repurposing..
• Emotion, sentiment expressions..

Dimensions Of Analysis

WHY

• Mapping User Intentions
  • Information Seeking, Sharing, Transactional intents [WI09]
• What is the intention landscape of social media?
  • Where is the monetization potential?

Dimensions Of Analysis

HOW

• Self-presentation in Online Dating Profiles (with Prof. Marti Hearst, UC Berkeley) [ICWSM09]
• What do word usages tell us about an active population, or about the medium?
• Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof!

Dimensions Of Analysis

HOW

• What do word usages tell us about an active population?
• Self-presentation
• Dynamics of a conversation - snubs, flaming words, coordination.. or lack thereof!

http://wordwatchers.wordpress.com/2008/10/06/language-in-speeches-and-interviews-summary-comparisons/

The Social Media Content Landscape..

Small Groups
Individuals
Uniform Large Groups
Heterogeneous Variable Contribution Large Groups
Variable Contribution Large Groups

Population, Medium Diversity
Some mediation
Rate of exchange (asynchronous, synchronous)
Many-to-many reach
Shared Contexts
Slangs, abbreviations, grammar, spelling, media-specific vocabulary
Interpersonal interactions

Variety & Formality

Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 *

TYPE OF DATA               FORMALITY
Nat Broadcast Reportage    62.2
Informational writing      61
Academic Social Science    60.6
Writing                    58
Professional Letters       57.5
Non Acad Social Science    56.9
Broadcasts                 55
Blog corpus                53.3
Scripted Speech            53
Email Corpus               50.8
Prepared speeches          50
Personal Letters           49.7
Imaginative writing        47
Fiction Prose              46.3
Interviews                 46
Unscripted Speeches        44.4
Spontaneous speech         44
Conversations              38
Phone Conversations        36

* Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure. Foundations of Science, 2002, 293-340
Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson

Variety & Formality

Formality Score = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2 *

TYPE OF DATA                                          FORMALITY
Nat Broadcast Reportage                               62.2
Informational writing                                 61
Academic Social Science                               60.6
Writing                                               58
Professional Letters                                  57.5
Non Acad Social Science                               56.9
Broadcasts                                            55
Blog corpus                                           53.3
Scripted Speech                                       53
Email Corpus                                          50.8
Critic Music reviews from www.metacritic.com/music/   50.13
Yahoo Personals AboutMe                               50.10
MySpace About Me                                      50.07
MySpace - comments on Artist Pages                    50.06
Prepared speeches                                     50
Personal Letters                                      49.7
Twitter                                               49.46
Facebook posts                                        48.20
Imaginative writing                                   47
Fiction Prose                                         46.3
Interviews                                            46
Unscripted Speeches                                   44.4
Spontaneous speech                                    44
Conversations                                         38
Phone Conversations                                   36

* Heylighen, F. & Dewaele, J. Variation in the contextuality of language: An empirical measure. Foundations of Science, 2002, 293-340
Weblogs, Genres and Individual Differences: How bloggers write for who they write for; Scott Nowson
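A minimal Python sketch of the formality score above, assuming each part-of-speech frequency is given as a percentage of all words in a text. The function name `formality_score` and the example frequencies are illustrative only; they are not taken from any of the corpora in the table.

```python
def formality_score(freq):
    """Formality score from part-of-speech frequencies (percent of all words).

    `freq` maps a part-of-speech name to its frequency; missing keys count as 0.
    """
    added = ("noun", "adjective", "preposition", "article")
    subtracted = ("pronoun", "verb", "adverb", "interjection")
    plus = sum(freq.get(pos, 0.0) for pos in added)
    minus = sum(freq.get(pos, 0.0) for pos in subtracted)
    return (plus - minus + 100.0) / 2.0


# Hypothetical frequencies for one text (made up for illustration).
example = {
    "noun": 25.0, "adjective": 8.0, "preposition": 12.0, "article": 9.0,
    "pronoun": 10.0, "verb": 18.0, "adverb": 6.0, "interjection": 1.0,
}
print(formality_score(example))
```

A text whose added and subtracted categories balance out scores exactly 50, which is why the table's scores cluster around that midpoint.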

Making up for lack of context..

• Supplement what the data is showing you with what you already know..
• Statistical NLP + Contextual Knowledge
• Ontologies, Taxonomies, Dictionaries, social medium, shared spatio-temporal contexts..

Representative Efforts

WHAT WHY HOW


Cultural NER

WHAT

I decided to check out the Wanted demo today even though I really did not like the movie minus Mrs Jolie a.k.a Fox of course!

It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now

LOVED UR MUSIC YESTERDAY!

Obama the Dark Knight of socialism.. the man is not as impressive as Ledger yea

Intuitions..

• Spotting and Sense Identification
• Open vs. Closed world
  • unlike person and location named entities, contexts and senses change fairly rapidly
  • We assume an open-world wrt senses
  • No comprehensive sense knowledge base
• Reduce it to a spotting and binary sense classification problem

It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now

Two flavors..

• Artist and tracks spotting in MySpace music forums
  • using the MusicBrainz Taxonomy
  • with Daniel Gruhl, Jan Pieper, Christine Robson, IBM Almaden, Amit Sheth, Knoesis [ISWC09a]
  • on Thursday Oct 29, Session: Discovering Semantics
• Movie names from Weblogs
  • with Amir Padovitz, Social Streams MSR, [WWW2010]

Cultural NER in Weblogs

• Goal: Supplement classifiers with information that will help them better disambiguate the reference of a term!
• A ‘Complexity of Extraction’ measure associated with an entity in its target sense in a corpus
  • with all cues equal, systems that are ‘complexity aware’ will treat cues differently

Measure of Extraction Complexity

• Feature extraction: graph-based spreading activation and clustering
  • entity sense definition from Wikipedia + evidence a corpus presents for the target sense of the entity
• Ranked list speaks for itself ...
• More varied senses and contexts imply higher extraction complexity

Extracted Complexity (general weblogs)
Time Travellerʼs Wife
Angels and Demons
..
The Hangover
..
..
..
Wanted
Up
Twilight
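The slides name an Entropy baseline in the evaluation but do not define it. One plausible reading, sketched here as an assumption and not as the authors' measure, is the Shannon entropy of the senses in which an entity name is used across labeled mentions: a name used in one sense only scores 0, and a name split evenly between two senses scores 1 bit. The function name `sense_entropy` and the labels below are hypothetical.

```python
import math
from collections import Counter


def sense_entropy(sense_labels):
    """Shannon entropy (in bits) of the sense labels observed for one name."""
    counts = Counter(sense_labels)
    total = sum(counts.values())
    # p * log2(1/p), summed over the observed senses
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical labels: does each mention refer to the movie or to something else?
mentions = {
    "name used in one sense": ["movie", "movie", "movie", "movie"],
    "name used in two senses": ["movie", "other", "other", "movie"],
}
for name, labels in mentions.items():
    print(name, sense_entropy(labels))
```

This captures only how varied the senses are. The graph-based measure described above also draws on the contexts a corpus presents for the target sense, which this sketch does not model.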

Feature as a Prior

Decision Tree and Boosting Classifiers

X axis: precision
Y axis: recall
1500+ hand-labeled data points
Blue: basic features
Red: with Entropy baseline
Green: with our Complexity of Extraction feature

As a Prior in Binary Classification

Average F-measure over 1000 decision tree, boosting models

Average Accuracy over 1000 decision tree, boosting models

1500+ hand-labeled data points
Blue: basic features
Red: with Entropy baseline
Green: with our Complexity of Extraction feature

To chew on..

• The concept of ‘Extraction Complexity’ as an additional prior is very promising
  • applies to general NER

User Intention Mapping

WHY

• Unlike Web search intent, the entity alone is insufficient to characterize intent here..
• Three broad intentions: information seeking, sharing, transactional, and combinations thereof.
  • ‘i am thinking of getting X’ (transactional)
  • ‘i like my new X’ (information sharing)
  • ‘what do you think about X’ (information seeking)

Action Patterns

• Resorted to ‘action patterns’ surrounding named entities
  • “where can i find a psp cam..”
• A minimally supervised bootstrapping algorithm
  • 10 seed action patterns; learn new ones from an unannotated corpus, relying on empirical and semantic similarity with seed patterns
  • semantic similarity from communicative functions of words: Linguistic Inquiry Word Count (www.LIWC.net)
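The bootstrapping idea can be sketched in Python as a single round over a toy corpus. This is an illustration under stated assumptions, not the algorithm from [WI09]: the empirical signal is approximated by n-gram frequency, and plain word overlap stands in for the LIWC-based semantic similarity, which is not reproduced here. All names (`bootstrap_patterns`, `word_overlap`), thresholds, seed patterns and posts are hypothetical.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """All contiguous word n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_overlap(a, b):
    """Jaccard overlap of the word sets of two patterns (stand-in similarity)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def bootstrap_patterns(seeds, posts, n=4, min_count=2, min_sim=0.5):
    """One bootstrapping round: keep frequent n-grams that resemble a seed."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(ngrams(tokens, n))
    learned = set(seeds)
    for candidate, count in counts.items():
        similar = any(word_overlap(candidate, s) >= min_sim for s in seeds)
        if count >= min_count and similar:
            learned.add(candidate)
    return learned


seeds = {"does anyone know how"}  # hypothetical seed pattern
posts = [                         # hypothetical unannotated posts
    "does anyone know where i can find a psp cam",
    "hey does anyone know where to buy tickets",
    "i like my new phone",
]
learned = bootstrap_patterns(seeds, posts)
print(sorted(learned))
```

Repeating the round with the learned patterns as new seeds would grow the pattern set further; the thresholds control how quickly it drifts away from the original seeds.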

Information Seeking, Transactional

• Patterns learned using 8000 uncategorized posts on MySpace forums

Sample learned patterns
does anyone know how
know where i can
was wondering if someone
Im not sure how
someone tell me how

• Intent recognition recall using pre-classified user posts from Facebook Marketplace (to buy): 81%

Impact on Online Advertising?

• Generate ads from user profile (interests, hobbies) or from posts with monetizable intents?

Targeted Content Delivery Platform

• Of all the ads generated using profile (hobbies, interests) information, 7% received attention
• Of the ads generated using authored, monetizable posts, 59% received attention

More at [WI09], Beyond Search and Internet Economics Workshop, MSR, Redmond, WA
http://research.microsoft.com/en-us/um/redmond/about/collaboration/awards/beyondsearchawards.aspx

What, Why

Self-Presentation

HOW

• On Online-dating profiles (with Prof. Marti Hearst, UCB) [ICWSM09]
  • quantifying usages of words from linguistic, personal and psychological categories in LIWC
  • Exploratory Factor Analysis to identify systematic co-occurrence patterns among LIWC variables
  • grouping user profiles on the basis of their shared multi-dimensional features to compare and contrast self-presentation

Imitate to Impress !?

• More similarities than differences
• Men displaying a higher usage of tentative words (maybe, perhaps..)
  • typically attributed to feminine discourse
• Many similarities in word combinations and words used!
• Perhaps self-expression tends towards attempting homophily in online dating..

Science, Fun and Profit

BBC SoundIndex, IBM

“A pioneering project to tap into the online buzz surrounding artists and songs, by leveraging several popular online sources”

De-spam, slang transliterations, entity identification, voting theory to combine multi-modal online data sources [ICSC08a, VLDB09]
http://www.almaden.ibm.com/cs/projects/iis/sound/

What, Why, When, Where, Who

Twitris: Kno.e.sis

Real-time user perceptions as the fulcrum for browsing the Web [ISWC09b]

What, When, Where

Iran elections: Discussions in the US and Iran on the same day

The mystery of Soylent Green: information where you can use it

twitris: socially influenced browsing
Ashu, Raghava, Wenbo, Pramod, Vinh, Karthik, Meena, Amit, and Ajith
kno.e.sis center, Wright State University

Twitris aggregates social perceptions from Twitter using a spatio, temporal and thematic approach. Twitris captures what was said, when it was said and where it was said. Fetch resources from the Web to explore perceptions further. Browse the Web for issues that matter to people, using people's perceptions as the fulcrum.

What does twitris do?

Temporal perspective
• Capture changing perceptions, issues of interest every day; Nobel is no more the news for Obama! October 12.
• Capture changing perceptions, issues of interest every day; legalize illegal immigrants, captured in the healthcare context on September 18.

Spatial perspective
• Opinion on Iran Election from the US talks about Oil economies, blogging
• Opinion on Iran Election from Iran talks about theocracy oppression, demonstration

The fourth estate perspective
• Find resources related to social perceptions
• News and Wikipedia articles to put extracted descriptors in context
• Integrate user observations with news on a particular day; correlate citizen journalism with the fourth estate
• On September 18, Obama was talking about illegal immigrants in the context of health care

Twitris: Twitter through space, time, theme

✓ Exploit spatio, temporal semantics for thematic aggregation
✓ Analyze the anatomy of a tweet
  "RT @m33na come back and checkl new events on twitris #twitris"
  RT: Retweet or a repost of a tweet; # (hashtags): user generated meta; @: refer to other users
✓ Data from diverse sources (Twitter, news services, Wikipedia, and other Web resources)
✓ End user application

Little statistics from Twitris (unit: tweets)
Healthcare (Aug 19 - Oct 20): 721 K (US Only)
Obama (Oct 8 - 20): 312 K (US Only)
H1N1 (Oct 5 - 20): 232 K (US Only)
Iran Election (June 5 - Oct 20): 2.8 m (Worldwide)

Come see & play with Twitris @ the International Semantic Web Challenge at ISWC ’09

twitris internals in less than 140 characters
• Parallel crawling to scale
• Data processing pipeline to streamline Twitter, geocode services, data analytics, to handle heterogeneity
• Live resource aggregation
• Near real time: processing up to a day before
• Spatio-temporally weighted text analytics

[Interface figure: Concept Cloud; Context + Selected Term; News and related articles; Google News widget; DBpedia widget]

[Architecture figure: Data Collection (Twitter Search; event-1 … event-n crawlers, each with Author Location Lookup, Geocode Lookup and Data Dumper; Shared Memory); Data Processing (TFIDF based descriptor extraction; Spatio, Temporal, Thematic descriptor extraction; Extracting storylines around descriptors); Twitris DB]

Caveats and Future work
1. Handle Twitter constructs such as hashtags, retweets, mentions and replies better
2. Different viz widgets such as time series to show changing perceptions from a place for an event and demographic based visualizations.
3. Sentiment analysis
4. Robust computing approaches (Cloud, Hadoop)
5. FB Connect for sharing and personalization

A tetris like approach to twitter to gather aggregated social signals

Check us out at: http://twitris.knoesis.org
Follow us @7w17r15
Become a FB Fan and share Twitris with everyone
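The "anatomy of a tweet" described on the poster (RT for a retweet, # for hashtags, @ for references to other users) can be sketched with regular expressions. This is a minimal illustration, not the Twitris implementation; the function name `parse_tweet` and the simple `\w+` token rule are assumptions.

```python
import re


def parse_tweet(text):
    """Split a tweet into the three constructs named on the poster."""
    return {
        "is_retweet": text.startswith("RT "),      # RT: repost of a tweet
        "mentions": re.findall(r"@(\w+)", text),   # @user references
        "hashtags": re.findall(r"#(\w+)", text),   # user-generated metadata
    }


# The example tweet quoted on the poster.
tweet = "RT @m33na come back and checkl new events on twitris #twitris"
print(parse_tweet(tweet))
```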

Thank You!

Google, Bing, Yahoo: Meena Nagarajan

[email protected]

http://knoesis.wright.edu/researchers/meena

References

http://knoesis.wright.edu/researchers/meena/pubs.php

[WISE09] Meenakshi Nagarajan, Karthik Gomadam, Amit Sheth, Ajith Ranabahu, Raghava Mutharaju and Ashutosh Jadhav, Spatio-Temporal-Thematic Analysis of Citizen-Sensor Data - Challenges and Experiences, Tenth International Conference on Web Information Systems Engineering, Oct 5-7, 2009.

[ISWC09a] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009.

[ISWC09b] Twitris, Submission to the International Semantic Web Challenge, collocated with the International Semantic Web Conference 2009.

[VLDB09] Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Multimodal Social Intelligence in a Realtime Dashboard System, Pending Review, VLDB Journal, Special Issue on "Data Management and Mining on Social Networks and Social Media", 2009.

[WWW2010] Meenakshi Nagarajan, Amir Padovitz, A Measure of Extraction Complexity: a Novel Prior for Improving Recognition of Cultural Entities, Manuscript in Preparation, for The Nineteenth International World Wide Web Conference, 2010.

[ICSC08a] Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Nachiketa Sahoo, Applications of Voting Theory to Information Mashups, Second IEEE International Conference on Semantic Computing, ICSC 2008.

[ICSC08b] Meenakshi Nagarajan, Cartic Ramakrishnan, Amit Sheth, Text Analytics for Semantic Computing - the good, the bad and the ugly, Second IEEE International Conference on Semantic Computing, Santa Clara, CA, USA, 2008.

[WI09] Meenakshi Nagarajan, Kamal Baid, Amit P. Sheth, and Shaojun Wang, Monetizing User Activity on Social Networks - Challenges and Experiences, 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Sep 15-18, 2009.

[ICWSM09] Meenakshi Nagarajan, Marti A. Hearst, An Examination of Language Use in Online Dating Personals, 3rd Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2009.

[IC09] Amit Sheth, Meenakshi Nagarajan, Semantics-Empowered Social Computing, IEEE Internet Computing 13(1), 2009.