2. A first glance at different kinds of social media data

2006

2011

Social Media Data

• Texts
• Images
• Videos
• Mixed formats
• Connections I (friends, followers)
• Connections II (links/URLs)
• Connections/Actions (likes, favs, comments, downloads)

Images

http://www.guardian.co.uk/uk/2011/dec/07/twitter-riots-how-news-spread

Vis, F., Faulkner, S., Parry, K., Manyukhina, Y., & Evans, L. (2013). Twitpic-ing the riots: analysing images shared on Twitter during the 2011 UK riots. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (Eds.), Twitter and Society (pp. 385-398). Peter Lang.

Hashtags

Bruns, A., & Burgess, J. (2012). Notes towards the scientific study of Twitter. In A. Tokar, M. Beurskens, S. Keuneke, M. Mahrt, I. Peters, C. Puschmann, T. van Treeck, & K. Weller (Eds.), Science and the Internet (pp. 159-169). Düsseldorf: Düsseldorf University Press. http://nfgwin.uni-duesseldorf.de/sites/default/files/Bruns.pdf

Mentions

Timeline

Gummer, T., Roßmann, J., & Wolf, C. (2014). Candidates’ Twitter Use in the German Election 2013. Presentation at the General Online Research 2014, Cologne, Germany.

Rhythm of a City

http://engineering.twitter.com/2012/06/studying-rapidly-evolving-user.html

Followers

[Figure: number of followers per month, June 2011 to June 2012 (y-axis 0 to 80,000), for the Twitter accounts of German football clubs, e.g. BVB 09 II (@BVB), FC Bayern München (@BayMuenchen), SV Werder Bremen II (@werderbremen), Hamburger SV (@HSV), FC Schalke 04 II (@s04, official).]

Bruns, A., Weller, K., & Harrington, S. (2014). Twitter and Sports: Football Fandom in Emerging and Established Markets. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (Eds.), Twitter and Society (pp. 263-280). New York et al.: Peter Lang.

Interactions

Paßmann, J., Boeschoten, T., & Schäfer, M.T. (2014). The Gift of the Gab: Retweet Cartels and Gift Economies on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (Eds.), Twitter and Society. New York et al.: Peter Lang.

Networks

[Figure: three network panels: following, retweeting, mentioning]

Lietz, H., Wagner, C., Bleier, A., & Strohmaier, M. (2014). When politicians talk: Assessing online conversational practices of political parties on Twitter. In International AAAI Conference on Weblogs and Social Media (ICWSM2014), Ann Arbor, MI, USA, June 2-4, 2014.

Networks

Facebook (Paul Butler). Data from: Facebook. https://www.facebook.com/note.php?note_id=469716398919

Geo data

Twitter. Data from: Twitter. https://blog.twitter.com/2013/geography-tweets-3

Geo data

Livehoods Project. Data from: Foursquare (via Twitter). http://livehoods.org/maps/montreal

Geo data

Data from: Allrecipes.com. http://www.nytimes.com/interactive/2009/11/26/us/20091126-search-graphic.html?_r=0

The Guardian. Data from: Twitter. http://www.guardian.co.uk/news/datablog/2012/nov/28/data-shadows-twitter-uk-floods-mapped#zoomed-picture

http://www.jeuneafrique.com/Article/ARTJAWEB20130215165826/internet-libreville-accra-addis-abebareseaux-sociaux-les-capitales-africaines-de-twitter-quartier-par-quartier.html#Tunis

Northeastern University and Harvard University. Data from: Twitter. http://www.ccs.neu.edu/home/amislove/twittermood/

The Australian Twitter-Sphere (by A. Bruns)

http://www.cci.edu.au/node/1362

Some more about geo information

Overview on new geo-data:
• Elwood, S., Goodchild, M.F., & Sui, D.Z. (2012). Researching Volunteered Geographic Information: Spatial Data, Geographic Research, and New Social Practice. Annals of the Association of American Geographers, 102(3), 571-590.

Twitter:
• Leetaru, K., Wang, S., Cao, G., Padmanabhan, A., & Shook, E. (2013). Mapping the global Twitter heartbeat: The geography of Twitter. First Monday, 18(5).

Research Methods

“Seriously? Do they not realize that 99% of tweets are worthless babble that read something like ‘Just woke up. Going to Starbucks now. Getting latte.’”

Reader’s comment found in the comment section for Gross, D. (2010, April 14). Library of Congress to archive your tweets. CNN. Retrieved from http://edition.cnn.com/2010/TECH/04/14/library.congress.twitter/, retrieved November 19. Photos: https://www.flickr.com/search/?text=coffee&license=4%2C5%2C6%2C9%2C10

New type of data

• Researchers value social media as a new type of data
• Previously “ephemeral” data become visible
• Immediate: quick reaction to events
• Structured
• “Natural” data

“What I find really interesting is that structure becomes manifest in internet communication. So it’s the first time in history actually that we can, that social structures between people become manifest within a technology. (...) They become visible, they become crawlable, they become analyzable.”

Kinder-Kurlanda, K. E., & Weller, K. (2014). “I always feel it must be great to be a hacker!”: The role of interdisciplinary work in social media research. In Proceedings of the 2014 ACM Conference on Web Science (pp. 91-98). New York: ACM.

Approaches

• Surveys
• Experiments
• Interviews
• Web ethnography
• Content analysis
• Network analysis
• Linguistic analyses (e.g. sentiment analysis)

Rarely used in combination. Many case studies, few methodological standards.

Multi-disciplinary environment

• Freedom to explore new approaches
• Multi-method
• Exchange with other disciplines

How to study social media?

“Information disclosure and privacy on Facebook”

“Election prediction with Twitter data”

Challenge vs. Opportunity

• Lots of room for exploration and innovation
but
• Few or no standards

Outlook: Data collection options

• “Manual” forms of collection
• APIs
• Official resellers
• Re-using published datasets
• Third party tools
• (Crowdsourcing)

Big Data?

Examples from Twitter research:
• 309,740 Twitter users (with followers and tweets)
• 17,803 tweets from 8,616 users + 1st degree network (3,048,360 directed edges, 631,416 unique followers, and 715,198 unique friends)
• 1.3 million Twitter conversations, with each conversation containing between 2 and 243 posts
• 20,000 tweets
• 21,623,947 geo-tagged tweets
• 99,832 tweets

But also:
• One person’s Twitter network (652 followers, 114 followings)
• Experiment with 125 students
• 1,827 annotated tweets
• Experiment with 1,677 participants
• Survey with 505 young American adults
• none

Different methods in social-science-based Twitter research

Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization, 41(3), 238-248.

Big data? Twitter and elections

No. of tweets              No. of publications (2013)
0-500                      3
501-1,000                  4
1,001-5,000                1
5,001-10,000               1
10,001-50,000              7
50,001-100,000             4
100,001-500,000            5
500,001-1,000,000          3
1,000,001-5,000,000        3
More than 5,000,000        3
More than 100,000,000      1
More than 1,000,000,000    1
No/insufficient details    13

Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.

Example: Twitter

Example: Twitter Data

A small example with Twitter data

Testdata

• Go to http://tiny.cc/testdata (link will be deactivated after the course).
• Save the file to the desktop.
• Open the file.

Who is discussing?

• Identify all users who have written at least one tweet.
• What is the distribution of tweets per user?
• How many users have written exactly one tweet?
• Who are the five most active users? What can you find out about them?
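The questions above can also be answered with a short script instead of a spreadsheet. The following is only a minimal sketch under stated assumptions: it assumes the test data is a CSV file with one tweet per row and a column named `from_user` that holds the author's screen name (both the column name and the sample rows are invented for illustration and may not match the actual file).

```python
import csv
import io
from collections import Counter

# Invented sample rows; the column names `from_user` and `text` are assumptions.
SAMPLE_CSV = """from_user,text
alice,Looking forward to the keynote
bob,RT @alice: Looking forward to the keynote
alice,Slides are online http://example.org/slides
carol,Great session
alice,See you tomorrow
"""


def tweets_per_user(rows, user_column="from_user"):
    """Count how many tweets each user has written."""
    return Counter(row[user_column] for row in rows)


def distribution(counts):
    """Map 'number of tweets' to 'number of users with that many tweets'."""
    return Counter(counts.values())


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = tweets_per_user(rows)

users = sorted(counts)                                   # all users with >= 1 tweet
single_tweet_users = sorted(u for u, n in counts.items() if n == 1)
top_five = counts.most_common(5)                         # most active users first

print("Users:", users)
print("Distribution:", dict(distribution(counts)))
print("Users with exactly one tweet:", single_tweet_users)
print("Most active users:", top_five)
```

To run this on a real export, replace `io.StringIO(SAMPLE_CSV)` with an opened file and set `user_column` to whatever the author column is called in that file.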

Other information

• How many tweets are geocoded?
• How many RTs?

YourTwapperkeeper
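Both counts can be approximated programmatically. The sketch below is illustrative only and rests on assumptions: it assumes coordinate columns named `geo_coordinates_0` and `geo_coordinates_1` that are empty or `0` when a tweet is not geocoded, and it treats any tweet containing `RT @username` as a retweet. The column names and the sample rows are hypothetical; check them against the actual export.

```python
import csv
import io
import re

# Invented sample rows; all column names are assumptions about the export format.
SAMPLE_CSV = """from_user,text,geo_coordinates_0,geo_coordinates_1
alice,Hello from the venue,51.22,6.77
bob,RT @alice: Hello from the venue,0,0
carol,Nice talk (via @alice),,
dave,"Agreed! RT @bob: good point",48.14,11.58
"""

# Simple heuristic: the marker "RT" followed by a user mention.
RT_PATTERN = re.compile(r"\bRT @\w+")


def is_geocoded(row):
    """True if the row carries non-zero coordinates."""
    lat = row.get("geo_coordinates_0", "").strip()
    lon = row.get("geo_coordinates_1", "").strip()
    try:
        return float(lat) != 0.0 or float(lon) != 0.0
    except ValueError:  # empty or non-numeric value
        return False


def is_retweet(row):
    """True if the tweet text contains an 'RT @user' marker."""
    return RT_PATTERN.search(row["text"]) is not None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
geocoded = sum(is_geocoded(r) for r in rows)
retweets = sum(is_retweet(r) for r in rows)

print(f"{geocoded} of {len(rows)} tweets are geocoded")
print(f"{retweets} of {len(rows)} tweets are RTs")
```

A pattern like this misses other retweet conventions such as "via @user", so the automatic count is best read as a lower bound.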

What is going on?

• Read approx. 30 tweets.
• How would you approach studying what the tweets are about?
• Look up approx. 10 links from tweets.
• How would you approach studying what the tweets are about?
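One possible first step for the link-based approach is to pull all URLs out of the tweet texts and rank them by frequency. This is a rough sketch rather than a prescribed method: the regular expression is a simplification, the sample tweets are invented, and real tweets often contain shortened URLs that would need to be resolved before counting.

```python
import re
from collections import Counter

# Simplified pattern: anything that starts with http:// or https:// up to whitespace.
URL_PATTERN = re.compile(r"https?://\S+")

# Invented example tweets.
SAMPLE_TWEETS = [
    "Slides from my talk: http://example.org/slides",
    "RT @alice: Slides from my talk: http://example.org/slides",
    "Programme is online http://example.org/program.",
    "Two links: http://example.org/slides and http://example.org/blog-post",
    "No link here",
]


def extract_urls(text):
    """Return all URLs in a tweet, without trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


url_counts = Counter(url for tweet in SAMPLE_TWEETS for url in extract_urls(tweet))
ranked = url_counts.most_common()          # rank 1 = most frequent URL

all_urls = sum(url_counts.values())        # counting all URLs
unique_urls = len(url_counts)              # counting unique URLs only

for rank, (url, freq) in enumerate(ranked, start=1):
    print(rank, freq, url)
print("all URLs:", all_urls, "unique URLs:", unique_urls)
```

The ranked list can then be inspected manually, for example by opening the most frequent links and assigning each one to a content category.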

Weller, K., Dröge, E., & Puschmann, C. (2011). Citation Analysis in Twitter: Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences. In M. Rowe, M. Stankovic, A.-S. Dadzie, & M. Hardey (Eds.), Making Sense of Microposts (#MSM2011), Workshop at Extended Semantic Web Conference (ESWC 2011), Crete, Greece (pp. 1–12). CEUR Workshop Proceedings Vol. 718.

Frequency of URLs: #www2010

[Figure: Distribution of URLs from #www2010. x-axis: URL on rank n (ranked by frequency), ranks 1 to 541; y-axis: frequency of URL on rank n, 0 to 45.]

Frequency of URLs: #mla09

[Figure: Distribution of URLs from #mla09. x-axis: URL on rank n (ranked by frequency), ranks 1 to 185; y-axis: frequency of URL on rank n, 0 to 30.]

URL Categorization

Categories: Blog, Conference, Error, Media, Press, Project, Publication, Slides, Twitter, Other

Frequent URLs and their categories: #www2010

Frequency  Category     URL
41         Blog         http://blog.marcua.net/post/566480920/twitter-papers-at-the-www-2010-conference
35         Publication  http://www.danah.org/papers/talks/2010/WWW2010.html
29         Twitter      http://kmi.tugraz.at/staff/markus/www2010/www2010_roomstream.html
22         Error        http://xquery.pbworks.com/rtp-meetup
22         Conference   http://www.elon.edu/e-web/predictions/futureweb2010/carl_malamud_www_keynote.xhtml
18         Conference   http://www.elon.edu/e-web/predictions/futureweb2010/default.xhtml
16         Conference   http://futureweb2010.wordpress.com/schedule/
13         Slides       http://www.slideshare.net/haewoon/what-is-twitter-a-social-network-or-a-news-media-3922095
12         Conference   http://events.linkeddata.org/ldow2010/
12         Project      http://opengraphprotocol.org/
12         Conference   http://www.websci10.org/program.html

Frequent URLs and their categories: #mla09

Frequency  Category     URL
27         Blog         http://amandafrench.net/2009/12/30/make-10-louder/
23         Blog         http://www.briancroxall.net/2009/12/28/the-absent-presence-todays-faculty/
22         Blog         http://nowviskie.org/2009/monopolies-of-invention/
20         Error        http://chronicle.com/article/missing-in-action-at/63276/
18         Press        http://www.profhacker.com/?p=4448
18         Blog         http://www.samplereality.com/2009/11/15/digital-humanities-sessions-at-the-2009-mla/
16         Press        http://chronicle.com/blogpost/the-mlathe-digital/19468/
15         Press        http://www.profhacker.com/2010/01/09/academics-and-social-media-mla09-and-twitter/
15         Blog         http://academhack.outsidethetext.com/home/2010/the-mla-briancroxall-and-the-non-rise-of-the-digital-humanities/
15         Blog         http://www.samplereality.com/2010/01/02/the-mla-in-tweets/

URL Categories: #mla09 and #www2010

[Figure: pie charts of the categories of URLs from #mla09, counting all URLs (n=551) and counting unique URLs only (n=199). Blog is the largest category in both charts (229 of all URLs, 54 of unique URLs), followed by Press (123 and 34).]

[Figure: pie charts of the categories of URLs from #www2010, counting all URLs (n=1460) and counting unique URLs only (n=574). Counting all URLs, the largest categories are Blog (222), Conference (206), Error (201) and Other (169).]

Internal Citations: Retweets

Different ways to count retweets

Automatically detected RTs (number and percentage of RTs in entire conference dataset):
  #www2010: 1,121 (33.38% of 3,358)
  #mla09: 414 (21.46% of 1,929)

∅ RTs per twitterer (automatically detected RTs, entire conference dataset):
  #www2010: 1.24
  #mla09: 1.12

Manually detected RTs (number and percentage of RTs in entire conference dataset):
  #www2010: 1,318 (39.25% of 3,358)
  #mla09: 514 (26.65% of 1,929)
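The percentages reported for the two conference datasets follow directly from the counts. As a small worked check (the dictionary layout and function name are just illustrative choices; the numbers are the ones given above), the shares and the number of retweets that only manual coding found can be recomputed like this:

```python
# Counts taken from the retweet figures for the two conference datasets.
datasets = {
    "#www2010": {"tweets": 3358, "auto_rts": 1121, "manual_rts": 1318},
    "#mla09": {"tweets": 1929, "auto_rts": 414, "manual_rts": 514},
}


def share(part, total):
    """Percentage of `part` in `total`, rounded to two decimals."""
    return round(100 * part / total, 2)


for tag, d in datasets.items():
    auto_share = share(d["auto_rts"], d["tweets"])
    manual_share = share(d["manual_rts"], d["tweets"])
    missed = d["manual_rts"] - d["auto_rts"]  # RTs found only by manual coding
    print(f"{tag}: automatic {auto_share}%, manual {manual_share}%, "
          f"{missed} RTs missed by automatic detection")
```

The gap between the two counts illustrates that the way retweets are identified changes the result noticeably, even on the same dataset.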

Weller, K., Dröge, E., & Puschmann, C. (2011), as cited above.

Testdata 2 and 3

• Go to http://tiny.cc/testdata2 and/or http://tiny.cc/testdata3 (links will be deactivated after the course).
• Save the file to the desktop.
• No. 2: Import the CSV file into Excel.
• No. 3: Open in Excel.
• Explore!

YourTwapperkeeper

http://www.tweetarchivist.com/

http://www.tagsleuth.com/

“Homework”

Voluntary Homework

• Think about a case study you would be interested in (in the context of social media research). Prepare a research question you would like to answer.
• Alternatively, you can use an example topic tomorrow.

Activate TagSleuth Account

• Activate a free 3-day trial account for TagSleuth.
• Set up a collection that matches your selected topic.

Conclusions 2

Lessons learned

• In the context of social science research, it is not all about “big” data, but about new data which can enable new types of insights.
• New types of data also come with several challenges, e.g. concerning new methods.
• Get to know “your” platforms and their data as early as possible in the research process. Familiarizing yourself with platforms may take some time.
• Identify what is feasible or not in your domain of interest.
• If your ideal dataset is not accessible, think about proxies.

If you have time to read 3 papers…

• Almuhimedi, H., Wilson, S., Liu, B., Sadeh, N., & Acquisti, A. (2013). Tweets are forever: a large-scale quantitative analysis of deleted tweets (p. 897). ACM Press. http://doi.org/10.1145/2441776.2441878
• Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.
• Mahrt, M., & Scharkow, M. (2013). The value of big data in digital media research. Journal of Broadcasting & Electronic Media, 57(1), 20-33.