
Web Content Analysis

Workshop in Advanced Techniques for Political Communication Research: Web Content Analysis Session

Professor Rachel Gibson, University of Manchester

Aims of the Session

• Defining and understanding the Web as an object of study & challenges to analysing web content
• Defining the shift from Web 1.0 to Web 2.0 or ‘social media’
• Approaches to analysing web content, particularly in relation to election campaign research and party homepages
• Beyond websites? Examples of how web 2.0 campaigns are being studied – key research questions and methodologies

The Web as an object of study

• Inter-linkage
• Non-linearity
• Interactivity
• Multi-media
• Global reach
• Ephemerality

Mitra, A. and Cohen, E. (1999) ‘Analyzing the Web: Directions and Challenges’ in Jones, S. (ed.) Doing Internet Research. Sage.

From Web 1.0 to Web 2.0

Tim Berners-Lee: “Web 1.0 was all about connecting people. It was an interactive space, and I think Web 2.0 is of course a piece of jargon, nobody even knows what it means. If Web 2.0 for you is blogs and wikis, then that is people to people. But that was what the Web was supposed to be all along. And in fact you know, this ‘Web 2.0’, it means using the standards which have been produced by all these people working on Web 1.0.” (from Anderson, 2007: 5)

Web 1.0: The “read only” web

Web 2.0: The “read/write” web – ‘What is Web 2.0’, Tim O’Reilly (2005)

Web 2.0

• Technological definition
– Web as platform, supplanting the desktop PC
– Inter-operability: through the browser, access a wide range of programmes that fulfil a variety of online tasks, e.g. manage a home page, communicate with friends, share/publish pictures, receive news.

• Social understanding
– Web 2.0 is based around social networking activities; it relies on and is built through ‘social or participatory’ software.
– Users create, distribute and add value to online content through blogs, Facebook profiles, online video, wikis, tagging tools.
– ‘Produsers’ (Bruns, 2007) or ‘Prosumers’: the distinction between producers and users/consumers of content disappears.
– Hallmark of these applications is the way in which they devolve creative and classificatory power to ‘ordinary’ users.

Web 1.0 & Web 2.0 applications – ‘social media’

Web 1.0
• Websites
• Discussion forums/chat rooms/IM

Web 2.0 or ‘social media’
• Blogs
• Social Network Sites (Facebook)
• Online Video sharing/posting (YouTube)
• Online Photo sharing/posting (Flickr)
• Microblogging sites (Twitter): @, #, RT
• Online Link sharing/posting (Del.icio.us, Digg)

Methodologies used for analysing online campaigns (supply-side)

Adapted from offline – ‘digitized methods’ (Rogers, 2010)
Web 1.0
• Online interviews, elite surveys
• Online discourse analysis
• Online content/feature analysis
– ‘Classic’ or offline approaches
– ‘Web specific’ – mixed mode

Online specific – ‘digital methods’
Web 2.0
• Hyperlinking and webspheres – Voson, Issue Crawler, SocScibotOnline, NodeXL
• Blog analysis toolkits and search engines – Technorati, BAT (Blog Analysis Toolkit)
• Twitter and Facebook scrapers – Infoscape Lab, 140kit.com, Twapperkeeper, Discovertext
• YouTube scrapers – Tubemogul, Tubekit
• Wikipedia – Wikiscanner

Methodologies for analysing online campaigns (‘demand’ side)

User Focused (demand)
• Focus groups: Price and Capella (2001); Stromer-Galley and Foot (2002)
• Surveys: Bimber and Davis (2003); Gibson & McAllister (2007)
• Experiments: Iyengar (2002); Lupia & Philpott (2005)
• User statistics: Hitwise, Sitemeter, Alexa, Google Analytics
• Computer tracking devices: Phorm
• Online SNA software: NodeXL

Discourse analysis to study web content

• Focus of the analysis is on web as socio-cultural ‘texts’.

• Described and explored from a qualitative perspective to uncover underlying meaning and significance.

• Examples of studies:
– Markham (1998); Howard (2002); Hakken (1999); Hine (2000)

– Benoit and Benoit (2000) ‘The Virtual Campaign: Presidential Primary Websites in Campaign 2000’ American Communication Journal

– Warnick (1998) ‘Appearance or Reality? Political Parody on the Web in Campaign 96’ Critical Studies in Mass Communication

Web specific content/feature analysis

• Focus is on the web page/site as unit of analysis

• Structural-functional & quantitative: a systematic set of criteria is developed and applied to sites to yield largely quantitative measures of content, functionality, usability, and design features.
– Quantitative automated: Bauer and Scharl (2000)
– Quantitative semi-automated
  • Web 1.0
  • Web 1.5?
  • Action Centers
• Heuristic evaluation: Nielsen & Molich (1990) ‘Heuristic evaluation of user interfaces’; Collings and Pearce (2002); Yates (2006).

Bauer and Scharl (2000) Internet Research

Quantitative semi-automated

1. Web 1.0 static/fixed homepages
- Party and campaign sites: Gibson and Ward (2000; 2002); Foot & Schneider (2002)
- E-government field: see Baker (2009); Pina et al. (2009); Panopoulou et al. (2008); West (2007); Henriksson et al. (2006); Holzer and Kim (2005); Garcia et al. (2005) http://www.insidepolitics.org/egovt06us.pdf
2. Web 1.5?
- Updated web 1.0 schemes to incorporate web 2.0 elements.
- Party and campaign home pages: Gulati & Williams, 2006; Foot and Schneider, 2006
- NSM: Stein, 2009
3. ‘Action Centers’
- MyBO, Membersnet, MyConservatives, LibDemACT (Gibson, 2010; Lilleker and Jackson, 2010)

Gibson & Ward (2000) SSCORE

Gibson & Ward (2002) AJPS

Stein, L. 2009 NM&S
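A minimal sketch of how a feature-analysis scheme of the kind cited above can turn hand-coded site features into quantitative scores. The checklist items, function names and site labels are hypothetical illustrations, not the items used in any of the published schemes; the index is simply the share of checklist features present under each function.

```python
# Hypothetical checklist: feature names are illustrative only and are not
# taken from the published coding schemes cited in the slides.
CHECKLIST = {
    "information": ["manifesto", "news_archive", "candidate_profiles"],
    "participation": ["email_contact", "discussion_forum", "online_poll"],
    "mobilisation": ["donate", "join_online", "volunteer_signup"],
}


def score_site(observed):
    """Return, per function, the share of checklist features present (0 to 1)."""
    scores = {}
    for function, features in CHECKLIST.items():
        present = sum(1 for feature in features if feature in observed)
        scores[function] = present / len(features)
    return scores


# Hypothetical coded data: the set of features a coder found on each site.
coded_sites = {
    "party_a": {"manifesto", "news_archive", "email_contact", "donate"},
    "party_b": {"manifesto", "donate", "join_online", "volunteer_signup"},
}

for site, features in coded_sites.items():
    print(site, score_site(features))
```

Scoring each function separately, rather than producing one overall total, keeps sites comparable on the dimension of interest (e.g. information provision versus mobilisation).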

Gibson, R. 2010 APSA

Hyperlinking and webspheres

• Focus is on inter-relational/textual element of websites and the wider context. Unit of analysis is the web network rather than individual sites.

• A thematic set of websites is identified and ‘captured’/archived over time, and the linkage patterns are explored as well as the content.
– Method pioneered by Schneider and Foot (2004), ‘Web Sphere analysis’.
– They define a Web Sphere as a ‘hyperlinked set of dynamically defined digital resources spanning multiple Web sites and deemed relevant or related to a central theme or ‘object’.’ The borders of the sphere are defined in relation to the central theme/object and a given time period.
See Library of Congress, http://www.archive.org/index.php & WebArchivist.org http://webarchivist.org/

Hyperlink analysis

More sophisticated/automated methods have developed for academic research since around 2003.
– Adaptations of technology: the simplest form of online link analysis.
– Hindman et al. (2003) ‘Googlearchy’: pioneered the term ‘googlearchy’, referring to a ‘power law’ operating in regard to the prominence of sites.
– Adamic and Glance (2005) examined linkage between left- and right-wing blogs in the U.S. http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf
– Ackland and Gibson (2007): comparative analysis of political party system linkage, using the VOSON crawler (http://anu.voson.edu.au) to examine the outbound and inbound linkages among political parties in 6 different democracies. Hyperlinks = networked communication: inclusivity, identity, opponent dismissal, force multiplication. Tested via nos. of links to other parties, target of links by TLD, inter-linkage within party groups.
– For tools see http://www.issuecrawler.net ; http://www.touchgraph.com ; http://socscibot.wlv.ac.uk/
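Two of the link measures mentioned above (target of links by TLD, and inter-linkage within party groups) can be sketched from a list of crawled links. This is an illustrative sketch only: the hosts, the family labels and the function names are invented, it is not the VOSON implementation, and taking the last label of the hostname is a deliberately naive way to read the TLD.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical crawl output: (source URL, target URL) pairs.
links = [
    ("http://greens.example.org/", "http://greens.example.de/"),
    ("http://greens.example.org/", "http://labour.example.uk/"),
    ("http://labour.example.uk/", "http://greens.example.org/"),
    ("http://labour.example.uk/", "http://socdem.example.de/"),
]

# Hypothetical coding of each seed site into a party family.
family = {
    "greens.example.org": "Ecologist",
    "greens.example.de": "Ecologist",
    "labour.example.uk": "Left",
    "socdem.example.de": "Left",
}


def host(url):
    """Hostname of a URL, or an empty string if it cannot be parsed."""
    return urlparse(url).hostname or ""


def tld(url):
    """Naive TLD: the last dot-separated label of the hostname."""
    return host(url).rsplit(".", 1)[-1]


def outlinks_by_tld(pairs):
    """Count link targets by their top-level domain."""
    return Counter(tld(target) for _source, target in pairs)


def within_family_share(pairs, family):
    """Percentage of each family's outbound links that stay within the family."""
    totals, within = Counter(), Counter()
    for source, target in pairs:
        source_family = family[host(source)]
        totals[source_family] += 1
        if family.get(host(target)) == source_family:
            within[source_family] += 1
    return {f: round(100 * within[f] / totals[f]) for f in totals}


print(outlinks_by_tld(links))
print(within_family_share(links, family))
```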

Inter-linkage between Parties by ideological family

Table 4: Inter-linkage between political parties by party type

Columns: (1) Far Left; (2) Left; (3) Centre; (4) Right; (5) Far Right; (6) Ecol.; (7) Reg.; (8) Total no. of parties linking to other parties; (9) Total no. of links made to other parties; (10) % of links within party family

Party type     (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)  (10)
Ecologist        0     0     2     1     0     6     0     6    14    79
Far Left        11     7     0     2     1     3     1    12    71    46
Left             3     4     3     2     0     2     0     2    22    50
Centre           0     4     2     2     0     1     0     5    12    25
Right            1     3     1     8     1     2     1     9    20    55
Far Right        0     0     0     1     2     0     0     3     7    71
Regionalist      0     0     0     0     0     0     1     1     1   100
Total                                                     39   147

Note: The numbers in columns 1-7 are counts of parties (e.g. 11 far left parties linked to other far left parties). The same party can appear more than once in a given row in columns 1-7, which is why column 8 (showing the total number of parties that have linked to another seed site) is not equal to the sum of the preceding columns.

Beyond websites…?

Blog analysis – key studies

Types, Content, Structure
Sweetster-Trammel, Kaye D. 2007. ‘Candidate Campaign Blogs: Directly Reaching Out to the Youth Vote.’ The American Behavioral Scientist 50(9): 1255-1264.
Xifra, J. and A. Huertas. 2008. ‘Blogging PR: An Exploratory Analysis of Public Relations’ Public Relations Review 34: 265-275.
Herring, S. et al. 2005. ‘Weblogs as a bridging genre’ Information Technology and People 18(2): 142-171.

Influence
Karpf, D. 2008. ‘Measuring Influence in the Political Blogosphere’ Institute Politics and Technology Review, Institute for Politics, Democracy and the Internet: 33-41. http://www.the4dgroup.com/BAI/articles/PoliTechArticle.pdf
Etling, B. et al. 2010. ‘Mapping the Arabic blogosphere: politics and dissent online’ 12: 1225-1243.
Elmer, G.; Ryan, P.M.; Devereaux, Z.; Langlois, G.; Redden, J. and McKelvey, F. 2007. ‘Election Bloggers: Methods for Determining Political Influence’ First Monday 12(4).
Drezner and Farrell. 2008. ‘The Power and Politics of Blogs’ Public Choice 134: 15-30.

Beyond websites…?

Twitter analysis
• Boyd, D. et al. 2010. ‘Tweet Tweet Retweet’ Working paper. http://www.danah.org/papers/TweetTweetRetweet.pdf
• Tumasjan et al. 2010. ‘Predicting Elections with Twitter’ Association for the Advancement of Artificial Intelligence. Social Science Computer Review. http://ssc.sagepub.com/content/early/2010/09/24/0894439310386557.full.pdf+html
• Zuckerman, E. ‘Studying Twitter and the Moldovan Protests’ www.Ethanzuckermann.com/blog/
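Much Twitter analysis starts by pulling the platform's conventions (@ mentions, # hashtags, RT retweets, links) out of the raw tweet text. A minimal sketch, assuming plain-text tweets in the "RT @user: ..." retweet style; the patterns, the function name and the sample tweet are illustrative assumptions, not the method of any study or tool listed above.

```python
import re

# Simple, assumed patterns for Twitter conventions in plain tweet text.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"^RT @(\w+)")


def parse_tweet(text):
    """Extract retweet source, mentions, hashtags and links from one tweet."""
    urls = URL.findall(text)
    # Remove links first so that '#' or '@' inside a URL is not miscounted.
    body = URL.sub("", text)
    retweet = RETWEET.match(body)
    return {
        "retweet_of": retweet.group(1) if retweet else None,
        "mentions": MENTION.findall(body),
        "hashtags": HASHTAG.findall(body),
        "urls": urls,
    }


# Hypothetical tweet, invented for illustration.
sample = "RT @party_hq: Polls open at 7am #election2010 http://example.org/vote"
print(parse_tweet(sample))
```

Counting these fields across a set of tweets gives simple indicators such as who is retweeted most or which hashtags co-occur with a party's name.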

Beyond websites…?

YouTube analysis
• Shah, C. 2010. ‘Supporting Research Data Collection from YouTube with Tubekit’ Journal of Information Technology & Politics 7(2&3): 226-240.
• Wallsten, K. 2010. ‘“Yes We Can”: How Online Viewership, Blog Discussion, Campaign Statement, and Mainstream Media Coverage Produced a Viral Video Phenomenon.’ Journal of Information Technology & Politics 7(2&3): 163-181.
• Zink, M., Suh, K. and J. Kurose. 2009. ‘Characteristics of YouTube network traffic at a campus network’ Computer Networks 53: 501-514.
• Tian, Y. 2010. ‘Organ Donation on Web 2.0: Content and Audience Analysis of Organ Donation Videos on YouTube’ Health Communication 25: 238-246.

Online resources and tools for collecting and analysing web 2.0 content

Digital Tool Kits

Digital Methods Initiative (Richard Rogers) - https://wiki.digitalmethods.net/Dmi/ToolDatabase
• Natively Digital: The Link | The URL | The Tag | The Domain | The PageRank | The Robots.txt
• Device Centric: Google | Google Images | Google News | Google Blog Search | Yahoo | YouTube | Del.icio.us | Technorati | Wikipedia | Alexa | IssueCrawler | Twitter | Facebook

Infoscape Research Lab Tools (Greg Elmer) - www.infoscapelab.ca
- Blog Aggregator – measures activity levels over time and hyperlinks cited in blog posts
- YouTube & Twitter scraper – title, tags, links, date uploaded, author info, views per week
- Facebook Group scraper – title, type, number and members of group

Training / Research
• Exploring Online Research Methods Website: http://www.geog.le.ac.uk/ORM/site/home.htm
• Digital Methods Initiative Training Course: https://www.digitalmethods.net/Digitalmethods/WebHome
• http://nms.sagepub.com/
• http://jcmc.indiana.edu/
• http://www.connectedaction.net/
• http://www.methodspace.com/group/sageresearchmethodsonline

Concluding comments
• There is a growing number of options open to researchers seeking to capture and analyze campaigns and their influence on voters.
• Approaches have evolved as the Web itself has: from adaptation of ‘traditional’ methods suited to the analysis of static, point-to-mass ‘old’ media, to newer web-‘specific’ tools that allow for capture and analysis of a more dynamic ‘many to many’ medium.
• Network analysis tools are increasingly used, reflecting a shift in the understanding and practice of the Web as a dynamic social context (‘conversational’) rather than a fixed infrastructure.
• The notion of ‘content’ itself has changed fundamentally: user-generated rather than editor-controlled (although MSM still dominant in news).
• This has some significant implications for content analysis:
1. Linkage of the context to the content. To interpret the meaning of web content (Twitter feeds, Facebook profiles), locate it in the wider and ongoing stream of dialogue.
2. Locating content. Inter-operability of the various platforms means that content is shared across multiple sites, so we need to ‘follow’ the content.
3. Volume of content increases: capturing and archiving issues.
4. Quality of content changes. A new language or terminology is emerging: content = actions. Posts, embeds, widgets, tweets, email, views, followers, ‘befriending’ or ‘liking’ a politician/party. A new software language has grown up: @, #, RT, http. Automating/manual coding.

Short demo of a digital method

Getting started with NodeXL
• Manual: Hansen et al., Analyzing Social Media Networks with NodeXL
• Not focused on web content per se but on the structure or influence of that content. So, e.g., compare politicians’ Twitter networks, or a trending topic (import by #).
• Example: fake FB network, but you can import Pajek or UCINET datasets, or network data from Twitter, YouTube, email lists.
• ‘Vertices’ = nodes or ‘agents’; ‘edges’ = ties or relationships between nodes, either ‘undirected’ or ‘directed’.
• An ‘unweighted’ edge is the simplest: whether a relationship exists. A ‘weighted’ edge adds information (strength, frequency of the tie) and is drawn with darker/thicker lines.
• NodeXL works by setting up an ‘edge list’ which represents a network. This network can then be represented in graphical form, and from it you can calculate various measures of centrality of individuals or nodes in the network.
• Group function: can group types of individuals and calculate a collective weight.
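NodeXL itself is a spreadsheet add-in, but the edge-list idea can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not NodeXL's own calculation: the names, weights and group labels are invented, centrality is shown as simple in/out degree on a directed network, and ‘collective weight’ is interpreted here as the sum of the weights of edges sent by a group's members.

```python
from collections import defaultdict

# Hypothetical directed, weighted edge list: (source, target, weight).
edges = [
    ("alice", "bob", 3),
    ("alice", "carol", 1),
    ("bob", "carol", 2),
    ("dave", "carol", 1),
]

# Hypothetical grouping of vertices, e.g. by party.
groups = {
    "alice": "party_x",
    "bob": "party_x",
    "carol": "party_y",
    "dave": "party_y",
}


def degree_centrality(edge_list):
    """In-degree and out-degree of every vertex in a directed edge list."""
    in_degree, out_degree = defaultdict(int), defaultdict(int)
    vertices = set()
    for source, target, _weight in edge_list:
        out_degree[source] += 1
        in_degree[target] += 1
        vertices.update((source, target))
    return {v: {"in": in_degree[v], "out": out_degree[v]} for v in sorted(vertices)}


def group_weight(edge_list, groups):
    """Sum of outgoing edge weights per group (one assumed reading of 'collective weight')."""
    totals = defaultdict(int)
    for source, _target, weight in edge_list:
        totals[groups[source]] += weight
    return dict(totals)


print(degree_centrality(edges))
print(group_weight(edges, groups))
```

In this toy network the vertex with the highest in-degree is the one most often addressed by others, which is the kind of pattern the NodeXL graph would show visually.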