Identifying Ideological Perspectives of Web Videos Using Folksonomies
Total Page:16
File Type:pdf, Size:1020Kb
Identifying Ideological Perspectives of Web Videos Using Folksonomies Wei-Hao Lin and Alexander Hauptmann Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave Pittsburgh, PA 15213 USA +1-412-268-f3119,1448g fwhlin,[email protected] Abstract awareness of individual news broadcasters’ or video au- thors’ biases, and can encourage viewers to seek videos ex- We are developing a classifier that can automatically identify pressing contrasting viewpoints. Classifiers that can auto- a web video’s ideological perspective on a political or social matically identify a web video’s ideological perspective will issue (e.g., pro-life or pro-choice on the abortion issue). The problem has received little attention, possibly due to inherent enable video sharing sites to organize videos on various so- difficulties in content-based approaches. We propose to de- cial and political views according to their ideological per- velop such a classifier based on the pattern of tags emerging spectives, and allow users to subscribe to videos based on from folksonomies. The experimental results are positive and their personal views. Automatic perspective classifiers will encouraging. also enable content control or web filtering software to filter out videos expressing extreme political, social or religious views that may not be suitable for children. 1 Introduction Although researchers have made great advances in au- Video sharing websites such as YouTube, Metacafe, and tomatically detecting “visual concepts” (e.g., car, outdoor, Imeem have been extremely popular among Internet users. and people walking) (Naphade & Smith 2004), developing More than three quarters of Internet users in the United classifiers that can automatically identify whether a video is States have watched video online. In a single month in about Catholic or abortion is still a very long-term research 2008, 78.5 million Internet users watch 3.25 billion videos goal. The difficulties inherent in content-based approaches on YouTube. On average, YouTube viewers spend more than may explain why the problem of automatically identifying a one hundred minutes a month watching videos on YouTube video’s ideological perspective on an issue has received little (comScore 2008). attention. Video sharing websites have also become an important • In this paper we propose to identify a web video’s ideo- platform for expressing and communicating different views logical perspective on political and social issues using as- on various social and political issues. In 2008, CNN and sociated tags. In previous work we have shown that indi- YouTube held United States presidential debates in which vidual news broadcasters’ biases can be reliably identified presidential candidates answered questions that were asked based on a large number of visual concepts (Lin & Haupt- and uploaded by YouTube users. In March 2008 YouTube mann 2008). This paper complements our previous work launched YouChoose’08 1 in which each presidential candi- by showing that ideological perspectives are not only re- date has their own channel. The accumulative viewership for flected in the selection of visual concepts, but also in tags one presidential candidate as of June 2008 has exceeded 50 describing the content of videos. millions (techPresident 2008). In addition to politics, many users have authored and uploaded videos expressing their Videos on video sharing sites such as YouTube allow views on social issues. For example, Figure 1 is an exam- users to attach tags to categorize and organize videos. ple of a “pro-life” web video on the abortion issue2, while The practice of collaboratively organizing content by tags Figure 2 is an example of “pro-choice” web video3. is called folksonomy, or collaborative tagging. In Sec- We are developing a computer system that can automat- tion 3.3 we show that a unique pattern of tags emerges ically identify highly biased broadcast television news and from videos expressing opinions on political and social web videos. Such a system may increase an audience’s issues. Copyright c 2008, Association for the Advancement of Artificial • In Section 2 we apply a statistical model to capture the Intelligence (www.aaai.org). All rights reserved. pattern of tags from a collection of web videos and asso- 1http://youtube.com/youchoose ciated tags. The statistical model simultaneously captures 2http://www.youtube.com/watch?v= two factors that account for the frequency of a tag associ- TddCILTWNr8 ated with a web video: what is the subject matter of a web 3http://www.youtube.com/watch?v= video? and what ideological perspective does the video’s oWeXOjsv58c author take on an issue? Figure 1: The key frames of a web video expressing a “pro-life” view on the abortion issue, which is tagged with prayer, pro-life, and God. Figure 2: The key frames of a web video expressing a “pro-choice” view on the abortion issue, which is tagged with pro, choice, feminism, abortion, women, rights, truth, Bush. • We evaluate the idea of using associated tags to classify two sets of weights combined. a web video’s ideological perspective on an issue in Sec- tion 3. The experimental results in Section 3.2 are very encouraging, suggesting that Internet users holding simi- lar ideological beliefs upload, share, and tag web videos similarly. V2 abortion 2 Joint Topic and Perspective Model choice T We apply a statistical model to capture how web videos V1 expressing strongly a particular ideological perspective are tagged. The statistical model, called the Joint Topic and Per- spective Model (Lin, Xing, & Hauptmann 2008), is designed life to capture an emphatic pattern empirically observed in many ideological texts (editorials, debate transcripts) and videos Figure 3: A three tag simplex illustrates the main idea be- (broadcast news videos). We hypothesize that the tags asso- hind the Joint Topic and Perspective Model. T denotes the ciated with web videos on various political and social issues proportion of the three tags (i.e., topical weights) that are also follow the same emphatic pattern. chosen for a particular issue (e.g., abortion). V1 denotes The emphatic pattern consists of two factors that gov- the proportion of the three tags after the topical weights are ern the content of ideological discourse: topical and ide- modulated by video authors holding the “pro-life” view; V2 ological. For example, in the videos on the abortion is- denotes the proportion of the three tags modulated by video sue, tags such as abortion and pregnancy are expected authors holding the contrasting “pro-choice” view. to occur frequently no matter what ideological perspective a web video’s author takes on the abortion issue. These We illustrate the main idea of the Joint Topic and Per- tags are called topical, capturing what an issue is about. spective Model in a three tag world in Figure 3. Any point In contrast, the occurrences of tags such as pro-life in the three tag simplex represents the proportion of three and pro-choice vary much depend on a video author’s tags (e.g., abortion, life, and choice) chosen in web view on the abortion issue. These tags are emphasized (i.e., videos about the abortion issue (also known as a multino- tagged more frequently) on one side and de-emphasized mial distribution’s parameter). T represents how likely we (i.e., tagged less frequently) on the other side. These tags would be to see abortion, life, and choice in web are called ideological. videos about the abortion issue. Suppose a group of web The Joint Topic and Perspective Model assigns topical video authors holding the “pro-life” perspective choose to and ideological weights to each tag. The topical weight of a produce and tag more life and fewer choice. The ide- tag captures how frequently the tag is chosen because of an ological weights associated with this “pro-life” group in ef- issue. The ideological weight of a tag represents to what de- fect move the proportion from T to V1. When we sample gree the tag is emphasized by a video author’s ideology on tags from a multinomial distribution of a parameter at V1, an issue. The Joint Topic and Perspective Model assumes we would see more life and fewer choice tags. In con- that the observed frequency of a tag is governed by these trast, suppose a group of web video authors holding the “pro- choice” perspective choose to make and tag more choice µ and fewer life. The ideological weights associated with τ τ this “pro-choice” group in effect move the proportion from Στ T to V2. When we sample tags from a multinomial distri- π Pd Wd,n βv bution of a parameter at V2, we would see more life and V µφ fewer choice tags. The topical weights determine the po- Nd φv D V sition of T in a simplex, and each ideological perspective Σφ moves T to a biased position according to its ideological weights. We can fit the Joint Topic and Perspective Model on data Figure 4: A Joint Topic and Perspective model in a graphical to simultaneously uncover topical and ideological weights. model representation. A dashed line denotes a deterministic These weights succinctly summarize the emphatic patterns relation between parents and children nodes. of tags associated with web videos about an issue. More- over, we can apply the weights learned from training videos, and predict the ideological perspective of a new web video weights under the Joint Topic and Perspective model is based on associated tags. P (τ; fφvgjfWd;ng; fPdg; Θ) D 2.1 Model Specification and Predicting Y Y Ideological Perspectives /P (τjµτ ; Στ ) P (φvjµφ; Σφ) P (Pdjπ) v d=1 Formally, the Joint Topic and Perspective Model assumes Nd the following generative process for the tags associated with Y P (Wd;njPd; τ; fφvg) web videos: n=1 Y Y Pd ∼ Bernoulli(π); d = 1;:::;D = N(τjµτ ; Στ ) N(φvjµφ; Σφ) Bernoulli(Pdjπ) W jP = v ∼ Multinomial(β ); n = 1;:::;N v d d;n d v d Y w w Multinomial(W jP ; β); w exp(τ × φv ) d;n d βv =P w0 w0 ; v = 1;:::;V n w0 exp(τ × φv ) where N(·), Bernoulli(·) and Multinomial(·) are the prob- τ ∼ N(µτ ; Στ ) ability density functions of multivariate normal, Bernoulli, ∼ φv N(µφ; Σφ): and multinomial distributions, respectively.