Who Produced This Video, Amateur Or Professional?
Total Page:16
File Type:pdf, Size:1020Kb
Who Produced This Video, Amateur or Professional? Jinlin Guo, Cathal Gurrin Songyang Lao CLARITY and Computing School of Information System School, Dublin City University and Management Glasnevin, Dublin 9, Dublin, NUDT, Changsha, Hunan, Ireland China {jinlin.guo, cgur- [email protected] rin}@computing.dcu.ie ABSTRACT 1. INTRODUCTION As the increasing affordability for capturing and storing video Video content has been historically created and supplied and the proliferation of Web 2.0 applications, video content by a limited number of production companies, TV networks, is no longer necessarily created and supplied by a limited and cable networks. They are produced by professionals and number of professional producers; any amateur can produce consumed by general public. The video quality was in gen- and publish his/her video quickly. Therefore, the amount eral guaranteed since it was recorded by professional captur- of both professional-produced as well as amateur-produced ing device and subject to careful post-produced according to video on the web is ever increasing. In this work, we pro- certain cinematic principles. pose a question; whether we can automatically classify an Nowadays, the increasing affordability for capturing and Internet video clip as being either professional-produced or storing video has resulted in a massive amount of personal amateur-produced? Hence, we investigate features and clas- video content, and the proliferation of Web 2.0 applications sification methods to answer this question. Based on the is re-shaping the video consumption model. Especially, the differences in the production processes of these two video rise of some easy-to-use social networking websites such as 1 categories, four features including camera motion, structure, YouTube , makes it easy for users uploading, managing, audio feature and combined feature are adopted and studied sharing video. Therefore, the users on the Internet are along with with four popular classifiers KNN, SVM GMM no longer only video consumers, but also participators and 2 and C4.5. Extensive experiments over carefully-constructed, producers, just as the slogan of Tudou , one of the most representative datasets, evaluate these features and classi- popular video sharing websites, “Everyone is the director of fiers under different settings and compare to existing tech- life”. Now, hundreds of millions of Internet users are self- niques. Experimental results demonstrate that SVMs with publishing consumers. This results in an explosive increase multimodal features from multi-sources are more effective at in the quantity of Internet video. Recent statistics show classifying video type. Finally, for answering the proposed that, on the the primary video sharing website YouTube, 48 question, results also show that automatically classifying hours of video are uploaded every minute by users, resulting a clip as professional-produced video or amateur-produced in nearly 8 years of content uploaded every day, and more video can be achieved with good accuracy. video is uploaded to YouTube in one month than the 3 major US networks created in 60 years3. The video on the Internet, Categories and Subject Descriptors we call them user-uploaded video, may be either produced by amateur or professional based on its original producer I.4 [Image Processing and Computer Vision]: Miscel- (the user uploaded the video may be the authors of the laneous; I.5.4 [Pattern Recognition]: Computer Vision— video or not). Hence, the user-uploaded video can be catego- Applications rized into amateur-produced video (APV) and professional- produced video (PPV) based on the author type. General Terms We define an APV clip as being recorded by an amateur Experimentation, Performance without much knowledge in producing video and generally using personal video capture devices, then uploaded to web- sites by the user (it may be the amateur or not) with little Keywords post-production. By contrast, the PPV is captured by pro- Amateur-produced Video, Professional-produced Video, Cam- fessional devices and edited based on certain cinematic prin- era Motion, Video genre Classification ciples, such as news, sports and movies. Note that, a number of Internet video clips are made by extracting/ripping con- tent from PPV such as TV programs, DVD movies, and then uploaded (sometimes even with some captions and back- Permission to make digital or hard copies of all or part of this work for ground music added). In this work, we still consider them personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific 1http://www.youtube.com permission and/or a fee. 2 ICMR’13, http://www.tudou.com/ April 16–20, 2013, Dallas, Texas, USA. 3 Copyright 2013 ACM 978-1-4503-2033-7/13/04 ...$15.00 . http://www.youtube.com/t/press_statistics 271 News Stream A Tennis Match Video Story Unit Game Shot Set Frame Loose Serve Image View Advertisement (a) A structural method for broadcasted news video (b) A structural method for a tennis match video Figure 1: Two structural methods for broadcasted news video and tennis match video respectively as PPV. Therefore, compared with PPV, the APV has the 2. RELATED WORK following characteristics [7]. Most relevant work in our context focuses on video genre classification. Some researchers have focused on classify- • A great number of APV clips can be found on the Web. ing segments of video, such as identifying violent [10]or Since APV requires less production efforts, anyone can scary [18] scenes in a movie. However, most of the video readily make short video clips by using a camcoder classification work attempts to classify an entire video clip or even a smartphone. Easy of production creates a into one of several genres, such as sports, news, cartoon, massive amount of APV content. music. In general, the previous methods can be categorized • Due to the uncontrolled capturing conditions and ac- into four types: text-based approaches [1, 23], audio fea- companied personal capture devices, APV is most of ture based approaches [5, 15, 17], visual feature based ap- the time of lower quality than PPV. proaches [4, 20, 22], and those that used some combination of text, audio and visual features[3, 4, 8]. In fact, most authors • The APV is usually less structured: APV is not as incorporated audio and visual features into their approaches well structured as PPV. The PPV clips are consumed (we call it content-based approaches), and these approaches by general public. They are produced following certain achieved good performance. Here we will give a brief review cinematic principles. The structure is understood such about the content-based approaches. Extensive surveys of as the structures of news video and tennis video shown these techniques can be referred in [2, 12]. in Fig. 1. However, APV is usually captured by differ- The combination of audio and visual low-level features ent amateurs, who generally do not follow professional attempts to incorporate the audio and visual aspects that guidelines and best practice when producing video. In these features represent and complement each other. Audio most cases, there is no post-production before upload- features can be derived from either the time domain or the ing to the video sharing websites. Therefore, there is frequency domain. Time domain features such as the root less structure information existing in the APV. mean square of signal energy (RMS), Zero-Crossing Rate (ZCR) and frequency domain feature MFCC are commonly The question raised here is whether we can automatically used in previous work [3, 12]. determine that a video clip was created by an amateur or Visual features in general include motion features [11], professional? That is, can we classify an Internet video clip keyframe image features such as Scale Invariant Feature into PPV and APV automatically? This is helpful, for ex- Transform (SIFT) [22]andcolorortexture[3, 4, 8], struc- ample, when generating ranked lists in response to a user ture features, such as average shot length, gradual transition query or when categorizing results. For example, a user and cut shot ratios [8, 12, 21], identification of some simple searches for a concert video clip and prefers the clips pub- objects [20], with research focussed on how to combine these lished by official producer, rather than by the audience in features. the scene. In this work, we investigate features and tech- Ways of using these features investigated in existed work niques for answering this question. Our approach is based include many of the standard classifiers because of their on the differences that are inherent in the production pro- ubiquitous nature, such as KNN [8, 22], Linear Discriminant cesses of these two video categories. Multimodal features in- Analysis (LDA) [8], SVM [3, 4, 5, 8, 16, 21], C4.5 decision cluding camera motion, structure information, audio feature tree [4, 9], GMM [11, 12, 17, 20]. Moreover, some more and combined feature together with four popular classifiers complicated methods such as HMM [4, 19, 20] and neural are adopted and also evaluated within different experiment networks [12] were also introduced to video genre classifica- settings. Furthermore, with the goal of comparitive evalua- tion. tion, two representative state-of-the-art frameworks are also It should be noted that two evaluations strongly promoted implemented to classify APV and PPV. the research on Internet video genre classification. The first The paper is structured as follows. In section 2, we briefly one is set out by Google as an ACM Multimedia Grand Chal- review the related work. In section 3, we detail the multi- lenge task in 2009 (also in 2010) 4.