Automatic Detection of TV Commercial Blocks in Broadcast TV Content

Alexandre Ferreira Gomes

Abstract – This paper describes in detail an algorithm proposed for detecting TV Commercial Blocks in Broadcast TV Content, based on the presence or absence, on the screen, of a TV channel logo. No pre-built database is required, as the proposed solution sets up its own collection of logos and takes into account the different types of logos that are broadcast in a regular TV transmission. By distinguishing a TV channel logo from a commercial brand logo, a final classification is assigned to each video shot, differentiating regular programs from commercial blocks. For the test video dataset used, which resulted from recordings of three different Portuguese TV channels, a minimum accuracy of 93.9% on commercials detection was achieved; furthermore, the measured and reported processing time suggests that the proposed solution could enable real-time (i.e., while recording) detection of commercial blocks.

Index Terms – TV; commercial blocks; shot detection; Digital on-Screen Graphics; logos detection; video processing.

I. INTRODUCTION

This paper focuses on the development of an algorithm that aims to detect TV Commercial Blocks in Broadcast TV, based on the presence or absence, on the screen, of a TV channel logo. As the global economy evolves, companies need to improve their marketing solutions in order to get some advantage over competitors; TV advertising commercials have emerged as an essential tool for achieving this goal. TV is an important publicity space for companies, and the visibility achieved by using this remarkable communication medium is something most companies fight for. Also, as advertising intends to capture viewers' attention, there is a critical artistic component involved, notwithstanding its fundamental marketing objective. Curiously, there are two sides of the same coin in this business. On the one hand, the advertisers want to check whether their contracts with the broadcasters have been fulfilled, i.e., guaranteeing clauses like "which", "when" and "how many times" some commercials shall be broadcast. On the other hand, the viewers typically wish to eliminate the transmitted commercials from their recorded TV programs or even from real-time programs.

From the video content point of view, TV commercials are a special type of content. Their characteristics can be considered as intrinsic, if associated with the advertising content itself, or extrinsic, if external to the advertising, and have been used to build several solutions. However, some of the published work is dated because some of the assumed commercial characteristics are no longer valid. In this context, the main objective of this Thesis is to design, implement and assess an improved solution for the detection of commercials when operating with current TV commercial content.

This shall be done by implementing a mechanism able to correctly detect the beginning and the end of each commercial block present in the provided TV content and to generate an events report identifying all the detected commercials and the respective occurrence times. Section II characterizes TV commercials and Section III presents an overview of some proposed methods. Section IV discusses in detail the algorithm developed, with its architecture and an explanation of each module. The assessment methodology, conditions and metrics are then discussed, followed by the results obtained for the algorithm.

II. TV COMMERCIALS CHARACTERIZATION

In this Section the main characteristics of TV commercials are presented.

A. Legal Framework

In 2010, the European Parliament and the Council of the European Union (EU) established a Directive [1] gathering a set of rules concerning TV broadcasting activities. The relevant points are the required insertion of some video and/or audio elements to distinguish TV advertising from editorial content, the imposed limit of 20% of TV advertising spots per hour and the obligation of keeping the audio volume the same as in the remaining programs.

B. Typical Structure of a Commercial Block

The typical structure of a Commercial Block is composed of the following elements: i) Initial commercial block separator, containing the word "Publicidade" (the Portuguese word for "Advertising"); ii) TV Commercials; iii) Broadcaster self-promotion; iv) Institutional commercials (specific to the public TV channels); and v) Final commercial block separator.

In Portuguese television nearly 100% of the commercials last between 5 and 60 seconds. The exceptions have a duration of 120 seconds.

C. Intrinsic characteristics of TV commercials

Intrinsic characteristics are those specifically related to the process of making a commercial, notably its content elements, in which several advertising and marketing techniques are applied. Some features used to attract the viewers' attention can be analyzed and used to detect the presence of commercials; for this, well-defined mathematical features with a high distinguishing power from regular programs are measured. Examples of intrinsic characteristics are: i) high scene cut rates (including hard cuts, fades and dissolves); ii) considerable text presence (providing some key information in a clear way and in a short time is a major goal); iii) audio jingles and background music; iv) the audio level (which is usually higher than that of regular content, even though the legal framework presented in Section II.A no longer allows this difference in EU countries).

D. Extrinsic characteristics of TV Commercials

The extrinsic characteristics of a commercial are those related neither to the commercial's message and content nor to the advertising techniques themselves. These characteristics are normally related to the structure and composition of the commercial block. Examples of extrinsic characteristics are: i) commercial block separator (a short audiovisual sequence that introduces or finalizes a commercial block and is mandatory according to the European directives); ii) the presence or absence of the channel logo in one of the screen corners (as the channel logo is typically suppressed during the commercial block); iii) black frames and silence (a classical hint used to detect the limits of a single commercial, as such frames are sometimes inserted at the beginning and at the end of each commercial); iv) time duration (a feature that is difficult to guess a priori, as multiple values have been adopted, though most commercials in the Portuguese TV channels have a duration in the range of 5 to 60 seconds); v) commercials repetition (a single TV commercial may be broadcast several times in a single commercial block, during a day, a week or a month, meaning that, in a broadcast video stream with enough temporal duration, any TV commercial is inevitably repeated).

III. OVERVIEW OF TV COMMERCIALS DETECTION SCHEMES

In this Section, the most relevant solutions in the literature targeting the detection of TV commercials are presented. There are two main approaches, knowledge-based detection and repetition-based detection, which are presented and discussed in the following.

A. Knowledge-based Detection

The knowledge-based schemes for the detection of TV commercials are those based on the a priori knowledge of specific characteristics. In practice, these methods tend to use both intrinsic and extrinsic characteristics simultaneously. Several combinations of characteristics have been exploited with appropriately designed and tested algorithms, as presented in the following.

1) The First Steps – Black Frames and Silence

Lienhart et al. [2] released one of the most important works in the area of the detection of TV commercials. The algorithm makes a set of assumptions based on observations from German TV and considers both intrinsic and extrinsic types of features. The most intuitive starting point chosen in [2] is the detection of dark monochrome frames (black frames, BFs); this algorithm was tested with seven different German TV video sequences, including the respective commercial blocks. The authors concluded that 99.98% of the BF sequences identified as potentially belonging to commercial blocks were indeed part of a commercial block. However, about 15% of the overall commercial blocks length was missed, notably the commercial block introduction, broadcaster self-promotions, previews and the first and last commercials, because these elements were not separated with BFs. Sadlier et al. [3][4] developed a different method to detect BFs. This solution includes two main stages, the first associated with BF detection and the second corresponding to silence detection. Using this method, Sadlier et al. obtained a precision of 100% and a recall of 89.3%.

2) Going deeper – Cut Rates

In [2] Lienhart et al. improved the detection performance of their first solution by also detecting the presence of hard cuts and fades. Chen et al. [5] and Colombo et al. [6] propose similar solutions, although the method used to detect transitions is not the same. Colombo et al. [6] and Feng and Neuman [7] introduce the detection of dissolves, the most difficult video transition case to deal with. In [2], some results are presented based on the "Cuts per Minute" feature. A false positive detection of commercials of 0.09% and a detection of about 96.14% of the total commercials in the video are reported.

3) Motion Analysis

Some authors [2][6][7] have noted that a commercial can be distinguished from other video content by comparing not only the cut rate but also the action level within each shot, which is generally higher for advertising content. For motion analysis, the most referred feature is the Edge Change Ratio (ECR), proposed by Zabih et al. [8].

4) Logo Detection

The absence of TV channel logos during the commercial blocks is another characteristic that can be exploited for the automatic detection of commercials, and several methods have been developed with interesting results. In [9] Glasberg et al. propose a Static Area Descriptor for logo detection. The tests performed by Glasberg et al. were able to detect 90 out of 98 commercials; the algorithm failed with commercials containing the company logo on the screen. In [10], Albiol et al. propose a Logo Mask Extraction scheme based on the idea that a logo exists if there is, in the image, an area with stable contours. Esen et al. [11], Mikhail et al. [12] and Ozay et al. [13] present further work related to automatic TV logo detection and recognition.

5) Others

Some works rely on background music and speech analysis [14][15][16], text detection [17][18] and still images detection [15].

B. Repetition-based Detection

The repetition-based detection methods rely on the fact that each TV commercial is an individual piece of video stream that is (or may be) repeated several times over a certain period of time. Some works [2][19][20][21] have been developed with good results. For instance, Li et al. [20] report a recall of 99.2% and 99.5% and a precision of 96.8% and 97.4% for two 48-hour clips. Compared to knowledge-based detection methods, repetition-based methods are much more computationally expensive.

IV. PROPOSED SOLUTION: ARCHITECTURE AND ALGORITHMS

This section describes the solution designed and implemented to detect commercial blocks in TV content, using only the visual data; its global architecture as well as the functional and algorithmic descriptions of its main building blocks are presented and the rationale behind them is highlighted.

A. Learning with Real TV Content

To get a good insight into the current characteristics of commercial blocks, several video segments (from different channels and with different content) were recorded and observed by the author. The exhaustive analysis of this content revealed that some of the assumptions accepted in the past about TV commercials, such as the use of black frames and silences, high cut rates, and audio level and structure analysis, are no longer reasonable. It was also noticed that the logo-based commercial detection solutions available in the literature are in general too simplistic, as they do not consider the fact that not all logos observed on the TV screen are TV channel logos; nowadays, other logos can be found in the screen corners with many different purposes. There is a recurrently used graphical element known as "Digital on-Screen Graphic" (DoG), also named "bug" in some countries. DoGs are typically placed in a given corner of the screen for the entirety of a program and were created as a way to brand some TV shows with an identity [22][23]; however, their application has been largely extended and DoGs are now also used for commercial purposes and services. Figure 1 and Figure 2 present two examples of DoG usage.

Figure 1 - Screenshot extracted from a Portuguese TV commercial: the commercial brand logo appears in the upper right corner during the whole commercial.

Figure 2 - Screenshot from a Portuguese TV news program: in the upper left corner, the TV channel logo; in the lower right corner, the news program logo, the current time and live traffic information.

B. Characterizing TV Channel Logos

TV channel logos may differ in terms of opacity, stillness, shape, animation, color and location. All these possible characterization dimensions are summarized in Table 1.

Table 1 – Commonly observed characteristics in TV logos.

Characteristic               Observed values
Opacity                      Opaque; Semi-transparent; Transparent
Stillness (Shape/Texture)    Static; Dynamic
Shape                        Numbers; Letters; Polygonal; Irregular; Circular
# Colors                     Single; Multiple
Location                     All corners

The enormous diversity of logos hinders the development of a single solution to detect all of them. In fact, even opaque and static logos, which are the easiest to detect, can be hard to find when the background is very textured for a long time (see Figure 3 (a)), or when the contrast between the logo and the background is not sufficient to allow their discrimination, even visually by a human (see Figure 3 (b)).

Figure 3 – Difficult logo examples: (a) Highly textured background; (b) Background with the same color as the logo.

Among the selected characteristics, the most critical ones are opacity and stillness. If a logo is not totally opaque, the problems presented in Figure 3 (a) and (b) are even more critical, as the difficulty in discriminating the logo from the background increases. With respect to stillness, which can be analyzed in terms of shape and texture, the challenge is to develop a method that is robust to texture changes and simultaneously able to correctly identify a logo whose shape is dynamic. For all these reasons, TV logo detection is not a trivial task. However, even for the most problematic cases, the number of possible visual effects is not infinite. Another related issue is that TV channels change their logo more often than one may expect. These changes may be temporary (e.g., during special events, national holidays, specific moments of the year or some days before and after the channel's anniversary) or permanent (e.g., for rebranding purposes), as recently happened with the logo of the Portuguese public TV channel RTP1 (see Figure 4).

Figure 4 - RTP1 logos: the old and the new.

C. Proposed System Architecture

This section presents a high-level description of the proposed algorithm for commercial blocks detection, together with the motivation for the main design options.


1) Designing the system

The good design and high performance of the targeted commercial blocks detection solution depend on the appropriate consideration of the main characteristics of the commercial blocks. The main design considerations are the following:

1. Video segments processing - The first important observation is that if a video is divided into small temporal segments (i.e., sets of frames or frame windows) in which all the frames are consistent in terms of spatial content, then those frames should be classified in the same way (i.e., all of them are part of a commercial block, or none is). In particular, those temporal segments may coincide with "video shots", where a video shot may be defined as a sequence of frames running for an uninterrupted time period and filmed by the same camera.

2. DoG presence diagnostic - Each video segment passes through a set of procedures aiming to detect potential DoGs. The designed solution is not focused on any specific type of DoG; it intends to cover all the possible cases. The rationale behind the designed solution takes into account the following simple assumptions about a DoG: i) it is present in one of the screen corners; and ii) it is quasi-stable in terms of edges (including inner and outer limits) and color. A low-level analysis is performed for each video segment in order to reach a conclusion about the existence of one or more DoGs on the screen.

3. DoGs Database (DDB) - During a long period of TV broadcasting, many DoGs may occur, corresponding to both TV channel logos and non-TV channel logos. This observation highlights a major issue to take into account when designing an efficient TV channel logo detection solution: the need to store, organize and manage the detected DoGs and also to differentiate them. This is the reason for creating a "DoGs Database" (DDB), which is empty when the algorithm starts running and shall be continuously updated and checked in order to improve and speed up the whole commercial blocks detection process. The DoGs Database shall also contain all the relevant information about each DoG it stores. This information includes not only color and edge data, but also timing information (e.g., the time that a DoG was on air) and the DoG classification, i.e., TV channel logo or non-TV channel logo.

2) Architecture walkthrough

The proposed Global System (GS) is shown in Figure 5; the overall process includes the following main steps:

Figure 5 – Global System Architecture.

1. Shot Change Detection and Segmentation (SCD) – The input video is fragmented into video segments (or time windows) characterized by similar spatial content (i.e., shots), to be analyzed in the subsequent modules.

2. DoG Acquisition Algorithm (DoGA) – Once a shot is detected, its edge and color characteristics are analyzed in order to detect and characterize possible static areas (DoGs), following two main procedures:
a. Video Segment Edges & Color Analysis – Each video segment is processed in terms of edges and color, but only in the screen regions that are likely to contain DoGs, i.e., the four screen corners.
b. DoG Detection – The possible existence of a DoG in the video segment is evaluated; if a DoG is detected, all the relevant information about it is extracted to characterize it.

3. DoGs Database Updating & DoG Type Decision – When a DoG is detected, some procedures and comparisons with the DoGs database (which is empty when the application is launched) are performed, so that the application can distinguish between TV channel logos and non-TV channel logos and correctly classify each segment as commercial or regular program. The DoGs database updating/management is a key process, as it is responsible for classifying a DoG as a TV channel logo or a non-TV channel logo, which is the major decision to take.

4. Video Segment Classification – The final classification of the segment under analysis, as Commercial Block or Regular Program, is made based on the type of the detected DoG; a report is continuously produced with the classification results.

A. Shot Change Detection and Segmentation

Figure 6 presents the flowchart of the module responsible for video shot detection and segmentation.
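To make the segmentation step concrete, the following minimal sketch splits a frame sequence into segments by comparing the luminance histograms of consecutive frames with a Chi-Square distance. It is an illustration under stated assumptions, not the implemented module: the function names, the 16-bin histogram, the particular Chi-Square variant, the fixed `threshold` (the proposed system uses an adaptive one) and the `max_len` cap are all hypothetical choices.

```python
def luminance_histogram(frame, bins=16):
    """Histogram of 8-bit luminance values; `frame` is a flat list of ints in 0..255."""
    hist = [0] * bins
    for y in frame:
        hist[y * bins // 256] += 1
    return hist


def chi_square_distance(h1, h2):
    """Symmetric Chi-Square distance between two histograms (one common variant)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def segment_frames(frames, threshold, max_len):
    """Return the index of the first frame of each segment.

    A new segment starts on a hard cut (histogram distance above `threshold`)
    or when the current segment already holds `max_len` frames (forced split).
    """
    starts = [0]
    prev = luminance_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = luminance_histogram(frames[i])
        hard_cut = chi_square_distance(prev, cur) > threshold
        too_long = i - starts[-1] >= max_len
        if hard_cut or too_long:
            starts.append(i)
        prev = cur
    return starts
```

For example, three dark frames followed by three bright frames yield a single cut at the fourth frame when `max_len` is large, and additional forced splits when `max_len` is small.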

Figure 6 - Shot Change Detection and Segmentation module flowchart.

When designing the algorithm responsible for detecting shot changes, two observations were essential: (i) hard cuts are the most common type of video shot transition; (ii) transitions from Regular Programs to Commercial Blocks (and vice versa) are typically implemented using hard cuts. Thus, detecting this type of transition becomes a priority. In this work, the YCrCb color space was used for representing the video data, in order to separate luminance and chrominance components. The Luminance Histogram Operations are two simple procedures applied over the input luminance frames: the Luminance Frame Histogram Computation and the Luminance Histogram Distance Computation. The full sequence of steps is the following:

1. Luminance Frame Histogram Computation - Computes the luminance histogram of each input video frame.

2. Luminance Histogram Distance Computation – Applied to each pair of consecutive frames, i and i-1, in order to evaluate how similar they are. The Chi-Square Distance was the metric chosen to compute the distance between two histograms.

3. Adaptive Threshold Computation - For each temporal window comprising WF frames, the mean value of the Chi-Square Distance between consecutive frames inside the current window is computed. Then, the adaptive threshold (adaptTh) is obtained from ThWin, where ThWin depends on this mean value, according to (1):

[Equation (1): ThWin as a function of the window's Chi-Square mean]

As frame windows with different Chi-Square mean values have specific characteristics and represent different content, they must be dealt with differently. That is the reason why the algorithm ignores the windows for certain values of the Chi-Square mean and, for the others, assigns different factors, TpA and TpB, according to that mean.

4. Hard Cut Decision - Once the adaptive threshold has been computed, the final step is to compare the Chi-Square Distance value of each pair of frames inside the window under analysis with adaptTh, to decide about the existence of a hard cut; if the Chi-Square Distance is above adaptTh, there is a hard cut.

5. Forced Segmentation - Simultaneously with the hard cut detection procedure, a mechanism runs in order to avoid processing segments considered too long. If the number of frames elapsed since the last cut reaches the maximum acceptable length for a video segment, a new segment is forced.

B. DoGs Acquisition Algorithm

The DoG Acquisition algorithm refers to all the algorithmic operations performed with the aim of detecting and characterizing DoGs in the TV content. Those tasks are associated with two main functions: Video Segment Edges & Color Analysis (see Figure 7) and DoG Detection (see Figure 11).

Figure 7 - Video Segment Edges & Color Analysis module flowchart.

The following operations are performed in Video Segment Edges & Color Analysis:

1. Key Frames Extraction – Some "key frames" (KFs) are selected and extracted as segment representative frames.

2. Key Frames Edges Detection - An edge detection algorithm (Canny edge detector) is applied, over each KF, to the four corners where DoGs are likely to be present. Figure 8 presents an example with a difficult logo detection situation.

Figure 8 – Example of the KF edge maps obtained for a single corner of the sequence "rtp1_demo1" [24].

3. Key Frames Edges Fusion – For each corner, the corresponding KF edge maps are combined by applying a logical 'OR' operation over them. Figure 9 shows the result for the video test sequence "rtp1_demo1".
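The edge fusion of step 3 can be illustrated with a small sketch, assuming each key frame's corner edge map is available as a binary matrix of the same size (for instance, the output of an edge detector). The function name and the list-of-rows representation are hypothetical.

```python
def fuse_edge_maps(edge_maps):
    """Pixel-wise logical OR over same-sized binary edge maps (lists of rows).

    A pixel is marked as an edge in the fused map if it is an edge in at
    least one of the key frames.
    """
    rows, cols = len(edge_maps[0]), len(edge_maps[0][0])
    return [
        [any(m[r][c] for m in edge_maps) for c in range(cols)]
        for r in range(rows)
    ]
```

The fused map therefore accumulates every edge seen in the segment for that corner; the later color analysis is what discards the edges that do not stay stable.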

Figure 9 – Example of Key Frames Edges Fusion output.

4. Color Analysis of Edge Pixels – This is the last step before concluding about the existence or not of relevant static pixels for a corner of the video segment under analysis. The color analysis is performed not only for the edge pixels themselves, but also for their neighborhood, obtained by applying a dilation to the output generated by the Key Frames Edges Fusion step. Finally, the variances of the two chrominance components (Cr and Cb) are computed for every pixel belonging to the dilated edge map.

5. Video Segment Static Pixels Detection - A final decision about each pixel state, as "static" or "non static", is made according to (2), which compares these variances with the maximum acceptable value for the variance of the chrominance components.

[Equation (2): static / non-static decision for each pixel, from the chrominance variances]

Figure 10 presents the output of this step for the sequence that has been followed.

Figure 10 - Static Pixels Map (SPM) for the video sequence used as example.

Next, the DoG Detection main function proceeds (see Figure 11). The following operations are performed in DoG Detection:

1. SPMs Intersection - A logical 'AND' operation is applied to the most recent Nseg SPMs, which implies that the first decision about the presence of a DoG only occurs after the Nseg-th segment is processed. According to the number of pixels that remain after this operation, a first decision about DoG detection is made, using the thresholds MinPixTh and StatPixTh:

[Equation (3): DoG detection decision from the number of remaining pixels, MinPixTh and StatPixTh]

"To be checked" DoGs are those considered as potentially recoverable in the DoG Presence Verification process, which can only be performed if the DoGs Database is not empty.

Figure 11 - DoG Detection module flowchart.

2. DoGs Presence Verification (DPV) – Performed only over "to be checked" DoGs. It consists of an edge and color comparison between the output of the SPMs Intersection and all the DoGs in the Database. If any match is found, it is considered that a DoG is detected.

C. DoGs Database Updating & DoG Type Decision

The DoGs Database is a structure designed to keep all the necessary information about the DoGs detected over time. As this is a key element in the global system proposed, it demands clear and robust management, as the data it contains is critical to reach the final classification of each video segment. The DDB is organized in two areas, according to the DoG type: TV channel logos and non-TV channel logos (divided into commercial brand logos, program/series logos and Other DoGs). Figure 12 presents the DoGs Database Updating & Logo Type Decision module flowchart. Two different solutions to execute the tasks associated with DoGs Database Updating and DoG Type Decision are suggested: the Basic Solution (implemented and assessed) and the Advanced Solution (only described conceptually). The main differences between both solutions are reflected not only in the amount of data that is kept for each DoG, but also in the procedures related to the DoGs Insertion in the DoGs Database and the DoGs Type Decision. The Advanced Solution was designed to respond not only to legal issues but also to some specific observations made. Examples of aspects included in its rationale are the legal limit of twelve minutes of advertising per broadcaster, per hour; the distinction established between the different types of non-TV channel logos; and the observations related to the simultaneous presence on the screen of different types of DoGs, among others. The detail of this last solution is expected to bring additional value compared to the Basic Solution, as it takes into account more aspects related to the actual characteristics of commercials. However, the Advanced Solution is only described conceptually because it was settled and defined only when the Thesis was already in a very advanced state and it was no longer possible to implement the whole new DDB management system. Also, building and reproducing the conditions needed to properly test and assess the new system would be a very slow process that could not be completed in the remaining time.

Figure 12 - DoGs Database Updating & Logo Type Decision module flowchart.

Both Basic and Advanced Solutions use the following data to characterize each DoG:

a. DoG type – Primary labeling information, which distinguishes a TV channel logo from a non-TV channel logo; these labels, once attributed, are definitive; there is also a temporary state, corresponding to undefined logos, which may evolve to one of the other two states.
b. Chrominance components per pixel – Average value of each chrominance component over the video segments where the DoG was detected, for each pixel belonging to the DoG's edges.
c. Center of mass – Coordinates of the DoG's center of mass, which are used to characterize the spatial distribution of the DoG pixels.
d. Date of last detection – Date corresponding to the last detection of the DoG. This date is regularly checked and stored in the Database Update & Management module; it is useful to know when a DoG has not appeared for a long time and may be removed from the DDB, and it is also used to serialize the logos in the DDB.
e. Time persistence – Percentage of time the DoG has been on the air, which is essential information to determine the DoG type.

In addition, the Advanced Solution also requires the following information about each DoG:

f. Maximum consecutive duration – Maximum consecutive time the DoG was broadcast, TMaxConsec. This information is useful to distinguish TV channel logos from non-TV channel logos.
g. Flag "alone_appearance" – Flag used to know if a given DoG has already appeared alone (i.e., with no other DoGs in the same video segment); it is useful when determining the DoG's type, as a DoG only appears alone in three circumstances: if it is a TV channel logo, a commercial brand logo or a program/series brand logo.

The following modules are the basis of DoGs Database Updating & Logo Type Decision:

1. DoGs Matching - The DoGs Matching (DM) module has the goal of comparing the DoG acquired by the DoG Acquisition Algorithm with the DoGs in the DDB, to decide if the DoG is already known and stored. This process is applied in both Basic and Advanced Solutions. The DM module is similar to the DPV module, as their goals are similar: to associate a DoG in a corner with one of the DoGs in the DDB. So, DoGs identified in the DPV module are not processed again in DM. Compared to DPV, an additional condition is introduced here to increase the robustness of the matches found at this stage. This new condition (which could not be used in DPV because, as said before, DoGs that go through that step are not complete or well defined) depends on the distance between the center of mass (CM) of the detected DoG and that of the DoG in the DDB.

2. DoGs Insertion in DDB - The insertion of DoGs in the DoGs Database depends on several conditions that must be verified first. Those conditions are based on the DoG detection results for all corners (after analyzing each video segment) and they differ for the Basic and Advanced Solutions.

a. Basic Solution Conditions
In this simpler approach, every new DoG detected is considered potentially relevant and thus its actual type (TV channel logo or non-TV channel logo) has to be verified in later steps, depending on its time persistence; thus, if a given DoG is detected for the first time, it is added to the DDB and classified as an undefined logo.

b. Advanced Solution Conditions
Taking into account the characteristics of each detected DoG, the following verifications are made:
i. If a given DoG is detected at the same time as another DoG already known and classified as a TV channel logo, it is ignored (i.e., it is not inserted in the DDB); as only one TV channel logo may exist in the same frame, the detected DoG is certainly an "Other DoG".
ii. If a given DoG is detected at the same time as other DoGs, but none of them is classified as a TV channel logo, it is added to the DDB as an undefined logo.
iii. If a given DoG is detected alone (i.e., no other DoGs were identified in the same video segment) for a consecutive time shorter than four minutes, it is classified as an undefined logo and the value "1" is assigned to the flag "alone_appearance".
iv. If a given DoG is detected alone (i.e., no other DoGs were identified in the same video segment) for a consecutive time longer than four minutes, it is classified as an undefined logo.
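The Advanced Solution conditions i to iv can be read as a small decision function. The sketch below is only one possible reading, since the Advanced Solution is described conceptually: the names, the string labels and the choice to treat a duration of exactly four minutes like condition iv are assumptions made for illustration.

```python
from dataclasses import dataclass

FOUR_MINUTES = 240  # seconds


@dataclass
class InsertionDecision:
    insert: bool            # whether the DoG is added to the DDB
    label: str              # "undefined", or "other" when the DoG is ignored
    alone_appearance: int   # value of the "alone_appearance" flag


def advanced_insertion(co_occurring_types, consecutive_seconds):
    """Decide how a newly detected DoG is handled.

    `co_occurring_types` lists the types of the other DoGs detected in the
    same video segment (e.g. "tv_channel", "undefined").
    """
    if "tv_channel" in co_occurring_types:
        return InsertionDecision(False, "other", 0)      # condition i
    if co_occurring_types:
        return InsertionDecision(True, "undefined", 0)   # condition ii
    if consecutive_seconds < FOUR_MINUTES:
        return InsertionDecision(True, "undefined", 1)   # condition iii
    return InsertionDecision(True, "undefined", 0)       # condition iv
```

Keeping the decision in one pure function makes each condition easy to check in isolation against recorded segments.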

The additional DoGs’ parameters specific for this solution, presented in the beginning of this section, are inserted in the DDB as follows:
A. Maximum consecutive duration – The maximum consecutive time the DoG was broadcast, T_MaxConsec, is computed and saved. This parameter may be updated in later appearances of the DoG, whenever its maximum consecutive duration increases.
B. Flag “alone_appearance” – If condition iii stated above is verified, the value assigned to this parameter is “1”; otherwise, it is “0”.

3. Database Update & Management – This module is responsible for updating the information about all DoGs in the DDB. Whenever a known DoG is detected, the maximum consecutive duration, date of last occurrence and time persistence are reevaluated. Taking the new information into account, several operations shall be executed in order to keep the DDB updated. Also, as already mentioned, the number of different DoGs detected in the long term is enormous; thus, a main goal in building an intelligent and efficient DDB is to keep only the DoGs that actually represent logos.

a. Basic Solution Rationale – In the context of this solution, every DoG is seen as a potential TV channel logo; for this reason, following this simplistic approach, all detected DoGs are first added to the DDB as undefined logo. The principle followed to later classify a DoG in the DDB as TV channel logo or non-TV channel logo is the time persistence it presents, as the examination of real TV content shows that, in the long term, a TV channel logo is on air for much longer than any other type of DoG. The major risk associated with this approach is the definition of the thresholds used in the time persistence evaluation, which may lead to the TV channel logo classification being wrongly assigned to DoGs that are, for instance, “program/series logos”. If this happens, the effect on the final results may be an incorrect classification when “program/series logos” appear in commercials, as happens in some cases.

b. Advanced Solution Rationale – As already mentioned, the DoGs considered important to keep in the DDB are those identified as TV channel logos and, among the non-TV channel logos, the “commercial brand logos” and “program/series logos”. The “Other DoGs” should not be kept (and they are not) in the DDB because they do not represent specific instances (a TV channel, a program or a brand), are circumstantial to the content and may vary a lot even during a single program; leaving them in the DDB would increase not only the required memory but also the computational cost of the comparison operations made in the DoGs Presence Verification and DoGs Matching modules. On the other hand, “commercial brand logos” and “program/series logos” should be kept in the DDB because they appear regularly on TV and may have a real impact on each video segment classification, due to the way the final classification of each video segment is determined.

c. DoGs Serialization in DDB – In order to organize all the acquired DoGs in a logical order, the date of last occurrence was chosen as the best parameter. The DDB works as a queue whose first element is the DoG that most recently appeared, while the last element is the oldest. This criterion was adopted to make the search processing more efficient, as the DoG most likely to appear at any moment is the one which has been broadcast recently. By doing this, it is expected to reduce the computational cost and run time of stages that imply comparison operations on a pixel level (e.g. DoGs Presence Verification and DoGs Matching), because the access to the most recently broadcast DoGs has priority. This procedure is common to both Basic and Advanced Solutions.

d. DoGs Removal from DDB – The DoGs that do not need to be stored anymore shall be removed. Two conditions to remove non-TV channel logos from the DDB are proposed: the first one depends on the DDB’s size – when the limit is reached, the DoG in the last position of the DDB queue is removed; the second one depends on the date of each DoG’s last appearance – if it has not appeared for a long time, it should be removed.

D. Video Segment Classification
The Video Segment Classification is the final decision stage and it decides the final output of the proposed global solution: a segment is classified either as a Commercial Block or a Regular Program. Figure 13 presents the flowchart designed for this last stage. The following situations are considered when classifying a video segment:
1. No logo detected – If no DoG is detected in any corner, the video segment is classified as belonging to a Commercial Block.
2. TV channel logo detected – When a segment is associated to a TV channel logo, the segment is classified as a Regular Program.
3. Undefined logo detected – When a segment is associated to an undefined logo, the segment is classified as a Regular Program. This decision may not be obvious, but the goal is not to lose potentially relevant content, assuming the user wants to see regular programs: as an undefined logo may be later classified as TV channel logo, this classification guarantees that no regular content (i.e. non-commercial content) is wrongly classified as commercial.
4. Non-TV channel logo(s) detected – When all DoGs detected are non-TV channel logos, the segment is classified as a Commercial Block.
After this step, the most recent video segment has completed its processing. A report with the video segment classification is updated and the algorithm continues its course to the next video segments.

Figure 13 – Video Segment Classification module flowchart.
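The four classification situations of the Video Segment Classification stage can be illustrated with a minimal sketch. The names LogoType and classify_segment are hypothetical and do not come from the Thesis. The precedence applied when a segment contains DoGs of different types (a TV channel logo or an undefined logo outweighs any non-TV channel logo) is an assumption inferred from situation 4, which requires all detected DoGs to be non-TV channel logos.

```python
from enum import Enum


class LogoType(Enum):
    """DoG types as stored in the DDB (names are illustrative)."""
    TV_CHANNEL = "tv_channel"
    UNDEFINED = "undefined"
    NON_TV_CHANNEL = "non_tv_channel"


COMMERCIAL = "Commercial Block"
REGULAR = "Regular Program"


def classify_segment(detected):
    """Classify one video segment from the types of the DoGs detected in it.

    `detected` is a list of LogoType values, one per DoG found in the
    screen corners of the segment (an empty list means no DoG was found).
    """
    if not detected:
        return COMMERCIAL  # situation 1: no logo detected
    if LogoType.TV_CHANNEL in detected:
        return REGULAR     # situation 2: TV channel logo detected
    if LogoType.UNDEFINED in detected:
        # situation 3: kept as regular content, since the logo may
        # later be promoted to TV channel logo (assumed to outweigh
        # any non-TV channel logo present in the same segment)
        return REGULAR
    return COMMERCIAL      # situation 4: only non-TV channel logos


if __name__ == "__main__":
    print(classify_segment([]))
    print(classify_segment([LogoType.UNDEFINED]))
    print(classify_segment([LogoType.NON_TV_CHANNEL]))
```

Checking the empty case first mirrors the order of the situations in the text; the remaining checks only depend on which types are present, not on how many DoGs were detected.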

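The DoGs Serialization and DoGs Removal steps of the Database Update & Management module can also be sketched as a move-to-front queue. This is only an illustration under stated assumptions: the class DoGDatabase, its methods, and the max_size and max_age_s values are hypothetical, since the Thesis text in this section gives neither the size limit nor the age limit. The sketch also assumes that TV channel logos are never evicted, as the two removal conditions are stated for non-TV channel logos only.

```python
from collections import OrderedDict


class DoGDatabase:
    """Move-to-front queue of DoG records (illustrative sketch only)."""

    def __init__(self, max_size=4, max_age_s=3600.0):
        self.max_size = max_size    # hypothetical size limit
        self.max_age_s = max_age_s  # hypothetical age limit, in seconds
        self._records = OrderedDict()  # first key = most recently seen DoG

    def observe(self, dog_id, now, logo_type="undefined"):
        """Register an appearance of a DoG at time `now` (seconds)."""
        record = self._records.setdefault(dog_id, {"type": logo_type})
        record["last_seen"] = now
        # Serialization: the most recently broadcast DoG goes to the front.
        self._records.move_to_end(dog_id, last=False)
        self._remove_unneeded(now)

    def _remove_unneeded(self, now):
        # Removal condition 2: not seen for longer than max_age_s.
        for dog_id in list(self._records):
            rec = self._records[dog_id]
            if rec["type"] != "tv_channel" and now - rec["last_seen"] > self.max_age_s:
                del self._records[dog_id]
        # Removal condition 1: size limit exceeded, drop from the back.
        while len(self._records) > self.max_size:
            victim = next(
                (d for d in reversed(self._records)
                 if self._records[d]["type"] != "tv_channel"),
                None,
            )
            if victim is None:
                break  # only TV channel logos left; nothing to evict
            del self._records[victim]

    def search_order(self):
        """DoG ids in the order they would be compared against a segment."""
        return list(self._records)
```

Because matching stages iterate in search_order, the DoG seen most recently is compared first, which is the efficiency argument given for serializing by date of last occurrence.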
V. PERFORMANCE EVALUATION
An exhaustive evaluation methodology has been designed to obtain reliable and representative results as well as clear and relevant conclusions. Tests were performed with two main modules of the proposed solution, notably the Shot Change Detection algorithm (SCD) and the DoG Acquisition algorithm (DoGA), and also with the Global Solution (GS) for Commercials Detection.

A. Test Material
1. SCD Assessment – Eight video sequences [25] have been created, where each sequence is a composition of recordings from a single TV channel. The videos have a spatial resolution of 1920x1080 pixels and include several kinds of content, including movies, sports, cartoons and news programs. The total duration is 2240 seconds.
2. DoGA Assessment – Eight video sequences [26] have been created, where each one is a composition of recordings from a single TV channel. The TV channels were selected in order to assure a high diversity of DoGs; there are four categories: Opaque & Static, Opaque & Dynamic (Texture), Semitransparent & Static, Semitransparent & Dynamic (Shape). The total duration is 2575 seconds.
3. Global Solution Assessment – Three video sequences were created [27] for this test experiment, each one composed of recordings of a single TV channel, containing regular programs and commercials (among these, some with and some without a commercial brand logo on the screen). The total duration is 1149 seconds.

B. Performance Assessment Methodology and Metrics
All algorithms were assessed as binary classifiers. The following metrics were used: True Positive Rate, True Negative Rate, False Positive Rate, False Negative Rate, Precision, F1-Score and Accuracy.

C. Results and analysis
1. SCD Assessment – Table 2 presents the final results for the SCD module; in this table, the “positive events” are the frames consisting of hard cuts. Overall, these results are satisfactory. In general, taking into account the context of this Thesis, a FN occurrence is more worrisome than a FP occurrence (i.e., it is more problematic not to detect a hard cut than to detect a nonexistent hard cut), because a FN may be the boundary between a Regular Program and a Commercial Block and, in that case, it is likely to cause the misclassification of the corresponding video segment. Most FPs result from sudden and large changes in the brightness conditions and are difficult to avoid. FNs result from particular cases where, despite the differences between consecutive frames, the histogram does not vary enough in the context of the frame window used, and thus the adopted threshold is not suited to that situation.

Table 2 – SCD module: performance results.
Frames   Hard Cuts (Ground Truth)   TPR     FPR     FNR     Prec    F1
56080    510                        90,2%   0,08%   10,6%   92,4%   91,1%

2. DoGA Assessment – Table 3 presents the final results for the DoGA module. In this table, the “positive events” are the screen corners containing DoGs.

Table 3 – DoGA module: performance results.
Corners   TPR     TNR     Prec    F1      Acc
2100      98,5%   95,6%   95,8%   97,1%   97,0%

The results are satisfactory and support the validity of the proposed approach. All logos included in the categories “Opaque & Static”, “Opaque & Dynamic (Texture)” and “Semitransparent & Static” were correctly detected in all the sequences, demonstrating the robustness of the proposed algorithm. Most of the FP events were caused by highly textured backgrounds, as expected.

3. Global Solution Assessment – Tables 4, 5 and 6 present the results obtained for TV Commercials detection, tested over the sequences sicNotGS, tviGS and rtpGS. In these tables, the event considered as “positive” is a frame belonging to a commercial block.

Table 4 – GS module: performance results for sicNotGS video test sequence.
Video Shots Detection   Total Frames   TPR     TNR      Prec     F1      Acc
Ground Truth            6824           89,4%   100,0%   100,0%   94,4%   98,1%
SCD                     6824           87,4%   100,0%   100,0%   93,3%   97,7%

Table 5 – GS module: performance results for tviGS video test sequence.
Video Shots Detection   Total Frames   TPR     TNR      Prec     F1      Acc
Ground Truth            4215           93,2%   96,4%    94,6%    93,9%   95,1%
SCD                     4215           85,8%   97,7%    94,7%    90,1%   93,9%

Table 6 – GS module: performance results for rtpGS video test sequence.
Video Shots Detection   Total Frames   TPR     TNR      Prec     F1      Acc
Ground Truth            17708          94,2%   100,0%   100,0%   97,0%   97,8%
SCD                     17708          92,0%   100,0%   100,0%   95,9%   97,0%

Overall, the results are satisfactory. The difference between using the ground-truth shot boundaries and the developed SCD algorithm is small: accuracy drops by at most 1,2 percentage points (tviGS), although TPR for tviGS falls from 93,2% to 85,8%. TNR, Prec and Acc are all at least 93,9%, while F1 is at least 90,1%. Most FP and FN occurrences are related to the incorrect association between static areas in a commercial and the DoGs in the DDB, showing that the DoGs Matching mechanism is not perfect and should be improved. Run time is, on average, about 66% of the actual duration of the test sequences, showing that the algorithm may run in real time.

VI. CONCLUSION
Three main strengths can be identified in the solution proposed in this Thesis. First, the approach followed to detect DoGs (even those with poorly defined boundaries and likely to be confused with similar background) is not computationally expensive and the results show the validity of the process. Second, to the knowledge of the author of this Thesis, this is the first work that deals with the TV logo detection problem as a particular case of the more generic DoGs detection case. Third, both the Basic and Advanced Solutions for the DoGs Database Management are based on real and recent observations about the way broadcasters and advertisers are using DoGs. It is also important to note that the proposed solution may be implemented in real time (as the run time of the global solution is about one third lower than the actual media time), which is mostly due to the fact that the DoGs acquisition process depends, at most, on a tenth of the total number of frames analyzed. Also, the proposed system does not need any previously built DoGs database, which is an important advantage. On the other hand, some processes have been identified as having potential for improvement; e.g., the fact that the DoGs acquisition algorithm depends on static pixels makes it hard to properly detect very dynamic logos, although this type of TV channel logo is not common. Another negative point of DoGs acquisition is its sensitivity to pixel-level variations in the position of the logo on the screen, although the dilation operation performed in the Color Analysis of Edge Pixels step prevents some of those cases.

VII. BIBLIOGRAPHY
[1] “Directive 2010/13/EU of the European Parliament and of the Council of 10 March 2010 on the coordination of certain provisions laid down by law, regulation or administrative action in Member States concerning the provision of audiovisual media services.” [Online]. Available: http://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:32010L0013. [Accessed: 14-Oct-2016].
[2] R. Lienhart, C. Kuhmunch, and W. Effelsberg, “On the detection and recognition of television commercials,” in Proceedings of IEEE International Conference on Multimedia Computing and Systems, 1997, pp. 509–516.
[3] D. A. Sadlier, S. Marlow, N. O’Connor, and N. Murphy, “Automatic TV advertisement detection from MPEG bitstream,” in International Conference on Enterprise Information Systems, 2002, vol. 35, no. 12, pp. 2719–2726.
[4] S. Marlow, D. A. Sadlier, K. McGeough, N. O’Connor, and N. Murphy, “Audio and video processing for automatic TV advertisement detection,” in Irish Signals and Systems Conference, 2001, pp. 25–27.
[5] J. Chen, J. Yeh, W. Chu, J. Kuo, and J. Wu, “Improvement of commercial boundary detection using audiovisual features,” Adv. Multimed. Inf. Process., vol. 3767, pp. 776–786, 2005.
[6] C. Colombo, A. Del Bimbo, and P. Pala, “Retrieval of commercials by semantic content: The semiotic perspective,” Multimed. Tools Appl., vol. 13, no. 1, pp. 93–118, 2001.
[7] Z. Feng and J. Neumann, “Real time commercial detection in videos,” 2013.
[8] R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and classifying scene breaks,” in Proceedings of the third ACM international conference on Multimedia, 1995, vol. 95, pp. 189–200.
[9] R. Glasberg, C. Tas, and T. Sikora, “Recognizing commercials in real-time using three visual descriptors and a decision-tree,” IEEE Int. Conf. Multimed. Expo, pp. 1481–1484, 2006.
[10] L. T. A. Albiol, M. J. Fullà, F. A. Albiol, “Detection of TV commercials,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004, vol. 3, pp. 541–544.
[11] E. Esen, M. Soysal, T. K. Ateş, A. Saracoǧlu, and A. A. Alatan, “A fast method for animated TV logo detection,” 2008 Int. Work. Content-Based Multimed. Indexing, CBMI 2008, Conf. Proc., pp. 236–241, 2008.
[12] E. Mikhail and D. Vatolin, “Automatic Logo Removal for Semitransparent and Animated Logos.”
[13] N. Özay and B. Sankur, “Automatic TV logo detection and classification in broadcast videos,” Eur. Signal Process. Conf., no. Eusipco, pp. 839–843, 2009.
[14] C. Panagiotakis and G. Tziritas, “A speech/music discriminator based on RMS and zero-crossings,” IEEE Trans. Multimed., vol. 7, no. 1, pp. 155–166, 2005.
[15] L. Duan, J. Wang, Y. Zheng, J. S. Jin, H. Lu, and C. Xu, “Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis,” in Proceedings of the 14th annual ACM international conference on Multimedia, 2006, pp. 201–210.
[16] L. Lu, H. J. Zhang, and S. Z. Li, “Content-based audio classification and segmentation by using Support Vector Machines,” Multimed. Syst., vol. 8, no. 6, pp. 482–492, 2003.
[17] M. Li, C. Yong, W. Min, and L. Yuanxing, “TV commercial detection based on shot change and text extraction,” in Proceedings of the 2009 2nd International Congress on Image and Signal Processing, CISP’09, 2009.
[18] N. Dimitrova, L. Agnihotri, and G. Wei, “Video classification based on HMM using text and faces,” in EUSIPCO, 2000.
[19] J. M. Gauch and A. Shivadas, “Identification of new commercials using repeated video sequence detection,” in IEEE International Conference on Image Processing 2005, 2005, vol. 3, pp. II–1252.
[20] Y. Li, D. Zhang, X. Zhou, and J. S. Jin, “A confidence based recognition system for TV commercial extraction,” Conf. Res. Pract. Inf. Technol. Ser., vol. 75, pp. 57–64, 2008.
[21] C. Herley, “ARGOS: Automatically extracting repeating objects from multimedia streams,” IEEE Trans. Multimed., vol. 8, no. 1, pp. 115–129, 2006.
[22] “Jornal ‘Público’ Online.” [Online]. Available: https://www.publico.pt/tecnologia/noticia/a-partir-de-junho-a-publicidade-na-tv-deixa-de-subir-o-volume-1725736. [Accessed: 08-Oct-2016].
[23] “Branding with Bugs.” [Online]. Available: http://www.videomaker.com/article/c3/14602-branding-with-bugs. [Accessed: 08-Oct-2016].
[24] “‘rtp1_demo1’ video sequence.” [Online]. Available: https://www.dropbox.com/s/9rlzlcsajgad9dz/rtp1_demo1.mp4?dl=0. [Accessed: 08-Oct-2016].
[25] “Shot Change Detection Assessment Dataset.” [Online]. Available: https://www.dropbox.com/sh/64mbtob2zgby3j6/AACAD-wHwCffLoFyw3zoEKoka?dl=0. [Accessed: 08-Oct-2016].
[26] “DoG Acquisition Assessment Dataset.” [Online]. Available: https://www.dropbox.com/sh/2hnhjyb9ld55mk4/AAAZuucfALVyyZYSV-sj8ZE8a?dl=0. [Accessed: 08-Oct-2016].
[27] “Global Solution for Detecting Commercials Assessment Dataset.” [Online]. Available: https://www.dropbox.com/sh/urmly4b8yjm1tgk/AACk0CI6NopxelKglA-momw7a?dl=0. [Accessed: 08-Oct-2016].