Notes on Cinemetric Data Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Notes on Cinemetric Data Analysis Mike Baxter January 2014 Contents 1 Cinemetrics and R 1 1.1 Introduction 1 1.2 Cinemetrics 2 1.2.1 The idea 2 1.2.2 Pattern recognition in cinemetrics 3 1.2.3 Lies, damned lies and statistics (and fear) 3 2 Examples 5 2.1 Preamble 5 2.2 Examples 5 2.2.1 ASLs over time 5 2.2.2 Comparison of ASL distributions 6 2.2.3 Pattern and SL distributions 8 2.2.4 Internal pattern - individual films 10 2.2.5 Internal pattern - an example of global analysis 11 2.2.6 Shot-scale analysis 13 3 Getting R, getting started 21 3.1 Finding R 21 3.2 Data entry 21 3.2.1 General 21 3.2.2 Using the Cinemetrics database 22 3.3 Packages 22 3.4 Reading 23 4 Descriptive statistics 24 4.1 Introduction 24 4.2 Basics 24 4.3 Functions 25 4.4 Data manipulation 26 4.5 Illustrative graphical analyses 27 4.6 Definitionsandcommentsonsomesimplestatistics 31 5 Graphical analysis – basics 35 5.1 Histograms 35 5.1.1 Basics – an example 35 5.1.2 Technicalities 36 5.1.3 Example continued - log-transformation 37 5.2 Kernel density estimates 38 5.3 Boxplots 41 5.3.1 Basics 41 5.3.2 Interpretation 42 i 5.3.3 Boxplots and outliers 43 6 Comparative graphical analysis 46 6.1 KDEs and Histograms 46 6.2 Boxplots and violin plots 49 6.3 Cumulative frequency diagrams 49 6.4 Comparison with reference distributions 51 6.4.1 Comparisons with the lognormal distribution 51 6.4.2 Aspects of the normal distribution 51 6.4.3 Normal probability plots 52 6.4.4 Using KDEs for SL comparisons 52 6.4.5 Examples 54 7 Time-series analysis of SLs 58 7.1 Introduction 58 7.2 Polynomial models (trendlines) 59 7.3 Simple moving averages 62 7.4 Smoothed moving averages 63 7.5 Loess smoothing 64 7.5.1 Examples 64 7.5.2 Some technicalities 66 7.6 Robust methods 67 7.6.1 Introduction 67 7.6.2 Moving medians 68 7.6.3 Moving average of ranked SLs 69 7.6.4 Order structure matrices 71 7.7 Kernel density estimation 73 7.8 Summary 74 7.8.1 General similarities 74 7.8.2 Specific differences between methods 75 7.8.3 Additional methodological comments 76 7.8.4 Discussion and conclusion 77 8 Enhanced plots 79 8.1 Lights of New York (1928) explored 79 8.1.1 Color-coding shots 79 8.1.2 Adding ‘wallpaper’ 81 8.1.3 Order structure matrices revisited 82 8.2 Intolerance revisited 84 8.3 Jeu d’esprit 85 8.3.1 Halloween (1978) – matching color to content 85 8.3.2 Controlling for shot numbers - an idea 86 8.3.3 Code 89 8.3.4 Coda 90 9 Shot-scale data analysis 94 9.1 Correspondence analysis – initial example 94 9.2 What correspondence analysis does 97 9.3 Producing enhanced CA plots 98 9.4 CA and supplementary points 102 9.5 Other plotting methods 104 9.5.1 Analysis of ranks 104 9.5.2 Analysis of individual shot-scale types 106 ii 10 Large data sets 108 10.1 Introduction 108 10.2 Data entry and basic analysis 108 10.2.1 Data entry 108 10.2.2 Data analysis 109 10.2.3 Dealing with several films at once 109 10.3 Further basic code 110 10.4 Hypothesis tests 112 11 Median or mean? (and more) 113 11.1 Introduction 113 11.2 Some theory 114 11.3 Outliers and robustness 116 11.3.1 General remarks 116 11.3.2 Outliers and Brief Encounter 117 11.4 Case studies 119 11.4.1 ASL and MSL trends over time 119 11.4.2 ASLs,MSLsandthesilent-soundtransition 120 11.4.3 ASLsandMSLsinD.W.Griffith’sone-reelers 124 12 Lognormality of SLs – a case study 127 12.1 The lognormality issue 127 12.2 Exploratory analyses 128 12.2.1 Graphical inspection 128 12.2.2 Comparing estimates of the median 129 12.2.3 Skewness and kurtosis I 131 12.2.4 Skewness and kurtosis II 133 12.3 Omnibus tests of (log)normality 134 12.4 On distributional regularity – an alternative approach 137 12.5 R code 138 13 Multivariate methods – basics 140 13.1 Introduction 140 13.2 Data processing 141 13.2.1 Brisson’s partitioning method 141 13.3 Smoothing 142 13.3.1 General issues 142 13.3.2 Scaling 142 13.3.3 Degrees of smoothing 144 13.4 Distance measures 145 13.5Principalcomponent(PCA)andclusteranalysis 146 13.5.1 Principal component analysis 146 13.5.2 Cluster analysis 146 13.6 R code for Brisson’s partition method 147 14 Structure in Griffith’s Biographs 150 14.1 Introduction 150 14.2 Initial cluster analyses 151 14.3 Principal component analysis (PCA) 154 14.4 Varying the level of smoothing 157 14.5 Discussion 160 iii A The Lognormal distribution 167 A.1 Introduction 167 A.2 The lognormal and normal distributions 167 A.2.1 Definitions 167 A.2.2 Parameter interpretation 168 A.2.3 Models and estimates 169 A.3 Why logarithms? 170 A.4 Other normaliizing transformations 171 A.4.1 Box-Cox (BC) transformations 171 A.4.2 Yeo-Johnson (YJ) transformations 172 iv List of Figures 2.1 ASLs for American films (1955-2005). 6 2.2 Histograms for the ASLs of American and European silent films 1918-1923. 7 2.3 UsingKDEsandCFDstocompareASLdistributions. 8 2.4 Histograms and KDEs for the SL distributions of Brief Encounter and Harvey. 9 2.5 Smoothed SL data for The Blue Angel. 10 2.6 Time-series analyses of SL data from King Kong (2005). 12 2.7 Barchartsofshot-scaledatafor24filmsofFritzLang. 14 2.8 A correspondence analysis for the shot-scale data for films of Fritz Lang 15 2.9 Correspondence analyses of shot-scale data for films of Lang, Hitchcock and Ophuls. 17 2.10 Bar charts of shot-scale data for films of Hitchcock’ 18 2.11 Bar charts of shot-scale data for films of Hitchcock, LangandOphuls. 19 4.1 Scatterplot matrices for variables descriptive of SL distributions. 27 4.2 A ‘magnified’ version of part of Figure 4.1. 29 4.3 A scatterplot matrix of robust measures of dispersion for 134 Hollywood films. 30 5.1 Histograms for the SLs of A Night at the Opera. 36 5.2 Histograms of logged SLs for A Night at the Opera. 38 5.3 The default KDE for the SLs of A Night at the Opera. 39 5.4 The KDE for A Night at the Opera,usingSLslessthan40. 40 5.5 Different KDEs for the logged SLs of A Night at the Opera. 40 5.6 A boxplot and violin plot for A Night at the Opera 41 5.7 A variety of plots for logged and unlogged SLs for Harvey. 42 5.8 A histogram for the SLs of The Scarlet Empress and a variety of boxplots. 44 6.1 A comparison of SL histograms for Easy Virtue and The Skin Game. 47 6.2 A comparison of KDEs for the SLs of Easy Virtue and The Skin Game. 48 6.3 A comparison of KDEs for the logged SLs of Easy Virtue and The Skin Game. 48 6.4 Boxplots and violin plots, log-scaled, for Easy Virtue and The Skin Game. 49 6.5 A comparison of CFDs for the SLs of Easy Virtue and The Skin Game. 50 6.6 A probability plot for logged SL data for a A Night at the Opera. 53 6.7 Different comparisons of SLs for Exodus and Aristocats usingKDEs. 53 6.8 A histogram, KDE and probability plot for logged SLs of Walk the Line. 54 6.9 KDE and probability plots for A Night at the Opera and Walk the Line. 55 6.10 Comparisons of KDEs of films for logged SLs compared to lognormality. 56 7.1 Moving average of the SLs for Ride Lonesome from Salt (2010). 59 7.2 Fifth and first order polynomial fits to the SL data for Intolerance. 60 7.3 Linear and polynomial fits for the SLs of the different stories within Intolerance. 61 7.4 Moving averages for Ride Lonesome. 62 7.5 Moving average and loess smooths for Ride Lonesome. 63 7.6 Plots of SL against cut-point for Top Hat with a moving average and loess smooths. 65 7.7 Different loess smooths for the SLs of Top Hat. 65 v 7.8 Different loess smooths for the logged SLs of Top Hat. 67 7.9 Moving average, moving median and loess smooths for Top Hat. 68 7.10 A time-series analysis of ranked SL data for Top Hat fromRedfern. 69 7.11 A time-series analysis of ranked SL data for Top Hat. 71 7.12 An order structure matrix for the SL structure of Top Hat,afterRedfern. 72 7.13 A KDE of the time-series structure of Top Hat,fromRedfern. 73 7.14 KDE and loess smooths of the time-series for Top Hat. 74 8.1 Loess smooths for Lights of New York. 79 8.2 ‘Wallpaper’ plots and loess smooths for Lights of New York. 81 8.3 An order structure matrix plot forLights of New York and a variant of it. 83 8.4 Heat color representations of SL data for Halloween withloesssmooths.