2 Yule and Hooker and the Concepts of Correlation and Trend
Total Page:16
File Type:pdf, Size:1020Kb
Notes 2 Yule and Hooker and the Concepts of Correlation and Trend 1. Galton is often regarded as the father of correlation, but the concept – or at least the word – had been in common use, particularly in physics, since the middle of the cen- tury. The historical development of correlation, although fascinating, lies outside the scope of this study. A contemporary account of the development is provided by Pear- son (1920): for a later, rather more detached, discussion, see Stigler (1986, chapter 8). 2. George Udny Yule plays a major and recurring role in our story. Born on 18 February 1871 in Beech Hill near Haddington, Scotland, Yule was a member of an established Scottish family composed of army officers, civil servants, scholars and administrators and both his father, also named George Udny, and a nephew were knighted. Although he originally studied engineering and physics at University College, London (UCL), and Bonn, Germany, publishing four papers on electric waves, Yule returned to UCL in 1893, becoming first a demonstrator for Karl Pearson, then an Assistant Professor. Yule left UCL in 1899 to work for the City and Guilds of London Institute but also later held the Newmarch Lectureship in Statistics at UCL. In 1912 he became lecturer in statistics at the University of Cambridge (later being promoted to Reader) and in 1913 began his long association with St. John’s College, becoming a Fellow in 1922. Yule was also very active in the Royal Statistical Society: elected a Fellow in 1895, he served as Honorary Secretary, was President and was awarded the prestigious Guy Medal in Gold in 1911. His textbook, Introduction to the Theory of Statistics, ran to 14 editions during his lifetime and, as well as contributing massively to the foundations of time series analysis, he also researched on Mendelian inheritance and on the statis- tics of literary style, as well as other aspects of statistics. Retiring from his readership at the age of 60, and having always been a very fast driver, he decided to learn to fly, eventually buying his own plane and acquiring a pilot’s licence. Unfortunately, from 1932 heart problems curtailed his flying experiences and he became a quasi-invalid for the rest of his life, dying on 26 June 1951 in Cambridge. For further biographical details and a full list of publications, see Kendall (1952) and also Williams (2004). 3. Historical perspectives on Yule’s development of correlation and regression, which are not our major concern or focus here, but are arguably extremely important for the development of applied statistical techniques, are provided by Aldrich (1995, 1998) and Hepple (2001). 4. Expressing the k arrays in a tabular form gives what Yule refers to as a correlation table. 5. In terms of the original variables Y and X, the regression line is Y = a + bX with a = Y − bX. 6. Or, as Yule (1897b, page 818) put it, ‘errors of mean square, measuring the degree of scatter of the X’s and Y’s around their mean values’. 2 7. R1 is, of course, the coefficient of multiple correlation in modern econometric parlance. 8. We follow Yule (1907) is using the notation r12.3 here, rather than ρ12 as used in Yule (1897b). 9. Meteorological applications of correlation were also beginning to appear around this time: see Pearson and Lee (1897) and Cave-Browne-Cave and Pearson (1902). 419 420 Notes 10. Reginald Hawthorn Hooker (1867–1944) was a contemporary and close friend of Yule. Educated at the Collège Rollin, Paris, and Trinity College, Cambridge, Hooker was a career civil servant at the Board (later Ministry) of Agriculture, retiring in 1927, although before he joined the board in 1895 he was for four years Assistant Secretary of the Royal Statistical Society. A Fellow of both this and the Royal Meteorological Society, of which he was twice President, Hooker made contributions in the use of cor- relation to analyse economic, agricultural and meteorological data: see the obituary by Yule (1944a) for biographical detail. 11. Some years earlier, Poynting (1884) had used the ratio of a four-year moving average to a ten-year moving average of wheat prices and imports of cotton and silk to remove both secular movements and short-run fluctuations from the series, but his analysis was purely descriptive. 12. Hooker’s results from using this technique were reported in further detail in Yule (1906), which also contained an extended discussion about the economic factors influencing marriage and birth-rates. 13. Hooker (1901b) used the notation rm(t−k), with m and t denoting the marriage rate and trade respectively. Hooker also constructed ‘coefficient curves’ in which a curve was interpolated through the scatterplot of rm(t−k) against k, from which he was able to estimate the maximum correlation and the (non-integer) value of k producing that maximum; we have not felt the desire to repeat this procedure here! 14. As time series data are now explicitly being considered, generic observations are denoted by t subscripts. 3 Schuster, Beveridge and Periodogram Analysis 1. Franz Arthur Friedrich Schuster (1851–1934) was born and educated in Frankfurt am Main, Germany before joining his parents in Manchester, where the family textile busi- ness was based, in 1870, becoming a British citizen in 1875. Independently wealthy, he studied physics at Heidelberg, Gottingen and Berlin before spending five years at the Cavendish Laboratory in Cambridge. In 1881 Schuster became Professor of Applied Mathematics and then, in 1887, Langworthy Professor of Physics, at Owens College, part of Manchester University. After retiring in 1907 (and being succeeded by Ernest Rutherford) he devoted the remainder of his professional life to promoting the cause of international science, being knighted in 1920. Schuster’s interest in meteorol- ogy, earthquakes and terrestrial magnetism was longstanding and he took part in four eclipse expeditions. 2. The earliest reference to periodogram analysis appears to be in a note by G.G. Stokes to the paper by Stewart and Dodgson (1879) reporting on attempts to detect links between sunspots and magnetic and meteorological changes on earth. 3. A modern and comprehensive treatment of Fourier analysis may be found in Pollock (1999, chapters 13–15), where the derivation of these equations may be found. 4. The series is obtained from the National Geophysical Data Center (NGDC) website: www.ngdc.noaa.gov. 5. The periodogram is calculated with s = 10 and p ≤ 240, so that all one-tenth of a year cycles up to 24 years are computed. 6. Schuster (1906) made a concerted attempt to investigate many of the minor peaks in the periodogram but, given the difficulties that later became apparent in trying to interpret calculated periodograms (see Chapters 5 and 13), we content ourselves with just this analysis. Notes 421 7. William Beveridge, 1st Baron Beveridge (1879–1963), was a British economist and social reformer. After being educated at Charterhouse School and Balliol College, Oxford, Beveridge’s varied career encompassed the law, journalism, the civil service and politics. Knighted in 1919, he was made a baron in 1946 and eventually became leader of the Liberals in the House of Lords. He was also a leading academic, being Director of the London School of Economics and Master of University College, Oxford. Probably best remembered for his 1942 report Social Insurance and Allied Services (known as the Beveridge Report), which served as the basis for the welfare state, especially the National Health Service, Beveridge was an expert on unemployment and labour mar- kets. Although he had a long-standing interest in the history of prices, his work on periodogram analysis stands apart from his usual areas of research. 8. The periodogram was constructed for s = 2 and p ≤ 168, so that all half-year cycles up to 84 years were calculated. This was felt to be as fine a ‘mesh’, to use Beveridge’s terminology, as was needed for the purposes of the example. Beveridge used an even finer mesh and, given the rudimentary calculating technology available at the time, this obviously took considerable time and effort. The periodogram was recalculated by Gower (1955) using a computer for the first time: the computations took approximately two hours on the Manchester University Mark II Electronic Computer. The calculations reported here were produced using a Gauss program written by the author and took less than 0.008 seconds using a standard desktop PC. 4 Detrending and the Variate Differencing Method: Student, Pearson and Their Critics 1. William S. Gosset (1876–1937) was one of the most influential statisticians of the twentieth century. As an employee of Guinness, the famous Irish brewer of stout, he was precluded from publishing under his own name and thus took the pseudonym ‘Student’. Gosset worked primarily on experimental design and small sample prob- lems and was the inventor of the eponymous t-distribution. His research focus is tangential to our theme and this was his only paper on the analysis of time series. For further biographical details see Pearson (1950, 1990). 2. Karl Pearson (1857–1936) was another of the greats of statistics in the early twentieth century whose interest in time series was only tangential. His son, Egon, another extremely influential statistician, produced a biography of his father soon after Karl’s death (Pearson, 1936/38, 1938). More recently, Porter (2004) concentrates on Pear- son’s life before 1900, focusing on his wide range of interests and the factors that influ- enced him into becoming a statistician. John Aldrich’s Pearson website provides eas- ily accessible detail and numerous links: www.economics.soton.ac.uk/staff/aldrich/ kpreader.html.