Timeline of statistics

Statistics is about gathering data and working out what the numbers can tell us. From the earliest farmer estimating whether he had enough grain to last the winter to the scientists of the Large Hadron Collider confirming the probable existence of new particles, people have always been making inferences from data. Statistical tools like the mean or average summarise data, and standard deviations measure how much variation there is within a set of numbers. Frequency distributions - the patterns within the numbers or the shapes they make when drawn on a graph - can help predict future events. Knowing how sure or how uncertain your estimates are is a key part of statistics.

Today vast amounts of digital data are transforming the world and the way we live in it. Statistical methods and […] are used everywhere, from health, science and […] to managing traffic and studying sustainability and climate change. No sensible decision is made without analysing the data. The way we handle that data and draw conclusions from it uses methods whose origins and progress are charted here.

Julian Champkin, Significance magazine

Early beginnings

450 BC Hippias of Elis uses the average value of the length of a king’s reign (the mean) to work out the date of the first Olympic Games, some 300 years before his time.

431 BC Attackers besieging Plataea in the Peloponnesian war calculate the height of the wall by counting the number of bricks. The count was repeated several times by different soldiers. The most frequent value (the mode) was taken to be the most likely. Multiplying it by the height of one brick allowed them to calculate the length of the ladders needed to scale the walls.

400 BC In the Indian epic the Mahabharata, King Rtuparna estimates the number of fruit and leaves (2095 fruit and 50 000 000 leaves) on two great branches of a vibhitaka tree by counting the number on a single twig, then multiplying by the number of twigs. The estimate is found to be very close to the actual number. This is the first recorded example of sampling – “but this […] is kept secret”, says the account.

AD 2 Chinese census under the Han dynasty finds 57.67 million people in 12.36 million households – the first census from which data survives, and still considered by scholars to have been accurate.

AD 7 Census by Quirinius, governor of the Roman province of Judea, is mentioned in Luke’s Gospel as causing Joseph and Mary to travel to Bethlehem to be taxed.

840 Islamic mathematician Al-Kindi uses frequency analysis – the most common symbols in a coded message will stand for the most common letters – to break secret codes. Al-Kindi also introduces Arabic numerals to Europe.

10th century The earliest known graph, in a commentary on a book by Cicero, shows the movements of the planets through the zodiac. It is apparently intended for use in monastery schools.

1086 Domesday Book: survey for William the Conqueror of farms, villages and livestock in his new kingdom – the start of official statistics in England.

1150 Trial of the Pyx, an annual test of the purity of coins from the Royal Mint, begins. Coins are drawn at random, in fixed proportions to the number minted. It continues to this day.

1188 Gerald of Wales completes the first population census of Wales.

1303 A Chinese diagram entitled “The Old Method of the Seven Multiplying Squares” shows the binomial coefficients up to the eighth power – the numbers that are fundamental to the mathematics of probability, and that appeared five hundred years later in the west as Pascal’s triangle.

1346 Giovanni Villani’s Nuova Cronica gives statistical information on the population and trade of Florence.

Mathematical foundations

1560 Gerolamo Cardano calculates probabilities of different dice throws for gamblers.

1570 Astronomer Tycho Brahe uses the arithmetic mean to reduce errors in his estimates of the locations of stars and planets.

1644 Michael van Langren draws the first known graph of statistical data that shows the size of possible errors. It is of different estimates of the distance between Toledo and Rome.

1654 Pascal and Fermat correspond about dividing stakes in gambling games and together create the mathematical theory of probability.

1657 Huygens’s On Reasoning in Games of Chance is the first book on probability. He also invented the pendulum clock.

1663 John Graunt uses parish records to estimate the population of London.

1693 Edmund Halley prepares the first mortality tables statistically relating death rates to age – the foundation of life insurance. He also drew a stylised map of the path of a solar eclipse over England – one of the first data visualisation maps.

1713 Jacob Bernoulli’s Ars conjectandi derives the law of large numbers – the more often you repeat an experiment, the more accurately you can predict the result.

1728 Voltaire and his mathematician friend de la Condamine spot that a Paris bond lottery is offering more in prize money than the total cost of the tickets; they corner the market and win themselves a fortune.

1749 Gottfried Achenwall coins the word “statistics” (in German, Statistik); he means the information you need to run a nation state.

1757 Casanova becomes a trustee of, and may have had a hand in devising, the French national lottery.

1761 The Rev. Thomas Bayes proves Bayes’ theorem – the cornerstone of conditional probability and the testing of beliefs and hypotheses.

1786 William Playfair introduces graphs and bar charts to show economic data.

1789 Gilbert White and other clergymen-naturalists keep records of temperatures, dates of first snowdrops and cuckoos, and the like; the data is later useful for the study of climate change.

1790 First US census, taken by men on horseback directed by Thomas Jefferson, counts 3.9 million Americans.

1791 First use of the word “statistics” in English, by Sir John Sinclair in his Statistical Account of Scotland.

1805 Adrien-Marie Legendre introduces the method of least squares for fitting a curve to a given set of observations.

1808 Gauss, with contributions from Laplace, derives the normal distribution – the bell-shaped curve fundamental to the study of variation and error.

1833 The British Association for the Advancement of Science sets up a statistics section. Thomas Malthus, who analysed population growth, and Charles Babbage are members. It later becomes the Royal Statistical Society.

1835 Belgian Adolphe Quetelet’s Treatise on Man introduces social statistics and the concept of the “average man” – his height, body mass index, and earnings.

1839 The American Statistical Association is formed. Alexander Graham Bell, Andrew Carnegie and President Martin Van Buren will become members.

1840 William Farr sets up the official system for recording causes of death in England and Wales. This allows epidemics to be tracked and diseases compared – the start of medical statistics.

1849 Charles Babbage designs his “difference engine”, embodying the ideas of data handling and the modern computer. Ada Lovelace, Lord Byron’s daughter, writes the world’s first computer program for it.

1854 John Snow’s “cholera map” pins down the source of an outbreak as a water pump in Broad Street, London, beginning the modern study of epidemics.

1859 Florence Nightingale uses statistics of Crimean War casualties to influence public opinion and the War Office. She shows casualties month by month on a circular chart she devises, the “Nightingale rose”, forerunner of the pie chart. She is the first woman member of the Royal Statistical Society and the first overseas member of the American Statistical Association.

1868 Minard’s graphic diagram of Napoleon’s March on Moscow shows on one diagram the distance covered, the number of men still alive at each kilometre of the march, and the temperatures they encountered on the way.

1877 Francis Galton, Darwin’s cousin, describes regression to the mean. In 1888 he introduces the concept of correlation. At a “Guess the weight of an Ox” contest in Devon he describes the “Wisdom of Crowds” – that the average of many uninformed guesses is close to the correct value.

1886 Philanthropist Charles Booth begins his survey of the London poor, to produce his “poverty map of London”. Areas were coloured black, for the poorest, through to yellow for the upper-middle class and wealthy.

1894 Karl Pearson introduces the term “standard deviation”. If errors are normally distributed, 68% of samples will lie within one standard deviation of the mean. Later he develops chi-squared tests for whether two variables are independent of each other.

[Figure: normal distribution curve centred on the mean µ, with the horizontal axis marked from −3σ to 3σ. Share of values in each band: 34.1% on either side of the mean out to 1σ, 13.6% between 1σ and 2σ, 2.1% between 2σ and 3σ, 0.1% beyond 3σ. Image: Mwtoews]

1898 Von Bortkiewicz’s data on deaths of soldiers in the Prussian army from horse kicks shows that apparently rare events follow a predictable pattern, the Poisson distribution.

Modern era

1900 Louis Bachelier shows that fluctuations in stock market prices behave in the same way as the random Brownian motion of molecules – the start of financial mathematics.

1908 William Sealy Gosset, chief brewer for Guinness in Dublin, describes the t-test. It uses a small number of samples to ensure that every brew tastes equally good.

1911 Herman Hollerith, inventor of punch-card devices used to analyse data in US censuses, merges his company to form what will become IBM, pioneers of machines to handle business data and of early computers.

1916 During the First World War car designer Frederick Lanchester develops statistical laws to predict the outcomes of aerial battles: if you double their size land armies are only twice as strong, but air forces are four times as powerful.

1924 Walter Shewhart invents the control chart to aid industrial production and […].

1935 R. A. Fisher revolutionises modern statistics. His Design of Experiments gives ways of deciding which results of scientific experiments are significant and which are not.

1935 George Zipf finds that many phenomena – river lengths, city populations – obey a power law so that the largest is twice the size of the second largest, three times the size of the third, and so on.

1937 Jerzy Neyman introduces confidence intervals in statistical testing. His work leads to modern scientific sampling.

1940-45 Alan Turing at Bletchley Park cracks the German wartime Enigma code, using advanced Bayesian statistics and Colossus, the first programmable electronic computer.

1944 The German tank problem: the Allies desperately need to know how many Panther tanks they will face in France on D-Day. Statistical analysis of the serial numbers on gearboxes from captured tanks indicates how many of each are being produced. Statisticians predict 270 a month; reports from intelligence sources predict many fewer. The total turned out to be 276. Statistics had outperformed spies.

1946 Cox’s theorem derives the axioms of probability from simple logical assumptions.

1948 Claude Shannon introduces information theory and the “bit” – fundamental to the digital age.

1948-53 The Kinsey Report gathers objective data on human sexual behaviour. A large-scale survey of 5000 men and, later, 5000 women, it causes outrage.

1950 Richard Doll and Bradford Hill establish the link between cigarette smoking and lung cancer. Despite fierce opposition the result is conclusively proved, to huge public health benefit.

1950s Genichi Taguchi’s statistical methods to improve the quality of automobile and electronics components revolutionise Japanese industry, which far overtakes western European rivals.

1958 The Kaplan–Meier estimator gives doctors a simple statistical way of judging which treatments work best. It has saved millions of lives.

1972 David Cox introduces the proportional hazards model and the concept of partial likelihood.

1977 John Tukey introduces the box-plot or box-and-whisker diagram, which shows the quartiles, medians and spread of data in a single image.

1979 Bradley Efron introduces bootstrapping, a simple way to estimate the distribution of almost any […] of data.

1982 Edward Tufte self-publishes The Visual Display of Quantitative Information, setting new standards for graphic visualisation of data.

1988 Margaret Thatcher becomes the first world leader to call for action on climate change.

1993 The statistical programming language “R” is released, now a standard statistical tool.

1997 The term “Big Data” first appears in print.

2002 The amount of information stored digitally surpasses the amount stored non-digitally.

2002 Paul DePodesta uses statistics – “sabermetrics” – to transform the fortunes of the Oakland Athletics baseball team; the film Moneyball tells the story.

2004 Launch of Significance magazine.

2008 Hal Varian, chief economist at Google, says that statistics will be “the sexy profession of the next ten years”.

2012 Nate Silver, […], successfully predicts the result in all 50 states in the US Presidential election. He becomes a media star.

2012 The Large Hadron Collider confirms the existence of a Higgs boson-like particle at a significance of five standard deviations – around one chance in 3.5 million that all they are seeing is coincidence.

Image credits: Matthias Kabel; Roman Gudyma/iStock/Thinkstock; byggarn79/iStock/Thinkstock; Anton Snarikov/iStock/Thinkstock; Brian Jackson/iStock/Thinkstock; Peter Kim/iStock/Thinkstock; Monkey Business/Thinkstock; Rainer Plendl/iStock/Thinkstock; Bundesarchiv, Bild 101I-783-0110-12/Dörner/CC-BY-SA; Les Cunliffe/iStock/Thinkstock; Madyno; Sami Keinänen; Alejandro Catalá Rubio/iStock/Thinkstock; Catherine Yeulet/iStock/Thinkstock.
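The 1944 German tank problem entry describes estimating production from serial numbers on captured parts. A minimal sketch of the textbook estimator for that kind of problem follows, assuming serial numbers run consecutively from 1 and that the captured ones are a random sample drawn without replacement. The function name and the serial numbers are invented for illustration; the timeline does not say which formula the wartime analysts used.

```python
def estimate_total(serials):
    """Estimate the largest serial number N from a sample of serials.

    Uses the standard estimator m + m/k - 1, where m is the largest
    serial observed and k is the sample size. Assumes serials are drawn
    without replacement from 1..N (an assumption of this sketch).
    """
    if not serials:
        raise ValueError("need at least one serial number")
    k = len(serials)
    m = max(serials)
    return m + m / k - 1


# Hypothetical serial numbers, invented for illustration only.
captured = [19, 40, 42, 60]
print(estimate_total(captured))  # 74.0
```

The idea is that the largest serial seen is always an underestimate, so it is pushed up by roughly the average gap between observed serials.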
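Two figures in the timeline come straight from the normal distribution: the 68% of samples within one standard deviation of the mean (1894 entry) and the "one chance in 3.5 million" attached to five standard deviations (2012 Large Hadron Collider entry). The sketch below checks both using only the standard library. The function names are illustrative, and reading the 3.5 million figure as a one-sided tail probability is an assumption of this sketch, not something the timeline states.

```python
import math


def within_sigmas(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def upper_tail(k):
    """One-sided probability that a normal value exceeds the mean by more than k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2))


print(f"within 1 sigma: {within_sigmas(1):.1%}")           # 68.3%
print(f"beyond 5 sigma: 1 in {1 / upper_tail(5):,.0f}")    # roughly 1 in 3.5 million
```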
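The 1979 entry calls bootstrapping a simple way to estimate a distribution from data. As a rough illustration, the sketch below resamples a small data set with replacement many times and looks at how the mean varies across resamples. The data values, the number of resamples and the percentile interval are all choices made for this example, not details from the timeline.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Return the means of n_resamples samples drawn from data with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval: cut off equal shares from both ends of the sorted values."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * len(ordered))]
    hi = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lo, hi


# Invented measurements, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"sample mean {statistics.fmean(sample):.2f}, 95% interval {low:.2f} to {high:.2f}")
```

The same resampling loop works for statistics other than the mean, such as the median, by swapping the function applied to each resample.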