Some definitions
● experiment: e.g. throwing a die twice
● realization: outcome of one specific experiment
● event space/sample space: set of all possible outcomes
● random variable (German: “Zufallsvariable”)
– function of the outcome of an experiment (e.g. sum of the two dice)
– event space and realization are defined accordingly for random variables
– a random variable can be either discrete or continuous
Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer
Example
● experiment: throwing two dice
● event space of experiment: {11,12,13,14,15,16,21,22,23,...,64,65,66}
● random variable x = sum of the two dice
– event space of x: {2,3,4,5,6,7,8,9,10,11,12}
● realization of experiment: e.g. 25; the corresponding realization of x is 7
cumulative distribution u(x): probability to observe the value x or a smaller one
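The two-dice example can be enumerated directly. The following is a minimal sketch, assuming fair dice so that all 36 outcomes are equally likely; the names `outcomes`, `prob` and `cumulative` are chosen here for illustration and do not come from the lecture.

```python
from fractions import Fraction
from itertools import product

# Event space of the experiment: all ordered pairs of faces (36 outcomes).
outcomes = list(product(range(1, 7), repeat=2))

# Random variable x = sum of the two dice; every outcome is equally likely
# (assumption: fair dice).
prob = {}
for first, second in outcomes:
    x = first + second
    prob[x] = prob.get(x, Fraction(0)) + Fraction(1, len(outcomes))

# Cumulative distribution u(x): probability to observe x or a smaller value.
cumulative = {}
running = Fraction(0)
for x in sorted(prob):
    running += prob[x]
    cumulative[x] = running

for x in sorted(prob):
    print(x, prob[x], cumulative[x])
```

Exact fractions are used instead of floats so that the normalization u(12) = 1 holds without rounding.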
Probability density function (p.d.f.)
● Repeat an experiment whose outcome is characterized by a single continuous variable x
● Definition: the probability to measure a value in the interval [x,x+dx] is given by the probability density function f(x) (pdf, German: “Wahrscheinlichkeitsdichte”): $P(x \in [x, x+dx]) = f(x)\,dx$
P is a “measure” of how often a value of x occurs in a given sample.
The pdf satisfies f(x) >= 0 and is normalized: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
The pdf f(x) is NOT a probability; it has the dimension of 1/x!
Cumulative distribution function (c.d.f.)
● Cumulative distribution function F(x), also known as probability distribution function (German: “Wahrscheinlichkeitsverteilung” = “Verteilungsfunktion”)
● F(x') is interpreted as the probability to find a value x <= x'
● F(x) is a continuous, non-decreasing function
● F(-∞) = 0 and F(+∞) = 1
● F(x) is directly related to the probability density function f(x) by: $F(x') = \int_{-\infty}^{x'} f(x)\,dx$
● f(x) is given (for well-behaved distributions) by: $f(x) = \frac{dF(x)}{dx}$
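The relation between pdf and c.d.f. can be checked numerically. The sketch below assumes a standard normal density as the example f(x), which is a choice made here and not taken from the slides. It integrates f with the trapezoidal rule to approximate F, then differentiates F by finite differences to recover f.

```python
import math

def pdf(x):
    """Standard normal density, chosen only as an illustrative f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x_prime, lower=-10.0, steps=20000):
    """Approximate F(x') = integral of f(x) up to x' (trapezoidal rule).

    The lower limit -10 stands in for -infinity; the tail below it is negligible.
    """
    width = (x_prime - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x_prime))
    for i in range(1, steps):
        total += pdf(lower + i * width)
    return total * width

def derivative_of_cdf(x, h=1e-3):
    """Central finite difference approximating f(x) = dF/dx."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

print(cdf(0.0))                          # close to 0.5 by symmetry
print(cdf(10.0))                         # close to 1 (normalization)
print(derivative_of_cdf(1.0), pdf(1.0))  # the two values should nearly agree
```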
Average
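● arithmetic mean of data set: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$
● geometric mean of data set: $\bar{x}_{geo} = \left( \prod_{i=1}^{N} x_i \right)^{1/N}$
● harmonic mean of data set: $\bar{x}_{harm} = N \left( \sum_{i=1}^{N} \frac{1}{x_i} \right)^{-1}$
The 3 Pythagorean means satisfy: harmonic mean <= geometric mean <= arithmetic mean (for positive values)
● weighted mean of data set: $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$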
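A minimal sketch of the three Pythagorean means follows, using a small hypothetical data set of positive numbers. The function names are chosen here for illustration.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Computed through logarithms to avoid overflow of the product;
    # requires all values to be > 0.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # Requires all values to be non-zero.
    return len(values) / sum(1.0 / v for v in values)

data = [1.0, 2.0, 4.0]  # hypothetical data set
print(harmonic_mean(data), geometric_mean(data), arithmetic_mean(data))
```

For this data set the printed values increase from left to right, in line with the ordering of the three means.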
Example: Arithmetic Mean
● The average number of children per family in Germany is 2.3
● The average life expectancy is 74 years for men and 78 years for women
● The average number of semesters for physics studies in Heidelberg is 11.2
● ...
Example: Geometric Mean
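● Needed to average multiplicative quantities:
– interest rate per year over the last 5 years
● 2002: 2.5 %
● 2003: 2.5 %
● 2004: 3.0 %
● 2005: 3.5 %
● 2006: 3.5 %
● after five years the capital has grown by the factor: $1.025 \cdot 1.025 \cdot 1.030 \cdot 1.035 \cdot 1.035$
● comparable average interest: the geometric mean of the yearly growth factors, minus 1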
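The interest example can be worked through numerically. This is a sketch using the five yearly rates listed above; the variable names are illustrative.

```python
import math

# Yearly interest rates from the example above (2002 to 2006).
rates = [0.025, 0.025, 0.030, 0.035, 0.035]

# Growth factor after five years: product of the yearly factors (1 + rate).
growth = math.prod(1.0 + r for r in rates)

# Comparable constant yearly rate: geometric mean of the factors, minus 1.
average_rate = growth ** (1.0 / len(rates)) - 1.0

# Arithmetic mean of the rates, for comparison only.
arithmetic_rate = sum(rates) / len(rates)

print(f"growth factor after five years: {growth:.4f}")
print(f"geometric-mean rate: {100 * average_rate:.3f} %")
print(f"arithmetic-mean rate: {100 * arithmetic_rate:.3f} %")
```

The geometric-mean rate comes out slightly below the arithmetic mean of the rates. Only the geometric-mean rate, applied for five years, reproduces the actual growth factor.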
Example: Harmonic Mean
● Travel half of the distance at 40 km/h and the other half at 60 km/h. What is the average speed?
● Travel half of the time at 40 km/h and the other half at 60 km/h. What is the average speed?
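The two questions have different answers. For equal distances the travel times add, so the average speed is the harmonic mean of the speeds. For equal times the distances add, so it is the arithmetic mean. A short sketch, with function names chosen here for illustration:

```python
def average_speed_equal_distance(speeds):
    """Each speed is driven over the same distance: harmonic mean."""
    return len(speeds) / sum(1.0 / v for v in speeds)

def average_speed_equal_time(speeds):
    """Each speed is driven for the same time: arithmetic mean."""
    return sum(speeds) / len(speeds)

speeds = [40.0, 60.0]  # km/h
print(average_speed_equal_distance(speeds))  # approximately 48 km/h
print(average_speed_equal_time(speeds))      # 50.0 km/h
```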
Example: Weighted Mean
● 5 measurements { } with different uncertainties { }
● arithmetic mean is special case of weighted means (same weight for each measurement)
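A sketch of a weighted mean for measurements with different uncertainties follows. The numbers are invented for illustration, and the weighting scheme (inverse variance, w_i = 1/σ_i²) is an assumption made here, since the slide text does not state which weights are used.

```python
def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical measurements and uncertainties, invented for illustration only.
measurements = [10.2, 9.8, 10.5, 10.0, 9.6]
uncertainties = [0.1, 0.2, 0.5, 0.1, 0.4]

# Assumed weighting scheme: inverse variance, w_i = 1 / sigma_i**2.
weights = [1.0 / s**2 for s in uncertainties]

print(weighted_mean(measurements, weights))
# Equal weights reproduce the arithmetic mean.
print(weighted_mean(measurements, [1.0] * len(measurements)))
```

With inverse-variance weights, the precise measurements dominate the result and the imprecise ones contribute little.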
... even more averages
● Mode (German: “Modus”): most probable value (highest bin in the distribution); the definition is not unique (unimodal vs. bimodal distributions)
● Median: smallest value which is ≥ 50% of the events (the median is more robust against outliers than the arithmetic mean)
For unimodal, symmetric functions centered around : median = mode = arithmetic mean
Otherwise, for unimodal functions, an empirical rule holds: mean - mode ≈ 3 × (mean - median)
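Mean, median and mode can be compared on a small sample with Python's standard `statistics` module. The sample below is hypothetical and contains one large outlier to show how differently the mean and the median react to it.

```python
import statistics

# Hypothetical sample with one large outlier, invented for illustration.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

mean = statistics.mean(sample)
# median_low returns a value from the data: the lower of the two middle values
# when the sample size is even, i.e. the smallest value >= half of the events.
median = statistics.median_low(sample)
modes = statistics.multimode(sample)  # list, since the mode need not be unique

print(mean, median, modes)

# Removing the outlier moves the mean a lot, the median not at all here.
trimmed = sample[:-1]
print(statistics.mean(trimmed), statistics.median_low(trimmed))
```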
Examples:
● Give median, mode, arithmetic mean of:
Energy loss in Material
● Assume tracking stations with 12 layers
● The energy loss per traversed material (dE/dx) of a particle traversing a layer follows a Landau distribution
[Figure: Landau distribution of dE/dx with the mode marked]
Energy loss in Material
● The mode of the dE/dx distribution as a function of particle momentum is used to separate different particle species
[Figure: dE/dx as a function of momentum [GeV/c]]
How to get an estimate of the mode from 12 measurements?
Truncated Mean
● Discard the 10 to 20 % of measurements with the largest values (this symmetrizes the distribution)
● Take the arithmetic mean of the remaining ones
● This is all about estimating the true mode/median/mean of a distribution from a given data set. More about estimators later.
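The truncated mean can be sketched in a few lines. The 12 readings below are invented for illustration and only mimic a distribution with a long tail towards large values. Rounding the number of discarded measurements down is an assumption of this sketch.

```python
import math

def truncated_mean(values, discard_fraction=0.2):
    """Arithmetic mean after discarding the largest values.

    discard_fraction is the share of measurements dropped from the high tail;
    the number dropped is rounded down (an assumption of this sketch).
    """
    n_discard = math.floor(discard_fraction * len(values))
    kept = sorted(values)[: len(values) - n_discard]
    return sum(kept) / len(kept)

# Hypothetical dE/dx readings from 12 layers (arbitrary units), with a long
# tail towards large values.
readings = [1.1, 1.3, 0.9, 1.2, 1.0, 1.4, 1.2, 1.1, 3.8, 1.3, 1.0, 6.5]

print(sum(readings) / len(readings))  # plain mean, pulled up by the tail
print(truncated_mean(readings, 0.2))  # the 2 largest of 12 values are discarded
```

For these readings the truncated mean lies much closer to the bulk of the values than the plain mean does.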
Citation ....
“The rate of people living in poverty in the USA has risen dramatically. Half of all people have an income below the average.”
from “So lügt man mit Statistik”
Measure the Spread
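● How to characterise width/spread?
● First thought: mean deviation from the mean, which always vanishes: $\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}) = 0$, with $\bar{x}$ the arithmetic mean
Could consider the average absolute deviation: $\frac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}|$
However, this is hard to handle mathematically.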
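A short numerical illustration on a hypothetical data set: the signed deviations from the mean cancel, while the absolute deviations give a usable measure of the spread.

```python
def mean(values):
    return sum(values) / len(values)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data set
centre = mean(data)

# Signed deviations cancel, so their mean carries no information on the spread.
mean_deviation = mean([x - centre for x in data])

# Absolute deviations do not cancel.
mean_absolute_deviation = mean([abs(x - centre) for x in data])

print(mean_deviation)           # 0.0 for this data set
print(mean_absolute_deviation)  # 1.5
```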
Variance
● A much better quantity: $V = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$
the mean square deviation, called the sample variance
● For any function :
Variance
● For data analysis, preferably loop only once over the data: $V = \overline{x^2} - \bar{x}^2$
(mean square minus square of the mean)
For large numbers, safer to shift distribution by estimated mean :
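The single-loop variance and the effect of the shift can be sketched as follows. The data are hypothetical: a small spread on top of a large offset. Using the first data point as the rough mean estimate is an assumption made here for illustration.

```python
def variance_one_pass(values, shift=0.0):
    """Sample variance (1/N convention) from a single loop over the data.

    Uses V = mean(y**2) - mean(y)**2 with y = x - shift. The variance does not
    depend on the shift, but a shift close to the true mean keeps the two
    accumulated sums small and reduces cancellation errors.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        y = x - shift
        n += 1
        total += y
        total_sq += y * y
    mean_y = total / n
    return total_sq / n - mean_y * mean_y

# Hypothetical data: small spread on top of a large offset.
data = [1.0e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

print(variance_one_pass(data))                 # unshifted: prone to rounding errors
print(variance_one_pass(data, shift=data[0]))  # shifted by a rough estimate: 22.5
```

Without the shift, the squares are of order 10^18 and exceed double precision, so subtracting two nearly equal large numbers loses the small variance. With the shift, the sums stay small and the result is exact here.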
Standard Deviation (R.M.S.), FWHM
● standard deviation σ or RMS (root mean square): σ is the square root of the variance
[“standard” is a joke, there are several standards in the literature ...]
● FWHM: full width at half maximum
– more robust against outliers and fluctuations
– harder to determine at low statistics
– for a Gaussian distribution: FWHM = 2.35σ
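The factor 2.35 for a Gaussian follows from solving exp(-x²/(2σ²)) = 1/2, which gives FWHM = 2·sqrt(2 ln 2)·σ. A minimal numerical check, with an arbitrary σ chosen here:

```python
import math

def gaussian(x, sigma):
    """Unnormalized Gaussian centred at zero; the peak height is 1."""
    return math.exp(-0.5 * (x / sigma) ** 2)

sigma = 1.7  # arbitrary width chosen for the check

# Solving gaussian(x) = 1/2 gives x = +- sigma * sqrt(2 ln 2).
half_width = sigma * math.sqrt(2.0 * math.log(2.0))
fwhm = 2.0 * half_width

print(gaussian(half_width, sigma))  # close to 0.5
print(fwhm / sigma)                 # close to 2.355
```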