Statistical Theory: Introduction

Statistical Theory
Victor Panaretos
École polytechnique fédérale de Lausanne

Outline:
1 What is This Course About?
2 Elements of a Statistical Model
3 Parameters and Parametrizations
4 Some Motivating Examples

Statistical Theory (Week 1) Introduction 1 / 16

What is This Course About?

Statistics: Extracting Information from Data
- Age of Universe (Astrophysics)
- Random Networks (Internet)
- Microarrays (Genetics)
- Inflation (Economics)
- Stock Markets (Finance)
- Phylogenetics (Evolution)
- Pattern Recognition (Artificial Intelligence)
- Molecular Structure (Structural Biology)
- Climate Reconstruction (Paleoclimatology)
- Seal Tracking (Marine Biology)
- Quality Control (Mass Production)
- Disease Transmission (Epidemics)

"We may at once admit that any inference from the particular to the general must be attended with some degree of uncertainty, but this is not the same as to admit that such inference cannot be absolutely rigorous, for the nature and degree of the uncertainty may itself be capable of rigorous expression." — Ronald A. Fisher

"The object of rigor is to sanction and legitimize the conquests of intuition, and there was never any other object for it." — Jacques Hadamard

- The variety of different forms of data is bewildering...
- ...but the concepts involved in their analysis show fundamental similarities.
- Embed them in a framework and study them rigorously.
- Is there a unified mathematical theory?

What is This Course About?

Statistical Theory: What and How?
- What? The rigorous study of the procedure of extracting information from data, using the formalism and machinery of mathematics.
- How? By thinking of data as outcomes of probability experiments:
  - Probability offers a natural language to describe uncertainty or partial knowledge
  - Deep connections between probability and formal logic
  - Can break down a phenomenon into systematic and random parts

The Job of the Probabilist
Given a probability model P on a measurable space (Ω, F), find the probability P[A] that the outcome of the experiment is A ∈ F.

The Job of the Statistician
Given an outcome A ∈ F (the data) of a probability experiment on (Ω, F), tell me something interesting* about the (unknown) probability model P that generated the outcome.
(*something in addition to what I knew before observing the outcome A)

What can Data be?
To do probability we simply need a measurable space (Ω, F). Hence, almost anything that can be mathematically expressed can be thought of as data (numbers, functions, graphs, shapes, ...).

Such questions can be:
1 Are the data consistent with a certain model?
2 Given a family of models, can we determine which model generated the data?
These give birth to more questions: how can we answer 1 and 2? is there a best way? how much "off" is our answer?

A Probabilist and a Statistician Flip a Coin

Example
Let X1, ..., X10 denote the results of flipping a coin ten times, with
Xi = 0 if heads, 1 if tails, i = 1, ..., 10.
A plausible model is Xi ~iid Bernoulli(θ). We record the outcome
X = (0, 0, 0, 1, 0, 1, 1, 1, 1, 1).

The Probabilist asks:
- Probability of the outcome as a function of θ?
- Probability of a k-long run?
- If we keep tossing, how many k-long runs? How long until a k-long run?

Example (cont'd)
The Statistician asks:
- Is the coin fair?
- What is the true value of θ given X?
- How much error do we make when trying to decide the above from X?
- How does our answer change if X is perturbed?
- Is there a "best" solution to the above problems?
- How sensitive are our answers to departures from Xi ~iid Bernoulli(θ)?
- How do our "answers" behave as #tosses → ∞?
- How many tosses would we need until we can get "accurate answers"?
- Does our model agree with the data?
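The probabilist's first question — the probability of the recorded outcome as a function of θ — can be sketched numerically. A minimal illustration (the grid search and variable names are our own, not part of the slides):

```python
# Recorded outcome from the example: X = (0,0,0,1,0,1,1,1,1,1).
x = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

def likelihood(theta, xs):
    """P[X = x] under the iid Bernoulli(theta) model."""
    t, n = sum(xs), len(xs)
    return theta**t * (1 - theta)**(n - t)

# The probability of the outcome as a function of theta, on a grid;
# it is maximised at the sample proportion 6/10.
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda th: likelihood(th, x))
print(best)  # 0.6
```

The maximiser 0.6 is a first, informal answer to the statistician's question "what is the true value of θ given X?" — a theme developed later in the course.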

The Basic Setup

Elements of a Statistical Model:
- Have a random experiment with sample space Ω.
- X : Ω → R^n is a random variable, X = (X1, ..., Xn), defined on Ω.
- When the outcome of the experiment is ω ∈ Ω, we observe X(ω) and call it the data (usually ω is omitted).
- The probability experiment of observing a realisation of X is completely determined by the distribution F of X.
- F is assumed to be a member of a family F of distributions on R^n.

Goal: Learn about F ∈ F given the data X.

The Basic Setup: An Illustration

Example (Coin Tossing)
Consider the following probability space:
- Ω = [0, 1]^n with elements ω = (ω1, ..., ωn) ∈ Ω
- F are the Borel subsets of Ω (the product σ-algebra)
- P is the uniform probability measure (Lebesgue measure) on [0, 1]^n
Now we can define the experiment of n coin tosses as follows:
- Let θ ∈ (0, 1) be a constant
- For i = 1, ..., n let Xi = 1{ωi > θ}
- Let X = (X1, ..., Xn), so that X : Ω → {0, 1}^n
Then
F_{Xi}(xi) = P[Xi ≤ xi] = 0 if xi ∈ (−∞, 0); θ if xi ∈ [0, 1); 1 if xi ∈ [1, +∞),
and F_X(x) = ∏_{i=1}^n F_{Xi}(xi).

Describing Families of Distributions: Parametric Models
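The coin-tossing construction above can be simulated directly: draw a uniform point ω of [0,1]^n and set Xi = 1{ωi > θ}. A small sketch (seed, sample size, and the value of θ are arbitrary choices, not from the slides):

```python
import random

random.seed(0)
theta, n = 0.3, 100_000

# A point omega of Omega = [0,1]^n under the uniform (Lebesgue) measure:
omega = [random.random() for _ in range(n)]
# The data X(omega), with X_i = 1{omega_i > theta}:
x = [1 if w > theta else 0 for w in omega]

# P[X_i = 0] = P[omega_i <= theta] = theta, matching F_{X_i}(0) = theta.
frac_zeros = 1 - sum(x) / n
print(frac_zeros)  # should be close to theta = 0.3
```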

Definition (Parametrization)
Let Θ be a set, F be a family of distributions and g : Θ → F an onto mapping. The pair (Θ, g) is called a parametrization of F.

Definition (Parametric Model)
A parametric model with parameter space Θ ⊆ R^d is a family of probability models F parametrized by Θ, F = {F_θ : θ ∈ Θ}.

Example (IID Normal Model)
F = { ∏_{i=1}^n ∫_{−∞}^{x_i} (1/(σ√(2π))) e^{−(y_i − µ)²/(2σ²)} dy_i : (µ, σ²) ∈ R × R_+ }

Example (Geometric Distribution)
Let X1, ..., Xn be iid geometric(p) distributed: P[Xi = k] = p(1 − p)^k, k ∈ N ∪ {0}. Two possible parametrizations are:
1 [0, 1] ∋ p ↦ geometric(p)
2 [0, ∞) ∋ µ ↦ geometric with mean µ

Example (Poisson Distribution)
Let X1, ..., Xn be Poisson(λ) distributed: P[Xi = k] = e^{−λ} λ^k / k!, k ∈ N ∪ {0}. Three possible parametrizations are:
1 [0, ∞) ∋ λ ↦ Poisson(λ)
2 [0, ∞) ∋ µ ↦ Poisson with mean µ
3 [0, ∞) ∋ σ² ↦ Poisson with variance σ²

- When Θ is not Euclidean, we call F non-parametric.
- When Θ is a product of a Euclidean and a non-Euclidean space, we call F semi-parametric.

Identifiability

- A parametrization is often suggested by the phenomenon we are modelling.
- But any set Θ and surjection g : Θ → F give a parametrization. Many parametrizations are possible! Is any parametrization sensible?

Definition (Identifiability)
A parametrization (Θ, g) of a family of models F is called identifiable if g : Θ → F is a bijection (i.e. if g is injective on top of being surjective).

When a parametrization is not identifiable:
- We have θ1 ≠ θ2 but F_{θ1} = F_{θ2}.
- Even with ∞ amounts of data we could not distinguish θ1 from θ2.

Definition (Parameter)
A parameter is a function ν : F → N, where N is arbitrary.
- A parameter is a feature of the distribution F_θ.
- When θ ↦ F_θ is identifiable, then ν(F_θ) = q(θ) for some q : Θ → N.

Example (Binomial Thinning)
Let {B_{i,j}} be an infinite iid array of Bernoulli(ψ) variables and ξ1, ..., ξn be an iid sequence of geometric(p) random variables with probability mass function P[ξi = k] = p(1 − p)^k, k ∈ N ∪ {0}. Let X1, ..., Xn be iid random variables defined by
X_j = Σ_{i=1}^{ξ_j} B_{i,j},  j = 1, ..., n.
Any F_X ∈ F is completely determined by (ψ, p), so [0, 1]² ∋ (ψ, p) ↦ F_X is a parametrization of F. One can show (how?) that
X ~ geometric( p / (ψ(1 − p) + p) ).
However, (ψ, p) is not identifiable (why?).
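The thinning identity and the failure of identifiability can both be checked by simulation. A sketch: the two parameter pairs (ψ, p) = (1/2, 1/2) and (1/3, 2/5) are our own choices, picked because both induce the same q = p/(ψ(1 − p) + p) = 2/3, hence the same distribution for X (with mean (1 − q)/q = 1/2):

```python
import random

def thinned_mean(psi, p, reps, rng):
    """Average of X = sum_{i=1}^{xi} B_i, xi ~ geometric(p), B_i ~ Bernoulli(psi)."""
    total = 0
    for _ in range(reps):
        xi = 0
        while rng.random() >= p:      # P[xi = k] = p(1-p)^k, k = 0, 1, 2, ...
            xi += 1
        total += sum(rng.random() < psi for _ in range(xi))
    return total / reps

rng = random.Random(1)
reps = 200_000
m1 = thinned_mean(1/2, 1/2, reps, rng)
m2 = thinned_mean(1/3, 2/5, reps, rng)

# Indistinguishable samples from two different parameter values (psi, p):
print(m1, m2)  # both close to (1-q)/q = 0.5
```

The two empirical means agree (up to Monte Carlo error), illustrating why (ψ, p) cannot be recovered from data.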

Parametric Inference for Regular Models
- We will focus on parametric families F.
- The aspects we will wish to learn about will be parameters of F ∈ F.

Regular Models
Assume from now on that in any parametric model we consider, either:
1 All of the F_θ are continuous with densities f(x, θ); or
2 All of the F_θ are discrete with frequency functions p(x, θ), and there exists a countable set A that is independent of θ such that Σ_{x ∈ A} p(x, θ) = 1 for all θ ∈ Θ.

We will be considering the mathematical aspects of problems such as:
1 Estimating which θ ∈ Θ (i.e. which F_θ ∈ F) generated X
2 Deciding whether some hypothesized values of θ are consistent with X
3 The performance of methods and the existence of optimal methods
4 What happens when our model is wrong?

Examples

Example (Five Examples)
- Sampling Inspection (Hypergeometric Distribution)
- Problem of Location (Location-Scale Families)
- Regression Models (Non-identically distributed data)
- Autoregressive Measurement Error Model (Dependent data)
- Random Projections of Triangles (Shape Theory)

Statistical Theory
Victor Panaretos
École Polytechnique Fédérale de Lausanne

Outline:
1 Motivation: Functions of Random Variables
2 Stochastic Convergence
  - Overview of Stochastic Convergence
  - How does a R.V. "Converge"?
  - Convergence in Probability and in Distribution
3 Useful Theorems
  - Weak Convergence of Random Vectors
4 Stronger Notions of Convergence
5 The Two "Big" Theorems

Statistical Theory (Week 2) Stochastic Convergence 1 / 21

Functions of Random Variables

Let X1, ..., Xn be i.i.d. with EXi = µ and var[Xi] = σ². Consider:
X̄n = (1/n) Σ_{i=1}^n Xi.
- If Xi ~ N(µ, σ²) or Xi ~ exp(1/µ), then we know dist[X̄n].
- But the Xi may come from some more general distribution.
- The joint distribution of the Xi may not even be completely understood.
- We would like to be able to say something about X̄n even in those cases!
We want to understand the distribution of Y = g(X1, ..., Xn) for some general g:
- Often intractable.
- Resort to asymptotic approximations to understand the behaviour of Y.
- Perhaps this is not easy for fixed n, but what about letting n → ∞? (a very common approach in mathematics)

Once we assume that n → ∞ we start understanding dist[X̄n] better:
- At a crude level, X̄n becomes concentrated around µ:
  P[|X̄n − µ| < ε] ≈ 1, ∀ε > 0, as n → ∞.
- Perhaps more informative is to look at the "magnified difference":
  P[√n(X̄n − µ) ≤ x] → ? as n → ∞, which could yield an approximation to P[X̄n ≤ x].
- Warning: while a lot is known about asymptotics, they are often misused (n small!).
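The two displays above can be illustrated by simulation: the spread of X̄n shrinks like n^{-1/2}, while the "magnified difference" √n(X̄n − µ) keeps a stable spread. A sketch using exponential data with µ = σ = 1 (the distribution, seed, and sample sizes are our own choices):

```python
import random, statistics

rng = random.Random(42)
reps = 2000
results = {}

for n in (10, 100, 1000):
    # reps realisations of the sample mean of n iid exp(1) variables:
    bars = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    sd_raw = statistics.pstdev(bars)                                # ~ sigma/sqrt(n)
    sd_mag = statistics.pstdev([(b - 1.0) * n**0.5 for b in bars])  # stabilises near sigma = 1
    results[n] = (sd_raw, sd_mag)
    print(n, round(sd_raw, 3), round(sd_mag, 3))
```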

Convergence of Random Variables

Need to make precise what we mean by:
- "Yn is concentrated around µ as n → ∞"
- More generally, what "Yn behaves like Y for large n" means
- dist[g(X1, ..., Xn)] ≈ ? as n → ∞
We need appropriate notions of convergence for random variables.
Recall: random variables are functions between measurable spaces.
⇒ Convergence of random variables can be defined in various ways:
- Convergence in probability (convergence in measure)
- Convergence in distribution (weak convergence)
- Convergence with probability 1 (almost sure convergence)
- Convergence in L^p (convergence in the p-th moment)
Each of these is qualitatively different - some notions are stronger than others.

Convergence in Probability

Definition (Convergence in Probability)
Let {Xn}_{n≥1} and X be random variables defined on the same probability space. We say that Xn converges in probability to X as n → ∞ (and write Xn →p X) if for any ε > 0,
P[|Xn − X| > ε] → 0 as n → ∞.
Intuitively, if Xn →p X, then with high probability Xn ≈ X for large n.

Example
Let X1, ..., Xn ~iid U[0, 1], and define Mn = max{X1, ..., Xn}. Then,
F_{Mn}(x) = x^n ⇒ P[|Mn − 1| > ε] = P[Mn < 1 − ε] = (1 − ε)^n → 0 as n → ∞,
for any 0 < ε < 1. Hence Mn →p 1.

Convergence in Distribution

Definition (Convergence in Distribution)
Let {Xn} and X be random variables (not necessarily defined on the same probability space). We say that Xn converges in distribution to X as n → ∞ (and write Xn →d X) if
P[Xn ≤ x] → P[X ≤ x] as n → ∞,
at every continuity point of F_X(x) = P[X ≤ x].

Example
Let X1, ..., Xn ~iid U[0, 1], Mn = max{X1, ..., Xn}, and Qn = n(1 − Mn). Then
P[Qn ≤ x] = P[Mn ≥ 1 − x/n] = 1 − (1 − x/n)^n → 1 − e^{−x} as n → ∞,
for all x ≥ 0. Hence Qn →d Q, with Q ~ exp(1).

Equivalent definition: Xn →d X ⇔ Ef(Xn) → Ef(X) for all continuous and bounded f.

Some Comments on "→p" and "→d"
- Convergence in probability implies convergence in distribution.
- Convergence in distribution does NOT imply convergence in probability: for X ~ N(0, 1), the sequence Xn := −X + 1/n satisfies Xn →d X but Xn →p −X ≠ X.
- "→d" relates distribution functions; it can be used to approximate distributions (approximation error?).
- Both notions of convergence are metrizable, i.e. there exist metrics on the space of random variables and on distribution functions that are compatible with the notion of convergence. Hence we can use things such as the triangle inequality.
- "→d" is also known as "weak convergence" (we will see why).

Some Basic Results
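The limit in the Qn example can be checked numerically from the exact distribution function P[Qn ≤ x] = 1 − (1 − x/n)^n. A deterministic sketch (the grid of x-values is an arbitrary choice):

```python
import math

def cdf_Qn(x, n):
    # Q_n = n(1 - M_n), with F_{M_n}(x) = x^n on [0, 1]
    return 1 - (1 - x / n) ** n

def cdf_exp1(x):
    return 1 - math.exp(-x)

xs = [0.1 * k for k in range(1, 50)]
gaps = []
for n in (10, 100, 1000):
    gaps.append(max(abs(cdf_Qn(x, n) - cdf_exp1(x)) for x in xs))
print(gaps)  # shrinking towards 0 as n grows
```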

Theorem
(a) Xn →p X ⇒ Xn →d X
(b) Xn →d c ⇒ Xn →p c, for a constant c ∈ R.

Proof.
(a) Let x be a continuity point of F_X and ε > 0. Then,
P[Xn ≤ x] = P[Xn ≤ x, |Xn − X| ≤ ε] + P[Xn ≤ x, |Xn − X| > ε]
          ≤ P[X ≤ x + ε] + P[|Xn − X| > ε],
since {X ≤ x + ε} contains {Xn ≤ x, |Xn − X| ≤ ε}. Similarly,
P[X ≤ x − ε] = P[X ≤ x − ε, |Xn − X| ≤ ε] + P[X ≤ x − ε, |Xn − X| > ε]
            ≤ P[Xn ≤ x] + P[|Xn − X| > ε],
which yields P[X ≤ x − ε] − P[|Xn − X| > ε] ≤ P[Xn ≤ x].
Combining the two inequalities and "sandwiching" yields the result.

(b) Let F be the distribution function of the constant r.v. c:
F(x) = P[c ≤ x] = 1 if x ≥ c; 0 if x < c.
Then
P[|Xn − c| > ε] = P[{Xn − c > ε} ∪ {c − Xn > ε}]
               = P[Xn > c + ε] + P[Xn < c − ε]
               ≤ 1 − P[Xn ≤ c + ε] + P[Xn ≤ c − ε]
               → 1 − F(c + ε) + F(c − ε) = 0 as n → ∞,
since Xn →d c. ∎

Theorem (Continuous Mapping Theorem)
Let g : R → R be a continuous function. Then,
(a) Xn →p X ⇒ g(Xn) →p g(X)
(b) Yn →d Y ⇒ g(Yn) →d g(Y)

Exercise
Prove part (a). You may assume without proof the Subsequence Lemma:
Xn →p X if and only if every subsequence X_{n_m} of X_n has a further subsequence X_{n_{m(k)}} such that P[X_{n_{m(k)}} → X] = 1.

Theorem (Slutsky's Theorem)
Let Xn →d X and Yn →d c ∈ R. Then
(a) Xn + Yn →d X + c
(b) Xn Yn →d cX

Proof of Slutsky's Theorem.
(a) We may assume c = 0. Let x be a continuity point of F_X. We have
P[Xn + Yn ≤ x] = P[Xn + Yn ≤ x, |Yn| ≤ ε] + P[Xn + Yn ≤ x, |Yn| > ε]
              ≤ P[Xn ≤ x + ε] + P[|Yn| > ε].
Similarly, P[Xn ≤ x − ε] ≤ P[Xn + Yn ≤ x] + P[|Yn| > ε]. Therefore,
P[Xn ≤ x − ε] − P[|Yn| > ε] ≤ P[Xn + Yn ≤ x] ≤ P[Xn ≤ x + ε] + P[|Yn| > ε].
Taking n → ∞, and then ε → 0, proves (a).
(b) By (a) we may assume that c = 0 (check). Let ε, M > 0:
P[|Xn Yn| > ε] ≤ P[|Xn Yn| > ε, |Yn| ≤ 1/M] + P[|Yn| ≥ 1/M]
             ≤ P[|Xn| > εM] + P[|Yn| ≥ 1/M]
             → P[|X| > εM] + 0 as n → ∞.
The first term can be made arbitrarily small by letting M → ∞. ∎

Statistical Theory (Week 2) Stochastic Convergence 11 / 21 Statistical Theory (Week 2) Stochastic Convergence 12 / 21 Theorem (General Version of Slutsky’s Theorem) Theorem (The Delta Method) d d Let g : R R R be continuous and suppose that Xn X and × → → Let Zn := an(Xn θ) Z where an, θ R for all n and an . Let g( ) d − → ∈ ↑d ∞ · Yn c R. Then, g(Xn, Yn) g(X , c) as n . be continuously differentiable at θ. Then, a (g(X ) g(θ)) g (θ)Z. → ∈ → → ∞ n n − → 0 , Notice that the general version of Slutsky’s theorem does not follow → immediately from the continuous mapping theorem. Proof Taylor expanding around θ gives: The continuous mapping theorem would be applicable if (Xn, Yn) weakly converged jointly (i.e. their joint distribution) to (X , c). g(Xn) = g(θ) + g 0(θ∗)(Xn θ), θ∗ between Xn, θ. d n − n But here we assume only marginal convergence (i.e. Xn X and d → 1 1 p Yn c separately, but their joint behaviour is unspecified). Thus θn∗ θ < Xn θ = an− an(Xn θ) = an− Zn 0 [by Slutsky] → | − | p | − | · | − | → p d Therefore, θn∗ θ. By the continuous mapping theorem g 0(θn∗) g 0(θ). The key of the proof is that in the special case where Yn c where c → → → is a constant, then marginal convergence joint convergence. ⇐⇒ Thus an(g(Xn) g(θ)) = an(g(θ) + g 0(θn∗)(Xn θ) g(θ)) d d − − − However if Xn X where X is non-degenerate, and Yn Y where d → → = g 0(θn∗)an(X θ) g 0(θ)Z. Y is non-degenerate, then the theorem fails. − → Notice that even the special cases (addition and multiplication) of The delta method actually applies even when g 0(θ) is not continuous Slutsky’s theorem fail of both X and Y are non-degenerate. (proof uses Skorokhod representation).


Theorem (Convergence of Expectations)
If |Xn| < M < ∞ and Xn →d X, then EX exists and EXn → EX as n → ∞.

Proof.
Assume first that the Xn are non-negative ∀n. Then,
|EXn − EX| = |∫_0^M (P[Xn > x] − P[X > x]) dx| ≤ ∫_0^M |P[Xn > x] − P[X > x]| dx → 0 as n → ∞,
since M < ∞ and the integration domain is bounded. ∎
Exercise: Generalise the proof to arbitrary random variables.
Exercise: Give a counterexample to show that neither Xn →p X nor Xn →d X alone ensures that EXn → EX as n → ∞.

Remarks on Weak Convergence
- Often difficult to establish weak convergence directly (from the definition).
- Indeed, if Fn is known, establishing weak convergence is "useless".
- Need other, more "handy", sufficient conditions.

Scheffé's Theorem
Let Xn have density functions (or probability functions) fn, and let X have density function (or probability function) f. Then
fn → f (a.e.) as n → ∞ ⇒ Xn →d X.
- The converse to Scheffé's theorem is NOT true (why?).

Continuity Theorem
Let Xn and X have characteristic functions φn(t) = E[e^{itXn}] and φ(t) = E[e^{itX}], respectively. Then,
(a) Xn →d X ⇔ φn → φ pointwise.
(b) If φn(t) converges pointwise to some limit function ψ(t) that is continuous at zero, then:
(i) ∃ a measure ν with c.f. ψ;
(ii) F_{Xn} →w ν.

Weak Convergence of Random Vectors

Definition
Let {Xn} be a sequence of random vectors of R^d, and X a random vector of R^d, with Xn = (Xn^(1), ..., Xn^(d))^T and X = (X^(1), ..., X^(d))^T. Define the distribution functions F_{Xn}(x) = P[Xn^(1) ≤ x^(1), ..., Xn^(d) ≤ x^(d)] and F_X(x) = P[X^(1) ≤ x^(1), ..., X^(d) ≤ x^(d)], for x = (x^(1), ..., x^(d))^T ∈ R^d. We say that Xn converges in distribution to X as n → ∞ (and write Xn →d X) if for every continuity point x of F_X we have
F_{Xn}(x) → F_X(x) as n → ∞.

There is a link between univariate and multivariate weak convergence:

Theorem (Cramér-Wold Device)
Let {Xn} be a sequence of random vectors of R^d, and X a random vector of R^d. Then,
Xn →d X ⇔ θ^T Xn →d θ^T X, ∀ θ ∈ R^d.

Almost Sure Convergence and Convergence in L^p

There are also two stronger convergence concepts (which do not compare with one another):

Definition (Almost Sure Convergence)
Let {Xn}_{n≥1} and X be random variables defined on the same probability space (Ω, F, P). Let A := {ω ∈ Ω : Xn(ω) → X(ω) as n → ∞}. We say that Xn converges almost surely to X as n → ∞ (and write Xn →a.s. X) if P[A] = 1.
More plainly, we say Xn →a.s. X if P[Xn → X] = 1.

Definition (Convergence in L^p)
Let {Xn}_{n≥1} and X be random variables defined on the same probability space. We say that Xn converges to X in L^p as n → ∞ (and write Xn →Lp X) if
E|Xn − X|^p → 0 as n → ∞.

Note that ‖X‖_{L^p} := (E|X|^p)^{1/p} defines a complete norm (when finite).

Relationship Between Different Types of Convergence
- Xn →a.s. X ⇒ Xn →p X ⇒ Xn →d X
- Xn →Lp X, for p > 0 ⇒ Xn →p X ⇒ Xn →d X
- Xn →Lp X ⇒ Xn →Lq X, for p ≥ q
- There is no implicative relationship between "→a.s." and "→Lp"

Theorem (Skorokhod's Representation Theorem)
Let {Xn}_{n≥1}, X be random variables defined on a probability space (Ω, F, P) with Xn →d X. Then there exist random variables {Yn}_{n≥1}, Y defined on some probability space (Ω′, G, Q) such that:
(i) Y =d X and Yn =d Xn, ∀n ≥ 1,
(ii) Yn →a.s. Y.

Exercise
Prove part (b) of the continuous mapping theorem.

Recalling two basic Theorems

For multivariate random variables, "→a.s." and "→p" are defined coordinatewise.

Theorem (Strong Law of Large Numbers)
Let {Xn} be pairwise iid random variables with EXk = µ and E|Xk| < ∞, for all k ≥ 1. Then,
(1/n) Σ_{k=1}^n Xk →a.s. µ.
- "Strong" is as opposed to the "weak" law, which requires EXk² < ∞ instead of E|Xk| < ∞ and gives "→p" instead of "→a.s.".

Theorem (Central Limit Theorem)
Let {Xn} be an iid sequence of random vectors in R^d with mean µ and covariance Σ, and define X̄n := Σ_{m=1}^n Xm / n. Then,
√n Σ^{−1/2} (X̄n − µ) →d Z ~ N_d(0, I_d).

Convergence Rates
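The (univariate) CLT above can be checked by simulation: a standardised sum of U[0,1] variables should land in [−1.96, 1.96] about 95% of the time. A sketch (the summand distribution, seed, and the 95% interval check are our own choices):

```python
import random

rng = random.Random(3)
mu, sigma = 0.5, (1 / 12) ** 0.5     # mean and sd of U[0,1]
n, reps = 1000, 4000

inside = 0
for _ in range(reps):
    s = sum(rng.random() for _ in range(n))
    z = (s - n * mu) / (sigma * n**0.5)   # approximately N(0,1) by the CLT
    inside += (-1.96 <= z <= 1.96)

cover = inside / reps
print(cover)  # close to the standard normal probability 0.95
```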

- Often convergence is not enough → how fast? [quality of approximation]
- Law of Large Numbers: assuming finite variance, L² rate of n^{−1/2}.
- What about the Central Limit Theorem?

Theorem (Berry-Esseen)
Let X1, ..., Xn be iid random vectors taking values in R^d and such that E[Xi] = 0, cov[Xi] = I_d. Define
Sn = (X1 + ... + Xn)/√n.
If A denotes the class of convex subsets of R^d, then for Z ~ N_d(0, I_d),
sup_{A ∈ A} |P[Sn ∈ A] − P[Z ∈ A]| ≤ C d^{1/4} E‖Xi‖³ / √n.
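On R the convex sets are intervals, so the n^{−1/2} rate can be glimpsed through the Kolmogorov distance between a standardised binomial sum and the normal. A deterministic sketch (the choice Binomial(n, 0.3) and the sample sizes are ours):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def kolmogorov_gap(n, p=0.3):
    """max_k |P[Binomial(n,p) <= k] - Phi((k - np)/sqrt(np(1-p)))|."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)
        gap = max(gap, abs(cdf - phi((k - mu) / sd)))
    return gap

gaps = [kolmogorov_gap(n) for n in (25, 100, 400)]
print(gaps)  # decays roughly like 1/sqrt(n)
```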

Principles of Data Reduction

Statistical Theory
Victor Panaretos
École Polytechnique Fédérale de Lausanne

Outline:
1 Statistics
2 Ancillarity
3 Sufficiency
  - Sufficient Statistics
  - Establishing Sufficiency
4 Minimal Sufficiency
  - Establishing Minimal Sufficiency
5 Completeness
  - Relationship between Ancillary and Sufficient Statistics
  - Relationship between completeness and minimal sufficiency

Statistical Theory (Week 3) Data Reduction 1 / 19

Statistical Models and The Problem of Inference

Recall our setup:
- Collection of r.v.'s (a random vector) X = (X1, ..., Xn)
- X ~ F_θ ∈ F
- F a parametric class with parameter θ ∈ Θ ⊆ R^d

The Problem of Inference:
1 Assume that F_θ is known up to the parameter θ, which is unknown.
2 Let (x1, ..., xn) be a realization of X ~ F_θ which is available to us.
3 Estimate the value of θ that generated the sample, given (x1, ..., xn).

The only guide (apart from knowledge of F) at hand is the data:
- Anything we "do" will be a function of the data, g(x1, ..., xn).
- Need to study the properties of such functions and the information loss incurred (any function of (x1, ..., xn) will carry at most the same information as the data, but usually less).

Statistics

Definition (Statistic)
Let X be a random sample from F_θ. A statistic T is a (measurable) function that maps X into R^d and does not depend on θ.
- Intuitively, any function of the sample alone is a statistic.
- Any statistic is itself a r.v. with its own distribution.

Example
t(X) = n^{−1} Σ_{i=1}^n Xi is a statistic (since n, the sample size, is known).

Example
T(X) = (X_{(1)}, ..., X_{(n)}), where X_{(1)} ≤ X_{(2)} ≤ ... ≤ X_{(n)} are the order statistics of X. Since T depends only on the values of X, T is a statistic.

Example
Let T(X) = c, where c is a known constant. Then T is a statistic.

Statistics and Information about θ

Evident from the previous examples:
- Some statistics are more informative and others less informative regarding the true value of θ.
- Any T(X) that is not "1-1" carries less information about θ than X.
- Which are "good" and which are "bad" statistics?

Definition (Ancillary Statistic)
A statistic T is an ancillary statistic (for θ) if its distribution does not functionally depend on θ.
- So an ancillary statistic has the same distribution ∀ θ ∈ Θ.

Example
Suppose that X1, ..., Xn ~iid N(µ, σ²) (where µ is unknown but σ² is known). Let T(X1, ..., Xn) = X1 − X2; then T has a normal distribution with mean 0 and variance 2σ². Thus T is ancillary for the unknown parameter µ. If both µ and σ² were unknown, T would not be ancillary for θ = (µ, σ²).

- If T is ancillary for θ, then T contains no information about θ.
- In order to contain any useful information about θ, the distribution of T must depend explicitly on θ.
- Intuitively, the amount of information T gives on θ increases as the dependence of dist(T) on θ increases.

Example
Let X1, ..., Xn ~iid U[0, θ], S = min(X1, ..., Xn) and T = max(X1, ..., Xn). Then
f_S(x; θ) = (n/θ)(1 − x/θ)^{n−1}, 0 ≤ x ≤ θ,
f_T(x; θ) = (n/θ)(x/θ)^{n−1}, 0 ≤ x ≤ θ.
- Neither S nor T is ancillary for θ.
- As n ↑ ∞, f_S becomes concentrated around 0.
- As n ↑ ∞, f_T becomes concentrated around θ.
- This indicates that T provides more information about θ than does S.
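The concentration of S near 0 and of T near θ can be read off the exact distribution functions P[T ≤ x] = (x/θ)^n and P[S > x] = (1 − x/θ)^n. A deterministic sketch tracking the medians (θ = 1 and the sample sizes are our own choices):

```python
theta = 1.0

meds = {}
for n in (5, 50, 500):
    # Medians solved from P[T <= x] = (x/theta)^n and P[S > x] = (1 - x/theta)^n:
    med_T = theta * 0.5 ** (1 / n)         # median of the maximum -> theta
    med_S = theta * (1 - 0.5 ** (1 / n))   # median of the minimum -> 0
    meds[n] = (med_S, med_T)
    print(n, round(med_S, 4), round(med_T, 4))
```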

Let X = (X1, ..., Xn) ~iid F_θ and let T(X) be a statistic. The fibres (or level sets, or contours) of T are the sets
At = {x ∈ R^n : T(x) = t}.
- T is constant when restricted to a fibre.
- Any realization of X that falls in a given fibre is equivalent as far as T is concerned.

Look at the distribution of X on a fibre At: f_{X|T=t}(x).
- Suppose f_{X|T=t} is independent of θ.
- Then X contains no information about θ on the set At.
- In other words, X is ancillary for θ on At.
- If this is true for each t ∈ Range(T), then T(X) contains the same information about θ as X does.
- It does not matter whether we observe X = (X1, ..., Xn) or just T(X).
- Knowing the exact value of X in addition to knowing T(X) does not give us any additional information - X is irrelevant if we already know T(X).

Definition (Sufficient Statistic)
A statistic T = T(X) is said to be sufficient for the parameter θ if for all (Borel) sets B the probability P[X ∈ B | T(X) = t] does not depend on θ.

Sufficient Statistics

Example (Bernoulli Trials)
Let X1, ..., Xn ~iid Bernoulli(θ) and T(X) = Σ_{i=1}^n Xi. Given x ∈ {0, 1}^n,
P[X = x | T = t] = P[X = x, T = t]/P[T = t] = (P[X = x]/P[T = t]) 1{Σ_{i=1}^n xi = t}
= [θ^{Σ xi}(1 − θ)^{n − Σ xi} / ((n choose t) θ^t (1 − θ)^{n−t})] 1{Σ_{i=1}^n xi = t}
= 1 / (n choose t).
→ T is sufficient for θ: given the number of tosses that came up heads, knowing exactly which tosses came up heads is irrelevant in deciding whether the coin is fair:
0 0 1 1 1 0 1 VS 1 0 0 0 1 1 1 VS 1 0 1 0 1 0 1

- The definition is hard to verify (especially for continuous variables).
- The definition does not allow easy identification of sufficient statistics.

Theorem (Fisher-Neyman Factorization Theorem)
Suppose that X = (X1, ..., Xn) has a joint density or frequency function f(x; θ), θ ∈ Θ. A statistic T = T(X) is sufficient for θ if and only if
f(x; θ) = g(T(x), θ) h(x).

Example
Let X1, ..., Xn ~iid U[0, θ] with pdf f(x; θ) = 1{x ∈ [0, θ]}/θ. Then,
f_X(x) = (1/θ^n) 1{x ∈ [0, θ]^n} = (1/θ^n) 1{max[x1, ..., xn] ≤ θ} 1{min[x1, ..., xn] ≥ 0}.
Therefore T(X) = X_{(n)} = max[X1, ..., Xn] is sufficient for θ.
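The Bernoulli computation above can be verified exhaustively for a small n: the conditional law of X given T is uniform on the fibre and free of θ. An illustrative enumeration (the choice n = 5 and the two θ values are ours):

```python
from itertools import product
from math import comb

def cond_prob(x, theta):
    """P[X = x | T = sum(x)] under iid Bernoulli(theta)."""
    n, t = len(x), sum(x)
    px = theta**t * (1 - theta)**(n - t)                 # P[X = x]
    pt = comb(n, t) * theta**t * (1 - theta)**(n - t)    # P[T = t]
    return px / pt

n = 5
for x in product((0, 1), repeat=n):
    p1, p2 = cond_prob(x, 0.2), cond_prob(x, 0.9)
    assert abs(p1 - p2) < 1e-12                    # free of theta
    assert abs(p1 - 1 / comb(n, sum(x))) < 1e-12   # uniform on the fibre
print("conditional distribution given T does not depend on theta")
```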

Proof of Neyman-Fisher Theorem - Discrete Case.
Suppose first that T is sufficient. Then
f(x; θ) = P[X = x] = Σ_t P[X = x, T = t] = P[X = x, T = T(x)] = P[T = T(x)] P[X = x | T = T(x)].
Since T is sufficient, P[X = x | T = T(x)] is independent of θ, and so f(x; θ) = g(T(x); θ) h(x).
Now suppose that f(x; θ) = g(T(x); θ) h(x). Then, if T(x) = t,
P[X = x | T = t] = P[X = x, T = t]/P[T = t] = (P[X = x]/P[T = t]) 1{T(x) = t}
= g(T(x); θ) h(x) 1{T(x) = t} / Σ_{y: T(y) = t} g(T(y); θ) h(y)
= h(x) 1{T(x) = t} / Σ_{y: T(y) = t} h(y),
which does not depend on θ. ∎

Minimally Sufficient Statistics

- We saw that a sufficient statistic keeps what is important and leaves out irrelevant information.
- How much information can we throw away? Is there a "necessary" statistic?

Definition (Minimally Sufficient Statistic)
A statistic T = T(X) is said to be minimally sufficient for the parameter θ if for any sufficient statistic S = S(X) there exists a function g(·) with T(X) = g(S(X)).

Lemma
If T and S are minimally sufficient statistics for a parameter θ, then there exist injective functions g and h such that S = g(T) and T = h(S).

Theorem
Let X = (X1, ..., Xn) have joint density or frequency function f(x; θ) and let T = T(X) be a statistic. Suppose that f(x; θ)/f(y; θ) is independent of θ if and only if T(x) = T(y). Then T is minimally sufficient for θ.

Proof.
Assume for simplicity that f(x; θ) > 0 for all x ∈ R^n and θ ∈ Θ.
[sufficiency part] Let T = {T(y) : y ∈ R^n} be the image of R^n under T and let {At} be the level sets of T. For each t, choose an element yt ∈ At. Notice that for any x, y_{T(x)} is in the same level set as x, so that
f(x; θ)/f(y_{T(x)}; θ)
does not depend on θ by assumption. Let g(t; θ) := f(yt; θ) and notice that
f(x; θ) = [f(x; θ)/f(y_{T(x)}; θ)] · f(y_{T(x)}; θ) = h(x) g(T(x), θ),
and the claim follows from the factorization theorem.
[minimality part] Suppose that T′ is another sufficient statistic. By the factorization theorem, ∃ g′, h′ such that f(x; θ) = g′(T′(x); θ) h′(x). Let x, y be such that T′(x) = T′(y). Then
f(x; θ)/f(y; θ) = g′(T′(x); θ) h′(x) / (g′(T′(y); θ) h′(y)) = h′(x)/h′(y).
Since the ratio does not depend on θ, we have by assumption that T(x) = T(y). Hence T is a function of T′, and so T is minimal by the arbitrary choice of T′. ∎

Example (Bernoulli Trials)
Let X1, ..., Xn ~iid Bernoulli(θ), and let x, y ∈ {0, 1}^n be two possible outcomes. Then
f(x; θ)/f(y; θ) = θ^{Σ xi}(1 − θ)^{n − Σ xi} / (θ^{Σ yi}(1 − θ)^{n − Σ yi}),
which is constant in θ if and only if T(x) = Σ xi = Σ yi = T(y), so that T is minimally sufficient.
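The ratio criterion in the Bernoulli example can be checked directly: the likelihood ratio is constant in θ exactly when the sums agree. A small sketch (the particular outcome vectors and the θ-grid are our own choices):

```python
def ratio(x, y, theta):
    """f(x; theta) / f(y; theta) for the iid Bernoulli(theta) model."""
    n = len(x)
    f = lambda z: theta**sum(z) * (1 - theta)**(n - sum(z))
    return f(x) / f(y)

thetas = (0.1, 0.3, 0.7)
x, y, z = (1, 1, 0, 0), (0, 1, 0, 1), (1, 1, 1, 0)

same_sum = [ratio(x, y, t) for t in thetas]  # T(x) = T(y) = 2: constant in theta
diff_sum = [ratio(x, z, t) for t in thetas]  # T(x) = 2, T(z) = 3: varies with theta
print(same_sum, diff_sum)
```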

Complete Statistics

- Ancillary Statistic → contains no information on θ.
- Minimally Sufficient Statistic → contains all relevant information, and as little irrelevant information as possible.
- Should they be mutually independent?

Definition (Complete Statistic)
Let {g(t; θ) : θ ∈ Θ} be a family of densities or frequency functions corresponding to a statistic T(X). The statistic T is called complete if
∫ h(t) g(t; θ) dt = 0 ∀ θ ∈ Θ ⇒ P[h(T) = 0] = 1 ∀ θ ∈ Θ.

Example
If θ̂ = T(X) is an unbiased estimator of θ (i.e. Eθ̂ = θ) which can be written as a function of a complete sufficient statistic, then it is the unique such estimator.

Theorem (Basu's Theorem)
A complete sufficient statistic is independent of every ancillary statistic.

Proof.
We consider the discrete case only. It suffices to show that
P[S(X) = s | T(X) = t] = P[S(X) = s].
Define h(t) = P[S(X) = s | T(X) = t] − P[S(X) = s] and observe that:
1 P[S(X) = s] does not depend on θ (ancillarity)
2 P[S(X) = s | T(X) = t] = P[X ∈ {x : S(x) = s} | T = t] does not depend on θ (sufficiency)
and so h does not depend on θ.

(proof cont'd). Therefore, for any θ ∈ Θ,
Eh(T) = Σ_t (P[S(X) = s | T(X) = t] − P[S(X) = s]) P[T(X) = t]
      = Σ_t P[S(X) = s | T(X) = t] P[T(X) = t] − P[S(X) = s] Σ_t P[T(X) = t]
      = P[S(X) = s] − P[S(X) = s] = 0.
But T is complete, so it follows that h(t) = 0 for all t. QED.

Basu's Theorem is useful for deducing the independence of two statistics:
- No need to determine their joint distribution.
- It requires showing completeness (usually a hard analytical problem).
- We will see models in which completeness is easy to check.

Completeness and Minimal Sufficiency

Theorem (Lehmann-Scheffé)
Let X have density f(x; θ). If T(X) is sufficient and complete for θ, then T is minimally sufficient.

Proof.
First of all, we show that a minimally sufficient statistic exists. Define an equivalence relation by x ≡ x′ if and only if f(x; θ)/f(x′; θ) is independent of θ. If S is any function such that S = c on these equivalence classes, then S is minimally sufficient, establishing existence (rigorous proof by Lehmann-Scheffé (1950)). Therefore, it must be the case that S = g1(T), for some g1. Let g2(S) = E[T | S] (this does not depend on θ since S is sufficient). Consider:
g(T) = T − g2(S).
Write E[g(T)] = E[T] − E{E[T | S]} = ET − ET = 0 for all θ.
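A classic application of Basu's theorem: for an iid N(µ, 1) sample, X̄ is complete sufficient for µ while the sample variance S² (a function of the residuals Xi − X̄) is ancillary, so the two are independent. A simulation sketch checking that their empirical correlation is near zero (sample size, seed, and replication count are our own choices):

```python
import random, statistics

rng = random.Random(11)
n, reps = 20, 4000

means, vars_ = [], []
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    means.append(m)
    vars_.append(sum((v - m)**2 for v in xs) / (n - 1))   # sample variance S^2

mx, mv = sum(means) / reps, sum(vars_) / reps
cov = sum((a - mx) * (b - mv) for a, b in zip(means, vars_)) / reps
corr = cov / (statistics.pstdev(means) * statistics.pstdev(vars_))
print(round(corr, 3))  # near 0, consistent with independence of Xbar and S^2
```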

(proof cont'd).

By completeness of T, it follows that g₂(S) = T a.s. In fact, g₂ has to be injective, for otherwise we would contradict the minimal sufficiency of S. But then T is a 1-1 function of S and S is a 1-1 function of T. Invoking our previous lemma proves that T is minimally sufficient.

One can also prove:

Theorem
If a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.

Special Families of Models

Statistical Theory
Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 Focus on Parametric Families
2 The Exponential Family of Distributions
3 Transformation Families

Focus on Parametric Families

Recall our setup:
- Collection of r.v.'s (a random vector) X = (X₁, ..., Xₙ)
- X ∼ F_θ ∈ F
- F a parametric class with parameter θ ∈ Θ ⊆ ℝ^d

The Problem of Point Estimation
1. Assume that F_θ is known up to the parameter θ, which is unknown.
2. Let (x₁, ..., xₙ) be a realization of X ∼ F_θ which is available to us.
3. Estimate the value of θ that generated the sample, given (x₁, ..., xₙ).

The only guide (apart from knowledge of F) at hand is the data:
→ anything we "do" will be a function of the data g(x₁, ..., xₙ).

We describe F by a parametrization Θ ∋ θ ↦ F_θ:

Definition (Parametrization)
Let Θ be a set, F be a family of distributions and g : Θ → F an onto mapping. The pair (Θ, g) is called a parametrization of F; it assigns a label θ ∈ Θ to each member of F.

Definition (Parametric Model)
A parametric model with parameter space Θ ⊆ ℝ^d is a family of probability models F parametrized by Θ, F = {F_θ : θ ∈ Θ}.

So far we have seen a number of examples of distributions and have worked out certain properties individually; we have concentrated on aspects of the data: approximate distributions, data reduction...

Question: But what about F? Are there more general families that contain the standard ones as special cases, and for which a general and abstract study can be pursued?

The Exponential Family of Distributions

Definition (Exponential Family)
Let X = (X₁, ..., Xₙ) have joint distribution F_θ with parameter θ ∈ ℝ^p. We say that the family of distributions {F_θ} is a k-parameter exponential family if the joint density or joint frequency function of (X₁, ..., Xₙ) admits the form
f(x; θ) = exp[ Σ_{i=1}^k c_i(θ)T_i(x) − d(θ) + S(x) ], x ∈ X, θ ∈ Θ,
with supp{f(·; θ)} = X independent of θ.

- k need not be equal to p, although they often coincide.
- The value of k may be reduced if the c_i or T_i satisfy linear constraints; we will assume that the representation above is minimal.
- Can re-parametrize via φ_i = c_i(θ), the natural parameters.

Motivation: Maximum Entropy Under Constraints

Consider the following variational problem: determine the probability distribution f supported on X with maximum entropy
H(f) = −∫_X f(x) log f(x) dx
subject to the linear constraints
∫_X T_i(x) f(x) dx = α_i, i = 1, ..., k.

Philosophy: how to choose a probability model for a given situation? The maximum entropy approach: in any given situation, choose the distribution that gives the highest uncertainty while satisfying the situation-specific required constraints.

Proposition.
The unique solution of the constrained optimisation problem has the form
f(x) = Q(λ₁, ..., λ_k) exp[ Σ_{i=1}^k λ_i T_i(x) ].

Proof.
Let g(x) be a density also satisfying the constraints. Then
H(g) = −∫_X g(x) log g(x) dx = −∫_X g(x) log( (g(x)/f(x)) f(x) ) dx
     = −KL(g‖f) − ∫_X g(x) log f(x) dx
     ≤ −log Q ∫_X g(x) dx − ∫_X g(x) ( Σ_{i=1}^k λ_i T_i(x) ) dx,
where ∫_X g(x) dx = 1. But g also satisfies the moment constraints, so the last expression equals
−log Q − ∫_X f(x) ( Σ_{i=1}^k λ_i T_i(x) ) dx = −∫_X f(x) log f(x) dx = H(f).
Uniqueness of the solution follows from the fact that equality can only hold when KL(g‖f) = 0, which happens if and only if g = f.

- The λ's are the Lagrange multipliers arising from the Lagrange form of the optimisation problem; they are determined so that the constraints are satisfied.
- They give us the c_i(θ) in our definition of exponential families.
- Note that the presence of S(x) in our definition is compatible: S(x) = c_{k+1}T_{k+1}(x), where c_{k+1} does not depend on θ (provision for a multiplier that may not depend on the parameter).

Example (Binomial Distribution)
Let X ∼ Binomial(n, θ) with n known. Then
f(x; θ) = C(n, x) θ^x (1 − θ)^{n−x} = exp[ x log(θ/(1−θ)) + n log(1−θ) + log C(n, x) ].

Example (Gamma Distribution)
Let X₁, ..., Xₙ be iid Gamma with unknown shape parameter α and unknown scale parameter λ. Then
f_X(x; α, λ) = ∏_{i=1}^n λ^α x_i^{α−1} exp(−λx_i) / Γ(α)
             = exp[ (α−1) Σ_{i=1}^n log x_i − λ Σ_{i=1}^n x_i + nα log λ − n log Γ(α) ].

Example (Heteroskedastic Gaussian Distribution)
Let X₁, ..., Xₙ iid N(θ, θ²). Then
f_X(x; θ) = ∏_{i=1}^n (θ√(2π))⁻¹ exp( −(x_i − θ)²/(2θ²) )
          = exp[ −(1/(2θ²)) Σ_{i=1}^n x_i² + (1/θ) Σ_{i=1}^n x_i − (n/2)(1 + 2 log θ) − (n/2) log(2π) ].
Notice that even though k = 2 here, the dimension of the parameter space is 1. This is an example of a curved exponential family.

Example (Uniform Distribution)
Let X ∼ U[0, θ]. Then f_X(x; θ) = 1{x ∈ [0, θ]}/θ. Since the support of f_X depends on θ, we do not have an exponential family.
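The algebra in the binomial example can be double-checked numerically. This sketch (illustrative only) compares the standard Binomial(n, θ) pmf with its exponential-family form, using c(θ) = log(θ/(1−θ)), d(θ) = −n log(1−θ) and S(x) = log C(n, x):

```python
import math

# Check (illustrative) that the Binomial(n, theta) pmf equals its
# exponential-family representation
#   f(x; theta) = exp{ c(theta)*x - d(theta) + S(x) }.

def binom_pmf(x, n, theta):
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def expfam_pmf(x, n, theta):
    c = math.log(theta / (1 - theta))   # natural parameter c(theta)
    d = -n * math.log(1 - theta)        # d(theta)
    s = math.log(math.comb(n, x))       # S(x), free of theta
    return math.exp(c * x - d + s)

n, theta = 10, 0.3
ok = all(abs(binom_pmf(x, n, theta) - expfam_pmf(x, n, theta)) < 1e-12
         for x in range(n + 1))
print(ok)
```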

The Exponential Family of Distributions

Proposition
Suppose that X = (X₁, ..., Xₙ) has a one-parameter exponential family distribution with density or frequency function
f(x; θ) = exp[ c(θ)T(x) − d(θ) + S(x) ]
for x ∈ X, where
(a) the parameter space Θ is open,
(b) c(θ) is a one-to-one function on Θ,
(c) c(θ), c⁻¹(θ), d(θ) are twice differentiable functions on Θ.
Then
E T(X) = d′(θ)/c′(θ)  and  Var[T(X)] = [d″(θ)c′(θ) − d′(θ)c″(θ)] / [c′(θ)]³.

Proof.
Define φ = c(θ), the natural parameter of the exponential family. Let d₀(φ) = d(c⁻¹(φ)), where c⁻¹ is well-defined since c is 1-1. Since c is a homeomorphism, Φ = c(Θ) is open. Choose s sufficiently small so that φ + s ∈ Φ, and observe that the m.g.f. of T is
E exp[sT(X)] = ∫ e^{sT(x)} e^{φT(x) − d₀(φ) + S(x)} dx
             = e^{d₀(φ+s) − d₀(φ)} ∫ e^{(φ+s)T(x) − d₀(φ+s) + S(x)} dx
             = exp[ d₀(φ+s) − d₀(φ) ],
since the last integral equals 1. By our assumptions we may differentiate w.r.t. s and, setting s = 0, we get E[T(X)] = d₀′(φ) and Var[T(X)] = d₀″(φ). But
d₀′(φ) = d′(θ)/c′(θ)  and  d₀″(φ) = [d″(θ)c′(θ) − d′(θ)c″(θ)] / [c′(θ)]³,
and so the conclusion follows.

Exponential Families and Sufficiency

Exercise
Extend the result to the means and covariances of the random variables T₁(X), ..., T_k(X) in a k-parameter exponential family.

Lemma
Suppose that X = (X₁, ..., Xₙ) has a k-parameter exponential family distribution with density or frequency function
f(x; θ) = exp[ Σ_{i=1}^k c_i(θ)T_i(x) − d(θ) + S(x) ]
for x ∈ X. Then the statistic (T₁(X), ..., T_k(X)) is sufficient for θ.

Proof.
Set g(T(x); θ) = exp{ Σ_i c_i(θ)T_i(x) − d(θ) } and h(x) = e^{S(x)} 1{x ∈ X}, and apply the factorization theorem.

Exponential Families and Completeness

Theorem
Suppose that X = (X₁, ..., Xₙ) has a k-parameter exponential family distribution with density or frequency function
f(x; θ) = exp[ Σ_{i=1}^k c_i(θ)T_i(x) − d(θ) + S(x) ]
for x ∈ X. Define C = {(c₁(θ), ..., c_k(θ)) : θ ∈ Θ}. If the set C contains an open set (rectangle) of the form (a₁, b₁) × ... × (a_k, b_k), then the statistic (T₁(X), ..., T_k(X)) is complete for θ, and so minimally sufficient.

- The result is essentially a consequence of the uniqueness of characteristic functions.
- Intuitively, the result says that a k-dimensional sufficient statistic in a k-parameter exponential family will also be complete, provided that the effective dimension of the natural parameter space is k.

Sampling Exponential Families

The families of distributions obtained by sampling from exponential families are themselves exponential families. Let X₁, ..., Xₙ be iid, distributed according to a k-parameter exponential family, and consider the density (or frequency function) of X = (X₁, ..., Xₙ):
f(x; θ) = ∏_{j=1}^n exp[ Σ_{i=1}^k c_i(θ)T_i(x_j) − d(θ) + S(x_j) ]
        = exp[ Σ_{i=1}^k c_i(θ)τ_i(x) − n d(θ) + Σ_{j=1}^n S(x_j) ]
for τ_i(X) = Σ_{j=1}^n T_i(X_j), the natural statistics, i = 1, ..., k.

Note that the natural sufficient statistic is k-dimensional for all n. What about the distribution of τ = (τ₁(X), ..., τ_k(X))?

The Natural Statistics

Lemma
The joint distribution of τ = (τ₁(X), ..., τ_k(X)) is of exponential family form, with natural parameters c₁(θ), ..., c_k(θ).

Proof (discrete case).
Let T_y = {x : τ₁(x) = y₁, ..., τ_k(x) = y_k} be the level set of y ∈ ℝ^k. Then
P[τ(X) = y] = Σ_{x ∈ T_y} P[X = x] = Σ_{x ∈ T_y} δ(θ) exp[ Σ_{i=1}^k c_i(θ)τ_i(x) + Σ_{j=1}^n S(x_j) ]
            = δ(θ) S(y) exp[ Σ_{i=1}^k c_i(θ)y_i ],
where S(y) := Σ_{x ∈ T_y} exp( Σ_{j=1}^n S(x_j) ).

The Natural Statistics and Sufficiency

Lemma
For any A ⊆ {1, ..., k}, the joint distribution of {τ_i(X); i ∈ A} conditional on {τ_i(X); i ∈ A^c} is of exponential family form, and depends only on {c_i(θ); i ∈ A}.

Proof (discrete case).
Let T_i = τ_i(X). We have P[T = y] = δ(θ)S(y) exp[ Σ_{i=1}^k c_i(θ)y_i ], so that
P[T_A = y_A | T_{A^c} = y_{A^c}] = P[T_A = y_A, T_{A^c} = y_{A^c}] / Σ_w P[T_A = w, T_{A^c} = y_{A^c}]
= δ(θ) S((y_A, y_{A^c})) exp( Σ_{i∈A} c_i(θ)y_i ) exp( Σ_{i∈A^c} c_i(θ)y_i )
  / [ δ(θ) exp( Σ_{i∈A^c} c_i(θ)y_i ) Σ_w S((w, y_{A^c})) exp( Σ_{i∈A} c_i(θ)w_i ) ]
= ∆({c_i(θ) : i ∈ A}) h(y_A) exp( Σ_{i∈A} c_i(θ)y_i ).

Look at the previous results through the prism of the canonical parametrisation:
- We already know that τ is sufficient for φ = c(θ).
- But the result tells us something even stronger: each τ_i is sufficient for φ_i = c_i(θ).
- In fact any τ_A is sufficient for φ_A, ∀A ⊆ {1, ..., k}.
- Therefore, each natural statistic contains the relevant information for each natural parameter: a useful result that is by no means true for any distribution.

Groups Acting on the Sample Space

Basic Idea
Often we can generate a family of distributions of the same form (but with different parameters) by letting a group act on our data space X.

Recall: a group is a set G along with a binary operator ∘ such that:
1. g, g′ ∈ G ⟹ g ∘ g′ ∈ G;
2. (g ∘ g′) ∘ g″ = g ∘ (g′ ∘ g″), ∀g, g′, g″ ∈ G;
3. ∃e ∈ G : e ∘ g = g ∘ e = g, ∀g ∈ G;
4. ∀g ∈ G, ∃g⁻¹ ∈ G : g ∘ g⁻¹ = g⁻¹ ∘ g = e.

Often groups are sets of transformations, and the binary operator is the composition operator (e.g. SO(2), the group of rotations of ℝ²):

( cos φ  −sin φ ) ( cos ψ  −sin ψ )   ( cos(φ+ψ)  −sin(φ+ψ) )
( sin φ   cos φ ) ( sin ψ   cos ψ ) = ( sin(φ+ψ)   cos(φ+ψ) )

We have a group of transformations G, with G ∋ g : X → X,
gX := g(X)  and  (g₂ ∘ g₁)X := g₂(g₁(X)).
Obviously dist(gX) changes as g ranges over G. Is this change completely arbitrary, or are there situations where it has a simple structure?

Definition (Transformation Family)
Let G be a group of transformations acting on X and let {f_θ(x); θ ∈ Θ} be a parametric family of densities on X. If there exists a bijection h : G → Θ, then the family {f_θ}_{θ∈Θ} will be called a (group) transformation family.

Hence Θ admits a group structure G̅ := (Θ, ∗) via
θ₁ ∗ θ₂ := h( h⁻¹(θ₁) ∘ h⁻¹(θ₂) ).
Usually we write g_θ = h⁻¹(θ), so that g_θ ∘ g_{θ′} = g_{θ∗θ′}.
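The SO(2) composition rule quoted above is easy to verify numerically; a minimal sketch:

```python
import math

# Illustrative check of the SO(2) composition rule R(phi) R(psi) = R(phi + psi),
# where R(a) is the 2x2 rotation matrix by angle a.

def rot(a):
    return [[math.cos(a), -math.sin(a)],
            [math.sin(a),  math.cos(a)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

phi, psi = 0.7, 1.9
lhs, rhs = matmul(rot(phi), rot(psi)), rot(phi + psi)
ok = all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(ok)
```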

Invariance and Equivariance

Define an equivalence relation on X via G:
x ≡_G x′ ⟺ ∃g ∈ G : x′ = g(x).
This partitions X into equivalence classes, called the orbits of X under G.

Definition (Invariant Statistic)
A statistic T that is constant on the orbits of X under G is called an invariant statistic. That is, T is invariant with respect to G if, for any x ∈ X, we have T(x) = T(gx) ∀g ∈ G.

Notice that it may be that T(x) = T(y) while x, y are not in the same orbit; i.e., in general the orbits under G are subsets of the level sets of an invariant statistic T. When orbits and level sets coincide, we have:

Definition (Maximal Invariant)
A statistic T will be called a maximal invariant for G when
T(x) = T(y) ⟺ x ≡_G y.

Intuitively, a maximal invariant is a reduced version of the data that represents it as closely as possible, under the requirement of remaining invariant with respect to G.

If T is an invariant statistic with respect to the group defining a transformation family, then it is ancillary.

Definition (Equivariance)
A statistic S : X → Θ will be called equivariant for a transformation family if
S(g_θ x) = θ ∗ S(x), ∀g_θ ∈ G, x ∈ X.

Equivariance may be a natural property to require if S is used as an estimator of the true parameter θ ∈ Θ, as it suggests that a transformation of a sample by g_ψ would yield an estimator that is the original one transformed by ψ.

Lemma (Constructing Maximal Invariants)
Let S : X → Θ be an equivariant statistic for a transformation family with parameter space Θ and transformation group G. Then T(X) = g⁻¹_{S(X)} X defines a maximal invariant statistic.

Proof.
By definition and then equivariance,
T(g_θ x) = ( g⁻¹_{S(g_θ x)} ∘ g_θ ) x = ( g⁻¹_{θ∗S(x)} ∘ g_θ ) x = [ ( g⁻¹_{S(x)} ∘ g⁻¹_θ ) ∘ g_θ ] x = T(x),
so that T is invariant. To show maximality, notice that
T(x) = T(y) ⟹ g⁻¹_{S(x)} x = g⁻¹_{S(y)} y ⟹ y = ( g_{S(y)} ∘ g⁻¹_{S(x)} ) x,
and g_{S(y)} ∘ g⁻¹_{S(x)} ∈ G, so that ∃g ∈ G with y = gx, which completes the proof.

Location-Scale Families

An important transformation family is the location-scale model:
- Let X = η + τε, with ε ∼ f₁ completely known.
- The parameter is θ = (η, τ) ∈ Θ = ℝ × ℝ₊.
- Define a set of transformations on X by g_θ x = g_{(η,τ)} x = η + τx, so that:
  g_{(η,τ)} ∘ g_{(μ,σ)} x = η + τμ + τσx = g_{(η+τμ, τσ)} x (the set of transformations is closed under composition);
  g_{(0,1)} ∘ g_{(η,τ)} = g_{(η,τ)} ∘ g_{(0,1)} = g_{(η,τ)} (∃ identity);
  g_{(−η/τ, τ⁻¹)} ∘ g_{(η,τ)} = g_{(η,τ)} ∘ g_{(−η/τ, τ⁻¹)} = g_{(0,1)} (∃ inverse).
- Hence G = {g_θ : θ ∈ ℝ × ℝ₊} is a group under ∘.
- The action of G on a random sample X = {X_i}_{i=1}^n is g_{(η,τ)}X = η1ₙ + τX.
- The induced group action on Θ is (η, τ) ∗ (μ, σ) = (η + τμ, τσ).

Location-Scale Families (cont'd)

The sample mean and sample variance are equivariant: with S(X) = (X̄, V^{1/2}), where V = (n−1)⁻¹ Σ_j (X_j − X̄)²,
S(g_{(η,τ)}X) = ( η + τX̄, [ (n−1)⁻¹ Σ_j (η + τX_j − η − τX̄)² ]^{1/2} )
             = ( η + τX̄, τV^{1/2} ) = (η, τ) ∗ S(X).

A maximal invariant is given by A = g⁻¹_{S(X)}X, the corresponding parameter being (−X̄/V^{1/2}, V^{−1/2}). Hence the vector of residuals is a maximal invariant:
A = (X − X̄1ₙ)/V^{1/2} = ( (X₁ − X̄)/V^{1/2}, ..., (Xₙ − X̄)/V^{1/2} ).

Transformation Families

Example (The Multivariate Gaussian Distribution)
Let Z ∼ N_d(0, I) and consider X = μ + ΩZ ∼ N(μ, ΩΩᵀ).
- The parameter is (μ, Ω) ∈ ℝ^d × GL(d).
- The set of transformations g_{(μ,Ω)}x = μ + Ωx is closed under ∘.
- g_{(0,I)} ∘ g_{(μ,Ω)} = g_{(μ,Ω)} ∘ g_{(0,I)} = g_{(μ,Ω)} (∃ identity).
- g_{(−Ω⁻¹μ, Ω⁻¹)} ∘ g_{(μ,Ω)} = g_{(μ,Ω)} ∘ g_{(−Ω⁻¹μ, Ω⁻¹)} = g_{(0,I)} (∃ inverse).
- Hence G = {g_θ : θ ∈ ℝ^d × GL(d)} is a group under ∘ (the affine group).
- The action of G on X is g_{(μ,Ω)}X = μ + ΩX.
- The induced group action on Θ is (μ, Ω) ∗ (ν, Ψ) = (ν + Ψμ, ΨΩ).
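Both facts, the equivariance of S(X) = (X̄, V^{1/2}) and the invariance of the residual vector, can be checked in a few lines (an assumed toy example):

```python
import random, statistics

# Illustrative check (assumed example) of equivariance of S(X) = (mean, sd)
# under the location-scale action g_(eta,tau) x = eta + tau*x, and of the
# invariance of the residual vector (X - mean)/sd under that action.

random.seed(1)
x = [random.gauss(0, 1) for _ in range(50)]
eta, tau = 3.0, 2.5
gx = [eta + tau * xi for xi in x]

def S(v):
    return statistics.fmean(v), statistics.stdev(v)

m, s = S(x)
gm, gs = S(gx)
# equivariance: S(g_(eta,tau) X) = (eta + tau*mean, tau*sd)
print(abs(gm - (eta + tau * m)) < 1e-9 and abs(gs - tau * s) < 1e-9)

# invariance of the residuals A = (X - mean)/sd
res_x = [(xi - m) / s for xi in x]
res_gx = [(gi - gm) / gs for gi in gx]
print(max(abs(a - b) for a, b in zip(res_x, res_gx)) < 1e-9)
```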

Basic Principles of Point Estimation

Statistical Theory
Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 The Problem of Point Estimation
2 Bias, Variance and Mean Squared Error
3 The Plug-In Principle
4 The Moment Principle
5 The Likelihood Principle

Point Estimation for Parametric Families

Recall our setup:
- Collection of r.v.'s (a random vector) X = (X₁, ..., Xₙ)
- X ∼ F_θ ∈ F
- F a parametric class with parameter θ ∈ Θ ⊆ ℝ^d

The Problem of Point Estimation
1. Assume that F_θ is known up to the parameter θ, which is unknown.
2. Let (x₁, ..., xₙ) be a realization of X ∼ F_θ which is available to us.
3. Estimate the value of θ that generated the sample, given (x₁, ..., xₙ).

So far we have considered aspects related to point estimation:
- approximate distributions of g(X₁, ..., Xₙ) as n ↑ ∞;
- the information carried by g(X₁, ..., Xₙ) w.r.t. θ;
- general parametric models.

Today: How do we estimate θ in general? Some general recipes?

Point Estimators

Definition (Point Estimator)
Let {F_θ} be a parametric model with parameter space Θ ⊆ ℝ^d, and let X = (X₁, ..., Xₙ) ∼ F_{θ₀} for some θ₀ ∈ Θ. A point estimator θ̂ of θ₀ is a statistic T : ℝⁿ → Θ whose primary purpose is to estimate θ₀.

Therefore any statistic T : ℝⁿ → Θ is a candidate estimator!
→ It is harder to answer what a good estimator is. Any estimator is of course a random variable; hence, as a general principle, "good" should mean: dist(θ̂) concentrated around θ.
→ That is an ∞-dimensional description of quality. Look at some simpler measures of quality?

Concentration around a Parameter

Definition (Bias)
The bias of an estimator θ̂ of θ ∈ Θ is defined to be
bias(θ̂) = E_θ[θ̂] − θ.
It describes how "off" we are from the target on average when employing θ̂.

Definition (Unbiasedness)
An estimator θ̂ of θ ∈ Θ is unbiased if E_θ[θ̂] = θ, i.e. bias(θ̂) = 0.
We will see that not too much weight should be placed on unbiasedness.

Definition (Mean Squared Error)
The mean squared error of an estimator θ̂ of θ ∈ Θ ⊆ ℝ is defined to be
MSE(θ̂) = E_θ[ (θ̂ − θ)² ].

[Figure: sampling densities of two estimators of θ, one more concentrated around θ than the other]

Bias and Mean Squared Error

Bias and MSE combined provide a coarse but simple description of concentration around θ:
- Bias gives us an indication of the location of dist(θ̂) relative to θ (this somehow assumes the mean is a good measure of location).
- MSE gives us a measure of spread/dispersion of dist(θ̂) around θ.
- If θ̂ is unbiased for θ ∈ ℝ then Var(θ̂) = MSE(θ̂).
- For Θ ⊆ ℝ^d we have MSE(θ̂) := E‖θ̂ − θ‖².

Example
Let X₁, ..., Xₙ iid N(μ, σ²) and let μ̂ := X̄. Then
E μ̂ = μ  and  MSE(μ̂) = Var(μ̂) = σ²/n.
In this case bias and MSE give us a complete description of the concentration of dist(μ̂) around μ, since μ̂ is Gaussian and so completely determined by its mean and variance.

The Bias-Variance Decomposition of MSE

E[(θ̂ − θ)²] = E[(θ̂ − Eθ̂ + Eθ̂ − θ)²]
            = E{ (θ̂ − Eθ̂)² + (Eθ̂ − θ)² + 2(θ̂ − Eθ̂)(Eθ̂ − θ) }
            = E(θ̂ − Eθ̂)² + (Eθ̂ − θ)².

Bias-Variance Decomposition for Θ ⊆ ℝ
MSE(θ̂) = Var(θ̂) + bias²(θ̂)

- A simple yet fundamental relationship.
- Requiring a small MSE does not necessarily require unbiasedness.
- Unbiasedness is a sensible property, but sometimes biased estimators perform better than unbiased ones.
- Sometimes we have a bias/variance tradeoff (e.g. nonparametric regression).

Bias–Variance Tradeoff

[Figure]
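The decomposition MSE = Var + bias² can be verified by simulation. The sketch below (an assumed example) uses the biased variance estimator (1/n)Σ(X_i − X̄)², whose bias is −σ²/n, and checks the identity on the Monte Carlo sample:

```python
import random, statistics

# Simulation (assumed toy example) checking MSE = Var + bias^2 for the biased
# variance estimator (1/n) * sum (X_i - Xbar)^2 under N(mu, sigma^2) sampling.

random.seed(2)
mu, sigma, n, reps = 0.0, 2.0, 8, 20000
true_var = sigma**2

ests = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(xs)
    ests.append(sum((x - xbar) ** 2 for x in xs) / n)  # divides by n: biased

mse = statistics.fmean([(e - true_var) ** 2 for e in ests])
bias = statistics.fmean(ests) - true_var   # theoretical bias: -sigma^2/n = -0.5
var = statistics.pvariance(ests)
print(abs(mse - (var + bias**2)) < 1e-8)   # the identity holds exactly in-sample
```

Note that the in-sample version of the identity is an algebraic fact, so it holds up to floating-point error regardless of the number of replications.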

Consistency

One can also consider the quality of an estimator not for a given sample size, but as the sample size increases.

Definition (Consistency)
A sequence of estimators {θ̂ₙ}_{n≥1} of θ ∈ Θ is said to be consistent if
θ̂ₙ →^P θ.

- A consistent estimator becomes increasingly concentrated around the true value θ as the sample size grows (usually θ̂ₙ is an estimator based on n iid values).
- Often considered a "must have" property, but...
- A more detailed understanding of the "asymptotic quality" of θ̂ₙ requires the study of dist[θ̂ₙ] as n ↑ ∞.

[Figure: consistency illustrated with X₁, ..., Xₙ ∼ N(0, 1); the running mean X̄ₙ plotted for n = 1, 2, ..., 100, settling near 0]

Plug-In Estimators

We want to find general procedures for constructing estimators.
→ An idea: θ ↦ F_θ is a bijection under identifiability.
Recall that, more generally, a parameter is a function ν : F → N. Under identifiability, ν(F_θ) = q(θ) for some q.

The Plug-In Principle
Let ν = q(θ) = ν(F_θ) be a parameter of interest for a parametric model {F_θ}_{θ∈Θ}. If we can construct an estimate F̂_θ of F_θ on the basis of our sample X, then we can use ν(F̂_θ) as an estimator of ν(F_θ). Such an estimator is called a plug-in estimator.

Essentially we are "flipping" our point of view: viewing θ as a function of F_θ instead of F_θ as a function of θ. Note here that θ = θ(F_θ) if q is taken to be the identity. In practice such a principle is useful when we can explicitly describe the mapping F_θ ↦ ν(F_θ).

Parameters as Functionals of F

Examples of "functional parameters":
- The mean: μ(F) := ∫_{−∞}^{+∞} x dF(x)
- The variance: σ²(F) := ∫_{−∞}^{+∞} [x − μ(F)]² dF(x)
- The median: med(F) := inf{x : F(x) ≥ 1/2}
- An indirectly defined parameter θ(F) such that ∫_{−∞}^{+∞} ψ(x − θ(F)) dF(x) = 0
- The density (when it exists) at x₀: θ(F) := (d/dx)F(x)|_{x=x₀}

The Empirical Distribution Function

The plug-in principle converts the problem of estimating θ into the problem of estimating F. But how? Consider the case when X = (X₁, ..., Xₙ) has iid coordinates. We may define the empirical version of the distribution function F(·) as
F̂ₙ(y) = (1/n) Σ_{i=1}^n 1{X_i ≤ y}.

- Places mass 1/n on each observation.
- SLLN ⟹ F̂ₙ(y) →^{a.s.} F(y) ∀y ∈ ℝ, since the 1{X_i ≤ y} are iid Bernoulli(F(y)) random variables.
- Suggests using ν(F̂ₙ) as an estimator of ν(F).

The Empirical Distribution Function (cont'd)

[Figure: empirical distribution functions F̂ₙ for samples of size n = 10, 50, 100, 500, plotted against the true cdf]

It seems that we're actually doing better than just pointwise convergence...

Theorem (Glivenko-Cantelli)
Let X₁, ..., Xₙ be independent random variables, distributed according to F. Then F̂ₙ(y) = n⁻¹ Σ_{i=1}^n 1{X_i ≤ y} converges uniformly to F with probability 1, i.e.
sup_{x∈ℝ} |F̂ₙ(x) − F(x)| →^{a.s.} 0.

Proof.
Assume first that F(y) = y 1{y ∈ [0, 1]}. Fix a regular finite partition 0 = x₁ ≤ x₂ ≤ ... ≤ x_m = 1 of [0, 1] (so x_{k+1} − x_k = (m−1)⁻¹). By monotonicity of F and F̂ₙ,
sup_x |F̂ₙ(x) − F(x)| < max_k |F̂ₙ(x_k) − F(x_{k+1})| + max_k |F̂ₙ(x_k) − F(x_{k−1})|.
Adding and subtracting F(x_k) within each term, we can bound this above by
2 max_k |F̂ₙ(x_k) − F(x_k)| + max_k |F(x_k) − F(x_{k+1})| + max_k |F(x_k) − F(x_{k−1})|,
by an application of the triangle inequality to each term; the last two maxima equal max_k |x_k − x_{k+1}| + max_k |x_k − x_{k−1}| = 2(m−1)⁻¹. Letting n ↑ ∞, the SLLN implies that the first term vanishes almost surely. Since m is arbitrary we have proven that, given any ε > 0,
limsup_{n→∞} sup_x |F̂ₙ(x) − F(x)| < ε a.s.,
which gives the result when the cdf F is uniform.

For a general cdf F, we let U₁, U₂, ... iid U[0, 1] and define
W_i := F⁻¹(U_i) = inf{x : F(x) ≥ U_i}.
Observe that
W_i ≤ x ⟺ U_i ≤ F(x),
so that W_i =^d X_i. By Skorokhod's representation theorem, we may thus assume that W_i = X_i a.s. Letting Ĝₙ be the edf of (U₁, ..., Uₙ), we note that
F̂ₙ(y) = n⁻¹ Σ_{i=1}^n 1{W_i ≤ y} = n⁻¹ Σ_{i=1}^n 1{U_i ≤ F(y)} = Ĝₙ(F(y)), a.s.,
in other words F̂ₙ = Ĝₙ ∘ F, a.s. Now let A = F(ℝ) ⊆ [0, 1], so that from the first part of the proof
sup_{x∈ℝ} |F̂ₙ(x) − F(x)| = sup_{t∈A} |Ĝₙ(t) − t| ≤ sup_{t∈[0,1]} |Ĝₙ(t) − t| →^{a.s.} 0,
since obviously A ⊆ [0, 1].
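Glivenko-Cantelli can be watched in action: for U[0, 1] data (where F(x) = x), the uniform deviation sup_x |F̂ₙ(x) − F(x)| is attained at the jump points of F̂ₙ and shrinks as n grows. An illustrative sketch:

```python
import random

# Illustrative simulation (assumed example) of Glivenko-Cantelli: the uniform
# deviation sup_x |F_n(x) - F(x)| for U[0,1] samples shrinks as n grows.

random.seed(3)

def sup_deviation(n):
    xs = sorted(random.random() for _ in range(n))
    # For F(x) = x on [0,1] the sup is attained at a jump point of the edf:
    # at x_(i) the edf jumps from i/n to (i+1)/n (0-indexed i).
    return max(max(abs((i + 1) / n - x), abs(i / n - x))
               for i, x in enumerate(xs))

d_small, d_large = sup_deviation(100), sup_deviation(10000)
print(d_large < d_small)  # the deviation decreases with the sample size
```

The deviations are of order n^{−1/2}, so the comparison between n = 100 and n = 10000 is clear-cut.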

Example (Mean of a function)
Consider θ(F) = ∫_{−∞}^{+∞} h(x) dF(x). A plug-in estimator based on the edf is
θ̂ := θ(F̂ₙ) = ∫_{−∞}^{+∞} h(x) dF̂ₙ(x) = (1/n) Σ_{i=1}^n h(X_i).

Example (Variance)
Consider now σ²(F) = ∫_{−∞}^{+∞} (x − μ(F))² dF(x). Plugging in F̂ₙ gives
σ²(F̂ₙ) = ∫ x² dF̂ₙ(x) − ( ∫ x dF̂ₙ(x) )² = (1/n) Σ_{i=1}^n X_i² − ( (1/n) Σ_{i=1}^n X_i )².

Exercise
Show that σ²(F̂ₙ) is a biased but consistent estimator for any F.

Example (Density Estimation)
Let θ(F) = f(x₀), where f is the density of F, F(t) = ∫_{−∞}^t f(x) dx. If we tried to plug in F̂ₙ, then our estimator would require differentiation of F̂ₙ at x₀. Clearly, the edf plug-in estimator does not exist, since F̂ₙ is a step function. We will need a "smoother" estimate of F to plug in, e.g.
F̃ₙ(x) := ∫ G(x − y) dF̂ₙ(y) = (1/n) Σ_{i=1}^n G(x − X_i)
for some continuous G concentrated at 0.

- We saw that plug-in estimates are usually easy to obtain via F̂ₙ.
- But such estimates are not necessarily as "innocent" as they seem.
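The edf plug-in recipes above reduce to sample averages; a quick sketch (assumed example) checking the mean and variance plug-ins against the usual formulas:

```python
import random, statistics

# Plug-in estimators from the empirical distribution function (toy example):
# the mean and variance functionals evaluated at F_n reduce to sample averages.

random.seed(4)
xs = [random.gauss(5.0, 3.0) for _ in range(5000)]

n = len(xs)
mean_plugin = sum(xs) / n
var_plugin = sum(x**2 for x in xs) / n - mean_plugin**2  # biased by factor (n-1)/n

# agreement with the standard library formulas
print(abs(mean_plugin - statistics.fmean(xs)) < 1e-9)
print(abs(var_plugin - statistics.pvariance(xs)) < 1e-6)
```

Note that `statistics.pvariance` is exactly the plug-in (population) variance, i.e. the biased version with divisor n.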

The Method of Moments

Perhaps the oldest estimation method (Karl Pearson, late 1800's).

Method of Moments
Let X₁, ..., Xₙ be an iid sample from F_θ, θ ∈ ℝ^p. The method of moments estimator θ̂ of θ is the solution w.r.t. θ of the p random equations
∫_{−∞}^{+∞} x^{k_j} dF̂ₙ(x) = ∫_{−∞}^{+∞} x^{k_j} dF_θ(x), {k_j}_{j=1}^p ⊂ ℕ.

- In some sense this is a plug-in estimator: we estimate the theoretical moments by the sample moments in order to then estimate θ.
- Useful when the exact functional form of θ(F) is unavailable.
- While the method was introduced by equating moments, it may be generalized by equating p theoretical functionals to their empirical analogues.
- → The choice of equations can be important.

Motivational Diversion: The Moment Problem

Theorem
Suppose that F is a distribution determined by its moments. Let {Fₙ} be a sequence of distributions such that ∫ x^k dFₙ(x) < ∞ for all n and k. Then
limₙ ∫ x^k dFₙ(x) = ∫ x^k dF(x) ∀k ≥ 1  ⟹  Fₙ →_w F.

BUT: not all distributions are determined by their moments!

Lemma
The distribution of X is determined by its moments, provided that there exists an open neighbourhood A containing zero such that
M_X(u) = E e^{⟨u,X⟩} < ∞, ∀u ∈ A.

Example (Exponential Distribution)
Suppose X₁, ..., Xₙ iid Exp(λ). Then E[X_i^r] = λ^{−r} Γ(r + 1). Hence we may define a class of estimators of λ depending on r,
λ̂ = [ (1/(nΓ(r + 1))) Σ_{i=1}^n X_i^r ]^{−1/r}.
Tune the value of r so as to get a "best estimator" (will see later...).

Example (Discrete Uniform Distribution)
Let X₁, ..., Xₙ iid U{1, 2, ..., θ}, for θ ∈ ℕ. Using the first moment of the distribution we obtain the equation
X̄ = (θ + 1)/2,
yielding the MoM estimator θ̂ = 2X̄ − 1.

Example (Gamma Distribution)
Let X₁, ..., Xₙ iid Gamma(α, λ). The first two moment equations are
α/λ = (1/n) Σ_{i=1}^n X_i = X̄  and  α/λ² = (1/n) Σ_{i=1}^n (X_i − X̄)² = σ̂²,
yielding the estimates α̂ = X̄²/σ̂² and λ̂ = X̄/σ̂².

A nice feature of MoM estimators is that they generalize to non-iid data:
→ if X = (X₁, ..., Xₙ) has distribution depending on θ ∈ ℝ^p, one can choose statistics T₁, ..., T_p whose expectations depend on θ, E_θ T_k = g_k(θ), and then equate
T_k(X) = g_k(θ), k = 1, ..., p.
It is important here that T_k is a reasonable estimator of E T_k (motivation).

Comments on Plug-In and MoM Estimators
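The Gamma MoM formulas α̂ = X̄²/σ̂², λ̂ = X̄/σ̂² can be sanity-checked on simulated data (note: `random.gammavariate` takes shape and scale, so scale 1/λ matches the rate parametrization used here):

```python
import random

# Method-of-moments estimates for the Gamma(alpha, lambda) model (rate
# parametrization, mean alpha/lambda) on simulated data; an assumed example.

random.seed(5)
alpha, lam = 3.0, 2.0
xs = [random.gammavariate(alpha, 1.0 / lam) for _ in range(100000)]

n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n

alpha_hat, lam_hat = xbar**2 / s2, xbar / s2
print(abs(alpha_hat - alpha) < 0.15 and abs(lam_hat - lam) < 0.15)
```

With 100 000 draws the Monte Carlo error of both estimates is a few hundredths, so the 0.15 tolerance is comfortable.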

- Usually easy to compute, and can be valuable as preliminary estimates for algorithms that attempt to compute more efficient (but not easily computable) estimates.
- Can give a starting point for a search for better estimators in situations where simple intuitive estimators are not available.
- Often these estimators are consistent, so they are likely to be close to the true parameter value for large sample sizes.
  → Use empirical process theory for plug-ins, and estimating equation theory for MoM's.
- Can lead to biased estimators, or even completely ridiculous estimators (will see later).
- The estimate provided by an MoM estimator may lie outside Θ! (Exercise: show that this can happen with the binomial distribution, both n and p unknown.)
- Will later discuss optimality in estimation, and appropriateness (or inappropriateness) will become clearer.
- Observation: many of these estimators do not depend solely on sufficient statistics.
  → Sufficiency seems to play an important role in optimality, and indeed it does (more later).
- Will now see a method where the estimator depends only on a sufficient statistic, when such a statistic exists.

Maximum Likelihood Estimators

A central theme in statistics. Introduced by Ronald Fisher.

Definition (The Likelihood Function)
Let X = (X₁, ..., Xₙ) be random variables with joint density (or frequency function) f(x; θ), θ ∈ Θ ⊆ ℝ^p. The likelihood function L(θ) is the random function
L(θ) = f(X; θ).
→ Notice that we consider L as a function of θ, NOT of X.

Interpretation: most easily interpreted in the discrete case: how likely does the value θ make what we observed? (The interpretation extends to the continuous case by thinking of L(θ) as how likely θ makes something in a small neighbourhood of what we observed.)

When X has iid coordinates with density f(·; θ), the likelihood is
L(θ) = ∏_{i=1}^n f(X_i; θ).

Definition (Maximum Likelihood Estimators)
Let X = (X₁, ..., Xₙ) be a random sample from F_θ, and suppose that θ̂ is such that
L(θ̂) ≥ L(θ), ∀θ ∈ Θ.
Then θ̂ is called a maximum likelihood estimator of θ. We call θ̂ the maximum likelihood estimator when it is the unique maximum of L(θ),
θ̂ = arg max_{θ∈Θ} L(θ).

Intuitively, a maximum likelihood estimator chooses the value of θ that is most compatible with our observation, in the sense that it makes what we observed most probable. In not-so-mathematical terms, θ̂ is the value of θ that is most likely to have produced the data.

Comments on MLE's

- Saw that MoM and plug-in estimators often do not depend only on sufficient statistics, i.e. they also use "irrelevant" information.
- If T is a sufficient statistic for θ, then the Factorization theorem implies that
  L(θ) = g(T(X); θ) h(X) ∝ g(T(X); θ),
  i.e. any MLE depends on the data ONLY through the sufficient statistic.
- MLE's are also invariant: if g : Θ → Θ′ is a bijection, and if θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ).
- For a very broad class of statistical models, the likelihood can be maximized via differential calculus. If Θ is open, the support of the distribution does not depend on θ, and the likelihood is differentiable, then the MLE satisfies the log-likelihood equations
  ∇_θ log L(θ) = 0.
- Notice that maximizing log L(θ) is equivalent to maximizing L(θ).
- When the support of a distribution depends on a parameter, maximization is usually carried out by direct inspection.
- When Θ is not open, the likelihood equations can be used, provided that we verify that the maximum does not occur on the boundary of Θ.

Statistical Theory (Week 5) Point Estimation 29 / 31 Statistical Theory (Week 5) Point Estimation 30 / 31 Example (Uniform Distribution)

Let X_1, ..., X_n iid ∼ U[0, θ]. The likelihood is

    L(θ) = θ^{-n} ∏_{i=1}^n 1{0 ≤ X_i ≤ θ} = θ^{-n} 1{θ ≥ X_{(n)}}.

Hence if θ < X_{(n)} the likelihood is zero. On the domain [X_{(n)}, ∞), the likelihood is a decreasing function of θ. Hence θ̂ = X_{(n)}.
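As a quick numerical check of this direct-inspection argument (the sample and the value θ = 3 below are made-up illustrations, not part of the notes), no grid value of θ beats θ̂ = X_(n):

```python
import random

random.seed(0)
theta_true = 3.0
x = [random.uniform(0, theta_true) for _ in range(50)]

def likelihood(theta, data):
    """L(theta) = theta^(-n) * 1{theta >= max(data)} for U[0, theta] samples."""
    n = len(data)
    return theta ** (-n) if theta >= max(data) else 0.0

mle = max(x)  # theta_hat = X_(n)

# The likelihood vanishes below max(x) and decreases above it,
# so no grid point can beat theta = max(x).
grid = [mle + k * 0.01 for k in range(500)]
assert all(likelihood(t, x) <= likelihood(mle, x) for t in grid)
assert likelihood(mle - 0.01, x) == 0.0
```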

Example (Poisson Distribution)

Let X_1, ..., X_n iid ∼ Poisson(λ). Then

    L(λ) = ∏_{i=1}^n (λ^{x_i} e^{-λ}) / x_i!  ⟹  log L(λ) = -nλ + log λ ∑_{i=1}^n x_i - ∑_{i=1}^n log(x_i!).

Setting ∇_λ log L(λ) = -n + λ^{-1} ∑_i x_i = 0 we obtain λ̂ = x̄, since ∇²_λ log L(λ) = -λ^{-2} ∑_i x_i < 0.
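The calculation above can be sanity-checked numerically; here is a small sketch (the inversion-based Poisson sampler and the parameter values are illustrative assumptions, not part of the notes):

```python
import math
import random

random.seed(1)
lam_true = 2.5

def poisson(lam):
    """Poisson variate via CDF inversion (the stdlib has no Poisson sampler)."""
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

x = [poisson(lam_true) for _ in range(200)]

def log_lik(lam, data):
    """log L(lambda) = -n*lambda + log(lambda)*sum(x_i) - sum(log(x_i!))."""
    n = len(data)
    return (-n * lam + math.log(lam) * sum(data)
            - sum(math.lgamma(xi + 1) for xi in data))

mle = sum(x) / len(x)  # lambda_hat = sample mean

# log L is concave in lambda with its maximum exactly at the sample mean,
# so the sample mean beats every other grid value of lambda.
grid = [0.1 + 0.01 * k for k in range(500)]
assert all(log_lik(l, x) <= log_lik(mle, x) for l in grid)
```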

Maximum Likelihood Estimation

Statistical Theory

1 The Problem of Point Estimation
2 Maximum Likelihood Estimators

Victor Panaretos, École Polytechnique Fédérale de Lausanne

3 Relationship with Kullback-Leibler Divergence

4 Asymptotic Properties of the MLE

Point Estimation for Parametric Families

Collection of r.v.'s (a random vector) X = (X_1, ..., X_n), with X ∼ F_θ ∈ F, a parametric class with parameter θ ∈ Θ ⊆ R^d.

The Problem of Point Estimation
1 Assume that F_θ is known up to the parameter θ, which is unknown.
2 Let (x_1, ..., x_n) be a realization of X ∼ F_θ which is available to us.
3 Estimate the value of θ that generated the sample, given (x_1, ..., x_n).

Last week, we saw three estimation methods: the plug-in method, the method of moments, and maximum likelihood. Today: focus on maximum likelihood. Why does it make sense? What are its properties?

Maximum Likelihood Estimators

Recall our definition of a maximum likelihood estimator:

Definition (Maximum Likelihood Estimators)
Let X = (X_1, ..., X_n) be a random sample from F_θ, and suppose that θ̂ is such that L(θ̂) ≥ L(θ), ∀ θ ∈ Θ. Then θ̂ is called a maximum likelihood estimator of θ.

We call θ̂ the maximum likelihood estimator when it is the unique maximum of L(θ), θ̂ = argmax_{θ ∈ Θ} L(θ).

θ̂ makes what we observed most probable, most likely. This makes sense intuitively. But why should it make sense mathematically?

Definition (Kullback-Leibler Divergence)
Let p(x) and q(x) be two probability density (frequency) functions on R. The Kullback-Leibler divergence of q with respect to p is defined as:

    KL(q ‖ p) := ∫_{-∞}^{+∞} p(x) log( p(x)/q(x) ) dx.

We have KL(p ‖ p) = ∫ p(x) log(1) dx = 0. By Jensen's inequality, for X ∼ p(·) we have

    KL(q ‖ p) = E{ -log[ q(X)/p(X) ] } ≥ -log E[ q(X)/p(X) ] = 0,

since q integrates to 1. Moreover, p ≠ q implies that KL(q ‖ p) > 0.

KL is, in a sense, a distance between probability distributions. But KL is not a metric: no symmetry and no triangle inequality!

Lemma (Maximum Likelihood as Minimum KL-Divergence)
An estimator θ̂ based on an iid sample X_1, ..., X_n is a maximum likelihood estimator if and only if

    KL(F(x; θ̂) ‖ F̂_n(x)) ≤ KL(F(x; θ) ‖ F̂_n(x)),  ∀ θ ∈ Θ.

Proof (discrete case).
We recall that ∫ h(x) dF̂_n(x) = n^{-1} ∑_i h(X_i), so that

    KL(F_θ ‖ F̂_n) = ∫ log( n^{-1} ∑_{i=1}^n δ_{X_i}(x) / f(x; θ) ) dF̂_n(x)
                  = (1/n) ∑_{i=1}^n log( n^{-1} / f(X_i; θ) )
                  = -log n - (1/n) ∑_{i=1}^n log f(X_i; θ)
                  = -log n - (1/n) log L(θ),

which is minimized w.r.t. θ iff L(θ) is maximized w.r.t. θ.
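This equivalence can be illustrated numerically for a Bernoulli model (the data and the grid below are made-up illustrations). With ties in the sample the constant differs from -log n, but the empirical KL-divergence still differs from -n^{-1} log L(θ) only by a constant in θ, so the KL minimizer and the MLE coincide:

```python
import math
import random

random.seed(2)
theta_true = 0.3
x = [1 if random.random() < theta_true else 0 for _ in range(100)]
n = len(x)
p_hat = {1: sum(x) / n, 0: 1 - sum(x) / n}  # empirical frequencies

def log_lik(theta):
    return sum(math.log(theta if xi == 1 else 1 - theta) for xi in x)

def kl_emp(theta):
    """KL of the model from the empirical distribution:
    sum_x p_hat(x) * log(p_hat(x) / f(x; theta))."""
    f = {1: theta, 0: 1 - theta}
    return sum(p * math.log(p / f[v]) for v, p in p_hat.items() if p > 0)

grid = [0.01 * k for k in range(1, 100)]
mle = max(grid, key=log_lik)
kl_min = min(grid, key=kl_emp)
assert mle == kl_min  # maximizing L(theta) == minimizing empirical KL
```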

Therefore, maximizing the likelihood is equivalent to choosing the element of the parametric family {f_θ}_{θ ∈ Θ} that minimizes the KL-divergence with the empirical distribution function. Intuition:
- F̂_n is (with probability 1) a uniformly good approximation of F_{θ_0}, θ_0 the true parameter (large n).
- So F_{θ_0} will be "very close" to F̂_n (for large n).
- So set the "projection" of F̂_n onto {F_θ}_{θ ∈ Θ} as the estimator of F_{θ_0} ("projection" with respect to KL-divergence).

Final comments on KL-divergence:
- KL(p ‖ q) measures how likely it would be to distinguish whether an observation X came from q or from p, given that it came from p.
- A related quantity is the entropy of p, defined as -∫ log(p(x)) p(x) dx, which measures the "inherent randomness" of p (how "surprising" an outcome from p is on average).

Asymptotics for MLE's
- Under what conditions is an MLE consistent?
- How does the distribution of θ̂_MLE concentrate around θ_0 as n → ∞?
- Often, when the MLE coincides with a MoM estimator, this can be seen directly.

Example (Geometric distribution)
Let X_1, ..., X_n be iid Geometric random variables with frequency function

    f(x; θ) = θ(1-θ)^x,  x = 0, 1, 2, ...

The MLE of θ is

    θ̂_n = 1/(X̄ + 1).

By the central limit theorem, √n(X̄ - (θ^{-1} - 1)) →d N(0, θ^{-2}(1-θ)).

Example (Geometric distribution, cont'd)
Now apply the delta method with g(x) = 1/(1+x), so that g'(x) = -1/(1+x)²:

    √n(θ̂_n - θ) = √n(g(X̄_n) - g(θ^{-1} - 1)) →d N(0, θ²(1-θ)).

Example (Uniform distribution)
Suppose that X_1, ..., X_n iid ∼ U[0, θ]. The MLE of θ is

    θ̂_n = X_{(n)} = max{X_1, ..., X_n},

with distribution function P[θ̂_n ≤ x] = (x/θ)^n 1{x ∈ [0, θ]}. Thus for ε > 0,

    P[|θ̂_n - θ| > ε] = P[θ̂_n < θ - ε] = ((θ-ε)/θ)^n → 0 as n → ∞,

so that the MLE is a consistent estimator. To determine the asymptotic concentration of dist(θ̂_n) around θ:

    P[n(θ - θ̂_n) ≤ x] = P[θ̂_n ≥ θ - x/n] = 1 - (1 - x/(θn))^n → 1 - exp(-x/θ) as n → ∞,

so that n(θ - θ̂_n) converges weakly to an exponential random variable. Thus we understand the concentration of dist(θ̂_n - θ) around zero for large n as that of an exponential distribution with variance θ²/n².

From now on assume that X_1, ..., X_n are iid with density (frequency) f(x; θ), θ ∈ R. Notation:

    ℓ(x; θ) = log f(x; θ),

and ℓ'(x; θ), ℓ''(x; θ) and ℓ'''(x; θ) are partial derivatives w.r.t. θ.
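The exponential limit for the uniform MLE can be checked by simulation; a minimal sketch (θ = 2, n = 500 and the replication count are made-up illustrations):

```python
import math
import random
import statistics

random.seed(3)
theta, n, reps = 2.0, 500, 4000

# Collect n * (theta - theta_hat_n) over many replications of the experiment.
scaled_errors = []
for _ in range(reps):
    mle = max(random.uniform(0, theta) for _ in range(n))
    scaled_errors.append(n * (theta - mle))

# The limit is Exponential with mean theta: check the mean and the CDF at x = theta.
m = statistics.mean(scaled_errors)
frac = sum(e <= theta for e in scaled_errors) / reps
assert abs(m - theta) < 0.15                  # E[Exp] = theta = 2
assert abs(frac - (1 - math.exp(-1))) < 0.05  # P[Exp <= theta] = 1 - 1/e
```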

Asymptotics for the MLE

Regularity Conditions
(A1) Θ is an open subset of R.
(A2) The support of f, supp f, is independent of θ.
(A3) f is thrice continuously differentiable w.r.t. θ for all x ∈ supp f.
(A4) E_θ[ℓ'(X_i; θ)] = 0 ∀θ, and Var_θ[ℓ'(X_i; θ)] = I(θ) ∈ (0, ∞) ∀θ.
(A5) -E_θ[ℓ''(X_i; θ)] = J(θ) ∈ (0, ∞) ∀θ.
(A6) ∃ M(x) > 0 and δ > 0 such that E_{θ_0}[M(X_i)] < ∞ and |θ - θ_0| < δ ⟹ |ℓ'''(x; θ)| ≤ M(x).

Let's take a closer look at these conditions. Under condition (A2) we have (d/dθ) ∫_{supp f} f(x; θ) dx = 0 for all θ ∈ Θ, so that, if we can interchange integration and differentiation,

    0 = (d/dθ) ∫ f(x; θ) dx = ∫ ℓ'(x; θ) f(x; θ) dx = E_θ[ℓ'(X_i; θ)].

So, in the presence of (A2), (A4) is essentially a condition that enables differentiation under the integral, and asks that the r.v. ℓ' have a finite second moment for all θ. Similarly, (A5) requires that ℓ'' have a first moment for all θ.

Conditions (A2) and (A6) are smoothness conditions that will allow us to "linearize" the problem, while the other conditions will allow us to "control" the random linearization.

Furthermore, if we can differentiate twice under the integral sign,

    0 = (d/dθ) ∫ ℓ'(x; θ) f(x; θ) dx = ∫ ℓ''(x; θ) f(x; θ) dx + ∫ (ℓ'(x; θ))² f(x; θ) dx,

so that I(θ) = J(θ).

If Θ is open, then for θ_0 the true parameter, it always makes sense for an estimator θ̂ to have a symmetric distribution around θ_0 (e.g. Gaussian).

Example (Exponential Family)

Let X_1, ..., X_n be iid random variables distributed according to a one-parameter exponential family

    f(x; θ) = exp{ c(θ) T(x) - d(θ) + S(x) },  x ∈ supp f.

It follows that

    ℓ'(x; θ) = c'(θ) T(x) - d'(θ),
    ℓ''(x; θ) = c''(θ) T(x) - d''(θ).

On the other hand,

    E[T(X_i)] = d'(θ)/c'(θ),
    Var[T(X_i)] = (1/[c'(θ)]²) ( d''(θ) - c''(θ) d'(θ)/c'(θ) ).

Furthermore,

    I(θ) = [c'(θ)]² Var[T(X_i)] = d''(θ) - c''(θ) d'(θ)/c'(θ)

and

    J(θ) = d''(θ) - c''(θ) E[T(X_i)] = d''(θ) - c''(θ) d'(θ)/c'(θ),

so that I(θ) = J(θ).

Hence E[ℓ'(X_i; θ)] = c'(θ) E[T(X_i)] - d'(θ) = 0.
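The identity I(θ) = J(θ) can be checked by Monte Carlo for a concrete exponential family member. For Poisson(θ), c(θ) = log θ, T(x) = x, d(θ) = θ, so ℓ'(x; θ) = x/θ - 1 and ℓ''(x; θ) = -x/θ², and both I(θ) and J(θ) should equal 1/θ. A sketch (the sampler and θ = 2 are illustrative assumptions):

```python
import math
import random
import statistics

random.seed(4)
theta = 2.0

def poisson(lam):
    """Poisson variate via CDF inversion (the stdlib has no Poisson sampler)."""
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

x = [poisson(theta) for _ in range(100_000)]
score = [xi / theta - 1 for xi in x]      # l'(x; theta)
curv = [-xi / theta ** 2 for xi in x]     # l''(x; theta)

I_hat = statistics.pvariance(score)  # Var[l']  -> I(theta)
J_hat = -statistics.mean(curv)       # -E[l''] -> J(theta)
assert abs(I_hat - 1 / theta) < 0.02  # both ~ 1/theta = 0.5
assert abs(J_hat - 1 / theta) < 0.02
assert abs(I_hat - J_hat) < 0.03
```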

Asymptotic Normality of the MLE

Theorem (Asymptotic Distribution of the MLE)
Let X_1, ..., X_n be iid random variables with density (frequency) f(x; θ) satisfying conditions (A1)-(A6). Suppose that the sequence of MLE's θ̂_n satisfies θ̂_n →p θ, where

    ∑_{i=1}^n ℓ'(X_i; θ̂_n) = 0,  n = 1, 2, ...

Then

    √n(θ̂_n - θ) →d N(0, I(θ)/J²(θ)).

When I(θ) = J(θ), we have of course √n(θ̂_n - θ) →d N(0, 1/I(θ)).

Proof.
Under conditions (A1)-(A3), if θ̂_n maximizes the likelihood, we have

    ∑_{i=1}^n ℓ'(X_i; θ̂_n) = 0.

Expanding this equation in a Taylor series, we get

    0 = ∑_{i=1}^n ℓ'(X_i; θ̂_n) = ∑_{i=1}^n ℓ'(X_i; θ) + (θ̂_n - θ) ∑_{i=1}^n ℓ''(X_i; θ) + ½ (θ̂_n - θ)² ∑_{i=1}^n ℓ'''(X_i; θ*_n),

with θ*_n lying between θ and θ̂_n.

Dividing across by √n yields

    0 = (1/√n) ∑_{i=1}^n ℓ'(X_i; θ) + √n(θ̂_n - θ) · (1/n) ∑_{i=1}^n ℓ''(X_i; θ) + ½ √n(θ̂_n - θ)² · (1/n) ∑_{i=1}^n ℓ'''(X_i; θ*_n),

which suggests that √n(θ̂_n - θ) equals

    [ -n^{-1/2} ∑_{i=1}^n ℓ'(X_i; θ) ] / [ n^{-1} ∑_{i=1}^n ℓ''(X_i; θ) + (θ̂_n - θ)(2n)^{-1} ∑_{i=1}^n ℓ'''(X_i; θ*_n) ].

Now, from the central limit theorem and condition (A4), it follows that

    (1/√n) ∑_{i=1}^n ℓ'(X_i; θ) →d N(0, I(θ)).

Next, the weak law of large numbers along with condition (A5) implies

    (1/n) ∑_{i=1}^n ℓ''(X_i; θ) →p -J(θ).

Now we turn to show that the remainder vanishes in probability,

    R_n = (θ̂_n - θ) (2n)^{-1} ∑_{i=1}^n ℓ'''(X_i; θ*_n) →p 0.

We have that, for any ε > 0,

    P[|R_n| > ε] = P[|R_n| > ε, |θ̂_n - θ| > δ] + P[|R_n| > ε, |θ̂_n - θ| ≤ δ],

and

    P[|R_n| > ε, |θ̂_n - θ| > δ] ≤ P[|θ̂_n - θ| > δ] → 0.

If |θ̂_n - θ| < δ, (A6) gives |R_n| ≤ (δ/(2n)) ∑_{i=1}^n M(X_i). Since the weak law of large numbers implies that

    (1/n) ∑_{i=1}^n M(X_i) →p E[M(X_1)] < ∞,

the quantity P[|R_n| > ε, |θ̂_n - θ| ≤ δ] can be made arbitrarily small (for large n) by taking δ sufficiently small. Thus R_n →p 0, and applying Slutsky's theorem we may conclude that

    √n(θ̂_n - θ) →d N(0, I(θ)/J²(θ)).

Notice that for our proof we assumed that the sequence of MLE's was consistent. Proving consistency of an MLE can be subtle.

Consistency of the MLE

Consider the random function

    φ_n(t) = (1/n) ∑_{i=1}^n [ log f(X_i; t) - log f(X_i; θ) ],

which is maximized at t = θ̂_n. By the WLLN, for each t ∈ Θ,

    φ_n(t) →p φ(t) = E[ log( f(X_i; t) / f(X_i; θ) ) ],

which is minus the KL-divergence. The latter is minimized when t = θ, and so φ(t) is maximized at t = θ. Moreover, unless f(x; t) = f(x; θ) for all x ∈ supp f, we have φ(t) < 0. Since we are assuming identifiability, it follows that φ is uniquely maximized at θ.

Does the fact that φ_n(t) →p φ(t) ∀t, with φ uniquely maximized at θ, imply that θ̂_n →p θ? Unfortunately, the answer is in general no.

Example (A Deterministic Example)
Define

    φ_n(t) = 1 - n|t - n^{-1}|  for 0 ≤ t ≤ 2/n,
             1/2 - |t - 2|       for 3/2 ≤ t ≤ 5/2,
             0                   otherwise.

It is easy to see that φ_n → φ pointwise, with

    φ(t) = (1/2 - |t - 2|) 1{3/2 ≤ t ≤ 5/2}.

But now note that φ_n is maximized at t_n = n^{-1} with φ_n(t_n) = 1 for all n. On the other hand, φ is maximized at t_0 = 2.

More assumptions are needed on the φ_n(t).

Theorem
Suppose that φ_n(t) and φ(t) are real-valued random functions defined on the real line. Suppose that
1 for each M > 0, sup_{|t| ≤ M} |φ_n(t) - φ(t)| →p 0;
2 T_n maximizes φ_n(t) and T_0 is the unique maximizer of φ(t);
3 ∀ ε > 0, there exists M such that P[|T_n| > M] < ε ∀n.
Then T_n →p T_0.

If the φ_n are concave, we can weaken the assumptions:

Theorem
Suppose that φ_n(t) and φ(t) are random concave functions defined on the real line. Suppose that
1 φ_n(t) →p φ(t) for all t;
2 T_n maximizes φ_n and T_0 is the unique maximizer of φ.
Then T_n →p T_0.
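The deterministic counterexample can be verified directly in a few lines (the grid is an illustrative assumption; it contains the relevant maximizers exactly):

```python
def phi_n(t, n):
    """The spike-plus-triangle functions from the counterexample."""
    if 0 <= t <= 2 / n:
        return 1 - n * abs(t - 1 / n)
    if 1.5 <= t <= 2.5:
        return 0.5 - abs(t - 2)
    return 0.0

def phi(t):
    """Pointwise limit: only the triangle around t = 2 survives."""
    return (0.5 - abs(t - 2)) if 1.5 <= t <= 2.5 else 0.0

grid = [k / 1000 for k in range(3001)]
for n in (10, 100, 1000):
    t_n = max(grid, key=lambda t: phi_n(t, n))
    assert abs(t_n - 1 / n) < 1e-9   # maximizer of phi_n is the spike at 1/n
t_0 = max(grid, key=phi)
assert abs(t_0 - 2.0) < 1e-9         # but the limit phi is maximized at 2
```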

Example (Exponential Families)
Let X_1, ..., X_n be iid random variables from a one-parameter exponential family

    f(x; θ) = exp{ c(θ) T(x) - d(θ) + S(x) },  x ∈ supp f.

The MLE of θ maximizes

    φ_n(t) = (1/n) ∑_{i=1}^n [ c(t) T(X_i) - d(t) ].

If c(·) is continuous and 1-1 with inverse c^{-1}(·), we may define u = c(t) and consider

    φ*_n(u) = (1/n) ∑_{i=1}^n [ u T(X_i) - d_0(u) ],

with d_0(u) = d(c^{-1}(u)). It follows that φ*_n is a concave function, since its second derivative, (φ*_n)''(u) = -d_0''(u), is negative (d_0''(u) = Var T(X_i)).

Now, by the weak law of large numbers, for each u we have

    φ*_n(u) →p u E[T(X_1)] - d_0(u) = φ*(u).

Furthermore, φ*(u) is maximized when d_0'(u) = E[T(X_1)]. But since E[T(X_1)] = d_0'(c(θ)), we must have that φ* is maximized when

    d_0'(u) = d_0'(c(θ)).

The condition holds if we set u = c(θ), so c(θ) is a maximizer of φ*. By concavity, it is the unique maximizer. It follows from our theorem that if û_n = c(θ̂_n), then û_n = c(θ̂_n) →p c(θ). But c is 1-1 and continuous, so the continuous mapping theorem implies

    θ̂_n →p θ.

More on Maximum Likelihood Estimation

1 Consistent Roots of the Likelihood Equations

Statistical Theory

2 Approximate Solution of the Likelihood Equations

Victor Panaretos, École Polytechnique Fédérale de Lausanne

3 The Multiparameter Case

4 Misspecified Models and Likelihood

Maximum Likelihood Estimators

Recall our definition of a maximum likelihood estimator:

Definition (Maximum Likelihood Estimators)
Let X = (X_1, ..., X_n) be a random sample from F_θ, and suppose that θ̂ is such that L(θ̂) ≥ L(θ), ∀ θ ∈ Θ. Then θ̂ is called a maximum likelihood estimator of θ.

We saw that, under regularity conditions, the distribution of a consistent sequence of MLEs converges weakly to the normal distribution centred around the true parameter value when this is real. Today:
- Consistent likelihood equation roots
- Newton-Raphson and "one-step" estimators
- The multivariate parameter case
- What happens if the model has been mis-specified?

Consistent Likelihood Roots

Theorem
Let {f(·; θ)}_{θ ∈ R} be an identifiable parametric class of densities (frequencies), and let X_1, ..., X_n be iid random variables each having density f(x; θ_0). If the support of f(·; θ) is independent of θ, then

    P[L(θ_0 | X_1, ..., X_n) > L(θ | X_1, ..., X_n)] → 1 as n → ∞

for any fixed θ ≠ θ_0.

Therefore, with high probability, the likelihood of the true parameter exceeds the likelihood of any other fixed choice of parameter, provided that the sample size is large. This hints that extrema of L(θ; X) should have something to do with θ_0 (even though we saw that, without further assumptions, a maximizer of L is not necessarily consistent).

Proof.
Notice that

    L(θ_0 | X_n) > L(θ | X_n) ⟺ (1/n) ∑_{i=1}^n log( f(X_i; θ) / f(X_i; θ_0) ) < 0.

By the WLLN,

    (1/n) ∑_{i=1}^n log( f(X_i; θ) / f(X_i; θ_0) ) →p E[ log( f(X; θ) / f(X; θ_0) ) ] = -KL(f_θ ‖ f_{θ_0}).

But we have seen that the KL-divergence is zero only at θ_0 and positive everywhere else.

Consistent Sequences of Likelihood Roots

Theorem (Cramér)
Let {f(·; θ)}_{θ ∈ R} be an identifiable parametric class of densities (frequencies), and let X_1, ..., X_n be iid random variables each having density f(x; θ_0). Assume that the support of f(·; θ) is independent of θ, and that f(x; θ) is differentiable with respect to θ for (almost) all x. Then, given any ε > 0, with probability tending to 1 as n → ∞, the likelihood equation

    (∂/∂θ) ℓ(θ; X_1, ..., X_n) = 0

has a root θ̂_n(X_1, ..., X_n) such that |θ̂_n(X_1, ..., X_n) - θ_0| < ε.

- Does not tell us which root to choose, so not useful in practice.
- Actually, the consistent sequence is essentially unique.


Corollary (Consistency of Unique Solutions)
Under the assumptions of the previous theorem, if the likelihood equation has a unique root δ_n for each n and all x, then {δ_n} is a consistent sequence of estimators for θ_0.

- The statement remains true if the uniqueness requirement is substituted with the requirement that the probability of multiple roots tends to zero as n → ∞.
- Notice that the statement does not claim that the root corresponds to a maximum: it merely requires that we have a root.
- On the other hand, even when the root is unique, the corollary says nothing about its properties for finite n.

Example (Minimum Likelihood Estimation)
Let X take the values 0, 1, 2 with probabilities 6θ² - 4θ + 1, θ - 2θ² and 3θ - 4θ² (θ ∈ (0, 1/2)). Then the likelihood equation has a unique root for all x, which is a minimum for x = 0 and a maximum for x = 1, 2.

Consistent Sequences of Likelihood Roots

Fortunately, if some "good" estimator is already available, then...

Lemma
Let α_n be any consistent sequence of estimators for the parameter θ. For each n, let θ*_n denote the root of the likelihood equations that is closest to α_n. Then, under the assumptions of Cramér's theorem, θ*_n →p θ.

Therefore, when the likelihood equations do not have a single root, we may still choose a root based on some estimator that is readily available:
- We only require that the estimator used be consistent.
- This is often the case with Plug-In or MoM estimators.

Very often, the roots will not be available in closed form. In these cases, an iterative approach will be required to approximate the roots.

The Newton-Raphson Algorithm

We wish to solve the equation

    ℓ'(θ) = 0.

Supposing that θ̃ is close to a root (perhaps θ̃ is a consistent estimator), a second-order Taylor expansion gives

    0 = ℓ'(θ̂) ≈ ℓ'(θ̃) + (θ̂ - θ̃) ℓ''(θ̃).

This suggests

    θ̂ ≈ θ̃ - ℓ'(θ̃)/ℓ''(θ̃).

The procedure can then be iterated by replacing θ̃ by the right-hand side of the above relation. There are many issues regarding convergence, speed of convergence, etc. (numerical analysis course).

Construction of Asymptotically MLE-like Estimators

Theorem
Suppose that assumptions (A1)-(A6) hold, and let θ̃_n be a consistent estimator of θ_0 such that √n(θ̃_n - θ_0) is bounded in probability. Then the sequence of estimators

    δ_n = θ̃_n - ℓ'(θ̃_n)/ℓ''(θ̃_n)

satisfies

    √n(δ_n - θ_0) →d N(0, I(θ)/J(θ)²).

Therefore, with a single Newton-Raphson step, we may obtain an estimator that, asymptotically, behaves like a consistent MLE, provided that we have a √n-consistent estimator! The "one-step" estimator does not necessarily behave like an MLE for finite n!
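As an illustration (the choice of model and all numbers are assumptions, not from the notes): for the Cauchy location family the likelihood equations can have multiple roots, but the sample median is a √n-consistent starting point, and a single Newton-Raphson step then gives an asymptotically MLE-like estimator:

```python
import math
import random
import statistics

random.seed(5)
theta0, n = 1.0, 2000
# Cauchy(theta0, 1) sample via inverse-CDF sampling
x = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def score(t):
    # l'(t) = sum_i 2*(x_i - t) / (1 + (x_i - t)^2)
    return sum(2 * (xi - t) / (1 + (xi - t) ** 2) for xi in x)

def score_deriv(t):
    # l''(t) = sum_i 2*((x_i - t)^2 - 1) / (1 + (x_i - t)^2)^2
    return sum(2 * ((xi - t) ** 2 - 1) / (1 + (xi - t) ** 2) ** 2 for xi in x)

start = statistics.median(x)   # sqrt(n)-consistent starting estimator
one_step = start - score(start) / score_deriv(start)  # single Newton step

assert score_deriv(start) < 0  # log-likelihood locally concave near the truth
assert abs(one_step - theta0) < 0.15
```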


Proof.
We Taylor expand around the true value θ_0:

    ℓ'(θ̃_n) = ℓ'(θ_0) + (θ̃_n - θ_0) ℓ''(θ_0) + ½ (θ̃_n - θ_0)² ℓ'''(θ*_n),

with θ*_n between θ_0 and θ̃_n. Substituting this expression into the definition of δ_n yields

    √n(δ_n - θ_0) = -[(1/√n) ℓ'(θ_0)] / [(1/n) ℓ''(θ̃_n)] + √n(θ̃_n - θ_0) × [ 1 - ℓ''(θ_0)/ℓ''(θ̃_n) - ½ (θ̃_n - θ_0) ℓ'''(θ*_n)/ℓ''(θ̃_n) ].

Exercise: use the central limit theorem and the law of large numbers to complete the proof.

The Multiparameter Case

Extension of the asymptotic results to multiparameter models is easy under similar assumptions, but notationally cumbersome. Same ideas: the MLE will be a zero of the likelihood equations

    ∑_{i=1}^n ∇ℓ(X_i; θ) = 0.

A Taylor expansion can be formed:

    0 = (1/√n) ∑_{i=1}^n ∇ℓ(X_i; θ) + ( (1/n) ∑_{i=1}^n ∇²ℓ(X_i; θ*_n) ) √n(θ̂_n - θ).

Under regularity conditions we should have:

    (1/√n) ∑_{i=1}^n ∇ℓ(X_i; θ) →d N_p(0, Cov[∇ℓ(X_i; θ)]),
    (1/n) ∑_{i=1}^n ∇²ℓ(X_i; θ*_n) →p E[∇²ℓ(X_i; θ)].

Regularity Conditions
(B1) The parameter space Θ ⊆ R^p is open.
(B2) The support of f(·|θ), supp f(·|θ), is independent of θ.
(B3) All mixed partial derivatives of ℓ w.r.t. θ up to degree 3 exist and are continuous.
(B4) E[∇ℓ(X_i; θ)] = 0 ∀θ, and Cov[∇ℓ(X_i; θ)] =: I(θ) ≻ 0 ∀θ.
(B5) -E[∇²ℓ(X_i; θ)] =: J(θ) ≻ 0 ∀θ.
(B6) ∃ δ > 0 s.t. ∀ θ ∈ Θ and for all 1 ≤ j, k, l ≤ p,

    | ∂³ℓ(x; u) / (∂θ_j ∂θ_k ∂θ_l) | ≤ M_jkl(x)  for ‖θ - u‖ ≤ δ,

with M_jkl such that E[M_jkl(X_i)] < ∞.

The interpretation of the conditions is the same as in the one-dimensional case.

Theorem (Asymptotic Normality of the MLE)
Let X_1, ..., X_n be iid random variables with density (frequency) f(x; θ) satisfying conditions (B1)-(B6). If θ̂_n = θ̂(X_1, ..., X_n) is a consistent sequence of MLE estimators, then

    √n(θ̂_n - θ) →d N_p(0, J^{-1}(θ) I(θ) J^{-1}(θ)).

- The theorem remains true if each X_i is a random vector.
- The proof mimics that of the one-dimensional case.

Misspecification of Models

- Statistical models are typically mere approximations to reality.
- George E. P. Box: "all models are wrong, but some are useful".
- As worrying as this may seem, it may not be a problem in practice.
- Often the model is wrong, but is "close enough" to the true situation.
- Even if the model is wrong, the parameters often admit a fruitful interpretation in the context of the problem.

Example
Let X_1, ..., X_n be iid Exponential(λ) r.v.'s, but we have modelled them as having the following two-parameter density:

    f(x | α, θ) = (α/θ) (1 + x/θ)^{-(α+1)},  x > 0,

with α and θ positive unknown parameters to be estimated.

Example (cont'd)
Notice that the exponential distribution is not a member of this parametric family. However, letting α, θ → ∞ at rates such that α/θ → λ, we have

    f(x | α, θ) → λ exp(-λx).

Thus, we may approximate the true model from within this class: reasonable estimates α̂ and θ̂ will yield a density "close" to the true density.

Example
Let X_1, ..., X_n be independent random variables with variance σ² and mean

    E[X_i] = α + β t_i.

If we assume that the X_i are normal when they are in fact not, the MLEs of the parameters α, β, σ² remain good (in fact optimal in a sense) for the true parameters (Gauss-Markov theorem).

Misspecified Models and Likelihood

The Framework
- X_1, ..., X_n are iid r.v.'s with distribution F.
- We have assumed that the X_i admit a density in {f(x; θ)}_{θ ∈ Θ}.
- The true distribution F does not correspond to any of the {f_θ}.
- Let θ̂_n be a root of the likelihood equation,

    ∑_{i=1}^n ℓ'(X_i; θ̂_n) = 0,

where the log-likelihood ℓ(θ) is w.r.t. f(·|θ). What exactly is θ̂_n estimating?

Consider the functional parameter θ(F) defined by

    ∫_{-∞}^{+∞} ℓ'(x; θ(F)) dF(x) = 0.

Then, the plug-in estimator of θ(F), when using the edf F̂_n as an estimator of F, is given by solving

    ∫_{-∞}^{+∞} ℓ'(x; θ(F̂_n)) dF̂_n(x) = 0 ⟺ ∑_{i=1}^n ℓ'(X_i; θ̂_n) = 0,

so that the MLE is a plug-in estimator of θ(F).

What is the behaviour of the sequence {θ̂_n}_{n ≥ 1} as n → ∞?

Model Misspecification and the Likelihood

Theorem
Let X_1, ..., X_n iid ∼ F, and let θ̂_n be a random variable solving the equations ∑_{i=1}^n ℓ'(X_i; θ) = 0 for θ in the open set Θ. If
(a) ℓ' is a strictly monotone function on Θ for each x;
(b) ∫_{-∞}^{+∞} ℓ'(x; θ(F)) dF(x) = 0 has a unique solution θ = θ(F) on Θ;
(c) I(F) := ∫_{-∞}^{+∞} [ℓ'(x; θ(F))]² dF(x) < ∞;
(d) J(F) := -∫_{-∞}^{+∞} ℓ''(x; θ(F)) dF(x) < ∞;
(e) |ℓ'''(x; t)| ≤ M(x) for t ∈ (θ(F) - δ, θ(F) + δ), some δ > 0, and ∫_{-∞}^{+∞} M(x) dF(x) < ∞;
then

    θ̂_n →p θ(F)

and

    √n(θ̂_n - θ(F)) →d N(0, I(F)/J²(F)).

Proof.
Assume without loss of generality that ℓ'(x; θ) is strictly decreasing in θ. Let ε > 0 and observe that

    P[|θ̂_n - θ(F)| > ε] = P[ {θ̂_n - θ(F) > ε} ∪ {θ(F) - θ̂_n > ε} ]
                        ≤ P[θ̂_n - θ(F) > ε] + P[θ(F) - θ̂_n > ε].

By our monotonicity assumption, we have

    {θ̂_n - θ(F) > ε} = {θ̂_n > θ(F) + ε} ⟹ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) + ε) > 0,

because θ̂_n is the solution to the equation (1/n) ∑_{i=1}^n ℓ'(X_i; θ) = 0. Similarly, we also obtain

    {θ(F) - θ̂_n > ε} = {θ(F) - ε > θ̂_n} ⟹ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) - ε) < 0.

Hence,

    P[|θ̂_n - θ(F)| > ε] ≤ P[ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) + ε) > 0 ] + P[ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) - ε) < 0 ].

We may re-write the first term on the right-hand side as

    P[ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) + ε) - ∫_{-∞}^{+∞} ℓ'(x; θ(F) + ε) dF(x) > -∫_{-∞}^{+∞} ℓ'(x; θ(F) + ε) dF(x) ].

This converges to zero because the monotonicity assumption implies that -∫ ℓ'(x; θ(F) + ε) dF(x) > 0, and the law of large numbers implies that

    (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) + ε) →p ∫_{-∞}^{+∞} ℓ'(x; θ(F) + ε) dF(x).

Similar arguments give

    P[ (1/n) ∑_{i=1}^n ℓ'(X_i; θ(F) - ε) < 0 ] → 0,

and thus θ̂_n →p θ(F).

Expanding the equation that defines the estimator in a Taylor series gives

    0 = (1/√n) ∑_{i=1}^n ℓ'(X_i; θ̂_n)
      = (1/√n) ∑_{i=1}^n ℓ'(X_i; θ(F)) + √n(θ̂_n - θ(F)) · (1/n) ∑_{i=1}^n ℓ''(X_i; θ(F)) + ½ √n(θ̂_n - θ(F))² · (1/n) ∑_{i=1}^n ℓ'''(X_i; θ*_n).

Here, θ*_n lies between θ(F) and θ̂_n. Exercise: complete the proof by mimicking the proof of asymptotic normality of MLEs.

Model Misspecification and the Likelihood

What is the interpretation of the parameter θ(F) in the misspecified setup? Suppose that F has density (frequency) g, and assume that integration/differentiation may be interchanged:

    ∫_{-∞}^{+∞} (d/dθ) log f(x; θ) dF(x) = 0
    ⟺ (d/dθ) ∫_{-∞}^{+∞} log f(x; θ) dF(x) = 0
    ⟺ (d/dθ) [ ∫_{-∞}^{+∞} log f(x; θ) dF(x) - ∫_{-∞}^{+∞} log g(x) dF(x) ] = 0
    ⟺ (d/dθ) KL(f(x; θ) ‖ g(x)) = 0.

When the equation is assumed to have a unique solution, then this is to be thought of as a minimum of the KL-distance. Hence we may intuitively think of θ(F) as the element of Θ for which f_θ is "closest" to F in the KL-sense.

Remarks:
- The result extends immediately to the multivariate parameter case.
- Notice that the proof is essentially identical to the proof of the MLE asymptotics.
- The difference is the first part, where we show consistency. This is where assumptions (a) and (b) come in; these can be replaced by any set of assumptions yielding consistency.
- This indicates the subtleties that are involved when proving convergence for indirectly defined estimators.
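A small simulation sketch of this interpretation (the Gamma truth and the Exponential working model are illustrative assumptions, not from the notes): for the Exponential(λ) working model, ℓ'(x; λ) = 1/λ - x, so θ(F) solves E_F[1/λ - X] = 0, i.e. λ = 1/E_F[X], whatever F is. The MLE then converges to this pseudo-true parameter even though the model is wrong:

```python
import random

random.seed(6)
# True F: Gamma(shape=2, scale=1), mean 2 -- NOT in the exponential model family.
x = [random.gammavariate(2.0, 1.0) for _ in range(50_000)]

# Working model: Exponential(lam), l(x; lam) = log(lam) - lam*x,
# so l'(x; lam) = 1/lam - x and the likelihood equation gives lam_hat = 1/mean.
lam_hat = len(x) / sum(x)

# Pseudo-true parameter theta(F): E_F[l'(X; lam)] = 0  =>  lam = 1/E_F[X] = 0.5
lam_pseudo_true = 0.5
assert abs(lam_hat - lam_pseudo_true) < 0.01
```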

1 Statistics as a Random Game
2 Risk (Expected Loss)

The Framework

Statistical Theory

3 Admissibility and Inadmissibility

Victor Panaretos, École Polytechnique Fédérale de Lausanne

4 Minimax Rules

5 Bayes Rules

6 Randomised Rules

Statistics as a Random Game?

Nature and a statistician decide to play a game.

How the game is played. First we agree on the rules:
1 Fix a parametric family {F_θ}_{θ ∈ Θ}.
2 Fix an action space A.
3 Fix a loss function L.

Then we play:
1 Nature selects (plays) θ_0 ∈ Θ.
2 The statistician observes X ∼ F_{θ_0}.
3 The statistician plays α ∈ A in response.
4 The statistician has to pay Nature L(θ_0, α).

What's in the box?
- A family of distributions F = {F_θ}_{θ ∈ Θ}, usually assumed to admit densities (frequencies). This is the variant of the game we decide to play.
- A parameter space Θ ⊆ R^p which parametrizes the family. This represents the space of possible plays/moves available to Nature.
- A data space X, on which the parametric family is supported. This represents the space of possible outcomes following a play by Nature.
- An action space A, which represents the space of possible actions or decisions or plays/moves available to the statistician.
- A loss function L : Θ × A → R_+. This represents how much the statistician has to pay Nature when losing.
- A set D of decision rules. Any δ ∈ D is a (measurable) function δ : X → A. These represent the possible strategies available to the statistician.

Framework proposed by A. Wald in 1939. It encompasses three basic statistical problems:
- Point estimation
- Hypothesis testing
- Interval estimation

Point Estimation as a Game

In the problem of point estimation we have:
1 a fixed parametric family {F_θ}_{θ ∈ Θ};
2 a fixed action space A = Θ;
3 a fixed loss function L(θ, α) (e.g. ‖θ - α‖²).

The game now evolves simply as:
1 Nature picks θ_0 ∈ Θ.
2 The statistician observes X ∼ F_{θ_0}.
3 The statistician plays δ(X) ∈ A = Θ.
4 The statistician loses L(θ_0, δ(X)).

Notice that in this setup δ is an estimator (it is a statistic X → Θ). The statistician always loses.
- Is there a good strategy δ ∈ D for the statistician to restrict his losses?
- Is there an optimal strategy?

Risk of a Decision Rule

The statistician would like to pick the strategy δ so as to minimize his losses. But losses are random, as they depend on X.

Definition (Risk)
Given a parameter θ ∈ Θ, the risk of a decision rule δ : X → A is the expected loss incurred when employing δ:

    R(θ, δ) = E_θ[L(θ, δ(X))].

This is the key notion of decision theory: decision rules should be compared by comparing their risk functions.

Example (Mean Squared Error)
In point estimation, the mean squared error

    MSE(δ(X)) = E_θ[‖θ - δ(X)‖²]

is the risk corresponding to a squared error loss function.

Coin Tossing Revisited

Consider the "coin tossing game" with quadratic loss:
- Nature picks θ ∈ [0, 1].
- We observe n iid variables X_i ∼ Bernoulli(θ).
- The action space is A = [0, 1].
- The loss function is L(θ, α) = (θ - α)².

Consider 3 different decision procedures {δ_j}_{j=1}^3:
1 δ_1(X) = (1/n) ∑_{i=1}^n X_i
2 δ_2(X) = X_1
3 δ_3(X) = 1/2

The risks associated with these decision rules, R_j(θ) = R(θ, δ_j(X)) = E_θ[(θ - δ_j(X))²], are:

    R_1(θ) = θ(1-θ)/n,
    R_2(θ) = θ(1-θ),
    R_3(θ) = (θ - 1/2)².

Let us compare these using their associated risks as benchmarks.
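The three risk functions above are explicit, so the comparison can be carried out in a few lines (the grid and n = 10 are illustrative choices):

```python
def R1(theta, n):  # risk of the sample mean: Var = theta*(1-theta)/n
    return theta * (1 - theta) / n

def R2(theta):     # risk of delta_2 = X_1
    return theta * (1 - theta)

def R3(theta):     # risk of the constant rule delta_3 = 1/2
    return (theta - 0.5) ** 2

n = 10
grid = [k / 100 for k in range(101)]
# delta_1 strictly dominates delta_2 (equal risk only at theta in {0, 1}) ...
assert all(R1(t, n) <= R2(t) for t in grid)
assert any(R1(t, n) < R2(t) for t in grid)
# ... but neither delta_1 nor delta_3 dominates the other:
assert R3(0.5) < R1(0.5, n) and R1(0.0, n) < R3(0.0)
```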

Coin Tossing Revisited

We saw that one decision rule may strictly dominate another rule (here R_2(θ) > R_1(θ) for every θ ∈ (0, 1), so δ_1 dominates δ_2).

[Figure: the risk functions R_1(θ), R_2(θ), R_3(θ) plotted against θ ∈ [0, 1].]

Risk of a Decision Rule

Definition (Inadmissible Decision Rule)
Let δ be a decision rule for the experiment ({F_θ}_{θ ∈ Θ}, L). If there exists a decision rule δ* that strictly dominates δ, i.e.

    R(θ, δ*) ≤ R(θ, δ) ∀ θ ∈ Θ, and ∃ θ' ∈ Θ : R(θ', δ*) < R(θ', δ),

then δ is called an inadmissible decision rule.

- An inadmissible decision rule is a "silly" strategy, since we can find a strategy that always does at least as well and sometimes better.
- However, "silly" is with respect to L and Θ (it may be that our choice of L is "silly"!!!).
- If we change the rules of the game (i.e. a different loss or a different parameter space), then the domination may break down.

Example (Exponential Distribution)
Let X_1, ..., X_n iid ∼ Exponential(λ), n ≥ 2. The MLE of λ is

    λ̂ = 1/X̄,

with X̄ the empirical mean. Observe that

    E_λ[λ̂] = nλ/(n-1).

It follows that λ̃ = (n-1)λ̂/n is an unbiased estimator of λ. Observe now that

    MSE_λ(λ̃) < MSE_λ(λ̂),

since λ̃ is unbiased and Var_λ(λ̃) < Var_λ(λ̂). Hence the MLE is an inadmissible rule for quadratic loss.

Notice that the parameter space in this example is (0, ∞). In such cases, quadratic loss tends to penalize over-estimation more heavily than under-estimation (the maximum possible under-estimation is bounded!). Considering a different loss function gives the opposite result! Let

    L(a, b) = a/b - 1 - log(a/b),

where, for each fixed a, lim_{b→0} L(a, b) = lim_{b→∞} L(a, b) = ∞. Now,

    R(λ, λ̃) = E_λ[ nλX̄/(n-1) - 1 - log( nλX̄/(n-1) ) ]
             = E_λ[ λX̄ - 1 - log(λX̄) ] + E_λ(λX̄)/(n-1) + log( (n-1)/n )
             > E_λ[ λX̄ - 1 - log(λX̄) ] = R(λ, λ̂),

where the last inequality holds because E_λ(λX̄) = 1 and 1/(n-1) > log(n/(n-1)).
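The quadratic-loss part of this example is easy to check by simulation (λ = 1, n = 10 and the replication count are illustrative choices; for n = 10 the exact values are MSE(λ̃) = 1/(n-2) = 0.125 and MSE(λ̂) = 1/6):

```python
import random

random.seed(7)
lam, n, reps = 1.0, 10, 20_000

se_hat = se_tilde = 0.0
for _ in range(reps):
    xbar = sum(random.expovariate(lam) for _ in range(n)) / n
    mle = 1 / xbar                  # lambda_hat = 1 / Xbar
    unb = (n - 1) / n * mle         # bias-corrected estimator lambda_tilde
    se_hat += (mle - lam) ** 2
    se_tilde += (unb - lam) ** 2

mse_hat, mse_tilde = se_hat / reps, se_tilde / reps
assert mse_tilde < mse_hat               # MLE inadmissible under quadratic loss
assert abs(mse_tilde - 1 / (n - 2)) < 0.02
```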

Criteria for Choosing Decision Rules

Definition (Admissible Decision Rule)
A decision rule δ is admissible for the experiment ({Fθ}θ∈Θ, L) if it is not strictly dominated by any other decision rule.

In non-trivial problems, it may not be easy at all to decide whether a given decision rule is admissible.
→ Stein’s paradox (“one of the most striking post-war results in mathematical statistics”, Brad Efron)

Admissibility is a minimal requirement; what about the opposite end (optimality)?

In almost any non-trivial experiment, there will be no decision rule that makes risk uniformly smallest over θ.

Narrow down the class of possible decision rules by unbiasedness/symmetry/... considerations, and try to find rules that uniformly dominate all other rules (next week!).

Minimax Decision Rules

Another approach to good procedures is to use global rather than local criteria (with respect to θ).
Rather than look at risk at every θ ↔ concentrate on maximum risk.

Definition (Minimax Decision Rule)
Let D be a class of decision rules for an experiment ({Fθ}θ∈Θ, L). If δ ∈ D is such that

    sup_{θ∈Θ} R(θ, δ) ≤ sup_{θ∈Θ} R(θ, δ′) ∀δ′ ∈ D,

then δ is called a minimax decision rule.

A minimax rule δ satisfies sup_{θ∈Θ} R(θ, δ) = inf_{κ∈D} sup_{θ∈Θ} R(θ, κ).
In the minimax setup, a rule is preferable to another if it has smaller maximum risk.


A few comments on minimaxity:

Motivated as follows: we do not know anything about θ, so let us insure ourselves against the worst thing that can happen.

Makes sense if you are in a zero-sum game: if your opponent chooses θ to maximize L, then one should look for minimax rules. But is nature really an opponent?

If there is no reason to believe that nature is trying to “do her worst”, then the minimax principle is overly conservative: it places all its emphasis on the “bad θ”.

Minimax rules may not be unique, and may not even be admissible. A minimax rule may very well dominate another minimax rule.
A unique minimax rule is (obviously) admissible.

Minimaxity can lead to counterintuitive results. A rule may dominate another rule except on a small region of Θ, where the other rule achieves a smaller supremum risk.

[Figures: risk curves over θ ∈ [−3, 3] illustrating an inadmissible minimax rule and a counterintuitive minimax rule.]


Bayes Decision Rules

Wanted to compare decision procedures using global rather than local criteria (with respect to θ).
We arrived at the minimax principle by assuming we have no idea about the true value of θ.
Suppose we have some prior belief about the value of θ. How can this be factored into our risk-based considerations?
Rather than look at risk at every θ ↔ concentrate on average risk.

Definition (Bayes Risk)
Let π(θ) be a probability density (frequency) on Θ and let δ be a decision rule for the experiment ({Fθ}θ∈Θ, L). The π-Bayes risk of δ is defined as

    r(π, δ) = ∫_Θ R(θ, δ) π(θ) dθ = ∫_Θ ∫_X L(θ, δ(x)) Fθ[dx] π(θ) dθ.

The prior π(θ) places different emphasis on different values of θ, based on our prior belief/knowledge.

Bayes principle: a decision rule is preferable to another if it has smaller Bayes risk (which depends on the prior π(θ)!).

Definition (Bayes Decision Rule)
Let D be a class of decision rules for an experiment ({Fθ}θ∈Θ, L) and let π(·) be a probability density (frequency) on Θ. If δ ∈ D is such that

    r(π, δ) ≤ r(π, δ′) ∀δ′ ∈ D,

then δ is called a Bayes decision rule with respect to π.

The minimax principle aims to minimize the maximum risk.
The Bayes principle aims to minimize the average risk.
Sometimes no Bayes rule exists, because the infimum may not be attained for any δ ∈ D. However, in such cases ∀ε > 0 ∃δε ∈ D : r(π, δε) < inf_{δ∈D} r(π, δ) + ε.
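As a sketch of how a Bayes risk comparison looks in practice (the Binomial experiment and uniform prior are illustrative assumptions, not an example from the slides), r(π, δ) can be approximated by a midpoint rule over Θ = (0, 1). Under quadratic loss and the uniform prior, the posterior mean (X+1)/(n+2) should come out with a smaller Bayes risk than the MLE X/n.

```python
import math

def risk_sq(n, theta, rule):
    # frequentist risk under quadratic loss, X ~ Binomial(n, theta)
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x) * (rule(x) - theta)**2
               for x in range(n + 1))

def bayes_risk(n, rule, grid=2000):
    # r(pi, delta) = integral of R(theta, delta) * pi(theta) over Theta,
    # with pi = Uniform(0, 1), approximated by the midpoint rule
    return sum(risk_sq(n, (i + 0.5) / grid, rule) for i in range(grid)) / grid

n = 10
mle = lambda x: x / n
post_mean = lambda x: (x + 1) / (n + 2)  # posterior mean under the uniform prior

r_mle, r_pm = bayes_risk(n, mle), bayes_risk(n, post_mean)
```

For this setup the exact values are r(π, MLE) = 1/(6n) and r(π, posterior mean) = (n/6 + 1/3)/(n+2)², so the ordering can be checked against closed forms.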

Admissibility of Bayes Rules

Rule of thumb: Bayes rules are nearly always admissible.

Theorem (Discrete Case Admissibility)
Assume that Θ = {θ1, ..., θt} is a finite space and that the prior satisfies π(θi) > 0, i = 1, ..., t. Then a Bayes rule with respect to π is admissible.

Proof.
Let δ be a Bayes rule, and suppose that κ strictly dominates δ. Then

    R(θj, κ) ≤ R(θj, δ) ∀j
    ⇒ R(θj, κ)π(θj) ≤ R(θj, δ)π(θj) ∀j
    ⇒ Σ_j R(θj, κ)π(θj) < Σ_j R(θj, δ)π(θj),

which is a contradiction (the strict inequality follows by strict domination and the fact that π(θj) is always positive).

Theorem (Uniqueness and Admissibility)
If a Bayes rule is unique, it is admissible.

Proof.
Suppose that δ is a unique Bayes rule and assume that κ strictly dominates it. Then

    ∫_Θ R(θ, κ)π(θ)dθ ≤ ∫_Θ R(θ, δ)π(θ)dθ,

as a result of strict domination and π(θ) being non-negative. This implies that κ either improves upon δ, or κ is itself a Bayes rule. Either possibility contradicts our assumptions.


Theorem (Continuous Case Admissibility)
Let Θ ⊂ R^d. Assume that the risk functions R(θ, δ) are continuous in θ for all decision rules δ ∈ D. Suppose that π places positive mass on any open subset of Θ. Then a Bayes rule with respect to π is admissible.

Proof.
Let δ be a Bayes rule with respect to π, and let κ be a decision rule that strictly dominates δ. Let Θ0 be the set on which R(θ, κ) < R(θ, δ). Given a θ0 ∈ Θ0, we have R(θ0, κ) < R(θ0, δ). By continuity, there must exist an ε > 0 such that R(θ, κ) < R(θ, δ) for all θ satisfying ‖θ − θ0‖ < ε. It follows that Θ0 is open and hence, by our assumption, π[Θ0] > 0. Therefore, it must be that

    ∫_{Θ0} R(θ, κ)π(θ)dθ < ∫_{Θ0} R(θ, δ)π(θ)dθ.

Observe now that

    r(π, κ) = ∫_Θ R(θ, κ)π(θ)dθ
            = ∫_{Θ0} R(θ, κ)π(θ)dθ + ∫_{Θ0^c} R(θ, κ)π(θ)dθ
            < ∫_{Θ0} R(θ, δ)π(θ)dθ + ∫_{Θ0^c} R(θ, δ)π(θ)dθ
            = r(π, δ),

since ∫_{Θ0^c} R(θ, κ)π(θ)dθ ≤ ∫_{Θ0^c} R(θ, δ)π(θ)dθ, while we have strict inequality on Θ0. This contradicts our assumption that δ is a Bayes rule.

The continuity assumption and the assumption on π ensure that Θ0 is not an isolated set and has positive measure, so that it “contributes” to the integral.

Randomised Decision Rules

Given
decision rules δ1, ..., δk,
probabilities pi ≥ 0, Σ_{i=1}^k pi = 1,
we may define a new decision rule

    δ∗ = Σ_{i=1}^k pi δi,

called a randomised decision rule. Interpretation:

Given data X, choose a δi randomly according to p, but independently of X. If δj is the outcome (1 ≤ j ≤ k), then take action δj(X).

→ Risk of δ∗ is the average risk: R(θ, δ∗) = Σ_{i=1}^k pi R(θ, δi)
Appears artificial, but often minimax rules are randomised.
There are examples of randomised rules with supθ R(θ, δ∗) < supθ R(θ, δi) ∀i.

Minimum Variance Unbiased Estimation

Statistical Theory

Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 Optimality in the Decision Theory Framework
2 Uniform Optimality in Unbiased Quadratic Estimation
3 The role of sufficiency and “Rao-Blackwellization”
4 The role of completeness in Uniform Optimality
5 Lower Bounds for the Risk and Achieving them

Statistical Theory () MVUE 1 / 24

Decision Theory Framework

The decision theory framework includes:
A family of distributions F, usually assumed to admit densities (frequencies), and a parameter space Θ ⊆ R^p which parametrizes the family, F = {Fθ}θ∈Θ.
A data space X, on which the parametric family is supported.
An action space A, which represents the space of possible actions available to the statistician. In point estimation, Θ ≡ A.
A loss function L : Θ × A → R+. This represents the loss incurred when estimating θ ∈ Θ by α ∈ A.
A set D of decision rules. Any δ ∈ D is a (measurable) function δ : X → A. In point estimation, decision rules are simply estimators.

Performance of decision rules was to be judged by the risk they induce:

    R(θ, δ) = Eθ[L(θ, δ(X))], θ ∈ Θ, X ∼ Fθ, δ ∈ D.

Optimality in Point Estimation

Saw how point estimation can be seen as a game: Nature VS Statistician. An optimal decision rule would be one that uniformly minimizes risk:

    R(θ, δOPTIMAL) ≤ R(θ, δ), ∀θ ∈ Θ & ∀δ ∈ D.

But such rules can very rarely be determined.
→ optimality becomes a vague concept
→ it can be made precise in many ways...

Avenues to studying optimal decision rules include:
Restricting attention to global rather than local risk criteria → Bayes and minimax risk.
Focusing on restricted classes of rules → e.g. Minimum Variance Unbiased Estimation.
Studying risk behaviour asymptotically (n → ∞) → e.g. Asymptotic Relative Efficiency.


Unbiased Estimators under Quadratic Loss

Focus on Point Estimation
1 Assume that Fθ is known up to the parameter θ, which is unknown
2 Let (x1, ..., xn) be a realization of X ∼ Fθ which is available to us
3 Estimate the value of θ that generated the sample, given (x1, ..., xn)

Focus on Quadratic Loss
The error incurred when estimating θ by θ̂ = δ(X) is

    L(θ, θ̂) = ‖θ − θ̂‖²,

giving the MSE as risk: R(θ, θ̂) = Eθ‖θ − θ̂‖² = Variance + Bias².

RESTRICT the class of estimators (= decision rules): consider ONLY unbiased estimators,

    D := {δ : X → Θ | Eθ[δ(X)] = θ}.

Comments on Unbiasedness

The unbiasedness requirement is one means of reducing the class of rules/estimators we are considering.
→ Other requirements could be invariance or equivariance, e.g. δ(X + c) = δ(X) + c.
The risk reduces to the variance, since the bias is zero.
Not necessarily a sensible requirement → e.g. it violates the “likelihood principle”.
Unbiased estimators may not exist in a particular problem.
Unbiased estimators may be silly for a particular problem.
However, unbiasedness can be a reasonable/natural requirement in a wide class of point estimation problems.
Unbiasedness can be defined for more general loss functions, but is not as conceptually clear (and with tractable theory) as for quadratic loss.
→ δ is unbiased under L if Eθ[L(θ′, δ)] ≥ Eθ[L(θ, δ)] ∀θ, θ′ ∈ Θ.
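The Variance + Bias² decomposition of the MSE can be checked by direct enumeration; the Binomial setup and the shrinkage rule below are illustrative assumptions, not from the slides.

```python
import math

def mse_parts(n, theta, rule):
    # returns (MSE, variance, squared bias) for X ~ Binomial(n, theta)
    pmf = [math.comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]
    mean = sum(p * rule(x) for x, p in enumerate(pmf))
    var = sum(p * (rule(x) - mean)**2 for x, p in enumerate(pmf))
    mse = sum(p * (rule(x) - theta)**2 for x, p in enumerate(pmf))
    return mse, var, (mean - theta)**2

n = 20
# a deliberately biased rule, so both terms of the decomposition are visible
mse, var, bias2 = mse_parts(n, 0.3, lambda x: (x + 1) / (n + 2))
```

For an unbiased rule the bias term vanishes and the risk is exactly the variance, which is why restricting to unbiased estimators turns risk comparison into variance comparison.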

Comments on Unbiased Estimators

Example (Unbiased Estimators Need not Exist)
Let X ∼ Binomial(n, θ), with θ unknown but n known. We wish to estimate

    ψ = sin θ.

We require that our estimator δ(X) be unbiased, Eθ[δ] = ψ = sin θ. Such an estimator satisfies

    Σ_{x=0}^n δ(x) C(n, x) θ^x (1 − θ)^{n−x} = sin θ,

but this cannot hold for all θ, since the sine function cannot be represented as a finite polynomial. The class of unbiased estimators in this case is empty.

Example (Unbiased Estimators May Be “Silly”)
Let X ∼ Poisson(λ). We wish to estimate the parameter

    ψ = e^{−2λ}.

If δ(X) is an unbiased estimator of ψ, then

    Σ_{x=0}^∞ δ(x) e^{−λ} λ^x/x! = e^{−2λ}
    ⇒ Σ_{x=0}^∞ δ(x) λ^x/x! = e^{−λ} = Σ_{x=0}^∞ (−1)^x λ^x/x!,

so that δ(X) = (−1)^X is the only unbiased estimator of ψ. But 0 < ψ < 1 for λ > 0, so this is clearly a silly estimator.
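The Poisson example can also be checked numerically: truncating the Poisson series far out in the tail, the estimator (−1)^X has expectation e^{−2λ}, so it really is unbiased for ψ, despite only ever taking the values ±1 (the λ values used below are illustrative).

```python
import math

def expect_sign(lam, terms=120):
    # E_lambda[(-1)^X] for X ~ Poisson(lam); the series is truncated at `terms`,
    # far beyond where the Poisson mass is negligible for small lam
    return sum((-1)**x * math.exp(-lam) * lam**x / math.factorial(x)
               for x in range(terms))
```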


Example (A Non-Trivial Example)
Let X1, ..., Xn be iid random variables with density

    f(x; µ) = e^{−(x−µ)}, x ≥ µ ∈ R.

Two possible unbiased estimators are

    µ̂ = X(1) − 1/n  &  µ̃ = X̄ − 1.

In fact, tµ̂ + (1 − t)µ̃ is unbiased for any t. Simple calculations reveal

    R(µ, µ̂) = Var(µ̂) = 1/n²  &  R(µ, µ̃) = Var(µ̃) = 1/n,

so that µ̂ dominates µ̃. Will it dominate any other unbiased estimator?

Unbiased Estimation and Sufficiency

Theorem (Rao-Blackwell Theorem)
Let X be distributed according to a distribution depending on an unknown parameter θ and let T be a sufficient statistic for θ. Let δ be a decision rule such that
1 Eθ[δ(X)] = g(θ) for all θ
2 Varθ(δ(X)) < ∞ for all θ.
Then δ∗ := E[δ | T] is an unbiased estimator of g(θ) that dominates δ, i.e.
1 Eθ[δ∗(X)] = g(θ) for all θ.
2 Varθ(δ∗(X)) ≤ Varθ(δ(X)) for all θ.
Moreover, the inequality is replaced by equality if and only if Pθ[δ∗ = δ] = 1.

The theorem indicates that any candidate minimum variance unbiased estimator should be a function of the sufficient statistic.

(Note that µ̂ depends only on the one-dimensional sufficient statistic X(1).)

Intuitively, an estimator that takes into account aspects of the sample that are irrelevant with respect to θ can always be improved (throwing away some irrelevant information improves risk).

Proof.
Since T is sufficient for θ, E[δ | T = t] = h(t) is independent of θ, so that δ∗ is well-defined as a statistic (it depends only on X). Then,

    Eθ[δ∗(X)] = Eθ[E[δ(X) | T(X)]] = Eθ[δ(X)] = g(θ).

Furthermore, we have

    Varθ(δ) = Varθ[E(δ | T)] + Eθ[Var(δ | T)] ≥ Varθ[E(δ | T)] = Varθ(δ∗).

In addition, note that

    Var(δ | T) := E[(δ − E[δ | T])² | T] = E[(δ − δ∗)² | T],

so that Eθ[Var(δ | T)] = Eθ(δ − δ∗)² > 0 unless Pθ(δ∗ = δ) = 1.

Exercise
Show that Var(Y) = E[Var(Y | X)] + Var[E(Y | X)] when Var(Y) < ∞.

Unbiasedness and Sufficiency

Any admissible unbiased estimator should be a function of a sufficient statistic.
→ If not, we can dominate it by its conditional expectation given a sufficient statistic.
But is any function of a sufficient statistic admissible (provided that it is unbiased)?

Suppose that δ is an unbiased estimator of g(θ) and T, S are θ-sufficient. What is the relationship between Varθ(E[δ | T]) =: Varθ(δ∗_T) and Varθ(E[δ | S]) =: Varθ(δ∗_S)?

Intuition suggests that whichever of T, S carries the least irrelevant information (in addition to the relevant information) should “win”.
→ More formally, if T = h(S) then we should expect that δ∗_T dominates δ∗_S.

Proposition
Let δ be an unbiased estimator of g(θ) and, for T, S two θ-sufficient statistics, define δ∗_T := E[δ | T] & δ∗_S := E[δ | S]. Then the following implication holds:

    T = h(S) ⇒ Varθ(δ∗_T) ≤ Varθ(δ∗_S).

1 Essentially this means that the best possible “Rao-Blackwellization” is achieved by conditioning on a minimal sufficient statistic.
2 This does not necessarily imply that for T minimally sufficient and δ unbiased, E[δ | T] has minimum variance.
→ In fact, it does not even imply that E[δ | T] is admissible.

Proof.
Recall the tower property of conditional expectation: if Y = f(X), then E[Z | Y] = E{E(Z | X) | Y}. Since T = h(S) we have

    δ∗_T = E[δ | T] = E[E(δ | S) | T] = E[δ∗_S | T].

The conclusion now follows from the Rao-Blackwell theorem.

A mathematical remark
To better understand the tower property intuitively, recall that E[Z | Y] is the minimizer of E[(Z − φ(Y))²] over all (measurable) functions φ of Y. You can combine that with the fact that √(E[(X − Y)²]) defines a Hilbert norm on random variables with finite variance to get geometric intuition.
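Rao-Blackwellization can be watched in action by full enumeration for small Bernoulli samples (n = 5 and θ = 0.3 are illustrative choices): δ(X) = X1 is unbiased for θ but wasteful, and conditioning on the sufficient statistic T = ΣXi gives E[X1 | T] = T/n by symmetry.

```python
import itertools

n, theta = 5, 0.3
samples = list(itertools.product([0, 1], repeat=n))
prob = lambda x: theta**sum(x) * (1 - theta)**(n - sum(x))

delta = lambda x: x[0]             # unbiased, but uses only the first observation
delta_star = lambda x: sum(x) / n  # E[delta | T] with T = sum(X), by symmetry

def mean_var(rule):
    m = sum(rule(x) * prob(x) for x in samples)
    v = sum((rule(x) - m)**2 * prob(x) for x in samples)
    return m, v

m0, v0 = mean_var(delta)       # (theta, theta*(1-theta))
m1, v1 = mean_var(delta_star)  # (theta, theta*(1-theta)/n)
```

Both rules are unbiased, but conditioning cuts the variance by the factor n, as the theorem guarantees.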

Completeness, Sufficiency, Unbiasedness, and Optimality

Theorem (Lehmann-Scheffé Theorem)
Let T be a complete sufficient statistic for θ and let δ be a statistic such that Eθ[δ] = g(θ) and Varθ(δ) < ∞, ∀θ ∈ Θ. If δ∗ := E[δ | T] and V is any other unbiased estimator of g(θ), then
1 Varθ(δ∗) ≤ Varθ(V), ∀θ ∈ Θ
2 Varθ(δ∗) = Varθ(V) ⇒ Pθ[δ∗ = V] = 1.
That is, δ∗ := E[δ | T] is the unique Uniformly Minimum Variance Unbiased Estimator of g(θ).

The theorem says that if a complete sufficient statistic T exists, then the MVUE of g(θ) (if it exists) must be a function of T.
Moreover, it establishes that whenever a UMVUE exists, it is unique.
It can be used to examine whether unbiased estimators exist at all: if a complete sufficient statistic T exists, but there exists no function h with E[h(T)] = g(θ), then no unbiased estimator of g(θ) exists.

Proof.
To prove (1) we go through the following steps:
Take V to be any unbiased estimator with finite variance.
Define its “Rao-Blackwellized” version V∗ := E[V | T].
By unbiasedness of both estimators,

    0 = Eθ[V∗ − δ∗] = Eθ[E[V | T] − E[δ | T]] = Eθ[h(T)], ∀θ ∈ Θ.

By completeness of T we conclude Pθ[h(T) = 0] = 1 for all θ. In other words, Pθ[V∗ = δ∗] = 1 for all θ.
But V∗ dominates V by the Rao-Blackwell theorem.
Hence Varθ(δ∗) = Varθ(V∗) ≤ Varθ(V).

For part (2) (the uniqueness part), notice that from our reasoning above

    Varθ(V) = Varθ(δ∗) ⇒ Varθ(V) = Varθ(V∗).

But the Rao-Blackwell theorem says Varθ(V) = Varθ(V∗) ⟺ Pθ[V = V∗] = 1.

Statistical Theory () MVUE 15 / 24 Statistical Theory () MVUE 16 / 24 Completeness, Sufficiency, Unbiasedness, and Optimality Example (Bernoulli Trials) First suppose that n = 2. If a UMVUE exists, it must be of the form h(T ) Taken together, the Rao-Blackwell and Lehmann-Scheff´etheorems also with h satisfying suggest approaches to finding UMVUE estimators when a complete 2 2 2 k 2 k sufficient statistic T exists: θ = h(k) θ (1 θ) − k − 1 Find a function h such that Eθ[h(T )] = g(θ). If Varθ[h(T )] < for k=0   ∞ X all θ, then δ = h(T ) is the unique UMVUE of g(θ). It is easy to see that h(0) = h(1) = 0 while h(2) = 1. Thus, for n = 2, , The function h can be found by solving the equation Eθ[h(T )] = g(θ) 2 → h(T ) = T (T 1)/2 is the unique UMVUE of θ . or by an educated guess. − For n > 2, set δ = 1 X1 + X2 = 2 and note that this is an unbiased 2 Given an unbiased estimator δ of g(θ), we may obtain the UMVUE by 2 { } estimator of θ . By the Lehmann-Scheff´etheorem, δ∗ = E[δ T ] is the | “Rao-Blackwellizing” with respect to the complete sufficient statistic: unique UMVUE estimator of θ2. We have Example (Bernoulli Trials) E[S T = t] = P[X1 + X2 = 2 T = t] | | Let X , ..., X iidBernoulli(θ). What is the UMVUE of θ2? Pθ[X1 + X2 = 2, X3 + ... + Xn = t 2] 1 n ∼ = − Pθ[T = t] By the Neyman factorization theorem T = X1 + ... + Xn is sufficient, 0 if t 1 t(t 1) Since the distribution of (X1, ..., Xn) is a 1-parameter exponential = ≤ = − . n 2 n n(n 1) family, T is also complete. ( t−2 / t if t 2) − ≥ −

Statistical Theory () MVUE 17 / 24 Statistical Theory ()  MVUE 18 / 24 Variance Lower Bounds for Unbiased Estimators Cuachy-Schwarz Bounds

Variance Lower Bounds for Unbiased Estimators

Often a minimal sufficient statistic exists but is not complete.
→ Cannot appeal to the Lehmann-Scheffé theorem in search of a UMVUE.
However, if we could establish a lower bound for the variance as a function of θ, then an estimator achieving this bound would be the unique UMVUE.

The Aim
For iid X1, ..., Xn with density (frequency) depending on θ unknown, we want to establish conditions under which

    Varθ[δ] ≥ φ(θ), ∀θ,

for any unbiased estimator δ. We also wish to determine φ(θ).

Cauchy-Schwarz Bounds

Theorem (Cauchy-Schwarz Inequality)
Let U, V be random variables with finite variance. Then,

    Cov(U, V) ≤ √(Var(U) Var(V)).

The theorem yields an immediate lower bound for the variance of an unbiased estimator δ0:

    Varθ(δ0) ≥ Covθ(δ0, U)² / Varθ(U),

which is valid for any random variable U with Varθ(U) < ∞ for all θ.

The bound can be made tight by choosing a suitable U. However, this is still not very useful as it falls short of our aim: the bound is specific to δ0, while we want a bound that holds for any unbiased estimator δ. Is there a smart choice of U for which Covθ(δ0, U) depends on g(θ) = Eθ(δ0) only (and so is not specific to δ0)? Let's take a closer look at this...

Optimizing the Cauchy-Schwarz Bound

Assume that θ is real and the following regularity conditions hold:

Regularity Conditions
(C1) The support A := {x : f(x; θ) > 0} is independent of θ
(C2) f(x; θ) is differentiable w.r.t. θ, ∀θ ∈ Θ
(C3) Eθ[∂/∂θ log f(X; θ)] = 0
(C4) For a statistic T = T(X) with Eθ|T| < ∞ and g(θ) = EθT differentiable,

    g′(θ) = Eθ[T · ∂/∂θ log f(X; θ)], ∀θ.

To make sense of (C3) and (C4), suppose that f(·; θ) is a density. Then

    d/dθ ∫ S(x) f(x; θ) dx = ∫ S(x) (d/dθ) f(x; θ) dx = ∫ S(x) f(x; θ) (d/dθ) log f(x; θ) dx,

provided integration/differentiation can be interchanged.

The Cramér-Rao Lower Bound

Theorem
Let X = (X1, ..., Xn) have joint density (frequency) f(x; θ) satisfying conditions (C1), (C2) and (C3). If the statistic T satisfies condition (C4), then

    Varθ(T) ≥ [g′(θ)]² / I(θ),

with I(θ) = Eθ[ (∂/∂θ log f(X; θ))² ].

Proof.
By the Cauchy-Schwarz inequality with U = ∂/∂θ log f(X; θ),

    Varθ(T) ≥ Covθ(T, ∂/∂θ log f(X; θ))² / Varθ(∂/∂θ log f(X; θ)).

Since Eθ[∂/∂θ log f(X; θ)] = 0, we have Varθ(∂/∂θ log f(X; θ)) = I(θ).

Also, observe that

    Covθ(T, ∂/∂θ log f(X; θ)) = Eθ[T · ∂/∂θ log f(X; θ)] − Eθ[T] Eθ[∂/∂θ log f(X; θ)]
                              = Eθ[T · ∂/∂θ log f(X; θ)]
                              = (d/dθ) Eθ[T]
                              = g′(θ),

which completes the proof.

When is the Cramér-Rao lower bound achieved? If

    Varθ[T] = [g′(θ)]² / I(θ),

then

    Varθ[T] = Covθ(T, ∂/∂θ log f(X; θ))² / Varθ(∂/∂θ log f(X; θ)),

which occurs if and only if ∂/∂θ log f(X; θ) is a linear function of T (correlation ±1). That is, w.p. 1:

    ∂/∂θ log f(X; θ) = A(θ) T(x) + B(θ).

Solving this differential equation yields, for all x,

    log f(x; θ) = A∗(θ) T(x) + B∗(θ) + S(x),

so that Varθ(T) attains the lower bound if and only if the density (frequency) of X is a one-parameter exponential family as above.
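As a numerical footnote (continuing the earlier exponential example; the variance formula is re-derived from Gamma moments, not quoted from the slides): for estimating λ from n iid Exponential(λ) observations, the per-observation Fisher information is 1/λ², so the Cramér-Rao bound is λ²/n. The unbiased estimator (n−1)/(nX̄) has variance λ²/(n−2), which exceeds the bound for every finite n but approaches it as n grows.

```python
def crlb(n, lam):
    # Cramer-Rao lower bound for unbiased estimators of lam from n iid Exp(lam) draws
    return lam**2 / n

def var_unbiased(n, lam):
    # exact variance of the unbiased estimator (n-1)/(n*Xbar), valid for n > 2
    return lam**2 / (n - 2)

ratios = {n: var_unbiased(n, 1.0) / crlb(n, 1.0) for n in (3, 10, 100)}
```

Note that the score here is linear in ΣXi, so by the attainment criterion the bound is achieved exactly when estimating g(λ) = 1/λ by X̄, but not for λ itself.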

Testing Statistical Hypotheses

Statistical Theory

Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 Contrasting Theories With Experimental Evidence
2 Hypothesis Testing Setup
3 Type I vs Type II Error
4 The Neyman-Pearson Setup
5 Optimality in the Neyman-Pearson Setup

Statistical Theory (Week 10) Hypothesis Testing 1 / 19

Using Data to Evaluate Theories/Assertions

Scientific theories lead to assertions that are testable using empirical data.
Data may discredit the theory (call it a hypothesis) or not
→ i.e. are the empirical findings reasonable under the hypothesis?
Example: Large Hadron Collider at CERN, Genève. Does the Higgs boson exist? Study if particle trajectories are consistent with what theory predicts.
Example: the theory of “luminiferous aether” in the late 19th century, to explain light travelling in vacuum. Discredited by the Michelson-Morley experiment.
Similarities with the logical/mathematical concept of a necessary condition.
Formal statistical framework?

Statistical Framework for Testing Hypotheses

The Problem of Hypothesis Testing
X = (X1, ..., Xn) random variables with joint density/frequency f(x; θ), θ ∈ Θ, where Θ = Θ0 ∪ Θ1 and Θ0 ∩ Θ1 = ∅ (or Λ(Θ0 ∩ Θ1) = 0)
Observe realization x = (x1, ..., xn) of X ∼ fθ
Decide on the basis of x whether θ ∈ Θ0 or θ ∈ Θ1
→ Often dim(Θ0) < dim(Θ), so θ ∈ Θ0 represents a simplified model.

Example
Let X1, ..., Xn ~ iid N(µ, 1) and Y1, ..., Yn ~ iid N(ν, 1). Have θ = (µ, ν) and

    Θ = {(µ, ν) : µ ∈ R, ν ∈ R} = R².

May be interested to see if X and Y have the same distribution, even though they may be measurements on characteristics of different groups. In this case Θ0 = {(µ, ν) ∈ R² : µ = ν}.

Decision Theory Perspective on Hypothesis Testing

Given X we need to decide between two hypotheses:
H0: θ ∈ Θ0 (the NULL HYPOTHESIS)
H1: θ ∈ Θ1 (the ALTERNATIVE HYPOTHESIS)

Want a decision rule δ : X → A = {0, 1} (chooses between H0 and H1).
In hypothesis testing, δ is called a test function.
Often δ depends on X only through some real-valued statistic T = T(X), called a test statistic.

It is unlikely that a test function is perfect. Possible errors to be made?

    Action \ Truth    H0              H1
    0                 correct         Type II Error
    1                 Type I Error    correct

Potential asymmetry of errors in practice: false positive VS false negative (e.g. spam filters for e-mail).

Typically the loss function is the “0-1” loss, i.e.

    L(θ, a) = 1 if θ ∈ Θ0 & a = 1 (Type I Error)
              1 if θ ∈ Θ1 & a = 0 (Type II Error)
              0 otherwise (No Error),

i.e. we lose 1 unit whenever committing a type I or type II error. This leads to the following risk function:

    R(θ, δ) = Pθ[δ = 1] if θ ∈ Θ0 (prob. of type I error)
              Pθ[δ = 0] if θ ∈ Θ1 (prob. of type II error).

In short,

    R(θ, δ) = Pθ[δ = 1] 1{θ ∈ Θ0} + Pθ[δ = 0] 1{θ ∈ Θ1}
            = “Pθ[choose H1 | H0 is true]” OR “Pθ[choose H0 | H1 is true]”.


Optimal Testing?

As with point estimation, we may wish to find optimal test functions
→ find test functions that uniformly minimize risk?
Possible to do in some problems, but intractable in general
→ as in point estimation, optimality becomes a vague concept.
How to relax the problem in this case? Minimize the type I and type II error probabilities separately?
In general there is a trade-off between the two error probabilities. For example, consider two test functions δ1 and δ2 and let

    R1 = {x : δ1(x) = 1}  &  R2 = {x : δ2(x) = 1}.

Assume that R1 ⊂ R2. Then, for all θ ∈ Θ,

    Pθ[δ1(X) = 1] ≤ Pθ[δ2(X) = 1]
    Pθ[δ1(X) = 0] = 1 − Pθ[δ1(X) = 1] ≥ 1 − Pθ[δ2(X) = 1] = Pθ[δ2(X) = 0],

so by attempting to reduce the probability of error when θ ∈ Θ0 we may increase the probability of error when θ ∈ Θ1!

The Neyman-Pearson Setup

Classical approach: restrict the class of test functions by “minimax reasoning”
1 We fix an α ∈ (0, 1), usually small (called the significance level)
2 We declare that we will only consider test functions δ such that

    Pθ[δ = 1] ≤ α ∀θ ∈ Θ0,  i.e.  sup_{θ∈Θ0} Pθ[δ = 1] ≤ α,

i.e. rules for which the probability of type I error is bounded above by α
→ Jargon: we fix a significance level for our test
3 Within this restricted class of rules, choose δ to minimize the probability of type II error, Pθ[δ(X) = 0] = 1 − Pθ[δ(X) = 1], uniformly on Θ1
4 Equivalently, maximize the power uniformly over Θ1:

    β(θ, δ) = Pθ[δ(X) = 1] = Eθ[1{δ(X) = 1}] = Eθ[δ(X)], θ ∈ Θ1

(since δ = 1 ⟺ 1{δ = 1} = 1 and δ = 0 ⟺ 1{δ = 1} = 0).

Intuitive rationale of the approach:
Want to test H0 against H1 at significance level α.
Suppose we observe δ(X) = 1 (so we take action 1).
α is usually small, so that if H0 is indeed true, we have observed something rare or unusual
→ since δ = 1 has probability at most α under H0.
This is evidence that H0 is false (i.e. in favour of H1), so taking action 1 is a highly reasonable decision.
But what if we observe δ(X) = 0? (so we take action 0)
Our significance level does not guarantee that our decision is necessarily reasonable.
Our decision would have been reasonable if δ was such that the type II error was also low (given the significance level).
If we had maximized the power β at level α, though, then we would be reassured of our decision.

The Neyman-Pearson setup naturally exploits any asymmetric structure.
But, if natural asymmetry is absent, we need a judicious choice of H0.
Example: Obama VS McCain 2008. Pollsters gather an iid sample X from Ohio with Xi = 1{vote Obama}. Which pair of hypotheses to test?

    H0: Obama wins Ohio     OR    H0: McCain wins Ohio
    H1: McCain wins Ohio          H1: Obama wins Ohio

Which pair to choose to make a prediction? (confidence intervals?)
If Obama is conducting the poll to decide whether he'll spend more money to campaign in Ohio, then his possible losses due to errors are:
(a) Spend more $'s to campaign in Ohio even though he would win anyway: lose $'s
(b) Lose Ohio to McCain because he thought he would win without any extra effort.
(b) is much worse than (a) (especially since Obama had lots of $'s). Hence Obama would pick H0 = {McCain wins Ohio} as his null.

Finding Good Test Functions

Consider the simplest situation:
Have (X1, ..., Xn) ∼ f(·; θ) with Θ = {θ0, θ1}.
Want to test H0 : θ = θ0 vs H1 : θ = θ1.

The Neyman-Pearson Lemma
Let X = (X1, ..., Xn) have joint density (frequency) function f ∈ {f0, f1} and suppose we wish to test

    H0 : f = f0 VS H1 : f = f1.

Then, the test whose test function is given by

    δ(X) = 1 if f1(X) ≥ k·f0(X),  0 otherwise,

for some k ∈ (0, ∞), is a most powerful (MP) test of H0 versus H1 at significance level α = P0[δ(X) = 1] (= E0[δ(X)] = P0[f1(X) ≥ k·f0(X)]).

Proof.
Use the obvious notation E0, E1, P0, P1 corresponding to H0 or H1. It suffices to prove that if ψ is any function with ψ(x) ∈ {0, 1}, then

    E0[ψ(X)] ≤ E0[δ(X)] (= α by definition)  ⇒  E1[ψ(X)] (= β1(ψ)) ≤ E1[δ(X)] (= β1(δ))

(recall that β1(δ) = 1 − P1[δ = 0] = P1[δ = 1] = E1[δ]). WLOG assume that f0 and f1 are density functions. Note that

    f1(x) − k·f0(x) ≥ 0 if δ(x) = 1  &  f1(x) − k·f0(x) < 0 if δ(x) = 0.

Since ψ can only take the values 0 or 1,

    ψ(x)(f1(x) − k·f0(x)) ≤ δ(x)(f1(x) − k·f0(x))
    ⇒ ∫_{R^n} ψ(x)(f1(x) − k·f0(x)) dx ≤ ∫_{R^n} δ(x)(f1(x) − k·f0(x)) dx.

Rearranging the terms yields

    ∫_{R^n} (ψ(x) − δ(x)) f1(x) dx ≤ k ∫_{R^n} (ψ(x) − δ(x)) f0(x) dx
    ⇒ E1[ψ(X)] − E1[δ(X)] ≤ k (E0[ψ(X)] − E0[δ(X)]).

So when E0[ψ(X)] ≤ E0[δ(X)], the RHS is non-positive, i.e. δ is an MP test of H0 vs H1 at level α.

Essentially the result says that the optimal test statistic for simple hypotheses vs simple alternatives is T(X) = f1(X)/f0(X).
The optimal test function would then reject the null whenever T ≥ k.
k is chosen so that the test has the desirable level α.
The result does not guarantee existence of an MP test.
The result does not guarantee uniqueness when an MP test exists.

The general version of the Neyman-Pearson lemma considers the relaxed problem:

    maximize E1[δ] subject to E0[δ] = α & 0 ≤ δ(X) ≤ 1 a.s.

It is then proven that an optimal δ exists and is given by

    δ(X) = 1 if f1(X) > k f0(X),
           c if f1(X) = k f0(X),
           0 if f1(X) < k f0(X),

where k and c ∈ [0, 1] are such that the conditions are satisfied.
The optimum need not be a test function (relaxation ≡ randomization).
→ But when the test statistic T = f1/f0 is a continuous RV, then δ can be taken to have range {0, 1}, i.e. be a test function.
→ In this case an MP test function of H0 : f = f0 against H1 : f = f1 exists for any significance level α > 0.
→ When T is discrete, the optimum need not be a test function for certain levels α, unless we consider randomized tests as well.

Example (Exponential Distribution)
Let X1, ..., Xn ~ iid Exp(λ) and λ ∈ {λ0, λ1}, with λ1 > λ0 (say). Consider

    H0 : λ = λ0 vs H1 : λ = λ1.

Have

    f(x; λ) = Π_{i=1}^n λ e^{−λxi} = λ^n e^{−λ Σ_{i=1}^n xi}.

So Neyman-Pearson says we must base our test on the statistic

    T = f(X; λ1)/f(X; λ0) = (λ1/λ0)^n exp( (λ0 − λ1) Σ_{i=1}^n Xi ),

rejecting the null if T ≥ k, for k such that the level is α.

Example (cont’d)
To determine k we note that T is a decreasing function of S = Σ_{i=1}^n Xi (since λ0 < λ1). Therefore

    T ≥ k ⟺ S ≤ K

for some K, so that

    α = Pλ0[T ≥ k] ⟺ α = Pλ0[ Σ_{i=1}^n Xi ≤ K ].

For given values of λ0 and α it is entirely feasible to find the appropriate K: under the null hypothesis, S has a gamma distribution with parameters n and λ0. Hence we reject H0 at level α if S falls below the α-quantile of a Gamma(n, λ0) distribution.

Example (Uniform Distribution)
Let X1, ..., Xn ~ iid U[0, θ] with θ ∈ {θ0, θ1}, where θ0 > θ1. Consider

    H0 : θ = θ0 vs H1 : θ = θ1.

Recall that

    f(x; θ) = (1/θ^n) 1{ max_{1≤i≤n} Xi ≤ θ },

so an MP test of H0 vs H1 should be based on the discrete test statistic

    T = f(X; θ1)/f(X; θ0) = (θ0/θ1)^n 1{ X(n) ≤ θ1 }.

So if the test rejects H0 when X(n) ≤ θ1, then it is MP for H0 vs H1 at level

    α = Pθ0[X(n) ≤ θ1] = (θ1/θ0)^n,

with power Pθ1[X(n) ≤ θ1] = 1. What about smaller values of α?

Example (cont’d)
→ What about finding an MP test for α < (θ1/θ0)^n?
An intuitive test statistic is the sufficient statistic X(n), giving the test

    reject H0 iff X(n) ≤ k,

with k solving the equation

    Pθ0[X(n) ≤ k] = (k/θ0)^n = α,

i.e. with k = θ0 α^{1/n}, with power

    Pθ1[X(n) ≤ θ0 α^{1/n}] = (θ0 α^{1/n}/θ1)^n = α (θ0/θ1)^n.

Is this the MP test at level α < (θ1/θ0)^n though?
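The critical value K in the exponential example can be computed without special libraries: for integer n, the Gamma(n, λ0) (Erlang) CDF is a finite sum, and K solves Pλ0[S ≤ K] = α by bisection. The particular numbers n = 10, λ0 = 1, λ1 = 2, α = 0.05 below are illustrative assumptions.

```python
import math

def erlang_cdf(s, n, lam):
    # P[S <= s] for S ~ Gamma(shape n, rate lam), n a positive integer
    return 1.0 - sum(math.exp(-lam * s) * (lam * s)**k / math.factorial(k)
                     for k in range(n))

def critical_value(n, lam0, alpha, iters=200):
    # bisection for K with P_{lam0}[S <= K] = alpha; the MP test rejects when S <= K
    lo, hi = 0.0, 100.0 * n / lam0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n, lam0) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, lam0, lam1, alpha = 10, 1.0, 2.0, 0.05
K = critical_value(n, lam0, alpha)
power = erlang_cdf(K, n, lam1)  # P_{lam1}[S <= K], the power at the alternative
```

Since λ1 > λ0 makes S stochastically smaller under H1, the power exceeds the level, as it should for a reasonable test.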

Example (cont'd)
What about finding an MP test for α < (θ1/θ0)^n? An intuitive test statistic is the sufficient statistic X_(n), giving the test

reject H0 iff X_(n) ≤ k,

with k solving the equation

P_{θ0}[X_(n) ≤ k] = (k/θ0)^n = α,

i.e. with k = θ0 α^{1/n}, with power

P_{θ1}[X_(n) ≤ θ0 α^{1/n}] = (θ0 α^{1/n}/θ1)^n = α (θ0/θ1)^n.

Is this the MP test at level α < (θ1/θ0)^n, though?

Example (cont'd)
Use the general form of the Neyman-Pearson lemma to solve the relaxed problem:

maximize E1[δ(X)] subject to E_{θ0}[δ(X)] = α < (θ1/θ0)^n and 0 ≤ δ(x) ≤ 1.

One solution to this problem is given by

δ(X) = α(θ0/θ1)^n if X_(n) ≤ θ1, and 0 otherwise,

which is not a test function. However, we see that its power is

E_{θ1}[δ(X)] = α(θ0/θ1)^n = P_{θ1}[X_(n) ≤ θ0 α^{1/n}],

which is the power of the test we proposed. Hence the test that rejects H0 if X_(n) ≤ θ0 α^{1/n} is an MP test for all levels α < (θ1/θ0)^n.
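The closed-form size and power of the uniform test can be verified directly; the snippet below is an illustrative check (parameter values are hypothetical, chosen so that α < (θ1/θ0)^n):

```python
# Closed-form check of the uniform example: for the test rejecting when
# X_(n) <= theta0 * alpha**(1/n), the size is exactly alpha and the power is
# alpha * (theta0/theta1)**n, matching the randomized solution of the relaxed problem.
n, theta0, theta1, alpha = 5, 1.0, 0.8, 0.01   # alpha < (theta1/theta0)**n ~ 0.328

k = theta0 * alpha ** (1 / n)   # critical value; here k < theta1, as required
size = (k / theta0) ** n        # P_theta0[X_(n) <= k] = (k/theta0)^n
power = (k / theta1) ** n       # P_theta1[X_(n) <= k] = (k/theta1)^n
```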

Statistical Theory (Week 11) — Hypothesis Testing

Testing Statistical Hypotheses II
Statistical Theory
Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 Contrasting Theories With Experimental Evidence
2 Hypothesis Testing Setup
3 Type I vs Type II Error
4 The Neyman-Pearson Setup
5 Optimality in the Neyman-Pearson Setup

Neyman-Pearson Framework for Testing Hypotheses

The Problem of Hypothesis Testing
- X = (X1, ..., Xn) random variables with joint density/frequency f(x; θ), θ ∈ Θ, where Θ = Θ0 ∪ Θ1 and Θ0 ∩ Θ1 = ∅.
- Observe a realization x = (x1, ..., xn) of X ∼ f_θ.
- Decide on the basis of x whether θ ∈ Θ0 (H0) or θ ∈ Θ1 (H1).

Neyman-Pearson Framework:
1 Fix a significance level α for the test.
2 Among all rules respecting the significance level, pick the one that uniformly maximizes power.

When H0/H1 are both simple, the Neyman-Pearson lemma settles the problem. What about more general structures of Θ0, Θ1?

Uniformly Most Powerful Tests

A uniformly most powerful (UMP) test of H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ1 at level α:
1 Respects the level for all θ ∈ Θ0 (i.e. for all possible simple nulls):

E_θ[δ] ≤ α for all θ ∈ Θ0.

2 Is most powerful for all θ ∈ Θ1 (i.e. for all possible simple alternatives):

E_θ[δ] ≥ E_θ[δ′] for all θ ∈ Θ1 and all δ′ respecting level α.

Unfortunately, UMP tests rarely exist. Why? Consider H0 : θ = θ0 vs H1 : θ ≠ θ0. A UMP test must be an MP test for any θ1 ≠ θ0. But the form of the MP test typically differs for θ1 > θ0 and θ1 < θ0!

Example (No UMP test exists)
Let X ∼ Binom(n, θ) and suppose we want to test

H0 : θ = θ0 vs H1 : θ ≠ θ0

at some level α. To this aim, consider first

H0′ : θ = θ0 vs H1′ : θ = θ1.

The Neyman-Pearson lemma gives the test statistic

T = f(X; θ1)/f(X; θ0) = [(1 − θ1)/(1 − θ0)]^n · [θ1(1 − θ0)/(θ0(1 − θ1))]^X.

- If θ1 > θ0 then T is increasing in X, so the MP test would reject for large values of X.
- If θ1 < θ0 then T is decreasing in X, so the MP test would reject for small values of X.

Example (A UMP test exists)
Let X1, ..., Xn iid Exp(λ) and suppose we wish to test

H0 : λ ≤ λ0 vs H1 : λ > λ0

at some level α. To this aim, consider first the pair

H0′ : λ = λ0 vs H1′ : λ = λ1

with λ1 > λ0, which we saw last time to admit an MP test for all λ1 > λ0:

Reject H0 for ∑_{i=1}^n X_i ≤ k, with k such that P_{λ0}[∑_{i=1}^n X_i ≤ k] = α.

But for λ < λ0, P_{λ0}[∑_{i=1}^n X_i ≤ k] = α implies P_λ[∑_{i=1}^n X_i ≤ k] < α. So the same test respects level α for all singletons under the null: the test is UMP for H0 vs H1.

When do UMP tests exist?

The examples give insight on which composite pairs typically admit UMP tests:
1 The hypothesis pair concerns a single real-valued parameter.
2 The hypothesis pair is "one-sided".

However, the existence of a UMP test does not depend only on the hypothesis structure, as was the case with simple vs simple; it also depends on the specific model. A sufficient condition?

Definition (Monotone Likelihood Ratio Property)
A family of density (frequency) functions {f(x; θ) : θ ∈ Θ}, with Θ ⊆ R, is said to have monotone likelihood ratio if there exists a real-valued function T(x) such that, for any θ1 < θ2, the function

f(x; θ2)/f(x; θ1)

is a non-decreasing function of T(x).

Proposition
Let X = (X1, ..., Xn) have a joint distribution of monotone likelihood ratio with respect to a statistic T, depending on θ ∈ R. Further assume that T is a continuous random variable. Then the test function given by

δ(X) = 1 if T(X) > k;  0 if T(X) ≤ k

is UMP among all tests with type I error bounded above by E_{θ0}[δ(X)], for the hypothesis pair

H0 : θ ≤ θ0 vs H1 : θ > θ0.

[The assumption of continuity of the random variable T can be removed by considering randomized tests as well, similarly as before.]
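As a small numerical sanity check (not part of the slides), one can verify the MLR property for the Binomial(n, θ) family, where the monotone statistic is T(x) = x: the ratio f(x; θ2)/f(x; θ1) should be non-decreasing in x whenever θ1 < θ2.

```python
# Illustrative check of monotone likelihood ratio for Binomial(n, theta).
from math import comb

def binom_pmf(x, n, theta):
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

n, th1, th2 = 10, 0.3, 0.6   # th1 < th2 (hypothetical values)
ratios = [binom_pmf(x, n, th2) / binom_pmf(x, n, th1) for x in range(n + 1)]

# The ratio should be non-decreasing in x.
is_monotone = all(a <= b for a, b in zip(ratios, ratios[1:]))
```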

A statistic T yielding a monotone likelihood ratio is necessarily a sufficient statistic.

Example (One-Parameter Exponential Family)
Let X = (X1, ..., Xn) have joint density (frequency)

f(x; θ) = exp[c(θ)T(x) − b(θ) + S(x)]

and assume WLOG that c(θ) is strictly increasing. For θ1 < θ2,

f(x; θ2)/f(x; θ1) = exp{[c(θ2) − c(θ1)]T(x) + b(θ1) − b(θ2)}

is increasing in T by monotonicity of c(·), since c(θ2) − c(θ1) > 0. Hence the UMP test of H0 : θ ≤ θ0 vs H1 : θ > θ0 would reject iff T(x) ≥ k, with α = P_{θ0}[T ≥ k].

Locally Most Powerful Tests

What if the MLR property fails to be satisfied? Can optimality be "saved"?

Consider θ ∈ R and the pair H0 : θ ≤ θ0 vs H1 : θ > θ0.
- Intuition: if the true θ is far from θ0, any reasonable test is powerful.
- So focus on maximizing power in a small neighbourhood of θ0.
- Consider the power function β(θ) = E_θ[δ(X)] of some δ.
- Require β(θ0) = α (a boundary condition, similar to the MLR setup).
- Assume that β(θ) is differentiable, so that for θ close to θ0,

β(θ) ≈ β(θ0) + β′(θ0)(θ − θ0) = α + β′(θ0)(θ − θ0).

Since Θ1 = (θ0, ∞), this suggests an approach for a locally most powerful (LMP) test:

Choose δ to maximize β′(θ0), subject to β(θ0) = α.

How do we solve this constrained optimization problem? The solution is similar to the Neyman-Pearson lemma. Supposing that X = (X1, ..., Xn) has density f(x; θ), then

β(θ) = ∫_{R^n} δ(x) f(x; θ) dx,
β′(θ) = ∫_{R^n} δ(x) (∂/∂θ) f(x; θ) dx   [provided the interchange of differentiation and integration is possible]
      = ∫_{R^n} δ(x) [(∂/∂θ) f(x; θ) / f(x; θ)] f(x; θ) dx
      = ∫_{R^n} δ(x) [(∂/∂θ) log f(x; θ)] f(x; θ) dx
      = E_θ[δ(X) S(X; θ)],

where S(X; θ) = (∂/∂θ) log f(X; θ) is the score.

Theorem
Let X = (X1, ..., Xn) have joint density (frequency) f(x; θ) and define the test function

δ(X) = 1 if S(X; θ0) ≥ k;  0 otherwise,

where k is such that E_{θ0}[δ(X)] = α. Then δ maximizes

E_{θ0}[ψ(X) S(X; θ0)]

over all test functions ψ satisfying the constraint E_{θ0}[ψ(X)] = α.

- This gives a recipe for constructing the LMP test.
- We were concerned about power only locally around θ0.
- It may not even give a level-α test for some θ < θ0.

Proof.
Consider ψ with ψ(x) ∈ {0, 1} for all x and E_{θ0}[ψ(X)] = α. Then

δ(x) − ψ(x) ≥ 0 if S(x; θ0) ≥ k,
δ(x) − ψ(x) ≤ 0 if S(x; θ0) ≤ k.

Therefore

E_{θ0}[(δ(X) − ψ(X))(S(X; θ0) − k)] ≥ 0.

Since E_{θ0}[δ(X) − ψ(X)] = 0, it must be that

E_{θ0}[δ(X) S(X; θ0)] ≥ E_{θ0}[ψ(X) S(X; θ0)].  ∎

How is the critical value k evaluated in practice?
- When the {X_i} are iid, S(X; θ) = ∑_{i=1}^n ℓ′(X_i; θ).
- Under regularity conditions, this is a sum of iid random variables with mean zero and variance I(θ).
- So, for θ = θ0 and large n, S(X; θ) ≈ N(0, n I(θ)) in distribution.

Example (Cauchy distribution)
Let X1, ..., Xn iid Cauchy(θ), with density

f(x; θ) = 1/[π(1 + (x − θ)²)],

and consider the hypothesis pair H0 : θ ≤ 0 vs H1 : θ > 0. We have

S(X; 0) = ∑_{i=1}^n 2X_i/(1 + X_i²),

so that the LMP test at level α rejects the null if S(X; 0) ≥ k, where k is chosen (obviously) to give level α: P0[S(X; 0) ≥ k] = α. While the exact distribution is difficult to obtain, for large n, S(X; 0) ≈ N(0, n/2).

Likelihood Ratio Tests

So far we have seen tests for Θ ⊆ R: simple vs simple, one-sided vs one-sided.
- Extension to the multiparameter case θ ∈ R^p? General Θ0, Θ1?
- Unfortunately, optimality theory breaks down in higher dimensions.
- Is there a general method for constructing reasonable tests?
- The idea: combine the Neyman-Pearson paradigm with maximum likelihood.

Definition (Likelihood Ratio)
The likelihood ratio statistic corresponding to the pair of hypotheses H0 : θ ∈ Θ0 vs H1 : θ ∈ Θ1 is defined to be

Λ(X) = sup_{θ∈Θ} f(X; θ) / sup_{θ∈Θ0} f(X; θ) = sup_{θ∈Θ} L(θ) / sup_{θ∈Θ0} L(θ).

"Neyman-Pearson"-esque approach: reject H0 for large Λ. Intuition: choose the "most favourable" θ ∈ Θ0 (in favour of H0) and compare it against the "most favourable" θ ∈ Θ (in favour of H1) in a simple vs simple setting (applying the NP lemma).

Example
Let X1, ..., Xn iid N(µ, σ²) where both µ and σ² are unknown. Consider H0 : µ = µ0 vs H1 : µ ≠ µ0. Then

Λ(X) = sup_{(µ,σ²)∈R×R+} f(X; µ, σ²) / sup_{(µ,σ²)∈{µ0}×R+} f(X; µ, σ²) = (σ̂0²/σ̂²)^{n/2} = [∑_{i=1}^n (X_i − µ0)² / ∑_{i=1}^n (X_i − X̄)²]^{n/2}.

So reject when Λ ≥ k, where k is such that P0[Λ ≥ k] = α. Distribution of Λ? By monotonicity, look only at

∑_{i=1}^n (X_i − µ0)² / ∑_{i=1}^n (X_i − X̄)² = 1 + n(X̄ − µ0)²/∑_{i=1}^n (X_i − X̄)² = 1 + (1/(n−1)) · n(X̄ − µ0)²/S² = 1 + T²/(n − 1),

with S² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)² and T = √n (X̄ − µ0)/S ∼ t_{n−1} under H0. So T² ∼ F_{1,n−1} under H0, and k may be chosen appropriately.

Distribution of the Likelihood Ratio?

More often than not, dist(Λ) is intractable (and there is no simple dependence on a statistic T with a tractable distribution either). Consider asymptotic approximations?

Setup:
- Θ an open subset of R^p.
- Either Θ0 = {θ0}, or Θ0 an open subset of R^s, where s < p.
- Concentrate on X = (X1, ..., Xn) with iid components.
- Initially restrict attention to H0 : θ = θ0 vs H1 : θ ≠ θ0. The LR becomes

Λn(X) = ∏_{i=1}^n f(X_i; θ̂n) / f(X_i; θ0),

where θ̂n is the MLE of θ.
- Impose the regularity conditions from MLE asymptotics.

Example
Let X1, ..., Xm iid Exp(λ) and Y1, ..., Yn iid Exp(θ), with X independent of Y. Consider H0 : θ = λ vs H1 : θ ≠ λ.

Unrestricted MLEs: λ̂ = 1/X̄ and θ̂ = 1/Ȳ. Restricted MLEs: λ̂0 = θ̂0 = (m + n)/(mX̄ + nȲ). The LR becomes

Λ = sup_{(λ,θ)∈R+²} f(X, Y; λ, θ) / sup_{(λ,θ)∈{(x,y)∈R+² : x=y}} f(X, Y; λ, θ)
  = [m/(m + n) + (n/(m + n))·(Ȳ/X̄)]^m · [n/(n + m) + (m/(m + n))·(X̄/Ȳ)]^n.

Λ depends on T = X̄/Ȳ and can be made large/small by varying T. But T ∼ F_{2m,2n} under H0, so given α we may find the critical value k.
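The algebraic identity behind the normal-mean LRT, Λ^{2/n} = 1 + T²/(n − 1), can be confirmed numerically; the snippet below is an illustrative check on simulated data (the sample values are arbitrary):

```python
# Numerical check of the normal-mean LRT identity Lambda^(2/n) = 1 + T^2/(n-1),
# with T = sqrt(n)*(Xbar - mu0)/S and S^2 the unbiased variance estimate.
import math, random

random.seed(1)
mu0 = 0.0
x = [random.gauss(0.5, 2.0) for _ in range(20)]   # hypothetical sample
n = len(x)
xbar = sum(x) / n
ss_mu0 = sum((xi - mu0) ** 2 for xi in x)          # restricted sum of squares
ss_xbar = sum((xi - xbar) ** 2 for xi in x)        # unrestricted sum of squares

lam = (ss_mu0 / ss_xbar) ** (n / 2)                # likelihood ratio statistic
S2 = ss_xbar / (n - 1)
T = math.sqrt(n) * (xbar - mu0) / math.sqrt(S2)

lhs = lam ** (2 / n)
rhs = 1 + T ** 2 / (n - 1)
```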

Asymptotic Distribution of the Likelihood Ratio

Theorem (Wilks' Theorem, case p = 1)
Let X1, ..., Xn be iid random variables with density (frequency) depending on θ ∈ R and satisfying conditions (A1)-(A6), with I(θ) = J(θ). If the MLE sequence θ̂n is consistent for θ, then the likelihood ratio statistic Λn for H0 : θ = θ0 satisfies

2 log Λn →d V ∼ χ²₁

when H0 is true.

- Obviously, knowing the approximate distribution of 2 log Λn is as good as knowing the approximate distribution of Λn for the purposes of testing (by monotonicity and the rejection method).
- The theorem extends immediately and trivially to the case of general p, for a hypothesis pair H0 : θ = θ0 vs H1 : θ ≠ θ0 (i.e. when the null hypothesis is simple).

Proof.
Under the conditions of the theorem, and when H0 is true,

√n(θ̂n − θ0) →d N(0, I⁻¹(θ0)).

Now take logarithms and expand in a Taylor series:

log Λn = ∑_{i=1}^n [ℓ(X_i; θ̂n) − ℓ(X_i; θ0)]
       = (θ0 − θ̂n) ∑_{i=1}^n ℓ′(X_i; θ̂n) − ½ (θ̂n − θ0)² ∑_{i=1}^n ℓ″(X_i; θ*_n)
       = −½ n(θ̂n − θ0)² · (1/n) ∑_{i=1}^n ℓ″(X_i; θ*_n),

where θ*_n lies between θ̂n and θ0 (the first-order term vanishes because θ̂n solves the likelihood equation). But under assumptions (A1)-(A6), when H0 is true,

(1/n) ∑_{i=1}^n ℓ″(X_i; θ*_n) →p E_{θ0}[ℓ″(X_i; θ0)] = −I(θ0).

On the other hand, by the continuous mapping theorem,

n(θ̂n − θ0)² →d V/I(θ0), with V ∼ χ²₁.

Applying Slutsky's theorem now yields the result.  ∎

Theorem (Wilks' theorem, general p, general s ≤ p)
Let X1, ..., Xn be iid random variables with density (frequency) depending on θ ∈ R^p and satisfying conditions (B1)-(B6), with I(θ) = J(θ). If the MLE sequence θ̂n is consistent for θ, then the likelihood ratio statistic Λn for H0 : {θ_j = θ_{j,0}}_{j=1}^s satisfies

2 log Λn →d V ∼ χ²_s

when H0 is true.

Exercise: prove Wilks' theorem. Note that it may potentially be that s < p.

Hypotheses of the form H0 : {g_j(θ) = a_j}_{j=1}^s, for g_j differentiable real functions, can also be handled by Wilks' theorem:
- Define (φ1, ..., φp) = g(θ) = (g1(θ), ..., gp(θ)).
- g_{s+1}, ..., g_p are defined so that θ ↦ g(θ) is 1-1.
- Apply the theorem with parameter φ.
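Wilks' theorem can be illustrated by simulation. The sketch below (parameter values hypothetical) simulates 2 log Λn for the Exp(λ) model with H0 : λ = 1, using the closed form 2 log Λn = 2n(log(λ̂/λ0) − 1 + λ0 X̄) with λ̂ = 1/X̄, and checks that the Monte Carlo mean is near E[χ²₁] = 1:

```python
# Monte Carlo sketch of Wilks' theorem for Exp(lambda), H0: lambda = 1.
# Under H0, 2*log(Lambda_n) is approximately chi-square(1), whose mean is 1.
import math, random

random.seed(2)
n, lam0, reps = 200, 1.0, 4000

def two_log_lambda():
    xbar = sum(random.expovariate(lam0) for _ in range(n)) / n
    lam_hat = 1 / xbar   # MLE of lambda
    # 2*log(Lambda_n) = 2n*[log(lam_hat/lam0) - 1 + lam0*xbar]
    return 2 * n * (math.log(lam_hat / lam0) - 1 + lam0 * xbar)

stats = [two_log_lambda() for _ in range(reps)]
mean_stat = sum(stats) / reps   # should be close to 1
```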

Other Tests?

Many other tests become possible once we "liberate" ourselves from strict optimality criteria. For example:

Wald's test
For a simple null, one may compare the unrestricted MLE with the MLE under the null. Large deviations indicate evidence against the null hypothesis. Distributions are approximated for large n via the asymptotic normality of MLEs.

Score test
For a simple null: if the null hypothesis is false, then the log-likelihood gradient (score) at the null should not be close to zero, at least when n is reasonably large — so measure its deviations from zero. Use asymptotics for the distributions (under conditions we end up with a χ²).
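To make the two recipes concrete, here is a hedged sketch of Wald and score statistics for H0 : θ = θ0 in a Bernoulli(θ) model (the data values and the 3.841 quantile are illustrative assumptions, not from the slides). Both statistics are compared to a χ²₁ quantile for large n:

```python
# Wald and score statistics for H0: theta = theta0, Bernoulli(theta) model.
n, y, theta0 = 100, 62, 0.5   # y = number of successes (hypothetical data)
theta_hat = y / n             # unrestricted MLE

# Wald: squared standardized distance of the MLE from theta0,
# with the variance estimated at the MLE.
wald = (theta_hat - theta0) ** 2 / (theta_hat * (1 - theta_hat) / n)

# Score: squared score at theta0, normalized by the Fisher information at theta0.
score = n * (theta_hat - theta0) ** 2 / (theta0 * (1 - theta0))

chi2_95 = 3.841   # 95% quantile of chi-square(1)
reject_wald, reject_score = wald > chi2_95, score > chi2_95
```

Note that the two statistics differ only in where the variance is evaluated (at θ̂ vs at θ0); asymptotically they agree under H0.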

Statistical Theory (Week 12) — Confidence Regions

From Hypothesis Tests to Confidence Regions
Statistical Theory
Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 p-values
2 Confidence Intervals
3 The Pivoting Method
4 Extension to Confidence Regions
5 Inverting Hypothesis Tests

Beyond Neyman-Pearson?

So far we have been restricted to the Neyman-Pearson framework:
1 Fix a significance level α for the test.
2 Consider rules δ respecting this significance level; choose one of those rules, δ*, based on power considerations.
3 Reject at level α if δ*(x) = 1.

This is useful for attempting to determine optimal test statistics. But what if we already have a given form of test statistic in mind (e.g. the LRT)? A different perspective on testing (used more in practice) says: rather than consider a family of test functions respecting level α, consider a family of test functions indexed by α:
1 Fix a family {δ_α}_{α∈(0,1)} of decision rules, with δ_α having level α. For a given x, some of these rules reject the null, while others do not.
2 Which is the smallest α for which H0 is rejected, given x?

Observed Significance Level

Definition (p-Value)
Let {δ_α}_{α∈(0,1)} be a family of test functions satisfying

α1 < α2 ⟹ {x ∈ X : δ_{α1}(x) = 1} ⊆ {x ∈ X : δ_{α2}(x) = 1}.

The p-value (or observed significance level) of the family {δ_α} is

p(x) = inf{α : δ_α(x) = 1}.

The p-value is the smallest value of α for which the null would be rejected at level α, given X = x.

Most usual setup: we have a single test statistic T.
1 Construct the family δ_α(x) = 1{T(x) > k_α}.
2 If P_{H0}[T ≤ t] = G(t), then p(x) = P_{H0}[T(X) ≥ T(x)] = 1 − G(T(x)).
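The formula p(x) = 1 − G(T(x)) is easy to apply whenever the null distribution G is available. As a generic illustration (not tied to the slides' examples), here is the p-value of a one-sided z-test computed via the standard normal survival function:

```python
# p-value p(x) = P[T >= T(x)] under H0 for a one-sided z-test,
# using the standard normal survival function via math.erfc.
import math

def z_test_p_value(t_obs):
    # P[Z >= t_obs] for Z ~ N(0, 1)
    return 0.5 * math.erfc(t_obs / math.sqrt(2))

p = z_test_p_value(1.6449)   # 1.6449 is roughly the 95% quantile of N(0,1)
```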

Observed Significance Level (cont'd)

Notice: contrary to the NP framework, we did not make an explicit decision! We simply reported a p-value.
- The p-value is used as a measure of evidence against H0.
- A small p-value provides evidence against H0.
- A large p-value provides no evidence against H0.
How small does "small" mean? That depends on the specific problem...

Intuition:
- Recall that extreme values of test statistics are those that are "inconsistent" with the null (NP framework).
- The p-value is the probability of observing a value of the test statistic as extreme as, or more extreme than, the one we observed, under the null.
- If this probability is small, then we have witnessed something quite unusual under the null hypothesis — which gives evidence against the null hypothesis.

Example (Normal Mean)
Let X1, ..., Xn iid N(µ, σ²) where both µ and σ² are unknown. Consider

H0 : µ = 0 vs H1 : µ ≠ 0.

The likelihood ratio test rejects when T² is large, where T = √n X̄/S ∼ t_{n−1} under H0. Since T² ∼ F_{1,n−1} under H0, the p-value is

p(x) = P_{H0}[T²(X) ≥ T²(x)] = 1 − G_{F_{1,n−1}}(T²(x)).

Consider two samples (datasets),

x = (0.66, −0.28, 0.99, 0.007, −0.29, −1.88, 1.24, 0.94, −0.53, −1.2),
y = (1.4, 0.48, 2.86, 1.02, −1.38, 1.42, 2.11, 2.77, 1.02, 1.87).

We obtain p(x) = 0.32, while p(y) = 0.006.

Significance vs Decision

Reporting a p-value does not necessarily mean making a decision. A small p-value can simply reflect our "confidence" in rejecting a null: it reflects how statistically significant the alternative statement is. But how is it to be interpreted?

Recall the example: statisticians working for Obama gather an iid sample X from Ohio, with X_i = 1{vote Obama}. The Obama team wants to test

H0 : McCain wins Ohio vs H1 : Obama wins Ohio.

Will the statisticians decide for Obama? Perhaps it is better to report the p-value to him and let him decide... And what if the statisticians work for a newspaper, not for Obama? Is there something easier to interpret than a test or a p-value?

A Glance Back at Point Estimation

Let X1, ..., Xn be iid random variables with density (frequency) f(·; θ).
- A problem with point estimation: P_θ[θ̂ = θ] is typically small (if not zero), so we always attach an estimator of variability, e.g. a standard error. But how is that interpreted?
- Hypothesis tests may provide a way to interpret an estimator's variability within the setup of a particular problem: e.g. if we observe P̂[Obama wins] = 0.52, we can actually see what p-value we get when testing H0 : P[Obama wins] ≥ 1/2.

Something more directly interpretable? Back to our example: what do pollsters do in newspapers?
- They announce their point estimate (e.g. 0.52).
- They give upper and lower confidence limits.
What are these, and how are they interpreted?

Interval Estimation

Simple underlying idea: instead of estimating θ by a single value, present a whole range of values for θ that are consistent with the data — in the sense that they could have produced the data.

Definition (Confidence Interval)
Let X = (X1, ..., Xn) be random variables with joint distribution depending on θ ∈ R, and let L(X) and U(X) be two statistics with L(X) < U(X) a.s. Then the random interval [L(X), U(X)] is called a 100(1 − α)% confidence interval for θ if

P_θ[L(X) ≤ θ ≤ U(X)] ≥ 1 − α

for all θ ∈ Θ, with equality for at least one value of θ. 1 − α is called the coverage probability or confidence level.

Interpretation — beware!
- The probability statement is NOT made about θ, which is constant.
- The statement is about the interval: the probability that the interval contains the true value is at least 1 − α.
- Given any realization X = x, the interval [L(x), U(x)] will either contain or not contain θ.
- If we construct intervals with this method, then we expect that 100(1 − α)% of the time our intervals will engulf the true value.

Example
Let X1, ..., Xn iid N(µ, 1). Then √n(X̄ − µ) ∼ N(0, 1), so that

P_µ[−1.96 ≤ √n(X̄ − µ) ≤ 1.96] = 0.95,

and since

−1.96 ≤ √n(X̄ − µ) ≤ 1.96 ⟺ X̄ − 1.96/√n ≤ µ ≤ X̄ + 1.96/√n,

we obviously have

P_µ[X̄ − 1.96/√n ≤ µ ≤ X̄ + 1.96/√n] = 0.95.

So the random interval [L(X), U(X)] = [X̄ − 1.96/√n, X̄ + 1.96/√n] is a 95% confidence interval for µ. By the central limit theorem, the same argument yields an approximate 95% CI when X1, ..., Xn are iid with EX_i = µ and Var(X_i) = 1, regardless of their distribution.

Pivotal Quantities

What can we learn from the previous example?

Definition (Pivot)
A random function g(X, θ) is said to be a pivotal quantity (or simply a pivot) if it is a function of both X and θ whose distribution does not depend on θ.

In the previous example, √n(X̄ − µ) ∼ N(0, 1) is a pivot. Why is a pivot useful? For all α ∈ (0, 1) we can find constants a < b, independent of θ, such that

P_θ[a ≤ g(X, θ) ≤ b] = 1 − α for all θ ∈ Θ.

If g(X, θ) can be manipulated, then the above yields a CI.

Example
Let X1, ..., Xn iid U[0, θ]. Recall that the MLE is θ̂ = X_(n), with distribution

P_θ[X_(n) ≤ x] = F_{X_(n)}(x) = (x/θ)^n ⟹ P_θ[X_(n)/θ ≤ y] = y^n.

Hence X_(n)/θ is a pivot for θ. We can now choose a < b such that

P_θ[a ≤ X_(n)/θ ≤ b] = 1 − α.

But there are ∞-many such choices! Idea: choose the pair (a, b) that minimizes the interval's length. The solution can be seen to be a = α^{1/n} and b = 1, yielding the interval

[X_(n), X_(n)/α^{1/n}].

Comments on Pivotal Quantities

The pivotal method extends to the construction of a CI for θ_k when θ = (θ1, ..., θ_k, ..., θ_p) ∈ R^p and the remaining coordinates are also unknown. The pivotal quantity should now be a function g(X; θ_k) which
1 depends on X and θ_k, but on no other parameters;
2 has a distribution independent of any of the parameters.
E.g.: a CI for a normal mean when the variance is unknown.

Main difficulties with the pivotal method:
- Exact pivots are hard to find in general problems.
- Exact distributions may be intractable.
One then resorts to asymptotic approximations; the most classic example is when a_n(θ̂_n − θ) →d N(0, σ²(θ)).
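The defining property of a pivot-based interval — that its coverage does not depend on the unknown parameter — can be checked by simulation. Below is an illustrative Monte Carlo sketch for the normal-mean pivot with known variance 1 (the value of µ is an arbitrary assumption):

```python
# Monte Carlo coverage check for the pivot-based 95% CI: Xbar +/- 1.96/sqrt(n).
import math, random

random.seed(3)
mu, n, reps = 2.0, 25, 10000   # true mean is hypothetical
half_width = 1.96 / math.sqrt(n)

covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, 1.0) for _ in range(n)) / n
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1
coverage = covered / reps   # should be close to 0.95, whatever mu is
```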

Confidence Regions

What about higher-dimensional parameters?

Definition (Confidence Region)
Let X = (X1, ..., Xn) be random variables with joint distribution depending on θ ∈ Θ ⊆ R^p. A random subset R(X) of Θ, depending on X, is called a 100(1 − α)% confidence region for θ if

P_θ[R(X) ∋ θ] ≥ 1 − α

for all θ ∈ Θ, with equality for at least one value of θ.

- There is no restriction requiring R(X) to be convex or even connected; so when p = 1 we get a more general notion than a CI.
- Nevertheless, many notions extend immediately to the CR case, e.g. the notion of a pivotal quantity.

Pivots for Confidence Regions

Let g : X × Θ → R be a function such that dist[g(X, θ)] is independent of θ. Since the image space is the real line, we can find a < b such that

P_θ[a ≤ g(X, θ) ≤ b] = 1 − α ⟹ P_θ[R(X) ∋ θ] = 1 − α,

where R(x) = {θ ∈ Θ : g(x, θ) ∈ [a, b]}. Notice that the region can be "wild", since it is a random fibre of g.

Example
Let X1, ..., Xn iid N_k(µ, Σ). Two unbiased estimators of µ and Σ are

µ̂ = (1/n) ∑_{i=1}^n X_i,  Σ̂ = (1/(n−1)) ∑_{i=1}^n (X_i − µ̂)(X_i − µ̂)ᵀ.

Consider the random variable

g({X_i}_{i=1}^n, µ) := [n(n − k)/(k(n − 1))] (µ̂ − µ)ᵀ Σ̂⁻¹ (µ̂ − µ) ∼ F-distribution with k and n − k degrees of freedom.

A pivot! If f_q is the q-quantile of this distribution, then we get a 100q% CR as

R({X_i}_{i=1}^n) = {θ ∈ R^k : [n(n − k)/(k(n − 1))] (µ̂ − θ)ᵀ Σ̂⁻¹ (µ̂ − θ) ≤ f_q},

an ellipsoid in R^k:
- centred at µ̂;
- principal axis lengths governed by the eigenvalues of Σ̂;
- orientation given by the eigenvectors of Σ̂⁻¹.

Getting Confidence Regions from Confidence Intervals

Visualisation of high-dimensional CRs can be hard. When these are ellipsoids, the spectral decomposition helps — but more generally? Things are especially easy when dealing with rectangles, but rectangles rarely occur! What if we construct a CR as a Cartesian product of CIs?

Let [L_i(X), U_i(X)] be 100q_i% CIs for θ_i, i = 1, ..., p, and define

R(X) = [L1(X), U1(X)] × ... × [L_p(X), U_p(X)].

Bonferroni's inequality implies that

P_θ[R(X) ∋ θ] ≥ 1 − ∑_{i=1}^p P[θ_i ∉ [L_i(X), U_i(X)]] = 1 − ∑_{i=1}^p (1 − q_i).

So pick the q_i such that ∑_{i=1}^p (1 − q_i) = α (this can be conservative...).
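The Bonferroni product construction can be illustrated by simulation. The sketch below (all parameter values hypothetical) uses p = 2 independent standard-normal coordinates with each coordinate interval at level 1 − α/p = 0.975, so the rectangle should cover the true mean vector with probability (0.975)² ≈ 0.9506 ≥ 0.95:

```python
# Monte Carlo sketch of a Bonferroni product confidence region, p = 2.
import math, random

random.seed(4)
p_dim, alpha, n, reps = 2, 0.05, 25, 8000
z = 2.2414                    # ~ upper 1.25% quantile of N(0,1), per-coordinate level 0.975
hw = z / math.sqrt(n)         # half-width of each coordinate interval

covered = 0
for _ in range(reps):
    ok = True
    for _ in range(p_dim):
        xbar = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
        ok = ok and (abs(xbar) <= hw)   # true coordinate mean is 0
    covered += ok
coverage = covered / reps     # expected ~ 0.9506, i.e. slightly conservative
```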

Confidence Intervals and Hypothesis Tests

The discussion on CRs gives no guidance for choosing "good" regions. But there exists a close relationship between CRs and hypothesis tests, which we can exploit to transform good testing properties into good CR properties.

Suppose R(X) is an exact 100q% = 100(1 − α)% CR for θ. Consider

H0 : θ = θ0 vs H1 : θ ≠ θ0

and define the test function

δ(X) = 1 if θ0 ∉ R(X);  0 if θ0 ∈ R(X).

Then E_{θ0}[δ(X)] = 1 − P_{θ0}[θ0 ∈ R(X)] ≤ α. So we can use a CR to construct a test with significance level α!

Going the other way around, we can invert tests to get CRs. Suppose we have tests at level α for any choice of simple null θ0 ∈ Θ; say δ(X; θ0) is the appropriate test function for H0 : θ = θ0. Define

R*(X) = {θ0 : δ(X; θ0) = 0}.

The coverage probability of R*(X) is

P_θ[θ ∈ R*(X)] = P_θ[δ(X; θ) = 0] ≥ 1 − α.

We obtain a 100(1 − α)% confidence region by choosing all the θ for which the null would not be rejected given our data X. If the test being inverted is powerful, then we get a "small" region for a given 1 − α.

Multiple Testing

Modern example: looking for signals in noise.
- We are interested in detecting the presence of a signal µ(x_t), t = 1, ..., T, over a discretised domain {x_1, ..., x_T}, on the basis of noisy measurements.
- This is to be detected against some known background, say 0.
- We may or may not be specifically interested in detecting the presence of the signal at some particular location x_t; we may rather want to detect whether a signal is present anywhere in the domain.
- We may also be interested in the specific locations at which the signal deviates from zero.

More formally: we observe

Y_t = µ(x_t) + ε_t, t = 1, ..., T,

and wish to test, at some significance level α,

H0 : µ(x_t) = 0 for all t ∈ {1, ..., T}
HA : µ(x_t) ≠ 0 for some t ∈ {1, ..., T}.

That is: does there exist a t ∈ {1, ..., T} such that µ(x_t) ≠ 0? And, possibly, for which t's is µ(x_t) ≠ 0?

More generally: we may have T hypotheses to test simultaneously at level α (they may be related or totally unrelated). Suppose we have a test statistic for each individual hypothesis H_{0,t}, yielding a p-value p_t.

Bonferroni Method

If we test each hypothesis individually at level α, we will not maintain the overall level! Can we maintain the level α? Idea: use the same trick as for confidence regions!

Bonferroni:
1 Test the individual hypotheses separately at level α_t = α/T.
2 Reject H0 if at least one of the H_{0,t} is rejected.

The global level is bounded as follows:

P[reject H0 | H0] = P[∪_{t=1}^T {reject H_{0,t}} | H0] ≤ ∑_{t=1}^T P[reject H_{0,t} | H0] = T · (α/T) = α.

- Advantage: works for any (discrete domain) setup!
- Disadvantage: too conservative when T is large.

Holm-Bonferroni Method

Holm's modification increases the average number of hypotheses rejected at level α (but does not increase the power for the overall rejection of H0 = ∩_{t≤T} H_{0,t}).

Holm's procedure:
1 We reject H_{0,t} for small values of the corresponding p-value p_t.
2 Order the p-values from most to least significant: p_(1) ≤ ... ≤ p_(T).
3 Starting from t = 1 and going up, reject H_{0,(t)} as long as p_(t) is significant at level α/(T − t + 1); stop rejecting at the first insignificant p_(t).

For the most significant p-value this threshold is α/T, so both Holm and Bonferroni reject the global H0 if and only if inf_t p_t is significant at level α/T. Holm is a genuine improvement over Bonferroni if we want to detect as many signals as possible, not just the existence of some signal.
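Holm's step-down procedure can be written in a few lines; the implementation below is a minimal sketch of the rule described above (the p-values are hypothetical):

```python
# Holm-Bonferroni step-down: reject the t-th most significant hypothesis while
# its p-value is at most alpha/(T - t + 1); stop at the first failure.
def holm(pvalues, alpha=0.05):
    T = len(pvalues)
    order = sorted(range(T), key=lambda i: pvalues[i])   # most significant first
    reject = [False] * T
    for rank, i in enumerate(order):                     # rank 0 = smallest p-value
        if pvalues[i] <= alpha / (T - rank):
            reject[i] = True
        else:
            break                                        # stop at first insignificant p
    return reject

pvals = [0.001, 0.040, 0.020, 0.300]   # hypothetical p-values
decisions = holm(pvals, alpha=0.05)
```

Here only the first hypothesis is rejected: 0.001 ≤ 0.05/4, but the next ordered p-value, 0.020, exceeds 0.05/3.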

Taking Advantage of Structure: Independence

In the (special) case where the individual test statistics are independent, one may use Simes' (in)equality:

P_{H0}[p_(j) ≥ jα/T for all j = 1, ..., T] ≥ 1 − α

(equality requires continuous test statistics; otherwise the type I error is at most α).

This yields Simes' procedure (assuming independence):
1 Suppose we reject H_{0,j} for small values of p_j.
2 Order the p-values from most to least significant: p_(1) ≤ ... ≤ p_(T).
3 If, for some j = 1, ..., T, the p-value p_(j) is significant at level jα/T, then reject the global H0.

This provides a test of the global hypothesis H0, but does not "localise" the signal at a particular x_t. One can, however, devise a sequential procedure to "localise" Simes' procedure, at the expense of lower power for the global hypothesis H0:

Hochberg's procedure (assuming independence):
1 Suppose we reject H_{0,j} for small values of p_j.
2 Order the p-values from most to least significant: p_(1) ≤ ... ≤ p_(T).
3 Starting from j = T, T − 1, ... and going down, accept H_{0,(j)} while p_(j) is insignificant at level α/(T − j + 1).
4 Stop accepting at the first j such that p_(j) is significant at level α/(T − j + 1), and reject all the remaining ordered hypotheses H_{0,(1)}, ..., H_{0,(j)}.

Hochberg's procedure is a genuine improvement over Holm-Bonferroni, both overall (for H0) and in terms of signal localisation:
1 It rejects "more" individual hypotheses than Holm-Bonferroni.
2 Its power for the overall H0 is "weaker" than Simes' (for T > 2), but much "stronger" than Holm's (for T > 1).
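Hochberg's step-up rule can likewise be sketched in a few lines (a minimal implementation of the procedure above; the p-values are hypothetical):

```python
# Hochberg step-up: scan the ordered p-values from least to most significant and
# find the largest rank j with p_(j) <= alpha/(T - j + 1); reject that hypothesis
# and all more significant ones.
def hochberg(pvalues, alpha=0.05):
    T = len(pvalues)
    order = sorted(range(T), key=lambda i: pvalues[i])   # most significant first
    reject = [False] * T
    for rank in range(T - 1, -1, -1):                    # start from the largest p-value
        if pvalues[order[rank]] <= alpha / (T - rank):
            for i in order[: rank + 1]:
                reject[i] = True
            break
    return reject

pvals = [0.040, 0.045]   # hypothetical p-values, T = 2
decisions = hochberg(pvals, alpha=0.05)
```

On this example Hochberg rejects both hypotheses (0.045 ≤ 0.05/1), whereas Holm would compare 0.040 to 0.05/2 = 0.025 and reject neither — illustrating the "rejects more" claim.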

[Figure: "Bonferroni, Hochberg, Simes" — rejection level (y-axis, from about 0.01 = α/T up to 0.05 = α) plotted against the index of the ordered p-values, from minimal to maximal significance (x-axis, 1 to 25), comparing the constant Bonferroni threshold with the increasing Hochberg and Simes thresholds.]

Statistical Theory (Week 13) — Bayesian Inference

Some Elements of Bayesian Inference
Statistical Theory
Victor Panaretos
Ecole Polytechnique Fédérale de Lausanne

1 θ as a random variable
2 Using Bayes' Rule to Update a Prior
3 Empirical Bayes
4 Choice of Prior: Informative, Conjugate, Uninformative
5 Inference in the Bayesian Framework

Back to Basics!

Classical perspective:
1 θ ∈ Θ ⊆ R^p is an unknown fixed constant.
2 The data are a realization of a random X ∼ f(x; θ).
3 Use the data to learn about the value of θ (estimation, testing, ...).

This somehow assumes complete ignorance about θ. What if we have some prior knowledge/belief about plausible values? We did this when defining the Bayes risk: we placed a prior π(·) on θ.

Bayesian perspective:
1 θ is a RANDOM VARIABLE with prior distribution π(·).
2 The data are a realization of a random X such that X | θ ∼ f(x | θ).
3 Use the data X = x to UPDATE the distribution of θ.

Updating a Prior

1 We have knowledge/belief about θ, expressed via π(θ).
2 We observe data X = x.

How do we readjust our belief to incorporate this new evidence? Answer: use Bayes' rule to obtain a posterior for θ:

π(θ | x) = f(x | θ)π(θ) / ∫_Θ f(x | θ)π(θ) dθ;

since the denominator is constant,

π(θ | x) ∝ f(x | θ) · π(θ)   (posterior ∝ likelihood × prior).

Bayesian principle: everything we want to learn about θ from the data is contained in the posterior.

Statistical Theory (Week 13) Bayesian Inference 3 / 19 Statistical Theory (Week 13) Bayesian Inference 4 / 19 A Few Remarks Example (Coin Flips) iid Let (X1, ..., Xn) θ Bernoulli(θ) and consider a Beta prior density on θ: Some strengths of the Bayesian approach: | ∼ (provided we accept viewing θ as random) Γ(α + β) α 1 β 1 π(θ) = θ − (1 θ) − , θ (0, 1) Γ(α)Γ(β) − ∈ I Inferences on θ take into account only observed data and prior , Contrast to classical approach which worries about all samples that Given X = x , ..., X = x with y = x + ... + x , have → 1 1 n n 1 n could have but did not occur (risk) f (x1, ..., xn θ)π(θ) π(θ x) = 1 | Provides unified approach for solving (almost) any inference problem | f (x1, ..., xn θ)π(θ)dθ I 0 | y+α 1 n y+β 1 R θ − (1 θ) − − Allows natural way to incorporate prior information in inference = − I 1 θy+α 1(1 θ)n y+β 1dθ 0 − − − − Posterior can become your prior for next exepriment! Γ(α + β + n) y+α 1 n y+β 1 I = R θ − (1 θ) − − (updating process: can unify a series of experiments naturally) Γ(α + y)Γ(β + n y) − − But... one basic weaknesses: Recall that Γ(k) = (k 1)! for k Z+ Choice of prior? Objective priors typically NOT available... − ∈ I , our choice of prior includes great variety of densities, including uniform → Statistical Theory (Week 13) Bayesian Inference 5 / 19 distributionStatistical Theory (Week 13) Bayesian Inference 6 / 19

Coin Flipping Example: n = 10, Σx_i = 2, prior vs posterior

[Figure: four prior-vs-posterior density panels on (0, 1): Prior = Beta(5, 1), Posterior = Beta(7, 9); Prior = Beta(1/2, 1/2), Posterior = Beta(2.5, 8.5); Prior = Beta(2, 5), Posterior = Beta(4, 13); Prior = Beta(1, 1), Posterior = Beta(3, 9).]

Coin Flipping Example: n = 15, Σx_i = 3, prior vs posterior

[Figure: four prior-vs-posterior density panels on (0, 1): Prior = Beta(5, 1), Posterior = Beta(8, 13); Prior = Beta(1/2, 1/2), Posterior = Beta(3.5, 12.5); Prior = Beta(2, 5), Posterior = Beta(5, 17); Prior = Beta(1, 1), Posterior = Beta(4, 13).]

Coin Flipping Example: n = 100, Σx_i = 20, prior vs posterior

Empirical Bayes

[Figure: four prior-vs-posterior density panels on (0, 1): Prior = Beta(5, 1), Posterior = Beta(25, 82); Prior = Beta(1/2, 1/2), Posterior = Beta(20.5, 80.5); Prior = Beta(2, 5), Posterior = Beta(22, 85); Prior = Beta(1, 1), Posterior = Beta(21, 81).]

→ Typically the prior itself depends on parameters (= hyperparameters)
→ These are tuned to reflect prior knowledge/belief

- "Orthodox" Bayesians:
  1. Accept the role of subjective probability / a priori beliefs
  2. Hyperparameters should be specified independently of the data
- Empirical Bayes approach:
  1. Not willing to specify the hyperparameters a priori
  2. Tune the prior to the observed data (estimate the hyperparameters from the data)
  (essentially a non-Bayesian approach, since the prior is tuned to the data)

Empirical Bayes

Prior π(θ; α) depending on a hyperparameter α. Write

    g(x; α) = ∫_Θ f(x | θ) π(θ; α) dθ
    [the joint density of θ and x, integrated: the marginal of x]

The marginal of x depends on α
→ a classical point estimation setting: estimate α!
  - Maximum likelihood
  - Plug-in principle
→ Then plug α̂ into the prior and obtain the posterior

    π(θ | x; α̂) = f(x | θ) π(θ; α̂) / ∫_Θ f(x | θ) π(θ; α̂) dθ = f(x | θ) π(θ; α̂) / g(x; α̂)

Example
Let X_1, ..., X_n | λ ~ iid Poisson(λ) with a gamma prior on λ:

    π(λ; α, β) = [α^β / Γ(β)] λ^{β−1} exp(−αλ)

Letting y = x_1 + ... + x_n, observe that the marginal for x is

    g(x; α, β) = ∫_0^∞ [exp(−nλ) λ^y / (x_1! ... x_n!)] π(λ; α, β) dλ
               = (α / (n + α))^β (1 / (n + α))^y Γ(y + β) / [Γ(β) x_1! ... x_n!]

So use ML estimation with L(α, β) = g(x; α, β). Alternatively, MoM:

    E[X_i] = E[E(X_i | λ)] = ∫_0^∞ λ π(λ; α, β) dλ = β/α
    Var[X_i] = E[Var(X_i | λ)] + Var[E(X_i | λ)] = β/α + β/α²
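A minimal sketch of the MoM variant above: match the sample mean and variance to β/α and β/α + β/α², then plug the estimates into the gamma prior and read off the conjugate posterior. The count data below are hypothetical; note that MoM requires the sample variance to exceed the mean (overdispersion), otherwise the estimates are not positive.

```python
# Empirical Bayes by method of moments for the Poisson-gamma model:
# E[X] = beta/alpha and Var[X] = beta/alpha + beta/alpha^2, so
# alpha = m / (v - m) and beta = m * alpha (valid only when v > m).
def empirical_bayes_gamma(xs):
    n = len(xs)
    m = sum(xs) / n                              # sample mean
    v = sum((x - m) ** 2 for x in xs) / n        # sample variance
    alpha_hat = m / (v - m)
    beta_hat = m * alpha_hat
    y = sum(xs)
    # conjugacy: posterior for lambda is Gamma(shape y + beta, rate n + alpha)
    return alpha_hat, beta_hat, (y + beta_hat, n + alpha_hat)

xs = [3, 0, 2, 5, 1, 4, 0, 2, 6, 2]              # hypothetical counts
alpha_hat, beta_hat, (shape, rate) = empirical_bayes_gamma(xs)
print(alpha_hat, beta_hat)
print(shape / rate)   # posterior mean; with MoM this equals the sample mean
```

With β̂ = m·α̂ the posterior mean (y + β̂)/(n + α̂) collapses exactly to the sample mean m, so the empirical-Bayes point estimate here reproduces the classical one; what the tuned prior adds is the full posterior distribution for λ.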

Informative, Conjugate and Ignorance Priors

The "catch" with Bayesian inference is picking a prior. This can be done:
1. By expressing prior knowledge/opinion (informative)
2. For convenience (conjugate)
3. As objectively as possible (ignorance)

We focus on (2) and (3). The most convenient priors to work with are conjugate families.

Definition (Conjugate Family)
A parametric family P = {π(·; α)}_{α ∈ A} on Θ is called a conjugate family for a family of distributions F = {f(·; θ)}_{θ ∈ Θ} on X if

    f(x | θ) π(θ; α) / ∫_Θ f(x | θ) π(θ; α) dθ = π(θ | x; α) ∈ P,   ∀ α ∈ A & x ∈ X.

Conjugate families: the posterior is immediately available
→ a great simplification of Bayesian inference

Example (Exponential Family)
Let (X_1, ..., X_n) | θ follow a one-parameter exponential family,

    f(x | θ) = exp[c(θ)T(x) − d(θ) + S(x)]

Consider the prior π(θ) = K(α, β) exp[αc(θ) − βd(θ)]. The posterior satisfies

    π(θ | x) ∝ π(θ) f(x; θ) ∝ exp[(T(x) + α)c(θ) − (β + 1)d(θ)]

so we obtain a posterior in the same family as the prior,

    π(θ | x) = K(T(x) + α, β + 1) exp[(T(x) + α)c(θ) − (β + 1)d(θ)]
             = K(α′, β′) exp[α′c(θ) − β′d(θ)],   with α′ = T(x) + α, β′ = β + 1.

Informative, Conjugate and Ignorance Priors

Bayesian inference is often perceived as not objective.
↓
Can we find priors that express indifference about the values of θ?
- If Θ is finite: easy, place mass 1/|Θ| on each point
- Infinite case?

Consider initially Θ = [a, b]. A natural uninformative prior is U[a, b]
→ Uninformative for θ. But what about g(θ)?
→ If g(·) is non-linear, we are being informative about g(θ)

What if Θ is not bounded? Then no uniform probability distribution exists:

    ∫_Θ k dθ = ∞   ∀ k > 0

Some "improper" priors (i.e. of infinite total mass) nevertheless yield valid posterior densities.

Example (Lebesgue Prior for Normal Distribution)
Let X_1, ..., X_n | µ ~ N(µ, 1). Assume the prior π is "uniform" on R,

    π[a, b] = b − a = ∫_a^b dx,   ∀ a < b

(density 1 with respect to Lebesgue measure). We obtain the posterior

    π(µ | x) = k(x) exp[−(1/2) Σ_{i=1}^n (x_i − µ)²]

with

    k(x) = ( ∫_{−∞}^∞ exp[−(1/2) Σ_{i=1}^n (x_i − τ)²] dτ )^{−1}
         = √(n/2π) exp[(1/2) Σ_{i=1}^n (x_i − x̄)²],

so the posterior is N(x̄, 1/n).
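The calculation above can be checked numerically: with a flat (improper) prior the posterior is proportional to the likelihood, and normalizing the likelihood over a fine grid should reproduce the N(x̄, 1/n) density. The data, grid range, and step size below are arbitrary illustrative choices.

```python
# Numerical check of the flat-prior posterior for the N(mu, 1) model.
import math

xs = [0.3, -0.1, 1.2, 0.8, 0.4]                  # hypothetical data
n, xbar = len(xs), sum(xs) / len(xs)

def unnorm_post(mu):                 # likelihood = posterior up to a constant
    return math.exp(-0.5 * sum((x - mu) ** 2 for x in xs))

def closed_form(mu):                 # N(xbar, 1/n) density
    return math.sqrt(n / (2 * math.pi)) * math.exp(-0.5 * n * (mu - xbar) ** 2)

step = 0.001
mus = [i * step for i in range(-3000, 3001)]     # grid on [-3, 3]
Z = sum(unnorm_post(m) for m in mus) * step      # Riemann sum normalizer
gaps = [abs(unnorm_post(mu) / Z - closed_form(mu)) for mu in (0.0, xbar, 1.0)]
print(gaps)   # all gaps are very small
```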

Jeffreys' Prior

The invariance problem remains even with improper priors.
→ Jeffreys (1961) proposed the following approach. Assume X | θ ~ f(· | θ).
1. Let g be monotone on Θ and define ν = g(θ)
2. Define π(θ) ∝ |I(θ)|^{1/2} (Fisher information)
3. The Fisher information for ν is

    I(ν) = Var_ν[ (d/dν) log f(X; g^{−1}(ν)) ]
         = ((d/dν) g^{−1}(ν))² Var_θ[ (d/dθ) log f(X; θ) ] = I(θ) × (dθ/dν)²

4. Thus |I(θ)|^{1/2} dθ = |I(ν)|^{1/2} dν
→ Gives a widely-accepted solution to some standard problems.

Computational Issues

When the prior is conjugate: the posterior is easy to obtain.
For a general prior, however, the posterior is not easy to obtain.
→ Problems arise especially with the evaluation of ∫_Θ f(x | θ) π(θ) dθ
Explicit integration is often infeasible; alternatives:
- Numerical integration
- Monte Carlo methods
  → Monte Carlo integration

Note that this prior distribution can be improper.
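A minimal sketch of the Monte Carlo integration idea for the troublesome integral ∫_Θ f(x | θ)π(θ)dθ: draw θ's from the prior and average the likelihood. It is checked here on the coin-flip model with a uniform prior, where the marginal is a known Beta integral; the data and the number of draws are illustrative choices.

```python
# Monte Carlo integration of the marginal m(x) = ∫ f(x|θ) π(θ) dθ:
# with θ_1, ..., θ_M drawn from the prior, m(x) ≈ (1/M) Σ_j f(x|θ_j).
import math
import random

random.seed(0)
n, y = 10, 2                                   # 2 successes in 10 trials
lik = lambda t: t**y * (1 - t)**(n - y)        # Bernoulli likelihood

M = 100_000
mc = sum(lik(random.betavariate(1, 1)) for _ in range(M)) / M  # uniform prior

# Exact value: ∫_0^1 θ^y (1-θ)^{n-y} dθ = Γ(y+1)Γ(n-y+1)/Γ(n+2)
exact = math.gamma(y + 1) * math.gamma(n - y + 1) / math.gamma(n + 2)
print(mc, exact)   # the two agree to a few decimal places
```

The same averaging idea works for any prior one can sample from, which is what makes it attractive when the integral has no closed form.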

Inference in the Bayesian Framework?

In principle, the posterior contains everything you want to know.

→ So any inference really is just a descriptive measure of the posterior
→ Any descriptive measure contains less information than the posterior

- Point estimators: posterior mean, mode, median, ...
  → Relate to Bayes decision rules
- Highest Posterior Density (HPD) regions (vs confidence regions)
  → Different interpretation from CRs!
- Hypothesis testing: Bayes factors
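The point estimators listed above are all descriptive measures of one and the same posterior. A sketch for the Beta(3, 9) posterior from the coin-flip figures: mean and mode are closed-form, while the median is approximated by sampling from the posterior (the sample size is an arbitrary choice).

```python
# Point estimates as summaries of a Beta(a, b) posterior.
import random
random.seed(1)

a, b = 3, 9                              # posterior from n=10, y=2, Beta(1,1) prior
post_mean = a / (a + b)                  # = 0.25
post_mode = (a - 1) / (a + b - 2)        # = 0.2 (well-defined since a, b > 1)

# Median via posterior sampling (no closed form needed)
draws = sorted(random.betavariate(a, b) for _ in range(100_000))
post_median = draws[len(draws) // 2]     # ~ 0.235, between mode and mean here

print(post_mean, post_mode, post_median)
```

Each summary arises as the Bayes rule under a different loss (squared error gives the mean, absolute error the median, 0-1 loss the mode), which is the link to Bayes decision rules noted above.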
