ASTRONOMY 630 Numerical and Statistical Methods in Astrophysics
Astr 630 Class Notes -- Spring 2020

ASTRONOMY 630: Numerical and statistical methods in astrophysics
CLASS NOTES, Spring 2020
Instructor: Jon Holtzman

1 Introduction

What is this class about? What is statistics? Read and discuss Feigelson & Babu 1.1.2 and 1.1.3. The use of statistics in astronomy may not be as controversial as it is in other fields, probably because astronomers feel that there is basic physics that underlies the phenomena we observe.

What is data mining and machine learning? Read and discuss Ivezić 1.1 (Astroinformatics).

What are some astronomy applications of statistics and data mining? Classification, parameter estimation, comparison of hypotheses, absolute evaluation of a hypothesis, forecasting, finding substructure in data, finding correlations in data.

Some examples:

• Classification: what is the relative number of different types of planets, stars, galaxies? Can a subset of observed properties of an object be used to classify the object, and how accurately? E.g., emission line ratios.

• Parameter estimation: Given a set of data points with uncertainties, what are the slope and amplitude of a power-law fit? What are the uncertainties in the parameters? Note that this assumes that the power-law description is valid.

• Hypothesis comparison: Is a double power-law better than a single power-law? Note that hypothesis comparisons are trickier when the number of parameters differs, since one must decide whether the fit to the data is sufficiently better given the extra freedom in the more complex model. A simpler comparison would be a single power-law vs. two constant plateaus with a break at a specified location, both with two parameters. But a better fit does not necessarily guarantee a better model!

• Absolute evaluation: Are the data consistent with a power-law? Absolute assessments of this sort can be more problematic than hypothesis comparisons.
• Forecasting of errors: How many more objects, or what reduction of uncertainties, would allow single and double power-law models to be clearly distinguished? This requires specifying goals and assumptions about the data. A common need for observing proposals, grant proposals, satellite proposals, etc.

• Finding substructure: 1) spatial structure; 2) structure in distributions, e.g., separating peaks in radial velocities or metallicity distribution functions (MDFs); 3) given a multi-dimensional data set, e.g., abundance patterns of stars, can you identify homogeneous groups (e.g., chemical tagging)?

• Finding correlations among many variables: given multi-dimensional data, are there correlations between different variables? E.g., the HR diagram, the fundamental plane, abundance patterns.

More examples in Ivezić et al. 1.1.

Note the general explosion of the field in the last decade (e.g., the CCA and other institutes). Familiarity with these topics is critical, and mastery even better.

Clearly, numerical/computational techniques are an important component of astrostatistics. A previous course, ASTR 575, attempted (with poor success?) to combine these....

Initial anticipation (!) of ASTR 630 class topics:

• Intro: usage of data/statistical analysis in astronomy
• Basic probability and statistics. Conditional probabilities. Bayes theorem
• Statistical distributions in astronomy. Point estimators: bias, consistency, efficiency, robustness
• Random number generators. Simulating data.
• Fitting models to data. Parametric vs. non-parametric models. Frequentist vs. Bayesian. Least squares: linear and non-linear. Regularization.
• Bayesian model analysis. Marginalization. MCMC.
• Handling observational data: correlated and uncorrelated errors. Forward modeling.
• Determining uncertainties in estimators/parameters: bootstrap and jackknife.
• Hypothesis testing and comparison
• Non-parametric modeling
• Issues in modeling real data. Forward modeling. Data-driven analysis
• Clustering analysis. Gaussian mixture models.
  Extreme deconvolution. K-means.
• Dimensionality reduction: principal components, t-SNE, etc.
• Deep learning and neural networks
• ? Fourier analysis (probably not)

Logistics: Canvas, class notes, grading TBD (homework, no exams?)

Resources:

• Ivezić et al.: Statistics, Data Mining, and Machine Learning in Astronomy (2014), with Python
• Bailer-Jones: Practical Bayesian Inference (2017), with R
• Feigelson & Babu: Modern Statistical Methods for Astronomy: With R Applications (2012)
• Robinson: Data Analysis for Scientists and Engineers (2016)
• Press et al.: Numerical Recipes (3rd edition, 2007)

Web sites:

• California-Harvard Astrostatistics Collaboration
• Center for Astrostatistics at Penn State: note the annual summer school in Astrostatistics, also the Astronomy and Astroinformatics portal
• International Computing Astronomy Group

2 Probability, basic statistics, and distributions

Why study probability?

• Probability and statistics: in many statistical analyses, we are talking about the probability of different hypotheses being consistent with data, so it is important to understand the basics of probability.

• In many circumstances, we describe variation in data (either intrinsic or experimental) with the probability that we will observe different outcomes.

Different approaches to probability: classical, frequentist, subjective, axiomatic (see here):

• Classical: calculation of the ratio of favourable to total outcomes. Does not involve experiment. Computed by counting n(E) favourable outcomes out of N total possible outcomes, where all outcomes must be equally likely. Cannot handle an infinite number of possible outcomes. Examples: coins, dice, cards.

• Frequentist: perform an experiment n times, testing for the occurrence of event E, and define:

    P(E) = lim_{n→∞} n(E)/n

  In practice we must estimate this from a finite number of trials.
  We postulate that n(E)/n approaches a limit as n goes to infinity. Not applicable to single events! Example: a loaded die.

• Subjective: judgement based on intuition; a combination of rational statistics, personal observations, and, potentially, superstition. Easily biased by non-rational use of data. In real-world situations, often all we have! Example: weather.

• Axiomatic: provides a logically consistent set of axioms to work with probabilities, however they are determined. Define a random experiment H (with nondeterministic outcomes), a sample description space Ω (the set of all outcomes), and an event E (a subset of the sample description space which satisfies certain constraints).

Consider the following axioms of probability theory. For any event E,

    0 ≤ P(E) ≤ 1
    P(E') = 1 - P(E)

where E' is the complement of E (i.e., not E). For mutually exclusive (non-overlapping) events only:

    P(A or B) ≡ P(A ∪ B) = P(A) + P(B)

If the Ei are a set of mutually exclusive and exhaustive events (covering all possibilities), then:

    Σ P(Ei) = 1

These imply, for events that are not mutually exclusive:

    P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

where P(A ∩ B) ≡ P(A and B) ≡ P(A, B) in different notations, used in different books. Note the graphical (Venn diagram) representation.

If events A and B are independent, i.e., one has no effect on the probability of the other, then

    P(A ∩ B) = P(A) P(B)

This serves as a definition of independence. Note that if more than two events are considered (e.g., A, B, C), then mutual independence requires not only that every pair be independent, but also that the joint probability factorize: P(A ∩ B ∩ C) = P(A) P(B) P(C).

Caution: addition of probabilities applies only to mutually exclusive events!

Often, it is convenient to think about the complement of events when approaching problems, and couch the problem in terms of "and" rather than "or". Example: If there is a 20% chance of rain today and a 20% chance of rain tomorrow, what is the probability that it will rain in the next two days?
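The complement trick makes this example a one-liner; a minimal sketch in Python, assuming the two days are independent (the problem statement does not say so, but it is the natural reading):

```python
# Rain example via the complement: P(rain on at least one of two days)
# = 1 - P(no rain today AND no rain tomorrow), assuming independence.
p_today = 0.2
p_tomorrow = 0.2

p_rain = 1.0 - (1.0 - p_today) * (1.0 - p_tomorrow)
print(round(p_rain, 2))  # 0.36
```

Note that naively adding the probabilities (0.2 + 0.2 = 0.4) overcounts, because "rain today" and "rain tomorrow" are not mutually exclusive; the complement form avoids the double-counted overlap automatically.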
Example: we draw 2 cards from a shuffled deck. What is the probability that they are both aces, given that a) we replace the first card, or b) we don't?

Example: consider a set of 9 objects of three different shapes (square, circle, triangle) and three different colors (red, green, blue). What is the probability that we will get a blue object? A triangle? What is the probability that we will get a blue triangle? Are these independent?

Understand and be able to use the basic axioms of probability.

2.1 Conditional probabilities

Sometimes, however, we have the situation where the probability of an event A is related to the probability of another event B, so that if we know something about the other event, it changes the probability of A. This is called a conditional probability, P(A|B), the probability of A given B. If P(A|B) = P(A), then the events are independent.

What are some astronomy examples?

Example: imagine we have squares, circles, and triangles again, but only blue triangles. What is the probability that we will have a triangle if we only look at blue objects? Thinking about this in the classical approach:

    P(blue) = n(blue) / n
    P(blue ∩ triangle) = n(blue triangles) / n
    P(triangle | blue) = n(blue triangles) / n(blue)

Generalizing to the axiomatic representation:

    P(triangle | blue) = P(blue ∩ triangle) / P(blue)

which serves as a definition of conditional probability, P(A|B), the probability of A given B:

    P(A|B) = P(A ∩ B) / P(B)

Note that, by the same reasoning:

    P(B|A) = P(A ∩ B) / P(A)

But, in general, P(A|B) ≠ P(B|A)! Given the definitions,

    P(B|A) P(A) = P(A|B) P(B)

which is known as Bayes theorem. Note that this is a basic result that is not controversial, but some applications of Bayes theorem can be: the Bayesian paradigm extends this to the idea that the quantity A or B can also represent a hypothesis, or model, i.e., one talks about the relative probability of different models.
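The 9-object shape/color example can be worked by direct enumeration, which makes the definitions of conditional probability and independence concrete. A small sketch (one object of each shape/color combination, all equally likely, as in the example above):

```python
from itertools import product

# Enumerate the 9 equally likely objects: every (shape, color) pair.
objects = list(product(["square", "circle", "triangle"],
                       ["red", "green", "blue"]))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return sum(1 for o in objects if event(o)) / len(objects)

p_blue = prob(lambda o: o[1] == "blue")             # 3/9
p_triangle = prob(lambda o: o[0] == "triangle")     # 3/9
p_both = prob(lambda o: o == ("triangle", "blue"))  # 1/9

# Conditional probabilities from the definition P(A|B) = P(A ∩ B) / P(B)
p_triangle_given_blue = p_both / p_blue             # 1/3
p_blue_given_triangle = p_both / p_triangle         # 1/3

# Independence check: P(A ∩ B) = P(A) P(B) (up to floating-point rounding)
print(abs(p_both - p_blue * p_triangle) < 1e-12)    # True: independent
```

Here shape and color happen to be independent, so P(triangle | blue) = P(triangle); in the "only blue triangles" variant of the example, removing the other triangles breaks this independence.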
If we consider a set of k mutually exclusive events Bi that cover all possibilities (e.g., red, green, and blue), then the probability of some other event A is:

    P(A) = P(A ∩ B1) + P(A ∩ B2) + ... + P(A ∩ Bk)
    P(A) = P(A|B1) P(B1) + P(A|B2) P(B2) + ... + P(A|Bk) P(Bk)

Bayes theorem then gives:

    P(Bi|A) = P(A|Bi) P(Bi) / P(A)
            = P(A|Bi) P(Bi) / [P(A|B1) P(B1) + P(A|B2) P(B2) + ... + P(A|Bk) P(Bk)]

This is useful for addressing many probability problems.
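The two formulas above (total probability, then Bayes) translate directly into code. A minimal sketch with hypothetical numbers (not from the notes): three mutually exclusive, exhaustive color classes Bi and an event A = "triangle", where the P(Bi) and P(A|Bi) values below are made up for illustration:

```python
# Hypothetical priors P(B_i) over three exhaustive, mutually exclusive classes
p_B = {"red": 0.5, "green": 0.3, "blue": 0.2}
# Hypothetical likelihoods P(A | B_i) for the event A = "triangle"
p_A_given_B = {"red": 0.1, "green": 0.4, "blue": 0.8}

# Law of total probability: P(A) = sum_i P(A|B_i) P(B_i)
p_A = sum(p_A_given_B[b] * p_B[b] for b in p_B)
# = 0.1*0.5 + 0.4*0.3 + 0.8*0.2 = 0.33

# Bayes theorem: P(B_i | A) = P(A|B_i) P(B_i) / P(A)
p_B_given_A = {b: p_A_given_B[b] * p_B[b] / p_A for b in p_B}

print(round(p_A, 2))                         # 0.33
print(round(sum(p_B_given_A.values()), 2))   # 1.0: posteriors sum to one
```

Note how the denominator P(A) acts purely as a normalization: the posteriors P(Bi|A) are proportional to the numerators P(A|Bi) P(Bi) and are guaranteed to sum to one.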