Introduction

DA-STATS Introduction
Shan E Ahmed Raza
Department of Computer Science, University of Warwick
[email protected]

DA-STATS – Teaching Team
• Shan E Ahmed Raza
• Fayyaz Minhas
• Richard Kirk

About Me
• PhD (Computer Science), University of Warwick, 2011–2014
  Supervisor(s): Professor Nasir Rajpoot and Dr John Clarkson
• Research Fellow, University of Warwick, 2014–2017
• Postdoctoral Fellow, The Institute of Cancer Research, 2017–2019
• Assistant Professor, Computer Science, University of Warwick, 2019
• Research Interests
  Computational Pathology
  Multi-Channel and Multi-Model Image Analysis
  Deep Learning and Pattern Recognition
• warwick.ac.uk/searaza

Course Outline
• Introduction to Descriptive and Predictive Techniques
• Introduction to Python
• Data Visualisation and Reporting Techniques
• Probability and Bayes' Theorem
• Sampling from Univariate Distributions
• Concepts of Multivariate Analysis
• Linear Regression
• Application of Data Analysis Requirements in Work
• Appreciation of Limitations of Traditional Analysis

DA-STATS Topic 01: Introduction to Descriptive and Predictive Analysis

Outline
• Introduction to Descriptive, Predictive and Prescriptive Techniques
• Data and Model Driven Modelling
• Descriptive Analysis
  Scales of measurement and data
  Measures of central tendency, statistical dispersion and shape of distribution
• Correlation Techniques

Books – Lecture Material

Statistical Analysis
• Descriptive analysis – understand the past.
• Predictive analysis – predict the future.
• Prescriptive analysis – recommend an action.

Descriptive Analysis
• Use of reports and visual displays to explain or understand past and current business performance
• Contains statistical summaries of metrics such as sales and revenue
• Intended to provide an outline of trends in current and past performance
What happened?

Predictive Analysis
• Ability to predict future performance
• Detects patterns or relationships in historical data
• Projects these relationships into the future
• Uses domain knowledge to construct a simplified representation
What could happen?

Prescriptive Analysis
• Recommends a choice of action from predictions of future performance
• Optimum decision based on the need to maximize (or minimize) some aspect of performance
• Many different scenarios can be tested until one is found that best meets the optimization criteria
What should be done?

Supermarket

Data Driven Modelling
• Data-driven modeling aims to derive a description of behavior from observations of a system.
• Describes the relationship between input and output
• Also known as descriptive models (note that this is different from descriptive analysis)
• Imitates real behavior
• The more data (observations) used to form the description, the more accurate the description

Model Driven Modelling
• Explains a system’s behavior not just from its inputs but through a representation of the system’s internal structure.
• A real system is simplified into its essential elements (its processes) and the relationships between these elements (its structure)
• The effect of a change to the design of the process can be assessed by changing the structure of the model.
• Generally has far smaller data needs than data-driven models

Pros and Cons
• Data-driven models have built-in error terms
  Errors can be quantified, and confidence levels can be estimated
  A large amount of data is needed to estimate the parameters to fit the model
• Process-driven models are built using mathematical equations
  Errors may be introduced during simplifications
  Real-world observations are used to evaluate the model

Descriptive Statistics
• Usually the first step prior to more complex analysis
• Scales of Measurement
• Using Tables to Organize Data
• Measures of
  Location or central tendency
  Statistical dispersion
  Shape of distribution

Scales of Measurement
• Measurement is the assignment of numbers to attributes, objects, or events according to predetermined rules.
• Four different measurement scales
  Nominal Scales
  Ordinal Scales
  Interval Scales
  Ratio Scales

Nominal Scale
• No quantitative information
• Numbers serve merely to distinguish one type of thing from another type of thing, or one event from another event
• As no quantitative information is being communicated, we are free to exchange one number for any other currently unused number
• For instance, the numbers assigned to the members of a football team do not carry any quantitative value.
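
The point that nominal numbers can be exchanged freely can be sketched in Python. This is only an illustration under stated assumptions: the shirt numbers and the relabelling map below are hypothetical and do not come from the lecture material.

```python
from collections import Counter

# Hypothetical shirt numbers observed across a few matches (nominal labels).
shirt_numbers = [7, 10, 7, 9, 10, 7]

# Exchange each number for a currently unused number; the mapping is arbitrary.
relabel = {7: 21, 9: 34, 10: 55}
relabelled = [relabel[n] for n in shirt_numbers]

original_counts = Counter(shirt_numbers)
relabelled_counts = Counter(relabelled)

# How often each category occurs is unchanged by the relabelling.
same_structure = sorted(original_counts.values()) == sorted(relabelled_counts.values())

# Arithmetic on the labels is not preserved, which is why a mean of
# nominal values carries no meaning.
mean_before = sum(shirt_numbers) / len(shirt_numbers)
mean_after = sum(relabelled) / len(relabelled)

print(same_structure, mean_before, mean_after)
```

Counting how often each label occurs remains valid after relabelling, while the mean changes with the arbitrary choice of numbers.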

Ordinal Scales
• Adds relative position information to the nominal scale
• Reflects a quantitative relationship between the various categories
• Rankings
  Comparatively more or less
  Not how much

Interval Scales
• Set of quantitatively ordered categories
• Intervals between the categories are held constant
• Do not possess a true zero point.
• Different interval scales can be used to measure the same amount

Ratio Scale
• Addition of an absolute zero point to the interval scale
• Zero marks the absence of a quantity

Discrete and Continuous Quantities
• An important feature of measuring variables concerns how many different values can be assigned
• Discrete Variables
  Can take on only a finite number of values
  No meaningful values exist between any two adjacent values
  To find statistical features of sets of discrete data it is permissible to use “in-between” values
• Continuous Variables
  Theoretically have an infinite number of points between any two numbers
  Variables do not have gaps between adjacent numbers

Unorganised Raw Data - Example
Scores from 90 participants who completed a questionnaire measuring their achievement.

Simple Frequency Distribution
• How many participants received each score?
• Frequency is the number of occurrences of a repeating event
• Another example is the frequency of radio waves, defined per unit time: the number of times a wave completes its cycle in a unit of time (per second), $f = \frac{1}{T}$

Grouped Frequency Distribution
• The number of scores that fall into each of several ranges of scores
• Some sets of data cover a wide range of possible scores, which can make the resulting frequency distributions long and cumbersome
• Trades some loss of information for a table that is easy to understand

Cumulative Frequency Distributions
• The sum of the frequencies found at that interval plus all preceding intervals
• Keeps a running tally of all scores up through each given interval

Measures of Central Tendency
• Statistical indices designed to communicate what is the “center” or “middle” of a distribution
• Three measures of central tendency
  Mean
  Median
  Mode

The Mean
• The mean, colloquially referred to as the “average,” is the most frequently used measure of central tendency.

$$\mu = \frac{\Sigma X}{N}$$

  $\mu$ = the symbol for the mean of a population
  $X$ = a score in the distribution
  $N$ = population size
  $\Sigma$ = sum up a set of scores, $\Sigma X = X_1 + X_2 + X_3 + \cdots + X_N$

The Mean
• What is the mean of this population of scores?
5, 8, 10, 11, 12

$$\mu = \frac{5 + 8 + 10 + 11 + 12}{5} = \frac{46}{5} = 9.20$$

The Mean of a Frequency Distribution
• Mean of a frequency distribution

$$\mu = \frac{\Sigma X f}{\Sigma f}$$

  $f$ = frequency with which a score appears

Calculating the mean from a frequency distribution

The Weighted Mean
• Weighted Mean

$$M = \frac{n_1 V_1 + n_2 V_2 + n_3 V_3 + \cdots + n_n V_n}{n_1 + n_2 + n_3 + \cdots + n_n}$$

Weighted Mean - Example
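
The three mean formulas above can be sketched in Python as follows. This is a minimal illustration, not the worked example from the slides: the function names are hypothetical, and apart from the population 5, 8, 10, 11, 12 the frequency table and the group values and sizes are made-up numbers chosen only to exercise the formulas.

```python
def mean(scores):
    """Population mean: sum of the scores X divided by the population size N."""
    return sum(scores) / len(scores)


def frequency_mean(freq):
    """Mean of a frequency distribution: sum(X * f) / sum(f).

    `freq` maps each score X to the frequency f with which it appears.
    """
    total = sum(x * f for x, f in freq.items())
    count = sum(freq.values())
    return total / count


def weighted_mean(values, weights):
    """Weighted mean: sum(n_i * V_i) / sum(n_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(n * v for n, v in zip(weights, values)) / sum(weights)


# Population of scores taken from the text.
population = [5, 8, 10, 11, 12]
mu = mean(population)

# Hypothetical frequency table: score -> number of participants with that score.
hypothetical_freq = {1: 2, 2: 3, 3: 5}
mu_freq = frequency_mean(hypothetical_freq)

# Hypothetical group values V_i and group sizes n_i.
group_values = [70.0, 80.0, 90.0]
group_sizes = [10, 20, 30]
m_weighted = weighted_mean(group_values, group_sizes)

print(mu, mu_freq, m_weighted)
```

The frequency-distribution mean is a special case of the weighted mean in which each score is a value and its frequency is the weight, so `frequency_mean` could equally be written as a call to `weighted_mean`.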
