Statistical Analysis


## statistical_analysis-WORKBOOK

In [1]:

```python
# CSS coloring for the dataframe tables
from IPython.core.display import HTML
css = open('../style/style-table.css').read() + open('../style/style-notebook.css').read()
HTML('<style>{}</style>'.format(css))
```

### Index

- Exploratory Data Analysis
- Data Types
- Estimates of Location
  - mean (exercises)
  - median (exercise)
- Estimates of Variability
  - Standard deviation and related estimates (exercise)
  - Estimates based on percentiles
  - Estimates of skewness and tailedness (exercise)
- Estimates of Distribution
  - Boxplot (exercise)
  - Frequency tables, density plots and histograms (exercises)
- Correlation
- Data and Sampling Distributions
  - Common distributions
    - Binomial distribution (exercises)
    - Poisson distribution (exercise)
    - Exponential distribution (exercise)
    - Uniform distribution
    - Normal distribution
    - Chi-square distribution
  - Sampling Distribution of a Statistic
  - Sampling Distribution of Means (exercise)

file:///C/...ta_science_in_python/01.intrduction_to_python_for_data_science/04.statistical_analysis/statistical_analysis-WORKBOOK.html[18.09.2019 09:04:37]

### Exploratory Data Analysis [[1](#ref1)]

Statistics developed mostly in the past century. Probability theory was developed in the 17th to 19th centuries, building on the work of Thomas Bayes, Pierre-Simon Laplace and Carl Gauss.

source: https://bit.ly/1LEZVtR

Statistics is an applied science concerned with the analysis and modeling of data. Francis Galton, Karl Pearson and Ronald Fisher laid the pillars of modern statistics by introducing key ideas such as experimental design and maximum likelihood estimation. These and many other statistical concepts formed the foundations of modern data science.

The first step in any data science project is exploring the data. Exploratory Data Analysis (EDA) is a new area of statistics established by John W.
Tukey, who called for a reformation of statistics in his paper "The Future of Data Analysis". He proposed a new scientific discipline called data analysis that includes statistical inference as one of its components, alongside engineering and computer science techniques. [[2](#ref2)]

source: https://bit.ly/2SuYDdm

The rapid growth of computing power and data analysis software, together with the availability of more and bigger data, has allowed exploratory data analysis to evolve well beyond its origins. Donoho, in his paper "50 Years of Data Science", traces the roots of data science back to Tukey's work.

### Data Types [[1](#ref1), [3](#ref3)]

DIFFERENT TYPES OF DATA SOURCES

Data manifests itself in many different shapes. Each shape of data may hold much value for your targeted goal, and in some shapes this value is easier to extract than in others. Different shapes of data require different storage solutions and should therefore be dealt with in different ways. Three shapes of data will be distinguished:

1. Unstructured Data

Unstructured data is the raw form of data. It can be any type of file, for example texts, pictures, sounds, sensor measurements or videos. This data is often stored in a repository of files. Extracting value out of this shape of data is often the hardest, since you first need to extract structured features from the data that describe or abstract it. Experts estimate that 80 to 90 percent of the data in any organization is unstructured.
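As a tiny illustration of that feature-extraction step, structured features can be pulled out of a raw text with nothing more than token counting. This is only a sketch; the snippet and the chosen features are illustrative assumptions, not part of the workbook's data:

```python
from collections import Counter

# A raw, unstructured text snippet (illustrative)
text = "the quick brown fox jumps over the lazy dog the fox"

# Derive structured features that describe or abstract the text
tokens = text.split()
features = {
    "n_tokens": len(tokens),                           # document length
    "n_unique": len(set(tokens)),                      # vocabulary size
    "top_token": Counter(tokens).most_common(1)[0][0]  # dominant term
}
print(features)  # {'n_tokens': 11, 'n_unique': 8, 'top_token': 'the'}
```

Real pipelines use richer features (tf-idf vectors, embeddings), but the principle is the same: the unstructured file is reduced to a fixed set of well-defined columns.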
The amount of unstructured data in enterprises is growing significantly, often many times faster than structured databases are growing. [[4](#ref4)]

source: https://bit.ly/2GmXK1F

2. Structured Data

Structured data is tabular data (rows and columns) that is very well defined, meaning that we know which columns there are and what kind of data they contain.

source: https://bit.ly/2rdzwi9

The two basic types of structured data are: [[1](#ref1)]

1. Numeric, which comes in two forms:

   A. Continuous: data that can take any real value in an interval (∈ ℝ), such as speed or time duration.
      synonyms: interval, float, numeric

   B. Discrete: data that takes only integer values (∈ ℤ), such as the count of event occurrences.
      synonyms: integer, count

2. Categorical: data that takes a fixed set of values (such as car, bike or truck).
   synonyms: enums, enumerated, factors, nominal, polychotomous

   Two special cases of categorical data:

   A. Binary data: data that can take one of only two possible values, such as 0/1, true/false, cat/dog, ...
      synonyms: dichotomous, logical, indicator, boolean

   B. Ordinal data: data where the order of the values matters, for example very cold, cold, warm, hot.
      synonyms: ordered factor

Knowing the data type is important when exploring your data, since it determines the appropriate type of visual display, data analysis, and even the statistical/training model.

The typical frame of data used in data science is rectangular data objects, like spreadsheets and pandas dataframes.
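The type distinctions above map directly onto pandas dtypes. A minimal sketch (the column names and values are illustrative assumptions, not the workbook's data):

```python
import pandas as pd

# One column per structured data type discussed above
frame = pd.DataFrame({
    "speed":   [12.5, 33.1, 25.0],                        # numeric, continuous (float)
    "count":   [1, 4, 2],                                 # numeric, discrete (integer)
    "vehicle": pd.Categorical(["car", "bike", "truck"]),  # categorical
    "is_new":  [True, False, True],                       # binary (boolean)
})

# Ordinal data: a categorical whose categories carry an order
frame["temp"] = pd.Categorical(
    ["cold", "hot", "warm"],
    categories=["very cold", "cold", "warm", "hot"],
    ordered=True,
)

print(frame.dtypes)
# The ordering makes comparisons meaningful:
print((frame["temp"] > "cold").tolist())  # [False, True, True]
```

Declaring the right dtype up front is what later lets plotting and modeling code pick sensible defaults for each column.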
It is essentially a two-dimensional table/matrix, where:

- the rows represent the observation samples
  also called: cases, examples, instances, records
- the columns represent the features
  also called: attributes, inputs, predictors, variables

The dataset that you need to analyse does not always come in rectangular form (for instance unstructured data, like text), but it has to be processed into rectangular form.

There are other important data structures used in data science besides rectangular data: the nonrectangular data structures. These can come in the following forms:

1. Time series: a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time; it is thus a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, or the measurements of a sensor over time. [[5](#ref5)]

source: https://bit.ly/2SX7ghq

2. Spatial data structures: data structures used in mapping and location analytics, optimized for storing and querying data that represents objects defined in a geometric space. Most spatial databases allow the representation of simple geometric objects such as points, lines and polygons. [[6](#ref6)] The structure of this data is more complex than rectangular data. Possible representations of this data structure are: [[1](#ref1)]

   - Object representation: the focus of the data is an object (e.g. a house) and its spatial coordinates.
   - Field view representation: focuses on small units of space and the value of a relevant metric (for example, pixel brightness).

source: https://bit.ly/2YfljVn

3.
Graph (or network) data structures: used to represent physical, social, and abstract relationships, for example a graph of a social network. Graph structures are useful for certain types of problems, such as network optimization and recommender systems.

source: https://bit.ly/2KeXXW1

### Estimates of Location [[1](#ref1)]

Also called "central tendency" measures, these are the basic first step in exploring numeric data: estimating where most of the data is located. The basic estimators of location are the mean and the median.

Mean

Also called the average or mathematical expectation, the mean is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values. For a dataset with n values, x = {x1, x2, ..., xn}, the mean is defined as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

In statistics, when representing the count of a population/sample, a capital letter (e.g. N) refers to the population, while a lower-case letter (e.g. n) refers to a sample from the population.

The mean is an estimate that is sensitive to outliers, but the effect of an outlier decreases as the number of samples (n) increases.

Exercise

1. From the sub_set dataset below:
   A. Plot the points.
   B. Compute the mean of tips and plot that mean as a horizontal line.

In [118]:

```python
import numpy as np
import seaborn as sns; sns.set()
import matplotlib.pyplot as plt
from pandas import DataFrame as df
```

In [119]:

```python
dataset = sns.load_dataset("tips")
sub_set = dataset.loc[:, ['total_bill', 'tip']]
sub_set.head()
### YOUR CODE HERE
#
#
###
```

Out[119]:

|   | total_bill | tip  |
|---|------------|------|
| 0 | 16.99      | 1.01 |
| 1 | 10.34      | 1.66 |
| 2 | 21.01      | 3.50 |
| 3 | 23.68      | 3.31 |
| 4 | 24.59      | 3.61 |

2.
Create sub_set_with_outlier by taking the first 15 elements of sub_set, then:
   A. Plot the points.
   B. Compute the mean of tips and plot that mean as a horizontal line.

In [121]:

```python
### YOUR CODE HERE
#
#
###
```

Exercise

Repeat the previous exercise, but using sub_set_with_outlier.
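The exercises above are left open on purpose. As a separate, hedged illustration of the outlier sensitivity mentioned earlier (synthetic tip values, not the workbook's intended solution):

```python
import numpy as np

tips = np.array([1.01, 1.66, 3.50, 3.31, 3.61])
tips_with_outlier = np.append(tips, 50.0)  # one extreme tip

print(np.mean(tips))                 # about 2.618
print(np.mean(tips_with_outlier))    # about 10.515 -- the outlier drags the mean up
print(np.median(tips_with_outlier))  # about 3.405  -- the median barely moves

# The outlier's pull on the mean shrinks as n grows
many_tips = np.append(np.full(500, 2.6), 50.0)
print(np.mean(many_tips))            # about 2.69, close to 2.6 despite the outlier
```

This is exactly why the median is called a robust estimate of location, and why the mean becomes safer to use on large samples.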