Statistical-Methods.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
8 Statistical Methods Raghu Nandan Sengupta and Debasis Kundu CONTENTS 8.1 Introduction .................................................... 414 8.2 Basic Concepts of Data Analysis . .................................. 415 8.3 Probability . .................................................... 416 8.3.1 Sample Space and Events . .................................. 416 8.3.2 Axioms, Interpretations, and Properties of Probability . ............. 418 8.3.3 Borel σ-Field, Random Variables, and Convergence .................... 418 8.3.4 SomeImportantResults....................................... 419 8.4 Estimation..................................................... 421 8.4.1 Introduction . ......................................... 421 8.4.2 Desirable Properties .......................................... 421 8.4.3 Methods of Estimation . .................................. 422 8.4.4 MethodofMomentEstimators.................................. 426 8.4.5 Bayes Estimators . ......................................... 428 8.5 Linear and Nonlinear Regression Analysis . ........................... 430 8.5.1 Linear Regression Analysis . .................................. 432 8.5.1.1 Bayesian Inference . .................................. 434 8.5.2 Nonlinear Regression Analysis .................................. 436 8.6 Introduction to Multivariate Analysis . .................................. 437 8.7 JointandMarginalDistribution....................................... 442 8.8 MultinomialDistribution........................................... 445 8.9 MultivariateNormalDistribution...................................... 448 8.10 Multivariate Student t-Distribution..................................... 450 8.11WishartDistribution............................................... 453 8.12MultivariateExtremeValueDistribution................................. 454 8.13MLEEstimatesofParameters(RelatedtoMNDOnly)....................... 458 8.14 Copula Theory . ................................................ 459 8.15 Principal Component Analysis . .................................. 461 8.16 Factor Analysis . ................................................ 464 8.16.1 Mathematical Formulation of Factor Analysis . .................... 464 8.16.2 Estimation in Factor Analysis . .................................. 466 8.16.3 Principal Component Method . .................................. 467 8.16.4 Maximum Likelihood Method . .................................. 468 8.16.5 General Working Principle for FA . ........................... 468 8.17 Multiple Analysis of Variance and Multiple Analysis of Covariance . ............. 471 8.17.1 Introduction to Analysis of Variance . ........................... 471 8.17.2 Multiple Analysis of Variance . .................................. 472 8.18ConjointAnalysis................................................ 475 413 414 Decision Sciences 8.19 Canonical Correlation Analysis . .................................. 476 8.19.1 Formulation of Canonical Correlation Analysis . .................... 477 8.19.2 Standardized Form of CCA . .................................. 478 8.19.3 Correlation between Canonical Variates and Their Component Variables ...... 479 8.19.4TestingtheTestStatisticsinCCA ................................ 480 8.19.5 Geometric and Graphical Interpretation of CCA . .................... 485 8.19.6 Conclusions about CCA . .................................. 485 8.20ClusterAnalysis................................................. 486 8.20.1ClusteringAlgorithms........................................ 488 8.21 Multiple Discriminant and Classification Analysis . .................... 493 8.22 Multidimensional Scaling . ......................................... 497 8.23StructuralEquationModeling........................................ 498 8.24FutureAreasofResearch........................................... 501 References . .................................................... 502 ABSTRACT The chapter of Statistical Methods starts with the basic concepts of data analysis and then leads into the concepts of probability, important properties of probability, limit theorems, and inequalities. The chapter also covers the basic tenets of estimation, desirable properties of esti- mates, before going on to the topic of maximum likelihood estimation, general methods of moments, Baye’s estimation principle. Under linear and nonlinear regression different concepts of regressions are discussed. After which we discuss few important multivariate distributions and devote some time on copula theory also. In the later part of the chapter, emphasis is laid on both the theoretical content as well as the practical applications of a variety of multivariate techniques like Principle Component Analysis (PCA), Factor Analysis, Analysis of Variance (ANOVA), Multivariate Analy- sis of Variance (MANOVA), Conjoint Analysis, Canonical Correlation, Cluster Analysis, Multiple Discriminant Analysis, Multidimensional Scaling, Structural Equation Modeling, etc. Finally, the chapter ends with a good repertoire of information related to softwares, data sets, journals, etc., related to the topics covered in this chapter. 8.1 Introduction Downloaded by [Debasis Kundu] at 16:48 25 January 2017 Many people are familiar with the term statistics. It denotes recording of numerical facts and figures, for example, the daily prices of selected stocks on a stock exchange, the annual employment and unemployment of a country, the daily rainfall in the monsoon season, etc. However, statistics deals with situations in which the occurrence of some events cannot be predicted with certainty. It also provides methods for organizing and summarizing facts and for using information to draw various conclusions. Historically, the word statistics is derived from the Latin word status meaning state. For several decades, statistics was associated solely with the display of facts and figures pertaining to eco- nomic, demographic, and political situations prevailing in a country. As a subject, statistics now encompasses concepts and methods that are of far-reaching importance in all enquires/questions that involve planning or designing of the experiment, gathering of data by a process of experimen- tation or observation, and finally making inference or conclusions by analyzing such data, which eventually helps in making the future decision. Fact finding through the collection of data is not confined to professional researchers. It is a part of the everyday life of all people who strive, consciously or unconsciously, to know matters of interest concerning society, living conditions, the environment, and the world at large. Sources Statistical Methods 415 of factual information range from individual experience to reports in the news media, government records, and articles published in professional journals. Weather forecasts, market reports, costs of living indexes, and the results of public opinion are some other examples. Statistical methods are employed extensively in the production of such reports. Reports that are based on sound statistical reasoning and careful interpretation of conclusions are truly informative. However, the deliberate or inadvertent misuse of statistics leads to erroneous conclusions and distortions of truths. 8.2 Basic Concepts of Data Analysis In order to clarify the preceding generalities, a few examples are provided: Socioeconomic surveys: In the interdisciplinary areas of sociology, economics, and political science, such aspects are taken as the economic well-being of different ethnic groups, consumer expenditure patterns of different income levels, and attitudes toward pending legislation. Such studies are typically based on data oriented by interviewing or contacting a representative sample of person selected by statistical process from a large population that forms the domain of study. The data are then analyzed and interpretations of the issue in questions are made. See, for example, a recent monograph by Bandyopadhyay et al. (2011) on this topic. Clinical diagnosis: Early detection is of paramount importance for the successful surgical treatment of many types of fatal diseases, say, for example, cancer or AIDS. Because frequent in-hospital checkups are expensive or inconvenient, doctors are searching for effective diagnosis process that patients can administer themselves. To determine the mer- its of a new process in terms of its rates of success in detecting true cases avoiding false detection, the process must be field tested on a large number of persons, who must then undergo in-hospital diagnostic test for comparison. Therefore, proper planning (designing the experiments) and data collection are required, which then need to be analyzed for final conclusions. An extensive survey of the different statistical methods used in clinical trial design can be found in Chen et al. (2015). Plant breeding: Experiments involving the cross fertilization of different genetic types of Downloaded by [Debasis Kundu] at 16:48 25 January 2017 plant species to produce high-yielding hybrids are of considerable interest to agricultural scientists. As a simple example, suppose that the yield of two hybrid varieties are to be compared under specific climatic conditions. The only way to learn about the relative performance of these two varieties is to grow them at a number of sites, collect data on their yield, and then analyze the data. Interested readers may refer to the edited volume by Kempton and Fox (2012) for further reading on this particular topic. In recent years, attempts have