Jorge Fallas

Total Page:16

File Type:pdf, Size:1020Kb

Jorge Fallas 1 XLStatistics ANÁLISIS ESTADÍSTICO CON EXCEL La información independientemente de lo costosa que haya sido crearla, puede ser replicada y compartida a un costo mínimo o nulo. -- Thomas Jefferson Jorge Fallas 2010 i Contenido ¿Qué es estadística? ...................................................................................................................... 1 Exactitud y precisión ....................................................................................................................... 1 Variables: medición y clasificación .................................................................................................. 2 Variables cualitativas ................................................................................................................ 2 Variables cuantitativas ............................................................................................................. 3 El proceso de investigación ............................................................................................................. 4 Terminología ................................................................................................................................... 5 Prueba de hipótesis: Error tipo I y II ............................................................................................... 12 Sugerencias para el análisis de datos ........................................................................................... 13 Estadística: Software gratuito ........................................................................................................ 13 ¿Qué es XLSTatistics? .................................................................................................................. 16 Instalación ..................................................................................................................................... 17 Desinstalación ............................................................................................................................... 17 XLStatistics: interfaz grafica y funciones ....................................................................................... 17 Otros complementos gratuitos para análisis estadístico en Excel .................................................. 23 Complementos comerciales para Excel ......................................................................................... 23 Programas gratuitos y en línea para análisis estadístico ............................................................... 24 Sitios de interés ............................................................................................................................. 25 Análisis de una variable numérica (variable cuantitativa discreta ó continua) ................................ 28 Data and Description: Datos y su descripción ........................................................................ 29 Summaries: Síntesis de datos (Tabla de frecuencia y gráficos).............................................. 30 Tests: Pruebas estadísticas ................................................................................................... 31 Prueba “t” para una muestra (requisitos muestra independiente y normalidad) ...................... 31 Análisis de Residuos .............................................................................................................. 31 Prueba de Normalidad ........................................................................................................... 33 Análisis de poder y determinación de tamaño de muestra ...................................................... 33 Análisis de error marginal para IC de la media ....................................................................... 34 Pruebas no paramétricas ....................................................................................................... 35 Prueba de signos (prueba de mediana) .................................................................................. 35 Prueba de Chi-2 para la varianza ........................................................................................... 35 Intervalos de tolerancia y de predicción .................................................................................. 36 Herramientas adicionales ....................................................................................................... 36 Datos agrupados 1NumGD.xls ............................................................................................... 36 Gráfico de probabilidad normal ............................................................................................... 37 Análisis de una variable cualitativa (nominal-ordinal) .................................................................... 38 Data and Description: Datos y su descripción ........................................................................ 38 ii Summaries: Tabla de frecuencia y gráficos ............................................................................ 39 Tests: Intervalo de confianza y prueba de hipótesis para proporciones (N grande) ................ 39 Análisis de poder y determinación de tamaño de muestra ...................................................... 40 Análisis de error marginal para IC de proporciones ................................................................ 40 Intervalo de confianza y prueba de hipótesis para proporciones (muestras pequeñas) .......... 40 Prueba de Bondad de ajuste (ji-cuadrado) ............................................................................. 41 Prueba de corridas o de Wald–Wolfowitz ............................................................................... 41 Análisis de dos variables numéricas (continuas ó discretas): Correlación y regresión simple ........ 43 Conceptos básicos sobre regresión lineal simple ................................................................... 44 Data and Description: Datos y su descripción ........................................................................ 45 Análisis de correlación y regresión lineal ................................................................................ 45 Prueba de hipótesis (intercepto, pendiente) ........................................................................... 45 Análisis de varianza: significancia del modelo ........................................................................ 46 Análisis de residuos: Evaluación de los supuestos del modelo............................................... 46 Intervalo de predicción y banda de predicción ........................................................................ 50 Herramientas adicionales ....................................................................................................... 50 Ajuste de funciones ................................................................................................................ 51 Transformaciones lineales ...................................................................................................... 51 Ecuación polinomial ............................................................................................................... 51 Regresión no lineal ................................................................................................................. 51 Cambio de pendiente (tendencia) en el set de datos .............................................................. 52 Media móvil ............................................................................................................................ 52 Media con barras ó bandas de error ....................................................................................... 53 Regresión lineal localmente ponderada .................................................................................. 53 Estadísticos descriptivos y gráficos para la variable predictora (X) y la dependiente (Y) ........ 55 Análisis de correlación no paramétrico ................................................................................... 56 Series de tiempo .................................................................................................................... 56 Ejercicio ................................................................................................................................. 57 Modelos matemáticos para datos cuantitativos bivariados ..................................................... 57 Precauciones al realizar un análisis de correlación-regresión ................................................ 57 Ejemplo de análisis de regresión utilizando software en línea ................................................ 59 Análisis de una variable numérica y otra nominal (1Num1Cat) ...................................................... 65 Datos y estadísticos descriptivos ............................................................................................ 65 Grafico de frecuencia comparativo por categoría ................................................................... 66 Grafico de Columna ............................................................................................................... 67 Grafico de medias con barra de error ..................................................................................... 67 Grafico de medias con mas categorías .................................................................................. 68 Análisis de varianza paramétrico de 1 vía .............................................................................. 68 iii Supuestos del
Recommended publications
  • Correspondence Analysis
    NCSS Statistical Software NCSS.com Chapter 430 Correspondence Analysis Introduction Correspondence analysis (CA) is a technique for graphically displaying a two-way table by calculating coordinates representing its rows and columns. These coordinates are analogous to factors in a principal components analysis (used for continuous data), except that they partition the Chi-square value used in testing independence instead of the total variance. For those of you who are new to CA, we suggest that you obtain Greenacre (1993). This is an excellent introduction to the subject, is very readable, and is suitable for self-study. If you want to understand the technique in detail, you should obtain this (paperback) book. Discussion We will explain CA using the following example. Suppose an aptitude survey consisting of eight yes or no questions is given to a group of tenth graders. The instructions on the survey allow the students to answer only those questions that they want to. The results of the survey are tabulated as follows. Aptitude Survey Results – Counts Question Yes No Total Q1 155 938 1093 Q2 19 63 82 Q3 395 542 937 Q4 61 64 125 Q5 1336 876 2212 Q6 22 14 36 Q7 864 354 1218 Q8 920 185 1105 Total 3772 3036 6808 Take a few moments to study this table and see what you can discover. The most obvious pattern is that many of the students did not answer all the questions. This makes response patterns between rows difficult to analyze. To solve this problem of differential response rates, we create a table of row percents (or row profiles as they are called in CA).
    [Show full text]
  • A Correspondence Analysis of Child-Care Students' and Medical
    International Education Journal Vol 5, No 2, 2004 http://iej.cjb.net 176 A Correspondence Analysis of Child-Care Students’ and Medical Students’ Knowledge about Teaching and Learning Helen Askell-Williams Flinders University School of Education [email protected] Michael J. Lawson Flinders University School of Education [email protected] This paper describes the application of correspondence analysis to transcripts gathered from focussed interviews about teaching and learning held with a small sample of child-care students, medical students and the students’ teachers. Seven dimensions emerged from the analysis, suggesting that the knowledge that underlies students’ learning intentions and actions is multi-dimensional and transactive. It is proposed that the multivariate, multidimensional, discovery approach of the correspondence analysis technique has considerable potential for data analysis in the social sciences. Teaching, learning, knowledge, correspondence analysis INTRODUCTION The purpose of this paper is to describe the application of correspondence analysis to rich text- based data derived from interviews with teachers and learners about their knowledge about teaching and learning. Correspondence analysis is a non-linear, multidimensional technique of multivariate descriptive analysis that “specialises in ‘discovering,’ through detailed analysis of a given data set” (Nishisato, 1994 p.7). A description of what teachers and learners know about teaching and learning will assist in developing the educational community’s understanding about teaching and learning. If researchers, designers and policy makers are well informed about teachers’ and learners’ knowledge, they will be better equipped to design and recommend educational programs that meet students’ learning needs. If teachers possess high quality knowledge about their own, and their students’, knowledge then they will be better equipped to design and deliver high quality teaching.
    [Show full text]
  • Estimating Complex Production Functions: the Importance of Starting Values
    Version: January 26 Estimating complex production functions: The importance of starting values Mark Neal Presented at the 51st Australian Agricultural and Resource Economics Society Conference, Queenstown, New Zealand, 14-16 February, 2007. Postdoctoral Research Fellow Risk and Sustainable Management Group University of Queensland Room 519, School of Economics, Colin Clark Building (39) University of Queensland, St Lucia, QLD, 4072 Email: [email protected] Ph: +61 (0) 7 3365 6601 Fax: +61 (0) 7 3365 7299 http://www.uq.edu.au/rsmg Acknowledgements Chris O’Donnell helpfully provided access to Gauss code that he had written for estimation of latent class models so it could be translated into Shazam. Leighton Brough (UQ), Ariel Liebman (UQ) and Tom Pechey (UMelb) helped by organising a workshop with Nimrod/enFuzion at UQ in June 2006. enFuzion (via Rok Sosic) provided a limited license for use in the project. Leighton Brough, tools coordinator for the ARC Centre for Complex Systems (ACCS), assisted greatly in setting up an enFuzion grid and collating the large volume of results. Son Nghiem provided assistance in collating and analysing data as well as in creating some of the figures. Page 1 of 30 Version: January 26 ABSTRACT Production functions that take into account uncertainty can be empirically estimated by taking a state contingent view of the world. Where there is no a priori information to allocate data amongst a small number of states, the estimation may be carried out with finite mixtures model. The complexity of the estimation almost guarantees a large number of local maxima for the likelihood function.
    [Show full text]
  • Apple / Shazam Merger Procedure Regulation (Ec)
    EUROPEAN COMMISSION DG Competition CASE M.8788 – APPLE / SHAZAM (Only the English text is authentic) MERGER PROCEDURE REGULATION (EC) 139/2004 Article 8(1) Regulation (EC) 139/2004 Date: 06/09/2018 This text is made available for information purposes only. A summary of this decision is published in all EU languages in the Official Journal of the European Union. Parts of this text have been edited to ensure that confidential information is not disclosed; those parts are enclosed in square brackets. EUROPEAN COMMISSION Brussels, 6.9.2018 C(2018) 5748 final COMMISSION DECISION of 6.9.2018 declaring a concentration to be compatible with the internal market and the EEA Agreement (Case M.8788 – Apple/Shazam) (Only the English version is authentic) TABLE OF CONTENTS 1. Introduction .................................................................................................................. 6 2. The Parties and the Transaction ................................................................................... 6 3. Jurisdiction of the Commission .................................................................................... 7 4. The procedure ............................................................................................................... 8 5. The investigation .......................................................................................................... 8 6. Overview of the digital music industry ........................................................................ 9 6.1. The digital music distribution value
    [Show full text]
  • Statistics and GIS Assistance Help with Statistics
    Statistics and GIS assistance An arrangement for help and advice with regard to statistics and GIS is now in operation, principally for Master’s students. How do you seek advice? 1. The users, i.e. students at INA, make direct contact with the person whom they think can help and arrange a time for consultation. Remember to be well prepared! 2. Doctoral students and postdocs register the time used in Agresso (if you have questions about this contact Gunnar Jensen). Help with statistics Research scientist Even Bergseng Discipline: Forest economy, forest policies, forest models Statistical expertise: Regression analysis, models with random and fixed effects, controlled/truncated data, some time series modelling, parametric and non-parametric effectiveness analyses Software: Stata, Excel Postdoc. Ole Martin Bollandsås Discipline: Forest production, forest inventory Statistics expertise: Regression analysis, sampling Software: SAS, R Associate Professor Sjur Baardsen Discipline: Econometric analysis of markets in the forest sector Statistical expertise: General, although somewhat “rusty”, expertise in many econometric topics (all-rounder) Software: Shazam, Frontier Associate Professor Terje Gobakken Discipline: GIS og long-term predictions Statistical expertise: Regression analysis, ANOVA and PLS regression Software: SAS, R Ph.D. Student Espen Halvorsen Discipline: Forest economy, forest management planning Statistical expertise: OLS, GLS, hypothesis testing, autocorrelation, ANOVA, categorical data, GLM, ANOVA Software: (partly) Shazam, Minitab og JMP Ph.D. Student Jan Vidar Haukeland Discipline: Nature based tourism Statistical expertise: Regression and factor analysis Software: SPSS Associate Professor Olav Høibø Discipline: Wood technology Statistical expertise: Planning of experiments, regression analysis (linear and non-linear), ANOVA, random and non-random effects, categorical data, multivariate analysis Software: R, JMP, Unscrambler, some SAS Ph.D.
    [Show full text]
  • Bab 1 Pendahuluan
    BAB 1 PENDAHULUAN Bab ini akan membahas pengertian dasar statistik dengan sub-sub pokok bahasan sebagai berikut : Sub Bab Pokok Bahasan A. Sejarah dan Perkembangan Statistik B. Tokoh-tokoh Kontributor Statistika C. Definisi dan Konsep Statistik Modern D. Kegunaan Statistik E. Pembagian Statistik F. Statistik dan Komputer G. Soal Latihan A. Sejarah dan Perkembangan Statistik Penggunaan istilah statistika berakar dari istilah-istilah dalam bahasa latin modern statisticum collegium (“dewan negara”) dan bahasa Italia statista (“negarawan” atau “politikus”). Istilah statistik pertama kali digunakan oleh Gottfried Achenwall (1719-1772), seorang guru besar dari Universitas Marlborough dan Gottingen. Gottfried Achenwall (1749) menggunakan Statistik dalam bahasa Jerman untuk pertama kalinya sebagai nama bagi kegiatan analisis data kenegaraan, dengan mengartikannya sebagai “ilmu tentang negara/state”. Pada awal abad ke- 19 telah terjadi pergeseran arti menjadi “ilmu mengenai pengumpulan dan klasifikasi data”. Sir John Sinclair memperkenalkan nama dan pengertian statistics ini ke dalam bahasa Inggris. E.A.W. Zimmerman mengenalkan kata statistics ke negeri Inggris. Kata statistics dipopulerkan di Inggris oleh Sir John Sinclair dalam karyanya: Statistical Account of Scotland 1791-1799. Namun demikian, jauh sebelum abad XVIII masyarakat telah mencatat dan menggunakan data untuk keperluan mereka. Pada awalnya statistika hanya mengurus data yang dipakai lembaga- lembaga administratif dan pemerintahan. Pengumpulan data terus berlanjut, khususnya melalui sensus yang dilakukan secara teratur untuk memberi informasi kependudukan yang selalu berubah. Dalam bidang pemerintahan, statistik telah digunakan seiring dengan perjalanan sejarah sejak jaman dahulu. Kitab perjanjian lama (old testament) mencatat adanya kegiatan sensus penduduk. Pemerintah kuno Babilonia, Mesir, dan Roma mengumpulkan data lengkap tentang penduduk dan kekayaan alam yang dimilikinya.
    [Show full text]
  • 449, 468 Across-Stage Inferencing 568, 57
    xxxv Index 6 Cs 1254, 1258 Apriori-based Graph Mining (AGM) 13 6 Ps 1253 Apriori with Subgroup and Constraint (ASC) 948 Arabidopsis thaliana 205, 227 A ArM32 operations 458 ArnetMiner 331 abnormal alarm 2183 Arterial Pressure (AP) 902 academic theory 1678 ArTex with Resolutions (ARes) 1047 Access Methods (AM) 449, 468 arthrokinetics restrictions 653 across-stage inferencing 568, 575-576, 580, 583 Artificial Intelligence (AI) 991, 1262 Active Learning (AL) 67, 69 Artificial Neural Networks (ANN) 250, 299, 367, 704, ADaMSoft 1315 1114, 1573, 2253 Adaptive Modulation and Coding (AMC) 345 Art of War 477 Adaptive Neuro Fuzzy Inference System (ANFIS) Asset Management (AM) 1607 1114-1115, 1124, 2250, 2252, 2254-2255, 2268 Association Rule Mining (ARM) 28, 47, 108, 148, Additive White Gaussian Noise (AWGN) 2084 684, 856, 974, 989, 1740 advanced search 1314, 1317, 1338 ANGEL data 846-847 Adverse Drug Events (ADEs) 1942 approaches 1739-1740, 1743, 1748-1749 Agglomerative Hierarchical Clustering (AHC) 1435 association rules 57, 65, 587 aggregate constraint 1842 calendric 594 aggregate function 681-682, 686, 1434, 1842, 2028- maximal 503-509, 512-514 2029 quality measures 130 agrometeorological agents 1625 Association rules for Textures (ArTex) 1047 agrometeorological methods 1625-1626 Associative Classification (AC) 975 agrometeorological variables 1626 atomic condition 285-287 ambiguity-directed sampling 71 Attention Deficit Hyperactivity Disorder (ADHD) American Bureau of Shipping (ABS) 2175 1463 American National Election Studies (ANES) 30, 42
    [Show full text]
  • Research Article the Use of Multiple Correspondence Analysis to Explore Associations Between Categories of Qualitative Variables in Healthy Ageing
    Hindawi Publishing Corporation Journal of Aging Research Volume 2013, Article ID 302163, 12 pages http://dx.doi.org/10.1155/2013/302163 Research Article The Use of Multiple Correspondence Analysis to Explore Associations between Categories of Qualitative Variables in Healthy Ageing Patrício Soares Costa,1,2 Nadine Correia Santos,1,2 Pedro Cunha,1,2,3 Jorge Cotter,1,2,3 and Nuno Sousa1,2 1 Life and Health Sciences Research Institute (ICVS), School of Health Sciences, University of Minho, 4710-057 Braga, Portugal 2 ICVS/3B’s, PT Government Associate Laboratory, Guimaraes,˜ 4710-057 Braga, Portugal 3 Centro Hospital Alto Ave, EPE, 4810-055 Guimaraes,˜ Portugal Correspondence should be addressed to Nuno Sousa; [email protected] Received 30 June 2013; Revised 23 August 2013; Accepted 30 August 2013 Academic Editor: F. Richard Ferraro Copyright © 2013 Patr´ıcio Soares Costa et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The main focus of this study was to illustrate the applicability of multiple correspondence analysis (MCA) in detecting and representing underlying structures in large datasets used to investigate cognitive ageing. Principal component analysis (PCA) was used to obtain main cognitive dimensions, and MCA was used to detect and explore relationships between cognitive, clinical, physical, and lifestyle variables. Two PCA dimensions were identified (general cognition/executive function and memory), and two MCA dimensions were retained. Poorer cognitive performance was associated with older age, less school years, unhealthier lifestyle indicators, and presence of pathology.
    [Show full text]
  • Stock Market Prediction Using Ensemble of Graph Theory, Machine Learning and Deep Learning Models
    San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 5-20-2019 STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS Pratik Patil San Jose State University Follow this and additional works at: https://scholarworks.sjsu.edu/etd_projects Part of the Artificial Intelligence and Robotics Commons, and the Other Computer Sciences Commons Recommended Citation Patil, Pratik, "STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS" (2019). Master's Projects. 692. DOI: https://doi.org/10.31979/etd.38nc-j52r https://scholarworks.sjsu.edu/etd_projects/692 This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact [email protected]. STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS A Project Report Presented to Dr. Ching seh Wu Department of Computer Science San José State University In Partial Fulfillment Of the Requirements for the Class CS 298 By Pratik Patil May 2019 © 2019 Pratik Patil ALL RIGHTS RESERVED The Designated Thesis Committee Approves the Thesis Titled STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS by Pratik Patil APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE SAN JOSÉ STATE UNIVERSITY May 2019 Dr. Ching seh Wu Department of Computer Science Dr. Katerina Potika Department of Computer Science Dr. Marjan Orang Department of Economics ACKNOWLEDGEMENT This has been one long and arduous journey, but nevertheless a worthwhile life experience because of the many great Professors at SJSU and beloved friends.
    [Show full text]
  • 1 Reliability of Programming Software: Comparison of SHAZAM and SAS
    Reliability of Programming Software: Comparison of SHAZAM and SAS. Oluwarotimi Odeh Department of Agricultural Economics, Kansas State University, Manhattan, KS 66506-4011 Phone: (785)-532-4438 Fax: (785)-523-6925 Email: [email protected] Allen M. Featherstone Department of Agricultural Economics, Kansas State University, Manhattan, KS 66506-4011 Phone: (785)-532-4441 Fax: (785)-523-6925 Email: [email protected] Selected Paper for Presentation at the Western Agricultural Economics Association Annual Meeting, Honolulu, HI, June 30-July 2, 2004 Copyright 2004 by Odeh and Featherstone. All rights reserved. Readers may make verbatim copies for commercial purposes by any means, provided that this copyright notice appears on all such copies. 1 Reliability of Programming Software: Comparison of SHAZAM and SAS. Introduction The ability to combine quantitative methods, econometric techniques, theory and data to analyze societal problems has become one of the major strengths of agricultural economics. The inability of agricultural economists to perform this task perfectly in some cases has been linked to the fragility of econometric results (Learner, 1983; Tomek, 1993). While small changes in model specification may result in considerable impact and changes in empirical results, Hendry and Richard (1982) have shown that two models of the same relationship may result in contradicting result. Results like these weaken the value of applied econometrics (Tomek, 1993). Since the study by Tice and Kletke (1984) computer programming software have undergone tremendous improvements. However, experience in recent times has shown that available software packages are not foolproof and may not be as efficient and consistent as researchers often assume. Compounding errors, convergence, error due to how software read, interpret and process data impact the values of analytical results (see Tomek, 1993; Dewald, Thursby and Anderson, 1986).
    [Show full text]
  • SQSTM1 Mutations in Familial and Sporadic Amyotrophic Lateral Sclerosis
    ORIGINAL CONTRIBUTION SQSTM1 Mutations in Familial and Sporadic Amyotrophic Lateral Sclerosis Faisal Fecto, MD; Jianhua Yan, MD, PhD; S. Pavan Vemula; Erdong Liu, MD; Yi Yang, MS; Wenjie Chen, MD; Jian Guo Zheng, MD; Yong Shi, MD, PhD; Nailah Siddique, RN, MSN; Hasan Arrat, MD; Sandra Donkervoort, MS; Senda Ajroud-Driss, MD; Robert L. Sufit, MD; Scott L. Heller, MD; Han-Xiang Deng, MD, PhD; Teepu Siddique, MD Background: The SQSTM1 gene encodes p62, a major In silico analysis of variants was performed to predict al- pathologic protein involved in neurodegeneration. terations in p62 structure and function. Objective: To examine whether SQSTM1 mutations con- Results: We identified 10 novel SQSTM1 mutations (9 tribute to familial and sporadic amyotrophic lateral scle- heterozygous missense and 1 deletion) in 15 patients (6 rosis (ALS). with familial ALS and 9 with sporadic ALS). Predictive in silico analysis classified 8 of 9 missense variants as Design: Case-control study. pathogenic. Setting: Academic research. Conclusions: Using candidate gene identification based on prior biological knowledge and the functional pre- Patients: A cohort of 546 patients with familial diction of rare variants, we identified several novel (n=340) or sporadic (n=206) ALS seen at a major aca- SQSTM1 mutations in patients with ALS. Our findings demic referral center were screened for SQSTM1 muta- provide evidence of a direct genetic role for p62 in ALS tions. pathogenesis and suggest that regulation of protein deg- radation pathways may represent an important thera- Main Outcome Measures: We evaluated the distri- peutic target in motor neuron degeneration. bution of missense, deletion, silent, and intronic vari- ants in SQSTM1 among our cohort of patients with ALS.
    [Show full text]
  • Correspondence Analysis and Classification Gilbert Saporta, Ndeye Niang Keita
    Correspondence analysis and classification Gilbert Saporta, Ndeye Niang Keita To cite this version: Gilbert Saporta, Ndeye Niang Keita. Correspondence analysis and classification. Michael Greenacre; Jörg Blasius. Multiple Correspondence Analysis and Related Methods, Chapman and Hall/CRC, pp.371-392, 2006, 9781584886280. hal-01125202 HAL Id: hal-01125202 https://hal.archives-ouvertes.fr/hal-01125202 Submitted on 23 Mar 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. CORRESPONDENCE ANALYSIS AND CLASSIFICATION Gilbert Saporta, Ndeye Niang The use of correspondence analysis for discrimination purposes goes back to the “prehistory” of data analysis (Fisher, 1940) where one looks for the optimal scaling of categories of a variable X in order to predict a categorical variable Y. When there are several categorical predictors a commonly used technique consists in a two step analysis: multiple correspondence on the predictors set, followed by a discriminant analysis using factor coordinates as numerical predictors (Bouroche and al.,1977). However in banking applications (credit scoring) logistic regression seems to be more and more used instead of discriminant analysis when predictors are categorical. One of the reasons advocated in favour of logistic regression, is that it gives a probabilistic model and it is often claimed among econometricians that the theoretical basis is more solid, but this is arguable.
    [Show full text]