Al-Azhar University – Gaza
Deanship of Postgraduate Studies
Faculty of Economics and Administrative Sciences
Department of Applied Statistics

Statistical Inference on the Multinomial Logistic Regression Coefficients with Application

الاستدلال الإحصائي حول معاملات الانحدار اللوجستي متعدد الحدود مع التطبيق
(Arabic title: Statistical Inference on the Multinomial Logistic Regression Coefficients with Application)

Prepared by
Mohammed Abed Elftah Mohamed Shehada

Supervised by

Prof. Dr. Mahmoud K. Okasha
Professor of Statistics

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of M.Sc. in Applied Statistics

March, 2015

يَرْفَعِ اللَّهُ الَّذِينَ آمَنُوا مِنكُمْ وَالَّذِينَ أُوتُوا الْعِلْمَ دَرَجَاتٍ

"Allah will raise those who have believed among you and those who were given knowledge by degrees."

(Surat Al-Mujadila, verse 11)


Dedication

To the one who nursed me with love and tenderness, the symbol of love and the balm of healing, the heart of pure white: my dear mother.

To the one whom Allah crowned with dignity and reverence, who taught me to give without expecting anything in return, whose name I carry with pride: I pray to Allah to prolong your life so that you may see fruits whose harvest has come after a long wait; your words will remain stars that guide me today, tomorrow, and forever: my dear father.

To my support in life, my wife; to my joy in life, my dear children.

To the pure, tender hearts and innocent souls, the sweet basil of my life: my brothers and sisters.

To my esteemed teachers, who never withheld their giving from me.

To my friends, and to everyone who contributed to bringing this work to light.

To all of you I dedicate this humble work.


DECLARATION

I certify that this thesis is submitted in partial fulfillment of the Master's degree and is the result of my own research, except where otherwise acknowledged, and that this thesis (or any part of it) has not been submitted for a higher degree to any other university or institution.

Signed ......

Mohammed Abed Elftah Mohamed Shehada

Date: ------


ACKNOWLEDGEMENTS

I would like to extend my sincere thanks and gratitude to all who helped me complete this work and reach this degree of knowledge.

I express my deepest gratitude to my dissertation supervisor, Prof. Dr. Mahmoud K. Okasha, for his helpful discussions, suggestions and guidance; without his knowledge and assistance this study would not have been successful.

I take this opportunity to extend warm thanks to Prof. Dr. Abdulla El-Habil, Dr. Shady AL-Telbany and Dr. Mo'omen EL-Hanjouri for their helpful discussions, suggestions and guidance.

I would also like to thank all the faculty members of the Department of Applied Statistics at Al-Azhar University – Gaza. Special thanks also go to all my graduate friends for sharing the literature and for their invaluable assistance.

I would also like to convey my thanks to the Palestinian Central Bureau of Statistics for their cooperation and permission to use the real data. I have learned many valuable lessons from them, and I owe them every greeting and respect.

I would like to thank the candle of my life, my wife, for encouraging me to make the effort, for her patience, and for the time she sacrificed.

Finally, I would like to extend my sincere thanks and gratitude to all who helped and supported me in completing this work.

Thank you all


ABSTRACT

Various estimation methods for the parameters of the multinomial logistic regression model are available, and various numeric optimization algorithms can be used in the estimation process of the model's parameters. This study aims to find the best estimation method and the best numeric algorithm for building a reliable multinomial logistic regression model. It examines the theoretical background of various estimation methods and numeric algorithms and discusses their capabilities in providing reliable estimates of the parameters of the multinomial logistic regression model.

All theoretical methods are applied to a real data set from a survey conducted by the Palestinian Central Bureau of Statistics (PCBS) in 2010-2011, to classify the anemia status of Palestinian children aged five years or less. In this study, we compared five estimation methods and algorithms using different assessment techniques (classification tables, cross-validation and ROC curves) and obtained a reliable MLR model with the best accuracy and the least error rate for the given dataset. Ten independent variables from the survey were selected and used to classify the anemia response categories (normal, mild anemia, moderate anemia and severe anemia).

The comparison between all the classification models using the various estimation methods showed that the ridge estimation method gives a highly reliable estimated model which can predict anemia categories better than the other methods. A correct classification rate of 92.4% was achieved using the ridge method, versus approximately 79.3% using all other methods (iteratively reweighted least squares, maximum likelihood, iterative maximum likelihood and neural networks). Thus, the ridge estimation method proved to be more accurate than all the other methods. Applying ROC curve analysis, the ridge method shows a substantial improvement in the area under the curve (AUC): its AUC was 92.4%, versus only approximately 55.5% for all the other methods.


ABSTRACT (IN ARABIC)

There are many methods for estimating the coefficients of the multinomial logistic regression model, and many numerical algorithms that can be used with each method to estimate the model's coefficients. This study aims to find the best estimation method and the best algorithm for estimating the parameters, in order to derive the best multinomial logistic regression model that can be relied upon in classifying the categories of a categorical variable. This is done by discussing the theoretical background of the various estimation methods for the multinomial logistic regression model and the numerical algorithms used for this purpose, and by discussing their ability to provide us with a highly reliable multinomial logistic regression model.

All the theoretical methods discussed in this study are applied to real data taken from a survey prepared by the Palestinian Central Bureau of Statistics (2010-2011), with the aim of classifying the anemia status of children aged five years or less. In this study we compared five estimation methods and algorithms using three different evaluation techniques (classification tables, cross-validation, and ROC curve analysis), and obtained a more accurate multinomial logistic regression model with a lower error rate for the data. Ten independent variables were selected from among the survey variables and used to classify children's anemia status, which comprised the following categories: normal, mild anemia, moderate anemia, and severe anemia.

The comparison of the different estimation methods and algorithms came out in favor of the ridge method, which was shown to predict more efficiently than the other methods: the correct classification rate reached 92.4% for the ridge method compared with approximately 79.3% for the other methods (iteratively reweighted least squares, maximum likelihood, iterative maximum likelihood, and neural networks). The ridge estimation method thus proved more accurate than the other methods, and applying ROC curve analysis shows that the ridge method achieves a marked improvement in the area under the curve, 92.4%, compared with only about 55.5% for all the other methods.


Table of Contents

Chapter 1 Introduction ...... 1
1.1 Background ...... 1
1.2 Motivation ...... 2
1.3 Research Problem ...... 3
1.4 Research Methodology ...... 3
1.5 Research Objectives ...... 4
1.6 Data Description ...... 4
1.6.1 The Study Population ...... 5
1.6.2 Sampling Frame ...... 5
1.6.3 Sample Size, Sample Design and Type ...... 5
1.6.4 Concepts and Definitions (PCBS, 2010) ...... 5
1.7 Literature Review of Previous Studies ...... 6
1.8 Organization of the Thesis ...... 10

Chapter (2) Logistic Regression ...... 11
2.1 Introduction ...... 11
2.2 Logistic Function, Odds, Odds Ratio, and Logit ...... 11
2.2.1 Logistic Function ...... 11
2.2.2 The Logistic Model ...... 12
2.2.3 Odds ...... 13
2.2.4 The Odds Ratio ...... 13
2.2.5 The Logit ...... 13
2.3 The Logistic Regression – Binary Response ...... 13
2.4 Fitting the Logistic Regression Model ...... 14
2.5 Significance of the Coefficients in Logistic Regression ...... 16
2.5.1 Likelihood Ratio Test ...... 16
2.5.2 Wald Statistic ...... 17
2.6 Multinomial Logistic Regression ...... 17
2.6.1 Background ...... 17
2.6.2 Model's Formulation ...... 19
2.7 Measures of Model Fit (Statistics) ...... 25
2.7.1 ...... 25


2.7.2 Akaike Information Criterion ...... 26
2.7.3 Bayesian Information Criterion ...... 26
2.7.4 McFadden's Adjusted R2 ...... 26
2.7.5 Cox and Snell Pseudo R2 ...... 27
2.7.6 Nagelkerke Pseudo R2 ...... 27
2.8 Summary ...... 28

Chapter (3) Parameters' Estimation of the Multinomial Logistic Model ...... 29
3.1 Introduction ...... 29
3.2 The Maximum Likelihood for the MLR Model ...... 30
3.2.1 The Likelihood Function for the MLR Model ...... 30
3.2.2 The Maximum Likelihood Estimators ...... 30
3.3 The Maximum a Posteriori Probability Estimates (MAP) ...... 32
3.3.1 Gaussian Prior ...... 32
3.3.2 Laplace Prior ...... 33
3.3.3 Cauchy Prior ...... 34
3.3.4 Uniform Prior ...... 34
3.3.5 Hessian Matrix ...... 35
3.3.6 MAP Estimate Expectation Constraints ...... 37
3.4 Ridge Estimation for MLR Models with Symmetric Side Constraints ...... 38
3.4.1 Overview ...... 38
3.4.2 Side Constraints ...... 39
3.4.3 Regularization ...... 44
3.5 Neural Networks and the Multinomial Logit ...... 47
3.5.1 Analogy between Neural Networks and the Binomial Logit Model ...... 47
3.5.2 Analogy between Feedforward Neural Networks with Softmax Outputs and the Multinomial Logit Model ...... 48
3.6 Generalized Method of Moments for Estimation of the MLR Model ...... 51
3.6.1 The Baseline-Category Logit Model ...... 51
3.6.2 Estimation Using the Generalized Method of Moments (GMM) ...... 51
3.6.3 The Generalized Method of Weighted Moments ...... 53
3.6.4 Implementation of the ...... 54
3.7 Summary ...... 55


Chapter (4) Numerical Algorithms for Estimation of Multinomial Logistic Regression Model's Parameters ...... 56
4.1 Introduction ...... 56
4.2 Newton's Method ...... 56
4.2.1 Preface ...... 56
4.2.2 Newton–Raphson Algorithm for the MLR Function ...... 57
4.3 Iteratively Re-weighted Least Squares ...... 59
4.4 Coordinate Ascent ...... 60
4.5 Conjugate Gradient Ascent ...... 60
4.6 Bohning's Method ...... 61
4.7 Broyden–Fletcher–Goldfarb–Shanno (BFGS) Algorithm ...... 62
4.8 Berndt–Hall–Hall–Hausman ...... 63
4.9 Gradient Descent with Prior ...... 64
4.9.1 Stochastic Gradient Descent ...... 65
4.9.2 Lazy Regularized Stochastic Gradient Descent ...... 66
4.10 Coordinate Descent ...... 68
4.11 Iterative Scaling ...... 68
4.12 Modified Iterative Scaling ...... 71
4.13 The Modeling Approach ...... 72
4.13.1 Model Performances and Specification Diagnostics ...... 72
4.14 Summary ...... 73

Chapter (5) Application of the Multinomial Logistic Regression on Classifying Anemic Palestinian Children ...... 75
5.1 Introduction ...... 75
5.2 Description of the Variables ...... 75
5.2.1 Dependent Variable ...... 75
5.2.2 The Independent Variables ...... 76
5.3 Statistical Analysis of Anemia Cases Using an Iterative Maximum Likelihood Method ...... 79
5.4 Statistical Analysis of Anemia Cases Using Ridge Multinomial Regression ...... 84
5.5 Statistical Analysis of Anemia Cases Using the Neural Networks Method ...... 87
5.6 Statistical Analysis of Anemia Cases Using an Iteratively Reweighted Least Squares (IRLS) Method ...... 91


5.7 Statistical Analysis of Anemia Cases Using the Maximum-Likelihood Method ...... 95
5.8 Comparison Between Different Estimation Methods ...... 99
5.8.1 Comparison Using the Correct Classification Table ...... 100
5.8.2 Comparison Using ROC Curves ...... 100
5.9 Summary ...... 102

Chapter 6 Conclusion and Recommendations ...... 103
6.1 Conclusion ...... 103
6.2 Recommendations ...... 103


List of Tables

Table  Page No.

Table (2.1): Strength of evidence for BIC ...... 26
Table (5.1): Frequencies of the dependent variable categories (Anemia) ...... 75
Table (5.2): Description of the independent variables ...... 76
Table (5.3): Comparison between Anemia and Sex ...... 77
Table (5.4): Comparison between Anemia and Region ...... 77
Table (5.5): Comparison between Anemia and Child ever been breastfed ...... 78
Table (5.6): Comparison between Anemia and Wealth index quintiles ...... 78
Table (5.7): Comparison between Anemia and Mother's education ...... 79
Table (5.8): Model Fitting Information (iterative MLE method) ...... 80
Table (5.9): Goodness-of-Fit (iterative MLE method) ...... 80
Table (5.10): Pseudo R-Square (iterative MLE method) ...... 81
Table (5.11): Likelihood Ratio Tests (iterative MLE method) ...... 81
Table (5.12): Parameter Estimates (iterative MLE method) ...... 82
Table (5.13): Classification rates (iterative MLE method) ...... 83
Table (5.14): Area under the curve (iterative MLE method) ...... 83
Table (5.15): Model Fitting Information (ridge method) ...... 85
Table (5.16): Pseudo R-Square (ridge method) ...... 85
Table (5.17): Classification table (ridge regression) ...... 86
Table (5.18): Area under the curve (ridge method) ...... 86
Table (5.19): Model Fitting Information (Neural Networks method) ...... 87
Table (5.20): Pseudo R-Square (Neural Networks method) ...... 88
Table (5.21): Likelihood Ratio Tests (Neural Networks method) ...... 88
Table (5.22): Parameter Estimates (Neural Networks method) ...... 89
Table (5.23): Classification table (Neural Networks method) ...... 90
Table (5.24): Area under the curve (Neural Networks method) ...... 91
Table (5.25): Model Fitting Information (IRLS method) ...... 92
Table (5.26): Pseudo R-Square (IRLS method) ...... 92
Table (5.27): Parameter Estimates (IRLS method) ...... 93
Table (5.28): Classification table (IRLS method) ...... 94
Table (5.29): Area under the curve (IRLS method) ...... 95
Table (5.30): Model Fitting Information (MLE method) ...... 96
Table (5.31): Pseudo R-Square (MLE method) ...... 96
Table (5.32): Parameter Estimates (MLE method) ...... 97
Table (5.33): Classification table (MLE method) ...... 98
Table (5.34): Area under the curve (MLE method) ...... 99
Table (5.35): Classification results of the statistical methods ...... 100
Table (5.36): ROC Curve Analysis – Area Under the Curve (comparison) ...... 100


List of Figures

Figure  Page No.

Figure (2.1): Graph of the logistic function ...... 12
Figure (3.1): Architecture of a neural network without any hidden layer ...... 47
Figure (3.2): Neural network architecture with one hidden layer and sigmoidal output; this network is able to model interactions between individual variables ...... 48
Figure (3.3): Neural network with Softmax outputs, partially connected and with shared weights; this model is identical to the Multinomial Logit model ...... 49
Figure (5.1): The ROC curve for the iterative maximum-likelihood method ...... 84
Figure (5.2): The ROC curve for the ridge method ...... 87
Figure (5.3): The ROC curve for the Neural Networks method ...... 91
Figure (5.4): The ROC curve for the IRLS method ...... 95
Figure (5.5): The ROC curve for the maximum-likelihood method ...... 99
Figure (5.6): The ROC curves for all methods ...... 101


List of Abbreviations

Abbreviation  Full Word
ANN  Artificial Neural Network
AUC  Area Under the ROC Curve
AIC  Akaike Information Criterion
BIC  Bayesian Information Criterion
BHHH  Berndt–Hall–Hall–Hausman
CAIC  Consistent Akaike Information Criterion
DFP  Davidon–Fletcher–Powell
EAP  Expected a Posteriori
G-MNL  Generalized Multinomial Logit Model
GPCM  Generalized Partial Credit Model
GRM  Graded Response Model
GMM  Generalized Method of Moments
IRLS  Iteratively Re-Weighted Least Squares
LR  Logistic Regression
ML  Maximum Likelihood
MLE  Maximum Likelihood Estimation
MAP  Maximum a Posteriori Probability
MLR  Multinomial Logistic Regression
MIXL  Mixed Logit Model
NN  Neural Networks
NPB  Nonparametric Bootstrap
OR  Odds Ratio
PCBS  Palestinian Central Bureau of Statistics
ROC  Receiver Operating Characteristic
RSC  Regularization Side Constraint
SGD  Stochastic Gradient Descent
S-MNL  Scale Heterogeneity Multinomial Logistic Regression
SSC  Symmetric Side Constraint
WHO  World Health Organization
WLE  Weighted Likelihood Estimation
PPS  Probability Proportional to Size


Chapter 1

Introduction

1.1 Background

The analysis of categorical data is the basis of many recently published studies. Categorical data covers many issues in social, health, educational and other fields, so suitable methods and techniques are required to analyze such data. This study focuses on one of the classification methods that deals with a categorical response variable with more than two levels or categories which depends on a set of explanatory variables; those variables could be numeric, categorical, or both. The Multinomial Logistic Regression (MLR) model has proved to be useful in wide areas of society (in the social, economic, educational and medical fields). The multinomial logistic regression model is arguably the most widely used model for polytomous (multi-category) response variables. The advantage of the multinomial logistic regression model over other classification methods is that it places fewer demands on data quality and makes fewer assumptions than other classification methods such as discriminant analysis, which requires the independent variables to be numeric and normally distributed (Fox & Andersen, 2004).

Multinomial logistic regression is a classification method that is useful for situations in which we want to classify subjects based on the values of a set of predictor variables. This type of regression is similar to logistic regression, but it is more general because the dependent variable is not restricted to two categories. The multinomial logistic regression model can be used to predict the probabilities of a categorical response variable; it belongs to the family of Generalized Linear Models (GLM) and uses the Maximum Likelihood (ML) method to estimate the model parameters. Thus the model is less restrictive in its assumptions compared with the ordinary linear regression model, which uses the least squares method for the estimation of parameters and accordingly requires the response variable to be numeric.
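As a concrete illustration of how such a model is fitted and used for classification in practice, the following minimal sketch fits a multinomial logistic regression by (penalized) maximum likelihood. It is not the analysis of this thesis: the synthetic data, settings, and scikit-learn usage are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 200 subjects, 3 numeric predictors, and a
# 4-category response coded 0..3 (e.g. four anemia severity levels).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 4, size=200)

# Multinomial logistic regression fitted by maximum likelihood;
# 'lbfgs' is a quasi-Newton optimizer, and a very large C makes the
# default ridge-type penalty negligible.
model = LogisticRegression(multi_class="multinomial", solver="lbfgs", C=1e6)
model.fit(X, y)

print(model.coef_.shape)           # (4, 3): one coefficient vector per category
print(model.predict_proba(X[:2]))  # predicted class probabilities
```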

Sanchez and Villardon (2013) used penalized ridge estimation of the logistic model parameters in order to avoid the problems produced by separation, which makes the estimators undefined. Zahid and Tutz (2009) introduced ridge regression for the multinomial logit model with symmetric side constraints, which yields parameter


estimates that are independent of the reference category. The neural network has been used as a diagnostic and specification tool for the Logit model, providing interpretable coefficients and significance statistics (Bentz & Merunka, 2000). The application of the maximum likelihood function to nonlinear least squares was examined by McCullough and Renfro (2000), who explained the numerical sources of inaccuracy in nonlinear solvers. Different software packages deal with maximum likelihood estimation differently. The maximum likelihood equations can be derived from the probability distribution of the dependent variables and solved using the Newton-Raphson method for multinomial logistic regression (Czepiel, 2012). An iteratively reweighted least squares method for maximum likelihood estimation of the model parameters was proposed by Nelder and Wedderburn (1972). Minka (2003) reviewed seven different algorithms for finding the maximum-likelihood estimate, showing that Iterative Scaling applies under weaker conditions than usually assumed and is equivalent to the algorithm of Collins et al. (2002); the best performers in terms of running time were the line search algorithms and Newton-type algorithms, which far outstrip Iterative Scaling.

In this thesis, we study the multinomial logistic regression model in detail. Estimation methods and numeric algorithms that are used in the estimation of the parameters are discussed. The use of the MLR model in classification and predictive modeling is assessed both theoretically and by application to a real dataset. We use survey data from the Palestinian Central Bureau of Statistics to classify the anemia status of Palestinian children, one of the important phenomena that affect children's health.

1.2 Motivation:

Several algorithms and statistical packages are available to estimate the parameters of Multinomial Logistic Regression models, but their results are not always the same: they often produce different results on the same data sets. The reason for this difference could be the estimation procedure or the numeric optimization algorithm. Estimation methods include, for example: maximum likelihood estimation, maximum a posteriori probability (MAP) estimation, iteratively re-weighted least squares, ridge estimation, neural networks and the generalized method of moments. All these estimation methods can be used to estimate the coefficients of the


multinomial logistic regression model. Moreover, different numerical optimization algorithms are available in statistical packages and software such as SPSS and R and are used in the estimation process of the model's parameters. The motivation of this study comes from the need to understand the estimation methods for the multinomial logistic regression model's parameters, the algorithms used to compute those estimates, and the hypothesis tests for this model, both theoretically and through application to real datasets.

1.3 Research Problem:

Different estimation methods are used to estimate the parameters of the Multinomial Logistic Regression model. Moreover, there is more than one optimization algorithm for estimating the model's coefficients and classification accuracy. Consequently, models estimated using different methods of estimation produce different results, and many multinomial logistic regression models estimated using different algorithms involve biased coefficient estimates and thus perform as imperfect measurements. The research problem of this study lies in determining which combination of an estimation method and a numerical optimization algorithm for estimating the parameters of multinomial logistic regression produces the most reliable results when applied to a real dataset.

1.4 Research Methodology:

This study is an applied study that deals with the research problem by studying the model assumptions and model estimation methods to identify:
 How to deal with data that involve multiple independent variables and a categorical response?
 How to estimate the parameters of a complicated model, such as the MLR model, with no closed forms for the estimators of its parameters?
 How to apply MLR algorithms to a real dataset?
 Which algorithm performs best for classification of the different categories of the response variable?

Thus, the methodology of this thesis will be divided into two parts.

Part 1: This part covers the theoretical discussion of MLR classifiers and predictive models including estimation of model parameters and hypothesis testing around its


coefficients. Various model estimation, validation and evaluation techniques will be discussed.

Part 2: This part is an empirical comparative case study implementing MLR classifiers and predictive models on a dataset of the PCBS titled "Anemia among Palestinian children (under five years)" in order to estimate, test, select the best model, and evaluate the model that defines the relationship between "child anemia" as a dependent variable and various risk factors as independent variables, using the R and SPSS software.

1.5 Research Objectives:

The purpose of this study is to compare the estimators of the coefficients of Multinomial Logistic Regression models and the algorithms that are used to compute those estimates. The methods are applied to the "child anemia" data in order to identify the most important risk factors for the various types of anemia among children in Palestinian society. This is done in order to achieve the following objectives:
1. Discussing the estimation methods of the model parameters.
2. Looking at methods of testing the significance of the model parameters.
3. Finding the best estimation methods that can identify the best subset of independent variables for the MLR model.
4. Conducting a comparison among the methods and algorithms that are used for analyzing categorical data using MLR models.
5. Identifying the most important factors associated with anemia.

1.6 Data Description

The data of this study have been obtained from the "Palestinian Family Survey, 2010", gathered by the Palestinian Central Bureau of Statistics (PCBS) in 2010-2011. The survey data were provided to us for scientific research through direct communication with the officials of the PCBS, according to a special agreement with Al-Azhar University – Gaza. The survey includes many indicators about Palestinian families. One of the important indicators in the data is the percentage of hemoglobin (Hb) in the blood (g/dL). In this thesis, we are interested in the analysis of the different classes of Hb levels in the blood (g/dL). The categories of the response variable "Anemia" are: normal, mild anemia, moderate anemia and severe anemia. Furthermore, we look at the factors that affect the phenomenon under study through significant independent variables in the model that best classify the


Palestinian children as suffering from anemia or not among children under the age of five years.

1.6.1 The Study Population:

All children under five years of age in all Palestinian localities in the West Bank and Gaza Strip, including East Jerusalem, were considered the study population. Blood samples of all children appearing in the sample were analyzed for the percentage of Hb in the blood (g/dL). An indicator of anemia was obtained and converted into the anemia categories.

1.6.2 Sampling Frame

The sampling frame, established by the PCBS, consists of the West Bank and Gaza and involves all enumeration areas drawn up in 2007. Two variables were chosen to divide the population into strata, depending on the homogeneity of parts of the population:
1. Governorates: there are 16 governorates in the Palestinian Territory: 11 governorates in the West Bank and 5 in the Gaza Strip.
2. Locality types: there are three types: urban, rural and refugee camps.

1.6.3 Sample Size, Sample Design and Type

After determining the sample size, which equals 15,355 households, a probability sample (a multi-stage stratified cluster sample) was selected.
First stage: selecting a sample of clusters (enumeration areas) using PPS sampling without replacement, to obtain 644 enumeration areas from the total enumeration areas.
Second stage: selecting 24 households from each enumeration area selected in the first stage, using the systematic sampling method. When reaching the households, all the targeted individuals in the groups were enumerated; among them were the children aged 0-5 years, who numbered 2,406 children.

1.6.4 Concepts and Definitions (PCBS, 2010)

 Anemia among children: Anemia in childhood is defined as a hemoglobin (Hb) concentration below established cut-off levels. These levels vary depending on the age of the child and on the laboratory in which the blood sample is tested. Reference ranges for specific laboratories and age groups should always be referred to. The World Health Organization (WHO) has suggested levels of Hb below which anemia is said to be present; these levels are <11 g/dL in children aged 6-59 months.

 Mild Anemia: Hemoglobin level between 10.0 and 10.9 g/dL among children aged 6-59 months, according to the WHO.

 Moderate Anemia: Hemoglobin level between 7.0 and 9.9 g/dL among children aged 6-59 months, according to the WHO.

 Severe Anemia: Hemoglobin level of less than 7.0 g/dL among children aged 6-59 months, according to the WHO.

1.7 Literature Review of Previous Studies

Chen and Kuo (2001) showed that the multinomial logit model can be fitted as a set of Poisson (log-linear) regression models, where the responses are the numbers of cases in each category. These models are parameterized with regression coefficients and an intercept parameter for each covariate class (defined by common values of the covariates). Hence, the profile of the Poisson likelihood (maximized over the intercept parameters) is the multinomial likelihood; maximizing this multinomial profile likelihood is equivalent to maximizing the Poisson likelihood. The authors concluded that if the random effects are sufficiently small, integrating and then maximizing may not be a bad approximation to maximizing and then integrating. In effect, maximization over the intercepts restricts the linear predictor of the Poisson model for the corresponding class of observations to a (curved) manifold in which the sum of the predicted counts equals the observed count for each covariate class.

Hedeker (2003) studied the mixed-effects multinomial logistic regression model and described the model for the analysis of clustered or longitudinal nominal or ordinal response data. The author showed that the model is parameterized to allow flexibility in the choice of contrasts used to represent comparisons across the response categories. Estimation is achieved using a maximum marginal likelihood (MML) solution that uses quadrature to numerically integrate over the distribution of random effects. He applied the model to a psychiatric data set; the results presented illustrate the usefulness of the mixed-effects approach for longitudinal categorical data. Mixed-


effects models are also useful in the analysis of clustered data. A further extension of the model is underway to allow for three-level data in order to accommodate clustered data where the clustered subjects are also repeatedly measured across time.

Kneib et al. (2007) validated semiparametric multinomial logit models against parametric models. They utilized proper scoring rules and compared parametric and semiparametric approaches on a number of brand-choice data sets. This study introduced semiparametric extensions in which smooth effects of continuous covariates are modelled by penalized splines. A suitable representation of these penalized splines is employed to obtain estimates of the corresponding smoothing parameters, leading to a fully automated estimation procedure. The results of this study indicated that the hit rate as well as the logarithmic score prefer the parametric model, while the Brier score and the spherical score assign improved performance to the semiparametric MNL.

Bull et al. (2007) presented methods to construct confidence intervals for multinomial logistic regression parameters that perform better than standard methods in sparse data sets. The penalized likelihood estimator examined in their study is a generalization of an earlier penalized estimator to a general multinomial logistic regression setting with multiple covariates. The results of this study recommend that profile likelihood confidence intervals be used routinely, and emphasize that profile confidence intervals based on the standard likelihood MLEs are not appropriate in sparse data sets when finite sample bias or data separation is likely to occur.

Rashid (2008) dealt with logistic regression models and their variations. The details of the logistic regression model were demonstrated, followed by the MLE procedure for estimating the unknown parameters of the model. The main emphasis of this study was to prove the asymptotic properties of the MLE for the logistic regression model. In particular, logistic regression models have serious numerical problems if zero cells occur in the contingency table, and for this scenario a different approach was motivated. In addition, a simulation study was carried out to assess the finite sample behavior of the consistency and normality of the MLE. The hybrid logistic model was originally proposed by Chen et al. (2003) to deal with situations in which risk factors associated with the outcome are exceedingly rare in the control group. The rare risk factors were considered both independent and not independent, and they discussed the relevant estimation procedures for the


parameters. An outline of the multinomial logistic regression model was given, together with a simulation study of the consistency and normality of the MLE for this model. This simulation study confirmed that when the sample size increases, the estimated parameters converge to their true values and follow approximately normal distributions. For the three-category outcome variable of a certain data set, a set of important risk factors was identified by applying the multinomial logistic regression model.

Tektas and Gunay (2008) proposed a Bayesian method to model categorical response data. The authors used Bayesian analysis to estimate the parameters of the logit and probit models. Furthermore, the maximum likelihood and ordinary least-squares methods were discussed briefly, and a simple example was presented to compare the three methods. The results of this study show that with both the ML and Bayesian methods, the logit and probit models give very similar results. Generally, when the dependent variable is binary, logit and probit model results are similar. The most important point of this Bayesian approach is that the binary probit model is obtained from the normal linear regression model by introducing latent variables into the model. This approach has many advantages. Firstly, in small samples this method provides accurate inferences about the model, whereas the classical inferences in small samples are not accurate. Secondly, in this Bayesian method Gibbs sampling is used to calculate the posterior distributions of the model parameters by simulating from standard distributions. Thus, the method can be implemented easily in many computer programs.

Fiebig et al. (2010) focused on the mixed or heterogeneous multinomial logit (MIXL) model. They developed a generalized multinomial logit model (G-MNL) that nests S-MNL and MIXL. By estimating the S-MNL, MIXL, and G-MNL models on 10 data sets, they provided empirical evidence on the importance of scale heterogeneity and on the relative ability of the MIXL, S-MNL, and G-MNL models to fit the data. The results showed that, based on BIC and CAIC, the G-MNL model is preferred in seven data sets, whereas the S-MNL model is preferred in the other three. This is striking evidence of the importance of scale heterogeneity and of the ability of models that include scale heterogeneity to outperform MIXL. They also showed why G-MNL fits better than MIXL, and went on to show that the G-MNL model allows more flexibility in the posterior distribution of individual-level parameters than does MIXL. From an "approximate Bayesian" perspective, the MIXL model with a normal


heterogeneity distribution imposes a normal prior on the distribution of individual-level parameters. However, G-MNL imposes a much more flexible continuous mixture of scaled normals as the prior.

Rashwan and El Dereny (2012) used Bayesian and maximum likelihood methods to model binary data for prostate cancer. In the Bayesian method, the Gibbs sampling algorithm was used to draw 5 million samples from the posterior distribution of the logistic coefficients under a flat non-informative prior. Then the mean, standard deviation, and other summaries of the posterior distribution were computed. The empirical results showed that the estimates of the coefficients and the mean square errors for the Bayesian approach are not different from those of the maximum likelihood approach, but the percentage of correct classification when using the Bayesian approach was a little greater than with maximum likelihood.

Wang et al. (2013) compared the accuracy of parametric bootstrap and jackknife methods with MLE, EAP and WLE based on the GPCM and GRM. In this simulation study, they manipulated several independent variables: models (GPCM, GRM), sample size (500, 1500), sample distribution (normal, non-normal), test length (10, 20, 30) and estimation method (MLE, EAP, WLE, Jackknife, NPB). The major focus of this study was to evaluate the performance of the different ability estimation methods and the effect of the number of test items on the accuracy of ability estimation. The results of this study can be summarized as follows: among all five estimation methods, in general, the bootstrap provides the smallest bias of the estimate of the true theta across sample sizes, test lengths and models. The jackknife method also showed better estimation of student ability in terms of bias than MLE, EAP and WLE, except under a few conditions such as the GRM model with test length 10 and sample size 500. In general, test length has a larger impact on the accuracy of ability estimation than sample size.

Musa (2014) obtained a single unbiased estimator that represents multiple values sufficiently and efficiently. Maximum Likelihood Estimators (MLE) and probability density functions are used to capture the uncertainty. The results of this study indicate that analyzing uncertain data can produce classifiers with higher accuracy compared to the traditional LR algorithm, which uses a single fixed value as a representation of the uncertainty.


1.8 Organization of the Thesis

This thesis is organized as follows. In Chapter 2, we explain the basics of the logistic regression model: the binary logistic model, model fitting, and the significance of the coefficients; we then present background on the multinomial logistic regression model, the model's formulation, and measures of model goodness of fit. In Chapter 3 we explain the basics of the estimation methods and discuss various methods, namely maximum likelihood estimation, maximum a posteriori probability estimation, ridge estimation, neural networks, and the generalized method of moments for the MLR model. In Chapter 4 we review different numerical algorithms for finding the maximum-likelihood estimates of the coefficients. In Chapter 5 we present a real data set, describe it using descriptive analysis, and discuss the results of fitting the multinomial logistic regression model using five methods. All the methods discussed theoretically in this thesis are applied to the anemic children's data set, and the results are compared in terms of model accuracy: correct classification rates, error rates, and the area under the ROC curves. Finally, Chapter 6 offers a summary, results, and concluding remarks for future research.


Chapter (2)

Logistic Regression

2.1 Introduction:

Logistic regression is a statistical classification model that can be used to predict a binary response: the outcome of a categorical dependent variable based on one or more predictor (independent) variables. The model can be used to predict the probability of the occurrence of an event, fitting the data to a logistic curve via the logistic function. Thus, logistic regression measures the relationship between a categorical dependent variable and one or more independent variables using probability scores as the predicted values of the dependent variable. Moreover, logistic regression may be used to predict the odds of being a case based on the values of the independent variables (Park, 2013; Kleinbaum & Klein, 2010).

2.2 Logistic function, odds, odds ratio, and logit

2.2.1 Logistic function

Logistic regression is a method for fitting a regression curve, y = f(x), when y consists of binary-coded (0, 1: failure, success) data. When the response is a binary (dichotomous) variable, logistic regression fits a logistic curve to the relationship between x and y. The logistic function (logistic curve) is an S-shaped or sigmoid curve, often used to model population growth (Eberhardt & Breiwick, 2012). A logistic curve starts with slow, linear growth, followed by exponential growth, which then slows again to a stable rate. A simple logistic function is defined by the following formula, which is graphed in Figure 2.1:

f(z) = \frac{1}{1 + e^{-z}} \qquad (2.1)

To provide flexibility, the logistic function can be extended to the form:

f(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}

where α and β determine the logistic intercept and slope.


Figure 2.1: Graph of logistic function

From Figure 2.1, the values of f(z) fall between 0 and 1 for all values of z. The logistic function is designed to describe a probability, which is always between 0 and 1. Using the logistic function, we can never obtain an estimated probability above 1 or below 0. This is not always true for other similar models, so the logistic model is often the first choice when a probability is to be estimated.
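The bounds just described are easy to verify numerically; the following short sketch (Python/NumPy, added here for illustration only) evaluates Eq. (2.1) and its extended form with arbitrary intercept and slope values:

```python
import numpy as np

def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z)) of Eq. (2.1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10.0, 10.0, 5)
print(logistic(z))             # every value lies strictly between 0 and 1

# Extended form with intercept alpha and slope beta (arbitrary values):
alpha, beta = -1.0, 0.5
x = np.linspace(-10.0, 10.0, 5)
print(logistic(alpha + beta * x))
```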

2.2.2 The logistic model

To obtain the logistic model from the logistic function, consider the following linear relationship:

z = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \qquad (2.2)

where X₁, X₂, ..., Xₖ are independent variables and α, β₁, β₂, ..., βₖ are unknown parameters. We now substitute this linear sum for z in the logistic function (2.1) to obtain:

f(x) = \frac{1}{1 + e^{-(\alpha + \sum_{i=1}^{k} \beta_i X_i)}}, \qquad i = 1, 2, \dots, k \qquad (2.3)

We denote by P the probability that a binary dependent variable Y equals 1 in the logistic function (2.3), as follows:

P = P(Y = 1) = \frac{1}{1 + e^{-(\alpha + \sum_i \beta_i X_i)}} \qquad (2.4)


2.2.3 Odds

Odds is the ratio of the probability that some event will occur (π) to the probability that the same event will not occur (1 − π). Thus, the formula for the odds is given by:

\text{Odds} = \frac{\pi}{1 - \pi} \qquad (2.5)

2.2.4 The odds ratio

Any odds ratio, by definition, is a ratio of two odds, written here as odds1 divided by odds2, in which the subscripts indicate two individuals or two groups of individuals being compared.

OR = \frac{\widehat{\text{odds}}_1}{\widehat{\text{odds}}_2} \qquad (2.6)

2.2.5 The logit

The logit form of the logistic model is usually denoted by the log odds and can be expressed as:

\operatorname{logit}(\pi) = \log_e\!\left(\frac{\pi}{1 - \pi}\right) = \alpha + \sum_i \beta_i X_i

since, from (2.4),

\pi = \frac{e^{\alpha + \sum_i \beta_i X_i}}{1 + e^{\alpha + \sum_i \beta_i X_i}} \qquad \text{and} \qquad 1 - \pi = \frac{1}{1 + e^{\alpha + \sum_i \beta_i X_i}}

so that

\frac{\pi}{1 - \pi} = e^{\alpha + \sum_i \beta_i X_i} \qquad (2.7)

2.3 The logistic regression – binary response

Recall that for a binary response, y, the expected value of y, E(y) = π, where π denotes P(Y=1). Therefore, the logistic regression model can be expressed as:

\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)} \qquad (2.8)

and through algebraic manipulation we can arrive at:


\log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \qquad (2.9)

Notice that although the regression model is linear on the right-hand side, the left-hand side is a nonlinear function of the response probability π. This function is known as the logit link function, and because it is not linear, the usual least squares methods cannot be used to estimate the parameters. Instead, the maximum likelihood estimation method is used to obtain these estimates. Also, since π = P(Y = 1), then 1 − π = P(Y = 0). The ratio

\frac{\pi}{1 - \pi} = \frac{P(Y = 1)}{P(Y = 0)} \qquad (2.10)

is known as the odds of the event Y = 1 occurring.

2.4 Fitting the logistic regression model

Suppose we have a sample of n independent observations of the pairs (xᵢ, yᵢ), i = 1, 2, ..., n, where yᵢ denotes the value of a dichotomous outcome variable and xᵢ is the value of a single independent variable for the i-th subject. Furthermore, assume that the outcome variable has been coded as 0 or 1, representing the absence or the presence of the characteristic, respectively. This coding for a dichotomous outcome is used throughout the text. To fit a simple logistic regression model as in the equation:

\pi(x) = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \qquad (2.11)

to a set of data requires that we estimate the values of β₀ and β₁, the unknown parameters. In linear regression, the least squares method is usually used for estimating the unknown parameters: we choose the values of β₀ and β₁ that minimize the sum of squared deviations of the observed values of Y from the model-fitted values. Under the usual assumptions for linear regression, the method of least squares yields estimators with a number of desirable statistical properties. Unfortunately, when the method of least squares is applied to a model with a dichotomous outcome, the estimators no longer have these same properties. The general method of estimation that leads to the least squares function under the linear regression model (when the error terms are normally distributed) is the maximum likelihood estimation method. This method provides the foundation for an approach to estimation of the logistic regression model. In a very general sense the method of


maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data. In order to apply this method we must first construct the likelihood function. The maximum likelihood estimators of the unknown parameters are chosen to be those values that maximize the likelihood function. We now describe how to find these values for the logistic regression model in more detail.

If Y is coded as 0 or 1, then the expression for π(x) given in Eq. (2.11) provides, for an arbitrary value of β = (β₀, β₁) (the vector of parameters), the conditional probability that Y equals 1 given x, denoted by P(Y = 1 | X = x). It follows that the quantity 1 − π(x) gives the conditional probability that Y equals 0 given x, that is P(Y = 0 | X = x). Thus, for those pairs (xᵢ, yᵢ) where yᵢ = 1, the contribution to the likelihood function is π(xᵢ), and for those pairs where yᵢ = 0, the contribution to the likelihood function is 1 − π(xᵢ), where π(xᵢ) denotes the value of π(x) computed at xᵢ. A convenient way to express the contribution to the likelihood function for the pair (xᵢ, yᵢ) is through the expression:

\pi(x_i)^{y_i}\left[1 - \pi(x_i)\right]^{1 - y_i} \qquad (2.12)

Since the observations are assumed to be independent, the likelihood function is obtained as the product of the terms given in expression (2.12) as follows:

n yi 1yi l    (xi ) 1  (xi ) (2.13) i1 According to the maximum likelihood principles, we use the estimate of  as the value which maximizes the expression in Eq. (2.13). However, it is mathematically easier to work with the log of Eq. (2.13). The log likelihood is defined as

n ()  logl   yi log (xi )  1 yi log1 (xi )  (2.14) i1 To find the value of  that maximizes () we differentiate () with respect to

0 and 1 and set the resulting expressions to zero. The resulting two likelihood equations are:

yxii( ) 0 (2.15)

 xi yi (xi )  0 (2.16) In equations (2.15) and (2.16) it is understood that the summation is over i varying from 1 to n. In linear regression, the likelihood equations, obtained by


differentiating the sum of squared deviations function with respect to β, are linear in the unknown parameters and thus can be easily solved. For logistic regression, however, the expressions in equations (2.15) and (2.16) are nonlinear in β₀ and β₁, and thus require special methods for their solution. These methods are iterative in nature and have been programmed into available statistical software. For our purposes we need not be concerned with the details of these iterative methods and may view them as a computational detail that is taken care of for us. Details and a general discussion of the methods used by most programs may be found in McCullagh and Nelder (1989). The maximum likelihood estimate of β is given by the solution to equations (2.15) and (2.16) and is denoted by β̂. An interesting consequence of equation (2.15) is that

\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{\pi}(x_i)

That is, the sum of the observed values of y is equal to the sum of the predicted (expected) values. This property will be especially useful when we discuss assessing the fit of the model. Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process must be used instead, for example Newton's method. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until the improvement is minute, at which point the process is said to have converged. The fitted value π̂(xᵢ) provides an estimate of the conditional probability that Y is equal to 1, given that X is equal to xᵢ.
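To make this iterative process concrete, here is a minimal Newton-Raphson sketch for the one-covariate model of Eq. (2.11); it solves the likelihood equations (2.15)-(2.16) on simulated data (the simulation settings are illustrative assumptions, not thesis data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
true_b0, true_b1 = -0.5, 1.2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * x))))

X = np.column_stack([np.ones(n), x])   # design matrix with intercept column
beta = np.zeros(2)                     # tentative starting solution

for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-X @ beta))   # pi(x_i) at the current beta
    score = X.T @ (y - pi)                 # gradient, cf. Eqs. (2.15)-(2.16)
    W = pi * (1.0 - pi)                    # Bernoulli variance weights
    hessian = -(X * W[:, None]).T @ X      # matrix of second derivatives
    step = np.linalg.solve(hessian, score)
    beta = beta - step                     # Newton-Raphson update
    if np.max(np.abs(step)) < 1e-10:       # improvement is minute: converged
        break

print(beta)  # close to the true values (-0.5, 1.2) for large n
```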

2.5 Significance of the coefficients in logistic regression

2.5.1 Likelihood ratio test

Testing the significance of the coefficients of a logistic regression is similar to the approach used in linear regression; however, it uses the likelihood function for a dichotomous outcome variable. The logistic procedure fits linear logistic regression models for dichotomous variables by the method of maximum likelihood estimation.

Let −2 log Lik_A be the log-likelihood statistic of model A with p predictors,

and −2 log Lik_B be the log-likelihood statistic of model B with k predictors,

where k > p. Then the likelihood ratio chi-square, G², is:

G^2 = (-2\ln L_A) - (-2\ln L_B) \sim \chi^2_{k-p} \qquad (2.17)

If model A is the one with the intercept only, then G² plays the role of the overall F statistic, testing H₀: β₁ = β₂ = ... = βₖ = 0 with k degrees of freedom. If model A has p predictors and model B has k predictors, then G² plays the role of the (multiple) partial F with k − p degrees of freedom.
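As a sketch, the statistic of Eq. (2.17) and its p-value can be computed directly from the two fitted log-likelihoods; the numbers below are hypothetical stand-ins, not results from this thesis:

```python
from scipy.stats import chi2

loglik_A = -240.7   # hypothetical reduced model A with p = 2 predictors
loglik_B = -231.3   # hypothetical full model B with k = 5 predictors

G2 = (-2 * loglik_A) - (-2 * loglik_B)   # Eq. (2.17)
df = 5 - 2                               # k - p degrees of freedom
p_value = chi2.sf(G2, df)
print(G2, p_value)                       # reject H0 if the p-value is small
```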

2.5.2 Wald statistic

A Wald test is used to test the significance of individual coefficients in a given model (Bewick et al., 2005). The Wald statistic is the ratio of the square of the regression coefficient to the square of the standard error of the coefficient. The Wald chi-square test statistic is:

W_i = \frac{\hat{\beta}_i^2}{\widehat{SE}^2(\hat{\beta}_i)}, \quad i = 1, \dots, k \qquad (2.18)

The value has been squared to yield a Wald statistic with a chi-square distribution; each Wald statistic is compared with a chi-square distribution with 1 degree of freedom. Wald statistics are easy to calculate but their reliability is questionable. The test evaluates the hypotheses:

H₀: βᵢ = 0 for i = 1, ..., k  vs.  H_A: βᵢ ≠ 0.

2.6 Multinomial Logistic Regression

2.6.1 Background:

Multinomial logistic regression (MLR) is a classification method in statistics that generalizes binary logistic regression to multiclass problems, i.e. problems with more than two possible discrete outcomes. That is, MLR is a model used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (real-valued, binary-valued, categorical-valued, etc.). It is known by various names, including multiclass LR, multinomial regression, softmax regression, multinomial logit, the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model.

The multinomial (polytomous) logistic regression model is a simple extension of the binomial logistic regression model. It is used when the dependent variable has more than two nominal or unordered categories. MLR does necessitate careful


consideration of the sample size and examination for outlying cases. Like other data analysis procedures, initial data analysis should be thorough and include careful univariate, bivariate, and multivariate assessment. Specifically, multicollinearity should be evaluated with simple correlations among the independent variables. Also, multivariate diagnostics (i.e. standard multiple regression) can be used to assess for multivariate outliers and for the exclusion of outliers or influential cases. Sample size guidelines for MLR indicate a minimum of 10 cases per independent variable (Schwab, 2002). MLR is often considered an attractive analysis because it does not assume normality, linearity, or homoscedasticity. A more powerful alternative to MLR is discriminant function analysis, which requires that these assumptions be met. Indeed, MLR does have assumptions, such as the assumption of independence among the dependent variable choices. This assumption states that the choice of, or membership in, one category is not related to the choice of or membership in another category (i.e., of the dependent variable). Furthermore, MLR also assumes non-perfect separation. If the groups of the outcome variable are perfectly separated by the predictor(s), then unrealistic coefficients will be estimated and effect sizes will be greatly exaggerated.

There are different parameter estimation techniques based on the inferential goals of the MLR analysis. One might think of these as ways of applying MLR when strata or clusters are apparent in the data. The generalized linear modeling technique of MLR can be used to model unordered categorical response variables. This model can be understood as a simple extension of logistic regression that allows each category of an unordered response variable to be compared to an arbitrary reference category, providing a number of logit regression models. A binary logistic regression model compares one dichotomy (for example, passed-failed, died-survived, etc.), whereas the MLR model compares a number of dichotomies. This procedure outputs a number of logistic regression models that make specific comparisons of the response categories. When there are j categories of the response variable, the model consists of j−1 logit equations which are fitted simultaneously. Multinomial logistic regression is thus a technique that essentially fits multiple logistic regressions on a multi-category unordered response variable that has been dummy coded.


2.6.2 Model's Formulation

There are multiple ways to describe the mathematical model underlying MLR model, all of which are equivalent. This can make it difficult to compare different treatments of the subject in different texts. Various articles on logistic regression present a number of equivalent formulations of simple logistic regression, and many of these have equivalents in the multinomial logit model. The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that constructs a score from a set of weights that are linearly combined with the explanatory variables.

The difference between the MLR and numerous other methods, models, algorithms, etc. with the same basic setup is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted. In particular, in the MLR, the score can directly be converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation. This provides a principled way of incorporating the prediction of a particular multinomial logit model into a larger procedure that may involve multiple such predictions, each with a possibility of error.

2.6.2.1. Model's formulation as a set of independent binary regressions

One fairly simple way to arrive at the multinomial logit model is to imagine, for K possible outcomes, running K-1 independent binary logistic regression models, in which one outcome is chosen as a "pivot" and then the other K-1 outcomes are separately regressed against the pivot outcome. This would proceed as follows, if outcome K (the last outcome) is chosen as the pivot:

p(Yi  1) ln  1X i p(Yi  K)

p(Yi  2) ln  2 X i p(Yi  K) (2.19) 

p(Yi  K 1) ln  K 1X i p(Yi  K)

Note that we have introduced separate sets of regression coefficients, one for each possible outcome. If we exponentiate both sides and solve for the probabilities, we get:


p(Y_i = 1) = p(Y_i = K)\, e^{\beta_1 \cdot X_i}

p(Y_i = 2) = p(Y_i = K)\, e^{\beta_2 \cdot X_i}

\vdots

p(Y_i = K-1) = p(Y_i = K)\, e^{\beta_{K-1} \cdot X_i} \qquad (2.20)

Using the fact that all K of the probabilities must sum to one, we find:

p(Y_i = K) = \frac{1}{1 + \sum_{k=1}^{K-1} e^{\beta_k \cdot X_i}} \qquad (2.21)

We can use this to find the other probabilities:

e1xi p(Yi 1)  K 1 1 ek xi k1 e2xi p(Y  2)  i K 1 (2.22) 1 ek xi k1  eK 1xi p(Yi  K 1)  K 1 1 ek xi k1

2.6.2.2 Model's formulation as a log-

The formulation of binary logistic regression as a log-linear model can be directly extended to multi-way regression. That is, we model the logarithm of the probability of seeing a given output using the linear predictor as well as an additional normalization factor:

ln p(Yi 1)  1X i  ln Z ln p(Y  2)   X  ln Z i 2 i (2.23) 

ln p(Yi  K)  k X i  ln Z

As in the binary case, we need an extra term – ln Z to ensure that the whole set of probabilities forms a probability distribution, i.e. they all sum to one:

K  p(Yi  k) 1 (2.24) k 1


The reason why we need to add a term to ensure normalization, rather than multiply as usual, is that we have taken the logarithm of the probabilities. Exponentiating both sides turns the additive term into a multiplicative factor, and in the process shows why we wrote the term in the form −ln Z rather than simply Z:

1 p(Y  1)  e1 X i i Z 1 p(Y  2)  e 2 X i i Z (2.25)  1 p(Y  K)  e K X i i Z

We can compute the value of Z by applying the above constraint that requires all probabilities to sum to 1:

1 = \sum_{k=1}^{K} p(Y_i = k) = \frac{1}{Z} \sum_{k=1}^{K} e^{\beta_k \cdot X_i} \qquad (2.26)

Therefore:

Z = \sum_{k=1}^{K} e^{\beta_k \cdot X_i} \qquad (2.27)

Note that this factor is "constant" in the sense that it is not a function of Yi, which is the variable over which the probability distribution is defined. However, it is definitely not constant with respect to the explanatory variables, or crucially, with respect to the unknown regression coefficients βk, which we will need to determine through some sort of optimization procedure.

The resulting equations for the probabilities are

e1xi p(Yi  1)  K e k xi k 1 e 2 xi (2.28) p(Yi  2)  K e k xi k 1  e K xi p(Yi  K)  K e k xi k 1

Or generally:


ec xi p(Yi  c)  K (2.29) e k xi k 1

The following function:

softmax(k, x_1, …, x_n) = e^{x_k} / Σ_{i=1}^{n} e^{x_i}   (2.30)

is referred to as the softmax function. The reason is that the effect of exponentiating the values x_1, …, x_n is to exaggerate the differences between them. As a result, softmax(k, x_1, …, x_n) will return a value close to 0 whenever x_k is significantly less than the maximum of all the values, and will return a value close to 1 when applied to the maximum value, unless it is extremely close to the next-largest value. Thus, the softmax function can be used to construct a weighted average that behaves as a smooth function (which can be conveniently differentiated, etc.) and which approximates the indicator function

f(k) = { 1 if k = argmax(x_1, …, x_n); 0 otherwise }   (2.31)

Thus, we can write the probability equations as

p(Y_i = c) = softmax(c, β_1·x_i, …, β_K·x_i)   (2.32)

The softmax function thus serves as the equivalent of the logistic function in binary logistic regression.
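The following small sketch illustrates the behaviour of the softmax function (2.30); the scores are made up, and the shift by the maximum score is a standard numerical-stability device that leaves the result unchanged, precisely because of the invariance shown in equation (2.33) below.

import numpy as np

def softmax(scores):
    # subtracting the maximum does not change the output (cf. eq. (2.33))
    # but prevents overflow in exp for large scores
    e = np.exp(scores - scores.max())
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 10.0])))
# the component of the largest score is close to 1, approximating (2.31)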

Note that not all of the β_k vectors of coefficients are uniquely identifiable. This is due to the fact that all probabilities must sum to 1, making one of them completely determined once all the rest are known. As a result there are only K−1 separately specifiable probabilities, and hence K−1 separately identifiable vectors of coefficients. One way to see this is to note that if we add a constant vector C to all of the coefficient vectors, the equations are identical:

 cx  x cx  x e c i e c i e i e c i K  K  K (2.33)  cx  x cx  x e k i e k i e i e k i k 1 k 1 k 1


As a result, it is conventional to set C = −β_K (or alternatively, one of the other coefficient vectors). Essentially, we set the constant so that one of the vectors becomes 0, and all of the other vectors get transformed into the difference between those vectors and the vector we chose. This is equivalent to "pivoting" around one of the K choices, and examining how much better or worse all of the other K−1 choices are, relative to the choice we are pivoting around. Mathematically, we transform the coefficients as follows:

1  1  K  (2.34) K 1  K 1  K

k  0

This leads to the following equations:

 e1xi p(Yi  1)  K 1  1 e k xi k 1 

 K 1xi e (2.35) p(Yi  K 1)  K 1  1 e k xi k 1  e K xi p(Yi  K)  K 1  1 e k xi k 1

Other than the prime symbols on the regression coefficients, this is exactly the same as the form of the model described above, in terms of K-1 independent two-way regressions.

2.6.2.3. Model's Formulation as a latent-variable model

It is also possible to formulate multinomial logistic regression as a latent variable model, following the two-way latent variable model described for binary logistic regression. This formulation is common in the theory of discrete choice models, and makes it easier to compare multinomial logistic regression to the related multinomial probit model, as well as to extend it to more complex models. Imagine that, for each data point i and possible outcome k, there is a continuous latent variable Y*_{i,k} (i.e. an unobserved random variable) that is distributed as follows:


Y*_{i,1} = β_1·X_i + ε_1

Y*_{i,2} = β_2·X_i + ε_2   (2.36)

⋮

Y*_{i,K} = β_K·X_i + ε_K

where ε_k ∼ EV_1(0,1), i.e. a standard type-1 extreme value distribution.

This latent variable can be thought of as the utility associated with data point i choosing outcome k, where there is some randomness in the actual amount of utility obtained, which accounts for other unmodeled factors that go into the choice. The value of the observed variable is then determined in a non-random fashion from these latent variables (i.e. the randomness has been moved from the observed outcomes into the latent variables): outcome k is chosen if and only if the associated utility (the value of Y*_{i,k}) is greater than the utilities of all the other choices, i.e. if the utility associated with outcome k is the maximum of all the utilities (Everitt, 1984). Since the latent variables are continuous, the probability of two having exactly the same value is 0, so we basically don't have to worry about that situation. That is:

P(Y_i = 1) = P(Y*_{i,1} > Y*_{i,2} & Y*_{i,1} > Y*_{i,3} & … & Y*_{i,1} > Y*_{i,K})

P(Y_i = 2) = P(Y*_{i,2} > Y*_{i,1} & Y*_{i,2} > Y*_{i,3} & … & Y*_{i,2} > Y*_{i,K})   (2.37)

⋮

P(Y_i = K) = P(Y*_{i,K} > Y*_{i,1} & Y*_{i,K} > Y*_{i,2} & … & Y*_{i,K} > Y*_{i,K−1})

Or equivalently:

P(Y_i = 1) = Pr(max(Y*_{i,1}, Y*_{i,2}, …, Y*_{i,K}) = Y*_{i,1})

P(Y_i = 2) = Pr(max(Y*_{i,1}, Y*_{i,2}, …, Y*_{i,K}) = Y*_{i,2})   (2.38)

⋮

P(Y_i = K) = Pr(max(Y*_{i,1}, Y*_{i,2}, …, Y*_{i,K}) = Y*_{i,K})

Let's look more closely at the first equation, which can be written as follows:

P(Y_i = 1) = Pr(Y*_{i,1} > Y*_{i,k} ∀ k = 2, …, K)

= Pr(Y*_{i,1} − Y*_{i,k} > 0 ∀ k = 2, …, K)

= Pr(β_1·X_i + ε_1 − (β_k·X_i + ε_k) > 0 ∀ k = 2, …, K)   (2.39)

= Pr((β_1 − β_k)·X_i > ε_k − ε_1 ∀ k = 2, …, K)


2.6.2.4. The general multinomial logistic model

A multinomial logistic model classifies d-dimensional real-valued input vectors x ∈ R^d into one of k outcomes c ∈ {0, …, k−1} using k−1 parameter vectors β_0, …, β_{k−2} ∈ R^d (Carpenter, 2008):

p(c | x, β) = exp(β_c·x) / Z(x) if c < k−1; 1/Z(x) if c = k−1   (2.40)

where the linear predictor is the inner product β_c·x = Σ_{i<d} β_{c,i} x_i.

The normalizing factor in the denominator is the partition function:

Z(x) = 1 + Σ_{c<k−1} exp(β_c·x)   (2.41)

2.7 Measures of Model Fit (Statistics)

There are six scalar measures of model fit: (1) the deviance, (2) the Akaike Information Criterion (AIC), (3) the Bayesian Information Criterion (BIC), (4) McFadden's adjusted R², (5) the Cox and Snell pseudo-R², and (6) the Nagelkerke pseudo-R². There is no convincing evidence that selection of a model that maximizes the value of a given measure necessarily results in a model that is optimal in any sense other than the model having a larger (or smaller) value of that measure (Long & Freese, 2001). However, it is still helpful to see any differences in their level of fit, and hence they provide some guidelines in choosing an appropriate model.

2.7.1. Deviance

As a first measure of model fit, we use the residual deviance (D) of the model, which is defined as follows:

D(M_k) = 2 Σ_{i=1}^{n} Σ_{j=1}^{J} I(y_i = j) log[ I(y_i = j) / P̂_{ij} ]   (2.42)

where P̂_{ij} is the predicted probability and I(y_i = j) is the observed indicator, for i = 1, …, n.


2.7.2. Akaike Information Criterion

As a second measure of fit, Akaike's (1973) Information Criterion is defined as follows:

AIC = (−2 ln L̂(M_k) + 2P) / N   (2.43)

where L̂(M_k) is the maximum likelihood of the model and P is the number of parameters in the model. A model having a smaller AIC is considered the better-fitting model.

2.7.3. Bayesian Information Criterion

As a third measure of fit, the Bayesian Information Criterion (BIC) has been introduced as a simple and accurate large-sample approximation, especially if there are about 40 or more observations (Raftery, 1995). BIC is defined as follows:

BIC_k = D(M_k) − df_k ln N   (2.44)

where D(M_k) is the deviance of the model M_k and df_k is the degrees of freedom associated with that deviance. The more negative BIC_k is, the better the fit. Raftery (1995) also provided guidelines for the strength of evidence favoring M_2 against M_1 based on the difference in BIC, as follows:

Table (2.1): Strength of evidence based on BIC differences

BIC_1 − BIC_2    Evidence
0 – 2            Weak
2 – 6            Positive
6 – 10           Strong
> 10             Very Strong
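To make these definitions concrete, the following sketch computes the AIC of (2.43) and the BIC of (2.44) from a fitted model's summary quantities; all the input numbers here are hypothetical.

import numpy as np

def aic(loglik, n_params, n_obs):
    # equation (2.43): AIC = (-2 ln L + 2P) / N
    return (-2.0 * loglik + 2.0 * n_params) / n_obs

def bic(deviance, df, n_obs):
    # equation (2.44): BIC_k = D(M_k) - df_k ln N
    return deviance - df * np.log(n_obs)

print(aic(loglik=-420.5, n_params=6, n_obs=300))
print(bic(deviance=841.0, df=294, n_obs=300))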

2.7.4. McFadden’s adjusted R2

The McFadden's adjusted measure, which is also known as the "likelihood-ratio index", compares a full model that involves all parameters (M_full) with the model that contains just the intercept (M_intercept), and is defined as:

R²_McF = 1 − [ ln L̂(M_full) − K* ] / ln L̂(M_intercept)   (2.45)


where K * is the number of parameters. If the predictors in the model are effective, then the penalty will be small relative to the added information of the predictors. However, if a model contains predictors that do not add sufficiently to the model, then the penalty becomes noticeable and the adjusted R-squared can decrease with the addition of a predictor, even if the R-squared increases slightly.

2.7.5. Cox and Snell Pseudo R2

Cox and Snell's R² is based on calculating the proportion of unexplained variance that is reduced by adding variables to the model. The Cox and Snell R² is an alternative index of goodness of fit related to the R² value from linear regression. The Cox and Snell index is problematic as its maximum value is 0.75, attained when the variance is at its maximum (0.25).

The formula for Cox and Snell's Pseudo R2 is:

R²_CS = 1 − [ l(0) / l(β̂) ]^{2/W}   (2.46)

where l(β̂) is the likelihood of the current model and l(0) is the likelihood of the initial model; that is, l(0) = W log(0.5) if the constant is not included in the model, and otherwise

l(0) = W [ π̂_0 log(π̂_0/(1 − π̂_0)) + log(1 − π̂_0) ]   (2.47)

where π̂_0 = Σ_{i=1}^{n} w_i y_i / W, w_i is the weight for the i-th case, and W is the sum of the case weights.

2.7.6. Nagelkerke Pseudo R2

The Nagelkerke measure adjusts the Cox and Snell measure for its maximum value, so that a value of 1 can be achieved:

R²_N = R²_CS / max(R²_CS)   (2.48)

where max(R²_CS) = 1 − l(0)^{2/W}.

The Nagelkerke R² provides a correction to the Cox and Snell R² so that the maximum value is equal to one. Nevertheless, the Cox and Snell and likelihood-ratio R²s show greater agreement with each other than either does with the Nagelkerke R². Of course, this might not be the case for values exceeding 0.75, as the Cox and Snell index is capped at this value. The likelihood-ratio R² is often preferred to the alternatives as it is most analogous to R² in linear regression and is independent of the base rate. The Nagelkerke pseudo-R² varies between 0 and 1. Logistic regression will always be heteroscedastic (there are sub-populations that have different variabilities from others; here "variability" could be quantified by the variance or any other measure of statistical dispersion): the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of R² as a proportionate reduction in error in a universal sense in logistic regression.

2.8 Summary:

In this chapter we explained the basics of logistic regression, including the logistic function, odds, odds ratio, and logit. Moreover, we presented the binary logistic model together with the fitting and significance testing of its coefficients. Then we gave a background on multinomial logistic regression and the model's formulations. In the last part, we explained the measures of model fit. In Chapter Three we explain parameter estimation for the multinomial logistic model.


Chapter (3)

Parameters' Estimation of the Multinomial Logistic Model

3.1 Introduction

In the previous chapter we gave an overview of the binomial logistic regression and the multinomial logistic (or logit) regression (MLR). The binomial logit model has been described first as it is a particular case of the more general multinomial logit model. The binomial logit model computes the probability of acceptance of the i-th item, P_i(Y = 1), as a function of several independent variables with an additive noise ε, which partly results from unobserved variables, and f is the logistic function, or sigmoid, defined by:

f(x) = e^x / (1 + e^x) = 1 / (1 + e^{−x})   (3.1)

The logit models differ from multiple regression by the non-linear function (i.e. the sigmoid or logistic function) which maps utility into a choice probability. The sigmoid constrains the probability to the limited range from 0 to 1, without also constraining the utility. Another property of this function is non-linearity. This transforms a utility that is a linear function of explanatory variables into a probability that is non-linear with respect to these variables. Hence the Logit model is non-linear in the sense that the sensitivity of an output to a particular input is a function of the level of that input as well as of the level of all the other inputs of the model. The Multinomial Logit (ML) model extends the Binomial Logit to more than two states and therefore can be used to model discrete choices.

This chapter focuses on methods for parameter estimation of the multinomial logistic model and discusses each method separately. It starts with the most popular estimation method, the maximum likelihood estimation method, and ends with the weighted moments method.


3.2 The Maximum Likelihood Estimators for MLR Model:

3.2.1 The Likelihood function for MLR Model

Given a sequence of n data points D = {(x_j, c_j)}_{j<n}, with x_j ∈ R^d and c_j ∈ {0, …, k−1}, and assuming that the stochastic part ε of the utility function is distributed with a double exponential distribution, the likelihood of observing the actual choices, given the input vectors X and the model parameter matrix β, can be expressed by

L(Y | X, β) = Π_{j<n} p(c_j | x_j, β) = Π_{j<n} exp(β_{c_j}·x_j) / (1 + Σ_{c<k−1} exp(β_c·x_j))   (3.2)

The log likelihood of the data in a model with parameter matrix β is

log l(β) = log P(D | β) = log Π_{j<n} p(c_j | x_j, β)

= Σ_{j<n} log p(c_j | x_j, β)

= Σ_{j<n} log [ exp(β_{c_j}·x_j) / (1 + Σ_{c<k−1} exp(β_c·x_j)) ]

= Σ_{j<n} [ β_{c_j}·x_j − log(1 + Σ_{c<k−1} exp(β_c·x_j)) ]   (3.3)
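A minimal numpy sketch of the log likelihood (3.3), assuming hypothetical data and the convention of (2.40) that the last category k−1 is the reference with its coefficient vector fixed at zero:

import numpy as np

def log_likelihood(B, X, y, k):
    # B: (k-1) x d coefficients of the non-reference categories
    # X: n x d inputs; y: labels in {0, ..., k-1}, with k-1 the reference
    eta = X @ B.T                                     # scores beta_c . x_j
    log_Z = np.log1p(np.exp(eta).sum(axis=1))         # log(1 + sum_c exp(beta_c . x_j))
    scores = np.hstack([eta, np.zeros((len(y), 1))])  # the reference score is zero
    return np.sum(scores[np.arange(len(y)), y] - log_Z)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 3, size=50)
print(log_likelihood(np.zeros((2, 3)), X, y, k=3))    # equals 50 * log(1/3) here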

3.2.2 The Maximum Likelihood Estimators

The maximum likelihood (ML) estimate β̂ is the value of β maximizing the likelihood of the data D:

ˆ MLE arg max logl (  ) arg max log P ( D |  ) (3.4) 

We can compute the gradient with respect to β (Carpenter, 2008):

  logl()   log p(c j | x j , ) c,i jn c,i

30

    cj .x j  log(1 exp(c.x j )) c,i jn  ck1 

 cj.xx j  log(1  exp( c . j )) j nc,, i c i c  k 1

 1  xj, i I( c  c j )  (1  exp( c . x j )) j n1 exp(c .x j ) c, i c  k 1 ck1

Since the Indicator function I(c  c j ) is:

1 if c j  c I(c  c j )   0 if c j  c

 1 logl ( ) xj, i I ( c  c j )  exp( c . x j )) c,, ij n1  exp(  c .x j ) c  k 1   c i ck1

   1     x I(c  c )  exp( .x )  .x    j,i j   c j c j  jn  1 exp(c.x j ) ck1  c,i   ck1 

   exp(c,i .x j )   x j,i I(c  c j )  x j,i  jn  1 exp(c .x j )   ck1 

 xj,, i I( c  c j )  p ( c | x j , ) x j i  jn

  xj, i I( c  c j )  p ( c | x j , )  (3.5) jn

Now, let ∂ log l(β)/∂β_{c,i} = 0; then we have

Σ_{j<n} x_{j,i} [ I(c = c_j) − p(c | x_j, β) ] = 0, i.e. Σ_{j<n} x_{j,i} p(c | x_j, β) = Σ_{j<n} x_{j,i} I(c = c_j)


The residual for training example j and outcome category c (Carpenter, 2008) is the difference between the outcome indicator and the model prediction:

R_{c,j} = I(c = c_j) − p(c | x_j, β)
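The gradient (3.5) can thus be computed by weighting each feature by the residuals R_{c,j}; below is a sketch under the same hypothetical setup as the log-likelihood example above.

import numpy as np

def gradient(B, X, y, k):
    # returns the (k-1) x d matrix of partial derivatives in (3.5)
    eta = X @ B.T
    P = np.exp(eta) / (1.0 + np.exp(eta).sum(axis=1, keepdims=True))
    Y = np.zeros_like(P)                  # indicator matrix I(c = c_j)
    for c in range(k - 1):
        Y[:, c] = (y == c)
    R = Y - P                             # residuals R_{c,j}
    return R.T @ X

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 3, size=50)
print(gradient(np.zeros((2, 3)), X, y, k=3))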

The maximum likelihood equations can also be derived from the probability distribution of the dependent variables and solved using numerical methods for nonlinear systems of equations. We are going to discuss numerical methods that can be used in solving such problems in the next chapter.

3.3 The Maximum Posteriori Probability Estimates (MAP):

In Bayesian statistics, a maximum a posteriori probability (MAP) estimate is a mode of the posterior distribution. The MAP can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to Fisher's method of maximum likelihood estimation (MLE), but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.

ˆ MAP  arg max log P( | D) (3.6) 

 P(D | )P()   arg max log    P(D) 

 arg max{log P(D | )  log P()} (3.7) 

MAP estimates can be computed via numerical optimization such as the conjugate gradient method or Newton's method. This usually requires first or second derivatives, which have to be evaluated analytically or numerically.
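As a sketch of such a computation, the negative of the penalized objective in (3.7) with a Gaussian prior can be handed to a generic optimizer. This assumes the hypothetical log_likelihood helper sketched in Section 3.2.1 above and synthetic data; it is one possible organization of the computation, not the only one.

import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(b_flat, X, y, k, sigma2):
    B = b_flat.reshape(k - 1, X.shape[1])
    log_prior = -0.5 * np.sum(B ** 2) / sigma2   # Gaussian prior, up to a constant
    return -(log_likelihood(B, X, y, k) + log_prior)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 3, size=100)
res = minimize(neg_log_posterior, np.zeros(2 * 3), args=(X, y, 3, 10.0), method='BFGS')
print(res.x.reshape(2, 3))                       # the MAP estimate of the coefficients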

3.3.1 Gaussian Prior

A Gaussian prior for the parameter vectors with zero means and diagonal covariance is specified by a variance parameter σ_i² for each dimension i < d. This can be expressed as:

P(β) = Π_{c<k−1} Π_{i<d} Norm(0, σ_i²)(β_{c,i})

where Norm(0, σ_i²)(β_{c,i}) = (1/(σ_i √(2π))) exp(−β_{c,i}²/(2σ_i²))

The derivatives through the parameters for the prior are (Carpenter, 2008):

∂ log P_Gaussian(β)/∂β_{c,i} = ∂/∂β_{c,i} log [ (1/(σ_i √(2π))) exp(−β_{c,i}²/(2σ_i²)) ]

= ∂/∂β_{c,i} [ log(1/(σ_i √(2π))) − β_{c,i}²/(2σ_i²) ]   (3.8)

= −β_{c,i}/σ_i²

3.3.2 Laplace Prior

A Laplace, or double-exponential, prior with zero means and diagonal covariance is specified by a variance parameter σ_i² for each dimension i < d:

P(β) = Π_{c<k−1} Π_{i<d} Laplace(0, σ_i²)(β_{c,i})

where Laplace(0, σ_i²)(β_{c,i}) = (√2/(2σ_i)) exp(−√2 |β_{c,i}|/σ_i)

The derivatives through the parameters for the prior are:

∂ log P_Laplace(β)/∂β_{c,i} = ∂/∂β_{c,i} log [ (√2/(2σ_i)) exp(−√2 |β_{c,i}|/σ_i) ]

= ∂/∂β_{c,i} [ log(√2/(2σ_i)) − √2 |β_{c,i}|/σ_i ]   (3.9)

= −√2/σ_i if β_{c,i} > 0; +√2/σ_i if β_{c,i} < 0


3.3.3 Cauchy Prior

A Cauchy, or Lorentz, distribution is a Student-t distribution with one degree of freedom:

P(β) = Π_{c<k−1} Π_{i<d} Cauchy(0, λ)(β_{c,i})

where the Cauchy prior centered at zero with scale λ > 0 has density:

Cauchy(0, λ)(β_{c,i}) = (1/(πλ)) · 1/(1 + (β_{c,i}/λ)²) = λ / (π (β_{c,i}² + λ²))

The Cauchy prior does not have a mean or variance; the integrals diverge. The distribution is symmetric and centered around its median, zero. Larger scale values, like larger variances, stretch the distribution out to have fatter tails. The Cauchy prior has fatter tails than the Gaussian prior and thus has less of a tendency to concentrate near the prior median. The derivatives through the parameters for the prior are:

∂ log P_Cauchy(β)/∂β_{c,i} = ∂/∂β_{c,i} log [ λ_i / (π (β_{c,i}² + λ_i²)) ]

= ∂/∂β_{c,i} [ log λ_i − log π − log(β_{c,i}² + λ_i²) ]   (3.10)

= −2β_{c,i} / (β_{c,i}² + λ_i²)

3.3.4 Uniform Prior

The uniform, or uninformative, prior has a constant density p(β) ∝ 1 (Carpenter, 2008). The uniform prior is improper in the sense that the integral of the density diverges:

∫_{R^d} 1 dβ = ∞

The maximum likelihood estimate is the MAP estimate under an (improper) uninformative uniform prior:

ˆ  arg max log l()  arg max log P(D | ) (3.11)  

3.3.5 Hessian matrix

In the multinomial setting with k − 1 parameter vectors of dimensionality d, the Hessian matrix itself is of dimensionality ((k − 1)d) × ((k − 1)d) with rows and columns indexed by pairs of categories and dimensions (c, i), with c ∈ {0, . . . , k − 2} and i < d. The value of the Hessian is the sum of the Hessian of the likelihood and the Hessian of the prior.

H = ∂² log l(β)/(∂β_{c,i} ∂β_{b,h}) + ∂² log P_prior(β)/(∂β_{c,i} ∂β_{b,h})   (3.12)

The Hessian for the likelihood function at row (c, i) and column (b, h) is derived by differentiating with respect to parameters βc,i and βb,h:

∂² log l(β)/(∂β_{c,i} ∂β_{b,h}) = ∂/∂β_{b,h} Σ_{j<n} x_{j,i} [ I(c = c_j) − p(c | x_j, β) ]

= −Σ_{j<n} x_{j,i} ∂ p(c | x_j, β)/∂β_{b,h}

= −Σ_{j<n} x_{j,i} ∂/∂β_{b,h} [ exp(β_c·x_j) / (1 + Σ_{c′<k−1} exp(β_{c′}·x_j)) ]

= −Σ_{j<n} x_{j,i} [ p(c | x_j, β) I(c = b) x_{j,h} − p(c | x_j, β) p(b | x_j, β) x_{j,h} ]

= −Σ_{j<n} x_{j,i} x_{j,h} p(c | x_j, β) [ I(c = b) − p(b | x_j, β) ]

= −X^T A X, where A_j = p(c | x_j, β) [ I(c = b) − p(b | x_j, β) ]   (3.13)

The priors contribute their own terms to the Hessian, but the only non-zero contributions are on the diagonals for Cauchy and Gaussian priors. The maximum likelihood prior is constant, and thus has a zero gradient and zero Hessian.


For the Gaussian prior:

∂² log P_Gaussian(β)/(∂β_{c,i} ∂β_{b,h}) = ∂/∂β_{b,h} [ −β_{c,i}/σ_i² ] = −I((c,i) = (b,h)) / σ_i²   (3.14)

For the Laplace prior:

∂² log P_Laplace(β)/(∂β_{c,i} ∂β_{b,h}) = ∂/∂β_{b,h} [ −(√2/σ_i) sign(β_{c,i}) ] = 0   (3.15)

For the Cauchy prior:

∂² log P_Cauchy(β)/(∂β_{c,i} ∂β_{b,h}) = ∂/∂β_{b,h} [ −2β_{c,i} / (β_{c,i}² + λ_i²) ]

= 2 I((c,i) = (b,h)) (β_{c,i}² − λ_i²) / (β_{c,i}² + λ_i²)²   (3.16)

The Hessian of the negative log-likelihood (the matrix of second derivatives of the error function) is positive semi-definite. Thomas (1991) shows that an objective with a positive semi-definite Hessian is convex.

3.3.6 MAP Estimate Expectation Constraints

In maximum likelihood estimation, the estimated parameters β̂ set the gradient to zero at every category c < k−1 and dimension i < d:

g = ∂ log l(β̂)/∂β_{c,i} = Σ_{j<n} x_{j,i} [ I(c = c_j) − p(c | x_j, β̂) ] = 0   (3.17)

and therefore β̂ satisfies:

Σ_{j<n} x_{j,i} p(c | x_j, β̂) = Σ_{j<n} x_{j,i} I(c = c_j)   (3.18)

In words, the expectation of the count of a feature i under the model parameterized by β̂, taken over all of the training cases, is equal to the empirical count of the feature in the training data.


With informative priors, the gradient of the full error function must be zero at every dimension of every category parameter vector:

g = ∂/∂β_{c,i} [ log P(D | β̂) + log P(β̂) ] = 0   (3.19)

which, by substituting in the gradient of the likelihood-based error, reduces to:

∂ log l(β̂)/∂β_{c,i} + ∂ log P(β̂)/∂β_{c,i} = Σ_{j<n} x_{j,i} [ I(c = c_j) − p(c | x_j, β̂) ] + ∂ log P(β̂)/∂β_{c,i} = 0   (3.20)

In this case, the feature expectations are equal to the empirical feature counts discounted by the prior regularization (Carpenter, 2008). For instance, for the Gaussian prior, this amounts to the expectations being the empirical counts discounted by the gradient of the prior:

Σ_{j<n} x_{j,i} p(c | x_j, β̂) = Σ_{j<n} x_{j,i} I(c = c_j) − β̂_{c,i}/σ_i²   (3.21)

The next chapter will include gradient descent algorithms to initialize the parameter vectors and then descend along the gradient of the error function until reaching minimum error .

3.4 Ridge Estimation for MLR Models with Symmetric Side Constraints:

3.4.1 Overview

In multinomial logit models, the identifiability of parameter estimates is typically obtained by side constraints that specify one of the response categories as a reference category. When parameters are penalized, shrinkage of estimates should not depend on the reference category. In this section we investigate ridge regression for the multinomial logit model with symmetric side constraints, which yields parameter estimates that are independent of the reference category. The multinomial logit model is the most widely used model in multi-categorical regression. It specifies the conditional probabilities of response categories through linear functions of covariate vector x . When the number of predictors is large as compared to the number of observations, the logit model suffers from problems such as complete separation, the estimates of parameters are not uniquely defined (some are infinite) and/or the


maximum of the log-likelihood is achieved at 0. The use of regularization methods can help to overcome such problems. Regularization methods based on penalization typically maximize a penalized log-likelihood. Ridge regression, one of the oldest penalization methods for linear models, was extended to GLM-type models by Nyquist (1991); a ridge estimator had already been suggested for the logistic regression model, which is a particular case of generalized linear models, and Segerstedt (1992) discussed a generalization of ridge regression for ML estimation in GLMs. Many alternative penalization/shrinkage methods were proposed for univariate GLMs: Krishnapuram et al. (2005) considered multinomial logistic regression with lasso-type estimates, Zhu and Hastie (2004) used ridge-type penalization, and Friedman et al. (2010) used the penalties L1 (the lasso), L2 (ridge regression) and a mixture of the two (the elastic net).

3.4.2 Side Constraints

The multinomial logit model is one of the most often used regression models when a categorical response variable has more than two (unordered) categories (Zahid & Tutz, 2009). Let the response variable Y ∈ {1, …, k} have k possible values (categories). A generic form of the multinomial logit model is given by

p(Y = r | x) = exp(x^T β_r) / Σ_{s=1}^{k} exp(x^T β_s) = exp(η_r) / Σ_{s=1}^{k} exp(η_s)   (3.22)

where β_r^T = (β_{r0}, …, β_{rp}). It is obvious that one has to specify some additional constraints since the parameters β_1^T, …, β_k^T are not identifiable. An often used side constraint is based on choosing a reference category (RSC). When category k is chosen, one sets β_k^T = (0, …, 0), yielding η_k = 0. Of course, any of the response categories can be chosen as the reference. With category k as the reference, the corresponding model is

p(Y = r | x) = exp(x^T β_r) / (1 + Σ_{s=1}^{q} exp(x^T β_s)) for r = 1, …, q   (3.23)


An alternative side constraint that is more appropriate when defining regularization terms is the symmetric side constraint (SSC) given by

Σ_{s=1}^{k} β_s* = 0   (3.24)

With β_r* denoting the corresponding parameters, the multinomial logit model is

p(Y = r | x) = exp(x^T β_r*) / Σ_{s=1}^{k} exp(x^T β_s*) = exp(η_r*) / Σ_{s=1}^{k} exp(η_s*) for r = 1, …, q   (3.25)

Although the models are equivalent, the parameters under the symmetric side constraint are different from the parameters with a reference category and consequently have a different interpretation. In the case of SSC, i.e., Σ_{s=1}^{k} β_s* = 0, the "median" response can be viewed as the reference category, defined by the geometric mean. Then one obtains from (3.25)

p(Y  r | x) exp( * )  r k GM(x) k p(Y  s | x) s1

 p(Y  r | x)  T * and log   x r  GM(x) 

Therefore β_r* reflects the effects of x on the logits when P(Y = r | x) is compared to the median response GM(x). It should be noted that, whatever side constraint is used, the log-odds between two response probabilities and the corresponding weights are easily computed by

log [ p(Y = r | x) / p(Y = s | x) ] = x^T (β_r* − β_s*)

which follows from (3.22) and (3.25) for any choice of response categories r, s ∈ {1, …, k}.

Let β^T = (β_1^T, …, β_q^T) and β*^T = (β_1*^T, …, β_q*^T) denote the parameter vectors for the multinomial logit model under the two situations, i.e., the reference category side constraint (β_k = 0) and the symmetric side constraint (Σ_{s=1}^{k} β_s* = 0). For illustration we consider


the case of a response variable with three categories. With a model which contains only the intercept, logits are given as:

12    log 10 , log   20 with side constraint β30 = 0, and 33   

**    log12 ********    2    ,log          2  ** 10 30 10 30   20 30 10 20 33   

3 with symmetric side constraint *  0 . Equating the corresponding logits in both s 1 s situations, one obtains

δ = T δ*, δ* = T^{−1} δ   (3.26)

where δ^T = (β_10, β_20), δ*^T = (β_10*, β_20*), and

T = [ 2 1 ; 1 2 ], T^{−1} = [ 2/3 −1/3 ; −1/3 2/3 ]

For a model with an intercept and p covariates, the logits are given by

log(π_r/π_3) = x^T β_r, r = 1, 2

log(π_r/π_3) = x^T (β_r* − β_3*), r = 1, 2

Equating the logits for these two cases, we get 2(p+1) equations which can easily be solved to get the result

Δ^T = T Δ*^T, Δ*^T = T^{−1} Δ^T

where T and T^{−1} are the 2 × 2 matrices from above and Δ = (β_1, β_2), Δ* = (β_1*, β_2*) are (p+1) × 2 matrices composed of the parameter vectors with RSC and SSC respectively.

In general, let δ_{.j}^T = (β_{1j}, …, β_{k−1,j}) and δ*_{.j}^T = (β_{1j}*, …, β_{k−1,j}*), j = 0, …, p, collect the parameter vectors for single variables with reference category k or symmetric side constraints respectively. Then one obtains the transformation

δ_{.j} = T δ*_{.j}, δ*_{.j} = T^{−1} δ_{.j} for j = 0, …, p   (3.27)

given as

[ β_{1j}* ; … ; β_{k−1,j}* ] = [ (k−1)/k −1/k … −1/k ; −1/k (k−1)/k … −1/k ; ⋱ ; −1/k … −1/k (k−1)/k ] [ β_{1j} ; … ; β_{k−1,j} ]

with the inverse transform

[ β_{1j} ; … ; β_{k−1,j} ] = [ 2 1 … 1 ; 1 2 … 1 ; ⋱ ; 1 1 … 2 ] [ β_{1j}* ; … ; β_{k−1,j}* ]   (3.28)

T^{−1} is a (q × q) matrix with diagonal entries (k−1)/k and off-diagonal entries −1/k. The same transformation holds for the ML estimates: estimates of the parameters with symmetric side constraint can be computed by transforming estimates with reference category side constraint and vice versa. With π_i^T = (π_{i1}, …, π_{iq}), q = k−1, denoting the (q × 1) vector of probabilities with π_{ir} = P(Y = r | x_i), the multinomial logit model has the form

ih()() X i  h  i (3.29) where h is a vector-valued response function, Xi is a (q × (p + 1))-design matrix composed of and given as

x T i X i   T x i

and β^T = (β_1^T, …, β_q^T) is the vector of unknown parameters of length q(p+1). The multinomial logit model is given by

π_{ir} = exp(x^T β_r) / (1 + Σ_{s=1}^{q} exp(x^T β_s)) for r = 1, …, q


which for the side constraint with reference category k yields

log [ p(Y = r | x) / p(Y = k | x) ] = x^T β_r for r = 1, …, q   (3.30)

The log-odds compare π_r = P(Y = r | x) to the probability π_k = P(Y = k | x) of the reference category k. The q logits

log [ p(Y = 1 | x) / p(Y = k | x) ], …, log [ p(Y = q | x) / p(Y = k | x) ]

given by (3.30) determine the response probabilities p(Y = 1 | x), …, p(Y = k | x) uniquely, since the constraint Σ_{r=1}^{k} p(Y = r | x) = 1 holds. Therefore only q = k−1 response categories and parameter vectors have to be specified. The representation of the multinomial logit model in (3.29) and the corresponding response function h depend distinctly on the choice of the reference category. Since the parameters β* with SSC may be obtained by reparameterization of the parameters β with RSC, the numerical computation of maximum likelihood estimates of β* makes use of a transformation of the design matrix X. The transformed design matrix for SSC has the form

X* = X T*

where X is the total design matrix, of order nq × q(p+1), given as

X = [ X_1 ; X_2 ; … ; X_n ]

with X_i a q × q(p+1) matrix (composed of x_i) as defined earlier.

T* is a (q(p+1) × q(p+1)) matrix composed of the elements of T in order to satisfy δ_{.j} = T δ*_{.j}, β = T* β* for j = 0, …, p. For example, with k = 3 and p = 2, one obtains

T* = T ⊗ I_{p+1} = [ 2 0 0 1 0 0 ; 0 2 0 0 1 0 ; 0 0 2 0 0 1 ; 1 0 0 2 0 0 ; 0 1 0 0 2 0 ; 0 0 1 0 0 2 ]

where ⊗ is the Kronecker matrix product. The corresponding score function is

l ()* n ss()()** *  i  i 1

T 1 s()()()()***** X D   y h  i i ii  i i

* * h()i * *** * where Di ()  is the derivative of h( ), ii X , ( ) cov(y i ) is * the of ith observation of y given parameter vector  * .

In matrix notation one has:

s(β*) = X*^T D(β*) Σ^{−1}(β*) [ y − h(η*) ]

y^T = (y_1^T, …, y_n^T), h(η*)^T = (h(η_1*)^T, …, h(η_n*)^T),

Σ(β*) = diag(Σ_i(β*)), W(β*) = diag(W_i(β*)) with W_i(β*) = D_i(β*) Σ_i^{−1}(β*) D_i(β*)^T, and D(β*) = diag(D_i(β*))

Then the Fisher scoring iteration, which can also be viewed as an iteratively re-weighted least squares procedure, has the form

β̂*(s+1) = β̂*(s) + [ X*^T W(β̂*(s)) X* ]^{−1} s(β̂*(s))   (Zahid & Tutz, 2009)

3.4.3 Regularization

Regularization methods using penalization are based on the penalized log-likelihood

l_p(β) = Σ_{i=1}^{n} l_i(β) − (λ/2) J(β)

where l_i(β) is the usual log-likelihood contribution of the i-th observation, λ is a tuning parameter and J(β) is a functional which penalizes the size of the parameters. In


high-dimensional problems, which may also cause the non-existence of maximum-likelihood estimators, the use of regularization methods is advantageous because penalized estimators will exist and have better prediction error than the usual ML estimator. The ridge penalty, introduced by Kennard & Hoerl (1970) for linear models and then extended to generalized linear models by Nyquist (1991), is one of the oldest penalization methods. It uses the penalty J(β) = Σ_{i=1}^{p} β_i², yielding for binary responses the penalized log-likelihood

l_p(β) = Σ_{i=1}^{n} l_i(β) − (λ/2) Σ_{i=1}^{p} β_i²

For a multi-categorical response model, instead of one parameter vector one has the collection of parameter vectors β_1, …, β_k, which are identifiable only under some side constraint. A straightforward extension of the binary case is the penalty

J(β) = Σ_{r=1}^{q} Σ_{j=1}^{p} β_{rj}² = Σ_{j=1}^{p} β_{.j}^T β_{.j}   (3.31)

where β_{.j}^T = (β_{1j}, …, β_{k−1,j}) and β_{kj} = 0, which specifies k as a reference category. However, if a different reference category is chosen, the corresponding ridge estimator would yield different estimates, even after transformation.

A more natural choice for defining the multi-category ridge estimator is the use of symmetrically constrained parameters. Therefore we will use the definition

J(β*) = Σ_{r=1}^{k} Σ_{j=1}^{p} β_{rj}*²   (3.32)

with Σ_{r=1}^{k} β_{rj}* = 0. It can also be written as

J(β*) = Σ_{j=1}^{p} β*_{.j}^T P β*_{.j}   (3.33)

*T * * 1 where .j (,,)  1 j  k 1 j and PT transformation to parameters with side constraint

k  0 yields

p J() TT T PT  (3.34)  j 1 ..jj


The use of the matrix (T^{−1})^T P T^{−1} instead of the identity matrix I will cause J(β) to penalize the size of the parameters for all k categories while working with the q logits under the constraint given in (3.23). For the complete design one obtains:

J(β*) = β*^T P* β*

where β* has length q(p+1) and the matrix P* differs only by having zero rows corresponding to the intercepts β_0 (i.e., each [r(p+1)+1]-th row is zero for r = 0, 1, …, k−2), since intercept terms are not penalized (Zahid & Tutz, 2009).

A general form of the penalty term for multi-categorical responses has the additive form

J(β) = Σ_{r=1}^{q} Σ_{j=1}^{p} |β_{rj}|^γ, γ > 0

Multi-categorical ridge and lasso are special cases with γ = 2 and γ = 1 respectively. Since shrinkage should not depend on the reference category, the penalties should use the symmetric constraints, which transform to different functions when reference categories are used. If we consider the multinomial logit model with SSC described in (3.25) and the penalty term given in (3.33), then the penalized log-likelihood is given by

l_p(β*) = Σ_{i=1}^{n} l_i(β*) − (λ/2) J(β*)

= Σ_{i=1}^{n} l_i(β*) − (λ/2) Σ_{j=1}^{p} β*_{.j}^T P β*_{.j}

The corresponding penalized score function s_p(β*) is given by

s_p(β*) = Σ_{i=1}^{n} X_i*^T D_i(β*) Σ_i^{−1}(β*) [ y_i − h(η_i*) ] − λ P* β*

= X*^T D(β*) Σ^{−1}(β*) [ y − h(η*) ] − λ P* β*

yielding the estimation equations

T 1 X***** D( ) (  ) y h (  )   P   0 (3.35)  

* * where  is a vector of parameters of length q × (p + 1), and P is a q × (p + 1) × (p + 1) diagonal matrix whose elements are the q times repetition of diagonal of P. Fisher scoring iteration provides

ˆ*(1)k ˆ *() k *T ˆ *() k * *1 ˆ *() k   (())()X W  X   P sP  (3.36)
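In practice, a ridge-penalized multinomial logit (without the symmetric-side-constraint machinery above) can be fitted with standard software. The following sketch uses scikit-learn's L2-penalized LogisticRegression on synthetic data; the regularization strength is controlled by C, the inverse of the penalty weight λ.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 3, size=200)

model = LogisticRegression(penalty='l2', C=0.5, solver='lbfgs').fit(X, y)
print(model.coef_)
# with the L2 penalty, the fitted multinomial coefficients of the classes
# sum approximately to zero, mirroring the symmetric side constraint (3.24)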


3.5 Neural Networks and the Multinomial Logit

Recently, neural network modeling has received increasing attention and has been applied to an array of marketing problems such as market response or segmentation. We will show that a feedforward neural network with softmax output units and shared weights can be viewed as a generalization of the multinomial logit model. The main difference between the two approaches lies in the ability of neural networks to model non-linear preferences with few (if any) a priori assumptions about the nature of the underlying utility function, while the multinomial logit can suffer from a specification bias. Being complementary, these approaches are combined into a single framework. The neural network is used as a diagnostic and specification tool for the logit model, which will provide interpretable coefficients and significance statistics (Bentz & Merunka, 2000).

3.5.1 Analogy between neural networks and the binomial Logit model

The relationships between the inputs and the outputs are identical to the binomial logit model. Parameters of feedforward neural networks are usually estimated by minimizing the mean square error produced by the model. This is equivalent to maximum likelihood estimation when the dependent variable is a continuous function of the inputs with additive Gaussian noise, as is the case for regression problems. For classification problems, however, the dependent variable is binary; a Gaussian distribution for the errors is inappropriate, and therefore the error function is a priori not mean square error (Bentz & Merunka, 2000).

Figure (3.1): Architecture of a neural network without any hidden layer (perceptron). x_{jk} are the inputs (attributes of alternative k) of the model, and w_j are the coefficients to be estimated.

The entropy function (Hopfield, 1987) is a more appropriate cost function:

E = −Σ_{i=1}^{M} [ Y_i ln(S_i) + (1 − Y_i) ln(1 − S_i) ]   (3.37)

where Si is the network output and Yi the desired value. This function is identical to the log-likelihood function used to estimate the coefficients of the Binomial Logit model (3.1).

The feedforward network with a sigmoidal output is therefore exactly identical in form and estimation procedure to the binomial logit model. It models the choice probability based on a linear utility function and therefore does not take into account potential interaction effects between product attributes in consumer preferences. However, it is possible to change the network complexity in order to take into account non-linearity, in particular interaction effects. This is done by adding one or more hidden layer(s) to the network. Each of these hidden layers contains neurons which act as the simple perceptron presented in Figure (3.1).

Figure (3.2): Neural network architecture with one hidden layer and sigmoidal output. This network is able to model interactions between individual variables.

In Figure (3.2) we can think of each hidden node as the output of a linear regression that can be switched on or off according to the values of the variables. This process enables the network to map interactions and, more generally, non-linear relationships. Hence, a feedforward neural network with sigmoidal output can be expressed in the form of a generalization of the binomial model:

P(Y_i = 1) = 1 / (1 + e^{−g(x_i)})

with g being any function of the decision variables x_i.

3.5.2 Analogy between feedforward neural networks with Softmax outputs and the Multinomial Logit model

The network comprises n output neurons, n being the number of considered alternatives. For each purchase occasion i, each alternative j is described by the same k attributes (x_{ij1}, …, x_{ijk}). The neuron activation functions are all linear except the ones for the output neurons, which are normalized exponentials:

S_{ij} = e^{Σ_k w_k x_{ijk}} / Σ_{j′=1}^{J} e^{Σ_k w_k x_{ij′k}}

This output is called 'Softmax' because it is a continuous version of the 'all to the winner' activation function, for which the output of the neuron with the biggest input is 1, all the other outputs being zeros.

Figure (3.3): Neural network with Softmax outputs, partially connected and with shared weights. This model is identical to the Multinomial Logit model.

In order to keep the coefficients of the utility function equal across alternatives during training, it is necessary to keep the weights corresponding to the same attribute equal. This procedure is well known in shape recognition (Le Cun et al., 1989) under the name 'shared weights technique'. The technique consists of initializing all the weights corresponding to the same attribute with the same value.

The formal relationships between inputs and outputs are identical for such a network and for the MLR model (Bentz & Merunka, 2000):

P_i(Y = j) = e^{v_{ij}} / Σ_{j′=1}^{J} e^{v_{ij′}}, with v_{ij} = Σ_{k∈t} b_k x_{ijk}

where b_k is the k-th coefficient in the utility function, x_{ijk} is the value of attribute k for product j on purchase occasion i, J is the number of alternatives considered, and t is the set of attributes.

The error function used (relative entropy), given by

E = −Σ_{i=1}^{M} Σ_{j=1}^{J} Y_{ij} log(S_{ij})

is identical (sign apart) to the log-likelihood function used to derive the Multinomial Logit:

log L = Σ_{i=1}^{N} Σ_{j=1}^{J} Y_{ij} log(P_i(Y = j | x, b)), with P_i(Y = j | x, b) = e^{v_{ij}(x_{ij}, b)} / Σ_{h=1}^{J} e^{v_{ih}(x_{ih}, b)}

Some remarks can now be stated on the analogy between feedforward neural networks with Softmax outputs and the multinomial logit model that may be summarized as follows:

• The MLR corresponds to a partially connected feedforward neural network with shared weights and Softmax outputs. By adding hidden neurons to the network, it is possible to model choices resulting from non-linear utility functions. Therefore, neural network models can be viewed as generalized MLR models.

• Artificial Neural Networks (ANNs) can be viewed as an extension of MLR modeling to cases where the utility function is non-linear. The comparison between the two models will enable us to investigate the existence of relationships taken into account by one model (ANN) but not by the other (MLR).

• Neural networks have often been criticized for their lack of interpretability. Unlike logit modeling, the analysis of network parameters does not reveal anything useful about the fitted function, except for very simple networks (e.g. a perceptron with no hidden units). Indeed, information in a neural network is processed in a completely delocalized way. Furthermore, the degrees of freedom are often large enough to allow the network to fit the same function with different combinations of parameters. This is probably the reason why neural networks have been called 'black boxes', capable of mimicking relationships between a set of variables but incapable of explaining the nature of these relationships. Although this criticism is to a large extent justified, it is only the counterpart of the capacity of these models to approximate any function with few, if any, a priori assumptions.


• Neural networks, as diagnostic and specification tools, can be used in combination with traditional logit models in order to get both a better predictive performance and a greater understanding of the influences of the various product attributes.

3.6 Generalized method of moments for estimation of MLR model

3.6.1 The baseline-category logit model

Assume a random sample of size n from a large population. Each element in the population may be classified into one of J categories, denoted by

Yi (,,) y i12 y i y ij the multinomial trial for subject i, where yij 1 when the response is in category j and yij  0 otherwise, i = 1,..., n, j = 1,..., J. Suppose p explanatory covariates, with at least one of them being continuous, are observed.

Define Xi (1, x i1 , x ip ), and XXX (,,)1  n . assume that YXii,  are independently and identically distributed (i.i.d.). Let ij i(X i )  P ( Y i  j | X i ) , denote the probability that the observation of Y belongs to category j, given covariates

X i , we assume the relationship between the probability  j and X can be modeled as:

 ji()X T logXij j 2, , J (3.38) Ji()X

T where  j  ( j0, j1,, jp ) . Here we set the fist category as reference class. This model is called generalized logit model (Stokes, et. al., 2009). Here we present an alternative estimation method formed with the generalized method of moments (GMM).

3.6.2 Estimation using the generalized method of moments (GMM)

The baseline-category logit model can be viewed as a multivariate model. Define Y_i* = (y_{i2}, …, y_{iJ})^T, since y_{i1} is redundant. Let X = (X_1^T, …, X_n^T)^T be an n(J−1) × (p+1)(J−1) matrix, with X_i a (J−1) × (p+1)(J−1) block-diagonal matrix defined as:

X_i = diag(X_i^T, …, X_i^T)   (3.39)

In the GMM framework, we define

u_i(β) = X_i^T (Y_i* − π_i), i = 1, …, n   (3.40)

where π_i^T = (π_{i2}, π_{i3}, …, π_{iJ}) and β^T = (β_2^T, β_3^T, …, β_J^T) is the (p+1)(J−1) vector of unknown parameters. The population moment condition is E[u(β)] = 0, with the corresponding sample moment condition

U_n(β) = Σ_{i=1}^{n} u_i(β)   (3.41)

The GMM estimator β̂_M can be obtained by minimizing the following quadratic objective function:

Q_n(β) = U_n(β)^T Ω_n(β)^{−1} U_n(β)

where Ω_n(β) can be the empirical variance-covariance matrix given by

Ω_n(β) = (1/n) Σ_{i=1}^{n} u_i(β) u_i(β)^T − (1/n²) U_n(β) U_n(β)^T

Or, for the best GMM estimation, we can take the information matrix of the multinomial logit model (Wang, 2014), that is,

Ω_n(β) = Σ_{i=1}^{n} X_i^T (D_i − π_i π_i^T) X_i   (3.42)

where D_i = diag(π_i). In general, β̂_M can be computed via an iterative procedure (Hansen et al., 1996). Under standard regularity conditions, the GMM estimator β̂_M exists and converges in probability to the true parameter β_0.


3.6.3 The generalized method of weighted moments

The main principle used in the robust GMM estimator is that we replace the moment conditions by a set of observation-weighted moment conditions. Instead of Eq. (3.40), we define

u_i^w(β) = w_i X_i^T (Y_i* − π_i) − c_i, i = 1, …, n   (3.43)

where c_i = E[ w_i X_i^T (Y_i* − π_i) ]; then the estimation can be based on the moment conditions

Euw ( ) 0  Consequently, the generalized method of weighted moments (GMWM) estimates can be defined by ˆww  arg minQn ( ) (3.44) B

T w 1 where QUUw()()()  w   w  (3.45) n nn  n

with U_n^w(β) = Σ_{i=1}^{n} u_i^w(β)   (3.46)

Here we take the summation as the sample moment condition. The advantage of using the summation is that it can lead us to a direct estimation of the covariance matrix. It is clear that this definition is analogous to the standard GMM. If we choose w_i = 1 (and hence c_i = 0) for all observations, the moment conditions in (3.43) are reduced to the standard moment conditions. Therefore, the standard GMM is a special case of the GMWM. In order to specify the weights for the robust GMM estimator, we need the following definition of a distance, which is based on the individual moment conditions:

T w 1 d( ) uww (  )  u (  ), i 1, , n (3.47) i in  i

The weight is assigned based on d_i(β); that is, w_i = w_d(d_i(β)). There are several alternative specifications of weight functions (Huber, 1981). In this study, Huber's weights are applied:

w_d(d_i) = min{ 1, c_d / d_i(β) }   (3.48)


The above specification of the weight function requires a value of the tuning constant c_d. To determine c_d, understanding the distribution of d_i(β) is critical.

Clearly, u_i^w(β) is a column vector and d_i(β) is a scalar quadratic distance, so we set c_d = χ²_p(0.975)/n, where χ²_p(·) is the quantile of the χ² distribution with p degrees of freedom. If we take the information matrix (3.42) of the multinomial logistic model as Ω_n^w, we can compute the leverage of each observation as follows:

H_i = X_i (Ω_n^w)^{−1} X_i^T w_i, i = 1, …, n   (3.49)

where w_i is the i-th component of the weights. Then, a Mallows-type weight can be defined based on trace(H_i), that is, w_x = w(trace(H_i)), to downplay the observations with high leverages. Lesaffre & Albert (1989) suggested that a practical rule for isolating leverage points might set c_x = 2(p+1)(J−1)/n. We may give observations with large leverages zero weight:

w_x = w(trace(H_i)) = { 1 if trace(H_i) ≤ 2(p+1)(J−1)/n; 0 otherwise }   (3.50)

An approach often used to combine the two weights is w_i = w_d · w_x (Heritier et al., 2009). The consistency correction vector c_i is defined as

c_i = [ w_{d,i}^{(1)}(β) − w_{d,i}^{(0)}(β) ] / diag(Ω_n^w), i = 1, …, n

where w_{d,i}^{(h)}(β) = w_d( X_i {h − π_i(β)} / diag(Ω_n^w) ), with h ∈ {0, 1}, is the weight for y_i* (Wang, 2014).

3.6.4 Implementation of the estimator

The continuous updating estimation method is applied in this study for estimating the regression coefficients and corresponding variance. The procedure is detailed as follows:

1. Apply an initial value β^(0) for computing Ω_n^w.


2. Compute d_i(β) using Eq. (3.47) and H_i using Eq. (3.49); assign weights correspondingly based on (3.48) and (3.50).

3. With the combined weights, calculate Ω_n^w and U_n^w(β) in Eq. (3.46).

4. Obtain the estimator β̂_w^(1) by minimizing Q_n^w(β) of Eq. (3.45).

5. Go back to Step 1 and replace β^(0) with the estimator β̂_w^(1) in computing Ω_n^w, then move to the next iteration.

6. Continue this procedure until the convergence criteria are met.
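A small sketch of the weighting in Step 2, assuming hypothetical per-observation moment contributions u_i stacked as the rows of a matrix and a working covariance Ω; it computes the quadratic distances of (3.47) and the Huber weights of (3.48).

import numpy as np

def huber_weights(U, Omega, c_d):
    # U: n x m matrix whose rows are the moment contributions u_i
    Omega_inv = np.linalg.inv(Omega)
    d = np.einsum('im,mh,ih->i', U, Omega_inv, U)   # d_i of eq. (3.47)
    return np.minimum(1.0, c_d / d)                 # w_d(d_i) of eq. (3.48)

rng = np.random.default_rng(4)
U = rng.normal(size=(30, 4))
print(huber_weights(U, np.cov(U, rowvar=False), c_d=2.0))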

3.7 Summary:

This chapter highlighted the methods of estimating multinomial logistic regression parameters. We explained the basis of the estimation methods: initially we discussed the likelihood function, the maximum likelihood estimators and the maximum posteriori probability estimates, followed by ridge estimation, neural networks and the generalized method of moments for the MLR model. Each of the above methods has been discussed in detail. All the methods give valid estimates because they rest on a common basis, but the methods differ in the accuracy, precision and efficiency of the resulting estimates. The equations resulting from these methods cannot be solved analytically; there is a huge number of methods for numerical optimization. We are going to discuss some of these numerical algorithms, such as lazy regularized stochastic gradient descent and iteratively re-weighted least squares, used in the estimation of the MLR model in the next chapter.


Chapter (4)

Numerical Algorithms for Estimation of Multinomial Logistic Regression Model's Parameters

4.1 Introduction

Logistic regression is a workhorse of statistics and is closely related to methods used in machine learning, including the perceptron and the support vector machine. This chapter reviews different algorithms for computing the maximum likelihood and maximum a posteriori parameter estimates. The full derivation of each algorithm is given. In particular, a derivation of iterative scaling is given which applies more generally than the conventional one. A derivation is also given for the modified iterative scaling algorithm of Collins et al. (2002). Most of the algorithms operate in the primal space, but can also work in dual space.

4.2 Newton’s method

4.2.1 Preface:

Newton's method, known also as the Newton–Raphson method, is a numerical analysis technique for finding successively better approximations to the roots (or zeroes) of a real-valued function:

x : f(x) = 0   (4.1)

The Newton–Raphson method in one variable is implemented as follows:

Given a function f defined over the real numbers x and its derivative f′, we begin with a first guess x_0 for a root of the function f. Provided that the function satisfies all the assumptions made in the derivation of the formula, a better approximation x_1 is

x_1 = x_0 − f(x_0)/f′(x_0)   (4.2)

Geometrically, (x1, 0) is the intersection with the x-axis of the tangent to the graph of f at (x0, f (x0)). The process is repeated as

x_{n+1} = x_n − f(x_n)/f′(x_n)   (4.3)


until a sufficiently accurate value is reached (Deuflhard, 2004). We start the process with some arbitrary initial value x_0; the closer to the zero, the better. In the absence of any intuition about where the zero might lie, a "guess and check" method might narrow the possibilities to a reasonably small interval by appealing to the intermediate value theorem. The method will usually converge, provided this initial guess is close enough to the unknown zero and f′(x_0) ≠ 0. Furthermore, for a zero of multiplicity 1, the convergence is at least quadratic in a neighborhood of the zero, which intuitively means that the number of correct digits roughly at least doubles in every step (Kelley, 2003).
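A minimal illustration of the iteration (4.3), using the toy function f(x) = x² − 2 whose positive root is √2:

def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    # repeat x <- x - f(x)/f'(x) until the step is negligible, eq. (4.3)
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))  # ~1.41421356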

4.2.2 Newton–Raphson algorithm for MLR function

To obtain the maximum likelihood estimate of a parameter β in a specific model, the Newton–Raphson iterative estimation method may be used. This method is similar to the Fisher-scoring iterative estimation method in this model, since the expectation of the second derivative of l with respect to β is the same as the observed one; it requires the first derivative of the log-likelihood (the g function) and the second derivative, or Hessian matrix.

For K the number of categories of the nominal response and m the number of subpopulations, let ∂l(β)/∂β be the (K−1)p × 1 vector of first derivatives of the log-likelihood l(β) with respect to β. Moreover, let ∂²l(β)/(∂β ∂β^T) be the (K−1)p × (K−1)p matrix of second derivatives of l(β) with respect to β. Notice that

2   ()  T  ( )  X * A X * , (4.4)  T   j j j    jm where Aj is a (K−1)×(K−1) matrix as

(K ) (K ) (K )T Aj  n j (diag( j )  j  j ) . (4.5) in which

(K)  j  P(c 1| x j , ), P(c  2 | x j , )P(c  K 1| x j , ) (4.6)

()k ()k and diag () j is a (k−1)×(k−1) diagonal matrix of  j . (Minka , 2003)


Then

1 2()() new old . (4.7) T  

Let (v) be the parameter estimate at iteration v, the parameter estimate (ν+1) at iteration v + 1 is updated as

1  T  ()  v1   v   X * A(v) X *    j j j  v (4.8)  jm   where ξ>0 is a stepping scalar such that l((ν+1) )−l((ν) )≥0, X* is a (K−1)p×(K−1) matrix of independent vectors (Dalgaard, 2008 and IBM Corporation, 2011),

x j 00  * 0 x j  X j   (4.9) 0  00 x j

(v) () () (v) Aj is Aj and both and are evaluated at  =  . Use step-halving  v   method if l((ν+1) )−l((ν) )<0,. Let V be the maximum number of steps in step- halving, the set of values of ξ is {1/2v: v = 0, …, V−1}.

This algorithm chooses the starting values of the model parameters as follows (Dalgaard, 2008). McCullagh and Nelder (1989) showed that if intercepts are included in the model, one can set

β_k^{(0)} = (β_{k1}^{(0)}, 0, …, 0) where

β_{k1}^{(0)} = log( π̃_{ik} / π̃_{iK} ) for k = 1, …, K−1   (4.10)

If intercepts are not included in the model, set β_k^{(0)} = (0, …, 0) for k = 1, …, K−1.

The algorithm of IBM Corporation (2011) indicates that, given two convergence criteria ε_k > 0 and ε_p > 0, the iteration is considered to have converged if one of the following criteria is attained:

1. |l(β^{(v+1)}) − l(β^{(v)})| < ε_k

2. max_j |β_j^{(v+1)} − β_j^{(v)}| < ε_p

3. The maximum absolute element in ∂l/∂β^{(v+1)} is less than min(ε_k, ε_p).

This algorithm is used for the analysis of multinomial logistic regression in the SPSS package. It can also be applied in R using the mlogit function for the maximum likelihood estimation of the multinomial logit model with the specification method = 'nr'. If method = 'nr' is specified, H must be the Hessian (i.e. the matrix of second derivatives of the likelihood function) (Hasan et al., 2014; Dalgaard, 2008; IBM Corporation, 2011).

4.3 Iteratively Re-weighted Least Squares

For logistic regression, there is no longer a closed-form solution, due to the nonlinearity of the logistic sigmoid function. However, the departure from a quadratic form is not substantial. Iteratively re-weighted least squares is used to find the maximum likelihood estimates of a generalized linear model, and in robust regression to find an M-estimator, as a way of mitigating the influence of outliers in an otherwise normally-distributed data set. The objective can be minimized by an efficient iterative technique based on the Newton–Raphson optimization scheme, which uses a local quadratic approximation to the log likelihood function:

β^{new} = (X^T A X + λI)^{−1} ( X^T A X β^{old} + X^T [ I(c = c_j) − p(c | x_j, β^{old}) ] )   (4.11)

where z = X β^{old} + A^{−1} [ I(c = c_j) − p(c | x_j, β^{old}) ]; equivalently:

β^{new} = (X^T A X + λI)^{−1} X^T A z

Since at each iteration p(c | x_j, β) changes, so do the weights A and the working response z. This algorithm is referred to as iteratively reweighted least squares (IRLS), since each iteration solves the weighted least squares problem (Hastie et al., 2001):

β^{new} = argmin_β (z − Xβ)^T A (z − Xβ)   (4.12)


In the rare cases that the log-likelihood decreases, step size halving will guarantee convergence.
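A compact sketch of IRLS for the binary case on synthetic data; each pass recomputes the weights and the working response and solves the weighted least-squares problem of (4.12).

import numpy as np

def irls(X, y, n_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current fitted probabilities
        a = p * (1.0 - p)                       # diagonal weights of A
        z = X @ beta + (y - p) / a              # working response
        beta = np.linalg.solve(X.T @ (a[:, None] * X), X.T @ (a * z))
    return beta

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
print(irls(X, y))                               # should be near true_beta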

4.4 Coordinate ascent

Because the objective is convex, we can also optimize each β_k alternately. A coordinate-wise Newton update (Minka, 2003) is

β_k^{new} = β_k^{old} + [ Σ_i (1 − σ(y_i β^T x_i)) y_i x_{ik} ] / [ Σ_i a_{ii} x_{ik}² ]   (4.13)

where a_{ii} = σ(β^T x_i)(1 − σ(β^T x_i)).

To implement this algorithm efficiently, note that β^T x_i for all i can be incrementally updated using the trick:

(β^{new})^T x_i = (β^{old})^T x_i + (β_k^{new} − β_k^{old}) x_{ik}   (4.14)

4.5 Conjugate gradient ascent

The idea of the last section can be generalized to updating in an arbitrary direction $u$. The Newton step along $u$ (Minka, 2003) is

$$\beta^{\text{new}} = \beta^{\text{old}} - \frac{g^T u}{u^T H u}\, u \qquad (4.15)$$

To implement this efficiently, we use the expansion

$$u^T H u = -\lambda\, u^T u - \sum_i a_{ii}\, (u^T x_i)^2 \qquad (4.16)$$

However, this method can be faster if we choose good update directions. One approach is to use the gradient as the update direction. Even better is to use the so-called conjugate gradient rule, which subtracts the previous update direction from the gradient:

$$u = g - \beta\, u^{\text{old}} \qquad (4.17)$$

There are various heuristic ways to set the scale factor $\beta$ (Bishop, 1995). The method which seems to work best in practice is the Hestenes-Stiefel formula:

$$\beta = \frac{g^T (g - g^{\text{old}})}{(u^{\text{old}})^T (g - g^{\text{old}})} \qquad (4.18)$$

This formula is derived in the following way (Minka, 2003). When the surface is quadratic with Hessian $H$, we want the new search direction to be orthogonal with respect to $H$: $u^T H u^{\text{old}} = 0$, which implies

$$\beta = \frac{g^T H u^{\text{old}}}{(u^{\text{old}})^T H u^{\text{old}}} \qquad (4.19)$$

On a quadratic surface, the pairs $(\beta^{\text{old}}, g^{\text{old}})$ and $(\beta, g)$ must satisfy

$$g - g^{\text{old}} = H(\beta - \beta^{\text{old}}) \qquad (4.20)$$

But since $\beta - \beta^{\text{old}} \propto u^{\text{old}}$, we obtain

$$g - g^{\text{old}} \propto H u^{\text{old}} \qquad (4.21)$$

Substituting (4.21) into (4.19) gives (4.18) (Minka, 2003).

4.6 Bohning's method

Bohning's method is a quasi-Newton method (a Newton method using a fixed matrix $\tilde{H}$ in place of the Hessian matrix $H$). It is another way to speed up Newton's method: the varying Hessian is approximated by a fixed matrix that only needs to be inverted once. Bohning (1999) has shown that the convergence of this approach is guaranteed as long as $\tilde{H} \le H$, in the sense that $H - \tilde{H}$ is positive semidefinite. For maximum-likelihood estimation, he suggests the matrix

$$\tilde{H} = -\frac{1}{4} X^T X, \qquad (4.22)$$

which for Maximum a Posteriori (MAP) estimation generalizes to

$$\tilde{H} = -\frac{1}{4} X^T X - \lambda I \qquad (4.23)$$

This matrix satisfies $\tilde{H} \le H$ because $P(x)\left(1 - P(x)\right) \le \frac{1}{4}$ for any $x$, and therefore $\frac{1}{4} I \ge A$. Because $\tilde{H}$ does not depend on $\beta$, we can pre-compute its inverse, or, even simpler, its lower-upper (LU) decomposition. The resulting algorithm is:

Setup: Compute the Cholesky decomposition of $\tilde{H}$ (Bräuninger, 1980).

Iterate: $\beta^{\text{new}} = \beta^{\text{old}} - \tilde{H}^{-1} g$, where $\tilde{H}^{-1} g$ is computed by back-substitution.

A slightly better algorithm proposed by Minka (2003) uses

$$\tilde{H} = -b(\beta)\, X^T X, \qquad b(\beta) = \max_i P(c \mid x_i, \beta)\left( 1 - P(c \mid x_i, \beta) \right) \qquad (4.24)$$

which also satisfies $\tilde{H} \le H$; the difference is minimal in practice.

4.7 Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm:

Instead of holding the approximate Hessian fixed, as in Bohning's method, it could be allowed to vary in the inverse domain. This is the idea behind the more general quasi-Newton algorithms, DFP and BFGS. The algorithm starts with $\tilde{H}^{-1} = I$. At each step the parameters are updated to $\beta^{\text{new}} = \beta + \Delta\beta$, giving a change in the gradient, $\Delta g = g^{\text{new}} - g$. If the function is quadratic, we must have

$$\Delta\beta = H^{-1} \Delta g \qquad (4.25)$$

The approximate inverse Hessian $\tilde{H}^{-1}$ is repeatedly updated to satisfy this constraint. In the BFGS algorithm, the update is (Bishop, 1995):

$$b = 1 + \frac{\Delta g^T \tilde{H}^{-1} \Delta g}{\Delta\beta^T \Delta g}$$

$$\tilde{H}^{-1}_{\text{new}} = \tilde{H}^{-1} + \frac{1}{\Delta\beta^T \Delta g} \left[ b\, \Delta\beta\, \Delta\beta^T - \Delta\beta\, \Delta g^T \tilde{H}^{-1} - \tilde{H}^{-1} \Delta g\, \Delta\beta^T \right] \qquad (4.26)$$

We can verify that $\tilde{H}^{-1}_{\text{new}}$ satisfies (4.25). To update $\beta$, we treat $u = \tilde{H}^{-1} g$ as a search direction and apply the Newton step along $u$ in (4.15). This works better than simply setting $\Delta\beta = u$.

A variant of this algorithm is the so-called "limited-memory" BFGS (Bishop, 1995), where $\tilde{H}^{-1}$ is not stored but assumed to be $I$ before each update. This gives the search direction:

$$b = 1 + \frac{\Delta g^T \Delta g}{\Delta\beta^T \Delta g}$$

$$a = \frac{\Delta\beta^T g}{\Delta\beta^T \Delta g}$$

$$\bar{a} = b\, a - \frac{\Delta g^T g}{\Delta\beta^T \Delta g}$$

$$u = g + \bar{a}\, \Delta\beta - a\, \Delta g \qquad (4.27)$$

This algorithm is similar to the conjugate gradient algorithm and has similar performance. It can be applied in R using the mlogit function for maximum likelihood estimation of the multinomial logit model with the BFGS method specified: if method = 'bfgs' is used, $\hat{H}^{-1}$ is updated at each iteration using a formula based on the variations of the parameter vector and of the gradient. The algorithm can also be applied in R using the multinom function from package nnet, an estimator based on neural networks (Venables & Ripley, 2002).
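Both routes can be sketched in R as follows, reusing the ml_data object from the earlier mlogit example together with a hypothetical wide data frame children; the sketch only illustrates the calls named in the text.

library(mlogit)
fit_bfgs <- mlogit(anemia ~ 0 | cage + region + melevel, data = ml_data,
                   method = "bfgs")  # quasi-Newton updates of the inverse Hessian
library(nnet)
# multinom() fits the same multinomial log-linear model through nnet's
# BFGS-based optimizer
fit_nnet <- multinom(anemia ~ cage + region + melevel, data = children)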

4.8 Berndt–Hall–Hall–Hausman

The Berndt–Hall–Hall–Hausman (BHHH) algorithm is a numerical optimization algorithm similar to the Gauss–Newton algorithm. It is named after its four originators: Ernst R. Berndt, Bronwyn Hall, Robert Hall, and Jerry Hausman. If a nonlinear model is fitted to the data, one often needs to estimate coefficients through optimization. A number of optimization algorithms have the following general structure. Suppose that the function to be optimized is $Q(\beta)$. The algorithms are iterative, defining a sequence of approximations $\beta_k$ given by

$$\beta_{k+1} = \beta_k + \lambda_k A_k \frac{\partial Q}{\partial \beta}(\beta_k) \qquad (4.28)$$

where $\beta_k$ is the parameter estimate at step $k$ and $\lambda_k$ is a parameter (called the step size) which partly determines the particular algorithm. For the BHHH algorithm, $\lambda_k$ is determined by calculations within a given iterative step, involving a line search until a point $\beta_{k+1}$ is found satisfying certain criteria. In addition, for the BHHH algorithm, $Q$ has the form

$$Q = \sum_{i=1}^{N} Q_i \qquad (4.29)$$

and $A$ is calculated using

$$A_k = \left[ \sum_{i=1}^{N} \frac{\partial \ln Q_i}{\partial \beta}(\beta_k)\, \frac{\partial \ln Q_i}{\partial \beta}(\beta_k)^T \right]^{-1} \qquad (4.30)$$

The BHHH algorithm has the advantage that, if certain conditions apply, convergence of the iterative procedure is guaranteed.

In the case of multinomial logistic regression,

$$\beta_{k+1} = \beta_k + \lambda_k A_k\, g(\beta_k) \qquad (4.31)$$

where $g$ is the gradient of the log-likelihood. Stacking the individual score contributions $g_i$, $i = 1, \ldots, N$, as the rows of a matrix $G$, the matrix $A_k$ is

$$A_k = \left[ \sum_{i=1}^{N} g_i(\hat{\beta}_k)\, g_i(\hat{\beta}_k)^T \right]^{-1} = \left( G^T G \right)^{-1} \qquad (4.32)$$

This algorithm can be applied in R using the mlogit function for maximum likelihood estimation of the multinomial logit model with the BHHH method specified: if method = 'bhhh' is specified, $A$ is the outer product of the contributions of each individual to the gradient.
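A hedged sketch of the corresponding call, again reusing the ml_data object from the earlier examples:

library(mlogit)
fit_bhhh <- mlogit(anemia ~ 0 | cage + region + melevel, data = ml_data,
                   method = "bhhh")  # A_k from outer products of the scores
summary(fit_bhhh)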

4.9 Gradient Descent with Prior

Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

Now we can do gradient descent on $\beta$, where the gradient is the vector of partial derivatives of the sum of the log-likelihood function and the log prior:

$$g_{c,i} = \frac{\partial}{\partial \beta_{c,i}} \left[ \log P(D \mid \beta) + \log P(\beta) \right] \qquad (4.33)$$

For each type of prior, the gradient component is:

$$g_{\text{Gaussian}} = \sum_{j \le n} x_{j,i} \left[ I(c = c_j) - p(c \mid x_j, \beta) \right] - \frac{\beta_{c,i}}{\sigma_i^2}$$

$$g_{\text{Laplace}} = \sum_{j \le n} x_{j,i} \left[ I(c = c_j) - p(c \mid x_j, \beta) \right] - \begin{cases} \sqrt{2}/\sigma_i & \text{if } \beta_{c,i} > 0 \\ -\sqrt{2}/\sigma_i & \text{if } \beta_{c,i} < 0 \end{cases} \qquad (4.34)$$

$$g_{\text{Cauchy}} = \sum_{j \le n} x_{j,i} \left[ I(c = c_j) - p(c \mid x_j, \beta) \right] - \frac{2 \beta_{c,i}}{\beta_{c,i}^2 + \sigma_i^2}$$

$$g_{\text{Uniform}} = \sum_{j \le n} x_{j,i} \left[ I(c = c_j) - p(c \mid x_j, \beta) \right] \qquad (4.35)$$

Gradient descent algorithms initialize the parameter vector and then descend along the gradient of the error function until reaching a minimum error.

4.9.1 Stochastic Gradient Descent

Stochastic gradient descent (SGD) is a gradient descent optimization method for minimizing an objective function that is written as a sum of differentiable functions. SGD calculates the log-likelihood of the corpus at the end of an epoch, after the parameters have been updated. The log-likelihood of the corpus plus the log prior may be computed on the fly rather than in a standalone loop; in particular, the value of $p(c_j \mid x_j, \beta)$ computed in the inner loop may be reused, and the contribution of the prior added at the end of the loop. The regularization may be clipped so that subtracting the regularization gradient never changes the sign of a parameter. The original inner loop (Carpenter, 2008) is:

$$\beta_c \leftarrow \beta_c + \eta_e\, \nabla_{\beta_c} \left[ \log P(D \mid \beta) + \log P(\beta) \right] \qquad (4.36)$$

This may be modified to increment the likelihood gradient and then clip the regularization increment:

$$\beta_c \leftarrow \beta_c + \eta_e\, \nabla_{\beta_c} \log P(D \mid \beta)$$
$$\text{if } \beta_c < 0: \quad \beta_c \leftarrow \min\left( 0,\; \beta_c + \eta_e\, \nabla_{\beta_c} \log P(\beta) \right); \quad \text{else}: \quad \beta_c \leftarrow \max\left( 0,\; \beta_c + \eta_e\, \nabla_{\beta_c} \log P(\beta) \right) \qquad (4.37)$$

The inner loop is the stochastic update of the parameters along the contribution to the gradient of a single training case $(x_j, c_j)$:

$$\beta_c \leftarrow \beta_c + \eta_e\, g_{\text{prior}} \qquad (4.38)$$

where the gradient is expanded as (Carpenter, 2008):

$$g_{\text{prior}} = x_j \left[ I(c = c_j) - p(c \mid x_j, \beta) \right] + \frac{\nabla_{\beta_c} \log P(\beta)}{n} \qquad (4.39)$$

The update from the gradient of the log-likelihood only touches dimensions for which the input vector $x_j$ is non-zero. Thus, if the inputs are sparse, the gradient update for the log-likelihood is sparse. In the maximum likelihood setting there is no gradient for the prior, and thus the updates remain sparse; but in the case of regularization, the gradient of the log prior density is non-zero at every non-zero parameter dimension (Carpenter, 2008).
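The sketch below illustrates one epoch of the stochastic update (4.38)-(4.39) for the binary case with a Gaussian prior; the function name, arguments, and the fixed learning rate eta are illustrative assumptions.

sgd_epoch <- function(X, y, beta, eta = 0.1, sigma2 = 1) {
  n <- nrow(X)
  for (j in sample(n)) {               # visit the cases in random order
    p <- 1 / (1 + exp(-sum(X[j, ] * beta)))   # p(c | x_j, beta)
    # single-case likelihood gradient plus 1/n of the Gaussian prior gradient
    g <- X[j, ] * (y[j] - p) - beta / (sigma2 * n)
    beta <- beta + eta * g             # ascent on the log posterior
  }
  beta
}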

4.9.2 Lazy Regularized Stochastic Gradient Descent

A simple solution to deal with the density of the regularization updates is to batch them up and only perform them every so often. With a batch size larger than the feature density, this will actually be the most efficient approach; the downside is the loss in stochasticity of the search (Carpenter, 2008). Even though the parameter vectors are regularized at each step, only the non-zero dimensions of an input $x_j$ need be accessed. The regularizations may thus be computed in a lazy fashion, just before they are needed.

The vector $u$ of dimensionality $d$ stores in $u_i$ the last epoch in which dimension $i$'s parameter value was shrunk by regularization. Then, when an input $x_j$ with $x_{j,i} \neq 0$ arises, the parameters $\beta_{c,i}$ for all categories $c$ are caught up with the regularization scheme by applying the regularization updates accumulated over the skipped epochs $q' = u_i, \ldots, q-1$:

$$\beta_{c,i} \leftarrow \beta_{c,i} + \sum_{q' = u_i}^{q-1} \frac{\eta_{q'}}{n}\, \nabla_{\beta_{c,i}} \log p(\beta_{c,i}) \qquad (4.40)$$

The standard regularized stochastic gradient descent algorithm is not equivalent to regularizing all at once. Suppose a coefficient starts at value $\beta^{(0)}_{c,i}$. After one step of stochastic regularization using a Gaussian prior with variance $\sigma^2$, assuming for simplicity a learning rate of $\eta_i = 1$, the value will be:

$$\beta^{(1)} = \beta^{(0)} \left( 1 - \frac{1}{n \sigma^2} \right)$$

$$\beta^{(2)} = \beta^{(1)} \left( 1 - \frac{1}{n \sigma^2} \right)$$

and in general, at the $k$-th step:

$$\beta^{(k)} = \beta^{(0)} \left( 1 - \frac{1}{n \sigma^2} \right)^{k} \qquad (4.41)$$

Thus, a single update regularizing at a rate of $k/n$ does not produce the same result as $k$ updates at the rate of $1/n$. In the algorithm, there is no nonlinear adjustment for batch size. The lower amount of stochasticity simply approaches the non-stochastic regularization, which should actually be more accurate than the stochastic updates.

For the Laplace prior, the gradient does not depend on the current value of $\beta$, so a stochastic update of size $k/n$ is the same as $k$ stochastic updates of batch size $1/n$:

$$\beta^{(k)} = \beta^{(0)} - \frac{k \sqrt{2}}{n \sigma} \qquad (4.42)$$

Even clipping to zero has the same effect with the Laplace prior (Carpenter, 2008).

4.10 Coordinate descent

Coordinate descent is a non-derivative optimization algorithm. To find a local minimum of a function, we do a line search along one coordinate direction at the current point in each iteration, and we use different coordinate directions cyclically throughout the procedure, based on the idea that the minimization of a multivariable function $F(x)$ can be achieved by minimizing it along one direction at a time. Instead of varying the descent direction according to the gradient, one fixes the descent directions at the outset: for instance, we choose some basis as the search directions, $e_1, e_2, \ldots, e_n$. We then cyclically iterate through each direction, one at a time, minimizing the objective function with respect to that coordinate direction. It follows that, if $x^k$ is given, the $i$-th coordinate of $x^{k+1}$ is given by

$$x_i^{k+1} = \arg\min_{y \in \mathbb{R}} f\left( x_1^{k+1}, \ldots, x_{i-1}^{k+1},\; y,\; x_{i+1}^{k}, \ldots, x_n^{k} \right) \qquad (4.43)$$

Thus, we begin with an initial guess $x^0$ for a local minimum of $F$ and obtain a sequence $x^0, x^1, x^2, \ldots$ iteratively. By doing a line search in each iteration, we automatically have

$$F(x^0) \ge F(x^1) \ge F(x^2) \ge \cdots$$

Typically, a coordinate descent method sequentially goes through all variables and then repeats the same process. By solving the regression problem along an entire path of regularization values, this method efficiently computes the regularized coefficient paths. A coordinate descent method used in ridge estimation for multinomial logit models can be applied in R via the glmnet function (Friedman et al., 2010).
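A hedged sketch of this route in R, assuming a numeric model matrix x and the outcome factor y:

library(glmnet)
fit   <- glmnet(x, y, family = "multinomial")      # entire regularization path
cvfit <- cv.glmnet(x, y, family = "multinomial")   # choose lambda by CV
coef(cvfit, s = "lambda.min")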

4.11 Iterative Scaling

Iterative scaling is a lower-bound method for finding the likelihood maximum. At each step, we construct a simple lower bound to the likelihood and then move to the maximum of this bound. Iterative scaling produces a lower bound which is additive in the parameters $\beta_k$, which means we have the option to update one or all of them at each step. The algorithm requires that all feature values be non-negative: $x_{ik} \ge 0$.

Define $s = \max_i \sum_k x_{ik}$. Unlike most derivations of iterative scaling, we will not require $\sum_k x_{ik} = 1$. Iterative scaling is based on the following two bounds:

$$\log(x) \le \frac{x}{x_0} - 1 + \log(x_0) \quad \text{for any } x_0 > 0 \qquad (4.44)$$

$$\exp\left( \sum_k q_k \lambda_k \right) \le \sum_k q_k \exp(\lambda_k) + \left( 1 - \sum_k q_k \right) \qquad (4.45)$$

for any $q_k \ge 0$ satisfying $\sum_k q_k \le 1$.

The second bound comes from Jensen's inequality applied to the convex function $e^x$:

$$\exp\left( \sum_k q_k \lambda_k \right) \le \sum_k q_k \exp(\lambda_k), \quad \text{if } \sum_k q_k = 1.$$

Now let some of the $\lambda_k = 0$ to get

$$\exp\left( \sum_k q_k \lambda_k \right) \le \sum_k q_k \exp(\lambda_k) + \left( 1 - \sum_k q_k \right), \quad \text{if } \sum_k q_k \le 1.$$

Start by writing the likelihood in an asymmetric way:

$$P(y \mid X, \beta) = \prod_{i \mid y_i = 1} \frac{\exp(\beta^T x_i)}{1 + \exp(\beta^T x_i)} \prod_{i \mid y_i = -1} \frac{1}{1 + \exp(\beta^T x_i)} \qquad (4.46)$$

Applying the first bound at the current parameter values $\beta_0$, we obtain that the log-likelihood function is bounded by

$$\log p(y \mid X, \beta) = \sum_{i \mid y_i = 1} \beta^T x_i - \sum_i \log\left( 1 + \exp(\beta^T x_i) \right)$$

$$\ge \log p(y \mid X, \beta_0) + \sum_{i \mid y_i = 1} (\beta - \beta_0)^T x_i + \sum_i \left( 1 - \frac{1 + \exp(\beta^T x_i)}{1 + \exp(\beta_0^T x_i)} \right)$$

$$= \log p(y \mid X, \beta_0) + \sum_{i \mid y_i = 1} (\beta - \beta_0)^T x_i + \sum_i \sigma(\beta_0^T x_i) \left( 1 - \exp\left( (\beta - \beta_0)^T x_i \right) \right)$$

Maximizing this bound over $\beta$ is still too hard, so we apply the second bound, with $q_k = x_{ik}/s$, remembering that $x_{ik} \ge 0$:

$$\exp\left( (\beta - \beta_0)^T x_i \right) = \exp\left( \sum_k (\beta_k - \beta_{0k}) x_{ik} \right) \le \sum_k \frac{x_{ik}}{s} \exp\left( (\beta_k - \beta_{0k}) s \right) + \left( 1 - \sum_k \frac{x_{ik}}{s} \right)$$

Note that $s$ was chosen to ensure $\sum_k q_k \le 1$. This bound reduces the algorithm to a one-dimensional maximization, for each $k$, of

$$g(\beta_k) = \sum_{i \mid y_i = 1} \beta_k x_{ik} - \sum_i \sigma(\beta_0^T x_i)\, \frac{x_{ik}}{s} \exp\left( (\beta_k - \beta_{0k}) s \right) \qquad (4.47)$$

Setting the gradient with respect to $\beta_k$ to zero,

$$\frac{d g(\beta_k)}{d \beta_k} = \sum_{i \mid y_i = 1} x_{ik} - \sum_i \sigma(\beta_0^T x_i)\, x_{ik} \exp\left( (\beta_k - \beta_{0k}) s \right) = 0,$$

we get

$$\exp\left( (\beta_k - \beta_{0k}) s \right) = \frac{\sum_{i \mid y_i = 1} x_{ik}}{\sum_i \sigma(\beta_0^T x_i)\, x_{ik}}$$

$$\beta_k = \beta_{0k} + \frac{1}{s} \log \frac{\sum_{i \mid y_i = 1} x_{ik}}{\sum_i \sigma(\beta_0^T x_i)\, x_{ik}} \qquad (4.48)$$

This is one possible iterative scaling update. Note that we could have written a different asymmetric likelihood:

1 exp(T x ) P( y | X , )  i (4.49) 1 exp( TTxx ) 1  exp(  ) i| yii 1ii i | y  1 which would have led to the update

1 (T xx )  0 i ik i exp((kk0 )s ) (4.50)  x ik iy|1i  So the fair thing to do is combine the two updates:

T x ik 1 (xx )   0 i ik iy|1 exp(( )s ) i i (4.51) kk0 T  x ik ()xx  0 i ik iy|1i  i This is the iterative scaling update used in practice (Nigam et al., 1999; Collins et al., 2002), and it converges faster than either of the two asymmetric updates. Note that the first term is constant throughout the iteration and can be precomputed, (Huang et al., 2010). 70
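One step of the asymmetric update (4.48) can be sketched for the binary case as follows, with labels coded 0/1 and non-negative features; the names are illustrative.

iter_scaling_step <- function(X, y, beta) {
  s   <- max(rowSums(X))                       # s = max_i sum_k x_ik
  p   <- 1 / (1 + exp(-drop(X %*% beta)))      # sigma(beta_0^T x_i)
  num <- colSums(X[y == 1, , drop = FALSE])    # sum over cases with y_i = 1
  den <- colSums(p * X)                        # sum_i sigma(beta_0^T x_i) x_ik
  beta + log(num / den) / s                    # update (4.48)
}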

4.12 Modified Iterative Scaling

We can get a different iterative scaling algorithm by applying the same bounds to the symmetric likelihood:

$$P(y \mid X, \beta) = \prod_i \frac{1}{1 + \exp(-y_i \beta^T x_i)}$$

Applying the first bound (4.44) at the current parameter values $\beta_0$, we obtain that the log-likelihood function is bounded by

$$\log P(y \mid X, \beta) = -\sum_i \log\left( 1 + \exp(-y_i \beta^T x_i) \right)$$

$$\ge \log P(y \mid X, \beta_0) + \sum_i \left( 1 - \frac{1 + \exp(-y_i \beta^T x_i)}{1 + \exp(-y_i \beta_0^T x_i)} \right)$$

$$= \log P(y \mid X, \beta_0) + \sum_i \left( 1 - \sigma(y_i \beta_0^T x_i) \right) \left( 1 - \exp\left( -y_i (\beta - \beta_0)^T x_i \right) \right) \qquad (4.52)$$

Maximizing this bound over $\beta$ is still too hard, so we apply the second bound, with $q_k = x_{ik}/s$, remembering that $x_{ik} \ge 0$; then

$$\exp\left( -y_i (\beta - \beta_0)^T x_i \right) \le \sum_k \frac{x_{ik}}{s} \exp\left( -y_i s (\beta_k - \beta_{0k}) \right) + \left( 1 - \sum_k \frac{x_{ik}}{s} \right) \qquad (4.53)$$

The algorithm reduces to a one-dimensional maximization, for each $k$, of

$$g(\beta_k) = -\sum_i \left( 1 - \sigma(y_i \beta_0^T x_i) \right) \frac{x_{ik}}{s} \exp\left( -y_i s (\beta_k - \beta_{0k}) \right)$$

The gradient with respect to $\beta_k$ is

$$\frac{d g(\beta_k)}{d \beta_k} = \sum_i \left( 1 - \sigma(y_i \beta_0^T x_i) \right) y_i x_{ik} \exp\left( -y_i s (\beta_k - \beta_{0k}) \right)$$

Multiplying both sides by $\exp\left( s (\beta_k - \beta_{0k}) \right)$ and solving for $\beta_k$, we get

$$\exp\left( 2 s (\beta_k - \beta_{0k}) \right) = \frac{\sum_{i \mid y_i = 1} \left( 1 - \sigma(y_i \beta_0^T x_i) \right) x_{ik}}{\sum_{i \mid y_i = -1} \left( 1 - \sigma(y_i \beta_0^T x_i) \right) x_{ik}} \qquad (4.54)$$

To allow negative feature values, define $s = \max_i \sum_k |x_{ik}|$ and use $q_k = |x_{ik}|/s$ to get

$$g(\beta_k) = -\sum_i \left( 1 - \sigma(y_i \beta_0^T x_i) \right) \frac{|x_{ik}|}{s} \exp\left( -y_i\, \mathrm{sign}(x_{ik})\, s (\beta_k - \beta_{0k}) \right)$$

$$\frac{d g(\beta_k)}{d \beta_k} = \sum_i \left( 1 - \sigma(y_i \beta_0^T x_i) \right) y_i x_{ik} \exp\left( -y_i\, \mathrm{sign}(x_{ik})\, s (\beta_k - \beta_{0k}) \right) = 0$$

Then we get:

$$\exp\left( 2 s (\beta_k - \beta_{0k}) \right) = \frac{\sum_{i \mid y_i x_{ik} > 0} \left( 1 - \sigma(y_i \beta_0^T x_i) \right) |x_{ik}|}{\sum_{i \mid y_i x_{ik} < 0} \left( 1 - \sigma(y_i \beta_0^T x_i) \right) |x_{ik}|} \qquad (4.55)$$

This update rule was also given by Collins et al. (2002), using a boosting argument. The algorithm has been called Modified Iterative Scaling (Berger, 1997).

4.13 The modeling approach

The two models described above will be applied to a real dataset on child anemia in the Palestinian society. The first model is a standard multinomial logit model and the second is a three-layer perceptron with softmax outputs and six neurons in the hidden layer (two neurons for each alternative). The network is partially connected and certain weights are shared.

4.13.1 Model performances and specification diagnostics

Several measures of quality of fit and parameter significance can be computed on real and artificial datasets. In this study we shall use R², U², and t-statistics for the estimated coefficients:

• The R² measures the amount of variance (in choice probability) that is explained by the model.

• The U² measures the amount of uncertainty (i.e. the entropy) about the choice that is explained by the model. It is expressed as the ratio of the explained entropy to the entropy of the unconditional choice probabilities.

• The t-statistics measure coefficient significance. The t-statistic is the ratio between the value of the coefficient and the value of its standard error; in other terms, the t-value measures how many standard errors the coefficient is away from 0.

4.14 Summary:

This chapter reviewed different numerical algorithms for finding the maximum-likelihood estimates of the coefficients. The Newton–Raphson algorithm proved to be the most popular for estimation. In general, statistical programs (SPSS, R, Minitab, SAS, Systat) use different algorithms for estimation.

Systat:

LOGIT uses Gauss–Newton methods for maximizing the likelihood. By default, two tolerance criteria must be satisfied: the maximum value of the relative coefficient changes must fall below 0.001, and the Euclidean norm of the relative parameter change vector must also fall below 0.001. By default, LOGIT uses the second derivative matrix to update the parameter vector. In discrete choice models, it may be preferable to use a first derivative approximation to the Hessian instead. This option, popularized by Berndt et al. (1974), will be noted if it is used by the program. BHHH uses the summed outer products of the gradient vector in place of the Hessian matrix and generally converges much more slowly than the default method.

Minitab:

Both logistic and least squares regression methods estimate parameters so that the fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain parameter estimates, whereas logistic regression obtains maximum likelihood estimates of the parameters using an iteratively reweighted least squares algorithm.

SPSS:

A multinomial logit model fits the full factorial model or a user-specified model. Parameter estimation is performed through an iterative maximum-likelihood algorithm. To obtain the maximum likelihood estimate of B, a Newton–Raphson iterative estimation method is used. Notice that this method is the same as the Fisher-scoring iterative estimation method for this model, since the expectation of the second derivative of the log-likelihood with respect to B is the same as the observed one.

R software

A number of recent R packages have focused on slightly different aspects of estimating regularized multinomial logistic regressions. For example, package glmnet (Friedman et al., 2010) obtains entire L1-regularized paths and uses the coordinate descent algorithm with 'warm starts'; package maxent (Jurka, 2012) is intended for large text-classification problems, which typically have very sparse data; and package pmlr (Colby et al., 2010) penalizes the likelihood function with the Jeffreys prior to reduce first-order bias and works well for small to medium-sized datasets. There are also R packages which estimate plain (unregularized) multinomial regression models. Some examples are the VGAM package (Yee, 2010), the multinom function in package nnet (Venables and Ripley, 2002), package mlogit (Croissant, 2012), package mnlogit (Hasan et al., 2014), and package mlogitBMA (Sevcikova & Raftery, 2013).

SAS:

We can easily fit the multinomial logit model using PROC LOGISTIC with the link=GLOGIT option. We can also use PROC CATMOD and PROC GENMOD.

In the next chapter we analyze a real dataset using the multinomial logistic regression classification method in several statistical programs, in order to compare some estimation methods and numerical algorithms.

Chapter (5)

Application of the Multinomial Logistic Regression on Classifying Anemic Palestinian Children

5.1 Introduction:

In the previous three chapters, the multinomial logistic regression (MLR) model and various methods of estimation of its parameters were discussed. Each estimation method requires the use of one or more numerical algorithms, and the algorithms implemented for this purpose were discussed in chapter 4. In this chapter we analyze a real dataset as an application of the MLR modeling and estimation methods discussed earlier and conduct a comparison between those methods. We start by describing the variables in the data and illustrating some of their important features; then we build the statistical model and estimate its parameters using all the methods described in the previous chapters. Further, we test the significance of the parameters and identify the best subset of independent variables for the MLR model. The main target is to conduct a comparison between the five methods and algorithms that are used for analyzing categorical data using MLR models, and to identify the most important risk factors for the categories of the response variable.

5.2 Description of the variables

5.2.1 Dependent variable:

The dependent variable that has been used as the response variable, for the purpose of illustrating the multinomial logistic regression model's estimation and testing procedures, is the anemia variable, which has four categories.

Table (5.1): Frequencies of the dependent variable categories (Anemia)

Category          Frequency   Percent   Valid Percent   Cumulative Percent
normal            1908        79.3      79.3            79.3
mild anemia       305         12.7      12.7            92.0
moderate anemia   188         7.8       7.8             99.8
severe anemia     5           0.2       0.2             100.0
Total             2406        100.0     100.0

Table (5.1) demonstrates that 79.3% of the children are normal, 12.7% suffer from mild anemia, 7.8% from moderate anemia, and only 0.2% from severe anemia.

5.2.2 The independent variables

Ten independent variables, among the many variables included in the survey, have been selected for inclusion in the multinomial logistic regression model to illustrate the estimation and testing procedures of the model. They were selected because they are thought to be related to the selected response variable; the other variables are thought to be irrelevant to the present study.

Table (5.2) describes the important variables in the study which were selected for inclusion in the multinomial logistic regression model. All the independent variables are categorical except two numeric variables, cage and BMI (PCBS, 2010).

Table (5.2): Description of the independent variables

Variable name   Description                                                       Value labels
BF1             Child ever been breastfed                                         1. Yes  2. No  3. DK
IM18            Child given Vitamin A dose within last 6 months                   1. Yes  2. No  3. DK
PIM2            Has (NAME) received an iron syrup constantly after 6 months
                and for 1 year?                                                   1. Yes  2. No  3. DK
cage            Age (months)                                                      1-59 months
HH6             Area                                                              1. Urban  2. Rural  3. Camps
HL4             Sex                                                               1. Male  2. Female
melevel         Mother's education                                                1. None  2. Primary  3. Secondary+
region          Region                                                            1. West Bank  2. Gaza
windex5         Wealth index quintiles                                            1. Poorest  2. Second  3. Middle  4. Fourth  5. Richest
BMI             Body Mass Index                                                   6-42

Now, the first independent variable is "sex" (HL4), which indicates the sex of the child. The frequencies of the anemic status of children according to their sex categories are shown in table (5.3) below. The figures demonstrate that normal males constitute 39.7% of all children compared to 39.6% for normal females; mild-anemia males constitute 5.2% compared to 7.5% for mild-anemia females; moderate-anemia males constitute 4% compared to 3.8% for moderate-anemia females; and severe-anemia males constitute 0.2% compared to 0.04% for severe-anemia females.

Table (5.3): Comparison between Anemia and Sex

anemia            Male   Female   Total
normal            956    952      1908
mild anemia       125    180      305
moderate anemia   96     92       188
severe anemia     4      1        5
Total             1181   1225     2406

Table (5.4) demonstrates that normal children (who do not suffer from anemia) in the West Bank represent 46% of the total children compared to 33.3% in Gaza; children who suffer from mild anemia in the West Bank represent 5.4% of the total children compared to 7.3% in Gaza; moderate anemia cases in the West Bank represent 2.5% of the total children compared to 5.3% in Gaza; and children who suffer from severe anemia in the West Bank represent 0.2% of the total children compared to 0% in Gaza.

Table (5.4): Comparison between Anemia and Region

anemia            West Bank   Gaza Strip   Total
normal            1106        802          1908
mild anemia       130         175          305
moderate anemia   61          127          188
severe anemia     5           0            5
Total             1302        1104         2406

Table (5.5) demonstrates that normal children who have been breastfed represent 75.9% of the total children compared to 3.4% for those who have not been breastfed; mild anemia cases who have been breastfed represent 12.3% of the total children compared to 0.4% for those who have not; moderate anemia cases who have been breastfed represent 7.1% of the total compared to 0.7% for those who have not; and severe anemia cases who have been breastfed represent 0.04% of the total compared to 0.2% for those who have not.

Table (5.5): Comparison between Anemia and Child ever been breastfed

anemia            Yes    No    Total
normal            1827   81    1908
mild anemia       296    9     305
moderate anemia   170    18    188
severe anemia     1      4     5
Total             2294   112   2406

Table (5.6) demonstrates that normal poorest children constitute 18.2% of the total children, compared to 19% for the normal second quintile, 15.9% for the normal middle quintile, 17% for the normal fourth quintile, and 9.3% for the normal richest quintile. Mild anemia cases constitute 4.3% of the total in the poorest quintile, compared to 3% in the second, 2.5% in the middle, 2% in the fourth, and 0.9% in the richest. Moderate anemia cases constitute 3.3% of the total in the poorest quintile, compared to 2% in the second, 1.1% in the middle, 0.9% in the fourth, and 0.5% in the richest. Severe anemia cases constitute 0.04% of the total in the poorest quintile, compared to 0.1% in the second, 0.1% in the middle, 0% in the fourth, and 0% in the richest.

Table (5.6): Comparison between Anemia and Wealth index quintiles

anemia            Poorest   Second   Middle   Fourth   Richest   Total
normal            437       457      382      409      223       1908
mild anemia       103       73       61       47       21        305
moderate anemia   79        49       27       21       12        188
severe anemia     1         2        2        0        0         5
Total             620       581      472      477      256       2406

Table (5.7) demonstrates that normal children of non-educated mothers constitute 9.5% of the total children, compared to 35.5% for normal children of primary-educated mothers and 34.3% for normal children of secondary-and-above-educated mothers. Mild anemia cases of non-educated mothers constitute 1.8% of the total children, compared to 8.1% for primary-educated mothers and 2.8% for secondary-and-above-educated mothers. Moderate anemia cases of non-educated mothers constitute 2.2% of the total children, compared to 3.3% for primary-educated mothers and 2.3% for secondary-educated mothers. Severe anemia cases of non-educated mothers constitute 0.1% of the total children, compared to 0.12% for primary-educated mothers and 0% for secondary-and-above-educated mothers.

Table (5.7): Comparison between Anemia and Mother's education

anemia            None   Primary   Secondary+   Total
normal            228    855       825          1908
mild anemia       44     194       67           305
moderate anemia   52     80        56           188
severe anemia     2      3         0            5
Total             326    1132      948          2406

5.3 Statistical Analysis of Anemia Cases Using an Iterative Maximum Likelihood Method:

The purpose of the multinomial logistic regression procedure is to model the dependence of a nominal categorical response on a set of discrete and/or continuous predictor variables. The NOMREG algorithm used by SPSS obtains the maximum likelihood estimate of β with a Newton-Raphson iterative estimation method. Notice that this method is the same as the Fisher-scoring iterative estimation method for this model, since the expectation of the second derivative of l with respect to β is the same as the observed one.

• Building the MLR model

In this section we perform an MLR analysis on the anemic-children data set. There is a good relationship between a combination of the independent variables and the dependent variable, based on the statistical significance of the final model's chi-square test.

Table (5.8): Model Fitting Information (an iterative MLE method)

Model            AIC        BIC        -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   2778.010   2795.367   2772.010
Final            2498.710   2689.639   2432.710            339.299      30   .000

The model fitting information in table (5.8) shows various indices for assessing the intercept-only model (sometimes referred to as the null model) and the final model, which includes all the predictors and the intercept (the full model). Both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are information-theory-based model fit statistics. Lower values indicate better model fit and both can be below zero (larger negative values indicate better fit than values closer to zero); the BIC tends to be more conservative. Similarly, the -2 Log Likelihood (-2LL) should be lower for the full model (2432.710) than for the null model (2772.010), since the -2LL reflects the unexplained variation in the outcome variable: the smaller the value, the better the fit. The likelihood ratio chi-square test is an alternative test of goodness of fit; as with most chi-square based tests, however, it is prone to inflation as sample size increases. Here, the model fit is significant, χ²(30) = 339.299, p < 0.001, which indicates that our full model predicts significantly better, or more accurately, than the null model.

The goodness-of-fit test in table (5.9) provides further evidence of good fit for our model. Again, both the Pearson and deviance statistics are chi-square based methods and subject to inflation with large samples. Here, we interpret lack of significance as indicating good fit: we want the p-value to be greater than the established cutoff (generally 0.05).

Table (5.9): Goodness-of-Fit (an iterative MLE method)

           Chi-Square   df     Sig.
Pearson    4096.269     4911   1.000
Deviance   2111.728     4911   1.000

Table (5.10) shows the proportion of the variation in the dependent variable explained by the independent variables. According to the results shown in table (5.10), the independent variables explain 13.2% of the variance in the dependent variable according to the Cox & Snell R² value, 18% according to the Nagelkerke R² value, and 10.7% according to the McFadden R² value.

Table (5.10): Pseudo R-Square (an iterative MLE method)

Cox and Snell   .132
Nagelkerke      .180
McFadden        .107

Table (5.11) presents the likelihood ratio tests that evaluate the overall relationship between each independent variable and the dependent variable. We checked this point separately for every explanatory variable used to build the model, and the results support the existence of a relationship between each of the explanatory variables and the response variable. According to the results shown in table (5.11), there is a statistically significant relationship between each of the independent variables and the dependent variable. Since the dependent variable has four categories, the MLR model is composed of three equations: the three anemic categories are compared with the normal category of children, which is taken as the reference category. Table (5.12) presents the estimates of the parameters of the three equations of the model as well as their odds ratios and indicators of the significance of each estimate.

Table (5.11): Likelihood Ratio Tests (an iterative MLE method)

Effect      AIC of Reduced   BIC of Reduced   -2LL of Reduced   Chi-Square   df   Sig.
Intercept   2498.710         2689.639         2432.710          .000         0    .
cage        2605.360         2778.932         2545.360          112.650      3    .000
BF1         2518.516         2692.088         2458.516          25.806       3    .000
melevel     2560.868         2717.083         2506.868          74.158       6    .000
region      2552.675         2726.246         2492.675          59.964       3    .000
windex5     2499.302         2620.802         2457.302          24.592       12   .017
HL4         2501.742         2675.313         2441.742          9.032        3    .029

Table (5.12): Parameter Estimates (an iterative MLE method)

mild anemia
              B         Std. Error   Wald     df   Sig.    Exp(B)
Intercept     -1.972    .450         19.168   1    .000
cage          -.027     .004         50.533   1    .000    .973
[BF1=1]       .514      .366         1.970    1    .160    1.672
[BF1=2]       0b
[region=1]    -.699     .132         27.979   1    .000    .497
[region=2]    0b
[melevel=1]   .760      .215         12.523   1    .000    2.139
[melevel=2]   1.085     .157         47.935   1    .000    2.960
[melevel=3]   0b
[windex5=1]   .411      .265         2.403    1    .121    1.508
[windex5=2]   .177      .270         .431     1    .512    1.194
[windex5=3]   .197      .275         .512     1    .474    1.218
[windex5=4]   .016      .282         .003     1    .955    1.016
[windex5=5]   0b
[HL4=1]       -.268     .129         4.297    1    .038    .765
[HL4=2]       0b

moderate anemia
Intercept     -.829     .443         3.501    1    .061
cage          -.041     .005         64.489   1    .000    .960
[BF1=1]       -.718     .293         6.001    1    .014    .488
[BF1=2]       0b
[region=1]    -.971     .172         32.016   1    .000    .379
[region=2]    0b
[melevel=1]   .992      .220         20.281   1    .000    2.697
[melevel=2]   .356      .192         3.442    1    .064    1.427
[melevel=3]   0b
[windex5=1]   .723      .336         4.622    1    .032    2.060
[windex5=2]   .329      .344         .915     1    .339    1.390
[windex5=3]   -.020     .367         .003     1    .956    .980
[windex5=4]   -.265     .380         .488     1    .485    .767
[windex5=5]   0b
[HL4=1]       .176      .161         1.195    1    .274    1.192
[HL4=2]       0b

severe anemia
Intercept     -55.695   10892.709    .000     1    .996
cage          -.067     .038         3.146    1    .076    .935
[BF1=1]       -4.402    1.249        12.421   1    .000    .012
[BF1=2]       0b
[region=1]    18.349    .000         .        1    .       93042758.436
[region=2]    0b
[melevel=1]   18.829    5176.243     .000     1    .997    150354928.012
[melevel=2]   18.099    5176.243     .000     1    .997    72500929.077
[melevel=3]   0b
[windex5=1]   17.336    9584.237     .000     1    .999    33791704.264
[windex5=2]   18.201    9584.237     .000     1    .998    80282726.553
[windex5=3]   17.512    9584.237     .000     1    .999    40288659.680
[windex5=4]   .673      12004.674    .000     1    1.000   1.960
[windex5=5]   0b
[HL4=1]       1.864     1.288        2.095    1    .148    6.447
[HL4=2]       0b

b: this parameter is set to zero because it is the reference category.

From the above table, the estimates of the parameters of the three equations of the model are as follows:

First Model (mild anemia):
Intercept     cage          [BF1=1]       [region=1]    [melevel=1]   [melevel=2]
-1.972        -.027         .514          -.699         .760          1.085
[windex5=1]   [windex5=2]   [windex5=3]   [windex5=4]   [HL4=1]
.411          .177          .197          .016          -.268

Second Model (moderate anemia):
Intercept     cage          [BF1=1]       [region=1]    [melevel=1]   [melevel=2]
-.829         -.041         -.718         -.971         .992          .356
[windex5=1]   [windex5=2]   [windex5=3]   [windex5=4]   [HL4=1]
.723          .329          -.020         -.265         .176

Third Model (severe anemia):
Intercept     cage          [BF1=1]       [region=1]    [melevel=1]   [melevel=2]
-55.695       -.067         -4.402        18.349        18.829        18.099
[windex5=1]   [windex5=2]   [windex5=3]   [windex5=4]   [HL4=1]
17.336        18.201        17.512        .673          1.864

• Correct classification rates for the MLR model

Correct classification rates from the above model for the iterative maximum-likelihood method are shown in Table (5.13).

Table (5.13): Classification rates using an iterative MLE method

Observed             normal   mild anemia   moderate anemia   severe anemia   Percent Correct
normal               1901     6             2                 0               99.6%
mild anemia          297      7             0                 0               2.3%
moderate anemia      182      4             2                 0               1.1%
severe anemia        5        0             0                 0               0.0%
Overall Percentage   99.1%    0.7%          0.2%              0.0%            79.4%

Table (5.13) indicates that the model can predict anemic cases of children with an overall percentage accuracy of 79.4%.

• The ROC curve analysis of an iterative maximum-likelihood method

Table (5.14) shows that the area under the curve for normal Palestinian children is 0.6966663, for mild anemia cases 0.706944, for moderate anemia cases 0.6578066, and for severe anemia cases 0.5540915; the multi-class area under the curve is 0.5536.

Table (5.14): Area under the curve (an iterative MLE method)

normal      mild anemia   moderate anemia   severe anemia
0.6966663   0.706944      0.6578066         0.5540915
Multi-class area under the curve: 0.5536

The ROC curves have been drawn and exhibited in figure (5.1) below, and the area under the curve has been computed for the different classes of the dependent variable.

Figure (5.1): The ROC curve for an iterative maximum-likelihood method
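The "Multi-class area under the curve" figures reported in this chapter match the printout of pROC's multiclass.roc(); a hedged sketch of that computation, assuming a factor obs of observed classes and a matrix prob of fitted class probabilities (a matrix predictor is accepted in recent pROC versions), is:

library(pROC)
mroc <- multiclass.roc(obs, prob)  # multi-class AUC of Hand and Till (2001)
auc(mroc)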

5.4 Statistical Analysis of Anemia Cases Using Ridge Multinomial Regression

The ridge multinomial regression model fitting method has been described earlier in detail. To apply this method, a ridge multinomial regression function in R that returns an object with the fitted multinomial logistic regression for a nominal variable was used; it also fits the null model, so that model fits can be compared.

• Building the MLR model

In this section, we perform an MLR analysis on the anemic children's data set using ridge multinomial regression. There is a good relationship between a combination of the independent variables and the dependent variable, based on the statistical significance of the final model's chi-square. The final model contains the independent variables (BF1, region, cage, melevel, HL4, and windex5) and the dependent variable anemia.
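The exact ridge function used is not named in the text; a comparable hedged sketch uses glmnet with alpha = 0, which fits a ridge-penalized multinomial logistic regression to a model matrix x and the anemia factor y.

library(glmnet)
ridge_fit <- glmnet(x, y, family = "multinomial", alpha = 0)  # ridge penalty
cv_ridge  <- cv.glmnet(x, y, family = "multinomial", alpha = 0)
coef(cv_ridge, s = "lambda.min")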

Table (5.15): Model Fitting Information (ridge method)

Model            AIC        BIC        -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only                         3165.393
Final            993.7937   1115.294   951.7937            2213.6       18   .000

The model fitting information in table (5.15) contains the -2 Log Likelihood, which equals 3165.393 for the null model and 951.7937 for the full model; the Akaike information criterion is 993.7937 and the Bayesian information criterion is 1115.294. We can see that the model is significant, with χ²(18) = 2213.6 and p < .001, which indicates that our full model predicts significantly better, or more accurately, than the null model.

Table (5.16) shows the proportion of the variation in the dependent variable explained by the independent variables. According to the results shown in table (5.16), the independent variables explain 60.1% of the variation in the dependent variable according to both the Cox & Snell R² and Nagelkerke R² values, and 69.9% according to the McFadden R² value.

Table (5.16): Pseudo R-Square (ridge method)

Cox and Snell   0.6014942
Nagelkerke      0.6014942
McFadden        0.6993127

The estimates of the parameters of the three equations of the model are as follows:

First Model (mild anemia):
Intercept   BF1         region      cage       melevel     HL4         windex5
-4.36106    -5.91264    2.036081    -4.12358   -0.70928    -1.00526    3.925590

Second Model (moderate anemia):
Intercept   BF1         region      cage       melevel     HL4         windex5
-5.034404   -6.182310   -1.851448   2.299793   1.2857029   0.2938031   -2.968174

Third Model (severe anemia):
Intercept   BF1         region      cage       melevel     HL4         windex5
-7.629087   -1.859191   1.434944    2.398387   -0.92157    0.2053777   1.092213

• Correct classification rates for the MLR model

Correct classification rates from the above model, which employed the ridge multinomial regression method, are shown in Table (5.17).

Table (5.17): Classification table (ridge regression)

Observed             normal   mild anemia   moderate anemia   severe anemia   Percent Correct
normal               1855     31            22                0               97.2%
mild anemia          69       226           10                0               74.1%
moderate anemia      38       12            138               0               73.4%
severe anemia        0        0             0                 5               100%
Overall Percentage   81.5%    11.2%         7.1%              0.2%            92.44%

The classification table above shows how well our full model correctly classifies cases. A perfect model would show only values on the diagonal, correctly classifying all cases. Adding across a row gives the number of cases in each category in the actual data, and adding down a column gives the number of cases in each category as classified by the full model. The key piece of information is the overall percentage in the lower right corner, which shows that our model (with all predictors and the constant) is 92.44% accurate, which is excellent.

• The ROC curve analysis of the ridge method

Table (5.18) shows that the area under the curve for normal children is 0.9720366, for mild anemia cases 0.9756006, for moderate anemia cases 0.6341221, and for severe anemia cases 0.7418944; the multi-class area under the curve is 0.9238.

Table (5.18): Area under the curve (ridge method)

normal      mild anemia   moderate anemia   severe anemia
0.9720366   0.9756006     0.6341221         0.7418944
Multi-class area under the curve: 0.9238

The ROC curves have been drawn and exhibited in figure (5.2) below, and the area under the curve has been computed for the different classes of the dependent variable.

Figure (5.2): The ROC curve for Ridge method

5.5 Statistical Analysis of Anemia Cases Using the Neural Networks Method

Function multinom() in R is used to estimate the parameters of the multinomial logistic regression model via the neural networks method, using package nnet, which is designed for multinomial log-linear models.
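A hedged sketch of the call, assuming a data frame children holding the survey variables:

library(nnet)
fit_nn <- multinom(anemia ~ cage + BF1 + region + melevel + windex5 + HL4,
                   data = children)
summary(fit_nn)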

• Building the MLR model

In this section we perform an MLR analysis on the anemic-children data set. The model fitting information in table (5.19) shows various indices for assessing the intercept-only model (sometimes referred to as the null model) and the final model.

Table (5.19): Model Fitting Information (Neural Networks method)

Model            AIC        -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   3171.16    3165.160
Final            2891.861   2825.861            339.2993     30   .000

The -2 Log Likelihood is 3165.160 for the null model and 2825.861 for the full model; the Akaike information criterion is 3171.16 for the null model and 2891.861 for the full model. Lower values for the full model indicate better model fit. We see that the model fit is significant from the value of χ²(30) = 339.2993 with p < .001, which indicates that our full model predicts significantly better, or more accurately, than the null model.

Table (5.20) shows the proportion of the variation in the dependent variable explained by the independent variables. According to the results shown in table (5.20), the independent variables explain 13.2% of the variance in the dependent variable according to both the Cox & Snell R² and Nagelkerke R² values, and 10.7% according to the McFadden R² value.

Table (5.20): Pseudo R-Square (Neural Networks method)

Cox and Snell   0.1315299
Nagelkerke      0.1315299
McFadden        0.1071982

Table (5.21) exhibits the likelihood ratio tests that evaluate the overall relationship between each independent variable and the dependent variable. We checked this point separately for every explanatory variable used to build the model, and the results support the existence of a relationship between each of the explanatory variables and the response variable. According to the results shown in table (5.21), there is a statistically significant relationship between each of the independent variables and the dependent variable. Since the dependent variable has four categories, the MLR model is composed of three equations; the three anemic categories are compared with the normal category of children, which is taken as the reference category. Table (5.22) presents the estimates of the parameters of the three equations of the model together with their standard errors.

Table (5.21): Likelihood Ratio Tests (Neural Networks method)

Effect      Chi-Square   df   Pr(>Chisq)
Intercept   .000         0    .
cage        112.655      3    < 2.2e-16 ***
BF1         25.807       3    1.047e-05 ***
melevel     74.172       6    5.682e-14 ***
region      59.964       3    5.963e-13 ***
windex5     24.592       12   0.01688 *
HL4         9.032        3    5.963e-13 ***

Table (5.22): Parameter Estimates (Neural Networks method)

mild anemia
              B             Std. Error
Intercept     -1.2539070    0.2455807
cage          -0.02729505   0.003839871
[BF1=1]       0b
[BF1=2]       -0.5142648    0.3663444
[windex5=1]   0b
[windex5=2]   -0.2336945    0.1729731
[windex5=3]   -0.2137243    0.1835833
[windex5=4]   -0.3951475    0.1979987250
[windex5=5]   -0.4109643    2.650787e-01
[HL4=1]       0b
[HL4=2]       0.2682045     0.1293990
[region=1]    0b
[region=2]    0.6993210     1.322092e-01
[melevel=1]   0b
[melevel=2]   0.3248977     0.1891066
[melevel=3]   -0.7603125    0.2148803928

moderate anemia
Intercept     -0.6271665    0.2697205
cage          -0.04068738   0.005067780
[BF1=1]       0b
[BF1=2]       0.7183612     0.2932219
[windex5=1]   0b
[windex5=2]   -0.3935059    0.2036515
[windex5=3]   -0.7426690    0.2433805
[windex5=4]   -0.9879627    0.2677791327
[windex5=5]   -0.7226471    3.361177e-01
[HL4=1]       0b
[HL4=2]       -0.1759541    0.1608013
[region=1]    0b
[region=2]    0.9705813     1.715304e-01
[melevel=1]   0b
[melevel=2]   -0.6363589    0.2044008
[melevel=3]   -0.9921854    0.2202950660

severe anemia
Intercept     -3.7219590    1.6759746
cage          -0.06735841   0.037987840
[BF1=1]       0b
[BF1=2]       4.4029366     1.2493251
[windex5=1]   0b
[windex5=2]   0.8643423     1.4178935
[windex5=3]   0.1748935     1.5026423
[windex5=4]   -9.5901067    0.0001070037
[windex5=5]   -30.9322068   6.335186e-14
[HL4=1]       0b
[HL4=2]       -1.8634200    1.2875101
[region=1]    0b
[region=2]    -14.9053805   2.540490e-06
[melevel=1]   0b
[melevel=2]   0.7283733     1.1733902
[melevel=3]   -12.5929212   0.0000515873

b: this parameter is set to zero because it is the reference category.

From the above table, the estimates of the parameters of the three equations of the model are as follows:

First Model (mild anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-1.2539070    -0.02729505   -0.5142648    0.6993210     0.3248977     -0.7603125
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
-0.2336945    -0.2137243    -0.3951475    -0.4109643    0.2682045

Second Model (moderate anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-0.6271665    -0.04068738   0.7183612     0.9705813     -0.6363589    -0.9921854
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
-0.3935059    -0.7426690    -0.9879627    -0.7226471    -0.1759541

Third Model (severe anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-3.7219590    -0.06735841   4.4029366     -14.905380    0.7283733     -12.592921
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
0.8643423     0.1748935     -9.5901067    -30.932206    -1.8634200

• Correct classification rates for the MLR model

Correct classification rates from the above model for the neural networks method are shown in Table (5.23). The classification table shows how well our full model correctly classifies cases; a perfect model would show only values on the diagonal, correctly classifying all cases. The key piece of information is the overall percentage in the lower right corner, which shows that our model (with all predictors and the constant) is 79.3% accurate.

Table (5.23): Classification table (Neural Networks method)

Observed             normal   mild anemia   moderate anemia   severe anemia   Percent Correct
normal               1903     2             3                 1               99.7%
mild anemia          302      1             2                 0               0.3%
moderate anemia      183      2             3                 0               1.6%
severe anemia        5        0             0                 0               0%
Overall Percentage   99.5%    0.2%          0.3%              0.04%           79.3%

• The ROC curve analysis of the Neural Networks method

Table (5.24) shows that the area under the curve for normal children is 0.6851703, for mild anemia cases 0.6904045, for moderate anemia cases 0.672492, and for severe anemia cases 0.520863; the multi-class area under the curve is 0.5511.

Table (5.24): Area under the curve (Neural Networks method)

normal      mild anemia   moderate anemia   severe anemia
0.6851703   0.6904045     0.672492          0.520863
Multi-class area under the curve: 0.5511

The ROC curves have been drawn and exhibited in figure (5.3) below, and the area under the curve has been computed for the different classes of the dependent variable.

Figure (5.3): The ROC curve for Neural method

5.6 Statistical Analysis of Anemia Cases Using the Iteratively Reweighted Least Squares (IRLS) Method:

Function vglm() in package VGAM is designed to estimate the multinomial logistic regression coefficients using the IRLS method in R. This function can fit the multinomial logistic regression as a special case of vector generalized linear models.
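A hedged sketch of the call, assuming the same hypothetical data frame children; refLevel = 1 makes "normal" the reference category, matching the tables below.

library(VGAM)
fit_irls <- vglm(anemia ~ cage + BF1 + region + melevel + windex5 + HL4,
                 family = multinomial(refLevel = 1), data = children)
summary(fit_irls)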

• Building the MLR model

In this section we perform an MLR analysis on the anemic children's data set. There is a good relationship between a combination of the independent variables and the dependent variable, based on the statistical significance of the final model's chi-square test.

Table (5.25): Model Fitting Information (IRLS method)

Model            AIC        -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   3171.16    3165.16
Final            2891.861   2825.861            339.3        30   2.2e-16 ***

The model fitting information in table (5.25) shows various indices for assessing the null model and the final model. It includes the -2 Log Likelihood, which equals 3165.16 for the null model and 2825.861 for the full model, and the Akaike information criterion, which equals 3171.16 for the null model and 2891.861 for the full model. We can see that the model fit is significant, with χ²(30) = 339.3 and p < .001, which indicates that our full model predicts significantly more accurately than the null model.

Table (5.26) shows the proportion of the variation in the dependent variable explained by the independent variables. According to the results, the independent variables explain 13.2% of the variance in the dependent variable according to both the Cox & Snell R² and Nagelkerke R² values, and 10.7% according to the McFadden R² value.

Table (5.26): Pseudo R-Square (IRLS method)

Cox and Snell   0.13153
Nagelkerke      0.13153
McFadden        0.1071982

Table (5.27) presents the estimates of the parameters of the three equations of the model together with their standard errors and indicators of the significance of each estimate.

Table (5.27): Parameter Estimates (IRLS method)

mild anemia
              B            Std. Error   Z value   Pr(>|z|)
Intercept     -1.254e+00   2.456e-01    -5.106    3.30e-07 ***
cage          -2.730e-02   3.840e-03    -7.109    1.17e-12 ***
[BF1=1]       0b
[BF1=2]       -5.141e-01   3.663e-01    -1.403    0.160479
[windex5=1]   0b
[windex5=2]   -2.337e-01   1.730e-01    -1.351    0.176692
[windex5=3]   -2.137e-01   1.836e-01    -1.164    0.244389
[windex5=4]   -3.951e-01   1.980e-01    -1.996    0.045981 *
[windex5=5]   -4.109e-01   2.651e-01    -1.550    0.121102
[HL4=1]       0b
[HL4=2]       2.682e-01    1.294e-01    2.073     0.038187 *
[region=1]    0b
[region=2]    6.993e-01    1.322e-01    5.290     1.23e-07 ***
[melevel=1]   0b
[melevel=2]   3.248e-01    1.891e-01    1.718     0.085854 .
[melevel=3]   -7.604e-01   2.149e-01    -3.539    0.000402 ***

moderate anemia
Intercept     -6.270e-01   2.697e-01    -2.325    0.020090 *
cage          -4.070e-02   5.068e-03    -8.030    9.71e-16 ***
[BF1=1]       0b
[BF1=2]       7.183e-01    2.932e-01    2.450     0.014296 *
[windex5=1]   0b
[windex5=2]   -3.935e-01   2.036e-01    -1.932    0.053328 .
[windex5=3]   -7.425e-01   2.434e-01    -3.051    0.002279 **
[windex5=4]   -9.877e-01   2.677e-01    -3.689    0.000225 ***
[windex5=5]   -7.225e-01   3.361e-01    -2.150    0.031571 *
[HL4=1]       0b
[HL4=2]       -1.758e-01   1.608e-01    -1.093    0.274346
[region=1]    0b
[region=2]    9.705e-01    1.715e-01    5.658     1.53e-08 ***
[melevel=1]   0b
[melevel=2]   -6.363e-01   2.044e-01    -3.113    0.001852 **
[melevel=3]   -9.920e-01   2.203e-01    -4.503    6.69e-06 ***

severe anemia
Intercept     -3.721e+00   1.676e+00    -2.220    0.026398 *
cage          -6.738e-02   3.799e-02    -1.774    0.076125 .
[BF1=1]       0b
[BF1=2]       4.402e+00    1.249e+00    3.524     0.000425 ***
[windex5=1]   0b
[windex5=2]   8.653e-01    1.418e+00    0.610     0.541693
[windex5=3]   1.759e-01    1.503e+00    0.117     0.906838
[windex5=4]   -1.469e+01   1.635e+03    -0.009    0.992832
[windex5=5]   -1.550e+01   2.321e+03    -0.007    0.994673
[HL4=1]       0b
[HL4=2]       -1.864e+00   1.288e+00    -1.447    0.147779
[region=1]    0b
[region=2]    -1.590e+01   9.421e+02    -0.017    0.986533
[melevel=1]   0b
[melevel=2]   -7.294e-01   1.173e+00    -0.622    0.534181
[melevel=3]   -1.675e+01   1.114e+03    -0.015    0.987998

b: this parameter is set to zero because it is the reference category.

From the above table, the estimates of the parameters of the three equations of the model are as follows:

First Model (mild anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-1.254e+00    -2.730e-02    -5.141e-01    6.993e-01     3.248e-01     -7.604e-01
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
-2.337e-01    -2.137e-01    -3.951e-01    -4.109e-01    2.682e-01

Second Model (moderate anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-6.270e-01    -4.070e-02    7.183e-01     9.705e-01     -6.363e-01    -9.920e-01
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
-3.935e-01    -7.425e-01    -9.877e-01    -7.225e-01    -1.758e-01

Third Model (severe anemia):
Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
-3.721e+00    -6.738e-02    4.402e+00     -1.590e+01    -7.294e-01    -1.675e+01
[windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
8.653e-01     1.759e-01     -1.469e+01    -1.550e+01    -1.864e+00

• Correct classification rates for the MLR model

Correct classification rates from the above model for the iteratively reweighted least squares method are shown in Table (5.28).

Table (5.28): Classification table (IRLS method)

Observed             normal   mild anemia   moderate anemia   severe anemia   Percent Correct
normal               1898     3             7                 0               99.5%
mild anemia          301      2             2                 0               0.7%
moderate anemia      182      0             6                 0               3.2%
severe anemia        4        0             0                 1               20%
Overall Percentage   99.1%    0.2%          0.6%              0.04%           79.3%

Table (5.28) indicates that the model can predict correctly with an overall percentage accuracy of 79.3%.

• The ROC curve analysis of the IRLS method

Table (5.29) indicates that the area under the ROC curve for normal children is 0.6966543, for mild anemia cases 0.7069475, for moderate anemia cases 0.6577929, and for severe anemia cases 0.5593773; the multi-class area under the curve is 0.5586.

Table (5.29): Area under the curve (IRLS method)

normal      mild anemia   moderate anemia   severe anemia
0.6966543   0.7069475     0.6577929         0.5593773
Multi-class area under the curve: 0.5586

The ROC curves have been drawn and exhibited in figure (5.4) below, and the area under the curve has been computed for the different classes of the dependent variable.

Figure (5.4): The ROC curve for IRLS method

5.7 Statistical Analysis of Anemia Cases Using Maximum-Likelihood Method

Function mlogit() in package mlogit in R can be used to estimate the multinomial logistic regression model's parameters by the maximum likelihood method.
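A hedged sketch of the call, again assuming the hypothetical wide data frame children:

library(mlogit)
ml_data <- mlogit.data(children, choice = "anemia", shape = "wide")
fit_ml  <- mlogit(anemia ~ 0 | cage + BF1 + region + melevel + windex5 + HL4,
                  data = ml_data)   # maximum likelihood estimation of the MNL model
summary(fit_ml)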

 Building MLR model

In this section we will perform MLR analysis on anemic children's data set. There is a good relationship between a combination of independent variables and the dependent variable based on the statistical significance of the final model chi-square test.

95

Table (5.30) Model Fitting Information ( MLE method )

Model Model Fitting Criteria Likelihood Ratio Tests AIC -2 Log Chi- df Sig. Likelihood Square Intercept Only 3171.16 3165.2 Final 2891.861 2825.8 339.3 30 2.2e-16 ***

The model fitting information in table (5.30) shows various indices for assessing the null model and the final model. The table shows that the -2 Log Likelihood is 3165.2 for null model and 2825.8 for the full model. The Akaike information criterion indicator is 3171.16 for null model and 2891.861 for full model. Lower values for the full model indicates better model fit. We see model fits significantly as χ² (30) = 339.3 with p < .001 which indicates that our full model predicts significantly more accurately than the null model.

Table (5.31) below reports pseudo R-square measures of the strength of the relationship between the dependent variable and the independent variables (the proportion of variance of the response explained by the predictors). The independent variables account for about 13.2% of the variation in the dependent variable according to both the Cox & Snell R² and the Nagelkerke R² values, and for 10.7% according to the McFadden R² value.

Table (5.31) Pseudo R-Square (MLE method)

  Cox and Snell   0.13153
  Nagelkerke      0.13153
  McFadden        0.1071982
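These pseudo R-square measures follow from the two log-likelihoods via the standard formulas, sketched below (`n` is the sample size; `ll0` and `ll1` are as above):

  n <- nrow(children)                                  # number of cases
  mcfadden   <- 1 - ll1 / ll0                          # 0.1072 in Table (5.31)
  cox_snell  <- 1 - exp((2 / n) * (ll0 - ll1))         # 0.1315 in Table (5.31)
  nagelkerke <- cox_snell / (1 - exp((2 / n) * ll0))   # Cox & Snell rescaled to [0, 1]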

Table (5.32) presents the estimates of the parameters of the three equations of the model, together with the significance of each estimate.


Table (5.32) Parameter Estimates (MLE method)

Mild anemia
  Parameter     B             Std. Error    Z value   Pr(>|z|)
  Intercept     -1.2538e+00   2.4558e-01    -5.1056   3.298e-07 ***
  cage          -2.7296e-02   3.8399e-03    -7.1086   1.172e-12 ***
  [BF1=1]       0 (base)
  [BF1=2]       -5.1413e-01   3.6633e-01    -1.4035   0.1604789
  [windex5=1]   0 (base)
  [windex5=2]   -2.3369e-01   1.7297e-01    -1.3510   0.1766917
  [windex5=3]   -2.1371e-01   1.8358e-01    -1.1641   0.2443889
  [windex5=4]   -3.9512e-01   1.9800e-01    -1.9956   0.0459812 *
  [windex5=5]   -4.1091e-01   2.6508e-01    -1.5502   0.1211021
  [HL4=1]       0 (base)
  [HL4=2]       2.6822e-01    1.2940e-01    2.0728    0.0381875 *
  [region=1]    0 (base)
  [region=2]    6.9933e-01    1.3221e-01    5.2895    1.226e-07 ***
  [melevel=1]   0 (base)
  [melevel=2]   3.2482e-01    1.8910e-01    1.7177    0.0858537 .
  [melevel=3]   -7.6040e-01   2.1488e-01    -3.5387   0.0004021 ***

Moderate anemia
  Intercept     -6.2699e-01   2.6971e-01    -2.3247   0.0200896 *
  cage          -4.0696e-02   5.0677e-03    -8.0305   8.882e-16 ***
  [BF1=1]       0 (base)
  [BF1=2]       7.1829e-01    2.9321e-01    2.4497    0.0142963 *
  [windex5=1]   0 (base)
  [windex5=2]   -3.9350e-01   2.0365e-01    -1.9323   0.0533285 .
  [windex5=3]   -7.4255e-01   2.4336e-01    -3.0512   0.0022794 **
  [windex5=4]   -9.8775e-01   2.6775e-01    -3.6891   0.0002251 ***
  [windex5=5]   -7.2252e-01   3.3609e-01    -2.1498   0.0315715 *
  [HL4=1]       0 (base)
  [HL4=2]       -1.7576e-01   1.6079e-01    -1.0931   0.2743462
  [region=1]    0 (base)
  [region=2]    9.7050e-01    1.7152e-01    5.6583    1.529e-08 ***
  [melevel=1]   0 (base)
  [melevel=2]   -6.3629e-01   2.0440e-01    -3.1130   0.0018520 **
  [melevel=3]   -9.9204e-01   2.2029e-01    -4.5034   6.687e-06 ***

Severe anemia
  Intercept     -3.7210e+00   1.6759e+00    -2.2203   0.0263975 *
  cage          -6.7382e-02   3.7991e-02    -1.7736   0.0761255 .
  [BF1=1]       0 (base)
  [BF1=2]       4.4023e+00    1.2491e+00    3.5244    0.0004245 ***
  [windex5=1]   0 (base)
  [windex5=2]   8.6534e-01    1.4180e+00    0.6103    0.5416927
  [windex5=3]   1.7585e-01    1.5027e+00    0.1170    0.9068384
  [windex5=4]   -1.5693e+01   4.4518e+03    -0.0035   0.9971874
  [windex5=5]   -1.6504e+01   6.3223e+03    -0.0026   0.9979172
  [HL4=1]       0 (base)
  [HL4=2]       -1.8636e+00   1.2875e+00    -1.4474   0.1477794
  [region=1]    0 (base)
  [region=2]    -1.6904e+01   2.5635e+03    -0.0066   0.9947386
  [melevel=1]   0 (base)
  [melevel=2]   -7.2940e-01   1.1734e+00    -0.6216   0.5341808
  [melevel=3]   -1.7759e+01   3.0329e+03    -0.0059   0.9953279


From the above table, the estimates of the parameters of the three equations of the model are as follows:

First Model (mild anemia)
  Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
  -1.254e+00    -2.730e-02    -5.141e-01    6.993e-01     3.248e-01     -7.604e-01
  [windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
  -2.337e-01    -2.137e-01    -3.951e-01    -4.109e-01    2.682e-01

Second Model (moderate anemia)
  Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
  -6.270e-01    -4.070e-02    7.183e-01     9.705e-01     -6.363e-01    -9.920e-01
  [windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
  -3.935e-01    -7.425e-01    -9.877e-01    -7.225e-01    -1.758e-01

Third Model (severe anemia)
  Intercept     cage          [BF1=2]       [region=2]    [melevel=2]   [melevel=3]
  -3.721e+00    -6.738e-02    4.402e+00     -1.690e+01    -7.294e-01    -1.776e+01
  [windex5=2]   [windex5=3]   [windex5=4]   [windex5=5]   [HL4=2]
  8.653e-01     1.759e-01     -1.569e+01    -1.650e+01    -1.864e+00
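To make the reading of these coefficients explicit, the first fitted logit (mild anemia versus normal) can be written out as follows, where each bracketed term is a 0/1 dummy indicator:

\[
\ln\frac{P(\text{mild anemia})}{P(\text{normal})}
 = -1.254 - 0.0273\,\text{cage} - 0.514\,[\text{BF1}=2]
   - 0.234\,[\text{windex5}=2] - 0.214\,[\text{windex5}=3]
   - 0.395\,[\text{windex5}=4] - 0.411\,[\text{windex5}=5]
   + 0.268\,[\text{HL4}=2] + 0.699\,[\text{region}=2]
   + 0.325\,[\text{melevel}=2] - 0.760\,[\text{melevel}=3]
\]

The moderate and severe equations are read in the same way, each contrasting its category with the "normal" baseline.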

• Correct classification rates for the MLR model

The correct classification rates obtained from the above model with the maximum-likelihood method are shown in Table (5.33).

Table (5.33) Classification table (MLE method)

                                        Predicted
  Observed             normal   mild anemia   moderate anemia   severe anemia   Percent Correct
  normal               1898     3             7                 0               99.5%
  mild anemia          301      2             2                 0               0.7%
  moderate anemia      182      0             6                 0               3.2%
  severe anemia        4        0             0                 1               20%
  Overall Percentage   99.1%    0.2%          0.6%              0.04%           79.3%

The above classification table shows how well the full model classifies cases; a perfect model would show values only on the diagonal. The key piece of information is the overall percentage in the lower right corner, which shows that the model (with all predictors and the constant) classifies 79.3% of the cases correctly.

• The ROC curve analysis of the maximum-likelihood estimation method

Table (5.34) shows that the area under the curve equals 0.6966543 for normal children, 0.7069475 for mild anemic cases, 0.6577929 for moderate anemic cases and 0.5593773 for severe anemic cases, while the multi-class area under the curve is 0.5586.

Table (5.34) Area under the curve (MLE method)

  normal      mild anemia   moderate anemia   severe anemia
  0.6966543   0.7069475     0.6577929         0.5593773

  Multi-class area under the curve: 0.5586

The ROC curves have been drawn and exhibited in Figure (5.5) below, and the area under the curve has been computed for the different classes of the dependent variable.

Figure (5.5): The ROC curve for maximum-likelihood method

5.8 Comparison Between Different Estimation Methods

In the previous sections of this chapter we fitted the MLR model to a dataset of anemia cases among children under 5 years old in Palestine using five different methods. We now compare the results of these methods, implemented in different statistical software packages, in terms of their percentages of correct classification and their ROC curves.


5.8.1 Comparison using correct classification table:

Table (5.35) reports the percentage accuracy of the five estimation methods. The maximum percentage accuracy, 92.4%, belongs to the ridge regression method, while the remaining methods achieve nearly identical accuracies of 79.3%-79.4%.

Table (5.35): Classification results of the statistical methods

  Method                                  Percentage accuracy
  Iterative maximum-likelihood            79.4%
  Ridge regression                        92.4%
  Neural networks                         79.3%
  Iteratively reweighted least squares    79.3%
  Maximum-likelihood                      79.3%

5.8.2 Comparison using ROC Curves:

Table (5.36) reports the area under the curve for the different estimation methods. The ridge regression method attains the largest values for most categories: 97.2% for normal children, 97.6% for mild anemia cases and 74.2% for severe anemia cases; only for moderate anemia cases is its value (63.4%) slightly below those of the other methods.

Table (5.36): ROC Curve Analysis - Area Under the Curve (comparison)

  Method                                  normal      mild anemia   moderate anemia   severe anemia
  Iterative maximum-likelihood            0.6966663   0.706944      0.6578066         0.5540915
  Ridge regression                        0.9720366   0.9756006     0.6341221         0.7418944
  Neural networks                         0.6851703   0.6904045     0.672492          0.520863
  Iteratively reweighted least squares    0.6966543   0.7069475     0.6577929         0.5593773
  Maximum-likelihood                      0.6966543   0.7069475     0.6577929         0.5593773

The ROC curves have been drawn and exhibited in Figure (5.6) below, and the area under the curve has been computed for the different classes of the dependent variable for the five methods. Figure (5.6) shows that the ROC curve of the ridge method lies above those of the other methods for most categories, indicating that the ridge method classifies with much higher accuracy overall.
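Panels of this kind can be drawn one method at a time along the following lines (a sketch reusing the pROC objects assumed earlier):

  op <- par(mfrow = c(2, 2))     # one panel per anemia category
  for (cls in colnames(probs)) {
    # one-vs-rest ROC curve for this category
    plot(roc(as.numeric(observed == cls), probs[, cls]), main = cls)
  }
  par(op)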


Figure (5.6): The ROC curves for the five methods (maximum-likelihood, neural networks, iterative maximum-likelihood, IRLS and ridge methods)


5.9 Summary

Five estimation methods were applied to the anemic children's data set (2010), using 11 independent variables and one dependent variable with four categories. Regarding the classification capabilities of the methods, ridge estimation proved to predict much better than the other four methods, as shown by its correct classification rate of 92.4% against 79.3% for the other methods in the analysis. The ROC curves also differ in the area under the curve (AUC): 92.4% for the ridge method versus 55.9% for the iteratively reweighted least squares and maximum-likelihood methods, 55.4% for the iterative maximum-likelihood method, and 53.1% for the neural networks method. Thus, for this data set, the ridge model performs much better than the other methods.


Chapter 6

Conclusion and Recommendations

6.1 Conclusion

In this thesis we compared five methods and algorithms for analyzing categorical data using MLR models. Based on the research conducted, the following conclusions may be drawn:

1. It is evident that the ridge estimation method predicts better than the other methods. This is justified by its correct classification rate of 92.4%, against nearly 79.3% for the other methods (iteratively reweighted least squares, maximum likelihood, iterative maximum likelihood and neural networks).

2. The conclusion that the ridge estimation method predicts more accurately than the other methods is also supported by the ROC curve analysis, which shows a large difference in the area under the curve (AUC): 92.4% for the ridge method versus 55.9% for the iteratively reweighted least squares and maximum-likelihood methods, 55.4% for the iterative maximum-likelihood method, and only 55.1% for the neural networks method.

6.2 Recommendations

1. Using statistical models in the analysis of data on anemia is important because it contributes to solving many problems; besides providing good classification and prediction, it can identify the most important risk factors for anemia in children.

2. To overcome the phenomenon of anemia, special care should be given to the mother, as she is the primary source of her children's health.

3. It is important to give priority to mothers in the Gaza Strip and the West Bank in any program that aims to educate mothers on how to deal with a child who suffers from anemia.


4. The ridge estimation method should be used more widely, since it provides a very good alternative methodology for predicting and classifying the anemic children's data and proved to be the best method for classification and prediction in this study.

5. There is still a strong need for further research on estimation methods for multinomial logistic regression in order to identify the best estimation method for the model's parameters.


References

Akaike, H. (1973), "Information Theory and an Extension of the Maximum Likelihood Principle", Springer Series in Statistics 1998, pp. 199-213.

Berger, A. L., Della Pietra, S. A. and Della Pietra, V. J. (1996), "A Maximum Entropy Approach to Natural Language Processing", Computational Linguistics, 22(1): 39-71.

Bentz, Y. and Merunka, D. (2000), "Neural Networks and the Multinomial Logit for Brand Choice Modelling: A Hybrid Approach", Journal of Forecasting, 19: 177-200.

Berndt, E. K., Hall, B., Hall, R. and Hausman, J. (1974), "Estimation and Inference in Nonlinear Structural Models", Annals of Economic and Social Measurement, 3, 653-665.

Bewick, V., Cheek, L. and Ball, J. (2005), "Statistics Review 14: Logistic Regression", Critical Care, 9(1): 112-118.

Bishop, C. (1995), "Neural Networks for Pattern Recognition", Oxford: Clarendon Press.

Bohning, D. (1999), "The Lower Bound Method in Probit Regression", Computational Statistics and Data Analysis, 30, 13-17.

Bräuninger, J. (1980), "A Quasi-Newton Method with Cholesky Factorization", Computing, 25: 155-162.

Bull, S.B., Lewinger, J.P. and Lee, S.F. (2007), "Confidence Intervals for Multinomial Logistic Regression in Sparse Data", Statistics in Medicine, 26: 903-918.

Carpenter, B. (2008), "Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression", Technical Report, Alias-i. Retrieved from: https://lingpipe.files.wordpress.com/2008/04/lazysgdregression.pdf

Chen, T., Hoppe, F.M., Iyengar, S. and Brent, D. (2003), "A Hybrid Logistic Model for Case-Control Studies", Methodology and Computing in Applied Probability, 5, 419.

Chen, Z. and Kuo, L. (2001), "A Note on the Estimation of the Multinomial Logit Model with Random Effects", The American Statistician, 55(2), pp. 89-95.

Colby, S., Lee, S., Lewinger, J. P. and Bull, S. (2010), "pmlr: Penalized Multinomial Logistic Regression", R package version 1.0. Retrieved from: http://CRAN.R-project.org/package=pmlr

Collins, M., Schapire, R. E. and Singer, Y. (2002), "Logistic Regression, AdaBoost and Bregman Distances", Machine Learning, 48, 253-285.

Croissant, Y. (2012), "mlogit: Multinomial Logit Model", R package version 0.2-3. Retrieved from: http://CRAN.R-project.org/package=mlogit

Czepiel, S.A. (2012), "Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation". Retrieved from: http://czep.net/stat/mlelr.pdf

Dalgaard, P. (2008), "Introductory Statistics with R", 2nd Edition, Springer, New York.

Deuflhard, P. (2004), "Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms", Springer Series in Computational Mathematics, Vol. 35.

Eberhardt, L.L. and Breiwick, J.M. (2012), "Models for Population Growth Curves", ISRN Ecology, 815016. Retrieved from: http://dx.doi.org/10.5402/2012/815016

Everitt, B.S. (1984), "An Introduction to Latent Variable Models", Chapman & Hall. ISBN 978-9401089548.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), "Regularization Paths for Generalized Linear Models via Coordinate Descent", Journal of Statistical Software, 33(1), 1-22.

Fiebig, D. G., Keane, M. P., Louviere, J. and Wasi, N. (2010), "The Generalized Multinomial Logit Model: Accounting for Scale and Coefficient Heterogeneity", Marketing Science, 29(3): 393-421.

Fox, J. and Andersen, R. (2004), "Effect Displays for Multinomial and Proportional-Odds Logit Models", Department of Sociology, McMaster University, Hamilton, Ontario, Canada.

Hasan, A., Wang, Z. and Mahani, A. S. (2014), "Fast Estimation of Multinomial Logit Models", arXiv:1404.3177 [stat.CO].

Hansen, L. P., Heaton, J. and Yaron, A. (1996), "Finite-Sample Properties of Some Alternative GMM Estimators", Journal of Business & Economic Statistics, 14: 262-280.

Hastie, T., Tibshirani, R. and Friedman, J. H. (2001), "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer, New York.

Hedeker, D. (2003), "A Mixed-Effects Multinomial Logistic Regression Model", Statistics in Medicine, 22: 1433-1446.

Hopfield, J. J. (1987), "Learning Algorithms and Probability Distributions in Feed-Forward and Feed-Back Networks", Proceedings of the National Academy of Sciences, 84, 8429.

Huang, F. L., Hsieh, C. J., Chang, K. W. and Lin, C. J. (2010), "Iterative Scaling and Coordinate Descent Methods for Maximum Entropy Models", Journal of Machine Learning Research, 11, 815-848.

Huber, P. J. (1981), "Robust Statistics", Wiley Series in Probability and Statistics, New York: Wiley-Interscience.

IBM Corporation (2011), "NOMREG Algorithms". Retrieved from: http://www-01.ibm.com/support/knowledgecenter/SSLVMB_20.0.0/com.ibm.spss.statistics.help/alg_nomreg.htm

Jurka, T. P. (2012), "maxent: An R Package for Low-memory Multinomial Logistic Regression with Support for Semi-automated Text Classification", The R Journal, 4, 56-59.

Kass, R. E. and Wasserman, L. (1995), "A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwarz Criterion", Journal of the American Statistical Association, 90: 928-934.

Kelley, C. T. (2003), "Solving Nonlinear Equations with Newton's Method", No. 1 in Fundamentals of Algorithms, SIAM. ISBN 0-89871-546-6.

Kennard, R. W. and Hoerl, A. E. (1970), "Ridge Regression: Biased Estimation for Nonorthogonal Problems", Technometrics, 12, 55-67.

Kleinbaum, D.G. and Klein, M. (2010), "Logistic Regression: A Self-Learning Text", 3rd Edition, Springer.

Kneib, T., Baumgartner, B. and Steiner, W. J. (2007), "Semiparametric Multinomial Logit Models for Analysing Consumer Choice Behaviour", AStA Advances in Statistical Analysis, 91(3), pp. 225-244.

Krishnapuram, B., Carin, L., Figueiredo, M. A. and Hartemink, A. J. (2005), "Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds", IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 957-968.

Le Cun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. and Jackel, L. D. (1989), "Backpropagation Applied to Handwritten Zip Code Recognition", Neural Computation, 1(4), 541-551.

Lesaffre, E. and Albert, A. (1989), "Multiple-Group Logistic Regression Diagnostics", Journal of the Royal Statistical Society, Series C (Applied Statistics), 38: 425-440.

Long, J.S. and Freese, J. (2001), "Regression Models for Categorical Dependent Variables Using Stata", College Station, TX: Stata Press.

McCullagh, P. and Nelder, J. A. (1989), "Generalized Linear Models", 2nd Edition, Chapman and Hall, London, p. 511.

McCullough, B. D. and Renfro, C. G. (2000), "Some Numerical Aspects of Nonlinear Estimation", Journal of Economic and Social Measurement, 26, pp. 63-77.

Minka, T. P. (2003), "Algorithms for Maximum-Likelihood Logistic Regression". Retrieved from: http://research.microsoft.com/en-us/um/people/minka/papers/logreg

Musa, A.B. (2014), "Logistic Regression Classification for Uncertain Data", Research Journal of Mathematical and Statistical Sciences, 2(2), 1-6.

Nelder, J. and Wedderburn, R. (1972), "Generalized Linear Models", Journal of the Royal Statistical Society, Series A (General), 135(3): 370-384.

Nigam, K., Lafferty, J. and McCallum, A. (1999), "Using Maximum Entropy for Text Classification", IJCAI'99 Workshop on Machine Learning for Information Filtering. Retrieved from: http://www.cs.cmu.edu/~mccallum/

Nyquist, H. (1991), "Restricted Estimation of Generalized Linear Models", Journal of Applied Statistics, 40, 133-141.

Park, H.A. (2013), "An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain", Journal of Korean Academy of Nursing, 43(2), pp. 154-164.

PCBS (2010), "Palestinian Family Survey", Ramallah, Palestine.

Raftery, A. E. (1995), "Bayesian Model Selection in Social Research", Sociological Methodology, 25, 111-163.

Rashid, M. (2008), "Inference on Logistic Regression Models", Ph.D. dissertation, Bowling Green State University. Retrieved from: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1214165101

Rashwan, N. I. and El-dereny, M. (2012), "The Comparison between Results of Application Bayesian and Maximum Likelihood Approaches on Logistic Regression Model for Prostate Cancer Data", Applied Mathematical Sciences, 6(23), 1143-1158.

Sanchez, J.C.H. and Villardon, J.L.V. (2013), "Logistic Biplot for Nominal Data", Spanish Statistical Office and University of Salamanca. Retrieved from: http://xxx.tau.ac.il/pdf/1309.5486v1.pdf

Schwab, J. A. (2002), "Multinomial Logistic Regression: Basic Relationships and Complete Problems". Retrieved from: http://www.utexas.edu/courses/schwab/sw388r7/SolvingProblems/

Segerstedt, B. (1992), "On Ordinary Ridge Regression in Generalized Linear Models", Communications in Statistics: Theory and Methods, 21, 2227-2246.

Sevcikova, H. and Raftery, A. (2013), "mlogitBMA: Bayesian Model Averaging for Multinomial Logit Model", R package version 0.1-6. Retrieved from: http://CRAN.R-project.org/package=mlogitBMA

Stokes, M.E., Davis, C. S. and Koch, G. G. (2009), "Categorical Data Analysis Using the SAS System", North Carolina: SAS Institute.

Tektas, D. and Gunay, S. (2008), "A Bayesian Approach to Parameter Estimation in Binary Logit and Probit Models", Hacettepe Journal of Mathematics and Statistics, 37(2), 167-176.

Thomas, J. A. and Cover, T. M. (1991), "Elements of Information Theory", Wiley-Interscience, New York, NY, USA.

Venables, W.N. and Ripley, B.D. (2002), "Modern Applied Statistics with S", 4th Edition, Springer.

Wang, S., Jiao, H. and Xiang, Y. (2013), "A Comparison of Different Reduction Methods for Bias of Maximum Likelihood Estimator of Student Ability Based on Graded Response and Generalized Partial Credit Models", National Council on Measurement in Education, April 28-30, San Francisco, CA.

Wang (2014), "Modified Generalized Method of Moments for a Robust Estimation of Polytomous Logistic Model", PeerJ, DOI 10.7717/peerj.467.

Yee, T. W. (2010), "The VGAM Package for Categorical Data Analysis", Journal of Statistical Software, 32(10), 1-34. Retrieved from: http://www.jstatsoft.org/v32/i10/

Zahid, F. M. and Tutz, G. (2009), "Ridge Estimation for Multinomial Logit Models with Symmetric Side Constraints", Technical Report Number 067, Department of Statistics, University of Munich. Retrieved from: http://www.stat.uni-muenchen.de

Zhu, J. and Hastie, T. (2004), "Classification of Gene Microarrays by Penalized Logistic Regression", Biostatistics, 5, 427-443.
