The University of Dodoma

University of Dodoma Institutional Repository http://repository.udom.ac.tz

Social Sciences Master Dissertations

2019

Probability distribution analysis and forecasting of patients arriving at regional referral hospital Dodoma, Tanzania (during the year 2017-2018)

Loibor, Julius Moinget

The University of Dodoma

Loibor, J. M. (2019). Probability distribution analysis and forecasting of patients arriving at regional referral hospital Dodoma, Tanzania (during the year 2017-2018). (Master's Dissertation). The University of Dodoma, Dodoma. http://hdl.handle.net/20.500.12661/2408

Downloaded from UDOM Institutional Repository at The University of Dodoma, an open access institutional repository.

PROBABILITY DISTRIBUTION ANALYSIS AND

FORECASTING OF PATIENTS ARRIVING AT

REGIONAL REFERRAL HOSPITAL DODOMA,

TANZANIA (During the year 2017-2018).

JULIUS MOINGET LOIBOR

MASTER OF SCIENCE IN STATISTICS

THE UNIVERSITY OF DODOMA

OCTOBER, 2019

PROBABILITY DISTRIBUTION ANALYSIS AND

FORECASTING OF PATIENTS ARRIVING AT REGIONAL

REFERRAL HOSPITAL DODOMA, TANZANIA (2017-2018).

BY

JULIUS MOINGET LOIBOR

A DISSERTATION SUBMITTED IN PARTIAL FULFILMENT OF

THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

IN STATISTICS

THE UNIVERSITY OF DODOMA

OCTOBER, 2019

DECLARATION AND COPYRIGHT

I, Julius Moinget Loibor, declare that this thesis is my own original work and that it has not been presented, and will not be presented, to any other university for a similar or any other degree award.

No part of this thesis/dissertation may be reproduced, stored in any retrieval system, or transmitted in any form or by any means without prior written permission of the author or the University of Dodoma. If transformed for publication in any other format, it shall be acknowledged that this work has been submitted for a degree award at the University of Dodoma.

CERTIFICATION

The undersigned certifies that they have read and hereby recommend for acceptance by the University of Dodoma the dissertation entitled "Probability Distribution Analysis and Forecasting of Patients Arriving at Regional Referral Hospital Dodoma, Tanzania" in partial fulfilment of the requirements for the degree of Master of Science in Statistics of the University of Dodoma.

ACKNOWLEDGEMENT

God has been everything throughout my MSc Statistics studies. I thank the Almighty for His unparalleled mercies, abundant strength and grace upon my life throughout my studies at this University.

I am highly indebted to my supervisor, Prof. Ramkumar T. Balan, for his support, patience, encouragement and supervision of this project. A very big thank you to you, sir, for your tireless service, including the daily meetings and the time you spent making this dissertation a success, and especially for helping me channel the idea of the study into fruitful results. My special appreciation goes to the Ag. Deputy Principal, Dr. Peter Kirigiti Josephat, for his clear guidance and assistance to us. I am also grateful to my Head of the Department of Mathematics & Statistics, Dr. Jefta Sunzu, for his assistance, and to his staff members for their support and encouragement. I sincerely appreciate all the lecturers in the Department of Mathematics & Statistics, who contributed a great deal to my studies.

I would like to take this opportunity to thank my parents and all my family members for their unconditional love and constant support throughout. My special thanks go to my brother Nginyei Moinget for his unlimited support and prayers during my study. I would also like to thank Mr. Kisioki Moitiko and Kisika Laizer for the support they gave me during my study.

I would like to thank all my fellow MSc Statistics classmates and friends for their support, advice and full cooperation, which contributed greatly to the success of my studies. To the management of Dodoma Regional Referral Hospital, I am sincerely grateful for the full cooperation you showed me during data collection. May the Almighty God richly bless you all.

ABSTRACT

Health care is essential to the general welfare of society. Studying the distribution of hospital patient data through probability distribution analysis and time series forecasting models is therefore very important for the health care system. This study examined two years of daily inpatient and outpatient data obtained from Dodoma Regional Referral Hospital (DRRH) through the hospital's electronic health management information system.

This study seeks to comprehensively identify the appropriate statistical distributions for the inpatient and outpatient data of DRRH. Preliminary fitting of the distributions to the inpatient and outpatient data was performed with the EasyFit 5.5 Professional statistical software. The software handles 61 continuous distributions and provides three goodness-of-fit tests for raw data and two for frequency data. This study used the three raw-data tests: Kolmogorov-Smirnov, Anderson-Darling and Chi-Square. The parameters of the selected distributions were estimated by the maximum likelihood method. The final choice of the best-fitting distribution was based on the calculated log-likelihood and the resulting AIC and BIC values, with the smallest AIC and BIC indicating the best fit. The study revealed that the Generalized Extreme Value distribution is the best-fit model for the hospital's daily inpatient data, and the Log-logistic (3P) distribution was selected as the best-fit model for the hospital's daily outpatient data. The study identified the ARIMA (1, 1, 0) model as the best predictive model for the daily average number of outpatients visiting the hospital's outpatient department over the two years. To prepare adequate facilities for the overwhelming number of outpatients in the outpatient department, the DRRH administration should use the fitted probability distributions and forecast figures to plan further development activities for the hospital.

TABLE OF CONTENTS

DECLARATION AND COPYRIGHT

CERTIFICATION

ACKNOWLEDGEMENT

ABSTRACT

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF APPENDICES

LIST OF ABBREVIATIONS

CHAPTER ONE

1.1 Background of the Study

1.2 Statement of the Problem

1.3 General Objective of the Study

1.3.1 Specific Objective of the Study

1.4 Research Questions of the Study

1.5 The Significance of the Study

CHAPTER TWO

2.0 Literature Review

2.1 Definitions of key terms

2.2 Empirical review of Probability Distributions from another study

2.3 Empirical review studies with Time Series Methods

2.4 Research gap

CHAPTER THREE

3.0 Research Methodology

3.1 Research Approach

3.2 Research Design of the study

3.3 Study Area of the Research

3.3.1 Hospital Profile

3.4 Source of data collection

3.5 Statistical Distribution Models

3.5.1 Selection of certain family of probability distribution models

3.5.2 Probability density function and distribution function of selected distributions

3.6 Estimation of parameters of Probability Distributions

3.7 The criteria for choosing the best statistical distribution for the data

3.7.1 The Goodness of Fit test

3.7.2 Chi-square goodness fit test

3.7.3 The Akaike's Information Criterion

3.7.4 The Bayesian Information Criterion

3.8 Time Series Model

3.8.1 Autoregressive Integrated Moving Average (ARIMA) model

3.8.2 Stationary and Non-stationary time series

3.8.3 The Augmented Dickey-Fuller Test (ADF)

3.8.4 Differencing and Lag

3.8.5 Autocorrelation Function (ACF)

3.8.6 Partial Autocorrelation Function (PACF)

3.8.7 Mean Squared Error (MSE)

3.8.8 The Mean Absolute Percentage Error (MAPE)

3.8.9 Time series Diagnostic Checking

3.9 Forecasting

3.10 Data Analysis

3.11 Data Processing

3.12 Reliability and Validity

3.12.1 Reliability

3.12.2 Validity

3.13 Ethical consideration

CHAPTER FOUR

4.0 Findings and Discussions

4.1 Summary statistics of demographic characteristics of hospital patients

4.2 Descriptive Statistics for daily total inpatients data

4.3 Descriptive Statistics for daily total Outpatients data

4.4 Determination of the feasible distributions on daily total inpatients and total outpatients hospital data

4.4.1 Assessment of Goodness of Fit by Graphical Methods

4.4.2 Probability-Probability (P-P) Plot

4.4.3 Quantile-Quantile (Q-Q) Plot

4.5 Estimation Parameters of the Selected Probability Distributions for both total inpatients and outpatients' hospital daily data

4.5.1 Identification of probability distributions and its properties of all selected distributions in both total hospital inpatients and outpatients data

4.5.2 The AIC and BIC interpretation for model selection

4.6 Prediction of the probability of occurrence of inpatients, outpatients level exceeding some limits in the hospital

4.6.1 Prediction of the probability of occurrence of outpatients' level exceeding some limits

4.7 Forecasting

4.7.1 Stationarity of time series data

4.7.2 Time Series ARIMA model Identification

4.7.3 Time series Model Parameter Estimation

4.7.4 Forecasting the expected number of average outpatient per day in the next five weeks

CHAPTER FIVE

CONCLUSION AND RECOMMENDATIONS

5.1 Summary of the results of the findings

5.2 Conclusion

5.3 Recommendations

5.4 Future area of the study

REFERENCES

APPENDICES

LIST OF TABLES

Table 1.1: Number of outpatients and inpatients for the past five years countrywide, 2013-2017

Table 3.1: Departments, sections and units in the hospital

Table 4.1: Descriptive Analysis of Hospital Patients in DRRH (2017-2018)

Table 4.2: Descriptive Statistics Daily total Inpatients Data

Table 4.3: Descriptive Statistics for daily total Outpatients data

Table 4.4: Test statistics for the Gen. Extreme Value distribution of the total number of inpatients' hospital data

Table 4.5: Test statistics for the Dagum distribution of the total number of outpatients' hospital data

Table 4.6: Estimated parameters of the suggested distributions representing the daily total number of inpatients' data from DRRH by the method of maximum likelihood estimation

Table 4.7: Estimated parameters of the suggested distributions representing the daily total number of outpatients' data from DRRH by the method of maximum likelihood estimation

Table 4.8: Properties of all five distributions for total hospital inpatients data

Table 4.9: Properties of all nine distributions for total hospital outpatients' data

Table 4.10: Model selection for the total number of inpatients in DRRH

Table 4.11: Model selection for the total number of outpatients in DRRH

Table 4.12: The measures of positions of patients admitted in hospital

Table 4.13: ADF Test for the Differenced Average Daily Outpatients Visiting Hospital

Table 4.14: Time Series ARIMA model selection

Table 4.15: Parameter estimates for ARIMA (1, 1, 0) model

Table 4.16: Indicating 95% for the average daily total OPDs forecast values by the ARIMA model (1, 1, 0) for the next 5 weeks

LIST OF FIGURES

Figure 3.1: Dodoma Regional Referral Hospital Layout Map

Figure 4.1: Probability density curve of five preliminary selected distributions on inpatients daily data

Figure 4.2: Cumulative density curve of the selected distributions for hospital inpatients daily data

Figure 4.3: Probability density curves of nine selected distributions for the number of hospital daily outpatients

Figure 4.4: Probability Plot of all the five selected distributions for fitting the total number of hospital inpatients daily data

Figure 4.5: Q-Q Plot of five selected distributions fitting the number of inpatients daily

Figure 4.6: Probability density curve of the Generalized Extreme Value distribution for hospital inpatients daily data in DRRH

Figure 4.7: Probability density curve of Dagum distribution representing outpatient daily data in DRRH 2017-2018

Figure 4.8: Probability graph of Gen. Extreme Value with P (X<30)

Figure 4.9: CDF graph of Gen. Extreme Value with P (X<30)

Figure 4.10: Probability graph of Gen. Extreme Value with P (50550)

Figure 4.19: Residual diagnostic plots for differenced ARIMA (1, 1, 0) model

Figure 4.20: Normality check of residuals for the ARIMA (1, 1, 0) model

LIST OF APPENDICES

Appendix 1: Details of fitting other distributions for the inpatients' data

Appendix 2: Details of fitting other distributions for the outpatients' data

Appendix 3: Time series plots before and after differencing

Appendix 4: SAS codes

Appendix 5: Introduction letter

LIST OF ABBREVIATIONS

ACF Autocorrelation Function

ADF Augmented Dickey-Fuller

AD Anderson Darling

AIC Akaike Information Criterion

AR Autoregressive

ARIMA Auto-Regressive Integrated Moving Average

ARMA Auto-Regressive Moving Average

BIC Bayesian Information Criterion

DRRH Dodoma Regional Referral Hospital

e.g. For example

etc. Et cetera (and so on)

i.e. That is to say

IPD Inpatient Department

LogL Logarithm Likelihood

MA Moving Average

MOHCDGEC Ministry of Health, Community Development, Gender, Elderly and Children

ML Maximum Likelihood

MLE Maximum Likelihood Estimation

OPD Outpatient Department

PACF Partial Autocorrelation Function

PDF Probability Density Function

P-P Probability-Probability plot

Q-Q Quantile-Quantile plot

SAS Statistical Analysis System

SD Standard Deviation

SPSS Statistical Package for the Social Sciences

TOTIPD Total Inpatients Department

TOTOPD Total Outpatients Department

CHAPTER ONE

INTRODUCTION

1.1 Background of the Study.

Health care systems in general, and hospitals in particular, are a major determinant of the quality of healthy life. In Tanzania, both government and private hospitals provide health care services to different groups of people, but most ordinary people rely on public health services. The need for health care services has been rising in recent years because of the growth of both preventable and communicable diseases. Because of a shortage of hospitals and health workers and the lack of well-organized operating systems, patients in government hospitals suffer considerably. Overcrowding forces them to wait excessively long to be attended to and to receive their medicines, which leaves them dissatisfied and complaining. The outpatient and inpatient departments are the core units that regulate the flow of patients arriving at the hospital, so a scientific evaluation is needed to manage the system.

Hospitals generally differ from other types of health centres offering medical services in their ability to admit and care for inpatients. In a hospital, the entry point is the outpatient department, where the client's first contact sets the standard of service the patient can expect. It is widely accepted that first impressions shape people's perceptions of the services they receive and of the place they have reached.

Long waiting times for health services in hospital outpatient departments have become a problem in healthcare settings all over the world (Kelaniya, 2014). According to the Tanzania National Economic Survey (2017), hospital outpatient attendance continues to increase countrywide: it rose from 4984645 in 2016 to 5266252 in 2017, while hospital inpatients numbered 1775835 in 2016 and 1650224 in 2017 (see Table 1.1). Among the many measures of health care provision, the number of patient admissions and arrivals at the hospital is one of the most significant indicators of healthcare utilization. In light of these challenges, the need to review and improve our healthcare practices has become evident. Statistical probability model evaluation and time series trend forecasting are scientific approaches to understanding the overcrowding of patients in the hospital. Forecasting the number of patients visiting a hospital can help allocate limited human and material resources within the hospital (Sukmak, Thongkam & Leejongpermpoon, 2015).

Additionally, assessing the number of outpatients visiting the hospital can help health care administrators in decision making and in preparing for future events (Hadavandi et al., 2012). One of the most important and widely used time series models is the autoregressive integrated moving average (ARIMA) model (Juang et al., 2017). The outcome of a statistical probability distribution analysis helps the management plan the system's infrastructure, healthcare services and medicine, and anticipate future requirements. A statistical probability distribution study of the hospital will help improve the allocation of public funds for infrastructure, the assignment of doctors, specialists and other medical staff, the volume of medicine needed, and transport and other facilities for the patients. It will reduce queuing of patients in the outpatient department and facilitate emergency services. Inpatient statistics and their distribution are essential for providing proper accommodation and health services to all patients in need.

Table 1.1: Number of outpatients and inpatients for the past five years countrywide, 2013-2017

Year | Outpatients: Hospital | Outpatients: Dispensary | Outpatients: Health Centre | Inpatients
2013 | 2903275 | 42682479 | 6873056 | 4529771
2014 | 5910725 | 14008692 | 475295 | 1665935
2015 | 4480781 | 25072487 | 6006466 | 1858956
2016 | 4984645 | 20859281 | 6515219 | 1775835
2017 | 5266252 | 21115639 | 7050725 | 1650224

Source: MOHCDGEC 2017.

Mehrannia and Pakgohar (2014) have shown the advantages of using software to fit distributions to data and to interpret probabilities. Such software, using built-in algorithms, can automatically fit a variety of known distributions to the data simultaneously. These techniques are preferred especially when little or no information about the underlying distribution of the data is available and the best-fitting distribution must be found (Mehrannia & Pakgohar, 2014). However, to our knowledge, few distributions have been identified as suitable for hospital patient data in Tanzania, especially for Dodoma Regional Referral Hospital (DRRH). We therefore start with a wide range of continuous probability distributions and, from these, select the appropriate distribution with statistical accuracy.

Dodoma Regional Referral Hospital is one of the public referral hospitals in Dodoma city, in the central part of Tanzania. It has 420 beds and an average attendance of 350 outpatients per day, as reported in the Annual Assessment Report, External Hospitals Performance Assessment for Regional Referral Hospital (2018). This study analyzed data on patients' daily arrivals at Dodoma Regional Referral Hospital from January 1, 2017, to December 31, 2018. In recent years there has been a continuous increase in the number of patients at the outpatient department, which causes long waiting times for health services and overcrowding of patients in DRRH.

Candidate distributions were first identified with the EasyFit statistical software, with goodness of fit assessed by the Chi-Square, Anderson-Darling and Kolmogorov-Smirnov tests. The preliminarily selected distributions were then examined with P-P and Q-Q plots, and the best fit was determined using the AIC, BIC or log-likelihood values. EasyFit supports sixty-one well-developed statistical distributions, and for numerical continuous data the best fit can be evaluated with suitable logic. Graphs of the probability density, distribution function, survival function, hazard function, etc. are available for the best-fit model. The software also provides basic descriptive statistics and the facility to compute the probability of a specified interval under the suggested distributions.
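The fit-then-rank workflow described above can be sketched in Python with SciPy in place of EasyFit. This is an illustrative sketch, not the study's procedure. The data are simulated (730 hypothetical daily counts drawn from a GEV distribution), only five candidate families are used, and SciPy's parameterizations may differ from EasyFit's: `fisk` is SciPy's log-logistic, and `genextreme` uses the opposite sign for the shape parameter. Each candidate is fitted by maximum likelihood and scored by the Kolmogorov-Smirnov statistic, and the candidates are ranked by AIC, with BIC reported alongside.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for two years (730 days) of daily inpatient counts.
# These values are simulated; they are NOT the DRRH data.
rng = np.random.default_rng(2017)
daily_counts = stats.genextreme.rvs(c=0.1, loc=45, scale=8, size=730, random_state=rng)

# Candidate families (SciPy names; fisk is the log-logistic distribution).
candidates = {
    "Gen. Extreme Value": stats.genextreme,
    "Log-logistic (3P)": stats.fisk,
    "Gamma (3P)": stats.gamma,
    "Lognormal (3P)": stats.lognorm,
    "Normal": stats.norm,
}

def fit_and_score(data, dists):
    """Fit each distribution by maximum likelihood and return scores sorted by AIC."""
    n = len(data)
    rows = []
    for name, dist in dists.items():
        params = dist.fit(data)                      # MLE of shape(s), loc, scale
        loglik = np.sum(dist.logpdf(data, *params))  # maximised log-likelihood
        k = len(params)                              # number of estimated parameters
        ks = stats.kstest(data, dist.cdf, args=params).statistic
        rows.append({
            "name": name,
            "params": params,
            "loglik": loglik,
            "aic": 2 * k - 2 * loglik,
            "bic": k * np.log(n) - 2 * loglik,
            "ks": ks,
        })
    return sorted(rows, key=lambda r: r["aic"])     # smallest AIC first

results = fit_and_score(daily_counts, candidates)
for r in results:
    print(f"{r['name']:<20} logL={r['loglik']:9.2f} AIC={r['aic']:9.2f} "
          f"BIC={r['bic']:9.2f} KS={r['ks']:.4f}")
```

Only the KS statistic is kept. The p-value that `kstest` reports assumes fully specified parameters, so it is only indicative once the parameters have been estimated from the same data.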

This study considers the following continuous probability distributions as the basis for finding the best-fit probability distribution for the number of inpatients and outpatients at DRRH: Beta, Burr, Cauchy, Dagum, Gamma (3P), Generalized Extreme Value, Generalized Gamma (4P), Johnson SB, Log-logistic (3P), Nakagami, Rayleigh, Rayleigh (2P) and Weibull. The study seeks the probability distribution of patients' daily arrivals at the hospital as well as the probability distribution of the daily number of patients staying in the hospital. In other words, it aims to determine the most suitable probability distributions for the number of patients registered daily in the hospital record to seek services, and for the number of patients staying in the hospital each day to receive care, services and medicine.

1.2 Statement of the Problem.

Overcrowding of patients and long waiting times at public health facilities such as hospitals and health centres are well known as major difficulties in Dodoma City, especially at Dodoma Regional Referral Hospital, and they create unhappiness and complaints in the community. To date, researchers have focused mainly on statistical queuing models, statistical regression models or descriptive statistics to evaluate and forecast the number of patients arriving at a hospital (Lane et al., 2003; Whitt & Zhang, 2017), rather than on statistical probability models. Knowing the probability distribution of the number of patients arriving at a hospital is important for an exact understanding of demand and for maintaining the quality of health care. So far, few studies in Tanzania have identified the distribution pattern of inpatients and outpatients from which weekly, monthly and yearly patient patterns could be traced and exact probabilities evaluated. This study therefore attempts to find the probability pattern of patient arrivals and of the number of patients kept in the hospital.

1.3 General Objective of the Study.

The general objective of the study is to comprehensively identify the appropriate statistical distributions for the inpatient and outpatient data of DRRH.

1.3.1 Specific Objective of the Study.

i. To determine the feasible statistical distributions for the daily inpatient and outpatient hospital data.

ii. To find the appropriate statistical distributions for the outpatient and inpatient data and estimate their parameters.

iii. To predict the probability of inpatient and outpatient levels exceeding specified limits.

iv. To predict the average outpatient size per day in the next five weeks.
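Objective (iii) amounts to evaluating $P(X > x) = 1 - F(x)$, or $P(a < X < b) = F(b) - F(a)$ for a range, under the fitted distribution. The sketch below assumes a GEV fit with hypothetical parameters $\xi = -0.1$, $\mu = 45$, $\sigma = 8$. These are placeholders, not the study's estimates. SciPy's `genextreme` shape `c` has the opposite sign to the common $\xi$ convention, so the code passes `c = -xi`.

```python
from scipy import stats

# Hypothetical GEV parameters for daily inpatient counts (placeholders only).
xi, mu, sigma = -0.1, 45.0, 8.0
# SciPy's genextreme uses c = -xi relative to the usual GEV shape convention.
gev = stats.genextreme(c=-xi, loc=mu, scale=sigma)

def exceedance_probability(dist, limit):
    """P(X > limit), computed with the survival function 1 - F(limit)."""
    return dist.sf(limit)

def interval_probability(dist, lower, upper):
    """P(lower < X < upper) = F(upper) - F(lower) for a continuous distribution."""
    return dist.cdf(upper) - dist.cdf(lower)

for limit in (50, 60, 70):
    print(f"P(X > {limit}) = {exceedance_probability(gev, limit):.4f}")
print(f"P(40 < X < 55) = {interval_probability(gev, 40, 55):.4f}")
```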

1.4 Research Questions of the Study.

This study aims to address the following questions:

i. What are the plausible probability distributions for the hospital inpatient and outpatient data?

ii. What is the optimum probability model, and what are the estimates of its parameters?

iii. What is the probability that the number of patients exceeds a specified level?

iv. What is the predicted number of patients per day in the forthcoming weeks?

1.5 The Significance of the Study.

This study will help the management of Dodoma Regional Referral Hospital plan infrastructure and human resources to meet the primary health care needs of the ordinary people of Dodoma. The probability distribution gives the authorities insight into the average and dispersion of outpatient numbers and the level of incoming patients over a year. The outcomes are expected to help them plan ahead for manpower and logistical requirements, so that services meet patients' expectations. This study will also serve as a basis for researchers to develop new statistical models required by the government and health organizations, and as a benchmark for studying other hospitals' patient data to improve the quality of healthcare services. To the best of our knowledge, no previous study has carried out a similar statistical distribution analysis of patients arriving at a hospital by fitting parametric probability distributions, which makes this study significant. It can help reduce excess government expenditure through appropriate assessment of patient output, and the prediction model supports logical planning of the hospital's facilities.

CHAPTER TWO

2.0 Literature Review.

This chapter gives an overview of the definitions of key terms used in this study and discusses empirical studies of statistical probability distributions as well as empirical studies of time series methods.

2.1 Definitions of key terms.

The following are definitions of significant terms commonly used in this study.

A distribution is an arrangement of the values of a variable showing their observed or theoretical frequency of occurrence.

A parameter is a numerical quantity that characterizes a population and can be estimated by calculations from sample data.

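For a discrete random variable, a probability distribution is the list of probabilities associated with each of its possible values. For a continuous random variable, such as those fitted in this study, it is described by a probability density function or a cumulative distribution function.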

Patients are people seeking healthcare services at the outpatient and inpatient departments of the various units in the hospital.

Outpatients are patients who come to a hospital for diagnosis or treatment and return home after receiving the service.

Inpatients are patients who are admitted and stay in the hospital overnight, or for days, weeks or months, receiving medicines and other services.

A referral hospital is a public (government) hospital funded by the government for its daily operation. It provides medical care at a subsidized rate, and its running costs are covered by the central government.

Likelihood function. Definition: Assume that $X_1, X_2, \ldots, X_n$ have a joint density function $f(x_1, x_2, \ldots, x_n \mid \theta)$. Given that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is observed, the function of $\theta$ defined by

$$L(\theta) = L(\theta \mid x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta)$$

is called the likelihood function.

Maximum Likelihood Estimator (MLE) - This is the value of the parameter that maximizes the likelihood function, that is, the value that makes the observed data most likely to have occurred under the data-generating process assumed to have produced the observations.
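When the observations are independent, the joint density factorises, so $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$, and the MLE maximises $\ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$. The sketch below assumes i.i.d. Poisson counts with made-up values. It maximises the log-likelihood numerically and checks the result against the closed-form Poisson MLE, which is the sample mean. The Poisson model is used only because its MLE is known in closed form; the study itself fits continuous distributions.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical daily counts (not the study's data).
counts = np.array([42, 51, 38, 47, 55, 44, 49, 40, 53, 46])

def neg_log_likelihood(lam, x):
    """-ln L(lambda) for i.i.d. Poisson data: -sum of ln f(x_i | lambda)."""
    return -np.sum(stats.poisson.logpmf(x, lam))

# Maximise ln L by minimising -ln L over a bounded interval for lambda.
res = optimize.minimize_scalar(neg_log_likelihood, bounds=(1e-6, 200.0),
                               args=(counts,), method="bounded")
lam_numeric = res.x
lam_closed_form = counts.mean()   # the Poisson MLE is the sample mean (46.5 here)
print(lam_numeric, lam_closed_form)
```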

Akaike Information Criterion (AIC) - This is described (Anderson and Burnham, 2004) as a way of selecting a model from a set of candidate models. The selected model is the one with the smallest estimated Kullback-Leibler distance between the model and the truth.

Bayesian Information Criterion (BIC) - This is defined by Schwarz (1978) as a criterion for model selection among a finite set of models. It is based on the likelihood function and is closely related to the Akaike information criterion.
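For reference, the two criteria can be written out with a small worked comparison. The formulas are the standard ones: $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximised likelihood. The log-likelihood values in the comparison are hypothetical and do not come from the DRRH data.

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L} \\[4pt]
&\text{Hypothetical: } n = 730\ (\ln 730 \approx 6.593),\quad
  k_1 = 3,\ \ln\hat{L}_1 = -2500;\quad k_2 = 4,\ \ln\hat{L}_2 = -2498 \\[4pt]
\mathrm{AIC}_1 &= 6 + 5000 = 5006, \qquad \mathrm{AIC}_2 = 8 + 4996 = 5004 \\
\mathrm{BIC}_1 &= 3(6.593) + 5000 \approx 5019.78, \qquad
\mathrm{BIC}_2 = 4(6.593) + 4996 \approx 5022.37
\end{aligned}
```

Under each criterion the smaller value is preferred. In this example AIC favours Model 2, while BIC favours Model 1, because BIC's penalty per parameter ($\ln 730 \approx 6.59$) exceeds AIC's penalty of 2.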

EasyFit is data analysis and simulation software that fits probability distributions to given data, chooses the best-fitting model, and applies the results of the analysis to support better decisions (Lairenjam et al., 2016).

Forecasting is one of the principal steps in the development process. The achievement of the plans depends on the accuracy of the forecasts (Hasibuan, 2011).

In service industries such as hospitals, many plans depend on the forecast, from capacity planning to aggregate planning and from layout decisions to daily schedules.
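Forecast accuracy is commonly summarised by the mean squared error (MSE) and the mean absolute percentage error (MAPE), the two measures this dissertation lists in its methodology. The sketch below shows both. The actual and forecast values are hypothetical, and MAPE assumes that no actual value is zero.

```python
import numpy as np

def mse(actual, forecast):
    """Mean squared error: the average of the squared forecast errors."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean((actual - forecast) ** 2)

def mape(actual, forecast):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

actual = [350, 340, 365, 372]     # hypothetical daily outpatient averages
forecast = [345, 350, 360, 370]
print(mse(actual, forecast))      # 38.5
print(mape(actual, forecast))     # about 1.569
```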

The ARIMA model, presented by Box and Jenkins (1976), has become one of the most popular methods for forecasting univariate time series data.

A time series $\{X_t\}$ is an ARIMA$(p, d, q)$ process if $\nabla^d X_t = (1 - B)^d X_t$ is an ARMA$(p, q)$ process, where $B$ is the backshift operator ($B X_t = X_{t-1}$). That is, if the series $\{X_t\}$, after being differenced $d$ times, follows an ARMA$(p, q)$ process, then it is an ARIMA$(p, d, q)$ series.
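The definition can be checked numerically. Integrating (cumulatively summing) a stationary AR(1) series once produces an ARIMA(1, 1, 0) series, and differencing that series once ($d = 1$) recovers the AR(1) series. The sketch below uses simulated values; the starting level of 300 and $\phi = 0.5$ are arbitrary assumptions.

```python
import numpy as np

def simulate_arima_110(phi, n, seed=0):
    """Simulate X_t whose first difference W_t = X_t - X_{t-1} is AR(1): W_t = phi*W_{t-1} + e_t."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    w = np.zeros(n)
    for t in range(1, n):
        w[t] = phi * w[t - 1] + e[t]
    x = 300.0 + np.cumsum(w)   # integrating once (d = 1) gives a non-stationary level series
    return x, w

x, w = simulate_arima_110(phi=0.5, n=200)
w_recovered = np.diff(x, n=1)            # differencing once undoes the integration
print(np.allclose(w_recovered, w[1:]))   # True
```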

Box-Jenkins Method. The Box-Jenkins methodology refers to the set of procedures for identifying, fitting and checking ARIMA models with time series data (Box & Jenkins, 1976). Box-Jenkins forecasting follows a four-step iterative procedure: model identification, model estimation, model checking, and forecasting.

2.2 Empirical review of Probability Distributions from another study.

This section gives an overview of some studies that used statistical distributions: how they identified the best-fit probability distribution for given data, and the methods they used to attain the best fit from a collection of probability distributions. Statistical distribution modelling and fitting were traditionally developed and used in actuarial science. More recently, however, other areas such as health insurance, non-life insurance, climate-based rainfall and engineering have employed distribution analysis for modelling and fitting probability distributions in detailed analyses. Statistical distribution modelling is mostly applied to claim data and cost-reduction processes, and it has been widely discussed by many researchers and institutions with respect to different probability distributions (Adeleke & Ibiwoye, 2011). A study in Ghana on motor insurance compared the Negative Binomial and Poisson distributions to determine which best fits the data (Dadey et al., 2011). It revealed that the Negative Binomial distribution fits the insurance data better than the Poisson distribution. Hollervik and Rodgers (2007) assessed statistical distributions for patients' length of stay in hospital by fitting selected statistical probability distributions such as , , , and

lognormal distribution. The study used a graphical method, the probability plot, to assess whether patients' length of stay in the hospital follows some specified probability distribution. It also used formal tests, the Kolmogorov-Smirnov and Anderson-Darling tests, to test whether the length-of-stay data follow a specified probability distribution. Their findings showed, with high significance, that the lognormal distribution best approximates the length-of-stay data. Prieto et al. (2014) modelled major failures in power grids over their whole range, with the main objective of finding the probability distributions that best fit electricity transmission network reliability data. The study fitted six statistical distributions, estimated their parameters by maximum likelihood, and selected the best fit by the Bayesian Information Criterion. It also tested the goodness of fit of those distributions using the Kolmogorov-Smirnov test based on bootstrap resampling.

The study identified two best-fitting distributions for the electricity transmission data: the Pareto II distribution and the lognormal distribution.
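The standard Kolmogorov-Smirnov p-value assumes that the parameters are known in advance. When the parameters are estimated from the data, studies such as Prieto et al. (2014) therefore use bootstrap resampling instead. The sketch below shows a parametric bootstrap under several assumptions. It uses a two-parameter lognormal model (location fixed at zero) and a simulated sample, and the 200 bootstrap replicates and the seeds are arbitrary choices. It is not necessarily the exact bootstrap scheme of that study.

```python
import numpy as np
from scipy import stats

def bootstrap_ks_pvalue(data, dist, n_boot=200, seed=0, **fit_kwargs):
    """KS statistic and parametric-bootstrap p-value when parameters are estimated by MLE."""
    rng = np.random.default_rng(seed)
    params = dist.fit(data, **fit_kwargs)
    d_obs = stats.kstest(data, dist.cdf, args=params).statistic
    exceed = 0
    for _ in range(n_boot):
        # Simulate from the fitted model, then re-fit so the estimation step is replicated.
        sim = dist.rvs(*params, size=len(data), random_state=rng)
        sim_params = dist.fit(sim, **fit_kwargs)
        d_sim = stats.kstest(sim, dist.cdf, args=sim_params).statistic
        exceed += int(d_sim >= d_obs)
    # The add-one correction keeps the p-value strictly positive.
    return d_obs, (exceed + 1) / (n_boot + 1)

# Hypothetical sample (simulated, not data from any cited study).
rng = np.random.default_rng(42)
sample = stats.lognorm.rvs(s=0.5, scale=np.exp(3.0), size=150, random_state=rng)
d, p = bootstrap_ks_pvalue(sample, stats.lognorm, n_boot=200, floc=0)
print(f"KS D = {d:.4f}, bootstrap p-value = {p:.3f}")
```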

Omari, Nyambura and Mwangi (2018) modelled the frequency and severity of insurance claims using statistical distributions. Their major objective was to find a suitable statistical distribution for non-life insurance data. They estimated the parameters of the selected distributions by maximum likelihood. They used the Chi-square test to check the goodness of fit of the claim frequency distributions, and the Kolmogorov-Smirnov and Anderson-Darling tests for the claim severity distributions. They also used the Akaike information criterion to choose the best fit among the selected distributions. Their findings showed that the lognormal distribution best fitted the claim size data, while the Negative Binomial and Geometric distributions were selected as the best distributions for the claim frequency data in comparison with the other selected distributions.

SAS Global Forum (2010) presented work on modeling claim severity and identified key considerations when fitting probability distributions to the severity of random events. The example events included those with a negative impact, such as the distribution of losses claimed under insurance policies and the severity of damage caused by other factors, and those with a positive impact, such as product sizes characterizing demand. The work used procedures for estimating the parameters of continuous statistical distributions that model the severity of a continuous event of interest. Its findings revealed that the severity of an event did not follow any of the statistical distributions selected in that study.

Sukono et al. (2018) estimated a claims risk model and motor vehicle insurance premiums using a Bayesian approach, fitting selected distributions and estimating their parameters by maximum likelihood. According to their study, claim frequency follows a Poisson distribution, while claim amounts were assumed to follow a Gamma distribution.

Lavanya, Radha and Arulanandu (2018) fitted seasonal rainfall data with seven statistical distributions, including the Weibull, Gamma, Lognormal, Generalized Extreme Value, Normal and Exponential distributions, in order to find the probability distribution that best fits the rainfall data. They estimated the parameters of the selected distributions by maximum likelihood and tested goodness of fit with the Chi-square, Kolmogorov-Smirnov and Anderson-Darling tests. They concluded that the Generalized Extreme Value distribution gave the best fit for seasonal rainfall data.

2.3 Empirical review of studies using time series methods.

Time series modeling was originally developed and mainly used in econometrics. However, time series techniques are now used for modeling and forecasting in other areas such as biomedicine, meteorology and health. For example, Suleman and Sarpong (2011) used time series analysis to model and forecast hypertension cases in Ghana. They applied the Box-Jenkins approach to daily data spanning 2000 to 2010, representing the data with the ARIMA model developed by Box and Jenkins (1976). Their study identified an ARIMA (3, 0, 2) model and concluded that it was adequate for forecasting hypertension.

Tireito et al. (2015) carried out a study in meteorology, using time series models to forecast the monthly rainfall of Uasin Gishu County in Kenya. They applied the Box-Jenkins methodology to 456 months of historical rainfall data, from January 1977 to December 2014. The study selected a SARIMA (0, 0, 0)(0, 1, 2)12 model as the best fit and projected the average expected monthly rainfall for the next two years.

Sukmak, Thongkam and Leejongpermpoon (2015) used data mining for time series forecasting of visits by anxiety disorder outpatients at the Centre. The main objective of their study was to forecast the number of anxiety disorder patients seeking treatment at an outpatient clinic in 2011 by comparing two Artificial Neural Network (ANN) models and selecting the better one. The demand forecasting model was built on time series data from January 2007 to December 2010, model accuracy was evaluated with the Mean Absolute Percentage Error (MAPE), and the Radial Basis Function (RBF) network was selected as the best fit.

Hassan et al. (2014) studied the wholesale market price of rice, fitting a seasonal ARIMA model to monthly data from July 1975 to 2011 and forecasting the wholesale price. SARIMA (1, 1, 1)(0, 1, 1)12 was the best fit, having the smallest Root Mean Squared Error, BIC and MAPE, and the diagnostic check of the fitted model was conducted with the Ljung-Box test. In a similar study of the retail and wholesale prices of rice in Nigeria, Adejumo and Momo (2013) found that an ARIMA (2, 1, 1) model gave the best fit. The Box-Jenkins methodology was applied to forecast the prices of both imported and domestic rice, for which ARIMA (2, 1, 1) had the smallest Mean Square Error.

P. Rotela Junior et al. (2014) applied an ARIMA model to forecast the Bovespa stock index (Ibovespa) in Brazil. The main objective of their study was to evaluate the performance of the ARIMA model for forecasting Ibovespa, using mathematical modeling based on the Box-Jenkins method. MAPE was used as the error measure for comparing the results with other smoothing models. The ARIMA model had the lower MAPE, supporting its suitability for forecasting stock market indices. Juang W-C et al. (2017) applied time series modeling to forecast emergency department visits at a medical center in southern Taiwan. An ARIMA (0, 0, 1) model was selected as the best fit, with the smallest MAPE of 8.91%.

Kibona and Mbago (2018) applied an ARIMA time series model to forecast wholesale prices of maize in Tanzania. The major objective of their study was to model and forecast wholesale maize prices using data from February 2004 to August 2017 obtained from the Bank of Tanzania, again with the Box-Jenkins method. They concluded that the ARIMA (3, 1, 1) model was the best model for maize wholesale prices, based on the minimum values of AIC and BIC, and the fitted model was found to be adequate by the Ljung-Box test.

2.4 Research gap.

From the literature reviewed and the available resources, it is clear that most previous studies of patients' arrival at hospitals concentrated on statistical prediction models to predict and forecast the number of patients arriving. Few, if any, have used statistical distribution theory to evaluate the probability that the number of patients arriving at a hospital stays within, or exceeds, a given limit. This study therefore fills this gap by estimating the probability of a given number of patients on regular days and on exceptional days, as well as the cut-off numbers of patients at prefixed levels.

The study also tests whether the hospital's patient arrival data follow a particular probability distribution, using goodness-of-fit tests. The statistically best-fitting probability distribution for the hospital arrival data is then proposed; assessing the characteristics of the data on this theoretical basis was a fresh challenge.

CHAPTER THREE

3.0 Research Methodology.

This chapter describes the research approach and research design used in this study. It also describes the study area, the data sources, the statistical distribution models, the time series models and the data analysis applied in this study.

3.1 Research Approach.

This study adopted a quantitative empirical research approach to achieve its main objective, using secondary data on the daily number of patients arriving at the hospital.

3.2 Research Design of the study.

The research design is the overall plan of what the researcher will do to answer the research questions (Saunders, Lewis & Thornhill, 2012).

This study used a retrospective research design based on secondary data on patients arriving at Dodoma Regional Referral Hospital, obtained from the electronic health management information system. In particular, the design aimed to identify the statistical distribution models appropriate to the patients' arrival data and to predict the daily number of patients.

3.3 Study Area of the Research.

The study was carried out at Dodoma Regional Referral Hospital in Dodoma city, one of the largest hospitals in central Tanzania. It is a referral hospital with specialized services that receives referral cases from all surrounding district hospitals and from neighboring regional hospitals, such as Singida and Manyara regional hospitals, which send their complex cases for specialized care. The number of patients at the outpatient department has been increasing continuously, leading to long waiting times for health services and overcrowding at the hospital. The population of Dodoma region has also grown significantly with the establishment of central government institutions and offices. Higher learning institutions, business centers and foreign visitors likewise have significant implications for the quantity and quality of services in all sectors, particularly in health.

3.3.1 Hospital Profile.

The referral system served by Dodoma Regional Referral Hospital runs from the dispensary level to the health center, then to the district hospital, and up to the Regional Referral Hospital. Cases that need further management or diagnostic services not available at the hospital are referred to Muhimbili National Hospital, Dar es Salaam. Districts that do not have a district hospital use Dodoma Regional Referral Hospital as their district hospital. The hospital offers services in various departments, such as the outpatient, surgical and pediatric departments, as shown in Table 3.1:

Table 3.1: Departments, sections and units in the hospital

Department: Sections/units
Outpatient department: Casualty, internal medicine clinic, surgical clinic, RCH clinic, diabetic clinic, CTC, TB/Leprosy, dental clinic, eye clinic and ENT clinic
Surgical department: Operating theater, CSSD, male surgical ward and female surgical ward
Internal medicine: Male medical ward, female medical ward and ICU
Pediatric department: Pediatric ward and NICU
Obstetrics and Gynecology: Labor room, antenatal ward, postnatal ward and neonatal ward
Orthopedic: Orthopedic and physiotherapy
Pharmacy: Dispensing and pharmacy store
Radiology: CT, ultrasound and X-ray
Laboratory: Clinical laboratory and mortuary
Administration: Medical records, registry, accounting, general store (procurement), workshop (medical engineering) and IT

Source: DRRH Health Secretary Office (2019).

Figure 3.1: Dodoma Regional Referral Hospital Layout Map.

Source: DRRH Health Secretary Office (2019).

3.4 Source of data collection.

There are generally two sources of data: primary and secondary. Primary data are first-hand data collected from the field and analyzed to answer the research objectives. Secondary data are second-hand information that has previously been published in journals, books or online systems and was originally collected for a different purpose (Kothari, 2004). This study used daily secondary data on patients arriving at Dodoma Regional Referral Hospital. The data were retrieved from the hospital's electronic health management information system database by filtering the number of patients who arrived at, and were admitted to, the hospital each day. The data cover two years, from January 1, 2017 to December 31, 2018, giving daily arrival and inpatient records for 730 days.

3.5 Statistical Distribution Models.

White and Bennetts (1996) explained that a good statistical model provides a good approximate mathematical representation of the data being modeled, with particular emphasis on the structure or patterns in the data. Appropriate statistical distribution analysis and modeling of data have become increasingly important in scientific research and in reaching valid conclusions (Daniel, 2014).

This section describes the statistical distribution modeling process for the distributions selected after an initial analysis of the hospital patients' data with the EasyFit software. The initial assessment screened the patient data against the sixty-one (61) continuous probability distributions available in EasyFit. Kaishev and Krachunov (2010) recommended four steps to be followed when fitting an appropriate statistical distribution to a given data set. The steps are:

1) Selection of a particular distribution family

2) Estimation of parameters of the chosen fitted distributions

3) Specification of the criteria to select the appropriate distribution from the family of distributions

4) Testing the goodness of fit of the approximate distributions.
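The four steps above can be sketched in Python with SciPy. This is a minimal illustration under stated assumptions, not the EasyFit procedure used in the study: the data are a synthetic stand-in for 730 daily counts, only four candidate families are tried, the Akaike Information Criterion is the selection criterion, and the Kolmogorov-Smirnov test is the goodness-of-fit check.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for 730 daily patient counts (synthetic, NOT the DRRH data).
rng = np.random.default_rng(2017)
data = rng.gamma(shape=20.0, scale=15.0, size=730)

# Step 1: candidate families (a small illustrative subset, not EasyFit's 61).
candidates = {
    "gamma": stats.gamma,
    "lognorm": stats.lognorm,
    "weibull_min": stats.weibull_min,
    "norm": stats.norm,
}

results = []
for name, dist in candidates.items():
    # Step 2: maximum likelihood estimates of all parameters (shape(s), loc, scale).
    params = dist.fit(data)
    loglik = np.sum(dist.logpdf(data, *params))
    # Step 3: selection criterion, AIC = 2k - 2 log L (smaller is better).
    aic = 2 * len(params) - 2 * loglik
    # Step 4: Kolmogorov-Smirnov goodness-of-fit test against the fitted model.
    ks = stats.kstest(data, dist.cdf, args=params)
    results.append((name, aic, ks.statistic, ks.pvalue))

results.sort(key=lambda r: r[1])  # rank candidates by AIC
for name, aic, d, p in results:
    print(f"{name:12s} AIC={aic:10.2f}  KS D={d:.4f}  p={p:.3f}")
best = results[0][0]
```

Because the parameters are estimated from the same sample that is then tested, the Kolmogorov-Smirnov p-values in such a sketch are optimistic; they are better read as a ranking aid than as exact significance levels.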

3.5.1 Selection of a family of probability distribution models.

Selecting a suitable probability distribution is an essential step before any statistical analysis, and it is central to the pursuit of science in general (Kadane & Lazar, 2004). Most empirical economic research, which involves the specification, estimation and evaluation of statistical models against certain criteria, requires choosing an appropriate distribution for the data. In this study, the first step, the selection of probability distributions together with the initial analysis of the Dodoma Regional Referral Hospital (DRRH) data, was carried out with EasyFit 5.5 Professional.

When run, the software ranks the fit of each probability distribution and reports its maximum likelihood (ML) parameter estimates and goodness-of-fit statistics. From this list, the required number of distributions can be drawn according to their ranks. For the raw data, three goodness-of-fit tests were applied (the Chi-square, Anderson-Darling and Kolmogorov-Smirnov tests), and the software ranks the distributions separately for each test. For frequency data, the software does not apply the Chi-square test. For each goodness-of-fit criterion, the first five ranked distributions were selected, and these were combined into a final list of suggested distributions.

The likelihood function, and hence the Akaike Information Criterion, must then be computed from the sample to identify the best fit. The statistical distributions common to the daily total inpatient and outpatient data were: 1) Beta distribution, 2) Burr distribution, 3) Cauchy distribution, 4) Dagum distribution, 5) Gamma 3P distribution, 6) Generalized Extreme Value distribution, 7) Generalized Gamma 4P distribution, 8) Johnson SB distribution, 9) Log-Logistic 3P distribution, 10) Nakagami distribution, 11) Rayleigh distribution, 12) Rayleigh 2P distribution and 13) Weibull distribution.

The parameters of these distributions, together with their probability plots, distribution functions, survival functions and hazard functions, are available in EasyFit. The software also reports basic characteristics of the data, such as the mean, standard deviation, quartiles and deciles, for the suggested models. It can also calculate the probability of any interval of the range, including open-ended classes, from the fitted distribution with its estimated parameters. Acceptance or rejection of each suggested distribution is reported at various levels of significance, making it easy to judge the goodness of fit.
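The interval and open-ended probabilities described above can also be reproduced outside EasyFit. The sketch below assumes, purely for illustration, a fitted Gamma (3P) model with placeholder parameters (shape $\alpha$, scale $\beta$, location $\gamma$ as in equation (6)), mapped to SciPy's `gamma(a=alpha, loc=gamma, scale=beta)`; the numbers are not the study's estimates.

```python
from scipy import stats

# Hypothetical fitted Gamma (3P) model: shape alpha, scale beta, location gamma.
# The numbers are placeholders, not the estimates obtained from the DRRH data.
alpha, beta, gamma_loc = 20.0, 15.0, 0.0
model = stats.gamma(a=alpha, loc=gamma_loc, scale=beta)

p_between = model.cdf(350) - model.cdf(250)   # P(250 < X <= 350), an interval of the range
p_exceed = model.sf(400)                      # P(X > 400), an open-ended upper class
cutoff_95 = model.ppf(0.95)                   # level exceeded on about 5% of days

print(f"P(250 < X <= 350) = {p_between:.4f}")
print(f"P(X > 400)        = {p_exceed:.4f}")
print(f"95th percentile   = {cutoff_95:.1f}")
```

The survival function `sf` is used for the exceedance probability rather than `1 - cdf` because it stays accurate far into the upper tail.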

3.5.2 Probability density function and distribution function of selected distributions.

A probability function may be discrete or continuous; for a continuous random variable it is called the probability density function, which describes the relative likelihood of the random variable taking a given value. A continuous probability distribution is usually defined by its cumulative distribution function or by its probability density function. Statistical distributions are particularly useful in modeling data because of the significant results they yield (Ramberg et al., 1979). The probability density functions of all the distributions selected after the initial analysis of the data are given below:

(i) Beta distribution.

This is one of the probability distributions selected for fitting the total inpatient data.

The Beta distribution has both an upper and a lower bound. It is generally defined over the interval (0, 1) but can be transformed to any interval (a, b). Let $X$ be a random variable following the Beta distribution with two shape parameters $\alpha_1$ and $\alpha_2$; its probability density function is:

$$f(x) = \frac{(x-a)^{\alpha_1-1}\,(b-x)^{\alpha_2-1}}{B(\alpha_1,\alpha_2)\,(b-a)^{\alpha_1+\alpha_2-1}}, \qquad a \le x \le b \tag{1}$$

where $\alpha_1 > 0$ and $\alpha_2 > 0$ are continuous shape parameters, $a$ and $b$ are the boundary parameters ($a < b$), and $B(\cdot,\cdot)$ is the Beta function.

On the standard interval (0, 1), the Beta distribution has mean and variance:

$$E(X) = \frac{\alpha_1}{\alpha_1+\alpha_2}, \qquad V(X) = \frac{\alpha_1\alpha_2}{(\alpha_1+\alpha_2)^2(\alpha_1+\alpha_2+1)} \tag{2}$$

The cumulative distribution function of the Beta distribution is:

$$F(x) = I_z(\alpha_1,\alpha_2), \qquad z = \frac{x-a}{b-a} \tag{3}$$

where $I_z$ is the regularized incomplete Beta function.
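As a numerical check of equations (1) and (2), the sketch below evaluates the four-parameter Beta pdf directly and compares it with SciPy's `beta`, which handles the bounds through `loc = a` and `scale = b - a`. The parameter values are illustrative assumptions, not fitted values.

```python
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

def beta4_pdf(x, a1, a2, a, b):
    """Equation (1): Beta pdf with shape parameters a1, a2 on the interval (a, b)."""
    return (x - a) ** (a1 - 1) * (b - x) ** (a2 - 1) / (
        beta_fn(a1, a2) * (b - a) ** (a1 + a2 - 1))

# Illustrative parameter values only (not estimates from the DRRH data).
a1, a2, a, b = 2.5, 4.0, 10.0, 60.0
x = np.linspace(11.0, 59.0, 9)

# SciPy places the bounds through loc = a and scale = b - a.
model = stats.beta(a1, a2, loc=a, scale=b - a)
print(np.max(np.abs(beta4_pdf(x, a1, a2, a, b) - model.pdf(x))))

# Equation (2) gives the moments of the standardized variable Z = (X - a)/(b - a);
# rescaling gives the moments of X itself.
mean_x = a + (b - a) * a1 / (a1 + a2)
var_x = (b - a) ** 2 * a1 * a2 / ((a1 + a2) ** 2 * (a1 + a2 + 1))
print(mean_x, model.mean())
print(var_x, model.var())
```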

(ii) Burr distribution.

This probability distribution fits the inpatient data. Let $\chi$ be a random variable following the Burr distribution with three parameters $k$, $\alpha$ and $\beta$; its probability density function is:

$$f(\chi) = \frac{\alpha k\,(\chi/\beta)^{\alpha-1}}{\beta\left(1+(\chi/\beta)^{\alpha}\right)^{k+1}} \tag{4}$$

where $k > 0$ and $\alpha > 0$ are shape parameters, $\beta > 0$ is the scale parameter and $0 \le \chi < \infty$.

The cumulative distribution function of the Burr distribution is:

$$F(\chi) = 1-\left(1+(\chi/\beta)^{\alpha}\right)^{-k} \tag{5}$$

(iii) Gamma (3P) Distribution.

The three-parameter Gamma distribution represents the daily total inpatient data of the hospital. Let $\chi$ be a random variable following the Gamma distribution with parameters $\alpha$, $\beta$ and $\gamma$; its probability density function (pdf) is:

$$f(\chi) = \frac{(\chi-\gamma)^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{\chi-\gamma}{\beta}\right), \qquad \chi > \gamma,\ \alpha > 0,\ \beta > 0 \tag{6}$$

where $\gamma$ is the location parameter, $\alpha$ is the shape parameter and $\beta$ is the scale parameter.

The cumulative distribution function (CDF) of the Gamma (3P) distribution is:

$$F(\chi) = \frac{\Gamma_{(\chi-\gamma)/\beta}(\alpha)}{\Gamma(\alpha)} \tag{7}$$

where $\Gamma(\cdot)$ is the Gamma function and $\Gamma_z(\cdot)$ is the lower incomplete Gamma function.

(iv) Generalized Gamma (4P) distribution.

This distribution describes the total number of inpatients arriving at the hospital in a day. A random variable $Y$ follows the Generalized Gamma distribution with four parameters $k$, $\alpha$, $\beta$ and $\gamma$ if its probability density function (pdf) is:

$$f(y) = \frac{k\,(y-\gamma)^{k\alpha-1}}{\beta^{k\alpha}\,\Gamma(\alpha)}\exp\left(-\left(\frac{y-\gamma}{\beta}\right)^{k}\right) \tag{8}$$

where $k > 0$ and $\alpha > 0$ are shape parameters, $\beta > 0$ is the scale parameter, $\gamma$ is the location parameter and $\gamma \le y < \infty$.

The cumulative distribution function of the Generalized Gamma distribution is:

$$F(y) = \frac{\Gamma_{((y-\gamma)/\beta)^{k}}(\alpha)}{\Gamma(\alpha)} \tag{9}$$

(v) Generalized Extreme Value Distribution.

This probability distribution also fits the daily arrival of patients at the hospital. Let a random variable $\chi$ follow the Generalized Extreme Value distribution with parameters $k$, $\sigma$ and $\mu$; its probability density function is:

$$f(\chi) = \frac{1}{\sigma}\exp\left(-(1+kz)^{-1/k}\right)(1+kz)^{-1-1/k}, \qquad k \ne 0 \tag{10}$$

$$f(\chi) = \frac{1}{\sigma}\exp\left(-z-\exp(-z)\right), \qquad k = 0 \tag{11}$$

where

$$z = \frac{\chi-\mu}{\sigma}, \qquad 1+kz > 0 \ \text{for } k \ne 0 \tag{12}$$

and $-\infty < \chi < \infty$ for $k = 0$. Here $k$ is the shape parameter, $\sigma > 0$ is the scale parameter and $\mu$ is the location parameter.

(vi) Cauchy Distribution.

This is another probability distribution for the patients' arrival at the hospital. Let $\chi$ be a random variable following the Cauchy distribution with parameters $\sigma$ and $\mu$; its probability density function is:

$$f(\chi) = \left(\pi\sigma\left(1+\left(\frac{\chi-\mu}{\sigma}\right)^{2}\right)\right)^{-1} \tag{13}$$

where $\sigma > 0$ is the scale parameter, $\mu$ is the location parameter and $-\infty < \chi < \infty$.

The cumulative distribution function of the Cauchy distribution is:

$$F(\chi) = \frac{1}{\pi}\arctan\left(\frac{\chi-\mu}{\sigma}\right)+0.5 \tag{14}$$

(vii) Dagum Distribution.

The Dagum distribution is one of the distributions initially fitted to the total outpatient data. Let $\chi$ be a random variable following the Dagum distribution with three parameters $k$, $\alpha$ and $\beta$; its probability density function (pdf) is:

$$f(\chi) = \frac{\alpha k\,(\chi/\beta)^{\alpha k-1}}{\beta\left(1+(\chi/\beta)^{\alpha}\right)^{k+1}} \tag{15}$$

where $k > 0$ and $\alpha > 0$ are shape parameters, $\beta > 0$ is the scale parameter and $0 \le \chi < \infty$.

The cumulative distribution function of the Dagum distribution is:

$$F(\chi) = \left(1+(\chi/\beta)^{-\alpha}\right)^{-k} \tag{16}$$

(viii) Johnson SB Distribution.

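This distribution also fits the patients' arrival at the hospital. A random variable $\chi$ follows the Johnson SB distribution with four parameters $\gamma$, $\delta$, $\lambda$ and $\xi$ if its probability density function is:

$$f(\chi) = \frac{\delta}{\lambda\sqrt{2\pi}\,z(1-z)}\exp\left(-\frac{1}{2}\left(\gamma+\delta\ln\frac{z}{1-z}\right)^{2}\right), \qquad z = \frac{\chi-\xi}{\lambda} \tag{17}$$

where $\gamma$ and $\delta > 0$ are shape parameters, $\lambda > 0$ is the scale parameter, $\xi$ is the location parameter and $\xi \le \chi \le \xi+\lambda$.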

(ix) Log-Logistic (3P) distribution.

This distribution also describes the arrival of patients at the hospital. A random variable $Y$ follows the Log-Logistic (3P) distribution with three parameters $\alpha$, $\beta$ and $\gamma$ if its probability density function is:

$$f(y) = \frac{\alpha}{\beta}\left(\frac{y-\gamma}{\beta}\right)^{\alpha-1}\left(1+\left(\frac{y-\gamma}{\beta}\right)^{\alpha}\right)^{-2} \tag{18}$$

where $\alpha > 0$ is the shape parameter, $\beta > 0$ is the scale parameter and $\gamma$ is the location parameter, with $\gamma \le y < \infty$. The corresponding cumulative distribution function of the Log-Logistic (3P) distribution is:

$$F(y) = \left(1+\left(\frac{\beta}{y-\gamma}\right)^{\alpha}\right)^{-1} \tag{19}$$

(x) Nakagami distribution.

This is another continuous probability distribution fitted to the total outpatient data. Let a random variable $\chi$ follow the Nakagami distribution with parameters $m$ and $\Omega$; its probability density function is:

$$f(\chi) = \frac{2m^{m}}{\Gamma(m)\,\Omega^{m}}\,\chi^{2m-1}\exp\left(-\frac{m}{\Omega}\chi^{2}\right) \tag{20}$$

where $m \ge 0.5$, $\Omega > 0$ and $0 \le \chi < \infty$.

The cumulative distribution function of this distribution is:

$$F(\chi) = \frac{\Gamma_{m\chi^{2}/\Omega}(m)}{\Gamma(m)} \tag{21}$$

(xi) Rayleigh distribution.

For the hospital data, this distribution describes both the pattern of inpatient stays and the pattern of outpatient arrivals. A random variable $Y$ follows the Rayleigh distribution with parameter $\sigma$ if its probability density function is:

$$f(y) = \frac{y}{\sigma^{2}}\exp\left(-\frac{1}{2}\left(\frac{y}{\sigma}\right)^{2}\right) \tag{22}$$

where $y \ge 0$ and $\sigma > 0$.

The cumulative distribution function of the Rayleigh distribution is:

$$F(y) = 1-\exp\left(-\frac{1}{2}\left(\frac{y}{\sigma}\right)^{2}\right) \tag{23}$$

(xii) Rayleigh (2P) distribution.

The Rayleigh (2P) distribution is a continuous probability distribution describing both the daily arrival of patients at the hospital and their stay. A random variable $\chi$ follows the Rayleigh (2P) distribution with parameters $\sigma$ and $\gamma$ if its probability density function is:

$$f(\chi) = \frac{\chi-\gamma}{\sigma^{2}}\exp\left(-\frac{1}{2}\left(\frac{\chi-\gamma}{\sigma}\right)^{2}\right) \tag{24}$$

where $\sigma > 0$ is the scale parameter, $\gamma$ is the location parameter and $\gamma \le \chi < \infty$. The corresponding cumulative distribution function is:

$$F(\chi) = 1-\exp\left(-\frac{1}{2}\left(\frac{\chi-\gamma}{\sigma}\right)^{2}\right) \tag{25}$$

(xiii) Weibull distribution.

The daily outpatient and inpatient data of the hospital are described by the Weibull distribution. A random variable $Y$ follows the Weibull distribution with parameters $\beta$ and $\delta$ if its probability density function is:

$$f(y) = \frac{\beta}{\delta^{\beta}}\,y^{\beta-1}e^{-(y/\delta)^{\beta}}, \qquad y \ge 0,\ \beta > 0,\ \delta > 0 \tag{26}$$

where $\beta$ is the shape parameter and $\delta$ is the scale parameter. The Weibull distribution has mean and variance:

$$E(Y) = \delta\,\Gamma\left(1+\frac{1}{\beta}\right), \qquad V(Y) = \delta^{2}\left[\Gamma\left(1+\frac{2}{\beta}\right)-\Gamma^{2}\left(1+\frac{1}{\beta}\right)\right] \tag{27}$$

All the PDFs and CDFs in equations (1) to (27) were taken from the help documentation of the EasyFit statistical software.

3.6 Estimation of parameters of Probability Distributions.

The parameters of each selected probability distribution for the hospital patients' data were estimated after an initial analysis of the data in EasyFit 5.5 Professional. Once the parameters had been estimated, further analysis of the fitted distributions could proceed. In this study, the parameters of the selected distributions were estimated by maximum likelihood, a widely applicable estimation technique. Maximum likelihood often yields better estimates than other methods, such as least squares or the method of moments, especially when the sample size is large.

Assume that $x_1, x_2, x_3, \ldots, x_n$ is a random sample of independent and identically distributed observations drawn from an unknown population, and let $\chi = x$ denote a realization of a random variable $\chi$ with probability density function $f(x, \theta)$, where $\theta$ is a vector of unknown parameters to be estimated. The objective of statistical inference is to infer $\theta$ from the observed data. Maximum likelihood estimation is built on the likelihood function $L(\theta)$, which is the joint probability density function of the observed data expressed as a function of $\theta$. For a sample of independent observations $x_i$, $i = 1, 2, 3, \ldots, n$, the likelihood function is (Omari, Nyambura & Mwangi, 2018):

$$L(\theta) = \prod_{i=1}^{n} f(x_i, \theta)$$

The MLE $\hat{\theta}$ of a parameter $\theta$ is obtained by maximizing the likelihood function $L(\theta)$. Since the logarithm is a strictly increasing function, maximizing $L(\theta)$ is equivalent to maximizing $\log L(\theta)$, that is:

$$l(\theta) = \log L(\theta) = \sum_{i=1}^{n}\log f(x_i, \theta) \tag{28}$$
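When the likelihood equations have no closed-form solution, equation (28) can be maximized numerically. The sketch below does this for a two-parameter gamma model on synthetic data (an assumption for illustration; it is not the study's data or EasyFit's algorithm), optimizing on the log scale so that the parameters stay positive, and compares the result with SciPy's own gamma fit.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
x = rng.gamma(shape=8.0, scale=25.0, size=730)   # synthetic data, not DRRH records

def neg_loglik(log_params):
    """Negative of equation (28) for a two-parameter gamma model.

    The parameters are optimized on the log scale so that they stay positive.
    """
    shape, scale = np.exp(log_params)
    return -np.sum(stats.gamma.logpdf(x, shape, scale=scale))

# Method-of-moments starting values: shape = mean^2/var, scale = var/mean.
m, v = x.mean(), x.var()
start = np.log([m ** 2 / v, v / m])
res = optimize.minimize(neg_loglik, start, method="Nelder-Mead",
                        options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
shape_hat, scale_hat = np.exp(res.x)

# SciPy's own maximum likelihood fit (location fixed at zero) for comparison.
a_ref, _, scale_ref = stats.gamma.fit(x, floc=0)
print(shape_hat, scale_hat)
print(a_ref, scale_ref)
```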

The MLEs of the parameters of each probability distribution are derived by solving the likelihood equations obtained by maximizing $\log L(\theta)$. In practice, however, the ML estimates for a given data set are readily obtained with EasyFit 5.5 and SAS 9.4. With the calculated likelihood value for the data, Akaike's Information Criterion and the Bayesian Information Criterion can be computed, which are useful for comparing the goodness of fit of the models.
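For reference, the two criteria mentioned above are conventionally defined as follows, where $\log L(\hat{\theta})$ is the maximized log-likelihood from equation (28), $k$ is the number of estimated parameters and $n$ is the sample size (for example, $n = 730$ daily observations). These are the standard textbook forms; the smaller the value, the better the trade-off between fit and model complexity.

$$\mathrm{AIC} = 2k - 2\log L(\hat{\theta}), \qquad \mathrm{BIC} = k\log n - 2\log L(\hat{\theta})$$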

The probability density functions and the maximum likelihood estimates of the parameters of the selected distributions, with respect to the hospital inpatient and outpatient data, are given below:

(i) Burr distribution.

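The pdf of the Burr distribution is

$$f(\chi) = \frac{\alpha k\,(\chi/\beta)^{\alpha-1}}{\beta\left(1+(\chi/\beta)^{\alpha}\right)^{k+1}} \tag{29}$$

The likelihood function of the Burr distribution becomes

$$l(k,\alpha,\beta) = \prod_{i=1}^{n}\frac{\alpha k\,(\chi_i/\beta)^{\alpha-1}}{\beta\left(1+(\chi_i/\beta)^{\alpha}\right)^{k+1}} \tag{30}$$

Taking the logarithm of the likelihood function gives

$$\log l(k,\alpha,\beta) = n\log\alpha + n\log k - n\log\beta + (\alpha-1)\sum_{i=1}^{n}\log\frac{\chi_i}{\beta} - (k+1)\sum_{i=1}^{n}\ln\left(1+\left(\frac{\chi_i}{\beta}\right)^{\alpha}\right) \tag{31}$$

The ML estimators of $k$, $\alpha$ and $\beta$ are obtained by differentiating this log-likelihood function with respect to $k$, $\alpha$ and $\beta$ and equating the derivatives to zero:

$$\frac{\partial\log l}{\partial k} = \frac{n}{k} - \sum_{i=1}^{n}\log\left(1+\left(\frac{\chi_i}{\beta}\right)^{\alpha}\right) = 0 \tag{32}$$

$$\frac{\partial\log l}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log\frac{\chi_i}{\beta} - (k+1)\sum_{i=1}^{n}\frac{(\chi_i/\beta)^{\alpha}\log(\chi_i/\beta)}{1+(\chi_i/\beta)^{\alpha}} = 0 \tag{33}$$

$$\frac{\partial\log l}{\partial\beta} = -\frac{n\alpha}{\beta} + \frac{(k+1)\alpha}{\beta}\sum_{i=1}^{n}\frac{(\chi_i/\beta)^{\alpha}}{1+(\chi_i/\beta)^{\alpha}} = 0 \tag{34}$$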
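The Burr log-likelihood (31) and its derivatives (32) to (34) can be coded directly. In the sketch below (illustrative parameters, synthetic sample), equation (31) is cross-checked against SciPy's `burr12` (Burr Type XII), which matches equation (29) with `c = alpha`, `d = k` and `scale = beta`; a root-finder applied to `burr_score` would give the MLEs.

```python
import numpy as np
from scipy import stats

def burr_loglik(x, k, alpha, beta):
    """Equation (31): log-likelihood of the three-parameter Burr model."""
    n = x.size
    y = x / beta
    return (n * np.log(alpha) + n * np.log(k) - n * np.log(beta)
            + (alpha - 1) * np.sum(np.log(y))
            - (k + 1) * np.sum(np.log1p(y ** alpha)))

def burr_score(x, k, alpha, beta):
    """Left-hand sides of equations (32)-(34); all three are zero at the MLE."""
    n = x.size
    y = x / beta
    ya = y ** alpha
    d_k = n / k - np.sum(np.log1p(ya))
    d_alpha = n / alpha + np.sum(np.log(y)) - (k + 1) * np.sum(ya * np.log(y) / (1 + ya))
    d_beta = -n * alpha / beta + (k + 1) * (alpha / beta) * np.sum(ya / (1 + ya))
    return np.array([d_k, d_alpha, d_beta])

# Illustrative parameters and a synthetic sample (not DRRH data).
k, alpha, beta = 2.0, 5.0, 350.0
x = stats.burr12.rvs(alpha, k, scale=beta, size=730, random_state=np.random.default_rng(11))

ll_eq31 = burr_loglik(x, k, alpha, beta)
ll_scipy = np.sum(stats.burr12.logpdf(x, alpha, k, scale=beta))   # c = alpha, d = k
print(ll_eq31, ll_scipy, burr_score(x, k, alpha, beta))
```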

Solving these equations gives the MLEs, but EasyFit 5.5 reports the estimates directly, together with characteristics of the fitted distribution such as its mean and standard deviation.

(ii) Gamma (3P) distribution.

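The pdf of the three-parameter Gamma distribution is

$$f(\chi) = \frac{(\chi-\gamma)^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{\chi-\gamma}{\beta}\right), \qquad \chi > \gamma,\ \alpha > 0,\ \beta > 0 \tag{35}$$

The likelihood function of this pdf is:

$$l(\alpha,\beta,\gamma) = \prod_{i=1}^{n}\frac{(\chi_i-\gamma)^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{\chi_i-\gamma}{\beta}\right) \tag{36}$$

The log-likelihood function is

$$\log l(\alpha,\beta,\gamma) = (\alpha-1)\sum_{i=1}^{n}\log(\chi_i-\gamma) - n\alpha\log\beta - n\log\Gamma(\alpha) - \frac{1}{\beta}\sum_{i=1}^{n}(\chi_i-\gamma) \tag{37}$$

The MLEs of $\alpha$, $\beta$ and $\gamma$ are obtained by differentiating the log-likelihood function, equating the derivatives to zero and solving the resulting system of equations, where $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$ is the digamma function:

$$\frac{\partial\log l}{\partial\alpha} = \sum_{i=1}^{n}\log(\chi_i-\gamma) - n\log\beta - n\psi(\alpha) = 0 \tag{38}$$

$$\frac{\partial\log l}{\partial\beta} = -\frac{n\alpha}{\beta} + \frac{1}{\beta^{2}}\sum_{i=1}^{n}(\chi_i-\gamma) = 0 \tag{39}$$

$$\frac{\partial\log l}{\partial\gamma} = -(\alpha-1)\sum_{i=1}^{n}\frac{1}{\chi_i-\gamma} + \frac{n}{\beta} = 0 \tag{40}$$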

In practice, EasyFit 5.5 calculates the estimates directly for the given data when the program is run.
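One way to solve equations (38) to (40), sketched below under illustrative assumptions (synthetic data, not EasyFit's internal method), is to profile the likelihood. For a fixed location $\gamma$, equation (39) gives $\beta = \overline{(\chi-\gamma)}/\alpha$; substituting into (38) leaves $\log\alpha - \psi(\alpha) = s$, which has a unique root. Only a one-dimensional search over $\gamma < \min\chi_i$ then remains. When $\alpha < 1$ the three-parameter gamma likelihood becomes unbounded as $\gamma$ approaches $\min\chi_i$, so the search is only meaningful when the shape exceeds one.

```python
import numpy as np
from scipy import optimize, special, stats

def gamma3_profile(x, gam):
    """For a fixed location gam, solve equations (38)-(39) for alpha and beta
    and return (alpha, beta, log-likelihood of equation (37))."""
    y = x - gam
    n = x.size
    s = np.log(y.mean()) - np.mean(np.log(y))    # positive by Jensen's inequality
    # log(alpha) - digamma(alpha) decreases from +inf to 0, so the root is unique.
    alpha = optimize.brentq(lambda a: np.log(a) - special.digamma(a) - s, 1e-6, 1e6)
    beta = y.mean() / alpha                       # equation (39)
    loglik = ((alpha - 1) * np.sum(np.log(y)) - n * alpha * np.log(beta)
              - n * special.gammaln(alpha) - np.sum(y) / beta)
    return alpha, beta, loglik

# Synthetic data with true location 40 (hypothetical, not DRRH records).
rng = np.random.default_rng(3)
x = 40.0 + rng.gamma(shape=6.0, scale=30.0, size=730)

# Remaining one-dimensional search over gamma, which must stay below min(x).
res = optimize.minimize_scalar(lambda g: -gamma3_profile(x, g)[2],
                               bounds=(x.min() - 200.0, x.min() - 1e-3),
                               method="bounded")
gam_hat = res.x
alpha_hat, beta_hat, ll_hat = gamma3_profile(x, gam_hat)
print(alpha_hat, beta_hat, gam_hat, ll_hat)
# Cross-check equation (37) against SciPy's gamma(a, loc, scale) log-density.
print(np.sum(stats.gamma.logpdf(x, alpha_hat, loc=gam_hat, scale=beta_hat)))
```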

(iii) Rayleigh distribution.

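The pdf of the one-parameter Rayleigh distribution is

$$f(\chi) = \frac{\chi}{\sigma^{2}}\exp\left(-\frac{1}{2}\left(\frac{\chi}{\sigma}\right)^{2}\right) \tag{41}$$

The likelihood function of the Rayleigh pdf is:

$$l(\sigma) = \prod_{i=1}^{n}\frac{\chi_i}{\sigma^{2}}\exp\left(-\frac{1}{2}\left(\frac{\chi_i}{\sigma}\right)^{2}\right) \tag{42}$$

and the log-likelihood function is:

$$\log l(\sigma) = \sum_{i=1}^{n}\log\chi_i - 2n\log\sigma - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\chi_i^{2} \tag{43}$$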

EasyFit 5.5 gives the MLE of the parameter for the given data.
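For the one-parameter Rayleigh model, equation (43) can in fact be maximized in closed form. Setting $\partial\log l/\partial\sigma = -2n/\sigma + \sum\chi_i^2/\sigma^3 = 0$ gives $\hat{\sigma}^2 = \sum\chi_i^2/(2n)$. The sketch below applies this to a synthetic sample (an illustrative assumption) and compares it with SciPy's fit with the location fixed at zero.

```python
import numpy as np
from scipy import stats

# Synthetic Rayleigh sample (hypothetical, not DRRH records).
y = stats.rayleigh.rvs(scale=120.0, size=730, random_state=np.random.default_rng(5))

# Setting the derivative of (43), -2n/sigma + sum(y**2)/sigma**3, to zero:
sigma_hat = np.sqrt(np.sum(y ** 2) / (2 * y.size))

def loglik(sigma):
    """Equation (43)."""
    return np.sum(np.log(y)) - 2 * y.size * np.log(sigma) - np.sum(y ** 2) / (2 * sigma ** 2)

# SciPy's fit with the location fixed at zero, for comparison.
_, sigma_scipy = stats.rayleigh.fit(y, floc=0)
print(sigma_hat, sigma_scipy)
```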

(iv) Weibull distribution.

The Weibull distribution is a continuous statistical distribution commonly used to model the lifetimes of components. The pdf of the two-parameter Weibull distribution is

$$f(y) = \frac{\beta}{\delta^{\beta}}\,y^{\beta-1}e^{-(y/\delta)^{\beta}}, \qquad y \ge 0,\ \beta > 0,\ \delta > 0 \tag{44}$$

where $\beta$ and $\delta$ are the parameters.

The likelihood function and log-likelihood function for this pdf are:

$$l(\beta,\delta) = \prod_{i=1}^{n}\frac{\beta}{\delta^{\beta}}\,y_i^{\beta-1}e^{-(y_i/\delta)^{\beta}} \tag{45}$$

$$\log l(\beta,\delta) = n\log\beta - n\beta\log\delta + (\beta-1)\sum_{i=1}^{n}\log y_i - \sum_{i=1}^{n}\left(\frac{y_i}{\delta}\right)^{\beta} \tag{46}$$

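Differentiating the log-likelihood function with respect to $\beta$ and $\delta$ and equating to zero gives:

$$\frac{\partial\log l}{\partial\beta} = \frac{n}{\beta} - n\log\delta + \sum_{i=1}^{n}\log y_i - \sum_{i=1}^{n}\left(\frac{y_i}{\delta}\right)^{\beta}\log\frac{y_i}{\delta} = 0 \tag{47}$$

$$\frac{\partial\log l}{\partial\delta} = -\frac{n\beta}{\delta} + \frac{\beta}{\delta}\sum_{i=1}^{n}\left(\frac{y_i}{\delta}\right)^{\beta} = 0 \tag{48}$$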

To obtain the MLEs of $\beta$ and $\delta$, these equations can be solved with the Newton-Raphson method, but EasyFit 5.5 and SAS 9.4 give the MLEs directly for a given data set.
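The Newton-Raphson step mentioned above can be sketched as follows, assuming the scale is first eliminated with equation (48), which gives $\delta^{\beta} = \frac{1}{n}\sum y_i^{\beta}$. Substituting this into (47) leaves one equation, $g(\beta) = 1/\beta + \overline{\log y} - \sum y_i^{\beta}\log y_i / \sum y_i^{\beta} = 0$, in the shape alone. This is a standard reduction, not necessarily the route EasyFit or SAS take internally, and the data are synthetic.

```python
import numpy as np
from scipy import stats

def weibull_mle(y, tol=1e-10, max_iter=100):
    """Newton-Raphson for the shape beta, after eliminating delta via (48)."""
    u = y / y.max()                  # rescaling y leaves the root of g unchanged
    logu = np.log(u)
    beta = np.pi / (np.sqrt(6) * np.std(np.log(y)))   # moment-based starting value
    for _ in range(max_iter):
        w = u ** beta
        A, B, C = np.sum(w * logu), np.sum(w), np.sum(w * logu ** 2)
        g = 1 / beta + logu.mean() - A / B
        # g'(beta) < 0 by the Cauchy-Schwarz inequality, so the root is unique.
        dg = -1 / beta ** 2 - (C * B - A ** 2) / B ** 2
        new_beta = beta - g / dg
        if new_beta <= 0:            # guard: stay inside the parameter space
            new_beta = beta / 2
        if abs(new_beta - beta) < tol * beta:
            beta = new_beta
            break
        beta = new_beta
    delta = y.max() * np.mean(u ** beta) ** (1 / beta)   # equation (48)
    return beta, delta

# Synthetic sample (hypothetical, not DRRH records).
y = stats.weibull_min.rvs(3.5, scale=320.0, size=730, random_state=np.random.default_rng(21))
beta_hat, delta_hat = weibull_mle(y)
print(beta_hat, delta_hat)
```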

(v) Cauchy distribution.

The pdf of the two-parameter Cauchy distribution is:

$$f(\chi) = \left(\pi\sigma\left(1+\left(\frac{\chi-\mu}{\sigma}\right)^{2}\right)\right)^{-1} \tag{49}$$

The likelihood function is:

$$l(\mu,\sigma) = \prod_{i=1}^{n}\left(\pi\sigma\left(1+\left(\frac{\chi_i-\mu}{\sigma}\right)^{2}\right)\right)^{-1} \tag{50}$$

The log-likelihood function of the Cauchy distribution becomes:

$$\log l(\mu,\sigma) = n\log\sigma - n\log\pi - \sum_{i=1}^{n}\log\left(\sigma^{2}+(\chi_i-\mu)^{2}\right) \tag{51}$$

The MLEs of the parameters are obtained by solving the following equations:

$$\frac{\partial\log l}{\partial\mu} = \sum_{i=1}^{n}\frac{2(\chi_i-\mu)}{\sigma^{2}+(\chi_i-\mu)^{2}} = 0 \tag{52}$$

$$\frac{\partial\log l}{\partial\sigma} = \frac{n}{\sigma} - \sum_{i=1}^{n}\frac{2\sigma}{\sigma^{2}+(\chi_i-\mu)^{2}} = 0 \tag{53}$$

EasyFit 5.5 Professional software gives the values of estimates directly.
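Equations (52) and (53) have no closed-form solution. One simple way to solve them, shown here as an alternative to EasyFit rather than its method, is the EM algorithm obtained by viewing the Cauchy law as a Student t distribution with one degree of freedom: each iteration reweights the observations, and any fixed point satisfies (52) and (53). The Cauchy likelihood in $\mu$ can have several local maxima, so the sketch starts from the sample median and half the interquartile range (which estimates $\sigma$). The data are synthetic.

```python
import numpy as np
from scipy import stats

def cauchy_scores(x, mu, sigma):
    """Left-hand sides of the likelihood equations (52) and (53)."""
    d = sigma ** 2 + (x - mu) ** 2
    return np.sum(2 * (x - mu) / d), x.size / sigma - np.sum(2 * sigma / d)

def cauchy_mle(x, tol=1e-10, max_iter=10_000):
    """EM iterations for the Cauchy law viewed as a Student t with 1 d.f.

    Any fixed point of the updates satisfies equations (52) and (53).
    """
    mu = np.median(x)
    q75, q25 = np.percentile(x, [75, 25])
    sigma2 = (0.5 * (q75 - q25)) ** 2              # half the IQR estimates sigma
    for _ in range(max_iter):
        w = 2 * sigma2 / (sigma2 + (x - mu) ** 2)     # E-step weights
        mu_new = np.sum(w * x) / np.sum(w)             # M-step: location
        sigma2_new = np.mean(w * (x - mu_new) ** 2)    # M-step: squared scale
        done = (abs(mu_new - mu) < tol * (1 + abs(mu))
                and abs(sigma2_new - sigma2) < tol * sigma2)
        mu, sigma2 = mu_new, sigma2_new
        if done:
            break
    return mu, np.sqrt(sigma2)

x = stats.cauchy.rvs(loc=300.0, scale=20.0, size=730, random_state=np.random.default_rng(9))
mu_hat, sigma_hat = cauchy_mle(x)
print(mu_hat, sigma_hat, cauchy_scores(x, mu_hat, sigma_hat))
```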

(vi) Beta distribution.

The pdf of the Beta distribution with two parameters is:

36      ab1211    f    12 ii…………………… (54) 121 12   ba 

The likelihood and log-likelihood functions of the Beta distribution become:

$$l(\alpha_1,\alpha_2)=\prod_{i=1}^{n}\frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}\cdot\frac{(y_i-a)^{\alpha_1-1}(b-y_i)^{\alpha_2-1}}{(b-a)^{\alpha_1+\alpha_2-1}} \qquad (55)$$

$$\log l(\alpha_1,\alpha_2)=n\log\Gamma(\alpha_1+\alpha_2)-n\log\Gamma(\alpha_1)-n\log\Gamma(\alpha_2)+(\alpha_1-1)\sum_{i=1}^{n}\log(y_i-a)+(\alpha_2-1)\sum_{i=1}^{n}\log(b-y_i)-n(\alpha_1+\alpha_2-1)\log(b-a) \qquad (56)$$

The MLEs of α1 and α2 are obtained by differentiating the log-likelihood, setting the partial derivatives equal to zero and solving the resulting system of likelihood equations:

$$\frac{\partial\log l}{\partial\alpha_1}=\frac{n\,\Gamma'(\alpha_1+\alpha_2)}{\Gamma(\alpha_1+\alpha_2)}-\frac{n\,\Gamma'(\alpha_1)}{\Gamma(\alpha_1)}+\sum_{i=1}^{n}\log(y_i-a)-n\log(b-a)=0 \qquad (57)$$

$$\frac{\partial\log l}{\partial\alpha_2}=\frac{n\,\Gamma'(\alpha_1+\alpha_2)}{\Gamma(\alpha_1+\alpha_2)}-\frac{n\,\Gamma'(\alpha_2)}{\Gamma(\alpha_2)}+\sum_{i=1}^{n}\log(b-y_i)-n\log(b-a)=0 \qquad (58)$$

In practice, the parameter estimates were readily obtained with EasyFit 5.5.

(vii) Rayleigh (2P) distribution.

Consider the pdf of the Rayleigh (2P) distribution with scale parameter σ and location parameter γ as defined in equation (24); the likelihood function is then:

$$l(\sigma,\gamma)=\prod_{i=1}^{n}\frac{y_i-\gamma}{\sigma^{2}}\exp\left[-\frac{1}{2}\left(\frac{y_i-\gamma}{\sigma}\right)^{2}\right] \qquad (59)$$

The log-likelihood function becomes:

$$\log l(\sigma,\gamma)=-2n\log\sigma+\sum_{i=1}^{n}\log(y_i-\gamma)-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_i-\gamma)^{2} \qquad (60)$$

The MLEs of σ and γ are found by differentiating the log-likelihood with respect to each parameter, setting the partial derivatives equal to zero and solving the resulting system of likelihood equations:

$$\frac{\partial\log l(\sigma,\gamma)}{\partial\sigma}=-\frac{2n}{\sigma}+\frac{1}{\sigma^{3}}\sum_{i=1}^{n}(y_i-\gamma)^{2}=0 \qquad (61)$$

$$\frac{\partial\log l(\sigma,\gamma)}{\partial\gamma}=-\sum_{i=1}^{n}\frac{1}{y_i-\gamma}+\frac{1}{\sigma^{2}}\sum_{i=1}^{n}(y_i-\gamma)=0 \qquad (62)$$

In practice, the MLEs of the parameters were computed with the EasyFit 5.5 software.

(viii) Nakagami distribution.

Consider the pdf of the Nakagami distribution with shape parameter m ≥ 1/2 and spread parameter Ω > 0:

$$f(y)=\frac{2m^{m}}{\Gamma(m)\,\Omega^{m}}\,y^{2m-1}\exp\left(-\frac{m}{\Omega}y^{2}\right),\qquad y\ge 0 \qquad (63)$$

The likelihood function of the Nakagami distribution is:

$$l(m,\Omega)=\prod_{i=1}^{n}\frac{2m^{m}}{\Gamma(m)\,\Omega^{m}}\,y_i^{2m-1}\exp\left(-\frac{m}{\Omega}y_i^{2}\right) \qquad (64)$$

The log-likelihood function is:

$$\log l(m,\Omega)=n\log 2+nm\log m-n\log\Gamma(m)-nm\log\Omega+(2m-1)\sum_{i=1}^{n}\log y_i-\frac{m}{\Omega}\sum_{i=1}^{n}y_i^{2} \qquad (65)$$

The MLEs of m and Ω of the Nakagami distribution were obtained directly with EasyFit 5.5 Professional statistical software.
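Setting the derivative of (65) with respect to Ω to zero gives the closed form Ω̂ = (1/n)Σ y_i². Substituting it back leaves a function of m alone, which is concave in m (its second derivative is n[1/m − ψ'(m)] < 0, where ψ' is the trigamma function), so a simple one-dimensional search is enough. The Python sketch below uses golden-section search over m in [0.5, 1000]; the search interval, tolerance and function names are illustrative assumptions, not EasyFit's procedure.

```python
import math

def nakagami_loglik(m, omega, data):
    """Log-likelihood (65) of the Nakagami distribution."""
    n = len(data)
    return (n * math.log(2) + n * m * math.log(m) - n * math.lgamma(m)
            - n * m * math.log(omega)
            + (2 * m - 1) * sum(math.log(y) for y in data)
            - (m / omega) * sum(y * y for y in data))

def nakagami_mle(data, m_lo=0.5, m_hi=1000.0, tol=1e-10):
    """Sketch: closed-form Omega, then golden-section search for m."""
    omega = sum(y * y for y in data) / len(data)   # MLE of Omega: mean of y^2

    def profile(m):                                # log-likelihood with Omega fixed
        return nakagami_loglik(m, omega, data)

    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = m_lo, m_hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:                                # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = profile(c)
        else:                                      # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = profile(d)
    return (a + b) / 2.0, omega

# Illustrative values only
sample = [0.5, 1.0, 1.5, 2.0]
m_hat, omega_hat = nakagami_mle(sample)
```

For this sample Ω̂ = (0.25 + 1 + 2.25 + 4)/4 = 1.875.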

(ix) Generalized Gamma (4P) distribution.

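The pdf of the Generalized Gamma (4P) distribution with shape parameters k > 0 and α > 0, scale parameter β > 0 and location parameter γ is:

$$f(y)=\frac{k\,(y-\gamma)^{k\alpha-1}}{\beta^{k\alpha}\,\Gamma(\alpha)}\exp\left[-\left(\frac{y-\gamma}{\beta}\right)^{k}\right],\qquad y>\gamma \qquad (66)$$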

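The likelihood and log-likelihood functions are:

$$l(k,\alpha,\beta,\gamma)=\prod_{i=1}^{n}\frac{k\,(y_i-\gamma)^{k\alpha-1}}{\beta^{k\alpha}\,\Gamma(\alpha)}\exp\left[-\left(\frac{y_i-\gamma}{\beta}\right)^{k}\right] \qquad (67)$$

$$\log l(k,\alpha,\beta,\gamma)=n\log k+(k\alpha-1)\sum_{i=1}^{n}\log(y_i-\gamma)-nk\alpha\log\beta-n\log\Gamma(\alpha)-\sum_{i=1}^{n}\left(\frac{y_i-\gamma}{\beta}\right)^{k} \qquad (68)$$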

The MLEs of the parameters k, α, β and γ of the Generalized Gamma (4P) distribution were obtained directly with EasyFit 5.5 software.

(x) Log Logistic (3P) distribution.

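The pdf of the Log-Logistic (3P) distribution with three parameters, shape α > 0, scale β > 0 and location γ, is:

$$f(y)=\frac{\alpha}{\beta}\left(\frac{y-\gamma}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{y-\gamma}{\beta}\right)^{\alpha}\right]^{-2},\qquad y>\gamma \qquad (69)$$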

The likelihood function of the Log-Logistic (3P) distribution is:

$$l(\alpha,\beta,\gamma)=\prod_{i=1}^{n}\frac{\alpha}{\beta}\left(\frac{y_i-\gamma}{\beta}\right)^{\alpha-1}\left[1+\left(\frac{y_i-\gamma}{\beta}\right)^{\alpha}\right]^{-2} \qquad (70)$$

Taking logarithms, the log-likelihood function is:

$$\log l(\alpha,\beta,\gamma)=n\log\alpha-n\log\beta+(\alpha-1)\sum_{i=1}^{n}\log\left(\frac{y_i-\gamma}{\beta}\right)-2\sum_{i=1}^{n}\log\left[1+\left(\frac{y_i-\gamma}{\beta}\right)^{\alpha}\right] \qquad (71)$$

To get the MLEs of the parameters α, β and γ of the Log-Logistic (3P) distribution, the log-likelihood function is differentiated with respect to each parameter, the derivatives are equated to zero and the system of likelihood equations is solved. EasyFit 5.5 software gives the estimates directly.

(xi) Johnson SB distribution.

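The pdf of the Johnson SB distribution with four parameters, shape parameters γ and δ > 0, scale λ > 0 and location ξ, is:

$$f(y)=\frac{\delta}{\lambda\sqrt{2\pi}\,z(1-z)}\exp\left\{-\frac{1}{2}\left[\gamma+\delta\ln\left(\frac{z}{1-z}\right)\right]^{2}\right\},\qquad z=\frac{y-\xi}{\lambda},\ \xi<y<\xi+\lambda \qquad (72)$$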

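The likelihood and log-likelihood functions are:

$$l(\gamma,\delta,\lambda,\xi)=\prod_{i=1}^{n}\frac{\delta}{\lambda\sqrt{2\pi}\,z_i(1-z_i)}\exp\left\{-\frac{1}{2}\left[\gamma+\delta\ln\left(\frac{z_i}{1-z_i}\right)\right]^{2}\right\},\qquad z_i=\frac{y_i-\xi}{\lambda} \qquad (73)$$

$$\log l(\gamma,\delta,\lambda,\xi)=n\log\delta-n\log\lambda-\frac{n}{2}\log(2\pi)-\sum_{i=1}^{n}\log\left[z_i(1-z_i)\right]-\frac{1}{2}\sum_{i=1}^{n}\left[\gamma+\delta\ln\left(\frac{z_i}{1-z_i}\right)\right]^{2} \qquad (74)$$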

The MLEs of γ, δ, λ and ξ of the Johnson SB distribution were obtained directly from EasyFit 5.5 Professional software.

(xii) Generalized Extreme Value Distribution.

The pdf of the Generalized Extreme Value distribution with three parameters, shape k, scale σ > 0 and location μ, is:

$$f(y)=\frac{1}{\sigma}\exp\left[-(1+kz)^{-1/k}\right](1+kz)^{-1-1/k},\qquad 1+kz>0,\ k\neq 0 \qquad (75)$$

where z = (y − μ)/σ.

The corresponding likelihood function is:

$$l(k,\sigma,\mu)=\prod_{i=1}^{n}\frac{1}{\sigma}\exp\left[-(1+kz_i)^{-1/k}\right](1+kz_i)^{-1-1/k},\qquad z_i=\frac{y_i-\mu}{\sigma} \qquad (76)$$

And the log-likelihood function is:

$$\log l(k,\sigma,\mu)=-n\log\sigma-\sum_{i=1}^{n}(1+kz_i)^{-1/k}-\left(1+\frac{1}{k}\right)\sum_{i=1}^{n}\log(1+kz_i) \qquad (77)$$

The MLEs of the parameters k, σ and μ can be obtained directly with the help of EasyFit 5.5 software.

(xiii) Dagum Distribution.

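Consider the pdf of the Dagum distribution with shape parameters k > 0 and α > 0 and scale parameter β > 0:

$$f(y)=\frac{\alpha k\left(\frac{y}{\beta}\right)^{\alpha k-1}}{\beta\left[1+\left(\frac{y}{\beta}\right)^{\alpha}\right]^{k+1}},\qquad y>0 \qquad (78)$$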

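The likelihood and log-likelihood functions are:

$$l(k,\alpha,\beta)=\prod_{i=1}^{n}\frac{\alpha k\left(\frac{y_i}{\beta}\right)^{\alpha k-1}}{\beta\left[1+\left(\frac{y_i}{\beta}\right)^{\alpha}\right]^{k+1}} \qquad (79)$$

$$\log l(k,\alpha,\beta)=n\log(\alpha k)-n\log\beta+(\alpha k-1)\sum_{i=1}^{n}\log\left(\frac{y_i}{\beta}\right)-(k+1)\sum_{i=1}^{n}\log\left[1+\left(\frac{y_i}{\beta}\right)^{\alpha}\right] \qquad (80)$$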

The MLEs of the parameters k, α and β can be obtained directly with the help of EasyFit 5.5 Professional statistical software.

3.7 The criteria for choosing the best statistical distribution for the data.

Of the statistical distributions suggested for the daily total inpatients and outpatients, only one will fit the patients' data best. Since the parameters were obtained by the method of maximum likelihood, the criterion for selecting the appropriate distribution, out of five for the total inpatients data and nine for the total outpatients data, was also based on the value of the maximized likelihood function, from which AIC and BIC are calculated. The smaller the AIC, the better the distribution fits the given set of data; among models with the same number of parameters this is equivalent to choosing the larger likelihood (Boadi et al., 2015). The aim is therefore to find the smallest AIC, or largest likelihood, to identify the best fit.

3.7.1 The Goodness of Fit test.

Goodness of fit describes how well a statistical distribution fits a set of observations. Evaluating goodness of fit is very important in selecting the best distribution, since it measures the compatibility of a random sample with a theoretical probability distribution. Therefore, various tests were applied to assess the goodness of fit of the thirteen selected distributions. Goodness of fit tests compare either distribution functions or probability functions in order to test the null hypothesis that the unknown distribution of the data is a specified one. The methods used to assess the appropriateness of the fitted distributions are Q-Q plots, P-P plots, the Kolmogorov-Smirnov test, its modification the Anderson-Darling test, and the chi-square test. These criteria were used to assess whether the patients' data follow the chosen distributions, namely the Beta, Burr, Cauchy, Dagum, Gamma (3P), Generalized Extreme Value, Generalized Gamma (4P), Johnson SB, Log-Logistic (3P), Nakagami, Rayleigh, Rayleigh (2P) and Weibull distributions.

The Anderson-Darling test is used to test whether sample data come from a population with a specified distribution. It is a modification of the Kolmogorov-Smirnov test that gives more weight to the tails than the KS test does. The Kolmogorov-Smirnov test compares a hypothesized or fitted cumulative distribution function with the empirical cumulative distribution function in order to assess the goodness of fit of a data set to a theoretical distribution. The Kolmogorov-Smirnov (KS) and Anderson-Darling test statistics were obtained using EasyFit 5.5 Professional software.
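For a fully specified CDF F and an ordered sample x(1) ≤ … ≤ x(n), the KS statistic is D = max over i of max(i/n − F(x(i)), F(x(i)) − (i−1)/n), and the Anderson-Darling statistic is A² = −n − (1/n)Σ(2i−1)[ln F(x(i)) + ln(1 − F(x(n+1−i)))]. The Python sketch below implements these textbook formulas; it computes only the statistics (not p-values or critical values) and is not EasyFit's implementation. The function names are hypothetical.

```python
import math

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and cdf."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)   # gaps just after and just before x_(i)
    return d

def ad_statistic(data, cdf):
    """Anderson-Darling A^2 = -n - (1/n) sum (2i-1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    x = sorted(data)
    n = len(x)
    f = [cdf(xi) for xi in x]
    s = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

# Illustrative check against the uniform CDF on [0, 1]
uniform_cdf = lambda t: t
d_stat = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
a2_stat = ad_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

For this small sample D = 0.3 (the gap at the last point, 1 − 0.7) and A² ≈ 0.366; the logarithmic weights in A² are what make it more sensitive than D to departures in the tails.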

3.7.2 Chi-square goodness of fit test.

This is another goodness of fit test used in this study. The chi-square test checks whether the observed frequencies agree with the frequencies expected under the proposed distribution. The chi-square statistic measures how well the expected frequencies of the fitted distribution match the observed frequencies of the histogram of the data. It is calculated as:

$$\chi^{2}_{cal}=\sum_{i=1}^{k}\frac{(\theta_i-E_i)^{2}}{E_i} \qquad (81)$$

where θi is the observed frequency in the ith cell, Ei is the expected frequency in the ith cell, and i = 1, 2, …, k indexes the k cells. The chi-square goodness of fit test statistic was obtained using EasyFit 5.5 Professional software.
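A minimal Python sketch of the chi-square statistic, with a helper that turns a fitted CDF into expected cell counts E_i = n[F(b_i) − F(b_(i−1))] for bin edges b_0 < b_1 < … < b_k. The bin edges and function names are illustrative assumptions; EasyFit chooses its own binning.

```python
def expected_counts(edges, cdf, n):
    """E_i = n * [F(b_i) - F(b_{i-1})] for consecutive bin edges."""
    return [n * (cdf(b) - cdf(a)) for a, b in zip(edges[:-1], edges[1:])]

def chi_square_statistic(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative counts only
stat = chi_square_statistic([10, 20, 30], [15, 20, 25])
```

Here the statistic is 25/15 + 0 + 25/25 ≈ 2.667. For the expected counts to add up to n, the outermost edges should cover the whole support of the fitted distribution.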

3.7.3 The Akaike’s Information Criterion.

Akaike's Information Criterion (AIC) is another goodness of fit statistic used in this study. The AIC is a means of selecting a model from a set of models. It is an estimate of the Kullback-Leibler information, or distance, and attempts to select a good approximating model based on the principle of parsimony (Anderson and Burnham, 2004). The AIC was derived from the concept that truth is very complex and that no "true models" exist (Anderson and Burnham, 2004). The AIC is defined as AIC = −2(maximized log-likelihood) + 2(number of estimated parameters). The fitted distribution with the smallest AIC is chosen as the most appropriate distribution for the patients' data set.

3.7.4 The Bayesian Information Criterion.

The Bayesian Information Criterion (BIC), proposed by Schwarz (1978), is another model selection criterion used to identify the best-fitting distribution for a given set of data. The BIC is calculated as:

$$\text{BIC}=-2\log L+p\log n \qquad (82)$$

where n is the number of observations, p is the number of parameters in the distribution model and L is the maximized value of the likelihood function.

In the present study, the AIC and BIC values of all suggested statistical distributions were computed after the log-likelihood of each distribution had been obtained. The researcher used the estimated parameters of each fitted distribution to compute its log-likelihood, and the AIC and BIC values were then computed manually in Excel.
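The AIC and BIC calculation described above can be sketched in Python as follows, assuming natural logarithms and that the number of estimated parameters is known for each distribution. The model names and log-likelihood values are hypothetical placeholders, not results of this study.

```python
import math

def aic(loglik, num_params):
    """AIC = -2 log L + 2 * (number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * num_params

def bic(loglik, num_params, n):
    """BIC = -2 log L + (number of parameters) * ln(n), n = sample size."""
    return -2.0 * loglik + num_params * math.log(n)

def rank_by(fits, n, criterion="aic"):
    """fits: name -> (maximized log-likelihood, number of parameters).
    Returns the names ordered from best (smallest criterion) to worst."""
    def score(name):
        ll, k = fits[name]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n)
    return sorted(fits, key=score)

# Hypothetical values, NOT results of this study
fits = {"model_A": (-100.0, 2), "model_B": (-98.5, 4)}
print(rank_by(fits, n=730))   # ['model_A', 'model_B']
```

In this example model_B has the larger likelihood, but its two extra parameters cost more than the gain (AIC 205 against 204), so model_A is preferred; BIC penalizes the extra parameters even more heavily when n = 730.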

3.8 Time Series Model.

A time series model establishes a relationship between the present values of a time series and its past values, so that forecasts can be made on the basis of the past values alone. Such a model rests on theoretical foundations and has an explicit mathematical representation. Time series data can be modelled by several approaches, among which are the Autoregressive (AR) model, the Moving Average (MA) model, the Autoregressive Moving Average (ARMA) model and the Autoregressive Integrated Moving Average (ARIMA) model. In this study, the ARIMA model was used to forecast the total number of outpatient visits to the hospital, based on the average number of patients per week.

3.8.1 Autoregressive Integrated Moving Average (ARIMA) model.

The ARIMA model is a technique introduced by Box and Jenkins (1976) and has since become one of the most popular methods for forecasting univariate time series data. The Autoregressive (AR) model, the Moving Average (MA) model and the ARMA model together form the ARIMA model. The Box-Jenkins procedure builds an ARIMA model in three main stages, model identification, model estimation and model checking, which are used to determine the best ARIMA model for a given time series. The ARIMA model obtained from the differenced series Wt is given by:

$$W_t=\phi_1W_{t-1}+\phi_2W_{t-2}+\dots+\phi_pW_{t-p}+\varepsilon_t-\theta_1\varepsilon_{t-1}-\theta_2\varepsilon_{t-2}-\dots-\theta_q\varepsilon_{t-q} \qquad (83)$$

where φ1, …, φp are the autoregressive coefficients, θ1, …, θq are the moving average coefficients and εt is a white-noise error term. Since the model is used to forecast the observed series Yt, the differenced series Wt is obtained from Yt using the substitutions:

$$W_t=\nabla^{d}Y_t \qquad (84)$$

where, for d = 1,

$$W_t=Y_t-Y_{t-1},\quad W_{t-1}=Y_{t-1}-Y_{t-2},\quad\dots,\quad W_{t-p}=Y_{t-p}-Y_{t-p-1} \qquad (85)$$

3.8.2 Stationary and Non-stationary time series.

A time series is said to be stationary if its statistical properties do not change over time: its mean and variance are the same at all times, and the covariance between two observations depends only on the lag between them. Otherwise, the series is non-stationary. Applying standard procedures to a non-stationary series is unreliable and may produce spurious results, leading to poor understanding and forecasting. To determine whether the series was stationary, the plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were examined along with the Augmented Dickey-Fuller (ADF) test. The spikes at successive lags in the ACF and PACF plots were used to judge whether the series was stationary or non-stationary. To achieve stationarity, the series is differenced. Therefore, before forecasting the total number of outpatient attendances at the hospital for 2017 to 2018, the data were differenced once (lag one) so that the stationarity required by the ARIMA model was ensured.

3.8.3 The Augmented Dickey-Fuller Test (ADF).

The stationarity of a differenced time series is tested using the Augmented Dickey-Fuller technique (Dickey and Fuller, 1981). It is based on an autoregression formulated as the following regression equation:

$$\Delta y_t=\rho\,y_{t-1}+\sum_{k=1}^{5}w_k\,\Delta y_{t-k}+\varepsilon_t \qquad (86)$$

The ADF test statistic, the t-ratio of the estimate of ρ, is compared with critical values to draw conclusions about stationarity: the null hypothesis of a unit root (ρ = 0, non-stationarity) is rejected when the statistic is more negative than the critical value.

3.8.4 Differencing and Lag.

Differencing is used in time series analysis to remove trends from the observed data and make a non-stationary series stationary. Subtracting each observation from the one that follows it, up to the last observation, gives the set of first differences, whose stationarity is then checked. Modelling continues only if this set is stationary; otherwise the first differences are differenced again to obtain the second differences, and the procedure is repeated until the differenced series is stationary. The number of times a series must be differenced to become stationary is its order of integration. Lag is another concept in time series analysis, indicating the time period between two observations. For example, lag 1 is between Yt and Yt−1 and lag 2 is between Yt and Yt−2, where Yt is the observed value at the current time and Yt−1 the observed value at the previous time. A time series can also be lagged forward, pairing Yt with Yt+1.

3.8.5 Autocorrelation Function (ACF).

The ACF is the correlation of a time series with its own past and future values. In time series analysis, the relationships between observations in different time periods play a very important role, and these relationships across time are captured by the autocovariance and autocorrelation of the series. The sample autocorrelation ρk of a time series at lag k is given by:

$$\rho_k=\frac{\sum_{i=1}^{N-k}(Y_i-\bar{Y})(Y_{i+k}-\bar{Y})}{\sum_{i=1}^{N}(Y_i-\bar{Y})^{2}} \qquad (87)$$

where N is the number of observations, Ȳ is the mean of the observations and ρk is the autocorrelation coefficient at lag k.

3.8.6 Partial Autocorrelation Function (PACF).

The partial autocorrelation function φk is defined as the partial correlation between Yt and Yt−k while holding the intermediate values Yt−1, …, Yt−k+1 constant. The PACF is informative mainly for lags of two or more. For example, computing φ2 for Yt and Yt−2 while holding Yt−1 constant removes the part of their correlation that is transmitted through Yt−1. At lag one there are no intermediate values to hold constant, so the PACF at lag one is the same as the autocorrelation at lag one, φ1 = ρ1. The partial autocorrelation plot is also commonly used for model identification in Box-Jenkins models; the horizontal axis shows the lag k and the vertical axis the partial autocorrelation coefficient at lag k.
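Given sample autocorrelations ρ0 = 1, ρ1, ρ2, …, the partial autocorrelations can be computed with the Durbin-Levinson recursion, a standard method sketched in Python below (the source does not state which algorithm the software used). The first value it returns is φ1 = ρ1, in line with the lag-one case described above; the function name is illustrative.

```python
def pacf_from_acf(rho, max_lag):
    """Durbin-Levinson recursion.

    rho[k] is the lag-k autocorrelation with rho[0] = 1; returns
    [phi_11, phi_22, ..., phi_KK] for K = max_lag.
    """
    result = []
    phi_prev = []                      # coefficients phi_{k-1,1..k-1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # update the order-k coefficients phi_{k,1..k-1}
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi
    return result

# Theoretical ACF of an AR(1) process with coefficient 0.5
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125], 3))   # [0.5, 0.0, 0.0]
```

The PACF cutting off to zero after lag one while the ACF decays geometrically is the pattern that points to an AR(1) model during identification.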

3.8.7 Mean Squared Error (MSE).

The MSE is used in time series analysis to measure the quality of an estimator or of fitted values: it measures how close the data points are to a fitted line. For observed time series data (Y1, Y2, …, YN) and a vector of N predictions (Ŷ1, Ŷ2, …, ŶN), the mean squared error is:

$$\text{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left(Y_i-\hat{Y}_i\right)^{2} \qquad (88)$$

The smaller the MSE, the closer the fit to the given data.

3.8.8 The Mean Absolute Percentage Error (MAPE).

The MAPE is another measure of the accuracy of a fitted line or forecast for time series values. It expresses the size of the error as a percentage:

$$\text{MAPE}=\frac{1}{n}\sum_{t=1}^{n}\left|\frac{\text{Actual}_t-\text{Forecast}_t}{\text{Actual}_t}\right|\times 100 \qquad (89)$$

When the fit to the time series data is perfect, the MAPE is zero.
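The MSE and MAPE formulas translate directly into Python; the sketch below is illustrative, with hypothetical function names. MAPE is undefined when any actual value is zero, which is not a concern for daily patient counts but matters for other series.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    n = len(actual)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    n = len(actual)
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n

# Illustrative values only
err_sq = mse([1, 2, 3], [1, 2, 5])          # (0 + 0 + 4) / 3
err_pct = mape([100, 200], [110, 180])      # both forecasts off by 10%
```

MSE is in squared units of the data and weights large errors heavily, whereas MAPE is unit-free, which makes it easier to compare forecasts of series on different scales.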

3.8.9 Time series Diagnostic Checking.

Before a time series model is used for forecasting, it must be checked for adequacy, and the Ljung-Box test can be applied to the residuals for this purpose. The model is considered adequate if the residuals left after fitting it behave as white noise. The patterns of the ACF and PACF plots of the residuals are also used to detect misspecification, which may lead to identifying a different and better model.
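The Ljung-Box statistic is Q = n(n+2)Σ ρ̂k²/(n−k), summed over k = 1, …, h, where ρ̂k are the residual autocorrelations up to lag h; Q is compared with a chi-square distribution whose degrees of freedom are h minus the number of estimated ARMA parameters. A minimal Python sketch that computes Q only (no p-value); the function name is illustrative.

```python
def ljung_box(residuals, h):
    """Ljung-Box Q statistic for lags 1..h of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(v * v for v in dev)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Strongly alternating residuals: clearly not white noise
q_stat = ljung_box([1, -1, 1, -1, 1, -1], 1)
```

For this alternating series ρ̂1 = −5/6 and Q = 6·8·(25/36)/5 = 20/3, a large value for a single lag; residuals that are close to white noise give small Q.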

3.9 Forecasting.

Forecasting is an approach to predicting or estimating, quantitatively and qualitatively, what will occur in the future based on relevant data from the past (Hasibuan, 2011). Forecasting plays an important role in decision-making: it is a planning tool that helps decision-makers anticipate future uncertainty based on the behaviour of past and current observations. Box and Jenkins (1976) described forecasting as providing the basis for economic and business planning, inventory and production control, and optimization of industrial processes. Previous studies have found that the model selected as the best fit is not necessarily the model that provides the best forecasts.

3.10 Data Analysis.

This section of the analysis discusses the distributions satisfied by the daily patient data from the Dodoma regional referral hospital from January 2017 to December 2018. It also applies empirical data analysis to help identify the set of distributions that the data might follow. Probability (P-P) plots and Q-Q plots were used as graphical diagnostics of the goodness of fit of the Beta, Burr, Weibull, Gamma (3P), Generalized Gamma (4P), Generalized Extreme Value, Rayleigh, Dagum, Nakagami, Johnson SB, Log-Logistic (3P), Cauchy and Rayleigh (2P) distributions. Goodness of fit tests were used to test the fit of these distributions. The hospital patients' data were analyzed using Excel, SPSS, EasyFit 5.5 Professional and SAS 9.4 software.

3.11 Data Processing.

After the secondary data had been collected from the hospital, data entry, editing and cleaning were carried out in Excel and SPSS before analysis. The patients' data collected from the DRRH were cleaned and recoded, and all columns were made numerical in Excel; the data were then exported to SAS 9.4 and EasyFit 5.5 Professional software for further analysis. The graphics of these packages were used to assess adequacy of fit, stationarity, tail behaviour, skewness, kurtosis, etc. EasyFit 5.5 Professional software was the main tool for parameter estimation, goodness of fit testing and fitting probability distributions. SPSS was also used to obtain the summary statistics of the hospital patients' data.

3.12 Reliability and Validity.

Reliability and validity are the two most essential concepts in assessing any measurement instrument or tool for sound research (Mohajan, 2017). Validity concerns what an instrument measures and how well it does so, whereas reliability concerns the confidence one can have in the data obtained from an instrument, that is, the degree to which a measuring instrument controls for random error (Mohajan, 2017). For secondary data, a detailed assessment of reliability and validity involves an appraisal of the methods used to collect the data (Saunders et al., 2009).

3.12.1 Reliability.

Blumberg et al. (2005) defined reliability as the property of a measurement that supplies consistent results with equal values. It measures the consistency, precision, repeatability and trustworthiness of research (Chakrabarthy, 2013). In quantitative research, reliability refers to the consistency, stability and repeatability of results, i.e. a researcher's results are considered reliable if consistent results are obtained in identical situations but different circumstances (Mohajan, 2017).

In this study, reliability was achieved by using authentic secondary daily hospital data recorded and kept in the electronic health management system database of Dodoma regional referral hospital. The data were retrieved and filtered for the study period (2017-2018) from the bulk data covering many years in the hospital's health management database.

3.12.2 Validity.

Blumberg et al. (2005) defined validity as the extent to which an instrument measures what it claims to measure. The validity of a research instrument is the extent to which the instrument measures what it is designed to measure (Robson, 2011). In quantitative research, validity is the extent to which any measuring instrument measures what it is intended to measure (Thatcher, 2010). In this study, to ensure validity, EasyFit 5.5 Professional software was used to fit all sixty-one of its continuous probability distributions to the numerical data. The five top-ranked distributions under each goodness of fit test were then selected, and the pooled set of distributions was used for further analysis.

3.13 Ethical consideration.

With an introduction letter from the Graduate Studies and Continuing Education Office of the University of Dodoma, the researcher sought permission to collect the secondary data on inpatients and outpatients of Dodoma regional referral hospital for 2017-2018. The Medical Officer in charge of Dodoma referral hospital granted permission to carry out data collection from the respective hospital departments.

CHAPTER FOUR

4.0 Findings and Discussions.

This chapter presents the findings and discussion of the study. The hospital patients' data were analyzed using EasyFit 5.5 Professional software, SAS 9.4 software, Excel and SPSS. The chapter is divided into six sections. The analysis starts with descriptive statistics of the social and demographic characteristics of inpatients and outpatients of the Dodoma regional referral hospital. Plausible statistical distributions are then identified with the help of EasyFit software. Parameters are estimated by the maximum likelihood method, and the goodness of fit of the suggested models is assessed with the Anderson-Darling, Kolmogorov-Smirnov and chi-square tests. The best fit among the suggested distributions is determined by evaluating the likelihood function at the sample values and then calculating the Akaike and Bayesian information criteria; the distribution with the minimum AIC and BIC values is chosen as the best fit for the inpatient and outpatient daily data. The properties of these distributions are then described and the probabilities of specific events are determined.

4.1 Summary statistics of demographic characteristics of hospital patients.

The patient information filtered from the hospital management information system includes the sex and age of the patients. Summary statistics on the frequency and percentage by gender and by age group of inpatients and outpatients of Dodoma regional referral hospital are presented in Table 4.1.

Table 4.1: Descriptive Analysis of Hospital Patients in DRRH (2017-2018).

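| Variable | Category | Inpatients No. (n=76058) | Inpatients % | Outpatients No. (n=444296) | Outpatients % | Total No. | Total % |
|---|---|---|---|---|---|---|---|
| Age in years | 0-4 | 5263 | 6.9 | 24944 | 5.6 | 30207 | 5.8 |
| | 5-60 | 28370 | 37.3 | 160039 | 36.0 | 188409 | 36.2 |
| | 61+ | 4396 | 5.8 | 37165 | 8.4 | 41561 | 8.0 |
| Gender | Male | 13054 | 17.2 | 95049 | 21.4 | 108103 | 20.8 |
| | Female | 24975 | 32.8 | 127099 | 28.6 | 152074 | 29.2 |

Source: Field data (2019).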

The results in Table 4.1 show that the majority of patients admitted to DRRH in 2017-2018 were female, 24975 (32.8%), compared with 13054 (17.2%) males. The male-to-female ratio of about 1:2 indicates that a large number of women are admitted to the hospital, so more primary facilities are needed for hygienic and effective treatment. Female care departments such as gynaecology and obstetrics, as well as community disease services, are also in greater demand in Dodoma city hospital. Table 4.1 also shows that most of the patients admitted to DRRH in 2017-2018 were aged 5-60 years, 28370 (37.3%), while old-age patients and children numbered 4396 (5.8%) and 5263 (6.9%) respectively.

The ratio of adults to children to old-age patients of about 6:1:1 shows the importance of treatment for youths, adults and the middle-aged. Compared with international data, the inpatient rate of youths and the middle-aged is very high, indicating the need for emergency and well-equipped medical facilities in Dodoma Hospital.

Table 4.1 further indicates that more female patients, 127099 (28.6%), than male patients, 95049 (21.4%), visited the outpatient department of DRRH in 2017-2018. The female-to-male ratio of about 4:3 implies that more females sought treatment on a daily basis. Most of the patients who visited the outpatient department for treatment at DRRH in 2017-2018 were aged 5-60 years, 160039 (36.0%), while children and old-aged patients numbered 24944 (5.6%) and 37165 (8.4%) respectively. The ratio for ages 0-4 : 5-60 : 61+ of approximately 1:6.4:1.5 affirms the predominance of young and middle-aged patients among outpatients. Because most outpatients belong to the young working class, queuing time, service time and the supply of medication are important for maintaining outpatient health care at Dodoma hospital.

4.2 Descriptive Statistics for daily total inpatients data.

The daily number of inpatients at DRRH over 2017 to 2018 is summarized by the descriptive statistics in Table 4.2 below:

Table 4.2: Descriptive Statistics for Daily Total Inpatients Data

| Statistic | Value |
|---|---|
| Sample size | 730 |
| Minimum | 12 |
| Maximum | 100 |
| Range | 88 |
| Mean | 52.151 |
| Variance | 236.08 |
| Std. Deviation | 15.365 |
| Coef. of Variation | 0.29463 |
| Std. Error | 0.56869 |
| Skewness | 0.35666 |
| Excess of Kurtosis | 0.37362 |

Source: Computed by the researcher using EasyFit 5.5 Software.

The sample size is the total number of days on which inpatient data were recorded, 730 days. The range, the difference between the highest and the lowest number of inpatients admitted in a day (100 − 12 = 88), shows considerable variation in the number of admissions per day. The average number of inpatients is 52.151 with a standard deviation of 15.365, so the coefficient of variation of 29.5% also reflects the day-to-day variability of admissions. The standard error of 0.569 measures the precision of the estimated mean daily number of inpatients. The positive skewness (0.357) indicates a right-skewed distribution, and the positive excess kurtosis (0.374) indicates that the data cluster near the average. Thus the distribution of inpatients is not normal but positively skewed and leptokurtic, so only a family of distributions with these features will be suitable for the data.

4.3 Descriptive Statistics for daily total Outpatients data.

The daily number of outpatients at DRRH from 2017 to 2018 is summarized by the descriptive statistics in Table 4.3 below:

Table 4.3: Descriptive Statistics for Daily Total Outpatients Data

| Statistic | Value |
|---|---|
| Sample size | 730 |
| Minimum | 72 |
| Maximum | 699 |
| Range | 627 |
| Mean | 304.31 |
| Variance | 21336.0 |
| Std. Deviation | 146.07 |
| Coef. of Variation | 0.4799 |
| Std. Error | 5.4062 |
| Skewness | 0.44091 |
| Excess of Kurtosis | -0.4134 |

Source: Computed by the researcher using EasyFit 5.5 Software.

The data comprise two years of daily total outpatients, 730 days. The range (699 − 72 = 627) shows that the number of outpatients fluctuates greatly from day to day, so probability analysis is essential to understand and evaluate the activities of the hospital. The average number of outpatients per day is 304.31, which is large, and the standard deviation of 146.07 is also very high, implying that these basic summary statistics alone cannot describe the regular flow of outpatients reaching DRRH; only a thorough distribution analysis can do so. The coefficient of variation of about 48% supports this conclusion. A positive skewness of 0.4409 and an excess kurtosis of −0.4134 indicate that the outpatient data are not normally distributed: the distribution has a longer tail on the positive side and is slightly flatter than the normal curve. Only probability distribution analysis can identify the nature and characteristics of this distribution.
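The derived entries of Tables 4.2 and 4.3 can be checked from the reported means and standard deviations: the coefficient of variation is s/ȳ and the standard error of the mean is s/√n. The Python sketch below uses the rounded table values, so agreement is expected only to about the last reported digit; the function name is illustrative.

```python
import math

def cv_and_se(mean, sd, n):
    """Coefficient of variation (sd / mean) and standard error of the mean (sd / sqrt(n))."""
    return sd / mean, sd / math.sqrt(n)

cv_in, se_in = cv_and_se(52.151, 15.365, 730)     # Table 4.2 (inpatients)
cv_out, se_out = cv_and_se(304.31, 146.07, 730)   # Table 4.3 (outpatients)
print(round(cv_in, 4), round(se_in, 4), round(cv_out, 4), round(se_out, 4))
```

This gives about 0.2946 and 0.5687 for inpatients and about 0.4800 and 5.406 for outpatients, consistent with the tabulated 0.29463, 0.56869, 0.4799 and 5.4062 once rounding of the inputs is allowed for.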

4.4 Determination of the feasible distributions on daily total inpatients and total outpatients hospital data.

In order to identify candidate distributions, EasyFit 5.5 Professional software was applied to the raw data for inpatients and to frequency data for outpatients (as the raw outpatient data did not give consistent suggestions). The Anderson-Darling, Kolmogorov-Smirnov and chi-square tests were applied to the inpatient data, while only the first two tests were used for the outpatient data. To determine how well the selected distributions fit the daily total number of inpatients and the daily total number of outpatients, they were tested for goodness of fit with these tests, and the test statistics were computed for both data sets in order to obtain the best-fitting distribution. The test statistics for the selected probability distributions are shown below.

Table 4.4: Test statistics for the Gen. Extreme Value distribution of the daily total number of inpatients hospital data.

Gen. Extreme Value Distribution.

Kolmogorov-Smirnov

Sample Size 730 Statistic 0.02746 P-Value 0.63037 Rank 1

Significance level 0.2 0.1 0.05 0.02 0.01

Critical Value 0.03971 0.04527 0.05026 0.05618 0.06029

Reject? No No No No No

Anderson-Darling

Sample Size 730 Statistic 0.79848 Rank 11

Significance level 0.2 0.1 0.05 0.02 0.01

Critical Value 1.3749 1.9286 2.5018 3.2892 3.9074

Reject? No No No No No

Chi-Squared

Deg. of freedom 9 Statistic 1.3019 P-Value 0.99837 Rank 1

Significance level 0.2 0.1 0.05 0.02 0.01

Critical Value 12.242 14.684 16.919 19.679 21.666

Reject? No No No No No

Source: Computed by the researcher using EasyFit 5.5 software.
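The decision rule behind Table 4.4 (reject when the statistic exceeds the critical value) can be sketched as follows. This is a minimal illustration, not EasyFit's implementation: `ks_statistic` is a hypothetical helper, the sample is synthetic (drawn from the fitted GEV, not the hospital data), and the KS critical values come from SciPy's exact finite-sample distribution `scipy.stats.kstwo`, so they may differ slightly from EasyFit's. SciPy parameterises the GEV shape as `c = -k`, so EasyFit's k = -0.20001 becomes `c = 0.20001`.

```python
import numpy as np
from scipy import stats

ALPHAS = [0.2, 0.1, 0.05, 0.02, 0.01]  # significance levels of Table 4.4


def ks_statistic(sample, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    f = cdf(x)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - f)  # empirical CDF above the model
    d_minus = np.max(f - (i - 1) / n)  # model above the empirical CDF
    return float(max(d_plus, d_minus))


# Critical values: exact KS null distribution for n = 730; chi-square with 9 df
ks_critical = [float(stats.kstwo.ppf(1 - a, 730)) for a in ALPHAS]
chi2_critical = [float(stats.chi2.ppf(1 - a, 9)) for a in ALPHAS]

# Synthetic sample from the fitted GEV (NOT the DRRH data); scipy shape c = -k
gev = stats.genextreme(0.20001, loc=46.154, scale=14.518)
sample = gev.rvs(size=730, random_state=1)
d = ks_statistic(sample, gev.cdf)
decisions = ["Yes" if d > c else "No" for c in ks_critical]
print(f"D = {d:.5f}", [round(c, 5) for c in ks_critical], decisions)
print([round(c, 3) for c in chi2_critical])
```

The chi-square critical values computed this way agree with the 9-degree-of-freedom row of Table 4.4.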

Under both the Kolmogorov-Smirnov and chi-square goodness-of-fit tests, this distribution ranks first among the 61 continuous distributions considered. Both tests accept the null hypothesis that the inpatient data follow the Generalized Extreme Value distribution at the 1%, 2%, 5%, 10% and 20% levels of significance. The Anderson-Darling test also accepts the same hypothesis at all these levels, although it ranks the distribution 11th, with ten other distributions fitting better by that criterion. Hence the number of inpatients per day at DRRH is taken to follow the Generalized Extreme Value distribution.

Details of fitting the other distributions to the inpatient data are given in Appendix 1.

The preliminarily selected distributions for the inpatient daily data were:

Beta distribution, Burr distribution, Gamma (3P) distribution, Gen. Gamma (4P) distribution and Gen. Extreme Value distribution.

For the outpatient raw data, fitting the 61 continuous distributions did not identify any distribution uniquely under the three tests.

The data were therefore arranged as frequency data, and the software tested the goodness of fit with the Anderson-Darling and Kolmogorov-Smirnov tests. Distributions ranked within the top five under these tests were considered as plausible basic distributions for the outpatient data, which gave nine distributions for the second stage of the analysis.

The test results for the selected distribution are given below.

Table 4.5: Test statistics for the Dagum distribution fitted to the daily total number of outpatients.

Dagum Distribution.

Kolmogorov-Smirnov

Sample Size 13 Statistic 0.0933 P-Value 0.99917 Rank 1

 0.2 0.1 0.05 0.02 0.01

Critical Value 0.2847 0.32549 0.36143 0.40362 0.43247

Reject? No No No No No

Anderson-Darling

Sample Size 13 Statistic 0.96788 Rank 8

 0.2 0.1 0.05 0.02 0.01

Critical Value 1.3749 1.9286 2.5018 3.2892 3.9074

Reject? No No No No No

Source: Computed by the researcher using EasyFit 5.5 software.

Under the null hypothesis the outpatient frequency data follow the Dagum distribution, and this hypothesis is accepted by both tests at the 1%, 2%, 5%, 10% and 20% levels of significance. The Dagum distribution ranks first under the Kolmogorov-Smirnov test and eighth under the Anderson-Darling test. Generalizing the result, the number of outpatients per day at DRRH follows the Dagum distribution.

Details of the goodness of fit of the other feasible distributions for the outpatient data are given in Appendix 2. The preliminarily selected distributions for the outpatient daily data were: Burr distribution, Cauchy distribution, Dagum distribution, Johnson SB distribution, Nakagami distribution, Log-Logistic (3P) distribution, Rayleigh distribution, Rayleigh (2P) distribution and Weibull distribution.

4.4.1 Assessment of Goodness of Fit by Graphical Methods.

The graphical method is another way of determining whether a given data set follows one of the chosen statistical distributions. Along with the goodness-of-fit tests, the probability-probability (P-P) plot, the quantile-quantile (Q-Q) plot, and the probability density and cumulative distribution functions can be used to assess whether the selected distributions fit the two years of daily hospital inpatient data. The probability density functions of the five distributions selected for fitting the number of hospital inpatients per day are shown in the figure below:

Figure 4.1: Probability density curve of five preliminary selected distributions on inpatients daily data.

[Figure: "Probability Density Functions of five selected Distributions for IPDs"; histogram with the fitted Beta, Burr, Gamma (3P), Gen. Gamma (4P) and Gen. Extreme Value densities; x-axis: number of inpatients in DRRH during year 2017-2018; y-axis: f(x).]

Figure 4.1 compares the density functions of the five distributions selected for fitting the daily number of hospital inpatients. Four distributions, namely the Beta, Gamma (3P), Generalized Gamma (4P) and Generalized Extreme Value distributions, fit the data very closely, whereas the Burr distribution departs slightly from the others. These four distributions all show strong signs of a good fit to the inpatient daily data, are very close to each other and are nearly symmetric in shape, so it is very difficult to determine the best-fitting distribution from this graph. Further calculation using the AIC and BIC was therefore needed to identify the best fit for the inpatient daily data (2017-2018) in DRRH.

Figure 4.2: Cumulative density curve of the selected distributions for hospital inpatients daily data.

[Figure: "Cumulative Distribution Functions of the five selected Distributions"; sample CDF with the fitted Beta, Burr, Gamma (3P), Gen. Gamma (4P) and Gen. Extreme Value CDFs; x-axis: number of inpatients in DRRH for year 2017-2018; y-axis: cumulative probability.]

Figure 4.2 compares the cumulative distribution functions of the five distributions selected for fitting the daily number of inpatients. It indicates that all five fitted distributions, namely the Beta, Burr, Gamma (3P), Generalized Gamma (4P) and Generalized Extreme Value distributions, fit the data and lie very close to each other.

Figure 4.3: Probability density curves of nine selected distributions for a number of hospital daily outpatients.

[Figure: "Probability Density Functions of Nine Selected Distribution for OPDs"; histogram with the fitted Johnson SB, Cauchy, Burr, Dagum, Log-Logistic (3P), Nakagami, Rayleigh, Rayleigh (2P) and Weibull densities; x-axis: number of outpatients in DRRH during year 2017-2018; y-axis: f(x).]

Figure 4.3 shows the density curves of all nine selected distribution models fitted to two years of daily total outpatient data. Four distributions, namely the Cauchy, Nakagami, Johnson SB and Rayleigh distributions, provide a poor fit: their density curves are not well superimposed over the histogram. The remaining five fitted distributions provide a good fit to the data and are very close to each other, so it is very difficult to determine the best-fitting distribution from the graph. At this stage, further computation using the AIC, BIC and log-likelihood was therefore done to identify the best fit for the outpatient daily data in DRRH.

4.4.2 Probability–Probability (P-P) Plot.

The probability-probability plot is a graph of the empirical cumulative distribution function values plotted against the fitted cumulative distribution function values (www.mathwave.com). It is used to determine how well a particular distribution fits the observed data: the P-P plot will be approximately linear if the fitted distribution is the correct distribution model (www.mathwave.com). In the present study, P-P plots were drawn for the Beta, Burr, Gamma (3P), Generalized Gamma (4P) and Generalized Extreme Value distributions.
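The P-P coordinates themselves are straightforward to compute. The sketch below is an illustration under stated assumptions: `pp_points` is a hypothetical helper, the plotting positions (i - 0.5)/n are one common convention and not necessarily the one EasyFit uses, and a synthetic sample drawn from the fitted GEV (SciPy shape `c = -k`) stands in for the hospital data. Plotting `p_mod` against `p_emp`, for example with matplotlib, gives the P-P plot; points close to the 45-degree line indicate a good fit.

```python
import numpy as np
from scipy import stats


def pp_points(sample, dist):
    """Coordinates for a P-P plot: empirical vs fitted cumulative probabilities."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    p_empirical = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (i - 0.5)/n
    p_model = dist.cdf(x)  # fitted CDF at the ordered observations
    return p_empirical, p_model


# Synthetic sample from the fitted GEV (NOT the DRRH data)
gev = stats.genextreme(0.20001, loc=46.154, scale=14.518)
sample = gev.rvs(size=730, random_state=0)
p_emp, p_mod = pp_points(sample, gev)

# Largest vertical distance from the 45-degree reference line
max_departure = float(np.max(np.abs(p_emp - p_mod)))
print(f"max departure from the reference line: {max_departure:.4f}")
```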

Figure 4.4: Probability Plot of all the five selected distributions for fitting a total number of hospital inpatients daily data.

[Figure: "P-P Plots of the five selected Distributions"; series: Beta, Burr, Gamma (3P), Gen. Gamma (4P), Gen. Extreme Value; x-axis: P (Empirical); y-axis: P (Model).]

The result from Figure 4.4 shows that the Burr distribution does not fit the hospital inpatient data as well as the other four distributions, namely the Beta, Gamma (3P), Generalized Gamma (4P) and Generalized Extreme Value distributions. On careful observation, the Burr points lie below the reference line rather than around it. The other four distributions show strong signs of a good fit to the inpatient data, as all their points are very close to the reference line, so it was very difficult to determine the best-fitting distribution from the P-P plot. Further computation using the AIC was therefore done to identify the best fit for the total number of hospital inpatients.

4.4.3 Quantile-Quantile (Q-Q) Plot.

The Q-Q plot is another commonly used graphical method for measuring how closely a data set fits a certain distribution. It was used to find a suitable fit by plotting quantiles and comparing the selected fitted distributions. The Q-Q plot relates the quantiles of the empirical distribution to the quantiles estimated from the fitted distribution (El-Shanshoury, 2017). The basic idea is to compute, for each data point, the theoretical quantile expected under the fitted distribution; if the data follow the selected distribution, the points on the Q-Q plot fall approximately on a straight line (El-Shanshoury, 2017).
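The idea of pairing each ordered observation with its theoretical quantile can be sketched as follows. As before, this is only an illustration: `qq_points` is a hypothetical helper, the plotting positions (i - 0.5)/n are an assumed convention, and the sample is synthetic rather than the DRRH data. The correlation between the two coordinates is a simple numerical summary of how straight the Q-Q plot is.

```python
import numpy as np
from scipy import stats


def qq_points(sample, dist):
    """Coordinates for a Q-Q plot: fitted quantiles vs ordered observations."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions
    theoretical = dist.ppf(probs)  # model quantile for each ordered point
    return theoretical, x


# Synthetic sample from the fitted GEV (scipy shape c = -k), NOT the DRRH data
gev = stats.genextreme(0.20001, loc=46.154, scale=14.518)
sample = gev.rvs(size=730, random_state=0)
q_model, q_data = qq_points(sample, gev)

# A correlation close to 1 means the points lie near a straight line
r = float(np.corrcoef(q_model, q_data)[0, 1])
print(f"Q-Q correlation: {r:.4f}")
```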

Figure 4.5: Q-Q Plot of five selected distributions fitting a number of inpatients daily.

[Figure: "Q-Q Plots of the five selected Distributions"; series: Beta, Burr, Gamma (3P), Gen. Gamma (4P), Gen. Extreme Value; x-axis: number of inpatients in DRRH for year 2017-2018; y-axis: Quantile (Model).]

Figure 4.5 compares all five statistical distributions selected for fitting the inpatient data. On careful observation, all five distributions appear to be good fits, since their points fall around the straight line and only a few points at the end lie slightly away from it. However, because all five distributions show strong signs of a good fit to the hospital inpatient data, it was very difficult to determine the best-fitting distribution from this graph. Further calculation using the AIC and BIC was therefore needed to identify the best fit for the hospital inpatient data.

4.5 Estimation of the Parameters of the Selected Probability Distributions for the Daily Total Inpatient and Outpatient Hospital Data.

To estimate the parameters of each distribution model, the method of maximum likelihood estimation is used, which finds the parameter values under which the observed data are most likely. It is the most adaptable method for fitting parametric statistical distribution models to data (Cousineau et al., 2004). After an initial analysis with EasyFit 5.5 Professional software, thirteen distinct probability distributions altogether represent the total number of inpatients and the total number of outpatients in this study. The parameters were also estimated by maximum likelihood with the help of EasyFit software.
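A minimal sketch of maximum likelihood estimation for one of the candidate models, assuming SciPy is available. Because the hospital data are not reproduced here, it fits the GEV to a synthetic sample drawn from the parameters reported for the inpatient data in Table 4.6, which shows the estimator approximately recovering them. Note the sign convention: SciPy's `genextreme` uses shape `c = -k` relative to the k reported by EasyFit. The starting values passed to `fit` are our own choice, not part of the study's procedure.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the daily inpatient counts (NOT the DRRH data):
# draws from the GEV of Table 4.6, EasyFit shape k = -0.20001
K_TRUE, SIGMA_TRUE, MU_TRUE = -0.20001, 14.518, 46.154
sample = stats.genextreme.rvs(-K_TRUE, loc=MU_TRUE, scale=SIGMA_TRUE,
                              size=2000, random_state=42)

# Maximum likelihood fit; the positional 0.2 and the loc/scale keywords are
# starting values for the optimiser (scipy shape c = -k)
c_hat, mu_hat, sigma_hat = stats.genextreme.fit(
    sample, 0.2, loc=sample.mean(), scale=sample.std())
k_hat = -c_hat  # back to EasyFit's sign convention

# Maximised log-likelihood, the quantity later used for AIC and BIC
log_likelihood = float(np.sum(
    stats.genextreme.logpdf(sample, c_hat, loc=mu_hat, scale=sigma_hat)))
print(f"k = {k_hat:.4f}, sigma = {sigma_hat:.3f}, mu = {mu_hat:.3f}, "
      f"logL = {log_likelihood:.2f}")
```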

Table 4.6: Estimated parameters of the suggested distributions representing the daily total number of inpatients at DRRH, by the method of maximum likelihood estimation.

Fitted distribution     Estimated parameters
Beta                    α1 = 22.923, α2 = 149.51, a = -27.279, b = 569.9
Burr                    k = 3.1238, α = 4.4431, β = 70.384
Gamma (3P)              α = 37.098, β = 2.5249, γ = -41.572
Gen. Extreme Value      k = -0.20001, σ = 14.518, μ = 46.154
Gen. Gamma (4P)         k = 1.165, α = 23.098, β = 5.8255, γ = -33.938

Source: Computed by the researcher using EasyFit 5.5 Professional software.

Table 4.6 shows the estimated parameters of all five distributions for the daily inpatient data. The Beta distribution has continuous shape parameters α1 = 22.923 and α2 = 149.51 and boundary parameters a = -27.279 and b = 569.9. The Burr distribution has two shape parameters, k = 3.1238 and α = 4.4431, and scale β = 70.384. The Gamma (3P) distribution has shape α = 37.098, scale β = 2.5249 and location γ = -41.572. The Gen. Extreme Value distribution has shape k = -0.20001, scale σ = 14.518 and location μ = 46.154. The Gen. Gamma (4P) distribution has two shape parameters (k = 1.165 and α = 23.098), scale β = 5.8255 and location γ = -33.938.

Table 4.7: Estimated parameters of the suggested distributions representing the daily total number of outpatients at DRRH, by the method of maximum likelihood estimation.

Fitted distribution     Estimated parameters
Burr                    k = 35.479, α = 2.2491, β = 1660.0
Cauchy                  σ = 85.679, μ = 283.09
Dagum                   k = 0.40197, α = 5.1053, β = 391.97
Johnson SB              γ = 0.84103, δ = 1.1222, λ = 828.9, ξ = 16.935
Log-Logistic (3P)       α = 4.9104, β = 407.67, γ = -126.1
Nakagami                m = 1.2039, Ω = 1.1301E+5
Rayleigh                σ = 241.55
Rayleigh (2P)           σ = 213.55, γ = 38.465
Weibull                 α = 2.2162, β = 342.85

Source: Computed by the researcher using EasyFit 5.5 Professional software.

Table 4.7 shows the parameter estimates of all nine distributions for the daily outpatient data. The Burr distribution has two shape parameters (k = 35.479 and α = 2.2491) and scale β = 1660.0. The Cauchy distribution has scale σ = 85.679 and location μ = 283.09. The Dagum distribution has two shape parameters (k = 0.40197 and α = 5.1053) and scale β = 391.97. The Johnson SB distribution has two continuous shape parameters (γ = 0.84103 and δ = 1.1222), scale λ = 828.9 and location ξ = 16.935. The Log-Logistic (3P) distribution has shape α = 4.9104, scale β = 407.67 and location γ = -126.1. The Nakagami distribution has shape m = 1.2039 and scale Ω = 1.1301E+5. The Rayleigh distribution has scale σ = 241.55. The Rayleigh (2P) distribution has scale σ = 213.55 and location γ = 38.465. The Weibull distribution has shape α = 2.2162 and scale β = 342.85.

4.5.1 Identification of the probability distributions and the properties of all selected distributions for the total hospital inpatient and outpatient data.

The probability properties of each selected distribution were identified using EasyFit 5.5 Professional statistical software. The properties of each probability distribution fitted to the two years of daily data on the total number of hospital inpatients are given in Table 4.8 below:

Table 4.8: Properties of the five fitted distributions for the total hospital inpatient data.

Distribution          Mean     Variance   St. dev.   Coef. of variation   Skewness   Kurtosis
Beta                  52.107   237.01     15.395     0.2955               0.3265     0.1248
Burr                  52.098   237.51     15.411     0.2958               0.3677     0.5737
Gamma (3P)            52.095   236.49     15.378     0.2952               0.3284     0.1617
Gen. Gamma (4P)       52.093   236.20     15.370     0.2950               0.3278     0.1513
Gen. Extreme Value    52.095   233.07     15.267     0.2931               0.2541     -0.1198

Source: Computed by the researcher using EasyFit 5.5 Professional software.

The results in Table 4.8 show that the properties of the five fitted distributions for the total inpatient data are very close to each other. Since the skewness of each fitted distribution is greater than zero, all the distributions are positively skewed.
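Because the moments in Table 4.8 follow from the estimated parameters, they can be cross-checked. The sketch below maps the EasyFit parameters of Table 4.6 onto SciPy distributions; the mappings are our assumptions about the two parameterisations (Burr as `burr12` with c = α and d = k, Gen. Gamma (4P) as `gengamma` with a = α and c = k, GEV with c = -k, and Beta with scale b - a). SciPy reports excess kurtosis, so a normal curve gives 0.

```python
import numpy as np
from scipy import stats

# Table 4.6 estimates mapped onto scipy distributions (mappings are assumptions)
fitted = {
    "Beta": stats.beta(22.923, 149.51, loc=-27.279, scale=569.9 + 27.279),  # scale = b - a
    "Burr": stats.burr12(4.4431, 3.1238, scale=70.384),  # c = alpha, d = k
    "Gamma (3P)": stats.gamma(37.098, loc=-41.572, scale=2.5249),
    "Gen. Gamma (4P)": stats.gengamma(23.098, 1.165, loc=-33.938, scale=5.8255),  # a = alpha, c = k
    "Gen. Extreme Value": stats.genextreme(0.20001, loc=46.154, scale=14.518),  # c = -k
}

properties = {}
for name, dist in fitted.items():
    # scipy returns mean, variance, skewness and excess kurtosis
    mean, var, skew, kurt = (float(v) for v in dist.stats(moments="mvsk"))
    sd = float(np.sqrt(var))
    properties[name] = {"mean": mean, "variance": var, "sd": sd,
                        "cv": sd / mean, "skewness": skew, "kurtosis": kurt}
    print(f"{name:20s} mean={mean:7.3f} var={var:7.2f} sd={sd:6.3f} "
          f"cv={sd / mean:.4f} skew={skew:.4f} kurt={kurt:.4f}")
```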

Table 4.9: Properties of the nine fitted distributions for the total hospital outpatient data.

Distribution          Mean     Variance   St. dev.   Coef. of variation   Skewness   Kurtosis
Burr                  303.54   21036.0    145.04     0.4778               0.5402     0.1488
Cauchy                N/A      N/A        N/A        N/A                  N/A        N/A
Dagum                 307.72   25221.0    158.81     0.5161               1.6492     13.0490
Johnson SB            N/A      N/A        N/A        N/A                  N/A        N/A
Log-Logistic (3P)     310.77   31151.0    176.50     0.5679               2.5748     29.6340
Nakagami              303.79   20723.0    143.95     0.4739               0.5593     0.1695
Rayleigh              302.74   25043.0    158.25     0.5227               0.6311     0.2451
Rayleigh (2P)         306.10   19572.0    139.90     0.4570               0.6311     0.2450
Weibull               303.65   20950.0    144.74     0.4767               0.4996     0.0275

Source: Computed by the researcher using EasyFit 5.5 Professional software.

The results in Table 4.9 show that the means of the nine fitted distributions for the total outpatient data are very close to each other, but their standard deviations vary from about 140 to 177. For two distributions, namely the Cauchy and Johnson SB distributions, these properties are not available (the Cauchy distribution has no finite mean or variance). Apart from the Log-Logistic (3P) distribution, the Dagum distribution shows the highest measures of skewness and kurtosis, indicating a highly peaked, positively skewed curve suitable for the data.

Since the skewness of every fitted distribution with available properties is greater than zero and the kurtosis is positive, the fitted distributions are positively skewed and leptokurtic in nature.
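The same cross-check can be sketched for the outpatient models of Table 4.7. Again the mappings to SciPy are assumptions: Dagum is taken as SciPy's Burr Type III (`scipy.stats.burr`, CDF (1 + x^(-c))^(-d)) with c = α and d = k, Nakagami uses ν = m and scale √Ω, and Weibull uses `weibull_min`. The Cauchy mean comes back as `nan`, consistent with the missing entries in Table 4.9.

```python
import numpy as np
from scipy import stats

# Table 4.7 estimates mapped onto scipy distributions (mappings are assumptions)
fitted = {
    "Cauchy": stats.cauchy(loc=283.09, scale=85.679),
    "Dagum": stats.burr(5.1053, 0.40197, scale=391.97),  # Burr III: c = alpha, d = k
    "Nakagami": stats.nakagami(1.2039, scale=np.sqrt(1.1301e5)),  # nu = m, scale = sqrt(Omega)
    "Rayleigh": stats.rayleigh(scale=241.55),
    "Rayleigh (2P)": stats.rayleigh(loc=38.465, scale=213.55),
    "Weibull": stats.weibull_min(2.2162, scale=342.85),
}

means = {name: float(dist.mean()) for name, dist in fitted.items()}
for name, mean in means.items():
    # The Cauchy distribution has no finite mean, so scipy reports nan
    print(f"{name:14s} mean = {mean:.2f}")
```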

4.5.2 The AIC and BIC interpretation for model selection.

The best-fitting distribution for these data is determined by the log-likelihood statistic and the AIC (Akaike, 1974) and BIC values of the suggested distributions. Since these selection criteria are based on the log-likelihood of each distribution at the sample values, they are calculated after the parameters have been estimated. The AIC criterion was derived from the idea that the truth is very complex and that no "true model" exists for any sample data set (Anderson & Burnham, 2004). Thus, among the suggested distributions, it was possible to estimate which one is closest to the unknown true model. The formula for the AIC was given in chapter three. In the present study, the AIC and BIC values were computed after finding the log-likelihood of the data under each suggested distribution: the parameters of each model were found by MLE, the log-likelihood was then determined manually in Excel for each distribution, and the AIC and BIC were obtained from the formulae. The distribution with the smallest value of Akaike's Information Criterion is the one considered appropriate for fitting the patient data.

The Bayesian Information Criterion, proposed by Schwarz (1978), is another model selection criterion for identifying the best-fitting distribution model for a given data set. The BIC of each suggested distribution is also based on log L and the number of estimated parameters, and the model with the smallest BIC is selected as the best fit. The formula for the BIC was given in chapter three. In this study, most of the AIC and BIC values were calculated manually by the researcher using Excel; the PROC HPSEVERITY and PROC LIFEREG procedures of SAS 9.4 were also used to determine the values for some distributions. The AIC, BIC and maximum log-likelihood values for the five suggested distributions for the inpatient data are shown in Table 4.10 below.

Table 4.10: Model selection for the total number of inpatients in DRRH.

Distribution          AIC        BIC        Log-likelihood
Beta                  44064.34   44067.79   -22028.17
Burr                  6054       6067       -3023.80
Gen. Extreme Value    1067.96    1067.408   -529.98
Gamma (3P)            6054.31    6068.09    -3024.16
Gen. Gamma (4P)       21713.0    21716.62   -10852.58

Source: Calculated manually by the researcher using Excel and SAS 9.4

Five distributions are suitable for representing the inpatient data of DRRH (see the Anderson-Darling, Kolmogorov-Smirnov and chi-square test tables): the Beta, Burr, Generalized Extreme Value, Gamma (3P) and Generalized Gamma (4P) distributions were the feasible suggestions under the statistical test criteria. Table 4.10 shows that, among the suggested distributions, the AIC and BIC values are smallest and the log-likelihood is largest for the Generalized Extreme Value distribution. Since this distribution consistently satisfies all three criteria, it is the best-fitting distribution for the inpatient data of DRRH. The Burr and three-parameter Gamma distributions come second and third, but their AIC, BIC and log-likelihood values are far from those of the Generalized Extreme Value distribution, so the Generalized Extreme Value distribution is uniquely suggested for the probability analysis of the inpatient data.
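The two criteria can be written as small functions. This sketch uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, with the natural logarithm, k the number of estimated parameters and n the number of observations; it is offered as an illustration rather than as a restatement of the chapter-three formulas, and the function names are hypothetical. Applied to the Burr row of Table 4.10 (ln L = -3023.80, k = 3, n = 730 daily observations), it reproduces the tabulated AIC and BIC to within rounding.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion with the natural log: BIC = k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Burr row of Table 4.10: three parameters, 730 daily observations
print(round(aic(-3023.80, 3), 2))
print(round(bic(-3023.80, 3, 730), 2))
```

For a fixed log-likelihood, each extra parameter adds 2 to the AIC but ln(n) to the BIC, so with 730 observations the BIC penalises extra parameters more heavily than the AIC does.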

Figure 4.6: Probability density curve of the Generalized Extreme Value distributions for hospital inpatients daily data in DRRH.

[Figure, two panels titled "Probability Density Function of Gen. Extreme Value Distribution": the histogram with the fitted density, and the fitted density alone, Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018; y-axis: f(x).]

Table 4.11: Model selection for the total number of outpatients in DRRH.

Distribution          AIC          BIC          Log-likelihood
Burr                  176.967      178.662      -85.483
Cauchy                82.985       81.213       -39.493
Dagum                 78.455       75.797       -36.228
Johnson SB            182.317      178.317      -87.159
Log-Logistic (3P)     81.975       79.317       -37.988
Nakagami              950084.112   950082.340   -475040.056
Rayleigh              152.430      151.544      -75.215
Rayleigh (2P)         99.530       97.758       -47.765
Weibull               174.965      176.095      -85.482

Source: Calculated manually by the researcher using Excel and SAS 9.4

The nine statistical distributions shown in Table 4.11 satisfy the Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests for the outpatient frequency data and are the basic feasible probability distributions. The AIC and BIC values are derived from the maximum log-likelihood of the sample values under each distribution and are listed in Table 4.11. From Table 4.11, the Dagum distribution is the best-fitting distribution for the outpatient data, as its AIC and BIC values are uniquely the smallest and its log-likelihood is the largest of all the distributions. The three-parameter Log-Logistic and the Cauchy distributions are the second and third most appropriate fitted curves for obtaining the probability information.

After calculating the AIC, BIC and log-likelihood values, the best statistical distribution among the nine initially accepted was the Dagum distribution. Its graph is shown below:

Figure 4.7: Probability density curve of Dagum distribution representing outpatient daily data in DRRH 2017-2018.

[Figure, two panels titled "Probability Density Function of Dagum Distribution": the histogram with the fitted density, and the fitted density alone, Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018; y-axis: f(x).]

4.6 Prediction of the probability of inpatient and outpatient levels exceeding some limits in the hospital.

This study also calculated the probabilities of the numbers of outpatients and inpatients falling within specified levels. For such skewed and peaked data, this gives more stable information than an evaluation based on the mean and standard deviation: because the distributions are not symmetric, the mean, standard deviation and similar measures are not easily interpretable, whereas a probability analysis based on the fitted distribution can establish the underlying pattern in the data. Once the best-fitting distribution was determined, its probability density function was used to calculate the probability of the number of patients falling in distinct intervals. To calculate these probabilities, the researcher used the one-delimiter and two-delimiter options of EasyFit 5.5 Professional statistical software. The calculated probabilities for patients at DRRH under each distribution are given below:

(a) Probability of patients admitted in the hospital, calculated using the probability density function of the Generalized Extreme Value distribution, with its probability graphs.

1) The probability that the total number of inpatients admitted in the hospital is less than thirty (30) per day is 0.0652. In general, therefore, more than 30 patients are admitted as inpatients per day (P(X<30) = 0.065 < 10%): fewer than 30 admissions occur on less than 10% of the days in a year (about 24 days, including Sundays). That is, even on holidays the hospital needs to handle more than 30 admissions.

2) The probability that the total number of inpatients admitted in the hospital per day is between 50 and 70 is 0.4055, i.e. on about 41% of days (about 148 days a year) 50 to 70 persons need to be admitted to the inpatient wards.

3) The probability that the total number of inpatients admitted in the hospital per day is more than 80 is 0.0413, i.e. only about 4% of days (about 15 days a year) have more than 80 admissions.

4) A regular inpatient admission of 30 to 50 patients occurs on about 40% of the days in a year (P(30<X<50) = F(50) - F(30) ≈ 0.467 - 0.065 = 0.402 under the fitted distribution).
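The probabilities in (a) are areas under the fitted GEV curve, so they can be recomputed from the parameters in Table 4.6. The sketch below evaluates the closed-form CDF F(x) = exp(-(1 + kz)^(-1/k)), z = (x - μ)/σ, which we assume is EasyFit's parameterisation; it agrees with SciPy's `genextreme` with shape c = -k. The helper name `gev_cdf` is hypothetical, and the values it prints may differ slightly from those reported by EasyFit.

```python
import math

from scipy import stats

# EasyFit GEV estimates for daily inpatients (Table 4.6): shape k, scale sigma, location mu
K, SIGMA, MU = -0.20001, 14.518, 46.154


def gev_cdf(x, k=K, sigma=SIGMA, mu=MU):
    """GEV CDF F(x) = exp(-(1 + k z)^(-1/k)), z = (x - mu)/sigma, for k != 0."""
    t = 1.0 + k * (x - mu) / sigma
    if t <= 0:  # outside the support; with k < 0 this lies above the upper bound
        return 1.0 if k < 0 else 0.0
    return math.exp(-t ** (-1.0 / k))


# scipy equivalent, used only as a cross-check (shape c = -k)
gev = stats.genextreme(-K, loc=MU, scale=SIGMA)

p_below_30 = gev_cdf(30)
p_50_to_70 = gev_cdf(70) - gev_cdf(50)
p_above_80 = 1.0 - gev_cdf(80)
p_30_to_50 = gev_cdf(50) - gev_cdf(30)

for label, p in [("X < 30", p_below_30), ("50 < X < 70", p_50_to_70),
                 ("X > 80", p_above_80), ("30 < X < 50", p_30_to_50)]:
    print(f"P({label}) = {p:.4f}  (about {365 * p:.0f} days a year)")
```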

The probability graphs for each calculated probability of the total number of inpatients admitted in the hospital per day are shown below:

Figure 4.8: Probability graph of Gen. Extreme Value with P (X<30).

[Figure: "Probability graph of Gen. Extreme Value with P(X<30)"; density f(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

Figure 4.9: CDF graph of Gen. Extreme Value with P (X<30).

[Figure: "Cumulative Distribution Function of Gen. Extreme Value Distribution"; F(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

Figure 4.10: Probability graph of Gen. Extreme Value with P(50<X<70).

[Figure: density f(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

Figure 4.11: CDF graph of Gen. Extreme Value with P(50<X<70).

[Figure: "Cumulative Distribution Function of Gen. Extreme Value Distribution"; F(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

Figure 4.12: Probability graph of Gen. Extreme Value with P(X>80).

[Figure: density f(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

Figure 4.13: CDF graph of Gen. Extreme Value with P(X>80).

[Figure: "Cumulative Distribution Function of Gen. Extreme Value Distribution"; F(x) of Gen. Extreme Value (-0.20001; 14.518; 46.154); x-axis: number of inpatients in DRRH during the year 2017-2018.]

4.6.1 Prediction of the probability of occurrence of outpatients' level exceeding some limits.

(b) Probability of patients attending the outpatient department of the hospital, calculated using the probability density function of the Dagum distribution, with its probability graphs.

1) The probability that the total number of outpatients attending the outpatient department of the hospital is less than one hundred and twenty-five (125) per day is 0.0957, i.e. on only about 10% of the days in a year (about 35 days, including national holidays) do fewer than 125 patients come to get service from the outpatient department of DRRH.

2) The probability that the total number of outpatients attending the outpatient department per day is between 125 and 375 is 0.6256, i.e. on about 63% of the days in a year (about 228 days, including weekends and public holidays) between 125 and 375 patients come to get health services from the outpatient department of DRRH.

3) The probability that the total number of outpatients attending the outpatient department per day is between 375 and 550 is 0.2151, i.e. on only about 22% of the days in a year (about 79 days, including weekends and public holidays) between 375 and 550 patients come to get health services from the outpatient department of DRRH.

4) The probability that the total number of outpatients attending the outpatient department per day is more than 550 is 0.0635, i.e. only about 6% of the days in a year (about 23 days) have an attendance of more than 550 patients.

The probability graphs for each calculated probability of the total number of outpatients attending the outpatient department of the hospital per day are shown below:

Figure 4.14: Probability graph of Dagum with P (X<125).

[Figure: "Probability Graph of Dagum Distribution with P(X<125)"; density f(x) of Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018.]

Figure 4.15: CDF graph of Dagum with P (X<125).

[Figure: "Cumulative Distribution Function of Dagum Distribution"; F(x) of Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018.]

Figure 4.16: Probability graph of Dagum with P(125<X<375).

[Figure: "Probability Graph of Dagum Distribution with P(125<X<375)"; density f(x) of Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018.]

Figure 4.17: CDF graph of Dagum with P(125<X<375).

[Figure: "Cumulative Distribution Function of Dagum Distribution"; F(x) of Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018.]

Figure 4.18: CDF graph of Dagum with P (X>550).

[Figure: "Cumulative Distribution Function of Dagum Distribution"; F(x) of Dagum (0.40197; 5.1053; 391.97); x-axis: number of outpatients in DRRH during year 2017-2018.]
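The outpatient probabilities in (b) can be recomputed from the fitted Dagum parameters in the same way. In this sketch the closed-form CDF F(x) = (1 + (x/β)^(-α))^(-k) is our assumption about EasyFit's three-parameter Dagum parameterisation; it coincides with SciPy's Burr Type III distribution (`scipy.stats.burr`) with c = α and d = k, and with these parameters it gives the four probabilities listed in (b) to within about 0.0001. The helper name `dagum_cdf` is hypothetical.

```python
from scipy import stats

# Dagum estimates for daily outpatients (Table 4.7): shapes k and alpha, scale beta
K, ALPHA, BETA = 0.40197, 5.1053, 391.97


def dagum_cdf(x, k=K, alpha=ALPHA, beta=BETA):
    """Three-parameter Dagum CDF F(x) = (1 + (x/beta)^(-alpha))^(-k) for x > 0."""
    if x <= 0:
        return 0.0
    return (1.0 + (x / beta) ** (-alpha)) ** (-k)


# scipy's Burr Type III has the same CDF with c = alpha and d = k
dagum = stats.burr(ALPHA, K, scale=BETA)

bands = {
    "X < 125": dagum_cdf(125),
    "125 < X < 375": dagum_cdf(375) - dagum_cdf(125),
    "375 < X < 550": dagum_cdf(550) - dagum_cdf(375),
    "X > 550": 1.0 - dagum_cdf(550),
}
for label, p in bands.items():
    print(f"P({label}) = {p:.4f}  (about {365 * p:.0f} days a year)")
```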

(c) The Quartiles and Deciles for patients admitted in the hospital.

This study also calculated measures of position, namely quartiles and deciles, for the patients admitted in Dodoma Regional Referral Hospital. The quartiles are the points that divide the data set into four equal parts, and the deciles are the points that divide it into ten equal parts. The quartiles and deciles calculated with respect to the Generalized Extreme Value distribution for the daily total number of hospital inpatients are shown in Table 4.12 below.

Table 4.12: The measures of position of patients admitted in the hospital.

Measure   Q1   Q2   Q3   D2   D4   D6   D8
Value     41   52   62   38   48   56   65

Source: Calculated manually by the researcher.

87 Table 4.12 indicates the calculated measures of the position such as quartiles and deciles for a total number of hospital inpatients from DRRH. From the table 4.12 the

first quartile is Q1 =41 representing that in ¼ th of the year there is a maximum of 41 patients admitted per day while the other 75% of days more than 41 patients are

admitted. The second quartile Q2 =52 indicates that on 50% days of the year utmost

52 patients are admitted. Similarly, ¼ th of higher turnout in a year is with at least

62 patients per day.

From Table 4.12, the second decile D2 = 38 implies that on 20% of the days (73 days) the inpatient ward admits 38 or fewer patients. On another 20% of the days (73 days), 38 to 48 patients are admitted per day; on a further 73 days, 48 to 56 patients are admitted per day; and on another 73 days, 56 to 65 patients per day require admission to the inpatient ward. On the 73 days with the highest attendance, 65 to 100 patients per day seek admission.
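Quartiles and deciles of a fitted distribution come from its quantile (inverse CDF) function. The sketch below shows this for the Generalized Extreme Value distribution, assuming the common parameterization F(x) = exp{-[1 + ξ(x - μ)/σ]^(-1/ξ)}. The fitted GEV parameters are not restated in this section, so the values of `XI`, `SIGMA` and `MU` are hypothetical placeholders, and the printed positions are not claimed to reproduce Table 4.12. It also shows where the "73 days" per 20% block comes from.

```python
import math

# Hypothetical placeholder parameters (shape, scale, location): substitute the
# fitted GEV estimates for the inpatient data before interpreting the output.
XI, SIGMA, MU = -0.1, 14.0, 46.0


def gev_cdf(x, xi=XI, sigma=SIGMA, mu=MU):
    """P(X <= x) for the generalized extreme value distribution."""
    z = (x - mu) / sigma
    if xi == 0:
        return math.exp(-math.exp(-z))
    t = 1 + xi * z
    if t <= 0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1 / xi))


def gev_quantile(p, xi=XI, sigma=SIGMA, mu=MU):
    """Inverse CDF: the value x with P(X <= x) = p, for 0 < p < 1."""
    if xi == 0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1)


positions = {"Q1": 0.25, "Q2": 0.50, "Q3": 0.75,
             "D2": 0.20, "D4": 0.40, "D6": 0.60, "D8": 0.80}
for name, p in positions.items():
    print(f"{name}: {gev_quantile(p):6.1f} patients/day")

# Each block of 20% of the year covers 0.2 * 365 = 73 days.
print("days per 20% block:", round(0.2 * 365))
```

The round trip `gev_cdf(gev_quantile(p)) == p` (up to floating-point error) is a quick check that the two functions use the same parameterization.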

4.7 Forecasting.

Before a model is used for forecasting, it must be checked for adequacy and its parameters must be diagnosed. A good model should have as few parameters as possible while giving the best forecasting results. For example, when more than one ARIMA model fits a time series adequately, the model with fewer parameters, i.e. lower AR or MA orders, should be preferred. In this study, model adequacy was judged by the minimum AIC (Akaike, 1974) and BIC (Schwarz, 1978) criteria, which were used to select the best ARIMA model among the adequate candidates for the forecasting process. In time series forecasting, the model is also fixed only after the series has been tested for stationarity.
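For reference, the two criteria have the standard definitions below, where L̂ is the maximized likelihood of the fitted model, k the number of estimated parameters and n the number of observations used in the fit. SAS PROC ARIMA reports BIC under the name SBC.

$$
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n
$$

Since ln n > 2 once n ≥ 8, BIC penalizes each additional parameter more heavily than AIC, which is why it tends to favour the more parsimonious models.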

4.7.1 Stationarity of time series data.

Time series data must be checked for stationarity before a model is fitted: each variable has to be tested for a unit root, and its order of integration determined. In this study, stationarity of the average daily number of outpatients visiting the hospital was examined with time series plots, the ACF and the PACF, followed by the ADF test. Visual inspection of the ACF plot indicates that the daily total outpatients series is non-stationary, since the ACF decays very slowly (see Figure 4.21 in Appendix 3). The data were therefore transformed by taking the first difference (lag one), and the stationarity test was conducted with zero lagged differences. The plot of the new series shows no trend and fluctuates around zero, and its ACF decreases rapidly, indicating that the differenced outpatient series is stationary (see Figure 4.22 in Appendix 3). The results of the ADF test for the average daily total number of outpatients per week are summarized in Table 4.13 below:

Table 4.13: ADF Test for the Differenced Average Daily Outpatients Visiting Hospital

Type          Lags   Rho        Pr < Rho   Tau      Pr < Tau   F        Pr > F
Zero Mean     0      -139.003   0.0001     -14.86   <.0001
Single Mean   0      -139.015   0.0001     -14.79   <.0001     109.40   0.0010
Trend         0      -139.102   0.0001     -14.74   <.0001     108.67   0.0010

Source: Computed by the researcher using SAS 9.4 software.

From Table 4.13, the p-value of the tau statistic is less than 0.0001 in all three cases, so the null hypothesis of a unit root is rejected at the 5% level of significance and the differenced series is stationary. The F test, which jointly tests a unit root together with a zero mean (single mean case) or together with no trend (trend case), has a p-value of 0.001, so this joint null hypothesis is also rejected at the 1% level of significance, further confirming the stationarity of the first-differenced data. The ARIMA model can therefore be used to forecast the average total number of outpatients per day visiting the hospital.
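The zero-mean row of Table 4.13 corresponds to the simplest Dickey-Fuller regression, Δy_t = ρ y_{t-1} + e_t with no intercept and no augmentation lags. The sketch below computes its normalized "Rho" statistic (n·ρ̂) and its "Tau" statistic (ρ̂ divided by its standard error). The study's weekly series is not reproduced here, so a synthetic random walk of 104 values stands in for it; this is an illustration of the test, not a re-analysis, and SAS's exact small-sample conventions may differ slightly. The critical value -1.95 is the approximate tabulated 5% value for the zero-mean case.

```python
import math
import random


def dickey_fuller_zero_mean(y):
    """Zero-mean Dickey-Fuller regression with no augmentation lags.

    Fits dy_t = rho * y_{t-1} + e_t by least squares (no intercept) and returns
    (normalized rho statistic n * rho_hat, tau = rho_hat / se(rho_hat)).
    """
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(diffs)
    sxx = sum(v * v for v in lagged)
    rho_hat = sum(a * b for a, b in zip(lagged, diffs)) / sxx
    resid = [d - rho_hat * v for d, v in zip(diffs, lagged)]
    s2 = sum(r * r for r in resid) / (n - 1)      # one estimated coefficient
    tau = rho_hat / math.sqrt(s2 / sxx)
    return n * rho_hat, tau


# Synthetic illustration only: a random walk standing in for 104 weekly averages.
rng = random.Random(2017)
level = [300.0]
for _ in range(103):
    level.append(level[-1] + rng.gauss(0, 30))

first_diff = [level[t] - level[t - 1] for t in range(1, len(level))]

TAU_CRIT_5PCT = -1.95   # approximate tabulated 5% value, zero-mean case
for name, series in [("levels", level), ("first difference", first_diff)]:
    rho_stat, tau = dickey_fuller_zero_mean(series)
    verdict = "reject unit root" if tau < TAU_CRIT_5PCT else "cannot reject unit root"
    print(f"{name:>16}: rho = {rho_stat:8.2f}, tau = {tau:6.2f} -> {verdict}")
```

Differencing a random walk leaves white noise, so the tau statistic of the differenced series lands far below the critical value. This is the same pattern as the large negative tau values in Table 4.13.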

4.7.2 Time Series ARIMA model Identification.

A misspecification check must be conducted to find out whether an ARIMA model is an appropriate representation for forecasting. ARIMA model identification was done using the model selection criteria AIC and BIC. The researcher used the ARIMA procedure in SAS 9.4 to compute both AIC and BIC for each candidate model; when different time series models are compared, the smaller the AIC and BIC, the better the model. The AIC and BIC values for five ARIMA models are shown in Table 4.14 below:

Table 4.14: Time Series ARIMA Models Selection.

Model             AIC        BIC
ARIMA (1, 1, 0)   1018.079   1023.348
ARIMA (0, 1, 1)   1020.178   1025.448
ARIMA (1, 1, 1)   1020.058   1027.962
ARIMA (2, 1, 0)   1020.051   1027.956
ARIMA (0, 1, 2)   1021.633   1032.172

Source: Computed by the researcher using SAS 9.4

Table 4.14 shows five candidate ARIMA models fitted with d = 1 in the ARIMA (p, d, q) model. Comparing the AIC and BIC values of the five models, ARIMA (1, 1, 0) has the smallest value of both AIC and BIC. The ARIMA (1, 1, 0) model was therefore selected to give the best forecasts of daily outpatient attendance in DRRH for the year 2017-2018.

Even though ARIMA (1, 1, 0) is identified as the best fit, the residuals of the fitted model must still be checked for autocorrelation and normality. Residual analysis is useful for checking whether there is any structure in the time series data that is not accounted for by the estimated model. Graphical checks of the residuals from the ARIMA model are shown in the figures below:
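The selection rule applied to Table 4.14 can be written in a few lines. The sketch below copies the AIC and BIC values from that table and picks the model that minimizes each criterion. It also lists every model's AIC distance from the best one, which is a convenient way to see how close the runner-up models are.

```python
# Information criteria reported in Table 4.14 (ARIMA(p, d, q) with d = 1).
candidates = {
    (1, 1, 0): (1018.079, 1023.348),
    (0, 1, 1): (1020.178, 1025.448),
    (1, 1, 1): (1020.058, 1027.962),
    (2, 1, 0): (1020.051, 1027.956),
    (0, 1, 2): (1021.633, 1032.172),
}

best_by_aic = min(candidates, key=lambda order: candidates[order][0])
best_by_bic = min(candidates, key=lambda order: candidates[order][1])
print("Best by AIC:", best_by_aic)
print("Best by BIC:", best_by_bic)

# Rank all models by AIC, showing each model's distance from the best (delta AIC).
aic_min = candidates[best_by_aic][0]
for order, (aic, bic) in sorted(candidates.items(), key=lambda kv: kv[1][0]):
    print(f"ARIMA{order}: AIC = {aic:.3f} (delta {aic - aic_min:.3f}), BIC = {bic:.3f}")
```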

Figure 4.19: Residual diagnostic plots for differenced ARIMA (1, 1, 0) model.

Source: Processed by the researcher using SAS 9.4 software

Figure 4.20: Normality check of residuals for the ARIMA (1, 1, 0) model.

Source: Processed by the researcher using SAS 9.4 software.

Figures 4.19 and 4.20 present graphical checks of the residuals of the ARIMA (1, 1, 0) model. The residual ACF and PACF plots show no significant autocorrelation, and the p-values of the white noise test (see Table 4.17 in Appendix 3) are greater than 0.05. The null hypothesis of no autocorrelation therefore cannot be rejected at the 5% level of significance, and we conclude that the residuals are uncorrelated. The normality plots also show no marked departure from normality. It is therefore concluded that the ARIMA (1, 1, 0) model is adequate for forecasting the daily average total number of outpatients attending the hospital outpatient department for health care services.
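The white noise check on the residuals is typically a Ljung-Box portmanteau test. The sketch below computes Q = n(n + 2) Σ r_k² / (n - k) over the first m residual autocorrelations and compares it with a chi-square critical value. For the degrees of freedom it assumes the common convention m - p - q, giving 6 - 1 - 0 = 5 for lag 6 of an ARIMA (1, 1, 0) model (95th percentile 11.070). The residuals are a hypothetical stand-in drawn from a normal distribution, not the study's residuals.

```python
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n + 2) * sum_k r_k**2 / (n - k)."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


# Chi-square 95th percentile with 5 degrees of freedom (lag 6 minus one AR term).
CHI2_95_DF5 = 11.070

# Hypothetical stand-in residuals; replace with the fitted model's residuals.
rng = random.Random(1)
resid = [rng.gauss(0, 33.6) for _ in range(103)]
q6 = ljung_box_q(resid, 6)
print(f"Q(6) = {q6:.2f}; white noise rejected at 5%: {q6 > CHI2_95_DF5}")
```

A strongly autocorrelated input, such as an alternating +1/-1 sequence, produces a Q far above the critical value. This is what a failed white noise check would look like.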

4.7.3 Time series Model Parameter Estimation.

After the best model had been identified, its parameters were estimated. ARIMA models are estimated on the differenced series, which must be stationary. The parameter estimates and corresponding standard errors of the fitted ARIMA (1, 1, 0) model for the daily average total number of outpatients are shown in Table 4.15 below:

Table 4.15: Parameter estimates for ARIMA (1, 1, 0) model.

Parameter   Estimate    Standard Error   t Value   Approx Pr > |t|   Lag
MU          -0.45313    2.43230          -0.19     0.8526            0
AR1,1       -0.36294    0.09277          -3.91     0.0002            1

Source: Computed by the researcher using SAS9.4

Table 4.15 gives the estimated parameters of ARIMA (1, 1, 0), the model selected as the best fit for the forecasting process. The fitted ARIMA (1, 1, 0) model is:

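$$(1 + 0.36294\,B)\,(\nabla Y_t + 0.45313) = \varepsilon_t \qquad (90)$$

where $\nabla Y_t = Y_t - Y_{t-1}$ is the change in the weekly average daily number of outpatients, $B$ is the backshift operator ($B\,\nabla Y_t = \nabla Y_{t-1}$) and $\varepsilon_t$ is white noise. Equivalently, $\nabla Y_t = -0.45313 - 0.36294\,(\nabla Y_{t-1} + 0.45313) + \varepsilon_t$.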

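The estimated AR(1) coefficient is negative (-0.36294, p = 0.0002). This means that a week-to-week increase in the average daily number of outpatients tends to be followed by a partial decrease in the next week, and vice versa. The constant term MU (-0.45313) is not statistically significant (p = 0.8526).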

4.7.4 Forecasting the expected average number of outpatients per day in the next five weeks.

Since the time series ARIMA model satisfied almost all the relevant diagnostic tests, it was used to forecast the average daily total number of outpatients visiting DRRH for the five weeks following the 2017-2018 data.

The forecasts of the average daily total number of outpatients per week are shown in the table below:

Table 4.16: Forecast values and 95% confidence limits of the average daily total number of outpatients from the ARIMA (1, 1, 0) model for the next 5 weeks.

Week   Forecast value   Lower 95% C.L.   Upper 95% C.L.
105    258.1524         192.3601         323.9448
106    259.0899         181.0811         337.0987
107    258.1321         165.1604         351.1037
108    257.8621         153.4889         362.2354
109    257.3425         142.2190         372.4660

Source: Calculated by the researcher by using SAS 9.4

(Data are available for the average daily OPD attendance of 104 weeks, from 2017 to 2018.)
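The forecasts in Table 4.16 can be checked against equation (90). The last two observed weekly values are not reported, so the sketch below seeds the recursion with the first two point forecasts (weeks 105 and 106) and projects weeks 107 to 109. It also scales the one-step 95% half-width by the ψ weights of the integrated ARIMA (1, 1, 0) process, ψ_j = 1 + φ + ... + φ^j. In our check this reproduces the remaining point forecasts and all the confidence limits of Table 4.16 to within rounding, which supports reading equation (90) in this form. The constants are copied from Tables 4.15 and 4.16; the variable names are illustrative.

```python
import math

# Estimates from Table 4.15 (SAS PROC ARIMA): constant MU and AR(1) coefficient.
MU, PHI = -0.45313, -0.36294

# Seed the recursion with the first two point forecasts (weeks 105, 106) from Table 4.16.
forecasts = {105: 258.1524, 106: 259.0899}

# Equation (90): dY_t = MU + PHI * (dY_{t-1} - MU) + e_t, with dY_t = Y_t - Y_{t-1}.
# Future shocks have expectation zero, so each forecast change follows this recursion.
prev_change = forecasts[106] - forecasts[105]
for week in range(107, 110):
    change = MU + PHI * (prev_change - MU)
    forecasts[week] = forecasts[week - 1] + change
    prev_change = change

# Psi weights of the integrated ARIMA(1, 1, 0) process: psi_j = 1 + PHI + ... + PHI**j.
psi = [sum(PHI ** i for i in range(j + 1)) for j in range(5)]

# One-step 95% half-width from Table 4.16; the h-step half-width scales with
# sqrt(psi_0**2 + ... + psi_{h-1}**2).
half_width_1 = 323.9448 - 258.1524
half_widths = [half_width_1 * math.sqrt(sum(p * p for p in psi[:h])) for h in range(1, 6)]
sigma = half_width_1 / 1.96               # implied residual standard deviation

for h, week in enumerate(range(105, 110)):
    f, w = forecasts[week], half_widths[h]
    print(f"week {week}: forecast {f:8.4f}, 95% limits [{f - w:8.4f}, {f + w:8.4f}]")
print(f"implied residual SD: {sigma:.2f}")
```

The intervals widen with the horizon while the point forecasts flatten out near 257 to 259. This is the behaviour expected of a differenced model whose AR term dies out quickly.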

CHAPTER FIVE

CONCLUSION AND RECOMMENDATIONS.

This chapter gives a brief summary of the results and findings of the study and outlines the major conclusions derived from the empirical results, in two parts. The first part summarizes the findings, and the second gives the general conclusion and recommendations for future study.

5.1 Summary of the results of the findings.

The study was carried out on two years of daily inpatient and outpatient data from the Dodoma regional referral hospital (DRRH) in Tanzania during 2017-2018. A quantitative research approach was used: probability distributions were fitted and the corresponding probabilities evaluated. Summary statistics of the age and sex of patients were presented in Table 4.1 in Chapter 4. Only one third of inpatients are male, while two fifths of outpatients are male. Three quarters of inpatients belong to the 5-60 years age group, with one eighth each being elderly and children. Seven tenths of outpatients are aged 5-60 years, while 20% are elderly and 10% are children.

The descriptive statistics of the daily inpatient data in Table 4.2 show interesting results. The average number of daily inpatients is 52 with SD = 15 patients, but the skewness (0.35 > 0) and kurtosis (0.37 > 0) are both positive, indicating an asymmetric, non-normal distribution of the inpatient data. The daily inpatient data therefore follow a right-skewed, leptokurtic distribution. Similarly, the average number of daily outpatients is 304 with SD = 146; the skewness is positive (0.44 > 0) but the kurtosis is negative (-0.41). Like the inpatient distribution, the outpatient distribution is therefore non-normal and skewed to the right of the mean, but it is platykurtic. Appropriate distribution graphs were given subsequently.
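The shape labels used above follow directly from the signs of skewness and excess kurtosis. The sketch below computes the moment-based skewness g1 and excess kurtosis g2 and applies those sign rules to the values reported in Table 4.2. It assumes the reported kurtosis is excess kurtosis (normal = 0), as its interpretation in the text implies. Statistical software often applies small-sample corrections to these estimators, so values computed from raw data with this sketch may differ slightly from the table.

```python
def moment_skew_kurt(x):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def describe_shape(skew, excess_kurt):
    """Label the shape using the sign conventions applied in the text."""
    tail = "right-skewed" if skew > 0 else "left-skewed" if skew < 0 else "symmetric"
    peak = ("leptokurtic" if excess_kurt > 0
            else "platykurtic" if excess_kurt < 0 else "mesokurtic")
    return f"{tail}, {peak}"


# Summary values reported for the two series in Table 4.2.
print("inpatients: ", describe_shape(0.35, 0.37))
print("outpatients:", describe_shape(0.44, -0.41))
```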

The initial fitting of distributions to the inpatient and outpatient data was performed with the EasyFit 5.5 Professional statistical software. The software handles 61 continuous distributions, with three goodness-of-fit tests for raw data and two for frequency data: the Kolmogorov-Smirnov and Anderson-Darling tests are common to both, and the chi-square test is applied to raw data. Candidate distributions for the inpatient data were shortlisted by retaining those that repeatedly ranked within the top five across the three tests. Five distributions were found for the inpatient data, and all of them show similar mean, SD, skewness and kurtosis. For the outpatient data, nine probability distributions were satisfactory, but their characteristics differ somewhat.

The parameters of the selected distributions were estimated by the maximum likelihood method; location, scale and shape parameters were identified and estimated for 13 cases with the help of the software. To identify the best-fitting distribution from the shortlist, three statistical criteria were adopted. The log-likelihood was computed in Excel or SAS for the inpatient and outpatient data, and AIC and BIC were then derived from it through their linear relations with the log-likelihood. The Generalized Extreme Value distribution is the best fit among the five candidates for the inpatient data, and the Dagum distribution is the most appropriate for the outpatient data. The best-fitting distribution is the one with the smallest AIC and BIC and the largest log-likelihood. The computed AIC, BIC and log-likelihood for the Generalized Extreme Value distribution were 1067.96, 1067.408 and -529.98, respectively, the best values in each column. Similarly, the AIC, BIC and log-likelihood for the Dagum distribution were 78.455, 75.797 and -36.228, respectively, the best in their respective columns.

The graphs of the candidate distributions clearly show that the Generalized Extreme Value and Dagum distributions have the smallest errors with respect to the data. P-P and Q-Q plots of the candidate inpatient distributions were also drawn to help identify the appropriate distribution.

The mean and SD of the data are not very informative because the distributions are skewed and heavy-tailed. In such cases, analysis based on the fitted distribution is more reliable, and probabilities of events derived from it are better founded than normal-theory confidence intervals built from the mean and SD. The probabilities of specified numbers of inpatients and outpatients were therefore computed. Some applications of the results follow.

The probability that fewer than 30 patients are admitted on a given day, P(X < 30), is only 0.06, while on more than 50% of the days of a year the number of patients admitted is between 30 and 50. Excessive admissions, beyond 80 per day, are seen on only 18 days of the year.

The probability that fewer than 125 outpatients attend on a given day, P(X < 125), is about 0.10, i.e. about 10% of the days of a year, and on 226 days of a year outpatient attendance is between 125 and 375 per day.

On the lowest quarter of the days, at most 41 patients are admitted per day, while the busiest 25% of the days have to accommodate at least 62 patients per day. When the days are ordered by number of admissions, the second block of 73 days has 38 to 48 admissions per day, the third 48 to 56, and the fourth 56 to 65.

The time series ARIMA model was fitted to the hospital outpatient data in SAS 9.4 statistical software. Using the minimum AIC and BIC criteria, the study identified ARIMA (1, 1, 0) as the best forecasting model for the weekly average daily number of outpatients visiting the hospital outpatient department over the two years of data.

5.2 Conclusion

This study has examined two years of daily data on total hospital inpatients and total outpatients from DRRH, obtained through the hospital's electronic health management information system, and has fitted all the selected statistical distributions. After fitting the distributions with the EasyFit 5.5 Professional software, the study proposes the Generalized Extreme Value distribution as the best-fit model for the total number of hospital inpatients. The Dagum distribution, followed by the Log-Logistic (3P) distribution, was selected as the best-fit model for the total number of hospital outpatients. The final selection of these distributions was based on the calculated AIC, BIC and log-likelihood values.

The distribution with the smallest AIC and BIC values and the highest log-likelihood is chosen as the best-fit model for the two years of daily hospital patient data. The present arrangements for inpatient and outpatient accommodation may lead to insecurity and complaints about hospital care among the people of Dodoma, because the expected numbers of inpatients and outpatients are high compared with the available infrastructure and healthcare expertise. An average of about 260 outpatients per day, with a range of 140 to 390, is expected in the weeks immediately following the 2017-2018 study period. This should alert the government and international healthcare agencies to provide what is needed for a satisfactory health care system. Time series forecasting with the ARIMA model can be used as a decision-support tool in healthcare institutions.

5.3 Recommendations

On the basis of the findings of the study, the following recommendations are made. The DRRH should use the Generalized Extreme Value distribution for modeling the daily total number of inpatient admissions to the hospital.

The DRRH should use the ARIMA (1, 1, 0) model for forecasting and planning activities, and the Dagum distribution for modeling the daily total number of outpatients visiting the hospital outpatient department. To prepare adequate facilities for the overwhelming number of outpatients in the outpatient department, the DRRH administration should use the probability distributions and forecast figures in its planning for the coming weeks.

The government should also continue to support health facilities like DRRH with personnel and logistics so that they can provide quality health care to the community.

5.4 Future area of the study

1) Probability analysis should be conducted on the male and female distributions of inpatient and outpatient data.

2) An age-wise distribution study is needed to improve the conditions of children and elderly patients.

3) Departmental probability and ARIMA models are needed to expand and modernize particular departments, especially the gynecology and pediatric departments.

4) The study could also be extended by finding appropriate probability distributions for patients' medical bills and payment amounts in the referral hospital, so that the cost of treatment is transparent.

5) Further research could be conducted in other regional referral hospitals in the country, introducing more continuous probability distributions to improve the accuracy of the distributions that best fit both the total number of inpatients and the total number of outpatients in the hospital.

REFERENCES

Anderson, D. R., & Burnham, K. P. (2004). Multimodel inference: Understanding AIC and BIC in model selection. Colorado Cooperative Fish & Wildlife Research Unit (USGS-BRD).
Adejumo, A. O., & Momo, A. A. (2013). Modeling Box-Jenkins Methodology on Retail Prices of Rice in Nigeria.
Adeleke, I. A., & Ibiwoye, A. (2011). Modeling Claim Sizes in Personal Line Non-Life Insurance. International Business & Economics Research Journal.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Blumberg, B., Cooper, D. R., & Schindler, P. S. (2005). Business Research Methods. Berkshire: McGraw-Hill Education.
Box, G. E., & Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Caleb Boadi, Simon K. Harvey, & Agyapomaa Gyeke-dako. (2015). Modelling of fire count data: fire disaster risk in Ghana.
Chakrabartty, S. N. (2013). Best Split-Half and Maximum Reliability. IOSR Journal of Research & Method in Education, 3(1), 1-8.
Chitrasen Lairenjam, Shivarani Huidrom, Arnab Bandyopadhyay, & Bhadra. (2016). Assessment of Probability Distribution of Rainfall of North East Region (NER) of India. Journal of Research in Environmental and Earth Science, 2, 12-18.
Cousineau, D., Brown, S., & Heathcote, A. (2004). Fitting distributions using maximum likelihood: Methods and packages.
Dadey, E., Ablebu, G., & Agboda, K. (2011). Probability modeling and simulation of insurance claims: Koforidua Polytechnic. Unpublished manuscript.
Daniel, S. (2014). Fitting Distributions to Dose Data.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association, 74, 427-431.
El-Shanshoury, G. I. (2017). Fitting to Model Particulate Matter Concentrations. Arab Journal of Nuclear Science and Applications, 50(2), 108-122.
Famoye, & Lee. (2014). Editorial: Journal of Statistical Distributions and Applications.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.

Gardiner, J. (2014). Fitting Heavy-Tailed Distributions to Healthcare Data by Parametric and Bayesian Methods. Journal of Statistical Theory and Practice, 8(4), 619-652.
Hadavandi, E., Shavandi, H., Ghanbari, A., & Abbasian-Naghneh, S. (2012). Developing a hybrid artificial intelligence model for outpatient visits forecasting in hospitals. Applied Soft Computing, 700-711.
Hassan, M. F., Islam, M. A., Imam, M. F., & Sayem, S. M. (2014). Forecasting Wholesale Price of Coarse Rice in Bangladesh: A seasonal integrated moving average approach.
Hellervik, A., & Rodgers, G. J. (2007). A power law distribution in patients' lengths of stay in hospital. Physica A, 379, 235-240.
Ignatov, Z. G., Kaishev, V. K., & Krachunov, R. S. (2010). An Improved Finite-Time Ruin Probability Formula and Its Mathematica Implementation. Insurance: Mathematics and Economics, 29, 375-386.
Juang, W.-C., Huang, S.-J., Huang, F.-D., et al. (2017). Application of time series analysis in modelling and forecasting emergency department visits in a medical centre in Southern Taiwan.
Kelaniya, S. (2014). A Simulation approach for reduced outpatient waiting time, 4-9.
Kibona, S. E., & Mbago, M. C. (2018). Forecasting Wholesale Prices of Maize in Tanzania Using ARIMA Model. General Letters in Mathematics, 4, 131-141.
Kothari, D. (2004). Research Methods simplified (2nd ed.). Bombay: Sage publishers.
Lane, D. C., Monefeldt, C., & Husemann, E. (2003). Client Involvement in Simulation Model Building: Hints and Insights from a Case Study in a London Hospital. Health Care Management Science, 6, 105-116.
Lavanya, S., Radha, M., & Arulanandu, U. (2018). Statistical Distribution of Seasonal Rainfall Data for Rainfall Pattern in TNAU1 Station Coimbatore, Tamil Nadu, India. Int. J. Curr. Microbiol. App. Sci., 7(4), 3053-3062. doi:10.20546/ijcmas.2018.704.346
Machekposhti, K. H., & Sedghi, H. (2019). Determination of the Best Fit Probability Distribution for Annual Rainfall in Karheh River at Iran. International Journal of Environmental and Ecological Engineering.
Malehi, A. S., Pourmotahari, F., & Angali, K. A. (2015). Statistical models for the analysis of skewed healthcare cost data: a simulation study. Health Economics Review, 5(1), 1-16.
Marazzi, A., Paccaud, F., & Ruffieux, C. (1998). Fitting the distributions of length of stay by parametric models. Medical Care, 36(6), 915-927.

McClean, S., & Millard, P. (1993). Patterns of Length of Stay after Admission in Geriatric Medicine: An Event History Approach. Journal of the Royal Statistical Society, Series D, 42, 263-274.
Mehrannia, H., & Pakgohar, A. (2014). Using EasyFit Software for Goodness-of-Fit Test and Data Generation. International Journal of Mathematical Archive, 5(1), 118-124.
Mohajan, H. K. (2017). Two Criteria for Good Measurements in Research: Validity and Reliability. Annals of Spiru Haret University, 17(3), 58-82.
Omari, C. O., Nyambura, S. G., & Mwangi, J. M. W. (2018). Modeling the Frequency and Severity of Auto Insurance Claims Using Statistical Distributions. Journal of Mathematical Finance, 8, 137-160.
Philip Kibet Langat, Lalit Kumar, & Richard Koech. (2019). Identification of the Most Suitable Probability Distribution Models for Maximum, Minimum and Mean Streamflow.
Prieto, F., Sarabia, J. M., & Saez, A. J. (2014). Modelling major failures in power grids in the whole range. International Journal of Electrical Power & Energy Systems, 54, 10-16.
Ramberg, J. S., Tadikamalla, P. R., Dudewicz, E. J., & Mykytka, E. F. (1979). A probability distribution and its uses in fitting data. Technometrics, 21, 201-214.
Retrieved from http://www.mathwave.com
Robson, C. (2011). Real World Research: A Resource for Users of Social Research Methods in Applied Settings (2nd ed.). Sussex: John Wiley and Sons Ltd.
Rotela Jr., P., Salomon, F. L. R., & de Oliveira Pamplona, E. (2014). ARIMA: An Applied Time Series Forecasting Model for the Bovespa Stock Index. Applied Mathematics, 5, 3384-3391.
Saunders, M., Lewis, P., & Thornhill, A. (2012). Research Methods for Business Students (6th ed.). Pearson Education Limited.
Saunders, M., Lewis, P., & Thornhill, A. (2009). Research Methods for Business Students (5th ed.). Harlow: Pearson Education.
So, Y. (2010). Analyzing Interval-Censored Survival Data with... In SAS Global Forum.
Sukono, Riaman, Lesmana, E., Wulandari, R., Napitupulu, H., & Supian, S. (2018). Model estimation of claim risk and premium for motor vehicle insurance by using Bayesian method. IOP Conference Series: Materials Science and Engineering, 300, 012027.
Suleman, N., & Sarpong, S. (2011). Statistical modeling of hypertension cases in Navrongo, Ghana, West Africa.

Thatcher, R. (2010). Validity and Reliability of Quantitative Electroencephalography. Journal of Neurotherapy, 14, 122-152.
Tireito, F., Metrine, C., Kennedy, N., Omukoba, M., & Lucy, M. (2015). A time series model of Rainfall Patterns of Uasin Gishu County.
Vatinee Sukmak, Jaree Thongkam, & Jintana Leejongpermpoon. (2015). Time Series Forecasting in Anxiety Disorders of Outpatient Visits Using Data Mining. KKU Res., 20(2), 241-253.
White, G. C., & Bennetts, R. E. (1996). Analysis of frequency count data using the negative binomial distribution. Ecology, 77, 2549-2557.
Whitt, W., & Zhang, X. (2017). A Data-Driven Model of an Emergency Department. Operations Research for Health Care, 12, 1-15.

APPENDICES

Appendix 1: Details of fitting other distributions for the inpatient data.

The table below shows the goodness-of-fit test statistics for the Beta distribution fitted to the total number of inpatients.

Beta Distribution.

Kolmogorov-Smirnov: Sample Size 730; Statistic 0.0291; P-Value 0.55672; Rank 4
α                0.2       0.1       0.05      0.02      0.01
Critical Value   0.03971   0.04527   0.05026   0.05618   0.06029
Reject?          No        No        No        No        No

Anderson-Darling: Sample Size 730; Statistic 0.67492; Rank 6
α                0.2       0.1       0.05      0.02      0.01
Critical Value   1.3749    1.9286    2.5018    3.2892    3.9074
Reject?          No        No        No        No        No

Chi-Squared: Degrees of freedom 9; Statistic 1.4396; P-Value 0.99757; Rank 2
α                0.2       0.1       0.05      0.02      0.01
Critical Value   12.242    14.684    16.919    19.679    21.666
Reject?          No        No        No        No        No

Source: Computed by the researcher using EasyFit 5.5 software.

From the table above, the Anderson-Darling, Kolmogorov-Smirnov and chi-square goodness-of-fit tests indicate that, at the 95% confidence level, the Beta distribution fits the total number of inpatients data. None of the test statistics leads to rejection at the 5% level of significance. The p-values of the Kolmogorov-Smirnov and chi-square tests (0.55672 and 0.99757) are greater than 0.05, and the Kolmogorov-Smirnov, Anderson-Darling and chi-square statistics (0.0291, 0.67492 and 1.4396) are smaller than the corresponding critical values at every tabulated significance level. Therefore, the total number of inpatients data from DRRH is consistent with the Beta distribution.

The table below shows the goodness-of-fit test statistics for the Burr distribution fitted to the total number of inpatients.

Burr Distribution.

Kolmogorov-Smirnov: Sample Size 730; Statistic 0.02907; P-Value 0.55796; Rank 3
α                0.2       0.1       0.05      0.02      0.01
Critical Value   0.03971   0.04527   0.05026   0.05618   0.06029
Reject?          No        No        No        No        No

Anderson-Darling: Sample Size 730; Statistic 0.62993; Rank 1
α                0.2       0.1       0.05      0.02      0.01
Critical Value   1.3749    1.9286    2.5018    3.2892    3.9074
Reject?          No        No        No        No        No

Chi-Squared: Degrees of freedom 9; Statistic 4.8985; P-Value 0.84306; Rank 5
α                0.2       0.1       0.05      0.02      0.01
Critical Value   12.242    14.684    16.919    19.679    21.666
Reject?          No        No        No        No        No

Source: Computed by the researcher using EasyFit 5.5 software.

From the table above, the Anderson-Darling, Kolmogorov-Smirnov and chi-square goodness-of-fit tests indicate that, at the 95% confidence level, the Burr distribution fits the total number of inpatients data. None of the test statistics leads to rejection at the 5% level of significance. The p-values of the Kolmogorov-Smirnov and chi-square tests (0.55796 and 0.84306) are greater than 0.05, and the Kolmogorov-Smirnov, Anderson-Darling and chi-square statistics (0.02907, 0.62993 and 4.8985) are smaller than the corresponding critical values at every tabulated significance level. Therefore, the total number of inpatients data from DRRH is consistent with the Burr distribution.

The table below shows the goodness-of-fit test statistics for the Gamma (3P) distribution fitted to the total number of inpatients.

Gamma (3P) Distribution.

Kolmogorov-Smirnov: Sample Size 730; Statistic 0.02972; P-Value 0.52964; Rank 6
α                0.2       0.1       0.05      0.02      0.01
Critical Value   0.03971   0.04527   0.05026   0.05618   0.06029
Reject?          No        No        No        No        No

Anderson-Darling: Sample Size 730; Statistic 0.66523; Rank 2
α                0.2       0.1       0.05      0.02      0.01
Critical Value   1.3749    1.9286    2.5018    3.2892    3.9074
Reject?          No        No        No        No        No

Chi-Squared: Degrees of freedom 9; Statistic 1.4551; P-Value 0.99747; Rank 4
α                0.2       0.1       0.05      0.02      0.01
Critical Value   12.242    14.684    16.919    19.679    21.666
Reject?          No        No        No        No        No

Source: Computed by the researcher using EasyFit 5.5 software.

From the table above, the Anderson-Darling, Kolmogorov-Smirnov and chi-square goodness-of-fit tests indicate that, at the 95% confidence level, the Gamma (3P) distribution fits the total number of inpatients data. None of the test statistics leads to rejection at the 5% level of significance. The p-values of the Kolmogorov-Smirnov and chi-square tests (0.52964 and 0.99747) are greater than 0.05, and the Kolmogorov-Smirnov, Anderson-Darling and chi-square statistics (0.02972, 0.66523 and 1.4551) are smaller than the corresponding critical values at every tabulated significance level. Therefore, the total number of inpatients data from DRRH is consistent with the Gamma (3P) distribution.

The table below shows the goodness-of-fit test statistics for the Generalized Gamma (4P) distribution fitted to the total number of inpatients.

Gen. Gamma (4P) Distribution.

Kolmogorov-Smirnov: Sample Size 730; Statistic 0.02973; P-Value 0.52883; Rank 7
α                0.2       0.1       0.05      0.02      0.01
Critical Value   0.03971   0.04527   0.05026   0.05618   0.06029
Reject?          No        No        No        No        No

Anderson-Darling: Sample Size 730; Statistic 0.66606; Rank 3
α                0.2       0.1       0.05      0.02      0.01
Critical Value   1.3749    1.9286    2.5018    3.2892    3.9074
Reject?          No        No        No        No        No

Chi-Squared: Degrees of freedom 9; Statistic 1.448; P-Value 0.99751; Rank 3
α                0.2       0.1       0.05      0.02      0.01
Critical Value   12.242    14.684    16.919    19.679    21.666
Reject?          No        No        No        No        No

Source: Computed by the researcher using EasyFit 5.5 software.

From the table above, the Anderson-Darling, Kolmogorov-Smirnov and chi-square goodness-of-fit tests indicate that, at the 95% confidence level, the Generalized Gamma (4P) distribution fits the total number of inpatients data. None of the test statistics leads to rejection at the 5% level of significance. The p-values of the Kolmogorov-Smirnov and chi-square tests (0.52883 and 0.99751) are greater than 0.05, and the Kolmogorov-Smirnov, Anderson-Darling and chi-square statistics (0.02973, 0.66606 and 1.448) are smaller than the corresponding critical values at every tabulated significance level. Therefore, the total number of inpatients data from DRRH is consistent with the Generalized Gamma (4P) distribution.

All five statistical distributions considered therefore show clear signs of fitting the two years of daily hospital inpatient data well. None was rejected by any of the calculated test statistics at the 5% level of significance, which made it difficult to determine the best-fit distribution. At this stage, AIC, BIC and the log-likelihood were therefore calculated to identify the distribution that best fits the two years of daily hospital inpatient data.
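The Kolmogorov-Smirnov critical values in the inpatient tables (n = 730) agree closely with the large-sample approximation c(α)/√n, where c(α) = √(-ln(α/2)/2). The sketch below computes those approximate critical values and applies the rejection rule to the four KS statistics reported in this appendix. The approximation is an assumption suited to large samples. For the outpatient tables (n = 13) it gives about 0.377 at α = 0.05, noticeably larger than the tabulated 0.36143, so the tabulated small-sample values should be used there.

```python
import math


def ks_critical_value(alpha: float, n: int) -> float:
    """Asymptotic two-sided KS critical value: sqrt(-ln(alpha / 2) / 2) / sqrt(n)."""
    return math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(n)


# KS statistics reported in Appendix 1 for the inpatient data (n = 730).
ks_stats = {"Beta": 0.0291, "Burr": 0.02907, "Gamma (3P)": 0.02972,
            "Gen. Gamma (4P)": 0.02973}
N_INPATIENT_DAYS = 730

for alpha in (0.2, 0.1, 0.05, 0.02, 0.01):
    print(f"alpha = {alpha:<4}: critical value = {ks_critical_value(alpha, N_INPATIENT_DAYS):.5f}")

crit_5pct = ks_critical_value(0.05, N_INPATIENT_DAYS)
for name, d in ks_stats.items():
    decision = "reject" if d > crit_5pct else "do not reject"
    print(f"{name:>16}: D = {d:.5f} -> {decision} at 5%")
```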

Appendix 2: Details of fitting other distributions for the outpatient data.

The table below shows the goodness-of-fit test statistics for the Cauchy distribution fitted to the total number of outpatients.

Cauchy Distribution

Kolmogorov-Smirnov: Sample Size 13; Statistic 0.13233; P-Value 0.95423; Rank 23
α                0.2       0.1       0.05      0.02      0.01
Critical Value   0.2847    0.32549   0.36143   0.40362   0.43247
Reject?          No        No        No        No        No

Anderson-Darling: Sample Size 13; Statistic 0.94565; Rank 3
α                0.2       0.1       0.05      0.02      0.01
Critical Value   1.3749    1.9286    2.5018    3.2892    3.9074
Reject?          No        No        No        No        No

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Cauchy distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.95423 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.13233 and 0.94565) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Cauchy distribution provides an acceptable fit to the DRRH outpatient data.
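The decision rule applied in these tables can be illustrated with a short Python sketch. It computes the one-sample Kolmogorov-Smirnov statistic D = max(D+, D-) against a fully specified Cauchy CDF and compares a statistic with the n = 13 critical values copied from the tables above. The location and scale parameters, and any sample passed in, are placeholders, since the fitted Cauchy parameters are not reported in this appendix. Note that the tabulated critical values assume the distribution is fully specified in advance; using them when the parameters were estimated from the same 13 observations makes the test conservative.

```python
import math

# Kolmogorov-Smirnov critical values for n = 13, copied from the tables above.
KS_CRITICAL_N13 = {0.2: 0.2847, 0.1: 0.32549, 0.05: 0.36143, 0.02: 0.40362, 0.01: 0.43247}


def cauchy_cdf(x, location, scale):
    """CDF of the Cauchy distribution with the given location and scale (> 0)."""
    return 0.5 + math.atan((x - location) / scale) / math.pi


def ks_statistic(sample, cdf):
    """One-sample KS statistic D_n = sup |F_n(x) - F(x)| for a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def ks_decisions(statistic, critical=KS_CRITICAL_N13):
    """Return {alpha: 'Yes' or 'No'}: reject H0 at level alpha when D exceeds the critical value."""
    return {alpha: ("Yes" if statistic > c else "No") for alpha, c in critical.items()}


# Reproduce the "Reject?" row of the Cauchy table from its reported KS statistic.
print(ks_decisions(0.13233))
```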

The table below shows the test statistics for the Johnson SB distribution fitted to the total number of outpatients.

Johnson SB Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.12688, P-value 0.96775, rank 18

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.88011, rank 2

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Johnson SB distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.96775 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.12688 and 0.88011) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Johnson SB distribution provides an acceptable fit to the DRRH outpatient data.
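For reference, a minimal Python sketch of the Anderson-Darling statistic A^2 = -n - (1/n) * sum over i of (2i - 1)[ln F(x(i)) + ln(1 - F(x(n+1-i)))], applied here to a Johnson SB CDF in the standard parameterisation F(x) = Phi(gamma + delta * ln(z / (1 - z))) with z = (x - xi)/lambda. The parameter values are assumptions for illustration (the fitted values are not given in this appendix), and EasyFit's own parameter names may differ. The critical values are those printed in the Anderson-Darling tables above.

```python
import math

# Anderson-Darling critical values copied from the tables above (n = 13).
AD_CRITICAL_N13 = {0.2: 1.3749, 0.1: 1.9286, 0.05: 2.5018, 0.02: 3.2892, 0.01: 3.9074}


def normal_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def johnson_sb_cdf(x, gamma, delta, lam, xi):
    """Johnson SB CDF with shapes gamma and delta > 0, scale lam > 0, location xi.
    The support is xi < x < xi + lam."""
    if x <= xi:
        return 0.0
    if x >= xi + lam:
        return 1.0
    z = (x - xi) / lam
    return normal_cdf(gamma + delta * math.log(z / (1.0 - z)))


def anderson_darling(sample, cdf):
    """Anderson-Darling statistic A^2; every F(x) must lie strictly inside (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


# The reported Johnson SB statistic lies below every tabulated critical value.
print(all(0.88011 < c for c in AD_CRITICAL_N13.values()))
```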

The table below shows the test statistics for the Log-Logistic (3P) distribution fitted to the total number of outpatients.

Log-Logistic (3P) Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.11328, P-value 0.98935, rank 9

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.95862, rank 5

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Log-Logistic (3P) distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.98935 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.11328 and 0.95862) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Log-Logistic (3P) distribution provides an acceptable fit to the DRRH outpatient data.
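The three-parameter log-logistic model has a closed-form CDF and an explicit quantile function, which is convenient for reading off percentiles of the daily patient load. The sketch below assumes the common parameterisation with shape alpha, scale beta and location gamma, F(x) = 1 / (1 + (beta / (x - gamma))^alpha) for x > gamma; EasyFit's conventions may differ, and no fitted values from this study are used.

```python
def loglogistic3_cdf(x, alpha, beta, gamma):
    """Log-logistic (3P) CDF: shape alpha > 0, scale beta > 0, location gamma."""
    if x <= gamma:
        return 0.0
    return 1.0 / (1.0 + (beta / (x - gamma)) ** alpha)


def loglogistic3_quantile(p, alpha, beta, gamma):
    """Inverse of loglogistic3_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return gamma + beta * (p / (1.0 - p)) ** (1.0 / alpha)
```

With the placeholder values alpha = 3, beta = 100 and gamma = 50, `loglogistic3_quantile(0.5, 3, 100, 50)` gives the median gamma + beta = 150, and `loglogistic3_quantile(0.9, ...)` would give the level exceeded on roughly one day in ten under that assumed model.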

The table below shows the test statistics for the Nakagami distribution fitted to the total number of outpatients.

Nakagami Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.10346, P-value 0.99639, rank 3

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.98675, rank 9

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Nakagami distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.99639 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.10346 and 0.98675) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Nakagami distribution provides an acceptable fit to the DRRH outpatient data.
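The Nakagami CDF has no elementary closed form: it is F(x) = P(m, m x^2 / Omega), where P is the regularised lower incomplete gamma function, m >= 0.5 is the shape and Omega = E[X^2] is the spread. A minimal pure-Python sketch, using the power series for P and the standard moment estimators Omega = mean(x^2) and m = Omega^2 / Var(x^2), might look as follows. The moment estimators are an assumption for illustration; EasyFit may estimate the parameters differently (for example by maximum likelihood).

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_iter=100_000):
    """Regularised lower incomplete gamma P(a, x) from its power series.
    Intended for moderate x, where the partial sums stay finite."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_iter):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def nakagami_cdf(x, m, omega):
    """Nakagami CDF with shape m >= 0.5 and spread omega > 0."""
    if x <= 0.0:
        return 0.0
    return reg_lower_gamma(m, m * x * x / omega)


def nakagami_moment_fit(sample):
    """Moment estimators (m, omega): omega = mean(x^2), m = omega^2 / var(x^2)."""
    sq = [x * x for x in sample]
    n = len(sq)
    omega = sum(sq) / n
    var_sq = sum((s - omega) ** 2 for s in sq) / n
    if var_sq == 0.0:
        raise ValueError("sample has no spread; m is undefined")
    return omega ** 2 / var_sq, omega
```

As a check, with m = 1 the Nakagami distribution reduces to a Rayleigh distribution with Omega = 2 sigma^2, so `nakagami_cdf(x, 1.0, omega)` should equal 1 - exp(-x^2 / omega).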

The table below shows the test statistics for the Rayleigh distribution fitted to the total number of outpatients.

Rayleigh Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.12563, P-value 0.97044, rank 16

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.71345, rank 1

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Rayleigh distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.97044 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.12563 and 0.71345) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Rayleigh distribution provides an acceptable fit to the DRRH outpatient data.
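The one-parameter Rayleigh model is simple enough to fit by hand: with F(x) = 1 - exp(-x^2 / (2 sigma^2)), the maximum-likelihood estimate of the scale is sigma = sqrt(sum(x_i^2) / (2n)). The sketch below assumes this standard scale-only form for EasyFit's "Rayleigh" and also returns the maximised log-likelihood, which is the quantity an AIC/BIC comparison needs. It is not necessarily how EasyFit computes its estimates.

```python
import math


def rayleigh_fit(sample):
    """Maximum-likelihood fit of a Rayleigh(sigma) model to positive data.

    Returns (sigma_hat, maximised log-likelihood)."""
    n = len(sample)
    sum_sq = sum(x * x for x in sample)
    sigma2 = sum_sq / (2 * n)  # MLE of sigma^2
    loglik = (
        sum(math.log(x) for x in sample)
        - n * math.log(sigma2)
        - sum_sq / (2 * sigma2)
    )
    return math.sqrt(sigma2), loglik


def rayleigh_cdf(x, sigma):
    """Rayleigh CDF with scale sigma > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-x * x / (2 * sigma * sigma))
```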

The table below shows the test statistics for the Rayleigh (2P) distribution fitted to the total number of outpatients.

Rayleigh (2P) Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.10538, P-value 0.99544, rank 5

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 1.0907, rank 21

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Rayleigh (2P) distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.99544 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.10538 and 1.0907) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Rayleigh (2P) distribution provides an acceptable fit to the DRRH outpatient data.
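The "(2P)" variant adds a location (threshold) parameter gamma, so F(x) = 1 - exp(-((x - gamma)/sigma)^2 / 2) for x > gamma, and it reduces to the one-parameter Rayleigh when gamma = 0. For a fixed gamma, the scale estimate is the one-parameter formula applied to the shifted data x_i - gamma. The following is a minimal sketch under that assumed parameterisation; gamma itself would still have to be chosen or estimated, for example by maximising the resulting log-likelihood over gamma < min(x_i).

```python
import math


def rayleigh2_cdf(x, sigma, gamma):
    """Two-parameter Rayleigh CDF: scale sigma > 0, location gamma."""
    if x <= gamma:
        return 0.0
    return 1.0 - math.exp(-0.5 * ((x - gamma) / sigma) ** 2)


def rayleigh2_sigma_given_gamma(sample, gamma):
    """MLE of sigma when the location gamma is held fixed (requires gamma < min(sample))."""
    shifted = [x - gamma for x in sample]
    if min(shifted) <= 0.0:
        raise ValueError("gamma must be smaller than every observation")
    return math.sqrt(sum(s * s for s in shifted) / (2 * len(shifted)))
```

Because the 2P model has one more parameter than the 1P model, its log-likelihood must improve enough to offset the extra penalty of 2 in AIC and ln n in BIC before it is preferred.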

The table below shows the test statistics for the Weibull distribution fitted to the total number of outpatients.

Weibull Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.10236, P-value 0.99686, rank 2

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.95382, rank 4

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Weibull distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.99686 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.10236 and 0.95382) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Weibull distribution provides an acceptable fit to the DRRH outpatient data.

The table below shows the test statistics for the Burr distribution fitted to the total number of outpatients.

Burr Distribution

Kolmogorov-Smirnov (sample size 13): statistic 0.10361, P-value 0.99632, rank 4

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 0.2847 | 0.32549 | 0.36143 | 0.40362 | 0.43247 |
| Reject? | No | No | No | No | No |

Anderson-Darling (sample size 13): statistic 0.9597, rank 6

| α | 0.2 | 0.1 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|
| Critical value | 1.3749 | 1.9286 | 2.5018 | 3.2892 | 3.9074 |
| Reject? | No | No | No | No | No |

Source: Computed by the researcher using EasyFit 5.5 software.

The Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests in the table above do not reject the Burr distribution for the total number of outpatients at the 5% significance level. The Kolmogorov-Smirnov P-value of 0.99632 is greater than 0.05, and the Kolmogorov-Smirnov and Anderson-Darling statistics (0.10361 and 0.9597) are smaller than their critical values at every tabulated level, including 5%, 2% and 1%. Therefore, the Burr distribution provides an acceptable fit to the DRRH outpatient data.
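The Burr distribution (Burr Type XII) is usually written with two shape parameters k and alpha and a scale beta: F(x) = 1 - (1 + (x/beta)^alpha)^(-k) for x > 0. A sketch of its CDF and closed-form quantile function follows, under that assumed parameterisation. With k = 1 it reduces to the log-logistic distribution with zero location, which gives a quick consistency check.

```python
def burr_cdf(x, k, alpha, beta):
    """Burr (Type XII) CDF: shapes k, alpha > 0 and scale beta > 0."""
    if x <= 0.0:
        return 0.0
    return 1.0 - (1.0 + (x / beta) ** alpha) ** (-k)


def burr_quantile(p, k, alpha, beta):
    """Inverse of burr_cdf for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return beta * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / alpha)
```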

All nine statistical distributions named above therefore show clear signs of fitting the two years of daily outpatient data well: none of them was rejected by any of the test statistics at the 5% significance level, as the tables above show, which made it very difficult to determine the best-fitting distribution. Therefore, at this stage, further computations were carried out using AIC, BIC and the log-likelihood to determine which distribution best fits the total number of hospital outpatients over the two years of daily data.

Appendix 3: Time series plots before and after differencing

Figure 4.21: Time series plots before differencing

Figure 4.22: Time series plots after differencing


Autocorrelation Check of Residuals

| To lag | Chi-square | DF | Pr > ChiSq | Autocorrelations (six lags ending at "To lag") |
|---|---|---|---|---|
| 6 | 1.92 | 5 | 0.861 | 0.006, 0.033, 0.005, -0.075, 0.085, -0.058 |
| 12 | 6.18 | 11 | 0.861 | 0.048, 0.114, 0.021, 0.083, 0.029, -0.115 |
| 18 | 7.39 | 17 | 0.978 | -0.027, 0.006, -0.079, -0.008, 0.015, 0.050 |
| 24 | 10.99 | 23 | 0.983 | 0.010, -0.132, -0.006, -0.013, -0.096, 0.022 |
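The chi-square column in the Autocorrelation Check of Residuals table is a portmanteau statistic on the residual autocorrelations. SAS documents this check as using the Ljung-Box formula Q = n(n + 2) * sum over k of r_k^2 / (n - k), referred to a chi-square distribution with h - (p + q) degrees of freedom at lag h; DF = 5 at lag 6 is therefore consistent with one estimated ARMA parameter. The sketch below computes residual autocorrelations, Q, and the upper-tail chi-square probability (integer degrees of freedom only). The residual series and its length n are inputs; the residuals behind this table are not reproduced here.

```python
import math


def autocorrelations(residuals, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-corrected, common denominator)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(1, max_lag + 1)]


def ljung_box(r, n):
    """Ljung-Box Q from autocorrelations r_1..r_h of a series of length n."""
    return n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for a positive integer df, built up
    from df = 1 or 2 via Q(v + 2) = Q(v) + (x/2)^(v/2) exp(-x/2) / Gamma(v/2 + 1)."""
    if x <= 0.0:
        return 1.0
    if df % 2 == 0:
        q, v = math.exp(-x / 2.0), 2
    else:
        q, v = math.erfc(math.sqrt(x / 2.0)), 1
    while v < df:
        q += math.exp((v / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(v / 2.0 + 1.0))
        v += 2
    return q


# Recompute Pr > ChiSq from the printed (rounded) Q and DF values.
for q_stat, dof in [(1.92, 5), (6.18, 11)]:
    print(dof, round(chi2_sf(q_stat, dof), 3))
```

The recomputed probabilities agree with the Pr > ChiSq column to within the rounding of the printed Q values.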

Appendix 4: SAS Code

proc print data=hdata; run;

/* Stationarity (ADF) checks on the series before and after first differencing */
proc arima data=hdata; identify var=av_totopd stationarity=(adf=0); run;
proc arima data=hdata; identify var=av_totopd(1) stationarity=(adf=0); run;

/* Candidate ARIMA(p,1,q) models for the differenced series, each with a 5-step forecast */
proc arima data=hdata; identify var=av_totopd(1) noprint; estimate p=1; forecast lead=5; run;
proc arima data=hdata; identify var=av_totopd(1) noprint; estimate p=1 q=1; forecast lead=5; run;
proc arima data=hdata; identify var=av_totopd(1) noprint; estimate q=1; forecast lead=5; run;
proc arima data=hdata; identify var=av_totopd(1) noprint; estimate p=2; forecast lead=5; run;
proc arima data=hdata; identify var=av_totopd(1) noprint; estimate q=2; forecast lead=5; run;

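proc print data=hdata; run;

/* Fit Weibull and Burr distributions to the outpatient variable; select by AIC */
proc hpseverity data=hdata crit=aic; loss TOTOPDs; dist weibull burr; run;

/* Intercept-only LIFEREG model for the inpatient totals (DIST=GAMMA is LIFEREG's generalized gamma) */
proc lifereg data=hdata; model TOTIPD = / dist=gamma; run;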
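To make the `estimate p=1; forecast lead=5;` step concrete, here is a minimal Python sketch of the point-forecast recursion for an ARIMA(1,1,0) model. The first differences w_t = y_t - y_(t-1) follow w_t - mu = phi(w_(t-1) - mu) + e_t, so each forecast difference is mu + phi(previous difference - mu), and the level forecasts are cumulative sums added to the last observation. The values of phi and mu are placeholders: in the appendix they would come from the PROC ARIMA ESTIMATE output, which includes a mean term (MU) unless NOCONSTANT is specified. Forecast intervals are not computed here.

```python
def arima110_forecast(history, phi, mu=0.0, lead=5):
    """Point forecasts `lead` steps ahead for an ARIMA(1,1,0) model whose
    first differences have mean mu and AR(1) coefficient phi."""
    if len(history) < 2:
        raise ValueError("need at least two observations")
    level = history[-1]
    diff = history[-1] - history[-2]
    forecasts = []
    for _ in range(lead):
        diff = mu + phi * (diff - mu)  # forecast of the next difference
        level += diff  # undo the differencing
        forecasts.append(level)
    return forecasts
```

With phi = 0.5, mu = 0 and a last change of +2, the forecast differences halve at each step, so the level forecasts approach a ceiling rather than growing without bound.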

Appendix 5: Introduction Letter

