Modeling HIVAIDS Variables, A Case Of Contingency Analysis

O. C. Asogwa.

Department of Mathematics/ Statistics/ Computer Science & Informatics, Federal University Ndufu Alike-Ikwo, Ebonyi State. [email protected]

Abstract

Hypothesis testing may be performed on contingency tables in order to decide whether or not effects are present. Effects in a contingency table are defined as relationships or interactions between variables of interest. However, considering the number of adolescence patients diagnosed of HIV/AIDS within a short frame of time in BSH, Nsukka, the researcher was moved to carry out a study to actually confirm whether patients’ diagnosed of the aliment have any association based on their gender, years and age of diagnoses. The observation from the hypotheses carried out on this study using statistic, was that none of the variables of interest considered were dependent of each other. Gender and age of patients diagnosed of HIV/AIDS in the Bishop Shanahan Missionary Hospital, Nsukka are independent of the years of affection. Also the conditional independence of the pair-wise variables of interest existed. Age is conditionally independent of both gender of the affected patient and the year of the affection. Moreover, gender is conditionally unrelated to both the year of affection and the age of the affected patients and lastly, year of affection has no conditional interaction with both the gender and age of the affected patients.

Keywords: Modeling, Contingency, Independency, Chi-square, HIV/AIDS ______

1.0 Introduction In statistics, a contingency table (also common odds ration suggested by R. A. referred to as cross tabulation or crosstab) is a Fisher. [20] and [27] supplied interpretations type of table in a matrix format that displays of varied interactions which led to the well- the (multivariate) frequency distribution of the known Simpson’s paradox [3]. [26] showed variables. They are heavily used in survey that Bartlett’s procedure is an implicit research, business intelligence, engineering maximum likelihood estimation (MLE), and scientific research. They provide a basic conditioned upon the fixed margins of each picture of the interrelation between two 2×2 table. The celebrated analysis of variance variables and can help find interactions or (ANOVA, [8]) inspired discussions of associations between them. The term partitioning chi-squares within the contingency table was first used by [1] in his contingency tables, notably by [16], [19], and research “On the Theory of Contingency and [4], among others. In related research in Its Relation to Association and Normal biostatistics, [5], [32] and [18] developed chi- Correlation. square tests for no association between two Moreover, according to [23], the analysis variables across levels of the third variable. of contingency tables with multi-way These early studies led to further analyses of classifications originates from the historical three-way tables, which include estimating the development of statistical inference with 2 × 2 common odds ratio, testing zero interaction tables. In the initial extension to the case of 2 and testing no association across strata, for × 2 × K tables, [1] discussed testing for three- examples, [15], [10], [6], [17], [24], [2a],[2b], way interaction and derived an estimate of the and [11].

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 71

The classical method of partitioning the chi- adolescence patients of the case study has a squares for the three-way (I×J×K) direct or conditional interactions among the Contingency table does not provide a years of affection, gender of the patients and convenient test of the null hypothesis that the the ages of the patients. three-way interaction is zero [16]. The null distribution of Lancaster’s test statistic need 2.0 Material And Methods not be asymptotically Chi-square distributed Contingency Analysis otherwise known as [24]. A similar remark can also be applied to Categorical Data Analysis constituted the the methods of partitioning Chi-squares by framework of this research and the statistical [15] and [4]. Since partitions of chi-squares tools adopted by [22] – Chi-squared test are closely related to the likelihood ratio tests aided in the analysis of this study. The table [31] maximum likelihood estimation of comprised of the adolescence aged patients association and interaction significantly who were diagnosed positive to Human influenced the early studies of multi-way Immune Virus/Acquired Immune Deficiency contingency tables [26]. In particular, it led to Syndrome (HIV/AIDS), from 2011 to 2014. the development of a likelihood ratio test for Moreover, the data which were collected from zero interaction in a multi-way contingency the laboratory section of Bishop Shanahan table [6]. [30] used chi square test to Missionary Hospital Nsukka, were partitioned determine if there were pre-post difference into two main adolescence age categories (12 - among a group of students in an ethnic 14 & 15 – 18), and each of the years literature classroom. Also in a study by [28] a considered in this study, data collection were Chi-square test was used to identify the from January to December. Also data were difference between categories of data as collected on both genders. The data were applied on public administration research. For compiled in a categorical table otherwise further literatures and brief history see ([29], contingency table in other to capturing the [25], [14], [9]). study objectives Hoverer, the goal of this study is to verify whether the diagnosed HIV/AIDS positive .

2. 1 Layout of the Study Design (Multi-way (2 2 4) Contingency Table)

Table 1: Layout of the study design

Age (Y) y1 (12-14) y2 (15-18) Year (Z)

Gende 2011(z1) 2012(z2) n+j+ 2011(z1) 2012(z2) n+j+ ni++ r(X) 2013(z3) 2014(z4) 2013(z3) 2014(z4) Male(x n111 n112 n113 n11+ n121 n122 n123 n12+ n1++ 1) n114 n124

Femal n211 n212 n213 n21+ n221 n222 n223 n22+ n2++ e(x2) n214 n224

n++k n+11 n+12 n+13 n+1+ n+21 n+22 n+23 n+2+ n = n+14 n+24

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 72

From the table 1 above, n = = ;

.

2. 2 The Chi-Square Distribution Chi-square distribution which was first distribution is very important because many discovered by the German Statistician test statistics are approximately distributed as in papers of 1875 - 6 Chi-square. Two of the more common tests ([12], [13]) is a theoretical or mathematical using the Chi-square distribution are tests of distribution which has wide applicability in deviations of differences between theoretically statistical work. The term ‘Chi-square' expected and observation frequencies (multi- (pronounced with a hard `Ch') is used because way tables) and the relationship between the Greek letter is used to define this categorical variable (contingency tables). distribution. It was observed that the elements Moreover, the distribution was on which this distribution is based are independently rediscovered by the English squared, so that the symbol is used to Mathematician in the context of denote the distribution. goodness of fit, for which he developed his However, in probability theory, as well as Pearson’s Chi-square test, published in 1900, in Statistics, the Chi-squared distribution with with computed table of values published in degree of freedom k, denoted by is the [7]. If are independent, distributed of sum of squares of k independent standard normal random variables, then the standard normal random variables. The Chi- sum of their squares is distributed according to square distribution is used in the common chi the Chi-squared distribution with k degrees of square tests for goodness of fit of observed freedom. This is usually represented as, distribution to a theoretical one, the bearing in mind that the independence of two criteria of classification chi-squared distribution has one parameter; k: of quantitative data, and in confidence interval a positive integer that specifies the number of estimation for a population standard deviation degrees of freedom of a from a sample . standard deviation. Some other statistical tests also use this distribution, like Friedman’s analysis of variance by rank. Also Chi-square . 2.3 Derivation of the Distribution In mathematical terms, the variable is this distribution. Then suppose another value the sum of the squares of a set of normally is selected from the same standardized distributed variables. Given a standardized normal distribution. If there are t degrees of normal distribution for a variable Q with mean freedom, then let this process continue until t 0 and standard deviation, and suppose that a different Q values are selected from this particular value is randomly selected from distribution. The variable is defined as the sum of the squares of these Q values. That is

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 73

Suppose this process of independent would also be 0. There is no upper limit to selection of t different values is repeated many the value. If all the Q values were quite times. The variable will vary, because each large, then would also be large. But note that random selection from the normal distribution this is not too likely to happen. Since large Q will be different. Note that this variable is a values are distant from the mean of a normal continuous variable since each of the Q values distribution, these large values have is continuous. This sum of squares of t relatively low probabilities of occurrence, also normally distributed variables has a implying that the probability of obtaining a distribution which is called the distribution large value is low. Also note that as there with t degrees of freedom. It can be shown are more degrees of freedom, there are more mathematically that the mean of this squared Q values, and this means larger distribution is t and the standard deviation is values. Further, since most values under the 2t. An intuitive idea of the general shape of normal curve are quite close to the center of the distribution can also be obtained by the distribution, within 1 or 2 standard considering this sum of squares. Since is the deviations of center, the values of also tend sum of a set of squared values, it can never be to be concentrated around the mean of the negative. The minimum Chi-squared value distribution would be obtained if each Q = 0 so that

2.4 Chi-Square Test for Independence Like the goodness of fit test, the Chi-square The Chi-square test for independence of test of independence is very general, and can two variables is a test which uses a cross be used with variables measured on any type classification table to examine the nature of of scale, nominal, ordinal, interval or ratio. the relationship between these variables. The only limitation on the use of this test is These tables are sometimes referred to as that the sample sizes must be sufficiently large contingency tables. These tables show the to ensure that the expected number of cases in manner in which two variables are either each category is positive or more. This rule related or are not related to each other. The can be modified somewhat, but as with all test for independence examines whether the approximations, larger sample sizes are observed pattern between the variables in the preferable to smaller sample sizes. There are table is strong enough to show that the two no other limitations on the use of the test, and variables are dependent on each other or not. the Chi-square statistic can be used to test any While the Chi-square statistic and distribution contingency or cross classification table for are used in this test, the test is quite distinct independence of the two variables. The Chi- from the test of goodness of fit. The goodness square test for independence is conducted by of fit test examines only one variable, while assuming that there is no relationship between the test of independence is concerned with the the two variables being examined. The relationship between two variables. alternative hypothesis is that there is some relationship between the variables

2.5 The Chi-Square Statistic The Statistic appears quite different each category into which the data have been from the other Statistics which have been used grouped. in the previous hypotheses tests. For both the In the Chi-square tests, the null hypothesis goodness of fit test and the test of makes a statement concerning how many independence, the Chi- square statistic is the cases are to be expected in each category if same. For both of these tests, all the categories this hypothesis is correct. The chi square test into which the data have been divided are is based on the difference between the used. The data obtained from the sample are observed and the expected values for each referred to as the observed numbers of cases. category. These are the frequencies of occurrence for The Chi-square Statistic is defined as

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 74

This difference is squared and divided by the expected number of cases in that category. These values are then added for all the where is the observed number of cases in categories, and the total is referred to as the Chi-squared value. category and is the expected number of The null hypothesis is a particular claim cases in category . This Chi-square Statistic concerning how the data is distributed. The is obtained by calculating the difference null and alternative hypotheses for each Chi- between the observed number of cases and the square test can be stated as expected number of cases in each category.

However, if the claim made in the null test though, the degrees of freedom is based hypothesis is true, the observed and the on the number of categories which are used in expected values are close to each other and the calculation of the statistic. ,is small for each category. When the The Chi-square Statistic, along with the observed data does not conform to what has Chi-square distribution, allows the researcher been expected on the basis of the null to determine whether the data is distributed as hypothesis, the difference between the claimed. If the Chi-square statistic is large observed and expected values, , is enough to reject , then the sample provides large. The Chi-square Statistic is thus small evidence that the distribution is not as claimed when the null hypothesis is true and large in . If the chi square statistic is not so large, when the null hypothesis is not true. Exactly then the researcher may have insufficient how large the value must be in order to be evidence to reject the claim made in the null considered large enough to reject the null hypothesis. It is equally used to investigate hypothesis can be determined from the level whether distributions of categorical variables of significance and the Chi-square table in any differ from one another. The chi-square statistical textbook. A general formula for statistic compares the tallies or counts of determining the degrees of freedom is not categorical responses between two or more given at this stage, because this differs for the independent groups two types of chi square tests. In each type of . 2.0 Layout Presentation of Data

Table 2: Table of the observed and the expected values of the HIV positive persons as categorized Age(Y) y1(12 – 14) y2(15 – 18) Year(Z) Gender(X) (z1) (z2) (z3) (z4) n+j+ (z1) (z2) (z3) (z4) n+j+ ni++

Male(x1) 20(16) 58(61) 63(68) 38 (42) 179 48 (38) 144 (149) 172 (165) 99 (103) 463 642

Female(x2) 33(30) 115(120) 127(133) 100(83) 375 57(74) 306(292) 328(324) 193(202) 884 1,259

n++k 53 173 190 138 554 105 450 500 292 1,347 1,901

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 75

X = Gender : Male (x1); Female (x2) Y = Age : 12 – 14 (y ); 15 – 18(y ) 1 2 Z = Year : 2011 (z ); 2012 (z ); 2013 (z ); 1 2 3 2014 (z ) 4 n = n = +++ = 1,901

3.1. Hypotheses to be tested Test of interactions among gender age and year Test of conditional independence of gender on both age and year To access the conditional interaction of age on both gender and year To test for the conditional independence of year given both gender and age

3.1.1. Hypothesis : There is no interaction among gender , age and year : There is interaction among gender , age and year at level of significance Theoretically, from above, the hypothesis implies; : :

,

Test Statistic:

= 14.4357

Decision Rule: Reject , accept if otherwise. Conclusion: Since , was accepted at

3.1.2. Hypothesis : Gender is conditionally independent of both age and year : Gender is conditionally dependent of both age and year

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 76 at level of significance

Theoretically, from above, the hypothesis implies; : :

,

Test Statistic:

= 14.4357

Decision Rule: Reject , accept if otherwise. Conclusion: Since , was accepted at

3.1.3. Hypothesis : Age is conditionally not interacted with both gender and year : Age is conditionally interacted with both gender and year at level of significance Theoretically, from above, the hypothesis implies; : :

,

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 77

Test Statistic:

= 14.4357

Decision Rule: Reject , accept if otherwise. Conclusion: Since , was accepted at

3.1.4. Hypothesis : Year is conditionally independent of both gender and age : Year is conditionally independent of both gender and age at level of significance

Theoretically, from above, the hypothesis implies; : :

,

Test Statistic:

= 14.4357

Decision Rule: Reject , accept if otherwise. Conclusion: Since , was accepted at

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 78

4.0. Abridged Table of Results Table 3: Summarize table showing the tested hypotheses and evaluated where stands for Gender, Age for and stands for Year

Test of interaction among X, Y and Z Test of conditional independence of X given Y and Z Test of conditional independence of Y given X and Z Test of conditional independence of Z given X and Y . 5.0. Conclusion From table 3, it could be verified that the this study. Therefore from the final remark, study hypotheses were accomplished and none the test of dependence effect of the variables of the hypotheses tested yielded an interaction considered in this study were not met, the effects among each of the interaction or pair-wise interactions also failed. So the association effects considered. There were no gender, year and age of the affected interaction among years of affection, gender adolescence patients of the case study were of the affected and the age of the patients of independent of one another and their pair-wise adolescence persons diagnosed of the aliment conditional effect were equally independent of of the case study. Also the pair-wise each other. conditional interactive effects were not met in

ACKNOWLEDGEMENT The author is grateful to the Bishop Shanahan Missionary Hospital (BSH) Nsukka, Enugu State, Nigeria, for making the data for this research available. ______

References

[1] Bartlett, M. S. (1935). Contingency table interactions. Supplement to the journal of the Royal Statistical 2 (2), 248–252. [2] Birch, M. W. (1963). Maximum likelihood in three-way contingency tables. Journal of the Royal Statistical Society 25, 220–233. [3] Blyth, C. R. (1972). On Simpson’s paradox and the sure thing principle. Journal of American Statistics Association. 67, 364-366. [4] Claringbold, P. J. (1961). The use of orthogonal polynomials in the partition of chi-square. Australia Journal of Statistics. 3, 48-63. [5] Cochran, W. G. (1954). Some methods for strengthening the common tests. Biometrics 10, 417–451. [6] Darroch, J. N (1962). Interaction in multi-factor contingency tables. Journal of the Royal Statistical Society 24, 251–263. [7] Elderton, William Palin (1902). “Tables for testing the goodness of fit of theory to observation” Biometrika 1(2): 155-163. [8] Fisher, R. A. (1922). On the interpretation of from contingency tables and the calculation of p. Journal of the Royal Statistical Society 85(1), 87–94. [9] Galton, F. (1892). Finger Prints. London: Macmillan. [10] Gart, J. J. (1962). On the combination of relative risks. Biometrics 18, 601-610.

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 79

[11] Goodman, L. A. (1964). Simultaneous confidence limits for cross-products ratios in tables. Journal of the Royal Statistical Society 26, 86–102. [12] Hald, A. (1998). A history of Mathematical Statistics from 1750–1930. New York Wiley. ISBN 0– 471–17912–4. [13] Helmert, F. R. (1876). Ueber die Wahrscheinlichk der potenzsummen der Beobachtungsfehler and ubereinige damit im zusammenhange stehende fragen, zeitschrift fur mathamatik und physic 21, 1876, S. 102–219. [14] Heyde, C. C. and Seneta, E. (1977). I. J. Bienayme; Statistical theory Anticipated. New York: Springer–Verley. [15] Kullback, S. (1959). Information Theory and Statistics. Wiley. [16] Lancaster, H. O. (1951). Complex contingency tables treated by the partition of chi-square. Journal of Royal Statistical Society, Series, B 13, 242-249. [17] Lewis, B. N. (1962). On the Analysis of Interaction in Multi-Dimensional contingency. Journal of the Royal Statistical Society Series. [18] Mantel, N. and Hacnszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22, 719–748. [19] Mood, A. M. (1950). Introduction to the theory of Statistics. New York: McGraw Hill. [20] Norton, H. W. (1945). Calculation of chi – square for complex contingency tables. Journal of the American Statistical Association 40(230), 251–258. [21] Pearson, K. (1904). Mathematical contributions to the theory of evolution, XIII: On the theory of contingency and its relation to association and normal correlation. Draper’s Co. Research memoirs in biometry series. I, 1–35. [22] Pearson, K. (1990). On the criterion that a given system of deviation from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling. Philosophical magazine 50, 59-73. [23] Philip E. Cheng, Jiun W. Liou, Michelle Liou and John A. D. Aston. (2006). Data Information in Contingency Tables: A Fallacy of Hierarchical Log linear model. Journal of Data Sciences 4, 387– 398. [24] Placket, R. L. (1963). “Karl Pearson and the Chi-square Test”. International Statistical Review (International Statistical Institute ((ISI)) 51(1): 59–72. JSTOR. [25] Quetelet, M. (1849). Letter Addressed to H.R.H. the Grand Duke of Saxe Coburg and Gotha on the theory of probabilities as Applied to the moral and political sciences (translated from the French by Olinthus Gregory Downs). London: Charles and Edwin Layton. [26] Roy, S. N. and Kastenbaum, M. A. (1956). On the hypothesis of no “interaction in a multi-way contingency tables. Annals of Mathematical Statistics 27(3), 749–757. [27] Simpson, E. H (1951). The interpretation of interaction in Contingency tables. Journal of the Royal Statistical Society. JSTOR. [28] Stallings, R. A and Ferris, J. M. (1988). Public Administration Research: work in PAR; 1940 – 1984, Public Review, Vol. 48 (1), pp. 580–587. [29] Stigler, S. (2002). The missing early history of contingency tables. Annales de la Faculte’ des sciences de Toulouse II (4), 563–573. [30] Trzyna, T. and Abbott, M. (1991). “Grieving in the Ethnic Literature Classroom,” College Literature 18(3), 1–2. [31] Wilks, S. S. (1935). The likelihood test of independence in contingency tables. Annals of Mathematical Statistics 6, 190–196. [32] Woolf, B. (1955). On estimating the relation between blood group and disease. Annals of Human Genetics, Vol. 19(4), 251–253.

West African Journal of Industrial & Academic Research Vol.14 No.1 June 2015 80