
868 Indian Journal of Public Health Research & Development, March 2020, Vol. 11, No. 03

Spatial Cluster as a Approach in Public Health Research

H. Gladius Jennifer1, M. Bagavan Das2

1Assistant Professor, Department of Community Medicine, Karpaga Vinayaga Institute of Medical Sciences, Chinnakolambakkam, Madhuranthakam, Tamilnadu; 2Professor, School of Public Health, SRM University, Kattankulathur, Tamilnadu

Abstract

Introduction: Sampling is the process of selecting units from a population of interest. Spatial cluster analysis can also serve as a sampling strategy in large-scale population and public health research; K centroid is an exploratory tool for finding natural spatial clusters at a focused level for both categorical and continuous variables. Hence this study was attempted.

Objective: The objective of the study was to develop a methodology for defining natural neighborhoods.

Materials and Method: The exploratory study was carried out from Nov 2016 to Dec 2017, using the Primary Abstract of Kancheepuram district, Tamil Nadu, issued from Census 2011. Village data were extracted, the variables were grouped into domains by factor reduction, and their scores were calculated by . Villages with similar characteristics were grouped into clusters by the K mean, Hierarchical and K Mean Centroid methods. SPSS 16v, QGIS and GeoDa software were used.

Results: Of 1020 villages, 917 were selected after and a connectivity map was made. Factor analysis reduced the census variables to factors such as area, population, spatial distance, health facilities and recreation facilities. These factor scores were used in the analysis after the weights matrix was calculated. The villages were segregated into 5 clusters in every mapping; K Mean Centroid produced both a cluster map and a significance map.

Conclusion: K Mean Centroid gives a better understanding of heterogeneity in large-scale data. It helps to select appropriate geographical locations to be sampled from existing data for further research.

Keywords: Factor scores, K mean, Hierarchical Cluster, K Mean Centroid, census.

Corresponding Author:
H. Gladius Jennifer
Assistant Professor Biostatistics, Department of Community Medicine, Karpaga Vinayaga Institute of Medical Sciences, Chinnakolambakkam, Madhuranthakam, Tamilnadu
Contact No.: 7708628846
e-mail: [email protected]

Introduction

Medical research aims to make general statements about a wider set of subjects or variables based on certain observations. Defining an appropriate sample size and choosing appropriate sampling techniques are mandatory in order to obtain precise results. Many sampling techniques are available, such as simple random sampling, systematic sampling, convenient sampling and purposive sampling; these are mainly used when the unit of study is the individual. When the unit of measurement is a group or set of people, two-stage and multi-stage sampling are appropriate. The purpose of cluster analysis is to identify homogeneous subgroups with similar characteristics. Hence community-based studies have used the above techniques.

Spatial research studies mainly focus on physical, environmental and social attributes, which may require a carefully designed strategy for data collection.1,2 The main aims of spatial sampling approaches are to calculate the mean of a given attribute in an area, to test the effects of differences between ecological conditions, and to establish a spatial investigation or describe a spatial distribution.3

In recent years, the concepts of spatial association and prediction of variables have been analyzed.2 Such analysis is used to identify spatial patterns, detect disease clusters, and explain or predict disease risk in geographical data. In other words, spatial clustering identifies homogeneous groups of objects based on the values of their attributes or their geographic space.2,4 Spatial cluster analysis is usually appropriate for studying the infectious spread of disease, the occurrence of disease vectors, clustering of risk factors or their combinations, the existence of potential health hazards, and localized pollution sources.2,4

The purpose of this paper is to introduce spatial cluster analysis as a neighbourhood sampling technique for existing large data sets such as census data. This sampling approach detects neighbourhoods with similar attributes, which helps the investigator or researcher to identify homogeneous groups in large population studies.

The objective of the study was to develop a methodology for defining natural neighbourhoods and to establish a natural neighbourhood sampling approach with social and physical resources using 2011 census tract data. Hence this study was attempted.

Methodology

An exploratory study was conducted in the School of Public Health, SRM University, Kattankulathur, Tamilnadu, over approximately one year from November 2016 to December 2017.

The primary data of Census 2011 for Kancheepuram district, Tamil Nadu, were extracted from the www.Census.gov.in website.5 "Rural table for Chengai Anna district" and "Table for Kancheepuram district" were selected for this study.5,6 The two data sets were merged by village code and name, after checks for duplication and spelling, for further analysis in SPSS 16v.

Variables in census data: The census data had approximately three hundred and eighty variables describing the codes and names of the district and villages; area; total population; worker and non-worker population; schools, colleges, health facilities, communication facilities, recreation facilities and transport, with their availability and the distance (in km) to reach them; whether water, electricity, sanitation, drainage and waste disposal were available in each village; and land areas such as forest, irrigated and unirrigated land, river, pond and lake in sqm.2,5-7

These variables were segregated into domains by factor analysis.7 The factor scores were calculated for continuous as well as categorical variables.7 The factor scores were treated as continuous variables, and the variables that rotated individually were taken with their original values for cluster analysis.7

Geo Coding: The longitude and latitude of each village in Kancheepuram district were collected and merged by village code.7 These codes and the data set were finally merged with the base map of Kancheepuram district after checking for duplicates.7

Statistical Software: SPSS 16v, QGIS, GeoDa

Results

After the factor scores were calculated, the data were translated to spatial data by merging these attributes with the base map of Kancheepuram district, using village code as the indicator variable.7 K Mean Cluster Analysis, Hierarchical Cluster Analysis and K Mean Centroid Cluster Analysis were used to find homogeneous spatial clustering of the given data.8 The ratio of sums of squares was compared in order to choose the natural neighbourhood or spatial cluster by the physical and social amenities of the villages.8

K mean cluster analysis is used to partition n data points into k homogeneous clusters. It is assessed by calculating the ratio of the total between-group sum of squares to the total sum of squares.8 The higher the value of the ratio, the better the separation of the clusters.8

In this study, k mean cluster analysis yielded a ratio of between-group to total sum of squares of 0.18. The villages were grouped into five clusters as follows: 1. Very poor amenities, 376 villages; 2. Average amenities, 250 villages; 3. Very good amenities, 211 villages; 4. Poor amenities, 48 villages; 5. Good amenities, 32 villages.

Hierarchical cluster analysis is a classical type of clustering method; the clusters are built in a step-by-step process, either top-down or bottom-up.8 It computes the distance between two existing clusters in order to decide how to group the closest two together.8 It has three patterns of linkage: single linkage, complete linkage and average linkage.8 A dendrogram gives the pictorial representation of the cluster tree in this method.8 The significance was assessed by Ward's method.8

In this study, the hierarchical cluster analysis yielded a ratio of between-group to total sum of squares of 0.16. The five clusters in hierarchical clustering were as follows: 1. Very poor amenities, 500 villages; 2. Average amenities, 302 villages; 3. Very good amenities, 37 villages; 4. Poor amenities, 31 villages; 5. Good amenities, 47 villages.

The k mean centroid method includes the geometric centroids of the observations as part of the optimization process.8 The x and y coordinates are simply added as additional variables in the collection of attributes.8 This approach will yield a better ratio and significance maps; the only criterion is that the centroids should be present as attributes in the data.8

In the current study, k mean centroid cluster analysis yielded a ratio of 0.17. This method gave results similar to k mean cluster analysis with a few deviations; the five clusters were: 1. Very poor amenities, 365 villages; 2. Average amenities, 250 villages; 3. Very good amenities, 224 villages; 4. Poor amenities, 46 villages; 5. Good amenities, 32 villages.
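The ratio of between-group to total sum of squares used to compare the K mean and K mean centroid analyses can be illustrated with a minimal sketch. It is not the GeoDa implementation: the six "villages", their factor scores and their x/y coordinates are hypothetical, the function names are invented for illustration, and the farthest-point initialisation is an assumption chosen only to keep the example deterministic. The centroid variant simply appends the coordinates to the attribute columns before standardising.

```python
from math import sqrt

def standardize(rows):
    """Z-score each column (population SD); constant columns become 0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, sds)) for r in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two rows."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(rows):
    """Column-wise mean of a list of rows."""
    return tuple(sum(c) / len(c) for c in zip(*rows))

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point start
    (an assumption for this sketch, not necessarily what GeoDa does)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels

def bss_tss_ratio(rows, labels):
    """Between-group sum of squares divided by total sum of squares."""
    grand = centroid(rows)
    tss = sum(sq_dist(r, grand) for r in rows)
    wss = 0.0
    for j in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == j]
        c = centroid(members)
        wss += sum(sq_dist(r, c) for r in members)
    return (tss - wss) / tss

# Hypothetical factor scores and village coordinates (not from the study).
factor_scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
xy = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (10.0, 10.0), (10.1, 9.8), (9.9, 10.2)]

plain = standardize(factor_scores)                                  # K mean
with_xy = standardize([f + c for f, c in zip(factor_scores, xy)])   # K mean centroid

labels_plain = kmeans(plain, k=2)
labels_xy = kmeans(with_xy, k=2)
ratio_plain = bss_tss_ratio(plain, labels_plain)
ratio_xy = bss_tss_ratio(with_xy, labels_xy)
print(labels_plain, round(ratio_plain, 3))
print(labels_xy, round(ratio_xy, 3))
```

A ratio close to 1 indicates well-separated clusters; the toy data here are far better separated than the census data, for which the study reports ratios between 0.16 and 0.18.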

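The bottom-up form of hierarchical clustering and its three linkage patterns can likewise be sketched in a few lines. This is an illustrative, naive implementation under stated assumptions: the points are hypothetical, the names `agglomerate` and `LINKAGES` are invented for this example, and Ward's method, which the study mentions, is not implemented here.

```python
from itertools import combinations
from math import dist

# Linkage rule: how the pairwise distances between two clusters are summarised.
LINKAGES = {
    "single": min,                               # closest pair
    "complete": max,                             # farthest pair
    "average": lambda ds: sum(ds) / len(ds),     # mean of all pairs
}

def agglomerate(points, k, linkage="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]   # start with singletons

    def cluster_distance(pair):
        a, b = pair
        return link([dist(points[i], points[j])
                     for i in clusters[a] for j in clusters[b]])

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = clusters[a] + clusters[b]    # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical attribute values for six villages (not from the study).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

for name in LINKAGES:
    print(name, agglomerate(points, k=2, linkage=name))
```

On data this well separated, all three linkages recover the same two groups; on real census attributes the choice of linkage can change the cluster sizes considerably.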
Figure 1: Conceptual Framework

Figure 2: Base Map of Kancheepuram district with and without boundaries

Figure 3: K Mean Cluster Analysis

Figure 4: Hierarchical Cluster Analysis

Figure 5: K Mean Centroid Cluster Analysis

Discussion

Various types of sampling procedures are available for environmental studies.9,10 The traditional sampling techniques are design-based sampling, which is based on either spatial or heterogeneity.9,10 The other methods are model-based sampling, adaptive sampling, spatially balanced sampling and method.10 These types of sampling procedures are helpful for primary data collection.

The present study explored spatial cluster analysis as a sampling approach in biomedical or public health research. Spatial clustering would be a very helpful sampling strategy for secondary sources of data such as census tract data and the sample registration system. This study assessed three different cluster analyses, namely K mean, K mean centroid and hierarchical cluster analysis, in order to define the best fit of natural neighbourhoods, and found K mean centroid cluster analysis to be an appropriate method for detecting natural neighbourhoods as a sampling strategy for any secondary data source such as census tracts.

Adam et al conducted a study on socially based spatial boundaries in Canada using census data and considered residential boundaries as natural neighbourhoods.11 They clustered a population of one lakh into homogeneous strata using principal component analysis, and used Gi for spatial dependence.11 Residence, physical and land features such as roads and landscapes were used as indicators.

Similarly, Guo et al defined the concept of neighbourhood as residential location; they used hierarchical models to cluster the census data into homogeneous groups.12 This study revealed that individual neighbourhoods can be studied: a person living on the boundary of a census area had more in common with residents of the nearby area than with residents farther away.12

The study by John RB et al examined whether neighbourhood characteristics influence symptoms of depression among elderly residents of New York City.13 They used a census cohort to assess depression. The census variables were stratified into domains by factor analysis, and then an individual-level survey was conducted.13

Luis DS et al studied intra-urban disparities in the quality of life (QoL) of the general population of Porto using census data.14 He assessed the spatial disparities of the general population at two levels, a territorial level and an individual-level survey.14 Further, the spatial dependence was determined with Moran's I, the spatial lag model and the spatial error model.14

Basile C et al conducted a study to compare the spatial context with a multilevel perspective in neighbourhood research on mental health.15 He used regional office survey data with 65830 study participants. Hierarchical spatial analysis was used to detect the disease clusters, and he concluded that neighbourhood variations imply a significant impact between proximally closer neighbourhoods.15

This literature indicates that spatial cluster analysis would be an appropriate methodology for detecting natural neighbourhoods or homogeneous spatial cluster groups to be sampled in environment-based research studies. The limitation of the current study was that the data set was limited to the district level; further studies will be required at higher levels as well.

Conclusion

The present study suggests that, even though many types of sampling processes exist, spatial cluster analysis would be a more appropriate method for detecting neighbourhoods or homogeneous spatial clusters. However, the efficacy of spatial sampling may be increased if the investigator has prior knowledge about the random field.

Conflict of Interest: Nil

Source of Funding: Nil

Ethical Clearance: Ethical clearance was obtained from the institutional ethical committee of KIMSRC (EC No: 24/2016). The data were then extracted from Census 2011 and analyzed.

Reference

1. Paul E, Daniel W. Spatial epidemiology: Current approaches and future challenges. Environmental Health Perspectives, 2004, 112(9):998-1006.
2. Jennifer GH, Das B. Spatial Epidemiology: Three dimensional approach of Disease - Causal association in public health research. IJMER, 2019, 11(3):53-59.
3. Stein A, Ettema C. An overview of spatial sampling procedures and experimental design of spatial studies for eco system comparisons. Agriculture, Eco systems and Environment, 2003, 94:31-47.
4. Jennifer GH, Das B. Spatial cluster models: Model to predict disease causal association with physical, social and environmental risk factors in public health research. JCDR, 2020, 14(1):1-3.
5. www.census.gov.in/ accessed on 26/04/2016.
6. Concept of definitions used in village directory, Census 2011, accessed 26/04/16.
7. Jennifer GH, Das B. Two step cluster analysis on census tract data of Kancheepuram district, Tamilnadu: An Exploratory study. International Journal of Multidisciplinary Education Research, 2019, 10(3):79-83.
8. www.Geoda.com/manual on July 2018.
9. Wang JF, Stein A, Gao BB, Ge Y. A review of spatial sampling. Spatial Statistics, 2012, (2):1-14.
10. Jennifer GH, Das B. Spatial Sampling Technique: Method to collect data randomly with geographical indicators in public health research. IJPHRD, 2020, 11(5).
11. Adam D, Newbold KB, Taylor C (2011). Defining socially based spatial boundaries in the region of Peel, Ontario, Canada. Int J Health Geogr, 10:38-56.
12. Guo JY, Bhat CR (2012). Operationalizing the concept of neighbourhood: Application to residential location choice analysis. Accessed on 5/11/16.
13. John RB, Magda C, Shannon B, Jennifer A, David V, Sandro G. Neighborhood characteristics and change in depressive symptoms among older residents of New York City. American Journal of Public Health, 2009, 99(7):1308-13.
14. Luis DS. Intra urban disparities in the quality of life in the city of Porto: a Spatial contribution. Conference proceedings, Centre for Economics and Finance, 2014.
15. Basile C. Comparison of spatial perspective with the multilevel analytical approach in neighbourhood studies: the case of mental behavioral disorders due to psychoactive substances. American Journal of Epidemiology, 2001, 162(2):171-182.