Spatial Cluster Analysis As a Sampling Approach in Public Health Research

Indian Journal of Public Health Research & Development, March 2020, Vol. 11, No. 03, p. 868

H. Gladius Jennifer1, M. Bagavan Das2

1Assistant Professor Biostatistics, Department of Community Medicine, Karpaga Vinayaga Institute of Medical Sciences, Chinnakolambakkam, Madhuranthakam, Tamilnadu, 2Professor, School of Public Health, SRM University, Kattankulathur, Tamilnadu

Abstract

Introduction: Sampling is the process of selecting units from a population of interest. Spatial cluster analysis can also serve as a sampling strategy for large-scale data in population and public health research; K Mean Centroid is an exploratory tool for finding natural spatial clusters at a focused level for both categorical and continuous variables. Hence this study was attempted.

Objective: The objective of the study was to develop a methodology for defining natural neighborhoods.

Materials and Method: The exploratory study was carried out from Nov 2016 to Dec 2017, using the Primary Census Abstract of Kancheepuram district, Tamil Nadu, issued from Census 2011. Village data were extracted, the variables were grouped into domains by factor reduction, and their scores were calculated by factor analysis. Villages with similar characteristics were grouped into clusters by K mean, Hierarchical and K Mean Centroid methods. SPSS 16v, QGIS and GeoDa software were used.

Results: Out of 1020 villages, 917 were selected after data mining, and a connectivity map was made. Factor analysis reduced the census variables to factors such as area, population, spatial distance, health facilities and recreation facilities. These factor scores were taken for the analysis after calculating the weight matrix. Villages were segregated into 5 clusters in every mapping; K Mean Centroid produced both a clustering and a significance map.

Conclusion: K Mean Centroid gives a better understanding of the heterogeneity of large-scale data. It helps to select appropriate geographical locations to be sampled from existing data for further research.

Keywords: Factor scores, K mean, Hierarchical Cluster, K Mean Centroid, census.

Corresponding Author: H. Gladius Jennifer, Assistant Professor Biostatistics, Department of Community Medicine, Karpaga Vinayaga Institute of Medical Sciences, Chinnakolambakkam, Madhuranthakam, Tamilnadu. Contact No.: 7708628846. e-mail: [email protected]

Introduction

Medical research aims to make general statements about a wider set of subjects or variables based on certain observations. Defining an appropriate sample size and choosing an appropriate sampling technique are mandatory in order to obtain precise results. Many sampling techniques are available, such as simple random sampling, stratified sampling, systematic sampling, convenience sampling and purposive sampling; these are mainly used when the unit of study is the individual. When the unit of measurement is a group or set of people, cluster sampling, two-stage sampling and multi-stage sampling are appropriate. The purpose of cluster analysis is to identify homogeneous subgroups with similar characteristics. Hence community-based studies have used the above techniques.

Spatial research studies mainly focus on physical, environmental and social attributes, which may require a carefully designed strategy for data collection.[1,2] The main aims of spatial sampling approaches are to calculate the mean of a given attribute in an area, to test the effects of differences between ecological conditions, and to establish a spatial investigation or describe a spatial distribution.[3]

In recent years the concepts of spatial association and prediction of variables have been analyzed.[2] Spatial analysis is used to identify spatial patterns, to identify disease clusters, and to explain or predict disease risk in geographical data. In other words, spatial clustering identifies homogeneous groups of objects based on the values of their attributes or their geographic space.[2,4] Spatial cluster analysis is usually appropriate for studying the infectious spread of disease, the occurrence of disease vectors, the clustering of risk factors or their combinations, the existence of potential health hazards, and localized pollution sources.[2,4]

The purpose of this paper is to introduce spatial cluster analysis as a neighbourhood sampling technique for existing large data sets such as census data. This sampling approach detects neighborhoods with similar attributes, which helps the investigator or researcher to identify homogeneous groups in large population studies.

The objective of the study was to develop a methodology for defining natural neighborhoods and to establish a natural neighborhood sampling approach with social and physical resources using 2011 census tract data. Hence this study was attempted.

Methodology

An exploratory study was conducted in the School of Public Health, SRM University, Kattankulathur, Tamilnadu, over approximately one year from November 2016 to December 2017.

Data Collection: The primary data of Census 2011 for Kancheepuram district, Tamil Nadu, were extracted from the www.Census.gov.in website.[5] The "Rural table for Chengai Anna district" and the "Table for Kancheepuram district" were selected for this study.[5,6] The two data sets were merged by village code and name, after checks for duplication and spelling, for further analysis in SPSS 16v.

Variables in census data: The census data had approximately three hundred and eighty variables describing the codes and names of the district and villages; area; total population; worker and non-worker population; schools, colleges, health facilities, communication facilities, recreation facilities and transport, with their availability and the distance in km to travel to them; whether water, electricity, sanitation, drainage and waste disposal were available in each village; and land areas such as forest, irrigated and unirrigated land, river, pond and lake in sqm.[2,5-7]

These variables were segregated into domains by factor analysis.[7] The factor scores were calculated for continuous as well as categorical variables.[7] The factor scores were treated as continuous variables, and the variables that rotated individually were taken with their original values for cluster analysis.[7]

Geo Coding: The longitude and latitude of each village in Kancheepuram district were collected and merged with the data set by village code.[7] These codes and the data set were finally merged with the base map of Kancheepuram district after checking for duplicates and missing data.[7]

Statistical Software: SPSS 16v, QGIS, GeoDa

Results

After the factor scores were calculated, the data were translated to spatial data by merging these attributes with the base map of Kancheepuram district, using village code as the indicator variable.[7] K Mean Cluster Analysis, Hierarchical Cluster Analysis and K Mean Centroid Cluster Analysis were used to find homogeneous spatial clustering of the given data.[8] The ratio of sums of squares was compared in order to choose the natural neighborhood or spatial cluster by the physical and social amenities of villages.[8]

K mean cluster analysis is used to partition n data points into k homogeneous clusters. It is assessed by calculating the ratio of the total between-group sum of squares to the total variance.[8] The higher the value of the ratio, the better the separation of the clusters.[8]

In this study, k mean cluster analysis yielded a between-group sum of squares ratio of 0.18. The villages were grouped into five clusters as follows: 1. Very poor amenities villages 376, 2. Average amenities villages 250, 3. Very good amenities villages 211, 4. Poor amenities villages 48, 5. Good amenities villages 32.

Hierarchical cluster analysis is a classical type of clustering method; the clusters are built in a step-by-step process, either in a top-down or a bottom-up fashion.[8] It computes the distance between two existing clusters in order to decide how to group the closest two together.[8] It has three patterns of linkage: single linkage, complete linkage and average linkage.[8] The dendrogram gives a pictorial representation of the cluster tree in this method.[8] Significance was assessed by Ward's method.[8]

In this study, hierarchical cluster analysis yielded a between-group sum of squares ratio of 0.16. The five clusters were as follows: 1. Very poor amenities villages 500, 2. Average amenities villages 302, 3. Very good amenities villages 37, 4. Poor amenities villages 31, 5. Good amenities villages 47.

The k mean centroid method includes the geometric centroids of the observations as part of the optimization process.[8] The x and y coordinates are simply added as additional variables in the collection of attributes.[8] This approach is expected to yield a better ratio and significance maps; the only requirement is that the centroids be present as attributes in the data.[8]

In the current study, k mean centroid cluster analysis yielded a between-group sum of squares ratio of 0.17. This method gave results similar to k mean cluster analysis with a few deviations; the five clusters were: 1. Very poor amenities villages 365, 2. Average amenities villages 250, 3. Very good amenities villages 224, 4. Poor amenities villages 46, 5. Good amenities villages 32.

Figure 1: Conceptual Frame Work

Figure 2: Base Map of Kancheepuram district with and without boundaries

Figure 3: K Mean Cluster Analysis

Figure 4: Hierarchical Cluster
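
The ratio used to compare the clustering methods, and the idea of adding the x and y centroids as extra variables, can be illustrated with a minimal Python sketch. It is not the GeoDa procedure used in the study: it assumes plain Lloyd's k-means on z-scored attributes and takes the total variance to be the total sum of squares about the grand mean. The function names (standardize, kmeans, bss_tss_ratio, add_centroids), the six villages, their factor scores and their coordinates are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def standardize(rows):
    """Z-score each column (population SD); constant columns become 0."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return labels

def bss_tss_ratio(points, labels):
    """Between-group sum of squares divided by total sum of squares."""
    grand = mean_point(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    within_ss = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = mean_point(members)
        within_ss += sum(sq_dist(p, center) for p in members)
    return (total_ss - within_ss) / total_ss

def add_centroids(attrs, coords):
    """K mean centroid idea: append x and y as two extra variables."""
    return [list(a) + list(c) for a, c in zip(attrs, coords)]

# Hypothetical toy data: two factor scores and (x, y) centroids for 6 villages.
scores = [[-1.2, -0.8], [-1.0, -1.1], [-0.9, -0.7],
          [1.1, 0.9], [0.8, 1.2], [1.0, 1.0]]
coords = [[79.7, 12.5], [79.8, 12.6], [79.7, 12.6],
          [80.1, 12.9], [80.2, 12.8], [80.1, 12.8]]

plain = standardize(scores)
spatial = standardize(add_centroids(scores, coords))
for name, data in (("k mean", plain), ("k mean centroid", spatial)):
    labels = kmeans(data, k=2)
    print(name, labels, round(bss_tss_ratio(data, labels), 3))
```

A ratio close to 1 means that most of the variation lies between clusters rather than within them. The ratios for the two runs are computed on different variable sets (with and without coordinates), so they are comparable only in the loose sense in which the three methods are compared in this study.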
