Frontier of Environmental Science December 2015, Volume 4, Issue 4, PP.91‐98 Surface Water Quality Assessment Using Multivariate Statistical Techniques: Case Study of Basin,

Liyan Zheng 1, 2, Hongbing Yu 1†, Jianan Wang 2, Zhe Wang 2 1. College of Environmental Science and Engineering, Nankai University, Tianjin 300071, China 2. Department of Environmental Science and Engineering, Nankai University Binhai College, Tianjin 300270, China †Email: [email protected] Abstract

Multivariate statistical techniques, such as principal component analysis (PCA), factor analysis (FA) and cluster analysis (CA), were applied to evaluate and interpret the surface water quality datasets of the Songhua River Basin (SRB) in China, obtained during two years (2012-2013) of monitoring of 13 physicochemical parameters at 29 sites. PCA helped identify the factors or sources responsible for variations in surface water quality; it yielded three latent factors that together explained 83.79% of the total variance and represented organic pollution, metal pollution and oil pollution, respectively. FA revealed that the SRB water chemistry was strongly affected by the discharge of industrial, agricultural and municipal sewage water, mining operations and petroleum exploitation. Hierarchical CA grouped the 29 sampling sites into three groups, i.e., relatively less polluted (LP), moderately polluted (MP) and highly polluted (HP) sites, based on the similarity of their water quality characteristics. This study illustrates the usefulness of multivariate statistical techniques for analyzing and interpreting large and complex data sets, identifying pollution sources and better understanding variations in water quality for effective surface water management.

Keywords: Songhua River Basin; Water quality; PCA; FA; CA

1 INTRODUCTION

With increased understanding of the importance of drinking water quality to public health and of raw water quality to aquatic life, there is a great need to assess water quality [1]. One such critical effort is the development of surface water monitoring networks [2]. However, long-term monitoring programs require the measurement of a wide range of physical, chemical and biological variables at many monitoring stations. This results in large and complex data sets that are often hard to interpret and from which it is hard to draw meaningful conclusions [3]. Further, in surface water quality assessment it is often necessary to determine whether a variation in the concentration of measured parameters should be attributed to anthropogenic activities or to natural changes [4,5]. The problems of data reduction and interpretation, characteristic changes in water quality parameters, and indicator parameter identification can be approached through multivariate statistical techniques, such as principal component analysis (PCA), factor analysis (FA) and cluster analysis (CA) [2,3,5-8]. Many studies have applied PCA/FA and CA to river water and coastal water quality data, for example the Gomti River in India [5]; the Pisuerga River in Spain [6]; the Suquia River in Argentina [9]; the Mahanadi River and estuary in India [10]; and the Fuji River in Japan [11]. In this study, the Songhua River Basin in China was chosen for water quality assessment. We used correlation analysis to evaluate the relationships among water quality parameters, PCA/FA to find the most important factors describing the natural and anthropogenic influences, and CA to identify zones with different water quality. The overall aim of the present study is to provide useful information for water resources management at the watershed scale.

- 91 - http://www.ivypub.org/fes

2 MATERIALS AND METHODS

2.1 Study Area

The Songhua River Basin (SRB) is the third largest river basin in China, with a drainage area of 556,800 km² that includes multiple tributaries. The basin lies between longitudes 119º52’ and 132º31’E and latitudes 41º40’ and 51º38’N. The location of the SRB is shown in FIG. 1.

FIG. 1 MAP OF THE STUDY AREA AND SURFACE WATER QUALITY SAMPLING SITES IN THE SRB, CHINA.

2.2 Data

Based on the environmental hydrodynamic characteristics and land use types of the basin, we selected 29 sampling sites in the SRB (FIG. 1) and 13 representative parameters (TABLE 1). The selected sites were sampled monthly for two years (2012-2013) following the Chinese National Standards for Scientific Sampling (Ministry of Environmental Protection of China, 2002).

TABLE 1 THE WATER QUALITY PARAMETERS ASSOCIATED WITH THEIR ABBREVIATIONS AND UNITS USED IN THIS STUDY

Variable                          Abbreviation   Units
pH                                pH             pH units
Dissolved oxygen                  DO             mg/l
Permanganate index                CODMn          mg/l
Chemical oxygen demand            COD            mg/l
5-day biological oxygen demand    BOD5           mg/l
Ammonia nitrogen                  NH3-N          mg/l
Total nitrogen                    TN             mg/l
Total phosphorus                  TP             mg/l
Lead                              Pb             mg/l
Mercury                           Hg             mg/l
Zinc                              Zn             mg/l
Volatile phenols                  VPhs           mg/l
Petroleum hydrocarbon             Oil            mg/l


2.3 Principal Component Analysis (PCA)/Factor Analysis (FA)

PCA starts from the covariance matrix, which describes the dispersion of the original variables, and extracts its eigenvalues and eigenvectors. An eigenvector is a list of coefficients by which the original correlated variables are multiplied to obtain new uncorrelated variables, called principal components (PCs), which are weighted linear combinations of the original variables [6,12]. Components with eigenvalues of 1.0 or greater are considered significant [11]. FA follows PCA: it rotates the axes defined by PCA to construct new groups of variables, called varifactors (VFs) [11]. Factor loadings are classified as “strong”, “moderate” and “weak”, corresponding to absolute loading values of >0.75, 0.75-0.50 and 0.50-0.30, respectively [13].

2.4 Cluster Analysis (CA)

The purpose of CA is to partition a set of objects into two or more groups based on their similarity with respect to a chosen set of characteristics, so that similar objects fall in the same class [14]. Hierarchical cluster analysis (HCA) was performed on the normalized data set using Ward’s method as the agglomeration technique and squared Euclidean distance as the measure of similarity. Ward’s method is widely regarded as yielding meaningful clusters and as a powerful grouping mechanism. The linkage distance is reported as Dlink/Dmax, the ratio of the linkage distance to the maximal distance [15].
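The clustering procedure described in Section 2.4 can be illustrated with a minimal sketch. It is not the software used in this study: the site values below are hypothetical, the function names are invented for illustration, and the merge height is taken as Ward's increase in the within-cluster sum of squares (based on squared Euclidean distance between centroids), which packaged implementations may scale differently. Heights are rescaled as Dlink/Dmax.

```python
from itertools import combinations


def zscore_columns(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def centroid(points):
    """Component-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * d2


def ward_linkage(points, labels):
    """Naive agglomerative clustering; returns (left, right, Dlink/Dmax) per merge."""
    clusters = {label: [p] for label, p in zip(labels, points)}
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        ka, kb = min(combinations(clusters, 2),
                     key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]))
        cost = ward_cost(clusters[ka], clusters[kb])
        clusters[ka + "+" + kb] = clusters.pop(ka) + clusters.pop(kb)
        merges.append((ka, kb, cost))
    dmax = max(cost for _, _, cost in merges)
    return [(a, b, cost / dmax) for a, b, cost in merges]


# Hypothetical (DO, COD) values in mg/l for four imaginary sites; NOT study data.
SITES = {
    "site1": [8.0, 12.0],
    "site2": [7.5, 14.0],
    "site3": [3.0, 80.0],
    "site4": [3.5, 75.0],
}

MERGES = ward_linkage(zscore_columns(list(SITES.values())), list(SITES))
for left, right, ratio in MERGES:
    print(f"{left} + {right}: Dlink/Dmax = {ratio:.3f}")
```

Standardizing first keeps a variable with a large numeric range, such as COD, from dominating the distance. The naive pairwise search is adequate for a few dozen sites but scales poorly to large data sets.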

3 RESULTS AND DISCUSSION

3.1 Descriptive Statistics

A statistical summary of the 13 measured variables in the river water samples from the 29 sites in the SRB is presented in TABLE 2. The coefficient of variation (CV) shows that all variables except pH vary widely across the river basin. Compared with the Grade III standard of the Environmental Quality Standards for Surface Water (GB 3838-2002), the average values of several variables, such as DO, COD, CODMn, BOD5, NH3-N, TN and TP, are higher than the standard values; the water is therefore not suitable for human consumption or industrial use without treatment.

TABLE 2 DESCRIPTIVE STATISTICS OF WATER QUALITY IN THE SRB, CHINA

Variable   Min      Max       Mean      Std. Dev.   CV (%)
pH         6.59     8.47      7.20      0.34        4.7
DO         2.95     11.28     7.85      3.12        39.8
CODMn      1.73     20.9      6.17      3.23        52.4
BOD5       1.03     26.67     5.01      6.01        119.9
NH3-N      0.15     8.87      1.61      2.32        144.1
COD        11.09    89.40     25.35     20.73       81.8
TN         0.61     21.33     3.55      4.22        118.9
TP         0.03     3.40      0.30      0.65        216.7
Hg         0.00     0.00009   0.00002   0.00002     100.0
Pb         0.00     0.005     0.0020    0.0015      75.0
Zn         0.00     0.116     0.035     0.03        85.1
Oil        0.001    0.430     0.080     0.11        137.5
VPhs       0.0007   0.0431    0.0042    0.01        238.1
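The CV column can be cross-checked from the tabulated means and standard deviations, assuming the usual definition CV = 100 × (standard deviation / mean). The sketch below recomputes it from the rounded values in TABLE 2; small differences from the reported column (for example for Zn) are expected because the published means and standard deviations are themselves rounded. The function names are illustrative only.

```python
# (mean, standard deviation, reported CV %) copied from TABLE 2.
TABLE2 = {
    "pH": (7.20, 0.34, 4.7),
    "DO": (7.85, 3.12, 39.8),
    "CODMn": (6.17, 3.23, 52.4),
    "BOD5": (5.01, 6.01, 119.9),
    "NH3-N": (1.61, 2.32, 144.1),
    "COD": (25.35, 20.73, 81.8),
    "TN": (3.55, 4.22, 118.9),
    "TP": (0.30, 0.65, 216.7),
    "Hg": (0.00002, 0.00002, 100.0),
    "Pb": (0.0020, 0.0015, 75.0),
    "Zn": (0.035, 0.03, 85.1),
    "Oil": (0.080, 0.11, 137.5),
    "VPhs": (0.0042, 0.01, 238.1),
}


def coefficient_of_variation(mean, std):
    """Return the coefficient of variation in percent."""
    return 100.0 * std / mean


def cv_report(table):
    """Return (name, recomputed CV, reported CV) for every variable."""
    return [(name, round(coefficient_of_variation(mean, std), 1), reported)
            for name, (mean, std, reported) in table.items()]


for name, cv, reported in cv_report(TABLE2):
    print(f"{name:6s} recomputed {cv:6.1f}   reported {reported:6.1f}")
```

A CV above 100% means the standard deviation exceeds the mean, which here applies to BOD5, NH3-N, TN, TP, Oil and VPhs and is consistent with a few heavily polluted sites dominating the spread.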


3.2 Correlation of Water Quality Parameters

TABLE 3 gives the Pearson correlation matrix of the water quality parameters, obtained as part of the PCA. Several clear hydrochemical relationships can be readily inferred from it:

TABLE 3 CORRELATION MATRIX OF THE 13 PHYSICOCHEMICAL PARAMETERS DETERMINED

           pH     DO  CODMn   BOD5  NH3-N    Oil   VPhs     Hg     Pb    COD     TN     TP     Zn
pH       1.00
DO       0.67   1.00
CODMn   -0.59  -0.89   1.00
BOD5    -0.32  -0.78   0.79   1.00
NH3-N   -0.60  -0.83   0.94   0.85   1.00
Oil     -0.05  -0.30   0.37   0.64   0.56   1.00
VPhs    -0.21  -0.59   0.32   0.49   0.40   0.34   1.00
Hg      -0.54  -0.33   0.17   0.33   0.18   0.01   0.43   1.00
Pb      -0.22  -0.31   0.21   0.49   0.21   0.36   0.49   0.55   1.00
COD     -0.45  -0.85   0.78   0.94   0.83   0.62   0.71   0.47   0.53   1.00
TN      -0.41  -0.68   0.83   0.73   0.95   0.59   0.34   0.02   0.02   0.74   1.00
TP      -0.17  -0.34   0.56   0.53   0.70   0.79   0.10  -0.14  -0.12   0.50   0.85   1.00
Zn       0.05  -0.05  -0.03   0.25  -0.01   0.24   0.60   0.57   0.51   0.37  -0.08  -0.18   1.00

The river water pH showed relatively weak to moderate correlations with the other parameters, i.e., most of the correlation coefficients are below 0.7 in absolute value. There was a positive correlation between pH and DO (0.67), and negative correlations between pH and CODMn (-0.59), pH and NH3-N (-0.60), and pH and Hg (-0.54). This suggests that pH affects chemical and biological processes and the competitive ability of each metal.

There were strong relationships between CODMn and DO (-0.89), NH3-N and DO (-0.83), COD and DO (-0.85), and BOD5 and DO (-0.78), and a moderate relationship between TN and DO (-0.68). As expected, DO is negatively correlated with most organic-related parameters. The degradation of organic matter consumes the available DO, so waters with high COD, BOD5, NH3-N and TN tend to show rapidly depleted DO.

Various studies have attributed elevated BOD5, COD, NH3-N and TN together with lower DO in sewage-affected water to the presence of biodegradable organic matter and the consumption of DO by microorganisms in the water. Significant positive correlations between organic-related parameters and inorganic nutrient parameters were also found. The correlation coefficients between CODMn and BOD5, CODMn and NH3-N, CODMn and COD, CODMn and TN, BOD5 and NH3-N, BOD5 and COD, NH3-N and COD, NH3-N and TN, TN and TP, COD and TN, CODMn and TP, BOD5 and TP, and NH3-N and TP were 0.79, 0.94, 0.78, 0.83, 0.85, 0.94, 0.83, 0.95, 0.85, 0.74, 0.56, 0.53 and 0.70, respectively. BOD5 is a measure of the amount of oxygen consumed by bacteria during the decomposition of organic matter under aerobic conditions, whereas COD is a measure of the total quantity of oxygen required to oxidize organic materials into carbon dioxide and water using a strong oxidant. The inorganic nutrients, such as NH3-N, TN and TP, probably originate from anthropogenic inputs. These correlations indicate the presence of easily oxidizable, biodegradable organic matter in the sampled water of the SRB. They also point to a common source of organic contamination, probably associated with discharges of municipal wastewater and agricultural runoff.
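The pairwise relationships discussed above can be extracted mechanically from the lower triangle of TABLE 3. The following sketch lists every pair whose absolute coefficient reaches a threshold; the threshold of 0.7 follows the bound mentioned in the text for pH, but using it as a general cut-off for a "strong" correlation is an assumption made here, and the function name is illustrative.

```python
NAMES = ["pH", "DO", "CODMn", "BOD5", "NH3-N", "Oil", "VPhs",
         "Hg", "Pb", "COD", "TN", "TP", "Zn"]

# Lower triangle of TABLE 3, one row per variable in the order of NAMES.
LOWER = [
    [1.00],
    [0.67, 1.00],
    [-0.59, -0.89, 1.00],
    [-0.32, -0.78, 0.79, 1.00],
    [-0.60, -0.83, 0.94, 0.85, 1.00],
    [-0.05, -0.30, 0.37, 0.64, 0.56, 1.00],
    [-0.21, -0.59, 0.32, 0.49, 0.40, 0.34, 1.00],
    [-0.54, -0.33, 0.17, 0.33, 0.18, 0.01, 0.43, 1.00],
    [-0.22, -0.31, 0.21, 0.49, 0.21, 0.36, 0.49, 0.55, 1.00],
    [-0.45, -0.85, 0.78, 0.94, 0.83, 0.62, 0.71, 0.47, 0.53, 1.00],
    [-0.41, -0.68, 0.83, 0.73, 0.95, 0.59, 0.34, 0.02, 0.02, 0.74, 1.00],
    [-0.17, -0.34, 0.56, 0.53, 0.70, 0.79, 0.10, -0.14, -0.12, 0.50, 0.85, 1.00],
    [0.05, -0.05, -0.03, 0.25, -0.01, 0.24, 0.60, 0.57, 0.51, 0.37, -0.08, -0.18, 1.00],
]


def pairs_at_least(threshold):
    """Return (var_a, var_b, r) for off-diagonal pairs with |r| >= threshold,
    sorted from the strongest to the weakest correlation."""
    found = []
    for i, row in enumerate(LOWER):
        for j, r in enumerate(row[:i]):  # row[:i] skips the diagonal entry
            if abs(r) >= threshold:
                found.append((NAMES[j], NAMES[i], r))
    return sorted(found, key=lambda item: abs(item[2]), reverse=True)


STRONG = pairs_at_least(0.7)
for a, b, r in STRONG:
    print(f"{a:6s} {b:6s} {r:+.2f}")
```

With this cut-off, pH, Hg, Pb and Zn do not appear in any pair, which matches the observation that the metals and pH are only weakly to moderately tied to the organic and nutrient parameters.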


3.3 Pollution Sources Identification using PCA/FA

PCA/FA was further applied to the normalized datasets to explore the physicochemical relationships among the parameters and to identify water pollution sources. The results of PCA/FA, including factor loadings and the individual and cumulative percentages of variance explained, are presented in TABLE 4.

TABLE 4 VARIMAX-ROTATED COMPONENT MATRIX

Parameter          VF1     VF2     VF3
pH               -0.68   -0.06    0.69
DO               -0.86    0.00    0.36
CODMn             0.89   -0.23   -0.24
BOD5              0.92    0.03    0.07
NH3-N             0.94   -0.28   -0.08
Oil               0.63   -0.12    0.67
VPhs              0.61    0.52    0.18
COD               0.96    0.17    0.09
TN                0.86   -0.45    0.08
TP                0.62   -0.59    0.35
Hg                0.43    0.68    0.14
Pb                0.39    0.72   -0.28
Zn                0.22    0.79    0.36
Variability (%)  52.18   19.97   11.64
Cumulative (%)   52.18   72.14   83.79
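As a reading aid for TABLE 4, the sketch below applies the loading classes from Section 2.3 (>0.75 strong, 0.75-0.50 moderate, 0.50-0.30 weak) to each loading and recomputes the share of variance carried by each varifactor. The variance check assumes standardized variables, so that each of the 13 variables contributes unit variance and a factor's share is the sum of its squared loadings divided by 13. The label "negligible" for loadings below 0.30 and all function names are assumptions introduced here.

```python
# Loadings (VF1, VF2, VF3) copied from TABLE 4.
LOADINGS = {
    "pH": (-0.68, -0.06, 0.69),
    "DO": (-0.86, 0.00, 0.36),
    "CODMn": (0.89, -0.23, -0.24),
    "BOD5": (0.92, 0.03, 0.07),
    "NH3-N": (0.94, -0.28, -0.08),
    "Oil": (0.63, -0.12, 0.67),
    "VPhs": (0.61, 0.52, 0.18),
    "COD": (0.96, 0.17, 0.09),
    "TN": (0.86, -0.45, 0.08),
    "TP": (0.62, -0.59, 0.35),
    "Hg": (0.43, 0.68, 0.14),
    "Pb": (0.39, 0.72, -0.28),
    "Zn": (0.22, 0.79, 0.36),
}
REPORTED_VARIABILITY = (52.18, 19.97, 11.64)  # "Variability (%)" row of TABLE 4


def classify(loading):
    """Map a loading to the classes used in Section 2.3."""
    size = abs(loading)
    if size > 0.75:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "weak"
    return "negligible"  # hypothetical label; the paper defines no class below 0.30


def variables_in_class(factor_index, label):
    """Names of variables whose loading on the given factor falls in a class."""
    return [name for name, row in LOADINGS.items()
            if classify(row[factor_index]) == label]


def variance_share(factor_index):
    """Percent of total variance: sum of squared loadings over variable count."""
    total = sum(row[factor_index] ** 2 for row in LOADINGS.values())
    return 100.0 * total / len(LOADINGS)


for k, reported in enumerate(REPORTED_VARIABILITY):
    print(f"VF{k + 1}: strong = {variables_in_class(k, 'strong')}, "
          f"moderate = {variables_in_class(k, 'moderate')}, "
          f"share = {variance_share(k):.2f}% (reported {reported}%)")
```

The recomputed shares land within about one percentage point of the reported values, a gap that is plausible given that the published loadings are rounded to two decimals. Under these thresholds pH (-0.68 on VF1, 0.69 on VF3) falls in the moderate class on both factors.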

Varifactor 1 (VF1) explains 52.18% of the total variance and has strong positive loadings on COD, CODMn, BOD5, NH3-N and TN, a strong negative loading on DO and a moderate negative loading on pH; these parameters are indicators of organic and inorganic nutrient pollution. The high loadings on organic parameters indicate that the river is heavily polluted by anthropogenic activities through point and non-point sources near the study area. The inorganic nutrients may likewise be interpreted as representing anthropogenic inputs. The direct dumping of waste and the discharge of sewage effluent into the river have been identified as the main factors raising COD, BOD5, NH3-N and TN.

Varifactor 2 (VF2) accounts for 19.97% of the variance and has a strong positive loading on Zn and moderate positive loadings on Pb and Hg. This varifactor can be interpreted as metal pollution. In this study, the concentrations of Zn, Pb and Hg are lower than the Grade III standard of the Environmental Quality Standards for Surface Water (GB 3838-2002), indicating a relatively low level of heavy metal pollution in the SRB. This pollution probably comes mostly from the smelting and refining of nonferrous metals or from electroplating factories.

Varifactor 3 (VF3), explaining 11.64% of the total variance, has a moderate positive loading on oil. This factor indicates a moderate effect of oil pollution from petroleum exploitation in the SRB. The basin contains several oil-producing areas, such as the Songyuan and Daqing oilfields, as well as other industrial facilities. The Daqing Oilfield is one of the biggest oilfields in China and an important base of its petrochemical industry. It has been exploited for several decades, which has seriously polluted the local natural environment.

3.4 Spatial Similarity and Sites Grouping

HCA generated a dendrogram (FIG. 2) that grouped the 29 monitoring sites into three statistically distinct groups (A, B, C). Sites within each group have similar characteristic features and natural background source types.

FIG. 2 DENDROGRAM BASED ON HACA (WARD’S METHOD).

Group A (containing S4, S5, S7, S8, M7 and M8)

All sites in this group lie in urban areas, such as Changchun City, Jilin City and Dunhua City, which are among the most important industrial cities in the SRB. These sampling sites therefore received large amounts of pollution from various point sources, such as domestic wastewater, wastewater treatment plants and industrial effluents, and showed the highest average concentrations of all monitored water parameters. In addition, most of these sites are located on tributaries, where the dilution capacity is low and the biological, chemical and physical self-purification capacity of the water body is weaker. Group A corresponds to high pollution (HP) regions. The maximum loads of both COD and NH3-N were observed at S7 on the Yitong River. In fact, the Yitong River has become a natural drainage channel for Changchun City because of the low urban sewage treatment rate. Both S4 and S5 are sections in Jilin City that receive most of the urban sewage and industrial effluent. The urban river sections were worst affected by industrial pollution from chemical plants with insufficient wastewater treatment facilities. It is therefore necessary to enhance the treatment of industrial effluent, to strictly enforce the discharge standards for water pollutants and the total amount control system, and ultimately to restore and maintain the chemical, physical and biological integrity of the waters in the high pollution (HP) regions.

Group B (containing N2, N7, M1, M2, M3, M5, M6, M11 and M12)

Although all sites in this group are located in industrial city regions, Group B corresponds to moderate pollution (MP) regions because these sites are located on the main streams or major tributaries of the SRB. Since an increase in flow rate dilutes contaminants, the water quality of the main stream was better than that of the tributaries, and the water quality of the major tributaries was better than that of the small tributaries.


M5, in Ashen Inner, lies in the Harbin City region, where there are many chemical, pharmaceutical, petrochemical, iron and steel, and electroplating factories. M12, on the Songhua River main stream, is located in the Jiamusi region, which is also an important industrial city. N2 and N7 are located on the main stream of the Nenjiang River, which carries a high pollution load as it flows through Qiqihar City and the Daqing Oilfield. M11, on the Woken River, had high nutrient concentrations resulting from the discharges of Qitaihe City’s coal mining operations and from agricultural runoff. M2, M3 and M6 are located on two main tributaries of the Songhua River main stream (Lalin River and ). These areas have fertile black soil well suited to agriculture. The proportion of nonpoint source pollution in the total load has therefore been rising, and it has become the major source of the pollution loads in these regions. Above all, to control the discharge of pollutants, urban and industrial wastewater treatment facilities should be improved promptly. Agricultural nonpoint source pollution, such as cropland erosion and the excessive application of agrochemicals, should also be controlled and reduced, first of all through land use planning and best management practices.

Group C (including N1, N3, N4, N5, N6, S1, S2, S3, S6, M4, M9 and M10)

Group C is far from the major point and nonpoint pollution sources and corresponds to relatively low pollution (LP) regions. In this group, S3, S6 and M4 are located in the drinking water source protection zone, where water of particularly high quality is needed for drinking water supplies. Relatively low concentrations of all monitored water parameters were observed at N3 and N4, possibly because of the high water flow and long stream length in . N1, S1 and S2, situated in the upstream reach of the SRB, are far from major point pollution sources and receive pollution from nonpoint sources, i.e., mostly from agricultural activities and catchment runoff. Maintaining the natural geomorphologic features, especially the meandering pattern of the river, is also essential for the good ecological condition of the river and is a key factor in preserving its self-cleansing capacity.

4 CONCLUSIONS

In this study, PCA/FA and CA were used to evaluate spatial variations in surface water quality and to identify the pollution sources affecting water quality in the Songhua River Basin. PCA/FA identified three latent factors that together explained 83.79% of the total variance, representing organic pollution (VF1), metal pollution (VF2) and oil pollution (VF3), respectively. The main causes of degradation in the SRB were the discharge of industrial and agricultural wastes and municipal sewage water. Mining and petroleum exploitation were also among the major sources of surface water quality deterioration. HCA grouped the 29 sampling sites into three groups, i.e., relatively less polluted (LP), moderately polluted (MP) and highly polluted (HP) sites, based on the similarity of their water quality characteristics. With this information, it is possible to design a lower-cost spatial monitoring network for the future by reducing the number of sampling sites and selecting representative sites from groups A, B and C. This study therefore illustrates the usefulness of multivariate statistical techniques for analyzing and interpreting large and complex data sets, identifying pollution sources and better understanding variations in water quality for effective surface water management. In addition, interventions should be made to reduce anthropogenic discharges in the SRB; otherwise, high levels of pollution will seriously affect the population and could cause severe socio-economic harm.

ACKNOWLEDGMENT

The authors acknowledge the Ministry of Science and Technology, People’s Republic of China, for supporting this work through a water special project (grant No. 2012ZX07501002-001).


REFERENCES

[1] Bierman, P. “A review of methods for analyzing spatial and temporal patterns in coastal water quality.” Ecological Indicators, 11(2011): 103-114
[2] Ouyang, Y., Nkedi-Kizza, P., Wu, Q.T., Shinde, D., Huang, C.H. “Assessment of seasonal variations in surface water quality.” Water Research, 40(2006): 3800-3810
[3] Koklu, R., Sengorur, B., Topa, B. “Water quality assessment using multivariate statistical methods - a case study: Melen River system (Turkey).” Water Resource Management, 24(2010): 959-978
[4] Zare, G.A., Sheikh, V., Sadoddin, A. “Assessment of seasonal variations of chemical characteristics in surface water using multivariate statistical methods.” International Journal of Environmental Science and Technology, 8(2011): 581-592
[5] Singh, K.P., Malik, A., Mohan, D., Sinha, S. “Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India): a case study.” Water Research, 38(2004): 3980-3992
[6] Tobiszewski, M., Tsakovski, S., Simeonov, V., Namiesnik, J. “Surface water quality assessment by the use of combination of multivariate statistical classification and expert information.” Chemosphere, 80(2010): 740-746
[7] Bhuiyan, M.A.H., Rakib, M.A., Dampare, S.B., Ganyaglo, S., Suzuki, S. “Surface water quality assessment in the Central Part of Bangladesh using multivariate analysis.” KSCE Journal of Civil Engineering, 15(2011): 995-1003
[8] Kumarasamy, P., James, R.A., Dahms, H. “Multivariate water quality assessment from the Tamiraparani river basin, Southern India.” Environmental Earth Science, 71(2014): 2441-2451
[9] Zhang, B., Song, X.F., Zhang, Y.H., Han, D.M., Tang, C.Y., Yu, Y.L., Ma, Y. “Hydrochemical characteristics and water quality assessment of surface water and groundwater in Songnen plain, Northeast China.” Water Research, 46(2012): 2737-2748
[10] Mustapha, A., Aris, A.Z., Juahir, H., Ramli, M.F., Kura, N.U. “River water quality assessment using environmentric techniques: case study of Jakara River Basin.” Environment Science and Pollution Research, 20(2013): 5630-5644
[11] Shrestha, S., Kazama, F. “Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan.” Environmental Modelling & Software, 22(2007): 464-475
[12] Wang, X.Z., Cai, Q.H., Ye, L., Qu, X.D. “Evaluation of spatial and temporal variation in stream water quality by multivariate statistical techniques: A case study of the Xiangxi River basin, China.” Quaternary International, 282(2012): 137-144
[13] Helena, B., Pardo, R., Vega, M., Barrado, E., Fernandez, J.M., Fernandez, L. “Temporal evaluation of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis.” Water Research, 34(2000): 807-816
[14] Liu, C.W., Lin, K.H., Kuo, Y.M. “Application of factor analysis in the assessment of groundwater quality in a Blackfoot disease area in Taiwan.” The Science of the Total Environment, 313(2003): 77-89
[15] McKenna, J.E., Jr. “An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis.” Environmental Modelling & Software, 18(2003): 205-220
[16] Kotti, M.E., Vlessidis, A.G., Thanasoulias, N.C., Evmiridis, N.P. “Assessment of river water quality in Northwestern Greece.” Water Resource Management, 19(2005): 77-94
