ISSN 2319 – 1953 International Journal of Scientific Research in Computer Science Applications and Management Studies

Interactive web-based data generation software applicable for river engineering

Girish S. Katkar#1, Dinesh A. Lingote*2, Ritesh Vijay@3
1 Head, Dept. of Computer Science, Taywade College, Koradi, Nagpur, M.S., India
2 Sr. Scientist, CSIR-National Environmental Engineering Research Institute, Nehru Marg, Nagpur, India
3 Principal Scientist and Head, CSIR-National Environmental Engineering Research Institute, Nehru Marg, Nagpur, India
[email protected], [email protected] (corresponding author), [email protected]

IJSRCSAMS Volume 7, Issue 6 (November 2018) www.ijsrcsams.com

Abstract— This is an era of automated systems, and many research organizations are working on automation to ease the burden of conventional systems. When data must be generated for a large geographical area, as in river modelling, biodiversity studies and air quality monitoring, the conventional method of data generation proves burdensome, time consuming and never ending, and sometimes even fails. Consequently, it is essential to develop automatic data generation techniques that produce large volumes of data to support the conventional method (form a team of researchers, carry out field work, collect samples, test them with instruments and generate data). This research paper introduces a web-based software developed for the Kanhan River to generate river water quality data for its modelling. The software uses several data generation techniques: data extraction, data estimation, data generation using public partnership and data generation using scientific software. The paper also demonstrates the software's public portal, which is utilized for public health management.

Keywords— Data Generation, Data Extraction, Data Estimation, River Engineering, Information System, Modelling, Public Health

I. INTRODUCTION
Rivers are very important for living organisms, yet overexploitation and urbanization have ecologically stressed the Indian rivers. The Kanhan River in particular is environmentally stressed. It originates in the Satpura hills in the Chhindwara district of Madhya Pradesh, progresses to the Nagpur district of Maharashtra and finally joins the Wainganga River [1]. To develop this software for the Kanhan River, a stretch of around 300 km from Amla Dam, Madhya Pradesh to Gosekhurd Dam, Maharashtra, India is considered, along with the water sources that join the river. The software records several water quality parameters and disseminates them through its public and private portals. Nearby villages and Nagpur city (Maharashtra, India) use Kanhan River water for drinking and domestic purposes; the research team is keen on conserving the Kanhan River and hence developed this software to address river issues in the public domain.

Fig. 1 Considered path for the Kanhan River

II. SOFTWARE METHODOLOGY
To generate information for the Kanhan River, SPs (study points) are imposed on the river stretch at 100-meter intervals (Fig. 2). Bing Map is used as a base map [2], which is overlaid with different information layers. Several information layers, such as ESP (Exchangeable Sodium Percentage), FC (Faecal Coliform), Hydrogen Carbonate, Iron, Magnesium, Nitrate, Phosphate, Potassium, RSC (Residual Sodium Carbonate), SAR (Sodium Adsorption Ratio), Sodium, Sulfate, TC (Total Coliform), Fluoride, Lead, Copper and Zinc, pH (Potential of Hydrogen), turbidity, water conductivity, total alkalinity, hardness, arsenic, BOD (Biochemical Oxygen Demand), Calcium, chloride, chlorine, COD (Chemical Oxygen Demand) and DO (Dissolved Oxygen), have been created, and their water quality values are depicted on or close to the SPs [3].

Fig. 2 Study points imposed on the river path

Web pages are published using the Tomcat web server [4] and geo-information is published using GeoServer 2.11 [5]. The software uses different techniques for information generation, as shown in Fig. 3 and discussed in detail as follows.
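The placement of SPs at a fixed spacing along the river path can be illustrated with a short sketch. It is not the software's actual implementation: the function names, the haversine distance, the linear interpolation inside a segment and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def place_study_points(path, spacing_m=100.0):
    """Return (lat, lon) points every `spacing_m` metres along `path`.

    Positions inside a segment are linearly interpolated in lat/lon,
    which is adequate only for short segments (an assumption).
    """
    points = [path[0]]
    next_at = spacing_m   # distance along the path of the next SP
    travelled = 0.0       # distance covered before the current segment
    for start, end in zip(path, path[1:]):
        seg = haversine_m(start, end)
        while seg > 0 and next_at <= travelled + seg:
            t = (next_at - travelled) / seg
            points.append((start[0] + t * (end[0] - start[0]),
                           start[1] + t * (end[1] - start[1])))
            next_at += spacing_m
        travelled += seg
    return points


# Hypothetical two-vertex path, roughly one kilometre long.
study_points = place_study_points([(21.0, 79.0), (21.009, 79.0)])
```

In practice the path vertices would come from the digitized river line, and a projected coordinate system could replace the interpolation used here.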


Fig. 3 Information generation techniques (data generation, data extraction, public partnership, scientific and modelling software, bulk data upload, data estimation, data share and update)

Fig. 5 Login screen

A. Data generation (Conventional method).
Authenticated users from different organizations can register in the software (Fig. 4). The registration request reaches the system administrator, and after the administrator's approval the research-team user is registered in the software. Through this simple procedure the software forms a community of researchers. A research-team user can log in to the software (Fig. 5) and generate the main data. The team can carry out field work and collect water quality data to upload into the system. To generate data through all possible modes, the research-team user is provided with utilities such as data share, data upload, data download (Fig. 6), point-by-point data upload and bulk data upload (using an Excel file). Apart from basic data generation, the research team can also generate data using the data extraction and data estimation techniques provided by the software, which are discussed in the associated sections.

Fig. 6 Data download

B. Public-partnership.
Recent trends show that data generation using public partnership (also called citizen-assisted systems) is an increasingly popular and effective technique. Most of the physicochemical parameters can be recorded in the presence of water. A guest user of the software can record the presence or absence of water on or close to the SPs. Although such data has low significance in the system, this feedback is very useful in validating the data. The software maintains different Boolean flags to identify the mode of data generation. Among these, a guestuser flag set to TRUE means that the data was generated by a guest user in public-partnership mode. When a guest user generates data, the software automatically sets the guestuser flag to TRUE and by default sets the approved flag to FALSE. Data generated by a guest user is not shown to the research team while the approved flag remains FALSE. An administrative user of the software can review such data and set the approved flag to TRUE after ensuring its validity.
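The flag handling described above can be sketched as follows. Only the guestuser and approved flags come from the text; the record layout, the function names and the visibility rule for non-guest records are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    sp_id: int            # study point the record belongs to (hypothetical field)
    water_present: bool   # presence or absence of water reported at the SP
    guestuser: bool = False
    approved: bool = False


def record_guest_observation(sp_id, water_present):
    """Guest data: guestuser flag TRUE, approved flag FALSE by default."""
    return Observation(sp_id, water_present, guestuser=True, approved=False)


def approve(observation):
    """Administrator action once the record has been checked."""
    observation.approved = True


def visible_to_research_team(observations):
    """Guest records stay hidden until approved (assumed rule for other records)."""
    return [o for o in observations if o.approved or not o.guestuser]
```

A guest record therefore appears in the research-team view only after `approve` has been called on it.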

Fig. 4 User registration.



C. Bulk data upload.
This software supports data generation using scientific software; most scientific and modelling software generates data in Excel or CSV (Comma Separated Values) file format. Considering this, the software provides a quick data upload utility (Fig. 7). However, bulk data upload requires authorization from the administrator: data generated using the bulk data upload utility is reflected in the software and becomes available to the research team only if the administrator user approves it.

Fig. 7 Bulk data upload

D. Data share and update.
The developed software ensures all-time availability of the data. Generated data is geo-mapped and thus offers good visualization and look and feel of the data. Using such features, a research-team user can validate and confirm the data. Recorded physicochemical parameters are depicted as information layers (Fig. 8). This ready platform offers good navigation features for the research team and encourages them to generate more data. The shared environment also offers collective visualizations, which enable researchers to study the data and its connectivity. Subsequently, by observing the data and their inter-relationships, the next phase of data can also be generated.

The software has a seamless feature to generate diverse information, which can be used for executing many Research and Development (R&D) projects associated with rivers [6].

Fig. 8 Data shared among the users and update

E. Data Estimation.
The river passes through valleys and dense forest, so data collection at every SP is not possible. This leaves missing data at essential SPs. Such missing entries cause a hurdle, as R&D needs data in complete form. Therefore, such missing data needs to be estimated as well as possible.

The software uses two methods for data estimation: Mean and Maximum Likelihood Estimation (MLE). Depending upon the nature of the data, the user can choose the estimation method. As indicated in Fig. 9, the user can click on the "Data generation" link and then on the "Estimate data" link provided on the page that appears. As shown in Fig. 10, the software can estimate data for a single SP or for multiple SPs. If the user selects the Mean option, the software considers the non-zero values in the given range of SPs, calculates their mean and returns the estimated value (or values) to the single or multiple SPs. Similarly, the software calculates the MLE for every non-zero value available in the given range and returns to the single or multiple SPs the value (or values) for which the calculated MLE is high. While estimating and returning values to the missing SPs, the software gives priority to existing non-zero data and leaves it unchanged during estimation.

Fig. 9 Data estimation

Fig. 10 Mean and MLE options for data estimation

(MEAN) $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

(MLE) $P(x) = \ldots$

where $n$ is the total number of samples, $\pi = 3.14$, and $x$ is the sample value for which the MLE needs to be calculated.
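The two estimation options can be sketched as below. The Mean option follows the description directly. For the MLE option the sketch assumes a normal density whose mean and standard deviation are fitted to the non-zero values, and fills the gaps with the observed value of highest density; the paper's exact likelihood expression is not reproduced here, so this is only one plausible reading. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def estimate_mean(values):
    """Fill zero (missing) entries with the mean of the non-zero entries."""
    known = [v for v in values if v != 0]
    fill = fmean(known)
    return [v if v != 0 else fill for v in values]


def normal_density(x, mu, sigma):
    """Normal probability density; the choice of distribution is an assumption."""
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (sigma * math.sqrt(2 * math.pi)))


def estimate_most_likely(values):
    """Fill zero (missing) entries with the observed value of highest density."""
    known = [v for v in values if v != 0]
    mu, sigma = fmean(known), pstdev(known)
    if sigma == 0:        # all known values are identical
        fill = known[0]
    else:
        fill = max(known, key=lambda v: normal_density(v, mu, sigma))
    return [v if v != 0 else fill for v in values]


# Hypothetical readings along four consecutive SPs; 0 marks a missing value.
readings = [2.0, 0, 4.0, 9.0]
filled_by_mean = estimate_mean(readings)
filled_by_likelihood = estimate_most_likely(readings)
```

Both functions leave existing non-zero values unchanged, matching the priority rule described above, and both raise an error when the range contains no non-zero value.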


F. Scientific and modelling software.
Many free software packages such as Google Earth [7], QGIS (Quantum Geographic Information System) [8] and uDig are available for generating vector layer information. Reference latitudes and longitudes provided by Google Earth are extracted from its KML (Keyhole Markup Language) and KMZ (Keyhole Markup Language Zipped) files. Using such reference coordinates, data is globally positioned on the map and the data-sharing facility is created. Similarly, scientific software such as ArcInfo, ERDAS IMAGINE and some river modelling software can also be used for generating relevant data. Scientific software mostly generates its output file in either CSV or Excel format; hence the bulk data upload utility is already included in the software.

Fig. 11 Data extraction from text file
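Reading reference coordinates from a KML document can be sketched with the Python standard library. The sample document, the placemark name and the function name are hypothetical; the namespace is the one defined for KML 2.2, and the sketch covers only placemarks that contain a Point.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_points(kml_text):
    """Return (name, lat, lon) for every Placemark that holds a Point."""
    root = ET.fromstring(kml_text)
    result = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(".//kml:Point/kml:coordinates",
                                    namespaces=KML_NS)
        if coords is None:
            continue  # placemark without a Point (for example a path)
        # KML stores coordinates as "lon,lat[,alt]".
        lon, lat = (float(part) for part in coords.strip().split(",")[:2])
        result.append((name, lat, lon))
    return result


# Hypothetical one-placemark document.
SAMPLE_KML = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    "<Placemark><name>SP-1</name>"
    "<Point><coordinates>79.10,21.20,0</coordinates></Point>"
    "</Placemark></Document></kml>"
)
points = kml_points(SAMPLE_KML)
```

A KMZ file is a ZIP archive, so its KML member would first be read with the standard `zipfile` module and then passed to the same function.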

G. Data Extraction.
The software provides several features for data extraction and uses generally available free libraries and utilities to extract data from PDF, Word and Excel files. River modelling mostly requires numerical data; hence only structured data is considered for extraction. Within this defined frame, the software uses a customized algorithm to extract data from text and HTML files. To extract data from a text file, the software uploads the text file into its system folder (Fig. 11), collects the width of each column from the user (Fig. 12) and generates an Excel file. Since the software fixes the user-declared width for each column, it replaces a column value with zero if it receives a null string. The software follows the same procedure for every line and extracts data line by line until the end of the file.

Fig. 12 Data extraction from non-delimited text file

Extracting data from the web is more complicated, because web-page designers follow different styles for designing web documents. There are no specific recommendations for designing web pages: the style attributes of a web page can either be provided along with the HTML tags, or a separate CSS (Cascading Style Sheets) file can be used. Accuracy of data extraction cannot be maintained if completely automatic extraction is followed; human intervention is mostly essential to keep control over data extraction. Considering such limitations, a customized algorithm is developed which extracts the data that occurs between the outer opening and closing tags of the HTML (Hypertext Markup Language) element that encloses the data. The algorithm follows the pre-order tree traversal technique and traverses the input file from top to bottom. It ignores HTML tags until the opening tag occurs and then counts the lines that occur between the outer opening and closing tags; these counted lines are the required data. Simple filtration techniques are applied to obtain the essential data, and lastly the output is stored in an Excel file. Images provided in image tags are not counted as structured data; hence image tags are ignored while extracting data. As shown in Fig. 13, the software uploads the HTML file into the system folder, applies the extraction algorithm and stores the output in Excel file format. Generally available features of Microsoft Excel are also utilized for filtering and tuning the extracted data, and thus data is generated.

Fig. 13 Data extraction from HTML file

III. OTHER UTILITIES OF THE SOFTWARE

A. User management
The software provides a user management module and maintains three user levels: administrator, research-team and guest-user. The administrator has all privileges and manages all activities in the software. A research-team user is privileged to generate data through all possible means. A guest user has limited features and can contribute data point by point only.
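The text-file extraction of Section II-G (user-declared column widths, null strings replaced with zero, line-by-line processing) can be sketched as follows. The function names are hypothetical, cells are assumed to be numeric as the paper restricts extraction to numerical data, and writing the rows to an Excel file is left out because it would need a third-party library.

```python
def split_fixed_width(line, widths):
    """Cut one line into columns of the declared widths; empty cells become 0."""
    row, start = [], 0
    for width in widths:
        cell = line[start:start + width].strip()
        row.append(float(cell) if cell else 0.0)
        start += width
    return row


def extract_fixed_width(text, widths):
    """Apply the same split to every non-blank line until the end of the text."""
    return [split_fixed_width(line, widths)
            for line in text.splitlines() if line.strip()]


# Hypothetical file content: three columns of widths 3, 4 and 4 characters;
# the second line has an empty middle column.
sample_text = ("  1" + " 7.2" + " 3.5" + "\n"
               + "  2" + "    " + " 4.1" + "\n")
rows = extract_fixed_width(sample_text, [3, 4, 4])
```

A non-numeric cell raises `ValueError` in this sketch; a production routine would need to decide how to report such lines to the user.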



B. Public portal
Generally, research activity takes significant time to reach the public domain. The public portal of the software immediately disseminates approved data generated by the research team. Using the "Layers" link, the recorded physicochemical factors can be observed against their permissible levels at different SPs (Fig. 16). This feature helps common people observe water quality at different geographical locations before using river water for drinking and domestic purposes, and thus helps in health management (Fig. 15). As shown in Fig. 16, the software provides several tools for observing geo-mapped data. The software introduces all its features from the public portal: it depicts basic information about the Kanhan River on its home page, recorded water quality can be observed using the "Layers" link, data can be analysed using the "Reports" link, a new guest user or research-team user can register using the "Register" link and, lastly, a user can log in using the "Sign-in" link, as shown in Fig. 14.

Fig. 16 Various tools provided for observing and monitoring geo-mapped data. Tools labelled in the figure: basic layers (Aerial, Aerial + Road, Road, Road + greenery); layer type (Tiled and Untiled); refresh map data; application layers or system layers; year-wise data filter; refresh map; position to start point; position to center point; position to end point; tour of river path; rotate map clockwise or anti-clockwise; recorded information; filter data layers.
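Comparing recorded values with permissible levels at each SP can be sketched as below. The data layout and function name are hypothetical, and the numbers in the usage example are placeholders, not regulatory limits. The sketch handles upper limits only; parameters with a minimum level or an allowed range would need additional rules.

```python
def flag_exceedances(readings, limits):
    """Return (sp_id, parameter, value, limit) for readings above their limit.

    `readings` maps sp_id -> {parameter: value}; `limits` maps
    parameter -> maximum permissible value. Parameters without a
    configured limit are skipped.
    """
    flagged = []
    for sp_id, values in sorted(readings.items()):
        for parameter, value in sorted(values.items()):
            limit = limits.get(parameter)
            if limit is not None and value > limit:
                flagged.append((sp_id, parameter, value, limit))
    return flagged


# Placeholder values for illustration only.
example_limits = {"COD": 10.0}
example_readings = {1: {"COD": 8.0}, 2: {"COD": 12.5, "TC": 3.0}}
flagged = flag_exceedances(example_readings, example_limits)
```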

C. Data analysis
River study requires year-wise data, including data for consecutive years. To cater to this need, the software makes it possible to generate large amounts of data using different methodologies. When large amounts of data are generated, it is also essential to analyse the generated data periodically. Considering this, the software provides user-friendly data analysis tools under its "Reports" link. As shown in Fig. 17, the software provides column, line, scatter, spline, area and bar graphs for the selected information layer and the selected year against the SPs.

Fig. 14 Login screen of the public portal

Fig. 15 Noted sample DO, SAR and COD values close to Bichuwa Baggu village, Madhya Pradesh, India

Fig. 17 Column graph indicates COD for the year 2017
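Selecting the series behind such a report (one information layer, one year, values against SPs) can be sketched as follows. The record fields and the function name are assumptions, and the sample values are placeholders.

```python
def series_for_report(records, layer, year):
    """Return SP ids and values for one layer and year, ordered by SP id."""
    selected = sorted((r["sp_id"], r["value"]) for r in records
                      if r["layer"] == layer and r["year"] == year)
    sp_ids = [sp for sp, _ in selected]
    values = [value for _, value in selected]
    return sp_ids, values


# Placeholder records for illustration only.
sample_records = [
    {"sp_id": 2, "layer": "COD", "year": 2017, "value": 12.5},
    {"sp_id": 1, "layer": "COD", "year": 2017, "value": 8.0},
    {"sp_id": 1, "layer": "DO", "year": 2017, "value": 6.1},
    {"sp_id": 1, "layer": "COD", "year": 2016, "value": 7.4},
]
sp_ids, values = series_for_report(sample_records, "COD", 2017)
```

The two returned lists can be passed as the x and y data of any charting library to draw the column, line or scatter graphs mentioned above.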


IV. RESULT AND DISCUSSION
Rivers follow long paths, so it is essential to monitor human activities along the river path to protect rivers from environmental damage. Many research bodies have come forward to conduct research on these rivers. For such R&D, meeting the data requirement of so large a geographical area is a big challenge [9]. The introduced software uses all possible means of data generation, which ultimately supports the research team in planning mega research projects. The software provides features to create several information layers for the Kanhan River. Out of these information layers, Fig. 18 shows a scatter graph of COD values for the year 2017 against SPs, Fig. 19 shows a column graph of COD values for the year 2017 against SPs and Fig. 20 shows hardness of water for the year 2017 against SPs. Similarly, Fig. 21 and Fig. 22 show the overall geo-mapped data generated using the different supported data generation techniques.

Due to sudden urbanisation, many Indian rivers are ecologically stressed. To conserve these rivers, it is essential to address river issues in the public domain. It is also essential to generate river water quality data and disseminate it to common people through the easiest sharing medium, so that they can observe water quality at different geographical locations before using the water. Considering future requirements, data generated using such web-based software can be an effective basis for policy decisions, health management and environmental protection.

Fig. 18 Scatter graph indicates COD for the year 2017

Fig. 19 Column graph indicates COD for the year 2017

Fig. 20 Column graph indicates Hardness of water for the year 2017

Fig. 21 Geo-mapped data close to Gosekhurd Dam

Fig. 22 Complete generated data close to Gosekhurd Dam


V. CONCLUSION AND FUTURE SCOPE
Considering the volume of data to be generated, many R&D organisations are adopting different methodologies for data generation instead of relying only on the conventional method; so does this software. The techniques used have wide scope in conducting mega research projects, as they can generate the enormous data required to study a large geographical area. Recent environmental threats such as air pollution, climate change, biodiversity loss and river water pollution have significantly affected the health of living organisms. Many deadly diseases have emerged, and health management has become very challenging. Given this alarming situation, it is essential to generate sufficient data that can be utilised for environmental protection, public awareness and health management.

Abbreviations and Acronyms
SPs Study Points
ESP Exchangeable Sodium Percentage
FC Faecal Coliform
HC Hydrogen Carbonate
RSC Residual Sodium Carbonate
SAR Sodium Adsorption Ratio
TC Total Coliform
pH Potential of Hydrogen
BOD Biochemical Oxygen Demand
COD Chemical Oxygen Demand
GIS Geographic Information System
DO Dissolved Oxygen
CSV Comma Separated Values
R&D Research and Development
MLE Maximum Likelihood Estimation
QGIS Quantum Geographic Information System
KML Keyhole Markup Language
KMZ Keyhole Markup Language Zipped
GPS Global Positioning System
HTML Hypertext Markup Language
CSS Cascading Style Sheets

ACKNOWLEDGMENT
I wish to express my sincere gratitude to Dr Rakesh Kumar, Director, CSIR-NEERI, for providing me an opportunity to complete my Ph.D. work. I sincerely thank Dr Girish S. Katkar, Professor and Head, Department of Computer Science and Application, Art, Commerce & Science College, Koradi, Nagpur, for his guidance and support in carrying out this research work. I sincerely thank Dr R. B. Biniwale, Sr. Principal Scientist and Head, Cleaner Technology and Modelling Division, NEERI, for his encouragement and support in carrying out the Ph.D. work. Lastly, I would like to express my gratitude to the scientists and engineers working with me for their guidance and support.

REFERENCES
[1] Dr. G. K. Khadse, P. M. Patni, P. S. Kelkar, S. Devotta, Qualitative evaluation of Kanhan River and its tributaries flowing over central Indian plateau, Environ Monit Assess. 2008 Dec; 147 (1-3):83-92. Epub 2007 Dec 22.
[2] Bing Map, attributed for its free services given for satellite image and road network maps on which vector layers are imposed.
[3] Thorat Prerana B. and Charde Vijay N., "Physicochemical Study of Kanhan River Water Receiving Fly Ash Disposal Waste Water of Khaperkheda Thermal Power Station, India", International Research Journal of Environment Sciences, Vol. 2(9), 10-15, September (2013).
[4] Tomcat 8.0, attributed as tomcat web server is used for publishing webpages.
[5] GeoServer 2.11, attributed as customized information layers are published using services of GeoServer 2.11.
[6] T. P. Latchoumi, D. Premanand, "An Approach for Quality Management Systems Based On Data Warehouses", International Journal of Scientific Research in Computer Science Applications and Management Studies, Volume 3, Issue 2 (March 2014).
[7] Google Earth, attributed for its free service, digitized vector layer on satellite image, creating KMZ / KML.
[8] QGIS, attributed for generating shape file from KMZ / KML.
[9] M. Keerthana, "A Survey on Employee Performance Prediction using Data mining", International Journal of Scientific Research in Computer Science Applications and Management Studies, Volume 3, Issue 4 (July 2014).
