Applications of Artificial Intelligence in River Quality Surveys
Research and Development Project Record E1/i621/6

W J Walley and R W Martin

Research Contractor: School of Computing, Staffordshire University

ENVIRONMENT AGENCY

All pulps used in the production of this paper are sourced from sustainably managed forests and are elemental chlorine free and wood free.

Further copies of this report are available from:
Environment Agency R&D Dissemination Centre, c/o WRc, Frankland Road, Swindon, Wilts SN5 8YF
tel: 01793-865000 fax: 01793-514562 e-mail: [email protected]

Publishing Organisation:
Environment Agency, Rio House, Waterside Drive, Aztec West, Almondsbury, Bristol BS32 4UD
Tel: 01454 624400 Fax: 01454 624409

TH-6/98-B-BCMP

© Environment Agency 1998

All rights reserved. No part of this document may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior permission of the Environment Agency.

The views expressed in this document are not necessarily those of the Environment Agency. Its officers, servants or agents accept no liability whatsoever for any loss or damage arising from the interpretation or use of the information, or reliance upon views contained herein.

Dissemination status
Internal: Released to Regions
External: Released to the Public Domain

Statement of use
This document contains supporting technical information for two Technical Reports (E12 “Distribution of Macroinvertebrates in English and Welsh Rivers based on the 1995 Survey”, and E52 “Applications of Artificial Intelligence for the Biological Surveillance of River Quality”) that were produced as part of National R&D Project E1/i621. It deals primarily with the validation of the data used in the project, and is provided for use by those wishing to examine this aspect of the project in detail.
Research contractor
This document was produced under R&D Project E1/i621 by:
School of Computing, Staffordshire University, The Octagon, Beaconside, Stafford ST18 0AD
Tel: 01785 353510

Environment Agency’s Project Manager
The Environment Agency’s Project Manager for R&D Project E1/i621 was:
Dr John Murray-Bligh - Environment Agency, Thames Region

CONTENTS

Executive Summary
1. Introduction
1.1 Scope of the Document
1.2 Summary of Events
2. Validation of the 1990 Data
2.1 Introduction
2.2 Analysis of Biological Databases
2.3 Analysis of Chemical Databases
2.4 Biological-Chemical Site Matches
3. Validation of the 1995 Data
3.1 Introduction
3.2 Site Classifiers
3.2.1 Errors and inconsistencies in the TAXA field
3.2.2 Missing values in the alkalinity field
3.2.3 Duplicate samples
3.3 The Sites Database
3.4 The Taxonomic Database
3.5 Identification of Erroneous Environmental Variables
3.6 Corrections to Abundance Data in the 1995 Survey Databases
3.7 Construction of the Project Databases
3.7.1 Biological Databases
3.7.2 Biological-Chemical Databases
4. References

APPENDIX A Tables Relating to the Validation of the 1990 Survey Data
A1 Details of biological databases acquired
A2 Details of chemical databases acquired
A3 List of taxa used in analysis of 1990 Survey data
A4 Summary of validation analysis of the biological site database
A5 Summary of STRETCH.DBF structure by Region
A6 Summary of analysis of chemical STRETCH.DBF databases
A7 Summary of analyses of chemical stretches and biological sites databases

APPENDIX B Tables Relating to the Validation of the 1995 Survey Data
B1 Details of the regional distribution of ‘imposters’ in the 1995 database
B2 Site Reference by Region of Sites with no Alkalinity Value
B3 Sites with differing sample alkalinities, but valid averages
B4 Details of sites where more than two samples have been recorded
B5 Out-of-region grid reference errors
B6 Taxa requiring the use of six-figure Thames codes
B7 Taxa requiring the union of two four-figure codes
B8 List of 99 taxa to be incorporated in the Project Databases
B9 Details of ‘Errors’ identified by autoassociative neural network
B10 Number of records in M2R95 and N2R95 files
Figures
B1 Distributions of coefficient “a” for North West (Northern & Southern Areas)
B2 Distributions of coefficient “a” for North West (Central Area)

APPENDIX C List giving Details of the Validated Sites included in the Project Databases
Anglian Region
North East Region - Northumbria Area
North West Region
Midland Region
Southern Region
South West Region - Devon and Cornwall Areas
Thames Region
Welsh Region
South West Region - North Wessex and South Wessex Areas
North East Region - Dales and Ridings Areas

APPENDIX D List of Rejected Biological Sites and the Reasons for their Rejection
Anglian Region
Midland Region
North East Region - Dales and Ridings Areas
North East Region - Northumbria Area
North West Region
Southern Region
South West Region - Devon and Cornwall Areas
South West Region - North Wessex and South Wessex Areas
Thames Region
Welsh Region

EXECUTIVE SUMMARY

This Project Record provides a detailed account of work carried out on National R&D Project E1/i621 “Applications of Artificial Intelligence in River Quality Surveys” that was not reported in either Technical Report E12 “Distribution of Macroinvertebrates in English and Welsh Rivers based on the 1995 Survey” or Technical Report E52 “Applications of Artificial Intelligence for the Biological Surveillance of River Quality”. It mainly covers details of the data validation aspect of the project, but also includes a complete summary of events and dates (Section 1).

The data validation component of the project was more substantial than originally envisaged. This was because the initial intention of basing the study on the 1990-94 Survey databases was abandoned when the 1995 Survey databases became available. Thus many of the initial analyses, which were primarily concerned with the validation of the 1990 databases, were repeated later for the 1995 data.

Section 2 and Appendix A cover all aspects of the validation of the 1990-94 biological and chemical databases, including summaries of the various analyses, brief details of matched biological and chemical sites and an outline of the problems encountered due to the lack of consistency between regional chemical databases.

Section 3 describes the various analyses that were carried out for the validation of the 1995 databases, and includes details of how the abundance data from two regions, Midlands (Lower Severn Area) and North West, were manipulated to bring them in line with the national scale for abundance data.
Appendix B gives details of errors found in the biological data, plus the agreed list of taxa and details of the regional distribution of validated biological sites.

Appendix C gives the site reference number, location and name of all validated biological sites that were included in the project database. It also indicates which of the sites had matched chemical sites.

Appendix D lists all biological sites that were rejected during the validation process due to errors or inconsistencies in their data fields. The reasons for rejection are also given.

This document will be of interest to users and managers of the Agency’s databases. It will be of particular value to anyone seeking to redesign or improve the databases.

Keywords: River Surveys, England, Wales, data validation, biological database, chemical database, errors, bio-monitoring.

1. INTRODUCTION

1.1 Scope of the Document

This document provides an overall outline of the project events and milestones and a detailed record of those aspects of the project that were not fully recorded in either Technical Report E12 “Distribution of Macroinvertebrates in English and Welsh Rivers based on the 1995 Survey” (Walley and Martin, 1997) or Technical Report E52 “Applications of Artificial Intelligence for the Biological Surveillance of River Quality” (Walley et al., 1998).

It was the original intention to base the project on the 1990 biological and chemical databases, and much work was done to validate and match these databases. However, delays and problems arising from this work made it both possible and more attractive to base the project on the 1995 database. Thus it became necessary to repeat the data validation process for the 1995 Survey data. These validation exercises, carried out on the 1990 and 1995 National Survey databases, are fully described below.
In addition, a complete listing is given in Appendix C of the 6038 sites that made up the final 1995 database of validated biological sites that were used in the project. Also given, in Appendix D, is a list of the 675 sites that were rejected from the database, together with the reasons for their rejection.

1.2 Summary of Events

The following schedule of events gives details of the activities undertaken and principal milestones in the project.

1 Oct 95 Project commenced.

Validation of 1990 data
Acquisition of biological and chemical databases.
Compiled agreed list of taxa.
Identified Maitland code errors in databases.
Identified use of erroneous abundance scales by two regions.
Identified missing/erroneous data in the biological NNNSITE database.
Identified errors/problems in chemical STRETCH databases.