Explaining eco-informatics and the ecological data chain

by Avinash Chuntharpursat, information management scientist, SAEON

Data management can be a daunting task, especially for those new to the field of eco-informatics.

To start this process, a definition of eco-informatics needs to be looked at. The following is an explanation from Wikipedia:

“Eco-informatics, or ecological informatics, is the science of information (informatics) in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes eco-informatics” [1].

“A few definitions have been circulating, mostly centered on the creation of tools to access and analyse natural system data. However, the scope and aims of eco-informatics are certainly broader than the development of pedestrian metadata standards to be used in documenting datasets. Eco-informatics aims to facilitate environmental research and management by developing ways to access and integrate databases of environmental information, and to develop new algorithms enabling different environmental datasets to be combined to test ecological hypotheses.

Eco-informatics characterises the semantics of natural system knowledge. For this reason, much of today’s eco-informatics research relates to the branch of computer science known as “knowledge representation”, and active eco-informatics projects are developing links to activities such as the Semantic Web [1].”

Keeping this explanation in mind, it emerges that eco-informatics deals with the complexities of handling ecological and environmental data. Fig. 1 is a flowchart of the ecological data handling process.

Fig. 1: Flowchart of the ecological data handling process. Steps shown: planning of experimental/observational trial; methodology used; stringency in collecting data; accuracy of electronically capturing data; dexterity in preserving data; accuracy and relevancy of the data analysis. Side labels: quality assurance, metadata, quality control.

Handling these data can be daunting because the data process chain is complex: it involves the planning of experiments and observations, the methodology used, stringency in collecting data, accuracy in electronically capturing data and dexterity in preserving data. The chain also includes the analysis and reuse of data to yield higher-level information products, and the accuracy and relevancy of that analysis.

From Fig. 1, three important components emerge from the data chain. These are the metadata, quality control and quality assurance. Metadata is the data, or information, about the data. A good metadata record is needed along each step of the data chain. The entire history of the development of the dataset should be recorded and tracked in the metadata. A suitable standard for the capture and storage of metadata should be used. The Ecological Metadata Language (EML) is one such standard that can be used for ecological metadata. More information on EML and metadata can be accessed from the following sites:

• www.ecoinformatics.org
• www.eepublishers.co.za/view.php?sid=15047

The other two components are quality control (QC) and quality assurance (QA). In the diagram, QA and QC are associated with different steps of the data chain. Quality assurance is largely associated with the data creation (such as planning and methodology) and analysis steps, while quality control is largely associated with the data capturing and archiving steps. For a better understanding of why and how this occurs, a closer look at the definitions of QA and QC is needed.

For the purposes of this exercise, the definitions of QA and QC used by the Intergovernmental Panel on Climate Change (IPCC) are found to be suitable. These definitions are based on greenhouse gas emissions but can be applied in a broader ecological context. The definitions from Chapter 8 of the “IPCC Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories” are as follows:

“Quality control (QC) is a system of routine technical activities, to measure and control the quality of the inventory as it is being developed. The QC system is designed to:

(i) Provide routine and consistent checks to ensure data integrity, correctness, and completeness;
(ii) Identify and address errors and omissions;
(iii) Document and archive inventory material and record all QC activities.

QC activities include general methods such as accuracy checks on data acquisition and calculations and the use of approved standardised procedures for (emission) calculations, measurements, estimating uncertainties, archiving information and reporting. Higher tier QC activities include technical reviews of source categories, activity and (emission) factor data, and methods.

Quality assurance (QA) activities include a planned system of review procedures conducted by personnel not directly involved in the inventory compilation/development process. Reviews, preferably by independent third parties, should be performed upon a finalised inventory following the implementation of QC procedures. Reviews verify that data quality objectives were met, ensure that the inventory represents the best possible estimates of emissions and sinks given the current state of scientific knowledge and data available, and support the effectiveness of the QC programme” [3].

For the implementation of QC activities, various software packages are available for particular fields of research. These packages aid in the identification of missing values and other errors such as statistically significant outliers. However, a strong understanding of the nature of the data is a prerequisite for quality control activities.

In the definition of QA, reference is made to reviews by a third party. This is particularly relevant to the SAEON situation.
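As a side illustration of the routine QC checks mentioned above (missing values and statistical outliers), the following is a minimal Python sketch, not a description of any particular QC package. The function name `qc_check`, the sample readings and the choice of the interquartile-range (Tukey fence) rule with multiplier `k = 1.5` are all assumptions made for this example; a real dataset may call for a different outlier test chosen with knowledge of the data.

```python
import math
import statistics


def qc_check(values, k=1.5):
    """Return the positions of missing values and of IQR-rule outliers.

    `values` is a list of readings; None or NaN marks a missing value.
    `k` is the fence multiplier (1.5 is a common default, assumed here).
    """
    missing = [i for i, v in enumerate(values)
               if v is None or (isinstance(v, float) and math.isnan(v))]
    skip = set(missing)
    present = [(i, v) for i, v in enumerate(values) if i not in skip]

    outliers = []
    if len(present) >= 4:  # too few points make quartiles meaningless
        q1, _, q3 = statistics.quantiles([v for _, v in present], n=4)
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        outliers = [i for i, v in present if v < low or v > high]

    return {"missing": missing, "outliers": outliers}


# Hypothetical daily air temperatures (deg C) from one observation site.
readings = [18.2, 18.9, None, 19.4, 18.7, 91.0, 19.1, float("nan"), 18.5]
report = qc_check(readings)
print(report)
```

The flagged positions are only candidates for correction: whether a value such as 91.0 is a capture error or a genuine extreme still has to be judged by someone who understands the data, and every such decision belongs in the metadata record.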
The SAEON nodes, which conduct research and observations in different bioclimatic regions of the country, have node liaison committees. These committees are well placed to act as independent auditors of data quality. Many data-producing organisations have such technical committees, which could likewise play an important role in ensuring that their organisations' data is of the highest standard.

References

[1] Wikipedia: http://en.wikipedia.org/wiki/Ecoinformatics
[2] Ecological Circuits, Issue 1, 2008. SAEON / EE Publishers. www.eepublishers.co.za/view.php?sid=15047
[3] Good Practice Guidance and Uncertainty Management in National Greenhouse Gas Inventories, 2000. Intergovernmental Panel on Climate Change (IPCC). www.ipcc-nggip.iges.or.jp/public/gp/english/
