
Volume 6 06/04/00 11:59 Page 1

BCIS BIODIVERSITY CONSERVATION INFORMATION SYSTEM

Framework for Information Sharing Series Editor John R. Busby

VOLUME 6: STANDARDS & QUALITY ASSURANCE

©2000. Biodiversity Conservation Information System.

Reproduction of this publication for educational or other non-commercial purposes is authorised without prior permission from the copyright holders, provided the source is acknowledged. Reproduction for resale or other commercial purpose is prohibited without the prior written permission of the copyright holders.

The views expressed in this handbook do not necessarily reflect those of individual BCIS Members or Partners. The designations of geographical entities in this report and the presentation of the material do not imply the expression of any opinion whatsoever on the part of BCIS or its Members, or other participating organisations concerning the legal status of any country, territory, or area, or of its authorities, or concerning the delimitation of its frontiers or boundaries.

Citation: Biodiversity Conservation Information System. 2000. Framework for Information Sharing: Standards & Quality Assurance. Busby, J.R. (Series Editor). Available from Program Manager, BCIS [contact details on http://www.biodiversity.org]

Every reasonable care has been taken to check that all web addresses are correct at the time of writing (December 1999). Regrettably, sites and pages change over time, so some addresses may no longer be correct.


CONTENTS

Acknowledgements

BCIS Framework for Information Sharing

Manager’s Guide
    Context
    Actions

1. Introduction
    1.1 The Purpose of Standards
    1.2 Standards Development
    1.3 Quality
    1.4 Quality Assurance
    1.5 Quality Control

2. Data Creation and Capture
    2.1 Attribute Selection
    2.2 Data Formats
        2.2.1 Numeric Data
        2.2.2 Categoric Data
        2.2.3 Text
        2.2.4 Spatial Data
        2.2.5 Images
        2.2.6 Sounds
        2.2.7 Case Example
    2.3 Sampling Design

3. Data Organisation
    3.1 Maintenance and Updating
    3.2 Archiving and Security

4. Data Access

5. Information Dissemination
    5.1 World Wide Web
        5.1.1 Currency of Information
        5.1.2 Presentation of Information
        5.1.3 HTML Metadata
    5.2 Compact Disc Read-Only Memory (CD-ROM)
    5.3 Maps

6. Thematic Standards
    6.1 Taxon Names
        6.1.1 Accepted Names
        6.1.2 Synonyms, Homonyms and Invalid Names
    6.2 Georeferences
        6.2.1
        6.2.2 Country Codes
        6.2.3 Other Systems
    6.3 Biological Collections Databases
        6.3.1 ASC Biological Collections Reference Model
        6.3.2 CDEFD Information Model for Biological Collections
        6.3.3 CRIS: Collections and Research Information System
        6.3.4 HISPID: Herbarium Information Standards and Protocols for Interchange of Data
        6.3.5 ITF: International Transfer Format for Botanic Garden Plant Records
    6.4 Spatial Data
        6.4.1 Vegetation
        6.4.2 Soils
        6.4.3 Land Cover
        6.4.4 Transfer Standards
    6.5 Thesauri


ACKNOWLEDGEMENTS

These handbooks have been loosely modelled on the WCMC Handbooks on Biodiversity Information Management, edited by Jake Reynolds of the World Conservation Monitoring Centre and published by the Commonwealth Secretariat, London. These, in turn, benefited from experiences gained through the Biodiversity Data Management (BDM) Project, administered by the United Nations Environment Programme (UNEP) and funded by the Global Environment Facility (GEF), and related initiatives supported through the European Union (EU) and European Environment Agency (EEA).

Fundamental to the development of this series have been the contributions from numerous colleagues working in the field of biodiversity information management. Among these, particular mention goes to Dr Jake Reynolds and other staff of WCMC; Professor Ian Crain and Gwynneth Martin of the Orbis Institute, Ottawa; Kevin Grose, former Head of the Information Management Group at IUCN–The World Conservation Union; and staff of the Environmental Resources Information Network (ERIN) Unit, Environment Australia, and of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia.

This Handbook has been adapted from World Conservation Monitoring Centre. 1998. WCMC Handbooks on Biodiversity Information Management. Volume 7: Data Management Fundamentals. Reynolds, J.H. (Series Editor). Commonwealth Secretariat, London. ix + 33pp.

The BCIS consortium Members extend their thanks and gratitude to the Norwegian Agency for Development Cooperation (NORAD) for providing financial support to develop the BCIS concept and these handbooks.


BCIS FRAMEWORK FOR INFORMATION SHARING

The purpose of the BCIS Framework for Information Sharing is to support BCIS Members and others making decisions on the conservation and sustainable use of living resources. The handbooks form part of a comprehensive set of supporting materials designed to build information-management capacity and improve decision-making.

The intended audience includes senior managers in Member organisations, their equivalents in other organisations, information and environmental-science professionals, and others who have a stake in the use or management of living resources. Although written to address the specific need for improved management of biodiversity-related information within the BCIS network, the underlying principles apply to environmental information networks in general, and to decision-making at all levels. The issues and concepts presented may also be applied in sectors other than biodiversity conservation: forestry, agriculture, wildlife management and beyond.

The handbooks deal with a range of issues and processes relevant to the use of information in decision-making, including the strengthening of organisations and organisational linkages, data custodianship and management, metadata and the development of infrastructure to support data and information exchange.

Experience suggests that some of the greatest challenges in information management today are concerned with organisational issues, rather than technical or scientific concerns. Consequently, topics are addressed at management and organisational levels, rather than from a technical or scientific standpoint. Nevertheless, in adopting this framework approach, BCIS has tried to adhere to recognised conventions and formalisms used in information management.
Overall, the handbook series comprises:
• Executive Overview
• Volume 1: Principles
• Volume 2: Procedures Manual
• Volume 3: Custodianship
• Volume 4: Data Access
• Volume 5: Metadata
• Volume 6: Standards & Quality Assurance
• Volume 7: Core Datasets
• Volume 8: Tools & Technologies


Collectively, the handbook series promotes a shift from tactically based information systems, aimed at supporting individual projects, to strategic systems that promote the development of information infrastructure through the building of capacity within BCIS and other networks. This approach not only encourages data to be managed more effectively within organisations, but also encourages data to be shared amongst organisations for the development of the integrated products and services needed to address complex and far-reaching environmental issues.

The handbook series can be used in a number of ways. Individual handbooks can be used to guide managers and professional staff on specific aspects of information management, or they can be used collectively as a reference source for strategic planning and project development.

This Handbook assesses the objectives for standards and for quality assurance/quality control. It reviews standards applicable to the various stages of data and information management and for a number of the data themes relevant to the conservation of biological diversity.


MANAGER’S GUIDE

Context

As the ‘Information Age’ matures, corporate performance, not to mention reputation, will increasingly hinge on the quality and relevance of information. Information will increasingly be regarded as a vital corporate asset, whether it is used largely internally or marketed externally. As a consequence, corporate investment in data and information infrastructure, already substantial, can be expected to increase.

The temptation, driven by ‘more urgent’ priorities, is to minimise attention paid to and thus investment in standards compliance, quality assurance and documentation. Economies in these areas will prove false. Taking a short-term view of the ‘value’ of data collection and management activities can prove very costly in the long term. It can result in inefficiencies such as duplication of effort, overlapping or confused responsibilities towards valuable corporate assets, failure to recognise that information resources are of poor quality, depreciating or contain important gaps, or even that they are no longer relevant.

Actions

Management needs to protect and manage investment in data and information resources. With regard to standards and quality issues, management needs to:
• ensure that a long-term, multiple-use view of data and information resources is taken;
• commit to best practice in the development and management of corporate data and information assets;
• mobilise the staff and financial resources required; and
• control the costs of development of and compliance with standards.


1. INTRODUCTION

1.1 The Purpose of Standards

Standards¹ are necessary for:
• interworking, i.e. people and information systems working together to perform some task. Standards are essential to define the interfaces between the task components. This implies that there is some difference between the components that, in the absence of common standards, would make it unlikely that they could be used together. Examples include man-machine interfaces and data exchange standards;
• portability, i.e. the ease with which a piece of software or a data format can be made to run on a new platform; and
• reusability, i.e. using data or software developed for one application in another application.

There may be de facto standards for various communities, or officially recognised national or international standards. Unfortunately, competing standards become a source of confusion, division, obsolescence and duplication of effort instead of an enhancement to the usefulness of products.

Standards are thus not an end in themselves; they are a means to an end. One important purpose of standards is to minimise the transaction costs of using data. Standards, therefore, need to be relevant to the end purpose, for example supporting data exchange. This implies that some cost-benefit evaluation should be applied to investment in standards-setting processes. For low-volume, low-value data transfers, little if any investment should be made in developing standards. The converse is, of course, manifestly true: high-volume, high-value transfers justify substantial investment. One needs only to look at the TCP/IP standard for the Internet. Whatever investment was made to develop that standard has been repaid many times over (although perhaps not directly to those who made that investment).

1 Descriptions of this and other terms derived from The Free On-line Dictionary of Computing (FOLDOC), Editor Denis Howe.

1.2 Standards Development

While there are obvious technical dimensions to standards setting, the process necessarily involves people and, consequently, is invariably complex and time-consuming. People, in general, have very mixed feelings about standards. Some believe standards are both necessary and desirable and are fully supportive of or contribute to standards-setting processes; others accept the former but only reluctantly the latter, and try to minimise their involvement with standards; while yet others accept neither and actively avoid complying with standards.

Standards-setting processes have largely been conducted by and within the first of the above groups. Enthusiasm has often triumphed over common sense, so many of the resulting standards are overly elaborate and difficult to comply with. They have consequently been poorly taken up by the communities for which they were ostensibly developed. The often narrow range of stakeholders involved in standards development results in low ownership of and commitment to the results. In addition, management has often failed to appreciate and engage with the management issues involved. Standards involve cost-benefit trade-offs, which affect corporate performance; management therefore needs to become involved with and actively manage the key issues.

While unstandardised data can be expensive to exchange, costs of complying with standards can also be significant. Where standards are very complex and expensive to meet, management may reach the view that full compliance cannot be justified and that partial compliance, or individual arrangements with suppliers or clients, would be more appropriate. Of course, the more generic the standard, the wider the range of applications it can support.

1.3 Quality

Quality is the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. Quality should not be mistaken for “degree of excellence” or “fitness for use”, which meet only part of the definition.

Although it is only one component of quality, fitness for use is nevertheless a useful measure for datasets and other information resources. A dataset can be of very high value for one purpose, yet virtually useless for another. The key issue with data quality is to document, adequately, the quality assurance process the data have undergone and any standards they purport to meet.


1.4 Quality Assurance

Often abbreviated as ‘QA’, quality assurance is the planned and systematic pattern of all actions necessary to provide adequate confidence that the product optimally fulfils customers’ expectations.

The process begins with data creation and ends with information dissemination to users. Quality-assurance procedures can be applied to all stages. These include procedures to validate, maintain, document and secure data. It is the responsibility of custodians to ensure that these procedures are implemented in line with accepted standards and user demands (see BCIS Handbook: Custodianship).

Within an organisation, quality-assurance procedures should be defined within a quality policy that is well understood by appropriate staff. The policy should set challenging objectives and targets for staff to achieve, such as specific levels of numerical or spatial accuracy in data collection, allowable error rates during validation, or consistent standards of documentation. The targets need to be consistently applied across the organisation and be measurable for monitoring and review.

As well as conducting internal reviews, organisations should seek feedback from users of their products and services. The combination of internal and external reviews allows an organisation to correct deficiencies in data quality and continuously improve its quality-assurance procedures.

1.5 Quality Control

Often abbreviated as ‘QC’, and combined with the above as ‘QA/QC’, quality control is the assessment of product compliance. Independently finding deficiencies assures compliance of the product with stated requirements. QA/QC processes can be considered part of a family of processes ensuring that the organisation can time and time again deliver the products or services that meet the client’s quality requirements (consistent with ISO 9000 and ISO 14000).


2. DATA CREATION AND CAPTURE

2.1 Attribute Selection

An initial challenge is to decide what data to collect. Attribute selection in the first instance will dominate future options for analysis, modelling and interpretation. The significance of this is frequently underestimated and raw data are collected from the field, often at considerable expense, without sufficient planning for their future use. Even more commonly, data are collected with a particular, relatively narrow objective in mind where, with a little more planning and perhaps a marginal increase in cost, the data could serve a much greater range of uses.

Ideally, the individual attributes should be recorded in compliance with some standard but, failing that, in a consistent way. Where there are competing standards, or no standard that is either available or followed, it is vital to be consistent and to thoroughly document the actions taken in the capture process. It is generally far easier to convert a dataset that is recorded consistently, even though not in compliance with any standard(s), than to convert a dataset that purports to meet a standard but does so inconsistently.

2.2 Data Formats⁴

2.2.1 Numeric Data

Primary numeric data range from counts of species in particular locations to measurements of rainfall or tree growth. A count results in a whole number, while a measurement is made on a continuous scale. Data can also be generated by recording machines. It is important to differentiate between zero and ‘value not recorded’: with numeric data, zero can be an actual value.

Derived numeric data are obtained from the manipulation and analysis of other numeric data sets. One example is ranking, where the value of some attribute is determined in relation to other records, rather than to an intrinsic characteristic. Numeric data are used extensively in modelling and in the derivation of categoric data (see below). For example, the temperature, rainfall and altitude of a particular site can be used to place that site within a global ‘life zone’ classification.
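The difference between a true zero and a value that was never recorded, and the idea of a rank as a derived attribute, can be sketched in Python as follows. The site names and counts are invented for illustration, and the use of `None` as the ‘not recorded’ marker is an assumption of this sketch rather than a convention prescribed by the Handbook.

```python
from typing import Optional

# Hypothetical counts of one species at four sites.
# 0 means "surveyed, none found"; None means "value not recorded".
counts: dict[str, Optional[int]] = {
    "site_a": 12,
    "site_b": 0,
    "site_c": None,
    "site_d": 5,
}


def mean_count(values: dict[str, Optional[int]]) -> float:
    """Mean over recorded values only: zeros are included, missing values are not."""
    recorded = [v for v in values.values() if v is not None]
    return sum(recorded) / len(recorded)


def rank_sites(values: dict[str, Optional[int]]) -> dict[str, int]:
    """Derived attribute: rank 1 is the highest count; unrecorded sites stay unranked."""
    recorded = {k: v for k, v in values.items() if v is not None}
    ordered = sorted(recorded, key=lambda k: recorded[k], reverse=True)
    return {site: position for position, site in enumerate(ordered, start=1)}


print(mean_count(counts))
print(rank_sites(counts))
```

If the unrecorded site were silently stored as zero, the mean would fall from 17/3 to 17/4, which is the kind of distortion the distinction is meant to prevent.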

4 Source: Olivieri, S.T., Harrison, J. and Busby, J.R. 1995. Data and Information Management and Communication, pp. 607–670 in Heywood, V.H. (ed.) Global Biodiversity Assessment. Cambridge University Press


2.2.2 Categoric Data

These comprise classified or coded non-numeric data, and include attributes such as soil type, land cover, forest type, or species or protected area designations. These data can generally be verified against a thesaurus or data dictionary comprising a list of allowable values. They are commonly used in association with maps.

Often such categories are erected for a specific purpose and are not necessarily widely shared or understood. Thus other people may have great difficulty understanding what these categories mean in their terms. Primary numeric data should therefore take precedence over classified categoric data, wherever practicable. This will ensure that, if the categories become obsolete or a different classification system is to be introduced, new categories can be readily constructed from the primary data. If the primary data are not available, then expensive data re-collection exercises may become necessary.

A particularly pertinent example is a number of national or even continental projects that have collected data by grid cells, e.g. 10 km grids. The reasons for this have been largely driven by a focus on a particular form of output, generally distribution maps at a predetermined scale, and a desire to reduce the demands on the recorders to precisely locate themselves and thus their records. Some of these projects have run for many years and involved many thousands of recorders. Unfortunately, the millions of records they have collected have proved virtually useless for purposes involving areas smaller than the designated grid cell or those that otherwise do not match the grid cell boundaries, e.g. protected areas or water catchments. If point records had been collected in the first instance, then they could have been allocated to either a grid cell or, just as readily, to any other kind of geographic area. Examples of primary and derived attributes are shown in Table 1.

A particularly difficult example is species name, a derived attribute for which there is no feasible alternative. Most researchers have no option but to use species names (where one is available), even though these change from time to time. It is simply not practicable to store all the myriad primary attributes of an organism so that it can be unambiguously allocated to a taxon whenever its group is revised.
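The grid-cell example above can be illustrated with a minimal Python sketch. It assumes, for simplicity, a grid measured in degrees rather than the kilometre grids mentioned in the text; the function name `grid_cell` and the point coordinates are hypothetical.

```python
import math


def grid_cell(lat: float, lon: float, cell_deg: float) -> tuple[int, int]:
    """Return the (row, column) index of the grid cell that contains a point.

    Cells are cell_deg degrees on a side, counted from latitude -90
    and longitude -180. This is an illustrative grid, not a national standard.
    """
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return row, col


# Hypothetical point records in decimal degrees (latitude, longitude).
records = [(-35.28, 149.13), (-35.31, 149.19), (-35.92, 149.05)]

# The same point records can be allocated to a coarse or a fine grid on demand.
coarse = {grid_cell(lat, lon, 1.0) for lat, lon in records}
fine = {grid_cell(lat, lon, 0.1) for lat, lon in records}

print(len(coarse), len(fine))
```

All three points fall in a single one-degree cell but in three different tenth-of-a-degree cells. Had only the coarse cell been recorded, the finer allocation could not be reconstructed, whereas the point records support both.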

Table 1 Examples of Primary vs. Derived Attributes

Primary                                     Derived

point reference, such as latitude and       grid cells (e.g. 10 km grids),
longitude (increasingly obtained using      administrative zones (e.g. counties)
technologies such as the Global
Positioning System [GPS])

vegetation height in absolute units,        vegetation height as ‘tall’,
e.g. metres                                 ‘medium’ or ‘low’

actual time of observation                  ‘early morning’, ‘dusk’

actual date; start and end dates            ‘summer’
of observation

mean annual temperature                     ‘hot’

2.2.3 Text

Text consists principally of descriptions, including descriptions of species, protected areas and threats. By its nature, text is much less structured than other kinds of data and can become uninformative or even misleading unless careful controls are placed on its compilation through, for example, specification of structure and content, including vocabulary control. Text, when attached to a dataset or map, can provide valuable extra description. The recent availability of ‘hypertext’ links has increased the richness and comprehensiveness of text material.

2.2.4 Spatial Data

Maps have always been a valuable tool for the presentation of information because they provide a ready identification of the nature of the earth’s surface and the relationships among its various features. Essentially, any feature that can be ‘georeferenced’ can be placed on a map. Spatial data include the following:

VECTOR DATA:
• points: species locations, human settlements, etc.; also used for map objects too small to show as a line or polygon feature;
• lines: coastlines, rivers, power lines, etc.; also used for map objects that may be too narrow to display as an area, such as a road, or a feature with no width, such as a contour line;
• polygons: country boundaries, administrative areas, lakes, protected areas, and various classified entities such as vegetation, soil, geology, land use, etc.

RASTER DATA:
• digital elevation models, satellite imagery, classified entities for modelling purposes.

Maps, when well designed, are a useful means of presenting complex information in a format that is readily understood by a broad spectrum of people. The development of computer tools, in particular geographic information systems (GIS), has significantly enhanced our ability to generate, manipulate and analyse spatial data (see also BCIS Handbook: Tools & Technologies).

2.2.5 Images

Photographs and drawings of organisms and parts of their bodies are an essential aid to identification. Photographs of habitats or landscapes, taken periodically, can be used to assess environmental change. Combining images and sound, video adds a dynamic component that allows, for example, the recording of animal interactions or movement patterns in particular habitats. Video can also be used to animate multiple images derived from analysing or modelling environmental measurements or species distributions and thus to gain powerful new insights into environmental driving variables and species responses to them.

2.2.6 Sounds

Sound has particular value in identifying certain species, e.g. of birds or frogs, in habitats where they are difficult to see.

2.2.7 Case Example

A framework for standards applicable to specimens and observations of species collected in the field is shown in Table 2. Note that missing attribute values should be indicated as such, rather than the fields just being left blank.

Table 2 Indicative Attributes and Standards for Species Occurrence Records

Attribute       Standard                 Notes

Record class    type of record           specimen: should include information on the
                                         collection (e.g. museum name or code),
                                         collection identifier and collector(s) name
                                         observation: should include name of observer
                                         literature: should include (link to) full
                                         bibliographic reference

Taxon name      taxon authority list     international list (e.g. Species 2000 or
                                         taxon-specific list); national checklist
                                         (e.g. for Australia: Census of Australian
                                         Vertebrate Species (CAVS) Version 8.1); may
                                         need to include supra- and infra-specific
                                         names to provide further context or
                                         additional detail, or even broader categories
                                         where the taxon cannot be identified with
                                         precision

Georeference    latitude and longitude   universally applicable but needs to be
(geocode)                                recorded consistently (e.g. degrees, minutes,
                                         seconds)

                map grid reference       depends on the mapping system used in the
                                         country concerned and can be very difficult
                                         to convert to other systems, such as
                                         latitude-longitude. The full reference needs
                                         to be recorded, not the more usual
                                         abbreviated one, and full details of the map
                                         co-ordinate system, including origin
                                         references, should be recorded

                customised grid          these are frequently developed to meet
                                         particular project objectives (e.g.
                                         publication of distribution maps at
                                         particular scales), but can be extremely
                                         difficult to convert. Often the grids are so
                                         large that valuable locality details are lost

Locality        named place from         name may need to be further qualified if this
                gazetteer, often         is not unique (e.g. there are many different
                qualified by distance    ‘Sandy Creek’s in the Australian national
                and direction from       gazetteer)
                named place

Date and time   year/month/day           may need to accommodate ranges for extended
                hour:minutes             observation periods

[Note that other secondary supporting attributes will also be required, e.g. identifier(s) name, date of identification, altitude, depth, previous names applied to this record, geographic region (e.g. catchment), administrative region (e.g. county; remember that these boundaries can change over time), and qualifiers on any of the above, e.g. geocode precision.]
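As a rough illustration of how the attributes in Table 2 might be held in software, the following Python sketch defines a record structure in which unrecorded attributes are marked explicitly and latitude and longitude are converted consistently from degrees, minutes and seconds. The class name `OccurrenceRecord`, its field names, the use of `None` as the missing-value marker and the sample values are all assumptions made for this sketch, not part of any standard cited here.

```python
from dataclasses import dataclass, fields
from typing import Optional


def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees, minutes, seconds and a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are stored as negative values.
    return -value if hemisphere in ("S", "W") else value


@dataclass
class OccurrenceRecord:
    record_class: str                  # "specimen", "observation" or "literature"
    taxon_name: str
    taxon_authority_list: str
    latitude: Optional[float] = None   # decimal degrees; None = not recorded
    longitude: Optional[float] = None
    locality: Optional[str] = None
    date: Optional[str] = None         # year/month/day
    time: Optional[str] = None         # hour:minutes

    def missing_attributes(self) -> list[str]:
        """List attributes explicitly marked as not recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


record = OccurrenceRecord(
    record_class="observation",
    taxon_name="Genus species",        # placeholder name
    taxon_authority_list="national checklist",
    latitude=dms_to_decimal(35, 16, 48.0, "S"),
    longitude=dms_to_decimal(149, 7, 48.0, "E"),
    date="1999/12/01",
)
print(record.missing_attributes())
```

Listing the missing attributes by name makes it possible to report ‘locality not recorded’ rather than leaving a reader to guess what a blank field means.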


2.3 Sampling Design

The way in which records are made in the field fundamentally determines the potential uses for the data. Decisions made about sampling strategies will influence which analytical tools can be used.

The selection of a study area is the first important decision. In some cases the problem determines the area, e.g. for a project investigating the distribution of lizard species on a (small) island, the study area is self-evident. However, more commonly, the study area is selected from some larger region. The selection may be made according to some probability sampling scheme, or it may simply reflect the recorder’s view that the study area is in some sense ‘representative’ of the larger region. However it is done, data collection needs to be ‘representative’ of the environmental feature(s) under investigation in order for the analysis to lead to reliable and useful conclusions.

Frequently, a number of samples (e.g. quadrats) are selected to represent a wider area. These quadrats are located according to various schemes: random, stratified random or regular, depending on the issue under consideration, the nature of the environment, logistics, and the proposed analysis tools.

One of the objectives of field data collection is to distinguish any patterns in the distribution of biodiversity, identify their possible causes, and predict future behaviour. Thus, biologists are not so much interested in showing that an observed pattern departs significantly from ‘complete spatial randomness’ as in finding out what any such pattern might mean.
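The three quadrat-location schemes named above can be sketched in Python for a rectangular study area. This is a simplified illustration: the function names are hypothetical, and the stratified scheme here uses equal-sized spatial blocks as strata, whereas field studies more often stratify by environmental classes such as vegetation type.

```python
import random

Point = tuple[float, float]


def random_quadrats(n: int, width: float, height: float, rng: random.Random) -> list[Point]:
    """Simple random scheme: n locations drawn uniformly over the study area."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def regular_quadrats(nx: int, ny: int, width: float, height: float) -> list[Point]:
    """Regular scheme: one location at the centre of each cell of an nx-by-ny grid."""
    dx, dy = width / nx, height / ny
    return [((i + 0.5) * dx, (j + 0.5) * dy) for i in range(nx) for j in range(ny)]


def stratified_random_quadrats(
    nx: int, ny: int, width: float, height: float, rng: random.Random
) -> list[Point]:
    """Stratified random scheme: one random location inside each grid cell (stratum)."""
    dx, dy = width / nx, height / ny
    return [
        (rng.uniform(i * dx, (i + 1) * dx), rng.uniform(j * dy, (j + 1) * dy))
        for i in range(nx)
        for j in range(ny)
    ]


rng = random.Random(42)  # fixed seed so the layout can be reproduced and documented
simple = random_quadrats(4, 100.0, 100.0, rng)
regular = regular_quadrats(2, 2, 100.0, 100.0)
stratified = stratified_random_quadrats(2, 2, 100.0, 100.0, rng)
print(regular)
```

Recording the scheme and the random seed alongside the data is one way to document the sampling design so that later analyses can take it into account.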

2.4 Tools and Processes

Hardware tools for creating information comprise a huge array of technologies, from binoculars and pencil and paper to satellite-borne sensors downlinked to huge data storage facilities. Software tools include word processors, spreadsheets and data-logging software. There are very few standards available, but it is important that these tools allow for automated access to the resulting data (see BCIS Handbook: Tools & Technologies). This will then support automated derivation of elements of descriptive metadata about the underlying data resource (see BCIS Handbook: Metadata).

The capture process includes the first stages of quality assurance and quality control (as defined above). Quality assurance can include the following kinds of tests on the individual attributes of each incoming record:
• have all mandatory attributes been completed;
• does each attribute contain the kind of data that is appropriate for that attribute, i.e. text in text fields, numeric data in numeric fields;
• does the attribute possess an ‘allowable’ value (e.g. a species name from a standard checklist) or does its value(s) fall within a ‘legal’ range (e.g. latitude co-ordinate between 0 and 90° N or 0 and 90° S);
• are the attribute values consistent with other attribute values (e.g. latitude-longitude co-ordinates do not reference the middle of a continent for a marine organism, or a mangrove specimen is not attributed with an altitude value that would place it high up in a mountain range).

Some of these tests can be carried out automatically by database management software, others can be custom built for the particular application, while others may need to be performed by viewing data plotted or printed in various different ways.

It is strongly recommended that an independent peer audit of the dataset be conducted, which could include testing for completeness, attribute accuracy, spatial accuracy and quality of the documentation. The auditor’s report could be made available with the dataset, along with any comments from previous users.

The capture process thus leads to the representation of information in explicit or structured form, enabling it to be available more widely. Each data resource can now be described by a ‘metadata’ locator record, consisting of structured elements with defined semantics (see BCIS Handbook: Metadata).
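The four kinds of attribute test listed earlier in this section (completeness, data type, allowable value or range, and cross-attribute consistency) could be custom built along the following lines. This is a minimal Python sketch under stated assumptions: the field names, the three-species checklist and the 50 m altitude threshold for mangroves are illustrative values chosen for the example, not published limits.

```python
from typing import Any

# Hypothetical reference data for this sketch.
MANDATORY = ("taxon_name", "latitude", "longitude")
NUMERIC = ("latitude", "longitude", "altitude_m")
CHECKLIST = {"Avicennia marina", "Rhizophora stylosa", "Eucalyptus pauciflora"}
MANGROVES = {"Avicennia marina", "Rhizophora stylosa"}
MAX_MANGROVE_ALTITUDE_M = 50.0  # illustrative threshold only


def is_number(value: Any) -> bool:
    """True for int or float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one incoming record (empty if none)."""
    problems: list[str] = []
    # Test 1: all mandatory attributes completed.
    for name in MANDATORY:
        if record.get(name) is None:
            problems.append(f"{name}: mandatory attribute is missing")
    # Test 2: appropriate kind of data in each field.
    for name in NUMERIC:
        value = record.get(name)
        if value is not None and not is_number(value):
            problems.append(f"{name}: expected a number")
    # Test 3: allowable values and legal ranges.
    taxon = record.get("taxon_name")
    if taxon is not None and taxon not in CHECKLIST:
        problems.append("taxon_name: not on the checklist")
    lat, lon = record.get("latitude"), record.get("longitude")
    if is_number(lat) and not -90.0 <= lat <= 90.0:
        problems.append("latitude: outside -90 to 90")
    if is_number(lon) and not -180.0 <= lon <= 180.0:
        problems.append("longitude: outside -180 to 180")
    # Test 4: consistency between attributes.
    altitude = record.get("altitude_m")
    if taxon in MANGROVES and is_number(altitude) and altitude > MAX_MANGROVE_ALTITUDE_M:
        problems.append("altitude_m: implausibly high for a mangrove")
    return problems


good = {"taxon_name": "Avicennia marina", "latitude": -27.4, "longitude": 153.2, "altitude_m": 1.0}
bad = {"taxon_name": "Avicennia marina", "latitude": -127.4, "longitude": None, "altitude_m": 1800.0}
print(validate(good))
print(validate(bad))
```

Returning a list of problems, rather than rejecting the record outright, lets suspect records be passed to an expert for assessment instead of being silently discarded.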


3. DATA ORGANISATION

This includes all activities that classify and categorise information for storage and retrieval purposes. It may include maintenance, archiving and updating sub-processes (see also BCIS Handbook: Metadata).

3.1 Maintenance and Updating

Datasets and other information resources decay and become obsolete unless managed actively. Measurement techniques change, new standards are promulgated and formats, media and other technologies continue to evolve.

Maintenance also includes the incorporation of new data with existing data. Basic tests should be run on data items before they are permanently stored (e.g. before new data items are added to existing datasets). These enable suspect or unusual data items to be identified and brought to the attention of experts for assessment.

Unfortunately, many datasets are created for individual short-term projects without taking account of long-term corporate needs. All too often, once a project is complete, the underlying data are allowed to decay to the point that they are no longer useable. This can be inefficient, with new projects having to repeatedly re-build datasets from scratch. One of the distinguishing characteristics of a professionally managed dataset is that it is maintained not only for immediate uses, but also for other potential applications.
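One possible form of the ‘basic tests’ mentioned above is to compare incoming values with the values already held and to flag those that look unusual. The Python sketch below uses a simple rule (more than three standard deviations from the mean of the existing data); the rule, the function name `flag_suspect` and the rainfall figures are assumptions for illustration, and a real dataset may call for quite different checks.

```python
import statistics


def flag_suspect(existing: list[float], incoming: list[float], n_sd: float = 3.0) -> list[float]:
    """Return incoming values lying more than n_sd standard deviations
    from the mean of the existing values."""
    centre = statistics.mean(existing)
    spread = statistics.stdev(existing)
    low, high = centre - n_sd * spread, centre + n_sd * spread
    return [value for value in incoming if not low <= value <= high]


# Hypothetical annual rainfall totals (mm) already in the dataset, and a new batch.
existing_rainfall = [610.0, 640.0, 655.0, 670.0, 690.0, 700.0, 720.0]
new_rainfall = [680.0, 6800.0, 705.0]

suspect = flag_suspect(existing_rainfall, new_rainfall)
print(suspect)
```

Flagged items are candidates for expert review, not automatic deletion: an unusual value may be a keying error, but it may equally be a genuine extreme.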

3.2 Archiving and Security

A range of operational procedures is necessary to guarantee the security of a dataset. In general, threats to electronic data security tend to be greatest where the physical environment is hostile to computing equipment (e.g. extremes of temperature, high humidity or dust), where electronic interference is strong (e.g. in hospitals, industrial plants, locations near transmitters), where power supplies are uneven or unpredictable, and where informal and therefore virus-prone computer networks are the primary means of data transfer.

It is also important to protect data from accidental erasure, which may occur due to human error in copying and reorganising files, updating records or other ‘maintenance’ procedures. Erasure may also occur due to mechanical failure of disk drives, or logical faults caused by power failures or fluctuations. Computer viruses also pose a threat to data security.


Box 1 describes a number of protective measures which help to combat threats to data security. Such procedures can be elaborated within the overall quality policy of the organisation (e.g. under ISO 9000), or be prepared separately in the form of an operating manual. Specific plans to cope with emergencies should also be considered, for instance hardware malfunction, fire or theft. Organisations should accord a high profile to data security. On occasion, an entire project or programme has been forced to close due to loss of essential data. This occurred once in the South Pacific when a freak wave struck the office of a custodian, eliminating its data. No copy of the data was maintained off-site.

Box 1 Procedures for Protecting Data

• Regular (daily, weekly and monthly) backup of all critical data on removable electronic media (magnetic tape or optical disk).
• Storage of backup media off-site (away from the workplace) in order to restore data after damage or theft of key equipment.
• Periodic test restoration of backed-up data to ensure that the procedure is effective.
• Periodic test recovery from simulated virus attack, hardware malfunction or other disaster.
• Regular virus-checking with up-to-date software.
• Avoidance of unlicensed or borrowed software, computer games or other personal software.
• Power regulation via the use of uninterruptible power supplies, surge protectors and radio interference filters.
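The periodic test restoration listed in Box 1 can be checked mechanically by comparing checksums of the original files with those of the restored copies. The following is a minimal sketch under stated assumptions: the backup has already been restored into a separate directory, and the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_restore(original_dir: Path, restored_dir: Path) -> list:
    """Return relative paths that are missing or differ in the restored copy."""
    failures = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            failures.append(relative.as_posix())
    return failures
```

An empty result suggests that the restore procedure reproduced every file; any listed path indicates that the backup or the restore step needs attention.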


4. DATA ACCESS

This includes all activities that help searchers discover, request and gain access to information resources.

Resource discovery is expedited by directories and metadatabases. A data custodian can develop metadata records to promote those information resources that are potentially available to others. These records can be made accessible on-line or distributed in printed catalogues. If on-line records use international standards, then they can also be made visible through other metadata systems (see BCIS Handbook: Metadata).

Once users discover that an information resource is potentially available, they need guidance on how to request access. Custodians need to develop data access policies that are comprehensive, clear and readily accessible to potential users (see BCIS Handbook: Data Access).

Increasingly, data custodians are allowing direct on-line access to data and information through the Internet. Users may need to complete a registration form before access is granted. This may or may not involve a transfer of funds. For information resources that are sensitive or have commercial implications, some prior negotiation on allowable uses, payment of royalties, transfer to third parties, etc., may be necessary.

Once agreement has been reached on a data exchange, the data need to be made available in a form that the user can readily accept. This process can be expedited significantly if the data format complies with an appropriate exchange protocol (see, for example, 6.3.5).


5. INFORMATION DISSEMINATION

5.1 World Wide Web [5]

5.1.1 Currency of Information

Maintaining information on the Internet is little different to maintaining data anywhere else. There needs to be a consistent and documented effort to maintain the information. This includes keeping information (metadata) with the document, providing details of who owns the document, whether it has an expiry date, etc.

Documents should be regularly checked to make sure that they are still current and relevant. Organisations cited may have changed their name or address, contact persons may have changed, etc. It is important that documents carry a date so that it is obvious to readers when they were valid.

It is also important that files not be unnecessarily moved from one directory to another on your file system. Once a site has been operating for a while you will find that other sites have made links to files or documents that they are interested in, not just to your home page. If you move the location of such a document on your file system, then those links at other sites will no longer work.

5.1.2 Presentation of Information

An Internet service that has a consistent look and feel is easier to navigate through than one that is not consistent. Consistency in layout, use of icons and images, etc., can all add to the standard of the site as a whole. Maintaining written document standards for your site is always worthwhile, especially if you have more than one operator preparing documents. These standards may include template pages and guidelines for development, incorporation and use of icons, images and tables, etc.

Try not to be over-enthusiastic with use of coloured backgrounds, coloured text, etc. Coloured text on a background that looks good on your machine may be unreadable on somebody else’s. Sometimes a map can look good in colour, but may not show sufficient contrast when printed in black and white.

[5] From Environmental Resources Information Network (ERIN), Australia


Try to keep your home page and menu pages short. A very large home page with lots of text interspersed with occasional links to other parts of the site, a very large icon and several images of your site, etc., may look very impressive but may take a long time to load. A simple home page that gives users an overview of the information your site contains will allow them to decide whether or not to go further into your system.

Remember that the aim of your web site is for people to use it. To use it, they must feel comfortable with it, be able to navigate through it, not have to wait for long periods while large documents or documents with lots of images load, and be able to easily find the information they are interested in. If any of these prove difficult, they will not be inclined to visit your site again.

5.1.3 HTML Metadata

5.1.3.1 WHAT IS HTML METADATA

Metadata is data about data. It is used to document information resources and is often used, at least in part, as an “index” or “directory” to data (see BCIS Handbook: Metadata).

Metadata can also be used with HTML documents. Information about the document is recorded within the HEAD element of an HTML document in a standard format, using the META element. The META element is used to document metadata not covered by other HTML elements such as TITLE. The META element has two main uses:
• to provide a means to discover that the document or data set exists and how it might be obtained or accessed; and
• to document the content, quality, and features of a document or data set, indicating its fitness for use.

Such information can be extracted by servers/clients for use in identifying, indexing and cataloguing specialised document meta-information such as keywords, expiry date, etc.

The HEAD section of a document, including the META elements, is extremely useful for managing documents on a WWW server. It is very easy for documents to be forgotten about and never updated. Often the links that are embedded in a document become stale or broken and need attention. The HTML elements within the HEAD section are used by automatic programs to gather certain information about the document. Maintenance programs or robots will automatically traverse the filesystem of a WWW server and check each document to ensure that the links are still functioning, that the HTML is correct and that the document is not past its expiry date. If the program identifies anything wrong with the document it will use the META information in the HEAD section to send an email message to the owner of the document to advise of the problem.

TITLE and KEYWORDS are part of the META information and are used by indexing or harvesting programs to set up indexes across the site or a number of sites to allow users quicker access to information.

Other META elements may be used for spatial searching of information or for identification of a range of other information that is not necessarily wanted within the main BODY of the document itself. It is important to note, however, that this information is NOT hidden from users as it can be brought up under ‘View Source’.

5.1.3.2 WHAT SORTS OF THINGS SHOULD BE INCLUDED

The META element is valuable for including information on such things as:
• custodian of the document;
• contact person at custodian institution;
• person responsible for the document;
• expiry date (i.e. revision date of document or date it should be removed);
• date of creation;
• index type information for various search engines (including spatial searching information); and
• keywords, etc.
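A maintenance program of the kind described in 5.1.3.1 might read these META elements roughly as follows. This is a hedged sketch using the `html.parser` module from the Python standard library; the META names (`custodian`, `owner`, `created`, `expires`, `keywords`), the ISO date format and the sample page are assumptions chosen for illustration, and a real site would follow whatever naming scheme it has documented.

```python
from datetime import date
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements found inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def read_meta(html_text):
    """Return a dictionary of META name -> content for one document."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.meta


def is_expired(meta, today):
    """True if the hypothetical 'expires' field holds an ISO date before today."""
    expires = meta.get("expires")
    return bool(expires) and date.fromisoformat(expires) < today


# Hypothetical sample document; all field names and values are invented.
SAMPLE = """<html><head>
<title>Example page</title>
<meta name="custodian" content="Example Agency">
<meta name="owner" content="webmaster@example.org">
<meta name="created" content="1999-06-01">
<meta name="expires" content="1999-12-31">
<meta name="keywords" content="biodiversity, metadata">
</head><body><p>Text</p></body></html>"""
```

A robot could call `read_meta` on each file it visits and, where `is_expired` returns true, notify the address held in the `owner` field.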


5.2 Compact Disc Read-Only Memory (CD-ROM)

There are several formats used for CD-ROM data, including Green Book CD-ROM, subsequently superseded by White Book CD-ROM and Yellow Book CD-ROM. ISO 9660 [6] defines a standard file system (see BCIS Handbook: Tools & Technologies).

5.3 Maps

Standards are applicable in many areas of GIS and map production. These include [7]:
• operating system standards (DOS, UNIX, . . .);
• user interface standards (Windows, Presentation Manager, X-Windows, . . .);
• networking standards (Ethernet, TCP/IP, . . .);
• database query standards (SQL, . . .);
• display and plotting standards (Postscript, . . .);
• cartographic standards (title, north point, scale, legend, date, . . .) [8]; and
• data exchange standards (SDTS, TIGER, SIF, . . .).

[6]
[7] Source: http://www.geog.ubc.ca/courses/klink/gis.notes/ncgia/u69.html
[8] Further details, e.g.


6. THEMATIC STANDARDS

6.1 Taxon Names [9]

The naming of all organisms of the earth, both living and fossil, is regulated by a number of international bodies operating under the aegis of either the International Union of Biological Sciences (IUBS) or the International Union of Microbiological Societies (IUMS). These bodies are responsible for issuing sets of rules or codes of nomenclature which govern the formation and choice of scientific names of taxa, but not the definition of the taxa themselves. The Codes most relevant to BCIS are the international codes of botanical [10] and of zoological [11] nomenclature.

6.1.1 Accepted Names [12]

Species 2000 [13]

Species 2000 is an umbrella project that has the objective of enumerating all known species of plants, animals, fungi and microbes on Earth. It will also provide an access point enabling users to link to other data systems for all groups of organisms using direct species-links. Users will be able to verify the scientific name, status and classification of any known species via the Species Locator, which provides access to species checklist data drawn from an array of participating databases.

Global and Regional Checklists

There are many individual global or regional checklist initiatives, only a few of which are listed in the following box. Several of the following are under consideration for inclusion in Species 2000.

Dataset: BioSystematic Database of World Diptera
Host: Systematic Entomology Laboratory, Agricultural Research Service, USDA, Washington, and the Bishop Museum, Honolulu
WWW URL: http://www.sel.barc.usda.gov/Diptera/biosys.htm

Dataset: FishBase
Host: International Center for Living Aquatic Resources Management (ICLARM)
WWW URL: http://www.cgiar.org/iclarm/fishbase/

Dataset: Global Plant Checklist
Host: International Organization for Plant Information (IOPI)
WWW URL: http://iopi.csu.edu.au/iopi/iopigpc1.html

Dataset: Mammal Species of the World
Host: National Museum of Natural History, Smithsonian Institution
WWW URL: http://www.nmnh.si.edu/msw/

Dataset: Tropicos
Host: Missouri Botanical Garden
WWW URL: http://mobot.mobot.org/Pick/Search/pick.html

Dataset: World Lists of Isopoda
Host: National Museum of Natural History, Smithsonian Institution
WWW URL (terrestrial): http://www.nmnh.si.edu/gopher-menus/WorldListofTerrestrialIsopoda.html
WWW URL (marine & freshwater): http://www.nmnh.si.edu/gopher-menus/WorldListofMarineandFreshwaterCrustaceaIsopoda.html

Many individual countries maintain national checklists of species in support of legislative, planning and management requirements. These follow differing, often incompatible standards, and cannot be readily linked together, even to form regional checklists. Some initiatives, such as the Integrated Taxonomic Information System (ITIS), a partnership of U.S., Canadian and Mexican agencies, other organisations, and taxonomic specialists, are under way. Supra-national checklists will most usefully be built up on a group-by-group basis, along the lines planned by Species 2000.

[9] Source: http://www.york.biosis.org/zrdocs/codes/codes.htm
[10] International Code of Botanical Nomenclature (Tokyo Code)
[11] International Code of Zoological Nomenclature
[12] Source: BCIS Handbook: Core Datasets
[13] Source: http://www.sp2000.org/


6.1.2 Synonyms, Homonyms and Invalid Names

There are no comprehensive standards for linking these to ‘currently accepted’ names. The international codes referenced above provide guidance as to how various kinds of names should be handled.

6.2 Georeferences

There are a number of issues concerned with standards for map projections, spheroids and co-ordinate systems that are beyond the scope of this Handbook. Specialist references and expertise should be consulted [14]. For example, a co-ordinate system for any particular map is usually defined by a map projection, a spheroid of reference, a datum, one or more standard parallels, a central meridian, and possible shifts in the X and Y directions to locate X,Y positions of point, line, and area features [15].

6.2.1 Geocodes

Latitude and Longitude

Latitude and longitude apply everywhere on the surface of the Earth. They are traditionally measured in degrees, minutes and seconds (DMS) and can be used to locate exact positions. However, they are not uniform units of measure on the Earth’s surface. Only along the Equator does the distance represented by one degree of longitude approximate that represented by one degree of latitude. Thus they cannot be used as an accurate measure of distance.
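Both points above can be illustrated numerically: converting a DMS reading to signed decimal degrees, and estimating how the ground distance covered by one degree of longitude shrinks away from the Equator. The sketch assumes a spherical Earth with a mean radius of 6371 km and treats south and west as negative; both are simplifying assumptions, and precise work would use the spheroid and datum of the map concerned.

```python
import math

# Assumed mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def km_per_degree_longitude(latitude_deg):
    """Approximate length in km of one degree of longitude at a given latitude."""
    one_degree_at_equator = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    return one_degree_at_equator * math.cos(math.radians(latitude_deg))
```

Under these assumptions one degree of longitude spans roughly 111 km at the Equator but only about half that at 60° latitude, which is why differences in degrees cannot be read directly as distances.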

Grid Reference Systems

Many countries have their own grid reference systems, some even more than one. These are usually expressed in ‘eastings’ and ‘northings’ from a specified origin.

6.2.2 Country Codes

ISO 3166 [16] is the International Standard. It is a three-part standard:
• ISO 3166-1 “Country codes” establishes codes for the current names of countries, dependencies, and other areas of particular geopolitical interest, on the basis of lists of country names obtained from the United Nations. ISO 3166-1 is an updated edition of ISO 3166:1993. ISO 3166-1 was published on 1 October 1997.
• ISO 3166-2 “Country Subdivision Code” establishes a code for the names of the principal administrative subdivisions of the countries coded in ISO 3166-1. ISO 3166-2 was published on 15 December 1998.
• ISO 3166-3 “Code for formerly used names of countries” establishes a code that represents non-current country names, i.e. the country names deleted from ISO 3166-1 since its first publication in 1974. ISO 3166-3 was published on 1 March 1999.

[14] e.g. A Guide to Coordinate Systems in Great Britain, Ordnance Survey, UK
[15] Source: http://www.geo.ed.ac.uk/agidexe/term?117
[16] Source: http://www.din.de/gremien/nas/nabd/iso3166ma/index.html

6.2.3 Other Systems

LOCODE [17]: Code for Ports and Other Locations

UN/LOCODE is maintained by the United Nations Economic Commission for Europe. It provides code elements for more than 26,000 names of ports, airports, rail and road terminals, postal exchange offices, border crossing points and other locations used in trade and transport. All code elements in UN/LOCODE start with the ISO 3166-1 alpha-2 country code element for the country in which the place concerned is located. In some countries there are several places with the same name. In such cases the relevant ISO 3166-2 subdivision code is essential to distinguish between them.
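Because every UN/LOCODE code element begins with an ISO 3166-1 alpha-2 country code element, a basic check can split a code and confirm that its prefix is a known country code. The sketch below assumes the usual five-character layout (a two-character country element followed by a three-character location element, sometimes written with a space between them); the tiny lookup set and the function name are illustrative only, and a real check would load the full current ISO 3166-1 list.

```python
# Small illustrative subset of ISO 3166-1 alpha-2 codes (not the full list).
KNOWN_ALPHA2 = {"AU", "FJ", "GB", "US"}


def split_locode(code):
    """Split a code such as 'AU SYD' into (country element, location element)."""
    compact = code.replace(" ", "").upper()
    if len(compact) != 5 or not (compact.isascii() and compact.isalnum()):
        raise ValueError(f"not a five-character code: {code!r}")
    country, location = compact[:2], compact[2:]
    if country not in KNOWN_ALPHA2:
        raise ValueError(f"unknown country code element: {country}")
    return country, location
```

Where several places in one country share a name, the ISO 3166-2 subdivision code would need to be stored alongside the pair returned here.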

6.3 Biological Collections Databases

Many frameworks and standards have been developed in this area. The following examples should be regarded as indicative only.

6.3.1 ASC Biological Collections Reference Model [18]

A reference schema (information model) for biological collections. The model presented describes the high-level entities (logical and physical objects) that comprise the domain of a biological collection, including collecting activities, specimen objects and their handling, their taxonomy, locality and collector data, other objects, and the relationships among them. The model should be able to accommodate an insect collection as satisfactorily as a fish collection, botanical collection, or paleontological collection. Therefore, this conceptual information model provides an opportunity for the diverse disciplines of collections-based biologists to discuss their common activities and analogous information in a mutually comprehensible frame of reference. The model also provides the framework from which more detailed models can be developed.

[17] Source: http://www.unece.org/trade/rec/rec16en.htm
[18] Source: gopher://biodiversity.bio.uno.edu:70/00/standards/asc/ascmodel Schema: http://gizmo.lbl.gov/DM_TOOLS/OPM/BCSL/LIB/BioCollect.OPM

6.3.2 CDEFD Information Model for Biological Collections [19]

CDEFD (“A Common Datastructure for European Floristic Databases”) is a concerted action project financed under the European Commission’s third framework program which has set out to provide such models to the biological community and to database designers. The core of this work was formed by a detailed data model for botanical collections (including lichens and fungi), which was widened in scope to include other natural history collections and microbiological culture collections.

The present model is part of a larger model which tries to provide a unified view of biological information, including taxonomic, nomenclatural, ecological, bibliographical, and geographic components, as well as the results of studies (descriptors) in individual branches of biological sciences.

The provided data structures are very complex, attempting to incorporate all available information into a single model. To fit the particular needs of a given data bank they can be modified and simplified. The model allows the designer to assess the consequences of the simplification process, particularly in regard to restraints on future extensions of information content and possible incompatibilities with other databases. The complex model thus provides a reference tool for the planning of specific databases. In addition, the model supplies guidelines for the definition of data fields and thereby provides a base for the discussion of data standards.

6.3.3 CRIS: Collections and Research Information System [20]

CRIS is a distributed, multimedia system supporting the documentation, management, analysis, and delivery of collections, educational, and research resources held and produced by the Smithsonian Natural History Museum. The foundation of the system is a series of databases describing: specimens and their current and past uses, observations taken in the field, collecting sites and habitats, geographic areas, species and higher taxa, cultural groups, and relevant literature. The system is based on a multi-server architecture, integrating text databases, digital image and sound recordings, files containing results of scientific analyses, geographic information, and data thesauri.

[19] Source: http://www.bgbm.fu-berlin.de/CDEFD/CollectionModel/cdefd.htm
[20] Source: http://www.nmnh.si.edu/cris/

6.3.4 HISPID: Herbarium Information Standards and Protocols for Interchange of Data [21]

HISPID is a standard format for the interchange of electronic herbarium specimen information. HISPID has been developed by a committee of representatives from all Australian Herbaria. This interchange standard was first published in 1989. Version 3 was published in 1996.

6.3.5 ITF: International Transfer Format for Botanic Garden Plant Records [22]

The International Transfer Format for Botanic Garden Records is an internationally agreed standard format by which electronic information about living plants, as held by botanical institutions, particularly botanic gardens, may be interchanged between organisations. The transfer format of ITF2 is based on ‘Information technology: Open Systems Interconnection: Specification of Abstract Syntax Notation One (ASN.1)’, International Standard ISO/IEC 8824, 2nd ed. (1990) (ISO/IEC: Genève).

6.4 Spatial Data

Again, the following are indicative only.

6.4.1 Vegetation

National Vegetation Classification Standard (NVCS), USA [23]

The overall objective of the National Vegetation Classification Standard (NVCS) is to support the use of a consistent national vegetation classification to produce uniform statistics on vegetation resources from vegetation cover data at the national level. Adoption of the NVCS in subsequent development and application of vegetation mapping schemes will facilitate the compilation of regional and national summaries. In turn, the consistent collection of such information will eventually support the detailed, quantitative, geo-referenced basis for vegetation cover modelling, mapping, and analysis at the field level.

The purpose of the national standard is to require all federal vegetation classification efforts to have some core components that are the same across all federal agencies, so that data from all federal agencies can be aggregated. The NVCS does not prevent local federal efforts from doing whatever they want to meet their specific purposes. It does require that, whatever else those local efforts do, they are conducted in ways that provide the required core data.

[21] Source: http://www.rbgsyd.gov.au/HISCOM/HISPID/HISPID3/hispidright.html
[22] Source: http://www.rbgkew.org.uk/BGCI/news.htm
[23] Source: http://www.nbs.gov/fgdc.veg/standards/vegstd.htm

6.4.2 Soils

Soil Geographic Data Standard, USA [24]

The overall objective of the Soil Geographic Data Standard is to standardize the names, definitions, ranges of values, and other characteristics of soil survey map attribute data developed by the National Cooperative Soil Survey (NCSS). The NCSS is the body composed of the various federal, state, and local units of government that work cooperatively to develop the soil survey of all lands in the United States.

6.4.3 Land Cover

FAO Land Cover Classification [25]

The FAO Land Cover Classification is a comprehensive standardised a priori classification system, designed to meet specified user requirements and created for mapping exercises, but independent of the scale or means used. The proposed classification can be used as a reference classification system because the diagnostic criteria used allow correlation with existing classifications/legends.

[24] Source: http://fgdc.er.usgs.gov/standards/status/sub2_2.html
[25] Source: http://www.fao.org/WAICENT/FAOINFO/SUSTDEV/EIdirect/EIre0019.htm


6.4.4 Transfer Standards

Standards include:
• Spatial Data Transfer Standard (SDTS)
• Geographic Information European Prestandards prepared by Technical Committee CEN/TC 287

6.5 Thesauri

INFOTERRA EnVoc Multilingual Thesaurus of Environmental Terms

EnVoc, the latest edition of the INFOTERRA Thesaurus of Environmental Terms, was released in June 1997. It is available in all six official United Nations languages: Arabic, Chinese, English, French, Russian and Spanish. It is available on-line at:

GEMET [26]

The General Multilingual Environmental Thesaurus presents 5,298 descriptors, including 109 Top Terms, and 1,264 synonyms in English. The 5,524 terms belonging to the parental thesauri, and not included in GEMET, constitute an accessory alphabetical list of free terms.

GEMET provides a complete numerical equivalence (all the descriptors have an equivalent) with the following languages: Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish. Swedish and Greek versions are currently in preparation. The semantic equivalence (correct correspondence of meaning between languages) has been separately ensured for Dutch, French, German, Italian, Norwegian, Portuguese and almost completely for Spanish. Equivalence in Finnish is not yet validated.

The translation of GEMET into other languages, both extra-EU and extra-European, is foreseen in the future.

[26] Source: http://www.mu.niedersachsen.de/cds/etc-cds_neu/software.html#GEMET
