This file was created by scanning the printed publication. Errors identified by the software have been corrected; however, some errors may remain.

Attribute and Positional Accuracy Assessment of the Murray Darling Basin Project, Australia

Fitzgerald, R.W.¹, Ritman, K.T. & Lewis, A.

Abstract: The Murray Darling Basin comprises a land area of 1,058,000 km², covering a substantial portion of SE Australia and encompassing Australia's largest river system. A basin wide woody vegetation dataset has been assembled from LANDSAT TM imagery, with supplementary data, at a nominal scale of 1:100,000. The attribute accuracy assessment method is based on a multi-stage systematic sample design. A rectangular grid is placed over the entire Murray Darling Basin dataset and, at each primary sample point, a secondary grid is created from a square contiguous set of aerial photos. A half tone, grey scale transparency covering approximately 100 km² is generated from digital imagery (LANDSAT TM) for each primary grid point at a contact scale matching the available air photos. Overprinted on this transparency are the secondary grid points as patch size sampling frames. The air photos are then visually co-registered underneath the transparency with land cover features from the LANDSAT TM image. Attribute data (presence or absence of woody vegetation) is collected directly from the air photos for each secondary grid point at 4 different patch sizes. Positional accuracy is assessed by recording 40 or more ground control points from the 1:100,000 map sheet containing the primary grid point. These Eastings and Northings are compared to their position in the LANDSAT TM image. The outcome of the accuracy assessment is a Basin wide attribute and positional accuracy statement, together with spatial variability contour maps of attribute and positional accuracy as GIS overlays.

INTRODUCTION

This paper discusses the attribute and positional accuracy assessment methodology developed for the Murray Darling Basin Project² (MDBP). The Murray Darling Basin (MDB) covers an area exceeding 440 x 1:100,000 map sheets with a total area of 1,058,000 km². The diverse range of vegetation and land forms creates challenges in terms of methodology and logistics.

1 Infoplus, PO Box 125, Queanbeyan, NSW 2620, Australia. Fax: +61 (0)6 299 1331, E-mail: bfitzger@pcug.org.au

2 This paper is a product of a consulting project titled "Attribute Accuracy Assessment for Project M305", DLWC, September 1995, Murray Darling Basin Project M305.

The initial focus of the accuracy assessment method included three stratification hypotheses. First, local geographic variation in classification accuracy is a function of vegetation type, terrain and substrate. Second, the smaller the patch of woody vegetation, the lower the attribute accuracy. Third, the overall accuracy percentage is proportional to the percentage of vegetation cover. If the vegetation cover is patchy, then the ratio between the length of the boundary around a patch polygon and its enclosed area is of interest. This ratio was examined by Crapper et al. (1986).

The attribute and positional accuracy assessment methodology will:

a. Be statistically sound, practical, inexpensive to implement, easily understood by project staff and portable to the MDBP's GIS;
b. Produce scalable attribute and positional accuracy assessments of the woody/non-woody vegetation dataset as:
   i. Attribute accuracy assessment statistics (error matrix, user and producer accuracies and Kappa statistic) at different spatial scales;
   ii. Spatial variability maps of attribute and positional accuracy for the entire MDBP.

A brief examination of the MDB woody/non-woody GIS layer provided an insight into data quality and processing standards. The details of these standards, including lineage, are outlined in the draft specification (Ritman, 1995). The Victorian and South Australian groups have examined attribute accuracy assessment methodologies. The work of Czaplewski et al. (1992) proved to be the most substantial and well documented attribute accuracy assessment methodology available. They used aerial photo interpretation as their pseudo ground truth and a systematic sample. Tadrowski et al. (1990) in South Australia investigated supervised classification aided by manual photo interpretation, with a stratified simple random sample for attribute accuracy assessment.

The production of the digital woody vegetation dataset at a nominal Basin-wide scale of 1:100,000 was achieved by a two stage process (Ritman, 1995). Woody vegetation is defined as any perennial vegetation having a height exceeding 2 m and a density greater than 20% crown cover (McDonald et al., 1990). The first stage was the digital classification from LANDSAT TM imagery of a template of only woody vegetation. This method is based on that of Gilbee & Goodson (1992) and comprises an unsupervised 100 ISO class cluster analysis of LANDSAT TM imagery, followed by manual aggregation based on field input, aerial photos and ancillary data. In addition, a filter (an ARCINFO AML process created by Dr. K. Ritman) that identifies patches of diagonally and orthogonally connected cells was passed over the resulting woody vegetation layer to remove unconnected vegetation patches smaller than 0.25 ha. The resultant woody dataset is in raster format with 25 x 25 m pixels. The next stage was either a manual or digital classification of vegetation structural elements such as genus, density class and growth form. Only the woody vegetation layer is subject to the attribute and positional accuracy methodology described in this paper.
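The patch filter mentioned above was an ARCINFO AML process whose internals are not given in this paper. As a rough illustration of the idea only, the Python sketch below removes 8-connected (diagonal and orthogonal) woody patches smaller than 0.25 ha from a raster of 25 x 25 m pixels, where one pixel is 0.0625 ha and four pixels make 0.25 ha. The function name `remove_small_patches` and the list-of-lists raster are assumptions made for the example, not part of the original process.

```python
from collections import deque

PIXEL_AREA_HA = 25 * 25 / 10_000  # one 25 m x 25 m pixel = 0.0625 ha


def remove_small_patches(grid, min_area_ha=0.25):
    """Return a copy of `grid` (rows of 0/1, 1 = woody) in which every
    8-connected woody patch smaller than `min_area_ha` is set to 0."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    # Offsets to the eight orthogonal and diagonal neighbours.
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected patch.
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                pr, pc = queue.popleft()
                patch.append((pr, pc))
                for dr, dc in neighbours:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Drop the patch if its area falls below the threshold.
            if len(patch) * PIXEL_AREA_HA < min_area_ha:
                for pr, pc in patch:
                    out[pr][pc] = 0
    return out
```

With this threshold a two-pixel patch (0.125 ha) is removed, while a four-pixel patch (exactly 0.25 ha) is retained, because the text specifies patches "less than" 0.25 ha.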

A BRIEF REVIEW OF THE LITERATURE AND PREVIOUS STUDIES.

Attribute classification accuracy is usually assessed by constructing a contingency table of a classified map versus ground truth or reference data (Congalton, 1991; Veregin, 1989). The resulting error or confusion matrix C is a k x k matrix, where k is the number of discrete classes in the classification scheme. In the case of the MDB woody/non-woody dataset, k = 2. The most commonly used index of attribute classification accuracy is the overall accuracy percent (OA%). Confidence limits for the OA% can be constructed easily from either the binomial distribution or the normal approximation to the binomial distribution. Van Genderen et al. (1978) logically extend the use of the binomial confidence limits to estimate sample sizes given expected classification accuracy and confidence levels.

The OA% is a simple index of classification accuracy which has its limitations. It cannot differentiate between errors of omission and commission. It cannot reliably be used to compare error matrices built from different sample sizes, and it does not account for correct classification by chance alone. One of the best methods developed to overcome these limitations is the Kappa statistic, discussed extensively by Congalton (1991) and Fitzgerald & Lees (1994a). It statistically quantifies the level of agreement and has been shown to give a less biased estimate of classification accuracy than the OA% (Rosenfield & Fitzpatrick-Lins, 1986).

Sampling schemes can introduce substantial bias into classification accuracy estimates, especially when viewed across the spectral, spatial, environmental, taxonomical and temporal domains. Congalton (1988, 1991) compares the relative effects of five sampling schemes, including random and stratified random sampling, on classification accuracy. Franklin & Hiernaux (1991) discuss the effect of sampling schemes on woody vegetation classification, while Fitzgerald & Lees (1994b) discuss scale and its relationship to floristic structure.

High accuracy surveying standards exist for assessing the accuracy of topographic maps in all three dimensions. The state and national mapping agencies are responsible for the surveying standards and integrity of the map base. The implicit assumption made in this study is that this map base is generally correct³. Positional accuracy quantifies the accuracy of feature locations after various image processing and GIS transformations have been applied. A number of tests are available to assess positional accuracy, including deductive estimates, internal evidence checks, comparisons to source documents and reference to independent sources of higher accuracy (Veregin, 1989). The latter is the most desirable and the one used in this study. The independent source is the AUSLIG 1:100,000 map base series.

Spatial variability maps of both attribute and positional accuracy will be produced for the MDB from this accuracy assessment methodology. The 12 x 18 primary systematic grid (described below) will contain the derived data values of OA% and Kappa (attribute accuracy) and 2D-RMS and CEP (positional accuracy). These gridded values will then be interpolated to a surface from which a contour map (isometric lines) of attribute and positional accuracy will be produced. The accuracy of this interpolation is dependent on the number and spatial distribution of the observed sample values. Systematic sampling (aligned or unaligned) is the best of all the sampling techniques tested for minimising the effects of spatial distribution on contouring (Veregin, 1989).
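To make the error matrix indices concrete, the following Python sketch computes the overall accuracy, a normal-approximation 95% confidence interval, user and producer accuracies and the Kappa statistic using the standard textbook formulas. The function name `accuracy_stats` and the convention that rows hold the mapped class and columns the reference class are assumptions of this example; the counts are invented for illustration and are not MDBP results.

```python
import math


def accuracy_stats(matrix):
    """Summarise a k x k error matrix.

    matrix[i][j] = number of sample sites mapped as class i whose
    reference (pseudo ground truth) class is j. Assumes every row and
    column total is non-zero and that chance agreement is below 1.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    correct = sum(matrix[i][i] for i in range(k))

    oa = correct / n  # overall accuracy as a proportion
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (oa - chance) / (1 - chance)
    # Normal approximation to the binomial, 95% level.
    half_width = 1.96 * math.sqrt(oa * (1 - oa) / n)
    return {
        "oa": oa,
        "oa_ci95": (oa - half_width, oa + half_width),
        "kappa": kappa,
        # User accuracy: correct / mapped total (errors of commission).
        "user": [matrix[i][i] / row_tot[i] for i in range(k)],
        # Producer accuracy: correct / reference total (errors of omission).
        "producer": [matrix[j][j] / col_tot[j] for j in range(k)],
    }


# Invented 2 x 2 example (woody, non-woody) with 49 sample points.
example = accuracy_stats([[20, 5], [4, 20]])
```

For the invented matrix, 40 of 49 points agree, so the OA is about 81.6%, while Kappa works out to 760/1201, roughly 0.63, which shows how the chance correction lowers the apparent agreement.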

SAMPLE DESIGN

The constraints on the sample design for the attribute and positional accuracy assessment for the MDBP were as follows:

a. The extent of the MDB (1,058,000 km²) precludes field checking as the major source of ground truth. Aerial photo interpretation is the only practical means in this case;
b. The design must be practical, simple and expedient to implement. Specifically, maps, air photos, air photo run maps and satellite imagery should be handled as efficiently as possible, with the minimum number of each being accessed;
c. Attribute accuracy must be assessed at a minimum patch size of 1 hectare across the entire basin. The patch size scale effect on classification accuracy should be assessed if possible at scales of 0.25, 1.00, 4.00 and 9.00 ha;
d. The results of the accuracy assessment must be statistically defensible and easily interpreted by the end user community.

3 This assumption proved incorrect. Field experience of project staff demonstrated that the map base is not always reliable. Combined with budgetary restrictions, the positional accuracy component of the methodology was cancelled in the implementation phase.

With these constraints in mind, a review of the literature suggested that the four best contenders for the sample design were: simple random sampling (SRS); stratified simple random sampling (STSRS); cluster sampling; and systematic sampling (SYS).

Simple random sampling has the advantage of being the easiest to construct. Implementation, especially in the field, can be problematic. Statistically, the estimates are easily produced and robust. For the purposes of the MDBP, SRS was considered impractical to implement on such a large scale dataset.

Stratified SRS is the most often used design, as judged from the literature. It is statistically more efficient than SRS (Cochran, 1977), can produce less biased results than SRS and is a little easier to implement (Congalton, 1988; Janssen & van der Wel, 1994; Van Genderen et al., 1978). Constructing a stratified SRS is more complex than constructing an SRS and it still suffers from the problems of access to ground truth sites. The claimed statistical gains over SRS are also very dependent on the spatial autocorrelation structure of the dataset (Congalton, 1988), which is unknown. Stratified SRS was not considered any more practical to implement than SRS for the MDBP.

Cluster sampling is the preferred design when cost of access to the ground truth sites is at a premium. Moisen et al. (1994) showed that cluster sampling with a fixed cost had a higher relative efficiency than either systematic or simple random sampling. The construction and implementation of cluster sampling are more complex than either stratified or simple random sampling. Once again, the practicalities of this design precluded its use in the MDBP.

Systematic samples have as their most attractive feature their ease of construction and implementation. Systematic sampling is particularly suited to spatial problems involving two dimensions or more, as noted by Cochran (1977). Cochran (1977) and Congalton (1988) propose that unaligned systematic sampling is superior to aligned or centred systematic sampling. A large number of authors and studies have used systematic sampling. Goodchild et al. (1991) utilised a single stage systematic sample of 1347 sites in the CALVEG study. Czaplewski et al. (1992) used a systematic sample of 363 plots for the Victorian CNR Tree cover project.

The attribute accuracy assessment method recommended for the MDBP is a two stage systematic sample with subsampling units of equal size. The sampling unit is a woody vegetation patch. The sample design has the practical advantage of simplifying the acquisition and handling of the aerial photos used to acquire the pseudo ground truth values for the attribute accuracy assessment. In the first sampling stage, a 12 x 18 rectangular primary sampling grid (216 primary grid points) is placed over the MDB dataset. This grid is oriented 40° from North to incorporate linear trends in land cover categories. At each primary sample point, a 7 x 7 secondary grid (49 secondary grid points) is created, orientated to the AMG grid. Thus the number of secondary sample points will be 12 x 18 x 7 x 7 = 10,584. Due to the irregular outline of the MDB, approximately 10% of the primary grid points will fall outside the Basin and will be excluded. Thus the final sample size will be less than 10,584, possibly around 9,500. This two stage design has the added advantage of being scalable. The grid cell size at either sample stage can be varied to reflect desired confidence limits, budget and time constraints⁴.

The presence/absence of woody vegetation at the 4 patch sizes (0.25, 1, 4 and 9 ha) will be collected at each secondary grid point. This data will be used to investigate the effect of patch size on classification accuracy. To collect the attribute information, a half tone, grey scale transparency covering approximately 100 km² is generated from satellite digital imagery (LANDSAT TM) for each primary grid point at a contact scale matching the available air photos. Overprinted on this transparency are the secondary grid points as patch size sampling frames. The air photos are then visually co-registered underneath the transparency with land cover features from the LANDSAT TM image (MDBC, 1995). Attribute data (presence or absence of woody vegetation) is collected directly from the air photos for each secondary grid point at 4 different patch sizes. This information forms the pseudo ground truth values for comparison to the LANDSAT TM derived vegetation layer (woody/non-woody). The attribute data for each of the 49 secondary grid points is compiled and crosstabulated with the classified LANDSAT TM woody vegetation values. An OA%, user and producer accuracies and the Kappa statistic are derived from this error matrix to assess the attribute accuracy at each primary grid point. These estimates are statistically valid at the secondary grid point level.

Positional accuracy is assessed by recording 40 or more ground control points (GCPs) from the 1:100,000 map sheet containing each primary grid point. These Eastings and Northings are compared to their position in the LANDSAT TM image. From these GCPs, the mean and standard deviation of the absolute differences, along with the 2D-RMS and Circular Error Probability (CEP) radius, are computed. The specified positional accuracy of the woody vegetation dataset at 1:100,000 scale is 90% ± 50 metres. The results from the 3 spot checks indicate that all the mean 2D-RMS values are > 50 m and the 85% CEP values are 2 to 3 times the 50 m specification.
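The geometry of the two stage systematic sample can be sketched in a few lines of Python. The grid dimensions (12 x 18 primary, 7 x 7 secondary) and the 40° orientation come from the text; the grid origin, the 75 km primary spacing and the 1.5 km secondary spacing are hypothetical values chosen only so the example runs, since the paper does not state them. No clipping to the Basin outline is attempted.

```python
import math


def two_stage_grid(origin=(0.0, 0.0), n_rows=12, n_cols=18,
                   primary_step=75_000.0, bearing_deg=40.0,
                   n_sec=7, secondary_step=1_500.0):
    """Return a list of ((i, j), (a, b), easting, northing) tuples.

    The primary grid is rotated `bearing_deg` clockwise from grid north;
    each secondary grid is centred on its primary point and aligned with
    the easting/northing axes. The origin and both step sizes are
    hypothetical defaults, not values from the paper.
    """
    theta = math.radians(bearing_deg)
    along = (math.sin(theta), math.cos(theta))    # (dE, dN) at the bearing
    across = (math.cos(theta), -math.sin(theta))  # 90 degrees clockwise
    half = (n_sec - 1) / 2
    points = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Primary grid point on the rotated lattice.
            pe = origin[0] + primary_step * (i * along[0] + j * across[0])
            pn = origin[1] + primary_step * (i * along[1] + j * across[1])
            for a in range(n_sec):
                for b in range(n_sec):
                    # Secondary point on the axis-aligned sub-grid.
                    e = pe + (a - half) * secondary_step
                    n = pn + (b - half) * secondary_step
                    points.append(((i, j), (a, b), e, n))
    return points


full_design = two_stage_grid()                                   # 12 x 18 x 7 x 7
reduced_design = two_stage_grid(n_rows=10, n_cols=15, n_sec=6)   # 10 x 15 x 6 x 6
```

The two calls reproduce the point counts quoted in the paper, 10,584 for the proposed design and 5,400 for the reduced design adopted under budget constraints, and show how changing either grid dimension rescales the sample.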

4 Budgetary constraints during the implementation of the accuracy assessment methodology dictated that the sampling grids be reduced to 10 x 15 primary and 6 x 6 secondary points, giving a total of 5,400 sampling points.

SAMPLE SIZE DETERMINATION

The sampling unit for the attribute accuracy assessment is a patch of woody vegetation. The minimum Basin wide patch size is 1 ha. The total number of 1 ha patches is N = 105,800,000 and, at the minimum mapping unit size of 0.25 ha, N = 423,200,000. One of the statistical problems faced in the sample design is that there are very few accepted guidelines for determining sample sizes in spatial analysis. Many well respected authors and studies pluck a sample size from the ether with no stated justification. Goodchild et al. (1991) use a sample of 1347 sites systematically sampled from 56,973 sites and do not describe how they arrived at this figure.

The beginnings of a statistical definition of sample size in spatial analysis are seen in the early work of Van Genderen et al. (1978). They outline the use of the binomial distribution as a means of deciding the sample size based on a confidence requirement (95%) and a specified minimum classification accuracy in the population of 85% correct. This corresponds with the US Geological Survey Circular 671 operational job specification. Rosenfield & Melly (1980) and Veregin (1989) provide a similar rationale based on the confidence interval for proportions, again based on the binomial distribution.

At the secondary grid level, the proposed sample design takes an area of 10 x 10 km. This corresponds to 10,000 x 1 ha patches. Thus, based on the binomial theory, a sample of 60 one hectare patches from 10,000 one hectare patches, a sample fraction of 0.6%, should provide an unbiased estimate of the population (here a 10 x 10 km area) classification accuracy. However, at the Basin wide level, 60 one hectare sample points amongst 105,800,000 (a sample fraction of about 0.0001%) is a tad small! Theoretically it is justifiable. Typical telephone polling of the entire Australian population has sample fractions of 0.01%, 100 times that of the above!

Unfortunately, the remote sensing & GIS literature gives little guidance as to what constitutes an acceptable sample size. Discussions with Dr. Ray Czaplewski and Gretchen Moisen confirmed this view. Congalton (1991) offers a rule of thumb of 100 sample sites per classification category. Czaplewski & Catts (1992) recommend sample sizes of 500 to 1000 based on simulation studies.

The decision on the final sample sizes was constrained by 3 factors. The number of primary sampling points needed to be as large as practicable to minimise the contour interpolation error. The secondary grid sample size of 60 (95% correct classification with 95% confidence) based on the binomial distribution seemed reasonable based on the literature and discussions with colleagues. The sample size also had to accommodate the patch size sampling frames on the transparent overlay. The compromise decided on was a primary grid of 12 x 18 (216 points) and a secondary grid of 7 x 7 (49 points), giving a total sample size of 10,584 patches. The secondary grid size is statistically defensible and the primary grid size gives sufficient control points for the contour interpolation.
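One common reading of the binomial argument is that the sample must be large enough that a map at or below the required accuracy p would be unlikely to yield an error-free sample, that is, p^n ≤ 1 − confidence. The Python sketch below applies that reading and reproduces the sampling fraction arithmetic quoted above. It is an interpretation offered for illustration: the function name `min_sample_size` is an assumption, and the published tables may round or be derived slightly differently, which would explain a figure of 60 rather than 59.

```python
import math


def min_sample_size(accuracy, confidence=0.95):
    """Smallest n with accuracy**n <= 1 - confidence.

    Interpretation: if all n sampled sites are correct, a true accuracy
    at or below `accuracy` can be rejected at the stated confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(accuracy))


n_95 = min_sample_size(0.95)  # close to the 60 used for the secondary grid
n_85 = min_sample_size(0.85)  # close to the figure tied to the 85% criterion

# Sampling fractions quoted in the text.
secondary_fraction = 60 / 10_000       # 60 patches in a 10 x 10 km block
basin_fraction = 60 / 105_800_000      # 60 patches in the whole Basin

# Total sample sizes for the proposed and the reduced designs.
proposed_total = 12 * 18 * 7 * 7
reduced_total = 10 * 15 * 6 * 6
```

Under this reading the 95% criterion gives 59 sites and the 85% criterion gives 19, so the 7 x 7 secondary grid of 49 points falls somewhat short of the 60 suggested for a 95% criterion, which is consistent with the text describing the final grid sizes as a compromise.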

ANALYSIS OF THE ATTRIBUTE AND POSITIONAL ACCURACY.

The positional accuracy assessment will be based on a summary table of the absolute differences between the map and digital image coordinates (E & N) of the GCPs recorded at the secondary grid level for each primary grid point. The GCP outliers (extreme differences, i.e. > 1 km) will be identified, documented and removed from the analysis. Summary statistics (means, standard deviations, maxima and minima) for each primary grid point will be aggregated into a Basin wide positional accuracy report. The 2D-RMS and Circular Error Probability (CEP) radius for each primary grid point will be computed from the raw GCP differences discussed above. A contour plot of the 2D-RMS from each primary grid point will be produced. This contour map illustrates the spatial variability of positional accuracy across the MDB.

The attribute accuracy assessment follows the trend in most of the literature in utilising an analysis of the error matrix. For each primary grid point and for each patch size, an error matrix is constructed of the pseudo ground truth values of woody/non-woody derived from the air photo interpretation versus the values derived from the digital imagery. From this error matrix, the Overall Accuracy % (OA%), Kappa statistic and user and producer accuracies are computed for each primary grid point and for each patch size. They can be reported separately, as the secondary level sample size is sufficient to make them statistically valid. The contents of the Basin wide attribute accuracy summary table are then contour mapped to produce the spatial variability maps of attribute accuracy.
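The positional summary for one primary grid point might be computed along the following lines. This Python sketch makes several assumptions that the paper does not spell out: 2D-RMS is taken as the root mean square of the radial (two-dimensional) error, although some navigation texts use a similar name for twice that value; the CEP radius is taken empirically as the smallest observed radial error containing the requested share of GCPs; and an outlier is any GCP whose Easting or Northing difference exceeds 1 km. The function name `positional_stats` and the input layout are hypothetical.

```python
import math
from statistics import mean, stdev


def positional_stats(gcps, outlier_m=1000.0, cep_level=0.90):
    """Summarise GCP differences for one primary grid point.

    gcps: list of (map_e, map_n, image_e, image_n) in metres.
    Needs at least two GCPs left after outlier removal.
    """
    diffs = [(ie - me, im - mn) for me, mn, ie, im in gcps]
    # Assumed outlier rule: either component differs by more than 1 km.
    kept = [(de, dn) for de, dn in diffs
            if abs(de) <= outlier_m and abs(dn) <= outlier_m]
    abs_e = [abs(de) for de, _ in kept]
    abs_n = [abs(dn) for _, dn in kept]
    radial = sorted(math.hypot(de, dn) for de, dn in kept)
    n = len(radial)
    # Root mean square of the radial error.
    rms_2d = math.sqrt(sum(r * r for r in radial) / n)
    # Empirical radius enclosing `cep_level` of the retained GCPs.
    cep_radius = radial[max(0, math.ceil(cep_level * n) - 1)]
    return {
        "n_used": n,
        "n_outliers": len(diffs) - n,
        "mean_abs_e": mean(abs_e), "sd_abs_e": stdev(abs_e),
        "mean_abs_n": mean(abs_n), "sd_abs_n": stdev(abs_n),
        "rms_2d": rms_2d,
        "cep_radius": cep_radius,
    }
```

Running this for every primary grid point would give the per-point values of 2D-RMS and CEP that the text proposes to interpolate and contour.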

CONCLUSIONS

The attribute accuracy assessment project developed and trialled a methodology for assessing the attribute and positional mapping accuracy of the woody vegetation layer (woody/non-woody) within the MDB dataset. The flexibility of two stage systematic sampling, its simplicity of implementation, the large number of regular grid points distributed across the basin at different sampling levels and the potential flexibility for post stratification and interpolation were the deciding factors in choosing it for the attribute accuracy assessment for the MDBP.

ACKNOWLEDGEMENTS

I'm indebted to the practical experience of Adam Choma of CNR, Peter Knock of DLWC, Bathurst and Graeme Dudgeon (NSW Dept. Agriculture, Orange). Mrs. Kim Smith, my research assistant, persevered with the GCPs. On the statistical front, Dr. Ray Czaplewski of the US Forestry Service, Fort Collins, Colorado, USA proved an invaluable sounding board. Gretchen Moisen of the US Dept. of Agriculture, Ogden, Utah, USA provided insights into accuracy assessment on large scale datasets. Dr. Brian Lees (Australian National University, Geography Dept.) provided early thoughts.

REFERENCES

Cochran, W.G., 1977, Sampling Techniques (3rd ed.), John Wiley & Sons, NY.

Congalton, R.G., 1988, Comparison of sampling schemes used in generating error matrices for assessing the accuracy of maps generated from remotely sensed data, Photogrammetric Engineering and Remote Sensing, v 54, n 5, May 1988, p 593-600.

Congalton, R.G., 1991, A review of assessing the accuracy of classifications of remotely sensed data, Remote Sensing of Environment, v 37, n 1, p 35-46.

Crapper, P.F., Walker, P.A., Nanninga, P.M., 1986, Theoretical prediction of the effect of aggregation on grid cell data sets, Geo-processing, 3, p 155-166.

Czaplewski, R. & Catts, G.P., 1992, Calibration of Remotely Sensed proportion or area estimates for misclassification error, Remote Sensing of Environment, v 39, p 29-43.

Czaplewski, R., Goodson, P., Gilbee, A., Razier, P., Choma, A., 1992, Accuracy assessment of Remotely Sensed thematic maps. Project Brief, Dept Conservation and Environment, Victoria, Australia, September 29, 1992.

Fitzgerald, R.W. & Lees, B.G., 1994a, Assessing the classification accuracy of multisource remote sensing data, Remote Sensing of Environment, v 47, n 3, p 362-368.

Fitzgerald, R.W. & Lees, B.G., 1994b, Spatial Context and scale relationships in raster data for thematic mapping in natural systems, Spatial Data Handling Conference, Edinburgh, Scotland, Sept 1994. Published by Taylor & Francis, UK (in press).

Franklin, J., Hiernaux, P.H.Y., 1991, Estimating Foliage and Woody Biomass in Sahelian and Sudanian Woodlands Using a Remote-Sensing Model, International Journal of Remote Sensing, v 12, n 6, p 1387-1404.

Goodchild, M.F., Davis, F.W., Painho, M., Stoms, D.M., 1991, The use of vegetation maps in geographic information systems for assessing conifer lands in California. National Centre for Geographic Information and Analysis, Dept Geography, Uni California. Report 91-2B NCGIA.

Gilbee, A. & Goodson, P., 1992, Mapping tree cover across Victoria using Thematic Mapper digital data and a GIS, Proc. 6th Australian Remote Sensing Conference, Wellington, NZ, Nov 1992.

Janssen, L.L.F. & van der Wel, F.J.M., 1994, Accuracy assessment of satellite derived land-cover data: a review, Photogrammetric Engineering & Remote Sensing, v 60, n 4, April, p 419-426.

McDonald, R.C., Isbell, R.F., Speight, J.G., Walker, J. & Hopkins, M.S., 1990, Australian Soil and Land Survey; Field Handbook, ed. 2, Inkata Press, Melbourne, Australia.

Moisen, G.G., Edwards, T.C. Jnr, Cutler, D.R., 1994, Spatial sampling to assess classification accuracy of Remotely Sensed data. Environmental Information Management and Analysis: Ecosystem to Global Scales, Michener, Brunt and Stafford (eds), p 159-176.

Murray Darling Basin Commission, 1995, Recipe for the Attribute and Positional Accuracy Assessment of the Murray Darling Basin Project, internal working document, by R.W. Fitzgerald, Infoplus, for DLWC Land Information Centre, Bathurst, Australia.

Ritman, K.T., 1995, Structural vegetation data; a specification for the Murray Darling Basin Project M305. DLWC, Land Information Centre, Bathurst, Australia.

Rosenfield, G.H. & Fitzpatrick-Lins, K., 1986, A coefficient of agreement as a measure of thematic classification accuracy, Photogrammetric Engineering and Remote Sensing, v 52, n 2, p 223-227.

Rosenfield, G.H. & Melly, M.L., 1980, Applications of statistics to thematic mapping, Photogrammetric Engineering and Remote Sensing, v 46, n 10, October, p 1287-1294.

Tadrowski, T., Hart, D.G.D., Schepp, K., 1990, A study of the possibilities and accuracies of 1:50 000 vegetation mapping using Remotely Sensed data. 8th Australian Inst Cartographers Conference, Darwin, Australia, May 1990.

Van Genderen, J.L., Lock, B.F. & Vass, P.A., 1978, Remote Sensing: Statistical testing of thematic map accuracy. Remote Sensing of the Environment, v 7, p 3-14.

Veregin, H., 1989, A taxonomy of error in spatial databases. National Centre of Geographic Information and Analysis, Technical Report 89-12, December 1989.