Mike Polioudakis 2010 February 22

NRCS PL 566 Dams from Satellite Data

Perry Oakes of the Auburn NRCS gave us data on 107 sites for PL 566 dams. This data was compared to data derived from satellites. All statistics were done in MS Excel 2007. The usual measure of correlation is Pearson’s “r” from the “correlate” function. No “p” values are given, because Excel does not report them with the correlation. “n” values are given as appropriate.
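
Because Excel reports “r” without a significance level, a reader may want to check the values independently. The following is a minimal sketch, assuming the paired site values are available as two Python lists. It computes Pearson’s r directly and the standard t statistic, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, from which a “p” value could be looked up in a t table or obtained from a statistics library. The function names and example values are hypothetical and are not data from this report.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, n >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need n >= 3")
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded t statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired areas in acres, for illustration only.
nrcs_area = [30.0, 45.0, 12.0, 80.0, 25.0]
gis_area = [35.0, 50.0, 20.0, 70.0, 30.0]
r = pearson_r(nrcs_area, gis_area)
print(round(r, 3), round(t_statistic(r, len(nrcs_area)), 2))
```

The t statistic only needs the r and n already reported here, so it can be applied to any line in the correlation lists without the raw data.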

The NRCS data did not record the water volume of the ponds, or the surface area of the water in the ponds, at any actual point in time. No water volume or surface area was measured on the ground. The NRCS data consisted primarily of the location of each dam and the contour lines for possible water depths at the dam site. From the pond configurations, up to three measures of surface area could be derived for each site, as described below. Not all sites yielded all three measures.

None of the indices from NRCS data directly measures the surface area of existing water in a pond behind a dam. All are derived from the configuration of the dam and of the pond behind it. Each estimates what the surface area would be if the pond behind the dam were filled to a certain level as a result of recent local weather events. Thus the comparisons are between water as found by GIS and estimates of what water could be present in various situations, not between water as found by GIS and real water as reported by direct observation.

-Sediment Pool Size. This is the smallest area, usually immediately adjacent to the dam. It is based on the minimum water that would be behind the dam under non-drought conditions. Total n = 83. Mean size = 31.86 acres.

-Principal Spillway Crest or Middle Elevation. This is the middle-sized area. It is based on the volume of water required to fill the pond to the spillway and to routinely send a slight amount of water over the spillway. Ordinarily, water behind the dam would not be at this level except after rain, although it might approach this level. This measure is a good indication of what to expect if water surface area were measured on the ground under normal conditions, although it will be somewhat greater than what would normally be found. Total n = 67. Mean size = 99.46 acres.

-Maximum Floodwater Size, or surface area at the second elevation. This is the largest measure of area of NRCS PL 566 ponds used in this report. It is the maximum normal capacity of the dam and usually is reached only under heavy, sustained rainfall. Total n = 83. Mean size = 142.12 acres.

81 of 83 cases are shared between Sediment Pool and Maximum Floodwater.

Satellite (GIS) data was available for all of these sites (n = 107). The satellite data found 100% of the NRCS dams and the water behind them. Mean size = 40.91 acres. The GIS mean is closest to the Sediment Pool mean, the smallest of the measures, but exceeds it somewhat. The GIS mean is smaller than the Principal Spillway mean and much smaller than the Maximum Floodwater mean. This outcome is what we would expect if the ponds held some water but were not consistently spilling over and were not near flood capacity; that is, about what we would expect under normal conditions if heavy rain had not fallen recently. It also means that no NRCS measure of pond capacity or pond surface area can be compared directly with the GIS data, so we have to approximate what to expect from the measures available. Finally, it warns us that small ponds vary more in response to local conditions of terrain and weather, so we might have problems comparing what is found in GIS data with estimates available from NRCS data.

Before assessing the accuracy of the GIS data, it is useful to see how the three measures from NRCS correlate with each other. Each line below gives the pair of measures, Pearson’s r, and n, where n is the number of cases for which both measures are available.

-Sediment Pool with Principal Spillway: r = 0.68, n = 45
-Sediment Pool with Maximum Floodwater: r = 0.58, n = 83
-Principal Spillway with Maximum Floodwater: r = 0.85, n = 49
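
Because each n counts only the sites where both measures are present, each correlation is computed on a different subset of sites. A minimal sketch of that pairwise approach follows, assuming each measure is held as a per-site list in the same site order with None marking a missing value. The function name and the example values are hypothetical, not data from this report.

```python
import math


def pairwise_r(a, b):
    """Pearson's r between two measures, using only sites where both are present.

    a and b are per-site lists in the same site order; None marks a missing value.
    Returns (r, n), where n counts the sites that have both values.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two sites with both measures")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical per-site areas in acres; None means the measure is not available.
sediment_pool = [10.0, 22.0, None, 40.0, 15.0, 31.0]
principal_spillway = [35.0, None, 60.0, 120.0, 50.0, 90.0]
r, n = pairwise_r(sediment_pool, principal_spillway)
print(f"r = {r:.2f}, n = {n}")  # n is 4: only sites 1, 4, 5 and 6 have both values
```

Since the subsets differ, two correlations in the list are not computed on exactly the same ponds, which is worth keeping in mind when comparing them.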

The two largest measures correlate well with each other (line three). The smallest measure does not correlate well with either larger measure (lines one and two). The smallest measure of pool capacity is probably not a good indication of the shape of the pool, the normal amount of water in it, or its configuration when it holds normal amounts of water. Thus it makes a difference which measure we compare to the GIS data. We have to compare the GIS data with all measures, and with at least one combination of measures, so as to be accurate and to take in as many cases as possible; but we cannot compare the GIS data with all combinations of NRCS measures without the results becoming confusing.

The best combination of measures is based on Principal Spillway but adds input from the other two measures. First, find the arithmetic mean of Sediment Pool and Maximum Floodwater for each site that has both; n = 81; mean = 87.68. This mean compares well with the mean for Principal Spillway (mean = 99.46). Then, for all cases in which Principal Spillway is available, take that value; for cases in which it is not available, take the mean of Sediment Pool and Maximum Floodwater. Call the result “Combination”; n = 99; mean = 91.45. The mean for Combination also compares well with the mean for Principal Spillway, so adding data to Principal Spillway to create Combination does not appear to have biased the data already available for Principal Spillway. It has also recovered 32 more data points than Principal Spillway alone (99 versus 67).
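
The Combination rule can be sketched as follows. This assumes per-site lists in the same site order with None for a missing measure, and assumes a site without Principal Spillway is filled only when both Sediment Pool and Maximum Floodwater are present. The function names and example values are hypothetical, not data from this report.

```python
def combination(sediment, spillway, flood):
    """Per-site Combination value (hypothetical helper).

    Uses Principal Spillway where present; otherwise the mean of Sediment Pool
    and Maximum Floodwater when both are present; otherwise None.
    """
    result = []
    for sp, ps, mf in zip(sediment, spillway, flood):
        if ps is not None:
            result.append(ps)
        elif sp is not None and mf is not None:
            result.append((sp + mf) / 2)
        else:
            result.append(None)
    return result


def n_and_mean(values):
    """Count and mean of the non-missing values."""
    present = [v for v in values if v is not None]
    return len(present), sum(present) / len(present)


# Hypothetical sites (acres), for illustration only.
sediment_pool = [12.0, 30.0, 25.0, None]
principal_spillway = [60.0, None, None, 80.0]
maximum_floodwater = [110.0, 150.0, None, 170.0]

combo = combination(sediment_pool, principal_spillway, maximum_floodwater)
print(combo)  # [60.0, 90.0, None, 80.0]
print(n_and_mean(combo))
```

Keeping None for sites with neither option lets the n reported for Combination fall out of the same count used for the mean.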

The correlations of GIS data with the various measures are:

-Sediment Pool: r = 0.48, n = 83
-Principal Spillway: r = 0.58, n = 67
-Maximum Floodwater: r = 0.64, n = 83
-Combination: r = 0.65, n = 99

Even though the mean of the GIS data is closest to the mean for the smallest measure (Sediment Pool), that correlation is the weakest. This is probably due partly to the greater size variability of small bodies of water, in both the Sediment Pool measure and the GIS data, and partly to the Sediment Pool not being the most accurate model of the pond’s overall configuration when it holds normal amounts of water.

The correlation between measures and the correlation of GIS data with a measure both improve with increasing size of the measure. This double effect suggests three things: that area becomes more stable as the water body fills and covers more surface; that the larger measures likely better represent the shape of the pond when it holds normal quantities of water; and that the GIS data tracks this greater stability and representativeness of middle-sized water bodies even though its mean is smaller. It would have been more satisfying if the GIS mean had also approached the mean of the middle-sized measure, Principal Spillway, but given what the measures represent, that kind of correspondence is not realistic.

The stability of the relation appears in the results for Combination. The correlation there (r = 0.65, n = 99) is as great as with Maximum Floodwater (r = 0.64, n = 83), even though the Combination mean is smaller and it includes more data points. The GIS data reflects the size and stability of the NRCS PL 566 ponds under normal conditions even if it does not match the size of the measures.

This GIS data likely reflects accurately the surface area of the ponds at NRCS PL 566 dams under normal conditions. The GIS data can be useful for work on NRCS PL 566 ponds if we remember the differences in the means of the various measures and use scaling factors. For instance, the ratio of the Principal Spillway mean to the GIS mean is 2.43 (99.46 / 40.91); its inverse is 0.41. If we know the Principal Spillway measure for a dam, we can estimate the surface area of water to be found behind the dam under normal conditions by multiplying it by 0.41. This estimate can then be checked against GIS data.
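
The scaling step can be sketched using only the mean areas reported in this memo. The function names are hypothetical, and a single ratio of means is a rough scaling: it assumes the relation between an NRCS measure and the GIS area is roughly proportional from pond to pond, which the moderate correlations support only loosely. The same approach gives a factor for each measure, not only Principal Spillway.

```python
# Mean surface areas (acres) as reported in this memo.
NRCS_MEANS = {
    "Sediment Pool": 31.86,
    "Principal Spillway": 99.46,
    "Maximum Floodwater": 142.12,
}
GIS_MEAN = 40.91


def scaling_factor(measure):
    """Ratio of the GIS mean to the mean of one NRCS measure."""
    return GIS_MEAN / NRCS_MEANS[measure]


def estimate_normal_area(measure, value_acres):
    """Rough estimate of normal-condition water surface area for one dam."""
    return value_acres * scaling_factor(measure)


for name in NRCS_MEANS:
    print(f"{name}: GIS/NRCS = {scaling_factor(name):.2f}")

# A hypothetical dam with a Principal Spillway area of 120 acres.
print(round(estimate_normal_area("Principal Spillway", 120.0), 1))
```

The estimate for an individual dam can then be set beside that dam's GIS area; large disagreements would flag ponds worth checking on the ground.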

A good way to investigate these effects further is to measure pond sizes both directly on the ground and from GIS satellite data on the same days, compare the two sets of results with each other, and compare both with the measures obtained from NRCS information about pond configurations. Such results would clarify the relation between NRCS measures based on pond configuration and the water actually found in the ponds under particular conditions.

In situations like the one described in this report, “r” is likely a better indicator than “r-squared”. We are not seeking to understand causes of variance; this is not a multi-variable analysis. “r” gives us a reasonable descriptor of the accuracy of the GIS data and a way to move from the GIS data to the measures derived from the NRCS data, and so gives us a good idea of what is going on. It is easy to derive “r-squared” by squaring “r” if so desired.
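
For readers who prefer “r-squared”, a minimal sketch that squares the correlations of the GIS data with each NRCS measure as listed in this report; the dictionary and function names are illustrative only.

```python
# Correlations of GIS data with each NRCS measure, as listed in this report: (r, n).
GIS_CORRELATIONS = {
    "Sediment Pool": (0.48, 83),
    "Principal Spillway": (0.58, 67),
    "Maximum Floodwater": (0.64, 83),
    "Combination": (0.65, 99),
}


def r_squared(r):
    """Share of variance in one measure linearly associated with the other."""
    return r * r


for name, (r, n) in GIS_CORRELATIONS.items():
    print(f"{name}: r = {r:.2f}, r-squared = {r_squared(r):.2f}, n = {n}")
```

Squaring shrinks moderate values considerably (an r of 0.65 becomes about 0.42), which is one reason “r” reads as the more natural descriptor here.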