EMPIRICAL MODELING OF REGIONAL STREAM HABITAT QUALITY USING GIS-DERIVED WATERSHEDS OF FLEXIBLE SCALE

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of the

By

Sanjeev Arya, B.Arch. (Hons.), M.C.R.P.

* * * * *

The Ohio State University 2002

Dissertation Committee:

Dr. Steven I. Gordon, Adviser

Dr. Hazel A. Morrow-Jones

Dr. Carolyn J. Merry

Dr. Kenneth T. Pearlman

Approved by

______

Adviser

City and Regional Planning Program

Copyright © by Sanjeev Arya 2002

ABSTRACT

Two new watershed-delineation approaches are used to build statistical regression models that explain the variations in Qualitative Habitat Evaluation Index (QHEI) scores at hundreds of sites in the Eastern Corn Belt Plains ecoregion of Ohio. Hydrologically contributing areas upstream of the sampling stations are delineated using 1) non-overlapping watersheds up to the next upstream sampling station, and 2) custom watershed areas, called localsheds, delineated only up to a user-specified upstream flow-length. Both approaches provide an insight into the scale effects of various stressors. A large GIS-intensive database is developed with fine-resolution data on 30 m DEM, 1:24,000 scale roads and streams with network topology, proportionally allocated census block demography, and 30 m land cover characteristics. Stream order, reach sinuosity, and upstream network distances are derived using macro programming. The models explain more than 40% of the variation in habitat quality using stressor-related data within hydrologically connected areas that are only a mile upstream of the sampling sites.

The results confirm the utility of the localshed framework and suggest that riparian row crops, commercial, and urban land uses in narrow strips, relatively broad swaths of deciduous forest, stream channelization, reach sinuosity, and stream order have a significant impact on the habitat immediately downstream of stress locations. Further research is outlined for model structure, data resources, and GIS software.


Dedicated to my mother

ACKNOWLEDGEMENTS

I owe sincere thanks to my adviser, Dr. Steve Gordon, for being a true mentor. He was always available with thoughtful, precise, and pithy advice. Due to him I always had easy access to fast computers, huge disk space, and to his personal time. This work would not have been possible without his intellectual, material, and financial support.

I thank Susan Cormier and Susan Norton (USEPA), Ed Rankin (Ohio EPA), Arnold Engelmann, and Tracy Douglas for help with data issues; Sarada Majumder and Hag-Yeol Kim for copies of their dissertations; Kang-Ping Shen and Hag-Yeol Kim for exciting GIS discussions; Mert Cubukcu for stimulating talks and coffee; and Ellen Wallace for providing administrative help and candies. This work is made possible by the financial support from USEPA/NSF (research grant # R824769 and cooperative agreement #CR826816-01-0), the Department of City and Regional Planning (assistantship), and the University (fellowship). I am grateful to Drs. Carolyn Merry, Hazel Morrow-Jones, and Kenneth Pearlman, for serving on the advisory committee and providing encouraging and useful comments.

My mother silently encourages me, from thousands of miles away, to value hard work and simplicity in life. My father and grandmother inspired from their heavenly abodes. I owe many thanks to my wife, Sujata, for keeping me focused, prosperous, and in generally good shape. Towards the end, my son Aakarsh infused a lot of fun into this writing through numerous innovative and ingenious ways.

VITA

January, 2002 - present …… GIS Application Developer, Ohio Supercomputer Center, Columbus, Ohio, USA.

June, 2001 - December, 2001 …… Software and GIS Consultant, Software Architects, Columbus, Ohio, USA.

1996 - 2001 …… Graduate Teaching and Research, Ohio State University, Columbus, Ohio, USA.

1997 …… M.C.R.P., City and Regional Planning, Ohio State University, Columbus, Ohio, USA.

1995 …… University Graduate Fellow, Ohio State University, Columbus, Ohio, USA.

1992 - 1995 …… Architect, Satish Gujral Design Plus, Inc., New Delhi, India.

1992 …… B.Arch. (Hons.), Architecture, Indian Institute of Technology, Kharagpur, West Bengal, India.

FIELDS OF STUDY

Major Field: City and Regional Planning

Minor Fields: Geographic Information Systems; Computer Programming


TABLE OF CONTENTS

Page

Abstract ------ii

Dedication ------iii

Acknowledgements ------iv

Vita ------v

List of Tables ------ix

List of Figures ------xi

Chapters:

1. Introduction ------1

2. Literature review ------4

   2.1 Ecological risk assessment ------4
   2.2 Environmental modeling ------7
   2.3 Biological criteria ------8
      2.3.1 Stream habitat quality ------13
      2.3.2 Qualitative Habitat Evaluation Index (QHEI) ------14
      2.3.3 Index of Biotic Integrity (IBI) ------17
   2.4 Ecoregions ------18
   2.5 Reference sites ------18
   2.6 Watersheds ------19
   2.7 Scale and hierarchy ------22
   2.8 Landscape factors ------25
      2.8.1 Riparian zone landscape ------26
      2.8.2 Urbanization ------26
   2.9 Water quality case studies ------27
      2.9.1 Eastern Corn Belt Plains Ecoregion series ------30
      2.9.2 River Raisin Basin series ------31
      2.9.3 Saginaw Basin series ------32
      2.9.4 Other studies ------34
   2.10 Why model stream habitat quality? ------36
   2.11 Discussion ------38

3. Methodology ------42

   3.1 Conceptual Model ------42
   3.2 Statistical Model ------46
      3.2.1 Assumptions ------48
      3.2.2 Outliers ------49
   3.3 Research Hypothesis ------50
   3.4 Spatial Unit of Analysis ------52
      3.4.1 Localshed ------54
         3.4.1.1 Localshed based on spatial independence ------56
         3.4.1.2 Localshed based on flexible scale ------59
   3.5 Spatial Overlay ------62
      3.5.1 Area-prorated assignment of Census data ------62
   3.6 Uncertainty ------63
   3.7 Discussion ------65

4. Data ------68

   4.1 Study area – ecoregion ------69
   4.2 QHEI samples ------74
   4.3 DEM ------80
      4.3.1 Slope ------82
   4.4 Stream network ------83
      4.4.1 Sinuosity ------84
      4.4.2 Stream order ------88
      4.4.3 Riparian zone ------88
   4.5 Land use and land cover ------89
   4.6 Roads ------97
   4.7 Population and Housing Density ------97
   4.8 Spatial unit – localshed ------97
      4.8.1 Localshed based on spatial independence ------98
      4.8.2 Localshed based on scale ------102
   4.9 Correlation Analysis ------107
   4.10 Final Database ------110
   4.11 Discussion ------113

5. Regression Analysis ------115

   5.1 Models for spatially independent localsheds ------117
   5.2 Models for equal-scale localsheds ------125
      5.2.1 Residuals ------127
   5.3 Discussion ------140

6. Conclusions ------144

References ------152

Appendix A ------161

Appendix B ------163

Appendix C ------169


LIST OF TABLES

Table Page

1.1 Narrative and numerical criteria for IBI, in the context of aquatic life use designations in Ohio ------17

4.1 Narrative ratings and frequency distribution of QHEI scores ------75

4.2 Summary of slopes derived from the DEM ------83

4.3 NLCD Land Cover Classification System Key ------92

4.4 Metadata for the major variables in the localshed-based GIS database ------100

4.5 Descriptive statistics for 350 spatially independent localsheds ------101

4.6 Descriptive statistics for 543 equal-scale localsheds ------105

4.7 Correlation between QHEI and major variables in the localsheds ------109

4.8 Summary of database layers and sources ------111

5.1 Linear regression for independent localsheds using land use, land cover, and localshed area and maximum reach sinuosity variables ------118

5.2 Linear regression for independent localsheds using land use, land cover, and stream order ------122

5.3 Linear regression for independent localsheds using land use, land cover, and maximum reach sinuosity, with an interaction between headwater streams and riparian row crops ------122

5.4 Linear regression for independent localsheds using land use, land cover, and maximum reach sinuosity, with an interaction between larger localsheds and riparian row crops ------122

5.5 Linear regression for independent localsheds using land use, land cover, and maximum reach sinuosity, and localshed area as a dummy variable ------124

5.6 Linear regression for small-scale independent localsheds using land use, land cover, and localshed area and maximum reach sinuosity variables ------124

5.7 Linear regression for equal-scale localsheds using land use, land cover, and localshed area and maximum reach sinuosity variables ------125

5.8 Linear regression for equal-scale localsheds using land use, land cover, and localshed slopes and headwater variables ------126

5.9 Summary of diagnostic tests to identify influential observation among regression residuals ------132

5.10 Descriptive statistics for 19 residuals from the model in Table 5.8 for equal-scale localsheds ------133

5.11 Location attributes for 19 residuals from the model in Table 5.8 for equal-scale localsheds ------134

5.12 Regression estimates and some database attributes for 19 residuals from the model in Table 5.8 for equal-scale localsheds ------134

5.13 Linear regression for equal-scale localsheds using land use, land cover, and watershed slopes and headwater variables ------138

5.14 Linear regression for equal-scale localsheds using land use, land cover, watershed slopes, and slope-headwater and slope-rowcrop interactions ------138

LIST OF FIGURES

Figure Page

1.1 Potential integration of this study with an IBI model (Gordon and Majumder, 2000) ------2

2.1 Environmental modeling in the context of ecological risk assessment ------6

2.2 A legal perspective for implementing water quality standards ------10

2.3 An ecological perspective for studying water quality ------10

2.4 Discrete ecological scales along the hydrological continuum ------24

2.5 The conceptual link between landscape, stream habitat, and biotic quality ------38

3.1 A hypothetical (possibly, grid cell-based) approach to studying habitat quality ------44

3.2 Individual watersheds and independent localsheds ------58

3.3 Small- and medium-scale independent localsheds ------59

3.4 Algorithms for delineating custom-scale localsheds ------59

3.5 Overlay of data layers with localshed and riparian boundaries ------62

3.6 Algorithm for calculating area-prorated population for partially overlapping area units ------64

4.1 Study area and ECBP subecoregions ------70

4.2 Counties and USGS 7.5 min. quadrangles in study area ------71

4.3 QHEI sampling sites, topography, and major cities ------72

4.4 Histogram and normal plot for QHEI scores in the study area ------77

4.5 Boxplot for QHEI scores in the study area ------77

4.6 Variation in QHEI scores near Columbus and Dayton urban areas ------78

4.7 Spatial distribution of QHEI outliers with scores below 25 ------79


4.8 Sources of uncertainty in calculating stream sinuosity ------86

4.9 Sinuosity of streams in the ECBP study area ------87

4.10 Strahler stream order for streams in the ECBP study area ------87

4.11 Length of stream arcs in the ECBP study area ------87

4.12 Major land use and land cover classes in the ECBP area ------94

4.13 Major land cover types in the ECBP area ------95

4.14 Temporal inconsistency between, and across, QHEI and land use data ------96

4.15 A hypothetical, temporal lag between land use alterations and their impact on habitat quality, in one spatial unit ------96

4.16 Histogram of inter-site network distance between neighboring QHEI samples ------104

4.17 Major land cover types in the independent and equal-scale localsheds ------104

4.18 Relative size and location of small- and medium-scale independent localsheds, and equal-scale localsheds, in the Big Darby Creek basin of ECBP study area ------106

4.19 Major basic data layers for each study area unit ------112

5.1 Diagnosis for normality of model residuals ------128

5.2 Diagnosis of model residuals using squared residuals and predicted values ------129

5.3 Diagnosis of model residuals using studentized residuals and adjusted predicted value ------129

5.4 Diagnosis of model residuals using partial regression with riparian row crop ------130

5.5 Diagnosis of model residuals using partial regression with riparian commercial-industrial-transportation land use ------130

5.6 Diagnosis of model residuals using partial regression with localshed forest cover ------131

5.7 Diagnosis of model residuals using partial regression with localshed flat slopes ------131

5.8 Geographic distribution of model residuals, urban areas, and the Mad River basin ------135


CHAPTER 1

INTRODUCTION

First things first, but not necessarily in that order. - Dr. Who, Meglos

Stream habitat is recognized as the template on which community structures are built. Landscape factors (e.g., agricultural land use, forest land cover) and geomorphological factors (e.g., stream geometry, valley slope) typically define the habitat quality of a stream. These factors operate at various spatial scales. According to Gordon and Majumder (2000), it is important to study these interrelationships because habitat quality, as measured by the Qualitative Habitat Evaluation Index (QHEI), is directly linked to the ecological integrity of streams, as measured by the Index of Biotic Integrity (IBI). Earlier habitat-based studies have dealt with stream habitat quality using watersheds of fixed scale as the spatial unit and with a limited sample (e.g., fewer than 20 watersheds).

The objective of this study is to explain the variation in the quality of stream habitat by exploring landscape and geomorphological variables at different scales using relatively large samples (more than 300 cases) across a large area (the Eastern Corn Belt Plains ecoregion of Ohio). This study uses a new concept for a hydrologically contributing area derived at a flexible scale (called localshed in this research) as the spatial unit. It provides the modeler the freedom to use watersheds of custom-defined scale.

As sketched in Figure 1.1, after tackling the issue of aggregating habitat quality from small scale localsheds into larger size watershed units for studying IBI, the results of this research may potentially be integrated into models for explaining or predicting IBI. Such models may be structured as multi-stage regression models or as simultaneous equations.

[Figure 1.1 diagram. Box labels: landscape variables (land use / land cover); geomorphology variables; stream habitat quality (QHEI); water chemistry variables; geology variables; Index of Biotic Integrity (IBI). Annotations: "This Research"; "Gordon and Majumder (2000)"; "aggregation ?".]

Figure 1.1: Potential integration of this study into an IBI model (Gordon and Majumder, 2000)

A detailed Geographic Information System (GIS) database is assembled for a major portion of the Eastern Corn Belt Plains (ECBP) ecoregion, which includes 30 m-resolution Digital Elevation Model (DEM) data from 1:24,000 scale hypsography Digital Line Graph (DLG) data, streams with network topology from 1:24,000 hydrography DLG data, roads from the 1:24,000 transportation DLG data, 30 m-resolution land use from the National Land Cover Dataset (NLCD 1992), and block-level population and housing units from the 1990 Census and TIGER/Line files. Raster and vector GIS functions, and computer programming algorithms are intensively used to derive a DEM, slopes, variable-scale stream buffers, localsheds, stream order, reach sinuosity, area-prorated census demography statistics, and network distance to the nearest upstream neighbor.
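The area-prorated census statistics named in this paragraph can be illustrated with a minimal sketch. It assumes that population is spread uniformly within each census block, so that a block contributes to a localshed in proportion to the share of its area falling inside the localshed boundary. The function name, the dictionary layout, and all numbers are hypothetical and are not taken from the programs written for this study; in a GIS, the overlap areas would come from a polygon overlay.

```python
def area_prorated_total(blocks, overlap_areas):
    """Estimate the population of one localshed from census blocks.

    blocks: block id -> (population, total block area)
    overlap_areas: block id -> area of that block inside the localshed
    Assumption: population is uniformly distributed within each block.
    """
    total = 0.0
    for block_id, overlap in overlap_areas.items():
        population, block_area = blocks[block_id]
        if block_area <= 0:
            continue  # skip degenerate polygons
        # Share of the block inside the localshed, capped at 100%.
        fraction = min(overlap / block_area, 1.0)
        total += population * fraction
    return total


# Hypothetical blocks: (population, area in square km).
blocks = {"A": (120, 4.0), "B": (80, 2.0), "C": (200, 10.0)}
# Hypothetical overlay result: block area inside the localshed.
overlap_areas = {"A": 4.0, "B": 0.5, "C": 2.5}

estimate = area_prorated_total(blocks, overlap_areas)
print(estimate)  # 190.0
```

Block A lies wholly inside the localshed and contributes all 120 people, while blocks B and C each contribute one quarter of their population. The same weighting could be applied to housing units.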

A review of the relevant literature is presented in Chapter 2. Chapter 3 presents the conceptual and technical foundation of this study. Data issues and exploratory analyses are presented in Chapter 4. Statistical regression models and residual analyses are discussed in Chapter 5. Concluding remarks on policy implications and future research are in Chapter 6.


CHAPTER 2

LITERATURE REVIEW

Nothing has really happened until it has been recorded. - Virginia Woolf

This chapter provides a discussion of the theoretical underpinnings of the research problem. A few relevant cases of water quality studies are also discussed, highlighting their research design and major results. Chapters 3 and 4 introduce the major research issues, methods, and design for this study. The next section describes the ecological risk assessment context for habitat protection.

2.1 ECOLOGICAL RISK ASSESSMENT

Ecological risk assessment is used to evaluate the potential adverse impacts of human activities on the ecosystem. It helps in identifying valued resources, prioritizing data collection efforts, and linking harmful activities with their future impacts. The United States Environmental Protection Agency (USEPA) has developed guidance for conducting watershed-scale ecological risk assessments. It recommends methods for assessing the risk to ecological resources from human-oriented stressors (USEPA, 1997). The efficacy of ecological risk assessment depends on the degree of collaboration between the various partners in the process – scientists, managers, and the public.

Ecological risk assessment quantifies the probability, or risk, associated with certain environmental events or hazards. The key steps are: identification of stressors (e.g., pollutants), identification of the reference environment (e.g., watershed boundary), selection of endpoints (e.g., water quality index), estimation of the pattern of exposure (e.g., transport models), and quantification of the relationship between exposure and effects (e.g., using multiple linear regression).
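The last step, quantifying the relationship between exposure and effects with multiple linear regression, can be sketched as an ordinary least squares fit. The example below is illustrative only: it solves the normal equations in plain Python, the two stressor columns and the response values are invented, and the function names are hypothetical. A statistical package would normally be used in practice, and the fit here is exact only because the synthetic response contains no noise.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_model(rows, y):
    """Return ([intercept, b1, b2, ...], R-squared) for y regressed on rows."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical stressor shares per site: (row crop, forest cover).
stressors = [(0.10, 0.60), (0.40, 0.30), (0.70, 0.10), (0.20, 0.50), (0.90, 0.05)]
# Synthetic, noise-free response built from known coefficients.
response = [60.0 - 30.0 * x1 + 20.0 * x2 for x1, x2 in stressors]

coef, r_squared = fit_linear_model(stressors, response)
```

With real field data the residual variation would be substantial, and the share of explained variation would be far below the perfect fit obtained here.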

Endpoints may be standards or criteria defined by legislation or measures of ecological sensitivity of the reference environment to the hazard. Endpoints may represent terrestrial systems (e.g., forest cover), aquatic systems (e.g., macroinvertebrate species composition), or elements of ecosystems (e.g., stream habitat quality). Endpoints are events resulting from exposure to some environmental hazard. They typically have social, ecological, or economic value. Therefore, they are the variables of interest in decision-making. The outcome of ecological risk assessment is a determination of the probability, or risk, of the occurrence of a certain environmental event, such as the violation of a water quality standard or the extinction of certain species.

Hunsaker et al. (1990) discuss a framework for regional ecological risk assessment, which may be used to estimate regional environmental impacts. Two distinct phases of the risk assessment process are identified: problem definition and solution. Uncertainties exist in both phases: in the identification of stressors and boundaries during problem definition, and in the selection of model and parameters during the solution phase. According to the framework, spatial heterogeneity may be a major source of uncertainty in both phases, which may be reduced by observations measured at an optimal scale.

The USEPA ecological risk assessment framework is implemented in a case study of the Big Darby Creek watershed in central Ohio (Cormier et al., 2000). The whole problem formulation stage of the risk assessment process is shaped by the characteristics of the watershed resource. Their conceptual model describes the expected relationships between the elements of the ecosystem, and identifies potential stressors and ecological responses. The results from this work can be used in the analysis and modeling for risk in the next stage of risk assessment.

According to Graham et al. (1991), risk assessment works in two phases, as illustrated with an example in Figure 2.1. First, an understanding of the sources, operating mechanisms, and endpoints is developed. Second, the amount and pattern of exposure, and the linkage between exposure and effects, are quantified.

[Figure 2.1 diagram. Phases: problem formulation (identify patterns) and environmental modeling (quantify patterns). Elements: Stressor (beetle attacks); Exposure (land use change); Impact (lake pH change).]

Figure 2.1: Environmental modeling in the context of ecological risk assessment

In their illustrative study, the modeling of effects of elevated ozone levels is demonstrated for a forested region in . Long-term cumulative effects of chronic exposure to increased levels of ozone are more visible in conifers. Stressed conifers are more susceptible to bark beetle attack. This induced mortality alters the amount, type, and patch geometry (forest edge as well as interior forest) of forest cover in the region. Finally, lake water acidity may be improved or the pH may shift because conifers tend to acidify soils and waters. A stochastic model quantifies the probability of land cover change due to bark beetle attacks, which is fed to an empirical lake water quality model to evaluate the risk of pH shift.
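The chaining of a stochastic model into an empirical model can be sketched with a small Monte Carlo simulation. The example below is not the model of Graham et al. (1991); it is a hypothetical illustration in which each forest stand is attacked independently with a fixed probability, and the lake pH shift is assumed to respond linearly to the fraction of conifer cover lost. All parameter names and values are invented for the sketch.

```python
import random


def simulate_ph_shift_risk(n_trials, n_stands, p_attack,
                           ph_per_unit_loss, threshold, seed=0):
    """Estimate the probability that the lake pH shift reaches a threshold.

    Stochastic stage (assumed): each stand is attacked independently
    with probability p_attack, and an attacked stand loses its conifers.
    Empirical stage (assumed): pH shift = ph_per_unit_loss * fraction lost.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    exceed = 0
    for _ in range(n_trials):
        attacked = sum(1 for _ in range(n_stands) if rng.random() < p_attack)
        fraction_lost = attacked / n_stands
        ph_shift = ph_per_unit_loss * fraction_lost
        if abs(ph_shift) >= threshold:
            exceed += 1
    return exceed / n_trials


# Hypothetical scenario: 50 stands, 30% attack probability per stand,
# a full loss of conifers shifts pH by 0.5 units, and a shift of
# 0.2 units or more is treated as the adverse event.
risk = simulate_ph_shift_risk(n_trials=2000, n_stands=50, p_attack=0.3,
                              ph_per_unit_loss=0.5, threshold=0.2)
print(f"Estimated risk of pH shift: {risk:.3f}")
```

The returned value is the share of simulated outcomes in which the endpoint is violated, which is the sense in which the risk assessment literature reviewed here uses the term risk.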

2.2 ENVIRONMENTAL MODELING

Models in ecological studies help in estimating the patterns of exposure. According to Gordon (1985, p. 3-17), models can be classified from many different perspectives. Deterministic models assume that the variables have been precisely known and measured. Environmental models, especially for toxic chemicals, are typically deterministic in nature with a focus on specific stressors and transport processes. However, due to various uncertainties in the ecosystem, statistical modeling is typically used in regional environmental analyses. Probabilistic, or stochastic, models acknowledge uncertainty in knowing the parameters and use probability distributions and statistical techniques.1 Empirical models are derived from existing information using statistical methods. Most environmental models are a combination of these techniques.

Gordon and Majumder (2000) studied the stressor-response relationships in an Ohio ecoregion using empirical modeling techniques. Their results are potentially useful for application in environmental planning because the model links the anthropogenic and habitat stressors to cumulative impacts on the biological quality of streams. Graham et al. (1991) discuss the linkage between spatial and stochastic modeling. Spatial models (e.g., for pollutant transport) are used to forecast exposure. Stochastic models are used to determine the probability associated with a certain exposure leading to significant environmental impacts. The risk assessment approach often uses innovative and simple probabilistic measures of exposure and risk.

1 Gordon (1985) also describes other types of models. Physical modeling, such as the US Army Corps of Engineers’ models for some river basins, allows for visible simulation of flood events and damages. Mathematical modeling involves using mathematical equations to reduce the complex reality into a few general variables of interest. Steady state models assume constant inputs and states of the system. Dynamic models are based on inputs and states that change over time. Theoretical models extend the basic principles of the field.

Computers help modeling with large data sets, complex equations, and fast calculations. Recent advances in Geographic Information Systems (GIS) have greatly facilitated spatial data collection, processing, analysis, and modeling (Johnston, 1998). Models can be very valuable in planning if they are realistic, reliable, and validated.

2.3 BIOLOGICAL CRITERIA

As discussed above, the analysis phase in ecological risk assessment inherently quantifies the relationship between stressors and their ecological response. But how do we quantify the integrity of aquatic ecosystems? This conundrum is addressed by biological criteria, which fit well in the framework of ecological risk assessment because the numeric measures allow for mathematical modeling.

The legal basis for biological criteria was introduced with Section 101(a) of the Clean Water Act (CWA) of 1972, which formulated the objective of restoring and maintaining the “chemical, physical and biotic integrity” of the water bodies (Adler, 1995, p.347). Many water quality standards have been passed since then to protect designated uses for water bodies. However, most of them have been designed to use only chemical and physical criteria, which act as surrogates for achieving the goal of biological integrity (Yoder, 1991). This has proved to be insufficient for protecting stream water quality (Adler et al., 1993). A study by the Ohio Environmental Protection Agency (Ohio EPA) found that more than 49% of the water body segments analyzed met the ambient chemical standards, but failed the biological standards of water quality (Yoder, 1995, p.328). In light of the increasingly popular perception that relying solely on physical and chemical measures of water quality is inadequate, a new class of measures was proposed to monitor and evaluate the ecological health of streams (Karr, 1981). From the legal perspective, as indicated in Figure 2.2, biological criteria fill the gaps in water quality standards that may have persisted until their adoption.

The ecological perspective for measuring the biological health of streams is sketched in Figure 2.3. It portrays that aquatic life is the result of complex interaction and integration of physical, chemical, biotic, habitat, and energy processes, which may be very difficult to account for completely in any current modeling system. The Index of Biotic Integrity (IBI), the Modified Index of Well-Being (MIwb), the Invertebrate Community Index (ICI), and the Qualitative Habitat Evaluation Index (QHEI), are some of the popular indices used in ecosystem evaluation in Ohio.2 These indices have been conceived to encapsulate multiple stressors that may impact the biotic integrity, habitat structure, and macroinvertebrate community of freshwater streams. They are designed to be flexible as their composition depends on the location, geological structure, human intervention, and other factors in a regional geographical context. They are first defined for certain reference sites within ecoregions, and the performance of individual streams is then calibrated and measured on this relative scale (Yoder, 1991).

2 For a detailed reference on these indices, see the relevant chapters in Davis and Simon (1995).

[Figure 2.2 diagram. Ecosystem: waters. Ecosystem health components and the corresponding water quality standards: chemical integrity (chemical-specific + whole effluent toxicity); physical integrity (physical standards); biological integrity (biological criteria). Goal: attainment of designated use.]

Figure 2.2: A legal perspective for implementing water quality standards

[Figure 2.3 diagram. Factors shown around "Waters": flow regime (precipitation, velocity …); habitat structure (substrate, gradient, sinuosity …); chemical factors (nutrients, pH, temperature …); energy source (sunlight, organic matter …); biotic factors (competition, predation …).]

Figure 2.3: An ecological perspective for studying water quality (Modified from Karr and Dudley, 1981 and Yoder, 1995, p.329)

The USEPA required all states to develop narrative biological criteria by 1993. Ohio started using narrative descriptions in the early 1980s. Numerical biocriteria were included in Ohio’s water quality standards regulations in February, 1990 (Yoder, 1991).3 The IBI, ICI, and MIwb together compose the numerical biological criteria in Ohio.4 Ohio may arguably be the first state in the U.S. to have successfully passed legal tests for both habitat and biological indices.5

Numeric criteria of ecosystem health offer a number of advantages. They are more likely to withstand legal scrutiny than narrative criteria alone (Adler, 1995, p.351). Programs based on numeric ecosystem criteria have also been shown to be cost effective as compared to the physical and chemical standards-based programs (Yoder, 1991). Biocriteria embody the positive objectives and affirmative spirit behind the water quality standards.6 According to Adler (1995, p.346), biocriteria are “aims to achieve, not ills to avoid.” Majumder (1998) lists many advantages of using IBI with the typical physical and chemical measures of water quality. Biocriteria assess the end point - the health of the stream’s communities - directly rather than indirectly through physical and chemical surrogate measures. They have the potential to identify cumulative long-term effects (e.g., from episodic stress events, non-point source pollution, and habitat alterations), which can be easily missed by the snapshots provided by chemical levels in a point source-based regulatory framework. Laboratory-based environments for evaluating the health of organisms may never replicate the true interactions between all the known and unknown elements of a stream ecosystem. It is also almost impossible to test all the chemicals for toxicity (Ortolano, 1997).7 Biological monitoring may also be necessary to identify local impacts due to stream-specific habitat alterations (Rankin, 1995).

3 In this text, the term biocriteria is used as a convenient replacement for biological criteria.

4 Biological criteria in the Ohio Water Quality Standards are typically arranged by the biological index (e.g., IBI, ICI, or MIwb), site type (e.g., boating, wading, or headwater), and ecoregion.

5 See a reference and brief discussion of Northeast Ohio Regional Sewer District v. Shank (1991) in Rankin (1995, p.195) and Yoder and Rankin (1995a, p.128).

6 The Supreme Court interpreted in 1992 that the purpose of biocriteria was to “establish the desired condition of a waterway” (Arkansas v. Oklahoma 1992) as against the idea of defining tolerable levels of pollution for the traditional chemical standards (Adler, 1995, p.346).

Suter (1993) criticizes numeric measures of ecological health on conceptual and technical grounds. According to the critique, only societally-valued end points should be measured through index components. On technical grounds, it casts doubt on combining different metric components of the response into a single index. This practice, the study suggests, misses a lot of useful detail on account of arbitrary functions used for aggregation. On the other hand, the use of individual metric components instead of the composite index in multivariate analysis seems to defeat the purpose of using aggregate indices in the first place. Concern has also been raised for arbitrary measurement units and the non-comparability of different indices. The weak predictive power of the composite indices has also been criticized.

These criticisms have been somewhat addressed over time.8 It may be argued that continuous biocriteria-based research may broaden our knowledge and reveal whether biocriteria are true measures of ecologically significant end points. Aggregating the metric components into an index provides a convenient perspective for evaluating and monitoring the ecological health of streams. Using the individual metric components provides a safety net against any major impact or cause of deterioration, which may be missed due to the masking of conflicting component effects during aggregation. Their predictive power may only be evaluated after sufficient research has been organized to explain the fundamental processes and interactions in the stream ecosystems.

7 According to Ortolano (1997, p.66), about 100,000 chemicals were commercially available in 1992, while there were fewer than 200 in 1940. He cites Petulla (1987, p.63) for this observation: testing a new chemical may cost up to $1 million and takes about 2-4 years. Petulla, J.M., 1987. Environmental protection in the . San Francisco: San Francisco Study Center.

8 For a more comprehensive response, see Yoder (1995, p.331) and Yoder (1991).

It is imperative to review a few basic concepts in biocriteria-based research – habitat quality, biotic integrity, and the indices to measure them; ecoregions; reference sites; watersheds; scale and hierarchy issues; and the impact of landscape factors.

2.3.1 STREAM HABITAT QUALITY

The distribution and abundance of species is functionally linked to the variations

in habitat features, or filters, at multiple scales. In a hierarchical framework, coarse-scale

filters shape the fine-scale habitats (Frissell et al., 1986; Poff, 1997).9 Habitat features

are conceptualized as filters defining the hierarchical organization of species across

different levels. These scales range from microhabitat patches, through channel unit

conditions and valley/reach elements, to watershed/basin filters.

According to the river continuum concept, rivers under stress undergo a gradual adjustment process based on the nature and magnitude of the geomorphic change, and the stream type (Vannote et al., 1980). According to Rosgen (1994), stream morphology is the integrated result of the interaction of eight major variables – channel width, depth, velocity, discharge, channel slope, roughness of channel materials, sediment load, and sediment size. These morphological variables may change over short distances due to changes in landform, geology, or tributaries.10 Streambed substrate is usually coarse gravel, but may be affected by sediment loading, bank erosion and slope (Mecklenburg, 1998). Fining is the filling of interstitial spaces with small particles of size <2 mm. These gaps are critical for macroinvertebrates and fish species. Filling reduces water flow and reaeration, thus making the bed anoxic and inhospitable for aquatic life. Especially in Ohio, typically low stream gradients may cause higher retention time for sediments, resulting in more degradation (Rankin, 1995, p.194).

9 The term scale can be quite misleading as it has been used in different contexts in the literature. Johnston (1998, p.20) has differentiated between large- versus small-scales in the context of cartography, where small-scale indicates large areas and large-scale means small areas. On the other hand, geographical and ecological studies typically associate scale with concepts of relative size (Strahler, 1957; Allen and Starr, 1982, p.5). Others have equated scale with some notion of stream order. For instance, first- to third-order might be small, fourth- to sixth-order might be meso (Allan et al., 1997), and higher than sixth-order might be large scale (Vannote et al., 1980). This work associates small scale with local or fine scale, and large scale with regional or coarse scale.

The habitat quality of several thousand miles of streams in the U.S. has been severely degraded (Adler et al., 1993, p.77). Anthropogenic intervention in a watershed alters the physical characteristics of a stream. Stream habitat components may be shaped by a complex interplay of anthropogenic and geomorphological attributes of the landscape (Hill and Platts, 1991). To protect the biological integrity of streams – a Clean Water Act goal – it is necessary to preserve stream habitat. Ecologists commonly consider the physical habitat of streams to be the template for defining local biotic diversity (Southwood, 1977; Poff, 1997).

2.3.2 QUALITATIVE HABITAT EVALUATION INDEX

In 1989, the Ohio EPA developed an index of macro-habitat quality, the Qualitative Habitat Evaluation Index (QHEI). The index is constructed to measure physical factors that influence fish communities and other aquatic life such as invertebrates. The attributes of habitat are visually estimated over a 150 m to 500 m reach, which corresponds to a reach where biological sampling is also done.11 The maximum achievable score for QHEI is 100. The total is based on the scores for six interrelated metrics – substrate quality, instream cover, channel quality, riparian zone, pool/riffle quality, and gradient.12 Below is a brief description of each of these metrics.

10 Rosgen’s morphological classification of streams does not involve averaging of the variables over entire basins to classify a stream. The morphological classification applies only to individual stream reaches (tens to thousands of meters).

Substrate quality: Coarse substrate is generally considered indicative of relatively unaltered stream conditions. Land use changes and habitat alterations may introduce finer substrates by erosion and sedimentation. Substrates are considered embedded when fines fill up the interstices between coarser substrates. Sedimentation leads to lowering of interstitial dissolved oxygen, loss of spawning habitat, and reduction of benthic production. This metric is scored for a maximum of 20 points.

Instream cover: This attribute records the amount, occurrence, and types of instream physical structure. The various elements measured include logs, woody debris, overhanging vegetation, boulders, and deep pools. Their presence may provide heterogeneity of velocity, pools as respite from faster flows, reduce export and increase processing of organic matter, provide refuge from predators, and serve as spawning and nursing habitats. A maximum of 20 points can be scored for this metric.

Channel quality: The sinuosity, channel modifications, and channel stability are evaluated. The most common channel modifications in Ohio are channel straightening and deepening for agricultural drainage and flood control (Rankin, 1995, p. 191). These modifications increase local gradients, destabilize stream banks, and increase sedimentation and peak flows during flash floods. Thus, channel modifications lead to reduced stream sinuosity and decreased channel stability.13 The maximum score for this metric may be 20 points.

11 The author attended a multiday training session (July 19-20, 2000, at Ohio EPA, Columbus, Ohio) and found the process of scoring QHEI to be visual and site-specific. The Ohio Scenic Rivers Stream Quality Monitoring project proposes that trained volunteers may enter ICI bioassessment data directly into the database using the internet (Wilson, 2001, Personal Communication).

12 For a more detailed discussion of the QHEI and its metrics, see Rankin (1989). Most of the material on QHEI component metrics discussed here is from that source.

Riparian zone: The extent and quality of riparian vegetation (e.g., trees, shrubs,

and wetlands) is important for maintaining the quality of stream habitat. Riparian zone

degradation may lead to loss of instream cover, increased rate of organic export,

increased stream bank and streambed erosion, increased sedimentation, and increased

water temperature. This metric may be scored for a maximum of 10 points.

Pool/Riffle quality: In Ohio, streams with high quality of habitat and biota have

fast and deep riffles with large substrates and deep pools. Since many fish and

macroinvertebrate species are habitat specialists, flow and depth heterogeneity fosters

habitat heterogeneity. Erosion and sedimentation result in increased fining of interstitial

spaces and greater embeddedness of substrates, thus degrading available habitat spaces.

The maximum score for this metric may be 20 points (12 for pool, 8 for riffle).

Gradient: In Ohio, stream gradient is used mainly as a proxy for stream flow

because data on flow is not always readily available (Rankin, 1995, p.185). It is

measured from 7.5-minute topographic maps. The maximum score may be 10 points.
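The six metric maxima described above add up to the 100-point ceiling of the index. The short Python sketch below only checks that arithmetic, assuming the index is the simple sum of the metric scores (which is consistent with the stated maxima). The names `QHEI_MAX` and `qhei_total` are illustrative, not part of the Ohio EPA procedure, and the site scores are hypothetical.

```python
# Maximum points per QHEI metric, as listed in the metric descriptions.
QHEI_MAX = {
    "substrate": 20,
    "instream_cover": 20,
    "channel": 20,
    "riparian": 10,
    "pool_riffle": 20,  # 12 for pool + 8 for riffle
    "gradient": 10,
}


def qhei_total(scores):
    """Sum the metric scores after checking each against its maximum."""
    for name, value in scores.items():
        if name not in QHEI_MAX:
            raise KeyError(f"unknown metric: {name}")
        if not 0 <= value <= QHEI_MAX[name]:
            raise ValueError(f"{name} score {value} outside 0..{QHEI_MAX[name]}")
    return sum(scores.values())


# Hypothetical site scores, for illustration only.
site = {"substrate": 14, "instream_cover": 11, "channel": 9,
        "riparian": 6, "pool_riffle": 10, "gradient": 8}
print(sum(QHEI_MAX.values()))  # maximum achievable score
print(qhei_total(site))        # total for the hypothetical site
```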

In Ohio, QHEI is extensively used in: developing the state-wide aquatic life use

designations; intensive watershed surveys “to explain causes and sources of impacts”;

issuing permits of construction; and supporting CWA Section 401 and Section 404 water

quality certification programs (Rankin, 1995, p. 195).14
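Reach sinuosity, which the channel quality metric evaluates, is the ratio of the curvilinear stream path between two end points to the straight-line distance between them. A minimal Python sketch of that ratio follows, assuming the reach is available as an ordered list of planar (x, y) vertices in consistent map units. The function names and the sample coordinates are hypothetical and do not reproduce the macro programs used in this study.

```python
import math


def path_length(points):
    """Length of a polyline given as ordered (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def sinuosity(points):
    """Channel path length divided by the straight-line end-to-end distance."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("reach end points coincide")
    return path_length(points) / straight


# Hypothetical reach vertices in meters; a perfectly straight reach gives 1.0.
reach = [(0, 0), (3, 4), (6, 0)]
print(sinuosity(reach))
```

Values close to 1 indicate a straightened channel, and larger values indicate a more meandering one.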

13 The sinuosity of a meandering channel is the ratio of the curvilinear stream path between its end points and the straight line distance between the end points.

14 Section 401 allows states to restrict potentially adverse projects, while Section 404 protects wetlands and other waters from dredge or fill operations (Adler et al., 1993, p.199).

2.3.3 INDEX OF BIOTIC INTEGRITY

The Index of Biotic Integrity (IBI) is an index for assessing the biological quality

of streams. The structural and functional attributes of the fish community are measured

as metrics, or components, of the index in three major categories – species richness and

composition; trophic structure; and fish abundance and health. In Ohio’s IBI, there are

twelve metrics, each scored relative to the performance of minimally impacted

communities in similar regions, in the range 1 to 5. The IBI scores range from 12 to 60.
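The stated range of 12 to 60 follows directly from summing twelve metric scores that each lie between 1 and 5. The sketch below checks that arithmetic in Python. The helper name `ibi_total` is hypothetical, and the sketch does not model how individual metrics are scored against reference communities.

```python
N_METRICS = 12
MIN_SCORE, MAX_SCORE = 1, 5


def ibi_total(metric_scores):
    """Sum twelve metric scores, each expected to lie in the range 1 to 5."""
    if len(metric_scores) != N_METRICS:
        raise ValueError(f"expected {N_METRICS} metric scores")
    for s in metric_scores:
        if not MIN_SCORE <= s <= MAX_SCORE:
            raise ValueError(f"metric score {s} outside {MIN_SCORE}..{MAX_SCORE}")
    return sum(metric_scores)


print(ibi_total([MIN_SCORE] * N_METRICS))  # lowest possible total
print(ibi_total([MAX_SCORE] * N_METRICS))  # highest possible total
```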

Narrative and numerical criteria for IBI

Aquatic life use designations: EWH, WWH, LRW

Index             Exceptional   Good   Fair   Poor      Very poor
Boating sites     48 – 60       47     26     16 – 25   < 16
Wading sites      50 – 60       49     28     18 – 27   < 18
Headwater sites   50 – 60       49     28     18 – 27   < 18

Table 1.1: Narrative and numerical criteria for IBI, in the context of aquatic life use designations in Ohio.15

Similar indices have been developed for other components of the aquatic system

such as benthic macroinvertebrates, algae, and submerged aquatic vegetation

(periphyton). In Ohio, the numeric thresholds of IBI are used with the narrative

definitions of aquatic life uses in its water quality standards, as shown in Table 1.1.

Currently, the index is developed only for small rivers and streams, but research is

ongoing to develop the index for lakes, estuaries, and large rivers.

15 Modified from Yoder and Rankin (1995a, p.121). EWH – Exceptional Warmwater Habitat; WWH – Warmwater Habitat; and LRW – Limited Resource Water. EWH criteria are based on the 75th percentile value of the reference site scores for the metric. WWH criteria correspond to the 25th percentile value of the reference site scores.

2.4 ECOREGIONS

Ecoregions provide the backdrop for collecting biological data, formulating biological criteria, and analyzing regional trends. The underlying basis for ecoregions is to define geographical boundaries, such that watersheds within an ecoregion are more similar than those across ecoregions. They also help in reducing the ‘noise’ in biocriteria data that may result from natural and anthropogenic variations (Yoder, 1991).

Ecoregions are defined based on certain singular factors shared by the larger landscape as a whole, such as soils, climate, and land use (Omernik, 1987).

2.5 REFERENCE SITES

According to Yoder (1991), reference sites indicate the ‘biological performance and characteristics exhibited by the natural habitats of the region’. This is the raison d’être of reference sites. They are defined for each ecoregion based on the principle that they reflect the currently attainable biological potential of the regional waters. The sites are selected by experts in the least impacted areas. A fair selection depends on two critical issues (Hughes 1995, p.43). First, excessive disturbance of a candidate reference site may force the inclusion of a reference site from adjacent similar ecoregions. Second, anomalous reference sites might be chosen that may not be representative of the surrounding region (e.g., ridge-top sites in plains, high gradient sites in low gradient area, or coldwater sites in characteristically warmwater areas). However, some calibration mechanisms may minimize the possibility of including suboptimal reference sites.16
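One calibration method of this kind plots each raw metric against log-transformed drainage area, draws a line with 95% of the values below it, and divides the area under the line into equal scoring bands. The Python sketch below is one possible reading of that procedure, not the documented Ohio EPA method. It assumes an ordinary least-squares line shifted upward to the 95th-percentile residual, a metric for which higher values are better, and equal bands measured from zero. All names and sample values are hypothetical.

```python
import math


def fit_95_line(areas, values):
    """Least-squares line of value on log10(area), shifted up so that
    at least 95% of the points lie on or below it (an assumed construction)."""
    x = [math.log10(a) for a in areas]
    n = len(x)
    mx, my = sum(x) / n, sum(values) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, values)) / sxx
    intercept = my - slope * mx
    residuals = sorted(yi - (intercept + slope * xi) for xi, yi in zip(x, values))
    shift = residuals[math.ceil(0.95 * n) - 1]  # 95th-percentile residual
    return intercept + shift, slope


def score(area, value, line, categories=(1, 3, 5)):
    """Score a raw value by the equal band under the line in which it falls."""
    intercept, slope = line
    top = intercept + slope * math.log10(area)
    if top <= 0:
        return categories[0]
    k = len(categories)  # 3 bands = trisection, 4 bands = quadrisection
    band = min(int(k * value / top), k - 1)
    return categories[max(band, 0)]


# Hypothetical line: 10 units of the metric per tenfold increase in area.
line = (0.0, 10.0)
print(score(100, 15, line), score(100, 10, line), score(100, 5, line))
```

Passing a four-element `categories` tuple gives the quadrisected variant.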

16 One method is to plot the raw measurements on each metric against the log transformed drainage area, then a 95% line of best fit is determined (such that 95% of the values exist below the line) and the area below is trisected or quadrisected based on the number of scoring categories in the metric.

2.6 WATERSHEDS

Stream habitat quality is influenced by hydrogeomorphological processes, which are largely a function of the local terrain. Topographical boundaries do not follow political boundaries, so typical study area units such as counties or states may be inappropriate for stream habitat modeling. Watersheds are important geographic units of study in many surface water quality studies, primarily because they define the boundary of the hydrological contributions of the landscape to a location.17 They are areas on the surface of the earth that collect surface water from other areas upstream of the mouth of the watershed.18 Watersheds are defined by the location of the mouth, and the topographic profile of the surrounding landscape. The term basin, as used in this study, indicates the drainage area for large rivers, or a series of rivers.19 Basins are probably similar to the first level of classification for hydrologic units, or regions, as defined by the United States Geological Survey (USGS) classification shown in Appendix A. On the other hand, watersheds defined locally using specific locations as outlets within their larger confining basins are called subwatersheds. These would correspond, conceptually, to the third and fourth levels of classification for the USGS hydrologic units, or hydrologic accounting units and hydrologic cataloging units, respectively.

Many studies have attempted to address the need for watershed-based studies for streams (Allan et al., 1997; Johnson et al., 1997; Lammert and Allan, 1999; Richards et al., 1997). Jones and Gordon (2000) studied the implementation issues of watershed-based strategies and found general agreement among all levels of government on the need for, and technical superiority of, watershed-based regional options. Watershed-based programs may also be integrated with state-level programs, such as under CWA section 305 and CWA section 319 (Adler et al., 1993, p.173).20

17 A watershed is variously known by many other terms in the literature – subwatershed, basin, subbasin, catchment, subcatchment, contribution area, or hydrologic unit. This may sometimes cause confusion, especially in studies dealing with multiple spatial scales.

18 The mouth is also known as the outlet, or pour-point, of the watershed.

19 A closed basin is conceptually similar to a naturally occurring, non-draining, bowl in the region, which is delineated by ridge lines of highest local topography on all sides, separating it from other adjacent basins.

Strahler (1957) has summarized many significant properties and laws pertaining to watershed geomorphology. According to the text, quantitative measures for landform description can be dimensional or dimensionless. Dimensional properties, such as length, area, and volume, may be used to compare watersheds for size variations. These properties highlight scale differences. Dimensionless properties, such as stream order, drainage density, and relief ratio, may be compared irrespective of scale.21 These

properties may be considered topological because they describe the landform regardless

of variations in size.
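Two of the properties named above reduce to simple ratios: drainage density is total stream length divided by basin area, and relief ratio is total basin relief (summit elevation minus mouth elevation) divided by basin length. The Python sketch below computes both. The function names, units, and sample figures are assumptions chosen for illustration.

```python
def drainage_density(total_stream_length_km, basin_area_km2):
    """Total stream length divided by basin area (here km per square km)."""
    if basin_area_km2 <= 0:
        raise ValueError("basin area must be positive")
    return total_stream_length_km / basin_area_km2


def relief_ratio(summit_elev_m, mouth_elev_m, basin_length_m):
    """Total basin relief divided by the longest basin dimension."""
    if basin_length_m <= 0:
        raise ValueError("basin length must be positive")
    return (summit_elev_m - mouth_elev_m) / basin_length_m


# Hypothetical basin: 120 km of streams over 60 sq. km, 150 m of relief over 15 km.
print(drainage_density(120, 60))
print(relief_ratio(350, 200, 15_000))
```

Relief and basin length must share one unit for the relief ratio to serve as a measure of overall watershed slope.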

Watersheds are often classified according to stream order.22 Stream order is

directly proportional to the relative watershed dimensions, channel size, and stream

discharge at that point in the hydrologic system (Strahler, 1957). Vannote et al. (1980)

indicate that ecosystem structure may be strongly influenced by stream size (e.g., order).
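The two common ordering rules differ only in what happens at a confluence: Strahler ordering increments the order by 1 only when streams of equal order meet, while Shreve ordering adds the tributary values. A minimal Python sketch of both rules follows, assuming the network is a tree stored as a dictionary from each reach to the reaches directly upstream of it. The reach names are hypothetical, and this is not the macro code used in this study.

```python
def strahler(orders):
    """Order below a confluence: highest order, plus 1 if it occurs at least twice."""
    top = max(orders)
    return top + 1 if orders.count(top) >= 2 else top


def shreve(orders):
    """Value below a confluence: the tributary values add."""
    return sum(orders)


def stream_order(network, reach, rule):
    """Order of `reach` in a tree network {reach: [reaches directly upstream]}."""
    upstream = network.get(reach, [])
    if not upstream:
        return 1  # headwater streams have order 1 in both systems
    return rule([stream_order(network, u, rule) for u in upstream])


# Hypothetical network: two headwaters join to form "a"; "a" and headwater "b" join at the outlet.
network = {"outlet": ["a", "b"], "a": ["a1", "a2"]}
print(stream_order(network, "outlet", strahler))
print(stream_order(network, "outlet", shreve))
```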

20 Clean Water Act (CWA) section 305(b) deals with the biennial state inventory of all waters and pollution sources. Section 319(a) mandates states to report on nonpoint source problems.

21 Drainage density is the ratio of the total stream length and the basin area. Higher drainage density means more streams of smaller lengths and smaller individual subwatersheds. Conversely, low drainage density indicates widely spaced streams and relatively large individual subwatersheds in hilly areas. Relief ratio is the ratio between total basin relief (elevation difference between basin summit and mouth) and basin length (longest basin dimension). In other words, it is a measure of the overall watershed slope.

22 The order of a watershed is the same as the stream order of the stream on which the mouth or pour-point of the watershed is located. Streams are numerically ordered based upon the number and pattern of tributaries. Shreve ordering is additive (e.g., 2 + 2 = 4; 5 + 7 = 12), but Strahler ordering increments the stream order by 1 only if two streams of equal order meet (e.g., 2 + 2 = 3; 5 + 7 = 7). Headwater streams have order 1 in both systems. Different drainage networks are comparable for stream ordering only if their stream channels have been defined consistently.

The slopes in the valleys of the watershed determine the amount of surface runoff. Greater upstream slopes may increase surface runoff and cause erosion at downstream locations (Rankin, 1995, p.192). In riparian zones, the combination of loose soils and higher slopes may define the amount of sediment, substrate type, and the instream cover (e.g., woody debris) in the stream. Steep slopes causing bank erosion have been suggested to be a cause of increased sediment loads, which may lead to excessive fines in the substrate and cause damage to microhabitats due to filling of interstitial spaces (Mecklenburg, 1998).

In spite of many advantages, the watershed approach suffers from some problems.

First, watersheds do not depict similar ecosystems like ecoregions do. In themselves, watersheds are not very useful for comparisons of ecosystems because they are not defined based on any geologic, soils, climatic, vegetative, or land use patterns. Second, it is not always feasible to define watersheds across regions of little relief, karst topography (with strong underground drainage patterns as in the Mad River region of Ohio), or high aridity (Omernik 1995, p.61). Third, the delineation of watersheds is as much an art as it is a science. The size of watersheds for outlets on a stream in a basin may vary considerably depending upon the location of the outlet on the stream and the algorithm or delineation method used. The variability of watershed areas prompted Anderson (1957) to observe that a watershed’s size is the “devil’s own variable.”23

To address the first issue noted above, watershed-based studies may use ecoregions as the spatial framework for risk assessment because watersheds within ecoregions are more similar to each other than those across ecoregions. The second issue may be resolved by excluding anomalous watersheds using a priori knowledge, if possible, and by careful scrutiny of any statistical outliers. The third problem may be resolved by controlling for size programmatically during watershed delineation.

23 As cited in Miller et al. (1996).
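One way to control watershed size programmatically is to cap the upstream flow length traced from the outlet, which is the idea behind the localsheds described in the abstract. The Python sketch below illustrates only that cap on a toy stream network. It is not the GIS procedure used in this study, and the data structure, reach names, and lengths are hypothetical.

```python
def reaches_within(network, outlet, max_flow_length):
    """Reaches whose downstream end lies within max_flow_length of the outlet.

    `network` maps a reach id to (length, [ids of reaches directly upstream])
    and is assumed to be a tree (no loops).
    """
    selected = []
    stack = [(outlet, 0.0)]  # (reach, flow distance from outlet to its downstream end)
    while stack:
        reach, dist = stack.pop()
        if dist >= max_flow_length:
            continue  # this reach starts beyond the cap, so stop tracing here
        selected.append(reach)
        length, upstream = network[reach]
        for up in upstream:
            stack.append((up, dist + length))
    return sorted(selected)


# Hypothetical network; lengths in miles.
network = {
    "r1": (0.6, ["r2", "r3"]),
    "r2": (0.5, ["r4"]),
    "r3": (0.3, []),
    "r4": (0.8, []),
}
print(reaches_within(network, "r1", 1.0))
```

A fuller implementation would also clip the reach that straddles the cap and then delineate the land area draining to the selected reaches from the DEM. The sketch stops at whole reaches for brevity.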

2.7 SCALE AND HIERARCHY24

Landscape is thought of as influencing the stream ecosystems across multiple spatial scales. An understanding of the hierarchical levels of the landscape may help in making predictions for stream ecology. The scale of a feature in ecological hierarchy is defined by “the time and space constants whereby it receives and transmits information” (Allen and Starr, 1982, p.17). In other words, it is the attribute along a continuum where a feature under observation becomes meaningful. Too fine a resolution generates noise, or excessive and chaotic information, and too coarse a resolution generates constants and suppresses environmental heterogeneity (Shugart et al., 1991).25

Many routine phenomena (e.g., soil classes, contours, spectral bands, subwatersheds, and ecoregions) are discretizations of natural continuums. For instance, the biocriteria indices break the continuum of ecological health into narrative, or numeric, criteria. It is relatively easy to identify boundaries for ecological attributes that show abrupt and high-magnitude change. Properties showing large but gradual change, or small but abrupt change, are difficult to detect. In such cases, analysts often create artificial boundaries based on conceptual thresholds (Johnston, 1998, p.22). It should be noted that such classification is a conceptual artifact and not based on any tangible boundaries.

24 In ecological literature, continuum is conceptually similar to hierarchy.

25 As cited in Johnston (1998, p.31).

In the context of stream habitat quality, microhabitats, riparian zones, subwatersheds, and basins decompose a hydrologic continuum into ecologically meaningful scales, as shown in Figure 2.4.26 It is not clear at which scale (microhabitat, channel, reach, subwatershed, or basin) the various variables in the model have the most meaningful impact. Allan and Johnson (1997) note that there is limited understanding of the scale-dependency, hierarchical organization, and mechanisms of land-water interrelationships. They have suggested that the riparian corridors are important for hydrologic and biological processes across the riparian ecotone, while the entire catchment influences regional-scale factors such as chemical and sediment transport. On the other hand, Marsh (1997, p.229) suggests that sediment transport is more local than regional, at least in small rural watersheds. The contrasting findings probably only confirm that spatial factors affecting instream habitat are not yet completely understood and need further investigation (Johnson and Gage, 1997).

26 An ecotone is a zone of transition between habitat types (Ricklefs, 1997, p.649). Riparian zone is also sometimes mentioned as the land-water interface, or ecotone.

[Figure 2.4: Discrete hydrological scales along the ecological continuum.27 Diagram labels: sample site, basin-scale, subwatershed-scale, reach-scale, microhabitat-scale.]

27 Motivated by Poff (1997) and Allan et al. (1997).

2.8 LANDSCAPE FACTORS

Johnson et al. (1997) define landscape features by land use, land cover, geology, and structure variables (e.g., slope, aspect). Land use and land cover collectively represent the extent of human intervention. These are relatively dynamic. Geology and structure variables, such as surficial geology, catchment area, mean catchment slope, and standard deviation of catchment elevation, are relatively fixed landscape features. They characterize the geologic and topographic influences of the landscape.

Land use practices shape a number of problems related to watershed resources.

Of these, only the control of point sources, primarily industrial discharges, has seen some success through regulations. Nonpoint sources, particularly habitat degradation and contaminated surface water runoff, have required complex legal tools and strong incentives for voluntary actions.

According to Richards et al. (1997), the true effects of land use may be masked by its interaction and correlation with the geological structure in the watershed. The geomorphology may also influence stream flow. Catchments with lacustrine geology (e.g., old lake beds with clay and fine silt) have poor infiltration capacity and, therefore, streams are characterized by ‘flash’ flows from surface runoff. Consequently, such streams also have variable temperature regimes linked to atmospheric patterns. On the other hand, catchments with morainic geology (e.g., dominated by coarse-textured, alluvial outwash/morainal deposits) have high infiltration, groundwater-driven streams, and relatively stable temperature regimes with low annual variations. Overall, lacustrine clay increases hydrologic variation, and glacial outwash indicates larger substrate contents.

Richards and Host (1994) suggest that forest cover and agricultural land use may be related to sediment loading. According to Wang et al. (1997), forest cover reduces runoff of water, sediments, and nutrients; stabilizes stream flow, temperature, and channel morphology (e.g., by controlling runoff and bank erosion); and supplies coarse organic material and woody debris as food and habitat for aquatic biota.

2.8.1 RIPARIAN ZONE LANDSCAPE

Trees in the riparian zone may contribute to instream cover (Rankin, 1995, p.191).

The presence of forests, in the riparian zone or the whole subwatershed, may stabilize the soils in the upstream regions; reduce the uprooting of top soil by rainwater because of canopy intervention; and deplete particle content of surface runoff through infiltration.

Agricultural land use reverses these trends. Riparian row crops involve practices, such as tillage, seasonal denuding of the land surface, and irrigation practices near the river, which may cause erosion. Livestock grazing may also increase sedimentation effects in riparian areas (Rankin, 1995, p.192).

2.8.2 URBANIZATION

Wang et al. (1997) suggest that urban land use increases water, nutrient, sediment, and toxic runoff, which destabilizes the dynamic equilibrium between streams and the landscape. Another major urban contributor to stream ecosystems may be the presence of roads. Roads are impervious, so they increase surface runoff contributions from areas upstream of the sampling locations. Roads are also a surrogate for the extent of urban development in the watershed. If deposited in reasonably large amounts, road salt may also damage riparian vegetation and affect the instream conditions indirectly (Mattson and Godfrey, 1994).28 However, depending upon the extent of agricultural land use, rural nonpoint source pollution may still be more influential than urban nonpoint source pollution (Gordon and Fromuth, 1981).

Point sources include industries, sewage treatment plants, landfills and other similar entities. The presence of suspended particles in the discharge might influence streambed processes, which may shape the substrate structure of the receiving stream.

Population density and number of housing units reflect the level of urbanization in the subwatershed. Higher values may indicate heavier sewage loads on the local streams and damage to the stream habitat. Population density and housing units may also be a proxy for urban development. While point source inputs have been used as stressors in the relationship between chemical pollutants and stream biota (Gordon and Majumder, 2000) and housing density has been correlated with algal abundance (Richards and Host, 1994), their impact on stream habitat quality is not yet clear.

2.9 WATER QUALITY CASE STUDIES

A review of some of the most relevant case studies is presented here to discuss the major issues mentioned earlier – the scale of assessment; data and technology availability; spatial and temporal design of study; impact on physical habitat, chemical quality and biotic integrity; and policy and management issues related to multi-scale analysis of aquatic ecosystems.

28 See a brief discussion of the case study below.

Many early water quality studies focused on understanding the effects of different land uses on chemical parameters of water quality. Gordon and Fromuth (1981) perform a multiple regression analysis of point sources and land use to estimate mean annual dissolved oxygen (DO) levels for a part of the Great Miami River in southwestern Ohio. They model average annual DO as a function of point source BOD, land use, land cover, mean annual stream flow and other variables. Mean annual DO was found to be negatively associated with forest cover and positively associated with commercial land use. These results, though somewhat surprising in light of current knowledge, may be attributed to satellite data classification accuracy issues. The authors indicate that rural nonpoint runoff may be more influential than urban nonpoint runoff in affecting the BOD and DO levels in the study area. The study also suggests that cropland is not a major source of organic pollutants (e.g., BOD) to the Great Miami River. The results of the multiple regression were used as a planning tool to analyze the impact of two planning proposals on chemical water quality – the construction of a regional treatment facility, and the upscaling of a local sewage treatment plant.

The Great Miami River Basin in southwest Ohio is the site of another study by Wang and Yin (1997) to explore the relationship between land use and chemical water quality. Electrolytic conductivity is analyzed for its correlation with 199 other water quality variables and land use, using daily water quality data from 376 congruent observations at six USGS monitoring stations along the river. Pearson’s correlation is used between conductivity and land use. Conductivity is positively correlated with urban land use. Spearman’s rank correlation shows strongly positive correlation between conductivity and soluble solids and phosphorus compounds. Insoluble solids (suspended solids, turbidity), fecal coliform, and nitrogen compounds are negatively correlated with conductivity. Statistical analysis of spatial variations among the six stations is performed to test whether the adjacent upstream and downstream stations differ in average values. A matched-pair t-test is used because the observations are not entirely independent (e.g., same day under similar hydrological conditions). Significant differences are found near the Dayton urban area.
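A matched-pair t-test works on the day-by-day differences between two stations rather than on the two samples separately, which is why it suits observations taken on the same day under similar conditions. The Python sketch below computes the test statistic from first principles using only the standard library. The function name and the conductivity values are hypothetical and are not data from Wang and Yin (1997).

```python
import math
import statistics


def paired_t(upstream, downstream):
    """t statistic and degrees of freedom for same-day paired observations."""
    if len(upstream) != len(downstream):
        raise ValueError("paired samples must have equal length")
    diffs = [d - u for u, d in zip(upstream, downstream)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical same-day conductivity readings at two adjacent stations.
up = [500, 520, 510, 530]
down = [515, 540, 520, 545]
t, df = paired_t(up, down)
print(round(t, 3), df)
```

The statistic is then compared with the t distribution on n - 1 degrees of freedom to judge significance.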

Osborne and Wiley (1988) study an agricultural drainage basin in eastern Illinois for impacts of different land uses. Urban land use is found to be associated with maximum phosphorus concentrations, and agricultural land use explains a significant amount of variation in nitrate concentrations. They also point out the seasonal pattern of association between land use and chemical concentrations. Hunsaker and Levine (1995) found similar results regarding the impact of land use on stream water quality in another Illinois basin. Agricultural land use was significantly associated with phosphorus and nitrogen concentrations. Further, the impact was stronger when measured on the watershed scale as compared to the riparian scale.

Mattson and Godfrey (1994) use empirical modeling to estimate sodium concentration in 162 randomly chosen streams in Massachusetts. Multiple regression and GIS analysis are used with loading from four types of roads and sea spray as independent factors. According to the study, salt is generally not toxic to most aquatic organisms. The study also finds road length in riparian zone, medium density residential land use, and commercial land use to be significant factors in explaining sodium concentration. It also cites other studies, which indicate that salt toxicity may affect roadside vegetation more than invertebrates.

With growing awareness of the interrelationships between landscape and biologic measures, and with the advances in GIS technology, research focus has shifted from chemical water quality towards understanding local and regional biological quality of streams. The reviews below present three series of studies conducted across different basins in the midwestern United States.

2.9.1 EASTERN CORN BELT PLAINS ECOREGION STUDIES

The Eastern Corn Belt Plains (ECBP) ecoregion is the geographic focus of research on the biocriteria-based ecological risk assessment process (Norton et al., 2000; Gordon and Majumder, 2000; Cormier et al., 2000). The ecoregion is in west-central Ohio, covers about 25 seventh-order watersheds, and has a rich database available on biological and habitat quality sampling from 1988 onwards.

Norton et al. (2000) studied the ability of biological metrics to distinguish among stream chemistry and habitat factors. QHEI is used as one of many stressor variables, and IBI is one of the response variables. Factor analysis showed that the first six stressor factors explained 69% of the variation in the response variables. Their results suggest that diagnostic models could be developed for site-specific and regional assessments.

Gordon and Majumder (2000) developed an empirical environmental model for assessing the cumulative impacts of anthropogenic and habitat factors on the fish community structures using data across 18 seventh-order basins. The GIS-based linear regression models explain about 66% of the variation in IBI using QHEI component metrics, urban land use, an index of chemical pollution, and upstream and downstream IBI scores as the stressor inputs to the model. The study also suggests the use of detailed riparian data to improve the accuracy of future regional-scale environmental assessments.
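Statements such as "explain about 66% of the variation" refer to the coefficient of determination of a regression, which is one minus the ratio of the residual sum of squares to the total sum of squares. The Python sketch below computes it for a least-squares line with a single predictor, a deliberate simplification of the multiple regressions reviewed here. The function name and the sample values are hypothetical and are not drawn from any of the cited studies.

```python
def r_squared(x, y):
    """Share of the variation in y explained by a one-predictor least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
    ss_tot = sum((b - my) ** 2 for b in y)                                  # total
    return 1 - ss_res / ss_tot


# Hypothetical habitat scores (predictor) and biotic index scores (response).
habitat = [42, 55, 61, 70, 78]
biotic = [24, 30, 28, 40, 44]
print(round(r_squared(habitat, biotic), 2))
```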

In a study of the Big Darby Creek watershed by Cormier et al. (2000), conceptual modeling identified several potential stressors in the watershed – sedimentation, changes in water flow, changes in the physical characteristics of the stream channel, loss of tree cover adjacent to the stream, nutrient enrichment, and chemical contamination. It should be noted that most of these stressors are related to the habitat quality of the streams.

2.9.2 RIVER RAISIN BASIN STUDIES

The second series of environmental studies has been undertaken on catchments of different scales in the River Raisin basin, a south-eastern Michigan catchment (Allan et al., 1997; Lammert and Allan, 1999). The total drainage area of 2,776 sq. km. (fifth-order, meso-scale watershed) for the River Raisin basin consists of ten subcatchments.

The studies use biocriteria information from low flow season (e.g., summer) data.

Allan et al. (1997) studied the relative importance of riparian vs. catchment-wide landscape pattern on physical, chemical, and biological conditions of a stream by examining 23 sites across seven meso-scale catchments in the upper half of the basin. A GIS-linked distributed parameter model predicted that forest cover would reduce runoff, sediment, and nutrient loads. Correlation between stream condition indices and catchment land use becomes weaker at local scales. Stream integrity is influenced by land use differently at different scales. Instream habitat and organic matter are influenced by local stream conditions (e.g., vegetative cover) at the site. Nutrients, sediments, and channel characteristics are influenced by regional conditions (e.g., upland land cover). Catchment-scale agricultural land use is the strongest predictor of habitat quality and biotic integrity. Local riparian vegetation is a weaker predictor of stream conditions using multiple regression. According to the study, roads in riparian zones may be studied as a proxy for riparian residential development.

Lammert and Allan (1999) studied the relative influence of habitat structure and land use/cover on overall biotic conditions of a stream by examining 18 sites across three first-order, meso-scale catchments in the same basin as the previous study. The drainage area of these three catchments ranges from 50 to 75 sq. km. Fish are found to be more sensitive to stream flow and riparian land use. Macroinvertebrates are more correlated with habitat structure measures (e.g., substrate). Habitat variables explain stream biotic conditions better than land use. Riparian land use is more important than catchment land use in predicting stream biotic conditions. The stream habitat and riparian land use variables, although significant, explain only a modest amount of variability. The best model explained up to 53% of the variation in IBI by 100 m-riparian forest land use (28%), stream flow (16%), and channel width (9%). The results indicate that riparian land use and instream habitat may not be independent variables in this area. They conclude that local habitat conditions are best revealed at finer scales.

2.9.3 SAGINAW BASIN STUDIES

The third series of studies related to stream habitat and biota quality were undertaken in the Saginaw Basin of Lake Huron in east-central Michigan using summer and autumn data (Johnson et al., 1997; Richards et al., 1997; Richards et al., 1996).

Various issues related to land use, and spatial and temporal scales, are analyzed by Johnson et al. (1997) in a wide-ranging study of water chemistry across 62 sites. The catchments for the sites range from about 12 to 3,000 sq. km. in area, from first- to sixth-order in scale, across five basins ranging from about 2,000 to 7,000 sq. km. in area. The relative contribution of catchment-scale geologic and land use variables in explaining the variation in stream chemistry is measured using partial redundancy analysis. Then, their quantitative contribution is explored using multiple regression. The steps are repeated with riparian-scale data, and summer and autumn data.

Row crop agriculture is the most important explanatory variable as it is significant for most chemistry variables. Urban areas are significant in explaining particulate-related chemistry variables. Nitrates exhibit strong seasonal differences because surface runoff and fertilizer application are lower in autumn. In summer, land use factors are better in explaining water chemistry variability, while geology/structure variables fare better in autumn. Total variation explained by landscape factors is better in summer (56% vs. 39%), and slightly better by ecotonal (60% vs. 56%) than catchment data. Most individual chemistry variables are best explained by riparian data in both seasons. Phosphorus was, overall, the least explained variable, probably because of scale and adsorption issues.

The issues of scale and landscape influences are studied with respect to stream macroinvertebrate features using logistic regression analysis, in the same Saginaw River Basin by Richards et al. (1997) using summer data. Catchments ranging in size from 7 to 570 sq. km. are selected for 58 sites. Reach-scale habitat properties significantly predicted 14 of the 15 species traits, as compared to four traits by catchment-scale geology/land use properties. Reach-scale models also explained 21% - 54% of the variation in predicted probabilities, compared to 14% - 31% by catchment-scale models. Cross-sectional area, percent shallows, and percent fines are the significant habitat variables. Land use variables are not significant for any trait.

Richards et al. (1996) use a study area comprising 46 sites in the second- and third-order subcatchments of the same Saginaw Basin as mentioned above. The study compares the influence of geologic versus land use/cover patterns on stream habitat and biota across catchment versus riparian scales. A partial redundancy analysis is performed, first to quantify the variation in stream macroinvertebrate assemblage data as a function of stream habitat variables, and then to quantify the relative contribution of geologic and land use variables in explaining the variation in stream habitat.

Stream habitat variables explained 37% of the variation in macroinvertebrate data. Geologic and land use/cover factors accounted for about 50% of the variation in physical habitat. Individual habitat metrics are explained to different degrees by the combination of geology and land use variables. Also, within each individual habitat metric, the relative influence of geologic or land use variables is different. The results suggest that channel morphology and other channel variables (e.g., bankfull depth, shallows, and flood ratio) may be better explained at catchment scale. Fines, woody debris, and erosion are better explained by riparian-scale variables. Riparian zones may also explain sediment- and erosion-related variables, such as substrate and bank stability.

2.9.4 OTHER STUDIES

Besides the three series of studies discussed above, many separate studies have also researched stream habitat quality issues.

Richards and Host (1994) conducted an analysis of catchment-scale land use and land cover patterns for their influence on stream habitat features by examining 11 sites across second- and third-order subcatchments along the Lake Superior North Shore in Minnesota, with low flow, or summer, season data. First, principal components analysis was performed to identify the habitat factors useful in explaining variations in macroinvertebrates. Second, correlation analysis revealed relationships between watershed-scale landscape variables and stream habitat features. Principal components analysis revealed substrate heterogeneity - the association of embeddedness, substrate size, and woody debris - as the most important factor in explaining the variation in macroinvertebrate assemblages. The second most important factor comprised stream width and algal abundance. The run (percent), shading (percent), and sinuosity variables lie along another axis. The four best factors account for 75% of the variation.

Correlation analysis showed that substrate size decreased with increasing urban development, embeddedness increased with increasing agriculture, and algae were negatively correlated with forests and positively correlated with housing density. They also suggest that high-resolution data may be needed to quantify the impacts of riparian land use and sinuosity.

Wang et al. (1997) performed correlation analysis of land use patterns, measured at catchment- and riparian-scale, for their influence on stream habitat quality and biotic integrity of a stream. The study examines 134 sites across a range of second- to fifth-order subcatchments located throughout Wisconsin with summer data. The initial 277 sites were reduced to 134 by combining data for adjacent co-watershed sampling sites and using their sum or mean value.29 The results indicate that forested land is positively correlated, and agricultural land is negatively correlated with habitat quality and IBI, at both the watershed- and riparian-scale. Agricultural land use affects habitat and biotic integrity beyond a general threshold of 50% of the watershed area. Urban land use is more strongly correlated to IBI than to habitat. The threshold value for urban impact is much lower at 10%-20%. Overall, watershed-wide land use is more strongly associated with stream habitat and biotic integrity than riparian land use.

Mecklenburg (1998) studied the influence of physical attributes of streams on sediment pollution. Three reaches, one mile each, of Salt Creek in south-central Ohio (one straightened reach immediately upstream and one straightened reach immediately downstream of a sinuous reach) are analyzed for sediment pollution and its correlation with channel sinuosity. Sediment pollution is measured in terms of the fining and length of riffles and the fining and depth of pools. The study revealed that the sinuous reach had better riffle and pool qualities. It had less fining, or coarser substrate particles, longer average riffle length, and deeper average pool depth. This indicates better scouring of accumulated fine sediments in the sinuous reach. Less fining of streambed particles indicates better quality interstitial spaces as microhabitat, greater pool depths, less turbidity, and better scouring.

29 Note that in Gordon and Majumder (2000) same-year samples are treated as independent observations.

2.10 WHY MODEL STREAM HABITAT QUALITY?

Research on habitat quality may be categorized into three major areas - theoretical understanding of fundamental issues (e.g., Vannote et al., 1980; Frissell et al., 1986; Rosgen, 1994; Mecklenburg, 1998; Poff, 1997), exploring the association between habitat and landscape factors (e.g., Lammert and Allan, 1999; Allan et al., 1997; Wang et al., 1997; Richards et al., 1996; Richards and Host, 1994), and legal or administrative perspectives for implementing habitat-based programs (e.g., Rankin, 1995; Adler, 1995; Yoder and Rankin, 1995b). Of these, the body of literature regarding empirical modeling, to explain or predict habitat changes, is found to be the most lacking. Most studies have used habitat quality as a stressor factor in explaining chemical or biotic impacts (e.g., Gordon and Majumder, 2000; Lammert and Allan, 1999; Richards et al., 1997), not as an ecological endpoint or ecosystem response to landscape influences.

Hughes (1995, p.33) cites a study of over 1,300 reaches in the United States, which found that most reaches suffered from impacts related to habitat. Nonpoint sources (38%), sedimentation (28%), erosion (18%), and channelization (12%) were major conditions while point sources were a major concern in only 5% of the waters. In this context, it is interesting to note that until recently few states even collected monitoring data for habitat destruction or alteration (Rankin, 1995, p.184). Not surprisingly then, there is a corresponding lack of empirical studies with detailed data across multiple spatial scales to improve our understanding of the interrelationships between hydrological, geomorphologic, and anthropogenic factors of the landscape (e.g., Allan et al., 1997; Poff, 1997; Richards et al., 1996; Richards and Host, 1994). The general lack of data and empirical modeling may also explain the relatively weak regulatory efforts to safeguard our stream habitats (Rankin, 1995, p.181).

There is a lack of regional habitat models, although a few studies have attempted to research the underlying factors related to habitat quality over a single basin (e.g., Allan et al., 1997; Lammert and Allan, 1999; Johnson et al., 1997; Richards et al., 1996; Richards et al., 1997). Habitat modeling may be performed in the framework of ecoregions to reveal any broad spatial or ecological patterns.

In the context of the ecological risk assessment framework, the applicability of environmental models, using biological criteria as an endpoint, depends upon our ability to explain and predict the biological impacts of habitat-related stressors (e.g., those identified by Cormier et al., 2000). As a corollary, the stressors may themselves be linked to human activities and the related landscape factors. This may help us explain and predict the behavior of these stressors as a result of adverse human activities. This two-stage scheme is sketched in Figure 2.5.

[Figure 2.5 diagram: Landscape variables (stressor) -> Stream habitat quality (endpoint, then stressor) -> Stream biological quality (endpoint)]

Figure 2.5: The conceptual link between landscape, stream habitat, and biotic quality

Many studies have recommended further research regarding habitats across multiple spatial scales (Allan et al., 1997; Poff, 1997; Rankin, 1995, p.182). The results from regional habitat modeling may also be used as an input to regional models for biological quality (e.g., Gordon and Majumder, 2000) in a design linking the two models across multiple spatial scales. This may address the relative lack of diagnostic tools that help planners and managers identify the stressors and formulate the appropriate level of remedy to fix, or ameliorate, the impact (Davis and Simon, 1995, p.5).

2.11 DISCUSSION

Landscape influences stream habitat at multiple scales. In their assessment of scale issues, Allan et al. (1997) found watershed-scale factors to be dominant, while Lammert and Allan (1999) found riparian-scale factors to be more useful in explaining stream habitat quality. The absence of a clear consensus on scale issues indicates that many in-stream habitat processes are not completely understood. Johnson and Gage (1997) note that predicting environmental gradients (e.g., stream habitat) using multivariate techniques has suffered from an inherent ambiguity about the fundamental driving processes.

Watershed-based regional habitat studies may be limited by many uncertainties. From a broader perspective, other sources of pollution (e.g., air, land, and water) may confound the dynamic equilibrium of stream ecosystems. Major factors affecting uncertainty include: definition of spatial boundary, choice of temporal scale, spatial heterogeneity, availability of data, and GIS-based data processing. There is limited understanding of scale-dependency, hierarchical organization, and land-water interrelationship issues (Allan and Johnson, 1997).

The issue of aggregation of local variables into regional geographical units across multiple scales is not trivial. Experts caution against cavalier use of aggregation (Hunsaker et al., 1990; Rosgen, 1994). Some studies, however, have used simple measures of spatial or temporal averaging for convenience (Yoder and Rankin, 1995b, p.264; Wang et al., 1997; Gordon and Majumder, 2000). Statistical modeling may be a useful tool in the biocriteria-based monitoring of pollution control programs.

Sampling strategy and sample size are critical elements in watershed-based modeling. The study design of Allan et al. (1997) was more suited to discriminate watershed-scale spatial effects because it had 23 sites spread across seven fifth-order subcatchments (tributaries). Lammert and Allan’s (1999) study design was more suited for detecting local influences, with 18 sites across only three first-order subcatchments. Similarly, the study by Wang et al. (1997) was probably better suited to detect watershed-wide effects, rather than riparian effects, because of the relatively coarse resolution of the land use data (400 m; 200 m for urban areas).

Watershed delineation may be critical in watershed-based ecosystem analyses. Development of custom algorithms may be necessary to create relatively uniform spatial units. Detailed data are needed to study the impacts at finer scales. Studies using riparian-scale variables have reported inconclusive results about their predictive power (Johnson et al., 1997). However, most of these studies were limited by the resolution of the data (Richards et al., 1996; Richards and Host, 1994). Besides detailed land use/cover data, other factors (e.g., slope, roads, sinuosity) may be useful for modeling stream habitat quality. The impact of urban density (e.g., of roads, population, and housing) may also be studied.

Stream habitat quality is an ecologically valued endpoint. It is important to model the risk related to habitat impairments. Conceptual and logistic uncertainties may be major limitations in modeling. In the absence of research validations to the contrary, QHEI may be assumed to be a valid measure of stream habitat quality in Ohio. The interactions between habitat modifications (exposure) and habitat quality (endpoint) may be modeled empirically using watershed-level landscape data across multiple scales. The results of empirical modeling may potentially be plugged into models of biological quality in a broad regional framework. Regional modeling across multiple spatial scales in a hierarchical framework may improve ecological understanding and enable prediction of biotic impacts caused by changes in variables representing complex environmental and landscape factors.

Furious activity is no substitute for understanding. - H.H. Williams


CHAPTER 3

METHODOLOGY

There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies; the other is to make it so complicated that there are no obvious deficiencies. - C.A.R. Hoare

The review of literature, in the previous chapter, revealed that: 1) there is a need for regional empirical models with stream habitat quality as the response variable, and 2) the scale of the spatial unit has not been explicitly modeled in empirical studies. This chapter proposes a research methodology and describes the research hypotheses, statistical methods, and the major sources of uncertainties. Two GIS-based approaches to delineate the spatial unit of analysis, along with their limitations, are also discussed.

3.1 CONCEPTUAL MODEL

The theory of stream habitat quality, reviewed in Chapter 2, suggests that stream habitat components may be shaped by a complex interplay of anthropogenic and geomorphological attributes of the landscape (Frissell et al., 1986; Hill et al., 1991; Rankin, 1995, p. 181) and stream-specific factors (Vannote et al., 1980; Rosgen, 1984).

It is probably possible (e.g., by using a grid cell-based approach) to aggregate, in space and time, the influence of all the grid cells that contribute hydrologically to the habitat quality at a site on a stream network, as shown in Figure 3.1. Mathematically, the contribution of the $i$th spatial unit of influence to habitat quality at time $t$, $h_{i,t}$, may be represented as a function:30

\[ h_{i,t} = f(LU, LC, GM, d, h_{k,t}, h_{i,t-1}) \tag{3.1} \]

where $LU$ is a vector of land use factors (e.g., agriculture, residential); $LC$ is the vector of land cover factors (e.g., forest);31 $GM$ is the vector of geomorphology factors (e.g., watershed valley slope) in the $i$th unit; $f$ is the net impact of landscape factors; $d$ is the flow length (e.g., the network distance along the path of flow from the unit cell to the stream habitat sampling site); $h_{k,t}$ is the current habitat quality of the neighboring cells; and $h_{i,t-1}$ is the past impact of the unit cell. Both functions, for landscape and stream-specific factors, depend on flow length.

Then, the overall impact of the contributing landscape on stream habitat quality may be summarized, over space and time, as:

\[ H_t = \sum_i h_{i,t} + g(SS) \tag{3.2} \]

where $H_t$ is the quality of stream habitat (e.g., QHEI score) at time $t$, measured at the outlet of the watershed; and $g(SS)$ is the net impact of stream-specific factors (e.g., sinuosity, flow). Note that only the influence of landscape factors within the hydrologically contributing surface area for the sampling location is considered here.
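Equations 3.1 and 3.2 may be easier to follow with a small numerical sketch. The Python fragment below is only an illustration under stated assumptions: the exponential damping of a cell's influence with flow length, the decay length of 1,609 m (about one mile), the stress scores, and all function names are hypothetical, and the neighborhood and past-period terms of Equation 3.1 are ignored.

```python
import math


def cell_contribution(stress, flow_length_m, decay_length_m=1609.0):
    """Hypothetical h_i: landscape stress of one cell, damped by flow length d.

    The exponential form is an assumption made for illustration; the text
    does not specify the form of f.
    """
    return stress * math.exp(-flow_length_m / decay_length_m)


def habitat_quality(cells, stream_effect):
    """H = sum_i h_i + g(SS), following the structure of Equation 3.2."""
    return sum(cell_contribution(s, d) for s, d in cells) + stream_effect


# (stress score, flow length in metres) for three made-up cells
cells = [(-2.0, 0.0), (-2.0, 1609.0), (1.0, 3218.0)]

# stream_effect stands in for g(SS); the value 60.0 is arbitrary
H = habitat_quality(cells, stream_effect=60.0)
print(round(H, 3))
```

In this sketch two cells with the same stress score contribute differently because the one farther upstream is damped more strongly, which is the role played by d in Equation 3.1.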

30 Note that the subscript i refers to the ith element in cross-sectional data, while subscript t points to the tth time-period in time-series data.

31 In this study, land use refers to surface features associated with human-oriented activities on the surface of earth (e.g., farmland or housing). Land cover is used to identify natural features (e.g., forests).

[Figure 3.1 diagram labels: stream; hydrologically contributing area for sampling site, of size s; instream attributes with impact g(SS); unit cell with impact f(LU, LC, GM); flow-length, d; habitat quality at sampling site, H]

Figure 3.1: A hypothetical (possibly, grid cell-based) approach for studying habitat quality. Note that spatial and temporal autocorrelation effects are not shown here.

By increasing the size of the spatial unit (e.g., from a grid cell to a larger contributing area), the habitat functions of Equations 3.1 and 3.2 may be further simplified and recombined as:

\[ H_t = f(LU, LC, GM, SS, s, H_k, H_{t-1}) \tag{3.3} \]

where $s$ is the size, or scale, of the contributing area; $H_k$ is the combined effect of all neighborhood cells; and $H_{t-1}$ is the habitat quality in the past time period. Note that $s$ is another manifestation of $d$ from Equation 3.1. The function may be further simplified, by ignoring past (e.g., temporal autocorrelation) and neighborhood (e.g., spatial autocorrelation) impacts, to:

\[ H_t = f(LU, LC, GM, SS, s) \tag{3.4} \]

A few observations may underscore the pragmatic value of simplifying the more complex overall function (e.g., Equations 3.1 and 3.2) in this manner for empirical modeling. First, incorporating network flow length for each cell over large geographic areas may be computationally prohibitive because of the inherently iterative processing in grid cell-based GIS analysis. Second, it is likely that the unavailability of time-series data for congruent locations may preclude an explicit study of the temporal dimension.

The literature indicates that the scales at which biological integrity and stream habitat respond to stressor factors are not fully known (Allan et al., 1997). However, there are some fundamental hints, which indicate that stream habitat quality may be considerably influenced by stressors at relatively small scales. For instance, sedimentation in rural watersheds is often limited to local scales (Marsh, 1997, p.229). Also, QHEI measurements are often site-specific indicators of the stream habitat (Rankin, 1995, p.196). In other words, stream segments may be found with habitat quality much better, or worse, than the average habitat quality of the whole stream. Based on these insights, and by controlling for scale, the model in Equation 3.4 is modifiable as:

\[ H_t = f(LU, LC, GM, SS), \quad s = \text{whole contributing area} \tag{3.5} \]

and

\[ H_t = f(LU, LC, GM, SS), \quad s = \text{local contributing area} \tag{3.6} \]

where all landscape factors (e.g., LU, LC, and GM) are measured for the whole, or a subset (e.g., the part more local to the sampling location) of the hydrologically contributing area for a habitat sampling location. This concept, of using a subset of a watershed, is discussed in detail in the section on spatial unit below. Note that at this conceptual level the form of the relationship (e.g., linear or non-linear) is not discussed.

Also note that the final equations are an over-simplification of the already simplified reality abstracted in Figure 3.1.
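The distinction between the whole and the local contributing area can be illustrated with a short sketch. The Python fragment below assumes, hypothetically, that each grid cell is described by a land use class and a flow length to the sampling site; the class names, the cell values, and the 1,609 m (about one mile) cutoff are made up for illustration and are not taken from the study database.

```python
from collections import Counter


def land_use_shares(cells, max_flow_length_m=None):
    """Share of each land-use class among cells within a flow-length limit.

    cells: iterable of (land_use_class, flow_length_m) pairs.
    max_flow_length_m: None keeps the whole contributing area (as in
    Equation 3.5); a number keeps only the local part (as in Equation 3.6).
    """
    kept = [lu for lu, d in cells
            if max_flow_length_m is None or d <= max_flow_length_m]
    if not kept:
        return {}
    counts = Counter(kept)
    return {lu: n / len(kept) for lu, n in counts.items()}


# Made-up cells: (land use class, flow length to the sampling site in metres)
cells = [("row_crop", 200.0), ("forest", 900.0), ("row_crop", 1500.0),
         ("urban", 4000.0), ("forest", 7500.0), ("forest", 9000.0)]

whole = land_use_shares(cells)                            # whole area
local = land_use_shares(cells, max_flow_length_m=1609.0)  # local subset

print(whole)
print(local)
```

With these made-up values, forest dominates the whole contributing area while row crops dominate the local subset, which is the kind of scale contrast that Equations 3.5 and 3.6 are meant to separate.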

3.2 STATISTICAL MODEL

Ecological processes (e.g., stream habitat structuring) are inherently inexact. In other words, they are not purely deterministic mathematical functions.32 According to Johnson and Gage (1997), experimental manipulation of data to test environmental hypotheses is unclear, expensive, and difficult. Consequently, most landscape-based studies have been based on empirical modeling using resource-intensive sampling (e.g., case studies reviewed in Chapter 2).

32 Deterministic equations are also known as functional or mathematical relationships. On the other hand, statistical models are also called econometric models.

Multivariate linear (Gordon and Majumder, 2000; Lammert and Allan, 1999) and logistic (Richards et al., 1997) regression have been used in ecological modeling. Structural equation modeling (e.g., linear equations), semivariograms (Cooper et al., 1998), principal components analysis (Gordon and Majumder, 2000; Richards and Host, 1994), cluster analysis (Gordon and Majumder, 2000), and fuzzy logic are some of the advanced techniques available for environmental analysis (Johnson and Gage, 1997).

According to Lewis-Beck (1993, p.5), linear regression may be justified as a starting point on account of its simplicity and general parsimony, in the absence of strong theory to support a non-linear relationship, or if the data do not suggest a clear alternative to a linear model. For these reasons, multivariate linear regression is the method selected for statistical modeling in this research.33 Given the general lack of empirical studies of regional habitat quality, this may be a reasonable approach for habitat modeling. Symbolically, after accounting for the scale and riparian zones:

\[ H^{S} = \beta_0 + (\beta_{LU} \cdot LU_R) + (\beta_{LC} \cdot LC_R) + (\beta_{GM} \cdot GM_R) + (\beta_{SS} \cdot SS) + u \tag{3.7} \]

where $H^{S}$ is the true habitat quality at a fixed location on a stream with a contributing area of scale $S$; $\beta_i$ is the true partial regression coefficient, or the strength of the impact of factor $i$ while simultaneously controlling for all the other stressor factors; subscript $R$ denotes whether the landscape factor is measured at the riparian scale or over the contributing area; and $u$ is the vector of missing, or unknown, factors in the model.34

Note that, by definition, the relationship is imperfect. It is subject to variability due to the random response of stream habitat quality, H, for fixed values of stressor factors (e.g., LU, LC, GM, SS) at any sampling site. Typically, a sample is randomly drawn from a population and the linear regression function is used to estimate the true average values of the corresponding parameters in the entire population of interest. We also assume that QHEI, as used in this study, is a true measure of H.35

33 According to Gujarati (1995, p.37), a regression model is linear if all parameters ($\beta$) appear with a power of 1 only (e.g., not as $\beta^2$ or $\beta_1 \beta_2$). Note that $Y = \beta_1 + \beta_2 X^2$, which is linear in parameters (not in variable $X$), is a linear regression model. The basic model in Equation 3.7 also presumes that the impact from each stressor factor is additive in the overall model. However, the model may also be studied for non-additive (e.g., multiplicative), or interaction, effects between the theoretically significant explanatory variables.

34 The u factor is also known as the stochastic (random) residual, disturbance, or error in the model.
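To make the estimation step behind Equation 3.7 concrete, the following Python sketch fits an ordinary least-squares model by solving the normal equations. It is a minimal illustration, not the procedure or software used in this research: the two predictors (percent row crop and percent forest), the data values, and the function names are hypothetical, and the synthetic response is generated without noise so that the known coefficients can be recovered.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + sum_j b_j x_j + u."""
    X = [[1.0] + list(r) for r in rows]              # prepend intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)                           # normal equations


# Made-up sites: (percent row crop, percent forest) in the contributing area
sites = [(10, 40), (30, 20), (50, 35), (70, 5), (20, 60), (60, 15)]

# Synthetic, noise-free habitat scores from assumed coefficients 70, -0.3, 0.2
scores = [70.0 - 0.3 * crop + 0.2 * forest for crop, forest in sites]

b0, b_crop, b_forest = ols(sites, scores)
print(round(b0, 6), round(b_crop, 6), round(b_forest, 6))
```

Each estimated slope here plays the role of a partial regression coefficient in Equation 3.7: the change in the response for a unit change in one factor while the other is held fixed.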

3.2.1 ASSUMPTIONS

Like any model, the linear regression model is built upon some conceptual and statistical assumptions. Conceptually, some scientists (Berry, 1993, p.356) caution that dynamic (e.g., time-series) inferences may be drawn from the partial regression coefficients in a cross-sectional model, only if there is sufficient reason to believe that the underlying process (e.g., which determines the stream habitat quality in this study) is unchanging across space (cross-unit invariance) and time (cross-time invariance).

Statistically, some basic assumptions are critical for a valid interpretation of the regression estimates and inferences mentioned above. The major assumptions, from Gujarati (1995, p.60), are: linearity in parameters, nonstochastic stressor values,36 zero mean value of error term for any given stressor value, homoscedasticity (equal variance for all observations) of the error term, no autocorrelation (correlation between error terms in neighboring units) between the error terms, no perfect multicollinearity (linear relationship among stressor variables), and correct specification (e.g., no missing variables, and correct functional form of the equation) of the model. Another assumption, that of normality of error terms, may often be relaxed for large samples (Berry, 1993, p.416). Many statistical and visual tests are available for inspecting the assumptions mentioned above (Berry and Feldman, 1993; Hair et al., 1998).

35 Note that H, probably measured by QHEI, is the estimate of the true (and unknown) stream habitat quality in the population (e.g., the study area), and u is the estimate of the true error term in the population. The mean of an unbiased estimator, over infinite random samples, is equal to the population parameter being estimated (Berry and Feldman, 1993, p.164).

36 Stochastic is synonymous with random. Nonstochastic refers to fixed values in repeated sampling.

Skewness, multicollinearity, and autocorrelation are some of the common statistical challenges in environmental datasets (Johnson and Gage, 1997). The potential violation of major assumptions will be discussed in its practical and relevant context with the discussion of results in Chapter 5.
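Two of the challenges named above, skewness and multicollinearity, can be screened with simple descriptive measures. The Python sketch below is illustrative only and is not the set of tests applied in this study: it computes moment-based skewness and, for the special case of a model with exactly two predictors, the variance inflation factor from the Pearson correlation. The variable names and data values are hypothetical.

```python
import math


def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors.

    In that special case VIF = 1 / (1 - r^2), with r the correlation
    between the two predictors.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up values: percent urban land and housing density at five sites
urban = [1, 2, 2, 3, 20]
housing = [2, 4, 5, 6, 41]

print(skewness(urban))                     # positive: long right tail
print(vif_two_predictors(urban, housing))  # large: near-collinear predictors
```

A strongly positive skewness may suggest a transformation of the variable, and a large variance inflation factor may suggest dropping or combining one of the two predictors; both decisions remain judgment calls.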

3.2.2 OUTLIERS

According to Gujarati (1995, p.39), the error term, u, may represent many conceptual uncertainties in the estimation process: incomplete theory, unavailability of data, intrinsic randomness in the environment, inaccurately measured variables (e.g., measurement error), and incorrect functional form of the relationship.37 For these reasons, a rigorous analysis of the residuals may be critical in regression analysis.

Generally, an outlier is an observation with an unusual estimate of the response variable (i.e., a large residual). Regression residuals may be analyzed to identify influential observations.38 Eventually, the model may be respecified, or influential observations may be deleted, but only after careful scrutiny of the potential underlying causes of the discrepancy (Fox, 1993, p.281). Hughes (1995, p.39) mentions the usefulness of examining residuals and patterns in biocriteria data, and points out outliers as one of the factors that may distort multivariate linear regression models for biological criteria. Johnson et al. (1997) use Cook’s distance to compare residuals, and the Shapiro-Wilk statistic to test the normality of residuals. Hair et al. (1998, p.236) recommend a more conservative approach: using multiple diagnostic measures (e.g., studentized residual, Mahalanobis distance, Cook’s distance) to identify influential observations. This is the approach used in this study.

37 Most of these uncertainties are also mentioned by Hunsaker et al. (1990) in the context of ecological risk assessment.

38 Hair et al. (1998, p.219) distinguish between residuals, outliers, and influential observations. Outliers may be extreme estimates of the dependent variable (residual) or unrepresentative independent variables. Influential observations may behave like hidden outliers and still influence the regression function.

Multiple residual diagnostics are employed in this research in order to find the

most conservative group of influential observations in the model.39 Studentized residual

for an observation is similar to standardized residual, except that the standard deviation is

calculated from regression estimates omitting that observation. Leverage points are

typically measured by hat values, which identify the cases that are substantially different

from the other cases on one or more explanatory variables. Mahalanobis distance is the

distance of a case from the mean values of the independent variables. Cook’s distance

measures a residual’s influence by considering the total change in all other residuals

when that residual is removed. COVRATIO measures the impact of an observation on the standard errors of the entire set of estimated regression coefficients. SDFFIT, like

Cook’s distance, is a measure of the overall fit. It evaluates the change in the predicted values when an observation is deleted. SDFBETA measures the relative effect of an observation on each coefficient when it is deleted from the estimation process.
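As a minimal illustration of three of the diagnostics described above, the following Python sketch computes hat values, externally studentized residuals, and Cook’s distance for a regression with a single explanatory variable. It is not the software used in this study; the function name and the single-predictor simplification are assumptions made to keep the example free of external libraries. With several explanatory variables, the leverage term would instead be the corresponding diagonal element of the hat matrix.

```python
import math


def influence_diagnostics(x, y):
    """Return (hat value, studentized residual, Cook's distance) per case.

    Illustrative only: simple linear regression y = a + b*x fitted by
    ordinary least squares. Needs at least four cases with distinct x values.
    """
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance, all cases

    results = []
    for xi, e in zip(x, residuals):
        # Leverage (hat value): how unusual the case is on the predictor.
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx
        # Residual variance re-estimated with this case omitted.
        s2_omitted = ((n - p) * s2 - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_omitted * (1.0 - h))
        # Cook's distance combines residual size and leverage.
        d = (e * e / (p * s2)) * h / (1.0 - h) ** 2
        results.append((h, t, d))
    return results
```

Cases that stand out on several of these measures at once would be the candidates for the careful scrutiny recommended by Fox (1993) before any deletion or respecification.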

3.3 RESEARCH HYPOTHESES

The overall model of Equation 3.7 is based on the hypotheses that: 1) the average stream habitat quality in the study region, as measured by QHEI, is linearly dependent on land use, land cover, topographic structure, and stream-specific factors, and 2) the relationship is characterized by the scale of the spatial unit.

39 The discussion on residual diagnostics is largely from Hair et al. (1998, p.218).

More specifically, agricultural and urban land uses are anticipated to have an adverse impact on the stream habitat quality. The strength of the impact from different agricultural practices (e.g., row crop, pasture), and intensity of urban development (e.g., low intensity or high intensity residential) is worth exploring. Additionally, higher values of road, population, and housing densities may have a negative impact on habitat. Note that population and housing densities are proxy variables with only an indirect relationship with habitat via increased imperviousness, surface runoff, and channelization. Road-stream intersections may exist as bridges or culverts. They may be associated with direct drainage discharges, as well as bank alterations to accommodate them, which may result in poorer habitat quality.

Forest cover may enhance the stream habitat quality by influencing the substrate quality and rate of surface runoff. Different types of forest cover (e.g., deciduous or evergreen) may be explored for their impact on habitat quality. Intuitively, the impact of forests is opposite to that of agricultural and urban factors.

The geomorphological attributes of the spatial unit can be measured in the form of underlying geology and watershed valley slope. Slopes have been used as an explanatory variable in some studies (Johnson et al., 1997; Richards et al., 1996), but their exact relationship with stream habitat is not clear.

Among stream-specific factors, channel straightening is one of the most common habitat alterations in Ohio (Rankin, 1995, p.191). The literature review suggests that higher sinuosity may be associated with richer habitat. Stream order signifies the topological position of the stream on the hierarchical network of streams. It is related to streamflow and scale as higher stream order is typically related to larger flows and drainage areas (Horton, 1957). Headwater streams (e.g., first-order) are generally more vulnerable to habitat and biological impacts because of typical low-flow conditions.

As the model in Equation 3.7 indicates, an important research agenda is to gain an insight into the transverse (e.g., riparian widths) and longitudinal (e.g., stream order, flowlength) scales at which the stressor factors affect the stream habitat. The impact of agricultural practices near the stream may be examined. Riparian forests may be associated with additional instream cover, which may help improve QHEI scores.

Residential use in riparian areas may have a multiplier effect on surface runoff and quality. Roads in stream buffer zones may increase surface runoff and sedimentation, which degrade the quality of substrate in the streambed. These stressors may also interact with surface slopes to improve or degrade the quality of stream habitat. The strength of association between land use and habitat quality may also vary depending upon the position of a stream (e.g., stream order) in the overall watershed hierarchy.

Headwater streams may exhibit reduced ability to recover from riparian stresses.

To test these hypotheses, data may be extracted and analyzed at different scales for riparian zones (e.g., varying buffer width) and contributing areas (e.g., varying size).

Owing to the exploratory nature of this research, as new patterns appear, relevant hypotheses will continue to be refined, developed, and tested during the exploratory and statistical analysis phases of the research.

The next critical question is to identify the logical spatial unit of analysis. This is

directly related to the issues of scale, size, and distance, because the data for stressor

factors must be extracted and analyzed at a theoretically significant scale of reference.

3.4 SPATIAL UNIT OF ANALYSIS

The development of the model above strongly suggests the fundamental

significance of the scale of the hydrologically contributing area for a habitat sampling site

in this research. According to Gujarati (1995, p.24):

When we include heterogeneous units in a statistical analysis, the [size or scale effect] must be taken into account so as not to mix apples with oranges (emphasis added).

Therefore, a hydrologically contributing area of user-specified scale is selected as

the spatial unit. As discussed in Chapter 2, a watershed has been commonly used as the

spatial unit in most stream biological quality studies. The delineation method and scale

of the spatial unit, however, vary. Some studies have used relatively large river basins

(Gordon and Majumder, 2000; Mattson and Godfrey, 1994). This may be appropriate for

the specific objectives of the study. For instance, the study by Gordon and Majumder

(2000) was designed for studying fish species and river basins were selected as

geographic units. According to Karr and Dudley (1981), fish may, typically, migrate

among stream reaches depending upon the diversity and condition of stream habitat.40 In

such cases, the contributing area downstream of the sampling sites may also contribute

towards the IBI score at a site. Lammert and Allan (1999) also note the mobility of fish

in their study.

40 According to their study of Black Creek in northeast Indiana, considerable movements were noticed for fish, especially in the disturbed habitats of the stream.

Johnston (1998, p.95) mentions pre-GIS times when basins were “visually [delineated using] the maximum elevation separating adjacent drainage basins on topographic maps”. Unique contributing areas, upstream of the sample point, have been manually digitized in many studies (Richards and Host, 1994; Johnson et al., 1997; Richards et al., 1996; Wang, 1997). Triangulated Irregular Networks (TINs) may also be used to delineate stream networks and watersheds. Johnston (1998) notes that delineation of contributing areas may be automated using GIS. Lammert and Allan (1999) use C-MAP software, while Wang and Yin (1997) use Arc/Info GIS to delineate contributing areas.41

The delineation of contributing areas is not a trivial issue. Manual delineation requires much expertise, labor, and time, and is subject to human errors and inconsistency. On the other hand, automatic delineation leaves the researcher largely at the mercy of the algorithm used in a GIS function in terms of the scale, or size, of the delineated polygon. The scale of contributing areas has been used as a constant in watershed-based ecological studies. This research attempts to address the scale issue in greater detail. See Figure 3.2 for a clearer understanding of the following discussion.

3.4.1 LOCALSHED

The hierarchical organization of stream habitat has been discussed in Chapter 2. Considering the theoretical background and the lack of regional models for habitat quality, it is decided that the local impacts of landscape components along the longitudinal scales of a stream network may be explored further. The fundamental significance of network distance in a hierarchy is theorized by Patten et al. (1976)42:

The basis of hierarchy is that most interactions decrease in strength with distance. Components in a network have their strongest interactions with nearby components, and typically behave as collections of localized subsystems with graded bond strengths at successive levels of organization.

41 The watershed delineation function in Arc/Info is based on the commonly used Deterministic-8 Node (D8) algorithm by O’Callaghan and Mark (1984), according to which flow from each cell moves to one and only one of the eight neighboring cells with the steepest descent in the flow direction matrix.

From a practical viewpoint, the size of the contributing area may also determine the amount of runoff and sedimentation reaching the outlet. A large contributing area typically has longer maximum upstream flow-length.43 It is intuitively hypothesized that the impact of sediment transport may be more pronounced in contributing areas with smaller upstream flow lengths. According to Marsh (1997, p.229), most of the sediment released from an erosion site in rural watersheds gets deposited in storage areas, or sinks, relatively close to the origin of transport within the local contributing area.44

Overall, a habitat modeling design involving very large upstream areas may be inconsistent with the idea that QHEI is a site-specific measure of habitat quality. According to Rosgen (1994), many morphological attributes of streams apply only to individual stream reaches (tens to thousands of meters), not to entire basins. Taking such theoretical and practical issues into account, a new approach is discussed for delineating scaled contributing areas.

The new approach used in this study focuses on extracting only a subset of the contributing area upstream of a sampling location. Such a localized subset may be called a local contributing area, or local watershed, or localshed, in this study because it delineates the area more local to the sampling location. It also differentiates the more immediate part of an individual watershed from the areas farther upland, which may have relatively weaker impact on the attributes measured at a sampling site. More precisely, it is the contributing area upstream of a location that is delimited by some criteria such as a maximum threshold area, a maximum upstream flow length, or up to an upstream location, say, another upstream outlet. It is important to note that it is a logically, not arbitrarily, delimited subset of the hydrologically contributing area for a location. From a practical viewpoint, two other issues arise for individual contributing areas: spatial independence and scale.

42 As quoted in Allen and Starr (1982, p. 168).

43 Maximum upstream flow-length is the maximum length of the path of flow from all possible locations in a watershed to its outlet. Note that the discussion on area here refers to the planimetric, or projected, area of the watershed, which may not be an accurate indicator of the actual surface area of the watershed, especially in regions of high relief.

44 According to the source, only about 10-20% of eroded soil is exported annually from a small rural basin.

3.4.1.1 LOCALSHED BASED ON SPATIAL INDEPENDENCE

As Figure 3.2 suggests, explicit spatial dependence may exist between individual watersheds when large downstream watersheds engulf smaller upstream watersheds, or when watersheds overlap over a considerable extent. This may result in some areas being analyzed multiple times for their effect on the habitat quality at the outlet. This issue of spatial independence may be dealt with by terminating an individual watershed at the upstream point where another individual watershed’s outlet exists. The downstream and upstream watersheds are, for the most part, independent of each other. Individual watersheds delineated in such manner are called independent localsheds in this study.
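The termination rule described above can be sketched in a few lines of Python. This is a hedged illustration rather than the GIS procedure used in this study: each individual watershed is assumed to be available as a set of raster cell identifiers, the function name is hypothetical, and an upstream site is recognized by its watershed being a proper subset of the downstream one.

```python
def independent_localsheds(watersheds):
    """Remove nested upstream watersheds from each individual watershed.

    watersheds: dict mapping a sampling-site id to the set of cell ids in
                its full individual watershed (hypothetical data layout).
    Returns a dict mapping each site id to its independent localshed.
    """
    localsheds = {}
    for site, cells in watersheds.items():
        local = set(cells)
        for other_site, other_cells in watersheds.items():
            # A watershed nested strictly inside this one belongs to an
            # upstream sampling site, so its cells are excluded here.
            if other_site != site and other_cells < cells:
                local -= other_cells
        localsheds[site] = local
    return localsheds
```

For example, if site B lies upstream of site A, the cells draining to B are analyzed only for B, and the localshed for A keeps the remainder.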

However, Figure 3.3 indicates that the size of watersheds delineated in this manner is dependent on the sampling strategy used for selecting biocriteria measurement sites.

Since there is no experimental control for size in this strategy, sparsely sampled locations would tend to generate watersheds of larger areas. To deal with the issue of variability in the size of individually delineated watersheds, another strategy for delineating localsheds is tested. It is based on generating localsheds of similar flow lengths, as discussed below.

Figure 3.2: Individual watersheds and independent localsheds.
[Figure omitted. Labels: stream; sampling sites A and B; individual watersheds for sites A and B; independent localshed for site A.]

Figure 3.3: Small- and medium-scale independent localsheds.
[Figure omitted. Panels: optimally spaced sampling sites; sparsely spaced sampling sites. Labels: stream; sampling sites A, B, and C; individual watersheds for sites A and B; small-scale individual watershed for site C; small-scale and medium-scale independent localsheds for site A.]

Figure 3.4: Algorithms for delineating custom-scale localsheds.
[Figure omitted. Panels: fixed radius method (clipping radius around sampling site A); fixed flow length method (scaled localshed with flow length = x); fixed area method (iteration 1 flow length = x1; iteration 2 flow length = x1 + (1 * x); iteration 3 flow length = x1 + (2 * x); scaled localshed after iteration 3). Labels: stream; sampling site A; individual watershed for site A.]

3.4.1.2 LOCALSHED BASED ON FLEXIBLE SCALE

As discussed in Chapter 2, the size of watersheds has been a subject of much

intrigue over the years. Karr and Dudley (1981) also note that effects based on stream

order may be distorted due to differences in the size of the upstream watershed.

Considering the issue of variation in size, subsets of naturally contributing areas upstream

of each QHEI sample point were extracted using GIS. Three different criteria were

attempted for truncating each watershed – a fixed radius, a fixed flow length criterion,

and a fixed area criterion. The fixed radius alternative involves clipping the localshed

with a circle whose center is at the sampling site. The fixed flow length option involves

the selection of only those contiguous areas from the localshed whose maximum flow

length is below a certain threshold. The scheme with a fixed area threshold uses an

iterative algorithm with the flow-length gradually increasing at each iteration until a

localshed of the desired area is obtained.45 All three methods are computationally intensive and involve GIS-based computer programming. See Figure 3.4 for a clearer understanding of these alternatives.

After experimenting with all strategies, the fixed flow-length option is selected because of its relative advantages in terms of simplicity and speed, although the fixed area strategy is the most accurate and also the most time-intensive. The fixed flow-length method is further modified with an additional parameter for a minimum area threshold, so that watersheds that are already very small, and may not need any further truncation, are not processed.
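The fixed flow-length and fixed area criteria can be sketched as follows. This is a minimal illustration under stated assumptions, not the GIS macro code used in this study: the flow length from every cell to the sampling site is assumed to be precomputed, cells are 30 m squares (900 sq. m), the function and parameter names are hypothetical, and the fixed area loop stops as soon as the area reaches the lower bound of the tolerance range.

```python
def fixed_flow_length_localshed(flow_length, max_length,
                                min_area=0.0, cell_area=900.0):
    """Select cells whose flow length to the sampling site is within max_length.

    flow_length: dict mapping cell id -> flow length (m) to the sampling site.
    min_area:    watersheds no larger than this (sq. m) are returned untouched.
    """
    if len(flow_length) * cell_area <= min_area:
        return set(flow_length)  # already small: no truncation needed
    # Flow length shrinks along every flow path toward the site, so the
    # selected cells remain hydrologically connected to the sampling site.
    return {cell for cell, dist in flow_length.items() if dist <= max_length}


def fixed_area_localshed(flow_length, target_area, tolerance,
                         start_length, step, cell_area=900.0, max_iter=1000):
    """Grow the flow-length threshold until the area is close to target_area.

    Returns (cells, flow-length threshold used). Iteration k uses a
    threshold of start_length + (k - 1) * step.
    """
    length = start_length
    cells = set()
    for _ in range(max_iter):
        cells = fixed_flow_length_localshed(flow_length, length,
                                            cell_area=cell_area)
        area = len(cells) * cell_area
        whole_watershed = len(cells) == len(flow_length)
        if area >= target_area - tolerance or whole_watershed:
            break
        length += step
    return cells, length
```

The repeated re-selection in the second function is one reason a fixed area criterion costs more computation than a single flow-length threshold.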

45 Since it is almost impossible to achieve precisely the same area value for each localshed, a tolerance is used. The area is evaluated against this range (threshold +/- tolerance) after each iteration.

This strategy affords some benefits for a robust statistical analysis. First, extracting small localsheds from larger watersheds produces a fine-grained, statistically robust sample of spatial units, which may increase our confidence in statistical inferences.46 Hair et al. (1998, p.605) recommend a larger sample size (e.g., 200) in studies where misspecification of the model or non-normality of the data is expected.47 Second, the larger sample size is likely to have a better spatial distribution over the study area. Smaller watersheds may otherwise be clustered only around areas of low relief. Thus, this strategy can help test the hypotheses across the whole ECBP ecoregion. Note that the sample size of small watersheds in the spatially independent strategy is only 125. Using the flexible scale-based strategy, the number of watersheds in the sample increases to more than 500. Considering these issues, it is hoped that the statistical models discussed later will derive much of their explanatory power from this simplified approach for delineating watersheds of flexible scale.

46 The standard error of an estimated coefficient tends to decrease with increasing sample size, which increases its chances of attaining statistical significance (Berry and Feldman, 1993, p.164).

47 According to the text (p.166), very small samples may cause overfitting and lack of generalizability, while very large samples may induce spurious statistical significance. It suggests using a sample size at least 15 times (50 times for step-wise procedures) the number of independent variables in the model.

Figure 3.5: Overlay of data layers with localshed and riparian boundaries.48
[Figure omitted. Labels: spatial unit; stream; sampling site; data layer, say, land cover; N-meter riparian zone; clipped data layer in spatial unit; clipped data layer in riparian zone.]

48 Three riparian zones of different widths (30 m, 90 m, and 500 m) are explored.

3.5 SPATIAL OVERLAY

Under both the localshed-based strategies, geographic data are generated for all the relevant variables for each of the sampling stations, and managed in a GIS. Data are collected from different sources and processed in a GIS to conform to the same coordinate system. Both riparian- and localshed-scale data are generated by clipping the geographic data layer with riparian and localshed boundaries, as shown in Figure 3.5.

3.5.1 AREA-PRORATED ASSIGNMENT OF CENSUS DATA

An overlay of census and area unit boundaries may result in partial overlap of census blocks. For an analysis of hydrologic influences over small areas, assigning all of the population in a partially overlapping census block to a spatial unit may not be appropriate. Therefore, care has been taken to assign proportionally allocated census data to the intersecting localsheds assuming that population is homogeneously distributed in each census block. With the objective of finding an accurate estimate of population within each spatial unit, an algorithm is automated in GIS to assign only area-prorated population to the intersecting spatial unit. Figure 3.6 graphically explains the algorithm that partially resolves these issues. In another approach, only those blocks may be included whose centroid falls within the boundary of the intersecting localshed.

Note that this algorithm is not completely accurate if the population within a partially overlapped census block is not homogeneously distributed. However, since the block-level population count is the most detailed level of population count available, this may be a fairly reasonable approach.
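The area-prorated assignment reduces to a short calculation once the overlap areas are known. The following Python sketch assumes that the intersection of census blocks with a spatial unit has already been computed in a GIS; the function name and the tuple layout are hypothetical, and population is assumed to be uniformly distributed within each block, as stated above.

```python
def prorated_population(overlaps):
    """Sum the area-prorated population of census blocks in one spatial unit.

    overlaps: iterable of (block_population, block_area, overlap_area),
              with both areas in the same units.
    """
    total = 0.0
    for population, block_area, overlap_area in overlaps:
        if block_area <= 0:
            continue  # skip degenerate block polygons
        # Share of the block inside the spatial unit, clamped to [0, 1]
        # to guard against small geometric rounding errors.
        share = min(max(overlap_area / block_area, 0.0), 1.0)
        total += population * share
    return total
```

A block of 1,000 people with 15% of its area inside the spatial unit contributes 150 people, and a block lying wholly inside the unit contributes its full count.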

Figure 3.6: Algorithm for calculating area-prorated population for partially overlapping area units.
[Figure omitted. Worked example shown: a census block with population = 1,000 has 15% of its area inside the spatial unit, so population of block in spatial unit = 0.15 * 1,000 = 150.]

3.6 UNCERTAINTY

From a statistical perspective, uncertainties may arise from the violation of any

assumptions of the model discussed earlier. For instance, QHEI sites are not sampled

randomly by Ohio EPA. However, relatively large and spatially stratified samples may

counter some of the problems posed by selective sampling (Rankin, 1995, p.131).49 The discussion of outliers also sheds light on the potential sources of error in the process.

According to Bailey and Gatrell (1995, p.34), it may not always be possible in a geographical study to separate the spatial behavior due to the spatial dependence of the observations in a homogeneous environment from that due to spatial independence in a heterogeneous environment.50 QHEI is measured as a site-specific attribute of the stream habitat (Rankin, 1995, p.196). In this study, we may expect some variation in QHEI due to the intrinsic heterogeneity in the environment and some due to the spatial dependence between stream habitat quality measurements.51 Whether QHEI exhibits strong spatial autocorrelation at smaller scales remains to be examined. Hydrological processes may also be subject to temporal variations (e.g., Saginaw Basin case studies by Johnson et al. (1997) and Richards et al. (1997), in Chapter 2). Autocorrelation issues are not explicitly analyzed in this study.

According to Allen and Starr (1982, p.6), models with discrete levels of hierarchy may incorporate scale as a continuously varying function that integrates various levels, as well as the interactions between them. No such integration function is attempted here because data are not available at all the theoretically relevant scales. The landscape factors may also be correlated across multiple scales (Poff, 1997). In this study, upland locations in the contributing area (e.g., beyond the riparian zone) have not been explicitly modeled in the equation.

49 For more details on biocriteria sampling issues, see Larsen (1995). As a practical analogy, see the footnote in the next chapter on the stratified sampling used for accuracy assessment of land use data.

50 The authors illustrate these concepts with a pithy example. Imagine throwing unmagnetized iron filings, randomly, on a paper under which magnets have been placed. The clustered result will display clustering due to first-order spatial effects or global trends from spatial independence in a heterogeneous environment. Then imagine throwing magnetized iron filings, randomly, on a sheet of paper without any magnets. The clustered result will display clustering due to second-order spatial effects or local trends from spatial dependence in a homogeneous environment. Now imagine having to identify the separate effects, as in most real-world cases, when magnetized filings are present along with magnets under the paper!

51 Heterogeneity, in simple terms, is the variability of the measured phenomenon. It can be indicated by the range of standard deviation, or more crudely, simply by the range of data (Gujarati, 1995, p.359). A related term is heteroscedasticity, which results from any systematic variation in the data (Berry, 1993, p.407).

3.7 DISCUSSION

The linear regression model proposed for this study is an over-simplification of the complicated and dynamic processes influencing stream habitat. However, given the relative lack of concrete theory, it is a reasonable first-step in regional stream habitat modeling. It is also critical to gain an insight into the potential violations of the assumptions underlying the linear regression model by analyzing the regression residuals.

The cross-sectional research design, proposed in Equation 3.7, is more likely to pose problems of heterogeneity than time-series data (Hardy, 1993, p.73; Gujarati, 1995, p.359). However, controlling for the scale of the spatial unit may help in reducing the heterogeneity to some extent.

Small scale spatial units may have positive implications on account of the modifiable areal unit problem.52 The issue about the validity of applying results obtained from an analysis at a coarser scale (e.g., states) to spatial entities (e.g., townships) at finer scales may be moot because the analysis is performed at relatively fine scales in the first place. The success of any watershed-based ecological risk assessment model may depend as much on its political feasibility as its technical sophistication (Jones and Gordon, 2000; Allan et al., 1997). The study by Allan et al. (1997) emphasizes the need for catchment-scale management, noting that most current land use decisions are taken at the village, township, or city scale with little consideration of upstream and downstream events. Generally, larger geographic units of study may provide a more regional and holistic approach for water quality management, but applications based on smaller geographic units of analysis may be more politically feasible. Consequently, the contributing area-based approaches also increase the applicability of any results in the political context.

52 For a brief review of the modifiable areal unit problem, see Bailey and Gatrell (1995, p.37).

Large amounts of data may be needed across multiple spatial scales (e.g., those mentioned in Figure 2.5 in Chapter 2) to enable empirical modeling for large regions. This is not a trivial demand. Also, high-resolution data may be required to analyze small-scale spatial units. There is only a limited amount of habitat sampling data available in the US. Data on detailed geomorphology and stream-specific factors are also generally not readily available at the time of this writing.

Power is not revealed by striking hard or often, but by striking true. - Honore de Balzac

CHAPTER 4

DATA

Power corrupts. Absolute power is kind of neat. - John Lehman, U.S. Secretary of the Navy

This chapter discusses the design of the digital database for the research agenda

described in the previous chapter. The data are also explored using descriptive statistics

and correlation analysis. Some interesting features of the design have been compared

with some other relevant case studies, which are reviewed in Chapter 2 and are collocated

with this study in midwestern US.

Database development has benefited from the study area being in Ohio, where rich digital data is relatively easily available for many environmental data layers, as compared to many other states in the United States.53 The database design also gained valuable insights from a pilot study for a subset of the study area.54 The pilot project underscored the significance of smaller units, delineation of watersheds, accuracy of DEM, denser stream network, and higher resolution land use/cover data. The theoretical and methodological reasons for the inclusion of these layers of study have been described in the previous chapters. They are described in more detail here.

53 Ohio has been one of the foremost states in the country in extensive sampling, development and implementation of sophisticated bioassessment programs (Southerland and Stribling, 1995, p.85; Reash, 1995, p.153; and Simon and Lyons, 1995, p.261). It was also one of the first to use ecoregions (Omernik, 1995, p.54), and adopt biocriteria for tiered aquatic life use designations in its water quality standards regulations. The author also observed through personal experience that Ohio was one of the early states to convert and make digital data on elevation, hydrography, transportation, and other layers from the United States Geological Survey (USGS) available virtually free to the public.

54 Two seventh-order watersheds, Big Darby Creek and Great Miami River, were studied in the context of stream habitat quality modeling in the spring of 1999.

4.1 STUDY AREA - ECOREGION

An ecoregion is selected as the study area for the theoretical reasons underscored in Chapter 2.55 Based on the first approximation of ecoregions defined by Omernik (1987), five of the 76 national ecoregions cover the state of Ohio: Eastern Corn Belt Plains (ECBP), Erie/Ontario Lake Plain, Huron/Erie Lake Plain, Interior Plateau, and Western Allegheny Plateau. The complete ECBP ecoregion is spread across the political boundaries of Ohio, Indiana, and Michigan.

A part of the ECBP ecoregion within Ohio has been selected as the geographical study area for this research, as shown in Figure 4.1. This has been done for several practical reasons. First, biocriteria data are readily available for the ECBP ecoregion from the Ohio Environmental Protection Agency (Ohio EPA) for the period 1989-1995. Second, the author has worked with the same ecoregion for a research project (see Gordon et al., 2001) funded by the United States Environmental Protection Agency and the National Science Foundation. Finally, only the part of ECBP within Ohio is selected because of reasons related to current computer data storage and memory capacities.56

55 See ODNR (1998) for more details on the ecoregions and subecoregions of Ohio.

56 For a discussion of the computer storage and memory issues, read the section on DEM later.

Figure 4.1: Study area and ECBP subecoregions

Figure 4.2: Counties and USGS 7.5 min. Quadrangles in study area.

Figure 4.3: QHEI sampling sites, topography, and major cities.

The ECBP ecoregion is a Level III ecoregion, which covers a total of 31,706 sq. miles.57 The study area, the part of ECBP within Ohio, covers an area of 15,224 sq. miles.58 The ecoregion is predominantly a rolling till plain, with elevation ranging from 400 to 1550 feet. Mean annual rainfall is between 34 and 45 inches. Major rivers are the Sandusky River, Great Miami River, Little Miami River, Mad River, Paint River, and Scioto River. Most streams have low to moderate gradients. Most of the areas are local end moraines with extensive glacial deposits. Major soil orders are alfisols and mollisols.

Historically, the ecoregion had natural tree cover in the form of beech and elm-ash swamp forests. It currently supports generally extensive corn, soybean, and livestock production. Columbus and Dayton are major urban areas in the ecoregion.

The ECBP ecoregion is further divided into six Level IV subecoregions, as shown in Figure 4.1. The northern one-third of the area is the Clayey, High Lime Till Plains subecoregion. Its soil is generally less productive and needs artificial drainage. Streams are typically turbid in this subecoregion. The central-east and central-west parts consist of the Loamy, High Lime Till Plains subecoregion. The urban-industrial areas of both Columbus and Dayton fall within this subecoregion. The Darby Plains subecoregion is known for generally high quality of land and water. The Big Darby Creek is a State and National Scenic River with a high level of fish diversity. The average farm size and yields per acre for corn, soybean, and wheat in this subecoregion are higher than many other subecoregions in Ohio. The Big Darby Creek watershed is also the subject of one of the five national pilot watershed-based ecological risk assessments designed to improve the decision-making process for water resource management (USEPA, 1997). The Mad River Interlobate Area subecoregion has high-yielding aquifers that feed abundant ground water into its perennial, high volume streams. Riparian woodlands are common in this subecoregion.

57 There are 99 Level III ecoregions in the continental United States (Omernik, 1995). Ohio has five of these ecoregions (and a negligible area of another).

58 When the ECBP ecoregion is clipped with the boundary of Ohio, two polygons result, one of which is only about 376 sq. miles (2.5% of the study area) at the northwest corner of the state. This area has been ignored in this study for convenience.

As shown in Figure 4.2, data in the study area is collected for 244 USGS 7.5 minute quadrangles across 36 counties in Ohio.59 The data for each USGS data layer is assembled by collecting data for each quadrangle in the study area, and then mosaicking these adjacent quadrangles into a contiguous dataset using GIS.

4.2 QHEI SAMPLES

QHEI scores are extracted from the Environmental Council of the States (ECOS) database for Ohio, which is collected by the Ohio EPA. There are a total of 1,208 records of QHEI measurements for the period 1988-1995 in the original database. The final QHEI sample contains 634 measurements in the study area for the study period 1989-1995.60 The geographical spread of the QHEI sampling locations is shown in Figure 4.3.

Note that most sites are away from major cities such as Columbus and Dayton. While most streams sampled for habitat in this dataset drain towards the south or southeast, some streams near the northeast part of the study area drain towards the Great Lakes.

Some sampling locations are revisited in the same year. Preliminary exploration of the data shows that multiple samples in the same year for a given site are almost identical; therefore, only the latest QHEI value for any given year for each sampling site has been selected. Only the QHEI score for the latest year is used in this study for sites with sampling in multiple years within the period 1989-1995.61 Note that this is in contrast to the study by Gordon and Majumder (2000), which treats repeated samples at the same site over multiple time periods as independent samples. Temporal issues with this dataset are also discussed later in this Chapter, in the section on land use/cover.

59 Technically, the ECBP study area intersects 47 counties and 309 USGS 7.5 minute quadrangles. For reasons of computing resources and time, about 80% of the ECBP study area is used.
60 As many as 54 streams have only one sampling location in the data. 702 sites met the attribute-based search criteria, but 68 sites fall outside the geographic area bounded by the DEM quadrangles.
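The selection rule described above (one record per site, taken from the latest sample within 1989-1995) can be sketched as follows. This is a minimal illustration only: the record layout and the field names `site_id`, `date`, and `qhei` are hypothetical and are not taken from the ECOS database.

```python
from datetime import date

def latest_per_site(records, first_year=1989, last_year=1995):
    """Keep one record per site: the most recent sample in the study period.

    Taking the latest date overall also covers the within-year rule,
    because the latest sample of the latest year is the latest sample.
    """
    latest = {}
    for rec in records:
        if not first_year <= rec["date"].year <= last_year:
            continue  # outside the study period
        site = rec["site_id"]
        if site not in latest or rec["date"] > latest[site]["date"]:
            latest[site] = rec
    return latest

# Hypothetical records for two sites.
records = [
    {"site_id": "A", "date": date(1988, 7, 1), "qhei": 55.0},
    {"site_id": "A", "date": date(1991, 6, 1), "qhei": 62.0},
    {"site_id": "A", "date": date(1991, 9, 15), "qhei": 63.0},
    {"site_id": "A", "date": date(1994, 7, 20), "qhei": 70.5},
    {"site_id": "B", "date": date(1989, 8, 3), "qhei": 41.0},
]
selected = latest_per_site(records)
```

With these made-up records, site A keeps its 1994 sample and site B keeps its only sample.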

Common errors present in the raw data include: different river-mile values for sites with identical latitude-longitude values; latitude and longitude values of ‘0’; and erroneous latitude or longitude values.

The sampling sites are converted into geographic point entities using the latitude and longitude values stored for each sampling record in the ECOS database. These points are then snapped to the nearest streams from the streams database for geographic consistency, and to facilitate many GIS hydrology and network functions.62 The attributes of each site (e.g., date, QHEI score) in the database are imported into a GIS separately, and finally, assigned to their respective sampling site.
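The snapping step can be illustrated with a small sketch. It assumes projected coordinates in meters and streams stored as lists of vertices; the function names are hypothetical, and the GIS snapping routine actually used in this research may differ in detail. The 200 m tolerance is the threshold reported for this step.

```python
import math

def nearest_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to point p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a  # degenerate segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return (ax + t * dx, ay + t * dy)

def snap_to_streams(p, streams, tolerance_m=200.0):
    """Snap p to the nearest stream location, or return None if too far."""
    best, best_d = None, float("inf")
    for line in streams:
        for a, b in zip(line, line[1:]):
            q = nearest_on_segment(p, a, b)
            d = math.hypot(q[0] - p[0], q[1] - p[1])
            if d < best_d:
                best, best_d = q, d
    return best if best_d <= tolerance_m else None

streams = [[(0.0, 0.0), (1000.0, 0.0)]]
snapped = snap_to_streams((500.0, 150.0), streams)
too_far = snap_to_streams((500.0, 250.0), streams)
```

Points farther than the tolerance from every stream are left unsnapped (returned as `None`) so that they can be reviewed manually.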

Narrative rating    QHEI score    Frequency    Percentage
Very Poor           < 30           28           4
Poor                30 – 45        67          11
Fair                45 – 60       122          19
Good                60 – 75       219          35
Very Good           75 – 90       171          27
Excellent           > 90           27           4

Table 4.1: Narrative ratings and frequency distribution of QHEI scores.63
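The narrative ratings of Table 4.1 can be expressed as a simple lookup. The class boundaries in the table overlap (e.g., 45 appears in both "Poor" and "Fair"), so this sketch assumes that each class includes its lower bound; that convention is an assumption, not something stated in the table.

```python
def narrative_rating(qhei):
    """Map a QHEI score to its narrative rating (lower bounds inclusive)."""
    if qhei < 30:
        return "Very Poor"
    if qhei < 45:
        return "Poor"
    if qhei < 60:
        return "Fair"
    if qhei < 75:
        return "Good"
    if qhei <= 90:
        return "Very Good"
    return "Excellent"

ratings = [narrative_rating(s) for s in (28, 36, 44, 65, 74, 95)]
```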

61 The data extraction query to select records is: unique, non-zero values for latitude, longitude, and year, in the ECBP ecoregion. Two geographic outliers (probably data entry errors) are deleted from the sample.
62 The threshold for snapping points to streams is 200 m.
63 The numeric ratings are from Rankin (1999). Personal communication.

Figures 4.4 and 4.5 show the distribution of QHEI scores across all the sites. The data is somewhat negatively skewed with seven cases scoring below 20, and 95 cases with “poor” or a “very poor” narrative rating for QHEI (see Table 4.1).64 The mean QHEI score across the whole ECBP study area is about 65 (i.e., “good” quality).

As shown in Figure 4.6, visually, there is some clustering of poor QHEI scores around the Columbus urban area. However, the clustering is not evident around the Dayton urban area. The figure shows QHEI scores at locations in the Big Darby Creek watershed to the west of the Columbus metropolitan area, and in the Middle Great Miami River basin to the west of Dayton. It may be noted here that the areas of known impacts (e.g., Hellbranch Run near Columbus) are more densely sampled.

Figure 4.6 also confirms that the QHEI score may vary dramatically within short distances. For instance, at locations both upstream and downstream of Dayton, QHEI scores drop from 74 to 36, and from 75 to 44, within a fraction of a mile and in a few months in 1995. Such “pockets” or “islands” of QHEI values lend weight to the theory that areas local to the sampling site may have a significant contribution to the site- specific stream habitat quality, as measured by QHEI.

The outliers from QHEI scores, from the boxplot of Figure 4.5, are mapped in Figure 4.7. The distribution, interestingly, clusters along the topographic/hydrologic ridge line separating the north-draining (i.e., towards the Great Lakes) areas from the south-draining (i.e., towards the Ohio River) areas. This indicates that headwater streams may correlate with “poor” habitat.

64 The kurtosis value is –0.170 with a standard error of 0.194, which indicates a slight, but probably not significant, departure from normality.

Figure 4.4: Histogram and normal plot for QHEI scores in the study area.


Figure 4.5: Boxplot for QHEI scores in the study area.


Figure 4.6: Variation in QHEI scores near the Columbus and Dayton urban areas.


Figure 4.7: Spatial distribution of QHEI outliers with scores below 25.

4.3 DIGITAL ELEVATION MODEL (DEM)

High-resolution elevation data is critical for accurate delineation of hydrologically contributing areas. It is also required to perform drainage-related operations (e.g., calculate slope). This research derives elevation data from the USGS 7.5 minute quadrangles of 1:24,000 scale hypsography (contours) data in Digital Line Graph (DLG) format.65 The overall geographic extent of these quadrangles is shown in Figures 4.2 and 4.3. An elaborate GIS-intensive automation process is used to process, analyze, and merge the vector data (elevation contours) for all the quadrangles in the study area into raster (ArcInfo GRID) format. Although a 10 m x 10 m cell size might have been better for identifying small-scale impacts, computing limitations dictate the selection of a 30 m x 30 m grid.66 This resolution compares well with other habitat-related studies. For instance, both Richards et al. (1996) and Wang and Yin (1997) used 1:250,000 DEM from the USGS 1-degree dataset.

The stream channel network is used to burn existing naturally flooded areas into the rasterized terrain in order to obtain hydrologically correct terrain consistent with the real world.67 Introduced by Jenson and Domingue (1988) and further enhanced by Hutchinson (1989), stream-burning algorithms are incorporated into current ArcInfo GIS hydrology functions. The concept is particularly useful in areas of low-relief (e.g., most of the ECBP study area), which are generally known to present problems for drainage-based analyses (Hutchinson and Dowling, 1991).

65 The horizontal resolution is about +/- 10 m, and vertical resolution is about 7-15 m, for the contours.
66 More than 1.6 GB of available computer memory would have been required to generate the DEM for the whole study area in a single operation. Also, the process could take up to 2 days on a 6-processor UNIX machine! Up to 2.5 GB of computer storage space was used for storing almost all the GIS data, of which approximately 400 MB was used by the whole DEM. It should be noted that if a 30 m grid takes, say, 100 MB of disk space, then a 10 m grid of the same area would require about 900 MB, instead of 300 MB, of disk space, because the number of cells grows with the square of the resolution ratio.
67 To be precise, lakes or ponds, and reservoirs are also used for ‘burning’ the hydrology. Their USGS minor codes are 421 and 101, respectively.
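The idea behind stream burning can be illustrated with a deliberately simplified sketch: elevations of cells that lie on the rasterized stream network are lowered so that flow routing is forced to follow the mapped channels. This is not the drainage-enforcement algorithm used in this research; the fixed `drop_m` value and the function name are hypothetical and chosen only for illustration.

```python
def burn_streams(dem, stream_mask, drop_m=5.0):
    """Lower every DEM cell flagged as stream by a fixed depth.

    dem         -- 2-D list of elevations in meters
    stream_mask -- 2-D list of booleans with the same shape as dem
    drop_m      -- illustrative burn depth (an assumption, not a source value)
    """
    return [
        [z - drop_m if is_stream else z for z, is_stream in zip(z_row, s_row)]
        for z_row, s_row in zip(dem, stream_mask)
    ]

dem = [[100.0, 101.0], [102.0, 103.0]]
mask = [[False, True], [False, False]]
burned = burn_streams(dem, mask)
```

In flat terrain, even a small imposed drop of this kind makes the mapped channel the locally lowest path, which is why the quality of the stream data matters so much for the result.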

To burn streams correctly, the streams data must be positionally and topologically accurate; otherwise more errors may be introduced into the dataset than resolved. Incorrect locations may translate into incorrectly placed valleys in the DEM. Incorrect topology may result in circular loops of streamflows. This issue can be resolved by careful editing of the stream network topology so that it is fully dendritic (Saunders, 1999). Streams are manually processed for correct location relative to existing contours.68 Each stream is checked, and possibly edited, for connectivity and topology in the network so that all streams flow downstream only.69

The DEM is assembled for each of six subsets of the study area first to circumvent computer processing limitations. Then, the final seamless DEM is composed by merging these multiple DEM components. This approach is also used by Wang and Yin (1997) in their research in a southwest Ohio watershed. The algorithm itself is not perfect across regional boundaries, especially when multiple regions are merged together to have a seamless terrain. The merge process involves interpolation of elevation values to streamline any abrupt changes in elevation across regional components. This may compromise the accuracy of the DEM. It should be clear from this discussion that errors from multiple sources may affect the DEM generation process. Errors may creep in due to incorrect elevation values in the original dataset70, inaccurate edge matching of contours across quadrangle boundaries, incorrectly positioned streams, incorrectly connected streams, and common human errors of omission and commission. DEM processing is extremely time-, disk space-, and memory-intensive.71

68 As a general rule, the ‘V’ of a contour line points towards the upstream direction and streams flow in the opposite direction. All braided and double-line streams were streamlined to have single-line representation. Therefore, many arcs were deleted or replaced by visual estimates of their center-line.
69 Incorrectly pointing streams were flipped to point in the correct downstream direction. Proximal, but separate arcs, suggesting an inconsistent break in flow, were snapped together into a continuous arc. This was done for all the 30,226 streams across more than 250 quadrangles in the study area.

It should be noted that no attempt has been made to derive streams from the DEM network because of insufficient knowledge about the threshold to define the regional stream channel network. It is intuitively estimated that the DEM-derived stream network may be of higher density than USGS 7.5-min. hydrology data, which may pose problems of data management and processing. Such data may also have many spurious or intermittent stream channels as artifacts of the process rather than real stream channels.

4.3.1 SLOPE

Some studies, mentioned in the literature review, underscored the theoretical significance of valley slopes in water quality studies. According to Mecklenburg (1998), slopes and bank erosion are leading factors affecting the loading of sediments into stream channels. Johnson et al. (1997) have suggested the use of riparian slopes, and Poff (1997) has suggested using reach slope as a predictor variable in environmental studies.

Slopes have been derived from the DEM using raster GIS functions. It should be noted that the value derived for a particular area in this manner is the valley slope as opposed to the overall slope of the whole spatial unit.72 It is also different from the channel gradient, which is the longitudinal slope of a stream channel. Johnson et al. (1997) use valley slope in their study of stream ecosystems.

70 Some contours had a value of zero in the original dataset.
71 The heart of the process is the TOPOGRID module of algorithms in Arc/Info GIS. It may require up to 4 times the base data size as memory space. A DEM of 30 m cells for the study area occupies about 400 MB of disk space, therefore, up to 1.6 GB of computer memory may be needed.

Slope can be expressed as an angle or as a percentage. This study uses percentage as the unit for measuring and comparing slopes across watersheds because it is more commonly used in environmental studies. The slopes in the watersheds are grouped into seven categories, as shown in Table 4.2. The table indicates that about half of the study area is very flat (0% - 1%), and only about 7% of the area has more than 5% incline. This seems to conform to the actual topography in most of Ohio.

Slope category    Percentage of study area
< 1%              50
1% - 2%           22
2% - 3%           11
3% - 4%            6
4% - 5%            4
5% - 6%            5
> 6%               2

Table 4.2: Summary of slopes derived from the DEM.
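For illustration, percent slope at a grid cell and its assignment to the categories of Table 4.2 can be sketched as below. The central-difference estimate from the four neighboring cells is an assumption made for this sketch; the raster GIS slope function used in this research may use a different neighborhood. Category boundaries are treated as inclusive at their lower end, which is also an assumption.

```python
import math

CELL_M = 30.0  # DEM cell size used in this study
BREAKS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
LABELS = ["< 1%", "1% - 2%", "2% - 3%", "3% - 4%", "4% - 5%", "5% - 6%", "> 6%"]

def cell_slope_percent(z_west, z_east, z_south, z_north, cell=CELL_M):
    """Percent slope (100 * rise / run) from central differences."""
    dz_dx = (z_east - z_west) / (2.0 * cell)
    dz_dy = (z_north - z_south) / (2.0 * cell)
    return 100.0 * math.hypot(dz_dx, dz_dy)

def percent_to_degrees(pct):
    """Convert a percent slope to an angle in degrees."""
    return math.degrees(math.atan(pct / 100.0))

def slope_category(pct):
    """Return the Table 4.2 category label for a percent slope."""
    for upper, label in zip(BREAKS, LABELS):
        if pct < upper:
            return label
    return LABELS[-1]

pct = cell_slope_percent(100.0, 101.5, 100.0, 100.0)  # 1.5 m rise over 60 m
category = slope_category(pct)
```

In this example the rise of 1.5 m over a 60 m run gives a slope of about 2.5%, which falls in the "2% - 3%" category.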

4.4 STREAM NETWORK

A detailed stream network is desirable for this research because it captures the geographic distribution of streams and their riparian areas across the whole study area. Therefore, after careful evaluation of available datasets and methodologies, streams have been obtained from the USGS 7.5 minute quadrangles of 1:24,000 scale hydrography data in Digital Line Graph (DLG) format.

72 Valley slope is the slope of the surface in a particular part, such as a grid cell, of the watershed’s valley. A measure of the overall slope is relief ratio, which is the ratio of the total relief (difference in elevation of the watershed’s outlet and summit) to the longest length of the watershed.

The streams data compares well with other studies reviewed in Chapter 2. Johnson et al. (1997), Richards et al. (1996), and Richards and Host (1994) used 1:100,000 USGS DLG hydrography files in their study of water chemistry in the midwestern US. The 1:24,000 USGS dataset has more detailed streams than in both the Planning and Engineering Data Management System for Ohio (PEMSO) GIS database from the Ohio EPA73, and the Reach File (RF3) streams from USEPA, which have been derived from 1:100,000 scale USGS DLG data.

The selected stream channel network from the USGS hydrology layer consists of streams, double-line streams, and ditches or canals.74 Lanfear (1990) stipulates some conditions that must be met before stream orders can be determined correctly. These streams are manually processed for: correct location relative to existing contours, connectivity, downstream flow, and unique identification.75

4.4.1 SINUOSITY

According to Rankin (1995, p.191), most unaltered streams in Ohio are sinuous. Sinuous streams may affect the stream habitat by altering the stream velocity and bank stability. Sinuosity may also alter the pool and riffle structure of the stream (Rosgen, 1994; Mecklenburg, 1998). According to Richards and Host (1994), sinuosity may also reflect upon the extent of flow modifications by different regional land uses. They could measure sinuosity visually by examining a distance of 200 m on the sampled reaches because of the relatively small dataset of 11 observations. Lammert and Allan (1999) used C-MAP software to calculate sinuosity of streams.

73 The PEMSO Database is a collection of streams and other water bodies in Ohio assessed and recorded as part of the U.S. EPA 305(b) Waterbody System.
74 USGS minor codes selected are 412 (stream), 414 (ditch or canal), 605 (left bank), and 606 (right bank). Note that intermittent, ephemeral, abandoned, dry, and underground channels have not been selected here.
75 Also read the section on DEM for stream network processing.

The value of sinuosity at a site is assumed to be the sinuosity of the arc closest to the sampling site. However, this value may not be the most accurate measurement of sinuosity. See Figure 4.8 for a better understanding of this discussion. Based on the resolution and topology of the arcs in the stream network, the sinuosity may only be calculated for a segment, rather than stream or reach.76 It is not clear whether disaggregation up to this level may help, or obfuscate, our ability to understand the relationship between habitat quality and sinuosity. GIS programming is used to calculate stream sinuosity for individual segments in this study.
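The segment-level calculation can be sketched as follows, assuming the common definition of sinuosity as the ratio of along-channel length to the straight-line distance between the end nodes of an arc. The definition and the function name are assumptions made for illustration; the GIS program used in this study is not reproduced here.

```python
import math

def sinuosity(vertices):
    """Ratio of polyline length to straight-line end-to-end distance.

    vertices -- list of (x, y) coordinates of one stream arc, in order
    """
    path = sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))
    straight = math.dist(vertices[0], vertices[-1])
    if straight == 0.0:
        raise ValueError("arc starts and ends at the same point")
    return path / straight

straight_arc = sinuosity([(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)])
bent_arc = sinuosity([(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)])
```

A perfectly straight arc has a sinuosity of 1.0. The bent arc above has a path length of 10 over a straight-line distance of 6, giving about 1.67. Because the value depends on how many vertices describe the arc, a coarsely digitized line yields a lower sinuosity than a finely digitized one for the same channel.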

Sinuosity measurement is also highly scale-dependent. As shown in Figure 4.8, datasets of coarser resolutions are prone to indicate lower sinuosity due to imprecise digitization of arcs. Richards and Host (1994) also suggest that stream sinuosity may be strongly influenced by spatial resolution of the stream network. Figures 4.9, 4.10, and 4.11 show that hydrologic channels in the study area are characterized by moderate sinuosity (i.e., mean = 1.15), headwater location (i.e., stream order = 1), and an average length of 0.63 miles.

76 According to Allan et al. (1997), a stream network may be classified in increasing order of (linear) scale as: channel (10^0 m), reach (10^1 m), segment (10^2 m), and stream (10^3 m).

[Figure 4.8 diagram, two panels. Left panel, “reach-scale vs. segment-scale”: a stream drawn through Nodes 1 to 4 with the sampling site at Node 2; a downstream segment with sinuosity = 1.1 and upstream segments with sinuosity = 1.4 and 1.7; sinuosity at the sampling site = ?. Right panel, “low resolution vs. high resolution”: a digitized stream with sinuosity = 1.1 compared with the actual stream with sinuosity = 1.7; sampling site at Node 2, sinuosity = ?.]

Figure 4.8: Sources of uncertainty in calculating stream sinuosity


Figure 4.9: Sinuosity of streams in the ECBP study area.

Figure 4.10: Strahler stream order for streams in the ECBP study area.

Figure 4.11: Length of stream arcs in the ECBP study area

4.4.2 STREAM ORDER

Different algorithms can be used to process the stream order of a dendritic network with a varying degree of efficiency (Lanfear, 1990; Miller et al., 1996). For this research, stream order is calculated using a computation-intensive GIS program to topologically traverse all the arcs in the stream network, and analyze them according to the Strahler method of stream ordering.77 See Figure 4.10 for descriptive statistics of the Strahler stream order values. As expected, with increasing order there is a substantial decrease in the number of streams per order. The value of the stream order for the arc that is geographically closest to the sampling site is assigned as the stream order at a given habitat sampling location.
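The Strahler rule itself is compact: a headwater arc has order 1; where two or more arcs of the same highest order meet, the downstream arc is one order higher; otherwise the downstream arc inherits the highest upstream order. The sketch below applies this rule to a network stored as a hypothetical dictionary that maps each arc to its immediate upstream arcs. It assumes a fully dendritic network with no loops and is not the GIS program used in this research.

```python
def strahler_orders(upstream):
    """Compute the Strahler order of every arc in a dendritic network.

    upstream -- dict mapping an arc id to a list of its immediate
                upstream arc ids; headwater arcs may be omitted as keys.
    """
    orders = {}
    for start in upstream:
        stack = [start]  # explicit stack avoids deep recursion
        while stack:
            arc = stack[-1]
            if arc in orders:
                stack.pop()
                continue
            ups = upstream.get(arc, [])
            pending = [u for u in ups if u not in orders]
            if pending:
                stack.extend(pending)  # resolve upstream arcs first
                continue
            if not ups:
                orders[arc] = 1  # headwater arc
            else:
                top = max(orders[u] for u in ups)
                n_top = sum(1 for u in ups if orders[u] == top)
                orders[arc] = top + 1 if n_top >= 2 else top
            stack.pop()
    return orders

# Hypothetical network: a, b, d, f, g are headwater arcs; i is the outlet.
network = {"c": ["a", "b"], "e": ["c", "d"], "h": ["f", "g"], "i": ["e", "h"]}
orders = strahler_orders(network)
```

In this example, arcs c and h are second order, arc e stays second order because it joins a second-order and a first-order arc, and the outlet arc i is third order.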

Stream flow is one of the factors suggested in water quality studies (Johnson et al., 1997). However, as Rankin (1995, p.185) points out, it has not been used in this study for two major reasons. First, there is a lack of readily available and congruent streamflow data for many Ohio stream segments where habitat quality sampling has been done. Second, stream flow is not a limiting factor for many Ohio streams. Stream order is a proxy for flow, as larger streams generally have higher average flows.

4.4.3 RIPARIAN ZONE

Riparian zone destruction is one of the most common human impacts in Ohio (Rankin, 1995, p.188). Typically, land use classes are overlaid with riparian zone boundaries to extract relevant land use data for further analysis. Many studies have used a value of 100 m for riparian zone width (Johnson et al., 1997; Richards et al., 1996; Wang et al., 1997). It should be noted that these studies were actually limited by the resolution of the land use dataset of 61 m or more. Lammert and Allan (1999) studied riparian scale with a 50 m strip on each side of the streams. Richards and Host (1994) have emphasized that riparian zones as narrow as 3 or 4 m may be studied for habitat-related impacts from woody debris entering the streams. Mattson and Godfrey (1994) cite other work that finds road length in the riparian zone to be an important variable in the study of water quality.

77 The 30,226 arcs in the stream network take more than 50 hours to process for Strahler stream order.

In this study, the arcs in the stream network have been buffered using vector, as well as raster, GIS functions to extract riparian zone for stream reaches. To explore the land-water interactions across the riparian zones, different riparian zones have been extracted with widths in multiples of 30 m. As dictated by the resolution of land cover data, variables are extracted for 30 m, 90 m, and 150 m wide riparian zones.78
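The raster form of this buffering can be sketched as a growth of the rasterized stream by a whole number of 30 m cells, where widths of 30 m, 90 m, and 150 m correspond to 1, 3, and 5 cells on each side of the channel. The square (Chebyshev) neighborhood and the inclusion of the stream cell itself are assumptions of this sketch, and the function name is hypothetical.

```python
WIDTH_TO_CELLS = {30: 1, 90: 3, 150: 5}  # riparian width in m -> cells per side

def riparian_mask(stream_mask, n_cells):
    """Flag every cell within n_cells of a stream cell (square neighborhood)."""
    rows, cols = len(stream_mask), len(stream_mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not stream_mask[r][c]:
                continue
            for rr in range(max(0, r - n_cells), min(rows, r + n_cells + 1)):
                for cc in range(max(0, c - n_cells), min(cols, c + n_cells + 1)):
                    out[rr][cc] = True
    return out

# Hypothetical 5 x 5 grid with a stream running down the middle column.
stream = [[c == 2 for c in range(5)] for _ in range(5)]
zone_30 = riparian_mask(stream, WIDTH_TO_CELLS[30])
zone_90 = riparian_mask(stream, WIDTH_TO_CELLS[90])
cells_30 = sum(cell for row in zone_30 for cell in row)
cells_90 = sum(cell for row in zone_90 for cell in row)
```

The resulting mask can then be overlaid on the land cover grid to tabulate the share of each class within the riparian zone.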

4.5 LAND USE AND LAND COVER

The literature review underscored the need for using land use and land cover data of high resolution (Graham et al., 1991). Land use and land cover data from two different sources are available for this study. The first land use and land cover dataset, from Gordon and Majumder (2000), is classified from four Landsat TM images from the period August, 1991- October, 1993. They further enhanced the classification by incorporating ancillary data on urban areas from TIGER/Line and Census 1995 files.

The other land use dataset is obtained from the National Land Cover Data (NLCD 1992) database. The data for Ohio is part of a nationally consistent land cover data set from Multi-Resolution Land Characteristics (MRLC) data.79 The land-cover mapping scheme of this data uses a modified Anderson level II classification for the conterminous United States. In addition to satellite data, scientists used a variety of supporting information including topography, census, agricultural statistics, soil characteristics, other land cover maps, and wetlands data to determine the land cover types at 30 m resolution. Twenty-one classes of land cover are mapped, using consistent procedures for the entire U.S., and a subsequent accuracy assessment was performed.80 The Ohio land cover data is from 20 Landsat images from different time periods between 1988 and 1994.81

78 30 m, 90 m, and 150 m correspond to 30 m-grid riparian row widths of 1, 3, and 5 cells, respectively, on each side of the stream channel.

Lammert and Allan (1999) use aerial photographs with a resolution of 100 m. Wang et al. (1997), Richards and Host (1994), and Wang and Yin (1997) use the 1:250,000 land cover dataset from the USGS with a resolution of 400 m (200 m for urban areas). Both the available land use/cover datasets compare favorably with the landscape data used in the studies mentioned above. However, the NLCD dataset is potentially more useful for a study requiring detailed knowledge for small areas. The NLCD dataset has Anderson Level II classification, as compared to the Anderson Level I for the other dataset. The other land use dataset (from Gordon and Majumder, 2000) does not cover the entire study area, missing the northeastern coastal area near Lake Erie, as well as some areas just south of Columbus. Note that Johnson et al. (1997) and Richards et al. (1996) use aerial imagery from the 1970s with 61 m resolution and 90% accuracy. An accuracy assessment is not available for the other dataset mentioned earlier in this section. On the other hand, the NLCD data for Ohio used in this research is known to be 85% accurate for seven major Anderson Level I classes.82

79 The MRLC consortium consists of federal agencies, which include the USGS, USEPA, the U.S. Forest Service (USFS), and the National Oceanic and Atmospheric Administration (NOAA).
80 For more details on the classification process for the NLCD data, see Vogelmann (2001).
81 This description is from the readme file accompanying the original data. The average year of a scene is 1991. The base data set for this project was leaves-off Landsat TM data, nominal-1992 acquisitions. Other ancillary data layers included leaves-on TM, USGS 3-arc second Digital Terrain Elevation Data (DTED) and derived slope, aspect and shaded relief, Bureau of the Census population and housing density data, USGS land use and land cover (LUDA), and National Wetlands Inventory (NWI) data, if available. Other additional data sets included STATSGO soils information and Ohio Wetland Inventory (OWI) data.

There are some general differences between the NLCD and Anderson Level II classification. The Commercial, Industrial, and Transportation classes are separate in the Anderson Level II classification. The disaggregation of the “residential” Anderson Level II category into low-intensity and high-intensity residential classes in NLCD data may be potentially very useful, as the impact of urban areas may be gauged by the density of residential land use. Similarly, Anderson Level II category of “cropland and pasture” is further subdivided into four detailed classes in the NLCD data – pasture/hay, row crops, small grains, and fallow. This may be significant for this study in capturing the impacts of various farming activities because crop production is the predominant use of land in Ohio.83 Johnson et al. (1997) suggest the use of crop type as a predictor variable. See Table 4.3 for the names of available land cover categories. For the definition of all major land use and land cover categories, see Appendix A.

82 Howard (2002), personal communication. Accuracy assessment is done by regions, not by states. Ohio is in Region 5. Accuracy is 58% for Anderson Level II 18 classes. Digital Orthophoto Quarter Quadrangles (DOQQs) are used as reference data in the Midwest. A two-stage sampling design is used. First, a 6 km x 6 km primary sampling unit (PSU) is randomly selected from each 60 km x 30 km tessellation cell dividing the region. Then secondary sampling units (SSUs) are chosen by randomly selecting 100 sample pixels for each class across all PSUs.
83 Crop production occupied 47% of Ohio’s land area, as cited in Rankin (1995, p. 195).

Key    Land Cover Classification

Water
11     Open Water
12     Perennial Ice/Snow

Developed
21     Low Intensity Residential
22     High Intensity Residential
23     Commercial/Industrial/Transportation

Barren
31     Bare Rock/Sand/Clay
32     Quarries/Strip Mines/Gravel Pits
33     Transitional

Forested Upland
41     Deciduous Forest
42     Evergreen Forest
43     Mixed Forest

Shrubland
51     Shrubland

Non-natural Woody
61     Orchards/Vineyards/Other

Herbaceous Upland
71     Grasslands/Herbaceous

Herbaceous Planted/Cultivated
81     Pasture/Hay
82     Row Crops
83     Small Grains
84     Fallow
85     Urban/Recreational Grasses

Wetlands
91     Woody Wetlands
92     Emergent Herbaceous Wetlands

Table 4.3: NLCD Land Cover Classification System Key (Rev. July 20, 1999)

Figures 4.12 and 4.13 together give a general idea of the geographic distribution and relative proportion of the study area in different land cover categories, respectively. Almost 60% of the study area is in row crops land use. Pasture is a category of agricultural use and constitutes almost 19% of the study area. Forests make up about 13% and approximately 5% of the area is developed.84

Land use data poses some statistical and temporal problems. It might be highly inter-correlated at all scales because of competitive displacement, if all land uses together constitute 100% (Lammert and Allan, 1999). It is also not uncommon to have a very low or high representation of a land use or land cover class in some spatial units (e.g., Richards et al., 1996, have up to 96% area of some units in row crops, and Lammert and Allan, 1999, have only 1% - 9% urban area). Land use data may also be temporally inconsistent with the QHEI data, as shown in Figure 4.14. The figure also indicates that a single spatial unit may be composed from multiple temporally incongruent images.

Ideally, habitat sampling and land cover data would be available for all the desired locations for the same year. This may enable modeling based on single-year land use data from the past, as depicted by the hypothetical function in Figure 4.15.

According to the figure, a hypothetical time lag (e.g., 1991 – 1995) may be required before any impact of land cover on habitat quality may be identified. For instance, a land use alteration in 1989 may provide time for stream habitat to recover before 1995, and an alteration in 1993 may not produce noticeable habitat impacts until after 1995.

84 Row crops include corn, soybean, vegetables, tobacco, and cotton. Pasture land cover includes areas planted for livestock grazing or hay crops. Forests generally include tree cover more than 6 m tall (less than 6 m is Shrub land). Developed land is defined as areas with greater than 30% of the surface in construction materials (e.g., concrete or asphalt).

Figure 4.12: Major land use/ land cover classes in the study area.


Figure 4.13: Major land cover types in the ECBP study area.85

85 The NLCD class “Herbaceous Planted/Cultivated” is shown in more detail as consisting of Pasture/Hay, Row Crops, and Urban/Recreational Grasses.

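[Figure 4.14 diagram: a spatial unit in the current database whose land use is classified from scenes of different dates (labels: “Land use classified from 1988 and 1992 scenes” and “Land use classified from 1991 and 1994 scenes”), together with QHEI sample records in the ECOS database dated 1989, 1990, 1993, July 1994, and August 1994.]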

Figure 4.14: Temporal inconsistency between, and across, QHEI and land use data

[Figure 4.15 plot: hypothetical function. Horizontal axis: year of land use / land cover data (1989, 1991, 1993, 1995). Vertical axis: strength of linkage (hypothetical) between land use and habitat. Annotations: weak association (too early, habitat recovers before 1995); stronger association using 1991 land use data; weaker association using 1993 land use data (too recent, habitat impacts yet to appear); habitat quality measured in 1995.]

Figure 4.15: A hypothetical, optimal, temporal lag between land use alterations and their impact on habitat quality in one spatial unit

4.6 ROADS

Roads in this research have been derived from the USGS 7.5 minute quadrangles of 1:24,000 scale transportation data in Digital Line Graph (DLG) format. Richards and Host (1994) have also used 1:24,000 DLG transportation data in their study of stream habitats. The dataset includes a network of arcs classified as primary routes, secondary routes, trails, streets, and footbridges. The density of the road network, defined as the total length of roads per unit area, is calculated using GIS. Road segments in the database range in length from 0.6 m to 9.86 km. An average road segment is 302 m long (std. dev. = 385 m).
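Road density as defined above (total road length per unit area) reduces to a single division once units are made consistent. The sketch below assumes lengths in meters and area in square meters as inputs and reports miles of road per square mile; the choice of output units and the function name are assumptions made for illustration.

```python
MILE_M = 1609.344  # meters in one statute mile

def road_density(segment_lengths_m, area_sq_m):
    """Total road length per unit area, in miles per square mile."""
    if area_sq_m <= 0:
        raise ValueError("area must be positive")
    miles = sum(segment_lengths_m) / MILE_M
    sq_miles = area_sq_m / (MILE_M ** 2)
    return miles / sq_miles

# Hypothetical unit: three road segments inside a 4 sq. mile area.
density = road_density([MILE_M, 2 * MILE_M, 3 * MILE_M], 4 * MILE_M ** 2)
```

Here six miles of road in four square miles give a density of 1.5 miles per square mile.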

4.7 POPULATION AND HOUSING DENSITY

Considering the small size of the watersheds for this study, the finest resolution of population and housing data (i.e., census block level) is organized into a GIS database from TIGER/Line 1995 geography files and Summary Tape File (STF) 1990 data from the Census. Census data has been used as proxy for urban areas in a few studies. Richards and Host (1994) use housing density calculated from 1:24,000 topographic maps. There are more than 95,000 census blocks in the database, with a total population of 3,868,590 in 1,536,954 housing units. The block population ranges from 0 to 7,625, and housing units range from 0 to 2,050. The average block population is 40 (std. dev. = 92), and on average, there are 16 housing units (std. dev. = 37) in a census block.
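Because census blocks rarely coincide with watershed boundaries, block counts have to be allocated to a spatial unit in proportion to the share of each block that falls inside it. A minimal sketch of such area-proportional allocation follows. It assumes that people and housing units are spread uniformly within a block, and the tuple layout and function name are hypothetical.

```python
def prorated_total(blocks):
    """Allocate block counts to a spatial unit by area share.

    blocks -- iterable of (count, block_area, area_inside_unit) tuples,
              with both areas in the same units
    """
    total = 0.0
    for count, block_area, area_inside in blocks:
        if block_area > 0:
            total += count * (area_inside / block_area)
    return total

# Hypothetical blocks: half inside, fully inside, and fully outside the unit.
blocks = [(100, 10.0, 5.0), (40, 4.0, 4.0), (60, 6.0, 0.0)]
population = prorated_total(blocks)
```

Dividing the allocated total by the area of the spatial unit then gives a density that can be compared across units of different size.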

4.8 SPATIAL UNIT – LOCALSHED

As mentioned in Chapter 3, a custom-defined hydrologically contributing area is selected as the spatial unit for this study.

4.8.1 LOCALSHED BASED ON SPATIAL INDEPENDENCE

Under this strategy, spatially independent localsheds are generated from the topographic information in the DEM for each QHEI sample point. Using a GIS, the watershed polygons are generated only up to the location of the nearest upstream QHEI sample point. The GIS may not always generate a watershed for every QHEI point, probably due to two major factors - the relatively imperfect placing of the QHEI points in the local terrain, and local depressions or errors in the DEM.86 Also, there may be more polygons (e.g., due to islands) in the generated layer than there are QHEI points, due to the polygon topology storage mechanism in the GIS. After discarding spurious island polygons and localsheds of extremely small or large areas (i.e., below approximately 2,000 acres or above 200,000 acres), the dataset contains 225 watersheds between about 20,000 and 200,000 acres (31.25 – 312.50 square miles, or relatively medium-scale localsheds) in area and 125 watersheds between approximately 2,000 and 20,000 acres (3.12 – 31.25 square miles, or relatively small-scale localsheds) in area.87 In terms of a hierarchical scheme, these medium- and small-scale localsheds are probably analogous to the 11-digit watersheds and 14-digit subwatersheds respectively, of the USGS classification scheme for hydrologic cataloging units shown in Appendix B.
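The size screening described above can be written as a small classifier. The thresholds are the approximate values given in the text; treating exactly 20,000 acres as medium-scale is an assumption of this sketch, and the function names are hypothetical.

```python
ACRES_PER_SQ_MILE = 640.0

def acres_to_sq_miles(acres):
    """Convert an area in acres to square miles."""
    return acres / ACRES_PER_SQ_MILE

def localshed_class(area_acres):
    """Return 'small', 'medium', or None for discarded localsheds."""
    if area_acres < 2_000 or area_acres > 200_000:
        return None  # extremely small or large units are dropped
    return "small" if area_acres < 20_000 else "medium"

classes = [localshed_class(a) for a in (1_500, 5_500, 50_000, 250_000)]
medium_lower_bound_sq_mi = acres_to_sq_miles(20_000)
```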

To test the hypothesis that the size of a localshed may be associated with the QHEI score, a correlation analysis is performed. The Spearman correlation coefficient, which measures the strength and direction of a monotonic relationship between QHEI and localshed area (WSAREA), shows a weak and positive (coeff. = 0.110), but significant (prob. = 0.040) relationship.88 Interestingly, a t-test of whether QHEI is higher in the medium-scale localsheds than in the small-scale localsheds indicates that QHEI scores may be higher, but only at marginally significant levels (prob. = 0.082).

86 Sampling site points were snapped to the most hydrologically optimum location relative to the DEM within a distance of 60 m (2 cells) from the sampling location. It helped discount for minor locational inconsistency between the DEM and streams data. This also helped increase the number of medium watersheds from 102 to 225.
87 10,000 grid cells of 30 m x 30 m size are approximately 2,222 acres in size.
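For readers unfamiliar with the statistic, Spearman's coefficient is Pearson's correlation computed on ranks rather than on raw values, with tied values sharing their average rank. The sketch below is a plain-Python illustration of that definition with made-up data. It is not the statistical package used in this research and it does not compute the significance level.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: a monotonic but non-linear relationship.
area = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [1.0, 4.0, 9.0, 16.0, 25.0]
rho = spearman(area, score)
rho_reversed = spearman(area, score[::-1])
```

Because only the ordering matters, the squared values above give a coefficient of 1, and reversing them gives -1.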

Table 4.5 provides some descriptive statistics for the independent localsheds database.89 Relative to the narrative criteria for QHEI scores in the region, shown in Table 4.1, the habitat quality of streams, on average, seems to be “good” (i.e., 63). The average localshed defined in this design is about 5,500 acres (i.e., 8.6 sq. mi.) in size. Note that the localshed size varies across two orders of magnitude (i.e., from the 200s to the 20,000s of acres), on first- to sixth-order streams. On average, the sampled stream of an independent localshed is of the third order and has a sinuosity of 1.17. There may be no population or housing in some independent localsheds, but all localsheds in this sample have at least a few roads (i.e., minimum WSRDEN > 0). A majority (i.e., about 80%) of an average localshed is on relatively flat slopes (i.e., 0%-3%). On average, only about 10% of the localshed area is developed (i.e., WSLRES, WSHRES, and WSCIT), while about 85% is vegetated (i.e., WSDFOR, WSHAY, and WSROWC). There are negligible amounts of area in mining or wetlands. This description seems typical for this part of Ohio because most of this region is in agricultural land use. Coal mining is practiced predominantly near the southeastern edge of Ohio.

88 Spearman’s correlation coefficient is the equivalent of Pearson’s correlation coefficient for nonparametric data. It tests for ranks rather than raw data, and is appropriate for data not satisfying the normality assumption (SPSS, 1999).

89 See Table 4.4 for variable code definitions for the variables displayed in Tables 4.5 and 4.6.

Variable   Definition
QHEI       The score on Ohio EPA's Qualitative Habitat Evaluation Index at a site, between 1989 and 1995
SUBSTRAT   Score on the Substrate metric of QHEI
COVER      Score on the Instream Cover metric of QHEI
CHANNEL    Score on the Channel Morphology metric of QHEI
RIPARIAN   Score on the Riparian/Bank Stability metric of QHEI
POOL       Score on the Pool metric of QHEI
RIFFLE     Score on the Riffle metric of QHEI
GRAD_S     Score on the Gradient metric of QHEI
YEAR       Year of sampling for QHEI
WSAREA     Area of localshed for a sample site, in acres
WSSTRLR    Strahler stream order of the stream, derived from USGS 1:24,000 DLGs, on which QHEI sample exists
WSSTRFRQ   Localshed stream frequency (number of streams in the localshed for a sample site)
WSSINAVG   Average sinuosity of streams in the localshed for a sample site
WSPSFREQ   Localshed point source frequency (number of point sources in the localshed for a sample site)
WSPOPDEN   Area-prorated block-level population density in 1990, in the localshed, in number of people per sq. mile
WSHUDEN    Area-prorated block-level housing density in 1990, in the localshed, in number of housing units per sq. mile
WSRDEN     Road density in the localshed for a sample site, in miles per sq. mile
WSSL01     Slopes less than 1% in the localshed for a sample site, in % of localshed area
WSSL12     Slopes between 1% and 2% in the localshed for a sample site, in % of localshed area
WSSL23     Slopes between 2% and 3% in the localshed for a sample site, in % of localshed area
WSSL34     Slopes between 3% and 4% in the localshed for a sample site, in % of localshed area
WSSL45     Slopes between 4% and 5% in the localshed for a sample site, in % of localshed area
WSSL56     Slopes between 5% and 6% in the localshed for a sample site, in % of localshed area
WSSL60     Slopes more than 6% in the localshed for a sample site, in % of localshed area
WSLRES     Low intensity residential land use in the localshed, in % of localshed area
WSHRES     High intensity residential land use in the localshed, in % of localshed area
WSCIT      Commercial-Industrial-Transportation land use in the localshed, in % of localshed area
WSMINE     Quarries-Strip Mines-Gravel Pits land use in the localshed, in % of localshed area
WSTSNT     Transitional land use in the localshed, in % of localshed area
WSDFOR     Deciduous forest land cover in the localshed, in % of localshed area
WSEFOR     Evergreen forest land cover in the localshed, in % of localshed area
WSMFOR     Mixed forest land cover in the localshed, in % of localshed area
WSHAY      Pasture-Hay land cover in the localshed, in % of localshed area
WSROWC     Row Crop land cover in the localshed, in % of localshed area
WSLAWN     Urban-Recreational Grasses land cover in the localshed, in % of localshed area
WSWWET     Woody Wetlands land cover in the localshed, in % of localshed area
WSHWET     Herbaceous Wetlands land cover in the localshed, in % of localshed area
R30LRES    Low intensity residential land use, in % of riparian area in 30 m strip on each side of the streams in the localshed for a sample site (see footnote 90)
R30DFOR    Deciduous forest land cover, in % of riparian area in 30 m strip on each side of the streams in the localshed for a sample site
R30HAY     Pasture-Hay land cover, in % of riparian area in 30 m strip on each side of the streams in the localshed for a sample site
R30ROWC    Row Crop land cover, in % of riparian area in 30 m strip on each side of the streams in the localshed for a sample site

Table 4.4: Metadata for the major variables in the localshed-based GIS database

90 All the localshed-scale variables (i.e., prefixed with ‘WS’) are also derived for 30 m, 90 m, and 500 m (prefixed with ‘R30’, ‘R90’, and ‘R500’, respectively) riparian zones, for both localshed designs.

Variable     Mean     Min.     Max.      Std.
QHEI         63.4     8.5      100.0     18.2
SUBSTRAT     13.3     0.5      22.0      4.7
COVER        12.1     1.0      21.0      4.2
CHANNEL      13.2     0.0      20.0      4.5
RIPARIAN     5.7      1.0      10.0      1.9
POOL         8.0      0.0      12.0      3.1
RIFFLE       3.2      -1.0     8.0       2.5
GRAD_S       7.9      2.0      10.0      2.3
YEAR         93       89       95        2
WSAREA       5,515    223      22,057    5,143
WSSTRLR      3        1        6         1.40
WSSTRFRQ     24       1        168       27
WSSINAVG     1.17     1.03     1.70      0.09
WSPSFREQ     2        0        30        4
WSPOPDEN     473      0        8,000     912
WSHUDEN      189      0        3,881     393
WSRDEN       4.42     0.95     21.73     3.71
WSSL01       46.52    3.93     98.40     20.14
WSSL12       22.12    1.60     47.81     6.98
WSSL23       11.20    0.00     26.53     4.46
WSSL34       6.50     0.00     17.68     3.80
WSSL45       4.30     0.00     15.23     3.31
WSSL56       2.69     0.00     13.77     2.61
WSSL60       6.67     0.00     49.05     9.22
WSLRES       6.31     0.00     55.34     11.31
WSHRES       1.38     0.00     42.42     3.82
WSCIT        2.77     0.00     55.95     6.57
WSMINE       0.12     0.00     17.53     1.12
WSTSNT       0.10     0.00     12.37     0.82
WSDFOR       11.65    0.50     45.33     8.51
WSEFOR       0.24     0.00     5.57      0.55
WSMFOR       0.03     0.00     0.65      0.07
WSHAY        15.26    0.00     55.38     8.51
WSROWC       58.83    0.07     97.61     24.42
WSLAWN       2.05     0.00     54.06     4.94
WSWWET       0.37     0.00     5.08      0.58
WSHWET       0.13     0.00     2.78      0.22
R30LRES      4.12     0.00     52.64     9.27
R30DFOR      26.09    0.00     80.45     16.91
R30HAY       14.63    0.00     54.76     9.31
R30ROWC      43.35    0.00     99.69     25.55

Table 4.5: Descriptive statistics for 350 spatially independent localsheds.

4.8.2 LOCALSHED BASED ON SCALE

Under this strategy, a watershed for each QHEI sample point is analyzed separately (i.e., independent of any upstream sampling sites), and truncated based on a maximum flow-length of 1 mile upstream of each sampling site.91 This is done to generate watersheds defining only the areas of local impact for habitat sampling locations distributed across the ECBP ecoregion.92 The selection of the 1-mile upstream flow length was not completely arbitrary. It is found that at this distance the average area of upstream contributing areas approaches the average area of the smaller localsheds of the earlier strategy. At small scales it is also likely that the generated localshed is spatially independent of (i.e., does not overlap) any adjacent localshed, ensuring that variables are not analyzed redundantly. A GIS program is developed to study the inter-site upstream network distance between pairs of neighboring QHEI sampling sites on the same stream network.93 As shown in Figure 4.16, the average inter-site upstream distance is almost two miles, with almost 60% of the sampling sites being farther than one mile from any other upstream site.94 Therefore, a maximum upstream flow-length limit of 1 mile is selected as a conservative limit. Note that this flow-length may be changed in a sensitivity analysis of localshed scale.

Fewer island and tiny polygons have to be removed for these sample point-based localsheds, leaving a final sample of 543 localsheds. This is because the localshed-generating strategy is independent of the inter-site sampling distance.
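The idea of truncating a contributing area at a maximum upstream flow-length can be sketched on a simplified stream network. This is only an analogy: the study truncates DEM-derived watersheds on a 30 m grid in ArcInfo, whereas the hypothetical function below works on stream segments, assumes the sampling site sits at the downstream end of its segment, and uses made-up segment identifiers and lengths.

```python
from collections import defaultdict


def upstream_within(segments, site_segment, max_flow_length=1.0):
    """Return {segment_id: retained_length} for channel within the limit.

    segments maps segment_id -> (downstream_id or None, length in miles).
    Flow length is measured upstream from the downstream end of site_segment.
    """
    # Invert the downstream pointers so each segment knows its tributaries.
    tributaries = defaultdict(list)
    for seg_id, (downstream_id, _length) in segments.items():
        if downstream_id is not None:
            tributaries[downstream_id].append(seg_id)

    retained = {}
    # Each stack entry: (segment, flow length from the site to its lower end).
    stack = [(site_segment, 0.0)]
    while stack:
        seg_id, start = stack.pop()
        length = segments[seg_id][1]
        remaining = max_flow_length - start
        if remaining <= 0:
            continue
        retained[seg_id] = min(length, remaining)
        if length < remaining:
            # The whole segment fits, so continue into its tributaries.
            for trib in tributaries[seg_id]:
                stack.append((trib, start + length))
    return retained


# Hypothetical network: B and C join to form A; D flows into B.
network = {
    "A": (None, 0.4),
    "B": ("A", 0.5),
    "C": ("A", 0.8),
    "D": ("B", 0.6),
}
kept = upstream_within(network, "A", max_flow_length=1.0)
```

Raising or lowering `max_flow_length` in such a sketch corresponds to the sensitivity analysis of localshed scale mentioned above.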

91 An area threshold of 247 acres (or 0.39 sq. miles, or 1,000,000 sq. m., or roughly 1,000 cells) is also used. This means that the watershed is not truncated into a localshed if its original area is below this threshold.

92 Relative to the USGS classification of hydrologic units, such localsheds would be further below the smallest element in the hierarchy, the fourth level or the hydrologic cataloging units.

93 Using the network topology in NETWORK module of ArcInfo GIS.

94 Note that not all sampling sites have an upstream neighbor.

Table 4.6 shows that the overall general structure of the region is similar to the earlier design, with generally predominant agricultural land use in typically flat lands.

However, some interesting details also emerge. The area of an average equal-scale localshed (i.e., 345 acres or 0.54 sq. mi.) is about 16 times smaller, or just over one order of magnitude less, than that of the spatially independent localsheds (i.e., 5,515 acres or 8.6 sq. mi.). This design seems to delineate contributing areas for sample sites on streams of higher average order (4 versus 3), greater sinuosity (1.21 versus 1.17), and steeper surface slopes (16% versus 9% of area in slopes greater than 5%) than those in the previous strategy.

As compared to the previous design, there is a greater proportion of area in deciduous forest (i.e., about 18% versus about 12%). In the independent localsheds, on average, there is about five times as much land in row crops as in deciduous forest. On the other hand, as shown in Figure 4.17, in the equal-scale strategy, the row crops to deciduous forest land use/cover ratio is less than 3. In the 30 m riparian zones of the equal-scale localsheds, on average, there is more land in deciduous forest land cover (i.e., 36%) than in row crops land use (i.e., 29%). In summary, land cover is somewhat more evenly distributed in the equal-scale localsheds in the study area.
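The ratios quoted in this paragraph can be recomputed from the mean values reported in Tables 4.5 and 4.6. The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Mean land cover percentages copied from Table 4.5 (independent localsheds)
# and Table 4.6 (equal-scale localsheds).
means = {
    "independent": {"WSROWC": 58.83, "WSDFOR": 11.65,
                    "R30ROWC": 43.35, "R30DFOR": 26.09},
    "equal_scale": {"WSROWC": 47.24, "WSDFOR": 17.78,
                    "R30ROWC": 28.59, "R30DFOR": 36.52},
}


def row_crop_to_forest_ratio(design, prefix="WS"):
    """Ratio of mean row crop cover to mean deciduous forest cover.

    prefix selects the whole localshed ('WS') or the 30 m riparian strip ('R30').
    """
    m = means[design]
    return m[prefix + "ROWC"] / m[prefix + "DFOR"]


if __name__ == "__main__":
    for design in means:
        for prefix in ("WS", "R30"):
            ratio = row_crop_to_forest_ratio(design, prefix)
            print(f"{design:12s} {prefix:3s} {ratio:.2f}")
```

A ratio below 1 for the 30 m strip of the equal-scale design reflects the statement that riparian deciduous forest exceeds riparian row crops there.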

Figure 4.18 provides a visual overview of the relative scale of the localsheds used in different strategies for the Big Darby Creek watershed, a seventh-order basin in the PEMSO stream system. It is evident that equal-scale localsheds tend to smooth, to a certain extent, the size differences between the watersheds generated for the same sampling stations in the independent localshed strategy. To a large extent, their relatively small size also makes them spatially non-overlapping.


Figure 4.16: Histogram of inter-site network distance between neighboring QHEI samples.

Figure 4.17: Major land cover types in the independent and equal-scale localsheds.

Variable     Mean     Min.     Max.      Std.
QHEI         64.85    8.50     100.00    17.31
SUBSTRAT     13.57    0.50     23.00     4.60
COVER        12.45    1.00     21.00     4.03
CHANNEL      13.43    0.00     20.00     4.41
RIPARIAN     5.76     1.00     10.00     1.87
POOL         8.34     0.00     12.00     2.87
RIFFLE       3.36     -1.00    8.00      2.52
GRAD_S       7.97     2.00     10.00     2.19
YEAR         93       89       95        2
WSAREA       345      104      645       111
STRLRDLG     4        1        7         2
WSSTRFRQ     5        1        22        4
WSSINAVG     1.21     1.00     3.42      0.19
WSPSFREQ     1        0        40        2
WSPOPDEN     464      0        5,646     861
WSHUDEN      191      0        2,913     384
WSRDEN       5.12     0.00     24.93     4.27
WSSL01       32.38    1.89     99.90     19.54
WSSL12       22.44    0.10     51.10     7.87
WSSL23       13.93    0.00     29.50     5.19
WSSL34       9.10     0.00     21.09     4.36
WSSL45       6.42     0.00     22.71     3.91
WSSL56       4.17     0.00     15.36     3.09
WSSL60       11.57    0.00     73.68     13.41
WSLRES       6.23     0.00     59.36     11.15
WSHRES       1.51     0.00     35.13     3.92
WSCIT        4.23     0.00     82.27     10.10
WSMINE       0.10     0.00     16.39     1.06
WSTSNT       0.15     0.00     21.38     1.42
WSDFOR       17.78    0.00     75.56     12.44
WSEFOR       0.31     0.00     8.87      0.76
WSMFOR       0.04     0.00     1.55      0.13
WSHAY        16.59    0.00     52.07     11.32
WSROWC       47.24    0.00     100.00    25.41
WSLAWN       2.66     0.00     71.43     7.04
WSWWET       0.73     0.00     11.04     1.21
WSHWET       0.31     0.00     4.94      0.55
R30LRES      3.16     0.00     57.45     7.83
R30DFOR      36.52    0.00     100.00    21.48
R30HAY       12.35    0.00     60.16     11.63
R30ROWC      28.59    0.00     100.00    23.51

Table 4.6: Descriptive statistics for 543 equal-scale localsheds.


Figure 4.18: Relative size and location of small- and medium-scale independent localsheds and equal-scale localsheds, in the Big Darby Creek Basin of the ECBP study area.

4.9 CORRELATION ANALYSIS

The linear relationship between QHEI and major landscape and geomorphology variables in both the localshed-based designs is shown in Table 4.7. Because the equal-scale design controls for the variation in area, there is little linear association between localshed area and QHEI scores in that design (coeff. = 0.036, prob. = 0.406). Note that there is a weakly positive and almost significant (prob. = 0.069) relationship in the independent localshed design. Higher stream orders (e.g., WSSTRLR > 2) relate to higher QHEI scores across both designs. Similarly, flatter slopes (e.g., < 1%) seem to be associated with lower habitat quality, and relatively steeper slopes (e.g., > 6%) indicate better QHEI values.

Interestingly, the sinuosity of streams does not seem to have a stable or significant association with QHEI across both designs. This may be due to the relatively coarse scale and accuracy of digitization for the stream network. The density of various urban factors (population, housing, and roads) does not exhibit the expected (i.e., negative) association with QHEI. For instance, the localshed with the largest area-prorated population (i.e., 99,318) contains the city of Dayton, and the stream habitat quality at its mouth is very good (QHEI = 76 in 1993). This seemingly anomalous observation may be explained partly by the fact that population and housing are only proxies and do not measure the direct impact of urbanization on habitat quality.
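The area-prorated population figures mentioned here come from allocating census block counts to localsheds in proportion to overlapping area. A minimal sketch of that proportional allocation follows; it assumes population is spread uniformly within each block, and the function names and block values are hypothetical rather than taken from the study's GIS procedures.

```python
def prorated_population(blocks):
    """Sum block populations weighted by the share of each block in the localshed.

    blocks is an iterable of (block_population, block_area, overlap_area),
    with both areas in the same units. Assumes uniform density within a block.
    """
    total = 0.0
    for population, block_area, overlap_area in blocks:
        if block_area <= 0:
            continue  # skip degenerate polygons
        share = min(max(overlap_area / block_area, 0.0), 1.0)
        total += population * share
    return total


def population_density(blocks, localshed_area_sq_mi):
    """Area-prorated population per square mile of localshed."""
    return prorated_population(blocks) / localshed_area_sq_mi


# Hypothetical blocks: fully inside, one quarter inside, and entirely outside.
example_blocks = [
    (120, 0.10, 0.10),
    (300, 0.40, 0.10),
    (80, 0.20, 0.00),
]
people = prorated_population(example_blocks)
density = population_density(example_blocks, localshed_area_sq_mi=0.5)
```

The uniform-density assumption is one reason such figures act only as proxies: a block's residents may be concentrated far from the stream even when much of the block's area lies within the localshed.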

Many landscape variables exhibit predictable, strong, and statistically significant relationships with QHEI. Deciduous forests tend to be positively (coeff. = 0.452) associated with habitat, while row crops (coeff. = -0.375) relate to adverse habitat quality.

Note that the coefficient for different widths of riparian zone sheds some light on the region of influence for different land cover or land use types. While row crops show relatively stronger negative relationships in narrower riparian strips (i.e., -0.490 in 30 m, while -0.475 in 90 m and -0.379 in 500 m riparian area), deciduous forests have stronger positive association with QHEI at broader spatial scales (i.e., 0.479 in 90 m and 0.456 in 500 m, while 0.417 in 30 m riparian zone). Overall, almost all variables seem to have stronger and significant linear relationships with QHEI in the equal-scale strategy, as compared to the independent localshed design.
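The comparison across riparian widths can be made explicit by picking, for each land cover type, the width with the largest absolute correlation. The sketch below uses the equal-scale coefficients reported in Table 4.7; the data structure and function name are illustrative only.

```python
# Pearson coefficients with QHEI for the equal-scale design (Table 4.7),
# keyed by riparian strip width in meters.
r_equal_scale = {
    "ROWC": {30: -0.490, 90: -0.475, 500: -0.379},  # row crops
    "DFOR": {30: 0.417, 90: 0.479, 500: 0.456},     # deciduous forest
}


def strongest_width(by_width):
    """Return the strip width whose coefficient has the largest magnitude."""
    return max(by_width, key=lambda width: abs(by_width[width]))


if __name__ == "__main__":
    for cover, by_width in r_equal_scale.items():
        width = strongest_width(by_width)
        print(cover, width, by_width[width])
```

The differences between adjacent widths are small (for example, -0.490 versus -0.475), so this ranking is suggestive rather than a formal test of scale effects.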


             QHEI for independent localsheds (N=350)*     QHEI for equal-scale localsheds (N=543)
Variable     Coefficient (note 95)   Probability (note 96)   Coefficient   Probability
WSAREA       0.097                   0.069                   0.036         0.406
WSSTRLR      0.277                   0.000                   0.313         0.000
WSSTRFRQ     0.118                   0.027                   0.095         0.026
WSSINAVG     -0.030                  0.577                   0.039         0.360
WSPSFREQ     0.112                   0.037                   0.018         0.682
WSPOPDEN     0.030                   0.571                   0.026         0.547
WSHUDEN      0.032                   0.556                   0.023         0.598
WSRDEN       0.090                   0.093                   0.103         0.016
WSSL01       -0.276                  0.000                   -0.396        0.000
WSSL12       -0.072                  0.177                   -0.154        0.000
WSSL23       0.167                   0.002                   0.124         0.004
WSSL34       0.264                   0.000                   0.275         0.000
WSSL45       0.278                   0.000                   0.335         0.000
WSSL56       0.273                   0.000                   0.344         0.000
WSSL60       0.292                   0.000                   0.353         0.000
WSLRES       0.037                   0.486                   0.012         0.784
WSHRES       0.021                   0.690                   -0.004        0.924
WSCIT        0.019                   0.728                   -0.001        0.980
WSMINE       0.089                   0.098                   0.093         0.030
WSTSNT       0.045                   0.399                   0.054         0.211
WSDFOR       0.326                   0.000                   0.452         0.000
WSEFOR       0.201                   0.000                   0.224         0.000
WSMFOR       0.196                   0.000                   0.178         0.000
WSHAY        0.249                   0.000                   0.206         0.000
WSROWC       -0.263                  0.000                   -0.375        0.000
WSLAWN       0.068                   0.206                   0.026         0.543
WSWWET       0.012                   0.825                   -0.015        0.718
WSHWET       0.104                   0.053                   0.132         0.002
R30LRES      -0.066                  0.220                   -0.104        0.015
R30DFOR      0.409                   0.000                   0.417         0.000
R30HAY       0.191                   0.000                   -0.041        0.339
R30ROWC      -0.394                  0.000                   -0.490        0.000
R90LRES      -0.052                  0.327                   -0.057        0.183
R90DFOR      0.458                   0.000                   0.479         0.000
R90HAY       0.250                   0.000                   0.095         0.026
R90ROWC      -0.391                  0.000                   -0.475        0.000
R500LRES     0.014                   0.796                   -0.006        0.893
R500DFOR     0.393                   0.000                   0.456         0.000
R500HAY      0.241                   0.000                   0.197         0.000
R500ROWC     -0.298                  0.000                   -0.379        0.000

* Coefficients for theoretically and statistically significant (at the 5% level) variables are shown in bold.

Table 4.7: Correlation between QHEI and major variables in localsheds.

95 Pearson’s correlation coefficient.

96 Significance (2-tailed).

4.10 FINAL DATABASE

After delineating the contributing areas for QHEI sampling locations, data is generated for more than 25 variables for each localshed polygon by using common GIS data extraction techniques, such as buffering, overlaying, and clipping. All data attributes are stored for each polygon in separate physical directories. Each directory is then analyzed and summarized statistically before producing the final dataset.

A summary of the major variables used in the research, the major sources of data, derivation methods, and other related attributes (e.g., year, accuracy), is provided in Table 4.8. Figure 4.19 graphically portrays the major variables assembled for each spatial unit in the GIS database. The whole process of data generation, management, and retrieval is computation- and space-intensive and has been automated completely using Arc Macro Language (AML) and Avenue programming.97 This study confirms that research at detailed scales over large study areas also requires the knowledge and availability of a variety of computer applications.98

97 For instance, the process to generate the final data for localsheds takes more than 20 hours, and results in about 277,000 files in 18,000 separate directories taking about 232 MB of space. The same process takes 8 hours, more than 6,000 intermediate coverages (27 variables x 225 polygons), and 334 MB for medium watersheds. Avenue is the scripting language for ArcView GIS; AML is the scripting language for Arc/Info GIS.

98 Data were collected, analyzed, and stored on Sun Solaris Unix, Microsoft Windows NT 4.0, and Windows 2000 machines. Most GRID-based GIS analysis is performed initially on a Unix workstation with Arc/Info 7.3.1, and later with ArcInfo 8.0.2 Workstation on a Windows NT workstation. Initially, ArcView 3.2 or earlier is used for visualization. Later, the newer ArcInfo 8x Desktop is used for routine GIS tasks. Documentation is done in Microsoft Word 2000 or earlier. Statistical analysis is done in SPSS for Windows 10.x, and Microsoft Excel 2000 or earlier. Database tasks are carried out in Microsoft Access 2000 or earlier. WinZip 8.0 or earlier and WS_FTP 5.0 or earlier are used for data and file compression and transfers.

DATA LAYER                      SOURCE
Digital elevation model (DEM)   - 30 m x 30 m cell resolution, derived from USGS 1:24,000 DLG hypsography files in 7.5-min quadrangles.99
                                - From 1960s.100 Horizontal resolution is about 10 m. Vertical resolution is about 7-15 m for base contour data.
                                - Fill for spurious sinks.
                                - Use stream network to burn hydrology into the topography.
Biocriteria sampling sites      - Generated using longitude, latitude in Ohio EPA ECOS database for ECBP ecoregion, in the period 1989-1995.
                                - Snap to optimum location on the nearest stream.
Spatial unit                    - Derived for water quality sampling sites using DEM, point locations, and GIS functions and algorithms.
                                - Generate using multiple points for spatially independent localsheds.
                                - Generate separately for each point, and then truncate, for equal-scale localsheds.
Streams                         - From USGS 1:24,000 DLG hydrography files in 7.5-min quadrangles.
                                - From 1960s. Resolution is about 10 m.
                                - Each arc is manually analyzed for direction and connectivity, and flipped if necessary to produce consistent topology for hydrologic functions.
Land use                        - From NLCD, originally classified from 30 m resolution Landsat TM images from 1988 – 1994, with 23 Anderson Level II categories.
                                - Accuracy 58% for Level II, and 85% for Level I categories.
Roads                           - From USGS 1:24,000 DLG transportation files in 7.5-min quadrangles.
                                - From 1960s. Resolution is about 10 m.
Population and Housing Units    - From Bureau of Census TIGER/Line95 files and block-level Census 1990 database in STF1B files.
Sinuosity                       - Derived programmatically from streams network.
                                - From 1960s.
                                - Unknown accuracy.
Slopes                          - Derived from DEMs using GIS.
                                - 30 m x 30 m cell resolution.

Table 4.8: Summary of database layers and sources.

99 Using TOPOGRID tool in ArcInfo GRID module.

100 On average, the 1:24,000 DLG data for Ohio is 37 years old (Hickman, 2002, personal communication).

Figure 4.19: Major basic data layers for each study area unit.

4.11 DISCUSSION

The detailed description of data suggests that there may be multiple sources of uncertainty regarding the datasets. Ecoregions may have residual background noise due to heterogeneity detectable only at the subecoregional scale. The boundary of hydrologically contributing areas may depend on the resolution and accuracy of the DEM. DEM extraction from contour lines involves interpolation of elevation values. The accuracy of the DEM may depend on: the accuracy of the base contour data, the selected grid cell size, the accuracy of the streams data used to burn streams, the interpolation of elevation values across the boundary of regions mosaicked together, the values of various parameters used in the process, and common human errors during manual processing of digital data.

Geology data has not been incorporated in the final database because of the unavailability of fine-scale data (e.g., 1:24,000 or better) for the whole study area spanning multiple counties. It is determined that coarse-scale data (e.g., 1:250,000 scale STATSGO soils) may not provide the desired level of variability in the small-size spatial unit of this research.

In the present scheme, patterns and inter-relationships between variables may be obscured due to the noise generated by data from different time-periods. Note that, except land use/cover and demographic data, all other data, on average, are directly or indirectly from the 1960s.

The NLCD dataset represents land use and land cover conditions from the period 1988 - 1994. It is not known whether this is a suitable time period for the land use and land cover impacts to manifest in terms of quantifiable changes in habitat quality in 1992-1994. Neither the habitat data nor the land use and land cover data are available for the whole study area for any single year, as suggested in Figures 4.14 and 4.15. The satellite images used in the NLCD land cover data cover a range of 1988 - 1994, and changes that have taken place across the landscape between the land use image snapshots may not have been captured accurately. According to Allan et al. (1997), transitional land may show up as forest or woody land use because it may be in a temporary phase between past agricultural and future urban use.

The results of statistical modeling presented in the next chapter must be interpreted in light of the assumptions and limitations of the methodology (e.g., discussed in Chapter 3) and the database (e.g., discussed here).

Computers are useless – they can only give you answers. - Pablo Picasso


CHAPTER 5

REGRESSION ANALYSIS

There is nothing remarkable about it. All one has to do is to hit the right keys at the right time, and the instrument plays itself. - Johann Sebastian Bach

This chapter explains the regression models built upon the foundation laid in the previous chapters. The context of this research is reviewed in Chapter 2, the conceptual model and methodology are studied in Chapter 3, and the attributes of the GIS database are explored for patterns and relationships with stream habitat quality in Chapter 4.

Linear regression results are analyzed for both the spatially independent and the equal-scale localsheds. To keep the interpretations simple and intuitive, transformation of variables has not been pursued. Considering the relatively large sample sizes, non-normality of explanatory variables is probably not a major problem (Lewis-Beck, 1993, p. 22; Hair et al., 1998, p. 605; Fox, 1993, p. 282).101 Many models are built and studied using SPSS at each stage. However, only a few are discussed here. Models are selected based on the practical and statistical significance (i.e., at the 0.05 level) of the explanatory variables, the value of the coefficient of determination (i.e., R2), the multicollinearity between the variables, and the value of the standard error of the estimate.102 The regression statistics are explored in the context of the major assumptions regarding the inference of the results of linear regression from a sample to the population, as mentioned in Chapter 3.103 The explanatory variables in the model are evaluated for multicollinearity using the variance inflation factor (VIF).104

101 By the central-limit theorem, the distribution of a partial slope coefficient approaches normality as the sample size increases. Hence, the assumption regarding the normality of the error-term may be ignored, as the error-term may be conceptually equivalent to the set of missing independent variables (Berry and Feldman, 1993, p. 161).

In Chapter 3, the following general model is specified:

\[ H^{S} = \beta_{0} + (\beta_{LU} \cdot LU_{R}) + (\beta_{LC} \cdot LC_{R}) + (\beta_{GM} \cdot GM_{R}) + (\beta_{SS} \cdot SS) + u \tag{5.1} \]

where $H^{S}$ is the true habitat quality at a fixed location on a stream with a contributing area of scale $S$; $\beta_{i}$ is the true partial regression coefficient, or the strength of the impact of factor $i$ while simultaneously controlling for all the other stressor factors; $LU$ is a vector of land use factors (e.g., agriculture, residential); $LC$ is the vector of land cover factors (e.g., forest); $GM$ is the vector of geomorphology factors (e.g., watershed valley slope); $SS$ is the vector of stream-specific factors (e.g., sinuosity, flow); $d$ is the flow length (i.e., the network distance along the path of flow from the unit cell to the stream habitat sampling site); subscript $R$ denotes whether the landscape factor is measured at the riparian scale or over the contributing area; and $u$ is the vector of missing, or unknown, factors in the model.

102 Due to the sheer number of variables in the database, the ‘stepwise’ regression method of SPSS is sometimes used as a starting point (not blindly), to gradually include variables that add significantly to the overall model. At each step, existing variables may become eligible for removal from the model. In this research, an entry probability of 0.01 and a removal probability of 0.05 are used. Johnson et al. (1997) used a similar technique, the MAXR method, available in SAS software.

103 Potential transformations are explored in residual-based plots, not in scatter plots of raw data, because residual-based plots depict the partial stressor-response relationship in the presence of, and controlling for, the other stressor variables (Fox, 1993, p. 296). Scatterplots, on the other hand, present the marginal relationship between the dependent and a single independent variable.

104 Tolerance is the inverse of VIF. Tolerance is the proportion of variability of an independent variable not explained by the other independent variables, when it is regressed against the other variables (Hair et al., 1998, p. 193). A VIF of 5.0 corresponds to a tolerance of 0.20, indicating high multicollinearity, as about 80% of the variability of a variable is explained by other independent variables (Hair et al., 1998, p. 193).
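The relationship between VIF and tolerance used in the multicollinearity screening can be checked with a few lines of arithmetic. The sketch below relies on the standard identities tolerance = 1/VIF and R² = 1 - tolerance, and applies them to the VIF values reported in Table 5.1; the function names and the choice of 5.0 as a screening value are illustrative assumptions.

```python
def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def r_squared_from_vif(vif):
    """Share of a predictor's variance explained by the other predictors."""
    return 1.0 - 1.0 / vif


# VIF values reported for the model in Table 5.1.
table_5_1_vif = {
    "R90DFOR": 2.249,
    "R30ROWC": 2.951,
    "R30LRES": 1.530,
    "WSAREA": 1.379,
    "WSSINMAX": 1.258,
}

if __name__ == "__main__":
    for name, vif in table_5_1_vif.items():
        print(f"{name:9s} VIF={vif:.3f} "
              f"tolerance={tolerance_from_vif(vif):.3f} "
              f"R2 with other predictors={r_squared_from_vif(vif):.3f}")
```

Under these identities, the largest VIF in Table 5.1 (2.951 for R30ROWC) implies that roughly two-thirds of that variable's variance is shared with the other predictors, which is below the 80% implied by a VIF of 5.0.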

The model is estimated using an Ordinary Least Squares approach:

\[ H_{E}^{S} = b_{0} + (b_{LU} \cdot LU_{R}) + (b_{LC} \cdot LC_{R}) + (b_{GM} \cdot GM_{R}) + (b_{SS} \cdot SS) \tag{5.2} \]

where $H_{E}^{S}$ is the estimated habitat quality at a fixed location on a stream with a contributing area of scale $S$; $b_{i}$ is the estimated partial regression coefficient, or the strength of the impact of factor $i$ while simultaneously controlling for all the other stressor factors.
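For readers unfamiliar with Ordinary Least Squares, the sketch below estimates the coefficients of a model of the form of Equation 5.2 by solving the normal equations with Gaussian elimination. It is a teaching sketch in plain Python, not the SPSS procedure used in this research, and the two predictors and the response values are synthetic numbers constructed for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfect multicollinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Return [b0, b1, ...] minimizing the sum of squared residuals.

    rows holds one sequence of predictor values per observation; a constant
    term is added automatically.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: (riparian forest %, riparian row crop %) per site, with a
# response built exactly as 60 + 0.3 * forest - 0.4 * row_crop.
sites = [(10, 50), (20, 40), (30, 45), (40, 20), (50, 10), (60, 30)]
response = [60 + 0.3 * f - 0.4 * r for f, r in sites]
b0, b_forest, b_rowcrop = ols(sites, response)
```

Because the synthetic response contains no error term, the estimates recover the generating coefficients; with real data the residual $u$ is nonzero and the estimates carry sampling error.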

Two different types of spatial units are derived using GIS, as described in detail in Chapter 3. The database assembled for these spatial units consists of various landscape, anthropogenic, and geomorphological variables, as described in Chapter 4. The relationship of these stressor variables with QHEI is studied in the empirical models presented below.

5.1 MODELS FOR SPATIALLY INDEPENDENT LOCALSHEDS

It may be recalled that spatially independent localsheds are derived using GIS and delineate watersheds up to the next upstream QHEI sampling location, which becomes the mouth for another spatially independent localshed, and so on.

The results from the model in Table 5.1 highlight many interesting issues. First, localshed area (i.e., WSAREA) is statistically significant in explaining stream habitat quality. Also, QHEI seems to increase with average localshed size. Specifically, controlling for all other independent variables (i.e., keeping them in the model, but at a constant value), with an increase in the localshed area of about 10,000 acres (i.e., about 16 square miles), the QHEI score, on average, may be expected to increase by 9 points.

This can be explained on account of the close relationship between watershed area and stream order. Lower localshed areas may be a proxy for headwaters, which are generally more susceptible to ecological stresses. Additionally, they may even be localsheds on higher order streams, but may have been delineated as small owing to the higher sampling density around areas of known impacts or urbanization.

Variable     Unstandardized   Standardized   t statistic   p value   VIF
             coefficient      coefficient
Constant     67.5                            16.266        0.000
R90DFOR      0.256            0.200          3.002         0.003     2.249
R30ROWC      -0.322           -0.452         -5.916        0.000     2.951
R30LRES      -0.443           -0.225         -4.097        0.000     1.530
WSAREA       0.0009           0.243          4.648         0.000     1.379
WSSINMAX     4.170            0.084          1.679         0.094     1.258

F = 32.242   Prob. = 0.000   df = 349   R2 = 0.319   Adj. R2 = 0.309   Std. Error of Estimate = 15.117

Table 5.1: Linear regression for independent localsheds using land use, land cover, and localshed area and maximum reach sinuosity variables.

Second, deciduous forest land cover in 90 m strips (i.e., R90DFOR) on each side of the stream is significantly and positively related to habitat quality. As discussed in Chapters 2 and 3, deciduous forests in the riparian zone provide instream cover, stabilize the soil, and reduce overall sedimentation-related impacts.

Third, both row crops (i.e., R30ROWC) and low-density residential land use (i.e., R30LRES) in the 30 m riparian strip have a statistically significant adverse impact on stream habitat. Row crops, especially in the areas close to the stream channels, cause increased sedimentation. Low-density residential use (e.g., single-family housing) in the riparian zone probably has an adverse effect due to deforestation and increased impervious surface (e.g., roof-tops, pavements, roads, parking lots).

Fourth, as indicated by the standardized coefficients, riparian row crops is the stressor with the strongest impact on QHEI in this model. For instance, keeping other variables constant, an increase of 10 percentage points in the share of the riparian area in row crops land use is associated with an average drop in the QHEI score of more than 3 points.
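The marginal effects quoted for this model follow directly from the unstandardized coefficients in Table 5.1. The sketch below reproduces that arithmetic; the coefficient values are from the table, while the function names and the example site values are hypothetical.

```python
# Unstandardized coefficients from Table 5.1.
coef = {
    "Constant": 67.5,
    "R90DFOR": 0.256,    # per percentage point of 90 m riparian forest
    "R30ROWC": -0.322,   # per percentage point of 30 m riparian row crops
    "R30LRES": -0.443,   # per percentage point of 30 m riparian residential
    "WSAREA": 0.0009,    # per acre of localshed area
    "WSSINMAX": 4.170,   # per unit of maximum reach sinuosity
}


def predict_qhei(site):
    """Fitted QHEI for a dict of predictor values (constant added here)."""
    return coef["Constant"] + sum(coef[name] * value
                                  for name, value in site.items())


def effect_of_change(variable, delta):
    """Expected change in QHEI for a change of delta, other variables fixed."""
    return coef[variable] * delta


# Hypothetical site, for illustration only.
site = {"R90DFOR": 30.0, "R30ROWC": 40.0, "R30LRES": 5.0,
        "WSAREA": 5000.0, "WSSINMAX": 1.5}
more_row_crops = dict(site, R30ROWC=site["R30ROWC"] + 10.0)

if __name__ == "__main__":
    print(effect_of_change("R30ROWC", 10))      # about -3.22 QHEI points
    print(effect_of_change("WSAREA", 10_000))   # about +9 QHEI points
    print(predict_qhei(more_row_crops) - predict_qhei(site))
```

In a linear model without interaction terms, the difference between the two fitted values equals the coefficient times the change, regardless of the values chosen for the other predictors.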

Fifth, the reach sinuosity of the most sinuous stream (e.g., WSSINMAX) in the localshed of the QHEI sampling site has a positive effect on the habitat quality. Note that this variable is not significant at the 5% level. However, it is still presented here because its statistical significance is not very low and its theoretical significance warrants a closer look at the resolution of the stream data.

Finally, deciduous forests are more strongly related to QHEI at relatively larger riparian scales (e.g., 90 m), while the adverse impact of agricultural land use (i.e., row crops) is more pronounced at relatively smaller scales (e.g., 30 m).

Overall, the model explains just less than a third of the variation in QHEI scores (e.g., adjusted R2 ~ 31%), but all the explanatory variables, except reach sinuosity, are theoretically and statistically significant in explaining stream habitat quality. The overall model is also significant at the 1% level.
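The summary statistics of Table 5.1 are mutually consistent, which can be verified from R², the sample size, and the number of predictors. The sketch below uses the standard formulas for adjusted R² and the overall F statistic; n = 350 is inferred from the reported total degrees of freedom (349), and the function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic of a regression with k predictors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Values reported in Table 5.1: R2 = 0.319, five predictors, df = 349.
R2, N, K = 0.319, 350, 5

if __name__ == "__main__":
    print(round(adjusted_r2(R2, N, K), 3))   # compare with the reported 0.309
    print(round(f_statistic(R2, N, K), 2))   # compare with the reported 32.242
```

The small gap between the recomputed F and the reported value comes from using R² rounded to three decimals.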

A pattern (i.e., QHEI outliers are along the ridge separating basins draining into the Ohio River and Great Lakes) is identified in Chapter 4 when negative QHEI outliers (i.e., < 25) are mapped. This suggests that the headwater streams flowing to the north and south of the ridge are prone to habitat degradation. Since localshed size is typically positively correlated with Strahler stream order, another model is built incorporating a dummy variable for headwater streams replacing the variable for watershed area.

In Table 5.2, STRAH12 is a dummy variable (coded as 1 if stream order is 1 or 2, and as 0 if order is 3 or more) representing headwater streams. As expected from the review of relevant literature in Chapter 2, the headwater dummy is statistically significant and negatively associated with habitat quality in this model. Other results are along the lines of the previous model. Riparian residential and agricultural use have a significant negative influence on stream habitat. Forested land in the 90 m riparian zone has a positive impact on QHEI scores at the mouth of the localshed. Relative to other stressors in the model, row crops in the 30 m riparian area is the most significant variable (standardized coefficient = -0.282).

Tables 5.3 and 5.4 explore interaction effects between the strongest stressor (riparian row crops) and stream-specific and geomorphological factors (headwaters and localshed scale). Both results are in line with expectations. Riparian forest land cover and high reach sinuosity have a significant positive influence on QHEI, while riparian row crops and low-density residential land use adversely impact the habitat quality. Row crops in 30 m riparian buffers is the strongest stressor variable.

Additionally, Table 5.3 suggests that row crops in headwater riparian zones exert a greater negative influence (coeff. = -0.215 + (-0.069) = -0.284) than those along streams lower in the network hierarchy (coeff. = -0.215). In other words, keeping other stressors constant, the adverse effect of riparian row crops on QHEI in a headwater localshed (i.e., a reach of Strahler order 1 or 2) is, on average, about 32% stronger than in a higher order localshed.

In Table 5.4, the localshed area is transformed into a dummy variable (WSISMED, coded as “0” for areas between about 3.5 and 35 square miles, and “1” for the range 35 – 350 square miles). The results indicate that riparian row crops in smaller (small-scale) localsheds have a significantly greater adverse impact on stream habitat (coeff. = -0.347) than in larger (medium-scale) localsheds (coeff. = -0.347 + 0.110 = -0.237). In other words, controlling for all other variables, row crops in the proximity of streams are about 46% more detrimental to stream habitat in areas with relatively small hydrologically contributing areas (i.e., below approximately 35 square miles).
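The interaction arithmetic for Tables 5.3 and 5.4 can be reproduced with a short sketch. The coefficients are copied from the two tables. The function names `conditional_slope` and `relative_strength` are hypothetical helpers introduced here only for illustration and are not part of the original analysis.

```python
def conditional_slope(main_coeff, interaction_coeff, dummy):
    """Slope of QHEI on riparian row crops when the dummy variable is 0 or 1."""
    return main_coeff + interaction_coeff * dummy


def relative_strength(slope_a, slope_b):
    """Fraction by which slope_a is larger in magnitude than slope_b."""
    return abs(slope_a) / abs(slope_b) - 1.0


# Table 5.3: R30ROWC = -0.215, STRAH12*R30ROWC = -0.069
headwater = conditional_slope(-0.215, -0.069, dummy=1)
higher_order = conditional_slope(-0.215, -0.069, dummy=0)

# Table 5.4: R30ROWC = -0.347, WSISMED*R30ROWC = 0.110
small_scale = conditional_slope(-0.347, 0.110, dummy=0)
medium_scale = conditional_slope(-0.347, 0.110, dummy=1)

print(round(headwater, 3), round(relative_strength(headwater, higher_order), 2))
print(round(medium_scale, 3), round(relative_strength(small_scale, medium_scale), 2))
```

The two ratios compare slopes, that is, the change in QHEI per percentage point of riparian row crops, rather than the QHEI scores themselves.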

The model in Table 5.5 is an extension of the model in Table 5.1. The results from Table 5.5 again show familiar trends. QHEI has a significant positive association with riparian forest land cover and maximum reach sinuosity, and riparian row crops and low-density residential land use have a significant negative impact on stream habitat quality. Larger localsheds seem to have a significant beneficial impact on QHEI (on average, a medium-scale localshed in this study tends to have a QHEI score more than 6 points higher than a small-scale localshed).

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  68.413                       15.716     0.000
R90DFOR                    0.307          0.240         3.487     0.001   2.210
R30LRES                   -0.308         -0.157        -2.714     0.007   1.551
R30ROWC                   -0.201         -0.282        -3.717     0.000   2.673
STRAH12                   -4.020         -0.110        -2.252     0.025   1.100
F = 29.936   Prob. = 0.000   df = 349   R2 = 0.258   Adj. R2 = 0.249   Std. Error of Estimate = 15.762

Table 5.2: Linear regression for independent localsheds using land use, land cover, and stream order.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  58.817                       11.512     0.000
R90DFOR                    0.262          0.205         2.990     0.003   2.257
R30LRES                   -0.361         -0.184        -3.280     0.001   1.508
R30ROWC                   -0.215         -0.302        -3.696     0.000   3.204
WSSINMAX                   7.441          0.149         3.061     0.002   1.147
STRAH12*R30ROWC           -0.069         -0.112        -1.984     0.048   1.544
F = 27.359   Prob. = 0.000   df = 349   R2 = 0.285   Adj. R2 = 0.274   Std. Error of Estimate = 15.496

Table 5.3: Linear regression for independent localsheds using land use, land cover, and reach sinuosity, with an interaction between headwater streams and riparian row crops.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  60.768                       11.699     0.000
R90DFOR                    0.269          0.210         3.094     0.002   2.246
R30LRES                   -0.376         -0.191        -3.441     0.001   1.502
R30ROWC                   -0.347         -0.488        -5.499     0.000   3.820
WSSINMAX                   6.620          0.133         2.693     0.007   1.184
WSISMED*R30ROWC            0.110          0.184         2.696     0.007   2.259
F = 28.280   Prob. = 0.000   df = 349   R2 = 0.291   Adj. R2 = 0.281   Std. Error of Estimate = 15.423

Table 5.4: Linear regression for independent localsheds using land use, land cover, and reach sinuosity, with an interaction between larger localsheds and riparian row crops.

In Table 5.6, a model is specified just for small-scale independent localsheds. Again, riparian row crops is the strongest stressor, and its influence is stronger in headwater reaches. Riparian low-density residential land use has a significant negative impact on QHEI. A new variable, WSSL60 (percentage of localshed area in slopes above 6%), has a positive impact on QHEI in small-scale localsheds. Higher slopes are typically associated with increased potential for erosion and sedimentation. However, in the remarkably flat ECBP ecoregion, relatively higher slopes (i.e., above 6%) might be associated with a better gradient of streambed particles, as well as increased potential for instream cover.105 The variables also seem to have reasonable VIF values, which indicates that multicollinearity is probably not a serious problem.106 This model explains a relatively larger share of the variation in QHEI (adjusted R2 = 43%), with improved precision (standard error of estimate = 13.91).

Next, the Chow test is employed to test whether there is a structural difference in the habitat-stressor relationship at different spatial scales (after Gujarati, 1995, p.263):

Larger units:   H^{Large} = \alpha_0 + \alpha_{ROWC} \cdot ROWC_{R30} + \alpha_{LRES} \cdot LRES_{R30} + u    (5.3)

Smaller units:  H^{Small} = \beta_0 + \beta_{ROWC} \cdot ROWC_{R30} + \beta_{LRES} \cdot LRES_{R30} + u    (5.4)

The test statistic is based on the residual sums of squares from three models (small, large, and combined scales). It is significant at the 5% level, indicating that the signals from habitat processes may be interpreted more clearly at different scales.107 In the next set of regression models, the heterogeneity on account of the scale of hydrologically contributing areas is experimentally controlled using GIS-derived scales.
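As a minimal sketch of how a Chow statistic of this kind is computed from three residual sums of squares, the function below uses k = 3 parameters per equation (an intercept and two slopes, as in Equations 5.3 and 5.4). The residual sums of squares and the split of the 350 cases into 225 larger and 125 smaller units are hypothetical values chosen only to illustrate the calculation; they are not reported in this study.

```python
def chow_f(rss_pooled, rss_large, rss_small, n_large, n_small, k):
    """Chow F statistic and its degrees of freedom.

    rss_pooled: residual sum of squares of the combined-scale regression
    rss_large, rss_small: residual sums of squares of the separate regressions
    k: number of estimated parameters per regression (intercept included)
    """
    rss_separate = rss_large + rss_small
    df_num = k
    df_den = n_large + n_small - 2 * k
    f_stat = ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)
    return f_stat, df_num, df_den


# Hypothetical inputs (assumed for illustration only)
f_stat, df_num, df_den = chow_f(
    rss_pooled=88000.0, rss_large=55000.0, rss_small=27000.0,
    n_large=225, n_small=125, k=3,
)
print(round(f_stat, 2), df_num, df_den)
```

With these assumed inputs the degrees of freedom come out as (3, 344), the same as in the test reported in the footnote, and the statistic would be compared against the 5% critical value of 2.64 quoted there.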

105 An identical model with WSSL01 (i.e., <1% slopes) explains 42% of the variation in QHEI, but with a p value of 0.082 (not significant at 5% level) for the partial regression coefficient for WSSL01 of –0.116.
106 The VIF numbers are typically higher for interaction terms, and variables involved in interactions.
107 [F(3, 344) = 7.8] > [F(critical-at-5%) = 2.64].

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  58.906                       11.771     0.000
R90DFOR                    0.275          0.216         3.188     0.002   2.244
R30ROWC                   -0.290         -0.407        -5.340     0.000   2.848
R30LRES                   -0.390         -0.199        -3.590     0.000   1.505
WSISMED                    6.536          0.172         3.364     0.001   1.290
WSSINMAX                   5.656          0.114         2.271     0.024   1.229
F = 29.400   Prob. = 0.000   df = 349   R2 = 0.299   Adj. R2 = 0.289   Std. Error of Estimate = 15.335

Table 5.5: Linear regression for independent localsheds using land use, land cover, and reach sinuosity, and localshed area as the dummy variable.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  72.974                       23.715     0.000
R30ROWC                   -0.184         -0.253        -2.102     0.038   3.156
R30LRES                   -0.577         -0.330        -4.566     0.000   1.138
WSSL60                     0.336          0.184         2.406     0.018   1.278
STRAH12*R30ROWC           -0.204         -0.313        -2.714     0.008   2.899
F = 24.448   Prob. = 0.000   df = 124   R2 = 0.449   Adj. R2 = 0.431   Std. Error of Estimate = 13.910

Table 5.6: Linear regression for small-scale independent localsheds using land use, land cover, and localshed area and maximum sinuosity variables.

5.2 MODELS FOR EQUAL-SCALE LOCALSHEDS

In Chapter 3 a new concept is introduced, which suggests that we can experimentally control for the variation in area by using GIS to derive hydrologically contributing areas of a custom-defined scale (at least above a certain threshold). Such watersheds are called equal-scale localsheds in this study.

For comparison, a model identical to that in Table 5.2 is built and its partial regression coefficients are estimated. Table 5.7 shows similar signs and statistical significance for the variables, as compared to the identical model in the sample of spatially-independent localsheds. The explanatory strength of both riparian row crops and headwater reaches in the model has increased considerably, at the expense of the relative strength of low-density residential use.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  67.854                       30.103     0.000
R90DFOR                    0.227          0.240         5.221     0.000   1.722
R30LRES                   -0.201         -0.091        -2.163     0.031   1.447
R30ROWC                   -0.265         -0.359        -7.637     0.000   1.809
STRAH12                   -5.796         -0.147        -3.909     0.000   1.158
F = 69.747   Prob. = 0.000   df = 542   R2 = 0.341   Adj. R2 = 0.337   Std. Error of Estimate = 14.113

Table 5.7: Linear regression for equal-scale localsheds using land use, land cover, and localshed area and maximum sinuosity variables.

Overall, the model seems to have gained from the GIS-based experiment of controlling for the scale of the spatial unit. There is a 36% improvement in the proportion of variation in QHEI explained by the explanatory variables in the model (adjusted R2 = 34% instead of 25%), and a 12% improvement in prediction error (standard error of the estimate = 14.113 instead of 15.762). It is evident from the VIF values that there is less multicollinearity between the independent variables in this model than in the identical model in the earlier section.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  74.375                       32.328     0.000
R30ROWC                   -0.261         -0.354        -8.082     0.000   1.617
R90CIT                    -0.232         -0.105        -2.704     0.007   1.274
WSDFOR                     0.260          0.187         4.308     0.000   1.583
WSSL01                    -0.135         -0.153        -3.858     0.000   1.320
STRAH12                   -6.209         -0.158        -4.457     0.000   1.056
F = 61.498   Prob. = 0.000   df = 542   R2 = 0.364   Adj. R2 = 0.358   Std. Error of Estimate = 13.881

Table 5.8: Linear regression for equal-scale localsheds using land use, land cover, and watershed slopes and headwater variables.

Exploring further with the objective of finding other factors with significant impacts on stream habitat quality, a model is built to estimate the impacts of slopes and riparian commercial-industrial-transportation land use. The results of this model are shown in Table 5.8. Riparian row crops are the strongest stressor in explaining stream habitat. At the localshed scale, deciduous forest cover (WSDFOR, coeff. = 0.260) seems to be a significant factor with a positive impact on stream habitat. Note that even though commercial-industrial-transportation land use in a 90 m riparian strip has a significant adverse impact (coeff. = -0.232, p = 0.007) on QHEI, it is the weakest stressor in the model, as implied by its relatively low standardized coefficient and significance level. The model also indicates that the greater the localshed area in flat slopes (WSSL01, the percentage of area in slopes below 1%), the greater the adverse impact on stream habitat (coeff. = -0.135). This may be because flatter slopes are an indicator of the level of urbanization, or imperviousness, which tends to cause greater surface runoff into streams. Even where they are not impervious, flat lands may be associated with a greater imbalance in the gradient of particles in the streambed (e.g., more fining, less gravel, less instream cover). As expected, headwater streams (STRAH12) are again significantly (p = 0.000) and negatively (coeff. = -6.209) associated with QHEI. Overall, the predicted values of QHEI fall in the range 34.5 to 90, as compared to the range 8.5 to 100 in the original QHEI data. This implies that the model needs to be improved for areas with very poor stream habitat quality.
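The fitted equation in Table 5.8 can be applied directly to a localshed, as in the sketch below. The coefficients are those listed in Table 5.8, and the two example inputs are the rounded attribute values listed for localsheds 53 and 153 in Table 5.12. The function name `predict_qhei` is a hypothetical helper, and small differences from the tabulated estimates are to be expected because the inputs are rounded to whole percentages.

```python
# Unstandardized coefficients from Table 5.8
TABLE_5_8 = {
    "Constant": 74.375,
    "R30ROWC": -0.261,   # % row crops in the 30 m riparian strip
    "R90CIT": -0.232,    # % commercial-industrial-transportation, 90 m strip
    "WSDFOR": 0.260,     # % deciduous forest in the localshed
    "WSSL01": -0.135,    # % localshed area in slopes below 1%
    "STRAH12": -6.209,   # headwater dummy (Strahler order 1 or 2)
}


def predict_qhei(r30rowc, r90cit, wsdfor, wssl01, strahler_order):
    """Estimated QHEI for one localshed using the Table 5.8 coefficients."""
    strah12 = 1 if strahler_order in (1, 2) else 0
    return (
        TABLE_5_8["Constant"]
        + TABLE_5_8["R30ROWC"] * r30rowc
        + TABLE_5_8["R90CIT"] * r90cit
        + TABLE_5_8["WSDFOR"] * wsdfor
        + TABLE_5_8["WSSL01"] * wssl01
        + TABLE_5_8["STRAH12"] * strah12
    )


# Localshed 53 (order 1) and localshed 153 (order 7), inputs from Table 5.12
est_53 = predict_qhei(r30rowc=56, r90cit=0, wsdfor=10, wssl01=13, strahler_order=1)
est_153 = predict_qhei(r30rowc=0, r90cit=42, wsdfor=2, wssl01=33, strahler_order=7)
print(round(est_53, 1), round(est_153, 1))
```

Both values fall close to the estimates of 54.5 and 60.6 listed for these localsheds in Table 5.12.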

5.2.1 RESIDUALS

Based on the discussion in Chapter 3, the model is analyzed for validity of regression assumptions. As indicated by Figure 5.1, the error terms in the model seem to be fairly normally distributed. The VIF values show that multicollinearity does not seem to be a serious problem in the model. Neither the plot of squared residuals against the estimated QHEI (after Gujarati, 1995, p.368) in Figure 5.2 nor the plot of studentized residuals against predicted values (after Fox, 1993, p.292) in Figure 5.3 shows any major discernible pattern (e.g., fan-shaped). This indicates homoscedasticity of the residuals.108 The partial regression plots in Figure 5.4 through Figure 5.7 fail to reveal any non-linear patterns (after Fox, 1993, p.296).

108 The plot for squared residuals versus the R90CIT variable in Appendix C, however, does reveal a pattern indicating that the variance of the model errors tends to increase in spatial units with smaller areas in commercial, industrial, or transportation land use.

Many residual diagnostics are evaluated for the model in Table 5.8. These measures are summarized in Table 5.9. For a graphical depiction of these diagnostics, see the figures in Appendix C. After comparing the entries in Table 5.9 for multiple occurrences of cases across residual diagnostics, 19 candidate outliers are identified for deeper scrutiny. These cases appear as candidate influential observations in at least four diagnostics.109

Figure 5.1: Diagnosis for normality of model residuals.

109 Cases 196, 204, 235, 245, 302, 388, 399, and 423 appear four times; 53, 153, 184, 186, 340, and 551 appear five times; and 137, 284, 402, 621, and 652 appear at least six times.

Figure 5.2: Diagnosis of model residuals using squared residuals and predicted values.

Figure 5.3: Diagnosis of model residuals using studentized residuals and adjusted predicted value.


Figure 5.4: Diagnosis of model residuals using partial regression with riparian row crop.

Figure 5.5: Diagnosis of model residuals using partial regression with riparian commercial-industrial-transportation land use.


Figure 5.6: Diagnosis of model residuals using partial regression with localshed forest cover.

Figure 5.7: Diagnosis of model residuals using partial regression with localshed flat slopes.


Columns: Residual diagnostic; Threshold *; Threshold value; Candidate outliers (observations exceeding threshold) #

Studentized residual
    Threshold *: critical t at 5%        Threshold value: +/- 1.96
    Candidate outliers: 53, 245, 302, 308, 340, 388, 399, 423, 428, 529, 546, 551, 564, 566, 602, 621, 652

Leverage (hat-value)
    Threshold *: 2 * (k + 1) / n        Threshold value: 0.022
    Candidate outliers: 22, 137, 138, 144, 153, 161, 166, 171, 175, 186, 187, 196, 204, 209, 220, 232, 235, 241, 280, 284, 331, 339, 340, 341, 347, 402, 440, 513, 533, 537, 604, 623, 627, 629, 631

Mahalanobis distance
    Threshold *: evaluate and sort top ten
    Candidate outliers: 137, 138, 153, 161, 186, 220, 235, 280, 284, 402

Cook’s distance
    Threshold *: 4 / (n – k – 1)        Threshold value: 0.0075
    Candidate outliers: 53, 137, 173, 184, 196, 204, 235, 245, 269, 270, 284, 302, 340, 347, 388, 402, 423, 543, 551, 566, 618, 621, 652

COVRATIO
    Threshold *: 1 +/- [3 * (k + 1) / n]
    Upper: 1.033    Candidate outliers: 22, 137, 138, 144, 153, 161, 171, 175, 179, 186, 187, 209, 220, 280, 284, 331, 339, 341, 440, 513, 537, 629, 631
    Lower: 0.967    Candidate outliers: 399, 529, 540, 551, 564, 621, 652

SDFFIT
    Threshold *: 2 * SQRT[(k + 1) / (n – k – 1)]        Threshold value: +/- 0.211
    Candidate outliers: 53, 137, 173, 184, 196, 204, 235, 245, 269, 270, 284, 302, 340, 347, 388, 402, 423, 540, 551, 566, 618, 621, 652

SDFBETA
    Threshold *: 2 / SQRT(n)        Threshold value: +/- 0.086
    R30ROWC: 53, 76, 77, 109, 169, 208, 246, 287, 323, 357, 402, 423, 428, 431, 447, 475, 495, 502, 529, 552, 572, 599, 602, 607, 618, 621, 623, 624, 602, 650, 655, 701
    R90CIT: 137, 142, 153, 166, 173, 184, 186, 196, 202, 204, 236, 284, 388, 402, 415, 466, 529, 533, 604, 627, 631
    WSDFOR: 54, 109, 184, 208, 219, 229, 269, 270, 288, 340, 347, 399, 402, 433, 456, 516, 602, 627, 652, 696
    WSSL01: 52, 53, 76, 77, 184, 229, 235, 245, 297, 302, 347, 389, 399, 402, 407, 413, 425, 491, 537, 543, 548, 551, 552, 586, 590, 621, 652, 696

* n is the sample size (543), k is the number of independent variables (5).
# ID for the localshed that can be used to link with other tables. Extreme observations are shown in bold.

Table 5.9: Summary of diagnostic tests to identify influential observations among regression residuals.110
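The threshold values in Table 5.9 follow from the sample size and the number of independent variables, and can be checked with the sketch below. The function name `influence_thresholds` is a hypothetical helper. The SDFFIT denominator is assumed here to be (n – k – 1), since the printed formula is typographically incomplete; this assumption reproduces the tabulated value of 0.211.

```python
import math


def influence_thresholds(n, k):
    """Rule-of-thumb cut-offs for influence diagnostics.

    n: sample size; k: number of independent variables.
    """
    return {
        "leverage": 2 * (k + 1) / n,
        "cooks_distance": 4 / (n - k - 1),
        "covratio_upper": 1 + 3 * (k + 1) / n,
        "covratio_lower": 1 - 3 * (k + 1) / n,
        # Assumed denominator (n - k - 1); see the note above
        "sdffit": 2 * math.sqrt((k + 1) / (n - k - 1)),
        "sdfbeta": 2 / math.sqrt(n),
    }


thresholds = influence_thresholds(n=543, k=5)
for name, value in thresholds.items():
    print(f"{name}: {value:.4f}")
```

A case would then be flagged for a given diagnostic when its value exceeds the corresponding cut-off in absolute terms (or falls outside the two COVRATIO bounds).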

110 Inspired by Hair et al. (1998, p.236).

Variable        Mean      Min.      Max.      Std.
QHEI           62.89      8.50    100.00     24.52
SUBSTRAT       13.29      1.00     20.00      5.84
COVER          11.42      1.00     20.00      6.01
CHANNEL        12.50      0.00     20.00      5.64
RIPARIAN        5.61      1.00     10.00      2.23
POOL            8.21      0.00     12.00      3.72
RIFFLE          3.34     -1.00      8.00      2.61
GRAD_S          8.53      4.00     10.00      2.29
YEAR              93        89        95         2
WSAREA           346       119       585       129
STRLRDLG           3         1         7         2
WSSTRFRQ           6         1        18         5
WSSINAVG        1.15      1.01      1.42      0.12
WSPSFREQ           3         0        40         9
WSPOPRDN         792         2     2,953       985
WSHUPRDN         344         0     1,210       456
WSRDEN          8.35      2.39     24.93      6.40
WSSL01         40.30     12.40     87.19     24.64
WSSL12         23.78      8.03     43.00      9.44
WSSL23         12.21      2.54     18.08      5.42
WSSL34          6.94      0.00     12.73      3.94
WSSL45          4.93      0.00      9.72      3.63
WSSL56          3.20      0.00      8.57      2.79
WSSL60          8.64      0.00     34.60     10.76
WSLRES          9.57      0.00     43.44     13.19
WSHRES          2.68      0.00     12.25      4.12
WSCIT          16.19      0.00     73.80     22.15
WSMINE          0.00      0.00      0.00      0.00
WSTSNT          0.00      0.00      0.00      0.00
WSDFOR         18.64      1.59     75.56     20.83
WSEFOR          0.14      0.00      1.26      0.28
WSMFOR          0.02      0.00      0.28      0.06
WSHAY          10.73      0.00     32.31      8.78
WSROWC         35.40      0.43     84.36     31.44
WSLAWN          4.64      0.00     32.60      8.00
WSWWET          0.64      0.00      3.64      0.87
WSHWET          0.15      0.00      0.45      0.17
R30LRES         7.78      0.00     30.34     10.00
R30DFOR        30.32      1.46     90.74     23.79
R30HAY          6.38      0.00     16.15      4.76
R30ROWC        28.08      0.00     81.61     25.54
R90CIT         15.33      0.00     51.23     19.74
R90DFOR        25.59      2.68     80.32     22.59

Table 5.10: Descriptive statistics for 19 residuals from the model in Table 5.8 for equal-scale localsheds.


ID    Basin                        River                          River-mile   AQLIFEDS111
53    Great Miami River            Sevenmile Creek                       9.2   WWH
137   Great Miami River            Little Beaver Creek                   4.7   WWH
153   Great Miami River            Great Miami River                    79.9   WWH
184   Great Miami River            Hebble Creek                          0.1   MWH-C
186   Great Miami River            Hebble Creek                          1.6   MWH-C
196   Great Miami River            Dry Run                               1.3   WWH
204   Great Miami River            Dry Run                               1.8   WWH
235   Great Miami River            Mad River                            13.9   WWH
245   Great Miami River            Mad River                            17.3   WWH
284   Great Miami River            Buck Creek                            4.7   WWH
302   Big Darby Creek              Hamilton Ditch                        2.1   MWH-C
340   Big Walnut Creek             Rocky Fork Big Walnut Creek           3.1   EWH
388   Big Darby Creek              Big Darby Creek                      51.9   EWH
399   Stillwater River             Greenville Creek                      5.0   EWH
402   Big Walnut Creek             Noble Run (Spring Hollow)             0.1   WWH
423   Great Miami River            Kings Creek                           0.8   CWH
551   Bokes Creek / Mill Creek     Big Swale                             1.7   LRW
621   Bokes Creek / Mill Creek     Scioto River                        220.8   WWH
652   Auglaize / Blanchard River   Pike Run                              8.1   MWH-C

Table 5.11: Location attributes for 19 residuals from the model in Table 5.8 for equal-scale localsheds.

ID     QHEI   Estimate   Residual   Order   Population   R30ROWC   R90CIT   WSDFOR   WSSL01
53     85.0       54.5       30.5       1          119        56        0       10       13
137    67.5       54.0       13.5       1          121         4       47        2       18
153    71.5       60.6       10.9       7        1,176         0       42        2       33
184    79.5       55.3       24.2       2          273        38       10       28       59
186    42.5       51.6       -9.1       2          199         5       51        2       30
196    74.0       56.4       17.6       2          954        12       30        7       25
204    36.0       59.1      -23.1       2        1,206         7       29       12       27
235    88.0       66.7       21.3       6          267        31        0       47       87
245    85.5       56.9       28.6       6            1        36        0       13       85
284    55.0       65.3      -10.3       4        2,077         2       51       19       12
302    74.5       45.6       28.9       1          154        58        6       12       68
340    58.0       87.1      -29.1       3          177         1        0       60       19
388    81.5       53.0       28.5       2          829        19       18        3       51
399   100.0       62.3       37.7       4           29        43        0        9       24
402    66.0       90.2      -24.2       3           22         1        6       76       16
423    68.5       40.9       27.6       1           16        82        0        4       52
551     8.5       46.1      -37.6       2           19        53        0        6       74
621    25.0       58.5      -33.5       4            2        65        0       15       22
652    28.5       62.9      -34.4       2          945        22        0       27       49

Table 5.12: Regression estimates and some database attributes for 19 residuals from the model in Table 5.8 for equal-scale localsheds.

111 Aquatic Life Use Designation for the waterbody in Ohio. EWH: Exceptional Warmwater Habitat; WWH: Warmwater Habitat; MWH: Modified Warmwater Habitat; CWH: Coldwater Habitat; and LRW: Limited Resource Water. MWH-C probably refers to a change (upgrade or downgrade) in the waterbody’s use designation to MWH.

Figure 5.8: Geographic distribution of model residuals, urban areas, and Mad River Basin.

The descriptive statistics for these 19 residuals are shown in Table 5.10. The average values of QHEI and its component metrics, localshed area, stream order, reach sinuosity, and most land use and land cover variables are close to the corresponding average values for the equal-scale localsheds presented in Chapter 4.

However, some interesting trends seem to appear. Even though localshed-wide forest land cover is in line with the overall average (WSDFOR ~19% vs. ~18%), the 30 m riparian areas seem to be less forested (R30DFOR ~30% vs. ~37%). Localshed-wide row crops are also present to a lesser extent than typical (WSROWC ~35% vs. ~47%). On the other hand, these residual localsheds have a higher population density (WSPOPRDN ~792 people/sq. mile vs. 464 people/sq. mile), higher road density (WSRDEN ~8% vs. ~5%), a greater proportion of the localshed in flat slopes (WSSL01 ~40% vs. ~32%), and a higher share of the localshed in low-density residential (WSLRES ~10% vs. ~6%), high-density residential (WSHRES ~2.7% vs. ~1.5%), and commercial-industrial-transportation land use (WSCIT ~16% vs. ~4%). The whole picture, at this stage, points to a slightly higher level of urbanization in the residuals. This should not come as a surprise, since the study area (i.e., the ECBP ecoregion) itself is not highly representative of urban land use.

Tables 5.11 and 5.12 afford a closer look at the location and data attributes of these residuals, respectively. Interestingly, 11 of the 19 residuals (~60%) are located in the Great Miami River Basin, which discharges into the area near Cincinnati. One residual (ID = 423) is designated as CWH (Coldwater Habitat), and another (ID = 551, QHEI = 8.5) is designated as LRW (Limited Resource Water). Neither of these aquatic life use designations is typical of this region. The streams with the minimum (QHEI = 8.5) and the maximum (QHEI = 100) QHEI scores also appear, somewhat unsurprisingly, in this set.

A look at Table 5.12 indicates that stream habitat quality has been under-estimated (Residual > 0) by the regression model in all but one of the localsheds with a majority of area in flat slopes (WSSL01 > 50%). Figure 5.8 shows that many of these positive residuals are also clustered around the Dayton area. Similarly, the habitat quality in both localsheds with a majority of land in deciduous forest cover (WSDFOR > 50%) has been over-estimated (Residual < 0). There probably are other interaction effects between variables (e.g., slope and landscape) that are not yet accounted for.

For largely unknown reasons, the urban area around Dayton seems to have higher QHEI scores than estimated by our model. Seven residuals near Dayton are located in the Mad River Basin. It may be noted that Gordon and Majumder (2000) have identified Mad River Basin as a problem area for similar reasons. As mentioned in Chapter 2, Wang and Yin (1997) also found unexpectedly better water quality in the Mad River area near Dayton. This area is known for high-yielding aquifers (see the discussion on ecoregions in Chapter 4). Overall, this discussion underscores a general lack of complete and detailed understanding of the processes associated with stream habitat, especially around urban areas.

Another model is regressed after removing these 19 residual cases (3.5% of all 543 cases) from the localshed-based data to explore the empirical influence of these cases. The results of this model are presented in Table 5.13. This model explains more (R2 = 40% vs. ~36%) of the variation in QHEI, with a standard error of the estimate of 13.194, as compared to 13.881 with the complete model (i.e., a 5% improvement in the prediction error). As indicated by VIF values, there does not seem to be serious multicollinearity between variables.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  73.923                       32.268     0.000
R30ROWC                   -0.262         -0.361        -8.406     0.000   1.608
R90CIT                    -0.248         -0.097        -2.577     0.010   1.246
WSDFOR                     0.299          0.211         4.888     0.000   1.623
WSSL01                    -0.141         -0.160        -4.058     0.000   1.353
STRAH12                   -6.525         -0.166        -4.782     0.000   1.050
F = 70.820   Prob. = 0.000   df = 523   R2 = 0.406   Adj. R2 = 0.400   Std. Error of Estimate = 13.194

Table 5.13: Linear regression for equal-scale localsheds using land use, land cover, and watershed slopes and headwater variables.

Variable          Unstandardized   Standardized   t statistic   p value     VIF
                     coefficient    coefficient
Constant                  68.575                       36.792     0.000
R30ROWC                   -0.179         -0.247        -4.389     0.000   2.842
R90CIT                    -0.208         -0.081        -2.189     0.029   1.246
WSDFOR                     0.314          0.222         5.420     0.000   1.511
HW*WSSL01                 -0.198         -0.236        -6.407     0.000   1.219
FLT*R30ROWC               -0.104         -0.158        -2.841     0.005   2.794
F = 76.575   Prob. = 0.000   df = 523   R2 = 0.425   Adj. R2 = 0.419   Std. Error of Estimate = 12.981

Table 5.14: Linear regression for equal-scale localsheds using land use, land cover, watershed slopes, and slope-headwater and slope-rowcrop interactions.

The specific estimated coefficients are along the lines of the model in Table 5.8.

The exploration of influential residuals earlier hinted at hidden interactions between the variables. Some of these are incorporated in a model presented in Table 5.14. The model with interactions seems to explain more of the variation than any of the earlier models, and all the variables have the expected association with stream habitat quality.

Row crops in the 30 m riparian strip have a strong adverse effect on QHEI. Additionally, this impact seems to be accentuated in localsheds with flatter than average surface slopes (FLT dummy variable, which is ‘1’ if more than 32% of the localshed is in slopes of less than 1%, else ‘0’). Row crop agricultural use in 30 m riparian buffers is relatively the strongest predictor of stream habitat. Its partial regression coefficient (coeff. = -0.179) suggests that, keeping other variables constant, a 10% increase in the riparian area in row crops land use is associated with an average drop in the QHEI score of almost 2 points. The drop is almost 3 points if the localshed is flatter than average.

Similarly, commercial-industrial-transportation use in 90 m riparian buffers is detrimental to habitat quality, with every 20% increase in such land use amounting to a drop of more than 4 points in QHEI. Localshed-wide deciduous forest land cover seems to be beneficial to stream habitat (coeff. = 0.314). Every 20% decrease in deciduous forest anywhere in the localshed is associated with a drop in QHEI of more than 6 points. Note that, as compared to other variables in the complete model, the influence of commercial-industrial-transportation land use is the weakest.
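The marginal effects quoted for the model in Table 5.14 can be tabulated with a small sketch. The coefficients are copied from Table 5.14. The function `qhei_change` and its argument names are hypothetical and introduced only for illustration; changes are expressed in percentage points of the relevant area.

```python
# Unstandardized coefficients from Table 5.14
R30ROWC = -0.179
R90CIT = -0.208
WSDFOR = 0.314
HW_WSSL01 = -0.198      # applies only in headwater localsheds (HW = 1)
FLT_R30ROWC = -0.104    # applies only in flatter-than-average localsheds (FLT = 1)


def qhei_change(d_r30rowc=0.0, d_r90cit=0.0, d_wsdfor=0.0, d_wssl01=0.0,
                flat=False, headwater=False):
    """Estimated change in QHEI for given changes in the predictors,
    holding all other variables constant."""
    change = R30ROWC * d_r30rowc + R90CIT * d_r90cit + WSDFOR * d_wsdfor
    if flat:
        change += FLT_R30ROWC * d_r30rowc
    if headwater:
        change += HW_WSSL01 * d_wssl01
    return change


print(round(qhei_change(d_r30rowc=10), 2))             # row crops +10%
print(round(qhei_change(d_r30rowc=10, flat=True), 2))  # same, flat localshed
print(round(qhei_change(d_r90cit=20), 2))              # riparian CIT +20%
print(round(qhei_change(d_wsdfor=-20), 2))             # forest cover -20%
```

Because the model is linear, simultaneous changes in several predictors simply add up in this calculation.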

These numbers may seem trivial at first, but in reality the effects may be additive if riparian land is deforested to make way for, say, row crops or commercial-industrial-transportation land use. In such a scenario, a land cover transition affecting only 8 acres of the 30 m riparian area, or 23 acres in the 90 m riparian zone, of flatter localsheds may translate into an almost 10 point decrease in QHEI scores in the localshed.112

In this model, the effect of flatter slopes in higher order streams is not captured. However, in headwater localsheds (HW dummy variable = ‘1’ if Strahler order is 1 or 2, else ‘0’), this model estimates a drop in QHEI of almost 4 points (coeff. = -0.198) with every 20% increase (i.e., 70 acres) in surface area with slope of less than 1%.

As can be expected with interaction terms, interacting variables display slightly higher levels of multicollinearity. Even so, the variables are reasonably independent, with low VIF figures. Note that there is a 17% improvement in the explanatory power of the model (R2 ~ 42% vs. 36%), and a 6.5% improvement in the precision of the predicted habitat quality (standard error of the estimate ~12.98 vs. 13.88). This seems to confirm that the excluded cases (3.5% of the localsheds) are indeed influential observations.

5.3 DISCUSSION

Some sophisticated methods, such as the Box-Cox power transformation method (used, for example, in Johnson et al., 1997), are available to find the best transformation for stressor and response variables in the model.113 However, these methods are not used in this study to keep the specification, interpretation, and application simple.
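For readers unfamiliar with the technique, the sketch below shows the standard form of the Box-Cox power transformation of a positive response value. It is illustrative only and was not applied in this study. The function name `box_cox` is a hypothetical helper, and the rescaled form (y^λ – 1) / λ is the usual textbook definition, which differs from a plain power y^λ only by a linear rescaling and reduces to the natural logarithm as λ approaches 0.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transformation of a single positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Transform a few illustrative (made-up) positive scores with lambda = 0.5
scores = [8.5, 25.0, 62.0, 100.0]
transformed = [box_cox(s, 0.5) for s in scores]
print([round(t, 3) for t in transformed])
```

In practice, λ would be chosen by comparing the fit of the regression across a range of candidate values, which is the step that adds complexity to specification and interpretation.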

112 A 20% change is not a far-fetched number considering that the average localshed area in the 30 m and 90 m riparian zones is only 38 and 116 acres, respectively. A 20% drop in localshed forest cover is related to a drop of more than 6 points in QHEI. Additionally, if it is a relatively flat localshed, and all the land is put to row crops use within 30 m of the stream, then there is an additional drop of more than 4 points (i.e., 20 * [-0.179 + (-0.104)]). Note that there is no implication on the time it will take for the drop to happen.
113 The Box-Cox method is aimed at linearizing the relationship between the stressors and the response by transforming the response variable, such as by using λ in:
    y^{\lambda} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n + u
According to the Box-Tidwell transformations (Lewis-Beck, 1993, p.312):
    y^{\lambda} = \beta_0 + \beta_1 x_1^{\lambda_1} + \dots + \beta_n x_n^{\lambda_n} + u

Spatial autocorrelation may affect these models because of geographical proximity of spatial units, manipulation of data (Gujarati, 1995, p.402), and specification error (e.g., excluded variables, inappropriate functional form). If the error terms are correlated, then estimates are still unbiased, but inferences may be misleading.

The results do not explain the variation in QHEI to an extent that they may confidently be applied for prediction. A significant portion (almost two-thirds in the case of spatially independent localsheds) of the variability in QHEI remains unexplained due to unknown factors. According to Berry and Feldman (1993, p.175), besides the exclusion of theoretically relevant independent variables, measurement error in the variables as well as misspecification of the functional form of the model may also cause low explanation of variation. It is possible to mitigate the effect of measurement error by using appropriate instrument variables, but they may be difficult to find.114
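The instrument-variable idea can be illustrated with a two-stage sketch for a single predictor. Everything in it is hypothetical: the data are synthetic, constructed so that the measurement error in X is uncorrelated with the instrument Z, and the helper names `ols_simple` and `two_stage` are introduced only for this example. No such analysis was carried out in this study.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage(z, x, y):
    """Stage 1: estimate X from Z. Stage 2: regress Y on the estimated X."""
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols_simple(x_hat, y)


# Synthetic data (assumed): true X = 2Z + 1, Y = 3 * true X + 4
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
error = [1.0, -2.0, 1.0, 1.0, -2.0, 1.0]   # measurement error, uncorrelated with Z
x_true = [2.0 * zi + 1.0 for zi in z]
x_measured = [xt + e for xt, e in zip(x_true, error)]
y = [3.0 * xt + 4.0 for xt in x_true]

naive_intercept, naive_slope = ols_simple(x_measured, y)
iv_intercept, iv_slope = two_stage(z, x_measured, y)
print(round(naive_slope, 3), round(iv_slope, 3))
```

In this constructed example the direct regression on the error-laden X understates the slope, while the two-stage estimate recovers the value of 3 used to generate the data.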

The models, however, provide valuable information about some stressors and the scale at which their impacts become significant. Row-crop agriculture is known to be the principal agricultural land use in the ECBP ecoregion (Yoder and Rankin, 1995b, p.273).

Policy controls for row crops in the 30 m riparian zone are probably the strongest elements in protecting stream habitat quality. On the other hand, deciduous forests seem to protect habitat at larger scales, either in wider strips (e.g., 90 m wide on each side), or in the whole localshed (i.e., the contributing area up to a mile of upstream flow-length). Forest land cover, which is strongly correlated with row crop agriculture, may alternatively be interpreted as the absence of row crop agriculture land use in the model (Johnson et al., 1997).

114 In simple terms, if Y is being regressed on X, which is measured with a random error, then an instrument variable, Z, can be used to estimate X first, and then the estimated X is used to estimate Y. Z must be correlated with X but not with the error term, so that it influences Y only through X. This is also called two-stage regression analysis.

Interestingly, in the equal-scale strategy, localshed-wide deciduous forests are effective in ameliorating habitat stresses, while forests in only a 90 m riparian zone were more significant in the spatially-independent strategy. This is not surprising given that the average area within only a 90 m buffer on each side of streams in the spatially-independent localsheds is about 1.6 sq. miles (e.g., 345 acres), which is more than the average area for the whole localshed (0.5 sq. miles) in the equal-scale strategy.

Similarly, maximum localshed reach sinuosity is not significant in any of the models for equal-scale localsheds. This indicates that sinuosity is more influential at a scale that is relatively larger than that used in the equal-scale localsheds (average area = 0.5 sq. miles, maximum flowlength = 1 mile). It may be recalled that the average localshed area in the spatially independent localsheds is almost an order of magnitude higher, at around 8.5 sq. miles.

Higher slopes may be beneficial for stream habitat at smaller scales because of the regional (between-unit removal) scouring and removal of sedimentation out of the small spatial unit, by surface runoff over shorter distances (Marsh, 1997, p.229). This factor is undetectable at coarser scales, where higher slopes are logically associated with local (e.g., within-unit removal) erosion and sedimentation. Flatter terrains may indicate less scouring and more filling of interstitial spaces in the stream substrate (Hill and Platts, 1991). Mecklenburg (1998) also hints at the possible improvement in substrate quality by improved transport of fines due to better channel gradients.115 Low stream gradients, combined with typically low flows in headwater streams, may cause higher retention time for sediments, resulting in habitat degradation (Rankin, 1995, p.194).

115 Note that, as shown in descriptive statistics in Chapter 4 for this study, and for Ohio in general, higher slopes do not typically reach values that may qualify the steep areas as a risk on account of erosion.

The results of the model may be applicable only to the geographical context studied (i.e., the ecoregion). Both the localshed strategies, though useful in providing insight into the interrelationships between habitat, landscape and geomorphology, do not completely cover the expanse of the study area. Issues of scale and heterogeneity may influence any extrapolation because of the difference in ecological forcing functions across geographical contexts (USEPA, 1998).116 Therefore, professional judgment may be required before extrapolating the results to other areas.

The results match well with Richards et al. (1996), whose model explains 50% of the variation in stream habitat quality, although with a much smaller sample (46 vs. more than 500 in this study) and much coarser resolution for riparian width, streams, slope, and land cover data.

The models presented here show that the processes shaping stream habitat work across various scales in a linked and holistic scheme. The localshed-based strategies also demonstrate that scale is important in explaining stream habitat quality, not only in the transverse direction (e.g., varying riparian widths), but also in the longitudinal direction (e.g., varying flow-length, watershed size, or hierarchical position of reach in the overall stream network).

There is something to be said for every error; but whatever may be said for it, the most important thing to be said about it is that it is erroneous. - G.K. Chesterton

116 Ecological forcing functions are “critical abiotic variables that exert a major influence on the structure and function of ecological systems” (USEPA, 1998, p.86).

CHAPTER 6

CONCLUSIONS

When ideas fail, words come in very handy. - Johann Wolfgang von Goethe

Currently, there is no precedent of using biocriteria as a predictive tool. This is due to two major reasons. First, it is a relatively new research domain and very few attempts have been made at modeling ecological health of regional streams. Second, the complexity of assembling the database with the model variables may deter attempts by planners to implement a model. The work by Gordon and Majumder (2000) finds stream habitat to be the most significant factor determining regional variations in IBI. However, some of the explanatory variables (e.g., a composite index of chemical pollutants, and component metrics of QHEI) in the model do not lend themselves to any straightforward planning application. In this context, this research explores the scale-based linkages between habitat quality and natural and anthropogenic stressors, and attempts to present the relationship in a relatively simple model. In the absence of substantially established theory or ‘laws’ governing the behavior of the stream habitats in the watershed-based environmental system relative to unknown influences, empirical modeling based on statistical analysis seems to be a useful first approach.

The results provide an insight into the spatial processes shaping regional stream habitats. Since the spatial processes have stronger impacts at certain scales, this study justifies watershed- and ecoregion-based data collection, and planning and management programs. It also helps in developing an understanding of the feasibility, merits, and limitations of using biological and habitat indicators of water quality in state monitoring, assessment, and permit regulation programs. The models also suggest that rural areas may be investigated at smaller scales because the runoff impacts there may be relatively less easily identifiable than in urban areas.

The discussion in Chapter 4 underscores the fact that results may be strongly influenced by the accuracy and resolution of the database (e.g., DEM, stream network, land use and land cover, demographics, roads).

6.1 POTENTIAL APPLICATIONS

Both Allan et al. (1997) and Jones and Gordon (2000) emphasize the need for, and the technical superiority of, catchment-scale water resource management. However, they also note the overlapping political jurisdictions and the fact that most land use decisions are currently taken at the village, township, or city level with little influence on upstream and downstream events. Interestingly, a localshed-based approach that relates agricultural and urban stresses to stream habitat at small scales (e.g., below 10 square miles) may help focus political efforts and enforcement on small areas, where local planning programs are already most effective.

A QHEI model with sufficient explanatory and predictive power may be used as a screening tool to guide local and regional planners in the decision-making for new development (Gordon et al., 2001). For instance, an advanced internet-based application may guide planners through a series of intuitive questions (e.g., what is the area of the proposed development) that provide inputs to a statistical model. At the server, an application may analyze the geographic inputs (e.g., how much area in riparian deciduous forests may be lost by the proposed development) and return the results to the user in the form of changes in QHEI. Such habitat-based models may help managers perform a quick first-run to identify areas of potential habitat impairment before permitting large construction projects (e.g., under Section 404). They may also help scientists rank stream habitats to allocate resources for more precise biological surveys. This study may potentially be integrated with other ecological models. It is conceptually possible to integrate the QHEI models with a model to predict IBI (e.g., Gordon and Majumder, 2000), and eventually in TMDL (i.e., Total Maximum Daily Load) programs that allocate effluent limits among potential dischargers.
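The screening idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the predictor names, the coefficient values, and the function `predict_qhei_change` are hypothetical placeholders and are not the models estimated in this study. The sketch also assumes a purely linear model without interaction terms.

```python
# Hypothetical coefficients (QHEI points per percentage point of land cover).
# Placeholder values for illustration only; not estimates from this study.
HYPOTHETICAL_COEFFICIENTS = {
    "riparian_row_crop_pct": -0.20,
    "riparian_deciduous_forest_pct": 0.15,
    "riparian_commercial_pct": -0.30,
}


def predict_qhei_change(before, after, coefficients=HYPOTHETICAL_COEFFICIENTS):
    """Change in predicted QHEI for a linear model without interactions.

    The intercept cancels out, so only the change in each predictor matters.
    """
    return sum(
        coef * (after[name] - before[name])
        for name, coef in coefficients.items()
    )


# Land cover (percent of riparian zone) before and after a proposed development.
before = {
    "riparian_row_crop_pct": 40.0,
    "riparian_deciduous_forest_pct": 30.0,
    "riparian_commercial_pct": 5.0,
}
after = {
    "riparian_row_crop_pct": 40.0,
    "riparian_deciduous_forest_pct": 20.0,
    "riparian_commercial_pct": 15.0,
}

delta = predict_qhei_change(before, after)
print(f"Predicted change in QHEI: {delta:+.1f}")
```

In an application of the kind described, the `before` and `after` values would come from the server-side GIS analysis of the proposed development, and the coefficients from the fitted regression model.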

6.2 SPECIFIC POLICY IMPLICATIONS

The practical significance of the models can be gauged by the strong linkage of the model variables with many of the nine major impact types known in the ECBP ecoregion listed by Yoder and Rankin (1995b, p.273) – channelization; agricultural nonpoint; livestock access; flow alteration; impoundment; combined sewer overflows/urban; combined sewer overflows/urban with toxics; complex toxic; and conventional municipal/industrial. Specifically, channelization is related to sinuosity (i.e., WSSINMAX), nonpoint and livestock sources to riparian row crops (i.e., R30ROWC), and urban and municipal/industrial may be associated with the commercial-industrial-transportation land use variable (i.e., R90CIT) in these models.

From an administrative perspective, the results from these models are directly applicable to some water resource management issues listed by Yoder (1995, p.342) – site-specific development criteria; regulation of adverse activities near aquatic habitats; nonpoint source management; and watershed programs.

The empirical results indicate that narrow stream buffers (e.g., 30 m wide on each side of the stream) in agricultural zones are very effective for habitat protection. In other words, a riparian buffer of 30 m may be more effective than, say, a buffer of 90 m width in an area under row crops land use.117 Similarly, wider riparian strips of, say, 90 m width in forest cover are helpful for protecting streams, as compared to forests of narrower, say, 30 m width.

The knowledge provided by these models can be reasonably linked to many Best Management Practices (BMPs) long recommended by experts (Karr and Dudley, 1981) – rotation with limited row crops, conservation tillage, and permanent vegetation cover on vulnerable slopes and riparian banks. The significant interaction between variables suggests that, for a start, BMPs may be initiated in headwater or flatter areas.

Best Management Practices (BMPs) that minimize adverse impacts and preserve farming profitability can possibly reduce soil loss and the associated sedimentation discharges into rivers from agricultural areas in close proximity to streams. In agricultural areas, riparian buffer zones, setbacks, and easements can be constituted through local ordinances or other mechanisms, to conserve land and prevent soil erosion into streams. These strips can incorporate vegetation, filter strips, terraces, and sediment retention basins to reduce loss of soil. With innovative tools and public education, these measures may even be applied in existing agricultural or developed areas. For instance, the appropriateness of conservation tillage (i.e., with minimum cultivation of the soil) may be discussed with farmers owning land in critical areas of the watershed.118 The models also suggest that preserving, or planting, deciduous trees to reap the benefits of stream bank stability, instream cover, and shading, may be more fruitful over relatively wide riparian zones (e.g., 90 m) rather than in narrow strips (e.g., 30 m wide zones). Other benefits of conserving forests are well known. In a recent national study, Guldmann and Kim (2001) found that deciduous forests may improve air quality by reducing ozone concentrations in the region. In developed residential and industrial areas, stormwater runoff may be reduced by encouraging frequent street sweeping, parking lot cleaning, and replacing curbs with vegetated areas in the vicinity of streams.

117 Note that this does not mean that a stream buffer of, say, 10 m width will be more (or less) effective in ameliorating the negative impacts of agricultural land use in the proximity of streams, as that scale is not specifically studied in this empirical study due to the unavailability of land use data of finer resolution.

6.3 FURTHER RESEARCH

Some ideas for further research relating to model structure, database, and miscellaneous issues are presented in this section. A recursive model with simultaneous (Hair et al., 1998, p.600) or two-stage linear regression (Gujarati, 1995, p.680) analysis may be built to estimate IBI using QHEI and landscape variables because some land use factors (e.g., agriculture) may be significantly linked to both IBI (Gordon and Majumder, 2000) and QHEI (Johnson et al., 1997). The two-stage regression approach may encounter some potential obstacles. First, it may not be easy to find good instrument variables that predict a substantial proportion of the variability in QHEI without being significantly correlated with IBI. Second, instrument variables must be measured with substantially less random error than the variable that they are seeking to replace in the first model. Third, even if such instrument variables are found, they may not explain the variation in QHEI to an extent that permits the use of the model for prediction with any confidence, simply because of the current lack of knowledge about the environmental processes shaping habitat quality. Finally, a sophisticated methodology needs to be developed, which takes into account habitat-related stressors operating at multiple ecological scales and aggregates them up to the level at which IBI operates.

118 Some economic aspects of conservation tillage – the need for special equipment for subsurface seeding, poor drainage and delayed planting due to soil compaction, and increased dependence on weed control – have restricted its widespread adoption by vegetable growers.
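The two-stage procedure discussed above can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not an analysis of the study data: all values are simulated, the helper names `simple_ols` and `two_stage` are hypothetical, and the instrument `z` is constructed so that it affects `y` only through the true value of `x`. The observed `x` carries random measurement error, which biases the ordinary regression slope toward zero.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a1, b1 = simple_ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    return simple_ols(x_hat, y)


# Synthetic data (assumption): the true slope of y on x is 2.0.
rng = random.Random(0)
n = 5000
TRUE_SLOPE = 2.0
z = [rng.gauss(0, 1) for _ in range(n)]                # instrument
x_true = [zi + rng.gauss(0, 1) for zi in z]            # unobserved true predictor
x_obs = [xt + rng.gauss(0, 1) for xt in x_true]        # predictor measured with error
y = [1.0 + TRUE_SLOPE * xt + rng.gauss(0, 1) for xt in x_true]

_, slope_ols = simple_ols(x_obs, y)
_, slope_2sls = two_stage(z, x_obs, y)
print(f"OLS slope on the error-prone x: {slope_ols:.2f}")
print(f"Two-stage slope using z:        {slope_2sls:.2f}")
```

In the context discussed here, `x` would play the role of QHEI and `y` the role of IBI; whether a suitable instrument exists for real data is exactly the obstacle raised in the text.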

Future structural modifications in this model may explore discriminant analysis to study the power of QHEI to distinguish between given stressors (after Norton et al., 2000), nonlinear regression, and ordinal logit and probit modeling techniques. Autocorrelation may be specifically examined and treated using Weighted Least Squares regression. The models may be validated using bootstrapping, simulation, or jackknife procedures for estimation (Hair et al., 1998, p.606).119
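The resampling procedures mentioned above can be sketched for a single regression slope. The data below are synthetic and the function names are hypothetical. The bootstrap shown is the common variant that resamples observations with replacement, which is an assumption about the intended procedure, and the jackknife omits one observation at a time.

```python
import random
import statistics


def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx


def jackknife_slopes(points):
    """Re-estimate the slope n times, omitting a different observation each time."""
    return [slope(points[:i] + points[i + 1:]) for i in range(len(points))]


def bootstrap_slopes(points, n_resamples, rng):
    """Re-estimate the slope on samples drawn with replacement from the data."""
    n = len(points)
    return [
        slope([rng.choice(points) for _ in range(n)])
        for _ in range(n_resamples)
    ]


# Synthetic sample (assumption): y = 3 + 0.5 x + noise, 60 observations.
rng = random.Random(42)
points = []
for _ in range(60):
    x = rng.uniform(0, 10)
    points.append((x, 3.0 + 0.5 * x + rng.gauss(0, 1)))

full = slope(points)
jack = jackknife_slopes(points)
boot = bootstrap_slopes(points, 200, rng)
print(f"Full-sample slope:     {full:.3f}")
print(f"Jackknife mean slope:  {statistics.mean(jack):.3f}")
print(f"Bootstrap mean slope:  {statistics.mean(boot):.3f}")
print(f"Bootstrap std. error:  {statistics.stdev(boot):.3f}")
```

The spread of the resampled slopes gives a rough indication of how stable an estimated coefficient is, without relying on distributional assumptions.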

The database may be enhanced by the inclusion of more variables, at higher resolution. In some subecoregions of ECBP, streamflow may be a limiting factor (Rankin, 1995). It is not accounted for in this analysis mainly because flow data are not always readily available, and it is also not easy to determine where it may be limiting. There may be a need for sampling to provide a larger database of sites reflecting a range of conditions for alteration of stream habitats, especially in urban areas. A high resolution stream dataset is necessary for a true study of sinuosity. Richards and Host (1994) probably failed to find sinuosity to be statistically significant because of the lower resolution of their streams dataset. According to Johnson and Gage (1997), few studies have assessed the relationships of patch-based metrics in studying aquatic ecosystems.120

119 In bootstrapping, parameter estimates are averaged across model estimates from many random subsets of the original sample. Simulation is also similar but each new sample is slightly changed based on research objectives. In a jackknife process, a different observation is omitted in each new sample.

Future research may use flexible scale sinuosity (e.g., based on stream order) instead of maximum or average reach sinuosity. Road-stream and road-road intersections, possibly inverse distance-weighted, may be derived using GIS and studied for runoff-related impacts. Richards et al. (1996) suggest using wetland percentage and wetland position in habitat modeling.
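For reference, reach sinuosity can be computed from the vertices of a digitized stream reach. The sketch below assumes the common definition (along-channel length divided by the straight-line distance between the reach endpoints), which may differ in detail from the macro used in this study. The coordinates and function names are hypothetical. It also shows why a coarse stream dataset understates sinuosity: dropping intermediate vertices shortens the measured channel length.

```python
import math


def path_length(coords):
    """Sum of segment lengths along a polyline given as (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def sinuosity(coords):
    """Channel length divided by straight-line distance between the endpoints."""
    straight = math.dist(coords[0], coords[-1])
    if straight == 0:
        raise ValueError("reach endpoints coincide; sinuosity is undefined")
    return path_length(coords) / straight


# Hypothetical reach digitized at fine resolution (coordinates in meters).
fine_reach = [(0, 0), (30, 40), (60, 0), (90, 40), (120, 0)]
# The same reach with only its endpoints retained, as in a coarse dataset.
coarse_reach = [fine_reach[0], fine_reach[-1]]

reaches = [fine_reach, [(0, 0), (50, 10), (100, 0)]]
values = [sinuosity(r) for r in reaches]

print(f"Fine-resolution sinuosity:   {sinuosity(fine_reach):.2f}")
print(f"Coarse-resolution sinuosity: {sinuosity(coarse_reach):.2f}")
print(f"Maximum reach sinuosity:     {max(values):.2f}")
print(f"Average reach sinuosity:     {sum(values) / len(values):.2f}")
```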

Note that a potentially significant aspect of stream ecosystems, groundwater flow, is not studied in this research. At the time of this writing, regional-level soil or geology data of finer than 1:250,000 scale are not available. Also, land use dynamics could not be researched for lack of consistent data from multiple time periods. The modeling may be improved further with high-resolution stream and elevation data. Many of these issues are being addressed by ongoing work to provide national-level digital data on soil survey (i.e., SSURGO), land use (i.e., NLCD 2000), hydrology (NHD) and elevation (NED).121

Further research may also use variables stratified by subecoregions and stream order. Allan and others (1997) suggest the use of different buffer widths, for different streams in the same watershed, based on the stream order or type. This relates to the differential influence of surrounding landscape on the sediment transport and other similar hydraulic functions in headwater streams (Vannote et al., 1980). Sensitivity analysis may be performed using equal-scale localsheds delineated by varying upstream flow-length, possibly based on stream order or valley slope (e.g., higher valley slopes indicate larger hydrologically connected area).

120 Patch dynamics theory places special importance on the relative position, size, and configuration or shape of landscape elements within the catchment.

121 The Soil Survey Geographic (SSURGO) database, typically in the range of 1:12,000 to 1:24,000, is provided by the Natural Resources Conservation Service (NRCS) of the U.S. Department of Agriculture (USDA). The National Land Cover Characterization (NLCD) 2000 dataset is anticipated to be available in 2004. High Resolution National Hydrography Dataset (NHD) will be available in 1:24,000 scale. National Elevation Dataset (NED) will be in 30 m, and possibly in 10 m, resolution.

Among miscellaneous issues, sampling strategy may be improved to facilitate statistical and spatial analysis. Localsheds are specific to sampling sites, so they do not completely cover the whole study area. This may raise questions regarding the applicability of the results to areas not covered by the localsheds in this empirical analysis. Consequently, a more continuous sampling framework (e.g., sampling at the mouth of each USGS 14-digit hydrologic cataloging unit, or HCU), may address such issues. The HCUs follow a hierarchical hydrologic scheme that covers the whole nation in a standard manner.

A lot of computing time in this study is spent performing repetitive GIS calculations in series.122 Therefore, another interesting research area may be the development of GIS software that supports parallel computing using a cluster of multiple processors.
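Because each localshed is processed independently, the work is of the kind that distributes naturally across workers. The sketch below is illustrative only: `process_localshed` is a hypothetical stand-in for the per-site GIS computation, and a thread pool is used merely to keep the example self-contained. Real CPU-bound GIS work would more likely need separate processes or a cluster scheduler. The helper `ideal_wall_time` computes the best-case wall-clock time, ignoring all communication and scheduling overhead.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ideal_wall_time(n_tasks, minutes_per_task, n_workers):
    """Best-case wall-clock minutes for independent tasks with no overhead."""
    return math.ceil(n_tasks / n_workers) * minutes_per_task


def process_localshed(site_id):
    """Hypothetical stand-in for delineating and summarizing one localshed."""
    return site_id, f"localshed-{site_id}"


site_ids = range(1, 501)

# Each site is handled independently, so the pool can work on many at once.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = dict(pool.map(process_localshed, site_ids))

serial_minutes = ideal_wall_time(500, 1, 1)
parallel_minutes = ideal_wall_time(500, 1, 50)
print(f"Processed {len(results)} localsheds")
print(f"Ideal serial time:   {serial_minutes} minutes")
print(f"Ideal parallel time: {parallel_minutes} minutes")
```

In practice the achievable speedup would be lower than this ideal because of data transfer, disk contention, and any steps that cannot be split across workers.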

Be and not seem. - Ralph Waldo Emerson

122 For instance, if it takes 1 minute to process one localshed, then 500 localsheds are processed in 500 minutes (i.e., more than 8 hours) in the current processor architecture. Hypothetically speaking, in a perfectly parallel setup, if the GIS software is written to support multiple parallel processors, then a cluster of 50 processors linked in parallel could possibly reduce the computing time to only 10 minutes.

REFERENCES

Adler, R.W., Landman, J.C., and Cameron, D.M., 1993. The Clean Water Act: 20 Years Later. Island Press. Washington, D.C.

Adler, R.W., 1995. Filling the gaps in water quality standards: Legal perspectives on biocriteria. Chapter 22 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Allan, J.D., and Johnson, L.B., 1997. Catchment-scale analysis of aquatic ecosystems. Freshwater Biology. 37: 107-111. Special applied issues section.

Allan, J.D., Erickson, D.L., and Fay, J., 1997. The influence of catchment land use on stream integrity across multiple spatial scales. Freshwater Biology. 37: 149-161. Special applied issues section.

Allen, T.F.H., and Starr, T.B., 1982. Hierarchy: Perspectives for ecological complexity. The University of Chicago Press, Chicago 60637.

Anderson, H.W., 1957. Relating sediment yield to watershed variables. Transactions of the American Geophysical Union. 45: 307-321.

Bailey, T.C., and Gatrell, A.C., 1995. Interactive spatial data analysis. Addison Wesley Longman Limited, Essex CM20 2JE, England.

Berry, W.D., 1993. Understanding regression assumptions. Part V in Regression Analysis. Edited by Michael S. Lewis-Beck. Volume 2. Quantitative Applications in the Social Sciences series. Sage Publications, New Delhi, India.

Berry, W.D., and Feldman, S., 1993. Multiple regression in practice. Part III in Regression Analysis. Edited by Michael S. Lewis-Beck. Volume 2. Quantitative Applications in the Social Sciences series. Sage Publications, New Delhi, India.

Bretschko, G., 1995. River/land ecotones: scales and patterns. Hydrobiologia. 303: 83- 91.

Cooper, S.D., Diehl, S., Kratz, K., and Sarnelle, O., 1998. Implications of scale for patterns and processes in stream ecology. Australian Journal of Ecology. 23: 27- 40.

Cormier, S.M., Smith, M., Norton, S., and Neiheisel, T., 2000. Assessing ecological risk in watersheds: A case study of problem formulation in the Big Darby Creek watershed, Ohio, USA. Environmental Toxicology and Chemistry. 19: (4) 1082- 1096, Part 2.

Davis, W.S., and Simon, T.P., 1995. Editors. Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Lewis Publishers, Boca Raton, FL.

Environmental Systems Research Institute (ESRI), 1997. ARC Macro Language. ESRI Press. ESRI, Inc., Redlands, California, 92373, USA.

Environmental Systems Research Institute (ESRI), 1994. Cell-based modeling with GRID. ESRI Press. ESRI, Inc., Redlands, California, 92373, USA.

Fox, J., 1993. Regression diagnostics. Part IV in Regression Analysis. Edited by Michael S. Lewis-Beck. Volume 2. Quantitative Applications in the Social Sciences series. Sage Publications, New Delhi, India.

Frissell, C.A., Liss, W.J., Warren, C.E., and Hurley, M.D., 1986. A hierarchical framework for stream habitat classification, viewing streams in a watershed context. Environmental Management. 10: 199-214.

Gordon, S.I., Arya, S., and Dufour, K., 2001. Creating a screening tool for identification of the ecological risks of human activity on watershed quality. Department of City and Regional Planning, The Ohio State University. Report to the U.S. Environmental Protection Agency on Cooperative Agreement #CR826816-01-0.

Gordon, S.I., and Majumder, S., 2000. Empirical Stressor-Response Relationships for Prospective Risk Analysis. Environmental Toxicology and Chemistry. 19: (4) 1106-1112, Part 2.

Gordon, S.I., Ward, A.D., White, D.A., Majumder, S., and Chen, M., 2000. Integrating Planning, Forecasting, and Watershed Level Ecological Risk Assessment Techniques: A Test in the Eastern Cornbelt Plains Ecoregion. The Ohio State University. Report to the U.S. Environmental Protection Agency on Grant #R824769.

Gordon, S.I., 1985. Computer models in environmental planning. Van Nostrand Reinhold Company, New York 10020, USA.

Gordon, S.I., and Fromuth, R.K., 1981. A point, non-point source model of dissolved oxygen for the Great Miami River. Journal of Environmental Systems. 10: (3) 185-199.

Graham, R.L., Hunsaker, C.T., O’Neill, R.V., and Jackson, B.L., 1991. Ecological risk assessment at the regional scale. Ecological Applications. 1: (2) 196-206.

Guldmann, J.M., and Kim, H.Y., 2001. Modeling air quality in urban areas: A cell-based statistical approach. Geographical Analysis. 33: (2) 156-180.

Hardy, M.A., 1993. Regression with dummy variables. Part II in Regression Analysis. Edited by Michael S. Lewis-Beck. Volume 2. Quantitative Applications in the Social Sciences series. Sage Publications, New Delhi, India.

Hickman, C., 2002. Personal Communication. April 17. Geographer, US Department of the Interior, USGS Columbus, OH 43229. Tel: 614 430-7768. Public talk on National Hydrography Dataset (NHD).

Hill, M.T., and Platts, W.S., 1991. Ecological and Geomorphological Concepts in Instream and Out-of-Channel Flow Requirements. Rivers. 2: (3) 198-210.

Howard, S.M., 2002. Personal Communication. March 18. Raytheon, Land Cover Applications Center, USGS EROS Data Center. Tel: 605 594-6027. Regarding accuracy assessment for NLCD data.

Hughes, R.M., 1995. Defining acceptable biological status by comparing with reference conditions. Chapter 4 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Hughes, R.M., Larsen, D.P., and Omernik, J.M., 1986. Regional reference sites: a method for assessing stream potential. Environmental Management. 10: 629- 635.

Hughes, R.M., and Omernik, J.M., 1981. Use and misuse of the terms watershed and stream order. Proceedings of the warmwater streams symposium. American Fisheries Society, Bethesda, Maryland. Edited by L.A. Krumholz. 320-326.

Hunsaker, C.T., and Levine, D.A., 1995. Hierarchical approaches to the study of water quality in rivers. BioScience. 45: 193-203.

Hunsaker, C.T., Graham, R.L., Suter II, G.W., O’Neill, R.V., Barnthouse, L.W., and Gardner, R.H., 1990. Assessing ecological risk on a regional scale. Environmental Management. 14: (3) 325-332.

Hutchinson, M.F., and Dowling, T.I., 1991. A continental hydrological assessment of a new grid-based digital elevation model of Australia. Hydrological Processes. 5: 45-58.

Hutchinson, M.F., 1989. A New Procedure for Gridding Elevation and Stream Line Data with Automatic Removal of Spurious Pits. Journal of Hydrology. 106: 211-232.

Jenson, S.K. and Domingue, J.O., 1988. Extracting Topographic Structure from Digital Elevation Data for Geographic Information System Analysis. Photogrammetric Engineering and Remote Sensing. 54: (11) 1593-1600.

Johnson, B.L., Richardson, W.B., and Naimo, T.J., 1995. Past, present, and future concepts in large river ecology: How rivers function and how human activities influence river processes. Bioscience. 45: (3) 134-141.

Johnson, L.B., and Gage, S.H., 1997. Landscape approaches to the analysis of aquatic ecosystems. Freshwater Biology. 37: 113-132. Special applied issues section.

Johnson, L.B., Richards, C., Host, G.E., and Arthur, J.W., 1997. Landscape influences on water chemistry in Midwestern stream ecosystems. Freshwater Biology. 37: 193-208. Special applied issues section.

Johnston, C.A., 1998. Geographic information systems in ecology. Methods in Ecology series. Series editors; J.H. Lawton and G.E. Likens. Blackwell Science Ltd, London WC1N 2BL.

Johnston, C.A., 1993. Introduction to quantitative methods and modeling in community, population, and landscape ecology. Chapter 25 in Environmental modeling with GIS. Edited by Michael F. Goodchild, Bradley O. Parks, and Louis T. Steyaert. Oxford University Press, New York, NY 10016-4314.

Jones, A.L., and Gordon, S.I., 2000. From Plan to Practice: Implementing Watershed- based Strategies into Local, State, and Federal Policy. Environmental Toxicology and Chemistry. 19: No. 4(2), 1136-1142.

Karr, James R., 1981. Assessment of biotic integrity using fish communities. Fisheries. 6: (6) 21-27.

Karr, J.R., and Dudley, D.R., 1981. Ecological perspective on water quality goals. Environmental Management. 5: (1) 55-68.

Kennedy, S., 1994. The small number problem and the accuracy of spatial databases. In The accuracy of spatial databases. Edited by Michael Goodchild and Sucharita Gopal. Taylor and Francis. New York, NY.

Lammert, M., and Allan, J.D., 1999. Assessing biotic integrity of streams: Effects of scale in measuring the influence of land use/cover and habitat structure on fish and macroinvertebrates. Environmental Management. 23: (2) 257-270 FEB.

Lanfear, K.J., 1990. A fast algorithm for automatically computing Strahler stream order. Water Resources Bulletin. 26: (6) 977-981.

Larsen, D.P., 1995. The role of ecological sample surveys in the implementation of biocriteria. Chapter 18 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Lewis-Beck, M.S., 1993. Applied regression: An introduction. Part I in Regression Analysis. Edited by Michael S. Lewis-Beck. Volume 2. Quantitative Applications in the Social Sciences series. Sage Publications, New Delhi, India.

Lillesand, T.M., and Kiefer, R.W., 1994. Remote Sensing and Image Interpretation. Third edition. John Wiley & Sons, Inc., New York, NY 10158.

Majumder, S., 1998. A spatial empirical analysis of stressor-response relationships for prospective ecological risk assessment in the Eastern Cornbelt Plains ecoregion of Ohio. Ph.D. dissertation. Ohio State University.

Marsh, W.M., 1997. Landscape Planning: Environmental Applications. Third edition. John Wiley & Sons, Inc., New York, NY 10158.

Mattson, M.D., and Godfrey P.J., 1994. Identification of road salt contamination using multiple-regression and GIS. Environmental Management. 18: (5) 767-773 SEP- OCT.

Mecklenburg, D.E., 1998. Channel pattern’s influence on sediment pollution: A case study of Salt Creek, a previously straightened channel redeveloping meanders. Presented at ASAE annual international meeting at Orlando, Florida. July 11-16. Paper No. 982084.

Miller, N.S., Guertin, D.P., and Goodrich, D.C., 1996. Investigating stream channel morphology using a Geographic Information System. In Proceedings: ESRI International User Conference. Palm Springs, CA. May 20-24. At http://www.esri.com/library/userconf/proc96/TO300/PAP291/P291.html

Norton, S.B., Cormier, S.M., Smith, M., and Jones, R.C., 2000. Can biological assessments discriminate among types of stress? A case study from the Eastern Corn Belt Plains ecoregion. Environmental Toxicology and Chemistry. 19: (4) 1113-1119, Part 2.

O’Callaghan, J.F., and Mark, D.M., 1984. The Extraction of Drainage Networks from Digital Elevation Data. Computer Vision, Graphics, and Image Processing. 28: 323-344.

Ohio Department of Natural Resources (ODNR), 1998. Ecoregions of Indiana and Ohio. (Poster) Division of Geological Survey, Columbus, Ohio.

Ohio Environmental Protection Agency (Ohio EPA), 1989. Biological criteria for the protection of aquatic life. Volume III. Ecological Assessment Section. Division of Water Quality, Planning & Assessment, Columbus, Ohio 43266.

Omernik, James M., 1995. Ecoregions: A spatial framework for environmental management. Chapter 5 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Omernik, James M., 1987. Ecoregions of the coterminous United States. Annals of the Association of American Geographers. 77: (1) 118-125.

Ortolano, L., 1997. Environmental regulation and impact assessment. John Wiley & Sons, New York, USA.

Osborne, L.L., and Wiley, M.J., 1988. Empirical relationships between land use/cover and stream water quality in an agricultural watershed. Journal of Environmental Management. 26: (1) 9-27.

Patten, B.C., Bossermann, R.W., Finn, J.T., and Cole, W.G., 1976. Propagation of Cause in Ecosystems. In Systems Analysis and Simulation in Ecology. Volume 4. Academic Press, NY.

Poff, N.L., 1997. Landscape filters and species traits: towards mechanistic understanding and prediction in stream ecology. Journal of North American Benthological Society. 16: (2) 391-409.

Rankin, Edward T., 1999. Personal Communication. September 14. Ecological Assessment Unit, Division of Surface Water, Ohio EPA, Columbus, Ohio 43228. Tel: 614 728-3388. Regarding narrative criteria for QHEI.

Rankin, Edward T., 1995. Habitat indices in water resource quality assessments. Chapter 13 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Rankin, E.T., 1989. The Qualitative Habitat Evaluation Index (QHEI): Rationale, methods, and application. Volume III. Ecological Assessment Section. Division of Water Quality, Planning & Assessment, Ohio Environmental Protection Agency. Columbus, Ohio 43266.

Reash, R.J., 1995. Biocriteria: A Regulated Industry Perspective. Chapter 11 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Richards, C., and Host, G.E., 1994. Examining land-use influences on stream habitats and macroinvertebrates – a GIS approach. Water Resources Bulletin. 30: (4) 729-738 JUL-AUG.

Richards, C., Haro, R.J., Johnson, L.B., and Host, G.E., 1997. Catchment and reach- scale properties as indicators of macroinvertebrate species traits. Freshwater Biology. 37: 219-230. Special applied issues section.

Richards, C., Johnson, L.B., and Host, G.E., 1996. Landscape-scale influences on stream habitats and biota. Canadian Journal of Fisheries and Aquatic Science. 53: (Suppl. 1) 295-311.

Ricklefs, R.E., 1997. The economy of nature. Fourth edition. W.H. Freeman and Company, New York, NY 10010.

Rosgen, D.L., 1994. A classification of natural rivers. Catena. 22: 169-199.

Saunders, W.K., 1999. Preparation of DEMs for Use in Environmental Modeling Analysis. In Proceedings: ESRI International User Conference. San Diego, CA. July 24-30. At http://www.esri.com/library/userconf/proc99/proceed/papers/pap802/p802.html

Simon, T.P., and Lyons, J., 1995. Application of the Index of Biotic Integrity to Evaluate Water Resource Integrity in Freshwater Ecosystems. Chapter 16 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Southerland, M.T., and Stribling, J.B., 1995. Status of Biological Criteria Development and Implementation. Chapter 7 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

SPSS, 1999. Advanced Models 10.0. SPSS Inc. Chicago, IL 60606-6307.

Strahler, A.N., 1957. Quantitative analysis of watershed geomorphology. American Geophysical Union Transactions. 38: 913-920.

Suter, G.W., II, 1993. A critique of ecosystem health concepts and indexes. Environmental Toxicology and Chemistry. 12: 1533-1539.

Suter, G.W., II, 1993. Ecological risk assessment. Lewis Publishers, Boca Raton, Florida.

Suter, G.W., II, and Barnthouse, L.W., 1993. Assessment concepts. In Ecological risk assessment. Editor and principal author Glenn Suter II. Lewis Publishers, Boca Raton, Florida. p.21-47.

United States Environmental Protection Agency (USEPA), 1998. Guidelines for Ecological Risk Assessment. EPA/630/R-95/002F, April. Office of Research and Development, Office of Water, Washington, DC 20460.

United States Environmental Protection Agency (USEPA), 1997. Watershed Ecological Risk Assessment. EPA/822/F-97-002, April. Office of Research and Development, Office of Water, Washington, DC 20460.

United States Geological Survey (USGS), 1993. Digital elevation models. Data User’s Guide 5. Third printing. Department of the Interior, Reston, Virginia.

Vannote, R.L., Minshall, G.W., Cummins, K.W., Sedell, J.R., and Cushing, C.E., 1980. The River Continuum Concept. Canadian Journal of Fisheries and Aquatic Sciences. 37: 130-137.

Vogelmann, J.E., Howard, S.M., Yang, L., Larson, C.R., Wylie, B.K., and Van Driel, N., 2001. Completion of the 1990s National Land Cover Data Set for the Conterminous United States from Landsat Thematic Mapper Data and Ancillary Data Sources. Photogrammetric Engineering and Remote Sensing. 67: 650-652.

Wang, L.Z., Lyons, J., Kanehl, P., and Gatti, R., 1997. Influences of watershed land use on habitat quality and biotic integrity in Wisconsin streams. Fisheries. 22: (6) 6-12.

Wang, X., and Yin, Z., 1997. Using GIS to assess the relationship between land use and water quality at a watershed level. Environment International. 23: (1) 103-114.

White, Dale A., Smith, R.A., Price, C.V., Alexander, R.B., and Robinson, K.W., 1992. A spatial model to aggregate point-source and nonpoint-source water-quality data for large areas. Computers and Geosciences. 18: (8) 1055-1073.

Wilson, B., 2001. Personal Communication. Ohio Scenic Rivers Assistant. Division of Natural Areas and Preserves, Ohio Department of Natural Resources, Columbus, OH 43224. Tel: 614 265 6459. Email: [email protected]. Regarding the Ohio Scenic Rivers Stream Quality Monitoring project.

Yoder, Chris O., 1991. Answering some concerns about biological criteria based on experiences in Ohio. Water Quality Standards for the 21st Century, Proceedings of a National Conference. U.S. Environmental Protection Agency, Office of Water, Washington, D.C.

Yoder, C.O., 1995. Policy issues and management applications for biological criteria. Chapter 21 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Yoder, C.O., and Rankin, E.T., 1995a. Biological criteria program development and implementation in Ohio. Chapter 9 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Yoder, C.O., and Rankin, E.T., 1995b. Biological response signatures and the area of degradation value: New tools for interpreting multimetric data. Chapter 17 in Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Edited by Wayne S. Davis and Thomas P. Simon. Lewis Publishers, Boca Raton, FL.

Yoder, C.O., and Rankin, E.T., 1998. The role of biological indicators in a state water quality management process. Environmental Monitoring and Assessment. 51: 61-88.

LEGAL REFERENCES

Arkansas v. Oklahoma, 34 E.R.C. 1193 (1992).

Clean Water Act of 1972 (CWA). P.L. 92-500, as amended, 33 U.S.C. Section 1251 et seq.

Northeast Ohio Regional Sewer District v. Shank. 1991. Ohio Supreme Court Decision: Appeal from the Court of Appeals for Franklin County. No. 89-1554 – Submitted October 16, 1990 – Decided February 27, 1991.

All quotes, except the first (from the internet), are from different chapters in: Arnold, K., Gosling, J., and Holmes, D., 2000. The Java Programming Language, Third Edition. The Java Series. Addison Wesley Publishers.

APPENDIX A

[USGS HYDROLOGIC UNIT CLASSIFICATION SYSTEM]

BRIEF DEFINITION OF DIFFERENT LEVELS OF HYDROLOGIC UNITS

The different levels of hydrologic units are delineated by the U.S.G.S. using the 1:24,000 scale 7.5 minute series topographic maps. The units in each successive level are completely contained within a unit of the preceding level.

HYDROLOGIC   NAME                          DIGITS   AVERAGE SIZE                                  UNITS IN
UNIT LEVEL                                                                                        USA
1            Region                        2        177,560 sq. miles                             21
2            Sub-region                    4        16,800 sq. miles                              222
3            Accounting Unit (Basin)       6        10,596 sq. miles                              352
4            Cataloging Unit (Sub-basin)   8        703 sq. miles                                 2,149
5            Watershed                     11       63-391 sq. miles (40,000 – 250,000 acres)     ~22,000
6            Subwatershed                  14       16-63 sq. miles (10,000 – 40,000 acres);      ~160,000
                                                    min. 5 sq. miles (3,000 acres)

Table A.1: The continuous hierarchy of Hydrologic Units for the United States [123]

[123] From Hickman (2002), personal communication.
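
The nesting described above means that a unit's code at any level begins with the code of the unit that contains it. The following is a minimal Python sketch of that idea, using only the digit counts listed in Table A.1. The function names `split_huc` and `contains` are hypothetical, and the code string in the usage note is an illustrative value rather than a unit drawn from the study area.

```python
# Digit counts per level, taken from Table A.1.
HUC_LEVELS = [
    ("Region", 2),
    ("Sub-region", 4),
    ("Accounting Unit (Basin)", 6),
    ("Cataloging Unit (Sub-basin)", 8),
    ("Watershed", 11),
    ("Subwatershed", 14),
]


def split_huc(code):
    """Return (level name, code prefix) pairs for each level the code covers."""
    if not code.isdigit():
        raise ValueError("a hydrologic unit code contains digits only")
    if len(code) not in {digits for _, digits in HUC_LEVELS}:
        raise ValueError("code length must be 2, 4, 6, 8, 11, or 14 digits")
    return [(name, code[:digits]) for name, digits in HUC_LEVELS
            if digits <= len(code)]


def contains(parent, child):
    """True if the unit coded `parent` contains (or equals) the unit coded `child`."""
    return len(parent) <= len(child) and child.startswith(parent)
```

For example, `split_huc("05060001")` returns four pairs ending in the 8-digit cataloging unit, and `contains("0506", "05060001")` is true because the sub-region code is a prefix of the cataloging-unit code.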

APPENDIX B

[NLCD CLASSIFICATION SYSTEM]

NLCD LAND COVER CLASSIFICATION SYSTEM

LAND COVER CLASS DEFINITIONS

Water - All areas of open water or permanent ice/snow cover.

11. Open Water - All areas of open water; typically 25 percent or greater cover of water (per pixel).

12. Perennial Ice/Snow - All areas characterized by year-long cover of ice and/or snow.

Developed - Areas characterized by a high percentage (30 percent or greater) of constructed materials (e.g. asphalt, concrete, buildings, etc.).

21. Low Intensity Residential - Includes areas with a mixture of constructed materials and vegetation. Constructed materials account for 30-80 percent of the cover. Vegetation may account for 20 to 70 percent of the cover. These areas most commonly include single-family housing units. Population densities will be lower than in high intensity residential areas.

22. High Intensity Residential - Includes highly developed areas where people reside in high numbers. Examples include apartment complexes and row houses. Vegetation accounts for less than 20 percent of the cover. Constructed materials account for 80 to 100 percent of the cover.

23. Commercial/Industrial/Transportation - Includes infrastructure (e.g. roads, railroads, etc.) and all highly developed areas not classified as High Intensity Residential.

Barren - Areas characterized by bare rock, gravel, sand, silt, clay, or other earthen material, with little or no "green" vegetation present regardless of its inherent ability to support life. Vegetation, if present, is more widely spaced and scrubby than that in the "green" vegetated categories; lichen cover may be extensive.

31. Bare Rock/Sand/Clay - Perennially barren areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, beaches, and other accumulations of earthen material.

32. Quarries/Strip Mines/Gravel Pits - Areas of extractive mining activities with significant surface expression.

33. Transitional - Areas of sparse vegetative cover (less than 25 percent of cover) that are dynamically changing from one land cover to another, often because of land use activities. Examples include forest clearcuts, a transition phase between forest and agricultural land, the temporary clearing of vegetation, and changes due to natural causes (e.g. fire, flood, etc.).

Forested Upland - Areas characterized by tree cover (natural or semi-natural woody vegetation, generally greater than 6 meters tall); tree canopy accounts for 25-100 percent of the cover.

41. Deciduous Forest - Areas dominated by trees where 75 percent or more of the tree species shed foliage simultaneously in response to seasonal change.

42. Evergreen Forest - Areas dominated by trees where 75 percent or more of the tree species maintain their leaves all year. Canopy is never without green foliage.

43. Mixed Forest - Areas dominated by trees where neither deciduous nor evergreen species represent more than 75 percent of the cover present.

Shrubland - Areas characterized by natural or semi-natural woody vegetation with aerial stems, generally less than 6 meters tall, with individuals or clumps not touching to interlocking. Both evergreen and deciduous species of true shrubs, young trees, and trees or shrubs that are small or stunted because of environmental conditions are included.

51. Shrubland - Areas dominated by shrubs; shrub canopy accounts for 25-100 percent of the cover. Shrub cover is generally greater than 25 percent when tree cover is less than 25 percent. Shrub cover may be less than 25 percent in cases when the cover of other life forms (e.g. herbaceous or tree) is less than 25 percent and shrub cover exceeds the cover of the other life forms.

Non-natural Woody - Areas dominated by non-natural woody vegetation; non-natural woody vegetative canopy accounts for 25-100 percent of the cover. The non-natural woody classification is subject to the availability of sufficient ancillary data to differentiate non-natural woody vegetation from natural woody vegetation.

61. Orchards/Vineyards/Other - Orchards, vineyards, and other areas planted or maintained for the production of fruits, nuts, berries, or ornamentals.

Herbaceous Upland - Upland areas characterized by natural or semi-natural herbaceous vegetation; herbaceous vegetation accounts for 75-100 percent of the cover.

71. Grasslands/Herbaceous - Areas dominated by upland grasses and forbs. In rare cases, herbaceous cover is less than 25 percent, but exceeds the combined cover of the woody species present. These areas are not subject to intensive management, but they are often utilized for grazing.

Planted/Cultivated - Areas characterized by herbaceous vegetation that has been planted or is intensively managed for the production of food, feed, or fiber; or is maintained in developed settings for specific purposes. Herbaceous vegetation accounts for 75-100 percent of the cover.

81. Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops.

82. Row Crops - Areas used for the production of crops, such as corn, soybeans, vegetables, tobacco, and cotton.

83. Small Grains - Areas used for the production of graminoid crops such as wheat, barley, oats, and rice.

84. Fallow - Areas used for the production of crops that are temporarily barren or with sparse vegetative cover as a result of being tilled in a management practice that incorporates prescribed alternation between cropping and tillage.

85. Urban/Recreational Grasses - Vegetation (primarily grasses) planted in developed settings for recreation, erosion control, or aesthetic purposes. Examples include parks, lawns, golf courses, airport grasses, and industrial site grasses.

Wetlands - Areas where the soil or substrate is periodically saturated with or covered with water as defined by Cowardin et al.

91. Woody Wetlands - Areas where forest or shrubland vegetation accounts for 25-100 percent of the cover and the soil or substrate is periodically saturated with or covered with water.

92. Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for 75-100 percent of the cover and the soil or substrate is periodically saturated with or covered with water.
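
In the listing above, the tens digit of each two-digit class code identifies its major category (for example, codes 41 to 43 all fall under Forested Upland). The Python sketch below illustrates how per-pixel class codes for an area could be summarized as percentage shares by major category. It is an illustration only, not the procedure used in this study; the function name `major_category_shares` and the sample pixel list are hypothetical.

```python
from collections import Counter

# Major categories keyed by the tens digit of the NLCD class code,
# as listed in the class definitions above.
NLCD_MAJOR = {
    1: "Water",
    2: "Developed",
    3: "Barren",
    4: "Forested Upland",
    5: "Shrubland",
    6: "Non-natural Woody",
    7: "Herbaceous Upland",
    8: "Planted/Cultivated",
    9: "Wetlands",
}


def major_category_shares(pixel_codes):
    """Return the percentage of pixels in each major category present."""
    counts = Counter(NLCD_MAJOR[code // 10] for code in pixel_codes)
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical sample: eight 30 m pixels from an imagined riparian strip.
sample = [82, 82, 82, 82, 41, 41, 21, 11]
shares = major_category_shares(sample)
```

With this sample, half of the pixels are Row Crops (82), so `shares["Planted/Cultivated"]` is 50.0.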

APPENDIX C

[RESIDUAL DIAGNOSTICS]

Figure C.1: Diagnosis of model residuals using studentized values.

Figure C.2: Diagnosis of model residuals using leverage values.

Figure C.3: Diagnosis of model residuals using Mahalanobis distance.

Figure C.4: Diagnosis of model residuals using Cook’s distance.

Figure C.5: Diagnosis of model residuals using COVRATIO.

Figure C.6: Diagnosis of model residuals using SDFFIT.

Figure C.7: Diagnosis of model residuals using SDFBETA for riparian row crops.

Figure C.8: Diagnosis of model residuals using SDFBETA for riparian commercial-industrial-transportation land use.

Figure C.9: Diagnosis of model residuals using SDFBETA for localshed forest cover.

Figure C.10: Diagnosis of model residuals using SDFBETA for localshed flat slopes.

Figure C.11: Diagnosis of model residuals using squared residuals and riparian row crop.

Figure C.12: Diagnosis of model residuals using squared residuals and riparian commercial-industrial-transportation land use.

Figure C.13: Diagnosis of model residuals using squared residuals and localshed forests.

Figure C.14: Diagnosis of model residuals using squared residuals and localshed flat slopes.
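
Three of the diagnostics named in the captions above (leverage, studentized residuals, and Cook’s distance) can be illustrated with a short Python sketch. It makes simplifying assumptions that do not match the models in this study: a single predictor, ordinary least squares, and internally studentized residuals. The figures themselves were produced with a statistical package, and the function name and sample data here are hypothetical.

```python
import math


def simple_ols_diagnostics(x, y):
    """Leverage, internally studentized residuals, and Cook's distance
    for a one-predictor least-squares fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)  # residual mean square
    # Leverage: diagonal of the hat matrix for simple regression.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Residual scaled by its own estimated standard error.
    studentized = [e / math.sqrt(mse * (1.0 - h))
                   for e, h in zip(residuals, leverage)]
    # Cook's distance combines residual size and leverage.
    cooks = [(r * r / p) * h / (1.0 - h)
             for r, h in zip(studentized, leverage)]
    return leverage, studentized, cooks


# Hypothetical data with one observation far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 25.0]
leverage, studentized, cooks = simple_ols_diagnostics(x, y)
```

In this sample the last observation has by far the largest leverage because its x value lies well away from the rest; the leverages always sum to the number of fitted parameters.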
