<<

Water Resources Research

RESEARCH ARTICLE Rainfall Thresholds for Flow Generation 10.1029/2018WR023714 in Desert Ephemeral

1 1 2 1 Key Points: Stephanie K. Kampf G!), Joshua Faulconer , Jeremy R. Shaw G!), Michael Lefsky , Rainfall thresholds predict 3 2 streamflow responses with high Joseph W. Wagenbrenner G!), and David J. Cooper G!) accuracy in small hyperarid and 1 2 semiarid watersheds Department of Ecosystem Science and Sustainability, Colorado State University, Fort Collins, Colorado, USA, Department 3 Using insufficient rain data usually of Forest and Rangeland Stewardship, Colorado State University, FortCollins, Colorado, USA, USDA Forest Service, Pacific increases threshold values for larger Southwest Research Station, Arcata, California,USA watersheds, leading to apparent scale dependence in thresholds Declines in flow frequency and Rainfall thresholds for streamflow generation are commonly mentioned in the literature, but increases in thresholds with drainage Abstract area are steeper in hyperarid than in studies rarely include methods for quantifying and comparing thresholds. This paper quantifies thresholds semiarid watersheds in ephemeral streams and evaluates how they are affected by rainfall and watershed properties. The study sites are in southern Arizona, USA; one is hyperarid and the other is semiarid. At both sites rainfall and 3 2 2 Supporting Information: streamflow were monitored in watersheds ranging from 10- to 10 km . Streams flowed an average of 0-5 • Supporting Information 51 times per year in hyperarid watersheds and 3-11 times per year in semiarid watersheds. Although hyperarid sites had fewer flow events, their flow frequency (fraction of rain events causing flow) was higher than in Correspondence to: 2 semiarid sites for small ( < 1 km ) watersheds. At both locations flowfrequency decreased with drainage area, S. K. Kampf, [email protected] but the decrease was steeper in hyperarid watersheds. Watershed mean 60-min intensity thresholds ranged from 3-13 mm/hr in hyperarid watersheds and 7-16 mm/hr in semiarid watersheds. Higher runoff

Citation: thresholds and lower flow frequencies in small semiarid watersheds likely relate to greater ground cover Kampf, S. K., Faulconer, J., Shaw, J. R., and soil development compared to the desert pavement and bedrock surfaces in hyperarid sites. The choice Lefsky, M., Wagenbrenner, J. W., & of rain data strongly influenced threshold values; single rain gauges were only adequate for threshold Cooper, D. J. (2018). Rainfall thresholds prediction in <1-km2 watersheds, and incomplete rainfall data led to increases in thresholds with drainage forflow generation in desert ephemeral streams. Water Resources Research, 54, area. We recommend using mean rainfallintensity over the drainage area for threshold analysis because this 9935-9950. https://doi.org/10.1 029/ reduces apparent scale dependence in thresholds caused by incomplete rainfall information. 2018WR023714 Plain Language Summary Ephemeral streams in deserts are usually dry, flowing only after heavy Received 16 JUL 2018 rains. Our goal was to determine how much rain is needed for these streams to flow. We studied streams Accepted 5 NOV 2018 in dry southwestern and in wetter southeastern Arizona, USA. The dry site has mostly bare rock across the Accepted article online 12 NOV 2018 Published online 9 DEC 2018 landscape, whereas the wetter site has more vegetation, including shrubs, grasses, and oak. In the dry area, small streams flowed 3-5 times per year, and larger streams flowed 0-2 times per year. Streams required 3-13 mm of rain over 1 hr to trigger flow. The small streams flowed more frequently because rain falling on bare rock could rapidly reach the . Water was lost into the bed before reaching larger streams. In the wetter area, both small and large streams flowed 3-11 times per year. These streams required 7-16 mm of rain over 1 hr to trigger flow. This range of rainfall is slightly higher than for streams in the dry study area, likely because vegetation cover and soil development allow more rainfall to infiltrateinto soil before reaching streams. Information on the amount of rainfall needed to trigger streamflow can help with issuing flash warnings for ephemeral streams.

1. Introduction Rainfall thresholds for streamflow generation are important components of streamflow prediction models and flood frequency distributions (Gioia et al., 2008; Kirkby et al., 2002, 2005). These thresholds indicate the lowest magnitude or intensity of rainfall that will cause streamflow. For excess overland flow at the point scale, these thresholds should be greater than or equal to the infiltration capacity of the soil, as overland flow develops when rainfall intensity exceeds this capacity plus detention storage. Over larger domains, however, threshold values may change with spatial scale, varying with flow path length (Vair & Raz-Yassif, 2004). Such scale-dependent patterns can be simulated using distributed models that represent spatially variable land cover, soil properties, and/or hydrologic connectivity (Kirkby et al., 2005; Pierini et al., 2014), but most study areas lack empirical data to verify the accuracy of simulated thresholds and their ©2018. American Geophysical Union. All Rights Reserved. scale dependence.

KAMPF ET AL. 9935 Water Resources Research 10.1029/2018WR023714

One type of stream system where threshold-type responses are common is desert ephemeral streams (Osborn & Lane, 1969). Because these streams rarely flow, threshold-exceeding storms can be critical forseed dispersal, riparian vegetation establishment, and species diversity (Friedman & Lee, 2002; Shaw & Cooper, 2008; Stromberg et al., 2009). Several studies in arid and semiarid environments have identifiedrainfall depth or intensity thresholds for the occurrence of overland flow on hillslopes or for streamflow generation (Cammeraat, 2004; De Boer, 1992; Martfnez-Mena et al., 1998; Mayor et al., 2011; Osborn & Lane, 1969). Of these, some have found that rainfall threshold values increase with drainage area (Cammeraat, 2004; Mayor et al., 2011). This increase has been attributed both to transmission losses along channels (Kirkby et al., 2005; Simanton & Osborn, 1983) and to spatial patterns of high-intensity storms, which often cover only small portions of larger watersheds (Goodrich et al., 1997; Morin et al., 2006; Syed et al., 2003). The relative contributions of rainfall spatial patterns and transmission losses to threshold scaling have not been quanti­ fied.Further while multiple studies mention rainfall thresholds for streamflow generation, methods for quan­ tifying threshold values are rarely presented (Cammeraat, 2004; Yair & Klein, 1973), making it difficult to compare thresholds between study areas. 3 This research analyzes ephemeral streams draining small catchment to watershed-scale areas (10- to 103 km2) in southern Arizona, USA, to identify rainfall thresholds for streamflow generation. The goal is to examine how rainfall patterns and drainage area size interact to affectthreshold values. We compare thresh­ olds for initiating streamflow in a hyperarid part of the Sonoran desert, USA, to those of the semiarid Walnut Experimental Watershed. We also examine how watershed vegetation cover and slope alter the thresh­ olds. This information on spatial scaling of rainfall thresholds provides an empirical foundation for selecting parameters in hydrologic models, establishes lower bounds for flood warning systems, and can guide riparian management strategies for ephemeral stream systems.

2. Study Site The hyperarid study area is located in the U.S. Army's Yuma Proving Ground in southwestern Arizona, in the Sonoran desert. Annual precipitation averages 90 mm with most rain falling either during July-September or November-March. Summer storms are short-duration, high-intensity convective thunderstorms associated with the North American monsoon, and winter storms are from frontal systems that bring lower intensity, longer-duration rains (Hallack-Alegria & Watkins, 2007). The site is one of the hottest regions of the Sonoran desert with a mean annual minimum temperature of 16 °C and a mean annual maximum tempera­ ture of 31 °C. Maximum daily temperatures in July and August frequently exceed 40-42 °C. The two study watersheds are Mohave Wash and Yuma Wash (Figure 1 ), which both drain to the Colorado . Mohave Wash is 37 km long with a 323-km2 drainage area, and Yuma Wash is 28 km long with a 188-km2 drainage area. Elevation in Mohave Wash ranges from 210 to 720 m, and elevation in Yuma Wash ranges from 120 to 740 m. Mountainous portions of both watersheds consist primarily of exposed dacite and granodiorite, and lowland valleys are filled with Pliocene to Holocene alluvium. Gently sloping alluvial fans (bajadas) separate mountain blocks from lowland valleys, which contain active channels (Bacon et al., 2010). The alluvial fans are also known as piedmont surfaces, where desert pavement has formedon the sur­ face of partially consolidated alluvial deposits. Desert pavement has a one- or two-particle thick layer of clo­ sely packed, darkly varnished gravel (>2 mm) and cobble (>60 mm) overlaying a vesicular layer of eolian fineswith low permeability (McFadden et al., 1987; Springer, 1958). Documented infiltrationrates in this area are less than 10 mm/hr (McDonald et al., 2004), which is substantially lower than saturated hydraulic conduc­ tivities measured on other surfaces in the region (Caldwell et al., 2012). Desert pavement surfaces have sparse vegetation consisting primarily of Larrea tridentata (creosote bush) and Ambrosia dumosa (white bursage) (Shreve & Wiggins, 1964). Vegetation is more abundant and diverse in riparian areas, including large woody plants such as Parkinsonia spp. (palo verde), O/neya tesota (iron wood), and Prosopis spp. (mesquite). Streams in Mohave and Yuma Washes have been classifiedinto fivegeomorphic types: piedmont headwater (PH), bedrock (BK), bedrock with alluvium (BA), incised alluvium (IA), and braided (BR) (Sutfin et al., 2014). Piedmont headwater channels are incised into desert pavement, and bedrock channels are headwater streams incised into bedrock. Bedrock with alluvium channels are partially confined by bedrock but have enough persistent alluvium in the channel to create bedforms. Incised alluvium channels are cut into the par­ tially consolidated alluvial material of the piedmont, and they also contain modern alluvial bedforms. Braided

KAMPF ET AL. 9936 Water Resources Research 10.1029/2018WR023714

Figure 1. Location ofthe studywatersheds within Arizona showing stream stage and rainfallmonitoring locations (NAIP imageryfor Yuma; ESRIfor Walnut Gulch).

channels are the furthest downstream in the network, and they are large multithreaded channels underlain by deep modern alluvium. All channels have ephemeral flow, and those with modern alluvial fill have high permeability that allows rapid infiltration of channel flow (Kampf et al., 2016). The semiarid study area is the Walnut Gulch experimental watershed (Figure 1), where mean annual precipita­ tion is 312 mm. Walnut Gulch is 31 km long, with a drainage area of 149 km2• Elevations are higher than in the Yuma area, ranging from 1,220 to 1,930 m. Bedrock geology is diverse, including sedimentary, plutonic, and volcanic rocks with some modern alluvium along channels. Soils are predominantly sand and gravel loams formed on surfaces of the bedrock (Osterkamp, 2008). Land cover in Walnut Gulch is a mixture of shrubs, grass, and oak woodland, and the watershed also contains the small town of Tombstone, AZ (Skirvin et al., 2008). Channels in Walnut Gulch are all ephemeral, and they have width:depth ratios ranging from 2 to 100 (Miller, 1995). This range is consistent with the range of width:depth ratios for Mohave and Yuma Washes, with the exception of the braided channels, which have width:depth ratios >100 (Sutfin et al., 2014).

3. Methods 3.1. Instrumentation Stream stage was monitored at 18 locations in Mohave and Yuma Washes (Table 1) using In-Situ Inc. Rugged TROLL 100 pressure transducers (In-Situ, Fort Collins, CO, USA); monitoring locations were clustered near the watershed outlets due to accessibility. Study sites had drainage areas of 0.002-225 km2, and each watershed had stage measurements at two sites per geomorphic stream type except braided, which had one monitor­ ing location per watershed. Sensors were placed inside vented PVC pipes for protection and bolted onto bed­ rock channel beds, banks, or trees to keep them in place during flow. The pressure transducers were unvented, so we corrected for barometric pressure using In Situ Inc. Rugged BaroTROLL loggers located at the braided sites. All pressure sensors were recorded at 15-min time steps. Rainfall was monitored with nine tipping bucket rain gauges, which were models RG3-M, TE525, or TB4 (Onset Corporation, , MA, USA;

KAMPF ET AL. 9937 Water Resources Research 10.1029/2018WR023714

Table 1 Stream Stage Monitoring Locations, Characteristics of Their Contributing Areas and Channel Monitoring Locations, and StreamflowEvent Summaries at Mohave and Yuma Washes Grouped by Channel Geomorphic Type Mean Annual Number Site Area (km2) Site Elevation (m) Dates Recordinga Number Flow Events Flow Events YPHl 0.021 156 3/2012-3/2015 8 3 YPH2 0.002 154 3/2013-3/2015 10 5 MPHl 0.061 226 11/2011-3/2015 16 5 MPH2 0.049 242 2/2013-3/2015 10 3 YBKl 0.005 225 11/2011-5/2014 7 2 YBK2 0.013 166 3/2013-5/2014 2 2 MBKl 0.093 464 11/2011-5/2014 8 3 MBK2 0.015 384 2/2013-5/2015 3 YBAl 2.2 211 11/2011-3/2015 0 YBA2 1.7 207 3/2013-10/2013 0 0 MBAl 0.9 279 11/2011-3/2015 8 2 MBA2 0.8 328 3/2013-3/2015 4 2 YIA1 3.6 176 11/2011-3/2015 6 2 YIA2 5.7 162 3/2013-5/2014 0 0 MIA2 3.0 249 2/2013-3/2015 4 2 MIAl 170 270 11/2011-3/2015 4 YBDl 170 116 2/2012-3/2015 2 1 MBDl 225 201 4/2012-5/2014 0 Note. Site IDs beginning with Y are in Yuma Wash, and those beginning with M are in Mohave Wash. PH are piedmont headwater, BK are bedrock,BA are bedrock with alluvium, IA are incised alluvium, and BD are braided channel locations. aSome stations have data gaps within the period of record. Date formats are MM/YYYY.

Texas Electronics, Dallas, TX, USA; Hydrological Services Pty Ltd, Sydney, Australia) connected to either Campbell Scientific loggers (Campbell Scientific, Logan, UT, USA) or Hobo event loggers (Onset Corporation, Bourne, MA, USA). The full study duration was from November 2011 to March 2015, but due to variability in the timing of sensor installation, instrument failures, and other logistical issues, the period of record varied by location (Table 1). Tablel Stream Monitoring Locations, Characteristics of Their Contributing For Walnut Gulch, we used the 2000-2016 discharge records from 19 Areas and Channel Monitoring Locations, and Streamflow Event Summaries (Table 2). These flumes are installed on streams that have drainage at Walnut Gulch areas ranging from 0.002 to 149 km2 . Sites with drainage areas >2 km2 Area Site Elevation Number Flow Mean Annual Number have supercritical flumes, and the smaller watersheds have Smith flumes. 2 Site (km ) (m) Events Flow Events All flumes have continuous stage measurements from potentiometers 105 0.002 1361 101 6 attached to stilling well gears (Stone et al., 2008). We used the extensive 106 0.003 1358 122 8 digital rain gauge network for the watershed, which includes 88 Belfort 126 0.01 1503 52 3 weighing rain gauges (Belfort Instrument, Baltimore, MD, USA) that were 102 0.02 1360 116 7 continuously monitoring precipitation since 2000 (Goodrich et al., 2008). 112 0.02 1513 64 4 103 0.04 1362 126 8 104 0.05 1350 104 7 3.2. Radar Precipitation 121 0.05 1359 102 6 The rain gauge network at Mohave and Yuma Washes was not as extensive 125 0.06 1336 79 5 as at Walnut Gulch. Therefore, to examine spatial patterns of rainfall across 4 2.27 1359 87 5 11 8.24 1426 118 7 the watersheds, we used NEXRAD radar data from the KYUX station in 3 8.98 1335 100 6 Yuma, Arizona, available through the NOAA National Center for 7 13.52 1298 92 6 Environmental Information (https://www.ncei.noaa.gov/). This station is 10 16.63 1372 105 7 66-70 km south of the rain gauges in Yuma Wash and 103-108 km south 9 23.59 1372 135 8 of the rain gauges in Mohave Wash (Figure 1 ). Starting in May 2012, the 15 23.93 1355 83 5 6 95.10 1345 127 8 KYUX station was upgraded to dual-polarization radar, so we examined 2 113.72 1292 168 11 radar data from May 2012 through March 2015. We used the level 3 one­ 1 149.33 1227 174 11 hour precipitation NEXRAD product, which gives rainfall totals for a mov­ Note. All sites were recording continuously between 2000 and 2016 ing hour-long window at irregular time steps averaging 8 min in duration. except 126, which recorded from 2011 to 2016. Site information from To evaluate whether radar rainfall values were consistent with rain gauge https://www.tucson.ars.ag.gov/dap/flume_locations.asp. data, we extracted radar rainfalltime series for each pixel containing a rain

KAMPF ET AL. 9938 Water Resources Research 10.1029/2018WR023714

gauge and compared 60-min intensities between the radar and the rain gauges (Anagnostou et al., 2010; Xie et al., 2006). We tested for time offsetsbetween radar and rain gauge data (Morin et al., 2003) and determined that no time offset adjustment was needed. However, because time steps did not match for radar and rain gauge data, we used a linear interpolation function in Matlab (version R2013a, MathWorks, Natick, MA, USA) to resample the radar values to the same time steps as the gauge values. We then computed a series of statistics to evaluate the accuracy of the radar values: false positive (FP), the percent of time steps when radar has rainfall but the gauge does not; false negative (FN), the percent of time steps when radar does not record rainfall but the rain gauge does; coefficient of determination (R2); and root-mean-square error. Following this analysis, to map spatial patterns of rainfall across the study area, we used a mean field bias correction (Goudenhoofdt & Delobbe, 2009) and multiplied radar intensities by 0.67, the slope of the linear fit between all radar and rain gauge values (Figure 51).

3.3. Rainfall and Streamflow Patterns We examined spatial and temporal rainfall patterns using both the rain gauge and radar data. We defined discrete rainfall events as any sequence of rainfall separated by at least 7 hr from the next period of rain. This interevent time limit was selected based on the range of interevent times reported elsewhere (Dunkerley, 2008) and the maximum flow duration observed at the watershed outlets in Mohave and Yuma Washes (Faulconer, 2015). To avoid including spurious tips of the rain gauges in event counts, we only considered events that reported at least 1 mm of rainfall and a maximum 60-min rainfallintensity >1 mm/hr. For each defined event, we computed the total rain depth (P) and the maximum intensities at 15-, 30-, and

60-min time steps (Ml 15, M'30, Ml60) for all rain gauges. For each rainfall event, we documented whether or not the channels flowed. At Mohave and Yuma Washes, we identified flow events as times where the stage rose above the background noise in the sensor values, or >2 cm above the preevent stage. At Walnut Gulch, we identifiedflow events using any reported flow from the records. For each monitoring location we computed flow frequency (fraction of events

causing flow, Pobs (flow)) as the number of > 1-mm events with streamflow divided by the total number of > 1-mm events.

3.4. Rainfall Thresholds We evaluated whether flow occurrence at each monitoring location could be predicted by a rainfall thresh­ old. For a given rainfall event a threshold flow prediction is represented as

if rain > T, flow (1) if rain ::ST, no flow

where rain is any of the rainfallmetrics (P, Ml 15, Ml30, Ml60) and Tis the threshold. We identifiedT as the value

that optimizes the fraction of events correctly predicted, p0• This was accomplished by iterating through

values ofeT in increments of 0.1 units (mm or mm/hr). If multiple values ofeT gave the same p0 value, the lowest of the T values was selected as the threshold. For threshold analysis with rain gauge data at Mohave and Yuma Washes, we used the rain gauge closest to each stream monitoring location and computed thresholds for all rainfall metrics. For threshold analysis with

radar data, we used only Ml60 because this is the original unit provided in the radar product, and subhourly

time steps were irregular. We computed the radar maximum Ml60 as the maximum intensity forany time step

and any pixel in the drainage area. We also computed radar mean Ml60 by firstcomputing the mean Ml60 for each time step over all pixels in the drainage area, then taking the maximum of these drainage area mean

Ml60 values. For threshold analysis at Walnut Gulch, we computed thresholds for each flume using Ml60 values from each individual rain gauge in the flume's contributing area. This allowed us to evaluate how thresholds

vary with proximity of rain gauge. We also computed thresholds using the maximum and mean Ml60 for all rain gauges in the contributing area. We examined how flow frequency and thresholds varied with drainage area size for each study location inde­ pendently. We then tested for regional differencesin the scale dependence of these variables using a gener­ alized linear model of location (region), drainage area, and an interaction term. This analysis was performed using Proc GLM in SAS 9.4.

KAMPF ET AL. 9939 Water Resources Research 10.1029/2018WR023714

3.5. Multivariate Analysis To examine whether other variables besides rainfall affected streamflow occurrence, we used multivariate logistic regression in JMP version 13.0.0 (Klimberg & McCullough, 2012). For independent variables we used

rainfall (Ml60), drainage area, percent vegetation cover, mean slope, and antecedent precipitation index (API).

For Ml60, we used watershed mean values from rain gauges at Walnut Gulch; at Mohave and Yuma we used individual rain gauge values forsmall watersheds ( <0.1 km2) and radar mean values for all other watersheds. We computed drainage area and slope from 10-m digital elevation models using the watershed and slope tools in ArcGIS (ESRI, Redlands, CA, USA). For vegetation cover we used LANDFIRE Existing Vegetation Cover (LANDFIRE, 2014), which gives cover in 10% interval bins, and with these values we computed the area-weighted mean vegetation cover for each drainage area. We computed API following the method of Linsley et al. (1982) as

t APIt ¼ Pt þ P0k (2)

where tis the day, PO is the daily total precipitation forthe most recent prior day with rain, and k is a constant.

For any day with rainfall, PO resets to that day's total precipitation, and t resets to 0. We set k to 0.8 to repre­ sent rapid loss of moisture based on observed water contents in the floodplainsnear the study area channels (Kampf et al., 2016). We created logistic regressions to predict flow occurrence for sample subsets of Mohave and Yuma, Walnut Gulch, and all study sites combined. Each regression used log-transformed and standar­ dized values of the independent variables. We tested for collinearity of the independent variables and found that all of the independent variables were significantly correlated with one another. Therefore, we then con­

ducted principal component analysis on the independent variables. Ml 60 had the strongest correlation to PCl (0.86), followedby API (0.82); slope had the highest correlation to PC2 (0.56), cover to PC3 (0.79), and area to PC4 (0.47).

3.6. Performance Evaluation

We evaluated both rainfall threshold and logistic regression representations of flow using four metrics: p0, K,

FP, and FN. The observed agreement, p0, is the number of event responses (p0(flow) or p0(no flow)) correctly represented by the threshold divided by the total number of events. The kappa statistic of agreement, K, is computed as κ ¼ po pe (3) 1 pe

¼ ðÞðÞþðÞðÞ pe pobs flow pT flow pobs no flow pT no flow (4)

where Pobs is the fraction of observations of a specifiedtype (flowor no flow) and Pr is the fraction of thresh­ old predictions of the specified type. Higher values of K indicate greater agreement between the threshold predictions and the observations. Following Viera and Garrett (2005), values of K > 0.4 indicate moderate or better agreement, values >0.6 indicate substantial agreement, and values >0.8 indicate almost perfect agreement. We computed the false positives (FP) as the percent of all events for which flow was predicted when no flow was observed and falsenegatives (FN) as percent of all events for which flow was not predicted when flow was observed.

4. Results 4. 1. Rainfall From November 2011 to March 2015 the highest total rainfall in Mohave and Yuma Washes was in upper Mohave Wash (>500 mm) and the lowest in lower Yuma Wash (<200 mm). Many >1-mm rain events were highly localized; 15% had rain recorded at only one gauge, and only 21% reported rain at all gauges. Of the recorded rain storms, 54% were in summer (May-October; Hallack-Alegria & Watkins, 2007), and these had higher mean precipitation and maximum intensities than winter storms. For pixels containing rain gauges, timing of radar rainfall was similar to gauge rainfall, with FP and FN errors < 1 %

KAMPF ET AL. 9940 Water Resources Research 10.1029/2018WR023714

Table 3 (Table S1 ). Overall the fit between radar intensities and gauge intensities Performance of RainfallMetrics fo r Threshold Predictions of Flowor No Flow was good (R2 = 0.70). Radar rainfall totals were highest in upper Mohave RainfallVariable Data Source Po K FP FN and lowest in lower Yuma (Figure S3a). Over the nearly three-year per­

Mohave, Yuma iod of record for the radar data, Ml60 values for individual radar pixels were 10 mm/hr or higher for up to seven events in Yuma Wash and Closest gauge p 0.92 0.66 2.5 5.3 up to 12 events in Mohave Wash (Figure S3b). Nearly all of these Ml1s Closest gauge 4.8 0.91 0.62 4.5 high-intensity storms were in the summer. Ml30 Closest gauge 0.91 0.61 2.2 6.4 Ml6o Closest gauge 0.93 0.71 2.8 4.2 In Walnut Gulch, during the period of measurement at Mohave and Ml max Radar 60 0.94 0.44 1.4 3.6 Yuma, total precipitation was more than twice that of Mohave and Ml60 mean Radar 0.95 0.53 0.8 3.8 Walnut Gulch Yuma Washes, ranging from 931 to 1271 mm (Figure S4a). Highest rain­ fall totals were in the southeast half of the watershed, although the pat­ Pmax All rain gauges 0.87 0.47 2.0 10.9 Pmean All rain gauges 0.88 0.47 3.2 9.0 terns were locally variable. On average, 80% of the total precipitation Ml60 max All rain gauges 0.91 0.61 3.1 5.4 fell during the summer. During the May 2012-March 2015 time period, Ml6o mean All rain gauges 0.90 0.62 2.7 7.1 rain gauges at Walnut Gulch recorded 67 to 108 events with Ml60 Note. Po is the fractionof events correctly predicted, K is the kappa agree- exceeding 10 mm/hr (Figure S4b). This is nearly an order of magnitude ment statistic, FP is the percent of false positives, and FN is the percent falsenegatives. higher than the number of high-intensity events in Mohave and Yuma Washes. On average, 99% of these high-intensity events were during the summer season (May-October). Of the > 1-mm rainfall events, only 10% were recorded at all rain gauges, and on average, 45% of the gauges reported rainfall during an event.

4.2. Streamflow At Mohave and Yuma Washes, streamflow was infrequent, ranging from O events at YBA2 and YIA2 to 16 flow events at MPHl (Table 1) over the two- to three-year monitoring period. Seventy-seven percent of recorded flows were in summer. Winter flows were recorded in piedmont headwater sites, Yuma bedrock sites, one bedrock with alluvium site (MBAle), and one incised alluvium site (YIAle). Flows were highly localized, with an average of 30% of the monitoring sites recording flow during runoff-producing storms. Only one storm (13 July 2012) produced flowat all gauges, and this storm had the largest gauge and radar average precipita­ tion at 87 and 69 mm, respectively (Figure S5 and Movie S1). At Walnut Gulch, all stations recorded streamflowduring the monitoring period, and streams flowedan aver­ age of 3-11 times per year (Table 2). Flows were heavily concentrated in summer. More than 95% of flow events were during summers for the sites with drainage areas <0.1 km 2, and 98-100% of flow events were in summer in all other sites. The flows were usually localized in portions of Walnut Gulch, with 31 % of flumes on average recording flow during runoff-producing storms.

4.3. Rainfall Thresholds for Streamflow

All rainfall metrics tested work well for threshold prediction at Mohave and Yuma Washes (Table 3; p0 :::". 0.91, IC:::". 0.44). Precipitation depth (P) thresholds generally increased with drainage area from 5 to 32 mm when using the closest rain gauge, with higher thresholds in Mohave than Yuma Wash (Figure 2a). These thresholds

had moderate or better performance (IC>0.4) at most of the sites (Figure 2b). Ml60 thresholds performed best

overall using the closest rain gauge (Table 3). Threshold values from radar maximum Ml60 ranged from 5 to 14 mm/hr in watersheds <10 km2 and from 18 to 19 mm/hr in the watersheds >100 km2 (Figure 2c), with the exception of three sites that had thresholds >40 mm/hr (YPH2, MBA 1, MBDl ). The radar threshold perfor­

mance using the maximum Ml60 was fair or poor at most sites (Figure 2d). Mean Ml60 thresholds from radar ranged from 3 to 13 mm/hr (Figure 2e) and had a weak increase with the log of drainage area. These thresh­ olds performed well at most sites, with the exception of small watersheds <0.1 km2 (Figure 2f) because these are smaller than the ~ 1-km2 resolution of the radar.

Threshold values for Walnut Gulch varied with the rainfall metric used; overall, Ml60 worked better (IC= 0.61-

0.62 for maximum and mean Ml60, respectively) than P (IC= 0.47 for both maximum and mean; Table 3). Ml60 threshold values derived from individual rain gauges tended to increase with rain gauge distance from the flume (Figure 3a), and the performance of these threshold predictions declined sharply with distance from the flume (Figure 3b). The ICvalues indicated moderate or better performance (>0.4) for small watersheds 2 (<25 km ) in which the rain gauge was <2.5 km from the flume. When threshold analysis used rain

KAMPF ET AL. 9941 Water Resources Research 10.1029/2018WR023714

Figure 2. (a, c, and e) Rainfall thresholds versus drainage area and (b, d, and f) threshold performance K statistic versus drainage area for Mohave (M), Yuma (Y), and Walnut Gulch using the event total precipitation (P) from the (a and b) closest rain gauge, (c and d) maximum storm Ml60 across the watershed, and (e and f) mean storm Ml60 across the watershed. Rainfall values in (e) and (f) are from radar forMohave and Yuma and all rain gauges in the watershed at Walnut Gulch. Dashed horizontal line indicates the performance value above which threshold prediction is considered "moderate." For (c), three sites had thresholds higher than the range shown: YPH2 40 mm/hr and MBA 1 and MBDl 44 mm/hr.

gauges from across the watershed area, watershed mean P thresholds ranged from 12 to 27 mm, but most of these had only fair or poor agreement with observations (Figures 2a and 2b). Both the maximum and mean

Ml60 from all watershed rain gauges performed better for threshold prediction (Table 3; p0 2: 0.90, K 2: 0.61 ), with moderate or better performance in the small watersheds (<1 km2) and declining performance in larger

watersheds (Figure 2d). The maximum Ml60 threshold values increased with drainage area from around 6 mm/hr in the smallest watersheds up to over 26 mm/hr in the largest watersheds (Figure 2c). In contrast,

the mean Ml60 thresholds were in the 7-14-mm/hr range for all watersheds (Figure 2e). Thresholds for 2 maximum and mean Ml60 were the same for watersheds <0.1 km ; the difference between these

KAMPF ET AL. 9942 Water Resources Research 10.1029/2018WR023714

Figure 3. (a) Ml60 rainfallthresholds fromindividual rain gauges and (b) K performance statistic versus distance from rain gauge to flume for Walnut Gulch. Small watersheds are grouped together by drainage area, and the three largest watersheds (6, 2, 1) each have their own color and symbol. Dashed horizontal line indicates the performance value above which threshold prediction is considered "moderate."

thresholds increased with the log of drainage area for watersheds >1 km2 (Figure 4b). This difference in threshold values with drainage area was also evident in P thresholds (Figure 4a).

Thresholds for streamfloware inversely related to flowfrequency (p0(flow))because lower thresholds lead to more frequent streamflow.At Mohave and Yuma Washes, the piedmont headwater (PH) streams flowed 8-16

times, with 0.38 average p0(flow) (Figure 5). Bedrock (BK) headwater streams had lower average p0(flow) (0.22), with two to eight events recorded at each site. The larger drainage area bedrock with alluvium (BA)

sites flowed 0-8 times, with p0(flow) averaging 0.11. Similarly, incised alluvium (IA) sites flowed 0-6 times,

with 0.11 average p0(flow). Finally, the furthest downstream braided (BD) sites flowed 1-2 times, with mean

p0(flow) of 0.05. Flow frequencies (p0(flow)) values at the hyperarid Mohave and Yuma sites were higher than those in Walnut Gulch for small watersheds but not for large watersheds. The <0.1-km2 watersheds had

p0(flow) between 0.10 and 0.24, lower than corresponding small watersheds at Mohave and Yuma, whereas 2 the larger watersheds had p0(flow)between 0.09 and 0.14. For watersheds <1 km , generalized linear model

coefficients confirm that mean p0(flow) was higher (p = 0.020) and declined with drainage area faster (p = 0.014) at Mohave and Yuma Washes than at Walnut Gulch (Table S2). Flow frequency in watersheds >1 km2 did not differ among Mohave/Yuma and Walnut Gulch and declined with drainage area at similar rates in both regions. We also evaluated how thresholds varied with other independent variables describing watershed character­ istics. Comparing the two study regions, rainfall thresholds generally increased with watershed vegetation cover and decreased with watershed mean slope (Figure 6), although this pattern was not clearly evident within regions. Cover is higher, and slope is lower at most Walnut Gulch sites compared to Mohave and

Figure 4. Differencein (a) P and (b) Ml60 thresholds between maximum and mean watershed values versus drainage area forWalnut Gulch.

KAMPF ET AL. 9943 Water Resources Research 10.1029/2018WR023714

Yuma Washes, and this is associated with higher threshold values. Standardized coefficients from multivariate logistic regression analyses illustrate the relative importance of rainfall and watershed characteristics forflow occurrence (Table 4). Standardized slope coefficientswere highest for Ml60 in both regions (-2.1 to -3.5), and logistic regression with Ml60 alone performed nearly as well as logistic regression with the other inde­ pendent variables included. Regressions using rainfall alone were 80% accurate for Mohave and Yuma (radar rainfall) and 92% accurate for Walnut Gulch (rain gauge average). Adding area, slope, cover, and API improved regression performance to 84% accuracy at Mohave and Yuma but did not affect model accuracy at Walnut Gulch; collinearity of the inde­ pendent variables limited the value of adding the additional information. Comparison of standardized coefficientsdemonstrates that drainage area Figure 5. Flow frequency (po(flow)) versus drainage area for Mohave Wash (M), Yuma Wash (Y), and Walnut Gulch. Flow frequency is the fraction of (0.8) and mean slope (0.2) were the most influential watershed character­ >1-mm/hr rainfall events that produced flow, where rainfall values were istics in Mohave/Yuma,whereas vegetation cover and API had little effect. radar mean MI60 for Mohave and Yuma and rain gauge mean MI60 for For Walnut Gulch, slope, cover, and API were all significantvariables in the fi Walnut Gulch. Line ts shown as solid lines with dashed lines for 95% regression, although accuracy did not improve with the addition of more confidence intervals. independent variables. API (standardized coefficient 0.4) was the most important watershed characteristic followed by vegetation cover (-0.3) and mean slope (0.2). The logistic regression using all sites combined was similar to the model for Walnut

Gulch because of the larger data set compared to Mohave and Yuma Washes (p0 = 0.91, ,c = 0.56). The major­ ity of errors in the logistic regression predictions were false negatives (Table 4), similar to the errors forrainfall threshold predictions (Table 3).

5. Discussion 5.1 . Threshold Performance and Variability At both study areas, rainfallthresholds are strong predictors of streamflowoccurrence, except in some of the 2 >1-km Walnut Gulch watersheds. Our results show slightly stronger threshold predictions with MI60 than with total event precipitation (P) or shorter-duration maximum intensities (MI15,MI30), although the other rainfall metrics work nearly as well (Table 3) because these rainfall variables are highly correlated with one another (R2 = 0.95–0.98). Prior studies in this region have focused on shorter-duration maximum intensities (30 min or less), based on the idea that short-duration, high-intensity bursts of rainfall are responsible for overland flow generation (Osborn & Lane, 1969; Syed et al., 2003). However, similar to our findings, Koterba (1986) found that models predicting total and peak flow had similar performance with 10-, 20-, and 30-min intensities, and model performance tended to increase with duration considered. Given this ten- dency for better predictions using longer durations, we focused on 60 min because of the improved

Figure 6. MI60 rainfall thresholds versus (a) watershed percent vegetation cover and (b) watershed average slope. Threshold values are those derived from radar mean MI60 (Mohave, Yuma) and rain gauge mean MI60 (Walnut Gulch).

KAMPF ET AL. 9944 Water Resources Research 10.1029/2018WR023714

Table 4 Logistic Regression Equations and Performance for Representing Flowor No FlowResponse Yuma, Mohave Walnut Gulch All Sites

Coefficient MI60 All MI60 All MI60 All

b1 (Ml6ol -1.8 (0.25)* -2.1 (0.34)* -3.1 (0.07)* -3.5 (0.10)* -3.1 (0.07)* -3.4 (0.09)* b2 (area) 0.82 (0.22)* -0.07 (0.04) -0.10 (0.04)* b3 (slope) 0.19 (0.09)* 0.15 (0.05)* 0.20 (0.04)* b4 (cover) -0.12 (0.13) -0.28 (0.05)* -0.15 (0.04)* b5 (API) 0.04 (0.13) 0.37 (0.07)* 0.33 (0.06)* R2 0.23 0.34 0.42 0.42 0.41 0.41 AICc 295 211 6779 6444 7091 6697 Po 0.80 0.84 0.92 0.92 0.91 0.91 K 0.32 0.50 0.57 0.57 0.56 0.56 FP 4.0 3.8 2.0 2.0 2.1 2.0 FN 15.8 12.8 6.5 6.5 6.6 6.6

Note. All independent variables are log-transformedand standardized. Ml60 is the watershed mean value, API is the antecedent precipitation index, and bs are the regression coefficientsfor the indicated variables followed by the standard error in parentheses. *This indicates significance (p < 0.05).

performance and the longer period of integration, which facilitates comparisons of the tipping bucket rain gauge data with the irregular time step radar data.

Ml60 thresholds for a single site can range from Oto 32 mm/hr depending on the rainfall data used (Figure 3a). We found that rain gauges >2.5 km away fromstreamflow monitoring locations were unreliable for threshold predictions, and these tended to give the highest threshold values (Figure 3b). The <2.5-km distance needed for reliable rainfall information is consistent with the size of documented runoff-producing storms in Walnut Gulch, which have 1-2-km storm cores at a given time; outside of the core, rainfall declines sharply, extending 2.5-4 km from the storm center (Osborn & Laursen, 1973). Consequently rain gauges outside the storm core may miss the high rainfallintensities that produce runoff(Syed et al., 2003). At Walnut Gulch, the gauge spacing is about 1.5-2 km, allowing the gauge network to capture the runoff-producing rains for all subwatersheds. Few sites in the world have the rain gauge density of Walnut Gulch, and because of the unreliability of rain gauges >2.5 km from the watershed outlet, many applications may need to use radar data to determine thresholds. With the type of radar available for Mohave and Yuma, rain gauges are needed for threshold prediction in watersheds < 1 km2 because radar resolution is ~1 km, causing it to miss some of the localized high-intensity rains (Figure 2d). For watersheds larger than 1 km 2, radar is a useful supplement to rain gauge data. Radar data are affected by spatial and temporal scaling issues (Tustison et al., 2001), ground clutter (Chang et al., 2009; Matyas, 2010), as well as other uncertainties such as range-dependent bias and vertical variability in precipitation (Seo & Krajewski, 201 O; Smith et al., 1996; Villarini & Krajewski, 2010), so they should be calibrated to multiple rain gauges before being used for threshold predictions. Prior research has attributed the decline in flow frequency (Figure 5) and increase in thresholds (Figure 2) with drainage area to the combination of partial area storm coverage and transmission losses through the channel network (Goodrich et al., 1997). Our results help separate these two factors because differences in

threshold values between maximum and mean watershed-scale Ml60 relate to partial area storm coverage (Figure 4). Partial area storm coverage does not change threshold values for watersheds <1 km2, as indicated

by no difference in threshold values for the maximum Ml60 and the mean Ml60 over the watershed (Figure 4b). These watersheds are small enough that storm cores can cover the full drainage area. For water­ 2 sheds > 1 km , the difference between maximum and mean Ml60 increases with the log of drainage area (Figure 4b) because storm cores cover smaller portions of the total watershed area. The differencebetween 2 threshold values using the maximum Ml60 is greater than values using mean Ml60 by 4 mm/hr for a 2-km watershed up to 18-21 mm/hr for watersheds greater than 16 km2 (Figure 4b). This is likely because in large watersheds the maximum intensity of a storm must be high to produce sufficientoverland flow to overcome transmission losses and reach watershed outlets (Goodrich et al., 1997; Kirkby et al., 2005; Simanton & Osborn,

1983). In contrast, averaging the Ml60 across all contributing gauges (Figure 2e) dampens the partial area coverage effect on thresholds, removing the scale dependence in rainfall threshold values that is apparent

KAMPF ET AL. 9945 Water Resources Research 10.1029/2018WR023714

using maximum Ml 60 at Walnut Gulch (Figure 2c). We therefore recom­

mend using the drainage area mean Ml60 from a dense rain gauge net­ 2 work or from radar for threshold prediction in watersheds > 1 km , as this avoids confounding the partial storm coverage effect on threshold values with the transmission loss effect.

While a transmission loss effect is not clear in mean Ml60 thresholds for Walnut Gulch (Figure 2e), transmission loss is evident for flow frequency, which declines with increasing drainage area (Figure 5). Prior research has also demonstrated that transmission losses affect flow magnitude in these watersheds (Goodrich et al., 1997). Possibly the uncertainties in defining rainfall thresholds made this scaling behavior less evident in threshold values than in flow frequency, particularly because the 2 Figure 7. Event precipitation thresholds versus drainage area for this study thresholds did not perform well for most of the > 1-km watersheds 1 compared to other studies in arid and semiarid watersheds. This study, (Figure 2d). The transmission loss effect on threshold values is more 2Osborn & Lane (1969), 3Cammeraat (2004), 4Mayor et al. (2011 ), 5Yair & Klein apparent for Mohave and Yuma Washes (Figure 2e). The flow frequency (1973), 6Schick (1970) in Greenbaum et al. (2006), and 7Ries et al. (2017). also has a steeper decline with drainage area at Mohave and Yuma compared to Walnut Gulch (Figure 5 and Table 52). This could be because transmission losses are even greater in those watersheds, particularly for the extremely wide braided channels; excluding the braided channels, the area dependence of flow frequency (Figure 5)

and mean Ml60 thresholds (Figure 2e) would not be as evident.

5.2. Threshold Comparisons to Other Locations Studies of rainfall thresholds for overland flow generation in other regions have used a wide variety of rainfall metrics, including intensities over varying durations and total precipitation. The only consistent metric across all studies is depth of precipitation (Pl, so we use P to compare thresholds between regions. We identified comparable study sites in Spain, which have mean annual precipitation ranging from 235 to 274 mm (Cammeraat, 2004; Mayor et al., 2011; Rodrfguez-Caballero et al., 2014; Figure 7) and Israel, which have mean annual precipitation from 28 to 472 mm (Greenbaum et al., 2006; Ries et al., 2017; Vair & Klein, 1973). For the sites in Spain, where the mean annual precipitation is similar to Walnut Gulch, reported thresholds mostly fall within the range of the precipitation depth thresholds we identified (5-35 mm). These studies found increases in precipitation thresholds with increased drainage area (Cammeraat, 2004; Mayor et al., 2011). Spatial patterns of rainfall were not the focus, but the catchments examined were mostly smaller than the ~ 1-km2 area at which spatial variability in rainfall changes threshold values (Figure 4a), indicating that trans­ mission losses likely caused the threshold-scale dependence. In Israel, small ( < 1 km2 watershed studies were ) in the Negev desert, where the surface cover is mostly bare rock (Greenbaum et al., 2006), resulting in thresh­ olds at the lower end of the range we found in Arizona ( < 1 O mm). Larger watersheds in Israel examined by Ries et al. (2017) have a steep precipitation gradient, ranging from <200 mm at the watershed outlets up to over 600 mm in the headwaters. Their vegetation cover ranges from 21 to 24%, which is a similar cover range to Mohave and Yuma watersheds, yet the documented thresholds for streamflow were near 50 mm, higher than any of the thresholds we found. Ries et al. (2017) had a network of 15 rain gauges, but the spacing between some gauges was more than 20 km. Based on our findings, the thresholds they identified would likely have been lower if they had a denser rain gauge network and used watershed mean values. When all study regions are combined, precipitation thresholds increase significantly with the log of drainage area (Prob > F = 0.0002; Figure 7), but with considerable variability among study regions.

5.3. Other Factors Affecting Thresholds and Future Applications In addition to drainage area size, variability in threshold values can also relate to topography and ground sur­

face conditions. Comparing our two study areas, the Ml60 mean thresholds tended to be higher in Walnut 2 Gulch than Mohave and Yuma Washes, particularly for smaller ( <1 km ) watersheds. This may be due to dif­ ferences in ground cover between sites (Figure 6a), as surfaces with greater rock cover can generate runoffat lower rainfall thresholds (Abrahams & Parsons, 1991; Vair & Kossovsky, 2002). Mohave and Yuma headwater watersheds were either bedrock or desert pavement, and lower elevation surfaces had primarily 10-20% shrub cover. Of these land cover types, the piedmont headwater (PH) areas with desert pavement at

KAMPF ET AL. 9946 Water Resources Research 10.1029/2018WR023714

Mohave and Yuma had the highest flow frequencies. Low infiltration rates of desert pavement soils have been attributed to the saline-sodic soils below the rocky surface, where deflocculation of soil colloids during rainfall reduces infiltration rates (Musick, 1975). Comparisons of infiltration rates for desert pavements with different ages suggest that pedogenic processes can reduce infiltration rates over time (Meadows et al., 2008). Thresholds we documented on these surfaces are within the range of saturated hydraulic conductiv­ ities measured forrelatively old (4-100 ka) surfaces (3-6 mm/hr; Young et al., 2004). Consequently, the more frequent runoff on desert pavement is likely sensitive to disturbances that disrupt the pavement or soil below the rocky surface. Compared to desert pavement, watersheds with bedrock channels had less frequent flow because the exposed rhyolite and dacite is highly fractured, and surface topography is irregular enough to cause detention storage.

Pavement and bedrock were not prominent land cover types at the wetter and lower slope Walnut Gulch, where most surfaces had soil that allowed at least some infiltration. Walnut Gulch has a mixture of 30-40% cover of shrub- and grass-dominated vegetation. Saturated hydraulic conductivities measured at the surface of Walnut Gulch soils average 20 mm/hr for crusted soils and 25 mm/hr for uncrusted soils, but both soil types can have saturated hydraulic conductivities as low as 4-5 mm/hr (Becker et al., 2018). Rainfallthresholds from Walnut Gulch range from 7 to 16 mm/hr, which is lower than the mean but within the range of documented hydraulic conductivities. Research in semiarid Spain found that crusted areas were important sources of flow (Marchamalo et al., 2016; Rodrfguez-Caballero et al., 2014), and this may explain why the thresholds at Walnut Gulch are less than the average hydraulic conductivities. Vegetation patterns also affect infiltrationand over­ land flow and consequently the rainfallthresholds for response. Infiltrationcan be more common near shrubs than in bare spaces between, and the distribution of vegetation patches across catchments affects whether flow connects to downstream channels (Lesschen et al., 2009; Puigdefabregas et al., 1999). Further study of spatial patterns in soil crust and vegetation may help explain some of the variability in thresholds.

Although other studies have suggested that antecedent soil moisture may change the threshold required for overland flow (De Boer, 1992), API was only useful for threshold flow prediction in Walnut Gulch (Table 4). Extremely high evapotranspiration rates cause soils to dry rapidly at Mohave and Yuma Washes (Kampf et al., 2016), which may limit the role of initial wetness in runoff generation. The relatively short period of record may also have caused us to miss conditions in which antecedent conditions were important. While API was a significant predictor of flow occurrence for Walnut Gulch, it did not improve the performance of the logistic regression, potentially indicating that its role is also limited in Walnut Gulch. However, API was

significantly correlated to Ml 60 (r = 0.86), so its effects on flow could not be assessed independently. The API values we used were just a proxy for initial wetness, and measurements of soil moisture across the drainage areas could help determine if and how antecedent wetness affects threshold values. All thresholds we identified here were affected by our method of threshold identification. It is important to

use a consistent method when comparing between sites. Our method optimizes prediction accuracy (p0), and because these streams flowso infrequently, this results in mainly falsenegative errors. Optimizing to a different metric such as K would modify the threshold values. The methods we presented only determined presence or absence of flow, not flow magnitude, and they are intended to allow first-order comparison of rainfall-runoffbetween scales and regions. Further research could relate these findings to information on flow magnitude, more attributes of the storm systems such as size, velocities, and locations (Belachsen et al., 2017; Morin et al., 2006; Morin & Yakir, 2014), and/or attributes of the catchments such as drainage net­ work geometry (Dick et al., 1997; Kirkby et al., 2005).

One benefitof a threshold analysis is that it allows comparison across sites with differentgauge densities and precipitation records. A heavily instrumented watershed like Walnut Gulch is expensive to build and main­ tain. While this level of instrumentation is important for examining hydrologic processes, if we base our hydrologic understanding solely on locations with extensive long-term instrumentation, we will miss impor­ tant sources of variability in hydrologic response. Rainfall threshold analysis is a low-cost method that does not rely on measuring channel discharge, which is challenging in desert ephemeral streams. Thresholds enable comparisons of scaling behavior between sites (Figure 7) and allow us to examine the relative impor­ tance of rainfall and watershed properties on streamflow response (Table 4). Our results highlight how the choice of rainfall data affects both the magnitude of thresholds for an individual watershed and the apparent scale dependence of thresholds between watersheds. Therefore, future studies on the scaling of both rainfall

KAMPF ET AL. 9947 Water Resources Research 10.1029/2018WR023714

thresholds and runoffvolumes should incorporate dense rain gauge networks and/or radar rainfall data for larger watersheds. In future applications, rainfallthresholds of flow initiation can be used in warning systems as lower envelopes of the rainfall conditions likely to cause flooding in dryland ephemeral streams. Although such warning systems focus on flow magnitude (Carpenter et al., 1999; Clark et al., 2014; Gourley et al., 2014; Martina et al., 2005; Morin et al., 2009; Reed et al., 2002, 2007), it can be difficultto forecastephemeral stream­ flow accurately, and the threshold-based predictions developed here can help reduce some of the false posi­ tive or false negative errors in existing prediction systems. Field hydrologists can also use rainfall thresholds to predict times of erosion and transport and determine when fieldvisits for sample collection are needed. Rainfallthresholds combined with rainfall frequency information are also useful for riparian restora­ tion applications, as they can help predict flow frequency, which affects plant water availability and seed dispersal.

6. Summary and Conclusions Ephemeral channels in the hyperarid Mohave and Yuma Washes experienced an order of magnitude fewer high-intensity storms than those in semiarid Walnut Gulch, but the two sites had similar flow frequencies and thresholds for streamflow generation. The occurrence of streamflow could be reproduced to >90% accu­ racy by 60-min rainfallintensity thresholds, except in the larger watersheds of Walnut Gulch. The magnitudes of these thresholds varied with the choice of rain gauge data, whether aggregated across watersheds or using a single gauge location. Single rain gauges were only useful for threshold prediction in watersheds <25 km2, where the rain gauge was within 2.5 km of the watershed outlet. For watersheds longer or wider than 2.5 km, multiple rain gauges and/or radar rainfall data are needed for reliable threshold prediction. Scale dependence of threshold values can be an artifact of the choice of rainfall data, as incomplete rainfall information can lead to increases in thresholds with greater drainage area. To provide a consistent metric for comparing thresholds across differentwatersheds, we recommend using watershed mean rainfallintensities, which dampen the scale dependence of thresholds caused by partial area storm coverage. Thresholds using watershed mean intensities have lower scale dependence in semiarid than in hyperarid watersheds. Thresholds are lowest for small hyperarid watersheds with relatively impermeable desert pavement or bed­ rock compared to the semiarid watersheds that have higher soil and vegetation cover. Although many hydrologic applications require information on total flow magnitude, obtaining such data is challenging in ephemeral streams, which have dynamic channel geometry and unstable rating curves. Better understanding of rainfall thresholds that produce streamflow can help fill in gaps in discharge records and improve prediction of when and where flow is likely. The thresholds determined forour study sites are con­ sistent with the ranges of thresholds documented in other arid and semiarid regions and are within the range of saturated hydraulic conductivities documented forthe study areas. Future research in new study areas can add to the data sets presented here and help refinemethods for using scale, land cover, and other watershed properties to predict rainfall thresholds in ungauged streams.

Acknowledgments This research was funded by The References Strategic Environmental Research and Abrahams, A. D., & Parsons, A. J. (1991 ). Relation between infiltration and stone cover on a semiarid hillslope, southern Arizona. Journalof Development Program (SERDP) through Hydrology, 122(1 -4), 49-59. https://doi.org /10.1016/0022-1 694(91 )90171-D contract RC-1725. Thanks to SERDP Anagnostou, M. N., Kalogiros, J., Anagnostou, E. N., Tarolli, M., Papadopoulos, A., & Borga, M. (201 0). Performance evaluation of high­ progra m manager John Hall, staff at resolution rainfall estimation by X-band dual-polarization radar forflash flood applications in mountainous basins. Journalof Hydrology, Yuma Proving Ground, and to Andrew 394(1-2), 4-16. https://doi.org/1 0.1016/j.jhydrol.201 0.06.026 Carlson, Ben Conrad, Keith Faulconer, Bacon, S. N., McDonald, E. V., Caldwell, T. G., & Dalldorf, G. K. (2010). Timing and distribution of sedimentation in response to Julie Kray, and Nick Sutfin for their strengthening of late Holocene ENSO variability in the Sonoran desert,southwestern Arizona, USA. QuaternaryResearch, 73(3), 425-438. contributions to fieldme asurements. https://doi.org/1 0.1 01 6/j.yqres.201 0.01 .004 Walnut Gulch data are available through Becker, R., Gebremichael, M., & Marker, M. (201 8). Impact of soil surface and subsurface properties on soil saturated hydraulic conductivity in the Agricultural Research Service the semi-arid walnut gulch experimental watershed, Arizona, USA. Geoderma, 322, 112-120. https://doi.org/10.1016/j.geoderma.2018. Southwest Watershed Research Center 02.023 at https://www.tucson.ars.ag.gov/dap/. Belachsen, I., Marra, F., Peleg, N., & Morin, E. (201 7). Convective rainfall in dry climate: Relations with synoptic systems and flash-flood gen­ Mohave and Yuma data are available on eration in the Dead Sea region. Hydrology and Earth SystemSciences Discussions, 1-32. https://doi.org/1 0.5 1 94/hess-201 7-235 request. Threshold analyses also Caldwell, T. G., Young, M. H., McDonald, E. V., & Zhu, J. (2012). Soil heterogeneity in Mohave desert shrublands: Biotic and abiotic processes. benefited from conversations on the Water Resources Research, 48, W09551. https://doi.org/10.1029/2012WR01 1 963 topic with Lee MacDonald, Codie Cammeraat, E. L. (2004). Scale dependent thresholds in hydrological and erosion response of a semi-arid catchment in Southeast Spain. Wilson, Tim Green, and Aditi Bhaskar. Agriculture, Ecosystems & Environment, 704(2), 317-332. https://doi.org/1 0.101 6/j.agee.2004.01.032

KAMPF ET AL. 9948 Water Resources Research 10.1029/2018WR023714

Carpenter, T. M., Sperfslage, J. A., Georgakakos, K. P., Sweeney, T., & Fread, D. L. (1999). National threshold runoffestimation utilizing GIS in supportof operational flash floodwarn ing systems. Journalof Hydrology, 224(1-2), 21-44. https://doi.org/10.1 0l 6/S0022-1 694(99)001 1 5-8 Chang, P. L., Lin, P. F., Jong-Dao Jou, B., & Zhang, J. (2009). An application of reflectivity climatology in constructing radar hybrid scans over complex terrain. Journalof Atmospheric and Oceanic Technology, 26(7), 1315-1327. https://doi.org/10.1 175/2009JTECHA 1 162.1 Clark, R. A., Gourley, J. J., Flamig, Z. L., Hong, Y., & Clark,E. (2014). CONUS-wide evaluation of National Weather Service flash flood guidance products. Weather and Forecasting, 29(2), 377-392. https://doi.org/10.1 175/WAF-D-12-00124.1 De Boer, D. H. (1992). Constraints on spatial transference of rainfall-runoffrela tionships in semiarid basins drained by ephemeral streams. Hydrological Sciences Journal, 37(5), 491-504. https://doi.org/10.1080/02626669209492614 Dick, G. S., Anderson, R. S., & Sampson, D. E. (1997). Controls on flash flood magnitude and shape, upper Blue Hills badlands, Utah. Geology, 25(1 ), 45-48. https://doi.org/10.11 30/0091-7613(1997)025<0045:COFFMA>2.3.CO;2 Dunkerley, D. (2008). Rain event properties in nature and in rainfall simulation experiments: A comparative review with recommendations forin creasingly systematic study and reporting. Hydrological Processes, 22(22), 4415-4435. https://doi.org/1 0.1002/hyp.7045 Faulconer, J. (2015). Thresholds forruno ffgen eration in ephemeral streams with varying morphology in the Sonoran desert in Arizona, USA, (M.S. thesis Watershed Science). Colorado State University. Friedman, J. M., & Lee, V. J. (2002). Extreme ,channel change, and riparian forests along ephemeral streams. Ecological Monographs, 72(3),409-425. https://doi.org/1 0. l 890/001 2-9615(2002)072[0409:EFCCAR]2.0.C0;2 Gioia, A., Iacobellis, V., Manfreda, S., & Fiorentino, M. (2008). Runoffthresholds in derived flood frequency distributions. Hydrology and Earth System Sciences, 12(6), 1295-1307. https://doi.org/10.51 94/hess-12-1295-2008 Goodrich, D. C., Keefer, T. 0., Unkrich, C. L., Nichols, M. H., Osborn, H. B., Stone, J. J., & Smith, J. R. (2008). Long-term precipitation database, walnut gulch experimental watershed, Arizona, United States. Water Resources Research, 44, W05S04. https://doi.org/10.1029/ 2006WR005782 Goodrich, D. C., Lane, L. J., Shillito, R. M., Miller, S. N., Syed, K. H., & Woolhiser, D. A. (1997). Linearity of basin response as a function of scale in a semiarid watershed. Water Resources Research, 33(1 2), 2951 -2965. https://doi.org/10.1029/97WR01422 Goudenhoofdt, E., & Delobbe, L. (2009). Evaluation of radar-gauge merging methods forqua ntitative precipitation estimates. Hydrology and Earth SystemSciences, 73(2), 195-203. https://doi.org/1 0.51 94/hess-13-195-2009 Gourley, J. J., Flamig, Z. L., Hong, Y., & Howard, K. W. (2014). Evaluation of past, present and future tools for radar-based flash-flood prediction in the USA. Hydrological Sciences Journal,59(7), 1377-1 389. https://doi.org/10.1080/02626667.2014.919391 Greenbaum, N., Ben-Zvi, A., Haviv, I., & Enzel, Y. (2006). The hydrology and paleohydrology of the Dead Sea . Geological Society of America Special Paper 401. Hallack-Alegria, M., & Watkins, D. W. Jr. (2007). Annual and warm season drought intensity-duration-frequency analysis for Sonora, Mexico. Journalof Climate,20(9), 1897-1 909. https://doi.org/1 0.1 175/JCLl4101.1 Kampf, S. K., Faulconer, J., Shaw, J. R., Sutfin, N. A., & Cooper, D. J. (201 6). Rain and channel flow supplements to subsurface water beneath hyper-arid ephemeral stream channels. Journalof Hydrology, 536, 524-533. https://doi.org/1 0.1 0l 6/j.jhydrol.2016.03.016 Kirkby, M., Bracken, L., & Reaney, S. (2002). The influence of land use, soils and topography on the delivery of hillslope runoff to channels in SE Spain. Earth Surface Processes and Landforms, 27(13), 1459-1473. https://doi.org /10.1002/esp.441 Kirkby, M. J., Bracken, L. J., & Shannon, J. (2005). The influence of rainfall distribution and morphological factors on runoffdel ivery from dryland catchments in SE Spain. Catena, 62(2-3), 136-1 56. https://doi.org/1 0.101 6/j.catena.2005.05.002 Klimberg, R., & McCullough, B. D. (2012). Fundamentals of predictive analysis with JMP•. chap. 5, Logistic Regression. Cary, North Carolina: SAS Institute, Inc. Koterba, M. T. (1986). Differences influences of storm and watershed characteristics on runofffrom ephemeral streams in southeastern Arizona, (PhD dissertation). Tucson, AZ: Departmentof Hydrology and Water Resources, University of Arizona. LANDFIRE (201 4). Existing Vegetation Cover Layer, LANDFIRE 1.4.0, U.S. Department of the Interior, Geological Survey. Retrieved from http:// landfire.cr.usgs.gov/viewer/ Lesschen, J. P., Schoorl, J. M., & Cammeraat, L. H. (2009). Modelling runoffand erosion for a semi-arid catchment using a multi-scale approach based on hydrological connectivity. Geomorphology, 709(3-4), 174-183. https://doi.org/10.1016/j.geomorph.2009.02.030 Linsley, R. K., Kohler, M. A., & Paulhus, J. L. H. (1982). Hydrology forEngineers (3rd ed.). New York: McGraw-Hill. Marchamalo, M., Hooke, J. M., & Sandercock, P. J. (2016). Flow and sediment connectivity in semi-arid landscapes in SE Spain: Patterns and controls. Land Degradation & Development, 27(4), 1032-1044. https://doi.org/10.1002/ldr.2352 Martina,M. L. V., Todini, E., & Libralon, A. (2005). A Bayesian decision approach to rainfall thresholds based floodwa rning. Hydrology and Earth SystemSciences Discussions, 2(6), 2663-2706. https://doi.org/1 0.51 94/hessd-2-2663-2005 Martfnez-Mena,M., Albaladejo, J., & Castillo, V. M. (1998). Factors influencing surface runoffgener ation in a Mediterranean semi-arid envir­ onment: Chicamo watershed, SE Spain. Hydrological Processes, 12(5), 741-754. https://doi.org/10.1002/(SICl)1099-1085(19980430)12:5 <741 ::AID-HYP622>3.0.CO;2-F Matyas, C. J. (2010). Use of ground-based radar for climate-scale studies of weather and rainfall.Geography Compass, 4(9), 1218-1237. https://doi.org/1 0.1111/j.1 749-8198.2010.00370.x Mayor, A. G., Bautista, S., & Bellot, J. (201 1). Scale-dependent variation in runoff and sediment yield in a semiarid Mediterranean catchment. Journalof Hydrology, 397(1-2), 128-135. https://doi.org/10.1016/j.j hydrol.2010.1 1.039 McDonald, E., Hamerlynck, E., McAullife, J., & Caldwell, T. (2004). Analysis of desert shrubs along first-order channelson desert piedmonts: Possible indicators of ecosystem condition and historical variation. SERDP Seed project #CSl 153, Final Technical Report. McFadden, L. D., Wells, S. G., & Jercinovich, M. J. (1987). Influences of eolian and pedogenic processes on the origin and evolution of desert pavements. Geology, 15(6), 504-508. https://doi.org/1 0.1 1 30/0091-7613(1987) 15<504:IOEAPP> 2.0.CO;2 Meadows, D. G., Young, M. H., & McDonald, E. V. (2008). Infl uence of relative surface age on hydraulic properties and infiltration on soils associated with desert pavements. Catena, 72(1), 169-1 78. https://doi.org/10.1016/j.catena.2007.05.009 Miller, S. N. (1995). An analysis of channel morphology at Walnut Gulch: Linking field research with GIS applications, (MS thesis). Univ. of Arizona. Morin, E., Goodrich, D. C., Maddox, R. A., Gao, X., Gupta, H. V., & Sorooshian, S. (2006). Spatial patterns in thunderstorm rainfallevents and their coupling with watershed hydrological response. Advances in Water Resources, 29(6), 843-860. https://doi.org/1 0.101 6/j.advwatres.2005. 07.014 Morin, E., Jacoby, Y., Navan, S., & Bet-Halachmi, E. (2009). Towards flash-flood prediction in the dry Dead Sea region utilizing radar rainfall information. Advancesin Water Resources, 32(7), 1066-1 076. https://doi.org/10.1016/j.advwatres.2008.1 1.011 Morin, E., Krajewski, W. F., Goodrich, D. C., Gao, X., & Sorooshian, S. (2003). Estimating rainfall intensities from weather radar data: The scale dependency problem. Journalof Hydrometeorology, 4(5), 782-797. https://doi.org /10.1 1 75/1525-7541 (2003)004<0782:ERIFWR>2.0.CO;2

KAMPF ET AL. 9949 Water Resources Research 10.1029/2018WR023714

Morin, E., & Yakir, H. (2014). Hydrological impact and potential flooding of convective rain cells in a semi-arid environment. Hydrological Sciences Journal, 59(7), 1353-1 362. https://doi.org/1 0.1 080/02626667.2013.841 315 Musick,H. B. (1 975). Barrenness of desert pavement in Yuma County, Arizona. Journalof the ArizonaAcademy ofScience, 10(1 ), 24-28. https://doi.or g/1 0.2307/4002 1317 Osborn, H. B., & Lane, L. (1969). Precipitation-runoff relations for very small semiarid rangeland watersheds. Water Resources Research, 5(2), 419-425. https://doi.org/10.1 029/WR005i002p00419 Osborn, H. B., & Laursen, E.M. (1 973). Thunderstorm runoffin southeastern Arizona. Journalof Hydraulics Division, ASCE, 99(HY7), 1 129-1 145. Osterkamp, W. R. (2008). Geology, soils, and geomorphology of the walnut gulch experimental watershed, Tombstone, Arizona. Journalof the Arizona-Nevada Academy of Science, 40(2), 136-1 54. https://doi.org/10.2181 /1 533-6085-40.2. l 36 Pierini, N. A., Vivoni, E. R., Robles-Morua, A., Scott, R. L., & Nearing, M. A. (2014). Using observations and a distributed hydrologic model to explore runoffthresholds linked with mesquite encroachment in the Sonoran desert. Water Resources Research, 50, 8191 -8215. https://doi .org/10.1 002/201 4WRO 1 5 781 Puigdefabregas, J., Sole, A., Gutierrez, L., Del Barrio, G., & Boer, M. (1 999). Scales and processes of water and sediment redistribution in drylands: Results from the Rambla Honda fieldsite in Southeast Spain. Earth-Science Reviews, 48(1 -2), 39-70. https://doi.org/10.1016/ 5001 2-8252(99)00046-X Reed, S., Johnson, D., & Sweeney, T. (2002). Application and national geographic information system database to supporttwo-year floodand threshold runoffestimates. Journalof Hydrologic Engineering, 7(3), 209-219. https://doi.org/10.1061/(ASCE)1 084-0699(2002)7:3(209) Reed, S. M., Schaake, J., & Zhang, Z. (2007). A distributed hydrologic model and threshold frequency-based method for flash at ungauged locations. Journalof Hydrology, 337(3-4), 402-420. https://doi.org/10.1 0l 6/j.jhydrol.2007.02.015 Ries, F., Schmidt, S., Sauter, M., & Lange, J. (2017). Controls on runoffgener ation along a steep climatic gradient in the eastern Mediterranean. Journalof Hydrology: Regional Studies, 9, 18-33. https://doi.org/1 0.1016/j .ejrh.2016.1 1.001 Rodriguez-Caballero, E., Canton, Y., Lazaro, R., & Sole-Benet, A. (2014). Cross-scale interactions between surface components and ra infall properties. Non-linearities in the hydrological and erosive behavior of semiarid catchments. Journalof Hydrology, 517, 81 5-825. https://doi.org/1 0.1 0l 6/j.jhydrol.2014.06.01 8 Schick, A. P. (1970). Desert floods: Interim results of observations in Nahal Yael research watershed, southern Israel, 1965-1 970. IAHS­ UNESCO Symposium - representative and experimental basins, Dec 1970, pp. 478-493. Seo, B. C., & Krajewski, W. F. (2010). Scale dependence of radar rainfall uncertainty: Initial evaluation of NEXRAD's new super-resolution data forhydrol ogic applications. Journalof Hydrometeorology, 11(5), 1191-1 198. https://doi.org/1 0.1 175/201 0J HMl 265.1 Shaw, J., & Cooper, D. J. (2008). Linkages among watersheds, stream reaches, and riparian vegetation in dryland ephemeral stream networks. Journalof Hydrology, 350(1-2), 68-82. https://doi.org/1 0.1 0l 6/j.jhydrol.2007.1 1 .030 Shreve, F., & Wiggins, I. L. (1964). Vegetation and Flora of the Sonoran Desert. Stanford,CA: Stanford University Press. Simanton, J. R., & Osborn, H. B. (1 983). Runoff estimates for thunderstorm rainfallon small rangeland watersheds. In Hydrology and Water Resources in Arizona and the Southwest (Vol. 13, pp. 9-1 5). Flagstaff, AZ: Arizona-Nevada Academy of Science. Skirvin, S., Kidwell, M., Biedenbender, S., Henley, J. P., King, D. M., Collins, C. D. H., et al. (2008). Vegetation data, Walnut Gulch experimental watershed, Arizona, United States. Water Resources Research, 44, W05508. https://doi.org/10.1029/2006WR005724 Smith, J. A., Seo, D. J., Baeck, M. L., & Hudlow, M. D. (1 996). An intercomparison study of NEXRAD precipitation estimates. Water Resources Research, 32(7), 2035-2045. https://doi.org/ 10.1 029/96WR00270 Springer, M. E. (1958). Desert pavement and vesicular layer of some soils of the desert of the Lahontan Basin, Nevada. SoilScience Societyof America Journal,22 (1 ), 63-66. https://doi.org/1 0.21 36/sssajl 958.03615995002200010017x Stone, J. J., Nichols, M. H., Goodrich, D. C., & Bruno, J. (2008). Long-term runoffdatabase, walnut gulch experimental watershed, Arizona, United States. Water Resources Research, 44, W05S05. https://doi.org/10.1029/2006WR005733 Stromberg, J. C., Hazelton, A. F., & White, M. S. (2009). Plant species richness in ephemeral and perennial reaches of a dryland river. Biodiversity and Conservation, 7 8(3), 663-677. https://doi.org/10.1007/s 10531-008-9532-z Sutfin, N. A., Shaw, J., Wohl, E. E., & Cooper, D. (2014). A geomorphic classification of ephemeral channels in a mountainous, arid region, southwestern Arizona, USA. Geomorphology, 22 1, 164-1 75. https://doi.org/10.1 0l 6/j.geomorph.201 4.06.005 Syed, K. H., Goodrich, D. C., Myers, D. E., & Sorooshian, S. (2003). Spatial characteristics of thunderstorm rainfall fields and their relation to runoff. Journalof Hydrology, 271 (1-4), 1-21. https://doi.org/1 0.1 0l 6/50022-1694(02)0031 1 -6. Tustison, B., Harris, D., & Foufoula-Georgiou, E. (2001 ). Scale issues in verification of precipitation forecasts.Journal of GeophysicalResearch, 706(Dl 1), 11,775-1 1 ,784. https://doi.org/1 0.1 029/2001JD900066 Viera, A. J., & Garrett, J. M. (2005). Unterstanding interobserver agreement: The kappa statistic. FamilyMedicine, 37(5), 360-363. Villarini, G., & Krajewski, W. F. (2010). Review of the different sources of uncertainty in single polarization radar-based estimates of rainfall. Surveys in Geophysics, 31(1 ), 107-1 29. https://doi.org/10.1007/s l 0712-009-9079-x Xie, H., Zhou, X., Hendrickx, J. M., Vivoni, E. R., Guan, H., Tian, Y. Q., & Small, E. E. (2006). Evaluation of NEXRAD stage Ill precipitation data over a semiarid region. Journalof the American Water Resources Association, 42(1 ), 237-256. https://doi.org/1 0. l l 11/j. l 752-1 688.2006.tb03837 .x Vair, A., & Klein, M. (1973). The influence of surface properties on flow and erosion processes on debris covered slopes in an arid area. Catena, 7, 1-18. https://doi.org/1 0.1 016/50341-8162(73)80002-5 Vair, A., & Kossovsky, A. (2002). Climate and surface properties: Hydrological response of small arid and semi-arid watersheds. Geomorphology, 42(1 -2), 43-57. https://doi.org/1 0.1 0l 6/S01 69-555X(01 )00072-1 Vair, A., & Raz-Yassif, N. (2004). Hydrological processes in a small arid catchment: Scale effects of rainfall and slope length. Geomorphology, 61(1-2), 1 55-1 69. https://doi.org/1 0.1 0l 6/j.geomorph.2003.1 2.003 Young, M. H., McDonald, E. V., Caldwell, T. G., Benner, S. G., & Meadows, D. G. (2004). Hydraulic properties of a desert soil chronosequence in the Mohave desert, USA. Vadose Zone Journal, 3(3), 956-963. https://doi.org/10.2136/vzj2004.0956

KAMPF ET AL. 9950