UNIVERSITY OF MIAMI

SINGLE-BEAM ACOUSTIC SEABED CLASSIFICATION IN CORAL REEF ENVIRONMENTS WITH APPLICATION TO THE ASSESSMENT OF GROUPER AND SNAPPER HABITAT IN THE UPPER FLORIDA KEYS, USA

By

Arthur C. R. Gleason

A DISSERTATION

Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Coral Gables, Florida

May 2009

©2009 Arthur C. R. Gleason
All Rights Reserved

UNIVERSITY OF MIAMI

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

SINGLE-BEAM ACOUSTIC SEABED CLASSIFICATION IN CORAL REEF ENVIRONMENTS WITH APPLICATION TO THE ASSESSMENT OF GROUPER AND SNAPPER HABITAT IN THE UPPER FLORIDA KEYS, USA

Arthur C. R. Gleason

Approved:

R. Pamela Reid, Ph.D.
Associate Professor of Marine Geology and Geophysics

Terri A. Scandura, Ph.D.
Dean of the Graduate School

Eugene C. Rankey, Ph.D.
Assistant Professor of Marine Geology and Geophysics

Thomas Hahn, Ph.D.
Assistant Professor of Applied Marine Physics

G. Todd Kellison, Ph.D.
Chief, Fisheries Ecosystems Branch
NOAA Southeast Fisheries Science Center

Jonathan M. Preston, Ph.D.
Senior Scientist
Quester Tangent Corporation

GLEASON, ARTHUR C. R. (Ph.D., Marine Geology and Geophysics)
Single-Beam Acoustic Seabed Classification in Coral Reef Environments with Application to the Assessment of Grouper and Snapper Habitat in the Upper Florida Keys, USA (May 2009)

Abstract of a dissertation at the University of Miami.

Dissertation supervised by R. Pamela Reid. No. of pages in text. (173)

A single-beam acoustic seabed classification system was used to map coral reef environments in the upper Florida Keys, USA, and the Bahamas. The system consisted of two components, both produced by the Quester Tangent Corporation. A QTCView Series V, operating with a 50 kHz sounder, was used for data acquisition, and IMPACT software was used for data processing and classification. First, methodological aspects of system performance were evaluated. Second, the system was applied to the assessment of grouper and snapper habitat.

Two methodological properties were explored: transferability (i.e. mapping the same classes at multiple sites) and reproducibility (i.e. surveying one site multiple times). The transferability results showed that a two-class scheme of hard bottom and sediment could be mapped at four sites with overall accuracy ranging from 73% to 86%. The locations of most misclassified echoes had one of two characteristics: a thin sediment veneer overlying hard bottom or within-footprint relief on the order of 0.5 m or greater. Reproducibility experiments showed that consistency of acoustic classes between repeat transects over the same area on different days varied, for the most part, between 50% and 65%. Consistency increased to between 78% and 92% when clustering was limited to two acoustic classes, to between approximately 70% and 100% when only echoes acquired within two degrees of nadir in the pitch direction were used, and to between 81% and 87% when a limited set of features was used for classification.

The assessment of grouper and snapper habitat addressed the question whether areas of high fish abundance were associated with characteristic acoustic or geomorphological signatures. The results showed, first, that the hard bottom / sediment classification scheme was a useful first step for stratifying survey areas to increase efficiency of grouper census efforts. Second, an index of acoustic variability complemented the hard bottom / sediment classification by further targeting areas of potential grouper habitat. Finally, five grouper and snapper spawning aggregation sites were all found to have similar associations with drowned shelf edge reefs in the upper Florida Keys.

ACKNOWLEDGEMENTS

Funding for this work was provided by Office of Naval Research grant N000140110671 to Pamela Reid and grants from the NOAA Coral Reef Initiative to Anne-Marie Eklund, Todd Kellison, and Margaret Miller. Financial support was also provided by a University of Miami Fellowship and the Yamaha Contender Miami Billfish Tournament Circle of Friends Fellowship. Charlie and Lisa Evans generously donated the use of their vessel GRITS for much of the work in the Florida Keys.

The Florida Keys National Marine Sanctuary and Biscayne National Park issued permits FKNMS-2007-008 and BISC-2007-SCI-0021.

Diving, boat operations, data acquisition, and data analysis were conducted with the assistance of the following. From UM/RSMAS: Mike Anderson, Grant Basham, Christine Bauer, Chris Boynton, Marilyn Brandt, Albert Chapin, Cassie Clark, Manuel Collazo, Vanessa Damoulis, Meghan Dick, Daniel Doolittle, Megan Fairobent, Ryan Freedman, Mike Feeley, Jack Fell, Brooke Gintert, Rick Gomez, Jimmy Herlan, Veronique Koch, Phil Kramer, Shawn Lake, Bob Loos, Eric Louchard, Miguel McKinney, Julie Mintzer, John Parkinson, Dave Powell, Laura Rock. From NOAA/SEFSC: Heather Balchowsky, Neil Baertlein, Sean Cimulluca, Joe Contillo, Leah Harman, Doug Harper, Tom Jackson, Jack Javech, Lindsey Kramer, Dave McClellan, Mark Miller, Jen Schull, Mark Vermeij, Dana Williams. From the Perry Institute of Marine Science / Lee Stocking Island: Craig & Tara Dahlgren, Brian Kakuk. From the U.S. Navy's Atlantic Undersea Testing and Evaluation Center: Matt Accordino, Henry Buerkert, Marc Ciminello, Tom Szlyk. From NOAA/AOML: Paul Dammann and Jules Craynock. The crew of the Coral Reef II: Mike Fielder, Keith Pamper, Lou Roth, John Rothchild.


Roberto Torres provided the locations of the Ocean, Watson's, and Davis reef sites described in Chapter 6. Roberto's willingness to share the experiences of decades of fishing the Florida Keys was both illuminating and crucial to testing the wider applicability of results from the initial surveys at Carysfort Reef.

I enjoyed many conversations about seabed mapping with Bernhard Riegl, of the National Coral Reef Institute at Nova Southeastern University. Descriptions of the seabed classes he mapped near Cabo Pulmo, Mexico helped provide context for some of the results in Chapter 3. Bernhard’s generous loan of a National Instruments data acquisition board enabled simultaneous operation of multiple Quester Tangent systems during a survey in November 2006.

The contributions and hospitality of Ben Biffard and Steve Bloomer, of the University of Victoria, are gratefully acknowledged. Ben conducted the BORIS model runs that were used in Section 4.7. Steve participated in the Andros Island survey. Both provided constructive comments on drafts of Chapter 4 and insight into single-beam seabed classification techniques.

The assistance of the Quester Tangent Corporation marine division is also greatly appreciated. Karl Rhynas and Rick Pearson provided detailed training and troubleshooting for hardware and software glitches. Glenda Wyatt provided crucial support in organizing the Second Acoustic Seabed Classification Workshop at RSMAS in April 2006. The patience and insight of Bill Collins and Jon Preston proved essential. Both Bill and Jon offered constructive criticism, instruction, and guidance many times over the duration of this project, but two particular instances are worth mentioning. Bill provided on-site training, guidance, and assistance with data collection at Lee Stocking Island (Chapter 3). Jon provided libraries for accessing the IMPACT data structures, which were necessary to create Figures 4.12 and 6.2, to perform the analysis in Section 4.5.2.2, and, more generally, to acquire an intuitive insight into the echo classification process.

The support of my committee members Pamela Reid, Gene Rankey, Thomas Hahn, Todd Kellison, and Jon Preston could not have been greater. I have enjoyed interacting with each of them and deeply appreciate their assistance and availability, especially when consulted in the face of a looming deadline. Particular thanks are due to Pam for exhibiting tremendous trust and unflagging support while letting me explore my own topics of interest. It has been a great pleasure to work with her.

Finally, I cannot express enough gratitude for the support of my family, including my parents, Paul and Phyllis, my grandmother, Isabelle, and my wife Louise.


TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

Chapter 1: Introduction
1.1 What can be mapped with single-beam acoustic seabed classification in the coral reef environment?
1.2 What applications can utilize single-beam acoustic seabed classification in the coral reef environment?
1.3 Outline

Chapter 2: Introduction to the QTC acoustic seabed classification system

Chapter 3: Consistency of single-beam acoustic seabed classification among multiple coral reef survey sites
3.1 Background
3.2 Previous Work
3.3 Methods
3.3.1 Acoustic surveys
3.3.2 Acoustic classification
3.3.3 Identification of acoustic classes
3.3.4 Accuracy assessment
3.4 Results
3.5 Analysis: where were the errors?
3.5.1 Lee Stocking Island
3.5.2 Andros Island Area
3.5.3 Carysfort Reef and Fowey Rocks
3.6 Discussion
3.7 Conclusions

Chapter 4: Reproducibility of single-beam acoustic seabed classification under variable survey conditions
4.1 Background
4.2 Previous work
4.3 Methods
4.3.1 Survey site and data acquisition
4.3.2 Vessel attitude and grazing angle computation
4.3.3 Acoustic data processing
4.3.4 Classification reproducibility
4.3.5 Echo, FFV, and Q-value correlation with survey parameters
4.4 Results
4.4.1 Environmental conditions, vessel attitude and grazing angle
4.4.2 Classification reproducibility
4.4.3 Echo, FFV, and Q-value correlation with survey parameters
4.5 Attempts to improve reproducibility
4.5.1 Cluster into fewer classes
4.5.1.1 Merge baseline classes four through six into a single class
4.5.1.2 ACE-2
4.5.2 Keep only near-nadir echoes
4.5.2.1 Within 5 degrees of vertical (echoes stacked by 5)
4.5.2.2 Pitch within 2 degrees of vertical (echoes stacked by 5)
4.5.2.3 Within 5 degrees of vertical (echoes stacked by 1)
4.5.2.4 Pitch only within 2 degrees of vertical (echoes stacked by 1)
4.5.2.5 Incidence angle less than 5 degrees (echoes stacked by 1)
4.5.3 Stability of principal components
4.5.3.1 Robust PCA
4.5.3.2 Consistent PCA
4.5.3.3 Dataset Size
4.5.4 Features least subject to ping-to-ping variability
4.6 Summary and Conclusions

Chapter 5: Acoustic signatures of the seafloor: tools for predicting grouper habitat
5.1 Background
5.2 Methods
5.2.1 Acoustic Survey
5.2.1.1 Data Collection and Seabed Classification
5.2.1.2 Acoustic Variability Index
5.2.2 Diver Survey
5.2.3 Comparison of Acoustic and Diver Surveys
5.2.3.1 Acoustic Classification Accuracy Assessment
5.2.3.2 Grouper Abundance vs. Acoustic Classification and Variability
5.3 Results
5.3.1 Acoustic Survey
5.3.2 Diver Survey
5.3.3 Acoustic Classification Accuracy Assessment
5.3.4 Grouper Abundance vs. Acoustic Classification and Variability
5.4 Discussion
5.5 Conclusions

Chapter 6: Geomorphology of grouper and snapper spawning aggregation sites in the upper Florida Keys, USA
6.1 Background
6.2 Methods
6.2.1 Acquisition and acoustic classification
6.2.2 Identification of acoustic classes
6.2.3 Diver-based assessment of classification at Carysfort Reef
6.2.4 Locations of FSAs relative to seabed features
6.2.5 Geomorphologic “signatures” of FSAs
6.3 Results
6.3.1 Acquisition and acoustic classification
6.3.2 Identification of acoustic classes
6.3.3 Diver-based assessment of classification at Carysfort Reef
6.3.4 Locations of FSAs relative to seabed features
6.3.5 Geomorphologic “signatures” of FSAs
6.4 Discussion
6.5 Conclusions

Chapter 7: Conclusions

References


LIST OF FIGURES

Figure 2.1: Flowchart of QTCV processing.
Figure 3.1: Map showing locations of Lee Stocking Island (LSI), Carysfort Reef (CF), Fowey Rocks (FR), and Andros Island (AI) study sites.
Figure 3.2: Underwater photographs of pole-mounted transducer, video camera housing, and sample frames grabbed from underwater video.
Figure 3.3: Sediment grain size distributions for samples from the survey areas.
Figure 3.4: LSI acoustic classification and video classification plotted on top of a true-color IKONOS image of the LSI study area.
Figure 3.5: Andros acoustic classification and video classification plotted on top of a true-color IKONOS image of the study area.
Figure 3.6: Fowey Rocks acoustic classes and diver estimated substrate.
Figure 3.7: Within-frame accuracy for the LSI dataset.
Figure 3.8: Depth-frequency histogram for sediment-dominated video frames in the LSI dataset.
Figure 3.9: Classified acoustic tracks, video frames, and dive sites in the Adderly Cut portion of LSI study area.
Figure 3.10: Selected underwater photographs from the Adderly Cut portion of the LSI study area.
Figure 3.11: Within-frame overall accuracy histogram for the Andros dataset.
Figure 3.12: Oblique underwater photographs of four seabed types in the Andros Island survey area.
Figure 3.13: Within-frame overall accuracy histograms for the Carysfort and Fowey Rocks datasets.
Figure 3.14: Plot of overall accuracy as a function of the number of acoustic classes.
Figure 3.15: Example of the utility of rock / sediment seabed classification in interpreting bathymetry to predict fish habitat.
Figure 3.16: Cross shelf profile of part of the Navassa Island insular shelf.
Figure 4.1: Fowey Rocks survey site and depth profiles.
Figure 4.2: Wind speed and water temperature for the periods of the six surveys.
Figure 4.3: Boxplots of daily mean and maximum within-stack attitude measurements.
Figure 4.4: Transducer attitude data from May 28, 2007.
Figure 4.5: Q-space of each of the six daily datasets clustered independently.
Figure 4.6: All 18 replicates of both the northern and southern transects.
Figure 4.7: Independently classified replicate track lines, Q-space, and pitch measurements along the northern transect on May 2, 2007.
Figure 4.8: Classified track lines, Q-space, and pitch measurements along the transect for independent clustering of each pass along the northern transect on two days.
Figure 4.9: Renumbering of daily classes.
Figure 4.10: Plots illustrating the locations of four stations from which echoes, FFVs, Q-values, and survey parameters were extracted.
Figure 4.11: Plot of multidimensional angle between echo envelopes versus the distance between echoes for all pairwise comparisons at all four test stations.
Figure 4.12: Plot of multidimensional angle between echo envelopes versus the minimum grazing angle at the times of center echo of each stack.
Figure 4.13: Plot of the magnitude of the difference between echoes versus the minimum grazing angle at the times of center echo of each stack.
Figure 4.14: Plot of the magnitude of the difference between FFVs versus the minimum grazing angle at the times of center echo of each stack.
Figure 4.15: Plots of daily Q-space (left) and renumbering of daily classes (right) for datasets classified by ACE with just two clusters.
Figure 4.16: Plot of Q-space for each day and renumbering of daily ACE classes after filtering out all stacks with maximum transducer pointing vector greater than 5 degrees off vertical.
Figure 4.17: Plot of Q-space for each day and renumbering of daily ACE classes after filtering out all stacks with max pitch greater than 2 degrees off vertical.
Figure 4.18: Plot of Q-space for each day and renumbering of daily ACE classes after filtering out all unstacked echoes with max transducer pointing greater than 5 degrees off vertical.
Figure 4.19: Plot of Q-space for each day and renumbering of daily ACE classes after filtering out all unstacked echoes with pitch greater than 2 degrees.
Figure 4.20: Plot of Q-space for each day and renumbering of daily ACE classes after filtering out all unstacked echoes with incidence angle greater than 5 degrees.
Figure 4.21: Plot illustrating the concept of robust PCA.
Figure 4.22: Plots illustrating the concept of robust PCA with more points.
Figure 4.23: Echogram created from the BORIS dataset.
Figure 4.24: BORIS dataset FFVs, FFV coefficient of variation, and loadings of the first principal component computed from the FFVs.
Figure 4.25: Q-space for the BORIS dataset using seven different subsets of features input to the PCA.
Figure 4.26: FFVs of the May 1, 2007 Fowey Rocks dataset.
Figure 5.1: Acoustic survey track lines superimposed on an IKONOS satellite image of Carysfort Reef and surroundings.
Figure 5.2: Overview of acoustic processing.
Figure 5.3: Illustration of the computation of acoustic variability.
Figure 5.4: The three main acoustic classes and diver-estimated substrate at Carysfort Reef.
Figure 5.5: Grouper abundance at each dive site relative to acoustic classification and acoustic variability.
Figure 5.6: Acoustic classification and acoustic variability computed from the echoes closest to each dive site and grouped by the presence / absence of groupers.
Figure 5.7: Depth, acoustic class, and acoustic variability along transect A-A’, shown in Figure 5.5.
Figure 6.1: Map of the study area.
Figure 6.2: Mean echoes for the six acoustic classes at Davis Reef.
Figure 6.3: Acoustic classification and satellite image for the Davis Reef survey area.
Figure 6.4: Visualization of processed acoustic data surrounding the Carysfort Reef survey area.
Figure 6.5: Visualization of processed acoustic data surrounding the Watson’s Reef survey area.
Figure 6.6: Visualization of processed acoustic data surrounding the Davis Reef survey area.
Figure 6.7: Visualization of processed acoustic data surrounding the Ocean Reef survey area.


LIST OF TABLES

Table 1.1: Technologies for surveying coral reef environments at different scales.
Table 1.2: Summary of systems, frequencies, sediment mineralogy, and seabed classes used in previous single-beam ASC studies.
Table 3.1: Seabed classes reported by previous efforts to map coral reef environments with single-beam acoustic seabed classification systems.
Table 3.2: Characteristics and settings of the QTCV system used in this study.
Table 3.3: Values of tunable parameters used when processing each survey with the IMPACT software.
Table 3.4: Acoustic class labels and sizes (by percent of total echoes) in each survey area and aggregation of classes at fine descriptive resolution to coarse descriptive resolution.
Table 3.5: Error matrices for acoustic hard bottom / sediment classification at multiple sites.
Table 3.6: Diver descriptions of Adderly Cut dives from July 2002.
Table 3.7: Probe depths and rugosity for dive sites in Adderly Cut.
Table 3.8: Error matrix for the LSI survey excluding hard bottom sites covered with a thin sediment veneer.
Table 3.9: Error matrix for the Andros survey excluding hard bottom sites covered with a thin sediment veneer.
Table 4.1: Optimum number of clusters identified by ACE for each daily dataset clustered separately.
Table 4.2: Optimum number of clusters identified by ACE for each transect when clustered separately.
Table 4.3: Overall accuracy and Kappa coefficient between pairs of daily classified datasets.
Table 4.4: Percent AMI for the reclassed ACE-best and the original ACE-best dataset.
Table 4.5: Summary of visual assessment of correlation between envelopes, FFVs, or Q-space and survey variables.
Table 4.6: Overall accuracy and Kappa coefficient between pairs of daily classified datasets reclassed to 6 classes but then with classes 4-6 merged to a single class.
Table 4.7: Percent AMI for daily classified datasets reclassed to 6 classes but then with classes 4-6 merged to a single class.
Table 4.8: Overall accuracy and Kappa coefficient between pairs of daily datasets classified by ACE with just two clusters.
Table 4.9: Percent AMI for pairs of daily datasets classified by ACE with just two clusters.
Table 4.10: Summary of off-nadir experiments.
Table 4.11: Overall accuracy and Kappa coefficient between pairs of datasets clustered by day after filtering out all stacks with maximum transducer pointing vector greater than 5 degrees off vertical.
Table 4.12: Percent AMI between pairs of datasets clustered by day after filtering out all stacks with maximum transducer pointing vector greater than 5 degrees off vertical.
Table 4.13: Overall accuracy and Kappa coefficient between pairs of datasets clustered by day after filtering out all stacks with maximum pitch greater than 2 degrees off vertical.
Table 4.14: Percent AMI between pairs of datasets clustered by day after filtering out all stacks with maximum pitch greater than 2 degrees off vertical.
Table 4.15: Overall accuracy and Kappa coefficient between pairs of datasets clustered by day after filtering out all unstacked echoes with maximum transducer pointing vector greater than 5 degrees off vertical.
Table 4.16: Percent AMI between pairs of datasets clustered by day after filtering out all unstacked echoes with maximum transducer pointing vector greater than 5 degrees off vertical.
Table 4.17: Overall accuracy and Kappa coefficient between pairs of datasets clustered by day after filtering out all unstacked echoes with pitch greater than 2 degrees.
Table 4.18: Percent AMI between pairs of datasets clustered by day after filtering out all unstacked echoes with pitch greater than 2 degrees.
Table 4.19: Overall accuracy and Kappa coefficient between pairs of datasets clustered by day after filtering out all unstacked echoes with incidence angle greater than 5 degrees.
Table 4.20: Percent AMI between pairs of datasets clustered by day after filtering out all unstacked echoes with incidence angle greater than 5 degrees.
Table 4.21: Overall accuracy and Kappa coefficient between daily classified datasets created from a single PCA using the median eigenvector of all six days.
Table 4.22: Percent AMI for daily classified datasets created from a single PCA using the median eigenvector of all six days.
Table 4.23: Optimum number of classes as determined by ACE as a function of dataset size for the entire merged Fowey Rocks dataset.
Table 4.24: Optimum number of classes as determined by ACE as a function of dataset size for the Watson’s Reef dataset.
Table 4.25: Subsets of features and their colors as shown in Figure 4.24C and Figure 4.25.
Table 4.26: Overall accuracy and Kappa coefficient between daily classified datasets using the ACE clustering after computing Q space with only features 1-15.
Table 4.27: Percent AMI for the datasets using ACE clustering after computing Q space with only features 1-15.
Table 4.28: Summary of experiments and range of the majority of OA, Kappa, and AMI values in each.
Table 5.1: Characteristics and settings of the QTCV system used in this study.
Table 6.1: Survey areas and the spawning aggregations in each area.
Table 6.2: Characteristics and settings of the QTCV system used in this study.
Table 6.3: Values used for tunable parameters in IMPACT software for processing QTCV echoes.
Table 6.4: Clustering results for the four survey areas.
Table 6.5: Agreement between proposed FSA site model criteria and observations of the sites surveyed.


Chapter 1: Introduction

Darwin (1842) classified coral reefs around the world into three geomorphological types: barrier, fringing, and atoll, and in so doing pioneered the use of mapping to help understand the development of coral reefs. Maps depicting thematic seabed classes have been important tools for coral reef research at least since Agassiz (1885). Today, maps of coral reef habitats are commonly used for developing marine protected areas, planning field surveys, and educational outreach, among other applications (U.S. Coral Reef Task Force Mapping and Information Synthesis Working Group (USCRTFMISWG) 1999; Green et al. 2000). Despite their importance, however, until recently habitat maps had been produced for only a minute fraction of the world’s coral reefs. Due to the combined utility and lack of availability of habitat maps in coral reef environments, mapping and monitoring was the first of four duties assigned to the U.S. Coral Reef Task Force by Executive Order 13089 (63 FR 32701, 1998 WL 313072). In conjunction with advances in technology and research (Mumby et al. 2004), the high priority placed on mapping by the Coral Reef Task Force has enabled mapping of shallow coral reefs to proceed rapidly over the past several years (see http://eol.jsc.nasa.gov/Reefs for a review of shallow reef mapping initiatives).

A multi-resolution paradigm has evolved for mapping coral reefs with airborne or satellite optical imagery (Mumby and Harborne 1999; USCRTFMISWG 1999). Under the multi-resolution strategy, large areas are mapped at low thematic and spatial resolution, then selected smaller areas of interest are filled in using technologies with higher thematic and spatial resolution. Deep or turbid water, however, limits the utility of seabed mapping with overhead imagery to those shallow reefs located in clear water. The multi-resolution approach, which is a sensible compromise between the needs of global products (large areas, low cost) and local products (high spatial and thematic resolution), has been enacted at a programmatic level for optical satellite and aerial imagery (USCRTFMISWG 1999) and could serve as a model for mapping deeper environments with acoustic technologies (Table 1.1).

Table 1.1: Technologies for surveying coral reef environments at different scales. A multi-resolution strategy of mapping with optical imagery exists for shallow reefs. A logical approach would be to implement a similar hierarchical approach for deep coral ecosystems with acoustic technologies. This dissertation focuses on the question of whether single-beam classification methods are appropriate for mapping deep reefs at the global / regional scale (upper right box).

Areas that cannot be mapped with satellite or aerial imagery are both extensive and ecologically important. For example, over 55% of the Florida Keys National Marine Sanctuary (about 1540 square nautical miles) has not been mapped due to water depth or clarity limitations (FMRI 1998). The Tortugas Bank, Pulley Ridge, and Flower Garden Banks are three examples from the Gulf of Mexico that illustrate the potential of luxuriant communities of shallow-water (zooxanthellate) corals to exist at “mesophotic” depths of 30-75 m (Miller et al. 2001; Hickerson and Schmahl 2005; Jarrett et al. 2005). In addition, coral communities can exist below the photic zone, where deep-water (azooxanthellate) corals form mounds up to several hundred meters high. Recent ocean exploration initiatives indicate that such deep corals are much more extensive than previously thought (Roberts et al. 2006). Deep corals provide important habitat for fishes, and shallow coral species may potentially find refuge from warming surface waters at mesophotic depths (Riegl and Piller 2003).


In order to improve the understanding of reef resources in optically deep water, which in this context means deeper than can be mapped with overhead imagery, alternative mapping technologies must be used in place of aerial or satellite imagery. Acoustic mapping systems are a natural solution to mapping optically deep water. However, existing acoustic technologies that could be employed for mapping deep reefs differ in their cost, resolution, and, therefore, potential coverage. One challenge is that many areas need mapping, and no single acoustic technology will be cost effective on all spatial scales. A logical approach would be to employ acoustic survey methods in a hierarchical manner similar to that used for optical imagery. With this approach, a single-beam system would fill the equivalent role in optically deep water that moderate resolution satellite imagery fills in optically shallow water (Table 1.1).

Single-beam acoustic seabed classification (ASC) systems are potentially attractive for mapping large areas at relatively low descriptive resolution. ASC systems, also known as acoustic ground discrimination systems (Foster-Smith and Sotheran 2003; Kenny et al. 2003), are relatively inexpensive to purchase and to operate, portable, and easy to use (Anderson et al. 2008). Thus, ASC has appealing advantages for mapping or exploration of deep-water coral ecosystems. The long-term vision implied by the multi-resolution mapping paradigm is that a network of ships deploying commercially available single-beam ASC systems would, at minimal marginal cost, acquire the data necessary to produce coral ecosystem habitat maps on a regional to global scale in optically deep water. As attractive as the long-term vision sounds, there are many challenges, both methodological and applied, to address before ships of opportunity could systematically contribute to regional-scale mapping of deep coral ecosystems (Anderson et al. 2008).


The optical remote sensing community has, for some time, explicitly distinguished between development or evaluation of remote sensing methodology and use of remotely sensed data for particular applications. Green et al. (1996), for example, reviewed the state of the art in remote sensing of tropical resources, including coral reefs, with satellite and aerial imagery. One of the conclusions of Green et al. (1996) was that research to that point had heavily emphasized the development and evaluation of novel techniques, with few examples of transitioning these techniques for use in real world problems. Eight years later, Andrefouet and Riegl (2004) edited a special issue of the journal “Coral Reefs” in which the number of papers was evenly split between methodological development and application to science or management. Andrefouet and Riegl (2004) acknowledged the progress that had been made in a short time transitioning remote sensing technology from “the tool without application to the mandatory tool.” At the same time, however, Andrefouet and Riegl (2004) noted that none of the papers in their special issue had used acoustic remote sensing technology, despite specific effort to solicit such submissions. Andrefouet and Riegl (2004) pointed out that additional effort was needed to develop and apply acoustic remote sensing for coral reef science and management.

The purpose of this dissertation research was to assess the utility of single-beam

ASC for mapping coral reef environments, thereby beginning to address the gap identified by Andrefouet and Riegl (2004). Acknowledging the importance of both methodology and application, the overall goal of the research was to make a contribution in both areas. Thus, the chapters of this dissertation fall into two groups, addressing two basic questions: What can be mapped with ASC in the coral reef environment? And, what


are some example applications of ASC in the coral reef environment? Addressing

questions of both methodology and application is important because the utility of a

technology is defined by its application to real world problems, yet to meaningfully apply

a technological solution one first must have a detailed understanding of the strengths and

weaknesses of the approach.

1.1 What can be mapped with single-beam acoustic seabed classification in the coral reef environment?

The question “what can be mapped” captures the spirit of the methodological section of this study, but it is overly simplistic, as ASC systems have been available for about 15 years, and many areas have been mapped using them at several frequencies between 12 and 200 kHz (Table 1.2). Two things distinguish this study from previous work with regard to the question “what can be mapped”: first, the geology of the areas being mapped, and second, an emphasis on standardizing methodology among multiple surveys. Focusing on coral reef environments and on standardizing methodology refines the simple question of “what can be mapped” to define specific objectives for this dissertation.

The focus on coral reef environments distinguishes this study from most other

investigations of ASC performance, which have, for the most part, considered siliciclastic

environments (87% of papers in Table 1.2) and exclusively soft sediment communities

(67% of papers in Table 1.2). Only a handful of previous ASC studies have focused on

carbonate reef environments (Murphy et al. 1995; Hamilton et al. 1999; White et al.

2003; Moyer et al. 2005; Riegl and Purkis 2005; Riegl et al. 2007). One reason to expect

ASC might perform differently in reef environments is that carbonate and siliciclastic

sediments typically differ in depositional texture, fabric, and postdepositional alteration


(diagenesis). It is not surprising, therefore, that the geoacoustic properties of surficial

carbonate sediments have been shown to differ from those of siliciclastics (Richardson et

al. 1997; Brandes et al. 2002). Many maps made with ASC in soft sediment communities

have distinguished sediment facies primarily based on grain size (see citations in Freitas

et al. 2003b). In contrast, coral reef ecologists will tend to want subdivisions of rocky facies, grouping all sediment together as a single class (Mumby and Harborne 1999;

Franklin et al. 2003). The few ASC studies that have included rocky seabeds (Sotheran et al. 1997; Pinn and Robertson 1998; Bax et al. 1999; Bornhold et al. 1999; Anderson et al.

2002; Pinn and Robertson 2003; Brown et al. 2005) generally consider rock as a single class. Only Moyer et al. (2005) and Riegl and Purkis (2005) have assessed the accuracy of

ASC maps that included multiple hard bottom classes, suggesting that opportunities exist for further technological progress in this area.

Anderson et al. (2008) proposed a list of ten priorities for research that would advance the field of acoustic seabed classification. At least five of these priority research areas fall under the general topic of standardization of instruments and methods.

Transferability and reproducibility are two aspects of standardization of instruments and methods that are investigated in this dissertation. First, ASC should be transferable in the sense that the classes mapped should have the same meaning regardless of location or instrumentation used for data collection. Second, ASC should be reproducible such that successive measurements by the same sensor over a short time period produce stable results. Transferability and reproducibility have not been well quantified for single-beam

ASC in general, and have not been addressed at all for ASC in coral reef environments.


Chapter 3 addresses the question of single-beam ASC transferability. Chapter 4 addresses

the question of single-beam ASC reproducibility.

1.2 What applications can utilize single-beam acoustic seabed classification in the coral reef environment?

In the context of distinguishing between methodology and application (Green et al. 1996; Andrefouet and Riegl 2004), as discussed above, all previous studies using single-beam ASC in coral reef environments (Murphy et al. 1995; Hamilton et al. 1999; White et al. 2003; Moyer et al. 2005; Riegl and Purkis 2005; Riegl et al. 2007) have been purely methodological. The products of these studies were maps of the distribution of sedimentary and outcropping rock facies. An application of remotely sensed data, in contrast, would use facies or habitat maps as a starting point or as an intermediate step toward some other objective. In this dissertation, the assessment of grouper and snapper habitat in the upper Florida Keys, USA, was chosen as an example application of single-beam ASC in coral reef environments.

Eklund et al. (2000) observed a black grouper aggregation just outside of a protected no-take marine reserve in the Florida Keys. If the existence of this aggregation had been known or suspected during the reserve planning process, it could have been considered for inclusion within the no-take reserve boundaries. One conclusion by

Eklund et al. (2000) was that seabed habitat maps for depths greater than 20 m in the

Florida Keys were needed to assist the stratification of reef fish census effort. A second conclusion by Eklund et al. (2000) was that rapid means for identifying grouper habitat, in particular possible sites of grouper spawning aggregations, would be beneficial for conservation efforts. Chapters 5 and 6 begin to address the needs identified in Eklund et al. (2000) by investigating the potential of a commercial single-beam ASC system to a)


create seabed substrate maps for depths greater than 20 m in the upper Florida Keys, and

b) associate seabed geomorphology with grouper and snapper habitat.

1.3 Outline

This dissertation focuses on the question of whether single-beam classification

methods are appropriate for mapping deep reefs at the global to regional scale (upper

right box of Table 1.1). Several commercial-off-the-shelf, single-beam ASC systems are

available that could be used for this purpose (QTC [1], RoxAnn [2], BioSonics VBT [3], Kongsberg SEABEC [4]). All of the work in this dissertation was completed using a system produced by the Quester Tangent Corporation (QTC; Sidney, BC, Canada). The QTC system consisted of two complementary components. Data acquisition was performed with a QTCView Series V (QTCV) operating with a 50 kHz sounder. Data processing and classification were performed with QTC IMPACT software. Chapter 2 introduces terminology, parameters, and processing steps associated with the QTC system.

Chapters 3 and 4 address the methodological questions related to producing transferable and reproducible classification schemes in a coral reef environment (Section

1.1). Chapter 3 addresses single-beam ASC transferability. An abridged version of

Chapter 3, not including the Andros results or Section 3.5, has been published (Gleason et al. 2009) and a version containing the omitted sections is being prepared for publication. Part of the Chapter 3 discussion refers to a survey performed at the Navassa

Island National Wildlife Refuge. The details of the survey are not included in this dissertation but have been published by Miller et al. (2008). Chapter 4 concerns single-beam ASC reproducibility. The results from Chapter 4 are in preparation for publication.

1 http://questertangent.com/m_prod_view.html

2 http://www.sonavision.co.uk/pages/seabed_classification_menu.html

3 http://www.biosonicsinc.com/vbt.shtml

4 http://www.sonavision.co.uk/pages/seabed_classification_menu.html

Chapters 5 and 6 explore the application of single-beam ASC to the mapping and

assessment of grouper and snapper habitat in the upper Florida Keys, USA (Section 1.2).

Chapter 5 presents the results of mapping the aggregation site observed by Eklund et al.

(2000) and its surroundings. Grouper abundances were measured to assess whether the

seabed classes identified by the ASC could stratify the survey area in a way that would

efficiently allocate grouper census effort. Chapter 5 was published as Gleason et al.

(2006). The acoustic variability calculation described in Chapter 5 has been incorporated

into the Classification and Mapping Software (CLAMS; v. 1.1, Quester Tangent Corp.,

Sidney, B.C. Canada, 2004) where it is called the "complexity index."

In Chapter 6 the sites of five known historical grouper and snapper spawning

aggregations in the upper Florida Keys were surveyed using the same commercial single-

beam ASC system. The resulting maps of seabed classes for each site were interpreted,

and similar geomorphological features were found to be associated with each

aggregation. Chapter 6 has been submitted to The Professional Geographer and is currently in review.


Table 1.2: Summary of systems, frequencies, sediment mineralogy, and seabed classes used in previous single-beam ASC studies. Abbreviations used in the System column were: “QTCIII”, “QTCIV” and “QTCV” for QTCView Series III, IV and V, respectively; “Rox” for RoxAnn; “Echo+” for Echoplus, and “ISAH-S” for the precursor to the QTC systems. Entries under the Frequency column are in kHz. Abbreviations under the “Si/CO3” column indicate the predominant mineralogy of sediments in the study area: “Clastic” for siliciclastic, and “CO3” for carbonate. Some of the data for these columns were not explicitly stated in the cited reference but were assumed based on the study’s location or date.

| Citation | System | Freq. | Si/CO3 | Classes used or identified |
|---|---|---|---|---|
| (Anderson et al. 2002) | QTCIV | 38 | Clastic | 4 classes of training data (sand, gravel, rock, rock with macroalgae), additional 4 classes found in post processing due to expert guidance. |
| (Bax et al. 1999) | Rox | 120 | Clastic | Hard, soft, rough. |
| (Bornhold et al. 1999) | QTCIV | 200 | Clastic | Bedrock, sediment veneer, sand/gravel with boulders, sand/gravel and muddy sand. |
| (Bloomer et al. 2007) | QTCIV | 50 | Clastic, some CO3 | 3 case studies. Classes mostly based on grain size. |
| (Brown et al. 2005) | Rox | 200 | Clastic | 6 classes based on grain size and rock. |
| (Collins and McConnaughey 1998) | QTCIV | 38 / 120 | Clastic | Classes not specified. |
| (Ellingsen et al. 2002) | QTCIV | 50 | Clastic, some CO3 | 6 classes mostly related to grain size, but “sediment characteristics could not alone explain the diversity of acoustic classes.” |
| (Foster-Smith and Sotheran 2003) | Rox | 38 / 200 | Clastic | 12 biotypes plus rocky. |
| (Foster-Smith et al. 2004) | QTC(IV?) and Rox | 200 / 200 | Clastic | 4 classes based mostly on grain size; 8 classes based on grain size and bedforms. |
| (Freeman et al. 2002) | QTCIV | 120 | Clastic | 5 acoustic classes correlated with their habitat complexity index, which was derived from grain size + ss texture + mud fraction + burrows + wt. of stones. |
| (Freeman and Rogers 2003) | QTCIV | 200 | Clastic | 11 classes: grain size and bedforms. |
| (Freitas et al. 2003a) | QTCIV | 50 | Clastic | 3 classes: coarse, fine, very fine. |
| (Freitas et al. 2003b) | QTCIV, QTCV | 50 / 50 w/diff width | Clastic | 3 classes: fine with silt, fine without silt, mud. |
| (Freitas et al. 2008) | QTCV | 50 / 200 | Clastic | 50: 3 classes (med sand, fine sand, mud); 200: did not correlate with sediment. |
| (Galloway and Collins 1998) | QTCIV | 38 / 200 | Clastic | Mud, sand, gravel; qualitative evaluation. |
| (Gleason et al. 2006) | QTCV | 50 | CO3 | 7 classes, but 3 main ones. Hardbottom, two sediment classes. |
| (Greenstreet et al. 1997) | Rox | 38 | Clastic | 6-7 based on grain size. |


Table 1.2 (continued): Summary of systems, frequencies, sediment mineralogy, and seabed classes used in previous single-beam ASC studies.

| Citation | System | Freq. | Si/CO3 | Classes used or identified |
|---|---|---|---|---|
| (Hamilton et al. 1999) | QTCIV, Rox | 38 / 50 | Clastic and CO3 | 5 QTC, 10 Rox. QTC classes based on grainsize (supervised). |
| (Hetzinger et al. 2006) | QTCV | 200 | Mostly CO3, some clastic | 4 classes: fine-med sand, med-coarse sand, coarse sand-granule, v.coarse sand-granule. |
| (Hewitt et al. 2004) | QTCIV | 200 | Mixed? | 5 classes based on biotic communities. |
| (Hutin et al. 2005) | QTCIV, QTCV | 38 / 50 | Clastic | 3-5 classes correlated with depth. Once depth was removed via regression, the residuals were able to map scallop beds. |
| (Magorrian et al. 1995) | Rox | 50 | Clastic | Mud/silt (soft) and mussels, scallops. |
| (Morrison et al. 2001) | QTCIV | 200 | Mixed? | Mostly grain size, but only qualitatively assessed via video ground truth. |
| (Moyer et al. 2005) | QTCV | 50 | Mixed | 5 classes, mostly hard/soft. ~60% accuracy hard / sediment. |
| (Murphy et al. 1995) | Rox | Not given | CO3 | 7 classes, not specified exactly. Some were coral, sand, patchreef, fine sand, seagrass on sand. |
| (Pinn et al. 1998) | Rox | 38 | Clastic | 24 classes in three surveys based on grain size, rock, and “unknown.” |
| (Pinn and Robertson 1998) | Rox | Not given | Clastic | No actual classes, just E1, E2. |
| (Pinn and Robertson 2001) | Rox | Not given | Clastic | No actual classes, just E1, E2. |
| (Pinn and Robertson 2003) | Rox | 38 | Clastic | Between 4-9 classes based on grain size, rock, and “unknown” depending on interpolation and track spacing. |
| (Preston et al. 1999) | QTC(IV?) | 38 / 200 | Clastic | Classes not labeled, but instead correlated with linear combinations of geotechnical variables. 38 kHz correlated with volume variables and 200 kHz correlated with surface vars. |
| (Preston et al. 2004b) | QTCIV | 38 / 120 | Clastic | Unspecified, exactly, but some combination of sands. |
| (Riegl et al. 2005b) | QTCV, Echo+ | 50 / 200 | Mixed? | Sand, seagrass, (dense/sparse) algae. Experiments with algae over fixed spots / dropped baskets. |
| (Riegl et al. 2005a) | QTCV | 50 / 200 | Mixed? | Sand, seagrass, algae. |
| (Riegl and Purkis 2005) | QTCV | 50 / 200 | CO3 | 50: Rock and sediment; 200: high and low relief; combined them for 4 classes. |
| (Riegl et al. 2007) | QTCV / Echo+ | 50 | Mixed | 4 classes: two hardbottom, two sediment. Sediment appears to be divided by sorting. |
| (Sotheran et al. 1997) | Rox | Not given | Clastic | 14 classes mostly based on bedrock, cobbles / boulders and sand. |
| (Tsemahman et al. 1997) | ISAH-S | 50 | Clastic | Primarily based on grain size. Highest correlation with geotech was corer penetration. |


Table 1.2 (continued): Summary of systems, frequencies, sediment mineralogy, and seabed classes used in previous single-beam ASC studies.

| Citation | System | Freq. | Si/CO3 | Classes used or identified |
|---|---|---|---|---|
| (von Szalay and McConnaughey 2002) | QTCIII, QTCIV | 38 / 38 | Clastic | 6 at site one, 13 at site two based on grain size. |
| (White et al. 2003) | Rox | 200 | CO3 | Mud, sand, everything else (rock). |
| (Wilding et al. 2003) | Rox | 200 | Clastic | No actual classes, just E1, E2. |

Chapter 2: Introduction to the QTC acoustic seabed classification system

The Quester Tangent Corporation (QTC; Sidney, BC, Canada) produces several commercial-off-the-shelf acoustic seabed classification (ASC) systems comprising both hardware and software components. The surveys described in Chapters 3 to 6 all used a QTCView Series V (QTCV) for data acquisition and the QTC IMPACT (version 3.4) software for data processing and classification.

The QTCV data acquisition system consists of an echo sounder, transducer, head

amplifier, analog to digital (A/D) signal capture card and software to record the digitized

echoes. A personal computer is required to house the A/D card, run the software and

store the digitized data. A wide variety of echo sounder/transducer combinations can be

used with the system. Examples in the literature include surveys at 24, 38, 50, 120, and

200 kHz (Table 1.2). The system used in this work operated at 50 kHz.

The QTC IMPACT software processes pre-recorded echoes using three general steps (Fig. 2.1). In step (A) the Hilbert transform computes the echo envelope from the full waveform (bipolar) data recorded during data acquisition. All echoes are scaled to a maximum amplitude of one, and a depth compensation routine is applied to normalize the echo length by subsampling the echo envelope at different rates depending on the depth at which it was acquired (Preston 2004). A user-selectable number (5 by default) of consecutive echoes are averaged (“stacked”) to reduce stochastic ping-to-ping variability. Finally, a series of algorithms (Fig. 2.1) are used to compute features characterizing the echo shape; echo amplitude is not used in the QTCV analysis.
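As a rough illustration of step (A), the sketch below computes an echo envelope as the magnitude of the analytic signal (a Hilbert transform, implemented here with a naive discrete Fourier transform), scales each envelope to a peak of one, and stacks consecutive echoes. It is not the IMPACT implementation. The function names are hypothetical, the synthetic pings are invented for illustration, stacking is assumed to use non-overlapping groups, and the depth compensation and feature algorithms are omitted.

```python
import cmath
import math


def envelope(samples):
    """Echo envelope: magnitude of the analytic signal (Hilbert transform via a naive DFT)."""
    n = len(samples)
    # Forward DFT, O(n^2); adequate for a short illustrative echo.
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive frequencies,
    # and zero the negative frequencies.
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            weight = 1.0
        elif k < (n + 1) // 2:
            weight = 2.0
        else:
            weight = 0.0
        spectrum[k] *= weight
    # The inverse DFT is the analytic signal; its magnitude is the envelope.
    analytic = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]
    return [abs(z) for z in analytic]


def normalize_peak(env):
    """Scale an envelope to a maximum of one, discarding absolute amplitude."""
    peak = max(env)
    return [v / peak for v in env] if peak > 0 else list(env)


def stack(echoes, size=5):
    """Average non-overlapping groups of `size` equal-length echoes (assumed grouping)."""
    stacked = []
    for start in range(0, len(echoes) - size + 1, size):
        group = echoes[start:start + size]
        stacked.append([sum(column) / size for column in zip(*group)])
    return stacked


if __name__ == "__main__":
    # Ten synthetic pings: a decaying tone burst whose gain varies from ping to ping.
    pings = [
        [(1.0 + 0.1 * p) * math.exp(-t / 8.0) * math.cos(2 * math.pi * 4 * t / 32)
         for t in range(32)]
        for p in range(10)
    ]
    shapes = [normalize_peak(envelope(ping)) for ping in pings]
    print(len(stack(shapes, size=5)), "stacked echoes")
```

Because each envelope is scaled to a peak of one before stacking, the ping-to-ping gain differences in the synthetic data disappear, which mirrors the statement that echo amplitude is not used in the analysis.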

Data analysis step (B) reduces the dimensionality of the dataset. Usually the features generated in step (A) are highly correlated, so principal components analysis


(PCA) is used to transform the data set. IMPACT retains the first three principal

components regardless of the percent of dataset variance explained. In the system

documentation, these principal components are given the shorthand names “Q1”, “Q2”,

and “Q3”, thereby forming three-dimensional “Q-space”. For consistency with the documentation and with the majority of existing literature using QTC IMPACT, these “Q” names will be retained below, but their meaning should be clear: each point in Q-space corresponds to a single, possibly stacked, echo in geographic space. Points close to one another in Q-space correspond to echoes with similar shapes, and points far apart in Q-space correspond to echoes with different shapes, but these shapes cannot be recreated from Q-space, nor does a point in Q-space have an a priori physical interpretation (e.g. gravel vs. sand).
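The data reduction of step (B) can be sketched as a generic principal components analysis that keeps the first three components. The example below is an assumption-laden illustration, not the IMPACT code: it uses power iteration with deflation on the sample covariance matrix, the function names are hypothetical, the feature values are randomly generated, and whether IMPACT standardizes features before the PCA is not stated here.

```python
import math
import random


def covariance(rows):
    """Column means and sample covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov


def leading_eigvec(matrix, iterations=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    d = len(matrix)
    # Non-uniform start vector; assumed not orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(d)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    eigval = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d))
                 for i in range(d))
    return eigval, vec


def pca_project(rows, n_components=3):
    """Return the first principal axes and the scores (Q1, Q2, Q3) of each row."""
    means, cov = covariance(rows)
    d = len(means)
    axes = []
    for _ in range(n_components):
        eigval, vec = leading_eigvec(cov)
        axes.append(vec)
        # Deflate: remove the component just found so the next one dominates.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum((r[j] - means[j]) * axis[j] for j in range(d)) for axis in axes]
              for r in rows]
    return axes, scores


if __name__ == "__main__":
    rng = random.Random(0)
    # Four correlated, made-up "echo shape features" per stacked echo.
    rows = []
    for _ in range(200):
        a, b, c, e = rng.gauss(0, 4), rng.gauss(0, 2), rng.gauss(0, 1), rng.gauss(0, 0.5)
        rows.append([a + b, a - b, c + e, c - e])
    axes, scores = pca_project(rows)
    print("Q-space coordinates of first echo:", scores[0])
```

Three components are always kept in this sketch, regardless of how much variance they explain, to match the behavior described above.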

Figure 2.1: Flowchart of QTCV processing. QTCV analysis consists of three steps A) feature extraction, B) data reduction, C) clustering. See text for discussion.

The data are clustered in analysis step (C) to form discrete classes corresponding

to bottom types that differ acoustically. The clustering algorithm used is based on a


simulated annealing routine (Preston et al. 2002; Preston et al. 2004a), which produces

the statistically optimum cluster membership for a given number of clusters. The usual

procedure is to determine the optimum cluster arrangement for a range of numbers of

clusters, then choose which number of clusters is best based on the Bayesian Information

Criterion (Preston et al. 2004a; Preston et al. 2004b). The class assignments can be mapped in geographic space because each echo (point in Q-space) has geographic coordinates associated with it. Finally, the parameters of the data analysis steps (B) and

(C) are stored in a file known as a QTC “catalog”. The catalog contains the PCA projection matrix and the mean/covariance matrix for each cluster. Additional data sets can be processed using this catalog, or subjected to their own PCA/clustering steps.
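To make the role of the catalog concrete, the following sketch applies a stored projection and per-cluster mean/covariance to a new feature vector. Everything about the layout is hypothetical: the dictionary structure, the class names, and the numbers are invented, and assigning an echo to the cluster with the smallest squared Mahalanobis distance is an assumed rule, not a documented description of how IMPACT uses a catalog.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance; cov is assumed positive definite."""
    diff = [p - m for p, m in zip(point, mean)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


def classify(features, catalog):
    """Project features into Q-space with the catalog, then pick the nearest cluster."""
    centered = [f - m for f, m in zip(features, catalog["feature_means"])]
    q = [sum(c * a for c, a in zip(centered, axis)) for axis in catalog["projection"]]
    distances = {
        name: mahalanobis_sq(q, cluster["mean"], cluster["cov"])
        for name, cluster in catalog["clusters"].items()
    }
    return min(distances, key=distances.get), q


# Hypothetical catalog: four features projected onto three Q axes, two clusters.
CATALOG = {
    "feature_means": [0.0, 0.0, 0.0, 0.0],
    "projection": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    "clusters": {
        "class 1": {"mean": [0.0, 0.0, 0.0],
                    "cov": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
        "class 2": {"mean": [4.0, 0.0, 0.0],
                    "cov": [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]},
    },
}

if __name__ == "__main__":
    label, q = classify([1.8, 0.0, 0.0, 9.0], CATALOG)
    print(label, q)
```

In this made-up example the point lies closer to the mean of “class 1” in Euclidean terms but is assigned to “class 2” because that cluster has a larger variance along Q1, which is why a catalog would need to store a covariance matrix as well as a mean for each cluster.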

No single comprehensive description of the IMPACT system has been published, which hinders communication of results to those who have not personally used the system. The following brief list of sources should help readers unfamiliar with the QTC system to gain insight into the processing without reviewing the large literature on single-beam seabed classification in general. Hamilton et al. (1999; section 1.1) have a concise description of the underlying principle behind classification based on echo shape. ICES (2007) provides background material on many aspects of acoustic seabed classification. The system described by van Walree et al. (2005) is similar to the IMPACT processing, although different features are generated to describe the echo shape. Preston (2004) described the depth compensation routine used in IMPACT. Preston et al. (2004a) provided an overview of the features calculated by IMPACT and the simulated annealing clustering routine.

Chapter 3: Consistency of single-beam acoustic seabed classification among multiple coral reef survey sites

3.1 Background

Coral reef-associated habitats that cannot be mapped with airborne or satellite

imagery are both extensive and ecologically important. For example, over 55% of the

Florida Keys National Marine Sanctuary (about 1540 square nautical miles) has not been

mapped due to water depth or clarity limitations (FMRI 1998). Not all of this unmapped

area contains reefs, of course, so the most basic mapping goal is to search this large area

in a cost effective way for the purposes of discriminating ecologically distinct habitats

such as reef, hard bottom, seagrass, and bare sediment, and prioritizing areas of interest

for detailed surveys.

Acoustic systems are a natural solution for mapping optically deep water, which

in this context means anywhere overhead imagery cannot effectively map the seabed.

Commercial single-beam acoustic seabed classification (ASC) systems could potentially contribute to the problem of mapping large areas of the seabed in optically deep water because they are inexpensive relative to other acoustic mapping tools, portable, easy to operate, and, at least conceivably, operable from ships of opportunity thereby enabling

coverage of certain large areas at minimal marginal cost (Anderson et al. 2008).

Even though single-beam ASC systems have been commercial products since the

early-1990s (Chivers et al. 1990) and have been used in a variety of mapping projects

around the world (Table 1.2), the methods of data acquisition, processing, classification,

and validation continue to improve. Indeed, a recent review of acoustic seabed

classification technologies identified the standardization of methodologies and the

measurement of seabed variability at multiple spatial scales as two priority areas for

future research (ICES 2007; Anderson et al. 2008). Standardizing ASC methods would permit comparisons of data collected in different areas or within the same area at different times, potentially by different systems (Anderson et al. 2008). As discussed in

Section 1.1, the improvements to ASC methodologies suggested by Anderson et al.

(2008) fall into three categories: objectivity, transferability, and reproducibility. This chapter concerns the transferability of ASC results.

Previous evaluations of single-beam acoustic seabed classification systems in coral reef environments have used data from just one study site (Section 3.2).

Consequently, the classification schemes employed have been different in every case

(Table 3.1). In contrast, this study compared results across multiple study sites to assess what seabed types could be mapped reliably by a particular single-beam ASC system in coral reef environments in general, rather than at a single specific site.

The overall objective of this study was to determine what acoustic seabed classes were consistently distinguished among survey areas in coral reef environments. The methods employed were, first, to survey multiple sites using the same 50 kHz single-beam ASC system, second, to cluster the acoustic data from each site independently, third, to associate the clusters of acoustic data with seabed types, and finally, to quantify the accuracy of the resulting classified map (Section 3.3). The results showed that hard bottom (rocky) and sediment substrate were common classes identified at all sites (Section 3.4). An analysis of misclassified portions of the study areas revealed that thin sediment veneers and within-footprint macrorelief were common sources of errors in the acoustic classes (Section 3.5).


The conclusions of this study should be broadly applicable to mapping coral reefs

and reef-associated habitats. Even though a simple hard bottom/sediment classification

scheme discriminates only two types of seabed, the fact that this scheme is transferable

among sites proves useful in several ways, as discussed in Section 3.6 and exploited in

Chapter 6. Furthermore, documenting the transferability of a single, albeit simple, single-

beam acoustic seabed classification scheme is a step toward the standardization of ASC

methods recommended by Anderson et al. (2008).

3.2 Previous Work

Even though this is the first study to compare single-beam ASC maps created at

multiple sites, previous works have investigated the performance of single-beam ASC

systems within single coral reef survey areas (Murphy et al. 1995; Hamilton et al. 1999;

White et al. 2003; Moyer et al. 2005; Riegl and Purkis 2005; Riegl et al. 2007).

Reviewing the classes found in these previous surveys suggests what seabed classes

might be common to all coral reef sites.

Murphy et al. (1995) surveyed Biscayne National Park, in South Florida, USA,

using a RoxAnn system. The sonar frequency was not specified. Murphy et al. (1995)

proposed that seven seabed communities could be mapped with the system, including

different densities of seagrass. Only five specific classes were actually described,

however, by Murphy et al. (1995): coral, patch reef / hard bottom, seagrass on sand, sand,

and fine sand. No accuracy assessment was provided.

Hamilton et al. (1999) used two single-beam ASC systems, a Quester Tangent

Series IV (QTCIV) and RoxAnn, to map a portion of the lagoon, near

Cairns, Australia. The RoxAnn system operated at 50 kHz, while the QTCIV operated at


38 kHz. Five classes were mapped with QTCIV: soft mud, plastic mud, silty mud, muddy gravel, and muddy, coarse, sandy gravel. The RoxAnn data were divided into 16 classes, merged to eight classes, and ultimately five classes: coarse sand and gravel, muddy sand and gravel, rough plastic muds, fine sand/sand/mud, and muds. Qualitative assessment with ground truth suggested that the QTCIV class boundaries were consistent with changes in grain size, but that the RoxAnn classes were not. No quantitative accuracy assessment was provided. Hamilton et al. (1999) felt that both systems produced erratic signals over hard bottom, but they did not include any hard bottom classes in their final maps.

White et al. (2003) used a RoxAnn system at 200 kHz to map an area offshore

Negros Occidental, Philippines, from 2.6 to 70 m depth. They produced four habitat maps with different numbers of classes, ranging from 10 classes with 28% overall accuracy to three classes with 86% overall accuracy. The classes for the coarsest level were mud, sand, and coral dominated.

Riegl and Purkis (2005) used a Quester Tangent Series V (QTCV) at both 50 and

200 kHz to map coral communities from 1-8 m depth in a ramp setting off the coast of

Dubai. They found that the 50 kHz data discriminated hard bottom from sediments and that the 200 kHz data discriminated high and low rugosity. They compared their results with a habitat map made from satellite imagery and found 66% agreement. Comparison against ground truth yielded 56% agreement.

Moyer et al. (2005) used QTCV at 50 kHz to map drowned Holocene reefs and surrounding facies between 3 and 35 m depth off the coast of Broward County, South Florida, USA. After clustering their data into six acoustic classes representing rubble, two types of sediment, and three types of reef, they found only 39% accuracy. Moyer et al.

(2005) also consolidated acoustic clusters into two coarser classification schemes with

only three or two classes and found 61% and 64% accuracy, respectively.

Riegl et al. (2007) used two single-beam acoustic seabed classification systems,

QTCV and Echoplus, both operating at 50 kHz, on the same survey in a mixed carbonate-

siliciclastic setting offshore of Cabo Pulmo, Mexico. The QTCV data were divided into

two hard bottom classes, “rocky ridges” and “rock and hardground”, and two sediment

classes, “less sorted sand” and “well sorted sand”. The Echoplus data were divided into

two classes, hardgrounds and sand. Accuracy relative to video images of the seabed was

reported as 90% but no error matrix was provided.
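Because several of the studies above report overall accuracy without an error matrix, a short sketch of how the two relate may be helpful. The example below builds an error matrix from paired map and ground-truth labels and derives overall, user's, and producer's accuracy. The function names and the ten sample points are hypothetical; the layout with map classes as rows and reference classes as columns is a common convention, assumed here rather than taken from any of the cited studies.

```python
def error_matrix(reference, predicted, classes):
    """Rows are map (predicted) classes; columns are reference (ground truth) classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[pred]][index[ref]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of points on the diagonal, i.e. mapped as their reference class."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def users_accuracy(matrix):
    """Per map class: correct points divided by the row total."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]


def producers_accuracy(matrix):
    """Per reference class: correct points divided by the column total."""
    result = []
    for j in range(len(matrix)):
        column_total = sum(row[j] for row in matrix)
        result.append(matrix[j][j] / column_total if column_total else None)
    return result


if __name__ == "__main__":
    classes = ["hard bottom", "sediment"]
    # Hypothetical ground-truth points and the class mapped at each one.
    reference = ["hard bottom"] * 6 + ["sediment"] * 4
    predicted = ["hard bottom"] * 4 + ["sediment"] * 2 + ["sediment"] * 4
    matrix = error_matrix(reference, predicted, classes)
    print(matrix)
    print(overall_accuracy(matrix), users_accuracy(matrix), producers_accuracy(matrix))
```

Two maps can share the same overall accuracy while distributing their errors very differently among classes, which is the information lost when only a single percentage is reported.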

No single seabed class was identified in all of the previous single-beam acoustic

surveys in coral reef environments (Table 3.1). Only a few seabed classes, such as

“sand”, “coral”, and “hard substrate” were identified by more than one of the previous

studies (Table 3.1). “Sand,” found in four of the previous surveys, was the most

commonly identified class, but only when subclasses identified by Murphy et al. (1995)

and Riegl et al. (2007) were combined. The lack of consistency between surveys suggests

that acoustic classes derived from single-beam echo sounders are not transferable among

sites. Fusing the classes found in previous studies into just two coarse-level classes,

however, reveals that five of the previous studies found at least one class that could be

described as “hard bottom” substrate and all of the previous studies had at least two

classes that, taken together, could be described as “sediment” substrate (Table 3.1).

Previous results suggest, therefore, that aggregating classes to a coarser level of detail

might be required to generate consistency between surveys.


Table 3.1: Seabed classes reported by previous efforts to map coral reef environments with single-beam acoustic seabed classification systems. The first row contains headings corresponding to the descriptive resolution (DR) of each study. The second row lists the studies, which were all at a fine level of DR, as well as headings for two columns of aggregated classes constructed from the previous studies. The last row contains the overall accuracy (OA) if available or N/A if not reported.

| DR | Coarse | Medium | Fine | Fine | Fine | Fine | Fine | Fine |
|---|---|---|---|---|---|---|---|---|
| Study | Aggregation of previous results | Aggregation of previous results | Murphy et al. (1995) | Hamilton et al. (1999) | White et al. (2003) | Riegl & Purkis (2005) | Moyer et al. (2005) | Riegl et al. (2007) |
| Classes | Hard Bottom | Hard Bottom | Coral | | Coral Dominated | High Rugosity Hard | Hard Substrate | Rocky Ridges |
| | Hard Bottom | Hard Bottom | Patch Reef / Hard Bottom | | | Low Rugosity Hard | | Rock and Hardground |
| | Sediment | Seagrass | Seagrass on Sand | | | | | |
| | Sediment | Gravel | | Muddy Coarse Sandy Gravel | | | | |
| | Sediment | Gravel | | Muddy Gravel | | | | |
| | Sediment | Sand | Sand | | Sand | High Rugosity Soft | Sand | Less Sorted Sand |
| | Sediment | Sand | Fine Sand | | | Low Rugosity Soft | | Well Sorted Sand |
| | Sediment | Mud | | Silty Mud | | | | |
| | Sediment | Mud | | Plastic Mud | Mud | | | |
| | Sediment | Mud | | Soft Mud | | | | |
| OA | N/A | N/A | N/A | N/A | 86% | 56% | 64% | “90%” |

Green et al. (1996) used the term “descriptive resolution” to refer to the capability

of a sensor to discriminate habitats, but it more generally refers to the level of detail (i.e.

number of distinct classes) provided by a classified map (Mumby et al. 1997; Mumby

and Harborne 1999). In the terrestrial remote sensing literature, common synonyms for

“descriptive resolution” are “thematic resolution” (e.g. Castilla et al.) or simply “level of detail” (Congalton and Green 1999). The observations that no acoustic class mapped in


any previous study was common to more than a few sites but that “hard bottom” and

“sediment” classes, created as supersets of the classes in each study, were common to all

sites suggest that single-beam acoustic seabed classifications may be transferable

between sites only at a coarse level of descriptive resolution.

Aggregating similar seabed classes produces a hierarchical classification scheme

with multiple levels of descriptive resolution ranging from fine (many subtly

distinguished classes) to coarse (a few general classes). Given the expectation, based on

results of previous studies, that the level of descriptive resolution limits transferability of

classification schemes between sites, the classes found in each of the survey areas of this

study were aggregated to form a hierarchical classification scheme with two levels: a

coarse level composed of “hard bottom” and “sediment” classes and a fine level composed of whichever classes were statistically discriminated from the data by the clustering process.
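The effect of aggregating fine classes into the coarse “hard bottom” and “sediment” classes can be sketched as a simple lookup followed by a comparison of labels. In the example below the fine class names are those reported for the QTCV data of Riegl et al. (2007) in Section 3.2; the function names and the six pairs of labels are hypothetical and serve only to show how agreement can rise when descriptive resolution is reduced.

```python
# Fine-to-coarse lookup; the fine class names follow Riegl et al. (2007) as cited in the text.
COARSE = {
    "rocky ridges": "hard bottom",
    "rock and hardground": "hard bottom",
    "less sorted sand": "sediment",
    "well sorted sand": "sediment",
}


def aggregate(labels, mapping):
    """Replace each fine class label with its coarse parent class."""
    return [mapping[label] for label in labels]


def agreement(labels_a, labels_b):
    """Fraction of positions at which two label sequences agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


if __name__ == "__main__":
    # Hypothetical acoustic classes and ground-truth classes at six locations.
    mapped = ["rocky ridges", "rock and hardground", "less sorted sand",
              "well sorted sand", "rock and hardground", "well sorted sand"]
    truth = ["rock and hardground", "rock and hardground", "well sorted sand",
             "well sorted sand", "less sorted sand", "well sorted sand"]
    print("fine agreement:", agreement(mapped, truth))
    print("coarse agreement:",
          agreement(aggregate(mapped, COARSE), aggregate(truth, COARSE)))
```

Confusion between two classes that share a coarse parent no longer counts as an error after aggregation, so agreement at the coarse level can never be lower than agreement at the fine level.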

3.3 Methods

Four sites were mapped with the same QTCV system operating at the same

frequency and classified with QTC IMPACT software, using the same overall procedure.

First, an acoustic survey was conducted (Section 3.3.1). Second, the acoustic data were

clustered into groups based on echo shape (Section 3.3.2). Third, each of the clusters was

labeled (Section 3.3.3). Fourth, classification was assessed with respect to ground truth data (Section 3.3.4). The details of each of these steps, which are described in the

following four sections, varied among the surveys, but all of the surveys contained each

of the four basic steps.


3.3.1 Acoustic surveys

The 50 kHz QTCV system described in Chapter 5 was used to acquire the acoustic data for all four surveys (Table 3.2). Two of the sites investigated were in the

Bahamas, in shallow water on the Great Bahama Bank (Fig. 3.1). The other two sites were located in the reef crest and forereef environments of the upper Florida Keys (Fig.

3.1).

Survey 1 was conducted in the vicinity of Lee Stocking Island (LSI), Bahamas, on

June 16-20, 2001 (Fig. 3.1). Approximately 145 km of track lines were acquired along the bank top in water depths ranging from one meter to just over 10 m.

Figure 3.1: Map showing locations of Lee Stocking Island (LSI), Carysfort Reef (CF), Fowey Rocks (FR), and Andros Island (AI) study sites. Dashed line is the 200 m bathymetric contour.

Survey 2 was conducted east of Andros Island, Bahamas (Fig. 3.1) on October 16-

18, 2001. Approximately 73 km of track lines were acquired in depths from 1 to about 8 m. One portion of the bank was covered with a grid of 19 approximately 1500 m long transects spaced 100 m apart. A second portion of the bank top was covered by nine

widely-spaced, cross-shelf transects that, in turn, were connected by segments of an along-shelf transect formed while transiting between the cross-shelf areas.

Survey 3 was conducted on March 14, 28, and April 4, 2002 offshore of Carysfort

Reef, in the Florida Keys (Fig. 3.1). Fifty-two parallel transects, each about 2 km long, were run across the upper shelf between depths of 3 and 35 m. The total track length, including tie lines, was approximately 124 km. Further details of this survey are described in Chapter 5.

Survey 4 was conducted at Fowey Rocks, also in the Florida Keys, approximately

45 km north-northeast of Carysfort (Fig. 3.1). The Fowey survey was conducted October

12 and 20, 2003 and included forty-one parallel transects across the upper shelf between depths of 3 and 40 m. The total track length, including tie lines, was approximately 72 km.

Table 3.2: Characteristics and settings of the QTCV system used in this study.

Parameter                   Value
Sounder model               Suzuki 2025
Frequency                   50 kHz
Power                       500 W
Echo pulse length           0.3 ms
Ping rate recorded          1.5 Hz (approx)
Transducer model            Suzuki TGN60-50B-12L
Beam width (cross track)    42 degrees
Beam width (along track)    16 degrees

3.3.2 Acoustic classification

Acoustic data were processed with the QTC IMPACT software package (version

3.4, QTC, Sidney, BC, Canada, 2004). Details of the IMPACT processing were described in Chapter 2 and have been previously published (e.g. Preston et al. 2004a; Gleason et al.

2006; Freitas et al. 2008), so only a review of the essentials will be given here.


First, the raw, bi-polar waveforms were converted to echo envelopes (the instantaneous amplitude of the time series as computed by the Hilbert transform).

Second, the echoes were depth compensated using the algorithm described by Preston

(2004). The implementation of the Preston (2004) depth compensation algorithm within

IMPACT uses the sounder / transducer characteristics and two main parameters, the standard echo length (SEL) and the survey depth (Tables 3.2, 3.3).

Table 3.3: Values of tunable parameters used when processing each survey with the IMPACT software.

Parameter (units)                          LSI                        AI                         CF      FR
Standard Echo Length (samples)             100                        100                        170     170
Echoes deeper than Critical Depth (N/A)    No                         No                         Yes     Yes
Survey depth (m)                           Equal to Critical Depth    Equal to Critical Depth    60      60
Stack size (# echoes)                      5                          5                          5       5
Auto cluster iterations (N/A)              30                         15                         30      40
Auto cluster class range (N/A)             2-10                       2-15                       2-12    2-10

Third, echoes were stacked (i.e. averaged) to smooth some of the natural ping-to-ping variability in echo shape. All of the data from surveys for this paper were stacked by five (Table 3.3), which means that echoes one through five were averaged to form stack one, echoes six through ten were averaged to form stack two, and so on. In the remainder of the description of the methods, the results, and the discussion below, all of which concern processed data, the term “echoes” implicitly means “stacked echoes.”
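The stacking step can be illustrated with a minimal sketch. It is not IMPACT's code: the function name stack_echoes, the list-of-lists echo representation, and the choice to drop a trailing partial block are all assumptions made for illustration.

```python
def stack_echoes(echoes, stack_size=5):
    """Average consecutive, non-overlapping groups of echoes sample by sample.

    echoes: list of equal-length lists of envelope amplitudes (hypothetical layout).
    A trailing group with fewer than stack_size echoes is dropped (assumption).
    """
    stacks = []
    for start in range(0, len(echoes) - stack_size + 1, stack_size):
        block = echoes[start:start + stack_size]
        n_samples = len(block[0])
        # Mean amplitude at each sample position across the block.
        stacks.append(
            [sum(echo[i] for echo in block) / stack_size for i in range(n_samples)]
        )
    return stacks


# Ten toy echoes of three samples each; echo k has constant amplitude k.
echoes = [[float(k)] * 3 for k in range(1, 11)]
stacked = stack_echoes(echoes)  # two stacks: echoes 1-5 and echoes 6-10
```

With a stack size of five, ten input echoes yield two stacked echoes, matching the "stack one" and "stack two" description above.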

Fourth, 166 features were computed for each (stacked) echo. IMPACT uses five types of measurements to generate the 166 features: 1) cumulative amplitude, 2) amplitude quantiles, 3) amplitude histogram, 4) power spectrum, and 5) wavelet packet transform (Preston et al. 2004a). Although the exact algorithms used to compute these

features remain proprietary, they are metrics of “the shape and spectral character of the echo” (Preston 2004). Chapter 2 contains further discussion of the features.

The features computed by IMPACT are always highly correlated, so the fifth processing step was to reduce the dimensionality of the feature dataset using principal components analysis (PCA). The PCA algorithm converts the [N x 166] feature matrix, where N is the number of echoes and 166 the number of correlated features, into an [N x 166] principal components matrix, where the 166 components are mutually orthogonal (see e.g. Davis 1986 for details on transforming data matrices with PCA). The 166 principal components are ordered by the amount of variance each explains in feature space, with the first principal component explaining most of the variance, the second explaining most of the remaining variance, and so on. IMPACT arbitrarily retains only the first three principal components, thereby resulting in a truncated principal components matrix. QTC calls this [N x 3] data matrix “Q-space,” and the three dimensions, the first three principal components, are called Q1, Q2, and Q3 (Preston

2004; QTC 2004).
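The reduction to Q-space can be written compactly as a generic PCA. This is a textbook sketch, not a description of IMPACT's proprietary implementation: the symbols F, C, V, and the eigenvalues are introduced here for illustration, and mean-centering of the features before the decomposition is an assumption.

```latex
\begin{aligned}
\tilde{F} &= F - \mathbf{1}\,\bar{f}^{\mathsf{T}},
  & \tilde{F} &\in \mathbb{R}^{N \times 166} \\
C &= \frac{1}{N-1}\,\tilde{F}^{\mathsf{T}}\tilde{F} = V \Lambda V^{\mathsf{T}},
  & \lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_{166} \ge 0 \\
P &= \tilde{F}\,V,
  & P &\in \mathbb{R}^{N \times 166} \\
Q &= \tilde{F}\,[\,v_1 \; v_2 \; v_3\,] = [\,Q_1 \; Q_2 \; Q_3\,],
  & Q &\in \mathbb{R}^{N \times 3}
\end{aligned}
```

Under these assumptions the fraction of the total feature variance retained in Q-space is $(\lambda_1 + \lambda_2 + \lambda_3) / \sum_{k=1}^{166} \lambda_k$.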

Sixth, echoes were clustered into classes based on their distribution in Q-space.

IMPACT’s autocluster function performs clustering using a simulated annealing routine

(Preston et al. 2004a). The outputs of IMPACT’s autocluster function are: a) the optimum number of clusters into which the echoes should be split, as defined by the Bayesian

Information Criterion (Preston et al. 2004a), b) the mean vector and covariance matrix defining each cluster in Q-space, and c) a class number for each echo assigning it to one of the defined clusters.


3.3.3 Identification of acoustic classes

IMPACT’s auto-clustering routine divides a dataset into distinct clusters based on

echo shape, but, like any unsupervised classification routine, it cannot give these clusters

descriptive names (e.g. reef, rubble, seagrass, etc.). The clusters output from IMPACT,

therefore, must be labeled by reference to other data sources.

Cluster labeling for these surveys employed comparison to satellite imagery,

notes taken while snorkeling or drift diving, statistics of the echoes themselves (i.e.

looking at individual echoes and mean echo shapes for each cluster), bathymetric cross

sections, sediment grain size measurements, and reference to previous seabed classifications at these sites (Gonzalez and Eberli 1997; FMRI 1998; Lidz et al. 2003;

Louchard et al. 2003; Mobley et al. 2004).

Once the individual classes were labeled, similar classes were aggregated to form

a second classification at each site consisting only of hard bottom and sediment classes.

Thus, the classes identified for each survey area fell into a hierarchy with two levels of

descriptive resolution: a fine level, within which the number of classes could vary among

sites, and a coarse level, consisting of just two classes per site.
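The aggregation from fine to coarse descriptive resolution amounts to a lookup followed by a sum. The sketch below is illustrative only: the dictionary FINE_TO_COARSE and the function aggregate are hypothetical names, and the percentages are the Andros major-class values reported in Table 3.4.

```python
# Hypothetical mapping from fine-level labels to the two coarse-level classes.
FINE_TO_COARSE = {
    "Low Relief Hard Bottom": "Hard Bottom",
    "High Relief Hard Bottom": "Hard Bottom",
    "Sediment": "Sediment",
    "Double Echo 1": "Sediment",
    "Double Echo 2": "Sediment",
}


def aggregate(fine_percentages, mapping=FINE_TO_COARSE):
    """Sum the percentage of echoes in each fine class into its coarse class."""
    coarse = {}
    for fine_label, pct in fine_percentages.items():
        coarse_label = mapping[fine_label]
        coarse[coarse_label] = coarse.get(coarse_label, 0.0) + pct
    # Round to one decimal place, the precision reported in Table 3.4.
    return {label: round(total, 1) for label, total in coarse.items()}


# Andros major classes (percent of total echoes, Table 3.4); minor classes omitted.
andros_fine = {
    "Low Relief Hard Bottom": 5.8,
    "High Relief Hard Bottom": 3.2,
    "Sediment": 78.7,
    "Double Echo 1": 5.2,
    "Double Echo 2": 3.8,
}
andros_coarse = aggregate(andros_fine)
```

The fine level keeps whatever classes the clustering produced at a site, while the coarse level always reduces to the same two labels, which is what makes comparison between sites possible.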

3.3.4 Accuracy assessment

Once the acoustic clusters had been labeled, they were quantitatively compared

with “ground truth.” Two types of data were collected for assessing the accuracy of the

acoustic classification: video images and diver-based observations. Downward looking

video images were acquired during the LSI and the Andros surveys where water was

shallow and clear enough that the seabed was always visible from the surface. In contrast,

the seabed was not visible from the surface at all times during the Carysfort and Fowey


Rocks surveys, due to deeper and less clear water. Therefore, at Carysfort and Fowey

diver-based observations were acquired.

Video data from the LSI and Andros surveys were acquired with a Sony TRV 900

camera in an underwater housing that was mounted to the same pole that supported the

transducer used for the survey (Fig. 3.2). The camera was set to time-lapse mode, so that

an entire day’s worth of surveying could fit on one videotape. In time-lapse mode, the

camera acquired video (at full frame rate) for two seconds and then paused for 28

seconds. At the beginning of each survey day, the camera’s clock was synchronized with

a GPS unit. Using the time code embedded with each frame, the locations of the frames

were determined from the GPS tracks recorded for each day.
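One plausible way to assign a position to a video frame from its time code is to interpolate between the GPS fixes that bracket it. The text does not say how the matching was done, so linear interpolation, the function name locate_frame, and the (time, latitude, longitude) tuple layout are all assumptions in this sketch.

```python
from bisect import bisect_right


def locate_frame(frame_time, track):
    """Linearly interpolate (lat, lon) at frame_time from a GPS track.

    track: list of (time_s, lat, lon) tuples sorted by time (hypothetical layout).
    """
    times = [t for t, _, _ in track]
    if not times[0] <= frame_time <= times[-1]:
        raise ValueError("frame time falls outside the GPS track")
    i = bisect_right(times, frame_time)  # index of first fix later than frame_time
    if i == len(times):
        # The frame coincides with the final fix.
        return track[-1][1], track[-1][2]
    t0, lat0, lon0 = track[i - 1]
    t1, lat1, lon1 = track[i]
    w = (frame_time - t0) / (t1 - t0)
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)


# Two made-up fixes ten seconds apart and a frame halfway between them.
track = [(0.0, 23.000, -76.000), (10.0, 23.001, -76.002)]
lat, lon = locate_frame(5.0, track)
```

Because the camera clock was synchronized with the GPS at the start of each day, the frame time code can be compared directly with the track time stamps.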

The LSI and Andros video data sets consisted of 1502 and 558 two-second long

clips, respectively. Each clip was viewed and a single frame was extracted for analysis.

For each of these frames, an analyst visually estimated what percent of the frame

consisted of rocky substrate, what percent was covered by sandy substrate, and what

percent was covered by rubble.

Figure 3.2: Underwater photographs of pole-mounted transducer, video camera housing, and sample frames grabbed from underwater video. Left: Pole-mounted transducer and video camera housing, as used for the LSI and Andros surveys. Left middle: Sample frame grabbed from the video over a sandy seabed. Right middle and right: Sample frames grabbed over low relief and high relief hard bottom, respectively.

The diver-based observations collected at Carysfort Reef and Fowey Rocks followed the Bohnsack and Bannerot (1986) stationary reef visual census (RVC) method.


The RVC protocol focuses on the collection of fish population data, but habitat data,

including estimates of the percent of the seabed covered by rock, rubble, and sediment,

are also collected (McClellan and Miller 2003). Estimates of substrate were the portion of the RVC dataset used for this analysis.

Comparison of the acoustic classification with the video / diver estimates of substrate was accomplished by computing the overall accuracy from an error matrix constructed for each survey site (Congalton and Green 1999). The comparison was made between each ground truth sample and the closest acoustic echo to that point. One refinement of the standard error matrix technique was necessary because the video / diver data were expressed as fractions: the substrate at each point was X% sediment, Y% hard bottom, and Z% rubble. The acoustic classes, on the other hand, were discrete, so each sample's contribution to the error matrix was divided among the ground truth classes in proportion to the video / diver-estimated substrate. Chapter 5 has a sample calculation (Table 5.1).
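The fractional error matrix can be sketched as follows. The helper names add_sample and accuracy_summary are hypothetical, and the code is one reading of the procedure described above rather than the software actually used; the numerical example reuses the Lee Stocking Island entries reported in Table 3.5A.

```python
COLS = ["Sediment", "Rubble", "Hard Bottom"]  # ground truth (video / diver) classes
ROWS = ["Sediment", "Hard Bottom"]            # acoustic classes (no rubble class)


def add_sample(matrix, acoustic_class, fractions):
    """Spread one ground truth sample across the row of its acoustic class."""
    i = ROWS.index(acoustic_class)
    for j, col in enumerate(COLS):
        matrix[i][j] += fractions.get(col, 0.0)


def accuracy_summary(matrix):
    """Return overall accuracy plus user's and producer's accuracy per class."""
    total = sum(sum(row) for row in matrix)
    correct = 0.0
    users, producers = {}, {}
    for i, label in enumerate(ROWS):
        j = COLS.index(label)
        agree = matrix[i][j]
        correct += agree
        users[label] = agree / sum(matrix[i])                     # row total
        producers[label] = agree / sum(row[j] for row in matrix)  # column total
    return correct / total, users, producers


# One frame that is 70% sediment and 30% hard bottom, nearest echo = sediment.
demo = [[0.0] * len(COLS) for _ in ROWS]
add_sample(demo, "Sediment", {"Sediment": 0.7, "Hard Bottom": 0.3})

# Entries of the LSI error matrix (Table 3.5A).
lsi = [[890.02, 2.08, 176.9], [188.27, 2.52, 189.21]]
overall, users, producers = accuracy_summary(lsi)
```

Rounded to two decimals, the values computed from the Table 3.5A entries agree with the accuracies listed in that table.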

3.4 Results

The LSI data clustered into nine acoustic classes. Echoes were not evenly

distributed among the nine classes, however. Four of the classes contained 96.3% of the

echoes, and the other five classes comprised just 3.7% of the echoes (Table 3.4).

Furthermore, the five small (“minor”) classes did not exhibit any spatial coherence; they

were scattered throughout the study area. Efforts to label the classes, assess their

accuracy, and analyze the causes of errors therefore focused on the most populous

(“major”) classes; the minor classes were excluded from further analysis.

The cluster labeling process revealed that the four major classes at LSI

corresponded to one hard bottom and three sediment classes. The three sediment classes


were distinguished by the presence or timing of a second echo captured within the 256-

sample IMPACT analysis window. After aggregating the IMPACT-derived acoustic classes to a coarse level of descriptive resolution, the hard bottom and sediment classes contained 25.6% and 70.7% of the echoes, respectively. Comparison of the coarse-level

classes with the LSI video dataset indicated that the hard bottom / sediment acoustic classification had an overall accuracy of 74% (Fig. 3.4; Table 3.5A).

Table 3.4: Acoustic class labels and sizes (by percent of total echoes) in each survey area and aggregation of classes from fine descriptive resolution (DR) to coarse descriptive resolution.

Coarse DR         Lee Stocking Island    Andros                         Carysfort Reef         Fowey Rocks
Aggregated class  Fine DR class    %     Fine DR class            %     Fine DR class    %     Fine DR class            %
Hard Bottom       Hard Bottom      25.6  Low Relief Hard Bottom   5.8   Hard Bottom      45.8  Low Relief Hard Bottom   53.4
                                         High Relief Hard Bottom  3.2                          High Relief Hard Bottom  9.9
Sediment          Sediment         59.3  Sediment                 78.7  Coarse Sand      9.9   Coarse Sand              25.6
                  Double Echo 1    6.3   Double Echo 1            5.2   Fine Sand        38.2  Fine Sand                7.9
                  Double Echo 2    5.1   Double Echo 2            3.8
Other             5 Minor classes  3.7   3 Minor classes          3.3   4 Minor classes  6.1   2 Minor classes          3.2

The Andros data clustered into eight acoustic classes. Five of these classes

contained 96.7% of the total number of echoes (Table 3.4), so analysis focused on these

five major classes. The cluster labeling process identified three sediment classes

distinguished by the presence or timing of a second echo and two hard bottom classes

distinguished by relief (Fig. 3.2 has examples of low and high relief hard bottom).

After aggregating the IMPACT-derived acoustic classes to a coarse level of

descriptive resolution, the hard bottom and sediment classes contained 9.0% and 87.7%

of the echoes, respectively. Comparison of the coarse-level classes with the Andros video


dataset indicated that the hard bottom / sediment acoustic classification had an overall

accuracy of 73% (Fig. 3.5; Table 3.5B).

The Carysfort data clustered into seven acoustic classes, three of which contained

93.9% of the total number of echoes (Table 3.4). Cluster labeling of these three major classes identified one corresponding to hard bottom and two corresponding to sediment.

Unlike LSI and Andros, where a second echo differentiated the sediment classes, at

Carysfort all of the echoes were too deep to record a second echo given the settings used

(Tables 3.2 and 3.3). Instead, at Carysfort the sediment classes differed by grain size (Fig.

3.3).

The mode of each of the sediment-sample grain size distributions at LSI (Fig.

3.3A; N=42) and at Andros (Fig. 3.3B; N=7) was in the range 0 (coarse sand) < φ < 3

(fine sand) where φ is the grain size expressed as -log2(grain size in mm). No acoustic

classes were discriminated based on grain size at LSI or Andros. In contrast, two groups

emerged from the sediment-sample grain size distributions at Carysfort and Fowey Rocks

(Fig. 3.3C; N=17). One set of samples had a mode in the range -1 (very coarse sand) < φ

< 1 (coarse sand). The other set of samples was in the range 2 (fine sand) < φ < 4 (very

fine sand). Eight of the ten samples that were closest to an echo labeled as “coarse

sediment” (Fig. 3.3C grey lines with circle markers) had a grain size distribution with a

mode between coarse and very coarse sand (0.5 - 2 mm). Five of the seven samples that

were closest to an echo labeled as “fine sediment” (Fig. 3.3C black lines without

markers) had a grain size distribution with a mode between fine and very fine sand (0.062 -

0.25 mm).
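The φ scale used above converts directly to and from grain diameter in millimeters. The two helper functions below are a minimal sketch with hypothetical names; they only restate the definition φ = -log2(grain size in mm).

```python
import math


def phi_from_mm(grain_size_mm):
    """Grain size in phi units: phi = -log2(diameter in mm)."""
    return -math.log2(grain_size_mm)


def mm_from_phi(phi):
    """Inverse conversion: diameter in mm = 2 ** (-phi)."""
    return 2.0 ** (-phi)


# Bounds quoted in the text: 0.5 - 2 mm and 0.062 - 0.25 mm.
coarse_range_phi = (phi_from_mm(2.0), phi_from_mm(0.5))
fine_range_mm = (mm_from_phi(4.0), mm_from_phi(2.0))
```

The conversion shows that the range -1 < φ < 1 corresponds to 0.5 - 2 mm and that 2 < φ < 4 corresponds to 0.0625 - 0.25 mm, consistent with the size ranges quoted for the two sediment groups.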


Figure 3.3: Sediment grain size distributions for samples from the survey areas. Volume percent of the sample is plotted in equally-spaced φ intervals (φ = -log2(mm)). A) Lee Stocking Island (N=42); B) Andros (N=7); C) Carysfort and Fowey samples plotted together (N=17). LSI and Andros acoustic data were not separated by grain size whereas Carysfort and Fowey acoustic data split into two classes corresponding to coarse and fine sand-sized sediment. Line styles reflect the acoustic class of the closest echo to each sediment sample.

After aggregating the IMPACT-derived acoustic classes at Carysfort Reef to a coarse level of descriptive resolution, the hard bottom and sediment classes contained

45.8% and 48.2% of the echoes, respectively. Comparison of the coarse-level classes with the RVC substrate dataset indicated that the Carysfort hard bottom / sediment acoustic classification had an overall accuracy of 86% (Fig. 5.4, Table 3.5C).


Figure 3.4: LSI acoustic classification (left) and video classification (right) plotted on top of a true-color IKONOS image (copyright GeoEye.com) of the LSI study area.

Figure 3.5: Andros acoustic classification (left) and video classification (right) plotted on top of a true-color IKONOS image (copyright GeoEye.com) of the study area.


Table 3.5: Error matrices from four survey sites comparing coarse descriptive resolution acoustic classes (Table 3.4) with ground truth obtained from video analysis or diver observations. No acoustic classes corresponded to rubble, so that row, which would be filled with zeros, has been left out of these tables. This omission does not change any of the accuracy calculations. Entries in the table are fractional because ground truth data were acquired as percentages of sediment, rubble, and hard bottom. Overall accuracy ranges from 73% to 86% at these four sites.

A) Lee Stocking Island
                       Video Classes
Acoustic Classes       Sediment    Rubble    Hard Bottom    User's accuracy
Sediment               890.02      2.08      176.9          0.83
Hard bottom            188.27      2.52      189.21         0.50
Producer's accuracy    0.83                  0.52
Total number of video frames: 1502
Number of video frames matching minor acoustic classes: 53
Number of video frames matching major acoustic classes: 1449
Overall accuracy: 0.74

B) Andros
                       Video Classes
Acoustic Classes       Sediment    Rubble    Hard Bottom    User's accuracy
Sediment               349.36      9.88      123.76         0.72
Hard bottom            15.47       1.4       48.13          0.74
Producer's accuracy    0.96                  0.28
Total number of video frames: 558
Number of video frames matching minor acoustic classes: 10
Number of video frames matching major acoustic classes: 548
Overall accuracy: 0.73

C) Carysfort
                       Diver Classes
Acoustic Classes       Sediment    Rubble    Hard Bottom    User's accuracy
Sediment               7.3         0         0.7            0.91
Hard bottom            1.1         0.7       8.2            0.82
Producer's accuracy    0.87                  0.92
Total number of dive sites: 22
Number of dive sites matching minor acoustic classes: 4
Number of dive sites matching major acoustic classes: 18
Overall accuracy: 0.86

D) Fowey Rocks
                       Diver Classes
Acoustic Classes       Sediment    Rubble    Hard Bottom    User's accuracy
Sediment               4.22        0         1.78           0.70
Hard bottom            1.34        0.19      7.48           0.83
Producer's accuracy    0.76                  0.81
Total number of dive sites: 15
Number of dive sites matching minor acoustic classes: 0
Number of dive sites matching major acoustic classes: 15
Overall accuracy: 0.78


The Fowey Rocks data clustered into six acoustic classes, four of which made up

96.8% of the total number of echoes (Table 3.4). As with the other sites, analysis focused on these four major classes. The cluster labeling process indicated that two of the acoustic classes corresponded to hard bottom, and two classes corresponded to sediment.

The two hard bottom classes were discriminated by relief. The two sediment classes were discriminated by grain size (Fig. 3.3) as at Carysfort Reef.

Figure 3.6: Fowey Rocks acoustic classes (track lines) and diver estimated substrate (pie charts).

After aggregating the IMPACT-derived acoustic classes at Fowey Rocks to a coarse level of descriptive resolution, the hard bottom and sediment classes contained

63.3% and 33.5% of the echoes, respectively. Comparison with the RVC substrate dataset indicated that the Fowey Rocks hard bottom / sediment acoustic classification had an overall accuracy of 78% (Fig. 3.6; Table 3.5D).


3.5 Analysis: where were the errors?

The overall accuracies for hard bottom / sediment classification in the four study sites ranged from 73 to 86% (Table 3.5). The error matrix alone, however, does not provide any explanation for the classification errors. In particular, were there any features of the seabed in areas consistently misclassified that might explain the error?

Understanding the cause of misclassification may help guide future improvements to the survey system or survey design.

3.5.1 Lee Stocking Island

The first step in analyzing the errors at LSI was to find the areas where the acoustic classification least agreed with the estimates of substrate from the video. To do this, the overall accuracy was computed for each individual frame. Traditionally, overall accuracy for a single validation site (video frame in this case) may take on values of only zero or one because in most studies the ground truth represents a single class at each location. As noted in Section 3.3.4, however, the ground truth for these surveys contained continuous measurements of the fraction of each site covered by hard bottom, rubble, or sediment. Therefore, just as the overall accuracy computed from the entire error matrix

(Table 3.5) could vary from zero to one, the overall accuracy for each video frame could vary from zero to one. The distribution of the frame-by-frame overall accuracies at LSI was bimodal (Fig. 3.7), indicating that most video frames either nearly perfectly agreed or nearly perfectly disagreed with the class of the nearest echo.
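A frame-by-frame overall accuracy can be read as the overall accuracy of an error matrix built from a single ground truth sample. The sketch below follows that reading; the function name frame_accuracy and the three example frames are hypothetical, not data from the surveys.

```python
def frame_accuracy(acoustic_class, cover):
    """Overall accuracy of a one-sample error matrix.

    cover: percent (or fraction) of the frame in each ground truth class.
    The result is the share of the frame that matches the acoustic class
    of the nearest echo, so it can take any value from zero to one.
    """
    total = sum(cover.values())
    if total <= 0:
        raise ValueError("cover values must sum to a positive number")
    return cover.get(acoustic_class, 0.0) / total


# Made-up frames: (acoustic class of nearest echo, video-estimated cover in %).
frames = [
    ("Sediment", {"Sediment": 100, "Hard Bottom": 0, "Rubble": 0}),
    ("Sediment", {"Sediment": 5, "Hard Bottom": 95, "Rubble": 0}),
    ("Hard Bottom", {"Sediment": 30, "Hard Bottom": 60, "Rubble": 10}),
]
scores = [frame_accuracy(cls, cover) for cls, cover in frames]
# Frames with less than 10% within-frame accuracy, the threshold used in Fig. 3.7.
low_agreement = [s for s in scores if s < 0.10]
```

A histogram of such scores over all frames is what reveals whether disagreements are partial or nearly complete.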

In the LSI dataset, there were 176.9 hard bottom sites classified by the acoustics as sediment and 188.3 sediment sites classified by the acoustics as hard bottom (Table

3.5A). Most of the misclassified sediment sites occurred in the deepest parts of the study area (Fig. 3.8). In fact, almost all of the sites deeper than six meters that were classified on


the video as sediment dominated were classified as hard bottom by the acoustics (Fig.

3.8). In contrast, most of the misclassified hard bottom sites were found along the edges

of islands (Fig. 3.7), which were the areas with shallowest water.

Figure 3.7: Within-frame accuracy for the LSI dataset. Left: Within-frame overall accuracy histogram (N=1502). Right: Map illustrating locations of video frames with less than 10% within-frame accuracy. Black box outlines the Adderly Cut area shown in Fig. 3.9; islands and boat track lines are shown in grey.

Figure 3.8: Depth-frequency histogram for sediment-dominated video frames in the LSI dataset. Red bars represent all sediment-dominated video frames (N=1114); blue bars represent sediment-dominated video frames in which the closest acoustic echo was misclassified as hard bottom (N=214).


Diver-based observations from July 2002 helped characterize the nature of the

misclassified LSI video frames. Due to time constraints, diving effort focused in Adderly

Cut, the channel just to the north of LSI (boxed area in Fig. 3.7) because it had a large

fraction of the misclassified sites (Fig. 3.7), and had been mapped previously (Gonzalez

and Eberli 1997; Louchard et al. 2003), which assisted the choice of dive sites.

Between July 6 and 11, 2002, 31 dives were made in the Adderly Cut area (Fig. 3.9).

Target locations for dives were taken from the acoustic tracks. The actual location of each site was recorded as divers entered the water using a Garmin GPSMap 162 with

WAAS. Data acquired at each site included: sediment depth, rugosity, underwater photographs, and a qualitative site description. Sediment depth was measured by probing with a 3/16” stainless steel rod that had been etched at 5 cm intervals up to 40 cm.

Figure 3.9: Classified acoustic tracks, video frames, and dive sites in the Adderly Cut portion of LSI study area. Tracks are colored red or blue based on acoustic class hard bottom or sediment, respectively. Triangles are plotted at the locations of video frames and colored red or blue based on the dominant cover type being hard bottom or sediment, respectively. Dive sites A-D are labeled according to Tables 3.6, 3.7. Sites labeled with an E were too far from track lines or video frames to be useful.


Rugosity was measured by draping a 188 cm long chain with 1 cm links along the seabed and measuring the straight-line distance covered. The actual rugosity index was calculated from each chain measurement as the total length of the chain divided by the linear distance covered when draped over the seabed (Luckhurst and Luckhurst 1978).

The chain measurements were repeated six times at each dive site, with the chain laid three times parallel in one direction and then three times in a perpendicular direction.
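The rugosity index defined above is a simple ratio. In the sketch below the function name rugosity_index is hypothetical and the six distances are made-up values for illustration, not measurements from the dives.

```python
CHAIN_LENGTH_CM = 188.0  # chain length stated in the text


def rugosity_index(linear_distance_cm, chain_length_cm=CHAIN_LENGTH_CM):
    """Chain length divided by the straight-line distance the draped chain covers.

    A perfectly flat seabed gives 1.0; rougher seabeds give larger values.
    """
    if not 0 < linear_distance_cm <= chain_length_cm:
        raise ValueError("distance must be positive and no longer than the chain")
    return chain_length_cm / linear_distance_cm


# Six hypothetical chain lays at one site: three in one direction,
# three perpendicular to it.
distances_cm = [180.0, 175.0, 182.0, 178.0, 181.0, 176.0]
site_values = [rugosity_index(d) for d in distances_cm]
site_mean = sum(site_values) / len(site_values)
```

Laying the chain in two perpendicular directions guards against a single orientation aligning with, or cutting across, linear features on the seabed.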

There were features observed on the dives, such as large coral heads and sand waves, that were too large to measure with the 188 cm chain. An effort was made to qualitatively capture these features, which are hereafter referred to as “macrorelief,” as part of the site description and with scale bars placed in underwater photographs.

Descriptions of the sites revealed common features within each group (Tables 3.6, 3.7). All of the sites in group A, the video hard bottom sites that were also classified as hard bottom by the acoustics, contained gravel-sized (> 2 mm) but little to no finer-grained sediment. Three of the sites in group A contained corals, and three also contained “macrorelief” in the form of seagrass swales or hummocky hard bottom. The common feature among sites in group B, the video hard bottom sites that were classified as sediment by the acoustics, was the presence of sediment overlying hard bottom. At four of the sites in group B, the sediment formed a thin veneer on top of an otherwise relatively flat hard bottom. At the fifth site in group B the sediment was collected in up to 25 cm thick packages forming in the lee of 20-30 cm high hard ground hummocks. Five of the six sites in group C, the video sediment sites that were classified as hard bottom by the acoustics, contained seagrass swales, which were 0.5 to 1.5 m high seagrass-stabilized sand dunes. The sides of the swales were often nearly vertical, formed by large blowouts. All of the sites in group D, the video sediment sites that were classified as sediment by the acoustics, were characterized by low relief sediment with seagrass of varying density.

Table 3.6: Diver descriptions of Adderly Cut dives from July 2002 grouped according to the class of the closest video frame (columns) and acoustic echo (rows). Group letters A-D are consistent with Table 3.7 and Figures 3.9, 3.10.

Table 3.7: Probe depths (cm), mean and standard deviation (Std) of rugosity (unitless) in Adderly Cut measured by divers in July 2002. Group average rugosity values were computed on the entire set of raw data for each group, not averaging the average values for each site. Group letters A-D are consistent with Table 3.6 and Figures 3.9, 3.10.


The diver data (Tables 3.6, 3.7) revealed common characteristics of misclassified sites. The most common cause for misclassification of the hard bottom sites was the presence of a thin veneer of sediment (Group B in Tables 3.6, 3.7, Fig. 3.10). The dominant cause of misclassification of the sediment sites was the presence of macrorelief created by large seagrass swales (Group C in Tables 3.6, 3.7, Fig. 3.10). Underwater photographs captured the archetype of each group (Fig. 3.10).

Figure 3.10: Underwater photographs illustrating the four groups of sites in the Adderly Cut portion of the LSI study area. Groups A and D represent agreement between the video and acoustic datasets; groups B and C represent errors. Divisions on the scale bar are 10 cm. The pictures are from sites A4, B4, C2, and D6. Group letters A-D are consistent with Tables 3.6, 3.7 and Figure 3.9.

Knowing that sediment veneers on hard bottom and macrorelief in seagrass were the two most apparent sources of confusion between the video and acoustic

classifications, the video frames were reviewed to flag those that contained either of these

features. Three hundred thirty-six video frames had been identified with less than 10%

agreement between the video and acoustic classifications (Fig. 3.7). Each of these video

frames was inspected. The hard bottom sites with sediment veneers were readily

identifiable on the video frames. Of the 163 video hard bottom sites that had been

misclassified acoustically as sediment, 130 of them were flagged as having a shallow

surficial layer of sediment. The seagrass sites containing swales with significant relief, on

the other hand, were not easy to reliably identify, owing, in part, to the fact that they were

all located in the deepest part of the survey area (Fig. 3.8) where distance to the seabed

and water column attenuation most obscured the details in the video images.

After reviewing the video frames, a new error matrix was computed that excluded

the 130 hard bottom sites with sediment veneer; no seagrass sites were excluded because

it was too difficult to assess local relief from the video frames. Overall accuracy

improved from 74%, with the sediment veneer sites included, to 84% with these sites

excluded (Table 3.8). Thus, 38% of the overall error in the LSI survey can be attributed to hard bottom with sediment veneer.

Table 3.8: Revised LSI error matrix computed by excluding hard bottom sites with a veneer of sediment.

                       Video Classes
Acoustic Classes       Sediment    Rubble    Hard Bottom    User's accuracy
Sediment               888.92      1.13      49.95          0.95
Hard bottom            188.27      2.52      188.21         0.50
Producer's accuracy    0.83                  0.79
Total number of video frames: 1372
Number of video frames matching minor acoustic classes: 53
Number of video frames matching major acoustic classes: 1319
Overall accuracy: 0.82


3.5.2 Andros Island Area

Analysis of the misclassified video frames for the Andros datasets followed the

approach used at LSI. Mismatches between the video and acoustic datasets were located

by computing overall accuracy on a frame-by-frame basis, as at LSI. Also like LSI, most

of the video frames either nearly perfectly agreed or disagreed with the nearest acoustic

class (Fig. 3.11). Unlike LSI, which had approximately the same number of hard bottom

sites classified as sediment as sediment sites classified as hard bottom (Table 3.5A), in

the Andros dataset the dominant error type was video hard bottom classified by the acoustics as sediment (Table 3.5B). Of the 126 video frames with less than 10%

agreement with the class of the nearest echo (Fig. 3.11), 94 were classified by the video

as hard bottom, but clustered acoustically with sediment. Only 27 were classified by the

video as sediment but clustered acoustically with hard bottom.

Figure 3.11: Within-frame overall accuracy histogram for the Andros dataset (N=558). Most sites exhibited nearly perfect agreement (OA=1) or disagreement (OA=0) between the acoustic and video datasets.

Reviewing the video frames revealed that 90 of the 94 hard bottom-dominated frames that were closest to echoes classified as sediment contained thin veneers of sediment over low relief hard bottom. Acoustically, these low relief hard bottom sites with sediment veneer clustered with areas covered by thicker sediment. Ecologically, however, low relief hard bottom with sediment veneer provides a habitat more similar to

hard bottom than to sediment. Sediment sites were either bare (Fig. 3.12A) or covered

with seagrass and macroalgal forms that do not require a holdfast. In contrast, the hard

bottom sites with sediment veneer were covered with hydrocorals, octocorals, small stony

corals, and algae attached to firm substrate (Fig. 3.12B). The two acoustic classes in the

Andros dataset that did correspond to hard bottom (Table 3.4) contained the same types

of benthic organisms found in the hard bottom sites with sediment veneer but in different proportions. Specifically, meter-scale relief, a metric controlled by the proportion of stony corals, differentiated the two acoustic hard bottom sites from each other and from the hard bottom sites with sediment veneer (Fig. 3.12B, C, D).

Reviewing the video frames of the 27 sites classified acoustically as hard bottom and by the video as sediment did not reveal any common characteristics among the sites that might explain the misclassification. At LSI, the presence of high relief, seagrass-stabilized sand waves explained the echoes from sediment sites clustering with hard bottom, but no such features were observed within the Andros survey area.

Recalculating the error matrix after eliminating sites determined by a review of the video frames to have sediment veneers over low relief hard bottom resulted in overall accuracy improving from 71% to 85% (Table 3.9). Thus, 48% of the overall error in the

Andros survey was caused by hard bottom with sediment veneer.

Table 3.9: Error matrix for the Andros survey excluding hard bottom sites covered with a thin sediment veneer. Sediment veneer accounts for 48% of the overall error at Andros.

                                  Video Classes
Acoustic Classes       Sediment   Rubble   Hard Bottom   User's accuracy
Sediment                 348.21     8.04         42.75              0.87
Hard bottom               15.37     1.35         46.28              0.73
Producer's accuracy        0.96                   0.52

Total number of video frames: 470
Number of video frames matching minor acoustic classes: 8
Number of video frames matching major acoustic classes: 462
Overall accuracy: 0.85
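The accuracy measures in Table 3.9 can be recomputed from the cell counts. The following is a minimal sketch, assuming the conventional definitions (user's accuracy as correct counts over the row total, producer's accuracy as correct counts over the column total, overall accuracy as the matched diagonal over the grand total). The variable and function names are illustrative and are not taken from the IMPACT software.

```python
# Error matrix from Table 3.9. Rows are acoustic classes; columns are
# video classes in the order (sediment, rubble, hard bottom).
ERROR_MATRIX = {
    "sediment": [348.21, 8.04, 42.75],
    "hard bottom": [15.37, 1.35, 46.28],
}
# Column index of the video class that matches each acoustic class.
MATCHING_COLUMN = {"sediment": 0, "hard bottom": 2}


def accuracy_measures(matrix, matching_column):
    """Return user's, producer's, and overall accuracy for an error matrix."""
    correct = {name: row[matching_column[name]] for name, row in matrix.items()}
    # User's accuracy: correct counts divided by the acoustic (row) total.
    users = {name: correct[name] / sum(row) for name, row in matrix.items()}
    # Producer's accuracy: correct counts divided by the video (column) total.
    producers = {}
    for name, col in matching_column.items():
        column_total = sum(row[col] for row in matrix.values())
        producers[name] = correct[name] / column_total
    grand_total = sum(sum(row) for row in matrix.values())
    overall = sum(correct.values()) / grand_total
    return users, producers, overall


users, producers, overall = accuracy_measures(ERROR_MATRIX, MATCHING_COLUMN)
for name in ERROR_MATRIX:
    print(f"{name}: user's {users[name]:.2f}, producer's {producers[name]:.2f}")
print(f"overall accuracy: {overall:.2f}")
```

Rubble has no matching acoustic class in the two-class scheme, so it contributes to the row totals and the grand total but has no producer's accuracy of its own.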


Figure 3.12: Oblique underwater photographs of four seabed types in the Andros Island survey area. Each row contains two pictures of each seabed: A) sediment, B) low-relief hard bottom with sediment veneer, C) low-relief hard bottom, D) high-relief hard bottom. Acoustically, seabed type B clustered with seabed type A, but ecologically seabed type B is closer to seabed type C.


3.5.3 Carysfort Reef and Fowey Rocks

Analysis of the acoustic classification for Carysfort Reef and Fowey Rocks followed the approach used at LSI and Andros. Mismatches between the diver-assessed substrate and acoustic datasets were located by computing overall accuracy on a dive-by-dive basis, as they had been computed frame-by-frame for LSI and Andros. No dive sites had zero percent accuracy relative to the closest acoustic echo (Fig. 3.13). This differed markedly from the LSI (Fig. 3.7) and Andros (Fig. 3.11) surveys in which the majority of the non-matching video frames were complete errors. No systematic misclassifications were evident in the Carysfort and Fowey Rocks datasets. Therefore, no reclassification was attempted with either of these datasets.

Figure 3.13: Within-frame overall accuracy histograms for the Carysfort (left) and Fowey Rocks (right) datasets. Unlike the LSI and Andros datasets (Figs. 3.7, 3.11) no dive sites had zero percent accuracy relative to the closest acoustic echo.

3.6 Discussion

The overall objective of this study was to determine what seabed classes were consistently distinguished by a single-beam acoustic classification system among multiple survey areas in coral reef environments; in other words, “How transferable are acoustic seabed classes?” The results showed that at coarse descriptive resolution acoustic seabed classes were transferable. In addition, the results in two of the four study areas revealed specific seabed characteristics that led to systematic classification errors. After commenting on descriptive resolution and commonly observed classification errors, the following discussion considers how these results relate to previous studies and how single-beam acoustic classification could contribute to mapping coral reef environments, given the strengths and limitations observed in this study.

The transferability of single-beam acoustic seabed classification schemes was assessed quantitatively at a coarse descriptive resolution. Hard bottom was well discriminated from sediment using the QTC system. Overall accuracy for the two-class hard bottom / sediment classification ranged from 73 to 86% in four study sites separated by as much as several hundred km (Table 3.5). Quantitatively assessing transferability at a fine level of descriptive resolution was not possible because the clustering process identified a unique set of seabed classes at each study site (Table 3.4). Qualitatively, therefore, the results showed that even though some individual classes were identified at more than one site, single-beam acoustic seabed classification schemes as a whole were not transferable between sites at a fine level of descriptive resolution. That acoustic seabed classes should be transferable at a coarse but not fine level of descriptive resolution is consistent with expectations based on the classes identified in previous studies (Table 3.1).

Based on diver observations and reevaluation of the video ground truth data, the most common classification error arose from a thin (< 10 cm) veneer of sediment on top of hard bottom, which caused the hard bottom to be classified by the acoustics as sediment (Figs. 3.10, 3.12). From a purely physical point of view, one might reasonably argue that classifying a thin surficial layer of sediment as “sediment” does not actually constitute an error. Ecologically, however, the biological community inhabiting a thin veneer over hard bottom differs greatly from one inhabiting deep sediment (cf. Figs. 3.10B vs. 3.10D and Fig. 3.12A vs. 3.12B). Most applications, such as management, stratifying sampling effort, scaling demographic measurements up by habitat area, and so forth, would benefit from distinguishing areas with sediment veneer from areas with thick sediment.

The second commonly misclassified seabed type was found at LSI only and consisted of large (~1 m or higher) seagrass-stabilized sand waves that were classified as hard bottom. These areas frequently, though not always, were intermingled with a pavement hard bottom and gravel. The physical basis for this type of misclassification was probably echo lengthening due to macroroughness (large change of depth within the echo footprint) of the large waves. The relevant roughness length scale for 50 kHz sonar is on the order of a wavelength, or about 3 cm. Bare hard bottom in any of these survey sites will have many surface facets on this length scale (Figs. 3.2, 3.10A, 3.12C, D). In contrast, the hard bottom areas with thin sediment veneer were much smoother (Figs. 3.10B, 3.12B). The roughness length of the seagrass-stabilized sand waves was much longer, on the order of meters, but the waves were large enough to create significant depth variation within a single echo. The effect of such macroroughness on echo shape is the same as wavelength-scale seabed roughness because both lengthen the duration of the returned echo.
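The two length scales discussed above can be illustrated numerically. The sketch below assumes a nominal sound speed of 1500 m/s (a typical seawater value, not stated in the text) and a simple geometric footprint for a vertical beam over a flat seabed, using the full beam widths quoted for this transducer (16 and 42 degrees, Table 3.2). The example depths and function names are illustrative only.

```python
import math

SOUND_SPEED_M_S = 1500.0  # assumed nominal seawater value, not from the text


def wavelength_m(frequency_hz, sound_speed_m_s=SOUND_SPEED_M_S):
    """Acoustic wavelength: sound speed divided by frequency."""
    return sound_speed_m_s / frequency_hz


def footprint_width_m(depth_m, full_beam_width_deg):
    """Width of the ensonified area for a vertical beam over a flat seabed."""
    half_angle = math.radians(full_beam_width_deg / 2.0)
    return 2.0 * depth_m * math.tan(half_angle)


print(f"wavelength at 50 kHz: {wavelength_m(50e3) * 100:.1f} cm")
for depth in (5.0, 20.0, 60.0):  # example depths only
    narrow = footprint_width_m(depth, 16.0)
    wide = footprint_width_m(depth, 42.0)
    print(f"depth {depth:4.0f} m: footprint about {narrow:.1f} m x {wide:.1f} m")
```

Under these assumptions the footprint is meters to tens of meters across, which is why sand waves of about 1 m height can produce appreciable depth variation within a single echo while wavelength-scale facets are three orders of magnitude smaller.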

The misclassified sediment veneer and seagrass dune habitats support the conclusion that the QTC classification in these study areas was more sensitive to seabed roughness than to sediment penetration. Oblique underwater photographs provided further illustration that habitats clustered by roughness. Two ecologically different but similarly physically smooth habitats (Fig. 3.12A, B) clustered together, distinct from a third slightly rougher seabed (Fig. 3.12C) and a fourth much rougher seabed (Fig. 3.12D).

Overall accuracy of the coarse-level classification at the four survey sites was comparable to previous values for coarse-level habitat mapping reported by studies using satellite or airborne imagery. Mumby and Edwards (2002), for example, reported overall accuracies ranging from 68% to 75% using satellite imagery and 89% using airborne imagery for a four-class scheme (coral, macroalgae, seagrass, sand). Andrefouet et al. (2003) reported, for four or five habitat classes, overall accuracies of 77%, 78%, and 81% at Glovers, Heron, and Shiraho reefs using Ikonos imagery. Capolsini et al. (2003) reported overall accuracies ranging from approximately 73-85% (exact figures not given) for a single study area classified into three, four, and five classes. Maeder et al. (2002) reported an overall accuracy of 89% for a scheme with five classes (four benthic classes plus deep water). In such studies, overall accuracy commonly decreased as the number of image-derived habitat classes increased (Andrefouet et al. 2003).

Overall accuracy at the four survey sites was also comparable to values reported by previous single-beam acoustic seabed classification studies in coral reef environments (Table 3.1; Fig. 3.14) with the possible exception of Riegl et al. (2007). The findings of this study appear to contradict Riegl et al. (2007), who found that hard bottom with sediment veneer classified reliably as hard bottom. The present study, in contrast, found that hard bottom with sediment veneer classified as sediment in both survey areas where it was present. Riegl et al. (2007) did not discuss sediment veneer in detail but did note


that it was interspersed between near-shore rocky ridges. The terrain mapped by Riegl et

al. (2007) near Cabo Pulmo includes an undulating bedrock plain with a patchy thin

veneer of sediment such that bedrock does protrude through the overlying sediment (B.

Riegl, personal communication Dec. 2008). Given this description, there are two

potential explanations for the discrepancy between Riegl et al. (2007) and the results at

LSI and Andros. First, patchy versus unbroken veneer may be a crucial difference. If the

(rough) underlying rocky surface protrudes over even a fraction of the ensonified area, or

over a fraction of the echoes within a stack, the echo might be lengthened sufficiently to

appear as a hard ground. The difference between groups A and B (Tables 3.6, 3.7, Fig.

3.10) was not that group A had no sediment; it was that group B had complete sediment

cover. A second potential explanation for sediment veneers in Cabo Pulmo classifying as

hard bottom may be the macrorelief provided by the nearby rocky ridges. As evidenced

by groups C and D (Tables 3.6, 3.7, Fig. 3.10), sufficient local relief appears the same

acoustically as a rough seabed. Without further quantitative data, it is not possible to

resolve which of these explanations dominates in the case of Cabo Pulmo. It does seem

plausible, however, that the discrepancy between Riegl et al. (2007) and the results of this

study may result more from different definitions of “veneer” than from conflicting

performance of the acoustic classification systems.

What are some potential uses of single-beam acoustic classification in coral reef

environments, given the strengths and limitations illustrated above? As just mentioned,

the overall accuracy of the simple two-class acoustic seabed maps was comparable to

coarse-level seabed maps derived from overhead optical imagery. Even coarse-level habitat maps derived from overhead images identify from three to five classes, however (Maeder et al. 2002; Mumby and Edwards 2002; Andrefouet et al. 2003; Capolsini et al. 2003). It is therefore worth considering whether a simple two-class rock / sediment classification scheme is useful. An important point to remember is that with acoustic data, even single-beam data, the classification may be supplemented by utilizing bathymetry, which is inherently part of the data collection.

Figure 3.14: Plot of overall accuracy as a function of the number of acoustic classes. Open symbols represent the values reported in this study. Closed symbols represent values reported in earlier studies. Generally, accuracy decreases with increased numbers of acoustic classes.

One example of the benefits of adding a simple rock / sediment classification to traditional bathymetric data for habitat mapping is the ability to discriminate outcropping parts of the seabed. Hard bottom is known to be important habitat for many types of fish. Sometimes substrate can be inferred from topographic profiles, but in other cases interpreting outcrops from bathymetry alone can be misleading. An example from the Florida Keys (Fig. 3.15) illustrates a case where bathymetry alone provided a misleading picture of habitat. Based on bathymetry alone, one portion of this example survey area appeared to be promising fish habitat because it contained two parallel ridges with steep slopes (orange area highlighted in Fig. 3.15). In contrast, another portion of this example survey area (purple area highlighted in Fig. 3.15) appeared to be a flat, featureless plain based on bathymetry alone. Considering seabed type in addition to bathymetry, a different interpretation became apparent; the area in orange was sediment-covered, providing little shelter for fish, whereas the area in purple was covered with small patch reefs, which would, in general, provide excellent habitat for reef fish. Augmenting bathymetry with hard bottom / sediment seabed mapping will be exploited in Chapters 5 and 6 for the interpretation of grouper and snapper habitat.

Figure 3.15: Example of the utility of rock / sediment seabed classification in interpreting bathymetry for fish habitat. Top panel: sunshaded bathymetry from interpolated single-beam echosounder depths along the upper slope seaward of Carysfort and Watson’s Reefs, Florida Keys, USA. Bottom panel: Survey track lines superimposed on the same sunshaded bathymetry as the top panel. The track lines are colored red for hard bottom (rocky) and gray for sediment substrate. In both panels, oblique view is from the ESE, total along-shelf distance is about 12 km, and depth ranges from 3 to 60 m.


A second benefit to habitat mapping resulting from the integration of a simple

rock / sediment classification with bathymetry is the potential to create a single, objective

habitat classification scheme based on patchiness and relief. Franklin et al. (2003)

proposed such a habitat classification scheme for coral reef environments. Patchiness was

defined as the percent of the seabed within a certain radius of the point being classified

that was covered with sediment. Relief was defined as the depth range within a certain

radius of the point being classified. Franklin et al. (2003) showed how this classification

scheme effectively stratified survey efforts in the Dry Tortugas, USA. The actual maps

used by Franklin et al. (2003), however, were produced by subjective interpretation and

integration of multiple data sources. Using a single-beam ASC system to map the

Franklin et al. (2003) classes would capture the benefits of this classification scheme and simultaneously improve both the automation and objectivity of the method.
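The two quantities defined above lend themselves to a direct calculation from classified echoes. The following is a minimal sketch, assuming that each echo carries a position, a depth, and a hard bottom / sediment label, and that the fraction of nearby echoes labeled sediment is an acceptable proxy for the percent of seabed covered with sediment. The radius, the tuple layout, and the function name are hypothetical; the thresholds that turn these two numbers into habitat classes are not given in this section and are therefore not shown.

```python
import math


def patchiness_and_relief(echoes, center_xy, radius_m):
    """Return (patchiness in percent, relief in meters) around one point.

    echoes: list of (x_m, y_m, depth_m, substrate) tuples, where substrate
    is "sediment" or "hard bottom". Returns None if no echo is in range.
    """
    cx, cy = center_xy
    nearby = [e for e in echoes if math.hypot(e[0] - cx, e[1] - cy) <= radius_m]
    if not nearby:
        return None
    sediment_count = sum(1 for e in nearby if e[3] == "sediment")
    depths = [e[2] for e in nearby]
    # Patchiness: percent of nearby echoes classified as sediment.
    patchiness = 100.0 * sediment_count / len(nearby)
    # Relief: depth range among nearby echoes.
    relief = max(depths) - min(depths)
    return patchiness, relief


# Hypothetical echoes along a short track segment.
example_echoes = [
    (0.0, 0.0, 10.0, "hard bottom"),
    (5.0, 0.0, 12.0, "sediment"),
    (10.0, 0.0, 12.5, "sediment"),
    (100.0, 0.0, 30.0, "sediment"),
]
result = patchiness_and_relief(example_echoes, (0.0, 0.0), 20.0)
```

Because both inputs come from the same echoes, no subjective interpretation enters the calculation once the radius has been chosen.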

Miller et al. (2008) used QTCV data and the Franklin et al. (2003) classification scheme to produce a benthic habitat map for the Navassa National Wildlife refuge. One of more than 100 transects used to create the Navassa map illustrates the utility of a classification scheme based on patchiness and relief as well as the ability to implement such a scheme using QTCV data (Fig. 3.16). Two types of patch reefs were observed along the transect (Fig. 3.16). Reefs labeled “A” and “B” had similar cross-shelf width,

but those labeled “B” had higher relief. Considering substrate only (Fig. 3.16, top

profile), these reefs were all placed in the same class (hard bottom), but considering both

patchiness and relief (Fig. 3.16, bottom profile), these reefs were discriminated (yellow vs. blue classes).


Figure 3.16: Cross shelf profile of part of the Navassa Island insular shelf. Top: Echoes colored by substrate only, red for hard bottom and green for sediment. Bottom: The same transect with echoes colored by a combination of local patchiness and relief. Low and high relief patch reefs, marked A and B respectively, are all classified as hard bottom, but can be differentiated using a combination of local patchiness and relief.

Franklin et al. (2003) produced a habitat map of the Dry Tortugas using a classification scheme based on patchiness and relief, but the map was based on qualitative interpretation only, due to the available datasets: primarily side-scan sonar, aerial imagery, and diver assessment. In contrast, the classes demarked with the single-beam ASC approach (Fig. 3.16) were objectively determined. Increased automation and objectivity of seabed classification is one goal of the ASC approach (Anderson et al. 2008), which is here illustrated by implementing the Franklin et al. (2003) classification scheme using QTCV data.

One of the most useful attributes of a simple hard bottom / sediment classification scheme is that the classes can be consistently identified across multiple areas. The seabed classes mapped in previous studies in coral reef environments were not consistent from site to site (Table 3.1). Indeed the fine descriptive resolution seabed classes identified in


this study were also not consistent from site to site (Table 3.4). All of the classes mapped

in previous studies and all of the classes mapped in this study, however, could be

collapsed to a coarser descriptive level into hard bottom and sediment classes. Having

even just two classes that mean the same thing in different survey areas, when mapped by

different systems and different operators, is potentially extremely useful. One of the

biggest challenges to remotely-sensed inventories in general, and reef assessments in particular, is that different classification systems, data processing, and assessment protocols mean that comparing results from different sensors, investigators, or locations is not straightforward (Green et al. 1996; Mumby and Harborne 1999). Tools and techniques that yield reliable and objective discrimination between hard bottom and sediment, such as those developed and illustrated here, are an important first step toward producing uniform maps across regions and operators.

3.7 Conclusions

Transferability of results refers to both the consistency of classes mapped in

different places as well as the consistency of classes mapped by different sensors or

processing methods. This study addressed the first aspect of transferability by using a

single commercial single-beam acoustic seabed classification system to map multiple

sites in coral reef environments. The QTCV / IMPACT acoustic seabed classification

system discriminated rocky from sediment substrate with 73% to 86% accuracy at four

sites using the same hardware and different methods for ground truth.

Further work to improve the ability of single-beam classification systems to

discriminate deep sediment from hard bottom covered with a thin sediment veneer would

improve classification accuracy. For example, hard bottoms with a thin sediment veneer

were responsible for 38% and 48% of the total error at LSI and Andros, respectively. As evidenced at both LSI and Andros, these habitats can cover large areas in carbonate bank top environments and should therefore be a high priority for improvements to ASC methods.

Evidence from two systematically misclassified seabed types, hard bottom covered with sediment veneer and sediment with macrorelief, suggests that variability in echo shape depended more strongly on surface roughness than volume scattering in these survey areas. Future surveys over hard bottom with sediment veneer might therefore benefit from lower frequency data, which should penetrate the seabed deeper than the 50 kHz system used here. Future surveys over seabeds with high macrorelief may benefit from a narrower beam width transducer, to reduce footprint size, or new post-processing algorithms incorporating bathymetry.

The ability to accurately discriminate hard bottom from sediment appears to be a robust property of single-beam acoustic seabed classification systems. These two classes were reliably distinguished at four sites in this study and at five additional sites in previous studies, after considering which previously identified classes could likely be consolidated. A classification scheme that can be objectively applied using multiple systems, in multiple locations, by different people would be useful for efficiently exploring large areas of potential coral reef habitat that have not been mapped due to water clarity or depth.

Single-beam acoustic seabed classification systems have the potential to play an important role in coral reef mapping efforts due to the capability to extract bathymetry and distinguish rock from sediment substrate, in combination with their low cost and portability. By documenting class transferability at coarse descriptive resolution, this study supports the concept that single-beam acoustic seabed mapping systems can efficiently provide rapid reconnaissance and moderate resolution habitat maps of large areas, thereby complementing other more detailed, but more costly, survey methods.

Chapter 4: Reproducibility of single-beam acoustic seabed classification under variable survey conditions

4.1 Background

Maps depicting thematic seabed classes are important tools for both research and

management. Where the water is relatively clear, for example in many parts of the

tropics, satellite and aerial imagery play an important role for classifying the coastal zone

(Green et al. 1996). Most of the global seabed cannot be mapped using overhead

imagery, however, due to water clarity or depth limitations. In deep or turbid water,

acoustic remote sensing is an efficient method for classifying the seabed.

A recent review of acoustic seabed classification (ASC) identified the

standardization of instruments and methods as one category of “burning issues” for future

research (Anderson et al. 2008). A standardized method for ASC would be based on

statistical, objective procedures that reduce the variability inherent to subjective

interpretation (i.e. visually by an analyst). One aspect of a standardized method, which

was discussed in Chapter 3, would be a single classification scheme applicable to

multiple survey areas, different equipment, and different operators. A second aspect of a

standardized method is the reproducibility of results. ASC must be reproducible under

different survey conditions because the goal is to create a map of the seabed, not a map of uncontrolled survey parameters. Any correlation between echo features and survey parameters such as sea state will decrease the accuracy of the resulting seabed classes.

The overall objective of this study was to assess the reproducibility of a commercial single-beam acoustic seabed classification system. Two specific questions were asked. First, “How reproducible are the results of a particular off-the-shelf survey system?” Second, “Can reproducibility be improved by altering the standard usage of the off-the-shelf software?” Six surveys performed with a single-beam sounder over a pair of transects near Miami, FL, USA, between 1 May and 13 August 2007 were used to answer these questions. The strategy of repeat surveys over the same area follows the suggestion of Anderson et al. (2008) that verification of instrument performance over reference areas is one way to assess the reproducibility of entire end-to-end ASC systems, as opposed to calibrating individual components of the system.

The particular single-beam ASC system used to evaluate questions of reproducibility consisted of two components, both produced by the Quester Tangent Corporation (QTC; Sidney, BC, Canada). A QTCView Series V (QTCV) was used for data acquisition and the QTC IMPACT software was used for data processing. Together, these components comprise the "QTC system". Using a single commercial system might appear limiting, and certainly the measured absolute values of reproducibility only apply to this system and indeed this survey area. The framework described for measuring reproducibility, however, could be applied to data from any other system, or from another QTC system in another environment. Furthermore, the conclusions about what factors affect reproducibility are not necessarily confined to the QTC system. Methodological improvements that increase the reproducibility of QTC results will likely benefit other single-beam ASC systems as well. Thus, these experiments should be of interest to the broader acoustic seabed classification community in addition to existing QTC users.

4.2 Previous work

Several aspects of single-beam ASC system reproducibility have been examined in previous studies, but there has not yet been a systematic evaluation of the reproducibility of QTCV data. Foster-Smith and Sotheran (2003) surveyed Loch Maddy, Scotland, on two different days with different 200 kHz RoxAnn systems and found 57% similarity between the resulting maps. Greenstreet et al. (1997) mapped the inner Moray Firth, Scotland, with a 38 kHz RoxAnn system and found systematic differences when comparing to a repeat survey of the same area about four months later. Brown et al. (2005) conducted four surveys of the same area in the Firth of Lorn, Scotland, varying vessel speed, 200 kHz RoxAnn system, track spacing, and track orientation among the four surveys. Agreement between pairs of surveys ranged from 36% to 43%. Wilding et al. (2003) revisited a set of eleven stations off the west coast of Scotland on six days over the course of ten months using a 200 kHz RoxAnn. Wilding et al. (2003) found that E1 and E2 values, the two parameters on which the RoxAnn system bases its classifications, drifted over the time scales of hours, days, and months along their repeat transect. They also found E1 and E2 to vary with vessel speed. Von Szalay and McConnaughey (2002) performed systematic repeat transects, on the same day, at two sites and found no effect of vessel speed on classification reproducibility using a 38 kHz QTC Series III (QTCIII) and 38 kHz Series IV (QTCIV).

The choice of ASC system is one difference between this study and previous work. Most previous investigations of reproducibility used the RoxAnn ASC system (Greenstreet et al. 1997; Foster-Smith and Sotheran 2003; Wilding et al. 2003; Brown et al. 2005). Reproducibility characteristics may differ between RoxAnn and QTC systems, however, for several reasons. First, the two systems exploit different portions of echoes. RoxAnn systems analyze the tail of the first echo and the entire second echo whereas QTC systems analyze the entire first echo. Second, the performance of RoxAnn and QTC systems has been shown to differ in other ways besides reproducibility. For example, QTCIV and RoxAnn have produced different results in the same study area (Hamilton et al. 1999). Furthermore, vessel speed has been shown to affect RoxAnn classifications (Hamilton et al. 1999; Wilding et al. 2003) whereas QTCIV classifications have proven consistent under variable vessel speed (Hamilton et al. 1999; von Szalay and McConnaughey 2002).

This study differs from previous work in other ways beyond simply the choice of ASC system. Some previous studies have interpolated E1 and E2 values to a regular grid before comparison (Greenstreet et al. 1997; Foster-Smith and Sotheran 2003; Brown et al. 2005) whereas this study compared individual points without interpolation. There is no problem with interpolation in general, but it is another confounding step when trying to understand why maps might differ. Therefore, understanding the limitations to reproducibility will be simpler without interpolation. Some previous studies compared the output of different sounders (Foster-Smith and Sotheran 2003; Brown et al. 2005) whereas this study used one system for all data collection to limit potential confounding factors to survey conditions. Some previous studies have compared the output of supervised classification (von Szalay and McConnaughey 2002; Foster-Smith and Sotheran 2003; Brown et al. 2005), whereas unsupervised classification was used here. One study used the overall percentages of classes within repeat transects as a metric for reproducibility (von Szalay and McConnaughey 2002). In this study, all comparisons were performed point-by-point. This study used 50 kHz data, whereas all previous repeat surveys have used 200 kHz or 38 kHz (Greenstreet et al. 1997; von Szalay and McConnaughey 2002; Foster-Smith and Sotheran 2003; Wilding et al. 2003; Brown et al. 2005). Finally, the biggest difference between this study and previous work is that an effort has been made here to improve reproducibility, not just to measure it. Efforts to improve reproducibility provided insight into how the QTC system in particular, but also single-beam ASC systems in general, might be improved in the future.
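A point-by-point comparison of two repeat surveys, without interpolation to a grid, might be sketched as follows. The pairing rule used here (each echo in one survey is matched to the nearest echo in the other survey, and pairs farther apart than a cutoff are discarded) is an assumption made for illustration; the text above states only that comparisons were made point-by-point. The function name, tuple layout, and cutoff are hypothetical.

```python
import math


def point_by_point_agreement(survey_a, survey_b, max_distance_m):
    """Percent of matched echoes in survey_a whose class agrees with survey_b.

    survey_a, survey_b: lists of (x_m, y_m, class_label) tuples.
    Returns None if no echo in survey_a has a neighbor within max_distance_m.
    """
    if not survey_b:
        return None
    matched = 0
    agree = 0
    for xa, ya, class_a in survey_a:
        # Nearest echo in the other survey (brute-force search).
        nearest = min(survey_b, key=lambda e: math.hypot(e[0] - xa, e[1] - ya))
        if math.hypot(nearest[0] - xa, nearest[1] - ya) > max_distance_m:
            continue  # no comparable echo close enough
        matched += 1
        if nearest[2] == class_a:
            agree += 1
    return 100.0 * agree / matched if matched else None


# Hypothetical echoes from two passes over the same transect.
pass_1 = [(0.0, 0.0, 1), (10.0, 0.0, 1), (20.0, 0.0, 2), (500.0, 0.0, 2)]
pass_2 = [(1.0, 0.0, 1), (11.0, 0.0, 2), (19.0, 0.0, 2)]
agreement = point_by_point_agreement(pass_1, pass_2, max_distance_m=5.0)
```

Working with individual echoes keeps any disagreement attributable to the echoes themselves rather than to a gridding step.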

4.3 Methods

The methods described in this Section and the results presented in Section 4.4 apply to the first question addressed by this Chapter, namely the reproducibility of an off-the-shelf ASC system. The second question addressed by this Chapter, how to increase reproducibility, employed multiple independent approaches. The methods and results for each of the efforts to increase reproducibility are described in Sections 4.5.1-4.5.4.

4.3.1 Survey site and data acquisition

Surveys were conducted at Fowey Rocks, a reef chosen because its proximity to Miami made it easy to survey repeatedly and because its geomorphology was typical of the major upper Florida Keys reefs. A pilot survey performed in 2003 was used to pick two parallel transects running across the outer shelf and upper slope from depths of 5 to 60 m. Each transect was about 2 km long, taking 12 minutes to run at a nominal survey speed of 6 knots (Fig. 4.1).

Acoustic data were collected with a 50 kHz QTCV system (Table 3.2 lists system characteristics). The QTCV system was mounted on the gunwale of an 8 m long survey vessel each day. Repeat transects were run on six days in 2007: May 1, 2, 9, 28, August 3, and 13. On each day, both transects were traversed three times. In addition to the acoustic data, pitch and roll of the survey vessel were recorded at a rate of 100 Hz.

Environmental conditions during the six surveys were extracted from NOAA National Data Buoy Center records (Fig. 4.2). The three closest stations to the survey area were (Fig. 4.1): A) Fowey Rocks lighthouse, located at the survey site; B) Virginia Key, located approximately 15 km to the north of the survey site; and C) Molasses Reef lighthouse, located approximately 45 km to the south of the survey site.

Figure 4.1: Fowey Rocks survey site and depth profiles. A) Location of Fowey Rocks (FR), Virginia Key (VK) and Molasses Reef (MR) in relation to South Florida. B) Survey tracks plotted in a different color for each day. Profiles along NW-NE and SW-SE are plotted in panels C and D. C) North transect depth profile (20x exaggeration) colored by classes from the May 1 dataset. D) South transect depth profile (20x exaggeration) colored by classes from the May 1 dataset. The main features are a series of north-south trending linear reefs (three on N transect, two on S transect) that are separated by relatively flat sediment in shallow water and sloping sediment (presumably) in deeper water.

4.3.2 Vessel attitude and grazing angle computation

Pitch and roll of the survey vessel were used along with depth to compute the incidence angle and its complement, the grazing angle, for each echo. The incidence angle is the angle between the local seabed normal and the direction of propagation of the center of the incoming acoustic energy, taken here as a vector normal to the face of the transducer. Two processing steps were needed to enable the calculation of incidence angle from pitch and roll for each stack of echoes. First, pitch and roll, which were in a boat coordinate frame, had to be converted to a transducer pointing vector in a world coordinate frame. Second, the slope of the seabed was computed by fitting a line or plane to a moving window of depth measurements surrounding each stacked echo.

Pitch and roll were converted by computing the boat heading from the GPS track

and using the heading to rotate the unit vector in boat coordinates to UTM world

coordinates. For each echo, heading was computed as the average of the heading from the

coordinates of that stack to the following and previous echo.
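The conversion described above can be sketched in a few lines. The sign conventions are assumptions, since the text does not state them: heading is measured clockwise from north, pitch is positive bow up, roll is positive starboard side down, and the transducer points straight down when the vessel is level. The two track segments that meet at an echo are averaged as unit vectors to avoid wraparound problems near north. Function names are illustrative.

```python
import math


def track_heading_rad(prev_xy, this_xy, next_xy):
    """Heading (radians clockwise from north) averaged over the two track
    segments that meet at this echo. Inputs are (easting_m, northing_m)
    and consecutive positions are assumed to be distinct."""
    de1, dn1 = this_xy[0] - prev_xy[0], this_xy[1] - prev_xy[1]
    de2, dn2 = next_xy[0] - this_xy[0], next_xy[1] - this_xy[1]
    len1, len2 = math.hypot(de1, dn1), math.hypot(de2, dn2)
    # Sum of the two unit direction vectors gives the mean direction.
    east = de1 / len1 + de2 / len2
    north = dn1 / len1 + dn2 / len2
    return math.atan2(east, north) % (2.0 * math.pi)


def pointing_vector_enu(pitch_rad, roll_rad, heading_rad):
    """Unit vector along the transducer axis in (east, north, up) coordinates."""
    # Components of the tilted axis in a level, heading-aligned boat frame.
    forward = math.sin(pitch_rad) * math.cos(roll_rad)
    starboard = -math.sin(roll_rad)
    down = math.cos(pitch_rad) * math.cos(roll_rad)
    # Rotate the horizontal components by the heading into world coordinates.
    east = forward * math.sin(heading_rad) + starboard * math.cos(heading_rad)
    north = forward * math.cos(heading_rad) - starboard * math.sin(heading_rad)
    return (east, north, -down)


# Hypothetical echo positions on an eastward track, with 10 degrees of pitch.
heading = track_heading_rad((0.0, 0.0), (10.0, 0.0), (20.0, 0.0))
pointing = pointing_vector_enu(math.radians(10.0), 0.0, heading)
```

With a different attitude sensor convention the signs of the forward and starboard components would change, but the structure of the calculation would not.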

Slope was computed for each echo by using all the depth measurements within a window of 40 m acquired on a given day. The UTM coordinates of these points were placed in an Nx2 matrix, where N was the number of points found in the 40 m window, column one held the UTM eastings, and column two held the UTM northings. The condition number of this matrix, which indicates the accuracy of results from matrix inversion, was computed with the MATLAB ‘cond’ command (R2008a, The MathWorks, Natick, MA) to assess whether the points could adequately define a plane or were instead aligned along a line. If the condition number of the coordinate matrix was less than or equal to two, a plane was fit to the eastings, northings, and depth data and the surface normal to the plane was computed. A condition number greater than two indicated that the points fell mostly along a line, so principal components were used to project the eastings and northings onto a single dimension, and least squares regression was then used to fit a line to the projected coordinate and depth data in order to compute the surface normal. In the latter case, when fitting a line rather than a plane, the assumption was that there was zero slope component perpendicular to the track line.
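The plane-or-line decision described above can be sketched as follows. This is a minimal illustration in Python, not the MATLAB processing used for the study. The function name `seabed_normal`, the centering of the coordinates before computing the condition number, and the east/north/up sign convention are all assumptions made for the example.

```python
import numpy as np

def seabed_normal(east, north, depth, cond_max=2.0):
    """Unit seabed normal (east, north, up) for the points in one window.

    Hypothetical helper: centering the coordinates and treating depth as
    positive downward are assumptions, not details given in the text.
    """
    xy = np.column_stack([east, north]).astype(float)
    z = -np.asarray(depth, dtype=float)        # elevation, positive up
    xy_c = xy - xy.mean(axis=0)                # remove the large UTM offsets

    # Singular values of the N x 2 coordinate matrix give its condition number.
    _, s, vt = np.linalg.svd(xy_c, full_matrices=False)
    cond = s[0] / s[1] if s[1] > 0 else np.inf

    if cond <= cond_max:
        # Points spread in two dimensions: fit z = a*x + b*y + c.
        design = np.column_stack([xy_c, np.ones(len(z))])
        coef = np.linalg.lstsq(design, z, rcond=None)[0]
        grad = coef[:2]
    else:
        # Points fall mostly along a line: project onto the first principal
        # direction and fit z = m*s + c, assuming zero cross-track slope.
        u = vt[0]
        along = xy_c @ u
        m = np.polyfit(along, z, 1)[0]
        grad = m * u

    normal = np.array([-grad[0], -grad[1], 1.0])
    return normal / np.linalg.norm(normal)
```

Both branches return the same kind of vector, so the subsequent incidence-angle step does not need to know which fit was used.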


The slope calculation produced a seabed normal vector (in UTM coordinates) and the pitch, roll, and heading data produced a transducer pointing vector (also in UTM coordinates). The angle between the two, which was computed from the dot product of these vectors, gave the incidence angle of the pulse onto the seabed, and 90 degrees minus the incidence angle gave the grazing angle.
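The dot-product step can be illustrated with a short sketch. The function name `grazing_angle_deg` is hypothetical, and taking the absolute value of the cosine is an assumed sign convention that lets an upward seabed normal be combined with a downward pointing vector.

```python
import numpy as np

def grazing_angle_deg(seabed_normal, pointing):
    """Grazing angle in degrees from two 3-vectors in the same world frame.

    Assumption: the absolute value of the cosine is used so that the
    result does not depend on whether the vectors point up or down.
    """
    n = np.asarray(seabed_normal, dtype=float)
    p = np.asarray(pointing, dtype=float)
    cos_inc = abs(n @ p) / (np.linalg.norm(n) * np.linalg.norm(p))
    incidence = np.degrees(np.arccos(np.clip(cos_inc, 0.0, 1.0)))
    return 90.0 - incidence
```

For a flat seabed and a transducer tilted 10 degrees from vertical, this returns a grazing angle of 80 degrees.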

A final challenge in the computation of grazing angle was addressing the procedure of echo stacking. As described in Section 3.3.2, echo stacking, or averaging, was used when processing the acoustic data to suppress ping-to-ping variability caused by incoherent scattering from the seabed. For these data, five echoes were averaged to produce one stacked echo (Section 3.3.2, Table 3.3). Stacking echoes resulted in an ambiguous definition of grazing angle because the roll and pitch varied for each of the five echoes in the stack. To address this challenge, grazing angle was computed in three ways: A) using the roll and pitch at the time of the center ping of the stack; B) using the maximum roll and pitch observed during the time spanned by the stack; C) using the minimum roll and pitch observed during the time spanned by the stack. Note that the total grazing angle can be computed only in method (A), whereas the pitch and roll components of grazing angle can be computed for all three methods, because the maximum or minimum roll and pitch did not necessarily occur at the same time during the stack span. For the transducer used in this study it was advantageous to look at roll and pitch separately because the transducer beamwidths were quite different in those directions; full beamwidth angles were 16 degrees in the pitch direction and 42 degrees in the roll direction (Table 3.2).


4.3.3 Acoustic data processing

The acoustic data were processed using QTC IMPACT software (version 3.4; see Chapter 2 for software overview) using the same settings described for the Fowey Rocks survey in Chapter 3 (Table 3.3). As described in Section 3.3.2, data processing in IMPACT consists of six steps. The first three steps, envelope generation, depth compensation, and echo stacking, produce bottom-aligned, depth-compensated, smoothed echoes. The fourth step, feature extraction, generates metrics of echo shape for each stacked echo. IMPACT organizes the feature dataset into a [Nx166] matrix, where N is the number of echoes and 166 the number of features generated for each echo. The features for each echo, therefore, may be considered a vector, called the “full feature vector” (FFV) by IMPACT. In the fifth step, the full [Nx166] matrix is decorrelated and reduced in size by principal components analysis. IMPACT retains the first three principal components, labeling them “Q1”, “Q2”, and “Q3”. The reduced [Nx3] matrix is called “Q-space”. The sixth step in IMPACT processing is clustering Q-space. The final output from the entire process is an assignment of each echo to a cluster comprised of echoes with similar shapes. Clusters are also known as “acoustic classes.”
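The fifth step, reducing the [Nx166] feature matrix to a [Nx3] Q-space, can be illustrated with a generic principal components sketch. IMPACT is proprietary software and its internal scaling is not described here, so the standardization of each feature and the function name `to_q_space` are assumptions for illustration only.

```python
import numpy as np

def to_q_space(ffv, n_components=3):
    """Project an N x F feature matrix onto its first principal components.

    Assumption: each feature is centered and scaled to unit variance
    before the decomposition; IMPACT's own scaling may differ.
    """
    x = np.asarray(ffv, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0                  # leave constant features unscaled
    x = x / std

    # Rows of vt are the principal directions, ordered by variance explained.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T       # columns play the role of Q1, Q2, Q3
```

The returned columns are mutually uncorrelated and ordered by decreasing variance, which is the property the decorrelation step relies on.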

Clustering in IMPACT (v. 3.4) uses the “automatic cluster engine” (ACE), which employs a modified k-means algorithm embedded within a simulated annealing routine

(Preston et al. 2004a). ACE outputs are: a) the optimum number of clusters into which the echoes should be split, as defined by the Bayesian Information Criterion (Preston et al. 2004a), b) the mean vector and covariance matrix defining each cluster in Q-space, and c) a class number for each echo assigning it to one of the defined clusters.

Acoustic data were clustered in three sets. First, a single merged dataset comprised of all echoes from all six days was created and clustered. Second, each day’s data were clustered separately. Finally, each individual transect on each day was clustered separately. Thus there were a total of 43 clustered datasets, each composed of a different temporal subset of the same raw data: A) a single merged dataset from all six transects on all six days; B) six daily datasets, each containing the six transects done on that day; C) 36 individual transects, six for each day.

Analysis of reproducibility was divided into two parts based on the type of data.

First, consistency of acoustic classes, which are nominal data, was investigated (Sections

4.3.4, 4.4.2). Second, correlations between echo envelopes, FFVs, and Q-values, which are continuous vector data, with survey parameters were investigated (Sections 4.3.5 and

4.4.3).

4.3.4 Classification reproducibility

Classification reproducibility, from a user’s perspective, means consistently assigning any particular parcel of seabed to the same acoustic class. Two qualitative and two quantitative methods were used to assess classification reproducibility.

One qualitative indicator of reproducibility was simply the number of classes discriminated during the clustering process. If the input data, for example on two different days, were identical, the clustering process would identify the same classes in those two datasets. Finding the same optimum number of classes in two repeat datasets does not guarantee that the two datasets are identical, but finding different optimum numbers of classes is one sign that two repeat datasets are not identical.

A second qualitative assessment of classification reproducibility consisted of visual assessment of the consistency of classes plotted in both geographic space and Q-space. Simply put, if the differences between two classified repeat transects are hardly noticeable by eye, then the reproducibility of the method is better than if the differences are obvious just looking at the maps. By itself, qualitative assessment is not a rigorous or objective way to evaluate agreement between maps (Congalton and Green 1999; Foody 2006), but it provides an intuitively understandable complement to quantitative metrics of map agreement.

Two methods were used to quantitatively assess classification reproducibility between pairs of daily classified datasets: the error matrix and shared information. Both techniques were derived from contingency table analysis (Finn 1993; Legendre and

Legendre 1998). The two methods of analysis produced three metrics of dataset agreement: overall accuracy (OA), the Kappa statistic, and Average Mutual Information

(AMI).

The error matrix is a common tool for evaluating the accuracy of classified maps with “ground truth” data (Congalton and Green 1999). The error matrix is a square contingency table in which the class labels of the rows correspond to those of the columns. Many statistics derived from error matrices can serve as accuracy metrics, but overall accuracy (OA) and the Kappa statistic are two of the most basic and most common (Congalton and Green 1999; Foody 2002).

OA is computed as the trace of an error matrix (the number of points which had the same class in both datasets) divided by the total number of elements in the error matrix (the number of points compared between the datasets). OA varies from 0, corresponding to no agreement, to 1, reflecting perfect agreement between the two datasets being compared. In this case, results are reported as 100 * OA so the scale ranges from 0-100.


One limitation of OA is that some of the positive matches can be attributed to

chance. Kappa is a modified OA designed to correct for this positive bias due to chance

agreement (Congalton and Green 1999). There are several ways to weight or otherwise

modify Kappa (Foody 2004; Sim and Wright 2005). This analysis used the basic Kappa

statistic (e.g. Finn 1993):

$$K = \frac{OA - P_r}{1 - P_r} \qquad (1)$$

where $K$ is Kappa, $OA$ is overall accuracy, and $P_r$ is the random probability of a correct match in the confusion matrix. $P_r$ was computed as the inner product of the error matrix row sum and column sum vectors divided by the square of the number of points compared between the datasets (Finn 1993).
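As a worked illustration of OA and Equation (1), the sketch below computes both statistics from a square error matrix. The function name and the example matrix are hypothetical and are not taken from the survey data.

```python
import numpy as np

def overall_accuracy_and_kappa(error_matrix):
    """Return (OA, Kappa) for a square error matrix of point counts."""
    m = np.asarray(error_matrix, dtype=float)
    n = m.sum()                                   # points compared
    oa = np.trace(m) / n                          # matching classes / total
    # Chance agreement: inner product of row sums and column sums over n^2.
    p_r = (m.sum(axis=1) @ m.sum(axis=0)) / n**2
    kappa = (oa - p_r) / (1.0 - p_r)
    return oa, kappa

# Hypothetical two-class example with 100 matched points.
oa, kappa = overall_accuracy_and_kappa([[40, 10], [5, 45]])
```

For this example matrix, OA is 0.85, the chance agreement is 0.5, and Kappa is 0.7, which shows how Kappa discounts the agreement expected by chance.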

Error matrices are traditionally square, and they require class names to be the same for both rows and columns (i.e. class one on map A must mean the same thing as class one on map B). The daily ACE-clustered QTCV data do not necessarily meet these criteria; different numbers of clusters are possible on different days, and the cluster numbers from day to day do not mean the same thing. Therefore the first step was to renumber the daily classes to a common scheme. To do this, there needed to be at least as many new (renumbered) classes as the day with the most clusters. Consequently, days with fewer than the maximum number of clusters had at least one empty new class.

Once the clusters for each day had been renumbered, OA and Kappa were computed for each pair of datasets as follows. First, a day was chosen as the “reference” dataset and a second day chosen as the “comparison” dataset. Five hundred random echoes were selected from the reference dataset and each was paired with the closest echo in the comparison dataset. Any pairs of points separated by more than 10 m were discarded, which usually left between 350 and 450 matched points. These remaining matched points were used to create an error matrix, from which OA and Kappa were calculated for this reference / comparison dataset pair. This process was repeated for each possible pair of reference and comparison datasets.
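The pairing procedure can be sketched as below. This is an illustration under stated assumptions rather than the code used for the study: the function name `paired_error_matrix` is hypothetical, sampling is done without replacement, classes are assumed to be numbered from 1, and a brute-force distance search stands in for whatever nearest-neighbor method was actually used.

```python
import numpy as np

def paired_error_matrix(ref_xy, ref_cls, cmp_xy, cmp_cls, n_classes,
                        n_samples=500, max_dist=10.0, seed=0):
    """Error matrix from randomly sampled reference echoes and their
    nearest comparison echoes, discarding pairs farther apart than max_dist.
    """
    rng = np.random.default_rng(seed)
    ref_xy = np.asarray(ref_xy, dtype=float)
    cmp_xy = np.asarray(cmp_xy, dtype=float)
    ref_cls = np.asarray(ref_cls)
    cmp_cls = np.asarray(cmp_cls)

    k = min(n_samples, len(ref_xy))
    idx = rng.choice(len(ref_xy), size=k, replace=False)

    # Distance from each sampled reference echo to every comparison echo.
    d = np.linalg.norm(ref_xy[idx, None, :] - cmp_xy[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    keep = d[np.arange(k), nearest] <= max_dist

    matrix = np.zeros((n_classes, n_classes), dtype=int)
    rows = ref_cls[idx][keep] - 1          # classes assumed numbered from 1
    cols = cmp_cls[nearest][keep] - 1
    np.add.at(matrix, (rows, cols), 1)     # rows: reference, columns: comparison
    return matrix
```

Swapping the reference and comparison arguments gives the second of the two values reported for each pair of days.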

It is worth emphasizing that OA is typically used to evaluate a classified map, assumed to contain errors, against some validation dataset, assumed to contain no error.

In such a situation the term overall accuracy reflects the fact that one of the datasets is inherently more trustworthy than the other. As used here, however, to evaluate reproducibility between datasets, it is completely arbitrary which dataset is considered the reference and which is used for comparison. In fact, as described above, OA is reported both ways, with each dataset serving as reference and comparison. Therefore, even though the term overall accuracy is consistent with the error matrix literature, the term overall agreement better describes what the OA metric means in the context of reproducibility.

Average mutual information (AMI) is a measure of the information common to two nominal data sets. AMI was derived from information theory, specifically the concept of entropy (Finn 1993). AMI was computed following Finn (1993). Like OA and

Kappa, AMI is expressed as a percentage of similarity between two maps, ranging from

0, indicating that map A predicts nothing about map B, to 100, indicating that map A is a perfect predictor of map B. Unlike OA and Kappa, AMI can be computed for rectangular contingency tables (i.e. the number of classes in map A does not have to be the same as the number of classes in map B), and the class labels in map A and B do not have to be the same.


AMI has been used much less frequently to assess remotely sensed data than OA

or Kappa, so examples may help readers with its interpretation. The limiting cases are

simplest to explain, assuming that the classes in map A are the same as those in map B. If

AMI equals 100%, then the two maps are identical, corresponding to a diagonal error

matrix. If AMI equals 0% then the classes of the pixels on map B are randomly assigned

relative to the classes of the pixels on map A, corresponding to an error matrix filled with

equal elements. If AMI equals 50% then 50% of the pixels will be the same class on map

A and map B. Finn (1993) provides a more complete explanation of AMI, including both

idealized and real examples as well as examples of the more general case when the

classes on maps A and B are not the same.
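The limiting cases can be checked numerically with a small sketch. The exact normalization used by Finn (1993) is not restated in this section, so the choice below, mutual information divided by the entropy of map B, is an assumption that merely matches the description of 0 as "map A predicts nothing about map B" and 100 as "map A is a perfect predictor of map B". The function name is hypothetical.

```python
import numpy as np

def ami_percent(table):
    """Mutual information of a contingency table as a percentage of the
    entropy of map B (columns). The normalization is an assumption.
    Rows are map A classes, columns are map B classes; the table may be
    rectangular. Map B must contain at least two populated classes.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    pa = p.sum(axis=1, keepdims=True)      # marginal of map A, shape (r, 1)
    pb = p.sum(axis=0, keepdims=True)      # marginal of map B, shape (1, c)
    nz = p > 0                             # empty cells contribute nothing
    mi = np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz]))
    hb = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))
    return 100.0 * mi / hb
```

A diagonal table gives 100, a table of equal elements gives 0, and a rectangular table such as [[10, 0, 0], [0, 5, 5]] gives an intermediate value without any renumbering of classes.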

4.3.5 Echo, FFV, and Q-value correlation with survey parameters

An end user might be interested only in the reproducibility of acoustic classes (Sections 4.3.4, 4.4.2). Quantifying the reproducibility of intermediate IMPACT processing quantities (stacked echoes, FFVs and Q-values), however, is important to understand the factors affecting reproducibility.

One way to quantify reproducibility was to look at the differences between echoes

as a function of variables that presumably affect them. In other words, can the differences

between echoes, or their derivatives, FFVs, and Q-values, be attributed to survey

parameters such as wind speed, or other confounding factors such as seabed slope? If so,

then removing those echoes so affected should increase reproducibility. For example, if

echoes were found to be less similar with a low grazing angle than with a high grazing

angle, eliminating off-nadir echoes should increase the reproducibility of repeat transects.

On the other hand, if there were no correlation between survey parameters and echo / FFV / Q-space reproducibility, then the variability between echoes would have to come from some unmeasured survey parameter, from ping-to-ping variability inherent to random scatterers on the seabed, or from within-footprint heterogeneity of the seabed.

Four points, hereafter called “stations,” were chosen along the northern transect.

Station one was along the main reef, in an area of high local relief (Fig. 4.1 shows the

“main reef”). Station two was between the main reef and the outlier reef, in a flat area of sediment. Station three was along the steeply sloping front of the outlier reef. Station four was on the sloping forereef. For each of the four stations, the closest echo from each of the 18 repeat transects was identified. For each echo, three dependent variables were extracted: echo envelopes, FFVs, and Q-values. Fifteen independent variables were also extracted: depth, date, vessel speed, sound speed, location, total grazing angle, pitch grazing angle, and roll grazing angle. Sound speed was computed from actual water temperature (Fig. 4.2) and nominal salinity. The three grazing angles (total, pitch, and roll) were extracted both at the time of the center ping in the stack and for the maximum values observed during the time interval of the stack, as described in Section 4.3.2.

Pairwise differences between the independent variables were computed for each station. At each station there were 18 observations for each variable (six days times three transects per day), which resulted in 153 comparisons between pairs of observations. As noted above, the independent variables were all scalar quantities, so the differences between them were computed as the absolute value of one measurement subtracted from the other. In addition, the minimum value within each pair was retained for the grazing angle variables. The logic behind using the minimum grazing angle of each pair rather than just the difference between grazing angles was that highly off-nadir echoes were expected to exhibit higher variance. If two echoes both had low grazing angles, the difference between their grazing angles might be small even if the differences between the echoes were large.
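The count of 153 follows from taking every unordered pair of the 18 observations (18 × 17 / 2). A brief sketch, with a hypothetical function name and made-up values, shows the two quantities retained for each pair:

```python
from itertools import combinations
import numpy as np

def pairwise_scalar_differences(values):
    """Absolute difference and minimum for every unordered pair of values."""
    v = np.asarray(values, dtype=float)
    pairs = list(combinations(range(len(v)), 2))
    abs_diff = np.array([abs(v[i] - v[j]) for i, j in pairs])
    pair_min = np.array([min(v[i], v[j]) for i, j in pairs])
    return abs_diff, pair_min

# Eighteen placeholder observations yield 153 pairs.
diffs, mins = pairwise_scalar_differences(np.linspace(60.0, 90.0, 18))
```

Keeping `pair_min` alongside `abs_diff` is what allows two strongly off-nadir echoes to be flagged even when their grazing angles are nearly equal.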

In contrast to the independent variables, which were all scalars, the dependent variables were vector quantities. Therefore, simple differences between them could not be directly correlated with the independent variables. To address this complication, three methods were used to compute scalar differences between the vector quantities: the correlation coefficient (2), the multidimensional angle (3), and the magnitude of the difference vector produced by subtracting one from the other (4). For two vectors, A and B, these quantities are defined as:

$$r_{AB} = \frac{\mathrm{cov}_{AB}}{\sigma_A \sigma_B} \qquad (2)$$

$$\theta_{AB} = \cos^{-1}\left(\frac{A \cdot B}{\|A\| \, \|B\|}\right) \qquad (3)$$

$$d_{AB} = \|A - B\| \qquad (4)$$

where $\mathrm{cov}_{AB}$ was the covariance between vectors A and B, $\sigma_A$ was the standard deviation of vector A, $A \cdot B$ was the dot product of vectors A and B, and $\|A\|$ was the magnitude of vector A.
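Equations (2) through (4) can be evaluated for any pair of equal-length vectors, whether echo envelopes, FFVs, or Q-values. The sketch below is illustrative; the function name is hypothetical, and reporting the angle in degrees is an assumption because the units are not stated here.

```python
import numpy as np

def vector_differences(a, b):
    """Return (r, theta_deg, d) for two equal-length vectors.

    r is the correlation coefficient (Eq. 2), theta_deg the angle between
    the vectors (Eq. 3, degrees assumed), d the norm of a - b (Eq. 4).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    d = np.linalg.norm(a - b)
    return r, theta_deg, d
```

The three metrics respond differently to scaling: two vectors that differ only by a constant factor have a correlation of 1 and an angle of 0, but a nonzero difference magnitude.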

Plots were generated for all combinations of the nine dependent metrics (three variables times three difference methods) and 15 independent variables. The resulting 135 plots were inspected for correlations between echoes, FFVs, or Q-values and the environmental variables.


4.4 Results

4.4.1 Environmental conditions, vessel attitude and grazing angle

None of the three NOAA sensor packages had logged both wind speed and water temperature on each of the repeat survey days, but wind speed and water temperature were available from at least one of the stations during each survey (Fig. 4.2). May 2 and August 13, 2007 were the calmest days. May 28 was the windiest. Water temperature increased about 5 °C over the span of days surveyed.

Figure 4.2: Wind speed (left) and water temperature (right) for the periods of the six surveys. Missing bars indicate no data for that time period. For example, the Virginia Key station does not record water temperature. May 2 and August 13, 2007 were the calmest days. May 28 was the windiest. Water temperature increased about 5 °C over the span of days surveyed.

Two notable features were apparent in the transducer attitude data (Figs. 4.3 and

4.4). First, the median values of pitch, roll, and total angle off vertical were not zero on any day. For example, on May 2 the daily median value of the mean within-stack roll measurements was about -5 degrees (Fig. 4.3C). If the transducer had been perfectly aligned with the survey vessel, a negative bias of -5 degrees would indicate that the vessel listed to port during the survey. A more likely explanation is that the transducer was mounted to the vessel on May 2 such that it pointed 5 degrees to starboard.


Figure 4.3: Boxplots of daily mean and maximum within-stack attitude. Each box shows, for all measurements on one day, the median (horizontal red line), 25th-75th percentile (blue box), data within 1.5 times the range between the 25th-75th percentile (black “whiskers”), and individual data points falling outside the whiskers (red ‘+’ markers). A) Mean within-stack pitch, B) maximum within-stack pitch, C) mean within-stack roll, D) maximum within-stack roll, E) mean within-stack angle off vertical, F) maximum within-stack angle off vertical. Green horizontal lines in (A, B, E, F) are drawn at +/- 8 degrees, and blue horizontal lines in (C, D, E, F) at +/- 21 degrees, which are half the beamwidth in the pitch and roll directions respectively. Note daily biases.


Figure 4.4: Transducer attitude data from May 28, 2007. Roll (A) and pitch (B) measurements made during all six transects May 28. C) Pitch component of the grazing angle during all six transects May 28. The horizontal black line in the fourth panel marks 1/2 of the pitch beamwidth. Note that the times when the boat turned at the end of each transect are visible (marked here with vertical black dashed lines) because vessel motion was larger when headed into the wind and waves on the first, third, and fifth transects than when headed downwind on the second, fourth, and sixth transects.

Furthermore, the mounting angle biases changed from day to day (Fig. 4.3). Biases were also visible in the raw data for a given day (Fig. 4.4).

Second, sea state affected variability of pitch and roll on both daily and transect time scales. Pitch and roll were most variable on May 28, the windiest day (Fig. 4.3). In particular, the maximum within-stack values were largest on May 28 (Fig. 4.3).

Maximum within-stack values were low on both May 2 and Aug 13, as expected based on the low wind speeds those days (Fig. 4.2). The data from Aug 3, however, which was windier than May 2 or Aug 13 (Fig. 4.2), had similar variability in pitch and roll to May 2 and Aug 13 (Fig. 4.3). The relatively low pitch and roll variability on Aug 3 highlighted the fact that stability of a given vessel is a function of wind direction, fetch, duration, and vessel heading in addition to wind speed. Pitch and roll variability also changed within each day, as the survey vessel headed into and away from the wind (Fig. 4.4).

4.4.2 Classification reproducibility

Classification reproducibility was assessed both qualitatively and quantitatively (Section 4.3.4). Qualitative assessment included tables of the number of classes identified in repeat datasets and visual assessment of the consistency of classes in both geographic and Q-space. Quantitative assessment employed the error matrix and shared information approaches.

IMPACT identified 8 classes as the optimum split level for clustering of the entire dataset, which totaled 9,562 echoes from all days. Clustering each of the six days separately (independent PCA and clustering on each day) returned a variable number of optimum classes (Table 4.1). IMPACT suggested an optimum split level that clustered four of the six daily datasets into five classes, one day into four classes, and one day into six classes (Table 4.1).

Table 4.1: Optimum number of clusters identified by ACE for each daily dataset clustered separately. The optimum number varied from 4-6.

Date      # Classes   # Stacks
May 1     5           1822
May 2     6           1583
May 9     5           1983
May 28    5           1397
Aug 3     5           1393
Aug 13    4           1307


When processed separately, independent PCA and clustering were performed on each of the three replicates for both north and south transects each day, resulting in 36 clustered datasets. Most individual replicate transects contained three or four classes when clustered separately, though one leg clustered into five classes, and one leg resulted in only two classes (Table 4.2). Thus, the optimum number of classes generally varied by dataset, with smaller datasets, the individual transects, clustering into fewer classes and larger datasets, the daily and all days-merged datasets, clustering into more classes.

Table 4.2: Optimum number of clusters identified by ACE for each transect when clustered separately. The “Leg” column numbers the three replicate transects within each survey day. The optimum number varied from 2-5.

          North Transect                 South Transect
Date      Leg   # Classes   # Stacks     Leg   # Classes   # Stacks
May 1     1     4           264          1     4           212
May 1     2     3           252          2     3           212
May 1     3     4           433          3     3           255
May 2     1     3           380          1     4           210
May 2     2     3           250          2     4           249
May 2     3     3           243          3     4           259
May 9     1     4           235          1     4           243
May 9     2     4           254          2     4           233
May 9     3     3           354          3     4           231
May 28    1     3           256          1     4           220
May 28    2     4           238          2     4           238
May 28    3     4           246          3     3           228
Aug 3     1     5           457          1     4           223
Aug 3     2     3           314          2     2           218
Aug 3     3     4           471          3     3           212
Aug 13    1     3           232          1     3           215
Aug 13    2     3           252          2     4           226
Aug 13    3     3           257          3     4           213

The distribution of points in Q-space was broadly similar among days (Fig. 4.5), but the number of clusters and the locations of their boundaries shifted between days. Generally, classes with extreme values in Q-space had more stable mean values in the different daily datasets than classes formed from points in the middle of Q-space. For example, the mean values for the black, grey, and yellow classes, which have the smallest and largest values of Q1, were more consistent than those of the red and blue classes, which contained echoes with intermediate Q1 values (Fig. 4.5).

Plotting daily classified echoes in geographic space reveals broad similarities, but also many differences (Fig. 4.6). The main and inner reefs, for example (Fig. 4.1), appear consistently in the yellow class on both north and south transects (arrows labeled “M” and “I” in Fig. 4.6). The transition (arrow labeled “T” in Fig. 4.6), occurring on both transects just seaward of the outlier reef, between the red or blue classes on the shelf and the grey or black classes on the slope also appears consistently defined (compare profile from Fig. 4.1 with plan view of Fig. 4.6). The distribution of red and blue classes on the shelf, however, varied day to day and even from transect to transect within the same day (Fig. 4.6).

Figure 4.5: Q-space of each of the six daily datasets after clustering independently. Classes have been colored so that black and grey are the first two colors for classes with the smallest Q-values, and yellow, red and blue are the first three colors for classes with the largest Q-values. August 13 had only four classes, so no yellow was used. May 2 had six classes so green was used for the least populous one. Note that the mean values for the black, grey, and yellow classes are more consistent than the red and blue classes.

Figure 4.6: All 18 replicates of both the northern and southern transects. The left two panels plot all northern (A) and southern (C) transects using their actual coordinates. The right two panels plot all northern (B) and southern (D) transects with 50 m north-south offset between them. Transects in (B) and (D) are grouped in date order, so, for example, the top three are from 1 May and the bottom three are from 13 Aug. Colors in both plots correspond to the Q-spaces shown previously (Fig. 4.5). Y-axis is severely stretched on both plots to space points out for clarity. Note that transects are broadly similar, but not identical. The inner (I) and main (M) reefs appear consistently in the yellow class and the transition (T) between the shelf and slope also appears consistently as a border between either red or blue and grey classes.

Plotting echoes from the individually classified transects in geographic space reveals broad patterns of agreement between replicates, but many differences in the details. As observed for the daily datasets, the transition between the shelf and slope consistently appeared as a border between classes in the individual transects (arrows labeled “T” in Figs. 4.6, 4.7, 4.8). Unlike the daily datasets, which generally identified a separate class for the main and inner reefs (arrows labeled “M” and “I” in Fig. 4.6), the transects classified individually did not segment these reefs as separate classes (Figs. 4.7, 4.8).

Differences between successive transects that had been classified separately were

often apparent even for transects with the same number of clusters (Figs. 4.7, 4.8). On

May 2, 2007, for example, the first two northern transects each had three clusters, but the

spatial distribution of these classes was not the same between transects (Fig. 4.7A, B, D,

E). The gray class was consistent between the first two replicates of the northern transect

on May 2, but the red and blue classes were not (Fig. 4.7A, B, D, E). The third replicate

of the north transect on May 2, 2007 was clustered into four classes, so differs from the

first two transects in that respect. Nevertheless, the west end of the third transect (Fig.

4.7C, F) appeared similar to the first transect (Fig. 4.7A, B, D, E).


The differences in vessel motion between successive transects, caused by heading

into and then away from the wind, were significant on some days (Fig. 4.4), so one might

expect to find greater similarity between transects with similar vessel motion. For

example, reproducibility might be expected to be greater on a calm day than a windy day

because on a calm day vessel pitch and roll will be minimal regardless of heading during

data acquisition. Likewise, on a windy day transects acquired when headed in the same

direction relative to the wind might be more similar than transects acquired when headed

in opposite directions. The classified individual transects do not necessarily support these hypotheses, however. May 2 was the calmest survey day, yet the three replicates of the northern transect did not appear particularly reproducible (Fig. 4.7). May 28 was the windiest survey day, exhibiting notable differences in pitch and roll on upwind and downwind legs, yet the two transects acquired downwind (Fig. 4.8A, D, G and C, F, I) were not noticeably more or less similar to each other than to the transect acquired going upwind, when vessel motion was much larger (Fig. 4.8B, E, H). These observations suggest that either the pitch and roll of the vessel do not affect classification reproducibility or even the small pitch and roll measured on May 2 were large enough to reduce reproducibility. Section 4.5.2 attempts to resolve which of these conclusions is correct.

The first approach to quantifying reproducibility employed the error matrix to derive overall accuracy and Kappa statistics. As the daily datasets had clustered into different numbers of classes (Table 4.1), the first step required was to renumber the clusters on each day into a common classification scheme. Renumbering of classes was done by inspecting the positions of the class means in Q-space (Fig. 4.9). IMPACT had identified clusters with class mean Q1 values near -2, -1.4, and -0.8 on all six of the daily datasets (Fig. 4.9). Clusters on each day with mean values closest to these points in Q-space were renumbered 1-3 and assigned colors black, grey, and green, respectively (Fig. 4.9). IMPACT had identified clusters with class mean Q1 values near 0.2 on two of the six daily datasets and clusters with class mean Q1 values near -0.4 and +0.4 on five of the six daily datasets (Fig. 4.9). Clusters on each day with mean values closest to these points in Q-space were renumbered 5, 4, and 6 and assigned the colors red, blue, and yellow, respectively (Fig. 4.9).

Figure 4.7: Independently classified replicate track lines, Q-space, and pitch measurements along the northern transect on May 2, 2007. A, B, C) All transects on May 2 in light gray, and the actual classified data from a particular line in color (including dark gray). D, E, F) Clustered Q-space for each transect. G, H, I) Pitch measured during each transect. Green lines mark +/- eight degrees, which is the half angle of the transducer in the pitch direction. The arrow labeled “T” marks the transition between the shelf and slope (as in Fig. 4.6). Note clear differences among the transects even though vessel motion was small on this day.

Figure 4.8: Independently classified replicate track lines, Q-space, and pitch measurements along the northern transect on May 28, 2007. A, B, C) All transects on May 28 in light gray, and the actual classified data from a particular line in color (including dark gray). D, E, F) Clustered Q-space for each transect. G, H, I) Pitch measured during each transect. Green lines mark +/- eight degrees, which is the half angle of the transducer in the pitch direction. The arrow labeled “T” marks the transition between the shelf and slope (as in Fig. 4.6). Note that transects one (A, D) and three (C, F), acquired when headed downwind, were not much different than transect two (B, E) acquired headed into the wind, even though vessel motion was much greater on transect two (H) than on transects one or three (G, I).

Once the classes had been renumbered, an error matrix was constructed and OA and Kappa were computed. OA and Kappa showed some consistency between daily classified datasets, but generally these values were fairly low (Table 4.3). Most of the OA values fell in the range of about 50% to 65%, with the exception of the comparisons between August 13 and any of the other days, which were in the range of 30% to 48%.

Figure 4.9: Illustration of the method and results of renumbering of daily classes. Top left: mean +/- one standard deviation of each cluster from each daily dataset. All of the clusters on a given day have the same color in the top plot. Bottom left: same cluster means +/- one standard deviation as the top panel, but colors correspond to the new class numbers. Right: All of the points for each day plotted in their respective Q-spaces using the new class colors from the bottom left panel.

Table 4.3: Overall accuracy (left) and Kappa (right) between pairs of daily classified datasets (reclassed to 6 classes).

For the repeat QTCV datasets, AMI proved