Journal of Marine Science and Engineering

Article

Comparison of Human and Camera Visual Acuity—Setting the Benchmark for Shallow Water Autonomous Imaging Platforms

Scott Bainbridge * and Scott Gardner

Australian Institute of Marine Science, PMB 3 MC, Townsville Qld. 4810, Australia; [email protected] * Correspondence: [email protected]; Tel.: +61-7-4753-4377; Fax: +61-7-4772-5852

Academic Editor: Dong-Sheng Jeng Received: 30 November 2015; Accepted: 14 February 2016; Published: 24 February 2016

Abstract: A comparison was made between the underwater visual acuity of human observers and a high-end stills camera as applied to visual surveys of shallow water coral reefs. The human observers had almost double the visual acuity of the camera, recording a Snellen eye test score of 20/8 at 4.3 m depth against 20/15 for the camera. The human observers had a field of view of 7.8 m (horizontal) by 5.8 m at 4.3 m depth while the camera had a field of view of 4.46 m by 2.98 m, or only one-third of the area observed by the snorkelers. The human observers were therefore able to see a three-times-larger field of view at twice the resolution of the camera. This result comes from the observers actively scanning the scene to put the area of interest in the part of the retina with the greatest resolving power (the fovea), increasing the apparent resolving power of their eyes, against the camera which resolved equally across the image. As a result, in actively identifying targets, humans exceeded the camera, but for more passive observation work they may be closer to the performance of the camera. The implications for autonomous platforms are that to match the human observers for target recognition, platforms will need to operate lower (to increase resolution) and longer (to sample the same area) and so issues such as collision avoidance and navigation will be critical to operationalizing autonomous systems.

Keywords: visual acuity; coral reefs; manta tow; autonomous platforms; imaging systems

1. Introduction

There is an increasing need to replace or supplement diver- and snorkeler-based marine field sampling with sampling based on autonomous platforms, such as Autonomous Surface Vessels (ASVs) or Autonomous Underwater Vehicles (AUVs). This is being driven by the need to sample in deep areas (below dive-able depth, or around 30 m [1,2]), to sample at night or in murky conditions, or to work in areas inhabited by potentially hazardous animals such as sharks or crocodiles [3,4]. For many autonomous platforms, visual imaging systems are the main form of information collection [5]. These include standard digital and hyper-spectral cameras, and three-dimensional (3D) and 360-degree rigs, using a range of lighting from ambient to wavelength-specific light sources (such as UV light) [6]. In designing cameras for new platforms there is an implied assumption that these cameras can capture the same level of information as a diver or snorkeler can using their eyes, with the benefit that the camera produces a permanent record that can be analyzed by human or machine vision systems. This assumption needs to be tested by defining what a human observer can perceive and then comparing or benchmarking this against what a camera system can achieve. The camera system can be benchmarked on what a human can interpret from the captured images

J. Mar. Sci. Eng. 2016, 4, 17; doi:10.3390/jmse4010017 www.mdpi.com/journal/jmse

(machine capture + human interpretation) or against an automated image analysis system (machine capture + machine interpretation).

This study looks to provide an initial comparison of the visual acuity of two snorkelers and a modern camera system as applied to the manta-tow sampling method, a common method for observing shallow water coral reefs [7]. The manta-tow method involves an observer on snorkel being towed on the surface behind a small vessel and assessing the condition of the benthos or reef over a series of two-minute tows. At the end of the two minutes a summary of the reef just seen is recorded on an underwater slate before the next two-minute tow is started. This method is quick and simple but requires at least three people (snorkeler, boat driver and safety observer) and does not create a permanent record of what is seen, apart from the summary record. There are other coral reef and shallow water visual assessment methods, such as line intercept transects
[8] and fish visual surveys [9–11], that are also amenable to being replaced by autonomous platforms using still or video imaging systems [2,6,7]. Note that this was just an initial study using one site, at one time, with two observers; it is hoped to extend this work to other sites and water types as field opportunities allow.

2. Experimental Section

The manta-tow method uses a small floating wooden board, approximately 70 cm × 50 cm, which a snorkeler holds onto as they get towed along the surface around the reef. The board holds a sheet of underwater paper for recording what they observe. For this study an existing manta tow board was re-purposed by attaching a Sony® A7r 36 mega-pixel full frame camera (Tokyo, Japan) in a Nauticam® NA-A7 underwater housing (Hong Kong, China) with a Nikon UW-Nikkor® 20 mm f2.8 lens (Tokyo, Japan) to the lower side of the board to give a view of the bottom equivalent to what a snorkeler would see.
The underwater camera housing was adapted to allow for external power and camera control. On the upper side of the board a series of sensors was mounted, including a sonar altimeter (Tritech® PA500, Aberdeen, Scotland), a roll/tilt sensor (LORD MicroStrain® 3DM-GX4-25, Williston, VT, USA) and a GPS unit, all controlled by a Raspberry Pi® microcomputer (Caldecote, UK) located in a waterproof instrument pod. Figure 1 shows the set-up and the location of the main components.

Figure 1. Modified manta board showing camera, sensors and instrumentation pod.



The manta board rig allowed the collection of position data via GPS (using a small aerial that remained above the water during tows), distance to the bottom via a sonar altimeter, and roll/tilt/acceleration data, all synchronized by the GPS time signal. The camera was synced to the other data by photographing another GPS unit and then correcting the internal clock of the camera to the common GPS time. This allowed the precise location, altitude (equivalent to bottom depth), speed, direction, and orientation of the camera to be recorded for each image taken. The manta board rig was controlled by the snorkeler to be just under the surface with the camera facing directly down. A number of visual targets were printed on A3 paper (297 × 420 mm), laminated and weighted down to stay on the bottom. Two 1 m steel rulers were used to give distance measures. A small depth sensor (Schlumberger® CTD Diver, Houston, TX, USA) was attached to one of the targets to give a separate measure of depth.
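The clock-synchronization step can be sketched as a simple offset correction: photographing a GPS unit gives one pair of (GPS time, camera time) values, and the difference is then applied to every image timestamp. The sketch below is illustrative only; the function names and example values are assumptions, not from the study.

```python
from datetime import datetime, timedelta

def clock_offset(gps_time: datetime, camera_time: datetime) -> timedelta:
    """Offset measured once, from a photograph of a GPS unit's display:
    the GPS time shown in the frame minus the camera's own timestamp."""
    return gps_time - camera_time

def to_gps_time(camera_time: datetime, offset: timedelta) -> datetime:
    """Correct any image timestamp onto the common GPS time base."""
    return camera_time + offset

# Illustrative values: the camera clock runs 2.5 s behind GPS time.
offset = clock_offset(datetime(2015, 11, 30, 10, 0, 2, 500000),
                      datetime(2015, 11, 30, 10, 0, 0))
corrected = to_gps_time(datetime(2015, 11, 30, 10, 5, 0), offset)
```

With the corrected timestamps, each image can be matched to the GPS, altimeter and roll/tilt records logged on the same time base.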
The visual targets included A3-sized print-outs of the standard Snellen eye chart [12], a color chart for white balance adjustments, and print-outs with varying-sized lines to check what level of detail could be seen by the snorkelers and the camera. The visual targets are shown in Figure 2.

Figure 2. Visual targets used for calculating (left to right) resolving power, color balance and visual acuity (Snellen chart) (not to scale; originals were printed at A3 (297 mm × 420 mm)).

The visual targets were positioned in approximately four-meter deep water in a sandy area off Wistari Reef in the southern part of the Great Barrier Reef, Australia. The two metal rulers were arranged in a "T" shape at the start of a flat sandy area, with the other visual targets arranged along a line over which the snorkelers were towed (see Figure 3). Conditions for the sampling were excellent, with clear skies, calm seas and very clear water. The area chosen consisted of a flat sandy region with a depth of around 4–5 m in a sheltered back reef area. All runs were done over a period of about 30 min, mid-morning, at the same location with effectively the same conditions.
The snorkelers were towed at 2 knots over the targets (which they had not previously seen) three times: once to determine visual acuity using the Snellen chart, the second to read off the metal ruler markings, and the third to look at resolving power using the left-most chart in Figure 2. Each observer was only given one pass per target to determine what they could observe, to ensure that they did not have prior knowledge of what they were seeing. For each observer run the camera was fired once a second along the tow to capture the same information. The best images were then used for the comparison; in particular, images with the targets at the center, where lenses typically perform best [13], were selected, although the uniformity of the sampling conditions meant that the images were all of similar quality (sharpness, detail and lighting).





Figure 3. Image showing the two 1 m steel rulers (right) and the color chart (left).

The camera images were recorded as raw files (as Sony® ARW files, Tokyo, Japan) and converted to Adobe® Digital Negative (DNG) files using the Adobe® DNG Converter (version 9.2, San Jose, CA, USA); from there they were imported into Adobe Lightroom® (version 4.4, San Jose, CA, USA) and processed in Adobe Photoshop® (version CS5.1, San Jose, CA, USA). For all of the field images presented here the details are: Sony A7r camera, Nikon 20 mm f2.8 UW-Nikkor lens, depth 4.3 m, exposure f8.0, ISO 100, shutter 1/320 s, white balance 7900 K, camera speed 2 knots moving right to left.

3. Results

3.1. Visual Field of View

The measured field of view from the camera system with a 20 mm lens on a full-frame sensor in 4.3 m of water, from Figure 3, was 16.5 pixels/cm, giving a field of view of 4.46 m horizontally and 2.98 m vertically. This equates to an angular field of view of 54 degrees and contrasts with a theoretical in-air measurement for the same equipment of 7.74 m × 5.16 m [14]. This represents a 66% reduction in field of view or, conversely, a multiplier of three times. The field of view for the snorkelers was measured by getting them, while in the water with their eyes below surface level, to extend their arms until they could just see their hands, and measuring this angle using a camera from above. Note the snorkelers wore their standard masks while doing the tests, as the mask limits peripheral vision. For the two snorkelers tested, the horizontal angular field of view was 80° for the first observer and 86° for the second (Figure 4), indicating that the mask severely limited peripheral vision ("normal" peripheral vision is over 130° [15]); the differences between the two mask values were most likely due to differing mask design. Using an angular field of view of 85 degrees, for a depth of 4.3 m, gives a snorkeler field of view of around 7.8 m horizontally by 5.8 m vertically against the camera value of 4.46 m × 2.98 m. This means the snorkelers were seeing an area of 45 m² versus 13 m² for the camera, or over three times the area.
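The field-of-view figures above follow from flat-bottom geometry: a viewer with angular field of view θ at height d above the bottom sees a swath of width 2·d·tan(θ/2). A minimal sketch reproducing the numbers reported in this section (small rounding differences against the measured values are expected):

```python
import math

def swath_width(fov_deg: float, distance_m: float) -> float:
    """Linear field of view on a flat bottom for a given angular FOV."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg / 2.0))

DEPTH = 4.3                                  # m, altitude above the targets

snorkeler_w = swath_width(85.0, DEPTH)       # ~7.9 m (text reports ~7.8 m)
camera_w = swath_width(54.0, DEPTH)          # ~4.4 m (measured: 4.46 m)

# Area comparison using the fields of view reported in the text.
snorkeler_area = 7.8 * 5.8                   # ~45 m^2
camera_area = 4.46 * 2.98                    # ~13 m^2
ratio = snorkeler_area / camera_area         # over three times the area
```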

Figure 4. Images showing head shots of snorkelers with their arms extended, indicating the side limits of their field of view. From this the angle of vision can be measured, as indicated by the lines overlaid on the images, and hence the field of view for a set depth.

3.2. Visual Acuity

A standard Snellen eye chart [12] (right-most part of Figure 2) was used to record visual acuity. The Snellen chart measures visual acuity by comparing what a particular person can discern at a set distance from the eye chart to what a person with normal vision can discern: 20/20 vision is where a normal person would be able to discern the letters/shapes at a distance of 20 ft (6 m) from the eye chart, and a score of 20/15 means that a person is able to discern at 20 ft what a person with normal eyesight would be able to see at 15 ft. Lower values for the second number indicate increased visual acuity (a score of 20/15 indicates visual acuity 25% better than normal).

For the two snorkelers tested (neither of whom wears glasses or other visual aids for long-distance vision), both observers had better than 20/20 vision in air and could comfortably read down to 20/15 at a distance of 20 ft or 6 m from the chart.
Using a distance to the chart of 4.3 m (the same as the water depth of the field data), both could see down to 20/13 in air, with one person being able to read down to 20/10, which is better than "average"; at this distance normal vision would be 20/15 (15 ft is approximately 4.5 m, close to the field sampling depth of 4.3 m).

In water, both snorkelers were able to see down to 20/8 in 4.3 m of water, which is better than their in-air vision, an improvement of around 53%. At the same distance/depth, the best image of the same chart from the camera is shown in Figure 5. With the image enlarged to 100% the acuity of the camera is around 20/15 (top line of the right-hand side of the left image in Figure 6) and, with mild sharpening, color balancing and contrast enhancement applied (right image in Figure 6), some parts of the 20/13 line are readable. As a result, the acuity of the camera is at best 20/13 and more
As a result, the acuity of the camera is at best 20/13 and more likely likely around 20/15; this is significantly worse than that of the snorkelers (who both scored 20/8 in around 20/15; this is significantly worse than that of the snorkelers (who both scored 20/8 in water). water). For a distance of 4.3 m, normal vision in air would be approximately 20/15. Given the magnifying properties of water (around 130% [16]) this should increase for the snorkelers to around 20/11 underwater. As both snorkelers had better than normal in-air vision, at 20/13 for a 4.3 m distance, then, using a 1.33 magnification, this would put their in-water vision at around 20/9. Their actual in-water acuity was 20/8 or close to their theoretical score. The camera would therefore need to score 20/13 or better in water to match a human observer with normal vision and 20/8 or better to match the observers used in this study; at 20/15 it is well below what the humans achieved. Note that the score of 20/15 for the camera is for a human interpreting the camera image, if the image was to be analyzed by image recognition software then realistically the 20/20 line (bottom of the left side of the left image in Figure6) would be the smallest text that would be discernible.



Figure 5. Image from the camera showing the two Snellen eye charts in 4.3 m of water.



Figure 6. A 100% enlargement showing (left) that the top line of the right page is just readable, which represents a visual acuity of 20/15, and (right) the same image with contrast enhancement and mild sharpening, which shows some legibility at the 20/13 line (second from top, right-hand side).

3.3. Resolution

Using an image with a range of lines, spaces, text and crosses (see left image in Figure 2), the snorkelers were asked to record what they could see. The results from both snorkelers were the same: text was readable down to a 16 point size (5.6 mm high) and lines down to 2 mm in thickness and 2 mm of separation. From the camera, the same chart is shown in Figure 7 (at 100% resolution). The text is readable down to 24 points, the text at 16 points less so, and the actual text resolving power for the camera is probably around a 20–22 point font versus 16 points for the snorkelers. For black lines on a white background, the camera did as well as the snorkelers, seeing lines 2 mm wide spaced 2 mm apart.
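The camera's difficulty with small text is consistent with a simple pixel-budget check: at the measured scale of 16.5 pixels/cm, the 5.6 mm tall 16-point letters span only about nine pixels, marginal for recognizing letter shapes, while a high-contrast 2 mm line still covers about three pixels. An illustrative calculation using the figures reported above:

```python
PIXELS_PER_CM = 16.5   # measured image scale at 4.3 m depth (Section 3.1)

def pixels_for(size_mm: float) -> float:
    """Number of sensor pixels spanned by a feature of the given size."""
    return size_mm / 10.0 * PIXELS_PER_CM

text_16pt = pixels_for(5.6)   # ~9.2 px: marginal for letter shapes
line_2mm = pixels_for(2.0)    # ~3.3 px: enough for a high-contrast line
```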



Figure 7. Image target showing an array of lines, text and shapes, at 100% enlargement.

The image of the rulers was also analyzed. With black markings on a grey steel background, the camera performance was less than that of the human observers. The human observers were able to identify the inch scale markings (text approximately 5 mm high) on the ruler from a depth of 4.3 m, while the camera had trouble, requiring contrast enhancement and sharpening (using Adobe® Photoshop®: contrast slider set to 15, "Unsharp Mask" filter set to 100% with a radius of 3.1 pixels) to make the lower scale somewhat readable (Figure 8).

Figure 8. A 200% enlargement of the rulers showing ruler markings; the enlargement was taken from the center of the image where lens performance should be maximized.

3.4. Color Balance

Using image manipulation software (Adobe Lightroom®), the image in Figure 9 was color-adjusted to reflect the same image target on land. The camera recorded images with a white balance of 7900 K; to give a balance that better reflected the actual image target, a white balance of 15,250 K was required.


Figure 9. Close-up of the color chart in 4.3 m of water: the left image is straight out of the camera with a white balance of 7900 K; the right image is corrected to the in-air chart colors with a white balance of 15,250 K. The original color chart is shown in Figure 2.

4. Discussion

4.1. Visual Acuity

The results show that snorkelers, for the same depth, visual target and equivalent field of view, have significantly better visual performance than the digital camera tested. The camera set-up used is considered to be one of the best currently available [17,18], with the Nikkor lens being one of the few designed to work "wet", meaning without a dome and its associated additional air/water interface [17–20]. In general, the snorkelers were able to perceive detail at twice the resolution of the camera and, for the same water depth, had a field of view three times that of the camera. This was particularly so for text, where the human observers were able to recognize letter shapes even if they were only partially defined. In comparisons based just on shapes and lines, the human observers were still more effective than the camera, but the difference was less pronounced. Under ideal conditions, such as black lines on a white background, the camera and human results were similar; under non-ideal conditions, such as the black markings on a grey steel ruler, the humans exceeded the camera. The results need to be interpreted, however, with an understanding of how the human eye works.
eye works.Unlike aUnlike camera, a camera, the resolving the resolving power ofpower the eye of the is not eye uniform is not uniform spatially, spatially, but rather but the rather central the central part of partthe retina of the (the retina fovea) (the has fovea) a higher has resolvinga higher powerresolving than power the edge than of the the edge retina of [15 the]. Forretina the [15]. experiments For the experimentsconducted in conducted this study, thein this snorkel study, observers the snorkel would observers have automatically would have moved automatically their eyes moved to place their the eyestarget to (the place underwater the target charts) (the underwater in the center charts) of their in visionthe center (even of if their they vision kept their (even head if they still) kept and their thus headat the still) point and of highestthus at the resolving point of power, highest while resolving the camera power, will while resolve the camera all parts will of resolve the image all parts equally. of Whatthe image this meansequally. is What that the this human means eye is that may the be human very good eye at may identifying be very good known at identifying or anticipated known targets, or anticipatedas the study targets, showed as that the itstudy outperformed showed that the it camera, outperformed but may the be camera, less good but at may scanning be less an good entire at scanningscene or identifying an entire scene items or at identifying the periphery. items Unfortunately at the periphery. it was Unfortunately not possible it to was stop not the possible observers to stopfrom the automatically observers from looking automatically at the targets looking so it was at the difficult targets to quantifyso it was the difficult resolving to quantify power of the an “off-target”resolving power area. of an “off‐target” area. The study was limited to one time, one area, a small number of observers and, in particular, to one water and bottom type. 
Changing the optical properties of the water column (such as increased turbidity), lighting (such as overcast or dawn/dusk) and bottom type (bottom topography and complexity) may change the performance of both the camera and human observers. The study was conducted under ideal conditions (high light, clear water, uniform substrate, low waves); it would be interesting to look at how light and turbidity change visual acuity, given the ability of the human eye to function in low light (which may favor the human observers) contrasted with the ability to post-process images to extract detail (which may favor the camera) [15].
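The acuity scores reported above can be converted to physical detail sizes using the standard convention that 20/20 acuity resolves detail subtending one arcminute, so a 20/x score resolves x/20 arcminutes. A quick sketch using the 20/8 (snorkeler) and 20/15 (camera) scores measured at 4.3 m in this study (the helper function itself is illustrative):

```python
import math

def min_resolvable_mm(snellen_denominator, distance_m):
    """Smallest feature (mm) resolvable at `distance_m` for a 20/x Snellen score.

    20/20 vision resolves strokes subtending 1 arcminute;
    a 20/x score scales that angle by x/20.
    """
    arcmin = snellen_denominator / 20.0               # resolving angle in arcminutes
    angle_rad = math.radians(arcmin / 60.0)           # arcmin -> degrees -> radians
    return distance_m * math.tan(angle_rad) * 1000.0  # metres -> millimetres

human  = min_resolvable_mm(8,  4.3)   # snorkelers: 20/8 at 4.3 m -> ~0.5 mm detail
camera = min_resolvable_mm(15, 4.3)   # camera rig: 20/15 at 4.3 m -> ~0.94 mm detail
ratio  = camera / human               # ~1.9, i.e. "almost double" the acuity
```

At these small angles the tangent is effectively linear, so the ratio reduces to 15/8, matching the roughly two-fold resolution advantage of the observers.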

4.2. Real World Use

Images taken over a nearby reef area (20 m from the test site, taken at the same time, see Figure 10) were visually analyzed by the authors for what information they could provide. Visually, the main benthic types could easily be discerned, as could the growth forms of the corals. From this and similar images it would be possible to identify the main benthic form (e.g., sand, coral, rubble, etc.), the growth form of any coral (branching, encrusting, digitate, etc.) and, for some easily identified taxa, some level of taxonomic information (such as the genus Acropora evident in the lower right and left of Figure 10).

Figure 10. Image showing details of coral growth at a depth of 2.7 m.

4.3. Implications for Autonomous Monitoring

Humans are particularly good at target recognition, partially due to the way the eye scans to automatically position the area of highest resolving power over the target and partially because human brains are powerful image recognition engines [21]. So, for applications that involve the identification of targets, a human observer will potentially outperform a machine vision system. For passive observing, where the observer is analyzing an entire scene, camera systems can provide enough information for an equivalent analysis to be done, such as determining the main benthic forms from Figure 10. This means that the imaging system must be designed around the specific goal of the observing task rather than as a general purpose imaging platform.

The work showed that, for the study conditions, the camera needs to be closer to the subject to record the same level of detail as our eyes can perceive. This can be done optically, but given turbidity and light issues, this will normally mean having the camera physically closer. For the conditions encountered in this study (shallow clear water), the camera needed to be 50% closer to record the same level of detail; that is, the acuity of a person in 5 m of water was matched by a camera 2–3 m from the same target.

The need to be closer to the objects being observed means that camera systems will have a much smaller field of view, and so the autonomous platform will need to do more samples/runs in order to cover the same area as a human observer; this will take more time and negate some of the potential efficiencies that autonomous platforms bring. The other implication of needing to be closer to the bottom is that issues of collision avoidance, underwater navigation and positioning become critical. Running a platform at 2 m above a bottom with complex topology is a far easier task than doing the same at 1 m or less. This means that advances in underwater navigation are a critical component of operationalizing underwater autonomous platforms.
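The altitude/coverage trade-off can be made concrete: with a fixed angular field of view, swath width scales linearly with height above the bottom, so operating 50% closer roughly doubles the number of parallel passes needed to cover the same area. A sketch using the camera swath of 4.46 m at 4.3 m reported in this study (the 100 m survey strip and the no-overlap assumption are illustrative):

```python
import math

def passes_to_cover(area_width_m, swath_at_ref_m, ref_height_m, height_m):
    """Parallel passes needed to image a strip `area_width_m` wide.

    Swath width is assumed proportional to height above the bottom
    (fixed angular field of view); overlap between passes is ignored.
    """
    swath = swath_at_ref_m * (height_m / ref_height_m)
    return math.ceil(area_width_m / swath)

# Camera swath of 4.46 m at 4.3 m altitude, as measured in this study.
at_survey_height = passes_to_cover(100.0, 4.46, 4.3, 4.3)   # running at 4.3 m
at_half_height   = passes_to_cover(100.0, 4.46, 4.3, 2.15)  # 50% closer for human-level detail
```

Halving the altitude doubles the pass count (here from 23 to 45 passes over a 100 m strip), which is the "lower and longer" operating regime discussed below.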

The final implication is the need to move away from general purpose imaging systems based on consumer-grade equipment to purpose-built systems based on machine vision systems. This includes high resolution sensors, quality optics, use of raw or non-compressed image and video formats, use of stills over video where possible, and the need to implement lighting that maximizes the quality of the images and video collected. The goal is to produce images of archival quality that are amenable to future automated image analysis systems. There are cameras that meet these needs, but these are typically used in industrial applications (such as quality control in manufacturing), and so adapting these types of technologies to environmental monitoring is an important next step in developing autonomous underwater imaging platforms.

5. Conclusions

This short study showed that currently available cameras and lenses cannot match a human observer for shallow water coral reef observations where the goal of the program is to identify particular targets (for example, a particular type of coral or organism such as a crown-of-thorns starfish). In this scenario, trained observers tend to automatically use the highest resolving part of their eyes (by actively moving their head and eyes to scan a scene), allowing them not only to have a greater optical acuity but, by scanning, to also have a greater field of view. This increased field of view and visual acuity, coupled with the ability of the human mind to recognize shapes and objects [21], puts the human observer ahead for this type of observing.

Where the observing goal involves more passive observation, such as estimating the cover of benthic forms where the entire scene needs to be scanned and analyzed, cameras should be able to match what humans can do. The ability of a camera to create a permanent record is an important advantage, allowing change detection by comparing images taken at different times, along with allowing future analysis of particular targets or benthic types.

The implication for autonomous platforms is that, to match human capability, the imaging systems need to be optimized in terms of the sensors, lenses, lighting, image formats, processing and deployment. There is a need to utilize advances in machine vision systems over consumer-grade equipment, with the explicit goal of producing images suitable for automated image analysis to accommodate future advances in this area. For studies looking to identify particular targets (such as pest species), cameras will need to be located closer to the subject, at about half the distance a person would need, with a corresponding reduction in field of view.
The camera needs to record images in raw format to allow color correction and sharpening to be done in post-processing and to ensure that compression artifacts, found in JPEG and other compressed formats, do not interfere with automated image analysis routines. Positioning cameras closer to complex terrain, such as that found on a coral reef, requires more complex navigation and collision avoidance capabilities than are needed if the system is able to operate further away from the target. Reducing the height also reduces the field of view, requiring more passes over an area to capture the entire scene. Autonomous platforms will therefore need to operate lower and longer, and so these requirements need to be part of the design specifications for platforms under development.

The limited nature of this study (one site and only two observers) highlights the need to better understand human and imaging system performance under a range of other conditions, especially in turbid water and low light. It also highlights that the type of observing (active target recognition or passive scene analysis) will change the relative strengths of human and camera-based observing. A full understanding of the environment and goals of the observing needs to be obtained first to inform the specifications of the imaging system deployed. There is a need to move away from generic consumer-based solutions to targeted systems using industrial vision equipment tailored to deliver against the project outcomes.

A full understanding of the imaging requirements and corresponding limitations is needed in order to develop platform mission profiles that will deliver the required set of outcomes. This in turn will need to link into the fundamental platform design to ensure the platform can support the operational imaging requirements. These linkages, between platform design, mission planning and image outcomes, need to be fully understood so that new platforms are able to support the operational capabilities of the current imaging systems while delivering the required science outcomes.

Acknowledgments: The authors would like to thank Shaun Hahn and Ray Boyes of AIMS for help in conducting the field work, as well as the staff of the Heron Island Research Station, run by the University of Queensland, for their help. All trademarks mentioned are the property of their respective owners. The authors also acknowledge the valuable input of the two reviewers and thank them for their help.

Author Contributions: Scott Bainbridge designed the study, oversaw the field work and analyzed the results. Scott Gardner designed and built the equipment rig, selected the test site and designed and conducted the field sampling. The other snorkelers who did the field work have been identified in the Acknowledgements. The manuscript was written by Scott Bainbridge with editorial input from Scott Gardner.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Boavida, J.; Assis, J.; Reed, J.; Serrão, E.A.; Gonçalves, J.M. Comparison of small remotely operated vehicles and diver-operated video of circalittoral benthos. Hydrobiologia 2016, 766, 247–260. [CrossRef]
2. Singh, W.; Örnólfsdóttir, E.B.; Stefansson, G. A camera-based autonomous underwater vehicle sampling approach to quantify scallop abundance. J. Shellfish Res. 2013, 32, 725–732. [CrossRef]
3. Desa, E.; Madhan, R.; Maurya, P. Potential of autonomous underwater vehicles as new generation ocean data platforms. Curr. Sci. 2006, 90, 1202–1209.
4. Patterson, M.; Relles, N. Autonomous underwater vehicles resurvey Bonaire: A new tool for coral reef management. In Proceedings of the 11th International Coral Reef Symposium, Ft. Lauderdale, FL, USA, 7–11 July 2008; pp. 539–543.
5. Armstrong, R.A.; Singh, H.; Torres, J.; Nemeth, R.S.; Can, A.; Roman, C.; Eustice, R.; Riggs, L.; Garcia-Moliner, G. Characterizing the deep insular shelf coral reef habitat of the Hind Bank marine conservation district (US Virgin Islands) using the Seabed autonomous underwater vehicle. Cont. Shelf Res. 2006, 26, 194–205. [CrossRef]
6. Kocak, D.M.; Dalgleish, F.R.; Caimi, F.M.; Schechner, Y.Y. A focus on recent developments and trends in underwater imaging. Mar. Technol. Soc. J. 2008, 42, 52–67. [CrossRef]
7. Moran, P.J.; Johnson, D.; Miller-Smith, B.; Mundy, C.; Bass, D.; Davidson, J.; Miller, I.; Thompson, A. A Guide to the AIMS Manta Tow Technique; Australian Institute of Marine Science (AIMS): Townsville, Australia, 1989.
8. Long, B.; Andrews, G.; Wang, Y.-G.; Suharsono. Sampling accuracy of reef resource inventory technique. Coral Reefs 2004, 23, 378–385. [CrossRef]
9. Mallet, D.; Wantiez, L.; Lemouellic, S.; Vigliola, L.; Pelletier, D. Complementarity of rotating video and underwater visual census for assessing species richness, frequency and density of reef fish on coral reef slopes. PLoS ONE 2014, 9, e84344.
10. Pelletier, D.; Leleu, K.; Mou-Tham, G.; Guillemot, N.; Chabanet, P. Comparison of visual census and high definition video transects for monitoring coral reef fish assemblages. Fish. Res. 2011, 107, 84–93. [CrossRef]
11. Harvey, E.; Fletcher, D.; Shortis, M.R.; Kendrick, G.A. A comparison of underwater visual distance estimates made by scuba divers and a stereo-video system: Implications for underwater visual census of reef fish abundance. Mar. Freshw. Res. 2004, 55, 573–580. [CrossRef]
12. Sloan, L.L. Measurement of visual acuity: A critical review. Arch. Ophthalmol. 1951, 45, 704–725. [CrossRef] [PubMed]
13. Schulz, J. Geometric optics and strategies for subsea imaging. Subsea Optics Imaging 2013. [CrossRef]
14. Lyons, M. Photography Calculators. Available online: http://www.tawbaware.com/maxlyons/calc.htm (accessed on 5 November 2015).
15. Cambridge in Colour. Cameras versus the Human Eye. Available online: http://www.cambridgeincolour.com/tutorials/cameras-vs-human-eye.htm (accessed on 5 November 2015).

16. Ross, H.E.; Nawaz, S. Why do objects appear enlarged under water? Arquivos Brasileiros de Oftalmologia 2003, 66, 69–76. [CrossRef]
17. Connick, J. Nauticam NA-A7/Sony Review: Mirrorless in Mexico. Available online: http://diveadvisor.com/sub2o/nauticam-na-a7-sony-review-mirrorless-in-mexico# (accessed on 6 January 2016).
18. White, B. Nikonos Lenses on a ? Available online: http://www.backscatter.com/learn/article/article.php?ID=94 (accessed on 6 January 2016).
19. Rorslett, B. Lenses for Nikonos (I–V) Mount. Available online: http://www.naturfotograf.com/lens_nikonos.html (accessed on 6 January 2016).
20. Church, J. Jim Church's Essential Guide to Nikonos Systems; Aqua Quest Publications, Inc.: Locust Valley, NY, USA, 1994.
21. Oliva, A.; Torralba, A. The role of context in object recognition. Trends Cogn. Sci. 2007, 11, 520–527. [CrossRef] [PubMed]

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).