
Tracking the Motion of Box Jellyfish


Tobias Kjellberg, Magnus Oskarsson, Tobias Palmér, Kalle Åström
Centre for Mathematical Sciences, Lund University, Sweden
{magnuso, tobiasp, kalle}@maths.lth.se

Abstract—In this paper we investigate a system for tracking the motion of the box jellyfish Tripedalia cystophora in a special test setup. The goal is to measure the motor response of the animal given certain visual stimuli. The approach is based on tracking the special sensory structures – the rhopalia – of the box jellyfish from high-speed video sequences. We have focused on a real-time system built from simple building blocks. However, using a combination of simple intensity-based detection and model-based tracking we achieve promising tracking results, with up to 95% accuracy.

Fig. 1. Left: The box jellyfish Tripedalia cystophora is only a couple of mm large and almost completely transparent. Right: A close-up of the rhopalia from one frame recorded in the experimental setup.

I. INTRODUCTION

Box jellyfish, or cubomedusae, have very special visual systems [1], [2]. The visual system is based on four identical sensory structures, called rhopalia. Each rhopalium carries six different eyes: the upper and lower lens eyes, the pit eyes, and the slit eyes [3]–[8]. The lens eyes have image-forming optics that resemble vertebrate and cephalopod eyes [9]–[11]. Three of the eyes on each rhopalium look upwards and three look downwards; the downward-looking eyes are also directed inwards towards the bell, so that the animal in effect looks through its own transparent bell. This unique visual system enables the box jellyfish to display visually guided behaviours that appear remarkable for such "simple" animals.

The role of vision in box jellyfish is known to involve phototaxis, obstacle avoidance, control of swim-pulse rate and advanced visually guided behaviour [12]–[14]. Most known species of box jellyfish are found in shallow-water habitats where obstacles are abundant [15]. Medusae of the studied species, T. cystophora, live between the prop roots in Caribbean mangrove swamps [16], [17]. They stay close to the surface [16], where they prey on a phototactic copepod that gathers in high densities in the light shafts formed by openings in the mangrove canopy. The jellyfish uses its visual system to detect these light shafts, but it cannot see the copepods themselves. The medusae are not found in the open lagoons, where they risk starvation [12].

The overall goal of this project is to learn more about the visual system and the neural processes involved. One interesting problem is the study of the connection between visual stimuli presented to the animals and the resulting motor response. However, tracking free-swimming jellyfish poses a very demanding tracking problem. Instead we look at a special experimental setup where the animals are tethered whilst they are subjected to different light stimuli. The goal is then to track how the animals direct themselves, i.e. how they would like to move. An initial goal, which we discuss in this paper, is how to track the four rhopalia. These appear as four dark discs situated on the perimeter of the bell, see figure 2. There have been a number of previous studies on motor response from controlled visual stimuli, see e.g. [18], [19].

Fig. 2. Example input frames from a number of different sequences. Notice the high variance in lighting conditions. In some frames the rhopalia are barely discernible, and in many frames there are structures with an appearance very similar to the rhopalia.

II. EXPERIMENTAL SETUP

In this study we used in total 33 animals, with sizes ranging from 0.43 to 0.89 cm. The animals were tethered by the top of the bell during the experiments, using a glass pipette with gentle suction, and placed in a Plexiglas tank with inside dimensions of 5 × 5 × 5 cm. The vertical walls of the tank were covered with diffusing paper and a neutral density filter. Each vertical wall was illuminated from the outside by four blue-green LEDs. The diffuser was used to make a plane light source, while the neutral density filter was used to increase the contrast between lit and dark panels; switching one or more panels off was used as the behavioural trigger. The colour of the LEDs matched the maximum spectral sensitivity of the animals, with a peak emission at 500 nm. During the experiments a box was placed over the set-up in order to eliminate visual cues coming from outside.

Image sequences were recorded with a high-speed camera operated at 150 frames per second. The dataset consists of 15 video sequences, each with around 100 greyscale frames with pixel values between 0 (black) and 255 (white). Each frame has a resolution of 800 × 864 pixels. Each sequence shows a different jellyfish under different light conditions and with different artefacts, which forces the solution to be a fairly generic detection algorithm. Depending on how the jellyfish is set up, the light and shadows form differently, which makes the video sequences different from each other, see figure 2. Even though great care has been taken to minimize artefacts in the film sequences, the differences between them can be quite large. Some of the film sequences are brighter, making it easier to find the rhopalia, while some are darker, making it hard to distinguish the rhopalia from the background. The tethering of the animal is also visible in all sequences; this tether shadow causes problems when the jellyfish is contracting and a rhopalium moves over it. Another artefact, visible in all film sequences, is a smudge on the camera lens that resembles an elongated rhopalium. Since the box jellyfish is moving in every video sequence, some parts of the jellyfish move in and out of focus. The physical nature of the jellyfish, i.e. being transparent, also affects the appearance and causes refraction of the surroundings. The rhopalia in each frame of each video sequence have been manually annotated in order to perform evaluation and testing of the algorithms.


III. SYSTEM OVERVIEW

In this section we describe our system. The focus is on a real-time system built from simple building blocks. For every frame i, a first detection step – which only uses the local intensity distribution around a point – produces a large number of detections, Xd_i. These points are then clustered into a set of points Xc_i, in order to remove multiple detections as well as to improve the positional accuracy of the detected points. Finally these clustered positions are sent to the tracking step, which also gets as input the previous frame's four detected points, Xt_{i-1}. The output is the four detected points Xt_i. In the following subsections we describe the different steps in more detail.

Fig. 3. An overview of the system. For each input frame I_i we run the detection algorithm. This produces a number of tentative points Xd_i. This set of points is then sent to the clustering algorithm, which outputs a smaller number of refined positions Xc_i. These points are fed into the tracking algorithm, alongside the four point positions from the previous frame, Xt_{i-1}. The final output is then the four detected points Xt_i.

A. Detection

The rhopalia appear as dark discs on a brighter background, and are quite consistent in size and appearance in the images. See figure 4 for a close-up view. One idea could be to create an algorithm that checks that each pixel inside a circle is darker than a specific value and, in the same manner, that pixels outside the circle are brighter than a certain value. This would, however, probably result in a time-consuming algorithm with poor accuracy: there are about 500 pixels inside such a circle that would need to be checked, and fixed threshold values would not account for the film sequences with darker or brighter light settings. In the sequences with difficult light settings many false positives emerge as well, which forces the detection algorithm to have more breadth, finding not only the rhopalia but false positives too. The goal is therefore a detection method that minimizes the false positives but still manages to find all the rhopalia, and does so within reasonable time, i.e. minimizes the amount of data to check.

For this reason we have tested a number of template-based approaches for detection. The template is based on the assumption that we have a number of pixels near the rhopalia that are dark, while further outside the rhopalia we should have pixels that are brighter. Figure 4 shows some example templates that we have tried. For speed we have adopted quite sparse templates. The top row shows pixels that should be inside the rhopalia, and hence should be darker; the bottom row shows the pixels that are assumed to be outside the rhopalia. For each point X_i(j) in the input image I_i we can then define a number of inside and outside points, Ω_in(j) and Ω_out(j). Examples of these point sets can be seen in figure 4.

We have looked at two types of measures, one absolute and one relative. For the absolute measure we define a threshold for the inner pixels, t_in, and one for the outer pixels, t_out. We then count the number of inside and outside pixels that fulfil the constraints, i.e.

N_abs(j) = Σ Γ(Ω_in(j) ≤ t_in) + Σ Γ(Ω_out(j) ≥ t_out),   (1)

where Γ(x) = 1 if x is true and zero otherwise, and

Xd_i = { X_i(j) | N_abs(j) > N_det },   (2)

where N_det is some bound.

For the relative measure we randomly compare n inside and outside pixels, and count how many of the inside pixels are darker than the outside pixels. If we let R(Ω) denote a function that randomly chooses a point from the set Ω, we have

N_rel(j) = Σ_{k=1}^{n} Γ( R(Ω_in(j)) < R(Ω_out(j)) ),   (3)

and

Xd_i = { X_i(j) | N_rel(j) > N_det }.   (4)

We have evaluated our whole system in order to find templates that generate enough possible detections, i.e. that in most cases generate at least the four correct points, but that do not generate excessive amounts of false positives. In figure 5 a typical detection result is shown. Since there are many rhopalia-like structures in the images we get a large number of false positives, but these will be eliminated in the following clustering and tracking steps.
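To make the detection measures concrete, the following sketch (Python with NumPy, not the authors' implementation) shows one way the absolute measure of equations (1)–(2) and the relative measure of equation (3) could be evaluated for a sparse template. The template offsets OMEGA_IN and OMEGA_OUT, the thresholds t_in and t_out, the grid step, the bound n_det and the synthetic test frame are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sparse template: pixel offsets (dy, dx) relative to a candidate
# point. The "inside" offsets sample the dark rhopalium disc and the "outside"
# offsets sample the brighter surroundings (cf. figure 4).
OMEGA_IN = np.array([(0, 0), (-3, 0), (3, 0), (0, -3), (0, 3)])
OMEGA_OUT = np.array([(-9, 0), (9, 0), (0, -9), (0, 9)])

def n_abs(image, y, x, t_in=60, t_out=120):
    """Absolute measure, eq. (1): inside pixels darker than t_in plus
    outside pixels brighter than t_out (threshold values are assumptions)."""
    inside = image[y + OMEGA_IN[:, 0], x + OMEGA_IN[:, 1]]
    outside = image[y + OMEGA_OUT[:, 0], x + OMEGA_OUT[:, 1]]
    return np.sum(inside <= t_in) + np.sum(outside >= t_out)

def n_rel(image, y, x, n=20, rng=np.random.default_rng(0)):
    """Relative measure, eq. (3): count how many of n randomly drawn inside
    pixels are darker than randomly drawn outside pixels."""
    inside = image[y + OMEGA_IN[:, 0], x + OMEGA_IN[:, 1]]
    outside = image[y + OMEGA_OUT[:, 0], x + OMEGA_OUT[:, 1]]
    return np.sum(rng.choice(inside, n) < rng.choice(outside, n))

def detect(image, n_det=6, step=4, margin=10):
    """Eq. (2): keep every grid point whose score exceeds the bound N_det."""
    h, w = image.shape
    xd = [(y, x)
          for y in range(margin, h - margin, step)
          for x in range(margin, w - margin, step)
          if n_abs(image, y, x) > n_det]
    return np.array(xd)

if __name__ == "__main__":
    frame = np.full((100, 100), 200, dtype=np.uint8)  # bright background
    frame[40:50, 40:50] = 30                          # dark rhopalium-like disc
    print(detect(frame))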
Fig. 4. A number of templates for the detection step. The top row shows pixels that should be inside the rhopalia, and hence should be darker; the bottom row shows the pixels that are assumed to be outside the rhopalia (inside patterns 1–3 and outside patterns 1–4).

Fig. 5. A typical detection result. Since there are many rhopalia-like structures in the images we get a large number of false positives, but these will be eliminated in the following clustering and tracking steps.


B. Clustering

We take a scale-space approach for clustering the detections: by smoothing the input image I_i isotropically with a Gaussian kernel we get a low-scale version I_sm. We then find all local minima X_loc of I_sm. The reason for this is the appearance of the rhopalia as dark spots in the images, so a rhopalium should present itself as a local minimum in the smoothed image. We then calculate how many detections we have within a vicinity of each X_loc,

N_loc(j) = Σ_{k=1}^{N_d} Γ( ||X_loc(j) − Xd_i(k)||_2 < ε_cluster ),   (5)

and if there is a minimum number N_min of detections, we add this local minimum to our clustered points Xc_i,

Xc_i = { X_loc(j) | N_loc(j) ≥ N_min }.   (6)

In our implementation a local minimum is marked as a cluster if there are at least three detections within five pixels of it, i.e. N_min = 3 and ε_cluster = 5. This gives a fast, accurate and quite robust way of clustering the detections. See figure 6b for an example result of the clustering.
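As a rough illustration of the scale-space clustering in equations (5)–(6), the sketch below (Python with NumPy/SciPy; not the authors' code) smooths a frame with a Gaussian kernel, extracts the local minima of the smoothed image, and keeps the minima supported by enough nearby detections. The smoothing scale sigma is an assumed value; eps_cluster = 5 and n_min = 3 follow the five-pixel, three-detection rule described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, minimum_filter

def cluster_detections(image, detections, sigma=5.0, eps_cluster=5.0, n_min=3):
    """Scale-space clustering, eqs. (5)-(6).

    image:      greyscale frame I_i as a 2-D array
    detections: (N_d, 2) array of detected (row, col) points Xd_i
    Returns the clustered points Xc_i as an (M, 2) array.
    """
    # Low-scale version I_sm of the frame (sigma is an assumed scale).
    i_sm = gaussian_filter(image.astype(float), sigma)
    # A pixel is a local minimum if it equals the minimum of its 3x3 neighbourhood.
    x_loc = np.argwhere(i_sm == minimum_filter(i_sm, size=3))
    clusters = []
    for p in x_loc:
        # Eq. (5): number of detections within eps_cluster pixels of this minimum.
        n_loc = np.sum(np.linalg.norm(detections - p, axis=1) < eps_cluster)
        if n_loc >= n_min:          # eq. (6): keep only well-supported minima
            clusters.append(p)
    return np.array(clusters)
```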

C. Tracking

For the tracking step we have looked at a number of simple algorithms. The input consists of the four points from the previous frame, Xt_{i−1}, and a number of possible candidate points Xc_i. In the first frame of each sequence the tracker is initialised with the ground-truth positions. In our model we could also look at the scenario where the general statistical distribution of the four points is given in some form. Arguably the simplest tracking is to choose as new points the four candidate points closest to the points from the previous frame, i.e.

Xt_i(j) = argmin_{Xc_i(k)} ||Xt_{i−1}(j) − Xc_i(k)||_2,   j = 1, ..., 4.   (7)

We have also looked at learning an affine shape model of the four points. In this case we can fix a canonical coordinate system by placing the first three points at (0, 0), (0, 1) and (1, 0). The fourth point will then be placed at some point X_a. Using a large number of ground-truth positions we can estimate the mean X_m and covariance matrix Σ for X_a. This gives us a way of finding the four points that statistically most resemble a four-point configuration, given affine deformations. For all possible subsets of four points Y_i of Xc_i we change coordinate system so that the first three points are at (0, 0), (0, 1) and (1, 0) and the fourth point is at Y_a. If there are n such subsets, we find the best subset by

k_opt = argmin_k (Y_a(k) − X_m)^T Σ^{−1} (Y_a(k) − X_m),   (8)

and

Xt_i = Y_i(k_opt).   (9)

For both equation (7) and equation (8) we can choose Xt_i = Xt_{i−1} if the optimal value is too large, i.e. if the best clustered points are too far away from the previous frame's points (in our implementation, more than 20 pixels) we keep the previous frame's point positions for the new frame. A weakness of this scheme is that when no detections are found the tracked positions remain static; the rhopalia may then eventually move back over the static coordinates and be marked as found even though they were never detected. See figure 6c for an example result of the tracking. We have not yet focused on the tracking, and more complex motion models are of course possible and quite easy to implement into the framework, such as e.g. Kalman filters [20] or particle filters [21].
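The two tracking variants can be sketched as follows (Python/NumPy; a sketch under the assumption that the candidate set is small, not the authors' implementation). The first function is the nearest-neighbour rule of equation (7); the second evaluates the affine shape model of equations (8)–(9) by mapping the first three points of each candidate quadruple to the canonical coordinates (0, 0), (0, 1), (1, 0) and scoring the image of the fourth point against a mean x_m and inverse covariance sigma_inv, which are assumed to have been estimated offline from ground-truth configurations. The gating value max_cost is an assumption.

```python
import numpy as np
from itertools import permutations

def nearest_neighbour_track(prev_points, candidates):
    """Eq. (7): for each previous point pick the closest candidate.
    (Greedy; nothing prevents two previous points sharing a candidate.)"""
    dists = np.linalg.norm(candidates[None, :, :] - prev_points[:, None, :], axis=2)
    return candidates[np.argmin(dists, axis=1)]

def fourth_point_canonical(quad):
    """Map the first three points of a 4-point configuration to (0,0), (0,1),
    (1,0) and return the image Y_a of the fourth point under that affine map.
    Assumes the first three points are not collinear."""
    p0, p1, p2, p3 = quad
    basis = np.column_stack([p1 - p0, p2 - p0])      # columns map to (0,1), (1,0)
    target = np.array([[0.0, 1.0], [1.0, 0.0]])
    return target @ np.linalg.inv(basis) @ (p3 - p0)

def shape_model_track(candidates, x_m, sigma_inv, prev_points, max_cost=50.0):
    """Eqs. (8)-(9): choose the ordered 4-point subset of the candidates whose
    canonical fourth point is closest to x_m in the Mahalanobis sense.
    Falls back to the previous frame's points if the best cost is too large."""
    best, best_cost = None, np.inf
    for quad in permutations(range(len(candidates)), 4):
        y_a = fourth_point_canonical(candidates[list(quad)])
        r = y_a - x_m
        cost = float(r @ sigma_inv @ r)
        if cost < best_cost:
            best, best_cost = quad, cost
    if best is None or best_cost > max_cost:
        return prev_points
    return candidates[list(best)]
```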
IV. EXPERIMENTAL RESULTS

In total we have 1469 frames in our test setup, and we have manually marked the true coordinates of the rhopalia in each frame. We have tested our system on the video sequences and compared it to ground truth in the following way. For the detection accuracy we count, for each of the four rhopalia in an image, whether there are N or more detections within a circle with a radius of 10 pixels around the true coordinate; we have used N = 10 and N = 20. For the cluster accuracy we have counted the percentage of rhopalia that have a cluster within a circle of 10 pixels around the correct coordinate, and likewise for the tracking accuracy we count the percentage of rhopalia with a tracked point within 10 pixels. We do this for all 1469 frames, with four rhopalia in each frame. We have mainly tested different detection parameters and settings. In figure 7 all the resulting accuracy percentages can be seen. We see that the best performing systems have a tracking accuracy of 95%.
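The per-step accuracies in figure 7 are computed by this kind of counting. A minimal sketch of the tracking-accuracy measure (assuming each frame's tracked points and ground-truth annotations are given as (4, 2) NumPy arrays) might look as follows; the 10-pixel radius is the value used above.

```python
import numpy as np

def tracking_accuracy(tracked_per_frame, truth_per_frame, radius=10.0):
    """Percentage of annotated rhopalia with a tracked point within `radius`
    pixels of the true coordinate, accumulated over all frames (section IV)."""
    hits, total = 0, 0
    for tracked, truth in zip(tracked_per_frame, truth_per_frame):
        for gt in truth:                      # four rhopalia per frame
            total += 1
            if np.min(np.linalg.norm(tracked - gt, axis=1)) <= radius:
                hits += 1
    return 100.0 * hits / total
```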


Fig. 6. The output of the three steps of our system for an example input image. To the left all detected points are shown in red. These are fed into the clustering, which outputs the yellow clustered points shown in the middle. Finally the tracking outputs the four points to the right, depicted in green.

Fig. 7. Evaluation of a number of different detection parameters and settings. The accuracy for the different steps in the system is shown. The best performing systems have a tracking accuracy of 95%.

V. CONCLUSIONS

We have in this paper investigated how a system for detection of the special eyes – the rhopalia – of box jellyfish can be constructed. We have shown that using a low-level detection method in combination with clustering and tracking we can get very good performance on a varying dataset. The system can be run in real-time. The basic idea is that the system should be used in order to learn more about the visual system and neural processes of box jellyfish. The next step would be to use the tracking of the rhopalia to measure the motion of the whole bell of the jellyfish, in order to measure the motor response of the animal given certain visual stimuli.

REFERENCES

[1] E. W. Berger, "The histological structure of the eyes of cubomedusae," Journal of Comparative Neurology, vol. 8, no. 3, pp. 223–230, 1898.
[2] C. Claus, "Untersuchungen über Charybdea marsupialis," Arb. Zool. Inst. Wien, Hft, vol. 2, pp. 221–276, 1878.
[3] J. S. Pearse and V. B. Pearse, "Vision of cubomedusan jellyfishes," Science (New York, NY), vol. 199, no. 4327, pp. 458–458, 1978.
[4] G. Laska and M. Hündgen, "Morphologie und Ultrastruktur der Lichtsinnesorgane von Tripedalia cystophora Conant (Cnidaria, Cubozoa)," Zool Jb Anat, vol. 108, pp. 107–123, 1982.
[5] A. Garm, F. Andersson, and D.-E. Nilsson, "Unique structure and optics of the lesser eyes of the box jellyfish Tripedalia cystophora," Vision Research, vol. 48, no. 8, pp. 1061–1073, 2008.
[6] J. Piatigorsky and Z. Kozmik, "Cubozoan jellyfish: an evo/devo model for eyes and other sensory systems," Int J Dev Biol, vol. 48, no. 8-9, pp. 719–729, 2004.
[7] Z. Kozmik, S. K. Swamynathan, J. Ruzickova, K. Jonasova, V. Paces, C. Vlcek, and J. Piatigorsky, "Cubozoan crystallins: evidence for convergent evolution of pax regulatory sequences," Evolution & Development, vol. 10, no. 1, pp. 52–61, 2008.
[8] F. S. Conant, The Cubomedusæ: a memorial volume. The Johns Hopkins Press, 1898, vol. 4, no. 1.
[9] D.-E. Nilsson, L. Gislén, M. M. Coates, C. Skogh, and A. Garm, "Advanced optics in a jellyfish eye," Nature, vol. 435, no. 7039, pp. 201–205, 2005.
[10] V. J. Martin, "Photoreceptors of cubozoan jellyfish," in Coelenterate Biology 2003. Springer, 2004, pp. 135–144.
[11] M. Koyanagi, K. Takano, H. Tsukamoto, K. Ohtsu, F. Tokunaga, and A. Terakita, "Jellyfish vision starts with cAMP signaling mediated by opsin-Gs cascade," Proceedings of the National Academy of Sciences, vol. 105, no. 40, pp. 15576–15580, 2008.
[12] E. Buskey, "Behavioral adaptations of the cubozoan medusa Tripedalia cystophora for feeding on copepod () swarms," Marine Biology, vol. 142, no. 2, pp. 225–232, 2003.
[13] A. Garm and S. Mori, "Multiple photoreceptor systems control the swim pacemaker activity in box jellyfish," The Journal of Experimental Biology, vol. 212, no. 24, pp. 3951–3960, 2009.
[14] A. Garm, M. Oskarsson, and D.-E. Nilsson, "Box jellyfish use terrestrial visual cues for navigation," Current Biology, vol. 21, no. 9, pp. 798–803, 2011.
[15] M. M. Coates, "Visual ecology and functional morphology of Cubozoa (Cnidaria)," Integrative and Comparative Biology, vol. 43, no. 4, pp. 542–548, 2003.
[16] S. E. Stewart, "Field behavior of Tripedalia cystophora (class Cubozoa)," Marine & Freshwater Behaviour & Phy, vol. 27, no. 2-3, pp. 175–188, 1996.
[17] B. Werner, C. E. Cutress, and J. P. Studebaker, "Life cycle of Tripedalia cystophora Conant (Cubomedusae)," 1971.
[18] R. Petie, A. Garm, and D.-E. Nilsson, "Velarium control and visual steering in box jellyfish," Journal of Comparative Physiology A, vol. 199, no. 4, pp. 315–324, 2013.
[19] ——, "Contrast and rate of light intensity decrease control directional swimming in the box jellyfish Tripedalia cystophora (Cnidaria, Cubomedusae)," Hydrobiologia, vol. 703, no. 1, pp. 69–77, 2013.
[20] G. Welch and G. Bishop, "An introduction to the Kalman filter," 1995.
[21] N. J. Gordon, D. J. Salmond, and A. F. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 2. IET, 1993, pp. 107–113.