MASTER THESIS

Martin Šerý

Detection of dynamic Gabor patches in 1/f noise

Department of Software and Computer Science Education

Supervisor of the master thesis: Mgr. Děchtěrenko Filip, Ph.D.
Study programme: Computer Science
Study branch: Artificial Intelligence

Prague 2021

I declare that I carried out this master thesis independently, and only with the cited sources, literature and other professional sources. It has not been used to obtain another or the same degree. I understand that my work relates to the rights and obligations under the Act No. 121/2000 Sb., the Copyright Act, as amended, in particular the fact that Charles University has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 subsection 1 of the Copyright Act.

In ...... date ...... Author’s signature

I would like to thank my supervisor Mgr. Děchtěrenko Filip, Ph.D. for the idea and his help with this work. Next, I thank my girlfriend for the grammar check and the patience she has had with me during my work on this thesis.

Title: Detection of dynamic Gabor patches in 1/f noise

Author: Martin Šerý

Department: Department of Software and Computer Science Education

Supervisor: Mgr. Děchtěrenko Filip, Ph.D., Department of Software and Computer Science Education

Abstract: Research focusing on static scenes with static objects omits the time factor present in the real-life situations we are trying to study. Can we say that a lifeguard looking for a drowning person uses the same brain processes that were observed in the laboratory for static scenes? A static scene is clearly a major simplification of the task itself.

The aim of this thesis is to prepare a tool that allows researching dynamic scenes and thus broadens the range of visual detection tasks that can be studied.

Along with the tool we also present a couple of simplified examples that demonstrate its use, concluding with a final experiment in which we try to detect masked patterns in a noisy environment.

Keywords: detection, modeling, 1/f noise, Gabor patch

Contents

Introduction

1 Theoretical background
   1.1 Visual sensory system
   1.2 Object detection
   1.3 Camouflage
       1.3.1 Background blending
       1.3.2 Disruptive coloration
       1.3.3 Countershading
   1.4 Motion detection
       1.4.1 Human motion detection
       1.4.2 Computer motion detection
       1.4.3 Motion analysis
       1.4.4 Motion dazzle

2 Technical background
   2.1 Image properties
       2.1.1 Luminance
       2.1.2 Contrast
   2.2 Pink noise
   2.3 Gabor patch
       2.3.1 Sinusoidal grating
       2.3.2 Gaussian window
   2.4 Structural similarity index measure (SSIM)
       2.4.1 Luminance comparison
       2.4.2 Contrast comparison
       2.4.3 Structure comparison
       2.4.4 Combining function
       2.4.5 Complex wavelet SSIM - CW-SSIM

3 Methods
   3.1 Background description
       3.1.1 Linear interpolation
       3.1.2 3D pink noise
       3.1.3 Shifting pink noise
   3.2 Stimuli description
   3.3 Scene description
   3.4 Detection approach

4 Preliminary experiments
   4.1 Simple POC
   4.2 SSIM precision
   4.3 Dynamic stimuli
   4.4 Moving stimuli

5 Main experiment
   5.1 Methods
       5.1.1 Sliding window
       5.1.2 SSIM
       5.1.3 Frame difference
       5.1.4 Improved frame difference
       5.1.5 Heat maps
       5.1.6 Heat map evaluation
   5.2 Results
       5.2.1 SSIM
       5.2.2 Frame difference
       5.2.3 Recapitulation

Conclusion

Further research

Bibliography

List of Figures

Glossary

A Attachments
   A.1 Source code
       A.1.1 Experiments
       A.1.2 Final experiment
   A.2 Results

Introduction

Object detection is an everyday task for many animals and humans alike. Predators need to detect prey before they are even able to attack. Prey, on the other hand, needs to detect the predator before the attack comes. For humans these tasks are usually not a question of survival, but they are important nevertheless. Have you ever tried finding your car keys on a table? Was the table ever messy? The less order there is on the table and the more things there are, the harder the task. The human brain has to process a lot of information that is not relevant. This irrelevant information is called noise. In nature it is no different. Rarely is the prey presented to the predator on a simple background with no noise.

Natural scenes are never completely random. If they were, our sensory systems would have a hard time processing all the visual information. Luckily natural images are not random, and their properties allow us to design simplified experiments that resemble real-life situations rather well. One such example is 1/f noise, which shares many statistical properties with natural scenes. Using 1/f noise instead of real natural scenes is beneficial in laboratory conditions. We can remove unnecessary distractors that are present in natural scenes and observe the search task in a more controlled environment.

Another discovery we are using is the Gabor patch, named after the Hungarian scientist Dennis Gabor. The Gabor patch has properties similar to the simple cells in the visual cortex of mammalian brains.

However, there is one feature that not even real images share with natural scenes: real images are not dynamic, whereas nothing is ever completely static in nature. We probably will not be able to take the very same photo of the sea waves twice, because the sea itself is dynamic and moves. The waves move around in the sea, the leaves quiver in the wind and the water flows in the river. To better illustrate the problem we present a table with real-life examples.

Stimulus \ Environment | Dynamic                       | Static
Dynamic                | Hunting a prey                | Catching a mosquito
Static                 | Searching for a body in water | Locating keys

Hunting a prey is an example of a fully dynamic environment. The prey moves around and there are a lot of distractors present, especially if the prey is a part of a bigger group. Catching a mosquito in a room is an example of a dynamic target presented on a static background. The furniture in an apartment probably does not move around while we are trying to catch a mosquito that is flying around; in this case the mosquito is the dynamic stimulus and the furniture provides the static background. Water in the sea rolls around all the time, so it can hardly be considered a static background, whereas an unconscious body floating on the water surface is rather static. So searching for a body in the sea is an example of a static stimulus in a dynamic environment.

Keys on our messy table are not moving around, just like the rest of the noise around them. Therefore this is an example of a static stimulus in a static environment.

There are many studies exploring the functionality of the human brain in laboratory conditions without unwanted distractors. Najemnik and Geisler [2005]1 found that humans achieve nearly optimal performance while finding a static Gabor patch target in 1/f noise. Work by Sebastian et al. [2017] also measured the detection of a static stimulus in a static scene. All those experiments resemble the task of finding keys on a messy table. But how will the results change if we add motion to the scene and the stimulus?

Dorr et al. [2010] suggests that static images do not trigger the same eye movements as a moving scene. At the same time, there has been research that used a dynamic environment. Kimmig et al. [2008] used a simple dot as a stimulus on a black background to observe eye movement. The results of such an experiment could be different if a more natural stimulus and scene were used. Kristjánsson et al. [2009] studied search strategies used to detect a drifting Gabor patch among distractors. The search strategy can be altered in the presence of a dynamic background acting as a distractor.

We plan to explore the remaining three situations using pink noise and Gabor patches. We decided against using videos to create dynamic scenes because of Dorr et al. [2010], who concluded that professionally cut material is not very representative of natural viewing behavior. We design a method to dynamize pink noise as well as a Gabor patch.

In this work we introduce the problem of object detection. We explain the differences between real life and laboratory experiments. We present methods that are used in vision labs and the general idea behind them.

Goal of the work

The aim of this thesis is to bring laboratory conditions closer to real life while preserving control over the environment. To do so we introduce several methods to dynamize the 1/f noise and also Gabor patches. The new methods should provide a more realistic dynamic environment which has not been studied before. Such an environment could bring us closer to understanding the neural processes happening in the human brain.

1 Later followed by Geisler and Najemnik [2005].

1. Theoretical background

In the first chapter we introduce some background theory that ought to help us understand the motivation behind this work. We briefly introduce the visual sensory system. Then we present two complementary processes, object detection and camouflage. We compare object detection from the computer and from the human perspective. We include a section dedicated to an overview of camouflage systems in the animal kingdom. At the end of the chapter we talk about motion detection. We tackle the problem from both the computer and the human perspective and look at the human ability to perceive and analyze motion.

1.1 Visual sensory system

The visual sensory system is a multilayered sensory nervous system used to interpret the surrounding environment visually. The processed signal consists of light with different intensities and wavelengths. The organ responsible for processing the signal is the eye. The light enters the eye through the cornea and is focused by the lens onto the retina, a light-sensitive membrane at the back of the eye. The light is then processed by photoreceptive cells in the retina. There are two types of such cells in the retina - rods and cones (see Bear [2007]).

Rod cells are most sensitive to green wavelengths of light. The images generated by rod stimulation alone are relatively blurred and confined to shades of gray. Rod vision is commonly referred to as scotopic or twilight vision because in low light levels it enables individuals to distinguish shapes and the relative brightness of objects. Baylor et al. [1979] performed an experiment in which they managed to trigger a response of rod cells using only a single photon. On the other hand, rod cells are unable to assess an object's color.

Cones consist of three different types of cells. Each type of cell responds to a distinct wavelength peak, and every peak represents a different color. That way cones allow perceiving the environment through photopic vision, which is the true color vision. This vision is dominant at normal light levels, both indoors and outdoors.

Neural impulses produced by the photoreceptive cells are then transmitted by the optic nerve from the retina upstream to the optic chiasm in the brain. There the nerve signal is decussated1 and sent into the LGN. The signal from the LGN then travels into the visual cortex (see Bear [2007]).

The signal is then further processed in the visual cortex. Marčelja [1980] and later Daugman [1985] found that the receptive field profiles of the cells in the visual cortex are well described by members of the Gabor function family. That is why the Gabor patch is a widely utilized stimulus for testing in vision labs.

1 Information from the left eye goes to the right part of the brain and vice versa.

1.2 Object detection

The main function of the visual system is object detection. Object detection is a task that combines object localization and object classification. Object classification involves assigning a label to the found object. The process of finding the object to classify is called object localization.

State-of-the-art computer detectors use CNNs (Convolutional Neural Networks). Those detectors are able to perform very well on various object detection tasks. A big disadvantage of such models is that they usually work as a black box. Therefore we do not know how the model actually learns and performs the classification. Recent research by Geirhos et al. [2019] and Singh et al. [2020] shows that CNNs are often strongly biased towards recognising textures rather than shapes2. This behavior is in stark contrast to human behavioural evidence (Singh et al. [2020] and Elder and Velisavljevic [2009]). This means that current state-of-the-art object detectors implement fundamentally different classification strategies from humans. Sebastian et al. [2020] argues that if we take into account all of the dimensions of stimulus variation that occur under natural conditions when performing object detection, we are fairly far away from understanding all the underlying processes in our visual and cognitive systems.

There is no universal search strategy implemented by humans. Every human utilizes a different strategy that is often task specific. Azizi et al. [2016] found that playing action games can alter the search strategies of an individual.

Human object detection is also affected by stereopsis. Ponce and Born [2008] studied the effect of stereopsis on target detection. The research concluded that a stationary target perceived through one eye is invisible, but when perceived by both eyes the target's outline pops out from the background. Wardle et al. [2010] learned that targets and masks presented at different depths activate distinct populations of disparity-tuned neurons. Such activation led to the target being more apparent than a target masked by a mask at the same depth.

1.3 Camouflage

Camouflage is the process of concealing the texture of an object. Such concealment slows down or even disables the observer's ability to detect the object. Therefore camouflaging is a complementary process to object detection. Studying camouflage patterns and methods implemented by animals provides useful findings which help us understand the mammalian visual system and the detection process.

In nature there are many examples of camouflaging behavior. Castner and Nickle [1995] found bush crickets that mimic leaves. Leopards' spots make them hard to track. And there are many others. Most of those methods exploit the fact that the visual system is not perfect and is easily fooled.

There is a lot of discrepancy in the classification of camouflage techniques. Ruxton et al. [2004] mentions four major ones, whereas S. and Cott

2 Singh et al. [2020] suggests that the texture dependency tends to vary between datasets.

[1940] defines three categories that are principally different. Due to the aim of the thesis and its application in vision labs we have decided to separate camouflage into the three following categories.

• Background blending

• Disruptive coloration

• Countershading

Figure 1.1: Examples of camouflage in nature - background blending, disruptive coloration, and countershading

Same as Ruxton et al. [2004], we separate the techniques by the flaw of the visual system they try to exploit3. To preview each category see Fig.: 1.1. We explain the idea behind every technique in a separate section for each type of camouflage.

1.3.1 Background blending

Background blending is a technique where the concealed object gets a color that matches its surroundings. It is one of the most common techniques in the animal kingdom. Examples of such masking are also seen in various army uniforms.

Background blending by itself is not enough to properly conceal the object. Merilaita and Lind [2005] state that resemblance to the background is an important aspect of concealment, but at the same time a coloration matching only a random visual sample of the background is neither sufficient nor necessary to minimize the probability of detection. This is further researched in Schaefer and Stobbe [2006].

1.3.2 Disruptive coloration

Disruptive coloration is a method to break up the clear boundary at the edge of the masked object. Visual objects are recognized by their outlines (see Schmidtmann et al. [2015]). If the camouflage breaks the outlining boundary, the observer's visual system is tricked by the presented visual cues. Now the observer

3 Ruxton et al. [2004] adds counter-illumination, transparency and silvering. Those kinds of camouflage appear only in marine animals and are similar to the other techniques in terms of exploiting the visual system.

thinks that the object has a different shape and therefore has a harder time identifying it. This idea is supported by Elder and Velisavljevic [2009], who suggest that the fastest mechanisms underlying animal detection in natural scenes use shape as a principal discriminative cue. Stevens and Cuthill [2006] show that disruptive coloration exploits the edge detection algorithms used to model early visual processing in mammals.

Price et al. [2019] suggests that disruptive coloration may be an effective means of concealment in complex backgrounds, especially where there are many background patterns present. In a complex background the object would have to be able to switch between different patterns to blend in perfectly. Research on moth-like targets by Stevens et al. [2006] suggests that disruptive coloration reduces the necessity to match the background perfectly.

1.3.3 Countershading

Countershading is one of the most common visual characteristics of animals. As the name suggests, it is a pattern with darker pigmentation on those surfaces exposed to the light most. It is believed that this form of pigmentation reduces the detectability of animals (Rowland [2008]).

When perceiving 3D objects the brain automatically expects the top part of the object to be lighter thanks to the sunlight, and at the same time the bottom part of the object to be darker because it is shadowed by the upper part. Countershading works against this expectation, and instead of appearing three-dimensional the countershaded objects appear rather flat.

1.4 Motion detection

In contrast to object detection, motion detection is a task performed solely in dynamic scenes4. It is the task of detecting moving objects in a given dynamic scene. We use the general notion used when talking about motion detection: the moving object is denoted as the foreground and the static scene is the background. The task is then to determine which regions of a given scene frame5 are background and which ones are foreground.

1.4.1 Human motion detection

There is one specific area in the visual cortex designed to detect motion. Simoncelli and Heeger [1998] state that there are areas designed to detect orientation and direction, and areas designed to detect motion (both speed and direction).

The visual sensory system uses a lot of cues to detect motion, so not everything we perceive as motion involves movement in the real world. One such visual cue is the so-called speed line. Speed lines are lines that convey the impression of movement. Geisler [1999] and Burr and Ross [2002] found that humans actually perceive motion from speed lines.

Interestingly enough, binocular vision does not provide any significant help with detecting motion. McKee et al. [1997] argue that binocular vision does not greatly enhance motion detection in noise.

4 Videos or real-time captures for a computer, real-life situations for a human.
5 A frame is a snapshot in time of the given scene.

In their work Cass et al. [2011] learned that a quick flicker (not a motion) managed to catch human attention. This experiment suggests that humans use some sort of frame differencing method when perceiving the environment.

Wallach [1935]6 studied the perception of motion of parallel line segments7. From this study we learned that the perceived motion depends on the perceived contour of the object as well. In the literature this phenomenon is known as the aperture problem.

1.4.2 Computer motion detection

The simplest computer algorithms rely on static cameras and static scenes, where the only dynamic aspect in the video is the object that is being tracked. To detect motion in such situations we can use a simple frame difference algorithm. This algorithm treats the last8 frame as a background9 and performs a pixel-wise difference with the current frame. The areas where the difference is non-zero are then marked as foreground while the rest is background.

There are many approaches to improving the performance of simple background subtraction. Improvements are needed to tackle changing lighting conditions, high memory requirements and a generally very high sensitivity to noise. Many of those build on top of Gaussian mixture models10. A work by Chen et al. [2015] uses background differencing aided by SSIM.
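To make the frame difference idea concrete, below is a minimal sketch (assuming NumPy and grayscale frames with intensities in [0, 1], as used elsewhere in this thesis); the threshold value is an illustrative choice rather than a prescribed one.

```python
import numpy as np

def frame_difference(previous_frame, current_frame, threshold=0.05):
    """Mark pixels whose intensity changed by more than `threshold` as foreground.

    Both frames are 2D arrays of intensities in [0, 1]; the previous frame
    plays the role of the background model.
    """
    difference = np.abs(current_frame - previous_frame)
    # True where the scene changed noticeably (foreground), False elsewhere (background).
    return difference > threshold
```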

1.4.3 Motion analysis

Rarely do we want to perform motion detection only to learn that there has been some motion, without analyzing that motion further11. Motion analysis deals mostly with tracking and trajectory estimation. Whether we are talking about predators hunting prey, drivers avoiding other drivers on the road or catching a ball during a game, motion analysis is an everyday task.

Derrington and Suero [1991] suggest that the visual system computes the motion in several stages: first estimating the motions of the components, and then combining them to create a bigger picture. Current research into this field by Bill et al. [2020] and Gershman et al. [2016] suggests that there really is some sort of hierarchical structure to the perceived motion in the scene. Bill et al. [2020] state that a complex real-world dynamic scene is often decomposed by the visual system into simpler parts. Hughes et al. [2017] learned that using a specific moving pattern on perceived stimuli affects the observed perception of motion of the target itself. This only strengthens the initial results of Derrington and Suero [1991].

The hierarchical structure ought to help us with processing several signals at once and possibly filter some out. Humans are able to coherently group flying

6 Original in German, translated to English as Wuerger et al. [1996].
7 See the barber pole illusion.
8 Or we can take more than two frames and compute the background using an average or some other method - see Herrero-Jaraba et al. [2003].
9 If we know how the scene looks we can use background subtraction. The idea of this algorithm is the same.
10 E.g. Zivkovic [2004] or Chan et al. [2010].
11 Tasks without analysis are e.g. automatic doors or lights.

flocks of birds and perceive them as one object with one trajectory (Gershman et al. [2016]). This system probably prevents flooding the visual system with information.

Such a hierarchy does not always have to be useful. Generally speaking, motion breaks the camouflage of targets because it catches attention (Franconeri and Simons [2003]). But Hall et al. [2013] and Cave and Chen [2016] learned that if the target is surrounded by several similar targets, tracking becomes a lot harder and the camouflage slows down the process of identification.

1.4.4 Motion dazzle

Motion dazzle is a pattern utilized by many animals. But in contrast to camouflaging patterns, motion dazzle is not intrinsically a masking pattern. On the contrary, objects with motion dazzle patterning are easier to spot and distinguish. However, the pattern is believed to confuse the visual system and make the object very hard to track.

Spering et al. [2005] link contrast values with tracking ability. A target of low contrast makes it difficult for the pursuit system to reliably estimate the velocity of the target.

How and Zanker [2014] argue that the motion signals generated by the motion dazzle pattern could be a misleading source of information. This hypothesis is further strengthened by Hughes et al. [2017], whose research shows that moving stripes of Gabor patches can actually alter the expected motion of the target. Those results are aligned with the findings by Hall et al. [2013] and Cave and Chen [2016], and suggest that motion dazzle patterns create an illusion of several distinct features that make the animal hard to trail.

Having said that, the latest research by Hughes et al. [2019] suggests that motion dazzle might not be as effective a way of preventing tracking as was previously believed. Hughes et al. [2019] used genetic programming to evolve patterns that are hard to catch12. The results show that low contrast and featureless patterns offer the greatest protection against capture when in motion. This is the same conclusion as in Spering et al. [2005].

12 A simple game used to evaluate this experiment: http://dazzle-bug.co.uk/

2. Technical background

In this chapter we go through the technical background of the thesis. We cover terms that are related to the implementation of the proposed tool. At first we introduce our notion of an image. Then we talk about pink noise, why it is useful for us and how we generate it. The next section is dedicated to Gabor patches and our implementation of their generator. At the end of the chapter we present an analysis of the Structural Similarity Index Measure.

2.1 Image properties

In the tool an image is a 2D array of floating point numbers whose values range from 0 to 1. Every item in the array represents the luminance intensity of a single pixel of the image, so 0 means a black pixel and 1 means a white pixel. It is up to the output handler whether this array will be mapped to the 0-255 interval (8-bit) or some other representation1.

The range of the intensity array is called the dynamic range. The dynamic range tells us how many different shades there can be in the picture; 8-bit images can display 256 different shades of gray. Grimaldi et al. [2019] shows that after some basic early transforms in the eye the statistics of an image become virtually independent of the dynamic range.

The coordinate system of the image follows the window coordinate system used in computers. In this system the x coordinates grow from left to right and the y coordinates grow from top to bottom. We set the origin into the center of the image. The x, y coordinates range from −0.5 to 0.5. This means that the upper left corner of an image has coordinates [−0.5, −0.5], and the bottom right corner has coordinates [0.5, 0.5]. This convention helps us define several functions more easily.
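As an illustration of this coordinate convention, the following sketch (assuming NumPy) builds the x and y coordinate grids in the [−0.5, 0.5] range for a square image; formulas such as the Gaussian cutout introduced later are evaluated on grids of this form.

```python
import numpy as np

def coordinate_grid(size):
    """Return x and y coordinate arrays in [-0.5, 0.5] for a size x size image.

    x grows from left to right, y grows from top to bottom, and the origin
    lies in the center of the image.
    """
    coords = np.linspace(-0.5, 0.5, size)
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows
    return x, y
```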

2.1.1 Luminance

Luminance is a photometric measure of luminous intensity. In our case it describes the amount of light that is being emitted from a pixel. The SI unit for luminance is candela per square metre (cd/m²). But we are unable to work with this unit: the final luminance of every pixel depends on the hardware which is used to run the experiment2.

Fortunately we do not need to define any special units. Since we are dealing only with grayscale images, we say that the luminance of a pixel is its intensity. We always work with intensities of pixels in relation to other pixels; pixels with high intensity shine brighter than pixels with lower intensity. Using this notion of intensity we define the luminance of an image as the mean intensity of all its pixels:

$$L(\text{image}) = \frac{\sum_{p \in \text{image}} I_p}{\text{size}(\text{image})}$$

where $I_p$ is the intensity value of the pixel $p$.

1 Possibly even a different palette than black and white - see Fig.: 3.1.
2 Fine tuning the hardware is hard since the perceived brightness depends on a lot of factors - see Yang and Purves [2004].

2.1.2 Contrast

Contrast is the difference in luminance, or in our case intensity. It makes objects recognizable. When perceiving the real world humans use contrast to determine the difference between separate objects within the same field of view. Compared to absolute luminance, contrast provides better cues to the human visual system. That means that humans can perceive the world similarly regardless of the huge changes in illumination over the day or from place to place.

There are many ways to define contrast. Generally, luminance contrast can be represented as the ratio of the difference between the luminance of the target and the background to the luminance of the background. Fechner [2016] defines contrast as

$$\frac{I_t - I_b}{I_b}$$

where $I_t$ is the luminance of the target and $I_b$ is the luminance of the background. This fraction is called the Weber contrast. A similar notion is used by Michelson [1995], who defines contrast as

$$\frac{I_{max} - I_{min}}{I_{max} + I_{min}}$$

where $I_{max}$ and $I_{min}$ are the highest and lowest luminance values respectively.

Such definitions of contrast are not useful for our case because all the values used to compute them are either constant or of little variance. We opted for the standard definition of RMS contrast as described in Peli [1990]. The RMS contrast is the standard deviation of the intensities of all pixels in a given image:

$$RMS(\text{image}) = \sqrt{\frac{1}{\text{size}(\text{image})} \sum_{p \in \text{image}} \left(I_p - L(\text{image})\right)^2}$$

The RMS contrast is suitable for our case due to its sensitivity (see Kukkonen et al. [1993]).
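A minimal sketch of the two statistics defined above, assuming a NumPy array of intensities in [0, 1]: `luminance` computes L(image) as the mean intensity and `rms_contrast` computes the standard deviation around it.

```python
import numpy as np

def luminance(image):
    """Mean intensity of all pixels, i.e. L(image)."""
    return image.mean()

def rms_contrast(image):
    """Standard deviation of the pixel intensities around the mean luminance."""
    return float(np.sqrt(np.mean((image - luminance(image)) ** 2)))
```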

2.2 Pink noise

Natural scenes provide a lot of distractors. Such distractors can impose a bias on the results of laboratory experiments. For example, Castelhano and Heaven [2010] found that details in a natural scene improved guidance in search tasks. Reeder and Peelen [2013] learned that humans searching for familiar object categories in natural scenes are guided by spatially global search templates that consist of view-invariant shape representations of category-diagnostic object parts. And later Reeder et al. [2015] found that the attentional capture in such scenes is involuntary.

If we want to remove the effect of natural distractors we need to utilize a noise that does not contain complex features that could tamper with the results, but at the same time shares as much with natural scenes as possible. There are many different types of noise3.

3 Often named after the color of the visible light with the same power spectrum as the noise.

We focus mainly on 1/f noise, also called the pink noise. According to Szendro et al. [2001], pink noise is, thanks to its stochastic properties, one of the most common signals in many biological systems4. Its 2D form is a useful abstraction for human vision experiments as it shares many statistical properties with natural scenes. Field [1987] showed that the amplitude spectra of natural images fall off by a factor of 1/f, which is the same as for the pink noise5. Such noise does not contain recognizable objects or regions, but does contain complex random features - see Geisler [2008]. Complex features can catch the attention of a human observer and can affect the search task.

The term 1/f noise is often incorrectly used to describe any noise with a power spectral density of the form

$$S(f) \propto \frac{1}{f^{\alpha}}$$

where f is the frequency of the signal and 0 < α < 2. The correct way to denote those signals is 1/f^α. In many cases the α is omitted from the name, and therefore the potentially confusing term 1/f noise is used instead of the correct 1/f^α.

Compared to the pink noise, white noise has a constant power spectral density, and brown noise's6 spectral density is inversely proportional to f². Therefore α = 0 for white noise and α = 2 for brown noise. 1/f noise then lies somewhere in between the white noise and the brown noise (Halley [1996]), with white noise having no correlation in time and brown noise having no correlation among increments.

The brown noise is an integral of the white noise (Gardiner [2009]). The integration of a signal increases the exponent α by 2, whereas the inverse operation of differentiation decreases it by 2. Therefore, 1/f noise cannot be obtained by the simple procedure of integration or differentiation of the two noise signals. Moreover, there are no simple, even linear, stochastic differential equations generating signals with 1/f noise.

To generate pink noise we use the inverse Fourier transformation as described in Timmer and Koenig [1995]. The described algorithm starts with generating two values for each coordinate pair7. The generated values should be drawn from the Gaussian distribution and form a complex number: one of the generated values serves as the real part and the other as the imaginary part of the desired complex number.

To obtain these two values we use the Box-Muller transform (Box and Muller [1958]). The Box-Muller transform is a method for generating pairs of independent random values which follow the Gaussian distribution. The transform takes a source of uniformly distributed random numbers and transforms it into a series of pairs. The method performs a Fourier transformation of the uniformly generated input; the pairs are then the real and the imaginary values of the produced Fourier transformation.

In order to get the values following the Gaussian distribution needed to continue with generating the pink noise, we generate a white noise.

4 Also Halley [1996] mentions that pink noise might be the best null model of environment variation.
5 Tolhurst et al. [2007] later argued that natural images rather follow a different slope.
6 Sometimes also referred to as a random walk noise.
7 In the paper denoted as frequency - ω.

The noise is then transformed using a Fourier transformation. That way we get complex numbers whose real and imaginary parts are both random variables following the Gaussian distribution - see Box and Muller [1958]. The next step of the algorithm by Timmer and Koenig [1995] is multiplying the Fourier transformation by the power law spectrum. The power law spectrum is of the form

$$S(\omega_i) \sim \sqrt{\left(\frac{1}{\omega}\right)^{\beta}}$$

where ω is the distance of the pixel from the center (see 2.1) of the resulting image:

$$\left(\frac{1}{\omega}\right)^{\beta} = \left(\frac{1}{\sqrt{x^2 + y^2}}\right)^{\beta}$$

and after substituting β = 1 we get the final equation:

$$\frac{1}{\omega} = \frac{1}{\sqrt{x^2 + y^2}}$$

The resulting multiplication is the Fourier transformation of the desired pink noise. In order to get the pink noise we have to perform a backward Fourier transformation, which shifts us from the frequency domain back to the coordinate domain. We used this method to generate all the base pink noise.
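The following sketch follows the generation procedure outlined above - Gaussian noise in the frequency domain, scaled by the power-law filter, followed by an inverse Fourier transform. It assumes NumPy; the final rescaling of the output into [0, 1] is an illustrative choice rather than part of the algorithm of Timmer and Koenig [1995].

```python
import numpy as np

def pink_noise(size, beta=1.0, rng=None):
    """Generate a size x size patch of 1/f^beta noise via the inverse FFT (beta = 1 gives pink noise)."""
    rng = np.random.default_rng() if rng is None else rng

    # White Gaussian noise in the frequency domain (real and imaginary parts).
    spectrum = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))

    # omega: distance of each frequency component from the zero-frequency point.
    fx = np.fft.fftfreq(size)
    fy = np.fft.fftfreq(size)
    omega = np.sqrt(fx[None, :] ** 2 + fy[:, None] ** 2)
    omega[0, 0] = np.inf  # avoid division by zero at the DC component

    # Multiply by the power-law filter sqrt((1/omega)^beta) and transform back
    # from the frequency domain to the image domain.
    spectrum *= (1.0 / omega) ** (beta / 2.0)
    noise = np.real(np.fft.ifft2(spectrum))

    # Rescale into the [0, 1] intensity range used for images in the tool.
    return (noise - noise.min()) / (noise.max() - noise.min())
```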

2.3 Gabor patch

The idea behind Gabor patches comes from Gabor filters (Gabor [1946]). Gabor patches are simple stimuli that drive early visual activity in a controlled fashion. The patches look like a series of parallel black and white bars. The bars can be oriented any way, and they can be made easily discernible or difficult to see, sparse or dense. This variety makes them very convenient for our case.

A Gabor patch is constructed using a specific Gabor function. Generally speaking, a Gabor function is a normalized product of a Gaussian function and a complex sinusoid. For our use case we did not need a complex sinusoid; we worked with a real sinusoid. The Gabor function for real values is expressed as

$$g_{\gamma}(t) = K(\gamma)\, e^{-\pi\left(\frac{t-u}{s}\right)^2} \cos(\omega(t - u) + \theta)$$

where γ = {u, s, ω, θ}. The K(γ) is the normalizing factor; in the literature it is used to normalize to ∥g(t)∥ = 1. The $e^{-\pi\left(\frac{t-u}{s}\right)^2}$ term is the Gaussian function, and cos(ω(t − u) + θ) is the real sinusoid.

The Gabor patch is a 2D representation of a Gabor function. It is a composition of a sinusoidal grating, which creates the bars, the Gaussian cutout, which softly cuts the grating into the round shape we desire, and the normalization. We go through every part separately and explain the math behind it in the following sections.

2.3.1 Sinusoidal grating

A sinusoidal grating is a sinusoidal wave transferred onto a plane. We are able to alter the phase, orientation and frequency of the said wave. We want our sinusoidal function to take in two parameters x, y - the coordinates of pixel intensities in the output patch. We also want our sinusoidal function to be of the form cos(ω(t − u) + θ) to match the general Gabor function definition.

The Gabor function's grating takes 4 parameters. We omit the offset u: we are only dealing with a fixed coordinate system, so we have no need for this parameter and set u = 0. The frequency ω handles the density of the bars. The phase denotes a shift of the bars in the orthogonal direction. Those are the varying parameters that help us create the dynamic environment.

The last parameter is t. In the general case that is time. In our case it is not time in the common sense, but rather a state in which the function currently is. We create a direct mapping of coordinates to the said state. The mapping function m(x, y) = t arranges all pixels, giving them an order on the timeline of the function. That order depends on the desired orientation of the Gabor patch.

The function m weights the coordinates depending on the defined orientation θ: m(x, y) = x cos(θ) + y sin(θ). The gradient of the order (given by m) is orthogonal to the final bars of the Gabor patch. To get vertical stripes we choose a low θ, which gives low weight to the y coordinate so that only the x coordinate is taken into account; therefore all pixels are ordered by their x coordinates. Our final grating function is then:

grating(θ, ω, ϕ)(x, y) = sin(ω(x cos(θ) + y sin(θ)) + ϕ)

2.3.2 Gaussian window

The window is a simple 2D Gaussian function used to restrict the shape of the grating. In the experiment we want to use round patches with no hard edges. To accomplish that we created an envelope8 which suppresses the extremes closer to the edge of the final patch. There are many possible approaches to computing such an envelope.

The most straightforward is nullifying the intensity of pixels that are too far away from the center of the patch. This is called the box or Dirichlet window:

$$\hat{I}_{xy} = \begin{cases} 0 & \text{if } \sqrt{x^2 + y^2} \geq T \\ I_{xy} & \text{otherwise} \end{cases}$$

where $I_{xy}$ is the intensity of the pixel (x, y) before applying the envelope function, $\sqrt{x^2 + y^2}$ is the distance of the pixel from the center of the patch (see 2.1), and T is a threshold parameter defining the size of our patch. This approach produces hard edges that make the patch stand out from the scene. We need an envelope that provides some sort of fade out around T.

8 The terms envelope and window can be, and usually are, used interchangeably.

To create the fade out effect we can use a linear function. The patch fades out linearly with the ratio of the pixel distance from the origin to the size of the patch (T). This is called the triangular or Bartlett window:

$$\hat{I}_{xy} = \begin{cases} 0 & \text{if } \sqrt{x^2 + y^2} \geq T \\ I_{xy}\left(1 - \frac{\sqrt{x^2 + y^2}}{T}\right) & \text{otherwise} \end{cases}$$

This approach creates the desired fade out effect, but at the same time the center of the image is affected a lot. We would like to have the center of the patch mostly unchanged and only the edges blurred out.

In signal processing there are many windows that can provide a smooth fade out at the edges while preserving the center of the patch unchanged. There are the Hanning window, the Tukey window, the Kaiser window, and many more. For an overview of window functions see Prabhu [2014]. We decided to use the Gaussian cutout, which is a part of the definition of the Gabor function. The intensity change is a function of distance from the center: pixels that are at the center of the patch remain mostly unchanged and the intensity is lowered the further we are from the center of the patch. The Gaussian function in the Gabor function is defined as follows:

$$e^{-\pi\left(\frac{t-u}{s}\right)^2}$$

We want to construct a function of the form c(x, y) that takes in two parameters x, y - the coordinates of a pixel intensity in the output patch - and returns the value of the Gaussian cutout. Same as for the grating, we omitted the offset parameter u. Now we only want to create a mapping m(x, y) = t where t is the distance from the center of the Gabor patch. Unlike with the grating we do not have to worry about the orientation, because the distance is invariant to rotation9. The final mapping has the form $m(x, y) = \sqrt{x^2 + y^2}$, and our final cutout function is then:

$$\text{cutout}_{\sigma}(x, y) = e^{-\pi\frac{x^2+y^2}{\sigma^2}} = \exp\left(-\pi\,\frac{x^2 + y^2}{\sigma^2}\right)$$

In order to speed up the computation and make the code a bit cleaner we fit all constants into the σ parameter. That gives us an even more simplified version which can be found in the code:

$$\text{cutout}_{\hat{\sigma}}(x, y) = e^{\frac{x^2+y^2}{\hat{\sigma}}} = \exp\left(\frac{x^2 + y^2}{\hat{\sigma}}\right), \quad \text{where } \hat{\sigma} = -\frac{\sigma^2}{\pi}$$

Thanks to the declining intensity of the pixels it is hard to measure the real size of the Gabor patch. The patch has fuzzy edges which may be considered a part of the patch by some and excluded by others. In our work we try to get as close to the desired size as possible without creating hard edges in the final scene.

9 Rotation around the center of the scene (see 2.1).

Normalization

In the end the whole output is normalized using the standard normalization method.
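Putting the grating, the Gaussian cutout and the normalization together, below is a minimal sketch of a Gabor patch generator, assuming NumPy and the [−0.5, 0.5] coordinate convention of Section 2.1; the rescaling of the result into [−1, 1] is an illustrative choice for the final normalization step.

```python
import numpy as np

def gabor_patch(size, theta, omega, phi, sigma):
    """Sinusoidal grating restricted by a Gaussian window, normalized to [-1, 1].

    theta is the orientation, omega the frequency, phi the phase and sigma
    the width of the Gaussian cutout.
    """
    coords = np.linspace(-0.5, 0.5, size)
    x, y = np.meshgrid(coords, coords)

    # Sinusoidal grating: pixels are ordered along the direction given by theta.
    grating = np.sin(omega * (x * np.cos(theta) + y * np.sin(theta)) + phi)

    # Gaussian cutout: softly fades the grating out with distance from the center.
    cutout = np.exp(-np.pi * (x ** 2 + y ** 2) / sigma ** 2)

    patch = grating * cutout
    # Normalize so that the extreme values span the [-1, 1] stimulus range.
    return patch / np.abs(patch).max()
```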

2.4 Structural similarity index measure (SSIM)

The Structural Similarity Index Measure10 is a way to measure the similarity of two given images. According to Lin et al. [2013], SSIM is said to be correlated with human detection performance11. This knowledge makes SSIM a very useful similarity measure in vision labs.

SSIM can be used to track objects (Fu et al. [2019]12). Chen et al. [2015] uses SSIM to cancel out noise when computing frame differences. Song and Geng [2010] showed that weighted SSIM13 can be a useful tool to evaluate and possibly design camouflaged textures. Those examples show that SSIM is a versatile method with many possible applications.

The original SSIM proposed by Wang et al. [2004] separates the task of measuring image similarity into three comparisons that are then combined together.

$$S(x, y) = f(l(x, y), c(x, y), s(x, y))$$

where f is the combining function which computes one similarity measure based on the three separate components. The inner parts of the combining function consist of the luminance comparison (l(x, y)), the contrast comparison (c(x, y)), and the structure comparison (s(x, y)).

The intention behind separating the similarity into three independent parts is that every part by itself represents one aspect in which the images can be similar. To demonstrate, let us say that we proportionally change the luminance of one of the inputs. We can say that the structure of the image was not affected14 by this change.

In order to get a well-behaved function we need the similarity function S to satisfy several axioms15:

• Symmetry - S(x, y) = S(y, x)

• Boundedness - S(x, y) ≤ 1

• Unique maximum - S(x, y) = 1 if and only if x = y

The first axiom ensures that two images always have the same similarity measure no matter the order of the arguments. Boundedness tells us that the SSIM function has a maximum of 1; there are no two images whose similarity is more

10 The term measure is used here in its intuitive form. It does not refer to the mathematical definition of measure and SSIM does not follow the measure axioms.
11 Lin et al. [2013] showed this as a by-product of testing a new metric to measure similarity.
12 Along with the usual Euclidean distance.
13 Weighted SSIM is a slightly different form of SSIM than the one we utilized, but the principle of computing similarity remains the same.
14 Provided we did not change the luminance in a drastic way - e.g. nullifying the whole image, or darkening just one color.
15 It is important not to confuse the definition of the SSIM function with a metric. Metrics follow different axioms.

than 1. The final axiom ensures that the maximum similarity of 1 is reached only when the two pictures are identical.

In order to finalize the definition of the similarity measure function S, we first define the three functions l(x, y), c(x, y) and s(x, y), where each function represents a separate part of the comparison - luminance, contrast and structural similarity respectively. Lastly we define the combining function f(·) as well.

2.4.1 Luminance comparison

Wang et al. [2004] defines the luminance comparison as follows:

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

Here $\mu_x$ and $\mu_y$ are the mean luminances of the first and second image respectively. The constant $C_1$ is introduced to ensure stability for values of $\mu_x^2 + \mu_y^2$ close to zero. The constants are defined as $C_i = (K_i L)^2$, where $K_i \ll 1$ is an arbitrary number. In our tool we used $K_1 = 0.01$, just as in the original paper on SSIM. L is the dynamic range of the pixels; in our case that is 1.

Thanks to the symmetry of the used operations we can see that the symmetry axiom holds. The second axiom also holds16. And finally, the luminance measure is 1 only if both input luminances are the same. Therefore the luminance comparison does not break any of the required properties.

It is also worth noting that the proposed function l is qualitatively consistent with Weber's law, which states that relative changes in luminance are more apparent than absolute luminance changes.

2.4.2 Contrast comparison

The contrast comparison follows a similar formula as the luminance comparison:

$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

The σs are the RMS contrasts of the two inputs. The stabilizing constant is defined in the same manner as in the luminance comparison; the only difference is the use of a different K. We need to choose a slightly bigger K since the variance is expected to be lower, thus leading to bigger instability issues. In our tool we are using $K_2 = 0.03$, just as in the original paper by Wang et al. [2004]. All three axioms of SSIM still hold; the same reasoning as for luminance is used to show that.

2.4.3 Structure comparison

We say that the image structure is invariant to the absolute values of luminance and variance. Therefore we normalize the inputs into two unit vectors $\frac{x - \mu_x}{\sigma_x}$ and $\frac{y - \mu_y}{\sigma_y}$. Now we compare the structural similarity as the correlation of the said vectors. The correlation of x and y is the same as the correlation of $x - \mu_x$ and

16 To prove that, we can use the formula for squaring binomials.

$y - \mu_y$ respectively. Using this equality we can arrive at the final definition of the structure comparison:

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

Here $\sigma_{xy}$ is the covariance of the inputs, and the contrasts (σ) remain the same as above. The stabilizing constant is defined as $C_3 = C_2/2$.

This similarity comparison does not take small transformations into account. If we try to measure the structural similarity of an image and its slight rotation17, the proposed SSIM does not report very high similarity. This is something we look into later on.

2.4.4 Combining function

Finally, the combining function takes all three comparison functions and combines them proportionally according to the relative importance of each property.

$$SSIM(x, y) = l(x, y)^{\alpha} \cdot c(x, y)^{\beta} \cdot s(x, y)^{\gamma}$$

In our work we do not set α = β = γ = 1 as in the original SSIM paper18. Instead we try to estimate α, β and γ based on observations.
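A sketch of the three comparisons and the combining function defined above, assuming NumPy, a dynamic range L = 1 and the constants K1 = 0.01 and K2 = 0.03 used in this thesis; the exponents default to 1 but can be tuned as described.

```python
import numpy as np

K1, K2, L = 0.01, 0.03, 1.0
C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
C3 = C2 / 2

def ssim(x, y, alpha=1.0, beta=1.0, gamma=1.0):
    """Global SSIM of two equally sized grayscale images with intensities in [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = np.mean((x - mu_x) * (y - mu_y))  # covariance of the two images

    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)              # luminance comparison
    c = (2 * sigma_x * sigma_y + C2) / (sigma_x ** 2 + sigma_y ** 2 + C2)  # contrast comparison
    s = (sigma_xy + C3) / (sigma_x * sigma_y + C3)                         # structure comparison

    return l ** alpha * c ** beta * s ** gamma
```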

2.4.5 Complex wavelet SSIM - CW-SSIM

The disadvantage of SSIM is that it does not perform well on scaled, translated or rotated images, even though these are not structural distortions. It is due to the used structural similarity comparison, which does not take such transformations into account and rather compares the respective pixels directly.

CW-SSIM is an improved method that takes small transformations in images into account. It uses the same approach as the base SSIM: it compares three distinctive properties of the input images. The luminance and contrast comparisons remain the same, and instead of the structural similarity based on correlation it uses complex wavelet transforms of the images to cancel out magnitude changes and impose a higher penalty on inconsistent phase distortions.

We utilized the method used in Sebastian et al. [2017]. The method uses a phase-invariant similarity based on the Fourier transformation. It is defined as the cosine of the angle between the Fourier amplitude spectra of both images:

$$s(x, y) = \frac{A_x \cdot A_y}{\|A_x\|\,\|A_y\|}$$

$A_i$ is the Fourier amplitude spectrum of image i. The amplitude spectrum is obtained by taking the complex absolute value of the Fourier transformation of the given image.
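A minimal sketch of this phase-invariant comparison, assuming NumPy: the amplitude spectra are the absolute values of the 2D Fourier transforms of the images, and the similarity is the cosine of the angle between them treated as vectors.

```python
import numpy as np

def phase_invariant_similarity(x, y):
    """Cosine of the angle between the Fourier amplitude spectra of two images."""
    amp_x = np.abs(np.fft.fft2(x)).ravel()
    amp_y = np.abs(np.fft.fft2(y)).ravel()
    return float(np.dot(amp_x, amp_y) / (np.linalg.norm(amp_x) * np.linalg.norm(amp_y)))
```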

17 Translation and scaling also change the alignment of the pixels and therefore affect the SSIM as well.
18 Sebastian et al. [2017] also used α = β = γ = 1.

3. Methods

The aim of this chapter is to present the methods we developed to create a dynamic environment, which we focus on further in this work. We introduce the thought process we used to develop the dynamic environment. We define the scenes' background and the stimuli we utilized in our experiments. At the end of the chapter we describe the detection methods we used.

3.1 Background description

The background is the dominant part of the experiment setup. Our goal was to select a background which represents natural scenes as closely as possible without introducing noise that could affect our measurements in an unexpected way.

In the tool the background is a 2D array of floating point numbers which range from 0 to 1. Every item in the array represents the luminance intensity of a single pixel in the output video (or image). In order to generate videos we mapped the 0-1 range to 0-255 and created a monochromatic video with 8 bits per pixel. Using this notion, 0 intensity represents a black pixel (0) and 1 intensity represents a white pixel (255). To illustrate a different mapping we include Fig.: 3.1, which was generated by mapping not into a black and white palette but into a red and blue palette.

In our experiments we utilized a pink noise as the background for all scenes. To dynamize the pink noise we needed to create motion patterns in the background. We explored three means of such dynamization.

• Linear interpolation of genuine pink noise

• 3D pink noise

• Shifting a genuine pink noise in one direction

We introduce every method one by one. For each method we also present a spectral image. In this image we observe whether an image contains distinctive features or not. To demonstrate this observation we present a photograph, Fig.: 3.2, with its corresponding spectral transformation. The photograph contains several distinctive features. The features are seen in the transformation as the light lines going from left to right, from top to bottom, and from the top left corner to the bottom right one. A noise without distinctive features should not contain such lines.

3.1.1 Linear interpolation

The dynamization of the pink noise is done by linear interpolation. We generated a genuine pink noise, then utilized the linear interpolation to create a dynamic transition from one genuine pink noise to another.

0 With one axis used as the time.

20 Figure 3.1: Example of pink noise mapped to red and blue palette

Figure 3.2: Original image and its spectral image

The linear interpolation compares well to true pink noise (see Fig.: 3.3 and Fig.: 3.4). We observed a slight drop in contrast value depending on the speed of change. This drop was probably caused by the chosen interpolation method; if we were to use a method that is more asymmetric, there would

most likely be no drop in contrast. In the video it is almost impossible to distinguish the genuine pink noise from a transition by the naked eye. Video examples of a dynamic pink noise background can be found in Appendix A.2.
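A sketch of this interpolation scheme, assuming NumPy arrays and a `pink_noise(size)` generator such as the one sketched in Section 2.2; a new genuine pink noise appears every `frames_per_segment` frames and the frames in between are linear blends of the neighbouring two. The function and parameter names are illustrative, not taken from the thesis tool.

```python
def dynamic_background(size, segments, frames_per_segment, pink_noise):
    """Yield frames of a dynamic background built by linear interpolation.

    `pink_noise` is a callable returning a size x size noise image in [0, 1].
    Every segment starts with a genuine pink noise; the following frames are
    linear interpolations between the current noise and the next one.
    """
    current = pink_noise(size)
    for _ in range(segments):
        upcoming = pink_noise(size)
        for step in range(frames_per_segment):
            t = step / frames_per_segment  # 0 at the genuine noise, approaching 1 before the next
            yield (1.0 - t) * current + t * upcoming
        current = upcoming
```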

Figure 3.3: Two images of a pink noise and a halfway interpolation of those

Figure 3.4: Spectral image of a pink noise and a halfway interpolation of those

3.1.2 3D pink noise

The dynamization of the pink noise is done by slicing a 3D pink noise. We created a 3D pink noise using the algorithms that are mentioned in the previous chapters, generalized to 3 dimensions. This method created a pink noise with a flickering effect. The flickering effect is not so apparent in the presented Fig.: 3.5. Spectral images of the pink noise can be seen in Fig.: 3.6. We advise the reader to rather see the video in Appendix A.2.

3.1.3 Shifting pink noise

The dynamization of the pink noise is done by shifting a pink noise in one direction. We created a pink noise which was stretched in one direction. The dynamization is then realized by shifting a viewport along the stretched direction. In Fig.: 3.7 we present three frames of pink noise that has been shifted in the top-to-bottom direction (see Fig.: 3.8 for the respective spectral images). To observe this noise properly please see Appendix A.2.

In the end we decided to use the interpolation method. A prior check of all the generated noise showed the linear interpolation to be the most natural looking.

22 Figure 3.5: Three time slices of 3D pink noise

Figure 3.6: Spectral images of the time slices from Fig.: 3.5

Figure 3.7: Three time slices of a shifted pink noise

Figure 3.8: Spectral images of the time slices from Fig.: 3.7

The shifting (running) pink noise is a better fit for experiments designed to test hypotheses regarding motion dazzle.

23 3.2 Stimuli description

We utilized a Gabor patch for our stimuli. Our goal was to explore the behavior of dynamic scenes. To dynamize the Gabor patch stimulus we selected the three properties with which the patch is defined. The most basic property of the patch is the rotation (see Fig.: 3.9). The frequency represents the density of the lines in the patch (see Fig.: 3.10). The last adjustable property is the phase, which shifts the lines in their perpendicular direction (see Fig.: 3.11).

Figure 3.9: Three gabor patches with θ = −45 deg, θ = 0 deg and θ = 45 deg

Figure 3.10: Three gabor patches with ω = 5, ω = 10 and ω = 15

Figure 3.11: Three gabor patches with ϕ = 0, ϕ = π and ϕ = 2π

In the tool the patch is a 2D array of floats which range from −1 to 1. Every item in the array represents a change in luminance intensity. Negative values imply darkening, values around zero mean no change in the scene and positive

values mean lightening. This way both extreme values (dark and light) can affect the final scene1.

We used simple addition and clipping to add such a patch into the scene. After finding the location of the patch, we added the respective intensities (luminances) of the scene and the patch and then clipped the result to fit into the [0, 1] interval2. One big disadvantage of this approach is that the pixel intensity change can be acute. In order to lower this effect we re-scaled the patches before adding them to the scene. The scale of the patch lets us modify the opacity of the stimulus. In all but the first experiment we scaled down the stimulus intensity by two3. The first experiment was aimed at finding the correct value for this scale.
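A minimal sketch of the addition-and-clipping step, assuming NumPy, a scene with intensities in [0, 1] and a patch with values in [−1, 1]; the `scale` parameter and the top-left placement coordinates are hypothetical names used here for illustration.

```python
import numpy as np

def add_stimulus(scene, patch, top, left, scale=0.5):
    """Blend a Gabor patch into the scene by scaled addition and clipping to [0, 1]."""
    composed = scene.copy()
    rows = slice(top, top + patch.shape[0])
    cols = slice(left, left + patch.shape[1])
    # Add the scaled luminance changes and clip back into the valid intensity range.
    composed[rows, cols] = np.clip(composed[rows, cols] + scale * patch, 0.0, 1.0)
    return composed
```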

3.3 Scene description

In the experiments we were working with dynamic scenes. The background is always a pink noise or an interpolation of two pink noises. The period at which a new pink noise is generated is fixed at 1 s. Every full second a proper pink noise is displayed; at every other time a linear interpolation of the last pink noise and the next one is shown. The scene is square, with 200 px per side. The observations last for 10 s.

In the scene there is a single instance of a stimulus. The size of the stimulus matches the size of the scene. The placement of the stimulus varies from experiment to experiment. We used both the stimulus and the original background for control measurements.

3.4 Detection approach

There are many approaches to detecting objects. We decided to use SSIM to analyze our scenes. Firstly we observed how various scene properties change in time and whether there is any observable effect of the motion on those properties. In our experiments we compared the following properties of two given scenes as described in Wang et al. [2004] - luminance, contrast and structural similarity - and the phase-invariant similarity described in Sebastian et al. [2017]. The stabilizing constants were set to the same values that were used in the original paper by Wang et al. [2004]: K1 = 0.01 and K2 = 0.03.

In the experiments we measured the similarity between a scene with a stimulus, a scene without a stimulus and the stimulus as a standalone object. We conducted three measurements for every scene. See Fig.: 3.12 to observe the separate objects.

• scene with a stimulus vs stimulus as a standalone object

• scene without a stimulus vs stimulus as a standalone object

• scene with a stimulus vs scene without a stimulus

1 This is called partial masking. It helps blend the stimulus into the scene and thus simulates a more realistic situation - Sebastian et al. [2020].
2 The same interval that was used for the scene originally.
3 That means the stimulus intensity was in the [−0.5, 0.5] range.

25 Figure 3.12: Picture of a scene with and without stimulus and one single stimulus4

That way we were able to provide control measurements and to show whether the designed method works or just produces false positives. The first measurement tells us whether the stimulus is recognizable in the scene or not. The second measurement detects false positives. The third measurement is a control one and shows us how well the stimulus is hidden in the scene.

It is worth mentioning that Sebastian et al. [2017] used a Hanning window to restrict the area of interest. The Hanning function is a function whose usage is similar to the Gaussian cutout used for generating Gabor patches. It is used to lower the perceived intensity of pixels at the edge of the area of interest; same as the Gaussian cutout, it creates a round area of interest. In our experiments we were not sure whether the stimulus was present at the center of the area of interest5 or not, whereas Sebastian et al. [2017] did not consider uncertainty in the location of the target. The Hanning window could potentially mask a big portion of the stimulus. Therefore we decided not to use the Hanning window to restrict our area of interest.

4The stimulus has been added to the scene with 0.5 scale - see above.
5Whole scene or sliding window
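For completeness, a two-dimensional hanning window of the kind used in Sebastian et al. [2017] can be sketched as the outer product of two one-dimensional Hann windows. This is only an illustration of the taper we decided not to use, and the helper name is ours.

    import numpy as np

    def hanning_2d(size):
        # Outer product of two 1D Hann windows: a smooth, roughly round taper
        # that attenuates pixels towards the edge of the area of interest.
        w = np.hanning(size)
        return np.outer(w, w)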

4. Preliminary experiments

The experiments in this chapter serve primarily as a proof of concept whether the selected properties can actually help us detect the stimuli or not, and what hyperparameters we are to select to get good results. In the experiments we are not trying to detect the stimulus but rather observe how various scene properties change in time and whether we can use them to locate the stimulus in a bigger scene. We want to test the behavior of detection methods before applying them on a larger scale. In the first experiment we adjust the values of the alpha, beta and gamma coefficients needed to fine-tune the SSIM described in Wang et al. [2004]. We also test the scale of the stimuli intensity. The last variable we change in this experiment is the placement of the stimulus. We try four different positions so we can see how well the SSIM performs. In the second experiment we compare two approaches to measuring the SSIM, one being the original proposed in Wang et al. [2004] and the second one being the phase invariant approach used in Sebastian et al. [2017]. In order to utilize the power of the phase invariant check, and to learn the threshold at which those similarity measures are effective, we moved the stimulus gradually out of the picture at different rates. The two following experiments focus more on the detection part of the work. In the former experiments we used the stimulus which was present in the picture to detect it, which is a much easier task because we are using a very good visual cue. In our final experiment we want to be able to detect a stimulus without knowing its position, orientation, phase or frequency. In the third experiment we used several pre-generated stimuli of different rotation, phase and frequency to detect the one present in the scene. We then compare whether we are able to detect stimuli patterns in the scene without knowing the pattern of the particular stimulus occurring in the scene. The fourth experiment utilized the same approach to detection used in the last experiment. In this case there is a moving static patch instead of a non-moving dynamic one. This experiment tells us how precise the chosen methods are.

Experiment              Goal                         Method
Simple POC - 4.1        adjust SSIM values           SSIM and CW-SSIM
SSIM precision - 4.2    estimate precision of SSIM   SSIM and CW-SSIM
Dynamic stimuli - 4.3   explore SSIM with pattern    pattern search
Moving stimuli - 4.4    explore SSIM with pattern    pattern search

For every experiment we provide several plots similar to Fig.: 4.1. On the x axis there is time. On the y axis there is the similarity measure. Every plot line corresponds to one comparison (for more information see 3.4). The method used to measure the comparison is displayed at the top of the plot.

• scene containing a stimulus vs stimulus (blue)

• scene vs stimulus (orange)

• scene containing a stimulus vs scene (green)

[Plot with two panels, "ssim values" and "cw_ssim values": similarity values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.1: Example plot

Higher similarity means that the two compared objects are more similar. Lower similarity means the opposite. According to Fig.: 4.1 we conclude that according to SSIM a scene without a stimulus and the stimulus (orange) are not that similar, whereas a scene with a stimulus is very similar to the scene without the stimulus (green). The orange line denotes how much the stimulus stands out of the scene. High values lead to false positive measures. The green line denotes how well the stimulus is hidden in the scene. Low values mean that the stimulus is very apparent in the scene and disrupts the overall visual of the scene. The blue line is the comparison we pay the most attention to. Higher values mean that the chosen method is very good at detecting the stimulus.

4.1 Simple POC

This experiment provides us with some general overview of the methods and properties we measure. We tried to determine which SSIM parameters are valuable for our main experiment, and it also helped us design the following experiments. One big disadvantage of SSIM is its inherent inability to handle scaling or small transformations of the target pattern1. In order to work around that disadvantage we used the phase invariant similarity described in Sebastian et al. [2017]. In this experiment we include the phase invariant similarity measure alongside the regular SSIM parts. The position and intensity scale of the stimuli varies. There are four position settings we observed

• full stimulus in the middle of the scene

• stimulus positioned at the lower edge of the scene (thus showing just the top half of the stimulus)

• stimulus positioned at the right edge of the scene (thus showing just the left half of the stimulus)

• stimulus positioned at the lower right corner of the scene (thus showing just the top left quarter of the stimulus)

1Wang et al. [2004]

There are three intensity scale settings we observed

• Low setting 0.2

• Normal setting 0.5

• High setting 0.8

In order to assign weights to the parts of the SSIM we observed each of the parts individually.

• Contrast

• Luminance

• Simple structural similarity

• Phase invariant similarity

The resulting plots of all SSIM parts are shown in Fig.: 4.2.

[Plot with four panels, "Luminance values", "Contrast values", "Structure values" and "CW_Structure values": values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.2: Plot for 0.5 stimulus scale and displaying full stimulus.2

2All plots for all scenes along with videos showing the experiment are part of the appendix A.2.

The conclusion of this experiment is that luminance has little to no predictive value, whereas contrast, if compared to the rest of the scene, can be used to locate a small stimulus in a bigger scene; it just has a negative impact. The last plot shows that the structural similarity might prove very useful when detecting stimuli because it is easy to tell the difference between a scene with a stimulus (blue) and a scene without a stimulus (orange). This experiment also tells us that the stimuli are camouflaged fairly well in the scene (green). Based on this experiment we set coefficients for SSIM as follows (a sketch of the resulting combining function follows the list):

• α = 0 luminance has no impact

• β = −0.5 contrast has negative impact

• γ = 0.5 similarity has positive impact3
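A minimal sketch of the combining function of Wang et al. [2004] with these exponents. The sign handling for the structure term and the behaviour for contrast values near zero are our own simplifications, not necessarily what the tool does.

    import math

    def weighted_ssim(l, c, s, alpha=0.0, beta=-0.5, gamma=0.5):
        # l, c, s are the luminance, contrast and structure comparisons.
        # The structure term may be negative, so its sign is kept explicitly
        # when applying the fractional exponent.
        structure = math.copysign(abs(s) ** gamma, s)
        return (l ** alpha) * (c ** beta) * structure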

4.2 SSIM precision

In later experiments we were detecting a small stimulus in a large scene. This detection used a sliding window which scans the whole scene. For efficiency reasons we wanted to make the window slide step as big as possible. Although small slide steps are more precise, the amount of time needed is very big. Stepping by one pixel would require testing O(W ∗ H) windows (where W and H are the width and height of the scene). To estimate how big the sliding step of the window should be we present a simple experiment. We measured the deterioration of the SSIM precision. In the scene there is a single stimulus put into the middle, but in time it slides off to the edge. We observe how the measured values change in time. For every axis we test four different speeds, resulting in 16 experiments in total. Results for the stimulus going right are in Fig.: 4.3, and results for the stimulus going down and right are in Fig.: 4.4.
In the plots we notice one thing - the difference between detection (blue) and control (orange) for simple SSIM oscillates between positive and negative values before stabilizing around zero. The CW-SSIM difference does not oscillate at all. The oscillation is caused by aligning phase shifted stimuli. Zero values for SSIM mean that the images are not similar. High negative values mean that the images are inverse of each other. When we moved the stimulus by half a phase6 we had a scene with white bars where there are black bars in our reference stimulus. This way the scene and reference were complements of each other. Thanks to this we got high negative values of SSIM. The same oscillation that SSIM exhibited did not occur in CW-SSIM. The absence of oscillations shows that the chosen similarity index7 indeed does perform better than SSIM if there are simple transformations applied to the target patterns. By observing the speed with which the difference between the two SSIM measurements narrows we concluded that in order for SSIM to be conclusive we have to be able to observe at least half of the stimulus, but observing more than a half is definitely better.

2This breaks the axioms set earlier in the definition of SSIM. We are comparing relative values of the SSIM so we are fine even without bounded SSIM.
3Note that the SSIM always lands within the [−1, 1] interval. Therefore if we want to make a signal stronger (similarity) we have to use 0 < γ < 1 to achieve that.
4In the end only a quarter of the stimulus is visible in the lower right corner.
5In the end more than a half of the stimulus is visible at the bottom.
6This also depends on the direction of the shift. If we moved the stimulus in the direction of the stripes, this effect would not have occurred.
7Phase invariant similarity described in Sebastian et al. [2017]

[Plot with two panels, "ssim values" and "cw_ssim values": similarity values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.3: In this experiment the stimulus moves at a constant pace towards the right bottom corner of the scene4

[Plot with two panels, "ssim values" and "cw_ssim values": similarity values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.4: In this experiment the stimulus moves at a constant pace towards the bottom of the scene.5

4.3 Dynamic stimuli

In a former experiment we learned that if at least half of the stimulus is in the scene we are able to detect it using SSIM methods. We needed to adjust the SSIM metrics from the previous experiments to take into account changing stimuli. In this experiment we tried to detect a stimulus changing in time, but this time we did not know what the stimulus looked like. We created a list of predefined stimuli which were used to compare the scene to. Those predefined stimuli are all considered a potential pattern. Measuring SSIM for a given scene takes all the predefined stimuli and computes their respective SSIM. The final SSIM value is taken as the highest measured SSIM.

SSIM̂(image) = max_{i ∈ samples} SSIM(sample_i, image)

We did this for both simple SSIM and CW-SSIM. Potentially CW-SSIM should be able to perform well even with fewer predefined patterns. There were nine measurements in total. We tested three different stimulus updates (theta, phase and frequency8), and three different granularities of the detection patterns. Higher granularity means more different patterns. Therefore there is a higher chance to test the scene against a precise pattern, but at the same time the computation takes longer, and the specificity of the method lowers. In Fig.: 4.5 we present results for an experiment with varying frequency and average9 granularity.
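The maximum over the predefined patterns can be sketched directly; here similarity stands for whichever measure is being tested (for instance the simple global SSIM sketched in Section 3.4, or CW-SSIM), and the function name is ours.

    def pattern_search_ssim(image, samples, similarity):
        # Compare the image against every predefined candidate pattern and
        # keep the best match as the final similarity value.
        return max(similarity(sample, image) for sample in samples)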

[Plot with four panels, "ssim values", "pattern_search_ssim values", "cw_ssim values" and "pattern_search_cw_ssim values": similarity values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.5: Plot for experiment with frequency change and middle granularity

In Fig.: 4.5 we see that even though the predefined array gives worse results10, we are still able to distinguish between the SSIM values for the scene with a stimulus and the values for the scene without one. At the same time we can see that the performance of CW-SSIM dropped significantly compared to simple SSIM. This effect can be caused by high rates of false positives (orange).

8Rotation, phase offset and number of lines respectively.
9Among other experiments.
10Higher similarity to the scene without stimuli and lower similarity for the scene with stimuli

4.4 Moving stimuli

The next logical step was to merge the former two experiments into one. That way we actually simulated the behavior of the detection in a single sliding window, which is used in the final experiment. We observed a scene with a single stimulus that was moving in a single direction at a constant pace. Our goal was to determine how precise the SSIM method can be with a decreasing percentage of the stimulus being visible, and when we did not know what the stimulus looked like. Results can be seen in Fig.: 4.6.

[Plot with four panels, "ssim values", "pattern_search_ssim values", "cw_ssim values" and "pattern_search_cw_ssim values": similarity values on the y axis against time (0-10 s) on the x axis; legend: Gabor noise vs Gabor, Noise vs Gabor, Gabor noise vs Noise]

Figure 4.6: In this experiment the stimulus moves at a constant pace towards the lower right side of the scene.

Once again we observe the oscillation in SSIM. The CW-SSIM with pattern search does not perform very well in this case. However, pattern search SSIM managed to provide conclusive results even with imperfect matches. For this reason we kept using SSIM, as it performs better in our conditions than CW-SSIM using phase invariant similarity. The fact that we need to capture at least half of the stimulus with the sliding window persists, which improved our time performance in the main experiment.

5. Main experiment

In the previous chapter we proposed and observed several concepts revolving around the detection of stimuli in a scene. We applied those findings to the main experiment which we present in this chapter. In this experiment the observed scenes differ from those in previous experiments. The scene is now 1000 × 1000 pixels big. There is only a small stimulus present in the scene. The stimulus size is 200px to match the size of the stimulus from previous experiments. Our goal was to prepare a probabilistic distribution which would serve as a heat map where higher pixel intensity corresponds to a higher probability of the stimulus being present at the location. The first approach to detection uses the SSIM method. SSIM ought to help us localize the gabor patch by distinguishing its features in the scene. The second method localizes the gabor patch in the scene using frame difference. This method attempts to locate the stimulus not based on its structure but rather based on the overall motion in the scene.

5.1 Methods

Before we present the results of detecting gabor patches we introduce new methods that we did not use in the previous experiments. Here we introduce the sliding window which we used to divide the scene into separate POIs which were then further processed. One such process is SSIM, which was used before. The second processing method we used is a frame difference which detects motion. Finally the results were aggregated into a heat map which represents the probability distribution of the stimulus' location.

5.1.1 Sliding window

We used a sliding window technique to localize our target. The window's size is the same as the size of the stimulus. For performance reasons the step of the window slide should be as large as possible. Thanks to the previous experiments we know that the stimulus is detectable by SSIM if at least a half of the stimulus is present in the POI (see 4.1 and 4.3). Using this knowledge we knew we could have used half the stimulus size as the window slide step. To ensure good precision we used an even smaller step. In the following experiments we used a quarter of the stimulus size as the step for our window. The window is 200px wide and 200px high. The step is 50px. That way no matter the stimulus position1 we always have at least one window that captures more than 50% of the stimulus. We computed a single value for every window. This value tells us whether there is or there is not a stimulus present. Such a window matches the situation that was observed in the previous experiments (see Chapter 4).

1This statement does not apply to the edges of the scene, which we can use to hide the stimulus altogether. Provided that the stimulus is fully visible somewhere in the scene the statement holds.
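A minimal sketch of this windowing, assuming the scene is a 2D array; the generator name and the way windows are yielded are our own choices.

    def sliding_windows(scene, window=200, step=50):
        # Yield the top-left corner and the contents of every 200x200 window,
        # moved in steps of a quarter of the stimulus size.
        height, width = scene.shape[:2]
        for y in range(0, height - window + 1, step):
            for x in range(0, width - window + 1, step):
                yield (y, x), scene[y:y + window, x:x + window]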

5.1.2 SSIM

We computed the value for the sliding window using the SSIM. We used SSIM and CW-SSIM with a visual cue, and SSIM with a pattern cue. We decided against CW-SSIM with a pattern cue, as it showed statistically less significant results than SSIM with pattern search (see 4.3 and 4.4).

5.1.3 Frame difference

Background subtraction is used to detect a moving object on a static background. Its big disadvantage is that it requires a static background, which we do not have in our experiments. In order to detect the stimuli in our video we used a threshold frame difference, where the threshold is computed as the relative speed of change of the background. Any change of the pixel intensity above this threshold is then reported; intensity changes below the threshold go unnoticed. Threshold frame difference applied to a simple scene without any stimuli does not report any movement. Using this method we can also estimate the expected speed of intensity change. That could help develop a camouflage for the stimuli, because if the change of the intensity of the stimuli's pixels is low enough it will not surpass the threshold and thus go undetected by the threshold frame difference. The value for the sliding window is then the difference between the window's average intensity change and the average intensity change of the whole scene.
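A minimal sketch of this idea follows. Reading the threshold as the mean absolute change of the whole frame is our own interpretation of "the relative speed of change of the background", and the function names are illustrative.

    import numpy as np

    def threshold_frame_difference(prev_frame, frame):
        # Absolute per-pixel change between two consecutive frames; only
        # changes above the scene-wide average speed of change are reported.
        diff = np.abs(frame - prev_frame)
        threshold = diff.mean()
        return np.where(diff > threshold, diff, 0.0)

    def window_value(window_diff, scene_diff):
        # Value for one sliding window: its average intensity change relative
        # to the average intensity change of the whole scene.
        return window_diff.mean() - scene_diff.mean()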

5.1.4 Improved frame difference

The described threshold frame difference method reports only changes above the computed threshold. It is very probable that this method is not able to detect a static stimulus, even though the static stimulus creates an irregularity in local intensity change. To be able to detect any irregularities in intensity change we introduce the percentile frame difference. This method first computes a regular frame difference. Then it creates several percentile levels from the resulting frame difference. The reported difference is computed according to the computed percentile levels. Pixels with average intensity change come out with low intensity and pixels with irregular intensity change (either high or low) come out with high intensity. This approach is theoretically able to locate a static stimulus in a dynamic scene.
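One way such a percentile-based difference could look is sketched below; the number of percentile levels and the mapping from levels to output intensities are our own assumptions, not the exact values used by the tool.

    import numpy as np

    def percentile_frame_difference(prev_frame, frame, levels=(10, 30, 70, 90)):
        diff = np.abs(frame - prev_frame)
        cuts = np.percentile(diff, levels)
        rank = np.digitize(diff, cuts)        # 0 .. len(levels)
        # Pixels whose change is close to the median get a low output value,
        # unusually low or high changes (both tails) get a high output value.
        centre = len(levels) / 2.0
        return np.abs(rank - centre) / centre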

5.1.5 Heat maps

To observe the decision process closely we created a heat map video. In this video the areas with higher intensity correspond to areas with a higher chance of a stimulus being present. Every window gets one value. The intensity of a pixel is computed as the average of said values over all windows the pixel belongs to. In our case every pixel belongs to four different windows2.

2Once again the edges are somewhat problematic, since the windows are either small, or contain noise.
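The aggregation of window values into a heat map can be sketched as follows, assuming window_values maps the top-left corner of each window to its computed value; the helper name is ours.

    import numpy as np

    def heat_map(window_values, scene_shape, window=200):
        heat = np.zeros(scene_shape)
        counts = np.zeros(scene_shape)
        for (y, x), value in window_values.items():
            heat[y:y + window, x:x + window] += value
            counts[y:y + window, x:x + window] += 1
        # Average of all window values covering each pixel.
        return heat / np.maximum(counts, 1)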

5.1.6 Heat map evaluation

To evaluate the heat maps we used recall and false positive rate. For every heat map we created six predictive models. Every model predicts the location of the gabor patch based on the values of the heat map. We created six percentile levels. The percentile levels are 100, 99, 95, 90, 80, and 70. Every model corresponds to one percentile level. The percentile value denotes the threshold which the model uses for prediction. If the value of the heat map is greater than or equal to the given percentile the model predicts true. If the value is less than the given percentile the model predicts false. We measured recall and false positive rate for every model.
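A minimal sketch of one such predictive model and its evaluation against a ground-truth mask of the stimulus location; the argument names and the boolean-mask representation are our own assumptions.

    import numpy as np

    def evaluate_heat_map(heat, truth_mask, percentile):
        # truth_mask is a boolean array marking pixels covered by the stimulus.
        # Predict "stimulus present" wherever the heat map reaches the given
        # percentile of its own values, then score recall and false positive rate.
        prediction = heat >= np.percentile(heat, percentile)
        tp = np.logical_and(prediction, truth_mask).sum()
        fp = np.logical_and(prediction, ~truth_mask).sum()
        recall = tp / truth_mask.sum()
        fpr = fp / (~truth_mask).sum()
        return recall, fpr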

5.2 Results

In this section we present the results of our attempt to localize a gabor patch in a pink noise scene. We compare the performance of the mentioned methods in various situations.

Settings
Stimulus position   Dynamic / Static
Stimulus            Dynamic / Static
Background          Dynamic / Static

For every setting we evaluated the performance of the methods mentioned earlier.

• SSIM with a visual cue

• SSIM using pattern search

• CW-SSIM with a visual cue

• threshold frame difference

• simple frame difference

• percentile frame difference

The first three methods are aimed at detecting the gabor patch using SSIM. We compared the performance of SSIM aided with patterns (see 4.3 and 4.4) with SSIM and CW-SSIM aided with the actual search target (see 4.1 and 4.2)3. The last three methods are aimed at detecting the target gabor patch not based on its structure but rather based on the overall motion in the scene. We present the performance of every method, and compare how the method performed in the given situation.

5.2.1 SSIM

All the detection methods using SSIM showed fairly good results in the concept phase. In the localization experiment the results were conclusive only for CW-SSIM with a visual cue. In Fig.: 5.1 we present a frame from the original scene and a heat map generated using CW-SSIM.

3We did not use CW-SSIM due to its inconclusive results

Figure 5.1: A frame from the original (left) and the heat map generated by CW-SSIM (right)

The light area in the heat map denotes the expected area (top left) where the stimulus is present. We can see that in this case the location is clear and it more or less matches the location of the gabor patch. Such a clear distinction is never to be taken for granted. In Fig.: 5.2 we present an original frame and a heat map generated using SSIM.

Figure 5.2: A frame from the original (left) and the heat map generated by SSIM (right)

We cannot really tell where the SSIM localization method found the gabor patch. We present these pictures to illustrate the usage of the heat maps to the reader. To see all the generated heat maps see A.2 in the appendix. A more rigorous approach to evaluation follows. As expected, using SSIM to detect stimuli showed very fluctuating results in all situations where there was a moving stimulus present. We present a result with a static scene, dynamic stimulus position and dynamic stimulus. See Fig.: 5.3. We can see that the recall drops abruptly to zero and then towards the end goes up.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.3: Plot for true SSIM in static scene with moving stimulus

CW-SSIM on the other hand managed to overcome this behavior. We present a result in the same situation. See Fig.: 5.4.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.4: Plot for true CW-SSIM in static scene with moving stimulus

We can see that with the 95th percentile (green) we still have high recall and a low false positive rate. If we do not care that much about FPR we can even reach 100% recall (red, purple and brown). The last SSIM method is the pattern search. In the concept phase this method exhibited fairly good performance. We expected this method to generate very noisy results but at the same time provide reliable localization results. In Fig.: 5.5 we can see that the recall is fairly noisy. None of the percentile values exhibit any notable results.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.5: Plot for pattern SSIM in static scene with moving stimulus

We conclude that SSIM can be used to localize known patterns in dynamic and static scenes with good recall. If we want to localize a moving stimulus we have to use CW-SSIM. As the results for pattern search SSIM show, SSIM is only applicable if we know very well what we are looking for. Without knowing the visual properties of the target, SSIM is not very useful.

5.2.2 Frame difference

Detecting movement using our proposed frame difference methods proved to be effective and quite robust depending on the situation. We present results for a dynamic scene with a moving and a stationary dynamic stimulus. The first situation consists of a moving target on a dynamic background. The second situation exhibits a smaller degree of motion by presenting a stationary target on a dynamic background. In theory the percentile frame difference should be able to cancel out the effect of the dynamic background and detect any irregular motion. To better illustrate the idea we present Fig.: 5.6 which was taken from the fully dynamic settings. In the picture we can see that the difference frame reports movement in the area where the stimulus is actually present. The generated heat map matches the position of the stimulus well. We present these pictures to illustrate the idea behind the frame difference to the reader. To see all the generated frame differences and heat maps see A.2 in the appendix. Now we present results for the threshold frame difference and the simple frame difference. Results for the percentile frame difference are in a separate section at the end of this chapter. See Fig.: 5.7 to observe the performance of the threshold frame difference. We can see that the recall is very high for a model using the 95th percentile (green) while still holding a low FPR. We can see drops of FPR for the 90th percentile. This suggests that there is some sort of noise present in the heat map.

Figure 5.6: A frame from the original (left), the difference created by threshold frame difference, and the heat map generated by threshold frame difference (right)

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.7: Plot for threshold frame difference in a dynamic scene with a moving dynamic stimulus

For comparison we also include the results of the simple frame difference method - Fig.: 5.8. For a moving stimulus this method exhibits the same level of recall as the threshold difference. This is caused by the fast moving stimulus, which manages to cause a bigger difference than the rest of the dynamic scene. To prove that the movement of the stimulus really does affect the performance of the simple frame difference we also include results of detection in a dynamic scene with a dynamic stimulus with a static position. We compare the performance of the threshold frame difference in Fig.: 5.9, and the simple frame difference in Fig.: 5.10. We can clearly see that the simple frame difference struggles to maintain high recall, whereas the threshold frame difference maintains good recall and a very low false positive rate for the 95th percentile (green). An obvious disadvantage of this approach is that it does not perform so well for a static stimulus. The heat map reports only parts which have an average intensity change higher than the overall scene, and a static stimulus does not change.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.8: Plot for simple frame difference in dynamic scene with moving dynamic stimulus

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.9: Plot for the threshold frame difference in a dynamic scene with a non-moving dynamic stimulus

Therefore the reported intensity change is most probably lower than the average change of the scene. We conclude that in a dynamic environment the simple frame difference is useful only when we know that our target will maintain a high speed of change. The threshold frame difference manages to detect even a slight motion.

Percentile frame difference

The methods mentioned above perform very well in situations with a dynamic stimulus. As we already mentioned, we would like to detect a static stimulus in a dynamic environment.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.10: Plot for the simple frame difference in a dynamic scene with a non-moving dynamic stimulus

Simple frame difference and threshold frame difference are not able to detect a stimulus that is not moving, even though the relative change of the stimulus is different from the relative change of the scene. Such a situation is accomplished by having a dynamic scene with a static non-moving stimulus. Percentile frame difference should be able to detect such stimuli. We compared the performance of percentile frame difference (Fig.: 5.11) and threshold frame difference (Fig.: 5.12) in such a scene.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.11: Plot for percentile frame difference in dynamic scene with static stimulus

Unfortunately the recall is very low for both of these approaches. The threshold frame difference exhibits a lot of spikes, which are caused by several artifacts in the produced heat map.

[Plot with two panels, "Recall" and "FPR": values on the y axis against time (0-6 s) on the x axis; one line per percentile model: 100, 99, 95, 90, 80, and 70]

Figure 5.12: Plot for threshold frame difference in dynamic scene with static stimulus

Percentile frame difference is more stable, but achieving higher recall cannot be done without having a high false positive rate as well (brown and purple). The percentile frame difference method selected to generate the heat map turned out not to be suitable for this kind of task. The heat map does not show any significant results. In theory it should be able to locate any irregularities in intensity change well. Yet the intensity change seems to behave all the same within the scope of the sliding window. This interference disrupts the detection process. We conclude that a moving stimulus is easier to locate than a non-moving stimulus that is dynamic. However, a non-moving static stimulus is even less detectable, even though the intensity change of the stimulus is different from the intensity change of the scene the stimulus is presented in.

5.2.3 Recapitulation

We tested the performance of methods based on the SSIM and methods based on the frame difference in several environments with varying degrees of dynamicity. We learned that the SSIM is more suitable for tasks where there is a visual cue provided. When provided with the visual cue, CW-SSIM exhibited good results in all tested environments4. Methods based on the frame difference showed good results in dynamic environments. Threshold frame difference managed to cancel out the effect of a dynamic environment. The proposed percentile frame difference method did not meet our expectations of detecting fully static stimuli in a dynamic scene.

4Fully dynamic and fully static

Conclusion

We created an environment for generating dynamic scenes for vision labs. Environments created by the tool can range from a simple environment where there is one gabor patch placed in a scene with a pink noise background, to a dynamic environment with several gabor patches that are moving around. Using the tool we managed to perform several experiments that told us a lot about the usage of SSIM in vision labs. We learned that luminance is not a distinctive feature in experiments using pink noise and gabor patches. Contrast proved to be somewhat useful, but in the end not as distinctive as structure comparison. Our results show that SSIM can be used to localize known patterns in dynamic and static scenes with good recall. Movement of the stimulus creates a lot of noise for SSIM based detection methods. To localize a moving stimulus we have to use CW-SSIM, which cancels out slight transformations very well. Another option is to make sure that we align the area of interest with the search pattern perfectly. We proposed a new method that could be used to detect previously unknown patterns using SSIM. This method performed well in a constrained environment. Once we lifted those constraints, the method's performance dropped significantly. According to the performance of the pattern search SSIM method we conclude that SSIM and detection methods based on SSIM are only applicable if we know the target we are looking for. Without knowing the target's visual properties, SSIM based methods are not very useful. In a dynamic environment the simple frame difference is useful only when we know that our target maintains a higher speed of change than the rest of the scene, whereas the threshold frame difference manages to detect even a slight motion. We conclude that a moving stimulus is easier to locate than a non-moving stimulus that is dynamic. But a non-moving static stimulus is even less detectable, even though the intensity change of the stimulus is different from the intensity change of the scene the stimulus is presented in. Our research showed us that SSIM can be used to study dynamic camouflage patterns and techniques, but only under some restricted conditions. For example, using pink noise and gabor patches totally ruled out using luminance for such a study. It is possible that in more real-life examples the luminance can be of better use. Also, SSIM is suitable for detecting patterns of which we have prior knowledge. It is a simple alternative to classification for objects of simple, definable structure.

Further research

We propose an experiment similar to our final experiment with several stimuli. Each stimulus would be generated from one true image pattern, but different masking techniques would be used to add it to the scene. Thanks to the fact that they have the same true image, they can be compared using SSIM. That way we can compare the masking techniques among each other and also their overall performance. Researching functions other than linear ones to interpolate between the two pink noises in the scene can also improve the precision of the experiment. Using bezier curves of order higher than one could potentially lead to stabilization of the contrast in the scene. In the future we would like to incorporate our work into the PsychoPy software (see Peirce [2007]), which already partially supports some of the explored dynamicity.

Bibliography

[Citing pages are listed after each reference.]

Ioannis Agtzidis, Inga Meyh¨ofer, Michael Dorr, and Rebekka Lencer. Following forrest gump: Smooth pursuit related brain activation during free movie view- ing. NeuroImage, 216:116491, August 2020. doi: 10.1016/j.neuroimage.2019. 116491. URL https://doi.org/10.1016/j.neuroimage.2019.116491. Elham Azizi, Larry A. Abel, and Matthew J. Stainer. The influence of action video game playing on eye movement behaviour during visual search in ab- stract, in-game and natural scenes. Attention, Perception, & Psychophysics, 79(2):484–497, December 2016. doi: 10.3758/s13414-016-1256-7. URL https: //doi.org/10.3758/s13414-016-1256-7. [Page 6.] D. A. Baylor, T. D. Lamb, and K. W. Yau. Responses of retinal rods to single photons, Mar 1979. [Page 5.] Mark Bear. Neuroscience : exploring the brain. Lippincott Williams & Wilkins, Philadelphia, PA, 2007. ISBN 978-0781760034. [Page 5.] Johannes Bill, Hrag Pailian, Samuel J. Gershman, and Jan Drugowitsch. Hierar- chical structure is employed by humans during visual . Pro- ceedings of the National Academy of Sciences, 117(39):24581–24589, September 2020. doi: 10.1073/pnas.2008961117. URL https://doi.org/10.1073/pnas. 2008961117. [Page 9.] G. E. P. Box and Mervin E. Muller. A note on the generation of random nor- mal deviates. The Annals of Mathematical Statistics, 29(2):610–611, June 1958. doi: 10.1214/aoms/1177706645. URL https://doi.org/10.1214/ aoms/1177706645. [Pages 13 and 14.] David C. Burr and John Ross. Direct evidence that “speedlines” influence motion mechanisms. The Journal of Neuroscience, 22(19):8661–8664, October 2002. doi: 10.1523/jneurosci.22-19-08661.2002. URL https://doi.org/10.1523/ jneurosci.22-19-08661.2002. [Page 8.] John Cass, Erik Van der Burg, and David Alais. Finding flicker: Critical dif- ferences in temporal frequency capture attention. Frontiers in Psychology, 2, 2011. doi: 10.3389/fpsyg.2011.00320. URL https://doi.org/10.3389/ fpsyg.2011.00320. [Page 9.] M. S. Castelhano and C. Heaven. The relative contribution of scene context and target features to visual search in scenes. Attention, Perception & Psy- chophysics, 72(5):1283–1297, June 2010. doi: 10.3758/app.72.5.1283. URL https://doi.org/10.3758/app.72.5.1283. [Page 12.] James L. Castner and David A. Nickle. Intraspecific color polymorphism in leaf-mimicking katydids (orthoptera: Tettigoniidae: Pseudophyllinae: Pte- rochrozini). Journal of Orthoptera Research, (4):99, August 1995. doi: 10.2307/3503464. URL https://doi.org/10.2307/3503464. [Page 6.]

Kyle R. Cave and Zhe Chen. Identifying visual targets amongst interfering distractors: Sorting out the roles of perceptual load, dilution, and atten- tional zoom. Attention, Perception, & Psychophysics, 78(7):1822–1838, June 2016. doi: 10.3758/s13414-016-1149-9. URL https://doi.org/10.3758/ s13414-016-1149-9. [Page 10.]

Antoni B. Chan, Vijay Mahadevan, and Nuno Vasconcelos. Generalized stauf- fer–grimson background subtraction for dynamic scenes. Machine Vision and Applications, 22(5):751–766, April 2010. doi: 10.1007/s00138-010-0262-3. URL https://doi.org/10.1007/s00138-010-0262-3. [Page 9.]

Guofeng Chen, Yinglong Shen, Fushi Yao, Peipei Liu, and Yunyi Liu. Region- based moving object detection using SSIM. In 2015 4th International Con- ference on Computer Science and Network Technology (ICCSNT). IEEE, De- cember 2015. doi: 10.1109/iccsnt.2015.7490981. URL https://doi.org/10. 1109/iccsnt.2015.7490981. [Pages 9 and 17.]

John G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Journal of the Optical Society of America A, 2(7):1160, July 1985. doi: 10.1364/josaa.2. 001160. URL https://doi.org/10.1364/josaa.2.001160. [Page 5.]

Kevin Dent, Jason J. Braithwaite, Xun He, and Glyn W. Humphreys. Integrating space and time in visual search: How the preview benefit is modulated by stereoscopic depth. Vision Research, 65:45–61, July 2012. doi: 10.1016/j.visres. 2012.06.002. URL https://doi.org/10.1016/j.visres.2012.06.002.

Andrew Derrington and Manuel Suero. Motion of complex patterns is computed from the perceived motions of their components. Vision Research, 31(1):139– 149, January 1991. doi: 10.1016/0042-6989(91)90081-f. URL https://doi. org/10.1016/0042-6989(91)90081-f. [Page 9.]

M. Dorr, T. Martinetz, K. R. Gegenfurtner, and E. Barth. Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, 10(10): 28–28, August 2010. doi: 10.1167/10.10.28. URL https://doi.org/10.1167/ 10.10.28. [Page 4.]

J. H. Elder and L. Velisavljevic. Cue dynamics underlying rapid detection of animals in natural scenes. Journal of Vision, 9(7):7–7, July 2009. doi: 10. 1167/9.7.7. URL https://doi.org/10.1167/9.7.7. [Pages 6 and 8.]

Gustav Fechner. ELEMENTE DER PSYCHOPHYSIK. FORGOTTEN Books, Place of publication not identified, 2016. ISBN 1332459110. [Page 12.]

David J. Field. Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4(12): 2379, December 1987. doi: 10.1364/josaa.4.002379. URL https://doi.org/ 10.1364/josaa.4.002379. [Page 13.]

Steven L. Franconeri and Daniel J. Simons. Moving and looming stimuli capture attention. Perception & Psychophysics, 65(7):999–1010, October 2003. doi: 10.3758/bf03194829. URL https://doi.org/10.3758/bf03194829. [Page 10.]

Changhong Fu, Ran Duan, and Erdal Kayacan. Visual tracking with online structural similarity-based weighted multiple instance learning. Information Sciences, 481:292–310, May 2019. doi: 10.1016/j.ins.2018.12.080. URL https: //doi.org/10.1016/j.ins.2018.12.080. [Page 17.]

D. Gabor. Theory of communication. part 1: The analysis of information. Journal of the Institution of Electrical Engineers - Part III: Radio and Communication Engineering, 93(26):429–441, November 1946. doi: 10.1049/ji-3-2.1946.0074. URL https://doi.org/10.1049/ji-3-2.1946.0074. [Page 14.]

C. W. Gardiner. Stochastic methods : a handbook for the natural and social sciences. Springer, Berlin, 2009. ISBN 978-3-540-70712-7. [Page 13.]

Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Fe- lix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness, 2019. [Page 6.]

W. S. Geisler and J. Najemnik. Human and optimal eye movement strategies in visual search. Journal of Vision, 5(8):778–778, September 2005. doi: 10.1167/ 5.8.778. URL https://doi.org/10.1167/5.8.778. [Page 4.]

Wilson S. Geisler. Motion streaks provide a spatial code for motion direction. Nature, 400(6739):65–69, July 1999. doi: 10.1038/21886. URL https://doi. org/10.1038/21886. [Page 8.]

Wilson S. Geisler. Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59(1):167–192, January 2008. doi: 10.1146/annurev.psych.58.110405.085632. URL https://doi.org/10.1146/ annurev.psych.58.110405.085632. [Page 13.]

Samuel J. Gershman, Joshua B. Tenenbaum, and Frank J¨akel. Discover- ing hierarchical motion structure. Vision Research, 126:232–241, September 2016. doi: 10.1016/j.visres.2015.03.004. URL https://doi.org/10.1016/j. visres.2015.03.004. [Pages 9 and 10.]

Alexander Goettker, Ioannis Agtzidis, Doris I. Braun, Michael Dorr, and Karl R. Gegenfurtner. From gaussian blobs to naturalistic videos: Comparison of ocu- lomotor behavior across different stimulus complexities. Journal of Vision, 20 (8):26, August 2020. doi: 10.1167/jov.20.8.26. URL https://doi.org/10. 1167/jov.20.8.26.

Antoine Grimaldi, David Kane, and Marcelo Bertalm´ıo. Statistics of natural images as a function of dynamic range. Journal of Vision, 19(2):13, Febru- ary 2019. doi: 10.1167/19.2.13. URL https://doi.org/10.1167/19.2.13. [Page 11.]

Joanna R. Hall, Innes C. Cuthill, Roland Baddeley, Adam J. Shohet, and Nicholas E. Scott-Samuel. Camouflage, detection and identification of mov- ing targets. Proceedings of the Royal Society B: Biological Sciences, 280 (1758):20130064, May 2013. doi: 10.1098/rspb.2013.0064. URL https: //doi.org/10.1098/rspb.2013.0064. [Page 10.]

John M. Halley. Ecology, evolution and 1f-noise. Trends in Ecology & Evolution, 11(1):33–37, January 1996. doi: 10.1016/0169-5347(96)81067-6. URL https: //doi.org/10.1016/0169-5347(96)81067-6. [Page 13.]

Elías Herrero-Jaraba, Carlos Orrite-Uruñuela, and Jesús Senar. Detected motion classification with a double-background and a neighborhood-based difference. Pattern Recognition Letters, 24(12):2079–2092, August 2003. doi: 10.1016/ s0167-8655(03)00045-x. URL https://doi.org/10.1016/s0167-8655(03) 00045-x. [Page 9.]

Martin J. How and Johannes M. Zanker. Motion camouflage induced by zebra stripes. Zoology, 117(3):163–170, June 2014. doi: 10.1016/j.zool.2013.10.004. URL https://doi.org/10.1016/j.zool.2013.10.004. [Page 10.]

Anna E. Hughes, Christian Jones, Kaustuv Joshi, and David J. Tolhurst. Diverted by dazzle: perceived movement direction is biased by target pattern orientation. Proceedings of the Royal Society B: Biological Sciences, 284(1850):20170015, March 2017. doi: 10.1098/rspb.2017.0015. URL https://doi.org/10.1098/ rspb.2017.0015. [Pages 9 and 10.]

Anna E. Hughes, David Griffiths, Jolyon Troscianko, and Laura A. Kelley. Noev- idence for motion dazzle in an evolutionary citizen science game. October 2019. doi: 10.1101/792614. URL https://doi.org/10.1101/792614. [Page 10.]

H. Kimmig, S. Ohlendorf, O. Speck, A. Sprenger, R.M. Rutschmann, S. Haller, and M.W. Greenlee. fMRI evidence for sensorimotor transformations in human cortex during smooth pursuit eye movements. Neuropsychologia, 46(8):2203– 2213, July 2008. doi: 10.1016/j.neuropsychologia.2008.02.021. URL https: //doi.org/10.1016/j.neuropsychologia.2008.02.021. [Page 4.]

Árni Kristjánsson, Andri Bjarnason, Árni Bragi Hjaltason, and Bryndís Gyda Stefánsdóttir. Priming of luminance-defined motion direction in visual search. Attention, Perception, & Psychophysics, 71(5):1027–1041, July 2009. doi: 10.3758/app.71.5.1027. URL https://doi.org/10.3758/app.71.5.1027. [Page 4.]

Heljä Kukkonen, Jyrki Rovamo, Kaisa Tiippana, and Risto Näsänen. Michel- son contrast, RMS contrast and energy of various spatial stimuli at thresh- old. Vision Research, 33(10):1431–1436, July 1993. doi: 10.1016/ 0042-6989(93)90049-3. URL https://doi.org/10.1016/0042-6989(93) 90049-3. [Page 12.]

C. J. Lin, C-C. Chang, and Y-H. Lee. Developing a similarity index for static camouflaged target detection. The Imaging Science Journal, 62(6):337–341, December 2013. doi: 10.1179/1743131x13y.0000000057. URL https://doi. org/10.1179/1743131x13y.0000000057. [Page 17.]

S. Marĉelja. Mathematical description of the responses of simple cortical cells. Journal of the Optical Society of America, 70(11):1297, November 1980. doi: 10.1364/josa.70.001297. URL https://doi.org/10.1364/josa.70.001297. [Page 5.]

Suzanne P. McKee, Scott N.J. Watamaniuk, Julie M. Harris, Harvey S. Smallman, and Douglas G. Taylor. Is stereopsis effective in breaking camouflage for mov- ing targets? Vision Research, 37(15):2047–2055, August 1997. doi: 10.1016/ s0042-6989(96)00330-6. URL https://doi.org/10.1016/s0042-6989(96) 00330-6. [Page 8.]

Sami Merilaita and Johan Lind. Background-matching and disruptive coloration, and the evolution of cryptic coloration. Proceedings of the Royal Society B: Biological Sciences, 272(1563):665–670, March 2005. doi: 10.1098/rspb.2004. 3000. URL https://doi.org/10.1098/rspb.2004.3000. [Page 7.]

Albert Michelson. Studies in optics. Dover Publications, New York, 1995. ISBN 978-0486687001. [Page 12.]

Jiri Najemnik and Wilson S. Geisler. Optimal eye movement strategies in visual search. Nature, 434(7031):387–391, March 2005. doi: 10.1038/nature03390. URL https://doi.org/10.1038/nature03390. [Page 4.]

Jonathan W. Peirce. PsychoPy—psychophysics software in python. Jour- nal of Neuroscience Methods, 162(1-2):8–13, May 2007. doi: 10.1016/j. jneumeth.2006.11.017. URL https://doi.org/10.1016/j.jneumeth.2006. 11.017. [Page 45.]

Eli Peli. Contrast in complex images. Journal of the Optical Society of America A, 7(10):2032, October 1990. doi: 10.1364/josaa.7.002032. URL https://doi. org/10.1364/josaa.7.002032. [Page 12.]

Carlos R. Ponce and Richard T. Born. Stereopsis. Current Biology, 18(18): R845–R850, September 2008. doi: 10.1016/j.cub.2008.07.006. URL https: //doi.org/10.1016/j.cub.2008.07.006. [Page 6.]

K. M. M. Prabhu. Window functions and their applications in signal processing. CRC Press/Taylor & Francis, Boca Raton Florida, 2014. ISBN 9781138076136. [Page 16.]

Natasha Price, Samuel Green, Jolyon Troscianko, Tom Tregenza, and Martin Stevens. Background matching and disruptive coloration as habitat-specific strategies for camouflage. Scientific Reports, 9(1), May 2019. doi: 10.1038/ s41598-019-44349-2. URL https://doi.org/10.1038/s41598-019-44349-2. [Page 8.]

R. R. Reeder and M. V. Peelen. The contents of the search template for category- level search in natural scenes. Journal of Vision, 13(3):13–13, June 2013. doi: 10.1167/13.3.13. URL https://doi.org/10.1167/13.3.13. [Page 12.]

Reshanne R. Reeder, Wieske van Zoest, and Marius V. Peelen. Involuntary attentional capture by task-irrelevant objects that match the search tem- plate for category detection in natural scenes. Attention, Perception, & Psy- chophysics, 77(4):1070–1080, March 2015. doi: 10.3758/s13414-015-0867-8. URL https://doi.org/10.3758/s13414-015-0867-8. [Page 12.]

Hannah M Rowland. From abbott thayer to the present day: what have we learned about the function of countershading? Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1516):519–527, November 2008. doi: 10.1098/rstb.2008.0261. URL https://doi.org/10.1098/rstb.2008. 0261. [Page 8.]

Graeme D. Ruxton, Tom N. Sherratt, and Michael P. Speed. Avoiding Attack. Ox- ford University Press, October 2004. doi: 10.1093/acprof:oso/9780198528609. 001.0001. URL https://doi.org/10.1093/acprof:oso/9780198528609. 001.0001. [Pages 6 and 7.]

W. L. S. and Hugh B. Cott. Adaptive coloration in animals. The Geographical Journal, 96(3):222, September 1940. doi: 10.2307/1788577. URL https:// doi.org/10.2307/1788577. [Page 6.]

H. Martin Schaefer and Nina Stobbe. Disruptive coloration provides camouflage independent of background matching. Proceedings of the Royal Society B: Bio- logical Sciences, 273(1600):2427–2432, July 2006. doi: 10.1098/rspb.2006.3615. URL https://doi.org/10.1098/rspb.2006.3615. [Page 7.]

Gunnar Schmidtmann, Ben J. Jennings, and Frederick A. A. Kingdom. Shape recognition: convexities, concavities and things in between. Scientific Reports, 5(1), November 2015. doi: 10.1038/srep17142. URL https://doi.org/10. 1038/srep17142. [Page 7.]

Stephen Sebastian, Jared Abrams, and Wilson S. Geisler. Constrained sampling experiments reveal principles of detection in natural scenes. Proceedings of the National Academy of Sciences, 114(28):E5731–E5740, June 2017. doi: 10. 1073/pnas.1619487114. URL https://doi.org/10.1073/pnas.1619487114. [Pages 4, 19, 25, 26, 27, 28, and 30.]

Stephen Sebastian, Eric S. Seemiller, and Wilson S. Geisler. Local reliability weighting explains identification of partially masked objects in natural im- ages. Proceedings of the National Academy of Sciences, 117(47):29363–29370, November 2020. doi: 10.1073/pnas.1912331117. URL https://doi.org/10. 1073/pnas.1912331117. [Pages 6 and 25.]

Eero P. Simoncelli and David J. Heeger. A model of neuronal responses in vi- sual area MT. Vision Research, 38(5):743–761, March 1998. doi: 10.1016/ s0042-6989(97)00183-1. URL https://doi.org/10.1016/s0042-6989(97) 00183-1. [Page 8.]

Aditya Singh, A. Bay, and A. Mirabile. Assessing the importance of colours for cnns in object recognition. 2020. [Page 6.]

Sujit K. Singh, Chitra A. Dhawale, and Sanjay Misra. Survey of object detection methods in camouflaged image. IERI Procedia, 4:351–357, 2013. doi: 10.1016/ j.ieri.2013.11.050. URL https://doi.org/10.1016/j.ieri.2013.11.050.

Liming Song and Weidong Geng. A new camouflage texture evaluation method based on WSSIM and nature image features. In 2010 International Conference

on Multimedia Technology. IEEE, October 2010. doi: 10.1109/icmult.2010. 5631434. URL https://doi.org/10.1109/icmult.2010.5631434. [Page 17.]

Miriam Spering, Dirk Kerzel, Doris I. Braun, Michael J. Hawken, and Karl R. Gegenfurtner. Effects of contrast on smooth pursuit eye movements. Journal of Vision, 5(5):6, May 2005. doi: 10.1167/5.5.6. URL https://doi.org/10. 1167/5.5.6. [Page 10.]

Martin Stevens and Innes C Cuthill. Disruptive coloration, and edge de- tection in early visual processing. Proceedings of the Royal Society B: Biological Sciences, 273(1598):2141–2147, May 2006. doi: 10.1098/rspb.2006.3556. URL https://doi.org/10.1098/rspb.2006.3556. [Page 8.]

Martin Stevens, Innes C Cuthill, Amy M.M Windsor, and Hannah J Walker. Disruptive contrast in animal camouflage. Proceedings of the Royal Society B: Biological Sciences, 273(1600):2433–2438, July 2006. doi: 10.1098/rspb.2006. 3614. URL https://doi.org/10.1098/rspb.2006.3614. [Page 8.]

P. Szendro, G. Vincze, and A. Szasz. BIO-RESPONSE TO WHITE NOISE EX- CITATION. Electro- and Magnetobiology, 20(2):215–229, January 2001. doi: 10.1081/jbc-100104145. URL https://doi.org/10.1081/jbc-100104145. [Page 13.]

J. Timmer and M. Koenig. On generating power law noise. Astronomy and Astrophysics, 300:707, August 1995. [Pages 13 and 14.]

D. J. Tolhurst, Y. Tadmor, and Tang Chao. Amplitude spectra of natural im- ages. Ophthalmic and Physiological Optics, 12(2):229–232, December 2007. doi: 10.1111/j.1475-1313.1992.tb00296.x. URL https://doi.org/10.1111/ j.1475-1313.1992.tb00296.x. [Page 13.]

Elle van Heusden, Anthony M. Harris, Marta I. Garrido, and Hinze Hogendoorn. Predictive coding of visual motion in both monocular and binocular human visual processing. Journal of Vision, 19(1):3, January 2019. doi: 10.1167/19. 1.3. URL https://doi.org/10.1167/19.1.3.

Hans Wallach. Über visuell wahrgenommene bewegungsrichtung. Psychologische Forschung, 20(1):325–380, December 1935. doi: 10.1007/bf02409790. URL https://doi.org/10.1007/bf02409790. [Page 9.]

Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, April 2004. doi: 10.1109/tip.2003.819861. URL https://doi.org/10.1109/tip.2003.819861. [Pages 17, 18, 25, 27, and 28.]

S. G. Wardle, J. Cass, K. R. Brooks, and D. Alais. Breaking camouflage: Binocular disparity reduces contrast masking in natural images. Journal of Vision, 10(14):38–38, December 2010. doi: 10.1167/10.14.38. URL https: //doi.org/10.1167/10.14.38. [Page 6.]

Sophie Wuerger, Robert Shapley, and Nava Rubin. “on the visually perceived direction of motion” by hans wallach: 60 years later. Perception, 25(11):1317– 1367, November 1996. doi: 10.1068/p251317. URL https://doi.org/10. 1068/p251317. [Page 9.]

Z. Yang and D. Purves. The statistical structure of natural light patterns de- termines perceived light intensity. Proceedings of the National Academy of Sciences, 101(23):8745–8750, May 2004. doi: 10.1073/pnas.0402192101. URL https://doi.org/10.1073/pnas.0402192101. [Page 11.]

Z. Zivkovic. Improved adaptive gaussian mixture model for background subtrac- tion. In Proceedings of the 17th International Conference on Pattern Recog- nition, 2004. ICPR 2004. IEEE, 2004. doi: 10.1109/icpr.2004.1333992. URL https://doi.org/10.1109/icpr.2004.1333992. [Page 9.]

List of Figures

1.1 Camouflage in nature ...... 7

3.1 pink noise for red and blue palette ...... 21
3.2 Original image and its spectral image ...... 21
3.3 Interpolated pink noise ...... 22
3.4 Interpolated pink noise spectra ...... 22
3.5 Slices of 3D pink noise ...... 23
3.6 Spectra of slices of 3D pink noise ...... 23
3.7 Shifted pink noise ...... 23
3.8 Spectra of shifted pink noise ...... 23
3.9 Gabor - theta ...... 24
3.10 Gabor - frequency ...... 24
3.11 Gabor - phase ...... 24
3.12 Simple scene example ...... 26

4.1 Example ...... 28
4.2 Plot for simple POC ...... 29
4.3 Plot for SSIM precision ...... 31
4.4 Plot for SSIM precision ...... 31
4.5 Plot for gabor detection ...... 32
4.6 Plot for detection precision ...... 33

5.1 Frames from main CW-SSIM experiment ...... 37
5.2 Frames from main SSIM experiment ...... 37
5.3 Plot for true SSIM performance ...... 38
5.4 Plot for true CW-SSIM performance ...... 38
5.5 Plot for pattern SSIM performance ...... 39
5.6 Frames from main frame difference experiment ...... 40
5.7 Plot for threshold frame difference ...... 40
5.8 Plot for frame difference ...... 41
5.9 Plot for threshold frame difference ...... 41
5.10 Plot for simple frame difference ...... 42
5.11 Plot for improved frame difference ...... 42
5.12 Plot for threshold frame difference ...... 43

Glossary

1/f noise (also pink noise) a signal with a frequency spectrum such that the power spectral density is inversely proportional to the frequency of the signal 3, 4, 12, 13, 56

CNN Convolutional Neural Network 6

contrast a relative difference in luminance 6, 10, 12, 21, 22, 25, 29, 30, 44

CW-SSIM Complex Wavelet SSIM 19, 27, 30, 32, 33, 35, 36, 37, 38, 39, 43, 44, 58

dynamic range the ratio between the largest and smallest values that a certain quantity can assume 11

false positive rate is the fraction of retrieved non-relevant instances among all non-relevant instances 36, 38, 40, 43

false positive is an outcome where the model incorrectly predicts the positive 28

fourier transformation a mathematical technique to transform a function of space f(p), to a function of frequency X(ω) 13, 14, 19

FPR False Positive Rate 38, 39 frame difference a general method to compare two consecutive frames in a video. See simple frame difference, threshold frame difference, and per- centile frame difference 34, 35, 39, 43, 59 gabor function a normalized product of a gaussian function and a complex sinusoid 5, 14, 15, 16, 55 gabor patch a 2D representation of a specific real gabor function 3, 4, 5, 10, 11, 14, 15, 16, 24, 26, 34, 36, 37, 44, 59 gaussian window see gaussian cutout 55 gaussian cutout (also gaussian window) is a gaussian function of the form

\[
\mathrm{cutout}_\sigma(x, y) = \exp\left(-\pi\,\frac{x^2 + y^2}{\sigma^2}\right)
\]

it is used to create round patches with smooth edges 14, 16, 26, 55

gaussian function is a function of the form

\[
\mathrm{gauss}(x) = a \exp\left(-\frac{(x - b)^2}{2c^2}\right)
\]
14, 15, 16, 55
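Purely as an illustration of the two definitions above, a minimal NumPy sketch (the grid size and σ below are arbitrary example values, not the parameters used in the tool) could evaluate them as follows:

```python
import numpy as np

def gaussian_cutout(x, y, sigma):
    # cutout_sigma(x, y) = exp(-pi * (x^2 + y^2) / sigma^2)
    return np.exp(-np.pi * (x ** 2 + y ** 2) / sigma ** 2)

def gaussian(x, a, b, c):
    # gauss(x) = a * exp(-(x - b)^2 / (2 * c^2))
    return a * np.exp(-((x - b) ** 2) / (2 * c ** 2))

# Arbitrary illustration values: a 64x64 grid centred at zero, sigma = 16.
size, sigma = 64, 16.0
coords = np.arange(size) - size / 2
xx, yy = np.meshgrid(coords, coords)
window = gaussian_cutout(xx, yy, sigma)  # smooth round mask, values in (0, 1]
```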

hanning function (also hann function, hanning window, cosine bell) is a windowing function used for smoothing values (not to be confused with the hamming window) 26, 56

hanning window (also hann window) alternative term for hanning function 16, 26, 56

LGN Lateral Geniculate Nucleus 5

luminance a photometric measure of the luminous intensity 11, 12, 17, 18, 19, 20, 24, 25, 29, 30, 44

percentile frame difference method to compute frame difference using several percentile levels to categorize the absolute intensity change of each pixel based on the relative change of the scene as a whole 35, 36, 39, 42, 43, 55, 59

pink noise see 1/f noise 4, 11, 13, 14, 20, 21, 22, 23, 25, 36, 44, 54, 55, 58, 59

POI Point of Interest 34

recall (also TPR) is the fraction of retrieved relevant instances among all relevant instances 36, 37, 38, 39, 40, 42, 43

RMS Root Mean Square 12

simple frame difference method to compute frame difference using absolute change of intensity of every pixel 36, 39, 40, 41, 42, 44, 55, 59

SSIM Structural Similarity Index Measure 9, 11, 17, 18, 19, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 55, 58

stereopsis perception of depth using binocular vision 6

threshold frame difference method to compute frame difference using the absolute change of intensity of every pixel which is higher than a given threshold 35, 36, 39, 40, 41, 42, 43, 44, 55, 59

TPR True Positive Rate 56

A. Attachments

A.1 Source code

All plots, measured data, and videos used in the thesis are generated using the tool. Its source code can be found in this GitHub repository [1]. Every script we needed to perform the experiments is in the ./experiment folder.

A.1.1 Experiments

Scripts to generate videos and plots for the experiments from Chapter 4. All videos are generated by running the run_exp_N_Experiment_Name script without any command line arguments, where N is the number of the experiment and Experiment_Name is the name of the experiment:

• Experiment 4.1 - run_exp_1_simple_poc.py

• Experiment 4.2 - run_exp_2_ssim_precision.py

• Experiment 4.3 - run_exp_3_changing_gabor_adv_detection.py

• Experiment 4.4 - run_exp_4_moving_gabor_adv_detection.py

To simplify the process we created the script generate_all_videos.py [2] that runs all four experiments. Generated videos, logs and measurements [3] are saved in a separate folder named N_Experiment_Name [4]. To generate plots we use the script generate_plots_for_experiment.py with command line arguments which serve as labels for the generated plots and as the folder of the experiment. The script generates a plot for every .csv file in the given folder --folder=Folder [5]. Every plot is saved in .pdf and .png format. A simple script, generate_all_plots.py, was created to generate all plots for all experiments in one go.
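For illustration, a batch runner equivalent in spirit to generate_all_videos.py could look like the sketch below; it assumes only the four script names listed above and is not a copy of the actual script.

```python
import subprocess

# The four preliminary-experiment scripts listed above (Chapter 4).
EXPERIMENT_SCRIPTS = [
    "run_exp_1_simple_poc.py",
    "run_exp_2_ssim_precision.py",
    "run_exp_3_changing_gabor_adv_detection.py",
    "run_exp_4_moving_gabor_adv_detection.py",
]

for script in EXPERIMENT_SCRIPTS:
    # Each script runs without command line arguments; its outputs are
    # written to a separate N_Experiment_Name folder, as described above.
    subprocess.run(["python", script], check=True)
```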

A.1.2 Final experiment

Scripts to generate videos for the experiments from Chapter 5. All videos are generated by running the run_main_exp_gabor_localization.py script without any command line arguments. This will generate videos for every method and for every setting mentioned in Chapter 5.

[1] https://github.com/SeryMa/DancingGabor
[2] A better option in this case would probably be a Makefile, but Makefiles generally do not work well on Windows.
[3] In .csv format.
[4] For convenience we recommend creating a new folder solely for results and running the scripts so that this folder is the working directory.
[5] Defaults to ".".

To generate the .csv files and plots, run evaluate_main_exp_gabor_localization.py from the same directory where the generated videos are. Please be aware that generating all videos takes a long time.

A.2 Results

The attached results can also be found in this Dropbox folder. The attachment contains the generated videos and the .csv files; the Dropbox folder additionally contains the plots, in both .pdf and .png format. Videos and plots of every experiment are placed in separate folders to ease navigation.

Preliminary experiments

Results for the first set of experiments (see Chapter 4):

• 0_test_running - behavior of 3D pink noise and shifted pink noise during experiment 4.1

• 1_simple_poc - results for experiment 4.1

• 2_ssim_precision - results for experiment 4.2

• 3_changing_gabor_adv_detection - results for experiment 4.3

• 4_moving_gabor_adv_detection - results for experiment 4.4

Main experiment

Results for the localization experiment (see Chapter 5). Separate experiments are sorted by their settings. Every setting has a separate folder 5_gabor_localization_XYZ, where X is the setting for the background (either D for a dynamic background or S for a static background), Y is the setting for the stimulus (either D for a dynamic stimulus or S for a static stimulus), and Z is the setting for the stimulus position (either D for a moving stimulus or S for a non-moving stimulus). Every folder contains the results for every method.
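As a purely illustrative aid (not part of the tool itself), the eight folder names implied by this naming scheme can be enumerated with a few lines of Python:

```python
from itertools import product

# X = background, Y = stimulus, Z = stimulus position; D = dynamic/moving, S = static.
for background, stimulus, position in product("DS", repeat=3):
    print(f"5_gabor_localization_{background}{stimulus}{position}")
```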

SSIM

All results for the SSIM methods, explained in detail in Section 5.2.1:

• true_ssim - results for SSIM with visual cue

• true_cw_ssim - results for CW-SSIM with visual cue

• pattern_ssim - results for SSIM with pattern search

Frame difference

All results for the frame difference methods, explained in detail in Section 5.2.2:

• threshold_diff - results for threshold frame difference

• pure_diff - results for simple frame difference

• avg_diff - results for percentile frame difference

Other resources

We also include generated videos for the other two mentioned dynamization methods (see Section 3.1). The videos are located in the folder 0_test_running. All videos in this folder follow the same file format that was used in Experiment 4.1, so the reader can observe the generated noise and gabor patch in different positions. Files named v2 denote the pink noise generated using the 3D approach, and files named run denote the pink noise that is shifting downwards.
