Social Influences on Songbird Behavior: From Song Learning to Motion Coordination

By

Iva Ljubičić

A dissertation submitted to the Graduate Faculty in Biology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York

2018

© 2018 Iva Ljubičić All Rights Reserved

Social Influences on Songbird Behavior: From Song Learning to Motion Coordination

By

Iva Ljubičić

This manuscript has been read and accepted by the Graduate Faculty in Biology in satisfaction of the dissertation requirement for the degree of Doctor of Philosophy.

______ Date
Chair of Examining Committee: Dr. Ofer Tchernichovski, Hunter College

______ Date
Executive Officer: Dr. Cathy Savage-Dunn

______Dr. Mark Hauber, Hunter College

______Dr. David Lahti, Queens College

______Dr. Carolyn Pytte, Queens College

______Dr. Sébastien Derégnaucourt, Paris West University

The City University of New York

Social Influences on Songbird Behavior: From Song Learning to Motion Coordination

Iva Ljubičić

Advisor: Dr. Ofer Tchernichovski

Abstract

Social animals learn during development how to integrate successfully into their group. How do social interactions combine to maintain group cohesion? We first review how social environments can influence the development of vocal learners, such as songbirds and humans (Chapter 1). To bypass the complexity of natural social interactions and gain experimental control, we developed Virtual Social Environments, surrounding the bird with videos of manipulated playbacks. This way we were able to design sensory and social scenarios and test how social zebra finches adjust their behavior (Chapters 2 & 3). A serious challenge is that the color output of a video monitor does not match the color vision of zebra finches. To minimize chromatic distortion, we eliminated all of the colors from the videos, except in the beak and cheeks, where we superimposed colors that match the sensitivity of zebra finch photoreceptors (Chapter 2). Birds strongly preferred to watch these manipulated 'bird-appropriate' videos.

We also designed Virtual Social Environments for assessing how observing movement patterns might affect behavior in real time (Chapter 3). We found that presenting birds with manipulated movement patterns of virtual males promptly affects the mobility of birds watching the videos: birds move more when virtual males increase their movements, and they decrease their movements and 'cuddle' next to virtual males that stop moving. These results suggest that individuals adjust their activity levels to the statistical patterns of observed conspecific movements, which can explain zebra finch group cohesion. Finally, we studied the song development process in the absence of social input to determine how intrinsic biases and external stimuli shape song from undifferentiated syllables into the well-defined categorical signals of adult song (Chapter 4). Do juveniles learn the statistics of early sub-song to guide vocal development? We trained juvenile zebra finches with playbacks of their own, highly variable, developing song and showed that these self-tutored birds developed distinct syllable types (categories) as fast as birds that were trained with a categorical, adult song template. Therefore, the statistical structure of early input seems to have no bearing on the development of phonetic categories. Overall, our results uncover social forces that influence individual behaviors, from motion coordination to vocal development, with implications for how group structures and vocal culture are maintained.

Acknowledgements

I am so grateful to the many people who made this work possible. My biggest thanks go to my advisor, Ofer Tchernichovski, for encouraging the never-ending lab discussions about science, politics, philosophy and art. Thank you for the light-hearted atmosphere that allowed us to make fun of you, and each other. I am so glad that your hiring policy is “intelligence is overrated,” which guaranteed that I would be surrounded by like-minded people. I could not have survived the PhD process without the support from you and the lab-family you built. I know that when I think back on the years in the lab, I won’t remember the disappointments and frustrations, but will instead think fondly of the laughter and good times we had whenever we were together.

Thank you to my Doctoral Committee for your time and expertise in “Bird Science” – Drs. Mark Hauber, David Lahti, Carolyn Pytte and Sébastien Derégnaucourt. Your feedback was invaluable in shaping my thesis. Special thanks go to my collaborators Drs. Olga Fehér and James Gordon. Olga was the first person I met when I visited the lab, and her vehement arguments with Ofer during lunchtime inspired me to join his lab. I learned a lot from you, and I am glad we got to work on experiments together. Jim, thank you for lending us your spectrophotometer, and for always making yourself available to (patiently) answer my questions. Thank you to the staff and faculty at the Graduate Center and Hunter College for their support over the years. I am especially grateful to the Psychology Department at Hunter for providing me with the opportunity to teach undergraduates who were eager to learn about human and animal behavior.

I am deeply grateful to the many friends in and out of academia who supported me and kept me sane: Kristel Shu Bassett, Anisa Dhimarko, Jill Guzzo, Julia Hyland Bruno, Dina Lipkind, Martina Ljubičić, Petra Ljubičić, Alesandra Nahodil, M. Aaron Owen, Christina Roeske, and many, many others along the way. Thank you to my parents, who supported me in all my endeavors, even when I made the “weird” decision to study animal behavior. Finally, thank you to the two little zebra finches that were always looking over my shoulder while I worked on my thesis: you were a daily source of joy over the last few years.

Table of Contents

Abstract iv

Acknowledgements vi

List of Figures viii

List of Tables ix

Chapter 1 Introduction: Social Influences on Vocal Development 1

Chapter 2 Designing Virtual Environments to Study Social Influences on Behavior 11

Chapter 3 Zebra Finches Adjust their Activity Levels to Match Mobility Patterns of Virtual Males 27

Chapter 4 Statistical Learning in Songbirds: From Self-Tutoring to Song Culture 45

Appendix Song Learning from Virtual Tutors: A Pilot Study 63

References 71


List of Figures

Figure 1.1 A schematic of the complexity of possible social interactions throughout vocal development 4
Figure 1.2 A schematic of a Virtual Social Environment to test how developing songbirds use statistical rules to guide vocal learning 10
Figure 2.1 Spectral sensitivity of individual cone photoreceptors of the zebra finch and phosphor radiance spectra of the tablets 17
Figure 2.2 Video stimuli manipulations 18
Figure 2.3 Comparison of cheek and beak reflectance between live males and video males 19
Figure 2.4 Relative cone outputs for red beaks and orange cheeks 21
Figure 2.5 Results of video preference tests 24
Figure 3.1 Schematic of the setup 31
Figure 3.2 Movement speeds of birds across the two configurations 35
Figure 3.3 The Introduction Effect 36
Figure 3.4 Comparison of movement ratios between time points of high and low disparity between speeds 37
Figure 3.5 Birds dynamically adjust their activity levels over time 39
Figure 3.6 Scatterplots of the combined movements across all birds where video speeds are decreasing 40
Figure 3.7 Scatterplots of the combined movements across all birds where video speeds are increasing 41
Figure 3.8 Heatmaps reveal that birds tend to settle in front of tablets with males when the video is a static image 42
Figure 4.1 Transition from continuous to categorical signal over development 48
Figure 4.2 Self-tutoring in zebra finches 50
Figure 4.3 Adult songs of isolate and self-tutored sibling pairs 55
Figure 4.4 Syllables form tight clusters in tape-tutored and self-tutored birds but not in isolates 57

Figure 4.5 Distribution of isolate, self-tutored and tape-tutored song features 59
Figure A1 Similarity to tutor song at the endpoint of song development 66
Figure A2 Adult songs of birds trained with passive song 67
Figure A3 Adult songs of birds trained with video of the singing "lipstick" male 68
Figure A4 Adult songs of birds trained with video of the singing "floating cheeks and beak" 69

List of Tables

Table 3.1 An outline of the two different video configurations presented to the birds 30
Table 3.2 Calculation of Movement Ratios (MR) 33


Chapter 1

Introduction: Social Influences on Vocal Development

A modified version of parts of this chapter has been published in Current Opinion in Behavioral Sciences 7, 101-107 (2016), titled "Social influences on song learning." Authors: Iva Ljubičić, Julia Hyland Bruno, and Ofer Tchernichovski

Background

In vocal learners, such as humans and songbirds, the young learn from their social networks not only through direct (potentially aggressive) interactions, but also by attending to the consequences of interactions between others. In humans, even two-day-old infants prefer to listen to infant-directed speech (Cooper & Aslin, 1990), which is higher in pitch, simplified in grammar, and slower overall (Kuhl et al., 1997). These manipulations facilitate word recognition (Singh et al., 2009) and segmentation (Thiessen et al., 2005) by infants. The timing of vocal exchanges with the caregiver is particularly important: nine-month-old infants can quickly match the phonological structure of their mother's vocalizations, but only if she answers their babbling promptly (Goldstein & Schwade, 2008). Similar influences have been observed in songbirds: juvenile zebra finches produce poor imitation of non-interactive song playbacks, but copy live tutors with high accuracy (Derégnaucourt et al., 2013; Eales, 1989). Vocal development in songbirds is also susceptible to complex social interactions with peers and adults other than parents (Derégnaucourt & Gahr, 2013; Soma, 2011). There is a large diversity of song learning strategies across songbird species: there are differences in the timing and speed of learning, repertoire size, and the degree of imitation (Beecher & Brenowitz, 2005). This social complexity, combined with the long duration of vocal development, makes it difficult to assess how social forces combine to shape the outcomes of vocal learning.

Here we focus on songbirds that are age-limited learners, where social influences occur within a narrow developmental window (Beecher & Brenowitz, 2005; Brainard & Doupe, 2002), making it feasible to track an entire vocal development under controlled conditions (Lipkind & Tchernichovski, 2011; Tchernichovski et al., 2004). We review recent technical advances for quantifying the cumulative influences of social factors on vocal development. We start with techniques for tracking vocal interactions, non-vocal communication, and eavesdropping. We then present a framework for a more ambitious approach of experimenting with virtual social environments (VSEs), in order to test the influence of specific social variables and potential synergies between them. If successful, VSEs could test a broad range of hypotheses about how songbirds interpret and incorporate social information during vocal development.

Complexity of the Social Environment

The social environment can potentially shape the course of vocal development through vocal interactions, non-vocal communication, and eavesdropping (Figure 1.1):

Vocal interactions: The role of auditory input on vocal development is much more extensive than merely providing 'sensory templates' for imitation. In humans, turn-taking and overlap avoidance are universal across languages, and minor variations in turn-taking tempo can define cultural identities (Stivers et al., 2009). At three months old, infants are already attentive to minor temporal delays in interactive episodes with their mothers (Striano et al., 2006), suggesting that perception of turn-taking rules develops very early. Many animal species, including vocal non-learners, exhibit antiphonal vocal exchanges. Juvenile marmoset monkeys, for example, learn to avoid call overlap with family members (Chow et al., 2015). It is not known if antiphonal calling affects song development in songbirds, but since juveniles typically exchange thousands of calls with parents and peers every day (Elie et al., 2011), developmental effects are likely.


Figure 1.1. A schematic of the complexity of possible social interactions throughout vocal development. Vocal development may be influenced by many social experiences. These include direct vocal and non-vocal interactions, and observed or overheard interactions between others. At any point in developmental time, these influences can have varying effects as the bird’s social networks increase in complexity.

Non-vocal communication: In humans, nonverbal maternal responses increase the complexity of an eight-month-old's speech, but only if these are contingently related (Goldstein et al., 2003). Fourteen-month-olds will turn their head at a pointing gesture, and by 18 months they will use this social cue to learn new words (Briganti & Cohen, 2011). Non-vocal gestures appear to affect vocal development in songbirds as well, often in subtle ways: female cowbirds, which do not sing, guide juvenile males to produce certain dialects with quick wing strokes or beak gapes (King et al., 2005; West & King, 1988). Remarkably, lesioning forebrain song nuclei eliminates the female's preferences for high quality songs and generates disruptions within the social network (Maguire et al., 2013). In zebra finches, a large array of both affiliative gestures and aggressive interactions, such as preening and feeding or chasing and pecking, appear to correlate with tutor choice (Jones & Slater, 1996; Williams, 1990), but controlling for such social influences is challenging.

Eavesdropping: In order to acquire an appropriate vocal repertoire, juveniles need to learn about their surrounding social network. Human infants as young as 18 months can learn new words from overhearing adult conversation (Floor & Akhtar, 2006). Until recently, the capacity to make social inferences via observation was thought to exist in only a few species of non-human animals, but it now appears that many species of birds also take advantage of eavesdropping to acquire vital social information. For instance, juvenile song sparrows silently eavesdrop on interactions between adult males (Templeton et al., 2010), and often imitate more songs from an adult they hear interacting with another juvenile than one they interacted with directly (Beecher et al., 2007). Here, the timing of song learning is important: adult males only tolerate immature males within their territory, which suggests that opportunities for direct interactions may exist during early learning, while being limited to eavesdropping later on (Templeton et al., 2012). In a laboratory setting, canaries that overheard overlapping songs between two males were less likely to call to the interrupting "winner" male at a later time (Amy & Leboucher, 2009). This could mean that they perceive and remember dominance in singing interactions. Finally, cowbird females attend to the calling behavior of other females in their flock, producing more copulation displays in response to male songs previously associated with female calls (Freed-Brown & White, 2009). These findings, however, have limited bearing on whether juvenile songbirds extract information by observing social interactions and how this affects song development.

Tracking social interactions during song development

Song learning is a means by which an individual establishes its place within a social network. Recent advances in computation power, data storage, and miniaturization of electronic devices have made it possible to study song development continuously (Derégnaucourt et al., 2005; Lipkind et al., 2013) and even across generations (Fehér et al., 2009), but only in laboratory environments where social interactions are either restricted or nonexistent. Field studies provide ecological validity, but monitoring vocal development in the wild is usually not feasible. As a result, cultural transmission of birdsong has been studied almost exclusively by sampling mature songs and quantifying population-level vocal changes over time (Goodale & Podos, 2010; Williams et al., 2013). Although it is still prohibitive to study song learning continuously outside of the lab, technologies for tracking individual behavior in social environments are rapidly emerging (Dell et al., 2014). Recently, the spread of new foraging techniques in wild great tits was tracked over several weeks, revealing a gradual cultural spread with the emergence of strong conformity to cultural norms (Aplin et al., 2015). Similar tracking of individuals was used to determine changes in zebra finch social networks due to developmental stress (Boogert et al., 2014), but was limited to identification of birds at feeders and did not track moment-to-moment interactions between individuals within the network. Adopting a similar approach in order to study vocal learning would require continuously recording vocalizations, relative 3D location, and heading, as well as mechanisms for detecting non-vocal gestures.

Tracking vocal interactions: In humans, the LENA system (Xu et al., 2009) now allows dense sampling of early speech development over several months. Similarly, tracking vocal interactions in small songbirds is now feasible using lightweight wireless microphones (Anisimov et al., 2014; Ter Maat et al., 2014). At least in semi-natural aviaries in the lab, these techniques now allow for tracking vocal interactions during pair-bond formation (Gill et al., 2015) and could be easily extended to juvenile song development (Lipkind & Tchernichovski, 2011).

Tracking relative location and heading: In animals where movement is restricted to two-dimensional planes, it is relatively easy to track the location and heading of multiple animals in social environments (e.g., fruit flies (Branson et al., 2009), mice (de Chaumont et al., 2012), and fish swimming in shallow waters (Katz et al., 2011; Rosenthal et al., 2015)). GPS has been used to track movement patterns in humans (González et al., 2008) and wild baboons (Strandburg-Peshkin et al., 2015), and 3D image analysis has been used to capture aerial displays in starling flocks (Bialek et al., 2012; Cavagna et al., 2010). In insects, tracking 3D trajectories requires highly controlled environments (Mischiati et al., 2014). To our knowledge, no techniques are currently available for monitoring the 3D moment-to-moment movements of small songbirds in a social environment over the course of song development. However, it is likely that combining the above technologies will make this possible soon.

Tracking non-vocal gestures: Recently, several groups have developed techniques for automatically classifying predefined behaviors, such as social interactions in mice (de Chaumont et al., 2012) and fruit flies (Branson et al., 2009), and courtship behavior using 3D body postures in mice (Matsumoto et al., 2013). There are both technical and conceptual issues inherent in using automated detection of ethologically meaningful gestures. However, even a crude detection of body postures (Matsumoto et al., 2013) or gaze (Butler & Fernández-Juricic, 2014) could be useful for studying social effects on song learning.

Toward virtual social environments (VSEs) for studying vocal development

A major appeal of studying birdsong is the fact that, like human language, it is culturally transmitted. Figuring out how social processes accumulate to shape vocal learning in songbirds is within reach. Here we present a framework for a more ambitious approach of experimenting with virtual social environments (VSE), specifically at the level of direct social interactions, eavesdropping, and statistical learning.

Simulating social interactions: In a VSE, the relative contributions of different sensory information to song learning may be titrated in a controlled manner that cannot be achieved with live tutors. For example, does the orientation of a virtual tutor affect a juvenile’s developing song? Do juveniles prefer to imitate males that interact with them over males that are interacting with their own mates? Do juveniles establish coordinated call interactions with tutors, and does this influence their imitation decisions? Such virtual scenarios could be made non-contingent at first (i.e. by simply using video playbacks) to establish a baseline for more complex VSEs, where visual and auditory stimuli are interactive (King, 2015).

Simulating eavesdropping scenarios: One of the exciting questions that VSEs may answer is how, and at what level, a juvenile bird can learn to incorporate observed social information into its song. For example, to what extent can a bird derive social rules from observing aggressive interactions? Suppose that bird A displaces bird B, but bird B displaces bird C: will the juvenile observer infer a linear hierarchy from those pairwise interactions (Rose et al., 2004), and use this information to alter its developing song? Such an inference could affect moment-to-moment behavior (e.g., deciding which virtual bird to approach or exchange calls with; Amy & Leboucher, 2009), as well as developmental trajectories related to tutor choice and song convergence or divergence within the peer group.

Simulating statistical learning scenarios: Human infants can perceive statistical regularities in spoken language very early (Marcus et al., 1999), which helps them to segment spoken language into words and sentences (Jusczyk, 2000). It is unknown to what extent songbirds acquire and use statistical information in the context of vocal development. At the simplest level, one might ask if social birds can keep track of the individuals around them, and relate this information to their own actions. For example, imagine a VSE where natural variations in song structure become contingent upon dynamic changes in the sex ratio of bird images presented: will the juvenile perceive this contingency and alter its song so as to attract more females (Figure 1.2)? If successful, tracking vocal changes this way would reveal which statistical rules are most effective in dynamically inducing vocal changes. Similar approaches could be developed for testing sensitivity to spatial and contextual rules of singing behavior.


Figure 1.2. A schematic of a Virtual Social Environment that can be designed to test how developing songbirds use statistical rules to guide vocal learning. Here, the VSE changes dynamically via real time pitch detection. As the bird sings, the pitch of his song alters the ratio of virtual females to males facing him, allowing us to test if the bird will perceive this contingency and adjust his song to attract more females.

Conclusions

In this chapter, we highlighted three categories of social factors that are likely to play a role in vocal development in both humans and songbirds: vocal interactions, non-vocal communication, and eavesdropping as means of acquiring pragmatic information about the social world. We suggest that these factors may modulate sensorimotor learning mechanisms that are well understood in songbirds through many developmental studies (Okubo et al., 2015). Designing VSEs to test specific hypotheses about social influences on vocal learning could provide an inroad into the ecological complexity of vocal development without sacrificing experimental control.


Chapter 2

Designing Virtual Environments to Study Social Influences on Behavior

Introduction

Continuous monitoring of social interactions is likely to provide deeper understanding of song development. However, establishing causality from such complex longitudinal data is unlikely to succeed unless complemented by controlled studies. We suggest that this can be achieved by developing virtual social environments (VSEs) that simulate the social context of song learning, through iterative interactions between the developing bird and a computerized visual and vocal interface. Video playbacks have been successfully used to study avian visual communication (Evans & Marler, 1991; Partan et al., 2005), species recognition (Adret, 1997; Galoch & Bischof, 2006a), and individual recognition (Bird & Emery, 2008; Galoch & Bischof, 2006b). Video playbacks, however, were only rarely used to study song learning (Deshpande et al., 2014). A serious concern in using video is that songbirds have very different color sensitivity than that of humans, and therefore, video screens are likely to render a distorted (and perhaps even incomprehensible) color image from the point of view of the bird (Cuthill et al., 2000; Fleishman & Endler, 2000).

Avian Color Vision

Human color vision relies on three types of photoreceptors (cones). Each of those three cone types is sensitive to a distinct range of the light spectrum: short wavelength sensitivity 'blue' cones (SWS), medium wavelength sensitivity 'green' cones (MWS), and long wavelength sensitivity 'red' cones (LWS). Note that each cone type is sensitive to a broad spectral range, and there is much overlap in the spectral ranges of different cone types. The perception of color stems, to a large extent, from the ratio in which different cone types are activated. That is, our 'color space' has three dimensions, which means that it can be reconstructed by 'blending' three primary colors. Avian color vision, however, relies on four different cone types: birds have an additional cone type that is sensitive to the violet or ultraviolet spectrum (VS/UVS), depending on the species (Cuthill et al., 2000). To further complicate matters, birds have oil droplets that act as cut-off filters, which tune the spectral sensitivity of their respective photoreceptors to longer wavelengths (Bowmaker et al., 1997; Vorobyev et al., 1998). Therefore, the color space of a songbird has higher dimensionality than that of humans, which means that blending three primary colors might not suffice to reconstruct the color space that they can perceive.

The main challenge of using video playbacks for animal experiments is that monitors are designed for human eyes. Each screen pixel is composed of three types of light emitters called phosphors. Each phosphor generates a fairly pure color tone (primary color): blue, green, or red. The blue phosphor stimulates mostly SWS cones, the green mostly MWS cones, and the red mostly LWS cones (Cuthill et al., 2000; Fleishman & Endler, 2000). Even though the color spectrum produced by the monitor has only three sharp peaks (RGB: red, green, and blue), whereas real-world color scenes are often composed of very complex spectra, monitors are engineered to accurately replicate the sensation of natural colors by activating similar ratios of cones – in humans. In other words, the entire space of natural colors is accurately mapped onto the three phosphors of each pixel, so as to mimic our perception of color. It should be noted that the term 'phosphor' is a bit of a misnomer: modern screens do not contain physical phosphor anymore, but they still receive the same three color channels (RGB input) and each pixel digitally 'blends' them.

Since the number of cone types and their spectral sensitivities differ so strongly between birds and humans, video monitors engineered for human eyes are unlikely to be ideal for birds. First, monitors emit almost no UV light, making the color space deficient for birds: since their UV cone will remain un-stimulated, a bird's 'white' (which is perceived when all of the cones are stimulated in equal proportions) cannot be reconstructed (Cuthill et al., 2000). Second, the color (RGB) ratio of the screen will not necessarily activate the remaining three avian cones in a manner that resembles their activation while looking at the natural scene that the video images attempt to imitate. Therefore, the color space on a video monitor might look wildly different to a bird than it does to a human (Cuthill et al., 2000).

Still, birds can perceive video playbacks despite their chromatic distortions. In zebra finches and Bengalese finches, several video playback experiments show that they can recognize their own species (Galoch & Bischof, 2006a; Ikebuchi & Okanoya, 1999), and even recognize familiar individuals over strangers (Fleischman et al., 2016; Galoch & Bischof, 2006b). Monitor flaws therefore do not preclude the use of video playbacks in avian research, but when studying behaviors such as sexual selection or tutor choice on video monitors, it makes sense to attempt to fix some of those flaws.

In this chapter, we first tested the extent to which the color space of a video monitor matches the reflectance spectra of zebra finch males' cheek plumage and beak colors, and whether color monitors correctly stimulate zebra finch photoreceptors. We then edited and adjusted the colors of the video playbacks accordingly. Finally, we performed preference tests to examine whether zebra finches prefer video playbacks that minimize chromatic distortion.

Methods

All experimental procedures were approved by the Institutional Animal Care and Use Committees of Hunter College. All birds used in this study are from the Hunter College zebra finch colony.

Estimating Sensitivity Functions and Phosphor Radiance Spectra

To determine the relative activation of cone types in zebra finches across different stimuli, we need to estimate the sensitivity functions of all four photoreceptors, the reflectance spectra of the stimuli of interest, and the spectrum of the ambient light in which the stimuli will be perceived. We can then attempt to match the three color channels of our displays to the sensitivity of zebra finch photoreceptors. Finally, we must verify that we can properly calibrate the color intensity of our monitor, to achieve all the desired ratios of cone activation (Fleishman et al., 1998).

To calculate the spectral sensitivity functions of zebra finch photoreceptors, we used Maximov's approximation formulas (Maximov, 1989) on published data for the peak sensitivity of visual pigments (Bowmaker et al., 1997), while taking into account filtering by oil droplets and ocular media (Hart & Vorobyev, 2005). We measured radiance spectra using a SpectraScan PR-670 on Nexus 7 tablets (2013 model) displaying an all-red, all-green, or all-blue image on the screen at the highest brightness. Each spectrum was measured as a proportion of a white Spectralon standard.

Birds have oil droplets that are specialized to filter incoming light before it reaches the photoreceptor. Oil droplets tune sensitivity towards longer wavelengths, reducing overlap between adjacent cones and allowing for better color discrimination (Figure 2.1a). Coincidentally, even though videos lack a UV phosphor, the emissions of our tablets' R, G, and B channels are similar to the zebra finch cone sensitivity range (Figure 2.1b). This means that we can vary the intensities of the R, G, B colors independently on the monitor to affect the cone activation ratios as needed. Zebra finch UV cones can only be weakly stimulated by the tablets (Figure 2.1b), so colors with a UV component cannot be replicated faithfully with standard video. Even though the pure R, G, and B colors are a good match for photoreceptor sensitivity, it does not mean that cameras inherently blended the colors correctly during video recording. In the next section, we measured the radiance spectra from manipulated and un-manipulated videos and compared them to reflectance spectra from the plumage of live males.


Figure 2.1. Spectral sensitivity of individual cone photoreceptors of the zebra finch and phosphor radiance spectra of the tablets. a) Spectral sensitivity functions are estimated from peak sensitivities reported in Bowmaker et al. (1997). We scaled the functions to account for filtering of light through the ocular media (dashed lines) and oil droplets (solid lines). Oil droplets tune sensitivity towards longer wavelengths, reducing overlap between adjacent cones, allowing for better color discrimination. b) Radiance spectra of the three different phosphors emitted by our tablets (dotted), with the cone sensitivity of zebra finches. The peaks of phosphor radiance are scaled to match the peaks of the cone sensitivities. The good match between the phosphor radiances and cone sensitivity indicates that we could alter the R, G, B color space to change relative cone ratios if needed.

Measuring Reflectance from Zebra Finch Cheeks and Beak

To compare the plumage reflectance spectra of live males to the radiance spectra from virtual zebra finches, we measured reflectance of two regions of interest: the male's red beak and orange cheeks. For live males, we measured reflectance from the cheeks and beak of 5 unrelated adult males. For virtual males, we measured the radiance of the cheeks and beak from two different still images of the male: one with the colors left untouched, and the other as a gray-scale image with a controlled color imposed on the beak and cheek areas (see Creating Video Stimuli section, below). All measurements were recorded using a SpectraScan PR-670. Each region of interest was measured as a proportion of a white Spectralon standard.

Creating Video Stimuli

We recorded videos of a singing adult male with a Nexus 7 tablet (Figure 2.2, left). We developed image analysis code to automatically identify and segment the cheeks and beak of the male in the videos, using Matlab R2015b. This allowed us to manipulate the 'red' beak pixels, setting them all to a standardized red value. We performed a similar 'standardization' to the 'orange' cheeks (Figure 2.2, right). We then converted each video frame into a gray-scale image, and overlaid the gray levels of the beak and cheeks with our standard 'red' and 'orange' colors (Figure 2.2, middle). This way, we eliminated all the potentially distorted colors from our videos, retaining only our standard red and orange colors on the beak and cheeks.

Figure 2.2. Video stimuli manipulations. Left, “normal male”: un-manipulated video where colors are untouched. Middle, “lipstick male”: The colors are removed from the male except for the manipulated colors on the beak and cheeks. Right, “floating cheeks and beak”: Manipulated cheeks and beak, without the context of the male.
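As a rough illustration of this pipeline, the sketch below reproduces the gray-scale-plus-overlay step in Python with OpenCV. This is only a sketch under stated assumptions: the original code was written in Matlab R2015b, and the segmentation masks and the exact standardized color values here are illustrative, not the values used in the experiment.

    import cv2
    import numpy as np

    # Illustrative standardized colors (BGR); the actual values used in the
    # experiment are not restated here.
    BEAK_RED = (40, 40, 230)
    CHEEK_ORANGE = (40, 130, 240)

    def lipstick_frame(frame, beak_mask, cheek_mask):
        """Convert one video frame to gray-scale, then overlay the standardized
        beak and cheek colors. beak_mask and cheek_mask are boolean arrays
        marking the segmented regions (the segmentation itself is assumed)."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # 3-channel gray image
        out[beak_mask] = BEAK_RED
        out[cheek_mask] = CHEEK_ORANGE
        return out

Applying this function to every frame of a recorded video, with per-frame masks, yields the "lipstick" stimulus; leaving out the gray male and keeping only the colored regions on a uniform gray background yields the "floating cheeks and beak" stimulus.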

Spectral Reflectance

The spectral reflectance of the beak and cheeks from live adult zebra finch males shows that there is some small reflectance in the UV range (wavelengths below 400nm) for beaks (Figure 2.3, left, solid line), but not cheeks (Figure 2.3, right, solid line). We measured the radiance of the beak and cheeks from still images of the male in the un-manipulated video. The shape of the spectra of the beak and cheeks from a live male is different from the spectra of the cheeks and beak from a virtual (video) male (Figure 2.3, solid vs. dashed lines). However, are these differences meaningful to the birds? The shapes by themselves don't tell us how zebra finch cones are activated in response to these different stimuli. In the next section, we compare the activation ratios in zebra finch photoreceptors to assess how the manipulated and un-manipulated videos compare to plumage from live males.

Figure 2.3. Comparison of cheek and beak reflectance between live males and video males. Left: Mean reflectance spectra of the red beak from a live male (solid line) and a male displayed on a tablet (dashed line). Right: Mean reflectance spectra of the orange cheeks from a live male (solid line) and a male displayed on a tablet (dashed line). Videos of males have different spectral shapes compared to live males. These can be used to calculate how photoreceptors are activated across the different stimuli. All spectra are scaled to the maximum value of the beak radiance.

Calculating Cone Outputs

We can now test and cross-validate whether our videos are likely to be appropriate for zebra finch vision, and whether our color manipulations have any effect. That is, we need to compare our estimates for relative cone activation across live plumage and video colors and make sure that cone activation ratios are similar across stimuli. To compare how a zebra finch perceives the different visual stimuli, we calculated the relative cone output for each stimulus, taking into account the sensitivity of each cone and the ambient light where birds viewed the tablets.

Cone output was calculated by integrating the area under the curve formed by the product of the ambient light spectrum (where the stimuli are viewed), the reflectance spectrum of the stimulus of interest, and the individual photoreceptor cone sensitivity functions (as calculated in the previous section) (Fleishman & Endler, 2000). Ambient light was measured with a SpectraScan PR-670 as a proportion of a white Spectralon standard.
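For concreteness, here is a minimal sketch of this calculation in Python (our analyses were done in Matlab; the array names and the normalization to a sum of 1 are assumptions made for illustration):

    import numpy as np

    def relative_cone_outputs(wl, ambient, reflectance, cone_sens):
        """Relative cone outputs in the spirit of Fleishman & Endler (2000):
        integrate ambient * reflectance * sensitivity over wavelength for each
        cone type, then normalize so the outputs sum to 1.
        wl: wavelengths (nm); ambient, reflectance: spectra sampled at wl;
        cone_sens: dict mapping cone name (UVS/SWS/MWS/LWS) to its sensitivity."""
        raw = {name: np.trapz(ambient * reflectance * s, wl)
               for name, s in cone_sens.items()}
        total = sum(raw.values())
        return {name: q / total for name, q in raw.items()}

The same function can be applied to the live plumage reflectance, the un-manipulated video radiance, and the manipulated video radiance, so that the resulting cone-activation ratios are directly comparable across stimuli.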

Even though red beaks reflect in the UV range (Figure 2.3, left), our artificial ambient light does not emit radiation in the UV range, so the UVS cones will not be stimulated (Figure 2.4). Beaks on a tablet, whether the image is manipulated or not, would stimulate LWS cones slightly more than real beaks (Figure 2.4, left panel). This will make the red on a tablet appear more saturated (Fleishman & Endler, 2000). Cheeks from all stimuli would affect cones in almost equal ratios (Figure 2.4, right panel).


Figure 2.4. Relative cone outputs for red beaks (left) and orange cheeks (right). Stimuli from un-manipulated images are represented in black, while stimuli from manipulated images are in gray. Left panel: A beak on both tablets stimulates LWS more than a live bird (red), making it appear more saturated. Right panel: Cheeks stimulate cones in similar ratios to that of a live male, regardless of the video source.

There is very little difference in color representation of the virtual beak and cheeks between the manipulated and un-manipulated images. Red/orange colors from a tablet stimulate the receptors in zebra finch eyes in the same relative proportions as real red/orange plumage, so we did not modify the color space further. In the next section, we performed preference tests to see if birds are behaviorally affected by the color manipulations.

Video preference tests

In order to judge if zebra finches had a preference for videos where colors had been manipulated, we presented them with three versions of the same adult male:

1. Un-manipulated video: colors are untouched (Figure 2.2, left).

2. "Lipstick" male: gray-scale male video overlaid with standardized cheek and beak colors (Figure 2.2, middle). This manipulation eliminates all colors from the video while retaining standardized beak and cheek colors.

3. "Floating" cheeks and beak: presented against a gray background, without the bird and without any background scenery (Figure 2.2, right).

Three Nexus 7 tablets (2013 model) were placed equidistant from each other in a 34" circular arena. Each bird (n=14, 7 males and 7 females) was placed singly in the arena while all three tablets were presenting video playbacks in a loop. The session concluded when the tablets turned off automatically after 40 minutes. Songs were played intermittently from all three tablets simultaneously to attract the birds to the tablets. Songs consisted of long bouts with 9 motifs of a four-syllable song, and short bouts with 4 motifs within each bout. Songs played, on average, every 5.5 minutes, and the rest of the video was silent. Sessions were continuously recorded from an overhead video camera (PointGrey Grasshopper).

Each bird received one daily session for six days. We changed the spatial arrangement of the tablets with every session to minimize place-bias. We used idTracker software to automatically identify bird locations (Pérez-Escudero et al., 2014) and Matlab 2017a for data analysis. When a bird was within an 8-inch radius of a tablet (the body-span of 2 zebra finches), we categorized its behavior as watching the tablet. Sessions where the bird did not approach any of the tablets for at least 5% of the session were discarded. We discarded data from three females and one male that did not approach tablets for more than half of the six sessions.
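A minimal sketch of this classification step, assuming the idTracker output is loaded as an array of per-frame x, y positions with NaNs where tracking failed (the pixel-per-inch calibration is an assumption, not stated in the text):

    import numpy as np

    def fraction_watching(traj, tablet_xy, watch_radius_px):
        """Fraction of tracked frames spent within watch_radius_px of one tablet.
        traj: (n_frames, 2) array of bird positions; tablet_xy: tablet center.
        watch_radius_px: 8 inches converted to pixels via arena calibration."""
        dist = np.linalg.norm(traj - np.asarray(tablet_xy), axis=1)
        valid = ~np.isnan(dist)           # drop frames where the bird was lost
        return np.mean(dist[valid] <= watch_radius_px)

Running this once per tablet and once for the remainder of the arena partitions each session into the time fractions reported in the Results below.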

Results

As expected, birds preferred the un-manipulated video over the floating cheeks & beak video, as they spent 19% of their time watching the un-manipulated video and only 4% of their time watching the floating cheeks & beak video. Interestingly, however, birds strongly preferred the 'lipstick' male video, and spent 59% of their time watching it (Figure 2.5). Differences in birds' preferences across the three stimuli were statistically significant (non-parametric Friedman test, Q=17.5, p<0.01, n=10 birds). Post-hoc analysis showed a statistically significant preference for the "lipstick" video compared to the un-manipulated video (Wilcoxon signed-rank tests with Bonferroni correction for multiple comparisons, Z=-2.83, p=0.0026). Both male videos were preferred over the floating cheeks ('lipstick' male vs. 'float': Z=-2.83, p=0.0026; un-manipulated male vs. 'float': Z=-2.60, p=0.0047). Birds spent only 18% of their time elsewhere in the arena, indicating strong attraction to the video playbacks. There was no significant difference in preference for the location of the tablets (non-parametric Friedman test, Q=3.36, p>0.05), indicating that preference for the "lipstick" videos was stronger than place preferences. These results indicate that video color manipulations improved the attractiveness of video playbacks.
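Tests of this kind can be reproduced along the following lines (a sketch only; our analysis was done in Matlab, and time_watching is an assumed n_birds-by-3 array of per-bird time fractions for the three stimuli):

    import numpy as np
    from scipy import stats

    def preference_tests(time_watching):
        """time_watching columns: un-manipulated, lipstick, floating."""
        q, p = stats.friedmanchisquare(time_watching[:, 0],
                                       time_watching[:, 1],
                                       time_watching[:, 2])
        # Pairwise Wilcoxon signed-rank test; Bonferroni-correct by multiplying
        # each pairwise p-value by the 3 comparisons made.
        w = stats.wilcoxon(time_watching[:, 1], time_watching[:, 0])
        return q, p, w.statistic, min(1.0, w.pvalue * 3)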


Figure 2.5. Results of video preference tests. Birds preferred to approach the video with a "lipstick" male more than a video of a male with "normal" colors. Both males were preferred over "floating" cheeks and beak, suggesting that the bodily context of the colors is important to the birds. Gray dots are the average time in front of a tablet across six sessions for each bird, jittered for visibility. Red lines are the mean, with red shading for 95% confidence intervals and blue shading to indicate one standard deviation.

Discussion

We found that manipulating video colors, retaining only colors that are properly adjusted to zebra finch vision (red beak and orange cheeks), had a remarkable effect on behavior. When given a choice between manipulated and un-manipulated videos, birds strongly preferred to spend more time in front of a manipulated video of a gray-scale male with colored cheeks and beak. This preference was maintained even as the video was played from different locations, suggesting a consistent bias towards the manipulated videos.

This video manipulation was a relatively easy task, since the color space of our video monitor mapped well to the sensitivity of zebra finch cones, though different monitor brands might vary in phosphor outputs (Cuthill et al., 2000). This allowed us to calculate the output of those cones to estimate how colors from video displays could affect zebra finch color perception.

After adjusting the colors of the video playbacks to make the beak and cheeks stand out, we found that zebra finches largely prefer this manipulation. Even though the cheeks and beak from both manipulated and un-manipulated videos similarly stimulate zebra finch photoreceptors (Figure 2.4), birds prefer to watch the "lipstick" male over a male with un-manipulated colors. It could be that a gray-scale video simplifies the color space in the video playbacks, removing the brown plumage on the flanks of the birds, along with the filter adjustments that the recording camera may have used to change the background lighting presented in the video. This allows salient features (the cheeks and beak) to stand out, potentially acting as a supernormal stimulus (Tinbergen, 1951). However, it is not the higher contrast alone that the birds are responding to, since birds preferred the red/orange colors against a gray background only when they were in the context of a bird.

It is clear that manipulating the colors to remove the color artifacts elicited strong preferences. Such videos could be used to test for social influences on song development in juveniles, without sacrificing experimental control. We have done preliminary work to determine whether ‘lipstick’ males can drive stronger imitation of a complex song than the ‘floating’ cheeks and beak, but, unexpectedly, birds had very good imitation even from passive playbacks alone (Appendix). Adjustments to the pilot study will have to be made to determine whether videos that elicit strong preferences in an arena setting can be used to drive vocal imitation in juveniles.

If successful, virtual reality experiments could potentially combine ecological validity with experimental control (Bohil et al., 2011). One of the most successful virtual environments for studying animal behavior is the Drosophila "flight simulator" (Brembs, 2000; Wasserman et al., 2015). In these experiments, a fruit fly is suspended, tethered to a tiny rod. As the fly attempts to move, forces generated by its wings feed back to an interface that moves and rotates a surrounding display image in a manner that simulates active navigation flight. Similar approaches were recently used in the mouse (Chen et al., 2013; Harvey et al., 2009) and in humans (Ekstrom et al., 2005). Once established, such virtual reality scenarios can bypass physical and biological constraints: fundamental questions about spatial learning may be tested by controlling how movement affects perspective in virtual tasks (Chen et al., 2013).


Chapter 3

Zebra Finches Adjust their Activity Levels to Match Mobility Patterns of Virtual Males

Introduction

In most animal species, only a fraction of juveniles survive to maturity. In addition to being small, inexperienced, and vulnerable, young social animals must also learn how to successfully integrate within their group. They must learn how to identify and interact with friends and enemies (Nowak, 2006; Rubenstein & Wrangham, 1987), and how to utilize public and private space (Markham et al., 2013). Group cohesion is maintained through successful interactions between individuals that can quickly modify their behavior based on a partner’s response. Failure to respond correctly can limit interactions between individuals, which can have cascading effects and lead to fission of the social group (Seebacher et al., 2017; Silk et al., 2014).

Animals living in groups often have to match their movements to their neighbors in order to maintain group cohesion and to avoid collision. Schooling fish, for example, copy changes in speed from neighbors directly in front or behind them to avoid immediate collisions, but will copy directional information only from the fish in front of them to maintain group cohesion (Katz et al., 2011). Tracking three-dimensional flying movements of birds in flocks of thousands of starlings reveals that individuals interact only with their 6-7 nearest neighbors regardless of flock density (Ballerini et al., 2008). This smaller-than-expected number of birds that influence movement at the individual level actually promotes cohesion of the flock at the group level and increases survival during predator attacks: an incoming predator would otherwise force individuals to splinter from the group to avoid collision.

Zebra finches are a gregarious species, and in the wild they build nests in large colonies of up to 200-350 adults, depending on the season (Zann, 1996). Within these colonies, however, individuals forage and travel in small groups even outside of the breeding season – most frequently in mixed-sex pairs (the pair-bond), but also in small groups of 3-10 individuals (McCowan et al., 2015). During the breeding season, these long-term pairs synchronize their foraging with each other so that they appear at the nest sites at the same time (Mariette & Griffith, 2012, 2015). It is not known, however, at what time scales zebra finches coordinate their behaviors with other individuals and how that promotes group cohesion. How responsive are zebra finches to minor changes in the activity levels of a few individuals in the group? In this chapter, we test how sensitive zebra finches are to incremental changes in movement, by using video playbacks to precisely control the movements of virtual males that are presented to birds. We predict that zebra finches will be sensitive to increasing and decreasing movements of males in videos, and will adjust their movements to match the speeds of the males.

Methods

We used 9 zebra finches from the Hunter College colony. All experimental procedures were approved by the Institutional Animal Care and Use Committees of Hunter College.

Video stimuli

To assess how zebra finches respond to movement patterns they can observe in nearby individuals, we presented them with video playbacks of male zebra finches with varying movement speeds. We recorded and then manipulated videos of two adult males. We edited those videos to make small, incremental changes in the apparent amount of their movements. We did this by slowing the speed of the recorded videos in four ways: 100% (normal) speed, 50% speed, 33% speed, and 0% speed (frozen image). We used videos where the males were perching and occasionally moved their heads, but otherwise made no drastic changes to body posture. We played the videos on small tablets (Nexus 7; 2013 model) for a total of 40 minutes, changing the video speed every 5 minutes.
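One simple way to produce such slowed playbacks (a sketch only; the editing tool we actually used is not restated here) is to rewrite the same frames at a reduced frame rate, so that half the frame rate yields 50% speed and a third yields 33%:

    import cv2

    def slow_video(src, dst, factor):
        """Write the frames of src at factor * original fps (factor=0.5 -> 50% speed)."""
        cap = cv2.VideoCapture(src)
        fps = cap.get(cv2.CAP_PROP_FPS)
        size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
                int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
        out = cv2.VideoWriter(dst, cv2.VideoWriter_fourcc(*"mp4v"),
                              fps * factor, size)
        ok, frame = cap.read()
        while ok:
            out.write(frame)
            ok, frame = cap.read()
        cap.release()
        out.release()

    # e.g., slow_video("male.mp4", "male_50.mp4", 0.5)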

Testing setup

Each daily session lasted 40 minutes and included a sequence of eight videos, played back-to-back. These 5-minute videos were presented at increasing (0→100%) and decreasing (100%→0) speeds, in two configurations (Table 3.1): one in which speeds increased during the first half of the session and decreased during the second ('increasing-first'), and one with the reverse order ('decreasing-first'). Each bird (6 males and 3 females) was randomly assigned to start in one or the other configuration for three consecutive days, after which the configuration was switched for counterbalance.

Start time (min)   Session half   Increasing-first video speed   Decreasing-first video speed
0                  1st            0%                             100%
5                  1st            33%                            50%
10                 1st            50%                            33%
15                 1st            100%                           0%
20                 2nd            100%                           0%
25                 2nd            50%                            33%
30                 2nd            33%                            50%
35                 2nd            0%                             100%

Table 3.1 An outline of the two different video configurations presented to the birds. Video speeds increased and decreased in different halves of the session, depending on the configuration.

Birds were individually placed in the testing space (a large circular arena) for approximately half an hour on each of two days, to acclimate them to the arena and to being handled. On the third day, three tablets were turned on before the bird was released into the arena. Two of the tablets simultaneously displayed videos of two different males with the same speed manipulations, and a third tablet showed an empty perch. This spatial arrangement represented a social polarity within the arena (Figure 3.1). The social polarity was changed every day, by changing the videos that were displayed on the tablets, to prevent place bias from affecting our analysis. A camera recorded video from directly above the arena to be analyzed at a later time. We used idTracker software to automatically identify bird locations from the videos (Pérez-Escudero et al., 2014) and Matlab 2017a for data analysis.

Figure 3.1. Schematic of the arena setup. Two tablets played videos of two different males moving at the same speeds, depending on the configuration. A third tablet was a video of a blank perch. This represents a social polarity for the bird watching the videos, which was changed every day to prevent place bias.

Video Tracking & Analysis

We tracked and analyzed 6 sessions for each bird (n=9), for a total of 54 sessions. Two of those sessions, from two different birds, were discarded because the birds were hiding out of view of the camera. Using the software idTracker, we semi-automatically set a threshold to capture the birds in the video frames, so as to differentiate the bird from the background (Pérez-Escudero et al., 2014). The threshold varied across birds and sessions, depending on color contrasts and movement. Because the threshold may affect sensitivity to movement, we normalized the speed into relative within-session units. This allowed us to pool data across sessions: we calculated the distance (in pixels) the bird traveled between 10 frames of the recorded video (over 1/3 of a second). Travel distances were calculated in sliding windows throughout the 40 minutes of a session. Missing values (for example, frames where the bird moved too fast to be picked up by the idTracker software) were omitted. Then, for every 5-minute chunk of video, we calculated the average of those distances, to estimate the bird's mean speed of movement during each video playback. We then normalized these speed estimates by dividing each value within a session by the maximum of the average movement within that session.
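A sketch of this speed measure in Python (our analysis was done in Matlab; the assumption of 30 frames per second, giving 9,000 frames per 5-minute block, is illustrative, as the camera frame rate is not restated here):

    import numpy as np

    def normalized_block_speeds(x, y, step=10, block=9000):
        """Distance traveled per 10-frame step, averaged within each 5-minute
        block, then normalized by the session maximum.
        x, y: per-frame coordinates from idTracker (NaN where the bird was lost)."""
        dx = x[step:] - x[:-step]
        dy = y[step:] - y[:-step]
        dist = np.hypot(dx, dy)                       # pixels per ~1/3 s
        n_blocks = len(dist) // block
        means = np.array([np.nanmean(dist[i*block:(i+1)*block])
                          for i in range(n_blocks)])
        return means / np.nanmax(means)               # relative within-session units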

In order to judge if birds' movement speeds were influenced by the video speed, we compared movement during increasing-speed blocks (0, 33%, 50%, 100%) to movement during decreasing-speed blocks (100%, 50%, 33%, 0), by looking at movement ratios (Table 3.2). We averaged the ratios of movements across the 3 sessions, and plotted the log ratios. This measure approaches 0 (i.e., log(1)) when the videos presented do not affect birds' movements. We ran a repeated-measures ANOVA to see if there was a change in the ratios with respect to time. Mauchly's test indicated the sphericity assumption was violated (χ2(27)=50.6, p=0.0039), so we corrected the degrees of freedom using Greenhouse-Geisser estimates (ε=0.33).

Decreasing-first video speed   Increasing-first video speed   Movement Ratio (MR)   Speed disparity
100%                           0%                             s100/s0               high
50%                            33%                            s50/s33               low
33%                            50%                            s33/s50               low
0%                             100%                           s0/s100               high
0%                             100%                           s0/s100               high
33%                            50%                            s33/s50               low
50%                            33%                            s50/s33               low
100%                           0%                             s100/s0               high

Table 3.2 Calculation of Movement Ratios (MR). To test how movement speeds differed between the two configurations, we took the ratio of the birds' movement speeds (s) in the decreasing-first configuration over those at the corresponding time point in the increasing-first configuration. This allowed us to compare time points where there is a high video speed disparity (100:0) and a low speed disparity (50:33).
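A sketch of the Movement Ratio computation (the array names, and the small epsilon guarding against division by zero for the static-image blocks, are assumptions):

    import numpy as np

    def log_movement_ratio(speeds_dec_first, speeds_inc_first, eps=1e-9):
        """Log of the block-wise ratio of speeds in the decreasing-first
        configuration over the increasing-first configuration; values near 0
        (log 1) indicate that the videos did not affect movement."""
        mr = (np.asarray(speeds_dec_first) + eps) / (np.asarray(speeds_inc_first) + eps)
        return np.log(mr)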

To create a scatterplot of movement between tablets for all of the birds, we rotated the video images of the arena space so that the coordinates of each tablet remained invariant to the spatial rearrangements across sessions (those rearrangements were done in order to control for place biases). We then generated heat-maps of the cumulative time the birds spent in each location, relative to the position of each tablet. The color-map was designed to be proportional to those squared cumulative values, in order to capture the distribution of birds' stationary positions over each session.

Results

Video movement drives similar movement in the observing birds

If changes to video speeds affected bird movements in the arena, we expected to find maximum movement when full-speed videos were playing (100% speed), and minimum movement during display of static images (0% speed) (Figure 3.2a). We found that the movement of the virtual males in the video playbacks affected the movements in the observing birds. In the configuration, this effect is apparent in both the first and second half of sessions

(Figure 3.2b). In the configuration, this effect was only noticeable in the second half of the session, when video speeds decreased (Figure 3.2c). This discrepancy is probably an outcome of an ‘introduction’ effect, where birds move a lot at the beginning when they are introduced into the arena, regardless of configuration. This can be illustrated by averaging the sessions across all birds, where we would expect a movement ratio to even out across the two configurations

(Figure 3.3a). However, as seen in Figure 3.3b, birds moved most during the first 10 minutes of each session, regardless of the videos presented to them, while the rest of the session oscillated around the expected 0.5 value. This greater tendency to move at the beginning of sessions potentially interacted with the effect of video speeds on bird movements in the first half of sessions, dampening responses to increasing video speeds while amplifying responses to decreasing video speeds. This is only a concern early in the session; results are therefore more robust during the second half of each session.


Figure 3.2. Movement speeds of birds across the two configurations. a) We expected maximum response when videos played at full speed, and minimum response when videos were static images of males. b) In the decreasing-first configuration, most of the birds conformed to the expected response pattern: birds decreased their movements in the first half of sessions and increased their movements in the second half. c) In the increasing-first configuration, the birds maintained high activity in the first half of sessions, while decreasing their movements in the second half. Gray dots are the normalized data points (3 sessions per bird) and black lines are the second-degree polynomial fits of the 3 sessions for each bird.


Figure 3.3. The introduction effect. a) If birds responded maximally to normal-speed videos and minimally to static images, we would expect the responses to average out across the two configurations (~0.5). b) Average movement data across both configurations show a steep decline in the first 10 minutes, indicating an ‘introduction effect’: birds move most at the beginning of their sessions in both configurations, regardless of the speeds displayed on the tablets. Error bars depict the standard error.

To measure how bird movements differed across the two configurations, we compared the time points with a high video speed disparity (100:0) to those with a low video speed disparity (50:33) (Table 3.2). This ratio provides a measure of any bias towards moving faster at time points corresponding to higher video speeds. We found that birds moved more when the videos played at normal speed than in response to static video images

(Figure 3.4, left). By counting the number of instances above the zero line (where movements would be identical), we found that in 77% of instances birds moved more while watching the higher-speed movie, compared with the 50% expected under the null hypothesis of no difference between the two configurations (binomial test, p=0.00044). Birds also moved more in response to videos playing at 50% speed than at 33% in 61% of instances, but this did not reach significance (p=0.055) (Figure 3.4, right).
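Both comparisons are binomial tests against the 50% null; in R, with hypothetical counts standing in for the actual numbers of instances above the zero line:

    # High-disparity (100:0) comparison: x of n instances above zero.
    binom.test(x = 42, n = 54, p = 0.5)  # hypothetical counts, ~77% above

    # Low-disparity (50:33) comparison:
    binom.test(x = 33, n = 54, p = 0.5)  # hypothetical counts, ~61% above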

Figure 3.4. Comparison of movement ratios between time points of high-disparity (100:0) and low-disparity (50:33) speeds. Violin plots show the probability densities of the data, with the median plotted as a red line. The zero line is the identity line, where movements are equal. Both violin plots are widest above the zero line, indicating a bias to move more in the presence of higher-speed videos. The black circles are the data points, jittered for visibility.

How do these movement ratios change over time? Is the introduction effect real, or is it merely a habituation effect, whereby birds generally decrease their activity within a session?

Here we plot the normalized average movement ratios of each bird over time. If responses were random, the log of the ratio would center around the zero line, with no significant change over time; this is our null hypothesis. A positive value indicates more movement in the decreasing-first configuration, while a negative value indicates more movement in the increasing-first configuration. If birds dynamically adjust their activity levels over time, we expect a shift above or below the zero line at time points with high speed disparities (100:0), in the direction of the configuration in which the normal-speed video is playing.

A repeated measures ANOVA revealed a statistically significant effect of time on movement ratios, even after Greenhouse-Geisser correction of the degrees of freedom for the violation of sphericity (F(2.3,18.48)=8.12, p=0.002). These results indicate that movement ratios differed over time: the birds changed their movements depending on what was presented to them on the tablets (Figure 3.5). We limited post-hoc comparisons to chunks of time with a high video speed disparity (100:0).

First Half of Video Sessions

The decreasing-first configuration starts with a normal-speed video and goes down to a static image, while the reverse is true for the increasing-first configuration. Thus, if birds were consistently biased towards more movement at 100% and less movement at 0%, the ratios should lie above the zero line at the start of sessions and become negative by the end of the first half. Because of the introduction effect, however, we see only a small bias towards positive movement ratios (after normalization) at the beginning of sessions (Mean=0.19, SD=0.27), which flips at the end of the first half of sessions (Mean=-0.50, SD=0.71) (Figure 3.5, left half). A paired t-test shows this difference is statistically significant after Bonferroni correction (t(8)=3.08, p=0.015). This tells us that birds respond to the videos, increasing their movements when video speeds increase.

Second Half of Video Sessions

We see a similar effect in the second half of sessions, where videos start with a static image in the decreasing-first configuration and end at normal speed, with the opposite true for the increasing-first configuration.

Here we expect the log ratios to be negative and then to flip to positive, and this is indeed what we found: log movement ratios were negative (Mean=-0.71, SD=0.93) and flipped to positive (Mean=0.99, SD=0.94) (Figure 3.5, right half). A paired t-test shows this difference is statistically significant after Bonferroni correction (t(8)=-3.33, p=0.01). This tells us that birds continued to change their responses to match video speeds and were not merely habituating.

We also tested whether there is a flip when the 33:50 speed comparison is set against 50:33 in the second half of sessions. We expected the log ratios to be negative and then flip to positive, but the change was not statistically significant (t(8)=-0.67, p=0.52). This suggests that the difference between 50% and 33% speeds may be less noticeable to the birds. We did not test this in the first half of sessions because of a possible interaction with the introduction effect.
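These post-hoc comparisons are paired t-tests across the nine birds, Bonferroni-corrected; a sketch in R, with hypothetical column names for the mean log movement ratios at each chunk:

    # 'lmr' has one row per bird (n = 9); columns hold the mean log movement
    # ratios at the start and end chunks of each session half.
    p <- c(
      first_half  = t.test(lmr$first_start,  lmr$first_end,  paired = TRUE)$p.value,
      second_half = t.test(lmr$second_start, lmr$second_end, paired = TRUE)$p.value,
      low_disp    = t.test(lmr$low_start,    lmr$low_end,    paired = TRUE)$p.value
    )
    p.adjust(p, method = "bonferroni")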

Figure 3.5. Birds dynamically adjust their activity levels over time. The log of the movement ratios between the two configurations reveals significant differences across session time-points, suggesting that birds respond to changes in tablet speeds.

Video playback influence on movement is spatially localized

How do the video speeds affect the patterns of the birds' movements in the arena, and are their movements related to the social polarity? To visualize the movement patterns, we plotted the combined bird locations across each session. These scatterplots show that movements occurred between the two tablets with the virtual males, with fewer visits to the tablet with a blank perch. Birds moved more between the two tablets during normal video speeds than during static images (Figures 3.6 and 3.7). When video speeds were decreasing, birds decreased their movements between all 3 tablets as the virtual males slowed down from normal speed to a static image (Figure 3.6). However, when speeds increased, movement between tablets remained high in the first half of the increasing-first configuration (Figure 3.7, top row), due to the ‘introduction effect’ discussed earlier. This introduction effect also affects the beginning of sessions in the decreasing-first configuration, where it amplifies the birds' response to the decreasing speeds, as shown by the dense movement patterns (Figure 3.6, bottom row, left).

Figure 3.6. Scatterplots of the combined movements across all birds when video speeds are decreasing. Birds are affected by the social polarity: they move most between the videos of the two males and rarely visit the video with the blank perch. As the video speeds decrease, the density of the movement patterns also decreases.


Figure 3.7. Scatterplots of the combined movements across all birds when video speeds are increasing. Birds are affected by the social polarity: they move most between the videos of the two males and rarely visit the video with the blank perch. In the top row, birds are affected by the introduction effect and move a great deal between the two videos with males, regardless of the speeds presented to them. However, as the speeds increase in the second half of a session (bottom row), the density of the movement patterns also increases.

Birds were more likely to ‘cuddle’ in front of tablets when virtual males went still

Scatterplots, however, do not show where birds stand when they are not moving. To visualize where the birds chose to settle in the arena, we created heatmaps of the combined bird sessions. Here we focused only on the second half of each configuration, to avoid the ‘introduction effect’. When virtual males decrease their movements, the heatmaps go from having a few distinct peaks (Figure 3.8, top left) to having many distinct peaks (Figure 3.8, top right), indicating that the birds prefer to settle in front of the tablets with males when the video becomes a static image. When the video males increase their speeds, we see a similar pattern, though it is less pronounced: there are many distinct peaks when the videos are static (Figure 3.8, bottom left), with fewer peaks when videos are moving at normal speed (Figure 3.8, bottom right). This suggests that slowing videos down may be more salient than increasing video

speeds, at least during the second half of a session. These heatmaps show that the birds prefer to ‘cuddle’ next to virtual males: even though birds visit the perches (as in the scatterplots in Figures 3.6 and 3.7), they rarely settle in front of them (Figure 3.8).

Figure 3.8. Heatmaps reveal that birds tend to settle in front of tablets with males when the video is a static image. Top row: there are more peaks when the virtual males stop moving (right vs. left). Bottom row: similarly, there are more peaks when the virtual males are not moving (left vs. right). Birds prefer to ‘cuddle’ in front of tablets with virtual males, but not in front of the blank perch.

Discussion

We showed that zebra finches adjust their movement patterns in response to changes in the playback speed of videos presenting conspecific males. The effect was highly significant when comparing normal-speed video to a static image, but even when presenting slightly different video speeds (50% vs. 33%), we still observed a bias to move more while watching the higher speed.

Previous studies showed that zebra finches can recognize stationary birds in a video: Fleischman et al. (2016) showed that female zebra finches could be trained to discriminate between mates and strangers from a photograph presented on a digital monitor. Furthermore, when our virtual males stopped moving, the birds preferred to ‘cuddle’ in front of tablets with a male rather than a tablet with a blank perch, suggesting a preference to roost near social partners. Taken together, our findings suggest that zebra finches dynamically adjust their movement patterns to the statistical patterns they observe in the movements of neighboring conspecifics.

One possible function could be that coordination of behaviors increases affiliation. In humans, simple tasks that synchronize movement patterns between social partners increase cooperation, trust, and affiliation (Hove & Risen, 2009), as well as altruistic behavior, compared with tasks involving asynchronous movement patterns (as reviewed in Trainor & Cirelli, 2015). In adult zebra finches, call coordination is associated with the establishment and maintenance of strong pair bonds (Benichov et al., 2016; D’Amelio et al., 2017; Ter Maat et al., 2014). To coordinate, individuals have to precisely assess their own behavior and adjust to their partner's response. Our results show that birds prefer to move more between two virtual males when the videos are moving, and to ‘cuddle’ in front of tablets when the males stop moving, suggesting a possible social affiliation.

Could the ‘freezing’ of virtual males be a signal of predation risk? Zoratto et al. (2014) played videos of starling flocks under attack to conspecific individuals and found a decrease in mobility, whereas videos of non-threatened flocks increased the mobility of individual birds. One way to confirm or rule out this explanation of our results would be to run further tests in the arena: analysis of vocalizations, as well as close-up video recordings, could show what birds are doing when they settle in front of tablets. If birds are sleeping, it is unlikely that they ‘cuddle’ because of a freeze response to increased predation risk. If birds do not fall asleep, but continue to vocalize during non-moving videos, it would also weaken the predation-risk explanation of our results. Our audio recordings were not of sufficient quality to make this analysis possible, but new technologies allow for lightweight wireless microphones that can be attached to birds to record audio and monitor changes in heart rate (Anisimov et al., 2014; Ter Maat et al., 2014).

A trivial explanation for why the movement patterns of virtual males affected our birds could be that slowing down the videos altered their motion quality. In our experiment, we used video clips in which the males exhibited a small range of head movements, to minimize jerkiness when slowed down. Even this limited range of motion affected our birds when slowed down, suggesting that something other than motion quality was responsible for our results.

Our study shows that zebra finches are sensitive to small changes in the movement of virtual males and adjust their activity levels, moving more between the two virtual males when the males move at normal speed. This sensitivity to the movements of other individuals has important implications for coordination dynamics within groups, which could have cascading effects on group structure (Couzin & Krause, 2003). This is a first step in building virtual reality scenarios that could test how zebra finches coordinate behaviors and maintain small sub-groups within large flocks (McCowan et al., 2015). Making the virtual social environments interactive could test the limits of fission-fusion dynamics as groups vary in size (Silk et al., 2014).


Chapter 4

Statistical learning in songbirds: from self-tutoring to song culture

A slightly modified version of this chapter has been published in Phil. Trans. R. Soc. B 372 (2016). Authors: Olga Fehér, Iva Ljubičić, Kenta Suzuki, Kazuo Okanoya, and Ofer Tchernichovski.

Introduction

Songbirds are a useful animal model to study the biological foundations of human linguistic behavior, because there are numerous parallels in the behavioral, neural and genetic mechanisms between birdsong and language (Doupe & Kuhl, 1999; Haesler et al., 2004). Vocal learners, such as humans and songbirds, acquire complex vocalizations from adult individuals during development. In both songbirds and humans, vocal development is constrained by biases that guide acquisition and shape the structure of the communication system (Kirby et al., 2008;

Plamondon et al., 2010). In humans, artificial language paradigms (for review, see Gomez & Gerken, 2000) have been used to study the relationship between these biases and linguistic structure. These methods provide researchers with great flexibility in manipulating the linguistic input of learners and have proved very useful in identifying learning biases. However, they are of limited ecological validity, because it is unclear how short episodes of learning an artificial language may scale up and generalize to the developmental process of acquiring natural vocalizations. Songbirds give us the chance to have full control over the sensory input during the entire developmental period and to examine what features of the sensory input are most effective in driving change in the vocal output during developmental vocal learning.

A major thread of research in language acquisition has focused on the importance of statistical learning (Romberg & Saffran, 2010). Humans are able to extract statistical regularities from their environment from a very young age. Pre-linguistic children, for instance, can acquire conditional relations (Saffran et al., 1996), distributional frequencies (Maye et al., 2002), and cue-based (Thiessen & Saffran, 2003) and structural information (Marcus et al., 1999) from

linguistic stimuli. These types of statistical learning all play a role in language acquisition

(Romberg & Saffran, 2010), but their relative influence and exact relationship are unclear.

Songbirds also employ multiple learning strategies during development (Liu et al., 2004;

Tchernichovski et al., 2001), but to what extent statistical learning is involved in song acquisition remains largely unexplored. Here we use a songbird, the zebra finch (Taeniopygia guttata), to investigate how distributional information in the vocal input affects the development of phonetic categories. We compared three conditions: no input (isolation), normal/categorical song input

(wild-type), and self-input, which is initially non-categorical (self-tutored). Our self-tutored birds had the “illusion” of being tutored by plastic bird models, but for vocal input we used the bird’s own recent vocal output. This recursive approach allows us to judge what statistical features of song structure can be internally generated and to study the time-course of phonological category formation.

In zebra finches, vocal development begins with babbling at about day 30 post-hatch and ends around day 90 (Figure 4.1). Zebra finches sing intensely during this period: they produce about one million song syllables, which we record and analyze, making it possible to track moment-to-moment changes that occur during development (Tchernichovski et al., 2001). The early song, called subsong (Figure 4.1a), is highly variable, and acoustic features such as syllable duration and pitch are broadly distributed with no apparent clusters (Figure 4.1c, bottom panel).

In most birds, the first developmental changes in song structure can be seen at about day 50, when the broad distribution of song features changes, and distinct clusters begin to form (Figure

4.1c, middle panel). These clusters become increasingly dense and tight, until they appear as distinct syllable types in the mature zebra finch song (Figure 4.1c, top panel). The song by this

stage is a categorical signal, composed of syllable types organized into motif units produced in a fixed order (Figure 4.1b). After song crystallizes, at about day 100, adult zebra finches retain their stereotyped song motif for the rest of their lives.

Figure 4.1. Transition from continuous to categorical signal over development. a) Sonogram of a juvenile zebra finch (day 40) showing highly variable, undifferentiated syllables. b) Song of an adult zebra finch showing stable syllable types falling into distinct categories. Introductory syllables are denoted by i; letters stand for song syllables that are repeated in a fixed order (ABCDE…). c) Development of syllable types (categorical signal) in a wild-type bird. Red dots represent song syllables: mean frequency modulation (y axis) is plotted against syllable duration (x axis, both normalized). Bottom panel represents distribution of syllables at the subsong stage, top panel at adulthood.

In the absence of song tutoring, socially isolated zebra finches improvise an abnormal song, which typically lacks a stable motif and shows high acoustic variability across renditions compared to wild-type songs (Williams et al., 1993). In a previous study, we showed that wild-type song culture emerges de novo over a few generations when birds imitate isolate song in iterated learning chains (Fehér et al., 2009). This suggests that zebra finches have learning biases that steer songs towards wild-type features even when such models are absent. But why is the isolate song abnormal to begin with? Evidence suggests that birds cannot use their own vocal output as a sensory template. The forebrain (cortical) song system is composed of two functionally distinct anatomical pathways: a posterior premotor pathway responsible for adult song production and an anterior forebrain (basal ganglia) pathway, which is involved in early song production (vocal babbling) and in song learning (Bottjer et al., 1984; Jarvis & Nottebohm,

1997; Nottebohm et al., 1976; Ölveczky et al., 2005; Scharff & Nottebohm, 1991). The premotor song system receives auditory input and can respond strongly to song playbacks, most strongly to the bird's own song (Doupe & Konishi, 1991; Margoliash & Konishi, 1985). This sensory activation of the song system is crucial for song learning (Roberts et al., 2012). However, during singing, the motor song system becomes momentarily deaf (McCasland & Konishi, 1981), probably in order to avoid simultaneous sensory and motor activation of the same neurons.

In the current study, we bypass the biological barrier that prevents isolates from forming a song template by “tricking” young birds into imitating delayed playbacks of their own developing song. This self-input is the simplest possible closed-loop recursion we can imagine, because it involves learning without any external input, through recursion occurring many thousands of times over development (Figure 4.2). This process is similar to inter-individual iterated learning, and it may amplify internal biases in a similar fashion, but in this case over cycles of self-imitation. The principal question is whether self-iterated learning is sufficient to steer the vocal output towards wild-type song features and, crucially for the purposes of this paper, whether it prompts normal developmental processes by allowing the bird to efficiently generate phonetic categories in the absence of categorical input. In other words, our method allows us to test whether the formation of phonetic categories is contingent on the availability of categorical input, or alternatively, whether categorical input may only contribute to the specifics (the acoustic structure) of phonetic categories.

Figure 4.2. Self-tutoring in zebra finches. a) Isolate siblings are raised in social isolation without any song input; their vocalizations are recorded continuously. b) Self-tutored birds are also socially isolated and recorded during the entire song development, but in addition, they learn to peck on a key (red button) which induces playback of a song randomly selected from their own recent vocalizations, repeated recursively over development.

Methods

Animal care

Twelve brother pairs were selected from the Hunter College breeding colony for the experiment. All birds were raised by their parents in a dedicated cage in a colony room until day

7 post-hatching. The father was then removed, and the cage was taken to a nursery area housing mothers (who do not sing) and chicks only. Birds were raised by their mother, and at day 30, at the onset of song development, the male siblings were separated from the mother and placed singly in sound-attenuated recording boxes, where they remained until adulthood (day 120). One bird of each sibling pair was raised in social and acoustic isolation (isolate, ISO group), while the other was trained on its own vocal output (self-tutored, ST group). We used brothers to minimize the influence of genetic variation in pairwise statistical comparisons.

Experimental groups

Self-tutored group (ST, 12 birds): One bird from each of our 12 sibling pairs was randomly selected to undergo training using operant song playback, as in Tchernichovski et al. (2001).

Training began 2-4 days after placement into the sound boxes to allow familiarization with the new environment. Birds were trained on their own developing songs (Figure 4.2). To do this, we used our software Sound Analysis Pro 2011 (Tchernichovski et al., 2004) to automatically record the entire vocal output of each bird, continuously over development. We developed an extension to SAP2011, which, on a separate thread, automatically identified song bouts produced by the bird, in nearly real-time. The software identified as putative song any vocalization (song bout) that lasted more than one second with no more than a 100 ms silence period in it, and selected

those that included frequency-modulated sounds, to exclude bouts of non-modulated calls. This procedure gave a false-positive error rate of up to 30%, but usually much lower; false-positive song bouts consisted mostly of bouts of broadband cage noise. The software saved the 10 most recent putative song bouts to a song library folder for each bird separately. The contents of each song library folder were updated every 20 minutes. Given that juvenile zebra finches typically produce thousands of song bouts every day, the turnover rate of songs in each library was less than one hour, except during night sleep. Each time the bird pecked on a key, one of the ten recent songs was randomly played through a speaker placed behind a plastic bird model in the cage. Most birds pecked on the keys to hear songs hundreds of times every day, although there was considerable individual variation (mean number of key pecks = 463±236).

This training continued until the bird was 120 days old, at which time we discontinued the training and moved the bird into a colony room.
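In outline, the putative-song rule can be sketched as follows (shown in R for illustration only; the actual logic ran inside our SAP2011 extension, and the segment table, the FM cutoff and the playback() call are assumptions):

    # 'seg' has one row per detected vocal segment: onset and offset times
    # (seconds) and a mean frequency-modulation (FM) value per segment.
    find_bouts <- function(seg, max_gap = 0.100, min_dur = 1.0, fm_min = 10) {
      new_bout <- c(TRUE, seg$onset[-1] - seg$offset[-nrow(seg)] > max_gap)
      id  <- cumsum(new_bout)                 # bout membership per segment
      dur <- tapply(seg$offset, id, max) - tapply(seg$onset, id, min)
      fm  <- tapply(seg$fm, id, max)
      # Keep bouts longer than 1 s that contain frequency-modulated sound;
      # fm_min is a hypothetical cutoff for excluding non-modulated calls.
      which(dur > min_dur & fm > fm_min)
    }

    # The 10 most recent qualifying bouts form the song library; each key
    # peck triggers playback of one bout chosen at random, e.g.:
    # playback(sample(song_library, 1))       # playback() is hypothetical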

Isolates group (ISO, 12 birds): Brothers of each bird in the ST group were isolated from day 30 to day 120, like the ST group, but did not receive any training with song playbacks. Their vocalizations were recorded continuously.

WT training group (WT, 8 birds): Each bird was trained with song playbacks of a randomly selected wild-type, fully stereotyped song, as in Abe & Watanabe (2011). All eight birds produced either partial or complete imitations of the WT song by day 105, when we terminated the experiments. We used their complete song development records, day 40-100 post-hatch, as a normal song learning (WT) control.

Data Analysis

We analyzed the vocal output of each bird throughout development. For each bird, we took all the vocalization data produced every 5 days, starting from day 40 post-hatch.

Vocalizations were processed in the Batch Processing module of SAP2011, which automatically segments and calculates mean and variance values for several acoustic features and saves them in a MySQL database. We used 6 syllable features for analysis: syllable duration, mean amplitude, mean frequency modulation (FM), mean pitch, mean Wiener entropy and mean goodness of pitch. Together, these features summarize much of the acoustic structure of each syllable. Matlab and R (version 3.1.0) were used for further analysis of the acoustic features.

K-nearest neighbor analysis: From each day (40, 45, 50, etc.), 5000 syllables were randomly sampled from all the song syllables produced. We then scaled the syllable features as in

(Tchernichovski et al., 1999), and computed Euclidean distances for each pair of syllables across all features to obtain a 5000x5000 matrix of acoustic distances. To obtain a robust estimate of cluster density, we used the median distance to the 100 closest neighbors (which are likely to be at the centers of clusters) as our measure.
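A sketch of this measure in base R (the exact feature scaling in Tchernichovski et al. (1999) may differ):

    # Median distance among each syllable's 100 nearest neighbors, computed
    # on z-scored features; smaller values indicate denser, tighter clusters.
    nn_density <- function(feats, n = 5000, k = 100) {
      s <- scale(feats[sample(nrow(feats), n), ])  # z-score the 6 features
      d <- as.matrix(dist(s))                      # n x n Euclidean distances
      knn <- apply(d, 2, function(col) sort(col)[2:(k + 1)])  # drop self
      median(knn)                                  # density near cluster centers
    }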

Statistical analysis: To test for differences in developmental trajectories between the ISO and

ST groups, we used a linear mixed effects model in R using the lme4 package (Bates et al.,

2015). As fixed effects, we entered training condition and developmental age (without interaction) into the model, with individual birds and brother pairs as random effects. P-values were obtained by a likelihood-ratio comparison (ANOVA) of the full model against a model without training condition. To analyze the time-course of the divergence between ISO and ST birds, we performed t-tests at

every time-point with FDR adjustment for multiple comparisons. Empirical cumulative distribution functions were calculated for the final day of song production separately for each feature, and differences were quantified using Kolmogorov-Smirnov tests.
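A sketch of these models in R (the data frame and its column names are assumptions):

    library(lme4)

    # Fixed effects: training condition and age (no interaction); random
    # intercepts for individual birds and for brother pairs.
    full <- lmer(nn_dist ~ condition + age + (1 | bird) + (1 | pair),
                 data = dev, REML = FALSE)
    null <- lmer(nn_dist ~ age + (1 | bird) + (1 | pair),
                 data = dev, REML = FALSE)
    anova(full, null)  # likelihood-ratio test for the training condition

    # Divergence time-course: an ISO-vs-ST t-test at every 5-day time
    # point, followed by FDR adjustment.
    p_raw <- sapply(split(dev, dev$age), function(d) {
      d2 <- droplevels(subset(d, condition %in% c("ISO", "ST")))
      t.test(nn_dist ~ condition, data = d2)$p.value
    })
    p.adjust(p_raw, method = "fdr")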

Results

Birds that were trained with their own developing songs (ST) developed songs that were very different from those of their isolate (ISO) brothers. Figure 4.3 shows example sonograms for three sibling pairs, demonstrating that ST songs were both more structured and more stereotyped than ISO songs. The ST song motifs (underlined in yellow) tended to contain a larger variety of syllable types (underlined in blue), were much more stereotyped, and were similar to wild-type zebra finch motifs. Most of their song syllables (Figures 4.3a & 4.3b) contained multiple notes and high frequency modulation. In contrast, ISO syllables were often either very long, with a non-modulated, call-like structure (Figure 4.3b), or very short clicks (Figure 4.3c) that are rare in wild-type zebra finch songs. We did not observe any clicks in the songs of ST birds.


Figure 4.3. Adult songs of isolate and self-tutored sibling pairs. Sonograms of three pairs are shown (a, b, c). Songs of the isolate brothers are shown above those of their self-tutored siblings. Song motifs are underlined in yellow, song syllables in blue. Self-tutored motifs are more stable (repeated the same way every time), composed of more syllable types, and generally longer. Self-tutored syllable durations are similar to wild-type durations; isolate durations tend to be too short or too long.

We visualized the process of syllable formation and stabilization over development by presenting scatter-plots of syllable features at different stages of song development

(Derégnaucourt et al., 2005). Figure 4.4a presents an example of the development of an ISO bird and his ST brother. For each syllable (red dot), duration (x axis) is plotted against frequency modulation (y axis), and each panel shows 3000 song syllables produced on a particular day by one bird. At the beginning of song development, the young birds produced unstructured and

highly variable syllables that take up a large, diffuse area in feature space without clear categories (lower panels). However, as development progressed (towards the upper panels in Figure 4.4a), the birds formed clusters of syllable types. By the end of development, as the songs crystallized and were repeated with very little variability, the clusters became small and tight. In ST birds, as in WT birds, clear clusters corresponding to categorical signals were present by day 60, but in ISO birds the syllables at this time were still diffuse; even at the end of song development, the categories were not as clearly defined as in ST and WT birds.

To quantify the emergence of categories (clusters) over development, we calculated

Euclidean distances between syllable features (see Methods). We first extracted a set of the closest nearest neighbors (the 100 closest syllables for each of the 5000 sampled syllables) and calculated the median distance, which is a simple estimate of how similar syllable features are at regions of high density (close to the centers of clusters). We did this for each bird every 5 days during development. Figure 4.4b shows the mean nearest neighbor distance for each group (error bars represent SEM) over development. As shown, the distances decreased with age in all three groups, indicating the emergence of clusters, but they decreased much more rapidly in WT and ST birds. Between days 50 and 65 the changes were particularly large; this was not observed in the ISO group, where changes happened much more slowly, mostly after day 60. This time of rapid cluster emergence in ST and WT birds coincides with the time when major vocal changes typically occur in the phonetic structure of song syllables during normal development.

Figure 4.4. Syllables form tight clusters in tape-tutored and self-tutored birds but not in isolates. a) Cluster formation in an isolate bird (left column), his self-tutored brother (middle column) and a tape-tutored bird (right column). Red dots represent song syllables: mean frequency modulation is plotted against syllable duration (both normalized). The top panels represent adult songs, the bottom ones the beginning of development, and the middle row a stage in early development when self- and tape-tutored birds have already begun to produce stable and tightly clustered syllables not present in isolates. b) Median Euclidean distance between neighboring syllables. Over song ontogeny, syllables of the same type become more similar in every group, but this happens faster and earlier in self- and tape-tutored birds, while the isolates never form tight clusters.

In order to judge the effect of training, we used mixed effects linear regression (details in Methods). We found a significant effect of training on development, with significant differences between ISO and WT birds (p<0.01) and between ISO and ST birds (p<0.01), but no difference between ST and WT birds (p=0.87). The difference between the ISO and ST birds is significant (p<0.05) or approaches significance starting from day 60 (with the exception of two days, 65 and 95). In contrast, there are no developmental time points at which ST birds are significantly different from WT birds, suggesting that self-training induced the early emergence of syllable categories, as in normal birds. The ISO birds, on the other hand, were invariably either significantly (p<0.05) or nearly significantly (p<0.08) different from the other groups, and did not catch up with age.

Finally, we tested whether the clusters that ST birds generated were acoustically similar to the song syllables of isolates or to those of WT birds. At the endpoint of song development, we compared the distributions of features across our three experimental groups (ST, ISO and WT). Figure 4.5 shows the empirical cumulative distribution functions (ECDFs) for six acoustic features. For most of the song features, the ST curve (green) lies in between the ISO (red) and WT (blue) curves. Figure 4.5b shows the combined KS distances for all 6 song features for all 3 comparisons. The largest distance is between ISO and WT features, followed by ST vs. WT, and then ISO vs. ST. This confirms the visual impression that the songs of ST birds were indeed more wild-type-like than those of their ISO brothers, although ST song features were still slightly closer to ISO features than to WT features.
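One plausible way to combine the per-feature KS distances (whether the published figure sums or averages them is not restated here) is, in R:

    # 'feat_ISO', 'feat_ST' and 'feat_WT' are assumed data frames with one
    # column per acoustic feature and one row per syllable.
    combined_ks <- function(a, b)
      sum(sapply(names(a), function(f) ks.test(a[[f]], b[[f]])$statistic))

    combined_ks(feat_ISO, feat_WT)  # largest distance
    combined_ks(feat_ST,  feat_WT)
    combined_ks(feat_ISO, feat_ST)  # smallest distance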


Figure 4.5. Distribution of isolate, self-tutored and tape-tutored song features. a) Song features of self-tutored songs (green lines) are more wild-type-like than isolate song features (red lines): their cumulative distributions lie in between the wild-type and isolate distributions. b) KS distance for combined acoustic features for the three possible comparisons between ISO, ST and WT songs.

Discussion

We showed that when juvenile zebra finches were exposed to delayed playbacks of their own developing songs, syllable types (clusters) emerged early in development despite the fact that the auditory input was initially a continuous and highly variable signal. In other words, recursive self-input quickly evolved into categorical syllable types even in the absence of any external or categorical input. Moreover, this process followed the typical developmental time-course observed in WT birds. Our findings suggest that the early emergence of categories is internally driven, rather than a simple reflection of imitating a symbolic input, so song imitation can be seen as modulating, rather than causing, the transition from continuous to categorical signals. In the absence of any song input, categories evolved late and to a lesser extent, as indicated by the more diffuse syllable clusters observed in isolates. Therefore, the emergence of categorical signals is strongly facilitated by presenting birds with song input, but the statistical structure of that early input, which is naturally highly variable and continuous, appears to have no bearing on the development of phonetic categories.

The self-generation of distinct syllable types has implications at the level of song culture.

Cultural transmission of categorical signals should be easier, and such signals are more likely to remain stable.

The developmental transition from a broad distribution of vocal sounds to several distinct clusters is equivalent to a moderate compression. An important parameter is the magnitude of compression: a strong compression would collapse a broad distribution into a single cluster. In most songbirds, compression is mild, resulting in several syllable types, but there are extreme cases: in chipping sparrows the broadly distributed song features collapse into a single syllable

type over development (Liu & Nottebohm, 2007), whereas the song of the adult California thrasher remains complex and variable (Sasahara et al., 2012). The strong effect of song input availability on cluster formation, which we observed in the ST birds, suggests that this is essentially a socially mediated process. A comparable effect was observed in chipping sparrows (Liu & Nottebohm, 2007), which maintain multiple syllable types until they establish their territories, at which time they stop producing all but the one that best matches their neighbor's song. There is some tension between this socially maintained simplicity and sexual selection, because females generally show a preference for males that produce more elaborate courtship signals, such as complex songs and large repertoires (Buchanan & Catchpole, 1997; Hasselquist et al., 1996; Okanoya, 2004; Searcy, 1984). These opposing forces (simplicity and strong compression, favoring learnability, efficient delivery and predator avoidance, on the one hand; complexity and weak compression, favored by sexual selection, on the other) have a great impact on the evolution of song culture, as evidenced by the complex song of the Bengalese finch in contrast to the simple song of its wild ancestor, the white-rumped munia (Honda & Okanoya, 1999). Our study yields insights into which aspects of vocal development are determined by an innate developmental program and which depend on external stimuli, shedding light on the relationship between innate and culturally determined aspects of vocal culture. To our knowledge, the possible link between the extent of developmental compression and the stability of song culture had not previously been studied.

Self-tutored birds not only developed syllable categories; the acoustic features of their songs were also more WT-like than those of their ISO brothers. This result speaks to the relationship between learning biases and song structure, and may have implications for learning biases in humans and linguistic structure. In a previous study, we showed that biased imitation of ISO song leads to WT song features over repeated episodes of learning across generations (Fehér et al., 2009). In that study, the initial song input was ISO song, and although the young birds readily imitated the ISO tutors, biases accumulated across generations, resulting in WT song features within a few generations. That process is similar to what is observed in iterated learning experiments involving artificial language learning in humans, which have found linguistic structure evolving out of unstructured input across learners. Human studies have shown that over repeated episodes of iterated learning, languages become more learnable and compositional (Kirby et al., 2008) and more regular (Reali & Griffiths, 2009; Smith & Wonnacott, 2010), and that categorical structure emerges from a continuous meaning space (Carr et al., 2016); that is, linguistic structure evolves as a result of cultural transmission. Collectively, these studies suggest that biases reflected in linguistic structure are amplified and made explicit by transmission, and that this process requires learning by multiple individuals. Our current study suggests relaxing this requirement: multiple individuals may not be necessary for learning biases to shape vocal output, because iterations within a single individual in a closed sensory-motor loop may facilitate behavioral changes over development. If a similar self-tuning of vocal behavior towards universally observed linguistic features holds for humans, it could have important theoretical and practical implications.

Appendix

Song Learning from Virtual Tutors: A Pilot Study

Introduction

Juvenile zebra finches must decide which song to imitate within the first 2-3 months of their life, a decision they cannot reverse later on. This decision is influenced by complex social interactions throughout development, which are currently difficult to study systematically

(Chapter 1). Video playbacks would allow us to simplify the social scenarios presented to the birds and to test which social interactions are important for song learning.

Here we use the videos from our preference tests in Chapter 2, in which zebra finches strongly preferred to watch videos of manipulated males (59.33% of the time) over videos of floating cheeks and beaks (only 4.11% of the time), to try to bias the song imitation of juveniles.

Here we test to what extent we can use a video of a singing virtual male as a social replacement for a live tutor. Juveniles produce poor imitation of non-interactive (passive) song playbacks in the laboratory, but copy live tutors with high accuracy (Derégnaucourt et al., 2013; Eales, 1989).

Our prediction is that passive learners will imitate the tutor song poorly, while juveniles presented with a video of a male will imitate it better if they perceive the virtual male as a social tutor. Virtual social environments could answer questions related to tutor choice without sacrificing experimental control.

Methods

Animal Care

22 birds were selected from the Hunter College breeding colony. All birds were raised by their parents in a dedicated cage in a colony room until day 7 post-hatching. The father was then removed, and the cage was taken to a nursery area housing only non-singing mothers and fledglings. Birds were raised by their mother until day 30, when, at the onset of song development, the male juveniles were separated from the mother and placed singly in sound-attenuated recording boxes, where they remained until adulthood (day 120).

We compared three conditions that received the same audio but differed in exposure to visual stimuli: an experimental group with a video of an attractive male (‘lipstick’), a control group with a video of moving cheeks and beak (‘float’), and a control group with no visual input (‘passive’).

Experimental Groups

‘Lipstick’ group (n=9): Juveniles were trained with a video of a singing gray-scale male with colored beak and cheeks (Figure A1).

‘Float’ group (n=6): Juveniles were trained with a video of floating cheeks and beak against a plain gray background (Figure A1).

‘Passive’ group (n=7): Juveniles were trained with passive exposure to song and no tablets.

Training protocol

The two tablet groups were trained with Nexus 7 tablets (2013 model) that were programmed, using the Tasker app for Android, to turn on automatically twice a day (at 10 am and 4 pm) and play a 20-minute video. The passive training group received the same 20-minute audio clip at the same times, so that the timing of the song bouts was identical. All three groups received the same song: a synthetically composed song of four natural syllables. We used our software Sound Analysis Pro 2011 (Tchernichovski et al., 2004) to automatically record the entire vocal output of each bird, continuously over development.

Data Analysis

We measured song similarity using Sound Analysis Pro at the endpoint of song development (day 110 post-hatch) for each bird against the tutor model (Tchernichovski et al., 2000). We used 10 renditions of each bird's motif sung throughout the day and compared them to the motif of the tutor to obtain an average percentage of song similarity.

Results

Unexpectedly, birds in our passive training group showed good imitation of the 4-syllable model song by day 110 (Mean=81.69%, SD=8.76; see Figures A1 and A2 for song bout samples). This suggests that the song itself is a strong stimulus, so we cannot conclude whether the video of a male had an effect on song learning. Birds in the ‘lipstick’ group (Mean=73.87%, SD=18.26; Figure A3 for song bouts) and the ‘float’ group (Mean=75.1%, SD=16.77; Figure A4 for song bouts) had lower average similarities, but a one-way ANOVA showed no statistically significant difference in imitation across the three groups (F(2,19)=0.55, p=0.59; Figure A1). The single worst imitation across all groups occurred in the ‘lipstick’ group, which could be due to that group's larger sample size.
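The group comparison above is a standard one-way ANOVA; in R, with a hypothetical data frame holding one mean similarity score per bird:

    sim$group <- factor(sim$group)  # levels: lipstick / float / passive
    summary(aov(similarity ~ group, data = sim))  # reported: F(2,19)=0.55, p=0.59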

Figure A1. Similarity to the tutor song at the endpoint of song development (110 days post-hatch). There are no differences across the groups, indicating that visual information from the tablets did not influence song development. Gray dots are the average similarity scores across 10 motifs for each bird. Red lines are the group means, 95% confidence intervals are shaded in red, and one standard deviation is shaded in blue.


Figure A2. Adult songs of birds trained with passive song playback. Representative sonograms from the endpoint of song development are shown, with one sonogram per bird. Birds showed very good imitation of the syllables from the song model (outlined in blue), indicating that visual information from videos was not necessary to promote song learning.


Figure A3. Adult songs of birds trained with video of the singing “lipstick” male. Representative sonograms from the endpoint of song development are shown, with one sonogram per bird. Birds typically had good imitation of the syllables from the song model (outlined in blue).


Figure A4. Adult songs of birds trained with video of the singing “floating cheeks and beak”. Representative sonograms from the endpoint of song development are shown, with one sonogram per bird. Birds typically had good imitation of the syllables from the song model (outlined in blue), and were comparable to the control group without any visual input (“passive”).

Discussion

We showed that birds trained with passive playback of our 4-syllable model song achieved very good imitation, which was unexpected given previously reported results for passive training in zebra finches (Derégnaucourt et al., 2013; Eales, 1989). Three of our song syllables were chosen based on previously published results reporting that 80% of juveniles did not learn all three syllables by day 63 post-hatch when trained with key-pecks, which typically produce better song imitation than passive playbacks (Derégnaucourt et al., 2013; Lipkind et al., 2013). We had therefore expected greater variation in song imitation success using a four-syllable song. Future work could explore why this particular song was readily imitated without visual or social input. Perhaps more complex songs, or a different paradigm, could be used to test the effects of virtual males on song development in zebra finches. For now, we cannot conclude whether the ‘lipstick’ males, which we know the birds prefer to watch (Chapter 2), can drive stronger song imitation than other visual inputs. Adjustments to the pilot study will have to be made to determine whether videos can be used as a social substitute to drive vocal imitation in juveniles.

References

Abe, K., & Watanabe, D. (2011). Songbirds possess the spontaneous ability to discriminate

syntactic rules. Nature Neuroscience, 14(8), 1067–1074.

Adret, P. (1997). Discrimination of video images by zebra finches (Taeniopygia guttata): Direct

evidence from song performance. Journal of Comparative Psychology, 111(2), 115–125.

Amy, M., & Leboucher, G. (2009). Effects of eavesdropping on subsequent signalling

behaviours in male canaries. Ethology, 115(3), 239–246.

Anisimov, V. N., Herbst, J. A., Abramchuk, A. N., Latanov, A. V., Hahnloser, R. H. R., &

Vyssotski, A. L. (2014). Reconstruction of vocal interactions in a group of small songbirds.

Nature Methods, 11(11), 1135–1137.

Aplin, L. M., Farine, D. R., Cockburn, A., Thornton, A., & Sheldon, B. C. (2015).

Experimentally induced innovations lead to persistent culture via conformity in wild birds.

Nature, 518(7540), 538–541.

Ballerini, M., Cabibbo, N., Candelier, R., Cavagna, A., Cisbani, E., Giardina, I., Lecomte, V.,

Orlandi, A., Parisi, G., Procaccini, A., Viale, M., & Zdravkovic, V. (2008). Interaction

ruling animal collective behavior depends on topological rather than metric distance:

evidence from a field study. Proceedings of the National Academy of Sciences, 105(4),

1232–1237.

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models

using lme4. Journal of Statistical Software, 67(1).

Beecher, M., & Brenowitz, E. (2005). Functional aspects of song learning in songbirds. Trends

in Ecology & Evolution, 20(3), 143–149.

71 Beecher, M., Burt, J., Ologhlen, A., Templeton, C., & Campbell, S. (2007). Bird song learning in

an eavesdropping context. Animal Behaviour, 73(6), 929–935.

Benichov, J. I., Benezra, S. E., Vallentin, D., Globerson, E., Long, M. A., & Tchernichovski, O.

(2016). The forebrain song system mediates predictive call timing in female and Male Zebra

finches. Current Biology, 26(3), 309–318.

Bialek, W., Cavagna, A., Giardina, I., Mora, T., Silvestri, E., Viale, M., & Walczak, A. M.

(2012). Statistical mechanics for natural flocks of birds. Proceedings of the National

Academy of Sciences, 109(13), 4786–4791.

Bird, C. D., & Emery, N. J. (2008). Using video playback to investigate the social preferences of

rooks, Corvus frugilegus. Animal Behaviour, 76(3), 679–687.

Bohil, C. J., Alicea, B., & Biocca, F. (2011). Virtual reality in neuroscience research and therapy.

Nature Reviews Neuroscience, 12, 752–762.

Boogert, N. J., Farine, D. R., & Spencer, K. A. (2014). Developmental stress predicts social

network position. Biology Letters, 10(10), 20140561.

Bottjer, S. W., Miesner, E. A., & Arnold, A. P. (1984). Forebrain lesions disrupt development

but not maintenance of song in passerine birds. Science, 224(4651), 901–903.

Bowmaker, J. K., Heath, L. A., Wilkie, S. E., & Hunt, D. M. (1997). Visual pigments and oil

droplets from six classes of photoreceptor in the retinas of birds. Vision Research, 37(16),

2183–2194.

Brainard, M. S., & Doupe, A. J. (2002). What songbirds teach us about learning. Nature,

417(6886), 351–358.

Branson, K., Robie, A. A., Bender, J., Perona, P., & Dickinson, M. H. (2009). High-throughput

ethomics in large groups of Drosophila. Nature Methods, 6(6), 451–457.

72 Brembs, B. (2000). The Operant and the Classical in Conditioned Orientation of Drosophila

melanogaster at the Flight Simulator. Learning & Memory, 7(2), 104–115.

Briganti, A. M., & Cohen, L. B. (2011). Examining the role of social cues in early word learning.

Infant Behavior and Development, 34(1), 211–214.

Buchanan, K. L., & Catchpole, C. K. (1997). Female choice in the sedge warbler Acrocephalus

schoenobaenus: multiple cues from song and territory quality. Proceedings of the Royal

Society B: Biological Sciences, 264(1381), 521–526.

Butler, S. R., & Fernández-Juricic, E. (2014). European starlings recognize the location of

robotic conspecific attention. Biology Letters, 10(10), 20140665.

Carr, J. W., Smith, K., Cornish, H., & Kirby, S. (2016). The Cultural Evolution of Structured

Languages in an Open-Ended, Continuous World. Cognitive Science, 41(4), 892–923.

Cavagna, A., Cimarelli, A., Giardina, I., Parisi, G., Santagati, R., Stefanini, F., & Viale, M.

(2010). Scale-free correlations in starling flocks. Proceedings of the National Academy of

Sciences, 107(26), 11865–11870.

Chen, G., King, J. A., Burgess, N., & O’Keefe, J. (2013). How vision and movement combine in

the hippocampal place code. Proceedings of the National Academy of Sciences, 110(1),

378–383.

Chow, C. P., Mitchell, J. F., & Miller, C. T. (2015). Vocal turn-taking in a non-human primate is

learned during ontogeny. Proceedings of the Royal Society B: Biological Sciences,

282(1807), 20150069.

Cooper, R. P., & Aslin, R. N. (1990). Preference for Infant-Directed Speech in the First Month

after Birth. Child Development, 61(5), 1584–1595.

73 Couzin, I. D., & Krause, J. (2003). Self-Organization and Collective Behavior in Vertebrates.

Advances in the Study of Behavior, 32, 1–75.

Cuthill, I. C., Hart, N. S., Partridge, J. C., Bennett, A. T. D., Hunt, S., & Church, S. C. (2000).

Avian colour vision and avian video playback experiments. Acta Ethologica, 3(1), 29–37.

D’Amelio, P. B., Trost, L., & ter Maat, A. (2017). Vocal exchanges during pair formation and

maintenance in the zebra finch (Taeniopygia guttata). Frontiers in Zoology, 14(1). de Chaumont, F., Coura, R. D.-S., Serreau, P., Cressant, A., Chabout, J., Granon, S., & Olivo-

Marin, J.-C. (2012). Computerized video analysis of social interactions in mice. Nature

Methods, 9(4), 410–417.

Dell, A. I., Bender, J. a., Branson, K., Couzin, I. D., de Polavieja, G. G., Noldus, L. P. J. J.,

Pérez-Escudero, A., Perona, P., Straw, A. D., Wikelski, M., & Brose, U. (2014). Automated

image-based tracking and its application in ecology. Trends in Ecology and Evolution,

29(7), 417–428.

Derégnaucourt, S., & Gahr, M. (2013). Horizontal transmission of the father’s song in the zebra

finch (Taeniopygia guttata). Biology Letters, 9(4), 20130247.

Derégnaucourt, S., Mitra, P. P., Fehér, O., Pytte, C., & Tchernichovski, O. (2005). How sleep

affects the developmental learning of bird song. Nature, 433(7027), 710–716.

Derégnaucourt, S., Poirier, C., Van der Kant, A., Van der Linden, A., & Gahr, M. (2013).

Comparisons of different methods to train a young zebra finch (Taeniopygia guttata) to

learn a song. Journal of Physiology - Paris, 107(3), 210–218.

Deshpande, M., Pirlepesov, F., & Lints, T. (2014). Rapid encoding of an internal model for

imitative learning. Proceedings of the Royal Society B: Biological Sciences, 281(1781),

20132630.

74 Doupe, A. J., & Konishi, M. (1991). Song-selective auditory circuits in the vocal control system

of the zebra finch. Proceedings of the National Academy of Sciences, 88(24), 11339–11343.

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: common themes and

mechanisms. Annual Review of Neuroscience, 22, 567–631.

Eales, L. (1989). The Influences of Visual and Vocal Interaction on Song Learning in Zebra

finches. Animal Behaviour, 37(3), 507–508.

Ekstrom, A. D., Caplan, J. B., Ho, E., Shattuck, K., Fried, I., & Kahana, M. J. (2005). Human

hippocampal theta activity during virtual navigation. Hippocampus, 15(7), 881–889.

Elie, J. E., Soula, H. A., Mathevon, N., & Vignal, C. (2011). Dynamics of communal

vocalizations in a social songbird, the zebra finch (Taeniopygia guttata). The Journal of the

Acoustical Society of America, 129(6), 4037–4046.

Evans, C. S., & Marler, P. (1991). On the use of video images as social stimuli in birds: audience

effects on alarm calling. Animal Behaviour, 41(1), 17–26.

Fehér, O., Wang, H., Saar, S., Mitra, P. P., & Tchernichovski, O. (2009). De novo establishment

of wild-type song culture in the zebra finch. Nature, 459(7246), 564–568.

Fleischman, S., Terkel, J., & Barnea, A. (2016). Visual recognition of individual conspecific

males by female zebra finches, Taeniopygia guttata. Animal Behaviour, 120, 21–30.

Fleishman, L., & Endler, J. a. (2000). Some comments on visual perception and the use of video

playback in animal behavior studies. Acta Ethologica, 3(1), 15–27.

Fleishman, McClintock, W., D’Eath, R., Brainard, D., & Endler, J. (1998). Colour perception

and the use of video playback experiments in animal behaviour. Animal Behaviour, 56(4),

1035–1040.

Floor, P., & Akhtar, N. (2006). Can 18-Month-Old Infants Learn Words by Listening In on Conversations? Infancy, 9(3), 327–339.
Freed-Brown, G., & White, D. J. (2009). Acoustic mate copying: female cowbirds attend to other females’ vocalizations to modify their song preferences. Proceedings of the Royal Society B: Biological Sciences, 276(1671), 3319–3325.
Galoch, Z., & Bischof, H.-J. (2006a). Behavioural responses to video playbacks by zebra finch males. Behavioural Processes, 74(1), 21–26.
Galoch, Z., & Bischof, H.-J. (2006b). Zebra Finches actively choose between live images of conspecifics. Ornithological Science, 5(1), 57–64.
Gill, L. F., Goymann, W., Ter Maat, A., & Gahr, M. (2015). Patterns of call communication between group-housed zebra finches change during the breeding cycle. eLife, 4, e07770.
Goldstein, M. H., King, A. P., & West, M. J. (2003). Social interaction shapes babbling: Testing parallels between birdsong and speech. Proceedings of the National Academy of Sciences, 100(13), 8030–8035.
Goldstein, M. H., & Schwade, J. A. (2008). Social Feedback to Infants’ Babbling Facilitates Rapid Phonological Learning. Psychological Science, 19(5), 515–523.
Gomez, R. L., & Gerken, L. (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 4(5), 178–186.
González, M. C., Hidalgo, C. A., & Barabási, A.-L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779–782.
Goodale, E., & Podos, J. (2010). Persistence of song types in Darwin’s finches, Geospiza fortis, over four decades. Biology Letters, 6(5), 589–592.
Haesler, S., Wada, K., Nshdejan, A., Morrisey, E. E., Lints, T., Jarvis, E. D., & Scharff, C. (2004). FoxP2 Expression in Avian Vocal Learners and Non-Learners. Journal of Neuroscience, 24(13), 3164–3175.
Hart, N. S., & Vorobyev, M. (2005). Modelling oil droplet absorption spectra and spectral sensitivities of bird cone photoreceptors. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 191(4), 381–392.
Harvey, C. D., Collman, F., Dombeck, D. A., & Tank, D. W. (2009). Intracellular dynamics of hippocampal place cells during virtual navigation. Nature, 461(7266), 941–946.
Hasselquist, D., Bensch, S., & von Schantz, T. (1996). Correlation between male song repertoire, extra-pair paternity and offspring survival in the great reed warbler. Nature, 381(6579), 229–232.
Honda, E., & Okanoya, K. (1999). Acoustical and Syntactical Comparisons between Songs of the White-backed Munia (Lonchura striata) and Its Domesticated Strain, the Bengalese Finch (Lonchura striata var. domestica). Zoological Science, 16(2), 319–326.
Hove, M. J., & Risen, J. L. (2009). It’s All in the Timing: Interpersonal Synchrony Increases Affiliation. Social Cognition, 27(6), 949–960.
Ikebuchi, M., & Okanoya, K. (1999). Male Zebra Finches and Bengalese Finches Emit Directed Songs to the Video Images of Conspecific Females Projected onto a TFT Display. Zoological Science, 16(1), 63–70.
Jarvis, E. D., & Nottebohm, F. (1997). Motor-driven gene expression. Proceedings of the National Academy of Sciences, 94(8), 4097–4102.
Jones, A. E., & Slater, P. J. B. (1996). The role of aggression in song tutor choice in the zebra finch: Cause or effect? Behaviour, 133.
Jusczyk, P. W. (2000). The Discovery of Spoken Language. MIT Press.
Katz, Y., Tunstrøm, K., Ioannou, C. C., Huepe, C., & Couzin, I. D. (2011). Inferring the structure and dynamics of interactions in schooling fish. Proceedings of the National Academy of Sciences, 108(46), 18720–18725.
King, A. P., West, M. J., & Goldstein, M. H. (2005). Non-vocal shaping of avian song development: parallels to human speech development. Ethology, 111(1), 101–117.
King, S. L. (2015). You talkin’ to me? Interactive playback is a powerful yet underused tool in animal communication research. Biology Letters, 11(7), 20150403.
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: an experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681–10686.
Kuhl, P. K., Andruski, J. E., Chistovich, I. A., Chistovich, L. A., Kozhevnikova, E. V., Ryskina, V. L., Stolyarova, E. I., Sundberg, U., & Lacerda, F. (1997). Cross-Language Analysis of Phonetic Units in Language Addressed to Infants. Science, 277(5326), 684–686.
Lipkind, D., Marcus, G. F., Bemis, D. K., Sasahara, K., Jacoby, N., Takahasi, M., Suzuki, K., Fehér, O., Ravbar, P., Okanoya, K., & Tchernichovski, O. (2013). Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants. Nature, 498(7452), 104–108.
Lipkind, D., & Tchernichovski, O. (2011). Quantification of developmental birdsong learning from the subsyllabic scale to cultural evolution. Proceedings of the National Academy of Sciences, 108, 15572–15579.
Liu, W.-C., Gardner, T. J., & Nottebohm, F. (2004). Juvenile zebra finches can use multiple strategies to learn the same song. Proceedings of the National Academy of Sciences, 101(52), 18177–18182.
Liu, W.-C., & Nottebohm, F. (2007). A learning program that ensures prompt and versatile vocal imitation. Proceedings of the National Academy of Sciences, 104(51), 20398–20403.
Maguire, S. E., Schmidt, M. F., & White, D. J. (2013). Social brains in context: lesions targeted to the song control system in female cowbirds affect their social network. PLoS ONE, 8(5), e63239.
Marcus, G. F., Vijayan, S., Bandi Rao, S., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.
Margoliash, D., & Konishi, M. (1985). Auditory representation of autogenous song in the song system of white-crowned sparrows. Proceedings of the National Academy of Sciences, 82(17), 5997–6000.
Mariette, M. M., & Griffith, S. C. (2012). Nest visit synchrony is high and correlates with reproductive success in the wild zebra finch Taeniopygia guttata. Journal of Avian Biology, 43(2), 131–140.
Mariette, M. M., & Griffith, S. C. (2015). The adaptive significance of provisioning and foraging coordination between breeding partners. The American Naturalist, 185(2), 270–280.
Markham, A. C., Guttal, V., Alberts, S. C., & Altmann, J. (2013). When good neighbors don’t need fences: Temporal landscape partitioning among baboon social groups. Behavioral Ecology and Sociobiology, 67(6), 875–884.
Matsumoto, J., Urakawa, S., Takamura, Y., Malcher-Lopes, R., Hori, E., Tomaz, C., Ono, T., & Nishijo, H. (2013). A 3D-Video-Based Computerized Analysis of Social and Sexual Interactions in Rats. PLoS ONE, 8(10), e78460.
Maximov, V. V. (1989). Approximation of visual pigment absorption spectra. Sensory Systems, 2(1), 2–5.
Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3), B101–B111.
McCasland, J. S., & Konishi, M. (1981). Interaction between Auditory and Motor Activities in an Avian Song Control Nucleus. Proceedings of the National Academy of Sciences, 78(12), 7815–7819.
McCowan, L. S. C., Mariette, M. M., & Griffith, S. C. (2015). The size and composition of social groups in the wild zebra finch. Emu, 115(3), 191–198.
Mischiati, M., Lin, H., Herold, P., Imler, E., Olberg, R., & Leonardo, A. (2014). Internal models direct dragonfly interception steering. Nature, 517(7534), 333–338.
Nottebohm, F., Stokes, T. M., & Leonard, C. M. (1976). Central control of song in the canary, Serinus canarius. Journal of Comparative Neurology, 165(4), 457–486.
Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314(5805), 1560–1563.
Okanoya, K. (2004). The Bengalese finch: A window on the behavioral neurobiology of birdsong syntax. Annals of the New York Academy of Sciences, 1016, 724–735.
Okubo, T. S., Mackevicius, E. L., Payne, H. L., Lynch, G. F., & Fee, M. S. (2015). Growth and splitting of neural sequences in songbird vocal development. Nature, 528(7582), 352–357.
Ölveczky, B. P., Andalman, A. S., & Fee, M. S. (2005). Vocal experimentation in the juvenile songbird requires a basal ganglia circuit. PLoS Biology, 3(5), 0902–0909.
Partan, S., Yelda, S., Price, V., & Shimizu, T. (2005). Female pigeons, Columba livia, respond to multisensory audio/video playbacks of male courtship behaviour. Animal Behaviour, 70(4), 957–966.
Pérez-Escudero, A., Vicente-Page, J., Hinz, R. C., Arganda, S., & de Polavieja, G. G. (2014). idTracker: tracking individuals in a group by automatic identification of unmarked animals. Nature Methods, 11(7), 743–748.
Plamondon, S. L., Rose, G. J., & Goller, F. (2010). Roles of syntax information in directing song development in white-crowned sparrows (Zonotrichia leucophrys). Journal of Comparative Psychology, 124(2), 117–132.
Reali, F., & Griffiths, T. L. (2009). The evolution of frequency distributions: Relating regularization to inductive biases through iterated learning. Cognition, 111(3), 317–328.
Roberts, T. F., Gobes, S. M. H., Murugan, M., Ölveczky, B. P., & Mooney, R. (2012). Motor circuits are required to encode a sensory model for imitative learning. Nature Neuroscience, 15(10), 1454–1459.
Romberg, A., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906–914.
Rose, G. J., Goller, F., Gritton, H. J., Plamondon, S. L., Baugh, A. T., & Cooper, B. G. (2004). Species-typical songs in white-crowned sparrows tutored with only phrase pairs. Nature, 432(7018), 753–758.
Rosenthal, S. B., Twomey, C. R., Hartnett, A. T., Wu, H. S., & Couzin, I. D. (2015). Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion. Proceedings of the National Academy of Sciences, 112(15), 4690–4695.
Rubenstein, D. I., & Wrangham, R. W. (Eds.). (1987). Ecological Aspects of Social Evolution: Birds and Mammals. Princeton University Press.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926–1928.
Sasahara, K., Cody, M. L., Cohen, D., & Taylor, C. E. (2012). Structural Design Principles of Complex Bird Songs: A Network-Based Approach. PLoS ONE, 7(9).
Scharff, C., & Nottebohm, F. (1991). Study of the behavioral deficits following lesions of various parts of the zebra finch song system: Implications for vocal learning. Journal of Neuroscience, 11(9), 2896–2913.
Searcy, W. A. (1984). Song repertoire size and female preferences in song sparrows. Behavioral Ecology and Sociobiology, 14(4), 281–286.
Seebacher, F., & Krause, J. (2017). Physiological mechanisms underlying animal social behaviour. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1727), 20160231.
Silk, M. J., Croft, D. P., Tregenza, T., & Bearhop, S. (2014). The importance of fission–fusion social group dynamics in birds. Ibis, 156(4), 701–715.
Singh, L., Nestor, S., Parikh, C., & Yull, A. (2009). Influences of Infant-Directed Speech on Early Word Recognition. Infancy, 14(6), 654–666.
Smith, K., & Wonnacott, E. (2010). Eliminating unpredictable variation through iterated learning. Cognition, 116(3), 444–449.
Soma, M. F. (2011). Social Factors in Song Learning: A Review of Estrildid Finch Research. Ornithological Science, 10(2), 89–100.
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K.-E., & Levinson, S. C. (2009). Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences, 106(26), 10587–10592.
Strandburg-Peshkin, A., Farine, D. R., Couzin, I. D., & Crofoot, M. C. (2015). Shared decision-making drives collective movement in wild baboons. Science, 348(6241), 1358–1361.
Striano, T., Henning, A., & Stahl, D. (2006). Sensitivity to interpersonal timing at 3 and 6 months of age. Interaction Studies, 7(2), 251–271.
Tchernichovski, O., Lints, T. J., Derégnaucourt, S., Cimenser, A., & Mitra, P. P. (2004). Studying the song development process: rationale and methods. Annals of the New York Academy of Sciences, 1016, 348–363.
Tchernichovski, O., Lints, T., Mitra, P. P., & Nottebohm, F. (1999). Vocal imitation in zebra finches is inversely related to model abundance. Proceedings of the National Academy of Sciences, 96(22), 12901–12904.
Tchernichovski, O., Mitra, P. P., Lints, T., & Nottebohm, F. (2001). Dynamics of the vocal imitation process: how a zebra finch learns its song. Science, 291(5513), 2564–2569.
Tchernichovski, O., Nottebohm, F., Ho, C., Pesaran, B., & Mitra, P. (2000). A procedure for an automated measurement of song similarity. Animal Behaviour, 59(6), 1167–1176.
Templeton, C. N., Akçay, C., Campbell, S. E., & Beecher, M. D. (2010). Juvenile sparrows preferentially eavesdrop on adult song interactions. Proceedings of the Royal Society B: Biological Sciences, 277(1680), 447–453.
Templeton, C. N., Campbell, S. E., & Beecher, M. D. (2012). Territorial song sparrows tolerate juveniles during the early song-learning phase. Behavioral Ecology, 23(4), 916–923.
Ter Maat, A., Trost, L., Sagunsky, H., Seltmann, S., & Gahr, M. (2014). Zebra Finch Mates Use Their Forebrain Song System in Unlearned Call Communication. PLoS ONE, 9(10), e109334.
Thiessen, E. D., Hill, E. A., & Saffran, J. R. (2005). Infant-Directed Speech Facilitates Word Segmentation. Infancy, 7(1), 53–71.
Thiessen, E. D., & Saffran, J. R. (2003). When cues collide: use of stress and statistical cues to word boundaries by 7- to 9-month-old infants. Developmental Psychology, 39(4), 706–716.
Tinbergen, N. (1951). The Study of Instinct. New York: Oxford University Press.
Trainor, L. J., & Cirelli, L. (2015). Rhythm and interpersonal synchrony in early social development. Annals of the New York Academy of Sciences, 1337(1), 45–52.
Vorobyev, M., Osorio, D., Bennett, A. T. D., Marshall, N. J., & Cuthill, I. C. (1998). Tetrachromacy, oil droplets and bird plumage colours. Journal of Comparative Physiology A: Sensory, Neural, and Behavioral Physiology, 183(5), 621–633.
Wasserman, S. M., Aptekar, J. W., Lu, P., Nguyen, J., Wang, A. L., Keles, M. F., Grygoruk, A., Krantz, D. E., Larsen, C., & Frye, M. A. (2015). Olfactory neuromodulation of motion vision circuitry in Drosophila. Current Biology, 25(4), 467–472.
West, M. J., & King, A. P. (1988). Female visual displays affect the development of male song in the cowbird. Nature, 334(6179), 244–246.
Williams, H. (1990). Models for song learning in the zebra finch: fathers or others? Animal Behaviour, 39(4), 745–757.
Williams, H., Kilander, K., & Sotanski, M. L. (1993). Untutored song, reproductive success and song learning. Animal Behaviour, 45(4), 695–705.
Williams, H., Levin, I. I., Norris, D. R., Newman, A. E. M., & Wheelwright, N. T. (2013). Three decades of cultural evolution in Savannah sparrow songs. Animal Behaviour, 85(1), 213–223.
Xu, D., Yapanel, U., & Gray, S. (2009). Reliability of the LENA™ language environment analysis system in young children’s natural home environment. Boulder, CO: LENA Foundation.
Zann, R. (1996). The Zebra Finch: A Synthesis of Field and Laboratory Studies. Oxford University Press.