Data integration methods for studying animal population dynamics

by Audrey Béliveau

M.Sc., Université de Montréal, 2012
B.Sc., Université de Montréal, 2010

Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

in the Department of Statistics and Actuarial Science
Faculty of Science

© Audrey Béliveau 2015
SIMON FRASER UNIVERSITY
Fall 2015

All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for "Fair Dealing." Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately.

Approval

Name: Audrey Béliveau
Degree: Doctor of Philosophy (Statistics)
Title: Data integration methods for studying animal population dynamics
Examining Committee:
Chair: Gary Parker, Professor

Richard Lockhart Senior Supervisor Professor

Carl Schwarz Co-Supervisor Professor

Steven Thompson Supervisor Professor

Rick Routledge Internal Examiner Professor

Paul Conn External Examiner Research Mathematical Statistician National Marine Laboratory NOAA/NMFS Alaska Fisheries Science Center

Date Defended: 22 December 2015

Abstract

In this thesis, we develop new data integration methods to better understand animal population dynamics.

In a first project, we study the problem of integrating aerial and access data from aerial-access creel surveys to estimate angling effort, catch and harvest. We propose new estimation methods, study their statistical properties theoretically and conduct a simulation study to compare their performance. We apply our methods to data from an annual Kootenay Lake (Canada) survey.

In a second project, we present a new Bayesian modeling approach to integrate capture-recapture data with other sources of data without relying on the usual independence assumption. We use a simulation study to compare, under various scenarios, our approach with the usual approach of simply multiplying likelihoods. In the simulation study, the Monte Carlo RMSEs and expected posterior standard deviations obtained with our approach are always smaller than or equal to those obtained with the usual approach of simply multiplying likelihoods. Finally, we compare the performance of the two approaches using real data from a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in the Valais, Switzerland.

In a third project, we develop an explicit integrated population model to integrate capture-recapture survey data, dead recovery survey data and snorkel survey data to better understand the movement from the ocean to spawning grounds of Chinook salmon (Oncorhynchus tshawytscha) on the West Coast of Vancouver Island, Canada. In addition to providing spawning escapement estimates, the model provides estimates of stream residence time and snorkel survey observer efficiency, which are crucial but currently lacking for the area-under-the-curve method used to estimate escapement on the West Coast of Vancouver Island.

Keywords: Aerial-access; Capture-recapture; Creel surveys; Independence assumption; Integrated population modeling; Oncorhynchus tshawytscha

Acknowledgements

First and foremost, I am very grateful to my supervisors Richard Lockhart and Carl Schwarz for their time, advice, financial support and the collaboration opportunities offered through- out my doctoral program. I would like to thank my collaborators: Steve Arndt for providing insight on the creel survey data; Roger Pradel for hosting me at the CEFE and introducing me to integrated population modeling; Michael Schaub and Raphaël Arlettaz for providing the bats data and insight; and finally Roger Dunlop for hosting me during the 2014 Burman River survey and for the numerous discussions that have followed. I can say without a doubt that those PhD years were the best of my life so far, for the most part thanks to the incredibly friendly atmosphere in the Department and the amazing people I met there. I would like to thank Derek Bingham for hosting me in his lab and providing access to computing resources. I am also grateful to Gary Parker for his support in a wide array of instances. To my fellow graduate students and friends Ararat, Biljana, Elena, Huijing, Mike, Ofir, Oksana, Ruth, Shirin, Zheng and many others, thank you for cheering up my days and for the many dinners, concerts, tennis matches and more! A very special mention goes to Shirin and Ofir for their support in difficult times. I would like to say a big thank you to all my dancing friends and teammates for all the fun times that helped maintain a good balance in my life. I am also thankful to David Haziza for always believing in me! Finally, I gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada.

Table of Contents

Approval ii

Abstract iii

Acknowledgements iv

Table of Contents v

List of Tables vii

List of Figures ix

1 Introduction 1

2 Adjusting for undercoverage of access-points in creel surveys with fewer overflights 3
2.1 Introduction 3
2.2 Sampling Protocol 5
2.3 Statistical Methods 6
2.3.1 Inference Framework 7
2.3.2 Study of the Bias 8
2.3.3 Study of the Variance 9
2.3.4 Optimal Allocation 10
2.3.5 Stratification 11
2.4 Simulation Study 12
2.5 Application 15
2.6 Discussion 21

3 Explicit integrated population modeling: escaping the conventional assumption of independence 23
3.1 Introduction 23
3.2 Background and notation 24
3.2.1 Capture-recapture survey 24
3.2.2 Population count survey 26
3.2.3 Integrated population modeling via likelihood multiplication 27
3.3 Integrated population modeling based on the true joint likelihood 28
3.3.1 Capture-recapture and count data 28
3.3.2 Model variations 32
3.4 Simulation Study 33
3.5 Application 38
3.6 Discussion 43

4 Integrated population modeling of Chinook salmon (Oncorhynchus tshawytscha) migration on the West Coast of Vancouver Island 45
4.1 Introduction 45
4.2 Sampling Protocol 47
4.3 Notation 48
4.4 A Jolly-Seber approach to estimate escapement 49
4.5 Integrated population modeling 53
4.6 Analysis of the 2012 data 57
4.6.1 Assessment of the integrated population model 64
4.7 Discussion 66

Bibliography 67

Appendix A Supplementary materials for Chapter 2 70
A.1 First-order Taylor expansions 70
A.2 Assumptions, propositions and proofs for the study of Errparty 70
A.2.1 Assumptions 70
A.2.2 Study of Errparty for the estimators ĈR and ĈDE 71
A.2.3 Study of Errparty for the estimator Ĉ1 72
A.2.4 Study of Errparty for the estimator Ĉ2 74
A.3 Proof of the Optimal Allocation 76
A.4 Monte Carlo measures 77
A.5 Figures 78

Appendix B Supplementary materials for Chapter 3 81 B.1 Monte Carlo measures used in the simulation study ...... 81 B.2 Plots of the results of the simulation study ...... 83 B.3 Bats data analysis ...... 93

Appendix C Supplementary materials for Chapter 4 94 C.1 Analysis of the 2012 capture-recapture data using the software MARK . . . 94

List of Tables

Table 2.1 Values of αi and βi for the variance formulas 10
Table 2.2 Parameter values used to generate the data for the simulation study. 13
Table 2.3 Monte Carlo measures for the simulation with µb = 130. Numbers are expressed in %. 16
Table 2.4 Allocation of sample size in the 2010-2011 Kootenay Lake Creel Survey. 18
Table 2.5 Optimal values of no/ng for each month and day type combination for the number of rainbow trout kept. Note that we do not present results for the double expansion estimator because in that case the optimal allocation is no = ng. 20
Table 2.6 Seasonal combined estimates (Est) of total number of rainbow trout kept along with approximate 95% confidence intervals (Low,Upp). The last column is computed as a separate total estimate over the three seasons. 21

Table 3.1 Changes in the population size per state over time for a study with K = 3 periods. The table follows the timeline in Figure 3.1. Starting

in the upper left corner of the table, the population is comprised of N1 unmarked individuals at the beginning of period 1. Then, the count survey occurs (which does not affect the state nor size of the popula-

tion). Then, B1 births occur resulting in N1 + B1 unmarked individu-

als in the population. Then, C1 individuals are captured, marked and

released which leaves N1 +B1 −C1 unmarked individuals in the popula- u m tion. Then, D1 unmarked individuals die and D11 marked individuals u die. When period 2 begins, there are respectively N1 + B1 − C1 − D1 m and C1 −D11 unmarked and marked individuals in the population. The table goes on like this until the study is finished. Note: C & R is used to abbreviate “captures and recaptures”...... 30 Table 3.2 Monte Carlo measures comparing the performance of the true joint likelihood approach (L) and the composite likelihood approach (Lc) in the simulation study, across scenarios and parameters. Each Monte Carlo measure is based on 250 simulated datasets...... 35

Table 3.3 Monte Carlo estimates of P(WL ≤ WLc), where W stands for either the absolute error (AE), the standard deviation of the posterior sample (SD) or the length of the 95% HPD credible interval (LCI). Each Monte Carlo measure is based on 250 simulated datasets. 36

Table 4.1 Notation for the data collected at Burman River. The subscript s can take the values m (males) and f (females)...... 49 Table 4.2 Notation for the parameters used in the Jolly-Seber model and/or the integrated population model. The subscript s can take the values m (males) and f (females)...... 51 Table 4.3 Variables used in the Jolly-Seber model, categorized based on their role in the model...... 52 Table 4.4 Formulas used to compute quantities of interest for the Jolly-Seber model or the integrated population model. Residence time and alive population size in the stream cannot be estimated from the Jolly-Seber model. Notes: (1) Sums are defined as zero when backwards; (2) The use of d − 0.5 in the mean stopover time calculation is based on the assumption that within a day, the movement of fish upstream to the spawning grounds is distributed uniformly over the day; (3) The latent m m variables Ni,j,s, and Ai,j,s are defined as 0 when i is not a capture- recapture day; (4) The time unit is days...... 54 Table 4.5 Variables used in the integrated population model, categorized based on their role in the model...... 55 Table 4.6 Escapement estimates obtained from the Jolly-Seber model and the in- tegrated population model. The formulas used to calculate escapement are given in Table 4.4. CI denotes credible intervals...... 60 Table 4.7 Integrated population modeling marginal estimates and credible in- tervals of observer efficiency in the snorkel survey, based on the fish visibility covariate...... 64

List of Figures

Figure 2.1 Boxplots of relative bias due to the partial interview of parties for 100 population replicates with varying mean number of boats per day. Left column: scenario (A); right column: scenario (B). The first to fourth rows relate, in order, to the estimators Ĉ1, Ĉ2, ĈDE and ĈR. 14
Figure 2.2 Kootenay Lake and the creel survey access points. Riondel/Crawford Bay and Boswell/Kuskanook ramps were combined for field monitoring and data analysis. Map provided by A. Waterhouse, Ministry of Forests, Lands, and Natural Resource Operations. 17
Figure 2.3 Monthly estimates of total number of rainbow trout kept along with approximate 95% confidence intervals. The top and bottom graphs represent weekends and weekdays respectively. The estimators (2.1) to (2.4) are represented respectively by the following symbols: triangle, circle, x mark and square. 19

Figure 3.1 Timeline of events of the animal population study. The symbols “C”, “B” and “CR” stand for count survey, births and capture-recapture, respectively. Note that the time between the count survey, the births and the capture-recapture survey in each period is negligible. . . . 28 Figure 3.2 Marginal posterior distributions (smoothed) obtained from analyzing the bats data. The plain line represents the true joint likelihood method while the dashed line represents the composite likelihood method...... 43

Figure 4.1 Map of Burman River on the West Coast of Vancouver Island, Canada 46 Figure 4.2 Schematic representation of Chinook salmon migration at Burman River, as assumed by the integrated population model. The arrows denote transitions while boxes denote states...... 53 Figure 4.3 Timeline when surveys were performed in 2012. Each occurrence is denoted by a symbol “×”. Adjacent symbols correspond to consecu- tive days...... 58 Figure 4.4 Summary time series of the 2012 data...... 58

ix Figure 4.5 Daily discharge measured at Gold River over the 2012 migration period. Although discharge data is not available at Burman River, the data at nearby Gold River are thought to be a good proxy for Burman River. The first big freshet occurred on October 14th. . . . 59 Figure 4.6 Estimates of the population size in the pool obtained using the Jolly- Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 61 Figure 4.7 Stopover time estimates obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 61 Figure 4.8 Estimates of the population size in the tagging pool obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 62 Figure 4.9 Stopover time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 62 Figure 4.10 Residence time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 63 Figure 4.11 Estimates of alive population size in the spawning area obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval...... 63 Figure 4.12 Bayesian p-values for the assessment of the capture-recapture com-

ponent of the integrated population model, using discrepancy D1. . 65 Figure 4.13 Bayesian p-values for the assessment of the snorkel survey component

of the integrated population model, using discrepancy D2...... 66

Chapter 1

Introduction

The study of animal population dynamics is important for the management and conservation of animal populations. A variety of surveys can be used for that purpose: capture-recapture surveys, population counts, newborn counts, dead recoveries, telemetry surveys, creel sur- veys, etc. When a population is studied using more than one type of survey, the integration of all the data in a single statistical analysis can be very challenging. It is currently an area of active research and will be the main topic of this work. Chapters 2, 3 and 4 form the core of this thesis. They are self-sufficient in the sense that they can be read in any order, they each contain an introduction and the notation is not shared between chapters. In Chapter 2, we propose new statistical methods to integrate the data from aerial-access creel surveys in order to estimate angling effort, catch and harvest in recreational fisheries. Aerial-access creel surveys rely on two components: (1) A ground component in which fishing parties returning from their trips are interviewed at some access-points of the fishery; (2) An aerial component in which an instantaneous count of the the number of fishing parties is conducted. It is common practice to sample fewer aerial survey days than ground survey days. This is thought by practitioners to reduce the cost of the survey, but there is a lack of sound statistical methodology for this case. In Chapter 2, we propose various estimation methods to handle this situation and evaluate their asymptotic properties from a design- based perspective (see Lohr, 2009). The performance of the proposed estimators is studied empirically using a simulation study with varying sampling scenarios. Another aspect that we study in this work is the optimal allocation of the effort between the ground and the aerial portion of the survey, for given costs and budget, for which we derive formulas using the Lagrange multipliers method. Finally, we apply our methods to data from an annual Kootenay Lake (Canada) survey. Capture-recapture surveys are periodic surveys that take place on a series of capture occasions. On each occasions, a survey crew captures animals from a population. When an animal is captured for the first time, it is marked with a unique identification number

1 and released back to the population, and the identification number is recorded. When an animal is recaptured, its identification number is recorded and it is released back into the population. The data collected by capture-recapture can be used to estimate the survival probability of the marked animals between capture occasions. In Chapter 3, we develop new statistical methods for the integrated population modeling of capture-recapture data with other types of data, such as population counts and dead recoveries. Typically, integrated population models rely on the assumption that the datasets are independent so that a joint likelihood is easily formed as a product of likelihoods (Schaub and Abadi, 2011). In our work, we develop a new capture-recapture Bayesian model that takes into account the dependency between datasets. A key aspect of the model is that it uses latent variables that keep track of all population gains (e.g. births) and losses (e.g. deaths) in the unmarked population and the marked cohorts over time. A simulation study compares, under various scenarios, our approach with the common likelihood multiplication approach. Finally, we compare the performance of the two approaches using a real dataset comprised of capture-recapture data, count data and newborn count data on a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in the Valais, Switzerland. In Chapter 4, we develop a Bayesian integrated population model to study the return of Chinook salmon (Oncorhynchus tshawytscha) from the ocean to the spawning grounds in Burman River, on the west coast of Vancouver Island, Canada. Chinook salmon on the west coast of Vancouver Island return to their natal stream in the fall after reaching maturity to spawn and die. When entering Burman River, fish stop for at least some time at a stopover pool, where a capture-recapture survey takes place, then move upstream where they spawn and die. The upstream portion of the river is surveyed periodically by snorkelers that count the number of marked and the total number of fish seen (alive). Carcass surveys also take place periodically, during which marked and unmarked carcasses are picked. Our integrated population model integrates the capture-recapture data, carcass data and snorkel data all in a single analysis. This is, to our knowledge, the first use of explicit integrated population modeling applied to salmon migration. Our explicit integrated population model uses latent variables to follow explicitly the movement and state of fish throughout the migration. In this work, we also implement a Bayesian version of the Jolly-Seber model (Schwarz and Arnason, 1996) to the capture-recapture data alone and compare estimates between the integrated method and the Jolly-Seber method.

Chapter 2

Adjusting for undercoverage of access-points in creel surveys with fewer overflights

The work in this chapter underwent a peer-review process for publication in Biometrics, a journal of the International Biometric Society published by Wiley. The paper is currently available in Early View on Wiley Online Library, see Béliveau et al. (2015).

The 2010-2011 Kootenay Lake creel survey was conducted with financial support of the Fish and Wildlife Compensation Program on behalf of its program partners BC Hydro, the Province of BC, Fisheries and Oceans Canada, First Nations and the public. Access interviews and overflight boat count data were collected by Redfish Consulting Ltd. (Nelson, British Columbia).

2.1 Introduction

Sustainability of recreational fisheries relies on well-advised management decisions. To inform those decisions, fishery agencies conduct creel surveys. Many characteristics of a fishery can be of interest, including total catch (number of fish released or kept), total harvest (number of fish kept), or total fishing effort (number of fishing days or hours) over a period of time. The data collection for creel surveys can be of two types: off-site (mail, telephone, door-to-door, logbooks) or on-site (Pollock, Jones and Brown, 1994). In this work, we focus on on-site surveys, which are conducted at the water body location during fishing hours. A common type of on-site survey is the access-point survey: it is a ground survey, which relies on survey agents intercepting and interviewing angling parties immediately at the return of their fishing trip. The survey agents can be posted for example, at public boat ramps, piers or marinas. If a list of all access-points of the water body

3 can be constructed, and access-points are selected randomly (each with strictly positive probability), an unbiased estimate of, e.g. total catch can be obtained for each survey day. However, in many practical situations, this option is impossible because some access-points may be private (for example private docks or piers) or some parties may use unregulated sites. Consequently, if these cases represent a significant proportion of the parties and/or if these cases differ significantly in their variables of interest from parties that use the covered access-points, then standard estimation methods will have a substantial undercoverage bias (Lohr, 2009). To address this problem, it is typically assumed in creel surveys that parties that use uncovered access-points do not differ in the variables of interests (catch, harvest, fishing effort, etc.) from parties that use the covered access-points. For instance, they should not be more or less experienced anglers. Still, this assumption is not sufficient for the estimation of totals because the number of parties using uncovered access-points is unknown and typically not negligible. This last piece of information is deduced using aerial surveys. Aerial surveys can be conducted, for example, using aircraft overflights or well-suited viewpoints from which an instantaneous count of the number of fishing parties at a time of the day is obtained. Ideally, aerial surveys should be scheduled at random times of the day (Pollock et al., 1994) but environmental conditions (e.g. inclement weather, daylight hours, airport delay) can make it hard for survey agents to respect the planned schedules. With this in mind, Dauk and Schwarz (2001) proposed estimation methods in the case when the aerial survey is conducted at a convenient time of the day, typically around the peak of fishing activity. The use of deterministic aerial survey times is justified if parties’ choice of access point is not related to their fishing schedule. In this work, we focus on multi-day surveys for which we wish to estimate totals of variables of interest over multiple days, for example, the week-ends of August. Statistical methodology is currently available when ground and overflight surveys are conducted on the same set of days, chosen at random among the days at study (Dauk and Schwarz, 2001). In practice, it is also common that aerial surveys are carried out on a random sample of the ground survey days only. This is thought by fisheries managers to be more economical because flights are costly, and the biological (fish size, age, species) and angler data provided by ground sampling are highly valuable for management. Rather surprisingly, there is a lack of statistical methodology for this type of aerial-access creel survey. The motivating application for the work in this chapter is the annual creel survey on Kootenay Lake, British Columbia. Estimates of catch (per species and overall), harvest (per species and overall) and fishing effort are required at the monthly level, separately for weekdays and weekends/holidays. In each stratum (eg. week-ends of August), the sampling of days follows a two-phase design: in the first phase, a simple random sample (srs) of days is selected to conduct the ground portion of the survey; in the second phase, a simple random sample of days for the overflight survey is selected from the days when access surveys

4 are done. The access-points to be surveyed are selected deterministically to maximize the proportion of anglers that are interviewed. Also, in practice, because of inclement weather, mechanical breakdown or other reasons, some of the scheduled overflights might not be carried out. In this work, we assume that all scheduled overflights are conducted or that if missed it occurred at random. In Section 2.2, the sampling protocol is described in detail and the notation is introduced. Then a variety of estimation methods are provided in Section 2.3 along with their design- based asymptotic properties and a strategy for optimal allocation of resources between the ground and overflight components. In Section 2.4, a simulation study investigates the performance of the estimators. Finally, in Section 2.5, the methods are applied to the 2010-2011 Kootenay Lake creel survey data.

2.2 Sampling Protocol

Consider a population U of size N days. On every day i ∈ U, a set Vi of size Mi parties

(or boats) fishes on that day on the water body of interest. For every party j ∈ Vi on days i ∈ U, the variable of interest is cij. It may represent, for example: the number of

fish caught, the number of fish kept or the number of rod-hours. For every party j ∈ Vi on days i ∈ U, an indicator variable, Iij, indicates whether party j returns to one of the ground survey access points. In addition, for every day i ∈ U, if an overflight could be conducted, it would be conducted at time ti (we make the usual assumption that overflights are instantaneous). Then, for every party j ∈ Vi on days i ∈ U, an indicator variable,

δij(ti), indicates whether party j is fishing at time ti. For the rest of the chapter, we drop the dependence on ti in δij(ti) for ease of notation.

In the first phase, a simple random sample $s_g \subset U$ of size $n_g$ days is selected to conduct the ground surveys. On every sampled day $i \in s_g$, the parties that return to the surveyed access-points are interviewed (i.e. the parties for which $I_{ij} = 1$): their corresponding variables $c_{ij}$ as well as the start and end times of their fishing trip are collected. On every day $i \in s_g$, the total of the variable of interest over the interviewed parties, $C_i \equiv \sum_{j \in V_i} c_{ij} I_{ij}$, can be computed from the data.

In the second phase, a simple random sample $s_o \subset s_g$ of size $n_o$ days is selected. Overflight surveys are conducted on those days and, for every day $i \in s_o$, the number of active boats at time $t_i$, $A_{oi} \equiv \sum_{j \in V_i} \delta_{ij}$, is recorded. Thus, for every day $i \in s_o$, one can deduce the value of $\delta_{ij}$ for each party interviewed during the ground survey on that day using the start and end times of their fishing trip. Then, one can compute the number of interviewed parties that are fishing at time $t_i$, $A_{gi} \equiv \sum_{j \in V_i} \delta_{ij} I_{ij}$.
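To make the day-level summaries concrete, the following Python sketch computes $C_i$ and $A_{gi}$ from interview records and takes $A_{oi}$ from the flight count; the record layout (keys 'catch', 'start', 'end') and the numbers are illustrative assumptions, not the survey's actual data format.

```python
# Hypothetical illustration of the day-level quantities defined above.
def day_summaries(interviews, flight_time):
    """Summarise one ground-survey day.

    interviews : list of dicts, one per interviewed party, with keys
                 'catch' (value of interest c_ij), 'start' and 'end'
                 (trip times in hours since midnight).
    flight_time: overflight time t_i, in hours since midnight.

    Returns (C_i, A_gi): the total of the variable of interest over the
    interviewed parties, and the number of interviewed parties fishing
    at the overflight time.
    """
    C_i = sum(p["catch"] for p in interviews)
    A_gi = sum(1 for p in interviews if p["start"] <= flight_time <= p["end"])
    return C_i, A_gi

# Example: three interviewed parties, overflight at 12:00
parties = [
    {"catch": 2, "start": 8.0, "end": 13.5},
    {"catch": 0, "start": 11.0, "end": 12.5},
    {"catch": 1, "start": 13.0, "end": 17.0},
]
C_i, A_gi = day_summaries(parties, flight_time=12.0)
print(C_i, A_gi)  # 3 fish kept in total; 2 of the 3 parties were fishing at 12:00
```

The aerial count $A_{oi}$ would simply be the number of active boats recorded by the flight on that day.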

2.3 Statistical Methods

The goal is to estimate the sum of a variable of interest over all angling parties during the study period: $C^* = \sum_{i \in U} C_i^*$, where $C_i^* = \sum_{j \in V_i} c_{ij}$ is the sum over all angling parties active on day $i$ of the variable of interest. In this section, we propose a number of estimators for $C^*$. We start by suggesting two intuitive estimators:

$$\hat{C}_1 = \left(\frac{N}{n_g} \sum_{i \in s_g} C_i\right) \frac{\sum_{i \in s_o} A_{oi}}{\sum_{i \in s_o} A_{gi}} \qquad (2.1)$$

and

$$\hat{C}_2 = \left(\frac{N}{n_g} \sum_{i \in s_g} C_i\right) \left(\frac{1}{n_o} \sum_{i \in s_o} \frac{A_{oi}}{A_{gi}}\right). \qquad (2.2)$$

The general idea behind these two estimators is to calculate an estimate of the total of the variable of interest at the surveyed access-points and expand it to all access-points using an inflation factor computed as a ratio of the $A_{oi}$'s and $A_{gi}$'s. The difference between $\hat{C}_1$ and $\hat{C}_2$ lies in computing the ratio involving the $A_{oi}$'s and $A_{gi}$'s. As a third estimator, we suggest

$$\hat{C}_{DE} = \frac{N}{n_o} \sum_{i \in s_o} C_i \frac{A_{oi}}{A_{gi}}, \qquad (2.3)$$

which uses only information from days when both access and aerial components are available. Setting $y_i = C_i A_{oi}/A_{gi}$, this estimator is a double expansion estimator (see, e.g. Särndal, Swensson and Wretman (1992), p.348), where $y_i$ can be seen as a proxy for $C_i^*$. The double expansion estimator is a generalization of the (single-phase) expansion estimator (also called Horvitz-Thompson estimator) to two-phase designs. It is simply a weighted sum of the $y_i$'s computed from the aerial survey days' data, where the weights correspond to the inverse probability of inclusion in the sample, $N/n_o$. The estimator is design-unbiased but does not integrate auxiliary information; namely the information collected on ground survey days that do not have an overflight. Hence, we propose to use that auxiliary information in a two-phase ratio estimator (see again Särndal et al., p.359):

$$\hat{C}_R = \frac{N}{n_o} \sum_{i \in s_o} y_i \, \frac{\frac{1}{n_g}\sum_{j \in s_g} C_j}{\frac{1}{n_o}\sum_{j \in s_o} C_j}. \qquad (2.4)$$

Ratio estimators are asymptotically design-unbiased and have improved design-efficiency over expansion estimators when yi is approximately proportional to Ci. These four estimators are consistent in the sense that if we sample all days and interview all fishing parties every day, then the estimators are equal to the true total catch C∗.
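A minimal Python sketch of the four point estimators is given below, assuming the day-level summaries $C_i$, $A_{oi}$, $A_{gi}$ are held in simple lists; the function name, argument layout and toy numbers are illustrative only and are not taken from the thesis.

```python
def estimators(N, C_g, C_o, A_o, A_g):
    """Compute C1_hat, C2_hat, CDE_hat and CR_hat as in (2.1)-(2.4).

    N   : number of days in the study period
    C_g : totals C_i for the n_g ground-survey days
    C_o : totals C_i for the n_o overflight days (a subset of the ground days)
    A_o : aerial counts A_oi for the overflight days
    A_g : interviewed-and-active counts A_gi for the overflight days
    """
    n_g, n_o = len(C_g), len(C_o)
    y = [c * ao / ag for c, ao, ag in zip(C_o, A_o, A_g)]  # y_i = C_i * A_oi / A_gi

    C1 = N / n_g * sum(C_g) * sum(A_o) / sum(A_g)                              # (2.1)
    C2 = N / n_g * sum(C_g) * sum(ao / ag for ao, ag in zip(A_o, A_g)) / n_o   # (2.2)
    CDE = N / n_o * sum(y)                                                     # (2.3)
    CR = N / n_o * sum(y) * (sum(C_g) / n_g) / (sum(C_o) / n_o)                # (2.4)
    return C1, C2, CDE, CR

# Toy example: N = 22 days, 4 ground-survey days, 2 of which had an overflight
print(estimators(N=22, C_g=[10, 6, 8, 12], C_o=[6, 12], A_o=[15, 20], A_g=[5, 8]))
```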

Before describing the inference framework in which we study the proposed estimators, let us make some general remarks. First, note that if $C_i$ is constant across days (all $i \in U$), then $\hat{C}_2 = \hat{C}_R = \hat{C}_{DE}$. Second, note that if $A_{oi}/A_{gi}$ is constant across days (all $i \in U$), then $\hat{C}_1 = \hat{C}_2 = \hat{C}_R$. Therefore, if the total catch per day and the proportion of interviewed parties are similar across days, all estimators are expected to be roughly equivalent. However, the first condition seems very unlikely to be satisfied in practice because daily environmental conditions (such as weather) could significantly affect the number of fishing parties and the success of the parties. Also, regarding the second condition, there can be, for example, a greater use of non-sampled access points on good weather days in summer, which tends to decrease the proportion of interviewed parties.

2.3.1 Inference Framework

Throughout this chapter, we use the generic notation $\hat{C}$ to denote an estimator of $C^*$. The total error of an estimator $\hat{C}$ is $\hat{C} - C^* = (\hat{C} - \tilde{C}) + (\tilde{C} - C^*) \equiv \mathrm{Err}_{day}(\hat{C}) + \mathrm{Err}_{party}(\hat{C})$, where the first term, $\mathrm{Err}_{day}(\hat{C})$, is the error due to the sampling of days while the second term, $\mathrm{Err}_{party}(\hat{C})$, is the error due to the partial interview of fishing parties. Besides, $\tilde{C}$ denotes the estimator one would have used in the case of a census of ground and overflight days. For example, if the estimation strategy is to use the estimator $\hat{C}_R$, the estimator used in the presence of a census of days would be $\tilde{C} = \left(\frac{N}{N}\sum_{i \in U} y_i\right)\frac{\frac{1}{N}\sum_{j \in U} C_j}{\frac{1}{N}\sum_{j \in U} C_j} = \sum_{i \in U} y_i$.

First, we assume that there is a superpopulation model, $m$, that randomly generates, for each day $i \in U$, the number $M_i$ of fishing parties. In addition, it generates, for each party $j$ on day $i$: variables of interest, $c_{ij}$'s; fishing status at time $t_i$, $\delta_{ij}$'s; and indicators of return to one of the surveyed access-points, $I_{ij}$'s. Then, following the established sampling design, a two-phase sample of days is randomly selected by the survey practitioner. Inference can be made following different approaches depending on the sources of randomness one is willing to take into account for inference. In this chapter, we adopt the design-based mode of inference, which considers only the randomness coming from the design. For example, unbiasedness under the design-based approach means that on average, over all the possible samples of days, the total error is null. Another type of inference that we do not pursue in this work would be joint design and model-based inference. Although we are doing design-based inference, we make use of the superpopulation model mentioned in the beginning of this paragraph. The purpose of that model will be to give guidance concerning the design-based biases of our estimates.

From a design-based perspective, the contribution Errday(Cˆ) to the total error is random

(design-dependent) while the contribution Errparty(Cˆ) is a fixed quantity, because C˜ and ∗ C do not depend on the sample of days. As a consequence, Errparty(Cˆ) contributes to the

7 bias of Cˆ but not to its variance:

$$\mathrm{Bias}_p(\hat{C}) \equiv E_p(\hat{C} - C^*) = E_p\{\mathrm{Err}_{day}(\hat{C}) + \mathrm{Err}_{party}(\hat{C})\} = E_p\{\mathrm{Err}_{day}(\hat{C})\} + \mathrm{Err}_{party}(\hat{C}) \qquad (2.5)$$

and

$$\mathrm{Var}_p(\hat{C}) = \mathrm{Var}_p(\hat{C} - C^*) = \mathrm{Var}_p\{\mathrm{Err}_{day}(\hat{C}) + \mathrm{Err}_{party}(\hat{C})\} = \mathrm{Var}_p\{\mathrm{Err}_{day}(\hat{C})\},$$

where $E_p(\cdot)$ and $\mathrm{Var}_p(\cdot)$ denote respectively the expectation and the variance under the sampling plan.

2.3.2 Study of the Bias

From equation (2.5), two terms contribute to the bias: $E_p\{\mathrm{Err}_{day}(\hat{C})\}$ and $\mathrm{Err}_{party}(\hat{C})$.

To begin, we focus on the first term. In the case of the double-expansion estimator $\hat{C}_{DE}$, we have $E_p\{\mathrm{Err}_{day}(\hat{C}_{DE})\} = E_p\left(\frac{N}{n_o}\sum_{i \in s_o} y_i - \sum_{i \in U} y_i\right) = 0$. This result follows from classical survey sampling theory for two-phase designs (see eg. Lohr (2009), p.473). The other estimators are smooth non-linear functions of estimated totals that can be linearized using a first order Taylor series in the traditional finite population asymptotic framework of Isaki and Fuller (1982). The Taylor series expansions of the estimators are given in Appendix A.1. Consequently, $E_p\{\mathrm{Err}_{day}(\hat{C})\}$ is negligible relative to the true total catch $C^*$ when $n_o$ is large enough.

We now focus on the second term of (2.5). In general, this term is not negligible but we are interested in finding situations in which it is. Note that if all fishing parties were interviewed on the sampled days (all access-points are known, accessible and surveyed), we would have $\mathrm{Err}_{party}(\hat{C}) = 0$ for all four estimators. Therefore, sampling as many access-points as possible helps in reducing the bias of the estimators.

Now, we study the quantity $\mathrm{Err}_{party}(\hat{C})/C^*$ in an asymptotic framework consisting of a sequence of superpopulation models, $\{m_\eta\}_{\eta=1}^{\infty}$. For any superpopulation model $m_\eta$, the number of fishing parties on each day $i$ in the population of days $U$ is denoted $M_{\eta i}$ and tends to infinity in probability, as $\eta \to \infty$. The subscript $\eta$ is dropped for ease of notation.

Note that CˆDE and CˆR have the same value of C˜ and therefore, the same value of

Errparty(Cˆ) so they can be studied at the same time. Because neither the access-points nor the time of the overflight were selected randomly, it is necessary to assume that the super- population model is such that parties generated on a given day have the same probability of being interviewed (this probability should not depend on e.g. fishing period or ability).

8 More formally, we assume that on any day i ∈ U, the random variables

$I_{ij} \mid (M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i})$, $j = 1, \ldots, M_i$, are i.i.d. Bernoulli($p_i$), where $p_i$ is a non-zero probability. This is an important assumption whose validity must be gauged by fisheries scientists prior to the survey. In addition, we use two assumptions that are mainly technical and are normally satisfied. See Appendix A.2.1 for the assumptions.

It can be shown that, under the assumptions previously described, Errparty(CˆDE) and ∗ Errparty(CˆR) are negligible relative to the target C when the number of fishing parties on each day is large enough. See Appendix A.2.2 for a proof of this assertion. On the other hand, the estimators Cˆ1 and Cˆ2 require additional assumptions in order for the error due to the partial interview of parties to be negligible. In the case of Cˆ2, pi should be approximately constant across days i ∈ U (see Appendix A.2.4 for a proof of this assertion). Regarding

Cˆ1, we found two cases that provide negligible error:

1. pi is approximately constant across days i ∈ U, or

2. $C_i^*/M_i$ (the average catch per boat) and $A_{oi}/M_i$ (the proportion of parties fishing at the time of the aerial count) are approximately constant across days $i \in U$.

See Appendix A.2.3 for a proof of this assertion. To sum up, we found that the estimators

CˆR and CˆDE are those that require the weakest assumptions to get negligible error due to partial interview of parties.

2.3.3 Study of the Variance

The variance of $\hat{C}$ can be expressed using the usual two-phase decomposition of the variance: $\mathrm{Var}_p(\hat{C}) = \mathrm{Var}_1 E_2(\hat{C} \mid s_g) + E_1 \mathrm{Var}_2(\hat{C} \mid s_g)$, where $E_1(\cdot)$ and $\mathrm{Var}_1(\cdot)$ denote respectively the first-phase expectation and variance and $E_2(\cdot \mid s_g)$ and $\mathrm{Var}_2(\cdot \mid s_g)$ denote respectively the second-phase expectation and variance, conditional on the first-phase sample $s_g$.

In the case of the double expansion estimator, we have $\mathrm{Var}_1 E_2(\hat{C} \mid s_g) = N^2\left(1 - \frac{n_g}{N}\right)\frac{S_y^2}{n_g}$ and $E_1 \mathrm{Var}_2(\hat{C} \mid s_g) = N^2\left(1 - \frac{n_o}{n_g}\right)\frac{S_y^2}{n_o}$, where $S_y^2 = \frac{1}{N-1}\sum_{i \in U}(y_i - \bar{y}_U)^2$ and $\bar{y}_U = \frac{1}{N}\sum_{i \in U} y_i$. The total variance is therefore:

$$\mathrm{Var}_p(\hat{C}) = N^2\left(1 - \frac{n_o}{N}\right)\frac{S_y^2}{n_o}. \qquad (2.6)$$

The asymptotic variances of the remaining three estimators can be obtained from their first-order Taylor expansions. Here, the large sample properties refer again to the finite population framework. We use $\mathrm{AV}_p(\cdot)$ to denote asymptotic design-variance. For all three remaining estimators, the asymptotic variance can be written in the form

$$\mathrm{AV}_p(\hat{C}) = N^2\left(1 - \frac{n_g}{N}\right)\frac{S_\alpha^2}{n_g} + N^2\left(1 - \frac{n_o}{n_g}\right)\frac{S_\beta^2}{n_o}, \qquad (2.7)$$

where $S_\alpha^2$ and $S_\beta^2$ are measures of dispersion defined analogously to $S_y^2$. The values of $\alpha_i$ and $\beta_i$ associated with each estimator can be found in Table 2.1.

$\hat{C}$ : $\alpha_i$ ; $\beta_i$
$\hat{C}_1$ : $\frac{\bar{A}_{oU}}{\bar{A}_{gU}} C_i + \frac{\bar{C}_U}{\bar{A}_{gU}} A_{oi} - \frac{\bar{C}_U \bar{A}_{oU}}{\bar{A}_{gU}^2} A_{gi}$ ; $\frac{\bar{C}_U}{\bar{A}_{gU}}\left(A_{oi} - \frac{\bar{A}_{oU}}{\bar{A}_{gU}} A_{gi}\right)$
$\hat{C}_2$ : $C_i \bar{R}_U + R_i \bar{C}_U$, with $R_i = \frac{A_{oi}}{A_{gi}}$ ; $R_i \bar{C}_U$
$\hat{C}_R$ : $y_i$ ; $y_i - \frac{\bar{y}_U}{\bar{C}_U} C_i$

Table 2.1: Values of αi and βi for the variance formulas

Note that the variance formulas do not rely upon the assumptions in Section 2.3.2, which are only relevant to the study of bias. Theoretical comparison between variances seems ambitious for most estimators. However, it can be seen that $\hat{C}_R$ will be more efficient than the double expansion estimator if its corresponding value of $S_\beta^2$ is smaller than $S_y^2$ or, equivalently, if the population correlation coefficient between $y$ and $C$ is sufficiently large, that is, greater than $\frac{1}{2}\frac{\mathrm{CV}(C)}{\mathrm{CV}(y)}$, where CV stands for the population coefficient of variation.

Variance estimators can be obtained from (2.7) by replacing all $S^2$ quantities by their equivalent at the sample level. For example, an estimator of the variance of $\hat{C}_R$ would be $\widehat{\mathrm{Var}}_p(\hat{C}_R) = N^2\left(1 - \frac{n_g}{N}\right)\frac{s_\alpha^2}{n_g} + N^2\left(1 - \frac{n_o}{n_g}\right)\frac{s_{\hat{\beta}}^2}{n_o}$, where $s_\alpha^2 = \frac{1}{n_o - 1}\sum_{i \in s_o}(y_i - \bar{y}_{s_o})^2$ and $s_{\hat{\beta}}^2 = \frac{1}{n_o - 1}\sum_{i \in s_o}(\hat{\beta}_i - \bar{\hat{\beta}}_{s_o})^2$, with $\bar{y}_{s_o} = \frac{1}{n_o}\sum_{i \in s_o} y_i$, $\bar{\hat{\beta}}_{s_o} = \frac{1}{n_o}\sum_{i \in s_o}\hat{\beta}_i$ and $\hat{\beta}_i = y_i - \frac{\bar{y}_{s_o}}{\bar{C}_{s_o}} C_i$.

A confidence interval of approximate level $1 - \alpha$ can be obtained as $\hat{C} \pm t_{n_g - 1,\, 1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}_p(\hat{C})}$.
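The variance estimator and confidence interval for $\hat{C}_R$ can be sketched in Python as follows; scipy is assumed available only for the t quantile, and the inputs reuse the illustrative toy numbers from the earlier sketch rather than survey data.

```python
# Sketch of Var_hat(C_R_hat) from (2.7) with alpha_i = y_i and
# beta_i_hat = y_i - (ybar_so / Cbar_so) * C_i, plus the t-based interval.
from statistics import mean, variance   # variance() is the sample variance s^2
from scipy.stats import t

def var_CR(N, n_g, C_o, A_o, A_g):
    n_o = len(C_o)
    y = [c * ao / ag for c, ao, ag in zip(C_o, A_o, A_g)]
    beta_hat = [yi - (mean(y) / mean(C_o)) * ci for yi, ci in zip(y, C_o)]
    s2_alpha = variance(y)          # sample analogue of S^2_alpha
    s2_beta = variance(beta_hat)    # sample analogue of S^2_beta
    return (N**2 * (1 - n_g / N) * s2_alpha / n_g
            + N**2 * (1 - n_o / n_g) * s2_beta / n_o)

def ci_CR(CR_hat, var_hat, n_g, level=0.95):
    half = t.ppf(1 - (1 - level) / 2, df=n_g - 1) * var_hat ** 0.5
    return CR_hat - half, CR_hat + half

v = var_CR(N=22, n_g=4, C_o=[6, 12], A_o=[15, 20], A_g=[5, 8])
print(ci_CR(CR_hat=528.0, var_hat=v, n_g=4))
```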

2.3.4 Optimal Allocation

Suppose that a budget B is allocated to the survey and that each overflight has a cost of κo and each ground survey has a cost of κg. In the case of CˆDE, the allocation that minimizes the variance (2.6) subject to the constraint noκo + ngκg ≤ B is obviously no = ng = B/(κo + κg) because the information collected on ground survey days that don’t have an aerial survey is not used to compute CˆDE. For the other estimators, the allocation that minimizes the asymptotic variance (2.7) subject to the constraint noκo + ngκg ≤ B is found using the method of Lagrange multipliers; see Appendix A.3. We obtain:

$$n_g = \frac{B}{\kappa_g}\left[1 + \sqrt{\frac{\kappa_o}{\kappa_g}\,\frac{S_\beta^2}{S_\alpha^2 - S_\beta^2}}\right]^{-1}$$

and

$$n_o = n_g \sqrt{\frac{\kappa_g}{\kappa_o}\,\frac{S_\beta^2}{S_\alpha^2 - S_\beta^2}}.$$

Although the quantities $S_\alpha^2$ and $S_\beta^2$ are unknown, they can be approximated using previous years' survey data. We now make two practical remarks. First, the optimal allocation formulas require that $S_\alpha^2 - S_\beta^2 > 0$. If this is not the case, then the optimal allocation is necessarily $n_o = n_g = \frac{B}{\kappa_o + \kappa_g}$. Second, the optimal allocation formulas can lead to allocations that do not satisfy $n_o \leq n_g \leq N$. In that case, the optimal solution is found on the boundary, which means that either $n_o = n_g < N$ or $n_o < n_g = N$. In the former case, the optimal allocation is $n_o = n_g = \frac{B}{\kappa_o + \kappa_g}$ and in the latter, $n_o = \frac{B - N\kappa_g}{\kappa_o}$ and $n_g = N$. It suffices to compute both allocations along with their variance and choose the allocation with the smallest variance.

The optimal fraction of overflight days to ground days depends on two ratios. As the ratio of activity costs of an aerial to a ground survey increases, fewer overflight days should be performed. As $S_\alpha^2$ increases relative to $S_\beta^2$, then the optimal allocation also favors ground surveys.
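The allocation formulas translate directly into code. The sketch below uses a simplified version of the boundary handling described above (rather than comparing the variance at both boundary allocations), and the dispersion values are purely illustrative; only the costs echo the figures quoted later in Section 2.5.

```python
def optimal_allocation(B, kappa_o, kappa_g, S2_alpha, S2_beta, N):
    """Return (n_o, n_g) minimising the asymptotic variance (2.7)
    subject to n_o*kappa_o + n_g*kappa_g <= B (simplified sketch)."""
    if S2_alpha <= S2_beta:
        # no interior optimum: fly on every ground-survey day
        n = B / (kappa_o + kappa_g)
        return n, n
    ratio = (S2_beta / (S2_alpha - S2_beta)) ** 0.5
    n_g = (B / kappa_g) / (1 + ((kappa_o / kappa_g) ** 0.5) * ratio)
    n_o = n_g * ((kappa_g / kappa_o) ** 0.5) * ratio
    # simplified boundary adjustments when n_o <= n_g <= N fails
    if n_o > n_g:
        n_o = n_g = B / (kappa_o + kappa_g)
    if n_g > N:
        n_g = N
        n_o = (B - N * kappa_g) / kappa_o
    return n_o, n_g

# Illustrative call with assumed dispersion values
print(optimal_allocation(B=20000, kappa_o=1200, kappa_g=1600,
                         S2_alpha=4.0, S2_beta=1.0, N=22))
```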

2.3.5 Stratification

Aerial-access creel surveys, such as the Kootenay Lake survey, may also be obtained through a stratified two-phase design. Stratification occurs at the population level, that is, the study period is divided into strata and two-phase srs/srs samples are selected independently in each stratum. When such a design is used, it is usually desirable to estimate the total of variables of interest over larger periods of time such as seasons or years. In this section, we modify our notation by including stratum indicator indices. The stratum populations are denoted by $U_1, \ldots, U_H$, and $U$ now represents the overall population $U = \bigcup_{h=1}^{H} U_h$. As well, the stratum first-phase and second-phase samples are denoted respectively by $s_{g1}, \ldots, s_{gH}$ and $s_{o1}, \ldots, s_{oH}$ and are of size $n_{g1}, \ldots, n_{gH}$ and $n_{o1}, \ldots, n_{oH}$. We are interested in estimating the total $C^* = \sum_{h=1}^{H} C_h^*$, where $C_h^*$ is the variable of interest total in stratum $h$. In the case of the double expansion estimator, one can obtain a stratified estimator and variance estimator by simply summing the stratum estimators and variance estimators. For the two-phase nonlinear estimators (2.1), (2.2) and (2.4) there are typically two ways to combine the information across strata, that is: separate ratio estimators and combined ratio estimators (see Lohr (2009), p.144).

The separate estimator, which we denote $\hat{C}_s$, is obtained by summing the estimators computed within each stratum: $\hat{C}_s = \sum_{h=1}^{H} \hat{C}_h$, where $\hat{C}_h$ represents one of the estimators (2.1), (2.2) or (2.4) computed within stratum $h$. The variance of the separate estimator is the sum of the stratum variances, and therefore a variance estimator is simply obtained by summing the variance estimates within each stratum.

For the combined estimators, we present only the case of estimator $\hat{C}_R$ for sake of simplicity but the other combined estimators can be obtained analogously. The combined ratio estimator is given by

$$\hat{C}_{Rc} = \left(\sum_{h=1}^{H} \frac{N_h}{n_{oh}} \sum_{i \in s_{oh}} y_{hi}\right) \frac{\sum_{h=1}^{H} \frac{N_h}{n_{gh}} \sum_{j \in s_{gh}} C_{hj}}{\sum_{h=1}^{H} \frac{N_h}{n_{oh}} \sum_{j \in s_{oh}} C_{hj}}$$

and the variance estimator is

$$\widehat{\mathrm{Var}}_p(\hat{C}_{Rc}) = \sum_{h=1}^{H} \left[N_h^2\left(1 - \frac{n_{gh}}{N_h}\right)\frac{s_{\alpha h}^2}{n_{gh}} + N_h^2\left(1 - \frac{n_{oh}}{n_{gh}}\right)\frac{s_{\hat{\beta} h}^2}{n_{oh}}\right],$$

where $\alpha_{hi} = y_{hi}$ and $\hat{\beta}_{hi} = y_{hi} - \dfrac{\sum_{l=1}^{H} \frac{N_l}{n_{ol}} \sum_{j \in s_{ol}} y_{lj}}{\sum_{l=1}^{H} \frac{N_l}{n_{ol}} \sum_{j \in s_{ol}} C_{lj}}\, C_{hi}$.

A known fact about the separate ratio estimator is that it sums up the separate biases while the standard error generally decreases relative to the total of interest (Lohr (2009), p. 145). As a result, the bias-to-SE ratio increases. If the separate biases are negligible, the use of the separate ratio estimators is appropriate, otherwise the combined estimators are preferable.
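As an illustration only, the separate and combined versions of the ratio estimator could be computed as below, with each stratum's day-level summaries held in a dictionary; the data layout and toy numbers are assumptions for the sketch, not the Kootenay Lake data.

```python
def stratum_y(st):
    return [c * ao / ag for c, ao, ag in zip(st["C_o"], st["A_o"], st["A_g"])]

def separate_CR(strata):
    # sum of the within-stratum ratio estimators (2.4)
    total = 0.0
    for st in strata:
        n_g, n_o = len(st["C_g"]), len(st["C_o"])
        y = stratum_y(st)
        total += (st["N"] / n_o) * sum(y) * (sum(st["C_g"]) / n_g) / (sum(st["C_o"]) / n_o)
    return total

def combined_CR(strata):
    # single ratio built from expanded stratum totals (combined estimator)
    num_y = sum(st["N"] / len(st["C_o"]) * sum(stratum_y(st)) for st in strata)
    num_C = sum(st["N"] / len(st["C_g"]) * sum(st["C_g"]) for st in strata)
    den_C = sum(st["N"] / len(st["C_o"]) * sum(st["C_o"]) for st in strata)
    return num_y * num_C / den_C

strata = [
    dict(N=22, C_g=[10, 6, 8, 12], C_o=[6, 12], A_o=[15, 20], A_g=[5, 8]),
    dict(N=9,  C_g=[4, 7],         C_o=[7],     A_o=[9],      A_g=[4]),
]
print(separate_CR(strata), combined_CR(strata))
```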

2.4 Simulation Study

The first part of the simulation study was designed to study the bias due to the partial interview of parties under different scenarios. We considered one scenario (A) where we use the same model to generate the fishing parties on every day and another scenario (B) where there are three different types of days (e.g. based on weather conditions) and a different generating model for the parties for each day type. Hence scenario (A) makes a strong assumption that $p_i$ is the same for all days $i$ in $U$ and in that case we expect $\mathrm{Err}_{party}(\hat{C})/C^*$ to be asymptotically equal to zero for all four estimators based on the results in Section 2.3.2. Scenario (B) allows $p_i$ to vary across days and in this case it was asserted in Section 2.3.2 that $\mathrm{Err}_{party}(\hat{C})/C^*$ is asymptotically equal to zero for the estimators $\hat{C}_{DE}$ and $\hat{C}_R$. For each scenario, and for increasing mean number of boats (fishing parties) per day, $\mu_b$ = 50, 100, 250 and 500, we generated 100 populations of size N = 22 days (corresponding roughly to weekdays in a month). For every day $i$ in $U = \{1, \ldots, 22\}$, we proceeded in the following way:

• In the case of scenario (B), generate the day type: type (i) with probability 0.3, type (ii) with probability 0.4 or type (iii) with probability 0.3.

• Generate the number of fishing parties : Mi ∼ Poisson(µb)

Parameter                                             Scenario (A)   Scenario (B)
Mean number of fishermen per party, νi                2              2
Mean catch per fisherman, λi                          0.3            0.8 (i)  0.5 (ii)  0.2 (iii)
Probability of fishing at time of overflight, φi      0.5            0.6 (i)  0.7 (ii)  0.8 (iii)
Probability of returning to an access-point, pi       0.66           0.5 (i)  0.7 (ii)  0.8 (iii)

Table 2.2: Parameter values used to generate the data for the simulation study.

• Generate Mi fishing parties with catch, activity indicator at time of overflight and access-point landing indicator using the following distributions:

cij ∼ (Poisson(νi − 1) + 1) × Poisson(λi);

δij ∼ Bernoulli(φi);

Iij ∼ Bernoulli(pi).

The parameters used for data generation are listed in Table 2.2.
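The generation steps listed above are easy to reproduce; the sketch below does so for scenario (A) with the Table 2.2 parameter values, using numpy. The way the population is stored (one tuple of arrays per day) is an illustrative choice and not the authors' actual simulation code.

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_day(mu_b, nu=2, lam=0.3, phi=0.5, p=0.66):
    """Generate one day under scenario (A) with Table 2.2 parameters."""
    M = rng.poisson(mu_b)                                 # number of fishing parties M_i
    c = (rng.poisson(nu - 1, M) + 1) * rng.poisson(lam, M)  # catches c_ij
    delta = rng.binomial(1, phi, M)                       # fishing at overflight time
    I = rng.binomial(1, p, M)                             # returns to a surveyed access point
    return c, delta, I

population = [generate_day(mu_b=130) for _ in range(22)]  # N = 22 days
C_star = sum(c.sum() for c, delta, I in population)       # true total C*
print(C_star)
```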

Then, for each population that was generated, we computed the relative bias due to the partial interview of parties for estimators (2.1) to (2.4) as $\mathrm{RB}_{party}(\hat{C}) = \mathrm{Err}_{party}(\hat{C})/C^*$. Note that the probability distributions used to generate the data were chosen arbitrarily but an accurate match between model and reality is not required here in order to study the design-based properties of our estimators. Figure 2.1 shows the simulation results. In the case of scenario (A), the four estimators have similar behaviors, i.e. the relative biases are distributed closer around zero as the mean number of fishing parties per day gets larger.

In the case of scenario (B), the relative biases of $\hat{C}_{DE}$ and $\hat{C}_R$ are centered around zero while the other estimators exhibit a systematic bias that does not diminish as the number of fishing parties increases.

Figure 2.1: Boxplots of relative bias due to the partial interview of parties for 100 population replicates with varying mean number of boats per day. Left column: scenario (A); right column: scenario (B). The first to fourth rows relate, in order, to the estimators Ĉ1, Ĉ2, ĈDE and ĈR.

The second part of the simulation study was designed to compare the estimators in terms of accuracy and confidence interval coverage. In order to do so, we generated, for each of the two scenarios, a single population with µb = 130 using the algorithm already described. For both scenarios we generated, from the population, K = 50,000 two-phase srs/srs samples of size ng and no. We varied the first-phase sampling fraction by investigating the cases ng = 4, no = 2 and ng = 10, no = 5. For each replicated sample, we computed the estimators of total catch (2.1) to (2.4). Note that given the small sample and population sizes, we could have generated all the possible samples rather than simulating a large number of samples but our code would not have been as generally usable for larger population and/or sample sizes. With K = 50,000 the two approaches give similar results. We summarize the results using the following Monte Carlo measures: the relative bias due to the sampling of days (RBdaysMC), the relative root mean squared error (RRMSEMC), the coverage probability of a 95% confidence interval (CPMC) and the bias ratio (BRMC). The formulas used for the calculations are given explicitly in Appendix A.4.

The simulation results are displayed in Table 2.3. For both scenarios (A) and (B), the

RRMSEMC of the double expansion estimator is larger than that of Cˆ1, Cˆ2 and CˆR. In scenario (A), the coverage probability is close to 95% for all estimators. The bias ratio is also relatively low (Cochran, 1977, p.14) which explains why the confidence intervals have proper coverage. The coverages are consistently slightly over 95% which we think is a consequence of the t distribution being an approximation for the distribution of Cˆ. In the case of scenario (B), when the first-phase sampling fraction is smaller, the bias ratios remain fairly low and all estimators have coverage probability close to 95%. However, when the

first-phase sampling fraction is larger, the biases of Cˆ1 and Cˆ2 become important relative to the standard error and affect the coverage probability negatively. Hence, the estimator CˆR is preferable in this study because it has the smallest RRMSEMC along with 95% coverage probability for the confidence interval.
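The Monte Carlo summaries named above can be computed from the replicate estimates along the following conventional lines; the exact definitions used in Appendix A.4 are not reproduced here, so this is an assumed, standard implementation rather than the thesis code.

```python
import numpy as np

def mc_measures(estimates, lowers, uppers, true_total):
    """Conventional Monte Carlo summaries over K replicate samples."""
    est = np.asarray(estimates, dtype=float)
    low = np.asarray(lowers, dtype=float)
    upp = np.asarray(uppers, dtype=float)
    bias = est.mean() - true_total
    se = est.std(ddof=1)
    return {
        "RB": bias / true_total,                                         # relative bias
        "RRMSE": np.sqrt(((est - true_total) ** 2).mean()) / true_total, # relative root MSE
        "CP": np.mean((low <= true_total) & (true_total <= upp)),        # CI coverage
        "BR": bias / se,                                                 # bias ratio
    }
```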

2.5 Application

An aerial-access creel survey was conducted on Kootenay Lake, British Columbia, from December 2010 through November 2011. The study period was stratified by month and by day status: weekday or weekend. Statutory holidays were also defined as weekends. In each stratum, a simple random sample of days was selected to conduct ground surveys. Within each of these samples, a simple random sample of days was selected to conduct overflights. The allocation of sample sizes for the study is displayed in Table 2.4. The number of samples per month was adjusted seasonally to increase the intensity during months when fishing effort was expected to be higher (based on previous data). Unsafe weather conditions also resulted in cancellation of some flights but we assume here that the simple random sample assumption is valid.

Scenario  ng  no  Ĉ     RBparty  RBdaysMC  BRMC  RRMSEMC  CPMC
(A)        4   2  Ĉ1        2        0       14      13     96
                  Ĉ2        2        0       17      13     96
                  ĈDE       2        0        9      18     97
                  ĈR        2        0       15      13     96
(A)       10   5  Ĉ1        2        0       23       7     96
                  Ĉ2        2        0       32       7     96
                  ĈDE       2        0       17      11     97
                  ĈR        2        0       27       7     96
(B)        4   2  Ĉ1      -12        3      -36      25     95
                  Ĉ2       -7        1      -23      25     96
                  ĈDE      -2        0       -5      35     98
                  ĈR       -2       -1      -12      26     97
(B)       10   5  Ĉ1      -12        1      -89      16     83
                  Ĉ2       -7        0      -47      14     91
                  ĈDE      -2        0       -8      21     97
                  ĈR       -2        0      -15      14     95

Table 2.3: Monte Carlo measures for the simulation with µb = 130. Numbers are expressed in %.

The survey also recorded data on shore anglers but we focus on boat anglers only. There were fifteen derby days during the study period. During those days, a fishing derby was organized on Kootenay Lake, with entry fees and substantial prize money ($100s or $1000) for the largest fish. Derbies are organized mostly by local businesses (or a community group). For sake of simplicity we chose to exclude derby days for this analysis. Estimates of total catch on these days could have been obtained separately and then added to our total estimates. Because the sampling of derby days is independent from the sampling on other days, bias and variances add up. Hence a variance estimate for the total over derby and non derby days altogether could be obtained by summing the variance estimates obtained for derby and non derby days respectively.

The ground portion of the survey was located at the following access points: Balfour, Boswell, Kuskanook, Kaslo, Riondel, Crawford Bay and Woodbury; see map in Figure 2.2. During the sampled days, angling parties returning to those access points were interviewed to determine the number of fish kept and released from each species, the start and the end time of the angling trip, and other variables.

The aerial survey was conducted around noon, which is the peak daily activity. The number of boats showing fishing activity was counted once as the airplane flew out and again on the return flight. We compute the quantity $A_o$ as the average of the inbound and outbound counts. We compute $A_g$ as the average of the number of parties fishing at the inbound overflight midtime and the outbound overflight midtime.

Figure 2.2: Kootenay Lake and the creel survey access points. Riondel/Crawford Bay and Boswell/Kuskanook ramps were combined for field monitoring and data analysis. Map provided by A. Waterhouse, Ministry of Forests, Lands, and Natural Resource Operations.

Period     Weekdays          Weekends
(yyyy-mm)  N   ng  no        N   ng  no
2010-12    22  2   1         9   2   1
2011-01    21  1   1         10  2   1
2011-02    20  1   0         8   2   0
2011-03    23  1   1         8   2   1
2011-04    21  2   1         6   2   1
2011-05    22  3   2         6   4   3
2011-06    22  3   2         8   4   2
2011-07    21  3   2         10  4   2
2011-08    23  3   2         8   3   2
2011-09    22  3   2         8   2   2
2011-10    20  3   1         5   2   0
2011-11    22  3   1         5   2   2

Table 2.4: Allocation of sample size in the 2010-2011 Kootenay Lake Creel Survey.

For example, if the overflight takes place from 12 pm to 1 pm on the way out and 1 pm to 2 pm on the way in, then $A_g$ is the average of the number of interviewed parties fishing at 12:30 and 1:30. This can introduce a bias in the estimates that we assume to be negligible. Finding the most suitable way of computing the quantities $A_o$ and $A_g$ for aerial surveys that span considerable time remains an open question. This is a significant consideration for this lake, which can take between 45 minutes and 1.5 hours to fly in one direction.

In this work, we present the results for the variable number of rainbow trout kept. Plots of the data in Appendix A.5 give insight on the proportion of parties interviewed, the total catch and the number of fishing parties per day, respectively. Variance estimates cannot be obtained in strata that have zero or one aerial survey. For this reason, we present stratum estimates for the months of May to September only; see

Figure 2.3 (those total estimates do not include derby days). The estimators Cˆ1, Cˆ2 and

CˆR produce similar results in each stratum. This may not be the case in other scenarios or studies. Furthermore, the confidence intervals in some strata are very wide, whereas they are significantly shorter in other strata. This is explained by the small sample sizes in some strata producing quite variable estimates. Rather surprisingly, the estimator CˆDE has a much smaller confidence interval in some strata, especially in August. An inspection of the data reveals that, in those cases, the values of yi for the no = 2 sampled overflight days turn out to be very close, thus leading to a small variance estimate for CˆDE.

18 Figure 2.3: Monthly estimates of total number of rainbow trout kept along with approximate 95% confidence intervals. The top and bottom graphs represent weekends and weekdays respectively. The estimators (2.1) to (2.4) are represented respectively by the following symbols: triangle, circle, x mark and square.

19 The optimal allocation was computed for the months of May to September, separately for weekends and weekdays. The cost of an overflight on Kootenay Lake is approximately $1,200 whereas the daily cost of an access-point survey is approximately $1,600. From the results in Table 2.5, we cannot conclude generally that conducting fewer overflight than ground surveys is the best strategy. In particular the results suggest that estimation for June to September weekdays would be more efficient with an equal number of overflight and ground survey days. Note that these results apply only to the variable “number of rainbow trout kept" so optimal allocation should also be investigated for other key variables of the study before a decision on sample size allocation is made.

            Weekdays                Weekends
            Ĉ1     Ĉ2     ĈR        Ĉ1     Ĉ2     ĈR
May         0.14   0.14   0.08      0.60   0.65   0.69
June        1.00   1.00   1.00      0.87   0.87   0.87
July        1.00   1.00   1.00      0.69   0.69   0.66
August      1.00   1.00   1.00      1.00   1.00   1.00
September   1.00   1.00   1.00      0.68   0.67   0.60

Table 2.5: Optimal values of no/ng for each month and day type combination for the number of rainbow trout kept. Note that we do not present results for the double expansion estimator because in that case the optimal allocation is no = ng.

In order to illustrate the methods described in Section 2.3.5, we produced estimates at the seasonal level (see Table 2.6). The year was divided into three seasons: winter (December to March), shoulder (April, May, October, November) and summer (June to September). We chose to compute combined estimates rather than separate estimates in order to prevent the bias from becoming important relative to the standard error. However, when no is equal to zero or one in some stratum, the combined variance estimators cannot be computed. But because stratification is expected to enhance the efficiency of estimators (provided that the strata are sufficiently homogeneous, which should be satisfied here), one can pool some strata and pretend the data were obtained from a two-phase srs/srs sample (without stratification) for computing the variance estimate. This variance estimate is expected to overestimate the variance and provide confidence intervals with coverage probability greater than 1 − α. For the winter analysis, we pooled all weekdays together and all weekends together. For the shoulder analysis, we pooled April and May weekdays, October and November weekdays, and similarly for weekends. No pooling was necessary for summer. We also computed a total estimate over the whole survey period by summing the seasonal estimates and their variance estimates (separate estimator strategy). We observe that estimates are highest in the summer season and lowest during winter. The confidence intervals are also narrower (relative to the estimate values) than those associated with monthly estimation in Figure 2.3.

       Shoulder Season          Summer Season           Winter Season          Total
       Apr, May, Oct, Nov       June to Sept            Dec to Mar             Dec to Nov
       Est    Low    Upp        Est    Low    Upp       Est    Low    Upp      Est    Low    Upp
Ĉ_1    1672   1265   2079       3341   2872   3809      874    576    1172     5887   4767   7006
Ĉ_2    1639   1231   2048       3526   2895   4156      887    612    1162     6052   4796   7308
Ĉ_DE   2274   1840   2708       4027   3357   4698      1312   865    1758     7613   6136   9090
Ĉ_R    1715   1368   2062       3616   2992   4239      873    565    1181     6203   4983   7424

Table 2.6: Seasonal combined estimates (Est) of total number of rainbow trout kept along with approximate 95% confidence intervals (Low,Upp). The last column is computed as a separate total estimate over the three seasons.
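The "separate estimator" arithmetic behind the last column of Table 2.6 amounts to summing seasonal estimates and their variance estimates. The sketch below illustrates this with made-up seasonal values; the 1.96 normal multiplier is an assumption and may not match the reference distribution used for the intervals reported in Table 2.6.

```python
import math

# Hypothetical seasonal estimates of a total and their variance estimates;
# illustrative numbers only, not the Kootenay Lake values.
seasonal_estimates = {"shoulder": 1700.0, "summer": 3600.0, "winter": 900.0}
seasonal_variances = {"shoulder": 3.1e4, "summer": 1.0e5, "winter": 2.4e4}

# Separate estimator strategy: sum the seasonal estimates and their variances.
total_estimate = sum(seasonal_estimates.values())
total_variance = sum(seasonal_variances.values())

# Approximate 95% interval; the normal multiplier is an assumption here.
half_width = 1.96 * math.sqrt(total_variance)
print(total_estimate, total_estimate - half_width, total_estimate + half_width)
```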

2.6 Discussion

In this chapter, we have provided estimation strategies to be used for aerial-access creel surveys with overflights occurring only on a subset of access survey days. The estimators Ĉ_R and Ĉ_DE were shown to be the most suitable in terms of bias. However, the bias may be substantial if one cannot assume that parties fishing on a given day are generated from a model which gives each party the same probability of being interviewed. Simulation results have shown that, when the first-phase sampling fraction is small, the bias of Ĉ_1 and Ĉ_2 can be negligible relative to the standard error and thus does not affect the coverage of confidence intervals. We have applied our methods to the 2010-2011 Kootenay Lake creel survey for one variable of interest of the survey: the number of rainbow trout kept. Although conducting fewer overflights than ground surveys is thought to be more economical by the fisheries managers, our optimal allocation results suggest that this might not be true for a number of month/day type combinations for the estimation of totals, as many allocations lie on the boundary n_o = n_g. However, the biological data obtained from the ground surveys are quite valuable for fisheries scientists. For example, changes in fish size and age composition are often used to evaluate population responses to management decisions such as changed daily catch limits. These variables do not require aerial surveys (the purpose of aerial surveys is to be able to estimate total effort and catch) but may warrant more ground survey effort to adequately describe their trends. A decision about optimal allocation for future years thus needs to balance the relative importance of the different quantities of interest of the survey.

It also remains to determine the best way to compute the quantities A_g when the overflight is not quite instantaneous, as in the Kootenay Lake survey where the average flight time one way is one hour. Another topic of interest is to investigate the appropriateness of the assumption that all scheduled overflights are conducted or that those missed occurred at random. Missed overflights are often due to weather conditions, which can be related to variables of interest such as catch and fishing effort. Ignoring the nonresponse in this case could lead to biased estimates.

Finally, the methods presented in this chapter can be applied in contexts other than fisheries; for instance, to estimate the attendance at a multi-day street festival. In this case, the access survey can consist of posting interviewers at some access locations to collect arrival and departure times. The aerial survey can be replaced by a ground count of people at the peak attendance time of the day. Large areas can be covered by partitioning the total area into smaller sections and assigning a surveyor to each of them.

Chapter 3

Explicit integrated population modeling: escaping the conventional assumption of independence

3.1 Introduction

Monitoring changes in population size and structure (age, sex) provides valuable insight for effective management of animal populations. A common way to gain insight into population dynamics is to capture and mark cohorts of individuals with a unique identifier, followed by recaptures and/or resightings and/or dead recoveries of the marked animals. Capture-recapture, mark-resight, mark-recovery or mark-recapture-recovery surveys may be supplemented by other types of surveys on the same population, such as periodic counts of individuals (all, adults, females, unmarked, etc.) or nests. Those counts are typically subject to observational error. When multiple surveys are used to study a single population, the data can be analyzed separately by survey. However, a joint analysis of the data via integrated population modeling is often preferred because it can provide more precise estimates and/or permit the estimation of parameters that cannot be estimated using separate analyses. For instance, capture-recapture data and population count data alone do not permit the estimation of a fecundity rate, but an integrated population model that combines both datasets does. For a recent review of publications where integrated population modeling has been used with bird and mammal populations, see Schaub and Abadi (2011). Currently, integrated population models are typically formulated by multiplying the likelihoods of the various datasets. In some circumstances this approach, while approximate, is indeed very good: for example, if the different surveys are conducted on sub-populations

which do not share many individuals in common (nearly independent datasets) but have common demographic parameters. Simulation studies have been conducted to compare the estimates obtained when multiplying the likelihoods for both dependent and independent datasets; see Besbeas, Borysiewicz and Morgan (2008) and Abadi et al. (2010). However, an important approach has not been compared in these empirical studies, namely the analysis of dependent datasets using the true joint likelihood. We pursue this idea in this chapter. In parallel with our work, there has recently been a growing interest in integrated population modeling methods that do not rely on an independence assumption; see, e.g., Chandler and Clark (2014) for a solution based on data augmentation. To simplify the presentation of our methodology, we focus, for most of this chapter, on the case of a population studied using two surveys: a capture-recapture survey and a population count survey. In Section 3.2, we give some background and notation. In Section 3.3, we develop the model based on the true joint likelihood and we further explain how our model can be modified to accommodate a variety of situations (not only capture-recapture and population count data). In Section 3.4, we present the results of a simulation study. Finally, in Section 3.5, we apply our methodology to data from a colony of Greater horseshoe bats (Rhinolophus ferrumequinum) in Switzerland.

3.2 Background and notation

3.2.1 Capture-recapture survey

The data collection process of capture-recapture entails sending a survey crew into the field on a series of capture occasions. When an animal is captured for the first time, it is marked with a unique tag and released into its environment so that it can be identified if recaptured at a later capture occasion. When marked individuals are recaptured, their identity is recorded. Suppose that there are K capture occasions. The capture-recapture data can be summarized into an m-array¹ with K − 1 lines and K columns:

\[
\begin{pmatrix}
M_{12} & M_{13} & \cdots & M_{1K} & Z_1 \\
       & M_{23} & \cdots & M_{2K} & Z_2 \\
       &        & \ddots & \vdots & \vdots \\
       &        &        & M_{K-1,K} & Z_{K-1}
\end{pmatrix}
\]

The first K − 1 columns of the m-array form an upper-triangular array that we denote by M with the lines indexed by i = {1,...,K − 1} and the columns indexed by j =

¹The term m-array is commonly used in capture-recapture studies to summarize the capture-recapture data from individual capture histories.

{i + 1,...,K}. The elements Mij represent the number of individuals released on occasion i (after either being captured for the first time or recaptured) that are alive and recaptured for the next time on occasion j. The last column of the m-array is denoted by Z and is indexed by {1,...,K − 1}, with Zi representing the number of individuals released at occasion i that are never recaptured. The capture-recapture data can be modeled using a Cormack-Jolly-Seber model, which conditions on the number of animals released at each occasion. The lines of the m-array are modeled using independent multinomial distributions conditional on the number of releases:

\[
(M_{i,i+1}, \ldots, M_{i,K}, Z_i \mid R_i = r_i) \overset{\text{indep}}{\sim} \text{Multinomial}(r_i, \mathbf{q}_i), \quad \text{for } i = 1,\ldots,K-1, \tag{3.1}
\]

where $R_i = \sum_{l=i+1}^{K} M_{il} + Z_i$ is the number of individuals released at time i and

\[
\mathbf{q}_i = \left( q_{i,i+1}, \ldots, q_{i,K}, \; 1 - \sum_{l=1}^{K-i} q_{i,i+l} \right)^{\top}
\]
is a vector of size K − i + 1, where $q_{ij}$ represents the probability that a marked individual survives² from occasion i to occasion j and is not recaptured until occasion j. Let $\boldsymbol{\phi} = (\phi_1, \ldots, \phi_{K-1})^{\top}$ with $\phi_j$ representing the individual's probability of apparent survival from occasion j to j + 1, and $\mathbf{p} = (p_2, \ldots, p_K)^{\top}$ with $p_j$ representing the individual's probability of recapture at occasion j. Then the $q_{ij}$'s can be expressed in terms of $\boldsymbol{\phi}$ and $\mathbf{p}$. For example, $q_{46} = \phi_4(1 - p_5)\phi_5 p_6$ is the probability that a marked individual survives from occasion 4 to occasion 6 and is not recaptured until occasion 6. The Cormack-Jolly-Seber model relies on a number of assumptions:

i. Survival is independent between individuals and does not depend on individual characteristics (sex, age, etc.)

ii. Capture is independent between individuals and does not depend on individual characteristics (sex, age, etc.)

iii. No temporary emigration (permanent emigration is confounded with death)

iv. No tag loss, no recording errors and marking does not affect the future behavior of an individual

v. Capture occasions are instantaneous

For inference, a conditional likelihood, L(φ, p|M,Z), is formed simply as the product of the K − 1 multinomial densities in (3.1).
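As a concrete illustration of how the multinomial cell probabilities in (3.1) are built from φ and p, the small sketch below computes the q_ij's and checks the q_46 example from the text; the numerical values of φ and p are arbitrary.

```python
import numpy as np

def cell_prob(phi, p, i, j):
    """q_ij: probability that an animal released at occasion i survives to
    occasion j and is recaptured for the first time at j (occasions 1-indexed).
    phi[t] stores phi_{t+1} (survival from occasion t+1 to t+2);
    p[t] stores p_{t+2} (recapture probability at occasion t+2)."""
    prob = 1.0
    for t in range(i, j):                 # occasions i, i+1, ..., j-1
        prob *= phi[t - 1]                # survive t -> t+1
        if t < j - 1:
            prob *= 1.0 - p[t - 1]        # not recaptured at t+1 (< j)
    return prob * p[j - 2]                # recaptured at occasion j

K = 6
phi = np.array([0.8, 0.7, 0.75, 0.6, 0.65])   # phi_1,...,phi_{K-1}, arbitrary
p = np.array([0.3, 0.4, 0.35, 0.5, 0.45])     # p_2,...,p_K, arbitrary

# q_46 = phi_4 (1 - p_5) phi_5 p_6, as in the text
q46 = cell_prob(phi, p, 4, 6)
assert np.isclose(q46, phi[3] * (1 - p[3]) * phi[4] * p[4])

# Row i of the m-array: (q_{i,i+1},...,q_{i,K}, probability of never being seen again)
i = 1
row = [cell_prob(phi, p, i, j) for j in range(i + 1, K + 1)]
q_i = np.array(row + [1.0 - sum(row)])
print(q_i, q_i.sum())
```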

²Apparent survival is used because permanent emigration from the study area is indistinguishable from death. Unless explicitly stated, survival in this chapter is always apparent survival.

3.2.2 Population count survey

A count survey of the population provides information on the relative changes in population size over time. Let us suppose that a population is studied using K population counts equally spaced in time. Let the count data be denoted by a vector Y, of size K, with

Yi being the number of individuals counted on occasion i. Note that the counts Yi are typically imperfect counts because they are subject to observational error. The count data is typically modeled using a state-space model (Buckland et al., 2004). State-space models involve a state process and an observation process. In this case, the state process is the latent process that governs the changes in population size between counts. Let $\mathbf{N} = (N_1, \ldots, N_K)^{\top}$, where Nj is the population size at the time of the jth population count.

The state process is defined by specifying a distribution for Nj conditional on Nj−1. We assume that the birth process is instantaneous and that births occur right after population counts. A simple model for Nj, j = 2,...,K, that accounts for births and survival could be

\[
N_j \mid N_{j-1}, B_{j-1} \sim \text{Binomial}(N_{j-1} + B_{j-1}, \, \phi_{j-1}), \quad \text{for } j = 2,\ldots,K, \tag{3.2}
\]
with the number of births right after the jth count defined as
\[
B_j \mid N_j \sim \text{Poisson}(N_j f_j / 2), \quad \text{for } j = 1,\ldots,K-1. \tag{3.3}
\]

The division by 2 is a way to estimate, assuming a 50/50 sex-ratio, a fecundity per female. For sake of simplicity, this model assumes that juvenile and adult survival probabilities are the same although this is often not true in real populations. This state-space model relies on a number of assumptions:

i. Females start reproducing at the age of one year old

ii. The expected sex ratio of newborns is 50%

iii. Survival is independent between individuals and does not depend on individual characteristics (sex, age, etc.)

iv. No immigration and no temporary emigration (permanent emigration is confounded with death).

In addition to the state process, an observation process describes the population count data, Y, conditional on N. In practice, a normal distribution is often used as an approximation:
\[
Y_j \mid N_j \overset{\text{indep}}{\sim} \text{Normal}(N_j, \sigma^2), \quad \text{for } j = 1,\ldots,K.
\]

Note that if larger counts are thought to be less precise than smaller ones, a log-normal distribution can be used instead:

\[
\log(Y_j) \mid N_j \overset{\text{indep}}{\sim} \text{Normal}(\log(N_j), \sigma^2), \quad \text{for } j = 1,\ldots,K.
\]

The likelihood of the count survey data is thus

\[
L(\boldsymbol{\phi}, \mathbf{f}, \sigma^2, N_1 \mid \mathbf{Y}) = \sum_{(\mathbf{N}^*, \mathbf{B}) \in \Omega} P(Y_1 \mid N_1) \prod_{j=2}^{K} \left[ P(Y_j \mid N_j)\, P(N_j \mid B_{j-1}, N_{j-1})\, P(B_{j-1} \mid N_{j-1}) \right],
\]

where $\mathbf{N}^* = (N_2, \ldots, N_K)^{\top}$ and Ω is the set of all possible values for $(\mathbf{N}^*, \mathbf{B})$.
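The state-space model (3.2)-(3.3) with the normal observation equation is easy to simulate forward, which also clarifies what the latent quantities N and B represent. The sketch below uses arbitrary parameter values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_counts(N1, phi, f, sigma, K):
    """Forward simulation of the state-space count model.
    phi[j] and f[j] are survival and fecundity between counts j+1 and j+2
    (0-indexed storage); Y is observed with Normal(N_j, sigma^2) error."""
    N = np.zeros(K, dtype=int)
    B = np.zeros(K, dtype=int)
    N[0] = N1
    for j in range(K - 1):
        B[j] = rng.poisson(N[j] * f[j] / 2.0)          # births right after count j+1, eq. (3.3)
        N[j + 1] = rng.binomial(N[j] + B[j], phi[j])   # survival to the next count, eq. (3.2)
    Y = rng.normal(N, sigma)                           # imperfect counts
    return N, B, Y

K = 5
N, B, Y = simulate_counts(N1=500, phi=[0.5] * (K - 1), f=[2.0] * (K - 1),
                          sigma=30.0, K=K)
print(N, np.round(Y))
```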

3.2.3 Integrated population modeling via likelihood multiplication

In the ecological literature, integrated population models have typically been obtained by multiplying the likelihoods of the separate datasets. In the case of a population studied with both a capture-recapture and a count survey, the following pseudo-likelihood is constructed by multiplying the capture-recapture likelihood and the population count likelihood:

\[
L^c(\boldsymbol{\phi}, \mathbf{f}, \mathbf{p}, N_1 \mid \mathbf{Y}, \mathbf{M}, \mathbf{Z}) = L(\boldsymbol{\phi}, \mathbf{p} \mid \mathbf{M}, \mathbf{Z})\, L(\boldsymbol{\phi}, \mathbf{f}, N_1, \sigma^2 \mid \mathbf{Y}). \tag{3.4}
\]

Note that the capture-recapture likelihood and the population count survey likelihood have a parameter φ in common, which represents both the survival between capture occasions and the survival between counts. Therefore, this approach assumes that the jth capture occasion and the jth count occur at about the same time for all j. The capture-recapture data and the count data are not independent when both surveys are conducted on a single population (or overlapping populations). In the literature so far, the term independence assumption has been used when describing the likelihood (3.4). The use of this likelihood is attractive in practice because of its simplicity and because it uses a reduced number of parameters. This likelihood multiplication approach is reminiscent of the naive Bayes approach (Koller and Friedman, 2009). Equation (3.4) is not the true joint likelihood but rather a composite likelihood (Varin et al., 2011). Hence, it provides unbiased estimating equations. However, pretending that it is the true joint likelihood for inference leads to incorrect variance estimates and hence confidence intervals that do not have the targeted confidence level. Surprisingly, this characteristic of the composite likelihood seems to have been overlooked in the integrated population modeling literature so far. In particular, the simulation studies of Besbeas, Borysiewicz and Morgan (2008) and Abadi et al. (2010) investigate the frequentist properties of the estimators but do not investigate the properties of the variance estimators and confidence/credible intervals. This will be addressed in Section 3.4.

3.3 Integrated population modeling based on the true joint likelihood

3.3.1 Capture-recapture and count data

Suppose, as in Section 3.2, that we have capture-recapture data (M, Z) and count data Y and that assumptions i.-v. and i.-iv. from Sections 3.2.1 and 3.2.2, respectively, are met. In order to formulate an explicit model, we have to take into account the order in which the surveys and the demographic gains and losses occur in the population. For sake of illustration, we assume that the events follow the timeline represented in Figure 3.1.

Figure 3.1: Timeline of events of the animal population study. The symbols “C”, “B” and “CR” stand for count survey, births and capture-recapture, respectively. Note that the time between the count survey, the births and the capture-recapture survey in each period is negligible.

The formulation of an explicit integrated population model based on the true joint likelihood can be achieved using a Bayesian model (Koller and Friedman, 2009). The key to formulating the true joint likelihood is to introduce a set of latent variables so that when combined with the capture-recapture data, one can deduce, at any point in time, the state of the population, that is

• the number of unmarked animals in the population

• the number of marked animals remaining (alive and not recaptured) in each released cohort.

A set of variables that is appropriate is

• N1, the population size at the beginning of the study

• D^u, a vector of length K − 1, where D^u_j represents the number of unmarked animals that died in period j

• D^m, an upper triangular array indexed by i = {1,...,K − 1} and j = {i,...,K − 1}, where D^m_ij represents the number of marked animals released for the last time in period i that died in period j

• B, a vector of length K, where B_j represents the number of births³ in period j.

In order to be able to follow the transition of individuals from an unmarked state to a marked state, it is convenient to reparametrize the capture-recapture data as (M,C) rather than (M,Z), where C is a vector of length K − 1 where Cj represents the number of individuals captured for the first time (i.e. marked) in period j. The relationship between

(M,C) and (M,Z) is one-to-one; they contain the same information. The quantity Cj can be computed from the capture-recapture data M and Z as the number of individuals released at period j minus the number of individuals recaptured at period j, that is:

\[
C_j = \left( Z_j + \sum_{l=j+1}^{K} M_{jl} \right) - \sum_{k=1}^{j-1} M_{kj} \quad \text{for } 2 \le j \le K - 1,
\]

with $C_1 = Z_1 + \sum_{l=2}^{K} M_{1l}$. To show that our parametrization {N_1, D^u, D^m, B, M, C} allows us to track the state of the population at any point in time, we constructed Table 3.1, which illustrates the case of K = 3 periods. Each line of the table shows the distribution of the population across states at a given time. Each column of the table follows the change in population size, over time, per state. We added a column to the right of the table to keep track of the total population, because this column will be useful for modeling the count data.
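The reparametrization from (M, Z) to (M, C) is pure bookkeeping. The sketch below computes the C_j's from a small, arbitrary m-array with K = 4 occasions, following the formula above.

```python
import numpy as np

def first_captures(M, Z):
    """Compute C_j (first captures, i.e. markings, at occasion j) from an m-array.
    M is (K-1) x K with M[i-1, j-1] = M_ij for j > i (zeros elsewhere);
    Z[i-1] = Z_i is the number released at i and never recaptured."""
    K = M.shape[1]
    C = np.zeros(K - 1, dtype=int)
    for j in range(1, K):                        # occasions 1,...,K-1
        released = Z[j - 1] + M[j - 1, :].sum()  # R_j = Z_j + sum_l M_jl
        recaptured = M[:, j - 1].sum()           # individuals recaptured at occasion j
        C[j - 1] = released - recaptured
    return C

# Small arbitrary example with K = 4 occasions
M = np.array([[0, 5, 3, 1],
              [0, 0, 6, 2],
              [0, 0, 0, 4]])
Z = np.array([10, 7, 5])
print(first_captures(M, Z))   # C_1, C_2, C_3
```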

³Births is the term generally used to represent any source of new animals to the study area. The new animals in general do not have to be juvenile animals.

[Table 3.1 spans two pages here. Its columns are: timeline of events; number of unmarked individuals; number of marked individuals last released during period 1, period 2 and period 3; and total number of individuals. Its rows follow the timeline of Figure 3.1 (count, births, captures and recaptures, deaths within each of the K = 3 periods), and each entry is a running total expressed in terms of N_1, B, C, M, D^u and D^m.]

Table 3.1: Changes in the population size per state over time for a study with K = 3 periods. The table follows the timeline in Figure 3.1. Starting in the upper left corner of the table, the population is comprised of N_1 unmarked individuals at the beginning of period 1. Then, the count survey occurs (which does not affect the state nor the size of the population). Then, B_1 births occur, resulting in N_1 + B_1 unmarked individuals in the population. Then, C_1 individuals are captured, marked and released, which leaves N_1 + B_1 − C_1 unmarked individuals in the population. Then, D^u_1 unmarked individuals die and D^m_11 marked individuals die. When period 2 begins, there are respectively N_1 + B_1 − C_1 − D^u_1 unmarked and C_1 − D^m_11 marked individuals in the population. The table continues in this way until the end of the study. Note: C & R is used to abbreviate "captures and recaptures".

Next, we exploit Table 3.1 to define conditional distributions for the data M, C and Y as well as for the latent random variables B, D^u and D^m (we do not model N_1 because this parameter is at the top of the hierarchy). For ease of notation, we do not specify the variables that the distributions are conditioned upon. Also, sums that go backwards are defined as zero and undefined variables (e.g. M_11) are defined as zero. The parts of the equations that are highlighted are derived from Table 3.1.

• $C_j \sim \text{Binomial}(N^u_j + B_j, \, \xi_j)$ for j = 1,...,K − 1, where $N^u_j = N_1 + \sum_{l=1}^{j-1} (B_l - C_l - D^u_l)$ is the number of unmarked individuals at the beginning of period j

• $M_{ij} \sim \text{Binomial}(R_i - \sum_{l=i}^{j-1} D^m_{il} - \sum_{l=i+1}^{j-1} M_{il}, \, p_j)$, for i = 1,...,K − 1 and j = i + 1,...,K, where $R_i = C_i + \sum_{k=1}^{i-1} M_{ki}$ is the number of released individuals in period i

• $Y_j \sim \text{Normal}(N_j, \, \sigma^2)$, for j = 1,...,K, where $N_j = N_1 + \sum_{l=1}^{j-1} \bigl(B_l - D^u_l - \sum_{k=1}^{l} D^m_{kl}\bigr)$ is the population size at the beginning of period j

• $B_j \sim \text{Poisson}(N_j f_j / 2)$, for j = 1,...,K

• $D^u_j \sim \text{Binomial}(N^u_j + B_j - C_j, \, 1 - \phi_j)$, for j = 1,...,K − 1

• $D^m_{ij} \sim \text{Binomial}(R_i - \sum_{l=i}^{j-1} D^m_{il} - \sum_{l=i+1}^{j} M_{il}, \, 1 - \phi_j)$, for i = 1,...,K − 1 and j = i,...,K − 1

The parameters φ, σ², p and f used throughout are defined as in Section 3.2. The Poisson and Normal distributions were chosen arbitrarily for the sake of illustration. In addition, we introduced the parameter $\boldsymbol{\xi} = (\xi_1, \ldots, \xi_{K-1})^{\top}$, where ξ_j represents the probability for an unmarked individual to be captured on occasion j. In some capture-recapture studies, we would set (ξ_2,...,ξ_{K−1}) = (p_2,...,p_{K−1}) when marked and unmarked individuals have the same capture probability at each occasion. The true joint likelihood is obtained as

\[
L(\boldsymbol{\phi}, \mathbf{f}, \mathbf{p}, N_1, \boldsymbol{\xi} \mid \mathbf{M}, \mathbf{C}, \mathbf{Y}) = \sum_{(\mathbf{B}, \mathbf{D}^m, \mathbf{D}^u) \in \Omega^*} \left[ \prod_{i=1}^{K} P(Y_i) P(B_i) \right] \prod_{i=1}^{K-1} \left[ P(D^u_i) P(C_i) \prod_{j=i+1}^{K} P(M_{ij}) \prod_{j=i}^{K-1} P(D^m_{ij}) \right], \tag{3.5}
\]
where the densities in (3.5) are given in the earlier bullet list and are actually conditional densities, but the conditioning variables have not been made explicit to simplify the notation. Also, Ω* is the set of all possible values for (B, D^m, D^u). The observed data likelihood (3.5) highlights that the datasets (M, C) and Y are not independent, but they are conditionally independent given the latent variables B, D^m and D^u.

It is computationally convenient to fit the model (3.5) using a Bayesian approach via MCMC techniques. Bayesian software packages with built-in MCMC algorithms, such as WinBUGS and JAGS, have become widespread in the statistical ecology community and are user-friendly. Thus, we adopt a Bayesian framework in Sections 3.4 and 3.5.
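The conditional distributions listed above define a generative process, and simulating from it (as we do in Section 3.4) is a useful way to check the bookkeeping. The sketch below is a minimal forward simulation of that process with arbitrary parameter values; it is not the MCMC fitting code, and some implementation choices (e.g. how release cohorts are stored) are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_joint(N1, phi, p, xi, f, sigma, K):
    """Forward simulation of the explicit integrated population model.
    Tracks the unmarked pool and, for each release cohort i, the number of
    marked animals still alive and not yet recaptured. Returns the m-array M,
    first captures C and counts Y (time-constant parameters for simplicity)."""
    M = np.zeros((K, K + 1), dtype=int)    # M[i, j] = recaptures at occasion j of cohort i
    C = np.zeros(K, dtype=int)             # first captures per period
    Y = np.zeros(K)                        # population counts
    R = np.zeros(K + 1, dtype=int)         # releases per period
    alive = np.zeros(K + 1, dtype=int)     # marked, last released in i, alive and not recaptured
    Nu = N1                                # unmarked animals at the beginning of the period
    for j in range(1, K + 1):
        total = Nu + alive.sum()
        Y[j - 1] = rng.normal(total, sigma)             # count survey at the start of period j
        B = rng.poisson(total * f / 2.0)                # births right after the count
        recap = np.array([rng.binomial(alive[i], p) for i in range(K + 1)])
        alive = alive - recap                           # recaptured animals join the new cohort
        for i in range(1, j):
            M[i, j] = recap[i]
        if j < K:
            C[j - 1] = rng.binomial(Nu + B, xi)         # first captures of unmarked animals
            R[j] = C[j - 1] + recap.sum()               # animals released (marked) in period j
            Nu = rng.binomial(Nu + B - C[j - 1], phi)   # unmarked survivors to the next period
            alive = np.array([rng.binomial(alive[i], phi) for i in range(K + 1)])
            alive[j] = rng.binomial(R[j], phi)          # new cohort also survives period j with phi
    return M[1:K, 2:K + 1], C[:K - 1], Y

M, C, Y = simulate_joint(N1=500, phi=0.5, p=0.2, xi=0.2, f=2.0, sigma=30.0, K=4)
print(C, np.round(Y))
print(M)
```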

3.3.2 Model variations

The model that we developed can easily be modified to suit a variety of situations that we have not considered explicitly in this chapter so far. In this section, we give three examples of this. Some involve combining the capture-recapture and count data with other types of data (e.g. dead-recovery data, newborn counts). Other modifications are driven by the availability of individual categorical covariates (e.g. sex, age) in the capture-recapture data.

• Consider the case where a count of newborns is carried out in every period, right after

births occur. Let this data be given by J = (J1,...,JK ), where Jj is the newborn count in period j. The data can be incorporated in the model by using a distribution that models the measurement error in the newborn counts. For example, one could

model Jj, for j = 1,...,K, using Jj ∼ Poisson(Bj).

• Consider the case where dead-recovery data is collected. The data consists of reported identification numbers of marked individuals found dead throughout the study. The data can be summarized in the form of an upper-triangular array, H, indexed by

i = {1,...,K − 1} and j = {i,...,K − 1}. Cell H_ij contains the number of marks recovered dead in period j from individuals that were released for the last time in period i. Assuming that the dead recoveries occur at the very end of each time period and that a mark may only be recovered in the period that an individual died, the dead recovery data can be modeled in the following way:

\[
H_{ij} \sim \text{Binomial}(D^m_{ij}, \, \rho_j),
\]

where ρ_j is the probability that a marked individual that died during period j has its mark recovered. If marks can be recovered in any period after death occurs, then we suggest the use of a more complicated model to account for all the possible periods in which death might have occurred:

\[
H_{ij} = \sum_{l=i}^{j} X_{ilj},
\]

where $X_{111} \sim \text{Binomial}(D^m_{11}, \rho_1)$ and the other $X_{ilj}$'s are given by

\[
X_{ilj} \sim \text{Binomial}\!\left(D^m_{il}, \; \rho_j \prod_{m=l}^{j-1} (1 - \rho_m)\right).
\]

A small simulation sketch of this delayed-recovery model is given after this list.

• Individual categorical covariates such as age or sex may be recorded at the time of capture. The capture-recapture data can thus be expressed in a number of separate m-arrays, e.g. one for individuals released as juveniles and one for individuals released as adults. In order to model the extra covariate information properly, the latent state structure of the marked population in Table 3.1 needs to be changed. Each combination of release period and covariate value should have its own column, e.g. juveniles released in period 2. The modeling may also require the use of different parameters for each covariate value, e.g. φ1 and φA for juvenile and adult survival probabilities.
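The sketch below simulates the delayed-recovery variation described in the second bullet: starting from hypothetical latent deaths D^m_il, marks are recovered by sequential thinning, which reproduces the stated Binomial marginals for the X_ilj's and assembles the array H. All numbers are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

K = 5
rho = np.array([0.0, 0.3, 0.3, 0.4, 0.4])   # rho[j] = recovery probability in period j (1-indexed)

# Hypothetical latent deaths: Dm[i, l] = number of marked animals last released in
# period i that died in period l (i <= l <= K-1); arbitrary illustrative numbers.
Dm = np.zeros((K, K), dtype=int)
Dm[1, 1], Dm[1, 2], Dm[1, 3] = 12, 8, 5
Dm[2, 2], Dm[2, 3] = 10, 6

# H[i, j] = marks recovered in period j from animals last released in period i.
# Sequential thinning: each period, marks not yet recovered are found with
# probability rho[j], which reproduces the stated marginal
# X_ilj ~ Binomial(Dm_il, rho_j * prod_{m=l}^{j-1} (1 - rho_m)).
H = np.zeros((K, K), dtype=int)
for i in range(1, K):
    for l in range(i, K):
        unrecovered = Dm[i, l]
        for j in range(l, K):
            x_ilj = rng.binomial(unrecovered, rho[j])
            unrecovered -= x_ilj
            H[i, j] += x_ilj

print(H[1:K, 1:K])
```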

In Section 3.5, we apply some of the variations discussed in this section to a real dataset.

3.4 Simulation Study

We conduct a simulation study in order to compare inference under the composite likelihood and the true joint likelihood approaches presented in Sections 3.2.3 and 3.3.1, respectively. For the sake of simplicity and faster convergence, we assume that the parameters ξ, p, f and φ are constant over time. We consider four scenarios:

Scenario   ξ     p     f    φ      Characteristic
1          0.2   0.2   2    0.5    Low sample effort, low turnover
2          0.2   0.2   6    0.25   Low sample effort, high turnover
3          0.5   0.5   2    0.5    High sample effort, low turnover
4          0.5   0.5   6    0.25   High sample effort, high turnover

By varying ξ, p, f and φ across scenarios, we aim to vary the amount of dependency between the capture-recapture and the count data. Scenario 3 has the highest dependency between the datasets because it has the largest expected proportion of marked animals over time. That is, it has the highest capture and recapture probabilities along with the slowest renewal of the population (lowest fecundity rate and highest survival probability). Conversely, scenario 2 has the lowest dependency between the datasets.

Common to all four scenarios, we set N1 = 500 for the initial population size and σ = 30 for the standard deviation of the population counts. For each of the four scenarios, we generate H = 250 datasets with K = 4 years of data, by sampling from the true model described in Section 3.3.1. Each dataset is analyzed using both the composite likelihood and true joint likelihood inference methods (described in Sections 3.2 and 3.3.1, respectively) in a Bayesian framework. Parameter estimates are computed as posterior means. For each scenario and inference method, we summarize the results for each parameter using the Monte Carlo bias (Bias), relative root mean square error (RMSE), bias ratio (BR), expected posterior standard deviation (E.SD), expected length of the 95% HPD, i.e. highest

probability density, credible interval (E.LCI) and coverage probability of the HPD credible interval (CP). The formulas used to compute those Monte Carlo measures are given in Appendix B.1. For each parameter, we also compute the proportion of the H = 250 populations for which the parameter estimate from the true joint likelihood approach is closer in absolute value to the true parameter than the estimate from the composite likelihood approach (PS.AE). Similarly, we compute the proportion of the H = 250 populations for which the posterior standard deviation from the true joint likelihood approach is smaller than that from the composite likelihood approach (PS.SD), and for which the credible interval length from the true joint likelihood approach is smaller than that from the composite likelihood approach (PS.LCI). The formulas used to compute those Monte Carlo measures are also given in Appendix B.1. Sampling from the posterior distributions is done using MCMC techniques through the JAGS software. In order to save computational time, we set the initial parameter values of the Markov chains equal to the true parameter values. The number of iterations per chain was chosen conservatively and determined by visual inspection of the trace plots of the first three populations. The chains for the true joint likelihood method are run for 2,000,000 iterations for scenarios 1 and 2 and 1,000,000 iterations for scenarios 3 and 4. The chains for the composite likelihood method are run for 600,000 iterations. All chains are thinned so as to keep 100,000 iterations. For our Bayesian data analysis, we use, whenever possible, the same prior distributions for both the composite likelihood and the true joint likelihood approaches. This is important because we do not want differences in the results between the two methods to be artifacts of different priors. The priors for the parameters p and φ are Beta(1,1), the prior for f is Uniform(0,10), the prior for N1 is a discrete uniform on [1,2000] and the prior for σ is Uniform(0,100). The prior for ξ in the true joint likelihood approach is Beta(1,1).
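For concreteness, the sketch below computes a few of these Monte Carlo summaries from per-replicate outputs. The definitions are standard ones and may differ in detail from the exact formulas in Appendix B.1 (in particular, the bias ratio shown is one common definition); the input arrays are arbitrary placeholders, not simulation results.

```python
import numpy as np

def mc_summaries(true_value, estimates, post_sds, ci_lower, ci_upper):
    """Standard Monte Carlo summaries over H simulated datasets.
    estimates, post_sds, ci_lower, ci_upper are length-H arrays of posterior
    means, posterior SDs and 95% HPD interval endpoints for one parameter."""
    err = estimates - true_value
    bias = err.mean()
    rmse = np.sqrt((err ** 2).mean())
    bias_ratio = bias / err.std(ddof=1)           # one common definition of BR
    e_sd = post_sds.mean()                        # expected posterior SD
    e_lci = (ci_upper - ci_lower).mean()          # expected credible interval length
    coverage = np.mean((ci_lower <= true_value) & (true_value <= ci_upper))
    return bias, rmse, bias_ratio, e_sd, e_lci, coverage

# Tiny illustration with arbitrary placeholder values (H = 4 replicates)
est = np.array([0.48, 0.52, 0.55, 0.46])
sds = np.array([0.06, 0.07, 0.05, 0.06])
low = np.array([0.37, 0.39, 0.45, 0.35])
upp = np.array([0.60, 0.65, 0.66, 0.58])
print(mc_summaries(0.5, est, sds, low, upp))
```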

Scenario  Method  True value   Bias    RMSE    BR      E.SD    E.LCI    CP
Parameter φ
1         L       0.5         -0.01    0.06   -0.11    0.07    0.26     0.96
1         Lc      0.5         -0.01    0.07   -0.14    0.08    0.29     0.96
2         L       0.25         0.01    0.04    0.25    0.05    0.18     0.98
2         Lc      0.25         0.01    0.04    0.15    0.05    0.19     0.98
3         L       0.5         -0.00    0.02   -0.03    0.02    0.09     0.95
3         Lc      0.5          0.00    0.02    0.00    0.02    0.10     0.96
4         L       0.25         0.00    0.02    0.06    0.02    0.07     0.94
4         Lc      0.25         0.00    0.02    0.04    0.02    0.08     0.94
Parameter f
1         L       2            0.21    0.59    0.37    0.56    2.10     0.96
1         Lc      2            0.30    0.77    0.42    0.72    2.75     0.97
2         L       6            0.13    1.10    0.12    1.43    5.22     0.97
2         Lc      6            0.26    1.26    0.22    1.57    5.74     0.97
3         L       2            0.02    0.14    0.11    0.14    0.56     0.96
3         Lc      2            0.02    0.22    0.08    0.28    1.12     0.99
4         L       6            0.05    0.55    0.10    0.52    2.02     0.95
4         Lc      6            0.09    0.67    0.14    0.75    2.95     0.96
Parameter p
1         L       0.2          0.02    0.05    0.39    0.04    0.17     0.96
1         Lc      0.2          0.02    0.05    0.45    0.05    0.19     0.96
2         L       0.2          0.01    0.04    0.15    0.05    0.19     0.96
2         Lc      0.2          0.01    0.05    0.26    0.05    0.20     0.96
3         L       0.5          0.00    0.03    0.09    0.03    0.12     0.96
3         Lc      0.5          0.00    0.03    0.06    0.03    0.13     0.96
4         L       0.5          0.00    0.05    0.09    0.04    0.17     0.94
4         Lc      0.5          0.01    0.05    0.12    0.05    0.18     0.93
Parameter N1
1         L       500         -3.85   23.24   -0.17   30.39   124.10    0.98
1         Lc      500         -2.52   25.49   -0.10   39.76   168.00    0.99
2         L       500         -2.62   22.94   -0.11   28.51   115.48    0.98
2         Lc      500          3.14   25.47    0.12   40.01   169.16    1.00
3         L       500         -0.02   20.89   -0.00   29.49   119.66    0.96
3         Lc      500          1.30   25.04    0.05   41.24   173.78    0.99
4         L       500         -2.86   20.33   -0.14   26.54   107.70    0.98
4         Lc      500         -1.43   26.08   -0.05   40.47   171.07    0.98
Parameter σ
1         L       30           8.62   16.31    0.62   20.65    72.78    0.96
1         Lc      30          10.78   17.47    0.78   22.06    77.02    0.93
2         L       30           9.19   16.48    0.67   20.50    72.65    0.95
2         Lc      30          10.84   17.53    0.79   22.35    77.79    0.92
3         L       30          10.60   17.29    0.78   19.94    71.22    0.93
3         Lc      30          12.77   18.63    0.94   22.06    77.18    0.93
4         L       30           8.94   17.19    0.61   19.37    68.90    0.93
4         Lc      30          11.50   18.02    0.83   22.33    77.80    0.91

Table 3.2: Monte Carlo measures comparing the performance of the true joint likelihood approach (L) and the composite likelihood approach (Lc) in the simulation study, across scenarios and parameters. Each Monte Carlo measure is based on 250 simulated datasets.

Scenario   PS.AE   PS.SD   PS.LCI
Parameter φ
1          0.64    0.86    0.86
2          0.61    0.64    0.65
3          0.61    0.93    0.93
4          0.53    0.82    0.82
Parameter f
1          0.61    1.00    1.00
2          0.65    0.82    0.76
3          0.69    1.00    1.00
4          0.60    1.00    1.00
Parameter p
1          0.58    0.99    0.98
2          0.61    0.82    0.84
3          0.58    0.96    0.96
4          0.53    0.93    0.92
Parameter N1
1          0.61    0.95    0.98
2          0.59    0.94    0.98
3          0.60    0.96    0.97
4          0.58    0.96    0.98
Parameter σ
1          0.66    0.90    0.84
2          0.59    0.88    0.79
3          0.62    0.91    0.82
4          0.55    0.96    0.85

Table 3.3: Monte Carlo estimates of $P(W_L \le W_{L^c})$, where W stands for either the absolute error (AE), the standard deviation of the posterior sample (SD) or the length of the 95% HPD credible interval (LCI). Each Monte Carlo measure is based on 250 simulated datasets.

The results are displayed in Tables 3.2 and 3.3. Additional plots are provided in Appendix B.2. We observe from Table 3.2 that the Monte Carlo RMSEs and expected posterior standard deviations obtained with the true joint likelihood approach are always smaller than or equal to those obtained with the composite likelihood approach. Furthermore, turning to Table 3.3, the estimators were closer to the true value with the true likelihood approach more than half of the time (between 53% and 69% of the samples, depending on the parameter and scenario). The true likelihood approach also yielded smaller credible intervals than the composite likelihood approach more than half of the time (between 65% and 100% of the samples, depending on the parameter and scenario).

We did not find evidence that the difference in performance between the true likelihood approach and the composite likelihood approach is more important when there is more dependency in the datasets. The boxplots for the parameters φ and f in Appendix B.2 show this quite well when comparing scenarios 1 with 3 and 2 with 4. Because the variability in the estimates is higher when the sampling effort is lower, the gain in absolute value from using the true likelihood approach is greater. This result is quite interesting because there seems to be a general presumption in the current integrated population modeling literature that, when the dependency between the capture-recapture data and the count data is low, the performance of the composite likelihood method is very close to that of the true joint likelihood, because the composite likelihood is then similar to the true joint likelihood.

A reassuring result in favor of the use of the simpler composite likelihood method is that the credible interval coverage probabilities were all close to the 95% target (between 91% and 100% across scenarios and parameters, with most of them greater than or equal to 95%). However, there is no guarantee of a similar behavior in other studies, when modeling assumptions and/or data types are different.

With both the true likelihood approach and the composite likelihood approach, the bias in the estimates of the parameter σ was significant. We believe that this is because the parameter σ is quite sensitive to the prior choice and the estimates were pulled towards the upper tail of the prior distribution; see Gelman (2006). Also, with both approaches, the bias in the estimates of the parameter f was significant and positive in scenarios 1 and 2. We believe that this is due to the influence of the prior on f, whose effect diminishes with an increasing amount of data. The prior on σ could potentially also have an effect.

For the parameters φ, f and p, we notice a lower bias ratio in scenarios that have higher sampling effort. We expected the opposite, since increasing sampling effort typically reduces standard errors. However, it seems that in this study the decrease in bias ratio with increasing sampling effort is explained by the fact that increasing sampling effort reduces the effect of the prior on the estimate.

For the parameters N1 and σ, we notice that, for each method, the difference in the performance of the estimates across scenarios is not important. In particular, increasing the capture-recapture sampling effort does not significantly improve the estimation of N1 and σ. The boxplots for the parameters N1 and σ in Appendix B.2 show this quite well.

Regarding N1, we attribute this result to the fact that most of the information on N1 comes from Y1, which has the same expected value across scenarios. Regarding σ, its estimate is based on the difference between the counts and the expected population size at the time of the counts, which does not vary across scenarios (our values of f and φ were chosen such that the growth factor φ + φf/2 equals 1 in all scenarios).

3.5 Application

We compared the performance of our integrated population modeling approach to the composite likelihood approach using data from a greater horseshoe bat colony (Rhinolophus ferrumequinum) that lives in the attics of a twelfth-century chapel in Vex, Valais, Switzerland (46°13′N, 7°24′E); see Sierro et al. (2009). The data consist of capture-recapture data, population counts and newborn counts from 1991 to 2005. This data was analyzed in Schaub et al. (2007) using integrated population modeling based on a composite likelihood approach.

The survey protocol included the following activities:

• Counts of individuals shortly before parturition: Every year, except for 1991 and 2001, individuals emerging from the roost at dusk were counted on a day shortly before parturition. These population counts consist of young and adults from both sexes that are present at the colony. Flying bats cannot be aged or sexed.

• Chapel visits for captures and newborn counts while young were left unattended: Every year, during the first weeks after parturition, when young were left unattended in the attics, a count of the number of young was recorded and most young were ringed. The ring number along with the sex of the bats were recorded. Generally, the aim was to mark all the newborns in each year. The young must have a certain size in order to be marked, but they must not yet be independent, otherwise they fly away when approached. Thus, there is a time window in which the young can be marked. Since the births are not synchronous, several visits have to be made to the chapel in order to mark as many young as possible. In some years, only one visit was made due to time constraints and, in some years, the births were very asynchronous and more visits would have been necessary, so a number of young remained unmarked. There is also a possibility that the timing was judged wrongly and some young were already independent during the first visit. In all cases, the number of young could be counted with very high accuracy.

• Recaptures in the chapel in 2004 and 2005: In 2004 and 2005, at about the same time as the count survey, the main entrance of the chapel was blocked on one day at daylight and all individuals present in the chapel were recaptured. The ring number along with the sex of the bats were recorded. These recaptures were only carried out in 2004 and 2005 in order not to perturb the colony too much.

When modeling the data, we assume the following timeline of events. The study has 15 time periods, which start with a population count (except in 1991 and 2001) and simultaneous recaptures (2004 and 2005 only), immediately followed by births, immediately followed by newborn counts and simultaneous captures of unmarked animals. In this section, we use the superscript age,sex to define variables for age and sex categories. Age can take the values 0 or 1+, which represent respectively zero years old (first year of life) and at least one year old. Sex can take the values m or f, denoting males and females, respectively. The count data is denoted by the vector Y and the newborn data by the vector J. Both vectors have length K = 15. The capture-recapture data takes the form of four m-arrays, based on sex and age at release. The data is coded in the variables $(\mathbf{M}^{0,f}, \mathbf{Z}^{0,f})$, $(\mathbf{M}^{0,m}, \mathbf{Z}^{0,m})$, $(\mathbf{M}^{1+,f}, \mathbf{Z}^{1+,f})$ and $(\mathbf{M}^{1+,m}, \mathbf{Z}^{1+,m})$, defined similarly to (M, Z) in Section 3.2.1. For example, $M^{0,m}_{3,14}$ is the number of males released in their first year of life in period 3 that were recaptured (as adults) in period 14. To model the data, we choose to use the parametrization that was selected based on the DIC criterion in Schaub et al. (2007). We are not performing model selection because our goal with this work is to illustrate the difference between two inference approaches when using a common set of parameters. Following Schaub et al. (2007), we assume that the survival probability in the first year of age, $\phi^{0}$, is different from the survival probability in subsequent years, $\phi^{1+}$. A fecundity parameter, f, is constant over time and stands for the yearly mean number of newborns per female old enough to reproduce. We assume that females start reproducing in their second year of life because reproduction at an earlier age is rare. We also assume that a bat present in the chapel at a recapture occasion is recaptured with probability one. Therefore, the probability of recapture corresponds to the probability of being present in the nursery colony at the time of recaptures. The probability of presence in the chapel, $p^{age,sex}_j$, is assumed to vary by year (j = 14 or 15, for 2004 and 2005), sex (m or f) and age category (0 or 1+). To define the true joint likelihood model, we introduce the following latent variables, defined analogously to the ones in Section 3.3.1: $N^{age,sex}_1$, $\mathbf{B}^{sex}$, $\mathbf{D}^{u;age,sex}$, $\mathbf{D}^{m;age,sex}$, where age = 0 or 1+ and sex = m or f. Note that in $D^{m;age,sex}_{ij}$, age indicates age at release at

time i, not at recapture at time j. Analogously to Section 3.3.1, we compute the number of first captures $\mathbf{C}^{0,m}$ and $\mathbf{C}^{0,f}$ from the capture-recapture data. The true joint likelihood model is described by the following distributions. For ease of notation, we omit specifying the variables that the distributions are conditioned upon. Also, sums that go backwards are defined as zero and undefined variables are defined as zero.

• $C^{0,sex}_j \sim \text{Binomial}(B^{sex}_j, \, \xi_j)$ for j = 1,...,K − 1

• $M^{0,sex}_{ij} \sim \text{Binomial}(C^{0,sex}_i - \sum_{l=i}^{j-1} D^{m;0,sex}_{il} - \sum_{l=i+1}^{j-1} M^{0,sex}_{il}, \, p^{0,sex}_j)$, for (i,j) = (13,14), (14,15)

• $M^{0,sex}_{ij} \sim \text{Binomial}(C^{0,sex}_i - \sum_{l=i}^{j-1} D^{m;0,sex}_{il} - \sum_{l=i+1}^{j-1} M^{0,sex}_{il}, \, p^{1+,sex}_j)$, for i = 1,...,12 and j = 14, 15, or (i,j) = (13,15)

• $M^{1+,sex}_{ij} \sim \text{Binomial}(R^{1+,sex}_i - \sum_{l=i}^{j-1} D^{m;1+,sex}_{il} - \sum_{l=i+1}^{j-1} M^{1+,sex}_{il}, \, p^{1+,sex}_j)$, for i = 1,...,13 and j = 14, 15, or (i,j) = (14,15), where

$R^{1+,sex}_i = C^{1+,sex}_i + \sum_{k=1}^{i-1} \bigl(M^{1+,sex}_{ki} + M^{0,sex}_{ki}\bigr)$

is the number of released adult individuals of a given sex in period i

• $Y_j \sim \text{Normal}(N^{1+,m}_j \tau^{1+,m} + N^{1+,f}_j \tau^{1+,f} + N^{0,m}_j \tau^{0,m} + N^{0,f}_j \tau^{0,f}, \, \sigma^2)$, for j = 1,...,K, where

$N^{0,sex}_j = B^{sex}_{j-1} - D^{u;0,sex}_{j-1} - D^{m;0,sex}_{j-1,j-1}$,

is the number of individuals of a given sex in their first year of life at the beginning of period j

$N^{1+,sex}_j = N^{1+,sex}_{j-1} + N^{0,sex}_{j-1} - D^{u;1+,sex}_{j-1} - \sum_{k=1}^{j-1} D^{m;1+,sex}_{k,j-1} - \sum_{k=1}^{j-2} D^{m;0,sex}_{k,j-1}$

is the number of individuals of a given sex over one year old at the beginning of period j, and

\[
\tau^{age,sex} = \sqrt{p^{age,sex}_{14} \, p^{age,sex}_{15}} \tag{3.6}
\]

is an index of the presence rate at the colony at the time of the count.

• $J_j \sim \text{Poisson}(N^{1+,f}_j f)$, for j = 1,...,K

• $(B^m_j, B^f_j) \sim \text{Multinomial}(J_j; \, 0.5, 0.5)$, for j = 1,...,K

• $D^{u;1+,sex}_j \sim \text{Binomial}(N^{u;1+,sex}_j + N^{u;0,sex}_j, \, 1 - \phi^{1+})$, for j = 1,...,K − 1, where

$N^{u;0,sex}_j = B^{sex}_{j-1} - C^{0,sex}_{j-1} - D^{u;0,sex}_{j-1}$ is the number of unmarked individuals of a given sex in their first year of life at the beginning of period j, and

$N^{u;1+,sex}_j = N^{u;1+,sex}_{j-1} + N^{u;0,sex}_{j-1} - D^{u;1+,sex}_{j-1}$

is the number of unmarked individuals of a given sex over one year old at the beginning of period j

• $D^{u;0,sex}_j \sim \text{Binomial}(B^{sex}_j - C^{0,sex}_j, \, 1 - \phi^{0})$, for j = 1,...,K − 1

• $D^{m;0,sex}_{ij} \sim \text{Binomial}(C^{0,sex}_i - \sum_{l=i}^{j-1} D^{m;0,sex}_{il} - \sum_{l=i+1}^{j} M^{0,sex}_{il}, \, 1 - \phi^{1+})$, for i = 1,...,K − 1, j = i + 1,...,K − 1

• $D^{m;1+,sex}_{ij} \sim \text{Binomial}(R^{1+,sex}_i - \sum_{l=i}^{j-1} D^{m;1+,sex}_{il} - \sum_{l=i+1}^{j} M^{1+,sex}_{il}, \, 1 - \phi^{1+})$, for i = 1,...,K − 1, j = i + 1,...,K − 1

• $D^{m;0,sex}_{ii} \sim \text{Binomial}(C^{0,sex}_i, \, 1 - \phi^{0})$, for i = 1,...,K − 1

• $D^{m;1+,sex}_{ii} \sim \text{Binomial}(R^{1+,sex}_i, \, 1 - \phi^{1+})$, for i = 1,...,K − 1

For the approach based on a composite likelihood, we specify models for the count data, the newborn count data and the capture-recapture data separately. For the capture-recapture data, we model the data in each of the four m-arrays separately using products of multinomial distributions as in Schaub et al. (2007). Because there is no recapture effort before 2004, the multinomial probabilities are set to zero prior to 2004. For the count data, we use

\[
Y_j \sim \text{Normal}\bigl(N^{0,f}_j \tau^{0,f} + N^{0,m}_j \tau^{0,m} + N^{1+,f}_j \tau^{1+,f} + N^{1+,m}_j \tau^{1+,m}, \; \sigma^2\bigr), \quad \text{for } j = 1,\ldots,K,
\]
where the τ's are defined in equation (3.6) and

\[
N^{1+,sex}_j \sim \text{Binomial}\bigl(N^{1+,sex}_{j-1} + N^{0,sex}_{j-1}, \; \phi^{1+}\bigr), \qquad
N^{0,sex}_j \sim \text{Poisson}\!\left(N^{1+,f}_{j-1} \, \frac{f}{2} \, \phi^{0}\right),
\]

for j = 2,...,K. For the newborn count data, we use $J_j \sim \text{Poisson}(N^{1+,f}_j f)$ for j = 1,...,K. Whenever possible, we used the same prior distributions for the true likelihood and the composite approaches. For the prior distribution of ξ, we set $\xi_j \sim \text{Beta}(1,1)$ for all j.

For the other parameters, we used the priors in Schaub et al. (2007). We used Beta(1,1) priors for the survival and the recapture probabilities. We used a Gamma prior with shape and rate parameters both set at $10^{-3}$ for the inverse of σ², and a Normal(0, $10^4$) prior on the log fecundity. We used Normal priors truncated to positive values and rounded to the nearest integer for the initial population sizes: Normal(10, $10^4$) for $N^{1+,f}_1$ and $N^{1+,m}_1$, and Normal(20, $10^4$) for $N^{0,f}_1$ and $N^{0,m}_1$. For both methods, we ran the MCMC chains for 1,500,000 iterations, with a burn-in/adaptation period of 500,000 iterations and a thinning factor of 10. Plots of the marginal posterior distributions are shown in Figure 3.2. The posterior distributions for the fecundity and survival parameters (f, $\phi^{0}$, $\phi^{1+}$) look the most different between the true joint likelihood approach and the composite likelihood approach. The true joint likelihood approach yields narrower posterior distributions for those parameters, indicating that for these data there is a benefit in using the true joint likelihood approach. The posterior means and standard deviations are summarized in Appendix B.3. The true joint likelihood approach yields smaller estimates of fecundity, $\hat{f} = 0.64$ vs 0.71, and juvenile survival, $\hat{\phi}^{0} = 0.41$ vs 0.45, but a larger estimate of survival at older ages, $\hat{\phi}^{1+} = 0.94$ vs 0.92. All 95% HPD credible intervals are narrower with the true joint likelihood method than with the composite likelihood method, except for the parameter $\tau^{1+,f}$.

[Figure 3.2 consists of eight panels of smoothed posterior densities: survival probability in the first year; survival probability after the first year; fecundity rate; presence probability of females in their first year; presence probability of females after the first year; presence probability of males in their first year; presence probability of males after the first year; and standard deviation in counts.]

Figure 3.2: Marginal posterior distributions (smoothed) obtained from analyzing the bats data. The plain line represents the true joint likelihood method while the dashed line represents the composite likelihood method.

3.6 Discussion

In this chapter, we introduced an integrated population modeling approach that does not rely on the typical independence assumption. The independence assumption is widely used in practice and there appears to be a belief in the literature that the use of the independence assumption is justified when the dependency between the capture-recapture data and the count data is low. However, this belief was not supported by our simulation study, which suggests that, even in cases of low dependency, inference based on the true joint likelihood might be significantly different from inference based on the independence assumption.

When the marking effort is lower, the dependency between the datasets is thought to be weaker, but the precision of estimates is also expected to be lower. Thus, using an explicit integrated population modeling approach rather than a composite likelihood approach in lower dependency cases could be worth the extra effort even if the percent gain in precision is smaller relative to higher dependency scenarios, because this gain would be larger in absolute value.

The composite likelihood method performed surprisingly well in our simulation study, with credible intervals having the targeted coverage probability. However, satisfactory behavior in this study does not guarantee a similar behavior in studies with different modeling assumptions and/or types of data.

In the simulation study, we did not investigate cases with values of ξ = p smaller than 0.2. That is because the MCMC algorithm used in JAGS was slower to converge for the true likelihood method, which increased the computational burden of the simulation study. We had initially considered ξ = p = 0.1 for scenarios 1 and 2, and the MCMC chains had clearly not converged after one million iterations due to high auto-correlations. In further work, it would be interesting to investigate cases with smaller values of p and ξ by either investing more computer resources and time in the simulation study or reducing the number of replicates. Improving the mixing of the chain with a custom algorithm does not seem easy.

At some point during this work, we considered an alternative parametrization to the one given in Section 3.3.1 when formulating the true joint likelihood model. We have successfully implemented this parametrization, which is given by

• N^u, a vector of length K, where N^u_j represents the size of the unmarked population at the beginning of period j

• N^m, an upper triangular array indexed by i = {1,...,K − 1} and j = {i + 1,...,K}, where N^m_ij represents the number of animals released for the last time in period i that are alive and not recaptured prior to period j

• B, a vector of length K, where Bj represents the number of births in period j.

This parametrization is similar in spirit to that of Lee et al. (2015). However, Lee et al. did not introduce N^m because they analyzed the capture histories rather than the m-array summaries.

Chapter 4

Integrated population modeling of Chinook salmon (Oncorhynchus tshawytscha) migration on the West Coast of Vancouver Island

4.1 Introduction

As reported by Fisheries and Oceans Canada (DFO), "Chinook (Oncorhynchus tshawytscha) from the west coast of Vancouver Island (WCVI) are one of British Columbia's most important natural resources. These stocks have long been major contributors to First Nations, commercial troll, and sport catches, from Alaska to southern Vancouver Island" (DFO, 2012). The current population status of wild WCVI Chinook salmon is poor despite management actions taken over the last 15 years (DFO, 2012), including harvest restrictions and hatchery propagation. The factors contributing to the low abundance and failure to rebuild stocks remain uncertain. Chinook salmon on the West Coast of Vancouver Island return to their natal streams or rivers in the fall to spawn, and they die once they have spawned. Burman River, on the West Coast of Vancouver Island, is one of the six streams selected to represent the escapement of naturally spawned Chinook salmon in WCVI streams for management under the Pacific Salmon Treaty, which provides a joint Canada-US framework for conservation and management of Pacific salmon (escapement is the number of fish that escape fisheries to spawn in freshwater). When transitioning from saltwater to freshwater, salmon must acclimate their osmoregulatory function to maintain homeostasis in freshwater. To this end, Chinook salmon migrating to the Burman River hold at least briefly at the upper limit of tidal influence in a trench pool scoured by confining bedrock (Figure 4.1). Note that similar stopover behavior is observed in other local Chinook salmon populations, e.g.

Conuma River, Gold River. Although occasional early freshets in August may stimulate Chinook salmon to move upstream, most hold in the stopover pool until the higher flows of September and October stimulate upstream movements and spawning. The freshet provides not only access to spawning areas but also sufficient freshwater volume to initiate the physiological osmoregulatory changes. Migration into the upstream spawning area (km 0 to 7.5 in Figure 4.1) is generally complete by mid-October. Spawning is complete by late October or early November.

Figure 4.1: Map of Burman River on the West Coast of Vancouver Island, Canada

In the WCVI region, DFO relies on periodic snorkel surveys to estimate escapement using an area-under-the-curve (AUC) method (Hilborn et al. 1999; Parken et al. 2003). The traditional AUC method estimates escapement by dividing the AUC – the area under the time curve formed by interpolating snorkel counts – by an estimate of observer efficiency and an estimate of mean fish residence time in the counting area. As detection probability and residence time are known to vary annually, by location and with environmental conditions, stream-specific parameter measurements are required to produce reliable species-specific estimates with the method (English et al. 1992; Parken et al. 2003). DFO’s AUC estimation approach in the WCVI has been criticized (DFO, 2014) because the observer efficiency and mean residence time estimates are chosen subjectively rather than estimated rigorously using, for example, a radio-tagging survey. Periodic snorkel surveys will likely remain the

method of choice to monitor escapements in the future on the WCVI unless replaced by other technology. In order for estimates to be reliable under this method, it is necessary to gain a better understanding of the migration dynamics affecting residence time in the counting area, as well as to obtain estimates of observer efficiency. For example, it is believed that in years with late freshets, fish spend more time in the stopover pool and less time in the spawning area where the snorkel survey takes place. Residence time is also believed to be related to local temperature. Although the costs of initial studies to establish the relationship between time of first freshet and residence time in the counting area are substantial, such studies would provide better estimates of spawning escapement in the long term. Lack of certainty regarding the accuracy of population estimates with the AUC estimation approach prompted establishment of the Sentinel Stocks Program (SSP) under the 2009 Pacific Salmon Treaty to improve estimates of Chinook salmon escapement in WCVI and other regions. Funding was provided annually from 2009 to 2013 by SSP for surveys in the Burman River. Funding was also provided in 2014 by the PSC Southern Endowment Fund. Every year from 2009 to 2014, capture-recapture surveys, dead recovery surveys and snorkel surveys were conducted. In addition, a radio-tagging survey was conducted in 2012. One hundred radio-tags were applied to fish in the stopover pool over the course of the capture-recapture survey. Radio-tag signals were recorded by fixed receivers at km 0 and km 7.5 and by foot survey. The radio-tags were set so that a different signal was sent if a fish had not moved for at least 12 hours. In Section 4.2, we describe the sampling protocol used in 2009-2014 to conduct the capture-recapture, dead recovery and snorkel surveys. In Section 4.4, we show how to apply the Jolly-Seber method to the capture-recapture data to estimate escapement and other parameters of interest. In Section 4.5, we show how to analyze the capture-recapture, carcass and snorkel data together using an explicit integrated population model, along the lines of Chapter 3. In Section 4.6, we apply the methods presented in Sections 4.4 and 4.5 to the 2012 Burman data. We conclude with a discussion in Section 4.7.
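The arithmetic behind the traditional AUC estimator is simple: integrate the interpolated snorkel counts over time and divide by observer efficiency and mean residence time. The sketch below uses entirely hypothetical survey dates, counts, efficiency and residence time, not Burman River data.

```python
import numpy as np

# Hypothetical snorkel survey days (day of year) and live counts; not real Burman data.
days = np.array([255, 262, 270, 278, 286, 295])
counts = np.array([40, 310, 820, 650, 240, 30])

observer_efficiency = 0.7    # assumed fraction of fish present that are counted
residence_time = 12.0        # assumed mean number of days a fish spends in the counting area

auc = np.trapz(counts, days)                       # fish-days under the interpolated count curve
escapement = auc / (observer_efficiency * residence_time)
print(round(auc), round(escapement))
```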

4.2 Sampling Protocol

From 2009 to 2014, Chinook salmon surveys consisting of capture-recapture, dead recovery and snorkel surveys took place at Burman River. Surveys were conducted periodically but could not be conducted at high flows. In 2012 only, a radio-tagging study also took place, but it is not the focus of this work. Capture-recapture surveys took place at the Burman stopover pool, starting in September when fish started to arrive in the lower river; they were continued until three consecutive capture-recapture occasions led to no catch, suggesting that most fish had moved to the spawning area. On capture-recapture occasions (two to three days per week), fish were caught with a beach seine. Sampling was limited to three beach-seine sets per day in an

attempt to keep absolute sampling effort approximately constant each week. Fish captured for the first time were tagged, measured for post orbital hypural length (POH) and visually sexed before being released. The tags used for marking were dorsally visible, uniquely-numbered, 80 lb monofilament-cored Floy™ tags inserted between the pterygiophores with a needle and secured with size "J" metal sleeves. In order to assess tag loss, a secondary permanent mutilation mark was applied to the opercula. Recaptured fish had their tag color and number recorded, and sex was assessed visually. Although some jacks (subadult fish) were marked during the capture-recapture survey, adults are the main interest of this study, so the jack data are not analyzed in this work. The POH length cutoff used for jacks was set at 500 mm.

As spawners moved upstream from the stopover site, carcass surveys began and continued until carcasses were no longer present. On each carcass survey day, a crew recovered all accessible carcasses along a given route down the main channel. Recovered carcasses were sexed, measured (POH), and the tag ID and color were recorded for marked fish. However, not all carcasses present in the stream could be sampled on a given carcass survey occasion; for example, some carcasses along the sampling route were stuck in a log jam and therefore not accessible. Resampling of carcasses was prevented by sectioning the head. Sampled carcasses were rarely observed again on subsequent sampling occasions as they most likely got flushed out by flows. Although some jack carcasses were recovered, these data will not be analyzed in this work.

Snorkel surveys were conducted periodically over the study period. The river, from rkm 7.5 to rkm 0, was typically swum by two snorkelers who recorded the number of marked and total fish seen. It is not possible to read the tag number of marked fish in the snorkel survey. Other variables, such as fish visibility, were recorded on survey days. The standard survey procedure consisted in recording one joint observation for each river section, agreed between the two snorkelers. The individual observers' counts are not available. Finally, extraneous to the survey protocol, the number of fish removed from the pool by the hatchery during migration was recorded.

4.3 Notation

Throughout this chapter, we use the notation given in Table 4.1 to describe the data. We discard any snorkel survey data that are outside the range of dates spanned by the capture-recapture and carcass surveys. Data are defined as zero on days when surveys did not take place. The notation for the parameters used to define the Jolly-Seber and/or the integrated population models is given in Table 4.2.

Survey               Variable        Definition
Capture-recapture    M_{i,j,s}       Number of fish of sex s released on day i and recaptured next on day j.
Capture-recapture    C_{j,s}         Number of individuals of sex s captured and released for the first time on day j.
Capture-recapture    R_{j,s}         Number of individuals of sex s released on day j. Note: this is a redundant variable because it can be computed from C and M as C_{j,s} + \sum_{i=1}^{j-1} M_{i,j,s}.
Carcass survey       Z^u_{j,s}       Number of unmarked fish of sex s whose carcass was recovered on day j.
Carcass survey       Z^m_{i,j,s}     Number of marked fish of sex s whose carcass was recovered on day j and that were released previously on day i and not recaptured since.
Snorkel survey       Y^u_j           Snorkel count of unmarked fish on day j.
Snorkel survey       Y^m_j           Snorkel count of marked fish on day j.
Snorkel survey       v_j             Fish visibility on day j. Can take the values low, medium, high or unknown.
Hatchery removals    H^u_{j,s}       Number of unmarked fish of sex s removed by the hatchery on day j.

Table 4.1: Notation for the data collected at Burman River. The subscript s can take the values m (males) and f (females).

The first day of the capture-recapture study is considered day one. Let K^capt denote the day on which the last capture occasion with nonzero catch occurred. Let K^pool denote an arbitrary date which we assume is the last day on which Chinook salmon are present in the stopover pool. Let K^carc be the last carcass survey day, i.e., the last day of the study. For improved readability, throughout this chapter we purposely omit the conditioning variables in distributions.

4.4 A Jolly-Seber approach to estimate escapement

As a straightforward approach to estimating escapement in a given survey year, we suggest analyzing the capture-recapture data at the stopover pool using a Jolly-Seber model with the POPAN parametrization of Schwarz and Arnason (1996). Although inference under the Jolly-Seber model is typically conducted using a frequentist framework, we adopt a Bayesian approach in order to make pertinent comparisons with the integrated population model in Section 4.5.

Parameter        Definition

B_{j,s}          Number of individuals of sex s that arrive (from the ocean) to the stopover pool after midday on day j and before midday the next day. Note: B_{0,s} is the number of individuals of sex s in the pool right before midday on day 1.
N^u_{j,s}        Number of unmarked individuals of sex s in the stopover pool at midday on day j.
N^m_{i,j,s}      Number of marked individuals of sex s that are in the stopover pool at midday on day j and were released previously on day i and not recaptured prior to day j.
T^u_{j,s}        Number of unmarked individuals of sex s that transition from the stopover pool to the spawning area after midday on day j and before midday the next day.
T^m_{i,j,s}      Number of marked individuals of sex s that transition from the stopover pool to the spawning area between midday on day j and midday the next day and were released previously on day i and not recaptured since.
A^u_{j,s}        Number of unmarked individuals of sex s alive in the spawning area before midday on day j.
A^m_{i,j,s}      Number of marked individuals of sex s that are alive in the spawning area before midday on day j and were released previously on day i and not recaptured since.
D^u_{j,s}        Number of unmarked fish of sex s that died after midday on day j and before midday the next day.
D^m_{i,j,s}      Number of marked fish of sex s that died between midday on day j and midday the next day and were released previously on day i and not recaptured since.
X^u_{j,s}        Number of dead unmarked fish of sex s present in the river at midday on day j.
X^m_{i,j,s}      Number of marked fish of sex s that died and are present in the river at midday on day j and were released previously on day i and not recaptured since.
F^u_{j,s}        Number of dead unmarked fish of sex s that got flushed out between midday on day j and midday the next day.
F^m_{i,j,s}      Number of dead marked fish of sex s that got flushed out between midday on day j and midday the next day and were released previously on day i and not recaptured since.
p^move_{j,s}     Probability for individuals of sex s in the stopover pool at midday on day j to move to the spawning area before midday the next day.
φ_{j,s}          Probability for individuals of sex s alive in the spawning area at midday on day j to survive until midday the next day.
p^flush_j        Probability for dead individuals in the river at midday on day j to get flushed out before midday the next day.
p^capt_{i,s}     Capture probability for individuals of sex s in the stopover pool at midday on day i.
p^recov          Recovery probability of carcasses present in the river at midday on a given carcass survey day.
p^snor_j         Probability for fish alive in the river at midday on day j to be counted in the snorkel survey.
µ^snor_low       Intercept, on the non-logit scale, used to model logit(p^snor_j).
α_v              Linear effect of fish visibility v, on the logit scale, used to model logit(p^snor_j); α_low is set equal to 0.
σ_v              Standard deviation used to model logit(p^snor_j) for a given visibility v.
∆_j              Number of marked adult fish miscounted as unmarked in the snorkel survey on day j.
p^∆              Probability for marked fish to be miscounted as unmarked in a given snorkel survey.

Table 4.2: Notation for the parameters used in the Jolly-Seber model and/or the integrated population model. The subscript s can take the values m (males) and f (females).

Table 4.3 gives an organized representation of the notation used in the Jolly-Seber model formulation. We suppose that before the study begins, there is a superpopulation of adult Chinook salmon in the ocean that will soon migrate to the Burman River. Before the first capture-recapture sampling occasion, B_{0,m} and B_{0,f} fish (males and females respectively) enter the lower river and are available to be sampled at the stopover pool at the first capture occasion. Between midday on day j = 1, ..., K^capt − 1 and midday the next day, B_{j,m} male and B_{j,f} female Chinook salmon newly arrive at the stopover pool. We assume that fish that newly arrive in the pool do not leave before midday the next day. Fish present in the pool at midday on day j = 1, ..., K^capt − 1 move to the upper sections of the river before midday the next day with probability p^move_{j,m} and p^move_{j,f}, for males and females respectively. In absolute numbers, T^u_{j,m} unmarked males and T^u_{j,f} unmarked females move upstream; and T^m_{i,j,m} marked males and T^m_{i,j,f} marked females move upstream, having been released previously on day i and not recaptured since. Note that the entrants to the pool in this description are analogous to "births" in a typical Jolly-Seber model, while fish that move out of the stopover pool towards the spawning area are analogous to "apparent deaths".

Arrive at Burman (transition)    Number of fish: B
In stopover pool (state)         Latent number of fish: N^u, N^m    Data: M, C    Parameter relating data to latent number of fish: p^capt
Move upstream (transition)       Number of fish: T^u, T^m           Transition probability: p^move

Table 4.3: Variables used in the Jolly-Seber model, categorized based on their role in the model.

Typically, in a Jolly-Seber model, arrival ("birth") and transition ("apparent death") parameters are defined on capture occasions only, but the Bayesian framework here allows us to have births and movement every day and to estimate B_{j,s}, T^u_{j,s}, T^m_{i,j,s} and p^move_{j,s} even for days j that do not have capture occasions. This way, we can obtain daily estimates of population size in the pool as well as better estimates of escapement. Having births and movement every day also allows for better comparisons with the integrated population model. On capture occasion i, males and females are captured with probability p^capt_{i,m} and p^capt_{i,f} respectively. To ensure identifiability of all parameters, we set the capture probability on the first capture occasion, for each sex, equal to that on the second occasion. We impose the same equality on the last two occasions. The Jolly-Seber model is described by the following distributions and state equations, where i takes values within the set of capture-recapture survey days and s = f or m for females or males, respectively. First, the size of the unmarked population of sex s in the pool at midday on day j is governed by the following equation

N^u_{j,s} = N^u_{j-1,s} - C_{j-1,s} + B_{j-1,s} - T^u_{j-1,s} - H^u_{j-1,s},    (4.1)

for j = 2, ..., K^capt, and with N^u_{1,s} = B_{0,s}. The number of marked individuals of sex s released previously on capture-recapture day i, not recaptured since, and in the pool at midday on day j is governed by the following equations

N^m_{i,i+1,s} = R_{i,s} - T^m_{i,i,s}    (4.2)

and

N^m_{i,j,s} = N^m_{i,j-1,s} - M_{i,j-1,s} - T^m_{i,j-1,s},    (4.3)

for j = i + 2, ..., K^capt. The transitions (unobservable) are modeled as

• T^u_{j,s} ~ Binomial(N^u_{j,s} - C_{j,s}, p^move_{j,s}), for j = 1, ..., K^capt - 1

• T^m_{i,i,s} ~ Binomial(R_{i,s}, p^move_{i,s}) and
  T^m_{i,j,s} ~ Binomial(N^m_{i,j,s} - M_{i,j,s}, p^move_{j,s}), for j = i + 1, ..., K^capt - 1.

The capture-recapture data (observable) are modeled as

• M_{i,j,s} ~ Binomial(N^m_{i,j,s}, p^capt_{j,s}), for capture survey days j ∈ {i + 1, ..., K^capt}

• C_{j,s} ~ Binomial(N^u_{j,s}, p^capt_{j,s}), for capture survey days j ∈ {1, ..., K^capt}.

Quantities of interest can be calculated once a posterior sample is obtained. These include escapement (per sex and total), population size in the pool (per sex, over time) and mean stopover time in the pool in days (per sex and arrival day in the pool). The formulas used to calculate the latter quantities are given in Table 4.4. The escapement estimates obtained with the Jolly-Seber model are likely to be biased low because they do not account for fish that enter the river after the last capture-recapture occasion with non-zero catch. The mean stopover time estimates may also be biased low because they do not account for the fact that some fish most likely remain in the pool after the last capture-recapture occasion with nonzero catch.
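To make the state equations (4.1)-(4.3) and the binomial capture and movement layers concrete, the following Python sketch simulates one sex forward through time under the Jolly-Seber formulation above. All inputs (number of days, daily arrivals, movement and capture probabilities) are arbitrary illustrative values, not estimates from the Burman River data.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 10                                  # days in the capture-recapture study (illustrative)
B = rng.integers(20, 80, size=K)        # B[0]: fish present on day 1; B[j]: arrivals between day j and j+1
p_move = np.full(K, 0.15)               # daily movement probabilities (illustrative)
p_capt = np.full(K, 0.25)               # daily capture probabilities (illustrative)

N_u = B[0]                              # unmarked fish in the pool at midday on day 1
marked = {}                             # release day -> marked fish still in the pool, not recaptured since

for j in range(K):
    # Capture layer: first captures of unmarked fish and recaptures within each marked cohort.
    C_j = rng.binomial(N_u, p_capt[j])
    M_j = {i: rng.binomial(n, p_capt[j]) for i, n in marked.items()}

    # All fish handled today are released together and form a cohort with release day j.
    for i, m in M_j.items():
        marked[i] -= m
    marked[j] = C_j + sum(M_j.values())

    # Movement layer: fish not handled today (and today's releases) may move upstream overnight.
    T_u = rng.binomial(N_u - C_j, p_move[j])
    for i in list(marked):
        marked[i] -= rng.binomial(marked[i], p_move[j])

    # State equation (4.1): update the unmarked pool for midday the next day.
    B_next = B[j + 1] if j + 1 < K else 0
    N_u = N_u - C_j + B_next - T_u

    print(f"day {j + 1}: unmarked in pool = {N_u}, first captures = {C_j}, recaptures = {sum(M_j.values())}")
```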

4.5 Integrated population modeling

In this section, we present an integrated population model that incorporates all sources of data in a single analysis, along the lines of the work in Chapter 3. The Chinook salmon migration is assumed to follow the steps illustrated in Figure 4.2. Table 4.5 gives an organized representation of the notation used in the integrated population model.

Figure 4.2: Schematic representation of Chinook salmon migration at Burman River, as assumed by the integrated population model. The arrows denote transitions while boxes denote states.

For the integrated population model, we assume that movement into and out of the stopover pool is as described in Section 4.4, except that we allow for new entrants up to j = K^pool − 1 and we assume that after midday on day K^pool, no new entrants arrive and all fish remaining in the pool leave it by midday the next day. Thus we set p^move_{K^pool,s} = 1 and B_{j,s} = 0 for j = K^pool, ..., K^carc − 1, as well as T^u_{j,s} = 0 and T^m_{i,j,s} = 0 for j = K^pool + 1, ..., K^carc − 1. We assume that the number of fish of each sex alive in the spawning area at midday on day 1 is equal to that on day 2. Fish alive in the spawning area after midday on day j = 1, ..., K^carc − 1 die before midday the next day with probability

1 − φ_{j,m} for males and 1 − φ_{j,f} for females. We also allow fish that move to the upstream area between midday on day j and midday the next day to die, with the same probabilities.

Quantity: Escapement for sex s
    Jolly-Seber model: \sum_{j=0}^{K^{capt}-1} B_{j,s}
    Integrated population model: \sum_{j=0}^{K^{pool}-1} B_{j,s}

Quantity: Total escapement
    Jolly-Seber model: \sum_{j=0}^{K^{capt}-1} (B_{j,m} + B_{j,f})
    Integrated population model: \sum_{j=0}^{K^{pool}-1} (B_{j,m} + B_{j,f})

Quantity: Number of individuals of sex s in the pool at midday on day j
    Both models: N^u_{j,s} + \sum_{i=1}^{j} N^m_{i,j,s}

Quantity: Mean stopover time for individuals of sex s that arrived in the pool between midday on day j and midday the next day
    Jolly-Seber model: 0.5\, p^{move}_{j,s} + \sum_{d=2}^{K^{capt}-j} (d - 0.5)\, p^{move}_{j+d-1,s} \prod_{l=j}^{j+d-2} (1 - p^{move}_{l,s})
    Integrated population model: 0.5\, p^{move}_{j,s} + \sum_{d=2}^{K^{pool}-j} (d - 0.5)\, p^{move}_{j+d-1,s} \prod_{l=j}^{j+d-2} (1 - p^{move}_{l,s})

Quantity: Mean residence time for individuals of sex s that arrived in the spawning area between midday on day j and midday the next day
    Jolly-Seber model: n/a
    Integrated population model: 0.5\,(1 - \phi_{j,s}) + \sum_{d=2}^{K^{carc}-j} (d - 0.5)(1 - \phi_{j+d-1,s}) \prod_{l=j}^{j+d-2} \phi_{l,s}

Quantity: Number of individuals of sex s alive in the spawning area at midday on day j
    Jolly-Seber model: n/a
    Integrated population model: A^u_{j,s} + \sum_{i=1}^{j} A^m_{i,j,s}

Quantity: Median snorkel observer efficiency at visibility v
    Jolly-Seber model: n/a
    Integrated population model: \mathrm{logit}^{-1}\left(\mathrm{logit}(\mu^{snor}_{low}) + \alpha_v\right)

Table 4.4: Formulas used to compute quantities of interest for the Jolly-Seber model or the integrated population model. Residence time and alive population size in the stream cannot be estimated from the Jolly-Seber model. Notes: (1) sums are defined as zero when backwards; (2) the use of d − 0.5 in the mean stopover time calculation is based on the assumption that, within a day, the movement of fish upstream to the spawning grounds is distributed uniformly over the day; (3) the latent variables N^m_{i,j,s} and A^m_{i,j,s} are defined as 0 when i is not a capture-recapture day; (4) the time unit is days.
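As a small numerical illustration of the mean stopover time formula in Table 4.4, the following Python sketch evaluates 0.5 p^move_{j,s} + Σ_d (d − 0.5) p^move_{j+d−1,s} Π_l (1 − p^move_{l,s}) for a vector of hypothetical daily movement probabilities; the probabilities are made up for illustration only.

```python
import numpy as np

def mean_stopover_time(p_move, j, K):
    """Mean stopover time (in days) for fish that arrived in the pool between midday
    on day j and midday the next day, following the Table 4.4 formula.  p_move is
    indexed so that p_move[l] is the movement probability on day l (entry 0 unused);
    K plays the role of K^capt (Jolly-Seber) or K^pool (integrated model)."""
    total = 0.5 * p_move[j]
    for d in range(2, K - j + 1):
        stay = np.prod(1.0 - p_move[j:j + d - 1])      # product over l = j, ..., j + d - 2
        total += (d - 0.5) * p_move[j + d - 1] * stay
    return total

# Hypothetical movement probabilities for a 10-day study (index 0 unused).
p_move = np.array([0.0, 0.05, 0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.50, 0.80, 1.00])
print(mean_stopover_time(p_move, j=1, K=10))
```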

Arrive at Burman (transition)    Number of fish: B
In stopover pool (state)         Latent number of fish: N^u, N^m    Data: M, C             Parameter: p^capt
Move upstream (transition)       Number of fish: T^u, T^m           Transition probability: p^move
Alive in stream (state)          Latent number of fish: A^u, A^m    Data: Y^u, Y^m, v      Parameters: p^snor, µ^snor_low, α, σ, ∆, p^∆
Die (transition)                 Number of fish: D^u, D^m           Transition probability: φ
Dead in stream (state)           Latent number of fish: X^u, X^m    Data: Z^u, Z^m         Parameter: p^recov
Get flushed out (transition)     Number of fish: F^u, F^m           Transition probability: p^flush

Table 4.5: Variables used in the integrated population model, categorized based on their role in the model.

In absolute numbers, D^u_{j,f} unmarked females and D^u_{j,m} unmarked males die between midday on day j and midday the following day. Also, D^m_{i,j,f} marked females and D^m_{i,j,m} marked males die between midday on day j and midday the following day, having been released previously on day i and not recaptured since. We assume that there are no carcasses in the stream at midday on day 1. Fish dead in the stream at midday on day j = 2, ..., K^carc − 1 get flushed out before midday the next day with probability p^flush_j. In absolute numbers, F^u_{j,f} unmarked carcasses of female fish and F^u_{j,m} unmarked carcasses of male fish get flushed out. In addition, F^m_{i,j,f} marked carcasses of female fish and F^m_{i,j,m} marked carcasses of male fish get flushed out, having been released previously on day i and not recaptured since. We assume that capture-recapture surveys, carcass surveys and snorkel surveys are instantaneous and occur simultaneously at midday. On capture-recapture day i, males and females are captured with probability p^capt_{i,m} and p^capt_{i,f} respectively. To avoid identifiability issues, and to make results comparable with the Jolly-Seber model, we set the capture probability on the first capture occasion, for each sex, equal to that on the second occasion. We impose the same equality on the last two occasions. On carcass survey day j, carcasses present in the stream are recovered with probability p^recov.

On snorkel survey day j, a number ∆_j of marked fish appear as unmarked if counted. Fish are counted with probability p^snor_j, which depends on the visibility on day j. While there are many possible candidate integrated population models, we present only one for the sake of simplicity and as a starting point toward better understanding the migration of Chinook salmon at Burman River. The model is developed with the perspective that it will be fitted using Bayesian methods. It is parametrized using a number of latent variables and fundamental parameters, described in Table 4.2. Our integrated population model is described by the following distributions and state equations, where i takes values within the set of capture-recapture survey days and s = f or m for females or males, respectively. First, the sizes of the marked and unmarked populations in the pool at midday, N^m_{i,j,s} and N^u_{j,s} respectively, are governed by equations (4.1)-(4.3), where j = 1, ..., K^pool. The state equation describing the number of unmarked individuals of sex s alive in the river at midday on day j is

A^u_{j,s} = A^u_{j-1,s} + T^u_{j-1,s} - D^u_{j-1,s},

for j = 2, ..., K^carc, and with A^u_{1,s} = A^u_{2,s}. The state equation describing the number of marked individuals of sex s released last on day i and alive in the river at midday on day j is

A^m_{i,j,s} = A^m_{i,j-1,s} + T^m_{i,j-1,s} - D^m_{i,j-1,s},

for j = i + 1, ..., K^carc. The state equation describing the number of dead unmarked fish of sex s present in the river at midday on day j is

X^u_{j,s} = X^u_{j-1,s} - F^u_{j-1,s} - Z^u_{j-1,s} + D^u_{j-1,s},

for j = 2, ..., K^carc. The state equation describing the number of dead marked fish of sex s present in the river at midday on day j and released previously on day i and not recaptured since is

X^m_{i,j,s} = X^m_{i,j-1,s} - F^m_{i,j-1,s} - Z^m_{i,j-1,s} + D^m_{i,j-1,s}.
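To clarify the bookkeeping implied by these state equations, the following Python sketch advances the unmarked alive-in-stream and dead-in-stream states one day at a time. The daily mover counts and all probabilities are placeholder values, and the sketch draws flush-outs from the carcasses left after recovery (a simplification noted in the comments), so it only illustrates the state equations rather than reproducing the fitted model.

```python
import numpy as np

rng = np.random.default_rng(2)

K_carc = 15                                  # last carcass survey day (illustrative)
phi = np.full(K_carc + 1, 0.85)              # daily survival in the spawning area (illustrative)
p_flush = 0.30                               # daily carcass flush-out probability (illustrative)
p_recov = 0.10                               # carcass recovery probability (illustrative)
T_u = rng.integers(0, 60, K_carc + 1)        # T_u[j]: unmarked movers between day j and j+1 (placeholder)

A_u = np.zeros(K_carc + 1, dtype=int)        # A_u[j]: unmarked fish alive in the spawning area at midday on day j
X_u = np.zeros(K_carc + 1, dtype=int)        # X_u[j]: unmarked carcasses present in the river at midday on day j
Z_u = np.zeros(K_carc + 1, dtype=int)        # Z_u[j]: unmarked carcasses recovered on day j

A_u[1] = T_u[0]                              # arbitrary starting value; no carcasses at midday on day 1

for j in range(2, K_carc + 1):
    D_u = rng.binomial(T_u[j - 1] + A_u[j - 1], 1.0 - phi[j - 1])  # deaths between midday j-1 and midday j
    A_u[j] = A_u[j - 1] + T_u[j - 1] - D_u                         # alive-in-stream state equation
    Z_u[j - 1] = rng.binomial(X_u[j - 1], p_recov)                 # recoveries on carcass survey day j-1
    # For a simple generative sketch, flush-outs are drawn from the carcasses left after
    # recovery so that the dead-in-stream count cannot go negative.
    F_u = rng.binomial(X_u[j - 1] - Z_u[j - 1], p_flush)
    X_u[j] = X_u[j - 1] - F_u - Z_u[j - 1] + D_u                   # dead-in-stream state equation
```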

The transition distributions are given by

• T^u_{j,s} ~ Binomial(N^u_{j,s} - C_{j,s}, p^move_{j,s}), for j = 1, ..., K^pool

• T^m_{i,i,s} ~ Binomial(R_{i,s}, p^move_{i,s}) and
  T^m_{i,j,s} ~ Binomial(N^m_{i,j,s} - M_{i,j,s}, p^move_{j,s}), for j = i + 1, ..., K^pool

• D^u_{j,s} ~ Binomial(T^u_{j,s} + A^u_{j,s}, 1 - φ_{j,s}), for j = 1, ..., K^carc - 1

• D^m_{i,j,s} ~ Binomial(T^m_{i,j,s} + A^m_{i,j,s}, 1 - φ_{j,s}), for j = i, ..., K^carc - 1

• F^u_{j,s} ~ Binomial(X^u_{j,s}, p^flush_j), for j = 2, ..., K^carc - 1

• F^m_{i,j,s} ~ Binomial(X^m_{i,j,s}, p^flush_j), for j = 2, ..., K^carc - 1

The distributions used to relate the data to the latent variables are given by

• Z^u_{j,s} ~ Binomial(X^u_{j,s}, p^recov), for carcass survey days j ∈ {1, ..., K^carc}

• Z^m_{i,j,s} ~ Binomial(X^m_{i,j,s}, p^recov), for carcass survey days j ∈ {i + 1, ..., K^carc}

• Y^u_j ~ Binomial(A^u_{j,f} + A^u_{j,m} + ∆_j, p^snor_j), for snorkel survey days j ∈ {1, ..., K^carc}

• Y^m_j ~ Binomial( \sum_{all possible i} [A^m_{i,j,f} + A^m_{i,j,m}] - ∆_j, p^snor_j ), for snorkel survey days j ∈ {1, ..., K^carc}

• logit(p^snor_j) ~ Normal( logit(µ^snor_low) + α_{v_j}, σ^2_{v_j} ), for snorkel survey days j ∈ {1, ..., K^carc}

• ∆_j ~ Binomial( \sum_{all possible i} [N^m_{i,j,f} + N^m_{i,j,m}], p^∆ ), for snorkel survey days j ∈ {1, ..., K^carc}.
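As an illustration of the snorkel observation layer just described, the Python sketch below draws a day-specific observer efficiency on the logit scale and then generates unmarked and marked snorkel counts for a single survey day. The latent abundances, visibility effect and standard deviation are arbitrary placeholder values rather than posterior quantities.

```python
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(3)

# Placeholder latent states and parameters for a single snorkel survey day (illustrative values).
A_u_total = 900        # unmarked fish alive in the stream, females plus males
A_m_total = 400        # marked fish alive in the stream, summed over release days
N_m_total = 150        # marked fish in the stopover pool, summed over release days
mu_low = 0.35          # median observer efficiency at low visibility
alpha_v = 1.0          # visibility effect on the logit scale (alpha_low = 0)
sigma_v = 0.5          # day-to-day standard deviation on the logit scale
p_delta = 0.05         # probability that a marked fish is miscounted as unmarked

# Day-specific observer efficiency: logit-normal around logit(mu_low) + alpha_v.
p_snor = expit(rng.normal(logit(mu_low) + alpha_v, sigma_v))

# Marked fish miscounted as unmarked, then the two snorkel counts.
delta = rng.binomial(N_m_total, p_delta)
Y_u = rng.binomial(A_u_total + delta, p_snor)
Y_m = rng.binomial(A_m_total - delta, p_snor)

print(f"observer efficiency = {p_snor:.2f}, unmarked count = {Y_u}, marked count = {Y_m}")
```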

Quantities of interest can be calculated once a posterior sample is obtained. These include escapement (per sex and total), population size in the pool (per sex, over time), mean stopover time in the pool (in days, per sex and arrival day in the pool), mean residence time in the spawning area (in days, per sex and arrival day in the spawning area) and median observer efficiency for the snorkel survey (per visibility category). The formulas used to calculate the latter quantities are given in Table 4.4.

4.6 Analysis of the 2012 data

The 2012 data were collected following the timeline shown in Figure 4.3. A summary of the data is given in Figure 4.4. For the data analysis, we focused on the study period of September 10th, 2012 to October 27th, 2012, which spans from the first day of live captures to the last day of carcass recoveries. In other words, for this analysis we did not consider the snorkel data collected outside of this period. Over the course of the capture-recapture survey, a total of 1179 adult Chinook salmon were tagged, of which 35% were females. A total of 348 recaptures occurred, most fish being recaptured only once. Over the course of the carcass survey, 299 adult carcasses were recovered, of which 15% were marked males, 7% were marked females, 40% were unmarked males and 38% were unmarked females. Snorkel counts varied between 7 (November 10th, 2012, high visibility) and 725 (October 17th, 2012, medium visibility). In the 2012 study, the hatchery removed 25 females and 55 males on September 21st and 67 females and 47 males on September 22nd.


Figure 4.3: Timeline when surveys were performed in 2012. Each occurrence is denoted by a symbol “×”. Adjacent symbols correspond to consecutive days.


Figure 4.4: Summary time series of the 2012 data.

Migration timing of Chinook salmon in the Burman River is believed to be related to water discharge. Figure 4.5 shows the daily water discharge measured at Gold River, near the Burman River. The first large water discharge in 2012 was observed on October 14th. Concurrently, the last capture-recapture survey with positive catch took place on October 11th. This was followed by three capture-recapture occasions with no catch. Unfortunately, because no data were recorded on those days, those three exact dates are unknown, but we hypothesize that they must have been after October 14th. Hence, for the integrated population model we assume that all fish have left the stopover area by October 16th, and we ensure this by setting p^move_{j,s} = 1 on October 15th.


Figure 4.5: Daily discharge measured at Gold River over the 2012 migration period. Although discharge data is not available at Burman River, the data at nearby Gold River are thought to be a good proxy for Burman River. The first big freshet occurred on October 14th.

For our Bayesian approach, we fit the models using the JAGS software. Before running the chains, we ran an adaptation/burn-in phase of 250,000 iterations. We then ran the chains for 2.5 million iterations, thinned by a factor of 10, and summarized the results using marginal posterior means and highest posterior density (HPD) credible intervals. Convergence was assessed through traceplots. Whenever possible, we use the same priors for the Jolly-Seber approach and the integrated population modeling approach. The priors on the numbers of entrants, B_{j,s}, are Uniform(0, 400), rounded to the nearest integer. For the various probabilities p^move_{j,s}, φ_{j,s}, p^capt_{i,s}, µ^snor_low, p^∆, p^recov and p^flush_j we use Beta(1, 1) distributions, which are equivalent to Uniform(0, 1). For the visibility effects, we use a Gamma(shape = 0.5, rate = 0.005) prior on

α_medium and α_unknown. The positivity of the Gamma prior ensures that observer efficiency

                     Jolly-Seber model             Integrated model
                     Estimate    95% CI            Estimate    95% CI
Male escapement      3022        (2548-3511)       2411        (2154-2670)
Female escapement    2443        (1798-3120)       2878        (2316-3437)
Total escapement     5465        (4641-6293)       5289        (4710-5913)

Table 4.6: Escapement estimates obtained from the Jolly-Seber model and the integrated population model. The formulas used to calculate escapement are given in Table 4.4. CI denotes credible intervals.

is higher for medium visibility than for low visibility because α_low = 0. Along the same lines, to form a prior on α_high that ensures higher observer efficiency than at medium visibility, we add a Gamma(shape = 0.5, rate = 0.005) effect to the prior on α_medium.
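A minimal sketch, assuming only the prior structure described above, of how the stacked Gamma effects imply ordered observer efficiencies across visibility levels; this is a prior simulation in Python for illustration and is not the JAGS model code used in the analysis.

```python
import numpy as np
from scipy.special import expit, logit

rng = np.random.default_rng(4)
n = 10_000

# Prior draws: mu_low ~ Beta(1, 1); alpha_low = 0; each increment is Gamma(shape 0.5, rate 0.005).
mu_low = rng.beta(1.0, 1.0, n)
alpha_medium = rng.gamma(shape=0.5, scale=1.0 / 0.005, size=n)   # rate 0.005 -> scale 200
alpha_high = alpha_medium + rng.gamma(shape=0.5, scale=1.0 / 0.005, size=n)

# Implied prior on the median observer efficiency at each visibility level.
eff_low = mu_low
eff_medium = expit(logit(mu_low) + alpha_medium)
eff_high = expit(logit(mu_low) + alpha_high)

# The stacking guarantees the ordering low <= medium <= high for every draw.
print(np.mean(eff_medium >= eff_low), np.mean(eff_high >= eff_medium))
```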

Finally, the priors on σ_v are Uniform(0, 4). The escapement estimates obtained with the Jolly-Seber approach and the integrated population modeling approach are displayed in Table 4.6. With the Jolly-Seber model, the escapement estimate is larger for males than for females, whereas the integrated population model shows the opposite ordering. However, the credible intervals for males and females overlap for both methods. Note that more males than females were marked in the capture-recapture survey, but this does not imply that male escapement is larger than female escapement because capture probabilities may depend on sex. Our escapement estimates can be compared with those obtained under a frequentist implementation of the Jolly-Seber (POPAN) model using the software MARK; see Appendix C.1. Figures 4.6 and 4.7 show plots of results obtained with the Jolly-Seber approach; Figures 4.8 to 4.11 show plots of results obtained with the integrated population modeling approach. For the Jolly-Seber approach, the number of Chinook salmon in the stopover pool at midday can only be estimated up to October 11th, the last capture-recapture occasion with positive catch. On October 11th, the population in the pool is near its maximum. The mean stopover time estimates from the Jolly-Seber model are typically lower than those from the integrated population model. This was expected because the calculation of stopover time with the Jolly-Seber model is truncated at the last capture-recapture occasion with positive catch and does not account for fish still in the pool after this time. The plots of the number of individuals alive in the stream in Figure 4.11 show that the population in the spawning area peaked shortly after the big freshet on October 14th. The mean residence time plots in Figure 4.10 show the estimated mean residence time as a function of the date of entry into the spawning area. Residence time assessment is an important component for the use of the AUC method by DFO.


Figure 4.6: Estimates of the population size in the pool obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.


Figure 4.7: Stopover time estimates obtained using the Jolly-Seber model based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.


Figure 4.8: Estimates of the population size in the tagging pool obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.


Figure 4.9: Stopover time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.


Figure 4.10: Residence time estimates obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.


Figure 4.11: Estimates of alive population size in the spawning area obtained using the integrated population modeling approach based on the formula in Table 4.4. Each estimate is represented along with a 95 % HPD credible interval.

Observer efficiency     Estimate    95% HPD CI
Low visibility          0.35        (0.01-0.73)
Medium visibility       0.65        (0.29-0.95)
High visibility         0.77        (0.60-0.97)
Unknown visibility      0.84        (0.45-1.00)

Table 4.7: Integrated population modeling marginal estimates and credible intervals of observer efficiency in the snorkel survey, based on the fish visibility covariate.

Table 4.7 presents the observer efficiency estimates obtained with the integrated population model. This information is pertinent to the use of the AUC method. With the integrated population modeling approach, the probability of recovering a carcass present in the stream on a carcass survey occasion is estimated at 0.11 (95% HPD CI of 0.06-0.17). This estimate is considerably higher than the raw proportion of marked individuals recovered dead (0.06) because it is conditional upon the carcasses not being flushed out.

4.6.1 Assessment of the integrated population model

Model fit can be assessed using posterior predictive p-values, also called Bayesian p-values (Meng, 1994; Gelman et al., 2003). Approximate p-values are computed as follows based on a sample of size ν from the posterior distribution:

p = \frac{1}{\nu} \sum_{k=1}^{\nu} 1\left[ D(W'_k, \theta_k) > D(W, \theta_k) \right],    (4.4)

where D is a discrepancy measure, W is the data, θ_k is the kth posterior sample and W'_k is data simulated from θ_k using the model. We used (4.4) to assess the capture-recapture and snorkel components of the integrated population model. A more thorough model assessment was not possible in the given timeframe because not all parameter values were saved from the MCMC run and the algorithm is computationally demanding. We used a sample size of ν = 5000. For the discrepancies, we used Freeman-Tukey statistics (Freeman and Tukey, 1950), which have the form (\sqrt{O} - \sqrt{E})^2, where O is an observed (or simulated) count and E is the expectation of the count under the model. We use, for the capture-recapture and snorkel components respectively:

D_1(W, \theta) = \left( \sqrt{R_{j,s}} - \sqrt{ \left( N^u_{j,s} + \sum_{\text{all possible } i} N^m_{i,j,s} \right) p^{capt}_{j,s} } \right)^2

D_2(W, \theta) = \left( \sqrt{Y^u_j + Y^m_j} - \sqrt{ \sum_{s \in \{m,f\}} \left( A^u_{j,s} + \sum_{\text{all possible } i} A^m_{i,j,s} \right) \mathrm{logit}^{-1}\left( \mathrm{logit}(\mu^{snor}_{low}) + \alpha_{v_j} \right) } \right)^2.
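To make formula (4.4) and the Freeman-Tukey discrepancy concrete, here is a generic Python sketch for a simple binomial observation component; the data, prior and posterior used here are illustrative stand-ins, not the actual capture-recapture or snorkel components of the integrated population model.

```python
import numpy as np

rng = np.random.default_rng(5)

def freeman_tukey(obs, expected):
    """Freeman-Tukey discrepancy (sqrt(O) - sqrt(E))^2, summed over cells."""
    return np.sum((np.sqrt(obs) - np.sqrt(expected)) ** 2)

# Illustrative setup: observed counts assumed Binomial(n_trials, p) with a common unknown p.
n_trials = np.array([120, 80, 150, 60])
obs = np.array([30, 25, 40, 10])

# Stand-in posterior sample for p: with a Beta(1, 1) prior the posterior is
# Beta(1 + sum(obs), 1 + sum(n_trials - obs)).
nu = 5000
p_post = rng.beta(1 + obs.sum(), 1 + (n_trials - obs).sum(), size=nu)

indicator = np.empty(nu)
for k in range(nu):
    expected = n_trials * p_post[k]                  # E(count) under the model at theta_k
    simulated = rng.binomial(n_trials, p_post[k])    # replicate data W'_k drawn from theta_k
    indicator[k] = freeman_tukey(simulated, expected) > freeman_tukey(obs, expected)

print("Bayesian p-value:", indicator.mean())         # approximation to (4.4)
```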

Bayesian p-values lie between 0 and 1. A Bayesian p-value not "close" to 0.5 suggests that the discrepancies of the simulated and observed data differ systematically, providing evidence of a poor fit of the model. Figure 4.12 shows the p-values computed for assessing the capture-recapture component of the model. The average of all p-values on the graph is 0.51. The p-values farthest from 0.5 come from the beginning and end of the capture-recapture survey. This suggests that the assumption that the capture probabilities are the same at the first two and last two occasions might be inappropriate. Figure 4.13 shows the p-values computed for assessing the snorkel survey component of the model. The average of all p-values on the graph is 0.49. We do not have evidence that the snorkel survey component of the model fits poorly, except perhaps for the second-to-last survey occasion, which could simply be an outlier.


Figure 4.12: Bayesian p-values for the assessment of the capture-recapture component of the integrated population model, using discrepancy D1.


Figure 4.13: Bayesian p-values for the assessment of the snorkel survey component of the integrated population model, using discrepancy D2.

4.7 Discussion

In this chapter, we developed an integrated population model to integrate capture-recapture data, carcass recovery data and snorkel survey data to gain insight into the biological processes driving the migration of Chinook salmon populations that use a single stopover pool on their migration route. An integrated population approach has the advantage over a Jolly-Seber approach of providing insight into snorkel observer efficiency and mean residence time spent in the counting area. These two quantities are crucial to assess in order to use the area-under-the-curve method properly, which is likely to remain the method of choice for the coming years on the West Coast of Vancouver Island. Radio-tagging surveys could also provide insight into observer efficiency and mean residence time, but they are typically more expensive. In future work, we want to apply the integrated population modeling methodology to all 2009-2014 study years, compare different modeling assumptions and reflect on similarities and differences in estimates between years. Much remains to be done, notably evaluating the potential impact of transients and incorporating tag loss, loss on capture and hatchery removals of marked fish in the models. We are also interested in studying the relationship between stopover time, residence time and the time of the first big freshet. Work by Dunlop (2015) suggests that this relationship is strong, and thus the residence time needed for the area-under-the-curve method could be estimated yearly from the time of the first big freshet alone.

Bibliography

[1] Abadi, F., Gimenez, O., Arlettaz, R., and Schaub, M. (2010). An assessment of integrated population models: bias, accuracy, and violation of the assumption of independence. Ecology, 91, 7-14.

[2] Béliveau A., Lockhart R.A., Schwarz C.J. and Arndt S.K. (2015). Adjusting for undercoverage of access-points in creel surveys with fewer overflights. Biometrics, doi: 10.1111/biom.12335.

[3] Besbeas, P. and Morgan, B.J.T. (2012). Kalman Filter Initialization for Integrated Population Modelling. Journal of the Royal Statistical Society: Series C, 61, 151-162.

[4] Besbeas, P., Borysiewicz, R.S. and Morgan, B.J.T. (2008) Completing the ecological jigsaw. In: Thomson D.L., Cooch E.G. and Conroy M.J., eds. Modeling Demographic Processes in Marked Populations. Environmental and Ecological Statistics, 3. Springer, New York, pp. 513-539.

[5] Besbeas, P., Lebreton, J.-D. and Morgan, B.J.T. (2003). The Efficient Integration of Abundance and Demographic Data. Journal of the Royal Statistical Society: Series C, 52, 95-102.

[6] Besbeas, P., Freeman, S.N., Morgan, B.J.T. and Catchpole E.A. (2002). Integrating Mark-Recapture-Recovery and Census Data to Estimate Animal Abundance and Demographic Parameters. Biometrics, 58, 540-547.

[7] Brooks S.P., Catchpole E.A. and Morgan B.J.T. (2000). Bayesian animal survival estimation. Statistical Science, 15, 357-376.

[8] Buckland, S.T., Newman, K.B., Thomas, L. and Koesters, N.B. (2004). State-Space Models for the Dynamics of Wild Animal Populations. Ecological Modelling, 171, 157-175.

[9] Chandler, R.B. and Clark, J.D. (2014). Spatially explicit integrated population models. Methods in Ecology and Evolution, 5, 1351-1360.

[10] Cochran W.G. (1977). Sampling techniques. 3rd edition. New York: John Wiley.

[11] Dauk P.C. and Schwarz C.J. (2001). Catch estimation with restricted randomization in the effort survey. Biometrics 57, 461–468.

[12] Dunlop R.H. (2015). Open population mark-recapture estimation of ocean-type Chinook salmon spawning escapements at stopover sites on the west coast of Vancouver Island. Report prepared for the Sentinel Stocks & Southern Boundary and Enhancement Committee, Pacific Salmon Commission.

[13] DFO. (2014). Proceedings of the Regional Peer Review on the West Coast Vancouver Island Chinook Salmon Escapement Estimation and Stock Aggregation Procedures; June 18-20, 2013. DFO Can. Sci. Advis. Sec., Proceed Ser. 2014/025.

[14] DFO. (2012). Assessment of west coast Vancouver Island Chinook and 2010 Forecast. DFO Can. Sci. Advis. Sec. Sci. Advis. Rep. 2011/032.

[15] English K.K., Bocking R.C. and Irvine J.R. (1992). A robust procedure for estimating salmon escapement based on the area-under-the-curve method. Canadian Journal of Fisheries and Aquatic Science, 49, 1982-1989.

[16] Freeman M.F., Tukey J.W. (1950). Transformations related to the angular and the square root. Ann. Math. Statist., 21, 607-611.

[17] Gelman A., Meng X-L. and Stern H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807.

[18] Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1, 515-533.

[19] Hilborn R., Bue B.G. and Sharr S. (1999). Estimating spawning escapements from periodic counts: a comparison of methods. Canadian Journal of Fisheries and Aquatic Science, 56, 888-896.

[20] Isaki C.T. and Fuller W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association 77, 89-96.

[21] Kéry, M. and Schaub, M. (2012). Bayesian Population Analysis Using WinBUGS: A Hierarchical Perspective. Academic Press, Burlington, MA.

[22] King, R. (2012). A review of Bayesian state-space modelling of capture-recapture-recovery data. Interface Focus, 2, 190-204.

[23] Koller D. and Friedman N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press, Cambridge, MA.

[24] Lee A.M., Bjørkvoll E.M., Hansen B.B., Albon S.D., Stien A., Sæther B.-E., Engen S., Veiberg V., Loe L.E. and Grøtan V. (2015). An integrated population model for a long-lived ungulate: more efficient data use with Bayesian methods. Oikos. In press.

[25] Lohr S.L. (2009). Sampling: Design and Analysis. Second Edition. Boston, MA: Brooks/Cole Cengage Learning.

[26] Matechou, E., Morgan, B.J.T., Pledger, S., Collazo, J. and Lyons J. (2013). Integrated Analysis of Capture-Recapture-Resighting Data and Counts of Unmarked at Stop-Over Sites. Journal of Agricultural, Biological, and Environmental Statistics, 18, 120-135.

[27] Mazzetta, C., Morgan, B.J.T. and Coulson T. (2010). A state-space modelling approach to population size estimation. University of Warwick institutional repository, 1-27.

[28] McCrea, R.S., Morgan, B.J.T., Gimenez, O., Besbeas, P., Lebreton, J.-D. and Bregnballe, T. (2010). Multi-Site Integrated Population Modelling. Journal of Agricultural, Biological, and Environmental Statistics, 15, 539-561.

[29] Meng X-L. (1994). Posterior Predictive p-Values. The Annals of Statistics, 22, 1142-1160.

[30] Parken C.K., Bailey R.E. and Irvine J.R. (2003). Incorporating Uncertainty into Area-under-the-Curve and Peak Count Salmon Escapement Estimation. North American Journal of Fisheries Management, 23, 78-90.

[31] Pollock K.H., Jones C.M. and Brown T.L. (1994). Angler Survey Methods and their Applications in Fisheries Management. Bethesda, Maryland: American Fisheries Society Special Publication 25.

[32] Särndal C-E., Swensson B. and Wretman J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.

[33] Schaub, M., Gimenez, O., Sierro, A. and Arlettaz, R. (2007). Use of Integrated Modeling to Enhance Estimates of Population Dynamics Obtained from Limited Data. Conservation Biology, 21, 945-955.

[34] Schaub, M. and Abadi, F. (2011). Integrated population models: a novel analysis framework for deeper insights into population dynamics. Journal of Ornithology, 152 (Suppl 1), S227-S237.

[35] Schwarz C.J. and Arnason A.N. (1996). A General Methodology for the Analysis of Capture-Recapture Experiments in Open Populations. Biometrics, 52, 860-873.

[36] Sierro A., Lugon, A. and Arlettaz, R. (2009). La colonie de grands rhinolophes Rhinolophus ferrumequinum de l'église St-Sylve à Vex (Valais, Suisse): évolution sur deux décennies (1986-2006). Le Rhinolophe, 18, 75-82.

[37] Stoklosa, J., Hwang, W-H., Wu, S-H. and Huggins, R. (2011). Heterogeneous Capture-Recapture Models with Covariates: A Partial Likelihood Approach for Closed Populations. Biometrics, 67, 1659-1665.

[38] Varin C., Reid N. and Firth, D. (2011). An Overview of Composite Likelihood Methods. Statistica Sinica, 21, 5-42.

Appendix A

Supplementary materials for Chapter 2

A.1 First-order Taylor expansions

N^{-1}\hat{C}_1 \doteq \frac{1}{n_g}\sum_{i\in s_g} C_i\,\frac{\bar{A}_{oU}}{\bar{A}_{gU}} + \frac{1}{n_o}\sum_{i\in s_o}\left( \frac{A_{oi}}{\bar{A}_{gU}}\,\bar{C}_U - \frac{\bar{A}_{oU}}{\bar{A}_{gU}}\,\frac{A_{gi}}{\bar{A}_{gU}}\,\bar{C}_U \right)

N^{-1}\hat{C}_2 \doteq \frac{1}{n_g}\sum_{i\in s_g} C_i\,\bar{R}_U + \frac{1}{n_o}\sum_{i\in s_o} \bar{C}_U\,R_i - \bar{C}_U\,\bar{R}_U, \quad \text{where } R_i = \frac{A_{oi}}{A_{gi}}

N^{-1}\hat{C}_R \doteq \frac{1}{n_g}\sum_{i\in s_g} C_i\,\frac{\bar{y}_U}{\bar{C}_U} + \frac{1}{n_o}\sum_{i\in s_o}\left( y_i - \frac{\bar{y}_U}{\bar{C}_U}\,C_i \right)

A.2 Assumptions, propositions and proofs for the study of Err_party

A.2.1 Assumptions

(A1.) For every i ∈ U, I_{i1}, ..., I_{iM_i} are independent conditionally on (M_i, c_{i1}, ..., c_{iM_i}, δ_{i1}, ..., δ_{iM_i});

(A2.) For every i ∈ U and j ∈ V_i,

I_{ij} \mid (M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}) \sim \mathrm{Bernoulli}(p_i);

(A3.) For every i ∈ U, A_{oi} \xrightarrow{p} \infty;

(A4.) For every i ∈ U,

CV_i(c_{ij}) \equiv \frac{\sqrt{\frac{1}{M_i}\sum_{j\in V_i}\left(c_{ij} - \frac{1}{M_i}C_i^*\right)^2}}{\frac{1}{M_i}C_i^*} = o_p(\sqrt{M_i}).

A.2.2 Study of Err_party for the estimators \hat{C}_R and \hat{C}_{DE}

Proposition 1. Under assumptions (A1.)–(A4.),

\frac{Err_{party}(\hat{C})}{C^*} \to 0 \text{ in probability},

where \hat{C} stands for either \hat{C}_{DE} or \hat{C}_R.

Proof. We start by bounding the absolute relative error in the following way:

\frac{|Err_{party}(\hat{C})|}{C^*} = \frac{\left|\sum_{i\in U} y_i - \sum_{i\in U} C_i^*\right|}{\sum_{i\in U} C_i^*}
= \frac{\left|\sum_{i\in U} \epsilon_i C_i^*\right|}{\sum_{i\in U} C_i^*}
\le \frac{\sum_{i\in U} |\epsilon_i|\, C_i^*}{\sum_{i\in U} C_i^*}
\le \frac{\max_{i\in U}(|\epsilon_i|)\sum_{i\in U} C_i^*}{\sum_{i\in U} C_i^*}
= \max_{i\in U}(|\epsilon_i|),

where

\epsilon_i = \frac{y_i - C_i^*}{C_i^*} = \left\{ \frac{A_{oi}p_i - A_{gi}}{A_{oi}p_i}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} + 1\right)^{-1} + 1 \right\}\left(\frac{C_i - C_i^* p_i}{C_i^* p_i} - \frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i}\right).

Noting that A_{gi} and A_{oi} can be expressed as A_{gi} = \sum_{j\in V_i}\delta_{ij}I_{ij} and A_{oi} = \sum_{j\in V_i}\delta_{ij}, respectively, conditional moment calculations show that

E\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = \frac{A_{oi}p_i - A_{oi}p_i}{A_{oi}p_i} = 0,

using assumption (A2.), and

\mathrm{Var}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = \frac{A_{oi}p_i(1 - p_i)}{A_{oi}^2 p_i^2} = \frac{1 - p_i}{A_{oi}p_i} \to 0 \text{ in probability},

using assumptions (A1.)-(A3.). Also, noting that C_i and C_i^* can be expressed as C_i = \sum_{j\in V_i}c_{ij}I_{ij} and C_i^* = \sum_{j\in V_i}c_{ij}, respectively, conditional moment calculations show that

E\left(\frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = \frac{C_i^* p_i - C_i^* p_i}{C_i^* p_i} = 0,

using assumption (A2.), and

\mathrm{Var}\left(\frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = \frac{p_i(1 - p_i)\sum_{j\in V_i}c_{ij}^2}{C_i^{*2} p_i^2} = \frac{(1 - p_i)\{CV_i(c_{ij})^2 + 1\}}{p_i M_i} \to 0 \text{ in probability},

using assumptions (A1.), (A2.) and (A4.). Then, applying Chebyshev's inequality, we have that for any \epsilon > 0,

P\left(\left|\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i}\right| \ge \epsilon \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0

and

P\left(\left|\frac{C_i - C_i^* p_i}{C_i^* p_i}\right| \ge \epsilon \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0,

which imply

P\left(\left|\frac{Err_{party}(\hat{C})}{C^*}\right| \ge \epsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0 \text{ in probability}.

Taking the expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.

A.2.3 Study of Err_party for the estimator \hat{C}_1

Proposition 2. If assumptions (A1.)-(A4.) are satisfied, then for any \epsilon > 0,

P\left(\frac{\min_{i\in U}(p_i)}{\max_{i\in U}(p_i)} - 1 - \epsilon \le \frac{Err_{party}(\hat{C}_1)}{C^*} \le \frac{\max_{i\in U}(p_i)}{\min_{i\in U}(p_i)} - 1 + \epsilon\right) \to 1.

Proof. Err_party(\hat{C}_1) can be bounded above in the following way:

\frac{Err_{party}(\hat{C}_1)}{C^*} = \frac{\left(\sum_{i\in U} C_i\right)\frac{\sum_{i\in U} A_{oi}}{\sum_{i\in U} A_{gi}} - \sum_{i\in U} C_i^*}{\sum_{i\in U} C_i^*}

= \frac{\sum_{i\in U} C_i^* p_i}{\sum_{i\in U} C_i^*}\,\frac{\sum_{i\in U} A_{oi}}{\sum_{i\in U} A_{oi}p_i}\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right)\left\{\frac{\sum_{i\in U}(A_{oi}p_i - A_{gi})}{\sum_{i\in U} A_{oi}p_i}\left(\frac{\sum_{i\in U}(A_{gi} - A_{oi}p_i)}{\sum_{i\in U} A_{oi}p_i} + 1\right)^{-1} + 1\right\} - 1 \qquad (A.1)

\le \frac{\max_{i\in U}(p_i)}{\min_{i\in U}(p_i)}\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right)\left\{\frac{\sum_{i\in U}(A_{oi}p_i - A_{gi})}{\sum_{i\in U} A_{oi}p_i}\left(\frac{\sum_{i\in U}(A_{gi} - A_{oi}p_i)}{\sum_{i\in U} A_{oi}p_i} + 1\right)^{-1} + 1\right\} - 1.

Similarly, we can obtain a lower bound:

\frac{Err_{party}(\hat{C}_1)}{C^*} \ge \frac{\min_{i\in U}(p_i)}{\max_{i\in U}(p_i)}\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right)\left\{\frac{\sum_{i\in U}(A_{oi}p_i - A_{gi})}{\sum_{i\in U} A_{oi}p_i}\left(\frac{\sum_{i\in U}(A_{gi} - A_{oi}p_i)}{\sum_{i\in U} A_{oi}p_i} + 1\right)^{-1} + 1\right\} - 1.

The first conditional moment computations are analogous to the ones in Appendix A.2.2:

E\left(\frac{A_{gi} - A_{oi}p_i}{\sum_{j\in U} A_{oj}p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = E\left(\frac{C_i - C_i^* p_i}{\sum_{j\in U} C_j^* p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) = 0.

Also following from Appendix A.2.2:

\mathrm{Var}\left(\frac{A_{gi} - A_{oi}p_i}{\sum_{j\in U} A_{oj}p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \le \mathrm{Var}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0 \text{ in probability}

and

\mathrm{Var}\left(\frac{C_i - C_i^* p_i}{\sum_{j\in U} C_j^* p_j} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \le \mathrm{Var}\left(\frac{C_i - C_i^* p_i}{C_i^* p_i} \,\middle|\, M_i, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0 \text{ in probability}.

Then, we use Chebyshev's inequality in the same way as in Appendix A.2.2 and obtain that for any \epsilon > 0,

P\left(\frac{\min_{i\in U}(p_i)}{\max_{i\in U}(p_i)} - 1 - \epsilon \le \frac{Err_{party}(\hat{C}_1)}{C^*} \le \frac{\max_{i\in U}(p_i)}{\min_{i\in U}(p_i)} - 1 + \epsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 1 \text{ in probability}.

Taking the expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.

Proposition 3. If assumptions (A1.)-(A4.) are satisfied and if, for every i ∈ U,

(A5.) \frac{C_i^*}{M_i} = \mu_{ci} + O_p\left(\frac{1}{\sqrt{M_i}}\right);

(A6.) \frac{A_{oi}}{M_i} = \mu_{\delta i} + O_p\left(\frac{1}{\sqrt{M_i}}\right);

then

\frac{Err_{party}(\hat{C}_1)}{C^*} - \left(\frac{\sum_{i\in U}\mu_{ci}p_i}{\sum_{i\in U}\mu_{ci}}\,\frac{\sum_{i\in U}\mu_{\delta i}}{\sum_{i\in U}\mu_{\delta i}p_i} - 1\right) \to 0 \text{ in probability}.

Proof. In the expression for the relative error due to party sampling (A.1), we develop the factor

\frac{\sum_{i\in U} C_i^* p_i}{\sum_{i\in U} C_i^*}\,\frac{\sum_{i\in U} A_{oi}}{\sum_{i\in U} A_{oi}p_i}

into a Taylor series expansion at the point

\left(\frac{C_1^*}{M_1}, \ldots, \frac{C_N^*}{M_N}, \frac{A_{o1}}{M_1}, \ldots, \frac{A_{oN}}{M_N}\right) = (\mu_{c1}, \ldots, \mu_{cN}, \mu_{\delta 1}, \ldots, \mu_{\delta N}).

It is then easy to see that

\frac{\sum_{i\in U} C_i^* p_i}{\sum_{i\in U} C_i^*}\,\frac{\sum_{i\in U} A_{oi}}{\sum_{i\in U} A_{oi}p_i} = \frac{\sum_{i\in U}\mu_{ci}M_i p_i}{\sum_{i\in U}\mu_{ci}M_i}\,\frac{\sum_{i\in U}\mu_{\delta i}M_i}{\sum_{i\in U}\mu_{\delta i}M_i p_i} + O_p\left(\frac{1}{\sqrt{M_i}}\right),

under assumptions (A5.) and (A6.). Then, using the conditional moment calculations and Chebyshev's inequalities from Appendix A.2.3, and noting that O_p(1/\sqrt{M_i}) is already o_p(1), we get: for every \epsilon > 0,

P\left(\left|\frac{Err_{party}(\hat{C}_1)}{C^*} - \left(\frac{\sum_{i\in U}\mu_{ci}p_i}{\sum_{i\in U}\mu_{ci}}\,\frac{\sum_{i\in U}\mu_{\delta i}}{\sum_{i\in U}\mu_{\delta i}p_i} - 1\right)\right| > \epsilon \,\middle|\, M_i, c_{i1}, \ldots, c_{iM_i}, \delta_{i1}, \ldots, \delta_{iM_i}\right) \to 0 \text{ in probability}.

Taking the expected value and applying the Dominated Convergence Theorem gives the conclusion of the proposition.

A.2.4 Study of Err_party for the estimator \hat{C}_2

Proposition 4. If assumptions (A1.)-(A4.) are satisfied, then for any \epsilon > 0,

P\left(\frac{\min_{i\in U}(p_i)}{\max_{i\in U}(p_i)} - 1 - \epsilon \le \frac{Err_{party}(\hat{C}_2)}{C^*} \le \frac{\max_{i\in U}(p_i)}{\min_{i\in U}(p_i)} - 1 + \epsilon\right) \to 1.

Proof. Err_party(\hat{C}_2) can be bounded above in the following way:

\frac{Err_{party}(\hat{C}_2)}{C^*} = \frac{\left(\sum_{i\in U} C_i\right)\frac{1}{N}\sum_{i\in U}\frac{A_{oi}}{A_{gi}} - \sum_{i\in U} C_i^*}{\sum_{i\in U} C_i^*}

= \frac{\sum_{i\in U} C_i^* p_i}{\sum_{i\in U} C_i^*}\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right) \times \frac{1}{N}\sum_{i\in U}\left\{\frac{A_{oi}p_i - A_{gi}}{A_{oi}p_i}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} + 1\right)^{-1}\frac{1}{p_i} + \frac{1}{p_i}\right\} - 1

\le \max_{i\in U}(p_i)\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right) \times \frac{1}{N}\sum_{i\in U}\left\{\frac{A_{oi}p_i - A_{gi}}{A_{oi}p_i}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} + 1\right)^{-1}\frac{1}{p_i} + \frac{1}{\min_{i\in U}(p_i)}\right\} - 1.

Similarly, we can obtain a lower bound:

\frac{Err_{party}(\hat{C}_2)}{C^*} \ge \min_{i\in U}(p_i)\left(\frac{\sum_{i\in U}(C_i - C_i^* p_i)}{\sum_{i\in U} C_i^* p_i} + 1\right) \times \frac{1}{N}\sum_{i\in U}\left\{\frac{A_{oi}p_i - A_{gi}}{A_{oi}p_i}\left(\frac{A_{gi} - A_{oi}p_i}{A_{oi}p_i} + 1\right)^{-1}\frac{1}{p_i} + \frac{1}{\max_{i\in U}(p_i)}\right\} - 1.

From here, the rest of the proof is analogous to the proof in Appendix A.2.3.

A.3 Proof of the Optimal Allocation

Introducing the Lagrange multiplier λ in the asymptotic variance formula, the function to minimize is

N^2\left(\frac{1}{n_g} - \frac{1}{N}\right)S_\alpha^2 + N^2\left(\frac{1}{n_o} - \frac{1}{n_g}\right)S_\beta^2 + \lambda\,(n_g C_g + n_o C_o - B).

Setting the derivative with respect to n_g equal to zero gives the equation

-\frac{N^2 S_\alpha^2}{n_g^2} + \frac{N^2 S_\beta^2}{n_g^2} + \lambda C_g = 0.

Similarly, setting the derivative with respect to n_o equal to zero gives

-\frac{N^2 S_\beta^2}{n_o^2} + \lambda C_o = 0,

and setting the derivative with respect to λ equal to zero gives

n_g C_g + n_o C_o - B = 0.

Solving the first equation, we get

n_g = N\sqrt{\frac{S_\alpha^2 - S_\beta^2}{\lambda C_g}}. \qquad (A.2)

Similarly, solving the second equation, we get

n_o = N\sqrt{\frac{S_\beta^2}{\lambda C_o}}. \qquad (A.3)

It remains to find the value of λ. Inserting (A.2) and (A.3) into the third equation gives

\sqrt{\lambda} = \frac{N}{B}\left\{\sqrt{C_g(S_\alpha^2 - S_\beta^2)} + \sqrt{C_o S_\beta^2}\right\}.

The final expressions for n_g and n_o are obtained by inserting \sqrt{\lambda} into (A.2) and (A.3).
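As a numerical check of the closed-form allocation, the following Python sketch evaluates (A.2), (A.3) and the expression for √λ; the variance components, unit costs and budget are arbitrary illustrative numbers.

```python
import numpy as np

def optimal_allocation(N, S2_alpha, S2_beta, C_g, C_o, B):
    """Optimal numbers of ground (n_g) and overflight (n_o) days from the
    Lagrange-multiplier solution (A.2)-(A.3); assumes S2_alpha >= S2_beta."""
    sqrt_lam = (N / B) * (np.sqrt(C_g * (S2_alpha - S2_beta)) + np.sqrt(C_o * S2_beta))
    n_g = N * np.sqrt((S2_alpha - S2_beta) / C_g) / sqrt_lam
    n_o = N * np.sqrt(S2_beta / C_o) / sqrt_lam
    return n_g, n_o

# Illustrative inputs only.
n_g, n_o = optimal_allocation(N=100, S2_alpha=50.0, S2_beta=20.0, C_g=1.0, C_o=4.0, B=60.0)
print(n_g, n_o, n_g * 1.0 + n_o * 4.0)   # the last value should equal the budget B
```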

A.4 Monte Carlo measures

The Monte Carlo relative bias due to the sampling of days is computed as

RB_{days,MC}(\hat{C}) = \frac{E_{MC}(\hat{C}) - \tilde{C}}{C^*},

where

E_{MC}(\hat{C}) = \frac{1}{K}\sum_{k=1}^{K}\hat{C}_k

is the Monte Carlo expectation, with \hat{C}_k representing the estimator \hat{C} computed within the kth sample. The Monte Carlo relative root mean squared error is given by

RRMSE_{MC}(\hat{C}) = \frac{\sqrt{MSE_{MC}(\hat{C})}}{C^*}, \quad \text{where} \quad MSE_{MC}(\hat{C}) = \frac{1}{K}\sum_{k=1}^{K}\left(\hat{C}_k - C^*\right)^2.

The Monte Carlo coverage probability of confidence intervals is given by

CP_{MC}(\hat{C}) = \frac{1}{K}\sum_{k=1}^{K} I_k,

where I_k is an indicator variable of the coverage of the true total C^* by the 95% confidence interval \hat{C}_k \pm t_{n_g-1,\,0.975}\sqrt{\widehat{\mathrm{Var}}(\hat{C}_k)}. The Monte Carlo bias ratio is given by

BR_{MC}(\hat{C}) = \frac{E_{MC}(\hat{C}) - C^*}{\sqrt{\mathrm{Var}_{MC}(\hat{C})}}, \quad \text{where} \quad \mathrm{Var}_{MC}(\hat{C}) = \frac{1}{K}\sum_{k=1}^{K}\left\{\hat{C}_k - E_{MC}(\hat{C})\right\}^2.
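For completeness, here is a Python sketch computing these Monte Carlo summaries from K simulated estimates; all inputs are synthetic placeholders rather than output from the simulation study of Chapter 2.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(6)

C_true = 1000.0                           # true total C* (illustrative)
C_tilde = 995.0                           # target total C-tilde for the day-sampling bias (illustrative)
K = 1000
C_hat = rng.normal(990.0, 40.0, K)        # estimates from K simulated samples (placeholders)
se_hat = np.full(K, 40.0)                 # estimated standard errors (placeholders)
df = 9                                    # n_g - 1 degrees of freedom (illustrative)

t_crit = t.ppf(0.975, df)
covered = np.abs(C_hat - C_true) <= t_crit * se_hat

RB_days = (C_hat.mean() - C_tilde) / C_true                  # Monte Carlo relative bias (day sampling)
RRMSE = np.sqrt(np.mean((C_hat - C_true) ** 2)) / C_true     # relative root mean squared error
CP = covered.mean()                                          # coverage of the 95% confidence intervals
BR = (C_hat.mean() - C_true) / C_hat.std()                   # bias ratio (std uses the 1/K definition)

print(RB_days, RRMSE, CP, BR)
```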

A.5 Figures

Figure A.1: Time series of the ratio A_{gi}/A_{oi}, calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).

Figure A.2: Time series of the quantity y_i = C_i A_{oi}/A_{gi}, calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).

Figure A.3: Time series of the quantity B_i A_{oi}/A_{gi}, where B_i is the number of parties interviewed on day i. The quantity was calculated from the Kootenay Lake data for overflight survey days, separately for weekends (WE) and weekdays (WD).

Appendix B

Supplementary materials for Chapter 3

B.1 Monte Carlo measures used in the simulation study

Let \hat{\theta}_h be an estimate of a parameter θ obtained by analyzing the hth simulated dataset (out of H = 250) in a given scenario. The Monte Carlo bias is calculated as

\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta, \quad \text{where} \quad E(\hat{\theta}) = \frac{1}{H}\sum_{h=1}^{H}\hat{\theta}_h.

The Monte Carlo root mean square error is calculated as

RMSE(\hat{\theta}) = \sqrt{\frac{1}{H}\sum_{h=1}^{H}\left(\hat{\theta}_h - \theta\right)^2}.

The Monte Carlo bias ratio is calculated as

BR(\hat{\theta}) = \frac{\mathrm{Bias}(\hat{\theta})}{\sqrt{V(\hat{\theta})}}, \quad \text{where} \quad V(\hat{\theta}) = \frac{1}{H}\sum_{h=1}^{H}\left\{\hat{\theta}_h - E(\hat{\theta})\right\}^2.

The Monte Carlo expected credible interval length is calculated as

\mathrm{E.LCI}(\hat{\theta}) = \frac{1}{H}\sum_{h=1}^{H}\left(U_{\theta,h} - L_{\theta,h}\right),

where U_{\theta,h} and L_{\theta,h} are respectively the upper and lower bounds of the highest posterior density (HPD) credible interval for θ in dataset h. The Monte Carlo coverage probability of the credible interval is calculated as

\mathrm{CP}(\hat{\theta}) = \frac{1}{H}\sum_{h=1}^{H} I\left(L_{\theta,h} \le \theta \le U_{\theta,h}\right),

where I is an indicator function. Let \hat{\theta}_{h,L} be an estimator obtained by true joint likelihood modeling of dataset h and \hat{\theta}_{h,L_c} be an estimator obtained by composite likelihood modeling of dataset h. The Monte Carlo probability that the true likelihood method has the smallest absolute error is calculated as

\mathrm{PS.AE}(\hat{\theta}\,|\,L, L_c) = \frac{1}{H}\sum_{h=1}^{H} I\left(|\hat{\theta}_{h,L} - \theta| \le |\hat{\theta}_{h,L_c} - \theta|\right).

Let SD_{\theta,h,L} be the posterior standard deviation for parameter θ obtained by true joint likelihood modeling of dataset h and SD_{\theta,h,L_c} be the posterior standard deviation for parameter θ obtained by composite likelihood modeling of dataset h. The Monte Carlo probability that the true likelihood method has the smallest posterior standard deviation is calculated as

\mathrm{PS.SD}(\hat{\theta}\,|\,L, L_c) = \frac{1}{H}\sum_{h=1}^{H} I\left(SD_{\theta,h,L} \le SD_{\theta,h,L_c}\right).

Let L_{\theta,h,L} and U_{\theta,h,L} be the lower and upper bounds of the credible interval obtained for θ by the true joint likelihood modeling of dataset h, and let L_{\theta,h,L_c} and U_{\theta,h,L_c} be the lower and upper bounds of the credible interval obtained for θ by the composite likelihood modeling of dataset h. The Monte Carlo probability that the true likelihood method has the smallest credible interval length is calculated as

\mathrm{PS.LCI}(\hat{\theta}\,|\,L, L_c) = \frac{1}{H}\sum_{h=1}^{H} I\left(U_{\theta,h,L} - L_{\theta,h,L} \le U_{\theta,h,L_c} - L_{\theta,h,L_c}\right).
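A matching Python sketch for the pairwise comparison measures PS.AE, PS.SD and PS.LCI, again with synthetic placeholder inputs.

```python
import numpy as np

rng = np.random.default_rng(7)

theta_true = 0.5
H = 250
# Placeholder posterior summaries for the two approaches (L = true joint likelihood, Lc = composite).
theta_L  = rng.normal(theta_true, 0.020, H)
theta_Lc = rng.normal(theta_true, 0.025, H)
sd_L,  sd_Lc  = np.full(H, 0.020), np.full(H, 0.025)
len_L, len_Lc = 3.92 * sd_L, 3.92 * sd_Lc          # stand-in credible interval lengths

PS_AE  = np.mean(np.abs(theta_L - theta_true) <= np.abs(theta_Lc - theta_true))
PS_SD  = np.mean(sd_L <= sd_Lc)
PS_LCI = np.mean(len_L <= len_Lc)

print(PS_AE, PS_SD, PS_LCI)
```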

B.2 Plots of the results of the simulation study

Survival probability

(a) Scenario 1    (b) Scenario 2    (c) Scenario 3    (d) Scenario 4

Figure B.1: Plots comparing the values of φ̂ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of φ̂ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value φ. Note that the scales differ across plots.

83 Recapture probability

(a) Scenario 1    (b) Scenario 2    (c) Scenario 3    (d) Scenario 4

Figure B.2: Plots comparing the values of p̂ (posterior mean) obtained for the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region are the simulation runs for which the value of p̂ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value p. Note that the scales differ across plots.

84 Fecundity rate

(a) Scenario 1    (b) Scenario 2    (c) Scenario 3    (d) Scenario 4

Figure B.3: Plots comparing the values of fˆ (posterior mean) obtained with the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region correspond to the simulation runs for which the value of fˆ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value f. Note that the scales differ across plots.

85 Initial population size

[Scatter plots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: Nˆ1 via the composite likelihood approach (vertical axis) against Nˆ1 via the true joint likelihood approach (horizontal axis).]

Figure B.4: Plots comparing the values of Nˆ1 (posterior mean) obtained with the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region correspond to the simulation runs for which the value of Nˆ1 obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value N1. Note that the scales differ across plots.

86 Count variability

[Scatter plots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: σˆ via the composite likelihood approach (vertical axis) against σˆ via the true joint likelihood approach (horizontal axis).]

Figure B.5: Plots comparing the values of σˆ (posterior mean) obtained with the true joint likelihood approach and the composite likelihood approach in the simulation study. Each plot contains 250 data points. The points that lie in the gray hourglass region correspond to the simulation runs for which the value of σˆ obtained from the true joint likelihood approach is closer (in absolute value) to the true parameter value σ. Note that the scales differ across plots.

87 Survival probability

[Boxplots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: φˆ under the true joint likelihood approach and the composite likelihood approach.]

Figure B.6: Boxplots of φˆ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter φ in that scenario.

88 Recapture probability

[Boxplots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: pˆ under the true joint likelihood approach and the composite likelihood approach.]

Figure B.7: Boxplots of pˆ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter p in that scenario.

89 Fecundity rate

[Boxplots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: fˆ under the true joint likelihood approach and the composite likelihood approach.]

Figure B.8: Boxplots of fˆ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter f in that scenario.

90 Initial population size

[Boxplots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: Nˆ1 under the true joint likelihood approach and the composite likelihood approach.]

Figure B.9: Boxplots of Nˆ1 (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter N1 in that scenario.

91 Count variability

[Boxplots, panels (a) Scenario 1, (b) Scenario 2, (c) Scenario 3 and (d) Scenario 4: σˆ under the true joint likelihood approach and the composite likelihood approach.]

Figure B.10: Boxplots of σˆ (posterior mean) per scenario and estimation method. The horizontal line in each plot indicates the true value of the parameter σ in that scenario.

92 B.3 Bats data analysis

Parameter                                 Mean (Lc)  Mean (L)  Credible interval (Lc)  Credible interval (L)
First year survival (φ0)                  0.45       0.41      (0.32-0.58)             (0.30-0.52)
Subsequent survival (φ≥1)                 0.92       0.94      (0.88-0.97)             (0.91-0.97)
Fecundity (f)                             0.71       0.64      (0.49-0.95)             (0.48-0.82)
Presence of first year females (τ0,f)     0.83       0.84      (0.63-0.99)             (0.65-0.99)
Presence of females ≥ 1 y.o. (τ≥1,f)      0.91       0.89      (0.82-0.99)             (0.79-0.98)
Presence of first year males (τ0,m)       0.70       0.71      (0.43-0.96)             (0.45-0.96)
Presence of males ≥ 1 y.o. (τ≥1,m)        0.28       0.27      (0.12-0.47)             (0.11-0.45)
Standard deviation in counts (σ)          4.47       4.17      (2.28-7.03)             (2.29-6.38)

Table B.1: Comparison of posterior means and credible intervals between the true likelihood (L) and the composite likelihood (Lc) analyses.
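The entries of Table B.1 are standard summaries of MCMC output. As an illustration only (the draws below are synthetic stand-ins, and an equal-tailed interval is assumed, which may not match the interval type used in the analysis), posterior means and credible intervals can be computed from posterior draws as follows.

    import numpy as np

    def posterior_summary(draws, level=0.95):
        """Posterior mean and equal-tailed credible interval from MCMC draws."""
        draws = np.asarray(draws)
        tail = (1.0 - level) / 2.0
        lower, upper = np.quantile(draws, [tail, 1.0 - tail])
        return draws.mean(), (lower, upper)

    # Hypothetical draws standing in for the composite-likelihood MCMC output
    rng = np.random.default_rng(7)
    phi0_draws = rng.beta(20.0, 25.0, size=5000)

    mean, (lo, hi) = posterior_summary(phi0_draws)
    print(f"First year survival: mean {mean:.2f}, CrI ({lo:.2f}-{hi:.2f})")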

93 Appendix C

Supplementary materials for Chapter 4

C.1 Analysis of the 2012 capture-recapture data using the software MARK

This section contains results from a frequentist analysis of the 2012 capture-recapture data on Chinook salmon at Burman River. The analysis was conducted using the software MARK. As in Section 4.4, we use the POPAN formulation of the Jolly-Seber model. In addition, for each sex, we set the capture probability on the first capture occasion equal to that on the second occasion, and we impose the same equality on the last two capture occasions.

                     Estimate   95% CI
Male escapement      2767       (2141-3575)
Female escapement    1898       (1266-2845)
Total escapement     4664       (3610-5718)

Table C.1: Escapement estimates obtained from analysis of the capture-recapture data in MARK. CI denotes confidence interval.
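In MARK, equality constraints of this kind are typically imposed through the parameter index matrix, which maps each sex-by-occasion capture probability to a shared estimated parameter. The Python sketch below only illustrates that index structure; it is not MARK input syntax, and the six capture occasions in the example are an assumption for illustration, not the actual 2012 survey design.

    def popan_p_indices(n_occasions, n_sexes=2):
        """Shared parameter indices for sex-by-occasion capture probabilities,
        with p(1) = p(2) and p(k-1) = p(k) within each sex (mirrors a MARK
        parameter index matrix; illustration only)."""
        indices = []
        next_index = 1
        for _ in range(n_sexes):
            occ_to_index = {}
            for occ in range(1, n_occasions + 1):
                # Tie the first occasion to the second and the last occasion
                # to the second-to-last so each pair shares one parameter.
                shared = occ
                if occ == 1:
                    shared = 2
                elif occ == n_occasions:
                    shared = n_occasions - 1
                if shared not in occ_to_index:
                    occ_to_index[shared] = next_index
                    next_index += 1
                indices.append(occ_to_index[shared])
        return indices

    # Hypothetical example: six capture occasions, two sexes
    print(popan_p_indices(6))  # [1, 1, 2, 3, 4, 4, 5, 5, 6, 7, 8, 8]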

94