Signal-Jamming Dynamics

D. Bernhardt (University of Illinois and University of Warwick) and B. Taub (Glasgow University)∗†‡

June 22, 2018

Abstract

We examine industry dynamics and firm learning in a stationary dynamic setting. Firms are subject to persistent common-value shocks and to private-value cost shocks that are private to each firm. Firms use current and past price signals to learn about their rival's shocks, and each firm's awareness of its rival's attempt to learn leads it to try to influence that learning—that is, to signal-jam. The evolving structure of firms' private information causes firms to weigh current versus past fundamentals very differently for common-value shocks than for private-value shocks. The weights that a firm places directly on the common-value demand shocks that it observes shrink over time, exhibiting less persistence than the fundamentals, even though aggregate weights (across firms, direct and indirect via price) rise, approaching the weights on public information. Thus, firm profits from privately-observed common-value shocks evolve in opposite directions over time. In contrast, weights placed on private-value cost shocks rise over time in absolute value, but the (oppositely-signed) weights placed by the rival not hit by the shocks rise faster, implying that aggregate weights fall over time.

∗This research was carried out in part during my stay at ICEF, Higher School of Economics, Moscow. I acknowledge financial support from the Academic Excellence Project "5-100" of the Russian Government.
†We thank William Fuchs and Tibor Heumann for helpful comments.
‡IO_draft_06212018/June 22, 2018

1 Introduction

In the real world, firms are continually buffeted over time by shocks to demand and to costs, some of which they learn about through direct common observation, some through private observation, and some of which they attempt to extract from information in price signals. Some shocks are common value in nature—for example, a common demand shock that raises demand equally for each firm's product—entering directly into every firm's profit function. Other shocks are private value in nature—for example, a productivity shock specific to one firm's technology, or a firm-specific demand shock that raises demand only for one firm's product—and hence only indirectly affect a rival's profits to the extent that the shock alters the output of the affected firm. In an oligopolistic industry, firms account for the strategic behavior of rivals when unraveling information from price signals, and for how their own actions influence the price signals that rivals receive, and hence their inferences and output choices.

In a static setting, Bernhardt and Taub (2015) establish that firms weigh privately-observed private-value shocks by more than they weigh common-value shocks, making the information content of prices more sensitive to private-value shocks than to common-value shocks. Our current paper asks: What happens in a dynamic economy where firms continue to learn over time about past shocks? How do a firm's strategic choices change as it learns more from new price signals about past shocks observed by a rival, so that more of that information effectively becomes public? How do the weights firms place on newer vs. older signals evolve, and what are the consequences for the pace of learning?

To establish how such learning evolves, we analyze a dynamic stationary setting in which firms are constantly buffeted by new shocks, and strategic behavior is unimpeded by artificial horizons. We take on the challenge posed by Mirman, Samuelson and Urbano (1993), who observe that "the most appropriate model [is] an infinite horizon model in which the parameters of demand curves are subject to continual shocks. Firms are then repeatedly forced to draw inferences about unknown demand curves and to consider the effects of their actions on their rival's beliefs." Now firms can learn about rivals' private shocks via both current and past price signals. In turn, they choose output strategically knowing that current output also influences a rival's future actions.

We suppose that common-value demand evolves according to a persistent autoregressive stochastic process. In addition to a publicly-observed component, each firm privately observes innovations to common demand and to its private-value costs. Firms also receive noisy price signals. A firm combines the information extracted from the history of price signals with that in the history of its privately- and publicly-observed innovations to determine how much to produce.

We solve for the equilibrium of this dynamic oligopoly game. Our key insight is that if the underlying driving demand and cost processes are stationary, then equilibrium output strategies will be stationary linear functions of the history of private signals and prices. What makes the analysis challenging is that, in equilibrium, these functions are infinite sums of ar(1) terms: no finite set of sufficient statistics can summarize a firm's information, or its optimal behavior.
Even so, because the output strategy that maximizes a firm's expected profits conditionally is linear, it also maximizes expected profits unconditionally. We exploit this equivalence and solve this unconditional problem, using variational methods to find the optimal output functions. We develop an iterative best-response mapping by a firm to its rival's conjectured linear output strategy function. The variational first-order conditions describing the best response functions

reveal that public-information components of demand do not affect how a firm weights prices, and that only cumulative public-information demand matters for firm output, and not the timing of different shocks. In contrast, for privately-observed demand and cost shocks, not only does the aggregate level matter for output, but so does their timing—output intensity filters on privately-observed shocks are not scalar-valued. In particular, we prove that signal-jamming incentives lead firms to weigh newer private information more strongly, so that output weights on private information decay more quickly than the fundamentals themselves.

Equilibrium is described by best-response functions that are consistent with a rival's output functions. Equilibrium is given by a fixed point to the recursion that maps a rival's output function to a best-response function. When firms have private information only about demand, we establish the existence of a fixed point in the space of functions of a complex variable that are analytic and square summable on the unit disk, where the second-order conditions for optimization hold, yielding the equilibrium.

We numerically characterize the properties of equilibrium strategies. To do this, we develop an alternative best-response recursion that allows for limited private-value cost shocks. We numerically solve for the equilibrium by substituting an initial conjecture for the optimal strategy, solving for the best response, and then iteratively substituting the best responses as new conjectured optimal strategies. When private-value cost uncertainty is not too high and the driving stochastic processes are not too persistent, the recursion is numerically stable.

We disentangle and characterize the very different dynamic responses of firms to current and past demand and cost shocks. The dynamic economy magnifies incentives of firms to signal jam. This reflects that the over-production costs are the same as in a static economy, but the benefits of manipulating a rival's beliefs also accrue in future periods. As a result, a firm's output on new private information about demand exceeds static full-information levels, so the current price contains more information about contemporaneous shocks. Firms' dynamic output strategies do not merely replicate period-by-period static behavior—the equilibrium output processes are higher-order moving average, autoregressive processes, where the autoregressive parameters are smaller than those of the shock processes. This reduced persistence reflects that (1) when firms signal jam by weighing new privately-observed shocks more heavily, this (2) conveys more information to rivals via price signals (in equilibrium a rival is not fooled), (3) causing the rival to increase output at lags on that learned information, (4) which causes the firm seeing the shock to reduce its direct weight on lagged shocks more quickly.

Rich structure is imposed on the time paths and co-movements of output and profits. Output on public information does not depend on the time composition of innovations, so the impacts of innovations to publicly-known demand on output decay at a constant rate. In contrast, with privately-observed common demand shocks, the output of the firm observing a shock drops more quickly than does its rival's. Thus, vis-à-vis publicly-observed shocks, there is "catch-up" in output on older shocks by the firm that does not see the shock.
As a result, firm profits evolve together with publicly-observed demand shocks, but not with privately-observed demand shocks. In the aggregate, a same-sized demand shock has a smaller effect on output when it is privately observed, but the decay in output on that shock at lags is slower. Related characterizations hold for privately-observed private-value cost shocks. At any given lag, the two firms collectively place the same output weight on a privately-observed cost shock as one firm

places on a privately-observed common demand shock. As an "uninformed" firm learns more via prices about older shocks observed by a rival, its output weights rise on its rival's privately-observed older shocks; but the impact for the "informed" firm depends on the private/common value nature of the shock. The "informed" firm's output weights on older privately-observed common-value demand shocks fall, while its weights on privately-observed private-value cost shocks rise. This reflects that firms produce in the same direction on common values, but produce in the opposite direction on private values. As a result, aggregate output intensities (a) rise over time on privately-observed common demand shocks, reflecting both that duopolists produce more than monopolists on common values and the increased signal-jamming incentives; (b) fall over time on privately-observed private-value cost shocks, reflecting that duopolists offset each other's output on private-value shocks. Output intensities approach public information levels as more information is learned about older shocks.

Related literature. Our paper is the first to develop a dynamic analysis of strategic firm interaction when firms receive private information about fundamentals that evolve over time and internalize how their outputs influence the price signals observed by rivals, and hence their outputs. Bonatti et al. (2017) is the closest analysis. In their continuous time, finite horizon model, firms receive privately-observed private-value cost shocks at the outset, Brownian motion demand shocks influence the equilibrium price, and firms learn about cost shocks to rivals via these prices. Cost shocks do not change, so the sole dynamic fundamental is the demand shock process.1 As in our model, firms both learn about shocks to rivals from price histories, and they strategically manipulate price information by overproducing.2 Due to the finite-horizon, non-stationary structure, signal-jamming effects dominate early stages, but eventually decline in magnitude, approaching conventional oligopoly interactions once enough learning occurs.

Methodologically, our model is close to the financial literature (Kyle 1989, Bernhardt et al. 2010) where noise enters prices due to "liquidity" traders, and speculators condition on prices, internalizing how their trades influence prices. However, oligopoly and speculative incentives are completely different—speculators seek to conceal private information, while firms seek to convince rivals that both demand and costs are low. Producing more aggressively reduces current profits, but alters the inferences that rivals draw from prices, raising future profits. In contrast, trading more aggressively raises current profits, but conveys more information via prices, reducing future profits. As a result, equilibrium dynamics are starkly different. Speculative trading strategies also do not reflect public information because competitive market makers set price equal to the asset's expected value given public information. In contrast, oligopolists over-produce on public information.

Bernhardt and Taub (2015) analyze the static counterpart to our dynamic model. They motivate this as the outcome of competition in supply schedules (Klemperer and Meyer 1989), in which firms see a preliminary price that is a noisy indicator of the final price.
Vives (2011) analyzes a static model in which firms receive private noisy signals about costs; because costs are correlated across firms, one firm's signal is relevant for its rivals.3 Firms compete in supply schedules, there is no demand uncertainty, and prices are

1Thus, their model resembles the multiple-trader speculation models of Foster and Viswanathan (1996) or Back et al. (2000).
2An early signal-jamming literature explores belief manipulation in two-date models, where firms are symmetrically uninformed about demand or costs and learn from prices (Riordan 1985, Aghion et al. 1991, 1993, Mirman et al. 1993, Caminal and Vives 1996, Harrington 1996, Alepuz and Urbano 2005). Firms condition date-2 output on date-1 price, inducing firms to over-produce to lower date-1 price to try to persuade rivals that the market is less profitable.
3There is also a literature in which firms have private information about demand or costs, and take actions (e.g., limit price) to signal it. See Harrington (1986, 1987), Caminal (1990), Bagwell and Ramey (1991), or Mailath (1989).

perfectly observable. As a result, prices are privately fully revealing: in equilibrium, a firm's own cost signal and price yield the same forecast of its costs as when a firm sees the cost signals of each firm. Consequently, a dynamic version of his model would replicate the period-by-period strategic behavior. In contrast, in our model, firms must estimate the shocks observed by rivals via noisy price signals that are not privately fully revealing. Hence, in our dynamic setting, a firm must use the information in the entire history of price signals to forecast a rival's information, keeping in mind its rival's incentives to strategically influence those signals.4

Keller and Rady (2003) analyze symmetric learning about demand in a continuous time setting, where demand evolves according to a two-state Markov process and firms perfectly observe each other's actions. As a result, belief manipulation issues vanish, and there is no private information.5 We extend this literature by analyzing settings where learning from prices is entangled with the strategic efforts of firms to manipulate beliefs of rivals. Bergin and Bernhardt (2008) explore industry dynamics with uncertainty about both private-value costs and common-value demand, analytically characterizing the stationary entry and exit dynamics of a competitive industry when both demand and each firm's production costs evolve according to Markov processes.

There is also a literature on collusion with imperfect monitoring. Athey and Bagwell (2008) analyze a stationary procurement auction game in which a firm's costs (privately observed) evolve according to a two-state Markov process, and firms make cheap-talk announcements about costs prior to bidding. Histories matter for incentives, but, with cheap talk, are not used to obtain information about fundamentals. In contrast, our analysis focuses on the evolution of the strategic interaction between learning about primitives from prices and the manipulation of a rival's beliefs.

Frequency-domain approach. In the usual time-domain approach, in each period, given the history of signals that it has observed, a firm chooses its period output function to maximize expected profits, optimizing given correct beliefs about the nature of the past and future optimization by its rival. Due to our model's linear-quadratic, Gaussian, time-separable, stationary structure, the optimal policy rules can be expressed as linear weightings of information histories. Crucially, along an equilibrium path these weights do not change: they are independent of the history of realized shocks. Phrased differently, the linear weights on the history of observed shocks and signals that are optimal for a firm in each period remain optimal in future periods: along an equilibrium path, optimal strategies are stationary. Moreover, if a firm 'errs', taking an action that results in a lower expected lifetime payoff, its rival never realizes this, as all possible price signal histories are consistent with some equilibrium path, so there is no issue about how to specify off-equilibrium beliefs.
In addition, the error enters the firm's payoffs quadratically, implying that its strategies remain linear; existence of a linear equilibrium follows, and because off-equilibrium paths do not affect on-equilibrium-path optimization, it is unnecessary to explicitly solve for off-equilibrium-path behavior.6 Hansen and Sargent (1980) noted that the first-order conditions attaching to models in which expectations of future endogenous variables figure could be z-transformed, i.e., mapped to the frequency domain, and solved; the idea is akin to Fourier transforming a function. Whiteman (1989), building on Davenport and Root (1958), saw that the optimization problem itself could be z-transformed and the optimization expressed as a variational problem and solved in the frequency domain.

4See Bergemann et al. (2015) for another static model in which agents learn from both private signals and prices.
5Earlier models of learning include McLennan (1984), Aghion et al. (1991), Harrington (1995), Rustichini and Wolinsky (1995) and Keller and Rady (1999).
6For a related discussion see Foster and Viswanathan (1996), p. 1446.

The optimization problem looks for a function that embodies the weights that the optimal policy puts on the signal history. Appendix A sets out these techniques.

We go beyond this strategy in that we transform the entire dynamic game to the frequency domain and, in essence, solve the normal form version of the game, recognizing that one can ignore off-equilibrium behavior. Each firm's objective is the norm of a function involving its policy functions and those of its rival, and optimization amounts to choosing a function to maximize this norm, rather than choosing output directly. The equilibrium of this game is a set of functions—linear filters—that when transformed back to the time domain yields the optimal linear policy for the original game. To establish existence, we iterate on the best response function, which gives rise to a fixed point problem in the space of functions containing linear policy rules, whose fixed point is a Nash equilibrium in the function space (H²). Our numerical analysis uses state space methods developed in the engineering literature that provide techniques for approximating functions in the frequency domain. Proofs are set out in Appendices A-D.

Our paper also relates to a growing literature on the "forecasting the forecasts of others" endogenous information problem, with several papers attacking the problem with frequency domain methods. Kasa (2000) modeled the forecasting the forecasts of others problem using frequency-domain methods, showing that the forecasting problem alone, which arises in rational expectations models with atomistic agents, simplifies in the frequency domain, as the infinite regress that would appear in the time domain collapses to a single function in the frequency domain. Rondina and Walker (2017) model an endogenous information equilibrium problem, characterizing the equilibrium signals in the economy as a non-invertible reduced-form matrix of functions in the frequency domain. Makarov and Rytchkov (2012) study a model with small risk-averse investors who behave competitively, showing there is no finite representation of the equilibrium: because the endogenous variables have infinitely many poles, the equilibrium time domain processes cannot have Markovian dynamics—equivalently, finitely many state variables. Huo and Takayama (2015) similarly find that if information is endogenous, an infinite regress problem develops and no finite representation is possible. Nimark (2017) also looks at endogenous information aggregation in a linear rational expectations model. His approach iterates on the Euler equation stemming from a representative individual's optimization problem in a setting with endogenous variables such as prices, accounting for the dependence of those variables on the solution of the Euler equation. Using Hilbert space methods, he derives a contraction property to obtain the equilibrium. Seiler and Taub (2008) and Bernhardt, Seiler and Taub (2010) carry out an analogous iteration in the frequency domain, also leading to a contraction property; we also use this approach. Nimark approximates the equilibrium in the time domain, truncating the MA expansion of the equilibrium intensity, demonstrating a convergence property and numerically illustrating the convergence of the approximation.
We, by contrast, exploit the advantages of the frequency domain, using numerical methods to truncate the AR approximation of the frequency-domain functions that express the endogenous elements of the model.

2 A benchmark static model

Consider a homogeneous good duopoly in which firms 1 and 2 face stochastic demand

$$p(q_1, q_2) = \hat a + a_1 + a_2 - (q_1 + q_2), \tag{1}$$

where $\hat a > 0$ is public information, $a_j \sim N(0, \sigma_a^2)$ is a common-value component of demand that is private information to firm $j = 1, 2$, and $q_j$ is output by firm $j$. Firm $j$'s cost of producing $q_j$ is

$$c_j(q_j) = \hat c\, q_j + c_j q_j,$$

where $\hat c$ is public information, common to each firm, with $\hat a > \hat c \ge 0$, and $c_j \sim N(0, \sigma_c^2)$ is a private-value shock to $j$'s costs that it privately observes before producing. Because $\hat a$ and $\hat c$ enter profits according to their difference, it helps to define $\bar a \equiv \hat a - \hat c > 0$. We call $\bar a$ public information demand, but because $\bar a$ is "net of expected linear costs", it captures the expected profitability of the market.

In addition to seeing $a_j$ and $c_j$, firm $j$ observes a noisy signal $p + e$ of the price at which its output will be sold, where $e \sim N(0, \sigma_e^2)$. The noisy rational expectations literature also features the simultaneous observation of a noisy price signal and price determination. However, those models feature small "competitive" agents who do not internalize that their actions affect price, conveying information to others. In contrast, firms in our economy understand that they are "informationally large" and that their outputs affect price. Thus, firm $i$'s output solves
$$\max_{q_i \in \mathbb{R}} \; E\Big[\big(\bar a + a_1 + a_2 - (q_i + q_{-i}(p+e, a_{-i}, c_{-i}))\big)q_i - c_i q_i \;\Big|\; a_i, c_i, p + e\Big], \tag{2}$$
where $q_{-i}(p+e, a_{-i}, c_{-i})$ is the supply of its rival. That is, firm $i$ internalizes that as it varies $q_i$, it affects the market-clearing price, and hence its rival's output. To find the equilibrium we solve firm $i$'s optimization problem given beliefs that its rival's output is a linear function of its privately observed demand and cost shocks and the price signal:

$$q_{-i}(p + e, a_{-i}, c_{-i}) = \alpha_{-i} a_{-i} + \beta_{-i}\bar a - \gamma_{-i} c_{-i} + \delta_{-i}(p + e). \tag{3}$$

Substituting this structure into the price function (1) confirms that the first-order condition to (2) that describes firm $i$'s best response has the same linear structure. To solve for the symmetric equilibrium—to solve for α, β, γ, and δ—we must characterize the conditional expectation of its rival's unobserved shocks $(1-\alpha)a_{-i} + \gamma c_{-i} - \delta e$ given the net information in the price signal $(1-\alpha)a_{-i} + \gamma c_{-i} + e$, which is a linear projection on the information in firm $i$'s price signal:

$$E\big[(1-\alpha)a_{-i} + \gamma c_{-i} - \delta e \;\big|\; (1-\alpha)a_{-i} + \gamma c_{-i} + e\big] = \lambda\big[(1-\alpha)a_{-i} + \gamma c_{-i} + e\big].$$
By matching coefficients given the conjectured linear structure in the first-order condition after imposing symmetry, one can solve for α, β, γ, and δ in terms of λ:
$$\alpha = \frac{1-\lambda}{2-\lambda}, \qquad \beta = \frac{1-\lambda}{3-2\lambda}, \qquad \gamma = \frac{1}{2(1-\lambda)} \qquad \text{and} \qquad \delta = \frac{\lambda}{2(1-\lambda)}. \tag{4}$$
In equilibrium, these weights must be consistent with the solution for the projection coefficient λ, which solves the nonlinear equation

$$\lambda = \frac{(1-\alpha)^2\sigma_a^2 + \gamma^2\sigma_c^2 - \delta\sigma_e^2}{(1-\alpha)^2\sigma_a^2 + \gamma^2\sigma_c^2 + \sigma_e^2}. \tag{5}$$

Substituting the solutions in (4) for the output weights into equation (5) yields a recursion in λ whose fixed point fully describes equilibrium outcomes.

Definition 1 A symmetric static Bayesian linear equilibrium consists of a λ ∈ [0, 1] such that given λ, firms optimize, i.e., α, β, γ, and δ solve (4); and, given firm optimization, beliefs are correct, i.e., λ solves (5).

In the symmetric linear equilibrium, firms optimize given correct conjectures about the linear structure of each other's output functions and prices, and the projection coefficient λ is consistent with the linear output strategies.

To motivate how we establish existence in the dynamic game, we assume that $\sigma_c^2 = 0$, and establish a contraction mapping for a variant of the recursion in (5). Net unrevealed private information is $(1-\lambda)^2$. With no cost uncertainty, we can substitute for α and δ using (4) to rewrite (5) as
$$1 - \lambda = \frac{1}{2}\,\frac{(2-\lambda)^2\sigma_e^2}{(2-\lambda)^2\sigma_e^2 + \sigma_a^2}\,\frac{2-\lambda}{1-\lambda}. \tag{6}$$
Multiplying both sides by $1-\lambda$ yields
$$(1-\lambda)^2 = \frac{1}{2}\,\frac{(2-\lambda)^2\sigma_e^2}{(2-\lambda)^2\sigma_e^2 + \sigma_a^2}\,(2-\lambda).$$
Defining $x \equiv 1-\lambda$ yields a recursion in $x$:
$$x = \left(\frac{1}{2}\,\frac{(1+x)^2\sigma_e^2}{(1+x)^2\sigma_e^2 + \sigma_a^2}\,(1+x)\right)^{1/2} \equiv T[x]. \tag{7}$$

Proposition 1 T(x) is a contraction mapping on [0, 1]. Thus, a unique linear equilibrium λ exists.

The logic of this argument extends to the dynamic setting; but there we only establish existence and not uniqueness of the fixed point. We can characterize the fixed points numerically, and the numerical convergence found suggests that the solutions are unique.

When cost shocks are incorporated, the contraction argument extends for $\sigma_c^2/\sigma_e^2 \le 2$, but the contraction property breaks down for $\sigma_c^2/\sigma_e^2 > 2$. This is because equation (5) is an equilibrium condition in which a rival's reaction is incorporated into a firm's optimality condition, not merely a first-order condition, and the rival's reaction hinges on the amount of information in price about the private-value cost shock, resulting in a bifurcation into multiple equilibria at this threshold.

The second-order condition for firm optimization is satisfied if (and only if) λ is in the unit interval.7 For $\sigma_c^2/\sigma_e^2 \le 2$, the recursion in equation (7) also has a fixed point outside the unit interval; this fixed point does not satisfy second-order conditions and hence does not constitute an equilibrium. We will show that the fixed points of the recursion in the dynamic model are also interior to the analogue of the unit interval, and so given that they satisfy optimality, they constitute equilibria.
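To make the recursion concrete, the following minimal sketch (ours, not the paper's code) iterates the mapping $T$ in equation (7) to the fixed point $x = 1-\lambda$ and then recovers the output weights from (4). The parameter values $\sigma_e^2 = 1$ and $\sigma_a^2 = 0.5$ are taken from the base parametrization used later in Table 1; the function names are our own.

```python
import numpy as np

def T_map(x, sig_e2, sig_a2):
    """The contraction T[x] of equation (7), for the case sigma_c^2 = 0."""
    frac = (1 + x) ** 2 * sig_e2 / ((1 + x) ** 2 * sig_e2 + sig_a2)
    return np.sqrt(0.5 * frac * (1 + x))

def static_equilibrium(sig_e2=1.0, sig_a2=0.5, tol=1e-12):
    x = 0.5                        # initial guess for x = 1 - lambda
    x_new = T_map(x, sig_e2, sig_a2)
    while abs(x_new - x) > tol:    # Proposition 1: T is a contraction, so this converges
        x, x_new = x_new, T_map(x_new, sig_e2, sig_a2)
    lam = 1 - x_new
    # output weights from equation (4); gamma is irrelevant when sigma_c^2 = 0
    alpha = (1 - lam) / (2 - lam)
    beta = (1 - lam) / (3 - 2 * lam)
    delta = lam / (2 * (1 - lam))
    return lam, alpha, beta, delta

print(static_equilibrium())
```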

3 The dynamic model

We now develop a stationary model in which demand and costs evolve stochastically according to first-order autoregressive processes. Common-value period demand becomes

$$p(q_{1t}, q_{2t}) = A_1(L)a_{1t} + A_2(L)a_{2t} + B(L)a_t - (q_{1t} + q_{2t}), \tag{8}$$

7A firm's second-order condition is $-1/(1+\delta_{-i})$, which becomes positive when λ > 1.

where $L$ denotes the lag operator and where $A_1(L)$ is a linear function of the lag operator, $A_1(L)a_{1t} = \sum_{i=0}^{\infty} A_{1i}a_{1,t-i}$, and similarly for $A_2$, $B$, and so on below. The underlying stationary stochastic processes, $a_{it}$, $a_t$, and $c_{it}$, are serially-uncorrelated, zero-mean Gaussian processes, with variances $\sigma_a^2$, $\sigma_{\bar a}^2$, and $\sigma_c^2$, respectively. Publicly-observed common-value demand evolves according to
$$B(L)a_t, \qquad \text{with } B(L) = \frac{1}{1 - bL}.$$
Firm $i = 1, 2$ privately observes the common-value demand shock process
$$A_i(L)a_{it}, \qquad \text{with } A_i(L) = \frac{1}{1 - \rho L},$$
and the shocks to its constant marginal private-value costs of production,
$$C_i(L)c_{it}, \qquad \text{with } C_i(L) = \frac{1}{1 - \phi L}.$$
The persistence parameters $b$, $\rho$ and $\phi$ are all less than one in absolute value.
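For concreteness, the minimal sketch below (ours, not the paper's code) simulates the exogenous driving processes by applying the first-order autoregressive filters above to Gaussian innovations. The persistence and variance values follow the base parametrization reported later in Table 1; the assignment of the two demand variances and all names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
rho, b, phi = 0.5, 0.5, 0.5                        # persistence of a_it, a_t, c_it
sig_a, sig_abar, sig_c = np.sqrt(0.5), np.sqrt(0.5), np.sqrt(0.5)

def ar1_filter(innovations, persistence):
    """Apply 1/(1 - persistence * L) to a sequence of innovations."""
    x = np.zeros_like(innovations)
    for t, eps in enumerate(innovations):
        x[t] = eps + (persistence * x[t - 1] if t > 0 else 0.0)
    return x

a1 = ar1_filter(rng.normal(0, sig_a, T), rho)      # A_1(L) a_1t, seen only by firm 1
a2 = ar1_filter(rng.normal(0, sig_a, T), rho)      # A_2(L) a_2t, seen only by firm 2
abar = ar1_filter(rng.normal(0, sig_abar, T), b)   # B(L) a_t, publicly observed demand
c1 = ar1_filter(rng.normal(0, sig_c, T), phi)      # C_1(L) c_1t, firm 1's marginal cost
c2 = ar1_filter(rng.normal(0, sig_c, T), phi)      # C_2(L) c_2t, firm 2's marginal cost
```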

Each period $t$, firms observe price signal $p_t + e_t$, where the error $e_t$ is independently and identically normally distributed over time, $e_t \sim N(0, \sigma_e^2)$. In a static setting, firms learn about a rival's shocks by projecting on the price signal. Now, learning expands to include not only current signals, but also the updating of information in past price signals using more recent signals. With persistent privately-observed demand and cost processes, past price signals help a firm forecast the current costs and demand shocks that a rival privately observes. In addition, since past price signals are not fully revealing, the current price signal helps a firm learn more about past shocks observed by its rival: new price signals improve learning about past shocks. As a result, output intensities on fundamentals—the linear weights on the histories of fundamentals—are not just amplifications of their static analogues, and price and output processes do not simply mirror the dynamics of the fundamentals. Firms discount the future using a common discount factor $0 < \eta < 1$.

A firm's information. Firm $i$'s information at time $t$ is given by

$$\hat\omega_{it} = \{A_i(L)a_{i,s},\, B(L)a_s,\, C_i(L)c_{i,s},\, p_s + e_s,\, q_{i,s}\}_{s=-\infty}^{t}. \tag{9}$$

However, along the equilibrium path, firm i’s output history is fully pinned down by the primitives, making its observation redundant. Because we restrict focus to equilibrium paths, we can drop firm i’s output history from its information set (see Foster and Viswanathan (1996)) and focus on

$$\omega_{it} = \{A_i(L)a_{i,s},\, B(L)a_s,\, C_i(L)c_{i,s},\, p_s + e_s\}_{s=-\infty}^{t}. \tag{10}$$

That is, along an equilibrium path, firm i’s information at time t is summarized by the entire history of the primitive shocks and price signals that it observes.

A firm's problem. We solve a firm's optimization problem given a conjecture that its rival's output is a linear function of its information. Because a firm's information consists of infinite histories, we express strategies as functions of the lag operator. In turn, the price function is linear in the history of the fundamental processes. A firm's optimization problem inherits the linearity of the price function, preserving the linear-quadratic structure of the dynamic model. Thus, a firm's

best response is linear. Each firm projects its rival's private information about demand and cost onto the history of price signals.

To solve for an equilibrium, we first map the game played by firms to the frequency domain. The linearity and stationarity of the optimal strategies along an equilibrium path mean that these strategies correspond to functions of a complex variable, which act as filters on information histories. The conditional expected profit maps into an inner product that is a function of these filters. A firm's expected profit maximization problem is then a variational problem that is solved by the optimal filter. The frequency domain version of the optimal filter can in principle be mapped back to the time domain to obtain the conventional time-domain version of the optimal strategy. Rather than carry out this inverse mapping, we characterize the optimal filter directly in its frequency domain form. The frequency domain form of the filter facilitates this characterization because the persistence properties of the output process corresponding to the filter are expressed directly as autoregressive parameters. We also find the equilibrium directly in the frequency domain by using fixed point methods that exploit the Hilbert-space structure of the space in which the filters reside. The transformation to the frequency domain rests on the linearity and stationarity of the output and price functions in the equilibria. We begin by verifying this linearity.

Lemma 1 Let firm 2’s output be a linear and stationary functional of the history ω2t. Then firm 1’s best response is a stationary linear functional of its information history ω1t.

Proof: Conjecture that firm 2’s output process is a stationary linear function of its information:

$$q_{2t} = \alpha_2(L)A_2(L)a_{2t} + \beta_2(L)B(L)a_t - \gamma_2(L)C_2(L)c_{2t} + \delta_2(L)(p_t + e_t). \tag{11}$$

Substituting into price yields

$$p_t = \big(1 + \delta_2(L)\big)^{-1}\big[A_1(L)a_{1t} + (1-\alpha_2(L))A_2(L)a_{2t} + (1-\beta_2(L))B(L)a_t + \gamma_2(L)C_2(L)c_{2t} - \delta_2(L)e_t - q_{1t}\big]. \tag{12}$$

To solve firm 1's profit maximization problem, we take the conjectured linear filters of firm 2 and the implied linear structure of prices, and then optimize, solving

" ∞ X t −1 max E η (1 + δ2(L)) (A1(L)a1t + (1 − α2(L))A2(L)a2t + (1 − β2(L))B(L)at q (·) 1 0 i t +γ2(L)C2(L)c2t − δ2(L)et − q1t) − C1(L)c1t) q1t {a1s, as, c1s, ps + es}s=−∞ , (13) using the structure of price from equation (12) (but where the price function in the conditioning information is left abstract to conserve notation). The first-order condition describing firm 1’s best response to firm 2’s conjectured stationary linear strategy is

$$0 = E\Big[\big(1+\delta_2(L)\big)^{-1}\big(A_1(L)a_{1t} + (1-\alpha_2(L))A_2(L)a_{2t} + (1-\beta_2(L))B(L)a_t + \gamma_2(L)C_2(L)c_{2t} - \delta_2(L)e_t - q_{1t}\big) - C_1(L)c_{1t} - q_{1t} \;\Big|\; \{a_{1s}, a_s, c_{1s}, p_s + e_s\}_{s=-\infty}^{t}\Big]. \tag{14}$$

The linear structure of price, given the conjecture that the rival's strategy is linear, means that the price function is linear, which means that the conditional forecast of the net information in price implicit in the first-order condition (14) is a linear projection on the history of price signals. Because a projection is linear in information, and because the price signal is linear, firm 1's best response is also linear, and so firm 1's best response output function mirrors the conjectured output function of firm 2. Stationarity is immediate: the rival's conjectured linear strategy was not time-indexed; the resulting linear strategy for firm 1 is also not time-indexed, establishing the proposition. □

Substituting the linear output function into the price function yields its linear structure,
$$\begin{aligned}
p_t + e_t = \big(1 + \delta_1(L) + \delta_2(L)\big)^{-1}\big[&(1-\alpha_1(L))A_1(L)a_{1t} + (1-\alpha_2(L))A_2(L)a_{2t} + (1-\beta_1(L)-\beta_2(L))B(L)a_t \\
&+ \gamma_1(L)C_1(L)c_{1t} + \gamma_2(L)C_2(L)c_{2t} + e_t\big].
\end{aligned} \tag{15}$$

Substituting the conjectured output strategy for firm 2 from (11) and firm 1’s best response into price, and then substituting the solution for price into firm 1’s output function yields

$$\begin{aligned}
q_{1t} = {}& \alpha_1(L)A_1(L)a_{1t} + \beta_1(L)B(L)a_t - \gamma_1(L)C_1(L)c_{1t} \\
& + \delta_1(L)\big(1 + \delta_1(L) + \delta_2(L)\big)^{-1}\big[(1-\alpha_1(L))A_1(L)a_{1t} + (1-\alpha_2(L))A_2(L)a_{2t} \\
&\qquad + (1-\beta_1(L)-\beta_2(L))B(L)a_t + \gamma_1(L)C_1(L)c_{1t} + \gamma_2(L)C_2(L)c_{2t} + e_t\big].
\end{aligned} \tag{16}$$

Frequency-domain reformulation. The characteristics of the process set out in equation (16) would be straightforward were output intensities on the demand and cost shock processes simply scalar multiplications of those first-order autoregressive processes. Were this so, then even if the processes had different autoregressive parameters, when finding the common denominator to characterize the autoregressive structure, the weighted summation of the three processes in the price, with the addition of the noise term, would result in an ARMA(3,3) process. Output processes would similarly have an ARMA(3,3) structure. Indeed, if the shock processes all had the same autoregressive parameter, and output intensities were scalar, the price and the quantity processes would have an ARMA(1,1) structure. But as we will demonstrate, the strategic reactions of firms to their signals result in more complex autoregressive price and output structures.

To deal with this complexity and find the equilibrium, we transform the problem to the frequency domain, as described earlier. In the frequency domain, we find the equilibrium using an extension of the approach in the static benchmark model: we seek a fixed point in the generalization of the projection coefficient λ to the space of functions, where the analogue of λ is now a function.8 The transformation and solution procedure follows these steps:

(i) Express the firm's optimization in terms of fundamental innovations driving the exogenous processes of the model, i.e., demand and cost shocks, and noise;

(ii) Convert the firm's optimization problem to a variational form in the frequency domain, and find the variational first-order condition for the functions $\alpha_i(\cdot)$, $\beta_i(\cdot)$, $\gamma_i(\cdot)$, and $\delta_i(\cdot)$;

8Appendix A contains a guide to these methods.

(iii) Express output functions in terms of the frequency domain analogue of the static projection coefficient λ;

(iv) Impose symmetry and solve for the fixed point of this function.

Converting a firm's optimization problem. Because the same linear output rule (i.e., coefficients of the linear function) maximizes a firm's objective given any equilibrium path of private information and price signal realizations, the output strategies that solve a firm's conditional optimization problem also solve its unconditional optimization problem. This reflects that in the conditional problem, a firm $i$ can always choose the unconditional output rule. But, were that unconditional rule ever conditionally suboptimal, then integrating the conditionally optimal rule over all possible histories would yield an unconditional expected profit exceeding that attained from the unconditional output rule, contradicting the optimality of the unconditional rule. Hence, the conditional and unconditional output rules must correspond.

With this observation in hand, we express each process as a function of the fundamental innovation processes, and then exploit the serial independence of the innovation processes to construct the expected value of the objective. This expected value is equivalent to a convolution integral of the functions in the objective, which we analyze using variational methods. Because the fundamental innovations are uncorrelated, the frequency domain formulation of the objective cleaves into parts attached to the variance of each of the innovations processes.

In our static model, $1-\lambda$ was the forecast error coefficient of the projection of unobserved private information on net public information. Algebraic manipulation establishes that $1 - \lambda = \frac{1}{1+2\delta}$. The analogous object in our dynamic setting (imposing symmetry) is:

$$D(L) \equiv \big(1 + 2\delta(L)\big)^{-1}.$$

Like $1-\lambda$, $D(L)$ is a projection coefficient for the net information in the price signal: viewing $\delta(L)$ as the information in price signals used by a firm, $(1+2\delta(L))^{-1}$ captures the residual information.

To construct the objective in the frequency domain we first define some notation. For an arbitrary eligible function $f(z)$, we use $f^*$ to denote the conjugate function (and conjugate transpose in the case of a matrix of functions) $f(\eta z^{-1})$, recalling that $\eta$ is the discount factor. We use the projection or "annihilator" operator, $[\cdot]_+$, to eliminate terms with negative powers of $z$ from the Laurent expansion of a function: if $f(z) = \cdots + b_{-2}z^{-2} + b_{-1}z^{-1} + b_0 + b_1 z + b_2 z^2 + \cdots$, then $[f]_+ = b_0 + b_1 z + b_2 z^2 + \cdots$. The annihilator operator accounts for the fact that firms can weight histories of observed signals in their strategies, but not the yet-to-be-observed future realizations of signals.
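As a small illustration (ours, not the paper's), the annihilator can be represented directly on the coefficients of a truncated Laurent expansion: it simply discards the coefficients on negative powers of $z$.

```python
def annihilate(laurent_coeffs):
    """[.]_+ : drop the terms with negative powers of z from a Laurent expansion.

    The series ... + b_{-1} z^{-1} + b_0 + b_1 z + ... is represented as a dict
    mapping each power of z to its coefficient."""
    return {power: coeff for power, coeff in laurent_coeffs.items() if power >= 0}

# f(z) = 2 z^{-2} - z^{-1} + 3 + 0.5 z
f = {-2: 2.0, -1: -1.0, 0: 3.0, 1: 0.5}
print(annihilate(f))   # {0: 3.0, 1: 0.5}
```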

Substituting from (16) and (15), we write the frequency domain version of firm 1's objective as
$$\begin{aligned}
\max_{\alpha_1,\beta_1,\gamma_1,\delta_1} \oint \Big(\; & D(1-\alpha_1)\big(\alpha_1^* + \delta_1^* D^*(1-\alpha_1^*)\big)A_1 A_1^*\sigma_a^2 + D(1-\alpha_2)\,\delta_1^* D^*(1-\alpha_2^*)A_2 A_2^*\sigma_a^2 \\
& + D(1-\beta_1-\beta_2)\big(\beta_1^* + \delta_1^* D^*(1-\beta_1^*-\beta_2^*)\big)BB^*\sigma_{\bar a}^2 \\
& + (1 - D\gamma_1)(1-\delta_1^* D^*)\gamma_1^* C_1 C_1^*\sigma_c^2 + D\gamma_2\,\delta_1^* D^*\gamma_2^* C_2 C_2^*\sigma_c^2 + (D-1)\delta_1^* D^*\sigma_e^2 \Big)\frac{dz}{z}.
\end{aligned} \tag{17}$$

This objective is an inner product in the space of linear functions that are mapped by linear strategies that operate on histories. The construction of this objective is as follows. In each period the

product $(p_t - C_1(L)c_{1t})q_{1t}$ appears. Each of the terms $p_t$ and $q_{1t}$ is the sum of functions operating on the fundamentals $a_{1t}$, $a_{2t}$, $a_t$, and so on. For example, $p_t + e_t$ has the term

$$\big(1 + \delta_1(L) + \delta_2(L)\big)^{-1}(1-\alpha_1(L))A_1(L)a_{1t},$$
which is cross-multiplied by

$$\alpha_1(L)A_1(L)a_{1t} + \delta_1(L)\big(1 + \delta_1(L) + \delta_2(L)\big)^{-1}(1-\alpha_1(L))A_1(L)a_{1t}$$
from firm 1's output $q_{1t}$ in equation (16). The cross-products of these elements with all other terms are zero because the underlying stochastic processes are uncorrelated. The optimization is in the frequency domain: firm 2's strategy is expressed as linear functions $\alpha_2$, $\beta_2$, $\gamma_2$, and $\delta_2$; firm 1 then chooses a vector of best response functions $\alpha_1$, $\beta_1$, $\gamma_1$, and $\delta_1$.

Solution of a firm's variational optimization problem and equilibrium. As a prelude to computing the variational first-order conditions characterizing best responses, we note that

$$\frac{\partial D}{\partial \delta_1} = -D^2 \qquad\text{and}\qquad \frac{\partial\, \delta_1 D}{\partial \delta_1} = (1+\delta_2)D^2.$$
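These identities can be checked symbolically; the sketch below (ours, not the paper's) reads $D$ as $(1+\delta_1+\delta_2)^{-1}$, i.e., the pre-symmetry version of the function defined above, which is an assumption about the intended notation.

```python
import sympy as sp

d1, d2 = sp.symbols("delta_1 delta_2")
D = 1 / (1 + d1 + d2)   # D before symmetry is imposed (assumption for this check)

print(sp.simplify(sp.diff(D, d1) + D**2))                  # 0: dD/d(delta_1) = -D^2
print(sp.simplify(sp.diff(d1 * D, d1) - (1 + d2) * D**2))  # 0: d(delta_1 D)/d(delta_1) = (1+delta_2) D^2
```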

It is useful to gather terms by implicitly defining two objects, $F$ and $J$, which solve
$$F^*F \equiv D(1-\delta_1^* D^*) + D^*(1-\delta_1 D) \tag{18}$$

$$J^*J \equiv (1-\alpha_2)(1-\alpha_2^*)A_2 A_2^*\sigma_a^2 + \gamma_2^*\gamma_2 C_2 C_2^*\sigma_c^2 + \sigma_e^2. \tag{19}$$
The static analogue of $F^*F$ is $2(1-\lambda) - \lambda$: it is the projection coefficient structure corresponding to the net information in the noisy price signal (the $2(1-\lambda)$ term) after a firm has extracted information from the price signal (the $-\lambda$ term). The function $J$ is the filter characterizing the information process from firm 1's observation of the noisy price signals, the dynamic analogue of the net information $(1-\alpha)a_2 + \gamma c_2 + e$ in the price signal in a static setting.

The variational derivatives—the Euler equations—of the frequency domain objective (17) are asymmetric, reflecting the ability of firms to weight histories of signals, but not future realizations of the signals; equations of this type are called Wiener-Hopf equations. The solution strategy must account for this. That is, the variational first-order condition nominally resembles the first-order condition for a static quadratic optimization problem, which might be abstractly represented as an equation $My = Bx$, where the objective is to solve for $y$. This would be conventionally done by inverting the $M$ matrix. However, in the frequency domain formulation, this inversion cannot be done because it implicitly requires putting weights on future realizations of the history. To circumvent this inversion problem, one follows four steps (step (i) is illustrated in the sketch below): (i) factor the $M$ matrix into the product of two matrices, $F$ and $F'$, where $F$ corresponds to the weighting of histories, and $F'$ corresponds to the (infeasible) weighting of future realizations; (ii) invert $F'$; (iii) apply a projection to the resulting right hand side—the $[\cdot]_+$ operator that eliminates terms that weight future realizations; and finally (iv) invert $F$; the inverse of $F$ continues to weight past realizations, so it is a "legal" operation. The resulting formula is equivalent to constructing a linear least squares projection—a regression—on the history.
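As a minimal scalar illustration of step (i) (our sketch, not the paper's algorithm), consider the spectral density of the net price-signal process in (19) with no cost shocks and with $(1-\alpha)$ treated as a scalar: it has the AR(1)-plus-noise form $\sigma_s^2/((1-\rho z)(1-\rho z^{-1})) + \sigma_e^2$, with $\sigma_s^2 \equiv (1-\alpha)^2\sigma_a^2$, and factors as $J(z)J(z^{-1})$ with $J$ one-sided and invertible. The function name and parameter values are our own.

```python
import numpy as np

def factor_ar1_plus_noise(sig_s2, rho, sig_e2):
    """Factor S(z) = sig_s2/((1-rho z)(1-rho/z)) + sig_e2 as J(z) J(1/z),
    with J(z) = sqrt(kappa) (1 - theta z)/(1 - rho z) and |theta| < 1."""
    # write the numerator of S over a common denominator as c0 + c1 (z + 1/z)
    c0 = sig_s2 + sig_e2 * (1 + rho ** 2)
    c1 = -sig_e2 * rho
    # match kappa (1 + theta^2) = c0 and -kappa theta = c1, keeping the invertible root
    r = -c1 / c0
    theta = (1 - np.sqrt(1 - 4 * r ** 2)) / (2 * r) if r != 0 else 0.0
    kappa = -c1 / theta if theta != 0 else c0
    return kappa, theta

kappa, theta = factor_ar1_plus_noise(sig_s2=0.5, rho=0.5, sig_e2=1.0)

# verify J(z) J(1/z) = S(z) at a few points on the unit circle
z = np.exp(1j * np.linspace(0.1, 3.0, 5))
S = 0.5 / ((1 - 0.5 * z) * (1 - 0.5 / z)) + 1.0
JJ = kappa * (1 - theta * z) * (1 - theta / z) / ((1 - 0.5 * z) * (1 - 0.5 / z))
print(np.max(np.abs(S - JJ)))   # ~ 1e-16
```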

We now take variational derivatives and exploit symmetry to solve for the Wiener-Hopf equations describing a firm's optimal output weight on each of its sources of information:

Proposition 2 Firm $i$'s optimal filters on its direct information sources, $a_i$, $\bar a$ and $c_i$ are

$$\alpha_i = F^{-1}A^{-1}\Big[F^{*-1}\big(D(1-\delta^* D^*) - D\delta D^*\big)A\Big]_+$$
$$\beta_i = F^{-1}B^{-1}\Big[F^{*-1}(1-\beta)\big(D(1-\delta^* D^*) - D^*\delta D\big)B\Big]_+$$
$$\gamma_i = F^{-1}C^{-1}\Big[F^{*-1}(1-\delta^* D^*)C\Big]_+ \tag{20}$$
Output weights on price signals satisfy the recursive system
$$J^*J = F^{-1}\Big[F^{*-1}D^* A\Big]_+\Big[F^{*-1}D^* A\Big]_+^* F^{*-1}\sigma_a^2 + \sigma_e^2 \tag{21}$$
$$D = \frac{1}{2}J^{-1}\Big[J^{*-1}\sigma_e^2\Big]_+ + \frac{1}{2}J^{-1}\left[J^{*-1}\left(\frac{D+D^*}{1+D^*}\right)\right]_+. \tag{22}$$

To establish (22), the key step is to use the envelope property for the other filters to simplify the $D$ recursion, as the Wiener-Hopf equation for δ has incorporated within it the Wiener-Hopf equations for α, β, and γ. Crucially, these terms therefore drop out.

Our notion of equilibrium is standard: we seek functions in the appropriate space and with appropriate optimizing restrictions for each firm given the functions of its rival. These functions are elements of the space $H^2[\eta]$, i.e., functions of a complex variable that are analytic and square-integrable on the disk $\{z : |z| < \eta^{-1/2}\}$.

Definition 2 A stationary symmetric Bayesian Nash equilibrium is a set of analytic functions $\{\alpha, \beta, \gamma, \delta\}$ in $H^2[\eta]$ such that

(i) {α, β, γ, δ} are optimal for firm i given firm −i’s identical strategy, and the net information in the price signal process (15);

(ii) The linear filter δ is consistent with firm −i’s choice of {α, β, γ, δ}, i.e., the conditional linear forecast from price is consistent with the output intensities of the rival firm;

(iii) The price process is comprised of the processes generated by the optimal {α, β, γ, δ}.

This definition is structured in the frequency domain, and, as such, does not characterize off-equilibrium path behavior. We need not worry about off-equilibrium paths because if a firm errs, its rival does not perceive this, so considerations of off-equilibrium paths do not influence equilibrium path behavior. We establish existence of equilibrium when there are no private-value cost shocks.

Proposition 3 Let $\sigma_c^2 = 0$ (no private-value cost shocks). Then an equilibrium exists.

The proof is in Appendix C. The fixed point argument uses the recursive system (21)-(22). First, defining the right-hand side of (22) by S(D), we write the recursion as

D = S(D),

defining a recursion in $D$ (equation (21) is ancillary). We show that $S(D)$, which is a continuous mapping, is bounded by a function $T(D)$, and that this bounding function $T(D)$ is itself a contraction on the unit disk and as such has a unique fixed point. It follows that $S$ has a fixed point. We then use the Szegő form of the function to establish that the fixed point is not at $D = 0$. We believe this approach to demonstrating the existence of a fixed point to be original.

The frequency-domain algebraic operations in the recursion (factorization, inversion, and projection) ensure that the mapping is contained in the analogue of the unit interval—ensuring that any fixed point of the recursion satisfies optimality for each firm. While our proof applies to the situation with no cost shocks, we conjecture that existence extends to allow for cost shocks, as long as the variance of those shocks is not too large in magnitude. Some intuition comes from the static model (Bernhardt and Taub, 2015): there, existence holds even with cost shocks, but once the variance of the cost shocks exceeds a threshold, the contraction property no longer holds, and, indeed, for large enough variances we cease to obtain existence.

Characterization. Observe that the public component of the demand process, $B(L)a_t$, does not affect a firm's optimal output weight on prices. To see this, note that $B(L)a_t$ does not enter the equilibrium functions $F$, $J$, or $D$. The following is immediate.

Corollary 1 The public information component of common-value demand affects neither the filtering of the price process by firms, nor firm output weights on private information.

The converse is not true—the private information components of demand and cost affect output weights on publicly-known demand both directly, and indirectly via the output weights on privately-observed demand, through the functions $D$ and $F$ that appear in the solution for β. Corollary 1 implies that we could have added any deterministic component to demand, and solved for the equilibrium: this deterministic demand component would have no effects on the portions of output that reflect private information or on the information contained in prices.

Proposition 2 and Corollary 1 show that for publicly-observed demand shocks, only the total level of those shocks and not their timing affect a firm's output. In sharp contrast, even though the underlying economy is stationary, more than the total level of the privately-observed components of demand and costs matter for a firm's output—the timing of the shocks also matters. Were the timing not to matter, then the intensity filters would be scalar-valued.

We now establish that the equilibrium output processes are higher-order moving average, autoregressive processes, where the autoregressive parameters are smaller than those of the shock processes, indicating reduced persistence. The non-scalar nature of the output responses is not due simply to the fact that the firms attempt to filter the noisy price signals via signal extraction, that is, to construct an estimate of the underlying exogenous driving processes: the firms do carry out signal extraction, but they also actively engage in signal jamming that alters the time series structure of output in ways that signal extraction does not. That is, the reduced persistence reflects that (1) firms signal jam by weighing new privately-observed shocks more heavily, which (2) conveys more information to rivals via price signals (in equilibrium a rival is not fooled), (3) causing the rival to increase output at lags on that learned information as it grows more informed, (4) which causes the firm seeing the shock to reduce its direct weight on lagged shocks more quickly.

We summarize these results as follows:

Proposition 4 In equilibrium:

(i) the intensity filters $\alpha_i$, $\beta_i$, and $\gamma_i$ are not scalar-valued: output intensities are not just amplifications of the dynamic shock processes;

(ii) the autoregressive structure of output is due to strategic behavior and is not the result of signal extraction alone;

(iii) the direct intensity (αi) of a firm’s output process is less persistent than the underlying common-value demand shock process.

The proof first shows that if the firms engaged in signal extraction from prices but did not attempt to influence the signal via strategic output, then the poles (equivalently, the autoregressive coefficients) of the output processes would be identical to the poles of the exogenous demand processes. We then prove that the poles of the output processes are not equal to the poles of the demand processes and that in any equilibrium there are infinitely many poles. Finally, we show that the additional poles generated in an equilibrium exceed the pole of the demand process, implying reduced persistence of the output process relative to the primitive shock processes.

4 Numerical properties of the dynamic model

The complexity of the system of equations (18), (21), and (22), the fixed point of which expresses an equilibrium, precludes analytic solutions, leading us to use the numerical methods laid out in Appendix E.9 The key step in our numerical analysis transforms the $D$ recursion in equation (22) into one that is the dynamic analogue of the contractive recursion for the static economy (equation (7)). To solve the model numerically, we substitute an initial conjecture on the right-hand side of the equation, find the numerical solution, and then iteratively repeat the process until a tolerance is satisfied. There are three separate sub-algorithms associated with each step of the numerical recursion: minimal realization, balanced truncation, and spectral factorization; we exploit an algorithm for spectral factorization developed in Taub (2009), and applied by Seiler and Taub (2008) and Bernhardt, Seiler, and Taub (2010). These numerical state space algorithms result in approximations, and as such they are accurate only up to a tolerance; each algorithm has a literature around it concerning the bounds on the approximation error associated with the tolerance. In addition, there is a Cauchy criterion tolerance associated with the improvement in the numerical calculation of $D$ in each step of the iteration. Numerically, we find that the algorithm behaves contractively as long as the cost shock variance is sufficiently small and the driving processes are not too persistent.

Table 1 presents the base parameterization for our numerical example, and Table 2 presents the tolerances associated with each sub-algorithm, as well as the Cauchy criterion tolerance for the

9The state space methods that we use are more tractable than numerical approaches that revolve around assuming that past shocks become common knowledge after some (large) number of periods $T$ elapses. The variance-covariance matrix describing the conditional correlation of firm information that one needs to solve grows with $T$, requiring finding on the order of $T^2$ covariances. This makes it computationally infeasible to approximate the infinite-$T$ case.

Table 1: Base Parametrization

η     σ_a²   σ_ā²   σ_c²   σ_e²   ρ     α     b     φ
1     0.5    0.5    0.5    1.0    0.5   0.5   0.5   0.5

Table 2: Numerical algorithm tolerances

Spectral factorization   1 × 10⁻⁸
Minimal realization      0.0001
Balanced truncation      0.1
Cauchy convergence       1 × 10⁻⁶

Table 3: Approximate filter solutions

α(z) = 0.54 − 0.117/(1 − 0.435z) + 0.01/(1 − 0.045z)
β(z) = 0.36 − 0.052/(1 − 0.472z)
γ(z) = 0.54 + 0.27/(1 − 0.435z) + 0.01/(1 − 0.045z)
δ(z) = 0.00 + 0.184/(1 − 0.472z)

overall iteration.10 Each exogenous stochastic process is given by an ar(1) process with persistence parameter 0.5. Thus, the privately observed demand shock process is
$$a_{it} = \frac{1}{1 - .5L}\,e_{it},$$
with corresponding filter representation
$$\frac{1}{1 - .5z}. \tag{23}$$
The resulting numerical recursion is stable and converges in 28 iterations when the cost variance $\sigma_c^2$ is not too high relative to the noise in the signal, and the underlying stochastic processes are not too persistent, with a Cauchy criterion of $8.57\times 10^{-7}$. Table 3 presents the numerical solutions for the equilibrium filters. The numerically-approximated equilibrium filters have two AR(1) terms.

One might naively posit that firms would just linearly offset the fundamental shocks. For example, this would happen in a repeated version of Vives (2011). Were this so, the equilibrium output intensity filters would also be ar(1) processes with ar parameter 0.5, i.e., they would replicate the autoregressive structure of the driving processes as in equation (23). In fact, they have far more complicated structures, with significant ma parts, highlighting how the dynamic environment induces rich strategic behavior. The dominant autoregressive coefficient for the output filter on privately-observed demand shocks, α(z), is only 0.435, which is well below the ar coefficient of 0.5 that describes the persistence in the primitive demand shock process. In addition, the leading constant of .54 on the output process adds a significant moving average component to the filter, which

10We actually consider η → 1; this choice does not affect the qualitative findings.

further reduces the persistence in output stemming from a firm's filtering of the private demand shocks. We can express the moving average aspect explicitly by calculating the rational function corresponding to the ARMA representation of α(z):
$$0.54 - \frac{0.117}{1 - 0.435z} + \frac{0.01}{1 - 0.045z} = \frac{0.433\,(1 - 0.0442z)(1 - 0.811z)}{(1 - 0.435z)(1 - 0.045z)},$$
which has a substantial MA part (the numerator) relative to the AR part (the denominator).

The reduced persistence in the autoregressive structure of output relative to the underlying processes—and concomitantly the deviation of the autoregressive structure of the price process—reflects the attempts by firms to manipulate a rival's beliefs via signal jamming, and the learning that firms achieve in equilibrium. To understand the strategic forces, first recognize that vis-à-vis a static environment, the incentive to signal jam rises, reflecting that the payoff from convincing a rival that the market is less promising also accrues in future periods. In equilibrium, the rival unravels this, and is not fooled. In fact, because a firm weights its current private signals by more, prices now contain more information about those signals, speeding learning. In turn, both firms weigh the information in prices more aggressively—and with the rival producing more, and outputs being strategic substitutes, the firm seeing a private demand shock reduces its direct weights on its private signals at lags, both because its total output intensity falls due to its rival's increasing weight, and because the firm itself increases its indirect weight on its private signal via its own weight on the price signal.

A more revealing way to convey the intrinsic economics is to examine the total output weights on the $a_{it}$, $c_{it}$, and $a_t$ innovations at different lags, i.e., the impulse response functions. For example, exploiting symmetry in the optimal output process in equation (16) and consolidating the direct filtering of $a_{it}$ with the indirect filtering via price yields the total weight filter on the $a_{1t}$ process:
$$\left(\alpha(L) + 2(1-\alpha(L))\frac{\delta(L)}{1+2\delta(L)}\right)A_1(L). \tag{24}$$
Because the filters have internal multiplications, to isolate the output weight on a lagged innovation $a_{1,t-k}$ one must add all of the coefficients hitting it, which can be done by calculating the coefficients of successive powers of $L$ in the filter in equation (24), and the analogous filters for the other fundamental processes. Recalling that the $k$th lag enters date $t$ demand according to $\rho^k a_{1,t-k}$, we calculate the output weight on $\rho^k a_{1,t-k}$, rather than $a_{1,t-k}$ directly.11 Then by comparing the magnitudes of the impulse response coefficients at different lags one can see whether output intensities rise or fall—if the coefficients are the same at all lags, then the weight a firm places on the residual contribution of the shock does not change. Table 4 presents the lag decomposition; a sketch of the computation follows below. Exploiting symmetry reveals that the sum of the weights on $a_{it}$ and $a_{-it}$ equals the total weight that a firm places on its privately-observed demand shocks (i.e., the sum of the direct (α) and indirect (via prices) output weights), while the sum of the weights on $c_{it}$ and $-c_{-it}$ captures a firm's total weight on its privately observed cost shocks.
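The sketch below (ours, not the paper's code) carries out this lag decomposition for the $a_{1t}$ process, expanding the filters from Table 3 as power series and splitting the total weight in (24) into its direct ($\alpha$) and indirect (via the price signal) parts; dividing by $\rho^k$ reports weights per unit of $\rho^k a_{1,t-k}$, as in Table 4 below. The sign on the autoregressive term of δ(z) is taken as positive here; with that reading the lag-0 weights match the 0.43 and 0.08 entries of Table 4.

```python
import numpy as np

N = 6
rho = 0.5                                   # persistence of the a_it process (Table 1)

def ar1(c, pole, n=N):
    """Power-series coefficients of c / (1 - pole * z) up to z^(n-1)."""
    return c * pole ** np.arange(n)

def mul(f, g, n=N):
    return np.convolve(f, g)[:n]

def inv(f, n=N):
    """Power-series inverse of f (requires f[0] != 0)."""
    g = np.zeros(n)
    g[0] = 1.0 / f[0]
    for k in range(1, n):
        g[k] = -np.dot(f[1:k + 1], g[k - 1::-1]) / f[0]
    return g

# approximate filters from Table 3
alpha = ar1(0.54, 0.0) - ar1(0.117, 0.435) + ar1(0.01, 0.045)
delta = ar1(0.184, 0.472)                   # AR term sign taken as positive (assumption)
A1 = ar1(1.0, rho)                          # A_1(z) = 1/(1 - rho z)

one_minus_alpha = -alpha.copy()
one_minus_alpha[0] += 1.0
h = mul(delta, inv(ar1(1.0, 0.0) + 2 * delta))      # delta / (1 + 2 delta)

direct = mul(alpha, A1)                             # firm i's direct weight on a_{i,t-k}
indirect = mul(mul(one_minus_alpha, h), A1)         # weight reaching output via the price signal
scale = rho ** np.arange(N)
print(np.round(direct / scale, 2))                  # compare with the a_it row of Table 4
print(np.round(indirect / scale, 2))                # compare with the a_-it row of Table 4
```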
Thus, the total weight of 0.51 on $a_{it}$ at lag 0 equals the sum of the indirect weight of 0.08 via price signals (which equals the weight placed on the rival's shock via the price signal that $i$ receives) plus the direct weight of 0.43 that $i$ places directly on the common value demand shock that it privately observes.
In a static setting, output weights equal the monopoly output intensities—but, in a dynamic setting with persistent shock processes, output intensities on new privately-observed shocks rise even

11Observe that scaling up output weights at lags by $\rho^k$ also magnifies the approximation errors.

Table 4: Impulse response coefficients

Lag                 0      1      2      3      4      5
$2\bar a_t$         0.71   0.71   0.71   0.71   0.71   0.71
$a_{it}$            0.43   0.33   0.24   0.17   0.10   0.04
$a_{-it}$           0.08   0.14   0.20   0.25   0.29   0.33
$a_{it}+a_{-it}$    0.51   0.47   0.44   0.42   0.39   0.37
$c_{it}$           -0.51  -0.48  -0.44  -0.42  -0.39  -0.37
$c_{-it}$           0.09   0.18   0.25   0.31   0.36   0.41

further. That is, the 0.51 total weight on the current privately-observed innovation to demand, $a_{it}$, and the -0.51 total weight (across firms) on firm $i$'s current cost shock12 exceed their static analogues of 0.5 and -0.5. Again, this increased weight reflects that a firm takes into account that its output on current shocks also affects a rival's future outputs. Thus, when a privately-observed demand innovation is positive, increasing the output weight past 0.5 not only helps in the current period by inducing a rival to cut back via the reduction in current price, but the lower current price also reduces its rival's forecast of the innovation in future periods, causing the rival to reduce its future outputs.
The observation that the total weight that firms place on new privately-observed cost shocks equals the weight that a firm places on privately-observed demand shocks extends to all lags: at any given lag, the weight on $c_{it} + c_{-it}$ equals that on $a_{it}$: a firm has the same (up to numerical error) output weight on the common-value demand shock that it privately observes as the two firms together have on a privately-observed, private-value cost shock at that same lag. This result reflects that a firm can forecast a rival's indirect weights via price signals on the shocks that it privately observes, and account for them in its output, offsetting them one-for-one. However, the rival's indirect outputs on private and common values enter a firm's first-order conditions linearly with the opposite signs (i.e., a rival's output on common value demand reduces its impact, while its output on private values reinforces its impact).
A second notable feature of Table 4 is that aggregate (across firms) output intensities on public information ($2\bar a_t$) are the same at each lag: what matters is the total level of known demand, $\sum_{k=0}^{\infty}\rho^k \bar a_{t-k}$, and not the values of the shocks that comprise it. The weight of 0.71 exceeds its static analogue of 0.69, highlighting again that signal jamming incentives rise in the dynamic economy. These incentives rise for two reasons. First, a firm gains from overproducing to manipulate a rival into believing that current demand is lower than it is both in the current period and in future periods, as the rival weighs past price signals in future output choices. Second, firms produce more intensively on older information, raising the information content of a price signal, causing the sensitivity of a rival's output to a price signal to rise. This heightened sensitivity increases the gain from over-producing on publicly-known common values in the dynamic economy.
A third notable feature is that direct output intensities on the private information components that a firm observes fall over time, while indirect output intensities via price signals rise. At longer lags, increased learning by a rival causes a firm to step back its direct output intensity on the

12Equivalently, -0.51 is the direct weight that a firm places on its lag 0 private cost shock.

common value demand shocks that it privately observes.13 The economic forces underlying this are simple. A rival firm learns more about older shocks from seeing additional price signals, which causes both firms to increasingly weight these shocks via their price signals. The rival increases its output intensity both because it learns more and because there are synergies between learning and signal jamming. In turn, to make room for the increased output on the learned part of the common value shock, the firm seeing the shock cuts back its direct weight. Relative to the output weights on innovations to publicly-known demand, which decay at a constant rate, output by a firm privately seeing a demand shock falls off more sharply, and there is "catch-up" in the output weights on those privately observed shocks by the rival that does not see the shock as it learns about them via price. The weights placed by a firm directly on the common value demand shocks that it observes shrink at lags, exhibiting less persistence than the exogenous demand shock process, even though aggregate (across firms, direct and indirect via price) weights rise, approaching weights on public information. Thus, firm profits from privately-observed common value shocks, and the rival's profits from output on those shocks as it learns about them, evolve in opposite directions over time.
Output intensities on private value cost shocks evolve similarly. As a firm learns more about its rival's private value cost shocks, it produces more intensively in the opposite direction. The higher intensities on cost shocks reflect their private value character—when a firm has a high cost shock, it produces less, leading a rival to produce more via learned information in price, and this causes the firm to weigh the cost shock by even more. This additional response amplifies the information revealed by the price signal about the shock. In contrast, the opposite happens for demand shocks because firms produce in the same direction on demand shocks. To see the consequences, compare the $a_{-it}$ and $c_{-it}$ rows of Table 4: the intensity on the learned information about the rival's common-value demand shocks grows over time, but the intensity on the learned cost shocks grows even faster. This leads the firm hit by the cost shock to further increase its output intensity, so that price signals convey private information even more quickly.
This impact of strategic behavior can be illustrated by increasing the variance of the noise process, $\sigma_e^2$, which reduces the information content of the price signals, and hence dampens strategic behavior. Doubling the noise variance to $\sigma_e^2 = 2$, leaving other parameters unchanged, yields the filters in Table 5. Relative to our main simulation in Table 3, the constant terms in $\alpha$ and $\gamma$ move closer

Table 5: Approximate filter solutions with noisier prices ($\sigma_e^2 = 2$)

$\alpha(z) = 0.52 - \dfrac{.057}{1-.485z} + \dfrac{.0045}{1-.045z}$

$\beta(z) = 0.34 - \dfrac{.022}{1-.503z}$

$\gamma(z) = 0.52 + \dfrac{.055}{1-.485z} + \dfrac{.005}{1-.045z}$

$\delta(z) = 0.00 - \dfrac{.072}{1-.501z}$

to a scalar value of 1/2, and $\beta$, the intensity on publicly observable shocks, moves closer to the static Cournot value of 1/3. The weights on the AR terms are reduced, and the autoregressive parameters move closer to the fundamental value of .5—i.e., the firms are in essence just multiplying the driving

13Direct output intensities approach zero by lag 5. At even longer [unreported] lags, a manifestation of the growing approximation error is that the numerical solutions for direct output intensities become negative.

processes by a scalar, rather than fundamentally altering the persistence via strategic behavior. Finally, the weight on price shrinks, reflecting the reduction in useful information in price signals.
More generally, one can show that the effects of varying the primitives of the economy are the expected ones. In particular, output weights on publicly-known common values (a) increase in the private information fundamental volatilities $\sigma_a^2$, $\sigma_c^2$, and in the persistence of those processes, $\rho$, $\phi$; and (b) are more sensitive to privately observed private-value cost uncertainty ($\sigma_c^2$) than to privately-observed common-value demand uncertainty ($\sigma_a^2$). These comparative static results reflect the consequences of the primitives for the informativeness of prices, and hence for how firms weigh price in their outputs, which determines signal-jamming incentives. In turn, reflecting the consequences for the informativeness of prices, indirect output weights via price (a) rise with the persistence and volatility of the private information processes, which speed learning, and (b) are more sensitive to $\sigma_c^2$ than to $\sigma_a^2$.

5 Conclusion

Our paper takes on a challenge posed by Mirman, Samuelson and Urbano (1993) who observe that "the most appropriate model [is] an infinite horizon model in which the parameters of demand curves are subject to continual shocks. Firms are then repeatedly forced to draw inferences about unknown demand curves and to consider the effects of their actions on their rival's beliefs." We consider a stationary setting in which firms are hit by common value demand shocks and private value cost shocks that they privately observe. Firms combine information extracted from the history of price signals with the history of privately- and publicly-observed innovations to determine outputs. That firms learn about older innovations means that firm information changes dynamically. The firms' mutual attempts to influence beliefs, and to extract information from the endogenous price signals, lead to an infinite regress—not just forecasting the forecasts of others, but influencing the forecasts of others via signal jamming. This makes a finite representation of the equilibrium impossible. Nevertheless, we demonstrate existence via a contraction argument. The contraction property allows us to approximate the equilibrium numerically. We combine several methods from the engineering literature to carry out this numerical approximation.
We prove that the public information components of demand do not affect the weights that firms place on price signals and private information; and that only the total magnitude of the public information component of demand, and not its intertemporal composition, enters a firm's output choice. In contrast, the evolving structure of the firms' private information due to signal jamming causes firms to change how they weigh current versus past fundamentals in complicated ways: they do not simply amplify shocks directly. For example, as more information is revealed via price signals about older privately-observed common-value demand innovations, collective output weights on their residual contribution to current demand rise. As a result, the autoregressive structure of the outputs does not simply mirror the autoregressive structure of the inputs: the structure is altered by the firms' strategic behavior. These dynamic characterizations impose testable restrictions on the stochastic processes of prices, individual firm outputs and profits, and their co-movements.
We consider two firms. To extend the analysis to N > 2 firms, one must use a vector formulation of the translation to the frequency domain to deal with the vector of fundamental processes, complicating the analysis (see Ball and Taub 1991). Bernhardt, Seiler, and Taub (2010) provide such an analysis in a strategic financial speculation model.

References

[1] Aghion, P., Bolton, P., Harris, C. and Jullien, B., “Optimal learning by experimentation”, Review of Economic Studies 58, 1991, 621-654.

[2] Aghion, P., Espinosa, M., and B. Jullien, “Dynamic duopoly with learning through market experimentation,” Economic Theory, 3(3), 1993, 517-539.

[3] Alepuz, D. and Urbano, A., “Learning in asymmetric duopoly markets: competition in information and market correlation”, Spanish Economic Review, 7(3), 2005, 1435-5469.

[4] Athey, S. and K. Bagwell, “Collusion with Persistent Cost Shocks,” Econometrica, May 2008, 76 (3), 493-540.

[5] Bagwell, K. and G. Ramey, "Oligopoly Limit Pricing", Rand Journal of Economics, 22, 1991, 155-172.

[6] Back, K., C. H. Cao, and G. A. Willard, “Imperfect competition among informed traders”, Journal of Finance 55, 2000, 2117-2155.

[7] Ball, J., I. Gohberg, and L. Rodman, Interpolation of Rational Matrix Functions (Birkhauser Verlag, Basel, 1990).

[8] Ball, J. and B. Taub, “Factoring spectral matrices in linear-quadratic models,” Economics Letters 35, 39-44 (1991).

[9] Bergemann, D., T. Heumann and S. Morris, "Information and volatility", Journal of Economic Theory 158, 2015, 427-465.

[10] Bernhardt, D. and B. Taub, “Learning about Common and Private Values in Oligopoly,” Rand Journal of Economics, 46(1), 2015, 66-85.

[11] Bernhardt, D., P. Seiler and B. Taub, “Speculative Dynamics,” Economic Theory, 2010, 1-52.

[12] Bonatti, A., G. Cisternas and J. Toikka, "Dynamic oligopoly with incomplete information," Review of Economic Studies, 2017, 84 (2): 503-546.

[13] Caminal, R., “A Dynamic Duopoly Model with Asymmetric Information” Journal of Industrial Economics, 38(3), 1990, 315-333.

[14] Caminal, R. and X. Vives, “Why Market Shares Matter: An Information-Based Theory” RAND Journal of Economics, 27(2), 1996, 221-239.

[15] Conway, J. B., A Course in Functional Analysis. (Springer-Verlag, New York 1985).

[16] Davenport, W. B., Jr., and W. L. Root, An Introduction to the Theory of Random Signals and Noise. New York: McGraw-Hill (1958).

[17] Dullerud, G., and F. Paganini (2000), A course in robust control theory. New York: Springer.

[18] Foster, D., and S. Viswanathan, “Strategic Trading when Agents Forecast the Forecasts of Others,” Journal of Finance LI (4), 1996, 1437-1478.

[19] Hansen, L., and T. Sargent, "Formulating and estimating dynamic linear rational expectations models," Journal of Economic Dynamics and Control 2 (1980).

[20] Harrington, J., “Limit Pricing when the Potential Entrant is Uncertain of its Cost Function” Econometrica, 54(2), 1986, 429-437.

[21] Harrington, J., “Oligopolistic Entry Deterrence under Incomplete Information” RAND Journal of Economics, 18(2), 1987, 211-231.

[22] Harrington, J., “Experimentation and Learning in a Differentiated-Products Duopoly”, Journal of Economic Theory, 66(1), 1995, 275-288

[23] Huo, Z. and N. Takayama (2017), “Rational Expectations Models with Higher Order Beliefs,” working paper, Federal Reserve Bank of Minneapolis.

[24] Hoffman, K., Banach spaces of analytic functions. (Prentice-Hall, Englewood Cliffs 1962).

[25] Kasa, K., T. Walker and C. Whiteman, “Heterogeneous Beliefs and Tests of Present Value Models”, Review of Economic Studies (2014) 81, 1137-1163.

[26] Kasa, K. (2000): “Forecasting the Forecasts of Others in the Frequency Domain,” Review of Economic Dynamics, 3(4), 726-756.

[27] Keller, G. and Rady, S., “Optimal Experimentation in a Changing Environment”, Review of Economic Studies, 66, 1999, 475-507.

[28] Keller, G. and S. Rady, “Price Dispersion and Learning in a Dynamic Differentiated- Duopoly,” RAND Journal of Economics 34(1), 2003, 138-65.

[29] Klemperer, P. D., and M. A. Meyer, “Supply Function Equilibria in Oligopoly under Uncertainty” Econometrica, 57(6), 1989, 1243-1277.

[30] Kyle, A., “Informed Speculation with Imperfect Competition” Review of Economic Studies 56 1989, 317-55.

[31] Mailath, G. “Simultaneous Signaling in an Oligopoly Model,” Quarterly Journal of Economics, 104(2) 1989, 417-427.

[32] Makarov, I. and O. Rytchkov (2012), “Forecasting the Forecasts of Others: Implications for Asset Pricing,” Journal of Economic Theory, 941-966.

[33] McLennan, A., “Price dispersion and incomplete learning in the long run”, Journal of Economic Dynamics and Control 7, 1984, 331-347.

[34] Mirman, L., L. Samuelson, and A. Urbano, “Duopoly signal jamming”, Economic Theory, 3(1), 1993, 129-149.

[35] Nimark, K. (2017) “Dynamic higher order expectations,” working paper, Department of Economics, Cornell University.

[36] Riordan, M., “Imperfect Information and Dynamic Conjectural Variations” RAND Journal of Economics, 16(1) 1985, 41-50.

[37] Rondina, G., and T. Walker (2017), "Confounding Dynamics," working paper, Department of Economics, Indiana University.

[38] Rozanov, Yu A., Stationary Random Processes. (Holden-Day, San Francisco 1967).

[39] Rudin, W., Real and Complex Analysis. (McGraw-Hill, New York 1974).

[40] Rustichini, A. and A. Wolinsky, “Learning about variable demand in the long run”, Journal of Economic Dynamics and Control 19, 1995, 1283-1292

[41] Sanchez-Pena, R., and M. Sznaier, Robust Systems: Theory and Applications. (John Wiley, New York 1998).

[42] Seiler, P. and B. Taub, “The Dynamics of Strategic Information Flows in Stock Markets,” Finance and Stochastics 12(1), 2008, 43-82.

[43] Taub, B. “Implementing the Iakoubovski-Merino spectral factorization algorithm using state-space methods,” Systems and Control Letters 58(6), 2009, 445-451.

[44] Taub, B. “The equivalence of lending equilibria and signaling-based insurance under asymmetric information,” Rand Journal of Economics, 21(3), 1990, 388-408.

[45] Townsend, R. M. (1983), "Forecasting the Forecasts of Others," Journal of Political Economy, 91(4), 546-588.

[46] Vives, X., “Strategic Supply Competition with Private Information,” Econometrica, 2011, 79, 6, 1919-1966.

[47] Whiteman, C., "Spectral Utility, Wiener-Hopf Techniques, and Rational Expectations," Journal of Economic Dynamics and Control, 9 (1985), 225-240.

A Frequency-domain methods

More limited versions of this appendix appeared in Seiler and Taub [42] and Taub [44], which build on Whiteman [47].

Consider a serially-correlated discrete-time stochastic process $a_t$ that can be expressed as a weighted sum of i.i.d. innovations:
\[ a_t = \sum_{k=0}^{\infty} A_k e_{t-k}. \]

While the innovations change through time, the weights $A_k$ remain fixed. The stochastic process can therefore be written succinctly as a function of the lag operator, $L$: $a_t = A(L)e_t$. The list of weights $\{A_k\}$ can be viewed as a sequence, and by the Riesz-Fischer theorem (see Rudin [39], pp. 86-90), is equivalent to a function of a complex variable $z$. The function of the lag operator $A(L)$ is then mathematically equivalent to a function $A(z)$ of a complex variable $z$. The function $A(z)$ can be analyzed with the rules of complex analysis, and this, in turn, fully characterizes the stochastic process $a_t$.
An important aspect of complex analysis is that the properties of a function are characterized by the domain over which they are specified. The unit disk, or sets that are topologically equivalent to the unit disk, are often the domains of interest. If a complex function on the disk can be expressed as a Taylor expansion—an infinite series where the powers of the independent variable, $z$, range from zero to infinity—then the function is said to be analytic on the disk. However, some functions, termed meromorphic functions, when expressed as a generalized Taylor expansion—a Laurent expansion—have both positive and negative powers of $z$, defined in an annular region containing the unit circle. This implies that they correspond to functions containing negative powers of the lag operator, which means that they operate on future values of a variable. If a variable is stochastic, this is not permissible, as it would mean that the future is predictable, contradicting its stochastic aspect. In particular, solutions to an agent's optimization problem cannot be forward-looking.
The negative powers of $z$ in meromorphic functions arise from poles. The sum of the negative powers is the principal part.14 To eliminate negative powers of $z$ in a posited solution to an agent's optimization problem, we use the annihilator operator, $[\cdot]_+$. The annihilator operator sets the coefficients of negative powers of $z$ in the Laurent expansion to zero, while preserving all coefficients on non-negative powers of $z$. This leaves a permissible, backward-looking solution to an agent's optimization problem. A function with both backward- and forward-looking parts is converted to one with only backward-looking parts by the application of the annihilator.15
A second property of a function concerns its invertibility.16 If a serially-correlated stochastic process can be represented by an invertible operator, the innovations of the process can be completely and exactly recovered by observing the history of the process. That is, the inverse of the operator applied to the vector of realizations of the process yields the vector of innovations, exactly as it would if a finite vector of innovations were converted into a finite vector of realizations by an invertible matrix.

14More precisely, a pole is a singularity located inside a region in the complex plane. Poles are only one possible type of singularity: there are also so-called essential singularities. Moreover, singularities need not be isolated points. In this paper the discussion focuses on rational functions, which are characterized by poles alone. Engineering terminology also refers to a function that is analytic as "causal", and the presence of poles makes it non-causal.
15For domain D it would be more appropriate to refer to $[\cdot]_+$ as the projection operator from $L_2(D)$ to $H^2(D)$, but the term is in widespread use.
16In engineering parlance a function that is analytic and invertible is called minimum phase.

A function is invertible on its domain if it does not take on a value of zero at any point inside the domain, and its inverse is then analytic. If, instead, an analytic function takes on a value of zero at a point inside the domain, then it is noninvertible. The inverse of a noninvertible function is not analytic. Hence, one cannot recover the vector of innovations by observing the vector of realizations, because inverting a function with a zero results in a function with negative powers of $z$. Recovery of the innovations would then depend on knowledge of future realizations. The factorization theorem of Rozanov [38] ensures that any process described by a z-transform with either negative powers of $z$ or zeroes can be converted into an observationally-equivalent process that is characterized by an operator that is invertible and has only non-negative powers of $z$, so that it is backward-looking.
To illustrate the variational method that we use in the frequency domain, we present a simple optimization problem. Consider an individual whose earnings evolve stochastically according to $y_t = A(L)e_t$, where $e_t$ is an i.i.d., zero mean, "white noise" period innovation to earnings. The consumer's problem is to adjust bond holdings $\{b_t\}_{t=0}^{\infty}$ to maximize quadratic utility,

\[ \max_{\{b_t\}} \; -E\sum_{t=0}^{\infty}\beta^t (y_t + r b_{t-1} - b_t)^2, \qquad (25) \]
where $r$ is the gross interest rate satisfying $\beta r > 1$.17 The decision problem is to choose not just the initial value of $b_t$, but the entire sequence $\{b_t\}_{t=0}^{\infty}$. This problem implicitly requires the choice of functions that react to current and possibly past states. Stationarity results in the same function applying each period.
The stochastic component of a quadratic utility function is essentially a conditional variance. If innovations are i.i.d., then the expectation of cross-products of random variables yields the sum of variances. For white-noise innovations, for $k > s$, $k > r$,
\[ E_{t-k}\left[e_{t-r}e_{t-s}\right] = \begin{cases} 0, & r \neq s \\ \sigma_e^2, & r = s, \end{cases} \qquad (26) \]
because of the independence of the innovations. Expressed in lag operator notation, this is
\[ E_{t-k}\left[(L^r e_t)(L^s e_t)\right] = \begin{cases} 0, & r \neq s \\ \sigma_e^2, & r = s. \end{cases} \]

Notice that the "action" is in the exponents of the lag operators. From Cauchy's theorem (Conway [15]), it is equivalent to write
\[ \sigma_e^2\,\frac{1}{2\pi i}\oint z^r z^{-s}\,\frac{dz}{z} = \begin{cases} 0, & r \neq s \\ \sigma_e^2, & r = s, \end{cases} \]
where the integration is counterclockwise around the unit circle. In Cauchy's theorem, $z$, which is a complex number with unit radius (it is on the boundary of the disk), is represented in polar form: $z = e^{-i\theta}$. Now a more conventional integral can be undertaken, integrating over $\theta \in [0, 2\pi]$. Using

17To make this problem well-defined a (small) adjustment cost must also be included, but we suppress it here because the net effect of the adjustment cost is just to make the solution stationary. Alternatively, one could simply impose the requirement that any solution be stationary.

Euler's theorem, which represents complex numbers in trigonometric form, $e^{-i\theta} = \cos\theta - i\sin\theta$, gives $\theta$ the interpretation of a frequency, so that $z$ and functions of $z$ are in the frequency domain.
Whiteman (1985) showed that a discounted conditional covariance involving complicated lags can be succinctly expressed as a convolution. Consider two serially-correlated processes, $a_t$ and $b_t$, where
\[ a_t = \sum_{k=0}^{\infty} A_k e_{t-k} \quad\text{and}\quad b_t = \sum_{k=0}^{\infty} B_k e_{t-k}. \]
The discounted conditional covariance as of time $t$, setting realized innovations to zero, is

\[ E_t\left[\sum_{s=1}^{\infty}\beta^s a_{t+s}b_{t+s}\right] = E_t\left[\sum_{s=1}^{\infty}\beta^s \left(\sum_{k=0}^{\infty}A_k e_{t+s-k}\right)\left(\sum_{k=0}^{\infty}B_k e_{t+s-k}\right)\right]. \qquad (27) \]

Because cross-product terms drop out, coefficients of like lags of et can be grouped:

\[ \beta\left[A_0B_0 + \beta A_1B_1 + \beta^2 A_2B_2 + \cdots\right]E_t[e_{t+1}^2] + \beta^2\left[A_0B_0 + \beta A_1B_1 + \beta^2 A_2B_2 + \cdots\right]E_t[e_{t+2}^2] + \cdots \]
\[ = \beta\left[A_0B_0 + \beta A_1B_1 + \beta^2 A_2B_2 + \cdots\right]\sigma_e^2 + \beta^2\left[A_0B_0 + \beta A_1B_1 + \beta^2 A_2B_2 + \cdots\right]\sigma_e^2 + \cdots = \frac{\beta\sigma_e^2}{1-\beta}\sum_{k=0}^{\infty}\beta^k A_k B_k = \frac{\beta\sigma_e^2}{1-\beta}\,\frac{1}{2\pi i}\oint A(z)B(\beta z^{-1})\,\frac{dz}{z}. \qquad (28) \]
This is a useful transformation because the integrand is a product. Because the objective of the optimization problem is an expected value like that in (27), the representation in (28) permits a direct variational approach. Equation (28) is an instance of Parseval's formula, which states that the inner product of analytic functions is the sum of the products of the coefficients of their power series expansions.
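As a quick numerical cross-check of the identity in (28), the sketch below (Python, with illustrative AR(1) choices for A and B that are assumptions, not the paper's processes) compares the coefficient sum $\sum_k \beta^k A_k B_k$ with a discretized contour integral of $A(z)B(\beta z^{-1})$ around the unit circle.

```python
import numpy as np

beta, a, b = 0.9, 0.5, 0.7           # illustrative parameters (assumptions)
K = 400                               # series truncation

A = a ** np.arange(K)                 # coefficients of A(z) = 1/(1 - a z)
B = b ** np.arange(K)                 # coefficients of B(z) = 1/(1 - b z)

# Coefficient-sum side: sum_k beta^k A_k B_k.
lhs = np.sum(beta ** np.arange(K) * A * B)

# Contour-integral side: (1/2*pi*i) * integral of A(z) B(beta/z) dz/z over |z| = 1.
# With z = exp(i*theta), dz/z = i*dtheta, so the integral reduces to a mean over theta.
theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
z = np.exp(1j * theta)
rhs = np.mean((1.0 / (1.0 - a * z)) * (1.0 / (1.0 - b * beta / z))).real

print(lhs, rhs)                       # the two values agree to high precision
```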

A.1 Optimization in the frequency domain

We apply these insights to the consumer’s optimization problem. Hansen and Sargent ([19]) showed that the first-order conditions of linear-quadratic stochastic optimization problems could be ex- pressed in lag-operator notation, z-transformed, and solved. Whiteman noticed that the z-trans- formation could be performed on the objective function itself, skipping the step of finding the time-domain version of the Euler condition.18 The objective is then a functional, i.e., a mapping of functions into the real line. One can then use the calculus of variations to find the optimal policy function. The first step is to conjecture that the solution to the agent’s optimization problem must be an analytic function of the fundamental process et:

bt = B(L)et.

18A similar variational approach in continuous time can be found in Davenport and Root [16], p. 223.

The agent's objective can then be restated in terms of the functions A and B, and the innovations:

\[ \max_{B(\cdot)} \; -E\sum_{t=0}^{\infty}\beta^t\Big[(A(L) - (1-rL)B(L))e_t\Big]^2. \]
Expressing the objective in frequency-domain form, using the equivalence established in (28), the agent's objective can be written as

\[ \max_{B(\cdot)} \; -\frac{\beta\sigma_e^2}{1-\beta}\,\frac{1}{2\pi i}\oint \big(A(z)-(1-rz)B(z)\big)\big(A(\beta z^{-1})-(1-r\beta z^{-1})B(\beta z^{-1})\big)\frac{dz}{z}. \]

It is immediate that a solution exists using standard methods from functional analysis.19

A.2 The variational method

Let $\zeta(z)$ be an arbitrary analytic function on the domain $\{z : |z| \leq \beta^{1/2}\}$, and let $a$ be a real number. Let $B(z)$ be the agent's optimal choice. His objective can be restated as

\[ J(a) = \max_a \; -\frac{\beta\sigma_e^2}{1-\beta}\,\frac{1}{2\pi i}\oint \big(A(z)-(1-rz)(B(z)+a\zeta(z))\big)\big(A(\beta z^{-1})-(1-r\beta z^{-1})(B(\beta z^{-1})+a\zeta(\beta z^{-1}))\big)\frac{dz}{z}. \]
This is a conventional problem. Differentiating with respect to $a$ and setting $a = 0$ yields the first-order condition describing the agent's optimal choice of $B(\cdot)$:

\[ J'(0) = 0 = -\frac{\beta\sigma_e^2}{1-\beta}\,\frac{1}{2\pi i}\oint \zeta(z)(1-rz)\big(A(\beta z^{-1})-(1-r\beta z^{-1})B(\beta z^{-1})\big)\frac{dz}{z} - \frac{\beta\sigma_e^2}{1-\beta}\,\frac{1}{2\pi i}\oint \zeta(\beta z^{-1})(1-r\beta z^{-1})\big(A(z)-(1-rz)B(z)\big)\frac{dz}{z}. \]

Observe the symmetry between the two integrals—everywhere $\beta z^{-1}$ appears in the first integral, $z$ appears in the second, and conversely. Whiteman establishes that the two integrals are in fact equal; we refer to this property as "β-symmetry". Therefore, the first-order condition simplifies to
\[ 0 = -\frac{1}{2\pi i}\oint \big(A(z)-(1-rz)B(z)\big)(1-r\beta z^{-1})\zeta(\beta z^{-1})\,\frac{dz}{z}, \qquad (29) \]

where we have dropped the constant $\frac{\beta\sigma_e^2}{1-\beta}$.
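The β-symmetry property invoked above is easy to verify numerically. The sketch below (Python; the particular one-pole analytic functions chosen for $\zeta$, A, and B are illustrative assumptions) checks that $\frac{1}{2\pi i}\oint F(z)G(\beta z^{-1})\frac{dz}{z} = \frac{1}{2\pi i}\oint F(\beta z^{-1})G(z)\frac{dz}{z}$ for analytic F and G, which is the form in which the property is used in the first-order condition.

```python
import numpy as np

beta, r = 0.9, 1.06                   # illustrative values (assumptions)
theta = np.linspace(0.0, 2.0 * np.pi, 8192, endpoint=False)
z = np.exp(1j * theta)

def contour_mean(values):
    """(1/2*pi*i) * integral of f(z) dz/z over the unit circle = mean over theta."""
    return np.mean(values)

# Arbitrary analytic building blocks (poles outside the unit circle).
zeta = lambda w: 1.0 / (1.0 - 0.4 * w)
A    = lambda w: 1.0 / (1.0 - 0.5 * w)
B    = lambda w: 1.0 / (1.0 - 0.3 * w)

F = lambda w: zeta(w) * (1.0 - r * w)          # F(z) = zeta(z)(1 - r z)
G = lambda w: A(w) - (1.0 - r * w) * B(w)      # G(z) = A(z) - (1 - r z)B(z)

I1 = contour_mean(F(z) * G(beta / z))
I2 = contour_mean(F(beta / z) * G(z))

print(I1.real, I2.real)                        # the two integrals coincide (beta-symmetry)
```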

19By reformulating the problem, the Szegő-Kolmogorov-Krein theorem (Hoffman, [24], p. 49) can be applied. The first step in this application is to re-write the argument of the integral as $|1-(1-rz)BA^{-1}|^2|A|^2$, and then re-interpret $|A|^2$ as the positive measure $\mu$ in the theorem. The second step is to transform the objective with a conformal mapping so that the transformed version of $(1-rz)$ has a zero at 0 instead of at $r^{-1}$; the modification of the control function $(1-rz)BA^{-1}$ then is an element of $A_0$, the analytic functions with a zero at 0. The Szegő-Kolmogorov-Krein theorem also provides a method for computing the value of the optimized objective, but we use a more direct approach here because we are interested in characterizing the controls themselves. We are grateful to Joe Ball for suggesting and discussing with us the application of this theorem.

The integral in first-order condition (29) must be zero for arbitrary analytic functions $\zeta$. By Cauchy's integral theorem, a contour integral around a meromorphic function with all its singularities inside the domain—a function of $z$ that has no component that can be represented as a convergent power series expansion within the domain—is zero. Thus, all that is needed to make the integral in (29) zero is to make the integrand singular inside the disk, and to have no singularities outside the disk. The assertion is an indirect way of stating that the contour of integration is treating the outside of the circle (including $\infty$) as the domain over which the meromorphic function has no poles so that it is analytic there: Cauchy's theorem asserts that the integral in this sense is zero.
Recall that a solution to the agent's optimization problem must be an analytic function. The next step in the solution is to separate the forward-looking components in (29) from the backward-looking components, so that we can then eliminate the non-analytic portion from our solution. Examining equation (29), note that by construction $\zeta$ is analytic, so that it can be represented as a power series,
\[ \zeta(z) = \sum_{j=0}^{\infty} \zeta_j z^j. \]

This means that $\zeta(\beta z^{-1})$ has an expansion of the form
\[ \zeta(\beta z^{-1}) = \sum_{j=0}^{\infty} \zeta_j \beta^j z^{-j}, \]
which has only nonpositive powers of $z$. The negative powers of $z$—all but the first term—define singularities at $z = 0$, which is an element of the unit disk. However, the rest of the integrand in (29), $(1-r\beta z^{-1})(A(z)-(1-rz)B(z))$, can have both positive and negative powers of $z$ in its power series expansion. If it were possible to guarantee that only negative powers of $z$ appeared in $(1-r\beta z^{-1})(A(z)-(1-rz)B(z))$, then its expansion would take the form

\[ (1-r\beta z^{-1})(A(z)-(1-rz)B(z)) = \sum_{j=1}^{\infty} f_j \beta^j z^{-j}, \]

−1 for some {fj}, and the product of this with ζ(βz ) would take the form

\[ \zeta(\beta z^{-1})(1-r\beta z^{-1})(A(z)-(1-rz)B(z)) = \sum_{j=1}^{\infty} g_j \beta^j z^{-j} \]
for some $\{g_j\}$. Every term in the sum is a singularity, and the integral of the sum is therefore zero. The first-order condition (29) can now be broken out of the integral and stated as follows:

\[ (1-r\beta z^{-1})(A(z)-(1-rz)B(z)) = \sum_{-\infty}^{-1}, \qquad (30) \]

where $\sum_{-\infty}^{-1}$ is shorthand for an arbitrary function that has only negative powers of $z$, and hence cannot be part of the solution to the agent's optimization problem. This type of equation is known as a Wiener-Hopf equation.

A.3 Factorization

To solve the Wiener-Hopf equation of a stochastic linear-quadratic optimization problem, we must factor the equation to separate the nonanalytic parts from the analytic parts. The factorization problem is a generalization of the problem of solving a quadratic equation, but there is no general formula for the solution. However, if a candidate factorization can be found, then even if it is not analytic and invertible, there is a general formula for converting that solution into an analytic and invertible factorization (Ball, Gohberg and Rodman[7]). The Wiener-Hopf equation (30) can be restated as:

\[ (1-r\beta z^{-1})(1-rz)B(z) = (1-r\beta z^{-1})A(z) + \sum_{-\infty}^{-1}. \qquad (31) \]
At this point it should be emphasized that the solution will be a Wiener filter, as opposed to a Kalman filter. A Kalman filter recursively reacts to information from the previous period and converges as the history of information evolves after its initiation. A Wiener filter explicitly treats history as infinite and therefore has a starting date in the infinite past; the stationarity of the model dictates the use of the Wiener approach.
It is tempting to solve for $B(z)$ by dividing the left-hand side by the coefficient of $B(z)$, $(1-r\beta z^{-1})(1-rz)$. However, this would multiply the $\sum_{-\infty}^{-1}$ term by positive powers of $z$, making it impossible to establish the coefficients of the positive powers of $z$ in the solution. The correct procedure is first to factor the coefficient of $B(z)$ into the product of analytic and non-analytic functions:
\[ (1-r\beta z^{-1})(1-rz) = \beta r^2\big(1-(\beta r)^{-1}\beta z^{-1}\big)\big(1-(\beta r)^{-1}z\big). \]
Because by assumption $\frac{1}{r} < \beta^{1/2}$, the first factor on the right-hand side, $(1-(\beta r)^{-1}\beta z^{-1})$, when inverted has a convergent power series (on the disk defined by $\{z : |z| \leq \beta^{1/2}\}$) in negative powers of $z$. Hence, we can divide through by this factor to rewrite the Wiener-Hopf equation as

\[ \beta r^2\big(1-(\beta r)^{-1}z\big)B(z) = \frac{(1-r\beta z^{-1})}{1-(\beta r)^{-1}\beta z^{-1}}\,A(z) + \sum_{-\infty}^{-1}, \qquad (32) \]
where we use the fact that
\[ \frac{1}{1-(\beta r)^{-1}\beta z^{-1}}\sum_{-\infty}^{-1} \]
has only negative powers of $z$. Because the left-hand side of (32) is the product of analytic functions, applying the annihilator to (32) yields
\[ \beta r^2\big(1-(\beta r)^{-1}z\big)B(z) = \left[\frac{(1-r\beta z^{-1})}{(1-(\beta r)^{-1}\beta z^{-1})}\,A(z)\right]_+. \]
Because $(\beta^{1/2} r)^{-1} < 1$, it follows that the inverse of $(1-(\beta r)^{-1}z)$ is also analytic, so that we can divide by $(1-(\beta r)^{-1}z)$ to solve for the optimal $B(z)$,
\[ B(z) = \frac{\left[(1-(\beta r)^{-1}\beta z^{-1})^{-1}(1-r\beta z^{-1})A(z)\right]_+}{\beta r^2\big(1-(\beta r)^{-1}z\big)}. \]

A more explicit solution for $B(z)$ obtains if the endowment process is ar(1), so that
\[ A(z) = \frac{1}{1-\rho z}. \]

Proposition 5 establishes a key result that is used repeatedly: the annihilate when there is an AR(1) construct can be simply calculated—if $A(z)$ is an ar(1), then $\left[f(\beta z^{-1})A(z)\right]_+ = f(\beta\rho)A(z)$.

Proposition 5 If f is analytic on β−1/2 and ρ < β−1/2, then

\[ \left[f^*\,(1-\rho z)^{-1}\right]_+ = f(\beta\rho)(1-\rho z)^{-1}. \]

Proof: Direct computation. 2
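As a sanity check on Proposition 5, the following sketch (Python; the choices of β, ρ, and the analytic function f are illustrative assumptions) compares the annihilated product $\left[f(\beta z^{-1})(1-\rho z)^{-1}\right]_+$, computed coefficient by coefficient, with $f(\beta\rho)(1-\rho z)^{-1}$.

```python
import numpy as np

beta, rho, c = 0.95, 0.6, 0.3    # illustrative values; f(w) = 1/(1 - c*w) is analytic
N = 200                          # truncation order for the Laurent coefficients

f_neg = (c * beta) ** np.arange(N + 1)   # f(beta z^{-1}): coefficients on z^0, z^{-1}, ...
A_pos = rho ** np.arange(N + 1)          # (1 - rho z)^{-1}: coefficients on z^0, z^1, ...

# Coefficient on z^m (m >= 0) of the product, i.e., after applying the annihilator [.]_+ :
lhs = np.array([np.dot(f_neg[:N + 1 - m], A_pos[m:]) for m in range(10)])

# Proposition 5 says the annihilate equals f(beta*rho) * (1 - rho z)^{-1}.
rhs = (1.0 / (1.0 - c * beta * rho)) * rho ** np.arange(10)

print(np.max(np.abs(lhs - rhs)))         # ~0 up to truncation error
```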

Proposition 6 shows that the proposition about annihilates of first-order AR functions must be used with caution. If there is a zero in the annihiland, the proposition changes.

Proposition 6 Let $a < \beta^{-1/2}$. Then $\left[f^*\,\dfrac{1-\frac{1}{a}z^{-1}}{1-az}\right]_+ = 0$.

Proof:
\[ \left[f^*\,\frac{1-\frac{1}{a}z^{-1}}{1-az}\right]_+ = \left[z^{-1}f^*\,\frac{1}{a}\,\frac{az-1}{1-az}\right]_+ = \left[-f^* z^{-1}\frac{1}{a}\right]_+ = 0. \]
2
Using Proposition 5, it follows that

\[ B(z) = \frac{(1-r\beta\rho)A(z)}{\beta r^2\big(1-(\beta r)^{-1}\beta\rho\big)\big(1-(\beta r)^{-1}z\big)}. \]

This formula has a simple "permanent income" interpretation: the agent applies the filter
\[ \frac{1-r\beta\rho}{\beta r^2\big(1-(\beta r)^{-1}\beta\rho\big)\big(1-(\beta r)^{-1}L\big)} \]
to the endowment process $A(L)e_t$ in order to smooth consumption.
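To see the procedure end to end, here is a minimal numerical sketch (Python, with illustrative parameter values that are assumptions) that builds $B(z)$ directly from the Wiener-Hopf steps above: form the annihiland, drop its negative powers of $z$, and divide by the analytic factor $\beta r^2(1-(\beta r)^{-1}z)$.

```python
beta, r, rho = 0.97, 1.05, 0.8     # illustrative values with beta*r > 1 and 1/r < sqrt(beta)
N = 80                              # truncation order for every series

def laurent_mul(a, b):
    """Multiply two Laurent series stored as {power of z: coefficient} dicts."""
    out = {}
    for p, x in a.items():
        for q, y in b.items():
            out[p + q] = out.get(p + q, 0.0) + x * y
    return out

def annihilate(a):
    """The [.]_+ operator: drop every negative power of z."""
    return {p: c for p, c in a.items() if p >= 0}

# A(z) = 1/(1 - rho*z): nonnegative powers of z.
A = {k: rho ** k for k in range(N)}

# (1 - r*beta*z^{-1}) / (1 - (beta*r)^{-1}*beta*z^{-1}): nonpositive powers of z.
geo = {-j: (1.0 / r) ** j for j in range(N)}            # note (beta*r)^{-1}*beta = 1/r
annihiland = laurent_mul(laurent_mul({0: 1.0, -1: -r * beta}, geo), A)

# Apply the annihilator, then divide by beta*r^2*(1 - (beta*r)^{-1} z).
plus = annihilate(annihiland)
inv_factor = {j: (1.0 / (beta * r)) ** j / (beta * r ** 2) for j in range(N)}
B = laurent_mul(plus, inv_factor)

print([round(B.get(k, 0.0), 4) for k in range(6)])      # leading power-series coefficients of B(z)
```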

A.4 Equivalence of time domain and frequency domain approaches

Our focus has been on generating the Wiener-Hopf equation in the frequency domain and solving it there. We now illustrate in our consumer optimization problem the general result that the time domain approach is equivalent, but less convenient. Going back to the time domain objective in equation (25),

\[ \max_{\{b_t\}} \; -E\sum_{t=0}^{\infty}\beta^t (y_t + r b_{t-1} - b_t)^2, \qquad (33) \]

we can calculate the first order condition at time $t$:

\[ 0 = -(y_t + rb_{t-1} - b_t) + r\beta E_t[(y_{t+1} + rb_t - b_{t+1})]. \]

This is an Euler equation in which the future value of the choice variable, bt+1, appears, with the expectation of that future variable conditional on time t information. This makes the equation non-trivial in general. The technical challenge is to calculate the expectation of the future value of the choice variable, bt+1. The solution is to posit that bt has a fixed and stationary structure, described by a filter. First recall that yt is a serially correlated stationary process described by

yt = A(L)et.

Posit that the choice variable has the stationary structure

\[ b_t = B(L)e_t, \qquad (34) \]
for all $t$. Because the conjectured structure applies to future values of the choice variable $b_t$, the expectation can be calculated. Note also that we made this conjecture in the development of the frequency domain approach, but before the optimization step.

For the conjecture to be correct, we must show that if the future and past values of b, bt+1 and bt−1, take this form, then it is also optimal for bt to take the same form.

Substituting the structure of yt and the conjectured form of bt into the Euler equation yields

\[ 0 = -(A(L)e_t + rB(L)e_{t-1} - B(L)e_t) + r\beta E_t[(A(L)e_{t+1} + rB(L)e_t - B(L)e_{t+1})]. \]

Consolidate this further by expressing the future and lagged values of the functions using lag operators:

\[ (1-rL)B(L)e_t + r^2\beta B(L)e_t - r\beta E_t[B(L)L^{-1}e_t] = A(L)e_t - r\beta E_t[A(L)L^{-1}e_t]. \]

The next step is key. One can use the linearity of the expectation operator, and the fact that the expected value of the conditioning information is an identity, to yield

\[ E_t[(1 - rL + r^2\beta - r\beta L^{-1})B(L)e_t] = E_t[(1 - r\beta L^{-1})A(L)e_t]. \]

Carrying out some algebra yields

\[ E_t[(1-rL)(1-r\beta L^{-1})B(L)e_t] = E_t[(1-r\beta L^{-1})A(L)e_t]. \qquad (35) \]

What remains is to solve this equation for B. We have yet to specify any structure on B. However, we know that B cannot weigh future re- alizations of the innovations et: by construction they are hidden from view. However, it is possible that the general solution for B in equation (35) contains such terms. So let us posit that B has two parts, Bˆ, which weights only current and past values of et, and B˜, which weights only future values of et, pretending for the moment that this is allowed. Substituting into (35) then yields

\[ E_t[(1-rL)(1-r\beta L^{-1})(\hat B(L) + \tilde B(L^{-1}))e_t] = E_t[(1-r\beta L^{-1})A(L)e_t]. \]

31 We can isolate the B˜ term:

\[ E_t[(1-rL)(1-r\beta L^{-1})\hat B(L)e_t] = E_t[(1-r\beta L^{-1})A(L)e_t] + E_t[(1-rL)(1-r\beta L^{-1})\tilde B(L^{-1})e_t]. \]

The part on the right hand side will now be zeroed out by the expectation operator because it entails only future, unrealized and unobservable innovations. We write it suggestively as

\[ E_t[(1-rL)(1-r\beta L^{-1})\hat B(L) e_t] = E_t[(1-r\beta L^{-1})A(L)e_t] + E_t[f(L^{-1})e_t], \]
where all we care about is that $f$ only has terms involving $L^{-1}$, $L^{-2}$, and so on. Thus, when the expectation is taken, the result is zero; $f$ can otherwise be arbitrary. Removing the expectation yields

\[ (1-rL)(1-r\beta L^{-1})\hat B(L) = (1-r\beta L^{-1})A(L) + f(L^{-1}). \]

Formally, the additional step of z-transforming the equation can now be undertaken, yielding equation (31), the same Wiener-Hopf equation obtained by taking the variational first order condition of the z-transformed objective, except that here we use the notation $f(L^{-1})$ instead of $\sum_{-\infty}^{-1}$. As shown in the solution procedure for the frequency domain version of equation (31), this equation has a solution, validating the conjecture expressed in equation (34) that a stationary solution to the Euler equation exists. Thus, we have validated our assertion that the frequency-domain methods yield the same results as the time domain methods, that is, optimizing over optimal quantities in the time domain is equivalent to optimizing over functions in the frequency domain due to stationarity.

B Derivations and proofs

Proof of Proposition 1. Define
\[ \varphi(x) \equiv \frac{(1+x)^2\sigma^2}{(1+x)^2\sigma^2 + \sigma_a^2}. \qquad (36) \]
Clearly, for $x \in [0,1]$, we have $T[x] \in [0,1]$. Also, if we treat $\varphi$ as a constant, then for $x \in (0,1)$,
\[ 0 < |T'[x]| = \left(\tfrac{1}{2}\right)^{1/2}\varphi^{1/2}\,\tfrac{1}{2}\left(\tfrac{1}{1+x}\right)^{1/2} < \tfrac{1}{2}. \]
Treating $\varphi$ as a function of $x$,
\[ |T'[x]| = \left(\tfrac{1}{2}\right)^{1/2}\varphi^{1/2}\,\tfrac{1}{2}\left(\tfrac{1}{1+x}\right)^{1/2} + \left(\tfrac{1}{2}\right)^{1/2}\tfrac{1}{2}\,\frac{\varphi'(x)}{\varphi}\,\varphi^{1/2}(1+x)^{1/2}. \]
After algebra, we have
\[ \frac{\varphi'(x)}{\varphi(x)} = (1-\varphi)\,\frac{2}{1+x}. \]
Putting it all together,
\[ |T'[x]| = \left(\tfrac{1}{2}\right)^{1/2}\varphi^{1/2}\,\tfrac{1}{2}\left(\tfrac{1}{1+x}\right)^{1/2} + (1-\varphi)\varphi^{1/2}\left(\tfrac{1}{2}\right)^{1/2}\left(\tfrac{1}{1+x}\right)^{1/2} = \left(\tfrac{1}{2}\right)^{1/2}\left(\tfrac{1}{1+x}\right)^{1/2}\varphi^{1/2}\left(\tfrac{1}{2}+1-\varphi\right). \]
Noting that $\varphi^{1/2}\left(\tfrac{3}{2}-\varphi\right)$ achieves a maximum of $\left(\tfrac{1}{2}\right)^{1/2}$ when $\varphi = \tfrac{1}{2}$, we have
\[ |T'[x]| \leq \tfrac{1}{2}\left(\tfrac{1}{1+x}\right)^{1/2} \leq \tfrac{1}{2}, \]
which completes the argument. 2
Annihilator Lemma. If we can express J as the sum of pole terms (partial fractions) then we can invoke a lemma proved in Seiler and Taub (2008) (recall that η is the discount factor):

Lemma 2 ("Annihilator lemma")
\[ \left[(1-az)^{-1}f(\eta z^{-1})\right]_+ = (1-az)^{-1}f(\eta a). \]

Proof of Proposition 2.20 The variational first-order conditions, for $\alpha_1$, $\beta_1$ and $\gamma_1$ are:
\[ \alpha:\quad \alpha_1\left[D(1-\delta_1^*D^*) + D^*(1-\delta_1 D)\right]A^*A\sigma_a^2 = \left(D - (\delta_1+\delta_1^*)D^*D\right)A^*A\sigma_a^2 + \sum_{-\infty}^{-1} \]
\[ \beta:\quad \beta_1\left[D(1-\delta_1^*D^*) + D^*(1-\delta_1 D)\right]B^*B\sigma_a^2 = (1-\beta_2)\left(D(1-\delta_1^*D^*) - D^*\delta_1 D\right)B^*B\sigma_a^2 + \sum_{-\infty}^{-1} \]
\[ \gamma:\quad \gamma_1\left[D(1-\delta_1^*D^*) + D^*(1-\delta_1 D)\right]C_1^*C_1\sigma_c^2 = (1-\delta_1^*D^*)C_1^*C_1\sigma_c^2 + \sum_{-\infty}^{-1}. \qquad (37) \]

20In this setting these equations are Wiener-Hopf equations.

Rewrite these Wiener-Hopf equations as
\[ \alpha:\quad \alpha_1 F^*F A^*A\sigma_a^2 = \left(D - (\delta_1+\delta_1^*)D^*D\right)A^*A\sigma_a^2 + \sum_{-\infty}^{-1} \qquad (38) \]
\[ \beta:\quad \beta_1 F^*F B^*B\sigma_a^2 = (1-\beta_2)\left(D(1-\delta_1^*D^*) - D^*\delta_1 D\right)B^*B\sigma_a^2 + \sum_{-\infty}^{-1} \qquad (39) \]
\[ \gamma:\quad \gamma_1 F^*F C_1^*C_1\sigma_c^2 = (1-\delta_1^*D^*)C_1C_1^*\sigma_c^2 + \sum_{-\infty}^{-1} \qquad (40) \]
Inverting the terms on the left-hand side, applying the $[\cdot]_+$ operator, and inverting the remaining left-hand side coefficients yields the formulas for $\alpha$, $\beta$ and $\gamma$ in Proposition 2.
Derivation of the δ recursion. The next step involves developing a recursion in δ. That equation will form the initial recursion that will be developed into the recursion (22).

The δ1 Wiener-Hopf equation is ∗ ∗2 ∗ ∗ 2 ∗ ∗2 ∗ ∗ 2 D(1 − α1)((1 + δ2)D (1 − α1))A1A1σa + D(1 − α2)(1 + δ2)D (1 − α2)A2A2σa  ∗ ∗2 ∗ ∗  ∗ 2 + D(1 − β1 − β2) (1 + δ2)D (1 − β1 − β2 ) BB σa ∗ ∗2 ∗ ∗ 2 ∗ ∗2 ∗ ∗ 2 ∗ ∗2 2 − (1 − Dγ1)((1 + δ2)D )γ1 C1C1 σc + Dγ2(1 + δ2)D γ2 C2C2 σc + (D − 1)(1 + δ2)D σe ∗2 ∗ ∗ 2 ∗2 ∗ ∗ 2 − D (1 − α1)(α1 + δ1D(1 − α1))A1A1σa − D (1 − α2)δ1D(1 − α2)A2A2σa ∗2 ∗ ∗ ∗ 2 − D (1 − β1 − β2 )(β1 + δ1D(1 − β1 − β2)) BB σa −1 ∗2 ∗ ∗ 2 ∗2 ∗ ∗ 2 ∗2 2 X + (D γ1 )(1 − δ1D)γ1C1C1 σc − D γ2 δ1Dγ2C2C2 σc − D δ1Dσe = . −∞ ∗ ∗ Divide out D and bring out the factor (1 + δ2) to obtain:  ∗ ∗ ∗ 2 ∗ ∗ ∗ 2 D(1 − α1)D (1 − α1))A1A1σa + D(1 − α2)D (1 − α2)A2A2σa ∗ ∗ ∗ ∗ 2 + D(1 − β1 − β2)D (1 − β1 − β2 )BB σa ∗ ∗ ∗ 2 ∗ ∗ ∗ ∗ 2 ∗ 2 − (1 − Dγ1)D γ1 C1C1 σc (1 + δ2) + Dγ2D γ2 C2C2 σc + (D − 1)D σe ∗ ∗ ∗ 2 ∗ ∗ ∗ 2 − D (1 − α1)(α1 + δ1D(1 − α1))A1A1σa − D (1 − α2)δ1D(1 − α2)A2A2σa ∗ ∗ ∗ ∗ 2 − D (1 − β1 − β2 )(β1 + δ1D(1 − β1 − β2)) BB σa −1 ∗ ∗ ∗ 2 ∗ ∗ ∗ 2 ∗ 2 X + (D γ1 )(1 − δ1D)γ1C1C1 σc − D γ2 δ1Dγ2C2C2 σc − D δ1Dσe = . −∞ Define H by ∗ ∗ ∗ 2 ∗ ∗ 2 H H ≡ (1 − α1)(1 − α1)A1A1σa + (1 − α2)(1 − α2)A2A2σa ∗ ∗ ∗ 2 ∗ ∗ 2 ∗ ∗ 2 2 + (1 − β1 − β2 )(1 − β1 − β2)BB σa + γ1 γ1C1C1 σc + γ2 γ2C2C2 σc + σe , and rewrite the Wiener-Hopf equation as ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 2 ∗ 2 ∗ D DH Hδ1 = D DH H(1 + δ2) − (D γ1 C1C1 σc + D σe )(1 + δ2) −1 ∗ ∗ ∗ 2 ∗ ∗ ∗ ∗ 2 ∗ ∗ ∗ 2 X − α1D (1 − α1)A1A1σa − β1D (1 − β1 − β2 )BB σa + γ1D γ1 C1C1 σc + −∞

34 with solution

−1 −1h ∗ ∗−1 ∗ ∗ 2 2 ∗ δ1 = H D DH(1 + δ2) − H (γ1 C1C1 σc + σe )(1 + δ2)

∗ ∗ 2 ∗ ∗ ∗ 2 ∗ ∗ 2i + α1(1 − α1)A1A1σa + β1(1 − β1 − β2 )BB σa − γ1γ1 C1C1 σc (41) + This Wiener-Hopf equation for δ, equation (41), can be simplified by noticing that it contains the Wiener-Hopf equations for α, β, and γ. Crucially, these terms therefore drop out. Begin by rearranging the Wiener-Hopf equation for α, equation (37) as:

−1  ∗ ∗ ∗  ∗ 2 ∗ ∗ 2 X (1 − α1) D(1 − δ1D ) + D (1 − δ1D) A Aσa = D A Aσa + . −∞ Substituting for D = (1 + 2δ)−1 and D∗ = (1 + 2δ∗)−1 this simplifies to:

−1 ∗  ∗  ∗ 2 ∗ ∗ 2 X (1 − α1)D D 2 + δ1 + δ1) A Aσa = D A Aσa + . (42) −∞ Dividing out D∗ and grouping terms yields

−1  ∗ ∗  ∗ 2 X D 1 + δ2 − δ1 − α1(2 + δ2 + δ2)) A Aσa = . (43) −∞

Next, examine the α1 elements in the δ1 Wiener-Hopf equation:

∗ ∗ ∗ ∗ ∗ ∗ 2 D D(1 − α1)(1 − α1)(δ1 − δ2 − 1) + D (1 − α1)α1A Aσa. ∗ Dividing out the D term (it appears in all of the non-α1 terms as well) and then bringing out the common factor D yields

∗  ∗ −1  ∗ 2 D(1 − α1) (1 − α1)(δ1 − δ2 − 1) + D α1 A Aσa.

The inner terms can be rearranged to yield

−1 ∗  ∗ ∗  ∗ 2 X −D(1 − α1) 1 + δ2 − δ1 − α1(2 + δ2 + δ2) A Aσa = , −∞ with the last equality following from the Wiener-Hopf equation (37). Thus, these terms all drop out of the δ1 Wiener-Hopf equation (41).

The same reasoning applies to the γ1 and β1 terms. Rearrange the γ1 Wiener-Hopf equation (37) as −1 ∗ ∗ ∗ ∗ ∗ X (γ1D D(2 + δ2 + δ2) − (1 − δ1D )) C1 C1 = . −∞ Further manipulation yields

−1 ∗ ∗ ∗  ∗ 2 X D γ1D(2 + δ2 + δ2) − (1 + δ2) C1 C1σc = . (44) −∞

35 The γ1 elements from the δ1 Wiener-Hopf equation (41) are

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 2 (D Dγ1 γ1(δ1 − δ2 − 1) + D γ1 (1 + δ2) − D γ1 γ1) C1 C1σc .

Grouping terms yields

∗ ∗ ∗ −1 ∗ ∗ 2 D γ1 Dγ1(δ1 − δ2 − 1 − D ) + 1 + δ2 C1C1 σc , and further algebra yields

−1 ∗ ∗ ∗ ∗  ∗ 2 X −D γ1 Dγ1(2 + δ2 + δ2) − (1 + δ2) C1C1 σc = , −∞ with the last equality following from (44). Thus, the γ1 elements drop out of the δ1 equation (41).

The β1 Wiener-Hopf equation is

−1  ∗ ∗ ∗  ∗ 2 X (1 − β1 − β2)D D(2 + δ2 + δ2) − D (1 − β2) B Bσa = . (45) −∞

The β terms from the δ1 Wiener-Hopf equation are

 ∗ ∗ ∗ ∗ ∗ ∗ ∗  ∗ 2 D D(1 − β1 − β2 )(1 − β1 − β2)(δ1 − δ2 − 1) + D β1(1 − β1 − β2 ) B Bσa.

Consolidating terms yields

∗ ∗ ∗  ∗  ∗ 2 D (1 − β1 − β2 ) D(1 − β1 − β2)(δ1 − δ2 − 1) + β1. B Bσa.

Adding and subtracting 1 − β2 yields

∗ ∗ ∗  ∗  ∗ 2 D (1 − β1 − β2 ) D(1 − β1 − β2)(δ1 − δ2 − 1) − (1 − β1 − β2) + (1 − β2) B Bσa.

Consolidating yields

−1 ∗ ∗ ∗  ∗  ∗ 2 X D (1 − β1 − β2 ) D(1 − β1 − β2)(−2 − δ2 − δ2) + (1 − β2) B Bσa = , −∞ with the last equality following from (45). Thus, the β elements also drop out of the δ1 equation (41).

With the extraneous terms eliminated, the δ1 Wiener-Hopf equation (41) reduces to

−1 ∗  ∗ ∗ 2 ∗ ∗ 2 2 ∗ ∗ 2 ∗ X D D (1 − α2)(1 − α2)A2A2σa + γ2 γ2C2C2 σc + σe (1 + δ2 − δ1) = D σe (1 + δ2) + . (46) −∞

We next substitute the definition of J from (19) in the main text, into equation (46) to obtain

\[ D^*DJ^*J(1+\delta_2^*-\delta_1) = D^*\sigma_e^2(1+\delta_2^*) + \sum_{-\infty}^{-1}. \]

Grouping the terms above yields:

\[ DJ^*J\delta_1 = (DJ^*J - \sigma_e^2)(1+\delta_2^*) + \sum_{-\infty}^{-1}. \]

Solving yields
\[ \delta_1 = D^{-1}J^{-1}\left[(DJ - J^{*-1}\sigma_e^2)(1+\delta_2^*)\right]_+. \qquad (47) \]
To attempt a stable recursion, further manipulation is required. We have

−1 X ∗ ∗ 2 ∗ = DJ J(1 + δ2 − δ1) − σe (1 + δ2) −∞

 ∗ δ1 −1 2 = D J J(1 − ∗ ) − D σe 1 + δ2  ∗ δ1 2 = D J J(1 − ∗ ) − (1 + δ1 + δ2)σe 1 + δ2  ∗ δ1 2 = D J J(1 − ∗ ) − (1 + δ1 + δ2)σe . 1 + δ2 Now apply the annihilator operator:   1 ∗ δ1 (δ1 + δ2)D = −D + D 2 J J(1 − ∗ ) . σe 1 + δ2 + Using symmetry and dividing by 2D yields

−1   1 D ∗ δ δ = − + 2 DJ J(1 − ∗ ) , (48) 2 2σe 1 + δ + The static form of this equation exactly mirrors the static δ recursion in Bernhardt and Taub (2015). Reformulating the D equation. Manipulating the definition of D in equation (48) yields    1 + 2δ ∗ δ 1 + 2δ = 2 DJ J 1 − ∗ . σe 1 + δ + Dividing out 1 + 2δ yields    2 ∗ δ σe = DJ J 1 − ∗ . (49) 1 + δ + Next undo the annihilator operator and write

−1 X 1 + δ∗ − δ (1 + δ∗)σ2 + = J ∗J . (50) e 1 + 2δ −∞

Now substitute 1 1 + D δ = (D−1 − 1) and 1 + δ = 2 2D

37 into the first-order condition for δ, equation (50), to obtain

−1  D∗−1 − 1 − (D−1 − 1) 1 + D∗ X J ∗J 1 + D = σ2 + , (51) 2 2D∗ e −∞ which reduces to −1  D D∗  1 + D∗ X J ∗J D∗D + − = σ2 + . (52) 2 2 2 e −∞

Starting with equation (52) we derive a new recursion in D. We first multiply equation (52) by 2:

−1 ∗ ∗ ∗ ∗ 2 X J J (2D D + D − D ) = (1 + D )σe + . (53) −∞ Next, add and subtract D:

−1 ∗ ∗ ∗ ∗ 2 X J J (2D D + 2D − (D + D)) = (1 + D )σe + (54) −∞ and then rearrange to obtain

−1 ∗ ∗ ∗ 2 ∗ ∗ X 2J JD (1 + D ) = (1 + D )σe + J J(D + D) + . (55) −∞ Divide by 2(1 + D∗): −1 1 1 D∗ + D X J ∗JD = σ2 + J ∗J + . (56) 2 e 2 1 + D∗ −∞ We have isolated D on the left-hand side and the D∗ terms on the right-hand side, and the J terms are the compound term J ∗J. After dividing by J ∗, we impose the annihilator projection operator and divide by J to obtain the new recursion  ∗  1 −1 h ∗−1 2i 1 −1 D + D D = J J σe + J J ∗ , (57) 2 + 2 1 + D + or equivalently   1 −1 h ∗−1 2i 1 −1 2Re (D) D = J J σe + J J ∗ . (58) 2 + 2 1 + D + The next step is to express the J ∗J terms in terms of D (equivalently δ). Expressing J in terms of D. First, we impose symmetry on equations (18) and (19) to obtain

\[ F^*F \equiv D(1-\delta^*D^*) + D^*(1-\delta D) \qquad (59) \]

\[ J^*J \equiv (1-\alpha^*)(1-\alpha)AA^*\sigma_a^2 + \gamma^*\gamma CC^*\sigma_c^2 + \sigma_e^2. \qquad (60) \]
In the existence proof in Appendix C, we will assume that $\sigma_c^2 = 0$, yielding
\[ J^*J \equiv (1-\alpha^*)(1-\alpha)AA^*\sigma_a^2 + \sigma_e^2. \qquad (61) \]

38 To elaborate on the structure of J, we develop the structure of α: h i α = F −1A−1 F ∗−1(D(1 − δ∗D∗) − DδD∗)A . + The first step is to convert this to an expression in 1 − α. Write

−1 X F ∗F α = (D(1 − δ∗D) − D∗Dδ)A + . −∞ Substitution from equation (59) and further manipulation yields

−1 X (D(1 − δ∗D∗) + D∗(1 − δD)) Aα = (D(1 − δ∗D∗) + D∗(1 − δD)) A − D∗A + . −∞ Bringing the common term over to the left-hand side and factoring yields:

−1 X (1 − α)(D(1 − δ∗D∗) + D∗(1 − δD)) A = D∗A + . −∞ Solving yields h i (1 − α) = A−1F −1 F ∗−1D∗A . + Notice that if A is a single-pole function, we can apply Lemma 2. The annihilator term will then have the structure of A, multiplied by a constant. The leading A−1 term will cancel the A term inside the annihilator and thus (1 − α) takes the form

\[ (1-\alpha) = cF^{-1}, \]
where $c$ is a constant. Thus,
\[ (1-\alpha)AA^*(1-\alpha^*) = F^{-1}\left[F^{*-1}D^*A\right]_+\left[F^{*-1}D^*A\right]_+^* F^{*-1}, \qquad (62) \]
the left-hand side of which appears in equation (61). Now apply Lemma 2: assuming that A is of single-pole form, this expression becomes

\[ (1-\alpha)AA^*(1-\alpha^*) = f(\eta a)^2 AA^* F^{-1}F^{*-1}, \]
where $f(\eta a)$ is $F(\eta a)^{-1}D(\eta a)$, reflecting the result of the annihilator lemma.
To complete the derivation we need to characterize $F^*F$ in order to characterize $J^*J$. We have
\[ (1-\delta D) = \frac{1+\delta}{1+2\delta} = \frac{1}{2}D(1+D^{-1}) = \frac{1}{2}(1+D). \]
Therefore,
\[ F^*F = \left(D(1-\delta^*D^*) + D^*(1-\delta D)\right) = \frac{1}{2}(D^*+D+2D^*D). \qquad (63) \]
Thus,
\[ J^*J = F^{-1}\left[F^{*-1}D^*A\right]_+\left[F^{*-1}D^*A\right]_+^* F^{*-1}\sigma_a^2 + \sigma_e^2 \qquad (64) \]

and after applying the annihilator lemma,

\[ J^*J = 2f(\eta a)^2 (D^*+D+2D^*D)^{-1} A^*A\sigma_a^2 + \sigma_e^2 \qquad (65) \]
or
\[ J^*J = 2F(\eta a)^{-2}D(\eta a)^2 (D^*+D+2D^*D)^{-1} A^*A\sigma_a^2 + \sigma_e^2. \qquad (66) \]
Recapitulating, we have the recursion

\[ D = \frac{1}{2}J^{-1}\left[J^{*-1}\sigma_e^2\right]_+ + \frac{1}{2}J^{-1}\left[J\,\frac{D^*+D}{1+D^*}\right]_+, \qquad (67) \]
which is equation (22) and which can also be expressed as

\[ D = \frac{1}{2}J^{-1}\left[J^{*-1}\sigma_e^2\right]_+ + \frac{1}{2}J^{-1}\left[J\,\frac{2\mathrm{Re}[D^*]}{1+D^*}\right]_+. \qquad (68) \]
Equations (64) and (67) comprise a system in the two functions J and D. We use these two equations in an iteration argument. 2

C Existence of equilibrium in the dynamic model

To establish existence of equilibrium in our dynamic setting, we use the recursive system in D in equation (67), showing that the associated mapping is bounded by a function that is, itself, a contraction. In our static existence argument, we assumed that cost shocks were zero, i.e., $\sigma_c^2 = 0$, and we developed a recursion in λ, proving that it was a contraction on the unit interval. If a wider domain for the recursion is allowed, the geometric approach in Bernhardt and Taub (2015) shows that fixed points of the recursion can exist outside the unit interval, but the output associated with the first-order conditions evaluated at those fixed points is suboptimal for the firms. Also, the contraction property in the unit interval breaks down if the cost shock variance is too high. As in the static model, existence fails in the dynamic model when cost shocks are too volatile, leading us to establish existence of equilibrium in the dynamic model when the cost shocks are zero. We also note that our recursion captures the restriction to the unit interval via the factorization operation: when a spectral density is factored—the generalization of taking a square root—the smaller root is automatically chosen, so that the function in question has roots inside the unit disk.
Expressing the contraction property via a variational derivative. One would like to prove that the recursion in (67) is a contraction, just as in the scalar model. In a functional recursion such as (67), a mapping µ is a contraction if there exists a positive constant ∆ < 1 such that

\[ \frac{\|\mu(D_1)-\mu(D_2)\|}{\|D_1-D_2\|} < \Delta. \]

When µ is a differentiable function, we can write
\[ \frac{\|\mu(D_1)-\mu(D_2)\|}{\|D_1-D_2\|} \leq \frac{\left\|\frac{\mu(D_1)-\mu(D_2)}{D_1-D_2}\right\|\,\|D_1-D_2\|}{\|D_1-D_2\|} = \left\|\frac{\mu(D_1)-\mu(D_2)}{D_1-D_2}\right\| \sim \left\|\frac{\partial}{\partial D}\mu(D)\right\|, \]
where the derivative is the variational derivative. The result is the norm of the derivative, not the derivative of the norm. To develop intuition, we verify that this condition holds in a simplified quasi-scalar version of the model, using a conventional derivative rather than a variational derivative.
Intuition from scalar case. The dynamic model reduces to the scalar model when the persistence parameters $b$, $\rho$ and $\phi$ are zero. Intuition about the contraction property can be gleaned by considering the ordinary derivative of a scalar version of (67). Then, D, F and J become ordinary real variables, not functions of $z$, so the annihilator operator becomes the identity, $D^* = D$, etc.

We first analyze the second term in the D recursion (67): in the scalar version of $\frac{1}{2}J^{-1}\left[J\,\frac{D^*+D}{1+D^*}\right]_+$, the annihilator operator is not present, leaving $\frac{1}{2}\,\frac{2D}{1+D} = \frac{D}{1+D}$. The derivative is
\[ \frac{d}{dD}\,\frac{D}{1+D} = \frac{1}{(1+D)^2} < 1, \]
as long as D is strictly positive. Now consider the first term in (67). In a scalar setting, substituting from (63), equation (62)

becomes
\[ (1-\alpha)AA^*(1-\alpha^*) = F^{-1}\left[F^{*-1}D^*A\right]_+\left[F^{*-1}D^*A\right]_+^* F^{*-1} \sim (D^*D)(F^*F)^{-2}\frac{1}{(1-a)^2} \sim \frac{D^2}{\left(\frac{1}{2}(2D+2D^2)\right)^2}\,\frac{1}{(1-a)^2} \qquad (69) \]
\[ = \frac{D^2}{\frac{1}{4}(2D+2D^2)^2}\,\frac{1}{(1-a)^2} = \frac{1}{(1+D)^2}\,\frac{1}{(1-a)^2}, \]
where we arbitrarily write the scalar value of A as $\frac{1}{1-a}$. The recursion equations (66) and (67) then become a difference equation system,

\[ J_{n+1}^2 = \frac{1}{(1+D_n)^2}\,\frac{1}{(1-a)^2}\,\sigma_a^2 + \sigma_e^2 \qquad (70) \]
and
\[ D_{n+1} = \frac{1}{2}\,\frac{1}{J_n^2}\,\sigma_e^2 + \frac{D_n}{1+D_n}. \qquad (71) \]
Notice from the definition of J in equation (19) that the first term is bounded, i.e.,

\[ \frac{1}{2}\,\frac{1}{J_n^2}\,\sigma_e^2 \leq \frac{1}{2}, \]
and $J(0) \neq 0$, implying that it is not a fixed point.

We can analyze the nonlinear system (70) and (71) for stability. For large values of $D_n$, $J_{n+1}$ is approximately $\sigma_e$, so that $D_{n+1}$ is driven to approximately $\frac{1}{2}+1$. For very small values of $D_n$, $J_n$ approaches a constant, and therefore $D_{n+1}$ also approaches a constant. Moreover, this fixed point is stable, as (setting $\sigma_a^2$ to one and $a$ to zero for simplicity) the derivative
\[ \frac{d}{dD_n}\left[\frac{1}{2}\,\frac{1}{J_n^2}\,\sigma_e^2\right] = \frac{\frac{1}{(1+D_n)}\,\sigma_e^2}{(1+D_n)^2\left(\frac{1}{(1+D_n)^2}+\sigma_e^2\right)^2} \]
is obviously a fraction if $D_n$ is positive. Thus, if the initial value of $D_n$ is positive, this (scalar) recursion is stable and has a positive fractional fixed point.
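A minimal numerical sketch of this scalar recursion (Python; parameter values are illustrative assumptions, with $\sigma_a^2 = 1$ and $a = 0$ as in the text) iterates (70)-(71) from a positive starting value and confirms convergence to a positive fractional fixed point.

```python
sigma_a2, sigma_e2, a = 1.0, 1.0, 0.0    # scalar-case parameters (illustrative)

D = 0.5                                   # any positive starting value
for n in range(200):
    J2 = sigma_a2 / ((1.0 + D) ** 2 * (1.0 - a) ** 2) + sigma_e2   # equation (70)
    D_next = 0.5 * sigma_e2 / J2 + D / (1.0 + D)                   # equation (71)
    if abs(D_next - D) < 1e-12:           # Cauchy criterion for convergence
        break
    D = D_next

print(n, D)   # converges quickly to a positive fixed point strictly below 3/2
```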

Existence proof for the dynamic model

The main recursion, equation (68), is complicated by the presence of the annihilator operator. Were the annihilator operator not there, we could execute a direct proof of the contraction property. However, the annihilator operator necessitates an indirect approach. The indirect approach entails finding an ancillary mapping T that (1) bounds the mapping S implicitly defined by the right hand side of (67), and (2) is itself bounded and a contraction. The ancillary mapping is tractable, so it is straightforward to characterize the domain over which it is a contraction. We show that S also maps this domain into itself and is continuous. It then follows that a fixed point of S exists.

42 Lemma 3 Let X be a Banach space. Let T : X → X and S : X → X be mappings such that

(i) T is bounded and a contraction;

(ii) S is continuous with kSk ≤ kT k on a compact and convex subset X of X that includes the fixed point of T .

Then a fixed point of S exists in X.

The space X in our setting is a Hardy space $H^2[\eta]$, that is, the space of square integrable functions

on the η disk, i.e., the elements $z$ in the complex plane such that $\{z : |z| \leq \eta^{-1/2}\}$. The function D, which is our object of interest, is an element of X. The space $H^2[\eta]$ is a Hilbert space, and as such is a complete normed vector space, and as such is a Banach space.21 Because it is a Banach space we can establish that there is a fixed point by invoking Schauder's fixed point theorem. Lemma 3 does not deliver uniqueness of the fixed point. However, we conjecture that the fixed point and associated equilibrium are, in fact, unique (given sufficiently little uncertainty about private values).
Proof: The sole issue is to identify the compact subset X. We define the set using the contraction property. Let $x^*$ be the fixed point of T. Let

\[ X_0 \equiv \{x : 0 \leq |x| \leq |x^*|\}. \]

This set is closed and bounded. The upper bound of |T [X0]| is finite due to the contraction prop- erty. The upper bound of T [T [X0]] is also finite, and by the contraction property must be closer to the fixed point |x∗|; and this holds for all iterations T [...T [T [x]] ... ]. Define

\[ X \equiv \{x : 0 \leq |x| \leq \sup |T[X_0]|\}, \]
which is a closed and bounded (compact) set and trivially convex. Because $\|S\| < \|T\|$, and because $T[X] \subseteq X$ by the contraction property, $S[X] \subseteq X$. Because S is a continuous mapping, we can apply Schauder's fixed point theorem to establish that a fixed point of S exists. 2
To apply Lemma 3, we first show that our recursion satisfies its key inequality, i.e., there is a bounding mapping T that is a contraction. Viewing D as an element of $H^2[\eta]$, define the mapping:
\[ T[D] \equiv \frac{1}{2} + \frac{D}{1+D}. \qquad (72) \]
Also define the mapping associated with the recursion in (68) by S. We begin with:

Lemma 4

|D| = |S[D]| = \left|\frac{1}{2}\,J^{-1}\left[J^{*-1}\sigma_e^2\right]_+ + \frac{1}{2}\,J^{-1}\left[J\,\frac{2\,\mathrm{Re}[D^*]}{1+D^*}\right]_+\right| \le \frac{1}{2} + \left|\frac{D}{1+D}\right|. \qquad (73)

That is, |S| ≤ |T|.

21 See Seiler and Taub (2008), Appendix C, for properties of H²[η].

Proof: The first term is easy. The absolute value (and therefore the norm) passes through the annihilator operator (see the appendix of Seiler and Taub 2008):

\left|\frac{1}{2}\,J^{-1}\left[J^{*-1}\sigma_e^2\right]_+\right| \le \frac{1}{2}\,\left|J^{-1}\right|^2\sigma_e^2 = \frac{1}{2}\,\left|J^{-1}J^{*-1}\right|\sigma_e^2 \le \frac{1}{2}

by the construction of J. For the second term, we have

\left|\frac{1}{2}\,J^{-1}\left[J\,\frac{D+D^*}{1+D^*}\right]_+\right|
\le \frac{1}{2}\,\left|J^{-1}\right|\,|J|\,\left|\frac{D+D^*}{1+D^*}\right|
\le \frac{1}{2}\left|\frac{D+D^*}{1+D^*}\right|
\le \frac{1}{2}\left|\frac{D\left(1+\frac{D^*}{D}\right)}{1+D^*}\right|
\le \frac{1}{2}\left|\frac{D}{1+D^*}\right|\left(1+\left|\frac{D^*}{D}\right|\right)
\le \left|\frac{D}{1+D^*}\right|
\le \left|\frac{D}{1+D}\right|. \qquad □

Note that the cancellation of J and J^{-1} would not necessarily work were we calculating the sup norm instead of the absolute value at the same value of z. The next lemma establishes that the bounding mapping T is contractive.

Lemma 5 If the domain of T is such that |1 + D| > 1, then T is a contraction and T is bounded.

Proof: The final term of T is contractive: the variational derivative satisfies

\left|\frac{\partial}{\partial D^*}\,\frac{D^*}{1+D^*}\right| = \left|\frac{1}{(1+D^*)^2}\right| < 1, \quad\text{when } |1+D| > 1.

To establish boundedness, we must prove that |1 + D| > 1 in the vicinity of the fixed point. We do this in Proposition 3. □

Lemma 6 S is a continuous mapping.

Proof: Because the recursion is nonlinear, we establish continuity component by component: the elements of S include inversion (J^{-1}), the annihilator operator ([·]_+), factorization (J), and the construction of J, which involves D nonlinearly. We must show that each of these elements preserves continuity. To show that J is a continuous function of D, we use the Szegő factorization. The Szegő factorization is the generalization of representing a function in exponential-log form: for a function f(x), we can write e^{ln f(x)}. If the function f is a function of a complex variable and is two sided, i.e., f(z) = A(z)A(z^{-1}), then the Szegő form allows one to effectively take the square root and recover A(z). One can then indirectly demonstrate properties of the function A(z).22

22 See Taub (1990) for a more thorough discussion of the Szegő form.

Using the Szegő form for the J function, we can write

J(\alpha) = e^{\frac{1}{2}\,\frac{1}{2\pi i}\oint \frac{\zeta+\alpha}{\zeta-\alpha}\,\ln\left(J^*(\zeta)J(\zeta)\right)\,\frac{d\zeta}{\zeta}}.

Because the exponential function is continuous, we just need to show that J^*J is continuous in D. The annihilator operator can be expressed with the Szegő form,

\left[J^{*-1}\right]_+ = J(0)^{-1} = e^{\frac{1}{2}\,\frac{1}{2\pi i}\oint \ln\left(J^{*-1}J^{-1}\right)\,\frac{d\zeta}{\zeta}},

as can the inverse J(z)^{-1},

J^{-1}(z) = e^{\frac{1}{2}\,\frac{1}{2\pi i}\oint \frac{\zeta+z}{\zeta-z}\,\ln\left(J^*(\zeta)^{-1}J(\zeta)^{-1}\right)\,\frac{d\zeta}{\zeta}}.

However, because |z| = 1 while the Szegő factorization requires |α| < 1, this expression holds in the limit. Recalling equations (63) and (64),

J^*J = 2\left[F^{*-1}D^*A\right]_+^*\left[F^{*-1}D^*A\right]_+\left(D^* + D + 2D^*D\right)^{-1}\sigma_a^2 + \sigma_e^2, \qquad (74)

and we just need to establish continuity for this object. (D^* + D + 2D^*D)^{-1} is continuous in D for D > 0. We can also establish that \left[F^{*-1}D^*A\right]_+ is continuous in D by using the Szegő factorization, but because of the annihilator lemma we calculate it at a (recall that A = \frac{1}{1-az}):

\left[F^{*-1}D^*A\right]_+ = F(a)^{-1}D(a)A(z) = e^{-\frac{1}{2}\,\frac{1}{2\pi i}\oint \frac{\zeta+a}{\zeta-a}\,\ln\left(D^*D\,(D^*+D+2D^*D)^{-1}\right)\,\frac{d\zeta}{\zeta}}\,A(z), \qquad (75)

which is continuous due to the continuity of the product, exponential, and (D^* + D + 2D^*D)^{-1}. □

To apply Lemma 3 we show that D = 0 is not a fixed point of the recursion S.

Lemma 7 J(0) ≠ ∞.

Proof: Use the Szegő factorization to write

F^{-1}\left[F^{*-1}D^*A\right]_+ = e^{-\frac{1}{2}\,\frac{1}{2\pi i}\oint \frac{\zeta+a}{\zeta-a}\,\ln\left(D^*D\,(D^*+D+2D^*D)^{-2}\right)\,\frac{d\zeta}{\zeta}}\,A(z). \qquad (76)

The inner term can be written as

D^*D\,(D^*+D+2D^*D)^{-2} = \frac{D}{D^*+D+2D^*D}\cdot\frac{D^*}{D^*+D+2D^*D} = \frac{1}{\left(\frac{D^*}{D}+1+2D^*\right)}\cdot\frac{1}{\left(\frac{D}{D^*}+1+2D\right)}.

\frac{D}{D^*} and \frac{D^*}{D} are bounded away from zero (to see this, express D in polar form). Therefore, the whole denominator is bounded away from zero at D = 0. Thus, J(0) is finite. □

Lemma 8 D = 0 is not a fixed point of the bounding function T .

Proof: Substitution yields T[0] = \frac{1}{2}. □

To complete the argument, we find a positive lower bound for the mapping S, i.e., a bound D̲ such that if |D_1| > D̲ then |S[D_1]| > D̲, so that any fixed point is then bounded away from zero.

Lemma 9 There exists a lower bound D̲ such that |S[D]| > D̲ for all D.

Proof: From Lemma 7, we have

\left|\frac{1}{2}\,J^{-1}\left[J^{*-1}\sigma_e^2\right]_+\right| \ge \frac{1}{2}\,\frac{\sigma_e^2}{\sigma_e^2} = \frac{1}{2}. \qquad □

Having determined a lower bound, we can combine it with the upper bound induced by the mapping T, leading to the following corollary:

Corollary 2 There is a ξ > 0 with

X_ξ ≡ {D : ξ|D| < 1} such that for D ∈ X, S[D] ∈ X_ξ.

We now have the ingredients to assert

Proposition 7 A fixed point of S exists.

Proof: The contraction property for T requires |1 + D| > 1:

(1+D)(1+D^*) = 1 + 2\,\mathrm{Re}(D) + |D|^2 > 1, \quad\text{so}\quad \mathrm{Re}(D) > -\frac{|D|^2}{2},

which is satisfied by Re(D) > 0. If we define X_0 as the smaller set

X_0 ≡ {D : Re(D) > 0},

we satisfy this requirement. Then we just need to show that if Re(D) > 0, then Re(T[D]) > 0, i.e.,

\frac{1}{2} + \mathrm{Re}\,\frac{D}{1+D} > 0.

This follows because the denominator of \frac{D}{1+D} has a larger real part than the numerator, but the same imaginary part. This means that if we represent D in polar form, D = D_0 e^{iθ}, then 1 + D will have the form \tilde{D}_0 e^{i\tilde{θ}}, where |\tilde{θ}| < |θ| and \tilde{D}_0 > D_0. Expressed geometrically, it is evident that \mathrm{Re}\,\frac{D}{1+D} > 0 (algebraically, \mathrm{Re}\,\frac{D}{1+D} = \frac{\mathrm{Re}(D)(1+\mathrm{Re}(D)) + \mathrm{Im}(D)^2}{|1+D|^2} > 0 when Re(D) > 0), and therefore \frac{1}{2} + \mathrm{Re}\,\frac{D}{1+D} > 0. Thus, T maps X_0 into X_0; in addition |1 + D| > 1, so the contraction property holds for T as defined in equation (72). The four properties listed in Lemma 3 are satisfied for S and T by Lemmas 4, 5, and 6.

It remains to verify that the fixed point reflects an optimum, i.e., that the associated solutions of the Wiener-Hopf equations for α, β, and δ are optimal. Consider α. Inspection of the objective (17)

and the variational first-order condition (37) reveals that the variational second-order condition for α is

-\frac{1}{2\pi i}\oint F^*F\,A^*A\,\sigma_a^2\,\frac{dz}{z} = -\sigma_a^2\,\|FA\|^2 < 0.

Thus, the solution for α in equation (20) represents an optimum. The optimality of β and δ follows similarly. □

Proposition 3 in the main text follows.

D Proof of Proposition 4

In this appendix we develop a series of lemmas that together establish Proposition 4. Our first lemma establishes a basic property of generalized signal extraction; we then demonstrate that this property is violated in equilibrium; we also show that firms do not simply amplify their input shocks; and we characterize the persistence properties of output.

D.1 Signal jamming versus signal extraction

In linear-quadratic settings such as ours, the problem of inferring an underlying process from a noisy signal is called signal extraction. Because firms in our model observe price with noise, one might posit that the alteration of the time series structure of output relative to the input demand and cost shock processes is entirely due to the fact that the firms are carrying out signal extraction to impute the shock processes that their rivals privately observe, but that there is no richer strategic behavior. To demonstrate that this is not so, we first outline a fundamental characteristic of signal extraction.

Consider a first-order autoregressive (AR) process

x_t = A(L)e_t = \frac{1}{1-\alpha L}\,e_t.

Because we are going to treat the problem in terms of poles we write this as

-\frac{\alpha^{-1}}{L-\alpha^{-1}}\,e_t,

where α^{-1} is the pole. We suppose that this process cannot be observed directly, but that there is an observable signal process y_t = A(L)e_t + u_t, where u_t is a white noise process, uncorrelated with e_t. The signal extraction problem is to construct a filter F(·) that optimally extracts information23 from this noisy signal, producing an output process F(L)(A(L)e_t + u_t). We have:

Lemma 10 The poles of the signal extraction output process are the same as the poles of the input process. Signal extraction is expressed entirely in the moving average part of the filtered process.

Proof: We use frequency-domain methods. We solve the optimal filtering problem

\min_F\; E\big(A(L)e_t - F(L)(A(L)e_t + u_t)\big)^2 = \min_F\; \frac{1}{2\pi i}\oint \left[(A-FA)^*(A-FA)\,\sigma_e^2 + F^*F\,\sigma_u^2\right]\frac{dz}{z}. \qquad (77)

The variational first-order condition is

A^*(A-FA)\,\sigma_e^2 - F\,\sigma_u^2 = 0.

23 This proposition was stimulated by a personal exchange with Ken Kasa and Charles Whiteman.

The right hand side is zero instead of \sum_{-\infty}^{-1} because the filter is allowed to be two-sided. The solution is

F = \left(A^*A\,\sigma_e^2 + \sigma_u^2\right)^{-1}A^*A\,\sigma_e^2.

The poles of A completely cancel, leaving an ARMA part where the poles (the denominator part) come from the MA part of the noisy process. When one hits the noisy process with this filter, the MA part of the noisy process cancels, but the forward-looking part of the filter's poles remains. Repeating the process with a one-sided filter, the variational first-order condition is

A^*(A-FA)\,\sigma_e^2 - F\,\sigma_u^2 = \sum_{-\infty}^{-1}

or

-\left(A^*A\,\sigma_e^2 + \sigma_u^2\right)F + A^*A\,\sigma_e^2 = \sum_{-\infty}^{-1}.

Define the factor H by

H^*H = A^*A\,\sigma_e^2 + \sigma_u^2.

The poles of H are the same as the poles of A. The solution is

F = H^{-1}\left[H^{*-1}A^*A\,\sigma_e^2\right]_+.

Invoking the assumption that A is the sum of AR(1) terms—that is, that the number of poles of A exceeds the number of zeroes—we can apply Proposition 5 term by term, with the result that the poles of the annihilate \left[H^{*-1}A^*A\,\sigma_e^2\right]_+ are the same as the poles of A. The poles of H, which are the zeroes of H^{-1}, then cancel the poles of the annihilate. □

When one hits the noisy process—which is characterized by H—with this filter, the H parts cancel, leaving a sum of AR(1) terms, but weighted differently from the original A process. The numerator of the filtered process—the MA part—carries the signal extraction information. Importantly, there are no new poles; the original poles, and only those poles, are preserved in the product FH.
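To see the two-sided cancellation concretely, the following sketch (our own illustration, not part of the paper's numerical code; the symbol names are ours) computes F = (A^*Aσ_e^2 + σ_u^2)^{-1}A^*Aσ_e^2 symbolically for A(z) = 1/(1-az) and verifies that the AR factor of A drops out of the filter, so the filter's poles come only from the MA part of the noisy spectrum:

    import sympy as sp

    z, a = sp.symbols('z a')
    se2, su2 = sp.symbols('sigma_e2 sigma_u2', positive=True)

    A = 1 / (1 - a*z)         # AR(1) transfer function, pole at 1/a
    Astar = A.subs(z, 1/z)    # conjugate A(z^{-1})

    # Two-sided Wiener filter: F = A*A sigma_e^2 / (A*A sigma_e^2 + sigma_u^2)
    F = sp.cancel(Astar * A * se2 / (Astar * A * se2 + su2))

    num, den = sp.fraction(F)
    # The AR factor of A has cancelled: the filter's denominator (up to a factor of z,
    # an artifact of clearing z^{-1}) is the MA numerator of the noisy spectrum, and it
    # does not vanish at z = 1/a, so F introduces no pole there.
    print(sp.factor(den))
    print(sp.simplify(den.subs(z, 1/a)))   # nonzero (equals sigma_e2/a up to sign)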

Signal jamming leads to pole proliferation

In our model, were signal extraction the only force determining the output process, the poles of the input process would be preserved in the output process, and there would be no new poles. We now prove that the output process has more poles (equivalently, autoregressive coefficients, which are the inverses of the poles) than the input process. In fact, there must be infinitely many such poles. We begin with the essential observation that the output process is non-scalar:

Lemma 11 The intensity filters αi, βi, and γi are not scalar-valued: output intensities are not just amplifications of the dynamic shock processes.

Proof: Because the exogenous shock processes are first-order autoregressive (AR(1)) processes, the frequency domain filters A, B, and C have single poles. Suppose by way of contradiction that

the intensities α_i, β_i, γ_i and δ_i are all scalar. Then, in equation (20) (the equation for γ_i), because the C(·) function is a single pole form, the projection operator [·]_+ yields a constant multiplying C(·) (from the "annihilator lemma", Lemma 2). The C(·) function is then canceled by the C^{-1}(·) term, leaving the right hand side as a pure scalar if F^{-1} is scalar. Thus, γ will be a scalar if F^{-1} is a scalar. For F^{-1} to be a scalar, F must be scalar. The definition of F in equation (18) reveals that F is scalar only if δ and D are scalar, which is true by our maintained assumption. For D to be scalar, J would need to be scalar. But J cannot be scalar: in equation (21), scalar F and D mean that J is driven by the A filter, which is exogenously non-scalar, a contradiction. □

We focus on the output response to common value demand shocks, α(L)A(L)a_{it}, where A(L)a_{it} is a first-order autoregressive process,

A(z) = \frac{1}{1-az},

but the argument generalizes to cost shocks as well.

Lemma 12 J has infinitely many poles.

Proof: Recall that one can express an arbitrary rational and analytic function either in ratio form or in partial fractions form:

\frac{1}{1-az} + \frac{1}{1-bz} = \frac{2-(a+b)z}{1-(a+b)z+abz^2} = \frac{2-(a+b)z}{(1-az)(1-bz)},

which is in partial fractions form on the left hand side, and grouped form on the right hand side, and has both pole terms (1-az) and (1-bz) in the denominator. We can then apply Proposition 2, the "annihilator lemma," to isolate and characterize pole terms in the partial fraction expression, term by term. Recall the recursive equation for D, (22),

D = \frac{1}{2}\,J^{-1}\left[J^{*-1}\sigma_e^2\right]_+ + \frac{1}{2}\,J^{-1}\left[J\,\frac{D+D^*}{1+D^*}\right]_+. \qquad (78)

Multiplying both sides by J, rewrite the recursion as

JD - \frac{1}{2}\left[JD\,\frac{1}{1+D^*}\right]_+ = \frac{1}{2}\left[J^{*-1}\sigma_e^2\right]_+ + \frac{1}{2}\left[J\,\frac{D^*}{1+D^*}\right]_+.

On the left hand side, because \frac{1}{1+D^*} is a conjugate function, we can invoke Proposition 2 for the expression \left[JD\,\frac{1}{1+D^*}\right]_+, and conclude that it has the same poles as the "naked" term JD on the left hand side. Also observe that because J^{*-1} is a conjugate term, \left[J^{*-1}\sigma_e^2\right]_+ is a constant and does not contribute poles on the right hand side. If JD has finitely many poles, then the right hand side of this expression has only the poles of J, which is fewer. This can only happen if (i) D is a constant, or (ii) J and D have infinitely many poles.

Were D a constant, then from equation (18), F would be constant. By (20), α would be a constant, because the A and A^{-1} terms would cancel after invoking Proposition 2. But if α is constant,

then the formula for J, (19), would have a single pole, the pole from A. However, J would also have an induced zero after finding a common denominator; therefore J^{-1} would have a pole, consisting of this zero. Examining the formula for D in equation (22), while the poles of J will cancel, D will have an additional pole, the induced pole in J^{-1}. This contradicts the hypothesis that D is a constant. Because D cannot be a constant, J cannot be constant, as proven in Lemma 11. Therefore, the number of poles is infinite. □

Because J has infinitely many poles, it follows that D, F, and α each have infinitely many poles.

Output is less persistent than the exogenous shock processes

Having established that the poles of the endogenous output processes proliferate relative to the exogenous shock processes, we can show that the additional poles are larger than the input poles. This translates into smaller autoregressive coefficients, i.e., reduced persistence in output. Intuitively, the output process filter is a convex combination of multiple autoregressive terms. For the sake of illustration suppose there are two autoregressive components:

\varphi\,\frac{1}{1-a_1 z} + (1-\varphi)\,\frac{1}{1-a_2 z},

where a_1 > a_2. Thus, as the weight φ shrinks, the second autoregressive part gets more emphasis and the persistence of the overall process declines. This is what happens in the dynamic model. Recalling that the poles are the inverses of these autoregressive coefficients, we have:
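A quick numerical illustration of this persistence effect (our own sketch; the coefficients 0.9 and 0.5 and the weights are arbitrary): the impulse response of the mixed filter dies out faster as the weight on the high-persistence component shrinks.

    import numpy as np

    def mixture_impulse_response(phi, a1=0.9, a2=0.5, horizon=20):
        """Impulse response of phi/(1 - a1 z) + (1 - phi)/(1 - a2 z)."""
        k = np.arange(horizon)
        return phi * a1**k + (1 - phi) * a2**k

    for phi in (0.9, 0.5, 0.1):
        irf = mixture_impulse_response(phi)
        half_life = np.argmax(irf < 0.5 * irf[0])   # first lag at which the response halves
        print(f"phi = {phi:.1f}: response at lag 10 = {irf[10]:.3f}, half-life = {half_life} lags")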

Lemma 13 The additional poles exceed the poles of the input processes.

Proof: To reduce clutter we assume that there are no cost shocks, i.e., σ_c^2 = 0, and that the privately observed demand shock process is a single-pole AR process, A(z) = \frac{1}{1-az}. Suppose initially that α is a constant. Expressing the formula for J in equation (19) in rational form yields

\frac{(1-\alpha_2^*)(1-\alpha_2)\,\sigma_a^2 + (1-az)(1-az^{-1})\,\sigma_e^2}{(1-az)(1-az^{-1})}. \qquad (79)

Then, in the formula for J, equation (19), J will have a single pole, the pole from A. It will also have an induced moving average part, the numerator term, and this term will have a zero that can be found by solving

(1-\alpha_2^*)(1-\alpha_2)\,\sigma_a^2 + (1-az)(1-az^{-1})\,\sigma_e^2 = 0. \qquad (80)

It is clear that the largest zero of this equation, which is the relevant one, exceeds 1/a and increases as σ_e^2 increases. Label this zero 1/a_1.

The inverse J^{-1} appears in the formula for D in equation (20), and the zero calculated above now becomes a pole of D. Because the new pole exceeds 1/a, the corresponding autoregressive coefficient is smaller than a, which translates into less persistence of the associated process. In turn, D appears in (20), the formula for the output intensity filter α; the new pole is thus propagated into the formulas for the other terms such as α. The higher magnitude of the pole translates into a reduced autoregressive coefficient in α, resulting in reduced persistence in the output process.

We now repeat the process: use the modified formula for α to construct the ratio form of J as above in (80), except that α will now have a pole term, and so the numerator equation will be more complicated:

(1-\alpha_2^*)(1-\alpha_2)\,\sigma_a^2 + (1-az)(1-az^{-1})(1-a_1 z)(1-a_1 z^{-1})\,\sigma_e^2 = 0. \qquad (81)

Now, two solutions will exceed 1, and we can proceed as before: each of the new poles will exceed the input pole. □

A more formal result of this kind can be found in Seiler and Taub (2007), Lemma C.19, Appendix C, p. 13. We can combine these results to demonstrate Proposition 4: Lemma 11 establishes Proposition 4 (i); Lemma 10 and Lemma 13 together establish Proposition 4 (ii); and Lemma 13 establishes Proposition 4 (iii).

E State space methods in the numerical analysis

To numerically simulate and iterate the recursion in equation (87), we construct algorithms using state space methods from the control systems engineering literature. These methods suppose that the stochastic processes in a system have an ARMA (autoregressive-moving average) structure, but can be arbitrary vector processes that take the general form

x_t = A x_{t-1} + B u_t \qquad (82)

where x_t and u_t can be vector processes, and A and B are appropriately conformable matrices. In engineering settings the x_t process would be considered the state process, and the u_t process would be a serially uncorrelated, i.i.d. process, i.e., white noise. When x_t and u_t are scalar-valued and A and B are scalar constants, this is simply an AR(1) process. Intuitively, stability of an AR(1) process requires |A| < 1; this stability notion generalizes: a more general vector-valued system is stable if the eigenvalues of A are less than one in absolute value. There might be an output process driven by this state,

y_t = C x_t + D u_t \qquad (83)

where y_t can be a vector process, and C and D are appropriately conformable matrices. We write (82) using the lag operator L:

x_t = A L x_t + B u_t,

and if A has the appropriate structure, i.e., with eigenvalues less than one, we can solve:

x_t = (I - AL)^{-1} B u_t.

Substituting into (83) yields

y_t = \left(C(I - AL)^{-1}B + D\right)u_t;

that is, the output process is expressed entirely in terms of the underlying fundamental or driving process u_t. This is the generalization of an ARMA process, not just an AR process. It is convenient to use the fact from Appendix A that the lag operator maps into an element of the complex plane, which we denote z, so we can represent the process y_t by its z-transform,

C(I - Az)^{-1}B + D. \qquad (84)

This expression is the generalization of a rational function. It helps to re-express models of this type with the inverse of the A matrix,

C(zI - A)^{-1}B + D, \qquad (85)

where the eigenvalues of A now need to exceed one in absolute value for stability to hold. This engineering convention is used in the exposition from this point forward; the eigenvalues are then referred to as the poles. A process expressed in this way is a state space realization.
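A minimal numerical sketch of a realization and its transfer function (our own illustration; the matrices are arbitrary toy values, and we evaluate the lag-operator form (84) rather than the inverse convention of (85)):

    import numpy as np

    # A toy two-state realization (A, B, C, D); its transfer function is 1/(1-0.9z) + 1/(1-0.5z).
    A = np.array([[0.9, 0.0],
                  [0.0, 0.5]])
    B = np.array([[1.0],
                  [1.0]])
    C = np.array([[1.0, 1.0]])
    D = np.array([[0.0]])

    # Stability in the convention of (82)-(84): eigenvalues of A inside the unit circle.
    print(np.abs(np.linalg.eigvals(A)))

    def transfer_function(z, A, B, C, D):
        """Evaluate C (I - A z)^{-1} B + D, the z-transform (84), at the point z."""
        n = A.shape[0]
        return C @ np.linalg.solve(np.eye(n) - A * z, B) + D

    z = 0.3
    print(transfer_function(z, A, B, C, D), 1/(1 - 0.9*z) + 1/(1 - 0.5*z))   # the two agree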

Importantly, the realization form in expression (85) is preserved when familiar algebraic operations are carried out on the expression. For example, the sum of two processes that are constructed from the same driving process u_t can be expressed as

\left(C_1(zI-A_1)^{-1}B_1+D_1\right)+\left(C_2(zI-A_2)^{-1}B_2+D_2\right) = \begin{bmatrix} C_1 & C_2\end{bmatrix}\left(zI - \begin{bmatrix} A_1 & 0\\ 0 & A_2\end{bmatrix}\right)^{-1}\begin{bmatrix} B_1\\ B_2\end{bmatrix} + D_1 + D_2, \qquad (86)

which has the same basic form as (85). Because the form is preserved, the engineering literature has developed a special notation for it:

\left[\begin{array}{c|c} A & B\\ \hline C & D\end{array}\right].

The addition operation can be expressed in this notation by

\left[\begin{array}{cc|c} A_1 & 0 & B_1\\ 0 & A_2 & B_2\\ \hline C_1 & C_2 & D_1 + D_2\end{array}\right].

Similarly, multiplication and inversion are expressed as

\left[\begin{array}{cc|c} A_1 & B_1C_2 & B_1D_2\\ 0 & A_2 & B_2\\ \hline C_1 & D_1C_2 & D_1D_2\end{array}\right]

and

\left[\begin{array}{c|c} A - BD^{-1}C & BD^{-1}\\ \hline -D^{-1}C & D^{-1}\end{array}\right],

respectively. The details of these and other operations can be found in Dullerud and Paganini [17], p. 99, or in Sanchez-Pena and Sznaier [41], pp. 465-470. Other operations such as transposition and complex conjugation are also straightforward. There are two other operations that can be expressed using state space methods: the annihilation operator [·]_+ discussed in Appendix A, and spectral factorization. All of these operations—addition, multiplication, conjugation and transposition, annihilation, and spectral factorization—are used in the recursion equation (87). Finally, it is possible to numerically calculate norms using the realization by solving a Lyapunov equation; see p. 475 of Sanchez-Pena and Sznaier (1998).

The realization for a system is not necessarily unique. Specifically, we can construct transformations of a realization to manipulate the A matrix, that is, we can calculate

CT^{-1}\left(zI - TAT^{-1}\right)^{-1}TB + D = \tilde{C}\left(zI - \tilde{A}\right)^{-1}\tilde{B} + D.

Such transformations can usefully isolate characteristics of the system and, importantly, provide ways of approximating the system with a smaller system in which the dimension of the A matrix is reduced: this is called balanced truncation. In balanced truncation, the so-called controllability and observability Gramians are calculated by solving a Lyapunov equation for each. A coordinate transformation is chosen so that these Gramians are identical. The singular values of the Gramians—the square roots of the eigenvalues—can then be ordered, with the largest corresponding to the sup norm of the system in question. The elements of the system associated with the smallest singular values can then be discarded if the resulting change in the infinity norm of the resulting system is dominated by the chosen tolerance. This reduces the number of poles. Moreover,

the error entailed in the reduction of the model has an analytical bound that is a linear function of the sum of the discarded singular values. We use the balanced truncation algorithm of Laub and Glover (see p. 319 of Sanchez-Pena and Sznaier (1998)).

There is an additional operation needed in the numerical calculations: minimal realization. A minimal realization generalizes the idea of canceling the poles and zeroes of a rational function if they are equal. The rational function

\frac{(1-.2z)(1-.3z)}{(1-.2z)(1-.7z)}

is obviously equivalent to

\frac{1-.3z}{1-.7z},

but what about

\frac{(1-.2z)(1-.3z)}{(1-.2001z)(1-.7z)}\,?

The state space approach generalizes rational functions of this sort. Due to the approximations introduced by operations such as inversion, spectral factorization, and balanced truncation, small numerical errors can make coefficients in the numerator and denominator that should cancel slightly different; minimal realization algorithms force the cancellation if the coefficients agree within a tolerance. We use the Kung algorithm to compute the minimal realization (see p. 310 of Sanchez-Pena and Sznaier (1998)). The algorithm sets up a block Hankel matrix of the system and uses singular value decomposition (a generalization of diagonalization of a matrix) to factor the Hankel matrix. The tolerance level removes nearly zero singular values, so that the remaining system is both controllable and observable—which translates into pole-zero cancellation when there is numerical noise.
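As a minimal sketch of the Gramian computation that underlies balanced truncation (our own illustration, not the paper's code; it uses the standard engineering convention in which the eigenvalues of A lie inside the unit circle, i.e., the reciprocal of the convention in (85), and the toy matrices are arbitrary):

    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    # Toy stable realization; values are illustrative only.
    A = np.array([[0.9, 0.1],
                  [0.0, 0.2]])
    B = np.array([[1.0],
                  [0.1]])
    C = np.array([[1.0, 0.1]])

    # Controllability and observability Gramians: Wc = A Wc A' + B B', Wo = A' Wo A + C' C.
    Wc = solve_discrete_lyapunov(A, B @ B.T)
    Wo = solve_discrete_lyapunov(A.T, C.T @ C)

    # Hankel singular values: square roots of the eigenvalues of Wc Wo; states with
    # near-zero values are the candidates that balanced truncation discards.
    hsv = np.sqrt(np.sort(np.linalg.eigvals(Wc @ Wo).real)[::-1])
    print(hsv)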

The numerical recursion

We next develop the equation used in the numerical analysis.

Proposition 8

D = \frac{1}{2}\,J^{-1}\left[J^{*-1}\left(1+\frac{1}{D^*}\right)\sigma_e^2\right]_+ + \frac{1}{2}\left(1 - J^{-1}\left[J\,\frac{D}{D^*}\right]_+\right). \qquad (87)

Proof: Manipulating the definition of D in equation (48) yields

1+2\delta = \frac{1+2\delta}{\sigma_e^2}\,D\,J^*\left[J\left(1-\frac{\delta}{1+\delta^*}\right)\right]_+.

Dividing out 1 + 2δ yields

\sigma_e^2 = D\,J^*\left[J\left(1-\frac{\delta}{1+\delta^*}\right)\right]_+. \qquad (88)

Undoing the annihilator operator yields

(1+\delta^*)\,\sigma_e^2 + \sum_{-\infty}^{-1} = J^*J\,\frac{1+\delta^*-\delta}{1+2\delta}. \qquad (89)

Now substitute

\delta = \frac{1}{2}\left(D^{-1}-1\right) \quad\text{and}\quad 1+\delta = \frac{1+D}{2D}

into the first-order condition for δ, equation (89), to obtain

J^*J\left(1+\frac{D^{*-1}-1-\left(D^{-1}-1\right)}{2}\right)D = \frac{1+D^*}{2D^*}\,\sigma_e^2 + \sum_{-\infty}^{-1}, \qquad (90)

which reduces to

J^*J\left(D^*D + \frac{D}{2} - \frac{D^*}{2}\right) = \frac{1+D^*}{2}\,\sigma_e^2 + \sum_{-\infty}^{-1}. \qquad (91)

Next, divide equation (91) by both J^* and D^*. This yields

JD = \frac{1}{2}\,J^{*-1}\left(1+\frac{1}{D^*}\right)\sigma_e^2 + \frac{1}{2}\,J\left(1-\frac{D}{D^*}\right) + \sum_{-\infty}^{-1}. \qquad (92)

Now apply the annihilator; the left-hand side is preserved:

JD = \frac{1}{2}\left[J^{*-1}\left(1+\frac{1}{D^*}\right)\sigma_e^2\right]_+ + \frac{1}{2}\left[J\left(1-\frac{D}{D^*}\right)\right]_+, \qquad (93)

or

JD = \frac{1}{2}\left[J^{*-1}\left(1+\frac{1}{D^*}\right)\sigma_e^2\right]_+ + \frac{1}{2}\,J\left(1 - J^{-1}\left[J\,\frac{D}{D^*}\right]_+\right). \qquad (94)

Dividing by J yields

D = \frac{1}{2}\,J^{-1}\left[J^{*-1}\left(1+\frac{1}{D^*}\right)\sigma_e^2\right]_+ + \frac{1}{2}\left(1 - J^{-1}\left[J\,\frac{D}{D^*}\right]_+\right). \qquad (95)

□

The algorithm

We use Mathematica to calculate the recursion. Examining equation (87), the algorithm works as follows:

1. An initial conjecture of the solution of D(z) is posited (not to be confused with the notation D for the state-space realization);

2. This conjecture is used in equation (63), where a spectral factorization is carried out to calculate F, using the method devised in Taub [43]. In turn, the calculated value of F as well as the conjectured value of D(z) is used in the spectral factorization in equation (64) to calculate J;

3. The resulting value of J and the conjectured value of D are substituted on the right hand side of the recursion (87) and the requisite multiplication, inversion, annihilation and addition operations are carried out, resulting in a new value of D, which becomes the new conjecture;

4. The iteration terminates when a Cauchy-style convergence criterion is met, that is, for iteration i, the norm of the improvement ‖D_i - D_{i-1}‖_2 falls below the chosen tolerance.

We highlight some further details of the algorithm. Examining the recursion equation (87) reveals that there are some inverses in the equation, as well as some spectral factorizations. These inversions and factorizations increase the number of pole terms on each iteration. The balanced truncation algorithm trims the insignificant pole terms in the iteration.

The spectral factorization algorithm must cope with the arbitrary number of pole terms that arise from the proliferation of poles in the iteration. For this reason, we use the robust spectral factorization algorithm developed in Taub [43]. This procedure also requires the choice of a tolerance.

Thus, four tolerances must be chosen: the spectral factorization tolerance, the balanced truncation tolerance, the minimal realization tolerance, and the Cauchy criterion for D. Excessively relaxing the tolerances leads to numerically unstable behavior. With appropriate tolerances, the system converges numerically, consistent with the contraction property established in Appendix C.
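The skeleton of this loop can be sketched as follows (a minimal, self-contained illustration rather than the paper's Mathematica code: the operator-valued updates of steps 2-3 are replaced by a scalar stand-in, the bounding map T[D] = 1/2 + D/(1+D) from Appendix C, so that only the iteration and the Cauchy-style stopping rule of step 4 are shown):

    def iterate_to_fixed_point(update, D0, tol=1e-10, max_iter=1000):
        """Generic fixed-point loop with a Cauchy-style stopping criterion (step 4)."""
        D = D0
        for i in range(1, max_iter + 1):
            D_new = update(D)            # in the full algorithm: recompute F and J, then the RHS of (87)
            if abs(D_new - D) < tol:     # ||D_i - D_{i-1}|| below the chosen tolerance
                return D_new, i
            D = D_new
        raise RuntimeError("no convergence within max_iter iterations")

    # Scalar stand-in for the operator-valued update: the bounding map T of Appendix C.
    T = lambda D: 0.5 + D / (1.0 + D)

    D_star, n_iterations = iterate_to_fixed_point(T, D0=0.5)
    print(D_star, n_iterations)   # converges to the positive fixed point D = 1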
