
OPERATIONS RESEARCH Vol. 64, No. 3, May–June 2016, pp. 564–584. ISSN 0030-364X (print), ISSN 1526-5463 (online). http://dx.doi.org/10.1287/opre.2015.1364. © 2016 INFORMS

Preferences, Homophily, and Social Learning

Ilan Lobel, Evan Sadler
Stern School of Business, New York University, New York, New York 10012 {[email protected], [email protected]}

We study a sequential model of Bayesian social learning in networks in which agents have heterogeneous preferences, and neighbors tend to have similar preferences, a phenomenon known as homophily. We find that the density of network connections determines the impact of preference diversity and homophily on learning. When connections are sparse, diverse preferences are harmful to learning, and homophily may lead to substantial improvements. In contrast, in a dense network, preference diversity is beneficial. Intuitively, diverse ties introduce more independence between observations while providing less information individually. Homophilous connections individually carry more useful information, but multiple observations become redundant.

Subject classifications: networks/graphs; probability; games/group decisions.
Area of review: Special Issue on Information and Decisions in Social and Economic Networks.
History: Received June 2013; revision received June 2014; accepted December 2014. Published online in Articles in Advance April 7, 2015.

1. Introduction

We rely on the actions of our friends, relatives, and neighbors for guidance in many decisions. This is true of both minor decisions, like where to go for dinner or what movie to watch tonight, and major, life-altering ones, such as whether to go to college or get a job after high school. This reliance may be justified because others' decisions are informative: our friends' choices reveal some of their knowledge, which potentially enables us to make better decisions for ourselves. Nevertheless, in the vast majority of real-life situations, we view the implicit advice of our friends with at least some skepticism. In essentially any setting in which we would expect to find social learning, preferences influence choices as much as information does. For instance, upon observing a friend's decision to patronize a particular restaurant, we learn something about the restaurant's quality, but her decision is also influenced by her preference over cuisines. Similarly, a stock purchase reflects both company quality and risk preferences. In each case, two individuals with the same information may reasonably choose opposite actions.

The structure of our social network determines what information we can obtain through social ties. This structure comprises not only the pattern of links between individuals, but also patterns of preferences among neighbors. Since differences in preferences impact our ability to infer information from our friends' choices, these preference patterns affect the flow of information. In real-world networks, homophily is a widespread structural regularity: individuals interact much more frequently with others who share similar characteristics or preferences.1 To understand the effects of network structure on information transmission, we must therefore consider how varying preference distributions, and the extent of homophily, influence social learning.

In this paper, we study a sequential model of social learning to elucidate how link structure and preference structure interact to determine learning outcomes. We adopt a particularly simple representation of individual decisions, assuming binary states and binary actions, to render a clear intuition on the role of the network. Our work builds directly on that of Acemoglu et al. (2011) and Lobel and Sadler (2015), introducing two key innovations. First, we significantly relax the assumption of homogeneous preferences, considering a large class of preference distributions among the agents. Although all agents prefer the action matching the underlying state, they can differ arbitrarily in how they weigh the risk of error in each state. Second, we allow correlations between the link structure of the network and individual preferences, enabling our study of homophily.

Our principal finding is that network link density determines how preference heterogeneity and homophily impact learning. In a relatively sparse network, diverse preferences are a barrier to information transmission: heterogeneity between neighbors introduces an additional source of noise. However, we find that sufficiently strong homophily can ameliorate this noise, leading to learning results that are comparable to those in networks with homogeneous preferences. In a dense network, the situation is quite different. If preferences are sufficiently diverse, dense connectivity facilitates learning that is highly robust, and too much homophily may lead to inefficient herding. Homophilous connections offer more useful information, but the information provided through additional ties quickly becomes redundant. Conversely, diverse ties provide less information individually, but there is more independence between observations. Hence, the value of a homophilous connection versus a diverse connection depends on the other social information available to an individual.


Our formal analysis begins with sparse networks, meaning networks in which there is a uniform bound on how many neighbors any agent can observe. We first present a specialized example to highlight key mechanisms underlying our results. There are two distinct reasons why preference heterogeneity reduces the informational value of an observation. One source of inefficiency is uncertainty about how our neighbors make trade-offs. Perhaps surprisingly, there is additional inefficiency simply by virtue of the opposing trade-offs agents with different preferences make. Consider a diner who prefers Japanese food to Italian. When choosing between restaurants, this diner might default to a Japanese restaurant unless there is strong evidence that a nearby Italian option is of higher quality. Observing this individual enter a Japanese restaurant is a much weaker signal of quality than observing the same individual enter an Italian restaurant. Consequently, this observation is unlikely to influence a person who strongly prefers Italian food. When connections are scarce, long-run learning depends upon individual observations carrying a greater amount of information, and homophily reduces inefficiency from both sources.

Further results show that preference heterogeneity is generically harmful in a sparse network. Theorem 1 shows that we can always render asymptotic learning impossible via a sufficiently extreme preference distribution. Moreover, we find that the improvement principle, whereby agents are guaranteed higher ex ante utility than any neighbor, breaks down with even a little preference diversity. The improvement principle is a cornerstone of earlier results in models with homogeneous preferences (Banerjee and Fudenberg 2004, Acemoglu et al. 2011), but we show that there can be no improvement principle unless strong preferences occur much less frequently than strong signals. Homophily reduces the inefficiency from preference diversity by rescuing the improvement principle: if an agent can identify a neighbor with preferences very close to her own, then she can use her signal to improve upon that neighbor's choice. We define the notion of a strongly homophilous network, in which each agent can find a neighbor with arbitrarily similar preferences in the limit as the network grows, and we find that learning in a strongly homophilous network is comparable to that in a network with homogeneous preferences. Moreover, adding homophily to a network that already satisfies the improvement principle will never disrupt learning.

In contrast, when networks are dense, preference heterogeneity plays a positive role in the learning process. When agents have many sources of information, the law of large numbers enables them to learn as long as there is some independence between the actions of their neighbors. If the support of the preference distribution is sufficiently broad, there are always some types that act on their private information, creating the needed independence. Goeree et al. (2006) find this leads to asymptotic learning in a complete network even with bounded private beliefs. Theorem 4 generalizes this insight to a broad class of network structures, showing that learning robustly succeeds within a dense cluster of agents as long as two conditions are satisfied: there is sufficient diversity within the cluster, and agents in the cluster are aware of its existence. Since homophily reduces the diversity of preferences among an agent's neighbors, it has the potential to interfere with learning in a dense network. Example 4 shows that if homophily is especially extreme, with agents sorting themselves into two isolated clusters based on preferences, informational cascades can emerge, and learning is incomplete. We might imagine a network of individuals with strongly polarized political beliefs, in which agents on each side never interact with agents on the other. Interestingly, a small amount of bidirectional communication between the two clusters is sufficient to overcome this failure. The seminal paper of Bikhchandani et al. (1992) emphasized the fragility of cascades, showing that introducing a little outside information can quickly reverse them. Our result is similar in spirit, showing that in the long run, the negative impact of homophily on learning in dense networks is also fragile.

We make several contributions to the study of social learning and the broader literature on social influence. First, we show clearly how preference heterogeneity has distinct effects on two key learning mechanisms. This in turn helps us understand how the network affects long-run learning. Most of the social learning literature assumes that all individuals in society have identical preferences, differing only in their knowledge of the world.2 Smith and Sorensen (2000) offer one exception, showing that, in a complete network, diverse preferences generically lead to an equilibrium in which observed choices become uninformative. However, a key assumption behind this result is that preferences are nonmonotonic in the state: different agents may change their actions in opposite directions in response to the same new information. This excludes the case in which utilities contain a common value component, which is our focus. Closer to our work, Goeree et al. (2006) consider a model with private and common values in a complete network, obtaining a result that we generalize to complex dense networks. Models that situate learning in the context of investment decisions have also considered preference heterogeneity. This work finds that heterogeneity hinders learning and may increase the incidence of cascades, leading to pathological spillover effects (Cipriani and Guarino 2008). Our innovation is the incorporation of preference heterogeneity and network structure into the same model. We obtain results for general classes of networks, providing new insights on the underlying learning mechanisms.

In tandem with our study of preference diversity, we shed light on the impact of homophily on social learning. Despite its prevalence, few papers on learning have considered homophily. Golub and Jackson (2012) study group-[…] lead strong or weak ties to be most influential.

We first describe our model before analyzing learning in sparse networks. Our analysis emphasizes the difficulties that preference diversity creates and the benefits that homophily confers. We next provide general results on dense networks, finishing with a brief discussion. Most proofs are contained in the main body of the paper, with more technical results given in the appendix.

2. Model

Each agent in a countably infinite set sequentially chooses between two actions, 0 and 1. We index agents by the order of their decisions, with agent m […] That is, agent n observes the value of x_m for all m ∈ B(n). The neighborhood B(n) is a random variable, and the sequence of neighborhood realizations describes a social network of connections between the agents. We call the probability

q_n = P(θ = 1 | B(n), x_m for m ∈ B(n))

agent n's social belief. We represent the structure of the network as a joint distribution ℚ over all possible sequences of neighborhoods

and types; we call ℚ the network topology and assume ℚ is common knowledge. We allow correlations across neighborhoods and types since this is necessary to model homophily, but we assume ℚ is independent of the state θ and the private signals. The types t_n share a common marginal distribution 𝒯, and the distribution 𝒯 has full support in some range (γ, γ̄), where 0 ≤ γ ≤ 1/2 ≤ γ̄ ≤ 1. The range (γ, γ̄) provides one measure of the diversity of preferences.

Agent n's information set I_n consists of the type t_n, the signal s_n, the neighborhood realization B(n), and the decisions x_m for every agent m ∈ B(n). Agent n's strategy σ_n is a function mapping realizations of I_n to decisions in {0, 1}. A strategy profile σ is a sequence of strategies for each agent. We use σ_{−n} to denote the set of all strategies other than agent n's, σ_{−n} = {σ_1, …, σ_{n−1}, σ_{n+1}, …}, and we can represent the strategy profile as σ = (σ_n, σ_{−n}). Given a strategy profile σ, the sequence of actions {x_n}_{n∈ℕ} is a stochastic process with measure P_σ. We analyze the perfect Bayesian equilibria of the social learning game, denoting the set of equilibria by Σ.

We study asymptotic outcomes of the learning process. We interpret these outcomes as measures of long-run efficiency and information aggregation. We say that asymptotic learning occurs if, in the limit as n grows, agents act as though they have perfect knowledge of the state. Equivalently, asymptotic learning occurs if actions converge in probability on the true state:

lim_{n→∞} P_σ(x_n = θ) = 1.

This is the strongest limiting result achievable in our framework. Asymptotic learning implies almost sure convergence of individual beliefs, but almost sure convergence of actions need not occur: agents must continue to act based on their signals to achieve full information aggregation. Whether asymptotic learning obtains is the main focus of our analysis, but we also comment on learning rates to shed some light on shorter term outcomes.

For algebraic simplicity, we often impose the following symmetry and density assumptions, but where they save little effort, we refrain from invoking them.

Assumption 1. The private belief distributions are antisymmetric: F_0(r) = 1 − F_1(1 − r) for all r ∈ [0, 1]; the marginal type distribution 𝒯 is symmetric around 1/2; the distributions F_0, F_1, and 𝒯 have densities.

We conclude this section with two basic lemmas required in our subsequent analysis. These lemmas provide useful properties of belief distributions and characterize best response behavior.

Lemma 1. The private belief distributions F_0 and F_1 satisfy the following properties:
(a) For all r ∈ (0, 1), we have dF_0/dF_1(r) = (1 − r)/r.
(b)–(d) […]
(e) Under Assumption 1, dF_0(r) = ((1 − r)/r) dF_0(1 − r) and dF_1(r) = (r/(1 − r)) dF_1(1 − r).

Proof. Parts (a) through (d) comprise Lemma 1 in Acemoglu et al. (2011). Part (e) follows immediately from part (a) together with Assumption 1. □

Lemma 2. Let σ ∈ Σ be an equilibrium, and let I_n be a realization of agent n's information set. The decision of agent n satisfies

x_n = 1 if P_σ(θ = 1 | I_n) > t_n,   x_n = 0 if P_σ(θ = 1 | I_n) < t_n,

and x_n ∈ {0, 1} otherwise. Equivalently, the decision of agent n satisfies

x_n = 0 if p_n < t_n(1 − q_n) / (t_n(1 − q_n) + q_n(1 − t_n)),   x_n = 1 if p_n > t_n(1 − q_n) / (t_n(1 − q_n) + q_n(1 − t_n)),

and x_n ∈ {0, 1} otherwise.

Proof. Agent n maximizes her expected utility given her information set I_n and the equilibrium σ ∈ Σ. Therefore, she selects action 1 if E_σ[θ(1 − t_n) | I_n] > E_σ[(1 − θ)t_n | I_n]. The agent knows her type t_n, so this condition is equivalent to E_σ[θ | I_n] > t_n. Since θ is an indicator function and t_n is independent of θ, we have E_σ[θ | I_n] = P_σ(θ = 1 | s_n, B(n), x_m for all m ∈ B(n)), proving the clause for x_n = 1. The proof for x_n = 0 is identical with the inequalities reversed. The second characterization follows immediately from the first and an application of Bayes' rule. □

An agent chooses action 1 whenever the posterior probability that the state is 1 is higher than her type, and she chooses action 0 whenever the probability is lower than her type. Therefore, agent n's type t_n can be interpreted as the minimum belief agent n must have that the true state is θ = 1 before she will choose action x_n = 1.
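The best-response cutoff in Lemma 2 is simple enough to state in code. The sketch below is ours, not the paper's; it computes the private-belief threshold t_n(1 − q_n)/(t_n(1 − q_n) + q_n(1 − t_n)) and the resulting decision.

```python
def belief_cutoff(t, q):
    """Lemma 2 cutoff: an agent with type t and social belief q chooses
    action 1 exactly when her private belief p exceeds this value."""
    return t * (1 - q) / (t * (1 - q) + q * (1 - t))

def best_response(p, t, q):
    """Equilibrium decision of Lemma 2 (ties, where the agent is
    indifferent, are broken toward action 0 here)."""
    return 1 if p > belief_cutoff(t, q) else 0
```

With a neutral social belief q = 1/2 the cutoff reduces to t itself, matching the interpretation of t_n as the minimum belief required to choose action 1, and stronger social evidence for state 1 (larger q) lowers the cutoff.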

3. Sparsely Connected Networks

We call a network sparse if there exists a uniform bound on the size of all neighborhoods; we say the network is M-sparse if, with probability 1, no agent has more than M neighbors. We begin our exploration of learning in sparse networks with an example.

Example 1. Suppose the signal structure is such that F_0(r) = 2r − r² and F_1(r) = r². Consider the network topology ℚ in which each agent observes her immediate predecessor with probability 1; agent 1 has type t_1 = 1/5 with probability 0.5 and type t_1 = 4/5 with probability 0.5, and any other agent n has type t_n = 1 − t_{n−1} with probability 1. In this network, agents fail to asymptotically learn the true state, despite having unbounded beliefs and satisfying the connectivity condition from Acemoglu et al. (2011).

Without loss of generality, suppose t_1 = 1/5. We show inductively that all agents with odd indices err in state 0 with probability at least 1/4, and likewise agents with even indices err in state 1 with probability at least 1/4. For the first agent, observe that F_0(1/5) = 9/25 < 3/4, so the base case holds. Now suppose the claim holds for all agents of index less than n, and n is odd. The social belief q_n is minimized if x_{n−1} = 0, taking the value

q_n ≥ P_σ(x_{n−1} = 0 | θ = 1) / (P_σ(x_{n−1} = 0 | θ = 1) + P_σ(x_{n−1} = 0 | θ = 0)) ≥ (1/4) / (1/4 + 1) = 1/5.

It follows from Lemma 2 that agent n will choose action 1 whenever p_n > 1/2. We obtain the bound

P_σ(x_n = 1 | θ = 0) ≥ 1 − F_0(1/2) = 1/4.

An analogous calculation proves the inductive step for agents with even indices. Hence, all agents err with probability bounded away from zero, and asymptotic learning fails. Note this failure is quite different from the classic […] a neighbor's decision. We can decompose an agent's utility into two components: utility obtained through copying her neighbor and the improvement her signal allows over this. The improvement component is always positive when private beliefs are unbounded, creating the possibility for improvements to accumulate toward complete learning over time. With homogeneous preferences, this is exactly what happens because copying a neighbor implies earning the same utility as that neighbor, so utility is strictly increasing along a chain of agents. In our example, differing preferences mean that an agent earns less utility than her neighbor if she copies. The improvement component is unable to make up this loss, and asymptotic learning fails.

Although we shall model far more general network structures, in which agents have multiple neighbors and are uncertain about their neighbors' types, this insight is important throughout this section. Learning fails if there is a chance that an agent's neighbors have sufficiently different preferences because opposing risk trade-offs obscure the information that is important to the agent. Homophily in the network allows individuals to identify neighbors who are similar to themselves, and an improvement principle can operate along homophilous connections.

3.1. Failure Caused by Diverse Preferences

This subsection analyzes the effects of diverse preferences in the absence of homophily. We assume that all types and neighborhoods are independently distributed.

Assumption 2. The neighborhoods {B(n)}_{n∈ℕ} and types {t_n}_{n∈ℕ} are mutually independent.

Definition 1. Define the ratio R^ε_t ≡ t(1 − ε) / (t(1 − ε) + (1 − t)ε). Given private belief distributions {F_i}, we say that the preference distribution 𝒯 is M-diverse with respect to beliefs if there exists ε with 0 < ε < 1/2 such that

∫₀¹ [ F_0(R^ε_t)^{1/M} − F_1(R^{1−ε}_t) ] d𝒯(t) ≤ 0.

A type distribution 𝒯 with M-diverse preferences has at least some minimal amount of probability mass located near the endpoints of the unit interval. We could also interpret this as a polarization condition, ensuring there are many people in society with strongly opposed preferences. As M increases, the condition requires that more mass is concentrated on extreme types.

Theorem 1. For any N […] away from 1 across all M-sparse networks.

Proof. Given the ε in Definition 1, we show the social belief q_n is contained in [ε, 1 − ε] with probability 1 for all n, which immediately implies all of the stated results. Proceed inductively; the case n = 1 is clear with q_1 = 1/2. Suppose the result holds for all n ≤ k. Let B(k + 1) = B with |B| ≤ M, and let x_B denote the random vector of observed actions. If x ∈ {0, 1}^{|B|} is the realized vector of decisions that agent k + 1 observes, we can write the social belief q_{k+1} as 1/(1 + l), where l is a likelihood ratio:

l = P_σ(x_B = x | θ = 0) / P_σ(x_B = x | θ = 1) = ∏_{m∈B} P_σ(x_m = x_m | θ = 0, x_i = x_i, i < m) / P_σ(x_m = x_m | θ = 1, x_i = x_i, i < m).

Conditional on a fixed realization of the social belief q_m, the decision of each agent m in the product terms is independent of the actions of the other agents. Since q_m ∈ [ε, 1 − ε] with probability 1, we can fix the social belief of agent m at the end points of this interval to obtain bounds. Each term of the product is bounded above by […], and this in turn implies that q_{k+1} ≥ ε. A similar calculation using a lower bound on the likelihood ratio shows that q_{k+1} ≤ 1 − ε. □

Regardless of the signal or network structure, we can always find a preference distribution that will disrupt asymptotic learning in an M-sparse network. Moreover, because of the uniform bound, this is a more severe failure than what occurs with bounded beliefs and homogeneous preferences. Acemoglu et al. (2011) show that asymptotic learning often fails in an M-sparse network if private beliefs are bounded, but the point at which learning stops depends on the network structure and may still be arbitrarily close to complete learning. In this sense, the challenges introduced by preference heterogeneity are more substantial than those introduced by weak signals.

An analysis of 1-sparse networks allows a more precise characterization of when diverse preferences cause the improvement principle to fail. In the proof of our result, we employ the following convenient representation of the improvement function under Assumptions 1 and 2. Recall the notation R^y_t ≡ t(1 − y) / (t(1 − y) + (1 − t)y).

Lemma 3. Suppose Assumptions 1 and 2 hold. Define Z: [1/2, 1] → [1/2, 1] by

Z(y) = y + ∫₀¹ [ (1 − y)F_0(R^y_t) − yF_1(R^y_t) ] d𝒯(t).   (1)

In any equilibrium σ, we have P_σ(x_n = θ | B(n) = {m}) = Z(P_σ(x_m = θ)).

Proof. Observe under Assumption 1 we have P_σ(x_n = θ) = P_σ(x_n = θ | θ = i) for each i ∈ {0, 1}, and ∫₀¹ f(t) d𝒯(t) = ∫₀¹ f(1 − t) d𝒯(t) for any f. Taking y = P_σ(x_m = θ), we have P_σ(x_n = θ | B(n) = {m}) equal to

∫₀¹ ∑_{i=0}^{1} P_σ(x_n = θ | B(n) = {m}, x_m = i, θ = 0, t_n = t) · P_σ(x_m = i | θ = 0) d𝒯(t)
= ∫₀¹ ∑_{i=0}^{1} P(p_n ≤ R^{q_n}_t) P_σ(x_m = i | θ = 0) d𝒯(t)
= ∫₀¹ [ yF_0(R^{1−y}_t) + (1 − y)F_0(R^y_t) ] d𝒯(t)
= ∫₀¹ [ y + (1 − y)F_0(R^y_t) − y(1 − F_0(R^{1−y}_t)) ] d𝒯(t)
= y + ∫₀¹ [ (1 − y)F_0(R^y_t) − yF_1(R^y_{1−t}) ] d𝒯(t)
= y + ∫₀¹ [ (1 − y)F_0(R^y_t) − yF_1(R^y_t) ] d𝒯(t),

as desired. Here q_n denotes the social belief following the observation x_m = i; conditional on θ = 0, it equals 1 − y when x_m = 0 and y when x_m = 1. The penultimate equality uses Assumption 1 together with the identity 1 − R^{1−y}_t = R^y_{1−t}, and the final equality uses the symmetry of 𝒯. □

Theorem 2. Suppose the network is 1-sparse, and let Assumptions 1 and 2 hold. If either of the following conditions is met, then asymptotic learning fails.
(a) The preference distribution satisfies lim inf_{t→0} 𝒯(t)/t > 0.
(b) For some K > 1 we have

lim_{r→0} F_0(r)/r^{K−1} = c > 0, and ∫₀¹ [ t^{K−1}(K − (2K − 1)t) / (1 − t)^K ] d𝒯(t) < 0.

Proof. Each part follows from the existence of an ε > 0 such that Z(y) given in Lemma 3 satisfies Z(y) < y for y ∈ [1 − ε, 1). This immediately implies that learning is incomplete.

Assume the condition of part (a) is met. Dividing the improvement term by 1 − y to normalize, for any δ > 0 we obtain the bound

∫₀¹ [ F_0(R^y_t) − (y/(1 − y))F_1(R^y_t) ] d𝒯(t) ≤ F_0(R^y_{1−δ}) + 𝒯(δ) − (y/(1 − y)) ∫₀¹ F_1(R^y_t) d𝒯(t).

Choosing δ sufficiently small, the second term is negligible, while for large enough y, the first term approaches zero, and the third is bounded above by a constant less than zero; here the condition lim inf_{t→0} 𝒯(t)/t > 0 ensures the mass of low types does not vanish faster than the thresholds R^y_t shrink. Hence the improvement term in Equation (1) is negative on [1 − ε, 1) for sufficiently small ε. The proof of part (b) is presented in the appendix. □

These conditions are significantly weaker than 1-diversity. We establish a strong connection between the tail thicknesses of the type and signal distributions.
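Equation (1) can be evaluated numerically. The sketch below is ours (not the authors' code); it uses the signal structure of Example 1, F_0(r) = 2r − r² and F_1(r) = r². With all types fixed at 1/2, the improvement term simplifies to (1 − y)², whereas with types uniform on (0, 1), the thick-tailed case covered by part (a) of Theorem 2, the improvement turns negative as y approaches 1.

```python
F0 = lambda r: 2.0 * r - r * r          # private-belief CDF given state 0 (Example 1)
F1 = lambda r: r * r                    # private-belief CDF given state 1 (Example 1)

def R(t, y):
    """Threshold ratio R_t^y = t(1-y) / (t(1-y) + (1-t)y)."""
    return t * (1 - y) / (t * (1 - y) + (1 - t) * y)

def Z_homogeneous(y):
    """Improvement function with every type equal to 1/2: the integrand of
    Equation (1) at t = 1/2, which equals y + (1 - y)^2 for these signals."""
    return y + (1 - y) * F0(R(0.5, y)) - y * F1(R(0.5, y))

def Z_uniform(y, n=20_000):
    """Improvement function with types uniform on (0, 1), approximated
    by a midpoint Riemann sum (our numerical shortcut)."""
    s = sum((1 - y) * F0(R((k + 0.5) / n, y)) - y * F1(R((k + 0.5) / n, y))
            for k in range(n)) / n
    return y + s
```

Here Z_homogeneous(y) − y = (1 − y)² > 0 for every y < 1, so improvements keep accumulating, while Z_uniform(0.95) < 0.95: uniform types are diverse enough to break the improvement principle, consistent with part (a).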

The improvement principle fails to hold unless the preference distribution has sufficiently thin tails; for instance, part (a) implies learning fails under any signal structure if types are uniformly distributed on (0, 1). As the tails of the belief distributions become thinner, the type distribution must increasingly concentrate around 1/2 in order for learning to remain possible. In many cases, little preference diversity is required to disrupt the improvement principle.

Even if preferences are not so diverse that learning fails, we might expect heterogeneity to significantly slow the learning process. Interestingly, once the distribution of preferences is concentrated enough to allow an improvement principle, learning rates are essentially the same as with homogeneous preferences. Lobel et al. (2009) show in a […]

Proposition 1. Suppose […] Z(y) > y for all y ∈ [1/2, 1). If for some K > 1 we have

lim_{r→0} F_0(r)/r^{K−1} = c > 0, and ∫₀¹ [ t^{K−1}(K − (2K − 1)t) / (1 − t)^K ] d𝒯(t) > 0,

then the probability of error decreases as n^{−1/(K+1)}:

P_σ(x_n ≠ θ) = O(n^{−1/(K+1)}).

Proof. See appendix. □

Once an improvement principle holds, the size of the improvements, and hence the rate of learning, depends on the tails of the signal structure. Early work on learning suggests that diverse preferences can slow learning (Vives 1993, 1995), even if information still aggregates in the long run. Perhaps surprisingly, Proposition 1 shows that as long as an improvement principle still holds, asymptotic learning rates are identical to the homogeneous preferences case. This result suggests that the learning speed we find if preferences are not too diverse, or if we have homophily as in the next section, is comparable to that in the homogeneous preferences case.

[…] some of the learning challenges preference heterogeneity introduces. Analyzing homophily in our model presents technical challenges because we cannot retain the independence assumptions from the last subsection; we need to allow correlations to represent homophily. In a model with homogeneous preferences, Lobel and Sadler (2015) highlight unique issues that arise when neighborhoods are correlated. This creates information asymmetries leading different agents to have different beliefs about the overall structure of connections in the network. To avoid the complications this creates, and focus instead on issues related to preferences and homophily, we assume the following.

Assumption 3. For every agent n, the neighborhood B(n) is independent of the past neighborhoods and types {(t_m, B(m))}_{m<n}.

We impose Assumption 3 to obtain a more tractable problem. Although the assumption is not costless, we retain the ability to explore the role of preferences and homophily in a rich class of network topologies. In many cases, we can translate a network of interest into an identical (or at least similar) one that satisfies Assumption 3. For instance, Examples 2 and 3 in §4 show two different, but equivalent, ways to model a network with two complete subnetworks, one with low types and one with high types. The first example is perhaps more intuitive, but the second, which satisfies Assumption 3, is entirely equivalent.

If agent m is a neighbor of agent n, then under our previous assumption, the distribution of t_n conditioned on the value of t_m is 𝒯, regardless of agent m's type. With homophily, we would expect this conditional distribution to concentrate around the realized value of t_m. We define a notion of strong homophily to capture the idea that agents are able to find neighbors with similar types to their own in the limit as the network grows large.

To formalize this, we recall terminology introduced by Lobel and Sadler (2015). We use B̂(n) to denote an extension of agent n's neighborhood, comprising agent n's neighbors, her neighbors' neighbors, and so on.

Definition 2. A network topology ℚ features expanding subnetworks if, for all positive integers K,

lim sup_{n→∞} ℚ(|B̂(n)| < K) = 0.

[…] choice functions {γ_n}_{n∈ℕ}. The chosen neighbor topology ℚ_γ consists only of the links in ℚ selected by the neighbor choice functions {γ_n}_{n∈ℕ}.

Without the expanding subnetworks condition, there is some subsequence of agents acting based on a bounded number of signals, which clearly precludes asymptotic learning. We interpret this as a minimal connectivity requirement. Neighbor choice functions allow us to make precise the notion of identifying a neighbor with certain attributes. We call a network topology strongly homophilous if we can form a minimally connected chosen neighbor topology in which neighbors' types become arbitrarily close. That is, individuals can identify a neighbor with similar preferences, and the subnetwork formed through these homophilous links is itself minimally connected.

Definition 3. The network topology ℚ is strongly homophilous if there exists a sequence of neighbor choice functions {γ_n}_{n∈ℕ} such that ℚ_γ features expanding subnetworks, and for any ε > 0 we have

lim_{n→∞} ℚ(|t_n − t_{γ_n(B(n))}| > ε) = 0.

Theorem 3. Suppose private beliefs are unbounded and Assumption 3 holds. If ℚ is strongly homophilous, asymptotic learning obtains.

Proof. See appendix. □

This theorem is proved by repeatedly applying an improvement principle along the links of the chosen neighbor topology. If, in the limit, agents are nearly certain to have neighbors with types in a small neighborhood of their own, they will asymptotically learn the true state. The reasoning behind this result mirrors that of our previous negative result. With enough homophily, we ensure the neighbor shares the agent's priorities with regard to trade-offs, so copying this neighbor's action entails no loss in ex ante expected utility. Unbounded private beliefs are then sufficient for improvements to accumulate over time. We note that the condition in Theorem 3 does not require the sequence of type realizations to converge. As we now illustrate, sufficient homophily can exist in a network in which type realizations are mutually independent and distributed according to an arbitrary 𝒯 with full support on (0, 1).

To better understand our positive result, consider its application to the following simple class of networks. We shall call a network topology ℚ_λ a simple λ-homophilous network if it has the following structure. Let λ be a nonnegative real-valued parameter, and let 𝒯 be a type distribution with a density. The types are mutually independent and are generated according to the distribution 𝒯. Given a realization of {t_m}_{m≤n}, define i* such that

t_n ∈ [ i*/(n − 1), (i* + 1)/(n − 1) ),   (2)

and let m_i denote the index of the agent with ith smallest type among agents 1, 2, …, n − 1. Define the weights {w_i}_{i<n} […]

For example, let 𝒯 be the uniform distribution on (0, 1), and suppose agent n = 101 has randomly drawn type t_101 = 0.552. By Equation (2), we have i* = 55 since 0.552 ∈ [0.55, 0.56). Suppose agent 80 has the 55th lowest type among the first 100 agents. Then, agent 101 will observe the decision of agent 80 with probability 101^λ/(101^λ + 99) and observe any other agent with probability 1/(101^λ + 99). Since agent 80, with the 55th lowest type among the first hundred agents, is likely to have a type near t_101 = 0.552, this network exhibits homophily.

The parameter λ neatly captures our concept of homophily. A value of λ = 0 corresponds to a network with no homophily, in which each agent's neighborhood is a uniform random draw from the past, independent of realized types. As λ increases, agent n's neighborhood places increasing weight on the past agent whose type rank most closely matches agent n's percentile in the type distribution. Moreover, it is easy to check that B(n) contains a past agent drawn uniformly at random, independent of the history, so Assumption 3 is satisfied. We obtain a sharp characterization of learning in these networks as a function of the parameter λ.

Proposition 2. Suppose private beliefs are unbounded, Assumption 1 holds, and lim inf_{t→0} 𝒯(t)/t > 0. Asymptotic learning obtains in a simple λ-homophilous network if and only if λ > 1.

Proof. For the forward implication, two observations establish that Theorem 3 applies. First,

lim_{n→∞} ℚ_λ(B(n) = {m_{i*}}) = lim_{n→∞} n^λ / (n^λ + n − 2) = 1

whenever λ > 1. Second, it follows from the Glivenko–Cantelli theorem that for any ε > 0, we have

lim_{n→∞} ℚ_λ(|t_n − t_{m_{i*}}| > ε) = 0.

Given any ε > 0, we can find N(ε) with ℚ_λ(B(n) ≠ {m_{i*}}) ≤ ε/2 and ℚ_λ(|t_n − t_{m_{i*}}| > ε) ≤ ε/2 for all n ≥ N(ε). Defining the functions {γ_n}_{n∈ℕ} in the natural way, we have

ℚ_λ(|t_n − t_{γ_n(B(n))}| > ε) ≤ ε

for all n ≥ N(ε). It is a simple exercise to show that ℚ_λ features expanding subnetworks.

For the converse result, fix n > m. Define y = P_σ(x_m = θ),

P_i(t) = P_σ(x_m = θ | θ = i, B(n) = {m}, t_n = t), and q_i(t) = P_i(t) / (P_i(t) + 1 − P_{1−i}(t)).

Using Assumption 1 and following calculations similar to the proof of Lemma 3, we obtain

We can uniformly bound Pi4t5, and hence qi4t5, using y Definition 5. Let ⌘ and ⌘0 be modifications of one Å as follows. Define p⇤ 4n 15/4n 2 n 5, and note another with conditional type distributions n and n0 , = É É + that the distribution of agent n’s neighbor can be expressed respectively. Given a sequence ã 8ãn9n 601 17 , we = 2 2 4ã5 as a compound lottery choosing uniformly from the past define the ã-mixture of ⌘ and ⌘0 as the modification ⌘ 4ã5 with probability p⇤ and choosing agent k⇤ otherwise. Con- of ⌘ with conditional type distributions ã n = n n + ditioned on the event that a uniform draw from the past 41 ã 50 . É n n was taken, the probability that xm à is equal to y when conditioned on either possible value= of à. Thus we have We consider networks in which asymptotic learning obtains by virtue of an improvement principle, and we find that mixing such networks with strongly homophilous mod- p⇤y ∂ Pi4t5 ∂ 1 p⇤41 y5 É É ifications preserves learning.

for each i and for all t 401 15. As long as p⇤ is uni- 2 Definition 6. We say that a network topology ⌘ satis- formly bounded away from zero, we can make a slight fies the improvement principle if any agent can use her modification to the argument of Theorem 2 part (a) to show signal to achieve strictly higher utility than any neigh- that learning fails. This is precisely the case if and only if bor. Formally, this means there is a continuous, increas- Å 1. ∂ É ing function4 Z2611 3/27 611 3/27 such that Z4y5> y ! With ä>1, the likelihood of observing the action of for all y 611 3/25, and ⇧ë 6u4tn1xn1à5 B4n5 8m97 2 ó = æ a neighbor with a similar type grows fast as n increases. Z4⇧ë 6u4tm1xm1 à575 for all m

tory of types t110001tn 1 and the history of neighborhoods bility of the improvement principle—and consider a con- B4151 0 0 0 1 B4n5. To simplifyÉ notation, we represent the con- vex combination between this network and one with strong

ditional type distribution of agent n by n. homophily, agents will still learn. One interpretation of this statement is that the addition of homophily to a network Definition 4. Let and be two network topologies ⌘ ⌘0 never harms learning that is already successful without that satisfy Assumption 3. We say that and are modifi- ⌘ ⌘0 homophily. cations of one another if they share the same neighborhood Our findings in this section establish a positive role for distributions and the same marginal type distribution . homophily in social learning. Without homophily, a sparse

If the network topologies ⌘ and ⌘0 are modifications network with diverse preferences struggles to aggregate of each other, they represent very similar networks. They information because the meaning of a decision is difficult will have identical neighborhood distributions and equally to interpret, and different agents make different trade-offs.

Downloaded from informs.org by [216.165.95.68] on 15 September 2016, at 14:56 . For personal use only, all rights reserved. heterogeneous preferences. However, the conditional distri- Homophily counters both sources of inefficiency, leading

butions n and n0 can differ, so the networks may vary sig- to better outcomes as the of homophily increases. nificantly with respect to homophily. This will allow us to In the next section, we see the other side of homophily’s make statements about the impact of increasing homophily impact on learning. Networks with dense connections offer while keeping other properties of the network topology an opportunity to overcome the difficulty of learning with constant. Given two network topologies that are modifica- bounded private beliefs. The ability to observe a diverse tions of each other, we can take convex combinations to neighborhood is a major driver of this positive result, and create other modifications of the original two. an excess of homophily can stifle the learning process. Lobel and Sadler: Preferences, Homophily, and Social Learning Operations Research 64(3), pp. 564–584, © 2016 INFORMS 573
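To make the construction of a simple κ-homophilous network concrete, the following sketch (our own illustration, not code from the paper; all function names are ours) samples agent n's single neighbor using the weights described above: weight n^κ on the rank-matched agent m_{i*} from Equation (2), and weight 1 on each other past agent.

```python
import random

def match_prob(n, kappa):
    """Probability that agent n observes the rank-matched agent m_{i*}:
    n^kappa / (n^kappa + n - 2). It tends to 1 as n grows iff kappa > 1."""
    return n**kappa / (n**kappa + n - 2)

def sample_neighbor(n, types, kappa, rng):
    """Draw agent n's neighbor: the rank-matched agent with probability
    match_prob(n, kappa), otherwise a uniform draw over the remaining
    n - 2 past agents (each then has total probability 1/(n^kappa + n - 2))."""
    t_n = types[n - 1]                               # types[0] belongs to agent 1
    ranked = sorted(range(1, n), key=lambda m: types[m - 1])
    i_star = min(max(int(t_n * (n - 1)), 1), n - 1)  # Equation (2), clamped at the edges
    m_star = ranked[i_star - 1]                      # agent with the i*-th smallest type
    if rng.random() < match_prob(n, kappa):
        return m_star
    return rng.choice([m for m in range(1, n) if m != m_star])

rng = random.Random(1)
types = [rng.random() for _ in range(101)]
neighbor = sample_neighbor(101, types, kappa=1.5, rng=rng)
```

With κ = 1 the probability of the rank-matched draw is n/(2n − 2), which stays near 1/2, while for any κ > 1 it tends to one; this is exactly the dichotomy in Proposition 2.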

4. Densely Connected Networks

A classic result by Banerjee (1992) and Bikhchandani et al. (1992) shows that informational cascades occur in a complete network when private beliefs are bounded and preferences are homogeneous. More recently, Goeree et al. (2006) find that unbounded preferences can remedy the situation, proving a strong positive result in a comparable setting. Intuitively, if the distribution of preferences has full support on (0, 1) and types are independent, then agents always emerge whose preferences roughly balance against their social information. These agents must rely on their private signals to choose an action, so new information is revealed to the rest of the network. Even if an individual decision provides little information, dense connections allow learning to operate through a law-of-large-numbers mechanism in addition to the improvement principle. The presence of this second learning mechanism leads to robust learning.

In contrast to our results for sparse networks, preference diversity appears decisively beneficial with full observation. In this section, we show this insight is generic in a broad class of dense network components that we call clusters.

Definition 7. A cluster C is a sequence of stopping times {σ_i}_{i∈ℕ} with respect to the filtration generated by {(t_n, B(n))}_{n∈ℕ} that satisfies

    lim_{i→∞} Q(σ_k ∈ B(σ_i)) = 1  for all k ∈ ℕ.

A cluster is a generalization of the concept of a (randomly generated) clique. Any complete subnetwork—a clique—with infinitely many members is a cluster. A subset C of the agents such that any member of C is observed with probability approaching one by later members of C is also a cluster. Clusters may exist deterministically, with the indices σ_i being degenerate random variables as in the complete network, or they may arise stochastically. For instance, suppose types are i.i.d. and all agents with types above 0.8 are connected with each other through some correlation between neighborhoods and types. In this case, the stopping time σ_i would refer to the ith agent with type above 0.8, and this group would form a cluster. Note that a cluster potentially has far fewer edges than a clique; the ratio of the number of edges in a cluster to the number of edges in a clique can approach zero as n grows.

As in §3, we introduce an independence assumption for tractability.

Assumption 4. For each stopping time σ_i in the cluster C, conditional on the event σ_i = n, the type t_n is generated according to the distribution F_{σ_i}, independently of the history t_1, …, t_{n−1} and B(1), …, B(n).

The assumption above represents a nontrivial technical restriction on the random agents that form a cluster. If the types are i.i.d., then any stopping time will satisfy the assumption, but if the types are correlated, the condition needs to be verified. We still have a great deal of flexibility to represent different network structures. Examples 2 and 3 demonstrate two ways to represent essentially the same network structure, both of which comply with Assumption 4.

Example 2. Suppose types are i.i.d., and suppose the network topology is such that any agent n with type t_n ≥ 1/2 observes any previous agent m with t_m ≥ 1/2, and any agent n with type t_n < 1/2 similarly observes any previous agent m with t_m < 1/2. The sequence of stopping times {σ_i}_{i∈ℕ}, where σ_i is the ith agent with type at least 1/2, forms a cluster, and the agents with types below 1/2 form another.

In the example above, a cluster of agents forms according to realized types; neighborhood realizations are correlated such that high type agents link to all other high type agents. A similar clustering of types can be achieved in a network with deterministic neighborhoods and correlated types instead of correlated neighborhoods.

Example 3. Let F_− and F_+ denote the distribution F conditional on a type realization less than or at least 1/2, respectively. Suppose agents are partitioned into two disjoint fixed subsequences C = {c_i}_{i∈ℕ} and D = {d_i}_{i∈ℕ} with c_1 = 1. Conditional on t_1 < 1/2, agents in C realize types independently according to F_−, with agents in D realizing types independently from F_+; if t_1 ≥ 1/2, the type distributions for the subsequences are switched. Let the network structure be deterministic, with all agents in C observing all previous agents in C, and likewise for the subsequence D. The members of C form one cluster, and the members of D form another.

This section's main finding is a sufficient condition for learning within a cluster {σ_i}_{i∈ℕ}. We say that asymptotic learning occurs within a cluster {σ_i}_{i∈ℕ} if we have

    lim_{i→∞} P_σ(x_{σ_i} = θ) = 1.

We require that the cluster satisfy two properties.

Definition 8. A cluster is identified if there exists a family of neighbor choice functions {γ_i^k}_{i,k∈ℕ} such that for each k, we have

    lim_{i→∞} Q(γ_i^k(B(σ_i)) = σ_k) = 1.

That is, a cluster is identified only if, in the limit as the network grows large, agents in a given cluster can identify which other individuals belong to the same cluster.

A cluster is uniformly diverse if the following two conditions hold:
(a) For any interval I ⊆ (0, 1), there exists a constant ε_I > 0 such that P(t_{σ_i} ∈ I) ≥ ε_I for infinitely many i.
(b) There exists a finite measure μ̂ such that the Radon–Nikodym derivative dμ̂/dF_{σ_i} ≥ 1 almost surely for all i.
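The cluster structure in Example 2 is easy to verify computationally. The sketch below (our illustration; names are ours) generates i.i.d. uniform types and checks that, because neighborhoods are deterministic given types, every earlier member of the high-type cluster is observed by every later member, so the limit in Definition 7 holds with probability exactly one.

```python
import random

def neighborhood(n, types):
    """B(n) in Example 2: agent n observes exactly the earlier agents whose
    types fall on the same side of 1/2 as its own type."""
    high = types[n - 1] >= 0.5
    return {m for m in range(1, n) if (types[m - 1] >= 0.5) == high}

rng = random.Random(0)
N = 200
types = [rng.random() for _ in range(N)]

# Stopping times sigma_i: the index of the ith agent with type >= 1/2.
sigma = [n for n in range(1, N + 1) if types[n - 1] >= 0.5]

# Definition 7's condition: sigma_k lies in B(sigma_i) for every pair k < i.
cluster_ok = all(s_k in neighborhood(s_i, types)
                 for i, s_i in enumerate(sigma)
                 for s_k in sigma[:i])
```

The same check applied to the agents with types below 1/2 confirms the second cluster; the two clusters partition the population while neither is a clique of the whole network.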

The first part of the uniform diversity condition generalizes the notion of unbounded preferences. It is unnecessary for any particular member of a cluster to have a type drawn from a distribution with full support; we simply need infinitely many members of the cluster to fall in any given interval of types. The second part of the condition is technical, requiring the existence of a finite measure on (0, 1) that dominates the type distribution of all agents in the cluster. Without this condition, it would be possible to construct pathological situations by having the type distributions F_{σ_i} concentrate increasing mass near the end points of (0, 1) as i grows. Together, these conditions are sufficient for asymptotic learning in clusters regardless of the signal structure or the structure of the rest of the network.

Theorem 4. Let Assumption 4 hold. Suppose {σ_i}_{i∈ℕ} is an identified, uniformly diverse cluster. Asymptotic learning obtains within the cluster.

Proof. See appendix. ∎

With enough preference diversity, there is sufficient experimentation within a cluster to allow full information aggregation, even with bounded private beliefs. A key and novel element in our characterization is the concept of identification. Agents need to be able to tell with high probability who the other members of their cluster are to ensure asymptotic learning within the cluster. We can immediately apply this result to any network without homophily where the entire network forms a cluster.

Corollary 1. Suppose preferences are unbounded, and the types {t_n}_{n∈ℕ} form an i.i.d. sequence that is also independent of the neighborhoods. If for each m we have

    lim_{n→∞} Q(m ∈ B(n)) = 1,    (3)

asymptotic learning obtains.

Proof. Define the stopping indices σ_i = i, and define the neighbor choice functions

    γ_n^i(B(n)) = i if i ∈ B(n),  and  γ_n^i(B(n)) = ∅ otherwise.

Theorem 4 applies. ∎

This corollary substantially generalizes the asymptotic learning result of Goeree et al. (2006) for the complete network; it obtains a positive result that applies even to networks with correlated neighborhoods and gaps, as long as condition (3) holds.

Example 4. Suppose the types {t_n}_{n∈ℕ} form an i.i.d. sequence, F is symmetric around 1/2, and private beliefs are bounded. Consider a network in which agents sort themselves into three clusters C_i for i ∈ {1, 2, 3}. Let 1 ∈ C_1, and let c_n denote the number of agents in C_1 with index less than n. Each agent n is contained in C_1 with probability 1/2^{c_n}, and these agents observe the entire history of actions. Otherwise, if t_n < 1/2 we have n ∈ C_2, and n observes all prior agents in C_2 and only those agents for sure. Likewise, if t_n ≥ 1/2 we have n ∈ C_3, and n observes all prior agents in C_3 and only those agents for sure.

The cluster C_1 is uniformly diverse but unidentified, and learning fails. The clusters C_2 and C_3 are identified, but the lack of diversity means informational cascades occur with positive probability; asymptotic learning fails here too.

In Example 4, agents sort themselves into three clusters: a small uniformly diverse cluster, a large cluster of low types, and a large cluster of high types. Since private beliefs are bounded, if the low type cluster approaches a social belief close to one, no agent will ever deviate from action 1. Likewise, if the high type cluster approaches a social belief close to zero, no agent will deviate from action 0. Both of these clusters have a positive probability of inefficient herding. If this occurs, the agents in C_1 observe a history in which roughly half of the agents choose each action, and this is uninformative. Since the other members of C_1 make up a negligible share of the population and cannot be identified, their experimentation fails to provide any useful information to agents in C_1.

This example clearly suggests a negative role for homophily in a dense network. Densely connected networks offer an opportunity to overcome the difficulty presented by bounded private beliefs, but if homophily is strong enough that no identified cluster is uniformly diverse, the benefits of increased connectivity are lost. However, an inefficient herding outcome requires extreme isolation from types with different preferences, and even slight exposure is enough to recover a positive long-run result.

Consider the network introduced in Example 3. For a given marginal type distribution F with unbounded preferences, let F_− denote the distribution conditional on a type realization less than 1/2, and F_+ the distribution conditional on a type realization of at least 1/2. Agents are partitioned into two disjoint deterministic subsequences C = {c_i}_{i∈ℕ} and D = {d_i}_{i∈ℕ} with c_1 = 1. Conditional on t_1 < 1/2, agents in C realize types independently according to F_−, while agents in D realize types independently according to F_+, with the distributions switched conditional on t_1 ≥ 1/2. We assume that c_i ∈ B(c_j) and d_i ∈ B(d_j) for all i < j, so that C and D each form a cluster.
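The law-of-large-numbers mechanism behind Theorem 4 and Corollary 1 can be simulated directly in the complete network. The sketch below is our own simplified model, not the paper's construction: types are uniform on (0, 1) and act as posterior thresholds, and private signals are binary with accuracy Q = 0.6, so private beliefs are bounded. It illustrates why diverse preferences keep every history informative: the probability of action 1 is strictly higher under θ = 1 than under θ = 0 at every interior public belief.

```python
import random

Q = 0.6  # signal accuracy; bounded private beliefs

def posterior(pub, s):
    """Bayes update of the public belief pub = P(theta = 1) with signal s."""
    like1 = Q if s == 1 else 1 - Q
    like0 = 1 - Q if s == 1 else Q
    return pub * like1 / (pub * like1 + (1 - pub) * like0)

def action_prob(pub, theta):
    """P(x_n = 1 | theta, pub): with t_n ~ Uniform(0,1) as a threshold,
    P(t_n <= p) = p, so we average the posterior over the signal."""
    p_s1 = Q if theta == 1 else 1 - Q
    return p_s1 * posterior(pub, 1) + (1 - p_s1) * posterior(pub, 0)

def update(pub, x):
    """Public belief after observing action x (type and signal unobserved)."""
    p1, p0 = action_prob(pub, 1), action_prob(pub, 0)
    like1 = p1 if x == 1 else 1 - p1
    like0 = p0 if x == 1 else 1 - p0
    return pub * like1 / (pub * like1 + (1 - pub) * like0)

rng = random.Random(7)
theta, pub = 1, 0.5
for _ in range(2000):
    s = int(rng.random() < (Q if theta == 1 else 1 - Q))
    x = 1 if posterior(pub, s) >= rng.random() else 0   # t_n ~ Uniform(0,1)
    pub = update(pub, x)
```

Because action_prob(pub, 1) − action_prob(pub, 0) = (2Q − 1)(posterior(pub, 1) − posterior(pub, 0)) > 0 for every interior pub, every observed action moves the public belief; this is the independence across observations that preference diversity contributes in dense networks.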

Members of each cluster, however, occasionally observe a member of the other cluster. Only a small amount of bidirectional communication between the clusters is needed to disrupt herding, leading to asymptotic learning in the entire network.

Proposition 4. Assume private beliefs are bounded. Consider a network topology Q with deterministic neighborhoods that is partitioned into two clusters as described above. Asymptotic learning occurs in both clusters if and only if

    sup{i: ∃j, c_i ∈ B(d_j)} = sup{i: ∃j, d_i ∈ B(c_j)} = ∞.

Proof. See appendix. ∎

The key requirement in this statement is another type of minimal connectivity. For each cluster, the observations of the other cluster cannot be confined to a finite set. This means that periodically there is some agent in each cluster who makes a new observation of the other. This single observation serves as an aggregate statistic for an entire cluster's accumulated social information, so it holds substantial weight even though it is generated by a cluster of agents with significantly different preferences. This suggests that informational cascades in two-cluster networks are difficult to sustain indefinitely, even in the presence of homophily, though if observations across groups are very rare, convergence may be quite slow.

5. Conclusions

Preference heterogeneity and homophily are pervasive in real-world social networks, so understanding their effects is crucial if we want to bring social learning theory closer to how people make decisions in practice. Preference diversity, homophily, and network structure impact learning in complex ways, and the results presented in this paper provide insight on how these three phenomena interact. Underlying our analysis is the idea that learning occurs through at least two basic mechanisms, one based on the improvement principle and the other based on a law-of-large-numbers effect. The structure of the network dictates which mechanisms are available to the agents, and the two mechanisms are affected differently by preference heterogeneity and homophily.

The improvement principle is one way people can learn about the world. An individual can often combine her private information with that provided by her neighbor's action to arrive at a slightly better decision than her neighbor. In sparsely connected networks, the improvement principle is the primary means through which learning occurs. However, this mechanism is, to some extent, quite fragile: it requires the possibility of extremely strong signals and a precise understanding of the decisions made by individual neighbors. Preference heterogeneity introduces noise in the chain of observations, and our results show this noise challenges the operation of the improvement principle. The problems that arise are due both to uncertainty regarding how a neighbor makes her decision and to the different trade-offs neighbors face. Homophily ameliorates both issues, making it unambiguously helpful to this type of learning.

Dense connectivity allows a far more robust learning mechanism to operate. Having strong information from individual neighbors becomes less important when one can observe the actions of many neighbors. As long as each observation provides some new information, the law of large numbers ensures that learning succeeds. Diverse preferences provide a degree of independence to the observations, facilitating this means of learning. Homophily, since it reduces preference diversity within a neighborhood, has the potential to interfere, but our results suggest this learning mechanism is resilient enough to be unaffected unless homophily is particularly extreme.

Overall, we find that preference heterogeneity and homophily play positive, complementary roles in social learning, as typical complex networks include both sparse and dense components. If private signals are of bounded strength, preference diversity is necessary to ensure learning in the dense parts of the network via the law-of-large-numbers mechanism. If homophily is also present, the information accumulated in the dense parts of the network can spread to the sparse parts via the improvement principle mechanism. The combination of preference heterogeneity and homophily should generally benefit information aggregation in complex, realistic networks.

Acknowledgments

This paper incorporates results from "Heterogeneity and Social Learning," a working paper with Daron Acemoglu and Asu Ozdaglar. The authors thank Daron and Asu for their permission to use earlier results here. The authors are grateful to Alireza Tahbaz-Salehi for very helpful conversations on early versions of this work, and to Sinan Aral for guidance on the homophily literature. The authors gratefully acknowledge financial support from the NET Institute, www.NETinst.org. The authors also thank the National Science Foundation [Grant CCF-1216004] for its financial support. Any errors are those of the authors.

Appendix

We first prove part (b) of Theorem 2 from §3.1 and establish a version of the improvement principle to prove the results in §3.2. We conclude by applying the theory of martingales to establish results from §4.

Proof of Theorem 2 Part (b)

Our first task is to show the assumption on G_0 implies a similar condition on G_1:

    lim_{r→0} G_0(r)/r^{K−1} = c  ⟹  lim_{r→0} G_1(r)/r^K = c(1 − 1/K).
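This implication can be sanity-checked numerically. In the sketch below (our addition, not part of the paper), we take K = 2 and G_0(r) = r, so c = 1, and use the belief-distribution identity dG_1(r) = (r/(1 − r)) dG_0(r) invoked via Lemma 1 in the proof, which gives G_1 in closed form; the ratio G_1(r)/r^K should then approach c(1 − 1/K) = 1/2.

```python
import math

def G1(r):
    """G1(r) = integral_0^r s/(1-s) ds = -r - log(1-r) when G0(r) = r (K = 2)."""
    return -r - math.log1p(-r)

ratios = [G1(r) / r**2 for r in (0.1, 0.01, 0.001)]
```

The ratios are roughly 0.536, 0.503, and 0.5003, approaching the predicted limit 1/2 from above as r shrinks.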

An application of Lemma 1 and integration by parts gives

    G_1(r) = ∫_0^r dG_1(s) = ∫_0^r (s/(1 − s)) dG_0(s)
           = (r/(1 − r))G_0(r) − ∫_0^r G_0(s)/(1 − s)² ds.

Using our assumption on G_0, given any ε > 0, we may find r_ε such that for all r ≤ r_ε,

    c(1 − ε)r^{K−1} ≤ G_0(r) ≤ c(1 + ε)r^{K−1}.

Thus, for any r ≤ r_ε we have

    ∫_0^r c(1 − ε)s^{K−1}/(1 − s)² ds ≤ ∫_0^r G_0(s)/(1 − s)² ds ≤ ∫_0^r c(1 + ε)s^{K−1}/(1 − s)² ds.

Now compute

    ∫_0^r s^{K−1}/(1 − s)² ds = r^{K−1}/(1 − r) − (K − 1)∫_0^r s^{K−2}/(1 − s) ds
        = r^{K−1}/(1 − r) − (K − 1)∫_0^r s^{K−2} Σ_{i≥0} s^i ds
        = r^K/(1 − r) − ((K − 1)/K)r^K + O(r^{K+1}).

It follows that for any r ≤ r_ε,

    (r/(1 − r))G_0(r) − c(1 + ε)(r^K/(1 − r) − ((K − 1)/K)r^K) + O(r^{K+1}) ≤ G_1(r),

and

    G_1(r) ≤ (r/(1 − r))G_0(r) − c(1 − ε)(r^K/(1 − r) − ((K − 1)/K)r^K) + O(r^{K+1}).

Dividing through by r^K and letting r go to zero, we have

    c(1 − ε)(1 − 1/K) ≤ lim_{r→0} G_1(r)/r^K ≤ c(1 + ε)(1 − 1/K)

for any ε, proving the result.

We now show that

    ∫_0^1 t^{K−1}(K − (2K − 1)t)/(1 − t)^K dF(t) < 0

implies the existence of ε > 0 such that Z(y) < y for all y ∈ [1 − ε, 1). Choose a function h(ε) such that h(ε) > 0 whenever ε > 0, and

    lim_{ε→0} ∫_0^{1−ε} h(ε)/(1 − t)^K dF(t) = 0.

For a given ε > 0, the argument R^y_{1−ε} goes to zero as y approaches 1, so for sufficiently large y and all t ≤ 1 − ε,

    (1 − y)G_0(R_t^y) − yG_1(R_t^y)
      ≤ c[(1 − y)(1 + h(ε))(R_t^y)^{K−1} − ((K − 1)/K)y(1 − h(ε))(R_t^y)^K]
      = c(1 − y)t^{K−1}[(1 + h(ε))(t(1 − y) + y(1 − t)) − ((K − 1)/K)(1 − h(ε))yt]
        · (t(1 − y) + y(1 − t))^{−K}.

For y sufficiently close to 1, the portion of the improvement term

    ∫_0^{1−ε} [(1 − y)G_0(R_t^y) − yG_1(R_t^y)] dF(t)    (4)

is then bounded above by

    (c/K)(1 − y)∫_0^{1−ε} t^{K−1}[K(1 + h(ε))(t(1 − y) + y(1 − t)) − (K − 1)(1 − h(ε))ty]
        · (t(1 − y) + y(1 − t))^{−K} dF(t).    (5)

For t ∈ [0, 1 − ε], the integrand in Equation (5) converges uniformly as y approaches 1 to

    t^{K−1}(K − (2K − 1)t + h(ε)(K − t))/(1 − t)^K.

Therefore, for any ε′ > 0 and y sufficiently close to 1,

    ∫_0^{1−ε} t^{K−1}[K(1 + h(ε))(t(1 − y) + y(1 − t)) − (K − 1)(1 − h(ε))ty]
        · (t(1 − y) + y(1 − t))^{−K} dF(t)
      ≤ ∫_0^{1−ε} t^{K−1}(K − (2K − 1)t + h(ε)(K − t))/(1 − t)^K dF(t) + ε′.

As ε approaches zero, this converges to

    ∫_0^1 t^{K−1}(K − (2K − 1)t)/(1 − t)^K dF(t) + ε′.    (6)

If the integral in Equation (6) is negative, then we can choose some ε′ > 0, ε > 0, and a corresponding y* such that for any y ∈ [y*, 1), the integral in Equation (5) is negative. Therefore,

    ∫_0^{1−ε} [(1 − y)G_0(R_t^y) − yG_1(R_t^y)] dF(t) < 0

for any y ∈ [y*, 1). To complete the proof, we show that for a sufficiently small choice of ε, there exists y_ε < 1 such that

    ∫_{1−ε}^1 [(1 − y)G_0(R_t^y) − yG_1(R_t^y)] dF(t) < 0

for all y ∈ [y_ε, 1). Thus, for y ∈ [max(y_ε, y*), 1), the entire improvement term is negative. Again using Lemma 1 and integration by parts, we have

    (1 − y)G_0(R_t^y) − yG_1(R_t^y) = ∫_0^{R_t^y} [(1 − y) dG_0(r) − y dG_1(r)]
      = ∫_0^{R_t^y} ((1 − y − r)/r) dG_1(r)
      = (y(1 − 2t)/t)G_1(R_t^y) + (1 − y)∫_0^{R_t^y} G_1(r)/r² dr.

Since lim_{r→0} G_1(r)/r^K exists and is positive, there exist constants 0 < c ≤ c̄ such that cr^K ≤ G_1(r) ≤ c̄r^K for all sufficiently small r. Substituting these bounds yields

    (1 − y)G_0(R_t^y) − yG_1(R_t^y) ≤ (cy(1 − 2t)/t)[R_t^y]^K + (c̄(1 − y)/(K − 1))[R_t^y]^{K−1}.

The right-hand side is negative whenever t > y(c(K − 1) + c̄)/(2yc(K − 1) + (2y − 1)c̄). This threshold is decreasing in y, with a limiting value that is strictly less than 1. Therefore, for any ε < 1 − (c(K − 1) + c̄)/(2c(K − 1) + c̄), we can find y_ε such that

    (1 − y)G_0(R_t^y) − yG_1(R_t^y) < 0

for all t ≥ 1 − ε and all y ∈ [y_ε, 1), completing the proof. ∎
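The role of an improvement function Z as in Definition 6 can be visualized by simple iteration. The sketch below is our illustration; this particular Z is a hypothetical stand-in satisfying the required properties (continuous, increasing, Z(y) > y on [1, 3/2)), not a function derived in the paper.

```python
def Z(y):
    """A sample improvement function on [1, 3/2]: continuous, increasing,
    Z(y) > y for y in [1, 3/2), with unique fixed point 3/2."""
    return y + (1.5 - y)**2 / 4

phi = 1.0                 # a priori expected utility floor
trajectory = [phi]
for _ in range(200):
    phi = Z(phi)
    trajectory.append(phi)
```

Each application of the improvement principle along a chain of chosen neighbors lifts expected utility by Z(φ_k) − φ_k > 0, and any such iteration converges to the unique fixed point 3/2, which is the full-learning utility level in the arguments that follow.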
We use the following version of the improvement principle to establish our positive learning results.

Lemma 4 (Improvement Principle). Let Assumption 3 hold. Suppose there exists a sequence of neighbor choice functions {γ_n}_{n∈ℕ} and a continuous, increasing function Z: [1, 3/2] → [1, 3/2] with the following properties:
(a) The chosen neighbor topology Q_γ features expanding subnetworks.
(b) We have Z(y) > y for any y < 3/2.
(c) For any ε > 0, we have

    lim_{n→∞} P_σ(E_σ[u(t_n, x_n, θ) | γ_n(B(n))] < Z(E_σ[u(t_{γ_n(B(n))}, x_{γ_n(B(n))}, θ)]) − ε) = 0.

Then asymptotic learning obtains.

Proof. Note that asymptotic learning is equivalent to

    lim_{n→∞} E_σ[u(t_n, x_n, θ)] = 3/2.

We construct two sequences {ψ_k} and {φ_k} with the property that for all k ≥ 1 and n ≥ ψ_k, E_σ[u(t_n, x_n, θ)] ≥ φ_k. Upon showing that lim_{k→∞} φ_k = 3/2, we shall have our result. Using our assumptions, for any integer K and ε > 0, we can find a positive integer N(K, ε) such that Q(γ_n(B(n)) = {m} for some m < K) ≤ ε for all n ≥ N(K, ε). Proceed by induction to show that E_σ[u(t_n, x_n, θ)] ≥ φ_k for all n ≥ ψ_k. The base case k = 1 trivially holds because an agent may always choose the action preferred a priori according to her type, obtaining expected utility at least 1. Considering n ≥ ψ_{k+1}, we have

    E_σ[u(t_n, x_n, θ)] ≥ Σ_{m<n} E_σ[u(t_n, x_n, θ) | γ_n(B(n)) = m] Q(γ_n(B(n)) = m).

Applying property (c) together with the inductive hypothesis and the expanding subnetworks property, we may take φ_{k+1} arbitrarily close to Z(φ_k) for ψ_{k+1} large enough; since Z is continuous and increasing with Z(y) > y for y < 3/2, the sequence {φ_k} converges to 3/2, completing the proof. ∎

Proof of Proposition 1

Fixing an ε > 0, we can bound the improvement term as

    ∫_0^1 [(1 − y)G_0(R_t^y) − yG_1(R_t^y)] dF(t)
      ≥ ∫_0^{1−ε} [(1 − y)G_0(R_t^y) − yG_1(R_t^y)] dF(t) − ∫_{1−ε}^1 yG_1(R_t^y) dF(t).

Carrying out a similar exercise as in the previous proof, we can show that for any ε′ and any ε < 1/2, we can find y* such that, for all y ∈ [y*, 1), the first term on the right-hand side is controlled by the integral ∫_0^{1−ε} t^{K−1}(K − (2K − 1)t)/(1 − t)^K dF(t). Since ∫_0^1 t^{K−1}(K − (2K − 1)t)/(1 − t)^K dF(t) is positive, we know that ∫_0^1 t^K/(1 − t)^K dF(t) is finite. Hence, for any ε′, we can choose ε such that (Kc̄/c)∫_{1−ε}^1 t^K/(1 − t)^K dF(t) < ε′. Take ε′ smaller than a fixed fraction of ∫_0^1 t^{K−1}(K − (2K − 1)t)/(1 − t)^K dF(t), choose ε accordingly, and find the corresponding y*. For y sufficiently close to 1, there then exists a constant C such that Z(y) > y + C(1 − y)^K. The analysis of Lobel et al. (2009) now implies the result. ∎

Proof of Theorem 3

Consider an agent n whose chosen neighbor is γ_n(B(n)) = m, and condition on the realized t_n and the event γ_n(B(n)) = m. Define p_ε = P_σ(|t_m − t_n| > ε | γ_n(B(n)) = m). We further define E_{it} as the expected utility of agent m given that t_m = t, and E_i analogously to P_i. These quantities are related by

    E_{0t} = 1 − t + 2tP_{0t},   E_{1t} = t + 2(1 − t)P_{1t}.    (7)

Note that P_{it} and E_{it} are constants, independent of the realization of t_n, while P_i and E_i are random variables, as functions of t_n through the conditional distribution F_{t_m|t_n}. Define the thresholds

    L_{t_n} = t_n(1 − P_0)/(t_n(1 − P_0) + (1 − t_n)P_1),
    U_{t_n} = t_nP_0/(t_nP_0 + (1 − t_n)(1 − P_1)).

For the remainder of the proof, we suppress the subscript t_n to simplify notation. An application of Bayes' rule shows that copying the neighbor's action is at least as good as any alternative for agent n whenever her private belief lies between L and U. Fixing t_n, the expected payoff from the decision x_n is easily computed as

    (1/2)[G_0(L)(1 + t_n) + (G_0(U) − G_0(L))(2t_nP_0 + 1 − t_n) + (1 − G_0(U))(1 − t_n)
        + (1 − G_1(U))(2 − t_n) + (G_1(U) − G_1(L))(2(1 − t_n)P_1 + t_n) + G_1(L)t_n],    (9)

with the first line corresponding to the payoff when θ = 0 and the second to the payoff when θ = 1. An application of Lemma 1 provides the following inequalities:

    G_0(L) ≥ ((1 − L)/L)G_1(L) + (1/4)G_1(L/2),
    1 − G_1(U) ≥ (U/(1 − U))(1 − G_0(U)) + (1/4)(1 − G_0((1 + U)/2)).

Using Equation (7) and these inequalities, the expected payoff decomposes into base terms (Equation (11)), which match the neighbor's expected utility, and nonnegative improvement terms (Equation (10)). Integrating over t_n and using Assumption 3, for any ε > 0 we can bound the integrated base terms from below by

    E_σ[u(t_m, x_m, θ)] − ε − p_ε.    (12)

Moving to the improvement terms, note that

    P_0 = ∫_0^1 P_{0t} dF_{t_m|t_n}(t) = ∫_0^1 ((E_{0t} − (1 − t))/(2t)) dF_{t_m|t_n}(t).

The last integrand is Lipschitz continuous in t for t bounded away from zero. Therefore, for any ε with 0 < 2ε < t_n, there is a constant c such that

    (E_0 − (1 − t_n))/(2t_n) − cε − p_ε/t_n ≤ P_0 ≤ (E_0 − (1 − t_n))/(2t_n) + cε + p_ε/t_n.

Similarly, we can find a constant c such that

    (E_1 − t_n)/(2(1 − t_n)) − cε − p_ε/(1 − t_n) ≤ P_1 ≤ (E_1 − t_n)/(2(1 − t_n)) + cε + p_ε/(1 − t_n).

Consider a modification of the improvement terms in Equation (10) in which we replace P_0 by (E_0 − (1 − t_n))/(2t_n) and P_1 by (E_1 − t_n)/(2(1 − t_n)), including in the definitions of L and U. Our work above, together with the continuity of the belief distributions, implies that the modified terms differ from the original improvement terms by no more than some function δ(ε, p_ε, t_n), where δ converges to zero as ε and p_ε approach zero together, and the convergence is uniform in t_n for t_n bounded away from 0 and 1. We can then bound the improvement terms by

    (t_n(1 − P_0)/4)G_1(L/2)
      ≥ (1/8)((1 + t_n − E_0)²/(1 − E_0 + E_1))G_1((1 + t_n − E_0)/(2(1 − E_0 + E_1))) − δ(ε, p_ε, t_n),    (13)

    ((1 − t_n)(1 − P_1)/4)(1 − G_0((1 + U)/2))
      ≥ (1/8)((2 − t_n − E_1)²/(1 + E_0 − E_1))(1 − G_0(1 − (2 − t_n − E_1)/(2(1 + E_0 − E_1)))) − δ(ε, p_ε, t_n).    (14)

Let y* = (E_0 + E_1)/2; we must have either E_0 ≤ (2/3)y* + t_n or E_1 ≤ (2/3)y* + 1 − t_n since y* ≤ 3/2. The first term on the right-hand side of Equation (13) can be rewritten as

    (1/8)((1 + t_n − E_0)²/(1 − 2E_0 + 2y*))G_1((1 + t_n − E_0)/(2(1 − 2E_0 + 2y*))),

which we note is decreasing in E_0 for E_0 ≤ (2/3)y* + t_n (we have used that y* ≥ 1/2). Therefore, if E_0 ≤ (2/3)y* + t_n, then Equation (13) is bounded below by

    (1/16)(1 − (2/3)y*)²G_1((1 − (2/3)y*)/4) − δ(ε, p_ε, t_n).    (15)

Similarly, if E_1 ≤ (2/3)y* + 1 − t_n, then Equation (14) is bounded below by

    (1/16)(1 − (2/3)y*)²(1 − G_0(1 − (1 − (2/3)y*)/4)) − δ(ε, p_ε, t_n).    (16)

To simplify notation, define

    Z(y*) = (1/16)(1 − (2/3)y*)² min{G_1((1 − (2/3)y*)/4), 1 − G_0(1 − (1 − (2/3)y*)/4)}.

Recall that y* = y*(t_n) is a function of t_n through E_0 and E_1. Using Equations (15) and (16), we can integrate over t_n to bound the contribution of the improvement terms to agent n's utility. Since the improvement terms are nonnegative, we can choose any ε₀ > 0 and restrict the range of integration to obtain a lower bound of

    ∫_{ε₀}^{1−ε₀} [Z(y*(t)) − 2δ(ε, p_ε, t)] dF(t).

Now define y = E_σ[u(t_m, x_m, θ)], and note that y = ∫_0^1 y*(t) dF(t). Observe that since y* ≥ 1, this implies P_σ(y* ≤ 3/4 + y/2) ≥ (3 − 2y)/(2y − 1). Choosing ε₀ sufficiently small, we can bound the improvement terms below by

    ((3 − 2y)/(2(2y − 1)))Z(3/4 + y/2) − ∫_{ε₀}^{1−ε₀} 2δ(ε, p_ε, t) dF(t).    (17)

Finally, define

    Z̃(y) = y + ((3 − 2y)/(2(2y − 1)))Z(3/4 + y/2).

Combining Equations (10), (12), and (17), and using that Z is nonincreasing, we have

    E_σ[u(t_n, x_n, θ) | γ_n(B(n)) = m] ≥ Z̃(y) − ε − p_ε − 2∫_{ε₀}^{1−ε₀} δ(ε, p_ε, t) dF(t)

for any ε > 0 and some fixed ε₀ > 0. The hypothesis of the theorem implies that for any ε > 0, p_ε approaches zero as n grows without bound. Using the uniform convergence to 0 of δ for t_n ∈ [ε₀, 1 − ε₀], we see that Z̃ satisfies the hypothesis of Lemma 4, completing the proof. ∎

Proof of Proposition 3

Since Q′ is strongly homophilous, there exist neighbor choice functions {γ_n}_{n∈ℕ} for which lim_{n→∞} Q′(|t_n − t_{γ_n(B(n))}| > ε) = 0, and the improvement principle of Lemma 4 applies to establish learning along the chains of observations in Q′. We shall see the same neighbor choice functions can be used to establish learning in Q^(λ).

Considering an arbitrary agent n with γ_n(B(n)) = m, recall the decision thresholds L_{t_n} and U_{t_n} defined in the proof of Theorem 3. Here we let L_{t_n} and U_{t_n} denote the corresponding thresholds for the network Q, and let L′_{t_n} and U′_{t_n} denote the thresholds for the network Q′. As in the proof of Theorem 3, we suppress the subscript t_n from here on. A lower bound on the improvement agent n makes in network Q′ can be obtained by using the suboptimal decision thresholds L and U. Since the last of the base terms (see Equation (11)) does not depend on the decision thresholds, the proof of Theorem 3 may be followed with essentially no changes to obtain an improvement function Z′ satisfying the assumptions of Lemma 4. Crucially, this improvement function applies for decisions made via the suboptimal decision thresholds that are appropriate to the network Q as opposed to Q′.

By assumption, we have an improvement function Z for the network Q. Considering an arbitrary λ-mixture of Q

and $\mathbb{Q}'$, we suppose that $n$ uses the decision thresholds $L$ and $U$ instead of the optimal thresholds for $\mathbb{Q}(\lambda)$. It is immediately clear that
$$Z^{(\lambda)} = \min_{\lambda\in[0,1]}\big\{\lambda Z + (1-\lambda)Z'\big\}$$
is an improvement function for the suboptimal decision rule. This provides a lower bound on the true improvement, so $Z^{(\lambda)}$ is an improvement function for $\mathbb{Q}(\lambda)$. Lemma 4 applies. □

The following lemmas provide the essential machinery for the proof of Theorem 4. The first lemma shows that if a subsequence of decisions in a cluster provides information that converges to full knowledge of the state, then the entire cluster must learn. The essence of the proof is that the expected social belief of agents in an identified cluster, conditional on the decisions of this subsequence, must also converge on the state $\theta$. Since social beliefs are bounded between 0 and 1, the realized social belief of each agent in the cluster deviates from this expectation only with very small probability.

The second lemma establishes a lower bound on the amount of new information provided by the decision of an agent within a uniformly diverse cluster. Given a subset of the information available to that agent, we bound how much information the agent's decision conveys on top of that. Under the assumption of uniform diversity, we can show that either the agent acts on her own signal with positive probability, or she has access to significant additional information not contained in the subset we considered. This is a key technical innovation that allows us to apply martingale convergence arguments even when agents within a cluster do not have nested information sets.

Lemma 5. Let Assumption 4 hold, and suppose $\{\tau_i\}_{i\in\mathbb{N}}$ is a uniformly diverse, identified cluster. If there exists a subsequence $\{\tau_{i(j)}\}_j$ such that the social beliefs $\hat q_k = \mathbb{P}_\sigma(\theta=1\mid x_{\tau_{i(j)}},\, j\le k)$ converge to $\theta$ almost surely, then asymptotic learning obtains within the cluster.

Proof. Consider the case where $\theta = 0$; the case $\theta = 1$ is analogous. Fixing any $\epsilon > 0$, by assumption there exists some random integer $K_\epsilon$, which is finite with probability 1, such that $\hat q_k \le \epsilon/3$ for all $k \ge K_\epsilon$. Fix a $k$ large enough so that $\mathbb{P}_\sigma(K_\epsilon > k) \le \epsilon/3$. Since the cluster is identified, for large enough $n$ there are neighbor choice functions $\gamma^{(1)}_{\tau_n},\ldots,\gamma^{(k)}_{\tau_n}$ such that
$$\mathbb{P}_\sigma\big(\exists\, j \le k:\ \gamma^{(j)}_{\tau_n}(B(\tau_n)) \ne \tau_{i(j)}\big) \le \frac{\epsilon}{3}.$$
We then have
$$\mathbb{E}_\sigma\big[q_{\tau_n}\mid x_{\gamma^{(j)}_{\tau_n}(B(\tau_n))},\, 1\le j\le k,\ \theta = 0\big] \le \left(1-\frac{\epsilon}{3}\right)\mathbb{E}_\sigma[\hat q_k\mid \theta=0] + \frac{\epsilon}{3} \le \left(1-\frac{\epsilon}{3}\right)^2\mathbb{E}_\sigma[\hat q_k\mid k\ge K_\epsilon,\ \theta=0] + \frac{2\epsilon}{3} \le \epsilon.$$
Since the social belief is bounded below by 0, we necessarily have $\mathbb{P}_\sigma(q_{\tau_n} > \sqrt\epsilon\mid\theta=0) \le \sqrt\epsilon$. Therefore,
$$\mathbb{P}_\sigma(x_{\tau_n} = 0\mid\theta=0) \ge (1-\sqrt\epsilon)\int_0^1 \mathbb{P}_\sigma\big(x_{\tau_n}=0\mid \theta=0,\ q_{\tau_n}\le\sqrt\epsilon,\ t_{\tau_n}=t\big)\,d\mathbb{H}_{\tau_n}(t) \ge (1-\sqrt\epsilon)\int_0^1 F_0\big(R_t^{\sqrt\epsilon}\big)\,d\mathbb{H}_{\tau_n}(t).$$
Condition (b) of uniform diversity ensures this last integral approaches 1 as $\epsilon$ approaches zero, completing the proof. □

Lemma 6. Let Assumption 4 hold. Let $\tau$ be a stopping time within a cluster, and suppose $\mathcal{I}$ is a random variable taking values contained in agent $\tau$'s information set with probability 1. For a given realization $I$ of $\mathcal{I}$, define $\hat q_\tau = \mathbb{P}_\sigma(\theta = 1\mid \mathcal{I} = I)$, and suppose there exist $p, d > 0$ such that
$$\mathbb{H}_\tau\big(R^{1-\hat q_\tau}_{\bar\beta - d}\big) - \mathbb{H}_\tau\big(R^{1-\hat q_\tau}_{\beta + d}\big) \ge p.$$
If for some $\epsilon > 0$ we have $\hat q_\tau \ge \epsilon$, then there exists some $\delta > 0$ such that
$$\frac{\mathbb{P}_\sigma(x_\tau = 0\mid \mathcal{I}=I,\ \theta=0)}{\mathbb{P}_\sigma(x_\tau = 0\mid \mathcal{I}=I,\ \theta=1)} \ge 1 + \delta.$$
Similarly, if $\hat q_\tau \le 1-\epsilon$, then we can find $\delta > 0$ such that
$$\frac{\mathbb{P}_\sigma(x_\tau = 1\mid \mathcal{I}=I,\ \theta=1)}{\mathbb{P}_\sigma(x_\tau = 1\mid \mathcal{I}=I,\ \theta=0)} \ge 1 + \delta.$$

Proof. Fix a realization $I$ of $\mathcal{I}$, and define $P^i_q = d\,\mathbb{P}_\sigma(q_\tau = q\mid \mathcal{I}=I,\ \theta=i)$. We can rewrite the ratio as
$$\frac{\mathbb{P}_\sigma(x_\tau=0\mid \mathcal{I}=I,\ \theta=0)}{\mathbb{P}_\sigma(x_\tau=0\mid \mathcal{I}=I,\ \theta=1)} = \frac{\int_0^1 \mathbb{P}_\sigma(x_\tau=0\mid \mathcal{I}=I,\ \theta=0,\ t_\tau=t)\,d\mathbb{H}_\tau(t)}{\int_0^1 \mathbb{P}_\sigma(x_\tau=0\mid \mathcal{I}=I,\ \theta=1,\ t_\tau=t)\,d\mathbb{H}_\tau(t)} = \frac{\int_0^1\int_0^1 P^0_q F_0(R^q_t)\,dq\,d\mathbb{H}_\tau(t)}{\int_0^1\int_0^1 P^1_q F_1(R^q_t)\,dq\,d\mathbb{H}_\tau(t)}$$
$$= \frac{\int_0^1\int_0^1 \Big[P^1_q F_0(R^q_t) + (P^0_q - P^1_q)\big(F_0(R^q_t) - F_0(R^{\hat q_\tau}_t)\big)\Big]\,dq\,d\mathbb{H}_\tau(t)}{\int_0^1\int_0^1 P^1_q F_1(R^q_t)\,dq\,d\mathbb{H}_\tau(t)}, \tag{18}$$
where the last equality uses $\int_0^1 (P^0_q - P^1_q)\,dq = 0$.
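The martingale convergence machinery behind these lemmas can be illustrated numerically. The sketch below is not part of the paper's argument: the signal likelihoods (0.7 versus 0.3), the flat prior, and the function name are illustrative assumptions. It simulates the posterior belief in $\theta = 1$ from conditionally i.i.d. binary signals, a bounded martingale, and checks that it converges toward the true state.

```python
# Illustrative sketch only: a Bayesian posterior over a binary state is a
# bounded martingale, so it converges almost surely; with informative
# i.i.d. signals it converges to the true state. The signal likelihoods
# (0.7 vs. 0.3) and the flat prior are assumptions for this example.
import math
import random

def posterior_after(n, theta, p1=0.7, p0=0.3, seed=0):
    """Posterior P(theta=1 | s_1..s_n) under a flat prior, where
    P(s=1 | theta=1) = p1 and P(s=1 | theta=0) = p0."""
    rng = random.Random(seed)
    log_lr = 0.0  # running log-likelihood ratio for theta=1 vs. theta=0
    for _ in range(n):
        s = 1 if rng.random() < (p1 if theta == 1 else p0) else 0
        log_lr += math.log(p1 / p0) if s == 1 else math.log((1 - p1) / (1 - p0))
    return 1.0 / (1.0 + math.exp(-log_lr))

q_under_0 = posterior_after(500, theta=0)  # belief in theta=1 when the truth is 0
q_under_1 = posterior_after(500, theta=1)  # belief in theta=1 when the truth is 1
```

Conditional on $\theta = 0$, the likelihood ratio $\hat q_k/(1-\hat q_k)$ is the nonnegative martingale exploited in the proof of Theorem 4, and in this simulation it is driven toward zero.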

It follows from the definition of the social belief and an application of Bayes' rule that for almost all $q$ we have
$$\frac{P^0_q}{P^1_q} = \frac{1-q}{q}\cdot\frac{\hat q_\tau}{1-\hat q_\tau}.$$
In particular, if $q \le \hat q_\tau$, then $P^0_q \ge P^1_q$, and the same holds with the inequalities reversed. Now, for any given $\epsilon' > 0$, fix thresholds $q^-_{\epsilon'} < \hat q_\tau < q^+_{\epsilon'}$, with $q^\pm_{\epsilon'} \to \hat q_\tau$ as $\epsilon' \to 0$. One of three cases must obtain: conditional on $\mathcal{I} = I$, the social belief $q_\tau$ satisfies $q_\tau \le q^-_{\epsilon'}$ with probability at least 1/3, satisfies $q_\tau \ge q^+_{\epsilon'}$ with probability at least 1/3, or lies in $(q^-_{\epsilon'}, q^+_{\epsilon'})$ with probability at least 1/3.

For the first case, suppose $\mathbb{P}_\sigma(q_\tau \le q^-_{\epsilon'}\mid \mathcal{I}=I) \ge 1/3$. Integrating over all such $q$ and choosing $\epsilon'$ small enough, the second term in the numerator of Equation (18) satisfies
$$\int_0^1\int_0^{q^-_{\epsilon'}}(P^0_q-P^1_q)\big(F_0(R^q_t)-F_0(R^{\hat q_\tau}_t)\big)\,dq\,d\mathbb{H}_\tau(t) \;\ge\; \frac{p}{6}\,\min_{t\in[R^{1-\hat q_\tau}_{\beta+d},\ R^{1-\hat q_\tau}_{\bar\beta-d}]}\big(F_0(R^{q^-_{\epsilon'}}_t)-F_0(R^{\hat q_\tau}_t)\big) > 0,$$
where the last inequality follows because $R^{\hat q_\tau}_t$ is bounded strictly between $\beta$ and $\bar\beta$ for the given range of $t$, and $F_0$ has full support in $(\beta,\bar\beta)$. Since the denominator in Equation (18) is bounded by 1, and the first term of the numerator is at least as large as the denominator by Lemma 1 part (d), the inequality above gives us our desired $\delta$. For the second case, suppose

$\mathbb{P}_\sigma(q^+_{\epsilon'} \le q_\tau\mid \mathcal{I}=I) \ge 1/3$; then a similar exercise, applied to $\int_0^1\int_0^1 (P^0_q-P^1_q)F_0(R^q_t)\,dq\,d\mathbb{H}_\tau(t)$, again bounds the ratio in Equation (18) strictly above one. In the remaining case, $q_\tau \in (q^-_{\epsilon'}, q^+_{\epsilon'})$ with probability at least 1/3; then the hypothesis $\mathbb{H}_\tau(R^{1-\hat q_\tau}_{\bar\beta-d}) - \mathbb{H}_\tau(R^{1-\hat q_\tau}_{\beta+d}) \ge p$ ensures that with probability bounded away from zero the agent's type renders her decision responsive to her private signal, which once more bounds the ratio strictly above one. Choosing $\epsilon'$ small enough for all three cases to satisfy the corresponding inequalities finishes the proof of the first statement; the second statement is analogous. □

Proof of Theorem 4

We first construct a subsequence on which we can apply Lemma 5. Define the indices $i(j)$ recursively, beginning with $i(1) = 1$. For $j > 1$, let $i(j)$ be chosen so that, through the neighbor choice functions provided by identification, agent $\tau_{i(j)}$ observes the decisions of $\tau_{i(1)},\ldots,\tau_{i(j-1)}$. The social beliefs $\hat q_k = \mathbb{P}_\sigma(\theta=1\mid x_{\tau_{i(j)}},\, j\le k)$ form a bounded martingale, which converges almost surely to a limiting random variable $q^*$ by the martingale convergence theorem. Conditional on $\theta=1$, the likelihood ratio $(1-\hat q_k)/\hat q_k$ is also a nonnegative martingale (Doob 1953, Eq. (7.12)). Therefore, conditional on $\theta=1$, the ratio $(1-\hat q_k)/\hat q_k$ converges with probability 1 to a limiting random variable. In particular,
$$\mathbb{E}_\sigma\left[\frac{1-q^*}{q^*}\,\Big|\,\theta=1\right] < \infty$$
(Breiman 1968, Theorem 5.14), and therefore $q^* > 0$ with probability 1 when $\theta=1$. Similarly, $\hat q_k/(1-\hat q_k)$ is a nonnegative martingale conditional on $\theta=0$, and by a similar argument we have $q^* < 1$ with probability 1 when $\theta=0$.

We shall see that the random variable $q^*$ equals $\theta$ with probability 1. Let $x^{(k)}$ denote the vector comprised of the actions $\{x_{\tau_{i(j)}}\}_{j\le k}$, and suppose without loss of generality that $x_{\tau_{i(k+1)}} = 0$ for infinitely many $k$. Using Bayes' rule twice,
$$\hat q_{k+1} = \mathbb{P}_\sigma\big(\theta=1\mid x^{(k)},\ x_{\tau_{i(k+1)}}=0\big) = \left(1 + \frac{\mathbb{P}_\sigma(x^{(k)}\mid\theta=0)\,\mathbb{P}_\sigma(x_{\tau_{i(k+1)}}=0\mid x^{(k)},\theta=0)}{\mathbb{P}_\sigma(x^{(k)}\mid\theta=1)\,\mathbb{P}_\sigma(x_{\tau_{i(k+1)}}=0\mid x^{(k)},\theta=1)}\right)^{-1} = \left(1+\left(\frac{1}{\hat q_k}-1\right)f\big(x^{(k)}\big)\right)^{-1}, \tag{21}$$
where, to simplify notation,
$$f\big(x^{(k)}\big) = \frac{\mathbb{P}_\sigma\big(x_{\tau_{i(k+1)}}=0\mid x^{(k)},\ \theta=0\big)}{\mathbb{P}_\sigma\big(x_{\tau_{i(k+1)}}=0\mid x^{(k)},\ \theta=1\big)}.$$
Suppose $\hat q_k \in [\epsilon, 1-\epsilon]$ for all sufficiently large $k$ and some $\epsilon > 0$. From our construction of the subsequence $\tau_{i(j)}$, this implies
$$\mathbb{P}_\sigma\big(\theta=1\mid x_m,\ m = \tilde\gamma^{(l)}(B(\tau_{i(k+1)})),\ l\le k\big) \in \left[\frac{\epsilon}{2},\ 1-\frac{\epsilon}{2}\right]$$
for all such $k$. By Lemma 6 together with the uniform diversity property, there exists $\delta_\epsilon > 0$ such that $f(x^{(k)}) \ge 1+\delta_\epsilon$. Given this $\delta_\epsilon$, we can bound the difference $\hat q_k - \hat q_{k+1}$. If $x_{\tau_{i(k+1)}} = 0$, we have
$$\hat q_k - \hat q_{k+1} \ge \hat q_k - \left(1+\left(\frac{1}{\hat q_k}-1\right)(1+\delta_\epsilon)\right)^{-1} = \frac{\hat q_k(1-\hat q_k)\delta_\epsilon}{1+\delta_\epsilon(1-\hat q_k)} \ge \frac{\epsilon(1-\epsilon)\delta_\epsilon}{1+(1-\epsilon)\delta_\epsilon} \equiv K'(\epsilon) > 0. \tag{22}$$
If $\hat q_k \in [\epsilon, 1-\epsilon]$ for all sufficiently large $k$, this implies the sequence $\{\hat q_k\}$ is not Cauchy. This contradicts the almost sure convergence of the sequence, so the support of $q^*$ cannot contain $[\epsilon, 1-\epsilon]$ for any $\epsilon > 0$. Lemma 5 completes the proof. □

Proof of Proposition 4

The negative implication follows reasoning similar to Example 4. If one cluster fails to observe the other beyond a certain point, then once social beliefs become strong enough in favor of the cluster's preferred state, the cluster will herd. Since this happens with positive probability, the other cluster obtains bounded information from this cluster and herds with positive probability as well. We now prove the positive implication.

Define $\hat q^C_k = \mathbb{P}_\sigma(\theta=1\mid x_{c_i},\, i\le k)$ and $\hat q^D_k = \mathbb{P}_\sigma(\theta=1\mid x_{d_i},\, i\le k)$. Following the same argument as in the proof of Theorem 4, the sequences $\hat q^C_k$ and $\hat q^D_k$ converge almost surely to random variables $q^C$ and $q^D$, respectively. We further have for each $i\in\{0,1\}$ that
$$\mathbb{P}_\sigma(q^C = 1-i\mid\theta=i) = \mathbb{P}_\sigma(q^D = 1-i\mid\theta=i) = 0.$$
Without loss of generality, assume the types in $C$ lie in $[0,1/2)$; following the argument from Theorem 4, and applying Lemma 6, we immediately find that the support of $q^C$ is contained in $\{0\}\cup[1-\beta, 1]$ and the support of $q^D$ is contained in $[0, 1-\bar\beta]\cup\{1\}$. Define the constants $\delta^C$ and $\delta^D$ by
$$\delta^C = \inf\big\{\delta:\ \mathbb{P}_\sigma\big(q^C\in\{0\}\cup[1-\delta,1]\big) = 1\big\} \quad\text{and}\quad \delta^D = \inf\big\{\delta:\ \mathbb{P}_\sigma\big(q^D\in[0,\delta]\cup\{1\}\big) = 1\big\}.$$

Our first task is to show that if $q^C\in[1-\delta^C, 1)$, then only finitely many agents in $C$ select action 0; similarly, if $q^D\in(0,\delta^D]$, only finitely many agents in $D$ select action 1. We will analyze only the cluster $C$ since the other result is analogous. The argument should be familiar by now; we establish a contradiction with almost sure convergence of the martingale.

Suppose $x_{c_{k+1}} = 0$ with positive probability, and recall the computation leading to Equation (18) in the proof of Lemma 6. Consider the corresponding result with $\mathcal{I} = \{x_{c_i}\}_{i\le k}$:
$$\frac{\mathbb{P}_\sigma(x_{c_{k+1}}=0\mid x_{c_i},\, i\le k,\ \theta=0)}{\mathbb{P}_\sigma(x_{c_{k+1}}=0\mid x_{c_i},\, i\le k,\ \theta=1)} = \frac{\int_0^1\int_0^1 \Big[P^1_q F_0(R^q_t) + (P^0_q-P^1_q)\big(F_0(R^q_t)-F_0(R^{\hat q}_t)\big)\Big]\,dq\,d\mathbb{H}_{c_{k+1}}(t)}{\int_0^1\int_0^1 P^1_q F_1(R^q_t)\,dq\,d\mathbb{H}_{c_{k+1}}(t)}.$$
If $q^C\in[1-\delta^C, 1-\epsilon]$ for some fixed $\epsilon>0$, then by almost sure convergence, for all $k$ large enough we must have $\hat q^C_k \ge 1-\delta^C-\epsilon \ge 1-\beta-\epsilon$. Since $t_{c_k}\le\frac12$ with probability 1, we have $F_0(R^q_t) = F_1(R^q_t) = 0$ for any $q>1-\beta$, so these values of $q$ do not contribute to the integrals; from here on we consider the densities $P^i_q$ above to be conditional on this event. We analyze two cases. First, suppose that conditional on $q\le 1-\beta$, we have $q\le 1-\beta-2\epsilon$ with probability at least 1/2. Following the argument of Lemma 6, this gives us a lower bound on the second term in the numerator, and therefore bounds the ratio strictly above one.
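The contradiction driving this step can be checked numerically. The sketch below (toy numbers, not the paper's model) iterates the Bayes update of Equation (21) with a constant informativeness ratio $f = 1+\delta$ and verifies the per-step drop bound $K'(\epsilon)$ of Equation (22): a belief that keeps receiving boundedly informative contrary decisions must exit $[\epsilon, 1-\epsilon]$ in finitely many steps, so it cannot converge inside that interval.

```python
# Toy check of Equations (21)-(22): with f(x^(k)) >= 1 + delta, the update
#   q_{k+1} = 1 / (1 + (1/q_k - 1) * f)
# lowers the belief by at least K'(eps) while q_k stays in [eps, 1 - eps],
# so q_k leaves the interval in finitely many steps. Numbers are illustrative.
eps, delta = 0.05, 0.10
K = eps * (1 - eps) * delta / (1 + (1 - eps) * delta)  # K'(eps) from (22)

q, steps = 0.5, 0
while q >= eps:  # belief still at least eps
    q_next = 1.0 / (1.0 + (1.0 / q - 1.0) * (1 + delta))
    assert q - q_next >= K  # the per-step drop bound of Equation (22)
    q, steps = q_next, steps + 1

max_steps = (0.5 - eps) / K  # crude step budget implied by the drop bound
```

Here the likelihood ratio $(1/q_k - 1)$ grows geometrically by the factor $1+\delta$ each step, so the exit in fact occurs far sooner than the linear budget `max_steps` suggests.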

Alternatively, suppose that conditional on $q\le 1-\beta$ we have $q\in[1-\beta-2\epsilon,\ 1-\beta]$ with probability at least 1/2. For any $t$ sufficiently close to $\frac12$, we have
$$\frac{F_0(R^q_t)}{F_1(R^q_t)} \ge \frac{F_0(\beta+2\epsilon)}{F_1(\beta+2\epsilon)} > 1.$$
Since $\mathbb{H}_{c_k}$ assigns positive measure to every interval $[1/2-\delta,\ 1/2]$, positive measure is assigned to these values of $t$, and together with Lemma 1 part (d) this again allows us to bound the ratio strictly above one. Following the argument of Theorem 4, infinitely many agents in $C$ selecting action 0 would contradict almost sure convergence of $\hat q^C_k$ if $q^C\in[1-\delta^C, 1-\epsilon]$. Since $\epsilon$ was arbitrary, the result follows.

Our next task is to show that for a fixed $q\in[1-\delta^C, 1)$ we have
$$\mathbb{P}_\sigma\big(q^D\in(0,\delta^D]\mid q^C=q,\ \theta=1\big) \ge \frac{(1-\beta)(1-q)}{\beta q}. \tag{23}$$
Similarly, for a fixed $q\in(0,\delta^D]$ we have
$$\mathbb{P}_\sigma\big(q^C\in[1-\delta^C, 1)\mid q^D=q,\ \theta=0\big) \ge \frac{\bar\beta q}{(1-\bar\beta)(1-q)}. \tag{24}$$

Let $N^C$ and $N^D$ denote the smallest integer-valued random variables such that $x_{c_k}=1$ for all $k\ge N^C$ and $x_{d_k}=0$ for all $k\ge N^D$. From the above work, we have that $N^C$ is finite with probability 1 conditional on $q^C\in[1-\delta^C,1)$, and $N^D$ is finite with probability 1 if $q^D\in(0,\delta^D]$. Consider a particular sequence of decisions $\{x_{c_k}\}_{k\in\mathbb{N}}$ such that $q^C = q\in[1-\delta^C, 1)$. This immediately implies
$$\lim_{k\to\infty}\frac{\mathbb{P}_\sigma(x_{d_k}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=1)}{\mathbb{P}_\sigma(x_{d_k}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=0)} = \mathbb{P}_\sigma\big(q^D\in(0,\delta^D]\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=1\big),$$
since conditional on $\theta=0$ we have $q^D\in[0,\delta^D]$ with probability 1. Given an $\epsilon>0$, we can choose $N_\epsilon$ large enough so that $\mathbb{P}_\sigma(N^C\le N_\epsilon\mid \{x_{c_k}\}_{k\le N_\epsilon}) \ge 1-\epsilon$. If further we have $N_\epsilon$ larger than the realized $N^C$ for our fixed sequence, we have the bounds
$$\frac{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=1)}{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=0)} \le \frac{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=1)+\epsilon}{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=0)-\epsilon}$$
and
$$\frac{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=1)}{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_i}\}_{i\in\mathbb{N}},\ \theta=0)} \ge \frac{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=1)-\epsilon}{\mathbb{P}_\sigma(x_{d_j}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=0)+\epsilon}$$
for any $n\ge N_\epsilon$ and all $j$.

Suppose $\mathbb{P}_\sigma\big(q^D\in(0,\delta^D]\mid q^C=q,\ \theta=1\big) < (1-\beta)(1-q)/(\beta q)$. From the above work, we can find a collection of sequences $\{x_{c_k}\}_{k\in\mathbb{N}}$ with positive measure conditional on $q^C=q$, an $\epsilon>0$, and an integer $N_{\epsilon'}$ such that
$$\frac{\mathbb{P}_\sigma(x_{d_m}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=1)}{\mathbb{P}_\sigma(x_{d_m}=0\mid \{x_{c_k}\}_{k\le n},\ \theta=0)} \le \frac{(1-\beta)(1-q)}{\beta q} - \epsilon$$
for all $n, m\ge N_{\epsilon'}$ when one of the sequences $\{x_{c_k}\}_{k\in\mathbb{N}}$ is realized. For all sufficiently large $n$, $\hat q^C_n$ is within $\epsilon/2$ of $q$; this in turn implies that for all such $n$ and $m\ge N_{\epsilon'}$, we have an $\epsilon'>0$ such that
$$\mathbb{P}_\sigma\big(\theta=1\mid x_{d_m}=0,\ x_{c_k},\, k\le n\big) \le 1-\beta-\epsilon'.$$
Note that for any subsequence of the cluster $D$, since types are conditionally independent and arbitrarily close to 1, infinitely many members of the subsequence will select action 0. If $d_m\in B(c_n)$ for $n, m$ as above, we have $\mathbb{E}_\sigma[q_{c_n}\mid x_{d_m}=0,\ x_{c_k},\, k\le n] \le 1-\beta-\epsilon'$. Since social beliefs are bounded between 0 and 1, there is a positive lower bound on the probability that $q_{c_n}\le 1-\beta-\epsilon'/2$ conditional on these observations. Since infinitely many agents $c_n$ observe such $d_m$ and have types arbitrarily close to $\frac12$, with probability 1 infinitely many agents in $C$ choose action 0. We conclude that either $\delta^C = 0$ or Equation (23) holds, as claimed; the argument for Equation (24) is analogous. In the former case, decisions of agents in $C$ become fully informative, and asymptotic learning clearly obtains. On the other hand, suppose Equations (23) and (24) hold. Using Equation (24),
$$\mathbb{P}_\sigma\big(q^C\in[1-\delta^C,1]\mid\theta=0\big) \ge \int_0^{\delta^D}\frac{\bar\beta q}{(1-\bar\beta)(1-q)}\cdot\frac{d\,\mathbb{P}_\sigma(q^D=q\mid\theta=0)}{d\,\mathbb{P}_\sigma(q^D=q\mid\theta=1)}\,d\,\mathbb{P}_\sigma(q^D=q\mid\theta=1). \tag{25}$$
Now, observe that the definition of $q^D$ implies
$$\frac{\mathbb{P}_\sigma(\theta=0\mid q^D=q)}{\mathbb{P}_\sigma(\theta=1\mid q^D=q)} = \frac{1-q}{q},$$
so the Radon–Nikodym derivative in Equation (25) equals $(1-q)/q$.

Substituting into Equation (25) gives
$$\mathbb{P}_\sigma\big(q^C\in[1-\delta^C,1]\mid\theta=0\big) \ge \frac{\bar\beta}{1-\bar\beta}\int_0^{\delta^D} d\,\mathbb{P}_\sigma(q^D=q\mid\theta=1) = \frac{\bar\beta}{1-\bar\beta}\,\mathbb{P}_\sigma\big(q^D\in[0,\delta^D]\mid\theta=1\big).$$
A similar calculation for $q^D$ gives
$$\mathbb{P}_\sigma\big(q^D\in[0,\delta^D]\mid\theta=1\big) \ge \frac{1-\beta}{\beta}\,\mathbb{P}_\sigma\big(q^C\in[1-\delta^C,1]\mid\theta=0\big).$$
Combining the two gives
$$\mathbb{P}_\sigma\big(q^C\in[1-\delta^C,1]\mid\theta=0\big) \ge \frac{\bar\beta(1-\beta)}{(1-\bar\beta)\beta}\,\mathbb{P}_\sigma\big(q^C\in[1-\delta^C,1]\mid\theta=0\big).$$
Since the ratio on the right-hand side is strictly larger than one, this leads to a contradiction if the probability is larger than zero. We conclude that both probabilities are equal to zero, and consequently, asymptotic learning obtains. □

Endnotes

1. See Marsden (1988), McPherson et al. (2001), and Currarini et al. (2009, 2010).
2. This includes the foundational papers by Banerjee (1992) and Bikhchandani et al. (1992) as well as more recent work by Bala and Goyal (1998), Gale and Kariv (2003), Çelen and Kariv (2004), Guarino and Ianni (2010), Acemoglu et al. (2011, 2014), Mossel et al. (2012), Mueller-Frank (2013), and Lobel and Sadler (2015).
3. See Smith and Sorensen (2000), Acemoglu et al. (2011), and Lobel and Sadler (2015).
4. The range $[1, 3/2]$ of the function $Z$ in this definition corresponds to the possible range of an agent's ex ante expected utility, not the possible range of an agent's probability of matching the state of the world.

References

Acemoglu D, Bimpikis K, Ozdaglar A (2014) Dynamics of information exchange in endogenous social networks. Theoret. Econom. 9(1):41–97.
Acemoglu D, Dahleh M, Lobel I, Ozdaglar A (2011) Bayesian learning in social networks. Rev. Econom. Stud. 78(4):1201–1236.
Aral S, Van Alstyne M (2011) The diversity-bandwidth tradeoff. Amer. J. Sociol. 117(1):90–171.
Aral S, Walker D (2014) Tie strength, embeddedness, and social influence: Evidence from a large scale networked experiment. Management Sci. 60(6):1352–1370.
Bakshy E, Rosenn I, Marlow C, Adamic L (2012) The role of social networks in information diffusion. Proc. 21st Internat. Conf. World Wide Web (ACM, New York), 519–528.
Bala V, Goyal S (1998) Learning from neighbours. Rev. Econom. Stud. 65(3):595–621.
Banerjee A (1992) A simple model of herd behavior. Quart. J. Econom. 107:797–817.
Banerjee A, Fudenberg D (2004) Word-of-mouth learning. Games Econom. Behav. 46:1–22.
Bikhchandani S, Hirshleifer D, Welch I (1992) A theory of fads, fashion, custom, and cultural change as informational cascades. J. Political Econom. 100:992–1026.
Breiman L (1968) Probability (Addison-Wesley, Reading, MA).
Burt R (2004) Structural holes and good ideas. Amer. J. Sociol. 110(2):349–399.
Çelen B, Kariv S (2004) Observational learning under imperfect information. Games Econom. Behav. 47(1):72–86.
Centola D (2011) An experimental study of homophily in the adoption of health behavior. Science 334(6060):1269–1272.
Centola D, Macy M (2007) Complex contagions and the weakness of long ties. Amer. J. Sociol. 113(3):702–734.
Cipriani M, Guarino A (2008) Herd behavior and contagion in financial markets. The B.E. J. Theoret. Econom. 8(1):1–54.
Conley T, Udry C (2010) Learning about a new technology: Pineapple in Ghana. Amer. Econom. Rev. 100:35–69.
Currarini S, Jackson M, Pin P (2009) An economic model of friendship: Homophily, minorities, and segregation. Econometrica 77(4):1003–1045.
Currarini S, Jackson M, Pin P (2010) Identifying the roles of race-based choice and chance in high school friendship network formation. Proc. Natl. Acad. Sci. 107(11):4857–4861.
Doob J (1953) Stochastic Processes (John Wiley & Sons, New York).
Gale D, Kariv S (2003) Bayesian learning in social networks. Games Econom. Behav. 45(2):329–346.
Goeree J, Palfrey T, Rogers B (2006) Social learning with private and common values. Econom. Theory 28(2):245–264.
Golub B, Jackson M (2012) How homophily affects the speed of learning and best-response dynamics. Quart. J. Econom. 127(3):1287–1338.
Granovetter M (1973) The strength of weak ties. Amer. J. Sociol. 78(6):1360–1380.
Guarino A, Ianni A (2010) Bayesian social learning with local interactions. Games 1(4):438–458.
Hansen M (1999) The search-transfer problem: The role of weak ties in sharing knowledge across organization subunits. Admin. Sci. Quart. 44(1):82–111.
Lobel I, Sadler E (2015) Information diffusion in networks through social learning. Theoret. Econom. Forthcoming.
Lobel I, Acemoglu D, Dahleh M, Ozdaglar A (2009) Lower bounds on the rate of learning in social networks. IEEE Proc. Amer. Control Conf., AAC '09 (IEEE, Piscataway, NJ), 2825–2830.
Marsden P (1988) Homogeneity in confiding relations. Soc. Networks 10(1):57–76.
McPherson M, Smith-Lovin L, Cook J (2001) Birds of a feather: Homophily in social networks. Annual Rev. Sociol. 27:415–444.
Mossel E, Sly A, Tamuz O (2012) On agreement and learning. Working paper, University of California, Berkeley, Berkeley, CA.
Mueller-Frank M (2013) A general framework for rational learning in social networks. Theoret. Econom. 8(1):1–40.
Smith L, Sorensen P (2000) Pathological outcomes of observational learning. Econometrica 68:371–398.
Sorensen A (2006) Social learning and health plan choice. RAND J. Econom. 37:929–945.
Vives X (1993) How fast do rational agents learn? Rev. Econom. Stud. 60(2):329–347.
Vives X (1995) The speed of information revelation in a financial market mechanism. J. Econom. Theory 67(1):178–204.

Ilan Lobel is an assistant professor of information, operations, and management sciences at the Stern School of Business at New York University. He is interested in questions of pricing, learning, and contract design in dynamic and networked markets.

Evan Sadler is a doctoral student in information, operations, and management sciences at the Stern School of Business at New York University. He is interested in the ways people influence each other's behavior and how the structure of a social network affects behavior in aggregate.