
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2017.2687422, IEEE Access

A Technological Perspective on Information Cascades via Social Learning

Fernando Rosas, Jen-Hao Hsiao and Kwang-Cheng Chen, IEEE Fellow

F. Rosas and J.H. Hsiao are with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei 10617, Taiwan (email: {nrosas, r04942065}@ntu.edu.tw). K.C. Chen is with the Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA (email: [email protected]), and was with the Graduate Institute of Communication Engineering, National Taiwan University, Taipei 10617, Taiwan. This research is partially supported by the U.S. Air Force under Grant AOARD-14-4053, and by Mediatek Inc. and the Ministry of Science and Technology, Taiwan, under Grant 104-2622-9-002-002.

Abstract: Collective behavior in human society is attracting a lot of attention, particularly due to novel emergent phenomena associated with online social media and networks. In effect, although crowd wisdom and herding behaviour have been well studied in social science, the rapid development of Internet computing and e-commerce calls for an in-depth comprehension of their consequences and impact from a technological perspective. Based on social learning, an analytical framework originated in social science, we re-examine the well-known phenomenon of information cascades, where rational agents can ignore personal knowledge in order to follow a predominant social behaviour triggered by earlier decisions made by peers. Moreover, we look into the cascade behavior from a communication theoretic perspective, interpreting social learning as a distributed data processing scheme. This perspective enables the development of a novel framework, which allows a characterization of the conditions that trigger information cascades and traces their impact on the accuracy of the collective inference. Finally, potential applications and examples of information cascades are presented under various cyber technological scenarios, illustrating the prolific interplay between communication technology and computational social science.

I. INTRODUCTION

The surprising outcomes of recent political polls, such as the Brexit referendum and the latest US presidential election, are revealing the limitations of our current understanding of social behaviour in a highly interconnected world. It has been claimed that, similar to the way in which evolution takes place among living species, human society evolves from simple to more complex forms of organization and behavior [1]. One of the distinctive and more challenging characteristics of complex systems, which has been widely acknowledged in social scenarios, is that the aggregation of the activities of simple components or agents can generate complex and unpredictable outcomes [2]–[4]. Therefore, just as thermodynamics and statistical mechanics went beyond classical mechanics in order to provide an adequate framework for the description of gases and liquids, a new theory might be necessary in order to enable a deeper understanding of important phenomena that characterize modern society [1].

New information dynamics are defying our traditional tools of analysis, which were forged when the world was simpler and easier to predict [5], [6]. In the early days of the Internet, the promises of abundance of information and the anti-authoritarian structure were thought to be seeds that would bring great benefits to society [7]. However, it has been shown that the heterogeneity and absence of recognized information sources can generate doubts about causation, which in turn can stimulate speculation and misinformation [8]. Moreover, the excess of available information and the limited processing capabilities of individuals trigger confirmation bias, which stimulates the exclusive use of information sources that support one's existing beliefs or points of view [9]. Furthermore, online recommendation algorithms constantly and invisibly filter users' queries, presenting contents that might better satisfy the user's profile and preferences. All these elements are creating so-called digital echo chambers, where disjoint groups of society are progressively reinforced in their beliefs, whatever they might be [10]. The digital misinformation that these mechanisms are generating is so severe that the World Economic Forum (WEF) listed it as one of the main threats to our modern society [11].

In order to address these issues, one big challenge is to clarify the effects and consequences of the large amount of information that is constantly generated and exchanged between individuals in a digital society [12]. As a matter of fact, the massive deployment and use of Internet mobile terminals and devices is enabling massive information networks, with global mobile data traffic growing by more than 74% in 2015 to reach 3.7 exabytes per month, driven by 2.7 billion connected devices [13]. Moreover, social habits are evolving concurrently with the pervasive use of the Internet, making social networks an essential tool for social interaction and information exchange [14]–[16]. For example, most people nowadays use the Internet to check other people's recommendations prior to making decisions about traveling, buying a product or choosing a restaurant. In these cases, subsequent decisions are influenced by earlier agents, which allows misinformation and cascades to spread across the network. Such complex interactions are difficult to predict, and therefore an in-depth understanding of their inner mechanisms is very much desirable.

In particular, one crucial technological goal is to understand the way in which decision making is affected by the mechanisms of large distributed data processing [17]. In addition to social science and Internet computing, this knowledge is also crucial to system automation and cyber-physical systems, machine learning and artificial intelligence [18]. It is noted that most engineering approaches focus on a combination of distributed information gathering and centralized computing.

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


However, large-scale deployment of intelligent systems/agents allows distributed processing by agents who sequentially collect and exchange data and perform local inferences, which allows the attainment of more graceful scaling properties with respect to the network size [19]–[21].

Decision making based on distributedly sensed information was intensively studied during the 1980s and early 1990s in the context of distributed radar systems, where the goal was to design schemes that detect events as accurately as possible considering various communication restrictions (cf. [22]–[24] and references therein). Unfortunately, it has been shown that the design of optimal schemes for information transfer and distributed processing for the general case is NP-hard [25]. In fact, although in many cases these schemes can be described as a set of thresholds against which likelihood functions must be compared, the determination of the optimal thresholds is in general an intractable problem [22]. For example, it has been shown that using equal thresholds can be suboptimal even for the simple case of a network of identically distributed sensors arranged in a star topology [26], being only asymptotically optimal for large networks [22], [27]. Moreover, symmetric strategies are not useful for more complex topologies (e.g. [28], [29]), and hence heuristic methods for finding the thresholds are necessary. Renewed interest in this problem took place after the emergence of wireless sensor networks, by considering the effect of noise, outage events and the impact of energy and bandwidth constraints (cf. [30], [31] and references therein). Other aspects have also been analysed, including robust distributed estimation [32], [33], multi-objective optimization [34], distributed parameter optimization, tracking and many others [35]–[37]. However, most of these works focus on networks of star topology and are based on very distinctive roles for regular nodes and fusion centers, which makes them not well suited for large-scale distributed networks with ad hoc topologies.

In parallel, remarkable efforts have been made in economics and social science to analyze sequential information processing and learning on social networks (an extensive literature review about social learning is provided in Section III). In these models, agents make decisions combining private knowledge with social information that they gather from looking at their peers' actions. Interestingly, it has been shown that the aggregation of rational decisions can generate irrational global behaviour, degrading the "wisdom of the crowds" into mere herd behaviour. This phenomenon, called information cascades, arises when the social information overloads agents, forcing them to ignore their private knowledge and to adopt the predominant social behaviour (an introduction and literature review on information cascades is provided in Section II).

It is believed that information cascades play crucial roles in the formation of political opinions, the adoption or rejection of new technology and many other important social phenomena [38]. There has also been interest in understanding the role of information cascades in the context of e-commerce and the digital society [39]. For example, information cascades can have tremendous consequences in online stores where customers can review the opinion of previous customers before deciding to buy a product, or in the emergence of viral media contents based on sequential actions of like or dislike [40]. Therefore, developing a further in-depth analytical understanding of the mechanisms that trigger information cascades and their effect on social learning emerges as a fundamental issue in modern human society.

The main motivation behind this article is to explore social learning as a distributed signal processing method, building a bridge between the research done separately by economists and sociologists, and electrical engineers and computer scientists. We intend to provide a quantitative framework to analyse the impact of information cascades on the performance of social learning, while the current literature primarily pays attention to the conditions that guarantee the achievement of a perfect inference asymptotically (i.e. the inference error rate goes to zero). Furthermore, following an engineering perspective, social learning can be quite useful as a distributed data processing scheme for a range of applications if the error rate does not go to zero but can be bounded below some critical value. Therefore, we develop novel upper bounds for the asymptotic inference performance, which can be used as a design guide for applying social learning in engineering applications. Moreover, these bounds provide fundamental insights that delineate the way in which information cascades influence the asymptotic error rate of social learning. Finally, our framework also provides analytical formulas for the exact performance of each agent, allowing an efficient exploration of the error rates in non-asymptotic regimes.

The rest of this article is structured as follows. Sections II and III present an introduction to information cascades and social learning, presenting the fundamentals and discussing some of the relevant literature. After this, Section IV presents our novel perspective of social learning as a data aggregation scheme. Our main results about the characterization of information cascades, related to this novel perspective, are presented in Section V, and are then illustrated for the case of social networks driven by binary signals in Section VI. Section VII uses the results of Section V to derive novel bounds for the achievable performance of social learning, providing a deeper understanding of the impact of information cascades on collective signal processing. These results are then illustrated by numerical evaluations in Section VIII. Sections IX and X discuss several important applications that information cascades have in prominent cyber technological scenarios, including cyber physical security and machine learning. Finally, Section XI summarizes our main conclusions.

II. AN INTRODUCTION TO INFORMATION CASCADES

This section presents a general introduction to information cascades, reviewing the state of the art and providing the necessary background for the unfamiliar reader. In the sequel, Section II-A first discusses some fundamental aspects and provides historical remarks. Then, Section II-B presents some social implications, and provides a preliminary definition of what an information cascade is (which is later revisited in Section V-A). Finally, statistical approaches to study information cascades are discussed in Section II-C.


A. Fundamentals of group decisions and social influence

Intuitively, the decision made by a group of agents can be much more accurate than the one made by an isolated individual. This phenomenon, known as wisdom of the crowds, has been acknowledged and experimentally verified in diverse contexts by researchers of economics, psychology and sociology [41]–[43]. The origin of this idea is commonly traced back to a work written in 1907 by Sir Francis Galton, who was a cousin of Charles Darwin [44]. By examining the results of a "guess the weight of the ox" competition in a country fair, Galton noted that the median of all the estimations was particularly accurate, being closer to the real value than the guesses made by experts. This is somewhat related to the law of large numbers, where the process of averaging can keep the consistent part of a signal while reducing non-coherent noisy elements. It has to be noted that the belief in the accuracy of aggregated opinions is not a mere theoretical tool, being deeply rooted in the popular mind and influencing stock markets, political elections, and quiz shows [45].

Interestingly, it is also commonly acknowledged that in real life the wisdom of the crowd is far from infallible. In effect, it has been noted that the effectiveness of this phenomenon requires two main principles: decentralization (allowing diversity of opinion and independence of noise and errors) and aggregation [46]. The failure of any of these principles can severely degrade the accuracy of the wisdom of the crowds.

Renowned philosophers such as Søren Kierkegaard and Friedrich Nietzsche already noted that aggregated behaviour can degrade into mere herd dynamics, where people follow the prominent social behaviour without judging it with critical thinking [47]. More recently, a number of studies have shown how social influence can affect individual decisions, compromising the results of estimation tasks, price determination and even music preferences [43], [48]–[50]. Social interaction undermines the independence of individual opinions, as individuals are usually aware of each other's decisions and this might induce them to review their own estimates [51]. After being aware of their peers' choices, agents might want to modify their opinions due to [...] or a suspicion that others might have better information [52], [53]. As a simple example, many people like buying popular products and think that, if many people liked it, it cannot be bad. In particular, [50] analyses the behaviour of agents, contrasting the results when they are aware or not of others' decisions. The corresponding experimental evidence shows that the knowledge of others' decisions effectively reduces the diversity of opinions, which in turn degrades the effectiveness of the wisdom of the crowds. In this way, one can see how the aggregation of rational decisions can generate irrational global behaviour.

B. Preliminary definition and social implications

Social agents are usually faced with the dilemma of how to act when the information gathered from their social network contradicts their private conclusion. An information cascade takes place when an agent actually chooses to ignore private information in order to follow the predominant social behaviour or preference, despite possible contradictions with his/her personal information [38]. The term "cascade" comes from the fact that once one agent has cascaded, this increases the social pressure for future agents to take a similar decision. Therefore, these events can easily spread over large portions of a social network.

Information cascades have been proposed as an explanation of how the "wisdom of the crowds" can transform into herd behaviour [54]. Moreover, it has been suggested that information cascades play crucial roles in many political, economic and social phenomena [38]. For example, some companies provide early sales or early test opportunities to trigger cascades of purchasing decisions [55], [56]. In markets with a monopolistic seller and buyers that are aware of each other's purchases, it has been shown that the monopolist has incentives to alter the good's prices in order to induce herding [57]. With respect to the extremely slow adoption of a clearly more convenient hybrid seed corn during the [...], researchers suggest that it was due to the higher trust that farmers had in their neighbours over the information provided by the corresponding salesman [58]. Also, the dangers of information cascades over political preferences have been acknowledged by countries that prohibit polling during the days or weeks before elections in order to avoid a cascading influence over the citizens [59].

A renewed interest in information cascades has emerged recently with respect to the social dynamics that take place in massive e-commerce and e-marketing platforms [39]. The steering or manipulation of information cascade phenomena could have a big impact on these systems, which raises concerns about their safety and resilience against malicious attacks or dishonest users. Developing a clear understanding of the triggering conditions and range of the effects of information cascades is therefore of fundamental importance, as this can enable the design of secure and trustworthy platforms that are crucial for a prosperous digital society.

C. Statistical approaches

The mechanisms behind the wisdom of the crowds and information cascades are of statistical nature, and hence they are not restricted to social or psychological phenomena. According to this rationale, statistical Social Learning may supply a theoretical tool to develop a deep understanding of information cascades. While Social Learning is discussed in Section III, other approaches are introduced in the sequel.

1) Information diffusion: Empirical studies of the structural characteristics of information cascades can be done by analysing user-generated contents collected from records of online social networks. For analysing this data, some approaches (e.g. Social Learning, cf. Section III) adopt a microscopic viewpoint where cascades are considered to be the consequence of the decision patterns of specific users. In contrast, a large volume of literature in computer science avoids these complexities by adopting a macroscopic perspective, where cascading contents are analysed as a case of information


diffusion over a network. In particular, most of the works focus on studying the behavior of statistical properties such as cascade length, tree size and link distribution [60]–[66]. Moreover, this approach enables the use of techniques from algebraic graph theory to characterize the diffusion process, including spectral analysis of the Laplacian or network adjacency matrices [67].

2) Dynamical systems analysis: An alternative framework to study information cascades has been presented in [68], which is based on a novel connection found between sequential decision processes and dynamical systems. In this work the authors consider a group of agents that have to sequentially make a binary decision. The decision of the $n$-th agent, $X_n$, is generated by considering the information carried by a private signal $S_n$ and the previous decisions $X_1,\dots,X_{n-1}$. Moreover, in order to consider irrational and stochastic aspects of human decisions, it is assumed that $X_n$ is stochastically generated following a Bernoulli random variable with parameter $p_n = u_n(R_n, S_n)$, where $u_n(\cdot,\cdot)$ is a utility function and $R_n$ is the output of a pre-decision filter given by

$$R_n = \frac{1}{n-1} \sum_{j=1}^{n-1} \mathcal{S}_{n,j}(X_j) \qquad (1)$$

where $\mathcal{S}_{n,j}$ is a collection of stochastic functions that map $\{0,1\}$ to $\mathbb{R}$. The pre-decision filter model is motivated by the tendency of people to have selective attention and hence project high-dimensional signals into low-dimensional representations. Hence, $R_n$ represents a lossy compressed version of the decisions of previous agents.

Interestingly, it has been shown that the evolution of the time series $R_n$ can be traced using tools from dynamical system theory. A very intuitive understanding of the evolution of this system can be attained by considering the plot of a specific utility function, over which the evolution of $R_n$ corresponds to a standard dynamical system. Following this rationale, stable and unstable points in the evolution can be defined, which correspond to the values towards which $R_n$ converges almost surely (see Figure 1).

Fig. 1: Evolution of sequential decision making viewed as a dynamical system. Studying the evolution of $R_n$, as defined in (1), stable and unstable points can be defined, which correspond to the limiting values of this time series [68].

This simple model allows developing a clear intuition about the aggregated effect of the social information on the evolution of $p_n$. In effect, the graphical perspective provided by the dynamical system representation enables an insight on how, depending on the structure of the variables $\mathcal{S}_{n,j}$, some situations are more likely to make $R_{n+1}$ larger than $R_n$, or vice-versa. The decomposition of the space of possible values of $R_n$ based on attractors, attractive regions and unstable points allows a whole new approach to the study of information cascades, whose exploration has just started.

3) Empirical approaches: Since the idea of information cascades was published in the economics literature [54], [69], a number of empirical studies of information cascades have been carried out. For example, [70]–[72] design experiments to investigate the information cascade phenomenon in sequential decision making by human test subjects. In [70], test subjects were asked to make predictions about the predominant colour of a collection of balls inside an urn, after looking at a randomly chosen one. Following the prediction provided by the information cascade theory, if a few initial decisions coincide, then the subsequent decisions tend to follow the established pattern, ignoring the result of the actual draw. A simple operational model for information cascades was presented in [73], being based exclusively on variables that can be observed from real data. The model was tested on data related to the adoption of electronic commerce technologies, showing that information cascades play a major role in such processes.

The report presented in [74] studies the effect of information cascades in data related to how people choose which movie to see in the theater. The results suggest that the cascading behaviour influences the box-office revenue and profit characteristics, which are governed by Lévy distributions with infinite mean and infinite variance. This, in turn, might actually explain the inherent difficulty of making accurate predictions in those scenarios.

A recent research trend is to use both social learning modeling and data analytics [75], [76] from online platforms [77], [78]. Some topics considered by this literature include complex user behavior analysis and prediction [65], [66], [79], and control [80] or steering [39], [81], [82] of complex social system phenomena. The interested reader can find a detailed analysis of [39] in Section X-A.

III. HOW STATISTICAL LEARNING TAKES PLACE IN SOCIAL NETWORKS

This section provides a general introduction to social learning. The pioneering works of Bayesian social learning are discussed in Section III-A, and then Section III-B reviews the contributions of more recent works. Aspects of non-Bayesian social learning are then discussed in Section III-C, and finally Section III-D presents some open questions.

A. Early efforts

Social learning was initially investigated by [54], [69], [83], [84] by analyzing sequential decision-making processes in social networks. In these systems each agent has to make one


decision following a pre-defined fixed ordering. The decision of the agent that decides in the $n$-th place of the sequence, denoted as $X_n \in \mathcal{X}_n$, is based on two sources of information (see Figure 2). On the one hand, agents possess personal knowledge about the corresponding subject, which is represented by the random variable $S_n \in \mathcal{S}_n$ that is only available to the corresponding agent. These private signals can be discrete or continuous depending on the cardinality of the signal space $\mathcal{S}_n$. Also, these signals are correlated with a variable that represents the "state of the world", denoted as $W$, which is unknown to the agents. On the other hand, the $n$-th agent is aware of the decisions made by all the previous agents, denoted as $X^{n-1} = (X_1,\dots,X_{n-1})$. This allows each agent to learn from the examples of previous agents in order to improve their decision accuracy.

Fig. 2: The Sequential Social Learning Model. The variables $S_n$ represent personal evidence, while the arrows between various decisions $X_j$ illustrate how decisions influence each other in the social system.

An interesting question is how the agents combine these two heterogeneous sources of information in order to optimize their decisions. A crucial assumption adopted in these works is that the agents act based on perfect rationality, making Bayesian decisions which maximize the average value of a cost function that quantifies their personal benefit. This cost function is traditionally assumed to be affected by the agent's own decision $X_n$ and the state of the world as encoded by $W$.

An interesting result is that social learning allows individuals with uninformative private signals to harvest information from others' private signals by copying their decisions. Therefore, even if $S_n$ is not very informative, the $n$-th agent can still gain indirect access to the information conveyed in $S_1,\dots,S_{n-1}$ by considering $X^{n-1}$, which represents a lossy compressed version of the other agents' private signals. This is an embodiment of the wisdom of the crowds (cf. Section II-A), as the evidence provided by early agents' decisions can be successfully aggregated to guarantee a surprisingly improved accuracy for the decision made by later agents. This phenomenon is particularly remarkable for the case of binary decisions (i.e. $X_n \in \{0,1\}$) and complex private signals, as from a distributed sensing perspective this can be seen as a distributed data fusion scheme that requires small amounts of information exchange.

However, these efforts suggest further room to enhance social learning, as under certain conditions the aggregation of rational decisions can generate irrational global behaviour and information cascades (cf. Section II-B). In effect, information cascades arise in social learning when the social information becomes so persuasive that all subsequent agents ignore their personal knowledge and adopt a homogeneous behaviour, and hence $X_m = X_{n_c}$ for all $m > n_c$. This is usually an undesired phenomenon that stops the inclusion of new evidence in the inference process, as further agents discard their own private information to blindly follow the prevailing behavior.

In summary, the initial research succeeded in showing that information cascades effectively can take place within social systems, but could not provide a general understanding of their nature and generating causes.

B. Effect of cost functions and topology

Motivated by these initial findings, researchers aimed to deepen the understanding of the mechanisms of social learning by extending the original models to more general cost metrics, assuming that the cost functions and priors possessed by different agents could disagree [85]–[88]. Their results show that the cost function plays a crucial role in generating homogeneous asymptotic decisions, or in allowing heterogeneous behaviours to coexist. The work reported in [87] shows that diversity in the agents' preferences can undermine the trust that agents give to each other, decreasing the interest in conformity and allowing different behaviours to coexist asymptotically. In fact, [86] presents some examples where the trust is undermined to such an extent that the past does not provide valuable information to future agents, and hence they have to rely exclusively on their own private signals.

In addition, [86] proved that the asymptotic accuracy of social learning is not perfect if the information conveyed by the private signals is bounded. Concretely, let us consider binary decisions (i.e. $X_n \in \{0,1\}$) and a binary state of the world variable $W$. Then, under very general conditions, it can be shown that the maximization of the utility function is equivalent to making $X_n$ as similar to $W$ as possible given the available information (cf. the corresponding discussion in Section IV-C). Perfect asymptotic accuracy is hence equivalent to guaranteeing that

$$\lim_{n \to \infty} \mathbb{P}\{X_n \neq W\} = 0 . \qquad (2)$$

Now, for each possible private signal realization $S_n = s \in \mathcal{S}_n$ one can compute the likelihood ratio of $s$ taking place under the event $\{W = 1\}$ versus $\{W = 0\}$ (for a precise definition of the private signal likelihood see Section IV-A). It is well known that in general these likelihood ratio values are sufficient statistics for estimating $W$ based on $S_n$ [89], and hence due to the above discussion they contain all the information relevant for generating $X_n$. Following this, [86] proved that (2) is not satisfied if the likelihood ratio values of the private signals are bounded above and below by given constants. Note that one of the consequences of this result is that social learning cannot attain perfect asymptotic accuracy when the private signals are drawn from finite spaces.

Further important insights about social learning were achieved by studying the effects of the social network topology on the aggregated behaviour [39], [56], [90]–[92]. Although the original models of social learning assumed that the $n$-th agent acts based on the knowledge of all the previous


$n-1$ decisions $X^{n-1}$, these works consider a more general case where agents only have access to a limited subset of the previous decisions. The neighbourhood of the $n$-th agent is normally defined as the set of agents which are connected to him/her by the social network, denoted as $\mathcal{B}_n$. Correspondingly, in these scenarios the decision $X_n$ is made considering the information of $S_n$ and $X_{\mathcal{B}_n}$, where the latter represents the vector of decisions of the neighbours of the $n$-th agent. Interestingly, the above works consider both deterministic and stochastic social network topologies, the latter being characterized by random neighbourhoods.

In [91] the authors provide various conditions over the network topology and private signal structure that guarantee or forbid perfect asymptotic social learning. Their results show that asymptotic learning does not take place if there is a group of agents that are "too influential", i.e. if there exists an infinite number of individuals influenced exclusively by a specific finite subset of agents. On the other hand, it is also shown that perfect asymptotic learning takes place if there are no too influential groups and the private signal likelihood is unbounded.

As a partial converse result, a number of network topology characteristics have been presented which, when combined with bounded private signal likelihood, make it impossible to achieve perfect learning. This extends the result of [86] to the more general case of a (possibly random) social network topology. However, [91] also provides a fascinating example of a network structure that allows perfect asymptotic learning even in the presence of bounded private signal likelihood. This topology is characterized by two different kinds of agents: innovators, which have few social connections and hence are likely to follow the dictate of their own private signal, and gatherers, whose highly connected network location allows them to synthesize previous opinions. This inspiring heterogeneous network showed how an adequate topology can undo the limitations of the private signal structure, allowing the system to reach a perfect asymptotic inference.

C. Non-Bayesian social learning

All the works discussed so far are focused on Bayesian learning, where agents choose their actions following perfect rationality. However, important research efforts have been made in parallel on non-Bayesian social learning models, where agents use simple rule-of-thumb methods to combine their private information and the one that comes from the social signals [93]–[97].

A vast part of the literature that studies non-Bayesian social learning is inspired by a model presented in [93], where agents combine the neighbours' opinions additively. In particular, this work considers a group of agents who need to make estimates of the value of a parameter $\theta$. The prior knowledge of the $n$-th agent about the parameter is represented by a probability distribution over the possible values, denoted as $f_n(0)$. Hence, by considering non-negative constants $q_{i,j}$ that represent the trust of agents in each other, the process of exchanging opinions is modeled as

$$f_n(t) = \sum_{j=1}^{N} q_{n,j}\, f_j(t-1) , \qquad (3)$$

where $t$ is a non-negative integer. This method of fusing information is algebraically simpler than the one used by Bayesian agents, allowing detailed analyses based on techniques from Markov chain theory [98, Chapter 6]. In fact, the iterative process can be represented by

$$\vec{f}(t) = Q \vec{f}(t-1) = Q^{t} \vec{f}(0) , \qquad (4)$$

where $\vec{f}(t) = (f_1(t),\dots,f_N(t))^{T}$ and $Q$ is the matrix with entries $q_{n,j}$. Therefore, the asymptotic value $\lim_{t\to\infty} \vec{f}(t)$ is governed by the properties of $Q^n$ for large values of $n$. Hence, using standard tools like the Perron-Frobenius Theorem [98], it is possible to predict when the agents achieve asymptotic agreement. Interestingly, if consensus is reached its value is an affine combination of the original opinions.

In [95] and [97], it is shown that simple ad hoc updating rules can still achieve asymptotically correct inference, even when they are unaware of important aspects like the social network topology or the signal structure of other agents. The system model of [95] closely follows the model presented in [93], although it explores the connection of $Q$ with the social network topology and other aspects. On the other hand, [97] introduces a constant arrival of new information, which allows the study of novel phenomena like asymptotic learning in finite networks. Both papers acknowledge that, although somewhat surprisingly simple rules can achieve asymptotic learning, the simplicity of the rule might have a strong negative effect on the learning rate and the corresponding speed of convergence.

The effect of the network topology on the asymptotic performance of non-Bayesian social learning was studied in [96] using a model similar to the one proposed in [93], but introducing random matrices $Q$ in order to represent stochastic social networks. This work also considers heterogeneous agents, some of them being stubborn and hence less likely to modify their initial opinions. Their analysis shows that the evolution of the social learning is connected to the matrix $\tilde{W}$, which can be expressed as

$$\tilde{W} = T + D . \qquad (5)$$

Above, the matrix $T$ is governed by the probabilities of agent interaction and hence is related to the social structure of the group, while $D$ represents the influence structure that quantifies which agents are more or less willing to adapt their decisions to follow their neighbours. Interestingly, it is shown that under general conditions $\tilde{W}^n$ converges to a matrix with equal columns, and that the row vector determines the value of the social consensus. Moreover, they provide bounds on possible deviations from asymptotic learning that can be produced by an excessive influence of some agents. One of these bounds is based on the spectral gap of $T$, which is a well-known measure of the social network connectivity (related to the second largest eigenvalue of the matrix [67]).


There is an interesting ongoing debate about whether Bayesian or non-Bayesian frameworks are more suitable to describe human endeavour. In a nutshell, although Bayesian models are elegant and tractable, they assume that agents always act rationally [99], and consequently make unrealistic assumptions about their knowledge of posterior probabilities that are related to non-trivial aggregated social interactions [97]. However, Bayesian models provide an important benchmark, not necessarily due to their accuracy but because their simplicity allows the development of important insights about the nature of the dynamics of aggregated decisions, providing important reference points for discussing non-Bayesian models as well [92]. On the other hand, the use of ad hoc information fusion methods in the non-Bayesian social learning literature makes it difficult to attain general results.

Although highly desirable, it is challenging to compare the results from Bayesian and non-Bayesian social learning in a meaningful and fair way. In fact, while agents in Bayesian systems usually decide once and do not adapt their decision further, most of the non-Bayesian learning literature considers agents that update their decision regularly, being closer to information diffusion processes (c.f. Section II-C1). One way of filling this gap is by exploring single decision non-Bayesian social learning. This has been done in some recent literature [68], [100], [101], which also addresses non-asymptotic properties that were not well-explored before. For example, [100] proposes a data fusion scheme which combines Bayesian updating for processing private information and an ad hoc combination based on a Gibbs measure for synthesising the social information. They show that this scheme achieves exponential convergence, and moreover quantify the dependence of the learning rate on different learning rules and communication constraints using large deviation theory. In [101] there is a comparison of the performance attainable using different kinds of data aggregation rules, including additive averaging and majority rules. However, it is difficult to compare their performance results with previous works as their model focuses on a final decision that uses as input all the partial decisions generated in the learning process. The main aspects of [68] are discussed in Section II-C2.

D. Asymptotic learning and information cascades

The existent literature suggests a direct connection between imperfect asymptotic learning and information cascades. In most sequential decision processes, the progressive amending of new evidence generates the eventual achievement of perfect inference, at least asymptotically. This can be verified in the case of classic hypothesis testing [102] and in various distributed hypothesis testing schemes [36], where the convergence to a perfect inference is attained exponentially fast. However, one of the distinctive characteristics of social learning, acknowledged since the early efforts, is that under some conditions perfect learning cannot be achieved and the asymptotic performance is still sub-optimal. Moreover, the literature argues that imperfect learning is a distinctive effect of information cascades, which limit the amount of new evidence that is included in the learning process [103]. Therefore, depending on the network topology and private signal structure there are two exclusive possibilities: either the social learning achieves perfect asymptotic learning in the absence of information cascades, or there are information cascades that limit the learning process and hence prevent perfect learning from being achieved. Following this idea, in combination with the insights about the effect of the network topology presented in [91], [39] explores how selective adaptation of incentives and the progressive rewiring of the network connections can help to steer information cascades. Despite this preliminary effort, the literature presents little fundamental understanding of the relationship between information cascades and the inference performance that can be attained by social learning. A first attempt to clarify this relationship is presented in Section VII.

IV. SOCIAL LEARNING AS A DATA AGGREGATION SCHEME

This section presents our interpretation of social learning as a case of distributed signal processing. For this, first Section IV-A discusses the system model and basic assumptions, and then Section IV-B focuses on analyzing the decision rule used by agents. Finally, Section IV-C develops a communication theoretic interpretation of social learning, settling the bases of the framework that is developed in the next sections.

A. Preliminaries and basic assumptions

Let us consider a group of $N$ agents that are sequentially engaged in a binary decision-making process. Each agent makes one decision, which is labeled according to the place it takes within the sequence of decisions. The decision of the $n$-th agent, denoted as $X_n \in \{0,1\}$, is based on two sources of information (see Figure 3): a private signal $S_n \in \mathcal{S}$ that corresponds to a discrete or continuous random variable that represents personal information that the $n$-th agent possesses, and social information given by the random variable $G_n \in \mathcal{G}_n$ that corresponds to information that the agent obtains from its own social network.

All participating agents have the same capabilities, and therefore the signals $S_n$ are assumed to be identically distributed. Moreover, it is assumed that the signals are affected by the environment, which is represented by the random variable $W$. We focus on the case of $W \in \{0,1\}$, as this simplifies the details of our presentation. For tractability we follow the existent literature in assuming that the signals $S_n$ are conditionally independent given $W$, following probability measures denoted by $\mu_w$ when conditioned on the event $\{W = w\}$. It is assumed that $\mu_0$ and $\mu_1$ are absolutely continuous with respect to each other [104], which means that no particular signal completely determines the state of the world. As a consequence of this assumption, the log-likelihood ratio of these two distributions is well-defined and given by the logarithm of the corresponding Radon-Nikodym derivative, $\Lambda_S(s) = \log \frac{d\mu_1}{d\mu_0}(s)$*. It is also assumed that $\mu_0 \neq \mu_1$, so that $\Lambda_S(s)$ is not trivially equal to zero.

*When $S_n$ takes a finite number of values then $\frac{d\mu_1}{d\mu_0}(s) = \frac{P\{S_n = s \mid W=1\}}{P\{S_n = s \mid W=0\}}$, while if $S_n$ is a continuous random variable with conditional p.d.f. $p(S_n \mid w)$ then $\frac{d\mu_1}{d\mu_0}(s) = \frac{p(s \mid w=1)}{p(s \mid w=0)}$.


Fig. 3: A social learning problem, where an agent needs to make a decision ($X_n$) based on personal information coming from a private signal ($S_n$) and social information ($G_n$) coming from a social network.

The available social information, $G_n$, represents what the $n$-th agent can observe in the social network about the decisions made by other agents, which are denoted as $X^{n-1} = (X_1,\dots,X_{n-1})$. In effect, it is assumed that in general those decisions might not be directly observable by the agent, as they are measured through a social network that can impose observational restrictions. In general $G_n$ can be a random variable, vector, matrix or other mathematical object. Useful examples are when $G_n$ corresponds to:

– The $k$ previous decisions: $G_n = (X_{n-k+1},\dots,X_{n-1})$.
– The average value of all the previous decisions: $G_n = \sum_{k=1}^{n-1} X_k/(n-1)$.
– The decisions of agents connected by an Erdos-Renyi network with parameter $q \in [0,1]$, i.e. $G_n = (Z_1,\dots,Z_{n-1}) \in \{0,1,e\}^{n-1}$, where

$$Z_k = \begin{cases} X_k & \text{with probability } q, \\ e & \text{with probability } 1-q. \end{cases} \qquad (6)$$

Our approach is not to assume any concrete functional form for $G_n$, but to develop a general framework with which the consequences of various properties of $G_n$ can be explored. As a minimal requirement, we ask $G_n$ to satisfy the following basic properties:

(i) Causality: it is assumed that $G_n$ is conditionally independent given $W$ of $S_m$ for all $m \geq n$.
(ii) Uniform social uncertainty: the uncertainty present in the social media is independent of $W$. Therefore, $G_n$ and $W$ are conditionally independent given $X^{n-1}$.

A strategy is a rule for generating a decision $X_n$ based on $S_n = s$ and $G_n = g_n$, i.e. a collection of deterministic or random functions $\pi_n$ such that $X_n = \pi_n(S_n, G_n)$ for $n \in \{1,\dots,N\}$.

B. Decision rule

We consider rational agents that follow a Bayesian strategy to minimize the average cost given by $\bar{\mathcal{U}}_n\{\pi_n\} = E\{u(\pi_n(S_n, G_n), W)\}$, where $E\{\cdot\}$ is the expected value operator and $u(x,w)$ is a cost function that can be engineered to match the relevance of the decision $X_n = x$ when $W = w$. For example, if $u(w,x) = 1 - \delta_{w,x}$ with $\delta_{w,x}$ the Kronecker delta, then $\bar{\mathcal{U}}_n\{\pi\} = P\{W \neq \pi\}$ is the error rate of $\pi$ as a predictor of $W$. Also, if $u(w,x) = |w - x|^2$ then $\bar{\mathcal{U}}_n$ is the mean square error.

To find a functional description of Bayesian strategies, let us first consider the average cost of deciding $X_n = x$ given $S_n = s$ and $G_n = g_n$, which can be expressed as

$$\mathcal{U}_n(x \mid s, g_n) = E\{u(x, W) \mid S_n = s, G_n = g_n\} = \sum_{w \in \{0,1\}} u(x,w)\, P\{W = w \mid S_n = s, G_n = g_n\} .$$

Hence, the corresponding Bayesian strategy is given by $\pi_n^b(s, g_n) = \arg\min_{x \in \mathcal{X}} \mathcal{U}_n(x \mid s, g_n)$. Note that the average cost after adopting the policy $\pi_n$ can then be written as

$$\bar{\mathcal{U}}_n\{\pi_n\} = E\{ E\{ \mathcal{U}_n(\pi_n(s, g_n) \mid s, g_n) \mid S_n = s, G_n = g_n \} \} ,$$

clarifying that $\bar{\mathcal{U}}_n\{\pi_n^b\} \leq \bar{\mathcal{U}}_n\{\pi_n\}$ for any other strategy $\pi_n$.

The Bayesian strategy for the case of binary decisions can be determined by comparing $\mathcal{U}_n(0 \mid s, g_n)$ and $\mathcal{U}_n(1 \mid s, g_n)$, which are the relative costs associated with $X_n = 0$ and $X_n = 1$, respectively. This leads to an equivalent condition given by [89]:

$$\frac{P\{W=1 \mid S_n, G_n\}}{P\{W=0 \mid S_n, G_n\}} \underset{X_n=1}{\overset{X_n=0}{\lessgtr}} \frac{u(0,0) - u(0,1)}{u(1,1) - u(1,0)} . \qquad (7)$$

Moreover, due to the causality property of $G_n$ (c.f. Section IV-A), $S_n$ and $G_n$ are conditionally independent given $W = w$. Therefore, using the Bayes rule on $P\{W=1 \mid S_n, G_n\}$ and $P\{W=0 \mid S_n, G_n\}$, a direct calculation shows that (7) can be re-written as

$$\Lambda_S(S_n) + \Lambda_{G_n}(G_n) \underset{X_n=1}{\overset{X_n=0}{\lessgtr}} \nu + \eta , \qquad (8)$$

where $\nu = \log \frac{u(0,0) - u(0,1)}{u(1,1) - u(1,0)}$, $\eta = \log \frac{P\{W=0\}}{P\{W=1\}}$ and $\Lambda_{G_n}(G_n)$ is the log-likelihood ratio of $G_n$. This condition supplies a simple data fusion rule for combining the information provided by $S_n$ and $G_n$.

C. Communication theoretic interpretation

Without loss of generality, for non-constant cost functions and adequate labeling of the decisions one can make the event $\{X_n = W\}$ less costly than $\{X_n \neq W\}$, or equivalently $u(1,1) \leq u(1,0)$ and $u(0,0) \leq u(0,1)$. Therefore, the Bayesian strategy is to choose $X_n$ as similar to $W$ as possible, according to the available state of knowledge provided by $S_n$ and $G_n$. Hence, decisions $X_n$ can be considered to be noisy estimations of $W$ in a communication theoretic signal space.

To formalize the above intuition, one can represent the decision process of each agent as data transmission over a noisy channel (for a summary of correspondences please refer


to Table I). As a matter of fact, our scenario is equivalent to a single data source $W$ that is measured over multiple noisy channels, generating the signals $S_k$ for $k = 1,\dots,n$ that are processed in order to generate $X_n$ (c.f. Figure 4). The decoder of the $n$-th node, hence, receives as input the signal $S_n$ and uses $G_n$ as side information for optimizing the decoding process. The social information $G_n$ corresponds to a lossy compression of the information provided by the signals $S_1,\dots,S_{n-1}$ that is expressed in the vector of previous decisions $X^{n-1}$, representing a bandwidth constraint for the communication between the agents.

TABLE I: Table of Correspondences

Communication theory      Social learning
Node                      Social agent
Data source               "State of the world"
Noisy measurement         Private information
Communication range       Social neighbourhood
Local processing          Agent's decision
Bandwidth constraints     Social information

Fig. 4: Diagram that shows how social learning can be interpreted as a case of distributed signal processing. The depicted decoder implements the rule given by (9), which combines the information provided by the private signal of the $n$-th agent ($S_n$) and the evidence that comes from the social network ($G_n$). The latter can be considered as additional side-information that helps to increase the accuracy of the inference.
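The data fusion rule (8) of Section IV-B can be sketched in a few lines. This is only an illustrative example under stated assumptions: the cost table is indexed as `cost[w][x]`, following the index order $u(w,x)$ used in the error-rate example, the default cost is the error-rate cost (so that $\nu = 0$), ties are resolved in favour of $X_n = 1$, and the function name `bayesian_decision` is ours.

```python
import math

# Error-rate cost u(w, x) = 1 - delta_{w,x}, indexed as cost[w][x] (assumed index order).
ERROR_RATE_COST = ((0.0, 1.0), (1.0, 0.0))

def bayesian_decision(llr_private, llr_social, prior_w1=0.5, cost=ERROR_RATE_COST):
    """Return X_n by comparing Lambda_S + Lambda_G against nu + eta, as in (8).

    Assumes that correct decisions are strictly cheaper than wrong ones,
    so that the ratio inside the logarithm defining nu is positive.
    """
    nu = math.log((cost[0][0] - cost[0][1]) / (cost[1][1] - cost[1][0]))
    eta = math.log((1.0 - prior_w1) / prior_w1)
    # Ties are resolved in favour of X_n = 1 (an assumption of this sketch).
    return 1 if llr_private + llr_social >= nu + eta else 0

# The private signal mildly favours W = 1, and the social information is neutral.
x_neutral = bayesian_decision(llr_private=0.5, llr_social=0.0)

# The same private signal is overruled by strong social evidence for W = 0.
x_overruled = bayesian_decision(llr_private=0.5, llr_social=-2.0)
```

The second call shows in miniature how sufficiently strong social evidence can dominate the private signal, which is the mechanism behind the cascades analyzed later.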

To further explore this perspective, let us re-formulate (8) as

$$\Lambda_S(S_n) \underset{X_n=1}{\overset{X_n=0}{\lessgtr}} \tau_n(G_n) , \qquad (9)$$

where $\tau_n(G_n) = \nu + \eta - \Lambda_{G_n}(G_n)$ is the decision threshold. Therefore, the agent's decoder can be modeled as two independent signal processing modules that feed a decision module (see Figure 4). The first signal processing module receives as input the signal $S_n$ (which can be a number, vector, matrix or any other mathematical object) and outputs $\Lambda_S(S_n)$, which is a real number that serves as sufficient statistic for the decision process. In this sense, this signal processing module plays a similar role to the one of matched filtering in a digital communication system [105]. The second signal processing module takes as input $G_n$ and outputs $\tau_n(G_n)$, which corresponds to side information that is processed in order to optimize the decision threshold.

Finally, a decision module classifies the decision signal $\Lambda_S(S_n)$ based on a Voronoi tessellation, which divides $\mathbb{R}$ in two semi-open intervals given by

$$\mathcal{K}_n^0 = (-\infty, \tau_n(G_n)) , \qquad \mathcal{K}_n^1 = [\tau_n(G_n), \infty) . \qquad (10)$$

Therefore, the output of the decision module is provided by

$$\pi_n^b(S_n, G_n) = \begin{cases} 1 & \text{if } \Lambda_S(S_n) \in \mathcal{K}_n^1 , \\ 0 & \text{if } \Lambda_S(S_n) \in \mathcal{K}_n^0 . \end{cases} \qquad (11)$$

Note that the decision module is equivalent to the last stage of a demodulator module in digital communication receivers [105], with the particular feature that the tessellation is determined by the side information provided by $\Lambda_{G_n}(G_n)$. This feature and subsequent consequences are analyzed in the next sections.

V. CASCADING BEHAVIOUR

This section presents our main contribution in the analysis of information cascades. For this purpose, first Section V-A presents a novel statistical definition of information cascades that distinguishes between local and global cascades, being valid for Bayesian and non-Bayesian social learning. Then, Section V-B characterizes the conditions that trigger local information cascades. Global information cascades are then analyzed for the case of perfect social information in Section V-D. The results are extended for non-ideal social information in Section V-C. Finally, Section V-E presents a communication-theoretic interpretation of the main results of previous sections.

In the following, $X_n = \pi_n(S_n, G_n)$ corresponds to Bayesian strategies unless otherwise stated. Also, $P_w\{X \mid Y\} = P\{X \mid Y, W = w\}$ is used as a short-hand notation.

A. Definitions

Following [54], we understand an information cascade as the phenomenon where the behaviour of a small part of the social network can trigger a herd behaviour, forcing agents to ignore their personal knowledge and act according to the social pressure. In general the decision $X_n = \pi_n(S_n, G_n)$ depends directly on $S_n$ and $G_n$; however, a local cascade takes place when the interdependency between $X_n$ and $S_n$ is broken due to a dominant influence of $G_n$. This intuition is formalized in the next definition.

Definition 1. The social information $g_n \in \mathcal{G}_n$ pushes the $n$-th agent into a local information cascade if $X_n = \pi_n(S_n, g_n)$ is statistically independent of $S_n$.

Note that, for the particular case of Bayesian strategies, $\pi_n^b : \mathcal{S} \times \mathcal{G}_n \to \{0,1\}$ is a deterministic function, and hence the above definition states that $g_n$ causes a local cascade if and only if $\pi_n^b(s, g_n)$ is constant for all $s \in \mathcal{S}$. Therefore, the above definition generalises the one provided in [39], being also valid for non-Bayesian strategies.

As a next step, global information cascades are defined. Intuitively, after a particular agent experiences $G_n = g_n$ then


a global cascade takes place if all subsequent agents also fall [...]

$$P_w\{X_n = 0 \mid G_n = g_n\} = \int_{\mathcal{S}} P_w\{X_n = 0 \mid G_n = g_n, S_n = s\}\, d\mu_w(s) = \int_{\mathcal{S}} \mathbb{1}\{\pi_n^b(s, g_n) = 0\}\, d\mu_w(s) = P_w\{\Lambda_S(S_n) < \tau_n(g_n)\} = F_w^{\Lambda}(\tau_n(g_n)) ,$$

where $F_w^{\Lambda}(\cdot)$ is the c.d.f. of the variable $\Lambda_S(S_n)$ conditioned on $W = w$, whose properties are discussed in Appendix A. Note that the first equality is a consequence of the conditional independence between $S_n$ and $G_n$ given $W = w$, the second is due to (11) and the third to (9). The above results allow us to prove a useful lemma.

Lemma 1. $X_n - \tau_n - G_n$ form a Markov chain (i.e. $\tau_n(G_n)$ is a sufficient statistic for generating $X_n$).

Proof: From (V-B) one can see that $P_w\{X_n \mid \tau_n, G_n\}$ does not depend on $G_n$, and therefore the conditional independence of $X_n$ and $G_n$ given $\tau_n$ is clear.

Let us introduce the notation $U_s = \operatorname{ess\,sup}_{s \in \mathcal{S}} \Lambda_S(S_n = s)$ and $L_s = \operatorname{ess\,inf}_{s \in \mathcal{S}} \Lambda_S(S_n = s)$ for the essential supremum and infimum of $\Lambda_S(S_n)$†. If any of these quantities diverge, then there exist signals that provide overwhelming evidence in favour of one of the hypotheses. If both are finite, the agents are said to have bounded beliefs. Using these definitions, we proceed to characterize local information cascades.

†The essential supremum is the smallest upper bound that holds almost surely, being the natural measure-theoretic extension of the supremum [106].

Proposition 1. The social information $g_n \in \mathcal{G}_n$ triggers a local information cascade if and only if $\tau_n(g_n) \notin [L_s, U_s]$.

Proof: From (V-B) it can be seen that if $\tau_n < L_s$ then $F_0^{\Lambda}(\tau_n) = F_1^{\Lambda}(\tau_n) = 0$, while if $\tau_n > U_s$ then $F_0^{\Lambda}(\tau_n) = F_1^{\Lambda}(\tau_n) = 1$. Therefore, if $\tau_n(g_n) \notin [L_s, U_s]$ then it determines $X_n$ almost surely, making $X_n$ and $S_n$ independent. On the other hand, if $L_s < \tau_n(g_n)$ [...]

[...] observation, note that a direct calculation shows that

$$\tau_{n+1}(X^n) - \tau_n(X^{n-1}) = \Lambda_{X^{n-1}}(X^{n-1}) - \Lambda_{X^n}(X^n) = -\Lambda_{X_n \mid X^{n-1}}(X_n \mid X^{n-1}) , \qquad (12)$$

where the conditional log-likelihood is given by

$$\Lambda_{X_n \mid X^{n-1}}(X_n \mid X^{n-1}) = \log \frac{P_1\{X_n \mid X^{n-1}\}}{P_0\{X_n \mid X^{n-1}\}} .$$

This shows that $\tau_{n+1}$ decreases if $X_n$ provides additional evidence about $W = 1$ over $W = 0$ with respect to the previous decisions, and increases if the opposite happens. A direct calculation using (V-B) shows that

$$\Lambda_{X_n \mid X^{n-1}}(x_n \mid x^{n-1}) = \lambda(x_n, \tau_n(x^{n-1})) , \qquad (13)$$

where the function $\lambda(\cdot,\cdot)$ is defined as

$$\lambda(x, \tau) = x \log \frac{1 - F_1^{\Lambda}(\tau)}{1 - F_0^{\Lambda}(\tau)} + (1 - x) \log \frac{F_1^{\Lambda}(\tau)}{F_0^{\Lambda}(\tau)} . \qquad (14)$$

This shows that $\tau_n(X^{n-1})$ is a sufficient statistic of $X^{n-1}$ for predicting $\tau_{n+1} - \tau_n$. Finally, using these results one can find that

$$\tau_n(X^{n-1}) = \eta + \nu - \sum_{k=1}^{n-1} \lambda(X_k, \tau_k(X^{k-1})) \qquad (15)$$

for $n \geq 2$, while for $n = 1$ then $\tau_1 = \nu + \eta$.

From (15) it is tempting to interpret $\tau_n$ as a random walk over the decision space (for an introduction to random walks, see [107]). Although this is true for some particular cases, it does not always hold, as the steps $\tau_{n+1} - \tau_n = -\Lambda_{X_n \mid X^{n-1}}$ are in general not identically distributed. However, [...] have quite complex non-Markovian statistics, the process $\tau_1, \tau_2, \dots$ possesses some useful properties that are explored in the next proposition.

Proposition 2. The process $\tau_1, \tau_2, \dots$ is Markovian, and is a super- or sub-martingale for $W = 0$ and $W = 1$, respectively.


Proof: See Appendix B.

The main result of this sub-section states that the process $\tau_n$ gets trapped in some specific areas of $\mathbb{R}$, and that these events correspond to global information cascades. Theorem 1 states that as soon as $\tau_n$ goes beyond $[L_s, U_s]$ it gets stuck, this condition being necessary and sufficient for an information cascade to be triggered. As that region of the decision signal space characterizes local information cascades (c.f. Proposition 1), it is clear that in this scenario every local information cascade triggers a global one.

Theorem 1. Let us assume that the private signals provide bounded beliefs, i.e. both $U_s$ and $L_s$ are finite (c.f. Section V-B). Then, for a given $x^{n-1} \in \{0,1\}^{n-1}$, the three following conditions are equivalent:

(i) $X^{n-1} = x^{n-1}$ triggers a local information cascade in agent $n$.
(ii) $\Lambda_{X_n \mid X^{n-1}}(X_n \mid x^{n-1}) = 0$ almost surely.
(iii) $X^{n-1} = x^{n-1}$ causes a global information cascade.

Proof: See Appendix B.

Basically, the above theorem states that if the private signals have bounded beliefs then each local information cascade triggers a global one. Therefore, the simple graphical characterization provided by Proposition 1 for local cascades can be used to analyse global cascades as well (c.f. the analysis provided in Section V-E).

[...]

This section extends the results for global information cascades obtained in the previous section to more general scenarios. For this, let us first define the distortion coefficients given by

$$\alpha_n(g \mid x^{n-1}) = P\{G_n = g \mid X^{n-1} = x^{n-1}\} . \qquad (16)$$

Note that $\alpha_n(g_n \mid x^{n-1})$ corresponds to the likelihood for the $n$-th agent to experience $G_n = g_n$ when the previous decisions are $X^{n-1} = x^{n-1}$. In the following, we use the notation $\tau_n^{\text{full}}(X^{n-1}) = \eta + \nu - \Lambda_{X^{n-1}}(X^{n-1})$ to refer to the decision threshold of the case of perfect social information, studied in Section IV, distinguishing it from the actual decision threshold $\tau_n(G_n)$ related to a state of limited knowledge as introduced in Section IV-C. We also introduce the following further property, which is crucial for the rest of the section.

Definition 3. The social information $G_n$ is said to have consistent distortion if, for all $g_n \in \mathcal{G}_n$, one of the following possibilities holds: either all the decision vectors $x^{n-1} \in \{0,1\}^{n-1}$ such that $\alpha_n(g_n \mid x^{n-1}) > 0$ satisfy $\tau_n^{\text{full}}(x^{n-1}) \notin [L_s, U_s]$, or all decision vectors such that $\alpha_n(g_n \mid x^{n-1}) > 0$ satisfy $\tau_n^{\text{full}}(x^{n-1}) \in [L_s, U_s]$.

Before presenting the main results of the section, we need to state the following useful lemma.

Lemma 2. If $a_1,\dots,a_n$ and $b_1,\dots,b_n$ are collections of non-negative numbers, then

$$\min\left\{\frac{a_1}{b_1},\dots,\frac{a_n}{b_n}\right\} \leq \frac{\sum_{j=1}^{n} a_j}{\sum_{j=1}^{n} b_j} \leq \max\left\{\frac{a_1}{b_1},\dots,\frac{a_n}{b_n}\right\} .$$

Proof: See Appendix B.

The following theorem extends Theorem 1 to the case of non-ideal social information, providing a sufficient condition under which local information cascades always trigger a global one. In summary, this proves that in these scenarios the decision threshold $\tau_n$ evolves until the first time it falls outside of the interval $[L_s, U_s]$. If $\tau_n \notin [L_s, U_s]$ then $\tau_m = \tau_n$ for all $m > n$, this being an unequivocal signal of the beginning of a global information cascade.

Theorem 2. If $G_n$ possesses global consistent distortion, then any local information cascade triggers a global information cascade.

Proof: See Appendix B.

To provide further intuition and further leverage our results from Section V-C, we explore the relationship between the conditions that trigger global information cascades under perfect social information and the ones that trigger cascades in the non-ideal case.

Proposition 3. If the process $G_n$ has consistent distortion, then each $g_n \in \mathcal{G}_n$ for which there exists at least one $x^{n-1} \in \{0,1\}^{n-1}$ with $\alpha_n(g_n \mid x^{n-1}) > 0$ and $\tau_n^{\text{full}}(x^{n-1}) \notin [L_s, U_s]$ triggers a global information cascade.

Proof: Let us focus on the case that $\tau_n^{\text{full}}(x^{n-1}) > U_s$, as the proof for the case $\tau_n^{\text{full}}(x^{n-1}) <$ [...] $U_s$. A direct computation shows that

$$\Lambda_{G_n}(g) = \log \frac{P_1\{G_n = g\}}{P_0\{G_n = g\}} = \log \frac{\sum_{x^{n-1} \in \{0,1\}^{n-1}} \alpha_n(g \mid x^{n-1})\, P_1\{X^{n-1} = x^{n-1}\}}{\sum_{x^{n-1} \in \{0,1\}^{n-1}} \alpha_n(g \mid x^{n-1})\, P_0\{X^{n-1} = x^{n-1}\}} . \qquad (17)$$

Then, by applying the previous lemma to the argument of the logarithm, one can show that

$$\min_{x^{n-1} \in \mathcal{A}_n} \Lambda_{X^{n-1}}(x^{n-1}) \leq \Lambda_{G_n}(g) \leq \max_{x^{n-1} \in \mathcal{A}_n} \Lambda_{X^{n-1}}(x^{n-1}) ,$$

where $\mathcal{A}_n = \{x^{n-1} \in \{0,1\}^{n-1} \mid \alpha_n(g \mid x^{n-1}) > 0\}$. This condition is equivalent to

$$\min_{x^{n-1} \in \mathcal{A}_n} \tau_n^{\text{full}}(x^{n-1}) \leq \tau_n(G_n) \leq \max_{x^{n-1} \in \mathcal{A}_n} \tau_n^{\text{full}}(x^{n-1}) . \qquad (18)$$

This last condition, combined with the global consistency, guarantees that if one $x^{n-1}$ is such that $\alpha_n(g_n \mid x^{n-1}) > 0$ and $\tau_n^{\text{full}}(x^{n-1}) > U_s$, then $\tau_n(G_n) > U_s$ as well.

E. Information cascades from a communication-theoretic perspective

While previous sections study the conditions that trigger local and global information cascades, here we aim to relate the achieved results with the discussion presented in Section IV-C. In this way, by following a communication-theoretic perspective, it is possible to see the evolution of $\tau_n$ as a refinement in the process of signal decoding and an information cascade as a halt on this refinement. In effect,


the social information makes $\tau_n$ grow progressively, which corresponds to stronger side information that favours one of the two hypotheses (see Figure 5). Correspondingly, an information cascade is equivalent to side information that is so persuasive that it completely determines the output of the decoder, disregarding the actual private signal realization.

Fig. 5: The decision module compares the decision signal $\Lambda_S(S_n)$ against a threshold $\tau_n(G_n)$, which evolves with the social information.

It is important to notice that when $\tau_n$ stops evolving due [...] decision process of future agents any longer, implying that the learning process has actually stopped. The error rates of all subsequent agents after an information cascade is triggered [...] explains why information cascades prevent the achievement of a perfect inference even for asymptotically large networks.

It is interesting to note that the asymptotic performance of social learning is directly related to the size of $[L_s, U_s]$. To illustrate this fact, let us first note that a value of $P\{W = 1 \mid S_n, G_n\}$ close to 0 or 1 represents a high certainty about the "state of the world". A direct calculation using Bayes rule shows that

$$P\{W = 1 \mid S_n, G_n\} = \sigma\big(\Lambda_{G_n}(G_n) - \eta + \Lambda_S(S_n)\big) = \sigma\big(-\tau_n + \nu + \Lambda_S(S_n)\big), \tag{19}$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the well-known sigmoid function, and the second equality comes from the fact that $\tau_n = \nu + \eta - \Lambda_{G_n}(G_n)$. Therefore, it is clear that a large value of $|\tau_n|$ corresponds to a state of high certainty about $W$. Moreover, according to Theorem 2 and Proposition 1, asymptotically all values of $\tau_n$ are either larger than $U_s$ or smaller than $L_s$. Therefore, if those values are large in magnitude then this guarantees a small asymptotic error rate. This intuition is further explored in Section VII.

VI. SOCIAL LEARNING WITH BINARY PRIVATE SIGNALS

To illustrate the results of the previous section, we present an application of our framework to study social systems where the private signals are binary. Please note that such binary systems are popular in the literature, being extensively discussed, e.g. [38], and experimentally validated [70]. Further applications to systems with other private signal distributions can be found in Appendix C.

In the following, first Section VI-A presents results valid for the general binary case, and then Section VI-B focuses on the case of private signals with a structure similar to a binary symmetric channel.

A. General results

Let us focus our analysis on the general case of social systems where the agents have access to binary signals, i.e. $\mathcal{S} = \{0, 1\}$. Let us denote the false alarm and miss-detection rates by $\epsilon_w = P_w\{S_n = 1 - w\}$ for $w \in \{0, 1\}$, and assume without loss of generality that $\max\{\epsilon_0, \epsilon_1\} \le 1/2$. A direct computation of $\Lambda_S$ gives that

$$\Lambda_S(S_n) = S_n \log\frac{1 - \epsilon_1}{\epsilon_0} + (1 - S_n) \log\frac{\epsilon_1}{1 - \epsilon_0}. \tag{20}$$

Note that $U_s = \Lambda_S(1) > 0 > \Lambda_S(0) = L_s$, which is a consequence of the fact that $1 - \epsilon_1 \ge \epsilon_0$. Correspondingly, the c.d.f. of $\Lambda_S(S_n)$ for given $W = w$ is a step function given by

$$F_w^{\Lambda}(\tau) = \begin{cases} 0 & \text{if } \tau < L_s, \\ P_w\{S_n = 0\} & \text{if } L_s \le \tau < U_s, \\ 1 & \text{if } U_s \le \tau. \end{cases} \tag{21}$$

As an immediate observation, Proposition 1 guarantees that if $\tau_1 = \eta + \nu < L_s$ then $X_n = 1$ for all $n$, while if $\tau_1 = \eta + \nu > U_s$ then $X_n = 0$ for all $n$. Therefore, to avoid trivial scenarios it is always assumed that $\tau_1 \in [L_s, U_s]$.

Let us compute the log-likelihoods of the social decisions. From a simple application of (11) and the fact that $L_s \le \tau_1 \le U_s$, it can be concluded that $X_1 = S_1$ almost surely. Therefore, by comparing (13) and (20), one obtains that

$$\Lambda_{X_1}(X_1) = \Lambda_S(S_1), \tag{22}$$

where the equality holds almost surely. An analogous analysis reveals that

$$\Lambda_{X_n | X^{n-1}}(X_n | X^{n-1}) = \begin{cases} \Lambda_S(S_n) & \text{if } \tau_n(X^{n-1}) \in [L_s, U_s], \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, using (13) and (15) it is clear that

$$\tau_n(X^{n-1}) = \nu + \eta - \sum_{k=1}^{n_c - 1} \Lambda_S(S_k) \tag{23}$$

for all $n \ge n_c$, where $n_c$ is the first agent who experiences a local information cascade. Therefore, the evolution of $\tau_n$ can be described as follows: it starts at $\nu + \eta$ and evolves taking upward steps of size $-L_s$ or downward steps of size $U_s$, stopping as soon as it reaches beyond $[L_s, U_s]$.

The above derivation illustrates the result stated by Theorem 1, showing how $\tau_n$ stops evolving after reaching beyond the range of values of the private evidence. Moreover, it also shows how the global information cascade corresponds to the fact that the social network stops processing new data after the first agent suffers a local information cascade.
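The threshold evolution described by (23) can be traced with a few lines of code. The following is only a minimal sketch: the function name `threshold_path` and its arguments are hypothetical, the decision sequence is supplied by the caller rather than simulated, and a threshold exactly on the boundary is handled as in a right-continuous c.d.f. (an assumption, since such ties have no practical relevance).

```python
import math


def threshold_path(decisions, eps0, eps1, start):
    """Trace tau_n for binary private signals, following (23).

    decisions: observed decisions x_1, x_2, ... (0 or 1)
    eps0, eps1: false alarm and miss-detection rates
    start: initial threshold nu + eta
    """
    U_s = math.log((1 - eps1) / eps0)   # Lambda_S(1)
    L_s = math.log(eps1 / (1 - eps0))   # Lambda_S(0)
    tau = start
    path = [tau]
    for x in decisions:
        # Outside [L_s, U_s) the decision carries no information,
        # so the threshold is frozen (information cascade).
        if L_s <= tau < U_s:
            tau -= U_s if x == 1 else L_s
        path.append(tau)
    return path, (L_s, U_s)


path, (L_s, U_s) = threshold_path([1, 1, 0, 0], 0.35, 0.35, 0.1)
print(path, L_s, U_s)
```

With symmetric error rates of 0.35 and a starting value of 0.1, two consecutive decisions equal to 1 push the threshold below $L_s$, after which later decisions no longer move it.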


B. Binary symmetric channel

Let us now specialize our analysis by considering a case where the false alarm rate is equal to the miss-detection rate, which corresponds to a binary symmetric channel with cross-over probability $\epsilon_0 = \epsilon_1 := \epsilon$. The symmetry of this scenario allows us to relate the evolution of the decision threshold to a random walk and to find closed-form expressions for the cascading probabilities.

As a first step, by realizing that in this case $\Lambda_S(1) = -\Lambda_S(0) = \log\frac{1-\epsilon}{\epsilon}$, it is clear that (23) can be re-written as

$$\tau_n = \nu + \eta - k_0 \Lambda_S(0) - k_1 \Lambda_S(1) = \nu + \eta + (k_0 - k_1) \log\frac{1-\epsilon}{\epsilon},$$

where $k_0$ and $k_1$ are the number of decisions equal to 0 or 1 made by the agents before $n_c$. Therefore, this expression shows that in this scenario $\tau_n$ evolves following a random walk over the set $\{\nu + \eta + k \log\frac{1-\epsilon}{\epsilon} ;\ k \in \mathbb{Z}\}$, which gets trapped as soon as $|\tau_n| > \log\frac{1-\epsilon}{\epsilon}$ (c.f. Figure 6). A direct inspection shows that $k$ might only take four possible values: $-1$, $0$, $1$ and $2$ if $\nu + \eta < 0$, or $-2$, $-1$, $0$ and $1$ if $\nu + \eta > 0$. An example of a realization of one decision sequence is illustrated in Figure 7, where one can observe an information cascade being triggered as soon as the decision threshold leaves the area demarcated by red lines.

Fig. 6: The evolution of the decision threshold for fully connected social networks with binary private signals can be characterized as a random walk in the decision signal space. Every decision $X_n = 1$ or $X_n = 0$ causes steps to the left or right, respectively. When the random walk moves away from the interval defined by the private beliefs of each agent, the random walk stops and an information cascade is triggered.

Fig. 7: Top: Sequence of decisions in a social system with binary symmetric signals, cross-over probability $\epsilon = 0.35$ and $\nu + \eta = 0.1$. Bottom: Evolution of the decision threshold $\tau_n$ for the same system. As Theorem 2 predicted, a global information cascade is triggered after the decision threshold reaches beyond the limits imposed by the bounded private beliefs (marked by the red lines), which takes place after the decision of the sixth agent. (Axes: decisions and Tau versus agent index, 1 to 10.)

With this characterization, the probability of an information cascade of 1's or 0's, denoted as $P\{C_1\}$ and $P\{C_0\}$, can be found using the theory of Random Walks and Ruin Problems [107]. For this, let us consider a random walk over the numbers 0, 1, 2 and 3, which starts either at 1 or 2 and stops when it first hits 0 or 3. Without loss of generality (due to the symmetry of the scenario), we assume that each upward step occurs with probability $\epsilon$ while each downward step takes place with probability $1 - \epsilon$. Let us denote by $q_z$ the probability of hitting 0 before 3, and by $p_z$ the probability of hitting 3 before 0, both when the random walk starts from position $z$. Then, a classic result of random walk theory (c.f. [107, pg. 345]) states that $q_z = 1 - p_z$ and

$$q_z = \frac{\left(\frac{\epsilon}{1-\epsilon}\right)^{3-z} - 1}{\left(\frac{\epsilon}{1-\epsilon}\right)^{3} - 1}, \tag{24}$$

where $\epsilon/(1-\epsilon)$ is the reciprocal of the likelihood ratio $(1-\epsilon)/\epsilon$. Therefore, $q_1$ can be understood as the probability of having a correct cascade (e.g. a cascade of 1's when $W = 1$) when $\nu + \eta \in [-\log\frac{1-\epsilon}{\epsilon}, 0]$, i.e. when having favorable priors (c.f. Figure 6). On the other hand, $q_2$ corresponds to the case where $\nu + \eta \in [0, \log\frac{1-\epsilon}{\epsilon}]$, i.e. to the rate of correct cascades when the priors are not favorable. The accuracy of these expressions for predicting the cascade rates has been verified by numerical simulations (see Figure 8).

Fig. 8: Rate of the correct cascade (e.g. $P\{C_1\}$ if $W = 1$) for different priors, as described in Section VI-B. Results confirm that the evolution of social learning for binary signals can be accurately described by a random walk model, and that (24) effectively predicts the corresponding cascading rates. (Curves: theoretical and simulated correct cascade rates for favorable and non-favorable priors; axes: probability versus private signal error rate.)

The characterization of the threshold's evolution in terms of a random walk introduces clear insights about the social learning process. For example, the above results show how two agreeing consecutive decisions are sufficient to trigger a global information cascade (c.f. Figure 6). In contrast, Appendix C shows how more complex private signals can make it more


difficult to trigger global cascades.

VII. ANALYSIS OF THE AVERAGE COST

In this section we study the evolution of the agent's cost, which is a natural metric to evaluate the performance of social learning. First, Section VII-A provides general closed-form formulas that enable efficient numerical evaluations. Then, Section VII-B uses these results to provide upper bounds to the asymptotic cost, which clarify the impact of information cascades over the learning process.

A. Computing the average payoff

A direct calculation shows that the average payoff of the $n$-th agent under a Bayesian strategy, as defined in Section IV-B, can be expressed as

$$\bar{\mathcal{U}}_n(\pi_b) = \sum_{w \in \{0,1\}} \sum_{g_n \in \mathcal{G}_n} P_w\{G_n = g_n\} \, P\{W = w\} \int_{\mathcal{S}} u_n\big(w, \pi_b\{s, \tau_n(g_n)\}\big) \, d\mu_w(s) \tag{25}$$

$$= \sum_{w \in \{0,1\}} \sum_{g_n \in \mathcal{G}_n} P_w\{G_n = g_n\} \, P\{W = w\} \Big( u_{w,0} F_w^{\Lambda}(\tau_n(g_n)) + u_{w,1} \big[ 1 - F_w^{\Lambda}(\tau_n(g_n)) \big] \Big), \tag{26}$$

where the first equality is a consequence of the conditional independence of $S_n$ and $G_n$ given $W = w$, and the second equality follows from Section V-B and the fact that $\pi_b$ is a binary variable. Note that a direct evaluation of (26) is possible using the algorithm to compute $P_w\{G_n = g\}$ given in Appendix E.

With the above results it is possible to perform numerical evaluations of the exact performance of social learning in diverse scenarios. This is illustrated in Section VIII for the case of different private signal statistics.

B. Asymptotic performance analysis

In this section we derive upper bounds for the asymptotic payoff $\lim_{n \to \infty} \bar{\mathcal{U}}_n(\pi_b) := \bar{\mathcal{U}}_\infty(\pi_b)$. This quantity is a key performance metric for large social networks. Although the exact value can be estimated by numerical evaluations using the formulas presented in Section VII-A, in most cases the complexity of the computations required to reach the point of convergence is prohibitive. Therefore, the simple upper bounds presented in this section provide valuable insights about this important metric.

For this, let us first present Proposition 4, which considers the statistics of $\tau_n$. Note that, because $\tau_n$ is a deterministic function of $G_n$, it is possible to express its statistics as

$$P_w\{\tau_n = t\} = \sum_{\substack{g \in \mathcal{G}_n \\ \tau_n(g) = t}} P_w\{G_n = g\}. \tag{27}$$

Proposition 4. For arbitrary social systems with partial social information, it holds that

$$\bar{\mathcal{U}}_\infty(\pi_b) = \lim_{n \to \infty} E\{\psi(\tau_n)\}, \tag{28}$$

where $\psi(\cdot)$ is defined as

$$\psi(t) = \Big( u_{0,0} F_0^{\Lambda}(t) + u_{0,1} \big[ 1 - F_0^{\Lambda}(t) \big] \Big) \sigma(t - \nu) + \Big( u_{1,0} F_1^{\Lambda}(t) + u_{1,1} \big[ 1 - F_1^{\Lambda}(t) \big] \Big) \sigma(\nu - t).$$

Proof: See Appendix D.

Note that $\psi(\cdot)$ measures the contribution of each $t \in \mathcal{T}_n$ to the average cost. In particular, following the results of Section IV, let us consider a possible cascade of 0's generated by $t_0 \ge U_s$. Note that, under that condition, $F_0^{\Lambda}(t_0) = F_1^{\Lambda}(t_0) = 1$. Then, using the previous proposition, the contribution of this potential cascade to the asymptotic average cost can be found to be $\psi(t_0) = u_{1,0} - (u_{1,0} - u_{0,0}) \, \sigma(t_0 - \nu)$. Interestingly, when $t_0 - \nu \approx 0$ then $\psi(t_0) \approx (u_{1,0} + u_{0,0})/2$, which is the payoff of a random guess. On the other hand, due to $u_{0,0} \le u_{1,0}$ (c.f. Section IV-C), when $t_0 \to \infty$ then $\psi(t_0)$ decreases monotonically towards the limit $\psi(t_0) \to u_{0,0}$, this being the payoff of a perfect prediction.

An equivalent analysis can be done for the case of cascades of 1's where $t \le L_s$, showing that in that case $\psi(t) = u_{0,1} + (u_{1,1} - u_{0,1}) \, \sigma(\nu - t)$, and hence a more negative value of $t$ reduces the impact of cascades. These insights motivate us to call the quantity $|t - \nu|$ the "accuracy" or predictive power of a cascade. This brief analysis provides the following important insight: cascades with a higher accuracy have a smaller impact on the asymptotic cost.

As a next step, we use these insights to develop an upper bound to the average cost. The main idea is that $U_s$ and $L_s$ impose natural restrictions on the threshold values of information cascades, and hence can be used to consider the performance of a worst-case scenario.

Theorem 3. Let us consider a social network with consistent distortion and bounded beliefs. Moreover, assume that the network ends up in an information cascade almost surely. Then, the following upper bound holds:

$$\bar{\mathcal{U}}_\infty(\pi_b) \le h_0(U_s - \nu) \, P\{C_0\} + h_1(\nu - L_s) \, P\{C_1\}, \tag{29}$$

where $h_w(x) := u_{1-w,w} - (u_{1-w,w} - u_{w,w}) \, \sigma(x)$, and $C_0$ and $C_1$ denote the events of cascades of 0's or 1's, respectively.

Proof: See Appendix D.

Although we provide closed-form expressions for $P\{C_1\}$ and $P\{C_0\}$ for some special cases (c.f. Section VI-B), in general these terms are difficult to compute. Nevertheless, they can be calculated by Monte Carlo simulations. Furthermore, one can use the following simpler bounds, whose proofs are direct and left to the interested reader.

Corollary 1. For a social network that satisfies the conditions of Theorem 3, the following upper bound holds:

$$\bar{\mathcal{U}}_\infty(\pi_b) \le \max\big\{ h_0(U_s - \nu), \, h_1(\nu - L_s) \big\}. \tag{30}$$

Finally, let us study the case of $u_{0,0} = u_{1,1} = 0$ and $u_{0,1} = u_{1,0} = 1$, for which

$$\bar{\mathcal{U}}_n(\pi_b) = P\{X_n \ne W\} \tag{31}$$

$$= \sum_{w \in \{0,1\}} P_w\{X_n = 1 - w\} \, P\{W = w\}. \tag{32}$$


Let us introduce the shorthand notation $P_\infty(\text{FA}) = \lim_{n \to \infty} P_0\{X_n = 1\}$ and $P_\infty(\text{MD}) = \lim_{n \to \infty} P_1\{X_n = 0\}$ for the asymptotic false alarm and miss-detection rates, respectively. The next corollary provides useful expressions for those quantities.

Corollary 2. For a social network that satisfies the conditions of Theorem 3, the following formulae hold:

$$P_\infty(\text{FA}) \le \sigma(-U_s) \, P_0\{C_0\} + \sigma(L_s) \, P_0\{C_1\},$$

$$P_\infty(\text{MD}) \le \sigma(-U_s) \, P_1\{C_0\} + \sigma(L_s) \, P_1\{C_1\}.$$

Proof: From (32) it is clear that

$$\bar{\mathcal{U}}_\infty(\pi_b) = P_\infty(\text{FA}) \, P\{W = 0\} + P_\infty(\text{MD}) \, P\{W = 1\}. \tag{33}$$

Now let us consider the case of $u_{0,0} = u_{1,1} = 0$ and $u_{0,1} = u_{1,0} = 1$, for which $\nu = 0$ and $h_0(x) = h_1(x) = \sigma(-x)$. Hence, the bound provided in (29) can be re-written as

$$\bar{\mathcal{U}}_\infty(\pi_b) \le \sum_{w \in \{0,1\}} \Big[ \sigma(-U_s) \, P_w\{C_0\} + \sigma(L_s) \, P_w\{C_1\} \Big] P\{W = w\}. \tag{34}$$

The corollary is proven by comparing (33) and (34) and realizing that they both hold for any choice of priors $P\{W = w\}$.

One of the main consequences of these results is the fact that, as long as the system has consistent distortion, one can provide a bound on the asymptotic error rate based exclusively on the extreme values of the signal log-likelihood. To illustrate this point, let us consider a system where $L_s = -U_s$. Then, as a consequence of Corollary 2, one can directly guarantee that the false alarm and miss-detection rates are smaller than $\sigma(-U_s) = 1/(1 + e^{U_s})$. Reciprocally, one can guarantee that the asymptotic error rate is less than $p_0$ if both $U_s$ and $-L_s$ are larger than $\log\{p_0^{-1} - 1\}$.

VIII. NUMERICAL RESULTS

This section presents numerical results that illustrate the findings presented in Section V and Section VII. Our aim is to show how our approach allows us to reach quantitative conclusions about the achievable performance of social learning, providing an engineering perspective that complements the more qualitative results that exist in the social science literature.

In the sequel, we use our results to show two aspects of social learning. First, Section VIII-A presents an algorithm based on the results of Section V to simulate decision sequences, and also illustrates how the decision threshold evolves in accordance with the learning process. Then, Section VIII-B uses the results of Section VII in order to compare the achievable performance of the inference developed by social learning in scenarios with different private signal statistics. For simplicity, throughout this section we focus on the case of flat priors (i.e. $\eta = 0$) and $u_{1,1} = u_{0,0} = 0$ and $u_{1,0} = u_{0,1} = 1$, and hence $\nu = 0$.

A. Simulations of decision sequences

The results presented in Section V-B allow us to develop an efficient algorithm to simulate decision sequences following diverse signal statistics. Algorithm 1 implements this, using as inputs $\eta$, $\nu$, the state of the world $w$, the network size $N$, the distortion coefficients and the signal statistics in the form of the c.d.f. of the signal log-likelihood, $F_0^{\Lambda}(\cdot)$ and $F_1^{\Lambda}(\cdot)$.

We used Algorithm 1 to generate decision sequences under diverse private signal statistics. Please note that if the private signals follow a Natural Exponential Family distribution‡ then the corresponding signal log-likelihood is simply a linear function. In fact, the p.d.f. of a Natural Exponential Family distribution can be expressed as

$$P_w\{S_n = s\} = \exp\{ s \cdot \theta(w) + h(s) - b(w) \}, \tag{35}$$

where $\theta(w)$ is the natural parameter, $h(\cdot)$ is the carrier measure and $b(\cdot)$ is the log-normalizer, and therefore the corresponding log-likelihood can be written as

$$\Lambda_{S_n}(S_n) = S_n \big[ \theta(1) - \theta(0) \big] - \big[ b(1) - b(0) \big]. \tag{36}$$

Because of this, Gaussian signals have unbounded private beliefs, as in this case $S_n \in (-\infty, \infty)$. On the contrary, as for exponential distributions $S_n \in [0, \infty)$, the corresponding beliefs are bounded on one side and unbounded on the other, allowing only cascades of 1's if $\theta(1) - \theta(0)$ is positive and only cascades of 0's if it is negative. On the other hand, any discrete distribution with finite support has bounded beliefs, as the private signal log-likelihood can take only a finite number of different values.

Our simulations reflect these results, and illustrate the insights discussed with respect to Proposition 1 and Theorem 2. For signals with bounded beliefs a global information cascade is triggered as soon as the evolution of the decision threshold $\tau_n$ drives it away from $[L_s, U_s]$. This is illustrated by Figure 9, where it can be seen that after $\tau_n$ goes beyond the red lines which are further away from zero (which correspond to the extreme values of $\Lambda_S(S_n)$) a sequence of homogeneous decisions is triggered. On the other hand, our simulations show that under Gaussian signals even very long sequences of equal decisions can be suddenly reversed when a signal with high enough log-likelihood is found. As an example of this, Figure 9 shows how in a system with Gaussian private signals a reversal takes place after a run of almost 500 consecutive identical decisions.

Interestingly, our simulations also show that decisions that confirm a trend have a diminishing impact (smaller step size) on the evolution of the decision threshold. For example, the plots that correspond to Binomials and Exponentials in Figure 9 show that the subsequent step sizes decrease when the value of $\tau_n$ moves away from zero. On the contrary, the same figures show how decisions that go against the majority of previous choices induce important jumps, which corresponds to a high amount of new information being included in the inference process. Correspondingly, the fact that $\tau_n$ is constant after a global information cascade is triggered corresponds to the fact that no new information is being processed by the social learning, and hence the accuracy of future agents does not increase further but stays constant.

‡Many well-known distributions belong to the Natural Exponential Family, including Bernoulli, Binomial, Poisson, Negative Binomial, Gaussian with known variance and Exponential distributions, among others.
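As a quick numerical check of the linear form (36), the sketch below compares it against the log-ratio of the probability mass functions for Poisson signals. The Poisson parametrisation used here ($\theta(w) = \log \lambda_w$, $b(w) = \lambda_w$) is the standard exponential-family form; the rates and function names are illustrative assumptions, not values from the simulations reported here.

```python
import math


def poisson_pmf(s, rate):
    return math.exp(-rate) * rate ** s / math.factorial(s)


def nef_llr(s, theta, b):
    """Log-likelihood ratio (36): s*[theta(1)-theta(0)] - [b(1)-b(0)]."""
    return s * (theta[1] - theta[0]) - (b[1] - b[0])


rates = (2.0, 3.0)                        # hypothetical rates under W=0 and W=1
theta = [math.log(r) for r in rates]      # natural parameters
b = list(rates)                           # log-normalizers

for s in range(6):
    direct = math.log(poisson_pmf(s, rates[1]) / poisson_pmf(s, rates[0]))
    print(s, direct, nef_llr(s, theta, b))
```

Since the signal takes values $s \ge 0$ and $\theta(1) - \theta(0) > 0$ in this example, the log-likelihood is bounded below by its value at $s = 0$ and unbounded above, which is the one-sided case discussed in the text.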


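Algorithm 1 Simulation of social decisions
1: function DECISION VECTOR($N, \eta, \nu, w$)
2:   $\tau_1 = \nu + \eta$.
3:   $P_0\{X_1 = 0\} = F_0^{\Lambda}(\tau_1)$.
4:   $P_1\{X_1 = 0\} = F_1^{\Lambda}(\tau_1)$.
5:   Generate $x_1 \sim \text{Bernoulli}(P_w\{X_1 = 0\})$.
6:   for $n = 2, \dots, N$ do
7:     for all $g \in \mathcal{G}_n$ do
8:       $P_0\{G_n = g\} = \alpha_n(g | x^{n-1}) \, P_0\{X^{n-1} = x^{n-1}\}$.
9:       $P_1\{G_n = g\} = \alpha_n(g | x^{n-1}) \, P_1\{X^{n-1} = x^{n-1}\}$.
10:      $\Lambda_{G_n}(g) = \log \frac{P_1\{G_n = g\}}{P_0\{G_n = g\}}$.
11:      $\tau_n(g) = \nu + \eta - \Lambda_{G_n}(g)$.
12:    $P_0\{X_n = 0 \mid X^{n-1} = x^{n-1}\} = \sum_{g_n \in \mathcal{G}_n} \alpha_n(g_n | x^{n-1}) \, F_0^{\Lambda}(\tau_n(g_n))$.
13:    $P_1\{X_n = 0 \mid X^{n-1} = x^{n-1}\} = \sum_{g_n \in \mathcal{G}_n} \alpha_n(g_n | x^{n-1}) \, F_1^{\Lambda}(\tau_n(g_n))$.
14:    Generate $x_n \sim \text{Bernoulli}(P_w\{X_n = 0 \mid X^{n-1} = x^{n-1}\})$.
15:    $P_0\{X^n = x^n\} = P_0\{X_n = x_n \mid X^{n-1} = x^{n-1}\} \cdot P_0\{X^{n-1} = x^{n-1}\}$.
16:    $P_1\{X^n = x^n\} = P_1\{X_n = x_n \mid X^{n-1} = x^{n-1}\} \cdot P_1\{X^{n-1} = x^{n-1}\}$.
17:  return $x^N$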
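The following Python sketch illustrates the structure of Algorithm 1 in a restricted setting; it is not the implementation used for the reported simulations. It assumes perfect social information (so that $G_n = X^{n-1}$ and the distortion coefficients are indicators) and binary private signals. All function names are hypothetical. Instead of storing the two joint probabilities, the code tracks their log-ratio, which is the only quantity the threshold depends on.

```python
import math
import random


def binary_llr_cdf(eps0, eps1):
    """Return (L_s, U_s, F0, F1) for binary private signals.

    eps0 = P_0{S=1} (false alarm), eps1 = P_1{S=0} (miss detection).
    F_w(t) is the step c.d.f. of the signal log-likelihood given W=w.
    """
    U_s = math.log((1 - eps1) / eps0)
    L_s = math.log(eps1 / (1 - eps0))
    p_zero = (1 - eps0, eps1)  # P_w{S = 0} for w = 0, 1

    def make_cdf(w):
        def cdf(t):
            if t < L_s:
                return 0.0
            if t < U_s:
                return p_zero[w]
            return 1.0
        return cdf

    return L_s, U_s, make_cdf(0), make_cdf(1)


def simulate_decisions(N, eta, nu, w, F0, F1, rng):
    """Simulate N decisions under perfect social information."""
    lam = 0.0  # log of P_1{history} / P_0{history}
    decisions, thresholds = [], []
    for _ in range(N):
        tau = nu + eta - lam
        q0, q1 = F0(tau), F1(tau)  # P_w{X_n = 0 | history}
        x = 0 if rng.random() < (q1 if w == 1 else q0) else 1
        f0 = q0 if x == 0 else 1 - q0
        f1 = q1 if x == 0 else 1 - q1
        lam += math.log(f1 / f0)   # equals 0 once a cascade has started
        decisions.append(x)
        thresholds.append(tau)
    return decisions, thresholds


L_s, U_s, F0, F1 = binary_llr_cdf(0.35, 0.35)
xs, taus = simulate_decisions(10, 0.0, 0.1, 1, F0, F1, random.Random(7))
print(xs)
print(taus)
```

Once the threshold leaves $[L_s, U_s)$ both conditional probabilities become 0 or 1, the log-ratio update is zero, and the threshold and decisions stay constant, which mirrors the global cascade behaviour discussed in the text.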

Fig. 9: Above: Sequence of decisions in a social system generated using Algorithm 1, assuming perfect social information and diverse signal statistics: (a) Binomial, (b) Poisson, (c) Gaussian, (d) Cauchy. Below: Evolution of the corresponding decision threshold $\tau_n$. Confirming Proposition 1 and Theorem 2, as soon as the decision threshold (Tau, Y-axis) reaches beyond the possible values of the private signals (marked by red lines), the decision threshold stops evolving and the network falls into a global information cascade. (X-axis: agent index.)

For completeness, simulations over signals with Cauchy distributions with fixed scale and variable location coefficients were included in order to illustrate that these results also hold under continuous signals with bounded private beliefs. The interested reader can find an analysis of Cauchy signals in Appendix C.

B. Evaluation of payoffs

Numerical evaluations confirmed the accuracy of the exact expressions and bounds derived in Section IV for the error rates. Figure 10 compares the performance attained by social networks with perfect social information when they are driven by various signal statistics. In accordance with the results presented in Section III, the error rates converge to a value larger than zero when the signals have bounded beliefs and hence global information cascades take place, while they converge to zero otherwise (e.g. Gaussian signals in Figure 10). Interestingly, results suggest that the error rates do not converge to zero when the signals are bounded on one side, and hence only allow cascades of either 0's or 1's, which in Figure 10 corresponds to the Poisson distribution. It is also interesting how Cauchy distributions achieve a very poor performance, despite being continuous distributions (c.f. Appendix C).

Fig. 10: Evolution of the error rate for various signal statistics, as given by (26), and the upper bound presented in (30). The parameters of the different distributions were chosen in order to provide an error rate of $10^{-2}$ for the first agent, while the binomial distribution has a support of $n = 6$. (Curves: Binomial, Poisson, Cauchy and Gaussian signals, and the upper bound; axes: inference error rate, logarithmic scale, versus agents.)

Our results also confirmed the intuition that the asymptotic error rate depends mainly on the extreme values of the signal likelihood (c.f. the corresponding discussion in Section IV). In effect, if the signal parameters are chosen in a way that they possess the same values of $U_s$ and $L_s$, then the asymptotic value is generally close to the value predicted by Theorem 3 (see Figure 11). It is to be noted that, if Poisson distributions are chosen in such a way that their single extreme value of the signal log-likelihood coincides with one of the values of the other signals, their asymptotic error rate is approximately half that of the others; this corresponds with the fact that only half of the information cascades happen (c.f. the corresponding discussion about Poisson distributions in Section VIII-A).

Fig. 11: Evolution of the error rate for various signal statistics, as given by (26), and the upper bound presented in (30). The parameters of the different distributions were chosen in order to provide an error rate of $10^{-2}$ for the first agent, while the binomial distribution has a support of $n = 6$. (Curves: Binomial signals with $n = 6$, Binomial signals with $n = 3$, Poisson signals and Cauchy signals; axes: inference error rate, logarithmic scale, versus agent.)

Finally, although different signal statistics with similar extreme values converge to the same value, Figure 11 shows different convergence speeds. It is reasonable to postulate that the convergence speed is related to the Total Variation of the corresponding probability density functions [104], but the proof of this conjecture remains open.

C. Discussion

An important insight that springs out of our analysis is that information cascades are not intrinsically prejudicial for the performance, as their impact is conditioned on their accuracy. In effect, in a case where cascades were always correct the asymptotic performance would be perfect in spite of the cascading behaviour. Therefore, the asymptotic performance of social learning is not limited by the cascading itself, but by the corresponding "cascade accuracy" (c.f. Section VII-B). In the case of Bayesian strategies, our framework shows that the cascade accuracy is directly related to the extreme values of the private signal log-likelihood. Interestingly, this allows us to conclude that social learning can provide error rates as small as desired if the system designer can engineer the private signal statistics appropriately. Therefore, this data aggregation method does not impose, by itself, any lower bounds on the achievable performance.

The proposed framework is based on two important simplifying assumptions: the conditional independence of the private signals and the perfect rationality of the agents. Please note how these assumptions allow a detailed analytic approach, which makes it possible to generate a first understanding of such a challenging topic.

IX. APPLICATION TO CYBER-PHYSICAL SYSTEM SECURITY

This section reviews an application of information cascades in the field of cyber-physical security, following the work reported in [108]. This application is novel in taking advantage of some of the characteristics of information cascades that are traditionally regarded as undesired. In the following, first Section IX-A provides the necessary context and discusses the application. Then, Section IX-B presents the system model and fundamental assumptions. The leveraging of information cascades in this scenario is discussed in Section IX-C, and finally some numerical results are presented in Section IX-D.

A. CPS security as a dilemma in sensor networks design

Security plays a critical role in cyber-physical systems (CPSs), particularly for those involved in public utilities whose safety is critical [21]. In effect, recent attacks on public CPSs that created significant damage have been widely reported. There exist many technological challenges for the security of today's cyber-physical systems. Moreover, as more cyber-control automation progressively enters our daily life, guaranteeing CPS security will become an even more challenging subject. As the level of security is determined by the weakest element of the entire system, one major dilemma lies in the information fusion that takes place on the sensor networks that supply vital information to control and manage the CPS. The main weaknesses of these sensor networks are described in the sequel.

• In such networks the number of sensors is significantly large, and hence sensors are usually deployed randomly (i.e. the location of each sensor might be unknown to the system controller). A precise management of them is therefore challenging or even unfeasible. Furthermore, a good portion of the sensors might be deployed in geographical regions where it is not possible to warrant physical or cyber security (e.g. war zones or regions controlled by an adversary party).


• Typical low-end (i.e. low-cost) sensors are only able to perform low-complexity computing and networking. Therefore, it is nearly impossible to implement reliable security functions and protocols of high complexity.

• Sensor data from these low-complexity sensors may be unreliable due to malfunctions in terms of measurement, computing, or battery. Data can also be corrupted or delayed because of link outages or packet errors resulting from noise and interference in wireless communications, or also due to spectrum-sharing or energy-harvesting opportunities.

The vulnerability of these systems makes it reasonable to expect that they might be victims of cyber/physical attacks by intelligent adversaries. This is particularly critical for low-end sensor networks, which are usually at the edge of large and complex CPS.

Due to the above reasons, although the technical challenges of the design of secure wireless sensor networks have been widely studied [109], there remain open problems of both theoretical and engineering nature [110]. Attacks on wireless sensor networks are commonly categorized into outside attacks and insider attacks. Outside attacks include (distributed) denial-of-service (DoS) attacks carried out by cyber or physical means, which are facilitated by the broadcast nature of wireless communications [109]. Insider attacks can create potentially more severe harm to CPS, where the adversary can recruit low-complexity sensors by malware through cyber/wireless means, or by physical substitution of targeted sensor nodes. These compromised sensor nodes can report false data in order to create harmful results and malfunctions, which is related to the well-known Byzantine Generals problem [111]. Low-complexity sensors at the edge of CPS are natural targets of such insider attacks.

A traditional sensor network consists of a fusion center and regular sensor nodes. Most of the literature assumes that the fusion center is capable of executing secure coding and protocols. It is to be noted that in large CPS, the sensors at the edge of the network may require another kind of mediator devices known as data aggregators (DAs), which have the capability to access the cloud through high-bandwidth communication [112]. DAs are surely attractive targets for insider attacks. Differing from almost all existing research, this project assumes that DAs and fusion centers can be recruited and compromised.

In response to the technology challenges related to the development of resilient low-end sensor networks that are capable of facing an intelligent adversary, the desirable sensor fusion mechanism shall be robust and adaptive to the existence of compromised sensor nodes and false data. In this way, the adapted operation can be subsequently reconstituted [113] if there exists a reliable and secure management. This is similar to cyber resilience, which was officially proposed at the 2012 World Economic Forum. Therefore, the goal is

(1) to establish a resilient operation with the presence of an unknown number of compromised sensors, and
(2) to enable a resilient sensor fusion even in the presence of false data from compromised sensors.

B. Scenario description and main assumptions

Consider a sensor network composed of $N$ sensor nodes, which are deployed over an area for the purpose of monitoring or surveillance. Based on the sensors' signals, the network shall infer the value of the binary random variable $W$, with events $\{W = 1\}$ and $\{W = 0\}$ corresponding to the presence or absence of an intrusion, respectively. No knowledge of the prior distribution of $W$ is assumed, as intrusions are rare and have unknown/unpredictable patterns.

Battery limitations impose severe restrictions on the communication between sensors, and hence each node is assumed to forward data to others by broadcasting only a binary variable. Under a medium access control mechanism, and without loss of generality, sensor nodes are assumed to transmit their signals sequentially according to their indices. Due to the nature of wireless broadcasting, nearby transmissions can be overheard. Therefore, it is assumed that the $n$-th node can generate its decision based on its own sensor output and the signals exchanged by other sensors.

A data aggregator or fusion center collects the transmitted data and is recognized as a specific node denoted as $n_{FC} \in \{1, \dots, N\}$. The performance of the entire sensor network is quantified by the corresponding miss-detection and false alarm rates in the fusion center, given by $P_{MD} = P\{X_{n_{FC}} = 0 \mid W = 1\}$ and $P_{FA} = P\{X_{n_{FC}} = 1 \mid W = 0\}$, respectively.

1) Powerful insider attack: During the attack, it is assumed that there are $N^*$ Byzantine nodes controlled by an adversary, while the network management is not aware of this situation. It is important to notice that these Byzantine nodes may include DAs or FCs. The adversary can therefore freely define the values of the binary signals transmitted by Byzantine nodes in order to degrade the sensor network performance, which might be viewed as a man-in-the-middle attack or false data injection attack. It is further assumed that the adversary is topology-aware, knowing the sensor sequence and the strategy in use. In other words, this is a very powerful attack from inside of the sensor network.

2) Defense without knowledge of attack: In most (surveillance) sensor networks, miss-detections are more important than false alarms. Furthermore, it is difficult to estimate the cost structure under the worst-case scenario. Therefore, the Neyman-Pearson criterion shall be selected by setting an allowable false alarm rate and focusing on the achievable miss-detection rate. Most signal processing techniques for distributed detection rely on one or more FCs that gather data and generate estimators [114]. In order to guarantee diversity, traditional distributed detection schemes choose to ignore previously broadcasted signals. However, as regular sensors do not perform any data aggregation, each of the overheard signals cannot serve as a good estimator of the target variable. When there exist Byzantine nodes in the sensor network, various techniques have been proposed, such as identification and removal of Byzantine sensors, cryptography, and secure protocols [115], but none of them considers potential Byzantine DAs or FCs. Consequently, it is very desirable for sensor network management to find an appropriate network-resilient


strategy to mitigate the effect from this powerful topology-aware adversary, especially when the network manager (i.e. defender) has no knowledge of the number of Byzantine nodes or other statistics of the attack.

Inspired by social learning, a totally different philosophy is undertaken to face this problem. Each sensor node can be considered to be a rational agent that decides sequentially about the presence of attacks, based on a Bayesian data fusion of its measurements and overheard signals from other nodes. Of course, the Byzantine sensors do not follow this strategy as their goal is to bring down the sensor fusion performance. Let $\mathcal{B}$ denote the set of indices of Byzantine nodes in the sensor network, where $N^* = |\mathcal{B}|$.

In the example of intrusion detection, as events $\{W = 0\}$ are much more frequent than $\{W = 1\}$, any abnormal increase of the false alarm rate would be quickly noted by the operator, which is undesirable to the adversary. Consequently, the adversary strategy is to increase the miss-detection rate as much as possible, which is achieved by forcing null signals for all $n \in \mathcal{B}$.

C. Leveraging information cascade

Intuitively, the accuracy of the $n$-th sensor grows with $n$, and hence $n_{FC}$ is usually chosen as one of the last nodes in the decision sequence. However, as the number of shared signals grows, the growing body of social information can make the nodes ignore their individual measurements and fall into an information cascade. Interestingly, one unique aspect of this approach is to identify a positive effect of information cascades, which has been overlooked before. In effect, information cascades make a large number of nodes hold equally qualified estimators, generating a large number of locations in the network where the network operator can collect aggregated data. This property avoids single points of failure, providing robustness against topology-aware false data injection attacks.

On the other hand, an attacker can also leverage the information cascade phenomenon. In fact, a rational attacking strategy is to tamper with the first $N^*$ nodes of the decision sequence, setting their signals in order to push the networked decisions towards a misleading cascade or a misleading public belief. If $N^*$ is large enough, an information cascade can be triggered almost surely, making the learning process fail. However, if $N^*$ is not large enough then the network may undo the initial pool of wrong opinions and end up triggering a correct cascade anyway. Therefore, to achieve resilient sensor fusion against false data attacks, the trustworthy sensor networking shall be conducted in clusters for any large sensor network. Luckily, since sensors are usually equipped with very short-range radios and sequential multiple access, such a clustering strategy is consistent with engineering practice.

More precisely, the proposed sensor fusion does not solely rely on security mechanisms to defend against attacks, but takes advantage of the public belief to enhance its capability against attacks. As a DA or FC usually serves as the last node to transmit the measurement (i.e. announce the decision), both the measurement and the public belief are simultaneously disclosed to report the fusion result for CPS operation and the implicit security status for the trust management in the CPS. Please note that a social learning data fusion mechanism is generally compatible with any cryptographic and secure networking protocol.

[Fig. 12 plots (top and bottom): miss-detection rate versus sensor node index (1 to 201), with curves for $N^* = 0$, $20$, $60$ and $100$.]

Fig. 12: The top figure presents the performance (i.e. miss-detection rate for fusion) of a surveillance sensor network of 200 nodes, while the lower figure shows the same quantity when only the 10% less favorable cases are considered. Different curves correspond to different numbers of compromised sensors. When considering the total miss-detection rate, with up to 60 compromised sensors (that is, 30% of the sensor nodes) the fusion still functions well. The performance floor for 100 compromised sensors implies exceeding the bound of the Byzantine Generals Problem.

D. Numerical results

To illustrate the application of social learning against topology-aware data falsification attacks, a network of randomly distributed sensors over a sensitive area following a Poisson Point process (PPP) was considered. The ratio of the total area that falls under the range of each sensor is denoted by $r$. It is assumed that intrusions can occur uniformly over the surveilled area, and hence the probability of an intrusion taking place under the coverage area of a particular sensor is equal to $r$. It is further assumed that each node is equipped with a binary sensor (i.e. $S_n \in \{0, 1\}$) that might generate wrong measurements due to electronic and other imperfections. Figure 12 shows that the social learning based fusion performs successfully against Byzantine data attacks.

X. FURTHER APPLICATIONS

This section presents a brief exploration of other applications of information cascades and social learning. In the following, Section X-A explains how the steering of information cascades can be supervised and controlled, and discusses promising applications to e-marketing and e-commerce. Then, Section X-B introduces the potential applications of information cascades to consensus building problems. Finally, Section X-C gives a vision on the way of connecting social learning and machine learning applications.


A. Steering information cascades

This subsection presents a discussion about design considerations for steering information cascades in a social system where agents perform Bayesian social learning, following the work reported in [39].

1) Design of social systems: Information cascades play a sensitive role in e-commerce and e-marketing [39]. In effect, it is intuitive that customers might tend to choose a given product when they find many positive comments about it online. Conversely, customers might be prone not to choose the product when facing a significant number of negative reviews. Therefore, for marketing purposes, an information cascade of positive reviews is highly desirable as it can increase sales volume. It is clear that information cascades can usually be steered by manipulating social information, e.g. by creating biased reviews or introducing fake statistics. Hence, if people have the ability to selfishly manipulate information cascades, then the integrity and trust of the e-commerce system will be compromised. The design of tools for preventing fake information cascades is thus a fundamental challenge for the well-being of a digital society [5].

To approach this issue, a first step is to analyze (8) using a new perspective, from which $\Lambda_{S_n}$ is interpreted as personal evidence, $\Lambda_{G_n}$ as the social observation, $\eta_n$ as the bias and $\nu_n$ as the incentive structure of the $n$-th agent. In most scenarios it is not possible to control the personal evidence and bias, which can only be estimated experimentally. However, in many cases the scope of the social observation and the incentive structure can be engineered, as the system manager usually can modify the portion of the social network that is available to the inspection of other agents (e.g. determining a group of desirable reviews that shall be shown to new customers). Similarly, the incentive structure can also be engineered, say, by acknowledging good reviews with special bonuses, coupons, or discounts.

Following this rationale, selective rewiring is proposed as a method to control or steer information cascades when the social observation can be controlled by the system administrator. On the other hand, incentive seeding is intended for scenarios where the incentive structure can be engineered.

2) Selective rewiring: Let us denote by $\mathcal{H}_k$-cascade a cascade where $X_m = k$ for all $m \geq n_c$, and by $\mathcal{B}_n$ the set of neighbours of the $n$-th agent. The intuition behind the proposed selective rewiring approach is that, in order to steer a $\mathcal{H}_k$-cascade, one can restructure the observation neighbourhood $\mathcal{B}_n$ of the $n$-th agent to connect it with previous decisions such that $X_j = k$, or to disconnect it from previous decisions where $X_j \neq k$, where 1 j

Algorithm 2 Selective Rewiring
  Input: $N_z$, $N_w$, $\mathcal{H}_k$
  for $n = 1, \dots, N$ do
    Find $\mathcal{Z}_n = \{ z \mid X_z \neq \mathcal{H}_k \}$, $\mathcal{Z}_n \subset \mathcal{B}_n$, $|\mathcal{Z}_n| = N_z$
    Find $\mathcal{W}_n = \{ w \mid X_w = \mathcal{H}_k \}$, $\mathcal{W}_n \cap \mathcal{B}_n = \emptyset$, $|\mathcal{W}_n| = N_w$
    $\mathcal{B}_n \leftarrow (\mathcal{B}_n \setminus \mathcal{Z}_n) \cup \mathcal{W}_n$

out-number, denoted as $d_n^{\mathrm{out}}$, which corresponds to agents whose decisions are seen by the largest number of other agents [67]. The pseudo-code of the proposed algorithm is presented in Algorithm 3.

Algorithm 3 Incentive Seeding
  Input: $d_{\min}^{\mathrm{out}}$, $\nu^*$, $\mathcal{H}_k$
  for $n = 1, \dots, N$ do
    if $d_n^{\mathrm{out}} \geq d_{\min}^{\mathrm{out}}$ then
      $\nu_n \leftarrow \nu^*$

4) Numerical results: The algorithms were tested over various networks of different topologies, including fully-connected networks, Erdős-Rényi random networks, small-world networks, and scale-free networks [38], [67], [116]. The private signals were assumed to follow a Binary Symmetric channel with respect to the state of the world variable, similar to the case analyzed in Section VI-B. The parameter of the Binary Symmetric channel, denoted here as $p$, corresponds to the agent's private signal quality. Numerical simulations were performed to compare how agents can form information cascades under various settings, and to investigate the ways to steer cascades efficiently.

Figures 13 and 14 show the ratio of $\mathcal{H}_1$-cascades that can be steered by a system controller when the true state of the world is $W = 0$, and its relation to the signal quality. Interestingly, results show that the social network topology significantly affects the steering capability, which suggests some practical system design considerations with respect to particular types of network topology:

• For Erdős-Rényi networks, adding more links progressively is an effective way of steering cascades.
• For small-world and scale-free networks, it is better to combine adding new links with deleting bad links.
• Small-world networks have small diameters, which helps to resist control.
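Algorithms 2 and 3 can be sketched in Python as follows. This is an illustrative reading of the pseudo-code, not the implementation of [39]. It assumes that agents are indexed from 0, that only predecessors j < n are candidates for rewiring, that the sets Z_n and W_n are drawn uniformly at random when more candidates than N_z or N_w are available, and that an agent is seeded when its out-degree is at least the minimum out-degree. All function and argument names are hypothetical.

```python
import random


def selective_rewiring(neighbours, decisions, k, n_z, n_w, rng=None):
    """Rewire observation neighbourhoods to favour a cascade on decision k.

    neighbours: dict mapping agent n to the set B_n of observed predecessors.
    decisions:  list with the decision X_j of every agent j.
    Returns a new dict with B_n <- (B_n minus Z_n) union W_n for each agent.
    """
    rng = rng or random.Random(0)
    rewired = {}
    for n, b_n in neighbours.items():
        # Z_n: at most n_z observed predecessors whose decision differs from k.
        z_pool = sorted(j for j in b_n if decisions[j] != k)
        z_n = set(rng.sample(z_pool, min(n_z, len(z_pool))))
        # W_n: at most n_w unobserved predecessors whose decision equals k.
        w_pool = sorted(
            j for j in range(n) if j not in b_n and decisions[j] == k
        )
        w_n = set(rng.sample(w_pool, min(n_w, len(w_pool))))
        rewired[n] = (set(b_n) - z_n) | w_n
    return rewired


def incentive_seeding(out_degrees, incentives, d_min_out, nu_star):
    """Give the incentive nu_star to every agent with out-degree >= d_min_out."""
    return [
        nu_star if d >= d_min_out else nu
        for d, nu in zip(out_degrees, incentives)
    ]


if __name__ == "__main__":
    b = {0: set(), 1: {0}, 2: {0, 1}, 3: {1, 2}}
    x = [1, 0, 0, 1]
    print(selective_rewiring(b, x, k=1, n_z=2, n_w=2))
    print(incentive_seeding([5, 1, 3], [0.0, 0.0, 0.0], 3, 0.5))
```

Capping the sample sizes with `min` is a design choice of this sketch: it lets the rewiring proceed when fewer than N_z or N_w candidates exist, a case that the pseudo-code leaves unspecified.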


[Fig. 13 panels: (a) Full observation network, (b) ER network, (c) Small-world network, (d) Scale-free network.]

Fig. 13: Ratio of $\mathcal{H}_1$-cascades that can be steered by selective rewiring when $W = 0$ under various network topologies. Intuitively, when the private signal is very good then all cascades follow the state of the world, hence reducing the possibilities of steering other cascades.

[Fig. 14 panels: (a) Full observation network, (b) ER network, (c) Small-world network, (d) Scale-free network.]

Fig. 14: Ratio of $\mathcal{H}_1$-cascades that can be steered using incentive seeding under different social network topologies. The value of $d_{\min}^{\mathrm{out}}$ determines the minimal outdegree needed to be considered an influential agent whose incentives should be modified. Selecting an appropriate $d_{\min}^{\mathrm{out}}$ has an important impact over the algorithm performance [39].

online digital platforms. Moreover, it was demonstrated that it is feasible for a system designer to identify control variables of a complex social system and use them to devise effective

connected if their distance is smaller than a radius $r$, that is, $\|\vec{q}_i - \vec{q}_j\|$


C. Social learning for machine intelligence (piece-wise continuous) function, then y can be expressed S y as the union of intervals. Then 1 [a (y),b (y)] = (note Our interpretation of social learning as a data aggregation [j=1 j j S scheme can be applied to machine learning [121], [122]. In that ⇤s(aj(y)) = ⇤s(bk(y)) = y) and hence from (37) is clear effect, by considering a scenario where there are a number of that b (y) interconnected learning machines, we can view each machine 1 j 1 F ⇤(y)= dµ = [H (b (y)) H (a (y))] . as a social agent and the whole system as a social network, w w w j w j j=1 aj (y) j=1 where the neighbourhoods are determined by the degree of X Z X interaction between different machines. APPENDIX B The primary objective of learning machines is to learn PROOFS OF SECTION V from the patterns in their observations about the environment, and hence minimize their prediction error with respect to a Proof of Proposition 2: To prove the Markovianity, let us given loss function. When the observation includes both the first note that (V-B) implicitly show that Xn is conditionally n 1 n 1 data generated from the state of the world and the prediction independent of X given ⌧n (in this case Gn = X ). Therefore, (12) shows that ⌧ = ⌧ (X , ⌧ ), and made by other learning machines, this scenario fit in the n+1 n n n social learning framework presented in this work. With the therefore if ⌧n is given then ⌧n+1 depend only on Xn and n detailed analysis of the average cost function in Section VII, hence is conditionally independent on X as well. Finally, the inspired by the work of [123], [124], it is noted that one can Markovianity is proven by realizing that ⌧k is a deterministic n characterize the interdependency among the loss functions of function of X , and hence also conditionally independent of learning machines. Therefore, the problem is how to design a ⌧n+1 given ⌧n. 
mechanisms for a networked machine learning system, which To prove that ⌧n n1=1 is still a sub- or super- martingale {n } can be very relevant for problems of Ensemble Learning. with respect to X , depending on the realization of W = w. Analytic techniques of social learning can be applied to study In fact, it is direct to see that the complex feedback that exists between each learning ma- n 1 j 1 E ⌧n W = w, X = ⌧n 1 DXj x (39) chine and the whole networked system. Do groups of learning | | j 1 machines generate information cascades? Is the knowledge where DXj x is the conditional mean value of | j 1 about information cascades suitable to be used for improving ⇤X Xj 1 (Xj x ), which can be computed as j | | the achievable performance of systems of networked learning j 1 machines? These issues constitute a promising application of P1 Xj x j 1 | DXj x = E log j 1 W = w, our proposed framework. | ( P0 Xj X ) | D( p1 1,xj 1 p1 0,x j 1 ) if w =1, and XI. CONCLUDING REMARKS = | || | j 1 j 1 Although this paper presents a systematic analytical ( D( p1 0,x p1 1,x ) if w =0, | || | methodology and an extensive literature survey with illus- j 1 j 1 j 1 where pxj w,x = Pw Xj = xj X = x and trating examples, many aspects of collective behavior in | p 1 p | D(p q)=p log +(1 p) log is the Kullback-Leiver human society and online platforms remain open to human || q 1 q divergence between two Bernoulli distributions with param- knowledge. Moreover, the impact they have on the techno- eters p and q, respectively. The Lemma is finally proven by logical development and our human society urge for further the well-known non-negativity of the Kullback-Leiver diver- explorations. Many possibilities to assist future technological gence [102]. systems design are still on the horizon, and rely on the readers Proof of Theorem 1: To start, let us note that for any of IEEE Access to make them come true. 
x R the function ⇤S(s) introduces a partition over the signal 2 0 1 0 APPENDIX A space = (x) (x), where s (x) if ⇤S(s) U then 1(⌧ )= , n s S n ⇤ and therefore Xn =0for all possible signals Sn. Similarly, Fw (y)= dµw (37) ? y if ⌧


full n Hence, the conditional independency of Xn and Sn given almost surely, which in turns shows that ⌧n+1(X ) >Us n 1 ⌧n —which is a function of X — is equivalent to Xn almost surely as well. Finally, the fact that all decision vectors n 1 n to be a deterministic function of X . This is equivalent x that have positive probability guarantee ⌧n >Us, combined n 1 for Xn and W to be conditionally independent given X , with the consistent distortion condition and Lemma 2 guaran- n 1 which in turns is equivalent to P Xn W =1, X = tee that ⌧n(Gn+1) >Us almost surely. This, combined with n 1 | P Xn W =0, X , which guarantees (ii). Proposition 1, proves the desired result. | Let us show that (i) (iii). The previous discussion showed The proof for the the case ⌧n(g) n. This, in turn, implies that (i) holds for all m>n. ANALYSIS OF SYSTEMS WITH VARIOUS PRIVATE SIGNAL Therefore, it can be seen that Xm is a deterministic function of STRUCTURE n 1 X , and hence Xm and Sm are conditionally independent A. Binomial distribution n 1 on X for all m>n, proving (iii). Let us consider the case where the signals follow a Binomial To prove (iii) (i), let us assume that (i) does not hold and ) distribution with parameters qw and n, i.e. show this implies that (iii) also doesn’t. If ⌧n (L, U) implies 0 1 2 n that both (⌧n) and (⌧n) are both not empty, and hence s (n s) S S pw(s)= qw(1 qw) . (43) min P Xn =1⌧n , P Xn =0⌧n > 0. The fact that both s { { | } { | } ✓ ◆ probabilies are positive implies that there are signals s 2 S By assuming without lack of generality that q1 >q0, then the that make Xn =1and others that make Xn =0, and hence signal log-likelihood is a linear function given by Xn and Sn are not conditionally independent given ⌧n, or n 1 q1 1 q1 equivalently X . Therefore the system is not in a cascade. ⇤ (s)=s log +(n s) log . 
(44) S q 1 q 0 0 1 q Proof of Lemma 2: The proof can be done directly using Note that ⇤ (s) is bounded with L = ⇤ (0) = n log 1 S s S 1 q0 an induction over n, and is left to the interested reader. and U = n log q1 . Moreover, using XX and XX and XX one s q0 Proof of Theorem 2: Let us assume that Gn has consis- can find that tent distortion and consider g n is such that ⌧n(g) >Us. 2 G 0 if ⌧n 1 if ⌧n >US. ⇤Gn (g) = log { } P P0 Gn = g { } n 1 n 1 n 1 where k⇤(⌧n) is the largest> integer k such that ⇤S(k) ⌧n. n 1 ↵ (g x ) X = x : x n n P1  = log 2A | , By defining k⇤ = 1 when ⌧n 0 and ⌧n (x ) >Us. Xn X n 1 n (45) n 1 | n 1 n 1 | | j=k +1 p0(j) Above, n = x 0, 1 ↵(g x ) > 0 . Then, P ⇤ A { 2 { } | | n 1 } k⇤ due to the consistent distortion, all x 0, 1 n also P j=1 p1(j) full n 1 2 { } 2 A +(1 Xn) log (46) satisfy ⌧n (x ) >Us. k⇤ p (j) n 1 Pj=1 0 A similar derivation shows that, for any x n, all n 1 2 A :=f(X1,k⇤) . (47) g0 such that ↵ (g0 x ) > 0 also satisfy ⌧ (g0) >U . P 2 Gn n | n s Therefore, is clear that Therefore, a partition of n +1 regions is introduced in the n 1 n 1 decision signal space by the signal log-likelihood function, Pw Xn =0X = x { | } each of which indexed by a k⇤. Therefore, the step sizes can n 1 = ↵n(g0 x )Pw Xn =0Gn = g0 be of n +1different sizes, according to which region the ⌧ | { | } n g X02Gn belongs to. n 1 ⇤ = ↵ (g0 x )F (⌧ (g0)) n | w n g n B. Poisson signals X02G =1 , (41) Let us now consider a system where agents receive discrete but infinite signals s 0, 1,... that follow a Poisson where the last equality is due to the fact that if ⌧n >Us 2 { } ⇤ n 1 distribution with parameter Lw, i.e. then Fw (⌧n)=1and also that for a fixed x the terms n 1 (L )s ↵n(g0 x ) form a p.d.f. With this result, it can be shown w Lw | Pw Sn = s = e (48) that { } n! n 1 Let us assume without lack of generality that L1 >L0. 
Then, n P1 Xn X n 1 n | n 1 ⇤X (X ) = log n 1 + ⇤X (X ) the signal log-likelihood is given by P0 Xn X n| 1 L1 n 1 ⇤ (s)=s log + L L , (49) = ⇤X (X ) (42) S 0 1 L0


2 s mw u /2 where s is the greatest integer that is smaller that s. 1 Q( ), where Q(s)=1/(2⇡) 1 e du. Hence, b c s Interestingly, as s 0 then the private beliefs are bounded one can find that R from bellow by Ls = L0 L1 but not from above. 2 1 w m Therefore, one can find that Hw(⇤ (x)) = Hw x =1 Q x +( 1) . S 2m 2m ✓ ◆ ⇣ ⌘ 1 l + L1 L0 ⇤ (l)= (50) Using this and (23), one can find that S log L1 & L0 ' n 1 social w m P Xn =1W = w, X = Q ⌧n +( 1) . Then, using XX and XX and XX one can find that | 2m ⇣ ⌘ social Finally, noting that 1 Q(x)=Q( x), the conditional log- 0 if ⌧

Let us assume that, for given W = w, Sn are continuous Let us consider now the case in which agents have access real variables which are absolutely continuous with respect to Cauchy signals, whose p.d.f. is given by to the Lebesgue measure, i.e. their statistics can be described 1 h (s)= using a p.d.f. hw(s). Then, the log-likelihood ratio can be w 2 (58) s mw expressed as ⇡ 1+ h1(s)  ⇤S(s) = log . (52) ⇣ ⌘ h0(s) where mw and are the location and shape parameters. As As a particular case, we will study Gaussian channels where, in the case of Gaussian signals, we will assume that m1 = m0 := m while do not depend on W . With this, a direct for given W = w, Sn distributes as a Gaussian random calculation gives that variable with mean value mw and variance that does not dependent on W = w. Without loose of generality, let us 2 +(s + m)2 ⇤ (s) = log . assume that m = m := m. Then, a direct computation S 2 2 (59) 1 0 +(s m) shows that for this case the log-likelihood is linear, as shown by a direct computation: By inverting the equation ⇤S(s)=l, one finds the following second order polinomial 1 ⇤ (s)= (s + m)2 (s m)2 (53) S 2 1+el 2 s2 +2 ms + 2 + m2 =0 , (60) 2m ⇥ ⇤ 1 el = 2 s. (54) whose solution is given by This shows that Gaussian signals provide non-bounded beliefs, as a large signal can provided a arbitrarly strong evidence in 1+el 4m2el s (l)= m 2 . (61) favor of any of the two states of the world. Note that this also ± 1 el ± s(1 el)2 shows that the log-likelihood is an increasing function, and its Therefore, for each l R 0 there exists two signals inverse can be expressed as 2 { } such that ⇤s(s )=l, which are the ones given in (61) (see 2 ± 1 ⇤ (x)= x. (55) Figure 15). With this and (62), one can find that S 2m Moreover, it is useful to recall that the c.d.f. of Gaussian ⇤ Hw(s+(l)) Hw(s (l)) if l 0 Fw (l)=  (62) variables can be written in terms of the Q-function as Hw(s)= (Hw(s (l)) + 1 Hw(s+(l)) if l>0,


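Here, $H_w(s) = \frac{1}{\pi}\arctan\left(\frac{s + (-1)^w m}{\gamma}\right) + \frac{1}{2}$ is the c.d.f. of a Cauchy distribution. Using this and (23), one finds that

$$P_w\{X_{n+1} = 1 \mid X^n\} = \begin{cases} 1 - H_w(s_+(X^n)) + H_w(s_-(X^n)) & \text{if } \tau_n(X^n) \le 0 \\ H_w(s_+(X^n)) - H_w(s_-(X^n)) & \text{if } \tau_n(X^n) > 0, \end{cases}$$

$$\phantom{P_w\{X_{n+1} = 1 \mid X^n\}} = \begin{cases} 1 - \frac{1}{\pi}\left[\arctan\left(\frac{c_+}{\gamma}\right) - \arctan\left(\frac{c_-}{\gamma}\right)\right] & \text{if } \tau_n(X^n) \le 0 \\ \frac{1}{\pi}\left[\arctan\left(\frac{c_+}{\gamma}\right) - \arctan\left(\frac{c_-}{\gamma}\right)\right] & \text{if } \tau_n(X^n) > 0. \end{cases}$$

Above, the last equality introduces the shorthand notation $c_+ = s_+(X^n) + (-1)^w m$ and $c_- = s_-(X^n) + (-1)^w m$.

[Figure 15: log-likelihood ratio as a function of the signal value.]

Fig. 15: In contrast to the Gaussian case, Cauchy signals provide bounded beliefs. This implies that a very high signal in the Gaussian case is a clear indicator that $w = 1$, while in the Cauchy case such a high signal is not really informative. Moreover, the inverse is not 1-1 but 1-2. Red asterisks show $s_+$ and $s_-$ as given in (61) for $l = 1/2$.

Let us denote as $\mathcal{T}_n = \{t \in \mathbb{R} \mid \text{there exists } g \in \mathcal{G}_n \text{ such that } \tau_n(g) = t\}$ the set of all values that $\tau_n$ can adopt. Then, one can re-write the above expression as

$$\bar{U}_n(\pi_b) = \sum_{w \in \{0,1\}} \sum_{t \in \mathcal{T}_n} \left([2w-1][\nu - t]\right) \Bigg( \sum_{\substack{g \in \mathcal{G}_n \\ \tau_n(g) = t}} P\{G_n = g\} \Bigg) \times \left( u_{w,0} F_w^\Lambda(t) + u_{w,1}\left[1 - F_w^\Lambda(t)\right] \right).$$

Finally, the Proposition is proven by expanding the sum over $W$ and using (27).

Proof of Theorem 3: Due to the theorem's assumptions and Theorem 2, one can find that, for large values of $n$, $\mathcal{T}_n$ ends up being composed of numbers that are either larger than $U_s$ or smaller than $L_s$. Now, by recalling that if $t > U_s$ then $F_w^\Lambda(t) = 1$ while if $t < L_s$ then $F_w^\Lambda(t) = 0$, one finds that

$$\bar{U}(\pi_b) = \lim_{n \to \infty} \sum_{\substack{t \in \mathcal{T}_n \\ t > U_s}} \left[ u_{1,0} - (u_{1,0} - u_{0,0})(t - \nu) \right] P\{\tau_n = t\} + \lim_{n \to \infty} \sum_{\substack{t \in \mathcal{T}_n \\ t < L_s}} \left[ u_{0,1} - (u_{0,1} - u_{1,1})(\nu - t) \right] P\{\tau_n = t\}.$$

For the case of $t > U_s$, one has

$$(u_{1,0} - u_{0,0})(t - \nu) \le (u_{1,0} - u_{0,0})(U_s - \nu). \tag{65}$$

Equivalently, for the case of $t \le L_s$ one finds that $(u_{0,1} - u_{1,1})(\nu - t) \le u_{0,1}(L_s) + u_{1,1}(\nu - L_s)$. Using these inequalities the theorem can be proven, taking into account that

$$P\{C_0\} = \lim_{n \to \infty} \sum_{\substack{t \in \mathcal{T}_n \\ t > U_s}} P\{\tau_n = t\} = \lim_{n \to \infty} P\{\tau_n \ge U_s\}, \tag{66}$$

$$P\{C_1\} = \lim_{n \to \infty} \sum_{\substack{t \in \mathcal{T}_n \\ t < L_s}} P\{\tau_n = t\} = \lim_{n \to \infty} P\{\tau_n \le L_s\}. \tag{67}$$

APPENDIX D
PROOFS OF SECTION VII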


Finally, using this result the distribution of a decision vector can be found using the fact that

$$P_w\{X^n = x^n\} = \prod_{j=1}^{n} P_w\{X_j = x_j \mid X^{j-1} = x^{j-1}\}, \tag{68}$$

with the convention that $P_w\{X_1 = x_1 \mid X^0 = x^0\} = P_w\{X_1 = x_1\}$.
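The factorization in (68) can be sketched in a few lines. This is an illustrative sketch under stated assumptions: decisions are taken to be binary, and `prob_one_given_past` is a hypothetical callable standing in for $P_w\{X_j = 1 \mid X^{j-1} = x^{j-1}\}$; the `toy_rule` below is invented purely for demonstration and is not a model from the paper.

```python
from itertools import product
from typing import Callable, Sequence


def decision_vector_probability(
    x: Sequence[int],
    prob_one_given_past: Callable[[Sequence[int]], float],
) -> float:
    """Probability of the decision vector x via the chain rule (68)."""
    prob = 1.0
    for j, x_j in enumerate(x):
        # x[:j] holds the earlier decisions; it is empty for the first agent,
        # which matches the convention for the first factor in (68).
        p_one = prob_one_given_past(x[:j])
        prob *= p_one if x_j == 1 else 1.0 - p_one
    return prob


def toy_rule(past: Sequence[int]) -> float:
    """Hypothetical rule: deciding 1 becomes likelier as more predecessors chose 1."""
    if not past:
        return 0.6
    return 0.2 + 0.6 * sum(past) / len(past)


if __name__ == "__main__":
    # The probabilities of all decision vectors of a given length add up to one.
    total = sum(decision_vector_probability(x, toy_rule) for x in product((0, 1), repeat=4))
    print(total)
```

Any conditional rule derived from the model, such as the one obtained for Cauchy signals, could be passed in place of `toy_rule`.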
Fernando Rosas received the B.A. degree in music composition and philosophy (Minor), the B.Sc. degree in mathematics, and the M.S. and Ph.D. degrees in engineering sciences from the Pontificia Universidad Católica de Chile, Santiago, Chile, in 2003, 2007, and 2012, respectively. He is currently working as a Postdoctoral Researcher with the Graduate Institute of Communication Engineering at the National Taiwan University. His research interests lie at the interface between communication theory and complexity science. He was the recipient of the Marie Skłodowska-Curie Individual Fellowship from the European Union in 2016, the F+ Scholarship from the KU Leuven University in 2013, and the CONICYT Doctoral Fellowship from the Chilean Ministry of Education in 2008.

Jen-Hao Hsiao received the B.S. degree in electrical engineering from National Taiwan University, Taipei, in 2015. He is currently working toward his M.S. degree in the Graduate Institute of Communication Engineering, National Taiwan University, Taipei. His research interests include social networks, information dynamics, and collective behavior of interactive multi-agents.

Kwang-Cheng Chen (M'89-SM'94-F'07) received the B.S. from the National Taiwan University in 1983, and the M.S. and Ph.D. from the University of Maryland, College Park, United States, in 1987 and 1989, all in electrical engineering. From 1987 to 1998, Dr. Chen worked with SSE, COMSAT, IBM Thomas J. Watson Research Center, and National Tsing Hua University, working on mobile communications and networks. From 1998 to 2016, he was a Distinguished Professor with the National Taiwan University, Taipei, Taiwan, ROC, and also served as the Director of the Graduate Institute of Communication Engineering, Director of the Communication Research Center, and the Associate Dean for Academic Affairs, College of Electrical Engineering and Computer Science during 2009-2015. Since 2016, Dr. Chen has been with the Department of Electrical Engineering, University of South Florida, Tampa, United States. He has been actively involved in the organization of various IEEE conferences as General/TPC chair/co-chair, and has served in editorships with a few IEEE journals. Dr. Chen also actively participates in and has contributed essential technology to various IEEE 802, Bluetooth, and LTE and LTE-A wireless standards. Dr. Chen is an IEEE Fellow and has received a number of awards including the 2011 IEEE COMSOC WTC Recognition Award, 2014 IEEE Jack Neubauer Memorial Award and 2014 IEEE COMSOC AP Outstanding Paper Award. His recent research interests include wireless networks, cybersecurity, cyber-physical systems, social networks and data analytics.
