A Probabilistic Representation of Deep Learning for Improving the Information Theoretic Interpretability


Xinjie Lan, Kenneth E. Barner
Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, USA, 19711
[email protected] (X. Lan) · ORCID: 0000-0001-7600-106 (X. Lan)
arXiv:2010.14054v1 [cs.LG] 27 Oct 2020

Keywords: deep neural networks, information bottleneck, probabilistic modeling, non-parametric inference

Abstract: In this paper, we propose a probabilistic representation of MultiLayer Perceptrons (MLPs) to improve the information theoretic interpretability. Above all, we demonstrate that the activations being i.i.d. is not valid for all the hidden layers of MLPs; thus the existing mutual information estimators based on non-parametric inference methods, e.g., empirical distributions and Kernel Density Estimation (KDE), are invalid for measuring the information flow in MLPs. Moreover, we introduce explicit probabilistic explanations for MLPs: (i) we define the probability space (Ω_F, 𝒯, P_F) for a fully connected layer f and demonstrate the great effect of the activation function of f on the probability measure P_F; (ii) we prove that the entire architecture of an MLP is a Gibbs distribution P; and (iii) we show that back-propagation aims to optimize the sample spaces Ω_F of all the fully connected layers of the MLP in order to learn an optimal Gibbs distribution P* expressing the statistical connection between the input and the label. Based on the probabilistic explanations for MLPs, we improve the information theoretic interpretability of MLPs in three aspects: (i) the random variable of f is discrete and the corresponding entropy is finite; (ii) the information bottleneck theory cannot correctly explain the information flow in MLPs once back-propagation is taken into account; and (iii) we propose novel information theoretic explanations for the generalization of MLPs. Finally, we demonstrate the proposed probabilistic representation and information theoretic explanations for MLPs on a synthetic dataset and on benchmark datasets.

1. Introduction

Improving the interpretability of Deep Neural Networks (DNNs) is a fundamental issue of deep learning. Recently, numerous efforts have been devoted to explaining DNNs from the viewpoint of information theory. In the seminal work, Shwartz-Ziv and Tishby (2017) use the Information Bottleneck (IB) theory to clarify the internal logic of DNNs. Specifically, they claim that DNNs optimize an IB tradeoff between compression and prediction, and that the generalization performance of DNNs is causally related to the compression. However, the IB explanation has caused serious controversy: Saxe et al. (2018) question the validity of the IB explanation with counter-examples, and Goldfeld et al. (2019) doubt the causality between the compression and the generalization performance of DNNs.

Basically, these controversies stem from different probabilistic models for the hidden layers of DNNs. Due to the complicated architecture of DNNs, it is extremely hard to establish an explicit probabilistic model for a hidden layer. As a result, all the previous works have to adopt non-parametric statistics to estimate the mutual information. Shwartz-Ziv and Tishby (2017) model the distribution of a hidden layer as the empirical distribution (a.k.a. the binning method) of the activations of the layer, whereas Saxe et al. (2018) model the distribution by Kernel Density Estimation (KDE), and Goldfeld et al. (2019) model the distribution as the convolution between the empirical distribution and additive Gaussian noise. Inevitably, different probabilistic models yield different information theoretic explanations for DNNs, thereby leading to controversies.

Notably, the non-parametric statistical models lack a solid theoretical basis in the context of DNNs. As two classical non-parametric inference algorithms (Wasserman, 2006), the empirical distribution and KDE approach the true distribution only if the samples are independently and identically distributed (i.i.d.). Specifically, the prerequisite for applying non-parametric statistics to DNNs is that the activations of a hidden layer are i.i.d. samples of the true distribution of that layer. However, none of the previous works explicitly demonstrates this prerequisite.

Moreover, the unclear definition of the random variable of a hidden layer results in an information theoretic issue (Chelombiev et al., 2019). Specifically, a random variable is a measurable function F: Ω → E mapping the sample space Ω to the measurable space E. All the previous works simply take the activations of a hidden layer as E but do not specify Ω, which indicates that F is a continuous random variable because the activations are continuous. As a result, the conditional distribution P(F|X) would be a delta function under the assumption that DNNs are deterministic models, and thereby the mutual information I(X; F) = ∞, where X is the random variable of the input. However, this contradicts the finite experimental estimates I(X; F) < ∞.
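The binning estimator at the center of this controversy is easy to state concretely. The sketch below is a minimal reconstruction of the empirical-distribution method, not the authors' code: it discretizes the continuous activations into equal-width bins, treats every input as a distinct symbol, and computes I(X; F) from the resulting empirical joint distribution. The bin count, the toy tanh layer, and the sample size are assumptions made purely for illustration; the point is that such an estimate is always finite, which is the apparent contradiction noted above.

```python
import numpy as np

def binned_mutual_information(x_ids, activations, n_bins=30):
    """Estimate I(X; F) with the binning (empirical-distribution) method.

    x_ids: integer identifier of each input sample (every input is a distinct symbol).
    activations: array of shape (n_samples, n_neurons) with continuous activations.
    """
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    codes = np.digitize(activations, edges)                   # per-neuron bin indices
    f_ids = np.unique(codes, axis=0, return_inverse=True)[1]  # one symbol per binned pattern

    joint = np.zeros((x_ids.max() + 1, f_ids.max() + 1))
    for i, j in zip(x_ids, f_ids):
        joint[i, j] += 1.0
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    pf = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ pf)[nz])))

# Toy usage: 1000 random inputs pushed through one random tanh layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 12))
activations = np.tanh(x @ rng.normal(size=(12, 8)))
print(binned_mutual_information(np.arange(1000), activations))  # a finite number of bits
```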
To resolve the above information theoretic controversies and further improve the interpretability of DNNs, this paper proposes a probabilistic representation for feedforward fully connected DNNs, i.e., MultiLayer Perceptrons (MLPs), in three aspects: (i) we thoroughly study the i.i.d. property of the activations of a fully connected layer, (ii) we define the probability space for a fully connected layer, and (iii) we explicitly propose probabilistic explanations for MLPs and the back-propagation training algorithm.

Figure 1: The input layer x has M nodes, and f_1 has N neurons {f_{1n} = σ_1[g_{1n}(x)]}_{n=1}^{N}, where g_{1n}(x) = Σ_{m=1}^{M} ω^{(1)}_{mn} · x_m + b_{1n} is the nth linear function, ω^{(1)}_{mn} is the weight of the edge between x_m and f_{1n}, b_{1n} is the bias, and σ_1(·) is a non-linear activation function, e.g., the ReLU function. Similarly, f_2 = {f_{2k} = σ_2[g_{2k}(f_1)]}_{k=1}^{K} has K neurons, where g_{2k}(f_1) = Σ_{n=1}^{N} ω^{(2)}_{nk} · f_{1n} + b_{2k}. In addition, f_Y is the softmax layer, thus f_{yl} = (1/Z_Y) exp(g_{yl}), where g_{yl} = Σ_{k=1}^{K} ω^{(3)}_{kl} · f_{2k} + b_{yl} and Z_Y = Σ_{l=1}^{L} exp(g_{yl}) is the partition function.

First, we demonstrate that the correlation of activations with the same label becomes larger as the layer containing the activations gets closer to the output. Therefore, the activations being i.i.d. is not valid for all the hidden layers of MLPs. In other words, the existing mutual information estimators based on non-parametric statistics are not valid for all the hidden layers of MLPs, because the activations of the hidden layers cannot satisfy the prerequisite.

Second, we define the probability space (Ω_F, 𝒯, P_F) for a fully connected layer f with N neurons given the input x. Let the experiment be f extracting a single feature of x; then (Ω_F, 𝒯, P_F) is defined as follows: the sample space Ω_F consists of N possible outcomes (i.e., features), each outcome being defined by the weights of one neuron; the event space 𝒯 is the σ-algebra on Ω_F; and the probability measure P_F is a Gibbs measure quantifying the probability of each outcome occurring in the experiment. Notably, the activation function of f has a great effect on P_F, because an activation equals the negative energy function of P_F.
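To make this probability space concrete, the following sketch forms the Gibbs measure P_F of the first layer in Figure 1 by treating the N activations as negative energies, and evaluates its entropy, which is bounded by log2 N. This is an illustrative reading of the definition above, not code from the paper; the layer sizes, random weights, and the ReLU choice are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 12, 8                                   # input nodes and neurons of f_1 (arbitrary)
W1, b1 = rng.normal(size=(M, N)), rng.normal(size=N)

def layer_gibbs_measure(x):
    """Gibbs measure P_F of the layer f_1 for a single input x.

    Each of the N outcomes corresponds to one neuron (its weights define the feature);
    the activation f_1n plays the role of the negative energy of that outcome.
    """
    g = x @ W1 + b1                            # linear functions g_1n(x)
    f = np.maximum(g, 0.0)                     # ReLU activations f_1n
    p = np.exp(f - f.max())                    # exp(negative energy), shifted for stability
    return p / p.sum()                         # normalized by the partition function

x = rng.normal(size=M)
P_F = layer_gibbs_measure(x)
H_F = -np.sum(P_F * np.log2(P_F))              # finite entropy, at most log2(N) bits
print(P_F.round(3), H_F, np.log2(N))
```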
Third, we propose probabilistic explanations for MLPs and the back-propagation training: (i) we prove that the entire architecture of an MLP is a Gibbs distribution built from the Gibbs distribution P_F of each layer; and (ii) we show that the back-propagation training aims to optimize the sample spaces of all the layers of the MLP for modeling the statistical connection between the input x and the label y, because the weights of each layer define its sample space.

In summary, the three probabilistic explanations for fully connected layers and MLPs establish a solid probabilistic foundation for explaining MLPs in an information theoretic way. Based on this probabilistic foundation, we propose three novel information theoretic explanations for MLPs.

Above all, we demonstrate that the entropy of F is finite, i.e., H(F) < ∞. Based on (Ω_F, 𝒯, P_F), we can explicitly define the random variable of f as F: Ω_F → E_F, where E_F denotes a discrete measurable space; thus F is a discrete random variable and H(F) < ∞. As a result, we resolve the controversy regarding F being continuous.

Furthermore, we demonstrate that the information flow of X and Y in MLPs cannot satisfy the IB theory once the back-propagation training is taken into account. Specifically, the probabilistic explanation for the back-propagation training indicates that Ω_F depends on both x and y; thus F depends on both X and Y, where Y is the random variable of y. However, IB requires that F be conditionally independent of Y given X.

Finally, we generate a synthetic dataset to demonstrate the theoretical explanations for MLPs. Since the dataset only has four simple features, we can validate the probabilistic explanations for MLPs by visualizing the weights of the MLPs. In addition, the four features have equal probability, so the dataset has a fixed entropy. As a result, we can demonstrate the information theoretic explanations for MLPs.
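To illustrate why four equiprobable features fix the entropy of the input, the sketch below builds a toy dataset in the same spirit: four hand-crafted 4x4 binary patterns drawn uniformly, so H(X) = log2 4 = 2 bits by construction. The specific patterns, image size, and noise-free sampling are assumptions for illustration; this is not the synthetic dataset used in the paper.

```python
import numpy as np

# Four simple, equally probable features (here: 4x4 binary patterns).
features = np.stack([
    np.eye(4),                                      # diagonal stripe
    np.fliplr(np.eye(4)),                           # anti-diagonal stripe
    np.tile([[1.0, 0.0, 1.0, 0.0]], (4, 1)),        # vertical stripes
    np.tile([[1.0], [0.0], [1.0], [0.0]], (1, 4)),  # horizontal stripes
])

def sample_dataset(n, rng=np.random.default_rng(2)):
    """Draw n images, each being one of the four features chosen uniformly at random."""
    labels = rng.integers(0, 4, size=n)
    images = features[labels].reshape(n, -1)        # flatten to 16-dimensional inputs
    return images, labels

images, labels = sample_dataset(4096)
p = np.bincount(labels, minlength=4) / labels.size
print(p, -np.sum(p * np.log2(p)))                   # empirical entropy, close to exactly 2 bits
```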
The rest of the paper is organized as follows. Section 2 briefly discusses the related works. Sections 3 and 4 propose the probabilistic and information theoretic explanations for MLPs, respectively. Section 5 specifies the mutual information estimators based on (Ω_F, 𝒯, P_F) for a fully connected layer. Section 6 validates the probabilistic and information theoretic explanations for MLPs on the synthetic dataset and on the benchmark datasets MNIST and Fashion-MNIST. Section 7 concludes the paper and discusses future work.

Preliminaries. P(X, Y) = P(Y|X)P(X) is an unknown joint distribution between two random variables X and Y. A dataset D = {(x^j, y^j) | x^j ∈ ℝ^M, y^j ∈ ℝ}_{j=1}^{J} consists of J i.i.d. samples of P(X, Y).
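Since Figure 1 and these preliminaries fix the objects used throughout the paper, a compact forward pass of the three-layer MLP, with the partition function Z_Y written out explicitly, may be a useful reference. The layer sizes and random weights below are arbitrary assumptions for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, K, L = 16, 10, 8, 4                      # sizes of x, f_1, f_2 and the label space (arbitrary)
W1, b1 = rng.normal(size=(M, N)), np.zeros(N)
W2, b2 = rng.normal(size=(N, K)), np.zeros(K)
W3, b3 = rng.normal(size=(K, L)), np.zeros(L)

def mlp_forward(x):
    """Forward pass of the MLP in Figure 1 for one input x of shape (M,)."""
    f1 = np.maximum(x @ W1 + b1, 0.0)          # f_1n = ReLU(g_1n(x))
    f2 = np.maximum(f1 @ W2 + b2, 0.0)         # f_2k = ReLU(g_2k(f_1))
    g_y = f2 @ W3 + b3                         # g_yl
    e = np.exp(g_y - g_y.max())                # shifted for numerical stability
    Z_Y = e.sum()                              # partition function of the softmax
    return e / Z_Y                             # f_yl = exp(g_yl) / Z_Y

x = rng.normal(size=M)
print(mlp_forward(x))                          # a distribution over the L labels
```

In the paper's terms, training this network with back-propagation adjusts W1, W2, and W3, and hence the sample spaces Ω_F that the weights of each layer define.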