An Empirical Study on the Intrinsic Privacy of Stochastic Gradient Descent

Stephanie L. Hyland and Shruti Tople
Microsoft Research
{stephanie.hyland, shruti.tople}@microsoft.com

ABSTRACT

In this work, we take the first step towards understanding whether the intrinsic randomness of stochastic gradient descent (SGD) can be leveraged for privacy, for any given dataset and model. In doing so, we hope to mitigate the trade-off between privacy and utility for models trained with differential-privacy (DP) guarantees. Our primary contribution is a large-scale empirical analysis of SGD on convex and non-convex objectives. We evaluate the inherent variability in SGD on 4 datasets and calculate the intrinsic data-dependent ϵ_i(D) values due to the inherent noise. For logistic regression, we observe that SGD provides intrinsic ϵ_i(D) values between 3.95 and 23.10 across four datasets, dropping to between 1.25 and 4.22 using the tight empirical sensitivity bound. For the neural networks considered, we observe high ϵ_i(D) values (>40) owing to their larger parameter count. We propose a method to augment the intrinsic noise of SGD to achieve the desired target ϵ, which produces statistically significant improvements in private model performance (subject to assumptions). Our experiments provide strong evidence that the intrinsic randomness in SGD should be considered when designing private learning algorithms.

1 INTRODUCTION

Respecting the privacy of people contributing their data to train machine learning models is important for the safe use of this technique [11, 27, 30]. Private variants of learning algorithms have been proposed to address this need [5, 12, 24, 29, 31]. Unfortunately, the utility of private models typically degrades, limiting their applicability. This performance loss often results from the need to add noise during or after model training, to provide the strong protections of ϵ-differential-privacy [8]. However, results to date neglect the fact that learning algorithms are often stochastic. Framing them as 'fixed' queries on a dataset neglects an important source of intrinsic noise. Meanwhile, the randomness in learning algorithms such as stochastic gradient descent (SGD) is well-known among machine learning practitioners [10, 14], and has been lauded for affording superior generalisation to its non-stochastic counterpart [16]. Moreover, the 'insensitive' nature of SGD relative to variations in its input data has been established in terms of uniform stability [13]. The data-dependent nature of this stability has also been characterised [18]. Combining these observations, we speculate that the variability in the model parameters produced by the stochasticity of SGD may exceed its sensitivity to perturbations in the specific input data, affording 'data-dependent intrinsic' privacy. In essence, we ask: "Can the intrinsic, data-dependent stochasticity of SGD help with the privacy-utility trade-off?"

Our Approach. We consider a scenario where a model is trained securely, but the final model parameters are released to the public: for example, a hospital which trains a prediction model on its own patient data and then shares it with other hospitals or a cloud provider. The adversary then has access to the fully trained model, including its architecture, and we assume details of the training procedure are public (e.g. batch size, number of training iterations, learning rate), but not the random seed used to initialise the model parameters and sample inputs from the dataset. We therefore focus on how SGD introduces randomness in the final weights of a model. This randomness is introduced from two main sources: (1) random initialization of the model parameters and (2) random sampling of the input dataset during training. We argue that rather than viewing this variability as a pitfall of stochastic optimisation, it can instead be seen as a source of noise that can mask information about participants in the training data. This prompts us to investigate whether SGD itself can be viewed as a differentially-private mechanism, with some intrinsic data-dependent ϵ-value, which we refer to as ϵ_i(D). To calculate ϵ_i(D), we propose a novel method that characterises SGD as a Gaussian mechanism and estimates the intrinsic randomness for a given dataset, using a large-scale empirical approach. To the best of our knowledge, ours is the first work to report the empirical calculation of ϵ_i(D) values based on the observed distribution. Finally, we propose an augmented differentially-private SGD algorithm that takes into account the intrinsic ϵ_i(D) to provide better utility. We empirically compute ϵ_i(D) for SGD and the utility improvement for models trained with both convex and non-convex objectives on 4 different datasets: MNIST, CIFAR10, Forest Covertype and Adult.

2 PROBLEM & BACKGROUND

We study the variability due to random sampling, and the sensitivity to dataset perturbations, of stochastic gradient descent (SGD).

Differential Privacy (DP). Differential privacy hides the participation of an individual sample in the dataset [8]. (ϵ, δ)-DP ensures that for all adjacent datasets S and S′, the privacy loss of any individual datapoint is bounded by ϵ with probability at least 1 − δ [9]:

Definition 2.1 ((ϵ, δ)-Differential Privacy). A mechanism M with domain I and range O satisfies (ϵ, δ)-differential privacy if for any two neighbouring datasets S, S′ ∈ I that differ only in one input and for any set E ⊆ O, we have: Pr(M(S) ∈ E) ≤ e^ϵ Pr(M(S′) ∈ E) + δ.

We consider an algorithm whose output is the weights of a trained model, and thus focus on the ℓ_2-sensitivity:

Definition 2.2 (ℓ_2-Sensitivity, from Def. 3.8 in [9]). Let f be a function mapping from a dataset to a vector in R^d. Let S, S′ be two datasets differing in one data point. Then the ℓ_2-sensitivity of f is defined as: ∆_2(f) = max_{S,S′} ‖f(S) − f(S′)‖_2.

One method for making a deterministic query f differentially private is the Gaussian mechanism. This gives a way to compute the ϵ of a Gaussian-distributed query given δ, its sensitivity ∆_2(f), and its variance σ².

Theorem 2.3 (Gaussian mechanism, from [9]). Let f be a function that maps a dataset to a vector in R^d. Let ϵ ∈ (0, 1) be arbitrary. For c² > 2 ln(1.25/δ), adding Gaussian noise sampled using the parameter σ ≥ c∆_2(f)/ϵ guarantees (ϵ, δ)-differential privacy.
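To make Theorem 2.3 concrete, here is a minimal sketch of calibrating and adding Gaussian noise to a vector-valued query. The function name, the NumPy implementation, and the example values are our own illustrative assumptions; the paper does not provide code.

```python
import numpy as np

def gaussian_mechanism(query_output, l2_sensitivity, eps, delta, rng=None):
    """Release a d-dimensional query with (eps, delta)-DP via the Gaussian mechanism.

    Noise scale: sigma >= c * l2_sensitivity / eps with c^2 > 2 ln(1.25 / delta),
    valid for eps in (0, 1) as in Theorem 2.3.
    """
    assert 0.0 < eps < 1.0, "Theorem 2.3 assumes eps in (0, 1)"
    rng = np.random.default_rng() if rng is None else rng
    c = np.sqrt(2.0 * np.log(1.25 / delta)) + 1e-5   # tiny slack so c^2 > 2 ln(1.25/delta)
    sigma = c * l2_sensitivity / eps
    noise = rng.normal(loc=0.0, scale=sigma, size=np.shape(query_output))
    return np.asarray(query_output) + noise

# Example: privatise a 3-dimensional statistic with l2-sensitivity 0.1.
private_stats = gaussian_mechanism([0.2, 0.5, 0.1], l2_sensitivity=0.1,
                                   eps=0.5, delta=1e-5)
```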
Stochastic Gradient Descent (SGD). SGD and its derivatives are the most common optimisation methods for training machine learning models [3]. Given a loss function L(w; (x, y)) averaged over a dataset, SGD provides a stochastic approximation of the traditional gradient descent method by estimating the gradient of L at random inputs. At step t, on selecting a random sample (x_t, y_t), the gradient update function G performs w_{t+1} = G(w_t) = w_t − η ∇_w L(w_t; (x_t, y_t)), where η is the (constant) step-size or learning rate, and w_t are the weights of the model at step t. In practice, the stochastic gradient is estimated using a mini-batch of B samples.

Recently, Wu et al. [31] showed results for the sensitivity of SGD to the change in a single training example for a convex, L-Lipschitz and β-smooth loss L. We use this sensitivity bound in our analyses to compute the intrinsic ϵ_i(D) for convex functions. Let A denote the SGD algorithm using r as the random seed. The upper bound on the sensitivity of k passes of SGD with learning rate η is given by

∆̂_S = max_r ‖A(r; S) − A(r; S′)‖ ≤ 2kLη        (1)

∆̂_S gives the maximum difference in the model parameters due to the presence or absence of a single input sample. When using a batch size of B, as we do, the sensitivity bound can be reduced by a factor of B, i.e., ∆̂_S ≤ 2kLη/B.
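The following sketch illustrates how one might compare the bound in Equation (1), scaled by the batch size B, against the empirically observed parameter difference between two SGD runs on neighbouring datasets that share the same random seed r. The logistic-regression loss, the toy data, and the assumed Lipschitz constant L are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def sgd(X, y, seed, lr=0.1, epochs=5, batch_size=32):
    """Mini-batch SGD on a logistic-regression loss (illustrative only)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])      # random initialization of weights
    n = len(y)
    for _ in range(epochs):                          # k passes over the data
        order = rng.permutation(n)                   # random sampling of inputs
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[idx] @ w))    # predicted probabilities
            grad = X[idx].T @ (p - y[idx]) / len(idx)
            w -= lr * grad
    return w

# Neighbouring datasets S and S' differing in a single example (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (rng.random(200) > 0.5).astype(float)
X2, y2 = X.copy(), y.copy()
X2[0] = rng.normal(size=5)                           # perturb one data point

w_S = sgd(X, y, seed=42)                             # same random seed r for both runs
w_S2 = sgd(X2, y2, seed=42)
empirical_distance = np.linalg.norm(w_S - w_S2)

# Bound from Eq. (1), reduced by the batch size B: 2*k*L*eta / B.
k, L, eta, B = 5, 1.0, 0.1, 32                       # L = 1.0 is an assumed Lipschitz constant
bound = 2 * k * L * eta / B
print(f"empirical ||A(r;S) - A(r;S')|| = {empirical_distance:.4f}, bound = {bound:.4f}")
```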
3 AUGMENTED DP-SGD

In this section, we show how to account for the data-dependent intrinsic noise of SGD while adding noise using the output perturbation method that ensures differential privacy guarantees [31]. The premise of output perturbation is to train a model in secret, and then release a noisy version of the final weights. For a desired ϵ, δ, and a known sensitivity value (∆̂_S, ∆̂_S*), the Gaussian mechanism (Theorem 2.3) gives us the required level of noise in the output, which we call σ_target.

In Wu et al. [31], this σ_target defines the variance of the noise vector sampled and added to the model weights, to produce an (ϵ, δ)-DP model. In our case, we reduce σ_target to account for the noise already present in the output of SGD. Since the sum of two independent Gaussians with variances σ_a² and σ_b² is a Gaussian with variance σ_a² + σ_b², if the intrinsic noise of SGD is σ_i(D), to achieve the desired ϵ we need to augment σ_i(D) to reach σ_target:

σ_augment = √(σ_target² − σ_i(D)²)        (2)

Given the likely degradation of model performance with output noise, accounting for σ_i(D) is expected to help utility without compromising privacy. The resulting algorithm for augmented differentially-private SGD is shown in Algorithm 1.

Algorithm 1: Augmented differentially private SGD
1: Given σ_i(D), ϵ_target, δ_target, ∆_2(f), model weights w_private.
2: c ← √(2 log(1.25/δ_target)) + 1 × 10⁻⁵
3: σ_target ← c∆_2(f)/ϵ_target
4: if σ_i(D) < σ_target then
5:     σ_augment ← √(σ_target² − σ_i(D)²)
6: else
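Below is a minimal sketch of the release step in Algorithm 1, following Equation (2). The NumPy implementation and names are ours, and the behaviour of the truncated `else` branch (adding no extra noise when σ_i(D) already meets σ_target) is an assumption, since the excerpt cuts off there.

```python
import numpy as np

def augmented_dp_sgd_release(w_private, sigma_i, eps_target, delta_target,
                             sensitivity, rng=None):
    """Output perturbation that credits the intrinsic SGD noise sigma_i(D).

    Mirrors Algorithm 1 as excerpted above; the else-branch behaviour is an
    assumption because the excerpt is truncated there.
    """
    rng = np.random.default_rng() if rng is None else rng
    c = np.sqrt(2.0 * np.log(1.25 / delta_target)) + 1e-5        # line 2
    sigma_target = c * sensitivity / eps_target                   # line 3
    if sigma_i < sigma_target:                                    # line 4
        sigma_augment = np.sqrt(sigma_target**2 - sigma_i**2)     # line 5, Eq. (2)
    else:
        sigma_augment = 0.0   # assumed: intrinsic noise already reaches the target
    noise = rng.normal(0.0, sigma_augment, size=np.shape(w_private))
    return np.asarray(w_private) + noise

# Example release of a 5-parameter model with toy values.
released = augmented_dp_sgd_release(w_private=np.zeros(5), sigma_i=0.05,
                                    eps_target=1.0, delta_target=1e-5,
                                    sensitivity=0.02)
```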