Random Batch Methods (RBM) for Interacting Particle Systems
Interacting Particle Systems: random batch methods for classical and quantum N-body problems, and non-convex optimization
Shi Jin (金石), Shanghai Jiao Tong University, China

Outline
• Random batch methods for classical interacting particles
• Random batch methods for N-body quantum dynamics
• Consensus-based interacting particle systems for non-convex optimization (gradient-free)

(Binary) interacting particle systems
• We use a first-order binary interacting system as an example, but the idea works for second-order systems (of Newton's second law type) and for interactions involving more particles
• The idea works with or without the Brownian motion term

Applications
• Physics and chemistry: molecular dynamics, electrostatics, astrophysics (stars, galaxies), …
• Biology: flocking, swarming, chemotaxis, …
• Social sciences: wealth distribution, opinion dynamics, pedestrian dynamics, …
• Data sciences: clustering, …
• Numerical particle methods for kinetic/mean-field equations
• Courtesy of P.E. Jabin

The computational cost
• It is clearly O(N^2) per time step (or O(N^J) if it is a J-particle interaction)
• Fast Multipole Methods were developed to address this issue for binary interactions
• We introduce the random batch methods to reduce the computational cost to O(N) per time step; they work for general interacting potentials

The random batch methods (with Lei Li, SJTU, and J.-G. Liu, Duke)
At each time step,
• random shuffling: group the N particles randomly into n groups (batches), each batch having p particles
• particles interact only inside their own batches

Algorithm 1 (RBM-1)
• Let the time step be τ, with discrete times t_k = kτ
• The computational cost is O(pN) = O(N) per time step (a code sketch is given at the end of this part)
• Random shuffling algorithm: for example, Durstenfeld's modern version of the Fisher-Yates shuffle algorithm costs O(N); in MATLAB: randperm(N)
• The summation cost per particle is small due to the small batch size p

Algorithm 2 (RBM with replacement: RBM-r)
• At each time step, draw a batch of size p randomly and let the particles interact within this batch; repeat for n independent times

Remarks
• For these methods to be competitive with deterministic solvers, the time step needs to be independent of N (we will prove this for special V and K)
• For the Coulomb interaction, one can do a time splitting; then, for p=2, the Coulomb interaction step can be solved analytically, avoiding the stiffness due to the singularity

Relevant approaches in other fields
• Stochastic gradient (or coordinate) descent methods in machine learning (use small and random batches to do gradient descent)
• Direct simulation Monte Carlo methods (Bird, Bobylev, Nanbu), based on binary collisions, for the Boltzmann equation, and their adaptation to mean-field equations of flocking models using random binary interactions (Albi, Pareschi, Carrillo)

Theoretical analysis for RBM-1
• The key assumption on V and K guarantees that the evolution group of the original deterministic particle system is a contraction
• In this case one needs a non-trivial equilibrium (otherwise all particles collapse to a single point)

Notations and definitions
• Solution of the original particle system
• Solution of RBM-1
• Assume all particles are indistinguishable
• Coupling
• Error process
• Norm of error
• Wasserstein-2 distance

Error estimates
• Since C is independent of N, the error bound is also independent of N

A key lemma (consistency)

Mean-field limit
• The empirical measure
• As N → ∞, under certain assumptions on V and K, the marginal distribution of the particles, as well as the empirical measure, converge to the solution of the Fokker-Planck equation

Error to the mean-field limit and Asymptotic-Preserving property (Cattiaux-Guillin-Malrieu)
• If the deterministic flow is not contracting, one may still obtain convergence estimates (Dobrushin, Benachour-Roynette-Talay-Vallois, Jabin-Wang)
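As forward-referenced in Algorithm 1, here is a minimal Python sketch of one RBM-1 time step, assuming the standard first-order formulation dX_i = -∇V(X_i) dt + (1/(N-1)) Σ_{j≠i} K(X_i - X_j) dt + σ dB_i, with the mean-field prefactor 1/(N-1) replaced by 1/(p-1) inside each batch. The potential V, kernel K, and all parameters are placeholders (a smooth K is assumed), not the specific choices from the talk.

```python
import numpy as np

def rbm1_step(X, grad_V, K, tau, p, sigma, rng):
    """One time step of RBM-1 (random batch without replacement).

    X      : (N, d) particle positions
    grad_V : gradient of the confining potential, maps (m, d) -> (m, d)
    K      : interaction kernel acting on differences X_i - X_j (vectorized)
    Cost is O(p*N) per step instead of O(N^2).
    """
    N, d = X.shape
    idx = rng.permutation(N)              # random shuffling into batches, O(N)
    for batch in idx.reshape(-1, p):      # assumes p divides N
        Xb = X[batch]                     # (p, d) positions in this batch
        diff = Xb[:, None, :] - Xb[None, :, :]   # (p, p, d), entry [i, j] = X_i - X_j
        Kij = K(diff)
        Kij[np.arange(p), np.arange(p), :] = 0.0  # drop self-interaction terms
        # interactions within the batch only, with 1/(p-1) replacing 1/(N-1)
        force = Kij.sum(axis=1) / (p - 1)
        X[batch] = (Xb + tau * (-grad_V(Xb) + force)
                    + sigma * np.sqrt(tau) * rng.standard_normal((p, d)))
    return X

# Illustrative usage: harmonic confinement V(x) = |x|^2/2, attractive kernel K(z) = -z
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
for _ in range(100):
    X = rbm1_step(X, grad_V=lambda x: x, K=lambda z: -z,
                  tau=0.01, p=2, sigma=0.1, rng=rng)
```

For RBM-r (Algorithm 2), one would instead draw n independent batches of size p with replacement and update only the drawn particles each time.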
Applications
• Dyson Brownian motion from random matrix theory
• Charged particles on a sphere: Thomson's problem (Brownian motion on the sphere)
• Stochastic dynamics of wealth
• Opinion dynamics
• Data clustering and the stochastic block model; reordering sparse matrices

The Dyson Brownian motion
• The eigenvalues of a Hermitian-matrix-valued Ornstein-Uhlenbeck process satisfy the Dyson Brownian motion (Tao, Erdős-Yau)
• It has a mean-field limit
• Analytical solution
• Invariant measure: the semicircle law

Comparison between RBM-1 and RBM-r

Charged particles on a sphere
• F: Coulomb interaction; RBM-r

Stochastic dynamics of wealth
• A mean-field game model by Degond, Liu and Ringhofer: consider N market agents with two attributes, the economic configuration and the wealth
• Comparison with the equilibrium distribution of the mean-field equation (D=1, multiplicative noise and long-range potential)

Stochastic opinion dynamics (for consensus of opinions)
• An opinion dynamics model (Motsch-Tadmor)
• We consider a stochastic (RBM-1) version (a code sketch follows at the end of this part)

Clustering through interacting particle systems

Reordering sparse matrices

Some other applications
• Sampling:
➢ Stein variational gradient descent: L. Li, Y. Li, J.-G. Liu, Z. Liu, J. Lu
➢ Sampling of the Gibbs measure corresponding to particle systems with the Lennard-Jones potential: L. Li, Z. Xu, Y. Zhao
• Poisson-Boltzmann equation (particle formulation): L. Li, J.-G. Liu, Y. Tang
• Control of synchronization in particle systems: E. Zuazua et al.

A consensus-based optimization (CBO) algorithm for high dimensional machine learning problems
• In machine learning one seeks the global minimum of a loss function (non-convex, high dimensional)
• Non-convex global optimization is an NP-hard problem
• The most popular method is stochastic gradient descent, which needs the gradient (along a few randomly selected spatial directions at each iteration)
• Often the loss function is not a good function to take the gradient of, or the function is known only on a discrete set of data
• Alternative gradient-free numerical methods are therefore of great interest

Gradient-free optimization methods
• Simulated annealing: Kirkpatrick ('83)
• Genetic algorithms: Holland ('75)
• Swarm intelligence (SI): a population of simple agents interacting with each other, whose collective behavior exhibits an "intelligence" not possessed by the individuals; a better way to get out of local extrema compared to simulated annealing
Examples:
• particle swarm optimization (PSO): Kennedy, Eberhart and Shi ('95-'98)
• ant colony optimization (ACO): Moyson and Manderick ('88)
• artificial bee colony optimization (ABC): Karaboga ('05)

About metaheuristics (from Wikipedia)
"Most literature on metaheuristics is experimental in nature, describing empirical results based on computer experiments with the algorithms. … While the field also features high-quality research, many of the publications have been of poor quality; flaws include vagueness, lack of conceptual elaboration, poor experiments, and ignorance of previous literature."
Sörensen, Kenneth (2015). "Metaheuristics—the metaphor exposed". International Transactions in Operational Research. 22: 3-18.
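Returning to the stochastic (RBM-1) opinion dynamics forward-referenced above: here is a minimal Python sketch of one step of a Motsch-Tadmor-type model where each opinion is attracted to the influence-weighted opinions within its own random batch. The influence function and all parameters are hypothetical choices for illustration, not the ones used in the talk's experiments.

```python
import numpy as np

def influence(r):
    # Hypothetical bounded, decaying influence function phi(r)
    return 1.0 / (1.0 + r**2)

def rbm1_opinion_step(x, tau, p, sigma, rng):
    """One RBM-1 step for a stochastic Motsch-Tadmor-type opinion model.

    x : (N,) array of scalar opinions; each particle moves toward the
    normalized, influence-weighted mean of its own batch, plus noise.
    """
    N = x.shape[0]
    idx = rng.permutation(N)                   # random shuffling, O(N)
    for batch in idx.reshape(-1, p):           # assumes p divides N
        xb = x[batch]                          # opinions in this batch, (p,)
        r = np.abs(xb[:, None] - xb[None, :])  # pairwise distances within batch
        w = influence(r)
        np.fill_diagonal(w, 0.0)               # no self-influence
        # normalized, weighted attraction toward the other batch members
        drift = (w * (xb[None, :] - xb[:, None])).sum(axis=1) / w.sum(axis=1)
        x[batch] = xb + tau * drift + sigma * np.sqrt(tau) * rng.standard_normal(p)
    return x
```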
We develop an interacting particle system with proven convergence toward the global minimum for general non-convex, high dimensional functions.

A consensus-based optimization (CBO) algorithm
• Pinnau-Totzeck-Tse-Martin (M3AS '17)
• Carrillo-Choi-Totzeck-Tse (M3AS '18)
• H: Heaviside function
• For a sufficiently large weight parameter, the particles form a consensus and converge to the global minimum of L exponentially fast, but the drift rate is dimension sensitive!

Our improvement: a dimension-independent model! (with J. Carrillo, Imperial College; Lei Li, SJTU; and Yuhua Zhu, Stanford)
• Use geometric Brownian motion (componentwise noise)
• SGD to compute L
• RBM to evaluate the weighted average: B is a randomly selected mini-batch
(a code sketch is given after the conclusions)

Heuristics (for H = 1)
• For the particles to form a consensus:
• PTTM model: with isotropic noise, the consensus condition involves the dimension d
• Our model: with componentwise geometric noise, the condition is dimension insensitive!

Convergence analysis for the particle systems (with Seung-Yeal Ha, SNU, and Doheon Kim, KIAS)
• Consensus for the continuous system

Main idea of the proof
• Write the system in consensus form
• Different choices of the discretization correspond to different schemes (explicit, semi-implicit, exponential integrator, etc.), which lead to different numerical stability conditions

Answer to question A

Some assumptions

Answer to question B
• Remarks: 1) the error is proportional to … 2) the initial data are quite restrictive: close to the global minimizer, and the support of the initial distribution must contain it

Numerical results: the Rastrigin function in 20 dimensions (PTTM algorithm vs. our algorithm); the MNIST dataset

Conclusions on CBO
• The initial data are still quite restrictive: one needs nice initial data
• Although the convergence rate does not depend on the dimension, the error does
• Analysis of the mean-field limit and of the RBM approximation
• More tests; error analysis with respect to the various parameters (N, for example); improvements and hybridization with other methods
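As forward-referenced above, here is a minimal Python sketch of one explicit step of a CBO iteration of Carrillo-Jin-Li-Zhu type with componentwise (geometric) noise and H = 1. The time discretization and all parameter values are illustrative assumptions; the full method in the talk additionally uses SGD to evaluate L and random mini-batches for the weighted average.

```python
import numpy as np

def cbo_step(X, L, tau, lam, sigma, beta, rng):
    """One explicit step of consensus-based optimization (CBO).

    X   : (N, d) array of particle positions
    L   : loss function, maps (d,) -> float
    lam : drift strength; sigma: noise strength; beta: weight parameter
    Componentwise ("geometric") noise is the dimension-insensitive variant.
    """
    values = np.apply_along_axis(L, 1, X)
    # Gibbs-type weights concentrate on the current best particles
    # (shifted by the minimum for numerical stability of the exponential)
    w = np.exp(-beta * (values - values.min()))
    xbar = (w[:, None] * X).sum(axis=0) / w.sum()   # weighted consensus point
    diff = X - xbar
    noise = rng.standard_normal(X.shape)
    # drift toward the consensus point plus componentwise geometric noise
    return X - lam * tau * diff + sigma * np.sqrt(tau) * diff * noise

# Illustrative usage on the 20-dimensional Rastrigin function (assumed parameters):
def rastrigin(x):
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 20))
for _ in range(2000):
    X = cbo_step(X, rastrigin, tau=0.01, lam=1.0, sigma=0.5, beta=30.0, rng=rng)
print("best value found:", np.apply_along_axis(rastrigin, 1, X).min())
```

The componentwise factor `diff * noise` (rather than an isotropic `|diff| * noise`) is what makes the drift condition for consensus insensitive to the dimension, which is the improvement over the PTTM model described above.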