arXiv:2101.10154v1 [cond-mat.dis-nn] 25 Jan 2021
Variational Neural Annealing

Mohamed Hibat-Allah,∗ Estelle M. Inack, Roeland Wiersema, Roger G. Melko, and Juan Carrasquilla

1 Vector Institute, MaRS Centre, Toronto, Ontario, M5G 1M1, Canada
2 Department of Physics and Astronomy, University of Waterloo, Ontario, N2L 3G1, Canada
3 Perimeter Institute for Theoretical Physics, Waterloo, ON N2L 2Y5, Canada

∗ [email protected]

(Dated: January 26, 2021)

Many important challenges in science and technology can be cast as optimization problems. When viewed in a statistical physics framework, these can be tackled by simulated annealing, where a gradual cooling procedure helps search for groundstate solutions of a target Hamiltonian. While powerful, simulated annealing is known to have prohibitively slow sampling dynamics when the optimization landscape is rough or glassy. Here we show that by generalizing the target distribution with a parameterized model, an analogous annealing framework based on the variational principle can be used to search for groundstate solutions. Modern autoregressive models such as recurrent neural networks provide ideal parameterizations since they can be exactly sampled without slow dynamics even when the model encodes a rough landscape. We implement this procedure in the classical and quantum settings on several prototypical spin glass Hamiltonians, and find that it significantly outperforms traditional simulated annealing in the asymptotic limit, illustrating the potential power of this yet unexplored route to optimization.

I. INTRODUCTION

A wide array of complex combinatorial optimization problems can be reformulated as finding the lowest energy configuration of an Ising Hamiltonian of the form [1]:

$$H_{\text{target}} = -\sum_{i<j} J_{ij}\,\sigma_i \sigma_j - \sum_{i=1}^{N} h_i\,\sigma_i, \qquad (1)$$

where $\sigma_i = \pm 1$ are spin variables defined on the $N$ nodes of a graph. The topology of the graph together with the couplings $J_{ij}$ and fields $h_i$ uniquely encode the optimization problem, and its solutions correspond to spin configurations $\{\sigma_i\}$ that minimize $H_{\text{target}}$. While the lowest energy states of certain families of Ising Hamiltonians can be found with modest computational resources, most of these problems are hard to solve and belong to the non-deterministic polynomial time (NP)-hard complexity class [2].

Various heuristics have been used over the years to find approximate solutions to these NP-hard problems. A notable example is simulated annealing (SA) [3], which mirrors the analogous annealing process in materials science and metallurgy where a crystalline solid is heated and then slowly cooled down to its lowest energy and most structurally stable crystal arrangement. In addition to providing a fundamental connection between the thermodynamic behavior of real physical systems and complex optimization problems, simulated annealing has enabled scientific and technological advances with far-reaching implications in areas as diverse as operations research [4], artificial intelligence [5], biology [6], graph theory [7], power systems [8], quantum control [9], circuit design [10] among many others [5]. The paradigm of annealing has been so successful that it has inspired intense research into its quantum extension, which requires quantum hardware to anneal the tunneling amplitude, and can be simulated in an analogous way to SA [11, 12].

[Figure 1 legend: Exact Boltzmann dist.; Simulated annealing; Variational annealing.]

Figure 1. Schematic illustration of the space of probability distributions visited during simulated annealing. An arbitrarily slow SA visits a series of Boltzmann distributions starting at the high temperature (e.g. $T = \infty$) and ending in the $T = 0$ Boltzmann distribution (continuous yellow line), where a perfect solution to an optimization problem is reached. These solutions are found either at the edge or a corner (for non-degenerate problems) of the standard probabilistic simplex (colored triangle plane). A practical, finite-time SA trajectory (red dotted line), as well as a variational classical annealing trajectory (green dashed line), deviate from the trajectory of exact Boltzmann distributions.

The SA algorithm explores an optimization problem's energy landscape via a gradual decrease in thermal fluctuations generated by the Metropolis-Hastings algorithm. The procedure stops when all thermal kinetics are removed from the system, at which point the solution to the optimization problem is expected to be found. While an exact solution to the optimization problem is always attained if the decrease in temperature is arbitrarily slow, a practical implementation of the algorithm must necessarily run on a finite time scale [13]. As a consequence, the annealing algorithm samples a series of effective, quasi-equilibrium distributions close but not exactly equal to the stationary Boltzmann distributions targeted during the annealing [14] (see Fig. 1 for a schematic illustration). This naturally leads to approximate solutions to the

... taken over the probability $p_\lambda(\sigma)$. The von Neumann entropy is given by

$$S_{\text{classical}}(p_\lambda) = -\sum_{\sigma} p_\lambda(\sigma) \log\big(p_\lambda(\sigma)\big), \qquad (3)$$

where the sum runs over all the elements of the state space $\{\sigma\}$. In our setting, the temperature is decreased from an initial value $T_0$ to 0 using a linear schedule func-