Tensor network and neural network methods in physical systems

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Peiyuan Teng

Graduate Program in Physics

The Ohio State University

2018

Dissertation Committee:

Dr. Yuan-Ming Lu, Advisor Dr. Ciriyam Jayaprakash Dr. Jay Gupta Dr. Comert Kural c Copyright by

Peiyuan Teng

2018 Abstract

In this dissertation, new ideas and methods from tensor network theory and neu- ral network theory are discussed. Firstly, common computational methods, such as the exact diagonalization method, the Density Matrix Renormalization Group ap- proach, and the tensor network theory are reviewed. Following this direction, a way of generalizing the tensor renormalization group (TRG) to all spatial dimensions is proposed. Mathematically, the connection between patterns of tensor renormalization group and the concept of truncation sequence in polytope geometry is discovered. A theoretical contraction framework is proposed. Furthermore, the canonical polyadic decomposition is introduced to tensor network theory. A numerical verification of this method on the 3-D Ising model is carried out.

Secondly, this dissertation includes an efficient way of calculating the geomet- ric measure of entanglement using tensor decomposition methods. The connection between these two concepts is explored using the tensor representation of the wave- function. Numerical examples are benchmarked and compared. Furthermore, highly entangled qubit states are searched for to show the applicability of this method.

Finally, machine learning approaches are reviewed. Machine learning methods are applied to quantum mechanics. The radial basis function network in a discrete basis is used as the variational wavefunction for the ground state of a quantum system.

ii Variational Monte Carlo(VMC) calculations are carried out for some simple Hamil- tonians. The results are in good agreements with theoretical values. The smallest eigenvalue of a Hermitian matrix can also be acquired using VMC calculations. These results demonstrate that machine learning techniques are capable of solving quantum mechanical problems.

iii This is dedicated to my parents, Mr. Yun Teng and Mrs. Min Xu.

iv Acknowledgments

I am sincerely thankful to my advisor Dr. Yuan-Ming Lu for the enlightening discussions, helpful suggestions, and careful comments. I’m amazed at his sharpness towards concepts, which clarifies a lot of my research ideas. His capability of digging deep into physics questions is also a good example for me to follow and to learn from.

His love and dedication to physics influence me to move forward with my Ph.D. study.

Learning from his good traits not only advances my research but will also help my future life.

I’m also indebted to the Department of Physics of The Ohio State University for providing such a good environment for my study and for providing kindness support.

I’m very happy to work in such a great place with distinguished professors and friendly classmates. Especially, I’d like to thank Dr. Jonathan Pelz for his wisdom and guidance about my study.

I’m also extremely grateful to my committee members, Dr. Ciriyam Jayaprakash,

Dr. Jay Gupta and Dr. Comert Kural for all the helpful talks and great suggestions.

Their suggestions and helps guided through my study and will also be beneficial to my future career.

I also want to express my appreciation to all the referees in the peer review pro- cesses of my papers. Their insightful comments and suggestions improved my work a lot.

v I’m should also thank all the wonderful friends that I met during my life as a

Ph.D. student in Columbus and in the United States, with whom I can explore many wonderful places and cultures in North America. Life would never be so amazing without them.

Finally, I’d like to express my deepest gratitude to my parents, for their unrelenting love and selfless support. My thankfulness to them for everything is beyond any words.

vi Vita

January 2, 1990 ...... Born, Dandong, Liaoning, China

2008 - 2012 ...... B.S., Physics, Nankai University, China 2012 - 2015 ...... M.S., Physics, The Ohio State Univer- sity, USA 2015 - present(2018) ...... Ph.D. Candidate, Physics, The Ohio State University, USA

Publications

Research Publications

Peiyuan Teng, ”Generalization of the tensor renormalization group approach to 3-D or higher dimensions.” Physica A: Statistical Mechanics and its Applications 472 (2017): 117-135. Peiyuan Teng, ”Accurate calculation of the geometric measure of entanglement for multipartite quantum states.” Quantum Information Processing 16.7 (2017): 181.

Fields of Study

Major Field: Physics

vii Table of Contents

Page

Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

Vita ...... vii

List of Tables ...... xii

List of Figures ...... xiii

1. Introduction ...... 1

2. Numerical methods for many-body systems ...... 5

2.1 Exact Diagonalization ...... 5 2.1.1 Methodology ...... 5 2.1.2 Numerical implementation of exact diagonalization method 1 for spin- 2 models ...... 8 2.2 Density Matrix Renormalization Group ...... 9 2.2.1 Methodology ...... 9 2.2.2 Numerical simulations using Density Matrix Renormalization Group ...... 14 2.3 Tensor Network and Matrix Product States ...... 15 2.3.1 Tensor Network ...... 15 2.3.2 Matrix Product States ...... 17 2.4 Quantum Monte Carlo methods ...... 18 2.4.1 Introduction ...... 18 2.4.2 Variational Monte Carlo (VMC) ...... 19

viii 2.4.3 World Line Monte Carlo ...... 20

3. Generalization of the Tensor Renormalization Group method...... 22

3.1 Introduction ...... 22 3.2 Tensor renormalization approach for a 2-D system ...... 25 3.2.1 Classical Ising model and tensor network ...... 25 3.2.2 Contraction of a 2-D network ...... 27 3.2.3 Contraction of a 2-D square tensor network ...... 30 3.2.4 Free energy calculation for a 2d kagome Ising model . . . . 31 3.3 Tensor renormalization approach for a 3-D system ...... 34 3.3.1 Dual polyhedron ...... 34 3.3.2 Tensor renormalization group (TRG) ...... 35 3.3.3 Canonical polyadic decomposition (CPD) ...... 37 3.3.4 TRG in detail: From cube to octahedron ...... 40 3.3.5 TRG in detail: From octahedron back to a cube ...... 47 3.4 Numerical results of the 3-D cubic tensor network model ...... 49 3.4.1 Ising model and tensor network ...... 49 3.4.2 Calculation framework ...... 50 3.4.3 Calculation steps: tensor size and cutoff ...... 50 3.4.4 Accuracy of CPD ...... 51 3.4.5 Rescaling of the tensor network and the dual tensor network 53 3.4.6 Derivation of free energy ...... 56 3.4.7 Numerical results ...... 57 3.4.8 Current restrictions of 3D-TRG method ...... 60 3.5 Tensor RG for higher dimensional tensor network ...... 61 3.6 Discussion ...... 64 3.6.1 Applications to quantum system ...... 64 3.6.2 Potential problems of TRG in Higher dimensions ...... 64 3.6.3 Is CPD the only choice? ...... 65 3.6.4 CPD and best rank-r approximation ...... 65

4. Tensor methods for Geometric Measure of Entanglement ...... 67

4.1 Introduction ...... 67 4.2 Geometric measure of entanglement and tensor decomposition . . 70 4.2.1 Geometric measure of entanglement ...... 70 4.2.2 Tensor decomposition ...... 71 4.2.3 Numerical algorithm ...... 74 4.3 Numerical evaluation of the geometric measure of entanglement using Alternate Least Square algorithm ...... 76

ix 4.3.1 Geometric measure of entanglement for symmetric qubits pure states ...... 76 4.3.2 Geometric measure of entanglement for combinations of three qubits W states ...... 76 4.3.3 Geometric measure of entanglement for d-level system (qudits) 78 4.3.4 Hierarchies of Geometric measure of entanglement . . . . . 82 4.4 Discussions ...... 84 4.4.1 Geometric measure of entanglement for many-body systems 84 4.4.2 Several comments ...... 85 4.A Appendix: Searching for highly entangled states and maximally en- tangled states...... 86 4.A.1 Bounds on the geometric measure of entanglement . . . . . 87 4.A.2 Maximally entangled four qubits states ...... 87 4.A.3 Highly entangled four qubits states ...... 89 4.A.4 Highly entangled five qubits states ...... 92 4.A.5 Highly entangled six and seven qubits states ...... 92

5. Theory of Machine Learning ...... 94

5.1 Introduction ...... 94 5.2 Regression ...... 95 5.3 K-nearest neighbors algorithm(KNN) ...... 97 5.4 Decision tree ...... 98 5.5 Support vector machine ...... 99 5.6 K-means clustering ...... 100 5.7 Principal component analysis ...... 102 5.8 Restricted Boltzmann Machines (RBM) as an artificial neural network102 5.9 Model evaluation ...... 103

6. Solving quantum mechanics problems using radial basis function network 107

6.1 Introduction ...... 107 6.2 Artificial neural network theory and the variational Monte Carlo method ...... 109 6.2.1 Artificial neural network theory ...... 109 6.2.2 Variational Monte Carlo method(VMC) ...... 115 6.3 Solving quantum mechanics problems using artificial neural network 116 6.3.1 Theoretical outline ...... 117 6.3.2 One dimensional quantum harmonic oscillator in electric field 120 6.3.3 Two dimensional quantum harmonic oscillator in electric field 125 6.3.4 Particle in a box ...... 128 6.3.5 Neural network as a Hermitian matrix lowest eigenvalue solver131

x 6.4 Discussion ...... 134

7. Conclusions ...... 136

Bibliography ...... 138

xi List of Tables

Table Page

4.1 Overlaps for n-partite qubit systems ...... 77

4.2 Overlaps for n-partite qudit systems ...... 81

4.3 Hierarchies of 5-qubits W state ...... 83

6.1 The relation between nmax and the VMC energy at Ex = 4.0,Ey = 2.0.127

6.2 Comparison between exact values, perturbation results and numerical VMC energy at different a...... 131

6.3 VMC results of the lowest eigenvalue of H(d)...... 133

xii List of Figures

Figure Page

2.1 The infinite-system DMRG ...... 12

2.2 The finite-system DMRG ...... 13

2.3 Tensor network...... 16

3.1 The triangular Ising lattice and its tensor network...... 25

3.2 The kagome Ising lattice and its tensor network...... 26

3.3 Singular value decomposition (SVD)...... 27

3.4 SVD and contraction (honeycomb)...... 29

3.5 Singular value decomposition (SVD) for a square tensor network. . . 30

3.6 SVD and contraction (square)...... 31

3.7 Converting a triangular tensor network to a hexagonal tensor network. 32

3.8 Dual polyhedron for a cube...... 35

3.9 Graphical representation of the Tensor RG process, from T to T 0. . . 36

3.10 Relabel Tabcxyz to Tmnp ...... 38

3.11 Truncation sequence from a cube to an octahedron...... 40

3.12 CPD from T tensors to A tensors...... 42

xiii 3.13 The contraction from SVD from A to B to C ...... 43

3.14 Transformation of the tensor network ...... 44

3.15 Stacking of the octahedrons in the dual space...... 45

3.16 The projection of 3-D model to 2-D model ...... 46

3.17 SVD of the dual tensor network S...... 47

3.18 The renormalized tensor network T 0 ...... 48

3.19 Vertex Ising model and Bond Ising model ...... 49

3.20 Semi-log diagram of the percentage error as a function of r under dif- ferent magnetic field and temperature using cp als...... 52

3.21 Magnetization as a function of temperature at different h...... 58

3.22 Magnetization as a function of temperature at h = 0...... 59

3.23 n-cube and n-orthoplex (n=2,3,4) ...... 62

4.1 Entanglement as a function of s using tensor decomposition...... 79

4.2 Decomposition for complex tensors...... 80

4.3 Entanglement of the HS family of states as a function of t...... 88

4.4 Entanglement of the BSSB family of states as a function of x. . . . . 90

5.1 The support vector machine...... 101

5.2 The restricted Boltzmann machine...... 104

6.1 An illustration of the artificial neural network...... 110

6.2 Minimization of the ground state energy of H at E = 0, using Gaussian radial basis network...... 122

xiv 6.3 Minimization of the ground state energy of H at E = 0, using Eq. 6.9 as the radial basis function...... 123

6.4 Minimization of the ground state energy of H at E = 0.5, 1.0, 2.0, using Gaussian radial basis function...... 124

6.5 ψ(n) as a function of n at E = 0.0, 0.5, 1.0, 2.0, using Gaussian radial basis function...... 125

6.6 ψ(nx, ny) as a function of nx + 1, ny + 1 at Ex = 1.0,Ey = 1.0, using Gaussian radial basis function...... 127

6.7 ψ(nx, ny) as a function of nx + 1, ny + 1 at Ex = 4.0,Ey = 2.0, using Gaussian radial basis function...... 128

6.8 ψ(nx, ny) as a function of ny at different nx with Ex = 1.0,Ey = 1.0. 129

6.9 Minimization of the ground state energy of H at a = 0.0, 2.0, 4.0, 8.0, −8.0130

xv Chapter 1: Introduction

More is Different.

P.W. Anderson

Our understanding of nature was changed fundamentally since the birth of quan- tum mechanics. The advances in quantum theory brought us a conceptual revolution, and we realized that the classical physical picture that comes from daily life may not be true for a quantum system. Although we are faced with the ultimate problem of reconciling quantum mechanics and general relativity, the development of the quan- tum theory itself never disappointed us.

One successful branch of quantum physics is the quantum field theory, which leads to the Standard Model in the late 20th century. The phenomena of superconductivity and superfluidity represent another important direction of the quantum theory, the

Collective Phenomena. As P.W. Anderson once said, ”More is Different.” Although the behavior of a quantum particle can be relatively easy to understand, the collective behavior of a large number of particles might be far beyond our imagination.

One of the most interesting phenomena of the Collective Phenomena in many-body systems is the quantum Hall effect, especially the fractional quantum Hall effect[1].

1 For a two-dimensional electron system in a magnetic field, the Hall conductance σxy

is determined by

e2 σ = ν . (1.1) xy h

Here e is the electron charge, h is the Planck constant and ν is the Landau level

filling factor. For the integer quantum Hall effect, the filling factors ν are integers.

The most surprising fact is that, when the magnetic field is large, the filling factor can be a fractional number, which is exotic. This is the fractional quantum Hall effect.

Laughlin proposed a wave function[2],

Y m Y 2 ψ(z1, z2, ...zn) = (zi − zj) exp(−|zl| ) (1.2) i

Here zi is the complex position of a 2D particle and m = 1/ν, which is an integer.

This is an intrinsically many-body wavefunction which cannot be obtained from any perturbation theory of noninteracting electrons.

Understanding the physical properties of a many-body system is a difficult branch of modern physics. Two well-known examples of many-body Hamiltonians are the

Hubbard model for electrons and the Heisenberg model for spins. Although these models look extremely simple, they proved to be very difficult to solve analytically beyond one-dimension. To understand the properties of such models, developing efficient numerical methods to solve these models is a very crucial task.

Our understanding of quantum many-body physics is limited by the size of the system that can be studied. On one side, it is very rare to have theoretical solutions to a generic quantum many-body system. On the other side, the dimension of the

Hilbert space grows exponentially with the system size, which makes the many-body

2 system difficult to be solved numerically. Our understanding of the quantum many- body physics is also restricted by the fact that we don’t have a good numerical method for a higher dimensional system. Therefore, developing efficient algorithms to solve quantum many-body problems is crucial to the development of quantum many-body physics.

The exact diagonalization method is the most straightforward approach to solve a many-body problem numerically. The idea is to write down a matrix representa- tion of the Hamiltonian and diagonalize this matrix numerically. One restriction of this method is that it can only deal with a small-size system. Inspired by the nu- merical renormalization group approach, the density matrix renormalization group method[3] is proposed in 1992, which is used to determine the ground state energy of one-dimensional quantum systems. The DMRG method is a very successful numeri- cal method for quantum many-body physics, and it was used to verify the Haldane’s conjecture and accurately to calculate the excitation gap of the spin-1 antiferromag- netic Heisenberg model. However, this method is inefficient when applied to higher dimensional systems. Another method, the quantum Monte Carlo approach, also suf- fers greatly from the notorious sign problem and is usually not applicable to a generic model.

The development of the tensor network theory provides us new possibilities. After realizing the relation between DMRG and the Matrix Product States(MPS), the MPS and the Projected Entangled Pair States[4] had been successfully used to acquire the ground state energy of a quantum system. The success of the MPS states gave rise to the tensor network theory. More recently, neural network theory was successfully used to solve the many-body problem in the context of variational Monte Carlo approach[5].

3 These pioneering works set up new paradigms, and in this dissertation, I will continue to explore these directions, which is, applying tensor network and neural network theory to physical systems.

Motivated by the intuitiveness and success of the 2D tensor renormalization group approach (TRG)[6], in this dissertation, a way of generalizing the tensor renormal- ization group to higher dimensions is first proposed. Secondly, following the idea of using a tensor to represent a quantum state, an efficient method of calculating the geometric measure of entanglement[7] using tensor decomposition methods is sug- gested. Finally, inspired and motivated by the success of machine learning theory, neural network methods are used to solve for the ground state of a quantum system.

Useful numerical methods for many-body systems and machine learning methods are also discussed for completeness.

This dissertation contains published journal articles from Ref. [8][9] and preprint from Ref. [10]. Copyrights of these chapters are credited to the corresponding journal.

This dissertation is organized as follows. In Chapter 2, numerical methods for many-body systems are reviewed. In Chapter 3, tensor renormalization group (TRG) is generalized. In Chapter 4, the geometric measure of entanglement using tensor decomposition methods is proposed. In Chapter 5, machine learning methods are discussed. In Chapter 6, the radial basis function network on a discrete basis is used as the variational wavefunction to solve for the ground state of quantum systems.

4 Chapter 2: Numerical methods for many-body systems

2.1 Exact Diagonalization

2.1.1 Methodology

Exact diagonalization is one of the most accurate methods for solving a quantum

or quantum many-body system. The idea of exact diagonalization is very simple,

which is to represent the Hamiltonian of a physical system by a matrix and to solve

the matrix eigenvalues and eigenvectors mathematically. No approximations and

truncations are made in the exact diagonalization method, so we can acquire the

exact ground state energy and the ground state of our system. The main restriction

of exact diagonalization method is that it can only deal with small system size since

the dimension of the Hilbert space and the Hamiltonian matrix grows exponentially

1 with the system size. Practically, the maximum system size of a quantum spin- 2 Heisenberg model that exact diagonalization method can solve is about 40 sites on a

supercomputer.

Historically, exact diagonalization calculation was first carried out by Bonner and

1 Fisher[11]. In their work, spin- 2 model was studied with the number of sites ranges from 2 to 11. Many later studies were carried out with the development of compu- tational technology. Currently, exact diagonalization method is still one of the best

5 benchmark methods, and it gives us a glimpse on the properties of a system. In this

1 section, the methodology of exact diagonalization will be illustrated using the spin- 2 Heisenberg model.

1 The Hamiltonian of the spin- 2 Heisenberg model is given by

n X H = J SiSi+1. (2.1) i=1

Here J is the coupling constant, n is the number of sites in this system. Si is the spin operator for site i. We can identify n+1 as 1 for a system with periodic boundary condition. For each spin, we can take the local basis to be spin up and down, which can be labeled by |0i and |1i , respectively. Then, a random element in the basis for

1 the spin- 2 chain can be written as |010011i, for example. It can be interpreted as a binary number, so we can label the basis of the spin chain as |0i ↔ |000 ... 000i,

|1i ↔ |000 ... 001i, |2i ↔ |000 ... 010i,..., |2N − 1i ↔ |111 ... 111i. This is called

1 the bit representation of spin- 2 systems. For higher spin systems, we always can use the same idea (trinary representation for a spin-1 system, for example). For other systems, we can discretize and label the basis in the same way and carry out the exact diagonalization calculation. After this labeling step, the matrix elements of the

Hamiltonian can be written as

Hmn = hm|H|ni. (2.2)

6 N Here m and n =0, 1,..., 2 − 1, and Hmn is the matrix element at the m-th row

1 and n-th column. For the spin- 2 Heisenberg model, we can rewrite the Hamiltonian as

n X 1 1 H = J (SzSz + S+S− + S−S+ ). (2.3) i i+1 2 i i+1 2 i i+1 i=1 The matrix element can be calculated as follows. Looping over all the matrix

z z basis |mi. For the diagonal element of Hmn, consider the interaction term Si Si+1, if the i-th bit of the binary representation of |mi equals the (i+1)-th bit of the binary

1 representation of |ni, we add 4 to the existing value of Hmm, this is because the i-th bit and the (i+1)-th bit have the same sign, so after applying the spin-z operator we

1 have a value of 4 ; if the i-th bit of the binary representation of |mi is different from

1 the (i+1)-th bit of the binary representation of |mi, we subtract 4 to the existing

value of Hmn, since the spin-z operators will yield a different sign. Summing over

all the possible bounds we will get the value of Hmn. For the off-diagonal matrix

z z elements, the Si Si+1 interaction doesn’t contribute, so we only need to consider the

1 + − 1 − + contribution from the interaction term 2 Si Si+1 + 2 Si Si+1. This interaction term will only be nonzero when the i-th bit and the (i+1)-th bit of the binary representation of

|mi are different. One and only one term will swap the spins at the i-th bit and the

(i+1)-th bit, while the other term evaluates to zero. Assume the swapped stated is

1 labeled by |ni, we should assign Hmn = 2 . Looping over all the basis we can construct

1 the Hamiltonian matrix of the spin- 2 Heisenberg model. After the Hamiltonian matrix is constructed, which is, in general, a sparse matrix

for a physical system, we can mathematically diagonalize this matrix and get the

ground state energy and the ground state. Practically this can be done by the Lanczos

7 algorithm[12], which is proposed by Cornelius Lanczos in 1950. Although initially, the

Lanczos algorithm suffered from numerical instability issue, it was later on modified and optimized. Currently, the Lanczos algorithm is one of the most widely used sparse matrix eigenvalue solvers. With the already construct matrix libraries, sparse matrix eigenvalues can be easily and accurately solved in C++ or MATLAB.

2.1.2 Numerical implementation of exact diagonalization method 1 for spin-2 models

In this section, a detailed implementation of the exact diagonalization method will be discussed. The results are carried out with a combination of C++ and MAT-

LAB. Specifically, the Hamiltonian is computed and saved in C++ for fastness, then transferred to MATLAB for its specialization in matrix calculation.

One of the major restrictions of the exact diagonalization method is the memory, or put it in another way, the storage of the Hamiltonian matrix. We can imagine a

Heisenberg type model with N sites and M bonds. For a 1d Heisenberg chain with the periodic boundary condition, we have N=M, although N and M might be different for other models, for example, spins on a 2d honeycomb lattice. The number of nonzero elements in the Hamiltonian matrix approximately equals 2 ∗ M ∗ 2N , which can be calculated by simple counting. For a 1d Heisenberg model with 30 sites, with the storage cost of each byte set to be 8 Bytes, we need approximately 480GB of storage.

While for a model with 40 sites, this number is 640TB with is around the current storage limit of a supercomputer. Construction of such large matrix in MATLAB is extremely slow, so C++ is used to generate such matrix.

Practically, the bonds in the Hamiltonian is first constructed as a two-dimensional array. The basis of the many-body Hilbert space is looped over, during which the

8 decimal labels are converted to binary numbers. Inside the basis loop, the bonds are

also looped over to construct the Hamiltonian matrix. The algorithm of the con-

struction of Hamiltonian is described in the previous section. After this construction,

this matrix is saved to a file and transferred to MATLAB for diagonalization. In

MATLAB, the data imported is used to generate a sparse matrix and eigs() is used

to diagonalize the Hamiltonian, in which the Lanczos algorithm is implicitly used.

Eigenvalues and eigenvectors are the outputs of the eigs() function. After this step,

we can calculate the density matrix, reduced density matrix, or expectation value

1 of observables. The ground state energy converges to 0.4438 for spin- 2 Heisenberg model when the number of sizes is increased. The dimension of the system and the structure of the lattice can be encoded in the bonds, since we can map the lattice into a 1d chain and the interaction in the other dimensions can be converted to long-range interaction.

2.2 Density Matrix Renormalization Group

2.2.1 Methodology

The Density Matrix Renormalization Group (DMRG) was first introduced by

Steven White in 1992[3], which was successfully used to compute the ground state energy for a one-dimensional quantum system. Compared with exact diagonalization which can only solve a system with a small number of sites, DMRG can tackle systems with a practical number of sites, such as 100 sites. The DMRG method was inspired by the previous numerical renormalization group method[13], which successfully solved the Kondo problem while failed to acquire accurate results for other models. The

9 DMRG method rephrases the renormalization process in terms of the density matrix, so it generalizes the previous numerical renormalization group approach.

To start with, we first need to introduce the DMRG for an infinite system. The infinite system DMRG is also an essential step in a finite system DMRG. Consider a

Hamiltonian on a lattice given by

n X H = J OiOi+1 + hOi. (2.4) i=1

Here Oi is an operator that acts on site i, J and h are coupling constants. This is a general Hamiltonian where Oi can be any local operator. We are using this

Hamiltonian for illustration purpose, while DMRG can be applied to other systems that cannot be described by this Hamiltonian.

To use the infinite-system DMRG algorithm, we start from a block consists of one site, assuming the dimension of the Hilbert space of a single site is D, our goal is to enlarge this block in a systematical way. In general, from a mathematical induction perspective, we can start from a block with n sites and dimension m, our goal is to enlarge this block with n+1 sites while keeping the dimension of the Hilbert space well truncated, for example, the dimension of the enlarged space is still m. To do this we enlarge the n-site block by adding one site into the n-site system. The Hamiltonian of the enlarged block can be written as

Henlarge = Hblock + Hsite + Hinteraction. (2.5)

The meaning of all the Hamiltonian is explained in their subscripts. Hblock is the

Hamiltonian of the block that we begin with. Hsite is the single site Hamiltonian, and

Hinteraction is the interaction term between the block and the added site. Furthermore,

10 we can construct a super block by adding the enlarged block with a mirror reflection symmetrical enlarged block Hblock0 . The Hamiltonian of the super block system can be written as

Hsuper = Hblock + Hblock0 + Hsitesite. (2.6)

Here Hsitesite is the interaction term between two added sites.

After acquiring the matrix representation of Hsuper, we can numerically diagonalize this matrix. Assuming the ground state of this Hsuper is |ψsni, which can be solved by numerically diagonalize this super block Hamiltonian. And we can construct the reduced density matrix of the enlarged block ρenlarge = T rblock0 |ψsnihψsn| . The most important step of the DMRG method is the truncation of this reduced density matrix.

We can do a singular value decomposition on this matrix and keep only m largest singular values. When the block is growing at the beginning, we may not have enough m. What we could do is to take the maximal dimension which is smaller than m.

After this truncation using the singular value decomposition, we rotate into a new set of truncated basis, therefore, we can represent the new enlarged block Hamiltonian in terms of the new basis,

0 † Henlarge = A HenlargeA. (2.7)

Here A is the matrix that rotates from the old basis to the new truncated basis.

Other operators can be rotated in the same way.

Iterate over this step, the block size will increase without increasing the compu- tational cost, therefore, we could get the ground state energy of an infinite system.

This process is illustrated in Figure 2.1.

11 Figure 2.1: The infinite-system DMRG This figure is credited to Ref. [14]

The infinite-system DMRG sometimes doesn’t yield the desired accuracy, to rem- edy this issue, we have the finite-system DMRG procedure.

The finite-system DMRG algorithm starts with the infinite-system DMRG. Specif- ically, a superblock with the desired length is constructed using the infinite-system

DMRG algorithm. After a superblock is constructed, a sweep procedure is introduced to increase the accuracy.

We start from a superblock after infinite system DMRG, we increase the left block by one site while reduce the right block by one site to keep the total site number

fixed. During the infinite system DMRG procedure, we should store all the block

Hamiltonians with a certain block site number to construct all the right blocks for

12 the sweeping. The superblock Hamiltonian, the ground state, and reduced density matrix can be constructed in the same way as the infinite-system DMRG algorithm.

The sweeping procedure continues by enlarging the left block and shrinking the right block. The sweeping direction reverses when the right block has only one site. After several sweeps, the DMRG algorithm will converge, and we acquire the ground state energy and the ground state. In general, this DMRG algorithm is designed for open boundary condition.

The finite DMRG process is illustrated in Figure 2.2.

Figure 2.2: The finite-system DMRG This figure is credited to Ref. [14]

Good review articles of DMRG could be found at Ref.[14][15]

13 2.2.2 Numerical simulations using Density Matrix Renor- malization Group

One popular tool for simulating strongly correlated systems is the ALPS (Al- gorithms and Libraries for Physics Simulations) project[16]. This library contains solvers for exact diagonalization, Density Matrix Renormalization Group, quantum

Monte Carlo, and the Matrix Product States etc. The ALPS project uses the XML script language to set parameters because of the wide application of the XML lan- guage on the internet. The XML language provides a way to specify the lattice and the model. One advantage of the ALPS project is that once the model and lattice are specified, we can run the simulation using different solvers and compare the re- sults. This would cross-check the validity of our simulation. The ALPS project is also optimized for parallelization and workflow management, which makes it a useful numerical tool of the strongly correlated systems.

Using DMRG function in the ALPS project is straight forward. After we set up the parameters, which include the model type, lattice type, number of sweeps, the maximal number of states to keep, conserved quantum numbers etc. DMRG can be done by calling a single ”dmrg” function. In this way, the ground state of a 1d model can be found with little error. Excited state energy and energy gap can also be found by tracking the excited states in DMRG. The DMRG method in ALPS can also be used to track local observables and correlation functions, which can be specified in the parameter file. The ALPS project provides an easy and organized way to test out new ideas.

Another package for the DMRG simulation is the ITensor library developed by

Miles Stoudenmire[17]. Compared to the ALPS project, the ITensor library is written

14 in C++, and it provides much more flexibility in coding than the ALPS project. The

ITensor library can be used to do tensor network calculations. ITensor objects can be

constructed and contracted intuitively without writing out the indices explicitly. This

gives great convenience in terms of programming. The DMRG function in ITensor is

written in the Matrix Product States context. In ITensor, the Hamiltonian can be

added in a systematic way using the Matrix Product Operator by adding the site-

site connections. After constructing the Hamiltonian and setting up the parameters,

DMRG calculations can be carried out easily. Local observables and correlation func-

tions can be constructed in terms of the MPS and physical properties of our system

can be acquired. The MPS formulation also enables us to simulate 2-dimensional

systems with small dimensions by mapping the 2-D system into a 1-D system, in

which higher dimensional bonds are converted to long-range connections.

2.3 Tensor Network and Matrix Product States

2.3.1 Tensor Network

In Figure 2.3, we illustrate a network of tensor T s live on the nodes, and common

indices between tensors are represented by bonds.

If we have two tensors Bβα, Bαγwith a bond α, we can contract this bond by

X Aβγ = BβαBαγ. (2.8) α

The entire tensor network represents a tensor with bond indices contracted. In general, we can use the Einstein summation convention to omit the summation sym- bol. The external legs of nodes correspond to the legs of a tensor.

15 Figure 2.3: Tensor network.

A quantum many-body wavefunction can be written as

X |ψi = Ci1i2...in |i1i|i2i...|ini. (2.9)

In practice, this tensor Ci1i2...in is usually too large to be represented, so it would be interesting if we could represent this tensor in terms of a network of smaller tensors.

This will capture the entanglement structure of the ground state of a quantum many- body system. Also, it will facilitate the calculation. This idea leads to the Matrix

Product States (MPS) and Projected Entangled Pair States (PEPS).

Tensor network can also be used to represent the partition function of a system which leads to tensor renormalization group approach in the next chapter. Good reviews of the tensor network theory could be found in Ref.[18][4].

16 2.3.2 Matrix Product States

The Matrix Product States[19] can be written as

X i1 i2 in |ψi = T r(A1 A2 ...An )|i1i|i2i...|ini. (2.10)

Here Ak can are matrices with indices ik. Theoretically, Any one-dimensional quantum state can be approximated by a Matrix Product State given enough di- mensions of the matrix. Some lattice model can have an exact matrix product state representation.

This representation is a theoretical representation of a given state, to do numerical calculations we need other approaches.

One approach is the variational optimization. The goal is to minimize the varia- tional energy

hψ|H|ψi E = (2.11) hψ|ψi

The variational wave function can be represented as an MPS state.

Another approach is the imaginary time evolution. The ground state is given by

e−τH |ψ(0)i |Gi = lim (2.12) τ→∞ ||e−τH |ψ(0)i||2

The term e−τH can be decomposed into infinitesimal time, and the ground state can be acquired by acting infinitesimal evolution operator to the matrix product states. This approach gives rise to the time-evolving block decimation (TEBD) algo- rithm.

17 For a two dimensional system, we can similarly write

X i1 i2 in |ψi = Contraction(A1 A2 ...An )|i1i|i2i...|ini. (2.13)

The ”contraction” means contracting over a tensor network. This is the Projected

Entangled Pair States (PEPS)[20].

2.4 Quantum Monte Carlo methods

2.4.1 Introduction

The Monte Carlo methods are one of the most used methods in science and engi-

neering. As a result, the Quantum Monte Carlo methods are also helpful when used

to solve quantum mechanics problems. In this chapter, we’ll discuss different ways of

doing quantum Monte Carlo simulation, such as variational Monte Carlo, diffusion

Monte Carlo etc.

One important application of the Monte Carlo method is to evaluate integration.

For example, give a definite integral

Z b I = f(x)dx. (2.14) a

This integral can be approximated by

X I = [(b − a)/n] f(xi), (2.15) i

where n is the number of elements in the sum. And xi are uniformly sampled from [a, b]. This method can be modified to assign weights on each f(xi) to improve √ performance. The error decays as 1/ n.

18 More generally, we can rewrite the integral as

Z b I = g(x)p(x)dx, (2.16) a where f(x) = g(x)p(x), where p(x) is a probability distribution function.

If xi are sample with probability p(x), we have

X I = (1/n) g(xi). (2.17) i

2.4.2 Variational Monte Carlo (VMC)

The VMC method was proposed by McMillan in 1965[60]. It is one important branch of the quantum Monte Carlo Method.

The energy of a quantum system can be written as

Hψˆ (X; λ) R |ψ(X; λ)|2 dX hψ(λ)|Hˆ |ψ(λ)i ψ(X; λ) E(λ) = = , (2.18) hψ(λ)|ψ(λ)i R |ψ(X; λ)|2dX

with a Hamiltonian Hˆ and a variational wave function |ψ(λ)i. λ is a set of

variational parameters, where X is one configuration of the quantum system. The

local energy is

Hψˆ (X; λ) E (X; λ) = , (2.19) local ψ(X; λ)

The term

|ψ(X; λ)|2 , (2.20) R |ψ(X; λ)|2dX

19 can be interpreted as a probability distribution. And the local energy can be

sampled with this probability distribution to get a good estimate of the ground state

energy using, for example, the Metropolis algorithm[61].

A common choice of the variational wavefunction for interacting atoms is the

Jastrow function[62]

N X ψ = exp( −u(rij)), (2.21) i

Where u(rij) is a function with parameters to optimize. For a fermionic system, considering the exchange statistics, we have

N X ψ = D × exp( −u(rij)). (2.22) i

A good review of the VMC and DMC method can be found in Ref.[63].

2.4.3 World Line Monte Carlo

Given the partition function of a quantum system

Z = T re−βH (2.23)

The world line Monte Carlo starts dealing with this partition function by defining

β = L∆τ,

Z = T r[e−∆τH ]L. (2.24)

Using the Suzuki-Trotter transformation,

lim [e−∆τ(H1+H2+...Hn)]L = [e−∆τH1 e−∆τH2 ...e−∆τHn ]L. (2.25) L→∞ 20 And H = H1 + H2 + ...Hn, where Hi may not commute with each other. The

partition function can be written as

Z ≈ T r[e−∆τH1 e−∆τH2 ...e−∆τHn ]L. (2.26)

We can insert basis to this partition function, and this partition function can be understood as the partition function of a classical system with a higher dimension.

The Monte Caro can be done by sampling over the configurations of the classical system. This is the world line Monte Carlo.

21 Chapter 3: Generalization of the Tensor Renormalization Group method.

3.1 Introduction

Tensor network model has become a promising method in simulating classical and quantum many body systems. This method represents physical quantities, such as, wave-function, or the exponential of a Hamiltonian, in terms of a multi-indexed tensor. Then we can calculate, physical observables, or partition functions, from a network of tensors. After contracting over this network, we can get physical behavior of our many body system. Examples of this approach is the matrix product state

(MPS)[19][21] and projected entangled paired states (PEPS)[20].

The density matrix renormalization group (DMRG) [3] is a powerful method for

1-D quantum systems. For systems in dimensions larger than 1, the DMRG algorithm is known to scale exponentially with the system size. The tensor network correspon- dence of the DMRG, which is MPS, can be generalized to higher dimensions. Other generalizations such as multi-scale entanglement variational ansatz (MERA)[22] are also key aspects of the tensor network theory.

Compared with quantum Monte Carlo, which suffers from the sign problem, tensor network theory provides us a new way of doing calculations. Direct contraction of a

22 tensor network, however, is not always possible. As a result, finding an organized way to approximate and contract a tensor network is an important aspect of the tensor network method. For example, we can group together some tensors systematically and contract some of our tensors and get a new coarse-grained tensor. The new tensor network shares the same symmetry with the original tensor network. This idea was explored by Levin and Nave[6]. They proposed this method for 2-D classical lattice models and use singular value decomposition (SVD) to do approximations.

Their method has a similar spirit with the block spin method[23], a and they call this method tensor renormalization group(TRG). It can be generalized to the so-called second renormalization group (SRG)[24], tensor network renormalization(TNR)[25], higher-order singular value decomposition (HOSVD)[26]. The way of contracting over the tensor network can also be applied to quantum models using the mapping between a d-dimensional quantum system and d+1 dimensional classical system[27].

Novel decompositions such as rank-1 decompositions were also proposed to tensor network theory[28].

TRG proposed by Levin and Nave is for 2-D classical systems. For a higher dimensional system, especially 3-D, calculations had been done, for example, as a variant of DMRG[29], as a new contraction strategy[30], and HOSVD[26]. Among these methods, HOSVD shares some similarities with TRG. Mathematically, HOSVD uses Tucker decomposition, which is a specific higher-dimensional generalization of

SVD. But we should notice that when their method is applied to a 2-D system, the geometric structure of the contraction is different from TRG. Therefore it is necessary for us to consider the generalization TRG to higher dimensions.

23 In this paper, we propose a framework to do contraction systematically on tensor networks in higher dimensions. We also generalized TRG to higher dimensional tensor networks. To achieve this goal, we introduce the canonical polyadic decomposition

(CPD) into the tensor network method. The concept of tensor rank can also be defined using CPD, therefore, CPD is also called tensor rank decomposition. Our method reduces to 2D-TRG when it is applied to a 2-D tensor network, which is a result of the fact that SVD is the 2-D version of CPD.

We apply our method to a 3-D cubic tensor network. The concept of the dual tensor network is also proposed. To make the contraction process iterate, the tensor network has to go from the original tensor network to its dual tensor network, then dual back. The dual of the dual tensor network has the same geometric structure with the original tensor network.

Mathematically, we propose a correspondence between TRG and the concept of truncation sequence in polytope geometry. The tensor network transforms in the same way as the truncation of a polyhedron(polytope in 4-D or higher). And the tensor

CPD geometrically correspond to the truncation of the corner of a polyhedron. TRG in 2-D can also be understood in the framework of dual tensor network and truncation sequence.

This part is organized as follows. Section 3.2 reviews the tensor renormalization group method in 2-D. Section 3.3 generalizes the concept of TRG to 3-D, CPD is introduced and details about the renormalization process are discussed. Section 3.4 shows some simulation results about this 3-D tensor renormalization group method.

Section 3.5 proposes the similar method in higher dimensions. Section 3.6 discusses some applications and problems with this method.

24 Regarding the terminology, we need to mention some points that may be am- biguous. The word ’truncation’ means either the numerical truncation of the singular values of a tensor(matrix) or the geometric truncation of a polytope. The word ’honey- comb’ means either the 2-D honeycomb lattice or of a higher dimensional space.

3.2 Tensor renormalization approach for a 2-D system

3.2.1 Classical Ising model and tensor network

Let’s first review 2-D TRG and discuss its geometric meanings. For a classical lattice system, one can find its tensor network representation. For example, both triangular and the kagome lattice can be mapped to a honeycomb tensor network.

A honeycomb tensor network may correspond to two Ising model: (1) a triangular lattice Ising model, see Figure 3.1. (2) a kagome lattice one, see Figure 3.2.

Figure 3.1: The triangular Ising lattice and its tensor network.

By connecting the centers of the triangles for both lattices, we get a honeycomb tensor network. In these Ising models, the spins live on the vertices and interactions

25 Figure 3.2: The kagome Ising lattice and its tensor network.

lives on the lines, while for the tensor network, each tensor corresponds to a triangle

and is represented by Tijk. The three indices of the tensor Tijk correspond to three spins of the Ising model.

The partition function of a system can be represented by

X −βH(σ) X Z = e = TijkTjpqTkabTkmn.... (3.1) spins indices −1 Here β = (kbT ) , H(σ) is the Hamiltonian of the Ising model, and kb is the

Boltzmann constant.

X X H(σ) = − Jσiσj − µh σi. (3.2) i 1 For example, for the kagome lattice, we have a factor of 2 in front of h, since each spin is shared by two triangles.

1 βJ(σiσj +σj σk+σkσi)+ βµh(σi+σj +σk) Tijk = e 2 . (3.3)

26 This tensor can be regarded as our initial tensor when specific values of β, µ J and h are given. Iterations can be done based on this tensor.

3.2.2 Contraction of a 2-D honeycomb network

The physical properties of a classical spin system can be calculated from the partition function. Direct calculation of the partition function is a very difficult task.

Monte Carlo method is an efficient way for classical systems. Here we’d like to use

TRG to calculate physical properties.

First, separate the tensors in the partition function into nearest pairs and find tensor S so that

X X TijmTmkl ≈ SlinSjkn. (3.4) m n This step can be graphically represented as (shown in Figure 3.3)

Figure 3.3: Singular value decomposition (SVD).

27 This step is achieved through singular value decomposition(SVD) by setting

Mli,jk = TijmTmkl. (3.5)

and finding a matrix S which minimizes ||M − SST ||. SVD of M gives a diagonal matrix d and two unitary matrices U,V .

X ∗ Mli,jk = dnUli,nVjk,n. (3.6) n √ √ A B ∗ The S matrices can be get by setting Slin = dnUli,n, Sjkn = dnVjk,n, up to a phase factor. Here we take the largest values of dn in order to match the dimension of the matrices. Two S matrices can be equal as long as T matrix has certain symmetries and proper unitary phase factor is selected.

Then we contract three S tensors into a tensor T 0, as is shown in Figure 3.4.

0 X Tijk = SiabSjbcSkca. (3.7) pqr 1 After these steps, the number of spins in a system is reduced by a factor of 3 . The tensor is mapped from T to T 0. When the initial temperature and the magnetic field is fixed, this method provides us an organized way of approximating and contracting a tensor network. After several steps of iterations, for a finite system, we’ll get a contracted tensor T ∗. Trace over this tensor, we’ll get the value of the partition function. For an infinite system, after proper normalization, this iteration will lead

0 to a tensor Tf , which is the fixed point of the mapping from T to T , although it is not necessarily the critical point tensor of the system. The free energy per spin also converges under iterations.

28 Figure 3.4: SVD and contraction (honeycomb). Iterating contraction of the honeycomb tensor network. The middle picture is the truncated .

Here we’d like to introduce some mathematical concepts. The word tiling or tessellation, means using one or more shapes to fill a 2-D plane without overlapping and spacing. A tiling is called regular when the tessellation is done using only one type of regular polygon. There are three regular tilings in 2-D Euclidean plane: , , and hexagonal tiling. The tiling in higher dimensions is usually called a honeycomb. Truncation means cutting the corner of a polygon or a polyhedron. We’ll get a new edge or a new face after truncation. The remaining shape depends on the size of the truncation, and this dependence is called a truncation sequence.

From a geometric point of view, a honeycomb lattice can be viewed as a hexagonal tiling of a 2-D plane. The SVD step actually gives rise to a truncated hexagonal tiling of a 2-D plane(see Figure 3.4 middle). After summation, we get back the hexagonal tiling. This means a hexagon truncates to itself through a 12-gon(dodecagon). The kagome lattice corresponds to a .

29 3.2.3 Contraction of a 2-D square tensor network

The TRG iteration for a square lattice is similar. In a square lattice, we have a

tensor with 4 indices, we can write this tensor as T(ij)(kl).

T(ij)(kl) ≈ S(ij)rS(kl)r. (3.8)

Treat this tensor as a matrix and do an SVD on this matrix. We can get two tensors denoted by S. This step can be graphically represented as (see Figure 3.5)

Figure 3.5: Singular value decomposition (SVD) for a square tensor network.

After SVD, our tensor network now consists of a tessellation of squares and oc-

tagons. Summing over every little square, we get our new tensor network T 0, see

Figure 3.6 . This process will iterate and we can calculate the value of the partition function.

We’d like to reconsider TRG from the aspect of tessellation geometry. Square tiling is another in 2-D. SVD gives rise to a truncated square tiling.

The contraction gives back the square tiling. When thinking in terms of tessellation, we discover that T 0 is actually the dual tensor network of the square tensor network

30 Figure 3.6: SVD and contraction (square). Iteration of the square tensor network. Middle picture is the truncated square tiling.

T . The tricky part is that, for 2-D, the dual of a square is still a square, therefore, we

may not discover the subtleties in 2-D. Things are not so simple in higher dimensions

since the dual of a cube is no longer a cube. Our framework of higher dimensional

TRG reduces to the square case when applied to a 2-D system. We’ll explain the

concept of a dual tensor network in details later.

Triangular tiling is the third type of tiling which consists of only one shape.

A triangular tensor network can be converted to a hexagonal tensor network using

Tucker decomposition, see Figure 3.7. Therefore, we have discussed the contraction technique for all the possible regular tensor network in 2-D for completeness.

3.2.4 Free energy calculation for a 2d kagome Ising model

The TRG contraction technique can be applied to both the finite lattice and the infinite lattice. Applications to a finite lattice system have been discussed in earlier papers, for example, HOSVD. In this part, we’d like to use TRG to calculate the physical properties of an infinite lattice.

31 Figure 3.7: Converting a triangular tensor network to a hexagonal tensor network. Converting a triangular tensor network (black) to a hexagonal tensor network (blue). Tucker decomposition and contraction.

Now let’s discuss an example of how to calculate the physical properties as a

function of the temperature and the magnetic field. For simplicity, we assume that

the spin takes value -1 or 1, so each index of tensor Tijk can have two values. Then

this tensor can have 8 variables. For our kagome Ising lattice, the Hamiltonian is

symmetric about the permutation of a, b, and c. So there are actually 4 variables and

the matrix M is symmetric, so tensor SA and SB are the same.

Following the steps mentioned above, i.e. SVD and contraction, we can get the

tensor T 0. This tensor actually contains the information of 3 tensors in the previous

step. After several iterations, this tensor will get larger and larger. For an infinite

system, it will diverge. To avoid divergence, we should normalize this tensor after

each iteration. One of the choices of the normalization factor is the largest value of √ dn. There are some subtle parts about the selection of this scaling factor, we will discuss it in detail in the 3-D case, see section 3.4.5. Similar normalization factor had been noticed by[31].

32 To calculate the partition function, we assume this system have N tensors. Be-

cause of the geometry of a kagome lattice, N tensors correspond to 1.5N spins.

X −βH(σ) X Z = e = TijkTjpqTkabTkmn.... (3.9) spins indices

Next step is to separate these N tensors into pairs and do SVD and contraction.

The details of this part have been discussed previously.

Let’s denote the original tensor by T0 and the contracted tensor by T1. Due to the translational symmetry of the tensor network, this tensor is the same everywhere in the tensor network. Furthermore, let’s the normalization factor by f1. We have N

N/3 tensors, so the total normalization factor should be f1 , so after one step of iteration

XY N/3 XY Z(β, h) = T0(β, h) = f1 T1(β, h). (3.10) N N 3

Denote the normalization factor of the second iteration by f2, and continue this

iteration, we can get

XY N/3 XY Z(β, h) = T0(β, h) = f1 T1(β, h) N N 3 N N/3 32 XY = f1 f2 T2(β, h) = ... (3.11) N 32 In the end, we’ll get

N N N/3 32 3n Z(β, h) = f1 f2 ...fn .....T r(Tf ), (3.12)

0 Tf is the fixed point of the mapping from T to T . The free energy can be

calculated as

33 1 F (β, h) = − lnZ(β, h) (3 × 1.5N)β 1 1 1 1 = − (lnf + lnf + .... lnf + ... + T r(T )) (3.13) (4.5)β 1 3 2 3n−1 n N f

The factor of 1.5 comes from the kagome lattice. If N goes to infinity,

1 F (β, h) = − lnZ(β, h) (3 × 1.5N)β 1 1 1 = − (lnf + lnf + .... lnf + ...) (3.14) (4.5)β 1 3 2 3n−1 n

These factors are also a function of temperature and magnetic field. One impor- tant thing is that, for an infinite lattice, i.e. when N goes to infinity, this factor has to be carefully chosen so that the limiting value of f neither converges to zero nor

diverges. We assume this limit exists and the limiting f is a nonzero value at almost every point of the parameter space of t and h, except for the critical point.

Numerically, we could evaluate the free energy with and without a small magnetic

field. By taking the numerical differentiation with respect to h, we can get the

magnetization as a function of T .

3.3 Tensor renormalization approach for a 3-D system

3.3.1 Dual polyhedron

In this section, we consider the problem of how to contract a 3-D cubic tensor network under tensor RG. The concept of dual polyhedron[32] can be used to

illustrate this process.

For some polyhedrons in 3 dimension, the dual polyhedron can be defined by

converting each face into a vertex and each vertex into a face. Therefore, for a cube,

34 the dual polyhedron will be an octahedron. Again by converting each face of the octahedron into a vertex, we will get a cube. This process is illustrated in Figure 3.8.

The dual of a dual polyhedron is the polyhedron itself.

Figure 3.8: Dual polyhedron for a cube. The dual polyhedron of a cube is an octahedron. The dual polyhedron of an octahedron is a cube.

3.3.2 Tensor renormalization group (TRG)

Let us apply the concept of dual polyhedron to our cubic tensor network. For the

first step, we can separate the cubic tensor network by grouping together 8 tensors of one cube and transform it into an octahedron of 6 tensors. For the second step, for each tensor of the octahedron, we can do an SVD, then sum over the octahedron, we’ll get our renormalized tensor T 0. This process is shown in Figure 3.9.

35 Figure 3.9: Graphical representation of the Tensor RG process, from T to T 0. We start from a cubic tensor network T . Eight T tensors form a cube and this cube has 24 external tensor legs. We can view our space as a stacking of this tensor cube. Our first step is to change from the T tensor network to the S tensor network(we have marked the relative positions between T and S. Same type of tensor is marked with the same color.). The external legs of the T tensors form part of a cube which surrounds tensor S, and this cube is contracted to cube S. The second step is to contract 6 S tensors to construct a new tensor T 0. Detailed implementation of these two steps are explained in section 3.3.4 and3.3.5.

36 We start with a partition function which can be represented by a cubic tensor network. X ZIsing = TabcxyzTab0c0x0y0z0 Tabc0x0y0z0 .... (3.15) indices The goal is to find an iterative way to simplify this partition function and write it in terms of T 0.

X 0 0 0 ZIsing = TabcxyzTab0c0x0y0z0 Tabc0x0y0z0 .... (3.16) indices

3.3.3 Canonical polyadic decomposition (CPD)

6 In our tensor network, each tensor can be written as Tabcxyz, which have D ele- ments, where D is the bond dimension. Our starting element of TRG is a cube which has 8 tensors. For each tensor, we label outgoing indices(with respect to the cube) and by abc and the internal indices are labeled by xyz. Therefore we can change the tensor from a tensor with 6 indices Tabcxyz to a tensor with 3 indices T(ax)(by)(cz) = Tmnp, see

Figure 3.10, where (ax) is treated as one index . This tensor has (D × D)3 elements.

Notice that we are just relabeling the tensors in order to do decompositions, so ax and m are equivalent.

Now we introduce a new way of transforming tensors in tensor network theory.

This method is called ”tensor rank decomposition” or ”canonical polyadic decomposition (CPD)” We call it CPD for short in this article. It can be regarded as a tensor generalization of the widely used singular value decomposition (SVD).

For a three-way tensor, CPD can be written as

X Tmnp = λrArm ⊗ Arn ⊗ Arp. (3.17) r 37 Figure 3.10: Relabel Tabcxyz to Tmnp (ax) ↔ m,(by) ↔ n,(cz) ↔ p

This can be regarded as a three-dimensional generalization of the SVD, which, for a matrix, is

X ∗ Tmn = λrArm ⊗ Arn = USV . (3.18) r

S is the singular value matrix and the singular values are λr, U and V correspond to Arm and Arn.

The rank of an n-way tensor can be defined as the minimal value of r where the following expression is exact.

X Tmnp...z = λrArm ⊗ Arn ⊗ Arp · · · ⊗ Arz. (3.19) r We can do a minimal square fitting for any integer value of r. Rank corresponds to the minimal r when this composition is exact. When r is larger than the rank of

38 the tensor, we may have multiple solutions. When r is smaller than the rank, we can

fix r and find the least square approximation by minimizing ||T − M||2.

X Tmnp...z ≈ λrArm ⊗ Arn ⊗ Arp · · · ⊗ Arz = M. (3.20) r A detailed review of tensor decomposition could be found in[33]. Currently, there

is no good way to find the rank of an arbitrary tensor[33]. For a three-way tensor,

however, an upper and lower bound can be set as[34]

max(I, J, K) ≤ rank(Tmnp) ≤ min(IJ, JK, KI). (3.21)

Here Tmnp is a I × J × K array. For example, for a 4 × 4 × 4 tensor T, which is

the tensor that we’ll use in our calculation, the rank of this tensor should be between

4 and 16.

We should also notice the difference between CPD and higher order singular value

decomposition(HOSVD). There are two major ways to decompose a tensor, one is

CPD, the other is HOSVD (Tucker decomposition). These two techniques are already

used in subjects like signal processing etc. Compared with CPD, HOSVD can be

written as

X Tmnp...z = λαβγ...ωAαm ⊗ Aβn ⊗ Aγp · · · ⊗ Aωz. (3.22) r We can see in CPD λ are numbers while in HOSVD λ is a tensor. CPD and

HOSVD can be regarded as two different ways of generalizing matrix SVD. The difference is how to understand the singular values (numbers or matrices) in an SVD.

Applications of HOSVD to Ising models can be found in[26].

39 3.3.4 TRG in detail: From cube to octahedron

The idea of this process can be get from the process of changing a cube to an octahedron shown in Figure 3.11. We can think of the tensor network as the stacking of these basic elements. The letters represent different types of tensor network that will be discussed later. External legs are omitted.

Figure 3.11: Truncation sequence from a cube to an octahedron. Letters represent the corresponding tensor network. Edges correspond to the connections in the tensor network. Vertices correspond to the places where the tensors live. Each face has 4 perpendicular external legs (except for S), these legs are omitted in the picture.

40 To continue TRG, we may start from the three-way tensor constructed from the

Ising model, and apply a CPD on this tensor. We can choose the number of singular

values to be a number k and make a least square fitting.

X Tmnp ≈ λrArm ⊗ Arn ⊗ Arp. (3.23) r

The number of λr to keep should be determined by balancing the computational cost and accuracy. The expression above is exact when it is summed to the rank of the tensor.

The CPD is not unique in general, under some conditions it can be unique up to rescaling and change of basis. A detailed discussion of the uniqueness of CPD can be found in[34].

We can furthermore absorb λ in to A matrices by multiplying the cubic root of √ 0 3 λr into each Ar,Arm = λrArm, so we can write

X 0 0 0 Tmnp ≈ Arm ⊗ Arn ⊗ Arp. (3.24) r According to the definition of the outer product 0⊗0 of a tensor, for each specific

0 0 m, n, p. The value of Tmnp should just take the numerical product of Arm and Arn

0 and Arp. In order to get rid of the tensor product symbol and write it in the formalism of tensor network, we can convert the previous equation into, see Figure 3.12.

X 0 0 0 Tmnp ≈ AµνmAνρnAρµp. (3.25)

where tensor A0 is not zero only when µ = ν = ρ. This expression is identical to the previous one, although we can have a straightforward graphical correspondence

41 for this expression. We have to point out that a CPD may not be symmetrical, therefore we may need to keep track of three different A0 tensors inside a cube.

Therefore our tensor networks can be written in terms of A0, this process is shown in Figure 3.12.

X Z ≈ A0 ··· A0. (3.26)

Figure 3.12: CPD from T tensors to A tensors. CPD changes our network from T tensors to A tensors, some external tensor legs are omitted for simplicity, although they are there.

These A0 matrices can be paired up by summing over every bond on the edges of the cubes, see Figure 3.13. We can label these summed matrices(tensors) as B and

X 0 0 Bαβaµνb = AαµaxAβνbx. (3.27) x

42 Figure 3.13: The contraction from SVD from A to B to C

These B tensors form a tensor network of cuboctahedron, which have 8 trian- gular faces and 6 square faces, see Figure 3.14 (a).

X X 0 0 Bαβaµνb = Bkl = CαβayCµνby = AαµaxAβνbx. (3.28) y x

This B tensor can be written in terms of a matrix Bkl by grouping together indices.

Then we can do an SVD (see Figure 3.13) and separate it into two different matrices and label them by C, see Figure 3.14 (b).

X Z ≈ C ··· C. (3.29)

In the expressions above, 8 C matrices that form a cube can be grouped together and summed over and therefore we get another matrix and it is labeled by S, see

Figure 3.14 (c).

X S = CCCCCCCC. (3.30) cube Notice these S matrices have 8 indices. The number of values in each index

depends on the cut-off imposed on the singular matrix in our previous SVD.

43 (a) (b)

(c)

(d)

Figure 3.14: Transformation of the tensor network

(a) CPD changes our network from T tensors to A tensors, some external tensor legs are omitted for simplicity, although they are there. (b) B tensor network (cuboctahe- dron). The T network is drawn in thin lines for reference, some lines are omitted for simplicity. (c) Contraction over 8 C tensors and S. (d) S tensor network. S tensors live on octahedrons. This is the dual tensor network of the cubic T tensor network. Each S tensor have 8 legs, but we draw 4 of them for clearness.

44 Now we have done a transformation from a cubic tensor network T to octahedron tensor network S, see Figure 3.14 (d). Notice the length of the edge of the octahedron √ is 2 times larger than that of the cube.

X Z ≈ S ··· S. (3.31)

In this dual tensor network, each tensor S have 8 legs. The bond dimension in each leg is determined by the SVD in matrix B. Each tensor S are connected 8 other tensors. Figure 3.15 shows how these octahedrons are stacked together.

Figure 3.15: Stacking of the octahedrons in the dual space.

We’d like to point out that this 3-D cubic tensor renormalization group approach could be projected to a 2-D plane that parallels to the faces of the 3-D network.

The patterns of this projection exactly correspond to the renormalization approach

45 to a 2-D square tensor network proposed in [6]. This is the reason that we say our framework is a generalization of TRG. This correspondence is shown in Figure 3.16.

The easy thing for a 2-D square network is that the dual of a square is still a square.

The orientation of the dual square is changed, but when being converted back, the dual of a dual square have the same orientation as the original one.

Figure 3.16: The projection of 3-D model to 2-D model

We should give a short mathematical background review of this process. The process shown above, in the aspect of polyhedron geometry, is called the truncation sequence of a polyhedron. Our sequence starts from a cube, then goes through truncated-cube, cuboctahedron, truncated-octahedron, to octahedron. The cubocta- hedron is called rectified or complete-truncation. The cube and the octahedron are two of the 5 Platonic solids (convex regular polyhedron). The truncated-cube, cuboc- tahedron, truncated-octahedron are part of the so-called Archimedean solid when the lengths of its edges are equal.

46 For 3-D Platonic solids another two sequences exist. Icosahedron and dodecahe-

dron truncate to each other. A tetrahedron rectifies to an octahedron and truncates

to itself.

3.3.5 TRG in detail: From octahedron back to a cube

The next step is to transform back to the cubic tensor network from the dual

space (octahedron tensor network). This step is straightforward, and it involves with

only one step of SVD. We separate 8 indices of the S tensor network into 2 parts and

treat each part as a single index. Then S becomes a matrix and we can do an SVD on it, see Figure 3.17.

We should impose a cut-off on singular values. In order to make the renormaliza- tion process iterate, the bond dimension should stay the same. This is a requirement of renormalization.

X S(abcd)(xyzw) = D(xyzw)rD(abcd)r. (3.32) r

Figure 3.17: SVD of the dual tensor network S.

47 The last step is summing over the D tensors. 6 D tensors live on an octahedron and they can be summed over to get a renormalized tensor T 0, see Figure 3.18. Finally, we can express our partition function in terms of T 0.

Figure 3.18: The renormalized tensor network T 0

X 0 0 0 ZIsing = TabcxyzTab0c0x0y0z0 Tabc0x0y0z0 .... (3.33) indices Due to the summation, the norm of this tensor is getting larger. In order to restore the original tensor and make this system scale invariant, we need to rescale this tensor by dividing some factors. In fact, for a 3-D model, we could do a rescale on both the tensor space and its dual space. Details of this rescale will be discussed later.

In summary, the renormalization of a cubic 3-D tensor network can be realized by

CPD, SVD, and contraction. We are writing our partition function in terms of T -type tensors, A-type, B-type, C-type, S-type, D-type, and T 0(T )-type tensors. Rescaling

48 should be done on both S-type and T -type tensor, i.e. the original space and the

dual space.

3.4 Numerical results of the 3-D cubic tensor network model

3.4.1 Ising model and tensor network

We’d like to test our framework by considering Ising model with spins that live

on bonds. Both the cubic Ising model with spins that live on vertices and the Ising

model spins that live on bonds can be represented by a 3-D cubic tensor network. We

call them vertex Ising model and bond Ising model respectively, see Figure 3.19.

A bond Ising model requires less tensor bond dimension. For example, in Na3N, the

Na atom have a natural stacking as a 3-D bond Ising model. Therefore it is also

practical to discuss the bond Ising model.

Figure 3.19: Vertex Ising model and Bond Ising model

49 3.4.2 Calculation framework

We realize our tensor network model using MATLAB. The SVD function is embed- ded in MATLAB. The CPD calculation is done using the MATLAB Tensor Toolbox

Version 2.6[35] developed by Bader and Kolda et al. CPD had been developed as a workable MATLAB code due to its wide application to multilinear problems in chemo- metrics, signal process, neuroscience and web analysis[36]. Different algorithms have been provided by this package, such as Alternation Least Square(ALS) optimization and Gradient-based optimization. methods[36]. In our code, we use Alternation Least

Square(cp als).

Based our description, our code consists of 6 types of tensors, T , A, B, C, S,

D, T 0. They are related by CPD, contraction, SVD, contraction, SVD, contraction, respectively. For the contraction part, we need to contract over a bond, a cube, and an octahedron respectively.

3.4.3 Calculation steps: tensor size and cutoff

First, we need to set up our initial tensor from the Hamiltonian. Our initial tensor is based on an Ising model with spins living on bonds. For each vertex, the spins form an octahedron. We only consider the nearest neighbor interaction, the interaction terms can be represented by the edges of the octahedron.

The initial tensor can be represented as a tensor with 6 indices.

P 1 P6 βJ( aiaj )+ βµh ai T0(h, T ) = e ij 2 i=1 . (3.34)

We may choose µ = 1, J = 1, then our initial tensor is a function of temperature and the magnetic field.

50 To be specific, we start from a model with spins taking two values, ±1. Therefore

our initial tensor is 2 × 2 × 2 × 2 × 2 × 2. We group together 2 indices, then our tensor is Tijk has a dimension of 4×4×4. After setting up the initial tensor, we can do CPD on this tensor. For the reason of computational time and details of the algorithm, we choose only the largest value of CPD. It is obvious that this choice will not work for the zero magnetic field case, due to the form of the tensor at zero magnetic field, which has a Z2 symmetry.

In our calculation, we take the largest value for CPD, and 2 largest singular values for SVD. A cut-off is imposed for the purpose of renormalization. Our cut-off stress the speed of the iteration and our calculation is done on a personal computer.

Section 3.4.5 and section 3.4.6 is about the rescaling and free energy calculation.

Readers who are not interested can skip them.

3.4.4 Accuracy of CPD

In this section, we’d like to access the accuracy of CPD. A test is done on a 4×4×4 tensor, which is the starting point of our iteration.

To achieve this goal, we verify CPD, by comparing the value of the trace of a single tensor with our without CPD (direct calculation) to see how much error can be brought by CPD. We plot the error of CPD as a function of r, i.e. how many

spectrum values are kept in CPD.

X Tmnp = λrArm ⊗ Arn ⊗ Arp. (3.35) r We estimate the error under some combinations of T and h, see Figure 3.20. For a nonzero h, the lowest order could have a truncation error of about 10%.

51 (T,h)=(2,0) (T,h)=(4,0) 100 100

10-5 10-5 error error 10-10 10-10

10-15 10-15 0 10 20 30 40 0 10 20 30 40 r r (T,h)=(4,2) (T,h)=(4,4) 100 100

10-5 10-5 error error 10-10 10-10

10-15 10-15 0 10 20 30 40 0 10 20 30 40 r r

Figure 3.20: Semi-log diagram of the percentage error as a function of r under different magnetic field and temperature using cp als.

The y axis is the log of the percentage error |(Z − Z0)/Z| and x axis is the log of r. r = 1, 2, 4, 8, 16, 32. (T, h) = (2, 0); (4, 0); (4, 2); (4, 4) respectively. The error decreases with the increase of r. We notice that the error may increase when r is larger that the theoretical rank of the tensor. The fitting will still be accurate since it is around the numerical truncation error.

52 3.4.5 Rescaling of the tensor network and the dual tensor network

For the reason of simplicity, we’d like to use another notation to describe the

asymptotic behavior. The observation is that our Hamiltonian is asymptotically

scale-invariant.

After proper cut-off is set on the tensors, similar to the 2-D case, we still need

to rescale the tensors. We should notice that our new tensor T 0, after one step of

RG, contains the information of 8 tensor T s. Therefore if we do not do a rescale, the

tensor T will blow up.

We’ll use the argument from dilation invariance to find the scaling factor. Under

dilation, the Hamiltonian transforms as

H(λβ) ∼ λH(β). (3.36)

As a result, for the tensor, we have

||T (λβ)|| ∼ ||T (β)||λ. (3.37)

0 For a 2-D honeycomb tensor network, we have a contraction over a triangle Ts = TTT , we use the subscript s to denote the tensor before scaling. Then

0 ||Ts(β)|| ∼ ||T (β)||||T (β)||||T (β)||. (3.38)

In order for our new tensor T 0 to have the same dilation relation, we need to divide

it by a factor f, where f ∼ T 2, which have the same order with the largest singular

value in SVD.

0 0 ||T (β)|| ∼ ||Ts(β)||/f. (3.39)

53 Then we’ll have

||T 0(λβ)|| ∼ ||(T 0(β))||λ. (3.40)

If we don’t impose a rescaling, the tensor gets bigger and bigger. A natural way to select this factor would be the largest singular value of SVD. The reason for us to do the rescaling is that we need to keep iterating, and T 0 and T should be physically equivalent. The scaling factor is extracted and it contributes to the free energy, which has been shown previously.

For the 3-D case, this becomes complicated. The point is that we should do the rescaling on both in the tensor space and the dual tensor space to make sure that the tensor has the same scaling property in both spaces. Although mathematically we may do rescaling only in the tensor space, if we make our code like this, the tensors will oscillating and eventually blow up.

For 3-D cubic tensor network,

||T (λβ)|| ∼ ||T (β)||λ. (3.41)

So

||A(λβ)|| ∼ ||A(β)||λ/3. (3.42)

||B(λβ)|| ∼ ||B(β)||2λ/3. (3.43)

||C(λβ)|| ∼ ||C(β)||λ/3. (3.44)

54 Then contract over a cube with 8 C tensors

0 8λ/3 ||Ss(λβ)|| ∼ ||Ss(β)|| . (3.45)

to keep linearity, we should rescale by S5/3

||S(λβ)|| ∼ ||S(β)||λ. (3.46)

A practical choice of the factor could be based on the first SVD step, if we denote the square root of the largest singular value of the SVD of B by f1. We know that

1/3 5 f1 ∼ S . So the first scaling factor would be f1 . Then after SVD, we have

||D(λβ)|| ∼ ||D(β)||λ/2. (3.47)

Contract over an octahedron, which has 6 D tensors.

0 0 3λ ||Ts(λβ)|| ∼ ||Ts(β)|| . (3.48)

0 2 So we need to divide Ts by T . Similarly, if we denote the square root of the

1/2 largest singular value of the SVD of S by f2, since f2 ∼ T , we need to rescale by

4 f2 . Then

||T 0(λβ)|| ∼ ||T 0(β)||λ. (3.49)

55 Our overall rescaling factor f for the cubic Ising tensor network would be

5 4 f = (f1 , f2 ) = (a, b). (3.50)

Numerically, based on our test on the program, (5,4) are the correct value to make iteration converge. Other powers would either cause the tensor goes to zero, or diverge.The notation (a, b) will be used in the next section.

3.4.6 Derivation of free energy

For a cubic tensor network system, we want to calculate the partition function.

X X Z = e−βH(σ) = T T T T.... (3.51)

Let’s assume we have N tensors (3N spins). After converting to the dual tensor

3N N 0 network, we have 8 of S tensors, then after one step of RG, we have 8 of T tensors.

3N 3N N XY 8 XY 8 8 XY 0 Z = T = a1 S = a1 b1 T = 3N N 8 8 3N N 3N 3N N 3N N 8 8 82 XY 0 8 8 82 82 XY 00 a1 b1 a2 S = a1 b1 a2 b2 T = .... (3.52) 3N N 82 82 In the end, if we have a very large finite N, we’ll get

3N N 3N N 3N N 8 8 82 82 8n 8n ∗ Z = a1 b1 a2 b2 ...an bn .....T r(T ) (3.53)

The free energy per unit spin can be calculated as

1 1 3 1 F = − lnZ = − ( lna + lnb + (3N)β (3)β 8 1 8 1 3 1 3 1 lnT r(T ∗) lna + lnb + .... lna + lnb + ... ) (3.54) 82 2 82 2 8n n 8n n N

56 The factor of 3 comes from the fact that each T corresponds to 3 spins. If N goes

to infinity,

1 1 3 1 F = − lnZ = − ( lna + lnb + (3)β (3)β 8 1 8 1 3 1 3 1 lna + lnb + .... lna + lnb ...) (3.55) 82 2 82 2 8n n 8n n

Here ai(bi) is a function of temperature and magnetic field, so ai = ai(T, h),bi =

bi(T, h). So we have a function of the free energy as function of h and T , so we can

calculate physical quantities based on the free energy. These results are similar to

2-D.

Practically, the f factor converges very fast. So we actually take n = 4 and the free energy is approximated by

1 1 3 3 3 F = − lnZ = − ( lna + lna + lna + (3)β (3)β 8 1 82 2 83 3 3 1 1 1 1 lna + lnb + lnb + lnb + lnb) (3.56) 83 × 7 8 1 82 2 83 3 83 × 7

3.4.7 Numerical results

We do the calculation for a 32 × 32 × 32 cubic tensor network which corresponds

to 5 TRG steps. We also compare this tensor results with the Monte Carlo results.

The calculation cost scales with the number of sites for the Monte Carlo method,

while for TRG, it scales with the number of iterations. The contraction part of TRG

takes most of the computational time. The tensor contractor NCON[37] is used to

carry out this part.

The CPD values are kept at 2. The alternate least square method is used to

conduct the decomposition. The comparison is carry out at h = 0, h = 0.5, h = 1.0

57 and h = 1.5. See Figure 3.21 and Figure 3.22. The curved lines correspond to the

TRG results and the dotted lines correspond to the Monte Carlo results. Monte Carlo results are carried out using the Metropolis algorithms.

Graph of Magnetization as a function of temperature for bond ising model 1

TRG h-1.5 0.9 MC h=1.5 MC h=1.0 MC h=0.5 0.8 TRG h=1 TRG h=0.5

0.7

0.6

0.5

0.4 Magnetization

0.3

0.2

0.1

0 0 2 4 6 8 10 12 14 Temperature

Figure 3.21: Magnetization as a function of temperature at different h. When the magnetic field gets higher, the truncation error gets lower, therefore the results agree better.

58 Graph of Magnetization as a function of temperature at h=0 1.2 MC h=0 TRG h=0

1

0.8

0.6

0.4 Magnetization

0.2

0

-0.2 0 2 4 6 8 10 12 14 Temperature

Figure 3.22: Magnetization as a function of temperature at h = 0. We notice that for overall bond dimension D = 2, 3D-TRG blows up at around criticality. The initial condition for CPD iteration is set at (1, 0.5, 0.5, 2) using Tensor Toolbox. The CPD is carried out by consecutive best rank-1 approximation to achieve stability. At criticality, we see a peak, which is not accurate, therefore it is not drawn.

59 In these figures, the initial tensor bond dimension starts at 2. The bond dimension

for CPD is kept at 2 (where 16 is about the theoretical rank). 4 singular values (out

of 8) are kept for the first SVD step and 2 singular values (out of 16) are kept at

the second SVD step in order for the renormalized tensor to have the same bond

dimension.

We should notice that our current numerical results agree well with the Monte

Carlo results at large magnetic field. The reason is that, at a large h, the largest

singular value or CPD value carries most information of the tensor. Therefore at a

high h, even if we keep one CPD value, we will be able to accurately calculate the

magnetization. When h = 0, due to the symmetry, the singular values spread out,

therefore larger bond dimensions are needed.

We should notice that the currently bond dimension is not accurate at around

criticality. The singular value spread out at criticality under the current bond di-

mension. Using the numerical differentiation, we will have a small free energy at zero

magnetic field, therefore it will lead to a large magnetization. At criticality, under

current bond dimension, what we will see is a peak for magnetization, which is a result

of the previous reason. This peak decreases when the bond dimension is increased.

But would not disappear under our computational ability.

3.4.8 Current restrictions of 3D-TRG method

The most time-consuming part comes from tensor contraction from tensor C to tensor S. We are summing over a cube with 12 edges and 8 external legs. On a PC,

NCON can deal with bond dimension 6, while a direct for loop contraction can only deal with 2 bond dimensions. The accuracy depends greatly on how well can the

60 truncation error be controlled. Assuming we keeping the same bond dimension for all the tensors, the time is proportional to D20, D is the bond dimension. Therefore we would say that model with larger bond dimension is not currently verified and there are many possibilities, which need further developments. We do have to be careful that when the bond dimension gets large, the truncation error may still get bigger, due to the fact that higher dimension tensor has more legs, therefore, increase the truncation error.

Another potential restriction is that the CPD optimization problem is not convex, therefore an initial condition has to be carefully chosen.

Although a systematical application of this method depends greatly on the details and mathematical understandings of CPD, the right algorithms to use and the compu- tational ability of computers, we should not underestimate the importance of tensor network method, since SVD is well-controlled. CPD could also be well-controlled, and it is a non-perturbative method and may be applied to any systems in any dimension since our framework is exact without truncation.

3.5 Tensor RG for higher dimensional tensor network

We consider an n-dimensional tensor network, which can be mapped to an n-D

Ising model. Knowledge of regular polytopes is needed[38]. Polytopes are the higher dimensional generalization of polyhedrons and polygons. In mathematics, honeycomb means close packing of the polytopes.

This method can be generalized to any higher dimension. For an n-cube hon- eycomb tensor network, we should utilize the topological duality relation between

61 an n-cube and an n-orthoplex. Especially, in 4D, this duality becomes the duality between an 8-cell(tesseract) and 16-cell.

Figure 3.23 shows this topological duality relation in 2, 3, and 4 dimensions.

Figure 3.23: n-cube and n-orthoplex (n=2,3,4) 4D cases credit to Stella Software.

We can see that the in 2-D, 2-cube and 2-orthoplex are all squares. This obser- vation provides a different view of TRG in 2-D, i.e, we are still transforming into the dual space. The interesting part is: the dual space is the same as the original space.

The process of RG should be understood within the framework of the truncation sequence from an n-cube to an n-orthoplex. For example, in 4D, we are going from the original space to its dual space by going through the sequence of 4-cube (8-cell), truncated 4-cube, rectified 4-cube, bitruncated 4-cube, (16-cell). In the language of tensor network, the truncation of an n-cube corresponds to the CPD of an n-way tensor (2n-way before grouping indices), rectification of an n-cube corresponds to sum

62 over the edges, then an SVD is needed to go from the rectified n-cube to a bitruncated n-cube, then after summing over an n-cube tensor, we get the 16-cell dual space.

In order to dual back to the original space, we have to do another SVD on the n-orthoplex tensor network, then contact over the n-orthoplex, we get back to the n-cube tensor network.

Scaling is also needed in n-D, and the corresponding factor can be calculated based on the requirement of converging to a finite value at infinity. Then we can follow the same procedure and calculate the free energy.

We should notice that spacial dimension 4 has different tessellation property than

3-D, and it is similar to 2-D. In 2-D we have 3 different kinds of regular tessellation: triangle, square, and hexagon. In 3-D, however, cubic honeycomb is the only regular tessellation. In 4-D, things get complicated again. We have 8-cell (4-cube) tessella- tion, 16-cell tessellation, and 24-cell tessellation. In 5-D or higher dimensions cubic honeycomb is the only regular honeycomb that can tessellate the entire space.

In 4-D, 120-cell and 600-cell also truncate to each other but they don’t form any regular honeycomb. 4-simplex (5-cell) truncates to itself but it doesn’t form any honeycomb.

In all, our framework can be applied to any regular tessellated Euclidean tensor network (a tensor network that can be represented as the repeating of one regular all-space-filling polytope).

The tensor renormalization group process in any dimension for a regular tessellated network can be illustrated using the truncation sequence of a certain polytope. The truncation process may correspond to the tensor rank decomposition. Converting to the dual tensor network is needed.

63 3.6 Discussion

3.6.1 Applications to quantum system

Our framework is a geometric way to contract a tensor network. It can be applied to both classical model and quantum model. We can use the Trotter-Suzuki formula to convert d a dimensional quantum system to a d+1 dimensional classical system. After the conversion, we can represent the classical model in a solvable tensor network and then do RG on the tensor network to find the thermal-dynamical quantities. Details can be found at[24].

3.6.2 Potential problems of TRG in Higher dimensions

(1) Truncation error.

For 2-D TRG, we are keeping D values of the D×D matrices from SVD, assuming

D is the bond dimension of the matrix. Previous result tells us that the accuracy, of the TRG increases with the bond dimension D. For a 3-D cubic tensor network, we are actually keeping D values of a D4 × D4 matrix (D8 tensor). For an n-D cubic tensor network, we are keeping D values of a Dn−1 × Dn−1 matrices. It’s natural that the truncation error grows with the dimension, therefore the accuracy of TRG gets lower for a higher dimensional system.

(2) Calculation cost.

For TRG in a higher dimensional system, we have to sum over an n-cube and an n-orthoplex. The costs of this summation grow exponentially with the spacial dimension. Programming is also significantly difficult for a higher dimensional system.

Therefore Monte Carlo method is still a good choice for classical systems. For a

64 quantum system, due to the sign problem, TRG looks like a good candidate, but further research still needs to be done in order to test the accuracy of this method.

3.6.3 Is CPD the only choice?

Our starting point of this framework is the tessellation geometry, therefore we should keep the same number of tensor legs as the truncated polytope has. Careful readers may notice that the key equation for a truncation is, (same as equation 3.25).

X 0 0 0 Tmnp ≈ AµνmAνρnAρµp. (3.57)

CPD can be regarded as the easiest way to implement this equation, since for

CPD µ = ν = ρ, and the rank is defined as the smallest number of sums. In general

HOSVD(Tucker decomposition) also satisfies this equation, since we can rewrite our core tensor of HOSVD as a vector. This equation is not restricted to these two cases, since we may have µ 6= ν 6= ρ. For 2-D, this equation reduces to SVD, since we have a matrix to decompose and the corner of the truncation is an edge, while for 3-D it is a face.

This equation can be regarded as an inverse of a tensor contraction, therefore it is very likely that we can have a better way to do this decomposition. It will be interesting to think about other possibilities that are better than CPD. We notice that this direction had been explored by the papers of S-J Ran[28].

3.6.4 CPD and best rank-r approximation

Notice that theoretically, we are using CPD as our decomposition method, al- though numerically we need to pick a fixed number as our rank. This is generally

65 termed as the best rank-r approximation. For 2-D, SVD represents the best rank-r ap- proximation. Currently, there is no general way to find the exact rank of an arbitrary n-way tensor. Therefore, practically, people use the best rank-r approximation as the numerical approximation of CPD. Notice that the tensor rank problem is ill-posed since there exist tensors that have a fixed theoretical rank but it can be approximated arbitrarily close by a numerical rank that is lower than theoretical rank, see details in[39].

66 Chapter 4: Tensor methods for Geometric Measure of Entanglement

4.1 Introduction

Quantum entanglement is an essential concept in quantum physics and quan- tum information. Various measures of quantum entanglement have been proposed to characterize quantum entanglement, such as the Von Neumann entanglement entropy.

The geometric measure of entanglement[7] has recently gained popularity, owing to its clear geometric meaning. The geometric measure of entanglement was first pro- posed by Shimony[40], then generalized to the multipartite system by Barnum and

Linden[41], and finally examined by Wei and Goldbart, who gave a rigorous proof that it provides a reliable measure of entanglement[7].

A large amount of research regarding the properties of geometric entanglement has been performed. For example, properties of symmetric states were discussed using the

Majorana representation of such states[42]. The geometric measure of entanglement has been discussed theoretically, although few practical numerical evaluation methods are available owing to the complicated structure of a quantum state, whose amplitude is a complex-valued function. A simple way to determine geometric entanglement was given in Ref[43], where their method was tested for three or four qubits states

67 with non-negative coefficients. A problem with this method is that although the overlap will converge, it may not converge to the minimal overlap. Recently, a method to calculate of the geometric measure of entanglement for non-negative tensors was proposed by[44]. Our article illustrates a way to numerically calculate the geometric measure of entanglement for any arbitrary quantum state with complex amplitude, which extends the scope of previous numerical methods.

Tensor network theory is currently widely used as a way of simulating physical systems. The idea of tensor network theory is to represent the wavefunction in terms of a multi-indexed tensor, such as the matrix product states (MPS)[45]. Therefore, it is also natural to consider the entanglement within this context. Tensor theory was applied to study quantum entanglement in Ref.[46][47]. Using tensor eigenvalues to study geometric entanglement was discussed in Ref.[48]. The possibility of using tensor decomposition methods to study quantum entanglement was pointed out in

Ref.[49] in the context of Minimal Renyi-Ingarden-Urbanik entropy, of which the geometric entanglement is a special case. The asymptotic behavior of the GME for qutrit systems was studied using the PARAFAC tensor decomposition in Ref.[49]. In this work, we comprehensively study the possibility of using tensor decomposition methods to calculate the geometric measure of entanglement for arbitrary quantum states. Tensor decomposition methods are currently being developed rapidly. By using the new results in tensor decomposition theory, we can not only use the most efficient way to calculate the geometric measure of entanglement but also gain a deeper understanding of the structure of quantum states from the perspective of theoretical tensor decomposition theory.

68 To furthermore demonstrate the efficiency of tensor decomposition methods, we conduct a numerical research for maximally and highly entangled quantum states.

Deep understanding of highly entangled multiqubit states is important for quantum information processing. Highly entangled states, such as the cluster states, could be crucial to quantum computers[50]. Highly entangled states are also key parts of quantum error correction and quantum communication[51]. Therefore, searching for highly entangled quantum states is necessary for the development of quantum information science.

In this article, we first review the concept of geometric measure of entanglement and tensor rank decomposition. Then we point out that the spectrum value for a rank-one decomposition is identical to the overlap of wavefunctions. Our method is capable of calculating an arbitrary (real or complex, symmetrical or non-symmetrical) pure state wavefunction. We also demonstrated that tensor decomposition method can be used to extract the hierarchical structure of a wavefunction. Perfect agreement is found for the examples that we tested. At last, we use this method to character- ize some quantum states. A maximally entangled state that is similar to the HS states is found. In addition, we performed a numerical search for highly entangled quantum states from four qubits to seven qubits. We provide new examples of such states that are more entangled than a few of currently known states under geometric entanglement.

This part is organized as follows. In Section 4.2, we mainly focus on the theoret- ical aspects of tensor theory and entanglement theory. In Section 4.3, several known

69 examples are calculated to demonstrate the effectiveness of the tensor rank decom- position method. In Section 4.A, maximally and highly entangled states are searched and discussed.

4.2 Geometric measure of entanglement and tensor decom- position

4.2.1 Geometric measure of entanglement

The geometric measure of entanglement for multipartite systems was comprehen- sively examined by Wei and Goldbart in Ref.[7]. Following their notations, we start from a general n-partite pure state

X 1 2 n |ψi = χp1,p2...pn |ep1 ep2 . . . epn i. (4.1) p1,...pn

Define a distance as

d = min k|ψi − |φik, (4.2) |φi

where |φi is the nearest separable state, which can be written as

n l |φi = ⊗l=1|φ i. (4.3)

|φli is the normalized wavefunction for each party l. A practical choice of the norm could be the Hilbert–Schmidt norm, or equivalently the Frobenius norm for a tensor, which equals the squared sum of the modulus of the coefficients.

The geometric entanglement can be written as

E(|ψi) = 1 − |hψ|φi|2. (4.4)

It was proved by Wei and Goldbart in Ref.[7] that this measure of entanglement satisfies the criteria of a good entanglement measure.

70 We can write a wavefunction in the language of tensor. A general n-partite pure state can be written as

X |ψi = Tij...k|ij . . . ki. (4.5) i,j,...k We use tensor T to describe a quantum state and the Frobenius norm of this tensor kT k = 1. The label i, j, k goes from one to the dimension of the Hilbert space of each party.

A direct product state can be written as

|φi = ai|ii ⊗ aj|ji · · · ⊗ ak|ki. (4.6)

ai|ii is a normalized wavefunction for party i, here Einstein summation convention is used.

After writing the wavefunction in the language of tensors, we can use the tech- niques from tensor decomposition theory to calculate the geometric measure of en- tanglement.

4.2.2 Tensor decomposition

In general, a tensor decomposition method decomposes a tensor into the direct products of several smaller tensors. Moreover, there are two major ways to decompose a tensor.

One way is the ”Tensor Rank Decomposition” or ”Canonical Polyadic Decompo- sition”. For an n-way tensor, the Tensor Rank Decomposition can be written as

X Tmn···p = λrarm ◦ arn · · · ◦ arp. (4.7)

71 The minimal value of r, that can make this expression exact, is called the rank of

this tensor. The Tensor Rank Decomposition can be physically understood as the

decomposition of a multipartite wavefunction into the sum of the direct products of

the wavefunction from each part. The dyadic product notation ”◦” is used, which means that we treat the product as a single tensor.

Another way to decompose a tensor is the Tucker Decomposition. In some articles, it is called ”Higher Order Singular Value Decomposition (HOSVD)”. It can be written as X Tmnp...z = λαβγ...ωaαm ◦ aβn ◦ aγp · · · ◦ aωz. (4.8)

The Greek letters α, β, γ., ..ω are arbitrarily fixed integers.

These two decomposition methods can be regarded as the tensor generalization of the widely used Singular Value Decomposition (SVD) for a matrix.

r X ∗ Tmn = λiaim ◦ ain = USV . (4.9) i=1 S is the singular value matrix.

Since matrix S is diagonal, different understandings of this singular matrix can lead to different decomposition methods. A detailed discussion of tensor decomposi- tion methods can be found in Ref.[33].

The objective function of a rank–k approximation of a tensor can be written as

k X d = min kTmn···p − λiaim ◦ ain · · · ◦ aipk. (4.10) i=1

While for the Tucker decomposition, we can also fix the index range of λαβγ...ω and minimize the norm.

72 When we restrict our λ to be a single scalar for both the Tucker Decomposition and the Tensor Rank Decomposition, these two approximations become the same. In another word, they have the same rank–1 decomposition. Therefore, our objective function becomes

d = min kTmn···p − λam ◦ an · · · ◦ apk. (4.11)

For general quantum states, these tensors and vectors are defined on the complex

field C. Notice that this objective function has the same form as in our definition of geometric measure of entanglement with Tmn···p = |ψi and λam ◦ an · · · ◦ ap = |φi

From a geometric argument, if kTmn···pk = 1 and kam ◦ an · · · ◦ apk = 1, then our claim is

λ = hψ|φi. (4.12)

This can be understood intuitively: since kTmn···pk is a unit vector in our space, and for a fixed kam ◦ an · · · ◦ apk with unit length, kλam ◦ an · · · ◦ apk is a line in our vector space (m × n × · · · p dimensional), when we vary λ. Therefore, our minimiza- tion problem can be geometrically understood as finding the minimal perpendicular distance from all the possible direct product lines in the space. Since both vectors are unit vectors, λ must equal the angle between these two vectors. Understanding quantum mechanics in the context of geometry has been pointed out in Ref.[52]. Then our geometric entanglement is

E(T ) = 1 − |λ|2, (4.13)

which is expressed in the language of a tensor.

73 Tensor decomposition methods have been existing the scientific computing society

for some time, moreover, they have been applied to different fields such as statistics

and signal processing etc.[33].

4.2.3 Numerical algorithm

There are numerous algorithms that can be used for both the Tensor Rank and

Tucker decomposition. The Alternate Least Squares algorithm is one of the most

popular approaches. We will not discuss the details of the algorithms here. A complete

survey of the algorithms can be found in Ref.[33] and one of the Alternating Least

Squares algorithm for Tucker decomposition was given in Ref. [53].

There are also numerous existing code packages that can be utilized on different

coding platforms, such as C++ or MATLAB etc. In this article, we use the MATLAB

tensor toolbox 2.6 developed by Sandia National Laboratories[35]. This package is

already developed and available online.

We must point out a few important facts about the numerical results. Theo-

retically, both tensor rank decomposition and Tucker decomposition can be used to

perform the calculation. In reality, however, some codes are actually written for the

set of real numbers R. We need to work in the domain of complex numbers C in order to be able to represent an arbitrary wave function. Note that different vector spaces will lead to different optimization results. In addition, the decomposed wavefunction may not be normalized. A practical choice here would be the Alternate Lease Squares algorithm for the Tucker Decomposition (tucker als) provided in the toolbox.

The Alternate Lease Squares Tucker algorithm involves the following parameters:

(i) The initial tensor, i.e., the tensor that is used to represent quantum states. (ii)

74 The core of the Tucker decomposition, which can be a tensor with any dimension. (In the case of the best rank one approximation or geometric entanglement, this tensor is just a scalar.) (iii) An optional initial condition, which is used to initialize the iteration and could be set at random. (iv) Optional iteration control parameters.

After proper normalization of the initial tensor, the output: the best-fitted core scalar is the maximal overlap and the fitted tensors are the corresponding direct product states. This function implements the well-known Higher Order Orthogonal

Iteration (HOOI) algorithm for the Tucker approximation, which behaves better than the previous naive HOSVD algorithm[33]. The details of this algorithm are non- trivial, and can be found in Ref.[54]. The original HOOI paper was formulated in terms of a real tensor, but as pointed out by the authors of Ref.[54], this algorithm equally applies to a complex tensor. Moreover, our numerical study also shows the applicability to quantum states with complex amplitudes.

From the viewpoint of tensor decomposition theory, we can see that previous work about the numerical evaluation of geometric entanglement[43] is a special case of a naive HOSVD algorithm, which was used at an early stage of Tucker decomposition.

The problem with the naive HOSVD in Ref. [43] is that although the wavefunction overlap converges, the converged overlap value may not be the minimal overlap in the Hilbert space, see section 4.2 in Ref. [33] . The HOOI algorithm is designed to minimize the norm and, therefore, it gives the correct result for the geometric measure of entanglement. Another practical point is that the solution may not be unique, and the result may be trapped in locally minimal state[33] . Therefore, for consistency, it is better to examine the initial conditions for all the calculations.

75 4.3 Numerical evaluation of the geometric measure of en- tanglement using Alternate Least Square algorithm

4.3.1 Geometric measure of entanglement for symmetric qubits pure states

We would like to benchmark the results given by Wei and Goldbart in Ref.[7].

Considering a general n qubit symmetric state

r k!(n − k)! X |S(n, k)i = |0 ··· 01 ··· 1i. (4.14) n! permutations k is the number of |0is, and n − k is the number of |1is.

The overlap is given by s n! k k n − k n−k Λ = ( ) 2 ( ) 2 . (4.15) k!(n − k)! n n

In Table 1, we use Λ to denote the theoretical results and the λ to label the numerical ones.

We test the overlaps for both methods up to 6-partite systems, i.e. 6-way tensors.

Agreements are found.

4.3.2 Geometric measure of entanglement for combinations of three qubits W states

Assuming we have a superposition of two W states

√ √ √ √ |ψi = s|S(3, 2)i + 1 − seiφ|S(3, 1)i = s|W i + 1 − seiφ|Wfi. (4.16)

We can gauge out the factor φ by changing basis without affecting the entangle-

ment. The geometric measure of entanglement of this state is given by, see Ref.[7]

76 Table 4.1: Overlaps for n-partite qubit systems n value k value Λ theoretical λ numerical 4 0 1 1.0000 1 0.6495 0.6495 2 0.6124 0.6124 3 0.6495 0.6495 5 0 1 1.0000 1 0.6400 0.6400 2 0.5879 0.5879 3 0.5879 0.5879 4 0.6400 0.6400 6 0 1 1.0000 1 0.6339 0.6339 2 0.5738 0.5738 3 0.5590 0.5590 4 0.5738 0.5738 5 0.6339 0.6339

Comparison between theoretical value Λ and the calculation using tensor decomposition λ. Alternate Lease Square method for Tucker decomposition is used.

77 E = 1 − Λ2. (4.17)

With (notice that there is a typo in[7] for this equation) √ 3 √ √ Λ = [ scosθ(s) + 1 − ssinθ(s)]sin2θ(s). (4.18) 2

t = tanθ, where t is the real root of the equation

√ √ √ √ 1 − st3 + 2 st2 − 1 − st − s = 0. (4.19)

Perfect agreement is found, see Figure 4.1.

For a general complex wavefunction

√ |ψi = |W i + 1 − seiφ|Wfi. (4.20)

Tensor decomposition method can indeed capture the complex φ factor and reflect the fact that this factor does not affect the entanglement. See Figure 4.2.

4.3.3 Geometric measure of entanglement for d-level system (qudits)

Up till now, the index of our tensor has a range of two, which corresponds to a qubit system. We can obviously use a tensor that has a larger index range which corresponds to a d-level system.

For example, we have a symmetric state with n parts, for simplicity we assume that one part is in state d − 1, the other parts are all in state 0, and our wavefunction is a symmetric sum of all these possible state. r (n − 1)! X |S(n, d)i = |0 ··· 0(d − 1)i. (4.21) n! permutations

78 Entanglement as a function of parameters s 0.5 Numerical Theoretical

0.45

0.4

0.35 Entanglement

0.3

0.25

0.2 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 s

Figure 4.1: Entanglement as a function of s using tensor decomposition.

79 Entanglement as a function of parameters s and phi

0.5

0.45

0.4

0.35 Entanglement

0.3

0.25 1

0.8 8 0.6 6 0.4 4 0.2 2 s 0 0 phi

Figure 4.2: Decomposition for complex tensors. Entanglement for two parameters using tensor decomposition, entanglement is not affected by φ.

80 Table 4.2: Overlaps for n-partite qudit systems n value d value Λ theoretical λ numerical 4 4 0.6495 0.6495 10 0.6495 100 0.6495 200 0.6495 5 5 0.6400 0.6400 10 0.6400 20 0.6400 50 0.6400 6 6 0.6339 0.6339 10 0.6339 20 0.6339

Comparison between theoretical value Λ and the calculations using tensor decomposition λ for qudit n-partite states.

The overlap is given by, from Ref.[7]

s n! 1 1 n − 1 n−1 Λ = ( ) 2 ( ) 2 , (4.22) (n − 1)! n n

which is independent of d.

In Table 2, we use Λ to denote the theoretical results and the λ to label the numerical ones.

We tested the overlaps of qudit systems up to 6-partite systems, i.e. 6-way ten- sors. Each tensor is tested up to a bond dimension of 50 to 200. The largest tested tensor has a bond dimension of 505 owing to the restriction of computational power.

Agreements are found. Our results show that tensor decomposition method is capable of dealing with qudit systems.

81 4.3.4 Hierarchies of Geometric measure of entanglement

In general, hierarchies of geometric measure of entanglement refers to the structure

of the distances from a quantum state to the K-separable states. For example, for a

general pure state, some parts of the system is entangled while the wavefunction can

still be written as the direct products of some larger parts. A detailed discussion of

the hierarchies can be found in Ref. [55].

We need to point out that this hierarchy structure of entanglement is quite natu-

ral to understand in the context of tensor theory. For a general tensor Tij···k, we can

combine the first two index together T(ij)···k and write ij as a single index l, which

means that we treat them as one part. To calculate the entanglement for this parti-

tion, what we need is to find the entanglement for the tensor Tl···k. It is easy to see

that different partitions are equivalent to different ways of combining tensor indices.

Therefore, it is natural to understand the hierarchies in the language of tensor.

For example, if we have a quantum state which has three parties and can be

written as a three index tensor T2,3,4, i.e. the dimension of the Hilbert space of

each party is 2, 3, 4 respectively. Naturally, we can consider a 2-separable state

where one party has a Hilbert dimension of 6 and another party has a dimension of

4. Writing in the language of tensors, for T2,3,4, we can rewrite the first to labels as one single label which has a bond dimension of 2 × 3 = 6, simply by rewriting

each 2 by 3 matrix as a 6-dimensional vector. Therefore, by combining two indices

we get a new tensor T6,4. Although these two tensors have a one to one map, the

tensor decomposition structure has changed, therefore, we can calculate two parties

geometric entanglements by different ways of combining indices.

82 Table 4.3: Hierarchies of 5-qubits W state Partition Tensor size λ numerical E E from[55] 1,4 2 × 16 0.8944 0.2000 0.200 2,3 4 × 8 0.7745 0.4001 0.400 1,1,3 2 × 2 × 8 0.7745 0.4001 0.400 1,2,2 2 × 4 × 4 0.6761 0.5429 0.543 1,1,1,2 2 × 2 × 2 × 4 0.6639 0.5592 0.559 1,1,1,1,1 2 × 2 × 2 × 2 × 2 0.6400 0.5904 0.590

Comparison of hierarchies using tensor decomposition method.

We would like to calculate the hierarchies for the 5-qubits W state, and compare with the results in Ref. [55].

r1 |W i = (|00001i + |00010i + |00100i + |01000i + |10000i). (4.23) 5

We found agreements between these results, see Table 3 for more details. Tensor decomposition method is capable of finding the hierarchical structure of a quantum state.

Tensor decomposition method can also be used to find the Hierarchies of geometric entanglement for non-symmetric states.

For example,

3 |ψW i = N3(γ1|001i + γ2|010i + γ3|100i). (4.24)

The theoretical value of overlap square is found to be[55]

2 2 2 2 2 Λ (i|j, k) = N3 max[γi , γj + γk]. (4.25)

83 Where i, j, k are labels for different parties.

With γ1 = 1,γ2 = 2,γ3 = 3, the overlap square is found to be at 0.6428 using tensor decomposition method, which is in perfect agreement with the theoretical value.

For another example,

4 |ψW i = N4(γ1|0001i + γ2|0010i + γ3|0100i + γ4|1000i). (4.26)

The theoretical value of overlap square is found to be[55]

2 2 2 2 2 2 Λ (i, j|k, l) = N4 max[γi + γj , γk + γl ]. (4.27)

With γ1 = 1,γ2 = 2,γ3 = 3,γ4 = 4, the overlap square is found to be at 0.8333 using

tensor decomposition method, which is also in perfect agreement with the theoretical

value.

4.4 Discussions

4.4.1 Geometric measure of entanglement for many-body sys- tems

The geometric measure of entanglement defined above is for finite quantum states.

For many-body systems, we can define geometric entanglement per site, by using the

overlap between an entangled state and a direct product state of every site. For

a 1-D system, the ground state can be written as a Matrix Product State (MPS).

Assuming translational symmetry, we can efficiently calculate geometric entanglement

per site based on the local structure of the MPS representation. Remarkably, the

geometric entanglement structure for a translational symmetric many-body system

84 is more simpler for a finite state space. For certain 1-D systems, analytical solutions exist. The details of the process discussed above can be found in Ref. [56].

Recently, research has been performed for 2-D systems. For a 2-D translational invariant quantum many-body system, the ground state can be represented as an infinite Projected Entangled Pair States (iPEPS). Following the same procedure used for the 1-D case, geometric entanglement per site can be calculated by contracting over the tensor network representation of the overlap coefficient. The overlap is dominated by the largest singular value of the representation tensor treated as a matrix. The largest singular value of a matrix is the same as the overlap coefficient of the best rank-one approximation of the tensor discussed in this paper. We can use this overlap and geometric entanglement to discover phase transition for a 2-D many-body system (see details in Ref.[57]). For a 2-D system, an iPEPS tensor can be represented as a matrix and therefore, it is still easy to calculate. Our method is potentially beneficial to the tensor representation of a 3-D or higher dimensional system, although a realistic tensor representation for 3-D quantum systems is beyond the current computational power.

4.4.2 Several comments

A topic that we have not discussed in this article is the calculation of the geometric measure of entanglement for mixed states. It is known that the entanglement curve for a mixed state is the convex hull for the corresponding pure state. After numerical calculation the entanglement surface of the pure states, it should be straightforward to calculate the convex hull geometrically using numerical methods.

85 A subtle detail that we should stress is that the tensor decomposition may be trapped in a numerical metastable state if the initial conditions are not properly set.

Therefore, for a reliable calculation, great care should be given to the initial conditions to avoid erroneous results.

Tensor decomposition theory is currently still under development and therefore, some theoretical aspects of its properties are still unknown. It will be interesting if new developments of tensor decomposition theory shed some light on quantum theory and quantum information theory.

It will be interesting to explore the restrictions of this method. It is known that the calculation of the best rank-one approximation of a tensor is NP-hard[58], which was also proved in Ref.[59]. Therefore, it is difficult to calculate a geometric measure of entanglement for a rather large quantum system. Our method is easy to implement, and is based on existing code packages. Based on existing calculation software such as MATLAB. Convex hull (Convex envelop) can also be constructed in MATLAB, to represent the entanglement of mixed quantum states, see [7] for details.

4.A Appendix: Searching for highly entangled states and maximally entangled states.

Deep understanding of highly entangled multiqubit states is important for quan- tum information processing. In this section, we discuss several maximally or highly entangled quantum states.

86 4.A.1 Bounds on the geometric measure of entanglement

By exploiting the correspondence between the geometric measure of entanglement and best rank-one approximation, properties of the geometric measure of entangle- ment, such as the upper bound, can be acquired.

For example, consider a quantum state that can be represented by a real tensor T. Assuming the party number is m and the dimension of each party is given by 2 ≤ n_1 ≤ n_2 ≤ ··· ≤ n_m, the overlap in the real space satisfies

$\frac{1}{\sqrt{n_1 n_2 \cdots n_{m-1}}} < \lambda \le 1$.  (4.28)

Therefore,

$0 \le E < 1 - \frac{1}{n_1 n_2 \cdots n_{m-1}}$.  (4.29)

Based on the states that we tested, this bound is valid. It is not clear whether or when this bound is exact. For mathematical details, please see Ref. [93].

4.A.2 Maximally entangled four-qubit states

The four-qubit Higuchi–Sudbery (HS) state is conjectured to be maximally entangled [94].

We consider a family of Higuchi–Sudbery states |HS⟩_t, where w = e^{2πi/3} corresponds to the previously discovered HS state,

$|HS\rangle_t = \sqrt{\tfrac{1}{6}}\,[\,|0011\rangle + |1100\rangle + w(|1010\rangle + |0101\rangle) + w^2(|1001\rangle + |0110\rangle)\,]$.  (4.30)

Figure 4.3: Entanglement of the HS family of states as a function of t, with w = e^{it}. The maximal entanglement is E_max = 0.7778.

In Figure 4.3, we show the evolution of the geometric entanglement as a function of w. As expected, E has a maximum at w = e^{2πi/3}. We also notice that the state at w = e^{πi/3} has the same entanglement as the |HS⟩ state. Therefore, we have numerically discovered a few four-qubit states that are maximally entangled. However, we should point out that these states might be equivalent under local unitary transformations.

We searched complex four-qubit states using Monte Carlo sampling with 100000 samples. We did not find any four-qubit quantum states with a higher geometric entanglement; therefore, |HS⟩ is likely the four-qubit state with the highest entanglement under the geometric measure of entanglement.

4.A.3 Highly entangled four-qubit states

The L state maximizes the average Tsallis α-entropy of the partial trace for α > 0 [95]. Surprisingly, we find that this state has a constant geometric entanglement E = 0.6667 with respect to changing w,

$|L\rangle_t = \sqrt{\tfrac{1}{12}}\,[\,(1+w)(|0000\rangle + |1111\rangle) + (1-w)(|0011\rangle + |1100\rangle) + w^2(|0101\rangle + |0110\rangle + |1001\rangle + |1010\rangle)\,]$.  (4.31)

The |BSSB_4⟩ state is found to be a highly entangled state with respect to a certain measure [96],

$|BSSB_4\rangle = \sqrt{\tfrac{1}{8}}\,[\,|0110\rangle + |1011\rangle + i(|0010\rangle + |1111\rangle) + (1+i)(|0101\rangle + |1000\rangle)\,]$.  (4.32)

Our result shows that it is a local minimum within a family of |BSSB_4⟩_t states, at w = i with E_{BSSB4} = 0.7500; see Figure 4.4.

$|BSSB_4\rangle_t = \sqrt{\tfrac{1}{8}}\,[\,|0110\rangle + |1011\rangle + w(|0010\rangle + |1111\rangle) + (1+w)(|0101\rangle + |1000\rangle)\,]$.  (4.33)

In addition to the highly entangled state listed above, we provide a list of highly entangled four-qubit states, based on our numerical search.

Figure 4.4: Entanglement of the BSSB family of states as a function of x, with w = e^{ix}. The entanglement at w = i is E_{BSSB4} = 0.7500.

The states with real integer coefficients are relatively easy to prepare in experiment. These states have the same entanglement as the |BSSB_4⟩ state.

$|\phi_{4,1}\rangle = \tfrac{1}{2}(|0000\rangle + |1110\rangle + |0101\rangle + |1011\rangle)$.  (4.34)

$|\phi_{4,2}\rangle = \tfrac{1}{2}(|1100\rangle + |0010\rangle + |0101\rangle + |1011\rangle)$.  (4.35)

$|\phi_{4,3}\rangle = \tfrac{1}{2}(|1000\rangle + |0110\rangle + |0001\rangle + |1111\rangle)$.  (4.36)

$|\phi_{4,4}\rangle = \tfrac{1}{2}(|0100\rangle + |0010\rangle + |1001\rangle + |1111\rangle)$.  (4.37)

$|\phi_{4,5}\rangle = \tfrac{1}{2}(|0110\rangle + |1010\rangle + |0001\rangle + |1101\rangle)$.  (4.38)

$|\phi_{4,6}\rangle = \tfrac{1}{2}(|0010\rangle + |1110\rangle + |0101\rangle + |1001\rangle)$.  (4.39)

$|\phi_{4,7}\rangle = \tfrac{1}{2}(|0000\rangle + |1100\rangle + |0011\rangle + |1111\rangle)$.  (4.40)

All the states above have an overlap of λ = 0.5 and a geometric entanglement of E = 0.75.

All the |φ⟩ states in this paper are constructed and searched using Monte Carlo sampling. We start with a tensor of several indices and randomly initialize each element to zero or one. In practice, the number of 1s in each tensor is fixed within each Monte Carlo run, although different values are used in different runs. Then we normalize each tensor and calculate the geometric entanglement. Using a large number of samples, the tensor with the largest geometric entanglement is recorded.
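A minimal sketch of this search procedure (our illustration in Python, reusing the rank_one_overlap helper sketched in Section 4.4.2; the default sample count and the number of nonzero entries are only placeholders):

```python
import numpy as np

def search_highly_entangled(n_qubits=4, n_ones=4, n_samples=1000, seed=1):
    """Random search over states whose unnormalized coefficients are 0 or 1,
    keeping the state with the largest geometric entanglement E = 1 - lambda^2."""
    rng = np.random.default_rng(seed)
    dim = 2 ** n_qubits
    best_E, best_T = -1.0, None
    for _ in range(n_samples):
        T = np.zeros(dim)
        T[rng.choice(dim, size=n_ones, replace=False)] = 1.0  # fixed number of 1s
        T = T.reshape((2,) * n_qubits)
        T /= np.linalg.norm(T)
        lam = rank_one_overlap(T)      # helper sketched in Section 4.4.2
        E = 1.0 - lam ** 2
        if E > best_E:
            best_E, best_T = E, T
    return best_E, best_T

print(search_highly_entangled()[0])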

4.A.4 Highly entangled five-qubit states

The |BSSB_5⟩ state is found to be a highly entangled five-qubit state [96],

$|BSSB_5\rangle = \sqrt{\tfrac{1}{8}}\,(\,|00001\rangle - |00010\rangle + |01000\rangle - |01011\rangle + |10001\rangle + |10010\rangle + |11100\rangle + |11111\rangle\,)$.  (4.41)

The geometric entanglement is 0.7500. Our search finds a new state |φ_{5,1}⟩, which is more entangled than |BSSB_5⟩ under the geometric measure of entanglement,

$|\phi_{5,1}\rangle = \sqrt{\tfrac{1}{6}}\,(\,|00000\rangle + |01100\rangle + |10010\rangle + |11001\rangle + |00111\rangle + |11111\rangle\,)$.  (4.42)

For |φ_{5,1}⟩, the overlap is λ = 0.4329 with entanglement E = 0.8126.

$|\phi_{5,2}\rangle = \sqrt{\tfrac{1}{8}}\,(\,|11000\rangle + |01100\rangle + |10010\rangle + |10110\rangle + |00001\rangle + |01001\rangle + |00111\rangle + |11111\rangle\,)$.  (4.43)

For |φ_{5,2}⟩, the overlap is λ = 0.500 with entanglement E = 0.7500, which is the same as for |BSSB_5⟩.

4.A.5 Highly entangled six- and seven-qubit states

We provide two examples of six-qubit states.

$|\phi_{6,1}\rangle = \sqrt{\tfrac{1}{7}}\,(\,|100000\rangle + |011000\rangle + |011110\rangle + |101110\rangle + |101001\rangle + |110101\rangle + |000011\rangle\,)$.  (4.44)

For |φ_{6,1}⟩, the overlap is λ = 0.3780 with entanglement E = 0.8571.

$|\phi_{6,2}\rangle = \sqrt{\tfrac{1}{8}}\,(\,|11000\rangle + |001100\rangle + |010110\rangle + |100110\rangle + |001001\rangle + |100101\rangle + |111101\rangle + |101011\rangle\,)$.  (4.45)

For |φ_{6,2}⟩, the overlap is λ = 0.3954 with entanglement E = 0.8436.

Notice that our six-qubit states are simpler than the state found in Ref. [97].

For seven-qubit states, we found

$|\phi_{7,1}\rangle = \sqrt{\tfrac{1}{10}}\,(\,|0110000\rangle + |0011000\rangle + |1100100\rangle + |0001100\rangle + |1110010\rangle + |1001010\rangle + |1101001\rangle + |1010101\rangle + |0000011\rangle + |1111111\rangle\,)$.  (4.46)

For |φ_{7,1}⟩, the overlap is λ = 0.3162 with entanglement E = 0.9000.

$|\phi_{7,2}\rangle = \sqrt{\tfrac{1}{11}}\,(\,|0110000\rangle + |0000100\rangle + |1100100\rangle + |1011100\rangle + |1001010\rangle + |0011110\rangle + |0101101\rangle + |1110011\rangle + |0000011\rangle + |0011011\rangle + |1010111\rangle\,)$.  (4.47)

For |φ_{7,2}⟩, the overlap is λ = 0.3183 with entanglement E = 0.8987.

Notice that the geometric entanglement of all the states in this section is invariant under local unitary transformations of each party. Therefore, we can get other states by applying a rotation on each qubit.

Chapter 5: Theory of Machine Learning

5.1 Introduction

Machine learning is a subject which gives a machine the ability to learn from data without being explicitly programmed. Machine learning methods can be classified into two major categories: supervised learning and unsupervised learning. In supervised learning, labels are given to data points, which serve as supervisors, while in unsupervised learning, there is no label associated with the data. A third approach is reinforcement learning, in which the algorithm learns from rewards and penalties it receives for its actions. Machine learning methods can be used for different tasks, for example, classification, regression, clustering, and dimensionality reduction.

To carry out a machine learning analysis, we need datasets to learn from. Also, we need a machine learning algorithm to use. Finally, we need to judge the quality of our machine learning method.

Since raw data could be in any format with different qualities, data cleaning is necessary for the machine learning process.

The first step of data cleaning is to remove duplicate and irrelevant observations, since we will not get any extra information from these data. Secondly, we should

remove structural errors from the data, which may include typos and inconsistencies.

Thirdly, outliers should be removed with valid reasons. This can be subtle since outliers may come from measurement errors, or may contain important information.

Finally, missing data should be treated carefully. Naive approaches such as ignoring or dropping these data, or imputing them from other observations, may not lead to good results. Missing categorical data can be treated as a separate class; missing numerical data can be flagged and assigned a value of 0. After these data cleaning steps, our data should be more robust than the raw data.

After data cleaning, we should choose the right estimator for our job. This depends on the size and type of our dataset and on our learning goal. A summary of this estimator selection process can be found, for example, in the documentation of scikit-learn, a popular and powerful Python package for carrying out machine learning analysis.

In the next few sections, we will introduce several popular machine learning methods. Good books about machine learning are Refs. [65] [66].

5.2 Regression

Linear regression is the method of modeling the linear relationship between a dependent variable y and independent variables x_i. The linear regression model assumes linearity, independence between observations, and normally distributed errors with the same variance. The simplest linear regression model, for instance, can be written as

$y = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n$.  (5.1)

Here wi are the parameters of the model, which are acquired by optimizing the objective function. More generally, we can have

$y = w_0 + w_1 \phi_1(x_1) + w_2 \phi_2(x_2) + \dots + w_n \phi_n(x_n)$,  (5.2)

with φ_i as the basis functions. The basis functions can take the form of polynomial functions, which leads to polynomial regression. Other forms such as the Gaussian or the sigmoid function are also widely used as basis functions.

By minimizing the loss function or maximizing the likelihood function, we can find our best approximations for w_i. In order to avoid overfitting, a regularization term $\lambda \sum_i |w_i|^p$ is usually added to the loss function. Here λ is the regularization coefficient and p is an integer. When p = 1 it is called lasso regression, while for p = 2 it is called ridge regression. Regularization is especially useful for models with small sample sizes to avoid over-fitting.
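Both penalties are available directly in scikit-learn; the following is a minimal sketch with made-up data (our illustration, not part of the original text):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))                              # 100 samples, 5 features
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)   # p = 1 penalty: tends to produce sparse weights
ridge = Ridge(alpha=0.1).fit(X, y)   # p = 2 penalty: shrinks all weights smoothly
print(lasso.coef_)
print(ridge.coef_)
```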

One important concept in machine learning theory is the bias-variance tradeoff. Bias represents the deviation of the model from the theoretical value, while variance represents the deviation of the predictions from their average value. Generally, over-fitted models have more variance and under-fitted models have more bias.

The linear regression method has the advantage that the results are easy to implement and interpret, although linear regression behaves poorly for nonlinear data.

Another widely used model is logistic regression, which is used for classification. In logistic regression, a sigmoid function σ(y),

$z = \sigma(y) = \frac{1}{1 + e^{-y}}$,  (5.3)

is applied to the linear output y in order to give a binary classification of the input values. The loss function can be minimized, or the likelihood maximized, in order to find the optimal parameters. Similar to linear regression, logistic regression is also easy to implement, compute, and interpret. Sometimes logistic methods may have low accuracy and may underfit the data.

5.3 K-nearest neighbors algorithm (KNN)

The K-nearest neighbors algorithm (KNN) is also one of the easiest algorithms in machine learning theory. It is a non-parametric algorithm and it is one of the most used classification algorithms. In the classification process, the class label of a new point is determined using the class labels of its K nearest neighbors by majority voting. Here K is an integer that is manually selected. The KNN method can also be used for regression, where the average value of K nearest neighbors is used as the prediction value.

In practice, we can also add weights to the neighbors so that nearer neighbors contribute more. The Euclidean metric is widely used as the distance measure. The majority voting may also cause problems when the data are not balanced.

The parameter K can be selected empirically by checking the performance for different K. The KNN method makes no assumptions about the data and is very easy to implement, although it can be computationally expensive.
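A minimal scikit-learn sketch of KNN classification with a built-in dataset (our illustration; the chosen K and weighting are only examples):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# K is chosen empirically; weights='distance' gives nearer neighbors more say.
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```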

5.4 Decision tree

The decision tree method is a very intuitive machine learning method. Decision trees can be classified into classification trees and regression trees, depending on whether the target variables are discrete or continuous. A decision tree is usually drawn upside down, starting from the root node. Internal nodes represent a condition while the leaves represent decisions. The decision tree method is easy to compute and understand but may sometimes lead to overfitting.

Given a number of features, the most important step of constructing a decision

tree is to select a feature and split the decision tree. To measure the quality of a split,

we need to define some metric.

One widely used metric is the information gain. The entropy of information is

defined as

$H = -\sum_i p_i \log_2(p_i)$.  (5.4)

Here pi is the probability of choosing a class.

The information gain is defined as the difference of the entropy before and after the split, where on each child node, the entropy is weighted by the probability of selecting a specific child node. Among all the features, the one with the highest information gain is selected. This process will continue until a decision tree is fully constructed or there is no information gain.

Another widely used metric is the Gini impurity, which is defined as

$G = 1 - \sum_i p_i^2$.  (5.5)
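A small sketch (our illustration) of Eqs. (5.4) and (5.5) and the information gain of a split, using arrays of class labels:

```python
import numpy as np

def entropy(labels):
    # Eq. (5.4): H = -sum_i p_i log2(p_i) over the class probabilities.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Eq. (5.5): G = 1 - sum_i p_i^2.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    # Entropy before the split minus the size-weighted entropy of the children.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

labels = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(information_gain(labels, [labels[:3], labels[3:]]))   # a perfect split
print(gini(labels))
```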

In a regression, the loss function can be chosen as the standard deviation reduction, and the average value of a node can be used as the target value. Pruning methods can be used in the decision tree to avoid overfitting.

5.5 Support vector machine

The objective of a support vector machine[67] is to construct a hyperplane to separate

data points of different classes.

We start with a set of data points labeled by (x_i, y_i), with x_i a point in a high-dimensional space and y_i the class label of x_i. For example, y_i = 1 if x_i belongs to one class and y_i = −1 if x_i belongs to the other.

The hyperplane is given by

wx − b = 0 (5.6)

When our data is linearly separable, two hyperplanes

wx − b = −1 (5.7)

wx − b = 1 (5.8)

can be defined as the margins. The distance between the two margin planes is $\frac{2}{|w|}$. We can impose constraints on the data sets such that

$w x_i - b \le -1$  (5.9)

when y_i = −1, and

$w x_i - b \ge 1$  (5.10)

when y_i = 1. These two equations can be rewritten as

$y_i (w x_i - b) \ge 1$.  (5.11)

Therefore, our optimization objective is minimizing |w| under the constraint above.

|w| is determined by the points that are near the margin. These points are called

support vectors, so this method is called the support vector machine method.

When our data is not linearly separable, we can use the hinge loss function.

$\max(0,\; 1 - y_i(w x_i - b))$.  (5.12)

The objective then becomes minimizing

$\frac{1}{n}\sum_i \max(0,\; 1 - y_i(w x_i - b)) + \lambda |w|^2$.  (5.13)

The SVM is illustrated in Figure 5.1.

Kernel methods can be used to deal with nonlinear boundaries. The advantages of SVMs are that the results are easy to compute and interpret. The disadvantage is that the results are sensitive to the choice of kernel.
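A minimal sketch (our illustration, not a production implementation) that minimizes Eq. (5.13) by subgradient descent on a toy two-class dataset:

```python
import numpy as np

def linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Minimize (1/n) sum_i max(0, 1 - y_i (w.x_i - b)) + lam*|w|^2 by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w - b)
        active = margins < 1                       # points inside the margin
        grad_w = 2 * lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two separated clusters with labels +1 / -1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = linear_svm(X, y)
print(np.mean(np.sign(X @ w - b) == y))            # training accuracy
```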

5.6 K-means clustering

The K-means clustering method[68] is an example of unsupervised learning. The goal is to group many data samples into K clusters; therefore, it serves as a classification method. Current K-means clustering algorithms generally give a local optimum.

Figure 5.1: The support vector machine. This figure is credited to Wikipedia, by author cyc.

The algorithm of K-means clustering can be summarized as follows. (1) Initialize K points as the centroids of the K clusters. (2) Calculate the distance of each point to the centroids and assign it to the nearest centroid. (3) Recalculate the position of each centroid after the assignment of all the samples. (4) Repeat steps 2 and 3 until the positions of the centroids have converged. Because this algorithm can only find a local minimum, it is necessary to run the K-means clustering several times to find the best classification.
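A minimal NumPy sketch of steps (1)-(4) (our illustration; function and variable names are ours):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Steps (1)-(4): initialize centroids, assign points, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]        # step (1)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Step (2): assign each point to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step (3): recompute each centroid as the mean of its assigned points.
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        if np.allclose(new, centroids):                              # step (4)
            break
        centroids = new
    return centroids, labels
```

Because only a local optimum is found, in practice one would run this several times with different seeds and keep the clustering with the smallest total within-cluster distance.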

5.7 Principal component analysis

The goal of principal component analysis (PCA) [69] is to transform the data into uncorrelated variables using an orthogonal transformation. These uncorrelated variables are called the principal components. PCA is a very useful tool for dimensionality reduction.

Given an n-dimensional random vector X, where each random event is a data point, we can construct an n × n covariance matrix σ. In order to get the uncorrelated components, we can perform a singular value decomposition on σ,

$\sigma = U^* S V$.  (5.14)

Here S is the diagonal matrix of singular values, and U, V are two orthogonal transformations.

To reduce the dimension of our data set, we can take the k largest singular values of S and transform our data using the corresponding part of V. Our new dataset will have dimension k and will be uncorrelated. In this way, we reduce the dimension of our data from n to k. In practice, it is necessary to scale the dataset before doing the principal component analysis.
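A minimal NumPy sketch of this procedure (our illustration; the toy data are made up):

```python
import numpy as np

def pca(X, k):
    """Project the data X (rows are samples) onto its first k principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # center and scale each feature
    cov = np.cov(Xs, rowvar=False)                 # n x n covariance matrix sigma
    U, S, Vt = np.linalg.svd(cov)                  # sigma = U S V
    return Xs @ Vt[:k].T, S[:k]                    # reduced data, leading singular values

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.standard_normal(200)   # two correlated features
X_reduced, leading = pca(X, k=2)
print(X_reduced.shape, leading)
```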

5.8 Restricted Boltzmann Machines (RBM) as an artificial neural network

A restricted Boltzmann machine (RBM) [70] is a specific type of artificial neural network which can be used to learn a probability distribution. The RBM is a widely used tool in deep learning methods.

Consider an RBM consisting of m visible neurons and n hidden neurons. We can define the energy of an RBM as

$E(v, h) = \sum_i a_i v_i + \sum_j b_j h_j + \sum_{ij} W_{ij} v_i h_j$.  (5.15)

The partition function of an RBM can be defined as

$Z = \sum_{v,h} e^{-E(v,h)}$.  (5.16)

Here we are summing over all the possible configurations of v and h.

The probability of a given configuration v is

$p(v) = \sum_h e^{-E(v,h)}/Z$.  (5.17)

A dataset consists of many data vectors labeled by v_i. If we maximize the log-likelihood function

$L = \frac{1}{n}\sum_i \log(p(v_i))$,  (5.18)

we can construct the probability distribution of the data set using the RBM.

The RBM is illustrated in Figure 5.2.
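For a very small RBM, Eqs. (5.15)-(5.17) can be evaluated by explicit enumeration. The following sketch (our illustration, following the sign convention written above; training by maximizing Eq. (5.18), e.g. by contrastive divergence, is not shown) computes p(v) for one visible configuration:

```python
import itertools
import numpy as np

def rbm_probability(v, a, b, W):
    """p(v) from Eqs. (5.15)-(5.17) for a small binary RBM, by explicit enumeration.
    Energy convention as in the text: E(v, h) = a.v + b.h + v^T W h."""
    m, n = len(a), len(b)
    def energy(vv, h):
        return a @ vv + b @ h + vv @ W @ h
    hs = [np.array(h) for h in itertools.product([0, 1], repeat=n)]
    vs = [np.array(vv) for vv in itertools.product([0, 1], repeat=m)]
    Z = sum(np.exp(-energy(vv, h)) for vv in vs for h in hs)      # Eq. (5.16)
    return sum(np.exp(-energy(np.array(v), h)) for h in hs) / Z   # Eq. (5.17)

rng = np.random.default_rng(0)
a, b = rng.standard_normal(3), rng.standard_normal(2)
W = rng.standard_normal((3, 2))
print(rbm_probability([1, 0, 1], a, b, W))
```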

5.9 Model evaluation

After applying a machine learning method to the data set, a natural question to consider is whether our learning process is good or not. This leads to model evaluation.

Figure 5.2: The restricted Boltzmann machine. This figure is credited to Wikipedia, by author Qwertyus.

A good practice in model evaluation is to split the data set into a training set and a testing set: we use the training set for learning and the testing set for model

evaluation. When parameter tuning is needed, we can also split off a validation set.

Using this technique, we can determine whether our model is overfitted or underfitted.

For instance, if we have a large error on both the training set and the testing set, our model might be underfitted, while if we have a small error on the training set but a large error on the testing set, the model might be overfitted. Other methods include k-fold cross validation (separate the data into k parts and choose some parts as the training set and the others as the validation set) and bootstrapping (randomly sample some data as the training sets).
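In scikit-learn, the train/test split and k-fold cross validation described above take only a few lines; a minimal sketch with a built-in dataset (our illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Comparing training and testing scores hints at over- or under-fitting.
print(model.score(X_train, y_train), model.score(X_test, y_test))

# k-fold cross validation (k = 5) on the training set.
print(cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5).mean())
```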

To judge the quality of a classification, the confusion matrix can be introduced. True positives (TP) are samples that are labeled as positive and classified as positive. False positives (FP) are samples that are labeled as negative and classified as positive. True negatives (TN) are samples that are labeled as negative and classified as negative. False negatives (FN) are samples that are labeled as positive and classified as negative.

Quantities such as the recall

$\mathrm{recall} = \frac{TP}{TP + FN}$,  (5.19)

and the precision

$\mathrm{precision} = \frac{TP}{TP + FP}$,  (5.20)

can be defined on these rates.

In general, there is a tradeoff between the recall (R) and the precision (P); therefore, we can define the F1 score,

$F1 = \frac{2PR}{P + R}$,  (5.21)

to be a combined metric.

We can also plot the true positive rate on the y-axis versus the false positive rate on the x-axis. This is the Receiver Operating Characteristic (ROC) curve. A large area under the curve (AUC) means a better model. Another metric, the log loss, is also useful.

For a regression model, we can define the coefficient of determination, which is the ratio of the regression variation to the total variation. A better model will have a coefficient of determination close to 1.

In terms of selecting the best parameters of our model, popular methods such as grid search and random search are also useful. Grid search sets up a grid of candidate values to find the best parameter, while random search samples the parameters randomly.

Chapter 6: Solving quantum mechanics problems using radial basis function network

6.1 Introduction

Machine learning theory[71] has been developing rapidly in recent years. Machine learning techniques have been successfully applied to solve a variety of problems, such as email filtering, optical character recognition (OCR), and natural language processing, and have become a part of everyday life. In the physical sciences, researchers are also applying machine learning methods to explore new possibilities. For example, machine learning methods are used in molecular dynamics[72][73], as a way to bypass the Kohn-Sham equation in density functional theory[74], to assist in materials discovery[75], or to identify phase transitions[76]. Considering the power of machine learning, it is interesting to consider solving quantum mechanics problems using machine learning methods.

Artificial neural networks (ANNs) [77], which are inspired by biological neural networks, are one of the most important methods in machine learning theory. An

ANN consists of a network of artificial neurons, and examples of ANNs include feedforward neural networks[71], radial basis function (RBF) networks[78], and restricted Boltzmann machines[79]. As a universal approximator [80][81], an ANN can be used to represent functions, and it is possible to use an ANN as a representation of the wavefunction in a quantum system.

Researchers have been trying to combine neural network theory and quantum mechanics, for example, using a neural network in real space to solve differential equations, especially the Schrödinger equation with some specific potential[82]. Another example is the quantum neural network[83], where information in an ANN is processed quantum mechanically. One of the most promising works was the recent research by Carleo and Troyer in Ref.[5], where the restricted Boltzmann machine was used as the variational Monte Carlo (VMC) ground state wavefunction. In their work, the ground state of a many-body system could be efficiently represented by a neural network. Following their work, other possibilities were also explored. Most recently, in Ref.[84], a three-layer feedforward neural network was used to calculate the ground state energy of the Bose-Hubbard model. Machine learning methods were shown to be able to distinguish different phases, even for systems with the sign problem[85].

VMC methods do not suffer from the fermion sign problem; therefore, using a neural network as a VMC ansatz is very promising and has the potential to tackle calculations that are almost impossible in other Monte Carlo methods.

In this article, the possibility of using an RBF network to represent the wavefunction of a quantum-mechanical system is discussed. Our work is new in two major aspects. First, the representation power of the RBF network is illustrated, which has not been discussed in the physics literature. Second, instead of a lattice system, where the dimension of the Hilbert space of each site is finite, a general quantum-mechanical system with infinite or continuous degrees of freedom is discussed. A binary restricted Boltzmann machine is not sufficient for the simulation of such a system; therefore, it is interesting to search for new ansatze. An RBF network is one of the candidates.

In our work, a VMC procedure is formulated, where an RBF network is used as the variational wavefunction. A harmonic oscillator in a linear potential and a particle in a box with a linear potential are then used as benchmarks. Furthermore, we discuss the possibility of using the VMC method to solve for the lowest eigenvalue of a matrix.

This article is organized as follows. In section 6.2, artificial neural network theory and variational Monte Carlo theory are reviewed. Section 6.3 contains the major results, that is, quantum mechanical problems are solved using a radial basis neural network.

In section 6.4, we discuss some related questions.

6.2 Artificial neural network theory and the variational Monte Carlo method

In this section, two cornerstones of this work will be introduced, which are the artificial neural network theory and the variational Monte Carlo method. Although we use RBF networks in the calculation, in this section, we introduce feedforward neural networks for pedagogical purposes and for comparison.

6.2.1 Artificial neural network theory

Inspired by the biological neural network model, ANN theory was proposed by

McCulloch and Pitts in 1943[77], in an attempt to propose a mathematical description of the biological nervous system.

Feedforward neural network

Figure 6.1 illustrates a simple example of a feedforward neural network which consists of three layers of artificial neurons. Each neuron is represented by a circle.

Suppose we have N inputs denoted by $x_i^{(1)}$, i = 1, 2, ..., N. These inputs are represented by N neurons in the input layer. The input layer can be fed to the hidden layer through the relation

Figure 6.1: An illustration of the artificial neural network. A typical neural network consists of three layers of neurons: the input layer $x_i^{(1)}$, the hidden layer $x_j^{(2)}$, and the output layer $y_k^{(2)}$. The lines between layers are associated with weights w, and the circles that represent neurons are associated with biases b.

$y_j^{(1)} = \sum_i w_{ji}^{(1)} x_i^{(1)} + b_j^{(1)}$.  (6.1)

Here, $w_{ji}^{(1)}$ is called the weight and $b_j^{(1)}$ is called the bias. j = 1, 2, ..., M is the index labeling the hidden layer and M is the number of neurons in the hidden layer. In the hidden layer, each $y_j^{(1)}$ is transformed to the input of the next layer through the activation function

$x_j^{(2)} = a(y_j^{(1)})$.  (6.2)

Practically, the activation function can be the sigmoid function

$\mathrm{sigmoid}(x) = \sigma(x) = \frac{1}{1 + e^{-x}}$,  (6.3)

or the hyperbolic tangent function.

After the activation function, $x_j^{(2)}$ is fed to the output layer through

$y_k^{(2)} = \sum_j w_{kj}^{(2)} x_j^{(2)} + b_k^{(2)}$,  (6.4)

with k labeling the output layer. $y_k^{(2)}$ is then transformed to the final output of the neural network through the activation function

$z_k = \sigma(y_k^{(2)})$.  (6.5)

There can be more than one hidden layer in a neural network, and the transformation rules can be constructed similarly.
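For concreteness, a minimal sketch (our illustration) of the forward pass defined by Eqs. (6.1)-(6.5):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    """Forward pass of the three-layer network of Eqs. (6.1)-(6.5)."""
    y1 = w1 @ x + b1          # Eq. (6.1): input layer -> hidden pre-activation
    x2 = sigmoid(y1)          # Eq. (6.2): hidden activation
    y2 = w2 @ x2 + b2         # Eq. (6.4): hidden -> output pre-activation
    return sigmoid(y2)        # Eq. (6.5): network output

rng = np.random.default_rng(0)
N, M, K = 4, 6, 2                                   # input, hidden, output sizes
w1, b1 = rng.standard_normal((M, N)), rng.standard_normal(M)
w2, b2 = rng.standard_normal((K, M)), rng.standard_normal(K)
print(forward(rng.standard_normal(N), w1, b1, w2, b2))
```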

Neural networks are a widely-used tool in machine learning theory, for example,

as a statistical classification tool for supervised learning. In practice, the learning

process can be carried out by minimizing the error function

$E(w, b) = \frac{1}{2}\sum_{l=1}^{N_t} \|z(x_l; w, b) - z_0(x_l)\|^2$.  (6.6)

111 In this equation, w,b are the weights and biases in the neural network, Nt is the

number of elements in the training set, and z, z0 are the output of the neural network

and the measured value in the training set, respectively.

The goal of the learning process is to find the optimal w and b that minimize the error function E(w, b). This can be a highly non-trivial problem when there are a large number of parameters. For algorithms such as back-propagation, please see Ref.[71].

In a typical machine learning problem using neural network methods, the input neuron can be a binary number. For example, in a handwritten digit recognition problem, each input neuron corresponds to a pixel in a figure and takes a value of 0 or 1. The input values are processed through the neural network using, for example, the rules mentioned above. The output values of the neural network are compared with the objective values, and the error E(w, b) is minimized by finding the optimal parameters.

The feedforward neural network discussed above is only an illustration. In this article, another type of neural network, the RBF network, is used as the variational wavefunction ansatz.

Radial basis function (RBF) network

The RBF network has the same graphical representation as the feedforward neural network, but the function representations are different. For example, for a

112 three-layer neural network with one single output neuron, the output function z(x)

can be written as

$z(x) = \sum_{i=1}^{M} a_i\, \rho_i(\|x - c_i\|)$.  (6.7)

In this output function, ai and ci are parameters of the neural network. x is the

input vector which has the same dimension as ci. M is the number of neurons in the hidden layer. ρ(|| • ||) is the radial basis function which can be a Gaussian function with a Euclidean norm.

$\rho_i(\|x - c_i\|) = e^{-|b_i|\,\|x - c_i\|^2}$,  (6.8)

or an exponential absolute value function

$\rho_i(\|x - c_i\|) = e^{-|b_i|\,\|x - c_i\|}$.  (6.9)

Other activation functions, such as multiquadratics

$\rho_i(\|x - c_i\|) = \left(\|x - c_i\|^2 - |b_i|^2\right)^{1/2}$,  (6.10)

or inverse multiquadratics

$\rho_i(\|x - c_i\|) = \left(\|x - c_i\|^2 - |b_i|^2\right)^{-1/2}$,  (6.11)

are also commonly used in the machine learning community. These activation functions can also be understood as kernel functions. In the activation functions, |b_i| are parameters that control the spread of the activation function. Other activation functions are also possible; discussions about the activation function can be found in Ref. [86].
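As an illustration (ours, not part of the original C++ code of this work), Eq. (6.7) with the Gaussian basis of Eq. (6.8) can be evaluated in a few lines:

```python
import numpy as np

def rbf_output(x, a, b, c):
    """Eq. (6.7) with the Gaussian basis of Eq. (6.8):
    z(x) = sum_i a_i exp(-|b_i| * ||x - c_i||^2)."""
    dist2 = np.sum((x[None, :] - c) ** 2, axis=1)   # ||x - c_i||^2 for each hidden neuron
    return np.sum(a * np.exp(-np.abs(b) * dist2))

rng = np.random.default_rng(0)
M, dim = 10, 2                          # hidden neurons, input dimension
a, b = rng.standard_normal(M), rng.standard_normal(M)
c = rng.standard_normal((M, dim))
print(rbf_output(np.array([0.5, -1.0]), a, b, c))
```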

In addition to the feedforward neural network and the RBF network, many different types of neural networks can be constructed, such as the restricted Boltzmann machines or the autoencoders, which are widely used in deep learning technology. The universal approximation theorem establishes the mathematical foundation of neural network theory, which states that neural network functions are dense in the space of continuous functions defined on a compact subset of R^n, under some assumptions about the activation function and given enough hidden neurons[80][81].

In this paper, the RBF network is used as a variational wave function represented in a discrete eigenbasis. Note that we use |b_i| as a variational parameter in our calculations instead of a constant, as in a regular RBF network. The absolute value of b_i is taken for the stability of the optimization.

When neural network methods are applied to quantum physics, the inputs of the neural network can take discrete quantum numbers. After being processed through the neural network, the outputs of the neural network represent the amplitudes of the wavefunction on the basis labeled by the input quantum numbers. The neural network is then trained by minimizing the energy expectation value. For example, for a three-dimensional quantum harmonic oscillator in an orthogonal coordinate system, we can use a neural network with three input neurons, where each input can take integer values from 0 to ∞. The trained neural network should represent the ground state of this system, in which, after proper normalization, the output should be 1 given 000 as the input, and 0 for other inputs.

6.2.2 Variational Monte Carlo method (VMC)

The VMC method, first proposed by McMillan in 1965[60], combines the variational method and the Monte Carlo method in order to evaluate the ground state of a quantum system.

Starting from a Hamiltonian Ĥ and a variational wave function |ψ(λ)⟩, where λ is a set of variational parameters, the energy expectation value can be written as

$E(\lambda) = \frac{\langle\psi(\lambda)|\hat{H}|\psi(\lambda)\rangle}{\langle\psi(\lambda)|\psi(\lambda)\rangle}$.  (6.12)

This energy expectation value can be computed using the widely known Metropolis algorithm[61], which is one of the most efficient algorithms in computational science. As a Markov chain Monte Carlo method, it may currently be the only efficient algorithm for evaluating a multidimensional integral.

The next step of the VMC method is to minimize the energy in the parameter space. This can be a difficult problem when there are many variational parameters.

Two examples of such algorithms are the linear method[87] and the stochastic reconfiguration method[88]. The minimization algorithm gives the minimum of the energy in the parameter space, and it is reasonable to use this value as our approximation for the ground state energy. For a detailed review of the VMC method, please refer to Ref. [89].

Currently, physicists believe that the accuracy of the VMC method depends, to a great extent, on a proper choice of the variational wavefunction; therefore, it is important to choose a wavefunction based on physical intuition or a physical understanding of the system. This belief may not be true in the age of machine learning. Neural network functions are capable of approximating unknown functions by maximizing or minimizing the objective function. It would be interesting to further explore the possibility of using a neural network function as the variational wavefunction of a quantum system.

6.3 Solving quantum mechanics problems using artificial neural network

In the pioneering work of Carleo and Troyer[5], a restricted Boltzmann machine (RBM) was used as a variational wavefunction for many-body systems. The transverse-field Ising model and the anti-ferromagnetic Heisenberg model were benchmarked using the RBM wavefunction, and variational Monte Carlo calculations were carried out. Their results demonstrate that a neural network wavefunction is capable of capturing the quantum entanglement of the ground states and giving an accurate estimation of the ground state energy.

In this article, we continue developing this idea of using artificial neural network functions as the ground state variational wavefunction. In Ref.[5], the restricted Boltzmann machine is only binary-valued; we will demonstrate the representation power of a neural network wavefunction without this constraint. In addition, we discuss the possibility of using a neural network wavefunction to solve a generic quantum mechanics problem. This VMC method is at least as accurate as perturbation theory.

6.3.1 Theoretical outline

Consider a quantum system with a countable set of basis states; an arbitrary state |ψ⟩ in the Hilbert space can be represented by

$|\psi\rangle = \sum_{n_1, n_2, \dots, n_p} \psi(n_1, n_2, \dots, n_p)\,|n_1, n_2, \dots, n_p\rangle$,  (6.13)

where |n_1, n_2, ..., n_p⟩ is a set of basis states labeled by quantum numbers n_i, i = 1, 2, ..., p, and p is the number of sites in the system. For example, for the Heisenberg model, p represents the number of spins; for a three-dimensional harmonic oscillator in Cartesian coordinates, we could use n_1, n_2, n_3 to label the three quantum numbers. ψ(n_1, n_2, ..., n_p) is the amplitude of |ψ⟩ on the basis state |n_1, n_2, ..., n_p⟩. We can interpret this amplitude as a function of n_1, n_2, ..., n_p. A similar ansatz is also used in Ref. [84].

This function can be represented by a neural network with one output neuron.

Using an RBF network, the amplitude function can be written as

$\psi(n_1, n_2, \dots, n_p; a, c) = \sum_{i=1}^{M} a_i\, \rho_i(\|n - c_i\|)$,  (6.14)

where n represents the array of quantum numbers and

$\rho_i(\|x - c_i\|) = e^{-|b_i|\,\|x - c_i\|^2}$.  (6.15)

One reason to choose this neural network is that the Gaussian activation function

guarantees that the amplitude does not diverge when n → ∞.

Practically, it is useful to truncate the quantum number ni if its range is countably

infinite. This is not necessary for a spin half lattice system since ni can only take two

117 values. For a harmonic oscillator, however, we may truncate the quantum number

at some finite value. The universal approximation theorem is only valid for a closed

space. This truncation will also facilitate numerical simulations.

Using this variational wave function, the energy expectation value is

$E(\lambda) = \frac{\langle\psi(\lambda)|H|\psi(\lambda)\rangle}{\langle\psi(\lambda)|\psi(\lambda)\rangle} = \frac{\int |\psi(n;\lambda)|^2\, E_{\mathrm{local}}(n;\lambda)\, dn}{\int |\psi(n;\lambda)|^2\, dn}$,  (6.16)

with

$E_{\mathrm{local}}(n;\lambda) = \frac{\langle n|H|\psi(\lambda)\rangle}{\langle n|\psi(\lambda)\rangle} = \frac{\sum_{n'} \langle n|H|n'\rangle\langle n'|\psi(\lambda)\rangle}{\langle n|\psi(\lambda)\rangle}$.  (6.17)

Here, λ represents all the variational parameters, for example, ai, bi and ci.

The energy expectation value can be evaluated using the Metropolis algorithm. After initialization and thermalization, repeat these two steps until equilibrium: (1) generate a move from configuration n to n′′; (2) using a proper transition probability, accept or reject the move with probability $\min\left(1, \left|\frac{\langle n''|\psi(\lambda)\rangle}{\langle n|\psi(\lambda)\rangle}\right|^2\right)$. Expectation values of other operators can be evaluated similarly.

Compared with exact diagonalization, one advantage of this formalism is that the matrix element ⟨n|H|n′⟩ is never stored explicitly. Only the non-zero matrix elements need to be evaluated and summed during the sampling process.

The energy as a function of parameters λ can be, for example, minimized using

the stochastic reconfiguration method[88]. Due to the fact that the energy in a VMC

calculation is stochastically sampled, conventional optimization methods in machine

learning may not be sufficient. Therefore, the stochastic reconfiguration method is

more convenient to use.

In the stochastic reconfiguration method, an operator

$O_i(n) = \frac{\partial_{\lambda_i}\psi_\lambda(n)}{\psi_\lambda(n)}$,  (6.18)

can be defined for each parameter in the variational wavefunction.

For a radial basis neural network with a Gaussian basis function,

$O_{a_i}(n) = \frac{\rho_i}{\psi}$,  (6.19)

$O_{b_i}(n) = -\frac{a_i\, b_i\, |n - c_i|^2\, \rho_i}{|b_i|\,\psi}$,  (6.20)

$O_{c_{ij}}(n) = \frac{2\, a_i\, |b_i|\, (n_j - c_{ij})\, \rho_i}{\psi}$,  (6.21)

where c_{ij} is the j-th component of c_i. The covariance matrix and the forces are defined as

$S_{ij} = \langle O_i^* O_j\rangle - \langle O_i^*\rangle\langle O_j\rangle$,  (6.22)

$F_i = \langle E_{\mathrm{local}}\, O_i^*\rangle - \langle E_{\mathrm{local}}\rangle\langle O_i^*\rangle$.  (6.23)

The parameters can be updated by

$\lambda'_j = \lambda_j + \alpha\, S^{-1}_{ij} F_i$.  (6.24)

Here, ⟨•⟩ is the expectation value of an operator. α can be understood as the learning rate of the optimization algorithm. A regularization, $S'_{ii} = S_{ii} + \lambda(k) S_{ii}$, is applied to the diagonal elements of the matrix S in all our calculations, where $\lambda(k) = \max(100 \times 0.9^k,\ 10^{-4})$ [5]. This process iterates until the optimization converges, and we treat the converged energy as our best approximation of the ground state energy.
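A single stochastic-reconfiguration step, given Monte Carlo samples of O_i and E_local, can be sketched as follows for real parameters (our illustration, not the C++/Eigen implementation used in this work; the parameters are stepped along −S⁻¹F so that the sampled energy decreases):

```python
import numpy as np

def sr_update(params, O, Eloc, alpha=0.1, step=0):
    """One stochastic-reconfiguration update, Eqs. (6.22)-(6.24), for real parameters.
    O[s, i] = O_i(n_s) and Eloc[s] = E_local(n_s) are Monte Carlo samples."""
    O_mean = O.mean(axis=0)
    S = O.T @ O / len(O) - np.outer(O_mean, O_mean)                 # Eq. (6.22)
    F = (Eloc[:, None] * O).mean(axis=0) - Eloc.mean() * O_mean     # Eq. (6.23)
    lam = max(100 * 0.9 ** step, 1e-4)
    S[np.diag_indices_from(S)] *= (1.0 + lam)                       # diagonal regularization
    # Move against S^{-1} F so that the sampled energy decreases (cf. Eq. (6.24)).
    return params - alpha * np.linalg.solve(S, F)

rng = np.random.default_rng(0)
params = rng.standard_normal(3)
O = rng.standard_normal((1000, 3)); Eloc = rng.standard_normal(1000)
print(sr_update(params, O, Eloc))
```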

In this article, the method mentioned above is used for the optimization. We note the recent work of Saito[84], in which a feedforward neural network was successfully used to represent the ground state of the Bose-Hubbard model. In their work, an exponential function was constructed from the output of the feedforward neural network. It is an interesting question whether an exponential of a feedforward neural network output function can be used to represent a general quantum mechanical wavefunction.

6.3.2 One-dimensional quantum harmonic oscillator in an electric field

To start with, we’d like to benchmark the quantum harmonic oscillator. Since we use a set of discrete quantum numbers to describe the variational wavefunction, it is natural to use the energy eigenbasis of an unperturbed harmonic oscillator to calculate the matrix element.

Consider the one-dimensional Hamiltonian

$H = \frac{\hat{p}^2}{2} + \frac{\hat{x}^2}{2} + E\hat{x} = H_0 + E\hat{x}$,  (6.25)

where E is a parameter that can be understood as the electric field.

Using natural units, it is easy to see that the ground state energy of H0 is 0.5.

Assuming the eigenstates of H_0 are labeled by |n⟩, the variational ansatz for the ground state of H can be approximated by

$|\psi\rangle = \sum_{n=0}^{n_{\max}-1} \psi(n)\,|n\rangle$,  (6.26)

with ψ(n) represented by an RBF network with one input neuron, and the quantum number truncated at n_max − 1. In this notation, the RBF network represents the function ψ(n), and the coefficient on the basis state |n⟩ is ψ(n): for example, if n = 1, the output of the neural network is the coefficient on the basis state |1⟩, which is ψ(1).

We use the VMC procedure described in Section 6.3.1 to conduct the calculation.

The parameters are initialized randomly. Our code is written in C++, where the matrix solving library Eigen[90] is used for the Stochastic Reconfiguration.

A neural network with random parameters is first created. Then the ground state energy for one set of parameters is calculated using the Monte Carlo method. The state space of the Monte Carlo sampling is a truncated discrete space denoted by n. Specifically, our quantum number is that of the unperturbed Hamiltonian H_0, and the basis is the eigenbasis of H_0; we are trying to solve for the ground state of the perturbed Hamiltonian. A random plus-or-minus move is generated for each sample and accepted using the Metropolis algorithm. In this work, when a random move yields a quantum number below zero at the boundary of the state space, the quantum number is reflected back to a positive value in order to satisfy the detailed balance condition. For each specific n, we can plug it into the neural network and get its amplitude. During the Monte Carlo process, 50000 samples are used. Being able to calculate the energy, we can then use the stochastic reconfiguration method to find the minimal energy, and we treat this energy as our best approximation of the ground state energy.
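To make the sampling step concrete, the following is a minimal Python sketch (our illustration; the actual calculations in this work were done in C++ with Eigen) of the Metropolis estimate of the energy for a fixed, untrained set of RBF parameters. The oscillator matrix elements ⟨n|x̂|n+1⟩ = √((n+1)/2) are the standard results in natural units, and the reflection rule at the upper cutoff is our assumption; only the lower-boundary reflection is described in the text.

```python
import numpy as np

nmax, E_field, M = 20, 0.5, 10
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(M), rng.standard_normal(M), 5 * rng.random(M)

def psi(n):
    # Eq. (6.14) with the Gaussian basis of Eq. (6.15), one input neuron.
    return np.sum(a * np.exp(-np.abs(b) * (n - c) ** 2))

def e_local(n):
    # <n|H|psi>/<n|psi> for H = H0 + E*x, using <n|x|n+1> = sqrt((n+1)/2).
    val = (n + 0.5) * psi(n)
    if n > 0:
        val += E_field * np.sqrt(n / 2.0) * psi(n - 1)
    if n < nmax - 1:
        val += E_field * np.sqrt((n + 1) / 2.0) * psi(n + 1)
    return val / psi(n)

n, samples = 5, []
for step in range(50000):
    trial = n + rng.choice([-1, 1])
    if trial < 0:
        trial = -trial                       # reflect at the lower boundary
    if trial > nmax - 1:
        trial = 2 * (nmax - 1) - trial       # assumed reflection at the cutoff
    if rng.random() < min(1.0, (psi(trial) / psi(n)) ** 2):
        n = trial
    samples.append(e_local(n))
print(np.mean(samples[5000:]))               # sampled E(lambda) for fixed parameters
```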

In Figure 6.2, we illustrate the minimization of the ground state energy during the iteration process using the Gaussian basis function; see Eq. (6.8). The learning rate is set at 0.1. m denotes the number of neurons in the hidden layer.

Figure 6.2: Minimization of the ground state energy of H at E = 0, using the Gaussian radial basis network. m is the number of neurons in the hidden layer.

Alternatively, we can use the exponential absolute value function as the RBF; see Eq. (6.9). Under the same learning rate, this RBF network also converges to the correct eigenvalue; see Figure 6.3. It is easy to see that the Gaussian RBF network behaves better than the other. Based on our experience, the Gaussian network also performs better in other cases; therefore, we use the Gaussian network in later examples.

Remarks: We use n as our variable for the variational wavefunction, and the variable n of ψ(n) is discrete. This should not be confused with the method that uses a Gaussian function in the coordinate representation as the variational wavefunction, which is trivial. One reason that we compare Eq. (6.8) and Eq. (6.9) is to demonstrate that this method is capable of giving the correct coefficients regardless of the radial basis function.

Figure 6.3: Minimization of the ground state energy of H at E = 0, using Eq. (6.9) as the radial basis function. m is the number of neurons in the hidden layer.

Figure 6.4 illustrates the behavior of the VMC under different electric fields. In our simulation, a separate neural network is trained for each E. The theoretical value of the ground state energy is e_g = 0.5(1 − E²). The VMC results converge to 0.375, 0.000, −1.460 for E = 0.5, 1.0, 2.0, while the exact values are 0.375, 0.000, −1.5, respectively. Notice that the error increases with E for a given n_max. In this section n_max = 20.

Figure 6.4: Minimization of the ground state energy of H at E = 0.5, 1.0, 2.0, using the Gaussian radial basis function.

Notice that during the optimization process, the sampled ground state energy may have some spikes. The author believes that this phenomenon is a result of the stochastic nature of the optimization algorithm. Random fluctuations of the expectation value of the operator and the complicated structure of the energy function may lead to drastic changes in the ground state energy during the optimization process.

Figure 6.5 shows ψ(n) as a function of n for different E. ψ(n) is normalized, and its value is the overlap between the ground state of H and the energy eigenstate |n⟩ of H_0.

Theoretically, one can calculate that

$\psi(n) = \frac{1}{\sqrt{\pi}}\left(\frac{1}{2^n n!}\right)^{1/2} \int_{-\infty}^{\infty} e^{-(x-E)^2/2}\, e^{-x^2/2}\, H_n(x)\, dx$,  (6.27)

where H_n(x) are the Hermite polynomials. Simplifying this expression, we get

$\psi(n) = \frac{1}{\sqrt{2^n n!}}\, E^n e^{-E^2/4}$.  (6.28)

It can be seen that the VMC values agree very well with the exact values when E is small. Errors begin to increase when E gets larger.

Figure 6.5: ψ(n) as a function of n at E = 0.0, 0.5, 1.0, 2.0, using the Gaussian radial basis function. Circles represent theoretical values and asterisks represent the values from the RBF network.

Based on these results, we claim that the radial basis neural network clearly captures the behavior of the 1-D quantum harmonic oscillator.

6.3.3 Two-dimensional quantum harmonic oscillator in an electric field

Similarly, we can consider a radial basis neural network with many input neurons.

For example, with two input neurons, we can consider a two-dimensional quantum harmonic oscillator in an electric field.

125 Consider a Hamiltonian

$H = \frac{\hat{p}_x^2}{2} + \frac{\hat{x}^2}{2} + \frac{\hat{p}_y^2}{2} + \frac{\hat{y}^2}{2} + E_x\hat{x} + E_y\hat{y} = H_0 + E_x\hat{x} + E_y\hat{y}$.  (6.29)

It is easy to see that the ground state energy of H_0 is 1.0. We will treat E_x and E_y as our parameters.

Our neural network wavefunction can be written as

$|\psi\rangle = \sum_{n_x, n_y = 0}^{n_{\max}-1} \psi(n_x, n_y)\,|n_x, n_y\rangle$.  (6.30)

We can use the same VMC procedure as in the previous part to perform the calculation. The learning rate in this case is set at 0.2, and our neural network has 10 hidden neurons and 2 input neurons. The algorithm used for this 2-D example is similar to that for the 1-D harmonic oscillator.

Figures 6.6 and 6.7 illustrate the behavior of the trained neural network at different electric fields. From the shape of the surface, we can see that a proper choice of n_max is important to the accuracy of this method. The reason is that, in this example, when E_x and E_y get larger, the bump in the function ψ(n) shifts away from the origin. The states beyond n_max are not considered; therefore, the accuracy will be affected if the overlaps beyond n_max are large. In these figures, we choose n_max = 10 to illustrate the influence of n_max on the accuracy.

The exact value of ψ(n_x, n_y) can be solved as

$\psi(n_x, n_y) = \frac{1}{\sqrt{2^{n_x} n_x!}}\,\frac{1}{\sqrt{2^{n_y} n_y!}}\, E_x^{n_x} E_y^{n_y}\, e^{-E_x^2/4}\, e^{-E_y^2/4}$.  (6.31)

Table 6.1 lists a sample of the relation between n_max and the VMC energy at E_x = 4.0, E_y = 2.0. We can see that in this example the accuracy of the results improves with n_max.

Figure 6.6: ψ(n_x, n_y) as a function of n_x + 1, n_y + 1 at E_x = 1.0, E_y = 1.0, using the Gaussian radial basis function. In this figure, ψ(n_x, n_y) is not normalized.

Table 6.1: The relation between n_max and the VMC energy at E_x = 4.0, E_y = 2.0. The exact value is −9.

n_max   VMC energy
3       -6.28397
4       -7.80747
5       -8.02855
10      -8.71073
20      -8.90894
40      -8.98397

Figure 6.7: ψ(n_x, n_y) as a function of n_x + 1, n_y + 1 at E_x = 4.0, E_y = 2.0, using the Gaussian radial basis function. In this figure, ψ(n_x, n_y) is not normalized.

Figure 6.8 shows ψ(n_x, n_y) as a function of n_y for different n_x at E = (1.0, 1.0). We can see that the numerical results agree well with the exact results.

6.3.4 Particle in a box

Another example that is benchmarked is a particle in a box with perturbation.

Consider the Hamiltonian

$H = \frac{\hat{p}^2}{2} + V(x) + a\hat{x} = H_0 + a\hat{x}$,  (6.32)

with V(x) = 0 when 0 < x < 1 and V(x) = ∞ otherwise. The term a x̂ is a linear potential defined on 0 < x < 1 with a as a parameter.

Figure 6.8: ψ(n_x, n_y) as a function of n_y at different n_x with E_x = 1.0, E_y = 1.0. Circles represent theoretical values and asterisks represent the values from the RBF network. In this figure, ψ(n_x, n_y) is normalized.

In natural units, the ground state energy of H_0 is π²/2 = 4.9348. The first-order perturbation theory correction for the ground state energy is a/2. Second-order perturbation theory gives a correction of −0.002194 a².

A radial basis neural network VMC simulation can be similarly carried out. As always, we choose the basis to be the eigenbasis of H_0. 50000 samples are used, ten hidden neurons (m = 10) are chosen in our calculation, n_max is set at 20, and the learning rate is set at 0.01. The matrix elements in the local energy can be calculated as

$\langle n_1|a\hat{x}|n_2\rangle = a\,\frac{4\,[(-1)^{n_1+n_2} - 1]\, n_1 n_2}{(n_1 - n_2)^2\,(n_1 + n_2)^2\,\pi^2}$,  (6.33)

when n_1 ≠ n_2, and

$\langle n_1|a\hat{x}|n_2\rangle = 0.5\,a$,  (6.34)

when n_1 = n_2.
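As an independent cross-check of this benchmark (our illustration, not the author's C++ VMC code and not the Mathematica calculation mentioned below), one can diagonalize the truncated Hamiltonian built from Eqs. (6.33)-(6.34) directly:

```python
import numpy as np

def box_ground_state(a, nmax=50):
    """Lowest eigenvalue of H = H0 + a*x in the truncated eigenbasis of H0,
    using Eqs. (6.33)-(6.34); natural units, box length 1."""
    n = np.arange(1, nmax + 1)
    H = np.diag(n ** 2 * np.pi ** 2 / 2.0 + 0.5 * a)     # n^2 pi^2/2 + <n|ax|n>
    for i in range(nmax):
        for j in range(nmax):
            if i != j:
                n1, n2 = i + 1, j + 1
                H[i, j] = (a * 4 * ((-1) ** (n1 + n2) - 1) * n1 * n2
                           / ((n1 - n2) ** 2 * (n1 + n2) ** 2 * np.pi ** 2))
    return np.linalg.eigvalsh(H)[0]

for a in [0.0, 2.0, 4.0, 8.0, -8.0]:
    print(a, box_ground_state(a))     # compare with Table 6.2
```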

In Figure 6.9, the convergence of the VMC ground state energy at different parameters is illustrated. Intermediate points with a value larger than 20 are set to 20 to maintain the scale of the graph. Notice that we get more spikes during the iteration when a is small. The heights of the spikes decrease if smaller learning rates are used.

Figure 6.9: Minimization of the ground state energy of H at a = 0.0, 2.0, 4.0, 8.0, −8.0.

Table 6.2 compares the results of the RBF network VMC, theoretical results up to second-order perturbation theory, and exact results. The exact ground state energy values are calculated using Mathematica. We can see that the VMC performs much better than first-order perturbation theory and converges to a value very close to the exact ground state energy.

Table 6.2: Comparison between exact values, perturbation results and numerical VMC energy at different a.

a      1st order   2nd order   VMC energy   exact value
0.0    4.9348      4.9348      4.9348       4.93481
2.0    5.9348      5.9260      5.9260       5.92603
4.0    6.9348      6.8997      6.8977       6.89974
8.0    8.9348      8.7944      8.7957       8.79508
-8.0   0.9348      0.7944      0.7946       0.795078

6.3.5 Neural network as a Hermitian matrix lowest eigenvalue solver

So far, the examples that have been benchmarked can all be solved by perturbation theory. Can the neural network VMC method have a wider application than perturbation theory? In this part, we will illustrate the possibility of using the RBF network VMC method to solve for the smallest eigenvalue of a Hermitian matrix. This problem is non-perturbative and purely mathematical, and our result implies that neural network VMC can have a much broader scope than perturbation methods.

Consider an n × n Hermitian matrix H. The eigenvector that corresponds to the lowest eigenvalue is an n-dimensional vector.

131 We can write this eigenvector as

$\vec{x} = \sum_{i=1}^{n} \psi(i)\,\hat{i}$,  (6.35)

and any vector in this finite vector space can be written in this form.

Define the objective function to be

$E = \vec{x}^{\,*} H \vec{x}$.  (6.36)

Then, for normalized $\vec{x}$, the smallest value of E corresponds to the lowest eigenvalue of H, and our goal is to find a set of parameters in the neural network ψ that minimizes E.

We can convert the matrix multiplication in E into a discrete sum, which can

be evaluated using the Metropolis algorithm. Instead of the energy eigenbasis, in

this situation, we can choose our configuration space to be n points, where n is the dimension of the vector $\vec{x}$, and the trial move would be from basis vector î to î′. Therefore, we can use the same VMC technique to minimize E.

Our previous examples can be essentially understood in this way, since our Hamiltonians are truncated to finite-dimensional matrices.

To give a concrete implementation of this idea, we consider a matrix

$H(d)_{pq} = 1/p + 1/q$.  (6.37)

Here H(d) is a d × d matrix, and p, q label its rows and columns: the matrix element in the p-th row and q-th column equals 1/p + 1/q.
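For reference, the exact lowest eigenvalue of H(d) can also be obtained directly with a dense eigensolver; a minimal NumPy sketch (our illustration, independent of the VMC procedure):

```python
import numpy as np

def h_matrix(d):
    """H(d)_{pq} = 1/p + 1/q, with p, q = 1, ..., d (Eq. (6.37))."""
    idx = np.arange(1, d + 1)
    return 1.0 / idx[:, None] + 1.0 / idx[None, :]

for d in [2, 3, 5, 10]:
    w = np.linalg.eigvalsh(h_matrix(d))
    print(d, w[0])            # lowest eigenvalue, cf. Table 6.3
```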

We use the RBF network ansatz to calculate the lowest eigenvalue of H(d). The number of hidden neurons is set at 20, 50000 samples are used, the iteration runs for 300 steps, and the learning rate is 0.01. Table 6.3 shows the results of our VMC simulation.

Table 6.3: VMC results for the lowest eigenvalue of H(d).

d     exact value   VMC result
2     -0.0811       -0.0811
3     -0.1874       -0.1874
5     -0.4219       -0.4218
10    -1.008        -1.002

Our optimized neural network also yields the eigenvector that corresponds to the lowest eigenvalue. The components can be acquired by plugging i into ψ(i). For example, when n = 10, VMC gives an eigenvector $\vec{V}$ = (0.6851, 0.1174, −0.0711, −0.1646, −0.2200, −0.2562, −0.2813, −0.2994, −0.3127, −0.3226), while the exact vector $\vec{V}_0$ is (0.6807, 0.1194, −0.0677, −0.1613, −0.2174, −0.2548, −0.2816, −0.3016, −0.3172, −0.3297). The Euclidean norm of the error is $d = |\vec{V} - \vec{V}_0| = 1.3 \times 10^{-4}$.

We also calculate the relation between the accuracy and m (the number of neurons

in the hidden layer). For d = 10, the variational energy is −0.0811, −0.0811, −0.9943, −1.0002 for m = 5, 10, 15, 20 respectively.

Caveat: The learning rate depends on the number of hidden neurons, and it has to be set by trial and error. We also have to point out that when d > 10, the VMC optimization procedure may converge slowly or fail to converge. The stability also depends on the form of H. For some large ill-conditioned matrices, it is expected that the random sampling process will not capture all the matrix elements and will lead to inaccurate results.

6.4 Discussion

Is it possible to use a neural network with continuous variables as the variational wavefunction, for example, a neural network where the input is the position coordinate? This is possible for certain problems. For example, we can use an RBF network with a Gaussian basis as the variational wavefunction for the ground state of a harmonic oscillator. Based on our test, although this ansatz works perfectly for the harmonic oscillator, the iteration may not converge to the correct value when applied to other models. This test is trivial for the harmonic oscillator since its ground state is intrinsically a Gaussian function. For wavefunctions with continuous variables, Kato's cusp condition[91] poses strong constraints on the mathematical form of the wavefunction. A wavefunction that does not satisfy this condition will result in strong numerical instability in the VMC calculation.

How is this approach useful? This approach provides a new way to find the ground state energy of a quantum system. Compared with traditional variational Monte Carlo simulation, this method does not require choosing a specific wavefunction from our intuition. Does this method depend on choosing a basis |n⟩? The example of the diagonalization of a Hermitian matrix illustrates that it does not, although a good basis may improve the accuracy and stability.

One advantage of ANN-based VMC is that the code is easy to modularize. When programming, we can write the modules for the neural network, the Hamiltonian, and the optimization separately. For the same Hamiltonian, we can also compare the representation power of different neural networks and different optimization methods. This greatly reduces programming difficulties and improves accuracy.

A potential issue with the neural network VMC method is that the optimization algorithm may fail to find the global minimum of the objective function. This is a common issue in machine learning methods. We see that the stochastic reconfiguration may not work well enough to find the smallest eigenvalue of a matrix of arbitrarily large dimension. Therefore, finding a stable algorithm or a stable neural network mathematical form for the VMC optimization should be a crucial task. If successful, the neural network VMC method may give numerical conclusions to many unsolved problems in quantum physics.

Based on the above points, one important research direction is to develop more efficient VMC optimization algorithms. Another interesting direction is to discuss the representation power of different neural networks since there are a variety of neural networks developed by the machine learning community. For example, one interesting problem is the representation power of a continuous restricted Boltzmann machine[92]. With a Gaussian activation function, a continuous restricted Boltzmann machine has some similarities with the RBF network ansatz discussed in this paper.

It may provide more accurate results due to the elegant mathematical structure of the restricted Boltzmann machine.

Chapter 7: Conclusions

In this dissertation, we review some common numerical methods for quantum many-body systems, for example, the exact diagonalization method, the Density Matrix Renormalization Group approach, Matrix Product States theory, and quantum Monte Carlo methods.

First, we discussed the generalization of TRG to higher dimensions, and a systematic contraction scheme was proposed. This method currently agrees well with Monte Carlo results at high magnetic fields. Further development of the CPD is needed to obtain more accurate physical results at arbitrary magnetic fields.

Second, we established the connection between tensor decomposition theory and the geometric measure of entanglement. We found agreement between the theoretical and numerical results. Furthermore, we searched for and characterized several quantum states with high entanglement. We showed that the tensor decomposition method is an efficient and accurate way to calculate the geometric measure of entanglement.

Third, we reviewed machine learning theory. We used the radial basis function network as the variational wavefunction for quantum systems and carried out Variational Monte Carlo calculations. For the examples we examined, the VMC results agree well with theoretical predictions. Our results demonstrate that neural network wave functions are capable of representing the ground state of a quantum mechanical system. Furthermore, the VMC method can be used to calculate the lowest eigenvalue of a Hermitian matrix. Representing the wavefunction with an artificial neural network is therefore a promising direction.

Developing practical numerical methods for many-body systems is important for the development and understanding of many-body physics. The application of machine learning methods to many-body physics is currently being actively studied. It is promising that these new numerical methods may provide insights into strongly correlated systems.

Bibliography

[1] D. C. Tsui, H. L. Stormer, and A. C. Gossard. Two-dimensional magnetotransport in the extreme quantum limit. Phys. Rev. Lett., 48:1559–1562, May 1982.

[2] R. B. Laughlin. Anomalous quantum hall effect: An incompressible quantum fluid with fractionally charged excitations. Phys. Rev. Lett., 50:1395–1398, May 1983.

[3] Steven R. White. Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett., 69:2863–2866, Nov 1992.

[4] Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349:117–158, 2014.

[5] Giuseppe Carleo and Matthias Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325):602–606, 2017.

[6] Michael Levin and Cody P. Nave. Tensor renormalization group approach to two-dimensional classical lattice models. Phys. Rev. Lett., 99:120601, Sep 2007.

[7] Tzu-Chieh Wei and Paul M. Goldbart. Geometric measure of entanglement and applications to bipartite and multipartite quantum states. Phys. Rev. A, 68:042307, Oct 2003.

[8] Peiyuan Teng. Generalization of the tensor renormalization group approach to 3-d or higher dimensions. Physica A: Statistical Mechanics and its Applications, 472:117–135, 2017.

[9] Peiyuan Teng. Accurate calculation of the geometric measure of entanglement for multipartite quantum states. Quantum Information Processing, 16(7):181, Jun 2017.

[10] Peiyuan Teng. Machine learning quantum mechanics: solving quantum mechanics problems using radial basis function network, 2017.

[11] Jill C. Bonner and Michael E. Fisher. Linear magnetic chains with anisotropic coupling. Phys. Rev., 135:A640–A658, Aug 1964.

[12] Cornelius Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. B, 45:255–282, 1950.

[13] Kenneth G. Wilson. The renormalization group: Critical phenomena and the kondo problem. Rev. Mod. Phys., 47:773–840, Oct 1975.

[14] U. Schollwöck. The density-matrix renormalization group. Rev. Mod. Phys., 77:259–315, Apr 2005.

[15] Gabriele De Chiara, Matteo Rizzi, Davide Rossini, and Simone Montangero. Density matrix renormalization group for dummies. Journal of Computational and Theoretical Nanoscience, 5(7):1277–1288, 2008.

[16] B Bauer, L D Carr, H G Evertz, A Feiguin, J Freire, S Fuchs, L Gamper, J Gukelberger, E Gull, S Guertler, A Hehn, R Igarashi, S V Isakov, D Koop, P N Ma, P Mates, H Matsuo, O Parcollet, G Pawłowski, J D Picon, L Pollet, E Santos, V W Scarola, U Schollwöck, C Silva, B Surer, S Todo, S Trebst, M Troyer, M L Wall, P Werner, and S Wessel. The alps project release 2.0: open source software for strongly correlated systems. Journal of Statistical Mechanics: Theory and Experiment, 2011(05):P05001, 2011.

[17] E. M. Stoudenmire. Itensor library. http://itensor.org/.

[18] F. Verstraete, V. Murg, and J.I. Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008.

[19] M. Fannes, B. Nachtergaele, and R. F. Werner. Finitely correlated states on quantum spin chains. Communications in Mathematical Physics, 144(3):443–490, 1992.

[20] Frank Verstraete and J Ignacio Cirac. Renormalization algorithms for quantum-many body systems in two and higher dimensions. arXiv preprint cond-mat/0407066, 2004.

[21] Stellan Östlund and Stefan Rommer. Thermodynamic limit of density matrix renormalization. Phys. Rev. Lett., 75:3537–3540, Nov 1995.

[22] G. Vidal. Entanglement renormalization. Phys. Rev. Lett., 99:220405, Nov 2007.

[23] L. P. Kadanoff. Spin-spin correlations in the two-dimensional ising model. Il Nuovo Cimento B (1965-1970), 44(2):276–305, 1966.

[24] H. H. Zhao, Z. Y. Xie, Q. N. Chen, Z. C. Wei, J. W. Cai, and T. Xiang. Renormalization of tensor-network states. Phys. Rev. B, 81:174411, May 2010.

[25] G. Evenbly and G. Vidal. Tensor network renormalization. Phys. Rev. Lett., 115:180405, Oct 2015.

[26] Z. Y. Xie, J. Chen, M. P. Qin, J. W. Zhu, L. P. Yang, and T. Xiang. Coarse-graining renormalization by higher-order singular value decomposition. Phys. Rev. B, 86:045139, Jul 2012.

[27] H. C. Jiang, Z. Y. Weng, and T. Xiang. Accurate determination of tensor network state of quantum lattice models in two dimensions. Phys. Rev. Lett., 101:090603, Aug 2008.

[28] Shi-Ju Ran, Bin Xi, Tao Liu, and Gang Su. Theory of network contractor dynamics for exploring thermodynamic properties of two-dimensional quantum lattice models. Phys. Rev. B, 88:064407, Aug 2013.

[29] Tomotoshi Nishino, Yasuhiro Hieida, Kouichi Okunishi, Nobuya Maeshima, Yasuhiro Akutsu, and Andrej Gendiar. Two-dimensional tensor product variational formulation. Progress of Theoretical Physics, 105(3):409–417, 2001.

[30] Artur García-Sáez and José I. Latorre. Renormalization group contraction of tensor networks in three dimensions. Phys. Rev. B, 87:085130, Feb 2013.

[31] Wei Li, Shi-Ju Ran, Shou-Shu Gong, Yang Zhao, Bin Xi, Fei Ye, and Gang Su. Linearized tensor renormalization group algorithm for the calculation of thermodynamic properties of quantum lattice models. Phys. Rev. Lett., 106:127202, Mar 2011.

[32] Magnus J Wenninger. Dual models. Cambridge University Press, 2003.

[33] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[34] J. B. Kruskal. Multiway data analysis. pages 7–18, 1989.

[35] Brett W. Bader, Tamara G. Kolda, et al. Matlab tensor toolbox version 2.6. Available online, February 2015.

[36] Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda. A scalable optimization approach for fitting canonical tensor decompositions. Journal of Chemometrics, 25(2):67–86, 2011.

[37] Robert NC Pfeifer, Glen Evenbly, Sukhwinder Singh, and Guifre Vidal. Ncon: A tensor network contractor for matlab. arXiv preprint arXiv:1402.0939, 2014.

[38] Harold Scott Macdonald Coxeter. Regular polytopes. Courier Corporation, 1973.

[39] Vin de Silva and Lek-Heng Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications, 30(3):1084–1127, 2008.

[40] Abner Shimony. Degree of entanglement. Annals of the New York Academy of Sciences, 755(1):675–679, 1995.

[41] H Barnum and N Linden. Monotones and invariants for multi-particle quantum states. Journal of Physics A: Mathematical and General, 34(35):6787, 2001.

[42] Martin Aulbach, Damian Markham, and Mio Murao. The maximally entangled symmetric state in terms of the geometric measure. New Journal of Physics, 12(7):073025, 2010.

[43] Alexander Streltsov, Hermann Kampermann, and Dagmar Bruß. Simple algorithm for computing the geometric measure of entanglement. Phys. Rev. A, 84:022323, Aug 2011.

[44] Shenglong Hu, Liqun Qi, and Guofeng Zhang. Computing the geometric measure of entanglement of multipartite pure states by means of non-negative tensors. Phys. Rev. A, 93:012304, Jan 2016.

[45] F. Verstraete, V. Murg, and J.I. Cirac. Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems. Advances in Physics, 57(2):143–224, 2008.

[46] Guyan Ni and Minru Bai. Spherical optimization with complex variables for computing us-eigenpairs. Computational Optimization and Applications, 65(3):799–820, 2016.

[47] O. Curtef, G. Dirr, and U. Helmke. Conjugate gradient algorithms for best rank-1 approximation of tensors. PAMM, 7(1):1062201–1062202, 2007.

[48] Guyan Ni, Liqun Qi, and Minru Bai. Geometric measure of entanglement and u-eigenvalues of tensors. SIAM Journal on Matrix Analysis and Applications, 35(1):73–87, 2014.

[49] Marco Enríquez, Zbigniew Puchała, and Karol Życzkowski. Minimal rényi–ingarden–urbanik entropy of multipartite quantum states. Entropy, 17(7):5063–5084, 2015.

[50] Robert Raussendorf and Hans J. Briegel. A one-way quantum computer. Phys. Rev. Lett., 86:5188–5191, May 2001.

[51] Gilad Gour and Nolan R. Wallach. All maximally entangled four-qubit states. Journal of Mathematical Physics, 51(11):112201, 2010.

[52] Dorje C. Brody and Lane P. Hughston. Geometric quantum mechanics. Journal of Geometry and Physics, 38(1):19–53, 2001.

[53] Arie Kapteyn, Heinz Neudecker, and Tom Wansbeek. An approach to n-mode components analysis. Psychometrika, 51(2):269–275, 1986.

[54] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. On the best rank-1 and rank-(r1, r2, ..., rn) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications, 21(4):1324–1342, 2000.

[55] M. Blasone, F. Dell'Anno, S. De Siena, and F. Illuminati. Hierarchies of geometric entanglement. Phys. Rev. A, 77:062304, Jun 2008.

[56] Tzu-Chieh Wei. Entanglement under the renormalization-group transformations on quantum states and in quantum phase transitions. Phys. Rev. A, 81:062313, Jun 2010.

[57] Qian-Qian Shi, Hong-Lei Wang, Sheng-Hao Li, Sam Young Cho, Murray T. Batchelor, and Huan-Qiang Zhou. Geometric entanglement and quantum phase transitions in two-dimensional quantum lattice models. Phys. Rev. A, 93:062341, Jun 2016.

[58] Christopher J. Hillar and Lek-Heng Lim. Most tensor problems are np-hard. J. ACM, 60(6):45:1–45:39, November 2013.

[59] Yichen Huang. Computing quantum discord is np-complete. New Journal of Physics, 16(3):033027, 2014.

[60] W. L. McMillan. Ground state of liquid He4. Phys. Rev., 138:A442–A451, Apr 1965.

[61] Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953.

[62] Robert Jastrow. Many-body problem with strong forces. Phys. Rev., 98:1479–1484, Jun 1955.

[63] Julien Toulouse, Roland Assaraf, and Cyrus J. Umrigar. Chapter fifteen - introduction to the variational and diffusion monte carlo methods. In Philip E. Hoggan and Telhat Ozdogan, editors, Electron Correlation in Molecules ab initio Beyond Gaussian Quantum Chemistry, volume 73 of Advances in Quantum Chemistry, pages 285–314. Academic Press, 2016.

[64] R.C Grimm and R.G Storer. Monte-carlo solution of schrödinger's equation. Journal of Computational Physics, 7(1):134–156, 1971.

[65] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 1 edition, 2007.

[66] Peter Harrington. Machine Learning in Action. Manning Publications Co., Greenwich, CT, USA, 2012.

[67] J.A.K. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(3):293–300, Jun 1999.

[68] J. A. Hartigan and M. A. Wong. Algorithm AS 136: A K-Means Clustering Algorithm. Applied Statistics, 28(1):100–108, 1979.

[69] Svante Wold, Kim Esbensen, and Paul Geladi. Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1):37–52, 1987. Proceedings of the Multivariate Statistical Workshop for Geologists and Geochemists.

[70] P. Smolensky. Parallel distributed processing: Explorations in the microstructure of cognition, vol. 1. chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory, pages 194–281. MIT Press, Cambridge, MA, USA, 1986.

[71] Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.

[72] Florian Häse, Stephanie Valleau, Edward Pyzer-Knapp, and Alan Aspuru-Guzik. Machine learning exciton dynamics. Chem. Sci., 7:5139–5147, 2016.

[73] Michael Gastegger, Jörg Behler, and Philipp Marquetand. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci., 8:6924–6935, 2017.

[74] Felix Brockherde, Leslie Vogt, Li Li, Mark E. Tuckerman, Kieron Burke, and Klaus-Robert Müller. Bypassing the kohn-sham equations with machine learning. Nature Communications, (1):872.

[75] Paul Raccuglia, Katherine C. Elbert, Philip D. F. Adler, Casey Falk, Malia B. Wenny, Aurelio Mollo, Matthias Zeller, Sorelle A. Friedler, Joshua Schrier, and Alexander J. Norquist. Machine-learning-assisted materials discovery using failed experiments. Nature, 533:73, May 2016.

[76] Juan Carrasquilla and Roger G. Melko. Machine learning phases of matter. Nature Physics, 13:431, Feb 2017.

[77] Warren S. McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, Dec 1943.

[78] Friedhelm Schwenker, Hans A. Kestler, and Günther Palm. Three learning phases for radial-basis-function networks. Neural Networks, 14(4):439–458, 2001.

[79] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.

[80] Franco Scarselli and Ah Chung Tsoi. Universal approximation using feedforward neural networks: A survey of some existing methods, and some new results. Neural Networks, 11(1):15–37, 1998.

[81] J. Park and I. W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246–257, 1991.

[82] I.E. Lagaris, A. Likas, and D.I. Fotiadis. Artificial neural network methods in quantum mechanics. Computer Physics Communications, 104(1):1–14, 1997.

[83] Adenilton José da Silva, Teresa Bernarda Ludermir, and Wilson Rosa de Oliveira. Quantum perceptron over a field and neural network architecture selection in a quantum computer. Neural Networks, 76(Supplement C):55–64, 2016.

[84] Hiroki Saito. Solving the bose–hubbard model with machine learning. Journal of the Physical Society of Japan, 86(9):093001, 2017.

[85] Peter Broecker, Juan Carrasquilla, Roger G. Melko, and Simon Trebst. Machine learning quantum phases of matter beyond the fermion sign problem. Scientific Reports, 7(1):8823, 2017.

[86] T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, Sep 1990.

[87] C. J. Umrigar, Julien Toulouse, Claudia Filippi, S. Sorella, and R. G. Hennig. Alleviation of the fermion-sign problem by optimization of many-body wave functions. Phys. Rev. Lett., 98:110201, Mar 2007.

[88] Sandro Sorella. Wave function optimization in the variational monte carlo method. Phys. Rev. B, 71:241103, Jun 2005.

[89] Brenda Rubenstein. Introduction to the Variational Monte Carlo Method in Quantum Chemistry and Physics, pages 285–313. Springer Singapore, Singapore, 2017.

[90] Gaël Guennebaud, Benoît Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.

[91] Tosio Kato. On the eigenfunctions of many-particle systems in quantum mechan- ics. Communications on Pure and Applied Mathematics, 10(2):151–177, 1957.

[92] Hsin Chen and Alan Murray. A continuous restricted boltzmann machine with a hardware-amenable learning algorithm. In José R. Dorronsoro, editor, Artificial Neural Networks — ICANN 2002, pages 358–363, Berlin, Heidelberg, 2002. Springer Berlin Heidelberg.

[93] Liqun Qi. The best rank-one approximation ratio of a tensor space. SIAM Journal on Matrix Analysis and Applications, 32(2):430–442, 2011.

[94] A. Higuchi and A. Sudbery. How entangled can two couples get? Physics Letters A, 273(4):213–217, 2000.

[95] M Enríquez, I Wintrowicz, and Karol Życzkowski. Maximally entangled multipartite states: A brief survey. Journal of Physics: Conference Series, 698(1):012003, 2016.

[96] Iain D K Brown, Susan Stepney, Anthony Sudbery, and Samuel L Braunstein. Searching for highly entangled multi-qubit states. Journal of Physics A: Mathematical and General, 38(5):1119, 2005.

[97] A Borras, A R Plastino, J Batle, C Zander, M Casas, and A Plastino. Multi-qubit systems: highly entangled states and entanglement distribution. Journal of Physics A: Mathematical and Theoretical, 40(44):13407, 2007.
