
Tensor Network Benchmarking for Eugene Dumitrescu

Collaborators: Alex McCaskey, Dmitry Liakh, Travis Humble, Raphael Pooser, Pavel Lougovski

ORNL is managed by UT-Battelle for the US Department of Energy.

This work is supported by the DOE Quantum Testbed Pathfinder and ORNL LDRD projects.

Goals for this talk:

1. Outline capabilities of near-term quantum devices
2. Discuss potential first programs:
   • “Find”, realize, and sample states from quantum physics (a type of quantum simulation)
3. How large-scale classical computations validate, benchmark, and improve quantum hardware/software

Quantum Computing Institute

What is (quantum) information?

• Physical state of the system is the set of positions and momenta of all constituent particles.
• A vector in “phase space”.

$\frac{dq_\mu}{dt} = \frac{\partial H}{\partial p_\mu}, \qquad \frac{dp_\mu}{dt} = -\frac{\partial H}{\partial q_\mu}$

An n-bit state is described by … n bits

$\hat{H}\,\psi(r) = E\,\psi(r)$

• State described by complex-valued wavefunction.
• Superposition of ‘classical’ states leads to entanglement and richer computational space

$|\psi\rangle = \sum_{\vec{i}=0,1} \psi_{i_1,i_2,\ldots,i_n}\, |i_1, i_2, \ldots, i_n\rangle$

[Figure: states $|R\rangle$ and $|L\rangle$; Babadi, Demler, Knap, PRX (2015)]

State space:

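Information physically encoded

Quantum bit (qubit):
$|\psi\rangle = |0\rangle$
$|\psi\rangle = |1\rangle$
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$

[Figure: Bloch sphere with poles $|0\rangle$ and $|1\rangle$ showing $|\psi\rangle$, $U_1|\psi\rangle$, and $U_2 U_1|\psi\rangle$]

[Figure: circuit diagram with qubits $|q_1\rangle$, $|q_2\rangle$, $|q_3\rangle$ acted on by gates $U_1$, $U_2$, $U_3$, $U_4$ along a time axis]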
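As a minimal sketch of the state-space picture above, a qubit can be modeled as a normalized complex 2-vector and gates as unitary matrices applied in time order. The gate choices (a Hadamard followed by a phase gate) and the helper name `apply_gates` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Computational basis state |0>.
ket0 = np.array([1.0, 0.0], dtype=complex)

# Two example single-qubit unitaries (illustrative choices for U1 and U2).
U1 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
U2 = np.array([[1, 0], [0, 1j]], dtype=complex)               # phase gate S

def apply_gates(gates, state):
    """Apply gates in time order: the first gate in the list acts first."""
    for u in gates:
        state = u @ state
    return state

# |psi> = U2 U1 |0> = (|0> + i|1>) / sqrt(2)
psi = apply_gates([U1, U2], ket0)
alpha, beta = psi
print("alpha =", alpha, "beta =", beta)
print("norm  =", np.linalg.norm(psi))
```

The amplitudes play the role of α and β in the superposition, and the norm stays at 1 because every gate is unitary.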

The Quantum Advantage

• QFT (P. Shor)
• Phase Estimation
• Grover search
• Linear algebra
• Quantum System simulation

• Factoring/Cryptography
• Partition Functions (Sampling)
• Discrete Optimization
• Machine Learning/AI
• Materials Science
• Chemistry
• Biological
• High-energy Physics
• Linear Systems (PDEs)

2017 and beyond

Rigetti, IBM
Limitations: ~50 qubits, ~50 gates

Ritter 3:30 PM

Delft, NL
Future: ~500 qubits, ~500 gates

Near-term applications?
• Can realize many states () supported by a 50-qubit system.
• Could experimental realizations find applications in quantum systems?
  => IBM, Google: Chemistry, ML
  => ORNL: Nuclear, Cond-matt

Resources: a closer look

New 20/50 qubit devices

Pros
• High dimensional state space
• Fast operation
• Computationally supreme?

Cons
• Locality constraints
• Temporal (noise) constraints

HPC with QPU

ORNL Testbed Pathfinder, PI: R. Pooser (2 PM tomorrow)

Our solution - XACC Specification

Treat near-term QPUs as accelerators within a larger heterogeneous HPC environment. How do we program this?

We have provided a solution: the XACC Specification.
https://github.com/ORNL-QCI/xacc

XACC - Heterogeneous CPU-QPU Programming Model
• Familiar API and programming model
• OpenCL-like: high-level quantum kernel compilation and execution API
• LLVM-like: language and hardware agnostic through a well-designed intermediate representation
• Program quantum code once, in your language, and XACC handles the rest.

Quantum Intermediate Representation (QIR) Specification
• Key insight: provide a common representation to map N QPLs to N QPUs
• QuantumInstruction: abstraction for single QPU instructions (e.g. Hadamard, CNOT)
• QuantumFunction: abstraction for composition of QPU instructions (e.g. qfoo1(), qfoo2())
• Takes QIR and performs hardware-independent and hardware-dependent transformations.

Quantum Programming Landscape with XACC
• Compiler frontends: Scaffold, ProjectQ, pyQuil, ..., QPL-N
• XACC Quantum Intermediate Representation
• Backend generators: IBM QPU, Google QPU, Rigetti QPU, ..., QPU-N, Simulator

Enabling Quantum Acceleration in Scientific High Performance Computing - Midterm Review

Near Term Algorithms and Programs

Pros lead to supremacy

A quantum supreme device:
• Performing a classically intractable computation
• Does NOT mean outperforming classical computers at ANY other computation
• ‘Supreme’ in a very limited sense
• Supremacy hurdle must be overcome before some exponential advantages can be realized (e.g. Shor)

Applications:
• A different kind of characterization tool compared to tomography.
• Randomized benchmarking 2.0?
• Exponentially large Hilbert space used to
  • Encode strongly correlated quantum mechanical states
  • Large dimensional ‘fitting’ (a la neural networks)

Supremacy applied: Quantum Many Body

$\hat{H}\,\psi(r) = E\,\psi(r)$

• Matrix (vector) representation of quantum modes and dynamics scales exponentially
• Number of Hamiltonian terms grows polynomially ($N^4$, $N^8$)
• Use QPU to represent quantum objects as needed!

$\hat{H} = \sum_{pq} h_{pq}\, \hat{c}_p^\dagger \hat{c}_q + \frac{1}{2} \sum_{pqrs} h_{pqrs}\, \hat{c}_p^\dagger \hat{c}_q^\dagger \hat{c}_r \hat{c}_s$

Trial wavefunction: $|\psi(\theta)\rangle = U(\theta)\,|\mathrm{HF}\rangle$

$T(\theta) = \sum_{i \in \mathrm{virt},\, j \in \mathrm{occ}} t^{i}_{j}\, c_i^\dagger c_j + \sum_{i>j \in \mathrm{virt},\, k>l \in \mathrm{occ}} t^{ij}_{kl}\, c_i^\dagger c_j^\dagger c_k c_l + O(T_3(\theta))$

$U(\theta) \equiv e^{T(\theta) - T^\dagger(\theta)}$

Minimize Energy/Objective: $E(\theta) = \sum_i \langle \psi(\theta) | h_i | \psi(\theta) \rangle$

Hybrid computation and variational methods

No coherent feedback

Quantum state preparation: $|\psi(\theta)\rangle = U(\theta)\,|\psi_0\rangle$ … convergence? No

Shen 2017 (HeH+)

update $\theta$: classical computation/post-processing
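The hybrid loop above (prepare a parametrized state, estimate the energy, update θ classically until convergence) can be sketched on a toy problem. This is a minimal sketch under stated assumptions: the one-qubit Hamiltonian, the Ry ansatz, the learning rate, and the function names are all illustrative, and the energy is computed exactly with NumPy rather than sampled from a QPU.

```python
import numpy as np

# Toy one-qubit Hamiltonian H = Z + 0.5 X (illustrative values, not from the talk).
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = Z + 0.5 * X

def prepare_state(theta):
    """Ansatz |psi(theta)> = Ry(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

def energy(theta):
    """E(theta) = <psi(theta)|H|psi(theta)>; stands in for the QPU estimate."""
    psi = prepare_state(theta)
    return float(psi @ H @ psi)

def vqe(theta=0.1, lr=0.4, tol=1e-10, max_iter=1000):
    """Classical outer loop: gradient descent using the parameter-shift rule."""
    for _ in range(max_iter):
        grad = 0.5 * (energy(theta + np.pi / 2) - energy(theta - np.pi / 2))
        if abs(grad) < tol:      # convergence? yes: stop
            break
        theta -= lr * grad       # convergence? no: update theta
    return theta, energy(theta)

theta_opt, e_min = vqe()
exact = np.linalg.eigvalsh(H)[0]  # exact ground-state energy for comparison
print("VQE energy:", e_min, "exact:", exact)
```

On hardware, `energy` would be replaced by repeated circuit executions and measurement averaging, which is where the sampling cost of the method comes from.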

Peruzzo, McClean 2014

Example Chemistry Code using XACC

github.com/QISKit/

Source https://github.com/ORNL-QCI/xacc-vqe

Metrics for experimental performance

O’Malley (Google) PRX 2016

Kandala (IBM) Nature 2017

• Total (classical + quantum) runtime is the metric for a successful computation.
  • I.e. the computation determines the energy/properties to within a predetermined tolerance.
  • Keep time constant and study precision scaling (future work)
• Simulate performance of the noiseless algorithm
  • Sets a lower bound on runtime via gate count/circuit depth
• Noisy simulations: generalize runtime to include classical computations with additional classical post-processing.
• Execute program on hardware.
  • Additional post-processing may be required
• Computational results within a Bayesian framework?

[Figure: total runtime vs. system size, with classical and hybrid curves]

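Enhanced post-processing: partial tomography

$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$
$= \alpha'|{+}\rangle + \beta'|{-}\rangle$
$= \alpha''|{+i}\rangle + \beta''|{-i}\rangle$

[Figure: Bloch sphere with poles $|0\rangle$ and $|1\rangle$; labels $|\alpha|^2$, $|\beta|^2$, $\rho$]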
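For a single qubit, measuring in the three bases above fixes the density matrix through the standard identity ρ = (I + ⟨X⟩X + ⟨Y⟩Y + ⟨Z⟩Z)/2. The sketch below checks that identity numerically. The state amplitudes and function names are illustrative assumptions, and exact expectation values are used in place of sampled measurement outcomes.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectations(psi):
    """Exact <X>, <Y>, <Z> for a pure state (stand-in for measured averages)."""
    psi = psi / np.linalg.norm(psi)
    return tuple(float(np.real(np.vdot(psi, P @ psi))) for P in (X, Y, Z))

def reconstruct_rho(ex, ey, ez):
    """Single-qubit density matrix from the three Pauli expectation values."""
    return 0.5 * (I2 + ex * X + ey * Y + ez * Z)

# Illustrative state alpha|0> + beta|1> with a relative phase.
alpha, beta = np.sqrt(0.3), np.sqrt(0.7) * np.exp(1j * 0.4)
psi = np.array([alpha, beta])

rho = reconstruct_rho(*expectations(psi))
print(np.round(rho, 4))
```

With finite sampling the three expectation values carry statistical error, so the reconstructed matrix is only an estimate of the true state.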

HPC Tensor Network Simulation: Quantum Validation and Benchmarking

-Dirac 1929

• Exponential quantum state space exposes computational intractability of brute-force simulations
  • Classical computation limited to <50 qubits
• Compression methods!
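The brute-force limit follows from simple arithmetic: an n-qubit state vector holds 2^n complex amplitudes. The sketch below assumes double-precision complex amplitudes (16 bytes each) and ignores all workspace overhead, so the numbers are optimistic upper bounds. The function names are illustrative.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n amplitudes (complex128 assumed)."""
    return bytes_per_amplitude * 2 ** n_qubits

def max_qubits(memory_bytes, bytes_per_amplitude=16):
    """Largest n whose full state vector fits in the given memory."""
    n = 0
    while statevector_bytes(n + 1, bytes_per_amplitude) <= memory_bytes:
        n += 1
    return n

GiB = 2 ** 30
print(max_qubits(16 * GiB))                  # qubits that fit in 16 GiB
print(statevector_bytes(50) / 2 ** 50, "PiB needed for 50 qubits")
```

Each additional qubit doubles the memory, which is why the reachable qubit count grows only logarithmically with total RAM.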

R. Orus ‘15

Compressed quantum spaces

Entanglement structure ~ complexity of state representation

$|\psi\rangle = \sum_{\vec{i}=0,1} c_{i_1,i_2,\ldots,i_n}\, |i_1, i_2, \ldots, i_n\rangle$

Product states such as $|0\rangle|0\rangle|0\rangle$ and $|1\rangle|1\rangle|1\rangle$ vs. general entangled states

Q: Is there a way to systematically describe entanglement structure?
A: Schmidt decomposition across a bipartition

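$|\psi\rangle = \sum_i \lambda_i\, |\alpha_i\rangle\, |\beta_i\rangle$

[Figure: example Schmidt coefficients; values shown include $\lambda_0 = 1$ and $\frac{1}{\sqrt{2}}$]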
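Numerically, the Schmidt coefficients across a bipartition are the singular values of the state vector reshaped into a matrix. The sketch below is a minimal illustration with NumPy. The function name and the two example states are assumptions chosen to contrast a product state with a maximally entangled pair.

```python
import numpy as np

def schmidt_coefficients(psi, n_left, n_right):
    """Singular values of the (2**n_left x 2**n_right) coefficient matrix."""
    psi = np.asarray(psi, dtype=complex)
    matrix = psi.reshape(2 ** n_left, 2 ** n_right)
    return np.linalg.svd(matrix, compute_uv=False)

product = np.array([1, 0, 0, 0])             # |00>: a single nonzero coefficient
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

print("product:", schmidt_coefficients(product, 1, 1))
print("bell:   ", schmidt_coefficients(bell, 1, 1))
```

The number of nonzero coefficients (the Schmidt rank) measures how much information must be kept across the cut, which is the quantity tensor network methods exploit.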

Many-body TN states

$|\psi\rangle = \sum_{\vec{i}=0,1} c_{i_1,i_2,\ldots,i_n}\, |i_1, i_2, \ldots, i_n\rangle$

• Compression for multi-linear mapping (i.e. tensor)
• Auxiliary spaces (small) encode correlations. Gauge freedom.
• Come from the renormalization group.
• Like a convolutional neural network for entanglement
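One way to see the compression concretely is to factor the coefficient tensor into a matrix product state by repeated SVDs, keeping only the nonzero singular values at each cut. The code below is a minimal sketch under stated assumptions: it starts from the full state vector (so it illustrates the structure rather than a scalable algorithm), and the function names, cutoff, and GHZ example are illustrative.

```python
import numpy as np

def state_to_mps(psi, n, max_bond=None, cutoff=1e-12):
    """Factor an n-qubit state vector into tensors of shape (left, 2, right)."""
    tensors = []
    rest = np.asarray(psi, dtype=complex).reshape(1, -1)
    for _ in range(n - 1):
        left = rest.shape[0]
        # Group (bond, physical) as rows and the remaining qubits as columns.
        u, s, vh = np.linalg.svd(rest.reshape(left * 2, -1), full_matrices=False)
        keep = max(1, int(np.sum(s > cutoff)))
        if max_bond is not None:
            keep = min(keep, max_bond)   # optional lossy truncation
        tensors.append(u[:, :keep].reshape(left, 2, keep))
        rest = s[:keep, None] * vh[:keep]
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors

def mps_to_state(tensors):
    """Contract the auxiliary (bond) indices to recover the full state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)

n = 4
ghz = np.zeros(2 ** n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)            # (|0000> + |1111>)/sqrt(2)

mps = state_to_mps(ghz, n)
bond_dims = [t.shape[2] for t in mps]
print("bond dimensions:", bond_dims)
print("parameters:", sum(t.size for t in mps), "vs", 2 ** n)
```

The small auxiliary (bond) dimensions are what make the representation compact. For weakly entangled states they stay bounded as n grows, whereas the full vector grows as 2^n.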

[Figure: tensor network diagrams labeled MPS, Tree, MERA; DMRG, White ’92; Vidal, PRL ’09]

Tensor Network Implementation: Look to future QPU Algorithms

SVD replaced by GPU tensor optimization
https://github.com/DmitryLyakh/TAL_SH
Single node - multi CPU
https://github.com/g1257/merapp
Single node - GPU
Distributed TN CPU
Distributed TN GPU

May 2018

Source https://github.com/ORNL-QCI/tnqvm

Simulation Scalability

Device | (1) | (2) | (3) | Notes
-------|-----|-----|-----|------
Specs | i7 CPU, 32 GB RAM | 18,688 AMD Opteron 16-core CPUs; 18,688 Nvidia Tesla K20X GPUs; 10 PFLOPS | ~9200 POWER9 44-core CPUs; ~27,600 Nvidia Tesla V100 GPUs; 150–300 PFLOPS |
Brute force | ~25 qubits | ~45 qubits | ~42 qubits | Scales with total RAM
Tree (Shor) | 39 qubits | ~90 qubits | ~80 qubits | 1/2 hard to simulate
MPS (VQE) | >>100 qubits | >>200 qubits | >>250 qubits | Classical part more difficult
MPS (Supremacy) | 1D: 50 qubits; 2D: 25 qubits | 1D: 90 qubits; 2D: 45 qubits | 1D: 84 qubits; 2D: 42 qubits | PEPS algorithm better for 2D
MERA (Supremacy) | 1D: Unbounded; 2D: ??? | 1D: Unbounded; 2D: >50? | 1D: Unbounded; 2D: >60 | Performance unknown. Algorithm under development.

MERA++: G. Alvarez, D. Liakh

Dual Register System Simulation

$|\psi\rangle = \frac{1}{\sqrt{r}} \sum_{i=0}^{r-1} \left( \sum_{j=0}^{\lceil (2^{2l}-1)/r \rceil - 1} |jr + i\rangle_1 \right) \otimes |x^i \bmod N\rangle$

Choose best tensor network algorithms to simulate quantum computations (different from states of matter)
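The dual-register structure can be checked on a small case. The sketch below builds the two-register state after modular exponentiation as a matrix (first register as rows, second as columns) and counts its nonzero singular values. For x = 7, N = 15 the Schmidt rank equals the period r = 4. The function names, the choice of l as the bit length of N, and the uniform numerical normalization over the first register are illustrative assumptions.

```python
import numpy as np
from math import ceil, log2

def dual_register_matrix(x, N):
    """Amplitudes of sum_a |a>|x^a mod N> as a (2**(2l) x 2**l) matrix."""
    l = ceil(log2(N))                  # bits assumed for the second register
    dim1, dim2 = 2 ** (2 * l), 2 ** l
    psi = np.zeros((dim1, dim2))
    for a in range(dim1):
        psi[a, pow(x, a, N)] = 1.0     # built-in modular exponentiation
    return psi / np.sqrt(dim1)         # normalize numerically

def schmidt_rank(psi, tol=1e-10):
    """Number of nonzero singular values across the register bipartition."""
    s = np.linalg.svd(psi, compute_uv=False)
    return int(np.sum(s > tol))

psi = dual_register_matrix(7, 15)
print("state shape: ", psi.shape)
print("Schmidt rank:", schmidt_rank(psi))   # powers of 7 mod 15 cycle through 1, 7, 4, 13
```

Because the entanglement between the registers is bounded by the period rather than by the register size, a network that cuts between the two registers needs only a small bond dimension for this state.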

[Figure: tensor network diagrams, panels (a)-(e), with numbered tensor legs; panel (a) is labeled CU8; Dumitrescu (2017)]

Simulations

[Figure labels: Discretized circuit UCC (Trotterization); Simulated VQE]

• Study complexity of VQE state preparation circuits.
• Exploit symmetries and structure to enable realizable quantum programs
• Verified VQE energetics of larger molecules with UCC.
• Looking for realistic solutions.
• Digitizing IBM-style interlaced entangler/non-local gate ansatz

Better quantum circuit found using tnqvm!

Cost of VQE

• Fundamental building block is a parametrized quantum circuit: $T_{circuit} \approx d\,(100\,\mathrm{ns}) + 2\,\mu\mathrm{s}$
• Perform ensemble averaging (determines variance to leading order) over circuit for each set of non-commuting terms: $N_{sampling} \approx N_{ensemble} \times N_{terms}$, with $N_{terms} \sim N_{orbitals}^4$
• Classical optimization until convergence (or fixed time): $N_{updates}$
  • get linear circuit depth by increasing classical complexity
  • scales poorly with number of classical parameters (don’t want to get stuck doing classical optimization)
  • pre-sampling to generate Jacobian, measure all Hessians
• Error mitigation/tomography/analysis overhead: $N_{overhead} \sim \exp(L)$ observables
  • simplified error model: $N_{overhead} \sim L$, sacrifice accuracy
  • Find ‘sweet spot’
• $T_{total} = T_{circuit} \times N_{ensemble} \times (N_{terms} + N_{overhead})$
• Communication/Latency cost?
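The runtime estimate above can be turned into a short worked calculation. The sketch below assumes the per-circuit time is depth × 100 ns plus a fixed 2 µs, and that the total is the product of circuit time, ensemble size, and the number of measured terms plus overhead terms. The function names and the example numbers are hypothetical, and classical update and communication costs are not included.

```python
def circuit_time(depth, gate_time=100e-9, fixed_time=2e-6):
    """T_circuit ~ d * (100 ns) + 2 us, in seconds."""
    return depth * gate_time + fixed_time

def total_time(depth, n_ensemble, n_terms, n_overhead=0):
    """T_total = T_circuit * N_ensemble * (N_terms + N_overhead), in seconds."""
    return circuit_time(depth) * n_ensemble * (n_terms + n_overhead)

# Hypothetical example: depth-100 circuit, 1000 shots per term,
# 10 Hamiltonian terms plus 5 extra observables for error mitigation.
t = total_time(depth=100, n_ensemble=1000, n_terms=10, n_overhead=5)
print(f"quantum runtime per optimizer step: {t:.3f} s")
```

Multiplying by the number of classical updates gives the quantum part of the full optimization, which shows why exponentially scaling overhead terms quickly dominate the budget.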

Conclusions

• Introduced modern QPU
• Good at navigating Hilbert space, preparing (classically intractable amplitude) probability distribution
• Look for ‘important’ regions which encode the relevant problem
• Hybrid algorithms get the most out of short-depth circuits by augmenting quantum hardware with classical resources
• Relevant to quantum systems (Chemistry, Cond-matt, Nuclear, etc.)
• Simulations vs. Execution
• Computational complexity is a function of classical and quantum algorithms
• Classical overhead improves accuracy (but avoid exponentially scaling correction schemes!)
• TN algorithms enable simulation of today’s hybrid programs and provide an excellent tool to evaluate future programs, debug, etc.
• Quantum programs should become intractable (even for TN algorithms) in the future. Benchmarking before this stage gives us confidence in their performance.
