Reinforcement Learning for Quantum Memory
Thomas Fösel, Marquardt group
Max Planck Institute for the Science of Light, Erlangen, Germany
May 8th, 2019
[T. Fösel, P. Tighineanu, T. Weiss & F. Marquardt, PRX 8, 031084 (2018)]

AlphaGo: Oct 2015, vs. Fan Hui, 5:0; Mar 2016, vs. Lee Sedol, 4:1; May 2017, vs. Ke Jie, 3:0.

What are the "games" that we play in physics? An inexperienced (randomly acting) agent can become smart through reinforcement learning, with no human guidance, e.g. at manipulating quantum systems.

Quantum error correction: the RL approach. Existing techniques (stabilizer codes, phase estimation, decoherence-free subspaces, dynamical decoupling, hybrid DD-DFS schemes) each target a particular regime of temporal and spatial correlation of the noise. Long-term vision: reinforcement learning for quantum feedback as a unified approach to quantum error correction.

Reinforcement learning example: Go [Silver et al., Nature 529, 484-489 (2016); Silver et al., Nature 550, 354-359 (2017)].

Supervised learning vs. reinforcement learning: a supervised learner is shown the correct answer by a teacher, whereas a reinforcement learner has no teacher and must develop its own strategies.

(Model-free) reinforcement learning: basic setup. An agent (a neural network) observes the state of an environment, chooses an action, and during training receives rewards; in the Go illustration, the state is the board position, the action is the next move, and the environment applies the rules of the game.
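To make the agent-environment loop on this slide concrete, here is a minimal sketch (not the setup of the PRX paper): a tiny softmax policy is trained with the REINFORCE policy-gradient rule on a toy two-armed-bandit environment, learning from rewards alone rather than from labeled correct actions as in supervised learning. The bandit environment, network size, and learning rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


class TwoArmedBandit:
    """Toy stand-in for the 'environment' box on the slide (an assumption,
    not the quantum system of the paper): two actions, one of which yields
    a higher average reward."""
    n_states, n_actions = 1, 2

    def reset(self):
        return np.zeros(self.n_states)            # trivial state observation

    def step(self, action):
        reward = rng.normal(loc=(0.0, 1.0)[action], scale=0.1)
        return np.zeros(self.n_states), reward    # next state, reward


def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()


# "Agent (neural network)": here just a single linear layer plus softmax,
# updated with the REINFORCE policy-gradient rule as one concrete example
# of model-free RL.
env = TwoArmedBandit()
weights = np.zeros((env.n_states + 1, env.n_actions))   # +1 input for a bias
learning_rate = 0.1

for episode in range(500):
    state = np.append(env.reset(), 1.0)          # observation plus bias input
    probs = softmax(state @ weights)             # state -> action probabilities
    action = rng.choice(env.n_actions, p=probs)  # sample an action
    _, reward = env.step(action)
    # Increase the log-probability of the chosen action in proportion
    # to the reward it produced (gradient of log softmax for a linear policy).
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    weights += learning_rate * reward * np.outer(state, grad_log_pi)

print("learned action probabilities:",
      softmax(np.append(env.reset(), 1.0) @ weights))
```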
Our contribution: we address a new problem, namely building a complete QEC protocol, and we introduce model-free RL with neural networks into physics.

Prior applications of reinforcement learning in physics:
quantum phase estimation [Hentschel & Sanders, PRL 104, 063603 (2010)]
quantum control [Bukov et al., PRX 8, 031086 (2018)]
design of quantum optics experiments [Melnikov et al., PNAS, 201714936 (2017)]

(The slide reproduces excerpts of these works. In the Bukov et al. excerpt, a computer agent constructs piecewise-constant control protocols for a driven qubit, restricted to bang-bang pulses h_x(t) ∈ {±4} of total duration T, and is rewarded with the fidelity F_h(T) = |⟨ψ_*|ψ(T)⟩|² of reaching the target state under unitary Schrödinger evolution; the initial and target states are the ground states at h_x = -2 and h_x = +2. In the Melnikov et al. excerpt, building an optical setup corresponds to a walk on a directed graph; since two or more different setups can create the same quantum state, the graph resembles a maze, and maze navigation is a classic textbook RL problem.)
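As a concrete reading of the Bukov et al. excerpt described above, the sketch below evaluates the reward for piecewise-constant bang-bang protocols on a driven qubit: each protocol h_x(t) ∈ {±4} is applied for a total time T and scored by the fidelity with the target state. The Hamiltonian coefficients, the number of time slices, and the brute-force enumeration in place of a trained agent are assumptions for illustration only.

```python
import numpy as np
from itertools import product

# Spin-1/2 operators
sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def hamiltonian(hx):
    # Driven-qubit Hamiltonian with a single control field hx; the exact
    # coefficients are an assumption, not necessarily those of the cited work.
    return -sz - hx * sx


def ground_state(hx):
    _, vecs = np.linalg.eigh(hamiltonian(hx))
    return vecs[:, 0]                    # eigenvector of the lowest eigenvalue


def fidelity(protocol, dt, psi0, psi_target):
    """Reward of one piecewise-constant protocol: evolve under each constant
    field value for a time dt and return |<psi_target|psi(T)>|^2."""
    psi = psi0
    for hx in protocol:
        energies, vecs = np.linalg.eigh(hamiltonian(hx))
        U = vecs @ np.diag(np.exp(-1j * energies * dt)) @ vecs.conj().T
        psi = U @ psi
    return abs(np.vdot(psi_target, psi)) ** 2


# Initial and target states: ground states at hx = -2 and hx = +2,
# as stated in the excerpt above.
psi0, psi_target = ground_state(-2.0), ground_state(+2.0)

# Bang-bang protocols: hx(t) in {-4, +4} on each of n_steps time slices
# (the values of n_steps and T are illustrative choices).
n_steps, T = 8, 2.0
dt = T / n_steps
best = max(product([-4.0, +4.0], repeat=n_steps),
           key=lambda p: fidelity(p, dt, psi0, psi_target))
print("best bang-bang protocol:", best)
print("fidelity:", fidelity(best, dt, psi0, psi_target))
```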