Arxiv:1802.05267V3 [Quant-Ph] 31 Aug 2018 Experiments [18]
Total Page:16
File Type:pdf, Size:1020Kb
Reinforcement Learning with Neural Networks for Quantum Feedback Thomas F¨osel,Petru Tighineanu, and Talitha Weiss Max Planck Institute for the Science of Light, Staudtstr. 2, 91058 Erlangen, Germany Florian Marquardt Max Planck Institute for the Science of Light, Staudtstr. 2, 91058 Erlangen, Germany and Physics Department, University of Erlangen-Nuremberg, Staudtstr. 5, 91058 Erlangen, Germany Machine learning with artificial neural networks is revolutionizing science. The most advanced challenges require discovering answers autonomously. This is the domain of reinforcement learning, where control strategies are improved according to a reward function. The power of neural-network- based reinforcement learning has been highlighted by spectacular recent successes, such as playing Go, but its benefits for physics are yet to be demonstrated. Here, we show how a network-based \agent" can discover complete quantum-error-correction strategies, protecting a collection of qubits against noise. These strategies require feedback adapted to measurement outcomes. Finding them from scratch, without human guidance, tailored to different hardware resources, is a formidable challenge due to the combinatorially large search space. To solve this, we develop two ideas: two-stage learning with teacher/student networks and a reward quantifying the capability to recover the quantum information stored in a multi-qubit system. Beyond its immediate impact on quantum computation, our work more generally demonstrates the promise of neural-network-based reinforcement learning in physics. We are witnessing rapid progress in applications of quantum memory from noise. It covers a wide range of artificial neural networks (ANN) for tasks like image clas- scenarios where one would otherwise have to select an ex- sification, speech recognition, natural language processing, isting scheme (stabilizer codes, adaptive phase estimation, and many others [1, 2]. Within physics, the examples etc.) and adapt it to the given situation. Our findings emerging during the past two years range across areas are of immediate relevance to the broad field of quan- like statistical physics, quantum many-body systems, and tum error correction (including quantum-error-mitigation quantum error correction [3{11]. To date, most appli- techniques) and are best suited to be used in few-qubit cations of neural networks employ supervised learning, quantum modules. These could be used as stand-alone where a large collection of samples has to be provided quantum memory or be part of the modular approach together with the correct labeling. to quantum computation, which has been suggested for However, inspired by the long-term vision of artificial several leading hardware platforms[19, 20]. scientific discovery [12, 13], one is led to search for more Given a collection of qubits and a set of available quan- powerful techniques that explore solutions to a given task tum gates, the agent is asked to preserve an arbitrary autonomously. Reinforcement learning (RL) is a general quantum state α 0 + β 1 initially stored in one of the approach of this kind [2], where an \agent" interacts with qubits. It finds complexj i j sequencesi including projective an \environment". The agent's \policy", i. e. the choice measurements and entangling gates, thereby protecting of actions in response to the environment's evolution, the quantum information stored in such a few-qubit sys- is updated to increase some reward. The power of this tem against decoherence. This is a very complex chal- method, when combined with ANNs, was demonstrated lenge, where both brute force searches and even the most convincingly through learning to play games beyond hu- straightforward RL approaches fail. The success of our man expertise [14, 15]. In physics, RL without neural approach is due to a combination of two key ideas: (i) networks has been introduced recently, for example to two-stage learning, with an RL-trained network receiving study qubit control [16, 17] and invent quantum optics maximum input acting as a teacher for a second network, arXiv:1802.05267v3 [quant-ph] 31 Aug 2018 experiments [18]. Moving to neural-network-based RL and (ii) a measure of the recoverable quantum informa- promises access to the vast variety of techniques currently tion hidden inside a collection of qubits, being used as a being developed for ANNs. reward. In this work, we introduce network-based RL in physics Recent progress in multi-qubit quantum devices [21{30] (Fig.1) and illustrate its versatility in the domain of quan- has highlighted hardware features deviating from often- tum feedback. Specifically, we devise a unified, fully au- assumed idealized scenarios. These include qubit con- tonomous, human-guidance-free approach for discovering nectivity, correlated noise, restrictions on measurements, quantum-error-correction (QEC) strategies from scratch, or inhomogeneous error rates. Our approach can help in few-qubit quantum systems subject to arbitrary noise finding \hardware-adapted" solutions. This builds on and hardware constraints. This approach relies on a net- the main advantage of RL, namely its flexibility: it can work agent that learns feedback strategies, adapting its discover strategies for such a wide range of situations actions to measurement results. As illustrated in Fig.1b- with minimal domain-specific input. We illustrate this d, our method provides a unified approach to protect a flexibility in examples from two different domains: in 2 can be employed in an experiment, receiving measurement a action (gate) qubits results and selecting suitable subsequent gate operations neural network conditioned on these results. However, in our two-stage learning approach, we do not directly train this neural noise network from scratch. Rather, we first employ reinforce- measurement ment learning to train an auxiliary network that has full RL-agent RL-environment knowledge of the simulated quantum evolution. Later on, the experimentally applicable network is trained in b c d a supervised way to mimic the behavior of this auxiliary network. Noise & Noise & hybrid Hardware Hardware We emphasize that feedback requires reaction towards dynamical DD-DFS Model Model decoupling the observations, going beyond optimal control type chal- lenges (like pulse shape optimization or dynamical decou- Algorithm pling), and RL has been designed for exactly this purpose. phase estimation RL for Quantum Specifically, in this work we will consider discrete-time, RL for Quantum Feedback Feedback stabilizer codes Quantum digital feedback, of the type that is now starting to be Compiler decoherence-free implemented experimentally [31{35], e. g. for error cor- subspaces rection in superconducting quantum computers. Other Device Device Temporal correlation of the noise correlation Temporal wide-spread optimization techniques for quantum con- Spatial correlation of the noise Instructions Instructions trol, like GRAPE, often vary evolution operators with respect to continuous parameters [36, 37], but do not Figure 1. (color) (a) The general setting of this work: A easily include feedback and are most suited for optimizing few-qubit quantum device with a neural-network-based con- the pulse shapes of individual gates (rather than com- troller whose task is to protect the quantum memory residing plex gate sequences acting on many qubits). Another in this device against noise. Reinforcement learning (RL) recent approach [38] to quantum error correction uses lets the controller (\RL-agent") discover on its own how to optimization of control parameters in a pre-configured best choose gate sequences, perform measurements, and re- gate sequence. By contrast, RL directly explores the act to measurement results, by interacting with the quantum space of discrete gate sequences. Moreover, it is a \model- device (\RL-environment"). (b) visualizes the flexibility of free" approach [2], i. e. it does not rely on access to the our approach (schematic). Depending on the type of noise underlying dynamics. What is optimized is the network and hardware setting, different approaches are optimal (DD, agent. Neural-network based RL promises to complement dynamical decoupling; DFS, decoherence-free subspace). By contrast, the RL approach is designed to automatically dis- other successful machine-learning techniques applied to cover the best strategy, adapted to the situation. In (c) we quantum control [39{42]. show the conventional procedure to select some QEC algo- Conceptually, our approach aims to control a quan- rithm and then produce hardware-adapted device instructions tum system using a classical neural network. To avoid (possibly re-iterating until an optimal choice is found). We confusion, we emphasize our approach is distinct from compare this to our approach (d) that takes care of all these future \quantum machine learning" devices, where even steps at once and provides QEC strategies fully adapted to the network will be quantum [8, 43, 44]. the concrete specifications of the quantum device. Reinforcement Learning one set of examples (uncorrelated bit-flip noise), the net- work is able to go beyond rediscovering the textbook The purpose of RL (Fig.1a) is to find an optimal set of stabilizer repetition code. It finds an adaptive response actions (in our case, quantum gates and measurements) to unexpected measurement results that allows it to in- that an \agent" can perform in response to the changing crease the coherence time, performing better than any state of an \environment" (here, the quantum