<<

Machine Learning meets Foundations: A Brief Survey

Kishor Bharti∗ and Tobias Haug Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543

Vlatko Vedral Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543 and Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom

Leong-Chuan Kwek Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543 MajuLab, CNRS-UNS-NUS-NTU International Joint Research Unit, UMI 3654, Singapore National Institute of Education, Nanyang Technological University, 1 Nanyang Walk, Singapore 637616 and School of Electrical and Electronic Engineering Block S2.1, 50 Nanyang Avenue, Singapore 639798 (Dated: June 15, 2020) The goal of machine learning is to facilitate a computer to execute a specific task without explicit instruction by an external party. seeks to explain the conceptual and mathematical edifice of quantum theory. Recently, ideas from machine learning have successfully been applied to different problems in quantum foundations. Here, we compile the representative works done so far at the interface of machine learning and quantum foundations. We conclude the survey with potential future directions.

CONTENTS IX. Conclusion and Future Work 12

I. Introduction 1 Acknowledgments 13

II. Machine Learning 2 Data Availability 13 A. Artificial Neural Networks 3 B. Relation with Artificial Intelligence 5 References 13

III. Quantum Foundations 5 A. Entanglement 5 I. INTRODUCTION B. Bell Nonlocality 6 C. Contextuality 7 The rise of machine learning in recent times has remark- D. Quantum Steering 8 ably transformed science and society. The goal of machine learning is to get computers to act without being explicitly IV. Neural network as Oracle for Bell Nonlocality 8 programmed [1, 2]. Some of the typical applications of ma- A. Machine Learning Nonlocal Correlations 8 chine learning are self-driving cars, efficient web search, im- B. Oracle for Networks 9 proved speech recognition, enhanced understanding of the hu- man genome and online fraud detection. This viral spread in V. Machine Learning for optimizing Settings for Bell interest has exploded to various areas of science and engineer- Nonlocality 10 ing, in part due to the hope that artificial intelligence may sup- A. Detecting Bell nonlocality with many-body plement human intelligence to understand some of the deep states 10 problems in science. B. Playing Bell Nonlocal Games 10 The techniques from machine-learning have been used for arXiv:2003.11224v2 [quant-ph] 12 Jun 2020 VI. Machine Learning-Assisted State Classification 11 automated theorem proving, drug discovery and predicting the A. Classification with Bell Inequalities 11 3-D structure of proteins based on its genetic sequence [3–5]. B. Classification by Representing Quantum States In , techniques from machine learning have been ap- with Restricted Boltzmann Machines 11 plied for many important avenues [6–39] including the study C. Classification with Tomographic Data 11 of black hole detection [25], topological codes [40], phase transition [16], glassy dynamics[27], gravitational lenses[23], VII. Neural Networks as “Hidden" Variable Models for Monte Carlo simulation [28, 29], wave analysis [24], quantum Quantum Systems 12 state preparation [41, 42], anti-de Sitter/conformal field theory (AdS/CFT) correspondence [43] and characterizing the land- VIII. A Few More Applications 12 scape of string theories [44]. Vice versa, the methods from physics have also transformed the field of machine learning both at the foundational and practical front [45, 46]. For a comprehensive review on machine learning for physics, refer ∗ [email protected] to Carleo et al [47] and references therein. For a thorough 2 review on machine learning and artificial intelligence in the instance, if we are programming a robot to play Go, playing quantum domain, refer to Dunjko et al [48] or Benedetti et al Go is the task. Some of the examples of machine learning [49]. tasks are following. Philosophies in science can, in general, be delineated from the study of the science itself. Yet, in physics, the study of • Classification: In classification tasks, the computer pro- quantum foundations has essentially sprouted an enormously gram is trained to learn the appropriate function h : m successful area called science. Quantum R → {1,2,··· ,t}. Given an input, the learned pro- foundations tells us about the mathematical as well as con- gram determines which of the t categories the input be- ceptual understanding of quantum theory. Ironically, this area longs to via h. Deciding if a given picture depicts a cat has potentially provided the seeds for future computation and or a dog is a canonical example of a classification task. communication, without at the moment reaching a consensus • Regression: In regression tasks, the computer program among all the physicists regarding what quantum theory tells is trained to predict a numerical value for a given input. us about the nature of reality [50]. m The aim is to learn the appropriate function h : R → R. In recent years, techniques from machine learning have A typical example of regression is predicting the price been used to solve some of the analytically/numerically com- of a house given its size, location and other relevant fea- plex problems in quantum foundations. In particular, the tures. methods from reinforcement learning and supervised learn- ing have been used for determination of the maximum quan- • Anomaly detection: In anomaly detection (also known tum violation of various Bell inequalities, the classification of as outlier detection) tasks, the goal is to identify rare experimental statistics in local/nonlocal sets, training AI for items, events or objects that are significantly different playing Bell nonlocal games, using hidden neurons as hid- from the majority of the data. A representative exam- den variables for completion of quantum theory, and machine ple of anomaly detection is credit card fraud detection learning-assisted state classification [51–54]. where the credit card company can detect misuse of the Machine learning also attempts to mimic human reasoning customer’s card by modelling his/her purchasing habits. leading to the almost remote possibility of machine-assisted n scientific discovery [55]. Can machine learning do the same • Denoising: Given a noisy examplex ˜ ∈ R , the goal of with the foundations of quantum theory? At a deeper level, denoising is to predict the conditional probability dis- n machine learning or artificial intelligence, presumably with tribution P(x|x˜) over noise-free data x ∈ R . some form of quantum computation, may capture somehow the essence of Bell nonlocality and contextuality. Of course, The measure of the success P of a machine learning algorithm such speculation belies the fact that human abstraction and depends on the task T. For example, in the case of classifi- reasoning could be far more complicated than the capabilities cation, P can be measured via the accuracy of the model, i.e. of machines. fraction of examples for which the model produces the cor- In this brief survey, we compile some of the representative rect output. An equivalent description can be in terms of error works done so far at the interface of quantum foundations and rate, i.e. fraction of examples for which the model produces machine learning. The survey includes eight sections exclud- the incorrect output. The goal of machine learning algorithms ing the introduction (section I). In section II, we discuss the is to work well on previously unseen data. To get an estimate basics of machine learning. Section III contains a brief in- of model performance P, it is customary to estimate P on a troduction to quantum foundations. In sections IV to VII, we separate dataset called test set which the machine has not seen discuss various applications of machine learning in quantum during the training. The data employed for training is known foundations. There is a rich catalogue of works, which we as the training set. could not incorporate in detail in sections IV to VII, but we Depending on the kind of experience, E, the machine is find them worth mentioning. We include such works briefly permitted to have during the learning process, the machine in section VIII. Finally, we conclude in section IX with open learning algorithms can be categorized into supervised learn- questions and some speculations. ing, unsupervised learning, and reinforcement learning. 1. Supervised learning: The goal is to learn a function y = f (x), that returns the label y given the correspond- II. MACHINE LEARNING ing unlabeled data x. A prominent example would be images of cats and dogs, with the goal to recognize Machine learning is a branch of artificial intelligence which the correct animal. The machine is trained with la- involves learning from data [1, 2]. The purpose of machine beled example data, such that it learns to correctly iden- learning is to facilitate a computer to achieve a specific task tify datasets it has not seen before. Given a finite set without explicit instruction by an external party. According of training samples from the join distribution P(Y,X), to Mitchel (1997) [56] “A computer program is said to learn the task of supervised learning is to infer the proba- from experience E with respect to some class of tasks T and bility of a specific label y given example data x i.e., performance measure P, if the performance at tasks in T, as P(Y = y|X = x). The function that assigns labels can measured by P, improves with E.” Note that the meaning of be inferred from the aforementioned conditional proba- the word “task” doesn’t involve the process of learning. For bility distribution. 3

2. Unsupervised learning: For this type of machine learn- question. The training and test data are generated according ing, data x is given without any label. The goal is to rec- to a over some datasets. Such a dis- ognize possible underlying structures in the data. The tribution is called data-generating distribution. It is conven- task of the unsupervised machine learning algorithms tional to make independent and identically distributed (IID) is to learn the probability distribution P(x) or some in- assumption, i.e. each example of the dataset is independent teresting properties of the distribution, when given ac- of another and, the training and test set are identically dis- cess to several examples x. It is worth stating that the tributed. Under such assumptions, the performance of the ma- distinction between supervised and unsupervised learn- chine learning algorithm depends on its ability to reduce the m ing can be blurry. For example, given a vector x ∈ R , training error as well as the gap between training and test er- the joint probability distribution can be factorized (us- ror. If a machine-learning algorithm fails to get sufficiently ing the chain rule of probability) as low training error; the phenomenon is called under-fitting. On the other hand, if the training error is low, but the test error is m large, the phenomenon is called over-fitting. The capacity of P(x) = P(x |x ,··· ,x ). (1) ∏ i 1 i−1 a machine learning model to fit a wide variety of functions is i=1 called model capacity. A machine learning algorithm with low The above factorization (1) enables us to transform a model capacity is likely to underfit, whereas a too high model unsupervised learning task of learning P(x) into m su- capacity often leads to over-fitting the data. One of the ways pervised learning tasks. Furthermore, given the super- to alter the model capacity of a machine learning algorithm is vised learning problem of learning the conditional dis- by constraining the class of functions that the algorithm is al- tribution P(y|x), one can convert it into the unsuper- lowed to choose. For example, to fit a curve to a dataset, one vised learning problem of learning the joint distribution often chooses a set of polynomials as fitting functions (see P(x,y) and then infer P(y|x) via Fig.1). If the degree of the polynomials is too low, the fit may not be able to reproduce the data sufficiently (under-fitting, or- P(x,y) ange curve). However, if the degree of the polynomials is too P(y|x) = . (2) ∑y1 P(x,y1) high, the fit will reproduce the training dataset too well, such that noise and the finite sample size is captured in the model This last argument suggests that supervised and unsu- (over-fitting, green curve). pervised learning are not entirely distinct. Yet, this distinction between supervised versus unsupervised is 3 sometimes useful for the classification of algorithms. data 3. Reinforcement learning: Here, neither data nor labels 2 p(2) fit are available. The machine has to generate the data it- p(3) fit self and improve this data generation process through 1 p(30) fit optimization of a given reward function. This method y 0 is somewhat akin to a human child playing games: The child interacts with the environment, and initially per- 1 forms random actions. By external reinforcement (e.g.

praise or scolding by parents), the child learns to im- 2 prove itself. Reinforcement learning has shown im- 2 0 2 4 6 mense success recently. Through reinforcement learn- x ing, machines have mastered games that were initially thought to be too complicated for computers to master. FIG. 1. Fitting data sampled from a degree 3 polynomial with small This was for example demonstrated by Deepind’s Alp- noise. A degree 2 polynomial (orange dashed line) is under-fitting haZero which has defeated the best human player in the as it has too low model capacity to capture the data. The degree board game Go [57]. 3 model has the right model capacity (blue solid line). The degree 30 polynomial (red dashed-dotted line) is over-fitting the data, as it One of the central challenges in machine learning is to de- captures the sampled data (low training error), but fails to capture the vise algorithms that perform well on previously unseen data. underlying model (high test error). The learning ability of a machine to achieve a high perfor- mance P on previously unseen data is called generalization. The input data to the machine is a set of variables, called features. A specific instance of data is called feature vec- tor. The error measure on the feature vectors used for train- A. Artificial Neural Networks ing is called training error. In contrast, the error measure on the test dataset, which the machine has not seen during the training, is called generalization error or test error. Given Most of the recent advances in machine learning were fa- an estimate of training error, how can we estimate test er- cilitated by using artificial neurons. The basic structure is a ror? The field of statistical learning theory aptly answers this single artificial neuron (AN), which is a real-valued function 4 of the form, input layer hidden layers output layer ! k AN(x) = φ ∑wixi where x = (xi)i ∈ R , i k w = (wi)i ∈ R and φ : R → R. (3)

In equation 3, the function φ is usually known as activation function. Some of the well-known activation functions are

1. Threshold function: φ (a) = 1 if a > 0 and 0 otherwise,

1 2. Sigmoid function: φ (a) = σ(a) = 1+exp(−a) , and FIG. 2. A feed-forward neural networks are a network of artificial 3. Rectified Linear (ReLu) function: φ(a) = max(0,a). neurons chained together in sequence. Each dot corresponds to a ar- tificial neuron, whereas the line indicate the input weights that feed The w vector in equation 3 is known as weight vector. into that neuron. Data fed into the input layer is propagated each Many of those artificial neurons can be combined to- layer at a time via the hidden layers (here two hidden layers as ex- gether via communication links to perform complex compu- ample) to the final output layer. tation. This can be achieved by feeding the output of neurons (weighted by the weights w ) as an input to another neuron, i where Z = exp(−E (v,h)) is a normalization factor, where the activation function is applied again. Such a graph ∑v,h commonly known as partition function. As there are no intra- structure G = (V,E), with the set of nodes (V) as artificial layer connections, one can sample from this distribution eas- neurons and edges in E as connections, is known as an ar- ily. Given a set of visible units v, the probability of a specific tificial neural network or neural network in short. The first hidden unit being h = 1 is layer, where the data is fed in, is called input layer. The lay- j ers of neurons in-between are the hidden layers, which are p(h j = 1|v) = σ(h j + ∑wi, jvi), (6) defining feature of deep learning. The last layer is called i output layer. A neural network with more than one hidden where σ(a) is the sigmoid activation function as introduced layer is called deep neural network. Machine learning involv- earlier. A similar relation holds for the reverse direction, e.g. ing deep neural networks is called Deep learning. A feedfor- given a set of hidden units, what is the probability of the visi- ward neural network is a directed acyclic graph, which means ble unit. that the output of the neurons is fed only into forward di- rection (see Fig.2). Apart from the feedforward neural net- hidden layer work, some of the popular neural network architectures are h h h h convolutional neural networks (CNN), recurrent neural net- 1 2 3 n b1 b2 b3 ... bn works (RNN), generative adversarial network (GAN), Boltz- mann machine and restricted Boltzmann machines (RBM).

We provide a brief summary of RBM here. For a detailed Wi,j understanding of various other neural networks, refer to [2]. An RBM is a bipartite graph with two kinds of binary- a ... 1 a2 a3 am valued neuron units, namely visible (v = {v1,v2,···}) and v1 v2 v3 vm hidden (h = {h ,h ,···}) (see Fig.3) The weight matrix W = 1 2 visible layer (wi j) encodes the weight corresponding to the connection be- tween visible unit vi and hidden unit h j [58]. Let the bias FIG. 3. A restricted Boltzman machines (RBM) is composed of hid- weight (offset weight) for the visible unit vi be ai and hidden den and visible layer. Each node has a bias (ai for visible, b j for unit h j be b j. hidden layer), and is connected with weight Wi, j to all the nodes in For a given configuration of visible and hidden neurons the opposite layer. There are no intra-layer connections. (v,h), one can define an energy function E (v,h) inspired from statistical models for systems as With these concepts, complicated probability distributions P(v) over some variable v can be encoded and trained within E (v,h) = −∑aivi − ∑b jh j − ∑∑viwi jh j. (4) i j i j the RBM. Given a set of training vectors v ∈ V, the goal is find the weights W 0 of the RBM that fit the training set best The probability of a configuration (v,h) is given by Boltz- arg max 0 P(v). (7) mann distribution, W ∏ v∈V exp(−E (v,h)) RBMs have been shown to be a good Ansatz to represent P(v,h) = , (5) Z many-body wavefunctions, which are difficult to handle with 5 other methods due to the exponential scaling of the dimen- sion of the . This method has been successfully applied to quantum many-body physics and quantum founda- AI tions problems [54, 59]. In context beyond physics, machine learning with deep neu- ral networks has accomplished many significant milestones, such as mastering various games, image recognition, self- ML driving cars, and so forth. Its impact on the physical sciences is just about to start [47, 54]. DL

B. Relation with Artificial Intelligence

FIG. 4. Deep learning (DL) is a sub-discipline of machine learning The term “artificial intelligence” was first coined in the fa- (ML). ML can be considered a sub-discipline of artificial intelligence mous 1956 Dartmouth conference [60]. Though the term was (AI). invented in 1956, the operational idea can be traced back to Alan Turing’s influential “Computing Machinery and Intelli- gence” paper in 1950, where Turing asks if a machine can of quantum theory. The study concerns the search for non- think [61]. The idea of designing machines that can think classical effects such as Bell nonlocality, contextuality and dates back to ancient Greece. Mythical characters such as different interpretations of quantum theory. This study also Pygmalion, Hephaestus and Daedalus can be interpreted as involves the investigation of physical principles that can put some of the legendary inventors and Galatea, Pandora and the theory into an axiomatic framework, together with an ex- Hephaestus can be thought of as examples of artificial life ploration of possible extensions of quantum theory. In this [2]. The field of artificial intelligence is difficult to define, survey, we will focus on non-classical features such as Entan- as can be seen by four different and popular candidate defini- glement, Bell nonlocality, contextuality and quantum steering tions [62]. The definitions start with “AI is the discipline that in some detail. aims at building ...” An interpretation of quantum theory can be viewed as a map from the elements of the mathematical structure of quan- 1. (Reasoning-based and human-based): agents that can tum theory to elements of reality. Most of the interpretations reason like humans. of quantum theory seek to explain the famous measurement 2. (Reasoning-based and ideal rationality): agents that problem. Some of the brilliant interpretations are the Copen- think rationally. hagen interpretation, quantum Bayesianism [63], the many- world formalism [64, 65] and the consistent history interpre- 3. (behavior-based and human-based): agents that behave tation [66]. The axiomatic reconstruction of quantum theory like humans. is generally categorized into two parts: the generalized prob- abilistic theory (GPT) approach [67] and the black-box ap- 4. (behavior-based and ideal rationality): agents that be- proach [68]. The physical principles underlying the frame- have rationally. work for axiomatizing quantum theory are non-trivial com- Apart from its foundational impact in attempting to under- munication complexity [69, 70], information causality [71], stand “intelligence,” AI has reaped practical impacts such as macroscopic locality [72], negating the advantage for non- automated routine labour and automated medical diagnosis, local computation [73], consistent exclusivity [74] and local to name a few among many. The real challenge of AI is to orthogonality [75]. The extensions of the quantum theory in- execute tasks which are easy for people to perform, but hard clude collapse models [76], quantum measure theory [77] and to express formally. An approach to solve this problem is by acausal quantum processes [78]. allowing machines to learn from experience, i.e. via machine learning. From a foundational point of judgment, the study of machine learning is vital as it helps us understand the mean- A. Entanglement ing of “intelligence”. It is worthwhile mentioning that deep learning is a subset of machine learning which can be thought Quantum interaction inevitably leads to quantum entangle- of as a subset of AI (see Fig.9). ment. The individual states of two classical systems after an interaction are independent of each other. Yet, this is not the case for two quantum systems [79, 80]. Quantum states III. QUANTUM FOUNDATIONS can be pure or mixed. For a bipartite pure |ψi ∈ H A ⊗ H B is entangled if it cannot be written as a The mathematical edifice of quantum theory has intrigued product state, i.e. |ψi = |ψAi ⊗ |ψBi for some |ψAi ∈ H A and puzzled physicists, as well as philosophers, for many and |ψBi ∈ H B. A mixed bipartite state is however ex- years. Quantum foundations seeks to understand and de- pressed in terms of a , ρ. Like for pure state, velop the mathematical as well as conceptual understanding a density matrix ρ is entangled if it cannot be expressed in 6

A A i B the form ρ = ∑ pi|ψi i hψi | ⊗ |ψi hψi |. An n−partite state i x1 x2 ρsep is called separable if it can be represented as a convex combination of product states i.e.

1 2 n ρsep = ∑ piρi ⊗ ρi ⊗ ··· ⊗ ρi , (8) i where 0 ≤ pi ≤ 1 and ∑i pi = 1. A quantum state is called separable if it is not entangled. Computationally it is not easy to check if a mixed state, especially in higher dimensions and for more parties, is sep- a1 a2 arable or entangled. Numerous measures of quantum entan- glement for both pure and mixed states are proposed [80]: for FIG. 5. The simplest scenario in which Bell nonlocality can be bipartite systems where the dimension of i(i = A,B) is 2, a H demonstrated is the famous Clauser-Horn-Shimony-Holt (CHSH) good measure is concurrence [81]. Other measures for quanti- experiment. There are two parties involved. Each party can per- fying entanglement are entanglement of formation, robustness form two dichotomic measurements. Thus, it is a (2,2,2) scenario. of entanglement, Schmidt rank, squashed entanglement and so The experimental statistics corresponds to probabilities of the form forth [80]. It turns out that for any entangled state ρ, there ex- P(a1,a2|x1,x2). ists a Hermitian matrix W such that Tr(ρW) < 0 and for all separable states ρsep, Tr(ρW) ≥ 0 [82–85]. was mooted a long time ago by Er- tics of the following form: win Schrödinger, but it took another thirty years or so for P = {P(a1,a2,··· ,aN|x1,x2,··· ,xN)} . John Bell to show that quantum theory imposes strong con- x1,···,xN ∈X ,a1,···,aN ∈A straints on statistical correlations in experiments. Yet, corre- (9) lations are not tantamount to causation and one wonders if We will refer to P as behavior. A Bell experiment involving machine learning could do better [86]. In the sixties and sev- N space-like separated parties, each party having access to m enties, this Gedanken experiment was given further impetus inputs and each input corresponding to k outputs is referred with some ingenious experimental designs [87–90]. The ad- to as (N,m,k) scenario. The famous Clauser-Horn-Shimony- vent of quantum information in the nineties gave a further Holt (CHSH) experiment is a (2,2,2) scenario [101], and it is push: quantum entanglement became a useful resource for the simplest scenario in which Bell nonlocality can be demon- many quantum applications, ranging from teleportation [91], strated (see Fig. 5). communication [92], purification [93], dense coding [94] and A behavior P admits local hidden variable description if computation [95, 96]. Interestingly, quantum entanglement is and only if a fascinating area that emerges almost serendipitously from the foundation of quantum into real practical ap- P(a1,a2,··· ,aN|x1,x2,··· ,xN) = ∑P(λ)P(a1|x1,λ) plications in the laboratories. λ ···P(an|xn,λ). (10) This is known as local behavior. The set of local behaviors B. Bell Nonlocality (L ) forms a convex polytope, and the facets of this polytope are Bell inequalities (see Fig. 6). In quantum theory, governs probability according to According to John Bell [97], any theory based on the col-    lective premises of locality and realism must be at variance P(a1,a2,··· ,aN|x1,x2,··· ,xN) = Tr Ma1|x1 ⊗ ··· ⊗ MaN |xN ρ , with experiments conducted by spatially separated parties in- (11) M volving shared entanglement, if the underlying measurement where ai|xi i are positive- valued measures events are space-like separated. The phenomenon, as dis- () and ρ is a shared density matrix. If a behavior sat- cussed before, is known as Bell nonlocality [98]. Apart from isfying Eq.11 falls outside L , it then violates at least one Bell its significance in understanding foundations of quantum the- inequality, and such behavior is said to manifest Bell nonlo- ory, Bell nonlocality is a valuable resource for many emerg- cality. The condition that parties do not communicate during ing device-independent quantum technologies like quantum the course of the Bell experiment is known as no-signalling key distribution (QKD), distributed computing, randomness condition. Intuitively speaking, it means that the choice of in- certification and self-testing [92, 99, 100]. The experiments put of one party can not be used for signalling among parties. which can potentially manifest Bell nonlocality are known Mathematically it means, as Bell experiments. A Bell experiment involves N spatially P(a a |x x ) = P(a |x )∀i j ∈ {1 ··· N} and i 6= j separated parties A1,A2,··· ,AN. Each party receives an input ∑ i, j i, j i i , , , . a j x1,x2,x3, ··· ,xN ∈ X and gives an output a1,a2,a3,··· ,aN ∈ A . For various input-output combinations one gets the statis- (12) 7

Bell Inequality

No-Signalling Set

Local Set

Quantum Set FIG. 7. The Mermin-Peres square in (A) provides a proof of Kochen- Specker theorem. Each line in the 3×3 grid describes mutually com- muting operators. Since the square of the operator at each node is the identity matrix, its eigenvalues are ±1. The product of the oper- ator along each line is the Identity, except for the bold one in which FIG. 6. The set of local behaviors (L ) forms a convex polytope the product is minus the Identity matrix.There is no way to assign ± (bounded polyhedron). The set of quantum behaviors Q is a convex definite value 1 to each operator to get these product rules since set. The set of behaviors compatible with no-signalling condition each operator appears in exactly two lines (or contexts). (B) shows (NS ) is again a convex polytope. It is important to note L ⊆ Q ⊆ the Mermin’s pentagram where we have four mutually commuting NS . The hyperplanes defining the boundary of the local set L three- operators. Along each line the product of the operators is are Bell inequalities. It is important to reiterate that the violation of Identity except for the bold one, leading to a contradiction. Classi- Bell inequalities means that a local realistic description of nature is cally, it is impossible to assign the numbers ± to each node such that impossible. the product of the numbers along a straightline is 1, except for the bold line which is -1.

The set of behaviours which satisfy the no signalling condi- tion are known as no-signalling behaviours. We will denote The contextual nature of quantum theory can be established the aforementioned set by NS . The no-signalling set also via simple constructive proofs. In Fig. 7 (A), one considers an forms a polytope. Furthermore, L ⊆ Q ⊆ NS (see Fig. 6). array of operators on a two-qubit system in any quantum state. The no-signalling behaviours which do not lie in Q are also There are nine operators, and each of them has eigenvalue ±1. known as post-quantum behaviours. In the (2,2,2) scenario The three operators in each row or column commute and so it i.e. CHSH Scenario, there is a unique Bell inequality, namely is easy to check that each operator is the product of the other CHSH inequality, upto relabelling of inputs and outputs. The two operators on a particular row or column, with a single CHSH inequality is given by exception, the third operator in the third column equals the minus of the product of the other two. Suppose there exist E0,0 + E0,0 + E0,0 + E0,0 ≤ 2 (13) pre-assigned values (−1 or +1) for the outcomes of the nine operators, then we can replace the none operators by the pre- where Ex1,x2 = P(a1 = a2|x1 = x2)−P(a1 6= a2|x1 = x2). All assigned values. However, there is no consistent way to as- local hidden variable theories satisfy CHSH inequality. In sign such values to the nine operators so that the product of quantum theory, suitably chosen measurement settings and the numbers in every row or column (except for the operators state can lead to violation of CHSH inequality and thus the along the bold line) yield 1 (the product yields -1). Notice CHSH inequality can be used to witness Bell nonlocal na- that each operator (node) appears in exactly two lines or con- ture√ of quantum theory. Quantum behaviours achieve upto text. Kochen and Specker provided the first proof of quantum 2 2, known as the Tsirelson bound. The upper bound for contextuality with a complicated construct involving 117 op- no-signalling behaviours (no-signalling bound) on the CHSH erators on a 3-dimensional space [102]. Another example is inequality is 4. the pentagram [103] in Fig. 7. Bell nonlocality is regarded as a particular case of con- textuality where the space-like separation of parties involved C. Contextuality creates ”context” [104, 105]. The study of contextuality has led not only to insights into the foundations of quantum me- An intuitive feature of classical models is non-contextuality chanics, but it also offers practical implications as well [106– which means that any measurement has a value independent 120]. There are several frameworks for contextuality in- of other compatible measurements it is performed together cluding sheaf-theoretic framework [121], graph and hyper- with. A set of compatible measurements is called context. graph framework [104, 122], contextuality-by-default frame- It was shown by Simon Kochen and Ernst Specker (as well as work [123–125] and operational framework [126]. A scenario John Bell) [102] that non-contextuality conflicts with quan- exhibits contextuality if it does not admit the non-contextual tum theory. hidden variable (NCHV) model [127]. 8

D. Quantum Steering Canabarro et al. [52] and Krivachy et al. [51]. In the work by Canabarro et al., the detection and characterization of nonlo- Correlations produced by steering lie between the Bell non- cality is done through an ensemble of multilayer perceptrons local correlations and those generated from entangled states blended with genetic algorithms (see IV A). [128, 129]. A state that manifests Bell nonlocality for some suitably chosen measurement settings also exhibits steering [130]. Furthermore, a state which exhibits steering must be A. Machine Learning Nonlocal Correlations entangled. A state demonstrates steering if it does not admit “local hidden state (LHS)” model [129]. We discuss this for- Given a behavior, deciding whether it is classical or non- mally here. classical is an extremely challenging task since the underly- AB Alice and Bob share some unknown quantum state ρ . Al- ing scenario grows in complexity very quickly. Canabarro et x ice can perform a set of POVM measurements {Ma}a . The al. [52] use supervised machine learning with an ensemble probability of her getting outcome a after choosing measure- of neural networks to tackle the approximate version of the ment x is given by problem (i.e. with a small margin of error) via regression.  x AB   x AB The authors ask “How far is a given correlation from the lo- P(a|x) = Tr (Ma ⊗ I)ρ = Tr TrA (Ma ⊗ I)ρ cal set.” The input feature vector to the neural network is a h i = Tr ρB , (14) random correlation vector. For a given behavior, the output a|x (label) is the distance of the feature vector from the classical, B  x AB i.e. local set. The nonlocality quantifier of a behavior q is where ρ = TrA (M ⊗ I)ρ is Bob’s residual state upon a|x a the minimum trace distance, denoted by NL(q) [131]. For the n B o normalization. A set of operators ρa|x acting on Bob’s two-party scenario, the nonlocality quantifier is given by a,x space is called an assemblage if 1 B B 0 NL(q) ≡ minp∈L |q − p|, (18) ρ = ρ 0 ∀x 6= x (15) ∑ ∑ a|x ∑ a|x 2|x||y| a,b,x,y a a and where L is the local set and |x| = |y| = m is the input size for h B i the parties. The training points are generated by sampling the ∑Tr ρa|x = 1 ∀x. (16) set of non-signalling behaviors randomly and then calculating a its distance from the local set via equation 18. Given a be- Condition 15 is the analogue of the no-signalling condition. havior q, the distance predicted by the neural network is never n o B equal to the actual distance, i.e. there is always a small error An assemblage ρa|x is said to admit LHS model if there a,x (ε 6=0). Let us represent the learned hypothesis as f : q → R. exists some hidden variable λ and some quantum state ρλ act- The performance metric of the learned hypothesis f is given ing on Bob’s space such that by ρB = P(λ)P(a|x,λ)ρ . (17) a|x ∑ λ N λ 1 P( f ) ≡ |NL(qi) − f (qi)|. (19) N ∑ A bipartite state ρAB is said to be steerable from Alice to Bob i=1 if there exist measurements for Alice such that the correspond- In experiments such as entanglement swapping, which com- ing assemblage does not satisfy equation 17. Determining prises three separated parties sharing two independent sources whether an assemblage admits LHS model is a semidefinite of quantum states, the local set admits the following form and program (SDP). The concept of steering is asymmetric by def- the set is non-convex, inition, i.e. even if Alice could steer Bob’s state, Bob may not be able to steer Alice’s state. P(a1,a2,a3) = ∑ P(λ1)P(λ2)P(a1|x1,λ1)P(a2|x2,λ2) λ1,λ2

IV. NEURAL NETWORK AS ORACLE FOR BELL P(a3|x3,λ3). (20) NONLOCALITY Here, non-convexity emerges from the independence of the The characterization of the local set for the convex scenario sources i.e. P(λ1,λ2) = P(λ1)P(λ2). via Bell inequalities becomes intractable as the complexity In this case, the Bell inequalities are no longer linear. Char- of the underlying scenario grows (in terms of the number of acterizing the set of classical as well as quantum behaviors parties, measurement settings and outcomes). For networks gets complicated for such scenarios [132, 133]. where several independent sources are shared among many Canabarro et al. train the model for convex as well as non- parties, the situation gets increasingly worse. The local set is convex scenarios. They also train the model to learn post- remarkably non-convex, and hence proper analytical and nu- quantum correlations. The techniques studied in the paper are merical characterization, in general, is lacking. Applying ma- valuable for understanding Bell nonlocality for large quantum chine learning technique to tackle these issues were studied by networks, for example those in quantum internet. 9

a x1 x2 x3

Charlie Alice 1 Bob 2 γ* β

a1 a2 a3 α b c

FIG. 8. There are three separated parties with two sources of hidden pA(a|β γ) pB(b|γ α) pC(c|α β) variables. The sources are independent P(λ1,λ2) = P(λ1)P(λ2). The local set is no longer convex. In experiments such as entangle- ment swapping, such non-convexity arise. Deciding whether a be- haviour is classical or quantum gets complicated for such scenarios.

B. Oracle for Networks γ α β

FIG. 9. Reprinted with permission from Krivachy et al.[51]. (Top) Given an observed probability distribution corresponding to The triangle network configuration. (Bottom) The topology of the scenarios where several independent sources are distributed neural network is selected such that it reproduces distributions com- over a network, deciding whether it is classical or non- patible with the triangle configuration. classical is an important question, both from practical as well as foundational viewpoint. The boundary separating the clas- sical and non-classical correlations is extremely non-convex such a case admits the following form: and thus a rigorous analysis is exceptionally challenging. Z 1 P(a,b,c) = dαdβdγPA (a|β,γ)PB (b|γ,α)PC (c|α,β). In reference [51], the authors encode the causal structure 0 into the topology of a neural network and numerically de- (21) termine if the target distribution is “learnable”. A behavior The neural network is constructed such that it can approx- belongs to the local set if it is learnable. The authors har- imate the distribution of type Eq.21. The inputs to the ness the fact that the information flow in feedforward neural neural network are α,β and γ drawn uniformly at ran- networks and causal structures are both determined by a di- dom and the outputs are the conditional probabilities i.e. rected acyclic graph. The topology of the neural network is PA (a|β,γ),PB (b|γ,α) and PC (c|α,β). chosen such that it respects the causality structure. The local The cost function is chosen to be any measure of the dis- set corresponding to even elementary causal networks such tance between the target distribution Pt and the network’s out- as triangle network is profoundly non-convex, and thus an- put PM. The authors employ the techniques to a few other alytical characterization of the same is a notoriously tricky cases, such as the elegant distribution and a distribution pro- task. Using the neural network as an oracle, Krivachy et al. posed by Renou et al. [134]. The application of the technique [51] convert the membership in a local set problem to a learn- to the elegant distribution suggests that the distribution is in- ability problem. For a neural network with adequate model deed nonlocal as conjectured in [135]. Furthermore, the dis- capacity, a target distribution can be approximated if it is lo- tribution proposed by Renou et al. appears to have nonlocal cal. The authors examine the triangle network with quaternary features for some parameter regime. For the sake of complete- outcomes as a proof-of-principle example. In such a scenario, ness, now we discuss the elegant distribution and the distribu- there are three independent sources, say α,β and γ. Each of tion proposed by Renou et al. the three parties receives input from two of the three sources Elegant distribution – The distribution is generated by and process the inputs to provide outputs via fixed response three parties performing entangled measurements on entan- functions. The outputs for Alice, Bob and Charlie will be in- gles systems. The three parties share singlets i.e. |ψ−i = √1 (|01i − |10i) Every party performs entangled measure- dicated by a,b,c ∈ {0,1,2,3}. The scenario as discussed here 2 can be characterized by the probability distribution P(a,b,c) ments on their two . The eigenstates of the entangled over the random variables a,b and c. If the network is clas- measurements are given by sical, then the distribution can be represented by a directed r √ acyclic graph known as a (BN). 3 3 − 1 |χ i = |α ,−α i + i |ψ−i, (22) j 2 j j 2 Assuming the distribution P(a,b,c) over the random vari- ables a,b and c to be classical, it is assumed without loss of where |α ji are vectors symmetrically distributed on the Bloch generality that the sources send a random variable drawn uni- sphere i.e. point to the vertices of a tetrahedron, for j ∈ formly from the interval 0 to 1. A classical distribution for {1,2,3,4}. 10

Renou et al. distribution – The distribution is gener- case. As evident from references [59], RBM can repre- ated by three parties sharing the entangled state |φ +i = sent quantum many-body problems beyond 1-D and low- √1 (|00i + |11i) and performing the same measurement on entanglement. 2 each of their two qubits. The measurements are charac- h i terized by a single parameter κ ∈ √1 ,1 with eigenstates √ √ 2 B. Playing Bell Nonlocal Games |01i,|10i,u|00i + 1 − u2|11i and 1 − u2|00i − u|11i. Prediction of winning strategies for (classical) games and decision-making processes with reinforcement learning (RL) V. MACHINE LEARNING FOR OPTIMIZING SETTINGS has made significant progress in game theory in recent years. FOR BELL NONLOCALITY Motivated partly by these results, the authors in Bharti et al. [53] have looked at a game-theoretic formulation of Bell in- Bell inequalities have become a standard tool to reveal the equalities (known as Bell games) and applied machine learn- non-local structure of . However, finding ing techniques to it. To violate a Bell inequality, both the the best strategies to violate a given Bell inequality can be a quantum state as well as the measurements performed on the difficult task, especially for many-body settings or even non- quantum states have to be chosen in a specific manner. The convex scenarios. Especially the latter setting is challenging, authors transform this problem into a decision making pro- as standard optimisation tools are unable to be applied to this cess. This is achieved by choosing the parameters in a Bell case. To violate a given Bell inequality, two inter-dependent game in a sequential manner, e.g. the angles of the measure- tasks have to be addressed: Which measurements have to ment operators, the angles parameterizing the quantum states, be performed to reveal the non-locality? And which quan- or both. Using RL, these sequential actions are optimized tum states show the maximal violation? Recently, Dong-Ling for the best configuration corresponding to the optimal/near- Deng has approached the latter task for convex settings with optimal quantum violation (see Fig.10). The authors train the many-body Bell inequalities using restricted Boltzmann ma- RL agent with a cost function that encourages high quantum chines [54]. Bharti et al. [53] have approached both tasks in violations via proximal policy optimization - a state-of-the-art conjunction for both convex and non-convex inequalities. RL algorithm that combines two neural networks. The ap- proach succeeds for well known convex Bell inequalities, but it can also solve Bell inequalities corresponding to non-convex A. Detecting Bell nonlocality with many-body states optimization problems, such as in larger quantum networks. So far, the field has struggled solving these inequalities; thus, Several methods from machine learning have been adopted this approach offers a novel possibility towards finding opti- to tackle intricate quantum many-body problems [54, 59, 136, mal (or near-optimal) configurations. 137]. Dong-Ling Deng [54] employs machine learning tech- niques to detect in many-body systems SELECT QUANTUM STATE(S) using the restricted Boltzmann machine (RBM) architecture. AND MEASUREMENT SETTINGS The key idea can be split into two parts: AGENT • After choosing appropriate measurement settings, Bell inequalities for convex scenarios can be expressed as an operator. This operator can be thought of as an Hamilto- nian. The eigenstate with the maximal eigenvalue is the state that gives the maximum violation for the underly- ing Bell inequality. This state can be found by calculat- ing the of the negative of the Hamiltonian.

• Finding the ground state of a quantum Hamiltonian is QMA-hard and thus in general difficult. However, us- ing heuristic techniques involving the RBM architec- ture, the problem is recast into the task of finding the ENVIRONMENT approximate answer in some cases. TRAIN WITH REWARD Techniques like density matrix renormalization group (DMRG) [138], projected entangled pair states (PEPS) [139] FIG. 10. Reprinted with permission from Ref [53]. Playing Bell and multiscale entanglement renormalization ansatz (MERA) games with AI: In [53], the authors train AI agent to play various [140] are traditionally used to find (or approximate) ground Bell nonlocal games. The agent interacts with the quantum system states of many-body Hamiltonians. But these techniques by choosing quantum states and measurement angles, and measures only work reliably for optimization problems involving low- resulting violation of the Bell inequalities. Over repeated games, the entanglement states. Moreover, DMRG works well only for agent trains himself using past results to realize maximal violation of systems with short-range interactions in the one-dimensional Bell inequalities. 11

Furthermore, the authors present an approach to find set- put with the state being entangled or separable (1 versus 0) tings corresponding to maximum quantum violation of Bell as corresponding output. The correlation vector contains the inequalities on near-term quantum computers. The quantum expectation of the product of Alice’s and Bob’s measurement state is parameterized by a circuit consisting of single-qubit operators. The performance of the network improved as the rotations and CNOT gates acting on neighbouring qubits ar- model capacity was increased, which hints that the hypothesis ranged in a linear fashion. Both the gates and the measure- which separates entangled states from separable ones must be ment angles are optimized using a variational hybrid classic- sufficiently “complex.” The authors also trained a neural net- , where the classical optimization is per- work to distinguish bi-separable and bound-entangled states. formed by RL. The RL agent learns by first randomly try- ing various measurement angles and quantum states. Over the course of the training, the agent improves itself by learn- B. Classification by Representing Quantum States with ing from experience, and is capable of reaching the maximal Restricted Boltzmann Machines quantum violation. Harney et al. [143] use reinforcement learning with RBMs to detect entangled states. RBMs have demonstrated being ca- VI. MACHINE LEARNING-ASSISTED STATE pable of learning complex quantum states. The authors mod- CLASSIFICATION ify RBMs such that they can only represent separable states. This is achieved by separating the RBM into K partitions that A crucial problem in quantum information is identifying are only connected within themselves, but not with the other the degree of entanglement within a given quantum state. partitions. Each partition represents a (possibly) entangled For Hilbert space dimensions up to 6, one can use Peres- subsystems, that however is not entangled with the other par- Horodecki criterion, also known as positive partial transpose titions. This choice enforces a specific K-separable Ansatz of criterion (PPT) to distinguish entangled and separable states. the wavefunction. This RBM is trained to represent a target However, there is no generic or entanglement wit- state. If the training converges, it must be representable by the ness as it is in fact a NP-hard problem [80]. Thus, one must Ansatz and thus be K-separable. However, if the training does rely on heuristic approaches. This poses a fundamental ques- not converge, the Ansatz is insufficient and the target state is tion: Given partial or full information about the state, are there either of a different K0-separable form or fully entangled. ways to classify whether it is entangled or not? Machine learn- ing has offered a way to find answers to this question. C. Classification with Tomographic Data A. Classification with Bell Inequalities Can tools from machine learning help to distinguish entan- gled and separable states given the full quantum state (e.g. In reference [141], the authors blend Bell inequalities with obtained by ) as an input? Two recent a feed-forward neural network to use them as state classifiers. studies address this question. The goal is to classify states as entangled or separable. If a In Lu et al. [144], the authors detect entanglement by em- state violates a Bell inequality, it must be entangled. However, ploying classic (i.e. non-deep learning) supervised learning. Bell inequalities cannot be used as a reliable tool for entangle- To simplify data generation of entangled and separable states, ment classification i.e. even if a state is entangled, it might not the authors approximate the set of separable states by a con- violate an entanglement witness based on the Bell inequality. vex hull (CHA). States that lie outside the convex hull are For example, the density matrix ρ = p|ψ ihψ | + (1 − p) I − − 4 assumed to be most likely entangled. For the decision pro- violates the CHSH inequality only for p > √1 , but is entan- 2 cess, the authors use ensemble training via bootstrap aggre- 1 gled for p > 3 [142]. Moreover, given a Bell inequality, the gating (bagging). Here, various supervised learning methods measurement settings that witness entanglement of a quan- are trained on the data, and they form a committee together tum state (if possible) depend on the quantum state. Prompted that decides whether a given state is entangled or not. The al- by these issues, the authors ask if they can transform Bell gorithm is trained with the quantum state and information en- inequalities into a reliable separable-entangled states classi- coding the position relative to the convex hull as inputs. The fier. The coefficients of the terms in the CHSH inequalities authors show that accuracy improves if bootstrapping of ma- are (1,1,1,−1). The local hidden variable bound on the in- chine learning methods is combined with CHA. equality is 2 (see equation 13). Assuming fixed measure- In a different approach Goes et al. [145], the authors present ment settings corresponding to CHSH inequality, the authors an automated machine learning approach to classify random examine whether it is possible to get better performance in states of two qutrits as separable (SEP), entangled with posi- terms of entanglement classification compared with the values tive partial transpose (PPTES) or entangled with negative par- (1,1,1,−1,2). The main challenge to answer such a question tial transpose (NPT). For training, the authors elaborate a way in a supervised learning setting is to get labelled data that is to find enough samples to train on. The procedure is as fol- verified to be either separable or entangled. lows: A random quantum state is sampled, then using the To train the neural network, the correlation vector corre- General Robustness of Entanglement (GR) and PPT criterion, sponding to the appropriate Bell inequality was chosen as in- it is classified to either SEP, PPTES or NPT. The GR measures 12 the closeness to the set of separable states. The authors com- VIII. A FEW MORE APPLICATIONS pare various supervised learning methods to distinguish the states. The input features fed into the machine are the compo- Work on quantum foundations has led to the birth of quan- nents of the quantum state vector and higher-order combina- tum computing and quantum information. Recently, the amal- tions thereof, whereas the labels are the type of entanglement. gamation of quantum theory and machine learning has led to Besides, they also train to estimate the GR with regression a new area of research, namely learning techniques and use it to validate the classifiers. [148]. For further reading on , refer to [149] and references therein. Further, for recent trends and exploratory works in quantum machine learning, we refer the reader to [150] and references therein. Techniques from machine learning have been used to dis- cover new quantum experiments [151, 152]. In reference VII. NEURAL NETWORKS AS “HIDDEN" VARIABLE [152], Krenn et. al. provide a computer program which MODELS FOR QUANTUM SYSTEMS helps designing novel quantum experiments. The program was called Melvin. Melvin provided solutions which were quite counter-intuitive and different than something a human Understanding why deep neural networks work so well is scientist ordinarily would come up with. Melvin’s solutions an active area of research. The presence of the word “hidden” have led to many novel results [153–158]. The ideas from for hidden variables in quantum foundations and hidden neu- Melvin have further provided machine generated proofs of rons in deep learning neurons may not be that accidental. Us- Kochen-Specker theorem [159]. ing conditional Restricted Boltzmann machines (a variant of Restricted Boltzmann machines), Steven Weinstein provides a completion of quantum theory in reference [146]. The com- IX. CONCLUSION AND FUTURE WORK pletion, however, doesn’t contradict Bell’s theorem as the as- sumption of “statistical independence” is not respected. The In this survey, we discussed the various applications of statistical independence assumption demands that the com- machine learning for problems in the foundations of quan- plete description of the system before measurement must be tum theory such as determination of the quantum bound for independent of the final measurement settings. The phenom- Bell inequalities, the classification of different behaviors in ena where apparent nonlocality is observed by violating statis- local/nonlocal sets, using hidden neurons as hidden variables tical independence assumption is known as “nonlocality with- for completion of quantum theory, training AI for playing Bell out nonlocality” [147]. nonlocal games, ML-assisted state classification, and so forth. In a Bell-experiment corresponding to CHSH scenario, the Now we discuss a few open questions at the interface. Some detector settings α ∈ {a,a0} and β ∈ {b,b0}, and the cor- of these open questions have been mentioned in references [51–54] responding measurement outcomes xα ∈ {1,−1}and xβ ∈ {1,−1} for a single experimental trial can be represented Witnessing Bell nonlocality in many-body systems is an  active area of research [54, 160]. However, designing as a four-dimensional vector α,β,xα ,xβ . Such a vector can be encoded in a binary vector V = (v ,v ,v ,v ) where experimental-friendly many-body Bell inequalities is a diffi- 1 2 3 4 cult task. It would be interesting if machine learning could vi ∈ {0,1}. Here, (+1)/(−1) has been mapped to 0/(+1). The four-dimensional binary vector V represents the value help design optimal Bell inequalities for scenarios involving taken by four visible units of an RBM. The dependencies be- many-body systems. In reference [54], the author used RBM tween the visible units is encoded using sufficient number of based representation coupled with reinforcement learning to find near-optimal quantum values for various Bell inequali- hidden units H = (h1,h2,··· ,h j). With four hidden neurons, the authors could reproduce the statistics predicted by EPR ties corresponding to various convex Bell scenarios. It is well experiment with high accuracy. For example, say the vector known that optimization becomes comparatively easier once V = (0,1,1,0) occurs in 3% of the trials, then after training the representation gets compact. It would be interesting if one the machine would associate P(V) ≈ 0.03. Quantum mechan- can use other neural networks based representations such as  convolutional neural networks for finding optimal (or near- ics gives us only the conditional probabilities P xα ,xβ |α,β and thus learning joint probability using RBM is resource- optimal) quantum values. wasteful. The authors harness this observation by encoding As mentioned in reference [52], it is an excellent idea to the conditional statistics only in a conditional RBM (cRBM). deploy techniques like anomaly detection for the detection of non-classical behaviors. This can be done by subjecting the The difference between a cRBM and RBM is that the units machine to training with local behaviors only. corresponding to the conditioning variables (detector settings In many of the applications, e.g. classification of entangled here) are not dynamical variables. There are no probabil- states, the computer gives a guess, but we are not sure about ities assigned to conditioning variables, and thus the only the correctness. This is never said and cannot be overlooked, probabilities generated by cRBM are conditional probabili- as it is a limitation of these applications. The intriguing ques- ties. This provides a more compact representation compared tion is to understand how the output of the computer can be to an RBM. employed to provide a certification of the result, for instance, 13 an entanglement witness. One could use ideas from proba- cality, it is genuine to ask if the methods could be useful to bilistic machine learning in such cases [161]. The probabilis- solve problems in quantum steering and contextuality. Re- tic framework, which explains how to represent and handle cently, ideas from the exclusivity graph approach to contex- uncertainty about models and predictions, plays a fundamen- tuality were used to investigate problems involving causal in- tal role in scientific data analysis. Harnessing the tools from ference [164]. Ideas from quantum foundations could further probabilistic machine learning such as Bayesian optimization assist in developing a deeper understanding of machine learn- and automatic model discovery could be conducive in under- ing or in general artificial intelligence. standing how to utilize machine output to provide a certifica- In artificial intelligence, one of the tests to distinguish be- tion of the result. tween humans and machines is the famous “Turing Test (TT)” In reference [53], the authors used reinforcement learning due to Alan Turing [62, 165]. The purpose of TT is to de- to train AI to play Bell nonlocal games and obtain optimal (or termine if a computer is linguistically distinguishable from a near-optimal) performance. The agent is offered a reward at human. In TT, a human and a machine are sealed in different the end of each epoch which is equal to the expectation value rooms. A human jury who does not know which room con- of the Bell operator corresponding to the state and measure- tains a human and which room not, asks questions to them, ment settings chosen by the agent. Such a reward scheme is by email, for example. Based on the returned outcome, if sparse and hence it might not be scalable. It would be inter- the judge cannot do better than fifty-fifty, then the machine esting to come up with better reward schemes. Furthermore, in question is said to have passed TT. The task of distinguish- in this approach only a single agent tries to learn the opti- ing the humans from machine based on the statistics of the a) x) mization landscape and discovers near-optimal (or optimal) answers (say output given questions (say input is a sta- measurement settings and state. It would be exciting to ex- tistical distinguishability test assuming the rooms plus its in- tend the approach to multi-agent setting where every space- habitants as black boxes. In the black-box approach to quan- like separated party is considered a separate agent. It is worth tum theory, experiments are regarded as a black box where the mentioning that distributing the actions and observations of a experimentalist introduces a measurement (input) and obtains single agent into a list of agents reduces the dimensionality the outcome of the measurement (output). One of the cen- of agent inputs and outputs. Furthermore, it also dramatically tral goals of this approach is to deduce statements regarding improves the amount of training data produced per step of the the contents of the black box based on input-output statistics environment. Agents learn better if they tend to interact as [68]. It would be nice to see if techniques from the black-box compared to the case of solitary learning. approach to quantum theory could be connected to TT. Bell inequalities separate the set of local behaviours from the set of non-local behaviours. The analogous boundary sep- ACKNOWLEDGMENTS arating quantum from the post-quantum set is known as quan- tum Bell inequality [162, 163]. Finding quantum Bell inequal- We wish to acknowledge the support of the Ministry of Ed- ities is an interesting and challenging problem. However, one ucation and the National Research Foundation, Singapore. We can aim to obtain the approximate expression by supervised thank Valerio Scarani for valuable discussions. learning with the quantum Bell inequalities being the bound- ary separating the quantum set from the post-quantum set. Moreover, it is interesting to see if it is possible to guess phys- DATA AVAILABILITY ical principles by merely opening the neural-network black box. Data sharing is not applicable to this survey as no new data Driven by the success of machine learning in Bell nonlo- were created or analyzed in this study.

[1] Shai Shalev-Shwartz and Shai Ben-David. Understanding ma- [6] Louis-François Arsenault, O Anatole von Lilienfeld, and An- chine learning: From theory to algorithms. Cambridge univer- drew J Millis. Machine learning for many-body physics: effi- sity press, 2014. cient solution of dynamical mean-field theory. arXiv preprint [2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep arXiv:1506.08858, 2015. learning. MIT press, 2016. [7] Yi Zhang and Eun-Ah Kim. Quantum loop topography for [3] Mohammed AlQuraishi. Alphafold at casp13. Bioinformatics, machine learning. Physical review letters, 118(21):216401, 35(22):4862–4865, 2019. 2017. [4] James P Bridge, Sean B Holden, and Lawrence C Paulson. [8] Juan Carrasquilla and Roger G Melko. Machine learning Machine learning for first-order theorem proving. Journal of phases of matter. Nature Physics, 13(5):431–434, 2017. automated reasoning, 53(2):141–172, 2014. [9] Evert PL Van Nieuwenburg, Ye-Hua Liu, and Sebastian D Hu- [5] Antonio Lavecchia. Machine-learning approaches in drug dis- ber. Learning phase transitions by confusion. Nature Physics, covery: methods and applications. Drug discovery today, 13(5):435–439, 2017. 20(3):318–331, 2015. [10] Dong-Ling Deng, Xiaopeng Li, and S Das Sarma. Ma- chine learning topological states. Physical Review B, 14

96(19):195145, 2017. 95(3):035105, 2017. [11] Lei Wang. Discovering phase transitions with unsupervised [30] Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, learning. Physical Review B, 94(19):195105, 2016. Matthias Troyer, Roger Melko, and Giuseppe Carleo. Neural- [12] Peter Broecker, Juan Carrasquilla, Roger G Melko, and Simon network quantum state tomography. Nature Physics, Trebst. Machine learning quantum phases of matter beyond 14(5):447–450, 2018. the fermion sign problem. Scientific reports, 7(1):1–10, 2017. [31] Jing Chen, Song Cheng, Haidong Xie, Lei Wang, and Tao Xi- [13] Kelvin Chng, Juan Carrasquilla, Roger G Melko, and Ehsan ang. Equivalence of restricted boltzmann machines and tensor Khatami. Machine learning phases of strongly correlated network states. Physical Review B, 97(8):085104, 2018. fermions. Physical Review X, 7(3):031038, 2017. [32] Yichen Huang and Joel E Moore. Neural network represen- [14] Yi Zhang, Roger G Melko, and Eun-Ah Kim. Machine learn- tation of tensor network and chiral states. arXiv preprint ing z 2 quantum spin liquids with quasiparticle statistics. Phys- arXiv:1701.06246, 2017. ical Review B, 96(24):245119, 2017. [33] Frank Schindler, Nicolas Regnault, and Titus Neupert. Prob- [15] Sebastian J Wetzel. Unsupervised learning of phase transi- ing many-body localization with neural networks. Physical tions: From principal component analysis to variational au- Review B, 95(24):245134, 2017. toencoders. Physical Review E, 96(2):022140, 2017. [34] Tobias Haug, Rainer Dumke, Leong-Chuan Kwek, Christian [16] Wenjian Hu, Rajiv RP Singh, and Richard T Scalettar. Discov- Miniatura, and Luigi Amico. Engineering quantum current ering phases, phase transitions, and crossovers through unsu- states with machine learning. arXiv:1911.09578, 2019. pervised machine learning: A critical examination. Physical [35] Zi Cai and Jinguo Liu. Approximating quantum many-body Review E, 95(6):062122, 2017. wave functions using artificial neural networks. Physical Re- [17] Nobuyuki Yoshioka, Yutaka Akagi, and Hosho Katsura. view B, 97(3):035116, 2018. Learning disordered topological phases by statistical recovery [36] Peter Broecker, Fakher F Assaad, and Simon Trebst. Quantum of symmetry. Physical Review B, 97(20):205110, 2018. phase recognition via unsupervised machine learning. arXiv [18] Giacomo Torlai and Roger G Melko. Learning thermo- preprint arXiv:1707.00663, 2017. dynamics with boltzmann machines. Physical Review B, [37] Yusuke Nomura, Andrew S Darmawan, Youhei Yamaji, and 94(16):165134, 2016. Masatoshi Imada. Restricted boltzmann machine learning for [19] Hsin-Yuan Huang, Kishor Bharti, and Patrick Rebentrost. solving strongly correlated quantum systems. Physical Review Near-term quantum algorithms for linear systems of equations. B, 96(20):205152, 2017. arXiv preprint arXiv:1909.07344, 2019. [38] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Reben- [20] Ken-Ichi Aoki and Tamao Kobayashi. Restricted boltzmann trost, Nathan Wiebe, and Seth Lloyd. Quantum machine learn- machines for the long range ising models. Modern Physics ing. Nature, 549(7671):195–202, 2017. Letters B, 30(34):1650401, 2016. [39] Tobias Haug, Wai-Keong Mok, Jia-Bin You, Wenzu Zhang, [21] Yi-Zhuang You, Zhao Yang, and Xiao-Liang Qi. Machine Ching Eng Png, and Leong-Chuan Kwek. Classifying learning spatial geometry from entanglement features. Physi- global state preparation via deep reinforcement learning. cal Review B, 97(4):045153, 2018. arXiv:2005.12759, 2020. [22] Mario Pasquato. Detecting intermediate mass black holes [40] Giacomo Torlai and Roger G Melko. Neural decoder for topo- in globular clusters with machine learning. arXiv preprint logical codes. Physical review letters, 119(3):030501, 2017. arXiv:1606.08548, 2016. [41] Marin Bukov. Reinforcement learning for autonomous prepa- [23] Yashar D Hezaveh, Laurence Perreault Levasseur, and Philip J ration of floquet-engineered states: Inverting the quantum Marshall. Fast automated analysis of strong gravita- kapitza oscillator. Physical Review B, 98(22):224305, 2018. tional lenses with convolutional neural networks. Nature, [42] Marin Bukov, Alexandre GR Day, Dries Sels, Phillip Wein- 548(7669):555–557, 2017. berg, Anatoli Polkovnikov, and Pankaj Mehta. Reinforcement [24] Rahul Biswas, Lindy Blackburn, Junwei Cao, Reed Essick, learning in different phases of quantum control. Physical Re- Kari Alison Hodge, Erotokritos Katsavounidis, Kyungmin view X, 8(3):031086, 2018. Kim, Young-Min Kim, Eric-Olivier Le Bigot, Chang-Hwan [43] Koji Hashimoto, Sotaro Sugishita, Akinori Tanaka, and Akio Lee, et al. Application of machine learning algorithms to the Tomiya. Deep learning and the ads/cft correspondence. Phys- study of noise artifacts in gravitational-wave data. Physical ical Review D, 98(4):046019, 2018. Review D, 88(6):062003, 2013. [44] Jonathan Carifio, James Halverson, Dmitri Krioukov, and [25] Benjamin P Abbott, Richard Abbott, TD Abbott, MR Aber- Brent D Nelson. Machine learning in the string landscape. nathy, Fausto Acernese, Kendall Ackley, Carl Adams, Thomas Journal of High Energy Physics, 2017(9):157, 2017. Adams, Paolo Addesso, RX Adhikari, et al. Observation of [45] Henry W Lin, Max Tegmark, and David Rolnick. Why does gravitational waves from a binary black hole merger. Physical deep and cheap learning work so well? Journal of Statistical review letters, 116(6):061102, 2016. Physics, 168(6):1223–1247, 2017. [26] Sergei V Kalinin, Bobby G Sumpter, and Richard K [46] Andrzej Cichocki. Tensor networks for big data analyt- Archibald. Big–deep–smart data in imaging for guiding mate- ics and large-scale optimization problems. arXiv preprint rials design. Nature materials, 14(10):973–980, 2015. arXiv:1407.3124, 2014. [27] Samuel S Schoenholz, Ekin D Cubuk, Daniel M Sussman, [47] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Efthimios Kaxiras, and Andrea J Liu. A structural approach to Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, relaxation in glassy liquids. Nature Physics, 12(5):469–471, and Lenka Zdeborová. Machine learning and the physical sci- 2016. ences. Review of Modern Physics, 91(4):045002, 2019. [28] Junwei Liu, Yang Qi, Zi Yang Meng, and Liang Fu. [48] Vedran Dunjko, Jacob M Taylor, and Hans J Briegel. Self-learning monte carlo method. Physical Review B, Quantum-enhanced machine learning. Physical review letters, 95(4):041101, 2017. 117(13):130501, 2016. [29] Li Huang and Lei Wang. Accelerated monte carlo simula- [49] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia tions with restricted boltzmann machines. Physical Review B, Fiorentini. Parameterized quantum circuits as machine learn- 15

ing models. Quantum Science and Technology, 4(4):043001, Information causality as a physical principle. Nature, 2019. 461(7267):1101–1104, 2009. [50] Lucien Hardy and Robert Spekkens. Why physics needs quan- [72] Miguel Navascués and Harald Wunderlich. A glance be- tum foundations. arXiv preprint arXiv:1003.5008, 2010. yond the quantum model. Proceedings of the Royal So- [51] Tamás Kriváchy, Yu Cai, Daniel Cavalcanti, Arash Tavakoli, ciety A: Mathematical, Physical and Engineering Sciences, Nicolas Gisin, and Nicolas Brunner. A neural network oracle 466(2115):881–890, 2010. for quantum nonlocality problems in networks. arXiv preprint [73] Noah Linden, Sandu Popescu, Anthony J Short, and Andreas arXiv:1907.10552, 2019. Winter. Quantum nonlocality and beyond: limits from non- [52] Askery Canabarro, Samuraí Brito, and Rafael Chaves. Ma- local computation. Physical review letters, 99(18):180502, chine learning nonlocal correlations. Physical review letters, 2007. 122(20):200401, 2019. [74] Bin Yan. Quantum correlations are tightly bound by the ex- [53] Kishor Bharti, Tobias Haug, Vlatko Vedral, and Leong-Chuan clusivity principle. Physical review letters, 110(26):260406, Kwek. How to teach ai to play bell non-local games: Rein- 2013. forcement learning. arXiv preprint arXiv:1912.10783, 2019. [75] Tobias Fritz, Ana Belén Sainz, Remigiusz Augusiak, J Bohr [54] Dong-Ling Deng. Machine learning detection of bell nonlo- Brask, Rafael Chaves, Anthony Leverrier, and Antonio Acín. cality in quantum many-body systems. Physical review letters, Local orthogonality as a multipartite principle for quantum 120(24):240402, 2018. correlations. Nature communications, 4(1):1–7, 2013. [55] Raban Iten, Tony Metger, Henrik Wilming, Lídia Del Rio, and [76] Kelvin J McQueen. Four tails problems for dynamical collapse Renato Renner. Discovering physical concepts with neural theories. Studies in History and Part B: networks. Physical Review Letters, 124(1):010508, 2020. Studies in History and Philosophy of Modern Physics, 49:10– [56] Tom M Mitchell et al. Machine learning. 1997. Burr Ridge, 18, 2015. IL: McGraw Hill, 45(37):870–877, 1997. [77] Rafael D Sorkin. Quantum mechanics as quantum measure [57] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis theory. Modern Physics Letters A, 9(33):3119–3127, 1994. Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas [78] Ognyan Oreshkov, Fabio Costa, and Caslavˇ Brukner. Quan- Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game tum correlations with no causal order. Nature communica- of go without human knowledge. Nature, 550(7676):354, tions, 3(1):1–8, 2012. 2017. [79] Jean-Michel Raimond, M Brune, and Serge Haroche. Manip- [58] Geoffrey E Hinton. A practical guide to training restricted ulating quantum entanglement with atoms and photons in a boltzmann machines. In Neural networks: Tricks of the trade, cavity. Reviews of Modern Physics, 73(3):565, 2001. pages 599–619. Springer, 2012. [80] Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and [59] Giuseppe Carleo and Matthias Troyer. Solving the quantum Karol Horodecki. Quantum entanglement. Reviews of modern many-body problem with artificial neural networks. Science, physics, 81(2):865, 2009. 355(6325):602–606, 2017. [81] William K Wootters. Entanglement of formation and con- [60] John McCarthy, Marvin L Minsky, Nathaniel Rochester, and currence. Quantum Information & Computation, 1(1):27–44, Claude E Shannon. A proposal for the dartmouth summer 2001. research project on artificial intelligence, august 31, 1955. AI [82] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki. magazine, 27(4):12–12, 2006. On the necessary and sufficient conditions for separabil- [61] Alan M Turing. Computing machinery and intelligence. In ity of mixed quantum states. Phys. Lett. A, 223(quant- Parsing the Turing Test, pages 23–65. Springer, 2009. ph/9605038):1, 1996. [62] Stuart Russell and Peter Norvig. Artificial intelligence: a mod- [83] Barbara M Terhal. Detecting quantum entanglement. Theo- ern approach. 2002. retical computer science, 287(1):313–335, 2002. [63] Christopher A Fuchs and Rüdiger Schack. A quantum- [84] Philipp Krammer, Hermann Kampermann, Dagmar Bruß, bayesian route to quantum-state space. Foundations of Reinhold A Bertlmann, Leong Chuang Kwek, and Chiara Physics, 41(3):345–356, 2011. Macchiavello. Multipartite entanglement detection via struc- [64] Hugh Everett III. " relative state" formulation of quantum me- ture factors. Physical review letters, 103(10):100502, 2009. chanics. Reviews of modern physics, 29(3):454, 1957. [85] Artur K Ekert, Carolina Moura Alves, Daniel KL Oi, Michał [65] Bryce Seligman DeWitt and Neill Graham. The many worlds Horodecki, Paweł Horodecki, and Leong Chuan Kwek. Direct interpretation of quantum mechanics. Princeton University estimations of linear and nonlinear functionals of a quantum Press, 2015. state. Physical review letters, 88(21):217901, 2002. [66] Roland Omnes. Consistent interpretations of quantum me- [86] Judea Pearle. Causality. Cambridge: Cambridge University chanics. Reviews of Modern Physics, 64(2):339, 1992. Press, 2000. [67] Jonathan Barrett. Information processing in generalized prob- [87] Stuart J Freedman and John F Clauser. Experimental test abilistic theories. Physical Review A, 75(3):032304, 2007. of local hidden-variable theories. Physical Review Letters, [68] Antonio Acín and Miguel Navascués. Black box quantum 28(14):938, 1972. mechanics. In Quantum [Un] Speakables II, pages 307–319. [88] Carl A Kocher and Eugene D Commins. Polarization correla- Springer, 2017. tion of photons emitted in an atomic cascade. Physical Review [69] Wim Van Dam. Implausible consequences of superstrong non- Letters, 18(15):575, 1967. locality. Natural Computing, 12(1):9–12, 2013. [89] Alain Aspect. Expériences basées sur les inégalités de bell. Le [70] Gilles Brassard, Harry Buhrman, Noah Linden, André Allan Journal de Physique Colloques, 42(C2):C2–63, 1981. Méthot, Alain Tapp, and Falk Unger. Limit on nonlocality in [90] Alain Aspect, Jean Dalibard, and Gérard Roger. Experimental any world in which communication complexity is not trivial. test of bell’s inequalities using time-varying analyzers. Physi- Physical Review Letters, 96(25):250401, 2006. cal review letters, 49(25):1804, 1982. [71] Marcin Pawłowski, Tomasz Paterek, Dagomir Kaszlikowski, [91] Charles H Bennett, Gilles Brassard, Claude Crépeau, Richard Valerio Scarani, Andreas Winter, and Marek Zukowski.˙ Jozsa, , and William K Wootters. Tele- 16

porting an unknown quantum state via dual classical and [111] Debashis Saha, Paweł Horodecki, and Marcin Pawłowski. einstein-podolsky-rosen channels. Physical review letters, State independent contextuality advances one-way communi- 70(13):1895, 1993. cation. New Journal of Physics, 21(9):093057, 2019. [92] Artur K Ekert. based on bell’s theo- [112] Kishor Bharti, Maharshi Ray, Antonios Varvitsiotis, Adán Ca- rem. Physical Review Letters, 67(6):661, 1991. bello, and Leong-Chuan Kwek. Local certification of pro- [93] Charles H Bennett, Gilles Brassard, Sandu Popescu, Benjamin grammable quantum devices of arbitrary high dimensionality. Schumacher, John A Smolin, and William K Wootters. Pu- arXiv preprint arXiv:1911.09448, 2019. rification of noisy entanglement and faithful teleportation via [113] Nicolas Delfosse, Philippe Allard Guerin, Jacob Bian, and noisy channels. Physical review letters, 76(5):722, 1996. Robert Raussendorf. Wigner function negativity and contex- [94] Charles H Bennett and Stephen J Wiesner. Communication tuality in quantum computation on rebits. Physical Review X, via one-and two-particle operators on einstein-podolsky-rosen 5(2):021003, 2015. states. Physical review letters, 69(20):2881, 1992. [114] Jaskaran Singh, Kishor Bharti, et al. [95] David Deutsch. Quantum theory, the church–turing principle protocol based on contextuality monogamy. Physical Review and the universal quantum computer. Proceedings of the Royal A, 95(6):062333, 2017. Society of London. A. Mathematical and Physical Sciences, [115] Hakop Pashayan, Joel J Wallman, and Stephen D Bartlett. 400(1818):97–117, 1985. Estimating outcome probabilities of quantum circuits using [96] Richard P Feynman. Feynman lectures on computation. CRC quasiprobabilities. Physical review letters, 115(7):070501, Press, 2018. 2015. [97] J. S. Bell. On the einstein podolsky rosen paradox. Physics [116] Kishor Bharti, Atul Singh Arora, Leong Chuan Kwek, and (Long Island City, N.Y.), 1:195–200, Nov 1964. Jérémie Roland. A simple proof of uniqueness of the kcbs [98] Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio inequality. arXiv preprint arXiv:1811.05294, 2018. Scarani, and Stephanie Wehner. Bell nonlocality. Reviews of [117] Juan Bermejo-Vega, Nicolas Delfosse, Dan E Browne, Cihan Modern Physics, 86(2):419, 2014. Okay, and Robert Raussendorf. Contextuality as a resource for [99] Stefano Pironio, Antonio Acín, Serge Massar, A Boyer models of quantum computation with qubits. Physical review de La Giroday, Dzmitry N Matsukevich, Peter Maunz, Steven letters, 119(12):120505, 2017. Olmschenk, David Hayes, Le Luo, T Andrew Manning, [118] Lorenzo Catani and Dan E Browne. State-injection schemes et al. Random numbers certified by bell’s theorem. Nature, of quantum computation in spekkens’ toy theory. Physical 464(7291):1021, 2010. Review A, 98(5):052108, 2018. [100] Dominic Mayers and Andrew Yao. Self testing quantum ap- [119] Atul Singh Arora, Kishor Bharti, et al. Revisiting the admis- paratus. Quantum Info. Comput., 4(4):273–286, July 2004. sibility of non-contextual hidden variable models in quantum [101] John F. Clauser, Michael A. Horne, Abner Shimony, and mechanics. Physics Letters A, 383(9):833–837, 2019. Richard A. Holt. Proposed experiment to test local hidden- [120] Mark Howard, Joel Wallman, Victor Veitch, and Joseph Emer- variable theories. Physical Review Letters, 23:880–884, Oct son. Contextuality supplies the ’magic’ for quantum compu- 1969. tation. Nature, 510:351 EP –, Jun 2014. [102] SIMON KOCHEN and EP SPECKER. The problem of hidden [121] Samson Abramsky and Adam Brandenburger. The sheaf- variables in quantum mechanics. Journal of Mathematics and theoretic structure of non-locality and contextuality. New Mechanics, 17(1):59–87, 1967. Journal of Physics, 13(11):113036, 2011. [103] N David Mermin. Hidden variables and the two theorems of [122] Antonio Acín, Tobias Fritz, Anthony Leverrier, and Ana Belén john bell. Reviews of Modern Physics, 65(3):803, 1993. Sainz. A combinatorial approach to nonlocality and contextu- [104] Adán Cabello, Simone Severini, and Andreas Winter. Graph- ality. Communications in Mathematical Physics, 334(2):533– theoretic approach to quantum correlations. Phys. Rev. Lett., 628, 2015. 112(4):040401, 2014. [123] Ehtibar N Dzhafarov, Víctor H Cervantes, and Janne V [105] Barbara Amaral and Marcelo Terra Cunha. On Graph Ap- Kujala. Contextuality in canonical systems of random proaches to Contextuality and their Role in Quantum Theory. variables. Philosophical Transactions of the Royal Soci- Springer, Cham, Switzerland, 2018. ety A: Mathematical, Physical and Engineering Sciences, [106] Adán Cabello, Vincenzo D’Ambrosio, Eleonora Nagali, and 375(2106):20160389, 2017. Fabio Sciarrino. Hybrid ququart-encoded quantum cryptogra- [124] Ehtibar N Dzhafarov. On joint distributions, counterfac- phy protected by kochen-specker contextuality. Physical Re- tual values and hidden variables in understanding contextu- view A, 84:030302, Sep 2011. ality. Philosophical Transactions of the Royal Society A, [107] Kishor Bharti, Maharshi Ray, and Leong-Chuan Kwek. Non- 377(2157):20190144, 2019. classical correlations in n-cycle setting. Entropy, 21(2):134, [125] Janne V Kujala and Ehtibar N Dzhafarov. Measures of con- 2019. textuality and non-contextuality. Philosophical Transactions [108] Robert Raussendorf. Contextuality in measurement-based of the Royal Society A, 377(2157):20190149, 2019. quantum computation. Physical Review A, 88(2):022322, [126] Robert W Spekkens. Contextuality for preparations, trans- 2013. formations, and unsharp measurements. Physical Review A, [109] Kishor Bharti, Maharshi Ray, Antonios Varvitsiotis, 71(5):052108, 2005. Naqueeb Ahmad Warsi, Adán Cabello, and Leong-Chuan [127] S. Kochen and E. P. Specker. The problem of hidden variales Kwek. Robust self-testing of quantum systems via noncontex- in quantum mechanics. J. Math. Mech., 17:59–87, 1967. tuality inequalities. Physical review letters, 122(25):250403, [128] Marco Túlio Quintino, Tamás Vértesi, Daniel Cavalcanti, 2019. Remigiusz Augusiak, Maciej Demianowicz, Antonio Acín, [110] Shane Mansfield and Elham Kashefi. Quantum advantage and Nicolas Brunner. Inequivalence of entanglement, steer- from sequential-transformation contextuality. Physical review ing, and bell nonlocality for general measurements. Physical letters, 121(23):230401, 2018. Review A, 92(3):032107, 2015. 17

[129] Howard M Wiseman, Steve James Jones, and Andrew C [148] Maria Schuld and Francesco Petruccione. Supervised learning Doherty. Steering, entanglement, nonlocality, and the with quantum computers, volume 17. Springer, 2018. einstein-podolsky-rosen paradox. Physical review letters, [149] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. An 98(14):140402, 2007. introduction to quantum machine learning. Contemporary [130] Erwin Schrödinger. Discussion of probability relations be- Physics, 56(2):172–185, 2015. tween separated systems. In Mathematical Proceedings of the [150] Vedran Dunjko and Peter Wittek. A non-review of quantum Cambridge Philosophical Society, volume 31, pages 555–563. machine learning: trends and explorations. Quantum Views, Cambridge University Press, 1935. 4:32, 2020. [131] Samuraí Gomes de Aguiar Brito, B Amaral, and R Chaves. [151] Alexey A Melnikov, Hendrik Poulsen Nautrup, Mario Krenn, Quantifying bell nonlocality with the trace distance. Physical Vedran Dunjko, Markus Tiersch, , and Hans J Review A, 97(2):022111, 2018. Briegel. Active learning machine learns to create new quan- [132] Armin Tavakoli, Paul Skrzypczyk, Daniel Cavalcanti, and An- tum experiments. Proceedings of the National Academy of tonio Acín. Nonlocal correlations in the star-network configu- Sciences, 115(6):1221–1226, 2018. ration. Physical Review A, 90(6):062109, 2014. [152] Mario Krenn, Mehul Malik, Robert Fickler, Radek Lap- [133] Cyril Branciard, Nicolas Gisin, and Stefano Pironio. Char- kiewicz, and Anton Zeilinger. Automated search for new acterizing the nonlocal correlations created via entanglement quantum experiments. Physical review letters, 116(9):090405, swapping. Physical review letters, 104(17):170401, 2010. 2016. [134] Marc-Olivier Renou, Elisa Bäumer, Sadra Boreiri, Nicolas [153] Xuemei Gu, Mario Krenn, Manuel Erhard, and Anton Brunner, Nicolas Gisin, and Salman Beigi. Genuine quantum Zeilinger. Gouy phase radial mode sorter for light: Concepts nonlocality in the triangle network. Physical review letters, and experiments. Physical review letters, 120(10):103601, 123(14):140401, 2019. 2018. [135] Nicolas Gisin. Entanglement 25 years after quantum telepor- [154] Feiran Wang, Manuel Erhard, Amin Babazadeh, Mehul Malik, tation: testing joint measurements in quantum networks. En- Mario Krenn, and Anton Zeilinger. Generation of the com- tropy, 21(3):325, 2019. plete four-dimensional bell basis. Optica, 4(12):1462–1467, [136] Hiroki Saito and Masaya Kato. Machine learning technique to 2017. find quantum many-body ground states of bosons on a lattice. [155] Amin Babazadeh, Manuel Erhard, Feiran Wang, Mehul Malik, Journal of the Physical Society of Japan, 87(1):014001, 2018. Rahman Nouroozi, Mario Krenn, and Anton Zeilinger. High- [137] Xun Gao and Lu-Ming Duan. Efficient representation of quan- dimensional single-photon quantum gates: concepts and ex- tum many-body states with deep neural networks. Nature com- periments. Physical review letters, 119(18):180510, 2017. munications, 8(1):1–6, 2017. [156] Florian Schlederer, Mario Krenn, Robert Fickler, Mehul [138] Ulrich Schollwöck. The density-matrix renormalization Malik, and Anton Zeilinger. Cyclic transformation of or- group. Reviews of modern physics, 77(1):259, 2005. bital angular momentum modes. New Journal of Physics, [139] Frank Verstraete, Valentin Murg, and J Ignacio Cirac. Matrix 18(4):043019, 2016. product states, projected entangled pair states, and variational [157] Manuel Erhard, Mehul Malik, Mario Krenn, and Anton renormalization group methods for quantum spin systems. Ad- Zeilinger. Experimental greenberger–horne–zeilinger entan- vances in Physics, 57(2):143–224, 2008. glement beyond qubits. Nature Photonics, 12(12):759–764, [140] Guifré Vidal. Class of quantum many-body states that can be 2018. efficiently simulated. Physical review letters, 101(11):110501, [158] Mehul Malik, Manuel Erhard, Marcus Huber, Mario Krenn, 2008. Robert Fickler, and Anton Zeilinger. Multi-photon entangle- [141] Yue-Chi Ma and Man-Hong Yung. Transforming bell’s in- ment in high dimensions. Nature Photonics, 10(4):248, 2016. equalities into state classifiers with machine learning. npj [159] Mladen Paviciˇ c,´ Mordecai Waegell, Norman D Megill, and Quantum Information, 4(1):1–10, 2018. PK Aravind. Automated generation of kochen-specker sets. [142] Reinhard F Werner. Quantum states with einstein-podolsky- Scientific reports, 9(1):1–11, 2019. rosen correlations admitting a hidden-variable model. Physi- [160] Jordi Tura, Remigiusz Augusiak, Ana Belén Sainz, Tamas cal Review A, 40(8):4277, 1989. Vértesi, Maciej Lewenstein, and Antonio Acín. Detect- [143] Cillian Harney, Stefano Pirandola, Alessandro Ferraro, and ing nonlocality in many-body quantum states. Science, Mauro Paternostro. Entanglement classification via neural net- 344(6189):1256–1258, 2014. work quantum states. arXiv preprint arXiv:1912.13207, 2019. [161] Zoubin Ghahramani. Probabilistic machine learning and arti- [144] Sirui Lu, Shilin Huang, Keren Li, Jun Li, Jianxin Chen, ficial intelligence. Nature, 521(7553):452–459, 2015. Dawei Lu, Zhengfeng Ji, Yi Shen, Duanlu Zhou, and Bei [162] Ll Masanes. Extremal quantum correlations for n parties with Zeng. Separability-entanglement classifier via machine learn- two dichotomic per site. arXiv preprint quant- ing. Phys. Rev. A, 98(1):012315, 2018. ph/0512100, 2005. [145] Caio BD Goes, Askery Canabarro, Eduardo I Duzzioni, and [163] Le Phuc Thinh. Computing quantum bell inequalities. arXiv Thiago O Maciel. Automated machine learning can clas- preprint arXiv:1909.05472, 2019. sify bound entangled states with tomograms. arXiv preprint [164] Davide Poderini, Rafael Chaves, Iris Agresti, Gonzalo Car- arXiv:2001.08118, 2020. vacho, and Fabio Sciarrino. Exclusivity graph approach to [146] Steven Weinstein. Neural networks as" hidden" variable mod- instrumental inequalities. arXiv preprint arXiv:1909.09120, els for quantum systems. arXiv preprint arXiv:1807.03910, 2019. 2018. [165] Ayse Pinar Saygin, Ilyas Cicekli, and Varol Akman. Turing [147] Steven Weinstein. Nonlocality without nonlocality. Founda- test: 50 years later. Minds and machines, 10(4):463–518, tions of Physics, 39(8):921–936, 2009. 2000.