
arXiv:1904.08074v2 [cond-mat.quant-gas] 8 Nov 2019
Published as Meas. Sci. Technol. 31, 025201 (2020). DOI: 10.1088/1361-6501/ab44d8.

Deep Learning-Assisted Classification of Site-Resolved Quantum Gas Microscope Images

Lewis R. B. Picard, Manfred J. Mark, Francesca Ferlaino and Rick van Bijnen

1 Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA
2 Institut für Experimentalphysik und Zentrum für Quantenoptik, Universität Innsbruck, 6020 Innsbruck, Austria
3 Institut für Quantenoptik und Quanteninformation, Österreichische Akademie der Wissenschaften, 6020 Innsbruck, Austria
4 Center for Quantum Physics, University of Innsbruck, Austria

E-mail: [email protected]

Abstract. We present a novel method for the analysis of quantum gas microscope images, which uses deep learning to improve the fidelity with which lattice sites can be classified as occupied or unoccupied. Our method is especially suited to addressing the case of imaging without continuous cooling, in which the accuracy of existing threshold-based reconstruction methods is limited by atom motion and low photon counts. We devise two neural network architectures which are able to improve upon the fidelity of threshold-based methods, following training on large data sets of simulated images. We evaluate these methods on simulations of a noncooled erbium quantum gas microscope with free-space imaging, and of a system in which ultracold atoms are pinned in a deep lattice during imaging. In some conditions we see reductions of up to a factor of two in the reconstruction error rate, representing a significant step forward in our efforts to implement high-fidelity noncooled site-resolved imaging.

Keywords: Quantum gas microscope, Optical lattice, Fluorescence imaging, Ultracold atoms, Deep learning, Site-resolved imaging, Quantum simulation

1. Introduction

Over the past decade, site-resolved fluorescence imaging of atoms in optical lattices has become an essential tool for researchers working in ultracold atomic physics and quantum simulation [1]. The adoption of this powerful technique has been driven by improvements in both high-resolution imaging systems and computational techniques for identifying atoms separated by distances close to or below the diffraction-limited resolution [2, 3]. The task of site-resolved imaging consists of two distinct parts: 1) building an imaging system which is able to detect multiple fluorescence photons scattered by each atom in an optical lattice and 2) analyzing the recorded image in order to determine whether or not each lattice site is occupied by an atom. This is both an experimental challenge, constructing a high resolution microscope, and a computational one, devising an algorithm to reliably reconstruct the underlying lattice occupation from the recorded image. At present, the range of species that can be imaged remains limited by the need to continuously cool atoms during fluorescence imaging.

Figure 1. (a) Illustration of threshold-based reconstruction. A histogram of intensities following deconvolution at each site in a set of images is plotted. If sites are separated by more than or close to the diffraction-limited resolution, this will reveal a bimodal distribution of intensities. The threshold intensity used to classify a site is determined by the point at which the two peaks overlap. (b) Examples of simulated images of three-by-three erbium lattice segments, with a lattice constant of 266 nm and 1.5 µs illumination time. The superimposed red lines indicate the lattice site boundaries. Of the three images, only the center one has an occupied central lattice site.

In the vast majority of existing site-resolved imaging experiments, atoms are pinned in place by a deep lattice and continuously laser-cooled during imaging [3, 4, 5, 6, 7, 8, 9, 10]. In this case the distribution of bright pixels in a fluorescence image ideally results only from the point-spread function (PSF) of the imaging system. Imaging without cooling limits the number of photons which can be detected from each atom, which gets rapidly heated up and displaced from its original position by scattering of the imaging light. This heating reduces the fidelity of traditional threshold-based reconstruction methods. Here, we propose a novel method of analyzing fluorescence images of atoms in optical lattices using deep learning, in order to improve the performance of imaging without continuous cooling.

The most widely used method for reconstruction of the lattice occupation pattern in existing experiments requires first deconvoluting each image with the known PSF of the imaging system. This PSF can be determined experimentally by averaging raw images of many isolated atoms, or calculated based on known optical parameters of the imaging system [2, 3, 4, 5, 6, 7, 8, 9]. Deconvolution allows a single value of the light intensity to be determined for each lattice site. The distribution of light intensities will generally consist of two distinct peaks corresponding to occupied and unoccupied sites, as illustrated in figure 1(a). The degree of overlap of the histogram peaks is determined both by the background noise level and the overlap of point-spread functions of atoms on neighbouring sites. The bimodal distribution is eventually washed out entirely for high noise levels and/or for atom separations significantly below the width of the point spread function of the imaging system. Taking a large enough sample of lattice sites allows the estimation of the underlying distribution, from which a single threshold value can be derived which can be used to classify the occupation of all sites [4, 5, 6, 11]. Some variations on this basic method exist, such as determining the occupation by minimizing the difference between a real image and a reconstruction generated through convolution with the PSF [3], but the experimental requirements remain similar. More recent work on parametric deconvolution, described in [12], has shown that a more sophisticated model which uses knowledge of both the point spread function and the restricted geometry of the lattice can improve the discrimination of nearby atoms.

Without continuous cooling, atoms will be significantly heated during the imaging process. This heating occurs through the build-up of velocity kicks an atom receives each time it absorbs and re-emits a photon, eventually giving it enough kinetic energy to escape the potential well of a lattice site. Cooling and confinement by a deep pinning lattice allows the capture of images consisting of hundreds of scattered photons per atom, with reconstruction fidelity limited mainly by atom losses and hopping between lattice sites [13]. Implementing continuous cooling is, however, among the more experimentally challenging facets of a single-site imaging system. The requirement of a cooling transition which can simultaneously be used for imaging severely limits the range of species which can be imaged, and generally requires that a quantum gas microscope is custom-built for each new species.

As a result, the extension of single-site imaging to fermionic alkaline atoms came significantly later than boson imaging, requiring the implementation of more sophisticated cooling techniques, such as Raman sideband and EIT cooling [1, 5, 7]. These cooling techniques tend to increase experimental complexity, needing additional cooling beams, and, in the case of EIT cooling, may themselves introduce high levels of background light, which must then be reduced by other means, such as alternating cooling and imaging pulses in a single imaging cycle [6]. To our knowledge only one example of optical lattice imaging without cooling has been published at this time, which relies on confining Yb atoms in a deep lattice and using short imaging pulses to prevent losses due to heating [14]. Fluorescence imaging of single Li atoms in free flight has recently been achieved, but using this method multiple atoms can only be reliably resolved at a separation greater than 32 µm, precluding the study of short-scale many-body dynamics [15].

We propose a method for reconstructing optical lattice images to single-site resolution which does not require atoms to be confined to a lattice site during imaging. When atoms are neither continuously cooled nor pinned by a deep lattice, they will move away from their original lattice site on a random walk as they scatter photons from the imaging beam. High-resolution imaging without extra cooling and optical pinning will bring enormous experimental and conceptual simplification, and will be essential to the development of ultrafast microscopy. In this respect, atoms with strong optical transitions for imaging and large masses, such as lanthanides, are perfect candidates, and are a target of growing interest as many-body quantum systems in the community.

In the case of our planned Er microscope, the lattice will be switched off entirely during imaging, allowing the atoms to diffuse in free space. In other cases, such as the Yb lattice experiment that we simulate to assess our networks, the lattice potential is deepened during imaging to provide some confinement without cooling, such that atoms jump between lattice sites as they heat up [14]. The random motion of the atoms makes the reconstruction of the lattice occupation an intractable inverse problem, meaning that there is no way to exactly determine the most likely initial atom distribution which gave rise to a particular recorded image. It is nevertheless possible to approximate the atom as a fixed point emitter, with an effective PSF broadened by atom motion compared to the true optical PSF. This method may be sufficient when lattice spacings are large compared to the atom displacements, or when many photons are collected before the atoms move away from their starting positions. However, an additional restriction imposed by noncooled imaging is that the total photon count must be small, as only a few photons can be detected before atoms move too far to be distinguished from their neighbours, severely limiting the applicability of the stationary emitter approximation.

We suggest that deep neural networks provide a way to overcome some of the limitations of noncooled image reconstruction. The advantage of using deep learning for data analysis lies in the fact that a deep neural network can approximate non-linear relationships between elements of the input data. This is especially useful in the analysis of intractable inverse problems. In the past few years, machine learning has found an increasing number of applications in physics, particularly in classification problems [16]. Deep neural networks may offer advantages in both speed and accuracy over existing approximations, as has been demonstrated for a range of physical problems, including determining observable properties of particles in arbitrary 2D potentials [17], reading out trapped-ion qubits [18] and reconstructing the optical phase of imaging light at an objective from low-photon-count recorded images [19]. In other cases they may allow classification of experimental data for which no agreed-upon approximate model exists, which has led to their use in identifying phase transitions in quantum many-body systems [20, 21, 22] and evaluating theoretical models of interactions of fermions in an optical lattice [23]. Outside the realm of classification problems, recent work has focused on the rich field of unsupervised machine learning, in which models are trained with unlabelled data based on some metric internal to the data set, such as the degree to which different inputs can be divided into non-overlapping clusters [24]. Unsupervised learning has recently been demonstrated to be useful in quantum state tomography, where neural network states representing the amplitude and phase of a many-body quantum system are learned based on sets of measurements of its state in a range of bases [25].

The reconstruction procedure we describe here has been designed primarily to analyze images from our planned noncooled erbium quantum gas microscope [26], but is generally applicable to most cooled and noncooled imaging systems. To illustrate the task at hand, figure 1(b) shows some typical (simulated) example images that our method aims to classify. In the present paper we test two different deep learning classifiers of different levels of complexity, and compare their performance to a threshold-based reconstruction model.

2. Reconstruction Using Deep Learning

Deep neural networks are generally models that transform an input vector, in our case an array of pixels, into an output vector. In this case the output is a scalar value indicating whether or not a lattice site is occupied. Deep neural networks perform their function using a series of two or more consecutive transformations, each of which takes the output of the previous one as its input [27]. The transformations are said to connect different layers of the network, beginning with the input layer, consisting of a raw input vector, through to the final output layer. A hidden layer is one which lies between the input and the output, whose state is not read out to the user. The model as a whole is referred to as an artificial neural network, as its structure is inspired by, though not actually very similar to, biological neural networks [28]. Each element of a layer, usually a scalar number, can be referred to as a neuron.

The parameters of the network that define the precise mapping from one layer to the next can be learned by repeatedly evaluating the performance of the network on a set of test input vectors, and adjusting the parameters accordingly. A feedforward neural network, illustrated in figure 2, is among the simplest neural network architectures that exist. It consists of a series of layers, where each neuron in a layer is connected to every neuron of its neighbouring layers, and there are no intralayer connections. The action of the network on an input data vector is, in its most basic form, a series of matrix multiplications. Generally a bias vector is also added to the output of each layer, and a transfer function may also be applied to each output. Thus, the action of a single layer can be written as

y^(i) = f(W^(i) y^(i−1) + b^(i))    (1)

where y^(i−1) is an m-element input vector representing the neuron values of layer i − 1, y^(i) is an n-element output vector representing the neuron values of layer i, W^(i) is an n × m matrix, b^(i) is an n-element bias vector and f is an arbitrary transfer function applied element-wise to the intermediate value to give an output neuron state. The transfer function is often used to map scalar values back to the interval [0, 1]. The process of training a neural network broadly consists of adjusting weight matrices and bias vectors to optimize the output for a particular problem. The performance of a trained network can then be evaluated by measuring its generalization error, the rate at which it correctly classifies items in a previously unseen data set. In principle, a two-layer feedforward network is capable of learning any arbitrary relationship between elements of an input data vector [27]. In practice it is often difficult to train such a network, particularly when dealing with large input vectors, such as the high-magnification images of lattice segments we use to train our classifier.

Figure 2. Illustration of three-layer feedforward neural network architectures with all-to-all interlayer connections in N-1-1 (a) and N-M-1 (b) configurations, where N is the number of pixels in an input image and M is the number of neurons in a hidden layer.

Below, we discuss a number of neural network architectures with which we have experimented in order to classify lattice images. All of our neural networks are trained on large data sets of simulated images of three-by-three lattice site regions (see Appendix for discussion of the simulation). The reason for using three-by-three segments is that these are able to capture the first-order correlations between the brightness of a lattice site and its eight nearest neighbours while still being small enough that we can simulate training data sets in which every possible arrangement of atoms is represented. When the networks are applied to test images, these are first broken down into overlapping three-by-three segments, which are then individually fed into the network for classification of the central site of each segment.

2.1. Threshold reconstruction as a three-layer network

In order to better understand the process of neural network training and how it can be used to achieve improvements in fidelity, we first wish to trace a direct link between threshold reconstruction and some simple neural network architectures for which we can provide qualitative post-hoc interpretations [29]. To this end, we implement a basic form of threshold reconstruction in a form resembling a neural network, and compare it to an equivalent neural network trained on a data set of simulated images.

The simplest way to determine an intensity value for threshold reconstruction is to simply add up all the bright pixels in a lattice site. This could be trivially represented in the feedforward neural network form given in equation (1) through multiplication of the input by an m × 1 binary vector, with a one multiplying every pixel in the region to be summed and zeros everywhere else: y^(i) = w · y^(i−1). To improve the discrimination between photons from lattice atoms and noise counts, one can replace the simple sum by a weighted sum using a PSF centered on the lattice site being classified. The lattice spacing and alignment can be determined experimentally beforehand by various means, such as Fourier transforming a whole lattice image [8] or projecting images onto each axis of the imaging plane and fitting with a periodic series of Gaussians [2, 3], as we do in this work. The sum of pixels weighted by the PSF can then be expressed as a row-matrix multiplication linking the input layer and single-neuron hidden layer of a neural network. In other words, the matrix W^(i), for i = 1, in equation (1) is simply a row vector W^(1)_1j = PSF(x_j), with x_j the coordinates of the j-th pixel of the PSF. The transfer function applied at the hidden layer is f(x) = x. The transformation from the hidden layer to the single-neuron output consists of a scalar multiplication by a weight w followed by the addition of a bias b and application of the logistic-sigmoid transfer function f(y) = (1 + exp(−y))^−1, producing an output in the range 0 to 1, with 0 corresponding to an unoccupied central site and 1 corresponding to occupied. This layer performs the same role as the comparison of site intensity to a fixed threshold. In principle the above network could also be reduced to a two-layer network, but for later convenience we employed a three-layer format.
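As a concrete illustration, the N-1-1 classifier described above can be sketched in a few lines of numpy. The Gaussian stand-in for the PSF and the parameter values here are illustrative assumptions, not the experimentally measured quantities used in this work:

```python
import numpy as np

def gaussian_psf(shape, center, sigma):
    """Illustrative Gaussian stand-in for the experimentally measured PSF."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    return np.exp(-((xx - center[1])**2 + (yy - center[0])**2) / (2 * sigma**2))

def classify_site(image, psf, w, b):
    """N-1-1 network of figure 2(a): PSF-weighted pixel sum (single hidden
    neuron with identity transfer function), then a scaled and biased
    logistic-sigmoid output in (0, 1)."""
    hidden = float(np.dot(psf.ravel(), image.ravel()))  # f(x) = x at the hidden layer
    return 1.0 / (1.0 + np.exp(-(w * hidden + b)))      # logistic sigmoid at the output
```

An output near 1 corresponds to an occupied central site; the ratio b/w plays the role of the intensity threshold.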

By scanning the parameter w, the maximum possible fidelity of the weighted sum threshold reconstruction can be determined, as illustrated for the case of noncooled erbium atoms in figure 3.

Figure 3. Range of fidelities achievable using weighted sum threshold-based reconstruction by varying the mean pixel intensity threshold, which is equivalent to b/w in the neural network representation. Fidelities are evaluated on a data set of simulated images of unconfined erbium atoms in a lattice of period 266 nm, generated according to the procedure described in Appendix A. The inset shows the bimodal distribution of mean pixel intensities in the simulated data set. Prior to optimizing the threshold the pixels are weighted by a PSF centered on the lattice site, which improves the separation of the peaks in the intensity distribution. In both plots, the optimal threshold of 0.108 is indicated by a dashed red line.

We can gain some insight into the neural network training process by training a network using the optimal weighted sum threshold as our initialization condition. As a first step, we leave the network architecture fixed, but optimize the weight matrix W^(1) and weight w using conjugate gradient descent, training the network with a set of simulated images, after initializing W^(1) with the PSF reshaped to a row vector as described above. During the training process, the weights assigned to pixels in the input image and the classification threshold are adjusted so as to minimize the reconstruction error. Given that the hidden layer has only a single neuron, this still directly corresponds to a weighted sum of all the pixels in an input image. What we see, however, is that during the training procedure the neural network learns to negatively weight bright pixels in neighbouring lattice sites, and achieves a significant increase in fidelity as a result, as illustrated in figure 4. This tells us that without additional manual intervention on the part of researchers the network can learn to compensate for overlap of signals from filled lattice sites onto their neighbours. We also found that while manually initializing the network with the PSF allows it to reliably converge to a good classifier, using any random initialization generally does not converge to a good solution. This shows that though training even this simple neural network leads to an improvement over the manually optimized method, it remains very sensitive to user-defined initialization conditions, which are specific to each imaging system.

The weighted sum model alone does not represent the best available form of threshold reconstruction. Deconvolution, or equivalently fitting an image with a set of Gaussians centered on each lattice site, is the most widely used method, described in [9, 13, 14], among others. A threshold can then be applied to the fit amplitude of each Gaussian to assign the sites as occupied or unoccupied. The fit with a joint distribution of multiple Gaussians serves the purpose of discriminating between the signals produced by atoms on neighbouring lattice sites. An increased amplitude for a Gaussian centered on one site generally corresponds to a reduction of the amplitude on its neighbours, representing a reduced occupation probability.
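The threshold scan of figure 3 amounts to a one-dimensional search for the intensity threshold that maximizes classification fidelity on a labelled (simulated) data set. A minimal sketch, with made-up data, follows:

```python
import numpy as np

def best_threshold(intensities, labels, candidates):
    """Scan candidate thresholds and keep the one giving the highest
    classification fidelity (fraction of sites classified correctly).
    `intensities` are per-site deconvoluted or PSF-weighted values;
    `labels` are the true occupations of the simulated sites."""
    best_t, best_f = candidates[0], -1.0
    for t in candidates:
        fidelity = np.mean((intensities > t) == labels)
        if fidelity > best_f:
            best_t, best_f = t, fidelity
    return best_t, best_f
```

In the neural-network picture this scan varies b/w while keeping the PSF weights fixed; gradient-based training then refines the pixel weights themselves.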

Ideally, this method converges to the most likely distribution of lattice site occupations for the whole image. We implement this method for three-by-three lattice segments as the state-of-the-art benchmark against which we compare our machine learning methods. For the tight atom confinement and larger lattice spacings (≥ 512 nm) typical of existing quantum gas microscopes, threshold reconstruction remains highly effective [3, 4, 5]. We explore this regime by simulating imaging of erbium atoms in a two-dimensional square lattice with a spacing of 532 nm, under which conditions we see up to 99.9% threshold-based reconstruction fidelity. Threshold fidelity drops off as the lattice spacing is decreased and PSF overlap increases, however, and is closer to 97% in the 266 nm spacing system we aim to image.

In order to re-express the Gaussian fit reconstruction method as a three-layer feedforward network, we use a 512-neuron hidden layer. Each of the neurons in the hidden layer is connected to the input image in the same way as in the weighted sum method described above, but now the weight matrices correspond not just to a single Gaussian PSF on the central site, but to sums of Gaussians on each site in all of the 512 possible distributions of occupied and unoccupied lattice sites. The distributions with the greatest overlaps with the real image will then produce greater activations in the corresponding hidden neurons. The initial weights to the final layer are then a sum of all the hidden neuron values corresponding to an occupied central site, minus all those corresponding to an empty central site. This is effectively a majority vote among all the possible Gaussian fits as to whether the central site is occupied. The output is then normalized to provide a value in the range [0, 1]. This architecture is illustrated in figure 2(b). As always, the network is then trained to optimize fidelity from these initial conditions. In section 3 we refer to this network architecture by the name "Gaussian network" when we compare it to both the manual Gaussian fit and the more sophisticated deep convolutional network described below.

We find that the output of the feedforward neural network is itself a good estimator of the confidence of the result. For example, an output of 0.6 has an approximately 60% probability of genuinely corresponding to an occupied site. An output of 0.1 has a 90% chance of corresponding to an unoccupied site. This allows the classifier to be easily used for confidence-weighting or post-selection of experimental results.

2.2. Convolutional neural network reconstruction

The three-layer networks introduced above are based on the assumption that atoms act like point sources fixed at a lattice site. Without continuous cooling, however, atoms wander between sites, so ideally a model would be able to encompass the movement of atoms and distinguish between an atom originating at a central site and one which has been displaced there. In order to produce a model that takes into account more than simply how well a central site is fit by a static PSF, we turn to a more sophisticated network architecture known as a convolutional neural network.

A convolutional neural network works on the basis of convoluting an input image with a learned kernel, such that each neuron in a subsequent hidden layer corresponds to the convolution for a specific position of the kernel on the input.
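The kernel-convolution step just described can be written directly in numpy; this is a sketch under the simplifying assumptions of a single kernel and 'valid' boundary handling (real networks use several kernels per layer and learn their weights):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Each output neuron is the overlap of the (learned) kernel with one
    position on the input image; only fully overlapping positions kept."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out
```

Because the same kernel weights are reused at every position, a feature such as the PSF of a wandering atom is detected wherever it appears in the image.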

Rather than learning a single weight for every input pixel, as in a feedforward network, during training the convolutional network learns the weights of the kernel, which are then re-used for all the different subsections of the input. This is useful for identifying significant features which may occur at any position within a 2D image, such as the PSF of a freely wandering atom. In a realistic convolutional neural network architecture, multiple kernels are often used to identify different sets of features.

A deep convolutional neural network layers this process several times, learning one set of kernels for the input image, then another set with which to convolute the outputs of the first, and so on. While the first kernels tend to represent visibly recognizable features in the input, the subsequent layers are more abstracted, learning, for example, to identify correlations between different features identified by the first layer. A convolution operation is usually accompanied by normalization and application of a function such as a rectified linear unit, serving much the same function as the transfer functions in feedforward neural networks. Most deep convolutional networks also include pooling layers, which perform the function of producing a statistical summary of the outputs of a convolutional layer. Common pooling processes include taking the maximum or the average value of the convolution outputs in a given region. Pooling can also perform the function of dimensionality reduction; if the overlap of pooling regions is reduced, the number of output neurons will be smaller than the number of inputs. This reduces the complexity of the next convolutional stage, increasing training speed and reducing memory usage. The number of pixels between each step of the pooling filter is known as the stride. Finally, a convolutional network will usually finish with a fully connected layer, such as those illustrated in figure 2, which produces an output of a fixed size for a classification or regression task.

Figure 4. (a) Point spread function determined by averaging simulated images of 10000 isolated Er atoms, with a 3 µs imaging pulse, used to weight pixels in an input image for threshold-based reconstruction. (b) Learned pixel weights after training the three-layer network in figure 2(a) on a data set of 102400 distinct images, using the PSF as the initial state of the input layer. Without any additional human input, the network learns to assign a negative weight to bright pixels in the neighbouring sites of the central atom. This example illustrates the ability of even very simple neural networks to learn approximate models of the correlations between neighbouring lattice sites.

In this work, we use a network with three convolutional layers and two average pooling layers, followed by a fully connected classification layer.
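The average-pooling stage and the rectified linear unit described above can be sketched as follows; the window size and stride values in the test are illustrative, not the optimized values reported in Appendix B:

```python
import numpy as np

def relu(x):
    """Rectified linear unit applied after a convolution stage."""
    return np.maximum(0.0, x)

def avg_pool(feature_map, size, stride):
    """Average pooling: summarize each size-by-size window by its mean,
    stepping the window by `stride` pixels. A stride larger than the
    window overlap reduces the number of output neurons."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i*stride:i*stride+size,
                                    j*stride:j*stride+size].mean()
    return out
```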
This network architecture is illustrated in figure 5, along with visual representations of the features learned by the network trained on simulated lattice images. By testing a range of network parameters, we find we can achieve optimal performance with a convolution kernel size of 10-by-10 pixels, corresponding to the size of a single lattice site in our training images. We also optimize the size and stride of the pooling layers, and the number of training images, all of which is detailed in Appendix B.

3. Evaluating Classifier Performance

We evaluate the performance of both the Gaussian and convolutional networks introduced in the previous section in a range of different simulated experimental conditions, and compare them against the benchmark of Gaussian fit amplitude threshold-based reconstruction. The neural network classifiers are extremely flexible, and can be applied to the analysis of any two-dimensional lattice images, provided the imaging system is understood well enough to simulate the imaging of three-by-three lattice segments to generate labelled training data. Further details of our simulation of the imaging system are provided in the Appendix. The performance metric we use is the reconstruction fidelity across the whole lattice, i.e., the percentage of sites which are correctly classified when the reconstruction method is used to assign every site in a previously unseen lattice image. Depending on the particular experimental context in which these methods are applied, other performance metrics could be more appropriate. In investigations of Mott-insulating behaviour, for example, the rate at which a classifier correctly identifies holes in an otherwise uniformly filled lattice could be a more useful metric [13, 23].

3.1. Noncooled erbium lattice

We first test our model on the challenging case of noncooled and unpinned ultracold atoms. As a species of interest we choose erbium, which is a highly magnetic lanthanide atom that has recently been brought to quantum degeneracy [31, 32]. We simulate the following experimental conditions: prior to imaging, Er atoms are held in a three-dimensional optical lattice with a typical spacing of 266 nm. The lattice is then switched off and the atoms are illuminated with a resonant light pulse of 1.5 µs. The atomic fluorescence is projected onto a CCD camera by our imaging system with a numerical aperture (NA) of 0.89. The imaging light operates on the 401 nm transition, for which we predict a maximum scattering rate, limited by the transition's natural linewidth, of 9.5 × 10^7 s^−1. With an imaging beam intensity ∼10 times higher than the saturation intensity of the transition, we expect to collect less than 90 photons per atom in a single image. Given this relatively small number of collected photons, we can reliably assume that a negligible number of pixels will be multiply illuminated, allowing us to binarize our images, facilitating the convergence of neural network training. For cases where the magnification is small enough compared to the lattice spacing that multiple illumination of pixels is likely, we have devised an alternative normalization function, given in equation A.1 in the appendix, to map the input to the range [0, 1].

We test the convolutional network, Gaussian network and threshold reconstructions on previously unseen simulated images of entire lattices, which are divided into overlapping three-by-three site blocks for input to the networks. In figure 6, the fidelities of the various methods for a range of site occupation

[Figure 5 diagram: Input Image → (Convolution → Batch Normalization → Rectified Linear Unit (ReLU) → Average Pooling) × 3 → Fully Connected Layer → Softmax Layer → Classification Output]

Figure 5. Illustration of the convolutional neural network architecture used in the present work. The images are a sample of the features learned at each layer of the network. These are created using a version of the deepDream algorithm in MATLAB [30], shown as a grid of artificial images which most strongly activate those features.
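To make the evaluation procedure concrete, the following NumPy sketch assembles a toy version of the pipeline: a simulated lattice image is normalized (here with the tanh rescaling of equation A.1), divided into overlapping three-by-three-site blocks, the central site of each block is classified, and the reconstruction fidelity is computed. The site size, pixel values, and the simple brightness-comparison stand-in classifier are illustrative assumptions, not the trained networks of this work:

```python
import numpy as np

SITE_PX = 10  # assumed pixels per lattice site (10x10-pixel sites, as in the text)

def normalize(x):
    """Equation (A.1): map pixel values into {0,1} via a tanh rescaling,
    with mu_bright the mean of the nonzero elements of x."""
    mu_bright = x[x > 0].mean()
    return np.tanh(np.arctanh(0.5) * x / mu_bright)

def central_site_blocks(image, n_sites):
    """Yield each overlapping three-by-three-site block with the (row, col)
    index of its central site."""
    for r in range(n_sites - 2):
        for c in range(n_sites - 2):
            yield (r + 1, c + 1), image[r * SITE_PX:(r + 3) * SITE_PX,
                                        c * SITE_PX:(c + 3) * SITE_PX]

def reconstruction_fidelity(predicted, true):
    """Percentage of lattice sites classified correctly."""
    return 100.0 * np.mean(predicted == true)

def toy_classifier(block):
    """Stand-in classifier (NOT a trained network): the central site is called
    occupied if its own 10x10-pixel patch is brighter than the block average."""
    centre = block[SITE_PX:2 * SITE_PX, SITE_PX:2 * SITE_PX]
    return centre.mean() > block.mean()

rng = np.random.default_rng(1)
n = 10                                             # ten-by-ten site test lattice
occupation = rng.random((n, n)) < 0.5              # half filling
image = np.kron(occupation.astype(float), np.full((SITE_PX, SITE_PX), 5.0))
image = image + rng.poisson(0.2, image.shape)      # background counts
image = normalize(image)

predicted = occupation.copy()                      # edge sites kept at truth here
for (r, c), block in central_site_blocks(image, n):
    predicted[r, c] = toy_classifier(block)
print(reconstruction_fidelity(predicted, occupation))
```

In the actual analysis, each block would instead be fed to the trained Gaussian or convolutional network, and edge sites would be handled by padding rather than excluded.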

densities at 266 nm spacing, from a sparsely filled to an almost completely filled lattice, are shown. In the maximum-uncertainty case of half filling, the error rate is reduced from 2.03% for the Gaussian fit threshold method to 1.80% for the convolutional network. For sparse filling the error rate of the convolutional network is just 0.16%, while that of the threshold method is 0.39%, a more than twofold improvement. As the filling increases, the performance of all methods decreases as a result of the reduced distinguishability of individual occupied and unoccupied sites, even as the overall entropy of the entire lattice configuration decreases. We note that at high filling the 512-hidden-neuron Gaussian network performs particularly well, better than both the convolutional network and the threshold reconstruction, though we have no clear interpretation for this boost in performance.

Figure 6. Fidelities of three reconstruction methods, for various lattice filling fractions. From left to right, the methods are: convolutional network, three-layer Gaussian network and threshold reconstruction. All test images are of unconfined erbium atoms at 266 nm spacing and 1.5 µs illumination time. For each filling fraction we simulate a ten-by-ten site lattice of a given filling, which we break up into overlapping three-by-three segments for fitting.

As we increase the lattice period, reconstruction performance increases rapidly. The convolutional network achieves as high as 99.90% reconstruction fidelity at a spacing of 532 nm and half-filling of the lattice. Threshold-based reconstruction in these conditions provides an average fidelity of 99.83%, indicating that the neural network continues to provide a small but significant advantage at high spacing. The convolutional network fidelity of 99.9% is maintained at 0.1 lattice filling fraction, dropping only slightly to 99.5% at 0.9 filling. This fidelity is achieved despite expected atom losses of ∼3% during imaging, caused by atoms escaping the not fully closed imaging transition cycle. That is, the network is able to reliably identify most lost atoms even from the small number of photons they scatter prior to loss.

We also use our simulation to estimate the imaging pulse time which maximizes fidelity. Figure 7 shows how the fidelity of threshold, three-layer and convolutional reconstruction changes with imaging pulse time. Simulations suggest that the highest reconstruction fidelity can be achieved for a 1.5 µs imaging pulse. It is assumed that at this timescale background light is not a significant contributor to image noise, so the noise level is taken to be constant over all pulse lengths. We observe that the performance of the threshold-based reconstruction drops off more rapidly with increased imaging time than the neural network methods, while the performances of the Gaussian and convolutional networks appear to converge. The simulations in figure 7 are conducted at half-filling of the lattice. We find that the fidelity drops off more sharply as imaging time is increased for dense filling of the lattice, though it plateaus at the filling percentage for greater than half-filling, corresponding to the error rate incurred by assigning all sites as occupied.

Figure 7. Reconstruction fidelities using convolutional network, three-layer Gaussian network and Gaussian fit threshold reconstruction for noncooled erbium lattices at a range of imaging pulse lengths, with a 266 nm lattice spacing and half-filling. The greatest fidelity is expected for a 1.5 µs imaging pulse.

3.2. Noncooled ytterbium in pinning lattice

We subsequently seek to evaluate our reconstruction technique for the case of noncooled imaging in which atoms are nevertheless confined in a deep lattice during imaging. We use as our guideline the first known successful implementation of this scheme, performed by Miranda et al. [14] using Yb atoms in a lattice of period 543.5 nm. As in the case of our unconfined Er lattice, the Yb atoms will be heated during imaging, eventually displacing them from their original lattice site. This means that both systems require relatively short imaging pulses with high scattering rates. The addition of a pinning lattice, however, causes the atoms to remain confined in a smaller region, and for a longer period of time, before their eventual loss. The steep potential gradient at the nodes of the lattice also drives atoms away from these regions, reducing the photon density between sites compared to the unconfined case. In the experiment, Yb

atoms are imaged on the ¹S₀–¹P₁ transition at 399 nm during a 40 µs pulse, while confined in a lattice of depth 150 µK. With a scattering rate of 1.3 × 10⁷ Hz, each atom scatters an average of 520 photons per imaging pulse [33], of which 6.6% are detected by the camera. The combined loss and hopping rate is 2.5% per pulse. In our simulation we achieve a threshold-based fidelity of 98.6% and a fidelity using the convolutional classifier of 98.8%, representing a small but consistent reduction in the error rate. Figure 8 shows an example of a simulated image of a 15-by-15 lattice, with occupied sites identified by a trained convolutional network and labelled.

Figure 8. Identification of occupied sites in a simulated 15-by-15 lattice of Yb atoms, with pinning but without cooling, and 40% filling of the lattice. Sites classified as occupied are identified by white circles, and incorrectly classified sites are marked with red crosses. Three of the 225 sites in this particular image were misclassified, corresponding to 98.8% fidelity, consistent with the average fidelity achieved on a larger test set. While images were binarized prior to input to the network, setting the value of each nonzero pixel to 1, we display a normalized unbinarized image here.

4. Conclusion

The extension of site-resolved imaging of optical lattices to noncooled atoms will be a step forward in the flexibility of quantum gas microscopy. We have demonstrated the effectiveness of using both feedforward and convolutional neural networks for the analysis of noncooled lattice images, where low photon counts and atom movement limit the fidelity of traditional reconstruction techniques. We have shown that reconstruction is viable for completely unconfined erbium atoms, for which we can reduce the error rate by as much as half compared to state-of-the-art threshold reconstruction. We have also shown that the convolutional neural networks are able to perform consistently as well as or better than threshold-based reconstruction for trapped atoms, using the test case of pinned ytterbium atoms. The neural networks designed for this task are flexible, and can be applied to any imaging system which can be sufficiently well simulated to produce large labelled data sets to train the network. This reconstruction technique can be trivially extended to continuously cooled imaging systems, where it may prove advantageous in cases where atoms are separated by much less than the diffraction limit of the imaging system and only a small number of photons can be collected.

Acknowledgments

We wish to thank M. Greiner and the whole erbium team at Harvard for fruitful discussions of the challenge of erbium imaging, and M. Miranda for input relating to the simulation of noncooled ytterbium imaging. We also wish to thank the anonymous referees for their highly productive feedback which furthered the development of this work. The neural networks presented were trained using the HPC infrastructure LEO of the University of Innsbruck. This work is financially supported through an ERC Consolidator Grant (RARE, no. 681432), a DFG/FWF grant (FOR 2247/PI2790), the SFB FoQuS (FWF Project No. F4016-N23), an NFRI Grant (MIRARE, No. ÖAW0600) from the Austrian Academy of Sciences and the Quantum Flagship PASQuanS (Grant no. 817482).

Appendix A. Imaging Simulation

Training our neural networks requires large labelled data sets of lattice images, which cannot feasibly be constructed using experimental data. As a result we use simulations of the imaging process in order to generate data sets of realistic images from arbitrary underlying lattice occupation patterns. The networks are trained on images of three-by-three site lattice segments, where only the central site is classified as occupied or unoccupied. Training data sets are made up of an equal number of simulated images of each of the 512 possible permutations of occupied and unoccupied sites in the three-by-three lattice. During classification of real images, the entire image will be divided up into overlapping three-by-three segments which are fed individually into the classifier network.

The simulation models the stochastic processes of photon scattering and atom movement which determine the image recorded by a quantum gas microscope. Atoms are assumed to begin at the center of each lattice site with zero velocity. We simulate scattering events in which photons are absorbed from four imaging beams aligned in the imaging plane and re-emitted in a random direction over the full solid angle (4π), creating a discrete velocity kick with the corresponding recoil momentum at each event. If there is a lattice potential switched on during imaging, the acceleration and velocity of the atom are updated according to the velocity Verlet algorithm. The lattice potential is assumed to be a symmetric sin²(x) potential with an amplitude (trap depth) given as an input parameter to the simulation. Emitted photons are recorded by the camera with a probability given by the collection efficiency, which is determined by the geometry of the imaging system, overall losses due to absorption, and the quantum efficiency of the camera. Each photon is detected at a random position around the location of the atom itself, with a probability distribution determined by a point spread function centered on the atom. In the initial phase of the simulation, each photon detection is represented by unity addition to the illuminated pixel in the simulated image. The scattering code is looped with randomized, exponentially distributed timesteps between absorption and re-emission, with the natural linewidth as input parameter, leading to an effective scattering rate of about half the natural linewidth, as expected. The imaging process is concluded when the total elapsed time exceeds a given imaging pulse time or when the atom escapes the not fully closed transition cycle, accounted for by a small finite loss rate evaluated at each scattering event.

Figure A1. A set of three random walks for unconfined erbium atoms imaged with 401 nm light and illumination time 3 µs. The atom trajectories are marked by solid lines, and the photon detection positions are marked by circles of the same color as their source atoms. Note that approximately six times more photons are scattered than detected here.
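The scattering loop described above can be sketched as a simple Monte Carlo in NumPy. The model here is deliberately reduced, and all numerical parameters are illustrative stand-ins rather than the paper's erbium values: exponential waiting times at roughly half the natural scattering rate, one recoil kick per event in a random in-plane direction, no lattice potential (the unconfined case), and detected photons drawn around the atom's current position with a Gaussian stand-in for the point spread function:

```python
import numpy as np

# Illustrative parameters (order-of-magnitude stand-ins, not the paper's values)
GAMMA = 2 * np.pi * 30e6   # natural linewidth of a broad transition, s^-1
V_RECOIL = 0.01            # recoil velocity per scattered photon, m/s (assumed)
T_PULSE = 1.5e-6           # imaging pulse length, s
P_COLLECT = 0.2            # photon collection efficiency (assumed)
PSF_SIGMA = 100e-9         # point-spread-function width, m (assumed)

def image_atom(rng):
    """Random walk of one free atom under photon recoil: exponential waiting
    times at an effective rate of about half the linewidth, a recoil kick per
    scattering event, and detected photons blurred by the PSF."""
    pos = np.zeros(2)            # atom starts at the site centre...
    vel = np.zeros(2)            # ...with zero velocity
    t, detections = 0.0, []
    while True:
        dt = rng.exponential(2.0 / GAMMA)   # effective rate ~ GAMMA / 2
        if t + dt > T_PULSE:
            break
        t += dt
        pos = pos + vel * dt                 # free flight between events
        theta = rng.uniform(0, 2 * np.pi)    # random re-emission direction
        vel = vel + V_RECOIL * np.array([np.cos(theta), np.sin(theta)])
        if rng.random() < P_COLLECT:         # photon reaches the camera?
            detections.append(pos + rng.normal(0, PSF_SIGMA, 2))
    return pos, detections

rng = np.random.default_rng(2)
final_pos, detections = image_atom(rng)
print(len(detections))
```

The full simulation additionally includes the directed absorption kicks from the four imaging beams, the optional pinning-lattice force via the velocity Verlet update, the finite loss rate per scattering event, and the camera-noise model described below.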

Over the course of an imaging pulse, the accumulation of velocity kicks heats the atom and causes it to move on a random walk away from its initial position. Some example random walks are illustrated in figure A1.

After looping over the imaging time for all atoms, Poissonian noise is added to each pixel of the image to account for clock-induced charges, with a mean noise value per pixel estimated from state-of-the-art EMCCD cameras. We also add an overlay of bright pixels consisting of the leak light from a random configuration of next-nearest neighbors to each image. Only 1000 such overlays are generated, as opposed to every five-by-five configuration, and they are randomly added to all images in the training data set. Finally, the multiplier gain from EMCCD cameras is applied to every pixel to calculate how many electrons per pixel will be present [34]. The final conversion step into counts per pixel, requiring multiplication with a constant factor, adding a constant offset and including the electronic readout noise, was omitted in the present analysis. We also did not include further effects like additional charges due to background light or dark current, as they should be negligible under the assumed experimental conditions.

Finally, we implement a preprocessing step, normalizing the data before feeding it to the neural network for analysis. In the case of images with a low recorded photon count, where each pixel is very unlikely to be doubly illuminated, preprocessing consists of binarizing the images by setting the value of each illuminated pixel to 1 and all others to 0. For images with a higher photon count in which doubly illuminated pixels are likely to occur, pixels are normalized to the range {0,1} according to the formula

x_norm = tanh( tanh⁻¹(0.5) · x / µ_bright )    (A.1)

where x is an image, or batch of images concatenated to form a single vector, and µ_bright is the mean value of all the nonzero elements of x.

Appendix B. Optimizing network hyperparameters

Hyperparameters are the parameters of the network which are not updated during training. Hyperparameters can be individually set by the architect of a neural network, or determined through a hyperoptimization process whereby multiple networks with different hyperparameters are separately trained and their performance compared to select the optimal hyperparameter values.

Aside from network architecture, the most significant hyperparameter in our case is the size of the training data set. We use data sets composed of equal numbers of simulated images generated from each of the 512 possible distributions of atoms in a three-by-three lattice segment. We trained both three-layer and convolutional networks on data sets consisting of between 10³ and 1.5 × 10⁵ individual images of erbium lattice segments with 266 nm lattice spacing. As can be seen in figure B1, the generalization error of the convolutional network is minimized for 1.28 × 10⁵ images, corresponding to 250 images for each possible distribution of atoms. The error of the three-layer network also generally decreased, though its error is less consistent between different datasets due to the difficulty of reliably converging to a good local minimum without prior dimensionality reduction. As the unconfined erbium atoms at 266 nm spacing represent the most difficult test case for our networks, it can be assumed that other cases would not need any larger training sets.

Figure B1. Convergence of convolutional neural network classification fidelity with increasing size of the training data set. All data sets are composed of a given number of repetitions of each possible occupation pattern of a three-by-three set of lattice sites, for erbium imaged at 401 nm for 3 µs.

For the convolutional network, we also need to optimize a number of parameters for each convolutional and pooling layer. As described in the text, we find a kernel size of 10-by-10 pixels for all layers gives us our best performance. We use a progressively increasing number of filters in each convolutional level, beginning with 20 in the first layer followed by 40 in the second and 100 in the final layer. For the pooling layers, we find that our best performance is achieved for a pooling region of 5-by-5 pixels with a stride of 2.

References

[1] Christian Gross and Immanuel Bloch. Quantum simulations with ultracold atoms in optical lattices. Science, 357(6355):995–1001, September 2017.
[2] M. Karski, L. Förster, J. M. Choi, W. Alt, A. Widera, and D. Meschede. Nearest-Neighbor Detection of Atoms in a 1d Optical Lattice by Fluorescence Imaging. Phys. Rev. Lett., 102(5):053001, February 2009.
[3] Jacob F. Sherson, Christof Weitenberg, Manuel Endres, Marc Cheneau, Immanuel Bloch, and Stefan Kuhr. Single-atom-resolved fluorescence imaging of an atomic Mott insulator. Nature, 467(7311):68, September 2010.
[4] Waseem S. Bakr, Jonathon I. Gillen, Amy Peng, Simon Fölling, and Markus Greiner. A quantum gas microscope for detecting single atoms in a Hubbard-regime optical lattice. Nature, 462(7269):74, November 2009.
[5] Lawrence W. Cheuk, Matthew A. Nichols, Melih Okan, Thomas Gersdorf, Vinay V. Ramasesh, Waseem S. Bakr, Thomas Lompe, and Martin W. Zwierlein. Quantum-Gas Microscope for Fermionic Atoms. Phys. Rev. Lett., 114(19):193001, May 2015.
[6] G. J. A. Edge, R. Anderson, D. Jervis, D. C. McKay, R. Day, S. Trotzky, and J. H. Thywissen. Imaging and addressing of individual fermionic atoms in an optical lattice. Phys. Rev. A, 92(6):063406, December 2015.
[7] Elmar Haller, James Hudson, Andrew Kelly, Dylan A. Cotta, Bruno Peaudecerf, Graham D. Bruce, and Stefan Kuhr. Single-atom imaging of fermions in a quantum-gas microscope. Nat. Phys., 11(9):738, September 2015.
[8] Ahmed Omran, Martin Boll, Timon A. Hilker, Katharina Kleinlein, Guillaume Salomon, Immanuel Bloch, and Christian Gross. Microscopic Observation of Pauli Blocking in Degenerate Fermionic Lattice Gases. Phys. Rev. Lett., 115(26):263001, December 2015.
[9] Maxwell F. Parsons, Florian Huber, Anton Mazurenko, Christie S. Chiu, Widagdo Setiawan, Katherine Wooley-Brown, Sebastian Blatt, and Markus Greiner. Site-Resolved Imaging of Fermionic ⁶Li in an Optical Lattice. Phys. Rev. Lett., 114(21):213002, May 2015.
[10] Debayan Mitra, Peter T. Brown, Elmer Guardado-Sanchez, Stanimir S. Kondov,

Trithep Devakul, David A. Huse, Peter Schauß, and Waseem S. Bakr. Quantum gas microscopy of an attractive Fermi–Hubbard system. Nat. Phys., 14(2):173–177, February 2018.
[11] Karl D. Nelson, Xiao Li, and David S. Weiss. Imaging single atoms in a three-dimensional array. Nat. Phys., 3(8):556–560, August 2007.
[12] Andrea Alberti, Carsten Robens, Wolfgang Alt, Stefan Brakhane, Michal Karski, René Reimann, Artur Widera, and Dieter Meschede. Super-resolution microscopy of single atoms in optical lattices. New J. Phys., 18(5):053010, May 2016.
[13] Daniel Greif, Maxwell F. Parsons, Anton Mazurenko, Christie S. Chiu, Sebastian Blatt, Florian Huber, Geoffrey Ji, and Markus Greiner. Site-resolved imaging of a fermionic Mott insulator. Science, 351(6276):953–957, February 2016.
[14] Martin Miranda, Ryotaro Inoue, Naoki Tambo, and Mikio Kozuma. Site-resolved imaging of a bosonic Mott insulator using ytterbium atoms. Phys. Rev. A, 96(4):043626, October 2017.
[15] Andrea Bergschneider, Vincent M. Klinkhamer, Jan Hendrik Becher, Ralf Klemt, Gerhard Zürn, Philipp M. Preiss, and Selim Jochim. Spin-resolved single-atom imaging of ⁶Li in free space. Phys. Rev. A, 97(6):063613, June 2018.
[16] Lenka Zdeborová. Machine learning: New tool in the box. Nat. Phys., 13:420, February 2017.
[17] Kyle Mills, Michael Spanner, and Isaac Tamblyn. Deep learning and the Schrödinger equation. Phys. Rev. A, 96(4):033402, October 2017.
[18] Alireza Seif, Kevin A. Landsman, Norbert M. Linke, Caroline Figgatt, C. Monroe, and Mohammad Hafezi. Machine learning assisted readout of trapped-ion qubits. J. Phys. B: At. Mol. Opt. Phys., 51(17):174006, August 2018.
[19] Alexandre Goy, Kwabena Arthur, Shuai Li, and George Barbastathis. Low Photon Count Phase Retrieval Using Deep Learning. Phys. Rev. Lett., 121(24):243902, December 2018.
[20] Kelvin Ch'ng, Juan Carrasquilla, Roger G. Melko, and Ehsan Khatami. Machine Learning Phases of Strongly Correlated Fermions. Phys. Rev. X, 7(3):031038, August 2017.
[21] Juan Carrasquilla and Roger G. Melko. Machine learning phases of matter. Nat. Phys., 13(5):431, May 2017.
[22] Benno S. Rem, Niklas Käming, Matthias Tarnowski, Luca Asteria, Nick Fläschner, Christoph Becker, Klaus Sengstock, and Christof Weitenberg. Identifying quantum phase transitions using artificial neural networks on experimental data. Nat. Phys., 15:917, July 2019.
[23] Annabelle Bohrdt, Christie S. Chiu, Geoffrey Ji, Muqing Xu, Daniel Greif, Markus Greiner, Eugene Demler, Fabian Grusdt, and Michael Knap. Classifying snapshots of the doped Hubbard model with machine learning. Nat. Phys., 15:921, July 2019.
[24] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová. Machine learning and the physical sciences. arXiv:1903.10563, March 2019.
[25] Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, Matthias Troyer, Roger Melko, and Giuseppe Carleo. Neural-network quantum state tomography. Nat. Phys., 14(5):447, May 2018.
[26] P. Ilzhöfer, G. Durastante, A. Patscheider, A. Trautmann, M. J. Mark, and F. Ferlaino. Two-species five-beam magneto-optical trap for erbium and dysprosium. Phys. Rev. A, 97(2):023633, February 2018.
[27] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[28] Yoshua Bengio, Dong-Hyun Lee, Jorg Bornschein, Thomas Mesnard, and Zhouhan Lin. Towards Biologically Plausible Deep Learning. arXiv:1502.04156, February 2015.
[29] Zachary C. Lipton. The mythos of model interpretability. arXiv:1606.03490, June 2016.
[30] MathWorks 2019. deepDreamImage. MATLAB R2019a documentation. [https://www.mathworks.com/help/deeplearning/ref/deepdreamimage.html] (Accessed August 2019).
[31] K. Aikawa, A. Frisch, M. Mark, S. Baier, A. Rietzler, R. Grimm, and F. Ferlaino. Bose-Einstein Condensation of Erbium. Phys. Rev. Lett., 108(21):210401, May 2012.
[32] K. Aikawa, A. Frisch, M. Mark, S. Baier, R. Grimm, and F. Ferlaino. Reaching Fermi Degeneracy via Universal Dipolar Scattering. Phys. Rev. Lett., 112(1):010404, January 2014.
[33] Martin Miranda, Ryotaro Inoue, Yuki Okuyama, Akimasa Nakamoto, and Mikio Kozuma. Site-resolved imaging of ytterbium atoms in a two-dimensional optical lattice. Phys. Rev. A, 91(6):063414, June 2015.
[34] Robert N. Tubbs. Lucky Exposures: Diffraction Limited Astronomical Imaging Through the Atmosphere. Doctoral thesis, Cambridge University, Cambridge, September 2003.