
Joint Quantum-State and Measurement Tomography with Incomplete Measurements

Adam C. Keith,1, 2, ∗ Charles H. Baldwin,1 Scott Glancy,1, † and E. Knill1, 3

1 Applied and Computational Mathematics Division, National Institute of Standards and Technology, Boulder, Colorado 80305, USA
2 Department of Physics, University of Colorado, Boulder, Colorado 80309, USA
3 Center for Theory of Quantum Matter, University of Colorado, Boulder, Colorado 80309, USA

(Dated: October 17, 2018)

Estimation of quantum states and measurements is crucial for the implementation of quantum information protocols. The standard method for each is quantum tomography. However, quantum tomography suffers from systematic errors caused by imperfect knowledge of the system. We present a procedure to simultaneously characterize quantum states and measurements that mitigates systematic errors by use of a single high-fidelity state preparation and a limited set of high-fidelity unitary operations. Such states and operations are typical of many state-of-the-art systems. For this situation we design a set of experiments and an optimization algorithm that alternates between maximizing the likelihood with respect to the states and the measurements to produce estimates of each. In some cases, the procedure does not enable unique estimation of the states. For these cases, we show how one may identify a set of density matrices compatible with the measurements and use a semi-definite program to place bounds on the state's expectation values. We demonstrate the procedure on data from a simulated experiment with two trapped ions.

PACS numbers: 03.65.Wj, 03.67.-a, 37.10.Ty

I. INTRODUCTION A drawback to standard QST and QDT is that they require well-known measurements and state preparations Recent experiments have demonstrated high-fidelity respectively. However, it is difficult to produce such mea- unitary operations in various platforms for quantum in- surements and state preparations when some processes formation processing; for examples see Refs. [1–6]. Even have significantly lower fidelity than others. Attempting in the most advanced systems, some operations are standard QT in this case typically results in estimates harder to accomplish and have significantly lower fidelity with systematic errors. than others. For example, in the systems reported in To combat systematic errors, we adapt standard QST Refs. [1–6] single- gates are accomplished with sig- and QDT to the situation where a single state prepara- nificantly higher fidelity than two-qubit gates. A natural tion and a limited set of unitary operations (for example, question is then, how can we use the high-fidelity opera- single-qubit rotations) have significantly higher fidelity tions to diagnose other parts of the quantum system? In than other processes and measurements. For this situ- this work, we propose such a procedure that uses a single ation, we develop a procedure to estimate other states high-fidelity state initialization and a limited set of uni- and the measurement operators simultaneously with an tary operations to diagnose other state preparations and alternating maximum likelihood estimation (MLE) algo- measurement operators. rithm. When the measurements are not informationally The standard method to diagnose state preparations complete the state is non-identifiable. However, it is “set and measurements is quantum tomography (QT). Quan- identifiable,” meaning that we can specify a set of den- tum state tomography (QST) is a procedure to estimate sity matrices that are compatible with the measurements. an unknown from experimental data. This set can be used to upper and lower bound impor- When the measurements are informationally complete, tant quantities like fidelity of the state preparations or the resulting data can be used to estimate the corre- expectation values of other . The estimates sponding density [7, 8], and we say that the state also provide information about the quantum processes is “identifiable.” Quantum detector tomography (QDT) that produced the unknown states. is a procedure to estimate an unknown quantum mea- Systems with a limited set of high-fidelity processes are surement [9, 10]. When the unknown measure- common in state-of-the-art quantum information exper- ment is applied to an informationally complete set of al- iments. Our original motivation comes from trapped-ion ready known quantum states, the resulting data can be experiments where single-qubit gates have been demon- used to create an estimate of the measurement operator, strated with high fidelity, but entangling gates have lower and the measurements are identifiable. fidelities. The procedure described here was implemented in trapped-ion experiments described in Refs. [1, 11]. We return to the trapped-ion example throughout this paper to illustrate our procedure. Trapped ions are measured ∗ [email protected] by observing the presence or absence fluorescence pro- † [email protected] duced by ions in the “bright” or “dark” computational 2 states. A measurement datum is the number of that describes the state of a quantum system. To esti- detected fluorescence photons. 
To reduce computational mate ρ, we prepare many identical copies of the quantum complexity, we coarse-grain the measurement outcomes state and apply a known quantum measurement to each. with a strategy that maximizes mutual information be- In the following, a sequence of identical state prepara- tween the raw and coarse-grained data. Our strategy can tions and measurements is called an “experiment”, while be used for other qubit systems measured by fluorescence a particular state preparation and measurement in an or even systems that give continuous measurement out- experiment is referred to as a “trial.” A quantum mea- comes, like superconducting [12]. surement is associated with the measurement operators P 1 QT with unknown states and measurements has been Fb of a POVM {Fb}b, where Fb ≥ 0 and b Fb = . The considered in previous work [13, 14]. Our procedure dif- of outcome b in a trial is given by the Born fers from these works by using alternating MLE, which rule pb = Tr (Fbρ). In some instances, several different yields estimates consistent with all the data collected. are used for QST; in these cases we perform sep- Our procedure also resembles other variants of QT, arate experiments for each POVM. The QST formalism such as self-consistent [15, 16] and gate-set tomography assumes that uncertainty about the measurement oper- (GST) [17]. These methods treat the quantum system as ators of the POVMs is negligible. If the measurement a “black-box” about which nothing (or almost nothing) is operators from all experiments span the bounded oper- known. Notably, in GST one has access to a collection of ators on the , we call the set of POVMs unknown quantum processes or “gates,” and one creates “informationally complete” (IC). If the state’s probabil- a series of experiments, applying the gates in different ities for the outcomes of a set of IC POVMs with spec- orders. By performing the proper experiments, one can ified measurement operators are exactly known, one has find a full estimate of all the gates simultaneously (up to all the information necessary to exactly reconstruct the an unobservable gauge) [17]. The method has been suc- unknown [7, 8]. In an experimental im- cessful in experiments for single qubit systems [18–20]. In plementation, the are never exactly known, GST, the goal is to learn everything about the system. because one only runs a finite number of trials. One In our procedure, we assume more about the system and therefore numerically estimates the quantum state from ask less about the outcome, thereby requiring fewer ex- the measured frequencies of the outcomes via techniques periments. While this requires stronger pre-experiment such as MLE [21, 22]. knowledge about the system than is available in GST, In QDT, the goal is to estimate the measurement op- we believe that it is well suited to a situation that occurs erators associated with an unknown quantum measure- in many state-of-the-art experiments and complements ment. To estimate the measurement operators, we pre- these other proposals. pare many identical copies of members of a family of We begin by defining and describing the standard known quantum states {ρj}j. We then perform experi- methods for QST and QDT in Sec. II. We also dis- ments where we prepare many copies of one member of cuss and physically motivate some additional assump- this family and apply the unknown quantum measure- tions about the measurement that greatly simplify our ments. For a given trial in one of these experiments the procedure. 
Then, in Sec. III, we introduce the exper- probability of each outcome is given by the iments required for our procedure and discuss the ap- pj,b = Tr (Fbρj). In the QDT formalism, we assume the plication to the example trapped-ion system. Next, we uncertainty about the states {ρj}j is negligible. We de- discuss our numerical technique to extract estimates of fine an IC set of states for QDT as a set that spans the the states and measurements from the experimental data bounded operators on the Hilbert space, which is analo- in Sec. IV. In Sec. V, we show how to upper- and lower- gous to IC POVMs in QST. The probabilities from each bound important expectation values if the states are only outcome along with the exact description of an IC set set-identifiable. We then discuss how to estimate the un- of states allow for unique reconstruction of each mea- certainties Sec. VI. We summarize our procedure and surement operator. As with QST, in practice we cannot discuss future directions in Sec. VII. We also include determine the probability of each outcome due to a fi- three appendices that describe our software implemen- nite number of copies of the unknown state. Thus, the tation (App. A), a method for coarse-graining measure- measurement operators are numerically estimated from ment outcomes (App. B), and the stopping criteria for the frequencies of the outcomes via techniques such as iterations of the likelihood maximization (App. C). MLE [10]. In this work, we use aspects of QST and QDT to cre- ate a hybrid procedure that estimates both states and II. GENERAL MODEL detectors in a single maximization of the likelihood given the entire data set. Instead, one could consider first Our procedure is rooted in QST and QDT, so we be- performing QDT tomography using {ρj}j. Once the gin with a brief description of the standard version of detectors have been calibrated, they could be used for each. We focus on quantum systems that are described QST. Compared to this separated strategy, our proce- with finite, d-dimensional Hilbert spaces. QST is a pro- dure achieves lower uncertainty because it includes all cedure to estimate the unknown d × d density matrix ρ, data when estimating both the measurement operators 3 and unknown states. (The separated strategy would not k b include data from unknown states in QDT.) We obtain Q1,1 1 1 the global maximum of the full likelihood rather than P1,i maximizing two separate likelihood functions, which may 2 2 휌 풰 Π푘 푘 not be maximized at the same location as the full like- 푖 P2,i lihood function. Furthermore, when using the separated strategy one must somehow propagate uncertainty in the PN,i QN,M measurement operators into the QST, which uses those N M measurement operators, but we do not know of a robust 퐹 method for that uncertainty propagation. 푖,푏 푏 We assume that one has access to a limited set of high- fidelity unitary operations. We designate such a set of FIG. 1. Schematic describing the measurement model for a single engineered measurement {F } . A density matrix ρ unitary operators with indices as U = {Ui|i = 1, . . . , r}, i,b b is rotated by the U , which determines the where U [ρ] = U ρU † and U is the identity, U = 1. We i i i i 0 0 basis in which ρ is measured. In the measurement, first the additionally assume that we can reliably prepare a known underlying POVM {Πk}k is applied, and result k is obtained state ρ . Then, by applying the high-fidelity processes to 0 with probability Pk,i. 
Then, a Markov process acts to give this state, we generate a family of known states ρ = random outcome b with probability Qk,b. In the experiment {ρi = Ui[ρ0]}i. There could be repetitions in this family. we only perceive the outcome b. Ellipses and dotted arrows In addition to these known states, other processes gen- show that there may be an arbitrary finite number of under- erate a family of unknown states σ = {σj|j = 1, . . . , s} lying measurement operators Πk and values of b. to be characterized. All states are read-out by an un- known quantum measurement described by the POVM Tr (F ρ). The POVM F is referred to as the “bare F = {Fb|b = 1,...,M}. To measure in different bases, b we apply one of the high-fidelity processes prior to the POVM.” As mentioned above, we also apply the high- measurement, thereby creating the measurement opera- fidelity processes U prior to the measurement to create † † “engineered POVMs” whose members are tors Fi,b = Ui [Fb] = Ui FbUi, where F0,b = Fb by defini- tion of U . X 0 F = Q Π , (2) To significantly simplify our procedure, we add an ad- i,b k,b i,k k ditional assumption that the measurement operators Fb are unknown mixtures of the operators of an underlying † 1 where Πi,k = Ui [Πk]. Because we designated U0 = , POVM Π = {Πk|k = 1,...,N}, whose members are well {Fi,b}i,b is the complete family of measurement opera- known. To avoid numerical instabilities, we assume that tors, bare and engineered. no two measurement operators in Π are equal. The re- In the theory described below, the underlying POVM sulting measurement is depicted in Fig. 1. We model the Π may be any POVM, but we specifically discuss POVMs measurement process by the application of the underly- whose measurement operators are orthogonal subspace ing POVM Π yielding hidden outcome k, followed by an projectors. This further assumption was originally mo- unknown Markov process, after which outcome b is ob- tivated by trapped-ion experiments where the underly- served. The Markov process is described by the transition ing POVM consists of projectors onto orthogonal inter- matrix Qk,b. (If the outcome b is a continuous variable, nal states of each ion. This model is a useful descrip- our procedure requires discretization.) This situation is tion because the quantization axis and measurements are common in many experiments, where physics constrains aligned with high accuracy (further details are given in the underlying quantum measurement process, but sub- the next section). The situation also occurs in other ex- sequent incoherent effects make identification of the mea- periments with similarly high accuracy alignment. If the surement operators corresponding to outcomes difficult. assumption does not hold naturally, it can be easily en- Previous techniques have further constrained the model forced if the operations take place in a rotating frame. by assuming a form for the Markov process. For example, Then the measurements can be decohered by randomiz- in some ion trap experiments, the observed outcomes are ing the phase (or time) between the operations and mea- assumed to be a mixture of Poissonians given the hidden surements for each trial. Then, averaged over many trials outcomes [23, 24]. By not constraining the Markov pro- any will be lost and the underlying measure- cess by a model, we avoid systematic errors from model ment operators are orthogonal subspace projectors. mismatch. 
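As a concrete illustration of Eqs. (1) and (2), the short sketch below assembles the bare and engineered measurement operators from a transition matrix Q, the underlying projectors Π_k, and the unitaries U_i. The array names and shapes are our own conventions for this sketch and are not taken from the software of Appendix A.

import numpy as np

def bare_povm(Q, Pi):
    # Eq. (1): F_b = sum_k Q[k, b] * Pi[k]; Q has shape (N, M), Pi has shape (N, d, d).
    return np.einsum('kb,kij->bij', Q, Pi)

def engineered_povm(Q, Pi, U):
    # Eq. (2): F_{i,b} = U_i^dagger F_b U_i for each high-fidelity unitary U_i, U of shape (r, d, d).
    F = bare_povm(Q, Pi)
    Udag = U.conj().transpose(0, 2, 1)
    return np.einsum('iab,mbc,icd->imad', Udag, F, U)

# Each engineered POVM should still resolve the identity:
# np.allclose(engineered_povm(Q, Pi, U).sum(axis=1), np.eye(d)) for every i.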
The measurement operator Fb can now be expressed in terms of the Markov process and the underlying mea- III. EXPERIMENTAL MEASUREMENT surement operators as PROCEDURE X F = Q Π . (1) b k,b k Our procedure consists of two experimental steps fol- k lowed by numerical processing of the data. In this sec- The probability of observing b given state ρ is pb = tion, we describe the experimental steps and introduce 4 an example application to trapped-ion systems to illumi- ions, similar to Ref. [1]. Each ion is treated as a single nate the discussion. A schematic for the measurement qubit with computational basis |↑i, called the “bright and numerical estimation procedure is in Fig. 2. state,” and |↓i, called the “dark state.” Optical pumping The first experimental step is related to standard QDT is used to initialize each ion in the bright state, ρ0 = of the bare POVM. We call the experiments done during |↑↑i h↑↑|, with high fidelity. this step the “reference” experiments since they will be As in Ref. [1], we consider the high-fidelity processes to used to initialize the algorithm described in the next sec- be single-qubit rotations applied to both ions. A single- tion. There is one reference experiment for each known qubit rotation is defined by pure state ρi in ρ. The ith reference experiment, indexed U(θ, φ) = exp −i θ (σ cos φ + σ sin φ) , (6) i, consists of n trials. In each trial, we prepare ρi and 2 x y apply the bare POVM. The probability for outcome b in where σ and σ are Pauli operators. Collective ro- a given trial is x y tations U(θ, φ)⊗2 can be accomplished with, e.g., mi- (0) X crowaves applied uniformly across the ion trap. For the p = Tr (F ρ ) = Q Tr(Π ρ ). (3) i,b b i k,b k i example discussed below, we choose the subset U = k ⊗2 π ⊗2 ⊗2 π π ⊗2 {U(0, 0) ,U( 2 , 0) ,U(π, 0) ,U( 2 , 2 ) } as the set The measurement outcomes are sampled from the distri- of high-fidelity processes. Since two-qubit entangling op- bution given by Eq. (3). We assume the outcomes of the erations have significantly lower fidelities, a natural task trials are independent and identically distributed. The is to diagnose entangled states created by these opera- outcomes of all trials from all reference experiments are tions. collected into a matrix H(0) such that row i is the rela- Measurement of the ions is accomplished by stimulat- tive frequency histogram for experiment i. Specifically, ing a cycling transition between the bright state and an- (0) other internal state (outside of the computational ba- the matrix element Hi,b is the number of times outcome b is observed for experiment i divided by n. We call this sis). This transition is well aligned with the quantiza- matrix the “histogram matrix” in the following. tion axis and highly detuned from the dark state, which The second experimental step is similar to QST, where implies observing fluorescence is a projective measure- the goal is to extract information about the unknown ment of an ion in the bright state. The ions are close to- quantum states σ. We call the experiments done during gether in the trap, so the measurement cannot distinguish this step the “probing experiments,” because they are de- which ion is fluorescing. The corresponding POVM con- signed to probe the unknown states. There is one prob- sists of subspace projectors, Π = {Π0 = |↓↓i h↓↓| , Π1 = ing experiment for each engineered POVM and unknown |↓↑i h↓↑| + |↑↓i h↑↓| , Π2 = |↑↑i h↑↑|}. However, we can- state. 
The probing experiment, indexed by i, j, consists not observe the outcome of this POVM directly in any experiment since the number of photons in the fluores- of n trials, where in each trial, we prepare σj and apply the engineered POVM indexed by i. The probability of cence signal is distributed according to counting statistics getting outcome b in a given trial is as well as other detection errors such as dark counts, re- pumping to the dark state, or detector inefficiency. These (j) X effects act as the Markov process described above, so the p = Tr(U †[F ]σ ) = Q Tr(Π σ ). (4) i,b i b j k,b i,k j observed outcomes are associated with the POVM, F . k Simulated histograms of the above states are shown in The outcomes of the trials from the probing experiments Fig. 3. With the processes considered, we cannot create (j) on state σj are collected into the histogram matrix H . an IC set of engineered POVMs because the POVM Π In general, the probability of outcome b in a trial of a (and therefore also F ) cannot distinguish the two single- reference or probing experiment has the probability dis- ion-bright states and all of the rotations in U act equally tribution, on both qubits. However, the measurements are sufficient to identify a Bell-state fidelity, as is discussed in Sec. V. (j) † X pi,b = Tr(Ui [Fb]τj) = Qk,bTr(Πi,kτj), (5) k IV. MAXIMUM LIKELIHOOD ESTIMATION s TECHNIQUE where τ = (τj)j=0 = (ρ0, σ1, . . . , σs). For j = 0, Eq. (5) reduces to Eq. (3) and for j > 0 it reduces to Eq. (4). In this discussion we have made a few simplifying as- We now present a maximum likelihood estimation sumptions about the measurement record, such as that (MLE) technique that produces numerical estimates of every experiment contains the same number of trials and the measurement operators and the unknown states. Like that every state τj is subjected to the same set of mea- all MLE techniques, if the true operators and states are surements. These assumptions are not necessary for our not on the boundary, the technique is optimal in that procedure and are made here only to simplify the math- its variance asymptotically approaches the Cram´er-Rao ematical notation. lower bound [25]. Our algorithm starts by coarse grain- As an example of the complete procedure we consider ing the outcome space b, which condenses the data to the task of diagnosing the internal state of two trapped speed up the numerical processing. We then create an 5


FIG. 2. Schematic showing measurement and estimation procedure. Reference experiments begin with the preparation of the known state ρ0, and probe experiments begin with unknown states from σ = {σj |j = 1, . . . , s}. During each experiment the states are transformed by a process in the set U = {Ui|i = 0 . . . r} and then measured with the POVM F = {Fb|b = 1,...,M}, giving outcome b. For simplicity, we assume every known process acts on each unknown state, but this is not necessary. After n (j) trials of each experiment, histograms {Hb,i }b,i for each state are recorded. These histograms are then coarse-grained to produce (j) new histograms {Hc,i }c,i based on a training data set composed of 10 % of the trials randomly selected without replacement from each reference experiment. From the reference experiments’ data, we calculate an initial estimate for the POVM Fˆini. We divide the likelihood maximization into two concave subproblems: maximization of L1 with respect to σ and maximization of L2 with respect to F . We alternate between using the RρR algorithm to maximize L1 and a standard nonlinear optimizer to cur cur maximize L2. At each iteration we find maximizing parameters σˆ and Fˆ , and we check a stopping criterion. When the stopping criterion signals that we can stop iterations, we have the numerical maximum likelihood estimates σˆ and Fˆ.
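The alternation summarized in the caption can be expressed as a short loop sketch. Here maximize_L1, maximize_L2, and met_stopping are placeholders for the RρR step, the nonlinear optimization of L2, and the stopping test of Appendix C; they are not functions of the released package.

def alternating_mle(maximize_L1, maximize_L2, met_stopping, sigma0, Q0, max_iters=200):
    # Alternate the two concave subproblems of Sec. IV until the stopping test passes.
    sigma, Q = sigma0, Q0
    for _ in range(max_iters):
        sigma = maximize_L1(sigma, Q)   # subproblem 1: update the unknown state(s), POVM fixed
        Q = maximize_L2(sigma, Q)       # subproblem 2: update the transition matrix, states fixed
        if met_stopping(sigma, Q):      # gradient-based difference bounds, Appendix C
            break
    return sigma, Q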

FIG. 3. Histograms from simulated measurements of two ions. An individual ion fluoresces if it is in the |↑i (bright) state. We model the distributions of photon counts as Poissonians. In a real experiment the distributions differ from Poissonians due to processes such as repumping from the dark to the bright state. For two ions, there are three possible count distributions corresponding to the total number of ions in the bright state: zero ions bright (mean 2, due to dark counts), one ion bright (mean 20), and two ions bright (mean 40). For one ion bright, the photon detector cannot determine which of the two ions is fluorescing. (a) Relative frequency histograms for both ions in the dark state (blue), one ion on the dark state (orange), and π ⊗2 both ions in the bright state (red). (b) Relative frequency histogram for the reference experiment with U1 = U( 2 , 0) . (c) π π ⊗2 Relative frequency histogram for the probing experiment with U3 = U( 2 , 2 ) . 6 initial estimate of the transition matrix Q. Finally, we the states span the subspace that is spanned by the un- search for the estimates of Q and σ that maximize the derlying POVM. We only consider situations where this total log- of all the data. This is ac- condition is met. The initial estimate of the transition complished by an alternating approach. While this is not matrix is also used to calculate an initial estimate of the ˆini P ˆini proven to yield the maximally likely operators and states observed measurement operators, Fc = k Qk,cΠk. in all instances, we have found no counterexamples and The final step is to determine estimates of the states believe that the maximum likelihood solution is typically and measurement operators that maximize the total like- found. lihood function L. To derive L, we first need the proba- The first step is to coarse grain the outcome space. In bility of observing a given histogram matrix H(j), which many cases, the set of possible values of the outcomes is the product of the probabilities of observing each out- b is large, that is b = 1,...,M where M  1. The come from Eq. (5) and is given by collected histograms and transition matrix are then also (j) large, which is a main contributor to the computational Y Hi,c Prob(H(j)|τ , F , U) = Tr (U †[F ]τ ) . (8) cost of likelihood maximization. Coarse graining the out- i c j come space shrinks the histograms and transition matrix, i,c thereby reducing the complexity. The coarse graining is The probability of obtaining the histogram matrix from accomplished by creating a pre-specified number G < M all of the experiments given the unknown and the known of contiguous bins on the set of possible outcomes. Each parameters is then bin is defined by its edges in the outcome space. We in- (j) dex the bins, which correspond to new coarse-grained Y † Hi,c outcomes, by c = 1,...,G. Each outcome b is then Prob(H|τ , F , U) = Tr (Ui [Fc]τj) , (9) reassigned to a coarse-grained outcome c based on the j,i,c bin edges. The bin edges are constructed according to (j) a heuristic that minimizes the information loss between where H = {H }j. This probability is the total like- the original outcomes and the coarse grained outcomes lihood function L. When we maximize L, we vary the (details are in Appendix B). 
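A minimal sketch of the rebinning itself, assuming bin edges {B_c} have already been chosen, is given below; it implements the row-wise summation of Eq. (B1) with zero-indexed outcome columns, and the function name is ours.

import numpy as np

def coarse_grain_rows(H, edges):
    # Rebin each row of a histogram matrix H (experiments x raw outcomes).
    # edges = [B_0, ..., B_G] with B_0 = 0 and B_G = M; coarse bin c collects
    # columns edges[c-1]:edges[c], i.e. raw outcomes B_{c-1} < b <= B_c.
    edges = np.asarray(edges)
    return np.add.reduceat(H, edges[:-1], axis=1)

# Example: 3 experiments with 8 raw outcomes rebinned into G = 3 bins.
H = np.arange(24).reshape(3, 8)
print(coarse_grain_rows(H, [0, 2, 5, 8]))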
We randomly select without unknown parameters σ and F , which are the parameters replacement 10 % of the trials from the reference exper- of the statistical model, and keep the known parameters iments as a training set with which we choose the bin ρ0, U and the histogram matrix H fixed. To emphasize edges. The training set is then excluded from further the distinction between varying and fixed parameters, analysis. (To simplify notation, in the results presented we write the likelihood function as L(σ, F |H, U, ρ0) = below, we simulated reference experiments with (10/9)n Prob(H|τ , F , U). The estimates for the states and mea- trials so that the number of trials for reference and prob- surement operators that maximize L also maximize the ing experiments is equal after coarse graining, although log-likelihood function L = ln(L). As is standard prac- the technique can be used with unequal numbers of tri- tice, we maximize L instead of L because L has conve- als.) The coarse-grained outcomes then determine a new nient concavity properties, and it avoids numerical issues (j) with extremely small values. The log-likelihood is set of histogram matrices Hi,c , measurement operators F , and transition matrix Q . For the remainder of the X (j) c k,c L(σ, F |H, U, ρ ) = H ln Tr (U †[F ]σ ). (10) paper we discuss the coarse-grained version of each. 0 i,c i c j The next step is to use the reference histogram matrix i,j,c (0) H to derive an initial estimate of the transition matrix In this case, the log-likelihood is not a concave function, Q, which we later use to initialize the likelihood maxi- (0) which makes it difficult to determine the global maximum mization. In the limit of infinite trials H is equal to value. However, L is separately concave in σ and F . the from Eq. (3) but with a finite This can be confirmed by computing the log-likelihood’s (0) P number of trials Hi,c ≈ k Qk,cTr(Πkρi). We can re- second derivatives with respect to σ and F and noting express this relation in matrix form, H(0) ≈ PQ, where Q that they are negative semidefinite. has elements Qk,c, and P has elements Pi,k = Tr(Πkρi). We present an iterative technique to find the maximum We refer to the elements Pi,k as “populations,” since of L. Since the log-likelihood is separately concave in σ they are the probabilities that state i is in subspace and F , optimizing over one while holding the other fixed k. The population matrix P is known a priori, because is a concave optimization problem whose local maxima the states are prepared with the high-fidelity processes. are global maxima. So to maximize L over both σ and F Therefore, we can derive an initial estimate for Q, which jointly, we alternate between two subproblems: (1) con- describes the bare measurement, by applying the left cave optimization over σ with F fixed and (2) concave pseudo-inverse of P , optimization over F with σ fixed (details are given be- low). Here, we describe the subproblem optimizations in −1 Qˆini = P >P  P >H(0). (7) the case of a single unknown state, σ = {σ} (we drop the boldface notation for the unknown states since the The left pseudo-inverse exists when (P ) = N, that is, family has a single member), which requires a single set the number of subspace projections. This requires that of probing experiments with histogram matrix H(1). 7
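A minimal sketch of the initial estimate of Eq. (7), assuming a reference histogram matrix H0 (reference experiments by coarse outcomes) and the population matrix P with P[i, k] = Tr(Π_k ρ_i), follows. The clipping and row renormalization at the end are our own precaution to keep the initial estimate a valid probability distribution; Eq. (7) itself specifies only the pseudo-inverse.

import numpy as np

def initial_Q_estimate(H0, P):
    # Eq. (7): Q_ini = (P^T P)^{-1} P^T H0, i.e. the left pseudo-inverse of P applied to H0.
    # Requires rank(P) = N so that the least-squares solution is unique.
    Q_ini, *_ = np.linalg.lstsq(P, H0, rcond=None)
    # Assumed extra step: remove small negative entries caused by statistical noise and
    # renormalize each row to a probability distribution over the coarse outcomes.
    Q_ini = np.clip(Q_ini, 0.0, None)
    return Q_ini / Q_ini.sum(axis=1, keepdims=True)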

The first subproblem is maximization of the log- problem: likelihood with respect to the unknown density matrix σ cur keeping the measurement operators fixed at their current maximize: L2(Q|H, Pˆ ), estimate Fˆcur. For the initial step, we fix the measure- Q cur ini X ment as Fˆ = Fˆ . For this subproblem the objective subject to: Qc,k = 1, ∀ k, (16) function is c 0 ≤ Qc,k ≤ 1, ∀ k, c. (1) ˆcur X (1) † ˆcur L1(σ|H , U, F ) = Hc,i ln Tr (Ui [Fc ]σ). (11) c,i The program is initialized with the current estimate of the transition matrix Qˆcur (Qˆini for the first itera- (1) Note that L1 only uses the histogram matrix H from tion) and runs until a pre-specified stopping tolerance the probing experiments because the reference experi- is reached. All histogram matrices are considered in the ments are independent of σ. We numerically search for optimization, which ensures that the estimated transi- the density matrix σ that maximizes L1 by solving tion matrix Qˆ is consistent with both the reference and probing experiments described in the previous section. (1) cur maximize: L1(σ|H , U, Fˆ ), The assumption that the measurement operators are σ (12) linear combinations of an underlying POVM significantly subject to: Tr σ = 1, simplifies the program in Eq. (16). Without this assump- σ  0, tion, we would require semi-definite constraints to ensure that the measurement is physical. With the assumption, where σ  0 means that σ is a positive semidefinite the linear constraints ensure that the estimated transi- matrix. To accomplish this, we use the RρR algo- tion matrix Qˆ is a probability distribution and thus the rithm [22, 26], initialized with σ(1) =σ ˆcur (ˆσcur = corresponding observed measurement operators Fˆ are 1/d for the first iteration). At the kth iteration of physical. the algorithm, the state σ(k) is updated to σ(k+1) = We alternate between these two subproblems until N (R(σ(k))σ(k)R(σ(k))), where N indicates normalization a gradient-based stopping criterion is met (see Ap- and R is pendix C), yielding estimatesσ ˆ and Fˆ. While there is no guarantee that these estimates correspond to a global (1) ˆcur X Hi,c Fi,c R(σ) = . (13) maximum [28], in numerical experiments we find the al- ˆcur gorithm converges to estimates that are close to the true i,c Tr (σFi,c ) parameters. This ensures that at each iteration the estimate is phys- There may be states other thanσ ˆ that produce the ical and the likelihood is nondecreasing [22]. The al- same maximum value of the likelihood function. This gorithm is run until the stopping condition derived in occurs when the high-fidelity processes do not generate an IC set of POVMs, that is, {Fˆ } does not span Ref. [27] is met. For multiple unknown states, L1 is the i,c i,c P the space of bounded operators. This is the case in the sum of the log-likelihoods for each state L1 = L1(σj), j trapped-ion example given at the end of Sec. III, where where L1(σj) has the form of Eq. (11). To maximize L1 in this case, we calculate R for each σ and iteratively the POVM cannot determine which ion is in the bright j state. In this case,σ ˆ is an element of the “set of MLE update each σj individually. The second subproblem is maximization of the log- states” where each element of the set produces the same maximum value of the likelihood function. Similarly, likelihood with respect to the measurement operators, ˆ F , with σ fixed at its current estimateσ ˆcur returned by there may be transition matrices other than Q that yield the most recent use of the RρR algorithm. 
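For the first subproblem, a single RρR update of Eq. (13) can be sketched as below for one unknown state; F_cur collects the current engineered measurement operators with the (i, c) index pair flattened, and H1 holds the matching probing-experiment histogram entries. Names and array layout are illustrative only.

import numpy as np

def rhor_step(sigma, F_cur, H1):
    # One update sigma <- N(R(sigma) sigma R(sigma)) with R(sigma) from Eq. (13).
    # F_cur: current engineered measurement operators, shape (n_ops, d, d).
    # H1:    corresponding histogram entries, shape (n_ops,).
    probs = np.real(np.einsum('nij,ji->n', F_cur, sigma))    # Tr(sigma * F_{i,c})
    R = np.einsum('n,nij->ij', H1 / probs, F_cur)             # Eq. (13)
    new_sigma = R @ sigma @ R
    return new_sigma / np.trace(new_sigma)                    # normalization N(.)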
The objective the same maximum value of the likelihood function. This function for this subproblem is occurs when the family of known input states does not span the subspace of the underlying POVM. However, as X (j) mentioned previously in this section, we do not consider L (F |H, U, τˆcur) = H ln Tr (U †[F ]ˆτ cur), (14) 2 c,i i c j this case because it would cause other complications with c,i,j our initialization and would complicate the estimation of cur cur expectation values described in the next section. where τˆ = (ρ0, σˆ ). Because the measurement oper- ators are constrained according to Eq. (2), we re-write the objective function as V. ESTIMATION OF EXPECTATION VALUES ! ˆcur X (j) X ˆcur L2(Q|H, P ) = Hc,i ln Qk,cPk,i,j , (15) To gain information about the unknown states, we can c,i,j k estimate the expectation value hOi = Tr(Oσ) of any ob- servable O. We can calculate the ’s expecta- ˆcur † cur tion value according to the point estimate from the likeli- where P = {Tr (Ui [Πk]ˆτj )}k,i,j. We use a standard nonlinear multi-variable optimizer to solve the following hood maximization as hOˆ i = Tr (Oσˆ). However, if the set 8 of POVMs is not IC then it is possible that the set of MLE �2 states contains more than one element, and each element may have a different expectation value with respect to O. This leads to a set of possible expectation values for All Density the unknown state, which are all equally likely. Never- Matrices theless, we can still learn about the unknown states by determining bounds on the expectation value. This is ac- � complished by searching for the elements in the MLE set Set of ML that provide the largest and smallest expectation values, Density Matrices which corresponds to the pair of semidefinite programs (SDPs) � minimize: ± Tr (Oρ), 1 ρ subject to: Tr ρ = 1, FIG. 4. Conceptual schematic of bounding expectation val- (17) ues. The likelihood maximization returns an estimateσ ˆ that ρ  0, is an element of the set of density matrices that maximize the  ˆ  likelihood (green line). Projecting from the set of maximum Tr Fi,c(ρ − σˆ) = 0, for all i, c. likelihood density matrices onto the line representing observ- able O1 gives a unique expectation value. Because O1 is in The last constraint ensures that ρ is in the set of MLE the span of the estimated measurement operators, each maxi- states by constraining the expected probability of each mum likelihood density matrix has the same expectation value outcome to be equal to that ofσ ˆ. When this is the case, for O1. Observable O2 is only partially contained within the the log-likelihood of any ρ obeying the constraint is equal span of estimated measurement operators. The SDP provides to the log-likelihood ofσ ˆ. upper and lower bounds of its expectation value. As formulated above, the last line of the SDP con- tains |{Fi,c}i,c| constraints, many of which may be redun- dant, thereby causing increased computation time. We {Fi,c}i,c are IC, they span the space of Hermitian op- reduce the number of constraints by finding an orthog- erators, and every observable is identifiable. We do not explicitly check if a decomposition of the form used in onal basis that spans {Fˆi,c}i,c. This is done by defin- 2 Eq. (18) exists apart from running the SDPs. ing a size |{Fi,c}i,c| × d matrix F, where each row is Set-identifiable observables have a range of expectation the measurement operator Fˆ written as a 1 × d2 vec- i,c values over the MLE states. If {F } does not span all tor. 
We then calculate the singular value decomposition, i,c i,c Hermitian operators, then there necessarily exist observ- F = WSV †. The measurement operators {Fˆ } are i,c i,c ables that are set-identifiable. The SDPs provide tight in the span of the rows of V †. To reduce the number of upper and lower bounds on the ranges of the expectation constraints in the SDP, we create V 0† by discarding the values. rows of V † corresponding to the ` smallest singular val- In the two-ion example, the measurements are not in- ues, where the value of ` is equal to d2 minus the number formationally complete so there are both identifiable and of nonzero singular values of the related matrix formed by set-identifiable observables. An example of an identifi- the underlying measurement operators, {Π } . Now, i,k i,k able observable is the , O = |Φ+i hΦ+|, where we replace the last constraint in the above SDP with 1 |Φ+i = √1 (|↑↑i+|↓↓i). Since σ is an attempted Bell state Tr [V 0†(ρ − σˆ)] = 0, where V 0† is the matrix formed from 2 k k preparation, O measures the fidelity. The Bell state can the kth row of V 0†. This reduces the number of these 1 be written in terms of the measurement operators, constraints to d2 or fewer. Observables can be divided into two types, identifiable |Φ+i hΦ+| = Π + Π and set-identifiable, shown in Fig. 4. First, identifiable 0 2  π ⊗2† π ⊗2 observables have the same expectation value for every + U( 2 , 0) Π1U( 2 , 0) element in the set of MLE states. In this case the two † − U( π , π )⊗2 Π U( π , π )⊗2. (19) SDPs return the same expectation value. An identifiable 2 2 1 2 2 observable O is in the span of the estimated measure- 1 In principle, one could calculate the fidelity by finding ment operators, namely there exist coefficients f such i,c the probability of each measurement outcome with an P ˆ that O1 = i,c fi,cFi,c. The expectation values are then operator in the expansion above. However, the SDP al- proportional to the estimated outcome probabilities gorithm allows for exact calculation without knowing the expansion. ˆ X ˆ hOi = Tr (Oσˆ) = fi,c Tr (Fi,cσˆ). (18) An example of a set-identifiable observable about i,c which we can gain partial information is the probability that the second ion is in the bright state, O = |↓↑i h↓↑|. ˆ 2 Since Tr (Fi,cσˆ) has an associated empirical estimate This operator does not have an expansion in terms of the (1) ˆ given by Hc,i , we can directly calculate hOi. When engineered measurements, unlike the Bell state. How- 9

+ + Observable O1 = |Φ i hΦ | O2 = |↓↑i h↓↑| ing uncertainty for expectation values of set-identifiable True hOi 0.9925 2.5 × 10−3 observables a complication arises. In these cases, the M.L. hOi 0.9902 2.452 × 10−3 expectation values are lower and upper bounded by the SDPs, but in each bootstrap resample different lower and SDPs (0.9902, 0.9902) (2.361, 2.543) × 10−3 −3 upper bounds will be estimated. Therefore, there exists Basic C.I. (0.9849, 0.9943) (1.296, 3.451) × 10 uncertainty from the observable being set-identifiable as −3 Bias corrected C.I. (0.9841, 0.9944) (2.522, 4.357) × 10 well as from the bootstrap distribution. To incorporate both uncertainties, we calculate 97.5 % one-sided confi- TABLE I. Numerical results from the two-ion example. The dence intervals for the lower and upper bounds separately procedure is applied with measurement operators and high- fidelity unitary processes described in Sec. IV and unknown and report the overlapping range, which is a 95 % confi- + + 0.01 1 dence interval. state σ = 0.99 |Φ i hΦ | + 4 . “True hOi” is the true expectation value for the prepared state, “M.L. hOi” is the There are many methods to determine the bootstrap estimated expectation value withσ ˆ, “SDPs” are the (lower, confidence intervals, but these methods make assump- upper) bounds returned from the semidefinite programs, “Ba- tions that are often not satisfied in our situation. In sic C.I.” is the 95 % confidence interval found with the basic simulations, we see that the bootstrap distributions for method and “Bias corrected C.I.” is the 95 % confidence in- the upper and lower bounds of set-identifiable observ- terval found with the bias corrected method. ables are commonly biased or asymmetric. This is de- tected by observing a difference between the median of the bootstrap resamples and the estimate from the origi- ever, we can still gain partial information about the sec- nal data. Bias in maximum likelihood quantum state to- ond ion bright via the SDP. Upper and lower bounds of mography has also been reported in Refs. [30–32]. These both example observables are reported in Table I along effects are caused by the nonlinearity of the estimator with the true value of the expectation value and the value and the complicated structure of the boundary of quan- with respect to the point estimate. tum state space, which necessarily affects our lower and upper bounds on expectation values due to the nature of the SDPs. Certain techniques to construct confidence VI. UNCERTAINTIES intervals, such as the percentile method [33], are sensi- tive to bias and asymmetry, and are therefore contraindi- To quantify the uncertainty in our procedure we use cated. Further, the bias and boundary issues imply that parametric bootstrap resampling. For this method, one the theory underlying other bootstrap confidence interval generates samples from the estimated probability distri- methods is not applicable. As a result, we expect system- bution, which simulates new repetitions of the exper- atic coverage probability errors that cannot be removed iments, and uses the results to estimate the distribu- by increasing the number of bootstrap samples. tion of various parameters [29]. For our case, the es- Nevertheless, for moderate confidence levels between timated probability distribution is defined by Eq. (5) 60 % and 95 %, intervals obtained can still be useful for with τ = (ρ0, σˆ) and Q = Qˆ, the estimates produced descriptive purposes. To this end, we report two meth- by the MLE algorithm. 
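As a minimal sketch of the resampling step described in this section, assuming the fitted outcome probabilities of Eq. (5) are stored in an array p with p[j, i, c] normalized over the coarse outcomes c, one parametric-bootstrap data set could be drawn as follows (names are ours, and we return relative frequencies rather than raw counts):

import numpy as np

def bootstrap_histograms(p, n, rng=None):
    # Draw one parametric-bootstrap histogram set from fitted probabilities p[j, i, c].
    # n is the number of trials per (state j, engineered POVM i) combination.
    rng = np.random.default_rng() if rng is None else rng
    H = np.empty_like(p, dtype=float)
    for j in range(p.shape[0]):
        for i in range(p.shape[1]):
            H[j, i] = rng.multinomial(n, p[j, i]) / n   # relative-frequency histogram
    return H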
We numerically sample this dis- ods, the basic [34] and bias corrected [33] bootstraps, tribution n times for each combination of state (index which both have some robustness to bias and asymme- j) and engineered POVM (index i) to create a simu- try, though neither is designed for high dimensional esti- lated histogram for each of the experiments described mates with boundary constraints. These methods return in Sec. II. The bootstrap resampling is performed with sometimes very different confidence intervals (cf. in Ta- coarse-grained measurement outcomes according to the ble I for O2). Therefore, we stress that both should be same binning rule so that the bootstrap histogram ma- taken only as qualitative descriptions of the uncertainty trices have the same structure as the original binned ex- and not used for further inference. perimental histogram matrices. We repeat the numerical More sophisticated methods to deal with bias and technique from Sec. IV and expectation value bounding asymmetry exist (for example, the bias corrected and SDPs from Sec. V on the numerically generated data to accelerated method in Ref. [33]). However, such meth- produce new estimates for the measurements, states, and ods are not designed to address the underlying problems upper and lower bounds for hOi. We repeat this entire encountered. Another possibility may be to use methods bootstrap procedure t times, producing a distribution of that involve a double bootstrap, but they are impractical estimated measurement operators, states, and bounds on for our procedure. Further research is required to obtain observables. high-quality bootstrap confidence intervals for quantum We use the bootstrap distribution to estimate the un- tomography. Alternatively, one might adapt the confi- certainty in our method by calculating approximate 95 % dence regions described in Refs. [35, 36], which are not bootstrap confidence intervals of the bounds on observ- based on the bootstrap. ables. For identifiable observables (such as the Bell-state The bootstrap procedure also allows us to perform a fidelity) the confidence interval can be calculated with likelihood ratio test to determine how well our model fits standard techniques (see below). However, when report- the observed data. The “likelihood ratio” is the ratio 10 of the likelihood of our null model, the estimates ofσ ˆ about the measurement operators and do not seek to di- and Fˆ from the experimental data (with log-likelihood agnose all parts of the quantum system. However, since given by the iterative program discussed in Sec. IV), to our protocol is dependent on prior information about the the likelihood of an alternative model, the experimental system, it is not ideal for experiments that have not previ- frequency histogram matrix with log-likelihood function, ously been diagnosed. In the absence of a well defined ini-

X (j) (j) tial state and sufficiently many high-fidelity operations, Lfrq(H) = Hi,c ln Hi,c . (20) methods such as GST may be better suited. i,j,c Our implementation and testing of this estimation pro- To perform the likelihood ratio test we compare the like- cedure has focused on a few trapped ions. In this system, lihood ratio of the original estimates to the distribution measurements can be modeled as classical noise follow- of likelihood ratios created by the bootstrap procedure. ing projection onto a small number of orthogonal sub- The likelihood ratio test statistic is spaces and measurement outcomes that can be parti- tioned into a small number of bins with little informa- ˆ Λ0 = L(ˆσ, F |H, U, ρ0) − Lfrq(H). (21) tion loss. In principle, the procedure can handle higher- dimensional systems and measurements that consist of We also compute an analogous statistic for each of the multi-dimensional histograms but further optimization is t bootstrapped data sets (each computed using its own required to run these cases efficiently. There are also op- simulated histogram matrices and respective state and portunities for expanding our procedure by applying it measurement estimate) to generate a distribution of like- to systems with measurements that cannot be modeled lihood test statistics, {Λ |i = 1, . . . , t}. Because the boot- i as orthogonal projection followed by classical noise. strap data sets are certainly well described by the null model, their likelihood ratios should typically be com- parable to the likelihood ratio of the experimental data set, which may or may not obey the model. A statisti- Appendix A: Software implementation cally significant difference between the experimental data set’s likelihood ratio and the bootstrap data sets’ is evi- We implemented our procedure as a Python package, dence that the null model does not match the experiment. available at Ref. [37]. Instructions for installing and run- To quantify evidence against the null model, we com- ning the package are given in the accompanying docu- pute an empirical p-value by determining the percentile mentation. Our package contains a version of RρR for the at which Λ0 falls in the distribution from the bootstrap first optimization subproblem, and uses scipy.optimize data sets [29]. for the second optimization subproblem. The SDPs that estimate the expectation values described in Sec. V re- quire the MATLAB API engine. The SDPs are solved VII. CONCLUSION with YALMIP [38], which is a package for MATLAB.
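The package solves the SDPs of Eq. (17) with YALMIP under MATLAB; purely as an illustration of the same program, a rough equivalent could be written in Python with the cvxpy package as sketched below. The function name, the omission of the SVD-based constraint reduction of Sec. V, and the default solver choice are our own simplifications.

import numpy as np
import cvxpy as cp

def expectation_bounds(O, F_list, sigma_hat):
    # Lower and upper bounds on Tr(O rho) over states consistent with sigma_hat, Eq. (17).
    d = sigma_hat.shape[0]
    rho = cp.Variable((d, d), hermitian=True)
    constraints = [cp.trace(rho) == 1, rho >> 0]
    # Constrain every estimated measurement operator to give the same outcome probability as sigma_hat.
    constraints += [cp.trace(F @ rho) == np.trace(F @ sigma_hat) for F in F_list]
    bounds = []
    for sense in (cp.Minimize, cp.Maximize):
        prob = cp.Problem(sense(cp.real(cp.trace(O @ rho))), constraints)
        prob.solve()
        bounds.append(prob.value)
    return tuple(bounds)   # (lower bound, upper bound)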

We have developed a procedure to simultaneously char- acterize unknown quantum measurements and states by Appendix B: Coarse-graining measurement using a limited set of high-fidelity quantum operations. outcomes The protocol requires two types of experiments, which we designated as “reference” and “probing.” The reference The measurement device may have a very large num- experiments use the high-fidelity processes to estimate ber of possible outcomes b = 1,...,M where M  1. the unknown measurement operators, similar to QDT. This is a main contributor to the computational com- The probing experiments use the high-fidelity processes plexity of the log-likelihood maximization, because each to probe unknown state preparations, similar to QST. In outcome adds a term to the log-likelihood function, which our procedure the estimation of the measurement opera- requires more computation for finding R in the first sub- tors and density matrices is achieved simultaneously via problem and increases the optimization space for the sec- alternating MLE. This means the estimates produced are ond subproblem. To reduce the complexity, we coarse consistent with both reference and probe experiments. grain the outcome space by collecting the original out- We also introduced a method for estimating expectation comes b into G < M bins. (The original outcomes can values of the unknown states by two SDPs when the high- also be thought of as binned, so this procedure is techni- fidelity processes do not produce IC measurements. cally a re-binning.) We assume that the ordering of the Our procedure applies to systems where we can apply original outcomes 1,...,M is meaningful so that it makes certain operations with high fidelity. These need not be sense to bin consecutive outcomes for minimum informa- universal; a small set of single-qubit unitaries suffices. A tion loss. The bins are then identified by the bin edges Bc sufficient set of such operations is available in many state- in the outcome space {Bc|c = 0,...,G}, where B0 = 0, of-the-art quantum information processors, for example and BG = M, so an outcome b ∈ (Bc−1,Bc] is mapped trapped ions. Moreover, our protocol has the advantage to the coarse-grained outcome c. Coarse graining reduces that it is efficient to implement relative to previous pro- the information about the unknown states and measure- posals. This is because we make use of prior information ments that was captured by our experiments. For an 11 extreme example choose G = 2 with B1 = 0; then ev- about is how well the coarse-graining retains information ery outcome b is mapped to the coarse-grained outcome about the underlying outcome distribution pk = Tr (Πkρ) c = 1, which provides no information. To combat this discussed in Sec. II. This distribution is dependent on the problem, we construct a heuristic algorithm to choose the state, so we choose a representative state ρ = ρeq with bin edges. We apply this algorithm to “training data,” the property that underlying outcomes have equal prob- eq which is composed of 10 % of the trials from the ref- ability, namely P (k) = Tr (Πkρ ) = 1/N for all k. We erence experiments randomly sampled without replace- could estimate the coarse-grained histogram matrix Heq ment and set aside from all further analysis. Though for ρeq based on the training data’s histogram, but this we describe the algorithm in terms of one-dimensional relation is dependent on the family of known input states histograms here, our software also supports multidimen- ρ. 
Instead we find it more convenient to determine Heq sional histograms. based on the estimated transfer matrix Qˆtrain, which al- For a given number G of coarse-grained bins, our al- ready contains information about the histogram and the eq P ˆtrain 1 gorithm maximizes the amount of information retained known states, such that Hc = k Qk,c N = P (c). in the coarse-grained outcomes c by a heuristic based on The needed joint probability distribution is given by mutual information. First, we need to identify the in- P (k, c) = P (c|k)P (k) = P (c|k)/N with P (c|k) given by formation about the unknown states and measurement. ˆtrain Qk,c . In our procedure, this information is contained in the We determine a good coarse graining by finding bin histograms collected from the experiments. These his- (j) edges {Bc}c with high mutual information for P (k, c) as togram matrices Hi,b are rectangular with vertical di- defined in the previous paragraph. Searching over all mension M, the number of possible outcomes. Coarse possible bin edges to maximize I(K; C) is impractical, so graining the outcomes then shrinks the vertical dimen- instead we use a heuristic algorithm that iteratively adds sion to G < M. For bin edges {Bc}c the coarse grained bin edges. We pre-specify the target number of bins G. histogram matrix is defined as Then, starting with two bins, we compute the mutual information for each possible location for the bin edge Bc 0(j) X (j) between 0 and M. The boundary location that gives the H = H . (B1) i,c i,b largest mutual information is then fixed as an element of b=Bc−1+1 what will be our final list of bin edges. To add a third Note that every histogram (that is, every row of the bin, leaving the existing bin edge fixed, we compute the histogram matrix) is treated identically. The coarse- mutual information for all possible locations for the new graining is also applied uniformly to the transition matrix edge. The new edge that gives the largest mutual infor- mation is added to {Bc}c. We continue this procedure Bc until the target number of G bins is reached. We ap- 0 X Qk,c = Qk,b. (B2) ply the resulting bin edge rule to the remaining reference

b=Bc−1+1 histograms not used in the training data, all probing his- tograms, and Qˆini by Eqs. (B1) and (B2). An example Therefore, both H0(j) and Q0 have smaller vertical di- of binning ion fluorescence data is in Fig. 5. mension due to the coarse graining. In the following, we We have observed that binning consecutive elements in study the coarse-grained histogram matrices and transfer the rows of H(j) has been effective when systems occu- train 0(0) matrix estimated from the training data H = H pying a single subspace produce unimodal distributions, ˆtrain 0 and Q = Q from Eq. (7). such as those produced by the ion measurements. Anal- We quantify the amount of information retained in ysis of multimodal distributions may benefit from more coarse-grained histograms with the mutual information. complicated binning strategies. Consider a joint probability distribution P (k, c), where c are the bin indices and k are underlying outcomes. Let C and K denote the random variables with values c and k, respectively. The mutual information I (K; C) be- P Appendix C: Stopping criteria tween K and C quantifies the amount of information C has about K (or K about C). It is given by After each optimization subproblem is run in an iter-   X P (k|c) ation, we bound the difference between the current log- I(K; C) = P (c)P (k|c) log2 , (B3) P (k) likelihoods Li (i = 1 and i = 2 for the first and second k,c subproblems) and their respective maximum possible val- where P (k|c) = P (k, c)/P (k) is the conditional probabil- ues. The bounds tell us how much the log-likelihoods ity distribution. Our goal for coarse graining is to max- could increase with further iterations. We stop the algo- imize the mutual information, so as to approach the full rithm when both differences are individually below pre- information in K, achieved when C = K. We could cal- specified thresholds Tσ and TQ. culate the mutual information between the original and To find the difference bound for the first subproblem, the coarse-grained histograms, but what we really care we follow the method introduced in Ref. [27]. The bound 12

cur ∂ L2(Qˆ ) where are the elements of the gradient of L2. ∂Qk,c Since we do not know QML, we bound the right side of Eq. (C2) by finding the maximum value over all distri- butions by solving the optimization problem

ˆcur X ˆcur ∂ L2(Q ) maximize: (Qk,c − Xk,c) , X ˆcur k,c ∂Qk,c X (C3) subject to: Xk,c = 1, ∀k, c

0 ≤ Xk,c ≤ 1, ∀ k, c.

The value SˆQ returned is an upper bound on the differ- ence between the maximum log-likelihood and the log- likelihood at the current iteration. FIG. 5. Illustration of the binning procedure. (a) Rela- We compare both bounds to the pre-specified thresh- tive frequency histogram for the second reference experiment, olds Tσ and TQ. Default values for these thresholds in π ⊗2 U1 = U( 2 , 0) . (b) Same histogram data but after bin- the code are currently Tσ = 0.3 and TQ = 0.25, which ning. The black lines show the bin boundaries found with the were chosen empirically by testing on simulated data rep- heuristic algorithm. resentative of our ion-trap applications. When Sˆσ ≤ Tσ and SˆQ ≤ TQ the procedure is stopped, and the final estimates are returned. Sˆσ is given by

ACKNOWLEDGMENTS

This work includes contributions of the National Institute of Standards and Technology, which are not subject to U.S. copyright. The use of trade names is intended to allow the calculations to be appropriately interpreted and does not imply endorsement by the US government, nor does it imply that these are necessarily the best available for the purpose used here.

[1] J. P. Gaebler, T. R. Tan, Y. Lin, Y. Wan, R. Bowler, A. C. Keith, S. Glancy, K. Coakley, E. Knill, D. Leibfried, and D. J. Wineland, "High-fidelity universal gate set for 9Be+ ion qubits," Phys. Rev. Lett. 117, 060505 (2016), arXiv:1604.00032 [quant-ph].
[2] John M. Nichol, Lucas A. Orona, Shannon P. Harvey, Saeed Fallahi, Geoffrey C. Gardner, Michael J. Manfra, and Amir Yacoby, "High-fidelity entangling gate for double-quantum-dot spin qubits," npj Quantum Information 3, 3 (2017), arXiv:1608.04258 [cond-mat.mes-hall].
[3] Sarah Sheldon, Easwar Magesan, Jerry M. Chow, and Jay M. Gambetta, "Procedure for systematically tuning up cross-talk in the cross-resonance gate," Phys. Rev. A 93, 060302 (2016), arXiv:1603.04821 [quant-ph].
[4] L. Casparis, T. W. Larsen, M. S. Olsen, F. Kuemmeth, P. Krogstrup, J. Nygård, K. D. Petersson, and C. M. Marcus, "Gatemon benchmarking and two-qubit operations," Phys. Rev. Lett. 116, 150505 (2016), arXiv:1512.09195 [cond-mat.mes-hall].
[5] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jeffrey, T. C. White, J. Mutus, A. G. Fowler, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, C. Neill, P. O'Malley, P. Roushan, A. Vainsencher, J. Wenner, A. N. Korotkov, A. N. Cleland, and John M. Martinis, "Superconducting quantum circuits at the surface code threshold for fault tolerance," Nature 508, 500–503 (2014), arXiv:1402.4848 [quant-ph].
[6] C. J. Ballance, T. P. Harty, N. M. Linke, M. A. Sepiol, and D. M. Lucas, "High-fidelity quantum logic gates using trapped-ion hyperfine qubits," Phys. Rev. Lett. 117, 060504 (2016), arXiv:1512.04600 [quant-ph].
[7] E. Prugovečki, "Information-theoretic aspects of quantum measurement," Int. J. Theor. Phys. 16, 321–331 (1977).
[8] Christopher A. Fuchs and Rüdiger Schack, "Unknown quantum states and operations, a Bayesian view," in Quantum State Estimation, Lecture Notes in Physics, Vol. 649, edited by Matteo G. A. Paris and Jaroslav Řeháček (Springer, Berlin, 2004) Chap. 5, p. 164.
[9] A. Luis and L. L. Sánchez-Soto, "Complete characterization of arbitrary quantum measurement processes," Phys. Rev. Lett. 83, 3573 (1999).
[10] Jaromír Fiurášek, "Maximum-likelihood estimation of quantum measurement," Phys. Rev. A 64, 024102 (2001), arXiv:quant-ph/0101027.
[11] Y. Lin, J. P. Gaebler, F. Reiter, T. R. Tan, R. Bowler, Y. Wan, A. Keith, E. Knill, S. Glancy, K. Coakley, A. S. Sørensen, D. Leibfried, and D. J. Wineland, "Preparation of entangled states through Hilbert space engineering," Phys. Rev. Lett. 117, 140502 (2016), arXiv:1603.03848 [quant-ph].
[12] M. D. Reed, L. DiCarlo, B. R. Johnson, L. Sun, D. I. Schuster, L. Frunzio, and R. J. Schoelkopf, "High-fidelity readout in circuit quantum electrodynamics using the Jaynes-Cummings nonlinearity," Phys. Rev. Lett. 105, 173601 (2010), arXiv:1004.4323.
[13] D. Mogilevtsev, J. Řeháček, and Z. Hradil, "Self-calibration for self-consistent tomography," New J. Phys. 14, 095001 (2012).
[14] D. Mogilevtsev, A. Ignatenko, A. Maloshtan, B. Stoklasa, J. Řeháček, and Z. Hradil, "Data pattern tomography: reconstruction with an unknown apparatus," New J. Phys. 15, 025038 (2013).
[15] Seth T. Merkel, Jay M. Gambetta, John A. Smolin, Stefano Poletto, Antonio D. Córcoles, Blake R. Johnson, Colm A. Ryan, and Matthias Steffen, "Self-consistent quantum process tomography," Phys. Rev. A 87, 062119 (2013), arXiv:1211.0322 [quant-ph].
[16] Cyril Stark, "Self-consistent tomography of the state-measurement Gram matrix," Phys. Rev. A 89, 052109 (2014), arXiv:1209.5737 [quant-ph].
[17] Robin Blume-Kohout, John King Gamble, Erik Nielsen, Jonathan Mizrahi, Jonathan D. Sterk, and Peter Maunz, "Robust, self-consistent, closed-form tomography of quantum logic gates on a trapped ion qubit," (2013), arXiv:1310.4492 [quant-ph].
[18] J. Medford, J. Beil, J. M. Taylor, S. D. Bartlett, A. C. Doherty, E. I. Rashba, D. P. DiVincenzo, H. Lu, A. C. Gossard, and C. M. Marcus, "Self-consistent measurement and state tomography of an exchange-only spin qubit," Nature Nanotechnology 8, 654–659 (2013), arXiv:1302.1933 [cond-mat.mes-hall].
[19] Juan P. Dehollain, Juha T. Muhonen, Robin Blume-Kohout, Kenneth M. Rudinger, John King Gamble, Erik Nielsen, Arne Laucht, Stephanie Simmons, Rachpon Kalra, Andrew S. Dzurak, and Andrea Morello, "Optimization of a solid-state electron spin qubit using gate set tomography," New J. Phys. 18, 103018 (2016), arXiv:1606.02856 [cond-mat.mes-hall].
[20] Robin Blume-Kohout, John King Gamble, Erik Nielsen, Kenneth Rudinger, Jonathan Mizrahi, Kevin Fortier, and Peter Maunz, "Demonstration of qubit operations below a rigorous fault tolerance threshold with gate set tomography," Nat. Commun. 8 (2017), arXiv:1605.07674 [quant-ph].
[21] Zdeněk Hradil, Jaroslav Řeháček, Jaromír Fiurášek, and Miroslav Ježek, "Maximum-likelihood methods in quantum mechanics," in Quantum State Estimation, Lecture Notes in Physics, Vol. 649, edited by Matteo G. A. Paris and Jaroslav Řeháček (Springer, Berlin, 2004) Chap. 3, pp. 59–112.
[22] Jaroslav Řeháček, Zdeněk Hradil, E. Knill, and A. I. Lvovsky, "Diluted maximum-likelihood algorithm for quantum tomography," Phys. Rev. A 75, 042108 (2007), arXiv:quant-ph/0611244.
[23] W. M. Itano, J. C. Bergquist, J. J. Bollinger, J. M. Gilligan, D. J. Heinzen, F. L. Moore, M. G. Raizen, and D. J. Wineland, "Quantum projection noise: Population fluctuations in two-level systems," Phys. Rev. A 47, 3554–3570 (1993).
[24] M. Acton, K.-A. Brickman, P. C. Haljan, P. J. Lee, L. Deslauriers, and C. Monroe, "Near-perfect simultaneous measurement of a qubit register," Quantum Info. Comput. 6, 465–482 (2006), arXiv:quant-ph/0511257.
[25] Jun Shao, Mathematical Statistics, Springer Texts in Statistics (Springer, New York, 1998) Chap. 4.
[26] Douglas S. Gonçalves, Márcia A. Gomes-Ruggiero, and Carlile Lavor, "Global convergence of diluted iterations in maximum-likelihood quantum tomography," Quantum Info. Comput. 14, 966–980 (2014), arXiv:1306.3057 [math-ph].
[27] S. Glancy, E. Knill, and M. Girard, "Gradient-based stopping rules for maximum-likelihood quantum-state tomography," New J. Phys. 14, 095017 (2012), arXiv:1205.4043 [quant-ph].
[28] Jochen Gorski, Frank Pfeuffer, and Kathrin Klamroth, "Biconvex sets and optimization with biconvex functions: a survey and extensions," Math. Method. Oper. Res. 66, 373–407 (2007).
[29] Dennis D. Boos, "Introduction to the bootstrap world," Statistical Science 18, 168–174 (2003).
[30] T. Sugiyama, P. S. Turner, and M. Murao, "Effect of non-negativity on estimation errors in one-qubit state tomography with finite data," New J. Phys. 14, 085005 (2012), arXiv:1205.2976 [quant-ph].
[31] C. Schwemmer, L. Knips, D. Richart, H. Weinfurter, T. Moroder, M. Kleinmann, and O. Gühne, "Systematic errors in current quantum state tomography tools," Phys. Rev. Lett. 114, 080403 (2015), arXiv:1310.8465 [quant-ph].
[32] G. B. Silva, S. Glancy, and H. M. Vasconcelos, "Investigating bias in maximum-likelihood quantum-state tomography," Phys. Rev. A 95, 022107 (2017), arXiv:1604.00321 [quant-ph].
[33] Bradley Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability (Chapman and Hall, New York, 1993).
[34] A. C. Davison and D. V. Hinkley, Bootstrap Methods and their Application (Cambridge University Press, New York, NY, 1997).
[35] Matthias Christandl and Renato Renner, "Reliable quantum state tomography," Phys. Rev. Lett. 109, 120403 (2012), arXiv:1108.5329v1 [quant-ph].
[36] R. Blume-Kohout, "Robust error bars for quantum tomography," (2012), arXiv:1202.5270 [quant-ph].
[37] "Joint quantum state and measurement tomography," (2018), https://github.com/usnistgov/state_meas_tomo.
[38] J. Löfberg, "YALMIP: A toolbox for modeling and optimization in MATLAB," in 2004 IEEE International Conference on Robotics and Automation (IEEE Cat. No.04CH37508) (IEEE, 2004) pp. 284–289.