Arxiv:2105.05992V2 [Quant-Ph] 26 May 2021

Home , Operator space, Quantum operation

Informationally complete POVM-based shadow tomography

Atithi Acharya,1 Siddhartha Saha,1 and Anirvan M. Sengupta1, 2 1Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854, USA 2Center for Computational Mathematics and Center for Computational Quantum Physics, Flatiron Institute, New York, NY 10010 USA Abstract: Recently introduced shadow tomography protocols use ‘classical shadows’ of quantum states to predict many target functions of an unknown quantum state. Unlike full quantum state tomography, shadow tomography does not insist on accurate recovery of the density matrix for high rank mixed states. Yet, such a protocol makes multiple accurate predictions with high confidence, based on a moderate number of quantum measurements. One particular influential algorithm, proposed by Huang, Kueng, and Preskill [10], requires additional circuits for performing certain random unitary transformations. In this paper, we avoid these transformations but employ an arbitrary informationally complete POVM and show that such a procedure can compute k-bit correlation functions for quantum states reliably. We also show that, for this application, we do not need the median of means procedure of Huang et al. Finally, we discuss the contrast between the computation of correlation functions and fidelity of reconstruction of low rank density matrices.

I. INTRODUCTION II. GENERALIZED MEASUREMENTS

A projective measurement is described by an observ- Recent advances in quantum information processing able, A, a Hermitian operator on the state space of the system being observed. The observable has a spectral often require characterizing quantum states prepared P during various stages of a procedure. As a result, the decomposition, A = a aPa where Pa is the projec- problem of characterising a quantum state, more specifi- tor onto the eigenspace of A with eigenvalue a. The cally, a density matrix, from measurements on an ensem- possible outcomes of the measurement corresponding to ble of identical states, known as quantum state tomogra- the eigenvalues, a, of the observable and the outcome phy (QST), has seen a surge of interest [5, 10, 20]. One of probability is p(a) = hψ| Pa |ψi. Projection Valued Mea- the key challenges is that, for n-qubit quantum systems, sures (PVMs) are a special case of general measurements, the density matrix is of size 2n × 2n. As the number of where the measurement operators are Hermitian and or- qubits become large, inferring the density matrix from a thogonal projectors. A set of Positive Operator Valued limited number of measurements becomes difficult. Measures (POVMs) [14] forms a generalization of PVMs. The index a in the POVM element Ma refers to the measurement outcomes that may occur in the experiment. Can we get away without fully characterizing the quan- The probability of the measurement outcome is given by tum state, but by constructing an approximate classical p(a) = tr(ρMa) and the post measurement density ma- description that predicts many different functions of the † KaρKa trix can be written as † , where {Ka} are the state accurately? Shadow Tomography [2] precisely aims tr(KaρKa) to do this, namely, predict a power law number of obser- Kraus operators [14] corresponding to the POVM, with † vations in number of qubits, n, from O(n) copies of the KaKa = Ma. The operators {Ma} form a complete set density matrix ρ. This idea was taken further by Huang of Hermitian non-negative operators. Namely, they sat- † et al. [10] who have constructed such a description of isfy Hermiticity : Ma = M , P ositivity : hψ| Ma |ψi ≥ 0 a P low sample complexity via classical shadows (ˆρ), related for any vector |ψi and Completeness : a Ma = I. Such to states without any entanglement in the appropriate a POVM could be thought of as a partition of unity by basis, corresponding to each copy of ρ . non-negative operators. arXiv:2105.05992v2 [quant-ph] 26 May 2021

Quantum measurement requires specifying a set of Positive Operator Valued Measures (POVMs) [14] which A. Informational completeness is a generalization of a complete set of projection operators. The work by Huang et al. [10] involves measure- The density matrix, ρ, is a Hermitian and unit trace ments via projection operators. Since projection oper- operator. If we have a d-dimensional system, ρ will be a ators are not informationally complete (see Sec. II A), complex square matrix represented by d2 −1 real param- Huang et al. employ a set of random unitary transfor- eters. The operator space for this d-dimensional operator mations before taking measurements. In this work, we will be however spanned by d2 linearly independent ba- directly employ a complete or overcomplete POVM sys- sis operators. Note that PVMs only have d projection tem and perform shadow tomography. We also gain some operators. They are capable of providing only the diago- insight into how the choice of POVM affects the efficiency nal elements of ρ in a particular orthonormal basis, leav- of the method. ing out potential entanglement-related information from 2 the off-diagonal elements. Thus, PVMs are examples of III. CLASSICAL SHADOWS WITH POVMs POVMs that are informationally undercomplete. If the number of outcomes k satisfies k ≥ d2, and we Aaronson introduced the idea of “pretty good can form exactly d2 linearly independent operators by tomography”[1], with the focus on predicting many ob- linearly combining the set of POVMs, such POVMs will servations accurately, based on N copies of the density be called informationally complete (IC). However, in the matrix. This idea parallels the “learnability” of quantum most common terminology, informationally complete ac- states in a Probably Approximately Correct (PAC) sense tually refers to the minimally complete POVM (k = d2). [21]. Proceeding along this line, he later introduced the If we proceed to reconstruct the density matrix for an concept of Shadow Tomography [2], where from N copies informationally complete POVM, we can expand ρ as of the density matrix ρ, we want to predict L different d2−1 linear target functions tr(O1ρ), tr(O2ρ). . . . tr(OLρ) up to X an additive error less than . ρ = ξaMa. (1) a=0 Huang et al. [10] build their methods on the idea of Shadow Tomography [2]. They repeatedly perform If k = d2 we have a informationally complete a measurement procedure, i.e. apply a random uni- (minimally complete) basis set. However, if we have tary to rotate the state (ρ 7→ UρU †) and perform a k > d2 =⇒ it forms an informationally overcomplete computational-basis measurement. Then, after the mea- set [16]. IC POVM has been used for entanglement detec- surement, they apply the inverse of U to the resulting tion [8] and for individual elements of the density matrix computational basis state. This procedure collapses ρ [13]. to a snapshot U †|ˆbihˆb|U, producing a quantum channel We start out by giving the example of a rather sim- M, which depends on the ensemble of (random) unitary ple overcomplete set in the single-qubit Hilbert space, transformations. M . Pauli-6 POVM has 6 outcomes M = Pauli−6 Pauli−6 If the collection of unitaries is defined to be tomo- M = 1 ×|0ih0| ,M = 1 ×|1ih1| ,M = 1 ×|+ih+| ,M = 0 3 1 3 2 3 3 graphically complete, namely, if the condition i.e. for each 1 × |−ih−| ,M = 1 × |lihl| ,M = 1 × |rihr| where 3 4 3 5 3 σ 6= ρ, there exist U ∈ U and b such that hb|UσU †|bi= 6 {|0i , |1i}, {|+i , |−i}, and {|ri , |li} stand for the eigen- hb|UρU †|bi is met, then M — viewed as a linear map — bases of the Pauli operators σz, σx, and σy, respectively. has a unique inverse M−1. Huang et al. [10] set Experimentally, it can be implemented directly by first randomly choosing x, y, or z, and then measuring the respective Pauli operator, which justifies the 1/3 factor. ρˆ = M−1 U †|ˆbihˆb|U (classical shadow). (2) However, other probabilities will also be valid for this example of an overcomplete POVM. Now, let us give an example of a minimally com- −1 Although the inverted channel M is not physical (it plete POVM, the Pauli-4 POVM: MPauli−4 = M0 = is not completely positive), one can still apply M−1 to 1 1 1 † 3 × |0ih0| ,M1 = 3 × |+ih+| ,M2 = 3 × |lihl| ,M3 = the (classically stored) measurement outcome U |ˆbihˆb|U 1 × |1ih1| + |−ih−| + |rihr| . As a sanity check for the in a completely classical post-processing step. Even if an 3 P completeness relation, one can see a Ma = 1/3(|0ih0| + individual sample ofρ ˆ is not a density matrix, the expec- |1ih1|+|+ih+|+|−ih−|+|lihl|+|rihr|) = I. The experimen- tation ofρ ˆ’s is the original density matrix ρ. One can use tal procedure will be similar to that of Pauli-6 POVM, this property to get a good prediction of measurements with an additional step where three different outcomes performed on ρ. of Pauli-6 are identified as the single element of Pauli-4, If, instead of working with the computational basis M . Thus, this set contains an element which is not a 3 measurements, we decide to use an IC POVM (Sec. II A), rank-1 projector. we can avoid dealing with particular random unitary en- The third one is the tetrahedral POVM M = M = tetra a sembles. The only thing we need to make sure is that the 1 ( + s · σ) , whose outcomes correspond to 4 I a a∈{0,1,2,3} resulting channel M is invertible. sub-normalized rank-1 projectors along the directions √ √ 2 2 1 2 q 2 1 s0 = (0, 0, 1), s1 = ( 3 , 0, − 3 ), s2 = (− 3 , 3 , − 3 ), √ q and s = (− 2 , − 2 , − 1 ) in the Bloch sphere. Since 3 3 3 3 A. POVMs for the the n-qubit system the tetrahedron formed is regular, it forms an example of a symmetric informationally complete (SIC) POVM. n The experimental implementation of Mtetra relies on From single qubit POVMs {Ma}, we introduce k op- Neumark’s dilation theorem. The theorem implies that erators by taking tensor products and form POVMs for the n-qubit system: M = Ma1 ⊗ Ma2 ⊗ ..Man . Mtetra can be physically realized by coupling the system a1,...an qubit to an ancillary qubit and performing a von Neu- The outcomes of this measurements in this system are mann measurement on the two qubits (see Ref. [5, 19] of the form ~a = (a1, a2, ....an). Now, we discuss how to for explicit constructions). form shadows from such an observation. 3

B. A synthetic measurement channel

Let the POVM elements be diagonalised as follows: P a a Ma = i λi |i, aihi, a| , ∀i, a, λi ≥ 0, since Ma 0. + + Let f : R0 → R0 be a strictly monotonic function which will be applied to the eigenvalues of the POVM elements. + The function f is deﬁned on R0 since the eigenvalues are non-negative. The probability outcome ‘a’ is given as

p(a) = tr(ρMa). (3)

Each time we perform a measurement and get an outcome ‘a’, we construct a pure output state |i, aihi, a| a FIG. 1. The convex region in the figure is the set of admis- f(λi ) with probability p(i|a) = P a . We assume each j f(λj ) sible density matrices. We schematically describe the process Ma to be non-zero, guaranteeing that the denominator of forming classical shadows from N copies of ρ. For the i- P f(λa) 6= 0. Although this is a synthetic channel, we th observation with outcome ‘a’, the inverse of the channel, j j −1 will refer to it as the measurement channel, in analogy M , acts on the projectors |ψaihψa| to construct the shadow ρî. The sample mean of the shadows cast by ρ i.e. 1 P ρî with the case where {M } are projections. N i a fluctuates around the true ρ and could be outside the con- The measurement channel, for a single qubit, can be vex region. However, while measuring L k-local observables defined as [10] O~ = (O1,...OL), the convergence of the sample averages X X ~oˆ(N) to the true expected values can be guaranteed with a ρ˜ = M(ρ) = p(a) p(i|a) |i, aihi, a| . (4) number of samples O(log L). See Theorem 1. a i For simplicity, in the following discussion, we con- sider the case where the highest eigenvalue of each Ma is random Pauli measurements [10], we get a depolarizing non-degenerate. The modifications needed for the gen- channel i.e. a channel that contracts a pure state (lying eral case are obvious. If a particular POVM element on the surface of the Bloch sphere) towards the ‘center’ is not a rank one projector and the function f is very of the Bloch sphere, namely, the maximally mixed state steeply increasing, then the overwhelmingly likely output ρ = I2/2. The inverse (a non-physical map) can be com- is |ψai hψa|, where |ψai is the eigenvector corresponding puted, which can map a point inside the Bloch ball to to the highest eigenvalue of Ma. An example of such the outside. a function is f(λ) = λm in the large m limit. In the Multi-qubit system: For local measurements (not large m limit, as we perform a measurement, the output necessarily the depolarizing channel), the inverse channel is |ψaihψa| (snapshots) with probability tr(ρMa). The for the n-qubit system can be written as measurement channel can be defined as n X −1 O −1 ρ˜ = M(ρ) = tr(ρMa) |ψaihψa| . (5) Mn = M1 . (6) a j=1

In a more general scheme, like the one mentioned in the We can now reformulate the shadows with our overcom- beginning of the subsection, |ψai is a random vector cho- plete POVM set and its corresponding channel. For in- sen according to a probability distribution. For example, stance, when we work with Pauli-6 POVM, we will get in the current scheme, if the largest eigenvalue of Ma is degenerate, we choose any one of the corresponding n O −1 eigenvectors with equal probability. ρˆ = M1 (|ψa,jihψa,j|)(classical shadow), (7) In the formalism developed in [10], the channel and j=1 its inversion were related to the ensemble of (random) −1 unitary transformations (e.g. Clifford unitary ensemble). where M1 (X) = 3X − tr(X)I (see Sec. A 2). Note that The condition of tomographical completeness depended the 2n × 2n matrixρ ˆ need not be constructed explicitly. on the existence of a unitary transformation in the cho- We just need to store |ψa,ji for each qubit j. sen ensemble to distinguish different density matrices Since the inverted channel M−1 is not physical (it is [10]. However, with our reformulation of the measure- not completely positive), theρ ˆ in Eq. (7) need not be ment channel, we need to use an informationally com- physical. In other words, there is no guarantee the output plete set POVMs (e.g. Pauli-6, see Sec. II). of the inverse channel is positive semidefinite. See Fig. 1 In the example of a single qubit measured using the 6 for a schematic description. We recover the true density projectors coming from the 3 Pauli matrices i.e. Pauli- matrix only in expectation. However, if the shadow ma- 6 POVM, the channel and its inverse can be explicitly trix is forced to be positive semidefinite, we can see how computed. Similar to the classical shadows built out of the observations such as fidelity changes (see Sec. IV A). 4

C. Noisy shadow E. The algorithm and the guarantee of performance Earlier, we defined our measurement channel, Eq. (5). However, we can also let each of our qubits pass through a We want to predict the expected value of previously characterized noise channel E and then take 1 multiple k-local observables O1,...OL based the measurements [11]. The combined channel ME,1 is on shadows using the two algorithms below. given by Algorithm 1: Generating Shadows with POVMs X n ρ˜ = ME,1(ρ) = tr(E1(ρ)Ma) |ψaihψa| . (8) Input: IC POVM with k outcomes, ρ ∈ C2 (N a copies of the unknown density matrix) We used IC POVMs to ensure that the measurement 1 Compute the measurement channel M1 and its channel was invertible. As long as the action of the −1 inverse M1 for the chosen IC POVM. (See Sec. noise channel E1 itself is invertible, ME,1 is also invert- A 2) ; ible. We will work with an n-qubit noise channel of the 2 for i = 1,...N do Nn form En = j=1 E1. Thus, we can still write the inverse 3 Perform measurements using the POVM of the new noisy measurement channel for the n-qubit elements Ma to get outcomes aji ∈ {1, . . . , k} system in terms of the single qubit inverse shadow chan- ; −1 nel ME,1: 4 Construct shadows n −1 n N ρî = j=1 M1 ( ψaji ψaji ) (See Sec. III B, −1 O −1 ME,n = ME,1. (9) A for the general version) j=1 5 end If we choose an amplitude damping channel with Output: ρˆ1, ρˆ2 ... ρˆN damping parameter γ, one of the Kraus operator representations can be given as Algorithm 2: Predicting many properties using † † mean as an estimate EAD(ρ) = K0ρK0 + K1ρK1, (10) Input: A POVM set, N copies of unknown √ 1 0 0 γ density matrix ρ, L different k-local Pauli where K = , K = . 0 0 p(1 − γ) 1 0 0 observables O1,O2,...OL and error −1 parameters , δ The inverse of the noisy shadow channel MAD(X) is 1 Find bounds on the local observables B({O}, M). given in Eq. (A6). Its action on I and σx,y,z is given −1 γ −1 3 (See Sec.A 3 for details). as M ( ) = − σz, M (σx,y) = √ σx,y and AD I I 1−γ AD 1−γ B({O},M) log( 2L ) −1 3 2 Using algorithm.1, collect N ≥ δ M (σz) = σz. See Sec.A 2 for a general descrip- 22 AD 1−γ shadows. tion on the inversion of a noisy shadow channel. Here, 1 N (j) 3 Compute meanso ˆ = P tr O ρˆ we will construct the shadows (noisy) with the following i N j=1 i Output: o¯ , o¯ ... o¯ definition: 1 2 L n The existence of the bound is guaranteed by the fol- O −1 ρˆ = MAD(|ψa,jihψa,j|). (11) lowing theorem. j=1

2L B({O},M) log( δ ) D. Predicting linear functions with classical Theorem 1. With N ≥ 22 samples of shadows ρ, we can predict L different linear target functions tr(O1ρ), tr(O2ρ), . . . , tr(OLρ) up to additive error with maximum failure probability δ. Using the statistical properties of a single shadow, we can predict linear functions in the unknown state ρ as The constant bound B({O}, M) will depend on the o = tr(Oρ) = E[ô], whereo ˆ = tr(Oρˆ). (12) measurement channel M (which depends on the choice of In practice, using an array of shadows (i.e. N snapshots), POVM) and on the operator set {O} = {O1,O2, .., OL}). we can estimate the expectation o. Given an array of The important thing is that B({O}, M is bounded for so N independent classical snapshots (each defined as in called k-local operators, as defined in [10]. Eq. (7)) : For instance, if we choose Pauli-6, the bound is given n (1) (1) (N)o (j) k k S(ρ; N) = ρˆ , ρˆ ,..., ρˆ . (13) aso î ∈ [−3 , 3 ], in which case B(k-local, Pauli-6) = 4 × 9k (See Sec.A 3). See section III B for the algorithm, 1 PN (j) The sample mean iso ¯ = N j=1 tr Oρˆ . This sam- including the construction of the measurement channel. ple mean will fluctuate around the true prediction, with In the appendix section on sample complexity (Sec. A 3), E(¯o) = o. the details of the proof is provided. 5

(a) GHZ z z FIG. 2. Prediction of two-point correlations, hσi σj i, for noisy GHZ target states using classical shadows for Pauli-6, with 1-standard deviation band. The standard deviations are estimated over ten independent runs, each of which involved N = 5000 samples. The parameter p, representing the local depolarizing noise strength, is described in Eq. (14).

IV. NUMERICAL RESULTS

For many quantum systems in Condensed Matter Physics, one of the objects of interest is the two-point correlation function. Two-point correlators could be effi- ciently estimated using classical shadows based on Pauli- Z Z (b) Spin down 6 POVM. The predictions of two-point functions hσi σj i for the GHZ states with varying degree of noise is shown FIG. 3. Maximum error in two-point correlators. (a) Scal- in Fig. 2. ing of maximum error among all two-point correlations in 30 We can write the action of the single qubit depolarizing qubit pure GHZ state, plotted against different number of noise [14] on an arbitrary ρ written in the Bloch sphere samples for different choice of POVMs: Pauli-6, Pauli-4 and representation: tetrahedral. (b) Scaling of maximum error for all spin down state with Pauli-4 and Pauli-6 POVM. Pauli-4 ensures a much 4 4 better scaling. See Sec. A 3 for details. ρ0 = (1 − p)ρ + p . (14) 3 3 I

Applying this channel to every qubit, we generate a noise results in the three regimes: critical, ordered and para- GHZ state [9] from a pure one. The expected two-point magnetic. The exact numerical correlations are plotted Z Z 2 correlations hσi σj i varies as (1 − 4p/3) with the noise using the matrix product representations of the ground parameter p. states. [15]. In [5] and [12], POVM-based measurements, While predicting multiple 1,...,L, two-point or k-point followed by a neural-network-centric approach for con- correlations, we monitor the maximum possible error structing the ground state, and computing the resulting among all the observables. This measure of error is ex- two-point correlations were presented for the same sys- pected to go down with increasing number of samples. tem. This scaling, as seen in Fig. 3, gives us some idea of the appropriateness of a POVM set for a particular task. 2. 1D Disordered Heisenberg Model

1. 1D Transverse Field Ising Model The Hamiltonian for the 1D disordered Heisenberg model is given by, We take antiferromagnetic (J > 0 in Eq. (15) ) transverse ﬁeld Ising model in 1D: N ˆ 1 X x x x y y y z z z z H = − (Jj σj σj+1 + Jj σj σj+1 + Jj σj σj+1 + hσj ). X z z X x 2 H = J σi σj + h σ . (15) j=1 i (16) 1 The properties of spin- 2 antiferromagnetic chains with The quantum critical point at h/J = 1 will be exhibited various types of random exchange coupling has been by the power-law decay of the correlations. See Fig. 4 for studied in an exact decimation renormalization-group 6

(a) J = h = 1 (b) J > h, J = 1 and h = 0.5 (a) Exact Diagonalization (b) Reconstruction with Shadows

z z FIG. 5. Two-point functions hσi σj i for ground states of disordered 1D Heisenberg spin chain with length=10 and open boundary conditions. (a) Exact diagonalization results. (b) Results from using Pauli-6 POVM based shadows using 5000 samples.

z z FIG. 4. Two-point functions hσ0 σi i for ground states of antiferromagnetic 1D tranverse field Ising model using Pauli-6 POVM based shadows and the true value, as computed using matrix product states. The correlations are plotted against the lattice separation. The lattice size is 30 and the number of samples used is 5000. (a) Critical (J = h) antiferromagnetic 1D TFIM, showing signatures of power-law correlation. (b) Ordered state (J > h), where correlations saturates with increasing lattice separation. (c) The paramagnetic state (J < h), displaying exponential decay of the correlations. FIG. 6. Quantum fidelity predicted for the pure GHZ state using sample mean of shadows constructed on 104 samples. (strong-disorder) schemes, some of which involve gener- The shaded regions are the standard deviation over ten inde- alization or modifications of the scheme introduced by pendent runs. The inset shows the scaling of the variance of Dasgupta and Ma [7]. The numerical studies done by fidelity which grows exponentially with number of qubits. R.N. Bhatt and P.A. Lee [4] indicate that the system could be in a random-singlet phase. In such a phase, each spin is paired with another spin that may be far struct a hypothesis state (σ): away on the lattice. We perform exact diagonalization, N obtain the ground state and then compute two-point 1 X σ = ρ.ˆ (17) quantum correlations. The 2d plot of the correlation N matrix will also inform us about the locations of the i singlet formations in the chain. We can also reconstruct When our target state is pure, we can rewrite quantum these behavior of a ground-state corresponding to one fidelity as a linear prediction with our target observable particular disorder realization of the XXZ-Heisenberg given as O = |ψihψ|. Starting from this definition of model Eq. (16) (J = J = 2J , h = 0) with sufficient p√ √ 2 x y z quantum fidelity i.e. FQ(ρ, σ) = (tr( ρσ ρ)) , us- number of shadows. See Fig. 5, where the singlet ing ρ = |ψihψ| we get (trp(hψ| σ |ψi |ψihψ|))2. Further formations are indicated by the schematics drawn on simplification of the fidelity gives us the axes of the matrix visualization plots and the results from the two methods are compared. p 2 FQ(ρ, σ) = hψ| σ |ψi (tr( |ψihψ|)) = hψ| σ |ψi = tr(σO).

The measure tr(σO) is equivalent to fidelity, only when the latter is defined, i.e. when σ 0. That property is likely to hold only when the number of samples is large. A. Exploring quantum fidelity We expect tr(σO) to fluctuate around its mean value 1, as seen in Fig. 6, even when the typical σ is not a physical In our approach to construct shadows using local state, meaning it is not positive semidefinite. Also, the POVMs, we ensure prediction of local observables. How- fluctuation around this mean keeps on growing exponen- ever, we can also explore non-local observables such as tially with the number of qubits (see Fig. 6). This growth fidelity. Using sample mean as an estimator, we can con- cannot be dealt with even by the median of means (MoM) 7

plications of quantum devices. When we have additional information about the possible noisy channels we also adapt the shadow channel as a composition of the noise channel and the measurement channel. The invertibility becomes straightforward in the proposed framework. We also comment on why the mean as an estimator is sufficient throughout our discussion. And as long as we are dealing with local observables, we can provide efficient sample complexity using Hoeffding’s inequality directly. We provided instances where the choice of POVM im- pacts the sample complexity for predicting 2-point corre- FIG. 7. Quantum fidelity of the projected shadows (onto lators in certain quantum states for fixed maximum error. the physical positive definite space) with the noiseless GHZ 3 4 We noted that the different POVMs work better for dif- state. As we increase the number of samples, from 10 to 10 , ferent states. It is an exciting endeavour to understand the quantum fidelity improves. The shaded regions indicate which sets of POVM would be ideal for different classes 1-standard deviation bands, estimated over ten independent runs. of quantum states and observables. Although, an exploration, we attempt to reconstruct fidelity using the locally built shadows and show that we procedure [10] within the shadow formalism. Numerical cannot benefit from median of means as an estimator, computations using MoM also show no advantage over since variance of fidelity becomes exponential in number sample means here. of qubits. Additionally, when presented with few samples Hence, we need a procedure to find the ‘closest’ phys- we raise the issue of unphysical i.e. not positive semidef- ical state to σ. The trace condition tr(ˆσ) = 1 ensures initeρ ˆ and then provide a projection tecnique, similar that onceσ ˆ 0, some of the eigenvalues will be greater to [17], to estimate fidelity. Unfortunately, the estimator than 1 to compensate for the negative eigenvalues. Thus, no longer remains unbiased. Addressing this issue would we cannot just throw away the negative eigenvalues, as require methods to deal with non-local observables. would be done for projecting a Hermitian matrix to the We did not provide an effective analog of the global space of positive semidefinite matrices. Clifford unitary transformation-based method in [10]. We define the the convex set of physical states to be There has been work which provides description of global C = {ρ|ρ 0, tr(ρ) = 1}. Our nonlinear projection to C alternatives using stabilizer states [17]. Whether there is can be a scheme based on such states that is competi- tive with the classical shadows method [10] remains to 2 ΠC (σ) = arg min tr((ρ − σ) ). (18) be seen. ρ∈C The use of generalized measurement to unambigu- We achieve this by diagonalizing σ, projecting the ously discriminate non-orthogonal states with lower failure probability is well known [3, 6, 14]. Efficient predic- eigenvalues λi of σ onto a canonical simplex ∆ = p p p PD p tion of expectations of local observables combined with {(λ1, . . . , λD) | λi ≥ 0, i=1 λi = 1}, using the recipe from Ref. [23], while leaving the eigenvectors untouched. the generalized measurement scheme to obtain the shad- Here, D = 2n where n is the total number of qubits. ows can be used as an optimal framework in the discrim- The projected state is a biased estimator. We can hope ination of non-orthogonal states. In the future, it is a that the price paid by accepting some bias comes with promising direction of exploration. the benefit of reduced variance. This expectation seems to be born out in Fig. 7. However, as number of qubits increase, the bias itself reduces fidelity. To compensate ACKNOWLEDGEMENT this effect, we need larger sample sizes (N). Fig. 7 shows all these trends. We would like to thank Shagesh Sridharan, James Stokes, Miles Stoudenmire for insightful discussions.

V. DISCUSSIONS Appendix A: Appendix We provide an approach to predict expectations of local observables without having to apply random uni- 1. The Measurement Channel for Pauli-6 tary transformations, which sometimes require complex circuits of its own, and can become a practical bottle- We can take the simple rank-1 Pauli-6 POVMs to see neck. We show that this can rather be done using an IC the action of a measurement channel: POVM. For illustrations, we show faithful reconstruc- X 1 tion properties of low energy states coming from diﬀer- ρ˜ = M(ρ) = tr ( + r.σσ)M |ψ ihψ | , (A1) 2 I a a a ent many body Hamiltonians relevant to near-term ap- a 8

1 1 where we use the Bloch representation ρ = 2 (I + r.σσ). 3 (|1ih1|+|−ih−|+|rihr|) = I, all other elements are rank- The contribution of the ﬁrst two POVM elements of 1. Since M3 is rank-2, when the outcome is 3, we take the Pauli-6 only gets contribution from and r σ , generat- eigenvector |ti corresponding to eigenvalue 1 (1 + √1 ) in- I z z 2 3 ing stead of the other corresponding to 1 (1− √1 ). Rewriting 2 3 1 1 Eq. (A 2), we get tr( ( + r σ )M ) |0ih0| + tr( ( + r σ )M ) |1ih1| . 2 I z z 0 2 I z z 1 ρ˜ = M1(ρ) = tr(ρM0) |0ih0| + tr(ρM1) |+ih+| 1 1 (A4) Using M0 = 3 × |0ih0| and M1 = 3 × |1ih1| this ex- +tr(ρM2) |lihl| + M3) |tiht| . 1 1 pression becomes: 6 I + 3 rzσz Following similar steps for pairs M2,M3 and M4,M5, we get that: The inverse of the channel can be written as

xs √ X 1 1 1 M−1(X) = 6X − (√ − x ( 3 − 1))( σ ) − 5x M(ρ) = M (I + r.σσ) = (I + r.σσ), 1 0 i 0I 2 2 3 3 i P √ making M a depolarizing channel. tr(X) 3(tr(X( i σi)) + ( 3 − 1)x0) x0 = , xs = √ . 2 3 + 1 (A5) 2. Inverse of the measurement channel When we are working with a known noise channel E, the inverse is given as Tˆ−1 = Tˆ−1Tˆ−1. If we choose an Given any single qubit channel, the inverse can be ME E M easily computed using the Bloch-sphere representation. amplitude damping channel with a damping parameter We can write any 2 dimensional (single qubit) quan- γ, the inverse can be given as tum operation (X) as X = (x + ~r.~σ). Any arbi- 0I 3(1 − 1 tr(Xσ )) trary trace-preserving quantum operation is given as −1 2 z 3tr(σzX) ME (X) = p X + X 0 E 0 (1 − γ) 2(1 − γ) E(X) = (x0I + ~r .~σ). The map ~r −→ ~r is equivalent (A6) to, 1 3 γtr(X) +( − )tr(X) + σ . 2 p I 2(γ − 1) z 1 2 (1 − γ) ~r 0 = T~r + x ~c, T = tr(σ E(σ )). (A2) 0 i,j 2 i j

The components of displacement (~c) is given as ci = 3. Sample complexity 1 2 tr(σiE(I)). The affine map between the Bloch sphere and itself is given by T , and its meaning is under- a. Variance of the Estimate for a Single Observable stood better by doing a singular value decomposition 0 i.e. T = O1DO2 where O1,O2 are orthogonal matri- Given an array of N independent, classical snapshots ces. The singular values capture the deformation of the (each defined as Eq. (7)) : Bloch sphere about its principal axes. A superoperator Tˆ4×4 can be defined as n o S(ρ; N) = ρˆ(1), ρˆ(1),..., ρˆ(N) . (A7) ˆ x0 x0 ˆ 1 0 T = 0 , T4×4 = . (A3) N ~r ~r ~c T3×3 1 P (j) The sample mean iso ˆ = N j=1 tr Oρˆ . The bound on probability of deviation of the sample mean is given Computing inverse of the channel is equivalent to writing by Chebyshev’s inequality: 0 −1 (x0, ~r) from (x0, ~r ) i.e. computing Tˆ . In the main text, the Pauli measurement channel (E = V ar[ô] P r(|oˆ − [ô]| ≥ ) ≤ (A8) M1) turns out to be a depolarizing channel, and its in- E 2 verse that acts on the local qubit is given as [ô] = tr (Oρ) where ρ is the true density matrix. Fluc- −1 E M1 (X) = 3X − tr(X)I. tuations ofo ˆ around this desired expectation are con- 1 (j) trolled by the variance. V ar[ô] = N V ar[tr(Oρˆ )] = We take a more general example following our defini- (j) V ar[o ] . However, since the classical shadows are unit tion of a measurement channel: N trace by construction, the variance depends only on the X tr(O) ρ˜ = M(ρ) = tr(ρMa) |ψaihψa| . trace-less part of the observable i.e. O0 = O− 2n I. The a minimum number of samples needed to assure a maximum failure probability (δ) using Eq. (A8) is If a particular POVM element is not rank one, |ψai can be taken as the eigenvector corresponding to the high- V ar[o(j)] N ≥ . (A9) est eigenvalue of Ma. For Pauli-4, except for M3 = 2δ 9

b. Dependence on POVM variance: k (j) X Y −1 † 2 Given a measurement channel and an observable, we V ar[o ] ≤ P r(a1..an) hai| [M1 ] (Pi) |aii can bound the variance of its estimator, using familiar a1,.,an i=1 maneuvers with superoperators [10], k X a1 a2 an Y −1 † 2 = tr(ρM ⊗ M .. ⊗ M ) hai| [M1 ] (Pi) |aii (j) (j) 2 (j) 2 (j) 2 V ar[o ] = E((o ) ) − (E(o )) ≤ E((o ) ) a1,.,an i=1 k X 2 −1 † X Y 2 = P r(a1..an) ha1, ., an| [Mn ] (O0) |a1, ., ani a1 a1 an −1 † = tr[ρ (M ⊗ M .. ⊗ M hai| [M1 ] (Pi) |aii ] a1,.,an a1,.,an i=1 a1 a1 an where P r(a1, .., an) = tr(ρM ⊗ M .. ⊗ M ). k O X ai −1 † 2 O ⊗(n−k) = tr[ρ M hai| [M1 ] (Pi) |aii I ] We broadly define a k-local Pauli-observable as an oper- i=1 ai ator which acts nontrivially only on k qubits. Traceless k | {z } 5 +4P local operators can be expressed as linear conbination of I i k tensor products of indentity matrices and k or less Pauli O O = tr[ρ (5 + 4P ) ⊗(n−k)]. matrices. Hence, we need to focus only on special class I i I i=1 of k-local operators. Denoting Pi as one of the Pauli matrices acting on the ith qubit, we focus on of tenor Clearly, the above bound on variance is dependent on ⊗(n−k) products like O0 = P1 ⊗ P2 ⊗ .. ⊗ Pk ⊗ I , where, the state ρ, unlike the bound we obtained using Pauli- without loss of generality, we assume that the operator 6 POVM. Since ρ is a density matrix and the operator acts non-trivially on only the first k qubits. 5I + 4Pi is a PSD operator, one gets the minimum value For Pauli-6 POVM, the inverse of the measurement for the bound when ρ is of the form: channel is a self-adjoint map, and thus one can verify its k ! action as: O O ρ = |piihpi| ρñ−k, (A10) i=1 −1 † −1 [M1 ] (Pα) = M1 (Pα) = 3Pα, where |piihpi| is the projector into the eigenvector corresponding to the lowest eigenvalue of the operator 5 +4P , where P denotes a Pauli matrix and [M−1]†( ) = I i α 1 I andρ ˜ is a valid density matrix in the Hilbert space M−1( ) = . Given a k-local observable, we can further n−k 1 I I of n − k qubits on which the k-local Pauli-observable compute the bound on variance: acts trivially. For the above ρ, it is simple to verify that the value of the variance bound is 1 (independent k X Y 2 of k). Thus, for example, if the unknown state ρ is the V ar[o(j)] ≤ P r(a ..a ) ha | 3P |a i 1 n i i i all spin down state, then Pauli-4 POVM works better a1,.,an i=1 than Pauli-6 POVM in predicting two-point correlators k hσZ σZ i, since the variance is higher in the latter. X a1 a2 an Y 2 i j = tr(ρM ⊗ M .. ⊗ M ) hai| 3Pi |aii

a1,.,an i=1 k c. Improved Bound Using Hoeﬀding’s Inequality X a1 a1 an Y 2 = tr[ρ (M ⊗ M .. ⊗ M hai| 3Pi |aii) ] a1,.,an i=1 Furthermore, we can use Hoeﬀding’s inequality to pro- k vide theoretical bounds when we are dealing with k-local O X ai 2 O ⊗(n−k) k = tr[ρ M hai| 3Pi |aii I ] = 3 . Pauli observable, since we are working with bounded ran- 1 N (j) (j) i=1 ai dom variables. Ifo ˆ = P tr Oρˆ ,o ˆ ∈ [a, b] for | {z } N j=1 3I all j, where −∞ < a ≤ b ≤ ∞, we can write,

2 ( −2N ) Now, we take up Pauli-4 POVM. One can verify the ac- P r(|oˆ − E [ô]| ≥ ) ≤ 2e (b−a)2 . (A11) −1 † tion of [M1 ] as: The minimum number of samples needed to assure a √ √ X √ maximum failure probability (δ) among all the observ- [M−1]†(P ) = (2 − 3) + (3 + 3)P − (3 − 3)P , 1 α I α β ables using Eq. (A13) is β6=α 2(b − a)2 N ≥ log . (A12) where Pα denotes a Pauli matrix. Using the fact that δ 22 −1 M1 is a trace preserving map, one can say its adjoint −1 † 2 has to be unital i.e. [M1 ] (I) = I. Given a k-local B({O}, M) = (b − a) depends on the locality of the Pauli-observable, one can again compute the bound on observable and the maximum eigenvalue λmax of the 10 inverse channel acting on the observable. The bound The scaling is logarithmic in the number of observables on random variableo ˆ(j) can be found as the range of L, instead of linear behavior we get using Chebyshev’s the Rayleigh quotient of the inverse of the measure- inequality. We do not need to use MoM procedure [10], ment channel, acting on the observable over all possible which would have been necessary if we were dealing with states. For instance, if we choose Pauli-6, the bounds estimate distributions with long tails (unlike the bounded can be shown to lie withino ˆ(j) ∈ [−3k, 3k] in which estimates for k-local Pauli observables). case B(k-local, Pauli-6) = 4 × 9k. Using the action of [M−1]†, one can verify that the value of the random vari- −1 † able tr([M ] (Pi) |aiihai|) belongs to the set {5, −1} for any Pauli matrix Pi when |aiihai|) is the inferred state for Pauli-4. Thus, the random variableo ˆ is contained in the range {−5k, 5k} which is exponential on the locality rather than the number of qubits. 4. Numerical computations d. The Guarantee of Performance for Multiple Observables The computations for GHZ states has been carried out using MPS matrix product state (MPS) representations If we have L different k-local Pauli observables for noiseless states and matrix product operator (MPO) representations for the noisy states. The details on the oˆ1,..., oî,..., oˆL with the sample mean corresponding to N simulation of mixed states using MPOs has been shown the observable i defined aso ˆ = 1 P tr O ρˆ(j). If i N j=1 i in [5, 22]. The data-sets corresponding to the ground 1 PN (j) (j) oî = N j=1 tr Oiρˆ ,o î ∈ [a, b] for all j, where states of spin Hamiltonians such as the Transverse-field −∞ < a ≤ b < ∞, we can combine the union bound Ising model have been generated using the density ma- with Hoeffding’s inequality to write trix renormalization group (DMRG). The library used

2 is mpnum (a matrix product represenation library for ( −2N ) (b−a)2 Python). [18]. Given a particular spin model, with the P r( max |oî − E [ôi]| ≥ ) ≤ 2Le . (A13) 1≤i≤L Hamiltonian expressed as a MPO, the DMRG algorithm attempts to find the optimal MPS with the lowest en- The minimum number of samples needed to assure a ergy. However, for the visualization of correlations in the maximum failure probability (δ) among all the observ- disordered 1d Heisenberg spin chain, the computations ables using Eq. (A13) is are made without using the DMRG framework. This is 2L(b − a)2 because the number of sites were low for this particular N ≥ log . (A14) numerical experiment. δ 22

[1] Scott Aaronson. The learnability of quantum states. Pro- 8, 2018. ceedings of the Royal Society A: Mathematical, Physi- [9] Daniel M. Greenberger, Michael A. Horne, and Anton cal and Engineering Sciences, 463(2088):3089–3114, Sep Zeilinger. Going beyond Bell’s theorem. eprint arXiv: 2007. 0712.0921, 2007. [2] Scott Aaronson. Shadow tomography of quantum states, [10] Hsin-Yuan Huang, Richard Kueng, and John Preskill. 2018. Predicting many properties of a quantum system from [3] Stephen M. Barnett and Sarah Croke. Quantum state very few measurements. Nature Physics, Jun 2020. discrimination. Adv. Opt. Photon., 1(2):238–278, Apr [11] Dax Enshan Koh and Sabee Grewal. Classical shadows 2009. with noise. e-print arXiv:2011.11580, 2020. [4] R. N. Bhatt and P. A. Lee. Scaling studies of highly [12] Ilia A. Luchnikov, Alexander Ryzhov, Pieter-Jan Stas, disordered spin-½ antiferromagnetic systems. Phys. Rev. Sergey N. Filippov, and Henni Ouerdane. Variational au- Lett., 48:344–347, Feb 1982. toencoder reconstruction of complex many-body physics. [5] Juan Carrasquilla, Giacomo Torlai, Roger G. Melko, Entropy, 21(11):1091, Nov 2019. and Leandro Aolita. Reconstructing quantum states [13] Joshua Morris and Borivoje Dakić. Selective quan- with generative models. Nature Machine Intelligence, tum state tomography. arXiv preprint arXiv:1909.05880, 1(3):155–161, Mar 2019. 2019. [6] Anthony Chefles. Quantum state discrimination. Con- [14] M. A Nielsen and I. L. Chuang. Quantum Computation temporary Physics, 41(6):401–424, Nov 2000. and Quantum Information: 10th Anniversary Edition. [7] Chandan Dasgupta and Shang-Keng Ma. Low- Cambridge University Press, 10 edition, 2011. temperature properties of the random Heisenberg anti- [15] RománOrús. A practical introduction to tensor net- ferromagnetic chain. Phys. Rev. B, 22(3):1305–1319, Au- works: Matrix product states and projected entangled gust 1980. pair states. Annals of Physics, 349:117–158, Oct 2014. [8] Aleksandra Dimićand Borivoje Dakić. Single-copy en- [16] Joseph M. Renes, Robin Blume-Kohout, A. J. Scott, tanglement detection. npj Quantum Information, 4(1):1– and Carlton M. Caves. Symmetric informationally com- 11

plete quantum measurements. Journal of Mathematical [20] Giacomo Torlai, Guglielmo Mazzola, Juan Carrasquilla, Physics, 45(6):2171–2180, Jun 2004. Matthias Troyer, Roger Melko, and Giuseppe Carleo. [17] G.I. Struchalin, Ya. A. Zagorovskii, E.V. Kovlakov, S.S. Neural-network quantum state tomography. Nature Straupe, and S.P. Kulik. Experimental estimation of Physics, 14(5):447–450, Feb 2018. quantum state properties from classical shadows. PRX [21] L. G. Valiant. A theory of the learnable. Communications Quantum, 2(1), Jan 2021. of the ACM, 1984. [18] Daniel Suess and Milan Holzäpfel. mpnum: A matrix [22] F. Verstraete, J. J. Garc´ıa-Ripoll, and J. I. Cirac. Ma- product representation library for Python. Journal of trix product density operators: Simulation of finite- Open Source Software, 2(20):465, 2017. temperature and dissipative systems. Phys. Rev. Lett., [19] Gelo Noel M. Tabia. Experimental scheme for qubit 93:207204, Nov 2004. and qutrit symmetric informationally complete positive [23] Weiran Wang and Miguel A.´ Carreira-Perpiñán. Pro- operator-valued measurements using multiport devices. jection onto the probability simplex: An efficient algo- Phys. Rev. A, 86:062107, Dec 2012. rithm with a simple proof, and an application. CoRR, abs/1309.1541, 2013.