Information Scrambling in Computationally Complex Quantum Circuits

Xiao Mi,1, ∗ Pedram Roushan,1, ∗ Chris Quintana,1, ∗ Salvatore Mandr`a,2, 3 Jeffrey Marshall,2, 4 Charles Neill,1 Frank Arute,1 Kunal Arya,1 Juan Atalaya,1 Ryan Babbush,1 Joseph C. Bardin,1, 5 Rami Barends,1 Andreas Bengtsson,1 Sergio Boixo,1 Alexandre Bourassa,1, 6 Michael Broughton,1 Bob B. Buckley,1 David A. Buell,1 Brian Burkett,1 Nicholas Bushnell,1 Zijun Chen,1 Benjamin Chiaro,1 Roberto Collins,1 William Courtney,1 Sean Demura,1 Alan R. Derk,1 Andrew Dunsworth,1 Daniel Eppens,1 Catherine Erickson,1 Edward Farhi,1 Austin G. Fowler,1 Brooks Foxen,1 Craig Gidney,1 Marissa Giustina,1 Jonathan A. Gross,1 Matthew P. Harrigan,1 Sean D. Harrington,1 Jeremy Hilton,1 Alan Ho,1 Sabrina Hong,1 Trent Huang,1 William J. Huggins,1 L. B. Ioffe,1 Sergei V. Isakov,1 Evan Jeffrey,1 Zhang Jiang,1 Cody Jones,1 Dvir Kafri,1 Julian Kelly,1 Seon Kim,1 Alexei Kitaev,1, 7 Paul V. Klimov,1 Alexander N. Korotkov,1, 8 Fedor Kostritsa,1 David Landhuis,1 Pavel Laptev,1 Erik Lucero,1 Orion Martin,1 Jarrod R. McClean,1 Trevor McCourt,1 Matt McEwen,1, 9 Anthony Megrant,1 Kevin C. Miao,1 Masoud Mohseni,1 Wojciech Mruczkiewicz,1 Josh Mutus,1 Ofer Naaman,1 Matthew Neeley,1 Michael Newman,1 Murphy Yuezhen Niu,1 Thomas E. O’Brien,1 Alex Opremcak,1 Eric Ostby,1 Balint Pato,1 Andre Petukhov,1 Nicholas Redd,1 Nicholas C. Rubin,1 Daniel Sank,1 Kevin J. Satzinger,1 Vladimir Shvarts,1 Doug Strain,1 Marco Szalay,1 Matthew D. Trevithick,1 Benjamin Villalonga,1 Theodore White,1 Z. Jamie Yao,1 Ping Yeh,1 Adam Zalcman,1 Hartmut Neven,1 Igor Aleiner,1 Kostyantyn Kechedzhi,1, † Vadim Smelyanskiy,1, ‡ and Yu Chen1, § 1Google Research 2QuAIL, NASA Ames Research Center, Moffett Field, California 94035, USA 3KBR, Inc., 601 Jefferson St., Houston, TX 77002, USA 4USRA Research Institute for Advanced Computer Science, Mountain View, California 94043, USA 5Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 6Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL 7California Institute of Technology, Pasadena, CA 91125 8Department of Electrical and Computer Engineering, University of California, Riverside, CA 9Department of Physics, University of California, Santa Barbara, CA Interaction in quantum systems can spread initially localized quantum information into the many degrees of freedom of the entire system. Understanding this process, known as quantum scrambling, is the key to resolving various conundrums in physics. Here, by measuring the time-dependent evo- lution and fluctuation of out-of-time-order correlators, we experimentally investigate the dynamics of quantum scrambling on a 53-qubit quantum processor. We engineer quantum circuits that dis- tinguish the two mechanisms associated with quantum scrambling, operator spreading and operator entanglement, and experimentally observe their respective signatures. We show that while operator spreading is captured by an efficient classical model, operator entanglement requires exponentially scaled computational resources to simulate. These results open the path to studying complex and practically relevant physical observables with near-term quantum processors.

The inception of quantum computers was motivated modeling its dynamics is the key to resolving a num- by their ability to simulate dynamical processes that are ber of physical phenomena, such as the fast-scrambling challenging for classical computation [1]. However, while conjecture for black holes [9, 10], non-Fermi liquid be- the size of the Hilbert space scales exponentially with the haviors [15, 16] and many-body localization [17]. Under- number of qubits, quantum dynamics can be simulated in standing scrambling also provides a basis for designing polynomial times when entanglement is insufficient [2–4] algorithms in quantum benchmarking or machine learn- or when they belong to special classes such as the Clif- ing that would benefit from efficient exploration of the arXiv:2101.08870v1 [quant-ph] 21 Jan 2021 ford group [5–7]. A physical process that fully lever- Hilbert space [18–20]. ages the computational power of quantum processors is A precise formulation of quantum scrambling is found quantum scrambling, which describes how interaction in in the Heisenberg picture, where quantum operators a quantum system disperses local information into its evolve and quantum states are stationary. Analogous to many degrees of freedom [8–12]. Quantum scrambling is classical chaos, scrambling manifests itself as a “butterfly the underlying mechanism for the thermalization of iso- effect”, wherein a local perturbation is rapidly amplified lated quantum systems [13, 14] and as such, accurately over time [11, 21]. More specifically, the perturbation is realized as an initially local operator (the “butterfly operator”) Oˆ, typically a Pauli operator acting on one ∗ These authors contributed equally of the qubits (the “butterfly qubit”). When the quan- † Corresponding author: kostyantyn@.com tum system undergoes a dynamical process Uˆ, the butter- ‡ Corresponding author: [email protected] fly operator Oˆ acquires a time-dependence and becomes § Corresponding author: [email protected] Oˆ (t) = Uˆ †OˆUˆ, with Uˆ † being the inverse of Uˆ. The 2

n ˆ ˆ P p ˆ Cycle 1 Cycle 2 Cycle K resulting O (t) can be expanded as O (t) = i=1 wiBi, A CZ CZ ෝy B (i) (i) −1 Qa: |+Yۧ Q1 X V Y ˆ

where Bi =ρ ˆ1 ⊗ ρˆ2 ⊗ ... are basis operators consisting EG Q1: |+Xۧ EG (i) Q W−1 X V of single-qubit operatorsρ ˆ acting on different qubits 2

j Q2: |+Xۧ EG −1 −1 and wi are their coefficients. ෡ ෡† Q3 X W X

Qb: |+Xۧ 푈 푈

EG EG Quantum scrambling is enabled by two different mech- Q Y X−1 Y−1 Q |+Xۧ N anisms: operator spreading and operator entanglement N 푈෡ [11, 22–26]. Operator spreading refers to the transfor- C Qa Q1 Q2 Q20 mation of basis operators such that on average, each ˆ

Bi involves a higher number of non-identity single-qubit 0.8 No Butterfly Operator 1.0 ഥ

operators. Operator entanglement, on the other hand, 0.6 C , , refers to the increase in np, i.e. the minimum number 0.4 0.5

ˆ y

of terms required to expand O (t). Independent char- ෝ  0.2 acterizations of these two mechanisms are essential for a 0.0 0.0 complete understanding of the nature of quantum scram- -0.2 OTOC Average bling. Quantifying the degree of operator entanglement -0.4 -0.5 also holds the key to assessing the classical simulation 0 5 10 15 20 0 5 10 15 20 complexity of quantum observables [27]. However, oper- Cycles Cycles ator spreading and operator entanglement are generally intertwined and indistinguishable in past experimental FIG. 1. OTOC measurement protocol. (A) Measurement studies of quantum scrambling [28–32]. scheme for the OTOC of a quantum circuit Uˆ. A quantum In this Article, we perform a comprehensive characteri- system (qubits Q1 through QN ) is first initialized in a super- j=N zation of quantum scrambling in a two-dimensional (2D) position state Q | + Xi , where | + Xi = √1 (|0i + |1i) j=1 j j 2 j quantum system of 53 superconducting qubits. Signa- and |0i (|1i) is the ground (excited) state of individual qubits. tures of operator spreading and operator entanglement Uˆ and its inverse Uˆ † are then applied, with a butterfly oper- are clearly distinguished in our experiment. These re- ator (realized as an X gate on qubit Qb) inserted in-between. sults are enabled by quantum circuit designs that inde- An ancilla qubit Qa is initialized along the y-axis of the Bloch pendently tune the degree of each scrambling mechanism, sphere and entangled with Q1 by a pair of CZ gates. The as well as extensive error-mitigation techniques that al- y-axis projection of Qa, hσˆyi, is measured at the end. (B) ˆ low us to faithfully recover coherent experimental signals The structure of U consists of K cycles: each cycle includes√ ±1 in the presence of substantial noise. Lastly, we find that one√ layer√ of single-qubit√ gates (randomly chosen from X , while operator spreading can be efficiently predicted by a Y ±1, W ±1 and V ±1) and one layer of two-qubit entan- gling gates (EG). Here W = X√+Y and V = X√−Y . (C) Left classical stochastic process, simulating the experimental 2 2 signature of operator entanglement is significantly more panel: The filled circles represent hσˆyi measured with Qb suc- costly with a computational resource that scales expo- cessively chosen from Q2 through QN . The open grey circles are a set of normalization values, corresponding to hσˆ i with- nentially with the size of the quantum circuit. y out applying the butterfly operator. The data are plotted over Our experiment approach is based on evaluating the ˆ ˆ different numbers of cycles in U and averaged over 60 random correlator between O (t) and a “measurement operator”, circuit instances. Right panel: Experimental average OTOCs ˆ M, which is another Pauli operator on a different qubit C for different Qb, obtained from dividing the corresponding (the “measurement qubit”): hσˆyi by the normalization values.

C(t) = hOˆ†(t) Mˆ † Oˆ(t) Mˆ i. (1) Here h...i denotes the expectation value over a particular sufficient to identify operator spreading through the de- quantum state. C(t) is commonly known as the out-of- cay of C, whereas any insight into operator entanglement time-order correlator (OTOC) and related to the commu- necessitates an estimate of δC. tator [Oˆ(t), Mˆ ] by C(t) = 1 − 1 h|[Oˆ(t), Mˆ ]|2i [11, 21, 34– The measurement protocol for OTOC is described in 2 ˆ 37]. Quantum scrambling is characterized by measuring Fig.1A and consists of a quantum circuit U and its in- ˆ † ˆ C over a collection of quantum circuits with microscopic verse U , with a butterfly operator O (Pauli operator (b) differences, e.g. phases of individual gates. Operator σˆx on Qb) inserted in-between. An ancilla qubit Qa, spreading is then reflected in the average OTOC value, connected to the measurement qubit Q1 via a controlled- C, which decays from 1 when Oˆ(t) and Mˆ overlap and phase (CZ) gate, measures C between Oˆ and Mˆ (Pauli (1) no longer commute [30, 31]. In the fully scrambled limit operatorσ ˆz on Q1) through its hσˆyi [35, 38]. For this where the commutation between Oˆ(t) and Mˆ is com- work, we employ quantum circuits composed of random pletely randomized, C becomes ∼0. If operator entan- single-qubit gates and fixed two-qubit gates (Fig.1B) glement is also present (i.e. np  1), C approaches 0 due to the wide range of quantum scrambling that may for all circuits and their fluctuation δC vanishes as well. be achieved with limited circuit depths [33, 39]. The This is because each C includes contributions from many OTOC measurement protocol is first implemented on a basis operators Bˆi with different phases. It is therefore one-dimensional (1D) chain of 21 qubits (Fig.1C). We 3

A 6 Cycles 8 Cycles 10 Cycles 13 Cycles 15 Cycles C Cഥ (iSWAP) 1.0 1.0 0.8 0.8 Qa 0.6

0.6 Q1 ഥ C 0.4 0.4 0.2 0.2 0.0 0.0 -0.2 -0.2 -0.4 0 5 10 15 Cycles B 6 Cycles 10 Cycles 14 Cycles 18 Cycles 22 Cycles Cഥ ( iSWAP) 1.0

1.0 0.8 0.8 Q a 0.6

0.6 Q1

ഥ C 0.4 0.4 0.2 0.2 0.0 -0.2 0.0 -0.4 0 5 10 15 20 25 Cycles

FIG. 2. OTOC propagation and speed of operator spreading. (A) Spatial profiles of average OTOCs, C, measured on the full 53-qubit processor. The ancilla qubit Qa and the the measurement qubit Q1 are indicated by the red arrows. The colors of other filled circles represent C with different choices of the butterfly qubit Qb. The two-qubit gates are iSWAP and applied between all nearest-neighbor qubits, with the same order as done in Ref. [33].√ The dashed lines delineate the light-cone of Q1. The data are averaged over 38 circuit instances. (B) Similar to (A) but with iSWAP as the two-qubit gates. Here C is averaged over√ 24 circuit instances. (C) Cycle-dependent C for 4 different choices of Qb. The top (bottom) panel shows data with iSWAP ( iSWAP) being the two-qubit gates. The colors of the data points indicate the locations of Qb (inset to bottom panel). Dashed lines show theoretical predictions based on a classical population dynamics model.

use the qubit at one end of the chain as Qa and succes- C converges to 0, indicating that Oˆ (t) and Mˆ have over- sively choose qubits Q2 through Q20 as Qb. A two-qubit lapped and no longer commute. In addition, we observe gate (iSWAP) is applied to each pair (Qj,Qj+1), where that the time-evolution of C for each Qb resembles a bal- j = 0, 2, 4... in odd circuit cycles and j = 1, 3, 5... in even listically propagating wave. The front of each wave co- circuit cycles. incides with the edge of the “light-cone” associated with Q , i.e. the set of qubits that have been entangled with In the left panel of Fig.1C, experimental values of 1 Q . This profile is attributed to the iSWAP gates used in hσˆ i are shown for different numbers of cycles in Uˆ. Here 1 y these circuits, which spread single-qubit operators at the we average hσˆ i over 60 circuit instances to first focus y same rate as their light-cones expand [42]. For generic on operator spreading. It is seen that hσˆ i decays as y quantum circuits, the spreading velocity (a.k.a. the but- early as cycle 1, irrespective of the location of Q . This b terfly velocity) is typically slower. Using the full 2D sys- observation contradicts the 1D geometry which requires tem, we next demonstrate how the evolution of C may h + 1 circuit cycles before the time-evolved butterfly op- be used to diagnose the butterfly velocity of operator erator Oˆ (t) overlaps with Mˆ , with h being the number spreading. of qubits between Qb and Q1. The signals are there- fore complicated by errors in the quantum circuits, such In Fig.2A, the spatial distribution of C is shown for as mismatch between Uˆ and Uˆ † or qubit decoherence five different numbers of cycles in Uˆ, with iSWAP still [29, 30]. These deleterious effects are mitigated by ad- being the two-qubit gate. It is seen that the number of ditionally measuring hσˆ i without applying the butterfly y qubits with C < 1 rapidly increases with the number operator [41]. These data, referred to as normalization of cycles, consistent with the spatial spread of the time- values, are also shown in the left panel of Fig.1C and ap- evolved butterfly operator. Moreover, for each circuit cy- proximately equal to the total fidelities of Uˆ and Uˆ † [29]. cle, the values of C abruptly change across the edge of the We then divide hσˆ i for each Q by the normalization y b light-cone associated with Q (dashed lines in Fig.2A). values to recover the effects of scrambling [40]. 1 In contrast, the spatiotemporal evolution of C shown in The normalized data, equal to the average OTOCs Fig.2B is significantly√ different. Here the iSWAP gates C after error-mitigation, are shown in the right panel are replaced with iSWAP gates and the decay of C is of Fig.1C and exhibit features consistent with operator slower. Qubits far from Q1 still retain average OTOC spreading: For each location of Qb, C retains values near values closer to 1 even after 22 cycles. The sharp, step- 1 before sufficient circuit cycles have occurred to allow √like spatial transition seen with iSWAP is also absent for an overlap between Oˆ (t) and Mˆ . Beyond these cycles, iSWAP. Instead, C changes in a gradual fashion as Qb 4

ND: 0 1 2 4 8 16 32 A B C 0.06 Experiment 120

iSWAP 0.04 Simulation ഥ C 0.02 Iመ X 100 0.00 12 Cycles -0.02 Cliffords ෝx iSWAP 80 1.00 C 106 ෝ 

y p 104 n W 60 102 ෝz 100

+ 0 10 20 30

Cliffords Circuit Instance Circuit

- 40 ND 0.10

Non V 20 + Experiment

OTOC Fluctuation, Fluctuation, OTOC Simulation + + 0 0.02 + + -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 -1 0 1 0 5 10 15 20 25 30 Instance-Specific OTOC, C ND

FIG. 3. OTOC fluctuation and signature of operator entanglement. (A) Transformation of a butterfly operator into products of Pauli operators (Pauli strings) by quantum gates. In this example, an initial butterfly operator Iˆ(1)σˆ(2)Iˆ(3)Iˆ(4)... is √ x (1) (2) (3) (4) (1) (2) (3) (4) mapped into Iˆ σˆz σˆy Iˆ ... by an iSWAP gate, then into Iˆ σˆy σˆy Iˆ ... by an X gate and so on. In the last two steps, the butterfly operator evolves into a superposition of multiple Pauli strings (coefficients not shown). (B) OTOCs of individual random circuit instances, C, measured with the number of non-Clifford gates in Uˆ, ND, fixed at different values. Dashed lines are numerical simulation results. The inset shows locations of Qa (open black circle), Q1 (filled black circle), and Qb (filled blue circle) as well as the number of circuit cycles with which the data are taken. (C) The mean C (upper panel) and RMS values δC (lower panel) of C for different ND. Dashed lines are computed from the numerically simulated values in (B). Inset: Average numbers of Pauli strings in the time-evolved butterfly operator Oˆ (t), np, for different ND. The scaling behavior is 0.79ND np ≈ 2 [40].

moves further away from Q1. used to estimate C. In this classical picture, the OTOC The different OTOC behaviors can alternatively be wavefront corresponds to the boundary separating the seen in the full temporal evolution of four specific qubits empty region from the region populated by particles. (Fig.2C). For iSWAP, the shape of the OTOC wavefront The√ difference in OTOC propagation between iSWAP remains sharp and relatively insensitive to the location of and iSWAP is then captured by the dependence of Ω on Qb, similar to the 1D example in Fig.1C. On the√ other θ: After each application of an iSWAP gate (θ = π/2), hand, the wavefront propagates more slowly for iSWAP the particle occupation always changes (with the excep- and also broadens as the distance between Qb and Q1 in- tion of ♦♦). In particular, any previously empty site will creases. As a result, more√ circuit cycles are required be- be filled and the region populated by particles always fore C reaches√ 0 for iSWAP. The wavefront behavior grows. This leads to the observed maximal√ butterfly ve- seen with iSWAP is similar to generic quantum circuits locity. In contrast, the application of any iSWAP gate analyzed in past works [23, 24]. (θ = π/4) can leave the particle occupation unchanged 5 The observed features of average OTOCs are quanti- with a probability 12 . C therefore decays more slowly tatively understood by mapping operator spreading to a in this case and its broadening is explained by the fact classical Markov process involving population dynamics that the wavefront spreads at a different velocity for each [40]. In this model, the 2D qubit lattice is populated trial of the Markov process. The predicted values of C, by fictitious particles representing two copies of a single- plotted as dashed lines in Fig.2C, agree well with the qubit operator. The initial state of the entire system experimental data and indicate that the dynamics of op- is a single particle at the site of Qa. Whenever a two- erator spreading can be reliably predicted by classical qubit gate is applied to two neighboring lattice sites, their models. We note that√ the effect of noise is included when particle occupation changes between four possible states: C is calculated for iSWAP as it is found to introduce ♦♦ (both empty), ♦ (left empty, right filled), ♦ (right deformation to the observed signals [40]. empty, left filled) and  (both filled). The transition Unlike operator spreading, an efficient classical de- probabilities are described by the stochastic matrix: scription of operator spreading is not known to exist. In particular, population dynamics cannot be used to model   1 0 0 0 the circuit-to-circuit fluctuation of OTOCs [40]. Resolv-  0 1 − a − b a b  ing the growth of operator spreading is also difficult with Ω =   , (2)  0 a 1 − a − b b  a quantum processor since it is often accompanied by b b 2 0 3 3 1 − 3 b increased operator spreading [11, 22–24]. We overcome this challenge by gradually adjusting the composition of 1 4 1 1 2 2  Uˆ and Uˆ †, realizing a group of circuits with predomi- where a = 3 sin θ, b = 3 2 sin 2θ + 2 sin θ with θ √ √ ±1 ±1 being the swap angle of the two-qubit gate. The average nantly Clifford gates (iSWAP, X √ or Y )√ and a ±1 ±1 probability of finding a particle at the site of Q1 is then small number of non-Clifford gates ( W and V ). 5

A 11 Cycles 12 Cycles 15 Cycles B This is expected as the time-evolved butterfly operator NS = 159 NS = 208 NS = 251 Signal Expt. Error Oˆ(t) is a single Pauli string and therefore either com- 0.1 mutes or anti-commutes with the measurement operator Mˆ . As more non-Clifford gates are introduced into the 0.01 circuits, C starts to assume intermediate values between 3 100 2 ±1 and converges toward 0.

SNR 1 80 The mean C and fluctuation (i.e. root-mean-square 160 180 200 220 240 260 value) δC of C are then computed from experimental N 60 C S data and plotted against ND in Fig.3C. We observe dif- ferent behaviors for C and δC: C remains largely con- 1020 40 stant and close to 0, confirming that operator spread-

Circuit Instance Circuit 1010 ing remains unaffected by the increasing number of non- 20 (Hours)

sim Clifford gates. On the other hand, δ decays from an

t C

= 64 = 40 =

= 64 = 100

D D D

N N N initial value of 1 and is almost suppressed by two orders 0 -1 0 1 -0.2 0.0 0.2 -0.1 0.0 0.1 200 300 400 500 of magnitude as ND increases from 0 to 32. Over the C C C N S same range of ND, we have numerically calculated the average numbers of Pauli strings in Oˆ(t), np, which are FIG. 4. Classical simulation complexity of quantum seen to increase exponentially (inset to Fig.3C). These scrambling. (A) Top panels show three different circuit con- results demonstrate that the decay of OTOC fluctuation figurations, each having the same Q (black unfilled circle) a allows the growth of operator entanglement to be exper- and Q1 (black filled circle) but different Qb (colored circle) and number of cycles in Uˆ. The number of iSWAPs in Uˆ imentally diagnosed. † and Uˆ that affect classical simulation costs, NS, is indicated To determine the accuracy of our measurements, the for each configuration. Bottom panels show the instance- OTOCs of experimental circuits are simulated using a dependent OTOCs measured for each configuration. The Clifford-expansion method and overlaid on the data in dashed lines are simulation results using tensor-contraction Fig.3B and Fig.3C[40]. We find good agreement be- methods. N is also indicated for each configuration. (B) D tween experiment and simulation even when δC is as Top: OTOC signal size and experimental error as functions small as 0.03, indicating the quantum processor’s capa- of NS. Bottom: Signal-to-noise ratio (SNR) as a function bility to resolve high degrees of operator entanglement. of N . SNR equals the ratio of OTOC signal size to exper- S This may appear surprising given that other signatures imental error. N = 64 for the first three values of N and D S of quantum entanglement, such as entropy, are highly ND = 40 for the last two values. The reason for decreasing susceptible to unwanted interaction with the environ- ND is to increase the signal size such that it can be resolved with less statistical averaging [40]. (C) Estimated time tsim ment [43]. Instead, we find that environmental effects needed to simulate the OTOC of a single 53-qubit circuit with are nearly absent from these data. This robustness is a a variable number of iSWAPs, NS, on a single CPU core (6 result of the effective normalization protocol and a range Gflop/s). of other error-mitigation techniques used in our experi- ment [40]. Having identified means of characterizing both opera- As illustrated by the evolution of a butterfly operator in tor spreading and operator entanglement, it remains to Fig.3A, Clifford gates generate operator spreading by be asked how the computational complexity of quantum extending the butterfly operator to other qubits while scrambling, as well as our experimental error, scale with preserving the total number of basis operators (which circuit size. This is addressed by systematically increas- are products of Pauli operators, or Pauli strings, in these ing the number of iSWAPs in the quantum circuits, NS cases). In contrast, non-Clifford gates generate operator (here NS counts only iSWAP gates that lie within the entanglement by transforming one single Pauli string into light-cones of Qa and Q1 [40]). At the same time, ND a superposition of multiple Pauli strings, maintaining the is kept at a large value such that the Clifford-expansion spatial extent of operator spreading in the process. simulation method used in Fig.3 is challenging to per- These distinctive properties of Clifford and non- form and tensor-contraction is the most efficient classical Clifford gates therefore provide us a way to indepen- simulation method [33, 40, 44]. Figure4A shows repre- dently tune one scrambling mechanism without affecting sentative data for three circuit configurations with differ- the other. We now focus on operator entanglement and ent values of NS, along with the corresponding numerical measure the circuit-to-circuit fluctuation of OTOCs, as simulation results. As NS and the computational com- shown in Fig.3B. Here the number of circuit cycles is plexity for tensor-contraction increase, we observe that fixed at 12 and the number of non-Clifford gates in Uˆ, the OTOC fluctuation decreases and the agreement be- ND, is successively changed from 0 to 32. For each ND, tween experiment and simulation also degrades. the individual OTOCs C of 130 random circuit instances To quantify these observations, we define an ideal are measured using a modified normalized procedure [40]. OTOC signal as the fluctuation δC computed from the At ND = 0 where the circuits consist of only Clifford simulated values of C. We also define an experimental er- gates, we see that C takes discrete values of 1 or −1. ror as the RMS deviation between the simulated and the 6 measured values of C. Both quantities are shown as func- In conclusion, we characterize quantum scrambling in tions of NS in the upper panel of Fig.4B. It is seen that a 53-qubit system and demonstrate that entanglement the OTOC signal generally decreases as NS increases (see in the space of quantum operators is the key to com- Extended Data in SM for data at ND = 24 [40]). This putational complexity of quantum observables. This re- change in signal size is difficult to predict theoretically sult highlights the importance of careful classical anal- [40] and may be related to the fact that the Pauli strings ysis in the ongoing quest to attain quantum computa- in Oˆ(t) become less correlated with each other as the tional advantage on various problems of interest. On the light-cone associated with the butterfly qubit grows [42]. other hand, the challenge in predicting OTOC fluctua- On the other hand, the experimental error first decreases tions even moderately beyond the experimental regime to a minimum of ∼0.01 for NS = 232 before starting to also indicates that quantum processors of today can al- increase. The ratio of these quantities is the experimental ready shed light on certain physical phenomena as well as signal-to-noise ratio for OTOC and plotted in the bottom classical computers. Another encouraging finding of our panel of Fig.4B. The SNR is seen to monotonically decay work is that the accuracy of quantum processors can be from a value of 3 at NS = 159 to 0.55 at NS = 271. significantly improved through effective error-mitigation. For example, an SNR of ∼1 for OTOCs is achieved for NS = 251 where the circuit fidelity is merely 3% [40]. At last, we estimate the time needed to simulate the In the immediate future, classical simulation of average OTOC of one circuit on a single CPU core using tensor- OTOCs may be used to efficiently benchmark perfor- contraction, tsim [40]. The result is plotted against NS mance of quantum processors. The experimental frame- in Fig.4C. We observe an exponential increase in tsim as work established here can also be used to study other NS increases, confirming that simulating the scrambling quantum dynamics of interest, such as the integrabil- of complex quantum circuits indeed demands exponen- ity of the XY model (see SM for preliminary data) and tially scaled classical computational resources. In partic- many-body localization [17, 45]. As the fidelity of quan- ular, although simulating the results in Fig.4A currently tum processors continues to increase, modeling scram- 10 requires tsim ≈ 100 hours, this cost becomes tsim > 10 bling in quantum gravity and unconventional quantum hours when NS reaches 400. Chartering a path to this phases may become a reality as well [9, 10, 15, 16]. experimental regime is the focus of our ongoing research. Acknowledgements P. R. and X. M. acknowledge In the SM, we have provided numerical simulation re- fruitful discussions with P. Zoller, B. Vermersch, A. El- sults showing that a percentage decrease in the coherent ben, and M. Knapp. S. M. and J. M. acknowledge the or incoherent errors of iSWAP gates will lead to at least support from the NASA Ames Research Center and the commensurate levels of reduction for the OTOC errors support from the NASA Advanced Division for provid- [40]. Of less certainty is the size of δC in this regime, ing access to the NASA HPC systems, Pleiades and which cannot be predicted by any known classical model Merope. S. M. and J. M. also acknowledge the sup- [40]. Nevertheless, this difficulty also implies that resolv- port from the AFRL Information Directorate under grant ing δC alone might reveal scrambling dynamics beyond number F4HBKC4162G001. J. M. is partially supported the simulation capacity of classical computers. by NAMS Contract No. NNA16BD14C.

[1] Richard P. Feynman, “Simulating physics with comput- [8] Patrick Hayden and John Preskill, “Black holes as mir- ers,” Int. J. Theor. Phys. 21, 467–488 (1982). rors: quantum information in random subsystems,” J. [2] Leslie G. Valiant, “Quantum circuits that can be simu- High Energy Phys. 2007, 120 (2007). lated classically in polynomial time,” SIAM Comput. 31, [9] Yasuhiro Sekino and L. Susskind, “Fast scramblers,” J. 1229–1254 (2002). High Energy Phys. 2008, 065 (2008). [3] Barbara M. Terhal and David P. DiVincenzo, “Classical [10] Nima Lashkari, Douglas Stanford, Matthew Hastings, simulation of noninteracting-fermion quantum circuits,” Tobias Osborne, and Patrick Hayden, “Towards the fast Phys. Rev. A 65, 032325 (2002). scrambling conjecture,” J. High Energy Phys. 2013, 22 [4] Guifr´e Vidal, “Efficient simulation of one-dimensional (2013). quantum many-body systems,” Phys. Rev. Lett. 93, [11] Igor L. Aleiner, Lara Faoro, and Lev B. Ioffe, “Micro- 040502 (2004). scopic model of quantum butterfly effect: Out-of-time- [5] Daniel Gottesman, “Class of quantum error-correcting order correlators and traveling combustion waves,” Ann. codes saturating the quantum hamming bound,” Phys. Phys. 375, 378 – 406 (2016). Rev. A 54, 1862–1868 (1996). [12] Quntao Zhuang, Thomas Schuster, Beni Yoshida, and [6] Scott Aaronson and Daniel Gottesman, “Improved sim- Norman Y. Yao, “Scrambling and complexity in phase ulation of stabilizer circuits,” Phys. Rev. A 70, 052328 space,” Phys. Rev. A 99, 062334 (2019). (2004). [13] J. M. Deutsch, “Quantum statistical mechanics in a [7] Sergey Bravyi and David Gosset, “Improved classical closed system,” Phys. Rev. A 43, 2046–2049 (1991). simulation of quantum circuits dominated by clifford [14] Marcos Rigol, Vanja Dunjko, and Maxim Olshanii, gates,” Phys. Rev. Lett. 116, 250501 (2016). “Thermalization and its mechanism for generic isolated 7

quantum systems,” Nature 452, 854–858 (2008). N. Y. Yao, and I. Siddiqi, “Quantum information [15] Mike Blake, Richard A. Davison, and Subir Sachdev, scrambling in a superconducting qutrit processor,” “Thermal diffusivity and chaos in metals without quasi- https://arxiv.org/abs/2003.03307 (2020). particles,” Phys. Rev. D 96, 106008 (2017). [32] Manoj K. Joshi, Andreas Elben, BenoˆıtVermersch, Tiff [16] Daniel Ben-Zion and John McGreevy, “Strange metal Brydges, Christine Maier, Peter Zoller, Rainer Blatt, from local quantum chaos,” Phys. Rev. B 97, 155117 and Christian F. Roos, “Quantum information scram- (2018). bling in a trapped-ion quantum simulator with tunable [17] D. M. Basko, I. L. Aleiner, and B. L. Altshuler, “Metal- range interactions,” Phys. Rev. Lett. 124, 240505 (2020). insulator transition in a weakly interacting many-electron [33] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, system with localized single-particle states,” Ann. Phys. Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio 321, 1126 – 1205 (2006). Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian [18] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Ryan Babbush, and Hartmut Neven, “Barren plateaus Collins, William Courtney, Andrew Dunsworth, Ed- in quantum neural network training landscapes,” Nat. ward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Commun. 9, 4812 (2018). Marissa Giustina, Rob Graff, Keith Guerin, Steve Habeg- [19] E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. ger, Matthew P. Harrigan, Michael J. Hartmann, Alan Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Sei- Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, delin, and D. J. Wineland, “Randomized benchmarking Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, of quantum gates,” Phys. Rev. A 77, 012307 (2008). Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, [20] Jonas Haferkamp, Felipe Montealegre-Mora, Markus Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, Heinrich, Jens Eisert, David Gross, and Ingo Roth, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry “Quantum homeopathy works: Efficient unitary designs Lyakh, Salvatore Mandr`a,Jarrod R. McClean, Matthew with a system-size independent number of non-clifford McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, gates,” https://arxiv.org/abs/2002.09524 (2020). Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew [21] Daniel A. Roberts, Douglas Stanford, and Leonard Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Os- Susskind, “Localized shocks,” J. High Energy Phys. tby, Andre Petukhov, John C. Platt, Chris Quintana, 2015, 51 (2015). Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Ru- [22] Daniel A. Roberts and Beni Yoshida, “Chaos and com- bin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyan- plexity by design,” J. High Energ. Phys. 2017, 121 skiy, Kevin J. Sung, Matthew D. Trevithick, Amit (2017). Vainsencher, Benjamin Villalonga, Theodore White, [23] Adam Nahum, Sagar Vijay, and Jeongwan Haah, “Op- Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, erator spreading in random unitary circuits,” Phys. Rev. and John M. Martinis, “ using a X 8, 021014 (2018). programmable superconducting processor,” Nature 574, [24] C. W. von Keyserlingk, Tibor Rakovszky, Frank Poll- 505–510 (2019). mann, and S. L. Sondhi, “Operator hydrodynamics, [34] Pavan Hosur, Xiao-Liang Qi, Daniel A. Roberts, and otocs, and entanglement growth in systems without con- Beni Yoshida, “Chaos in quantum channels,” J. High En- servation laws,” Phys. Rev. X 8, 021013 (2018). ergy Phys. 2016, 4 (2016). [25] Tibor Rakovszky, Frank Pollmann, and C. W. von [35] Brian Swingle, Gregory Bentsen, Monika Schleier-Smith, Keyserlingk, “Diffusive hydrodynamics of out-of-time- and Patrick Hayden, “Measuring the scrambling of quan- ordered correlators with charge conservation,” Phys. Rev. tum information,” Phys. Rev. A 94, 040302 (2016). X 8, 031058 (2018). [36] Beni Yoshida and Norman Y. Yao, “Disentangling scram- [26] Vedika Khemani, Ashvin Vishwanath, and David A. bling and decoherence via quantum teleportation,” Phys. Huse, “Operator spreading and the emergence of dissi- Rev. X 9, 011006 (2019). pative hydrodynamics under unitary evolution with con- [37] B. Vermersch, A. Elben, L. M. Sieberer, N. Y. Yao, and servation laws,” Phys. Rev. X 8, 031057 (2018). P. Zoller, “Probing scrambling using statistical correla- [27] Michael J. Hartmann, Javier Prior, Stephen R. Clark, tions between randomized measurements,” Phys. Rev. X and Martin B. Plenio, “Density matrix renormalization 9, 021061 (2019). group in the heisenberg picture,” Phys. Rev. Lett. 102, [38] C in some cases can have a non-negligible imaginary part 057202 (2009). which is measured through the hσˆxi of Qa. For this work, [28] Jun Li, Ruihua Fan, Hengyan Wang, Bingtian Ye, Bei we only measure the real part of C which is equal to hσˆyi Zeng, Hui Zhai, Xinhua Peng, and Jiangfeng Du, “Mea- of Qa. suring out-of-time-order correlators on a nuclear mag- [39] Sergio Boixo, Sergei Isakov, Vadim Smelyanskiy, Ryan netic resonance quantum simulator,” Phys. Rev. X 7, Babbush, Nan Ding, Zhang Jiang, John Martinis, and 031011 (2017). Hartmut Neven, “Characterizing quantum supremacy in [29] Martin G¨arttner,Justin G. Bohnet, Arghavan Safavi- near-term devices,” Nat. Phys. 14, 595–600 (2018). Naini, Michael L. Wall, John J. Bollinger, and [40] Supplementary materials are available online.. Ana Maria Rey, “Measuring out-of-time-order correla- [41] Brian Swingle and Nicole Yunger Halpern, “Resilience tions and multiple quantum spectra in a trapped-ion of scrambling measurements,” Phys. Rev. A 97, 062113 quantum magnet,” Nat. Phys. 13, 781–786 (2017). (2018). [30] K. A. Landsman, C. Figgatt, T. Schuster, N. M. Linke, [42] Pieter W. Claeys and Austen Lamacraft, “Maximum ve- B. Yoshida, N. Y. Yao, and C. Monroe, “Verified quan- locity quantum circuits,” Phys. Rev. Research 2, 033032 tum information scrambling,” Nature 567, 61–65 (2019). (2020). [31] M. S. Blok, V. V. Ramasesh, T. Schuster, K. O’Brien, [43] Ryszard Horodecki, Pawe lHorodecki, MichalHorodecki, J. M. Kreikebaum, D. Dahlen, A. Morvan, B. Yoshida, and Karol Horodecki, “Quantum entanglement,” Rev. 8

Mod. Phys. 81, 865–942 (2009). the quantum supremacy frontier with a 281 pflop/s sim- [44] Cupjin Huang, Fang Zhang, Michael Newman, Junjie ulation,” Quantum Sci. Technol. 5, 034003 (2020). Cai, Xun Gao, Zhengxiong Tian, Junyin Wu, Haihong [57] Benjamin Villalonga, Sergio Boixo, Bron Nelson, Xu, Huanjun Yu, Bo Yuan, Mario Szegedy, Yaoyun Christopher Henze, Eleanor Rieffel, Rupak Biswas, and Shi, and Jianxin Chen, “Classical simulation of quantum Salvatore Mandr`a,“A flexible high-performance simula- supremacy circuits,” https://arxiv.org/abs/2005.06787 tor for verifying and benchmarking quantum circuits im- (2020). plemented on real hardware,” npj Quantum Inf. 5, 1–16 [45] Xiaoguang Wang, “Entanglement in the quantum heisen- (2019). berg XY model,” Phys. Rev. A 64, 012313 (2001). [58] Johnnie Gray and Stefanos Kourtis, “Hyper- [46] Pasquale Calabrese, Fabian H L Essler, and Giuseppe optimized tensor network contraction,” Mussardo, “Introduction to ‘quantum integrability in out https://arxiv.org/abs/2002.01935 (2020). of equilibrium systems’,” J. Stat. Mech. 2016, 064001 [59] Daniel Gottesman, “The heisenberg representation (2016). of quantum computers,” https://arxiv.org/abs/quant- [47] Rahul Nandkishore and David A. Huse, “Many-body lo- ph/9807006 (1998). calization and thermalization in quantum statistical me- [60] Sergey Bravyi, Dan Browne, Padraic Calpin, Earl Camp- chanics,” Annu. Rev. Condens. Matter Phys. 6, 15–38 bell, David Gosset, and Mark Howard, “Simulation of (2015). quantum circuits by low-rank stabilizer decompositions,” [48] B. Foxen, C. Neill, A. Dunsworth, P. Roushan, B. Chiaro, Quantum 3, 181 (2019). A. Megrant, J. Kelly, Zijun Chen, K. Satzinger, [61] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, R. Barends, F. Arute, K. Arya, R. Babbush, D. Bacon, and Hartmut Neven, “Simulation of low-depth quan- J. C. Bardin, S. Boixo, D. Buell, B. Burkett, Yu Chen, tum circuits as complex undirected graphical models,” R. Collins, E. Farhi, A. Fowler, C. Gidney, M. Giustina, https://arxiv.org/abs/1712.05384 (2017). R. Graff, M. Harrigan, T. Huang, S. V. Isakov, E. Jef- [62] Igor L Markov, Aneeqa Fatima, Sergei V Isakov, and Ser- frey, Z. Jiang, D. Kafri, K. Kechedzhi, P. Klimov, A. Ko- gio Boixo, “Quantum supremacy is both closer and far- rotkov, F. Kostritsa, D. Landhuis, E. Lucero, J. McClean, ther than it appears,” https://arxiv.org/abs/1807.10749 M. McEwen, X. Mi, M. Mohseni, J. Y. Mutus, O. Naa- (2018). man, M. Neeley, M. Niu, A. Petukhov, C. Quintana, [63] Igor Markov, Aneeqa Fatima, Sergei Isakov, and Sergio N. Rubin, D. Sank, V. Smelyanskiy, A. Vainsencher, Boixo, “Faster classical sampling from distributions de- T. C. White, Z. Yao, P. Yeh, A. Zalcman, H. Neven, and fined by quantum circuits,” APS 2019, K42–010 (2019). J. M. Martinis (Google AI Quantum), “Demonstrating a [64] NASA Ames Research Center, “Merope supercomputer,” continuous set of two-qubit gates for near-term quantum https://nas.nasa.gov/hecc/resources/merope.html. algorithms,” Phys. Rev. Lett. 125, 120504 (2020). [65] Martin Meuer, Horst Simon, Jack Dongarra, Erich [49] Michael A. Nielsen and Isaac L. Chuang, Quantum Com- Strohmaier, and Hans Werner Meuer, “Top500 List,” putation and Quantum Information: 10th Anniversary https://www.top500.org/lists/top500/2020/11/. Edition (Cambridge University Press, 2010). [66] Oak Ridge National Laboratory (ORNL), “Sum- [50] David C. McKay, Christopher J. Wood, Sarah Sheldon, mit Resources,” https://www.olcf.ornl.gov/ Jerry M. Chow, and Jay M. Gambetta, “Efficient z olcf-resources/compute-systems/summit/. gates for ,” Phys. Rev. A 96, 022330 [67] Johnnie Gray, “quimb: A python package for quan- (2017). tum information and many-body calculations,” Journal [51] G. Ithier, E. Collin, P. Joyez, P. J. Meeson, D. Vion, of Open Source Software 3, 819 (2018). D. Esteve, F. Chiarello, A. Shnirman, Y. Makhlin, [68] Beni Yoshida and Alexei Kitaev, “Efficient J. Schriefl, and G. Sch¨on,“Decoherence in a supercon- decoding for the hayden-preskill protocol,” ducting quantum bit circuit,” Phys. Rev. B 72, 134519 https://arxiv.org/abs/1710.03363 (2017). (2005). [52] Evan Jeffrey, Daniel Sank, J. Y. Mutus, T. C. White, J. Kelly, R. Barends, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. Megrant, P. J. J. O’Malley, C. Neill, P. Roushan, A. Vainsencher, J. Wenner, A. N. Cleland, and John M. Martinis, “Fast accurate state measure- ment with superconducting qubits,” Phys. Rev. Lett. 112, 190504 (2014). [53] M. B. Hastings, “Observations outside the light cone: Al- gorithms for nonequilibrium and thermal states,” Phys. Rev. B 77, 144302 (2008). [54] Igor L Markov and Yaoyun Shi, “Simulating quantum computation by contracting tensor networks,” SIAM Journal on Computing 38, 963–981 (2008). [55] Jianxin Chen, Fang Zhang, Cupjin Huang, Michael Newman, and Yaoyun Shi, “Classical simu- lation of intermediate-size quantum circuits,” https://arxiv.org/abs/1805.01450 (2018). [56] Benjamin Villalonga, Dmitry Lyakh, Sergio Boixo, Hart- mut Neven, Travis S Humble, Rupak Biswas, Eleanor G Rieffel, Alan Ho, and Salvatore Mandr`a,“Establishing 9

Supplemental Materials

CONTENTS I. EXTENDED DATA

References6 In this section, we provide data not covered in the main text. These include the spatial distribution of average I. Extended Data9 OTOCs on the 53-qubit system, average OTOC behav- A. Full 53-Qubit Average OTOC Data9 iors for additional types of quantum circuits and more B. OTOCs for Non-Integrable and Integrable detailed investigation of errors in experimental OTOC Quantum Circuits9 measurements. C. Characterization of Experimental Errors 11 A. Full 53-Qubit Average OTOC Data II. Experimental Techniques √ 13 A. Calibration of iSWAP and iSWAP Gates 13 Figure S1 shows experimentally measured average 1. Reducing Conditional-Phase Errors 14 OTOCs, C, for 51 different possible butterfly qubit lo- 2. Arbitrary Z-Rotations 14 cations. The data are also shown for every circuit cycle B. Gate Error Benchmarking and Cross-Talk from 1 through 15. Here iSWAP is used as the two-qubit Mitigation 15 gate. The sharp nature of the wavefront propagation is C. Dynamical Decoupling 16 readily visible, where a large reduction from C = 1 is ob- D. Elimination of Bias in hσˆyi 17 served as soon as the lightcone of the measurement qubit E. Light-cone Filter 18 (black filled circle) reaches a given qubit. In comparison, F. Normalization via Reference Clifford similar data are shown up to a√ total of 24 circuit cycles Circuits 19 for random circuits containing iSWAP gates (Fig. S2). The dynamics is seen to be slower than iSWAP, as men- III. Large-Scale Simulation of OTOCs of Individual tioned in the main text. In particular, the wavefront is Circuits 20 also broader, as seen by the more gradual spatial transi- A. Numerical Calculation of the OTOC Value 21 tion from C = 1 to C ≈ 0. B. Branching Method 21 C. Tensor Contraction 22 B. OTOCs for Non-Integrable and Integrable Quantum Circuits IV. Markov population dynamics 24 A. Symmetric single qubit gate set 24 In the main text of this work, we have primarily fo- B. Efficient Population Dynamics for the cused on quantum circuits that are non-integrable [46], Averaged OTOC 25 i.e. capable of evolving a quantum system into states C. Sign Problem in the Population Dynamics for with maximal degrees of scrambling. In general, many OTOC Fluctuations 26 quantum circuits or dynamical processes are integrable D. Population Dynamics for iSWAP Gate Sets and lead to small degree of quantum scrambling even at Implemented in the Main Text 27 long time scales. Examples include Clifford circuits [5], 1. Clifford Gate Set 27 dynamics of free fermions [3] and many-body localiza- 2. Universal Gate Set 27 tion [47]. For certain integrable, pseudo-random circuits such as Clifford circuits, OTOC fluctuation is needed to V. Efficient Population Dynamics for Noisy distinguish them from non-integrable circuits, as demon- Circuits 28 strated in the main text. In many other cases, average A. Inversion Error 28 OTOCs behave differently for integrable quantum cir- cuits compared to non-integrable ones and are sufficient B. Generic Error Model 28 to differentiate between them, which we illustrate next. C. Error Mitigation for Average OTOC 29 Figure. S3 shows two different types of quantum cir- D. Theory of Error Mitigation for Individual √ cuits. The circuit in the left panel consists of iSWAP√ Circuits 30 √gates and√ single-qubit√ gates randomly chosen from X±, Y ±, W ±, V ±. This is the example of a non- VI. Numerical Simulation of Error-Limiting integrable circuit, which is expected to lead to maximal Mechanisms for OTOC 30 scrambling at long times. The circuit in the right panel A. Coherent and Incoherent Contributions 30 has the same two-qubit gates, but the single-qubit gates B. Perturbative Expansion of OTOC Error 31 are replaced with Z gates that have angles randomly cho- 10

Cഥ (iSWAP) 1 Cycle 2 Cycles 3 Cycles 4 Cycles 5 Cycles 6 Cycles 1.0

0.8

0.6

0.4

0.2 7 Cycles 8 Cycles 9 Cycles 10 Cycles 11 Cycles 12 Cycles

0.0

-0.2

-0.4

-0.6 13 Cycles 14 Cycles 15 Cycles

FIG. S1. Full evolution of average OTOCs for iSWAP random circuits. The average OTOCs of the 53-qubit system shown for every cycle up to a total of 15. The black unfilled (filled) circle represents the location of the ancilla (measurement) qubit. The colors of the other filled circles represent the values of C for different locations of the butterfly qubit. The two-qubit gate used here is iSWAP and the data are averaged over 38 random circuit instances. sen from the interval [−π, π]. This is the example of an which is of wide interest in condensed-matter physics due integrable circuit, where the dynamics does not lead to to its ability to capture a variety of interesting physi- maximal scrambling if implemented in 1D. cal phenomena such as quantum phase transitions. To The experimentally measured average OTOCs are demonstrate an immediate application of our work, we shown in Fig. S3B for both types of quantum circuits. use OTOCs to study a particular feature of the XY- Here, the two-qubit gates are applied in a brick-work model, namely the transition from integrability to non- pattern similar to what is used in Fig. 1C of the main integrability due to geometry. text. For the non-integrable circuits, we observe a clear It is well-known that XY-model in 1D exhibits inte- propagation of the OTOCs along with a diffusive broad- grable dynamics, as demonstrated in Fig. S3. Dynamics ening of the wavefronts, similar to what was observed for XY-model in 2D remains a highly active area of re- in Fig. 2C of the main text. In particular, C monoton- search and is generally non-integrable. In Fig. S4, we ically decays toward 0 at large circuit cycles. On the show that the transition into non-integrability for XY- other hand, C does not show the wave-like propagation model occurs as soon as the geometry changes from 1D for the integrable circuits. For qubits closer to the mea- to a ladder-like geometry (Fig. S4A). Here we have ar- surement qubit Q1, C first decays but gradually increases ranged 16 qubits into two parallel chains of 8 qubits and for longer circuit cycles. Qubits further from Q1 barely connected them with 5 “cross-links”. Next, we measure show any appreciate degree of OTOC decay up to 50 OTOCs of random circuits implemented with this new circuit cycles. Overall, C for different qubits converges geometry where the single-qubit gates are again ran- toward values > 0.5 for the largest circuit cycles probed domly chosen from Z gates with angles√ in the interval in this experiment. These results show how one might in [−π, π] and the two-qubit√ gates are iSWAP. The order some cases uses average OTOC behavior to distinguish for applying the iSWAP gates is shown in Fig. S4B. non-integrable quantum dynamics from integrable ones. The average OTOCs for this ladder-geometry XY The integrable circuits studied in Fig. S3 in fact are model are shown in Fig. S4C. It is seen that the wavelike the digitized realizations of the so-called XY model [45] propagation and long-time limits of C = 0 characteristic 11

Cഥ ( iSWAP) 1 Cycle 2 Cycles 3 Cycles 4 Cycles 5 Cycles 6 Cycles 1.0

0.8

0.6

7 Cycles 8 Cycles 9 Cycles 10 Cycles 11 Cycles 12 Cycles

0.4

0.2

0.0

13 Cycles 14 Cycles 15 Cycles 16 Cycles 17 Cycles 18 Cycles

19 Cycles 20 Cycles 21 Cycles 22 Cycles 23 Cycles 24 Cycles

√ FIG. S2. Full evolution of average OTOCs for iSWAP random circuits. The average OTOCs of the 53-qubit system shown for every cycle up to a total of 24. The black unfilled (filled) circle represents the location of the ancilla (measurement) qubit. The colors of√ the other filled circles represent the values of C for different locations of the butterfly qubit. The two-qubit gate used here is iSWAP and the data are averaged over 24 random circuit instances. of non-integrable quantum circuits are both recovered in is the signal-to-noise ratio to the number of non-Cliffords this modified geometry, indicating a transition from inte- in the circuits? grability to non-integrability for the XY-model. Detailed We first address question 1. The role of finite sam- experimental studies of this transition with larger num- pling in experimental OTOC measurement may be un- bers of qubits is a subject of future work. derstood as follows: For nstats single-shot measurements, an average error of √ 1 is expected to be present in nstats the estimate for hσˆ i of the ancilla qubit due to statis- C. Characterization of Experimental Errors y tical uncertainty. In the presence of circuit errors and a

normalization value hσˆyiI < 1, this expected statistical In this section, we present additional characterization √1 error is amplified to a value of hσˆ i n , assuming the data to further corroborate and understand the experi- y I stats mental results in Fig. 4 of the main text. In particular, ideal OTOC value to be significantly smaller than 1 (i.e. we focus on answering two questions: 1. What fraction of hσˆyiI  | hσˆyiB | where hσˆyiB is hσˆyi measured with the the observed experimental errors can be attributed to sta- butterfly operator applied). tistical errors due to limited sampling? 2. How sensitive In Fig. S6A, this expected statistical error is plotted 12

Non-Integrable Integrable A A Q Q C −1 a 1 1.0

Q1 X V Y Q1 Z11 Z12 Z1K

iSWAP iSWAP iSWAP −1 iSWAP 0.8 Q2 W X V Q2 Z21 Z22 Z2K

B iSWAP Q X−1 W X−1 Q Z Z iSWAP Z 0.6

3 3 31 32 3K Cycle 1:

C

iSWAP iSWAP iSWAP −1 −1 iSWAP QN Y X Y QN ZN1 ZN2 ZNK 0.4 Cycle 2: Q Q Q Q B a 1 2 16 0.2 Cycle 3: 0.0 1.0 1.0 0 5 10 15 20 25 30 35 0.8 0.8 Cycles 0.6

0.6

ഥ ഥ C C 0.4 FIG. S4. Transition into non-integrability in the XY Model. 0.4 (A) A 2D arrangement of qubits in the form of two parallel 1D 0.2 chains with 5 connections between each other. Black unfilled 0.0 0.2 (filled) circle denotes the ancilla (measurement)√ qubit Qa -0.2 0.0 (Q1). (B) Order for applying the√ two-qubit gates iSWAP. 0 10 20 30 40 0 10 20 30 40 50 The qubit connections denote the iSWAP gates that are ap- Cycles Cycles plied for a particular cycle. Additional cycles√ repeat the first three cycles, e.g. cycle 4 applies the same iSWAP gates as FIG. S3. Average OTOCs for non-integrable and integrable cycle 1 and so on. (C) Average OTOCs C with the butterfly quantum circuits. (A) Two different types of quantum cir- qubit corresponding to qubits Q2 through Q16. The locations cuits. Left√ panel shows a non-integrable quantum circuit of the butterfly qubit are indicated by the colors of the data composed√ of √iSWAP√ and single-qubit√ gates randomly drawn symbols and the colors of the qubits in (A) and (B). All data ± ± ± ± from X , Y , W , V . Right√ panel shows an in- are averaged over 36 circuit instances. tegrable quantum circuit composed of iSWAP and single- qubit gates that are rotations around the z-axis of the Bloch A B sphere with random angles. (B) Average OTOCs C for the 0.1 two types of random circuits, implemented on a 1D chain of 16 Experimental Error 0.1 Experimental Error qubits. The qubit configuration is shown on top, where the Statistical Error Statistical Error unfilled (filled) black circle represents the ancilla (measure- ment) qubit Qa (Q1). Left panel shows C with the butterfly 0.01

qubit corresponding to qubits Q2 through Q16, where the ran- OTOC Errors OTOC dom circuits are of the type described in the left panel of (A). Errors OTOC Right panel shows similar data for the random circuits of the 0.01 type in the right panel of (A). The locations of the butterfly N = 251, N = 40 qubit are indicated by the colors of the data symbols and the S D 0.001 1 10 160 180 200 220 240 260 legend on top. All data are averaged over 40 circuit instances. 6 nstats (10 ) NS

FIG. S5. Contribution of finite sampling to experimental as a function of nstats for NS = 251 and ND = 40 where OTOC error. (A) Dependence of experimental OTOC error on the number of single-shots used to estimate hσˆyi, nstats. hσˆyiI ≈ 0.04. On the same plot, we have included exper- imental OTOC errors (i.e. the RMS deviation between The dashed line shows expected statistical errors for different simulated and experimental OTOC values of 100 random values of nstats. Here the number of iSWAPs is NS = 251 circuits) obtained from reduced amounts of single-shot and the number of non-Cliffords is ND = 40. (B) Comparison of experimental OTOC errors and expected statistical errors data. We observe that the experimental error initial de- for different values of NS. The experimental errors are repro- creases with increasing values of nstats, suggesting that duced from Fig. 4C of the main text. The statistical errors statistical uncertainty being a significant source of er- are calculated based on nstats and the average normalization ror for small numbers of single-shot measurements. At value for each NS. 7 nstats > 10 , the experimental error has a significantly weaker dependence on nstats and is markedly higher than the expected statistical error, indicating other er- error. ror sources are dominant in this regime. In Fig. S6A, we Next, we focus on the scaling of experimental error vs have re-plotted the experimental errors in Fig. 4C of the number of iSWAPs for a different number of non-Cliffords main text along with the expected statistical errors cal- in the quantum circuits. Fig. S6A shows C of 100 circuit culated from nstats and hσˆyiI of each NS. The increase in instances for NS = 113, 251 and 300, all of which have statistical error at higher NS is a result of decreasing nor- ND = 24. On the same plots, exactly simulated OTOC malization values. It is evident that the observed exper- values using the branching method (Section IV) are also imental errors are consistently larger than the expected plotted. Similar to Fig. 4B of the main text, we observe statistical errors, indicating other error mechanisms are that the agreement between experimental and simulated dominant. In SectionVI, we provide numerical simula- OTOC values degrades as NS increases. In addition, the tion results to further analyze the sources of experimental OTOC signal size (i.e. the fluctuation of the simulated 13

A B A B FSIM tp 1.0 NS = 113 ND = 24 f1

0.1 )

0.5 ) Z’ ( )  Z’ ( )

C 1 1  0.0 ( -0.5 f2 gmax

0.01 0 iSWAP

0.2 ( CPHASE NS = 251 g Z’ (2) Z’ (2) 0.1 time C 0.0 Signal 0.001 Experimental Error C 0.0 0.5 1.0 1.5 D 0.0 0.2 0.4 0.6 -0.1 Statistical Error  (rad)  (rad) 0.2 3 NS = 300 0.1 20 20

2 Sycamore Sycamore C

0.0 SNR (MHz)

1 (MHz) 15 15 

-0.1 

/2 /2

10 10 max

0 20 40 60 80 100 150 200 250 300 max iSWAP iSWAP g g iSWAP iSWAP Circuit Instance, i NS 5 5 10 15 20 25 30 35 40 10 15 20 25 30 35 40 FIG. S6. OTOC errors for ND = 24. (A) OTOCs of 100 ran- tp (ns) tp (ns) dom circuit instances, C, for different values of NS. ND = 24 E F 1.0 1.0 for all circuits. The dashed lines are exact numerical simu- iSWAP iSWAP

lation results using the branching method. (B) Upper panel: 0.5 0.5 ECDF OTOC signal size, experimental error and estimated statisti- ECDF cal error as functions of N . N = 24 for all data included 0.0 0.0 S D 1.50 1.55 1.60 0.0 0.1 0.2 here. Lower panel: SNR (i.e. ratio of OTOC signal size to 1.0 1.0 iSWAP iSWAP experimental error) as a function of NS.

0.5 0.5

ECDF ECDF 0.0 0.0 0.76 0.77 0.78 0.79 0.80 0.05 0.10 0.15  (rad)  (rad) OTOC values) also decreases as NS increases.

In Fig. S6B, the OTOC signal size, experimental error FIG. S7. Calibrating “textbook” gates. (A) Schematic il- and expected statistical error are plotted as functions of √lustration of the waveforms used to realize pure iSWAP and NS (all with ND = 24). Here we see that the OTOC sig- iSWAP gates: The |0i → |1i transition frequencies of two nal size indeed monotonically decreases as NS increases. transmon qubits, f1 and f2, are brought close to each other The experimental error, on the other hand, shows less by adjusting the control fluxes to their superconducting quan- dramatic changes as a function of NS. After taking the tum interference device (SQUID) loop. Concurrently, a pulse ratio of the OTOC signal size and the experiment er- to the coupler’s SQUID flux changes the qubit-qubit cou- pling g from 0 to a maximum value of g . The total ror, we plot the resulting SNR as a function of NS in max length of the pulses is tp. (B) The generic FSIM gate re- the lower panel of Fig. S6B. The scaling of SNR vs NS is roughly consistent with Fig. 4 of the main text where alized by the waveforms in (A), composed of a partial-iSWAP gate with swap angle θ, a CPHASE gate with a conditional- higher values of ND (64 and 40) are used. In particu- phase φ and four local Z-rotations with angles α , α , β , β larly, SNR falls below 1 after N has increased beyond 1 2 1 2 S each. (C, D) Simulated θ (C) and φ (D) as functions of gmax ∼250. The relative insensitivity of SNR to ND allows us and tp. The simulation assumes an average qubit frequency to conveniently benchmark our system at lower values of 1 2 (f1 + f2) = 6.0 GHz and an anharmonicity of 200 MHz for ND where classical simulation is easy. This is especially each transmon qubit. The operating√ points for three different important in the future when we conduct experiments gates, Sycamore, iSWAP and iSWAP, are indicated by the with NS > 400 where tensor-contraction may no longer the star symbols. (E, F) Integrated histogram (empirical cu- be feasible (SectionIII). mulative distribution function,√ ECDF) of θ (E) and φ (F) for both the iSWAP and iSWAP gates. The data include all 86 qubit pairs on the processor. The x-axis location of each dashed line indicates the median value of the corresponding angle. II. EXPERIMENTAL TECHNIQUES √ In this section, we describe the calibration process and A. Calibration of iSWAP and iSWAP Gates metrology of quantum gates used in the OTOC experi- ment. In addition, we also demonstrate a series of error- The measurement of OTOCs requires faithful inversion mitigation strategies used in the compilation of exper- of a given quantum circuit, Uˆ. The “Sycamore” gate used imental circuits which significantly reduced errors from in our previous work [33], equivalent to an iSWAP gate various sources such as those related to state prepara- followed by a CPHASE gate with a conditional-phase of tion and readout (SPAM), cross-talk, or coherent control π/6 radians (rad), is an ill-suited building block for Uˆ errors on two-qubit gates. since its inversion cannot be easily created by combining 14

Sycamore with√ single-qubit gates. On the other hand, The calibrated values of θ and φ associated with ev- iSWAP and iSWAP are commonly used two-qubit gates ery qubit pair on the 53-qubit processor are shown in that can be readily inverted by adding local Z rotations: Fig. S7E and Fig. S7F, respectively. Each angle is mea- sured via cross entropy benchmarking (XEB), similar to −1 π   π   π  π  previous works [33, 48]. The median value of θ is 0.783 G = Z1 Z2 − GZ1 − Z2 (S1) √ 2 2 2 2 rad (standard deviation = 0.010 rad) for iSWAP and 1.563 rad (standard deviation = 0.027 rad) for iSWAP. Here√G is the two-qubit unitary corresponding to iSWAP −i ϕ σˆ(i) (i) These median values are√ very close to the target values or iSWAP and Zi(ϕ) = e 2 z , whereσ ˆz is the of π/4 = 0.785 rad for iSWAP and π/2 = 1.571 rad for Pauli-Z matrix acting on qubit i (i = 1 or 2). Compared √ iSWAP.√ In the case of φ, the median value is 0.112 rad to Sycamore, realizing pure iSWAP and iSWAP gates for iSWAP (standard deviation = 0.026 rad) and 0.136 requires the development of two additional capabilities rad for iSWAP (standard deviation = 0.055 rad). The on our quantum processor: 1. A significant reduction of median values of φ are close to predictions from numer- the coherent error associated with the conditional-phase ical simulation and 4 to 5 times lower compared to the in Sycamore. 2. The abilility to implement arbitrary Sycamore gate. single-qubit rotations around the Z axis. The coherent error introduced by remnant values of φ is further reduced by adjusting other phases of a FSIM unitary UFSIM(θ, φ, ∆+, ∆−, ∆−,off), defined as: 1. Reducing Conditional-Phase Errors 1 0 0 0  i(∆++∆−) i(∆+−∆−,off) We first describe the calibration technique for reducing 0 e cos θ −ie sin θ 0   i(∆++∆−,off) i(∆+−∆−)  . conditional phase errors. As discussed in previous works 0 −ie sin θ e cos θ 0  i(2∆ −φ) [33], such a phase arises from the dispersive interaction 0 0 0 e + between the |11i and |02i (or |20i) states of two trans- (S2) mon qubits while their coupling strength g is raised to a Here ∆+, ∆− and ∆−,off are phases that can be freely finite value to enable a resonant interaction between the adjusted by local Z-rotations. Imperfect gate calibration |10i and |01i states (Fig. S7A). Although eliminating this often results in an actual two-qubit unitary Ua that dif- phase is possible by concatenating a Sycamore gate with fers slightly from the target unitary Ut, leading to a Pauli a pure CPHASE gate having an opposite conditional- error [33, 49]: phase [48], this process would involve complicated wave- 1 †  2 forms and large qubit frequency excursions which are de- rp = 1 − tr U Ut (S3) D2 a manding to implement on a 53-qubit processor. Instead, we adopt an alternative approach based on the relatively Here D = 4 is the dimension of a two-qubit Hilbert space. simple control waveform used in Ref. [33] (Fig. S7A). Given that our target√ unitaries are Ut = This waveform sequence produces a Fermionic Simula- UFSIM(π/4, 0, 0, 0, 0) for iSWAP and Ut = tion (FSIM) gate comprising a partial-iSWAP gate with UFSIM(π/2, 0, 0, 0, 0) for iSWAP, one may naively swap angle θ, a CPHASE gate with conditional-phase φ expect that ∆+, ∆− and ∆−,off should all be set to 0 in and four local Z-rotations (Fig. S7B). Ua in order to minimize rp. While this is indeed the case Next, we consider how the angles θ and φ depend on if φ = 0 in Ua, it is not true when φ assumes a finite value readily tunable waveform parameters such as the pulse φa in Ua. In fact, simple algebraic calculation shows duration tp and maximum qubit-qubit coupling gmax. that in such a case, the minimum value of rp occurs at This is done by numerically solving the time-evolution of (∆+, ∆−, ∆−,off) = (φa/2, 0, 0), where it is a factor of two coupled transmons with typical device parameters. 3 smaller compared to (∆+, ∆−, ∆−,off) = (0, 0, 0). By The dependences of θ and φ on tp and gmax are shown calibrating our system such that ∆+ = φa/2 for every in Fig. S7C and Fig. S7D, respectively. We observe that qubit pair, we estimate that the median Pauli error although both θ and φ increase linearly with tp, their scal- introduced√ by the conditional-phase to be only 0.07 % ing with respect to gmax is different: θ ∝ gmax whereas for iSWAP and 0.12 % for iSWAP. 2 φ ∝ gmax. For a given value of θ, it is therefore possible to reduce φ by increasing tp and decreasing gmax while 2. Arbitrary Z-Rotations keeping tpgmax constant. However, since longer gate op- eration is more susceptible to decoherence effects such as relaxation and dephasing, it is also desirable to minimize We now describe the implementation of arbitrary values of tp. As a result, we√ use tp = 12 ns, gmax/2π ≈ 10 single-qubit Z-rotations on the quantum processor.√ In −1 MHz for calibrating the iSWAP gate and tp = 36 ns, addition to constructing the inverse gates iSWAP −1 gmax/2π ≈ 7 MHz for calibrating the iSWAP gate. Based and iSWAP , Z-rotations are also used in removing na- 0 on the simulation results, these choices would√ yield val- tive Z-rotations Z in the FSIM gate (Fig. S7B) as well ues of θ = π/4 rad, φ = 0.14 rad for the iSWAP gate as adjusting values of ∆+ to minimize conditional-phase and θ = π/2 rad, φ = 0.19 rad for the iSWAP gate. errors. The procedure for incorporating Z-rotations 15

A A iSWAP B iSWAP Z (11) Z (12) R /2 (1) Z (13) Before  1.0 1.0

Compilation FSIM FSIM 0.8 0.8 Z (21) Z (22) R/2 (2) Z (23)

0.6 0.6 ECDF Z ( ) R ( −  ) Z (13 + 12) ECDF 0.4 0.4 After 11 /2 1 12 Isolated Isolated

Compilation 0.2 Simultaneous, uncorrected 0.2 Simultaneous, uncorrected

FSIM FSIM Z (21) R/2 (2 − 22) Z (13 + 12) Simultaneous, corrected Simultaneous, corrected 0.0 0.0 0.01 0.02 0.03 0.04 0.05 0.01 0.02 0.03 0.04 0.05 0.06 r r B C C p p Z (1) Z (1) Before 1.0

Compilation

iSWAP iSWAP

Z (2) Z (2) 0.1

y ෝ f  1 0.01 Z (2) After f Uncorrected Compilation 2 0.001 Corrected iSWAP Z ( ) 1 g 0 2 4 6 8 10 12 14 Cycles FIG. S8. Implementation of Z-rotations. (A) Circuit compi- lation procedure for reducing number of required Z-rotations. FIG. S9. Fixed-Unitary XEB and cross-talk correction. (A, Z-rotations occurring after each FSIM gate are combined with B) Integrated histograms (ECDF) of Pauli errors for both the the Z-rotations before the next FSIM gate. Phases on the √ iSWAP (A) and the iSWAP (B) gates. Three sets of values microwave-driven single-qubit gates are shifted to maintain are shown for each case: the isolated (blue) curves represent the same overall quantum evolution. ϕ denotes the an- ij the errors obtained by operating each qubit pair individu- gles of various Z-rotations and χ denotes the phases of mi- i ally with all other qubits at ground state. The simultaneous, crowave single-qubit gates. (B) Circuit compilation procedure uncorrected (red) curves represent the errors obtained by op- for pushing Z-rotations through an iSWAP gate. Combined erating all qubits at the same time, without any additional with (A), Z-rotations in circuits containing only iSWAP and correction. The simultaneous, corrected (green) curves repre- single-qubit gates become entirely virtual. (C) Schematic il- sent the simultaneous error rates after additional Z-rotations lustration of the waveforms used to implement Z-rotations √ are included in the circuits to offset unitary shifts induced before each iSWAP gate. Compared to the waveforms in by cross-talk effects. Simultaneous error rates are obtained Fig. S7A, an additional control flux pulse is used to detune from four different configurations of parallel two-qubit oper- the qubits from their idle frequencies and thereby achieve a ations, similar to Ref. [33]. Dashed lines indicate the median “physical” Z-gate. locations of various errors.√ All error rates are obtained from XEB with pure iSWAP or iSWAP gate used in simulation, and include contributions from two single-qubit gates and one into quantum circuits is two-fold: First, we recompile two-qubit gate. (C) An OTOC experimental configuration the circuits and combine all Z-rotations occurring af- for evaluating the effects of cross-talk correction. The empty ter each two-qubit FSIM gate with the Z-rotations be- circle represents the ancilla qubit and the black filled circle fore the next FSIM gate, as shown in Fig. S8A. If any represents the measurement qubit. All other qubits are rep- microwave-driven single-qubit gate such as Rπ/2(χ) = resented by purple spheres. (D) The OTOC normalization −i π (cos χσˆ +sin χσˆ ) value hσˆ i as a function of number of cycles in a quantum e 4 x y occurs between the two FSIM gates, y ˆ whereσ ˆ andσ ˆ are Pauli-X and Pauli-Y matrices and χ circuit U for the configuration shown in (C), measured both x y ˆ is the phase of the microwave drive, we apply the equiv- with and without applying the cross-talk corrections. U is alence R (χ)Z(ϕ) = Z(ϕ)R (χ − ϕ) to shift the ro- composed of iSWAP and random single-qubit π/2 rotations π/2 π/2 around axes on the XY plane. tation axis of the single-qubit gate and “push” the Z- rotation through. This process effectively reduces the number of Z-rotations in the circuits by a factor of 2 and has been demonstrated to incur negligible degradation two-qubit gates using additional control flux pulses, as il- in the overall fidelity, since the only physical changes are lustrated in Fig. S8C. Here, we detune the qubit frequen- modifications to the phases of the microwave drives of cies from their idle positions by an variable amount√ ∆f single-qubit gates [50]. and for a fixed duration tz = 20 ns before each iSWAP gate, leading to a Z-rotation ≈ Z (2π∆ftz). The second step for√ implementing Z-rotations is differ- ent for iSWAP and iSWAP gates. In the case of iSWAP, the equivalence UFSIM(π/2, 0, 0, 0, 0)Z1(ϕ1)Z2(ϕ2) = B. Gate Error Benchmarking and Cross-Talk Z1(ϕ2)Z2(ϕ1)UFSIM(π/2, 0, 0, 0, 0) allows us to push Z- rotations through each two-qubit gate by simply rear- Mitigation ranging their phases (Fig. S8B). As a result, the Z- rotations occurring in circuits√ with iSWAP gates are en- √ The Pauli errors of the calibrated iSWAP and tirely virtual. In the case of iSWAP, such equivalence iSWAP gates are measured through XEB [33, 39], does not exist and we implement Z-rotations before the which uses a collection of random circuits comprising in- 16

Spin Echoes terleaved two-qubit and random single-qubit gates. For A ෝ B ෝ each random circuit, the probability distribution of all y y Qa: |+Yۧ Qa: |+Yۧ X X Y Y X X possible output bit-strings is both measured experimen- Q1: |+Xۧ Q1: |+Xۧ tally and computed numerically. The statistical corre- Q2: |+Xۧ Q2: |+Xۧ † Q : |+Xۧ † Q : |+Xۧ lation between the two distributions (“cross-entropy”) is 3 푈෡ 푈෡ 3 푈෡ 푈෡ then used to infer the total error of each quantum circuit. Q : |+Xۧ Q : |+Xۧ After measuring a sufficient number of circuit instances N N C tc (s) at different circuit depths, it has been shown that XEB 0 1 2 3 4 reliably yields gate errors that are very consistent with 0.8 values obtained from conventional characterization meth- 0.6 ods such as randomized benchmarking [19, 33, 48].

y 0.4 ෝ A key difference between the XEB process used in our  0.2 current work compared to prior experiments [33] is the Without Spin Echo 0.0 unitaries used in the numerical computation of the bench- With Spin Echo -0.2 marking circuits. Previously, such unitaries are freely 0 5 10 15 20 25 30 35 40 adjusted for each individual qubit pair, whereby val- Cycles ues of various FSIM angles θ, φ, ∆+, ∆− and ∆−,off are optimized to obtain the lowest Pauli errors. In FIG. S10. Dynamical coupling in OTOC experiments. (A) this work, we do not perform such an optimization Test experimental circuit for benchmarking the intrinsic co- step during gate error characterization and instead use herence of the ancilla qubit. Compared to an actual OTOC a fixed unitary (U (π/2, 0, 0, 0, 0) for iSWAP and circuit, the CZ gates between the ancilla qubit and measure- FSIM √ ment qubit is removed. (B) Same circuit as (A) with the UFSIM(π/4, 0, 0, 0, 0) for iSWAP) for all qubit pairs. additional of consecutive spin echo pulses to the ancilla qubit The gate errors characterized by the “fixed-unitary” during the quantum circuits Uˆ and Uˆ †, in the form of random XEB process include contributions from both incoher- X or Y gates. (C) The projection of the ancilla qubit to the ent sources such as relaxation and dephasing and co- y-axis, hσˆyi, at the end of the circuit, measured with (red) herent sources such as remnant conditional-phases. The and without (blue) spin echo. The bottom x-axis of the plot adoption of a more stringent benchmarking criterion for shows the number of cycles in Uˆ (Uˆ † has the same number gate errors is motivated by the fact that both coherent of cycles), and the top x-axis shows the corresponding total and incoherent errors can lead to imperfect reversal of circuit duration tc. quantum circuits and adversely impact the accuracy of OTOC measurements, in contrast to previous experiment in which coherent errors are compensated by modifying ous operation is reduced√ to 0.0140 (0.0131) for iSWAP circuits in simulation [33]. and 0.0142 (0.0123) for iSWAP. The Pauli error rate rp per cycle (aggregate error of The effects of the cross-talk correction can be readily two single-qubit gates and one two-qubit gate) associated seen in the “normalization” values used in the OTOC with each qubit pair is first measured in isolation. The experiment. Figure S9C shows the configuration for a results are plotted as integrated histograms in Fig. S9A 53-qubit OTOC experiment. The corresponding OTOC and Fig. S9B, where we observe a mean (median) error normalization hσˆyi as a function of the number of cy- cles in a quantum circuit is shown in Fig. S9D. Without √of 0.0109 (0.0106) for iSWAP and 0.0106 (0.0089) for iSWAP. These higher errors rates compared to our applying the additional Z-rotations for cross-talk correc- previous work [33] are a result of enhanced incoherent tion, hσˆyi decays rapidly and falls below 0.1 after 8 cycles. After applying the additional Z-rotations, hσˆ i decays at errors due to longer√ pulses used in iSWAP and additional y detuning pulses in iSWAP, as well as coherent errors a visibly slower rate and falls below 0.1 after 10 cycles. arising from remnant conditional-phases. The slower decay of hσˆyi is indicative of more accurate We then repeat the same process but measure the error inversion of the quantum circuit after cross-talk correc- rates of different pairs simultaneously. We first observe, tion. similar to previous work [33], a sizable increase in rp, with the mean (median) being√ 0.0193 (0.0163) for iSWAP and 0.0190 (0.0161) for iSWAP. To reduce these cross-talk C. Dynamical Decoupling effects, we first fit the XEB results to obtain the shifts in the two-qubit unitary associated with each individual We now describe a series of additional error-mitigation qubit pair, which are often related to the single-qubit techniques used in the compilation of quantum circuits phases ∆+, ∆− and ∆−,off. In a second step, instead that further improve the accuracy of OTOC experiments. of simply incorporating these shifts into classical simula- The first such technique is dynamical decoupling, moti- tion as was done previously [33], we add local Z-rotations vated by long “idling” times at non-ground states for into the quantum circuits to offset the unitary shifts. The some of the qubits during the experiment. The most parallel error rates are then re-measured with these Z- prominent of such qubits is the ancilla qubit, which re- rotations. The mean (median) value of rp for simultane- mains idle throughout the time needed to implement the 17

ˆ ˆ † quantum circuit U and its inverse U . Intrinsic decoher- A P Unbalanced Readout ↑ 1.0 ence of the ancilla qubit can in principle limit the circuit Q R ( ) X Unbalanced a /2 v 0.5 depth at which OTOC can be resolved, especially if the Balanced

P y 0.0 ෝ Balanced Readout ↑  characteristic time T2 is comparable to the total duration Qa R/2 (v) X -0.5 of Uˆ and Uˆ †. + P -1.0 ↑ 0.0 0.5 1.0 1.5 2.0 To benchmark the intrinsic T2 of the ancilla qubit dur- R ( ) −1 Qa /2 v X  /2 (rad) ing an OTOC experiment, we design the test circuits v shown in Fig. S10A and Fig. S10B. In either case, we B P↑

Qa R/2 (p) R/2 (m) Unbalanced State removed the two CZ gates otherwise present in an ac- Preparation: Q Y tual OTOC experiment such that the ancilla qubit does 1 p = 0, m = 0 and  Q2 Y not interact with any other qubit apart from cross-talk ෡ ෡† Balanced State Q Y 푈 푈 effects. The difference between the two cases is that b Preparation: p = 0 and , m = 0 and  the ancilla qubit remains completely idle during Uˆ and QN Y Uˆ † in Fig. S10A, whereas a train of spin echo pulses C Q Q Q Q Q Q Q Q Q Q Q Q Q X − X − Y − Y − X − X... is applied to the ancilla a 1 2 3 4 5 6 7 8 9 10 11 12 ˆ ˆ † qubit during U and U in Fig. S10B. Unbalanced State Preparation Balanced State Preparation The y-axis projection of the ancilla qubit, hσˆyi, at the 0.04 0.04

end of Uˆ and Uˆ † is measured both with and without the y

y 0.02 0.02

ෝ ෝ  added spin echoes. The results are shown in Fig. S10C.  We observe that without spin echo, hσˆyi decays rather 0.00 0.00 quickly despite no entanglement between the ancilla and other qubits in the system, falling to ∼0 after 25 circuit 1.0 1.0 0.8 0.8 cycles (corresponding to an evolution time tc ≈ 3.0 µs).

0.6 0.6

ഥ ഥ C The Gaussian shape of the decay at earlier times suggests C that low-frequency noise is likely the dominant source of 0.4 0.4 decoherence [51]. On the hand, with the addition of spin 0.2 0.2 0.0 0.0 echo, hσˆyi decays at a much slower rate maintaining a 0 10 20 30 40 0 10 20 30 40 value of 0.78 even after 40 circuit cycles (tc ≈ 4.6 µs). Cycles Cycles − tc By fitting hσˆyi to a functional form Ae T2 where A and T2 are fitting parameters, we obtain a coherence time FIG. S11. Unbiased measurements of hσˆyi. (A) Left panel T2 = 28.6 µs for the ancilla qubit, which is close to the shows the test circuits for characterizing hσˆyi arising from asymmetry in readout fidelities. Two different schemes are 2T1 limit of our quantum processor [33]. Given this value implemented. The conversion between the excited-state pop- of T2 is significantly longer than all OTOC experimental circuits used in this work (the longest circuit lasts ∼5 µs), ulation(s) P↑ and hσˆyi is described in the main text. Right panel shows the experimental values of hσˆyi as a function of we conclude that changes in the ancilla projection hσˆyi the variable phase ϕv, obtained with both readout schemes. in the OTOC experiments are indeed dominated by the Dashed lines show fits to hσˆ i = (1 − 2F ) cos (ϕ ) + d , ˆ ˆ † y r v r many-body effects in U and U rather than the intrinsic where Fr and dr are fitting parameters. (B) Schematic of decoherence of the ancilla qubit itself. For all experimen- the full OTOC circuit showing two different state prepara- tal results presented in the main text, spin echo is applied tion schemes: in the unbalanced scheme, only one phase is to the ancilla qubit. used for the first Rπ/2 gate on the ancilla qubit. In the bal- anced scheme, two different phases are used. Balanced read- out is used in both schemes. (C) Results of a 12-qubit OTOC experiment. Upper panels show hσˆyi at different number of D. Elimination of Bias in hσˆyi circuit cycles. The black squares represent the normalization case and the colored spheres represent hσˆyi obtained with the butterfly operator (Z) successively applied to qubits 2 (Q ) The accuracy of OTOC measurements is particularly 2 through 12 (Q12). The location of the butterfly operator for susceptible to any bias in hσˆyi of the ancilla qubit, i.e. each curve is indicated by the color legend on top. The nor- a fixed offset d to the ideal value. Such a bias can, for malized OTOCs, C, are shown in the lower panels. The data example, be introduced by the different readout fidelities are the average values of 40 different random circuit instances. for the |0i and |1i states of the ancilla qubit. To see the impact of the bias on OTOC, we consider an ideal OTOC value of C and an ideal normalization value of hσˆ i . The 0 y I hσˆyiB+d C0hσˆyiI+d becomes C1 = ≈ . Assuming typical ideal y-projection of the ancilla with the butterfly oper- hσˆyiI+d hσˆyiI+d values of hσˆyi = 0.05 and C0 = 0.1, even a small asym- ator applied, hσˆyiB, in such a case is hσˆyiB ≈ C0 hσˆyiI. I However, in the presence of a bias, the measured pro- metry d = 0.01 would lead to a highly erroneous value of C1 ≈ 0.25. It is therefore crucial to identify and mitigate jections become hσˆyi = hσˆyiI + d for the normalization any bias in hσˆyi of the ancilla qubit. value and hσˆyi = hσˆyiB + d with the butterfly opera- tor applied. The experimental value for OTOC then We begin by measuring hσˆyi-bias due to asymmetry 18 in readout errors. The test circuit is shown in the left preparation, the hσˆyi values exhibit sudden rise from 0 to panel of Fig. S11A under the label “unbalanced read- >0.006 at cycles 29 to 38. This behavior is inconsistent out”. We first project the qubit onto the equator of the with the effects of scrambling and decoherence, either of Bloch sphere with a π/2 rotation, Rπ/2 (ϕv), where ϕv is which is expected to lead to monotonic decay of hσˆyi to- a variable√ phase. A second π/2 rotation around a fixed ward 0 for a non-integrable process. In contrast, with axis, X, is then applied and excited state population balanced state preparation, hσˆyi indeed monotonically P↑ is finally measured. The population is then converted decays toward 0 at large cycles for all curves. to hσˆyi via hσˆyi = 2P↑ − 1. The right panel of Fig. S11A The bottom panels of Fig. S11C show the normalized shows hσˆyi as a function of ϕv and a fit to a functional average OTOCs, C, for each qubit. Here we observe form hσˆyi = (1−2Fr) cos (ϕv)+dr. Here Fr is the average that C obtained with the unbalanced state preparation of the readout fidelities of the |0i and |1i states, and dr scheme again manifests unphysical jumps from 0 at cy- is their difference. For this unbalanced readout scheme, cles 29 to 38, resulting from the finite bias hσˆyi and the we obtain Fr = 0.0962 and dr = 0.055. The observed mechanism outlined at the beginning of this section. In difference in readout fidelities is consistent with past ex- contrast, C obtained with the balanced state prepara- periments where it was shown that energy relaxation of tion scheme decays monotonically toward 0, in agreement the qubit during dispersive readout generally leads to with the scrambling behavior of a non-integrable process. lower readout fidelity for the |1i state compared to the These data suggest that a symmetrization step duration |0i state [52]. the state preparation phase of the ancilla qubit is needed The bias dr = 0.055 is significantly reduced by adopt- to completely remove the bias in hσˆyi, in addition to the ing a ”balanced readout” scheme, also shown in the left symmetrization step before readout. The physical origin panel of Fig. S11A. Here, we measure the P↑ of the an- of the remnant bias seen in the left panels of Fig. S11C is not completely understood at the time of writing, and cilla qubit√ with√ the second gate in the test circuit being either X or −X. The results are then combined to could be related to control errors in the single-qubit gates obtain hσˆyi = (P↑,+ − P↑,−), where P↑,± is P↑ measured on the ancilla qubit, incomplete depolarization of T1 er- √ rors by the spin echoes, or other unknown mechanisms. with the second gate being ±X. The averaged hσˆyi is shown in the right panel of Fig. S11A as a function of ϕv. A similar fit as before yields the same average readout fidelity Fr = 0.0962 and a much reduced bias E. Light-cone Filter dr ≈ 0.0002. Balanced readout alone, as we will demonstrate below, Analogous to classical systems, quantum perturbations is insufficient for completely removing bias from hσˆyi. often travel at a limited speed (the “butterfly veloc- For all experiments reported in the main text, we apply ity”). This typically results in a “light-cone” structure a second symmetrization step shown in Fig. S11B, which for many quantum circuits, which can be capitalized to is the measurement scheme for OTOC with the gates re- reduce their classical simulation costs [53]. Similarly, the lated to SPAM explicitly shown. Here, we parametrize light-cone structure of these quantum circuits may also both the phase ϕp of the first π/2 gate and the phase be utilized to modify their implementations on a quan- ϕm of the last π/2 gate on the ancilla qubit. In an “un- tum processor and improve the fidelity of experimental balanced state preparation” scheme, we measure the av- results. In this section, we describe a light-cone-based erage projections hσˆyi = (P↑,++ − P↑,+−), where P↑,++ circuit re-compilation technique that led to considerable and P↑,+− are P↑ obtained with (ϕp, ϕm) = (0, 0) and improvements in the accuracy of experimental OTOC (ϕp, ϕm) = (0, π), respectively. In a ”balanced state measurements. preparation” scheme, we measure the average projec- Figure S12A displays the generic structure of an 1 tions hσˆyi = 2 (P↑,++ − P↑,+− − P↑,−+ + P↑,−−), where OTOC measurement circuit, where the component gates † P↑,−+ and P↑,−− are additional populations obtained of the quantum circuit Uˆ and its inverse Uˆ are explic- with (ϕp, ϕm) = (π, 0) and (ϕp, ϕm) = (π, π), respec- itly shown. The butterfly operator possesses a pair of tively. triangular light-cones extending from the middle of the The difference between the two state preparation circuit into both Uˆ and Uˆ †. Quantum gates outside these schemes is illustrated in Fig. S11C, where we present the light-cones may be completely removed (“filtered”) with- results of a 12-qubit OTOC experiment. The√ quantum out altering the output of the circuit. Furthermore, since circuit Uˆ is non-integrable and composed of iSWAP the measurement at the end of the circuit is also local- and random single-qubit π/2 rotations around 8 differ- ized at a single qubit (Q1), one may additionally discard ent axes on the XY plane. A total of 40 circuit instances quantum gates outside the light-cone of Q1 originating are used and the data shown are average values over all from the right-end of Uˆ †, without altering circuit output. instances. The top panels show the measured values of The gates removed by the light-cone filter are shown with hσˆyi for the normalization case and cases where a butter- semi-transparent colors in Fig. S12A. In practice, some fly operator Z is successively applied to qubits 2 to 12 in- qubits have much longer idling times as a result of gate between Uˆ and Uˆ †. The y-axis scale is intentionally lim- removal and become more susceptible to decoherence ef- ited to < 0.05. We observe in the case of unbalanced state fects such as relaxation and dephasing. To mitigate such 19

A ments are shown in Fig. S12B and Fig. S12C. The left Q 1 panel of Fig. S12B shows the configuration for a 53-qubit

Q2 OTOC experiment. Here we choose a quantum circuit Uˆ with iSWAP and random single-qubit gates which are Q3 π/2 rotations around axes on the XY plane. The axes Q4 of rotation are chosen such that exactly√ 48 non-Clifford√ Q 5 rotations (randomly selected from ±W and ±V ) oc-

Q6 ˆ ˆ † cur in U and U . All other single-qubit√ gates√ are Clifford Q7 rotations randomly selected from ±X and ±Y ). The number of circuit cycles is fixed at 11. The right panels QN of Fig. S12B show experimental results for 100 individual ෡ ෡† 푈 푈 instances of Uˆ, whereby hσˆyi for the normalization case R iSWAP iSWAP-1 random is plotted at the top and hσˆyi with the butterfly oper- −1 or iSWAP or iSWAP -1 Rrandom ator (X) applied is plotted at the bottom. We observe B significant enhancements in the amplitudes of hσˆyi after 0.25 0.20 the application of the light-cone filter, with the normal-

y 0.15 ෝ  ization hσˆyi values averaging to 0.024 without the filter 0.10 Unfiltered Filtered 0.05 and 0.194 with filter. 0.00 The experimental improvement facilitated by the light- 0.15 Unfiltered Filtered 0.10 cone filter is more clearly seen by comparing the normal-

y 0.05

ෝ  0.00 ized OTOC values C with exact numerical simulation -0.05 of the same circuits, as shown in Fig. S12C. Without 0 20 40 60 80 100 Circuit Instance, i the light-cone filter, the experimental C values are sub- C D stantially different from simulation results. On the other 1.5 Unfiltered 1.0 Unfiltered hand, with the light-cone filter, the agreement between Filtered 1.0 Filtered

Simulation experimental and simulated values is much closer. We ε C 0.5 0.5 further quantify the effect of light-cone filter by plot- ting the differences between numerical and experimental 0.0 0.0 values of C, , in Fig. S12D. Here we observe that the

0 20 40 60 80 100 0 20 40 60 80 100 root-mean-square (RMS) value for  is 0.156 without the Circuit Instance, i Circuit Instance, i light-cone filter, whereas it is reduced to 0.041 after fil- ter is applied. This four-fold improvement in accuracy of FIG. S12. Re-compiling OTOC circuits with light-cone filter. OTOC measurements is a natural consequence of the re- (A) Schematic of an OTOC measurement circuit, including duction of number of gates in the overall quantum circuit ˆ ˆ † the component gates of the quantum circuits U and√U . The (the number of iSWAP gates is reduced from 464 to 161 ancilla qubit and its related gates, as well as the Y gates for the example in Fig. S12). Although the additional used in the state preparation of all qubits, are omitted for qubit idling introduced by the light-cone filter carries er- simplicity. Gates shown with semi-transparent colors can be removed from the OTOC measurement circuit without al- rors as well, they are expected to be much less than the tering its output. (B) Left panel shows a configuration for errors of the removed two-qubit gates, particularly when evaluating experimental effects of the light-cone filter. The spin echo is also applied during the idling. unfilled circle represents the ancilla qubit and the black filled circle represents the measurement qubit. The purple filled circle indicates the butterfly qubit. The total number of cir- F. Normalization via Reference Clifford Circuits cuit cycles is 11. Right panel shows the measured values of hσˆ i for 100 random circuit instances. Data obtained in the y The last error-mitigation technique we use for the normalization case are shown on top and data obtained with ˆ the butterfly operator applied are shown at the bottom. The OTOC experiment is specific to quantum circuits U com- same number of repetitions (1 million) is used in all cases to posed of predominantly Clifford gates with a small num- estimate hσˆyi. (C) Normalized experimental OTOC values C ber of non-Clifford gates. For such circuits, it is found for different circuit instances, plotted alongside exact numeri- through numerical studies that a modified normaliza- cal simulation results. (D) Experimental errors  for different tion procedure yields more accurate values of OTOC ˆ circuit instances, corresponding to the differences between ex- (Fig. S13A): Consider a quantum√ circuit U√composed perimental and simulated values. of mostly Clifford gates (iSWAP, ±X√ and ±√Y ) and a small number of non-Clifford gates ( ±V and ±W ). We first measure the hσˆyi of the ancilla with a butter- effects, we also apply spin echo to qubits with long idling fly operator applied between Uˆ and Uˆ † (we denote this times, similar to the approach to the ancilla qubit in value as hσˆyi2), same as before. In a second step, instead Fig. S10. of measuring hσˆyi without applying the butterfly opera- The effects of the light-cone filter on OTOC measure- tor (denoted by hσˆyi0), we measure hσˆyi with the same 20

A ues of OTOC, C, in two different ways: First, we apply Q X V−1 Y Q X X Y

1 1 C = hσˆyi2 / hσˆyi0, which corresponds to the normaliza-

iSWAP iSWAP iSWAP −1 iSWAP −1 Q2 Y X X Q2 Y X X tion procedure used in the previous sections. Second,

we apply C = hσˆ i / hσˆ i , corresponding to normal- iSWAP −1 iSWAP −1 −1 −1 y 2 y 1 Q3 X W X Q3 X Y X

ization using hσˆyi of a reference circuit. Here the abso-

iSWAP iSWAP iSWAP Q Y iSWAP X−1 Y−1 Q Y X−1 Y−1 N N lute sign accounts for the fact that the theoretical OTOC ෡ ෡ values of Clifford circuits are ±1. The resulting C val- 푈 푈ref B ues are both plotted alongside exact simulation results

0 푈෡ (Norm.)

y 0.05 ෝ  in the left panel of Fig. S13C. It is easily seen that the 0.00 second normalization procedure with reference Clifford 0.10 1 circuits yields experimental values that are in much bet-

y 0.00

ෝ  ෡ -0.10 푈ref ter agreement with simulation results. Indeed, the ex-

2 perimental errors  (Fig. S13D) have an RMS value of

y 0.00 ෝ  0.250 when C = hσˆyi / hσˆyi is used and 0.157 when 푈෡ 2 0 -0.10 0 20 40 60 80 100 C = hσˆyi2 / hσˆyi1 is used. Given these observations, we Circuit Instance, i adopt normalization via reference Clifford circuits when ෝ / ෝ  /|  |  /  /| | C y 2 y 0 ෝy 2 ෝy 1 Simulation D ෝy 2 ෝy 0 ෝy 2 ෝy 1 measuring quantum circuits dominated by Clifford gates. 1.5 0.6 Lastly, we note that for the data in Fig. 3 and Fig. 4 of 1.0 0.4 0.5 the main text, we apply a ensemble of reference circuits

0.2 ˆ ε C 0.0 0.0 Uref and use the average value of |hσˆyi| obtained from all -0.5 -0.2 Uˆ to normalize hσˆ i of the actual quantum circuit Uˆ. -0.4 ref y -1.0 -0.6 ˆ ˆ -1.5 The typical number of Uref for each U is 10 in Fig. 3 and 0 20 40 60 80 100 0 20 40 60 80 100 varies between 15 and 70 for Fig. 4. Circuit Instance, i Circuit Instance, i

FIG. S13. Normalization via reference Clifford circuits. (A) III. LARGE-SCALE SIMULATION OF OTOCS ˆ Schematic of two quantum circuits: U is the actual quantum OF INDIVIDUAL CIRCUITS circuit of interest, composed√ mostly√ of Clifford gates and a few non-Clifford gates ( ±V and ±W ). Uˆref is a reference ˆ In the last few years, there has been a constant devel- circuit with the same Clifford gates as U. The non-Clifford√ ˆ opment of new numerical techniques to simulate large gates√ in U are replaced with random Clifford gates ±X and ±Y in Uˆref. (B) Left panel shows an experimental con- scale quantum circuits. Among the many promising figuration for comparing two normalization procedures. The methods, two major numerical techniques are widely black unfilled (filled) circle represents the ancilla (measure- used on HPC clusters for large scale simulations: tensor ment) qubit. The purple filled circle represents the butterfly contraction [44, 54–58] and Clifford gate expansion qubit. The total number of circuit cycles is 14. Right panel [6,7, 59, 60]. All the aforementioned methods have shows the measured values of hσˆyi for 100 random circuit advantages and disadvantages, which mainly depend on instances. hσˆyi0 denotes values obtained without applying the underlying layout of the quantum circuits and the ˆ butterfly operator to U, hσˆyi1 denotes values obtained with type of used gates. On the one hand, tensor contraction ˆ butterfly operator applied to Uref, and hσˆyi2 denotes values works best for shallow circuits with a small treewidth obtained with butterfly operator applied to Uˆ. The same [58, 61]. On the other hand, Clifford gate expansion is number of repetitions (4 millions) is used in all cases to es- mainly used to simulate arbitrary circuit layouts with timate hσˆyi. (C) Normalized experimental OTOC values C few non-Clifford gates. Indeed, it is well known that for different circuit instances, plotted alongside exact numeri- circuits composed of Clifford gates only can be simulated cal simulation results. (D) Experimental errors  for different in polynomial time [59], with a numerical cost which circuit instances, corresponding to the differences between ex- perimental and simulated values. grows exponentially with the number of non-Clifford gates [6]. Both methods can be used to sample exact and approximate amplitudes, with a computational cost which decreases with an increasing level of noise. butterfly operator but a different quantum circuit Uˆ ref For instance, approximate amplitudes can be sampled and its inverse Uˆ † (denoted by hσˆ i ). The reference ref y 1 by slicing large tensor network and contracting only a ˆ ˆ circuit Uref has the same Clifford gates as U, whereas the fraction of the resulting sliced tensors [62, 63]. The final ˆ non-Clifford gates in U√are replaced√ with Clifford gates fidelity of the sample amplitudes is therefore propor- chosen randomly from ±X and ±Y . tional to the fraction of contracted slices [56, 57], which Example data showing hσˆyi1, hσˆyi2 and hσˆyi3 are can be tuned to match experimental fidelity. Similarly, shown in Fig. S13B, where Uˆ contains a total of 8 non- it is possible to sample approximate amplitudes by only Clifford rotations and 14 cycles. Similar to the previous selecting the dominant stabilizer states in the Clifford section, we present results from 100 circuit instances. expansion [7, 60]. Next, we process the data to obtain experimental val- 21

In our numerical simulations, we used tensor contrac- ND = 8 tion to compute approximate OTOC values, which are 104 ND = 12 then validated using results from the Clifford expansion ND = 16 3 for circuits with a small number of non-Clifford gates. 10 ND = 20 Both methods are described in the following sections. ND = 24 102 18.0 ± 2.4 s/branch 5%-95%

[single node] 101

A. Numerical Calculation of the OTOC Value branching time (s) 100 As described in the main text and shown in Fig. 1(A), 101 103 105 107 109 the experimental OTOC circuits have density-matrix-like nb structure of the form Cˆ = Uˆ † σˆ(Qb) Uˆ, withσ ˆ(Qb) being the butterfly operator. In all numerical simulations, we used iSWAPs as entangling two-qubit gates. Before and after Cˆ, a controlled-Z gate is applied between the qubit Q1 (in Cˆ) and an ancilla qubit Qa (external to Cˆ): the OTOC value is therefore obtained by computing the ex- pectation value of hσˆyi relative to the ancilla qubit Qa. To reduce the computational cost, it is always possible to project the ancilla to either 0 or 1 (in the computational (Q1) (Q1) basis). Let us call Cˆ0 = Cˆ (Cˆ1 =σ ˆz Cˆ σˆz ) the cir- cuit with the ancilla qubit projected on 0 (1). Therefore, the OTOC value hσˆyi can be obtained as:

hσˆyi = R [hψ1|ψ0i] , (S4) FIG. S14. (Top) Runtime (in seconds) to explore a given number of branches (results are for nodes with 2 six-core ˆ ˆ with |ψ0i = C0|+i and |ψ1i = C1|+i respectively. Intel Xeon [email protected]). (Bottom) Number nb of ex- plored branches by varying the number ND of non-Clifford rotations (boxes extend from the lower to upper quartile val- B. Branching Method ues of the data, with a line at the median, while whiskers correspond to the 5% − 95% confidence interval). The inset shows the projected runtime on Summit. To get exact OTOC value for circuit with a small number of non-Clifford rotations, we used a branch- ing method based on the Clifford expansion. More † precisely, recalling that OTOC circuits have a density- in U only (U and U may have a different number of matrix-like structure, that is Cˆ = Uˆ † σˆ(Qb) Uˆ = non-Cliffords because of the different lightcones acting † † (Q )  on them. See Fig. S12A). More precisely, our branching gˆ ··· gˆ σˆ b gˆ1 ··· gˆt , it is possible to apply each t 1 algorithms scales as the number of branches n induced  † (Qb) b pair of gates gˆt, gˆt toσ ˆ iteratively and “branch” by the N non-Clifford rotations in U, that is O (n ). only for non-Clifford gates. D b After the applications of all {gˆ } gates, the At the beginning of the simulation, a Pauli “string” is i i=1, ...,t (Qb) OTOC circuit Cˆ will be then represented as a super- initialized to all identities except aσ ˆx operator (the position of distinct Pauli strings, each with a different butterfly operator) on the butterfly qubit Qb (in this ex- amples, the Pauli X is chosen as butterfly operator), that amplitude. The full states |ψ0i and |ψ1i can be then ob- tained by applying the initial state |+i to the Clifford is P = Iˆ(1)Iˆ(2) ··· σˆ(Qb) ··· . Whenever a pair of Clif- x expansion of Cˆ and, therefore, the OTOC value from ford operators gˆ , gˆ† is applied to P, the Pauli string t t Eq. (S4). To reduce the memory footprint of our branch- is “evolved” to another Pauli string. For instance, an ing method, the initial state |+i is always applied to all iSWAP operator evolves the Pauli string P = Iˆσˆ to x Pauli strings once the last gate in U is applied. There- P0 = iSWAP† Iˆσˆ  iSWAP = − σˆ σˆ . On the con- x y z fore, our branching algorithm will output both |ψ i and trary, if a non-Clifford operator is applied to P, the 0 |ψ i as a superposition of binary strings. Because some Pauli string P will evolve into a superposition of mul- 1 of binary string composing |ψ i and |ψ i may have a zero tiple Pauli strings, that can be eventually explored as 0 1 amplitude, (due to destructive interference), the number independent branches.√ As an example, the non-Clifford √ of binary strings np composing |ψ0i and |ψ1i is typically rotationg ˆt = W = X + Y will branch P =σ ˆx three σˆ smaller than the number of explored branches nb, that is times into P0 = √σˆx , P0 = √y and P0 = σˆz respectively. 1 2 2 2 3 2 np ≤ nb.  † Because gˆt, gˆt are always applied in pairs (one gate Fig S14 (top) shows the number of explored branches † from U and one gate from U ), the computational com- nb by varying the number ND of non-Clifford rotations plexity depends on the number ND of non-Clifford gates in Uˆ. Because the only non-Clifford gates used in the 22

20.79ND 5 223 VM

4 Peak VM (GB) 218 3 13 p 2 n 2 28 1 23 0 0 2 4 6 8 10 12 14 16 18 20 22 24

ND

FIG. S15. Number np of bitstrings in |ψ0i and |ψ1i which 100 are different from zero after applying the butterfly operator Cˆ = Uˆ †σˆ(Qb)Uˆ to the initial state |+i, by varying the number 10 4

ND of non-Clifford rotations. The shaded area correspond 8 to the peak of the amount of virtual memory (RAM) used 10 to simulate the OTOC circuits (95% among different simula- 10 12 tions). abs. error 10 16 1 projection 20 projection 10 20 √ √ 0 4 8 12 16 20 24 ± ± OTOC experimental circuits are W and V , with ND W = X+Y and V = X−Y respectively, one can compute the expected scaling by assuming that, at each branch- ing point, there is an homogeneous probability to find FIG. S16. (Top) Comparison between exact and approximate OTOC values (circuits are ordered accordingly to the exact any of the four Pauli operators {I,X,Y,Z}. Because OTOC value over different circuits with different depths, lay- non-Clifford rotations branch only twice on Z, thrice on outs and numbers of non-Clifford rotations). The Pearson {X,Y } and never on I, the expected scaling is nb ∝ n n coefficient between exact and approximate OTOC values is 3 2 2 4 , which has been confirmed numerically in Fig. S14 R = 0.99987. (Bottom) Absolute error by varying the num- (top). Fig S14 (bottom) shows the runtime (in seconds) ber ND of non-Clifford rotations (boxes extend from the lower to explore a given number of branches on single nodes to upper quartile values of the data, with a line at the me- of the NASA cluster Merope [64]. While the interface of dian, while whiskers correspond to the 5% − 95% confidence our branch simulator is completely written in Python, the interval). core part is just-in-time (JIT) compiled using numba to achieve C−like performance. Our branch simulator also uses multiple threads (24 threads on the two 2 six-core C. Tensor Contraction Intel Xeon [email protected] nodes) to explore multiple branches at the same time and it can explore a single branch in ∼ 18 µs. Because multithreading starts for Tensor contraction is a powerful tool to simulate large 3 quantum circuits [44, 54–58]. In our numerical simula- nb > 10 only, it is possible to see the small jump caused by the multithreading overhead. The inset of Fig. S14 tions, we use tensor contraction to compute approximate shows the projected runtime on Summit by rescaling OTOC values. It is well known that approximate am- Summit plitudes can be sampled by properly slicing the tensor to Summit’s Rmax [65](Rmax = 200,794.9 TFlops, Merope (single node) network and only contracting a fraction of the sliced ten- Rmax = 140.64 GFlops), assuming that |ψ0i sor networks [56, 57, 62, 63]. However, in our numeri- and |ψ1i can be fully stored in Summit (seed Fig. S15). cal simulations, we used a different approach to compute In Fig. S15, we report the number of elements different approximate OTOC values. More precisely, rather than from zero in |ψ0i and |ψ1i. As one can see, the number of computing one (approximate) amplitude at a time using 0.79 ND elements different from zeros scales as 2 and can tensor contraction, we compute (exact) “batches” of am- be accommodated on single nodes (shaded area corre- plitudes by leaving some of the terminal qubits in the sponds to the amount of virtual memory [RAM] used by tensor network “open”. Let us call κ a given projection the simulator). Because branches are explored using a of the non-open qubits and us define |ψ(κ)i and |ψ(κ)i the depth-first search strategy, most of the virtual memory 0 1 projection of |ψ i = P |ψ(κ)i and |ψ i = P |ψ(κ)i re- used by our branching algorithm is reserved to store |ψ0i 0 κ 0 1 κ 1 spectively. Therefore, we can re-define the OTOC value and |ψ1i, it would be in principle possible to simulate as a “weighted” average of partial OTOC values, that is: between ND ≈ 50 and ND ≈ 60 non-Clifford rotations on Summit before running out of virtual memory (Sum- P mit has 10 PB of available DDR4 RAM among its 4,608 κ ωκhσˆyiκ hσˆyi := P , (S5) nodes [66]). κ ωκ 23 with

 (κ) (κ)  RQC R hψ1 |ψ0 i hσˆyiκ = (S6) (κ) 2 (κ) 2 kψ0 k + kψ1 k /2

(κ) 2 (κ) 2 and ωκ = kψ0 k + kψ1 k /2. It is interesting to observe that Eq. (S6) corresponds to the correlation co- efficient between two non-normalized states and that the exact OTOC value is equivalent to the weighted average of single projection OTOC values. Indeed, when all κ are included, Eq. (S5) reduces to Eq. (S4). However, because hσyi needs to be O(1) to be experimentally measurable, few projection κ may actually be sufficient to get a good estimate of Eq. (S5). FIG. S17. Number of slices required to fit the largest tensor In our numerical simulations, we left 24 qubits open in memory during tensor contraction with a threshold of 228 and we used 20 random κ projections to compute an elements by varying the number NS of iSWAPs (boxes extend approximate OTOC value using Eq. (S5). If not indi- from the lower to upper quartile values of the data, with a line cated otherwise, medians and confidence intervals are at the median, while whiskers correspond to the 5%−95% con- computed by bootstrapping 1,000 times Eq. (S5) using fidence interval). Green stars correspond to expected number 10 randomly chosen projections among the 20 available. of slices to compute single amplitudes of random quantum circuits (RQC) with a similar depth [33, 58]. (Inset) Number Fig. S16 shows the comparison of approximate and exact of elements to store in memory for the largest tensor during OTOC values, by varying the circuit index (top) and by tensor contraction. For sufficiently deep circuits, the number varying the number of non-Clifford rotations (bottom). of elements for the largest tensor saturates to the size of the As one can see, there is a great agreement between ap- Hilbert space 2n, with n = 53 the number of qubits in the proximate and exact results across different circuits with Sycamore chip. different depth, layout and number of non-Clifford rota- tions (top). The bottom part of Fig. S16 shows the ab- solute error by varying the number of non-Clifford rota- pute single amplitudes of random quantum circuits with tions. In particular, we are comparing results by averag- a similar number of iSWAPs [33, 58]. Because OTOC ing over single projections κ using Eq. (S6) (orange/light circuits and the random quantum circuits presented in gray boxes) or by bootstrapping 1,000 times Eq. (S5) us- [33] share a similar randomized structure, the compu- ing 10 randomly chosen projections among the 20 avail- tational complexity mainly depends on the number of able (blue/dark gray boxes). As expected, results become iSWAPs (OTOC slicing is slightly worse because of the less noisy by increasing the number of used projections. open qubits). We may expect an improvement in the slic- Due to the limited amount of memory in HPC nodes, ing by using novel techniques as the “subtree reconfigu- slicing techniques are required to fit the tensor contrac- ration” proposed by Huang et al. [44]. It is interesting tion on a single node and avoid node-to-node memory to observe that, for deep enough circuits, the number of communication overhead [57, 58]. In our numerical sim- elements of the largest tensor in the tensor contraction ulations, we used cotengra [58] and quimb [67] to iden- saturates to the number of qubits (inset). tify the best contraction (including slicing) and perform Fig. S18 summarizes the computational cost to com- the actual tensor contraction respectively. Because each pute an exact single projection κ. Top panel of Fig. S18 different projection κ may lead to a slightly different reports the actual runtime to contract a tensor network, simplification of the tensor network (in our numerical by varying its expected total cost (including the slic- simulations, we used the rank and column reduction in- ing overhead) in FLOPs (expected FLOPs are using by cluded in the quimb library), we recompute the best cotengra to identify an optimal contraction). Dashed contraction for each single projection. For each pro- and dot-dashed lines correspond to the peak performance jection (regardless of the depth/number of iSWAPs) we and expected performance of a 2 six-core Intel Xeon fixed max repeats = 128 in cotengra and restarted the [email protected] node. The sustained performance of 64% heuristics 10 times to identify the optimal contraction is consistent with similar analysis on the NASA cluster (with a hard limit of 2 minutes for each run), using 24 [57]. Bottom panel reports the number of FLOPs (with threads on a 2 six-core Intel Xeon [email protected] and without the slicing overhead) by varying the number node. We found that the runtime to the best contrac- of iSWAPs. On the right y-axis, it is reported the ex- tion scales as 0.03NS − 3.68 minutes, with NS the num- trapolated number of days to simulate OTOC circuits of ber of iSWAPs in the circuit, that is less than 20 min- a given number of iSWAPs, by assuming a sustained per- utes for ∼ 600 iSWAPs. Fig. S17 shows the number of formance of 148,600 TFLOPs. Green stars corresponds slices required to have the largest tensor in the tensor to the total FLOPs (including the slicing overhead) to contraction no larger than 228 elements. Green stars in compute single amplitudes of random quantum circuits the figure correspond to the number of slices to com- with a similar number of iSWAPs [58]. 24

nearest neighbour qubits hi, ji. For each pair a cycle con- peak performance (140.64 GFLOPs) sists of two single qubit gates Gˆi, Gˆj and an entangling expected performance (64%) 104 two-qubit gate,

i iφ − θ(XiXj +YiYj )− ZiZj Uˆi,j(θ, φ) = e 2 2 , (S7) 3 10 parameterized by fixed angles θ, φ. The random in- stances are generated by drawing single qubit gates from [single node] n ˆ o 102 a specific finite set Gi . We consider two sets cor- responding√ √ to generators of single qubit Clifford group

contraction time/projection (s) 30 32 34 36 38 40 42 44 46 48 { X±1, Y ±1} and the set which in conjunction with log2(flops+cuts) any√ entangling√ two-qubit√ √ gate generates a universal set { X±1, Y ±1, W ±1, V ±1} introduced in Sec.I. 40 200 with slicing overhead 10 Average OTOC can be calculated by first considering w/o slicing overhead Summit days/projection the dynamics of the pair of butterfly operators Oˆ(t) = 30

n 175 RQC 10 [148,600 TFLOPs] † o ˆ i U OU evolved under circuit U. We introduce the average t 0.46x 104.64

c 150

e 20 of the pair of the operators acting in the direct product of j 10 o r 125 two Hilbert spaces of the two replicas of the same circuit: p /

) 10 s 100 10 p ˆ(2) ˆ ˆ ˆ ⊗2 o O (t) = O(t) ⊗ O(t) ≡ O(t) (S8) l f ( 75 1 day on Summit 0 2 10

g 1 second on Summit

o where averaging over ensemble of the circuits is denoted l 50 10 10 by (...), and the rightmost equation defines the short- 25 hand notation. 113 163 208 251 292 356392422456 497 555 593 Analogously, one can introduce the higher order aver- ages NS Oˆ(4)(t) = Oˆ(t) ⊗ Oˆ(t) ⊗ Oˆ(t) ⊗ Oˆ(t) ≡ Oˆ(t)⊗4, (S9) FIG. S18. (Top) Runtime (in seconds) to fully contract a single projection by varying the required number of FLOPs as a starting point to analyze the circuit to circuit fluc- (results are for single nodes with 2 six-core Intel Xeon tuations of C(t). [email protected]). Because long double are used throughout Then the value of OTOC is obtained by taking the the simulation, we used the conversion factor 8 to convert matrix element of C(t) = hψ| Mˆ Oˆ(t)Mˆ Oˆ(t)|ψi with the FLOPs to actual time. (Bottom) Flops to contract a single initial state that in our experimental setup is chosen to be projection by varying the number NS of iSWAPs, with and Nn |ψi = i=1 |+ii where |+ii is the symmetric superposi- without the slicing overhead. Summit’s days are obtained tion of the computational basis states. The average and by using Summit’s Rpeak [65] and the conversion factor 8 from FLOPs to actual time (because long double’s have been used the second moment of C(t) are obtained as a straight- throughout the simulation). Green stars correspond to ex- forward convolution of the indices in Eqs. (S8) and (S9) pected number of FLOPs to compute single amplitudes of respectively. Operator Oˆ(4) can be also used to study the random quantum circuits (RQC) with a similar number of effect of the initial state to the OTOC (clearly its average iSWAPs [33, 58]. is not sensitive to the initial conditions).

A. Symmetric single qubit gate set IV. MARKOV POPULATION DYNAMICS

We first consider average over uniformly random single For a broad class of circuit ensembles the average qubits gate such that for any Pauli matrix, OTOC can be computed efficiently (in polynomial time) † on a classical computer. This appears surprising as com- αˆi ≡ Gˆ αˆiGˆ = 0, (S10) puting the output of the random circuit is expected to require exponential resource. This contradiction is re- We will analyze the specific discrete gate sets used in the solved by demonstrating an exact mapping of the average experiment in SectionIVD. In this section, we use the evolution of OTOC onto a Markov population dynam- Latin indices to label a qubit, and the Greek ones to de- ˆ ˆ ˆ ics process. Such connection was identified for Hamilto- note the corresponding Pauli matricesα ˆi = {Xi, Yi, Zi} nian dynamics [11] and subsequently for simplified mod- For a pair of operators as in Eq. (S8) there exist an els of random circuits [23, 24] where uniformly random analog of scalar product – a spherically symmetric com- two-qubit gate was assumed. In practice, we implement bination that does not vanish after averaging, a specific gate set consisting of “cycles” of the form ˆ † † ˆ Q ˆ ˆ ˆ αˆi ⊗ βj = Gˆ αˆiGˆ ⊗ Gˆ βjGˆ = δαβδijBi, (S11) hiji GiGjUij(θ, φ) applied to non-overlapping pairs of 25 where we introduced “bond” notation, B. Efficient Population Dynamics for the Averaged OTOC 1 1   B ≡ αˆ ⊗ αˆ ≡ Xˆ ⊗2 + Y ⊗2 + Z⊗2 , (S12) i 3 i i 3 i i i We expand average evolution of the pair of butterfly operators (S8) as,

n and the summation over the repeated Greek indices is  ⊗2 ˆ(2) X O ˆ ˆ always implied. O (t) = P{vi} viBi + ui11i . (S14) i=1 Analogously one can find non-vanishing averages of the {vi} Pauli matrices acting in four replicas space of the same where normalization condition reads, qubit (we will omit the qubit label). There are in total six bilinear combinations ui + vi = 1, (S15)

the variable vi = {0, 1}, i = 1, 2, ..., n indicates whether 1 ˆ (12) (12) or not Pauli matrices occupy qubit i, and P{v } are form- αˆ ⊗ β ⊗ 1ˆ1 ⊗ 1ˆ1 = δαβB , B = αˆ ⊗ αˆ ⊗ 1ˆ1 ⊗ 1ˆ1, i 3 factors. In other words, each ith is characterized either (0) ˆ (13) (12) 1 by ρ ≡ 11⊗2, ”vacuum”, or the “bond” B . All other αˆ ⊗ 1ˆ1 ⊗ β ⊗ 1ˆ1 = δαβB , B = αˆ ⊗ 1ˆ1 ⊗ αˆ ⊗ 1ˆ1, i i i 3 terms in the right hand side of Eq. (S14) vanish upon . averaging over single qubit gates see Sec.IVA. . Application of a two-qubit Sycamore gate to a pair of ˆ (34) (34) 1 qubits {i, j} is then described by 22 × 22 matrix in the 1ˆ1 ⊗ 1ˆ1 ⊗ αˆ ⊗ β = δαβB , B = 1ˆ1 ⊗ 1ˆ1 ⊗ αˆ ⊗ α.ˆ 3 space {vi, vj}, (S13a)   Four cubic invariants are possible because there is no 1 0 0 0  0 1 − a − b a b  inversion operation for the spin: Ω =   , (S16)  0 a 1 − a − b b  0 b b 1 − 2 b αβγ 3 3 3 1ˆ1 ⊗ αˆ ⊗ βˆ ⊗ γˆ = αβγ C(1), C(1) ≡ 1ˆ1 ⊗ αˆ ⊗ βˆ ⊗ γ,ˆ 1 6 a = 2 sin2 θ + sin4 θ , αβγ 3 ˆ αβγ (2) (2)  ˆ   αˆ ⊗ 1ˆ1 ⊗ β ⊗ γˆ =  C , C ≡ αˆ ⊗ 1ˆ1 ⊗ β ⊗ γ,ˆ 1 1 2 2 2  6 b = sin 2θ + 2 sin θ + cos θ . 3 2 . . where the matrix Ω acts from the right on four di- αβγ ˆ ˆ αβγ (4) (4)  ˆ ˆ mensional row vector√ with the basis (00), (01), (10), (11) αˆ ⊗ β ⊗ γˆ ⊗ 11 =  C , C ≡ αˆ ⊗ β ⊗ γˆ ⊗ 11. π 6 The standard iSWAP gate corresponds to θ = 4 , (S13b) 1 1 π 1 2 a = 12 , b = 2 , and iSWAP θ = 2 with a = 3 , b = 3 . where αβγ is the three-dimensional Levi-Civita symbol. Each time when the two-qubit gate is applied the form- Because they would change sign under the inversion op- factor P is updated according to the rules eration, the cubic invariants can appear only in pairs on X neighboring qubits. P (t + 1) = P 0 0 (t)Ω 0 0 . v1...vivj ...vn v1...vivj ...vn vivj ,vivj 0 0 Finally, the three quartic invariants are vivj (S17) ˆ ˆ (2) (3) (4) αˆ ⊗ β ⊗ γˆ ⊗ δ = D Υαβγδ + D Υαγβδ + D Υαγδβ, Equations (S16)–(S17) are obtained by an application of 1 1 a two-qubit gate (S7) with φ = 0 to a pair (i, j) of factors Υαβγδ = δαβδγδ − δαγ δβδ − δαδδβγ , in Eq. (S14) and averaging the result using Eqs. (S10)– 4 4 2 (S11). D(2) = αˆ ⊗ αˆ ⊗ βˆ ⊗ β,ˆ Some additional constraints can be extracted from the 15 exact condition (3) 2 D = αˆ ⊗ βˆ ⊗ αˆ ⊗ β,ˆ n 15 h i2 O Oˆ(t) = 1ˆ1n. (S18) (4) 2 D = αˆ ⊗ βˆ ⊗ βˆ ⊗ α.ˆ i=1 15 (S13c) ˆ⊗2 ˆ Convoluting operators in Eq. (S14), using 11i → 11i, This simple form implies the spherical symmetry of the Bˆi → 1ˆ1i and the normalization condition (S15), we ob- single qubit averaging. For a lower symmetry (e.g. all the tain the requirement rotations of a cube) other quartic and cubic invariants are ⊗4 ⊗4 ⊗4 ˆ X possible (like X + Y + Z or |αβγ |11 ⊗ α ⊗ β ⊗ γ) P{vi}(t) = 1. (S19) but we will ignore them for the sake of simplicity. {vi} 26

Preserving this condition in the update rule (S17) re- Eq. (S14), we write for Oˆ(4) of Eq. (S9) quires the elements in each row of the matrix Ω from Eq. (S16) to add up to one. Moreover, all the elements of Ω are non-negative and therefore not only is P{vi}(t) n normalized but it is also non-negative. ˆ(4) X O ˆ O (t) = P{V } Vi · Qi, (S20) Therefore, rules (S17) and (S19) defines the Markov i {Vi} i=1 process and variables vi = 0, 1 correspond to the clas- sical population of each qubit. As the non-populated states vi = vj = 0 do not evolve and the reproducing (01) → (11) and destruction (11) → (01) processes are al- where Qˆi is the vector with 14 = 1 + 6 + 4 + 3 operator lowed the problem (S17) is nothing but the classical pop- components given by invariants of Eq. (S9): 0 0  ulation dynamics. The formfactors P v1...vivj...vn, t are interpreted as a distribution function over an n bit register {v ...v }, subject to the Markov process defined 1 n ˆ hˆ⊗4 i by the update, Eq. (S16). Qi = 11i , b, c, d . Direct solution of Eq. (S17) would require 2n real num-  ˆ(12) ˆ(34) bers. However, unlike the original unitary evolution, the b = Bi , ..., Bi , (S21) classical population dynamics involves only positive num-  ˆ(1) ˆ(4) bers constrained by normalization (S19). Such dynamics c = Ci , ..., Ci , is very efficiently stimulated using a classical Monte Carlo   d = Dˆ(2), Dˆ(3), Dˆ(4) , type algorithm. i i i The butterfly operator is the starting point of the Markov process Eq. (S17). At long times the distribu- Nn (erg) tion P{vi} converges to the stationary state i=1 ρi , and Vi are the 14 basis unit vectors so that 13 compo- (erg) 1 ⊗2  nents equal to zero and the remaining component is 1. where ρi = 4 11i + 3Bi , which corresponds to ”vacuum” occurring on each site i with probability In other words, each site can be in 14 possible states. p 11⊗2 = 1 and ”bond” occurring with probability i 4 A straightforward generalization of the evolution equa- p (Bi) = 3/4. OTOC in this limit takes the value cor- responding to the random matrix statistics [68]. Inter- tion (S17) reads mediate dynamics of OTOC between these two limits is fully described by the Markov process Eq. (S17). It has a form of shock wave spreading from the initial butterfly X PV ·V V ·V (t + 1) = PV ·V0 V0 ·V (t)ΩV0 V0 ,V V , operator to cover the whole system. Note that the choice 1 i j n 1 i j n i j i j V0 V0 of the two qubit gate parameter θ has a dramatic effect i j on the intermediate OTOC dynamics. As discussed in (S22) the main text θ = π/2 corresponds to the probability of butterfly operator to spread equal one, and therefore spreading with maximum velocity equal to the light cone where Ω is now 142 × 142 whose explicit form is known velocity and saturating the Lieb-Robinson bound. At but not quite important for the further consideration. any other value of θ probability to spread is less then one which results in diffusive broadening of the front, and the Equation (S22) involves 14n real numbers. The only center of the front propagates with a butterfly velocity feasible path to the solution would be an efficient Monte that is smaller than the light cone velocity. Carlo sampling. Naively, one can hope to map the prob- lem to the multicolored population dynamics. However, it is not possible as we explain below. C. Sign Problem in the Population Dynamics for OTOC Fluctuations Reliable Monte Carlo sampling requires (1) normaliz- able and non-negative weights, and (2) absence of the cor- As we already mentioned, the analytic calculation of relations in contribution of different configurations. Let the OTOC for an individual circuit is impossible. One us demonstrate that both conditions do not hold for the can expect, however, that it is possible to express the evolution of formfactors in Eq. (S20). Once again, we use variance of the OTOC in terms of some products of clas- exact Eq. (S18). Convoluting third and fourth replica in sical propagators similarly to the analysis of the meso- Oˆ(4)(t) we obtain scopic fluctuations in the disordered metals. The purpose of this subsection is to show that even such a modest task can not be efficiently undertaken. " n # The starting point is the expansion in terms of the (4) (2) O Oˆ (t) → Oˆ (t) ⊗ 1ˆ1, 1ˆ1 ≡ 1ˆ1i . (S23) single qubit rotations invariants (S13). Similarly to i=1 27

Convolutions of the four operators (S13) yield D. Population Dynamics for iSWAP Gate Sets Implemented in the Main Text 1 B(12) → B(3) ≡ αˆ ⊗ αˆ ⊗ 1ˆ1, 3 Circuits with θ = π/2 were used to realize Clifford as ⊗4 ⊗3 1ˆ1 , B(34) → 1ˆ1 , well as a universal ensemble. It is therefore instructive 1 to study dynamics of the specific gate sets used in the B(13), B(14) → B(2) ≡ αˆ ⊗ 1ˆ1 ⊗ α,ˆ 3 experiment in more detail. Consider conjugation of a ˆ ˆ 1 pair of Pauli operatorsα ˆiβj by the iSWAP gate S. It B(23), B(24) → B(1) ≡ 1ˆ1 ⊗ αˆ ⊗ α,ˆ ˆ 3 maps the pair onto another pair of Pauli operatorsγ ˆiδj αβγ according to the following rules, C(3), C(4) → C ≡ αˆ ⊗ βˆ ⊗ γ,ˆ (S24) 6 αˆ βˆ Xˆ 1ˆ1 Yˆ 1ˆ1 Zˆ 1ˆ1 Zˆ Xˆ Zˆ Yˆ αˆ αˆ Xˆ Yˆ C(1) → iB(1), C(2) → iB(2), i j i j i j i j i j i j i j i j S†αˆ βˆ S −Zˆ Yˆ Zˆ Xˆ 1ˆ1 Zˆ −Yˆ 1ˆ1 Xˆ 1ˆ1 αˆ αˆ Yˆ Xˆ 6 i j i j i j i j i j i j i j i j D(2) → B(3), 5 2 4i D(3) → B(3) + C, 5 5 1. Clifford Gate Set 2 4i D(4) → B(3) − C. 5 5 Clifford gate set used to obtain the data in the main ˆ text√ is√ drawn√ form√ the single qubit gate set {G} = Let us convolute both sides of Eq. (S20) using the rules { X, X−1, Y, Y −1}. Averaging over this gates set (S23)–(S24). We find of a symmetric pair of Pauli operators αi ⊗ αi reads, n   ˆ(2) ˆ X O (1) (2) (3) (4) ˆ† ˆ ˆ† ˆ O (t) ⊗ 11 = P{Vi} qi + qi + qi + qi , G αˆG ⊗ G αˆG = Ξα, (S25)

{Vi} i=1 (1) (1) ˆ⊗3 (2) (3) where we introduce the basis Ξα = {11, X, Y, Z}, qi ≡ Vi 11i + Vi Bi , ⊗3 2   (2) (7) ˆ (12) (13) (14) (3) 1 ⊗2 ⊗2 q ≡ V 11i + 3V + V + V B , = (Zˆ + Yˆ ), i i 5 i i i i X 2 (3)  (3) (4) (8) (1) q ≡ V + V + iV B 1 ⊗2 ⊗2 i i i i i Y = (Xˆ + Zˆ ),   2 + V (5) + V (6) + iV (9) B(2), 1 i i i i = (Xˆ ⊗2 + Yˆ ⊗2). Z 2  4i 4i  q(4) ≡ V (10) + V (11) + V (13) − V (14) C . i i i 5 i 5 i i Single qubit gate average can be described in terms of (S) the invariants Ξα → M Ξβ, (·) αβ Here, the superscript in vector Vi enumerates compo- nents according to Eq. (S21). The total result is real as  1 0 0 0  the imaginary terms are always generated in pairs. 1 1 (1) (2) (3,..,14) (S)  0 0 2 2  M = 1 1 . (S26) Configurations with Vi = ui, Vi = vi, Vi =    0 2 0 2  0, exactly reproduce averaged result (S14). It means 0 1 1 0 that all the other terms exactly cancel each other. It is 2 2 possible only if the corresponding formfactors can be of In this new basis {Ξγ Ξδ} instead of a 4 × 4 matrix Ω both signs. Therefore, any finite unconstrained sampling in Eq. (S16) effect of iSWAP gate is described by 16 × leads to an arbitrary result (sign problem). Moreover, 16 matrix constructed straightforwardly from the rules the constraints are non-local. Consider, e.g. cancellation (1) (1) Eqs. (S26) and the rules for iSWAP. of contribution proportional to Bi Bj . Cancellation occurs only if the formfactors different by replacement V (8)V (8) → V (a)V (b), a, b = {3, 4} must be kept precisely i j i j 2. Universal Gate Set the same. The requirement quickly becomes intractable with the increasing of the number of non-trivial matrices involved into cancellation. The universal gates set used in the main text con- ˆ To summarize, the general requirements on the evolu- sists√ of√ eight choices√ √ of single qubit gates {G} = tion of the Oˆ(4)(t) lead to the sign and locality problems. { X±1, Y ±1, W ±1, V ±1}. For the butterfly op- Those two facts render a brute force classical Monte Carlo erator we choose Oˆi = Xˆi the dynamics is described algorithm impossible. At present, we are not aware of any in the reduced subspace spanned by the basis Λ = algorithm enabling to circumvent those obstacles. {11⊗2, Xˆ ⊗2, Yˆ ⊗2, Zˆ⊗2}. Average of a pair of Pauli op- 28

1.0 1.0

0.8 0.8

0.6

0.6 ഥ

C 0.4 Fidelity OTOC, OTOC, 0.2 0.4

0.0 0.2

-0.2

0.0 -0.4 0 2 4 6 8 10 12 14 0 5 10 15 20 25 Cycles Cycles

FIG. S19. Extended data for Fig. 2 in the main text: Average FIG. S20. Average fidelity of OTOC, i.e. the circuit with OTOC for four different locations of butterfly operator, X. no butterfly applied shown as red line. OTOC fidelity from Solid lines correspond to experimental data and dashed lines Monte Carlo simulation of noisy population dynamics Sec.V. to the population dynamics simulation with Eq. (S27). The Pauli error rate rp = 0.013 is determined by fitting the numerics to the experimental data. This value is within 10% of the gate fidelity determined in a separate experiment benchmarking two-qubit gates, see Sec.II. This is the single (U) parameter of the model which is used to reproduce the rest erators over this ensemble reads Λα → M Λβ, αβ of the OTOC data for 51 different locations of butterfly op- erator on the chip. Black dashed line shows a naive estimate   of circuit fidelity via product of fidelities of gates within the 1 0 0 0 light cone. Note that naive fidelity decays much faster with 3 1 1 (U)  0 8 8 2  time than OTOC fidelity. M =  1 3 1  , (S27)  0 8 8 2  1 1 0 2 2 0 dynamics process in the following way,

 2 2  Together with the rules for iSWAP gate it is straight- cos φ 0 0 sin φ forward to generate 16 × 16 matrix defining the Markov  0 1 − a˜ − b a˜ b  Ω =   , population dynamics process. Comparison of this noise-  0a ˜ 1 − a˜ − b b  sin2 φ b b 8+cos2 φ 2 free population dynamics prediction for several different 9 3 3 9 − 3 b circuits with different locations of the butterfly operator 1 a˜ = cos2 φ sin4 θ + sin2 φ cos4 θ . X throughout the 53 qubit chip implemented experimen- 3 tally is shown in Fig. S19.

B. Generic Error Model

For an ensemble that satisfies Eq. (S10) the effect of V. EFFICIENT POPULATION DYNAMICS FOR relaxation and dephasing in the quantum processor can NOISY CIRCUITS be captured by a one and two qubit depolarizing channel noise model,

p1 X A. Inversion Error ρ → (1 − p )ρ + αρˆ α,ˆ (S28) 1 3 αˆ=X,ˆ Y,ˆ Zˆ The circuit to measure OTOC require inversion of p2 X ˆ ˆ ρ → (1 − p2)ρ + αˆβραˆβ. (S29) gates Uˆi,j(θ, φ) which is not perfect. An important source 15 α,ˆ βˆ of error is non-invertible phase φ such that instead of ˆ † ˆ Ui,j(θ, φ) the gate Ui,j(−θ, φ) is implemented. This in- In this case the Markov process Eq. (S16) can be modified version error can be included in the Markov population in a straight forward way. For a single qubit depolarizing 29

1.0 1.0

0.8 0.8

0.6

ഥ C

0.6 Fidelity

0.4 OTOC,

0.4 0.2

0.2 0.0

0 2 4 6 8 10 12 14 0 5 10 15 20 25 Cycles Cycles

FIG. S21. OTOC circuits (with no butterfly applied) for uni- FIG. S22. Extended data for Fig. 2 of the main text: Nor- versal iSWAP gate set. Solid red line shows experimental malized OTOC for four different locations of butterfly oper- data and dashed blue line is the theoretical fit with a single ator Z. Solid lines show experimental data, and dashed lines parameter, Pauli error rate. The extracted Pauli error rate correspond to the noisy population dynamics. is rp = 0.012 within 20% of the value obtained via two-qubit gate benchmarking. OTOC predicted in this way with experimental data is shown in Fig. S22. This procedure introduces substan- channel the Markov process Eq. (S16) is supplemented by tial noise-dependent bias into the observed OTOC values, the exponential decay rate as follows, which is illustrated in Fig. S23. ⊗2 ⊗2 11i → 11i , (S30)

−4p1/3 Bi → e Bi, (S31) C. Error Mitigation for Average OTOC for each bond per gate cycle. Two-qubit depolarizing We estimate circuit fidelity from the circuit shown in, channel noise is accounted by supplementing the Markov Fig. 1 A, B, without the butterfly operator. This fidelity process by the following, estimate is used for the error mitigation procedure aplied to the data in Fig. 2. In the error-free circuit absence of 11⊗211⊗2 → 11⊗211⊗2, (S32) i j i j the butterfly operator means that U and U † cancel each 16 ⊗2 − 15 p2 ⊗2 Bi11j → e Bi11j , (S33) other exactly resulting in Cz0(t) = 1. In practice, in- 16 version is imperfect as detailed in Sec.VA. Both errors − p2 BiBj → e 15 BiBj. (S34) in the unitary parameters and the effect of noise are re- flected in the time dependence of C (t) < 1 which serves Note that the effect of noise on the Markov population z0 as circuit fidelity, an analog of Loschmidt echo for local dynamics cannot be described by the global depolarizing operator Mˆ . channel model that is often conjectured for ergodic cir- Average over circuits reads, cuits. Instead the time dependence of OTOC will demon- strate characteristic time dependence that can be used to ⊗2 C0z = F + FB1 , (S35) verify the experimental data. 111 The described procedure allows us to verify the ex- 1 ⊗2 Czz = F − FB1 . (S36) perimental results for average fidelity of OTOC circuits, 111 3 the circuit with no butterfly applied, by direct compari- ⊗2 where the probabilities of vacuum F and bond FB1 at son to the noisy population dynamics, see Fig. S20 and 111 Fig. S21. We use the two-qubit Pauli error as a fitting the measurement site are described by the population dy- parameter. Best fit corresponds to errors within 10% of namics Eqs. (S16) with respective decay rates Eqs. (S30- the average error of two qubit cycle measured indepen- S34). Note that the decay rate grows with the extent of dently. We use the extracted error to predict values of spreading of the butterfly operator, resulting in the de- OTOC for every position of the butterfly with no addi- cay of fidelity that is not a simple exponent. Moreover, ⊗2 tional fitting. Comparison of the values of normalized in general F , FB1 is not a simple probability no error 111 30

expanded into Pauli strings Bi, 1.0 X O(t) = wα1...αn Bα1...αn ,

0.8 Bα1...αn =α ˆ1 ⊗ ... ⊗ αˆn For the initial state used in the experimental protocol |ψi = N |+i , the value of OTOC is expanded as,

0.6 i

ഥ C X z α z β C = wαα2...αn wβα2...αn h+1| σ1 σ1 σ1 σ1 |+i1. (S37)

OTOC, OTOC, 0.4 The real part of the OTOC corresponds to α = β, X ReC = w2 κ , (S38) 0.2 α1...αn α1

where κα = {1, −1, −1, 1} for α = {0, x, y, z}. 0.0 For Clifford circuits ideal value of OTOC can be calcu- 0 5 10 15 20 25 lated efficiently. It can then be used to calculate circuit Cycles fidelity by comparing data with the expected value. The fidelity of the OTOC in presence of non-Clifford gates is FIG. S23. Comparison of noisy OTOC obtained by normal- then calculated by sampling a subset of OTOCs for Clif- ization procedure, dashed lines, to noise-free population dy- fords that appear in its expansion, see Eq. (S37). The namics for the same circuit, solid lines. fidelity is calculated by averaging over fidelities of in- dividual Clifford contributions. Averaging reduces the circuit specific effect of noise and gives a more accurate estimate of fidelity. occurred, as one could naively expect, see Fig. S20 for comparison.

The ratio of Czz/C0z is compared to normalized data VI. NUMERICAL SIMULATION OF in Fig. S22. Note that this ratio does not correspond to ERROR-LIMITING MECHANISMS FOR OTOC the noise free OTOC, Fig. S23. This is because the proba- bility of vacuum at measure site F ⊗2 decays slower with 111 ⊗2 respect to its noise free value p(111 ) than the probability In this section, we provide numerical simulation re- sults aimed at identifying the potential error sources that of the bond FB1 . The latter corresponds to the weight of the operators which span the distance between mea- limit our experimental accuracy in resolving OTOCs. As sure and butterfly qubit which are more susceptible to demonstrated in SectionI, shot noise from finite statis- noise. As a result normalized OTOC from noisy circuits tical sampling is unlikely the dominant mechanism. The overestimates the value of noise-free OTOC. For individ- remaining known error channels are: 1. Incoherent, de- ual circuits such a simple description is no longer valid, polarizing noise in the quantum circuits, which can arise as demonstrated by the dependence of fidelity estimate from qubit dephasing or relaxation. 2. Coherent errors in Cz0(t) on the circuit instance, see Sec.VI. In many cases the quantum gates, e.g. the remnant conditional-phase φ circuit dependent corrections are relatively small and the demonstrated in SectionII. We study each of these errors normalization procedure still works for individual circuits below. as well. Nonetheless, for our OTOC fluctuations data we develop a more efficient error mitigation procedure de- scribed in the following section. A. Coherent and Incoherent Contributions

We first describe how OTOC error from depolarizing noise may be simulated. We consider a depolarizing chan- D. Theory of Error Mitigation for Individual nel model parameterized by an error probability p, Circuits I E(ρ) = (1 − p)ρ + p , (S39) d We use a more precise error mitigation procedure for individual OTOC measurements presented in Figs. 3, 4 where d = 2n is the dimension, and I the identity opera- of the main text. These circuits contain iSWAP entan- tor. The Kraus operators of this map are all Pauli strings p gling gate and OTOC value can be expanded in terms of of length n, where each non-trivial string has weight d2 . contributions from Clifford circuits. In presence of non- We consider a model where after each two qubit gate Clifford gates the butterfly operator can be conveniently (these typically dominate the loss in fidelity compared to 31

 (rad) 100 A B 0.00 0.05 0.10 Ns=178; ND=45 Ns=218; ND=55 0.03 -1  Error 10

0.1 rp Error ε -2 Ideal Signal Size 10 0.02 Error, Error, 10-3 0.0 Order 0. ε ∝ p1.1 Order 0. ε ∝ p1.1 C Order 1. ε ∝ p2.0 Order 1. ε ∝ p2.1 10-4 0.01 -2 OTOC Error OTOC 10-3 10-2 10-3 10 100 -0.1 Ns=278; ND=70 0.00 10-1 0 10 20 30 40 0.00 0.005 0.010 0.015 Circuit Instance rp

-2 ε 10 FIG. S24. Simulation of incoherent and coherent OTOC errors. (A) OTOCs, C, of 47 quantum circuit instances Error, 10-3 simulated in three different ways: with φ error, where a Order 0. ε ∝ p1.0 conditional-phase of 0.136 rad is added to every two-qubit Order 1. ε ∝ p2.0 10-4 gate; with r error, where a Pauli error of 0.015 is added p -3 -2 to each two-qubit gate; ideal, where no error is introduced. 10 Prob., p 10 The qubit configuration used here is a 1D chain of 10 qubits wherein Q1 and Qb reside at opposite ends of the chain. The FIG. S25. Relative error  of the 0’th and 1’st order error two-qubit gate is iSWAP and ND = 12 non-Clifford gates are approximation (from Eq. (S40)) calculated by a pure state used for each instance. The number of circuit cycles is 34 simulation, comparing to the exact value from a density ma- (NS = 250). (B) The scaling of OTOC error (i.e. RMS de- trix calculation, as a function of depolarizing probability p viation from the ideal OTOC values) against φ (red) and rp (Eq.(S39)). Here we compute Czx, varying the number of (blue). Dashed line shows the size of the OTOC signal (i.e. iSWAP (Ns) and non-Clifford gates (ND), with 11 qubits on RMS value of the ideal OTOCs). a chain. The dots are the median over 48 circuits with lines of best fit given by the legend. The shaded region is the middle 50% of the data. The scaling of the error is close to  ∝ p2 as single-qubit gates), a two-qubit depolarizing channel is expected for the one-error approximation in all three cases. applied. In Fig. S24A, we show a number of instance-dependent

OTOC values C simulated using full density matrix cal- improved as we decrease both φ and rp. We see that culations and an experimentally measured Pauli error the OTOC error is linearly proportional to rp whereas rate rp of 0.015 (rp = 15p/16 in Eq. (S39), due to the its scaling is steeper against φ. In particular, reducing trivial Pauli string in the Kraus map from the identity φ by a factor of 2 leads to an OTOC error that is 3 matrix). Here we have used a smaller number of qubits times lower. Although these results may depend on the due to the high cost of density-matrix simulation. To number of qubits and the structure of circuits, their sim- match with experiment, we have adjusted the number ilarity to experimental data nevertheless provides hints of circuit cycles to yield a total number of iSWAP gates that reducing the conditional-phases in our iSWAP gates close to those shown in Fig. 4 of the main text. The can potentially lead to large improvements in the OTOC same normalization protocol as the experiment is also accuracy. This can, for example, be achieved through adopted, such that hσˆyi is simulated with and without concatenated pulses demonstrated in Ref. [48]. the butterfly operator and their ratio is recorded as C. The results, compared to the ideal OTOC values also plotted in Fig. S24B, show little deviation. Next, we simulate OTOCs of the same circuits with rp = 0 but an experimentally measured conditional- B. Perturbative Expansion of OTOC Error phase of φ = 0.136 rad on each iSWAP gate and its inverse. The results are also plotted in Fig. S24A and The exact density-matrix calculation used in the pre- seen to deviate much more from the ideal values. To ceding section is costly to implement and becomes quantify these observations, we have plotted the OTOC quickly intractable as the number of qubits increase. error as a function of both rp and φ in Fig. S24B. Here One can more systematically estimate the instance spe- we see that at the experimental limit (rp = 0.015 and cific noise contribution from a perturbation theory ex- φ = 0.136 rad), the OTOC error is dominated by the pansion of the quantum map Eq. (S39), with an en- contribution from φ (where it is ∼0.03) rather than rp tirely pure state calculation. Let us adopt the notation (where it is ∼0.006). Interestingly, the SNR is about 0.9 σ(i,j) = σ(i) σ(j) , where σ(i) is the m’th Pauli operator for φ = 0.136 rad, which is close to the value measured m1,m2 m1 m2 m (m ∈ {0, x, y, z}) applied on qubit i. We will also call in Fig. 4 of the main text for NS ≈ 250. (i,j) Figure S24B also provides preliminary indication of C[σm1,m2 (d)] the OTOC value with the additional ‘error (i,j) how the OTOC accuracy for our experiments may be gate’ σm1,m2 inserted at layer d in the circuit. Then, to 32 accuracy O(p2) one has there is an equivalent one in U. Of course one can con- tinue this expansion to arbitrary order, however the num- ber of terms quickly becomes infeasible to compute (the n2 Cp = (1 − p) Cideal k’th order contains 15 × n2 terms). p k n2−1 X (i,j) In Fig. S25 we show the error scaling (compar- + (1 − p) C[σm ,m (d)], (S40) 15 1 2 ing to an exact density matrix computation) for the (m1,m2)6=(0,0), hi,ji,d zero’th (i.e. only the first term in Eq. (S40)) and first order approximation for Czx, giving scaling in the error as expected; the error  is computed as where Cideal is the OTOC value in the limit of no noise, |Cp(exact) − Cp(approx)|/|Cp(exact)|, where Cp(approx) Cp the first order approximation of Cideal in parameter p, is from Eq. (S40), and Cp(exact) the value from the den- and n2 the total number of two-qubit gates in the circuit, sity matrix calculation with noise rate p. Moreover, the occurring over pairs defined by hi, ji (i.e. Eq. (S40) is a accuracy of the approximation remains fairly consistent, sum over all error terms in the circuit). For convenience, for a fixed noise level (p) for the three different circuit we have also redefined p → 15p/16 from Eq. (S39) due sizes (using a similar number of iSWAP gates as in the to the trivial contribution from the all zero Pauli string. main text). Eq. (S40) can be used to separate out the ideal OTOC Here we have outlined a protocol of simulating the in- value of an individual circuit, from the noise contribution. stance specific error contribution to the OTOC. This can This noise contribution can be computed using 15 × n2 be used to more accurately separate the instance specific circuit simulations (inserting each Pauli pair at all two- OTOC value, systematic errors, and contributions from qubit gate locations in the circuit), though in some cases gate noise. Of course, the overhead in the simulation is symmetries can be used to reduce this. For example, for a factor of 15n2, which can itself become challenging in the normalization curve Cz0, only around half of this is deep enough circuits, although statistical sampling may required, since for each error in the reverse circuit U †, be feasible in some cases.