<<

arXiv:1601.07458v2 [quant-ph] 23 Aug 2016 oepoue oipeettePrnmrclyi described is numerically PTr the implement to produced code eane fteatcei tutrda olw.I Sec. In follows. as structured is article the of remainder urn eerhi ayaeso cec.Bti sas mot a also is it But science. [ of computer quantum areas part scale many elementary in of research number current the with exponentially increases ∗ avoi by simply obtained i are via which firstly and optimization trace, uniqu of partial the levels the two discuss of and calculation equivalence, numerical their the verify system, a of [ results related and entropy ex [ and Neumann decoherence marginal von quantum the of in property [ ingredient matrices fundamental density a random also of generation the [ in information trace (mutual partial quantifiers [ correlation the established of of well comput interpretation is the physical meaning for operational the instance, of for spite place, p unique In as a much has as function time PTr o computation to the like reduce would to we order purpose in that functions for [ And thermody particles the possible. of in practically number accessed moderate usually a behavior, with many-body the [ of states pair entangled [ tions xrc prxmt nomto bu unu ytm usi systems quantum about information approximate extract lcrncaddress: Electronic hncluaigcranfntoso unu ytm,i m in systems, quantum of functions certain calculating When nteqatmmcaiso opst ytm,oeubiquito one systems, composite of mechanics quantum the In u i eei oeaiei eal h ata rc functi trace partial the details in examine to is here aim Our mn hs ehd,sm aoseape r:sohsi M stochastic are: examples famous some methods, those Among I C III 8 – 11 .I Sec. In ). ,dniyfntoa hoy[ theory functional density ], nvriaeFdrld at ai,AeiaRria1000, Roraima Avenida Maria, Santa de Federal Universidade n us htcncnieal eueteos o instance For ops. the orde reduce in considerably developments can analytical what o sums, some it and doing eleme compute worthwhile of to is number implementations, it the alternative In analyze possible and systems. some function composite this of of mechanics description quantum the in procedure in elMn matrices Mann t Gell partial tion, systems, composite mechanics, quantum with Keywords: numerically parametrization Bloch’s it traces via implement partial matrices to density of provided reduced computation code the Fortran regard describe we sequence, the In dimensions ento ple to applied definition 60 aigpriltae o optn eue est matric density reduced computing for traces partial Taking – 63 [email protected] IV ]. optn ata rcsadrdcddniymatrices density reduced and traces partial Computing h aclto ftePrfrtegnrlcs fmultipart of case general the for PTr the of calculation the , 20 2 d nttt eFsc,Fcla eIgneí,Uiesddd Universidad Ingeniería, de Facultad Física, de Instituto a – 1 23 dim = – 1 3 eatmnod íia etod inisNtri Exata e Naturais Ciências de Centro Física, de Departamento .O h te ad eetysvrlatoshv hw tha shown have authors several recently hand, other the On ]. .Frnw ehv orsr osvrlohratraietec alternative other several to resort to have we now, For ]. .HreayRisg55 10,Mneie,Uruguay Montevideo, 11300, 565, Reissig y Herrera J. H H b a requires and 24 12 32 – – d 31 15 b – 35 dim = .I uhkn fivsiain ti eial oueasyst a use to desirable is it investigation, of kind such In ]. O ,rnraiaingop[ group renormalization ], ( .Bsds h T per eyfeunl,frisac,i instance, for frequently, very appears PTr the Besides, ]. .INTRODUCTION I. 47 d 38 a 6 oa Maziero Jonas , d , b 6 H 48 39 ) b ,adi netgtosrgrigpaetastos[ transitions phase regarding investigations in and ], p,isotmzdipeetto entails implementation optimized its ops, ,qatmetnlmn [ entanglement quantum ], n for and II 57 sdrc mlmnain(Sec. implementation direct ts bipartition involving PTr the for definitions two present we o en rva atr[ matter trivial a being not ns ftepriltaefnto.I Sec. In function. trace partial the of eness – htcmoetoesses hsisei udeto hurdle a is issue This systems. those compose that s tmz h mlmnaino ai n rqetyused frequently and basic of implementation the ptimize 59 gol h vial lsia optn power. computing classical available the only ng to frdcddniymtie n eae functions. related and matrices density reduced of ation d igmkn ulmlilctosadsm (Secs. sums and multiplications null making ding ecnie h optto ftepriltaevia trace partial the of computation the consider We . a nwt pca ou nisnmrclcluain The calculation. numerical its on focus special with on aia ii,myb icoe yaayigsystems analyzing by disclosed be may limit, namical ,adi h hoiso unu esrmn and measurement quantum of theories the in and ], eso rbes[ problems tension ae eue est arx lc parametriza- Bloch matrix, density reduced race, d , ,2, 1, n ntne h unn ieo lsia computers classical of time running the instances any v o h us oad h osrcino large a of construction the towards quest the for ive ossible. b sfnto stepriltae(T)[ (PTr) trace partial the is function us neCrosmltos[ simulations Carlo onte ≫ ∗ eeaie elMn’ matrices. Mann’s Gell generalized lsia optr sw notice, we As computer. classical a n oaodmkn ulmultiplications null making avoid to r s rrltdfntos saubiquitous a is functions, related or es, eas osdrtecluainof calculation the consider also We . 16 tr prtos(p)nee,under needed, (ops) operations ntary o iatt system bipartite a for , 1 o eea utpriessesand systems multipartite general for hl ietueo ata trace partial of use direct a while , 70-0,SnaMra S Brazil RS, Maria, Santa 97105-900, – hsatce epeetathorough a present we article, this 19 ,admti rdc ttsadprojected and states product matrix and ], 40 aRepública, la e t ytm srgre;adFortran and regarded; is systems ite – 51 43 – ,qatmdsod[ discord quantum ], 56 36 ,frtesrn subadditivity strong the for ], s, nqe,wt hc n can one which with hniques, , 4 I A III 37 – H 7 O ,i’ ahmtcland mathematical it’s ], oegnrlpatterns general some t ,ma edapproxima- field mean ], a ( n fewrsusing afterwards and ) d H ⊗ a 2 d b b ) with ops. III ma ag as large as em h context the n 49 32 44 eaddress we , – – 35 46 50 .The ]. ,etc), ], .I is It ]. I B III s 2

Bloch’s parametrization with generalized Gell Mann’s matrices in Sec. V. A brief summary of the article is included in Sec. VI.

II. PARTIAL TRACES FOR BI-PARTITIONS

Let O be a linear operator defined in the Hilbert space H, that is to say O : H → H. The space composed by these operators is denoted by L(H). As a prelude, let us recall that the trace of O is a map Tr : L(H) → C defined as the d sum of the diagonal elements of O when it is represented in a certain basis |ψj i ∈ H, i.e., Tr(O) = j=1hψj |O|ψj i, with d being the dimension of H [64]. P By its turn, in the quantum mechanics of composite systems with Hilbert space H = Ha ⊗ Hb, the partial trace function, taken over sub-system b, can be defined as [65]

db Trb(O)= (Ia ⊗ hbj |)O(Ia ⊗ |bj i), (1) j=1 X with

t |bji = [bj1 bj2 ··· bjdb ] (2)

† t being any orthonormal basis for Hb, hbj | = |bji , db = dim Hb, and Ib is the identity operator in Hb (X denotes the transpose of X and X† stands for its conjugate transpose). So the partial trace is a map

Trb : L(H) → L(Ha); (3) and the analogous definition follows for Tra : L(H) → L(Hb). It is worthwhile observing here that the definition above is equivalent to another definition which appears frequently in the literature [32]:

′ ′ ′ ′ Trb(|aiha | ⊗ |bihb |)= |aiha |⊗ Trb(|bihb |). (4)

′ ′ In the last equation |ai, |a i ∈ Ha and |bi, |b i ∈ Hb are generic vectors in the corresponding Hilbert spaces. In order to verify this assertion, let us use two basis |aji ∈ Ha and |bki ∈ Hb and the related completeness relations to write

da db O = (Ia ⊗ Ib)O(Ia ⊗ Ib)= (haj | ⊗ hbk|O|ali ⊗ |bmi)|aj ihal| ⊗ |bkihbm|. (5) j,l k,m X=1 X=1

db The linearity of the partial trace and Trb(|bkihbm|)= l=1hbl|bkihbm|bli = δlkδml (we applied the base independence of the trace function) lead to P

da db db da da Trb(O)= |aj i(haj | ⊗ hbk|)O(|ali ⊗ |bki)hal| = ( |aj ihaj |) ⊗ hbk|O( |alihal|) ⊗ |bki, (6) j,l k k j=1 l X=1 X=1 X=1 X X=1 which is equivalent to the definition in Eq. (1). To obtain the last equality in Eq. (6), we verified that

(|ai ⊗ |bi)ha′| = |aiha′| ⊗ |bi (7)

′ for any vectors |ai, |a i ∈ Ha and |bi ∈ Hb. Another important fact about the the partial trace function Trb(O) is that it is the only function f : L(Ha ⊗ Hb) → L(Ha) such that Trab(A ⊗ IbO) = Tra(Af(O)), for generic linear operators A ∈ L(Ha) and O ∈ L(Ha ⊗ Hb). To prove this assertion, let us start assuming that f(O) = Trb(O). Then, using O as written in Eq. (5) leads to

Trb(O)= (hak| ⊗ hbl|O|ami ⊗ |bni)|akiham|⊗ δnl = (hak| ⊗ hbl|O|ami ⊗ |bli)|akiham|. (8) k,l,m,n k,l,m X X 3

Now, utilizing the eigen-decomposition A = j aj |aj ihaj | we shall have

Tra(Af(O)) = Tra(ATrb(O)) P (9)

= Tra( aj |aj ihaj | hak| ⊗ hbl|O|ami ⊗ |bli|akiham|) (10) j k,l,m X X = aj (hak| ⊗ hbl|O|ami ⊗ |bli)Tra(|aj ihaj |akiham|) (11) j k,l,m X X =δjmδjk

= haj | ⊗ hbl|aj Ia ⊗ IbO|aj i ⊗ |bli =| haj |{z ⊗ hbl|aj } |akihak|⊗ IbO|aj i ⊗ |bli (12) j,l j,l k X X X = haj | ⊗ hbl| ak|akihak|⊗ IbO|aj i ⊗ |bli = haj | ⊗ hbl|A ⊗ IbO|aj i ⊗ |bli (13) j,l k j,l X X X = Trab(A ⊗ IbO). (14)

To complete the proof we assume that Tra(Af(O)) = Trab(A ⊗ IbO) and use a basis of linear operators Υj ∈ L(Ha) to write [34]

2 2 2 da da da † † I † f(O)= Tra(Υj f(O))Υj = Trab(Υj ⊗ bO)Υj = Tra(Υj Trb(O))Υj . (15) j=1 j=1 j=1 X X X Above, the second equality is obtained applying Tra(Af(O)) = Trab(A ⊗ IbO) and the last equivalence follows from I † Tra(ATrb(O)) = Trab(A ⊗ bO), with A =Υj . Hence the uniqueness of the decomposition of an element of a Hilbert space in a given basis [64] implies in the uniqueness of the partial trace function, i.e., f(O) ≡ Trb(O). Summing up, we proved that

f(O) = Trb(O) ⇐⇒ Trab(A ⊗ IbO) = Tra(Af(O)). (16)

III. NUMERICAL COMPUTATION OF THE PARTIAL TRACE

In this section we analyze the numerical calculation of the partial trace for bi-partite systems by first consider- ing the direct implementation of its definition and afterwards optimizing it by identifying and avoiding doing null multiplications and sums.

A. Direct implementation

Let us analyze the number of basic operations, scalar multiplications (mops) and scalar sums (sops), needed to compute the partial trace directly as given in Eq. (1). Considering that the tensor product of two matrices of dimensions mxn and oxp requires monp mops and zero sops and that the multiplication of two matrices of dimensions mxn and nxo requires mon mops and mo(n − 1) sops, we arrive at the numbers of basic operations shown in Table I.

Operation No. of mops No. of sops 2 Ia ⊗hbj | dadb 0 2 Ia ⊗ |bj i dadb 0 3 2 2 (Ia ⊗hbj |)O dadb dadb(dadb − 1) 3 2 (Ia ⊗hbj |O)(Ia ⊗ |bj i) dadb da(dadb − 1)

Table I: Number of basic operations taken by each one of the steps needed to compute the partial trace when implemented numerically directly from its definition in Eq. (1).

2 2 2 So, to calculate Eq. (1) numerically we would make use of a total of dadb (2 + da(db + 1)) mops and dadb(dadb − 1)(db + 1) sops. If db ≫ 1 then

3 3 3 mops = sops ≈ dadb = d (17) 4 would be needed, where d = dim H. As the complexity for the multiplication is, in the “worst case”, the square of that for the addition, then this ops is about O(d6).

B. (Not) Using the zeros of Ia

Now, instead of simply sending the matrices to a subroutine that computes tensor products, let’s observe that, for da ≫ 1, most matrix elements of the identity operator are null. Thus, using the notation:

[α, β : γ] = [Oα,β Oα,β+1 ··· Oα,γ−1 Oα,γ ], (18) and with |0i being the null vector in Hb, p = da, and q = db, follows that each term (Ia ⊗ hbj |)O(Ia ⊗ |bj i) in Eq. (1) is equal to:

[1, (1 − 1)q + 1 : q] [1, (2 − 1)q +1:2q] ··· [1, (p − 1)q + 1 : pq] |bj i |0i · · · |0i [2, (1 − 1)q + 1 : q] [2, (2 − 1)q +1:2q] ··· [2, (p − 1)q + 1 : pq] |0i |bji · · · |0i I ⊗ hb |     (19) a j ......  . . . .   . . . .      [pq, (1 − 1)q + 1 : q][pq, (2 − 1)q +1:2q] ··· [pq, (p − 1)q + 1 : pq]  |0i |0i · · · |bj i        

hbj | h0| ··· h0| [1, (1 − 1)q + 1 : q]|bj i [1, (2 − 1)q +1:2q]|bji · · · [1, (p − 1)q + 1 : pq]|bji h0| hbj | ··· h0| [2, (1 − 1)q + 1 : q]|bj i [2, (2 − 1)q +1:2q]|bji · · · [2, (p − 1)q + 1 : pq]|bji =     (20) ......  . . . .   . . . .       h0| h0| ··· hbj | [pq, (1 − 1)q + 1 : q]|bj i [pq, (2 − 1)q +1:2q]|bji · · · [pq, (p − 1)q + 1 : pq]|bj i         q q ∗ ∗ bjα[(1 − 1)q + α, (1 − 1)q + 1 : q]|bj i · · · bjα[(1 − 1)q + α, (p − 1)q + 1 : pq]|bj i α=1 α=1  Xq Xq  ∗ ∗   bjα[(2 − 1)q + α, (1 − 1)q + 1 : q]|bj i · · · bjα[(2 − 1)q + α, (p − 1)q + 1 : pq]|bj i =   (21) α=1 α=1  X . . X .   . .. .     q q   ∗ ∗   bjα[(p − 1)q + α, (1 − 1)q + 1 : q]|bj i · · · bjα[(p − 1)q + α, (p − 1)q + 1 : pq]|bj i   α=1 α=1  X X  q q q ∗ ∗ ∗ bjαO(1−1)q+α,(1−1)q+β bjβ bjαO(1−1)q+α,(2−1)q+β bjβ ··· bjαO(1−1)q+α,(p−1)q+β bjβ α,β=1 α,β=1 α,β=1  Xq Xq Xq  b∗ O b b∗ O b ··· b∗ O b   jα (2−1)q+α,(1−1)q+β jβ jα (2−1)q+α,(2−1)q+β jβ jα (2−1)q+α,(p−1)q+β jβ  = α,β α,β α,β  . (22)  X=1 X=1 X=1   . . . .   . . .. .     q q q   ∗ ∗ ∗   bjαO(p−1)q+α,(1−1)q+β bjβ bjαO(p−1)q+α,(2−1)q+β bjβ ··· bjαO(p−1)q+α,(p−1)q+β bjβ   α,β=1 α,β=1 α,β=1   X X X  a So, a generic matrix element of O = Trb(O) shall take the form:

db db a ∗ Okl = O(k−1)db+α,(l−1)db+β bjαbjβ. (23) α,β j=1 X=1 X 2 2 2 To compute each one of the da matrix elements above we need to do db (db + 1) mops and db(db − 1) sops. Then, on 2 2 2 2 the total dadb (db + 1) mops and dadb(db − 1) sops will be necessary. For db ≫ 1 follows that 2 3 mops = sops ≈ dadb . (24)

We notice thus a decreasing by a multiplicative factor da in ops with relation to the previous direct implementation. 5

C. (Not) Using the zeros of the computational basis

Now let us recall and verify that the partial trace is base independent and use this fact to diminish considerably the number of basic operations required for its computation. We regard the following arbitrary basis for Hb: |ji = db k=1 cjk|bki, with cjk = hj|bki. Taking the partial trace in the basis |ji,

P db db db db I I I ∗ I Trb(O) = ( a ⊗ hj|)O( a ⊗ |ji)= ( a ⊗ cjkhbk|)O( a ⊗ cjl|bli) (25) j=1 j=1 k l X X X=1 X=1 db db ∗ I I ∗ I I = cjkcjl( a ⊗ hbk|)O( a ⊗ |bli)= hj|bki hj|bli( a ⊗ hbk|)O( a ⊗ |bli) (26) j,k,l j,k,l X=1 X=1 db db = hbk|jihj|bli(Ia ⊗ hbk|)O(Ia ⊗ |bli)= hbk|bli(Ia ⊗ hbk|)O(Ia ⊗ |bli) (27) j,k,l k,l X=1 X=1 db db = δkl(Ia ⊗ hbk|)O(Ia ⊗ |bli)= (Ia ⊗ hbk|)O(Ia ⊗ |bki), (28) k,l k X=1 X=1 is then seem to be equivalent to compute the partial trace with |bj i. Hence we shall use the computational basis

t |ji = [δj1 δj2 ··· δjdb ] , (29)

(with δjk being the Kronecker’s delta function) to take partial traces, avoiding multiplying its db − 1 null elements 1 (for each |ji). This is done simply by replacing |bj i by |ji in Eq. (23) to get

db a Okl = δjαδjβ O(k−1)db+α,(l−1)db+β (30) j,α,β X=1 db

= O(k−1)db+j,(l−1)db+j . (31) j=1 X 2 Therefore, in this last implementation of the partial trace operation, we need to perform “only” da(db − 1) sops (and no mops). Or, for db ≫ 1

2 mops =0 and sops ≈ dadb. (32)

We think this is the most optimized way to calculate partial traces for bipartite systems. In the case of Hermitian a a ∗ reduced matrices (in particular for density matrices) Olk = (Okl) , and the number of basic operations needed to −1 compute the partial trace can be reduced yet by 2 da(da − 1)(db − 1) sops (which is almost half of the total when da, db ≫ 1). For the sake of illustration, it is shown in Fig. 1 the time taken to compute the partial trace via these three methods as a function of system a dimension da.

IV. PARTIAL TRACE FOR MULTI-PARTITIONS

Let us consider a multipartite system H = s Hs and a linear operator O ∈ L(H). For two arbitrary sub-systems ′ ′′ ′′ ′ ′ ′ ′′ ′′ s and s , with s >s , and bases |sj i ∈ Hs and |sj i ∈ Hs , one can verify that N ′′ ′ ′ ′′ I ′ I ′ ′′ I ′ ′′ I ′ ′′ ( s ⊗ (s +1)···(s −1) ⊗ |sk i)(|sj i⊗ (s +1)···(s −1))= |sj i⊗ (s +1)···(s −1) ⊗ |ski. (33)

1 Another, simpler way to get this result is by applying the definition in Eq. (4) to O represented as in Eq. (5), but with |bj i being a a the computational basis |ji. So, utilizing O = Trb(O) = Pk,l Pj hk| ⊗ hj|O|li⊗|ji|kihl| = Pk,l Ok,l|kihl| and the fact that only the ((l − 1)db + j)-th element of |li⊗|ji is non-null, we obtain Eq. (31). 6

250 1000

100

10

200 1

0.1

0.01

150 0.001 t(s) 0.0001 1e-05

100 1e-06 1 10 100

50

0 0 10 20 30 40 50 60 70 da

Figure 1: (color online) Time taken to compute the partial trace using Eqs. (1), (23), and (31) (from up down) as a function of system a dimension. In the inset is shown the log-log plot of the same data. We set db = da and used the maximally mixed global state in all cases. The calculations were performed using the GNU Fortran Compiler version 5.0.0 in a MacBook Air p q Processor 1.3 GHz Intel Core i5, with a 4 GB 1600 MHz DDR3 Memory. If, for da,db ≫ 1 and da = db, we have t ∝ dadb , then log t ∝ (p + q) log da. So, the increasing rate of log t with the number of quantum bits [32] constituting a in the optimized implementation of the partial trace is one fourth of that for the direct calculation.

We then use this relation to see that the equality

Trs′s′′ (O) =Trs′ (Trs′′ (O)) (34) holds for all s′ and s′′. Therefore the partial trace taken over any set of sub-systems of H can be implemented sequentially via partial tracing over single partitions; and we can do that using the following two main procedures. In one of these procedures we split H in two parts Ha ⊗ Hb and trace out a or b. And in the other one we divide H in three parties Ha ⊗ Hb ⊗ Hc and trace over the inner party b. As the first procedure was addressed in the previous section, we shall regard the details of the last one in this section. Let O ∈ L(Ha ⊗ Hb ⊗ Hc) and let us consider the partial trace

db ac O = Trb(O)= (Ia ⊗ hbj |⊗ Ic)O(Ia ⊗ |bj i⊗ Ic). (35) j=1 X 2 2 2 2 2 In a direct implementation of this equation, dadb dc(dadc(db +1)+2) mops and dadbdc(db + 1)(dadbdc − 1) sops would be used. For db ≫ 1,

3 3 3 mops = sops ≈ dadb dc . (36)

With the aim of reaching an optimized computation of the partial trace, we start considering

da db dc ac O = (hj| ⊗ hk| ⊗ hl|)O(|mi ⊗ |ni ⊗ |oi)|jihm|⊗ Trb(|kihn|) ⊗ |liho| (37) j,m=1 k,n l,o X X=1 X=1 da dc db = (hj| ⊗ hk| ⊗ hl|)O(|mi ⊗ |ki ⊗ |oi) |jihm| ⊗ |liho| (38) j,m=1 l,o k ! X X=1 X=1 da dc = (hj| ⊗ hl|)Oac(|mi ⊗ |oi)|jihm| ⊗ |liho|. (39) j,m=1 l,o X X=1 7

In this article, if not stated otherwise, we assume that the matrix representation of the considered operators in the corresponding (global) computational basis is given. Next we notice e.g. that only the element α := (m−1)dbdc +(n− 1)dc + o of |mi ⊗ |ni ⊗ |oi is non-null; thus O|mi ⊗ |ni ⊗ |oi is equal to the α-th column vector of O. Thus, Eqs. (38) and (39) and these results can be used to write the following relation between matrix elements (in the corresponding global computational basis):

db ac O(j−1)dc+l,(m−1)dc+o = O(j−1)dbdc+(k−1)dc+l,(m−1)dbdc+(k−1)dc+o. (40) k X=1 2 2 In terms of ops, in this implementation we shall utilize mops =0 and sops = dadc (db − 1), which for db ≫ 1 is

2 2 mops =0 and sops ≈ dadbdc. (41)

As in the case of bipartite systems, here also we can utilize the hermiticity of the reduced matrix to diminish the −1 number of basic operations by 2 dadc(dadc − 1)(db − 1) sops. Fortran code to perform all numerical calculations associated with this article, and several others, can be accessed in https://github.com/jonasmaziero/LibForQ.git. In particular, we provide the subroutine, partial_trace(rho, d, di, nss, ssys, dr, rhor), which returns the reduced matrix rhor once provided dr (its dimension), rho (the matrix representation in the computational basis of the regarded linear operator in H1 ⊗ H2 ⊗···⊗Hn), d (the dimension of rho), nss= n (the number of sub-systems), di (a n-dimensional integer vector whose components specify the dimensions of the sub-systems), and ssys (a n-dimensional integer vector whose null components specify the sub-systems to be traced over; the other components must be made equal to one). In the Hermitian case, just change the subroutine’s name to partial_trace_he. Let’s exemplify the application of what was discussed in this section by considering the thermal ground state [66], ρ = exp(−βH)/Tr(exp(−βH)) with β → ∞, of a line of qubits with Ising interaction between nearest-neighbors: J n−1 z z n x x(z) H = − 2 j=1 σj σj+1 − h j=1 σj , where σj are the Pauli operators in the state space of the j-th spin, h is the so called transverse magnetic field, and we set the exchange interaction strength to unit (J = 1). Here we want P P to compute the nonlocal quantum coherence [67] of the edge spins: Cnl(ρ1n) = C(ρ1n) − [C(ρ1)+ C(ρn)], where the l1-norm quantum coherence is given by [68]: C(ρ) = j=6 khj|ρ|ki, with |ji being the standard-computational basis in the regarded Hilbert space. When performing this kind of calculation, we shall need the reduced states: P ρ1n = Tr2···(n−1)(ρ12···(n−1)n), ρ1 = Trn(ρ1n), and ρn = Tr1(ρ1n). The results for the quantum coherence and a comparison between the time taken by the optimized and direct implementations of the partial trace function are shown in Fig. 2.

V. “PARTIAL TRACES” VIA BLOCH’S PARAMETRIZATION

As the most frequent application of the partial trace function is to compute reduced density matrices (also called partial states or quantum marginals), let us consider yet another approach we may apply to perform that task and analyze its computational complexity. From the defining properties of a (positiveness ρ ≥ 0 and unit trace Tr(ρ)=1), follows that it can be written in terms of Ib and of the orthonormal-traceless-hermitian generators Γj of the special unitary group, as shown below. For a bipartite system Ha ⊗ Hb, the reduced state of sub-system b can be written as [69, 70]:

2 I db −1 b b γj ρ = + Γj , (42) db 2 j=1 X † b where Trb(Γj Γk)=2δjk and γj = Trb(Γj ρ ). If the density matrix ρ of the whole system is known, and we want to compute ρb, then

dadb b γj = Trb(Γj ρ ) = Trab(Ia ⊗ Γjρ)= (Ia ⊗ Γj ρ)k,k (43) k X=1 2 t will yield the d − 1 real components of the Bloch vector ~γ = [γ γ ··· γ 2 ] . b 1 2 db −1 8

0.9 1000 n=3 100 n=4 0.8 10 n=5

1 n=6 n=7 0.7 0.1 n=8

0.01 n=9 n=10 0.001 0.6 3 4 5 6 7 8 9 10

0.5

0.4 -0.1

0.3 -0.15

-0.2 0.2 -0.25 n=3 n=4 n=5 -0.3 n=6 0.1 n=7 n=8 n=9 -0.35 n=10 0 1.5 2 2.5 3 3.5 4 4.5 1.5 2 2.5 3 3.5 4 4.5 5

Figure 2: (color online) Non-local quantum coherence (NLQC) of the edge qubits of an Ising chain with n spins as a function of the reciprocal of the transverse magnetic field. In the bottom inset is shown the first derivative of the NLQC, whose minimum is seem to go slowly towards the point of quantum phase transition (h = 1/2). In the upper inset is shown, in logarithmic scale, the difference between the time taken, to do the calculations for a fixed value of n, by the direct and optimized implementations of the partial trace function. We notice thus a exponential increase of this time difference with n.

A. Direct implementation

Let us start by the most straightforward, unoptimized, implementation of the partial trace via Bloch parametriza- I 2 2 tion. Counting the basic operations need to compute each γj , we note that for the tensor product a ⊗Γj , mops = dadb , I 3 3 2 2 for the matrix multiplication ( a ⊗ Γj )ρ, mops = dadb and sops = dadb (dadb − 1), and for the trace, sops = dadb − 1. 2 2 2 2 2 2 2 Then, for the db −1 components γj we shall need mops = (db −1)dadb (dadb +1) and sops = (db −1)(dadb +1)(dadb −1). 4 2 2 b After knowing ~γ, db mops and db (db − 2) + db sops are required to compute ρ . Thus, on total we have to do 2 2 2 2 2 2 2 2 2 mops = db (db + da(db − 1)(dadb + 1)) and sops = (db − 1)(dadb + 1)(dadb − 1) + db (db − 2) + db, which for db ≫ 1 become

3 5 mops = sops ≈ dadb . (44)

B. (Not) Using the zeros of Ia

Now let’s use the zeros of Ia to diminish the number of ops needed to obtain γj . We begin by writing

(1) (1) Γj 0dbxdb ··· 0dbxdb ρ [?] ··· [?] Γj ρ Γj [?] ··· Γj [?] (2) (2) 0d xd Γj ··· 0d xd [?] ρ ··· [?] Γj [?] Γj ρ ··· Γj [?] (I ⊗ Γ )ρ =  b b b b    =   , (45) a j ......  . . . .   . . . .   . . . .     (da)  (da) 0d xd 0d xd ··· Γj   [?] [?] ··· ρ   Γj [?] Γj [?] ··· Γj ρ   b b b b            9

where 0dbxdb is the dbxdb null matrix and [?] is used to denote those dbxdb sub-blocks of ρ that we do not need when computing γj . From the last equation we see that

da da db da db db (β) (β) (β) γj = Tr(Γj ρ )= (Γj ρ )α,α = (Γj )α,kρk,α (46) β β α=1 β α=1 k X=1 X=1 X X=1 X X=1 db db da

= (Γj )α,k ρ(β−1)db+k,(β−1)db+α, (47) α=1 k β X X=1 X=1 which is valid for any choice of the generators Γj . If computed as shown in the last equation, each γj requires 2 2 2 sops = db (da − 1) and mops = db , and we need to compute db − 1 of them. Thus, when accounted for also the ops 2 2 2 2 necessary to compute Eq. (42) given ~γ, we arrive at a total of 2db (db − 1) mops and db ((db − 1)da − 1) + db sops b needed to compute ρ . For db ≫ 1:

4 4 mops ≈ 2db and sops ≈ dadb . (48)

C. (Not) Using the zeros of Γj (and of Ib)

These numbers can be reduced even more if we use the zeros of the generators Γj . To do that, a particular basis Γj has to be chosen, and we shall pick here the generalized Gell Mann’s matrices [70]:

j 2 Γ(1) = |kihk|− j|j +1ihj +1| , for j =1, ··· , d − 1, (49) j j(j + 1) b s k ! X=1 (2) Γ(k,l) = |kihl| + |lihk|, for 1 ≤ k

From Eq. (47), when computing the components of the Bloch vector, we see that for the generators corresponding to the diagonal group in Eq. (49):

j+1 d 2 a γ(1) = (−j)δα,j+1 ρ . (52) j j(j + 1) (β−1)db+α,(β−1)db+α s α=1 β X X=1

db −1 Then 6(db − 1) mops and 1 + (da − 1)( j=1 j)=1+(da − 1)2 db(db + 1) sops are used for this first group. Related to the generators belonging to the symmetric and anti-symmetric groups in Eqs. (50) and (51), respectively, from Eq. (47) we get P

da da (2) γ(k,l) = ρ(β−1)db+l,(β−1)db+k + ρ(β−1)db+k,(β−1)db+l =2 Re(ρ(β−1)db+l,(β−1)db+k), (53) β=1 β=1 X  X da da (3) γ(k,l) = −i ρ(β−1)db+l,(β−1)db+k − ρ(β−1)db+k,(β−1)db+l =2 Im(ρ(β−1)db+l,(β−1)db+k). (54) β=1 β=1 X  X

These two groups, formed by db(db − 1)/2 elements each, entail in mops =2 and sops = 2(da − 1); but, as we will see below, we do not need them to compute the reduced state ρb. In the sequence, we shall rewrite the partial state splitting its diagonal ∆ and non-diagonal Θ parts, i.e.,

ρb =∆+Θ. (55)

We will look first at the diagonal elements of ρb:

(1) (1) j I db−1 db db−1 b γj (1) k=1 |kihk| γj 2 ∆ = + Γj = + |kihk|− j|j +1ihj +1| . (56) db 2 db 2 j(j + 1) j=1 j=1 s k ! X P X X=1 10

(1) −1 If, for j =1, ··· , db − 1, we define the function ξj = γj / 2(j(j + 1)) and set ξ0 = db and q = db, then p ∆ = diag(ξ0, ξ0, ξ0, ξ0, ξ0, ··· , ξ0) + diag(ξ1, −ξ1, 0, 0, 0, ··· , 0) + diag(ξ2, ξ2, −2ξ2, 0, 0, ··· , 0) (57)

+diag(ξ3, ξ3, ξ3, −3ξ3, 0, ··· , 0)+ ··· + diag(ξq−3, ··· , ξq−3, −(q − 3)ξq−3, 0, 0) (58)

+diag(ξq−2, ξq−2, ··· , ξq−2, −(q − 2)ξq−2, 0) + diag(ξq−1, ξq−1, ξq−1, ξq−1, ··· , ξq−1, −(q − 1)ξq−1). (59)

After some analysis, we see that

db−1 ∆1,1 = k=0 ξk and ∆j,j = ∆j−1,j−1 + (j − 2)ξj−2 − jξj−1 for j =2, ··· , db. (60)

(1) P If we know the γj ’s, for the ξj ’s mops = 4(db − 1)+1 and sops = db − 1 and for the ∆j,j ’s 2mops = sops = 4(db − 1). Then for ∆ we shall need 6(db − 1)+1 mops and 5(db − 1) sops. Let us now consider the off-diagonal elements of ρb: 1 1 Θ = γ(2) Γ(2) + γ(3) Γ(3) = (γ(2) − iγ(3) )|kihl| + (γ(2) + iγ(3) )|lihk| . (61) 2 (k,l) (k,l) (k,l) (k,l) 2 (k,l) (k,l) (k,l) (k,l) ≤k

da

Θl,k = ρ(β−1)db+l,(β−1)db+k. β X=1 ∗ −1 Then, as Θk,l = Θl,k, we shall need sops = 2 db(db − 1)(da − 1) to calculate Θ. So, after knowing ~γ, 6(db − 1)+1 −1 b mops and (db − 1)(2 db(da − 1) + 5) sops are used to compute ρ . Therefore, counting the ops need to compute (1) 2 b the γj ’s, on total mops = 12(db − 1)+1 and sops = db (da − 1) + 5(db − 1)+1 shall be used to get ρ via Bloch parametrization. And for da, db ≫ 1

2 mops ≈ 12db and sops ≈ dadb . (62) As the “worst case” complexity for the multiplication is equal to the square of the complexity for the addition, we see that this number of operations is comparable with the optimized one obtained in Sec. IIIC, for da, db ≫ 1. Here is another, more dramatic example of the reduction in the number of elementary operations needed to compute the 6 10 2 partial trace (O(dadb ) → O(dadb )), which is obtained simply by performing some analytical developments. We also provide the Fortran code to compute partial states of bipartite systems via Bloch’s parametrization using this last procedure, i.e., with generalized Gell Mann’s matrices.

VI. CONCLUDING REMARKS

In this article, a thorough discussion about the partial trace function was made. We gave special attention to its numerical implementation. It is a good programming practice trying to avoid making the computer perform calculations which make no difference to the final result. We showed here that by following this truism we can decrease considerably the number of elementary operations needed to compute the partial trace, or reduced density matrix, which is an extremely important and common procedure in the quantum mechanics of composite (and open) systems. We provided and described Fortran code for computing the partial trace over any set of parties of a discrete multipartite system. At last, we analyzed the calculation of partial states via the Bloch’s parametrization with generalized Gell Mann’s matrices. We believe this text will be of pedagogical and practical value for the physics and science communities. As the partial trace function is highly adaptable for parallel calculations, we think this is a natural theme for future investigations. Extending our approach to continuous variable systems is an interesting research topic. It would also be fruitful producing translations of our code to other open source programming languages such as e.g. Maxima and Octave, as was already done for Python.

Acknowledgments

This work was supported by the Brazilian funding agencies: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), processes 441875/2014-9 and 303496/2014-2, Instituto Nacional de Ciência e Tecnologia de 11

Informação Quântica (INCT-IQ), process 2008/57856-6, and Coordenação de Desenvolvimento de Pessoal de Nível Superior (CAPES), process 6531/2014-08. I gratefully acknowledge the hospitality of the Physics Institute and Laser Spectroscopy Group at the Universidad de la República, Uruguay. I also thank Felix Huber for pointing out an error in one of the partial traces subroutines of a previous version of LibForQ.

[1] R. P. Feynman, Simulating physics with computers, Int. J. Theor. Phys. 21, 467 (1982). [2] S. Lloyd, Universal quantum simulators, Science 273, 1073 (1996). [3] K. Fisher, H.-G. Matuttis, N. Ito, and M. Ishikawa, Quantum-statistical simulations for quantum circuits, Int. J. Mod. Phys. C 13, 931 (2002). [4] W. Krauth, Introduction to Monte Carlo algorithms, arXiv:cond-mat/9612186. [5] D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge University Press, New York, 2009). [6] D. P. Kroese, T. Brereton, T. Taimre, and Z. I. Botev, Why the Monte Carlo method is so important today, WIREs Comput. Stat. 6, 386 (2014). [7] Y. Lutsyshyn, Fast quantum Monte Carlo on a GPU, Comp. Phys. Comm. 187, 162 (2015). [8] L. P. Kadanoff, More is the same; Phase transitions and mean field theories, J. Stat. Phys. 137, 777 (2009). [9] W.-Y. Wang, W.-S. Duan, and J. Liu, The effects of the beyond mean field corrections of Fermi superfluid gas in a double-well potential, Int. J. Mod. Phys. C 23, 1250076 (2012). [10] A. Sen(De) and U. Sen, Entanglement mean field theory: Lipkin–Meshkov–Glick model, Quantum Inf. Process. 11, 675 (2012). [11] D. Yamamoto, G. Marmorini, and I. Danshita, Quantum phase diagram of the triangular-lattice XXZ model in a magnetic field, Phys. Rev. Lett. 112, 127203 (2014). [12] K. Capelle, A bird’s-eye view of density-functional theory, Braz. J. Phys. 36, 1318 (2006). [13] C. A. Ullrich and Z.-h. Yang, A brief compendium of time-dependent density functional theory, Braz. J. Phys. 44, 154 (2014). [14] R. O. Jones, Density functional theory: Its origins, rise to prominence, and future, Rev. Mod. Phys. 87, 897 (2015). [15] A. Esmailian, M. Shahrokhi, and F. Kanjouri, Structural, electronic and magnetic properties of (N, C)-codoped ZnO nanotube: First principles study, Int. J. Mod. Phys. C 26, 1550130 (2015). [16] M. E. Fisher, Renormalization group theory: Its basis and formulation in statistical physics, Rev. Mod. Phys. 70, 653 (1998). [17] H. J. Maris and L. P. Kadanoff, Teaching the renormalization group, Am. J. Phys. 46, 652 (1978). [18] J. V. Alvarez and S. Moukouri, Numerical renormalization group method in weakly coupled quantum spin chains: Com- parison with exact diagonalization, Int. J. Mod. Phys. C 16, 843 (2005). [19] C. Bény and T. J. Osborne, Information geometric approach to the renormalisation group, Phys. Rev. A 92, 022330 (2015). [20] R. Oruś, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Ann. Phys. 349, 117 (2014). [21] M. Lubasch, J. I. Cirac, and M.-C. Ban˜uls, Algorithms for finite projected entangled pair states, Phys. Rev. B 90, 064425 (2014). [22] S. Keller, M. Dolfi, M. Troyer, and M. Reiher, An efficient matrix product operator representation of the quantum-chemical Hamiltonian, J. Chem. Phys. 143, 244118 (2015). [23] D. M. Kennesa and C. Karrasch, Extending the range of real time density matrix renormalization group simulations, Comp. Phys. Comm. 200, 37 (2016). [24] S. Campbell, L. Mazzola, and M. Paternostro, Global quantum correlations in the Ising model, Int. J. Quantum Inf. 9, 1685 (2011). [25] G.-D. Lin, C. Monroe, and L.-M. Duan, Sharp phase transitions in a small frustrated network of trapped ion spins, Phys. Rev. Lett. 106, 230402 (2011). [26] G. L. Giorgi and Th. Busch, Genuine correlations in finite-size spin systems, Int. J. Mod. Phys. B 27, 1345034 (2013). [27] S. Campbell, L. Mazzola, G. De Chiara, T. J. G. Apollaro, F. Plastina, Th. Busch, and M. Paternostro, Global quantum correlations in finite-size spin chains, New J. Phys. 15, 043033 (2013). [28] C. Y. Koh, L. C. Kwek, S. T. Wang, and Y. Q. Chong, Entanglement and discord in spin glass, Laser Phys. 23, 025202 (2013). [29] C.-J. Shan, W.-W. Cheng, J.-B. Liu, Y.-S. Cheng, and T.-K. Liu, Scaling of geometric quantum discord close to a topological phase transition, Sci. Rep. 4, 4473 (2014). [30] A. Biswas, R. Prabhu, A. Sen(De), and U. Sen, Genuine-multipartite-entanglement trends in gapless-to-gapped transitions of quantum spin systems, Phys. Rev. A 90, 032301 (2014). [31] M. J. M. Power, S. Campbell, M. Moreno-Cardoner, and G. De Chiara, Nonclassicality and criticality in symmetry- protected magnetic phases, Phys. Rev. B 91, 214411 (2015). [32] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cam- bridge, 2000). [33] J. Preskill, Quantum Information and Computation, http://theory.caltech.edu/people/preskill/ph229/. 12

[34] M. G. A. Paris, The modern tools of quantum mechanics: A tutorial on quantum states, measurements, and operations, Eur. Phys. J. ST 203, 61 (2012). [35] M. M. Wilde, Quantum Information Theory (Cambridge University Press, Cambridge, 2013). [36] C. Garola and S. Sozzo, The physical interpretation of partial traces: Two nonstandard views, Theor. Math. Phys. 152, 1087 (2007). [37] S. Fortin and O. Lombardi, Partial traces in decoherence and in interpretation: What do reduced states refer to?, Found. Phys. 44, 426 (2014). [38] B. Groisman, S. Popescu, and A. Winter, Quantum, classical, and total amount of correlations in a , Phys. Rev. A 72, 032317 (2005). [39] J. Maziero, Distribution of mutual information in multipartite states, Braz. J. Phys. 44, 194 (2014). [40] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Concentrating partial entanglement by local operations, Phys. Rev. A 53, 2046 (1996). [41] A. Hosoya, A. Carlini, and S. Okano, Complementarity of entanglement and interference, Int. J. Mod. Phys. C 17, 493 (2006). [42] T. Radtke and S. Fritzsche, Simulation of n-qubit quantum systems. II. Separability and entanglement, Comp. Phys. Comm. 175, 145 (2006). [43] B. Röthlisberger, J. Lehmann, and D. Loss, libCreme: An optimization library for evaluating convex-roof entanglement measures, Comp. Phys. Comm. 183, 155 (2012). [44] L. C. Céleri, J. Maziero, R. M. Serra, Theoretical and experimental aspects of quantum discord and related measures, Int. J. Quantum Inf. 9, 1837 (2011). [45] K. Modi, A. Brodutch, H. Cable, T. Paterek, and V. Vedral, The classical-quantum boundary for correlations: Discord and related measures, Rev. Mod. Phys. 84, 1655 (2012). [46] G. H. Aguilar, O. Jiménez Farías, J. Maziero, R. M. Serra, P. H. Souto Ribeiro, and S. P. Walborn, Experimental estimate of a classicality witness via a single measurement, Phys. Rev. Lett. 108, 063601 (2012). [47] J. Maziero, Random sampling of quantum states: A survey of methods, Braz. J. Phys. 45, 575 (2015). [48] J. Maziero, Fortran code for generating random probability vectors, unitaries, and quantum states, Frontiers in ICT 3, 4 (2016). [49] N. Paunković, P. D. Sacramento, P. Nogueira, V. R. Vieira, and V. K. Dugaev, Fidelity between partial states as signature of quantum phase transitions, Phys. Rev. A 77, 052302 (2008). [50] J.-Y. Chen, Z. Ji, Z.-X. Liu, Y. Shen, and B. Zeng, Geometry of reduced density matrices for symmetry-protected topo- logical phases, Phys. Rev. A 93, 012309 (2016). [51] G. Bellomo, A. Plastino, and A. R. Plastino, Classical extension of quantum-correlated separable states, Int. J. Quantum Inf. 13, 1550015 (2015). [52] M. Christandl, B. Doran, S. Kousidis, and M. Walter, Eigenvalue distributions of reduced density matrices, Commun. Math. Phys. 332, 1 (2014). [53] L. Chen, O. Gittsovich, K. Modi, and M. Piani, Role of correlations in the two-body-marginal problem, Phys. Rev. A 90, 042314 (2014). [54] P. D. Johnson and L. Viola, On state versus channel quantum extension problems: exact results for U ⊗ U ⊗ U symmetry, J. Phys. A: Math. Theor. 48, 035307 (2015). [55] P. D. Johnson and L. Viola, Compatible quantum correlations: On extension problems for Werner and isotropic states, Phys. Rev. A 88, 032323 (2013). [56] J. Chen, Z. Ji, D. Kribs, N. Lütkenhaus, and B. Zeng, Symmetric extension of two-qubit states, Phys. Rev. A 90, 032318 (2014). [57] R. Bhatia, Partial traces and entropy inequalities, Appl. 370, 125 (2003). [58] P. Hayden, R. Jozsa, D. Petz, and A. Winter, Structure of states which satisfy strong subadditivity of quantum entropy with equality, Commun. Math. Phys. 246, 359 (2004). [59] E. A. Carlen and E. H. Lieb, Bounds for entanglement via an extension of strong subadditivity of entropy, Lett. Math. Phys. 101, 1 (2012). [60] M. Schlosshauer, Decoherence, the measurement problem, and interpretations of quantum mechanics, Rev. Mod. Phys. 76, 1267 (2004). [61] W. H. Zurek, Quantum Darwinism, Nat. Phys. 5, 181 (2009). [62] J. Maziero and F. M. Zimmer, Genuine multipartite system-environment correlations in decoherent dynamics, Phys. Rev. A 86, 042121 (2012). [63] T. Radtke and S. Fritzsche, Simulation of n-qubit quantum systems. III. Quantum operations, Comp. Phys. Comm. 176, 617 (2007). [64] G. B. Arfken and H. J. Weber, Mathematical Methods for Physicists (Elsevier, California, 2005). [65] J. Watrous, Theory of Quantum Information, https://cs.uwaterloo.ca/∼watrous/TQI/. [66] T. J. Osborne and M. A. Nielsen, Entanglement in a simple quantum phase transition, Phys. Rev. A 66, 032110 (2002). [67] M. B. Pozzobom and J. Maziero, Environment-induced quantum coherence spreading, arXiv:1605.04746. [68] T. Baumgratz, M. Cramer, and M. B. Plenio, Quantifying coherence, Phys. Rev. Lett. 113, 140401 (2014). [69] E. Brüning, H. Mäkelä, A. Messina, and F. Petruccione, Parametrizations of density matrices, J. Mod. Opt 59, 1 (2012). [70] R. A. Bertlmann and P. Krammer, Bloch vectors for qudits, J. Phys. A: Math. Theor. 41, 235303 (2008).