arXiv:1102.0316v1 [cs.IT] 1 Feb 2011 hthglgtteueuns ftegahclapproach: graphical the applicat of several usefulness present the will highlight we paper, that this In semantics. products. is graph of the variables) of internal part over as (summing assumed explicitly box” always the “closing that Loeliger of and f (FFGs) 2 “Forney-style graphs” the degree resemble of closely edges NFGs respectively. by 1, represented being distinguish in by are NFG NFG variables an an external by fac and products represent internal of vertices Moreover, and sum variables a represent represent edges set can which one we [12], which ver restri in degree of in as “normal” set graph, introducing other By bipartite the factors. while represents a variables, by represents vertices represent factors of graph of factor A product [12]. a graphs normal and [14] graphs graphs factor a function a such call we term, Borr physics engineering. a and ing science, , in ubiquitous oictosofr nfigve fteFuirtransfor Leg NFG Fourier the the such and of calculus, of transform. loop view subclass reparameterization, unifying simple tree-based a A offers . modifications function partition enitoue xlctyi h rvosfco rp lit graph factor to previous seem the importa not in an does explicitly is introduced surprisingly, NFGs been that, of paradigm functions intuitive partition and as products of sums s features. physics, similar In share function. diagrams partition repres the half-edges of fact and variables represent variables, external vertices internal which cal represent in representation, products, edges (NFG), graphical graph of natural factor “normalizati a sum normal After has products function. a of partition is sum a engineering called and sometimes physics, mathematics, • • • sums of are there as NFGs of applications many as are There are products of sums as expressed be can that Functions fpriua neetaeNGmdfiain htlaethe leave that modifications NFG are interest particular Of nti ae,w ilrpeetpriinfntosby functions partition represent will we paper, this In ebleeta h ocpulfaeoko representing of framework conceptual the that believe We Abstract oeflgnrlrsl,o hc n oolr sthe is [12]. corollary of one theorem which duality graph of normal result, general powerful a The NFGs; of physics; terms of in areas various in arise par The that relations, kind algebraic the of linear ticularly into insight provide often diagrams Trace atto ucin fNra atrGraphs Factor Normal of Functions Partition omlfco rp ult theorem duality graph factor normal algorithm sum-product Oeo h otcmo ye ffntosin functions of types common most the of —One NG) hc ul ntecnet ffactor of concepts the on build which (NFGs), aoaoyfrIfrainadDcso Systems Decision and Information for Laboratory .I I. ascuet nttt fTechnology of Institute Massachusetts hc r lsl eae oNFGs, to related closely are which , NTRODUCTION abig,M 23,USA 02139, MA Cambridge, tal. et .DvdFre,Jr. Forney, David G. [email protected] 1] 1] ihtedifference the with [16], [15], sntrlyncl derived nicely naturally is atto function partition 2,[3 is [13] [2], erature. o-called normal n the ent n”a on,” ctions endre din ed actor e a led tices have tors. ions ow- ors, m, nt s - . where n h set the and smo-rdcsfr, sfollows: as form,” “sum-of-products ofiuain scle its called is configurations ilb alda called be will ofiuain scle the called is configurations factor h aepriinfunction partition same the set The aud n htalvral lhbt r discrete. are alphabets comple variable are all functions that all and that valued, assume will we simplicity, For equivalent ucino htvariable; that of function atto function partition n a 2 s h em“xeirfunction.” “exterior term the So use variables phys temperature). [2] of external as terminology Mao usual and such the no parameters, extends function” are be “partition may there there and (although configurations), (state 1 • • • atclrsmo-rdcsfr o atto function partition a for form sum-of-products particular A A • • sal npyisapriinfnto sasmoe interna over sum a is function partition a physics in Usually nalphabets in nalphabets in Y ssoni 1] u ewl o ics hstpchere. topic this discuss not will we but [13], in shown as functions erating subsets X each nenlvrals respectively. variables, internal 7,[] n h arnedaiyrslso otbland Vontobel Chernyak of [24]. and results [23], duality Chertkov Loeliger Lagrange the of and results [8], [7], calculus” “loop the iercdsdfie ngraphs on defined codes Linear a efrhrgnrlzdt eiete“tree-based the derive Wainwright to of generalized approach others, reparameterization” further derive and to be [21] used Valiant may be of may algorithms” which “holographic the [2], Mao and Bashabsheh The atto function partition f k sastof set a is sastof set a is ( I P II. X x esyta qiaetrealizations equivalent that say We . factor oorpi transformations holographic k Z , X Y ( = y x ARTITION k k = ) = elt–akr Laboratories Hewlett–Packard ) ⊆ Q realization aoAt,C 40,USA 94304, CA Alto, Palo . f Y X involves [email protected] m k Q X i n i i m y ( X , , =1 51Pg ilRoad Mill Page 1501 aclO Vontobel O. Pascal x ∈Y 1 j n 1 xenlvariables external nenlvariables internal and =1 k ≤ ≤ X , aentrlrpeettosa NFGs, as representations natural have k Y sayfunction any is y i Y F ∈K j i k Y i.e., NTOSAND UNCTIONS j ofiuainspace configuration variable a ≤ falpsil xenlvariable external possible all of ) ≤ ifrn elztosta yield that realizations Different . domain k , k f falpsil nenlvariable internal possible all of k m Z n ⊆ ( if ; K ∈ x ; Y k X : , i y → X ftest fetra and external of sets the of , ftepriinfunction, partition the of k ∈ X safnto fcertain of function a is ) , n their and i X Z (resp. X Y k ( x j x fNG fAl- of NFGs of G i C (resp. ) X ∈ aigvalues taking RAPHS aigvalues taking c.Al-Bashabsheh ics. esyta a that say We . hti ie in given is that ilb called be will Y rsrethe preserve egtgen- weight configurations l j , if ) tal. et norsense our in Y 1 u sg of usage our j ∈ f k Y [25], sa is k y x x- ). j i A. Normal partition functions C. Equality indicator functions We will say that a realization of a partition function is We use special symbols for certain frequently occurring normal if all external variables are involved in precisely one factors. The most common and fundamental factor is the factor fk, and all internal variables are involved in precisely equality indicator function Φ=, which equals 1 if all incident two factors. These degree restrictions were introduced in [12] variables (which must have a common alphabet) are equal, and in the context of behavioral graphs. equals 0 otherwise. As observed in [12], any realization may be converted Figure 2 shows three ways of representing an equality to an equivalent normal realization by the following simple indicator function: first, by a vertex labeled by Φ=; second, by normalization procedure. a vertex labeled simply by an equality sign =; and third, as a junction vertex. The second representation makes a connection • For every external variable Xi, if Xi is involved in p with the behavioral graph literature (e.g., Tanner graphs), factors, then define p replica variables Xiℓ, 1 ≤ ℓ ≤ p, where vertices represent constraints rather than factors. The replace Xi by Xiℓ in the ℓth factor in which Xi is in- volved, and introduce one new factor, namely an equality third representation makes connections with ordinary block diagrams, where any number of edges representing the same indicator function Φ=(xi, {xiℓ, 1 ≤ ℓ ≤ p}) (see below). variable may meet at a junction, as well as with the factor • For every internal variable Yj , if Yj is involved in q ≥ 2 graph literature, where variables are represented by vertices factors, then define q replica variables Yjℓ, 1 ≤ ℓ ≤ q, rather than by edges. replace Yj by Yjℓ in the ℓth factor in which Yj is in- volved, and introduce one new factor, namely an equality indicator function Φ=({yjℓ, 1 ≤ ℓ ≤ q}). Φ= = ⑦ Thus all replica variables are internal variables that are in- volved in precisely two factors, while the external variables Fig. 2. Three representations of an equality indicator function of degree 3. Xi become involved in only one factor, namely an equality indicator function. Evidently this normalization procedure An equality indicator function of degree 2 is often denoted preserves the partition function. by a Kronecker delta function δ. Since such a function con- nects only two edges and constrains their respective variables B. Normal factor graphs to be equal, it may simply be omitted, as shown in Figure 3.3 For a normal realization of a partition function, a natural graphical model is a normal factor graph (NFG), in which δ = vertices are associated with factors, ordinary edges (i.e., hy- peredges of degree 2) are associated with internal variables, Fig. 3. Three representations of an equality indicator function of degree 2. “half-edges” [12] (i.e., hyperedges of degree 1) are associated with external variables, and a variable edge or half-edge is incident on a factor vertex if the variable is involved in that III. TRACE DIAGRAMS factor. It turns out that physicists have long used graphical dia- Example 1 (vector- multiplication). Consider a multi- grams called “trace diagrams” [10], [17], [18], [19], [20] that plication v = wM of a vector w by a matrix M, namely use semantics similar to those of NFGs. In this section we give a brief exposition of this topic, following [19].

vj = wiMij , j ∈ J , In trace diagrams, the factors are often vectors, matrices, Xi∈I , and so forth, and the variables are typically their indices. For instance, a matrix M = {M ,i ∈ I, j ∈ J} for some discrete index sets I and J . This may be interpreted ij may be considered to be a function of the two variables I and as a normal realization of the function v : J → C, with J, and is represented as a vertex with two incident edges, as external variable J, internal variable I, and factors w and i in Figure 4(a). Mij . Figure 1 shows the corresponding normal factor graph, in which the vertices are represented by labeled boxes, and the 2 I J I half-edge is represented by a special dongle symbol. M M I J J (a) (b) w M = v Fig. 4. Representations of (a) a matrix M; (b) the trace of M.

3 Fig. 1. Normal factor graph of a matrix multiplication v = wM. The last equivalence shown in Figure 3 is actually a bit problematic, since a single edge is not a legitimate normal factor graph; however, as a component of a normal factor graph, such an edge is always incident on some factor vertex fk, and since the combination of a factor fk involving some internal 2 ′ The dongle symbol “⊣” was chosen in [12] to suggest the possibility of variable Yj with an equality function Φ=(yj ,yj ) is just the same factor with ′ a connection to another external half-edge in the manner of two railroad cars Yj substituted for Yj , this substitution can be made in any legitimate NFG coupling, but of course this embellishment may be omitted. (see also [2]). Trace diagrams use the NFG convention that dangling edges Thus w is given in the form of a normal partition function (half-edges) represent external variables, whereas ordinary with external variable I and internal variables J and K. The edges represent internal variables, and are to be summed over. trace diagram or NFG of this is illustrated in For example, if the matrix M is square (i.e., the index alpha- Figure 6(b). (Notice that in this case the order of the indices bets I and J are the same), and the half-edges representing is important, since εijk = −εjik.) I and J are connected as in Figure 4(b), then the resulting Similarly, the of a 3 × 3 matrix M may be

figure represents the trace of M, since Tr M = i Mii. This written in terms of εijk as apparently explains why these kinds of graphicalP models are 3 3 3 known as “trace diagrams.” det M = εijkM1iM2j M3k. The convention that indices that appear twice are implicitly Xi=1 Xj=1 kX=1 to be summed over is known in physics as the Einstein sum- Thus if M1, M2 and M3 are the three rows of M, then its mation convention. This convention is used rather generally in determinant may be represented in trace diagram or normal physics, not just with trace diagrams. factor graph notation as in Figure 7. Trace diagrams permit visual proofs of various relationships in . For example, Figure 5 proves the identity M Tr ABC = Tr BCA. 1 I JK A B C M2 ε M3

Fig. 7. Representation of a determinant det{M1, M2, M3}. = B C A Figure 7 shows that the determinant of M may be expressed in three equivalent ways, as follows: Fig. 5. Proof of the identity Tr ABC = Tr BCA. det M = M1 · (M2 × M3)

If u and v are two real vectors with a common index set = M2 · (M3 × M1) I, then their (inner product) is defined as = M3 · (M1 × M2).

u · v = uivi. The trace diagram notation permits other operations that Xi∈I have not heretofore been considered in the factor graph liter- ature. For example, two trace diagrams with the same sets of The trace diagram (or normal factor graph) of a dot product external variables that are connected by a plus or minus sign is illustrated in Figure 6(a). represent the sum or difference of the corresponding partition I 4 I JK functions. For example, Figure 8 illustrates the “contracted u v u ε v epsilon identity,” namely (a) (b) 3 ε ε = δ δ − δ δ . Fig. 6. Representations of (a) a dot product u · v; (b) a cross product u × v. ijk kℓm iℓ jm im jℓ Xk=1

If u and v are two real three-dimensional vectors, then their I M I M I M K ❳❳✘✘✘ cross product u × v = w is defined by J ε ε L = ✘J✘❳❳❳L − J L

w1 = u2v3 − u3v2; Fig. 8. Contracted epsilon identity. w = u v − u v ; 2 3 1 1 3 From this identity, or its corresponding trace diagram, we w3 = u1v2 − u2v1. can derive such identities as Equivalently, (u × v) × w = (u · w)v − (v · w)u, 3 3 illustrated in Figure 9(a), or wi = εijkujvk, Xj=1 kX=1 (u × v) · (w × x) = (u · w)(v · x) − (u · x)(v · w), where we use the Levi-Civita symbol εijk, defined as illustrated in Figure 9(b), which reduce expressions involving two cross products to simpler forms involving only dot prod- +1, if ijk is an even permutation of 123; ucts. εijk = −1, if ijk is an odd permutation of 123;  0, otherwise. 4A product of partition functions is represented simply by a disconnected  factor graph, with each component graph representing a component function. ❳ ✘ ′ u u ❳✘❳✘ u −→ Yj ε ε = ✘✘ ❳❳ − G ′ v w v w v w −→ j −→ −→ Yj Yj (a) = ′ ′′ Gj f(yj ,yj ,yj ) u x u❳❳✘✘✘x u x −→ Yj′′ ε ε = ✘❳❳ − ′′ v w v✘ ❳w v w Gj (b) Fig. 11. Expressing an NFG in terms of subgraphs connected to a vertex. Fig. 9. Cross product identities: (a) (u × v) × w = (u · w)v − (v · w)u; (b) (u × v) · (w × x) = (u · w)(v · x) − (u · x)(v · w). −→ in µj (yj ) except f(yj,yj′ ,yj′′ ), and sum over all internal variables except Yj′ and Yj′′ . Therefore the partition function IV. THE SUM-PRODUCT ALGORITHM −→ −→ µj (yj ) of Gj may be expressed in terms of the partition The sum-product algorithm is an efficient method for com- functions of these subgraphs as follows: puting partition functions of cycle-free graphs. It has been −→ ′ ′′ −→ ′ ′ −→ ′′ ′′ explained many times, including in [12]. Here we explain it µj (yj )= f(yj ,yj ,yj ) µj (yj ) µj (yj ). y X′ ∈Y ′ y ′′X∈Y ′′ again in the language of normal factor graphs, with the objec- j j j j tive of achieving a clearer and more intuitive explanation than More generally, if the factor vertex to which edge Yj is in [12]. We freely use ideas from e.g., [1], [14], [15], [16], [26]. attached is fk(yk), then the message update rule is As Al-Bashabsheh and Mao [2] have emphasized, a partition −→ −→ µj (yj )= fk(yk) µj′ (yj′ ). function is completely determined by the set {fk(xk, yk)} of y X\{y } j′∈JY\{j} factors, independent of their ordering. In evaluating a partition k j k function, factors may be arbitrarily ordered and grouped. This This is called the sum-product update rule. Since G is connected and cycle-free, it is a tree (assuming observation (called the “generalized distributive law” by Aji −→ and McEliece [1]) is at the root of the sum-product algorithm. that it is finite). Each message µj has a depth equal to the We start with a normal realization of a partition function maximum length of any path from that message to any leaf with no external variables whose associated normal graph G vertex. The messages at depth 1 can be computed immediately, is connected and cycle-free. Thus the partition function of G the messages at depth 2 can be computed as soon as the is a constant, denoted by Z(G), and G is an ordinary graph messages at depth 1 are known, and so forth. If G is finite, then (no half-edges) that moreover is a tree. all messages can be computed in at most δ(G) rounds, where A connected graph G is cycle-free if and only if any cut δ(G) is the maximum possible depth, called the diameter. For any internal variable Yj , we define the marginal par- through any edge Yj divides G into two disconnected graphs, −→ ←− tition function Zj (yj ) as which we label arbitrarily as Gj and Gj . Such a cut divides the edge associated with Y into two half-edges associated with Z (y )= −→µ (y )←−µ (y ), y ∈ Y . j −→ ←− j j j j j j j j two external variables, denoted by Yj and Yj , with the same Thus Zj (yj ) is simply the componentwise (dot) product of alphabet Yj as Yj , as illustrated in Figure 10. −→ ←− the messages µj (yj ) and µj (yj ). This is sometimes called −→ ←− the past-future decomposition rule [12]. −→ Yj ←− ⇒ −→ Yj Yj ←− G = Gj Gj Gj Gj Graphically, Zj (yj ) is the partition function of the graph obtained from G by converting Yj from an internal to an Fig. 10. Disconnecting a cycle-free NFG G by a cut through edge Yj . external variable as shown in Figure 12; i.e., by replacing the edge associated with by a “tap” consisting of the −→ ←− Yj Let us define the messages µj (yj ) and µj (yj ) as the −→ −→ ←− concatenation of an edge labeled by Yj , an equality indicator partition functions of Gj and Gj , respectively; i.e., ←− function, and another edge labeled by Yj , with a further half- −→ edge labeled by Yj attached to the equality indicator function. µj (yj )= fk(yk), Yj −→X−→ Y−→ y ∈Y k∈K −→ ←− −→ −→ Yj ←− ⇒ −→ Yj Yj ←− where Y is the set of left-side variables (excluding Y ), and G = Gj Gj Gj ④ Gj −→ j K is the set of indices of left-side factors, and similarly for ←− Fig. 12. Converting Yj from internal to external by inserting a “tap.” µj (yj ). The goal of the sum-product algorithm is to compute −→ ←− the messages µj (yj ), µj (yj ) for every internal variable Yj . −→ Conversely, Z(G) is the partition function of the graph To compute a message such as µj (yj ), consider the factor −→ obtained by converting Yj back to an internal variable; i.e., vertex to which is attached. For simplicity, let us suppose Yj by summing Zj (yj ) over Yj : that this vertex has degree 3, and that the associated factor is −→ ←− f(yj ,yj′ ,yj′′ ), as shown in Figure 11. Z(G)= Zj (yj )= µj (yj ) µj (yj ). −→ −→ yXj ∈Yj yXj ∈Yj Since G is cycle-free, the subgraphs Gj′ and Gj′′ that extend from the edges Yj′ and Yj′′ must be disjoint. Their Thus, for any edge Yj , Z(G) is simply the dot product of the −→ −→ −→ ←− partition functions, µj′ (yj′ ) and µj′′ (yj′′ ), include all factors messages µj and µj . V. HOLOGRAPHIC TRANSFORMATIONS B. General normal factor graph duality theorem In this section, we recapitulate and generalize the concept This general approach yields a very simple proof of the of “holographic transformations” of normal factor graphs, “general normal factor graph duality theorem” of [2], [13]. which was introduced by Al-Bashabsheh and Mao [2], and Suppose that we have a normal factor graph in which each their “generalized Holant theorem,” which relates the partition variable alphabet A is a finite-dimensional over function of a normal factor graph to that of its holographic a finite field F of characteristic p (i.e., p is the least positive transform. This theorem generalizes the Holant theorem of integer such that pα =0 for all α ∈ F). The dual space Aˆ is Valiant [21] (see also [3], [4], [5], [6], [22]), which has then a vector space over F of the same dimension as A, and been used to show that some seemingly intractable counting there is a well-defined Zp-valued inner product ha,aˆ i with problems on graphs are in fact tractable. the usual properties; e.g., ha,ˆ 0i = h0,ai = 0, ha,aˆ + a′i = Using this concept, Al-Bashabsheh and Mao [2] were able ha,aˆ i + ha,aˆ ′i, and so forth (see, e.g., [11]). to prove a very general and powerful Fourier transform duality Given a complex-valued function f : A → C defined on A, theorem for normal factor graphs, of which the original normal its Fourier transform is then defined as the complex-valued graph duality theorem of [12] is an immediate corollary. We function F : Aˆ → C on Aˆ that maps aˆ to give a variation of this proof which is perhaps even simpler F (ˆa)= f(a)ωha,aˆ i, aˆ ∈ Aˆ, (compare also the proof in [13]). aX∈A In the last section of this paper, we will sketch further applications of this general approach. where ω = e2πi/p is a primitive complex pth root of unity. In an NFG, a Fourier transform may be represented as in A. General approach Figure 14, where the Fourier transform factor is The general approach can be explained very simply, as F = {ωha,aˆ i :ˆa ∈ Aˆ,a ∈ A}. follows. Let A and B be two finite alphabets, which will often A be of the same size; i.e., |A| = |B|. Let U(a,b),S(b,b′), The transform F (ˆa) is obtained by summing over A, which and V (b′,a′) be complex-valued factors involving variables in this case amounts to a matrix-vector multiplication. A, B, B′, and A′ defined on A, B, B, and A, respectively; ˆ ˆ alternatively, we may regard , and as matrices. Finally, A A A U,S V f FA = F suppose that the concatenation USV , shown in Figure 13, is the identity factor δaa′ , which can be represented simply as Fig. 14. Normal factor graph of a Fourier transform. an ordinary edge as in Figure 3.5 Note that as a factor in an NFG, we do not have to A A B B A = U S V distinguish between FA and its transpose; FA is simply a function of the two variables corresponding to the two incident Fig. 13. A concatenation of factors that is equivalent to the identity. edges, and as a matrix can act on either variable. Thus FA ˆ can act also as a Fourier transform FAˆ on a function of A. We then have the following obvious lemma: More generally, given a complex-valued multivariate func- Lemma (generalized holographic transformations). In any tion f(a) defined on a set of variables A = {Ai} whose F NFG, any ordinary edge may be replaced by a concatenation alphabets Ai are vector spaces over , its Fourier transform of factors USV equivalent to the identity, as in Figure 13, is defined as the complex-valued function without changing the partition function. F (aˆ)= f(a) ωhaˆi,aii. The “holographic transformations” of [2] involve similar re- Xa Yi placements, except without the middle factor S (alternatively, In other words, in a normal factor graph, each variable Ai may ′ with S(b,b ) = δbb′ ). Al-Bashabsheh and Mao [2] call B the be transformed separately, as illustrated in Figure 15. In [2], coupling alphabet, and say that U and V are dual with respect this property is called separability. ˆ to B. When |A| = |B|, they say that U and V are transformers; A2 in this case, as matrices, U and V are inverses. F ˆ A2 If a normal factor graph has external variables Xi, then A2 they may be transformed as well, by the insertion of a factor A2 Aˆ Aˆ Aˆ A A Aˆ or matrix Wi(xi, wi) defined on Xi ×Wi, where Wi is the 1 3 = 1 1 3 3 F FA1 f FA3 alphabet of a transformed external variable Wi. Thus the partition function is transformed into a function of the new Fig. 15. Fourier transform of multivariate function f(a1, a2, a3). external variables Wi. This is the essence of the “generalized Holant theorem” of [2]. (The original Holant theorem of Now let us define U = V = FA and S = Φ∼/|A|, where Valiant [21] applies when there are no external variables.) the sign inverter indicator function over Aˆ is defined as ′ 5 1, ifa ˆ = −aˆ ; Here and subsequently we may label an internal edge simply by its Φ (ˆa, aˆ′)= alphabet, without introducing dummy internal variables. ∼  0, otherwise. Then the concatenation USV is the identity, since A. Tree-based reparameterization ′ ′ ′ ha,aˆ i ′ haˆ ,a i ha,aˆ −a i Wainwright, Jaakkola, and Willsky [25] have shown how ω Φ∼(ˆa, aˆ )ω = ω = |A|δaa′ , the sum-product algorithm applied to general graphs with aˆ∈AXˆ,aˆ′∈A aˆX∈Aˆ cycles can be understood as a tree-based reparameterization by a basic orthogonality relation for Fourier transforms over algorithm, where each round of the message-passing algorithm finite groups (see, e.g., [11]). This result is illustrated in reparameterizes marginal distributions over simple subtrees Figure 16, where we omit the scale factor of |A|. consisting of a pair of vertices connected by an edge. More A A Aˆ Aˆ A generally, they consider iterative algorithms that reparameter- = FA Φ∼ FA ize distributions over arbitrary cycle-free subtrees of the graph, particularly spanning trees. Fig. 16. A concatenation of factors that is equivalent to an edge, up to scale. Let X be a set of m variables Xi taking values xi in finite Now we can prove our desired result: alphabets Xi, and let E be a set of pairs (Xi,Xj ) indicating which pairs of variables are connected. Suppose that the Normal factor graph duality theorem [2], [13]. Given corresponding graph with vertices Xi and edges (Xi,Xj ) ∈ E an NFG with partition function Z(x), comprising external is a tree (i.e., cycle-free). Finally, suppose that a probability variables Xi associated with half-edges, internal variables Yj distribution p(x) over these variables can be expressed as associated with ordinary edges (all alphabets being vector spaces over a finite field F), and complex-valued factors fk p(x) ∝ ψi(xi) ψij (xi, xj ), associated with vertices, the dual normal factor graph is 1≤Yi≤m (Xi,XYj )∈E defined by replacing each alphabet X or Y by its dual i j where the functions ψ (x ) and ψ (x , x ) depend only on the alphabet ˆ or ˆ , each factor by its Fourier transform i i ij i j Xi Yj fk singleton variables X and pairs (X ,X ), respectively. (By ˆ i i j fk, and finally by placing a sign inverter indicator function the Hammersley-Clifford theorem, this can always be done Φ∼ in the middle of every ordinary edge. Then the partition when p(x) is a positive Markov random field over the graph.) ˆ function of the dual NFG is the Fourier transform Z(xˆ) of x 6 We can view such a distribution p( ) as a partition function Z(x), up to scale. in which all variables are external (a “global function”). Proof : Let us first convert the given NFG with partition func- Normalizing this partition function, we obtain an equivalent tion Z(x) to an NFG with partition function Zˆ(xˆ), up to scale, partition function with the same external variables, but with ˆ by appending a Fourier transform FXi from Xi to Xi to every an equality indicator function corresponding to each external half-edge associated with every external variable Xi, as in variable replacing it in the corresponding normal factor graph. Figure 15. Then let us replace every ordinary edge associated A typical fragment of such an NFG is shown in Figure 17. with every internal variable Yj by a concatenation FAΦ∼FA Xi Xj like that shown in Figure 16; this preserves the partition function Zˆ(xˆ), up to scale. Now each vertex associated with Xi Xi Xj Xj ④ ψij ④ each factor fk is surrounded by Fourier transforms of all of the Xi Xj variables involved in fk, so it and its surrounding transforms may be replaced by a single vertex representing the Fourier ψi ψj transform factor fˆk without changing the partition function, up to scale. Fig. 17. Fragment of NFG representing a probability distribution on a tree.

Notice that this remarkably general theorem applies to any Now we can execute the sum-product algorithm on such normal factor graph, whether or not it has cycles. a cycle-free NFG, obtaining on each edge two messages, Using the fact that the indicator functions of a linear code C −→ ←− say µi(xi) and µi(xi) on an edge with alphabet Xi. The over F and of its orthogonal code C⊥ are a Fourier transform corresponding marginal probability distribution pi(xi) is pro- pair, up to scale, one obtains as an immediately corollary a portional to the componentwise product of these messages: duality theorem for normal factor graph representations of −→ ←− linear codes [2], [13], which is equivalent to the original pi(xi) ∝ µi(xi) µi(xi), xi ∈ Xi. normal graph duality theorem of [12]. Such a marginal distribution can be exhibited explicitly as VI. FURTHER DEVELOPMENTS a message in a “reparameterized” NFG by replacing a factor We now sketch briefly how the “tree-based reparameteriza- such as ψij (xi, xj ) by the concatenation of three factors: tion” approach of Wainwright et al. [25], the “loop calculus” U(x , x′ ) = ←−µ (x )δ(x , x′ ); results of Chertkov and Chernyak [7], [8], and the Lagrange i i i i i i ψ (x′ , x′ ) duality results of Vontobel and Loeliger [23], [24] fit within ′ ′ ij i j S(xi, xj ) = ←− ′ −→ ′ this generalized framework. The full developments will appear µi(xi) µj (xj ) ′ −→ ′ in a subsequent version of this paper. V (xj , xj ) = µj (xj )δ(xj , xj ),

6As shown in [2], the scale factor is |Y|. which evidently preserves the partition function. Such a reparameterization can be performed also in a graph C. Lagrange duality with cycles, or over a subtree of a given graph. Nice results Structurally similar operations can be used to obtain the are obtained when the messages are those that occur at a fixed Lagrange duality results for normal graphs of Vontobel and point of the sum-product algorithm, but the messages do not Loeliger [23], [24], which are based on the Legendre transform have to be chosen in this way. of convex optimization theory. In future work, we plan to use this approach to restate and One interesting aspect of this development is that instead generalize many of the results of [25] and related papers. of sums of products, we consider minima over sums (i.e., the sum-product semiring over the reals R is replaced by the min- sum semiring over the extended real line R¯ = R ∪{+∞}). B. Loop calculus Thus a partition function has the following form: Chertkov and Chernyak [7], [8], [9] have developed a “loop Z(x) = min fk(xk, yk), x ∈ X , y∈Y calculus” for statistical systems defined on finite graphs that kX∈K allows the partition function of a system to be expressed as a R¯ finite sum over “generalized loops,” in which the lowest-order where the “factors” fk(xk, yk) are -valued. term corresponds to the Bethe-Peierls (sum-product algorithm) The dual functions under the Legendre transform are func- approximation. tions in the max-sum semiring. Dualization involves the inser- tion of sign inverters into edges, as with Fourier dualization. We briefly sketch our approach to their results. Suppose Again, details will be provided in future versions of this paper. that all alphabets are binary. Then replace every edge Yj in the system by the concatenation Uj Sj Vj , where in matrix notation ACKNOWLEDGMENT We wish to acknowledge our close collaboration with ←− −→ + µj (0) − µj (1) Yongyi Mao, which led to the normal factor graph paradigm. Uj = ←− −→ ;  + µj (1) + µj (0)  REFERENCES 1 1 0 Sj = ; [1] S. M. Aji and R. J. McEliece, “The generalized distributive law,” ∆j  0 1  −→ −→ IEEE Trans. Inf. Theory, vol. 46, pp. 325-343, Mar. 2000. + µj (0) + µj (1) [2] A. Al-Bashabsheh and Y. Mao, “Normal factor graphs and Vj = ←− ←− ,  − µj (1) + µj (0)  holographic transformations,” IEEE Trans. Inf. Theory, pp. 752– 763, Feb. 2011. [3] J.-Y. Cai and V. Choudhary, “Valiant’s Holant theorem and where ←−µ (y ) and −→µ (y ) are functions that may (but need matchgate tensors,” Theoretical Computer Science, vol. 384, pp. j j j j 22–32, 2007. not) be chosen as fixed-point messages of the sum-product −→ ←− −→ ←− [4] J.-Y. Cai and P. Lu, “Holographic algorithms: From art to sci- algorithm, and ∆j = µj (0) µj (0) + µj (1) µj (1) is the ence,” in Proc. 39th Annual ACM Symp. on Theory of Computing determinant of Uj and Vj . Evidently the concatenation Uj Sj Vj (San Diego, CA), pp. 401–410, June 2007. is the identity, so this replacement preserves the partition [5] J.-Y. Cai, P. Lu, and M. Xia, “Holographic algorithms by Fi- function. bonacci gates and holographic reductions for hardness,” in Proc. IEEE 49th Annual IEEE Symp. on Foundations of Computer Now express every Sj as the sum of two matrices: Science (Philadelphia, PA), pp. 644–653, October 2008. [6] J.-Y. Cai, P. Lu, and M. Xia, “Holographic algorithms with 1 1 0 1 0 0 matchgates capture precisely tractable planar #CSP,” Aug. 2010. Sj = + ; ArXiv: 1008.0683. ∆j  0 0  ∆j  0 1  [7] M. Chertkov and V. Y. Chernyak, “Loop calculus in statistical physics and information science,” Phys. Rev. E, vol. 73, no. if there are n edges Yj , then the partition function of the 065102(R), 2006. original NFG can correspondingly be expressed as the sum [8] M. Chertkov and V. Y. Chernyak, “Loop series for discrete of the partition functions of the 2n component NFGs. statistical models on graphs,” J. Stat. Mech., P06009, 2006. ←− −→ [9] V. Y. Chernyak and M. Chertkov, “Planar graphical models If the functions µj (yj ) and µj (yj ) are fixed-point mes- which are easy,” J. Stat. Mech., P11007, Nov. 2010. sages of the sum-product algorithm, then it turns out that [10] P. Cvitanovi´c, Group Theory: Birdtracks, Lie’s, and Exceptional the partition function of the “zero-order” component graph Groups. Princeton U. Press, 2008. [On-line] birdtracks.eu. is the Bethe-Peierls partition function (at that fixed-point of [11] G. D. Forney, Jr., “Transforms and groups,” in Codes, Curves and Signals: Common Threads in Communications (A. Vardy, the sum-product algorithm); that the partition function of any ed.), pp. 79–97. Boston: Kluwer, 1998. component graph with a “loose end” (a vertex of effective [12] G. D. Forney, Jr., “Codes on graphs: Normal realizations,” IEEE degree 1) is zero; and that the partition functions of the Trans. Inf. Theory, vol. 47, pp. 520–548, Feb. 2001. remaining component graphs (corresponding to “generalized [13] G. D. Forney, Jr., “Codes on graphs: Duality and MacWilliams loops,” in which all vertices have effective degree 2 or more) identities,” to appear, IEEE Trans. Inf. Theory, vol. 57, Mar. 2011. are “small” multiples of the Bethe-Peierls partition function. [14] F. R. Kschischang, B. J. Frey and H.-A. Loeliger, “Factor graphs Again, the full development will be given in a subsequent and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. version of this paper. 47, pp. 498–519, Feb. 2001. [15] H.-A. Loeliger, “An introduction to factor graphs,” IEEE Sig. Proc. Mag., vol. 21, pp. 28–41, Jan. 2004. [16] H.-A. Loeliger, J. Dauwels, J. Hu, S. Korl, L. Ping and F. R. Kschischang, “The factor graph approach to model-based signal processing,” Proc. IEEE, vol. 95, pp. 1295–1322, June 2007. [17] S. Morse and E. Peterson, “Trace diagrams, matrix minors, and determinant identities.” ArXiv: 0903.1373. [18] R. Penrose, The Road to Reality: A Complete Guide to the Laws of the Universe. New York: Knopf, 2005. [19] E. Peterson, “Unshackling linear algebra from linear notation.” ArXiv: 0910.1362. [20] G. E. Stedman, Diagrammatic Techniques in Group Theory. Cambridge U. Press, 1990. [21] L. G. Valiant, “Holographic algorithms,” in Proc. 45th Annual IEEE Symp. on Foundations of Computer Science (Rome, Italy), pp. 306–315, October 2004. [22] L. G. Valiant, “Some observations on holographic algorithms,” in Proc. 9th Latin American Theoretical Informatics Symp. (Oaxaca, Mexico), pp. 577–590, April 2010. [23] P. O. Vontobel, Kalman Filters, Factor Graphs, and Electrical Networks, post-diploma project, ETH–Zurich, 2002. [On-line]: http://www.isiweb.ee.ethz.ch/papers. [24] P. O. Vontobel and H.-A. Loeliger, “On factor graphs and electrical networks,” in Mathematical Systems Theory in Biology, Communication, Computation and Finance (J. Rosenthal and D. Gilliam, eds.), IMA Volumes in Math. and Appl., vol. 134, pp. 469–492. New York: Springer-Verlag, 2003. [25] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, ”Tree-based reparameterization framework for analysis of sum-product and related algorithms,” IEEE Trans. Inf. Theory, vol. 49, pp. 1120– 1146, May 2003. [26] N. Wiberg, H.-A. Loeliger, and R. K¨otter, “Codes and iter- ative decoding on general graphs,” European Transactions on Telecommunications, vol. 6, pp. 513–525, Sept. 1995.