Preliminary Version Do Not Cite

The AAAI Digital Library will contain the published version some time after the conference.

Gene regulatory network inference as relaxed graph matching

Deborah Weighill,1 Marouen Ben Guebila,1 Camila Lopes-Ramos,1 Kimberly Glass,1,2,3 John Quackenbush,1,2,3 John Platig,2,3 Rebekka Burkholz1
1 Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115
2 Channing Division of Network Medicine, Brigham and Women’s Hospital
3 Harvard Medical School, Boston, MA 02115

Abstract

Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool aiding in the discovery of the molecular mechanisms driving diverse diseases, including cancer. However, only noisy observations of the projections of these regulatory networks are typically assayed. In an effort to better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER’s solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise compared to the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Bipartite networks are studied across disciplines ranging from machine learning (Yamanishi 2009), ecology, and economics to biology. They focus on the interaction of different types of nodes like vertices in different graphs, pollinators and plants, countries and products, or compounds and proteins. Another prominent example is gene regulatory networks consisting of transcription factors (TFs) and genes (see Fig. 1). These are fundamental objects of study in molecular biology and their analysis provides insights into mechanisms underlying the progression of various diseases, including cancer (Lopes-Ramos et al. 2018; Burkholz and Quackenbush 2021; Lopes-Ramos et al. 2020).

Often, we do not observe the bipartite network $W$ (e.g. representing TF–gene interactions) directly but instead have information about its two associated projections $WW^T$ (TF–TF cooperation) and $W^TW$ (gene–gene interactions, see Fig. 2). The general objective of this work is to infer $W$ based on noisy observations of its projections $P \approx WW^T$ and $C \approx W^TW$. We formulate this task as a non-convex optimization problem, OTTER (Optimize to Estimate Regulation). It is related to inexact graph matching, as it seeks agreement between two graphs $P$ and $C$. $W$ could be interpreted as a relaxed permutation matrix that matches nodes in $P$ (TFs) with nodes in $C$ (genes). As relaxed graph matching, OTTER is theoretically tractable but the solutions are non-unique, since information about $W$ is lost as a consequence of projecting. To select a solution, we need a good initial guess $W_0$ of the bipartite network as input in addition to $P$ and $C$.

Our first contribution is to fully characterize OTTER’s solution space, which depends on the spectral decomposition of $C$ and $P$. Hence, two natural choices to solve OTTER are (1) a spectral algorithm and (2) gradient descent. For both, we provide theoretical network recovery guarantees. While the spectral method is robust to small noise, gradient descent is more reliable in higher noise settings, which are common in biological applications. As we show on three benchmark data sets related to gene regulation and cancer, optimizing OTTER using gradient descent outperforms state-of-the-art gene regulatory network inference techniques. Among these techniques is PANDA (Passing Attributes between Networks for Data Assimilation (Glass et al. 2013)), an established GRN inference method. OTTER gradient descent resembles the corresponding message passing updates, and it can therefore be interpreted as a simplified, theoretically tractable formulation of PANDA. This formulation enables us to provide network recovery guarantees and analyze the effects of noise on the network reconstruction.

Figure 1: Gene regulation. A. Transcription factors (TFs) are represented by green, blue, and yellow objects that bind to the genome (gray band) in vicinity of the start site of a gene (black arrow) to regulate its expression. B. Representation of A as bipartite gene regulatory network.

Gene regulation. Next generation genome sequencing technology has revolutionized biomedical research and provides data at an unprecedented scale and speed. The low cost of this technology facilitates large, genome-scale studies which provide new insights into gene regulation, including the control of protein production through the expression of genes. Proteins influence higher level cellular functions, which are often altered during the development and progression of different diseases, including cancer. To gain an understanding of the gene regulatory mechanisms perturbed by a disease, it is common practice to infer and compare associated gene regulatory networks (GRNs) (Lao et al. 2015;

Lopes-Ramos et al. 2018; Qiu et al. 2018; Yung et al. 2019). In many cases, these networks are weighted, bipartite, and have a representation as matrix $W$. $W$ consists of two types of nodes: transcription factors (TFs) and genes. A TF is a protein that can bind to the DNA in the vicinity of a gene and regulate its expression. When this occurs, the TF and target gene are linked in the gene regulatory network, see Fig. 1. TFs are also known to cooperatively regulate target genes. One actively studied mechanism by which TFs cooperate is through the formation of protein complexes which then go on to bind to DNA. These TF-TF interactions are an area of active study, and public databases with such information are actively maintained. In this work, we denote the TF-TF cooperativity matrix as $P$.

Genes that are co-regulated are frequently correlated in their expression levels. We can estimate this co-regulation using a gene-gene co-expression matrix $C$ estimated from gene expression data. This is especially attractive because gene expression is widely available and is context-specific, i.e., it depends on the tissue type, disease, etc. A more detailed explanation of gene regulation is given in the supplement. We also recommend the review article by Todeschini, Georges, and Veitia (2014).

GRN inference is central to deepening our understanding of diseases on the molecular level, but it is a notoriously difficult problem. Contributions by researchers working in different domains like graph matching could therefore have a great impact. For this reason, we provide benchmark data sets for three human tissues in the cancer domain (Guebila et al. 2020). Their use does not require expertise in molecular biology.

Related work. The OTTER objective is inspired by a state-of-the-art GRN inference method, PANDA (Glass et al. 2013). PANDA integrates multiple data sources through a message passing approach, which is similar to the gradient descent of OTTER. A derivation is given in the supplement. PANDA has been used to investigate gene regulatory relationships in both tissue specific (Sonawane et al. 2017) as well as several disease contexts, including chronic obstructive pulmonary disease (Lao et al. 2015), asthma (Qiu et al. 2018), beta cell differentiation (Yung et al. 2019), and colon cancer (Lopes-Ramos et al. 2018). OTTER can be seen as a theoretically tractable simplification of PANDA, which is amenable to modern optimization techniques and draws connections to graph matching.

Many methods try to infer regulatory relationships solely based on gene expression with two possible (non-exclusive) objectives: structure learning and gene expression prediction. Usually, gene expression prediction makes indirect statements about the interaction structure of variables as well and thus forms a hypothesis about which TFs regulate which genes. TFs are proteins that are created from the mRNA expressed by their corresponding genes. Hence, predicting target gene expression from the expression of the genes coding for the TFs assumes a biologically reasonable structure. The most common and basic approach is to analyse the Pearson correlation (COR) matrix or, if feasible, partial correlations (PARTIAL COR). Spearman correlations usually lead to similar conclusions. Another popular approach is Weighted Gene Co-expression Network Analysis (WGCNA) (Zhang and Horvath 2005; Langfelder and Horvath 2008), in particular the TOM subroutine. It also starts from a gene expression correlation matrix but down-weighs connections if they are not consistent with neighborhood information. Other pruning heuristics also take different types of node similarities resulting from graph embeddings into account (Pio et al. 2020). Alternatives are based on mutual information, where ARACNe (Lachmann et al. 2016) is one of the most commonly used representatives. Among graphical models, mainly Gaussian graphical models are used because the learning algorithms have to scale to a large number of genes. The GLASSO (Friedman, Hastie, and Tibshirani 2008) method is among the best performing candidates and uses LASSO regularization to enforce sparsity. However, it still does not scale to our setting (approximately 20,000-30,000 genes for human tissue), so that we had to omit it from the benchmarks in our experiments. Linear models (Haury et al. 2012) and random forests (Huynh-Thu et al. 2010) have been used for a similar purpose, where TIGRESS (Haury et al. 2012) and GENIE3 (Huynh-Thu et al. 2010) were top scorers at the DREAM5 challenge (Marbach et al. 2012) (although the challenge was somewhat different from the GRN modeling we study here). Both methods have high computational requirements and are less suitable for the human genome. An alternative approach is to treat the binding of TFs to the promoter region of a gene as a supervised learning problem (Karimzadeh and Hoffman 2019; Yuan and Bar-Joseph 2019). While such models can be quite accurate, they are limited to the small number of TFs for which the relevant data is available, which is provided by ChIP-seq experiments. Hence, supervised approaches cannot discover new gene regulatory relationships. Transfer learning algorithms can utilize more data from different domains, for instance, GRNs related to mice (Mignone et al. 2019), but might also inherit unrealistic biases. Note that, in contrast, the data required to define OTTER are widely available, related to the relevant domain, and include a much larger set of known TFs.

In addition, OTTER is related to established problems in graph matching (Yan et al. 2016), which have strong theoretical foundations (Jiang et al. 2017; Barak et al. 2019). The quadratic assignment problem (QAP) (Aflalo, Bronstein, and Kimmel 2015) and its variants (Maron and Lipman 2018) have a direct link to OTTER and can support a similar biological theory. Graph matching has broad applications in computer science ranging from machine learning (Cour, Srinivasan, and Shi 2007), pattern matching (Zhou and De la Torre 2016), vision (Berg, Berg, and Malik 2005; Zhou and De la Torre 2013), and protein network alignment (Singh, Xu, and Berger 2008) to social network analysis (Fan 2012). However, it has not been applied to gene regulatory network inference to the best of our knowledge. As we show, simple relaxed graph matching techniques are competitive with established GRN inference methods.

Contributions.
1) We pose a novel optimization problem, OTTER, for the inference of bipartite networks in general and gene regulatory networks (GRNs). Importantly, OTTER is analytically tractable.
2) We gain insights into a state-of-the-art GRN inference method, PANDA (Glass et al. 2013), as OTTER gradient descent resembles the related message passing equations.
3) We characterize OTTER’s solution space and derive a spectral algorithm on its basis, for which we give network recovery guarantees.
4) We solve the gradient flow dynamics associated with gradient descent for OTTER.
5) We draw a connection from OTTER to relaxed graph matching and open a new application area for related algorithms.
6) We show that OTTER gradient descent outperforms the current state of the art in GRN inference on three challenging biological data sets related to cancer.
7) We make the processed data publicly available to ease the use for researchers without a computational biology background and to foster further innovation in relaxed graph matching and GRN inference.

OTTER

Biological motivation

Our objective is to infer a gene regulatory network (GRN) represented by a matrix $W$. Entries $w_{ij}$ with larger values indicate a higher probability that TF $i$ regulates gene $j$. OTTER and PANDA refine an initial guess $W_0$ of a GRN by increasing its consistency with protein-protein interactions $P$ and observed gene expression with correlation matrix $C$. In our experiments, a TF–gene edge exists in $W_0$ if the sequence motif for that TF is present in the promoter region of the target gene. This information depends only on the human reference genome and provides a reasonable estimate of where TFs bind. Yet, it is context agnostic. TF binding changes between different tissues, allowing cells to assume their specific functions, and can become disrupted by diseases like cancer.

To estimate condition specific GRNs, we solve the OTTER objective. In doing so, we assume that $P$ and $C$ agree partially with the projections of the actual gene regulatory network $W$. We do not require perfect equality of $WW^T = P$ and $W^TW = C$ but improve $W_0$ according to the following three central elements of gene regulation: (1) TFs that can bind to the promoter region of a gene are more likely to regulate that gene ($W \approx W_0$) (Spitz and Furlong 2012; Lambert et al. 2018; Ouyang, Zhou, and Wong 2009), (2) genes that are correlated in their expression are more likely to be co-regulated by similar TFs ($W^TW \approx C$) (Lambert et al. 2018; Shi, Fornes, and Wasserman 2018; Hobert 2008), and (3) TFs that interact (for example, by forming complexes) are more likely to target the same genes ($WW^T \approx P$). TF cooperation is often mediated through protein-protein interactions (Spitz and Furlong 2012; Morgunova and Taipale 2017; Deplancke, Alpern, and Gardeux 2016). For example, the TFs Msn2 and Msn4 bind together to form a complex before binding to the DNA of their target genes (Chapal et al. 2019).

This reasoning motivates our general study of bipartite network inference based on observed noisy projections ($P$ and $C$). As we will show, a considerable amount of information is lost by projecting. This explains partially why GRN inference is challenging. Central to our success is a good initialization $W_0$ and the choice of algorithm that picks a solution among the many different options. Specifically, we study two approaches: (a) a spectral algorithm and (b) a gradient descent variant optimizing the OTTER objective, which we introduce formally in the next section. As we show, the spectral approach has excellent recovery guarantees in low noise settings, while gradient descent is more reliable in high noise applications, which are common in high throughput sequencing data in biology. Gradient descent also has the advantage that it allows us to stay closer to the initial $W_0$ with early stopping. For this reason, it enables us to outperform the state of the art in GRN inference.

Theoretical framework

In this section, we analyse the general problem of learning a bipartite and weighted network with matrix representation $W \in \mathbb{R}^{n_p \times n_c}$ from its symmetric projections $P \in \mathbb{R}^{n_p \times n_p}$ and $C \in \mathbb{R}^{n_c \times n_c}$. By analogy with our motivation of GRN inference, we call the nodes of one type transcription factors (TFs) and of the other type genes. We have $n_p$ TFs and $n_c$ genes, where the number of genes is usually much larger ($n_p \ll n_c$). Minimizing the following OTTER objective

$$f(W) = \frac{1-\lambda}{4}\left\|WW^T - P\right\|^2 + \frac{\lambda}{4}\left\|W^TW - C\right\|^2 + \frac{\gamma}{2}\left\|W\right\|^2 \tag{1}$$

with respect to $W$ seeks agreement between the projections of $W$ and $P$ and $C$. $\lambda \in [0, 1]$ denotes a tuning parameter that moderates the influence of $P$ versus $C$, and $\gamma$ corresponds to a potential regularization. As we will see later, this choice of regularization compensates for a bias that noise in $P$ and $C$ introduces. In principle, we could choose any matrix norm but limit our following discussion to the Frobenius norm $\|A\|^2 := \sum_{i=1}^{n}\sum_{j=1}^{m} a_{ij}^2 = \operatorname{tr}(A^TA)$ for a matrix $A = (a_{ij})$. For this choice, gradient descent resembles most closely the related message passing equations of PANDA and we can derive the solutions of the minimization problem.

Figure 2: The unknown bipartite network $W$ is inferred from its observed projections $P$ and $C$.

These solutions depend on the spectral decomposition of $P = U_pD_pU_p^T$ and $C = V_cD_cV_c^T$, which exist with respect to orthogonal $U_p$ and $V_c$, as $P$ and $C$ are symmetric. Otherwise, the same results hold for the spectral decomposition of $(P + P^T)/2$ and $(C + C^T)/2$. $D_p$ and $D_c$ are diagonal matrices containing the eigenvalues of the respective matrix. In a slight abuse of notation, we denote with $D_p$ a matrix $D_p \in \mathbb{R}^{n_p \times n_p}$ and, if convenient, a matrix $D_p \in \mathbb{R}^{n_c \times n_c}$, which is padded with zeros accordingly. Furthermore, let $M^{[n_p]} = (m_{ij})_{i \le n_p, j \le n_p}$ denote a submatrix of a larger matrix $M$ with dimension $n_p \times n_p$. Without loss of generality, we assume that the eigenvalues $d_{p,ii}$ of $P$ are indexed in descending order; $d_{p,ii} \ge d_{p,jj}$ for $i < j$. For $C$ however, we require a good matching with $P$. We therefore assume implicitly that the distance of $D_c$ to $D_p$ is minimized with respect to permutations of the eigenvalues of $C$, that is $\|D_c - D_p\|^2 = \min_{\pi \in \mathcal{P}} \|D_{c,\pi} - D_p\|^2$, where $\mathcal{P}$ denotes the set of permutations of $\{1, \cdots, n_c\}$ and $D_{c,\pi}$ the corresponding ordering of eigenvalues on the diagonal. If $D_p$ and $D_c$ show little discrepancy, this will result in the eigenvalues of $C$ being in descending order as well. Now, everything is in place to characterize the solution space $S$.

Theorem 1. For given $P \in \mathbb{R}^{n_p \times n_p}$ with $P = P^T$ and $C \in \mathbb{R}^{n_c \times n_c}$ with $C = C^T$, for any spectral decomposition $P = U_pD_pU_p^T$ and $C = V_cD_cV_c^T$, $\lambda \in [0, 1]$, the minimization problem (1) has solutions $W^* \in S$ with singular value decomposition $W^* = U_pD_wV_c^T$, where

$$d_{w,ii} = \sqrt{\max\left((1-\lambda)d_{p,ii} + \lambda d_{c,ii} - \gamma,\ 0\right)} \tag{2}$$

for $i \le n_p$. For $d_{w,ii} = 0$, the corresponding columns of $U_w$ and $V_w$ are not restricted to the eigenvectors of $P$ and $C$. The eigenvalues of $C$ are ordered such that $D_c = D_{c,\pi}$, where the permutation solves the minimization problem

$$\pi = \operatorname*{argmin}_{\pi' \in \mathcal{P}}\ \frac{\lambda(1-\lambda)}{2}\left\|D_p - D^{[n_p]}_{c,\pi'}\right\|^2 - \frac{\lambda}{2}\left\|D^{[n_p]}_{c,\pi'}\right\|^2 + (1-\lambda)\gamma \operatorname{tr}\left(D^{[n_p]}_{c,\pi'}\right). \tag{3}$$

For $\Delta := D^{[n_p]}_c - D_p$, we further assume that

$$\frac{1}{\lambda^2}\|P\|^2 + \frac{1}{n_p}\operatorname{tr}^2(P) + \frac{1}{n_p}\left(\operatorname{tr}(\Delta)\right)^2 > \|\Delta\|^2. \tag{4}$$

Condition (4) is usually met and it is a minor technicality to exclude alternative global minima of Objective (1) that defy our intuition. The nature of these alternatives is discussed in detail in the proof of Thm. 1 in the supplement.

According to Thm. 1, OTTER (Eq. (1)) has at least $2^{n_p}$ different solutions. Each column $u_{:i}$ of $U_p$ has two optional signs that do not alter the spectral decomposition of $P$ but can lead to a different solution $W^*$. The same applies to columns $v_{:i}$ of $V_c$. Only the product of corresponding columns ($u_{:i}$ and $v_{:i}$) determines the respective solution $W^*$, as we have $w^*_{ij} = \sum_k d_{w,kk}u_{ik}v_{jk}$. This leaves us with $2^{n_p}$ alternatives. If the (non-zero) spectra are not simple, such that some eigenspaces have multiple choices of basis functions, we have additional degrees of freedom in constructing the solutions.

As a consequence, we face a model selection problem and require additional information to make an informed decision. In the following, we propose two natural algorithmic choices to identify a solution: (a) a spectral approach based on Thm. 1 and (b) gradient descent minimizing OTTER. Both rely on additional input $W_0 \in \mathbb{R}^{n_p \times n_c}$, an initial guess of a gene regulatory matrix. The choice of $W_0$ is crucial for the performance of both algorithms. To understand some of their advantages and limitations, we provide theoretical recovery results when $W_0$ is a random perturbation of the correct $W$ and compare both algorithms on synthetic data with increasing levels of noise.

A spectral method for solving OTTER

Assuming that $W_0$ provides good evidence, our first proposal for a network inference algorithm selects the closest solution ($\hat{W} \in S$) to $W_0$ in a spectral approach: $\hat{W} = \operatorname{argmin}_{W \in S} \|W - W_0\|^2$. If $P$ and $C$ have simple spectra so that the non-zero eigenvalues correspond to 1-dimensional eigenspaces, the solution to this minimization problem can be computed easily. Note that this assumption is satisfied in our applications. From our previous derivation of the solution space, we know that the only ambiguity lies in the sign of the eigenvectors or, equivalently, the singular values. Essentially, for fixed spectral decomposition $P = U_pD_pU_p^T$ and $C = V_cD_cV_c^T$, our candidate solutions are of the form $W = U_pD_wD_sV_c^T$, where $D_s$ contains the unknown sign information. $D_s$ is a diagonal matrix with entries $d_{s,ii} \in \{-1, 1\}$ on the diagonal. These are our only degrees of freedom. Hence, our problem turns into

$$\min_{W \in S}\|W - W_0\|^2 = \min_{d_{s,ii} \in \{-1,1\}}\left\|D_wD_s - U_p^TW_0V_c\right\|^2 \tag{5}$$

For simplicity, we write $M_0 = U_p^TW_0V_c$. The solution $\hat{W}$ is unique and given by $\hat{W} = U_p\hat{D}_wV_c^T$ with $\hat{d}_{w,ii} = d_{w,ii}\operatorname{sign}(m_{0,ii})$, where $d_{w,ii}$ is defined as in Thm. 1 and $\operatorname{sign}(x) = 1$ for $x \ge 0$ and $\operatorname{sign}(x) = -1$ for $x < 0$.

An important question in many applications is how well this approach performs under noise. First, we study a simplified scenario, in which only $W_0$ is noise corrupted so that we know the singular values $D_w$, as we can deduce them from the correct $P$ and $C$. Even in this simplified case, perfect recovery of $W$ is unlikely for large-scale problems, as the next proposition states. Let $\Phi$ denote the cumulative distribution function (cdf) of a standard normal and $X \sim \operatorname{Ber}(p)$ a Bernoulli random variable with success probability $p$.

Proposition 2. Assume that we observe $P = W^{*T}W^*$, $C = W^*W^{*T}$, and $W_0 = W^* + E$ for a true underlying $W^* \in \mathbb{R}^{n_p \times n_p}$ and noise $E \in \mathbb{R}^{n_p \times n_p}$ with independent identically normally distributed components $e_{ij} \sim \mathcal{N}(0, \sigma^2)$. Further assume that $P$ and $C$ have a simple spectrum $\{d_1, \ldots, d_{n_p}\}$. Then, for the spectral approach $\hat{W} = \operatorname{argmin}_{W \in S}\|W - W_0\|^2$ with $\gamma = 0$, the recovery loss is distributed as $\|\hat{W} - W^*\|^2 = 4\sum_{i=1}^{n_p} d_i^2R_i$, where $R_i \sim \operatorname{Ber}(\Phi(-d_i/\sigma))$ for $d_i > 0$ and $R_i = 0$ for $d_i = 0$ are independent. For any $\epsilon > 0$, the following holds with the usual Chernoff bound:

$$\mathbb{P}\left(\|\hat{W} - W^*\|^2 \le \epsilon\right) \ge 1 - \exp\left(-\mu - \delta\frac{\epsilon}{4}\log\frac{\epsilon}{\mu}\right),$$

where $\mu = \sum_i p_i$ and $\delta = \frac{1}{\max_i(d_{w,ii})^2}$ for $\epsilon \le \mu$ and $\delta = \frac{1}{\min_i(d_{w,ii})^2}$ otherwise.

The proof is given in the supplement. The insight that $R_i \sim \operatorname{Ber}(\Phi(-d_i/\sigma))$ allows us also to analyze the probability of perfect recovery ($\epsilon = 0$). We have $\mathbb{P}(\hat{W} = W^*) = \prod_{i=1}^{n_p}(1 - \Phi(-d_i/\sigma))$. In our examples, $n_p = 1636$ and $d_{\min} \approx 0.0001$. To achieve a probability of at least $0.5$, we could allow for a noise variance of $\sigma \approx 3 \cdot 10^{-5}$. In many applications, this would be a reasonable range, considering that we have $n_pn_c \approx 4.4 \cdot 10^7$ matrix entries.

Yet, biological data is known to be very noisy. In addition, we also expect high noise in $P$ and $C$. To compensate for this additional noise, we need regularization.

Regularization. To motivate the need for regularization ($\gamma > 0$), we show that, depending on the source of the noise, the spectrum of $P$ and $C$ becomes biased. Typical noise matrices $E_p$ and $E_c$ have iid entries with zero mean, variance $\sigma^2_{p/c}$, and a symmetric distribution. They could distort the true projections $W^*W^{*T}$ and $W^{*T}W^*$ as $P = (W^* + E_p)(W^* + E_p)^T$ and $C = (W^* + E_c)^T(W^* + E_c)$ or $P = W^*W^{*T} + E_pE_p^T$ and $C = W^{*T}W^* + E_c^TE_c$, respectively. In both cases, $P$ and $C$ become biased as $\mathbb{E}(P) = W^*W^{*T} + \sigma_p^2I$ and $\mathbb{E}(C) = W^{*T}W^* + \sigma_c^2I$, where $I$ denotes the respective identity matrix. The choice $\gamma = \lambda\sigma_c^2 + (1-\lambda)\sigma_p^2$ in Eq. (1) can compensate for this spectral shift.

It should be noted that Thm. 1 states that such an $l_2$-regularization alters the solutions to Problem (1) in two ways. Not only are the singular values of $W^*$ shifted by $-\gamma$ to compensate for the biases introduced by the noise, but the matching of the eigenvalues of $P$ and $C$ is also influenced by the additional penalty $\gamma(1-\lambda)\operatorname{tr}(D^{[n_p]}_{c,\pi})$ in Eq. (3). Consequently, it may be optimal to pair the eigenvalues of $P$ with smaller eigenvalues of $C$ rather than larger ones if $\gamma$ is large.

The spectral method can be powerful in a setting in which noise is well controlled such that our assumptions are met approximately. Our second solution proposal, gradient descent, however, gives us more tuning options, including the step size and early stopping, that will allow us to stay closer to the initial guess $W_0$.

Gradient descent for solving OTTER

The message passing equations of PANDA resemble a gradient descent procedure minimizing Objective (1). We explain this relationship in detail in the supplement. In our experiments, we used the ADAM method (Kingma and Ba 2014) for gradient descent, but alternatives are equally applicable. To better understand the approach from a theoretical perspective and reason about its response to noisy data, we take the continuous time approximation (corresponding to infinitesimally small step size) and study the corresponding gradient flow:

$$\tau\frac{dW}{dt} = -\nabla f(W) = -WW^TW + (1-\lambda)PW + \lambda WC - \gamma W, \tag{6}$$

where we set the time unit $\tau = 1$ in the following for simplicity. If the initial $W_0$ has a similar singular value decomposition as a solution, the differential equation decouples and we can solve the resulting one-dimensional ordinary differential equations for the diagonal elements explicitly.

Proposition 3. For initial $W_0 = U_pD_0V_c^T$ with $U_pD_pU_p^T$ and $V_cD_cV_c^T$, the solution of the gradient flow (6) is given by $W(t) = U_pD_tV_c^T$ with

$$d_{t,ii} = \operatorname{sign}(d_{0,ii})\,d_{w,ii} \times \sqrt{\frac{1}{2}\left[h\left(d_{w,ii}^2t + h^{-1}\left(2d_{0,ii}^2/d_{w,ii}^2 - 1\right)\right) + 1\right]},$$

where $h(x) = \tanh(x)$ if $d_{0,ii}^2 < d_{w,ii}^2$ and $h(x) = \coth(x)$ otherwise.

The proof is provided in the supplement. Note that the square root factor converges to 1 for $t \to \infty$ in both cases. Hence, the final solution inherits the signs $\operatorname{sign}(d_{0,kk})$ of the initialization, which is similar to our spectral approach. Thus, if we start from a reasonable guess $W_0$ that diagonalizes with respect to the same $U$ and $V$ as the global minima, gradient descent will converge to the closest global minimum (for small enough learning rate). For general $W_0$, however, it is important to keep in mind that gradient descent can converge to different solutions, since it optimizes a non-convex objective (Kingma and Ba 2014; Burkholz and Dubatovka 2019). It does not necessarily stay close to our initialization and can even get stuck in local minima. But it also provides us with additional tuning options and early stopping, which will enable us to outperform the state of the art in GRN inference.

Relation to inexact graph matching

As we show in this section, OTTER can also be interpreted as relaxed graph matching. If $W$ solves the OTTER objective for $\gamma = 0$ perfectly ($f(W) = 0$), $P$ and $C$ are its projections, i.e., $P = WW^T$ and $C = W^TW$. It follows that $P$, $C$, and $W$ also fulfill the relation $PW = WW^TW = WC$. Hence, it would also be reasonable to infer a bipartite network from its projections by minimizing the objective

$$g_1(W) = \frac{1}{2}\|PW - WC\|^2 + \frac{\gamma}{2}\|W\|^2 \tag{7}$$

(with additional $l_2$-regularization). This is the well known quadratic assignment problem (QAP), a standard objective in graph matching (Aflalo, Bronstein, and Kimmel 2015). In this setting, $P$ and $C$ are usually assumed to have the same dimension ($n_P = n_C$). The dimensions can differ for inexact graph matching, but the smaller network is then supposed to be similar to a subgraph of the bigger one. Thus, the minimization is performed under the constraint that $W$ is a permutation matrix. In contrast, we are not interested in a permutation matrix, but in a weighted network $W \in \mathbb{R}^{n_p \times n_c}$ that solves the relaxed QAP. Like OTTER, QAP has different solutions and thus solution techniques. Gradient descent and spectral approaches are common choices.

In particular, GRAMPA (Fan et al. 2019a,b) is a variant of QAP with strong recovery guarantees. It adds the term $-\delta\mathbf{1}^TW\mathbf{1}$ to the QAP objective (7), where $\mathbf{1}$ denotes a vector with all entries equal to one:

$$g_2(W) = \frac{1}{2}\|PW - WC\|^2 + \frac{\gamma}{2}\|W\|^2 - \delta\mathbf{1}^TW\mathbf{1} \tag{8}$$

As a consequence, the GRAMPA minimization problem has a unique solution and becomes explicitly solvable by a spectral approach.

As for OTTER, the spectral approach performs worse in estimating GRNs than the optimization by gradient descent. We therefore only report the latter in our experiments, where we explore the utility of graph matching techniques for GRN inference in comparison with OTTER. The precise gradient descent algorithms minimizing QAP or GRAMPA are detailed in the supplement.

Graph matching can also be studied within the optimal transport framework (Peyré, Cuturi, and Solomon 2016; Titouan et al. 2019). We could formulate the OTTER objective with respect to a nonstandard metric and regularization term. Since we are not searching for stochastic matrices $W$, this does not serve our purpose and we leave the transfer of related methods to gene regulatory network inference to future explorations.

Experiments

Experiments on synthetic data

To showcase the performance of OTTER for cases in which our assumptions are met and to study the influence of noise, we create synthetic data based on a ground truth $W^*$ that we try to recover from noise corrupted inputs $W_0 = W^* + E_0$, $P = (W^* + E_p)(W^* + E_p)^T$, and $C = (W^* + E_c)^T(W^* + E_c)$. All noise entries are Gaussian and independently distributed with $e_{0,ii} \sim \mathcal{N}(0, \sigma_0^2)$, $e_{p,ii} \sim \mathcal{N}(0, \sigma_p^2)$, and $e_{c,ii} \sim \mathcal{N}(0, \sigma_c^2)$. To obtain a realistic ground truth for which we can repeat each experiment 10 times conveniently, we sub-sample (in each repetition) the ChIP-Seq network for the liver tissue to $n_p = 100$ and $n_c = 200$. (See the next section for more details.) As this is unweighted, we draw the weights iid from $\mathcal{N}(10^{-5}, 1)$. For each network, we use the spectral and the gradient descent version of OTTER and report the obtained recovery error $\|\hat{W} - W^*\|^2$.

We align the eigenvalues of $P$ and $C$ by arranging them in descending order in the spectral approach. ADAM gradient descent for the OTTER objective is run for $10^4$ steps with the default ADAM parameters, as detailed in the supplement. For both the gradient descent and the spectral approach, we use parameters $\gamma = \sigma^2_{p/c} = \sigma_p^2 = \sigma_c^2$ and $\lambda = 0.5$.

Figure 3: OTTER recovery error using (top) gradient descent and (bottom) spectral decomposition for artificial networks of size $n_p = 100$, $n_c = 200$ and Gaussian noise with variance $\sigma^2_{p/c}$ for $P$ and $C$ and $\sigma_0^2$ for $W_0$. Shaded regions correspond to the 0.95 confidence interval and lines to the average over 10 repetitions. The legend applies to both figures. (Both panels plot $\log_{10}(\text{error})$ against $\log_{10}\sigma_0$; the legend lists $\sigma_{p/c}$ values 0, 1e−06, 1e−05, 1e−04, 0.001, 0.01.)

The results are shown in Fig. 3. For small levels of noise in $P$ and $C$, the spectral approach performs reliably and better than gradient descent. However, for high $\sigma^2_{p/c}$ gradient descent outperforms the spectral method. Since biological data is inherently noisy, gradient descent seems to be the method of choice. Furthermore, it provides us with additional tuning options that we can leverage to outperform state-of-the-art methods.

Table 1: TF binding prediction for different cancer tissues. The symbol † indicates that binding predictions were made only for TFs with ChIP-seq data due to high computational demands. The highest score for each data set is shown in bold.

AUC-ROC (AUC)
METHOD            BREAST    CERVIX    LIVER
COR               0.5900    0.5758    0.5637
PARTIAL COR†      0.5366    0.5209    0.5175
ARACNE            0.6150    0.5234    0.5636
GENIE3            0.4818†   0.4832    0.4846
TIGRESS†          0.4945    0.4808    0.5018
OTTER SPECTRAL    0.5787    0.5420    0.5345
GRAMPA GRAD       0.6301    0.6328    0.6072
QAP GRAD          0.6373    0.6287    0.6081
WGCNA TOM         0.6146    0.5842    0.5946
W0                0.6282    0.6261    0.5982
PANDA             0.6739    0.6642    0.6211
OTTER GRAD        0.6936    0.6833    0.6600

AUC-PR (AUPR)
METHOD            BREAST    CERVIX    LIVER
COR               0.2772    0.2247    0.3057
PARTIAL COR†      0.2361    0.1952    0.2525
ARACNE            0.2858    0.2027    0.2986
GENIE3            0.2064†   0.1836    0.2437
TIGRESS†          0.2088    0.1845    0.2523
OTTER SPECTRAL    0.2555    0.2024    0.2614
GRAMPA GRAD       0.3162    0.2763    0.3223
QAP GRAD          0.3215    0.2637    0.3425
WGCNA TOM         0.2834    0.2229    0.3140
W0                0.2865    0.2523    0.3045
PANDA             0.3481    0.2960    0.3503
OTTER GRAD        0.3752    0.3179    0.3746

… liver cancer. The protein-protein interaction matrix $P$ is derived using laboratory experiments and represents possible interactions; we use the version of (Sonawane et al. 2017) and fill unavailable information with zeros. $P$ consists of $n_p = 1{,}636$ potential TFs. Our initial guess of a gene regulatory network, $W_0$, is derived from the human reference genome. It is almost identical across tissues. It only varies slightly according to the number of genes ($n_c$) included after filtering and normalization. $W_0$ is a binary matrix with $w_{0,ij} \in \{0, 1\}$ where “1” indicates a TF sequence motif in the promoter of the target gene. Sequence motif mapping was performed using the FIMO software (Grant, Bailey, and Noble 2011) from the MEME suite (Bailey et al. 2009) and the GenomicRanges R package (Lawrence et al. 2013). Note that neither $W_0$ nor $P$ carries sign information about edge weights so that we cannot infer whether TFs inhibit or activate the expression of a gene. We therefore focus on the prediction of link existence with the understanding that the type of interaction can be estimated post hoc.

Validation of gene regulatory networks is a major challenge. Data from chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments, which measure the binding of TFs to DNA in the genome, provide a validation standard against which to benchmark our results. Each ChIP-seq experiment assays only one TF. Because of the assay’s relatively high cost, there are only few data sets that have ChIP-seq data for many TFs from the same cells. We used ChIP-seq data from the HeLa cell line (cervical cancer, 48 TFs), HepG2 cell line (liver cancer, 77 TFs) and MCF7 cell line (breast cancer, 62 TFs) available in the ReMap2018 database (Chèneby et al. 2018), a database collection of publicly available ChIP-seq datasets from available studies. This database contains identified ChIP-seq peaks, representing our target TF binding sites. Further details are given in the supplementary material. Based on this data, we measure the performance of link classification on the subnetwork in each tissue that is constrained to the available TFs and report the AUC-ROC (area under the receiver operating characteristic curve) and AUPR (or AUC-PR) (area under the precision recall curve).

Hyperparameter tuning of OTTER was assisted by MAT-
LAB’s bayesopt function utilizing a Gaussian process prior to maximize the joint AUC-PR for breast and cervix can- Experiments on cancer data cer, max AUPRbreast · AUPRcervix. Breast and cervix data serve therefore as training data while the liver cancer data The most abundant data source for studying gene regulation is an independent test set. The parameters of all compared is gene expression data. These data are often measured using methods are reported in the supplementary information. bulk RNA-sequencing (RNA-seq) with samples corresponding to different individuals. Results Table 1 compares the feasible GRN inference and Datasets and experimental set-up We obtained bulk relaxed graph matching methods based on comparison with RNA-seq data from the Cancer Genome Atlas (TCGA) experimental ChIP-seq binding data. Note that we also re- (Tomczak, Czerwinska,´ and Wiznerowicz 2015). The data port the performance of our initialization W0, which is based is downloaded from recount2 (Collado-Torres et al. 2017) on motif data. OTTER GRAD, PANDA, QAP GRAD, and for liver, cervical, and breast cancer tumors and normal- GRAMPA GRAD greatly improve this initial guess and ized and filtered as described in the supplement. The cor- make it tissue specific. Overall, OTTER gradient descent responding Pearson correlation matrix defines the gene-gene (OTTER GRAD) achieves the best performance on all tis- co-expression matrix C consisting of nc = 31, 247 genes for sues, in particular, on the liver test set. An enrichment anal- breast cancer, nc = 30, 181 for cervix cancer and 27, 081 for ysis of Gene Ontology terms between networks for healthy and cancerous liver tissue in the supplement provides addi- gov/tcga. DW, MG, CL, JQ, RB were supported by a grant tional evidence that OTTER GRAD is biologically meaning- from the US National Cancer Institute (1R35CA220523). JP ful. 
acknowledges support from the US National Heart, Lung, Interestingly, ADAM gradient descent solving alternative and Blood Institute (NHLBI): K25HL140186 and KG from graph matching problems, i.e., QAP GRAD and GRAMPA the K25 grant: K25HL133599. GRAD, achieve better results than established GRN infer- We thank Alkis Gotovos for helpful feedback on the ence algorithms, even though they were not originally de- manuscript. signed for this purpose. They succeed based on similar hy- perparameters as OTTER GRAD. Data and code availability In general, we observe better performance for the meth- OTTER is available in R, Python, and MATLAB ods that incorporate additional biological evidence such as through the netZoo packages: netZooR v0.7 (https://github. (transformed) protein-protein interactions and binding mo- com/netZoo/netZooR), netZooPy v0.7 (https://github.com/ tifs, even though P and W0 are not tissue-specific. A reason netZoo/netZooPy), and netZooM v0.5 (https://github.com/ for this is that correlations in gene expression can be caused netZoo/netZooM). by many factors. Many TFs are expressed at very low levels We provide a tutorial to walk the users through the usage of but strongly activate their target genes, obscuring correla- OTTER in R (https://netzoo.github.io/netZooR/). tions between TFs and their targets. Hence, graph matching The raw and processed data are accessible through net- approaches are a promising alternative to models that make Zoo (https://netzoo.github.io/zooanimals/otter/) and the net- predictions based on gene expression alone. works can be downloaded from the GRAND database (https: //grand.networkmedicine.org/cancers/). Discussion References We formulated the inference of a bipartite network from its two projections as a non-convex but analytically tractable Aflalo, Y.; Bronstein, A.; and Kimmel, R. 2015. On con- optimization problem, OTTER. The projections alone do not vex relaxation of graph isomorphism. 
Proceedings of the provide enough information for network inference, as OT- National Academy of Sciences 112(10): 2942–2947. TER has multiple solutions that we derived explicitly. We Bailey, T. L.; Boden, M.; Buske, F. A.; Frith, M.; Grant, proposed two natural inference algorithms for model selec- C. E.; Clementi, L.; Ren, J.; Li, W. W.; and Noble, W. S. tion, a spectral approach and gradient descent, and derived 2009. MEME SUITE: tools for motif discovery and search- sufficient conditions for network recovery. Both rely on an ing. Nucleic acids research 37(suppl 2): W202–W208. W additional initial guess of the bipartite network, 0, which Barak, B.; Chou, C.-N.; Lei, Z.; Schramm, T.; and Sheng, Y. has to be close to the correct network to guarantee good net- 2019. (Nearly) Efficient Algorithms for the Graph Match- work recovery. We find the spectral approach to be more re- ing Problem on Correlated Random Graphs. In Advances liable in low noise settings, while gradient descent seems to in Neural Information Processing Systems 32, 9190–9198. be more robust with respect to higher amounts of noise and NIPS 2019. therefore more suitable for our application of interest: gene regulatory network inference. Berg, A. C.; Berg, T. L.; and Malik, J. 2005. Shape match- As we have shown, gradient descent also resembles in part ing and object recognition using low distortion correspon- an established gene regulatory network inference method, dences. In 2005 IEEE Computer Society Conference on PANDA. OTTER can therefore be interpreted as a theoreti- Computer Vision and Pattern Recognition (CVPR’05), vol- cally tractable simplification of PANDA that provides an in- ume 1, 26–33 vol. 1. triguing connection to relaxed graph matching. OTTER also Burkholz, R.; and Dubatovka, A. 2019. Initialization of Re- outperforms state-of-the art gene regulatory network infer- LUs for Dynamical Isometry. 
In Advances in Neural Infor- ence approaches on real world data sets corresponding to mation Processing Systems 32, 2385–2395. NeurIPS’2019. three human cancer tissues. We make these data sets pub- Burkholz, R.; and Quackenbush, J. 2021. Cascade size dis- licly available to benchmark the use of general graph match- tributions: Why they matter and how to compute them effi- ing algorithms for gene regulatory network inference (Gue- ciently. In Proceedings of the AAAI Conference on Artificial bila et al. 2020). As highlighted, relaxed graph matching ap- Intelligence. AAAI’2021. proaches apply to this setting and achieve competitive performance. They have the advantage that they can integrate Chapal, M.; Mintzer, S.; Brodsky, S.; Carmi, M.; and Barkai, additional information about a gene regulatory network in N. 2019. Resolving noise–control conflict by gene duplica- the form of W0 and protein interactions P . We therefore tion. PLoS Biology 17(11). see great potential in transferring other graph matching tech- Cheneby,` J.; Gheorghe, M.; Artufel, M.; Mathelier, A.; and niques to gene regulatory network inference in future inves- Ballester, B. 2018. ReMap 2018: an updated atlas of regula- tigations. tory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic acids research 46(D1): Acknowledgements D267–D275. The results shown here are in part based upon data gener- Collado-Torres, L.; Nellore, A.; Kammers, K.; Ellis, S. E.; ated by the TCGA Research Network: https://www.cancer. Taub, M. A.; Hansen, K. D.; Jaffe, A. E.; Langmead, B.; and Leek, J. T. 2017. Reproducible RNA-seq analysis using Lambert, S. A.; Jolma, A.; Campitelli, L. F.; Das, P. K.; recount2. Nature biotechnology 35(4): 319–321. Yin, Y.; Albu, M.; Chen, X.; Taipale, J.; Hughes, T. R.; and Cour, T.; Srinivasan, P.; and Shi, J. 2007. Balanced Graph Weirauch, M. T. 2018. The Human Transcription Factors. Matching. In Scholkopf,¨ B.; Platt, J. 
C.; and Hoffman, T., Cell 172(4): 650–665. eds., Advances in Neural Information Processing Systems Langfelder, P.; and Horvath, S. 2008. WGCNA: an R pack- 19, 313–320. NIPS 2007. age for weighted correlation network analysis. BMC Bioin- Deplancke, B.; Alpern, D.; and Gardeux, V. 2016. The Ge- formatics 1: 559. netics of Transcription Factor DNA Binding Variation. Cell Lao, T.; Glass, K.; Qiu, W.; Polverino, F.; Gupta, K.; 166(3): 538–554. Morrow, J.; Mancini, J. D.; Vuong, L.; Perrella, M. A.; Fan, W. 2012. Graph Pattern Matching Revised for Social Hersh, C. P.; et al. 2015. Haploinsufficiency of Hedgehog Network Analysis. In Proceedings of the 15th International interacting protein causes increased emphysema induced Conference on Database Theory, ICDT ’12, 8–21. by cigarette smoke through network rewiring. Genome medicine 7(1): 12. Fan, Z.; Mao, C.; Wu, Y.; and Xu, J. 2019a. Spectral Graph Matching and Regularized Quadratic Relaxations I: Lawrence, M.; Huber, W.; Pages, H.; Aboyoun, P.; Carlson, The Gaussian Model. M.; Gentleman, R.; Morgan, M. T.; and Carey, V. J. 2013. Software for computing and annotating genomic ranges. Fan, Z.; Mao, C.; Wu, Y.; and Xu, J. 2019b. Spectral Graph PLoS computational biology 9(8). Matching and Regularized Quadratic Relaxations II: Erdos-˝ Renyi´ Graphs and Universality. Lopes-Ramos, C. M.; Kuijjer, M.; Glass, K.; DeMeo, D.; and Quackenbush, J. 2020. Abstract 6569: Regulatory net- Friedman, J.; Hastie, T.; and Tibshirani, R. 2008. Sparse works of liver carcinoma reveal sex specific patterns of gene inverse covariance estimation with the graphical lasso. Bio- regulation. Cancer Research 80: 6569–6569. statistics 9(3): 432–441. Lopes-Ramos, C. M.; Kuijjer, M. L.; Ogino, S.; Fuchs, C. S.; Glass, K.; Huttenhower, C.; Quackenbush, J.; and Yuan, G.- DeMeo, D. L.; Glass, K.; and Quackenbush, J. 2018. Gene C. 2013. 
Passing messages between biological networks to regulatory network analysis identifies sex-linked differences refine predicted interactions. PloS one 8(5). in colon cancer drug metabolism. Cancer research 78(19): Grant, C. E.; Bailey, T. L.; and Noble, W. S. 2011. FIMO: 5538–5547. scanning for occurrences of a given motif. Bioinformatics Marbach, D.; Costello, J. C.; Kuffner,¨ R.; Vega, N. M.; Prill, 27(7): 1017–1018. R. J.; Camacho, D. M.; Allison, K. R.; Aderhold, A.; Bon- Guebila, M. B.; Lopes-Ramos, C.; Sonawane, A.; neau, R.; Chen, Y.; et al. 2012. Wisdom of crowds for robust Burkholz, R.; Weighill, D.; Platig, J.; Kuijjer, M.; Glass, gene network inference. Nature methods 9(8): 796. K.; and Quackenbush. 2020. GRAND: A database Maron, H.; and Lipman, Y. 2018. (Probably) Concave Graph of gene regulatory models across human conditions. Matching. In Advances in Neural Information Processing https://grand.networkmedicine.org/. Systems 31, 408–418. NIPS 2018. Haury, A.-C.; Mordelet, F.; Vera-Licona, P.; and Vert, J.-P. Mignone, P.; Pio, G.; D’Elia, D.; and Ceci, M. 2019. Ex- 2012. TIGRESS: trustful Inference of Gene REgulation us- ploiting transfer learning for the reconstruction of the human ing Stability Selection. BMC systems biology 6: 145. gene regulatory network. Bioinformatics 36(5): 1553–1561. Hobert, O. 2008. Gene regulation by transcription factors Morgunova, E.; and Taipale, J. 2017. Structural perspective and microRNAs. Science 319(5871): 1785–1786. of cooperative transcription factor binding. Curr Opin Struct Huynh-Thu, V. A.; Irrthum, A.; Wehenkel, L.; and Geurts, P. Biol 47: 1–8. 2010. Inferring Regulatory Networks from Expression Data Ouyang, Z.; Zhou, Q.; and Wong, W. H. 2009. ChIP-Seq of Using Tree-Based Methods. PLOS ONE 5(9): 1–10. transcription factors predicts absolute and differential gene Jiang, B.; Tang, J.; Ding, C.; Gong, Y.; and Luo, B. 2017. expression in embryonic stem cells. 
Proceedings of the Na- Graph Matching via Multiplicative Update Algorithm. In tional Academy of Sciences 106(51): 21521–21526. Advances in Neural Information Processing Systems 30, Peyre,´ G.; Cuturi, M.; and Solomon, J. 2016. Gromov- 3187–3195. NIPS 2017. Wasserstein Averaging of Kernel and Distance Matrices. In Karimzadeh, M.; and Hoffman, M. M. 2019. Virtual ChIP- Proceedings of the 33rd International Conference on In- seq: predicting transcription factor binding by learning from ternational Conference on Machine Learning - Volume 48, the transcriptome. bioRxiv doi:10.1101/168419. ICML16, 2664–2672. Kingma, D.; and Ba, J. 2014. Adam: A Method for Stochas- Pio, G.; Ceci, M.; Prisciandaro, F.; and Malerba, D. 2020. tic Optimization. International Conference on Learning Exploiting causality in gene network reconstruction based Representations . on graph embedding. Machine Learning 109(6): 1231– Lachmann, A.; Giorgi, F. M.; Lopez, G.; and Califano, A. 1279. 2016. ARACNe-AP: gene network reverse engineering Qiu, W.; Guo, F.; Glass, K.; Yuan, G. C.; Quackenbush, J.; through adaptive partitioning inference of mutual informa- Zhou, X.; and Tantisira, K. G. 2018. Differential connectiv- tion. Bioinformatics 32(14): 2233–2235. ity of gene regulatory networks distinguishes corticosteroid response in asthma. Journal of Allergy and Clinical Im- Zhou, F.; and De la Torre, F. 2016. Factorized Graph Match- munology 141(4): 1250–1258. ing. IEEE Transactions on Pattern Analysis and Machine Shi, W.; Fornes, O.; and Wasserman, W. W. 2018. Gene ex- Intelligence 38(9): 1774–1789. pression models based on transcription factor binding events confer insight into functional cis-regulatory variants. Bioin- formatics 35(15): 2610–2617. Singh, R.; Xu, J.; and Berger, B. 2008. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences 105(35): 12763–12768. Sonawane, A. 
R.; Platig, J.; Fagny, M.; Chen, C.-Y.;Paulson, J. N.; Lopes-Ramos, C. M.; DeMeo, D. L.; Quackenbush, J.; Glass, K.; and Kuijjer, M. L. 2017. Understanding tissue- specific gene regulation. Cell reports 21(4): 1077–1088. Spitz, F.; and Furlong, E. E. M. 2012. Transcription factors: from enhancer binding to developmental control. Nat Rev Genet 13(9): 613–626. Titouan, V.; Courty, N.; Tavenard, R.; Laetitia, C.; and Fla- mary, R. 2019. Optimal Transport for structured data with application on graphs. In Chaudhuri, K.; and Salakhutdinov, R., eds., Proceedings of the 36th International Conference on Machine Learning, volume 97, 6275–6284. Todeschini, A.-L.; Georges, A.; and Veitia, R. A. 2014. Transcription factors: specific DNA binding and specific gene regulation. Trends in genetics 30(6): 211–219. Tomczak, K.; Czerwinska,´ P.; and Wiznerowicz, M. 2015. The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemporary oncology 19(1A): A68. Yamanishi, Y. 2009. Supervised Bipartite Graph Inference. In Koller, D.; Schuurmans, D.; Bengio, Y.; and Bottou, L., eds., Advances in Neural Information Processing Systems 21, 1841–1848. NIPS 2009. Yan, J.; Yin, X.-C.; Lin, W.; Deng, C.; Zha, H.; and Yang, X. 2016. A Short Survey of Recent Advances in Graph Match- ing. In Proceedings of the 2016 ACM on International Con- ference on Multimedia Retrieval, ICMR ’16, 167–174. Yuan, Y.; and Bar-Joseph, Z. 2019. Deep learning for inferring gene relationships from single-cell expression data. Proceedings of the National Academy of Sciences 116(52): 27151–27158. Yung, T.; Poon, F.; Liang, M.; Coquenlorge, S.; McGaugh, E. C.; Hui, C.-c.; Wilson, M. D.; Nostro, M. C.; and Kim, T.-H. 2019. Sufu-and Spop-mediated downregula- tion of Hedgehog signaling promotes beta cell differentiation through organ-specific niche signals. Nature communi- cations 10(1): 1–17. Zhang, B.; and Horvath, S. 2005. A general framework for weighted gene coexpression network analysis. 
In Statistical Applications in Genetics and Molecular Biology 4: Article 17. Zhou, F.; and De la Torre, F. 2013. Deformable Graph Matching. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13, 2922–2929. USA: IEEE Computer Society.
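The relaxed QAP objective (7) and the GRAMPA objective (8) discussed in the graph matching section can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, the norm is assumed to be the Frobenius norm, and plain fixed-step gradient descent is swapped in for the ADAM optimizer used in the paper, whose exact settings are given in the supplement and are not reproduced here.

```python
import numpy as np


def qap_objective(W, P, C, gamma):
    """Relaxed QAP objective (7): 0.5*||PW - WC||^2 + 0.5*gamma*||W||^2."""
    R = P @ W - W @ C
    return 0.5 * np.sum(R ** 2) + 0.5 * gamma * np.sum(W ** 2)


def grampa_objective(W, P, C, gamma, delta):
    """GRAMPA objective (8): objective (7) minus delta * 1^T W 1."""
    return qap_objective(W, P, C, gamma) - delta * np.sum(W)


def grampa_gradient(W, P, C, gamma, delta):
    """Gradient of (8) with respect to W (delta = 0 gives the gradient of (7))."""
    R = P @ W - W @ C
    return P.T @ R - R @ C.T + gamma * W - delta * np.ones_like(W)


def gradient_descent(W_init, P, C, gamma, delta, step, n_steps):
    """Plain gradient descent (an assumption; the paper uses ADAM)."""
    W = np.array(W_init, dtype=float, copy=True)
    for _ in range(n_steps):
        W -= step * grampa_gradient(W, P, C, gamma, delta)
    return W
```

Because the objective is a convex quadratic in W, any step size below the inverse of $(\|P\|_2 + \|C\|_2)^2 + \gamma$ decreases it monotonically; ADAM removes the need to choose this constant by hand.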
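The synthetic-data setup from the section "Experiments on synthetic data" can be sketched as follows. This is a minimal illustration under stated assumptions: the ground truth here is a random sparse mask (density 0.1, chosen arbitrarily) standing in for the sub-sampled liver ChIP-seq network, the noise levels are placeholders, and the helper names are hypothetical. Only the formulas for $W_0$, P, C, and the recovery error follow the text.

```python
import numpy as np


def make_synthetic_inputs(W_true, sigma0, sigma_p, sigma_c, rng):
    """Build noisy inputs W0 = W* + E0, P = (W*+Ep)(W*+Ep)^T, C = (W*+Ec)^T(W*+Ec)."""
    E0 = rng.normal(0.0, sigma0, W_true.shape)
    Ep = rng.normal(0.0, sigma_p, W_true.shape)
    Ec = rng.normal(0.0, sigma_c, W_true.shape)
    W0 = W_true + E0
    P = (W_true + Ep) @ (W_true + Ep).T
    C = (W_true + Ec).T @ (W_true + Ec)
    return W0, P, C


def recovery_error(W, W_true):
    """Squared Frobenius distance ||W - W*||^2 between an estimate and the truth."""
    return float(np.sum((W - W_true) ** 2))


rng = np.random.default_rng(0)
n_p, n_c = 100, 200

# Assumption: a random binary mask replaces the sub-sampled ChIP-seq network.
mask = rng.random((n_p, n_c)) < 0.1
# Edge weights drawn iid from N(1e-5, 1), as stated in the text.
W_true = mask * rng.normal(1e-5, 1.0, (n_p, n_c))

# Placeholder noise levels; the paper sweeps these (Fig. 3).
W0, P, C = make_synthetic_inputs(W_true, sigma0=0.1, sigma_p=0.01, sigma_c=0.01, rng=rng)
baseline_error = recovery_error(W0, W_true)  # error of the initial guess itself
```

Comparing the error of an OTTER estimate against `baseline_error` shows how much a method improves on the noisy initial guess at a given noise level.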
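The evaluation protocol in "Datasets and experimental set-up" restricts link classification to the TFs that have ChIP-seq data and reports AUC-ROC and AUPR. The sketch below shows one way to compute such scores with NumPy alone. The function names are hypothetical, AUPR is approximated by average precision (an assumption; the paper does not state which estimator it uses), and score ties are handled only in the AUC-ROC computation.

```python
import numpy as np


def auc_roc(scores, labels):
    """AUC-ROC via the rank-sum (Mann-Whitney) statistic with average ranks for ties."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                  # 1-based last rank of each tie group
    avg_rank = last_rank - (counts - 1) / 2.0      # mean rank within each tie group
    ranks = avg_rank[inverse.reshape(-1)]
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))


def average_precision(scores, labels):
    """Mean of the precision values at each true link, scanning by descending score."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float(precision[hits].mean())


def evaluate_tf_subnetwork(W_pred, W_chip, tf_rows):
    """Score predictions only on the rows (TFs) for which ChIP-seq labels exist."""
    sub_pred = np.asarray(W_pred)[tf_rows, :]
    sub_true = np.asarray(W_chip)[tf_rows, :]
    return auc_roc(sub_pred, sub_true), average_precision(sub_pred, sub_true)


def joint_tuning_score(aupr_breast, aupr_cervix):
    """Product of the two training-tissue AUPR values, the quantity maximized in tuning."""
    return aupr_breast * aupr_cervix
```

A hyperparameter search would call `evaluate_tf_subnetwork` once per training tissue and pass the two AUPR values to `joint_tuning_score`, leaving the liver data untouched until the final test.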