Localized Structured Prediction

Carlo Ciliberto¹, Francis Bach²,³, Alessandro Rudi²,³
¹ Department of Electrical and Electronic Engineering, Imperial College London, London
² Département d'informatique, École normale supérieure, PSL Research University
³ INRIA, Paris, France

• X input space, Y output space
• ℓ : Y × Y → R loss function
• ρ probability distribution on X × Y

Goal: estimate

\[ f^\star = \operatorname*{argmin}_{f : X \to Y} \ \mathbb{E}[\ell(f(x), y)], \]

given only the dataset $(x_i, y_i)_{i=1}^n$ sampled independently from ρ.

Structured Prediction

Prototypical Approach: Empirical Risk Minimization

Solve the problem:

\[ \hat f = \operatorname*{argmin}_{f \in G} \ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i) + \lambda R(f), \]

where G ⊆ {f : X → Y} (usually a convex function space).

If Y is a vector space:

• G is easy to choose and optimize over: (generalized) linear models, kernel methods, neural networks, etc.

If Y is a “structured” space:

• How to choose G? How to optimize over it?


State of the art: Structured case

Y arbitrary: how do we parametrize G and learn $\hat f$?

Surrogate approaches

+ Clear theory (e.g. convergence and learning rates)
− Only for special cases (classification, ranking, multi-labeling, etc.) [Bartlett et al., 2006, Duchi et al., 2010, Mroueh et al., 2012]

Score learning techniques

+ General algorithmic framework (e.g. StructSVM [Tsochantaridis et al., 2005])
− Limited theory (no consistency; see e.g. [Bakir et al., 2007])

Is it possible to have the best of both worlds?

general algorithmic framework + clear theory

Table of contents

1. A General Framework for Structured Prediction [Ciliberto et al., 2016]

2. Leveraging Local Structure [This Work]

A General Framework for Structured Prediction

Characterizing the target function:

\[ f^\star = \operatorname*{argmin}_{f : X \to Y} \ \mathbb{E}_{xy}[\ell(f(x), y)]. \]

Pointwise characterization in terms of the conditional expectation:

\[ f^\star(x) = \operatorname*{argmin}_{z \in Y} \ \mathbb{E}_y[\ell(z, y) \mid x]. \]

Deriving an Estimator

Idea: approximate

\[ f^\star(x) = \operatorname*{argmin}_{z \in Y} E(z, x), \qquad E(z, x) = \mathbb{E}_y[\ell(z, y) \mid x], \]

by means of an estimator $\hat E(z, x)$ of the ideal $E(z, x)$:

\[ \hat f(x) = \operatorname*{argmin}_{z \in Y} \hat E(z, x), \qquad \hat E(z, x) \approx E(z, x). \]

Question: how to choose $\hat E(z, x)$ given the dataset $(x_i, y_i)_{i=1}^n$?

Estimating the Conditional Expectation

Idea: for every z, perform “regression” over the values ℓ(z, ·):

\[ \hat g_z = \operatorname*{argmin}_{g : X \to \mathbb{R}} \ \frac{1}{n} \sum_{i=1}^n L(g(x_i), \ell(z, y_i)) + \lambda R(g). \]

Then we take $\hat E(z, x) = \hat g_z(x)$.

Questions:

• Models: how to choose the loss L?
• Computations: do we need to compute $\hat g_z$ for every z ∈ Y?
• Theory: does $\hat E(z, x) \to E(z, x)$? More generally, does $\hat f \to f^\star$?

Square Loss!

Let L be the square loss. Then:

\[ \hat g_z = \operatorname*{argmin}_{g} \ \frac{1}{n} \sum_{i=1}^n \big( g(x_i) - \ell(z, y_i) \big)^2 + \lambda \|g\|^2. \]

In particular, for linear models $g(x) = \phi(x)^\top w$:

\[ \hat g_z(x) = \phi(x)^\top \hat w_z, \qquad \hat w_z = \operatorname*{argmin}_{w} \ \|Aw - b\|^2 + \lambda \|w\|^2, \]

with

\[ A = [\phi(x_1), \ldots, \phi(x_n)]^\top \qquad \text{and} \qquad b = [\ell(z, y_1), \ldots, \ell(z, y_n)]^\top. \]

Computing All the $\hat g_z$ at Once

Closed-form solution:

\[ \hat g_z(x) = \phi(x)^\top \hat w_z = \underbrace{\phi(x)^\top (A^\top A + \lambda n I)^{-1} A^\top}_{\alpha(x)^\top} \, b. \]

In particular, we can compute

\[ \alpha_i(x) = \phi(x)^\top (A^\top A + \lambda n I)^{-1} \phi(x_i) \]

only once, independently of z. Then, for any z,

\[ \hat g_z(x) = \sum_{i=1}^n \alpha_i(x)\, b_i = \sum_{i=1}^n \alpha_i(x)\, \ell(z, y_i). \]

Structured Prediction Algorithm

Input: dataset $(x_i, y_i)_{i=1}^n$.

Training: for $i = 1, \ldots, n$, compute

\[ v_i = (A^\top A + \lambda n I)^{-1} \phi(x_i). \]

Prediction: given a new test point x, compute $\alpha_i(x) = \phi(x)^\top v_i$. Then,

\[ \hat f(x) = \operatorname*{argmin}_{z \in Y} \ \sum_{i=1}^n \alpha_i(x)\, \ell(z, y_i). \]
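As a concrete illustration, here is a minimal NumPy sketch of the two steps above. It assumes explicit finite-dimensional features and a finite set of candidate outputs so the argmin can be taken by enumeration; both are simplifying assumptions, and names such as fit_v and candidates are illustrative, not from the original slides.

    import numpy as np

    def fit_v(Phi, lam):
        """Training: v_i = (A^T A + lam*n*I)^{-1} phi(x_i), for all i at once.
        Phi is the (n, d) matrix A with rows phi(x_i)."""
        n, d = Phi.shape
        # Columns of the returned (d, n) matrix are the vectors v_i.
        return np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(d), Phi.T)

    def predict(phi_x, V, ys, loss, candidates):
        """Prediction: argmin_z sum_i alpha_i(x) * loss(z, y_i) over candidates."""
        alpha = phi_x @ V  # alpha_i(x) = phi(x)^T v_i, shape (n,)
        scores = [sum(a * loss(z, y) for a, y in zip(alpha, ys)) for z in candidates]
        return candidates[int(np.argmin(scores))]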


The Proposed Structured Prediction Algorithm

Questions:

• Models: how to choose the loss L? Square loss!
• Computations: do we need to compute $\hat g_z$ for every z ∈ Y? No need: compute them all at once!
• Theory: does $\hat f \to f^\star$? Yes!

Theorem (Rates, [Ciliberto et al., 2016]). Under mild assumptions on ℓ, setting $\lambda = n^{-1/2}$ yields, with high probability,

\[ \mathbb{E}[\ell(\hat f(x), y) - \ell(f^\star(x), y)] \leq O(n^{-1/4}). \]

A General Framework for Structured Prediction

Is it possible to have the best of both worlds (general algorithm + clear theory)? Yes!

We introduced an algorithmic framework for structured prediction:

• directly applicable to a wide family of problems (Y, ℓ);
• with strong theoretical guarantees;
• recovering many existing algorithms (not shown here).

What Am I Hiding?

• Theory. The key assumption needed to achieve consistency and rates is that ℓ is a Structure Encoding Loss Function (SELF):

\[ \ell(z, y) = \langle \psi(z), \varphi(y) \rangle_{\mathcal H} \qquad \forall\, z, y \in Y, \]

with ψ, φ : Y → H continuous maps into a Hilbert space H.

  • Similar to the characterization of reproducing kernels.
  • In principle hard to verify; however, many ML losses satisfy it (see the finite-case example below)!

• Computations. We need to solve an optimization problem at prediction time!
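For instance (a standard example, not spelled out on the slide): every loss on a finite output space Y = {y_1, …, y_M} is SELF. Take $H = \mathbb{R}^M$, let $\varphi(y_j) = e_j$ be the j-th canonical basis vector, and let $\psi(z) = (\ell(z, y_1), \ldots, \ell(z, y_M))$; then $\langle \psi(z), \varphi(y_j) \rangle = \ell(z, y_j)$.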

Prediction: The Inference Problem

Solving an optimization problem at prediction time is standard practice in structured prediction, where it is known as the inference problem:

\[ \hat f(x) = \operatorname*{argmin}_{z \in Y} \hat E(x, z). \]

In our case it is reminiscent of a weighted barycenter:

\[ \hat f(x) = \operatorname*{argmin}_{z \in Y} \ \sum_{i=1}^n \alpha_i(x)\, \ell(z, y_i). \]

It is *very* problem dependent.

Example: Ranking

Goal: given a query x, order a set of documents $d_1, \ldots, d_k$ according to their relevance scores $y_1, \ldots, y_k$ w.r.t. x.

Pairwise loss:

\[ \ell_{\mathrm{rank}}(f(x), y) = \sum_{i,j=1}^{k} (y_i - y_j)\, \mathrm{sign}\big(f(x)_i - f(x)_j\big). \]

It can be shown that computing $\hat f(x) = \operatorname*{argmin}_{z \in Y} \sum_{i=1}^n \alpha_i(x)\, \ell_{\mathrm{rank}}(z, y_i)$ is a Minimum Feedback Arc Set problem on DAGs (NP-hard!).

Still, approximate solutions can improve upon non-consistent approaches.
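To make the inference step concrete, here is a brute-force sketch (my illustration, not the method advocated on the slides) that enumerates all k! orderings; it is only feasible for very small k, which is exactly why the NP-hardness above matters in practice.

    import itertools
    import numpy as np

    def rank_loss(z, y):
        """Pairwise ranking loss between score vector z and relevances y."""
        k = len(y)
        return sum((y[i] - y[j]) * np.sign(z[i] - z[j])
                   for i in range(k) for j in range(k))

    def infer_ranking(alpha, ys, k):
        """Exhaustive inference: argmin over all orderings of k documents."""
        best_z, best_val = None, np.inf
        for perm in itertools.permutations(range(k)):
            z = np.empty(k)
            for rank, doc in enumerate(perm):  # doc placed at `rank` gets score k - rank
                z[doc] = k - rank
            val = sum(a * rank_loss(z, y) for a, y in zip(alpha, ys))
            if val < best_val:
                best_z, best_val = z, val
        return best_z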

Additional Work

Case studies:

• Learning to rank [Korba et al., 2018]
• Output Fisher embeddings [Djerrab et al., 2018]
• Y = manifolds, ℓ = geodesic distance [Rudi et al., 2018]
• Y = probability space, ℓ = Wasserstein distance [Luise et al., 2018]

Refinements of the analysis:

• Alternative derivations [Osokin et al., 2017]
• Discrete losses [Nowak-Vila et al., 2018, Struminsky et al., 2018]

Extensions:

• Application to multitask learning [Ciliberto et al., 2017]
• Beyond the least-squares surrogate [Nowak-Vila et al., 2019]
• Regularization with the trace norm [Luise et al., 2019]

Predicting Probability Distributions [Luise, Rudi, Pontil, Ciliberto ’18]

Setting: Y = P(R^d), the probability distributions on R^d.

Loss: Wasserstein distance

\[ \ell(\mu, \nu) = \min_{\tau \in \Pi(\mu, \nu)} \int \|z - y\|^2 \, d\tau(z, y). \]

Application: digit reconstruction.
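For intuition, a minimal sketch of the loss above in the special case of two uniform discrete measures with the same number of atoms, where the optimal coupling reduces to an optimal matching (an assumption made here for simplicity; SciPy's assignment solver does the work):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def w2_squared(X, Y):
        """Squared 2-Wasserstein distance between uniform point clouds X, Y
        of equal size: an optimal assignment under squared Euclidean cost."""
        C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # (n, n) cost matrix
        rows, cols = linear_sum_assignment(C)
        return C[rows, cols].mean()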

Manifold Regression [Rudi, Ciliberto, Marconi, Rosasco ’18]

Setting: Y a Riemannian manifold.

Loss: (squared) geodesic distance.

Optimization: Riemannian gradient descent.

Applications: fingerprint reconstruction (Y = S¹ sphere) and multi-labeling (Y a statistical manifold).

Nonlinear Multi-task Learning [Ciliberto, Rudi, Rosasco, Pontil ’17; Luise, Stamos, Pontil, Ciliberto ’19]

Idea: instead of solving multiple learning problems (tasks) separately, leverage the potential relations among them.

Previous methods: only impose/learn linear task relations, and are unable to cope with non-linear constraints (e.g. ranking, robotics, etc.).

MTL + structured prediction:

− interpret multiple tasks as separate outputs;
− impose constraints as structure on the joint output.

Leveraging Local Structure

Motivating Example (Between-Locality)

Super-resolution: learn f : LowRes → HighRes.

However...

• very large output sets (high dimension);
• local information might be sufficient to predict the output.

Motivating Example (Between-Locality)

Idea: learn local input-output maps under structural constraints (i.e. overlapping output patches should line up).

Between-locality. Let $[x]_p, [y]_p$ denote input/output “parts”, for p ∈ P:

• $\mathbb{P}([y]_p \mid x) = \mathbb{P}([y]_p \mid [x]_p)$
• $\mathbb{P}([y]_p \mid [x]_p) = \mathbb{P}([y]_q \mid [x]_q)$

Structured Prediction + Parts

Assumption. The loss is “aware” of the parts:

\[ \ell(y', y) = \sum_{p \in P} \ell_0([y']_p, [y]_p), \]

where

• the set P indexes the parts of X and Y;
• $\ell_0$ is a loss on parts;
• $[y]_p$ is the p-th part of y.
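In code, such a part-aware loss is just a sum of part-level losses (a generic sketch; the indexing y[p] stands in for whatever part extraction the application uses):

    def parts_loss(y_pred, y_true, parts, loss0):
        """Loss 'aware' of the parts: sum of a part-level loss over all parts p."""
        return sum(loss0(y_pred[p], y_true[p]) for p in parts)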

Localized Structured Prediction: Inference

[Figure: at training time, a test input part $[x]_p$ is compared to the training input parts $[x_i]_{p'}$ through a similarity $k([x]_p, [x_i]_{p'})$; this implies a similarity on the output side, weighting the observed part losses $\ell_0([y']_p, [y_i]_{p'})$.]
\[ \hat f(x) = \operatorname*{argmin}_{y' \in Y} \ \sum_{p, p' \in P} \ \sum_{i=1}^n \alpha_{i,p'}(x, p)\, \ell_0\big([y']_p, [y_i]_{p'}\big) \]
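A minimal sketch of this inference step, under the extra simplifying assumption that output parts do not overlap, so the argmin decomposes into an independent search per part p over a finite set of candidate part values (all names here are illustrative):

    import numpy as np

    def localized_predict(alpha, Y_parts, loss0, candidates):
        """alpha[p, i, q] = alpha_{i,q}(x, p); Y_parts[i][q] = [y_i]_q.
        Returns one predicted value per output part p."""
        n_parts, n, _ = alpha.shape
        prediction = []
        for p in range(n_parts):
            scores = [
                sum(alpha[p, i, q] * loss0(z, Y_parts[i][q])
                    for i in range(n) for q in range(n_parts))
                for z in candidates
            ]
            prediction.append(candidates[int(np.argmin(scores))])
        return prediction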

Leveraging Locality

Questions:

• Are we really leveraging locality?
• Does the parts structure help?

Problem: if two patches are too similar (i.e. correlated) they do not provide much novel information.

Within-Locality

Intuition: “far-away” parts should be uncorrelated...

More formally, let d : P × P → R be a distance on the parts.

Assumption (Within-Locality). There exists γ ≥ 0 such that

\[ C_{pq} = \mathbb{E}\big[\, x_p^\top x_q - x_p^\top x'_q \,\big] \leq e^{-\gamma\, d(p,q)}, \]

where $x'$ denotes an independent copy of $x$, so that $C_{pq}$ measures the correlation between parts p and q.

Within-Locality in the Wild

Is within-locality a sensible assumption? Does it hold in practice on real datasets?

Example: (empirical) within-locality w.r.t. the central patch p on ImageNet:

\[ \hat C_{pq} = \frac{1}{m} \sum_{i,j=1}^{m} \big[ x_{ip}^\top x_{iq} - x_{ip}^\top x_{jq} \big]. \]

[Figure: heatmap of $\hat C_{pq}$ over patch positions q (values ranging from roughly 0.1 to 0.8), decaying away from the central patch.]
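A sketch of how such an empirical check can be computed from data (my illustration; Xp and Xq are assumed to hold the vectorized patches at positions p and q for each of m images, and the normalization here is the mean-based one, differing from the slide's formula by an overall 1/m factor):

    import numpy as np

    def empirical_C(Xp, Xq):
        """Empirical within-locality statistic between patch positions p and q.
        Xp, Xq: (m, d) arrays of vectorized patches over m images."""
        same = np.einsum('id,id->i', Xp, Xq).mean()   # mean_i <x_ip, x_iq>
        cross = Xp.mean(0) @ Xq.mean(0)               # mean_{i,j} <x_ip, x_jq>
        return same - cross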

Leveraging Locality

Questions:

• Are we really leveraging locality? Yes!
• Does the parts structure help?

Theorem (This work). Assume between-locality holds. Then:

• with no within-locality (i.e. γ ≈ 0),

\[ \mathbb{E}[\ell(\hat f(x), y) - \ell(f^\star(x), y)] = O(n^{-1/4}); \]

• with within-locality (i.e. γ ≫ 0),

\[ \mathbb{E}[\ell(\hat f(x), y) - \ell(f^\star(x), y)] = O\big((n|P|)^{-1/4}\big). \]

In other words, when far-away parts are nearly uncorrelated, the estimator behaves as if it had $n|P|$ samples: the parts effectively act as additional training examples.

Experiments

Predicting the direction of ridges in fingerprint images:

\[ f : \text{BW images} \to \text{angle images}. \]

The output set is the manifold of ridge orientations (S¹).

[Figure: bar plot of prediction error (roughly 0.02–0.12) comparing methods labeled Local-, Local-LS, Global-, and KRLS.]

Conclusions

A general framework for structured prediction:

• Algorithm: directly applicable to a wide family of problems.
• Theory: strong theoretical guarantees.

Exploiting local structure:

• Algorithm: directly model locality between input/output parts (e.g. images, strings, graphs, etc.).
• Theory: adaptively leverage locality to attain better rates.

Future work:

• Learning the parts (i.e. latent structured prediction).
• Integration with other models (e.g. deep neural networks).

References i

Bakir, G. H., Hofmann, T., Schölkopf, B., Smola, A. J., Taskar, B., and Vishwanathan, S. V. N. (2007). Predicting Structured Data. MIT Press.

Bartlett, P. L., Jordan, M. I., and McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156.

Ciliberto, C., Rosasco, L., and Rudi, A. (2016). A consistent regularization approach for structured prediction. In Advances in Neural Information Processing Systems 29 (NIPS), pages 4412–4420.

Ciliberto, C., Rudi, A., Rosasco, L., and Pontil, M. (2017). Consistent multitask learning with nonlinear output relations. In Advances in Neural Information Processing Systems, pages 1983–1993.

Djerrab, M., Garcia, A., Sangnier, M., and d'Alché-Buc, F. (2018). Output Fisher embedding regression. Machine Learning, 107(8-10):1229–1256.

Duchi, J. C., Mackey, L. W., and Jordan, M. I. (2010). On the consistency of ranking algorithms. In Proceedings of the International Conference on Machine Learning (ICML), pages 327–334.

Korba, A., Garcia, A., and d'Alché-Buc, F. (2018). A structured prediction approach for label ranking. In Advances in Neural Information Processing Systems, pages 8994–9004.

Luise, G., Rudi, A., Pontil, M., and Ciliberto, C. (2018). Differential properties of Sinkhorn approximation for learning with Wasserstein distance. In Advances in Neural Information Processing Systems, pages 5859–5870.

Luise, G., Stamos, D., Pontil, M., and Ciliberto, C. (2019). Leveraging low-rank relations between surrogate tasks in structured prediction. In International Conference on Machine Learning (ICML).

References ii

Mroueh, Y., Poggio, T., Rosasco, L., and Slotine, J.-J. (2012). Multiclass learning with simplex coding. In Advances in Neural Information Processing Systems (NIPS) 25, pages 2798–2806.

Nowak-Vila, A., Bach, F., and Rudi, A. (2018). Sharp analysis of learning with discrete losses. In AISTATS.

Nowak-Vila, A., Bach, F., and Rudi, A. (2019). A general theory for structured prediction with smooth convex surrogates. arXiv preprint arXiv:1902.01958.

Osokin, A., Bach, F., and Lacoste-Julien, S. (2017). On structured prediction theory with calibrated convex surrogate losses. In Advances in Neural Information Processing Systems, pages 302–313.

Rudi, A., Ciliberto, C., Marconi, G., and Rosasco, L. (2018). Manifold structured prediction. In Advances in Neural Information Processing Systems, pages 5610–5621.

Struminsky, K., Lacoste-Julien, S., and Osokin, A. (2018). Quantifying learning guarantees for convex but inconsistent surrogates. In Advances in Neural Information Processing Systems, pages 669–677.

Tsochantaridis, I., Joachims, T., Hofmann, T., and Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484.
