
Latent Structures for Coreference Resolution

Sebastian Martschat and Michael Strube

Heidelberg Institute for Theoretical Studies gGmbH

Coreference Resolution

Coreference resolution is the task of determining which mentions in a text refer to the same entity.

An Example


Vicente del Bosque admits it will be difficult for him to select David de Gea in Spain’s squad if the goalkeeper remains on the sidelines at Manchester United.

de Gea’s long-anticipated transfer to Real Madrid fell through on Monday due to miscommunication between the Spanish club and United and he will stay at Old Trafford until at least January.

Outline

Motivation

Structures for Coreference Resolution

Experiments and Analysis

Conclusions and Future Work

Motivation

General Paradigm

Consolidate pairwise decisions for anaphor-antecedent pairs (shown overlaid on the running example).

Mention Pairs

[Figure: the running example with pairwise decisions for anaphor m7: (m1, m7) −, (m2, m7) −, (m3, m7) −, (m4, m7) +, (m5, m7) −, (m6, m7) −]

Mention Ranking

[Figure: the running example with candidate antecedents m1–m6 competing for anaphor m7 under the ranking model]

Antecedent Trees

[Figure: the running example with an antecedent tree spanning mentions m1–m19]

Unifying Approaches

• approaches operate on structures not annotated in training data
• we can view these structures as latent structures

→ devise a unified representation of approaches in terms of these structures

Structures for Coreference Resolution

Final Goal

Learn a mapping

f : X → H × Z

• x ∈ X: structured input, i.e. documents containing mentions and linguistic information
• h ∈ H: document-level latent structure we actually predict (mention pairs, antecedent trees, ...); we employ graph-based latent structures
• z ∈ Z: mapping of mentions to entity identifiers, inferred via the latent h ∈ H

Latent structures: a subclass of directed labeled graphs G = (V, A, L)

[Figure: example latent graph over m0–m3 with arcs labeled +]

• Nodes V: mentions plus a dummy mention m0 for anaphoricity detection
• Arcs A: a subset of all backward arcs
• Labels L: labels for arcs

The graph can be split into substructures which are handled individually.
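To make the graph definition concrete, here is a minimal Python sketch of such a latent structure. This is an illustration only, not the authors' cort implementation; the class and field names are made up. It captures the three ingredients above: mentions plus the dummy m0 as nodes, backward arcs only, and optional arc labels.

```python
# Minimal sketch of a graph-based latent structure (illustrative, not cort's code).
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class LatentStructure:
    num_mentions: int                                           # mentions m1..mn; m0 is the dummy
    arcs: List[Tuple[int, int]] = field(default_factory=list)   # (anaphor, antecedent)
    labels: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def add_arc(self, anaphor: int, antecedent: int, label: Optional[str] = None) -> None:
        # Arcs must point backwards: the antecedent (or the dummy m0) precedes the anaphor.
        assert 0 <= antecedent < anaphor <= self.num_mentions
        self.arcs.append((anaphor, antecedent))
        if label is not None:
            self.labels[(anaphor, antecedent)] = label

# Example: m1 is discourse-new (linked to m0), m2 and m3 corefer with m1.
h = LatentStructure(num_mentions=3)
h.add_arc(1, 0)
h.add_arc(2, 1, label="+")
h.add_arc(3, 1, label="+")
```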

Linear Models

Employ an edge-factored linear model:

f(x) = argmax_{(h,z) ∈ H_x × Z_x} Σ_{a ∈ h} ⟨θ, φ(x, a, z)⟩

[Figure: three candidate latent graphs over m0–m3 with total scores 0, 3.7 and 10; a highlighted arc into m3 carries the features sentDist=2 (weight −0.5) and anaType=PRO (weight 0.8)]
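As an illustration of the edge-factored model, here is a small Python sketch: each arc is scored as the inner product of the weight vector with sparse binary features, and the structure score is the sum over its arcs. The feature extraction, the example weights, and the per-anaphor best-antecedent decoder (i.e. a mention-ranking style argmax) are simplified assumptions, not the paper's feature set or inference code.

```python
# Edge-factored scoring and decoding, sketched with illustrative features.
from collections import defaultdict
from typing import Dict, List, Tuple

def arc_features(doc, anaphor: int, antecedent: int) -> List[str]:
    # Hypothetical feature extraction; a real system uses much richer features.
    if antecedent == 0:
        return ["dummy_antecedent", f"anaType={doc['type'][anaphor]}"]
    return [f"sentDist={doc['sent'][anaphor] - doc['sent'][antecedent]}",
            f"anaType={doc['type'][anaphor]}"]

def arc_score(theta: Dict[str, float], doc, anaphor: int, antecedent: int) -> float:
    # ⟨θ, φ(x, a, z)⟩ for a single arc a = (anaphor, antecedent).
    return sum(theta[f] for f in arc_features(doc, anaphor, antecedent))

def decode(theta: Dict[str, float], doc) -> List[Tuple[int, int]]:
    # The argmax over structures decomposes into one argmax per anaphor because
    # the model is edge-factored and substructures are handled independently.
    arcs = []
    for anaphor in range(1, doc["n"] + 1):
        best = max(range(0, anaphor), key=lambda ante: arc_score(theta, doc, anaphor, ante))
        arcs.append((anaphor, best))
    return arcs

theta = defaultdict(float, {"sentDist=2": -0.5, "anaType=PRO": 0.8})
doc = {"n": 3, "sent": {1: 0, 2: 1, 3: 2}, "type": {1: "NAM", 2: "NOM", 3: "PRO"}}
print(decode(theta, doc))
```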

Learning: Perceptron

Input: Training set D, cost function c, number of epochs n

function PERCEPTRON(D, c, n)
    Set θ = (0, ..., 0)
    for epoch = 1, ..., n do
        for (x, z) ∈ D do
            ĥ_opt = argmax_{h ∈ H_{x,z}} ⟨θ, φ(x, h, z)⟩
            (ĥ, ẑ) = argmax_{(h,z) ∈ H_x × Z_x} ⟨θ, φ(x, h, z)⟩ + c(x, h, ĥ_opt, z)
            if ĥ does not encode z then
                Set θ = θ + φ(x, ĥ_opt, z) − φ(x, ĥ, ẑ)
Output: Weight vector θ

[Figure: an example substructure over mentions m0–m5 illustrating an update step]

Reward solutions with high cost: large-margin approach
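The pseudocode above translates into a compact Python sketch of the latent structured perceptron. The helper functions (feature_vector, constrained_decode, cost_augmented_decode, encodes) are hypothetical placeholders for the decoding and feature extraction routines, not a particular library's API.

```python
# Latent structured perceptron with cost-augmented decoding (sketch).
def perceptron(data, cost, n_epochs, feature_vector,
               constrained_decode, cost_augmented_decode, encodes):
    theta = {}                      # sparse weight vector: feature name -> weight

    def add(vec, scale):
        # Add a sparse feature vector (dict) to theta, scaled by +1 or -1.
        for f, v in vec.items():
            theta[f] = theta.get(f, 0.0) + scale * v

    for epoch in range(n_epochs):
        for x, z in data:
            h_opt = constrained_decode(theta, x, z)                  # best structure consistent with gold z
            h_hat, z_hat = cost_augmented_decode(theta, x, h_opt, z, cost)
            if not encodes(h_hat, z):                                # prediction disagrees with gold clustering
                add(feature_vector(x, h_opt, z), +1.0)               # move towards ĥ_opt
                add(feature_vector(x, h_hat, z_hat), -1.0)           # move away from (ĥ, ẑ)
    return theta
```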

Mention Pair

[Figure: mention-pair structure over m1–m4 with arcs labeled + and −]

Soon et al. (2001), Ng and Cardie (2002), Bengtson and Roth (2008), ...

• Latent structure: all labeled anaphor-antecedent pairs in the document
• Substructure: a single pair
• No costs (use training data resampling instead; see the sketch below)
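As a rough illustration of training data resampling for the mention pair model, here is a sketch in the style of Soon et al. (2001): for each anaphor, the closest gold antecedent yields a positive pair and the mentions in between yield negative pairs. Whether this exact scheme is used in the paper is an assumption.

```python
# Resampling of mention-pair training instances (Soon et al. 2001 style; illustrative).
def mention_pair_instances(entity_ids):
    # entity_ids[i] is the gold entity of mention m_{i+1}; None if non-coreferent.
    instances = []   # (anaphor index, antecedent index, label)
    for j, ent in enumerate(entity_ids):
        antecedents = [i for i in range(j) if ent is not None and entity_ids[i] == ent]
        if not antecedents:
            continue
        closest = max(antecedents)
        instances.append((j, closest, "+"))            # closest antecedent: positive instance
        for i in range(closest + 1, j):
            instances.append((j, i, "-"))              # mentions in between: negative instances
    return instances

# m1 and m3 corefer (entity A), m2 belongs to another entity:
print(mention_pair_instances(["A", "B", "A"]))   # [(2, 0, '+'), (2, 1, '-')]
```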

Mention Ranking

[Figure: mention-ranking structure with candidate antecedents m0–m3 for anaphor m4; arcs carry example costs 1 and 2]

Denis and Baldridge (2008), Chang et al. (2013), ...

• Latent structure: one antecedent decision per anaphor
• Substructure: the antecedent decision for a single anaphor
• Cost function (Durrett and Klein, 2013; Fernandes et al., 2014); see the sketch below
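Below is a hedged sketch of what such a cost function can look like, in the spirit of Durrett and Klein (2013): an arc is penalized according to the type of error it would introduce (false link, false new, wrong link). The concrete cost values and the function signature are illustrative, not the ones used in the paper.

```python
# Illustrative mention-ranking cost function for cost-augmented decoding.
def arc_cost(anaphor, antecedent, gold_antecedents,
             cost_false_link=1.0, cost_false_new=2.0, cost_wrong_link=1.0):
    gold = gold_antecedents.get(anaphor, set())   # gold antecedents; empty if non-anaphoric
    if antecedent == 0:                           # predicted discourse-new (dummy m0)
        return 0.0 if not gold else cost_false_new
    if not gold:                                  # linked although the anaphor is non-anaphoric
        return cost_false_link
    return 0.0 if antecedent in gold else cost_wrong_link

# During training, decoding maximizes score(arc) + arc_cost(arc), so structures that
# make expensive errors are preferred as the "wrong" solution to update against.
```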

Experiments and Analysis

Data

• conduct analysis and experiments on the English data from the CoNLL-2012 shared task on multilingual coreference resolution
• evaluate via the CoNLL scorer (average of three widely used evaluation metrics; see the sketch below)
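The CoNLL score is simply the average of the F1 values of the three metrics (MUC, B³, CEAFe). A tiny sketch, using the development-set Rank2 scores from the results table in the appendix slides:

```python
# CoNLL average of the three metric F1 scores.
def conll_average(muc_f1, b3_f1, ceafe_f1):
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0

print(round(conll_average(72.11, 60.74, 57.72), 2))   # 63.52, the Rank2 dev-set average
```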

Results on Test Data

[Figure: bar chart of CoNLL F1 (roughly 56 to 64) for HOTCoref, nn_coref, Pair, Rank1, Rank2 and Tree]

• HOTCoref: state-of-the-art system based on antecedent trees with non-local features (Björkelund and Kuhn, 2014)
• nn_coref: mention ranking, feature combinations learned via neural networks (Wiseman et al., 2015)
• Pair: mention pair model with a standard strategy for training data balancing; rich lexical feature set
• Rank1: ranking with the closest antecedent as gold antecedent; mainly gains in precision
• Rank2: ranking with a latent antecedent as gold antecedent; slight gains in precision and recall
• Tree: antecedent trees; higher precision, but lower recall

Analysis Tools

Employ our coreference resolution error analysis framework (Martschat and Strube, EMNLP 2014)
• extract precision and recall errors on development data
• compare errors made to assess strengths and weaknesses of approaches

Analysis

• ranking vs. mention pair
  • mainly better anaphoricity determination
  • antecedent competition useful for pronouns
• latent ranking vs. ranking with closest antecedents
  • mainly fewer precision errors for hard cases
• antecedent trees vs. ranking
  • document-level modeling: more cautious updates → higher precision at the expense of recall

Conclusions and Future Work

Conclusions

• coreference resolution approaches can be represented by the latent structures they operate on
• devised a framework and implemented mention pair, mention ranking and antecedent tree models
• mention ranking performs best, mainly due to anaphoricity modeling

Future Work

• apply the framework to entity-centric approaches
• analyze more approaches
• devise new models in the framework

Thanks!

Thank you for your attention!

Python implementation, state-of-the-art models, tutorials available at: http://github.com/smartschat/cort

This work has been funded by the Klaus Tschira Foundation.

Entity-centric Approaches

[Figure: an entity-centric structure grouping mentions m1–m8 into entities]

All Results

                MUC                    B³                     CEAFe
Model     R      P      F1      R      P      F1      R      P      F1     Avg

CoNLL-2012 English development data
Pair    66.68  71.71  69.10   53.57  62.44  57.67   52.56  53.87  53.21   59.99
Rank1   67.85  76.66  71.99   55.33  65.45  59.97   53.16  61.28  56.93   62.96
Rank2   68.02  76.73  72.11   55.61  66.91  60.74   54.48  61.36  57.72   63.52
Tree    65.91  77.92  71.41   52.72  67.98  59.39   52.13  60.82  56.14   62.31

CoNLL-2012 English test data
Pair    67.16  71.48  69.25   51.97  60.55  55.93   51.02  51.89  51.45   58.88
Rank1   67.96  76.61  72.03   54.07  64.98  59.03   51.45  59.02  54.97   62.01
Rank2   68.13  76.72  72.17   54.22  66.12  59.58   52.33  59.47  55.67   62.47
Tree    65.79  78.04  71.39   50.92  67.76  58.15   50.55  58.34  54.17   61.24

Analysis: Recall Errors

                   Name/noun                 Anaphor pronoun
Model    Both name   Mixed   Both noun   I/you/we   he/she   it/they   Rem.
Max           3579     948        2063       2967     1990      2471    591
Pair           815     657        1074        394      373      1005    549
Rank1          879     637        1221        348      247       806    557
Rank2          857     647        1158        370      251       822    566
Tree           911     686        1258        441      247       863    572

Analysis: Precision Errors

                   Name/noun                 Anaphor pronoun
Model    Both name   Mixed   Both noun   I/you/we   he/she   it/they   Rem.
Pair           885      83        1055        836      289       864    175
              2673      79        1098       2479     1546      1408    115
Rank1          587      93         494        873      324       844    121
              2620      96         960       2521     1692      1510     97
Rank2          640      92         567        862      318       835     42
              2664     102        1038       2461     1692      1594     43
Tree           595      57         442        836      318       757     37
              2628      82         924       2398     1691      1557     36
