THE IMPORTANCE OF SYNTACTIC PARSING AND INFERENCE IN SEMANTIC ROLE LABELING (PUNYAKANOK ET AL. 2008)
Summer Semester Peter Michael Stahl June 27th 2011 Seminar: Recent Developments in Computa onal Seman cs Outline
2
¨ SRL system architecture ¤ Pruning ¤ Argument iden fica on ¤ Argument classifica on ¤ Inference ¨ What is Integer Linear Programming (ILP)? ¨ Importance of Syntac c Parsing ¤ full syntac c parsing ¤ shallow syntac c parsing ¨ Conclusion 3 SRL System Architecture 4 Two-step design
5
¨ 1) System is trained to iden fy argument candidates for a given verb predicate ¨ 2) Argument candidates are classified into their types ¨ Addi onal steps: ¤ Pruning of obvious non-candidates before step 1 ¤ Use post-processing inference to fix inconsistent predic ons a er step 2 ¨ System uses PropBank Annota on Scheme Pruning step
6
¨ filter out simple cons tuents that are unlikely to be arguments ¨ recursive process star ng from the target verb ¨ returns siblings of verb as candidates ¨ then it moves to parent of the verb and collects siblings again ¨ process con nues un l it reaches the root ¨ exploits heuris c rules introduced by Xue and Palmer (2004) Argument iden fica on step
7
¨ binary classifica on ¨ full syntac c parsing: ¤ train and apply classifiers on cons tuents supplied by pruning stage ¨ shallow syntac c parsing: ¤ no pruning stage à consider all possible subsequences (i.e. consecu ve words) as poten al candidates ¤ two classifiers: One to predict beginnings of a poten al candidate, the second to predict the ends ¤ predic ons are combined Features of classifier
8
¨ same features as in Gildea & Jurafsky (2002) ¨ some addi onal features which are especially useful for systems without full parse tree informa on, for example: ¤ two word before and a er cons tuent, together with their POS tags ¤ length of cons tuent, measured in words and chunks ¤ chunk pa ern: encodes sequence of chunks from cons tuent to predicate ¤ chunk pa ern length: counts number of chunks in chunk pa ern feature Argument classifica on step
9
¨ mul -class classifier used to predict seman c types or label argument as null to discard it ¨ same features as in iden fica on step ¨ for details of the classifier see (Roth 1998) and (Carlson et al. 1999) Inference step
10
¨ previous decisions were made for each argument independently, ignoring global informa on across arguments ¨ purpose: incorporate such informa on here, including linguis c and structural knowledge, e.g.: ¤ arguments do not overlap ¤ each verb takes at most one argument of each type ¨ à useful to resolve inconsistencies of argument classifica on Inference step
11
¨ process of finding the best valid seman c labels which sa sfy certain constraints ¨ input: the argument classifier‘s confidence scores for each type of argument, along with a list of constraints ¨ output: op mal solu on that maximizes linear sum of confidence scores, subject to constraints that encode domain knowledge ¨ accomplished through integer linear programming Constraints over Argument Labeling
12
¨ S1:M: set of arguments, indexed from 1 to M ¨ PM: set of labels ¨ the indexed set of arguments can take a set of labels c1:M ∈PM ¨ classifiers return score(Si = ci) ¨ à likelihood of argument Si being labeled ci, given a sentence ¨ goal: maximize the overall score of arguments 13 Constraints over Argument Labeling
find the set of labels for which the sum of the scores of each argument is maximized can be thought of as if the solu on space is limited through a filter func on F which eliminates many labelings from considera on 14 Constraints over Argument Labeling
the filter func on uses several constraints, e.g.:
- arguments cannot overlap with the predicate
- if a predicate is outside a clause, its arguments cannot be embedded in that clause 15 Constraints over Argument Labeling
problem: computa on takes a very long me, but can be op mized by using an integer linear programming resolver 16 Integer Linear Programming What is ILP?
17
¨ ILP is a special case of linear programming ¨ also called linear op miza on ¨ mathema cal method for achieving the best outcome in a mathema cal model for some list of requirements represented as linear rela onships ¨ à maximize an objec ve func on depending on linear (in)equality constraints ¨ if unknown variables are all required to be integers, then the task is called ILP 18 Geometrical interpreta on of ILP
- set {x | ax = b}: all points x which sa sfy ax=b (a and b are constants) - all these points define a hyperplane in n-dimensional space - all points which sa sfy ax <= b lie on one side of this hyperplane
- each constraint divides the space into two parts; points on one side are feasible, those on the other side are not 19 Geometrical interpreta on of ILP
- the set of all points which lie in their respec ve feasible region of the space build a convex polytope 20 What is a polytope?
- a polytope is a closed solid geometric object with flat sides, which exists in any general number of dimensions
- 2-polytopes are called polygons
- 3-polytopes are called polyhedrons 21 What does convex mean?
an object is convex if for every pair of points x and y within the object, every point on the straight line, which connects x and y, is also within the object 22 Geometrical interpreta on of ILP
- the set of all points which lie in their respec ve feasible region of the space build a convex polytope T - the red vector c is the objec ve func on c: x -> c x that shall be maximized (standard form)
- this corresponds to finding the point in the polytope that maximizes the scalar product of vector c and the hyperplane with the condi on {x | cTx=0} (the straight red line) 23 Geometrical interpreta on of ILP
- shi this hyperplane in the direc on of vector c as far as possible without leaving the polytope
- i.e. the point xop mal maximizes the objec ve func on c considering the linear constraints and thus is the only op mal solu on to the given problem
- in this case only one point matches, but it can also be a set of points in general 24 ILP applied to Seman c Role Labeling 25 ILP applied to Seman c Role Labeling
the original func on is reformulated as a linear func on over several binary
variables uic and the filter func on F is represented using linear (in)equali es 26 ILP applied to Seman c Role Labeling
i - uic = [S = c] (indicator variable that represents whether or not the argument type c is assigned to Si) i i - pic = score(S = c) (the probability of argument S to be of type c)
- the constraint means that each argument can take only one type ILP applied to Seman c Role Labeling
27
¨ previous approaches rely on dynamic programming to resolve non-overlapping / embedding constraints ¨ but they are not able to handle more expressive constraints which take, e.g., long-distance dependencies into account 28 Full vs. Shallow Syntac c Parsing Full vs. Shallow Syntac c Parsing
29
¨ PropBank sec ons 02-21 used as training data, sec on 23 used for tes ng ¨ goal: understand the effec ve contribu on of full versus shallow parsing informa on (i.e. using only POS tags, chunks and clauses) ¨ comparison of performance when using the correct (gold-standard) data versus automa c parse data ¨ performance is measured in terms of precision, recall and F1 measure Full vs. Shallow Syntac c Parsing
30
Full Parsing Shallow Parsing
¨ automa c full parse ¨ different components trees are derived are used: using Charniak‘s ¤ POS tagger (Even- parser (2001) Zohar & Roth 2001) ¤ Chunker (Punyakanok & Roth 2001) ¤ Clauser (Carreras et al. 2005) 31 Argument classifica on performance
- for tes ng, it is assumed that the argument boundaries are known
- difference between systems lies in features that are used (in the shallow system, most features can be approximated using chunks and clauses) 32 Argument iden fica on performance
- the candidate list used here is the output of the pruning heuris c applied on the gold-standard parse trees
- difference between systems lies only in the construc on of some features 33 Pruning performance
- the main contribu on of full parsing is the pruning stage - internal tree structure significantly helps in discrimina ng argument candidates - the shallow parsing system does not have enough informa on for the pruning heuris cs, thus two word-based classifiers are trained instead (one to predict the beginning of an argument, the second to predict the end) 34 ILP Inference performance
removing most linear constraints has a visible impact on performance Conclusion
35
¨ the pruning step and the ILP-based inference procedure have the greatest impact on the overall system performance ¨ by means of these features a high-performant SRL system can be built ¨ shallow syntac c parsing yields already very good results, full syntac c parsing is most relevant in pruning and argument iden fica on Merci ! References
37
¨ Carlson, Andrew J.; Cumby, Chad M.; Rosen, Jeff L.; Roth, Dan (1999): The SNoW learning architecture. Technical Report UIUCDCS-R-99-2001, UIUC Computer Science Department. ¨ Carreras, Xavier; Màrquez, Llúis (2005): Introduc on to the CoNLL-2005 shared task: Seman c Role Labeling. In: Proceedings of the Ninth Conference on Computa onal Natural Language Learning, p. 152-164, Ann Arbor, MI. ¨ Charniak, Eugene (2001): Immediate-head parsing for language models. In: Proceddings of the 39th Annual Mee ng of the ACL, p. 116-123, Toulouse, France. ¨ Even-Zohar, Yair; Roth, Dan (2001): A sequen al model for mul -class classifica on. In: Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP-2001), p. 10-19, Pi sburgh, PA. ¨ Gildea, Daniel; Jurafsky, Daniel (2002): Automa c Labeling of Seman c Roles. Computa onal Linguis cs, 28(3):245-288. ¨ Punyakanok, Vasin; Roth, Dan; Yih, Wen-tau; Zimak, Dav (2004): Seman c role labeling via integer linear programming inference. In: Proceedings of the 20th Interna onal Conference on Computa onal Linguis cs (COLING), p. 1346-1352, Geneva, Switzerland. ¨ Roth, Dan (1998): Learning to resolve natural language ambigui es: A unified approach. In: Proceedings of the 15th Na onal Conference on Ar ficial Intelligence (AAAI-98), p. 806-813, Madison, WI. ¨ Xue, Nianwen; Palmer, Martha (2004): Calibra ng features for seman c role labeling. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP-2004), p. 88-94, Barcelona, Spain.