
Generalized Earley Parser: Bridging Symbolic and Sequence Data for Future Prediction

Siyuan Qi 1   Baoxiong Jia 1 2   Song-Chun Zhu 1

1 University of California, Los Angeles, USA. 2 Peking University, Beijing, China. Correspondence to: Siyuan Qi.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Abstract

Future predictions on sequence data (e.g., videos or audios) require the algorithms to capture non-Markovian and compositional properties of high-level semantics. Context-free grammars are natural choices to capture such properties, but traditional grammar parsers (e.g., the Earley parser) only take symbolic sentences as inputs. In this paper, we generalize the Earley parser to parse sequence data which is neither segmented nor labeled. This generalized Earley parser integrates a grammar parser with a classifier to find the optimal segmentation and labels, and makes top-down future predictions. Experiments show that our method significantly outperforms other approaches for future human activity prediction.

Figure 1. The input of the generalized Earley parser is a matrix of probabilities of each label for each frame, given by an arbitrary classifier. The parser segments and labels the sequence data into a label sentence in the language of a given grammar. Future predictions are then made based on the grammar. (The figure panels, not reproduced here, show a microwave-food example: the sequence input data, the classifier's raw output, the final output "open microwave, take food, put into microwave", the parse tree, and the future prediction.)

1. Introduction

We consider the task of labeling unsegmented sequence data and predicting future labels. This is a ubiquitous problem with important applications in many perceptual tasks, e.g., recognition and anticipation of human activities or speech. A generic modeling approach is key for intelligent agents to perform future-aware tasks (e.g., assistive activities).

Such tasks often exhibit non-Markovian and compositional properties, which should be captured by a top-down prediction algorithm. Consider the video sequence in Figure 1: human observers recognize the first two actions and predict the most likely future action based on the entire history. Context-free grammars are natural choices to model such reasoning processes, and they are one step closer to Turing machines than Markov models (e.g., Hidden Markov Models) in the Chomsky hierarchy.

However, it is not clear how to directly use symbolic grammars to parse and label sequence data. Traditional grammar parsers take symbolic sentences as inputs instead of noisy sequence data like videos or audios. The data has to be i) segmented and ii) labeled before existing grammar parsers can be applied. One naive solution is to first segment and label the data using a classifier, thus generating a label sentence; grammar parsers can then be applied on top of it for prediction. But this is clearly non-optimal, since the grammar rules are not considered in the classification process. It may not even be possible to parse the resulting label sentence, because it is very often grammatically incorrect.

In this paper, we design a grammar-based algorithm that directly operates on sequence input data, which goes beyond the scope of symbolic string inputs. Specifically, we propose a generalized Earley parser based on the Earley parser (Earley, 1970). The algorithm finds the optimal segmentation and label sentence according to both a symbolic grammar and a classifier output of probabilities of labels for each frame, as shown in Figure 1. Optimality here means maximizing the probability of the label sentence according to the classifier output while being grammatically correct.

The difficulty of achieving this optimality lies in the joint optimization of both the grammar structure and the parsing likelihood of the output label sentence. For example, an expectation-maximization-type algorithm will not work well since i) there is no guarantee of optimality, and ii) any grammatically incorrect sentence has a grammar prior of probability 0. The algorithm can easily get stuck in local minima and fail to find a grammatically correct solution.

The core idea of our algorithm is to directly and efficiently search for the optimal label sentence in the language defined by the grammar. The constraint of the search space ensures that the sentence is grammatically correct. Specifically, a heuristic search is performed on the prefix tree expanded according to the grammar, where the path from the root to a node represents a partial sentence (prefix). By carefully defining the heuristic as a prefix probability computed based on the classifier output, we can efficiently search through the tree to find the optimal label sentence.

The generalized Earley parser has four major advantages. i) The inference process tightly integrates a high-level grammar with an underlying classifier; the grammar gives guidance for segmenting and labeling the sequence data. ii) It can be applied to any classifier outputs. iii) It generates a grammar parse tree for the data sequence that is highly explainable. iv) It is principled and generic, as it applies to most sequence data parsing and prediction problems.

We evaluate the proposed approach on two datasets of human activities in the computer vision domain. The first dataset, CAD-120 (Koppula et al., 2013), consists of daily activities, and most activity prediction methods are evaluated on this dataset. Comparisons show that our method significantly outperforms state-of-the-art methods on future activity prediction. The second dataset, Watch-n-Patch (Wu et al., 2015), is designed for "action patching" and includes daily activities in which people forget actions. Experiments on the second dataset show the robustness of our method on noisy data. Both experiments show that the generalized Earley parser performs particularly well on prediction tasks, primarily due to its non-Markovian property.

This paper makes three major contributions.
• We design a parsing algorithm for symbolic context-free grammars that directly operates on sequence data to obtain the optimal segmentation and labels.
• We propose a prediction algorithm that naturally integrates with this parsing algorithm.
• We formulate an objective for future prediction for both grammar induction and classifier training. The generalized Earley parser serves as a concrete example for combining symbolic reasoning methods and connectionist approaches.

2. Background: Context-Free Grammars

In formal language theory, a context-free grammar is a type of formal grammar that contains a set of production rules describing all possible sentences in a given formal language. A context-free grammar G in Chomsky Normal Form is defined by a 4-tuple G = (V, Σ, R, γ) where
1. V is a finite set of non-terminal symbols that can be replaced by/expanded to a sequence of symbols.
2. Σ is a finite set of terminal symbols that represent actual words in a language, which cannot be further expanded.
3. R is a finite set of production rules describing the replacement of symbols, typically of the form A → BC or A → α for A, B, C ∈ V and α ∈ Σ.
4. γ ∈ V is the start symbol (root of the grammar).

Given a grammar, parsing is the process of analyzing a string of symbols according to the production rules and generating a parse tree. A parse tree represents the syntactic structure of a string according to some context-free grammar. The root node of the tree is the grammar root. Other non-leaf nodes correspond to non-terminals in the grammar, expanded according to the grammar production rules. The leaf nodes are terminal symbols, and together they form a sentence.

The above definition can be augmented by assigning a probability to each production rule, thus becoming a probabilistic context-free grammar. The probability of a parse tree is the product of the probabilities of the production rules that derive it.
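To make these definitions concrete, the following minimal Python sketch (ours, not from the paper) encodes the sample grammar used later in Figures 2 and 3 (γ → R; R → R + R; R → "0" | "1") as a probabilistic context-free grammar and computes a parse-tree probability as the product of its rule probabilities. The rule probabilities and the names PCFG and tree_probability are illustrative assumptions.

```python
# A minimal PCFG encoding of the sample grammar (rule probabilities are illustrative).
# Nonterminals: "GAMMA" (the start symbol gamma) and "R"; terminals: "0", "1", "+".
PCFG = {
    "GAMMA": [(("R",), 1.0)],
    "R": [(("R", "+", "R"), 0.4), (("0",), 0.3), (("1",), 0.3)],
}

def tree_probability(tree, grammar=PCFG):
    """Probability of a parse tree = product of the probabilities of the
    production rules used to derive it. A tree is (symbol, [children]);
    a leaf is just a terminal string."""
    if isinstance(tree, str):                      # terminal leaf
        return 1.0
    symbol, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = dict(grammar[symbol])[rhs]                 # probability of rule symbol -> rhs
    for c in children:
        p *= tree_probability(c, grammar)
    return p

# Parse tree of "0 + 1": GAMMA -> R -> (R "+" R), with R -> "0" and R -> "1".
tree = ("GAMMA", [("R", [("R", ["0"]), "+", ("R", ["1"])])])
print(tree_probability(tree))   # 1.0 * 0.4 * 0.3 * 0.3 = 0.036
```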

3. Earley Parser

In this section, we review the original Earley parser (Earley, 1970) and introduce the basic concepts. The Earley parser is an algorithm for parsing sentences of a given context-free language. In the following descriptions, α, β, and γ represent any string of terminals/nonterminals (including the empty string ε), A and B represent single nonterminals, and a represents a terminal symbol. We adopt Earley's dot notation: for a production of the form A → αβ, the notation A → α · β means that α has been parsed and β is expected.

Input position n is defined as the position after accepting the nth token, and input position 0 is the position prior to input. At each input position m, the parser generates a state set S(m). Each state is a tuple (A → α · β, i), consisting of
• the production currently being matched (A → αβ),
• the dot: the current position in that production, and
• the position i in the input at which the matching of this production began: the position of origin.

Seeded with S(0) containing only the top-level rule, the parser then repeatedly executes three operations: prediction, scanning, and completion:
• Prediction: for every state in S(m) of the form (A → α · Bβ, i), where i is the origin position as above, add (B → ·γ, m) to S(m) for every production in the grammar with B on the left-hand side (i.e., B → γ).
• Scanning: if a is the next symbol in the input stream, for every state in S(m) of the form (A → α · aβ, i), add (A → αa · β, i) to S(m + 1).
• Completion: for every state in S(m) of the form (A → γ·, j), find states in S(j) of the form (B → α · Aβ, i) and add (B → αA · β, i) to S(m).

In this process, duplicate states are not added to the state set. These three operations are repeated until no new states can be added to the set. The Earley parser runs in O(n²) for unambiguous grammars with respect to the string length n, and in O(n) for almost all LR(k) grammars. A simple example is demonstrated in Figure 2.

Figure 2. An example of the original Earley parser.
Sample grammar: γ → R;  R → R + R;  R → "0" | "1".  Input string: 0 + 1.

state #   rule          origin   comment
S(0)
(1)       γ → ·R        0        start rule
(2)       R → ·R + R    0        predict: (1)
(3)       R → ·0        0        predict: (1)
(4)       R → ·1        0        predict: (1)
S(1)
(1)       R → 0·        0        scan: S(0)(3)
(2)       R → R · +R    0        complete: (1) and S(0)(2)
(3)       γ → R·        0        complete: (2) and S(0)(1)
S(2)
(1)       R → R + ·R    0        scan: S(1)(2)
(2)       R → ·R + R    2        predict: (1)
(3)       R → ·0        2        predict: (1)
(4)       R → ·1        2        predict: (1)
S(3)
(1)       R → 1·        2        scan: S(2)(4)
(2)       R → R + R·    0        complete: (1) and S(2)(1)
(3)       R → R · +R    0        complete: (1) and S(2)(2)
(4)       γ → R·        0        complete: (2) and S(0)(1)
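The three operations above fit in a compact recognizer. The Python sketch below is our illustration (not the authors' implementation); it runs prediction, scanning, and completion on the sample grammar of Figure 2 and accepts exactly the strings in its language. The names State, GRAMMAR, and earley_recognize are ours.

```python
from collections import namedtuple

# Earley state: (lhs, rhs, dot, origin) -- the dot marks how much of rhs is parsed.
State = namedtuple("State", "lhs rhs dot origin")

# Sample grammar from Figure 2; "GAMMA" stands for the start symbol gamma.
GRAMMAR = {
    "GAMMA": [("R",)],
    "R": [("R", "+", "R"), ("0",), ("1",)],
}
NONTERMINALS = set(GRAMMAR)

def earley_recognize(tokens, grammar=GRAMMAR, start="GAMMA"):
    """Return True iff `tokens` is in the language of `grammar`."""
    chart = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {State(start, rhs, 0, 0) for rhs in grammar[start]}
    for m in range(len(tokens) + 1):
        queue = list(chart[m])
        while queue:
            s = queue.pop()
            if s.dot < len(s.rhs):
                nxt = s.rhs[s.dot]
                if nxt in NONTERMINALS:                      # prediction
                    for rhs in grammar[nxt]:
                        new = State(nxt, rhs, 0, m)
                        if new not in chart[m]:
                            chart[m].add(new); queue.append(new)
                elif m < len(tokens) and tokens[m] == nxt:   # scanning
                    chart[m + 1].add(State(s.lhs, s.rhs, s.dot + 1, s.origin))
            else:                                            # completion
                for t in list(chart[s.origin]):
                    if t.dot < len(t.rhs) and t.rhs[t.dot] == s.lhs:
                        new = State(t.lhs, t.rhs, t.dot + 1, t.origin)
                        if new not in chart[m]:
                            chart[m].add(new); queue.append(new)
    return any(s.lhs == start and s.dot == len(s.rhs) and s.origin == 0
               for s in chart[len(tokens)])

print(earley_recognize(["0", "+", "1"]))   # True
print(earley_recognize(["0", "+", "+"]))   # False
```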

4. Generalized Earley Parser

In this section, we introduce the proposed algorithm. Instead of taking symbolic sentences as input, we aim to design an algorithm that can parse raw sequence data x of length T (e.g., videos or audios) into a sentence l of labels (e.g., actions or words) of length |l| ≤ T, where each label k ∈ {0, 1, ···, K} corresponds to a segment of the sequence. To achieve that, a classifier (e.g., a neural network) is first applied to each sequence x to get a T × K probability matrix y (e.g., softmax activations of the neural network), with y_t^k representing the probability of frame t being labeled as k. The proposed generalized Earley parser takes y as input and outputs the sentence l* that best explains the data according to a grammar G of Chomsky normal form.

The core idea is to use the original Earley parser to help construct a prefix tree according to the grammar, as illustrated in Figure 3. A prefix tree is composed of terminal symbols and terminations that represent ends of sentences. The root node of the tree is the "empty" symbol. The path from the root to any node in the tree represents a partial sentence (prefix). For each prefix, we can compute the probability that the best label sentence starts with this prefix. This probability is used as a heuristic to search for the best label sentence in the prefix tree: the prefix probabilities prioritize the nodes to be expanded in the prefix tree. The parser finds the best solution when it expands a termination node in the tree. It then returns the current prefix string as the best solution.

Figure 3. Prefix search according to grammar. A classifier is applied to a 5-frame signal and outputs a probability matrix (shown below) as the input to our algorithm. The proposed algorithm expands a grammar prefix tree, where "e" represents termination. It finally outputs the best label "0 + 1" with probability 0.43. The probabilities of children nodes do not sum to 1 since the grammatically incorrect nodes are eliminated from the search.
Sample grammar: γ → R;  R → R + R;  R → "0" | "1"
Input (classifier output):
frame   "0"   "1"   "+"
0       0.8   0.1   0.1
1       0.8   0.1   0.1
2       0.1   0.1   0.8
3       0.1   0.8   0.1
4       0.1   0.8   0.1

This heuristic search is realized by generalizing the Earley parser. Specifically, the scan operation in the Earley parser essentially expands a new node in the grammar prefix tree. For each prefix l, we can compute p(l|x_{0:T}) and p(l···|x_{0:T}) based on y, where p(l|x_{0:T}) is the probability of l being the best label, and p(l···|x_{0:T}) is the probability of l being the prefix of the best label of x_{0:T}. The formulations for p(l|x_{0:T}) and p(l···|x_{0:T}) are derived in Section 4.1.

We now describe the details. Each scan operation creates a new set S(m, n) ∈ S(m), where m is the length of the scanned string and n is the total number of terminals that have been scanned at position m. This can be thought of as creating a new leaf node in the prefix tree, and S(m) is the set of all created nodes at level m. A priority queue q over state sets is kept for the prefix search. Scan operations push the newly created set into the queue with priority p(l···), where l is the parsed string of the state being scanned.

Each state is a tuple (A → α · β, i, j, l, p(l···)), augmented from the original Earley parser by adding j, l, and p(l···). Here l is the parsed string of the state, and i, j are the indices of the set from which this rule originated. The parser then repeatedly executes three operations, modified from the Earley parser: prediction, scanning, and completion:
• Prediction: for every state in S(m, n) of the form (A → α · Bβ, i, j, l, p(l···)), add (B → ·γ, m, n, l, p(l···)) to S(m, n) for every production in the grammar with B on the left-hand side.
• Scanning: for every state in S(m, n) of the form (A → α · aβ, i, j, l, p(l···)), append the new terminal a to l and compute the probability p((l + a)···). Create a new set S(m + 1, n′), where n′ is the current size of S(m + 1). Add (A → αa · β, i, j, l + a, p((l + a)···)) to S(m + 1, n′). Push S(m + 1, n′) into q with priority p((l + a)···).
• Completion: for every state in S(m, n) of the form (A → γ·, i, j, l, p(l···)), find states in S(i, j) of the form (B → α · Aβ, i′, j′, l′, p(l′···)) and add (B → αA · β, i′, j′, l, p(l···)) to S(m, n).

Algorithm 1 Generalized Earley Parser
  Initialize: S(0, 0) = {(γ → ·R, 0, 0, ε, 1.0)}.
  q = priorityQueue()
  q.push(1.0, (0, 0, ε, S(0, 0)))   {push to queue with prob. 1.0 and index (0, 0)}
  while (m, n, l−, currentSet) = q.pop() do
    for s = (r, i, j, l, p(l···)) ∈ currentSet do
      if p(l) > p(l*): l* = l
      if r is (A → α · Bβ) then   {prediction}
        for each (B → γ) in grammar G do
          r′ = (B → ·γ);  s′ = (r′, m, n, l, p(l···));  S(m, n).add(s′)
        end for
      else if r is (A → α · aβ) then   {scanning}
        r′ = (A → αa · β)
        m′ = m + 1,  n′ = |S(m + 1)| − 1
        s′ = (r′, i, j, l + a, p((l + a)···))
        S(m′, n′).add(s′)
        q.push(p((l + a)···), (m′, n′, S(m′, n′)))
      else if r is (B → γ·) then   {completion}
        for each ((A → α · Bβ), i′, j′) in S(i, j) do
          r′ = (A → αB · β);  s′ = (r′, i′, j′, l, p(l···));  S(m, n).add(s′)
        end for
      end if
    end for
    if p(l−) > p(l) for all un-expanded l: return l−
  end while
  return l*

This parsing process is efficient since we do not need to search through the entire tree. As shown in Figure 3 and Algorithm 1, the best label sentence l is returned when the probability of termination is larger than any other prefix probability. As long as the prefix probability is computed correctly, the algorithm is guaranteed to return the best solution.

The original Earley parser is a special case of the generalized Earley parser. Intuitively, for any input sentence to the Earley parser, we can always convert it to one-hot vectors and apply the proposed algorithm. On the other hand, the original Earley parser cannot be applied to segmented one-hot vectors, since the labels are often grammatically incorrect. Hence we have the following proposition.

Proposition 1. The Earley parser is a special case of the generalized Earley parser.

Proof. Let L(G) denote the language of grammar G and h(·) denote a one-to-one mapping from a label to a one-hot vector. L(G) is the input space of the Earley parser. ∀ l ∈ L(G), the generalized Earley parser accepts h(l) as input. Therefore the proposition follows.

Here we emphasize two important distinctions of our method from traditional probabilistic parsers with prefix probabilities. i) In traditional parsers, the prefix probability is the probability of a string being a prefix of some string generated by a grammar (a top-down grammar prior). In our case, the parser computes the bottom-up data likelihood; it is straightforward to extend this to a posterior. ii) Traditional parsers only maintain a parse tree, while our algorithm maintains both a parse tree and a prefix tree. The introduction of the prefix tree into the parser enables us to efficiently search the grammar according to a desired heuristic.
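The data structures used by Algorithm 1 can be sketched as follows. This is only a skeleton of the augmented state tuple and the priority queue keyed by prefix probabilities, assuming a Python implementation; the names ParserState, push, and pop are ours, and the full parsing loop is omitted.

```python
import heapq
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ParserState:
    """Augmented Earley state (A -> alpha . beta, i, j, l, p(l...))."""
    lhs: str                  # A
    rhs: Tuple[str, ...]      # alpha beta
    dot: int                  # position of the dot within rhs
    i: int                    # level of the originating state set S(i, j)
    j: int                    # index of the originating set within level i
    prefix: Tuple[str, ...]   # the parsed string l
    prefix_prob: float        # p(l...)

# Priority queue over state sets, keyed by the prefix probability p(l...).
# heapq is a min-heap, so probabilities are negated; id() breaks ties so the
# heap never has to compare the (unorderable) set objects themselves.
queue = []

def push(prefix_prob, m, n, state_set):
    heapq.heappush(queue, (-prefix_prob, m, n, id(state_set), state_set))

def pop():
    neg_prob, m, n, _, state_set = heapq.heappop(queue)
    return -neg_prob, m, n, state_set
```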

4.1. Parsing Probability Formulation

The parsing probability p(l|x_{0:T}) is computed in a dynamic programming fashion. Let k be the last label in l. For t = 0, the probability is initialized by:

    p(l|x_0) = y_0^k   if l contains only k
             = 0       otherwise                                        (1)

Let l⁻ be the label sentence obtained by removing the last label k from the label sentence l. For t > 0, the last frame t must be classified as k, and the previous frames can be labeled as either l or l⁻. Then we have:

    p(l|x_{0:t}) = y_t^k ( p(l|x_{0:t−1}) + p(l⁻|x_{0:t−1}) )           (2)

It is worth mentioning that when y_t^k is wrongly given as 0, the process will have trouble correcting the mistake: even if p(l⁻|x_{0:t−1}) is high, the probability p(l|x_{0:t}) will be 0. Fortunately, since the softmax function is usually adopted to compute y, y_t^k will not be 0 and the solution will be kept for consideration.

We then compute the prefix probability p(l···|x_{0:T}) based on p(l⁻|x_{0:t}). For l to be the prefix, the transition from l⁻ to l can happen at any frame t ∈ {0, ···, T}. Once the label k is observed (the transition happens), l becomes the prefix and the remaining frames can be labeled arbitrarily. Hence the probability of l being the prefix is:

    p(l···|x_{0:T}) = p(l|x_0) + Σ_{t=1}^{T} y_t^k p(l⁻|x_{0:t−1})      (3)

In practice, the probability p(l|x_{0:t}) decreases exponentially as t increases and will soon lead to numeric underflow. To avoid this, the probabilities need to be computed in log space. The time complexity of computing the probabilities is O(T) for each sentence l, because the values p(l⁻|x_{0:t}) are cached. The worst-case complexity of the entire parsing is O(T|G|).
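A direct implementation of the recursions (1)-(3) is a short dynamic program. The sketch below is our code (not the authors'); it treats the empty sentence as having probability 0, consistent with Eq. (1), and evaluates a candidate sentence against the classifier matrix of Figure 3. For "0 + 1" it returns p(l|x_{0:T}) ≈ 0.425, consistent with the output probability 0.43 reported in Figure 3. As noted above, a production implementation would work in log space.

```python
import numpy as np

def sentence_and_prefix_prob(y, sentence):
    """Dynamic program for Eqs. (1)-(3).

    y is a T x K matrix of frame-wise label probabilities; `sentence` is a
    list of label indices (the candidate label sentence l). Returns
    p(l | x_{0:T}) and the prefix probability p(l... | x_{0:T})."""
    T = y.shape[0]
    p_prev = np.zeros(T)        # p(l^- | x_{0:t}); the empty sentence has probability 0
    prefix = 0.0
    for idx, k in enumerate(sentence):
        p_cur = np.zeros(T)
        p_cur[0] = y[0, k] if idx == 0 else 0.0                   # Eq. (1)
        for t in range(1, T):
            p_cur[t] = y[t, k] * (p_cur[t - 1] + p_prev[t - 1])   # Eq. (2)
        prefix = p_cur[0] + sum(y[t, k] * p_prev[t - 1] for t in range(1, T))  # Eq. (3)
        p_prev = p_cur
    return p_cur[T - 1], prefix

# Classifier output from Figure 3 (rows: frames 0-4; columns: "0", "1", "+").
y = np.array([[0.8, 0.1, 0.1],
              [0.8, 0.1, 0.1],
              [0.1, 0.1, 0.8],
              [0.1, 0.8, 0.1],
              [0.1, 0.8, 0.1]])
print(sentence_and_prefix_prob(y, [0, 2, 1]))   # "0 + 1": roughly (0.425, 0.52)
```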

4.2. Segmentation and Labeling

The generalized Earley parser gives us the best grammatically correct label sentence l to explain the sequence data, which takes all possible segmentations into consideration. Therefore the probability p(l|x_{0:T}) is the summation of the probabilities of all possible segmentations. Let p(l|y_{0:e}) be the probability of the best segmentation for sentence l based on the classifier output y. We perform a maximization over different segmentations by dynamic programming to find the best segmentation:

    p(l|y_{0:e}) = max_{b<e} p(l⁻|y_{0:b}) Π_{t=b}^{e} y_t^k            (4)

4.4. Maximum Likelihood Estimation for Prediction

We are interested in finding the best grammar and classifier that give us the most accurate predictions based on the generalized Earley parser. Let G be the grammar, f be the classifier, and D be the set of training examples. The training set consists of pairs of complete or partial data sequences x and the corresponding label sequences y for all the frames in x. By merging consecutive labels in y that are the same, we can obtain partial label sentences l and predicted labels z. Hence we have D = {(x, y, l, z)}. The best grammar G* and the best classifier f* together minimize the prediction loss over D.
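One way to read Eq. (4) is as a dynamic program over the start frame of each segment, assuming each label covers a contiguous block of frames and that y_{0:b} denotes the frames preceding the last segment. The sketch below implements that reading (the boundary convention and the name best_segmentation are our assumptions) and recovers the segment boundaries by backtracking; on the Figure 3 matrix it assigns frames 0-1 to "0", frame 2 to "+", and frames 3-4 to "1".

```python
import numpy as np

def best_segmentation(y, labels):
    """Dynamic program over segment start frames (one reading of Eq. (4)).
    Returns the best log-probability and the segments as (label, start, end)."""
    T, n = y.shape[0], len(labels)
    logy = np.log(y + 1e-12)                  # log space to avoid underflow
    NEG = -np.inf
    q = np.full((n, T), NEG)                  # q[i, e]: best log-prob of frames 0..e with labels[:i+1]
    back = np.zeros((n, T), dtype=int)        # chosen start frame of segment i
    for i, k in enumerate(labels):
        for e in range(T):
            seg = 0.0
            for b in range(e, -1, -1):        # segment i covers frames b..e
                seg += logy[b, k]
                if i == 0:
                    prev = 0.0 if b == 0 else NEG
                else:
                    prev = q[i - 1, b - 1] if b > 0 else NEG
                if prev + seg > q[i, e]:
                    q[i, e], back[i, e] = prev + seg, b
    # Backtrack the segment boundaries.
    segments, e = [], T - 1
    for i in range(n - 1, -1, -1):
        b = back[i, e]
        segments.append((labels[i], b, e))
        e = b - 1
    return q[n - 1, T - 1], segments[::-1]

# Figure 3 classifier output; expected segmentation of "0 + 1":
# frames 0-1 -> "0", frame 2 -> "+", frames 3-4 -> "1".
y = np.array([[0.8, 0.1, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8],
              [0.1, 0.8, 0.1], [0.1, 0.8, 0.1]])
print(best_segmentation(y, [0, 2, 1]))
```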

5. Related Work

Future activity prediction. This is a relatively new but important domain in computer vision. (Ziebart et al., 2009; Gupta et al., 2009; Yuen & Torralba, 2010; Ryoo, 2011; Kitani et al., 2012; Kuderer et al., 2012; Wang et al., 2012; Pei et al., 2013; Walker et al., 2014; Vu et al., 2014; Li & Fu, 2014; Wei et al., 2016; Holtzen et al., 2016; Alahi et al., 2016; Xie et al., 2018; Rhinehart & Kitani, 2017; Ma et al., 2017) predict human trajectories/actions in various settings, including complex indoor/outdoor scenes. Koppula, Gupta and Saxena (KGS) (Koppula et al., 2013) proposed a model incorporating affordances that detects and predicts human activities. Koppula et al. (Koppula & Saxena, 2016) later proposed an anticipatory temporal conditional random field to model the spatial-temporal relations. Qi et al. (Qi et al., 2017) proposed a spatial-temporal And-Or graph (ST-AOG) for activity prediction.

Hierarchical/grammar models. Pei et al. (Pei et al., 2013) unsupervisedly learned a temporal grammar for video parsing. Pirsiavash et al. (Pirsiavash & Ramanan, 2014) proposed a segmental grammar to parse videos. Holtzen et al. (Holtzen et al., 2016) inferred human intents with a hierarchical model. The ST-AOG (Qi et al., 2017) is also a type of grammar model. Grammar-based methods show effectiveness on tasks that have compositional structures. However, previous grammar-based algorithms take symbolic inputs like traditional language parsers, which seriously limits their applicability. Additionally, the parser does not provide guidance for either training the classifiers or segmenting the sequences. They also lack a good approach to handle grammatically incorrect label sentences; for example, (Qi et al., 2017) finds in the training corpus the closest sentence to the recognized sentence and applies the language parser afterward. In our case, the proposed parsing algorithm takes sequence data of raw signals as input and generates the label sentence as well as the parse tree. All parsed label sentences are grammatically correct, and a learning objective is formulated for the classifier.

6. Human Activity Detection and Prediction

We evaluate our method on the task of human activity detection and prediction. We present and discuss our experiment results on two datasets, CAD-120 (Koppula et al., 2013) and Watch-n-Patch (Wu et al., 2015), for comparisons with state-of-the-art methods and evaluation of the robustness of our approach. CAD-120 is the dataset on which most existing prediction algorithms are evaluated; it contains videos of daily activities that are long sequences of sub-activities. Watch-n-Patch is a daily activity dataset that features forgotten actions. Results show that our method significantly outperforms the other methods for activity prediction.

6.1. Grammar Induction

In both experiments, we used a modified version of the ADIOS (automatic distillation of structure) grammar induction algorithm (Solan et al., 2005) to learn the event grammar. The algorithm learns production rules by generating significant patterns and equivalence classes. The significant patterns are selected according to a context-sensitive criterion defined in terms of local flow quantities in a graph: two probabilities are defined over a search path. One is the right-moving ratio of fan-through (through-going flux of paths) to fan-in (incoming flux of paths); the other, similarly, is the left-going ratio of fan-through to fan-in. The criterion is described in detail in (Solan et al., 2005).

The algorithm starts by loading the corpus of an activity onto a graph whose vertices are sub-activities, augmented by two special symbols, begin and end. Each event sample is represented by a separate path over the graph. The algorithm then generates candidate patterns by traversing different search paths. At each iteration, it tests the statistical significance of each subpath to find significant patterns, and then finds the equivalence classes that are interchangeable. At the end of the iteration, the significant pattern is added to the graph as a new node, replacing the subpaths it subsumes. We favor shorter patterns in our implementation.

6.2. Experiment on CAD-120 Dataset

Dataset. The CAD-120 dataset is a standard dataset for human activity prediction. It contains 120 RGB-D videos of four different subjects performing 10 high-level activities, where each high-level activity was performed three times with different objects, for a total of 61,585 video frames. Each video is a sequence of sub-activities involving 10 different sub-activity labels. The videos vary from subject to subject in the lengths and orders of the sub-activities, as well as in the way the tasks are executed.

Evaluation metrics. We use the following metrics to evaluate and compare the algorithms. 1) Frame-wise detection accuracy of sub-activity labels for all frames. 2) Future 3s online prediction accuracy: we compute the frame-wise accuracy of the predicted sub-activity labels for the future 3 seconds (using the frame rate of 14Hz reported in (Koppula et al., 2013)). The predictions are made online at each frame t, i.e., the algorithm only sees frames 0 to t and predicts the labels of frames t + 1 to t + δt. 3) Future segment online prediction accuracy: at each frame t, the algorithm predicts the sub-activity label of the next video segment.

We report the overall micro accuracy (P/R), macro precision, macro recall, and macro F1 score for all evaluation metrics. Micro accuracy is the percentage of correctly classified labels; macro precision and recall are the averages of per-class precision and recall, respectively.
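For reference, these metrics can be computed as follows. This is a generic sketch; the paper does not specify the exact macro-F1 convention, so here it is computed from the macro-averaged precision and recall, and the function name is ours.

```python
import numpy as np

def micro_macro_metrics(y_true, y_pred, num_classes):
    """Micro accuracy (fraction of correctly classified frames) and macro
    precision/recall/F1 (per-class scores averaged over all classes)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    micro_acc = np.mean(y_true == y_pred)
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 0.0)
    macro_p, macro_r = np.mean(precisions), np.mean(recalls)
    macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r) if macro_p + macro_r > 0 else 0.0
    return micro_acc, macro_p, macro_r, macro_f1
```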

Comparative methods. We compare our method with four state-of-the-art methods for activity prediction:
1. KGS (Koppula et al., 2013): a Markov random field model where the nodes represent objects and sub-activities, and the edges represent the spatial-temporal relationships. Future frames are predicted based on the transition probabilities given the inferred label of the last frame.
2. Anticipatory temporal CRF (ATCRF) (Koppula & Saxena, 2016): an anticipatory temporal conditional random field that models the spatial-temporal relations through object affordances. Future frames are predicted by sampling a spatial-temporal graph.
3. ST-AOG + Earley (Qi et al., 2017): a spatial-temporal And-Or graph (ST-AOG) that uses a symbolic context-free grammar to model activities. This sets up a comparison between our proposed method and methods that use traditional probabilistic parsers. Since traditional parsers operate on symbolic data, extra effort is needed to first extract symbols from the sequence data: in this comparative method, the videos are first segmented and labeled by classifiers, and the predictions are then made by the original Earley parser.
4. Bidirectional LSTM (Bi-LSTM): a two-layer Bi-LSTM with a hidden size of 256 (a sketch of this baseline is given at the end of this subsection). For the detection task, the output for each frame input is the sub-activity label. For the future 3s prediction, the LSTM is trained to output the label of frame t + 3s for an input frame at time t. For future segment prediction, it outputs the label of the next segment for an input frame. All three tasks use the same training scheme.
5. Bi-LSTM + generalized Earley parser (our method): the proposed generalized Earley parser applied to the classifier output of the above detection Bi-LSTM. The predictions for the next segments are made according to Section 4.3. The lengths of unobserved segments are sampled from a log-normal distribution for the future 3s prediction.

Feature extraction. All methods in the experiment use the same publicly available features from KGS (Koppula et al., 2013). These features include the human skeleton features and human-object interaction features for each frame. The human skeleton features are location and distance features (relative to the subject's head location) for all upper-skeleton joints of a subject. The human-object features are spatial-temporal, containing the distances between object centroids and skeleton joint locations as well as their temporal changes.

Experiment results. We follow the convention in KGS (Koppula et al., 2013) to train on three subjects and test on a new subject with 4-fold validation. The results for the three evaluation metrics are summarized in Table 1, Table 2, and Table 3, respectively. Our method outperforms the comparative methods on all three tasks. Specifically, the generalized Earley parser on top of a Bi-LSTM performs better than ST-AOG, while ST-AOG outperforms the Bi-LSTM. More discussions are highlighted in Section 6.4.

Table 1. Detection results on CAD-120.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
KGS (Koppula et al., 2013)              68.2        71.1          62.2           66.4
ATCRF (Koppula & Saxena, 2016)          70.3        74.8          66.2           70.2
Bi-LSTM                                 76.2        78.5          74.5           74.9
ST-AOG + Earley (Qi et al., 2017)       76.5        77.0          75.2           76.1
Bi-LSTM + Generalized Earley            79.4        87.4          77.0           79.7

Table 2. Future 3s prediction results on CAD-120.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
KGS (Koppula et al., 2013)              28.6        –             –              11.1
ATCRF (Koppula & Saxena, 2016)          49.6        –             –              40.6
Bi-LSTM                                 54.2        61.6          39.9           34.1
ST-AOG + Earley (Qi et al., 2017)       55.2        56.5          56.6           56.6
Bi-LSTM + Generalized Earley            61.5        63.7          58.7           59.9

Table 3. Segment prediction results on CAD-120.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
Bi-LSTM                                 31.4        10.0          12.7           10.1
ST-AOG + Earley (Qi et al., 2017)       54.3        61.4          39.2           45.4
Bi-LSTM + Generalized Earley            72.2        70.3          70.5           67.6

6.3. Experiment on Watch-n-Patch Dataset

Dataset. Watch-n-Patch is an RGB-D dataset that features forgotten actions. For example, a subject might fetch milk from a fridge, pour milk, and leave, forgetting the typical action "putting the milk back into the fridge". The dataset contains 458 videos with a total length of about 230 minutes, in which people forgot actions in 222 videos. Each video in the dataset contains 2-7 actions interacting with different objects. 7 subjects are asked to perform daily activities in 8 offices and 5 kitchens with complex backgrounds. The dataset consists of 21 types of fully annotated actions (10 in the office, 11 in the kitchen) interacting with 23 types of objects.

Feature extraction. We extract the same features as described in (Wu et al., 2015) for all methods. Similar to the previous experiment, the features are composed of skeleton features and human-object interaction features extracted from RGB-D images. The skeleton features include angles between connected parts and the changes of joint positions and angles from previous frames. Each frame is segmented into super-pixels, and foreground masks are detected. We extract features from the image segments with more than 50% of their area in the foreground mask and within a distance to the human hand joints in both 3D points and 2D pixels. Six kernel descriptors (Wu et al., 2014) are extracted from these image segments: gradient, color, local binary pattern, depth gradient, spin/surface normals, and KPCA/self-similarity.

Experiment results. We use the same evaluation metrics as in the previous experiment and compare our method to ST-AOG (Qi et al., 2017) and Bi-LSTM, using the train/test split in (Wu et al., 2015). The results for the three evaluation metrics are summarized in Table 4, Table 5, and Table 6, respectively. Our method slightly improves the detection results over the Bi-LSTM outputs, and outperforms the other methods on both prediction tasks. In general, the algorithms make better predictions on CAD-120, since Watch-n-Patch features forgotten actions and the behaviors are more unpredictable. More details are discussed in Section 6.4.
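For concreteness, a minimal PyTorch-style sketch of the Bi-LSTM baseline and its interface to the generalized Earley parser is given below. The paper does not specify the framework, the input feature dimension, or the training details, so all of these, along with the class name, are assumptions; the softmax output plays the role of the T × K matrix y from Section 4.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Two-layer bidirectional LSTM with hidden size 256 that outputs a
    per-frame distribution over K sub-activity labels (a sketch of the
    Bi-LSTM baseline, not the authors' implementation)."""
    def __init__(self, feature_dim, num_labels, hidden_size=256):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_size, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, x):                 # x: (batch, T, feature_dim)
        h, _ = self.lstm(x)               # h: (batch, T, 2 * hidden_size)
        return torch.softmax(self.head(h), dim=-1)   # (batch, T, K): the matrix y

# Usage sketch: y = BiLSTMTagger(feature_dim, num_labels)(features) is then
# passed, frame by frame, to the generalized Earley parser for segmentation,
# labeling, and future prediction.
```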

Table 4. Detection results on Watch-n-Patch.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
ST-AOG + Earley (Qi et al., 2017)       79.3        71.5          73.5           71.9
Bi-LSTM                                 84.0        79.7          82.2           80.3
Bi-LSTM + Generalized Earley            84.8        80.7          83.4           81.5

Table 5. Future 3s prediction results on Watch-n-Patch.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
Bi-LSTM                                 42.1        66.6          62.6           61.8
ST-AOG + Earley (Qi et al., 2017)       48.9        43.1          39.3           39.3
Bi-LSTM + Generalized Earley            49.0        57.0          56.5           55.3

Table 6. Segment prediction results on Watch-n-Patch.
Method                                  Micro P/R   Macro Prec.   Macro Recall   Macro F1
Bi-LSTM                                 21.7        11.8          23.3           14.0
ST-AOG + Earley (Qi et al., 2017)       29.4        28.5          18.9           19.9
Bi-LSTM + Generalized Earley            35.6        59.2          59.3           53.5

Figure 4. Qualitative segmentation results. In each group of four segmentations, the rows from top to bottom show the results of: 1) ground truth, 2) ST-AOG + Earley, 3) Bi-LSTM, and 4) Bi-LSTM + generalized Earley parser. The results show (a) corrections and (b) insertions by our algorithm on the initial segment-wise labels given by the classifier (Bi-LSTM).

6.4. Discussion

How different are the classifier outputs and the final outputs for detection? Figure 4 shows some qualitative examples of the ground-truth segmentations and the results given by different methods. The segmentation results show that the refined outputs are similar to the classifier outputs, since the confidence given by the classifier is often very high.

How does the generalized Earley parser refine the classifier detection outputs? When the classifier outputs violate the grammar, two types of refinements occur: i) correction and deletion of wrong labels, as shown in Figure 4a; ii) insertion of new labels, as shown in Figure 4b. The inserted segments are usually very short, to accommodate both the grammar and the classifier outputs. Most boundaries of the refined results are well aligned with the classifier outputs.

Why do we use two metrics for future prediction? The future 3s prediction is a standard evaluation criterion set up by KGS and ATCRF. However, this criterion does not tell how well an algorithm predicts the next segment label: i) at any time frame, part of the future 3s involves the current sub-activity most of the time; ii) if the predicted length of the current sub-activity is inaccurate, the frame-wise accuracy drops proportionally, even when the future segment label prediction is accurate. Therefore we also compare against the future segment label prediction, because it is invariant to variations in activity lengths.

How well does the generalized Earley parser perform for activity detection and prediction? From the results we can see that it slightly improves over the classifier outputs for detection, but significantly outperforms the classifier for prediction. The modifications on the classifier outputs (corrections and insertions in Figure 4) are minor, but they are important for making the sentences grammatically correct, so that high-quality predictions can be made.

How useful is the grammar for activity modeling? From Table 2, Table 3, Table 5 and Table 6 we can see that both ST-AOG and the generalized Earley parser outperform Bi-LSTM for prediction. Prediction algorithms need to give different outputs for similar inputs based on the observation history. Hence the non-Markovian property of grammars is useful for activity modeling, especially for future prediction.

How robust is the generalized Earley parser? Comparing Table 3 and Table 6, we can see that there is a performance drop when the action sequences are more unpredictable (in the Watch-n-Patch dataset). But the parser is capable of improving over the noisy classifier inputs and significantly outperforms the other methods. It is also robust in the sense that it always finds the sentence in the given language that best explains the classifier outputs.

7. Conclusions

We proposed a generalized Earley parser for parsing sequence data according to symbolic grammars. Detections and predictions are made by the parser given the probabilistic outputs of any classifier. We are optimistic about and interested in further applications of the generalized Earley parser. In general, we believe this is a step towards the goal of integrating the connectionist and symbolic approaches.

Acknowledgments

The authors thank Professor Ying Nian Wu from the UCLA Statistics Department for helpful comments on this work. The work reported herein is supported by DARPA XAI N66001-17-2-4029, ONR MURI N00014-16-1-2007, and N66001-17-2-3602.

References

Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., and Savarese, S. Social LSTM: Human trajectory prediction in crowded spaces. In CVPR, 2016.

Earley, J. An efficient context-free parsing algorithm. Communications of the ACM, 1970.

Graves, A., Fernández, S., Gomez, F., and Schmidhuber, J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.

Gupta, A., Srinivasan, P., Shi, J., and Davis, L. S. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR, 2009.

Holtzen, S., Zhao, Y., Gao, T., Tenenbaum, J. B., and Zhu, S.-C. Inferring human intent from video by sampling hierarchical plans. In IROS, 2016.

Kitani, K. M., Ziebart, B. D., Bagnell, J. A., and Hebert, M. Activity forecasting. In ECCV, 2012.

Koppula, H. S. and Saxena, A. Anticipating human activities using object affordances for reactive robotic response. PAMI, 2016.

Koppula, H. S., Gupta, R., and Saxena, A. Learning human activities and object affordances from RGB-D videos. IJRR, 2013.

Kuderer, M., Kretzschmar, H., Sprunk, C., and Burgard, W. Feature-based prediction of trajectories for socially compliant navigation. In RSS, 2012.

Li, K. and Fu, Y. Prediction of human activity by discovering temporal sequence patterns. PAMI, 2014.

Ma, W.-C., Huang, D.-A., Lee, N., and Kitani, K. M. Forecasting interactive dynamics of pedestrians with fictitious play. In CVPR, 2017.

Pei, M., Si, Z., Yao, B., and Zhu, S.-C. Video event parsing and learning with goal and intent prediction. CVIU, 2013.

Pirsiavash, H. and Ramanan, D. Parsing videos of actions with segmental grammars. In CVPR, 2014.

Qi, S., Huang, S., Wei, P., and Zhu, S.-C. Predicting human activities using stochastic grammar. In ICCV, 2017.

Rhinehart, N. and Kitani, K. M. First-person activity forecasting with online inverse reinforcement learning. In ICCV, 2017.

Ryoo, M. S. Human activity prediction: Early recognition of ongoing activities from streaming videos. In ICCV, 2011.

Solan, Z., Horn, D., Ruppin, E., and Edelman, S. Unsupervised learning of natural languages. PNAS, 2005.

Tu, K., Pavlovskaia, M., and Zhu, S.-C. Unsupervised structure learning of stochastic and-or grammars. In NIPS, 2013.

Viterbi, A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 1967.

Vu, T.-H., Olsson, C., Laptev, I., Oliva, A., and Sivic, J. Predicting actions from static scenes. In ECCV, 2014.

Walker, J., Gupta, A., and Hebert, M. Patch to the future: Unsupervised visual prediction. In CVPR, 2014.

Wang, Z., Deisenroth, M. P., Amor, H. B., Vogt, D., Schölkopf, B., and Peters, J. Probabilistic modeling of human movements for intention inference. RSS, 2012.

Wei, P., Zhao, Y., Zheng, N., and Zhu, S.-C. Modeling 4D human-object interactions for joint event segmentation, recognition, and object localization. PAMI, 2016.

Wu, C., Lenz, I., and Saxena, A. Hierarchical semantic labeling for task-relevant RGB-D perception. In RSS, 2014.

Wu, C., Zhang, J., Savarese, S., and Saxena, A. Watch-n-Patch: Unsupervised understanding of actions and relations. In CVPR, 2015.

Xie, D., Shu, T., Todorovic, S., and Zhu, S.-C. Modeling and inferring human intents and latent functional objects for trajectory prediction. PAMI, 2018.

Yuen, J. and Torralba, A. A data-driven approach for event prediction. In ECCV, 2010.

Ziebart, B. D., Ratliff, N., Gallagher, G., Mertz, C., Peterson, K., Bagnell, J. A., Hebert, M., Dey, A. K., and Srinivasa, S. Planning-based prediction for pedestrians. In IROS, 2009.