Inductive and Statistical Learning of Formal Grammars


Pierre Dupont
[email protected]
2002

Machine Learning

• Goal: to give the learning ability to a machine
• Design programs whose performance improves over time

Inductive learning is a particular instance of machine learning:

• Goal: to find a general law from examples
• Subproblem of theoretical computer science, artificial intelligence
  and pattern recognition

Outline

• Grammar induction definition
• Learning paradigms
• DFA learning from positive and negative examples
• The RPNI algorithm
• Probabilistic DFA learning
• Application to a natural language task
• Links with Markov models
• Smoothing issues
• Related problems and future work

Grammar Induction or Grammatical Inference

Grammar induction is a particular case of inductive learning:

• The general law is represented by a formal grammar or an equivalent machine
• The set of examples, known as the positive sample, is usually made of
  strings or sequences over a specific alphabet
• A negative sample, i.e. a set of strings not belonging to the target
  language, can sometimes help the induction process

    Data                        Grammar
    aaabbb   -- Induction -->   S -> aSb
    ab                          S -> λ
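To make the Data → Induction → Grammar example concrete, here is a minimal Python sketch (the function name and code are illustrative assumptions, not part of the original slides) that transcribes the induced grammar S → aSb | λ into a recognizer for the language {aⁿbⁿ : n ≥ 0}:

```python
def in_language(s: str) -> bool:
    """Membership test for L(S -> aSb | lambda) = { a^n b^n : n >= 0 }.

    A direct recursive transcription of the two grammar rules:
    strip a matching a...b pair (S -> aSb) or accept the empty
    string (S -> lambda).
    """
    if s == "":                            # S -> lambda
        return True
    if s.startswith("a") and s.endswith("b"):
        return in_language(s[1:-1])        # S -> aSb
    return False

# Both example strings from the slide are accepted; near misses are not.
assert in_language("aaabbb") and in_language("ab")
assert not in_language("aab") and not in_language("ba")
```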
Examples

• A natural language sentence
• Speech
• Chronological series
• Successive actions of a WEB user
• Successive moves during a chess game
• A musical piece
• A program
• A form characterized by a chain code
• A biological sequence (DNA, proteins, ...)

Chromosome classification

[Figure: the grey-density profile of chromosome 2a along its median axis
(centromere marked) together with its derivative; the profile is quantized
into a string of primitives:
"=====CDFDCBBBBBBBA==bcdc==DGFB=bccb== ... ==cffc=CCC==cdb==BCB==dfdcb====="]

Pattern Recognition

[Figure: two digitized contours ('3.4cont' and '3.8cont') described by an
8-direction chain code (8dC), e.g.
8dC: 000077766676666555545444443211000710112344543311001234454311]

A modeling hypothesis

    G0 --(data generation)--> Data --(grammar induction)--> G

• Find G as close as possible to G0
• The induction process does not prove the existence of G0;
  it is a modeling hypothesis

Identification in the limit

The target G0 generates a growing sequence of data d1, d2, ..., dn; the
induction algorithm produces a sequence of hypotheses G1, G2, ..., G*

• convergence in finite time to G*
• G* is a representation of L(G0) (exact learning)

Learning paradigms

How to characterize learning?

• which concept classes can or cannot be learned?
• what is a good example?
• is it possible to learn in polynomial time?

PAC Learning

As above, the induction algorithm receives data d1, d2, ..., dn drawn from
the target G0 and produces hypotheses G1, G2, ..., G*

• convergence to G*
• G* is close enough to G0 with high probability
  ⇒ Probably Approximately Correct learning
• polynomial time complexity

Define a probability distribution D over the set Σ≤n of strings of length
at most n. PAC learning requires

    P[ P_D( L(G*) ⊕ L(G0) ) < ε ] > 1 − δ

where ⊕ denotes the symmetric difference between the two languages.

[Figure: L(G0) and L(G*) drawn as overlapping subsets of Σ*]

• The same unknown distribution D is used to generate the sample and to
  measure the error
• The result must hold for any distribution D (distribution-free requirement)
• The algorithm must return a hypothesis in polynomial time with respect to
  1/ε, 1/δ, n and |R(L)|, the size of a representation of the target language

Other learnability results

• Identification in the limit in polynomial time
  – DFAs cannot be efficiently identified in the limit
  – unless we can ask equivalence and membership queries to an oracle
• PAC learning
  – DFAs are not PAC learnable (under some cryptographic limitation
    assumption)
  – unless we can ask membership queries to an oracle

Identification in the limit: good and bad news

The bad news...

Theorem 1. No superfinite class of languages (a class containing all finite
languages and at least one infinite language, such as the regular languages)
is identifiable in the limit from positive data only.

The good news...

Theorem 2. Any admissible class of languages is identifiable in the limit
from positive and negative data.

PAC learning with simple examples, i.e. examples drawn according to the
conditional Solomonoff-Levin distribution

    Pc(x) = λc 2^(−K(x|c))

where K(x|c) denotes the Kolmogorov complexity of x given a representation c
of the concept to be learned:

• regular languages are PACS learnable (PAC learnable with simple examples)
  from positive examples only
• but Kolmogorov complexity is not computable!

Cognitive relevance of learning paradigms

A largely unsolved question.

Learning paradigms seem irrelevant as models of human learning:

• Gold's identification in the limit framework has been criticized, as
  children seem to learn natural language without negative examples
• All learning models assume a known representation class
• Some learnability results are based on enumeration

However, learning models do show that:

• an oracle can help
• some examples are useless, others are good:
  characteristic samples ⇔ typical examples
• learning well is learning efficiently
• example frequency matters
• good examples are simple examples ⇔ cognitive economy

Regular Inference from Positive and Negative Data

Additional hypothesis: the underlying theory is a regular grammar or,
equivalently, a finite state automaton.

Property 1. Any regular language has a canonical automaton A(L) which is
deterministic and minimal (the minimal DFA).

Example: L = (ba*a)*. Its canonical automaton has states {0, 1, 2} with
initial state 0, accepting states 0 and 2, and transitions
0 –b→ 1, 1 –a→ 2, 2 –a→ 2, 2 –b→ 1.

A few definitions

Definition 1. A positive sample S+ is structurally complete with respect to
an automaton A if, when generating S+ from A:

• every transition of A is used at least once
• every final state is used as the accepting state of at least one string

Example: S+ = {λ, ba, baa, baba} is structurally complete with respect to
the canonical automaton of (ba*a)* above.

A theorem

The positive data can be represented by a prefix tree acceptor (PTA), the
deterministic automaton whose states are the prefixes of the sample and
which accepts exactly S+.

Example: S+ = {aa, abba, baa} yields a PTA with states 0 to 8, one per
prefix (λ = 0, a = 1, b = 2, aa = 3, ab = 4, ba = 5, abb = 6, baa = 7,
abba = 8), where states 3, 7 and 8 are accepting.
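Building the PTA takes only a few lines. The sketch below is an illustration under assumed conventions (a dict-based transition table and function names of my choosing, not the author's code); it reproduces the slide's breadth-first state numbering for S+ = {aa, abba, baa}:

```python
from collections import deque

def build_pta(positive_sample):
    """Prefix tree acceptor: one state per distinct prefix of the sample.

    Returns (delta, accepting, n_states), where delta maps (state, symbol)
    to a state and state 0 is the root (the empty prefix). States are
    renumbered breadth-first, matching the numbering on the slides.
    """
    raw, raw_accepting, fresh = {}, set(), 1
    for word in positive_sample:
        state = 0
        for symbol in word:
            if (state, symbol) not in raw:
                raw[(state, symbol)] = fresh
                fresh += 1
            state = raw[(state, symbol)]
        raw_accepting.add(state)
    # Renumber breadth-first (by prefix length, then alphabetically).
    children = {}
    for (src, symbol), dst in raw.items():
        children.setdefault(src, []).append((symbol, dst))
    new_id, queue = {0: 0}, deque([0])
    while queue:
        state = queue.popleft()
        for _, dst in sorted(children.get(state, [])):
            new_id[dst] = len(new_id)
            queue.append(dst)
    delta = {(new_id[s], a): new_id[d] for (s, a), d in raw.items()}
    accepting = {new_id[q] for q in raw_accepting}
    return delta, accepting, fresh

# The slide's example: 9 states, with 3 (aa), 7 (baa) and 8 (abba) accepting.
delta, accepting, n = build_pta(["aa", "abba", "baa"])
assert n == 9 and accepting == {3, 7, 8}
```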
Theorem 3. If the positive sample is structurally complete with respect to a
canonical automaton A(L0), then there exists a partition π of the state set
of the PTA such that PTA/π = A(L0).

Merging is fun

[Figure: an automaton A1 with states 0, 1, 2, together with the quotient
automata A1/π obtained for the partitions {{0,1},{2}}, {{0},{1,2}},
{{0,2},{1}} and {{0,1,2}}]

• Merging ⇔ the definition of a partition π on the set of states
  Example: π = {{0,1}, {2}}
• If A2 = A1/π then L(A1) ⊆ L(A2): merging states ⇔ generalizing the
  language

How are we going to find the right partition? Use negative data!

Summary

• We observe some positive and negative data
• The positive sample S+ comes from a regular language L0
• The positive sample is assumed to be structurally complete with respect
  to the canonical automaton A(L0) of the target language L0
  (not an additional hypothesis, but a way to restrict the search to
  reasonable generalizations!)
• We build the prefix tree acceptor of S+; by construction, L(PTA) = S+
• Merging states ⇔ generalizing S+
• The negative sample S− helps to control over-generalization
• Note: finding the minimal DFA consistent with S+ and S− is NP-complete!

An automaton induction algorithm

    Algorithm Automaton-Induction
      input:  S+                          // positive sample
              S−                          // negative sample
      A ← PTA(S+)                         // build the prefix tree acceptor
      while (i, j) ← choose-states() do   // choose a state pair
        if compatible(i, j, S−) then      // merging i and j must keep
          A ← A/π_ij                      //   every string of S− rejected
        end if
      end while
      return A

RPNI algorithm

RPNI is a particular instance of the "generalization as search" paradigm:

• RPNI follows the prefix order on the states of the PTA
• Polynomial time complexity with respect to the sample size (S+, S−)
• RPNI identifies the class of regular languages in the limit
• A characteristic sample, i.e. a sample such that RPNI is guaranteed to
  produce the correct solution, has quadratic size with respect to |A(L0)|
• Additional heuristics exist to improve performance when such a sample is
  not provided

RPNI algorithm: pseudo-code

    input:  S+, S−
    output: A, a DFA consistent with S+ and S−
    begin
      A ← PTA(S+)                   // N denotes the number of states of PTA(S+)
      π ← {{0}, {1}, ..., {N − 1}}  // start from the trivial partition
      ...

Search space characterization

• Conditions on the learning sample to guarantee the existence of a solution
• DFA and NFA in the lattice
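The extracted pseudo-code stops after initializing the trivial partition. As a companion, here is a compact Python sketch of the standard RPNI loop, reusing build_pta() from the earlier sketch; it is a didactic approximation (function names and the union-find partition representation are mine), not the author's implementation. Each PTA state is considered in prefix order and merged with the earliest compatible predecessor block; a merge is kept only if, after folding to restore determinism, the quotient automaton still rejects every negative string.

```python
def find(rep, q):
    """Representative (smallest member) of q's block in partition 'rep'."""
    while rep[q] != q:
        rep[q] = rep[rep[q]]              # path compression
        q = rep[q]
    return q

def merge_and_fold(delta, rep, p, q):
    """Merge the blocks of p and q, then repeatedly merge any pair of target
    blocks that would make the quotient automaton nondeterministic."""
    p, q = find(rep, p), find(rep, q)
    if p != q:
        rep[max(p, q)] = min(p, q)
    while True:
        seen, clash = {}, None
        for (src, sym), dst in delta.items():
            key = (find(rep, src), sym)
            d = find(rep, dst)
            if key in seen and seen[key] != d:
                clash = (seen[key], d)    # two targets for one (block, symbol)
                break
            seen[key] = d
        if clash is None:
            return
        rep[max(clash)] = min(clash)

def consistent(delta, accepting, rep, negative):
    """True iff the quotient automaton rejects every negative string.
    (It accepts all of S+ by construction, being a quotient of the PTA.)"""
    bdelta = {(find(rep, s), a): find(rep, d) for (s, a), d in delta.items()}
    acc = {find(rep, f) for f in accepting}
    for word in negative:
        state = find(rep, 0)
        for symbol in word:
            state = bdelta.get((state, symbol))
            if state is None:
                break                     # falls off the automaton: rejected
        else:
            if state in acc:
                return False              # a negative string is accepted
    return True

def rpni(positive, negative):
    """Consider PTA states in prefix order; merge each with the earliest
    compatible predecessor block, otherwise promote it as a new block."""
    delta, accepting, n = build_pta(positive)
    rep = list(range(n))
    for q in range(1, n):
        if find(rep, q) != q:
            continue                      # q was absorbed by an earlier merge
        for p in range(q):
            if find(rep, p) != p:
                continue
            trial = rep[:]                # tentative merge on a copy
            merge_and_fold(delta, trial, p, q)
            if consistent(delta, accepting, trial, negative):
                rep = trial               # keep the first compatible merge
                break
    bdelta = {(find(rep, s), a): find(rep, d) for (s, a), d in delta.items()}
    return bdelta, {find(rep, f) for f in accepting}, find(rep, 0)

# Toy run with positives from L = (ba*a)* and a few counter-examples.
# The result is guaranteed consistent with both samples; recovering A(L0)
# exactly would additionally require a characteristic sample (see above).
dfa, acc, start = rpni(["", "ba", "baa", "baba"], ["b", "a", "ab", "bab"])
```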
