Lecture 5: Context Free Grammars
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Theory of Computer Science
Theory of Computer Science MODULE-1 Alphabet A finite set of symbols is denoted by ∑. Language A language is defined as a set of strings of symbols over an alphabet. Language of a Machine Set of all accepted strings of a machine is called language of a machine. Grammar Grammar is the set of rules that generates language. CHOMSKY HIERARCHY OF LANGUAGES Type 0-Language:-Unrestricted Language, Accepter:-Turing Machine, Generetor:-Unrestricted Grammar Type 1- Language:-Context Sensitive Language, Accepter:- Linear Bounded Automata, Generetor:-Context Sensitive Grammar Type 2- Language:-Context Free Language, Accepter:-Push Down Automata, Generetor:- Context Free Grammar Type 3- Language:-Regular Language, Accepter:- Finite Automata, Generetor:-Regular Grammar Finite Automata Finite Automata is generally of two types :(i)Deterministic Finite Automata (DFA)(ii)Non Deterministic Finite Automata(NFA) DFA DFA is represented formally by a 5-tuple (Q,Σ,δ,q0,F), where: Q is a finite set of states. Σ is a finite set of symbols, called the alphabet of the automaton. δ is the transition function, that is, δ: Q × Σ → Q. q0 is the start state, that is, the state of the automaton before any input has been processed, where q0 Q. F is a set of states of Q (i.e. F Q) called accept states or Final States 1. Construct a DFA that accepts set of all strings over ∑={0,1}, ending with 00 ? 1 0 A B S State/Input 0 1 1 A B A 0 B C A 1 * C C A C 0 2. Construct a DFA that accepts set of all strings over ∑={0,1}, not containing 101 as a substring ? 0 1 1 A B State/Input 0 1 S *A A B 0 *B C B 0 0,1 *C A R C R R R R 1 NFA NFA is represented formally by a 5-tuple (Q,Σ,δ,q0,F), where: Q is a finite set of states. -
Probabilistic Grammars and Their Applications This Discretion to Pursue Political and Economic Ends
Probabilistic Grammars and their Applications this discretion to pursue political and economic ends. and the Law; Monetary Policy; Multinational Cor- Most experiences, however, suggest the limited power porations; Regulation, Economic Theory of; Regu- of privatization in changing the modes of governance lation: Working Conditions; Smith, Adam (1723–90); which are prevalent in each country’s large private Socialism; Venture Capital companies. Further, those countries which have chosen the mass (voucher) privatization route have done so largely out of necessity and face ongoing efficiency problems as a result. In the UK, a country Bibliography whose privatization policies are often referred to as a Armijo L 1998 Balance sheet or ballot box? Incentives to benchmark, ‘control [of privatized companies] is not privatize in emerging democracies. In: Oxhorn P, Starr P exerted in the forms of threats of take-over or (eds.) The Problematic Relationship between Economic and bankruptcy; nor has it for the most part come from Political Liberalization. Lynne Rienner, Boulder, CO Bishop M, Kay J, Mayer C 1994 Introduction: privatization in direct investor intervention’ (Bishop et al. 1994, p. 11). After the steep rise experienced in the immediate performance. In: Bishop M, Kay J, Mayer C (eds.) Pri ati- zation and Economic Performance. Oxford University Press, aftermath of privatizations, the slow but constant Oxford, UK decline in the number of small shareholders highlights Boubakri N, Cosset J-C 1998 The financial and operating the difficulties in sustaining people’s capitalism in the performance of newly privatized firms: evidence from develop- longer run. In Italy, for example, privatization was ing countries. -
Cs 61A/Cs 98-52
CS 61A/CS 98-52 Mehrdad Niknami University of California, Berkeley Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 1 / 23 Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Something like this? (Is this good?) Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 Motivation How would you find a substring inside a string? Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ -
Grammars and Normal Forms
Grammars and Normal Forms Read K & S 3.7. Recognizing Context-Free Languages Two notions of recognition: (1) Say yes or no, just like with FSMs (2) Say yes or no, AND if yes, describe the structure a + b * c Now it's time to worry about extracting structure (and doing so efficiently). Optimizing Context-Free Languages For regular languages: Computation = operation of FSMs. So, Optimization = Operations on FSMs: Conversion to deterministic FSMs Minimization of FSMs For context-free languages: Computation = operation of parsers. So, Optimization = Operations on languages Operations on grammars Parser design Before We Start: Operations on Grammars There are lots of ways to transform grammars so that they are more useful for a particular purpose. the basic idea: 1. Apply transformation 1 to G to get of undesirable property 1. Show that the language generated by G is unchanged. 2. Apply transformation 2 to G to get rid of undesirable property 2. Show that the language generated by G is unchanged AND that undesirable property 1 has not been reintroduced. 3. Continue until the grammar is in the desired form. Examples: • Getting rid of ε rules (nullable rules) • Getting rid of sets of rules with a common initial terminal, e.g., • A → aB, A → aC become A → aD, D → B | C • Conversion to normal forms Lecture Notes 16 Grammars and Normal Forms 1 Normal Forms If you want to design algorithms, it is often useful to have a limited number of input forms that you have to deal with. Normal forms are designed to do just that. -
CS351 Pumping Lemma, Chomsky Normal Form Chomsky Normal
CS351 Pumping Lemma, Chomsky Normal Form Chomsky Normal Form When working with a context-free grammar a simple and useful form is called the Chomsky Normal Form (CNF). A CFG in CNF is one where every rule is of the form: A ! BC A ! a Where a is any terminal and A,B, and C are any variables, except that B and C may not be the start variable. Note that we have two and only two variables on the right hand side of the rule, with the exception that the rule S!ε is permitted where S is the start variable. Theorem: Any context free language may be generated by a context-free grammar in Chomsky normal form. To show how to make this conversion, we will need to do three things: 1. Eliminate all ε rules of the form A!ε 2. Eliminate all unit rules of the form A!B 3. Convert remaining rules into rules of the form A!BC Proof: 1. First add a new start symbol S0 and the rule S0 ! S, where S was the original start symbol. This guarantees that the start symbol doesn’t occur on the right hand side of a rule. 2. Remove all ε rules. Remove a rule A!ε where A is not the start symbol For each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence of A deleted. Ex: R!uAv becomes R!uv This must be done for each occurrence of A, so the rule: R!uAvAw becomes R! uvAw and R! uAvw and R!uvw This step must be repeated until all ε rules are removed, not including the start. -
Regular Languages and Finite Automata for Part IA of the Computer Science Tripos
N Lecture Notes on Regular Languages and Finite Automata for Part IA of the Computer Science Tripos Prof. Andrew M. Pitts Cambridge University Computer Laboratory c 2012 A. M. Pitts Contents Learning Guide ii 1 Regular Expressions 1 1.1 Alphabets,strings,andlanguages . .......... 1 1.2 Patternmatching................................. .... 4 1.3 Somequestionsaboutlanguages . ....... 6 1.4 Exercises....................................... .. 8 2 Finite State Machines 11 2.1 Finiteautomata .................................. ... 11 2.2 Determinism, non-determinism, and ε-transitions. .. .. .. .. .. .. .. .. .. 14 2.3 Asubsetconstruction . .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..... 17 2.4 Summary ........................................ 20 2.5 Exercises....................................... .. 20 3 Regular Languages, I 23 3.1 Finiteautomatafromregularexpressions . ............ 23 3.2 Decidabilityofmatching . ...... 28 3.3 Exercises....................................... .. 30 4 Regular Languages, II 31 4.1 Regularexpressionsfromfiniteautomata . ........... 31 4.2 Anexample ....................................... 32 4.3 Complementand intersectionof regularlanguages . .............. 34 4.4 Exercises....................................... .. 36 5 The Pumping Lemma 39 5.1 ProvingthePumpingLemma . .... 40 5.2 UsingthePumpingLemma . ... 41 5.3 Decidabilityoflanguageequivalence . ........... 44 5.4 Exercises....................................... .. 45 6 Grammars 47 6.1 Context-freegrammars . ..... 47 6.2 Backus-NaurForm ................................ -
Neural Edit Operations for Biological Sequences
Neural Edit Operations for Biological Sequences Satoshi Koide Keisuke Kawano Toyota Central R&D Labs. Toyota Central R&D Labs. [email protected] [email protected] Takuro Kutsuna Toyota Central R&D Labs. [email protected] Abstract The evolution of biological sequences, such as proteins or DNAs, is driven by the three basic edit operations: substitution, insertion, and deletion. Motivated by the recent progress of neural network models for biological tasks, we implement two neural network architectures that can treat such edit operations. The first proposal is the edit invariant neural networks, based on differentiable Needleman-Wunsch algorithms. The second is the use of deep CNNs with concatenations. Our analysis shows that CNNs can recognize regular expressions without Kleene star, and that deeper CNNs can recognize more complex regular expressions including the insertion/deletion of characters. The experimental results for the protein secondary structure prediction task suggest the importance of insertion/deletion. The test accuracy on the widely-used CB513 dataset is 71.5%, which is 1.2-points better than the current best result on non-ensemble models. 1 Introduction Neural networks are now used in many applications, not limited to classical fields such as image processing, speech recognition, and natural language processing. Bioinformatics is becoming an important application field of neural networks. These biological applications are often implemented as a supervised learning model that takes a biological string (such as DNA or protein) as an input, and outputs the corresponding label(s), such as a protein secondary structure [13, 14, 15, 18, 19, 23, 24, 26], protein contact maps [4, 8], and genome accessibility [12]. -
LING83600: Context-Free Grammars
LING83600: Context-free grammars Kyle Gorman 1 Introduction Context-free grammars (or CFGs) are a formalization of what linguists call “phrase structure gram- mars”. The formalism is originally due to Chomsky (1956) though it has been independently discovered several times. The insight underlying CFGs is the notion of constituency. Most linguists believe that human languages cannot be even weakly described by CFGs (e.g., Schieber 1985). And in formal work, context-free grammars have long been superseded by “trans- formational” approaches which introduce movement operations (in some camps), or by “general- ized phrase structure” approaches which add more complex phrase-building operations (in other camps). However, unlike more expressive formalisms, there exist relatively-efficient cubic-time parsing algorithms for CFGs. Nearly all programming languages are described by—and parsed using—a CFG,1 and CFG grammars of human languages are widely used as a model of syntactic structure in natural language processing and understanding tasks. Bibliographic note This handout covers the formal definition of context-free grammars; the Cocke-Younger-Kasami (CYK) parsing algorithm will be covered in a separate assignment. The definitions here are loosely based on chapter 11 of Jurafsky & Martin, who in turn draw from Hopcroft and Ullman 1979. 2 Definitions 2.1 Context-free grammars A context-free grammar G is a four-tuple N; Σ; R; S such that: • N is a set of non-terminal symbols, corresponding to phrase markers in a syntactic tree. • Σ is a set of terminal symbols, corresponding to words (i.e., X0s) in a syntactic tree. • R is a set of production rules. -
Context-Free Grammars
Chapter 3 Context-Free Grammars By Dr Zalmiyah Zakaria •Context-Free Grammars and Languages •Regular Grammars Formal Definition of Context-Free Grammars (CFG) A CFG can be formally defined by a quadruple of (V, , P, S) where: – V is a finite set of variables (non-terminal) – (the alphabet) is a finite set of terminal symbols , where V = – P is a finite set of rules (production rules) written as: A for A V, (v )*. – S is the start symbol, S V 2 Formal Definition of CFG • We can give a formal description to a particular CFG by specifying each of its four components, for example, G = ({S, A}, {0, 1}, P, S) where P consists of three rules: S → 0S1 S → A A → Sept2011 Theory of Computer Science 3 Context-Free Grammars • Terminal symbols – elements of the alphabet • Variables or non-terminals – additional symbols used in production rules • Variable S (start symbol) initiates the process of generating acceptable strings. 4 Terminal or Variable ? • S → (S) | S + S | S × S | A • A → 1 | 2 | 3 • The terminal symbols are { (, ), +, ×, 1, 2, 3} • The variable symbols are S and A Sept2011 Theory of Computer Science 5 Context-Free Grammars • A rule is an element of the set V (V )*. • An A rule: [A, w] or A w • A null rule or lambda rule: A 6 Context-Free Grammars • Grammars are used to generate strings of a language. • An A rule can be applied to the variable A whenever and wherever it occurs. • No limitation on applicability of a rule – it is context free 8 Context-Free Grammars • CFG have no restrictions on the right-hand side of production rules. -
Theory of Computation
Theory of Computation Alexandre Duret-Lutz [email protected] September 10, 2010 ADL Theory of Computation 1 / 121 References Introduction to the Theory of Computation (Michael Sipser, 2005). Lecture notes from Pierre Wolper's course at http://www.montefiore.ulg.ac.be/~pw/cours/calc.html (The page is in French, but the lecture notes labelled chapitre 1 to chapitre 8 are in English). Elements of Automata Theory (Jacques Sakarovitch, 2009). Compilers: Principles, Techniques, and Tools (A. Aho, R. Sethi, J. Ullman, 2006). ADL Theory of Computation 2 / 121 Introduction What would be your reaction if someone came at you to explain he has invented a perpetual motion machine (i.e. a device that can sustain continuous motion without losing energy or matter)? You would probably laugh. Without looking at the machine, you know outright that such the device cannot sustain perpetual motion. Indeed the laws of thermodynamics demonstrate that perpetual motion devices cannot be created. We know beforehand, from scientic knowledge, that building such a machine is impossible. The ultimate goal of this course is to develop similar knowledge for computer programs. ADL Theory of Computation 3 / 121 Theory of Computation Theory of computation studies whether and how eciently problems can be solved using a program on a model of computation (abstractions of a computer). Computability theory deals with the whether, i.e., is a problem solvable for a given model. For instance a strong result we will learn is that the halting problem is not solvable by a Turing machine. Complexity theory deals with the how eciently. -
A Representation-Based Approach to Connect Regular Grammar and Deep Learning
The Pennsylvania State University The Graduate School A REPRESENTATION-BASED APPROACH TO CONNECT REGULAR GRAMMAR AND DEEP LEARNING A Dissertation in Information Sciences and Technology by Kaixuan Zhang ©2021 Kaixuan Zhang Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy August 2021 The dissertation of Kaixuan Zhang was reviewed and approved by the following: C. Lee Giles Professor of College of Information Sciences and Technology Dissertation Adviser Chair of Committee Kenneth Huang Assistant Professor of College of Information Sciences and Technology Shomir Wilson Assistant Professor of College of Information Sciences and Technology Daniel Kifer Associate Professor of Department of Computer Science and Engineering Mary Beth Rosson Professor of Information Sciences and Technology Director of Graduate Programs, College of Information Sciences and Technology ii Abstract Formal language theory has brought amazing breakthroughs in many traditional areas, in- cluding control systems, compiler design, and model verification, and continues promoting these research directions. As recent years have witnessed that deep learning research brings the long-buried power of neural networks to the surface and has brought amazing break- throughs, it is crucial to revisit formal language theory from a new perspective. Specifically, investigation of the theoretical foundation, rather than a practical application of the con- necting point obviously warrants attention. On the other hand, as the spread of deep neural networks (DNN) continues to reach multifarious branches of research, it has been found that the mystery of these powerful models is equally impressive as their capability in learning tasks. Recent work has demonstrated the vulnerability of DNN classifiers constructed for many different learning tasks, which opens the discussion of adversarial machine learning and explainable artificial intelligence. -
6.035 Lecture 2, Specifying Languages with Regular Expressions and Context-Free Grammars
MIT 6.035 Specifying Languages with Regular Expressions and Context-Free Grammars Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Language Definition Problem • How to precisely define language • Layered struc ture of langua ge defifinitiition • Start with a set of letters in language • Lexical structurestructure - identifies “words ” in language (each word is a sequence of letters) • Syntactic sstructuretructure - identifies “sentences ” in language (each sentence is a sequence of words) • Semantics - meaning of program (specifies what result should be for each input) • Today’s topic: lexical and syntactic structures Specifying Formal Languages • Huge Triumph of Computer Science • Beautiful Theoretical Results • Practical Techniques and Applications • Two Dual Notions • Generative approach ((grammar or regular expression) • Recognition approach (automaton) • Lots of theorems about convertingconverting oonene approach automatically to another Specifying Lexical Structure Using Regular ExpressionsExpressions • Have some alphabet ∑ = set of letters • R egular expressions are bu ilt from: • ε -empty string • AAnyn yl lettere tte rf fromr oma alphabetlp ha be t ∑ •r1r2 – regular expression r1 followed by r2 (sequence) •r1| r2 – either regular expression r1 or r2 (choice) • r* - iterated sequence and choice ε | r | rr | … • Parentheses to indicate grouping/precedence Concept of Regular Expression Generating a StringString Rewrite regular expression until have only a sequence of letters (string) left Genera