Stream Processing Using Grammars and Regular Expressions

Total Page:16

File Type:pdf, Size:1020Kb

Stream Processing Using Grammars and Regular Expressions Stream Processing Using Grammars and Regular Expressions Ulrik Terp Rasmussen DIKU, Department of Computer Science University of Copenhagen, Denmark May 1, 2017 arXiv:1704.08820v1 [cs.FL] 28 Apr 2017 PhD Thesis This thesis has been submitted to the PhD School of the Faculty of Science, University of Copenhagen, Denmark i Abstract In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high- performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action ma- chines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear- time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot. iii Resumé I denne afhandling beskæftiger vi os med parsing med regulære udtryk samt anvendelsen af grammatiske specifikationer til syntese af hurtige, strømmende programmer til strengprocessering. I første del udvikler vi to algoritmer til parsing med regulære udtryk i lineær tid, og med grådig afgørelse af flertydigheder i stil med Perl. Den første algoritme består af to faser, der afvikles på en semi-strømmende facon med konstant størrelse arbejdslager, samt et ekstra båndlager der henholdsvis skrives og læses af hver af de to faser. Den anden algoritme består af en enkelt fase og er optimalt strømmende i den forstand, at den udskriver så meget af parse-træet, som det er semantisk muligt ud fra det præfix af inddata, der på det givne tidspunkt er blevet indlæst. Algoritmen falder tilbage til buffering af så mange inputsymboler, som det er nødvendigt for at kunne afgøre næste valg. Optimalitet opnås ved hjælp af en PSPACE-fuldstændig præanalyse af det regulære udtryk. I anden del præsenterer vi Kleenex, et sprog til at udtrykke højtyd- ende, strømmende strengprocesseringsprogrammer som regulære gram- matikker med indlejrede semantiske handlinger, samt dets oversættelse til streaming string transducers med worst-case lineær tids ydelse. Den underliggende teori er baseret på dekomponering af transducere i orakel- og handlingsmaskiner, samt en specialisering af den strømmende par- singalgoritme fra den første del som en endelig tilstandsmaskine. I anden del udvikler vi også en ny lineær tids, strømmende parsing algoritme til parsing expression grammars (PEG) der generaliserer de regulære gram- matikker fra Kleenex. Algoritmen er baseret på en bottom-up tabelopstil- lingsalgoritme, der reformuleres ved brug af mindste fikspunkter, og som beregnes ved hjælp af en instans af Cousot og Cousots chaotic iteration. Contents Contents iv List of Figures vi Preface ix 1 Introduction 1 2 Regular Expression Based Parsing 3 2.1 Regular Expressions In Theory . 4 2.2 Regular Expressions In Practice . 8 2.3 Regular Expression Based Parsing . 11 2.4 Parsing as Transduction . 14 2.5 Recognition, Matching and Parsing Techniques . 17 2.6 Our Contributions . 21 2.7 Conclusions and Perspectives . 22 3 Grammar Based Stream Processing 25 3.1 Context-Free Grammars . 26 3.2 Syntax-Directed Translation Schemes . 30 3.3 Parsing Expression Grammars . 32 3.4 String Processing Methods . 35 3.5 Our Contributions . 37 3.6 Conclusions and Perspectives . 38 Bibliography 41 Paper A Two-Pass Greedy Regular Expression Parsing 51 A.1 Introduction . 52 A.2 Symmetric NFA Representation of Parse Trees . 53 A.3 Greedy parsing . 56 A.4 NFA-Simulation with Ordered State Sets . 57 A.5 Lean-log Algorithm . 59 A.6 Evaluation . 61 iv CONTENTS v A.7 Related work . 63 Bibliography 67 Paper B Optimally Streaming Greedy Regular Expression Parsing 69 B.1 Introduction . 70 B.2 Preliminaries . 72 B.3 Augmented Automata . 73 B.4 Disambiguation . 75 B.5 Optimal Streaming . 76 B.6 Coverage . 77 B.7 Algorithm . 79 B.8 Example . 83 B.9 Related and Future Work . 87 Bibliography 89 Paper C Kleenex: High-Performance Stream Processing 91 C.1 Introduction . 92 C.2 Transducers . 96 C.3 Kleenex . 99 C.4 Streaming Simulation . 103 C.5 Determinization . 108 C.6 Implementation and Benchmarks . 113 C.7 Use Cases . 119 C.8 Discussion . 122 C.9 Conclusions . 126 Bibliography 129 Paper D PEG Parsing Using Tabling and Dynamic Analysis 137 D.1 Introduction . 138 D.2 Parsing Formalism . 141 D.3 Tabulation of Operational Semantics . 146 D.4 Streaming Parsing with Tables . 149 D.5 Algorithm . 154 D.6 Evaluation . 159 D.7 Discussion . 163 D.8 Conclusion . 165 D.9 Proofs . 166 Bibliography 173 List of Figures 2.1 Terms with flattening aba for REs a(a + b)∗ and (ab + a)(a + b)∗. 14 2.2 A bit-coded parsing transducer for the RE (ab + a)(a + b)∗. 16 3.1 A context-free grammar. 27 3.2 A parse tree and the corresponding derivation. 28 3.3 Regular CFG and its parsing transducer. 29 3.4 Example SDT for reformulating simple English phrases. 30 3.5 Greedy parse of the string a man who was happy, using SDT from Figure 3.4. 31 3.6 Example of a simple PEG. 32 3.7 A PEG parse tree for the string (0+1)+46. 33 A.1 aNFA construction schema. 55 A.2 Comparisons using very simple iteration expressions. 64 A.3 Comparison using a backtracking worst case expression, and its reversal. 65 A.4 Comparison using various e-mail expressions. 66 B.1 Example automaton for the RE (a + b)?b . 74 B.2 Example of streaming algorithm on RE (aaa + aa)?. 85 B.3 Example of streaming algorithm on (aa)?(za + zb) + a?z(a + b). 86 C.1 Kleenex program with transducer, oracle and action machines. 98 C.2 Path tree example. 106 C.3 SST constructed from the oracle machine in Figure C.1. 112 C.4 flip_ab run on lines with average length 1000. 116 C.5 patho2 run on lines with average length 1000. 117 C.6 Inserting separators in random numbers of average length 1000. 118 C.7 Throughput when parsing 250 MiB random IRC data. 119 C.8 Benchmark for program csv_project3. 120 C.9 JSON to SQL benchmark. 121 C.10 Apache Log to JSON benchmark. 122 C.11 ISO time stamps to JSON benchmark. 123 vi List of Figures vii D.1 Example of prefix tables and online expansion. 153 D.2 Parsing algorithm. 156 Preface This dissertation has been submitted to the PhD School of Science, Faculty of Science, University of Copenhagen, in partial fulfillment of the degree of PhD at Department of Computer Science (DIKU). The dissertation is written as a synopsis of four enclosed research papers, including three peer-reviewed conference papers and one, as of yet, unpub- lished manuscript. Chapter 1 presents a brief introduction to the two topics of this dissertation. Chapters 2 and 3 each give a more comprehensive overview of the respective topic, including an outline of the area of research, the main problems to be solved, and my contribution in relation to existing work in the literature. Each chapter concludes with a brief outline of the perspectives for future work. I could not have written this dissertation alone, so at this point I would like to take the opportunity to thank the people who have helped me along the way. First of all, the material presented here is the result of close collaboration with my coauthors, to whom I would like to express my sincere gratitude. To Fritz Henglein, my supervisor, thank you for giving me both enormous freedom in my research and expert guidance when needed. Your passion and never-ending spirit has been a constant source of inspiration. To Bjørn, thank you for being a great colleague, friend, office mate and travel companion. To Dexter Kozen, thank you for hosting me for a wonderful five months at Cornell University. To all of my current and past office mates at both Cornell and DIKU, and to my colleagues in the DIKU APL section, thank you for providing a pleasant and stimulating work environment. To my family, thank you for your support and understanding during my work on this dissertation. To Lotte, thank you for everything. Ulrik Terp Rasmussen ix Chapter 1 Introduction Programmers need to make several trade-offs when writing software. Most important, the software has to be correct while at the same time being able to handle all reasonably expected inputs and usage scenarios. In addition, the underlying implementation should be as simple as possible so that it can be maintained and adjusted without major risks of introducing errors. Furthermore, the software should also be efficient in the sense that requests are answered within a reasonable time frame, and there should be no way to bring the software to use excessive amounts of time and space, either by an adversary or by accident.
Recommended publications
  • Data Streams and Applications in Computer Science
    Data Streams and Applications in Computer Science David P. Woodruff IBM Research Almaden [email protected] Abstract This is a short survey of my work in data streams and related appli- cations, such as communication complexity, numerical linear algebra, and sparse recovery. The goal is give a non-technical overview of results in data streams, and highlight connections between these different areas. It is based on my Presburger lecture award given at ICALP, 2014. 1 The Data Stream Model Informally speaking, a data stream is a sequence of data that is too large to be stored in available memory. The data may be a sequence of numbers, points, edges in a graph, and so on. There are many examples of data streams, such as internet search logs, network traffic, sensor networks, and scientific data streams (such as in astronomics, genomics, physical simulations, etc.). The abundance of data streams has led to new algorithmic paradigms for processing them, which often impose very stringent requirements on the algorithm’s resources. Formally, in the streaming model, there is a sequence of elements a1;:::; am presented to an algorithm, where each element is drawn from a universe [n] = f1;:::; ng. The algorithm is allowed a single or a small number of passes over the stream. In network applications, the algorithm is typically only given a single pass, since if data on a network is not physically stored somewhere, it may be impossible to make a second pass over it. In other applications, such as when data resides on external memory, it may be streamed through main memory a small number of times, each time constituting a pass of the algorithm.
    [Show full text]
  • Theory of Computer Science
    Theory of Computer Science MODULE-1 Alphabet A finite set of symbols is denoted by ∑. Language A language is defined as a set of strings of symbols over an alphabet. Language of a Machine Set of all accepted strings of a machine is called language of a machine. Grammar Grammar is the set of rules that generates language. CHOMSKY HIERARCHY OF LANGUAGES Type 0-Language:-Unrestricted Language, Accepter:-Turing Machine, Generetor:-Unrestricted Grammar Type 1- Language:-Context Sensitive Language, Accepter:- Linear Bounded Automata, Generetor:-Context Sensitive Grammar Type 2- Language:-Context Free Language, Accepter:-Push Down Automata, Generetor:- Context Free Grammar Type 3- Language:-Regular Language, Accepter:- Finite Automata, Generetor:-Regular Grammar Finite Automata Finite Automata is generally of two types :(i)Deterministic Finite Automata (DFA)(ii)Non Deterministic Finite Automata(NFA) DFA DFA is represented formally by a 5-tuple (Q,Σ,δ,q0,F), where: Q is a finite set of states. Σ is a finite set of symbols, called the alphabet of the automaton. δ is the transition function, that is, δ: Q × Σ → Q. q0 is the start state, that is, the state of the automaton before any input has been processed, where q0 Q. F is a set of states of Q (i.e. F Q) called accept states or Final States 1. Construct a DFA that accepts set of all strings over ∑={0,1}, ending with 00 ? 1 0 A B S State/Input 0 1 1 A B A 0 B C A 1 * C C A C 0 2. Construct a DFA that accepts set of all strings over ∑={0,1}, not containing 101 as a substring ? 0 1 1 A B State/Input 0 1 S *A A B 0 *B C B 0 0,1 *C A R C R R R R 1 NFA NFA is represented formally by a 5-tuple (Q,Σ,δ,q0,F), where: Q is a finite set of states.
    [Show full text]
  • Cs 61A/Cs 98-52
    CS 61A/CS 98-52 Mehrdad Niknami University of California, Berkeley Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 1 / 23 Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Something like this? (Is this good?) Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 What if you were looking for a pattern? Like an email address? Motivation How would you find a substring inside a string? Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+ j] is_match= False break if is_match: return i Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 2 / 23 Motivation How would you find a substring inside a string? Something like this? (Is this good?) def find(string, pattern): n= len(string) m= len(pattern) for i in range(n-m+ 1): is_match= True for j in range(m): if pattern[j] != string[i+
    [Show full text]
  • Ragel State Machine Compiler User Guide
    Ragel State Machine Compiler User Guide by Adrian Thurston License Ragel version 6.3, August 2008 Copyright c 2003-2007 Adrian Thurston This document is part of Ragel, and as such, this document is released under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Ragel is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PUR- POSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Ragel; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA i Contents 1 Introduction 1 1.1 Abstract...........................................1 1.2 Motivation.........................................1 1.3 Overview..........................................2 1.4 Related Work........................................4 1.5 Development Status....................................5 2 Constructing State Machines6 2.1 Ragel State Machine Specifications............................6 2.1.1 Naming Ragel Blocks...............................7 2.1.2 Machine Definition.................................7 2.1.3 Machine Instantiation...............................7 2.1.4 Including Ragel Code...............................7 2.1.5 Importing Definitions...............................7 2.2 Lexical Analysis of a Ragel Block.............................8
    [Show full text]
  • Efficient Semi-Streaming Algorithms for Local Triangle Counting In
    Efficient Semi-streaming Algorithms for Local Triangle Counting in Massive Graphs ∗ Luca Becchetti Paolo Boldi Carlos Castillo “Sapienza” Università di Roma Università degli Studi di Milano Yahoo! Research Rome, Italy Milan, Italy Barcelona, Spain [email protected] [email protected] [email protected] Aristides Gionis Yahoo! Research Barcelona, Spain [email protected] ABSTRACT Categories and Subject Descriptors In this paper we study the problem of local triangle count- H.3.3 [Information Systems]: Information Search and Re- ing in large graphs. Namely, given a large graph G = (V; E) trieval we want to estimate as accurately as possible the number of triangles incident to every node v 2 V in the graph. The problem of computing the global number of triangles in a General Terms graph has been considered before, but to our knowledge this Algorithms, Measurements is the first paper that addresses the problem of local tri- angle counting with a focus on the efficiency issues arising Keywords in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can Graph Mining, Semi-Streaming, Probabilistic Algorithms be used in many interesting applications. For example, we show that the measures we compute can help to detect the 1. INTRODUCTION presence of spamming activity in large-scale Web graphs, as Graphs are a ubiquitous data representation that is used well as to provide useful features to assess content quality to model complex relations in a wide variety of applica- in social networks. tions, including biochemistry, neurobiology, ecology, social For computing the local number of triangles we propose sciences, and information systems.
    [Show full text]
  • A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages
    Journal of Information Processing Vol.24 No.2 256–264 (Mar. 2016) [DOI: 10.2197/ipsjjip.24.256] Regular Paper A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages Tetsuro Matsumura1,†1 Kimio Kuramitsu1,a) Received: May 18, 2015, Accepted: November 5, 2015 Abstract: Parsing Expression Grammars are a popular foundation for describing syntax. Unfortunately, several syn- tax of programming languages are still hard to recognize with pure PEGs. Notorious cases appears: typedef-defined names in C/C++, indentation-based code layout in Python, and HERE document in many scripting languages. To recognize such PEG-hard syntax, we have addressed a declarative extension to PEGs. The “declarative” extension means no programmed semantic actions, which are traditionally used to realize the extended parsing behavior. Nez is our extended PEG language, including symbol tables and conditional parsing. This paper demonstrates that the use of Nez Extensions can realize many practical programming languages, such as C, C#, Ruby, and Python, which involve PEG-hard syntax. Keywords: parsing expression grammars, semantic actions, context-sensitive syntax, and case studies on program- ming languages invalidate the declarative property of PEGs, thus resulting in re- 1. Introduction duced reusability of grammars. As a result, many developers need Parsing Expression Grammars [5], or PEGs, are a popu- to redevelop grammars for their software engineering tools. lar foundation for describing programming language syntax [6], In this paper, we propose a declarative extension of PEGs for [15]. Indeed, the formalism of PEGs has many desirable prop- recognizing context-sensitive syntax. The “declarative” exten- erties, including deterministic behaviors, unlimited look-aheads, sion means no arbitrary semantic actions that are written in a gen- and integrated lexical analysis known as scanner-less parsing.
    [Show full text]
  • PHP 8.0.2 - Phpinfo() 2021-02-23 14:53
    PHP 8.0.2 - phpinfo() 2021-02-23 14:53 PHP Version 8.0.2 System Linux effa5f35b7e3 5.4.83-v7l+ #1379 SMP Mon Dec 14 13:11:54 GMT 2020 armv7l Build Date Feb 9 2021 12:01:16 Build System Linux 96bc8a22765c 4.15.0-129-generic #132-Ubuntu SMP Thu Dec 10 14:07:05 UTC 2020 armv8l GNU/Linux Configure Command './configure' '--build=arm-linux-gnueabihf' '--with-config-file-path=/usr/local/etc/php' '--with-config-file-scan- dir=/usr/local/etc/php/conf.d' '--enable-option-checking=fatal' '--with-mhash' '--with-pic' '--enable-ftp' '--enable- mbstring' '--enable-mysqlnd' '--with-password-argon2' '--with-sodium=shared' '--with-pdo-sqlite=/usr' '--with- sqlite3=/usr' '--with-curl' '--with-libedit' '--with-openssl' '--with-zlib' '--with-pear' '--with-libdir=lib/arm-linux-gnueabihf' '- -with-apxs2' '--disable-cgi' 'build_alias=arm-linux-gnueabihf' Server API Apache 2.0 Handler Virtual Directory Support disabled Configuration File (php.ini) Path /usr/local/etc/php Loaded Configuration File /usr/local/etc/php/php.ini Scan this dir for additional .ini files /usr/local/etc/php/conf.d Additional .ini files parsed /usr/local/etc/php/conf.d/docker-php-ext-gd.ini, /usr/local/etc/php/conf.d/docker-php-ext-mysqli.ini, /usr/local/etc/php/conf.d/docker-php-ext-pdo_mysql.ini, /usr/local/etc/php/conf.d/docker-php-ext-sodium.ini, /usr/local/etc/php/conf.d/docker-php-ext-zip.ini PHP API 20200930 PHP Extension 20200930 Zend Extension 420200930 Zend Extension Build API420200930,NTS PHP Extension Build API20200930,NTS Debug Build no Thread Safety disabled Zend Signal Handling enabled
    [Show full text]
  • Regular Languages and Finite Automata for Part IA of the Computer Science Tripos
    N Lecture Notes on Regular Languages and Finite Automata for Part IA of the Computer Science Tripos Prof. Andrew M. Pitts Cambridge University Computer Laboratory c 2012 A. M. Pitts Contents Learning Guide ii 1 Regular Expressions 1 1.1 Alphabets,strings,andlanguages . .......... 1 1.2 Patternmatching................................. .... 4 1.3 Somequestionsaboutlanguages . ....... 6 1.4 Exercises....................................... .. 8 2 Finite State Machines 11 2.1 Finiteautomata .................................. ... 11 2.2 Determinism, non-determinism, and ε-transitions. .. .. .. .. .. .. .. .. .. 14 2.3 Asubsetconstruction . .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..... 17 2.4 Summary ........................................ 20 2.5 Exercises....................................... .. 20 3 Regular Languages, I 23 3.1 Finiteautomatafromregularexpressions . ............ 23 3.2 Decidabilityofmatching . ...... 28 3.3 Exercises....................................... .. 30 4 Regular Languages, II 31 4.1 Regularexpressionsfromfiniteautomata . ........... 31 4.2 Anexample ....................................... 32 4.3 Complementand intersectionof regularlanguages . .............. 34 4.4 Exercises....................................... .. 36 5 The Pumping Lemma 39 5.1 ProvingthePumpingLemma . .... 40 5.2 UsingthePumpingLemma . ... 41 5.3 Decidabilityoflanguageequivalence . ........... 44 5.4 Exercises....................................... .. 45 6 Grammars 47 6.1 Context-freegrammars . ..... 47 6.2 Backus-NaurForm ................................
    [Show full text]
  • Backtrack Parsing Context-Free Grammar Context-Free Grammar
    Context-free Grammar Problems with Regular Context-free Grammar Language and Is English a regular language? Bad question! We do not even know what English is! Two eggs and bacon make(s) a big breakfast Backtrack Parsing Can you slide me the salt? He didn't ought to do that But—No! Martin Kay I put the wine you brought in the fridge I put the wine you brought for Sandy in the fridge Should we bring the wine you put in the fridge out Stanford University now? and University of the Saarland You said you thought nobody had the right to claim that they were above the law Martin Kay Context-free Grammar 1 Martin Kay Context-free Grammar 2 Problems with Regular Problems with Regular Language Language You said you thought nobody had the right to claim [You said you thought [nobody had the right [to claim that they were above the law that [they were above the law]]]] Martin Kay Context-free Grammar 3 Martin Kay Context-free Grammar 4 Problems with Regular Context-free Grammar Language Nonterminal symbols ~ grammatical categories Is English mophology a regular language? Bad question! We do not even know what English Terminal Symbols ~ words morphology is! They sell collectables of all sorts Productions ~ (unordered) (rewriting) rules This concerns unredecontaminatability Distinguished Symbol This really is an untiable knot. But—Probably! (Not sure about Swahili, though) Not all that important • Terminals and nonterminals are disjoint • Distinguished symbol Martin Kay Context-free Grammar 5 Martin Kay Context-free Grammar 6 Context-free Grammar Context-free
    [Show full text]
  • Mda13:Hpg-Variant-Developers.Pdf
    Overview Global schema: Binaries HPG Variant VCF Tools HPG Variant Effect HPG Variant GWAS Describing the architecture by example: GWAS Main workflow Reading configuration files and command-line options Parsing input files Parallelization schema How to compile: Dependencies and application Hacking HPG Variant Let's talk about... Global schema: Binaries HPG Variant VCF Tools HPG Variant Effect HPG Variant GWAS Describing the architecture by example: GWAS Main workflow Reading configuration files and command-line options Parsing input files Parallelization schema How to compile: Dependencies and application Hacking HPG Variant Binaries: HPG Variant VCF Tools HPG Variant VCF Tools preprocesses VCF files I Filtering I Merging I Splitting I Retrieving statistics Binaries: HPG Variant Effect HPG Variant Effect retrieves information about the effect of mutations I Querying a web service I Uses libcurl (client side) and JAX-RS/Jersey (server side) I Information stored in CellBase DB Binaries: HPG Variant GWAS HPG Variant GWAS conducts genome-wide association studies I Population-based: Chi-square, Fisher's exact test I Family-based: TDT I Read genotypes from VCF files I Read phenotypes and familial information from PED files Let's talk about... Global schema: Binaries HPG Variant VCF Tools HPG Variant Effect HPG Variant GWAS Describing the architecture by example: GWAS Main workflow Reading configuration files and command-line options Parsing input files Parallelization schema How to compile: Dependencies and application Hacking HPG Variant Architecture: Main workflow
    [Show full text]
  • Adaptive LL(*) Parsing: the Power of Dynamic Analysis
    Adaptive LL(*) Parsing: The Power of Dynamic Analysis Terence Parr Sam Harwell Kathleen Fisher University of San Francisco University of Texas at Austin Tufts University [email protected] [email protected] kfi[email protected] Abstract PEGs are unambiguous by definition but have a quirk where Despite the advances made by modern parsing strategies such rule A ! a j ab (meaning “A matches either a or ab”) can never as PEG, LL(*), GLR, and GLL, parsing is not a solved prob- match ab since PEGs choose the first alternative that matches lem. Existing approaches suffer from a number of weaknesses, a prefix of the remaining input. Nested backtracking makes de- including difficulties supporting side-effecting embedded ac- bugging PEGs difficult. tions, slow and/or unpredictable performance, and counter- Second, side-effecting programmer-supplied actions (muta- intuitive matching strategies. This paper introduces the ALL(*) tors) like print statements should be avoided in any strategy that parsing strategy that combines the simplicity, efficiency, and continuously speculates (PEG) or supports multiple interpreta- predictability of conventional top-down LL(k) parsers with the tions of the input (GLL and GLR) because such actions may power of a GLR-like mechanism to make parsing decisions. never really take place [17]. (Though DParser [24] supports The critical innovation is to move grammar analysis to parse- “final” actions when the programmer is certain a reduction is time, which lets ALL(*) handle any non-left-recursive context- part of an unambiguous final parse.) Without side effects, ac- free grammar. ALL(*) is O(n4) in theory but consistently per- tions must buffer data for all interpretations in immutable data forms linearly on grammars used in practice, outperforming structures or provide undo actions.
    [Show full text]
  • MC-1200 Series Linux Software User's Manual
    MC-1200 Series Linux Software User’s Manual Version 1.0, November 2020 www.moxa.com/product © 2020 Moxa Inc. All rights reserved. MC-1200 Series Linux Software User’s Manual The software described in this manual is furnished under a license agreement and may be used only in accordance with the terms of that agreement. Copyright Notice © 2020 Moxa Inc. All rights reserved. Trademarks The MOXA logo is a registered trademark of Moxa Inc. All other trademarks or registered marks in this manual belong to their respective manufacturers. Disclaimer Information in this document is subject to change without notice and does not represent a commitment on the part of Moxa. Moxa provides this document as is, without warranty of any kind, either expressed or implied, including, but not limited to, its particular purpose. Moxa reserves the right to make improvements and/or changes to this manual, or to the products and/or the programs described in this manual, at any time. Information provided in this manual is intended to be accurate and reliable. However, Moxa assumes no responsibility for its use, or for any infringements on the rights of third parties that may result from its use. This product might include unintentional technical or typographical errors. Changes are periodically made to the information herein to correct such errors, and these changes are incorporated into new editions of the publication. Technical Support Contact Information www.moxa.com/support Moxa Americas Moxa China (Shanghai office) Toll-free: 1-888-669-2872 Toll-free: 800-820-5036 Tel: +1-714-528-6777 Tel: +86-21-5258-9955 Fax: +1-714-528-6778 Fax: +86-21-5258-5505 Moxa Europe Moxa Asia-Pacific Tel: +49-89-3 70 03 99-0 Tel: +886-2-8919-1230 Fax: +49-89-3 70 03 99-99 Fax: +886-2-8919-1231 Moxa India Tel: +91-80-4172-9088 Fax: +91-80-4132-1045 Table of Contents 1.
    [Show full text]