A Comparison Between Packrat Parsing and Conventional Shift-Reduce Parsing on Real-World Grammars and Inputs

UPTEC IT 14 016
Degree project (Examensarbete), 30 credits
October 2014

Daniel Flodin

Faculty of Science and Technology (Teknisk-naturvetenskaplig fakultet), UTH-enheten
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Packrat parsing is a top-down, recursive descent parsing technique that uses backtracking and has a guaranteed linear parse time. Conventional backtracking parsers suffer from exponential parse times in the worst case because they re-evaluate redundant results. Packrat parsers avoid this by using memoization. However, memoization causes a packrat parser's memory consumption to be linearly proportional to the length of the input string, as opposed to linearly proportional to the maximum recursion depth for conventional parsing techniques.

The objective of this thesis is to implement a packrat parser generator and compare it with an existing and well-known parser combination called Lex/Yacc, which produces shift-reduce parsers. The comparison consists of pure performance measurements, such as memory consumption and parsing time, and also a more general comparison between the two parsing techniques.

The conclusion drawn from the comparison is that packrat parsing can be a viable option due to its ability to compose modular and extendible grammars more easily than Lex/Yacc. However, from a performance perspective the Lex/Yacc combination proved superior. In addition, the results indicate that similar performance for a packrat parser is hard to achieve on grammars similar to those used in this thesis.
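The effect of memoization described in the abstract can be illustrated with a minimal sketch. The grammar below is a small arithmetic grammar chosen purely for illustration; it is not one of the grammars used in the thesis, and the names `Parser`, `apply`, and `run` are hypothetical and unrelated to Hilltop. Each rule result is stored under the key (rule, position), so a rule is evaluated at most once per input position.

```python
# Illustrative grammar (an assumption, not taken from the thesis):
#   Additive  <- Multitive '+' Additive / Multitive
#   Multitive <- Primary '*' Multitive / Primary
#   Primary   <- '(' Additive ')' / Decimal
#   Decimal   <- [0-9]

class Parser:
    def __init__(self, text, use_memo=True):
        self.text = text
        self.use_memo = use_memo
        self.memo = {}   # (rule name, position) -> end position, or None on failure
        self.calls = 0   # how many times a rule body was actually evaluated

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if self.use_memo and key in self.memo:
            return self.memo[key]          # reuse the stored result
        self.calls += 1
        result = rule(self, pos)
        self.memo[key] = result
        return result

    def literal(self, ch, pos):
        if pos < len(self.text) and self.text[pos] == ch:
            return pos + 1
        return None


def additive(p, pos):
    mid = p.apply(multitive, pos)
    if mid is not None:
        after_plus = p.literal("+", mid)
        if after_plus is not None:
            end = p.apply(additive, after_plus)
            if end is not None:
                return end
    # Backtrack to the second alternative; with memoization this is a table lookup.
    return p.apply(multitive, pos)


def multitive(p, pos):
    mid = p.apply(primary, pos)
    if mid is not None:
        after_star = p.literal("*", mid)
        if after_star is not None:
            end = p.apply(multitive, after_star)
            if end is not None:
                return end
    return p.apply(primary, pos)


def primary(p, pos):
    inner = p.literal("(", pos)
    if inner is not None:
        end = p.apply(additive, inner)
        if end is not None:
            close = p.literal(")", end)
            if close is not None:
                return close
    return p.apply(decimal, pos)


def decimal(p, pos):
    if pos < len(p.text) and p.text[pos] in "0123456789":
        return pos + 1
    return None


def run(text, use_memo=True):
    """Return (whole input matched?, number of rule evaluations)."""
    p = Parser(text, use_memo)
    return p.apply(additive, 0) == len(text), p.calls
```

With memoization, the number of rule evaluations is bounded by the number of rules times the number of input positions, which is where both the linear time bound and the input-proportional memory use come from. Calling `run(text, use_memo=False)` on deeply nested parentheses shows the repeated work that a plain backtracking parser performs.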
Supervisor: Johan Runesson
Subject reviewer: Lars-Henrik Eriksson
Examiner: Roland Bol
ISSN: 1401-5749, UPTEC IT 14 016
Printed by: Reprocentralen ITC

Summary

Packrat parsing is a parsing technique introduced by Bryan Ford in 2002. It is a top-down parsing technique that uses backtracking. Conventional top-down parsers that use backtracking can have exponential running time in the worst case because they perform redundant evaluations. Packrat parsers avoid this through memoization. Memoization means that every evaluation made during parsing is stored in a separate table so that it can be reused if the same evaluation needs to be repeated. If insertion and lookup operations on the memoization table can be performed in constant time, a packrat parser has a guaranteed linear running time.

Packrat parsers also differ from more traditional parsing methods in that they can be scannerless. This means that a packrat parser does not need to perform lexical analysis as a separate step. Both lexical analysis and parsing can be done with a single tool.

Packrat parsers use parsing expression grammars, which are always unambiguous. This distinguishes them from the more traditional context-free grammars, which can in some cases be ambiguous. Packrat parsers also support syntactic predicates, which give them unlimited lookahead.

In this report, packrat parsing is compared with the parser combination Lex/Yacc. Lex/Yacc is a parser generator that produces shift-reduce parsers.
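Two properties mentioned in the summary, the unambiguity of parsing expression grammars and the unlimited lookahead provided by syntactic predicates, can be sketched with a few parser combinators. This is only an illustration under stated assumptions: the helper names (`lit`, `seq`, `choice`, `not_`, `plus`, `lower`) and the keyword example are hypothetical and do not come from the thesis or from Hilltop, and memoization is left out to keep the sketch short. Each parser takes the text and a position and returns the end position of the match, or `None` on failure.

```python
def lit(s):
    """Match the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers):
    """Match all parsers one after another."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def not_(p):
    """Not-predicate: succeed without consuming input if p fails here."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse


def plus(p):
    """One or more repetitions of p (greedy)."""
    def parse(text, pos):
        end = p(text, pos)
        if end is None:
            return None
        while True:
            nxt = p(text, end)
            if nxt is None or nxt == end:
                return end
            end = nxt
    return parse


def lower(text, pos):
    """Match one lowercase ASCII letter."""
    if pos < len(text) and "a" <= text[pos] <= "z":
        return pos + 1
    return None


# Keyword    <- "if" ![a-z]
# Identifier <- !Keyword [a-z]+
keyword = seq(lit("if"), not_(lower))
identifier = seq(not_(keyword), plus(lower))

# Ordered choice: "a" is tried first, so "ab" is never considered at position 0.
a_or_ab = choice(lit("a"), lit("ab"))
```

Because `choice` commits to the first alternative that succeeds, a given input position has at most one result per expression, which is why no ambiguity can arise. The predicate `not_(keyword)` inspects the input as far as needed without consuming any of it: `identifier("if", 0)` fails, while `identifier("iffy", 0)` consumes all four characters.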
The company behind this thesis, IAR Systems, currently uses Lex/Yacc to parse many of its in-house languages. However, the company finds that Lex/Yacc has a number of shortcomings: the specifications for Lex and Yacc follow different syntaxes, error handling must be implemented manually, it is difficult to have multiple parsers in the same program, and it is generally hard to extend and change the grammar specifications for the two tools. IAR Systems is therefore interested in whether packrat parsing can handle these shortcomings better. Any performance differences between the two parsing techniques must not, however, be too large.

To investigate this, a packrat parser generator named Hilltop has been implemented. Hilltop and Lex/Yacc have generated parsers for the same grammars, and these parsers have then been analysed with respect to performance, that is, running time and memory consumption. A general comparison between Hilltop and Lex/Yacc has also been made to highlight the advantages and disadvantages of the two parsing techniques. To obtain as realistic a result as possible, the grammars on which the parser generators were tested are grammars that IAR Systems uses in production. The source files on which the generated parsers were tested are also taken from production.

Not surprisingly, the analysis of the results shows that parsers generated by Hilltop differ in performance, with respect to both running time and memory usage, from parsers generated by Lex/Yacc. This difference is partly due to the fact that Hilltop was implemented within a limited time, which has resulted in some notable shortcomings, including the way string comparisons are handled.
In their current state, parsers generated by Hilltop differ from those generated by Lex/Yacc by a factor of 18.4-25.9 in running time and by a factor of 25.9-34.9 in memory consumption. These factors are, however, believed to be substantially reducible through various improvements to the implementation, to the point where the generated parsers would differ by a factor of two or three instead. Packrat parsers are not believed to be able to outperform shift-reduce parsers generated by Lex/Yacc, at least not for grammars similar to those used in this thesis.

Some of the shortcomings of the Lex/Yacc combination are handled better by Hilltop. Above all, the ability of Hilltop and packrat parsers to modularize and to extend or change their grammars is clearly better than for grammars specified with Lex/Yacc. The shortcomings related to error handling and to having several parsers in the same program are currently not addressed in Hilltop. However, general error handling can be implemented, so that the user does not need to implement any error handling manually. Support for multiple parsers in one program should also be possible to implement without major difficulty.

The ability to easily modularize and extend grammars with Hilltop and packrat parsers makes packrat parser generators a good fit when new, similar grammars are implemented or when grammars are redefined on a regular basis. A packrat parser generator may therefore be preferable for this kind of use. If, on the other hand, the user is more interested in the raw performance of the parser and does not regularly change or create grammars, the parser generator combination Lex/Yacc stands very strong and is in that case preferable to a packrat parser generator.

Contents

1 Introduction
  1.1 Background
    1.1.1 Packrat Parsing
  1.2 Problem Description
  1.3 Method
    1.3.1 Literature Review
    1.3.2 Implementation
    1.3.3 Analysis
  1.4 Limitations
  1.5 Thesis Outline
2 Parsing
  2.1 Regular Expressions
  2.2 Context-Free Grammars
  2.3 Bottom-Up Parsing
    2.3.1 Shift-Reduce Parsing
  2.4 Top-Down Parsing
    2.4.1 Backtrack Parsing
    2.4.2 Predictive Parsing
    2.4.3 Left Recursion
  2.5 Abstract Syntax Tree
3 Packrat Parsing
  3.1 Founding Work
  3.2 Parsing Expression Grammars
    3.2.1 Definition And Operators
    3.2.2 Ambiguity
    3.2.3 Left Recursion
    3.2.4 Syntactic Predicates
  3.3 Memoization
  3.4 Scannerless
  3.5 Parse Completion
  3.6 Practical Limitations
    3.6.1 Memory Consumption
    3.6.2 Maintaining States
4 Lex/Yacc
  4.1 Lex
    4.1.1 Ambiguity
  4.2 Yacc
    4.2.1 Ambiguity
    4.2.2 Left recursion
5 Implementation
  5.1 Treetop
  5.2 Hilltop
    5.2.1 Parse Tree
    5.2.2 Memoization
  5.3 Implemented Improvements
    5.3.1 Transient Productions
    5.3.2 Hashing
  5.4 General Comparison Between Hilltop and Lex/Yacc
    5.4.1 LALR(1) versus LL(k)
    5.4.2 Scannerless versus Separate Lexical Analyser and Parser
    5.4.3 Modularity and Extendibility
    5.4.4 Error Reporting
    5.4.5 Conflicts
    5.4.6 Parse Tree Creation
6 Experimental Results
  6.1 Testing Environment
  6.2 Parser Generation
  6.3 Parsing Time
    6.3.1 Grammar 1
    6.3.2 Grammar 2
    6.3.3 Discussion
  6.4 Memory Consumption
    6.4.1 Grammar 1
    6.4.2 Grammar 2
    6.4.3 Memoization Matrix Size
    6.4.4 Discussion
  6.5 Result Evaluation
7 Related Work
  7.1 Pappy
  7.2 Rats!
  7.3 The Cost of Using Memoization
  7.4 Packrat Parsing versus Primitive Recursive Descent Parsing
8 Conclusion
  8.1 Thesis Evaluation
9 Future Work
  9.1 Future Work
    9.1.1 Testing of More Grammars
    9.1.2 Array as Memoization Data Structure
    9.1.3 Hilltop with a Lexical Analyser
    9.1.4 Merging Choices with Common Prefixes
    9.1.5 Iterative Operators
    9.1.6 Left Recursion
    9.1.7 Cut Operator
References
Appendix A A trivial grammar
Appendix B Treetop parser
Appendix C Hilltop parser
Appendix D Hilltop parse tree
