C99/UPC Parsers

C99/UPC Parsers

Extensible Parsers - C99/UPC Parsers Mike Kucera Jason Montojo IBM Eclipse CDT Team © 2006 IBM; made available under the EPL v1.0 | August 8, 2006 Overview Goals To create a parser framework that allows language extensions to be easily added to CDT Modularity Clean implementation, maintainability Performance Support for Unified Parallel C (UPC) needed by the Parallel Tools Project UPC spec is an extension to the C99 spec 2 © 2007 IBM Corporation; made available under the EPL v1.0 C99 Parser in CDT 4.0 C99 parser base Designed to be extensible UPC parser Built on top of C99 parser 3 © 2007 IBM Corporation; made available under the EPL v1.0 Language Extensibility in CDT What CDT currently provides Extension point for adding new parsers Map languages to content types Syntax highlighting can be extended to new keywords Add new types of AST nodes What CDT does not provide A parser that can be directly extended to support new syntax A reusable preprocessor 4 © 2007 IBM Corporation; made available under the EPL v1.0 C99 Preprocessor New preprocessor written from scratch Much cleaner implementation than DOM preprocessor DOM preprocessor: • Lexing and preprocessing are combined, not modular • Processes raw character stream, very complex code to do this • Doesn’t handle comments properly C99 Preprocessor: • Lexing and preprocessing are separated, modular • Token based, input is first lexed into tokens then fed to preprocessor, much cleaner • Comments are correctly resolved by the lexer 5 © 2007 IBM Corporation; made available under the EPL v1.0 C99 Parser in CDT 4.0 Different approach than the DOM parser DOM parser completely hand written C99 Parser generated from grammar files using a parser generator Using LPG - LALR Parser Generator Bottom-up parsing approach Grammar file looks similar to the spec Some parts of DOM parser are reused AST LocationMap 6 © 2007 IBM Corporation; made available under the EPL v1.0 LPG – LALR Parser Generator Two parts The generator (lpg.exe) • Generates parse tables from grammar file • Parse tables are basically a specification of a finite state machine The runtime (java library) • Contains the parser driver and supporting classes • Parser driver interprets the parse tables 7 © 2007 IBM Corporation; made available under the EPL v1.0 LPG – LALR Parser Generator LPG is used by several eclipse projects including: Model Development Tools (MDT) Graphical Modeling Framework (GMF) Generative Modeling Technologies (GMT) Data Tools Platform (DTP) SAFARI Java Development Tools (JDT, in the bytecode compiler) Part of Orbit project 8 © 2007 IBM Corporation; made available under the EPL v1.0 LPG – Benefits Automatic Computation of AST node offsets Backtracking Syntax error recovery Clean separation of parser and the code that builds the AST Grammar file inheritance Source of parser extensibility 9 © 2007 IBM Corporation; made available under the EPL v1.0 C99 Grammar File Example statement ::= labeled_statement | compound_statement | expression_statement | selection_statement | iteration_statement | jump_statement | ERROR_TOKEN /.$ba consumeStatementProblem(); $ea./ iteration_statement ::= 'do' statement 'while' '(' expression ')' ';' /.$ba consumeStatementDoLoop(); $ea./ | 'while' '(' expression ')' statement /.$ba consumeStatementWhileLoop(); $ea./ | 'for' '(' expression ';' expression ';' expression ')' statement /.$ba consumeStatementForLoop(true, true, true); $ea./ 10 © 2007 IBM Corporation; made available under the EPL v1.0 AST Building Actions /** * iteration_statement ::= 'while' '(' expression ')' statement */ public void consumeStatementWhileLoop() { IASTWhileStatement whileStatement = nodeFactory .newWhileStatement(); IASTStatement body = (IASTStatement) astStack .pop(); IASTExpression condition = (IASTExpression) astStack .pop(); whileStatement.setBody(body); body.setParent(whileStatement); body.setPropertyInParent(IASTWhileStatement. BODY ); whileStatement.setCondition(condition); condition.setParent(whileStatement); condition.setPropertyInParent(IASTWhileStatement. CONDITIONEXPRESSION ); setOffsetAndLength(whileStatement); astStack .push(whileStatement); } 11 © 2007 IBM Corporation; made available under the EPL v1.0 Content Assist 5 simple grammar rules ident ::= 'identifier' | 'Completion' ']' ::=? 'RightBracket' | 'EndOfCompletion' ')' ::=? 'RightParen' | 'EndOfCompletion' '}' ::=? 'RightBrace' | 'EndOfCompletion' ';' ::=? 'SemiColon' | 'EndOfCompletion' First rule says that a Completion token can occur anywhere an identifier token can occur. Next 4 rules allow the parse to complete successfully after a Completion token has been encountered. 12 © 2007 IBM Corporation; made available under the EPL v1.0 Generating The Parser From Grammar Files Grammar Parse File Tables Recognizes C99Lexer.g Parser Tokens Generator lpg.exe Grammar File Parse Tables C99Parser.g Recognizes C99 Language 13 © 2007 IBM Corporation; made available under the EPL v1.0 Architecture of C99 Parser Token C99 Stream Source Preprocessor Parser Code C99 C99 C99 AST C99 Parse Keyword AST Lexer Tables Map Actions 14 © 2007 IBM Corporation; made available under the EPL v1.0 Extensibility – Supporting UPC UPC grammar file extends the C99 grammar file Adds new grammar rules for UPC syntax Generates new parse tables that recognize UPC $Import C99Parser.g $End iteration_statement ::= 'upc_forall' '(' expression ';' expression ';' expression ';' affinity ')' statement /.$ba consumeStatementUPCForallLoop(true, true, true, true); $ea./ 15 © 2007 IBM Corporation; made available under the EPL v1.0 Extensibility – Supporting UPC Extend C99 classes. C99ParserAction C99KeywordMap UPCParserAction UPCKeywordMap Adds actions for new Adds mappings for new grammar rules UPC keywords like ‘upc_forall’ 16 © 2007 IBM Corporation; made available under the EPL v1.0 Extensibility – Supporting UPC Create AST node classes for new language constructs CASTForStatement UPCASTForallStatement 17 © 2007 IBM Corporation; made available under the EPL v1.0 Architecture of UPC Parser Token UPC Stream Source Preprocessor Parser Code UPC UPC UPC C99 AST Lexer Parse Keyword AST Tables Map Actions 18 © 2007 IBM Corporation; made available under the EPL v1.0 Performance 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% GNUCSourceParser C99Parser 19 © 2007 IBM Corporation; made available under the EPL v1.0 Future Work Make the preprocessor reusable Reusable on any token stream Use for FORTRAN etc… Support for C++ Advanced approach Provide compiler specific extensions GCC, XLC etc… Further performance enhancements We haven’t spent much time on optimizations yet 20 © 2007 IBM Corporation; made available under the EPL v1.0.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    20 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us