Common Compiler Infrastructure: Compiler Writer’s Perspective
Eugene A. Zueff, Institute for Computer Systems, ETH Zürich
June 26th, 2003

Talk Overview
• CCI Framework: What is it and what is it for?
• CCI Architecture, Major Components & Features: An Overview.
• CCI Intermediate Representation.
• IR Transformation: Visitors.
• CCI Compilation Model & Integration Issues.
• Example: A CCI-based Zonnon Compiler.
• Current State & Conclusion.

CCI at a Glance

• CCI = Common Compiler Infrastructure.
• Technically, CCI is a set of resources (classes) that support implementing compilers and other language tools for the .NET platform:
  * compiler implementation;
  * compiler integration.
• Conceptually, CCI is a part of the .NET Framework SDK.

CCI at a Glance (2)

[Figure: Visual Studio .NET extensibility mechanisms, as labeled on the slide: CCI, BabelService, VSIP, Add-Ins, Macros, VS.NET]

CCI: Possible Scenarios
• Integrate an existing (“non-CCI”) compiler into VS.NET.
• Develop a completely CCI-based compiler and integrate it into Visual Studio .NET.
• Extend existing .NET languages & compilers (C#, VB etc.).
• Develop post-compilation tools.
• Develop “toy” compilers (either command-line or VS-integrated) for educational purposes!

CCI: Three Major Concerns
• (Common Concern) Developing compilers is a challenging task; integration brings a lot of additional issues.
• (CCI Concern) CCI implements a radically different (non-conventional) view of the compilation process.
• (Technical Concern) CCI has a wide and non-trivial interface, with its own rules and contracts.

CCI Major Parts
Intermediate Representation (IR)
    A rich hierarchy of C# classes representing the most common and typical notions of modern programming languages.
    System.Compiler.dll

Transformers (“Visitors”)
    A set of classes performing consecutive transformations IR ⇒ MSIL.
    System.Compiler.Framework.dll

Integration Service
    A variety of classes and methods providing integration into the Visual Studio environment (additional functionality required for editing, debugging, background compilation etc.).

CCI Way of Use: Principles
Common principles of using CCI:
• CCI services are represented as classes.
• To use them, the compiler writer defines classes derived from the CCI ones.
• Derived classes should implement some abstract methods declared in the base classes (these methods compose a “unified interface” with the environment).
• Derived classes may (and typically do) also implement some language-specific functionality.
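
A minimal, self-contained sketch of these principles follows. It does not use the real CCI library: ParserBase and Host are hypothetical stand-ins invented for illustration, and the "module Name;" syntax is an assumption, not part of any real language definition. The sketch only shows the shape of the idea: the environment calls an abstract method of a base class, and the derived class supplies the language-specific logic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a framework base class; NOT the real System.Compiler.Parser.
public abstract class ParserBase
{
    // The "unified interface": the only method the environment calls.
    public abstract List<string> ParseCompilationUnit(string source);
}

// Language-specific parser for an imaginary language Z.
public sealed class ZParser : ParserBase
{
    public override List<string> ParseCompilationUnit(string source)
    {
        var modules = new List<string>();
        foreach (string line in source.Split('\n'))
        {
            if (TryParseZModule(line.Trim(), out string name))
                modules.Add(name);
        }
        return modules;
    }

    // The parser's own logic: accepts lines of the (assumed) form "module Name;".
    private static bool TryParseZModule(string line, out string name)
    {
        const string keyword = "module ";
        name = "";
        if (!line.StartsWith(keyword, StringComparison.Ordinal) ||
            !line.EndsWith(";", StringComparison.Ordinal))
            return false;
        name = line.Substring(keyword.Length, line.Length - keyword.Length - 1).Trim();
        return name.Length > 0;
    }
}

// Hypothetical environment: it knows only the base class, never ZParser itself.
public static class Host
{
    public static string ListModules(ParserBase parser, string source)
    {
        return string.Join(", ", parser.ParseCompilationUnit(source));
    }
}

public static class Program
{
    public static void Main()
    {
        string source = "module A;\nnot a module\nmodule B;";
        Console.WriteLine(Host.ListModules(new ZParser(), source)); // prints "A, B"
    }
}
```

Because Host depends only on ParserBase, a parser for another language could be plugged in without changing the environment side, which is the point of the "unified interface".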

CCI Way of Use: Parser Example

Prototype parser:

    using System.Compiler;

    namespace ZLanguageCompiler
    {
        public sealed class ZParser : System.Compiler.Parser   // abstract class from CCI
        {
            // Parser's "unified interface": implementation of the
            // interface between the Z compiler and the environment
            public override ... ParseCompilationUnit(...)
            {
                . . .   // calls ParseZModule
            }

            // Z parser's own logic
            private ... ParseZModule(...)
            {
                . . .
            }
        }
    }

CCI Intermediate Representation (1)
The Central Part of CCI: classes representing all language concepts supported by CLI.

Example: a C# class

    public class C
    {
        public int m1;
        public void f() { m1 = 0; }
    }

and the corresponding IR tree:

    Class
        Name: Identifier
        Flags
        Members
            Field
                Name: Identifier
                Flags
                Type: Int32
            Method
                Name: Identifier
                Flags
                Type: Void
                Body: Block
                    Statements
                        AssignmentStatement
                            ...

CCI Intermediate Representation (2)
A part of the IR inheritance tree:

    Node
        Expression
            UnaryExpression
            BinaryExpression
            NaryExpression
                MethodCall
                Indexer
            AssignmentExpression
            Literal
            Parameter
            This
        Statement
            AssignmentStatement
            If
            For
            ForEach
            Continue
            ExpressionStatement
            VariableDeclaration
        Member
            TypeNode
                Class
                DelegateNode
                EnumNode
                Interface
                . . .
                TypeParameter
                Pointer
                Reference
            Event
            Method
                InstanceInitializer
                StaticInitializer
            Field
            Property
            Namespace
        CompilationUnit

CCI Intermediate Representation (3)
Example:

    public class If : Statement
    {
        Expression condition;
        Block falseBlock;
        Block trueBlock;
        . . .
    }

    public class Block : Statement
    {
        bool hasLocals;
        StatementList statements;
        . . .
    }

Some Features:
• Rather straightforward approach.
• Very similar to the C# concept hierarchy.
• Supports some non-C# features (assignment statements etc.).
• Supports some future C# features (generics).
• Suitable for representing a great number of other languages.

CCI Intermediate Representation (4)

How to use IR classes

“Zero cost” approach: just take the hierarchy (or a subset of it) as it is.
    For relatively simple languages which are fully CLI-compliant (and therefore completely “covered” by IR classes).

“Standard” approach: extend some of the classes, adding your own functionality where necessary (in the usual OO manner).
    The “golden mean”: for most cases.

“Radical” approach: create your own hierarchy and provide means for converting its nodes to semantically equivalent IR nodes/sub-trees.
    For complex languages and/or languages with completely different paradigms.

IR Transformation: Visitors (1)
StandardVisitor: every visitor walks an IR…
    Looker, Declarer: …replacing Identifier nodes with the members/locals they resolve to;
    Resolver: …resolving overloads and deducing expression result types;
    Checker: …checking for semantic errors and repairing the IR so that subsequent walks need not do error checking;
    Normalizer: …preparing an IR for serializing to IL+MD.

• It’s possible to modify the standard visitors and/or…
• Write your own visitors replacing a standard one!

IR Transformation: Visitors (2)

Overall scheme for IR processing

Prototype “compiler”:

    public class ZCompiler : System.Compiler.Compiler, ...   // abstract CCI class
    {
        . . .
        protected override void Compile(CompilationUnit cu,   // IR nodes
                                        Class globalScope,
                                        ErrorNodeList errors)
        {
            // Visitors:
            // Walk IR looking up names
            (new Looker(globalScope)).VisitCompilationUnit(cu);
            // Walk IR inferring types and resolving overloads
            (new Resolver()).VisitCompilationUnit(cu);
            // Walk IR checking for semantic errors and repairing it
            (new Checker(errors)).VisitCompilationUnit(cu);
            // Walk IR reducing it to predefined mappings to MD+IL
            (new Normalizer()).VisitCompilationUnit(cu);
        }
        . . .
    }

CCI Compilation Model
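
In this model the front end builds an IR, and a sequence of visitor passes then transforms that IR step by step. The sketch below imitates this on a toy scale. Every type in it (Node, Visitor, ToyLooker, ToyChecker and so on) is a hypothetical stand-in written for this illustration and is not the real System.Compiler API; only the division of labour between a name-resolving pass and an error-checking, repairing pass is taken from the slides.

```csharp
using System;
using System.Collections.Generic;

// Toy IR. All types are hypothetical stand-ins, NOT the real System.Compiler classes.
public abstract class Node { }
public sealed class Identifier : Node { public string Name; public Identifier(string n) { Name = n; } }
public sealed class Local : Node { public string Name; public Local(string n) { Name = n; } }
public sealed class Literal : Node { public int Value; public Literal(int v) { Value = v; } }
public sealed class Assignment : Node
{
    public Node Target, Source;
    public Assignment(Node target, Node source) { Target = target; Source = source; }
}
public sealed class Block : Node { public List<Node> Statements = new List<Node>(); }

// Base visitor: walks the tree; each pass may return a replacement for a node.
public class Visitor
{
    public virtual Node Visit(Node node)
    {
        switch (node)
        {
            case Block block: return VisitBlock(block);
            case Assignment assignment: return VisitAssignment(assignment);
            case Identifier identifier: return VisitIdentifier(identifier);
            default: return node; // leaves such as Literal and Local stay as they are
        }
    }
    public virtual Node VisitBlock(Block block)
    {
        for (int i = 0; i < block.Statements.Count; i++)
            block.Statements[i] = Visit(block.Statements[i]);
        return block;
    }
    public virtual Node VisitAssignment(Assignment a)
    {
        a.Target = Visit(a.Target);
        a.Source = Visit(a.Source);
        return a;
    }
    public virtual Node VisitIdentifier(Identifier id) { return id; }
}

// Pass 1: replace Identifier nodes with the locals they resolve to.
public sealed class ToyLooker : Visitor
{
    private readonly Dictionary<string, Local> scope = new Dictionary<string, Local>();
    public ToyLooker(params string[] declaredNames)
    {
        foreach (string name in declaredNames) scope[name] = new Local(name);
    }
    public override Node VisitIdentifier(Identifier id)
    {
        return scope.TryGetValue(id.Name, out Local local) ? (Node)local : id;
    }
}

// Pass 2: report unresolved names and repair the IR by dropping bad statements.
public sealed class ToyChecker : Visitor
{
    public readonly List<string> Errors = new List<string>();
    public override Node VisitBlock(Block block)
    {
        var kept = new List<Node>();
        foreach (Node statement in block.Statements)
        {
            Node result = Visit(statement);
            if (result != null) kept.Add(result);
        }
        block.Statements = kept;
        return block;
    }
    public override Node VisitAssignment(Assignment a)
    {
        bool ok = true;
        foreach (Node operand in new[] { a.Target, a.Source })
            if (operand is Identifier id) { Errors.Add("unresolved name: " + id.Name); ok = false; }
        return ok ? a : null; // null means "remove this statement"
    }
}

public static class Demo
{
    public static string Run()
    {
        var block = new Block();
        block.Statements.Add(new Assignment(new Identifier("x"), new Literal(1)));      // x = 1
        block.Statements.Add(new Assignment(new Identifier("y"), new Identifier("x"))); // y = x
        new ToyLooker("x").Visit(block);   // only x is declared
        var checker = new ToyChecker();
        checker.Visit(block);
        return block.Statements.Count + " statement(s) kept; errors: " + string.Join("; ", checker.Errors);
    }
}

public static class Program
{
    public static void Main() { Console.WriteLine(Demo.Run()); }
}
```

Dropping the faulty statement is just one possible repair strategy, chosen here for brevity; the point is that a pass running after the checker can assume every remaining name is resolved.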

[Figure: CCI compilation model. Source → Scanner & Parser → IR (AST) → Visitors → MSIL+MD → IL/MD Writer → Output Assembly; Imported Assemblies → IL/MD Reader. The legend distinguishes language-specific components from components common to all languages.]

Compiler Integration: Traditional Approach

[Figure: the compiler as a “black box”. Input from the environment: source file name and compilation parameters. Inside the compiler: start-up; lexical analysis (source code → sequence of tokens); syntactic & semantic analysis (→ program tree); code generation (→ object code); end-up. Output: file with object code and diagnostic messages.]

What Does Integration Assume? (1)

Visual Studio components and the features that should be supported by a compiler:

Project Manager
• Language sources identification

Text Editor
• Syntax highlighting
• Automatic text formatting
• Smart text browsing { }

Semantic Support (“Intellisense”)
• Error checking while typing
• Tooltip-like diagnostics & info
• Type member lists for classes and variables of class types
• Lists of overloaded methods
• Lists of method parameters

Debugger
• Expression evaluation
• Conditional breakpoints

What Does Integration Assume? (2)

[Figure: example of the “Intellisense” feature]

Compiler Integration: CCI Approach (1)

[Figure: the compiler as a collection of resources. Phases: lexical analysis, syntactic & semantic analysis, code generation, all connected to the compiler environment. Data labeled on the slide: Source Code (Document, Source Context), Token (Token Attributes, Token Context), Program Tree, Object Code (Assembly).]

Compiler Integration: CCI Approach (2)

[Figure: the compiler as a set of objects. Same phases and data as in the previous figure: lexical analysis, syntactic & semantic analysis, code generation; Source Code (Document, Source Context), Token (Token Attributes, Token Context), Program Tree, Object Code (Assembly). Environment components using them: Project Manager, Source Text Editor, “Intellisense”, Debugger.]

Compiler Integration: CCI Approach (3)

Compilation phases and some of their methods:

Lexical Analysis
• Get Token
• Get Token with Extra Attributes

Syntactic & Semantic Analysis
• Parse Program Unit
• Parse Expression
• Parse Statements
• . . .

A CCI-based Compiler: Zonnon (1)

• Zonnon is a successor in the Pascal, Modula-2 and Oberon line of languages.
• Zonnon preserves the spirit of Oberon, being a compact language with a small number of orthogonal basic concepts, including:
  * modularity;
  * simple but powerful object model;
  * active object concept.
• The first Zonnon implementation is for the .NET platform and is based on the CCI framework.
• The language is designed and implemented at ETH Zürich, Switzerland.

A CCI-based Compiler: Zonnon (2)

[Screenshot: Zonnon compiler integrated into Visual Studio]

A CCI-based Compiler: Zonnon (3), (4), (5)

[Three further screenshots: Zonnon compiler integrated into Visual Studio]

CCI: Current State
• (Almost) completely implemented; undocumented.
• Since June 12th, the CCI Toolkit has been included in the MSDN Academic Alliance: www.msdnaa.net/cci
• The Zonnon compiler is, perhaps, the first use of CCI outside Microsoft.

Conclusion

• Common Compiler Infrastructure seems to be a powerful and practically useful framework for developing compilers and language tools for a wide range of languages.
• CCI provides a convenient high-level model for real integration of compilers into the Visual Studio environment.
• The CCI framework has been chosen as the platform for implementing the compiler for the new Zonnon language.