Montana State University

Micro Compiler Portfolio

Matt Johnerson
Michael Shihrer
CSCI 468 - Compilers

1. Program
   a. Expression.java
   b. IRNode.java
   c. Listener.java
   d. Main.java
   e. Micro.java
   f. Micro.g4
   g. Scope.java
   h. Symbol.java
   i. SymbolTable.java
   j. TinyGenerator.java
2. Teamwork
3. Design Pattern
4. Technical Writing
5. UML Diagram
6. Design Trade-Offs
7. Software Development Life Cycle


1. Program

The program's final submission can be viewed on GitHub at https://github.com/shihrer/csci468 under the release tagged "Step4". In order to build and execute the code, the ANTLR v4.5.2 runtime is needed. It can be found at http://www.antlr.org/.

a. Expression.java

package com.csci468.micro;

class Expression {
    private String name;
    private String type;

    Expression(String name, String type) {
        this.name = name;
        this.type = type;
    }

    String getType() {
        return this.type;
    }

    String getName() {
        return this.name;
    }

    @Override
    public String toString() {
        return this.name;
    }
}

b. IRNode.java

package com.csci468.micro;

class IRNode {

    // Data fields
    private String _opcode;
    private Expression _operand1;
    private Expression _operand2;
    private Expression _result;

    IRNode(String OPCode, Expression OP1, Expression OP2, Expression Result) {
        if (OP1 != null && !OPCode.equals("STR")) {
            switch (OP1.getType()) {
                case "INT":
                    this._opcode = OPCode + "I";
                    break;
                case "FLOAT":
                    this._opcode = OPCode + "R";
                    break;
                default:
                    this._opcode = OPCode + "S";
                    break;
            }
        } else {
            this._opcode = OPCode;
        }

        this._operand1 = OP1;
        this._operand2 = OP2;
        this._result = Result;
    }

    String getIRCode() {
        StringBuilder irCode = new StringBuilder();
        irCode.append("; ");
        irCode.append(_opcode).append(" ");

        if (_operand1 != null)
            irCode.append(_operand1).append(" ");
        if (_operand2 != null)
            irCode.append(_operand2).append(" ");
        if (_result != null)
            irCode.append(_result).append(" ");

        return irCode.toString();
    }

    String getOpcode() {
        return _opcode;
    }

    Expression getOP1() {
        return _operand1;
    }

    Expression getOP2() {
        return _operand2;
    }

    Expression getResult() {
        return _result;
    }

    @Override
    public String toString() {
        return getIRCode();
    }
}

c. Listener.java

package com.csci468.micro;

import java.util.LinkedList;
import java.util.Stack;

/**
 * Michael Shihrer
 * Matthew Johnerson
 * 26 March 2016
 */
class Listener extends MicroBaseListener {
    // Count variables
    private int scopeCount = 1;
    private int labelCount = 0;
    private int variableCount = 0;

    private SymbolTable microSymbolTable;

    // Code generation objects
    private LinkedList<IRNode> IRNodes;
    private Stack<Integer> labelStack;
    private Stack<Expression> expressionStack = new Stack<>();
    private Stack<Stack<Expression>> stackStack = new Stack<>();

    // Constructor requires IRNodes list
    Listener(LinkedList<IRNode> IRNodes) {
        microSymbolTable = new SymbolTable();
        this.IRNodes = IRNodes;
        this.labelStack = new Stack<>();
    }

    @Override
    public void enterFuncDecl(MicroParser.FuncDeclContext ctx) {
        // Create new scope
        microSymbolTable.createScope(ctx.ID().toString());

        Expression result = new Expression(ctx.ID().getText(), "STR");
        IRNodes.add(new IRNode("LABEL", null, null, result));
    }

    @Override
    public void exitFuncDecl(MicroParser.FuncDeclContext ctx) {
        // Pop scope
        destroyScope();
    }

    @Override
    public void exitVarDecl(MicroParser.VarDeclContext ctx) {
        String ids = ctx.idList().getText();

        for (String id : ids.split(",")) {
            Expression result = new Expression(id, ctx.varType().toString());
            IRNodes.add(new IRNode("VAR", null, null, result));
        }
    }

    @Override
    public void enterParamDecl(MicroParser.ParamDeclContext ctx) {
        // Add parameter to scope
        microSymbolTable.createSymbol(ctx.ID().toString(), ctx.varType().getText());
    }

    @Override
    public void enterIfStmt(MicroParser.IfStmtContext ctx) {
        createBlockScope();
    }

    @Override
    public void exitIfStmt(MicroParser.IfStmtContext ctx) {
        int curLabel = labelStack.pop();
        Expression result = new Expression("label" + curLabel, "STR");
        IRNodes.add(new IRNode("LABEL", null, null, result));
        destroyScope();
    }

    @Override
    public void enterWhileStmt(MicroParser.WhileStmtContext ctx) {
        Expression result = new Expression("label" + labelCount, "STR");
        IRNodes.add(new IRNode("LABEL", null, null, result));
        labelStack.push(labelCount);
        labelCount++;
        createBlockScope();
    }

    @Override
    public void exitWhileStmt(MicroParser.WhileStmtContext ctx) {
        int curLabel = labelStack.pop();
        int jumpLabel = labelStack.pop();
        Expression jumpResult = new Expression("label" + jumpLabel, "STR");
        Expression labelResult = new Expression("label" + curLabel, "STR");

        IRNodes.add(new IRNode("JUMP", null, null, jumpResult));
        IRNodes.add(new IRNode("LABEL", null, null, labelResult));
    }

    @Override
    public void enterElse(MicroParser.ElseContext ctx) {
        createBlockScope();
        int curLabel = labelStack.pop();

        Expression jumpResult = new Expression("label" + labelCount, "STR");
        Expression labelResult = new Expression("label" + curLabel, "STR");
        IRNodes.add(new IRNode("JUMP", null, null, jumpResult));
        IRNodes.add(new IRNode("LABEL", null, null, labelResult));
        labelStack.push(labelCount);
        labelCount++;
    }

    @Override
    public void exitElse(MicroParser.ElseContext ctx) {
        destroyScope();
    }

    @Override
    public void exitReadStmt(MicroParser.ReadStmtContext ctx) {
        String ids = ctx.idList().getText();

        for (String id : ids.split(",")) {
            Expression readResult = new Expression(id,
                    microSymbolTable.getSymbol(id).getType());
            IRNodes.add(new IRNode("READ", readResult, null, null));
        }
    }

    private void createBlockScope() {
        // Create new scope
        microSymbolTable.createScope(scopeCount);
        scopeCount++;
    }

    private Scope destroyScope() {
        return microSymbolTable.destroyScope();
    }

    @Override
    public void enterStringDecl(MicroParser.StringDeclContext ctx) {
        // Create an entry in the current scope
        String name = ctx.ID().toString();
        String value = ctx.STRINGLITERAL().toString();
        String type = ctx.STRING().toString();
        microSymbolTable.createSymbol(name, type, value);
    }

    @Override
    public void enterVarDecl(MicroParser.VarDeclContext ctx) {
        String varType = ctx.varType().getText();
        // Create an entry in the current scope
        String names = ctx.idList().getText();

        String[] tokens = names.split(",");
        for (String s : tokens) {
            microSymbolTable.createSymbol(s, varType);
        }
    }

    @Override
    public void exitAssignExpr(MicroParser.AssignExprContext ctx) {
        // Clear out the expression stack; make sure there's nothing extra to evaluate.
        while (expressionStack.size() > 1) {
            buildExpression();
        }

        String OPCode = "STORE";
        String ID = ctx.ID().toString();
        Expression storeResult = new Expression(ID,
                microSymbolTable.getSymbol(ID).getType());

        // Avoid popping if the stack is empty
        if (expressionStack.size() > 0) {
            Expression OP1 = expressionStack.pop();
            IRNodes.add(new IRNode(OPCode, OP1, null, storeResult));
        }
    }

    @Override
    public void exitCond(MicroParser.CondContext ctx) {
        // Compare the results of the expressions
        Expression op1;
        Expression op2;
        if (expressionStack.size() > 1) {
            op1 = expressionStack.pop();
            op2 = expressionStack.pop();
            Expression compareResult = new Expression("label" + labelCount, "STR");
            StringBuilder OPCode = new StringBuilder();

            switch (ctx.COMPOP().toString()) {
                case ">":
                    OPCode.append("LE");
                    break;
                case ">=":
                    OPCode.append("LT");
                    break;
                case "<":
                    OPCode.append("GE");
                    break;
                case "<=":
                    OPCode.append("GT");
                    break;
                case "!=":
                    OPCode.append("EQ");
                    break;
                case "=":
                    OPCode.append("NE");
                    break;
            }
            IRNodes.add(new IRNode(OPCode.toString(), op2, op1, compareResult));
            labelStack.push(labelCount);
            labelCount++;
            //variableCount++;
        }
    }

    @Override
    public void exitWriteStmt(MicroParser.WriteStmtContext ctx) {
        // Emit a WRITE for each ID in the list
        String names = ctx.idList().getText();

        for (String s : names.split(",")) {
            Expression writeOP = new Expression(s,
                    microSymbolTable.getSymbol(s).getType());
            IRNodes.add(new IRNode("WRITE", writeOP, null, null));
        }
    }

    @Override
    public void exitADDOP(MicroParser.ADDOPContext ctx) {
        Expression newExpr = new Expression(ctx.ADDOP().toString(), "OPERATOR");
        expressionStack.push(newExpr);
    }

    @Override
    public void exitMULOP(MicroParser.MULOPContext ctx) {
        Expression newExpr = new Expression(ctx.MULOP().toString(), "OPERATOR");
        expressionStack.push(newExpr);
    }

    @Override
    public void exitPrimaryID(MicroParser.PrimaryIDContext ctx) {
        expressionStack.add(new Expression(ctx.getText(),
                microSymbolTable.getSymbol(ctx.ID().toString()).getType()));
    }

    @Override
    public void exitPrimaryINT(MicroParser.PrimaryINTContext ctx) {
        expressionStack.add(new Expression(ctx.getText(), "INT"));
    }

    @Override
    public void exitPrimaryFLOAT(MicroParser.PrimaryFLOATContext ctx) {
        expressionStack.add(new Expression(ctx.getText(), "FLOAT"));
    }

    @Override
    public void exitParanths(MicroParser.ParanthsContext ctx) {
        // Restore the operations stack
        buildExpression();
        Stack<Expression> newStack = stackStack.pop();
        newStack.addAll(expressionStack);
        expressionStack = newStack;
    }

    @Override
    public void enterParanths(MicroParser.ParanthsContext ctx) {
        // Save the current stack and start over
        stackStack.push((Stack<Expression>) expressionStack.clone());
        expressionStack.clear();
    }

    @Override
    public void exitFactor(MicroParser.FactorContext ctx) {
        buildExpression();
    }

    @Override
    public void exitStringDecl(MicroParser.StringDeclContext ctx) {
        Expression op1 = new Expression(ctx.ID().getText(), "STR");
        Expression result = new Expression(ctx.STRINGLITERAL().toString(), "STR");

        IRNodes.add(new IRNode("STR", op1, null, result));
    }

    @Override
    public String toString() {
        return microSymbolTable.toString();
    }

    private void buildExpression() {
        if (expressionStack.size() > 2) {
            // Need to determine type
            IRNode exprNode;

            Expression op2 = expressionStack.pop();
            Expression operator = expressionStack.pop();
            Expression op1 = expressionStack.pop();

            Expression result = new Expression("$T" + variableCount, op1.getType());
            expressionStack.push(result);
            variableCount++;

            switch (operator.getName()) {
                case "+":
                    exprNode = new IRNode("ADD", op1, op2, result);
                    break;
                case "-":
                    exprNode = new IRNode("SUB", op1, op2, result);
                    break;
                case "*":
                    exprNode = new IRNode("MUL", op1, op2, result);
                    break;
                default:
                    exprNode = new IRNode("DIV", op1, op2, result);
                    break;
            }
            IRNodes.add(exprNode);
        }
    }
}

d. Main.java

package com.csci468.micro;

import java.io.IOException;

class Main {

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            String input = args[0];
            Micro micro = new Micro(input);
            micro.Scan();
        } else
            System.out.println("You must supply an input.");
    }
}

e. Micro.java

package com.csci468.micro;

import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.misc.ParseCancellationException;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.io.IOException;
import java.util.LinkedList;

class Micro {
    private MicroLexer lexer;

    Micro(String inputPath) throws IOException {
        try {
            lexer = new MicroLexer(new ANTLRFileStream(inputPath));
        } catch (IOException e2) {
            System.out.println(e2.getLocalizedMessage());
        }
    }

    void Scan() {
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        MicroParser parser = new MicroParser(tokens);
        parser.setErrorHandler(new BailErrorStrategy());

        try {
            LinkedList<IRNode> irNodes = new LinkedList<>();
            ParseTree tree = parser.program();

            Listener microListener = new Listener(irNodes);
            ParseTreeWalker test = new ParseTreeWalker();
            test.walk(microListener, tree);

            for (IRNode irNode : irNodes)
                System.out.println(irNode.getIRCode());
            TinyGenerator gen = new TinyGenerator(irNodes);
            System.out.print(gen.getTiny());
            //System.out.print(microListener);

        } catch (ParseCancellationException e) {
            System.out.println(e.getMessage());
            //System.out.println("Not accepted");
        }
    }
}

f. Micro.g4

grammar Micro;

/* Program */
program : PROGRAM ID BEGIN body END;

/* Program Body */
body : decl funcDeclarations;
decl : stringDecl decl |
       varDecl decl |;

/* Global String Declaration */
stringDecl : STRING ID ASSIGNOP STRINGLITERAL ';';

/* Variable Declaration */
varDecl : varType idList ';';
varType : FLOAT |
          INT;
anyType : varType |
          VOID;
idList : ID idTail;
idTail : ',' ID idTail |;

/* Function Parameter List */
paramDeclList : paramDecl paramDeclTail |;
paramDecl : varType ID;
paramDeclTail : ',' paramDecl paramDeclTail |;

/* Function Declarations */
funcDeclarations : funcDecl funcDeclarations |;
funcDecl : FUNCTION anyType ID '(' paramDeclList ')' BEGIN funcBody END;
funcBody : decl stmtList;

/* Statement List */
stmtList : stmt stmtList |
           ;
stmt : baseStmt |
       ifStmt |
       whileStmt
       ;
baseStmt : assignStmt |
           readStmt |
           writeStmt |
           returnStmt
           ;

/* Basic Statements */
assignStmt : assignExpr ';';
assignExpr : ID ASSIGNOP expr;
readStmt : READ '(' idList ')' ';';
writeStmt : WRITE '(' idList ')' ';';
returnStmt : RETURN expr ';';

/* Expressions */
expr : exprPrefix factor;
exprPrefix : exprPrefix factor ADDOP # ADDOP | # emptyExprPrefix;
factor : factorPrefix postfixExpr;
factorPrefix : factorPrefix postfixExpr MULOP # MULOP | # emptyFactorPrefix;
postfixExpr : primary |
              callExpr;
callExpr : ID '(' exprList ')';
exprList : expr exprListTail |;
exprListTail : ',' expr exprListTail |;
primary : '(' expr ')' # paranths |
          ID # primaryID |
          INTLITERAL # primaryINT |
          FLOATLITERAL # primaryFLOAT;

/* Complex Statements and Condition */
ifStmt : IF '(' cond ')' decl stmtList elsePart ENDIF;
elsePart : ELSE decl stmtList # else | # emptyElse;
cond : expr COMPOP expr;

/* While Statements */
whileStmt : WHILE '(' cond ')' decl stmtList ENDWHILE;

/* Lexer Grammar */
/* Keywords */
PROGRAM : 'PROGRAM' ;
BEGIN : 'BEGIN' ;
END : 'END' ;
FUNCTION : 'FUNCTION';
READ : 'READ' ;
WRITE : 'WRITE' ;
IF : 'IF' ;
ELSE : 'ELSE' ;
ENDIF : 'ENDIF' ;
WHILE : 'WHILE' ;
ENDWHILE : 'ENDWHILE';
CONTINUE : 'CONTINUE';
BREAK : 'BREAK' ;
RETURN : 'RETURN' ;
INT : 'INT' ;
VOID : 'VOID' ;
STRING : 'STRING' ;
FLOAT : 'FLOAT' ;

/* Doesn't really do anything... */
ASSIGNOP : ':=';

COMPOP : '<' | '>' | '=' | '!=' | '<=' | '>=';
ADDOP : '+' | '-' ;
MULOP : '*' | '/' ;
OPERATOR : '('
         | ')'
         | ';'
         | ','
         ;

STRINGLITERAL : '"'~('"')*'"'
              ;

ID : [a-zA-Z][a-zA-Z0-9]*
   ;

INTLITERAL : DIGIT+
           ;

FLOATLITERAL : DIGIT*'.'DIGIT+
             ;

COMMENT : '--'.*?(WINEOL | UNIEOL) -> skip
        ;

WS : [ \t\n\r]+ -> skip
   ;

fragment WINEOL : ('\r\n');
fragment UNIEOL : ('\n');
fragment DIGIT : [0-9];

g. Scope.java

package com.csci468.micro;

import org.antlr.v4.runtime.misc.ParseCancellationException;

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Michael Shihrer
 * Matthew Johnerson
 * 26 March 2016
 */
class Scope {
    private Map<String, Symbol> symbolMap;

    private String name;
    private int blockNumber;

    Scope(String name) {
        this.name = name;
        symbolMap = new LinkedHashMap<>();
    }

    Scope(int blockNumber) {
        this.name = "BLOCK " + blockNumber;
        this.blockNumber = blockNumber;
        symbolMap = new LinkedHashMap<>();
    }

    // Insert a symbol into the scope
    void addSymbol(Symbol symbol) {
        // Add the symbol if it is not already declared in this scope
        if (!symbolMap.containsKey(symbol.getName())) {
            symbolMap.put(symbol.getName(), symbol);
        } else {
            throw new ParseCancellationException(
                    String.format("DECLARATION ERROR %s", symbol.getName()));
        }
    }

    public String toString() {
        StringBuilder output = new StringBuilder();
        if (this.name.equals("GLOBAL")) {
            output.append(String.format("Symbol table %s\n", this.name));
        } else {
            output.append(String.format("\nSymbol table %s\n", this.name));
        }
        // Append symbols
        for (Map.Entry<String, Symbol> entry : symbolMap.entrySet()) {
            Symbol symbol = entry.getValue();
            output.append(symbol);
        }
        return output.toString();
    }

    Boolean hasSymbol(String ID) {
        return this.symbolMap.containsKey(ID);
    }

    Symbol getSymbol(String ID) {
        return this.symbolMap.get(ID);
    }

    int getBlockNumber() {
        return this.blockNumber;
    }
}

h. Symbol.java

package com.csci468.micro;

/**
 * Michael Shihrer
 * Matthew Johnerson
 * 28 March 2016
 */
class Symbol {
    private String name;
    private String type;
    private String value;

    Symbol(String name, String type, String value) {
        this.name = name;
        this.type = type;
        this.value = value;
    }

    Symbol(String name, String type) {
        this.name = name;
        this.type = type;
        this.value = "";
    }

    String getName() {
        return name;
    }

    String getType() {
        return type;
    }

    @Override
    public String toString() {
        if (value.isEmpty()) {
            return String.format("name %s type %s\n", this.name, this.type);
        } else {
            return String.format("name %s type %s value %s\n", this.name,
                    this.type, this.value);
        }
    }
}

i. SymbolTable.java

package com.csci468.micro;

import java.util.ArrayList;
import java.util.Stack;

/**
 * Michael Shihrer
 * Matthew Johnerson
 * 28 March 2016
 */
class SymbolTable {
    private Stack<Scope> scopeStack;

    // Store all scopes. Necessary for outputting.
    private ArrayList<Scope> scopes;

    SymbolTable() {
        // Initialize stack
        scopeStack = new Stack<>();
        scopes = new ArrayList<>();

        // Create the default, global scope.
        Scope global = new Scope("GLOBAL");
        scopeStack.push(global);
        scopes.add(global);
    }

    Scope createScope(int scopeCount) {
        Scope scope = new Scope(scopeCount);
        scopeStack.push(scope);
        scopes.add(scope);

        return scope;
    }

    Scope createScope(String name) {
        Scope scope = new Scope(name);
        scopeStack.push(scope);
        scopes.add(scope);

        return scope;
    }

    Scope destroyScope() {
        return scopeStack.pop();
    }

    // Define a symbol
    Symbol createSymbol(String name, String type, String value) {
        Symbol symbol = new Symbol(name, type, value);
        scopeStack.peek().addSymbol(symbol);

        return symbol;
    }

    Symbol createSymbol(String name, String type) {
        Symbol symbol = new Symbol(name, type);
        scopeStack.peek().addSymbol(symbol);

        return symbol;
    }

    Symbol getSymbol(String ID) {
        for (Scope aScopeStack : scopeStack) {
            if (aScopeStack.hasSymbol(ID)) {
                return aScopeStack.getSymbol(ID);
            }
        }

        return null;
    }

    @Override
    public String toString() {
        StringBuilder output = new StringBuilder();
        for (Scope scope : scopes)
            output.append(scope);

        return output.toString();
    }
}


j. TinyGenerator.java

package com.csci468.micro;

import java.util.LinkedList;

class TinyGenerator {

    // Stores var declarations
    private StringBuilder vars = new StringBuilder();
    // Builds the Tiny code
    private StringBuilder tiny = new StringBuilder();

    private LinkedList<IRNode> code;

    TinyGenerator(LinkedList<IRNode> IRcode) {
        code = IRcode;
        generate();
    }

    private void generate() {
        while (!code.isEmpty()) {
            IRNode current = code.pop();
            String opcode = current.getOpcode();
            Expression f1 = current.getOP1();
            Expression f2 = current.getOP2();
            Expression f3 = current.getResult();
            // Temp register for cases such as: a := b;
            // need to first assign a to r99 and then r99 to b.
            String temp = "r99";

            switch (opcode) {
                // int and float variable declaration
                case "VAR":
                    vars.append(String.format("var %s\n", f3));
                    break;
                // String variable declaration
                case "STR":
                    vars.append(String.format("str %s %s\n", f1, f3));
                    break;
                // Labels
                case "LABEL":
                    tiny.append(String.format("label %s\n", f3));
                    break;
                // Unconditional jump
                case "JUMP":
                    tiny.append(String.format("jmp %s\n", f3));
                    break;
                // Read inputs
                case "READI":
                case "READR":
                    tiny.append(String.format("sys %s %s\n",
                            opcode.toLowerCase(), f1));
                    break;
                // Assignment statements
                case "STOREI":
                case "STORER":
                    // check to catch statements such as a := b;
                    if (!f1.getName().contains("$") && !f3.getName().contains("$")) {
                        tiny.append(String.format("move %s %s\n",
                                f1.getName().replace("$T", "r"), temp));
                        tiny.append(String.format("move %s %s\n", temp,
                                f3.getName().replace("$T", "r")));
                    } else {
                        tiny.append(String.format("move %s %s\n",
                                f1.getName().replace("$T", "r"),
                                f3.getName().replace("$T", "r")));
                    }
                    break;
                // Increment and decrement
                case "INCI":
                case "DECI":
                    tiny.append(String.format("%s %s\n", opcode.toLowerCase(),
                            f3.getName().replace("$T", "r")));
                    break;
                // Addition
                case "ADDI":
                case "ADDR":
                    tiny.append(String.format("move %s %s\n",
                            f1.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    tiny.append(String.format("%s %s %s\n", opcode.toLowerCase(),
                            f2.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    break;
                // Subtraction
                case "SUBI":
                case "SUBR":
                    tiny.append(String.format("move %s %s\n",
                            f1.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    tiny.append(String.format("%s %s %s\n", opcode.toLowerCase(),
                            f2.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    break;
                // Multiplication
                case "MULI":
                case "MULR":
                    tiny.append(String.format("move %s %s\n",
                            f1.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    tiny.append(String.format("%s %s %s\n", opcode.toLowerCase(),
                            f2.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    break;
                // Division
                case "DIVI":
                case "DIVR":
                    tiny.append(String.format("move %s %s\n",
                            f1.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    tiny.append(String.format("%s %s %s\n", opcode.toLowerCase(),
                            f2.getName().replace("$T", "r"),
                            f3.getName().replace("$T", "r")));
                    break;
                // Integer comparison
                case "EQI":
                case "NEI":
                case "GEI":
                case "LEI":
                case "LTI":
                case "GTI":
                    if (!f1.getName().contains("$") && !f2.getName().contains("$")) {
                        tiny.append(String.format("move %s %s\n", f2, temp));
                        tiny.append(String.format("cmpi %s %s\n", f1, temp));
                    } else {
                        tiny.append(String.format("cmpi %s %s\n", f1,
                                f2.getName().replace("$T", "r")));
                    }
                    tiny.append(String.format("j%s %s\n",
                            opcode.substring(0, 2).toLowerCase(), f3));
                    break;
                // Real number comparison
                case "EQR":
                case "NER":
                case "GER":
                case "LER":
                case "LTR":
                case "GTR":
                    if (!f1.getName().contains("$") && !f2.getName().contains("$")) {
                        tiny.append(String.format("move %s %s\n", f2, temp));
                        tiny.append(String.format("cmpr %s %s\n", f1, temp));
                    } else {
                        tiny.append(String.format("cmpr %s %s\n", f1,
                                f2.getName().replace("$T", "r")));
                    }
                    tiny.append(String.format("j%s %s\n",
                            opcode.substring(0, 2).toLowerCase(), f3));
                    break;
                // System output
                case "WRITEI":
                case "WRITES":
                case "WRITER":
                    tiny.append(String.format("sys %s %s\n",
                            opcode.toLowerCase(), f1));
                    break;
                // Links for function calls
                case "LINK":
                case "RET":
                    // unused
                    break;
                default:
                    System.out.println("Invalid opcode: " + opcode);
            }
        }
        tiny.append("sys halt\n");
        tiny.insert(0, vars);
    }

    String getTiny() {
        return tiny.toString();
    }
}

2. Teamwork

The team worked collaboratively on the project, using a mix of in-person and remote work. Generally, we used our time in person to plan and split up the work; most of the work itself was done remotely. Several tools were necessary to complete the project in this fashion: we made heavy use of Git, GitHub, Google Drive, text messaging, and web-based chat to coordinate. We agreed on IntelliJ IDEA as our IDE and used similar development environments. This proved invaluable, since our schedules often didn't allow more than a couple of hours to meet in person. Work was split during our meetings, and we checked in periodically to make sure there were no problems.

Team Member     Contribution

Team Member 1   IR Code Generation
                Symbol Table
                Symbol Table Report
                Semantic Routines Report
                Total Contribution: 50%

Team Member 2   Tiny Code Conversion
                Grammar Creation
                Scanner Report
                Parser Report
                Full Compiler Report
                Total Contribution: 50%

3. Design Pattern

Throughout the project we utilized many different object-oriented programming practices. One recurring practice was encapsulation. A specific instance in our code is the IRNode class, which represents the intermediate representation (IR) code. Variables in this class are private and can only be accessed through methods such as getOpcode(). We created objects with private variables to improve code readability and keep information within the class safe from outside manipulation. This also let us write the type-specific IR code in a concise and clear way. For example, to create an ADD operation for integers, we simply create an IRNode with the opcode ADD, and the constructor determines whether it should become ADDI or ADDR based on the types of the operands passed to it. Before we made that decision, our code required a lot of repeated work to determine which types we were handling. The change made our code much cleaner and allowed the Listener class to focus on semantic actions rather than IR opcodes. We also abstracted our implementation well by representing the ideas required for the project as objects. Our symbol table is handled by SymbolTable.java, Symbol.java, and Scope.java: SymbolTable.java does most of the work required of a symbol table, while Symbol.java and Scope.java represent the ideas of a symbol and a scope.

4. Technical Writing

The team wrote an 18-page report as part of the final submission. The report was written in stages and included with each step of the project. It describes our processes and design decisions for each aspect of the project and includes a technical description of each step. The final report is located on GitHub at https://github.com/shihrer/csci468/releases/download/Step4/Report.pdf.
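The opcode-suffix rule described above can be condensed into a small, self-contained sketch. OpcodeDemo and withSuffix are illustrative names only, not part of the compiler; the real logic lives inside IRNode's constructor.

```java
// Condensed sketch of the type-suffix rule from IRNode's constructor.
// The real class also stores operands; this keeps only the type logic.
class OpcodeDemo {
    // Append "I" for INT operands, "R" for FLOAT, "S" otherwise,
    // mirroring the switch statement in IRNode.
    static String withSuffix(String opcode, String operandType) {
        switch (operandType) {
            case "INT":
                return opcode + "I";
            case "FLOAT":
                return opcode + "R";
            default:
                return opcode + "S";
        }
    }

    public static void main(String[] args) {
        // An integer ADD becomes ADDI; a float ADD becomes ADDR.
        System.out.println(withSuffix("ADD", "INT"));    // ADDI
        System.out.println(withSuffix("ADD", "FLOAT"));  // ADDR
    }
}
```

Because the decision is made once, in the constructor, callers such as the Listener never need to repeat this type check.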

5. UML Diagram

6. Design Trade-Offs

One design trade-off we made was to sacrifice space during execution in order to keep execution fast. Our code makes heavy use of stacks for memory in semantic routines, including a stack of stacks to handle recursively nested subexpressions. In a large Micro program, this would lead to increased runtime memory usage. We made this decision to keep the code clean, easy to understand, and quick: our stack of stacks only required a handful of lines to implement, while other possible solutions would have required large changes to our code. One alternative would have been to use ANTLR's visitor pattern rather than a listener. This would have required us to go back to the work we did for Step 2 and change everything since then. Additionally, it wouldn't have made much difference for memory usage, since the visitor pattern relies on return values to carry data. Another alternative would have been to utilize ANTLR's parse tree properties, but this could have caused an even larger increase in space requirements, since each node of the parse tree would be used to store data. In our opinion, our solution was the best trade-off.


7. Software Development Life Cycle

We didn't follow a specific development model in creating our project; however, we applied many agile principles. Our planning process was largely adaptive to what we had written and what we needed to write. This fit well with the structure of the project, since it was split into multiple steps. Each step was treated as a milestone: when our code passed the supplied tests, we could make a release, and our releases were used as submissions.

Designing was mostly done in person. We talked through implementation details and made major decisions on how to write the project. For example, the final step of the project was to generate IR code and convert it to the Tiny architecture. This work was easily split between the team members by assigning IR code generation to one member and Tiny code generation to the other. We agreed on what the output of the IR code would look like and what the Tiny generator would expect as input, which made it trivial to develop each piece concurrently.

Testing was done continuously as we developed. Major changes were immediately tested to validate the work; if the tests failed, we continued working. One aspect we could have improved was better test automation, which would have saved time and made the development process more efficient. Work was started on unit testing but never finished.

Our documentation was a combination of how we wrote the code and the corresponding sections of our report. We designed our code to make sense, be easy to read, and follow standard practices and conventions, and we explained design decisions and implementation details in the report. This process allowed us to develop the project rapidly yet remain flexible around typical college student schedules. A less flexible approach could have taken more time and impacted our ability to make submissions on time.
Our process allowed each team member to work when they were available, as opposed to having to find times to meet. We think teams that did most of their work together would face scheduling problems and a higher risk of not completing the steps on time. The only hindrances we faced were the time lost to manual testing and one problem caused by insufficient planning. If testing had been automated, we could have developed more rapidly; it would have saved a lot of time and given us a better view of how well our code was executing. The one major obstacle during development could have been avoided with slightly more detailed planning: it arose from how to handle expression calculations, which wasn't something we expected to have an issue with.

Montana State University

Micro Compiler

Michael Shihrer
Matt Johnerson
CSCI 468 - Compilers


Introduction

Background

Methods and Discussion

Scanner

Background

Implementation and Methods

Difficulties

Parser

Background

Implementation and Methods

Difficulties

Symbol Table

Background

Implementation and Methods

Difficulties

Semantic Routines

Background

Implementation and Methods

Difficulties

Full Fledged Compiler

Conclusion and Future Work


1. Introduction

This project is our implementation of a compiler for the Micro language. At its most basic level, a compiler can be thought of as a translator: it takes one language and turns it into another. In the context of this project, the compiler takes Micro source code as input, runs it through a series of steps, and outputs Tiny assembly code. The end result may look different, but the meaning remains the same.

The motivation for this project mainly comes from the compilers course, CSCI 468, as a major portion of the grade is weighted on implementing a working compiler. Another source of motivation was sheer curiosity and the desire to learn something new. Neither of us had any experience with compilers or how they function, and this proved to be a valuable learning experience with an integral part of computer science. Throughout this project, countless hours were spent scouring the internet for pertinent information, poring over documentation, coding, debugging, and then debugging some more. The end result of this hard work is a fully functioning compiler, written from scratch, that can successfully "translate" one language into another.

What follows is a detailed description of each step, along with the decision-making process behind each design choice made throughout this project. Furthermore, understanding and expounding upon the difficulties experienced during each step will help shed even more light on the design process.

2. Background

As mentioned above, a compiler is simply a translator; however, a compiler's importance to computer science cannot be emphasized enough. A compiler provides a level of convenience that is arguably unmatched when it comes to getting computers to do what we want them to do. Just imagine if software engineers were tasked with writing code in assembly or machine code, using 1's and 0's: it would be borderline unreadable and nearly impossible to debug. Compilers have come a long way since the first implementation in 1952. Today they can handle surprisingly human-readable and complex programming languages, something that early computer scientists could only dream of. Even though the Micro programming language can seem simplistic compared with other modern-day programming languages, building a compiler for it is no easy task and requires a multitude of steps. The implementation that we sought to replicate required four steps:


Step one is the scanner portion, which analyzes input code and matches valid "tokens" defined in a grammar file. A grammar file is simply a set of rules that define a language.

ID : [a-zA-Z][a-zA-Z0-9]* ;

An example of the token rule for an ID (variable name) in the Micro language

Step two, the parser, uses the tokens from step one, and the order in which they appear, to determine if the program has a valid syntax.

idList : ID idTail ;
idTail : ',' ID idTail | ;

Examples of parser rules for a list of IDs

If the source code passes the parse requirements it is then passed to step three of the project, symbol table generation. A symbol table is a record of program scopes and variable declarations used to make sure there are no variable conflicts in any given program. The final and most difficult step is generating assembly code. This step can be thought of in two parts, generating an intermediate representation code (IR code) and converting this code to Tiny assembly code. Combine all these steps and the result is a fully functioning compiler.

3. Methods and Discussion

a. Scanner

i. Background

For the tokenizer (scanner) portion of our project we decided that we did not want to write our own scanner generator, given that so many others were already available and proven in real-world use. The lab portion of the course required us to learn about LEX, a popular lexical analyzer for Unix systems. After the first lab, and some brief experience using LEX, we decided to look for an alternative scanner generator that would ultimately work better with Java, our chosen language for creating the compiler. We decided that the ANTLR framework for language recognition would be the best choice for our needs. The decision to use ANTLR was based on many factors: the professional look of the website and documentation, online help with examples, and reviews from other ANTLR users were some of the main reasons.

ii. Implementation and Methods

Before we began work on any of the steps for this project, we decided to integrate version control into our work process. For various reasons, this ended up taking a significant amount of time during Step one. After considering all version control options at our disposal, we chose to implement version control via the IntelliJ IDE and GitHub. We are confident that the benefits of utilizing version control heavily outweigh the time commitment necessary to put it in place.

To complete Step one of the project we needed a fully functioning tokenizer which would accept input source code and output all valid tokens. Completing this task required writing an ANTLR grammar file, which would be used when scanning input files for valid tokens. Writing the actual grammar file turned out to be a straightforward task once we understood all of what was required for Step one. After our grammar file was written, we tested it on small strings of Micro source code to determine how well it was functioning, if at all. The last and most challenging part of Step one was getting our Java program to take input files through command-line arguments and produce output files.

MicroLexer lexer = new MicroLexer(new ANTLRFileStream(inputFile));
Vocabulary vocab = lexer.getVocabulary();

for (Token token : lexer.getAllTokens()) {
    // Skip comment tokens so they are not written to the output
    if (vocab.getDisplayName(token.getType()).equals("COMMENT")) {
        continue;
    }
    System.out.println("Token Type: " + vocab.getDisplayName(token.getType())
            + "\nValue: " + token.getText());
}

Implementation of the tokenizer for step one


iii. Difficulties

Overall, writing the scanner was a simple yet work-intensive component of building a compiler, and it produced only one minor difficulty during implementation. The grading process for this portion required us to use the grading script provided to us via D2L. We had to spend more time on this than any other element of Step one due to a lack of overall scripting knowledge, and we ended up consulting the internet for help on multiple occasions. We eventually succeeded in getting our program to accept command-line arguments when integrated with the grading script. Our finished product for Step one produced results identical to what was expected.

b. Parser

i. Background

To create the parser for the Micro language, we decided to continue working with ANTLR and the Java language. This allowed us to build on our existing grammar file and utilize the scanner we generated in the first step of the project. Our work continued to be version-controlled using Git and hosted on GitHub. We began by modifying what was turned in for Step one, the scanner. Our solution is validated against 21 test cases. Our output is either "Accepted" or "Not accepted" depending on whether the parser successfully parses the tokens from the scanner. To verify our results, the output is compared to the expected results.

ii. Implementation and Methods

To implement the parser, we had to make modifications to our grammar file and our Java program. The grammar file had to be extended with grammar specific to the parser. Our Java program had to be modified to send the output of our scanner to a parser. Finally, in order to output correctly, error handling had to be implemented through ANTLR and standard Java exception handling.

Modifying the grammar file was a straightforward process. We took the given grammar and converted it to syntax that ANTLR understands. The new parser grammar was added to the top of our existing grammar file. The modifications to our Java program were minimal. We removed the code that printed the matched tokens to standard output; instead, we pass the tokens to our parser. After running the parser, the program outputs "Accepted" if no error occurs. If an error occurs, ANTLR throws an exception; when this exception is caught, the program outputs "Not accepted" and exits.

At this point, we decided to reorganize our project. Originally, we had attempted to split our parser and lexer into multiple grammar files and Java classes. With the implementation of the parser, we decided that it was better to group them together.

MicroParser parser = new MicroParser(tokens);
parser.setErrorHandler(new BailErrorStrategy());
try {
    parser.program();
    System.out.println("Accepted");
} catch (Exception e) {
    System.out.println("Not accepted");
}

Passes the tokens to the parser instead of writing to the console

iii. Difficulties

Difficulties in creating the parser were similar to those in writing the scanner, and the process was likewise simplified by utilizing ANTLR. Our primary difficulty was getting the grammar file right. We initially thought it was correct, but after testing with the script file we realized some input files were being rejected by the parser when they shouldn't have been, or were being rejected for the wrong reasons. To troubleshoot, we were able to debug the Java program and step through the parse. We realized that our grammar file was slightly incorrect and were able to fix our tests by modifying small portions of the grammar. An example of our errors is the following grammar:

str : 'STRINGLITERAL' ;


The correct grammar is as follows:

str : STRINGLITERAL ;

The correction allowed our parser grammar to match with our lexer grammar correctly. We identified one additional error that was similar to the above. Fixing these errors enabled our parser to correctly validate the 21 test cases.

c. Symbol Table

i. Background

The next step of building our compiler required us to implement a symbol table for the Micro grammar. Writing code for a symbol table can be done many different ways, making it difficult to settle on only one. After researching and testing some various symbol table functionalities we settled on using a linked hashmap to keep track of variable declarations and a stack to keep track of scopes. Since both of these components were provided to us by the Java utility library we didn’t need to write variations of them ourselves. Lab three provided some additional insight into the symbol table construction process. While we didn’t actually end up using the methods outlined by the lab, it gave us some valuable information regarding the overall process. The completed symbol table step would output files that contained either lists of scopes and variables declared under each respective scope, or simply an error line that read:

"DECLARATION ERROR "

Our completed symbol table step produced files that were identical to the expected output files required for this step.

ii. Implementation and Methods

During our research into symbol table implementation with ANTLR, we discovered there was an auto-generated listener file which we could use when writing a symbol table. We created our own listener class which extended the auto-generated listener. This allowed us to know when certain parser rules were being called and ultimately was the basis for recording variable and scope declarations. The auto-generated listener was built using the grammar file we wrote for steps one and two, so all we had to do was figure out which parser rules were called when a new scope or variable was to be added. Once the specific parser rules were identified, we used the @Override annotation to handle symbol table action items such as pushing and popping.

To help with code readability we created three additional Java classes. The SymbolTable class was the single point of access we used when building our symbol table. This class called out to two additional classes we constructed, Symbol and Scope. These classes, as their names suggest, were used to represent the different scopes and variables encountered during the parsing of programs. Our implementation of the symbol table did not print scopes and variable declarations to the output file in real time, which made the error handling easier than it could have been.

The symbol table utilized a linked hashmap instead of a regular hashmap to record symbols. This design choice was made because variable declarations needed to be kept in the order in which they were declared. To keep track of scopes, we simply pushed a scope onto our stack whenever a function, if statement, or while statement was declared and popped a scope whenever the respective end statement was observed. It was assumed that there would always be a 'GLOBAL' scope for every program, so we chose to automatically add this scope first when building our symbol table. Whenever variable declarations were observed, we created a Symbol object for each variable and saved them to the current scope.
We decided to write toString methods for each of our classes: SymbolTable, Symbol and Scope so that a single print call could easily be made from our Micro class.
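The structure described above — a stack of scopes, each backed by a linked hashmap — can be sketched in miniature as follows. This is our own illustration under the design described in this section, not the project's actual SymbolTable code; the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;

// Illustrative sketch only (names are ours): a stack of scopes, each backed by a
// LinkedHashMap so declarations print in the order they were made.
class SymbolTableSketch {
    static class Scope {
        final String name;
        final LinkedHashMap<String, String> symbols = new LinkedHashMap<>(); // id -> type
        Scope(String name) { this.name = name; }
    }

    private final Deque<Scope> scopeStack = new ArrayDeque<>();
    final List<Scope> allScopes = new ArrayList<>();        // remembered for printing

    SymbolTableSketch() { createScope("GLOBAL"); }          // GLOBAL always exists

    void createScope(String name) {
        Scope s = new Scope(name);
        scopeStack.push(s);
        allScopes.add(s);
    }

    void destroyScope() { scopeStack.pop(); }

    // Returns false on a redeclaration in the current scope (a DECLARATION ERROR).
    boolean declare(String id, String type) {
        return scopeStack.peek().symbols.putIfAbsent(id, type) == null;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        table.declare("b", "INT");
        table.declare("a", "FLOAT");
        table.createScope("main");      // e.g. on enterFuncDecl
        table.declare("b", "INT");      // same name is fine in a new scope
        table.destroyScope();           // e.g. on exitFuncDecl
        for (Scope s : table.allScopes) {
            System.out.println("Symbol table " + s.name + " " + s.symbols);
        }
    }
}
```

The LinkedHashMap preserves declaration order for printing, and putIfAbsent gives redeclaration detection for free.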

@Override
public void enterFuncDecl(MicroParser.FuncDeclContext ctx) {
    //Create new scope
    microSymbolTable.createScope(ctx.id().getText());
}

@Override
public void exitFuncDecl(MicroParser.FuncDeclContext ctx) {
    //Pop scope
    destroyScope();
}

Examples of creating and destroying a scope via overriding the superclass method

iii. Difficulties

One of the biggest difficulties we faced when writing our symbol table implementation was determining how best to code it so that it made sense to us. It took a few hours of research before we discovered the auto-generated listener file, which we ended up using as the basis of our symbol table. Once we realized we could use the listener to help us add scopes and declarations, creating the symbol table became a more straightforward task. Another part of this step which required some time was figuring out how the listener file could be used to our benefit. Since ANTLR was a new tool to both of us, a noteworthy amount of time was spent figuring out what built-in functionality ANTLR provided that we were previously unaware of. The research paid off, as we ended up using this functionality as an integral part of our symbol table.

d. Semantic Routines

i. Background

The final step in building our compiler was the generation of Tiny assembly code, i.e., the creation of semantic routines. This step is required so that parsed Micro source code can be converted into Tiny assembly code using intermediate representation code (IR code) as the building blocks. Before we implemented the semantic routines, we performed extensive research on the capabilities of ANTLR and the tools the framework provides. We found that there are multiple ways to implement semantic routines. The first option, almost immediately ruled out, was to implement the semantic routines in the grammar file. Although ANTLR supports this, there are many drawbacks: it makes the grammar harder to read and maintain, and it limits the ability of the grammar file to be reused in other projects.

ANTLR is able to generate two different classes for performing actions after generating a parse tree: the listener and the visitor. Each can perform roughly the same tasks, but they do so in very different ways. Both are essentially event-driven objects that call methods at certain points while walking a parse tree. The difference is that a listener walks the parse tree automatically, while the visitor lets developers control how the tree is walked. Since we used the listener for Step 3, we decided to continue using it for Step 4. This allowed us to extend our work rather than rewriting a major portion of our code. Considering the issues discussed in section 2.d.iii, we think using a visitor might have avoided the problem, since it allows a developer to control how the tree is traversed.

After deciding to continue using a listener, we had to decide how to manage data during the parse tree walk. ANTLR supports three methods of managing data as the tree is walked: stacks, return calls, and node properties. Stacks and node properties are best used with listeners, while the visitor directly implements return calls. Since we used a listener, we were left with either stacks or node properties. We ultimately decided to use stacks for our IR code generation, primarily because of our familiarity with stacks; they were something we were comfortable with and understood well. Node properties have a greater space requirement, since data is stored on each node of the tree.
This didn't impact our decision much, but it's something worth keeping in mind. Our basic design at this point was to implement a class for representing IR code, a class for converting IR code to Tiny, and any helper classes that might be needed. We also identified the methods our Listener class needed to implement at this point. One thing that ANTLR v4 does not provide is the generation of an AST (abstract syntax tree). This is a feature that was removed after ANTLR v3 with the explanation that visitors and listeners, along with labels on parse rules, provide similar features.


ii. Implementation and Methods

We completed step four in two parts: IR code generation and IR-to-Tiny code conversion. IR code generation is completed first and the results are passed on for Tiny code generation. To generate IR code, we relied on two classes: IRNode and Expression. IRNode represents a single line of IR code and contains four data members: _opcode, _operand1, _operand2, and _result. _operand1, _operand2, and _result are each an Expression object. The Expression class is a simple object that stores the name and type of an operand or result; it is a general representation of literals, variables, and temporary variables. One way we kept our code easy to read was to generate type-specific opcodes in the constructor of the IRNode. This allowed the logic in our listener class to be much simpler.

IRNode(String OPCode, Expression OP1, Expression OP2, Expression Result) {
    if (OP1 != null && !OPCode.equals("STR")) {
        switch (OP1.getType()) {
            case "INT":
                this._opcode = OPCode + "I";
                break;
            case "FLOAT":
                this._opcode = OPCode + "R";
                break;
            default:
                this._opcode = OPCode + "S";
                break;
        }
    } else {
        this._opcode = OPCode;
    }
    this._operand1 = OP1;
    this._operand2 = OP2;
    this._result = Result;
}


The IRNode constructor showing how type specific OPCodes are generated.

Other important aspects of our code include a linked list of IRNode called IRNodes, a stack of integers called labelStack, a stack of expressions called expressionStack, and a stack of expression stacks called stackStack. IRNodes is passed to the listener from our main method and is the internal representation of our IR code; passing it in from the main method makes it easy to hand our IRNodes to the Tiny generator. labelStack is a simple stack of integers for keeping track of labels for branches: when we enter a branch, a label is created and pushed onto the stack, and when we exit a branch, a label is popped off. expressionStack is a stack for keeping expressions in memory until they can be executed; the results of expressions are then stored on the same stack. This is an elegant method of calculating expressions as we walk a parse tree. stackStack is a stack of expression stacks. It was a necessary component to solve errors with our code generation, described in detail in section 2.d.iii; its purpose is to store the state of the current expressionStack when we enter a subexpression.

IRNodes is built in the Listener class, which is also used in Step 3. The Listener class is an extension of MicroBaseListener, a file generated by ANTLR for walking parse trees. Listener overrides methods from MicroBaseListener to implement our semantic routines, which allows us to write code only for the methods we care about; the tree is walked automatically by the code ANTLR generates. When the parse tree is completely walked, our IRNodes list contains the representation of our IR code. This is then passed directly to the TinyGenerator class for the generation of Tiny code. TinyGenerator is a simple class containing a constructor and the method generate(), which is the main function for converting IR code to Tiny code.
This utilizes a while loop for going through each item in IRNodes and a switch statement for evaluating each IRNode. The switch statement in TinyGenerator.generate() looks at the opcode of each IRNode to determine how to convert to its equivalent Tiny code.

//Addition


case "ADDI":
case "ADDR":
    tiny.append(String.format("move %s %s\n",
            f1.getName().replace("$T", "r"), f3.getName().replace("$T", "r")));
    tiny.append(String.format("%s %s %s\n", opcode.toLowerCase(),
            f2.getName().replace("$T", "r"), f3.getName().replace("$T", "r")));
    break;

Example of a case statement from TinyGenerator.
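As an aside, the labelStack described earlier can be illustrated with a small, self-contained sketch. The class name and the "labelN" naming scheme here are our own invention, not the project's actual code; the point is only that nested branches close in LIFO order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the labelStack idea (names and label scheme are ours): entering a
// branch pushes a fresh label, exiting pops it, so nested branches close LIFO.
class LabelDemo {
    private final Deque<Integer> labelStack = new ArrayDeque<>();
    final List<String> ir = new ArrayList<>();
    private int nextLabel = 0;

    void enterBranch() {
        labelStack.push(++nextLabel);
        // a real listener would also emit the conditional jump to this label here
    }

    void exitBranch() {
        ir.add("LABEL label" + labelStack.pop());
    }

    public static void main(String[] args) {
        LabelDemo demo = new LabelDemo();
        demo.enterBranch();   // outer IF
        demo.enterBranch();   // nested IF
        demo.exitBranch();    // inner branch ends first
        demo.exitBranch();
        demo.ir.forEach(System.out::println);
    }
}
```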

Altogether, these classes and methods allow our code to compile Micro to the Tiny architecture. We feel our design is easy to read, extendable, and maintainable. It's a good basis for future work such as compiling for another architecture or optimization.

iii. Difficulties

Step 4 proved to be the most difficult part of the project. We faced multiple difficulties in the development of the semantic routines. Our primary difficulty was producing expressions with the correct order of operations. Our original implementation evaluated expressions from right to left rather than left to right, which meant our code generation did not handle the associativity of division and subtraction correctly. If an expression had addition followed by subtraction, the subtraction would be completed before the addition, causing an incorrect result. Multiple solutions were attempted, but none really worked and they were very confusing to understand.

This error required extensive debugging and testing. For testing, we had to generate Tiny code with different expressions and compare the results to what we expected; this allowed us to pinpoint where our code generation was failing. For debugging, we used breakpoints in our IDE to determine whether our code was executing as we expected, and we were also able to use ANTLR's parse tree visualizer to see how an expression would be parsed.

Ultimately, we had to make two changes to our IR code generation to fix this error. First, we had to change when we calculated expressions. Second, we had to change how we handled expressions within expressions. Originally, we attempted to calculate expressions by storing waiting results in a stack and then calculating them as we left an expr parse node. This seemed like an ideal solution, but it left us unable to calculate multiplication and division before addition and subtraction; it would simply calculate from left to right. To fix this, we determined it was best to calculate expressions as we left a factor node instead. This produced the correct order of operations. However, it introduced another problem that we had to solve. When entering an expression surrounded by parentheses, our calculations would sometimes not produce the correct results based on what was saved in the stack. An example of an expression that caused us problems is a + ( b + c ). Our code would calculate this expression as (a + b) + c.

The problem was identified by recognizing how the parse tree was generated for such an expression. Since we executed our expressions on the exit of a factor node, our code would execute on the exit of the factor node after reaching the terminal node containing b. At this point, our stack would contain a +, so it would execute this expression and add the result to our stack. Our solution ended up being inspired by recursive functions. A recursive function typically gets a new stack to serve as memory for subcalls, and our solution was similar: on the enter event of an expr node, we clone our current stack onto a stack of stacks. This removes any awaiting expressions that need to be completed after a subexpression. On the exit event of the expr node, our code pops the stack of stacks and rebuilds the current stack to include the result of the subexpression. If needed, awaiting expressions are then executed. To demonstrate this solution, we can continue to use the example a + (b + c). This generates the following parse tree:


Parse tree generated from a + (b + c)

Our code builds the stack as it reaches terminal nodes containing an ID or operator. The value of the terminal node is pushed onto the stack, and the walk of the tree continues. This results in our stack containing a + on entering the primary:paranths node. Using a feature of ANTLR, we can label primary nodes containing expressions and generate an event unique to this occurrence. This lets us know when we are about to enter an expr node that needs to be calculated before the rest of the stack; at this point, the stack is copied and cleared. Originally, our error occurred because the expressions contained in the stack would be executed on the exit of the factor node after reaching the terminal node containing b. Now, exiting this factor node does not execute any expressions, since the stack doesn't hold enough values. We then continue to execute our expressions as normal unless we encounter another subexpression node. This solution ended up being rather elegant and didn't overly complicate our code; in total, it required about 10 lines of additional code. What made it difficult to solve was the need for a deep understanding of how ANTLR parses the expressions, the grammar, and the resulting IR code.
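The stack-cloning mechanism described above can be modeled in miniature. The sketch below is our illustration, not the project's actual Listener code: all names are made up, only + and - are handled, and the listener events are simulated by direct method calls for the walk of a + ( b + c ).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Miniature model of the expressionStack/stackStack interplay (names are ours).
class SubexprDemo {
    private final Deque<String> exprStack = new ArrayDeque<>();
    private final Deque<Deque<String>> stackStack = new ArrayDeque<>();
    final List<String> ir = new ArrayList<>();
    private int temp = 0;

    void push(String token) { exprStack.push(token); }

    // Entering a parenthesized expression: stash pending tokens, start fresh.
    void enterParen() {
        stackStack.push(new ArrayDeque<>(exprStack));
        exprStack.clear();
    }

    // Exiting: restore the pending tokens beneath the subexpression's result.
    void exitParen() {
        String result = exprStack.pop();     // only the subexpression's temp remains
        exprStack.addAll(stackStack.pop());  // bring back the pending "a +"
        exprStack.push(result);              // result on top, ready to reduce
    }

    // Exiting a factor: reduce once the stack holds operand, operator, operand.
    void exitFactor() {
        if (exprStack.size() >= 3) {
            String right = exprStack.pop();
            String op = exprStack.pop();
            String left = exprStack.pop();
            String t = "$T" + (++temp);
            ir.add((op.equals("+") ? "ADDI " : "SUBI ") + left + " " + right + " " + t);
            exprStack.push(t);
        }
    }

    static List<String> demo() {             // simulated walk of a + ( b + c )
        SubexprDemo d = new SubexprDemo();
        d.push("a"); d.push("+");
        d.enterParen();
        d.push("b"); d.exitFactor();         // too few values: no premature reduce
        d.push("+");
        d.push("c"); d.exitFactor();         // b + c reduces first
        d.exitParen();
        d.exitFactor();                      // then a + $T1
        return d.ir;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Clearing the stack on enterParen is exactly what prevents the premature a + b reduction described above: after the clear, exitFactor sees too few values to reduce until the subexpression's result is available.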


e. Full Fledged Compiler

Our fully functioning compiler is shown in the UML diagram below. MicroParser and MicroLexer are Java classes automatically generated by the ANTLR framework and used during the scanning and parsing phases of execution. Our Main class serves only to create one instance of the Micro class. The Micro class in turn creates our Listener class, an extension of the MicroBaseListener class, to serve as an event recognizer that builds the symbol table and the IRNode data objects. Once these have been constructed, the IRNodes are passed to the TinyGenerator class to produce the final Tiny assembly code.

UML diagram of our project

Throughout this project we were sure to always add informational comments to help each other understand areas of potential confusion. We utilized an object-oriented approach to implementing our compiler, using classes to represent data objects such as Symbols, IRNodes, etc. Our Listener class took advantage of the auto-generated files and used inheritance when recognizing events that took place during program execution. We kept our code neat and avoided hard-coding whenever possible.


As stated above, we implemented version control into our development process via GitHub. This in turn saved us time and effort due to the ease of collaboration it provided.

4. Conclusion and Future Work

This project served as a valuable opportunity to understand exactly how a compiler works. We achieved the end goal of creating a compiler for the Micro language through the integration of the ANTLR framework and the Java programming language. Our solution could successfully compile and execute code with the expected output, but that is not to say we could not have done better. If we were to continue working on the project, there are, in essence, countless improvements to be made. We will discuss what we feel are the most important and achievable areas of improvement.

Since the project was designed around inputs that are generally unchanging, our project had essentially no error handling that would benefit someone actually using our compiler. Throughout our code we implemented various try-catch blocks, but none of these were as informative as they could have been. If we were to continue work on this project, implementing better, more informative error handling that points users at exactly what went wrong would be one area of improvement.

Another feature we did not have time to properly implement is optimization of the Tiny assembly code. Optimization can be done in various ways, and one of the most obvious for this project is common subexpression elimination. Common subexpression elimination works by keeping track of expressions that have already been computed and checking whether their results can be reused when constructing new assembly code. Consider the following:

A = B + C;
D = B + C;

Since the expression "B + C" has already been calculated and stored in A, common subexpression elimination substitutes the value stored in A wherever "B + C" appears. Performing common subexpression elimination on the above code snippet would produce the following optimized code:

A = B + C;
D = A;
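A minimal local version of this pass can be sketched as follows. The class and method names here (LocalCSE, rewrite) are illustrative only, and the sketch assumes simple three-address statements of the form "result = op1 OP op2" rather than our actual IRNode representation.

```java
import java.util.HashMap;
import java.util.Map;

// Local common subexpression elimination over simple three-address
// statements within one basic block.
class LocalCSE {
    // Maps an expression key like "B + C" to the name already holding it.
    private final Map<String, String> available = new HashMap<>();

    // Returns the statement, rewritten to reuse a prior result if possible.
    String rewrite(String result, String op1, String operator, String op2) {
        String key = op1 + " " + operator + " " + op2;
        String prior = available.get(key);
        if (prior != null) {
            // Expression already computed: copy the earlier result instead.
            return result + " = " + prior + ";";
        }
        available.put(key, result);
        return result + " = " + key + ";";
    }
}
```

One caveat worth noting: a real pass must also invalidate an entry whenever one of its operands (or the variable holding the result) is reassigned, since the cached value would then be stale.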


The final area of improvement that we will discuss is register management. In today's world of computation, where memory is not generally a scarce resource, register management and allocation is not an extremely serious issue. Regardless of memory capacity, however, it is still good practice to implement some form of register management. There were many cases where, after performing an arithmetic operation, the assembly code would copy the result twice, such as:

addr r53 r54
move r54 r55
move r55 t

If we were to continue making our compiler more efficient, we would look at ways to eliminate this double register assignment. These are only a few of the countless areas that we thought would make our compiler more efficient. Of course, there will always be improvements, as better and more efficient ways of doing things are waiting to be discovered. Overall, our compiler functions very well considering this was our first implementation.
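One simple way to attack the redundant copies above is a peephole pass that collapses chained moves. This is a sketch under stated assumptions, not part of our compiler: the class name (MovePeephole) is hypothetical, instructions are represented as string arrays like {"move", "r54", "r55"}, and it only handles the pattern "move a b; move b c" where b is not used again (a real pass would need liveness information to confirm that).

```java
import java.util.ArrayList;
import java.util.List;

// Peephole pass collapsing chained register copies in Tiny-style code.
class MovePeephole {
    static List<String[]> collapse(List<String[]> code) {
        List<String[]> out = new ArrayList<>();
        for (String[] instr : code) {
            String[] prev = out.isEmpty() ? null : out.get(out.size() - 1);
            if (prev != null
                    && prev[0].equals("move") && instr[0].equals("move")
                    && prev[2].equals(instr[1])) {
                // Rewrite "move a b; move b c" as "move a c",
                // assuming b has no later uses.
                out.set(out.size() - 1, new String[]{"move", prev[1], instr[2]});
            } else {
                out.add(instr);
            }
        }
        return out;
    }
}
```

Applied to the snippet above, the two moves through r55 would fold into a single "move r54 t".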