Bali++ Language Specifications

CS212

Fall 2006

1 Introduction

1.1 Bali Specifications

As discussed in lecture, we can specify a language in terms of its syntax (spelling and structural rules) and semantics (meaning). Although formal description methods exist for both kinds of specifications, we will only formalize the Bali syntax, as developed in Section 2. Bali’s semantics are informally defined in Section 3.

1.2 Notation

We use the following notation throughout this document:

• Specific language elements, like keywords, operators, and punctuation, are shown as bold.

• Other terminals that can have a range of values, such as integers and characters, are shown as bolditalic.

• Non-terminals in production rules are shown as plain red.

• Square brackets are used to denote optional items. [ int ] indicates that the keyword int can be present but is not required.

• A single | denotes that either the item on the left or the right can be present. So a | b indicates that either a or b must be present, but not both.

• Parentheses are used to options. Thus, ( a | b ) indicates that c must be prefixed by either a or b. Square brackets can also be used for grouping with the difference that the group would be optional.

• An (*) is used to indicate zero or more occurrences of this item and a plus (+) is used to indicate one or more occurrences of this item.

• An arrow (→) indicates a production rule, which means can be expressed as.

(please refer also to the lecture notes for further information regarding regular expressions)

1 2 Bali Syntax

2.1 Program and Functions

program →struct*[globals varBlock ;] func* program can have global variables and functions func → funcHeader ( varBlock ; )* statement function has header, optional local variables, and a statement funcHeader → type name ( [ varBlock ] ) header has return type, function name, and optional parameters struct → structdef name { ( varBlock ; )* } a struct has a name and member variables

2.2 Variables and Naming

varBlock → type name ( , type name )* one or more variable definitions type → ( int | boolean | char | void* | struct name*)[*]* primitive types, pointers and struct name → (a-z|A-Z)(a-z|A-Z|_|0-9)* names must begin with a letter

2.3 Statements statement → { statement* } block of statements statement → if ( expression ) then statement [ else statement ] conditional statement statement → while ( expression ) statement while loop statement → expression; expression statement statement → print expression; output statement → return expression; return statement

2 2.4 Expressions

expression → expPart [ binop expPart ] basic expression expression → expPart := expression assignment expression → expPart ? expPart : expPart ternary/conditional operator expPart → unaryop expPart unary expPart → int | boolean |’char’ | ”string” | null constant expPart → readInt() integer input expPart → readChar() character input expPart → malloc(expression) memory allocation expPart → free(expression) memory deallocation expPart → expPart type cast expPart → sizeof(type | struct name) size of operator expPart → name variable access expPart → name ( [ expression ( , expression )* ] ) function call expPart → ( expression ) nested expression binop → arithmeticOp | comparisonOp | booleanOp | structOp binary operators arithmeticOp → + |-| * | / | % operators comparisonOp → == | > | < comparison operators booleanOp → && | || boolean operators (both are short-circuiting) unaryOp → - | ! | & | * unary operators structOp → -> member access operator

3 Bali Semantics

The following section details the semantics of the Bali language. If the following restrictions are violated, the compiler must throw a SemanticException (as opposed to a SyntaxException for violations of syntax above).

3.1 Reserved Keywords

The following keywords may not be used as variable or function names: globals, int, boolean, char, if, then, else, while, print, return, readInt, readChar, true, false, malloc, free, sizeof, null, struct, structdef.

3.2 Namespaces

A namespace is a set of unique identifiers (names). The following namespaces exist in Bali:

• global namespace (global variables, struct names, and functions)

• local namespaces (arguments and local variables of a function)

• member namespace (the members of a struct)

Furthermore, the following semantic rules apply to these namespaces

• Declaring two or more symbols with the same name within the same namespace is not allowed.

3 • Symbols with the same name may be declared in different namespaces.

• Global variables, structs, struct members and functions can be accessed anywhere where it is syntacticaly valid (even before they are declared). Local and argument variables can only be accessed within the function in which they are defined. It is semantically invalid to access a function, struct, member or variable that has not been defined in the program.

• If a variable access occurs in a function and its name was declared both in that function’s local variable names- pace and in the global name space, that variable access refers to the local variable not the global variable.

3.3 Expressions

Bali is a strongly typed language, and as such enforces restrictions on how expressions may be combined. Each expression has a type, and each operator will only work with specific types. These typing rules are given bellow.

Binary Operators

• Boolean operators accept and produce booleans.

• The equality comparison operator, ==, accepts two of the same type and produces a boolean result. All other comparison operators accept only integers and produce a boolean result.

• Arithmetic operators accept integers and produce an integers with the following exceptions

– The operator allows the addition of an integer and a pointer. The pointer can be on either side of the operator. The result is a pointer of the same type as the input pointer with the value of the input pointer plus the input integer times the sizeof(input pointer type). – The subtraction operator allows the subtraction of an integer from a pointer. A pointer cannot be subtracted from an integer. The result is a pointer of the same type as the input pointer with the value of the input pointer minus the input integer times the sizeof(input pointer type). – The subtraction operator allows the subtraction of one pointer from another pointer. Both must be of the same type. The result is an integer and is the difference of the address between the two pointers divided by sizeof(input pointer type).

• Division is integral for integers.

• Binary operations should be evaluated starting with the expression on the left and then the expression on the right. So for a + b, a should be evaluated first, then b, and then their results should be added.

• The operator -> is used to access a member of a struct. The left hand side must be of a struct pointer type while the right hand side must be a name which is a valid member of that struct. The result of the operator is of type equal to the type of member accessed.

Assignment Operator

The left side of an assignment operator must be a member access, a variable, or a dererenced pointer. (A member access is single binary expression, potentially contained within parentheses, where the operator is member access operator.) If the left hand side is a member access or a variable the right hand side must be of the same type as the

4 member being accessed or the variable. In these cases the assignment operator assigns the value of the right hand side to the member being access or the variable on the left hand side. If the left hand side is a dereferenced pointer (a pointer being operated on by the *) the assignment expression assigns the right side to the address the pointer is pointing to. A dereferenced pointer of type T-pointer on the left side of the expression must be accompanied by an expression of type T on the right side.

Ternary/Conditional Operator

The ternary operator is of the form ex1 ? ex2 : ex3. ex1 must be of type boolean. ex2 and ex3 must be of the same type T. Furthermore, the operator produces a result of T. If ex1 is true, then the operator evaluates only ex2. Otherwise, only ex3 is evaluated.

Unary Operators

The unary operator “negate” accepts and produces only integers and the unary operator “not” accepts and produces only booleans. The & reference operator takes any variable or member access with type T and produces a pointer to it with type T-pointer. The * dereference operator works on a value with type T-pointer and returns the value pointed at as type T. Void-pointer cannot be dereferenced, and pointers to a struct type with a single level of indirection cannot be dereferenced (for instance, if there is a struct foo, then pointers of type foo** can be dereferenced, but pointers of type foo* cannot be dereferenced).

Constants

Any Java integer is a constant of type int. The keywords true and false are constants of type boolean. Any Java character delimited by single quotes (’) is a constant of type char. The null keyword is a constant of type void*, which represents the memory address 0. Strings are defined to be a ’\0’ terminated array of characters, automatically allocated on the heap. Strings must be freed (by the bali programmer – not by your compiler) after use. Note that SaM string instructions null-terminate the string. String constants are of type char*.

Casts

The type of expressions that can be cast is restricted to pointers. The cast expression part changes the type of the input to the one specified in the cast. Any pointer type can be cast to any other pointer type. However, casts between non-pointers and pointers (as well as two non-pointer types) are not allowed.

Variables and Function Calls

• The built in function readInt requests an integer from the user and returns an integer. readChar requests a character from the user and returns a character.

• The malloc function allocates a block of memory of the provided size (which must be an integer) and returns a pointer of type void*.

• The free function deallocates a block of memory, determined by the argument passed. The argument passed to free should be a pointer (of any type) to the start of a memory block that has been previously returned by a call to malloc. The return value of free is always the integer 0.

5 • The sizeof built in function takes as input a type (either primitive or a struct), and returns its integer size minus the number of memory locations it takes up. Notice that here it is valid to use a struct name without a pointer. The typical way this works is: struct str_name * var = malloc(sizeof(struct str_name)); Note that the size of all pointer and primitive types is one. The size of a struct is the number of variables (ie: member fields) that struct holds.

• Variables have the type that they were defined to have.

• Function calls have the type of the return type of the function being called. Furthermore, function arguments should be evaluated from left to right. (For example, in the call “foo(x + 1,y + 2)” first “x + 1” should be evaluated and secondly y + 2 should be evaluated.)

3.4 Statements

• while statements require an expression of boolean type and have the same looping functionality as in Java.

• if statements require an expression of boolean type and have the same conditional functionality as in Java (just the added then keyword).

• A print statement requires an expression of integer or character type and then prints the output to SaM’s console.

• A return statement occuring within a particular function requires an expression of that function’s return type.

• An expression statement can have an expression with any return type, since the return value is eliminated.

3.5 Functions

• All functions contain an implicit return statement at the end of the function that returns 0 for integers, ’\0’ for characters, false for booleans and null casted to the retern type for pointers. Thus a function will return something even if no specific return statement is encountered.

• There must be one function called main with a return type of int and no parameters. This function will be called on startup.

6