COS 301 Programming Languages

Sebesta Chapter 5
Names, Binding and Scope

The beginning of wisdom is to call things by their right names
(Chinese Proverb)

Topics

• Names
• Variables
• The Concept of Binding
• Scope and Lifetime
• Referencing Environments
• Named Constants

Abstraction and the Von Neumann Machine

• Imperative languages are abstractions of the von Neumann architecture
  – Variables are an abstraction for memory, which is a one-dimensional array of cells
  – Some abstractions are close to the machine, for example an n-bit integer on an n-bit machine
  – Others require a mapping function, such as arrays of two or more dimensions
• Variables are characterized by attributes
  – One of the most important is the type
  – To design a type, one must consider scope, lifetime, type checking, initialization, and type compatibility

Language groups

• We will consider a broad division of imperative and imperative/OO languages in discussing names, because the treatment of names falls broadly into two classes:
  – C-Like: C, C++, Java, etc. (AKA the curly-brace languages)
  – Pascal-Like: Pascal, Ada, Modula, etc.
• Other major language families
  – Fortran
  – Scripting
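The mapping function mentioned under "Abstraction and the Von Neumann Machine" can be sketched in C. This is only an illustration, assuming row-major layout (which C uses for its built-in arrays); the names ROWS, COLS, row_major_offset and mapping_matches_compiler are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 3   /* hypothetical dimensions for the illustration */
#define COLS 4

/* Map a (row, col) pair onto an offset into a one-dimensional array of cells. */
size_t row_major_offset(size_t row, size_t col) {
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Check that the hand-written mapping agrees with the layout the compiler
   uses for a real two-dimensional array. Returns 1 on agreement, 0 otherwise. */
int mapping_matches_compiler(void) {
    static int grid[ROWS][COLS];
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            /* Byte distance of grid[r][c] from the start of the whole array. */
            ptrdiff_t bytes = (char *)&grid[r][c] - (char *)grid;
            if ((size_t)bytes != row_major_offset(r, c) * sizeof(int))
                return 0;
        }
    }
    return 1;
}
```

An n-bit integer needs no such function because it maps directly onto one machine word; the two-dimensional array needs the multiplication and addition shown above on every access.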

Names

• The terms name and identifier are roughly synonymous
  – Names are associated with variables, subprograms, modules, classes, parameters, types and other constructs
• Design issues for names:
  – Allowable length
  – Are names case sensitive?
  – Are special words reserved words or keywords?

Names - Length

• Length
  – If too short, they cannot be connotative
  – Language examples:
    • FORTRAN 95: maximum of 31
    • C99: no limit, but only the first 63 are significant; also, external names are limited to a maximum of 31
    • C#, Ada, and Java: no limit, and all are significant
    • C++: no limit, but implementers often impose one

Special Characters

• Most languages allow a few special characters in names (_, !, @, $, etc.)
• Some languages require them
  – PHP: all variable names must begin with $. Names without $ are constants
  – Perl: all variable names begin with $, @, or %, which specify the variable's type
  – Ruby: variable names that begin with @ are instance variables; those that begin with @@ are class variables
• Underscores
  – Many languages allow underscores (and possibly a few other characters)
  – Some languages allow variable names to contain only underscores
    • You can write a complete program with variables _, __, ___, _____, etc.
  – Often used to separate words in meaningful variable names: ex. grand_total_score
  – The way that names are formed is often part of a language culture
    • GrandTotalScore (camel case), grand_total_score
    • GRANDTOTALSCORE, GRAND_TOTAL_SCORE
    • In C one convention is to use uppercase names for constants, lower or mixed case names for variables

Special Characters

• Languages typically allow only a few characters other than A-Z, a-z, 0-9, with syntactic restrictions
  – Cobol allows hyphens in names (what sort of parsing problem does this present?)
  – C allows unrestricted use of the underscore
  – Fortran prior to Fortran 90 allowed embedded spaces
    • Grand Total Score is the same as GrandTotalScore

Implicit Typing

• Many BASIC dialects use type declaration characters: foo$ is a string, foo! is a single precision float
• In FORTRAN 77 variables beginning with I, J, K, L, M, or N are implicitly integers; otherwise REAL is assumed
  – COUNT is a real
  – KOUNT is an integer
  – The IMPLICIT statement allows modifications to this rule
• QuickBasic has DEFINT, DEFSNG, etc. statements to define implicit typing

Character Case

• Case sensitivity
  – Languages that allow only upper-case are less readable; no longer an issue
  – Case sensitive languages have a readability issue: names that look alike are different
    • Names in the C-based languages are case sensitive
    • Names in many others are not
    • PHP has case-sensitive variable names, case-insensitive function names
  – Case sensitive languages that do not require variable declarations also have a writability issue: it's too easy to accidentally create a new variable

Keywords and Reserved Words

• Keywords
  – An aid to readability; used to delimit or separate statement clauses
• A keyword is a word that is special only in certain contexts, e.g., in Fortran
  – Real VarName (Real is a data type followed by a name, therefore Real is a keyword)
  – In Fortran this is legal:
      Real Integer
      Integer Real
  – Pascal: const true = false
  – PL/I had no reserved words!
      If if = then then else = if else if = else

Keywords and Reserved Words

• Reserved Words
  – A reserved word is a special word that cannot be used as a user-defined name
  – Potential problem with reserved words: if there are too many, many collisions occur (e.g., COBOL has 300 reserved words)

The Concept of Binding

• A binding is an association between an entity (such as a variable) and a property (such as its value).
  – A binding is static if the association occurs before run-time.
  – A binding is dynamic if the association occurs at run-time.
• Static binding is used by most compiled languages
• Interpreted languages often delay name resolution until runtime

Variables

• A variable is an abstraction of a memory cell
• Variables can be characterized as a tuple of attributes:
  – Name
  – Address
  – Value
  – Type
  – Lifetime
  – Scope

Variable Names

• Name - not all variables have them
  – Those that do not are typically "heap-dynamic" variables
• A variable name is a binding of a name to a memory address

Variable Addresses

• Address - the memory address with which it is associated
  – A variable may have different addresses at different times during execution
  – A variable may have different addresses at different places in a program
  – If two variable names can be used to access the same memory location, they are called aliases
  – Aliases are created via pointers, reference variables, and C and C++ unions
• Aliases decrease readability (program readers must remember all of them)

Variable Type

• Type - determines the range of values of variables and the set of operations that are defined for values of that type
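The two C alias mechanisms listed under "Variable Addresses" (pointers and unions) can be illustrated with a short sketch. The function and type names here are hypothetical, picked only for this example.

```c
#include <assert.h>

/* Alias via a pointer: *alias and total name the same memory cell. */
int alias_through_pointer(void) {
    int total = 10;
    int *alias = &total;
    assert(alias == &total);   /* both names refer to one address */
    *alias = 42;               /* write through the alias ... */
    return total;              /* ... and read the change under the other name */
}

/* Alias via a union: both members share the same storage. */
union cell {
    unsigned int as_unsigned;
    unsigned char as_bytes[sizeof(unsigned int)];
};

/* Returns 1 if a write through as_bytes is visible through as_unsigned. */
int alias_through_union(void) {
    union cell c;
    c.as_unsigned = 0;
    c.as_bytes[0] = 1;         /* changes storage that as_unsigned also occupies */
    return c.as_unsigned != 0;
}
```

The readability cost is visible even here: nothing in the statement `*alias = 42;` mentions `total`, yet `total` changes.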

Variable Value

• The contents of the memory cell associated with the variable
  – Note that "memory cell" is an abstract concept
  – Double-precision floats may occupy 8 bytes of physical storage in most implementations, but we think of them as occupying 1 memory cell

L-values and R-values

• A distinction in the use of variable names first introduced in Algol 68
• L-value - use of a variable name to denote its address.
• R-value - use of a variable name to denote its value.
• Ex: x = x + 1
  – Store into memory at the address of x the value of x plus one.
• On the LHS x denotes an address while on the RHS it denotes a value

Explicit Dereferencing and Pointers

• Some languages support/require explicit dereferencing.
    ML: x := !y + 1
    C:  x = *y + 1
• C pointers
    int x, y;
    int *p;      /* p is a pointer to int */
    x = *p;
    *p = y;
• Note that C is liberal about whitespace:
    x *y;    /* y is a pointer to type x */
    x * y;   /* same as above */

The Concept of Binding

• A binding is an association, such as between an attribute and an entity, or between an operation and a symbol
• Binding time is the time at which a binding takes place.

Possible Binding Times

• Language design time -- bind operator symbols to operations
• Language implementation time -- bind floating point type to a representation
• Compile time -- bind a variable to a type in C or Java
• Load time -- bind a C or C++ static variable to a memory cell
• Runtime -- bind a nonstatic local variable to a memory cell

Example: Java Assignment Statement

• In the simple statement
    count = count + 1;
• Some of the bindings and binding times are
  – Type of count is bound at compile time
  – Set of possible values is bound at design time (but in C, at implementation time)
  – Meaning of + is bound at compile time, when types of operands are known
  – Internal representation of the literal 1 is bound at compiler design time
  – Value of count is bound at runtime with this statement

Static and Dynamic Binding

• A binding is static if it first occurs before run time and remains unchanged throughout program execution.
• A binding is dynamic if it first occurs during execution or can change during execution of the program

Type Binding

• How is a type specified?
• When does the binding take place?
• If static, the type may be specified by either an explicit or an implicit declaration

Explicit/Implicit Declaration

• An explicit declaration is a program statement used for declaring the types of variables
• An implicit declaration is a default mechanism for specifying types of variables (the first appearance of the variable in the program)
• FORTRAN and BASIC provide implicit declarations
  – Advantage: writability
  – Disadvantage: reliability
• Vestiges in modern languages:
  – Fortran: Implicit None
  – Visual Basic: Option Explicit

Dynamic Type Binding

• Perl symbols $, @, % divide variables into three namespaces: scalars, arrays and hashes
  – Within each namespace type binding is dynamic
• JavaScript and PHP use dynamic type binding
• Specified through an assignment statement, e.g., JavaScript
    list = [2, 4.33, 6, 8];
    list = 17.3;
  – Advantage: flexibility (generic program units)
  – Disadvantages:
    • High cost (dynamic type checking and interpretation)
    • Type error detection by the compiler is difficult

Cost of Dynamic Type Binding

• Type checking at run time
• Every variable needs a runtime descriptor to maintain current type
• Storage for variable is of varying size
  – Implying heap storage and runtime garbage collection
• Languages with dynamic type binding are usually implemented as pure interpreters
  – Cost of interpretation may hide cost of dynamic type checking

Type Inferencing

• In ML types are determined (by the compiler) from the context of the reference
    fun area(r) = 3.14159 * r * r;
• Here the compiler can deduce type real and produce a real result
    fun cube(x) = x * x * x;
• Here the compiler will deduce type int.
  – The default numeric type is int.
  – If we call cube with a real number, the compiler rejects the call with a type error.
      cube(3.14159);
• Solution
    fun cube(x) : real = x * x * x;
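The runtime descriptor from "Cost of Dynamic Type Binding" can be sketched in C as a tagged union. This is a minimal illustration, not how any particular interpreter is implemented; the names type_tag, dyn_value, make_int, make_real and dyn_mul are hypothetical.

```c
#include <assert.h>

/* The descriptor: a tag recording the current type of the value. */
enum type_tag { TAG_INT, TAG_REAL };

struct dyn_value {
    enum type_tag tag;               /* consulted at run time on every operation */
    union { long i; double r; } as;  /* storage shared by the possible types */
};

struct dyn_value make_int(long i) {
    struct dyn_value v;
    v.tag = TAG_INT;
    v.as.i = i;
    return v;
}

struct dyn_value make_real(double r) {
    struct dyn_value v;
    v.tag = TAG_REAL;
    v.as.r = r;
    return v;
}

/* Multiplication has to inspect both tags before it can pick an operation. */
struct dyn_value dyn_mul(struct dyn_value a, struct dyn_value b) {
    assert(a.tag == TAG_INT || a.tag == TAG_REAL);
    assert(b.tag == TAG_INT || b.tag == TAG_REAL);
    if (a.tag == TAG_INT && b.tag == TAG_INT)
        return make_int(a.as.i * b.as.i);
    double x = (a.tag == TAG_INT) ? (double)a.as.i : a.as.r;
    double y = (b.tag == TAG_INT) ? (double)b.as.i : b.as.r;
    return make_real(x * y);
}
```

Rebinding a variable to a value of another type, as in the JavaScript `list` example, corresponds here to overwriting both the tag and the payload. The tag checks in `dyn_mul` are the run-time type checking cost that a statically typed language pays once, at compile time.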

Storage Bindings and Lifetime

• Allocation - allocating memory from some pool of available memory
• Deallocation - putting a cell back into the pool
• The lifetime of a variable begins when memory is allocated and ends when memory is deallocated

Four categories of lifetime

• For scalar variables
  – Static
  – Stack-dynamic
  – Explicit heap-dynamic
  – Implicit heap-dynamic

Static Variables

• Static variables are bound to addresses before execution begins and remain bound until the program terminates
  – Most global variables are static
  – Some languages such as C/C++ support local static variables
• Advantages:
  – efficiency (direct addressing), history-sensitive subprogram support
• Disadvantage: lack of flexibility
  – A language with ONLY static variables does not support recursion

The static Keyword

• In C and C++ a variable declared as static inside a function retains its value over function invocations:
    int hitcount() {
        static int count = 0;
        return ++count;
    }
• But inside C#, C++ and Java class definitions the static modifier means that the variable is a class variable rather than an instance variable

Global Variables in Java

• Java does not allow declaration of variables outside of a class, so C/C++ style globals cannot be used
• Solution is to use public static variables in a class
    public class GlobalData {
        public static int usercount = 0;
        public static long hitcount = 0;
    }

Stack Dynamic Variables

• Storage bindings are created for variables when their declaration statements are elaborated.
  – A declaration is elaborated when the executable code associated with it is executed
• For scalar variables, all attributes except address are statically bound
  – local variables in C subprograms and Java methods
  – Note that while the actual memory address is not statically bound, a relative address (stack offset) is statically bound
• Advantage: allows recursion; conserves storage
• Disadvantages:
  – Overhead of allocation and deallocation
  – Subprograms cannot be history sensitive
  – Inefficient references (indirect addressing)

Explicit Heap Dynamic Variables

• Explicit heap-dynamic -- Allocated and deallocated by explicit directives, specified by the programmer, which take effect during execution
• Referenced only through pointers or references, e.g. dynamic objects in C++ (via new and delete), all objects in Java
    int *intnode;        // C++
    intnode = new int;
    . . .
    delete intnode;
• With explicit heap dynamic variables we do not need to incur the overhead of garbage collection

Explicit Heap Dynamic Variables

• The main disadvantages are
  – Failure to deallocate storage results in memory leaks
  – Other difficulties in using variables correctly (dangling pointers, aliasing)
  – Heap fragmentation
• C# provides implicit deallocation
• C++ style pointers can be used, but the header of any method that defines a pointer must include the keyword unsafe
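The C counterpart of the C++ new/delete fragment under "Explicit Heap Dynamic Variables" uses malloc and free. The sketch below uses hypothetical function names (make_intnode, use_intnode) and, as a simplification, treats allocation failure as fatal.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate one int on the heap. The variable itself has no name;
   it is reachable only through the returned pointer. */
int *make_intnode(int initial) {
    int *intnode = malloc(sizeof *intnode);
    if (intnode != NULL)
        *intnode = initial;
    return intnode;
}

/* The heap variable outlives make_intnode and lasts until free is called. */
int use_intnode(void) {
    int *intnode = make_intnode(7);
    assert(intnode != NULL);   /* sketch only: allocation failure treated as fatal */
    *intnode += 1;
    int result = *intnode;
    free(intnode);             /* explicit deallocation ends the lifetime */
    intnode = NULL;            /* do not keep a dangling pointer around */
    return result;
}
```

Omitting the `free` call would produce the memory leak listed among the disadvantages, and using `intnode` after `free` without resetting it would be a dangling-pointer error.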

Implicit Heap Dynamic Variables

• Implicit heap-dynamic: bound to heap storage by assignment statements
  – All attributes are bound at this time
  – all variables in APL; all strings and arrays in Perl, JavaScript, and PHP
• Names are in effect just holders for pointers
• Advantage: flexibility (generic code)
• Disadvantages:
  – Inefficient, because all attributes are dynamic
  – Loss of error detection

Variable Attributes: Scope

• The scope of a variable is the range of statements over which it is visible
• The nonlocal variables of a program unit are those that are visible but not declared there
• The scope rules of a language determine how references to names are associated with variables
• A common source of error is accidental modification of a non-local variable

Static scope

• In static scoping, a name is bound to a collection of statements according to its position in the source program.
• Also called lexical scoping - based on grammatical structure of the program
• Algol 60 introduced nested scoping, including nested functions and a begin-end block of code that could include declarations and functions
• Static scope is determined at compile time and does not vary with the execution history of a program
• Static scope is used by most modern languages

Static Scope – non local names

• To connect a name reference to a variable, you (or the compiler) must find the declaration
• Search process: search declarations, first locally, then in increasingly larger enclosing scopes, until one is found for the given name
• Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest static ancestor is called a static parent

Nested and Disjoint Scopes

• Name collision becomes increasingly important as programs and memories become larger
  – Names with limited scope can be reused elsewhere
• Two different scopes are either nested or disjoint.
• In disjoint scopes, the same name can be bound to different entities without interference.
• Nested scopes are like nested loops: one scope is contained within another
• What constitutes a unit of scope?

Lexical Units of Scope

            Algol    C        Java     Ada
Package     n/a      n/a      yes      yes
Class       n/a      n/a      nested   yes
Function    nested   yes      yes      nested
Block       nested   nested   nested   nested
For loop    no       no       yes      automatic

• This table is not definitive
• Note that the compilation unit always constitutes a scope in addition to the above units

Nested Subprograms

• Some languages allow nested subprogram definitions, which create nested static scopes (e.g., Ada, JavaScript, Fortran 2003, Pascal and PHP)
• Varying levels of support in JavaScript, ActionScript, Perl, Ruby, Python
• In other languages nested scopes can still be created:
  – Java and C++ in nested classes and in blocks
  – C in blocks
• Considered a useful tool for encapsulation and isolation of concerns by limiting visibility
  – The OO paradigm provided a different mechanism for writing functions with limited visibility

Nested Subprograms (Pascal)

    function Foo(i: integer): integer;
      function Bar(x: integer): integer;
      begin
        Bar := i * x
      end;
    begin
      Foo := Bar(42)
    end;

Name hiding

• In nested scopes, variables in an outer scope can be hidden from an inner scope by declaring a variable with the same name
• Ada allows access to these "hidden" variables
  – E.g., unit.name

Ada Example

    procedure Main is
      x : Integer;
      procedure p1 is
        x : Float;
        procedure p2 is
        begin
          ... x ...
        end p2;
      begin
        ... x ...
      end p1;
      procedure p3 is
      begin
        ... x ...
      end p3;
    begin
      ... x ...
    end Main;

• References to x in Main and p3 refer to the integer x declared in Main
• References in p1 and p2 refer to the float x declared in p1
• But we could use explicit Main.x in p1 or p2 to refer to the integer variable

Block Scope

• A method of creating static scopes inside program units--from ALGOL 60
• Example in C:
    void sub() {
        int count;
        while (...) {
            int count;
            count++;
            ...
        }
        ...
    }
• Java allows block scope but does not allow variables in the immediately enclosing scope to be redeclared in the block

Declaration Order

• C99, C++, Java, VB.NET and C# allow variable declarations to appear anywhere a statement can appear
  – In C99, C++, and Java, the scope of all local variables is from the declaration to the end of the block
  – In C#, the scope of any variable declared in a block is the whole block, regardless of the position of the declaration in the block
    • However, a variable still must be declared before it can be used
• Other languages require all variables to be declared before any executable statements
  – Which is more readable? Writable?
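A runnable variant of the "Block Scope" example may help make the hiding concrete. The sketch below adds a file-scope variable and a bounded loop, neither of which appears in the slide; the function name read_global is hypothetical.

```c
#include <assert.h>

int count = 100;                   /* file scope */

int sub(void) {
    int count = 0;                 /* hides the file-scope count */
    for (int i = 0; i < 3; i++) {
        int count = 10;            /* hides sub's count inside this block */
        count++;                   /* updates only the innermost count */
        assert(count == 11);       /* re-created and re-initialized each iteration */
    }
    return count;                  /* sub's own count, untouched by the loop body */
}

int read_global(void) {
    return count;                  /* no local declaration, so this is the file-scope count */
}
```

Each reference to `count` resolves to the nearest enclosing declaration, which is the static-scope search described earlier. The equivalent inner redeclaration would be rejected by a Java compiler.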

For Loop Control Variables

• In C++, Java, VB.NET, and C#, for loop control variables can be declared in for statements
  – The scope of such variables is restricted to the for construct
    for (int k = 0; k < j; k++) {
        . . .
    }
• Ada provides automatic block scope for loop control variables in for statements

Global Scope

• C, C++, PHP, and Python support a program structure that consists of a sequence of function definitions in a file
  – These languages allow variable declarations to appear outside function definitions
  – Variables declared outside of any function are usually allocated in static storage
• C and C++ have both declarations (just attributes) and definitions (attributes and storage)
  – A declaration using extern specifies definition in another file
      extern int c;

Global Scope - PHP

– Programs are embedded in XHTML markup documents, in any number of fragments, some statements and some function definitions
– The scope of a variable (implicitly) declared in a function is local to the function
– The scope of a variable implicitly declared outside functions is from the declaration to the end of the program, but skips over any intervening functions
– PHP is unusual in that global variables are not implicitly visible to functions
• Global variables can be accessed in a function through the $GLOBALS array or by declaring them global
• Because PHP does not require variable declaration, a reference inside a function to a global's name is normally treated as an implicit declaration of a new local variable that hides the global

PHP example

    function DoMySQLiQuery($sql, $result_type = MYSQLI_BOTH) {
        global $mysqli;   // without this, $mysqli would be a new, empty local
        $result = mysqli_query($mysqli, $sql);
        if (mysqli_errno($mysqli))
            die(mysqli_error($mysqli));
        $phpResult = array();
        while ($row = mysqli_fetch_array($result, $result_type)) {
            $phpResult[] = $row;
        }
        mysqli_free_result($result);
        if (mysqli_errno($mysqli))
            die(mysqli_error($mysqli));
        return $phpResult;
    }

Global Scope - Python

• A global variable can be referenced in functions, but can be assigned in a function only if it has been declared to be global in the function

Evaluation of Static Scoping

• Works well in many situations
• Problems:
  – In some cases, too much access is possible
  – As a program evolves, the initial structure is destroyed and local variables often become global; subprograms also gravitate toward becoming global, rather than nested
  – Use of global variables is often frowned upon because of unexpected side effects and changes
• But sometimes global variables are the most efficient solution to a problem (ex. )

Dynamic Scope

• Based on calling sequences of program units, not their textual layout (temporal versus spatial)
• References to variables are connected to declarations by searching back through the chain of subprogram calls that forced execution to this point

Scope Example

    Big
      - declaration of X
      Sub1
        - declaration of X -
        ...
        call Sub2
        ...
      Sub2
        ...
        - reference to X -
        ...
      ...
      call Sub1
      ...

• Big calls Sub1
• Sub1 calls Sub2
• Sub2 uses X

Scope Example

• Static scoping
  – Reference to X is to Big's X
• Dynamic scoping
  – Reference to X is to Sub1's X
• Evaluation of Dynamic Scoping:
  – Advantage: convenience
  – Disadvantages:
    1. While a subprogram is executing, its variables are visible to all subprograms it calls
    2. Impossible to statically type check
    3. Poor readability - it is not possible to statically determine the type of a variable

Scope and Lifetime

• The lifetime of a variable is the time interval during which the variable has been allocated a block of memory.
• Early languages such as Fortran and Cobol used static (compile time) allocation.
  – Memory was allocated in a global memory area
  – Use of static global memory for function parameters and return addresses means that recursive functions cannot exist
  – All variables existed for the duration of the program
• Memory management was the programmer's responsibility
  – Early machines had very limited memory space, e.g., IBM 1130 32KB; IBM 360 64KB
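C itself is statically scoped, but the dynamic-scoping lookup from the Big/Sub1/Sub2 "Scope Example" can be emulated with an explicit stack of bindings. The sketch below is an illustration under that assumption; the binding stack and all function names are hypothetical, and the values 1 and 2 simply stand for "Big's X" and "Sub1's X".

```c
#include <assert.h>
#include <string.h>

#define MAX_BINDINGS 16

struct binding { const char *name; int value; };

static struct binding stack[MAX_BINDINGS];   /* one entry per active declaration */
static int top = 0;

void push_binding(const char *name, int value) {
    assert(top < MAX_BINDINGS);
    stack[top].name = name;
    stack[top].value = value;
    top++;
}

void pop_binding(void) {
    assert(top > 0);
    top--;
}

/* Dynamic scoping: search back through the active declarations,
   most recent first. Returns -1 if the name is not bound. */
int lookup_dynamic(const char *name) {
    for (int i = top - 1; i >= 0; i--)
        if (strcmp(stack[i].name, name) == 0)
            return stack[i].value;
    return -1;
}

int sub2(void) { return lookup_dynamic("X"); }          /* Sub2 uses X */

int sub1(void) {                                        /* Sub1 declares X, calls Sub2 */
    push_binding("X", 2);
    int result = sub2();
    pop_binding();
    return result;
}

int big_calls_sub1(void) {                              /* Big declares X, calls Sub1 */
    push_binding("X", 1);
    int result = sub1();
    pop_binding();
    return result;
}

int big_calls_sub2(void) {                              /* Big declares X, calls Sub2 directly */
    push_binding("X", 1);
    int result = sub2();
    pop_binding();
    return result;
}
```

The same reference in `sub2` finds Sub1's X on one call path and Big's X on the other, which is why the meaning of X cannot be determined by reading `sub2` alone. Under static scoping the reference would always resolve to Big's X.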

Dynamic stack variables

• Algol introduced the notion that memory should be allocated/deallocated at scope entry/exit.
• This allows recursive functions to exist because the local memory for the function is allocated when the function is invoked
• Almost all languages use a stack for function local memory
  – Structure is often called a "stack frame"
  – Contains parameters, return addresses, local or automatic variables, pointers to stack frames for caller and/or outer scope
• For dynamic stack variables, scope == lifetime

When Scope != Lifetime

• With static allocation a variable never "forgets" its value, even variables declared within a function scope
• With dynamic allocation variables are created and destroyed as the program runs.
  – Memory allocated on function entry is returned on exit and will be overwritten
• Most languages provide mechanisms that can be used to break the scope equals lifetime rule.
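The claim under "Dynamic stack variables" that each invocation gets its own local memory can be observed in C. The sketch below is illustrative only; the distinct_frames counter and the extra caller_local parameter are hypothetical bookkeeping added for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts invocations whose local variable occupies a different cell
   than the local variable of the invocation that called them. */
int distinct_frames = 0;

unsigned long factorial(int n, const int *caller_local) {
    assert(n >= 0 && n <= 12);      /* 12! still fits in 32 bits */
    int local = n;                  /* stack-dynamic: allocated on entry to this call */
    if (caller_local != NULL && caller_local != &local)
        distinct_frames++;          /* the caller's frame is still live, so the cells differ */
    if (local <= 1)
        return 1UL;
    return (unsigned long)local * factorial(local - 1, &local);
}
```

Calling `factorial(5, NULL)` creates five live copies of `local` at the deepest point of the recursion. With only static allocation there would be a single `local`, and each recursive call would overwrite the value its caller still needs.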

Counting function invocations

• It is sometimes handy to know how often a particular function has been called:
    double func() {
        count++;
        . . .
    }
• But if count is declared inside func it will be recreated with every invocation
• So we do this:
    double func() {
        static int count = 0;
        count++;
        . . .
    }
• Note that count is initialized to 0 during compilation (not runtime) and is never reinitialized

Referencing Environments

• The referencing environment of a statement is the collection of all names that are visible in the statement
• In a static-scoped language, it is the local variables plus all of the visible variables in all of the enclosing scopes
• A subprogram is active if its execution has begun but has not yet terminated
• In a dynamic-scoped language, the referencing environment is the local variables plus all visible variables in all active subprograms
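The two versions of func under "Counting function invocations" can be compared side by side. In this sketch the function names counted and uncounted are hypothetical, and each function returns its counter so the difference is observable.

```c
#include <assert.h>
#include <limits.h>

/* Static local: scope is this function, lifetime is the whole program. */
int counted(void) {
    static int count = 0;      /* initialized once, before the first call */
    assert(count < INT_MAX);   /* guard against signed overflow */
    count++;
    return count;
}

/* Stack-dynamic local: scope and lifetime are both a single call. */
int uncounted(void) {
    int count = 0;             /* re-created and re-initialized on every call */
    count++;
    return count;
}
```

Successive calls to `counted` return 1, 2, 3, and so on, while every call to `uncounted` returns 1. The static version breaks the "scope equals lifetime" rule while keeping the name invisible outside the function.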

Example

    procedure Example is
      A, B : Integer;
      ...
      procedure Sub1 is
        X, Y : Integer;
      begin -- of Sub1
        ... <------1
      end; -- of Sub1
      procedure Sub2 is
        X, Z : Integer;
        procedure Sub3 is
          X : Integer;
        begin -- of Sub3
          ... <------2
        end; -- of Sub3
      begin -- of Sub2
        ... <------3
      end; -- of Sub2
    begin -- of Example
      ... <------4
    end; -- of Example

• Referencing Environments
• At point 1: X and Y of Sub1, A and B of Example
• At point 2: X of Sub3 (X of Sub2 is hidden), Z of Sub2, A and B of Example
• At point 3: X and Z of Sub2, A and B of Example
• At point 4: A and B of Example

Dynamic Scope Example

    void sub1(){
        int a,b;
        ... <------1
    } // end sub1
    void sub2(){
        int b,c;
        ... <------2
        sub1();
    } // end sub2
    void main(){
        int c,d;
        ... <------3
        sub2();
    } // end main

• Referencing Environments
• At point 1: a and b of sub1, c of sub2, d of main (c of main and b of sub2 are hidden)
• At point 2: b and c of sub2, d of main (c of main is hidden)
• At point 3: c and d of main

Named Constants

• A named constant is a variable that is bound to a value only when it is bound to storage
• Advantages: readability, parameterization and modifiability
  – It's more readable to write pi than 3.14159
  – It's easier to modify #define question_count 129 than to change the magic number 129 wherever it occurs
  – This provides a crude form of parameterization
• The binding of values to named constants can be either static (called manifest constants) or dynamic

Named Constants

• Named constants in some languages must be declared with constant valued expressions
  – Fortran 95, C, C++ (with #define)
• Storage need not be allocated for such constants
• Other languages allow dynamic binding of values to named constants
    const int elementcount = rows * columns;
• Ada, C++, and Java: expressions of any kind
• C# has two kinds, readonly and const
  – The values of const named constants are bound at compile time
  – The values of readonly named constants are dynamically bound
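Both kinds of binding described under "Named Constants" can be shown in C. The sketch below reuses the question_count and elementcount names from the slides; the answers array and the two function names are hypothetical additions for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Manifest constant: the value is bound before the program runs, and no
   storage needs to be allocated for the name itself. */
#define QUESTION_COUNT 129

/* The macro can be used where C requires a constant expression. */
static int answers[QUESTION_COUNT];

size_t answer_capacity(void) {
    return sizeof answers / sizeof answers[0];
}

/* Dynamically bound constant: the value is computed at run time when the
   variable is created, and cannot be changed afterwards. */
int element_count(int rows, int columns) {
    assert(rows >= 0 && columns >= 0);
    const int elementcount = rows * columns;
    return elementcount;
}
```

Changing the number of questions means editing one line, the `#define`, rather than every occurrence of 129. The `const` local gets a possibly different value on each call, yet an assignment to it after initialization is a compile-time error.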

Initialized Data

• It is often convenient to initialize variables with known values at compile-time
• Many languages allow initial values to be specified in a variable declaration:
    int x = 0;
    int c[5] = {10,20,30,40,50};
    int * foo = c; /* foo is an alias of c */
• Statically allocated data can only be initialized with literal values or expressions that can be evaluated before runtime
• In compiled languages, initialized data becomes part of the executable image
