Introduction

This chapter introduces the fundamental ChapterChapter 55 semantic issues of variables. – It covers the nature of names and special words in programming languages, attributes of Variables: variables, concepts of binding and binding times. – It investigates type checking, strong typing and Names, Bindings, type compatibility rules. – At the end it discusses named constraints and Type Checking and variable initialization techniques.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 1 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 2

On Names Names • Humans are the only species to have the concept of a name. There are some design issues involving names: • It’s given philosophers, writers and computer scientists lots to do for many years. – The logician Frege’s famous morning star and evening – Maximum length? star example. – What characters are allowed? – "What's in a name? That which we call a rose by any – Are names case sensitive? other word would smell as sweet." -- Romeo and Juliet, – Are special words reserved words or keywords? W. Shakespeare – Do names determine or suggest attributes – The semantic web proposes using URLs as names for everything. • In programming languages, names are character strings used to refer to program entities – e.g., labels, procedures, parameters, storage locations, etc.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 3 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 4 Names: length limitations? Names: What characters are allowed?

• Some examples: •Typical scheme: – FORTRAN I: maximum 6 – Names must start with an alpha and contain a mixture – COBOL: maximum 30 of alpha, digits and connectors – FORTRAN 90 and ANSI : maximum 31 – Some languages (e.g., Lisp) are even more relaxed – C++: no limit, but implementers often impose one •Connectors: – Java, Lisp, , Ada: no limit, and all are significant – Connectors might include , hyphen, dot, … • The trend has been for programming languages to allow - Fortran 90 allowed spaces as connectors longer or unlimited length names - Languages with infix operators typically only allow _, • Is this always good? reserving ., -, + as operators – What are some advantages and disadvantages of very - LISP: first-name long names? - C: first_name -“Camel notation” is popular in C, Java, C#... - Using upper case characters to break up a long name, e.g. firstName

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 5 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 6

Names: case sensitivity Special words

• Foo = foo? •Def: A keyword is a word that is special only in certain • The first languages only had upper case (why?) contexts • Case sensitivity was probably introduced by Unix and – Virtually all programming languages have keywords hence C (when?) •Def: A reserved word is a special word that cannot be used • Disadvantage: as a user-defined name • Poor readability, since names that look alike to a human are different; worse in Modula-2 because – Some PLs have reserved words and others do not predefined names are mixed case (e.g. WriteCard) – PLs try to minimize the number of reserved words • Advantages: • For languages which use keywords rather than reserved • Larger namespace, ability to use case to signify words classes of variables (e.g., make constants be in – Disadvantage: poor readability thru possible confusion uppercase) about the meaning of symbols • C, C++, Java, and Modula-2 names are case sensitive but – Advantage: flexibility – the programmer has fewer the names in many other languages are not constraints on choice of names

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 7 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 8 Reserved words in C Reserved words in Programming languages try to minimize the number of Programming languages try to minimize the number of reserved words. Here are all of C’s reserved words. reserved words. Here are all of Lisp’s reserved words.

auto enum signed break extern sizeof case float static char for struct const switch continue if typedef default int union do long unsigned double register void else Return volatile entry short while

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 9 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 10

Names: implied attributes Examples of implied attributes • LISP: global variables begin and end with an asterisk, e.g., • We often use conventions that associate attributes (if (> t *time-out-in-seconds*) (blue-screen)) with name patterns. • JAVA: class names begin with an upper case character, field • Some of these are just style conventions while others and method names with a lower case character are part of the language spec and are used by for(Student s : theClass) s.setGrade(“A”); compilers • PERL: scalar variable names begin with a $, arrays with @, and hashtables with %, subroutines begin with a &. $d1 = “Monday”; $d2=“Wednesday”; $d3= “Friday”; @days = ($d1, $d2, $d3); • FORTRAN: default variable type is float unless it begins with one of {I,J,K,L,M,N} in which case it’s integer DO 100 I = 1, N 100 ISUM = ISUM + I

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 11 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 12 Variables Attributes of Variables •A variable is an abstraction of a memory cell Variables can be characterized as a • Can represent complex data structures with lots of 6-tuple of attributes: structure (e.g., records, arrays, objects, etc.) –Name: identifier used to refer to the variable –Address: memory location(s) holding the variables value –Value: particular value at a moment

current_student –Type: range of possible values –Lifetime: when the variable can be accessed –Scope: where in the program it can be accessed

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 13 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 14

Variables: names and addresses A variable which shall remain nameless • Many languages allow one to create new data structures with • Name - not all variables have them! only a pointer to refer to them. This is like a variable that does not really have a name (“that thing over there”) , e.g.: • Address - the memory address with which it is Int *intNode; . . . associated intNode = new int; . . . – A variable may have different addresses at Delete intNode; different times during execution • Some languages (e.g., shell scripts) assume you reference – A variable may have different addresses at parameters not with names but by position (e.g., $1, $2) different places in a program • Some languages don’t have named variables at all! • If two variable names can be used to access the – E.g., if every function takes exactly one argument, we can same memory location, they are called aliases dispense with the variable name – Aliases are harmful to readability, but they are – Arguments that naturally take several arguments (e.g., +) can be reduced to functions that take only a single useful under certain circumstances argument. More on this when we talk of lisp, maybe.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 15 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 16 Aliases Variables Type • When two names refer to the same memory • Typing is a rich subject we will talk about later. location we can them aliases. • Most languages associate a variable with a single • Aliases can be created in many ways, depend- type ing on the language. • The type determines the range of values and the • Pointers, reference variables, Pascal variant records, call by operations allowed for a variable name, Prolog’s unification, C and C++ unions, and • In the case of floating point, type usually also Fortran’s equivalence statemenr determines the precision (e.g., float vs. double) • Aliases can be trouble. (how?) • In some languages (e.g., Lisp, Prolog) a variable can • Some of the original justifications for aliases are no longer take on values of any type. valid; e.g. memory reuse in FORTRAN • OO languages (e.g. Java) have a few primitive types • Aliases can be powerful. (int, float, char) and everything else is a pointer to an • Prolog’s unification offers new paradigms object.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 17 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 18

Variable Value lvalue and rvalue Are the two occurrences of “a” in this expression • The value is the contents of the memory the same? location with which the variable is associated. a := a + 1; • Think of an abstract memory location, rather In a sense, than a physical one. • The one on the left of the assignment refers to the – Abstract memory cell - the physical cell or location of the variable whose name is a; collection of cells associated with a variable • The one on the right of the assignment refers to the • A variable’s type will determine how the bits value of the variable whose name is a; in the cell are interpreted to produce a value. We sometimes speak of a variable’s lvalue and • We sometimes talk about lvalues and rvalues. rvalue • The lvalue of a variable is its address • The rvalue of a variable is its value

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 19 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 20 Binding Possible binding times • Def: A binding is an association, such as • Language design time, e.g., bind operator between an attribute and an entity, or between symbols to operations an operation and a • Language implementation time, e.g., bind • It’s like assignment, but more general floating point type to a representation • We often talk of binding –a variable to a value, as in • Compile time, e.g., bind a variable to a type in classSize is bound to the number of C or Java students • Link time –A symbol to an operator • Load time, e.g., bind a FORTRAN 77 variable + is bound to the inner product operation to memory cell (or a C static variable) • Def: Binding time is the time at which a • Runtime, e.g., bind a nonstatic local variable to binding takes place. a memory cell

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 21 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 22

Type Bindings Variable Declarations •Def: A binding is static if it occurs before run •Def: An explicit declaration is a program time and remains unchanged throughout statement used for declaring the types of program execution. variables •Def: A binding is dynamic if it occurs during execution or can change during execution of •Def: An implicit declaration is a default the program. mechanism for specifying types of variables •Type binding issues include: (the first appearance of the variable in the • How is a type specified? program) • When does the binding take place? • If static, type may be specified by either explicit or an implicit declarations

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 23 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 24 Implicit Variable Declarations Dynamic Type Binding • Some examples of implicit type declarations • With dynamic binding, a variable’s type can – In C undeclared variables are assumed to be of type int change as the program runs and might be re- – In Perl, variables of type scalar, array and hash begin bound on every assignment. with a $, @ or %, respectively. • Used in scripting languages (Javascript, PHP) and – Fortran variables beginning with I-N are assumed to be some older languages (Lisp, Basic, Prolog, APL) of type integer. • In this APL example LIST is first a vector of – ML (and other languages) use sophisticated type integers and then of floats: inference mechanisms LIST <- 2 4 6 8 • Advantages and disadvantages LIST <- 17.3 23.5 • Here’s a javascript example – Advantages: writability, convenience – Disadvantages: reliability – requiring explicit type list = [2, 4.33, 6, 8]; declarations catches bugs revealed by type mis-matches list = 17.3;

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 25 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 26

Dynamic Type Binding Type Inferencing

• The advantages of dynamic typing include • Type Inferencing is used in some programming languages, including ML, Miranda, and Haskell. – Flexibility for the programmer • Types are determined from the context of the – Obviates the need for “polymorphic” types reference, rather than just by assignment statement. – Development of generic functions (e.g. sort) • The compiler can trace how values flow through • But there are disadvantages as well variables and function arguments – Types have to be constantly checked at run time • The result is that the types of most variables can be deduced! – A compiler can’t detect errors via type mis- – Any remaining ambiguity can be treated as an error the matches programmer must fix by adding explicit declarations • Mostly used by scripting languages today • Many feel it combines the advantages of dynamic typing and static typing

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 27 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 28 Type Inferencing in ML Storage Bindings and Lifetime • Storage Bindings fun circumf(r) = 3.14159*r*r; // infer r is real • Allocation - getting a cell from some pool of fun time10(x) = 10*x; // infer r is integer fun square(x) = x*x; // can’t deduce types available cells // default type is int • Deallocation - putting a cell back into the pool We can explicitly type in several ways and enable the compiler to • Def: The lifetime of a variable is the time during deduce that the function returns a real and takes a real argument. which it is bound to a particular memory cell • Categories of variables by lifetimes fun square(x):real = x*x; •Static fun square(x:real) = x*x; • Stack dynamic fun square(x) = x:real*x; • Explicit heap dynamic fun square(x) = x*x:real; • Implicit heap dynamic

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 29 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 30

Static Variables Static Dynamic Variables

• Static variables are bound to memory cells • Stack-dynamic variables -- Storage bindings are before execution begins and remains bound to created for variables when their declaration statements the same memory cell throughout execution. are elaborated. • If scalar, all attributes except address are statically bound • Examples: – e.g. local variables in Pascal and C subprograms • All FORTRAN 77 variables • Advantages: • C static variables – allows recursion – conserves storage •Disadvantages: Advantage: efficiency (direct addressing), – Overhead of allocation and deallocation history-sensitive subprogram support – Subprograms cannot be history sensitive Disadvantage: lack of flexibility, no recursion! – Inefficient references (indirect addressing)

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 31 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 32 Explicit heap-dynamic Implicit heap-dynamic

Explicit heap-dynamic variables are allocated and Implicit heap-dynamic variables -- Allocation and deallocated by explicit directives, specified by the deallocation caused by assignment statements and programmer, which take effect during execution • Referenced only through pointers or references types not determined until assignment. • e.g. dynamic objects in C++ (via new and delete), all objects in Java e.g. all variables in APL Advantage: Advantage: provides for dynamic storage management – flexibility Disadvantage: inefficient and unreliable Disadvantages: Example: – Inefficient, because all attributes are dynamic int *intnode; . . . – Loss of error detection intnode = new int; . . . delete intnode;

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 33 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 34

Type Checking Strong Typing Generalize the concept of operands and operators to A programming language is strongly typed if include subprograms and assignments • type errors are always detected • Type checking is the activity of ensuring that the operands of an • There is strict enforcement of type rules with no operator are of compatible types exceptions. •A compatible type is one that is either legal for the operator, or • All types are known at compile time, i.e. are is allowed under language rules to be implicitly converted, by statically bound. compiler-generated code, to a legal type. • This automatic conversion is called a coercion. • With variables that can store values of more than •Atype error is the application of an operator to an operand of one type, incorrect type usage can be detected at an inappropriate type run-time. •Note: • Strong typing catches more errors at compile time If all type bindings are static, nearly all checking can be static than weak typing, resulting in fewer run-time If type bindings are dynamic, type checking must be dynamic exceptions.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 35 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 36 Which languages have strong typing? Type Compatibility

Type compatibility by name means the two variables have • Fortran 77 isn’t because it doesn’t check parameters and compatible types if they are in either the same declaration or in because of variable equivalence statements. declarations that use the same type name • The languages Ada, Java, and Haskell are strongly typed. • Pascal is (almost) strongly typed, but variant records • Easy to implement but highly restrictive: screw it up. • Subranges of integer types aren’t compatible with integer types • C and C++ are sometimes described as strongly typed, • Formal parameters must be the same type as their but are perhaps better described as weakly typed because corresponding actual parameters (Pascal) parameter type checking can be avoided and unions are not type checked Type compatibility by structure means that two variables have • Coercion rules strongly affect strong typing—they can compatible types if their types have identical structures weaken it considerably (C++ versus Ada) • More flexible, but harder to implement

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 37 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 38

Type Compatibility Type Compatibility Language examples Consider the problem of two structured types. Pascal: usually structure, but in some cases name is used Suppose they are circularly defined (formal parameters)

• Are two record types compatible if they are structurally C: structure, except for records the same but use different field names? • Are two array types compatible if they are the same except Ada: restricted form of name that the subscripts are different? (e.g. [1..10] and [-5..4]) – Derived types allow types with the same structure to • Are two enumeration types compatible if their be different components are spelled differently? – Anonymous types are all unique, even in: A, B : array (1..10) of INTEGER: With structural type compatibility, you cannot differentiate between types of the same structure (e.g. different units of speed, both float)

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 39 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 40 Variable Scope Static Scope • Aka “lexical scope” • The scope of a variable is the range of statements in • Based on program text and can be determined prior a program over which it’s visible to execution (e.g., at compile time) • Typical cases: • To connect a name reference to a variable, you (or • Explicitly declared => local variables the compiler) must find the declaration • Explicitly passed to a subprogram => parameters • Search process: search declarations, first locally, • The nonlocal variables of a program unit are those then in increasingly larger enclosing scopes, until that are visible but not declared. one is found for the given name • Global variables => visible everywhere. • Enclosing static scopes (to a specific scope) are • The scope rules of a language determine how called its static ancestors; the nearest static references to names are associated with variables. ancestor is called a static parent • The two major schemes are static scoping and dynamic scoping

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 41 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 42

Blocks Examples of Blocks

• A block is a section of code in which local C and C++: Common Lisp: variables are allocated/deallocated at the for (...) { int index; start/end of the block. (let ((a 1) ... • Provides a method of creating static scopes } (b foo) inside program units (c)) • Introduced by ALGOL 60 and found in Ada: (setq a (* a a)) declare LCL:FLOAT; (bar a b c)) most PLs. begin • Variables can be hidden from a unit by ... having a "closer" variable with same name end C++ and Ada allow access to these "hidden" variables

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 43 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 44 Static scoping example Evaluation of Static Scoping

MAIN Suppose the spec is changed so that D must now MAIN access some data in B A MAIN calls A and B C A B A calls C and D Solutions: D B calls A and E C D E 1. Put D in B (but then C can no longer call it and D cannot B access A's variables) E 2. Move the data from B that D needs to MAIN (but then all procedures can access them) MAIN MAIN A B A B Same problem for procedure access!

C D E C D E Overall: static scoping often encourages many globals

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 45 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 46

Dynamic Scope Static vs. dynamic scope • Based on calling sequences of program units, not Define MAIN MAIN calls SUB1 their textual layout (temporal versus spatial) declare x SUB1 calls SUB2 • References to variables are connected to Define SUB1 SUB2 uses x declare x declarations by searching back through the chain of ... subprogram calls that forced execution to this point call SUB2 ... • Used in APL, Snobol and LISP – Note that these languages were all (initially) implemented as Define SUB2 • Static scoping - reference to x interpreters rather than compilers. ... reference x is to MAIN's x • Consensus is that PLs with dynamic scoping leads ... to programs which are difficult to read and ... • Dynamic scoping - reference call SUB1 to x is to SUB1's x maintain. ... – Lisp switch to using static scoping as it’s default circa 1980, though dynamic scoping is still possible as an option.

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 47 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 48 Dynamic Scoping Scope vs. Lifetime

• While these two issues seem related, Evaluation of Dynamic Scoping: they can differ • Advantage: convenience • Disadvantage: poor readability • In Pascal, the scope of a local variable and the lifetime of a local variable seem the same • In C/C++, a local variable in a function might be declared static but its lifetime extends over the entire execution of the program and therefore, even though it is inaccessible, it is still in memory

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 49 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 50

Referencing Environments Named Constants •A named constant is a variable that is bound to a •The referencing environment of a statement is the value only when it is bound to storage. collection of all names that are visible in the • The value of a named constant can’t be changed statement while the program is running. • In a static scoped language, that is the local variables plus all of the visible variables in all of • The binding of values to named constants can be the enclosing scopes. See book example (p. 184) either static (called manifest constants) or dynamic • A subprogram is active if its execution has begun • Languages: but has not yet terminated Pascal: literals only • In a dynamic-scoped language, the referencing Modula-2 and FORTRAN 90: constant-valued expressions environment is the local variables plus all visible Ada, C++, and Java: expressions of any kind variables in all active subprograms. • Advantages: increased readability and modifiability without loss of efficiency

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 51 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 52 Example in Pascal Variable Initialization

Procedure example; Procedure example; • For convenience, variable initialization can type a1[1..100] of integer; type const MAX 100; a2[1..100] of real; a1[1..MAX] of integer; occur prior to execution ... a2[1..MAX] of real; begin ... • FORTRAN: Integer Sum ... begin Data Sum /0/ for I := 1 to 100 do ... begin ... end; for I := 1 to MAX do ... begin ... end; • Ada: Sum : Integer := 0; for j := 1 to 100 do ... begin ... end; for j := 1 to MAX do • ALGOL 68: int first := 10; ... begin ... end; avg = sum div 100; ... • Java: int num = 5; ... avg = sum div MAX; ... • LISP (Let (x y (z 10) (sum 0) ) ... )

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 53 CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 54

Summary In this chapter, we see the following concepts being described • Variable Naming, Aliases • Binding and Lifetimes •Type variables • Scoping • Referencing environments • Named Constants • Type Compatibility Rules

CMSC331. Some material © 1998 by Addison Wesley Longman, Inc. 55