
More on Type Checking

Type checking done at compile time is said to be static type checking. Type checking done at run time is said to be dynamic type checking. Dynamic type checking is usually performed immediately before the execution of a particular operation, and it is usually implemented by storing a type tag in each data object that indicates the type of the object. Dynamically type-checked languages include SNOBOL4, LISP, APL, Ruby, and Python. Dynamic type checking is more often used in interpreted languages, whereas static type checking is used in compiled languages. Statically type-checked languages include Java, ML, PL/I, and Haskell.
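The type-tag idea can be seen directly in Python, where every object carries its type at run time. The sketch below (plain Python, with an illustrative helper named checked_add) inspects the tag and checks it immediately before an operation, just as a dynamic checker would.

```python
# Every Python object carries a type tag that can be inspected at run time.
x = 42
y = "hello"
print(type(x).__name__)  # int
print(type(y).__name__)  # str

# A dynamic checker consults the tags immediately before each operation:
def checked_add(a, b):
    # Reject mismatched operand types, as a dynamic type check would.
    if type(a) is not type(b):
        raise TypeError(f"cannot add {type(a).__name__} and {type(b).__name__}")
    return a + b

print(checked_add(2, 3))  # 5
```

Note that the check happens every time checked_add runs, which is exactly the run-time cost discussed later in this section.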

Static and Dynamic Type Checking

The choice between static and dynamic typing requires some trade-offs. Many programmers strongly favor one over the other, some to the point of considering languages that follow the disfavored system to be unusable or crippled.

Static typing finds type errors reliably and at compile time. This should increase the reliability of the delivered program. However, programmers disagree over how common type errors are, and thus over what proportion of the bugs written would be caught by static typing. Static typing advocates believe programs are more reliable when they have been type-checked, while dynamic typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing, then, presumably increases as the strength of the type system is increased. Advocates of languages such as ML and Haskell have suggested that almost all bugs can be considered type errors, if the types used in a program are sufficiently well declared by the programmer or inferred by the compiler.

Static typing usually results in compiled code that executes more quickly. When the compiler knows the exact data types that are in use, it can produce machine code that just does the right thing. Further, compilers for statically typed languages can find optimization shortcuts more easily. Some dynamically typed languages offer optional type declarations for optimization for this very reason. Static typing makes such optimization pervasive.

Statically-typed languages which lack type inference – such as Java – require that programmers declare the types they intend a method or function to use. This can serve as additional documentation for the program, which the compiler will not permit the programmer to ignore or let drift out of synchronization. However, a language can be statically typed without requiring declarations, so this is not a necessary consequence of static typing.

Static typing allows construction of libraries which are less likely to be accidentally misused by their users. This can be used as an additional mechanism for communicating the intentions of the library developer.

A static type system constrains the use of powerful language constructs more than it constrains less powerful ones. This makes powerful constructs harder to use, and thus places the burden of choosing the "right tool for the problem" on the shoulders of the programmer, who might otherwise be inclined to use the most powerful tool available. Choosing overly powerful tools may cause additional performance, reliability or correctness problems, because there are theoretical limits on the properties that can be expected from powerful language constructs. For example, indiscriminate use of recursion or global variables may cause well-documented adverse effects.

Dynamic typing allows constructs that would be illegal in some static type systems. For example, eval functions that execute arbitrary data as code are possible (however, the typing within that evaluated code might be static). Furthermore, dynamic typing accommodates transitional code and prototyping, such as allowing a string to be used in place of a data structure.
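Python's built-in eval illustrates the kind of construct dynamic typing permits: data assembled at run time is executed as code, something a purely static checker could not verify ahead of time. A minimal sketch:

```python
# Build an expression as ordinary string data at run time...
expr = "2 ** 10"

# ...then execute it as code. No static checker ever saw this expression.
result = eval(expr)
print(result)  # 1024
```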

Dynamic typing allows debuggers to be more functional; in particular, the debugger can modify the code arbitrarily and let the program continue to run. Programmers in dynamic languages sometimes "program in the debugger" and thus have a shorter edit-compile-test-debug cycle. However, the need to use debuggers is sometimes considered a sign of design or development process problems.

Dynamic typing allows compilers to run more quickly, since there is less checking to perform and less code to revisit when something changes. This, too, may shrink the edit-compile-test-debug cycle.

For programmers, dynamic typing can make bugs hard to find: because data types are checked only at the moment an operation executes, operations on execution paths that are never taken are never checked. In general, not all possible execution paths can be covered during program testing, so an untested path may still contain argument type errors. These errors may appear only much later, during use of the program, when some unsuspecting user provides input data that takes the program down an untested path.
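This failure mode is easy to reproduce in a dynamically checked language such as Python: in the sketch below, a type error hides on one branch of a function and surfaces only when input finally steers execution onto it.

```python
def describe(n):
    if n >= 0:
        return "non-negative"
    else:
        # Bug: adding a number to a string. This line fails only
        # if execution actually reaches it.
        return "negative: " + n

# Testing only the first path reports no error:
print(describe(5))  # non-negative

# Much later, an unlucky input takes the untested path:
try:
    describe(-3)
except TypeError as e:
    print("latent bug found at run time:", e)
```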

Dynamic type checking requires that type information be kept for each data object during program execution. The extra storage required can be substantial. Dynamic type checking must ordinarily be implemented in software, since the underlying hardware seldom provides support. Since the checking must be done before each execution of each operation, the speed of execution of the program is likely to be greatly slowed.
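How such a software implementation might look can be sketched with an explicitly tagged value (the names Tagged and add are illustrative, not taken from any real runtime): every value carries a tag field, which costs storage, and every operation begins with a tag comparison, which costs time.

```python
from dataclasses import dataclass

@dataclass
class Tagged:
    tag: str         # extra storage: the type tag travels with every value
    payload: object

def add(a: Tagged, b: Tagged) -> Tagged:
    # Extra time: the tags are checked before every single operation.
    if a.tag != "int" or b.tag != "int":
        raise TypeError("add expects two int-tagged values")
    return Tagged("int", a.payload + b.payload)

print(add(Tagged("int", 2), Tagged("int", 3)).payload)  # 5
```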

Static type checking covers all operations that appear in any program statement. Thus, all possible execution paths are checked, and further testing for type errors is not needed. Therefore, type tags on data objects at run time are not required and no dynamic type checking is needed; the result is a substantial gain in efficiency of storage use and execution speed.

Strong and Weak Typing

A strongly typed language does not allow an operation to succeed on arguments which are of the wrong type. An example of the absence of strong typing is a C cast gone wrong; if you cast a value in C, not only is the compiler required to allow the code, but the runtime is expected to allow it as well. This allows C code to be compact and fast, but it can make debugging more difficult.
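Python itself will not reinterpret memory, but its standard struct module can imitate what an unchecked C cast does: the same four bytes are read back under a different type, and the runtime does not object. (The bit pattern 0x3F800000 used below happens to be the IEEE-754 encoding of 1.0.)

```python
import struct

# Pack the integer 0x3F800000 into four little-endian bytes...
raw = struct.pack("<i", 0x3F800000)

# ...and reinterpret those same bytes as a 32-bit float, C-cast style.
value, = struct.unpack("<f", raw)
print(value)  # 1.0
```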

Sometimes the term safe language is used more generally for languages that do not allow nonsense to occur. For example, a safe language will also check array bounds which can only be done dynamically.
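Array bounds checking is one such dynamic safety check. In Python, an out-of-range index raises an error at run time instead of silently reading adjacent memory, as unchecked indexing in C may do:

```python
a = [10, 20, 30]
print(a[2])   # 30 -- in range

try:
    a[3]      # out of range: the access is checked dynamically
except IndexError as e:
    print("caught:", e)
```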

Weak typing means that types are implicitly converted (or cast) when they are used. Example:

    var x = 5;     // (1)
    var y = "hi";  // (2)
    x + y;         // (3)

If the code above were written in a weakly typed language, such as Visual Basic, it would run properly, yielding the result "5hi". The number 5 is converted to the string "5" to make sense of the operation. There are problems with such conversions in weakly typed languages, though. For example, would the result of the following code be 9 or "54"?

    var x = 5;
    var y = "4";
    x + y;
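For contrast, Python is dynamically but strongly typed: it refuses the ambiguous mixed-type addition outright and makes the programmer state which conversion is meant.

```python
x = 5
y = "4"

# The ambiguous operation is rejected, not silently converted:
try:
    x + y
except TypeError:
    print("refused: no implicit conversion")

# The programmer must say which result is intended:
print(x + int(y))  # 9  (numeric addition)
print(str(x) + y)  # 54 (string concatenation)
```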

Many say that weak typing gets programmers into bad habits because it doesn't teach them to use explicit type conversion. C and Perl are weakly typed; Java, Lisp, and so on are strongly typed.

Type Conversion and Coercion

If, during type checking, a mismatch occurs between the actual type of an argument and the expected type for that operation, then two options arise:

1) The type mismatch may be flagged as an error, and an appropriate error action taken, or
2) A coercion (or implicit type conversion) may be applied to change the type of the actual argument to the correct type.

Most languages provide type conversions in two ways:

1) As a set of built-in functions that the programmer may explicitly invoke to effect the conversion. For example, C provides the function atoi() that converts a string object to an integer data object.
2) As coercions invoked automatically in certain cases of type mismatch. In Pascal, if the arguments for an arithmetic operation such as "+" are of mixed real and integer types, the integer data object is implicitly converted to type real before the addition is performed.

A type conversion operation may require extensive change in the run-time storage representation of the data object. Coercions are an important issue in most languages. In Pascal and Ada, almost no coercions are provided; any type mismatch, with few exceptions, is considered an error. In C, Java, PL/I and COBOL, coercions are the rule: a type mismatch causes the compiler to search for an appropriate conversion operation to insert into the compiled code to provide the appropriate change of type. Only if no conversion is possible is the mismatch flagged as an error.
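Python exhibits both mechanisms: the built-in int() plays the role of an explicit conversion function like C's atoi(), while mixed int/float arithmetic is coerced automatically, much as in Pascal.

```python
# Explicit conversion, invoked by the programmer (the analogue of atoi()):
n = int("42")
print(n + 1)  # 43

# Implicit coercion: the int operand is converted to float automatically.
result = 1 + 0.5
print(result)                 # 1.5
print(type(result).__name__)  # float
```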

Type Declaration, Type Inference, and Type Equivalence

Many static type systems, such as C's and Java's, require type declarations: the programmer must explicitly associate each variable in a function with a particular type. Others, such as Haskell's, perform type inference: the compiler draws conclusions about the types of variables based on the operations which the function performs upon them. For instance, in a function f(x,y), if at some point in the function the variables x and y are added together, the compiler can infer that they must be numbers -- since addition is only defined over numbers. Therefore, any call to f elsewhere in the program that gives it, say, a string or a list as an argument would be erroneous.
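The flavor of that reasoning can be sketched in a toy checker (entirely illustrative; the names infer_param_type and check_call are invented here, and real inference algorithms are far more elaborate): seeing "+" applied to a parameter, it concludes the parameter is a number, and then rejects a later call that passes a string.

```python
def infer_param_type(operators_used):
    # operators_used: the operators applied to a parameter in the body.
    # Seeing "+" lets the checker conclude the parameter is a number.
    return "number" if "+" in operators_used else "unknown"

def check_call(param_type, argument):
    # A later call site is checked against the inferred type.
    if param_type == "number" and not isinstance(argument, (int, float)):
        raise TypeError("inferred a number, got " + type(argument).__name__)

x_type = infer_param_type({"+"})
print(x_type)          # number
check_call(x_type, 3)  # accepted
try:
    check_call(x_type, "hello")
except TypeError as e:
    print("rejected:", e)
```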

Numerical and string constants and expressions in code can and often do imply type in a particular context. For example, an expression 3.14 might imply that its type is floating-point, while {1, 2, 3} might imply a list of integers; typically an array.

A critical component of any type system is the mechanism that it uses to decide whether or not two different type declarations are equivalent. Consider the following two declarations in C:

    struct TreeNode {
        struct TreeNode *left;
        struct TreeNode *right;
        int value;
    };

    struct SearchNode {
        struct SearchNode *left;
        struct SearchNode *right;
        int value;
    };

Are TreeNode and SearchNode the same type? Are they equivalent? Any language that has a nontrivial type system must include an unambiguous rule to answer this question for arbitrary types.
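The two common rules can be sketched in Python (a toy model, not any language's actual algorithm; the self-referential fields are abbreviated with a "self*" placeholder): name equivalence compares the declared names, while structural equivalence compares the field layouts.

```python
# A toy record type: a declared name plus (field-name, field-type) pairs.
TreeNode   = ("TreeNode",   (("left", "self*"), ("right", "self*"), ("value", "int")))
SearchNode = ("SearchNode", (("left", "self*"), ("right", "self*"), ("value", "int")))

def name_equivalent(t, u):
    # Name equivalence: two types are equal only if they are the
    # same declaration, i.e. carry the same declared name.
    return t[0] == u[0]

def structurally_equivalent(t, u):
    # Structural equivalence: two types are equal if their field
    # layouts match, regardless of the declared names.
    return t[1] == u[1]

print(name_equivalent(TreeNode, SearchNode))          # False
print(structurally_equivalent(TreeNode, SearchNode))  # True
```

C itself answers the question by name equivalence: the two structs above are distinct, incompatible types even though their layouts match.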
