<<

Choice of language Wednesday, October 24, 2012 9:37 AM

Perhaps the most important decision in a project: choice of language.
- Determines many aspects of implementation, testing, and debugging.
- A hierarchy of choices:
  - Execution style.
  - Memory management style.
  - Support for particular programming styles.

Language Page 1 Controversies Wednesday, October 24, 2012 4:32 PM

Controversies
One can write any program in any language, but the choice of language for a specific task is very controversial.

Many so-called religious issues are grounded in real facts.

Language Page 2 Basic implementation principles Wednesday, October 24, 2012 10:01 AM

Basic implementation principles
- Implement guard clauses to assert preconditions and protect against unexpected occurrences and conditions.
- Avoid global variables.
- Avoid side-effects: functions should compute their return values and do nothing else.
- Handle exceptional conditions through some form of exception handling.
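A minimal sketch of these principles in C. C has no exception mechanism, so this sketch reports the exceptional condition (an empty input) through a return code instead; the function name safe_average and its signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the average of count integers into *out.
   Returns 0 on success, -1 if there is nothing to average.
   Reads and writes no global state; the only output is *out. */
int safe_average(const int *values, size_t count, double *out)
{
    /* Guard clauses for programming errors: fail loudly, right here. */
    assert(values != NULL);
    assert(out != NULL);

    /* Guard clause for an exceptional but legal condition. */
    if (count == 0)
        return -1;

    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];

    *out = (double)sum / (double)count;
    return 0;
}
```

A caller checks the return value before using the result, e.g. `if (safe_average(v, n, &avg) != 0) { /* handle empty input */ }`.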

Why do we adhere to these principles? Because following them is thought to be cheaper.

Languages differ on whether and how these principles are supported. Some languages support these practices. Others actively work against them.

Language Page 3 Why is doing these things "good"? Wednesday, October 24, 2012 2:43 PM

- Guard clauses localize faults outside the code they protect.
- Global variables are implicit arguments to every subroutine in their scope (even if not accessed).
- Side-effects are difficult to debug, because they aren't part of the main data flow of the program.
- Exceptions allow the programmer to determine accurately why a program crashed.

Language Page 4 A simple language taxonomy Wednesday, October 24, 2012 9:47 AM

A simple language taxonomy

Execution model:
- Native: runs actual machine code on the physical machine.
  - By static compilation into that code (C, C++).
  - By just-in-time (JIT) compilation of intermediate code into native code (C#, J#).
- Virtual: compiled code runs inside a virtual machine.
  - Java runs bytecode in the Java Virtual Machine (JVM).

Heap management model:
- Explicit free: programmer must free storage. C, C++.
- Automatic free: storage is freed when no longer referenced. Java, J#, C#, ...

Language Page 5 Why so many options? Wednesday, October 24, 2012 9:55 AM

Why so many options?
Languages trade ease of programming and portability for speed of execution.
- C, C++: "fastest" languages, "most difficult" to use.
- Java, C#, J#: "slower" languages, "easier" to use, more "portable".
- PHP, Javascript, Ruby: "slowest" languages, "easiest" to use.

The easier it is to write a program, the harder it is to make it robust and maintainable!

Language Page 6 Some simple rules of thumb Wednesday, October 24, 2012 10:12 AM

Some simple rules of thumb for execution speed:
- Native languages execute about 10x faster than interpreted languages on the same hardware.
- Automatic free based upon access counts makes every object access take 2x to 3x as long as with explicit free, even for objects that are not in the heap.

Language Page 7 Case study 1: memory allocation Wednesday, October 24, 2012 2:57 PM

Case study 1: memory allocation
- C's strength: speed.
- C's weakness: pointers.

Language Page 8 Summary of differences between C and Java Tuesday, October 23, 2012 11:43 AM

Summary of differences between C and Java memory allocation

C, C++                                 Java, C#, J#
-------------------------------------  ---------------------------------
explicit free                          implicit free
no storage descriptors                 storage descriptors
pointer aliases possible               pointer aliases not possible
pointer typecasts necessary            pointer typecasts illegal
unions possible                        unions disallowed
array bounds checking not supported    array bounds checking possible
heap validity checking important       heap validity checking irrelevant

Language Page 9 The C allocation model Tuesday, October 23, 2012 11:42 AM

The C allocation model:
- Use malloc(3) to get a block of memory, aligned so that it can be used in a variety of ways.
- Decide how to use it outside of malloc.
- Must explicitly free storage that one no longer needs.
- The memory manager has no concept of how the memory is used.

Language Page 10 Dynamic memory allocation in C Tuesday, October 23, 2012 11:54 AM

Dynamic memory allocation in C

    int *p = (int *) malloc(10*sizeof(int));
    ....
    free(p);

Facts about dynamic memory allocation:
- p[10]=1 and p[-1]=1 are out of bounds and shouldn't work, but they sometimes seem to work and cause memory corruption that manifests later.
- free(p) does not reset p, nor does it necessarily wipe the memory at *p or hand it back to the operating system.
- Real effect of free(p): the block at *p becomes available for reuse.

Language Page 11 Dynamic memory allocation in C++ Wednesday, October 24, 2012 2:13 PM

Dynamic memory allocation in C++

    int *p = new int[10];

Same meaning as the C malloc expression, except that the constructor is called for each element as required.

    delete[] p;

Same as free(p), except that the destructor is called for each element beforehand. (An array allocated with new[] must be released with delete[], not plain delete.)

Language Page 12 C and C++ memory allocation Wednesday, October 24, 2012 2:15 PM

C and C++ memory allocation
- Syntax differs between the languages, but heap semantics are the same for both.
- C++ does pointer casts and initializations invisibly. C makes you do them explicitly.
- Otherwise, same game in both cases.

Language Page 13 How does C++ differ from C? Tuesday, October 23, 2012 12:22 PM

How does C++ differ from C?
- Java-like syntax: int *p=new int[10];
- C semantics: same problems as in C.

Language Page 14 Pointer code debugging Wednesday, October 24, 2012 2:48 PM

Pointer code debugging
Most common C/C++ pointer problems:
- Lost pointer/memory leak: there is no remaining pointer to an allocated memory object.
- Pointer alias: two pointers point to the same place, but should point to different places.
- Pointer corruption: a pointer points somewhere invalid.

Language Page 15 A classical C programming problem: memory aliases Tuesday, October 23, 2012 12:10 PM

A classical C programming problem: memory aliases

    int *p, *q;
    p = (int *) malloc(sizeof(int));
    *p = 2;
    free(p);                          // p still holds the old address
    q = (int *) malloc(sizeof(int));  // malloc may hand back the block just freed
    // now *p can be the same location as *q!
    *p = 5;                           // appears to work and changes *q, but shouldn't!

Language Page 16 A worst-case pointer alias scenario Tuesday, October 23, 2012 12:14 PM

A worst-case scenario
The worst that can happen: corruption of a pointer structure.

    struct elem {
        struct elem *next;
        int data;
    } *head = NULL;

    void link(int d) {
        struct elem *p = (struct elem *) malloc(sizeof(struct elem));
        p->data = d;
        p->next = head;
        head = p;
    }

    void unlink() {
        free(head);   // head is not updated, so it still points to the freed element
    }

Consider what happens when we do the following:

    link(1);
    link(2);
    unlink();
    link(3);
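One way to read that sequence: after unlink(), head still holds the address of the freed element. malloc is free to hand that same block back to link(3), in which case p->next = head makes the new element point to itself and the list becomes a one-element cycle; if malloc returns a different block, the new element instead points into freed storage. A sketch of a repaired version follows. The names list_push and list_pop are hypothetical (renamed here to avoid clashing with the POSIX link and unlink functions), and an empty list is treated as a precondition violation.

```c
#include <assert.h>
#include <stdlib.h>

struct elem {
    struct elem *next;
    int data;
};

static struct elem *head = NULL;

/* Push a new element on the front; returns 0 on success, -1 if malloc fails. */
int list_push(int d)
{
    struct elem *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->data = d;
    p->next = head;
    head = p;
    return 0;
}

/* Remove the front element: advance head BEFORE freeing the old block,
   so that no live pointer refers to freed storage. */
void list_pop(void)
{
    assert(head != NULL);   /* guard clause: popping an empty list is a bug */
    struct elem *old = head;
    head = old->next;
    free(old);
}
```

With this version, the same sequence (push 1, push 2, pop, push 3) leaves a well-formed list holding 3 followed by 1.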

Language Page 17 Memory leaks Wednesday, October 24, 2012 3:20 PM

Memory leaks
A memory leak occurs any time one loses the last pointer to an allocated block of memory. E.g., in C:

    int *p = (int *)malloc(sizeof(int)*10);
    p = NULL;   // without free()

There is no mechanism whereby one may recover the lost address and free the block!

Language Page 18 What is heap validity checking? Tuesday, October 23, 2012 11:49 AM

A common form of C debugging: heap validity checking
Check programs for common heap errors:
- Writing to a pointer out of range.
- Referencing freed storage.
Key to all successful approaches:
- Replace malloc with a debugging version.
- Keep track of more than malloc does (so the program runs slower).
- Check the validity of the whole heap during every reference.
- Catch access errors as early as possible.
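The "debugging version of malloc" idea can be sketched with guard bytes placed on both sides of every block. This is only an illustration of the principle, not how Electric Fence or valgrind is actually implemented; the names dbg_malloc, dbg_check, and dbg_free, the guard length, and the fill pattern are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8                          /* guard bytes on each side */
#define GUARD_BYTE 0xA5                       /* arbitrary fill pattern */
#define HEADER_LEN (2 * sizeof(max_align_t))  /* keeps user data aligned */

/* The header must hold the block length plus the front guard. */
static_assert(HEADER_LEN >= sizeof(size_t) + GUARD_LEN, "header too small");

/* Layout: [length ... front guard][user data: n bytes][rear guard] */
void *dbg_malloc(size_t n)
{
    if (n > SIZE_MAX - HEADER_LEN - GUARD_LEN)
        return NULL;
    unsigned char *raw = malloc(HEADER_LEN + n + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);                      /* remember the length */
    memset(raw + HEADER_LEN - GUARD_LEN, GUARD_BYTE, GUARD_LEN);
    memset(raw + HEADER_LEN + n, GUARD_BYTE, GUARD_LEN);
    return raw + HEADER_LEN;
}

/* Returns 1 if both guards are intact, 0 if either was overwritten. */
int dbg_check(const void *p)
{
    assert(p != NULL);
    const unsigned char *user = p;
    size_t n;
    memcpy(&n, user - HEADER_LEN, sizeof n);
    const unsigned char *front = user - GUARD_LEN;
    const unsigned char *rear = user + n;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (front[i] != GUARD_BYTE || rear[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Checks the block, releases it, and returns the result of the check. */
int dbg_free(void *p)
{
    if (p == NULL)
        return 1;
    int ok = dbg_check(p);
    free((unsigned char *)p - HEADER_LEN);
    return ok;
}
```

In this sketch, a stray write such as p[10]=1 or p[-1]=1 on a ten-int block lands in a guard region, and the next dbg_check or dbg_free reports it. Writes that jump past the guards, and reads of any kind, go unnoticed.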

Language Page 19 Strengths of heap validity checking Wednesday, October 24, 2012 5:04 PM

Corruption is caught early, before it causes chained errors. We have a better chance of tracing back a few steps to the cause than of tracing back many steps.

Language Page 20 Weaknesses of heap validity checking Wednesday, October 24, 2012 5:05 PM

Two wrongs can be mistaken for a right! Does not always detect problems. It may not detect them fast enough; it runs during calls to malloc and free.

Language Page 21 Validity checking methods Tuesday, October 23, 2012 11:49 AM

Two prevalent validity checking tools on Unix-like systems:
- Electric Fence (efence)
- Valgrind

Language Page 22 What valgrind does Wednesday, October 24, 2012 2:53 PM

What valgrind does
- Take over malloc (replace its dynamic library).
- Implement primitive storage descriptors for malloc blocks.
- Set traps for poor programming style:
  - Array references out of range.
  - Accessing previously freed storage.
- Monitor the traps for common programming mistakes.
- Stop as soon as possible after an error is found.

Language Page 23 So, why use Java? Wednesday, October 24, 2012 3:02 PM

So, why use Java?
- Bad news: 10x slower.
- Good news: the problems that valgrind discovers simply disappear; by construction, they cannot happen.

Language Page 24 Java references versus C pointers Wednesday, October 24, 2012 3:03 PM

Java references versus C pointers

C pointers                                Java references
----------------------------------------  ----------------------------------------------------------------------
Ambiguous: can point to scalar or array   Unambiguous: used for one kind of thing
Can be cast to any type                   One type, set at compilation time
Can point to invalid locations            Constrained by the language to only point to valid locations (or null)
No concept of range checking              Range checking is practical
No concept of storage descriptors         Runtime knowledge of type through storage descriptors

Language Page 25 The java allocation model Tuesday, October 23, 2012 11:42 AM

The Java allocation model
- Allocate structures of specific types.
- Document the type of each object in a storage descriptor.
- Discover memory that is no longer needed by "mark and sweep" or "access counting".
- Automatically free unreferenced storage.

Language Page 26 What is a storage descriptor? Tuesday, October 23, 2012 11:48 AM

What is a storage descriptor?
Information that is recorded when operator new is called in Java. For example:

    int a[] = new int[10];

Storage descriptor contains:
- Type of storage object.
- Length of storage object.
- Types of sub-components (as applicable).
Storage descriptors have several effects:
- Can determine which storage objects are active: these have valid pointers to them.
- No need for an explicit free function call: use garbage collection (a.k.a. storage reclamation) instead.
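To make the idea concrete, here is a sketch in C of what a descriptor for an int array might hold and how it makes bounds checking possible. It is not a description of how any real Java virtual machine lays out objects; the type and function names (obj_type, int_array, int_array_new, int_array_set) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { T_INT_ARRAY } obj_type;

/* Descriptor fields come first, followed by the elements themselves. */
typedef struct {
    obj_type type;     /* type of storage object */
    size_t   length;   /* length of storage object, in elements */
    int      data[];   /* the elements (C99 flexible array member) */
} int_array;

/* Rough analogue of "new int[length]": records the descriptor and
   zero-fills the elements. Returns NULL if allocation fails. */
int_array *int_array_new(size_t length)
{
    int_array *a = calloc(1, sizeof *a + length * sizeof(int));
    if (a == NULL)
        return NULL;
    a->type = T_INT_ARRAY;
    a->length = length;
    return a;
}

/* Bounds-checked store: returns 0 on success, -1 if index is out of range. */
int int_array_set(int_array *a, size_t index, int value)
{
    assert(a != NULL);
    if (index >= a->length)
        return -1;
    a->data[index] = value;
    return 0;
}
```

Because the length travels with the block, the out-of-range store that silently corrupts memory with a raw malloc block is rejected here.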

Language Page 27 Two forms of storage reclamation Wednesday, October 24, 2012 10:06 AM

Two automatic forms of storage reclamation
- Mark and sweep: starting with all known named variables (stack and global), traverse the tree of references and mark everything that is reachable as used. Free everything else. (LISP and older high-level languages.) Slow, but takes no extra memory.
- Access counts: in every object's storage descriptor, keep track of how many things point to it.
  - When initializing a pointer, increment the access count for where it points.
  - When changing a pointer, decrement the access count of what it points to first.
  - When the access count is 0, free the object.
  (Many Java versions, J#, C#, D.) Reclamation is much faster, but every pointer update is slower.

Language Page 28 Detailed picture of mark and sweep Wednesday, October 24, 2012 3:24 PM
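A toy sketch of mark and sweep in C, under simplifying assumptions: every object has at most two outgoing references, all objects are registered in a fixed-size table, and the roots are passed in explicitly. The names (object, gc_new, gc_collect, MAX_OBJECTS) are hypothetical, and a real collector is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_OBJECTS 64

typedef struct object {
    int marked;                   /* set during the mark phase */
    struct object *left, *right;  /* outgoing references (may be NULL) */
} object;

static object *heap[MAX_OBJECTS]; /* every object currently allocated */
static size_t heap_count = 0;

/* Allocate an object and register it; returns NULL if the table is full. */
object *gc_new(void)
{
    if (heap_count == MAX_OBJECTS)
        return NULL;
    object *o = calloc(1, sizeof *o);
    if (o != NULL)
        heap[heap_count++] = o;
    return o;
}

/* Mark phase: flag everything reachable from o. */
static void mark(object *o)
{
    if (o == NULL || o->marked)   /* the flag also stops on cycles */
        return;
    o->marked = 1;
    mark(o->left);
    mark(o->right);
}

/* Mark from the roots, then sweep: free every unmarked object and keep
   the rest. Returns the number of objects freed. */
size_t gc_collect(object **roots, size_t root_count)
{
    assert(roots != NULL || root_count == 0);
    for (size_t i = 0; i < root_count; i++)
        mark(roots[i]);

    size_t kept = 0, freed = 0;
    for (size_t i = 0; i < heap_count; i++) {
        if (heap[i]->marked) {
            heap[i]->marked = 0;  /* reset for the next collection */
            heap[kept++] = heap[i];
        } else {
            free(heap[i]);
            freed++;
        }
    }
    heap_count = kept;
    return freed;
}
```

The whole table is scanned on every collection, which is where the perceptible pause comes from; between collections, pointer assignments cost nothing extra.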

Language Page 29 Detailed picture of access counting Wednesday, October 24, 2012 2:16 PM
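A toy sketch of access counting in C. Every pointer assignment goes through one helper that adjusts the counts and frees an object when its count reaches zero. The names (counted, counted_new, counted_assign, live_objects) are hypothetical, and live_objects exists only so the example can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t count;   /* how many pointers currently refer to this object */
    int value;
} counted;

static size_t live_objects = 0;   /* bookkeeping for the example only */

/* Create an object that nothing refers to yet (count 0). */
counted *counted_new(int value)
{
    counted *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->count = 0;
    c->value = value;
    live_objects++;
    return c;
}

/* Perform "*slot = target" while keeping both access counts up to date.
   *slot must hold NULL or a valid counted pointer on entry. */
void counted_assign(counted **slot, counted *target)
{
    assert(slot != NULL);
    if (target != NULL)
        target->count++;          /* count the new reference first */
    counted *old = *slot;
    *slot = target;
    if (old != NULL && --old->count == 0) {
        free(old);                /* the last reference just went away */
        live_objects--;
    }
}
```

This sketch increments the new target before decrementing the old one, so that assigning a pointer to itself does not free the object in between. The extra work on every assignment is the "every pointer update is slower" cost.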

Language Page 30 Which one to use? Wednesday, October 24, 2012 3:35 PM

Which one to use?
- Mark and sweep leaves pointer updates fast, but the user perceives a wait while reclamation runs.
- Access counts make reclamation perceptibly faster, at the cost of slowing down the whole program's execution.

Language Page 31 What does the java memory model do? Wednesday, October 24, 2012 3:25 PM

What does the Java memory model do for us?
- Pointer aliases are impossible. (No free function, so no ability to create the situation!)
- Lost pointers are impossible. (They would be found during reclamation!)
- The language does not allow one to corrupt a pointer!

Language Page 32 End of lecture on 10/24/2012 Wednesday, October 24, 2012 5:33 PM

Language Page 33 From the point of view of memory allocation Wednesday, October 24, 2012 2:16 PM

So, from the point of view of memory allocation, C gets you into situations that Java simply can't create. You must trade the speed of C against the safety of Java.

Language Page 34 Pointer aliases are impossible in java Tuesday, October 23, 2012 12:25 PM

How Java fixes the pointer problem:

    int p[], q[];
    p = new int[1];
    p[0] = 2;
    // free(p); has no analogue in Java
    p = null;         // effectively, allows p's data to be freed later
    q = new int[1];   // only reuses p's storage if
                      // - p is set to null
                      // - garbage collection has occurred
    // now p[0] cannot be the same location as q[0]!
    q[0] = 42;        // works as expected
    p[0] = 0;         // doesn't work: NullPointerException

Language Page 35 So, C's speed comes at an incredible cost Wednesday, October 24, 2012 2:55 PM

So, C's speed comes at an incredible cost
- Must deal with pointers and pointer problems.
- Must cope without storage descriptors.
- More bugs, more effort.

Language Page 36 End of lecture on 10/24/2012 Wednesday, October 31, 2012 6:15 PM

Language Page 37 Strong and weak typing Wednesday, October 24, 2012 2:39 PM

Strong and weak typing
- Strong typing: a subroutine must be called with arguments of the appropriate type.
  - True strong typing allows no implicit casts at all.
  - Truly strongly typed languages: Ada.
- Weak typing: arguments to a subroutine can be anything.
  - True weak typing: can pass any object whatsoever to a subroutine.
- Neither strong nor weak: C, with its ability to pass an argument to a subroutine as a string of bits.
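The "string of bits" remark about C can be illustrated with a function that accepts any object through a void pointer and sees only its raw bytes. This is a sketch; the name byte_sum is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Adds up the raw bytes of any object. The declared type of the caller's
   argument is never checked: the function sees only size bytes. */
unsigned long byte_sum(const void *obj, size_t size)
{
    assert(obj != NULL || size == 0);   /* guard clause */
    const unsigned char *bytes = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}
```

Calls such as byte_sum(&some_double, sizeof some_double) and byte_sum(&some_struct, sizeof some_struct) both compile without complaint, which is why C fits neither the strong nor the weak category cleanly.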

Language Page 38 Advantages and disadvantages of strong typing Wednesday, October 24, 2012 2:41 PM

Advantages and disadvantages of strong typing
- Advantage: type automagically forms part of the guard clause.
- Disadvantage: must be explicit about types in code; cannot allow the language to do the work.

Language Page 39 C# Wednesday, October 24, 2012 3:41 PM

C#
- Syntax like C++.
- Storage semantics of Java.
- Aspects: ability to program via side effects.

Language Page 40 The debate over aspects Wednesday, October 24, 2012 3:42 PM

Language Page 41 High-level languages Wednesday, October 24, 2012 3:40 PM

High-level languages
- Javascript: client-side web.
- PHP, Perl: server-side web.
- Ruby, Python: general-purpose.

Attributes of very high-level languages:
- Dynamic typing: the type of a variable is the type of its value.

      $foo = "yo!";   # now it's a string.
      $foo = 5.6;     # now it's a float!

  Typing is as weak as is possible!
- Many shorthands: make the language do the work for you.
- Easier to write, harder to read.

Language Page 42 A taxonomy of type systems Wednesday, October 24, 2012 3:52 PM

There is no meaning to strong typing in a dynamically typed environment!

Language Page 43 A two-edged sword Wednesday, October 24, 2012 3:45 PM

A two-edged sword
Languages that are easier to write make it
- easier to write one-time-use programs.
- harder to write robust code, because the language does not guard against common programming problems.
