Determines Many Aspects of Implementation, Testing, Debugging

Choice of language
Wednesday, October 24, 2012 9:37 AM

Perhaps the most important decision in a project: choice of language.
- Determines many aspects of implementation, testing, debugging.
- A hierarchy of choices:
  - Execution style.
  - Memory management style.
  - Support for particular programming styles.

Controversies

- One can write any program in any language, but the choice of language for a specific task is very controversial.
- Many so-called religious issues are grounded in real facts.

Basic implementation principles

- Implement guard clauses to assert preconditions and protect against unexpected occurrences and conditions.
- Avoid global variables.
- Avoid side-effects: functions should compute their return values and do nothing else.
- Handle exceptional conditions through some form of exception handling.

Why do we adhere to these principles? Because doing so is thought to be cheaper.

Languages differ on whether and how these principles are supported. Some languages support these practices. Others actively work against them.

Why is doing these things "good"?

- Guard clauses localize faults outside a protected subroutine.
- Global variables are implicit arguments to every subroutine in their scope (even if not accessed).
- Side-effects are difficult to debug, because they aren't part of the main data flow of the program.
- Exceptions allow the programmer to determine accurately why a program crashed.

A simple language taxonomy

Execution model:
- Native: runs actual machine code on the physical machine.
  - By static compilation into that code (C, C++).
  - By just-in-time (JIT) compilation of intermediate code into native code (C#, J#).
- Virtual: compiled code runs inside a virtual machine.
  - Java runs bytecode in the Java Virtual Machine (JVM).
Heap management model:
- Explicit free: programmer must free storage. C, C++.
- Automatic free: storage is freed when no longer referenced. Java, J#, C#, D, ...

Why so many options?

Languages trade ease of programming and portability for speed of execution.
- C, C++: "fastest" languages, "most difficult" to use.
- Java, C#, J#: "slower" languages, "easier" to use, more "portable".
- PHP, JavaScript, Perl, Ruby: "slowest" languages, "easiest" to use.

The easier it is to write a program, the harder it is to make it robust and maintainable!

Some simple rules of thumb

For execution speed:
- Native languages execute about 10x faster than interpreted languages on the same hardware.
- Automatic free based upon access counts (reference counts) makes every object access take 2x to 3x as long as with explicit free, even for objects that are not in the heap.

Case study 1: memory allocation

- C's strength: speed.
- C's weakness: pointers.

Summary of differences between C and Java memory allocation

C, C++                                 Java, C#, J#
-------------------------------------  -------------------------------------
explicit free                          implicit free
no storage descriptors                 storage descriptors
pointer aliases possible               pointer aliases not possible
pointer typecasts necessary            pointer typecasts illegal
unions possible                        unions disallowed
array bounds checking not supported    array bounds checking possible
heap validity checking important       heap validity checking irrelevant

The C allocation model

- Use malloc(3) to get a block of memory, aligned so that it can be used in a variety of ways.
- Decide how to use it outside of malloc.
- Must explicitly free storage that one no longer needs.
- Memory manager has no concept of memory use.
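The steps of the C allocation model can be sketched as follows. This is a minimal illustration, not code from the lecture: the function names sum_block and text_length are hypothetical. The point is that malloc hands back untyped storage and the caller alone decides whether it holds ints or characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: fills a malloc'd block with 0..n-1 and sums it.
   Returns -1 if the allocation fails. */
int sum_block(size_t n)
{
    int *p = malloc(n * sizeof *p);   /* 1. get a block from the allocator */
    if (p == NULL)
        return -1;                    /* malloc can fail; check every call */

    int total = 0;
    for (size_t i = 0; i < n; i++) {  /* 2. the caller decides how to use it */
        p[i] = (int)i;
        total += p[i];
    }

    free(p);                          /* 3. storage must be freed explicitly */
    return total;
}

/* Hypothetical helper: the same allocator serves a character buffer.
   malloc has no idea the block is used as a string this time. */
size_t text_length(const char *s)
{
    char *buf = malloc(strlen(s) + 1);
    if (buf == NULL)
        return 0;
    strcpy(buf, s);
    size_t len = strlen(buf);
    free(buf);
    return len;
}
```

Calling sum_block(10) allocates room for ten ints and releases it before returning, so nothing leaks on the success path.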
Dynamic memory allocation in C

    int *p = (int *) malloc(10*sizeof(int));
    ....
    free(p);

Facts about dynamic memory allocation:
- p[10]=1 and p[-1]=1 shouldn't work, but sometimes seem to work and cause memory corruption that manifests later.
- free(p) does not reset p, nor does it necessarily erase the memory at *p.
- Real effect of free(p): the storage at *p becomes available for reuse by later allocations.

Dynamic memory allocation in C++

    int *p = new int[10];

Same meaning as the C malloc expression, except that it calls the constructor for each base object as required.

    delete[] p;

Same as free(p), but calls the destructor for each base object beforehand. An array allocated with new[] must be released with delete[], not plain delete.

C and C++ memory allocation

- Syntax for the languages differs, but heap semantics are the same for both.
- C++ does pointer casts and initializations invisibly. C makes you do them explicitly.
- Otherwise, same game in both cases.

How does C++ differ from C?

- Java-like syntax: int *p=new int[10];
- C semantics: same problems as in C.

Pointer code debugging

Most common C/C++ pointer problems:
- Lost pointer/memory leak: there is no remaining pointer to an allocated memory object.
- Pointer alias: two pointers point to the same place, but should point to different places.
- Pointer corruption: a pointer points somewhere invalid.

A classical C programming problem: memory aliases

    int *p, *q;
    p = (int *) malloc(sizeof(int));
    *p=2;
    free(p);
    q = (int *) malloc(sizeof(int)); // the freed block is often reused: *p may now be the same location as *q!
    *p=5; // may appear to work and change *q, but shouldn't!
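One common defensive habit against the stale-pointer alias shown above is to reset a pointer as soon as its storage is freed. The sketch below assumes a hypothetical helper, free_and_null, which is not a standard library function. It only protects the one pointer passed to it; any other copies of the address still dangle.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of the C library): frees the block that
   *pp points to and resets *pp so it can no longer be used by accident. */
void free_and_null(int **pp)
{
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a no-op, so repeated calls are harmless */
        *pp = NULL;
    }
}

/* Replays the alias scenario with the stale pointer neutralized.
   Returns 1 if q's value survives, 0 otherwise. */
int alias_demo(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return 0;
    *p = 2;
    free_and_null(&p);           /* p is now NULL instead of dangling */

    int *q = malloc(sizeof *q);  /* may reuse the block p used to own */
    if (q == NULL)
        return 0;
    *q = 7;

    if (p != NULL)               /* the stale write is now detectable */
        *p = 5;                  /* never reached: p was reset above */

    int ok = (p == NULL && *q == 7);
    free_and_null(&q);
    return ok;
}
```

With the reset in place, an accidental write through p becomes a NULL dereference that fails immediately, rather than a silent change to *q.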
A worst-case pointer alias scenario

The worst that can happen: corruption of a pointer structure.

    struct elem {
        struct elem *next;
        int data;
    } *head=NULL;

    void link(int d) {
        struct elem *p = (struct elem *)malloc(sizeof(struct elem));
        p->data=d;
        p->next=head;
        head=p;
    }

    void unlink() {
        free(head); /* head is not updated, so it now dangles */
    }

Consider what happens when we do the following:

    link(1); link(2); unlink(); link(3);

After unlink(), head still points to the freed element. If malloc hands that same block back to link(3), as allocators commonly do, then p equals head, and p->next=head makes the element point to itself: the list becomes a cycle and the element holding 1 is lost.

Memory leaks

A memory leak occurs any time one loses a pointer to an allocated block of memory. E.g., in C:

    int *p = (int *)malloc(sizeof(int)*10);
    p=NULL; // without free()

There is no mechanism whereby one may recover the pointer addresses and free them!

What is heap validity checking?

A common form of C debugging: heap validity checking.

Check programs for common heap errors:
- writing to a pointer out of range.
- referencing freed storage.

Key to all successful approaches:
- replace malloc with a debugging version.
- keep track of more than the standard malloc does (and so run slower).
- check the validity of the whole heap during every reference.
- catch access errors as early as possible.

Strengths of heap validity checking

- Corruption is caught early, before it causes chained errors.
- We have a better chance of tracing back a few steps to the cause than back a lot of steps.

Weaknesses of heap validity checking

- Two wrongs can be mistaken for a right!
- Does not always detect problems.
- It may not detect them fast enough; it runs during calls to malloc and free.
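The "debugging malloc" idea can be sketched with guard bytes placed around each block. This is a simplified illustration under stated assumptions, not how Electric Fence or valgrind are implemented: the names dbg_malloc, dbg_check, and dbg_free, the guard sizes, and the fill byte are all hypothetical, and the code relies on max_align_t from C11.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CANARY 0xA5                 /* arbitrary fill byte for the guards */
#define FRONT  sizeof(max_align_t)  /* front guard; size keeps alignment */
#define BACK   8                    /* back guard size in bytes */

/* Bookkeeping stored ahead of each block: "more than malloc tracks".
   The union pads the header so user data stays suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} dbg_header;

/* Layout: [header][front guard][user data][back guard] */
void *dbg_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(dbg_header) + FRONT + size + BACK);
    if (raw == NULL)
        return NULL;
    ((dbg_header *)raw)->size = size;
    unsigned char *user = raw + sizeof(dbg_header) + FRONT;
    memset(user - FRONT, CANARY, FRONT);
    memset(user + size, CANARY, BACK);
    return user;
}

/* Returns 1 if both guards are intact, 0 if either was overwritten. */
int dbg_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *front = user - FRONT;
    const dbg_header *h = (const dbg_header *)(front - sizeof(dbg_header));
    const unsigned char *back = user + h->size;
    for (size_t i = 0; i < FRONT; i++)
        if (front[i] != CANARY)
            return 0;
    for (size_t i = 0; i < BACK; i++)
        if (back[i] != CANARY)
            return 0;
    return 1;
}

/* Checks the guards one last time, then releases the whole block.
   Returns the result of the final check. */
int dbg_free(void *ptr)
{
    if (ptr == NULL)
        return 1;
    int ok = dbg_check(ptr);
    free((unsigned char *)ptr - FRONT - sizeof(dbg_header));
    return ok;
}
```

A write such as p[10]=1 on a ten-int block lands in the back guard, so the next dbg_check reports it. The sketch also shows the weaknesses listed above: a write that skips past the guard, or that happens to store the fill byte, goes unnoticed, and nothing is detected until a check actually runs.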
Validity checking methods

Two prevalent validity checking tools:
- Electric Fence (efence)
- valgrind

What valgrind does

- Take over malloc (replace its dynamic library).
- Implement primitive storage descriptors for malloc blocks.
- Set traps for poor programming style:
  - Array references out of range.
  - Accessing previously freed storage.
- Monitor the traps for common programming mistakes.
- Stop as soon as possible after an error is found.

So, why use Java?

- Bad news: about 10x slower.
- Good news: the problems that valgrind discovers simply disappear and cannot happen by construction.

Java references versus C pointers

C pointers                               Java references
---------------------------------------  ---------------------------------------
Ambiguous: can point to scalar or array  Unambiguous: used for one kind of thing
Can be cast to any type                  One type, set at compilation time
Can point to invalid locations           Constrained by the language to only
                                         point to valid locations (or null)
No concept of range checking             Range checking is practical
No concept of storage descriptors        Runtime knowledge of type through
                                         storage descriptors

The Java allocation model

- Allocate structures of specific types.
- Document the type of each object in a storage descriptor.
- Discover memory that is no longer needed by "mark and sweep" or "access counting" (reference counting).
- Automatically free unreferenced storage.

What is a storage descriptor?

Information that is recorded when operator new is called in Java. For example:

    int a[] = new int[10];

Storage descriptor contains:
- Type of storage object.
- Length of storage object.
- Types of sub-components (as applicable).
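To make the idea concrete, here is a rough C imitation of what a storage descriptor records for an int array. It is a sketch under assumptions, not the actual layout used by any Java runtime: the type int_array, the tag TYPE_INT_ARRAY, and the accessor functions are all hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical type tag; a real runtime would have one per type. */
typedef enum { TYPE_INT_ARRAY } type_tag;

/* Descriptor and data live in one block, as a runtime might arrange. */
typedef struct {
    type_tag type;     /* type of storage object */
    size_t   length;   /* length of storage object, in elements */
    int      data[];   /* the elements (C99 flexible array member) */
} int_array;

/* Rough analogue of "new int[length]": records the descriptor and
   zero-fills the elements, as Java does for new arrays. */
int_array *int_array_new(size_t length)
{
    int_array *a = calloc(1, sizeof *a + length * sizeof a->data[0]);
    if (a == NULL)
        return NULL;
    a->type = TYPE_INT_ARRAY;
    a->length = length;
    return a;
}

/* Bounds-checked store: returns 0 on success, -1 if i is out of range. */
int int_array_set(int_array *a, size_t i, int value)
{
    if (a == NULL || i >= a->length)
        return -1;
    a->data[i] = value;
    return 0;
}

/* Bounds-checked load: returns 0 on success, -1 if i is out of range. */
int int_array_get(const int_array *a, size_t i, int *out)
{
    if (a == NULL || out == NULL || i >= a->length)
        return -1;
    *out = a->data[i];
    return 0;
}
```

Because the length travels with the block, the equivalents of p[10]=1 and p[-1]=1 are rejected rather than corrupting the heap. A negative index converted to size_t becomes a very large value and fails the same range test.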
Storage descriptors have several effects:
- Can determine which storage objects are active: these have valid pointers to them.
