The Usage of Pointers, Arrays, and Structures

James Daly
December 9, 2012

1 Introduction

In 1968, the Association for Computing Machinery (ACM) proposed a sequence of eight courses that would provide the core of the computer science curriculum [3]. While these recommendations have changed significantly over the years, they continue to influence how computer science is taught to this day. In particular, "CS1" is still used to refer to the introductory computer science course and "CS2" to the data structures course. The CS1 course typically covers such topics as basic language syntax, conditionals, looping, structures and objects, arrays, and pointers.

Pointers are often considered one of the hardest concepts for novice programmers to understand and are usually one of their first stumbling blocks [8]. Pointers are also one of the foundations of the field and form the basis for most of the abstract data structures (ADS) that are traditionally taught during CS2. Without understanding pointers, students will not be able to understand the ADS or many other areas of the field.

In hindsight, it should come as no surprise that many students find pointers very difficult. In the C++ programming language, pointers have no fewer than three operators associated with them: "*", the dereference operator; "&", the address-of (reference) operator; and "->", the dereference and access operator. The fact that the asterisk is also used to declare a pointer, while the ampersand is used to create one from an existing object, probably does not help to alleviate confusion. This complicated syntax is one of the reasons that pointers are difficult for novice programmers [15]. This is supported by multiple studies showing that students learning Java (which has simplified and restricted pointer syntax) have an easier time understanding pointers than students who learn C++ [8, 9].
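The three operators named above, and the double duty of the asterisk, can be seen side by side in a few lines. The following is only an illustrative sketch: the `Point` type and the `operatorTour` function are hypothetical names introduced here, not taken from any of the cited studies.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Point {
    int x;
    int y;
};

// Modifies a local Point through a pointer and returns x + y.
int operatorTour() {
    Point pt = {1, 2};

    Point* p = &pt;   // '*' in a declaration makes p a pointer;
                      // '&' takes the address of the existing object pt
    assert(p == &pt); // p now holds the location of pt

    (*p).x = 10;      // '*' in an expression dereferences p to reach pt
    p->y = 20;        // '->' dereferences and accesses a member in one step

    return pt.x + pt.y;  // pt itself was changed, because p refers to it
}
```

Calling `operatorTour()` yields 30, since both assignments went through the pointer to the original object. Note that the same `*` character declares the pointer on one line and dereferences it on another, which is exactly the overlap described above.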
It is unclear from these articles whether the Java students have an easier time only because they are working with simplified syntax, or whether the benefit carries over when they begin to learn C++.

The addition of pointers also necessitates the addition of memory management. Novice programmers frequently make errors when allocating or deallocating memory [1]. Failing to allocate memory at the right time, or failing to reassign a pointer to a safe value after deleting an object, can create a wild pointer. Wild pointers are like time bombs that may crash a program when they are finally accessed. Failing to delete an object that is no longer needed causes a memory leak, which can slowly sap resources as the leaks pile up. Finally, freeing memory that was never allocated (or has already been deallocated) is its own crash-worthy bug. There are so many things that an introductory programming student must get right that it is surprising how many of them do pass the class.

Several systems have been proposed to help students identify common mistakes that they may make with pointers [11, 1]. These systems, especially the Dereferee library by Allevato et al., can identify many potential problems, such as wild pointer accesses and pointer arithmetic, making it much easier for students to identify what is wrong with their program. Allevato believes that this is much more efficient than the common novice strategy of using trace statements to output variable states. While Dereferee can help students identify what is wrong with their programs, other solutions are needed to help them understand the system well enough to avoid the mistakes in the first place.

2 Background

In a modern "objects-first" style of teaching, introductory computer science students are quickly taught how to encapsulate primitive data types (such as integers, floating-point numbers, or characters) into custom composite structures (or classes) [6]. These classes group data that is logically related into a single unit.
A particular instance of a class is called an object. For example, a Complex data type can be defined that has two values representing the real and imaginary parts of a complex number.

When a value or object is to be passed to a function for processing, this can be done in two ways. In the first, a copy of the object is made for the receiving function. The function can modify its copy during whatever operations it performs, but the original will be unchanged. This is called pass-by-value semantics. The alternative is to pass the function the location of the object. This allows the function to make changes that will be reflected in the calling function, since it has access to the original object. This is called pass-by-reference semantics.

This idea is closely related to that of a pointer. A pointer is a special data type whose value is the memory location of an object. The pointer is said to reference that object. By dereferencing the pointer, the object can be read or modified.

Another special data type is the array. Arrays are collections of data objects that are stored sequentially in memory. An array may be defined by a pointer to its first object. The other objects in the array can then be located by adding multiples of the object size to that pointer value. As such, pointers and arrays are almost semantically equivalent.

Finally, the presence of pointers and arrays creates the problem of managing memory. It is frequently necessary to allocate new objects on the heap with the "new" operator. These objects must be deallocated at some later time with the "delete" operator, or else that memory will be lost (leaked). However, an object that still needs to be accessed should not be deleted, or else a wild pointer will be created. Accessing a wild pointer has undefined behavior and may silently cause problems.
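The ideas above can be sketched with the Complex example from the text. This is a minimal illustration under stated assumptions: the member names `re` and `im` and all function names are hypothetical, and `nullptr` assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>

// The Complex type from the text; member names are assumptions.
struct Complex {
    double re;
    double im;
};

// Pass by value: c is a copy, so the caller's object is unchanged.
void conjugateByValue(Complex c) { c.im = -c.im; }

// Pass by reference: c names the caller's object, so the change is visible.
void conjugateByReference(Complex& c) { c.im = -c.im; }

// Passing a pointer has the same effect, with explicit dereferencing.
void conjugateByPointer(Complex* c) { c->im = -c->im; }

// An array handed over as a pointer to its first element.
// first + i advances by i objects; C++ scales by the object size itself.
double sumReal(const Complex* first, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        assert(&first[i] == first + i);  // indexing and pointer arithmetic agree
        total += (first + i)->re;
    }
    return total;
}

// Heap allocation: every new is matched by exactly one delete.
double heapRoundTrip() {
    Complex* z = new Complex();  // value-initialized: re and im start at 0
    z->re = 3.0;
    z->im = 4.0;
    double result = z->re;       // read what is needed before deleting
    delete z;                    // without this, the object would be leaked
    z = nullptr;                 // reset so z is not left as a wild pointer
    return result;
}
```

Given `Complex a = {1.0, 2.0};`, calling `conjugateByValue(a)` leaves `a.im` at 2.0, while `conjugateByReference(a)` or `conjugateByPointer(&a)` flips its sign. The check inside `sumReal` is the "almost equivalent" relationship between arrays and pointers in concrete form.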
Additionally, two pointers may both point to the same object, and they should not both be deleted (since each object should be deleted precisely once). As such, deciding when to delete an object can easily be nontrivial.

Aside from the difficulty of properly allocating and deallocating memory, many students find pointers difficult for other reasons. As mentioned in the introduction, the complicated syntax is one of the confounding factors that make pointers difficult to understand [15]. Another author believes that the main reason is that novices often have a weak understanding of variables in general [7]. This suggests that students who do not completely understand variables are failing to transfer that knowledge to pointers. Additionally, variables are a fairly abstract idea, and pointers add another layer of abstraction on top of them. It is not possible to physically observe the values stored in memory (although a debugger can give a good approximation), and the relationships that pointers have to objects are difficult to represent. According to Chi, this makes the concept of pointers much more difficult to learn [5]. Furthermore, it is not sufficient for students to recall pointer syntax or understand pointer theory; they must be able to correctly apply pointers. This is a higher level of Bloom's taxonomy than recall or understanding alone.

The objective of this project is to explore these difficulties and to determine ways to improve students' understanding of pointers and their ability to apply them. After completing the CS1 course, students should be able to collect data types together into classes and then instantiate them as objects. They should also be able to acquire and dereference pointers to these objects and to create arrays of them.
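As a rough sketch of what these skills look like in practice, the fragment below instantiates a class on the heap, reaches it through two pointers, and creates an array of objects. The `Counter` type and both function names are hypothetical and serve only as an illustration; the first function also shows the aliasing rule from earlier in this section (two pointers, one object, one delete).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class used only for illustration.
struct Counter {
    int value;
};

// Two pointers to one object: the object is deleted exactly once.
int aliasDemo() {
    Counter* owner = new Counter();  // value-initialized, so value == 0
    Counter* alias = owner;          // a second pointer to the SAME object
    alias->value = 5;                // the change is visible through owner
    int seen = owner->value;

    delete owner;                    // one object, so one delete
    owner = nullptr;
    alias = nullptr;                 // alias pointed at freed memory; deleting
                                     // through it as well would be a bug
    return seen;
}

// An array of objects on the heap; returns the value of the last element.
int arrayDemo(std::size_t count) {
    assert(count > 0);
    Counter* counters = new Counter[count]();  // count zeroed objects
    for (std::size_t i = 0; i < count; ++i) {
        counters[i].value = static_cast<int>(i);
    }
    int last = counters[count - 1].value;
    delete[] counters;               // new[] is paired with delete[]
    return last;
}
```

Here `aliasDemo()` returns 5 and `arrayDemo(4)` returns 3. Which pointer is treated as the "owner" is a convention chosen for this sketch; the point is only that the program must agree on a single place where the delete happens.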
Additionally, they should be able to tell when to use a pointer and when to use an object (and, by association, when to use pass by reference and when to use pass by value). Finally, they should be able to allocate and deallocate objects without leaking memory or creating wild pointers. These skills are necessary for the student to be able to take CS2 (Abstract Data Structures) and more advanced courses.

Finally, the CS1 course should lay the groundwork for students to be able to independently frame and solve problems. In How People Learn, the authors review several surveys that analyze the differences between novices and experts and find that experts are much better at noticing patterns and at using those patterns to solve new problems [10]. Several of the other courses, most notably CS2, are designed to teach students many tools and patterns for solving new problems. In contrast, the CS1 course teaches basic building blocks rather than these more advanced schemata. However, students should begin to become comfortable with the basics of design.

3 Pre-Assessment

At Michigan State University, computer science students are required to take an introductory Python course before they take the C++ course. As such, they will already have seen many of the above ideas, albeit in a limited form and a different guise. Thus, we will create and administer a pre-assessment to determine how much they recall from their Python class and how well they can use it to predict the behavior of similar situations in C++. This pre-assessment will not count towards their grade but will instead serve as a starting point for the unit. Our pre-assessment will be a Background Knowledge Probe [2] in which the students are given several short programs and have to identify what each program outputs. This quiz will have two parts.
