Assignment 3 1/21/2015
The University of Wollongong
School of Computer Science and Software Engineering
CSCI124/MCS9124 Applied Programming
Spring 2010

Assignment 3 (worth 6 marks)

• Due at 11.59pm Thursday, September 9 (Week 7).

Aim: Developing programs with dynamic memory and hashing.

Requirements:

This assignment involves the development of a database for a cinema. The program is designed to look up past movie releases, determining when the film was released, the rating it received from the Classification Board and its running time. To accelerate the lookup, a hash table will be used, with two possible hash functions. The program will report how efficiently the table is operating. Storage will be dynamic, enabling the space used by the program to be adjusted to suit the datafile size. The program must clean up after memory usage. During development, the program may ignore the dynamic nature until the program is working.

As an addendum, the assignment also involves the use of compiler directives and makefiles. Those students who use a platform other than Linux/Unix will be required to ensure that their programs comply with the specifications below.

The program is to be implemented in three files: ass3.cpp contains the simplest form of main function, hash.h contains the function prototypes of the publicly accessible functions, and the implementations of these functions will be placed in the file hash.cpp. The first two files should not be altered, although you may write your own driver programs to test your functions during development. The third file contains the stubs of the four public functions, plus a sample struct for holding the records, usable until they are changed for dynamic memory.

Test data is provided in movies.dat. This filename should be hard-coded in your program, although your program cannot assume that the contents will always be the same.

Step 1: involves implementing the function

    bool ReadFile(const char[]);

which opens the named file.
The function should check whether the file opens correctly, printing an error message and returning false if it fails. Otherwise, read the first item in the datafile, which is the number of films contained in the file. This means that you need no eof loop to read the data :-). The struct provided can hold the data in the file, which consists of:

• The date of release - a character string containing no whitespace
• The title of the movie - a character string where each word in the title is capitalised
• The rating - one or two characters
• The running time - an integer

Each line in the file has a tab between the above items and a newline at the end. Reading the data will involve a combination of extraction and getline. An ignore may be required.

The given struct will ensure there is sufficient space for the data to be stored into individual instances of the struct, but an array of these structs is required. If you understand the concepts of using pointers and dynamic memory at this time, you may modify the struct now to incorporate pointers for c-string storage, or you can change to dynamic memory later. Do not use the string class in this assignment. Similarly, you may use a fixed size array at this time but, at submission, your array must be dynamically allocated to perfectly fit the data in the file.

Your function should print out the number of records read into the array in a form similar to

    The datafile contained 1887 records.

Step 2: involves implementing the function

    void LoadHashTable(double);

This should first check to see if the file has been read, that is, that the array is occupied. If not, an error message should be printed and control returned to the calling function. Using the occupancy ratio (a fraction) passed as the argument, determine the size required for the hash table.
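One way to turn the occupancy ratio into a table size is sketched below. The helper name TableSizeFor is hypothetical and not part of the supplied files, and rounding up is an assumption, chosen so that the filled table never exceeds the requested ratio.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not in hash.h): the smallest table size for which
// recordCount / size does not exceed the requested occupancy ratio.
int TableSizeFor(int recordCount, double occupancy)
{
    assert(recordCount > 0);
    assert(occupancy > 0.0 && occupancy <= 1.0);
    // Round up so the table is never fuller than requested.
    return static_cast<int>(std::ceil(recordCount / occupancy));
}
```

With the sample file's 1887 records and an occupancy ratio of 0.5, this gives a table of 3774 cells.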
(Again, if you are not confident about using dynamic memory at this time, make a fixed size hash table of about size 5000, and limit the usage to the size required by the occupancy ratio.) This table will contain indexes into the data array (so that the data can remain in order of release date). Fill the hash table with -1's to indicate empty cells.

Now you need to write another function, whose name you choose, to perform the hash. This is a function which takes a character string as input and returns an integer in the range that fits the hash table. A second argument to the function allows the size of the table to be passed.

The data can now be inserted into the hash table, based on the generated hash value. Should a collision occur, you'll need to determine where to place the current index. Linear probing as covered in lectures is acceptable, but you may choose alternatives if you wish to experiment.

At the conclusion of the load of the table, the function is to report:

• the average number of cells inspected per hash insertion (this means that for no collision you would get one inspected cell); and
• the maximum number of inspections for an insertion.

These two pieces of information reflect the number of checks required when searching. Here is a suitable form of the output produced by this function:

    The average number of cells inspected when loading the hash table = 1.47271
    while the maximum number of inspections = 66

Step 3: Now comes the search function

    bool FindTitle(const char[]);

This should also check to see if the hash table has been filled. If not, issue an error message and return false. Otherwise, the hash function and insertion scheme used in Step 2 should be used to search the hash table, counting how many cells have to be inspected before either the title is found or an empty cell is encountered.
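The hashing, insertion and search described in Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the required solution: the names HashTitle, InsertIndex and FindIndex are hypothetical, the multiply-by-31 hash is just one common choice, and the titles are passed as a plain array of c-strings rather than through the assignment's record struct.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical hash: a simple polynomial hash reduced to [0, tableSize).
int HashTitle(const char title[], int tableSize)
{
    assert(tableSize > 0);
    unsigned long h = 0;
    for (int i = 0; title[i] != '\0'; ++i)
        h = h * 31 + static_cast<unsigned char>(title[i]);
    return static_cast<int>(h % tableSize);
}

// Places recordIndex in the table using linear probing.
// Returns the number of cells inspected (1 when there is no collision).
// Assumes empty cells hold -1 and the table has at least one empty cell.
int InsertIndex(int table[], int tableSize, const char title[], int recordIndex)
{
    int pos = HashTitle(title, tableSize);
    int inspected = 1;
    while (table[pos] != -1) {
        assert(inspected < tableSize);   // the table must not be full
        pos = (pos + 1) % tableSize;     // step to the next cell, wrapping
        ++inspected;
    }
    table[pos] = recordIndex;
    return inspected;
}

// Searches with the same probe sequence used for insertion.
// Returns the record index, or -1 if an empty cell is reached first.
// 'inspected' receives the number of cells examined.
int FindIndex(const int table[], int tableSize,
              const char* const titles[], const char title[], int& inspected)
{
    int pos = HashTitle(title, tableSize);
    inspected = 0;
    while (inspected < tableSize) {
        ++inspected;
        if (table[pos] == -1)
            return -1;                   // empty cell: title is absent
        if (std::strcmp(titles[table[pos]], title) == 0)
            return table[pos];           // found the matching record
        pos = (pos + 1) % tableSize;
    }
    return -1;                           // every cell inspected, no match
}
```

Summing the values returned by InsertIndex and tracking their maximum yields the two figures that LoadHashTable has to report.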
If the title is not found, report the fact and how many checks were needed, and return false. Otherwise report the contents of the record found and the number of cells inspected. For example

    Enter Title: Inception
    The movie "Inception" was released on 22-Jul-10 with rating M running 148 minutes
    The search took 23 comparisons

Now we'll add some extra output to the report for a found title. We'll display a list of those films released on the same date. The index found from the search enabled the output of the original information by indexing into the database, which is in date order. Add to the output a list of the films released on the same day as the successful search, as in

    Other movies released on that date were:
    Leaving
    Skin
    Greenberg

This will involve working backwards and forwards from the found index. Neaten the output so that plurals are used when appropriate and so that, if there were no concurrent releases, the header above is not printed.

Step 4: It is at this time that we need to replace any situations where fixed length arrays have been used to hold variable length information, and move to pointers and dynamic memory. The rating can remain a simple char array. Input will then involve a temporary read into a fixed character array, the determination of the length of any c-string and the use of new to create suitable memory space.

Once all the arrays are dynamic (the date and title arrays in each record, the array holding the records themselves, and the hash table), we'll need code to free up this memory once it has been finished with. Whenever we are about to allocate new memory, the pointer should be checked to see if that memory has already been allocated (as would happen with multiple ReadFile or LoadHashTable calls, even though our program doesn't do that). If memory has been allocated it should be deleted.
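The allocate-to-fit pattern described in Step 4 might look like the sketch below. The helper name StoreString is hypothetical, and it assumes every such pointer is initialised to null before first use so that the "already allocated" check is meaningful.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: makes dest point at a new copy of src that is
// exactly large enough, releasing any memory dest already owned.
// Assumes dest is either null or was allocated with new[].
void StoreString(char*& dest, const char src[])
{
    assert(src != nullptr);
    // Allocate strlen + 1 characters to leave room for the terminator.
    char* copy = new char[std::strlen(src) + 1];
    std::strcpy(copy, src);

    if (dest != nullptr)     // memory from an earlier call: free it first
        delete[] dest;
    dest = copy;
}
```

A typical use is to read the title into a fixed temporary buffer, call StoreString(record.title, buffer), and later release the memory with delete[] during clean-up; record.title here stands for whatever pointer member your struct ends up with.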
Write a function that the main program can call

    void CleanUp();

(which may involve calls of other simpler functions) to delete the memory used by the hash table, the array of records, and the records themselves.

Step 5: One of the problems of inserting the titles into the hash table in the order of the datafile is that recent releases will go in last, and hence will most likely result in collisions and longer search times. It would be an advantage to level out the searches by inserting the titles into the hash table in a random order. This will involve creating another index array and shuffling the indexes to determine an insertion order. Seed the random number generator with the time, as in

    srand(time(0));

(which requires the header file ctime). Remember to free up this extra array. However, this extra code should be compiled conditionally, based on the compile flag RANDOM set at compile time. (See laboratory exercise 3's L3main.cpp for an example of this.) You may find that the search example above then takes far fewer comparisons.

Step 6: Write a makefile called Makefile to maintain the program written above, so that the command

    $ make ass3

will build the program as at Step 4, while

    $ make ass3r

builds the randomised insertion version. A 'make all' and a 'make clean' command should also be available.

Submit: The four files ass3.cpp, hash.h, hash.cpp and Makefile should be submitted before the due time using the command

    submit -u userid -c csci124 -a 3 filenames

An extension of time for the completion of the assignment may be granted in certain circumstances. A request for an extension must be made to the Subject Coordinator before the due date.
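As a closing illustration of Step 5, the conditionally compiled shuffle could be sketched as below. The function name MakeInsertionOrder is hypothetical, and building the index array in both versions (shuffling it only when RANDOM is defined) is a simplifying assumption of this sketch; the handout only requires that the extra code be compiled under RANDOM.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Hypothetical helper: returns a new array holding 0..count-1.
// When compiled with -DRANDOM the indexes are shuffled; otherwise they
// stay in datafile order. The caller must delete[] the result.
int* MakeInsertionOrder(int count)
{
    assert(count > 0);
    int* order = new int[count];
    for (int i = 0; i < count; ++i)
        order[i] = i;

#ifdef RANDOM
    // Seed with the current time, as the handout suggests.
    std::srand(static_cast<unsigned>(std::time(0)));
    // Fisher-Yates shuffle: swap each cell with a random earlier-or-same cell.
    for (int i = count - 1; i > 0; --i) {
        int j = std::rand() % (i + 1);
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
#endif

    return order;
}
```

Compiling with the flag, for example g++ -DRANDOM, selects the shuffled version, which is presumably what the ass3r target would do.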