WIPRO CONFIDENTIAL

Data Structures and Program Design
By FCG Team

Agenda
• Introduction to Data Structures
• Array as a Data Structure
• Linked List – Definition and Implementation
• Introduction to data structures related to linked lists and arrays
  - Doubly linked lists
  - Circular linked lists
  - Chunk lists
  - Stacks and Queues
  - Lookup tables
  - Hash tables
  - Associative arrays
  - Circular buffers
  - Dynamic arrays
• Binary trees
• Graphs
• Examples of choosing data structures

Prerequisites
• Basics of C
• Basic understanding of pointers
• Basic understanding of recursion
• Some understanding of search and sort techniques

Data Structure Introduction
• Take the example of a text processing program.
• One of the requirements is to sort a list of words.
• How would you design your program to perform this function?
• Specifically, how would you store the data for this program?

[Figure: six words shown in the order W1 W2 W3 W4 W5 W6 and rearranged as W2 W5 W3 W4 W1 W6]

What is a Data Structure
• A structured way of organizing a collection of related data elements.
• A data structure is also defined by
  - the set of operations defined on it, which determines how the data is processed.
• A given collection of data elements can be organized in several ways, i.e. a programmer can choose to represent the data with different types of data structures.
• Choosing an appropriate data structure for your program depends on
  - how the input data is interrelated,
  - what the data processing requirements are,
  - what the program constraints are in terms of time and space.

Importance of Data Structures
• The processing stage of a program revolves around the data structure.
• Typically, if the data structure is chosen correctly, the rest of the logic falls into place.
• Frequently used data structures:
  - Arrays/Tables
  - Stack
  - Linked list (singly linked list, doubly linked list)
  - Tree
• The basic data structures can be combined to form more complex data structures.
  - Arrays and linked lists can be combined to form data structures like the chunk list, hashed list, etc.
• Selecting the wrong data structure results in
  - the processing stage becoming unnecessarily complex and buggy.

Data Structures – Algorithms and Implementation
• For each of the data structures, there are standard algorithms available for
  - Traversing
  - Searching
  - Sorting
• Combining these algorithms, several derived algorithms exist for
  - Splitting
  - Merging
• Implementation of data structures generally uses program-defined basic data types along with references/pointers and structures/unions.

Array as a Data Structure
• Stores a collection of elements of the same type.
• The entire array is allocated as one contiguous block of memory.
• The only defining operation for an array is indexing.
• Static and dynamically allocated arrays:
  - Static arrays have their size fixed at compile time.
  - Dynamically allocated arrays are allocated at run time.
• Fixed-size arrays:
  - Static arrays: the size is fixed at compile time.
  - Dynamically allocated: the size can be determined at run time, but once allocated it most often remains fixed.
• Dynamic arrays:
  - You can dynamically allocate an array on the heap and resize it with a realloc() call.
  - Managing a heap block in this way is fairly complex, but can have excellent efficiency for storage and iteration.
  - If the block cannot be grown in place because contiguous free memory is not available, realloc() copies the old block to a new block and returns a pointer to the new block. This copy is what makes realloc() potentially expensive.

Advantages and Disadvantages of an Array
• Pros
  - Access to an array element is convenient and fast.
    - Any element can be accessed directly using the [ ] syntax.
    - Array access is almost always implemented using fast address arithmetic: the address of an element is computed as an offset from the start of the array, which requires only one multiplication and one addition.
  - Because arrays are contiguous, sequential processing of data stored in an array makes the best use of memory caches.
• Cons
  - Because of the fixed-size restriction on an array, programmers most often allocate an array that is "large enough", which wastes memory in most cases and can still crash in the special cases where the bounds are exceeded.
  - When large arrays are used as local variables, stack usage becomes unmanageable.
  - Inserting or deleting elements at the front or middle of an array is potentially expensive because existing elements need to be shifted over to make room or close the gap.

Linked Lists
• A list of nodes at non-contiguous memory locations in the heap, connected to each other through pointers/links.
• The head of the list must be tracked in order to traverse the list.
• The size of the list varies dynamically during the execution of the program.
• Typical operations are insertion, deletion and searching.

    typedef struct node {
        int data;
        struct node* next;
    } NODE_TYPE;

[Figure: a head pointer in the stack/data segment points to a chain of three nodes (1, 2, 3) in the heap; each node is drawn as a DATA field and an NP field]
• A "head" pointer, local to a function or global, keeps the whole list by storing a pointer to the first node.
• Each node stores one data element (int in this example).
• Each node stores one next pointer.

Linked Lists – Insert at beginning of list
• Allocate memory for a new node.
• Assign the data.
• Set next of the new node to next of head (head is a dummy node, drawn as X).
• Update next of head to point to the new node.

[Figure: head -> X -> 2 after inserting 2; head -> X -> 8 -> 2 after inserting 8]

Linked Lists - Code

    #include <stdlib.h>

    /* Assumed definitions: the slide uses Node and List without declaring them. */
    typedef struct node { int data; struct node* next; } Node;
    typedef Node* List;

    /* Create an empty list: a dummy head node (X) whose next is NULL. */
    List Initialize(void)
    {
        Node* temp = malloc(sizeof(Node));
        if (temp != NULL)
            temp->next = NULL;   /* the slide left next uninitialized */
        return temp;
    }

    /* Insert d right after the dummy head, i.e. at the beginning of the list. */
    void InsertBegin(List head, int d)
    {
        Node* temp = malloc(sizeof(Node));
        if (temp == NULL)
            return;              /* allocation failed; the list is unchanged */
        temp->data = d;
        temp->next = head->next;
        head->next = temp;
    }

    int main(void)
    {
        List head = Initialize();
        if (head == NULL)
            return 1;
        InsertBegin(head, 2);
        return 0;
    }

[Figure: head -> X after Initialize(); head -> X -> 2 after InsertBegin(head, 2)]

Linked Lists – Insert at any point in the list
• Allocate memory for a new node.
• Assign the data.
• Find the insert point, i.e. the node after which the new node has to be inserted.
• Set next of the new node to next of the insert point.
• Update next of the insert point to point to the new node.

[Figure: a list head -> X -> 8 -> 2 with the insert point marked, and further new nodes being linked in after the insert point]

Linked Lists - Delete
• Identify 'del', the node to be deleted, and 'prev', the node before it.
• Store del in temp.
• Make prev->next point to del->next.
• Free the memory pointed to by temp.

[Figure: head -> X -> 10 -> 2 -> 4 -> 8 with prev and del marked; the list shrinks as nodes are deleted]

Linked Lists – Pros and Cons
• Pros
  - Memory is allocated per element as needed, which suits cases where the number of elements is not fixed at compile time.
  - It is easier to do insertions and deletions at any point in the list.
• Cons
  - Iteration is costlier because of non-contiguous storage.
  - Suited only to sequential processing of elements; there is no direct access to the n-th element.
  - Extra storage is needed for the links, which often makes linked lists impractical for small lists of small data items.
  - Locality of data is poor. Linked list elements do not make the best use of memory caches.

Linked Lists - Exercise
• Write code to implement a phone book as found in a mobile phone:
  - Add new entry
  - Edit existing entry
  - Delete an entry
• The user should be able to search for the relevant details based on the name.
• The details to be stored in an entry are:
  - Name of the person
  - Mobile number
  - Landline number
  - A text section of 128 bytes to store notes or an address related to the entry
• The program should flag a warning if a new entry is attempted with an existing name, and add the new details to the existing entry.
• Given a linked list, write a function reverse() that reverses its contents:
  - reverse(List* head)
  - At the end of the call to this function, head should point to the last node of the list that was passed to it, and the list's contents should be reversed.
  - This program should not allocate memory for any additional nodes.

Arrays or Linked Lists?
Choose an array when:
• Random access to elements is a requirement.
• The maximum number of elements is known.
• Program performance with respect to time is more important than with respect to memory.
• During most of the run time, the array elements are all occupied with valid values, so that no memory is wasted.
• There is no need to insert elements at the beginning or middle of the list. Insertion is needed only at the end of the list.
• No deletions are required.
Choose a linked list when:
• The maximum number of elements is not known.
• Access to elements is mostly sequential.
• Frequent insertions and deletions at any point in the list are required.
• Performance in terms of memory is more critical than in terms of time.

Related Data Structures
Doubly linked lists:
• Each node has both a next and a prev pointer, so the list can be traversed in both directions. Insertions and deletions are easier in this list, since there is no need to track a separate previous pointer while traversing.
• Use this when the list frequently needs to be traversed sequentially in both directions. A common use is in editors, which need to move the cursor in both directions.

[Figure: two lists, each with a HEAD pointer and nodes 1, 2, 3]

Circular lists:
• The last node is connected back to the first node. Use this for naturally circular data.
• Instead of needing a fixed head end, any pointer into the list will do.
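
The point about not needing a fixed head can be illustrated with a minimal sketch of a singly linked circular list. The type CNode and the functions ring_insert_after(), ring_count(), ring_sum() and ring_free() are hypothetical names chosen for this example; they do not come from the slides. The traversal starts at whichever node it is given and stops when it comes back around to that node.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for this sketch (not from the slides). */
typedef struct cnode {
    int data;
    struct cnode *next;
} CNode;

/* Insert a new node holding d right after pos and return the new node.
 * If pos is NULL, create a one-node ring whose next points to itself.
 * Returns NULL if allocation fails; the ring is then left unchanged. */
CNode *ring_insert_after(CNode *pos, int d)
{
    CNode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->data = d;
    if (pos == NULL) {
        n->next = n;            /* single node links back to itself */
    } else {
        n->next = pos->next;    /* new node takes over pos's successor */
        pos->next = n;
    }
    return n;
}

/* Count the nodes, starting from any node in the ring. */
int ring_count(const CNode *start)
{
    int count = 0;
    const CNode *p = start;
    if (p == NULL)
        return 0;
    do {
        count++;
        p = p->next;
    } while (p != start);       /* stop once we are back where we began */
    return count;
}

/* Sum the data values, again starting from any node in the ring. */
int ring_sum(const CNode *start)
{
    int sum = 0;
    const CNode *p = start;
    if (p == NULL)
        return 0;
    do {
        sum += p->data;
        p = p->next;
    } while (p != start);
    return sum;
}

/* Free every node of the ring that start belongs to. */
void ring_free(CNode *start)
{
    CNode *p;
    if (start == NULL)
        return;
    p = start->next;
    while (p != start) {
        CNode *next = p->next;
        free(p);
        p = next;
    }
    free(start);
}
```

For example, after building a ring with ring_insert_after(NULL, 1), then inserting 2 after that node and 3 after the node holding 2, ring_count() returns 3 whether it is called with the first node or the last one. The do/while loop is used because the start node must be visited before the "back at start" test can be meaningful.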
