20.09.19

Algorithmics (6EAP), MTAT.03.238
Linear structures, sorting, searching, recurrences, Master Theorem...
Jaak Vilo, 2019 Fall

Big-Oh notation classes

Class             Informal intuition              Analogy
f(n) ∈ o(g(n))    f is dominated by g             Strictly below   <
f(n) ∈ O(g(n))    bounded from above              Upper bound      ≤
f(n) ∈ Θ(g(n))    bounded from above and below    "Equal to"       =
f(n) ∈ Ω(g(n))    bounded from below              Lower bound      ≥
f(n) ∈ ω(g(n))    f dominates g                   Strictly above   >

Linear, sequential, ordered, list … Physical ordered list ~ array

• Memory /address/ – Garbage collection

• Files (character/byte list/lines in text file,…)

• Disk – Memory, disk, tape etc – is an ordered sequentially addressed media. Disk fragmentation

Linear data structures: Arrays
• Array • Bidirectional map • Bitmap • Circular buffer • Control table • Dynamic array • Gap buffer • Hashed array tree • Heightmap • Iliffe vector • Image • Lookup table • Matrix • Parallel array • Sorted array • Sparse array • Sparse matrix • Variable-length array

Linear data structures: Lists
• Array list • Difference list • Doubly connected edge list • Doubly linked list • Linked list • Self-organizing list • Skip list • Unrolled linked list • VList • Xor linked list • Zipper


Lists: Array

Indexing from 0:
  index:  0  1  2  ...  size-1  ...  MAX_SIZE-1
  data:   3  6  7  5  2

  L = int[MAX_SIZE]
  L[2] = 7
  L[size++] = new_value     // append to end

Indexing from 1:
  index:  1  2  3  ...  size  ...  MAX_SIZE
  data:   3  6  7  5  2

  L[3] = 7                  // same element as L[2] above
  L[++size] = new_value     // append to end

Multiple lists, 2-D arrays, etc…

2D array (row-major layout):

  &A[i][j] = A + i*(nr_el_in_row * el_size) + j*el_size

Linear Lists Abstract Data Type (ADT)

• Operations which one may want to perform • High-level definition of data types on a linear list of n elements include: • An ADT specifies – A collection of data – gain access to the kth element of the list to – A set of operations on the data or subsets of the data examine and/or change the contents • ADT does not specify how the operations should be – insert a new element before or after the kth implemented element • Examples – delete the kth element of the list – vector, list, stack, queue, deque, priority queue, table (map), associative array, set, graph, digraph

Reference: Knuth, The Art of Comptuer Programming, Vol 1, Fundamental Algorithms, 3rd Ed., p.238-9


ADT
• A datatype is a set of values and an associated set of operations
• A datatype is abstract iff it is completely described by its set of operations, regardless of its implementation
• This means that it is possible to change the implementation of the datatype without changing its use
• The datatype is thus described by a set of procedures
• These operations are the only thing that a user of the abstraction can assume

Primitive & composite types
• Primitive types: Boolean (for boolean values True/False), Char (for character values), int (for integral or fixed-precision values), Float (for storing real number values), Double (a larger size of type float), String (for strings of chars), Enumerated type
• Composite types: Array, Record (also called tuple or struct), Union, Tagged union (also called a variant, variant record, discriminated union, or disjoint union), Plain old data structure

Abstract Data Types (ADT)
• Some common ADTs, which have proved useful in a great variety of applications, are:
 – Container, List, Set, Multiset, Map, Multimap, Graph, Stack, Queue, Priority queue, Double-ended queue, Double-ended priority queue
• Dictionary (key, value)
• Stack (LIFO)
• Queue (FIFO)
• Queue (double-ended)
• Priority queue (fetch highest-priority object)
• ...
• Each of these ADTs may be defined in many ways and variants, not necessarily equivalent.

Dictionary
• Container of key-element (k,e) pairs
• Required operations:
 – insert( k,e )
 – remove( k )
 – find( k )
 – isEmpty()
• May also support (when an order is provided):
 – closestKeyBefore( k )
 – closestElemAfter( k )
• Note: no duplicate keys

Some data structures for the Dictionary ADT
• Unordered
 – Array
 – Sequence/list
• Ordered
 – Array
 – Sequence (Skip Lists)
 – Binary Search Tree (BST)
 – AVL trees, red-black trees
 – (2,4) Trees
 – B-Trees
• Valued
 – Hash Tables
 – Extendible Hashing


Trees …

• Binary trees: AA tree, AVL tree, Binary search tree, Binary tree, Cartesian tree, Pagoda, Randomized binary search tree, Red-black tree, Rope, Scapegoat tree, Self-balancing binary search tree, Splay tree, T-tree, Tango tree, Threaded binary tree, Top tree, Treap, Weight-balanced tree
• B-trees: B-tree, B+ tree, B*-tree, B sharp tree, Dancing tree, 2-3 tree, 2-3-4 tree, Queap, Fusion tree, Bx-tree
• Heaps: Heap, Binary heap, Binomial heap, Fibonacci heap, AF-heap, 2-3 heap, Soft heap, Pairing heap, Leftist heap, Treap, Beap, Skew heap, Ternary heap, D-ary heap
• Tries: Trie, Radix tree, Suffix tree, Suffix array, Compressed suffix array, FM-index, Generalised suffix tree, B-trie, Judy array, X-fast trie, Y-fast trie, Ctrie
• Multiway trees: Ternary search tree, And–or tree, (a,b)-tree, Link/cut tree, SPQR-tree, Spaghetti stack, Disjoint-set data structure, Fusion tree, Enfilade, Exponential tree, Fenwick tree, Van Emde Boas tree
• Space-partitioning trees: Segment tree, Interval tree, Range tree, Bin, Kd-tree, Implicit kd-tree, Min/max kd-tree, Adaptive k-d tree, Kdb tree, Quadtree, Octree, Linear octree, Z-order, UB-tree, R-tree, R+ tree, R* tree, Hilbert R-tree, X-tree, Metric tree, Cover tree, M-tree, VP-tree, BK-tree, Bounding interval hierarchy, BSP tree, Rapidly-exploring random tree
• Application-specific trees: Syntax tree, Abstract syntax tree, Parse tree, Decision tree, Alternating decision tree, Minimax tree, Expectiminimax tree

Hashes, Graphs, Other
• Hashes: Bloom filter, Distributed hash table, Hash array mapped trie, Hash list, Hash table, Hash tree, Hash trie, Koorde, Prefix hash tree
• Graphs: Adjacency list, Adjacency matrix, Graph-structured stack, Scene graph, Binary decision diagram, Zero-suppressed decision diagram, And-inverter graph, Directed graph, Directed acyclic graph, Propositional directed acyclic graph, Multigraph, Hypergraph
• Other: Lightmap, Winged edge, Quad-edge, Routing table, Symbol table

Lists: Array
• Access L[i]: O(1)
• Insert to end: O(1)
• Delete from end: O(1)
• Insert at an arbitrary position (e.g. insert 8 after L[2]): O(n)
• Delete at an arbitrary position: O(n)
• Search: O(n)

  0  1  2  ...          size
  3  6  7  8  5  2            Insert 8 after L[2]
  3  6  7  8  5               Delete last

Linear Lists
• Other operations on a linear list may include:
 – determine the number of elements
 – search the list
 – sort a list
 – combine two or more linear lists
 – split a linear list into two or more lists
 – make a copy of a list


Linked lists

Singly linked:  head → [ ] → [ ] → … → [ ] ← tail
Doubly linked:  head ⇄ [ ] ⇄ [ ] ⇄ … ⇄ [ ] ← tail
(the list typically also maintains its size)

Linked lists: add – splice the new node in by redirecting the neighbouring pointers.
Linked lists: delete – unlink the node (+ garbage collection?)

Complexity on a list of n elements (¹ = given a pointer to the kth node):

• Array indexed from 0 to n-1:
                                        k = 1    1 < k < n    k = n
  access/change the kth element         O(1)     O(1)         O(1)
  insert before or after the kth        O(n)     O(n)         O(1)
  delete the kth element                O(n)     O(n)         O(1)

• Singly-linked list with head and tail pointers:
                                        k = 1    1 < k < n    k = n
  access/change the kth element         O(1)     O(n)         O(1)
  insert before or after the kth        O(1)     O(n)¹        O(1)
  delete the kth element                O(1)     O(n)         O(n)

• Doubly linked list:
                                        k = 1    1 < k < n    k = n
  access/change the kth element         O(1)     O(n)         O(1)
  insert before or after the kth        O(1)     O(1)¹        O(1)
  delete the kth element                O(1)     O(1)¹        O(1)

Introduction to linked lists
• Consider the following struct definition:

struct node {
    string word;
    int num;
    node *next;    // pointer to the next node
};
node *p = new node;

    p → [  ?  |  ?   |  ?  ]
         num   word   next

Introduction to linked lists: inserting a node

node *p;
p = new node;
p->num = 5;
p->word = "Ali";
p->next = NULL;

    p → [  5  |  Ali |  /  ]
         num   word   next


Introduction to linked lists: adding a new node
• How can you add another node, pointed to by p->next?

node *p, *q;
p = new node;
p->num = 5;
p->word = "Ali";
p->next = NULL;

q = new node;
q->num = 8;
q->word = "Veli";
q->next = NULL;

p->next = q;

    p → [ 5 | Ali | •-]---→ [ 8 | Veli | / ] ← q
         num  word  next     num  word  next

Pointers in C/C++

p = new node ;                         delete p ;
p = new node[20] ;                     delete[] p ;
p = malloc( sizeof( node ) ) ;         free( p ) ;
p = malloc( sizeof( node ) * 20 ) ;
(p+10)->next = NULL ;    /* the 11th element */

Book-keeping
• malloc, new – remember what has been created
• When you need many small areas to be allocated, reserve a big chunk (array) and maintain your own set of free objects
• Elements of an array of objects can be pointed to by a pointer to an object.


Object
• object = new object_type ;
• Equals creating a new object with the necessary size of allocated memory (delete can free it)

Some links
• http://en.wikipedia.org/wiki/Pointer
• Pointer basics: http://cslibrary.stanford.edu/106/
• C++ Memory Management: What is the difference between malloc/free and new/delete?
 – http://www.codeguru.com/forum/showthread.php?t=401848

Alternative: arrays and integers
• If you want to test linked-list data structures but are not yet familiar with pointers, use arrays and indexes into the arrays instead…

Replacing pointers with array index:

head = 3        pos:   1   2   3   4   5   6   7
                next:  /       5       1
                key:   7       8       4
                prev:  5       /       3

head = 3  ⇒  list contains: 8, 4, 7

Maintaining a list of free objects; multiple lists, single free list

Two lists share one array, together with a free list:
head1 = 3  ⇒  8, 4, 7
head2 = 6  ⇒  3, 9
free = 2

pos:   1   2   3   4   5   6   7
next:  /   4   5   /   1   7   /
key:   7       8       4   3   9
prev:  5       /       3   /   6

free = -1  ⇒  array is full

allocate object:            free object x:
  new = free;                 next[x] = free;
  free = next[free];          free = x;

After releasing list 2 (cells 6 and 7) back to the free list:
head = 3, free = 6, next: / 4 5 / 1 7 2   (free chain 6 → 7 → 2 → 4)


Hack: allocate more arrays …
• When one array fills up, allocate another chunk; use integer division and mod to address the cells:

AA[0]:   1  2  3  4  5  6  7
AA[1]:   8  9 10 11 12 13 14
AA[2]:  15 16 17 18 19 20 21

LIST(i)  = AA[ (i-1) / 7 ][ (i-1) % 7 ]
LIST(10) = AA[1][2]
LIST(19) = AA[2][4]

Queue (FIFO)
• enqueue(x) – add to the end
• dequeue() – fetch from the beginning
• FIFO – First In, First Out
• O(1) in all reasonable cases ☺

Queue in an array:

   F              L         MAX_SIZE-1
   3  6  7  5  2

First = List[F]                  Pop_first : { return List[F++] }
Last  = List[L-1]                Pop_last  : { return List[--L] }
Full:  return ( L == MAX_SIZE )
Empty: F >= L

Circular buffer (basic idea; does not contain all the necessary checks!)
• A circular buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.


Circular Queue

      L        F                  MAX_SIZE-1
   ...    3  6  7  5  2  ...

First = List[F]
Last  = List[ (L-1+MAX_SIZE) % MAX_SIZE ]
Add_to_end( x ) : { List[L] = x ; L = (L+1) % MAX_SIZE }   // % = modulo
Full:  return ( (L+1) % MAX_SIZE == F )
Empty: F == L

Stack
• push(x) – add to the end (the top)
• pop() – fetch from the end (the top)
• LIFO – Last In, First Out
• O(1) in all reasonable cases ☺

Stack based languages
• Implement a postfix calculator
 – Reverse Polish notation
 – 5 4 3 * 2 - +   =>   5 + ((4*3) - 2)
• Very simple to parse and interpret
• FORTH, PostScript are stack-based languages

Array based stack
• How do you know how big a stack will ever need to be?

   3 6 7 5   →   3 6 7 5 2 …

• When full, allocate a bigger table dynamically, and copy all previous values there
• O(n)?

What about deletions?

• When full, create a 2x bigger table and copy the previous n elements there:
 – After every 2^k insertions, perform one O(n) copy
 – O(n) individual insertions + n/2 + n/4 + n/8 + … copying
 – Total: O(n) effort!
• when n=32 -> 33 (copy 32, insert 1)
• delete: 33 -> 32 – should you delete (slots 33-64) immediately?


What about deletions?
• when n=32 -> 33 (copy 32, insert 1)
• delete: 33 -> 32 – should you delete immediately?
 – Shrink only when the table becomes less than 1/4 full
 – Then one has to delete at least n/2 elements to trigger a shrink
 – and add at least n elements to trigger a grow

Amortized analysis
• Analyze the time complexity over the entire "lifespan" of the algorithm
• Some operations that cost more will be "covered" by the many other operations taking less
 – Most operations: O(1) effort
 – A few operations take O(n) to copy
 – For any m operations: O(m) total time

Lists and dictionary ADT…
• How to maintain a dictionary using (linked) lists?
• Is k in D?
 – go through all elements d of D, test if d == k : O(n)
 – if sorted: d = first(D); while( d <= k ) d = next(D);  – on average n/2 tests …
• Add(k,D) => insert(k,D) = O(1), or O(n) if testing for uniqueness

Array based sorted list
• is d in D?
• Binary search in D:

   low        mid        high

Binary search – recursive

BinarySearch(A[0..N-1], value, low, high)
{
    if (high < low)
        return -1                       // not found
    mid = (low + high) / 2
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid-1)
    else if (A[mid] < value)
        return BinarySearch(A, value, mid+1, high)
    else
        return mid                      // found
}

The safer variant computes the midpoint as

    mid = low + ((high - low) / 2)      // Note: not (low + high) / 2 !!

which cannot overflow when low + high exceeds the integer range.


Binary search – iterative

BinarySearch(A[0..N-1], value)
{
    low = 0 ; high = N - 1 ;
    while (low <= high) {
        mid = low + ((high - low) / 2)  // Note: not (low + high) / 2 !!
        if (A[mid] > value)
            high = mid - 1
        else if (A[mid] < value)
            low = mid + 1
        else
            return mid                  // found
    }
    return -1                           // not found
}

Work performed (searching in A[1..36]):
• x <=> A[18] ?   <
• x <=> A[9] ?    >
• x <=> A[13] ?   ==
• O(lg n)

Sorting

• given a list, arrange values so that L[1] <= L[2] <= … <= L[n]
• n elements => n! possible orderings
• One test L[i] <= L[j] can divide the set of n! orderings into 2
 – Make a binary tree and calculate its depth
• log( n! ) = Ω( n log n )
• Hence, the lower bound for sorting by comparisons is Ω( n log n )

Decision tree model
• n! orderings (leaves)
• Height of such a tree?


Proof: log(n!) = Ω( n log n )

log( n! ) = log n + log(n-1) + log(n-2) + … + log(1)
          >= (n/2) * log( n/2 )
          = Ω( n log n )

since the largest n/2 terms of the sum are each at least log( n/2 ).

The divide-and-conquer design paradigm
1. Divide the problem (instance) into subproblems
2. Conquer the subproblems by solving them recursively
3. Combine subproblem solutions

Merge sort

Merge-Sort(A, p, r)
    if p < r
        then q = floor( (p+r)/2 )
             Merge-Sort(A, p, q)
             Merge-Sort(A, q+1, r)
             Merge(A, p, q, r)

It was invented by John von Neumann in 1945.

Example: merge of two lists, Θ(n)

A, B – lists to be merged
L = new list                 // empty
while( A not empty and B not empty )
    if A.first() <= B.first()
        then append( L, A.first() ) ; A = rest(A) ;
        else append( L, B.first() ) ; B = rest(B) ;
append( L, A )               // all remaining elements of A
append( L, B )               // all remaining elements of B
return L


Run-time Analysis of Merge Sort
• The time required to sort an array of size n > 1 is:
 – the time required to sort the first half,
 – the time required to sort the second half, and
 – the time required to merge the two lists
• That is:

   T(n) = Θ(1)               if n = 1
   T(n) = 2 T(n/2) + Θ(n)    if n > 1

(Wikipedia visualization: value vs. position in the array during the sort.)

Merge sort
• Worst case, average case, best case … Θ( n log n )
• Common wisdom:
 – requires additional space for merging (in the case of arrays)
• Homework*: develop an in-place merge of two lists implemented in arrays /compare speed/

Quicksort
• Proposed by C.A.R. Hoare in 1962.
• Divide-and-conquer algorithm.
• Sorts "in place" (like insertion sort, but not like merge sort).
• Very practical (with tuning).
(figure: two pointers a and b; swap when L[a] > L[b], advance when L[a] <= L[b])


Quicksort: partitioning, version 2

    L                                   R
    [ <= pivot | ...unseen... | > pivot ]      pivot = A[R]
              i →           ← j

pivot = A[R];
i = L; j = R-1;
while( i <= j )
    while ( A[i] < pivot ) i++ ;             // will stop at the pivot at the latest
    while ( i <= j and A[j] >= pivot ) j-- ;
    if ( i < j ) { swap( A[i], A[j] ); i++; j--; }
A[R] = A[i];                                 // move the displaced element to the end
A[i] = pivot;                                // put the pivot into its final place
return i;

Final layout:   L … j  i … R    (left of i: <= pivot; right of i: > pivot)

Wikipedia / video


Choice of pivot in Quicksort

• Select median of three …

• Select random – an opponent cannot choose a winning strategy against you!

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/detail/embed04.htm


Random pivot

    L                         R
    [ <= pivot | … | > pivot ]
              i →  ← j

• Select the pivot randomly from the region (blue) and swap it with the last position
• Or select the pivot as a median of 3 [or more] random values from the region
• 2-pivot version of Quicksort – split into 3 regions!
 – Handle equal values (equal to the pivots) as the "third region"
• Apply a non-recursive sort for arrays of fewer than 10-20 elements

    L … j  i … R


Alternative materials

• Quicksort average case analysis: http://eid.ee/10z
 – https://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/quick/pages/ar01s05.html
• http://eid.ee/10y – MIT OpenCourseWare: Asymptotic notation, Recurrences, Substitution & Master Method

(Recursion-tree figure for the Master Theorem: compare n^(log_b a) vs. f(n); a node of cost f(n) has children of cost f(n/b).)


Back to sorting
• We can sort in O(n log n)
• Using comparisons ( <, <=, >=, > ) we cannot do better than O(n log n)
• Is that the best we can do?

How fast can we sort n integers?
• Sort people by their sex? (F/M, 0/1)
• Sort people by year of birth?


Radix sort

Radix-Sort(A, d)
1. for i = 1 to d                /* least significant to most significant digit */
2.     use a stable sort to sort A on digit i


Radix sort using lists (stable)

Input: bbb bba adb aad aac ccc ccb aba cca

Pass 1 (last letter):
  a: bba aba cca
  b: bbb adb ccb
  c: aac ccc
  d: aad
→ bba aba cca bbb adb ccb aac ccc aad

Pass 2 (middle letter):
  a: aac aad
  b: bba aba bbb
  c: cca ccb ccc
  d: adb
→ aac aad bba aba bbb cca ccb ccc adb

Pass 3 (first letter):
  a: aac aad aba adb
  b: bba bbb
  c: cca ccb ccc
  d: –
→ aac aad aba adb bba bbb cca ccb ccc


Why not from left to right? Bitwise sort left to right
• Idea 1: recursively sort the first and second half – exercise?
• Idea 2: swap elements only if their prefixes match
 – For all bits from the most significant:
   • advance while the bit is 0
   • when a 1 is seen, look for the next 0
   • if the prefix matches, swap the 0 with the first 1
   • otherwise keep advancing over 0s and look for the next 1

Successive swaps on the most significant bit (swap a 0 with the first 1):

0101100    0101100    0101100    0101100
1001010    0010010    0010010    0010010
1111000    1111000    0101000    0101000
1001001    1001001    1001001    0010000
0010010    1001010    1001010    1001010
1001001    1001001    1001001    1001001
0101000    0101000    1111000    1111000
0010000    0010000    0010000    1001001

Bitwise left to right sort

/* Historical sorting – was used in Univ. of Tartu using assembler…. */
/* C implementation – Jaak Vilo, 1989 */

void bitwisesort( SORTTYPE *ARRAY , int size )
{
    int i, j, nrbits ;
    register SORTTYPE mask , curbit , group , tmp ;

    nrbits = sizeof( SORTTYPE ) * 8 ;
    curbit = 1 << (nrbits-1) ;      /* set the most significant bit to 1 */
    mask = 0 ;                      /* mask of the already sorted area */

    do {                            /* for each bit */
        i = 0 ;
new_mask:
        for( ; ( i < size ) && ( ! (ARRAY[i] & curbit) ) ; i++ )
            ;                       /* advance while bit == 0 */
        if( i >= size ) goto array_end ;

        group = ARRAY[i] & mask ;   /* save the current prefix snapshot */
        j = i ;                     /* memorize the location of the 1 */
        for( ; ; ) {
            if ( ++i >= size ) goto array_end ;         /* reached end of array */
            if ( (ARRAY[i] & mask) != group )           /* new prefix */
                goto new_mask ;
            if ( ! (ARRAY[i] & curbit) ) {
                /* bit is 0 – swap with the previous location of a 1: A[i] <=> A[j] */
                tmp = ARRAY[i]; ARRAY[i] = ARRAY[j]; ARRAY[j] = tmp ;
                j += 1 ;            /* j moves to the next possible 1 */
            }
        }
array_end:
        mask = mask | curbit ;      /* the area under mask is now sorted */
        curbit >>= 1 ;              /* next bit */
    } while( curbit ) ;             /* until all bits have been sorted… */
}

Jaak Vilo, Univ. of Tartu

Bitwise from left to right: the sorted result

0010000
0010010
0101000
0101100
1001001
1001001
1001010
1111000

Bucket sort
• Assume a uniform distribution of the values
• Allocate O(n) buckets
• Assign each value to its pre-assigned bucket


Sort small buckets with insertion sort

Example: keys .78 .17 .39 .26 .72 .94 .21 .12 .23 .68 into 10 buckets:

  0: /
  1: .12 .17
  2: .21 .23 .26
  3: .39
  4: /
  5: /
  6: .68
  7: .72 .78
  8: /
  9: .94

http://sortbenchmark.org/
• The sort input records must be 100 bytes in length, with the first 10 bytes being a random key
• MinuteSort – max amount sorted in 1 minute
 – 116 GB in 58.7 sec (Jim Wyllie, IBM Research)
 – 40-node 80-Itanium cluster, SAN array of 2,520 disks
• 2009: 500 GB with Hadoop on 1406 nodes × (2 Quad-core Xeons, 8 GB memory, 4 SATA) – Owen O'Malley and Arun Murthy, Yahoo Inc.
• Performance/Price Sort and PennySort


Sort Benchmark
• http://sortbenchmark.org/ – Sort Benchmark Home Page
• "We have a new benchmark called GraySort, in memory of the father of the sort benchmarks, Jim Gray. It replaces TeraByteSort, which is now retired."
• Unlike 2010, early entries are not accepted for 2011; the deadline for submitting entries is April 1, 2011.
 – All hardware used must be off-the-shelf and unmodified.
 – For Daytona cluster sorts where input sampling is used to determine the output partition boundaries, the input sampling must be done evenly across all input partitions.
• New rules for GraySort:
 – The input file size is now minimum ~100 TB or 1T records. Entries with larger input sizes also qualify.
 – The winner will have the fastest SortedRecs/Min.
 – A new input generator is provided that works in parallel and generates binary data.
 – For the Daytona category: (1) the sort must run continuously (repeatedly) for a minimum of 1 hour (a minimum reliability requirement); (2) the system cannot overwrite the input file.

Order statistics
• Minimum – the smallest value
• Maximum – the largest value
• In general, the ith value
• Input: a set A of n numbers and i, 1 <= i <= n
• Output: x from A that is larger than exactly i-1 elements of A
• Find the median of the values in the array
 – n is odd: A[(n+1)/2]
 – n is even: A[ floor((n+1)/2) ] or A[ ceil((n+1)/2) ]

Q: Find the ith value in unsorted data
 A. O( n )
 B. O( n log log n )
 C. O( n log n )
 D. O( n log² n )

Minimum

Minimum(A)
1   min = A[1]
2   for i = 2 to length(A)
3       if min > A[i]
4           then min = A[i]
5   return min

n-1 comparisons.


Min and max together
• compare every pair of elements A[i], A[i+1]
• compare the larger one against the current max
• and the smaller one against the current min
• 3n/2 comparisons in total

Selection in expected O(n)

Randomised-Select( A, p, r, i )
    if p = r then return A[p]
    q = Randomised-Partition( A, p, r )
    k = q - p + 1                      // nr of elements in the low subarray
    if i <= k then return Randomised-Select( A, p, q, i )
    else return Randomised-Select( A, q+1, r, i-k )

Conclusion
• Sorting in general: O( n log n )
• Quicksort is rather good
• Linear-time sorting is achievable when one does not assume only direct comparisons
• Find the ith value in unsorted data – expected O(n)
• Find the ith value: worst case O(n) – see CLRS

Ok…
• lists – a versatile data structure for various purposes
• Sorting – a typical algorithm (many ways)
• Which sorting methods for array/list?
• Array: most of the important (e.g. update) tasks seem to be O(n), which is bad

Q: search for a value X in a linked list?
 A. O(1)
 B. O( log n )
 C. O( n )

Can we search faster in linked lists?
• Why sort linked lists if search is anyway O(n)?
• Linked lists:
 – what is the mid-point of any sublist?
 – Therefore, binary search can not be used…
• Or can it?


Skip lists
• Build several lists at different skip steps
• O(n) list
• Level 1: ~ n/2 elements
• Level 2: ~ n/4
• …
• Level log n: ~ 2-3 elements

Skip List node:

typedef struct nodeStructure *node;
typedef struct nodeStructure {
    keyType   key;
    valueType value;
    node      forward[1];   /* variable sized array of forward pointers */
};

Example:

S3  -∞ ----------------------------------- +∞
S2  -∞ --------- 15 ----------------------- +∞
S1  -∞ --------- 15 ------ 23 ------------- +∞
S0  -∞ -- 10 --- 15 ------ 23 ------ 36 --- +∞

What is a Skip List
• A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, …, Sh such that
 – Each list Si contains the special keys +∞ and -∞
 – List S0 contains the keys of S in nondecreasing order
 – Each list is a subsequence of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh
 – List Sh contains only the two special keys
• We show how to use a skip list to implement the dictionary ADT
• Example: -∞ 3 17 25 30 47 99 +∞

Illustration of lists:

S3  -∞ ------------------------------------------------------ +∞
S2  -∞ ------------------- 31 --------------------------------- +∞
S1  -∞ -------- 23 ------- 31 -- 34 ------------- 64 ---------- +∞
S0  -∞ -- 12 -- 23 -- 26 -- 31 -- 34 -- 44 -- 56 -- 64 -- 78 -- +∞

• Key, Right[..] – right links in an array, Left[..] – left links in an array; the array size tells how high the list is

9/17/19 9:12 AM Skip Lists 171


Search
• We search for a key x in a skip list as follows:
 – We start at the first position of the top list
 – At the current position p, we compare x with y ← key(after(p))
   x = y: we return element(after(p))
   x > y: we "scan forward"
   x < y: we "drop down"
 – If we try to drop down past the bottom list, we return NO_SUCH_KEY
• Example: search for 78

S3  -∞ ------------------------------------------------------ +∞
S2  -∞ ------------------- 31 --------------------------------- +∞
S1  -∞ -------- 23 ------- 31 -- 34 ------------- 64 ---------- +∞
S0  -∞ -- 12 -- 23 -- 26 -- 31 -- 34 -- 44 -- 56 -- 64 -- 78 -- +∞

Randomized Algorithms
• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
• It contains statements of the type
   b ← random()
   if b = 0 do A … else { b = 1 } do B …
• Its running time depends on the outcomes of the coin tosses
• We analyze the expected running time of a randomized algorithm under the following assumptions:
 – the coins are unbiased, and
 – the coin tosses are independent
• The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give "heads")
• We use a randomized algorithm to insert items into a skip list


Insertion
• To insert an item (x, o) into a skip list, we use a randomized algorithm:
 – We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads
 – If i ≥ h, we add to the skip list new lists Sh+1, …, Si+1, each containing only the two special keys
 – We search for x in the skip list and find the positions p0, p1, …, pi of the items with the largest key less than x in each list S0, S1, …, Si
 – For j ← 0, …, i, we insert item (x, o) into list Sj after position pj
• Example: insert key 15, with i = 2

Deletion
• To remove an item with key x from a skip list, we proceed as follows:
 – We search for x in the skip list and find the positions p0, p1, …, pi of the items with key x, where position pj is in list Sj
 – We remove positions p0, p1, …, pi from the lists S0, S1, …, Si
 – We remove all but one list containing only the two special keys
• Example: remove key 34


Implementation v2
• We can implement a skip list with quad-nodes
• A quad-node stores:
 – item
 – link to the node before
 – link to the node after
 – link to the node below
 – link to the node above
• Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them

Space Usage
• The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
• We use the following two basic probabilistic facts:
 – Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2^i
 – Fact 2: If each of n items is present in a set with probability p, the expected size of the set is np
• Consider a skip list with n items
 – By Fact 1, we insert an item in list Si with probability 1/2^i
 – By Fact 2, the expected size of list Si is n/2^i
• The expected number of nodes used by the skip list is

   sum_{i=0..h} n/2^i  =  n * sum_{i=0..h} 1/2^i  <  2n

• Thus, the expected space usage of a skip list with n items is O(n)



Height
• The running time of the search and insertion algorithms is affected by the height h of the skip list
• We show that with high probability, a skip list with n items has height O(log n)
• We use the following additional probabilistic fact:
 – Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np
• Consider a skip list with n items
 – By Fact 1, we insert an item in list Si with probability 1/2^i
 – By Fact 3, the probability that list Si has at least one item is at most n/2^i
• By picking i = 3 log n, the probability that S_{3 log n} has at least one item is at most
   n/2^{3 log n} = n/n³ = 1/n²
• Thus a skip list with n items has height at most 3 log n with probability at least 1 - 1/n²

Search and Update Times
• The search time in a skip list is proportional to
 – the number of drop-down steps, plus
 – the number of scan-forward steps
• The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability
• To analyze the scan-forward steps, we use yet another probabilistic fact:
 – Fact 4: The expected number of coin tosses required in order to get tails is 2
• When we scan forward in a list, the destination key does not belong to a higher list
 – A scan-forward step is associated with a former coin toss that gave tails
• By Fact 4, in each list the expected number of scan-forward steps is 2
• Thus, the expected number of scan-forward steps is O(log n)
• We conclude that a search in a skip list takes O(log n) expected time
• The analysis of insertion and deletion gives similar results


Summary
• A skip list is a data structure for dictionaries that uses a randomized insertion algorithm
• In a skip list with n items:
 – The expected space used is O(n)
 – The expected search, insertion and deletion time is O(log n)
• Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability
• Skip lists are fast and simple to implement in practice


Conclusions

• Abstract data types hide implementations
• Important is the functionality of the ADT
• Data structures and algorithms determine the speed of the operations on data
• Linear data structures provide good versatility
• Sorting – a most typical need/algorithm
• Sorting in O(n log n): Merge Sort, Quicksort
• Solving recurrences – a means to analyse
• Skip lists – an O(log n) randomised data structure
