08.09.20

Algorithmics (6EAP), MTAT.03.238
Linear structures, sorting, searching, Recurrences, Master Theorem...
Jaak Vilo
2020 Fall

Big-Oh notation classes

Class             Informal intuition               Analogy
f(n) ∈ o( g(n) )    f is dominated by g              Strictly below   <
f(n) ∈ O( g(n) )    Bounded from above               Upper bound      ≤
f(n) ∈ Θ( g(n) )    Bounded from above and below     "equal to"       =
f(n) ∈ Ω( g(n) )    Bounded from below               Lower bound      ≥
f(n) ∈ ω( g(n) )    f dominates g                    Strictly above   >
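The informal intuitions in the table correspond to the usual formal definitions; one standard formulation (the constants c and n₀ are part of the notation, not fixed values) is:

```latex
f(n) \in O(g(n)) \;\Leftrightarrow\; \exists c > 0,\ \exists n_0:\ \forall n \ge n_0,\ f(n) \le c\,g(n)
f(n) \in \Omega(g(n)) \;\Leftrightarrow\; \exists c > 0,\ \exists n_0:\ \forall n \ge n_0,\ f(n) \ge c\,g(n)
f(n) \in \Theta(g(n)) \;\Leftrightarrow\; f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n))
f(n) \in o(g(n)) \;\Leftrightarrow\; \lim_{n\to\infty} f(n)/g(n) = 0
f(n) \in \omega(g(n)) \;\Leftrightarrow\; \lim_{n\to\infty} f(n)/g(n) = \infty
```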


Linear, sequential, ordered, list … Physical ordered list ~ array

• Memory /address/ – Garbage collection

• Files (character/byte list/lines in text file,…)

• Disk – Disk fragmentation

Memory, disk, tape, etc. – ordered, sequentially addressed media.

Data structures

• https://en.wikipedia.org/wiki/List_of_data_structures
• https://en.wikipedia.org/wiki/List_of_terms_relating_to_algorithms_and_data_structures

Linear data structures: Arrays

• Array
• Bidirectional map
• Bit array
• Bit field
• Bitboard
• Bitmap
• Circular buffer
• Control table
• Dynamic array
• Gap buffer
• Hashed array tree
• Heightmap
• Iliffe vector
• Image
• Lookup table
• Matrix
• Parallel array
• Sorted array
• Sparse array
• Sparse matrix
• Variable-length array


Linear data structures: Lists

• Array list
• Linked list
• Doubly linked list
• Xor linked list
• Self-organizing list
• Skip list
• Unrolled linked list
• Difference list
• Doubly connected edge list
• Zipper
• VList

Lists: Array

index:     0  1  …          size-1     …  MAX_SIZE-1
contents:  3  6  7  5  2

L = int[MAX_SIZE]
L[2]=7

Lists: Array

0-based array, indices 0 … MAX_SIZE-1 (elements at 0 … size-1):
  3 6 7 5 2
  L = int[MAX_SIZE]
  L[2]=7
  L[size++] = new     // append at end

1-based array, indices 1 … MAX_SIZE (elements at 1 … size):
  3 6 7 5 2
  L[3]=7
  L[++size] = new     // append at end

Multiple lists, 2-D arrays, etc…

2D array

Linear Lists

• Operations which one may want to perform on a linear list of n elements include:

– gain access to the kth element of the list to examine and/or change the contents
– insert a new element before or after the kth element
– delete the kth element of the list

2D array addressing (row-major): address of A[i,j] = A + i*(nr_el_in_row*el_size) + j*el_size
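The row-major formula above can be sketched directly in C; `nr_el_in_row` and `el_size` are the assumed dimensions, not values fixed by the slide:

```c
#include <stddef.h>

/* Sketch: byte offset of A[i][j] in a row-major 2-D array, mirroring
   the slide's formula  A + i*(nr_el_in_row*el_size) + j*el_size. */
size_t offset_2d(size_t i, size_t j, size_t nr_el_in_row, size_t el_size)
{
    return i * (nr_el_in_row * el_size) + j * el_size;
}
```

For example, with rows of 10 four-byte ints, element [2][3] lives at byte offset 2*40 + 12 = 92.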

Reference: Knuth, The Art of Computer Programming, Vol 1, Fundamental Algorithms, 3rd Ed., pp. 238–239


Abstract Data Type (ADT)

• High-level definition of data types
• An ADT specifies
  – A collection of data
  – A set of operations on the data or subsets of the data
• An ADT does not specify how the operations should be implemented
• Examples
  – vector, list, stack, queue, deque, priority queue, table (map), set, graph, digraph

ADT

• A datatype is a set of values and an associated set of operations
• A datatype is abstract iff it is completely described by its set of operations, regardless of its implementation
• This means that it is possible to change the implementation of the datatype without changing its use
• The datatype is thus described by a set of procedures
• These operations are the only thing that a user of the abstraction can assume

Primitive & composite types

Primitive types:
• Boolean (for boolean values True/False)
• Char (for character values)
• Int (for integral or fixed-precision values)
• Float (for storing real number values)
• Double (a larger size of type float)
• String (for strings of chars)
• Enumerated type

Composite types:
• Array
• Record (also called tuple or struct)
• Union
• Tagged union (also called a variant, variant record, discriminated union, or disjoint union)
• Plain old data structure

Abstract data types:
• Dictionary (key, value)
• Stack (LIFO)
• Queue (FIFO)
• Queue (double-ended)
• Priority queue (fetch highest-priority object)
• ...

Abstract Data Types (ADT)

• Some common ADTs, which have proved useful in a great variety of applications, are:
  Container        Stack
  List             Graph
  Set              Queue
  Multiset         Priority queue
  Map              Double-ended queue
  Multimap         Double-ended priority queue
• Each of these ADTs may be defined in many ways and variants, not necessarily equivalent.

Dictionary

• Container of key-element (k,e) pairs
• Required operations:
  – insert( k,e )
  – remove( k )
  – find( k )
  – isEmpty()
• May also support (when an order is provided):
  – closestKeyBefore( k )
  – closestElemAfter( k )
• Note: No duplicate keys


Some data structures for Dictionary ADT

• Unordered
  – Array
  – Sequence/list
• Ordered
  – Array
  – Sequence (Skip Lists)
  – Binary search tree (BST)
  – AVL trees, red-black trees
  – (2,4) Trees
  – B-Trees
• Valued
  – Hash Tables
  – Extendible Hashing

Linear data structures

Arrays: Array • Bit array • Bit field • Bitboard • Bitmap • Circular buffer • Control table • Dynamic array • Gap buffer • Hashed array tree • Heightmap • Iliffe vector • Image • Lookup table • Matrix • Parallel array • Sorted array • Sparse array • Sparse matrix • Variable-length array

Lists: Doubly linked list • Linked list • Self-organizing list • Skip list • Unrolled linked list • VList • Xor linked list • Zipper • Doubly connected edge list

Trees … Hashes, Graphs, Other

Binary trees: AA tree • AVL tree • Binary search tree • Randomized binary search tree • Red-black tree • Rope • Self-balancing binary search tree • T-tree • Tango tree • Threaded binary tree • Treap • Weight-balanced tree

B-trees: B-tree • B+ tree • B*-tree • B sharp tree • Dancing tree • 2-3 tree • 2-3-4 tree • Queap • Fusion tree • Bx-tree

Heaps: Heap • Leftist heap • Skew heap • Ternary heap • D-ary heap • Beap • Pagoda • Soft heap • AF-heap • 2-3 heap

Tries: Trie • Suffix tree • Suffix array • Compressed suffix array • FM-index • Generalised suffix tree • B-trie • Judy array • X-fast trie • Y-fast trie

Multiway trees: Ternary search tree • And–or tree • (a,b)-tree • Link/cut tree • SPQR-tree • Spaghetti stack • Disjoint-set data structure • Enfilade • Van Emde Boas tree

Space-partitioning trees: Bin • Kd-tree • Implicit kd-tree • Min/max kd-tree • Adaptive k-d tree • Kdb tree • Linear octree • Z-order • UB-tree • R-tree • R+ tree • R* tree • Hilbert R-tree • X-tree • VP-tree • M-tree • BK-tree • Bounding interval hierarchy • BSP tree • Rapidly-exploring random tree

Application-specific trees: Syntax tree • Abstract syntax tree • Parse tree • Decision tree • Alternating decision tree • Minimax tree • Expectiminimax tree • Symbol table

Hashes: Bloom filter • Distributed hash table • Hash array mapped trie • Hash list • Hash table • Hash tree • Hash trie • Koorde • Prefix hash tree

Graphs: Adjacency list • Adjacency matrix • Graph-structured stack • Scene graph • Binary decision diagram • Zero-suppressed decision diagram • And-inverter graph • Directed graph • Directed acyclic graph • Propositional directed acyclic graph • Multigraph • Hypergraph

Other: Lightmap • Winged edge • Quad-edge • Routing table

Lists: Array

*array (memory address)

index:     0  1  …          size-1     …  MAX_SIZE-1
contents:  3  6  7  5  2

• Access i          O(1)
• Insert to end     O(1)
• Delete from end   O(1)
• Insert (middle)   O(n)
• Delete (middle)   O(n)
• Search            O(n)

Insert 8 after L[2]:   3 6 7 5 2   →   3 6 7 8 5 2   (shift the tail right)
Delete last:           3 6 7 8 5 2 →   3 6 7 8 5
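One way to realize the O(n) middle insert/delete above is to shift the tail with `memmove`; this is a sketch, with `MAX_SIZE` and the by-pointer `size` convention as assumptions of the example:

```c
#include <string.h>

#define MAX_SIZE 16

/* O(n) insertion into a plain array list: shift the tail right by one
   and place x at position pos. Caller must ensure *size < MAX_SIZE. */
void array_insert(int L[], int *size, int pos, int x)
{
    memmove(&L[pos + 1], &L[pos], (*size - pos) * sizeof(int));
    L[pos] = x;
    (*size)++;
}

/* O(n) deletion: shift the tail left over position pos. */
void array_delete(int L[], int *size, int pos)
{
    memmove(&L[pos], &L[pos + 1], (*size - pos - 1) * sizeof(int));
    (*size)--;
}
```

Inserting 8 after L[2] in 3 6 7 5 2 gives 3 6 7 8 5 2, matching the slide's picture.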


Linear Lists

• Other operations on a linear list may include:
  – determine the number of elements
  – search the list
  – sort a list
  – combine two or more linear lists
  – split a linear list into two or more lists
  – make a copy of a list

Linked lists

Singly linked:  head → node → node → … → tail
Doubly linked:  head ⇄ node ⇄ node ⇄ … ⇄ tail

Linked lists: add – link the new node into the chain (O(1) once the position is known)
Linked lists: delete – unlink the node from the chain (+ garbage collection?)

• Array indexed from 0 to n – 1:
                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(1)         O(1)
  insert before or after the kth element    O(n)     O(n)         O(1)
  delete the kth element                    O(n)     O(n)         O(1)

• Singly-linked list with head and tail pointers:
                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(n)         O(1)
  insert after the kth element              O(1)     O(n)¹        O(1)
  insert before / delete the kth element    O(1)     O(n)         O(n)

• Doubly linked list:
                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(n)         O(1)
  insert before or after the kth element    O(1)     O(1)¹        O(1)
  delete the kth element                    O(1)     O(1)¹        O(1)

  ¹ O(1) if a pointer to the kth node is already at hand

Introduction to linked lists

• Consider the following struct definition:

struct node {
    string word;
    int num;
    node *next;   // pointer to the next node
};

node *p = new node;

p → [  ?  |  ?   |  ?  ]
     num   word   next
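The O(1) "insert after a given node" from the tables above can be sketched in plain C; the slide's `string word` is narrowed here to a fixed `char` buffer so the example stays self-contained, which is an adaptation, not the slide's exact type:

```c
#include <stdlib.h>
#include <string.h>

/* Adaptation of the slide's node: word as a fixed buffer instead of
   a C++ string, so the sketch compiles as plain C. */
struct node {
    char word[16];
    int num;
    struct node *next;
};

/* O(1) insertion after node p: the new node inherits p's successor. */
struct node *insert_after(struct node *p, int num, const char *word)
{
    struct node *q = malloc(sizeof(struct node));
    q->num = num;
    strncpy(q->word, word, sizeof(q->word) - 1);
    q->word[sizeof(q->word) - 1] = '\0';
    q->next = p->next;   /* link q into the chain ... */
    p->next = q;         /* ... right after p */
    return q;
}
```

No shifting is needed, which is exactly why the middle column of the linked-list tables beats the array once the position is known.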


Introduction to linked lists: creating a node

node *p;
p = new node;
p->num = 5;
p->word = "Ali";
p->next = NULL;

p → [ 5 | Ali | NULL ]
     num  word  next

Introduction to linked lists: adding a new node

• How can you add another node, pointed to by p->next?

node *q;
q = new node;
q->num = 8;
q->word = "Veli";

p → [ 5 | Ali | ? ]      q → [ 8 | Veli | ? ]

Introduction to linked lists (continued)

p->next = q;
q->next = NULL;

p → [ 5 | Ali | • ] → [ 8 | Veli | NULL ]

Pointers in C/C++

p = new node ;                        delete p ;
p = new node[20] ;

p = malloc( sizeof( node ) ) ;        free( p ) ;
p = malloc( sizeof( node ) * 20 ) ;
(p+10)->next = NULL ;                 /* 11th element */


Book-keeping

• malloc, new – the runtime must “remember” what has been created; free(p), delete return it (C/C++)
• When you need many small areas to be allocated, reserve a big chunk (array) and maintain your own set of free objects

Object

• object = new object_type ;
• Equals creating a new object with the necessary amount of allocated memory (delete can free it)
• Elements of an array of objects can be referenced through a pointer to an object.

Some links

• http://en.wikipedia.org/wiki/Pointer
• Pointer basics: http://cslibrary.stanford.edu/106/
• C++ Memory Management: What is the difference between malloc/free and new/delete?
  – http://www.codeguru.com/forum/showthread.php?t=401848

Alternative: arrays and integers

• If you want to test pointers and linked-list data structures, but are not yet familiar with pointers
• Use arrays and indexes to array elements instead…

Replacing pointers with array indexes

• Store the list in parallel arrays next[], key[], prev[]: a node is an index (e.g. head=3), and “/” marks a nil link.

Maintaining a list of free objects

• Unused cells are chained through next[] into a free list (e.g. free=6); free = -1 => array is full.

allocate object:                  free object x:
  new = free;                       next[x] = free;
  free = next[free];                free = x;
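The allocate/free pseudocode above can be sketched as a tiny pool; the pool size `N` and the convention of starting cells at index 1 are assumptions of this example:

```c
#define N 8

/* Pointer-free allocator: next[] threads the free list through the
   cell pool; free_head == -1 means the pool is exhausted.
   Cells are indexed 1..N-1 in this sketch. */
int next_link[N];
int free_head;

void pool_init(void)
{
    for (int i = 1; i < N - 1; i++)
        next_link[i] = i + 1;      /* chain all cells into the free list */
    next_link[N - 1] = -1;
    free_head = 1;
}

int pool_alloc(void)               /* new = free; free = next[free] */
{
    int x = free_head;
    if (x != -1)
        free_head = next_link[x];
    return x;
}

void pool_free(int x)              /* next[x] = free; free = x */
{
    next_link[x] = free_head;
    free_head = x;
}
```

A freed cell goes to the front of the free list, so it is the first one handed out again.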


Multiple lists, single free list

• Several lists can share one pool: head1=3 => 8, 4, 7; head2=6 => 3, 9; free=2 (2 cells left)

Hack: allocate more arrays …

• When one array fills up, allocate another chunk AA[k] and use integer division and mod to address cells (7 cells per chunk in this example):

  cell i  →  AA[ (i-1)/7 ][ (i-1) % 7 ]

  cells  1..7 → AA[0],  8..14 → AA[1],  15..21 → AA[2]
  LIST(10) = AA[1][2]
  LIST(19) = AA[2][4]

Example with 4 bits

A   1011
B   0110
C   1100

AC  = A xor C  = 0111
ACC = AC xor C = 1011 = A
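The slide's identity (x ^ y) ^ y == x is what the Xor linked list mentioned earlier relies on: each node stores link = prev ^ next, and knowing one neighbour recovers the other. A minimal check of the 4-bit example:

```c
/* (a ^ c) ^ c == a: xor-ing with the same value twice is a no-op,
   since x ^ x == 0 and x ^ 0 == x. */
unsigned xor_recover(unsigned ac, unsigned c)
{
    return ac ^ c;
}
```

With A = 1011₂ (0xB) and C = 1100₂ (0xC), A ^ C = 0111₂ (0x7), and xor-ing with C again gives back A.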

Truth table for all binary logical operators

• https://en.wikipedia.org/wiki/Truth_table

• Pause


Queue

• enqueue(x) – add to end
• dequeue() – fetch from beginning
• FIFO – First In, First Out
• O(1) in all reasonable cases :)


Queue (FIFO)

Queue (basic idea; does not contain all controls!)

          F              L          MAX_SIZE-1
          3  6  7  5  2

First = List[F]            Pop_first : { return List[F++] }
Last  = List[L-1]          Pop_last  : { return List[--L] }

Full:  L == MAX_SIZE
Empty: F < 0 or F >= L

Circular buffer

• A circular buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.

Circular Queue

First = List[F]
Add_to_end( x ) : { List[L] = x ; L = (L+1) % MAX_SIZE }     // % = modulo
Last = List[ (L-1+MAX_SIZE) % MAX_SIZE ]
Full:  (L+1) % MAX_SIZE == F
Empty: F == L
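The modular-arithmetic queue above can be sketched directly; `MAX_SIZE` and the global F/L indices are assumptions of this example (one slot is sacrificed so that full and empty are distinguishable, matching the slide's Full/Empty tests):

```c
#define MAX_SIZE 8

int buf[MAX_SIZE];
int F = 0, L = 0;   /* first element at F; next free slot at L */

int cq_empty(void) { return F == L; }
int cq_full(void)  { return (L + 1) % MAX_SIZE == F; }

void cq_enqueue(int x)              /* add to end, O(1) */
{
    buf[L] = x;
    L = (L + 1) % MAX_SIZE;         /* wrap around the fixed buffer */
}

int cq_dequeue(void)                /* fetch from beginning, O(1) */
{
    int x = buf[F];
    F = (F + 1) % MAX_SIZE;
    return x;
}
```

Because both indices wrap modulo MAX_SIZE, the buffer behaves as if connected end-to-end, as the slide describes.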


Stack

• push(x) – add to end (add to top)
• pop() – fetch from end (top)
• LIFO – Last In, First Out
• O(1) in all reasonable cases :)

Stack based languages

• Implement a postfix calculator
  – Reverse Polish notation
• 5 4 3 * 2 - +  =>  5 + ((4*3) - 2)
• Very simple to parse and interpret
• FORTH, Postscript are stack-based languages

Array based stack

• How to know how big a stack shall ever be?

  3 6 7 5        (full)
  3 6 7 5 2 _ _ _   (after doubling)

• When full, allocate a 2x bigger table dynamically and copy all previous n elements there
• After every 2^k insertions, perform an O(n) copy
• n individual O(1) insertions + n/2 + n/4 + n/8 + … copying
• Total: O(n) effort!
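The doubling strategy above can be sketched with `realloc` (which performs the "allocate bigger and copy" step); the starting capacity of 1 is an assumption of this example:

```c
#include <stdlib.h>

/* Doubling stack/list: any sequence of n pushes costs O(n) in total,
   i.e. amortized O(1) per push, as argued on the slide. */
typedef struct {
    int *data;
    int size, cap;
} dynarr;

void da_init(dynarr *a)
{
    a->cap = 1;
    a->size = 0;
    a->data = malloc(a->cap * sizeof(int));
}

void da_push(dynarr *a, int x)
{
    if (a->size == a->cap) {                        /* full: grow 2x */
        a->cap *= 2;
        a->data = realloc(a->data, a->cap * sizeof(int));
    }
    a->data[a->size++] = x;
}
```

After 100 pushes the capacity has gone 1, 2, 4, …, 128: the total copying work is bounded by n/2 + n/4 + … < n.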

What about deletions?

• when n=32 -> 33 (copy 32, insert 1)
• delete: 33 -> 32
  – should you free the second half (slots 33–64) immediately?
  – No: shrink only when the table becomes less than 1/4 full
  – then at least n/2 deletions must happen before a shrink, and at least n insertions before a grow
• Most operations take O(1) effort; a few take O(n) to copy
• For any m operations, O(m) time in total


Amortized analysis

• Analyze the time complexity over the entire “lifespan” of the algorithm
• Some operations that cost more will be “covered” by many other operations costing less

Lists and dictionary ADT…

• How to maintain a dictionary using (linked) lists?
• Is k in D?
  – go through all elements d of D, test if d == k : O(n)
  – If sorted: d = first(D); while( d <= k ) d = next(D); – on average n/2 tests …
• Add(k,D) => insert(k,D) = O(1), or O(n) with a test for uniqueness

Array based sorted list

• is d in D?
• Binary search in D (probe the middle: low, mid, high)

Binary search – recursive

BinarySearch(A[0..N-1], value, low, high) {
    if (high < low)
        return -1 // not found
    mid = low + ((high - low) / 2)   // Note: not (low + high) / 2 !!
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid-1)
    else if (A[mid] < value)
        return BinarySearch(A, value, mid+1, high)
    else
        return mid // found
}

Binary search – iterative

BinarySearch(A[0..N-1], value) {
    low = 0 ; high = N - 1 ;
    while (low <= high) {
        mid = low + ((high - low) / 2)   // Note: not (low + high) / 2 !!
        if (A[mid] > value)
            high = mid - 1
        else if (A[mid] < value)
            low = mid + 1
        else
            return mid // found
    }
    return -1 // not found
}
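The iterative pseudocode translates directly to C; `low + (high - low) / 2` avoids the signed overflow that `(low + high) / 2` can cause on very large arrays, which is what the slide's note warns about:

```c
/* Returns the index of value in sorted A[0..n-1], or -1 if absent. */
int binary_search(const int A[], int n, int value)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* overflow-safe midpoint */
        if (A[mid] > value)
            high = mid - 1;
        else if (A[mid] < value)
            low = mid + 1;
        else
            return mid;                    /* found */
    }
    return -1;                             /* not found */
}
```

Each probe halves the remaining range, giving the O(lg n) comparison count discussed on the next slide.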


Work performed

• x <=> A[18] ?   <
• x <=> A[9] ?    >
• x <=> A[13] ?   ==
• O(lg n) comparisons to search among 36 elements (1 … 18 … 36)

Sorting

• given a list, arrange values so that L[1] <= L[2] <= … <= L[n]
• n elements => n! possible orderings
• One test L[i] <= L[j] can divide the n! orderings in 2
  – Make a binary tree and calculate the depth
• log( n! ) = Ω( n log n )
• Hence, the lower bound for sorting is Ω( n log n ) – using comparisons…

Decision tree model

• n! orderings (leaves)
• Height of such a tree?

Proof: log(n!) = Ω( n log n )

• log( n! ) = log n + log(n-1) + log(n-2) + … + log(1)
            >= (n/2) * log( n/2 )      // the larger half of the terms are each >= log(n/2)
            = Ω( n log n )


The divide-and-conquer design paradigm

1. Divide the problem (instance) into subproblems.
2. Conquer the subproblems by solving them recursively.
3. Combine subproblem solutions.

Merge sort

Merge-Sort(A, p, r)
    if p < r
        then q = ⌊(p+r)/2⌋
             Merge-Sort(A, p, q)
             Merge-Sort(A, q+1, r)
             Merge(A, p, q, r)

• Example: applying the merge sort algorithm – split in half, sort both halves recursively, merge.

It was invented by John von Neumann in 1945.

Wikipedia / viz.

Merge of two lists: Θ(n)

A, B – lists to be merged
L = new list; // empty
while( A not empty and B not empty )
    if A.first() <= B.first()
        then append( L, A.first() ) ; A = rest(A) ;
        else append( L, B.first() ) ; B = rest(B) ;
append( L, A ); // all remaining elements of A
append( L, B ); // all remaining elements of B
return L
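The same Θ(n) merge, written for arrays rather than lists (keeping equal elements from A first, which is what makes merge sort stable):

```c
/* Merge sorted A[0..na-1] and B[0..nb-1] into out[0..na+nb-1]. */
void merge(const int A[], int na, const int B[], int nb, int out[])
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (A[i] <= B[j]) out[k++] = A[i++];   /* <= keeps stability */
        else              out[k++] = B[j++];
    }
    while (i < na) out[k++] = A[i++];   /* all remaining elements of A */
    while (j < nb) out[k++] = B[j++];   /* all remaining elements of B */
}
```

Every element is moved exactly once, so the work is Θ(na + nb).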


Run-time Analysis of Merge Sort

• Thus, the time required to sort an array of size n > 1 is:
  – the time required to sort the first half,
  – the time required to sort the second half, and
  – the time required to merge the two lists
• That is:

  T(n) = Θ(1)                n = 1
  T(n) = 2T(n/2) + Θ(n)      n > 1

Merge sort

• Worst case, average case, best case … Θ( n log n )
• Common wisdom:
  – Requires additional space for merging (in case of arrays)
• Homework*: develop an in-place merge of two lists implemented in arrays /compare speed/

[Figure: in-place merge – two index pointers a and b; the cases L[a] > L[b] and L[a] <= L[b] determine which pointer advances.]

Quicksort

• Proposed by C.A.R. Hoare in 1962.
• Divide-and-conquer algorithm.
• Sorts “in place” (like insertion sort, but not like merge sort).
• Very practical (with tuning).


Quicksort

Partitioning version 2                        (Wikipedia / “video”)

pivot = A[R];
i = L; j = R-1;
while( i <= j )
    while ( A[i] < pivot ) i++ ;              // will stop at the pivot latest
    while ( i <= j and A[j] >= pivot ) j-- ;
    if ( i < j ) { swap( A[i], A[j] ); i++; j--; }
A[R] = A[i];                                  // move the pivot into the middle
A[i] = pivot;
return i;

Invariant after the loop:  A[L..j] <= pivot,  A[i..R] > pivot  (except the pivot itself at A[R])
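The partitioning scheme above (pivot = last element) can be wrapped in a recursive quicksort; this is a direct sketch of the slide's version, not a tuned implementation:

```c
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partitioning version 2 from the slide: returns the pivot's final index. */
int partition(int A[], int L, int R)
{
    int pivot = A[R];
    int i = L, j = R - 1;
    while (i <= j) {
        while (A[i] < pivot) i++;              /* stops at the pivot latest */
        while (i <= j && A[j] >= pivot) j--;
        if (i < j) { swap_int(&A[i], &A[j]); i++; j--; }
    }
    A[R] = A[i];                               /* move pivot into the middle */
    A[i] = pivot;
    return i;
}

void quicksort(int A[], int L, int R)
{
    if (L < R) {
        int q = partition(A, L, R);
        quicksort(A, L, q - 1);
        quicksort(A, q + 1, R);
    }
}
```

Taking the last element as pivot is the simplest choice; the next slide's random-pivot refinement only changes which element gets swapped into position R first.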


Choice of pivot in Quicksort

• Select median of three …
• Select random – the opponent cannot choose a winning strategy against you!

Random pivot

• Select the pivot randomly from the region and swap it with the last position
• Or select the pivot as a median of 3 [or more] random values from the region
• Apply a non-recursive sort for arrays shorter than 10–20

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/detail/embed04.htm


• 2-pivot version of Quicksort – split into 3 regions!
• Handle values equal to the pivots as the „third region“


Alternative materials

• Quicksort average case analysis: http://eid.ee/10z
• https://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/quick/pages/ar01s05.html
• http://eid.ee/10y – MIT OpenCourseWare – Asymptotic notation, Recurrences, Substitution & Master Method

• Pause

[Figure: recursion tree for the Master Theorem – comparing n^(log_b a) (work at the leaves) against f(n) (work at the root); each node doing f(n) work has children doing f(n/b).]


Back to sorting

We can sort in O(n log n)

• using comparisons ( < <= >= > ) we cannot do better than O(n log n)
• Is that the best we can do?

How fast can we sort n integers?

• Sort people by their sex? (F/M, 0/1)

• Sort people by year of birth?


Radix sort

Radix-Sort(A, d)
1. for i = 1 to d        /* least significant to most significant digit */
2.     use a stable sort to sort A on digit i


Radix sort using lists (stable)

Input: bbb bba adb aad aac ccc ccb aba cca

1. Distribute on the last letter:
   a: bba aba cca
   b: bbb adb ccb
   c: aac ccc
   d: aad

2. Collect and distribute on the middle letter:
   a: aac aad
   b: bba aba bbb
   c: cca ccb ccc
   d: adb

3. Collect and distribute on the first letter:
   a: aac aad aba adb
   b: bba bbb
   c: cca ccb ccc
   d: (empty)

Result: aac aad aba adb bba bbb cca ccb ccc


Why not from left to right?

0101100   0101100   0101100   0101100
1001010   0010010   0010010   0010010
1111000   1111000   0101000   0101000
1001001   1001001   1001001   0010000
0010010   1001010   1001010   1001010
1001001   1001001   1001001   1001001
0101000   0101000   1111000   1111000
0010000   0010000   0010000   1001001

• Swap ‘0’ with the first ‘1’

Bitwise sort left to right

• Idea 1: recursively sort the first and second half – Exercise?
• Idea 2: swap elements only if the prefixes match…
  – For all bits from the most significant:
    • advance while the current bit is 0
    • when it is 1 -> look for the next 0
    • if the prefix matches, swap
    • otherwise keep advancing on 0’s and look for the next 1

Bitwise left to right sort

/* Historical sorting – was used in Univ. of Tartu using assembler…. */
/* C implementation – Jaak Vilo, 1989 */

void bitwisesort( SORTTYPE *ARRAY, int size )
{
    int i, j, tmp, nrbits;
    register SORTTYPE mask, curbit, group;

    nrbits = sizeof( SORTTYPE ) * 8;
    curbit = 1 << (nrbits-1);      /* set most significant bit 1 */
    mask   = 0;                    /* mask of the already sorted area */

    do {                           /* For each bit */
        i = 0;
    new_mask:
        for( ; ( i < size ) && ( ! (ARRAY[i] & curbit) ) ; i++ )
            ;                      /* Advance while bit == 0 */
        if( i >= size ) goto array_end;
        group = ARRAY[i] & mask;   /* Save current prefix snapshot */
        j = i;                     /* memorize location of 1 */
        for( ; ; ) {
            if ( ++i >= size ) goto array_end;          /* reached end of array */
            if ( (ARRAY[i] & mask) != group ) goto new_mask;  /* new prefix */
            if ( ! (ARRAY[i] & curbit) ) {
                /* bit is 0 – swap with previous location of 1: A[i] <=> A[j] */
                tmp = ARRAY[i]; ARRAY[i] = ARRAY[j]; ARRAY[j] = tmp;
                j += 1;            /* increase j to the next possible 1 */
            }
        }
    array_end:
        mask = mask | curbit;      /* area under mask is now sorted */
        curbit >>= 1;              /* next bit */
    } while( curbit );             /* until all bits have been sorted… */
}


Bitwise from left to right

0010000
0010010
0101000
0101100
1001010
1001001
1001001
1111000

• Swap ‘0’ with the first ‘1’

Bucket sort

• Assume a uniform distribution
• Allocate O(n) buckets
• Assign each value to a pre-assigned bucket


Sort small buckets with insertion sort

Input: .78 .17 .39 .26 .72 .94 .21 .12 .23 .68

  0: /
  1: .12 → .17
  2: .21 → .23 → .26
  3: .39
  4: /
  5: /
  6: .68
  7: .72 → .78
  8: /
  9: .94

http://sortbenchmark.org/

• The sort input records must be 100 bytes in length, with the first 10 bytes being a random key
• Minutesort – max amount sorted in 1 minute
  – 116 GB in 58.7 sec (Jim Wyllie, IBM Research)
  – 40-node 80-Itanium cluster, SAN array of 2,520 disks
• 2009: 500 GB, Hadoop, 1406 nodes x (2 Quadcore Xeons, 8 GB memory, 4 SATA) – Owen O'Malley and Arun Murthy, Yahoo Inc.
• Performance/Price Sort and PennySort


Sort Benchmark

• http://sortbenchmark.org/ – Sort Benchmark Home Page
• We have a new benchmark called GraySort, in memory of the father of the sort benchmarks, Jim Gray. It replaces TeraByteSort, which is now retired.
• Unlike 2010, we will not be accepting early entries for the 2011 year. The deadline for submitting entries is April 1, 2011.
  – All hardware used must be off-the-shelf and unmodified.
  – For Daytona cluster sorts where input sampling is used to determine the output partition boundaries, the input sampling must be done evenly across all input partitions.
• New rules for GraySort:
  – The input file size is now minimum ~100 TB or 1T records. Entries with larger input sizes also qualify.
  – The winner will have the fastest SortedRecs/Min.
  – We now provide a new input generator that works in parallel and generates binary data.
  – For the Daytona category, we have two new requirements: (1) The sort must run continuously (repeatedly) for a minimum of 1 hour (a minimum reliability requirement). (2) The system cannot overwrite the input file.

Order statistics

• Minimum – the smallest value
• Maximum – the largest value
• In general, the i’th value
• Input: A set A of n numbers and i, 1 <= i <= n
• Output: x from A that is larger than exactly i-1 elements of A
• Find the median of the values in the array
  – n is odd:  A[(n+1)/2] in the sorted array A
  – n is even: A[⌊(n+1)/2⌋] or A[⌈(n+1)/2⌉]

Q: Find the i’th value in unsorted data

A. O( n )
B. O( n log log n )
C. O( n log n )
D. O( n log² n )

Minimum

Minimum(A)
1   min = A[1]
2   for i = 2 to length(A)
3       if min > A[i]
4           then min = A[i]
5   return min

n-1 comparisons.


Min and max together

• compare every two elements A[i], A[i+1]
• compare the larger against the current max, the smaller against the current min
• 3n/2 comparisons in total

Selection in expected O(n)

Randomised-Select( A, p, r, i )
    if p = r then return A[p]
    q = Randomised-Partition( A, p, r )
    k = q - p + 1                       // nr of elements in the left subarray
    if i <= k
        then return Randomised-Select( A, p, q, i )
        else return Randomised-Select( A, q+1, r, i-k )
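A sketch of the selection procedure above in C, using a Lomuto-style random partition (the partitioning details are this example's choice; the recursion structure follows the CLRS formulation, with i counted 1-based as on the slide):

```c
#include <stdlib.h>

static void swp(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Random pivot swapped to A[r], then Lomuto partition around it. */
static int rand_partition(int A[], int p, int r)
{
    swp(&A[p + rand() % (r - p + 1)], &A[r]);
    int pivot = A[r], i = p;
    for (int j = p; j < r; j++)
        if (A[j] < pivot) swp(&A[i++], &A[j]);
    swp(&A[i], &A[r]);
    return i;
}

/* Returns the i-th smallest (i is 1-based) of A[p..r]; expected O(n). */
int randomized_select(int A[], int p, int r, int i)
{
    if (p == r) return A[p];
    int q = rand_partition(A, p, r);
    int k = q - p + 1;              /* nr of elements in the left subarray */
    if (i == k)      return A[q];
    else if (i < k)  return randomized_select(A, p, q - 1, i);
    else             return randomized_select(A, q + 1, r, i - k);
}
```

Unlike quicksort, only one side of the partition is ever recursed into, which is what brings the expected cost down from O(n log n) to O(n).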

Conclusion

• Sorting in general O( n log n )
• Quicksort is rather good
• Linear time sorting is achievable when one does not assume only direct comparisons
• Find the i’th value in unsorted data – expected O(n)
• Find the i’th value: worst case O(n) – see CLRS

Ok…

• lists – a versatile data structure for various purposes
• Sorting – a typical algorithm (many ways)
• Which sorting methods for array/list?
• Array: most of the important (e.g. update) tasks seem to be O(n), which is bad

Q: search for a value X in a linked list?

A. O(1)
B. O( log n )
C. O( n )

Can we search faster in linked lists?

• Why sort linked lists if search is O(n) anyway?
• Linked lists:
  – what is the “mid-point” of any sublist?
  – Therefore, binary search can not be used…
• Or can it?


Skip lists

• Build several lists at different “skip” steps
• O(n) list
• Level 1: ~ n/2
• Level 2: ~ n/4
• …
• Level log n: ~ 2-3 elements…

Skip List

typedef struct nodeStructure *node;

typedef struct nodeStructure {
    keyType   key;
    valueType value;
    node      forward[1]; /* variable sized array of forward pointers */
};

S3  -∞ ----------------------------------- +∞
S2  -∞ -------- 15 ------------------------ +∞
S1  -∞ -------- 15 ------ 23 -------------- +∞
S0  -∞ -- 10 -- 15 ------ 23 ------ 36 ---- +∞

9/5/20 3:22 PM Skip Lists 177

What is a Skip List

• A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, …, Sh such that
  – Each list Si contains the special keys +∞ and -∞
  – List S0 contains the keys of S in nondecreasing order
  – Each list is a subsequence of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh
  – List Sh contains only the two special keys
• We show how to use a skip list to implement the dictionary ADT

Illustration of lists

-∞  3  17  25  30  47  99  +∞
Key; Right[..] – right links in array; Left[..] – left links in array; array size – how high is the list

S3  -∞ ----------------------------------------------------- +∞
S2  -∞ -------------------- 31 ------------------------------ +∞
S1  -∞ -------- 23 -------- 31 -- 34 -------------- 64 ------ +∞
S0  -∞ -- 12 -- 23 -- 26 -- 31 -- 34 -- 44 -- 56 -- 64 -- 78 - +∞



Search

• We search for a key x in a skip list as follows:
  – We start at the first position of the top list
  – At the current position p, we compare x with y ← key(after(p))
    • x = y: we return element(after(p))
    • x > y: we “scan forward”
    • x < y: we “drop down”
  – If we try to drop down past the bottom list, we return NO_SUCH_KEY
• Example: search for 78

S3  -∞ ----------------------------------------------------- +∞
S2  -∞ -------------------- 31 ------------------------------ +∞
S1  -∞ -------- 23 -------- 31 -- 34 -------------- 64 ------ +∞
S0  -∞ -- 12 -- 23 -- 26 -- 31 -- 34 -- 44 -- 56 -- 64 -- 78 - +∞

Randomized Algorithms

• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
• It contains statements of the type
    b ← random()
    if b = 0 then do A … else { b = 1 } do B …
• Its running time depends on the outcomes of the coin tosses
• We analyze the expected running time of a randomized algorithm under the following assumptions:
  – the coins are unbiased, and
  – the coin tosses are independent
• The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give “heads”)
• We use a randomized algorithm to insert items into a skip list


Insertion

• To insert an item (x, o) into a skip list, we use a randomized algorithm:
  – We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads
  – If i ≥ h, we add to the skip list new lists Sh+1, …, Si+1, each containing only the two special keys
  – We search for x in the skip list and find the positions p0, p1, …, pi of the items with the largest key less than x in each list S0, S1, …, Si
  – For j ← 0, …, i, we insert item (x, o) into list Sj after position pj
• Example: insert key 15, with i = 2

  before:                          after:
  S3 -∞ ----------------- +∞       S3 -∞ ---------------------- +∞
  S2 -∞ ----------------- +∞       S2 -∞ - 15 ----------------- +∞
  S1 -∞ ------- 23 ------ +∞       S1 -∞ - 15 - 23 ------------ +∞
  S0 -∞ - 10 - 23 - 36 -- +∞       S0 -∞ - 10 - 15 - 23 - 36 -- +∞

Deletion

• To remove an item with key x from a skip list, we proceed as follows:
  – We search for x in the skip list and find the positions p0, p1, …, pi of the items with key x, where position pj is in list Sj
  – We remove positions p0, p1, …, pi from the lists S0, S1, …, Si
  – We remove all but one list containing only the two special keys
• Example: remove key 34

  before:                              after:
  S2 -∞ ------- 34 ------ +∞           S2 -∞ ---------------- +∞
  S1 -∞ - 23 - 34 ------- +∞           S1 -∞ - 23 ----------- +∞
  S0 -∞ - 12 - 23 - 34 - 45 - +∞       S0 -∞ - 12 - 23 - 45 - +∞


Implementation v2

• We can implement a skip list with quad-nodes
• A quad-node stores:
  – item
  – link to the node before
  – link to the node after
  – link to the node below
  – link to the node above
• Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them

Space Usage

• The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
• We use the following two basic probabilistic facts:
  Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2^i
  Fact 2: If each of n items is present in a set with probability p, the expected size of the set is np
• Consider a skip list with n items
  – By Fact 1, we insert an item in list Si with probability 1/2^i
  – By Fact 2, the expected size of list Si is n/2^i
• The expected number of nodes used by the skip list is

    Σ_{i=0}^{h} n/2^i  =  n · Σ_{i=0}^{h} 1/2^i  <  2n

• Thus, the expected space usage of a skip list with n items is O(n)



Height

• The running time of the search and insertion algorithms is affected by the height h of the skip list
• We show that, with high probability, a skip list with n items has height O(log n)
• We use the following additional probabilistic fact:
  Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np
• Consider a skip list with n items
  – By Fact 1, we insert an item in list Si with probability 1/2^i
  – By Fact 3, the probability that list Si has at least one item is at most n/2^i
• By picking i = 3 log n, we have that the probability that S_{3 log n} has at least one item is at most
    n/2^{3 log n} = n/n^3 = 1/n^2
• Thus a skip list with n items has height at most 3 log n with probability at least 1 - 1/n^2

Search and Update Times

• The search time in a skip list is proportional to
  – the number of drop-down steps, plus
  – the number of scan-forward steps
• The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability
• To analyze the scan-forward steps, we use yet another probabilistic fact:
  Fact 4: The expected number of coin tosses required in order to get tails is 2
• When we scan forward in a list, the destination key does not belong to a higher list
  – A scan-forward step is associated with a former coin toss that gave tails
• By Fact 4, in each list the expected number of scan-forward steps is 2
• Thus, the expected number of scan-forward steps is O(log n)
• We conclude that a search in a skip list takes O(log n) expected time
• The analysis of insertion and deletion gives similar results


Summary

• A skip list is a data structure for dictionaries that uses a randomized insertion algorithm
• In a skip list with n items:
  – The expected space used is O(n)
  – The expected search, insertion and deletion time is O(log n)
• Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability
• Skip lists are fast and simple to implement in practice


Conclusions

• Abstract data types hide implementations
• Important is the functionality of the ADT
• Data structures and algorithms determine the speed of the operations on data
• Linear data structures provide good versatility
• Sorting – a most typical need/algorithm
• Sorting in O(n log n): Merge Sort, Quicksort
• Solving recurrences – a means to analyse
• Skip lists – an O(log n) randomised data structure
