Bocconi University Ph.D. in Statistics and Computer Science 2021-2022

TITLE: Computer Science II (Algorithms)

SPEAKER: Paolo Ferragina, University of Pisa

GOAL: In this course we will study, design and analyze algorithms and data structures for the efficient solution of combinatorial problems involving Big Data of several types, such as integers, strings, trees and graphs. Special attention will be devoted to the architectural features of modern storage technologies, which are key issues when designing scalable Data Science platforms that process large and complex datasets. Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, I/O, energy, etc.

PREREQUISITE: Basic algorithms and data structures; basic notions of probability and discrete math.

TEACHING MATERIAL: Notes of the teacher

PRELIMINARY PROGRAM: 10 lectures, for a total of 24h. Every lecture will consist of a few “sessions”, each one of 45 mins.

Lecture 1 (Warm Up – 2 sessions)
  • Models of computation: RAM, 2-level memories, streaming
  • Scanning versus Jumping in algorithm design
  • The issue of Virtual memory

Lecture 2 (Sorting – 2 sessions)
  • Sorting Data: Mergesort in internal memory (small data) and on disk (big data)
  • The I/O Lower Bound
  • Permuting versus Sorting

Lecture 3 (Sampling – 2 sessions)
  • Sampling Data uniformly at random from a stream of items (known length, unknown length)

Lecture 4 (Dictionary problem: Exact search – 2 sessions)
  • Cuckoo hashing
  • Bloom filters and Spectral Bloom Filters
  • Application to search engines and DBs

Lecture 5 (Dictionary problem: Approximate search – 2 sessions)
  • Locality Sensitive Hashing
  • Hamming distance on vectors
  • Shingling and document deduplication
  • Application to dedup, clustering, approximate search, ...

Lecture 6 (Strings: Prefix search – 2 sessions)
  • Tries (uncompacted, compacted)
  • Patricia trie
  • 2-level indexing and prefix search
  • Application to key-value stores and auto-completion search

Lecture 7 (Strings: Substring search – 2 sessions)
  • Suffix Arrays and LCP array
  • Suffix Tree
  • Searching strings by substring
  • Application to string statistics in efficient space and time

Lecture 8 (Strings: Dynamic Programming – 2 sessions)
  • Dynamic Programming
  • Application to Edit distance, Knapsack, Viterbi, Data compression

Lecture 9 (Data Compression – 4 sessions)
  • 0-th order entropy compressors: Huffman coding and Arithmetic coding
  • K-th order entropy compressors: Lempel-Ziv parsing and the Burrows-Wheeler Transform
  • Applications: gzip and bzip

Lecture 10 (Graphs – 4 sessions)
  • Graph representation
  • Algorithms for graphs: DFS, BFS, …
  • Graph storage (compressed?): from basic codes to Elias-Fano codes
  • Centrality measures: PageRank and HITS, and some of their variants
