Scalpel: Optimizing Query Streams Using Semantic Prefetching


Scalpel: Optimizing Query Streams Using Semantic Prefetching

by Ivan Thomas Bowman

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science

Waterloo, Ontario, Canada, 2005
© Ivan Thomas Bowman 2005

AUTHOR'S DECLARATION FOR ELECTRONIC SUBMISSION OF A THESIS

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

Client applications submit streams of relational queries to database servers. For simple requests, inter-process communication costs account for a significant portion of user-perceived latency. This trend increases with faster processors, larger memory sizes, and improved database execution algorithms, and it is not significantly offset by improvements in communication bandwidth.

Caching and prefetching are well-studied approaches to reducing user-perceived latency. Caching is useful in many applications, but it does not help if future requests rarely match previous requests. Prefetching can help in this situation, but only if we are able to predict future requests. This prediction is complicated in the case of relational queries by the presence of request parameters: a prefetching algorithm must predict not only a query that will be executed in the future, but also the actual parameter values that will be supplied.

We have found that, for many applications, the streams of submitted queries contain patterns that can be used to predict future requests. Further, there are correlations between the results of earlier requests and the actual parameter values used in future requests. We present the Scalpel system, a prototype implementation that detects these patterns of queries and optimizes request streams using context-based predictions of future requests.

Scalpel uses its predictions to provide a form of semantic prefetching, which involves combining a predicted series of requests into a single request that can be issued immediately. Scalpel's semantic prefetching reduces not only the latency experienced by the application but also the total cost of query evaluation. We describe how Scalpel learns to predict optimizable request patterns by observing the application's request stream during a training phase. We also describe the types of query pattern rewrites that Scalpel's cost-based optimizer considers. Finally, we present empirical results that show the costs and benefits of Scalpel's optimizations.

We have found that even when an application is well suited to its original configuration, it may behave poorly when moved to a new configuration such as a wireless network. The optimizations performed by Scalpel take the current configuration into account, allowing it to select strategies that give good performance in a wider range of configurations.
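To make the idea of semantic prefetching concrete, here is a small illustrative sketch; the table and column names are hypothetical and are not taken from the thesis. Suppose an application runs an outer query over orders and then, for each row it fetches, issues an inner query for that order's line items, so the inner query's parameter is correlated with a column of the outer result. A semantic prefetch in the style described above folds the predicted inner queries into a single combined request, for example using an outer join, and the prefetcher can then answer the inner queries from the combined result without further round trips:

    -- Original nested request pattern (hypothetical schema):
    -- one outer query, then one inner query per fetched order_id.
    SELECT order_id, customer_id
      FROM orders
     WHERE ship_date = CURRENT_DATE;

    SELECT item_id, quantity
      FROM order_items
     WHERE order_id = ?;   -- parameter taken from the outer result, once per row

    -- One possible combined request: a left outer join that returns each
    -- outer row together with the rows its inner query would have fetched.
    SELECT o.order_id, o.customer_id, i.item_id, i.quantity
      FROM orders o
           LEFT OUTER JOIN order_items i ON i.order_id = o.order_id
     WHERE o.ship_date = CURRENT_DATE
     ORDER BY o.order_id;

Chapter 3 develops the rewrite strategies that Scalpel actually considers (lateral derived tables, outer joins, outer unions, and client-side hash and merge joins) and the cost-based optimizer that chooses among them; the sketch above shows only the flavour of the transformation.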
Acknowledgements

This document marks a milestone in a course of research that began in the fall of 2000, when I had a long conversation with Kenneth Salem about problematic client applications that had been plaguing my working days. I'd like to thank Ken for provoking me to formalize proposed solutions and for helping me find a thesis-sized research problem within an unruly cloud of ideas. I owe Ken a debt of gratitude for his guidance, encouragement, and understanding. His help has been instrumental in completing this work.

I would also like to thank the other members of my examining committee—Kostas Kontogiannis, Jeffrey Naughton, Tamer Özsu, and David Toman—for their thoughtful comments, questions, and suggestions. This final text has been improved by changes based on their comments on earlier drafts.

Over the years, I have been fortunate to have the support of many people, and I would like to thank my family, friends, and colleagues for their help and best wishes. My co-workers at iAnywhere Solutions have provided useful ideas and feedback, and helped to give me the flexibility to complete my research while working full-time. In particular, I would like to thank my manager, Glenn Paulley, for providing me with challenging problems, useful advice, interesting references, and unflagging support. I'd also like to thank the other members of the dev qp team for their encouragement and understanding during periods where I was working on multiple tasks: Mike Demko, Dan Farrar, Anil Goel, Anisoara Nica, and Matthew Young-Lai were unfailing in their support.

My friends have provided a source of strength and stability in my life, and I would like to thank them for the encouragement, support, and humour they have provided over the years. The banter list provided welcome distraction, as did occasional events. My family has also contributed in uncountably many ways to the successful completion of this work. The Bowmans, Browns, Carters, Farleys, Richardsons, and Sharpes were often in my thoughts. My grandparents, Lloyd and Winnifred Richardson, provided me with model examples. My brother, Don, also provided me with advice and feedback on early versions of this research, and I thank him for that. In particular, I would like to thank my mom, Nancy, for inspiring me by her example, working hard to achieve her goals while still finding time to appreciate the world around her.

Finally, it is with love and gratitude that I thank my wife Sherri for helping to motivate me to start this course of study, then for being supportive and tolerant through many late nights, early mornings, and working vacations.

Dedication

For Dr. Lloyd T. Richardson

Contents

1 Introduction
  1.1 Motivation
  1.2 Using Prefetching
  1.3 Application Request Patterns
  1.4 Thesis
  1.5 Outline
2 Preliminaries
  2.1 Scalpel System Architecture
  2.2 Pseudo-Code Conventions
  2.3 Model of Request Streams
  2.4 Notation for Strings and Sequences
3 Nested Request Patterns
  3.1 Example of Nesting
  3.2 Pattern Detector
    3.2.1 Context Tree
    3.2.2 Tracking Parameter Correlations
      3.2.2.1 Overhead of Correlation Detection
    3.2.3 Client Predicate Selectivity
    3.2.4 Summary of Pattern Detection
  3.3 Query Rewriter
    3.3.1 Nested Execution
    3.3.2 Query Rewrite Preliminaries
      3.3.2.1 Algebraic Notation
      3.3.2.2 Lateral Derived Tables
      3.3.2.3 Choosing the Best Correlation Value
    3.3.3 Unified Execution
      3.3.3.1 The Outer Join Strategy
      3.3.3.2 The Outer Union Strategy
    3.3.4 Partitioned Execution
      3.3.4.1 The Client Hash Join Strategy
      3.3.4.2 The Client Merge Join Strategy
    3.3.5 Rewriting a Context Tree
      3.3.5.1 Representing Run-Time Behaviour With ACTION Objects
      3.3.5.2 The REWRITE-TREE Procedure
    3.3.6 Summary of Query Rewriter
  3.4 Prefetcher
    3.4.1 Mistaken Predictions
  3.5 Pattern Optimizer
    3.5.1 Valid Execution Plans
    3.5.2 Ranking Plans
    3.5.3 Exhaustive Enumeration
  3.6 Experiments
    3.6.1 Effects of Client Predicate Selectivity
    3.6.2 Query Costs
    3.6.3 Number of Columns
    3.6.4 Execution Costs
    3.6.5 Scalpel Overhead
  3.7 Summary of Nested Request Patterns
4 Cost Model
  4.1 Estimating Per-Request Overhead U0
  4.2 Estimating the Cost of Interpreting Results
  4.3 Estimating the Cost of Queries
    4.3.1 Estimating the Cost of a Lateral Derived Table
    4.3.2 Estimating the Cost of an Outer Union
  4.4 Summary of Cost Model
5 Batch Request Patterns
  5.1 Example of Batch Pattern
  5.2 Pattern Detector
    5.2.1 Models of Request Streams
      5.2.1.1 Order-k Models
      5.2.1.2 Choosing a Context Length
      5.2.1.3 Finite State Models
      5.2.1.4 Summary of Request Models
    5.2.2 Suffix Trie Detection
      5.2.2.1 Implicit Suffix Tries
      5.2.2.2 Estimating the Probability of a Future Request
      5.2.2.3 Tracking Parameter Correlations
      5.2.2.4 Summary of Suffix Trie Detection
    5.2.3 A Path Compressed Suffix Trie
      5.2.3.1 Estimating the Probability of a Future Request
      5.2.3.2 Tracking Correlations
      5.2.3.3 Summary of Suffix Tree Detection
    5.2.4 Summary of Pattern Detector
  5.3 Pattern Optimizer
    5.3.1 Cost-Based Optimization
      5.3.1.1 Feasible Prefetches
      5.3.1.2 Choosing the Best Feasible Prefetch
    5.3.2 Building a Finite-State Model
    5.3.3 Removing Redundancy
    5.3.4 Summary of the Pattern Optimizer
  5.4 Query Rewriter
    5.4.1 Alternative Prefetch Strategies
    5.4.2 Rewriting with Join and Union
    5.4.3 Representing Run-Time Behaviour With ACTION Objects
    5.4.4 Summary of Query Rewriter
  5.5 Prefetcher
    5.5.1 Summary of Prefetcher
  5.6 Experiments
    5.6.1 Effectiveness of Semantic Prefetching
      5.6.1.1 Batch Length
      5.6.1.2 Useful Prefetches
      5.6.1.3 Breakdown of Costs
    5.6.2 Scalpel Training Overhead
  5.7 Summary of Batch Request Patterns
6 Combining Nested and Batch Request Patterns
  6.1 Example Combining Nesting and Batches
  6.2 Combining Context/Suffix Trees
  6.3 Current Implementation
  6.4 Summary of Combining Nested and Batch Request Patterns
7 Prefetch Consistency
  7.1 Updates by the Same Transaction