IEEE Information Theory Society Newsletter Vol. 59, No. 3, September 2009 Editor: Tracey Ho ISSN 1059-2362 Optimal Estimation XXX Shannon lecture, presented in 2009 in Seoul, South Korea (extended abstract) J. Rissanen 1 Prologue 2 Modeling Problem The fi rst quest for optimal estimation by Fisher, [2], Cramer, The modeling problem begins with a set of observed data 5 5 5 c 6 Rao and others, [1], dates back to over half a century and has Y yt:t 1, 2, , n , often together with other so-called 5 51 c26 changed remarkably little. The covariance of the estimated pa- explanatory data Y, X yt, x1,t, x2,t, . The objective is rameters was taken as the quality measure of estimators, for to learn properties in Y expressed by a set of distributions which the main result, the Cramer-Rao inequality, sets a lower as models bound. It is reached by the maximum likelihood (ML) estima- 5 1 u 26 tor for a restricted subclass of models and asymptotically for f Y|Xs; , s . a wider class. The covariance, which is just one property of u5u c u models, is too weak a measure to permit extension to estima- Here s is a structure parameter and 1, , k1s2 real- valued tion of the number of parameters, which is handled by various parameters, whose number depends on the structure. The ad hoc criteria too numerous to list here. structure is simply a subset of the models. Typically it is used to indicate the most important variables in X. (The traditional Soon after I had studied Shannon’s formal defi nition of in- name for the set of models is ‘likelihood function’ although no formation in random variables and his other remarkable per- such concept exists in probability theory.) formance bounds for communication, [4], I wanted to apply them to other fi elds – in particular to estimation and statis- The most important problem is the selection of the model tics in general. After all, the central problem in statistics is to class, but since its optimal selection is non-computable there extract information from data. After having worked on data is little we can say about it. It is clear that when picking it we compression and introduced arithmetic coding it seemed evi- must take into account the general type of the data, the sensi- dent that both estimation and data compression have a com- tivity of the models with respect to the parameters, which is mon goal: in data compression the shortest code length cannot an issue in the so-called robust statistics, and so on. be achieved without taking advantage of the regular features in data, while in estimation it is these regular features, the un- To simplify the notations we write the data as x with the un- derlying mechanism, that we want to learn. This led me to in- derstanding that the models are distributions on Y or condi- troduce the MDL or Minimum Description Length principle, tional distributions on Y given the explanatory data X if they and I thought that the job was done. too are observed. (In the fi rst reading put just x 5 Y ). We also consider only structures determined by the number of param- uk 5u c u However, when starting to prepare this lecture I found that it eters so that we already fi x the sequence 1, , k, also was diffi cult, for I could not connect the several in themselves written as u. We then have the two model classes meaningful results to form a nice coherent picture. It was like a jigsaw puzzle where the pieces almost fi t but not quite, M 5 5 f1x; u, k2:u [ Vk6, k # n and, moreover, vital pieces were missing. After considerable k K struggle I was able to get the pieces to fi t but to do so I had to M 5 M # d k, K n alter them all, and ignore the means and concepts introduced k50 by the masters mentioned above. The result was separation of estimation from data compression, and we can now defi ne op- depending on whether we are considering the number of pa- timality for all amounts of data and not just asymptotically. rameters fi xed or if it, too, is to be estimated. The latter class continued on page 6 2 From the Editor Tracey Ho Dear IT Society members, For many of us, the summer is drawing to an end, and fall term is starting soon. Do take some time to relax with our It was very nice to see many of you at ISIT fun regular features – the Historian’s Column and Golomb’s in June. Many thanks to the organizers for Puzzle Column. Also, an article commemorating the hun- a fantastic program and meticulously-run dredth anniversary of telecommunications pioneer Vladimir conference. This issue includes the re- Kotelnikov is included in this issue. ports from ISIT and ITW Volos, and an- nouncements of awards from ISIT and Please help to make the newsletter as interesting and infor- elsewhere – warmest congratulations to mative as possible by offering suggestions and contributing all the award winners. I’m also delighted news. The deadlines for the next few issues are: to include in this issue the fi rst-ever pub- lished article on 2009 Shannon Lecturer Issue Deadline Jorma Rissanen’s new theory of optimal December 2009 October 10, 2009 estimation. Jorma said that as he was March 2010 January 10, 2010 preparing his Shannon Lecture, he un- June 2010 April 10, 2010 expectedly encountered diffi culties in- corporating some of the basic results of Announcements, news and events intended for both the printed newsletter and the web- traditional estimation theory in a coher- site, such as award announcements, calls for nominations and upcoming conferences, can ent framework, so he set about creating now be submitted jointly at the IT Society website http://www.itsoc.org/, using the quick a new framework for optimal estimation. links “Share News” and “Announce an Event”. For more details please see the article on His article in this issue summarizes his the new website in this issue. lecture and gives an exciting fi rst look at these new ideas. Articles and columns intended only for the printed newsletter should be e-mailed to me at [email protected]. This issue also comes with sad news of the passing of Wesley Peterson, whose Please submit ASCII, LaTeX or Word source fi les; do not worry about fonts or layout many achievements and honors include as this will be taken care of by IEEE layout specialists. Electronic photos and graphics the Shannon Award in 1981. He is fondly should be in high resolution and sent as separate fi les. remembered by his friends Ned Weldon and Shu Lin. I look forward to your contributions and suggestions for future issues of the newsletter. Tracey Ho IEEE Information Theory Society Newsletter IEEE Information Theory Society Newsletter Table of Contents (USPS 360-350) is published quarterly by the Information Theory Society of the Institute of Optimal Estimation: XXX Shannon Lecture . 1 Electrical and Electronics Engineers, Inc. From the Editor . 2 Headquarters: 3 Park Avenue, 17th Floor, President’s Column. 3 New York, NY 10016-5997. Annual Information Theory Society Awards Announced . .4 Cost is $1.00 per member per year (included in Society fee) for each member of the In Memoriam, Wes Peterson . 5 Information Theory Society. Printed in the The Historian’s Column. 8 U.S.A. Periodicals postage paid at New York, NY and at additional mailing offices. IT Society Members Honored . 9 Symposium Report: The 2009 IEEE International Symposium on Information Theory, Seoul, Korea . 9 Postmaster: Send address changes to IEEE Information Theory Society Newsletter, Workshop Report: Networking and Information Theory, Volos, Greece. 11 IEEE, 445 Hoes Lane, Piscataway, NJ 08854. Golomb’s Puzzle Column: Finding Sums. 12 © 2009 IEEE. Information contained in this The Life Filled with Cognition and Action: 100th Anniversary of Academician V.A. Kotelnikov . .13 newsletter may be copied without permission provided that the copies are not made or dis- Golomb’s Puzzle Column: Some Partition Problems Solutions . 15 tributed for direct commercial advantage, and Call for Papers . 18 the title of the publication and its date appear. Conference Calendar . .20 IEEE Information Theory Society Newsletter September 2009 3 President’s Column Andrea Goldsmith I have just returned from the main annual event model associated with a given set of observed of our society, the International Symposium on In- data is the one that leads to the largest compres- formation Theory (ISIT). This year ISIT was held sion of the data. MDL is particularly well-suited in Seoul, Korea, returning to mainland Asia for for highly complex models where overfi tting the the fi rst time since 1973, when Shannon gave the data is a serious concern. MDL has had signifi cant fi rst of what would become the Shannon lectures impact on the fi elds of inductive inference, statis- in Askelon Israel. The symposium was superb in tical modeling, pattern recognition, and machine both its organization and technical content. A di- learning, and has been widely applied to a broad verse set of stimulating and thought-provoking cross- section of problems in engineering as well as plenary talks by Rich Baraniuk, David Tse, biology and medicine. It continues to be a very ac- Raymond Yeung, and Noga Alon set an excel- tive area of research more than 20 years after its lent tone for each day’s sessions. The technical invention, including recent work by Jorma himself program was outstanding with a broad range of on MDL denoising currently in press.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages20 Page
-
File Size-