Machine Learning Foundations (機器學習基石)

Lecture 16: Three Learning Principles
Hsuan-Tien Lin (林軒田), [email protected]
Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)

Roadmap
1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?

Lecture 15: Validation
(crossly) reserve validation data to simulate the testing procedure for model selection

Lecture 16: Three Learning Principles
Occam's Razor / Sampling Bias / Data Snooping / Power of Three

Occam's Razor

An explanation of the data should be made as simple as possible, but no simpler. —Albert Einstein? (1879–1955)

entia non sunt multiplicanda praeter necessitatem
(entities must not be multiplied beyond necessity) —William of Occam (1287–1347)

'Occam's razor' for trimming down unnecessary explanation

figure by Fred the Oyster (Own work) [CC-BY-SA-3.0], via Wikimedia Commons


Occam's Razor for Learning

The simplest model that fits the data is also the most plausible.

which one do you prefer? :-)

two questions:
1 What does it mean for a model to be simple?
2 How do we know that simpler is better?


Simple Model

• simple hypothesis h: small Ω(h) = 'looks' simple; specified by few parameters
• simple model H: small Ω(H) = not many; contains a small number of hypotheses

connection: h specified by ℓ bits ⇐ H of size 2^ℓ, hence small Ω(h) ⇐ small Ω(H)

simple: small hypothesis/model complexity


Simple is Better

in addition to the math proof that you have seen, philosophically:
simple H
⇒ smaller m_H(N)
⇒ m_H(N)/2^N = less 'likely' to fit data perfectly
⇒ more significant when a fit happens

direct action: linear first; always ask whether the data is over-modeled


Fun Time

Consider the decision stumps in R^1 as the hypothesis set H. Recall that m_H(N) = 2N. Consider 10 different inputs x_1, x_2, ..., x_10 coupled with labels y_n generated iid from a fair coin. What is the probability that the data D = {(x_n, y_n)}_{n=1}^{10} is separable by H?
1  1/1024
2  10/1024
3  20/1024
4  100/1024

Reference Answer: 3
Of all 1024 possible D, only 2N = 20 of them are separable by H.
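The 20/1024 count can be checked by brute force. A minimal sketch (my own construction, not from the lecture) that enumerates every dichotomy a 1-D decision stump can realize on 10 points, then checks all 2^10 labelings:

```python
from itertools import product

def stump_dichotomies(xs):
    """All labelings of points xs achievable by a 1-D decision stump
    h(x) = s * sign(x - theta) with s in {+1, -1}."""
    xs = sorted(xs)
    # one threshold below all points, plus one between each adjacent pair
    thetas = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return {tuple(s * (1 if x > t else -1) for x in xs)
            for s in (+1, -1) for t in thetas}

xs = list(range(10))                       # 10 distinct inputs
achievable = stump_dichotomies(xs)
assert len(achievable) == 2 * len(xs)      # m_H(10) = 20

# count which of the 2^10 equally likely labelings a stump can realize
separable = sum(tuple(y) in achievable
                for y in product((+1, -1), repeat=10))
print(separable, "/", 2 ** 10)             # prints: 20 / 1024
```

Since each of the 1024 labelings is equally likely under fair-coin labels, the probability of separability is exactly 20/1024.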


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 7/25 Three Learning Principles Sampling Bias Presidential Story

1948 US President election: Truman versus Dewey • a newspaper phone-poll of how people voted, • and set the title ‘Dewey Defeats Truman’ based on polling

who is this? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 7/25 editorial bug?—no • bad luck of polling (δ)?—no •


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 8/25 Three Learning Principles Sampling Bias The Big Smile Came from ...

Truman, and yes he won

suspect of the mistake: editorial bug?—no • bad luck of polling (δ)?—no •

hint: phones were expensive :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 8/25 technical explanation: • data from P (x, y) but test under P = P : VC fails 1 2 6 1 philosophical explanation: • study Math hard but test English: no strong test guarantee


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 9/25 Three Learning Principles Sampling Bias Sampling Bias

If the data is sampled in a biased way, learning will pro- duce a similarly biased outcome.

technical explanation: • data from P (x, y) but test under P = P : VC fails 1 2 6 1 philosophical explanation: • study Math hard but test English: no strong test guarantee

‘minor’ VC assumption: data and testing both iid from P

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 9/25 in my first shot, Eval(g) showed 13% improvement why am I still teaching here? :-) •
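The technical point can be seen in a toy sketch (my own construction, not from the lecture): a learner with small in-sample error under P1 fails badly when the test distribution P2 differs, and no VC bound protects it.

```python
import random

random.seed(2)

def labels(p_pos, n):
    """Draw n binary labels with P(y = +1) = p_pos."""
    return [+1 if random.random() < p_pos else -1 for _ in range(n)]

train = labels(0.2, 10000)   # P1: 20% positive
test = labels(0.8, 10000)    # P2: 80% positive, P2 != P1

# 'learn' the best constant classifier on the training sample (majority vote)
g = +1 if sum(y == +1 for y in train) > len(train) / 2 else -1

err_in = sum(y != g for y in train) / len(train)   # about 0.2: looks fine
err_out = sum(y != g for y in test) / len(test)    # about 0.8: VC says nothing here
```

Training and testing must come from the same distribution for the in-sample error to say anything about the test error.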


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 10/25 Three Learning Principles Sampling Bias Sampling Bias in Learning

A True Personal Story

Netflix competition for movie likes comedy?likes action?prefers blockbusters? likes Tom Cruise? • recommender system: viewer 10% improvement = 1M US dollars Match movie and add contributions predicted viewer factors from each factor rating formed val, • movie in my firstD shot,

comedy contentaction contentblockbuster? Tom Cruise in it? Eval(g) showed 13% improvement why am I still teaching here? :-) •

validation: random examples within ; test: ‘last’ user records ‘after’ D D

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 10/25 • training: emphasize later examples (KDDCup 2011) • validation: use ‘late’ user records


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 11/25 Three Learning Principles Sampling Bias Dealing with Sampling Bias

If the data is sampled in a biased way, learning will pro- duce a similarly biased outcome.

practical rule of thumb: • match test scenario as much as possible e.g. if test: ‘last’ user records ‘after’ • D • training: emphasize later examples (KDDCup 2011) • validation: use ‘late’ user records

last puzzle:

danger when learning ‘credit card approval’ with existing bank records?

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 11/25 Reference Answer: 2 That’s how we form the validation set, remember? :-)
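A minimal sketch (hypothetical records, my own construction) of the rule of thumb: when the test consists of the 'last' records in time, hold out late records for validation rather than a uniformly random subset.

```python
import random

random.seed(0)

# hypothetical user records ordered by time: (timestamp, label)
records = [(t, random.choice([-1, +1])) for t in range(1000)]

# random validation split: simulates an iid test, which MISMATCHES a
# 'last user records' test scenario
val_random = random.sample(records, 200)

# time-based validation split: hold out the latest records, matching a
# test that consists of the 'last' user records 'after' D
records.sort(key=lambda r: r[0])            # keep explicit time order
train, val_late = records[:-200], records[-200:]

# the late-validation set really is the most recent data
assert min(t for t, _ in val_late) > max(t for t, _ in train)
```

The random split treats every era as equally likely, which is exactly the sampling bias the slide warns about when the test data comes strictly 'after' the training data.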


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 12/25 Three Learning Principles Sampling Bias Fun Time

If the data is an unbiased sample from the underlying distribution P for binary classification,D which of the following subset of is also an unbiased sample from P? D

1 all the positive (yn > 0) examples 2 half of the examples that are randomly and uniformly picked from without replacement D 3 half of the examples with the smallest xn values k k 4 the largest subset that is linearly separable

Reference Answer: 2 That’s how we form the validation set, remember? :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 12/25 for VC-safety, Φ shall be decided without ‘snooping’ data
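A quick sanity check (synthetic data, my own sketch, not from the lecture): a uniform random half keeps the label proportions of D, while keeping only the positives clearly does not.

```python
import random

random.seed(1)

# hypothetical unbiased sample D with P(y = +1) = 0.3
D = [(random.random(), +1 if random.random() < 0.3 else -1)
     for _ in range(100000)]

def pos_rate(S):
    return sum(y == +1 for _, y in S) / len(S)

# choice 2: half picked randomly and uniformly without replacement: still unbiased
half = random.sample(D, len(D) // 2)

# choice 1: positive examples only: badly biased (positive rate becomes 1)
positives = [(x, y) for x, y in D if y == +1]

print(pos_rate(D), pos_rate(half), pos_rate(positives))
```

The positive rate of the random half stays close to that of D, matching the reference answer.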


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 13/25 Three Learning Principles Data Snooping Visual Data Snooping

1 Visualize = R2 X full Φ : z = (1, x , x , x2, x x , x2), d = 6 • 2 1 2 1 1 2 2 VC or z = (1, x2, x2), d = 3, after visualizing? 0 • 1 2 VC or better z = (1, x2 + x2) , d = 2? • 1 2 VC 2 2  or even better z = sign(0.6 x x ) ? 1 1 2 − 1 0 1 • − − − —careful about your brain’s ‘model complexity’

for VC-safety, Φ shall be decided without ‘snooping’ data

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 13/25 8 years of currency trading data • first 6 years for training, • last two 2 years for testing x = previous 20 days, • y = 21th day snooping versus no snooping: • superior profit possible


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 14/25 Three Learning Principles Data Snooping Data Snooping by Mere Shifting-Scaling

If a data set has affected any step in the learning pro- cess, its ability to assess the outcome has been com- promised.

8 years of currency trading data 30 • snooping first 6 years for training, • 20 last two 2 years for testing 10 x = previous 20 days, • y = 21th day 0 Cumulative Profit % snooping versus no snooping: -10 • no snooping superior profit possible 0 100 200 300 400 500 Day snooping: shift-scale all values by training+ testing • no snooping: shift-scale all values by training only • Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 14/25 if all papers from the same author in one big paper: • bad generalization due to dVC( m m) ∪ H step-wise: later author snooped data by reading earlier papers, • bad generalization worsen by publish only if better


Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/25 bad generalization worsen by publish only if better

step-wise: later author snooped data by reading earlier papers, •

if you torture the data long enough, it will confess :-)

Three Learning Principles Data Snooping Data Snooping by Data Reusing Research Scenario benchmark data D paper 1: propose that works well on • H1 D paper 2: find room for improvement, propose • 2 —and publish only if better than on H H1 D paper 3: find room for improvement, propose • 3 —and publish only if better than on H H2 D ... • if all papers from the same author in one big paper: • bad generalization due to dVC( m m) ∪ H

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/25 bad generalization worsen by publish only if better

if you torture the data long enough, it will confess :-)

Three Learning Principles Data Snooping Data Snooping by Data Reusing Research Scenario benchmark data D paper 1: propose that works well on • H1 D paper 2: find room for improvement, propose • 2 —and publish only if better than on H H1 D paper 3: find room for improvement, propose • 3 —and publish only if better than on H H2 D ... • if all papers from the same author in one big paper: • bad generalization due to dVC( m m) ∪ H step-wise: later author snooped data by reading earlier papers, •

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/25 if you torture the data long enough, it will confess :-)

Three Learning Principles Data Snooping Data Snooping by Data Reusing Research Scenario benchmark data D paper 1: propose that works well on • H1 D paper 2: find room for improvement, propose • 2 —and publish only if better than on H H1 D paper 3: find room for improvement, propose • 3 —and publish only if better than on H H2 D ... • if all papers from the same author in one big paper: • bad generalization due to dVC( m m) ∪ H step-wise: later author snooped data by reading earlier papers, • bad generalization worsen by publish only if better

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/25 Three Learning Principles Data Snooping Data Snooping by Data Reusing Research Scenario benchmark data D paper 1: propose that works well on • H1 D paper 2: find room for improvement, propose • 2 —and publish only if better than on H H1 D paper 3: find room for improvement, propose • 3 —and publish only if better than on H H2 D ... • if all papers from the same author in one big paper: • bad generalization due to dVC( m m) ∪ H step-wise: later author snooped data by reading earlier papers, • bad generalization worsen by publish only if better

if you torture the data long enough, it will confess :-)
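The 'publish only if better' chain is really a selection among many hypotheses on one fixed benchmark D. A minimal simulation sketch (all names hypothetical, not from the lecture) shows the effect: on pure-noise labels, where nothing can truly beat 50% accuracy, the best of M random "papers" still looks impressively good on D.

```python
import random

random.seed(0)
N, M = 100, 1000  # N benchmark examples, M competing "papers" (hypotheses)

# pure noise: the labels carry no signal, so every hypothesis has
# true (out-of-sample) accuracy exactly 50%
labels = [random.choice([-1, +1]) for _ in range(N)]

def random_hypothesis():
    # a "paper" that just guesses each example independently
    return [random.choice([-1, +1]) for _ in range(N)]

best_acc = 0.0
for _ in range(M):
    h = random_hypothesis()
    acc = sum(h[i] == labels[i] for i in range(N)) / N
    best_acc = max(best_acc, acc)  # keep only if better: the snooping step

print(f"best in-sample accuracy over {M} tries: {best_acc:.2f}")
# well above the true 50%: the benchmark D has been snooped, not beaten
```

The gap between `best_acc` and 50% is exactly the bad generalization that dVC(∪m Hm) accounts for.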

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/25 be blind: avoid making modeling decision by data • be suspicious: interpret research results (including your own) by • proper feeling of contamination

extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously •

one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)

Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest •

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/25 be blind: avoid making modeling decision by data • be suspicious: interpret research results (including your own) by • proper feeling of contamination

less honest: reserve validation and use cautiously •

one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)

Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in safe •

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/25 be blind: avoid making modeling decision by data • be suspicious: interpret research results (including your own) by • proper feeling of contamination

one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)

Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously •

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/25 be suspicious: interpret research results (including your own) by • proper feeling of contamination

one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)

Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously •

be blind: avoid making modeling decision by data •

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/25 one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)

Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously •

be blind: avoid making modeling decision by data • be suspicious: interpret research results (including your own) by • proper feeling of contamination

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/25 Three Learning Principles Data Snooping Dealing with Data Snooping

truth—very hard to avoid, unless being extremely honest • extremely honest: lock your test data in safe • less honest: reserve validation and use cautiously •

be blind: avoid making modeling decision by data • be suspicious: interpret research results (including your own) by • proper feeling of contamination

one secret to winning KDDCups:

careful balance between data-driven modeling (snooping) and validation (no-snooping)
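The 'lock your test data in a safe' advice can be made literal in code. A rough sketch (class and function names are my own, not from the lecture): split once up front, and wrap the test portion in an object that permits exactly one evaluation.

```python
import random

def lock_split(data, n_test, n_val, seed=1126):
    """Split once, up front; the test portion goes 'in the safe'."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    test = [data[i] for i in idx[:n_test]]
    val = [data[i] for i in idx[n_test:n_test + n_val]]
    train = [data[i] for i in idx[n_test + n_val:]]
    return train, val, test

class LockedTestSet:
    """Allows exactly one evaluation -- a crude way to stay 'extremely honest'."""
    def __init__(self, test_data):
        self._data = test_data
        self._used = False

    def evaluate(self, error_fn):
        if self._used:
            raise RuntimeError("test data already used: any further use is snooping")
        self._used = True
        return sum(error_fn(x) for x in self._data) / len(self._data)

train, val, test = lock_split(list(range(100)), n_test=20, n_val=20)
safe = LockedTestSet(test)
# ... select the final model using train and val only ...
final_error = safe.evaluate(lambda x: 0)  # first (and only) honest estimate
# a second safe.evaluate(...) would raise RuntimeError
```

Validation data, by contrast, may be reused cautiously; only the safe's contents must stay untouched until the very end.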

Three Learning Principles Data Snooping Fun Time

Which of the following can result in unsatisfactory test performance in machine learning?
1 data snooping
2 overfitting
3 sampling bias
4 all of the above

Reference Answer: 4
A professional like you should be aware of those! :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 17/25

Three Learning Principles Power of Three Three Related Fields

Power of Three

Data Mining
• use (huge) data to find property that is interesting
• difficult to distinguish ML and DM in reality

Artificial Intelligence
• compute something that shows intelligent behavior
• ML is one possible route to realize AI

Statistics
• use data to make inference about an unknown process
• statistics contains many useful tools for ML

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 18/25

Three Learning Principles Power of Three Three Theoretical Bounds

Power of Three

Hoeffding
P[BAD] ≤ 2 exp(−2ε²N)
• one hypothesis
• useful for verifying/testing

Multi-Bin Hoeffding
P[BAD] ≤ 2M exp(−2ε²N)
• M hypotheses
• useful for validation

VC
P[BAD] ≤ 4 mH(2N) exp(...)
• all of H
• useful for training

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 19/25
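The first two bounds are directly computable, which makes the cost of choice concrete. A small sketch (function names are my own) using the slide's ε, N, and M:

```python
import math

def hoeffding_bound(eps, N):
    # one fixed hypothesis: P[BAD] <= 2 exp(-2 eps^2 N)
    return 2 * math.exp(-2 * eps ** 2 * N)

def multibin_bound(eps, N, M):
    # union bound over M hypotheses: P[BAD] <= 2M exp(-2 eps^2 N)
    return M * hoeffding_bound(eps, N)

# verifying a single hypothesis is cheap; validating among M finalists
# pays a factor of M in the guarantee for the same eps and N
print(hoeffding_bound(0.1, 1000))      # tight: one bin
print(multibin_bound(0.1, 1000, 100))  # 100x looser: many bins
```

The VC bound replaces the finite M by the growth function mH(2N), which is why it can cover a whole (infinite) H during training.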

Three Learning Principles Power of Three Three Linear Models

Power of Three

(figure: each model computes a linear score s from inputs x0, x1, ..., xd, then outputs h(x))

PLA/pocket
h(x) = sign(s)
• plausible err = 0/1 (small flipping noise)
• minimize specially

linear regression
h(x) = s
• friendly err = squared (easy to minimize)
• minimize analytically

logistic regression
h(x) = θ(s)
• plausible err = CE (maximum likelihood)
• minimize iteratively

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 20/25
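All three models share the same linear score s = wᵀx and differ only in how they transform it. A minimal sketch (weights and inputs are illustrative, not from the lecture):

```python
import math

def score(w, x):
    # shared linear score s = w^T x, where x includes x0 = 1
    return sum(wi * xi for wi, xi in zip(w, x))

def pla_pocket(w, x):
    # h(x) = sign(s): binary classification, 0/1 error
    return 1 if score(w, x) >= 0 else -1

def linear_regression(w, x):
    # h(x) = s: real-valued output, squared error
    return score(w, x)

def logistic_regression(w, x):
    # h(x) = theta(s): probability estimate, cross-entropy error
    s = score(w, x)
    return 1 / (1 + math.exp(-s))  # theta = logistic function

w = [0.5, -1.0, 2.0]  # hypothetical weights
x = [1.0, 0.3, 0.4]   # x0 = 1 plus two features
s = score(w, x)        # 0.5 - 0.3 + 0.8 = 1.0
```

One score, three outputs: a hard label, the raw score, and a probability—which is why the three optimization stories (special, analytic, iterative) can coexist over one linear structure.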

Three Learning Principles Power of Three Three Key Tools

Power of Three

Feature Transform
Ein(w) → Ein(w̃)
dVC(H) → dVC(HΦ)
• by using a more complicated Φ
• lower Ein
• higher dVC

Regularization
Ein(w) → Ein(wREG)
dVC(H) → dEFF(H, A)
• by augmenting a regularizer Ω
• lower dEFF
• higher Ein

Validation
Ein(h) → Eval(h)
H → {g1−, ..., gM−}
• by reserving K examples as Dval
• fewer choices
• fewer examples

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 21/25
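The validation tool above can be sketched in a few lines: reserve K examples as Dval, train each finalist on the remaining N − K, and keep the one with the smallest Eval. This is a rough illustration under my own naming, not the lecture's code; the toy finalists (a constant-zero model and an average-slope model) are hypothetical.

```python
import random

def validate_select(data, finalists, K, err, seed=0):
    """Pick among finalists g1-, ..., gM- by error on a held-out Dval.

    finalists: list of functions, each mapping a training set to a hypothesis.
    err: function (hypothesis, example) -> error value.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    d_val = [data[i] for i in idx[:K]]      # reserved K examples: Dval
    d_train = [data[i] for i in idx[K:]]    # remaining N - K for training
    best_g, best_e = None, float("inf")
    for learn in finalists:
        g_minus = learn(d_train)            # g- trained on N - K examples
        e_val = sum(err(g_minus, z) for z in d_val) / K
        if e_val < best_e:
            best_g, best_e = g_minus, e_val
    return best_g, best_e

# toy data y = 2x, and two hypothetical finalists
data = [(x, 2.0 * x) for x in range(1, 21)]
finalists = [
    lambda d: (lambda x: 0.0),                                      # constant zero
    lambda d: (lambda x, a=sum(y / x for x, y in d) / len(d): a * x),  # avg slope
]
err = lambda g, z: (g(z[0]) - z[1]) ** 2
g, e = validate_select(data, finalists, K=5, err=err)
```

Fewer choices (only M finalists) keeps the multi-bin penalty small; the price is the fewer (N − K) training examples each finalist sees.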

Three Learning Principles Power of Three Three Learning Principles

Power of Three

Occam's Razor: simple is good
Sampling Bias: class matches exam
Data Snooping: honesty is best policy

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 22/25

Three Learning Principles Power of Three Three Future Directions

Power of Three

More Transform, More Regularization, Less Label

bagging, support vector machine, decision tree, neural network, kernel, aggregation, sparsity, autoencoder, AdaBoost, coordinate descent, deep learning, dual, uniform blending, nearest neighbor, decision stump, quadratic programming, SVR, kernel LogReg, large-margin, prototype, GBDT, PCA, random forest, matrix factorization, Gaussian kernel, RBF network, soft-margin, k-means, OOB error, probabilistic SVM

ready for the jungle!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 23/25

Three Learning Principles Power of Three Fun Time

What are the magic numbers that repeatedly appear in this class?
1 3
2 1126
3 both 3 and 1126
4 neither 3 nor 1126

Reference Answer: 3
3 as illustrated, and you may recall 1126 somewhere :-)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 24/25

Three Learning Principles Power of Three Summary

1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?

Lecture 15: Validation

Lecture 16: Three Learning Principles
• Occam's Razor: simple, simple, simple!
• Sampling Bias: match test scenario as much as possible
• Data Snooping: any use of data is 'contamination'
• Power of Three: relatives, bounds, models, tools, principles

• next: ready for the jungle!

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 25/25