The Mathematics of the Unknown Bringing to the Present the Knowledge of the Future
Total Page:16
File Type:pdf, Size:1020Kb
The Mathematics of the Unknown Bringing to the present the knowledge of the future R. A. García Leiva (This book is 69% complete) Copyright c 2020 R. A. García Leiva www.mathematicsunknonw.com PUBLISHED BY THE AUTHOR This book is dedicated to my wife Justi, my son Daniel, and my two daughters Teresa and Lucía. Contents 1 Introduction .................................................. 19 1.1 Entities 20 1.2 Representations 21 1.3 Descriptions 22 1.4 Miscoding 24 1.5 Inaccuracy 25 1.6 Surfeit 27 1.7 Nescience 28 1.8 Evolution of Nescience 31 1.9 Other Metrics 32 1.10 Interesting Research Questions 34 1.11 New Research Entities 36 1.12 References and Further Reading 37 I Part 1: Background 2 Logic ........................................................ 41 2.1 Propositional Logic 41 2.2 Predicate Logic 43 3 Discrete Mathematics ......................................... 45 3.1 Sets, Relations and Functions 46 3.2 Strings 47 3.3 Matrices 48 3.4 Graphs 49 3.5 Probability 51 4 Set Theory .................................................... 55 5 Computability ................................................ 57 5.1 Turing Machines 58 5.2 Universal Turing Machines 61 5.3 Non-Computable Problems 62 5.4 Computable Functions and Sets 63 5.5 Oracle Turing Machine 64 5.6 Computational Complexity 64 6 Coding ...................................................... 67 6.1 Coding 68 6.2 Kraft Inequality 70 6.3 Entropy 72 6.4 Optimal Codes 75 6.5 Huffman Algorithm 76 7 Complexity .................................................. 81 7.1 Strings Complexity 82 7.2 Properties of Complexity 84 7.3 Conditional Kolmogorov complexity 85 7.4 Information Distance 86 7.5 Incompressibility and Randomness 87 8 Learning ..................................................... 89 8.1 Statistical Inference 89 8.1.1 Non-Bayesian Inference.......................................... 90 8.1.2 Bayesian Inference.............................................. 91 8.2 Machine Learning 91 8.2.1 No free lunch theorem........................................... 93 8.2.2 The bias-variance trade-off........................................ 93 8.2.3 Generative vs. discriminative models................................ 93 8.3 Minimum Message Length 93 8.4 Minimum Description Length 94 8.4.1 Refined MDL................................................... 96 8.5 Multiobjective Optimization 96 8.5.1 Trade-offs...................................................... 97 8.5.2 Optimization Methods............................................ 97 9 Philosophy of Science ........................................ 99 9.1 Metaphysics 99 9.2 Scientific Representation 100 9.3 Models in Science 101 9.4 Scientific Theories 102 9.5 The Scientific Method 102 9.6 Scientific Discovery 103 II Part 2: Foundations 10 Entities, Representations and Descriptions ..................... 107 10.1 Entities 108 10.2 Representations 109 10.3 Joint Representatinos 112 10.4 Descriptions 113 10.5 Models for Joint Topics 115 10.6 Conditional Models 116 10.7 Research Areas 117 11 Miscoding ................................................... 121 11.1 Miscoding 122 11.2 Joint Miscoding 123 11.3 Miscoding of Areas 124 12 Inaccuracy ................................................. 125 12.1 Inaccuracy 126 12.2 Joint Inaccuracy 127 12.2.1 Inaccuracy of Joint Models....................................... 128 12.3 Conditional Inaccuracy 129 12.4 Inaccuracy of Areas 129 13 Surfeit ....................................................... 131 13.1 Surfeit 132 13.2 Joint Surfeit 133 13.3 Conditional Surfeit 135 13.4 Surfeit of Areas 136 14 Nescience .................................................. 137 14.1 Nescience 138 14.2 Joint Nescience 140 14.3 Conditional Nescience 140 14.4 Nescience of Areas 141 14.5 Perfect Knowledge 142 14.6 Current Best Description 143 14.7 Nescience based on Datasets 144 14.8 Unknonwn Unknown 144 15 Interesting Questions ........................................ 145 15.0.1 Combination of Topics........................................... 146 15.1 Relevance 147 15.2 Applicability 148 15.3 Interesting Questions 149 15.4 New Research Topics 150 15.5 Classification of Research Areas 152 16 Advanced Properties ........................................ 155 16.1 The Axioms of Science 155 16.2 Scientific Method 157 16.2.1 Science as a Language......................................... 157 16.3 The Inaccuracy - Surfeit Trade-off 158 16.4 Science vs. Pseudoscience 158 16.5 Graspness 158 16.6 Effort 159 16.7 Human Understanding 159 16.8 Areas in Decay 160 III Part 3: Applications 17 Classifying Research Topics .................................. 163 17.1 Wikipedia, The Free Encyclopedia 163 17.2 Classification of Research Topics 163 17.3 Classification of Research Areas 167 17.4 Interesting Research Questions 168 17.5 Interesting Research Topics 169 17.6 Philosophy of Science 172 17.7 Evolution of Knowledge 174 17.8 Graspness of Topics 176 17.9 Probability of Being True 176 17.10 Unused text 177 18 Machine Learning ........................................... 179 18.1 Nescience Python Library 179 18.2 Inaccuracy 180 18.3 Miscoding 183 18.4 Surfeit 190 18.5 Nescience 193 18.6 Auto Machine Classification 197 18.7 Auto Machine Regression 197 18.8 Auto Time Series 197 18.9 Decision Trees 197 18.9.1 Algorithm Description........................................... 199 18.9.2 Algorithm Evaluation............................................ 201 18.10 Algebraic Model Selection 205 18.11 The Analysis of the Incompressible 208 19 Software Engineering ........................................ 211 19.1 Redundancy of Software 211 19.2 Quality Assurance 215 19.3 Forex Trading Robots 215 20 Computational Creativity .................................... 219 20.1 Classification of Research Topics 220 20.1.1 Relevance.................................................... 220 20.1.2 Applicability................................................... 221 20.1.3 Maturity...................................................... 222 20.2 Interesting Research Questions 222 20.2.1 Intradisciplinary Questions........................................ 222 20.2.2 Interdisciplinary Questions........................................ 222 20.3 New Research Topics 222 20.4 Classification of Research Areas 222 20.5 References 222 20.6 Future Work 223 IV Apendix A Advanced Mathematics ..................................... 227 A.1 Zermelo-Fraenkel Axioms 227 B Appendix: Wikipedia Data Processing ........................ 229 C Coq Proof Assistant .......................................... 231 D About Quotes and Photos .................................... 235 D.0.1 Quotes....................................................... 235 D.0.2 Photos....................................................... 237 E Notation .................................................... 241 Bibliography ................................................ 245 Books 245 Articles 246 Index ....................................................... 249 Preface Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away. Antoine de Saint-Exupéry The main idea of this book is that perfect knowledge implies randomness. This is, in principle, a highly counterintuitive idea, since a lot of effort in science deals with the task to name, organize and classify our messy, chaotic, world. Even the kind of knowledge that explains how things work requires a previous ordering and classification. Science, apparently, is anything but random. Yet, this is not the case. If it requires a lot of time to explain how something works, probably we do not fully understand it. For example, consider the calculus of derivatives. At its origin, during the time of Newton and Leibniz, it was required a lot of space to define the concept of limit of a function. Moreover, only the specialists, if any, were able to understand the idea. We had a highly incomplete knowledge. Today, the definition of derivative barely takes one paragraph in a math book, and it is taught at high school. Our current knowledge is much better. Long explanations are usually superfluous, they contain repeated ideas, unused concepts, improperly identified relations, bad notation, and so on. With a better understanding, normally after considerable research, we should be able to remove those unnecessary elements. And when there is nothing left to take away, we say that we have achieved a perfect knowledge. Hence, if a theory is perfect, that is, it presents no redundant elements we can remove, it description must be an incompressible string. And that is exactly the mathematical definition of random string, an incompressible sequence of symbols. In this sense, a scientific theory is random if it contains the maximum amount of information in the less space possible. In this book it is described the new Theory of Nescience, a mathematical theory that address the problem of what science is and how scientific knowledge is acquired. Our main assumption is that it is easier to measure how much we do not know than measuring how much we do know, because randomness posses a limit to how much we can know. A second assumption is that the computer, as a conceptual model, is the right tool to provide this quantitative measure. 12 The fact that randomness imposes a limit on how much we can know about a particular research topic, far from being a handicap, opens new opportunities in science and technology. The proper understanding of this limit will allow us not only to solve the most challenging open problems, but also to discover new interesting research