Counting Prime and Measuring Complexity and Similarity of Information

by

Niko Rebenich B.Eng., University of Victoria, 2007 M.A.Sc., University of Victoria, 2012

A Dissertation Submitted in Partial Fulllment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

in the Department of Electrical and Computer Engineering

© Niko Rebenich, 2016 University of Victoria

All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author. ii

Counting Prime Polynomials and Measuring Complexity and Similarity of Information

by

Niko Rebenich B.Eng., University of Victoria, 2007 M.A.Sc., University of Victoria, 2012

Supervisory Committee

Dr. Stephen Neville, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. T. Aaron Gulliver, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. Venkatesh Srinivasan, Outside Member (Department of Computer Science) iii

Supervisory Committee

Dr. Stephen Neville, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. T. Aaron Gulliver, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. Venkatesh Srinivasan, Outside Member (Department of Computer Science)

ABSTRACT

This dissertation explores an analogue of the theorem for polynomi- als over nite elds as well as its connection to the necklace factorization algorithm T-transform and the string complexity measure T-complexity. Specically, a precise asymptotic expansion for the prime counting function is derived. The approximation given is more accurate than previous results in the literature while requiring very little computational eort. In this context asymptotic series expan- sions for Lerch transcendent, Eulerian polynomials, truncated , and of negative integer order are also provided. The expansion formu- las developed are general and have applications in numerous areas other than the enumeration of prime polynomials. A bijection between the equivalence classes of aperiodic necklaces and monic prime polynomials is utilized to derive an asymptotic bound on the maximal T- complexity value of a string. Furthermore, the statistical behaviour of uniform random sequences that are factored via the T-transform are investigated, and an accurate probabilistic model for short necklace factors is presented. Finally, a T-complexity based conditional string complexity measure is proposed and used to dene the normalized T-complexity distance that measures similarity between strings. The T-complexity distance is proven to not be a metric. However, the measure can be computed in linear time and space making it a suitable choice for large data sets. iv

Contents

Supervisory Committee ii

Abstract iii

Table of Contents iv

List of Tables vii

List of Figures ix

List of Nomenclature xi

Acknowledgements xv

Dedication xvi

1 Introduction 1 1.1 Contributions ...... 3 1.2 DissertationOutline...... 4

2 Algebraic and Number Theory Background 6 2.1 Notation ...... 6 2.2 CyclicGroups ...... 7 2.3 FiniteFields ...... 9 2.3.1 FiniteFieldExtensions ...... 9 2.4 Primitive Roots of Unity and Cyclotomic Cosets ...... 12 2.5 MonicIrreduciblePolynomialsandNecklaces ...... 14 2.5.1 Bounding the Number of Monic Irreducible Polynomials ... 17 2.5.2 DensityofMonicIrreduciblePolynomials ...... 18 2.5.3 AperiodicandPeriodicNecklaces ...... 19 v

2.6 Summary...... 23

3 An Analogue of the Prime Number Theorem for Polynomials over Finite Fields 24 3.1 EnumerationofPrimePolynomials ...... 25 3.2 Asymptotic Expansions of the Truncated Polylogarithm ...... 29 3.3 ThePrimePolynomialTheoremforFiniteFields ...... 41 3.4 ComputationalResults ...... 43 3.5 Summary...... 56

4 The T-Transform and T-Complexity 57 4.1 BackgroundandRelatedWork ...... 57 4.1.1 ComputationalComplexity ...... 57 4.1.2 AlgorithmicComplexity ...... 58 4.1.3 DeterministicComplexityandRandomness ...... 58 4.2 Notation ...... 62 4.3 T-Augmentation ...... 63 4.4 TheT-Transform...... 65 4.4.1 TheNaïveT-TransformAlgorithm ...... 66 4.4.2 T-TransformAlgorithmEvolution...... 69 4.5 T-Complexity ...... 71 4.5.1 BoundingT-Complexity ...... 71 4.6 ComputationalResults ...... 74 4.7 Summary...... 77

5 The T-Complexity of Uniformly Distributed Random Sequences 78 5.1 RelatedWork...... 78 5.2 Conjectures on the Statistics of the T-Transform ...... 79 5.2.1 The T-augmentation Level Distribution of Short Necklaces . . 82 5.2.2 The T-handle Length Distribution of Short Necklaces . .... 92 5.2.3 BeyondShortNecklaces ...... 98 5.3 Summary...... 103

6 Measuring String Similarity 104 6.1 TheNormalizedInformationDistance ...... 104 6.2 TheNormalizedCompressionDistance ...... 105 vi

6.3 TheNormalizedT-ComplexityDistance ...... 107 6.3.1 MetricViolation...... 110 6.4 Summary...... 113

7 Conclusions 114 7.1 FutureWork ...... 115

A Supplemental Materials 117 A.1 MapleSourceCode ...... 117

Bibliography 121 vii

List of Tables

4 Table 2.1 Finite eld representation for F2[t]/(t + t + 1)...... 20

Table 2.2 Cyclotomic cosets for F16...... 21

Table 2.3 Cyclotomic cosets of F2m and binary necklaces of length m=4. 22

Table 3.1 Asymptotic approximations to , (z) for z = 0.2...... 45 An K Table 3.2 Asymptotic approximations to , (z) for z = 2...... 46 An K Table 3.3 Asymptotic approximations to , (z) for z = 7 + 11i. .... 47 An K − Table3.4 Euleriannumbertriangle...... 48 Table 3.5 Asymptotic approximations to (z,s,m) for z = 3, s = 1. ... 49 LN Table 3.6 Relative approximation error of (z,s,m) for z = 3, s = 1.. . . 51 LN Table 3.7 Relative approximation error of (z,s,m) for z = 1.25, s = 2. . 52 LN Table 3.8 Relative approximation error of (z,s,m) for LN z = 9 + 2.5i, s = 5...... 53 − Table 3.9 Absolute approximation error of the monic prime polynomial counting function for q = 2...... 54 Table 3.10 Relative approximation error of the monic prime polynomial counting function estimates for q = 2...... 55

Table 4.1 Computational complexity of LZ string factorization algorithms...... 60 Table 4.2 Computational complexity of T-transform algorithms...... 70 Table 4.3 Comparison of lower and asymptotic bound on maximal T-complexity...... 75 Table 4.4 Comparison of upper and asymptotic bound on maximal T-complexity...... 76

Table 5.1 Exponential T-augmentation level probability model parameterestimation...... 87 viii

Table 5.2 Goodness of t test and exponential PDF parameter estimations...... 88

Table 6.1 T-transform of string x#y...... 108 Table 6.2 T-transform of string y#x...... 109 Table 6.3 T-transform results for all pairwise concatenations of the three strings x, y, and z...... 112 ix

List of Figures

Figure 2.1 Binary necklaces of length m = 4...... 22

Figure 3.1 Comparison of empirical and optimal truncation of (z,s,m) for z = 3, s = 1...... 48 LN Figure 3.2 Absolute approximation error of (z,s,m) for LN z = 3, s = 1underoptimaltruncation...... 50

Figure 4.1 Example of a binary T-code construction...... 64 Figure 4.2 Pseudo-code listing of naïve T-transform algorithm...... 66 Figure 4.3 T-transform at intermediate T-augmentation level i...... 67 Figure 4.4 Comparison of upper, lower, and asymptotic bound on maximalT-complexity...... 75

Figure 5.1 T-complexity of random sequence x versus minimal and maximalT-complexitybounds...... 80 Figure 5.2 Histogram of υ(x) for |x| = 232 bits for 512 binary uniform randomsequences...... 81 Figure 5.3 Empirical and modelled cumulative distribution function of T-augmentation levels of x...... 82 Figure 5.4 Probability of the occurrence of a necklace of length m at T-augmentation level ℓ...... 84 Figure 5.5 Quantile-quantile plot for necklaces length 1 to 10 over ℓ.... 85 Figure 5.6 Quantile-quantile plot for necklaces length 11 to 20 over ℓ. .. 86 Figure 5.7 Empirical and modelled CDFs for T-augmentation level ℓ. .. 89 Figure 5.8 Error between empirical and modelled PDFs for T-augmentation level ℓ with m from1to10...... 90 Figure 5.9 Error between empirical and modelled PDFs for T-augmentation level ℓ with m from11to20...... 91 Figure 5.10 Quantile-quantile plot for necklaces length 1 to 10 over h.... 93 x

Figure 5.11 Quantile-quantile plot for necklaces length 11 to 20 over h. .. 94

Figure 5.12 Empirical and modelled CDFs for T-handle length |x˜i |...... 95 Figure 5.13 Error between empirical and modelled PDFs for T-handle length h with m from1to10...... 96 Figure 5.14 Error between empirical and modelled PDFs for T-handle length h with m from11to20...... 97 Figure 5.15 Modelled and average empirical necklace count per length m over512trials...... 99 Figure 5.16 Sample standard deviation for the necklace count per length m over512trials...... 100 Figure 5.17 Error between modelled and average empirical necklace count per length m...... 101 Figure 5.18 Logarithmic ratio of modelled and average empirical necklace count per length m...... 102 xi

Nomenclature

Mathematical Functions

A(n,k) Eulerian number

(z) The nth Eulerian polynomial in z An B The nth B = (0) with B = 1 n n Bn 0 (x) The nth Bernoulli polynomial in x Bn (x) T-complexity of the string x CT (x) Maximal T-complexity bound of strings of length x CTmax | | dNID(x,y) Normalized information distance of x and y

dNCD(x,y) Normalized compression distance of x and y

dNTC(x,y) Normalized T-complexity distance of x and y

Ei(x) Exponential integral

gcd(a,b) Greatest common divisor of integers a and b

(z) Imaginary part of complex number z ℑ lcm(a,b) Least common multiple of integers a and b

li(x) Logarithmic integral

Li(x) Oset logarithmic integral

Lis(z) Polylogarithm, also known as Jonquière’s function

(z,s,m) Truncated polylogarithm function L xii

log z Natural logarithm function logb z Logarithm function of base b

Lp(m) The number of monic irreducible polynomials of degree m or the number of Lyndon words of length m

µ(n) Möbius function

Nq(m) The number of distinct monic irreducible polynomials over Fq of degree d 6 m such that d m | O( ) Landau gauge in asymptotics and big O notation in computer sci- · ence

ord(a) Order of the element a of a cyclic group

ϕ(n) Euler’s totient function

Φ(z,s,a) Lerch transcendent

π(x) Prime counting function enumerating the number of primes less than or equal to x

πq(m) Prime polynomial counting function enumerating the monic irre-

ducible polynomials of degree m or less in Fq[t]

PP(g,z) Principal part of the Laurent series of the function g about z

(z) Real part of complex number z ℜ Res(g,z) Residue of the function g at z

Mathematical Symbols and Notation

Operation or variable placeholder · Set product or scalar multiplication × Plus or minus ± , Negation of equality xiii

Equality as per modulo operation mod ≡ ∼ Asymptoticity, f ∼ g implies f /g 1 → Much less than ≪ Much greater than ≫ For all ∀ Set containment relation ⊂ Set intersection operator ∩ Set union operator ∪ Set dierence \ ∅ Empty set

Set membership ∈ < Negation of set membership

Magnitude of complex number, cardinality of set, or length of a |·| string

{ : } Denes properties of elements of set

{a,b,c} Set of elements a, b, and c

[a,b] Set of real numbers between a and b d n The integer d divides the integer n | Convergence → Function mapping 7→ C Set of complex numbers

F Finite eld with q elements where q isa prime power, F = (Z ,+, ) q q q × F Multiplicative group of the nite eld F , F = (Z {0}, ) q∗ q q∗ q\ × xiv

Fq[t] Univariate polynomials with coe cients in Fq ki The ith copy factor

lim Limitvalue

mod Modulo operation, gives remainder after division of one integer by another

N Set of nonnegative integers, N = {0,1,... }

+ N Set of positive integers excluding zero

N The index of least term of an optimally truncated series ∗ pi The ith copy pattern or the ith distinct prime factor of an integer

R Set of real numbers

+ R Set of positive real numbers excluding zero

ρ Signicance level of statistical test

= S Alphabet set S {a1,a2,a3,...,aq 1,aq} where ai are symbols −

S∗ Set of all strings including the empty string

+ S Set of all strings excluding the empty string

(k ,k ,...,k ) S 1 2 j T-code at T-augmentation level j (p1,p2,...,pj ) Z Set of positive and negative integers including zero

+ Z Set of positive integers excluding zero

Z− Set of negative negative integers excluding zero

(Z , ) Cyclic group of q elements q · xv

ACKNOWLEDGEMENTS

The three years I have spent studying for this Ph.D. have been a great experience for me. I feel that I have learned a lot in this time, not only academically but also personally, and I am indebted to the many people that have supported me along the way. First and foremost, I would like to thank my Co-supervisors Aaron Gulliver and Stephen Neville. Thank you Aaron for being so generous with your time, al- ways asking the right questions, and motivating me to push further than I thought I could go. When I needed a little advice, you always had some to spare. Without your guidance I would not have written this dissertation. Your knowledge, enthu- siasm, and sense of humour have been an inspiration for me and made my Ph.D. so enjoyable. Thank you Stephen for convincing me to do this Ph.D. in the rst place, your time, advice, feedback, and nancial support was much appreciated. A very special thanks also goes to Ulrich Speidel at the University of Auckland, who has always been willing to share his advice and expertise with me. I loved going on all the hiking trips with you when you where here. Thanks for being such a great person and for reading all my drafts so carefully. Thank you Dr. Wu-Sheng Lu for opening my eyes to convex optimization, you truly taught the best class I ever took. Thanks also go to my many friends (you know who you are). Without you guys this Ph.D. would have been much less exciting. A big thank-you also goes to my family away from home, Penny and Doug, thanks for being so wonderful. Most of all, I would like to thank my parents along with my brothers. Thank you for all your support and your continued love and en- couragement over all these years. Papi, Mami, Jan, and Till thanks for always being there for me.

Divergent series are the invention of the devil, and it is shameful to base on them any demonstrations whatsoever. Niels Henrik Abel xvi

DEDICATION

For my father, with love. 1

Chapter 1

Introduction

In mathematics, hardly any topic has intrigued curious minds more than the study of prime numbers. A prime number is a positive integer larger than one that has no positive divisors other than one and itself. All integers other than zero and one can be factored into a sequence of primes. However, for large integers prime factorization is a computationally hard problem which is exploited in cryptography for the secure exchange of information over an untrusted communication link. Among the positive integers prime numbers seem to be randomly distributed, yet on close inspection their asymptotic distribution shows remarkable regularity. Towards the end of the 18th century Gauss noticed that the probability that a ran- domly chosen integer less than x is a prime number is close to 1/ log x. He later conjectured that the prime counting function enumerating the number of primes less than or equal to x is asymptotically given by the oset logarithmic integral which may be approximated in terms of a divergent series expansion as follows

N 1 x dt x − n! π(x) = X 1 ∼ ∼ X . (1.1)  log t log x (log x)n p6x 2 n=0 p prime

In (1.1) N is an integer whose optimum value depends on x and is chosen such that it truncates the series before it diverges. Proofs for Gauss’s conjecture were provided independently by the French mathematicians Hadamard and Poussin in 1896 [1]. Equation (1.1) is also referred to as the prime number theorem and is one of the most surprising results in mathematics linking primes and the natural logarithm. This dissertation explores an analogue of the prime number theorem 2

for polynomials over nite elds as well as its connection to the T-transform and the string complexity measure T-complexity.

A nite eld of q elements is denoted by Fq. Operations such as multiplication,

addition, subtraction and division are dened on the nite eld elements. Let Fq[t] denote the univariate polynomials with coe cients in Fq. A prime polynomial over

Fq is an irreducible polynomial that cannot be factored as a product of non-constant polynomials of lower degree over the same eld. Prime polynomials act like prime numbers, such that the analogue of the prime number theorem isgiven asthe prime polynomial counting function enumerating the monic irreducible polynomials of degree less than or equal to m given by

π = q(m) X 1 . (1.2) deg f 6m f monic, irreducible

In this dissertation an accurate new asymptotic expansion to (1.2) that is ana- logus to the series expansion in (1.1) is presented. This approximation allows for e cient computation and is of signicantly better accuracy than prior results pre- sented in [2], [3], and [4]. The series expansion for (1.2) is obtained from an expo- nentially accurate series approximation of the Lerch transcendent [5] and truncated polylogarithm function. Furthermore, a new asymptotic approximation for Eule- rian polynomials [6] is derived and used to asymptotically bound the error in the series expansions of Lerch transcendent and truncated polylogarithm function. The prime polynomial theorem as dened in (1.2) is connected to the combina- torics of aperiodic necklaces. Necklaces are q-ary strings over an alphabet of size q > 2. Aperiodic necklaces of length m form the subset of necklaces that require exactly m circular plane shifts in order to return to their original conguration (pe- riodic necklaces require less than m shifts). The lexicographically smallest cyclic shift of an aperiodic necklace is referred to as Lyndon word [7]. Due to a bijective mapping between cyclic equivalence classes of q-ary aperiodic necklaces of length m and monic irreducible polynomials of degree m over Fq [8], Equation (1.2) also counts the number of Lyndon words of length less than or equal to m. Moreover, the bijective mapping is of use in the analysis of the T-transform, a string factorization algorithm that decomposes a string into a representation of necklaces from which T-codes are constructed. T-codes are self-synchronizing, pre- x-free codes of which Human codes are a subset. The algorithmic eort required 3

to construct a T-code from a string is measured by T-complexity [9], a deterministic complexity measure providing a real valued estimate of how complex (or random) information contained in a string is. T-complexity may be viewed as a computable, but less powerful cousin of Kolmogorov complexity, where Kolmogorov complex- ity is dened as the smallest program in size that can reproduce a given character sequence [10]. T-complexity may be computed in linear time and space [11] and pro- vides a computationally e cient alternative to other string complexity measures such as Lempel-Ziv complexity [12]. The link between necklaces and T-codes is a fairly recent discovery [13]andisap- plied in this work to derive a new asymptotic bound on the maximal T-complexity value for strings of a given length. In this context we further exploit the link to necklaces for the analysis and modelling of the T-complexity prole of uniformly distributed random sequences. In particular, this dissertation provides a good sta- tistical model for the generation of short necklaces over the course of the factoriza- tion of uniform random sequences. The model is shown to agree well with empiri- cal data. Of concern in this dissertation is also the normalized information distance [14] which is used in combination with the denition of conditional T-complexity to construct a similarity measure for comparison of character sequences. The measure is referred to as the normalized T-complexity distance and computes in linear time and space, allowing the e cient assessment of shared information content within a set of arbitrary character sequences.

1.1 Contributions

The following briey summarizes the contributions made by this dissertation.

• The main contribution of this dissertation is a new, accurate asymptotic ex- pansion formula for the prime polynomial counting function in nite elds. The formula enumerates the monic irreducible polynomials of degree m ≤ and is analogous to the asymptotic expansion formula of the classical prime counting function. The proposed approximation is consistent with prior lit- erature and provides signicantly better accuracy than the works of Kruse et al. [2], Wang et al. [3], and Pollack [4]. 4

• In the context of this work new asymptotic expansions for the Lerch transcen- dent and truncated polylogarithm are derived. Both expansion formulas ex- hibit exponentially small error terms when optimally truncated. In addition, the proposed series expansion for the Lerch transcendent is shown to yield more accurate results for large positive parameters than a similar expansion given by Ferreira et al. in [15].

• New asymptotic expansions with arbitrary small errors are derived for Eule- rian polynomials and polylogarithms of negative integer order.

• Based on the prime polynomial counting function a new asymptotic bound on the maximal T-complexity value for strings of a given length is provided.

• The statistical behaviour of uniform random sequences that are factored via the T-transform is investigated, and an accurate probabilistic model for short necklace factors is proposed.

• A new denition for conditional T-complexity is given and used to dene a string similarity measure. The measure is proven to not be a metric.

1.2 Dissertation Outline

The contents of this dissertation are organized as follows:

• Chapter 2 introduces notation and describes the necessary mathematical back- ground in group theory and nite eld theory. The relationships between cyclotomic cosets, monic irreducible polynomials over nite elds, and neck- laces are outlined. Explicit formulas for the enumeration of necklaces are given.

• Chapter 3 is concerned with the derivation of the analogue of the prime num- ber theorem for polynomials over nite elds. Along with a Poincaré type expansion for the prime polynomial counting function, series expansions of the Lerch transcendent, truncated polylogarithm, Eulerian polynomials, and polylogarithms of negative integer order are given.

• Chapter 4 outlines the meaning of the term “complexity” in dierent contexts and details the necklace factorization algorithm T-transform along with the 5

string complexity measure T-complexity. An asymptotic bound on the value of maximal T-complexity for strings of a xed length is provided.

• Chapter 5 explores the T-complexity prole of uniformly distributed random sequences. In particular, a statistical model describing the generation of short necklaces is given and veried with empirical results.

• Chapter 6 introduces the notion of conditional T-complexity and denes the normalized T-complexity distance as an information measure to assess the similarity between two arbitrary character sequences.

• Chapter 7 concludes this dissertation and suggests areas for future work. 6

Chapter 2

Algebraic and Number Theory Background

This chapter provides a brief overview of the necessary background material in group theory, nite elds, and number theory. Moreover, a bijective mapping be- tween the number of irreducible monic polynomials over nite elds and aperiodic necklaces is illustrated. We do not provide proofs for most of the well known the- orems used, and provide references for those that are not straight forward. For a more thorough treatment, including elementary proofs, one may consult an ab- stract algebra or coding theory text such as [16, 17, 18, 19, 20, 21].

2.1 Notation

In this dissertation we denote nonnegative integers by N = {0,1,... }, the set of integers by Z, and rational, real, and complex numbers by Q, R, and C, respectively. Let X and Y denote sets of elements. Then the cardinality of X, or the number of elements in X, is dened as |X|. Set subtraction is denoted by the backslash symbol. Therefore, the set Z dened as Z = X Y contains all elements of set X without any \ elements of X that are also contained in Y . The set Z may contain all or a partial number of elements from X or be the empty set ∅. Z is said to be a subset of X which is indicated as follows: Z X. When X and Y share common elements their ⊆ intersection is not empty, this is expressed as X Y , ∅. Naturally, if the sets X and ∩ Y share common elements, then Z as dened above cannot contain the entirety of elements in X; in this case Z is called a proper subset of X and is denoted by Z X. ⊂ 7

Conversely, X is said to be a superset of Z. The union of two sets is denoted by the symbol . If W is the union of the sets X and Y (W = X Y ) then W combines the ∪ ∪ unique elements contained in both of the sets X and Y .

2.2 Cyclic Groups

We form an abelian group G = (Z, ) from Z through the denition of a commutative, · and associative binary group operation, here indicated with the placeholder , that · combines any two elements from G to form another element also contained in G such that (a b) c = a (b c), a,b,c G. · · · · ∀ ∈ In other words, the binary group operation is a function mapping G G G. We × 7→ 1 further require each element a in G to have an inverse a− such that

1 a a− = e, · where the element e G is the identity element such that ∈

a e = a. ·

The most straightforward group operations considered are multiplication and ad- dition here indicated with and + respectively. × We say that the nite set Z = {0,1, ..., q 1} of integers modulo q form a q − cyclic group G = (Z , ) when there exists at least one element g in G such that every q · element of G can be generated as a multiple or a power of g. Such an element is denoted as primitive element (or generator) of the cyclic group and we write

(Z ,+) = {dg mod q : d Z}, and q ∈ (Z {0}, ) = {gd mod q : d Z} q\ × ∈ for the additive and multiplicative group respectively. Every nonzero element a of a cyclic group G has an order, ord(a), associated with it, which is dened as the 8

smallest integer k Z such that ∈ ka = 0, and ak = 1, where 1 and 0 denote the multiplicative and additive identity of the above groups respectively. In a group of size |G| = n the order of an element a G is given for the ∈ additive group G = (Zq,+) as

lcm(a,n) n ord(a) = = (1 6 a 6 n = q), (2.1) a gcd(a,n)

and for the multiplicative group G = (Z {0}, ) as q\ × lcm(d,n) n ord(a) = ord(gd ) = = (1 6 d < n = q 1) . (2.2) d gcd(d,n) −

In general, a cyclic group G generated by a generator of order n comprises |G| = n elements and is said to be a cyclic group of order n. If k|n (k divides n) there exists a proper cyclic subgroup of G of order k, k < n, and there are ϕ(k) elements of order k in G, where ϕ(k) is Euler’s totient function that counts the number of integers less than k that are relatively prime to k given by

ϕ(k) = |{1 6 i 6 k : gcd(k,i) = 1}| . (2.3)

If k|n there exists exactly one subgroup of order k since

X ϕ(k) = n . (2.4) k|n

If q is equal to a prime number denoted by p, every element in G except for the identity element in the additive group is primitive giving a total of ϕ(p) = p 1 − generators each of order p. Similarly, there are ϕ(p 1) generators of order p 1 in − − the multiplicative group. 9

2.3 Finite Fields

Let F = (Z ,+, ) denote a nite eld if and only if q = pm is a prime power. In q q × the following q will represent a prime power and p a prime unless stated otherwise. Informally,a nite eld is a nite set of elements closed under the two binary group

operations addition (subtraction) and multiplication (division), and we say Fq has

characteristic p. Fq is also known as Galois eld. Analogue to the earlier denitions for the abelian cyclic group we require addition and multiplication to be associative and commutative, and we further require the distributive law to hold when the two group operations are mixed. In accordance with the denitions for cyclic groups the additive identity is the element 0, and for all elements an additive inverse exists. The multiplicative identity is the element 1 and since we require q to be a prime power, with the exception of the element 0, a multiplicative inverse exists for every element. In what follows we will indicate the multiplicative group of the nite eld F as F = (Z {0}, ). The binary eld F is the smallest possible nite eld q q∗ q\ × 2 containing only the elements {0,1}.

2.3.1 Finite Field Extensions

In the following let Fp[t]/p(t) denote the set of polynomials in Fp[t] of degree less

than deg[p(t)] = m. Fq = Fp[t]/p(t) is an extension eld over Fp. Extension elds,

as their name implies, extend the smaller nite eld Fp, then referred to as the ground eld, such that the resulting eld is nite and of the same characteristic as the ground eld. Examples of extension elds are the sets of polynomials in the variable t of some degree less than m over Fp given as,

m 1 = = m 1 + + + = − i Fq Fp[t]/p(t) {am 1t − a1t a0 X ait : ai Fp} . − ··· ∈ i=0

In this context Fq are the elements of a vector space of dimension m over Fp where m 1 m 2 0 the basis elements are the powers {t − , t − , ... t = 1}, and we say the extension

eld has degree m, also denoted by [Fq : Fp] = m. Since there are p choices for any m of the coe cients for the polynomials in Fp[t] the cardinality of Fq is p and the ground eld is contained within the extension eld, forming the smallest subeld

of Fp[t]/p(t) isomorphic to Fp. 10

If we let a(t),b(t) F [t]/p(t), then addition (subtraction) is dened as, ∈ p

m 1 = + = − i c(t) a(t) b(t) X cit , i=0 where c a +b mod p and c ,a ,b F and c(t) F [t]/p(t). That is, the polynomi- i ≡ i i i i i ∈ p ∈ p als in Fp[t]/p(t) are added (subtracted) as if they were polynomials over Z; however,

the coe cients are computed in Fp. Division in extension elds is merely multiplication by the multiplicative in- verse, and we dene multiplication (division) as follows. Let a(t),b(t), c(t) be poly-

nomials in Fp[t]/p(t), then

c(t) a(t)b(t) mod p(t) and, ≡ 1 a(t) c(t)b− (t) mod p(t), ≡ where p(t) is a reduction polynomial of degree m with p(0) , 0. The reduction polyno- mial, also called primitive polynomial, is an irreducible polynomial and thus, cannot be factored as a product of non-constant polynomials of lower degree over the same eld. The reduction polynomial p(t) essentially behaves like a prime number, i.e. it can only be trivially factored. It is worth noting that unlike multiplication in prime

elds Fp, the result of multiplication in extension elds Fp[t]/p(t) is not unique and depends on the choice of p(t). Although each choice of p(t) seems to result in a dierent nite eld of the same order all of them constitute the same unique eld up to isomorphism, that is, they are identical just with dierent labels for the same fundamental eld elements. However, regardless of the choice of p(t) the extension

eld comprising all polynomials of degree less than m is isomorphic to Fq = Fpm .

Theorem 2.3.1 (Fermat’s Little Theorem for Finite Fields). Fermat’s little theorem for

nite elds of characteristic p states that for any eld Fq, where q is a prime power, we have that for any β F ∈ q βq = β, (2.5)

and thus, we have for any non-zero element of the eld i.e. the elements of the multiplicative

group Fq∗, q 1 β − = 1 (2.6) 11

m Using (2.5) we see that in general for any f (t) of degree k < m in Fq with q = p we have, q = qk + q(k 1) + + q + f (β ) ak β ak 1β − a1β a0 − ··· = q qk + q q(k 1) + + q q + ak β ak 1β − a1β a0 (2.7) − ··· = k + k 1 + + + q = q (ak β ak 1β − a1β a0) f (β) . − ···

Subelds of Finite Fields

A monic polynomial of degree m is a polynomial in which the coe cient of the leading term is equal to one. That is,

a(t) = tm + u(t), is monic when u(t) is a polynomial of degree at most m 1. Using Theorem 2.3.1 − we have that for the monic polynomial

f (t) = tq t = (t β) (2.8) q − Y − β F ∈ q every element β F is a root of f (t). Since the maximal number of roots a poly- ∈ q q m nomial can have is equal to its degree there are at most q = p roots for fq(t). The derivative of fq(t) in Fq is given by

q 1 f ′(t) = qt − 1 = 1, (2.9) q − − and thus, all roots of fq(t) are distinct which means that fq(t) has no multiple factors.

The smallest extension eld containing all roots of fq(t) is the so called splitting eld m over which fq(t) decomposes into distinct linear factors; it has size q = p and hence is isomorphic to the extension eld Fp[t]/p(t) of degree m. Since the multiplicative group F has pm 1 elements, for any s|m we may write q∗ − m = sd for some integer d and using the sum formula for the nite geometric series we have, m sd s s 2s s(d 1) p 1 = p 1 = (p 1)(1 + p + p + + p − ), (2.10) − − − ··· which implies that if s|m then ps 1 | pm 1 from which follows that − −

ps 1 pm 1 ps pm t − 1 | t − 1 = t t | t t = f s (t)| f m (t) . (2.11) − − ⇒ − − ⇒ p p 12

We saw that Fpm is the splitting eld of fpm (t), and if s|m the polynomial fps (t) is a factor of fpm (t). Then the splitting eld of fps (t) is a subeld Fps Fpm , and since ⊂s all roots of fps (t) are distinct, Fpm has exactly one subeld of size p , for each s that divides m forming the extension eld of Fp of degree [Fps : Fp] = s. Summarizing the key result from above we have,

s m F s F m s|m, and |F s | = p < p = |F m | . (2.12) p ⊂ p ⇐⇒ p p

2.4 Primitive Roots of Unity and Cyclotomic Cosets

A primitive element g F , q = pm, that generates the multiplicative group F has ∈ q q∗ order q 1. If we let k = ps 1 and further have k|q 1, such that β = gd = g(q 1)/k is − − − − a solution to k 1 w (t) = t 1 = t− f (t) = 0, (2.13) k − k then β has order k and is called the kth primitive root of unity. In particular we have,

d k kd (q 1) k (g ) = g = g − = β = 1, and it is easy to see that wk (β) has no common factors with its derivative because = k 1 , = = m wk′ (β) kβ − is non-zero as gcd(k q) 1 for all exponents m in q p . Hence wk(t) th has k distinct roots. A k primitive root of unity β of wk (t) is a generator generating the subeld Fps containing the k roots of unity of wk(t) (not necessarily all primitive) given by = d d 2 d k Fp∗s {(g ),(g ) , ..., (g ) } = {gd,g2d, ..., g(k)d} = {β,β2, ..., βk } = {βi : 1 6 i 6 k} .

th d Similarly, a (q 1) primitive root of unity, i.e. β = g | = , generates F m Using the − d 1 p∗ results on the number of generators for a cyclic group from Section 2.2, we deduce that there exist ϕ(k) primitive kth roots of unity for all k|q 1 that are relative prime i − to q. We further note that if β is a root of wk (t) then this implies that its conjugates i ip2 ips 1 β ,β ,...,β − are also roots of wk(t) since

j j (βip )k = (βk )ip = 1 (0 6 j 6 s 1), − 13

s and s is the smallest positive integer such that βip = βi and in general we have,

i ips i ips i s β F s β = β β − = 1 k | ip i . ∈ p ⇐⇒ ⇐⇒ ⇐⇒ −

However, knowing that k | ips i implies that −

ips i mod k, ≡ and leads to the denition of the following equivalence classes, also called cyclotomic cosets [8], which partition Z = {0,1,...,k 1} in subsets which are either conjugates k − (equivalent) or disjoint and dened as

C = {ipj mod k : j Z }, (2.14) i ∈ k such that, k 1 − = = [ Ci {0,1,...,k 1} Zk . − i=0 Let the set of all unique cyclotomic cosets be dened as

D = {C : C C = ∅, j,i Z }, (2.15) k i i ∩ j ∀ ∈ k then the distinct cyclotomic cosets in Dk isolate the conjugate roots of the minimal polynomials ηi (t) that are irreducible factors of wk (t). The minimal polynomial ηi (t) is monic and irreducible over the ground eld. It has degree |Ci | which is either equal to s or a nontrivial divisor thereof,

η (t) = (t βc ) . (2.16) i Y − c C ∈ i

Moreover, ηi(t) is a representative for the equivalence class isomorphic to the cy- clotomic coset Ci . The product of all unique minimal polynomials is equal to the decomposition of wk(t) into distinct linear factors such that

= = 1 = k wk (t) Y ηi (t) t− fk (t) t 1, − {i : C D } i ∈ k

i where ηi(t) represent the irreducible factor of lowest degree for which ηi(β ) = 0. 14

2.5 Monic Irreducible Polynomials and Necklaces

In the following we derive and expression for the number of monic irreducible

polynomials over Fp of degree m given by the cardinality of the set

L (m) = |{η (t) : η (t) D , deg[η (t)] = m}| . (2.17) p i i ∈ m i

Counting the minimal polynomials of degree m using (2.17) does not seem to be a straightforward task. However, from our previous results relating to the splitting

eld of fpm (t) we have that the product of monic irreducible polynomials over Fp whose degree divides m is given by

= = pm fpm (t) twpm 1(t) t t . (2.18) − −

m The degree of fpm (t) is p and may be expressed as the sum of the degrees of the

monic irreducible factors of fpm (t) such that

m = p XdLp(d), (2.19) d|m where the sum runs over all divisors d of m and Lp(d) is the number of monic ir-

reducible polynomials of degree d. The desired quality Lp(m) in Equation (2.17) can now be obtained by application of the Möbius inversion formula. The classical Möbius inversion states for the number theoretic functions u(m), and d(n), n > 1, the following relation holds. Given u(m) such that

u(m) = X h(d), then (2.20) d|m m m h(m) = µ(d) u = µ u(d), (2.21) X d X d d|m   d|m   where µ(n) is the Möbius function (see Graham et al. [22, p. 136]) dened as

1 if n = 1 = k µ(n)  ( 1) if n is the product of k distinct primes (2.22)  −  0 if n has one or more repeated prime factors,    15

Applying the Möbius inversion formula to (2.19) we obtain the number of monic irreducible polynomials of degree m as

1 m 1 m d L (m) = µ(d) p d = µ p . (2.23) p m X m X d d|m d|m  

In general we note that the number of all irreducible polynomials of degree m (not necessary monic) is obtained by multiplying (2.23) by p 1. This is due to the fact − that multiplying a monic irreducible polynomial by any element of the multiplica- tive group Fp∗ does not change its degree and still results in an irreducible polyno- mial. We now extend our results to count the number of monic irreducible polynomi- als over nite elds Fq whose order is a prime power. A simple variable substitution q = p su ces generalizing (2.23) to

1 m 1 m d L (m) = µ(d) q d = µ q . (2.24) q m X m X d d|m d|m   where q = pn for some integer n.

Let Nq(m) denote the number of distinct monic irreducible polynomials over Fq of degree d 6 m with d|m. Evidently, Nq(m) provides a way to determine the cardinal- ity of the set of all distinct cyclotomic cosets Dm and we have

= = = 1 c d |Dm| Nq(m) 1 X Lq(c) 1 X X µ q 1 . (2.25) − − "c d # − c|m c|m d|c  

Note that the set of distinct cyclotomic cosets Dm does not account for the monic irreducible factor t in

= = qm 1 fqm (t) twqm 1(t) t(t − 1), − − and hence, we subtract by one on the right hand side of (2.25). The evaluation of the double sum in (2.25) is not straightforward. However, using Theorem 2.5.1 we may obtain a simpler expression for it. 16

Theorem 2.5.1. The number of distinct monic irreducible polynomials over Fq of degree d 6 m such that d|m is given by

1 m N (m) = ϕ(d) q d . (2.26) q m X d|m

Proof. We have that = 1 c d Nq(m) X X µ q . (2.27) "c d # c|m d|c   Further, we note that when c|m and d|c, then d|m. Making the substitution c = de m implies that e| d , which will allow us to rst invert and then simplify (2.27). We continue by applying Möbius inversion to the identity

X ϕ(k) = n, k|n which was already encountered in (2.4), giving

µ(k) ϕ(n) = n . (2.28) X k k|n

Using (2.28) we simplify (2.27) in the following manner

c qd N (m) = µ q X X d c c|m d|c  

qd = X X µ(e) m de d|m e| d qd µ(e) = X X d m e d|m e| d qd m µ(e) = X X m m d e d|m e| d

1 m d 1 m = ϕ q = ϕ(d) q d = |D | + 1 . (2.29) m X d m X m d|m   d|m  17

For the special case where we consider only monic irreducible polynomials of prime degree m = p, (2.24) and(2.26) further simplify to

1 m L (m)| = = (q q) and (2.30) q m p m −

1 m N (m)| = = (q + (m 1)q), (2.31) q m p m − respectively.

2.5.1 Bounding the Number of Monic Irreducible Polynomials

In the preceding section we established that the number of monic irreducible poly- nomials of degree m is given by (2.24) as

1 m L (m) = µ(d) q d . q m X d|m

The evaluation of the Möbius function in (2.24) is by no means trivial. The follow- ing theorem gives an asymptotic approximation for (2.24) which is much easier to compute.

Theorem 2.5.2. Let Fq[t]/p(t) denote an extension eld over the nite eld with q elements. Then an asymptotic estimate of the number of monic irreducible polynomials of degree m is given by m m q q 2 L (m) = + O (2.32) q m m ! Proof. For the lower bound of (2.24) we have that every nontrivial divisor of m can- not be larger than m/2. Using the formula for the nite geometric series we have

m ⌊ 2 ⌋ 1 m 1 = d m i Lq(m) X µ(d) q > q X q m m " − # d|m i=0

m +1 1 q 2 1 > qm − m " − q 1 # − m q q m 1 > (q 2 q− ), (2.33) m − (q 1)m − − 18

such that m q q m 1 Lq(m) > (q 2 q− ) . (2.34) − m −(q 1)m − − In order to determine an upper bound for (2.24) we note that the number of terms in the sum of (2.24) is given by

n X |µ(d)| = 2 , (2.35) d|m with n denoting the number of distinct prime divisors pi of m, where 1 6 i 6 n, and we have n 2 6 p1p2 ...pn 6 m . (2.36)

It then follows that

m q 1 m L (m) 6 + µ(d) q d q m m X d|m d>1 1 m m m 6 q + q 2 + mq 3 , (2.37) m f g such that m m q q 2 m L (m) 6 + q 3 , (2.38) q − m m and we conclude that m m q q 2 Lq(m) = + O . m m ! 

2.5.2 Density of Monic Irreducible Polynomials

The results of the preceding section allow us to dene the density function Pq(m) giving the probability that a randomly selected monic polynomial of degree m is m irreducible. Since there exist at most q polynomials of degree m, Pq(m) is easily obtained as m Lq(m) 1 q− 2 Pq(m) = = + O . (2.39) qm m m ! 19

2.5.3 Aperiodic and Periodic Necklaces

Golomb showed in [8] that there exists a bijective mapping between the cyclotomic cosets and cyclic equivalence classes of m-bead necklaces composed from q colours. Necklaces are q-ary strings over an alphabet of size q > 2 and may be sub divided into periodic and aperiodic cyclic equivalence classes. Aperiodic (or primitive) necklaces of length m form the subset of necklaces that require exactly m circular plane shifts in order to return to their original conguration (periodic necklaces re- quire less than m shifts). In particular, as illustrated in Example 2.5.3, the number of aperiodic cyclic equivalence classes of length m is given by (2.24) and the total number of cyclic equivalence classes of q colour necklaces of length m aperiodic and periodic is given by (2.29). The lexicographic smallest necklace of an aperiodic cyclic equivalence class is called a Lyndon word and by convention is chosen as a representative for the cyclic equivalence class of necklaces generated by its cyclic shifts.

Example 2.5.3 (Cyclotomic Cosets and Necklaces). Consider F16 which is isomorphic = = to Fqm F24 representing the polynomials of degree less than m 4 over F2. Ordered by constant, linear, quadratic, and cubic terms we have,

= F24 { 0, 1, t, t + 1, t2, t2 + 1, t2 + t, t2 + t + 1, t3, t3 + 1, t3 + t, t3 + t2, t3 + t + 1, t3 + t2 + 1, t3 + t2 + t, t3 + t2 + 1 }

= 4+ + = Let us assume we are given a primitive polynomialp(t) t t 1 such that F24 F2[t]/p(t), = d and let g be a primitive element of F24 such that β g |d=1 is a root of unity of order 15 = 24 1 and also a root of p(t). Then p(β) = 0 implies that, −

β4 = β + 1 and β = β4 + 1 = (β + 1) + 1 = β . 20

k 3 2 β = a3β +a2β +a1β+ a0 a3a2a1a0

∞ β− = 0000 β15 = β0 = 1 0001 β1 = β 0010 β2 = β2 0100 β3 = β3 1000 β4 = β + 1 0011 β5 = β2 + β 0110 β6 = β3 + β2 1100 β7 = β3 + β + 1 1011 β8 = β2 + 1 0101 β9 = β3 + β 1010 β10 = β2 + β + 1 0111 β11 = β3 + β2 + β 1110 β12 = β3 + β2 + β + 1 1111 β13 = β3 + β2 + 1 1101 β14 = β3 + 1 1001

4 Table 2.1: Finite eld representation for F2[t]/(t + t + 1).

Since β has order 15 we require that β15 = 1, which is indeed the case as shown below

β15 = (β β4)3 = β3(β + 1)3 = β3(β + 1)(β + 1)(β + 1) = β3(β3 + β2 + β + 1) = β6 + β5 + β4 + β3 = β4(β2 + β) + (β + 1) + β3 = (β + 1)(β2 + β) + (β + 1) + β3 = (β3 + β2 + β2 + β) + (β + 1) + β3 = (β3 + β3) + (β2 + β2) + (β + β) + 1 = 1,

This conrms that β is a primitive element of F24 . In a similar fashion we can compute the remaining powers of β to generate all other elements of F . Table 2.1 lists the elements of 2∗4 21

3 2 1 0 = ∞ F24 as a vector space using {β ,β ,β ,β 1} as a basis. By convention β− is denoting the element 0 in any nite eld Fpm .

Cyclotomiccoset Conjugates

0 0 C0 = {0} β : β = = 1 1 2 4 8 24 16 C1 = {1,2,4,8} = C2 C4 C8 β : β ,β ,β ,β = β (β = β) 3 3 6 12 24 24 48 3 C3 = {3,6,9,12} = C6 = C9 = C12 β : β ,β ,β ,β = β (β = β ) 5 5 10 20 5 C5 = {5,10} = C10 β : β ,β (β = β ) 7 14 28 13 56 11 112 7 C7 = {7,11,13,14} = C11 = C13 = C14 β : β ,β = β ,β = β (β = β )

Table 2.2: Cyclotomic cosets for F16.

= = Using (2.14), the q-cyclotomic cosets of q 2 modulo k 15 that partition Z15 in disjoint subsets are given in Table 2.2 along with the root conjugates they represent. The smallest entry in each coset is referred to as the coset representative. We may now com-

pute the minimal polynomials that are the irreducible factors of w15(t). The procedure is

illustrated by computing η5(t) from C5 as follows

η (t) = (t β5)(t β10) 5 − − = t2 + (β5 + β10)t + β15 = t2 + (β5 + β10)t + 1 .

From Table 2.1 we have

β5 + β10 = (β2 + β) + (β2 + β + 1) = 1, and thus 2 η5(t) = t + t + 1 .

The remaining minimal polynomials are computed accordingly with a complete listing pro- vided in Table 2.3. Not surprisingly we see that primitive polynomial p(t) = t4 + t + 1 that we used to construct Table 2.1 is given by the minimal polynomial η1(t). The bijective mapping observed by Golomb in [8] becomes apparent when the cyclotomic cosets are ex- pressed as m-digit strings in base-q and compared to the cyclic equivalence classes of m-bead 22

Cyclotomic coset in baseq=2 Minimalpolynomial

C0 = {0000} η0(t) = t + 1 4 C1 = {0001,0010,0100,1000} η1(t) = t + t + 1 = η2(t) = η4(t) = η8(t) 4 3 2 C3 = {0011,0110,1100,1001} η3(t) = t + t + t + 1 = η6(t) = η9(t) = η12(t) 2 C5 = {0101,1010} η5(t) = t + t + 1 = η10(t) 4 3 C7 = {0111,1110,1101,1011} η7(t) = t + t + 1 = η11(t) = η13(t) = η14(t) {1111} η ∞(t) = t −

Table 2.3: Cyclotomic cosets of F2m and binary necklaces of length m=4.

necklaces composed from q colours. A close inspection of the base-q coset representation in Table 2.3 reveals that coset members are simply cyclic shifts of the smallest coset represen- tative. Connecting the ends of the binary coset members to form necklaces we see that there exists a one-to-one correspondence between each cyclotomic coset and one of binary neck- laces in Figure 2.1 in all its possible circular shifts. Note that Table 2.3 includes η ∞(t) = t, − the trivial irreducible factor of f16(t), which represents the necklace 1111. Necklaces can be

Figure 2.1: Binary necklaces of length m = 4.

subdivided into aperiodic (or primitive) and periodic necklaces. Aperiodic necklaces are the subset of necklaces that require exactly m planar shifts in order to return to their original conguration (periodic necklaces require less than m shifts). From Table 2.3 it is easy to see that the cyclic equivalence classes of aperiodic necklaces are precisely those that are mapped to the cosets representing minimal polynomials of degree m. The lexicographically smallest aperiodic necklaces that represents these cosets of size m are referred to as Lyndon words [7] and thus, their number Lq(m) is equal to the number of monic irreducible polynomials of degree m derived earlier, 1 m L (m) = µ(d) q d . q m X d|m 23

Similarly, the total number of necklaces from q colours and length m aperiodic or periodic is given by 1 m N (m) = ϕ(d) q d . q m X d|m

2.6 Summary

This chapter introduced the mathematical notation and constructs used through- out this dissertation. Furthermore, the algebraic and number theoretic background

linking the number of irreducible monic polynomials of degree m over Fq to the number of m-bead q-coloured aperiodic necklaces were discussed in detail. Explicit formulas for the enumeration of necklaces were given and were illustrated by ex- ample. Upper and lower bounds on the number of monic irreducible polynomials of degree m were given and used to derive an asymptotic expression not dependent on the Möbius function. The next chapter builds on the results derived in the preceding sections and presents new results of fundamental nature, namely an asymptotic expansion of the truncated polylogarithm function along with the prime polynomial theorem for nite elds. 24

Chapter 3

An Analogue of the Prime Number Theorem for Polynomials over Finite Fields

Using the notation from Chapter 2, Fq[t] denotes the collection of all univariate

polynomials over Fq, where q is a prime power. A prime polynomial in Fq[t] is an irreducible polynomial and as such it cannot be factored as a product of non- constant polynomials of lower degree over the same eld. As already seen in Sec- tion 2.3.1, prime polynomials over nite elds are the analogue to prime numbers as both can only be trivially factored. While no rigorous direct connection between prime numbers and prime polynomials over nite elds has been established to date, there are many fundamental analogies (see Rosen [23] and Iwaniec et al. [24] for an introduction). In this chapter, a very precise asymptotic expansion for the - nite eld analogue of the classical prime counting function from number theory is derived. The approximation given is obtained via an exponentially accurate asymp- totic expansion of the truncated polylogarithm function which requires very little computational eort. The expansion formulas developed are general and have ap- plications in numerous areas other than the enumeration of prime polynomials. 25

3.1 Enumeration of Prime Polynomials

The well-known prime counting function that enumerates the prime numbers less than or equal to a given number x > 2 is dened as

π(x) = X 1 . (3.1) p6x p prime

The asymptotic distribution of prime numbers among the positive integers, giving the probability of a randomly chosen integer less than x is prime is very close to 1/log x. Conditional on the still unproven Riemann hypothesis, an approximation and error bound for (3.1) was given by von Koch in [25] as

x dt π(x) = + O( √x log x) 2 log t = li(x) li(2) + O( √x log x) − = Li(x) + O( √x log x), (3.2) where li(x) and Li(x) denote the logarithmic integral and oset logarithmic inte- gral respectively. The latter notation is an unfortunate historic artifact and should

not be confused with Lis(x) which denotes the polylogarithm function and is used subsequently in this dissertation. The logarithmic and exponential integral are related via li(x) = Ei(log x). An- alytic continuation of the exponential integral and repeated integration by parts yields the well known Poincaré type expansion formula for π(x) (see Lebedev [26, p. 32–38])

N 1 x − n! π ∼ + , N (x) X n RN (x) (3.3) log x  = (log x)   n 0    where   N ! RN (x) 6 CN , (3.4) (log x)N and x R with x > 2, and C is a constant. For N ∞, the expansion in (3.3) ∈ N → eventually diverges for any nite value of x because RN (x) is unbounded. Therefore, this expansion can provide a reasonable estimate only if the series is truncated at a 26

N nite number of terms since RN (x) is then of order O(x− ) and approaches zero as x ∞ . → The nite eld analog to (3.1) enumerates the irreducible monic polynomials

over Fq of degree less than or equal to m and is denoted by

π = π = q(m) q(X) X 1, (3.5) deg f 6m f monic, irreducible where m > 1. From Section 2.5 of the previous chapter (see also Berlekamp [19, p. 84])we have

that the number of irreducible monic polynomials over the nite eld Fq of degree m is given as 1 n L (n) = µ(d) q d , (3.6) q n X d|n where the sum runs over all divisors of n, and µ(d) is the Möbius function as de- ned in (2.22). Equation (3.6) also counts the number of cyclic equivalence classes of aperiodic n-bead necklaces composed from q colours. An aperiodic necklace of length n returns to its original conguration after exactly n plane shifts. The lexi- cographically smallest of these cyclic shifts is referred to as a Lyndon word and by convention is chosen as the necklace representing the equivalence class [7]. From (3.6), we can establish the prime polynomial (or Lyndon word) counting function as m π = q(m) X Lq(n) . (3.7) n=1 When enumerating Lyndon words such that the zero-length word is allowed the count of (3.7) must be increased by one. Equation (3.7) is precluded from straight- forward evaluation due to its dependence on the Möbius function. Nevertheless, using Theorem 2.5.2 the number of irreducible monic polynomi- als of degree n can be approximated by

n n q q 2 Lq(n) = + O (3.8) n n ! 27

Substituting the approximation for Lq(n) from (3.8) into (3.7) yields

m n n π = q + q 2 q(m) X O " n n !# n=1 m n m q q 2 = X + O (3.9) n m ! n=1

Several attempts to develop an asymptotic expansion formula for the sum in (3.9) have been made. To the best of our knowledge the rst correct result is due to Kruse et al. who provided a rst order approximation in 1990 [2]. More recently, Wang et al. extended this result to a second order approximation in [3]. Pollack was the rst to explore a nite eld analogue akin to (3.3)inhis2010paper[4]. Pollack’s approach is slightly dierent in that he considers the number of irreducible polyno- mials less than integers that encode univariate polynomials over a nite eld in a bijective mapping. However, as in [2]and[3], the asymptotic expansion provided in [4, Theorem 2] rests on the approximation of the sum in (3.9). An estimate is given in form of the series expansion in [4, Lemma 6] that depends on coe cients that involve the evaluation of innite series. An asymptotic result for these coe cients is provided in [4, Lemma 7]. However, while the resulting asymptotic expansion resembles that of (3.3), it yields inferior numerical results when compared with the results of [2] and[3]. In this dissertation we provide a new asymptotic expansion formula for (3.7) based on (3.9) and analogous to (3.3). Our approach computes e ciently and pro- vides more accurate results than the approximation provided by [2] and [3] and [4, Lemma 7]. However, before doing so we outline the rst order approximation of Kruse et al. as their result appeals due to its simplicity and is only available in German.

Theorem 3.1.1 (Kruse et al. 1990). Let Fq[t] denote the univariate polynomials with co- + e cients in F . Then for m N , m ∞, the number of irreducible monic polynomials q ∈ → over Fq of degree less than or equal to m is given by the rst order approximation

m π = ∼ q X = m q(m) X Lq(n) (X q ) . (3.10) q 1 log X n=1 − q 28

We prove Theorem 3.1.1 with the help of Lemma 3.1.2 that uses the asymptotic expression for the number of monic irreducible polynomials given in (3.9).

Lemma 3.1.2. For su ciently large m and q > 1( √33 3) 1.372 the series 2 − ≈

m k m = q q am(q) X (3.11) k ! m k=1 .

converges with q lim am(q) = . m ∞ q 1 → − Proof. The numerator of series (3.11) may be written as

m k m m 1 k q = q − mq− X k m X m k k=1 k=0 − m m 1 + q − (m k) k k = X − q− m " m k # k=0 −

m m 1 q − k k = X 1 + q− , (3.12) m " m k ! # k=0 − which allows us to rewrite (3.11) as,

m k m 1 = mq = − + k k am(q) X X 1 q− kqm m k ! k=1 k=0 − m 1 = − 1 + 1 X bm(q), (3.13) qk m 1 k=0 − where m 1 = − k(m 1) bm(q) X − . (3.14) qk (m k) k=1 − It remains to show that for large enough m, the series bm(q) is positive and mono- tonically decreasing. We have 29

m 1 2 = − k(m 1) km m bm(q) bm+1(q) X − − "qk(m k) − qk (m + 1 k) # − qm k=1 − − m 1 − k(k 1) m2 = X − qk (m k)(m + 1 k) − qm k=1 − − (m 3)(m 2) (m 2)(m 1) m2 > − − + − − m 2 m 1 m 6q − 2q − − q 1 > m2 q2 + 3q 6 m(5q2 + 9q) + 6(q2 + q) , (3.15) qm − − 6 f   g where q2 + 3q 6 > 0 when q > 1 ( √33 3) 1.372, and for su ciently large m the − 2 − ≈ series in (3.13) converges with

q lim am(q) = . (3.16) m ∞ q 1 → − 

Since we require q > 2 for a nite eld to exist, we deduce from (3.9) and Lemma 3.1.2 that Theorem 3.1.1 holds.

Theorem 3.3.1 provides a Poincaré type expansion for (3.7) that is based on (3.9) and analogous to (3.3). It is one of the main results of this dissertation and a sig- nicant improvement on the results in [2], [3], and [4]. The proof of Theorem 3.3.1 relies on asymptotic expansions of the Eulerian polynomials and truncated poly- logarithm function, which are discussed in detail in the next section.

3.2 Asymptotic Expansions of the Truncated Polyloga- rithm

In this section, an accurate asymptotic expansion of the truncated polylogarithm function is presented. While the results given here are required for the proof of Theorem 3.3.1 in Section 3.3, they nd application in many areas of other than the enumeration of prime polynomials. 30

Denition 3.2.1. The truncated polylogarithm function is given by the nite series

m zn (z,s,m) = L X ns n=1 (3.17) + (z C; s C; m N ) . ∈ ∈ ∈ Denition 3.2.1 is the mth partial sum resulting from truncating the innite se- ries representation of the polylogarithm. The polylogarithm, also known as Jon- quière’s function (see Jonquière [27] and Truesdell [28]), is dened as

∞ zn Li (z) = = z Φ(z,s,1) s X ns n=1 (3.18) (z C; s C when |z| < 1; (s) > 1 when |z| = 1), ∈ ∈ ℜ where Φ(z,s,1) denotes the Lerch transcendent (see Srivastava et al. [29, p. 121], which is given by the power series

∞ zn Φ(z,s,a) = X (a + n)s n=0 (3.19)

(z C; s C when |z| < 1; (s) > 1 when |z| = 1; a C Z−, a , 0) . ∈ ∈ ℜ ∈ \ The Lerch transcendent is analytically continued via the following integral repre- sentation valid for the cut z-plane with z C [1,∞) (see Erdélyi et al. [5, p. 27]) ∈ \ ∞ 1 ts 1 e (a 1)t Φ(z,s,a) = − − − dt Γ(s)  et z 0 − (3.20) ( (s) > 0 when |z| 1, z , 1; (s) > 1 when z = 1; (a) > 0), ℜ ≤ ℜ ℜ where Γ(s) denotes the gamma function, and the integrant has simple poles located at t = log z + 2kπi (k = 0, 1, 2 ... ) . (3.21) k ± ± The Lerch transcendent plays an important role in many applications in applied and pure mathematics. A thorough discussion of its properties is provided in Fer- reira et al. [15], Chaudhry et al. [30, pp. 316–318], and more recently Lagarias et al. [31]. These works predominately focus on the analytic continuation and approx- imation of the Lerch transcendent for the domain z C [1,∞), as then the above ∈ \ 31

integrant (or an expansion of this integrant), can be integrated along a suitable Han-

kel contour that avoids the poles tk. The truncated polylogarithm function can be expressed in terms of the Lerch transcendent as + (z,s,m) = zΦ(z,s,1) zm 1Φ(z,s,m + 1) . (3.22) L − However, excluding z [1,∞) from the domain precludes the use of the truncated ∈ polylogarithm function for many practical applications, among them the enumera- tion of prime polynomials over nite elds. Hence, in the subsequent discussion we develop a Poincaré type expansion that allows us to evaluate (3.22) for |z| > 1 with remarkable accuracy. For this we consider a combination of two divergent series expansions of the Lerch transcendent. Despite divergence, these series expansions are extraordinarily accurate when optimally truncated as per the following denition due to Bender and Orszag [32, Ch. 3].

Denition 3.2.2 (The Optimal Truncation Rule). Consider a function f (t) and let {fn(t)} be an asymptotic sequence for t t such that → 0

N 1 ∼ − f (t) X an fn(t) n=0 is an asymptotic series expansion of f (t) as t t . Typically, for a divergent series expan- → 0 sion the magnitude of successive series terms initially decreases until a minimum is reached and thereafter increases without bound due to the divergent nature of the series. Optimal truncation is dened as the partial sum up to but not including the least series term [32, Ch. 3]. The index of the least term that is also used to denote the order of the expansion, is indicated by N . The least term is an estimate for the approximation error ∗

N 1 ∗− = f (t) X an fn(t) O( fN (t) ), − = ∗ n 0

that thereby is minimized.

The optimal truncation rule given by Denition 3.2.2 is by no means strictly valid for all divergent series and is justied more often by empirical evidence rather than by rigorous proof. The resulting asymptotic expansion is also referred to as superasymptotic and typically exhibits an exponentially small error term [33]. 32

The proof of Theorem 3.3.1 requires Lemma 3.2.4 and Theorem 3.2.7. Lemma 3.2.4 provides an approximation for Eulerian polynomials not previously found in the literature. Eulerian polynomials (not to be confused with the Euler polynomials [5, pp. 40–43]), were introduced by Euler in the 18th century and have since found numerous applications in enumerative, algebraic, and geometric combinatorics. A general introduction to these polynomials can be found in [34], [35], and [36]. The denitions associated with Eulerian polynomials in the literature are not consistent and we largely draw on [6] for our denitions and notation.

Denition 3.2.3. The nth Eulerian polynomial is given by

n (z) = A(n,k) zk z C; n N . (3.23) An X ∈ ∈ 0 k=0

The coe cients A(n,k) are positive integers, commonly referred to as Eulerian numbers, and are generated by the recurrence relation

A(n,k) = (k + 1) A(n 1,k) + (n k) A(n 1,k 1), 1 6 k 6 n 1 (3.24) − − − − − subject to the boundary conditions

A(n,0) = 1, n > 0 and A(n,k) = 0, k > n .

Eulerian numbers are perhaps best known for their combinatorial interpretation as the number of in the symmetric group Sn having exactly k ascents (see Graham et al. [22, pp. 253–255] and Carlitz et al. [37]). While the asymptotic properties of Eulerian numbers have been well studied (see for example [38], [39] and [40]), those of the Eulerian polynomials have not received an equally rigorous treatment. In what follows we take a approach to derive a sim- ple yet accurate approximation formula for these polynomials.

+ Lemma 3.2.4. For xed z C {0, 1}, with | arg(z)| < π, | log z| < 2Kπ, and n N , the ∈ \ ∈ nth Eulerian polynomial (z) is given by An + (z 1)n 1 1 n,K(z) = − + TK (z,n + 1) n!, (3.25) A z (log z)n+1       33

+ where K N is the order of the expansion and ∈

n K 1 ⌊ ⌋ − 2 ( )j ( π )2j( )n 2j = n 1 2 k log z − + TK (z,n) 2 X X − RK (z,n), 2j! (4π2k2 + (log z)2)n k=1 j=0

with C |R (z,n)| 6 K , K + π n 1 | log z 2K | −

and CK a nite quantity dependent on z.

Proof. Euler’s bivariate exponential generating function enumerating the Eulerian polynomials is provided in Foata [6, (2.8)] as

∞ n = z 1 = u f (z,u) − X n(z) . (3.26) z e(z 1)u A n! − − n=0 Substituting u = t/(z 1) and multiplying by 1/(1 z) yields − − ∞ = 1 = n = n(z) g(z,t) X an(z) t , an(z) A . (3.27) et z − (z 1)n+1 n! − n=0 − The generating function g(z,t) is meromorphic on C and has simple poles located at t = log z + 2kπi, k = 0, 1, 2 ... k ± ± Hence, the power series of g(z,t) is convergent in the disk about the origin of radius

R0 < | log z|. Consider now the Laurent series of g(z,t) about each of the poles tk. Their principal part is given by

∞ Res(g,t ) Res(g,t ) (g,t ) = k = b (z) tn, b (z) = k , PP k X n,k n,k n+ (3.28) t tk − t 1 − n=0 k where Res(g,tk) denotes the residue of g(z,t) at tk which is easily obtained using L’Hôpital’s rule as t t H 1 ( , ) = k = . Res g tk lim t− (3.29) t tk e z z → − 34

Following Wilf [41, pp. 142–146], we nd that for any xed integer K the function

= hK(z,t) g(z,t) X PP(g,tk) (3.30) − K 0, there exists an integer N > such that for all n N n 1 n |cn(z)| < + ϵ = rK (z) . (3.32) RK ! Comparing the absolute value of the coe cients in (3.31) as n approaches innity, we see that |an(z)| is much larger than |cn(z)| when n > N . More generally, by Theo- rem 5.2.1 in [41, p. 174] the coe cients an(z) can be approximated by

= + n an,K (z) Xbn,k(z) O(rK (z) ), (3.33) K

(z) = n,K = + n+1 + n an,K (z) A X Res(g,tk)/(log z 2πki) O(rK (z) ) . (3.34) (z )n+1 n 1 ! K

+ (z 1)n 1 1 , (z) = − + T (z,n + 1) n! n K n+1 K A z " (log z) # (3.35) + + (z C {0, 1}, | arg(z)| < π, | log z| < 2Kπ; n N ; K N ) ∈ \ ∈ ∈ 35

where TK (z,n) is given by

n K 1 ⌊ ⌋ − 2 ( )j ( πk)2j( z)n 2j = n 1 2 log − + n 1 TK (z,n) 2 X X − O(rK (z) − ) . (3.36) 2j! (4π2k2 + (log z)2)n k=1 j=0

1 For xed n and | log z| < 2Kπ, we have that rK (z) is of order O(| log z + 2Kπ|− ) so that (3.35) is accurate up to an arbitrary small error that depends only on K. 

Remark 3.2.5. T1(z,n) can be expressed in an alternative form involving Bernoulli polynomials as follows. The generating function for Bernoulli polynomials (see Apostol [44, p, 264]) is given by

∞ xt n = te = t b(t,x) X j(x) (|t| < 2π) . (3.37) et 1 B j! − j=0

From which g(t,z) from (3.27) is obtained in terms of b(t,x) as

Res(g,t ) 1 , = 0 , = g(t z) b(t t0 0) t t t0 − e z − ∞ − 1 1 Bj+1 j = + X (t log z) , (3.38) z(t log z) z (j + 1)! − − j=0 where (0) = B denotes the nth Bernoulli number with B = 1. Substituting the Bn n 0 power series expansion from (3.27) for g(z,t) and expanding the right hand side of (3.38) as a binomial series gives after simplication

+ ∞ n 1 ( )j B + + = (z 1) 1 1 n j 1 j n,1(z) − X − (log z) n!, A z " (log z)n+1 − (n + j + 1) n! j! # j=0 (3.39) + (z C {0, 1}, | arg(z)| < π, | log z| < 2π; n N ), ∈ \ ∈ from which it can be deduced that

∞ + ( )j 1 B + = 1 n j j T1(z,n) X − (log z) . (3.40) (n + j) (n 1)! j! j=0 −

Remark 3.2.6. We draw attention to an asymptotic expression for the polylogarithm of negative integer order that directly follows from Lemma 3.2.4. 36

From the Dirichlet series for the polylogarithm provided by (3.18), for negative integer orders we obtain

∞ = s k = Φ Li s (z) Xn z z (z, s,1) − − n=1 (3.41) (z C, |z| < 1; s N ) . ∈ ∈ 0

We further have that Li s(z) is related to the Eulerian polynomials [6, (3.2)] by −

= z s (z) Li s(z) A + (s 0) . (3.42) − (1 z)s 1 ≥ −

From (3.42), it is observed that Li s (z) extends to a meromorphic function on C with − a pole of multiplicity s + 1 located at z0 = 1. Thus, we may analytically continue (3.42), and after applying Lemma 3.2.4 we obtain the following approximation for- mula for the polylogarithms of negative integer order which is valid beyond the unit disk in which its power series converges

s+1 1 Li , (z) = ( 1) + T (z,s + 1) s! s K s+1 K − −  (log z)  (3.43)    +  + (z C {1}, | arg(z)| < π, | log z| < 2Kπ; n N ;  K N ) . ∈ \  ∈  ∈

Using the denition for T1(z,n) in (3.40), an expression for Li s,1(z) valid for | log z| < − 2π in terms of Bernoulli polynomials can be obtained which is a well known result in special functions theory (see for example Erdélyi et al. [5, p. 30] and Srivastava et al. [45, p. 198]). The following proposition provides the asymptotic expansion of the truncated polylogarithm function for |z| > 1 which is a key result of this dissertation.

+ + Theorem 3.2.7. For xed z C, |z| > 1, | arg(z)| < π, xed s N , and m N with ∈ ∈ ∈ m ∞, the truncated polylogarithm function is given by the asymptotic expansion →

m N 1 z z − n+s 1 n(z) (z,s,m) = − A + R (z,s,m) . (3.44) N X n n+s 1 N L z 1 m  = n ! (z 1) m −  −  n 0 −      37

The error term is asymptotically bounded by

|z 1| (N + s 1)! |R (z,s,m)| 6 C − − , (3.45) N N N +1 N +s 1 |z (log z) |m −

where C is a nite quantity independent of m, and (z) denotes the nth Eulerian polyno- N An mial. The error term becomes exponentially small when the series is optimally truncated.

Proof. Consider the denition of the truncated polylogarithm function in terms of the Lerch transcendent given in (3.22) with the integral representation of the Lerch transcendent given in (3.20). We may rewrite (3.20) as Laplace type integral as fol- lows ∞ 1 s 1 (a 1)t 1 Φ(z,s,a) = t − g(z,t) e− − dt, g(z,t) = . (3.46) Γ(s)  et z 0   − In the following we restrict z such that |z| > 1 and further limit s and a to positive integers to ease notation; however, the results are readily extended for |z| < 1 and suitable sets of complex values for s and a (see [15] for a discussion). We continue by noting that the Maclaurin series of g(z,t) is convergent for 0 < t < | log z| and we have

N 1 = − n + = n(z) g(z,t) X an(z) t rN (z), an(z) A , (3.47) − (z 1)n+1 n! n=0 − which is a series we are already familiar with from Lemma 3.2.4. However, we note that the interval of convergence for this series is only a small portion of the entire integration region of the integral (3.46). Nevertheless, with the integral being of Laplace type, we obtain an approximation for Φ(z,s,a) using Watson’s Lemma [46] (see also Olver [47, pp. 71–72] for a contemporary version of the proof). As required by Watson’s Lemma we have that for xed s and z and positive c

s 1 ct | t − g(z,t)| = O(e )

as t ∞ . This allows for the use of a truncated Maclaurin series in the integrant of → (3.46) in place of g(z,t). A reversal of summation and integration yields the asymp- 38

totic series expansion

N 1 ∞ Φ = 1 − n+s 1 (a 1)t + N (z,s,a) − X an(z) t − e− − dt RN (z,s,a), (3.48) (s 1)! "  # − n=0 0 where ∞ 1 s 1 (a 1)t RN (z,s,a) = t − rN (z) e− − dt, (3.49) (s 1)!  − 0   and the integral in (3.48) is a special case of the gamma function that using [30, (7.47), p. 294] evaluates to

∞ + n+s 1 (a 1)t (n s 1)! t − e− − dt = − . (3.50)  (a 1)n+s 0 − After simplication and reordering of terms, (3.48) can be written as

N 1 + n 1 Φ = − n s 1 (z 1)− − n(z) + N (z,s,a) X − − A+ RN (z,s,a) . (3.51) − n ! (a 1)n s n=0 −

We note that (3.51) is not dened for a = 1 and, being the result of the termwise in- tegration of a divergent series, does itself not converge. However, as will be shown later, optimal truncation as per Denition 3.2.2 holds. Examining (3.34) to (3.36)

from Lemma 3.2.4 we nd that the dominant contribution to the magnitude of an(z) in (3.47) is due to the principal part of the Laurent series about the pole nearest to the origin while the contributions of the remaining poles are negligible as n ap- proaches innity. Hence, we may asymptotically bound the error in (3.51) using a

rst order approximation to aN (z) based on (3.34), giving

(N + s 1)! |RN (z,s,a)| 6 CN − , (3.52) |z (log z)N +1| (a 1)N +s − where CN is a constant.

Consider now an alternative choice of an Laplace type integral given by

∞ 1 s 1 at 1 Φ(z,s,a) = t − g˜(z,t) e− dt, g˜(z,t) = . (3.53) Γ(s)  1 ze t 0   − − 39

Interestingly, the Maclaurin series ofg ˜(z,t) resembles that of g(z,t), a fact easily de- duced from

∂ n 1 ∂ n 1 ∂ n 1 = = z , ∂ t ∂ t ∂ (3.54) t ! e z t=0 t ! 1 ze t=0 − z ! 1 z − − − and we have

N 1 = − n + = n(z) g˜(z,t) X an(z) t r˜N (z), an(z) A , (3.55) − (z 1)n+1 n! n=0 − which converges for 0 < t < | log z| . Applying Watson’s Lemma to (3.53), we obtain the slightly dierent expansion formula N 1 + n 1 Φ = − n s 1 (z 1)− − n(z) + N (z,s,a) X − − + A RN (z,s,a), (3.56) − n ! an s n=0 which, contrary to (3.51), is well dened for all a > 0. In general, we have that the error terms in (3.51)and(3.56) are distinct. In particular, an asymptotic error bound

for (3.56) is obtained from Lemma 3.2.4 analogous to (3.52) by replacing Res(g,tk) with Res(g˜,tk) = 1in(3.34), giving

(N + s 1)! |RN (z,s,a)| 6 CN − , (3.57) |(log z)N +1| aN +s where CN is a constant.

We combine the expansions (3.51)and(3.56) into a joined expression as follows

N 1 − n+s 1 (z 1) n 1 (z) Φ , , = − − n + , , N (z s a) X − − An+s RN (z s a) − n ! (a φa(z)) n=0 − (3.58) + + + (z C {1}, | arg(z)| < π; s N ; a N , a ∞; N N ), ∈ \ ∈ ∈ → ∈

here φa(z) is dened as

1 if a > 1 and |z| > 1 + φa(z) = z C; a N , (3.59)  0 otherwise ∈ ∈    40

and the asymptotic error term for (3.58) is given by

(N + s ) 6 1 ! |RN (z,s,a)| CN + − + , (3.60) |zφa (z)(log z)N 1| (a φ (z))N s − a where CN is a constant. The purpose of φa(z) is to ensure the use of (3.51) when |z| > 1 and a > 1, and the use of (3.56) otherwise. This has the advantage that the magnitude of the asymptotic error term in (3.60) is reduced by a factor proportional to |z| and at the same time ensures that (3.58) is dened for all a > 0. A related expansion formula for Φ(z,s,a) is dened for z < [1,∞) by Ferreira et al. in [15, Theorem 1]. It is using a Maclaurin expansion ofg ˜(z,t) at its core for all a > 0, such that for |z| > 1 the error in the approximation is worse when compared with (3.58). Nonetheless, the results of [15] can be used to prove that (3.58) remains valid when |z| < 1. N s For xed z and s the error term in (3.60) is of order O(a− − ) and negligible as a ∞ . The error term is consistent with Poincaré’s denition for asymptotic series → expansions. Moreover, when (3.58) is optimally truncated as per Denition 3.2.2 the expansion exhibits an exponentially small error term. In order to show this we rewrite (3.58) as N 1 Φ = − + N (z,s,a) Xwn(z,s,a) RN (z,s,a), − n=0 where n 1 n+s 1 (z 1)− − n(z) w (z,s,a) = − − A . n n ! (a φ (z))n+s − a Then an estimate for the order of the optimally truncated series expansion (or the index of the least term of the series) denoted by Nˆ (z,s,a) is obtained by bounding ∗ the ratio of successive series terms as follows.

w + (z,s,a) 1 > n 1 , , wn(z s a)

| n+ (z)| n + s n + s > A 1 ∼ , (3.61) | (z)| (n + 1)(z 1)(a φ (z)) (a φ (z)) | log z| An − − a − a where the rst order approximation for Eulerian polynomials provided by Lemma 3.2.4 has been used. Solving (3.61) for n and noting that N 1 = n we have ∗ −

N ∼ Nˆ (z,s,a) 6 (a φa(z)) | log z| s + 1 . (3.62) ∗ ∗ − −   41

From Ursell’s strong form of Watson’s Lemma (see Ursell [48] and Paris [49, p. 76]) we have that when (3.58) is truncated such that the order of the expansion is given by

Nˆ (z,a) = r (a φa(z)) + 1 ( 0 < r < | log z| ), (3.63) ∗ −   = r (a φa(z)) ∞ the error term satises RNˆ (z,a) O(e− − ) as a . Noting that for xed s ∗ → (3.62)and(3.63) are asymptotically equivalent we conclude that the approximation error of (3.58) is exponentially small when the series is optimally truncated.

Using (3.58) we may express the truncated polylogarithm in terms of optimally truncated series as

= Φ m+1Φ + N ,N˜ (z,s,m) z N˜ (z,s,1) z N (z,s,m 1) . (3.64) L ∗ ∗ ∗ − ∗ with N˜ independent of m and N˜ N as m ∞. For xed s and z, |z| > 1, the opti- ∗ ∗ ≪ ∗ → Φ mally truncated series expansion N˜ (z,s,1) in (3.64) is a nite quantity. Therefore, ∞ Φ ∗ as m , the contribution of z N˜ (z,s,1) is asymptotically negligible compared → ∗ m+1 to the contribution of z ΦN (z,s,m + 1) and for xed s and z we have ∗

+ (z,s,m) = zm 1Φ (z,s,m + 1) + O(1) (m ∞), (3.65) LN − N →

from which follows that

m N 1 + = z z − n s 1 n(z) + N (z,s,m) X − A RN (z,s,m) L z 1 m  n ! (z 1)n mn+s 1  − n=0 − −  (3.66)    + +  + (z C, |z| > 1, | arg(z)| <π; s N ; m N , m ∞; N N ), ∈  ∈ ∈ → ∈  where |z 1| (N + s 1)! |R (z,s,m)| 6 C − − , (3.67) N N N +1 N +s 1 |z (log z) |m −

and CN is a nite quantity independent of m. 

3.3 The Prime Polynomial Theorem for Finite Fields

The following theorem presents a Poincaré type expansion for the prime polyno- mial counting function as dened in (3.7) It is based on (3.9) and analogous to the 42

expansion formula of the prime counting function given in (3.3). The Eulerian poly- nomials and truncated polylogarithm function from the previous section are central to its proof. Moreover, the asymptotic expansion formula presented is consistent with prior literature and signicantly improves on the results in [2], [3], and [4].

Theorem 3.3.1. Let Fq[t] denote the univariate polynomials with coe cients in Fq. Then + for m N , m ∞, the number of irreducible monic polynomials over F of degree less ∈ → q than or equal to m is given by

N 1 − ( ) n ( ) π ∼ q X q 1 − n q + q,N (m) X − A Rq,N (X) (3.68) q 1 log X " (log X)n # − q n=0 q where X = qm with q 1 N ! R , (X) 6 C − (3.69) q N N N +1 N q (log q) (logq X) where C is a constant independent of X and (z) denotes the nth Eulerian polynomial N An as in Denition 3.2.3. The remainder term becomes exponentially small when the series is optimally truncated as per Denition 3.2.2.

Proof. We begin by rewriting (3.9) as

m q 2 π (m) = (q,1,m) + O . (3.70) q L m ! where (z,s,m) denotes the truncated polylogarithm function as dened in Deni- L tion 3.2.1. Equation (3.70) is analogous to the prime number theorem as given in (3.2). Moreover, setting X = qm we see that the error term in (3.70) is stronger than that of the prime number theorem. Replacing the truncated polylogarithm function with its asymptotic series ex- pansion from (3.44) gives

m N 1 π ∼ q q − n(q) + q,N (m) X A Rq,N (m) q 1 m  (q 1)n mn  − n=0 −  (3.71)    + +  (q R, q > 1; m  N , m ∞; N N ), ∈ ∈ → ∈  where q 1 N ! Rq,N (m) 6 CN − , (3.72) q (log q)N +1 mN 43

m and CN is a nite quantity independent of m. Substituting X = q yields Theo- rem 3.3.1, which completes the proof. 

Remark 3.3.2. Aside from the remarkable similarities between the asymptotic ex- pansion formulas for prime and prime polynomial counting function, we remark on an interesting and less obvious connection between the two series expansions. + If X = x and q = 1 + ε with ε 0 , then →

lim πq,N (X) = πN (x) . (3.73) ε 0+ → This result is easily obtained from (3.3), (3.4), (3.68), (3.69) and noting that

log(1 + ε) lim n(1 + ε) = n! and lim = 1 . ε 0+ A ε 0+ ε → → 3.4 Computational Results

In this section we present some numerical examples that illustrate the accuracy of the asymptotic expansions in Lemma 3.2.4 and Theorems 3.2.7 and 3.3.1. In all

tables, ϵabs and ϵrel indicate absolute and relative approximation errors respectively. Round parenthesis that follow numeric entries indicate the multiplying the entry. In the following optimal truncation is carried out according to Denition 3.2.2. The index of the least expansion term (or the order of the optimally truncated expansion) is indicated by N . For comparison purposes, an expansion of order ∗ N resulting in the smallest absolute approximation error is computed empirically. This process is henceforth referred to as empirical truncation. Tables 3.1, 3.2, and 3.3 illustrate the accuracy of the approximation formula for Eulerian polynomials in (3.25). As K and n grow the precision of the approximation increases. Table 3.4 shows the Eulerian number triangle generated via the recurrence rela- tion in Denition 3.2.3. Due to the symmetry of these numbers, only about half of the entries in each row have to be computed. This triangle allows for the e cient evaluation of the asymptotic expansion of the truncated polylogarithm function in (3.66), as no derivatives need be calculated for the Maclaurin series on which expres- sion (3.44) is based. The last column of the table lists the sum of the row elements which is equal to the nth Eulerian polynomial evaluated at z = 1. 44

Table 3.5 lists the computational results for asymptotic approximations for the truncated polylogarithm (z,s,m) for z = 3, s = 1 and six values of m using the LN series expansion from Theorem 3.2.7 to compare optimal and empirical truncation. Underlined table entries reect the results of empirical truncation and are the en- tries exhibiting the least absolute approximation error. These results conrm that optimal truncation gives very accurate results with a rapid degradation in approxi- mation quality as N exceeds N . The optimal and empirical truncation order are in ∗ close agreement which is also observed in Figure 3.1 which depicts a comparison of optimal and empirical truncation. Moreover, Figure 3.2 illustrates the development of absolute approximation error under optimal truncation for increasing m. Tables 3.6, 3.8, and 3.7 show the relative approximation error of (z,s,m) under LN optimal truncation with the maximum expansion order limited to N = 500. From these tables it is evident that as m grows the optimal number of terms increases allowing for an ever more accurate approximation. In Table 3.6 the relative error for m = 100 is less than 1.04 10 46, which is remarkable. × − Tables 3.9 and 3.10, respectively, show the absolute and relative approximation error for the monic prime polynomial counting function estimates (q,1,m) and L πq,N (m), q = 2, under optimal truncation and with the maximum expansion order limited to N = 1000. Using an optimal number of terms signicantly improves the approximation accuracy when compared to a rst order approximation. 45

n

K 3 6 12

1 1.83140457 26.99061729332598 270,839.39932067465073432075399 2 1.83887199 26.98754353174761 270,839.39845645427252513115984 4 1.83988563 26.98752012507686 270,839.39845767155530943918198 8 1.83998774 26.98752000071216 270,839.39845767167998984994921 16 1.83999859 26.98752000000460 270,839.39845767167999999908535 32 1.83999983 26.98752000000003 270,839.39845767167999999999991

Exact value of , (z) for z = 0.2 An K 1.84 26.98752 270,839.39845767168

Relative approximation error at K = 32

9.1312e ( 8) 1.2011 ( 15) 3.4493 ( 28) − − −

Table 3.1: Asymptotic approximations to , (z) for z = 0.2 and three values of n An K using the series expansion in (3.25). 46

n

K 3 6 12

1 12.99629051 4,683.00124726225744 28,091,567,594.98157244071518917609953 2 12.99969112 4,683.00000565855005 28,091,567,594.99999841159980097513895 4 12.99997138 4,683.00000002665499 28,091,567,594.99999999989018454893781 8 12.99999699 4,683.00000000014752 28,091,567,594.99999999999999182402913 16 12.99999966 4,683.00000000000095 28,091,567,594.99999999999999999927866 32 12.99999996 4,683.00000000000001 28,091,567,594.99999999999999999999993

Exact value of , (z) for z = 2 An K 13 4,683 28,091,567,595

Relative approximation error at K = 32

3.1563 ( 9) 1.4222 ( 18) 2.6095 ( 33) − − −

Table 3.2: Asymptotic approximations to , (z) for z = 2 and three values of n An K using the series expansion in (3.25). 47

n

K 3 6 12

93.11893 531,598.89055533 + 475,150,327,070,173.35238548 1 − − − − 85.45339i 874,262.38289886i 3,061,637,862,706,965.99701066i 100.51924 528,605.03041391 + 466,334,909,934,807.37188130 2 − − − − 109.67358i 945,617.73392225i 3,040,705,750,998,494.88615944i 99.14327 528,881.39280995 + 466,335,674,969,793.31438924 4 − − − − 110.03634i 945,440.53805549i 3,040,705,929,911,602.98592405i 99.01480 528,881.99788632 + 466,335,674,976,418.96557659 8 − − − − 110.00537i 945,439.00909223i 3,040,705,929,869,754.08981405i 99.00168 528,881.99998840 + 466,335,674,976,418.00011617 16 − − − − 110.00065i 945,439.00005903i 3,040,705,929,869,751.00026510i 99.00020 528,881.99999992 + 466,335,674,976,418.00000001 32 − − − − 110.00008i 945,439.00000042i 3,040,705,929,869,751.00000003i

Exact value of , (z) for z = 7 + 11i An K − 466,335,674,976,418 99 110i 528,882 + 945,439i − − − − 3,040,705,929,869,751i

Relative approximation error at K = 32

1.4553 ( 6) 3.9132 ( 13) 9.5960 ( 24) − − −

Table 3.3: Asymptotic approximations to , (z) for z = 7 + 11i and three values An K − of n using the series expansion in (3.25). 48

k n = = X A(n,k) n (1) n! = A n 012 345 k 0 ··· 0 1 1 1 1 0 1 2 1 1 0 2 3 1 4 1 0 6 4 1 11 11 1 0 24 5 1266626 1 0 120 ...... · · · .

Table 3.4: The Eulerian number triangle generated using Denition 3.2.3.

N (empirical truncation) 100 N (optimal truncation) ∗

80

60

40 Truncation index

20

0 0 20 40 60 80 100 m Figure 3.1: Comparison of empirical and optimal truncation of (z,s,m) for z = 3, LN s = 1. 49

m

N 1 2 4 8 10 12

1 4.5000 6.7500 30.3750 1,230.1875 8,857.3500 66,430.1250 2 6.7500 8.4375 34.1719 1,307.0742 9,300.2175 69,198.0469 3 11.2500 10.1250 36.0703 1,326.2959 9,388.7910 69,659.3672 4 23.6250 12.4453 37.3755 1,332.9034 9,413.1487 69,765.0864 5 68.6250 16.6641 38.5620 1,335.9067 9,422.0061 69,797.1226 6 273.3750 26.2617 39.9117 1,337.6149 9,426.0362 69,809.2696 7 1,391.6250 52.4707 41.7545 1,338.7811 9,428.2372 69,814.7980 8 8,516.8125 135.9690 44.6900 1,339.7099 9,429.6397 69,817.7335 9 60,401.8125 439.9827 50.0340 1,340.5553 9,430.6609 69,819.5149 10 485,451.5625 1,685.2456 60.9787 1,341.4211 9,431.4975 69,820.7309 11 4,354,421.0625 7,352.6814 85.8844 1,342.4061 9,432.2591 69,821.6534 12 43,092,987.1875 35,725.6546 148.2274 1,343.6390 9,433.0216 69,822.4230 13 466,229,337.1875 190,682.8141 318.4684 1,345.3222 9,433.8544 69,823.1236 14 5,473,248,288.1875 1,107,495.3661 822.0886 1,347.8121 9,434.8399 69,823.8145 15 69,279,439,093.3125 6,949,126.6044 2,426.5405 1,351.7782 9,436.0958 69,824.5481

Exact value of (z,s,m) for z = 3 and s = 1 L 3 7.5 36.75 1,339.4036 9,431.3036 69,822.3263

Absolute and relative approximation error for empirical (N ) and optimal (N ) truncation ∗ N 1 1 4 8 10 12

ϵabs N 1.5 0.75 0.6255 0.3063 0.1940 0.0967 L % ϵrel  N  50 10 1.7 0.023 0.0021 0.00014 L   N 1 2 4 8 10 13 ∗ ϵabs N 1.5 0.9375 0.6255 0.3063 0.1940 0.7973 L ∗ % ϵrel  N  50 13 1.7 0.023 0.0021 0.0011 L ∗  

Table 3.5: Asymptotic approximations to (z,s,m) for z = 3, s = 1 and six values LN of m using the series expansion from Theorem (3.2.7). 50

1.5 )| m , s

, 1 z ( ∗ N −L ) m , s , z

( 0.5 L |

0 0 20 40 60 80 100 m Figure 3.2: Absolute approximation error of (z,s,m) for z = 3, s = 1 under opti- LN mal truncation. 51

m

N 10 100 500 1000

1 8.85735 (3) 7.73066 (45) 1.09081 (236) 1.98311 (474) LN 1 6.08562 ( 2) 5.07695 ( 3) 1.00302 ( 3) 5.00752 ( 4) − − − − 5 9.85814 ( 4) 4.79114 ( 9) 1.47063 ( 12) 4.57272 ( 14) − − − − ϵrel N 10 2.05661 ( 5) 9.51681 ( 15) 8.97529 ( 22) 8.68040 ( 25) L − − − −   25 — 1.17310 ( 26) 3.15510 ( 44) 9.17894 ( 52) − − − 50 — 3.15964 ( 38) 2.07677 ( 73) 1.75535 ( 88) − − − Exact value of (z,s,m) for z = 3 and s = 1 L 9.43130 (3) 7.77011 (45) 1.09190 (236) 1.98410 (474)

(Optimal) truncation at N 500 = min(500,N ) ∗ ∗ N 500 10 109 500 500 ∗ N 500 9.43150 (3) 7.77011 (45) 1.09190 (236) 1.98410 (474) L ∗ ϵ 500 1.01311 ( 4) 1.03734 ( 46) 1.25639 ( 235) 5.15309 ( 387) rel LN − − − −  ∗ 

Table 3.6: Relative approximation error of (z,s,m) for z = 3, s = 1 under optimal LN truncation with the maximum expansion order limited to N = 500. 52

m

N 10 100 500 1000

1 4.65661 ( 1) 2.45455 (6) 5.70212 (43) 4.06427 (91) LN − 1 8.21847 ( 1) 8.53779 ( 2) 1.61821 ( 2) 8.04474 ( 3) − − − − 5 — 1.62688 ( 4) 3.92053 ( 8) 1.19469 ( 9) − − − ϵrel N 10 — 3.53141 ( 6) 1.32108 ( 13) 1.22567 ( 16) L − − −   25 — — 3.04892 ( 25) 7.87530 ( 33) − − 50 — — 1.09064 ( 36) 6.70461 ( 52) − − Exact value of (z,s,m) for z = 1.25 and s = 2 L 2.61382 2.68367 (6) 5.79591 (43) 4.09724 (91)

(Optimal) truncation at N 500 = min(500,N ) ∗ ∗ N 500 1 21 110 222 ∗ N 500 4.65661 ( 1) 2.68367 (6) 5.79591 (43) 4.09724 (91) L ∗ − ϵ 500 8.21847 ( 1) 8.15237 ( 7) 3.80060 ( 44) 5.32815 ( 92) rel LN − − − −  ∗ 

Table 3.7: Relative approximation error of (z,s,m) for z = 1.25 and s = 2 under LN optimal truncation and with the maximum expansion order limited to N = 500. 53

m

N 10 100 500 1000

[ 4.20914 [ 4.00940 [ 4.13030 + [ 1.53050 1 − − − − − − LN 1.81022i ](4) 9.04816i ](86) 1.79962i ](471) 1.56284i ](955)

1 6.56708 ( 2) 4.99458 ( 3) 9.75804 ( 4) 4.86484 ( 4) − − − − 5 2.70157 ( 4) 9.54729 ( 10) 2.97703 ( 13) 9.27309 ( 15) − − − − ϵrel N 10 1.82678 ( 4) 2.47836 ( 17) 2.51721 ( 24) 2.45558 ( 27) L − − − −   25 1.82932 ( 4) 1.16817 ( 35) 3.74139 ( 53) 1.10843 ( 60) − − − − 50 — 2.90336 ( 59) 3.03589 ( 94) 2.67054 ( 109) − − − Exact values of (z,s,m) for z = 9 + 2.5i and s = 5 L − [ 3.98121 [ 4.00090 [ 4.12597 + [ 1.52960 − − − − − − 1.64277i ](4) 8.99971i ](86) 1.79890i ](471) 1.56228i ](955)

(Optimal) truncation at N 500 = min(500,N ) ∗ ∗ N 500 31 359 500 500 ∗ [ 3.98045 [ 4.00090 [ 4.12597 + [ 1.52960 N 500 − − − − − − L ∗ 1.64295i ](4) 8.99971i ](86) 1.79890i ](471) 1.56228i ](955)

ϵ 500 1.82931 ( 4) 7.99936 ( 87) 1.75037 ( 471) 4.10294 ( 638) rel LN − − − −  ∗ 

Table 3.8: Relative approximation error of (z,s,m) for z = 9 + 2.5i and s = 5 LN − under optimal truncation with the maximum expansion order limited to N = 500. 54

π = π 1000 π m q(m), q 2 ϵabs (q,1,m) ϵabs q,1(m) N ϵabs q,N 1000 (m) L ∗      ∗  21 3.000000 1.000000 1.000000 1 1.000000 22 8.000000 2.666667 0 2 2.000000 23 7.100000 (1) 7.019048 7.000000 5 6.796875 24 8.800000 (3) 4.579598 (1) 6.080000 (2) 11 4.597922 (1) 25 2.777378 (8) 4.505926 (3) 9.302341 (6) 22 4.506007 (3) 26 5.859217 (17) 1.389416 (8) 9.460913 (15) 44 1.389416 (8) 27 5.359458 (36) 2.929609 (17) 4.254646 (34) 88 2.929609 (17) 28 9.082015 (74) 2.679729 (36) 3.575822 (72) 177 2.679729 (36) 29 5.247715 (151) 4.541008 (74) 1.028980 (149) 354 4.541008 (74) 210 3.514558 (305) 2.623857 (151) 3.438916 (302) 709 2.623857 (151) 211 3.157501 (613) 1.757279 (305) 1.543257 (610) 1000 1.757279 (305) 212 5.100801 (1229) 1.578750 (613) 1.245921 (1226) 1000 1.578750 (613) 213 2.663285 (2462) 2.550401 (1229) 3.251874 (2458) 1000 5.714684 (1275) 214 1.452398 (4928) 1.331642 (2462) 8.865814 (4923) 1000 2.627392 (3440) 215 8.639552 (9859) 7.261988 (4927) 2.636743 (9855) 1000 1.391382 (8071) 216 6.114381 (19723) 4.319776 (9859) 9.330090 (19718) 1000 8.982976 (17633) 217 6.125126 (39451) 3.057191 (19723) 4.673172 (39446) 1000 8.304716 (37060) 218 1.229349 (78908) 3.062563 (39451) 4.689629 (78902) 1000 1.546957 (76216) 219 9.904387 (157820) 6.146743 (78907) 1.889119 (157815) 1000 1.159938 (154828) 220 1.285772 (315647) 4.952193 (157820) 1.226210 (315641) 1000 1.403384 (312353)

Table 3.9: Absolute approximation error for the monic prime polynomial counting function (q,1,m) and π , (m), q = 2 under optimal truncation and with the maxi- L q N mum expansion order limited to N = 1000. 55

π = π 1000 π m q(m), q 2 ϵrel (q,1,m) ϵrel q,1(m) N ϵrel q,N 1000 (m) L ∗      ∗  21 3.000000 3.333333 ( 1) 3.333333 ( 1) 1 3.333333 ( 1) − − − 22 8.000000 3.333333 ( 1) 0 2 2.500000 ( 1) − − 23 7.100000 (1) 9.885983 ( 2) 9.859155 ( 2) 5 9.573063 ( 2) − − − 24 8.800000 (3) 5.204089 ( 3) 6.909091 ( 2) 11 5.224911 ( 3) − − − 25 2.777378 (8) 1.622367 ( 5) 3.349325 ( 2) 22 1.622396 ( 5) − − − 26 5.859217 (17) 2.371334 ( 10) 1.614706 ( 2) 44 2.371334 ( 10) − − − 27 5.359458 (36) 5.466241 ( 20) 7.938575 ( 3) 88 5.466241 ( 20) − − − 28 9.082015 (74) 2.950589 ( 39) 3.937256 ( 3) 177 2.950589 ( 39) − − − 29 5.247715 (151) 8.653305 ( 78) 1.960815 ( 3) 354 8.653305 ( 78) − − − 210 3.514558 (305) 7.465682 ( 155) 9.784773 ( 4) 709 7.465682 ( 155) − − − 211 3.157501 (613) 5.565411 ( 309) 4.887590 ( 4) 1000 5.565411 ( 309) − − − 212 5.100801 (1229) 3.095103 ( 617) 2.442600 ( 4) 1000 3.095103 ( 617) − − − 213 2.663285 (2462) 9.576147 ( 1234) 1.221001 ( 4) 1000 2.145728 ( 1187) − − − 214 1.452398 (4928) 9.168579 ( 2467) 6.104261 ( 5) 1000 1.809003 ( 1488) − − − 215 8.639552 (9859) 8.405514 ( 4933) 3.051944 ( 5) 1000 1.610479 ( 1789) − − − 216 6.114381 (19723) 7.064944 ( 9865) 1.525925 ( 5) 1000 1.469155 ( 2090) − − − 217 6.125126 (39451) 4.991229 ( 19729) 7.629511 ( 6) 1000 1.355844 ( 2391) − − − 218 1.229349 (78908) 2.491208 ( 39457) 3.814726 ( 6) 1000 1.258355 ( 2692) − − − 219 9.904387 (157820) 6.206082 ( 78914) 1.907356 ( 6) 1000 1.171136 ( 2993) − − − 220 1.285772 (315647) 3.851534 ( 157827) 9.536761 ( 7) 1000 1.091472 ( 3294) − − −

Table 3.10: Relative approximation error for the monic prime polynomial counting function estimates (q,1,m) and π , (m), q = 2, under optimal truncation and with L q N the maximum expansion order limited to N = 1000. 56

3.5 Summary

In this chapter an asymptotic approximation formula was derived for Eulerian poly- nomials and the polylogarithm of negative integer orders with an arbitrary small error. A accurate approximation of the Lerch transcendent Φ(z,s,a) was presented that when optimally truncated provides an exponentially accurate Poincaré type ex- pansion formula for |z| > 1. Finally, the asymptotic expansion of the Lerch transcen- dent was used to compute the truncated polylogarithm function (z,s,m) which in L turn provides an approximation of the prime polynomial counting function that is analogous to the well known asymptotic expansion of the prime number theorem. This approximation allows for e cient computation and provides signicantly bet- ter accuracy than the results presented in [2], [3], and [4]. The expansion formulas for the Eulerian polynomials and truncated polylogarithm function are general and nd applications in many areas other than the enumeration of prime polynomials. The accuracy of the expressions presented were veried using extensive numerical evaluation. The next chapter introduces T-codes, the T-transform algorithm, and the related string complexity measure T-complexity. An asymptotically tight bound on maxi- mal T-complexity is given and connected to the prime polynomial counting func- tion derived in Section 3.3. 57

Chapter 4

The T-Transform and T-Complexity

This chapter explores how to measure complexity of character sequences. Section 4.1 begins by illustrating the meaning of the term complexity in the contexts rel- evant for this dissertation. Before we introduce the T-transform algorithm in Sec- tion 4.4 we briey introduce some notations along with the necessary background on T-codes and their construction in Sections 4.2 and 4.3, respectively. In order to stay consistent with prior literature, the mathematical and set notation used in this dissertation builds upon the works of Titchener, Speidel1, Yang, and Eimann [9, 50, 51, 52, 53, 54]. After introducing the string complexity measure T-complexity in Section 4.5 we give a brief overview of available T-transform implementations in Section 4.4.2.

4.1 Background and Related Work

Complexity is a term used quite frequently in the eld of computer science, and its meaning is largely dependent on the context in which it is used. In this disserta- tion we distinguish between the notions of computational complexity in time and space, algorithmic complexity, and deterministic complexity which we will discuss individually in the subsequent sections.

4.1.1 Computational Complexity

The computational complexity of an algorithm measures how e ciently the algo- rithm uses the available computing resources. In particular, we are interested in

1nee Günther 58

analyzing how much overall time and memory space is required by an algorithm to perform a task. Time and space complexity are usually a function of the length of the input data n. In the following let f (n) be an exact time or memory require- ment obtained for an algorithm implementation running on a particular computing hardware. We then say the algorithm has time (or space) complexity of the order + g if there exist two positive constants c1 and c2 such that f (n) 6 c1 g(n) c2 for all allowed values of n. We write O(g) and also refer to it as the “big O notation” char- acterizing the asymptotic time or space complexity. Essentially, this allows us to compare algorithms in terms of their relative overall performance irrespective of the particularities of the underlying hardware [55, 56].

4.1.2 Algorithmic Complexity

Algorithmic complexity has its roots in information theory and was pioneered by Andrei Nikolaevich Kolmogorov who proposed it as a measure of the information content of the individual string [10]. He dened the algorithmic complexity K(x) of a string x as the size of the smallest possible algorithm which can execute on the universal Turing machine and is able to reproduce just that string [57, 58]. For this reason algorithmic complexity is also often referred to as Kolmogorov complexity; however, Solomono [59] and Chaitin [60] have to be credited for the notion of algo- rithmic complexity as well, as both independently published similar works to that of Kolmogorov that arrived at essentially the same conclusions [61]. Determining the shortest program K(x) is a known uncomputable problem [61, 58], and in terms of computational complexity the shortest program is by no means required to have the shortest space and/or time complexities. We cannot evaluate Kolmogorov com- plexity due to the uncomputability of the halting problem. However, in the next section we introduce deterministic complexity, a complexity measure which may be viewed as a computable but less powerful cousin of Kolmogorov complexity.

4.1.3 Deterministic Complexity and Randomness

A deterministic complexity measure tries to measure the randomness of a sequence of characters (or string) by using a deterministic nite automaton (DFA). For the purpose of this dissertation we say that a string is random if it does not possess any structure that would allow for compression. This denition is dierent from saying 59

that a string was obtained from a source producing random strings as any nite sin- gle instance obtained from such a source might indeed possess structure. However, the probability for such a string to have structure is very low for a true random source. We dene deterministic complexity as the algorithmic eort required by a string parsing algorithm to transform a string into a set of unique patterns [9]. The eort may be measured either as a function of the total number of parsing steps required by a DFA or as a function of the compressed string length in bits under a pattern encoding scheme. There have been numerous approaches of trying to use deterministic complexity measures for data mining purposes, most notable the works by Vitányi, Li, Cilibrasi et al. [14, 62, 63, 64] in which the authors propose a Kolmogorov complexity based similarity distance, the Normalized Information Distance (NID), and its determin- istic version referred to as the Normalized Compression Distance (NCD) which will be discussed in more detail in Chapter 6.

Lempel-Ziv Complexity

Lempel and Ziv (LZ) were among the rst to assess the complexity of nite strings in terms of the number of “self-delimiting production” steps needed to reproduce a string x from a set of distinct string patterns [12]. Lempel and Ziv published a series of papers on interrelated string factoring algorithms which are listed in Table 4.1. A detailed explanation of the family of LZ algorithms is givenin[65]. In their rst paper [12], published in 1976, Lempel and Ziv introduce a string production algorithm, herein referred to as LZ76, with time complexity of O(n2). In this early paper the authors seemed to have focused their attention primarily on the derivation of a deterministic complexity measure rather than its e cient imple- mentation. Essentially, LZ76 decomposes a string x in an exhaustive production history that allows the back referencing to any string position before the current parsing position. The number of LZ76 production steps needed to reproduce the string x is then said to be the LZ-complexity of x. The two subsequent papers of Lempel and Ziv were less concerned about de- veloping a deterministic string complexity measure but focused on the design of general purpose lossless data compression algorithms here referred to as LZ77 and LZ78. Both algorithms were not intended to serve as string complexity measures per se, however, the count of parsing steps needed to compress a string via either 60

algorithm may be used as an upper bound on the LZ-complexity of a string. More- over, the compressed string length in bits may also be used as a deterministic com- plexity measure. The LZ77 algorithm [66], published in 1977, addressed the runtime performance issues of LZ76 parsing by introducing a xed size, O(m), sliding window that re- stricts back referencing to any position within the window, resulting in an O(m n) × runtime complexity [66]. LZ77 quickly became popular and was used in numerous, often commercial, compression schemes [65]. LZ77 owes its popularity mainly to it’s simplicity, speed, and constant memory requirements. Finally LZ78 was published in 1978 [67]. The LZ78 algorithm uses an unre- stricted size dictionary as part of its parsing algorithm which stores previously en- countered string patterns. When the parsing begins, the dictionary is empty. From the current parsing position symbols are read and concatenated into a new pattern. Reading continues until there is no match for the current pattern in the dictionary, and a new dictionary entry is formed with its last symbol being new. In contrast to LZ76 which allows back referencing to any position in the string before the current parsing position, the dictionary method used in LZ78 restricts the patterns in the dictionary to previously encountered parsing osets in the string.

Complexity Algorithm Description time space

LZ76 [12] Straightforward implementation of the O(n2) O(n) Lempel-Ziv factorization algorithm. LZ77 [66] Similar to LZ76 but using a xed sized O(n m) O(m) × sliding window. LZ78 [67] Factorization algorithm with unrestricted O(n) O(n) dictionary size e ciently implemented using a trie.

Table 4.1: Computational complexity of LZ string factorization algorithms.

The LZ76 string parsing algorithm may be implemented in lineartime andspace using a su x tree. However, the construction of su x trees in linear time and space is a non-trivial undertaking [68]. Most critical to the LZ76 algorithm is an e cient implementation of back referencing to string patterns. Rodeh et al. [69] suggest a 61

linear time and space LZ76 implementation based on McCreight’s linear time su x tree construction algorithm provided in [70]. McCreight provides two approaches to linear time su x tree construction. The most space e cient one possesses an alphabet size, q, dependent runtime, O(q n), which makes the algorithm less prac- × tical for large alphabets. Realizing this McCreight suggests a modied algorithm making use of hashing to achieve a linear, alphabet independentruntime [70]. How- ever, hashing based approaches come at the expense of additional memory usage and yield good performance only if hash collisions can be avoided which becomes hard for large inputs.

T-Complexity

T-complexity is a deterministic string complexity measure that is, just like LZ76, the result of a string factorization algorithm. In this dissertation this string factor- ization algorithm is referred to as T-transform [11]. However, in previous litera- ture the term T-decomposition [71] was also used to described the T-transform and both terms are used interchangeably. The T-transform generates a set of coe cients (copy factors) by recursively ltering the information in a string with a set of string patterns also referred to as copy patterns. Conceptually, the computation of T-complexity is similar to the computation of LZ-complexity in that both measures evaluate the eort to factor a string into a less redundant meta-representation of basic string patterns. However, the parsing mechanisms used in LZ76 and the T-transform are quite dierent from one another. For example the T-transform’s notion of copy factors does not exist for the family of LZ factorization algorithms. Moreover, the T-transform is an o-line algorithm parsing strings from back to front. This means that the T-transform algorithm can only operate on nite strings. In contrast, the LZ-complexity of a string may be computed via a single pass implementation based on Ukkonen’s on-line su x tree construction algorithm [72, 73]. In 2010 an on-line, su x tree based, forward pars- ing algorithm was developed by Hamano and Yamamoto for the computation of a T-complexity measure [74]. In [11] the currently most e cient T-transform algorithm to compute the T-com- plexity of a string in linear time and space is presented. T-complexity is an e cient alternative to the LZ-complexity measure. An open-source linear time and space implementation of the T-transform is available from [75]. Because of its compara- 62

tively lower runtime and memory requirements T-complexity poses a viable alter- native to LZ-complexity for target domains that handle large-scale data sets. T-complexity, and the T-transform algorithm for that matter, have their origin in coding theory, more precisely, they are a by-product of the construction process of T-codes. T-codes were proposed by Mark Titchener in 1984 as prex-free variable length codes [76]. Three decades after Titchener’s initial publication, T-codes have found various information theoretic applications ranging from string complexity measures [77, 71], randomness tests [74], similarity distances [78, 53], data com- pression [79], computer security applications [80, 54, 81], to the analysis of time series data [82].

4.2 Notation

Generally, in this dissertation, the underlying assumption is that both string length and alphabet size are nite as otherwise the calculation of information measures on a random-access machine is not practical due to real world time and memory con- straints. Expressed in set theory terminology one may see the alphabet as the set of characters from which strings are generated through concatenation by an informa- = tion source. Traditionally, the alphabet set is denoted as S {a1,a2,a3,...,aq 1,aq} − and the individual characters, which are also referred to as symbols, are denoted by ai where 1 6 i 6 q.

The set of all nite strings over S is indicated by S∗ and contains all possible concatenations of symbols from the set S. Moreover, the alphabet itself is a subset of S∗ as S is the subset of all strings with size one.

Let x denote a string contained in S∗ then the denition of x is given as x = x1x2x3 ...xn where xi are symbols in S. The length of individual strings is denoted by the operator | |, such that |x| = n. In analogy to the notion of the empty set, we · dene the empty string e as a string with no symbols. The set of non-empty strings + + is then dened as as S = S {e}. Furthermore, if x,y S are two distinct strings ∗\ ∈ with their ordered concatenation denoted as xy, then the length of xy is given by |xy| = |yx| = |x| + |y|. Note however, that string concatenation is not commutative, that is, xy , yx. The shorthand notation to indicate the concatenation of k copies of the same string x is given by xk , where k may assume any non-negative integer value including the special case k = 0 dened as the empty string x0 = e. Finally, we denote the ith character in the string x by xi . 63

4.3 T-Augmentation

T-augmentation is the principal set operation used in the construction of T-codes. The T-augmentation operation is not limited to T-code sets but is applicable to any set of code words. T-codes belong to the category of prex-free codes. Since the most primitive prex-free code is an alphabet, we dene the most primitive T-code as the alphabet itself, i.e. the most primitive binary T-code is S = {0,1}. The T-augmentation procedure is subject to two parameters, p and k denoted as copy pattern and copy factor respectively. More specically, we identify any one code word from the T-code S as a copy pattern p and thereafter, by incrementally

concatenating up to k copies of p the set of T-augmentation prexes Tk (p) is formed. 0 Tk (p) includes the empty string e here indicated by p . Thus, the cardinality of Tk (p) is k + 1, resulting in the following T-augmentation prexes

k = i = 0 1 k Tk (p) [p {p ,p ,...,p }, (4.1) i=0 where k N. Ascendingly ranking the element by length, every member of T (p) ∈ k is a prex to every other member of higher rank. T-augmentation in which the copy factor is not restricted from above, i.e. k > 1, is referred to as generalized T- augmentation and similarly, a T-code produced this way is referred to as a generalized T-code. Note, that unless stated otherwise, we assume that the terms T-code and T- augmentation refer to their generalized versions. We dene the T-augmentation function which transforms the T-code S into the (k) T-augmented T-code S(p) as follows

k S(k) = piS T (p) . (4.2) (p) [ \ k i=0

In (4.2) we prex each of the elements in S one-by-one with all the elements in Tk(p); ( ) (k) thereafter, we remove the elements Tk p resulting in the prex-free T-code S(p) . We may construct a T-code of arbitrary size and code words, by rst T-augmenting an alphabet S and subsequently iteratively T-augmenting the resulting codes up to any desired level. We write this iteration process as

(k ) (k ) j (k ,k ,...,k ) ( ) 2 S 1 2 j = S k1 , (4.3) (p1,p2,...,pj ) (p1)  ··· " (p2)# ···   (pj )     64

(a) level j = 0:

S = {0,1} 0 1

(b) level j = 1:

( ) S 3 = { 0, 1✁, (1) 0 10, 1✁1✁, 10 , ✁✁✁, 110 111 110 , 1110 1111 } 1110 1111

(c) level j = 2:

( , ) 3 1 = { ✁, , S(1,0) 0 00 10, 010, 10 110, 0110, 110 1110, 01110, 00 1110 1111 1111, 01111 } 010 0110 01110 01111

(3,1) Figure 4.1: Example of the construction of the binary T-code S(1,0) in two T- augmentation steps with intermediate levels j {0,1,2} . ∈

(k ,k ,...,k ) and say that S 1 2 j denotes a T-code at T-augmentation level j. We close this sec- (p1,p2,...,pj ) tion by providing a short binary example demonstrating the construction of a sim- ple T-code in two T-augmentation steps.

Example 4.3.1 (Construction of a binary T-code). In the example presented here, con- (3,1) = { , } sider the T-code S(1,0) constructed from the binary alphabet S 0 1 . Figure 4.1 shows the construction process of the T-code which is illustrated using set notation and T-code trees at the individual T-augmentation levels j {0,1,2} . ∈ S S(3,1) The base alphabet from which (1,0) is constructed can be interpreted as a T-code at T- augmentation level zero. The binary T-code tree for S is depicted in Figure 4.1 (a), with the elements of S forming the leaf nodes of the tree. At T-augmentation level one the intermedi- (3) ate T-code set S(1) and tree are given in Figure 4.1 (b). Note that, in set notation, the removal of the T-augmentation prexes T3(1) is illustrated by slashed out elements. The second and 65

(3,1) last T-augmentation step yields the nal T-code S(1,0). Its code words and T-code tree are shown in Figure 4.1 (c).

4.4 The T-Transform

In this section we develop the prototype of the T-transform algorithm which allows us to construct a T-code through the decomposition of an arbitrary string. From the (k1,...,kj ) (k1,...,kj 1) previous example we observe that S S − ∗ ... S∗, that is to say, a T- (p1,...,pj ) (p1,...,pj 1) ⊂ − ⊂ ⊂ code at T-augmentation level j is a subset of all possible concatenations of the code words of the T-code at its previous T-augmentation level j 1 and so on. In general, (k ,...,k ) − the total number of the longest code words in S 1 j is equal to the cardinality of (p1,...,pj ) the underlying coding alphabet q. More specically, the set containing the longest code words is given by q = X [ xai , (4.4) i=1 with a S and the string x being the common prex to all of the longest code words. i ∈ In the following the set in (4.4) is also indicated by x$, with the terminal character $ assuming any character in S. For illustration consider once more Example 4.3.1. In the example x$ is given by,

2 = = = = x$ [ xai {01110,01111}, such that, x 0111 with S {0,1} . i=1

Given any single one of its longest code words, it is possible to reconstruct the T- code by decomposing x into a combination of copy patterns and copy factors such kj kj 1 k k $ = − 2 1 $ that x pj pj 1 p2 p1 . Nicolescu et al. proved in [83] that the mapping between − ··· a T-code set and the set of its longest code words always exists and that it is unique. In other words, a special property of any T-code is that its construction process can be uniquely deduced from any of its longest code words. The terminal character $ in the longest code words does not carry much signif- icance other than being required to establish prex-freeness. Thus, in subsequent chapters we will omit the terminal character altogether and assume it to be implic- itly added to x. 66

4.4.1 The Naïve T-Transform Algorithm

The T-transform can be implemented as a recursive string decomposition algorithm. A naïve, iterative pseudo-code version is given in Figure 4.2. It accepts the input string x representing the prex to any one of the longest code words x S and ∈ ∗ = = provides j-tuples for copy patterns p {p1,...,pj} and copy factors k {k1,...,kj} as output.

Generalized T-Transform Algorithm

input : x /* string x S∗ */ ∈ output: j /* number of t-augmentation steps */ p /* copy pattern j-tuple */ k /* copy factor j-tuple */ initialization: divide x$, $ S, into sequence single character tokens. ∈ i 0 ←−

while more than one token left do i i + 1 ←− p second-to-last token. i ←− k number of consecutive p to the left, counting p as the i ←− i i rst copy.

Scan tokens left-to-right and combine tokens into larger k ′ tokens of the form pi g such that:

(1 6 k′ 6 ki and g , pi ) or (k′ = ki and g = pi )

end j i ←−

Figure 4.2: Pseudo-code listing of naïve T-transform algorithm.

The T-transform algorithm begins by splitting the string x in a list of single character substring patterns herein referred to as tokens. This initial state of the T-transform algorithm is also referred to as the T-transform at level zero. Next, the

copy pattern of the rst T-transform level p1 is identied as the second-to-last token in the list. We then try to extend a chain of tokens identical to the copy pattern to- the-left. Counting the copy pattern as the rst element, we establish the copy factor = k1 associated with level i 1 as the total number of elements in this chain. Now, 67

moving to the beginning of the token list, we proceed by searching for tokens iden- tical to the copy pattern left-to-right. Once such a token is found we merge at most k1 consecutive copies of it with the immediately following token into a larger token k ′ 6 k ′ p1 g, k′ k1. These new composite tokens pi g are referred to as aggregate tokens and k ′ pi and g are called aggregate prex and aggregate su x respectively. We proceed with our search until we reach the end of the token list and repeat the overall process until at the end of a search no more tokens can be merged.

Figure 4.3: T-transform at intermediate T-augmentation level i.

For illustration, Figure 4.3 shows the T-transform of the string x at intermediate level i. The last element in the token list is $\tilde{x}_i\$$, and the prefix $\tilde{x}_i = p_i^{k_i} p_{i-1}^{k_{i-1}} \cdots p_1^{k_1}$ to $\$$ is referred to as the T-handle. With each T-transform level the length of the T-handle grows until $|\tilde{x}_i| = |x|$. The leftmost symbol in $\tilde{x}_i$ marks the boundary between the mutually exclusive sets of all leftover tokens and the set of already established copy patterns at any given T-transform level $\upsilon(\tilde{x}_i) = i$. The cardinalities of the tuples p and k grow with each additional decomposition level. The total number of levels, $\upsilon(x) = \max_i[\upsilon(\tilde{x}_i)]$, of the T-transform is equal to the number of T-augmentation steps needed to construct a T-code in which x is the prefix to all its longest code words. As we will see shortly, the total number of T-transform levels is related to how “T-complex” the information in x is. To aid a better understanding of the T-transform we illustrate the mechanics of the algorithm in Example 4.4.1 below.

Example 4.4.1 (T-transform on a binary string). In the following example let S = {0,1} and let the binary string processed by the T-transform algorithm be defined as x = 10110011011. The decomposition of x starts by initializing the iteration counter i which counts the number of steps required to decompose x. Next, a terminating character $ ∈ S is added to the string, which is subsequently divided into single-character tokens

(the token boundaries are indicated by vertical lines):

x$ = 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | $ .

The first decomposition step determines the first member of the copy pattern υ-tuple as the second-to-last token
$$p_1 = 1\,.$$

The first member of the copy factor υ-tuple is assigned to the total length of the sequence of consecutive $p_1$. We observe that the second-to-last token $p_1$ repeats once to the left. Thus, the total length of the sequence of consecutive $p_1$ is

$$k_1 = 2\,.$$

Having established the copy pattern and copy factor of the first decomposition step, we now scan the current string tokenization left-to-right. In each scan, as a general rule, if we encounter a token that is equal to the current copy pattern $p_i$ we join at most $k_i$ consecutive copies of it and merge these joined copies with the immediately following token into a new, larger token. For the first parsing step we encounter an instance of $p_1 = 1$ at the first position of the current tokenization state. Hence, we merge the first and second token into the aggregate token $p_1 g = 10$. As we continue to parse to the right, two more aggregate tokens are generated by merging a chain of copy patterns with the immediate token after. Eventually, we reach the two copies of the copy pattern preceding the terminating character $. Just as with previous matches of the copy pattern $p_1$ we join the two of them and merge them with the terminating character into the string $\tilde{x}_1\$ = p_1^2\,\$$. For this example the token boundaries after the first T-augmentation step are given by

x$ = 1 0 | 1 1 0 | 0 | 1 1 0 | 1 1 $ .

Since there is more than one token left the algorithm’s main loop goes into its second iteration with i = 2. Here the copy pattern is identified as $p_2 = 110$ with a copy factor of $k_2 = 1$.

The subsequent left-to-right scan merges the two instances of p2 with their immediately subsequent tokens resulting in the following token boundaries

x$ = 1 0 | 1100 | 11011$ .

The algorithm continues with a third iteration of the loop using p3 = 1100 and k3 = 1 and yields x$ = 1 0 | 110011011$ .

Finally, the fourth and last iteration eliminates the last token boundary using the copy pattern $p_4 = 10$ with $k_4 = 1$. At this point the T-handle $\tilde{x}_4$ is an identical representation of all the information contained in x. Since no more tokens are left the algorithm terminates. Thus, the T-code for which x$ is one of the longest code words required a total of υ(x) = 4 T-augmentation steps and is given by

$$S^{(2,1,1,1)}_{(1,110,1100,10)}\,.$$

There are a total of q code words $x\$$ in the T-code $S^{(2,1,1,1)}_{(1,110,1100,10)}$, differing only in their last symbol $. From any of these longest code words we may deduce the above T-code by application of the T-transform algorithm.

4.4.2 T-Transform Algorithm Evolution

The naïve T-transform algorithm presented in Figure 4.2 is not a very efficient way to factor a string into its copy pattern and copy factor representation. If we consider a string in which each character is unique, the naïve implementation takes $O(n^2)$ time, as in each left-to-right copy pattern search pass i we have to compare $n - i$ characters, yielding an overall runtime of $\sum_{i=1}^{\upsilon(x)} (n - i) \le n \times (n-1)/2 = O(n^2)$ [53]. Numerous efforts have been made to improve the runtime of the T-transform algorithm; a brief history of improvements is provided in Table 4.2. Realizing that character-by-character comparisons are the main bottleneck in any T-transform algorithm, Wackrow and Titchener were able to improve the naïve implementation in tcalc (1995) [84] by making note of the individual token length. In each pattern matching search pass they compare the token length with the copy pattern length first, and thus were able to bypass a significant amount of character-by-character comparisons. The next three versions of T-transform algorithms were developed by Speidel and Yang. Their 2003 implementation, tlist [85], creates linked lists for tokens of the same length. Thus, in each parsing pass only the elements in the list of the copy pattern's length have to be examined. However, as for the naïve implementation, the overall worst case runtime of tcalc and tlist is still $O(n^2)$ [53]. In 2005

Algorithm               Description                                      Time    Space
tcalc [84], tlist [85]  Character-by-character and length based
                        token comparisons.                               O(n²)   O(n)
thash [86]              Hash function based token comparisons
                        (average time complexity: O(n)).                 O(n²)   O(n)
ftd [87], ott [11]      Unique integer based, constant time
                        token comparisons.                               O(n)    O(n)

Table 4.2: Computational complexity of T-transform algorithms.

Speidel and Yang published thash [86], the first algorithm which could achieve an average runtime of O(n). The thash algorithm stores tokens according to a computed hash value in doubly linked lists. The performance of the thash algorithm largely depends on the chosen hash function. Unfortunately, for the worst case of a hash function that assigns all tokens the same hash value the overall runtime is bounded by O(n²). Yang and Speidel addressed the shortcomings of thash in their 2005 paper [87] which introduced the Fast T-Decomposition (ftd) algorithm, the first true O(n) time and space T-transform implementation on the random access machine. Similar to previous T-transform implementations, the ftd algorithm starts out with the input string split into single-character tokens. The algorithm stores tokens in a doubly linked token list. Each token, represented by a unique integer identifier, is also part of a doubly linked match list which links all tokens with the same integer identifier. Match lists allow the algorithm to skip ahead and eliminate the complete left-to-right search pass of the naïve T-transform algorithm. The ftd algorithm stores list headers to access all match lists in a preallocated array. For a string of length n, the array does not need to be larger than (n - 1) + q cells. This is because we require at least q integer identifiers to represent the alphabet, and over the course of all subsequent T-transform levels we may not generate more than n - 1 aggregate tokens [53].

The author of this dissertation proposed an improvement to the ftd algorithm, the Fast Low Memory T-Transform (ott), in [88]. The ott algorithm uses approximately one third of the memory of ftd while reducing the linear running time by about 20%. The implementations of ott and its predecessor are similar in that both algorithms assign unique integer identifiers to tokens of the same kind. However, ott provides a much more memory-conservative overall implementation by eliminating the need for match list headers. It achieves this by recognizing that memory that is freed due to the creation of aggregate tokens can be reused to store the information for which ftd used the separate match list header array. For an in-depth discussion of the mechanics of the ftd and ott algorithms the reader is referred to [87, 88, 11].

4.5 T-Complexity

As we saw in the previous section, the T-transform algorithm allows for the construction of a T-code from any of its longest code words $x\$ \in S^{+}$. Titchener observed in [89] that the algorithmic effort required to build such a T-code from the string $x \in S^{*}$ could be used as a deterministic measure of how complex the information contained in that particular string is. We define the real-valued T-complexity function as the log-weighted sum of copy factors given by

$$\mathcal{C}_T(x) = \sum_{i=1}^{\upsilon(x)} \log_2(k_i + 1)\,. \qquad (4.5)$$
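In terms of the naïve sketch from Section 4.4.1, (4.5) is a one-line computation over the copy factors; for the string of Example 4.4.1 it yields $\log_2 3 + 3\log_2 2 \approx 4.585$ (function names are again ours).

from math import log2

def t_complexity(x):
    """T-complexity (4.5): log-weighted sum of the copy factors of x."""
    _, factors = t_transform(x)            # t_transform as sketched in Section 4.4.1
    return sum(log2(k + 1) for k in factors)

print(t_complexity("10110011011"))         # 4.584962500721156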

Visually, the T-complexity of x may be looked at as a measure of how densely populated the T-code decoding tree for x is. In this decoding tree the set $x\$$, $\$ \in S$, is the set of all paths to the T-code’s longest code words. A thorough discussion of T-code decoding trees and details on the derivation of T-complexity as given in (4.5) are provided in [51].

4.5.1 Bounding T-Complexity

This section is concerned with finding an approximate bound on the T-complexity profile of maximal T-complex sequences. In particular, we show that this bound is related to the prime polynomial counting function (3.7) introduced in Chapter 3 and is the result of an interesting connection between periodic and aperiodic necklaces and T-prefixes generated during T-augmentation. This connection is a fairly

recent discovery [13] and has previously been employed to derive an asymptotic bound on T-complexity in [90]. In this chapter we extend these results and present an improved version of this bound. However, before doing so we briefly state the necessary notation and a summary of previous results on necklaces and T-codes. Using the notation from [90], consider a simple T-code in which each copy factor is equal to 1. We say that such a T-code is generated by simple T-augmentation and

denote it by $S_{(p_1,p_2,\ldots,p_i)}$. Further, let $S_{(p_1,p_2,\ldots,p_i)}$ be T-augmented such that at each T-augmentation level i the T-code does not contain code words shorter than $|p_i|$ (strictly minimal T-augmentation). Then the resulting T-code and T-augmentation sequence is called systematic. Drawing on the conclusions of [13] and [90] we state the following results without proofs. The reader is referred to the aforementioned papers for a thorough discussion that includes the relevant proofs.

• T-augmentation generates copy patterns which are necklaces, the majority of which are aperiodic and a small portion of which are periodic.

• No pair of distinct code words in a T-code belong to the same cyclic equivalence class.

• In any T-augmentation sequence, code words that do not belong to the same cyclic equivalence class as one of the existing copy patterns can become copy patterns in future T-augmentation steps.

• The T-complexity of simple and systematic T-codes is integer valued.

• Systematic T-codes achieve the highest T-complexity with respect to T-handle length.

As a consequence of the above results we have that, for a string x of finite length under systematic T-augmentation, the copy patterns are exhausted by increasing length up to some maximal length $m = |p_i|$. Then $W_n(x)$, denoting the number of generated copy patterns of length n, $1 \le n \le m$, is bounded from above and below as follows

$$L_q(n) \;\le\; W_n(x) \;\le\; N_q(n) \qquad (4.6)$$

$$\frac{1}{n}\sum_{d \mid n} \mu(d)\, q^{n/d} \;\le\; W_n(x) \;\le\; \frac{1}{n}\sum_{d \mid n} \varphi(d)\, q^{n/d} \qquad (4.7)$$

such that

$$\underline{\mathcal{C}}_{T_{\max}}(x) = \sum_{n=1}^{m} L_q(n) \;\le\; \mathcal{C}_{T_{\max}}(x) \;\le\; \sum_{n=1}^{m} N_q(n) = \overline{\mathcal{C}}_{T_{\max}}(x)\,. \qquad (4.8)$$
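As a sanity check, the bounds in (4.7) and (4.8) are straightforward to evaluate; the short Python sketch below (helper names are ours) reproduces the m = 2, q = 2 entries of Tables 4.3 and 4.4, namely a lower bound of 3 and an upper bound of 5.

from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function mu(n) via trial division."""
    prime_factors = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0                       # squared prime factor
            prime_factors += 1
        else:
            p += 1
    if n > 1:
        prime_factors += 1
    return -1 if prime_factors % 2 else 1

def totient(n):
    """Euler's totient phi(n)."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def L(q, n):                                   # aperiodic necklaces of length n, cf. (4.7)
    return sum(mobius(d) * q**(n // d) for d in divisors(n)) // n

def N(q, n):                                   # all necklaces of length n, cf. (4.7)
    return sum(totient(d) * q**(n // d) for d in divisors(n)) // n

q, m = 2, 2
print(sum(L(q, n) for n in range(1, m + 1)),   # 3  (lower bound in (4.8))
      sum(N(q, n) for n in range(1, m + 1)))   # 5  (upper bound in (4.8))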

Proposition 4.5.1. Let $\mathcal{C}_{T_{\max}}(x)$ denote the maximal T-complexity a string $x \in S^{+}$ of length $|x|$ may attain. Then as $q \to \infty$ and $|x| \to \infty$ an asymptotically tight bound on maximal T-complexity is given by

$$\mathcal{C}_{T_{\max}}(x) \approx \frac{|x|}{\log_q |x|} \sum_{n=0}^{N-1} \frac{A_n(q)}{(q-1)^n (\log_q |x|)^n}\,, \qquad (4.9)$$

with the series expansion in (4.9) being of Poincaré type obeying the optimal truncation rule as provided in Definition 3.2.2.

Proof. Since for systematic T-augmentation the fraction of periodic necklaces becomes negligible as $m \to \infty$ [13], we have that the maximal T-complexity of x is well approximated by

$$\mathcal{C}_{T_{\max}}(x) = \sum_{n=1}^{m} W_n(x) \approx \sum_{n=1}^{m} L_q(n) \sim \sum_{n=1}^{m} \frac{q^n}{n} = \hat{\mathcal{C}}_{T_{\max}}(x)\,. \qquad (4.10)$$

In order to obtain an approximate expression for maximal T-complexity that solely depends on the string length $|x|$, we note that

$$|x| = \sum_{n=1}^{m} n \cdot \frac{q^n}{n} = \sum_{n=1}^{m} q^n = \frac{q^{m+1} - 1}{q - 1} \qquad (4.11)$$
$$\sim \frac{q^{m+1}}{q - 1} \quad \text{as } m \to \infty\,. \qquad (4.12)$$
Solving (4.11) for m we have

$$m = \log_q\big((q-1)|x| + 1\big) - 1 \sim \log_q\big((q-1)|x|\big) - 1 \quad \text{as } |x| \to \infty\,. \qquad (4.13)$$

From (3.65) and (4.10) we find

$$\hat{\mathcal{C}}_{T_{\max}}(x) = \mathcal{L}(q,1,m) = -q^{m+1}\,\Phi(q,1,m+1) + O(1)\,. \qquad (4.14)$$

Using Theorem 3.2.7 in combination with the asymptotic expression for m in (4.13), we obtain the Nth order approximation formula

$$\hat{\mathcal{C}}_{T_{\max},N}(x) \sim \frac{|x|}{\log_q\big((q-1)|x|\big)} \sum_{n=0}^{N-1} \frac{(q-1)^{-n} A_n(q)}{\big(\log_q\big((q-1)|x|\big)\big)^n} \qquad (|x| \to \infty)\,. \qquad (4.15)$$
Further, for large q and $X = |x|$, we deduce from Theorem 3.3.1 that

$$\hat{\mathcal{C}}_{T_{\max},N}(x) \sim \frac{q-1}{q}\,\pi_{q,N}(X) \sim \frac{|x|}{\log_q |x|} \sum_{n=0}^{N-1} \frac{A_n(q)}{(q-1)^n (\log_q |x|)^n}$$
$$(x \in S^{+},\ |x| \to \infty;\ q \in \mathbb{N}^{+},\ q \ge 2;\ N \in \mathbb{N}^{+})\,,$$
and Proposition 4.5.1 follows. □
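Numerically, (4.9) only requires the Eulerian polynomials $A_n(q)$, which can be generated from the standard Eulerian-number recurrence. The sketch below uses our own helper names and a fixed truncation order N instead of the optimal truncation of Definition 3.2.2; for the m = 100 row of Table 4.3 it closely reproduces the tabulated asymptotic value.

from math import log

def eulerian_poly(n, z):
    """Evaluate the n-th Eulerian polynomial A_n(z) via the triangle recurrence."""
    row = [1.0]                                    # coefficients of A_0(z)
    for r in range(1, n + 1):
        row = [(k + 1) * (row[k] if k < len(row) else 0.0)
               + (r - k) * (row[k - 1] if k >= 1 else 0.0)
               for k in range(r)]
    return sum(c * z**k for k, c in enumerate(row))

def t_complexity_bound(length, q=2, N=4):
    """Asymptotic bound (4.9) on maximal T-complexity, truncated after N terms."""
    lg = log(length) / log(q)                      # log_q |x|
    return (length / lg) * sum(
        eulerian_poly(n, q) / ((q - 1)**n * lg**n) for n in range(N))

print(t_complexity_bound(2.535301e30))             # ~2.5358e28, cf. Table 4.3 (m = 100)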

4.6 Computational Results

In this section we present empirical results evaluating the quality of the asymptotic

maximal T-complexity bound as given in Proposition 4.5.1. In all tables $\epsilon_{rel}$ indicates the relative approximation error, and for all computational results q = 2 was used. Round parentheses that follow numeric table entries indicate the power of 10 multiplying the entry. For systematic T-augmentation, the asymptotic bound on maximal T-complexity in (4.9) may be exceeded. This stems from the fact that (4.9) is based on the lower bound on maximal T-complexity, and periodic necklaces may be created which slightly increases the number of available code words of a given length. Empirical results show that (4.9) estimates the actual upper and lower bounds in (4.8) from below (see Figure 4.4). From Tables 4.3 and 4.4 we see that the gap between the actual upper and lower maximal T-complexity bounds is narrow and that for $m \to \infty$ (4.9) provides an asymptotically tight analytic bound on maximal T-complexity, regardless of the T-augmentation scheme used.


Figure 4.4: Comparison of upper, lower, and asymptotic bound on maximal T-complexity.

 m     |x|              C_Tmax(x)        Ĉ_Tmax,N*(x)     ε_rel(Ĉ_Tmax,N*(x))
 2     4.000000         3.000000         2.000000         3.333333 (-1)
 5     5.200000 (1)     1.400000 (1)     1.156452 (1)     1.739630 (-1)
 10    1.966000 (3)     2.260000 (2)     2.044517 (2)     9.534633 (-2)
 15    6.520600 (4)     4.720000 (3)     4.402923 (3)     6.717724 (-2)
 25    6.710012 (7)     2.807196 (6)     2.694028 (6)     4.031359 (-2)
 50    2.251800 (15)    4.599608 (13)    4.507451 (13)    2.003585 (-2)
 100   2.535301 (30)    2.561450 (28)    2.535825 (28)    1.000422 (-2)
 200   3.213876 (60)    1.615096 (58)    1.607020 (58)    5.000513 (-3)
 300   4.074072 (90)    1.362597 (88)    1.358054 (88)    3.333484 (-3)
 400   5.164500 (120)   1.294377 (118)   1.291141 (118)   2.500063 (-3)
 500   6.546781 (150)   1.311991 (148)   1.309367 (148)   2.000032 (-3)

Table 4.3: Comparison of lower and asymptotic bound on maximal T-complexity, where $|x| = \sum_{n=1}^{m} n\, L_q(n)$ and q = 2.

 m     |x|              C_Tmax(x)        Ĉ_Tmax,N*(x)     ε_rel(Ĉ_Tmax,N*(x))
 2     8.000000         5.000000         3.555556         2.888889 (-1)
 5     8.400000 (1)     2.300000 (1)     1.681528 (1)     2.689009 (-1)
 10    2.216000 (3)     2.610000 (2)     2.261771 (2)     1.334210 (-1)
 15    6.615200 (4)     4.807000 (3)     4.460423 (3)     7.209843 (-2)
 25    6.711922 (7)     2.808107 (6)     2.694750 (6)     4.036762 (-2)
 50    2.251800 (15)    4.599608 (13)    4.507451 (13)    2.003585 (-2)
 100   2.535301 (30)    2.561450 (28)    2.535825 (28)    1.000422 (-2)
 200   3.213876 (60)    1.615096 (58)    1.607020 (58)    5.000513 (-3)
 300   4.074072 (90)    1.362597 (88)    1.358054 (88)    3.333484 (-3)
 400   5.164500 (120)   1.294377 (118)   1.291141 (118)   2.500063 (-3)
 500   6.546781 (150)   1.311991 (148)   1.309367 (148)   2.000032 (-3)

Table 4.4: Comparison of upper and asymptotic bound on maximal T-complexity, where $|x| = \sum_{n=1}^{m} n\, N_q(n)$ and q = 2.

4.7 Summary

This chapter detailed the meaning of the term complexity in different contexts. The concept of deterministic complexity was introduced for the estimation of the complexity of strings. Furthermore, the construction of T-codes via T-augmentation of code word sets was discussed along with the necklace factorization algorithm T-transform. A naïve T-transform algorithm was presented that has a runtime of O(n²). However, a linear time and space implementation is available from [11]. The T-transform lends itself to the computation of the deterministic complexity measure T-complexity, for which expressions for upper and lower bounds in terms of the Möbius and Euler totient functions were given. An asymptotically tight analytic bound on the maximal T-complexity value for strings of a given length was derived and connected to the prime polynomial counting function. The next chapter is concerned with the T-complexity profile of uniformly distributed random sequences and provides statistical models for short necklace factors generated by the T-transform algorithm.

Chapter 5

The T-Complexity of Uniformly Distributed Random Sequences

In this chapter we turn our attention to the generalized T-complexity profile of uniform random sequences (i.e. each symbol in the sequence occurs with equal probability). While pseudo-random number generators are an acceptable source for a large number of applications, they are a source that remains fundamentally non-random, as given a description of the generator algorithm and initial conditions the output of the generator is predictable. Therefore, all random sequences used in this chapter were obtained from the quantum random number generator service of the Department of Physics of the Humboldt University, Berlin [91], and are based on quantum randomness of photon arrival times [92].

5.1 Related Work

As seen in Chapter 4, the T-transform factors a string into necklace factors. However, the T-transform is not the only known way to factor a string into a unique representation of necklaces. Similarly, the Lyndon factorization [93] decomposes a string uniquely into Lyndon words. The Lyndon factorization is however very different from the T-transform. In particular, Lyndon factorization factors a string over the multi-set of aperiodic necklaces [94], meaning that its factors may occur more than once in the decomposition. The T-transform is more restrictive in that it forbids the use of a necklace from the same equivalence class more than once. On the other hand it is less restrictive in that it does not require the necklace to be a Lyndon word

(i.e. a factor of the T-transform is a necklace but not necessarily aperiodic). Put another way, the factors of the T-transform are unique and may either be a Lyndon word, a cyclic shift thereof, or (even though not likely) a periodic necklace. While the T-transform was only recently identified as a necklace factorization algorithm, the Lyndon factorization and its probabilistic behaviour for random sequences is well studied (see for example [93, 95, 96, 97, 98]). In particular, it was shown that the Lyndon factorization of random sequences is isomorphic to the cycles of random permutations [93, 95]. Further, the probability of occurrence of small Lyndon words (small cycles) can be modelled as a Poisson process [93] while large factors can be modelled via the Dickman function [96]. In what follows we will see that the T-transform shows similar behaviour in that large and small factors seem to obey distinct statistical models.

5.2 Conjectures on the Statistics of the T-Transform

Intuitively, we expect random sequences to have high algorithmic and deterministic complexity since, as sequences of maximum Shannon entropy, they resist compression. Hence, we are naturally inclined to assume that random sequences score quite high in terms of their T-complexity. The question this section tries to answer is whether the T-complexity profiles of random sequences of the same length are consistent, and if so, can they be statistically modelled? Three decades of research have gone into answering this and related questions. While some progress has been made (see for example [99]), much of the underlying statistics of generalized T-augmentation are still unknown. In what follows we present empirical results and statistical models that shed light on the T-complexity profile of random sequences. When factoring a random sequence using generalized T-augmentation it is a reasonable assumption that the expected value for copy factors converges to 1 [13]. Hence, we do not expect to exceed the T-complexity bound for systematic T-augmentation derived in Section 4.5.1. In particular, as shown in Figure 5.1, the generalized T-complexity profile of random sequences is lower than the asymptotic T-complexity bound in (4.9). Specifically, since the copy factors for a uniform random string x converge asymptotically to 1, we assert that the T-complexity of x is given by

$$\hat{\mathcal{C}}_T(x) \approx \upsilon(x) = \max_i\big[\upsilon(\tilde{x}_i)\big]\,. \qquad (5.1)$$

[Plot: T-complexity of the random sequence x versus |x|, lying between the maximal bound C_Tmax(x) and the minimal bound C_Tmin(x) = log_2(|x| + 1).]

Figure 5.1: T-complexity of random sequence x versus minimal and maximal T- complexity bounds.

We observed that the T-complexity profiles of uniform random sequences of the same length seem to asymptotically fall on the same characteristic curve, as shown in Figure 5.1, which suggests that the T-complexity profile may serve as an indicator for the randomness of a sequence.

Conjecture 5.2.1. The total number of T-augmentation levels resulting from the decomposition of random sequences of the same length |x| follows a normal distribution

$$\upsilon(x) \sim \mathcal{N}\big(\mu_{\upsilon(x)}, \sigma_{\upsilon(x)}\big)\,. \qquad (5.2)$$

We provide empirical evidence for Conjecture 5.2.1 as follows. Figure 5.2 shows the histogram of the T-augmentation levels υ(x) for 512 binary random sequences x of length $|x| = 2^{32}$ bits. Assuming a normal distribution, the arithmetic mean and

standard deviation for the data set are given by

$$\mu_{\upsilon(x)} = 9.3442 \times 10^{7} \qquad (5.3)$$
$$\sigma_{\upsilon(x)} = 3.2362 \times 10^{3} \qquad (5.4)$$

Figure 5.2: Histogram of υ(x) for $|x| = 2^{32}$ bits for 512 binary uniform random sequences.

In order to gain confidence that the histogram in Figure 5.2 follows a normal distribution, we plot the empirical cumulative distribution function (CDF) and compute the CDF of the presumed normal distribution model. From Figure 5.3 we deduce that the model and the observed empirical data agree well with one another. Moreover, we observe that the data set passes a two-sided Kolmogorov-Smirnov (KS) test at a significance level of ρ = 5%, providing further evidence for the conjectured model.
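For reference, the normality check described above can be reproduced with standard tools; the sketch below assumes a hypothetical file upsilon_counts.txt holding one υ(x) value per decomposed sequence.

import numpy as np
from scipy import stats

levels = np.loadtxt("upsilon_counts.txt")       # hypothetical: one upsilon(x) per sequence
mu, sigma = levels.mean(), levels.std(ddof=1)   # sample mean and standard deviation
stat, p = stats.kstest(levels, "norm", args=(mu, sigma))
print(p > 0.05)                                 # True -> consistent with Conjecture 5.2.1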

Figure 5.3: Empirical and modelled cumulative distribution function of the typical T-augmentation level count over an ensemble of 512 uniform binary random sequences.

5.2.1 The T-augmentation Level Distribution of Short Necklaces

Observing that the average number of T-augmentation levels of random sequences seems to be predictable with relatively small standard deviation, we focus our attention on examining the statistics of generalized T-augmentation for short necklaces.

Conjecture 5.2.2. Let x be a uniform random sequence composed from q symbols. Then

the probability density function (PDF) of short copy patterns pi of length m to occur at T-augmentation level ℓ is well modelled by

$$P_{\upsilon[\tilde{x}_i]}\big(\ell \mid |p_i| = m\big) \sim \lambda_m \exp(-\lambda_m \ell)\,. \qquad (5.5)$$

Based on (5.5), the cumulative distribution function (CDF) is given by

$$F_{\upsilon[\tilde{x}_i]}\big(\ell \mid |p_i| = m\big) \sim 1 - \exp(-\lambda_m \ell)\,, \qquad (5.6)$$
where in both (5.5) and (5.6) we have $\lambda_m = 1/q^m$.
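The model in (5.5) and (5.6) is easy to evaluate directly; its mean $1/\lambda_m = q^m$ is consistent with the empirical means in Table 5.1 (for instance roughly 1024 for m = 10 and q = 2). A minimal sketch, with our own function names:

from math import exp

def level_pdf(ell, m, q=2):
    """Model (5.5): density of a length-m copy pattern occurring at level ell."""
    lam = 1.0 / q**m
    return lam * exp(-lam * ell)

def level_cdf(ell, m, q=2):
    """Model (5.6): probability that a length-m copy pattern has occurred by level ell."""
    lam = 1.0 / q**m
    return 1.0 - exp(-lam * ell)

print(level_cdf(2**10, 10))                     # ~0.632 = 1 - 1/e at the mean level q^m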

Figure 5.4 shows the empirical (blue data points) and modelled probability distribution (solid black curve) of necklaces of length m to occur at T-augmentation level ℓ. Figure 5.4 is the combined result of 512 uniform binary random sequences of length $2^{32}$ bits each. Model and empirical data are in good agreement and provide evidence that the probability of a necklace to occur at a certain T-augmentation level can be modelled by an exponential distribution. This implies that over the course of the decomposition of a uniform random sequence the reservoir of necklaces of length m is depleted at a rate $\lambda_m = 1/q^m$. Figures 5.5 and 5.6 show the quantile-quantile plots for copy pattern lengths 1 to 20, which further strengthens our hypothesis that we likely deal with a process that can be modelled via an exponential distribution. However, strictly speaking, the model proposed in Conjecture 5.2.2 is slightly inaccurate and would more precisely feature a shifted exponential distribution. This is due to the fact that we can only generate necklaces of length m after a minimum

of $\log_2 m$ T-augmentation steps. However, in practice this error is negligible and disregarding the shift parameter yields a simpler model.

Table 5.1 estimates the parameters for the proposed exponential model in (5.5). As stated previously, the estimates are based on the empirical factorization results of 512 binary random sequences of length $|x| = 2^{32}$ bits each. In Table 5.1, m indicates the necklace length, $\bar{\alpha}_m = \hat{\lambda}_m^{-1}$ is the arithmetic mean, $\hat{\lambda}_{m,\mathrm{CDF}}^{-1}$ the mean estimated from the empirical CDF (5.6) at the axis intercept $\hat{F}_{\upsilon[\tilde{x}_i]}(\ell \mid |p_i| = m) = 1 - e^{-1} \approx 0.632$, $\hat{s}_m$ the sample standard deviation, and $\underline{\alpha}_m$ the sample median.

Moreover, Table 5.2 provides processed parameters from Table 5.1, such that under the assumption of the exponential model in (5.5) columns 2 to 4 are estimates for m. Along with these results, the last two columns indicate how many out of the 512 binary sequences passed the Anderson–Darling (AD) and Kolmogorov–Smirnov (KS) tests at a significance level of ρ = 5%. The empirical and modelled cumulative distribution functions are plotted in Figure 5.7. The errors between empirical data and model prediction are provided in Figures 5.8 and 5.9. From these we can see that for necklace lengths 1 to 20 the exponential model fits the empirical data increasingly well as m grows.

[Six panels: empirical and modelled $P_{\upsilon[\tilde{x}_i]}(\ell \mid |p_i| = m)$ for m = 4, 6, 8, 16, 18, 20; in each panel the KS test at ρ = 0.05 passes for roughly 487 to 494 of the 512 sequences.]

Figure 5.4: Probability of the occurrence of a necklace of length m at T-augmentation level ℓ.

Figure 5.5: Quantile-quantile plot for necklace lengths m from 1 to 10 over T-augmentation level ℓ.

Figure 5.6: Quantile-quantile plot for necklace lengths m from 11 to 20 over T-augmentation level ℓ.

 m    ᾱ_m = 1/λ̂_m     1/λ̂_{m,CDF}    ŝ_m           median α_m
 1    1.92             2              1.40          2
 2    5.66             6              3.50          5
 3    9.64             10             7.10          8
 4    20.00            21             15.52         16
 5    36.02            36             30.80         27
 6    70.19            70             62.06         51
 7    134.82           133            129.86        95
 8    267.90           270            256.52        189
 9    524.82           522            513.73        366
 10   1041.43          1043           1020.91       728
 11   2058.26          2061           2026.12       1433
 12   4126.20          4129           4096.22       2864
 13   8226.72          8237           8191.98       5712
 14   16440.77         16462          16370.05      11411
 15   32868.37         32874          32829.72      22809
 16   65615.12         65600          65507.02      45518
 17   131122.33        131014         131039.98     90910
 18   262211.35        262362         261878.94     181894
 19   524200.51        524132         523970.79     363429
 20   1048902.57       1048624        1048737.13    727168

Table 5.1: Parameter estimation for the exponential distribution model describing the probability of a necklace of length m to occur at T-augmentation level ℓ. For the entries, 512 binary uniform random sequences x of length $|x| = 2^{32}$ bits were used. Here, $\bar{\alpha}_m = \hat{\lambda}_m^{-1}$ is the arithmetic mean, $\hat{\lambda}_{m,\mathrm{CDF}}^{-1}$ the mean estimated from the empirical CDF (5.6) at the axis intercept $\hat{F}_{\upsilon[\tilde{x}_i]}(\ell \mid |p_i| = m) = 1 - e^{-1} \approx 0.632$, $\hat{s}_m$ the sample standard deviation, and $\underline{\alpha}_m$ the sample median.

                                                                      GOF (512 trials)
 m    log_q[1/λ̂_m]   log_q[1/λ̂_{m,CDF}]   log_q[ŝ_m]   log_q[α_m/ln 2]   AD      KS
 1    0.9447          1.0000                0.4862        1.1137            512     512
 2    2.5018          2.5850                1.8069        2.8507            503     494
 3    3.2690          3.3219                2.8272        3.5288            499     493
 4    4.3217          4.3923                3.9557        4.5288            487     487
 5    5.1708          5.1699                4.9446        5.2837            494     497
 6    6.1331          6.1293                5.9557        6.2012            492     488
 7    7.0749          7.0553                7.0208        7.0986            486     492
 8    8.0655          8.0768                8.0029        8.0910            486     494
 9    9.0357          9.0279                9.0049        9.0445            491     495
 10   10.0244         10.0265               9.9956        10.0366           488     495
 11   11.0072         11.0091               10.9845       11.0136           486     486
 12   12.0106         12.0116               12.0001       12.0126           485     483
 13   13.0061         13.0079               13.0000       13.0085           482     487
 14   14.0050         14.0069               13.9988       14.0069           491     491
 15   15.0044         15.0047               15.0027       15.0061           484     487
 16   16.0017         16.0014               15.9994       16.0029           486     487
 17   17.0006         16.9994               16.9996       17.0009           484     483
 18   18.0004         18.0012               17.9985       18.0015           485     487
 19   18.9998         18.9996               18.9991       19.0001           492     499
 20   20.0004         20.0001               20.0002       20.0007           487     493

Table 5.2: Goodness of fit (GOF) test using the Anderson–Darling (AD) and Kolmogorov–Smirnov (KS) tests at a significance level of ρ = 5%, along with processed parameters from Table 5.1 giving evidence for the exponential PDF model describing the probability of a necklace of length m to occur at a certain T-augmentation level.


Figure 5.7: Empirical and modelled CDFs for necklaces of length m to occur at T-augmentation level ℓ.

Figure 5.8: Error between empirical and modelled PDFs for necklaces of lengths m = {1, ..., 10} to occur at T-augmentation level ℓ.

Figure 5.9: Error between empirical and modelled PDFs for necklaces of lengths m = {11, ..., 20} to occur at T-augmentation level ℓ.

5.2.2 The T-handle Length Distribution of Short Necklaces

This section extends the statistical results obtained for the T-augmentation level in Section 5.2.1 by giving an expression for the probability of a necklace of length m to occur at a certain T-handle length. As it turns out, the T-handle length seems to obey similar statistics to the T-augmentation level, resulting in the following model:

Conjecture 5.2.3. Let x be a uniform random sequence composed from q symbols. Then

the probability density function (PDF) of short copy patterns pi of length m to occur at a T-handle length of h is well modelled by

$$P_{|\tilde{x}_i|}\big(h \mid |p_i| = m\big) \sim \lambda_m \frac{e^{-\gamma}}{m} \exp\!\Big(-\lambda_m \frac{e^{-\gamma}}{m}\, h\Big)\,. \qquad (5.7)$$
Based on (5.7), the cumulative distribution function (CDF) is given by

$$F_{|\tilde{x}_i|}\big(h \mid |p_i| = m\big) \sim 1 - \exp\!\Big(-\lambda_m \frac{e^{-\gamma}}{m}\, h\Big)\,, \qquad (5.8)$$
where in both (5.7) and (5.8) we have $\lambda_m = 1/q^m$ and γ is the Euler–Mascheroni constant given by the limit
$$\gamma = \lim_{m \to \infty}\Big[\sum_{k=1}^{m} \frac{1}{k} - \log m\Big]\,. \qquad (5.9)$$
We provide evidence for Conjecture 5.2.3 in the same manner as for the distribution governing the T-augmentation level. In particular, we provide quantile-quantile plots in Figures 5.10 and 5.11 for necklace lengths 1 to 20. The empirical and modelled cumulative distribution functions are depicted in Figure 5.12 and the error between the respective functions is provided in Figures 5.13 and 5.14.
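The T-handle model differs from (5.5) and (5.6) only in its rate, $\lambda_m e^{-\gamma}/m$; a minimal sketch mirroring the one given for the level distribution (function names are ours):

from math import exp

GAMMA = 0.5772156649015329                      # Euler-Mascheroni constant from (5.9)

def handle_cdf(h, m, q=2):
    """Model (5.8): probability that a length-m copy pattern has occurred by T-handle length h."""
    rate = (1.0 / q**m) * exp(-GAMMA) / m
    return 1.0 - exp(-rate * h)

print(handle_cdf(10 * 2**10 * exp(GAMMA), 10))  # ~0.632 at the model mean m * q^m * e^gamma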

Figure 5.10: Quantile-quantile plot for necklace lengths m from 1 to 10 over T-handle length h.

Figure 5.11: Quantile-quantile plot for necklace lengths m from 11 to 20 over T-handle length h.

Figure 5.12: Empirical and modelled CDFs for a necklace of length m to occur at T-handle length h.

Figure 5.13: Error between empirical and modelled PDFs for necklaces of lengths m = {1, ..., 10} to occur at T-handle length h.

Figure 5.14: Error between empirical and modelled PDFs for necklaces of lengths m = {11, ..., 20} to occur at T-handle length h.

5.2.3 Beyond Short Necklaces

One might ask if the knowledge of the average number of T-augmentation steps

$\mu_{\upsilon(x)}$ for a uniform random sequence of fixed length |x| allows us to model the expected number of necklaces of length m that are generated during the factorization of x. Based on the exponential model in Conjecture 5.2.2, as well as (5.2) and the bounds on the number of necklaces of length m from (4.10), we state the following conjecture:

Conjecture 5.2.4. The expected number of short necklaces of length m generated over the course of the factorization of a uniform random string x over an alphabet of q symbols can be modeled by

$$\mathcal{W}_q(m,\mu_{\upsilon[x]}) \sim \frac{q^m}{m}\, F_{\upsilon[\tilde{x}_i]}\big(\mu_{\upsilon[x]} \mid |p_i| = m\big) \sim \frac{q^m}{m}\Big[1 - \exp\big(-\lambda_m\, \mu_{\upsilon[x]}\big)\Big] \sim \frac{q^m}{m}\Big[1 - \exp\Big(-\frac{\mu_{\upsilon[x]}}{q^m}\Big)\Big]\,. \qquad (5.10)$$

We note that Conjecture 5.2.4 makes a claim only about short necklaces. The reason for this is detailed in the following. Consider Figure 5.15, which plots the average $\mathcal{W}_q(m)$ of 512 uniform binary random sequences of length $|x| = 2^{32}$ bits versus the prediction of the model in Conjecture 5.2.4. For reference, Figure 5.16 depicts the sample standard deviation $\hat{s}_{\mathcal{W}_q}(m,\mu_{\upsilon[x]})$ associated with the necklace count of length m across the 512 trials. From Figure 5.15 we see that the model fits well for necklaces of length less than m = 50. Beyond this length the model breaks down and is unable to accurately account for the tail of the empirical data. The underrepresentation of necklaces of length m > 50 may be attributed to the finite string length. Figures 5.18 and 5.17 depict the error and log-ratio of modelled and empirical quantities respectively. From both figures we can see a departure from the conjectured model for necklaces larger than 50. No statistical distribution that accurately describes the behaviour of large necklace factors could be identified.


Figure 5.15: Modelled and average empirical necklace count per length m over 512 trials.

In order to get an idea of how significant large necklaces are in the factorization of uniform random sequences, we consider again 512 binary random sequences of length $|x| = 2^{32}$ bits. The theoretical combined length occupied by necklaces up to length m = 50 is given by

$$\sum_{n=1}^{m=50} n\, \mathcal{W}_q(n,\mu_{\upsilon[x]}) \sim \sum_{n=1}^{50} q^n \Big[1 - \exp\Big(-\frac{\mu_{\upsilon[x]}}{q^n}\Big)\Big] = 1.332721 \times 10^{9}\,, \qquad (5.11)$$
where q = 2 and we have used $\mu_{\upsilon(x)} = 9.3442 \times 10^7$ from (5.3). Similarly, the empirical average combined length is

$$\sum_{n=1}^{m=50} n\, \mathcal{W}_q(n) = 1.457641 \times 10^{9}\,. \qquad (5.12)$$

Hence, the modelled and empirical portion of necklaces larger than m = 50 is obtained as

$$\Big[1 - \frac{1.332721 \times 10^{9}}{2^{32}}\Big] \times 100\% \approx 68.97\% \qquad (5.13)$$
and

$$\Big[1 - \frac{1.457641 \times 10^{9}}{2^{32}}\Big] \times 100\% \approx 66.06\%\,, \qquad (5.14)$$

respectively. From (5.13) and (5.14) we deduce that the portion of large necklaces is over two thirds of all generated necklaces, which is quite significant. We close this chapter by noting that for a randomness test one would ideally prefer to have a statistical model for short and long necklaces. Whether or not the model for short necklaces alone is sufficient to devise such a test is an open question.

Figure 5.16: Sample standard deviation for the necklace count per length m over 512 trials.

Figure 5.17: Error between modelled and average empirical necklace count per length m.

Figure 5.18: Logarithmic ratio of modelled and average empirical necklace count per length m.

5.3 Summary

This chapter presented statistical models and empirical results concerning the generalized T-complexity profile of uniform random sequences. We have shown that for large finite strings generalized T-augmentation depletes short necklaces at an exponential rate. Whether the portion of large necklaces can be probabilistically characterized is currently an open question needing further investigation. The next chapter introduces the notion of conditional T-complexity, which then finds practical application in the form of a very efficiently computed similarity measure for strings referred to as the normalized T-complexity distance.

Chapter 6

Measuring String Similarity

In this chapter we define the normalized information distance and its T-complexity based deterministic version referred to as the normalized T-complexity distance. We further show that the normalized T-complexity distance is not a metric.

6.1 The Normalized Information Distance

Given two strings x and y, we would like to have an information distance measure d(x,y) that allows us to numerically quantify how closely x and y are related to one another. Ideally this measure should fulfill the requirements of a metric, account for differences in string length, and provide a normalized output, $d(x,y) \in [0,1]$. There are many possible ways in which such a measure might be expressed. We provide a definition in terms of Kolmogorov complexity due to Li et al. [61] as follows.

Definition 6.1.1 (Li et al. 2008, [61]). The Normalized Information Distance (NID) is given by

$$d_{NID}(x,y) = \frac{\max\big\{K(x \mid y),\, K(y \mid x)\big\}}{\max\big\{K(x),\, K(y)\big\}} \qquad (6.1)$$

$$\approx \frac{K(xy) - \min\big\{K(x),\, K(y)\big\}}{\max\big\{K(x),\, K(y)\big\}}\,, \qquad (6.2)$$
where $K(x \mid y)$ in (6.1) is the conditional Kolmogorov complexity, which is defined as the length of the shortest program that, given y as an input, produces x as its output.

The numerator in (6.1) addresses the possibility of x and y containing vastly different amounts of information. That is, if x happens to be much smaller than y it will likely not require as much algorithmic effort to describe x in terms of y as the other way around. Hence, we choose to evaluate $\max\{K(x \mid y), K(y \mid x)\}$ to ensure that we account for the amount of extra information by which one string dominates the other in terms of contributed information content. The denominator in (6.1) makes sure that this amount is normalized by the information contained in the individually more complex string. For $x = y$ the normalized information distance yields $d_{NID}(x,y) = 0$, indicating maximum similarity. Similarly, $d_{NID}(x,y) = 1$ indicates maximum dissimilarity. (6.1) is proven to be a metric up to an admissible error. That is, the distance is non-negative, symmetric, and satisfies the triangle inequality [14], [63]. Li et al. provide (6.2) in [14] as an approximation for (6.1) which makes the assumption that $K(x \mid y) \approx K(xy) - K(y)$ and $K(xy) \approx K(yx)$.

6.2 The Normalized Compression Distance

As already outlined in Section 4.1.2 we cannot compute the Kolmogorov complexity of a string, and thus the derivation of a deterministic version of the normalized information distance is inherently problematic. Indeed, Terwijn et al. prove in [100] that (6.1) cannot be approximated from above or below to any computable precision. Nevertheless, in practice off-the-shelf data compressors can be used to obtain an estimate of string complexity.

Definition 6.2.1. The Normalized Compression Distance (NCD) is given by

$$d_{NCD}(x,y) = \frac{Z(xy) - \min\big\{Z(x),\, Z(y)\big\}}{\max\big\{Z(x),\, Z(y)\big\}}\,, \qquad (6.3)$$
where the term Z(x) in (6.3) denotes the compressed size of the string x.
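In practice the NCD is only a few lines of code; the sketch below uses Python's zlib (the DEFLATE algorithm behind gzip) as the compressor Z, subject to the window-size caveats discussed next (function names are ours).

import zlib

def Z(s: bytes) -> int:
    """Compressed size of s in bytes (DEFLATE at maximum compression level)."""
    return len(zlib.compress(s, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (6.3)."""
    zx, zy, zxy = Z(x), Z(y), Z(x + y)
    return (zxy - min(zx, zy)) / max(zx, zy)

print(ncd(b"the quick brown fox" * 20, b"the quick brown fox" * 20))  # small distance
print(ncd(b"the quick brown fox" * 20, bytes(range(256)) * 2))        # larger distance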

The NCD employs size-compacting industry standard data compressors such as the Lempel-Ziv factorization based compressor LZ77 (gzip), a Burrows-Wheeler Transform based data compressor (bzip2), and more recently, prediction by partial matching (PPM) and Lempel-Ziv Markov chain (LZMA) algorithms [63, 101]. Conceptually, the NCD is computed as a ratio from two individually and jointly compressed strings. The idea is that the more information the two strings share

in common, the smaller the size of their joint compression. Therefore, the lower their normalized compression distance, the closer is their “relatedness”. In order to correlate information patterns contained inside a string, an optimal algorithm has to touch each symbol in a string at least once. Thus, a sequentially implemented solution for a deterministic complexity measure must have a lower bound in time complexity of O(n).

The off-the-shelf compressors used in the NCD, in one way or another, first transform information into a less redundant meta-representation which is then compacted in size using entropy coding. However, a deterministic measure estimating the information content of a string does not necessarily require a size-compacting encoding phase, as it is possible to obtain a complexity measure from the algorithmic effort required to construct the meta-representation alone. The Lempel-Ziv complexity [12], herein referred to as LZ76, and T-complexity [89, 102, 71], the subject of this dissertation, are two examples of such deterministic complexity measures. LZ76 performs an exhaustive pattern search over the entire input which makes it well suited to estimate the complexity of sources with long memory. An NID and LZ76 based approach was used in [63] for the construction of phylogenetic trees. However, a naïve LZ76 implementation does not scale well for large inputs because of its quadratic runtime. This is likely the reason why less resource demanding compressors such as gzip or the block compressor bzip2 are also used for the computation of the NCD in [63]. Both compressors have a runtime of O(n × m), where m is the window or block size of the compressor.

A window or block size based compressor produces correct results only if the length of the two concatenated input strings fits inside the block or window size of the compressor, as otherwise shared information may not be correlated across block or window boundaries [101], [103]. Unfortunately, as soon as we require m to be of the same size as the concatenated input strings we default to an effective overall runtime of O(n²). Similarly, the Markov model based compressors PPM and LZMA produce accurate results only if their dictionary size is left unrestricted [101]. This accuracy is paid for with poor runtime performance due to exponential demands in time and memory for Markov model implementations [104]. The normalized compression distance can yield good results in practice [14, 62, 63] provided one is mindful about the selection and limitations of the chosen data compressor, as the quality of the normalized compression distance is ultimately tied to the window, block, and dictionary size of the compressor [101, 103].

6.3 The Normalized T-Complexity Distance

Similarly to the normalized compression distance in the previous section we dene a similarity distance measure based on conditional T-complexity as follows.

Denition 6.3.1. The Normalized T-complexity Distance (NTD) is given as

(x |y), (y | x) = max T T dNTC(x,y) C C , (6.4) max (x), (y)  CT CT   where (x |y) is the conditional T-complexity which we dene as the additional decom- CT position eort on x required after application of all copy factor/copy pattern combinations encountered during the decomposition of y.

Example 6.3.2 illustrates how the conditional T-complexity measure in (6.4) is computed in practice.

Example 6.3.2 (Conditional T-complexity). Let $x,y \in S^{+}$ with x = 'Hi`Grandma' and y = 'Hi`Granny'. Then the conditional T-complexity $\mathcal{C}_T(x \mid y)$ is defined as the additional decomposition effort on x after application of all “knowledge” gained from the decomposition of y.

To compute $\mathcal{C}_T(x \mid y)$ we employ a stop symbol #, such that # ∉ S. We then decompose the string concatenation $x\#y \in \dot{S}^{+}$, where $\dot{S} = \{S, \#\}$. The T-transform algorithm factors x#y in 11 steps. A list of copy patterns along with their respective copy factors is given in Table 6.1. The stop symbol between the strings x and y prevents the creation of a copy pattern spanning over the concatenation boundary of x and y. From Table 6.1 we evaluate the conditional T-complexity of x given y as the log-weighted sum of copy factors associated with copy patterns that contain information exclusively belonging to x as

$$\mathcal{C}_T(x \mid y) = \sum_{i=10}^{11} \log_2(k_i + 1) = 2.0\,.$$

Furthermore, as a by-product of the decomposition of x#y we obtain the T-complexity of y as the log-weighted sum of copy factors associated with copy patterns that contain information belonging to y as

 i : level   k_i : copy factor   p_i : copy pattern (offset, length, value)
 1           1                   (20, 1, y)
 2           2                   (19, 1, n)
 3           1                   (17, 1, a)
 4           1                   (16, 1, r)
 5           1                   (15, 1, G)
 6           1                   (14, 1, `)
 7           1                   (13, 1, i)
 8           1                   (12, 1, H)
 9           1                   (10, 2, a#)
 10          1                   (9, 1, m)
 11          1                   (1, 8, Hi`Grand)

 (Levels 1-8 contribute to C_T(y) = 8.58; levels 10-11 contribute to C_T(x|y) = 2.0.)

Table 6.1: T-transform of the string x#y = 'Hi`Grandma#Hi`Granny'.

$$\mathcal{C}_T(y) = \sum_{i=1}^{8} \log_2(k_i + 1) = 8.58\,.$$

In the same way we obtain $\mathcal{C}_T(y \mid x) = 2.0$ and $\mathcal{C}_T(x) = 9.0$ from the factorization of y#x as shown in Table 6.2. We conclude this example by using (6.4) to compute the normalized T-complexity distance between x and y as

$$d_{NTC}(x,y) = \frac{\max\big\{\mathcal{C}_T(x \mid y),\, \mathcal{C}_T(y \mid x)\big\}}{\max\big\{\mathcal{C}_T(x),\, \mathcal{C}_T(y)\big\}} = \frac{2.0}{9.0} \approx 0.23\,.$$

 i : level   k_i : copy factor   p_i : copy pattern (offset, length, value)
 1           1                   (20, 1, a)
 2           1                   (19, 1, m)
 3           1                   (18, 1, d)
 4           1                   (16, 2, an)
 5           1                   (15, 1, r)
 6           1                   (14, 1, G)
 7           1                   (13, 1, `)
 8           1                   (12, 1, i)
 9           1                   (11, 1, H)
 10          1                   (10, 1, #)
 11          1                   (9, 1, y)
 12          1                   (1, 8, Hi`Grann)

 (Levels 1-9 contribute to C_T(x) = 9.0; levels 11-12 contribute to C_T(y|x) = 2.0.)

Table 6.2: T-transform of string y#x='Hi`Granny#Hi`Grandma'.
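The computation of Example 6.3.2 can also be scripted: factor the concatenations x#y and y#x while tracking where each copy pattern starts, and split the log-weighted sums at the stop symbol. The sketch below is a variant of the naïve transform from Section 4.4.1 that records pattern offsets; all names are ours, and a production implementation would use the linear-time ott algorithm [11] instead.

from math import log2

def t_levels(s, terminal="$"):
    """Naive T-transform of s, returning (start, length, copy_factor) per level."""
    s = s + terminal
    spans = [(i, i + 1) for i in range(len(s))]          # tokens as half-open spans into s
    sub = lambda sp: s[sp[0]:sp[1]]
    levels = []
    while len(spans) > 1:
        p = sub(spans[-2])                               # copy pattern = second-to-last token
        k = 1
        while k + 1 < len(spans) and sub(spans[-2 - k]) == p:
            k += 1
        levels.append((spans[-2][0], len(p), k))
        merged, i = [], 0
        while i < len(spans):                            # merge runs of p with the next token
            if sub(spans[i]) == p:
                run = 1
                while run < k and i + run < len(spans) and sub(spans[i + run]) == p:
                    run += 1
                merged.append((spans[i][0], spans[i + run][1]))
                i += run + 1
            else:
                merged.append(spans[i])
                i += 1
        spans = merged
    return levels

def conditional_and_plain(a, b, stop="#"):
    """Factor a#b: returns (C_T(a|b), C_T(b)) following Example 6.3.2."""
    c_cond = c_plain = 0.0
    for start, length, k in t_levels(a + stop + b):
        if start >= len(a) + 1:                          # pattern lies entirely inside b
            c_plain += log2(k + 1)
        elif start + length <= len(a):                   # pattern lies entirely inside a
            c_cond += log2(k + 1)
    return c_cond, c_plain

def ntd(x, y):
    """Normalized T-complexity distance (6.4)."""
    cxy, cy = conditional_and_plain(x, y)
    cyx, cx = conditional_and_plain(y, x)
    return max(cxy, cyx) / max(cx, cy)

print(round(ntd("Hi`Grandma", "Hi`Granny"), 2))          # ~0.22 (= 2.0/9.0), cf. Example 6.3.2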

The normalized T-complexity distance is unique when compared to off-the-shelf compressor based approaches as it directly uses the results of a conditional complexity measure. Thus, it is conceptually closer to the normalized information distance as defined in (6.1). Moreover, using the T-transform implementation provided in [11], the normalized T-complexity distance computes in linear time and space, making it superior to PPM and LZMA based approaches. Furthermore, the quality of the normalized T-complexity distance is not susceptible to the window or block size issues that are a problem with compressors such as gzip and bzip2. Overall this makes the normalized T-complexity distance a good choice for assessing information similarity in big data sets.

6.3.1 Metric Violation

Clearly the normalized T-complexity distance $d_{NTC}(x,y)$ is symmetric, but for it to be a metric we require the triangle inequality to hold. In the following we prove that the normalized T-complexity distance violates this requirement, reducing it to a quasi-metric.

Claim 6.3.3. The normalized T-complexity distance

$$d_{NTC}(x,y) = \frac{\max\big\{\mathcal{C}_T(x \mid y),\, \mathcal{C}_T(y \mid x)\big\}}{\max\big\{\mathcal{C}_T(x),\, \mathcal{C}_T(y)\big\}}$$

Proof (by counterexample). For $d_{NTC}(x,y)$ to be a metric we require the following conditions to hold:

1. $d_{NTC}(x,y) \ge 0$ (non-negativity)

2. $d_{NTC}(x,y) = 0$ iff $x = y$ (identity of indiscernibles)

3. $d_{NTC}(x,y) = d_{NTC}(y,x)$ (symmetry)

4. $d_{NTC}(x,y) \le d_{NTC}(x,z) + d_{NTC}(z,y)$ (sub-additivity / triangle inequality).

It is trivial to show that $d_{NTC}(x,y)$ satisfies conditions 1 to 3. In what follows we show that condition 4 is violated. Consider the binary strings x = 01100, y = 000101, and z = 01000010. Table 6.3 shows the factorization results of all pairwise

concatenations of x, y, and z, which allows us to compute $d_{NTC}(x,y)$, $d_{NTC}(x,z)$, and $d_{NTC}(z,y)$ as

$$\mathcal{C}_T(x) = 3.584963\,,\quad \mathcal{C}_T(y) = 4.0\,,\quad \mathcal{C}_T(x \mid y) = 1.0\,,\quad \mathcal{C}_T(y \mid x) = 2.0\,,\quad d_{NTC}(x,y) = 0.5$$

$$\mathcal{C}_T(z) = 4.584963\,,\quad \mathcal{C}_T(x \mid z) = 1.0\,,\quad \mathcal{C}_T(z \mid x) = 1.0\,,\quad d_{NTC}(x,z) = 0.218104$$

$$\mathcal{C}_T(y \mid z) = 1.0\,,\quad \mathcal{C}_T(z \mid y) = 1.0\,,\quad d_{NTC}(z,y) = 0.218104\,.$$

Therefore,

$$d_{NTC}(x,y) > d_{NTC}(x,z) + d_{NTC}(z,y)$$
$$0.5 > 0.218104 + 0.218104 = 0.436208\,,$$
which contradicts requirement 4. Thus, the normalized T-complexity distance is not a metric and Claim 6.3.3 is proven. □

In practice the metric violations are mostly observed for small strings of different size and are much harder to find for larger strings of similar size. For example, to find a violation for binary sequences of length around 20, 100,000 random combinations had to be tested in order to observe a single metric violation. An open question in this regard is whether the number of metric counterexamples is finite.

(a) x#y:                      (b) y#x:
 i   k_i   p_i                 i   k_i   p_i
 1   1     1                   1   2     0
 2   1     10                  2   1     1
 3   3     0                   3   1     01
 4   1     00#                 4   1     #
 5   1     011                 5   1     101
 6   1     000

(c) x#z:                      (d) z#x:
 i   k_i   p_i                 i   k_i   p_i
 1   1     0                   1   2     0
 2   1     1                   2   1     1
 3   2     00                  3   1     01
 4   1     01                  4   1     010#
 5   1     #                   5   1     01000
 6   1     01100

(e) y#z:                      (f) z#y:
 i   k_i   p_i                 i   k_i   p_i
 1   1     0                   1   1     1
 2   1     1                   2   1     10
 3   2     00                  3   3     0
 4   1     01                  4   1     0010#
 5   1     01#                 5   1     0100
 6   1     0001

Table 6.3: T-transform results for all pairwise concatenations of the three strings x = 01100, y = 000101, and z = 01000010.

6.4 Summary

In this chapter we introduced the normalized information distance and the normalized T-complexity distance as measures of similarity between two strings. We showed that the normalized T-complexity distance is not a metric. An open question in this regard is whether the number of sequences that violate the triangle inequality is finite. In its current form the normalized T-complexity distance is a quasi-metric, but as such is still a very useful measure that computes very efficiently. The next chapter concludes this dissertation by summarizing the contributions made and suggesting opportunities for future work.

Chapter 7

Conclusions

This dissertation outlined in detail the connection between the number of aperiodic necklaces and monic irreducible polynomials over finite fields. Explicit formulas for the enumeration of monic irreducible polynomials and necklaces were given and illustrated by example. A very accurate approximation of the Lerch transcendent Φ(z,s,a) was presented that for |z| > 1 yields more accurate results than a related expansion formula due to Ferreira et al. given in [15]. A precise approximation for Eulerian polynomials was derived and used to asymptotically bound the error in the series expansion of the Lerch transcendent. The error was shown to be exponentially small when the series is optimally truncated. The asymptotic expansion of the Lerch transcendent was used to compute the truncated polylogarithm function $\mathcal{L}(z,s,m)$, which in turn provides an approximation of the prime polynomial counting function that is analogous to the well known asymptotic expansion of the prime number theorem. The proposed prime polynomial counting function derived in this work is a contribution of fundamental nature, is consistent with prior literature, and exhibits significantly better accuracy than the works of Kruse et al. [2], Wang et al. [3], and Pollack [4]. The expansion formulas for Eulerian polynomials and the truncated polylogarithm function are general and find applications in many areas other than the enumeration of prime polynomials. The T-transform was introduced as a string factorization algorithm that decomposes a string into a representation of necklaces on which T-codes are based. The algorithmic effort required to construct a T-code from a string was used to define the string complexity measure T-complexity, a deterministic complexity measure

providing a real-valued estimate of how complex (or random) the information contained in a string is. The link between aperiodic necklaces and monic irreducible polynomials was used to derive a new asymptotic bound on the maximal T-complexity value for strings of a given length. The statistics of the T-transform were shown to be comparable to the Lyndon factorization in that large and short necklace factors obey distinct statistical models. It was shown that for finite strings generalized T-augmentation depletes short necklaces at an exponential rate. Finally, using the notion of conditional T-complexity, the normalized T-complexity distance was defined as a string similarity measure that numerically quantifies the shared information content between two strings. The normalized T-complexity distance was shown not to be a metric as it occasionally violates the triangle inequality.

7.1 Future Work

Hyperasymptotic Approximations

While the optimally truncated asymptotic series expansions for the Lerch transcendent Φ(z,s,a) and the truncated polylogarithm function $\mathcal{L}(z,s,m)$ already achieve remarkable accuracy for |z| > 1, one area of improvement that might be worthwhile to investigate is whether or not one might be able to achieve even higher accuracy via a hyperasymptotic approximation [33]. For this, the error term of the optimally truncated series has to be expanded as a separate series and added to the initial asymptotic expansion.

Randomness Tests for Finite Strings

As outlined in Chapter 5, for a complete statistical characterization of the T-complexity profile of uniformly distributed random sequences, a statistical model for large necklace factors is needed. Complete knowledge of the statistical behaviour of both short and long necklace factors makes it easier to devise a test that predicts whether or not a string was obtained from a true random source. Whether the statistical model for short necklaces alone might be sufficient for such a test is an open question and provides an opportunity for future work.

Normalized T-complexity Distance

In Chapter 6 it was shown that the normalized T-complexity distance is not a metric due to the occasional violation of the triangle inequality. This fact is not surprising given that the original definition by Li et al. [14] allows for such violations. Currently, not much can be said about the frequency of the violations for the normalized T-complexity distance. Largely this is due to the fact that the measure does not restrict strings to equal lengths, resulting in a vast space that cannot be searched using brute force methods. Future work in this regard includes an assessment of whether one can discern a pattern from the short metric violations that would ultimately allow the overall number of expected violations to be asymptotically bounded.

Appendix A

Supplemental Materials

A.1 Maple Source Code

This section provides source code for the computer algebra system Maple that may be used for the exact and approximate computation of the prime polynomial counting function and Eulerian polynomials. In this context source code for the series approximation of the Lerch transcendent is also provided. The source code is subject to the Apache 2.0 licence, a copy of which may be obtained from [105].

# Copyright 2016, Niko Rebenich
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#################################### GLOBALS ###################################

Digits := 50000;     # Floating point precision
M := 1001;           # Default size for Eulerian number LUT
En := Matrix(1):     # Pre-allocate LUT for Eulerian numbers
with(LinearAlgebra):
with(numtheory):

############################ EULERIAN POLYNOMIALS ############################

# Initializes/resizes look-up table of Eulerian numbers
initEn := proc (n)
  global En;
  local k, m, M;
  M := RowDimension(En);
  if (n > M-1) then    # Check if we need to resize
    M := n + 1;
    En := Matrix(M, Matrix(M, 1, fill = 1), fill = 0):
  end if;
end proc:

# Initialize look-up table (LUT) of Eulerian numbers
initEn(M):   # initialize En matrix
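
The procedures below read En(n+1, k+1) as the Eulerian number ⟨n, k⟩; initEn pre-fills only the first column with ones. A minimal sketch of one way the remaining entries could be populated, using the standard recurrence ⟨n, k⟩ = (k+1)⟨n−1, k⟩ + (n−k)⟨n−1, k−1⟩, is given below; the procedure name fillEn and this fill step are illustrative additions and not part of the original listing.

# Illustrative only: populate En with Eulerian numbers, where En[m, k]
# stores the Eulerian number <m-1, k-1>.  Assumes En was allocated by initEn.
fillEn := proc ()
  global En;
  local m, k, M;
  M := RowDimension(En);
  for m from 2 to M do          # row m holds the numbers <m-1, .>
    for k from 2 to m-1 do
      En[m, k] := k*En[m-1, k] + (m-k)*En[m-1, k-1];
    end do;
  end do;
end proc:
fillEn():   # fill the table once after allocation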

# Computes n-th Eulerian polynomial in z (exact).
# n: n-th Eulerian polyn.
# z: free variable
A := proc (n, z)
  global En;
  local k, C, Z;
  initEn(n);
  k := max(n,1);
  C := Vector(k, j -> En(k+1,j));
  Z := Vector(k, j -> z^(j-1));
  return(C.Z);
end proc:

# Computes T_k for the n-th Eulerian polynomial approximation.
binT := proc (p, z, k)
  local m, u, v;
  u := log(z);
  v := 2*Pi*k*I;
  2 * sum(binomial(p, 2*m) * u^(p-2*m) * v^(2*m), m=0..floor(p/2)) / (u^2 - v^2)^p;
end proc:

# Approximation of the n-th Eulerian polynomial.
# n: n-th Eulerian polyn.
# z: free variable
# m: number of expansion terms
Ax := proc (n, z, m)
  local c, j;
  c := (1/z) * (z-1)^(n+1) * factorial(n);   # n! prefactor
  c * (1/log(z)^(n+1) + sum(binT(n+1, z, j), j=1..m));
end proc:
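
As a quick numerical sanity check (an illustrative usage sketch, not part of the original listing, and assuming the table En actually holds the Eulerian numbers), the exact and approximate Eulerian polynomials can be compared at an arbitrary argument:

# Compare the exact Eulerian polynomial with its approximation at n = 10,
# z = 3, using 20 correction terms; the residual should be very small.
exactA  := evalf(A(10, 3)):
approxA := evalf(Ax(10, 3, 20)):
evalf(abs(exactA - approxA));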

############################# LERCH TRANSCENDENT #############################

# Lerch transcendent, asymptotic series expansion, |z| > 1.
# z, s, a: free variables
# N: number of expansion terms
Phix := proc(z, s, a, N)
  local n, d;
  d := 0;
  if (abs(z) > 1 and a > 1) then d := 1; end if;
  -1 * sum( (binomial(n+s-1, n) * 'A'(n, z)) / (((z-1)^(n+1))*(a-d)^(n+s)), n=0..N-1 );
end proc:

# Optimally truncated Lerch transcendent, asymptotic series expansion, |z| > 1.
# z, s, a: free variables
# N: maximum number of expansion terms
Phix_opt := proc(z, s, a, N)
  local n, w1, w2, d, r;
  d := 0; n := 0;
  if (abs(z) > 1 and a > 1) then d := 1; end if;

  r := (binomial(n+s-1, n) * A(n, z)) / (((z-1)^(n+1))*(a-d)^(n+s));
  w1 := r;
  n := n+1;
  while true do
    w2 := (binomial(n+s-1, n) * A(n, z)) / (((z-1)^(n+1))*(a-d)^(n+s));
    if ((abs(w2) > abs(w1)) or (n = N)) then
      if not (n = N) then
        r := r - w1;
        n := n-1;
      fi;
      break;
    else
      r := r + w2;
      n := n+1;
      w1 := w2;
    fi;
  od;
  return (-1*r), n, w2, w1;
end proc:

########################### TRUNCATED POLYLOGARITHM ###########################

# Truncated polylogarithm, asymptotic series expansion, |z| > 1.
# z, s, m: free variables
# N: number of expansion terms
Fx := proc(z, s, m, N)
  -1 * z^(m+1) * Phix(z, s, m+1, N);
end proc:

# Truncated polylogarithm, optimally truncated asymptotic series expansion, |z| > 1.
# z, s, m: free variables
# N: maximum number of expansion terms
Fx_opt := proc(z, s, m, N)
  local fv, n, w1, w2, r;
  fv, n, w1, w2 := Phix_opt(z, s, m+1, N):
  r := -1 * z^(m+1) * fv;
  return r, n;
end proc:

# Truncated polylogarithm function (exact).
# z, s, m: free variables
F := proc(z, s, m)
  local k;
  sum((z^k)/k^s, k = 1..m);
end proc:
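
The truncated polylogarithm routines may be exercised in the same spirit (again an illustrative sketch with arbitrary arguments, not part of the original listing, and assuming the Eulerian-number table is populated):

# Exact truncated polylogarithm versus the optimally truncated expansion
# for z = 2, s = 1, m = 50, allowing at most N = 30 expansion terms.
refF       := evalf(F(2, 1, 50)):
appF, nopt := Fx_opt(2, 1, 50, 30):
evalf(abs(refF - appF));   # approximation error
nopt;                      # number of expansion terms actually used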

###################### PRIME POLYNOMIAL COUNTING FUNCTION #####################

# Computes the number of monic irreducible polynomials of degree <= m (exact).
# q: prime power, free variable
# m: polynomial degree
Pi_q := proc(q, m)
  local k, p;
  p := 1;
  for k from 1 to m do
    p := p + mipolys(k, q, 1);
  od;
  return(p);
end proc:
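
Finally, an illustrative call sequence for the counting function and the Lerch transcendent approximations (the argument values are arbitrary examples and not taken from the dissertation):

# Exact count of monic irreducible polynomials of degree <= 8 over GF(2).
Pi_q(2, 8);

# Lerch transcendent approximation Phi(4, 2, 10) with 15 expansion terms,
# and its optimally truncated counterpart capped at 25 terms.
evalf(Phix(4, 2, 10, 15));
phival, nterms := Phix_opt(4, 2, 10, 25):
evalf(phival);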

Bibliography

[1] R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective. New York, NY: Springer, 2000.

[2] M. Kruse and H. Stichtenoth, “Ein Analogon zum Primzahlsatz für algebraische Funktionenkörper,” Manuscripta Math., vol. 69, no. 1, pp. 881–886, Dec. 1990.

[3] Q. Wang and H. Kan, “Counting irreducible polynomials over finite fields,” J. Czech. Math., vol. 60, no. 3, pp. 881–886, Dec. 2010.

[4] P. Pollack, “Revisiting Gauss’s analogue of the prime number theorem for polynomials over a finite field,” Finite Fields Appl., vol. 16, no. 4, pp. 290–299, July 2010.

[5] A. Erdélyi, W. Magnus, F. Oberhettinger and F. G. Tricomi, Higher Transcendental Functions, Vol. 1. New York, NY: McGraw-Hill, 1953.

[6] D. Foata, “Eulerian polynomials: from Euler’s time to the present,” in: The Legacy of Alladi Ramakrishnan in the Mathematical Sciences, New York, NY: Springer, 2010, pp. 253–273.

[7] M. Lothaire, Combinatorics on Words. New York, NY: Cambridge Univ. Press, 1983.

[8] S. W. Golomb, “Obtaining specified irreducible polynomials over finite fields,” SIAM J. Algebra. Discr., vol. 1, no. 4, pp. 411–418, Dec. 1980.

[9] M. R. Titchener, T. A. Gulliver, R. Nicolescu, U. Speidel and L. Staiger, “Deterministic complexity and entropy,” Techn. report, Univ. of Auckland, Auckland, New Zealand, 2004.

[10] A. N. Kolmogorov, “Three approaches to the quantitative definition of information,” Problems of Inf. Transmission, vol. 1, no. 1, pp. 1–7, Jan. 1965.

[11] N. Rebenich, U. Speidel, S. Neville and T. A. Gulliver, “FLOTT – A fast, low memory T-transform algorithm for measuring string complexity,” IEEE Trans. Comput., vol. 63, no. 4, pp. 917–926, Apr. 2014.

[12] A. Lempel and J. Ziv, “On the complexity of finite sequences,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 75–81, Jan. 1976.

[13] A. Gulliver, I. Makwakwa and U. Speidel, “On the generation of aperiodic and periodic necklaces via T-augmentation,” Fundam. Inf., vol. 84, no. 1-2, pp. 91–107, Mar. 2008.

[14] M. Li, X. Chen, B. Ma and P. M. B. Vitányi, “The similarity metric,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3250–3264, Dec. 2004.

[15] C. Ferreira and J. L. López, “Asymptotic expansions of the Hurwitz–Lerch zeta function,” J. Math. Anal. Appl., vol. 298, no. 1, pp. 210–224, Oct. 2004.

[16] R. Lidl and H. Niederreiter, Introduction to Finite Fields and their Applications. New York, NY: Cambridge Univ. Press, 1986.

[17] N. Jacobson, Basic Algebra I, 2nd ed. New York, NY: W. H. Freeman and Company, 1985.

[18] J. A. Gallian, Contemporary Abstract Algebra. 8th ed. Boston, MA: Brooks Cole, 2012.

[19] E. R. Berlekamp, Algebraic Coding Theory. New York, NY: McGraw-Hill, 1968.

[20] S. Roman, Coding and Information Theory. New York, NY: Springer, 1992.

[21] A. Betten, M. Braun, H. Fripertinger, A. Kerber, A. Kohnert and A. Wassermann, Error-Correcting Linear Codes. New York, NY: Springer, 2006.

[22] R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics: A Foundation for Computer Science, 2nd ed. Reading, MA: Addison-Wesley, 1994.

[23] M. Rosen, Number Theory in Function Fields. New York, NY: Springer, 2002.

[24] H. Iwaniec and E. Kowalski, Analytic Number Theory. Providence, RI: Amer. Math. Soc., 2004.

[25] H. von Koch, “Sur la distribution des nombres premiers,” Acta Math., vol. 24, no. 1, pp. 159–182, Dec. 1901.

[26] N. N. Lebedev, Special Functions and their Applications. Englewood Cliffs, NJ: Prentice-Hall Inc., 1965.

[27] A. Jonquière, “Note sur la série $\sum_{n=1}^{\infty} \frac{x^n}{n^s}$,” Bull. Soc. Math. Fr., vol. 17, pp. 142–152, Apr. 1889.

[28] C. Truesdell, “On a function which occurs in the theory of the structure of polymers,” Ann. Math., vol. 1, no. 1, pp. 144–157, Jan. 1945.

[29] H. M. Srivastava and J. Choi, Series Associated with the Zeta and Related Functions. Dordrecht, Netherlands: Kluwer Acad. Pub., 2001.

[30] M. A. Chaudhry and S. M. Zubair, On A Class of Incomplete Gamma Functions with Applications. Boca Raton, FL: Chapman & Hall/CRC, 2002.

[31] J. C. Lagarias and W-C. W. Li, “The Lerch zeta function III. polylogarithms and special values,” Res. Math. Sci., vol. 3, no. 1, pp. 1–54, Feb. 2016.

[32] C. M. Bender and S. A. Orszag, Advanced Mathematical Methods for Scientists and Engineers. New York, NY: McGraw-Hill, 1978.

[33] J. P. Boyd, “The devil’s invention: asymptotic, superasymptotic and hyperasymptotic series,” Acta Appl. Math., vol. 56, no. 1, pp. 1–98, Mar. 1999.

[34] T. K. Petersen, Eulerian Numbers. New York, NY: Springer, 2015.

[35] C. A. Charalambides, Enumerative Combinatorics. Boca Raton, FL: Chapman & Hall/CRC, 2002.

[36] L. Comtet, Advanced Combinatorics: The Art of Finite and Infinite Expansions, Dordrecht, Netherlands: D. Reidel Pub. Co., 1991.

[37] L. Carlitz, D. P. Roselle and R. A. Scoville, “Permutations and sequences with repetitions by number of increases,” J. Combin. Theory, vol. 1, no. 3, pp. 350–374, Dec. 1966.

[38] L. Carlitz, D. C. Kurtz, R. Scoville and O. P. Stackelberg, “Asymptotic properties of Eulerian numbers,” Z. Wahrscheinlichkeitstheorie verw. Geb., vol. 23, no. 1, pp. 47–54, Mar. 1972.

[39] S. Tanny, “A probabilistic interpretation of Eulerian numbers,” Duke Math. J., vol. 40, no. 4, pp. 717–722, Dec. 1973.

[40] E. Gilad and J. B. Keller, “Eulerian number asymptotics,” Proc. R. Soc. Math. Phys. Sci., vol. 445, no. 1994, pp. 291–303, May 1994.

[41] H. S. Wilf, Generatingfunctionology. San Diego, CA: Acad. Press Inc., 1990.

[42] T. W. Gamelin, Complex Analysis. New York, NY: Springer, 2001.

[43] W. Wirtinger, “Über eine besondere Dirichletsche Reihe,” J. Reine Angew. Math., vol. 1905, no. 129, pp. 214–219, Jan. 1905.

[44] T. M. Apostol, Introduction to Analytic Number Theory. New York, NY: Springer, 1976.

[45] H. M. Srivastava and J. Choi, Zeta and q-Zeta Functions and Associated Series and Integrals. London, England: Elsevier, 2012.

[46] G. N. Watson, “The harmonic functions associated with the parabolic cylinder,” Proc. London Math. Soc., vol. 17, no. 1, pp. 116–148, Jan. 1918.

[47] F. W. J. Olver, Asymptotics and Special Functions. New York, NY: Acad. Press, 1974.

[48] F. J. Ursell, “Integrals with a large parameter. A strong form of Watson’s lemma,” in Ship Hydrodynamics, Water Waves and Asymptotics, vol. 2, pp. 848–852, Singapore: World Sci. Pub., 1991.

[49] R. B. Paris, Hadamard Expansions and Hyperasymptotic Evaluation: An Extension of the Method of Steepest Descents. Cambridge, England: Cambridge Univ. Press, 2011.

[50] M. R. Titchener, U. Speidel and J. Yang, “A comparison of practical information measures,” Techn. report, Univ. of Auckland, Auckland, New Zealand, 2005.

[51] U. Guenther, “Robust source coding with generalised T-Codes,” Ph.D. dissertation, Univ. of Auckland, Auckland, New Zealand, 1998.

[52] U. Speidel, “T-complexity and T-information theory – an executive summary,” 2nd revised version. Techn. report, Univ. of Auckland, Auckland, New Zealand, 2006.

[53] J. Yang, “Fast string parsing and its application in information and similarity measurement,” Ph.D. dissertation, Dept. Comput. Sci., Univ. of Auckland, Auckland, New Zealand, 2005.

[54] R. E. A. Eimann, “Network event detection with entropy measures,” Ph.D. dissertation, Dept. Comput. Sci., Univ. of Auckland, Auckland, New Zealand, 2008.

[55] A. V. Aho and J. E. Hopcroft, The Design and Analysis of Computer Algorithms. Boston, MA: Addison-Wesley, 1974.

[56] R. E. Tarjan, Data Structures and Network Algorithms. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1983.

[57] M. Li and P. M. B. Vitányi, “Two decades of applied Kolmogorov complexity: in memoriam Andrei Nikolaevich Kolmogorov 1903–87,” Proc. Structure in Complexity Theory Conf., Cambridge, MA, pp. 80–101, June 1988.

[58] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: John Wiley & Sons, 1991.

[59] R. J. Solomonoff, “Complexity based induction systems: comparisons and convergence theorems,” IEEE Trans. Inf. Theory, vol. 24, no. 4, pp. 422–432, July 1978.

[60] G. J. Chaitin, “A theory of program size formally identical to information theory,” J. ACM, vol. 22, no. 3, pp. 329–340, July 1975.

[61] M. Li and P. M. B. Vitányi, An Introduction to Kolmogorov Complexity and its Applications, 3rd ed. New York, NY: Springer, 2008.

[62] R. Cilibrasi, P. M. B. Vitányi and R. de Wolf, “Algorithmic clustering of music based on string compression,” J. Comput. Music, vol. 28, no. 4, pp. 49–67, Dec. 2004.

[63] R. Cilibrasi and P. M. B. Vitányi, “Clustering by compression,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1523–1545, Apr. 2005.

[64] R. Cilibrasi. The Complearn Toolkit. (2015, Jan.) [Online] Available: http://www.complearn.org.

[65] D. Salomon, G. Motta and D. Bryant, Handbook of Data Compression. New York, NY: Springer, 2009.

[66] J. Ziv and A. Lempel, “A universal algorithm for sequential data compression,” IEEE Trans. Inf. Theory, vol. 23, no. 3, pp. 337–343, May 1977.

[67] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans. Inf. Theory, vol. 24, no. 5, pp. 530–536, Sept. 1978.

[68] D. P. Mehta and S. Sahni, Handbook of Data Structures and Applications. Boca Raton, FL: CRC Press, 2005.

[69] M. Rodeh, V. R. Pratt and S. Even, “Linear algorithm for data compression via string matching,” J. ACM, vol. 28, pp. 16–24, Jan. 1981.

[70] E. M. McCreight, “A space-economical suffix tree construction algorithm,” J. ACM, vol. 23, no. 2, pp. 262–272, Apr. 1976.

[71] M. R. Titchener, R. Nicolescu, L. Staiger, A. Gulliver and U. Speidel, “Deterministic complexity and entropy,” Fundam. Inf., vol. 64, pp. 443–461, July 2004.

[72] E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14, no. 3, pp. 249–260, Sept. 1995.

[73] D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. New York, NY: Cambridge Univ. Press, 1997.

[74] K. Hamano and H. Yamamoto, “A randomness test based on T-complexity,” IEICE Trans. Fundam. Electron., Commun. and Comput. Sci., vol. E93-A, no. 7, pp. 1346–1354, July 2010.

[75] N. Rebenich and S. Neville, Fast low memory T-transform. (2012, Apr.) [Online] Available: http://www.t-codes.org

[76] M. R. Titchener, “Digital encoding by means of new T-codes to provide improved data synchronisation and message integrity,” IEE Proc. E: Comput. and Digit. Tech., vol. 131, no. 4, pp. 151–153, July 1984.

[77] M. R. Titchener and W. B. Ebeling, “Deterministic chaos and information theory,” in Proc. Data Compression Conf., Snowbird, UT, p. 520, Mar. 2001.

[78] J. Yang and U. Speidel, “String parsing-based similarity detection,” in Proc. IEEE Inform. Theory Workshop on Coding and Complexity, Rotorua, New Zealand, pp. 263–267, Sept. 2005.

[79] K. Hamano and H. Yamamoto, “Data compression based on a dictionary method using recursive construction of T-codes,” in Proc. Data Compression Conf., Snowbird, UT, p. 531, Mar. 2010.

[80] U. Speidel, R. Eimann and N. Brownlee, “Detecting network events via T-entropy,” in Proc. Int. Conf. on Inform., Commun. and Signal Process., Singapore, pp. 1–5, Dec. 2007.

[81] U. Speidel and T. A. Gulliver, “A secure authentication system based on variable-length codes,” in Proc. Int. Symp. on Inform. Theory & its Applic., Taichung, Taiwan, pp. 684–689, Nov. 2010.

[82] M. R. Titchener, “T-entropy of EEG/EOG sensitive to sleep state,” in Proc. Symp. on Nonlinear Theory and Appl., Bologna, Italy, pp. 859–862, Sept. 2006.

[83] R. Nicolescu and M. R. Titchener, “Uniqueness theorems for T-codes,” Rom. J. Inform. Sci. Tech., vol. 1, no. 3, pp. 243–258, Mar. 1998.

[84] S. Wackrow and M. R. Titchener, tcalc.c, (2004, Aug.) [Online] Available: http://tcode.auckland.ac.nz/~corpus/tcalc.c

[85] J. Yang and U. Speidel, “An improved T-decomposition algorithm,” in Proc. Joint Conf. of the Int. Conf. on Inf., Commun. and Signal Process. and the Pacific Rim Conf. on Multimedia, Singapore, pp. 1551–1555, Dec. 2003.

[86] J. Yang and U. Speidel, “A fast T-decomposition algorithm,” J. Universal Comput. Sci., vol. 11, no. 6, pp. 1083–1101, June 2005.

[87] J. Yang and U. Speidel, “A T-decomposition algorithm with O(n log n) time and space complexity,” in Proc. IEEE Int. Symp. on Inform. Theory, Adelaide, Australia, pp. 23–27, Sept. 2005.

[88] N. Rebenich, Fast low memory T-Transform: string complexity in linear time and space with applications to Android app store security. M.A.Sc. thesis, Dept. Electr. Eng., Univ. of Victoria, Victoria, BC, Canada, 2012.

[89] M. R. Titchener, “Deterministic computation of complexity, information and entropy,” in Proc. IEEE Int. Symp. on Inform. Theory, Boston, MA, p. 326, Aug. 1998.

[90] U. Speidel and T. A. Gulliver, “An analytic upper bound on T-complexity,” in Proc. IEEE Int. Symp. on Inform. Theory, Cambridge, MA, pp. 2706–2710, July 2012.

[91] Department of Physics, Humboldt University, Berlin. (2014, Nov.) High Bit Rate Quantum Random Number Generator Service. [Online] Available: https://qrng.physik.hu-berlin.de

[92] M. Wahl, M. Leifgen, M. Berlin, T. Röhlicke, H.-J. Rahn and O. Benson, “An ultrafast quantum random number generator with provably bounded output bias based on photon arrival time measurements,” Appl. Phys. Lett., vol. 98, no. 17, pp. 171105/1-3, Apr. 2011.

[93] P. Diaconis, M. McGrath and J. Pitman, “Riffle shuffles, cycles, and descents,” Combinatorica, vol. 15, no. 1, pp. 11–29, Mar. 1995.

[94] N. Metropolis and G.-C. Rota, “Witt vectors and the algebra of necklaces,” Adv. in Math., vol. 50, no. 2, pp. 95–125, Nov. 1983.

[95] I. M. Gessel and C. Reutenauer, “Counting permutations with given cycle structure and descent set,” J. Combin. Theory Ser. A, vol. 64, no. 2, pp. 189–215, Nov. 1993.

[96] R. Arratia, A. D. Barbour and S. Tavaré, Logarithmic Combinatorial Structures: A Probabilistic Approach. Zürich, Switzerland: EMS Pub. House, 2003.

[97] F. Bassino, J. Clément and C. Nicaud, “The standard factorization of Lyndon words: an average point of view,” Discrete Math., vol. 290, no. 1, pp. 1–25, Jan. 2005.

[98] R. Marchand and E. Z. Azad, “Limit law of the length of the standard right factor of a Lyndon word,” Combin. Probab. Comput., vol. 16, no. 3, pp. 417–434, May 2007.

[99] K. Hamano, “Analysis and Applications of the T-complexity,” Ph.D. disserta- tion, Dept. of Complexity Sci. and Engr., Univ. of Tokyo, Tokyo, Japan, 2009.

[100] S. A. Terwijn, L. Torenvliet and P. M. B. Vitányi, “Nonapproximability of the normalized information distance,” in J. Comput. System Sci., vol. 77, no. 4, pp. 738–742, July 2011.

[101] M. Cebrian, M. Alfonseca and A. Ortega, “Common pitfalls using normalized compression distance: what to watch out for in a compressor,” Commun. Inf. and Sys., vol. 5, no. 4, pp. 367–384, Dec. 2005.

[102] M. R. Titchener, “A measure of information,” in Proc. Data Compression Conf., Snowbird, UT, pp. 353–362, Mar. 2000.

[103] P. M. B. Vitányi, F. J. Balbach, R. L. Cilibrasi and M. Li, “Normalized information distance,” in Information Theory and Statistical Learning, F. Emmert-Streib and M. Dehmer, New York, NY: Springer, 2009, pp. 45–82.

[104] J. Cleary and I. Witten, “Data compression using adaptive coding and partial string matching.” IEEE Trans. Commun., vol. 32, no. 4, pp. 396–402, Apr. 1984.

[105] The Apache Software Foundation. (2004, Jan.) Apache license, version 2.0. [Online] Available: http://www.apache.org/licenses/LICENSE-2.0