Counting Prime Polynomials and
Measuring Complexity and Similarity of
Information
by
Niko Rebenich
B.Eng., University of Victoria, 2007
M.A.Sc., University of Victoria, 2012
A Dissertation Submitted in Partial Fulfillment of the
Requirements for the Degree of
DOCTOR OF PHILOSOPHY
in the Department of Electrical and Computer Engineering
© Niko Rebenich, 2016
University of Victoria
All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author.
Supervisory Committee
Dr. Stephen Neville, Co-supervisor (Department of Electrical and Computer Engineering)
Dr. T. Aaron Gulliver, Co-supervisor (Department of Electrical and Computer Engineering)
Dr. Venkatesh Srinivasan, Outside Member (Department of Computer Science)
ABSTRACT
This dissertation explores an analogue of the prime number theorem for polynomials over finite fields as well as its connection to the necklace factorization algorithm T-transform and the string complexity measure T-complexity. Specifically, a precise asymptotic expansion for the prime polynomial counting function is derived. The approximation given is more accurate than previous results in the literature while requiring very little computational effort. In this context, asymptotic series expansions for the Lerch transcendent, Eulerian polynomials, the truncated polylogarithm, and polylogarithms of negative integer order are also provided. The expansion formulas developed are general and have applications in numerous areas other than the enumeration of prime polynomials.
A bijection between the equivalence classes of aperiodic necklaces and monic prime polynomials is utilized to derive an asymptotic bound on the maximal T-complexity value of a string. Furthermore, the statistical behaviour of uniform random sequences that are factored via the T-transform is investigated, and an accurate probabilistic model for short necklace factors is presented.
Finally, a T-complexity based conditional string complexity measure is proposed and used to define the normalized T-complexity distance that measures similarity between strings. The T-complexity distance is proven to not be a metric. However, the measure can be computed in linear time and space, making it a suitable choice for large data sets.
Contents
Supervisory Committee
Abstract
Table of Contents
List of Tables
List of Figures
List of Nomenclature
Acknowledgements
Dedication
1 Introduction
    1.1 Contributions
    1.2 Dissertation Outline
2 Algebraic and Number Theory Background
    2.1 Notation
    2.2 Cyclic Groups
    2.3 Finite Fields
        2.3.1 Finite Field Extensions
    2.4 Primitive Roots of Unity and Cyclotomic Cosets
    2.5 Monic Irreducible Polynomials and Necklaces
        2.5.1 Bounding the Number of Monic Irreducible Polynomials
        2.5.2 Density of Monic Irreducible Polynomials
        2.5.3 Aperiodic and Periodic Necklaces
    2.6 Summary
3 An Analogue of the Prime Number Theorem for Polynomials over Finite Fields
    3.1 Enumeration of Prime Polynomials
    3.2 Asymptotic Expansions of the Truncated Polylogarithm
    3.3 The Prime Polynomial Theorem for Finite Fields
    3.4 Computational Results
    3.5 Summary
4 The T-Transform and T-Complexity
    4.1 Background and Related Work
        4.1.1 Computational Complexity
        4.1.2 Algorithmic Complexity
        4.1.3 Deterministic Complexity and Randomness
    4.2 Notation
    4.3 T-Augmentation
    4.4 The T-Transform
        4.4.1 The Naïve T-Transform Algorithm
        4.4.2 T-Transform Algorithm Evolution
    4.5 T-Complexity
        4.5.1 Bounding T-Complexity
    4.6 Computational Results
    4.7 Summary
5 The T-Complexity of Uniformly Distributed Random Sequences
    5.1 Related Work
    5.2 Conjectures on the Statistics of the T-Transform
        5.2.1 The T-augmentation Level Distribution of Short Necklaces
        5.2.2 The T-handle Length Distribution of Short Necklaces
        5.2.3 Beyond Short Necklaces
    5.3 Summary
6 Measuring String Similarity
    6.1 The Normalized Information Distance
    6.2 The Normalized Compression Distance
    6.3 The Normalized T-Complexity Distance
        6.3.1 Metric Violation
    6.4 Summary
7 Conclusions
    7.1 Future Work
A Supplemental Materials
    A.1 Maple Source Code
Bibliography
List of Tables
Table 2.1 Finite field representation for $\mathbb{F}_2[t]/(t^4 + t + 1)$.
Table 2.2 Cyclotomic cosets for $\mathbb{F}_{16}$.
Table 2.3 Cyclotomic cosets of $\mathbb{F}_{2^m}$ and binary necklaces of length $m = 4$.
Table 3.1 Asymptotic approximations to $A_{n,K}(z)$ for $z = 0.2$.
Table 3.2 Asymptotic approximations to $A_{n,K}(z)$ for $z = 2$.
Table 3.3 Asymptotic approximations to $A_{n,K}(z)$ for $z = -7 + 11i$.
Table 3.4 Eulerian number triangle.
Table 3.5 Asymptotic approximations to $L_N(z,s,m)$ for $z = 3$, $s = 1$.
Table 3.6 Relative approximation error of $L_N(z,s,m)$ for $z = 3$, $s = 1$.
Table 3.7 Relative approximation error of $L_N(z,s,m)$ for $z = 1.25$, $s = 2$.
Table 3.8 Relative approximation error of $L_N(z,s,m)$ for $z = -9 + 2.5i$, $s = 5$.
Table 3.9 Absolute approximation error of the monic prime polynomial counting function for $q = 2$.
Table 3.10 Relative approximation error of the monic prime polynomial counting function estimates for $q = 2$.
Table 4.1 Computational complexity of LZ string factorization algorithms.
Table 4.2 Computational complexity of T-transform algorithms.
Table 4.3 Comparison of lower and asymptotic bound on maximal T-complexity.
Table 4.4 Comparison of upper and asymptotic bound on maximal T-complexity.
Table 5.1 Exponential T-augmentation level probability model parameter estimation.
Table 5.2 Goodness of fit test and exponential PDF parameter estimations.
Table 6.1 T-transform of string x#y.
Table 6.2 T-transform of string y#x.
Table 6.3 T-transform results for all pairwise concatenations of the three strings x, y, and z.
List of Figures
Figure 2.1 Binary necklaces of length $m = 4$.
Figure 3.1 Comparison of empirical and optimal truncation of $L_N(z,s,m)$ for $z = 3$, $s = 1$.
Figure 3.2 Absolute approximation error of $L_N(z,s,m)$ for $z = 3$, $s = 1$ under optimal truncation.
Figure 4.1 Example of a binary T-code construction.
Figure 4.2 Pseudo-code listing of naïve T-transform algorithm.
Figure 4.3 T-transform at intermediate T-augmentation level $i$.
Figure 4.4 Comparison of upper, lower, and asymptotic bound on maximal T-complexity.
Figure 5.1 T-complexity of random sequence $x$ versus minimal and maximal T-complexity bounds.
Figure 5.2 Histogram of $\upsilon(x)$ for |x| = 232 bits for 512 binary uniform random sequences.
Figure 5.3 Empirical and modelled cumulative distribution function of T-augmentation levels of $x$.
Figure 5.4 Probability of the occurrence of a necklace of length $m$ at T-augmentation level $\ell$.
Figure 5.5 Quantile-quantile plot for necklaces length 1 to 10 over $\ell$.
Figure 5.6 Quantile-quantile plot for necklaces length 11 to 20 over $\ell$.
Figure 5.7 Empirical and modelled CDFs for T-augmentation level $\ell$.
Figure 5.8 Error between empirical and modelled PDFs for T-augmentation level $\ell$ with $m$ from 1 to 10.
Figure 5.9 Error between empirical and modelled PDFs for T-augmentation level $\ell$ with $m$ from 11 to 20.
Figure 5.10 Quantile-quantile plot for necklaces length 1 to 10 over $h$.
Figure 5.11 Quantile-quantile plot for necklaces length 11 to 20 over $h$.
Figure 5.12 Empirical and modelled CDFs for T-handle length $|\tilde{x}_i|$.
Figure 5.13 Error between empirical and modelled PDFs for T-handle length $h$ with $m$ from 1 to 10.
Figure 5.14 Error between empirical and modelled PDFs for T-handle length $h$ with $m$ from 11 to 20.
Figure 5.15 Modelled and average empirical necklace count per length $m$ over 512 trials.
Figure 5.16 Sample standard deviation for the necklace count per length $m$ over 512 trials.
Figure 5.17 Error between modelled and average empirical necklace count per length $m$.
Figure 5.18 Logarithmic ratio of modelled and average empirical necklace count per length $m$.
Nomenclature
Mathematical Functions
$A(n,k)$    Eulerian number
$A_n(z)$    The nth Eulerian polynomial in $z$
$B_n$    The nth Bernoulli number $B_n = B_n(0)$ with $B_0 = 1$
$B_n(x)$    The nth Bernoulli polynomial in $x$
$C_T(x)$    T-complexity of the string $x$
$C_{T_{\max}}(x)$    Maximal T-complexity bound of strings of length $|x|$
$d_{NID}(x,y)$    Normalized information distance of $x$ and $y$
$d_{NCD}(x,y)$    Normalized compression distance of $x$ and $y$
$d_{NTC}(x,y)$    Normalized T-complexity distance of $x$ and $y$
$\mathrm{Ei}(x)$    Exponential integral
$\gcd(a,b)$    Greatest common divisor of integers $a$ and $b$
$\Im(z)$    Imaginary part of complex number $z$
$\mathrm{lcm}(a,b)$    Least common multiple of integers $a$ and $b$
$\mathrm{li}(x)$    Logarithmic integral
$\mathrm{Li}(x)$    Offset logarithmic integral
$\mathrm{Li}_s(z)$    Polylogarithm, also known as Jonquière's function
$L(z,s,m)$    Truncated polylogarithm function
$\log z$    Natural logarithm function
$\log_b z$    Logarithm function of base $b$
$L_p(m)$    The number of monic irreducible polynomials of degree $m$ or the number of Lyndon words of length $m$
$\mu(n)$    Möbius function
$N_q(m)$    The number of distinct monic irreducible polynomials over $\mathbb{F}_q$ of degree $d \le m$ such that $d \mid m$
$O(\cdot)$    Landau gauge in asymptotics and big O notation in computer science
$\mathrm{ord}(a)$    Order of the element $a$ of a cyclic group
$\varphi(n)$    Euler's totient function
$\Phi(z,s,a)$    Lerch transcendent
$\pi(x)$    Prime counting function enumerating the number of primes less than or equal to $x$
$\pi_q(m)$    Prime polynomial counting function enumerating the monic irreducible polynomials of degree $m$ or less in $\mathbb{F}_q[t]$
$\mathrm{PP}(g,z)$    Principal part of the Laurent series of the function $g$ about $z$
$\Re(z)$    Real part of complex number $z$
$\mathrm{Res}(g,z)$    Residue of the function $g$ at $z$
Mathematical Symbols and Notation
$\cdot$    Operation or variable placeholder
$\times$    Set product or scalar multiplication
$\pm$    Plus or minus
$\neq$    Negation of equality
$\equiv$    Equality as per modulo operation mod
$\sim$    Asymptoticity, $f \sim g$ implies $f/g \to 1$
$\ll$    Much less than
$\gg$    Much greater than
$\forall$    For all
$\subset$    Set containment relation
$\cap$    Set intersection operator
$\cup$    Set union operator
$\setminus$    Set difference
$\emptyset$    Empty set
$\in$    Set membership
$\notin$    Negation of set membership
$|\cdot|$    Magnitude of complex number, cardinality of set, or length of a string
$\{\,:\,\}$    Defines properties of elements of set
$\{a,b,c\}$    Set of elements $a$, $b$, and $c$
$[a,b]$    Set of real numbers between $a$ and $b$
$d \mid n$    The integer $d$ divides the integer $n$
$\to$    Convergence
$\to$    Function mapping
$\mathbb{C}$    Set of complex numbers
$\mathbb{F}_q$    Finite field with $q$ elements where $q$ is a prime power, $\mathbb{F}_q = (\mathbb{Z}_q, +, \times)$
$\mathbb{F}_q^*$    Multiplicative group of the finite field $\mathbb{F}_q$, $\mathbb{F}_q^* = (\mathbb{Z}_q \setminus \{0\}, \times)$
$\mathbb{F}_q[t]$    Univariate polynomials with coefficients in $\mathbb{F}_q$
$k_i$    The ith copy factor
$\lim$    Limit value
$\bmod$    Modulo operation, gives remainder after division of one integer by another
$\mathbb{N}$    Set of nonnegative integers, $\mathbb{N} = \{0, 1, \ldots\}$
$\mathbb{N}^+$    Set of positive integers excluding zero
$N^*$    The index of least term of an optimally truncated series
$p_i$    The ith copy pattern or the ith distinct prime factor of an integer
$\mathbb{R}$    Set of real numbers
$\mathbb{R}^+$    Set of positive real numbers excluding zero
$\rho$    Significance level of statistical test
$S$    Alphabet set $S = \{a_1, a_2, a_3, \ldots, a_{q-1}, a_q\}$ where $a_i$ are symbols
$S^*$    Set of all strings including the empty string
$S^+$    Set of all strings excluding the empty string
$S^{(k_1,k_2,\ldots,k_j)}_{(p_1,p_2,\ldots,p_j)}$    T-code at T-augmentation level $j$
$\mathbb{Z}$    Set of positive and negative integers including zero
$\mathbb{Z}^+$    Set of positive integers excluding zero
$\mathbb{Z}^-$    Set of negative integers excluding zero
$(\mathbb{Z}_q, \cdot)$    Cyclic group of $q$ elements
ACKNOWLEDGEMENTS
The three years I have spent studying for this Ph.D. have been a great experience for me. I feel that I have learned a lot in this time, not only academically but also personally, and I am indebted to the many people that have supported me along the way.
First and foremost, I would like to thank my Co-supervisors Aaron Gulliver and Stephen Neville. Thank you, Aaron, for being so generous with your time, always asking the right questions, and motivating me to push further than I thought I could go. When I needed a little advice, you always had some to spare. Without your guidance I would not have written this dissertation. Your knowledge, enthusiasm, and sense of humour have been an inspiration for me and made my Ph.D. so enjoyable. Thank you, Stephen, for convincing me to do this Ph.D. in the first place; your time, advice, feedback, and financial support were much appreciated.
A very special thanks also goes to Ulrich Speidel at the University of Auckland, who has always been willing to share his advice and expertise with me. I loved going on all the hiking trips with you when you were here. Thanks for being such a great person and for reading all my drafts so carefully.
Thank you Dr. Wu-Sheng Lu for opening my eyes to convex optimization, you truly taught the best class I ever took.
Thanks also go to my many friends (you know who you are). Without you guys this Ph.D. would have been much less exciting.
A big thank-you also goes to my family away from home, Penny and Doug, thanks for being so wonderful. Most of all, I would like to thank my parents along with my brothers. Thank you for all your support and your continued love and encouragement over all these years. Papi, Mami, Jan, and Till thanks for always being there for me.
Divergent series are the invention of the devil, and it is shameful to base on them any demonstrations whatsoever.
Niels Henrik Abel
DEDICATION
For my father, with love.
Chapter 1 Introduction
In mathematics, hardly any topic has intrigued curious minds more than the study of prime numbers. A prime number is a positive integer larger than one that has no positive divisors other than one and itself. Every integer greater than one can be factored into a product of primes. However, for large integers prime factorization is a computationally hard problem, a fact that is exploited in cryptography for the secure exchange of information over an untrusted communication link.
Among the positive integers prime numbers seem to be randomly distributed, yet on close inspection their asymptotic distribution shows remarkable regularity. Towards the end of the 18th century Gauss noticed that the probability that a randomly chosen integer less than $x$ is a prime number is close to $1/\log x$. He later conjectured that the prime counting function enumerating the number of primes less than or equal to $x$ is asymptotically given by the offset logarithmic integral which may be approximated in terms of a divergent series expansion as follows