Counting Prime Polynomials and Measuring Complexity and Similarity of Information

Counting Prime Polynomials and
Measuring Complexity and Similarity of
Information

by
Niko Rebenich
B.Eng., University of Victoria, 2007 M.A.Sc., University of Victoria, 2012

A Dissertation Submitted in Partial Fulꢀllment of the
Requirements for the Degree of

DOCTOR OF PHILOSOPHY in the Department of Electrical and Computer Engineering

© Niko Rebenich, 2016
University of Victoria

Counting Prime Polynomials and
Measuring Complexity and Similarity of
Information

by
Niko Rebenich
B.Eng., University of Victoria, 2007 M.A.Sc., University of Victoria, 2012

Supervisory Committee Dr. Stephen Neville, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. T. Aaron Gulliver, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. Venkatesh Srinivasan, Outside Member (Department of Computer Science) iii

Supervisory Committee

Dr. Stephen Neville, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. T. Aaron Gulliver, Co-supervisor (Department of Electrical and Computer Engineering)

Dr. Venkatesh Srinivasan, Outside Member (Department of Computer Science)

ABSTRACT

This dissertation explores an analogue of the prime number theorem for polynomials over ꢀnite ꢀelds as well as its connection to the necklace factorization algorithm T-transform and the string complexity measure T-complexity. Speciꢀcally, a precise asymptotic expansion for the prime polynomial counting function is derived. The approximation given is more accurate than previous results in the literature while requiring very little computational eꢁort. In this context asymptotic series expansions for Lerch transcendent, Eulerian polynomials, truncated polylogarithm, and polylogarithms of negative integer order are also provided. The expansion formulas developed are general and have applications in numerous areas other than the enumeration of prime polynomials.
A bijection between the equivalence classes of aperiodic necklaces and monic prime polynomials is utilized to derive an asymptotic bound on the maximal T- complexity value of a string. Furthermore, the statistical behaviour of uniform random sequences that are factored via the T-transform are investigated, and an accurate probabilistic model for short necklace factors is presented.
Finally, a T-complexity based conditional string complexity measure is proposed and used to deꢀne the normalized T-complexity distance that measures similarity between strings. The T-complexity distance is proven to not be a metric. However, the measure can be computed in linear time and space making it a suitable choice for large data sets. iv

Contents

Supervisory Committee Abstract

ii iii iv

Table of Contents List of Tables

vii ix

List of Figures List of Nomenclature Acknowledgements Dedication

xi xv xvi

1 Introduction

1

34

1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 Algebraic and Number Theory Background

6

6799

2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Cyclic Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3 Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.3.1 Finite Field Extensions . . . . . . . . . . . . . . . . . . . . . . .
2.4 Primitive Roots of Unity and Cyclotomic Cosets . . . . . . . . . . . . . 12 2.5 Monic Irreducible Polynomials and Necklaces . . . . . . . . . . . . . . 14
2.5.1 Bounding the Number of Monic Irreducible Polynomials . . . 17 2.5.2 Density of Monic Irreducible Polynomials . . . . . . . . . . . . 18 2.5.3 Aperiodic and Periodic Necklaces . . . . . . . . . . . . . . . . 19 v
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3 An Analogue of the Prime Number Theorem for Polynomials over
Finite Fields

24

3.1 Enumeration of Prime Polynomials . . . . . . . . . . . . . . . . . . . . 25 3.2 Asymptotic Expansions of the Truncated Polylogarithm . . . . . . . . 29 3.3 The Prime Polynomial Theorem for Finite Fields . . . . . . . . . . . . 41 3.4 Computational Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4 The T-Transform and T-Complexity

57

4.1 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . 57
4.1.1 Computational Complexity . . . . . . . . . . . . . . . . . . . . 57 4.1.2 Algorithmic Complexity . . . . . . . . . . . . . . . . . . . . . . 58 4.1.3 Deterministic Complexity and Randomness . . . . . . . . . . . 58
4.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.3 T-Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 4.4 The T-Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4.1 The Naïve T-Transform Algorithm . . . . . . . . . . . . . . . . 66 4.4.2 T-Transform Algorithm Evolution . . . . . . . . . . . . . . . . . 69
4.5 T-Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5.1 Bounding T-Complexity . . . . . . . . . . . . . . . . . . . . . . 71
4.6 Computational Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5 The T-Complexity of Uniformly Distributed Random Sequences

78

5.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 5.2 Conjectures on the Statistics of the T-Transform . . . . . . . . . . . . . 79
5.2.1 The T-augmentation Level Distribution of Short Necklaces . . 82 5.2.2 The T-handle Length Distribution of Short Necklaces . . . . . 92 5.2.3 Beyond Short Necklaces . . . . . . . . . . . . . . . . . . . . . . 98
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6 Measuring String Similarity

104

6.1 The Normalized Information Distance . . . . . . . . . . . . . . . . . . 104 6.2 The Normalized Compression Distance . . . . . . . . . . . . . . . . . 105 vi
6.3 The Normalized T-Complexity Distance . . . . . . . . . . . . . . . . . 107
6.3.1 Metric Violation . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7 Conclusions

114

7.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

A Supplemental Materials

117

A.1 Maple Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

Bibliography

121

vii

List of Tables

Table 2.1 Table 2.2 Table 2.3
Finite ꢀeld representation for F₂[t]/(t⁴+ t + 1). . . . . . . . . . 20 Cyclotomic cosets for F₁₆. . . . . . . . . . . . . . . . . . . . . . 21

m

Cyclotomic cosets of F₂and binary necklaces of length m=4. 22
Table 3.1 Table 3.2 Table 3.3 Table 3.4 Table 3.5 Table 3.6 Table 3.7 Table 3.8
Asymptotic approximations to A_n,K(z) for z = 0.2. . . . . . . . 45 Asymptotic approximations to A_n,K(z) for z = 2. . . . . . . . . 46 Asymptotic approximations to A_n,K(z) for z = −7 + 11i. . . . . 47 Eulerian number triangle. . . . . . . . . . . . . . . . . . . . . . 48 Asymptotic approximations to L_N(z,s,m) for z = 3, s = 1. . . . 49 Relative approximation error of L_N(z,s,m) for z = 3, s = 1. . . . 51 Relative approximation error of L_N(z,s,m) for z = 1.25, s = 2. . 52 Relative approximation error of L_N(z,s,m) for z = −9 + 2.5i, s = 5. . . . . . . . . . . . . . . . . . . . . . . . . . 53

Absolute approximation error of the monic prime

Table 3.9

polynomial counting function for q = 2. . . . . . . . . . . . . . 54
Table 3.10 Relative approximation error of the monic prime polynomial counting function estimates for q = 2. . . . . . . . 55

Table 4.1

Computational complexity of LZ string factorization

algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 Computational complexity of T-transform algorithms. . . . . . 70 Comparison of lower and asymptotic bound on maximal T-complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Comparison of upper and asymptotic bound on maximal T-complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Table 4.2 Table 4.3

Table 4.4

Table 5.1

Exponential T-augmentation level probability model

parameter estimation. . . . . . . . . . . . . . . . . . . . . . . . 87 viii

Table 5.2

Goodness of ꢀt test and exponential PDF parameter

estimations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Table 6.1 Table 6.2 Table 6.3
T-transform of string x#y. . . . . . . . . . . . . . . . . . . . . . 108 T-transform of string y#x. . . . . . . . . . . . . . . . . . . . . . 109 T-transform results for all pairwise concatenations of the three strings x, y, and z. . . . . . . . . . . . . . . . . . . . . . . 112 ix

List of Figures

Figure 2.1 Binary necklaces of length m = 4. . . . . . . . . . . . . . . . . . 22 Figure 3.1 Comparison of empirical and optimal truncation of
L_N(z,s,m) for z = 3, s = 1. . . . . . . . . . . . . . . . . . . . . . 48
Figure 3.2 Absolute approximation error of L_N(z,s,m) for z = 3, s = 1 under optimal truncation. . . . . . . . . . . . . . . 50

Figure 4.1 Example of a binary T-code construction. . . . . . . . . . . . . 64 Figure 4.2 Pseudo-code listing of naïve T-transform algorithm. . . . . . . 66 Figure 4.3 T-transform at intermediate T-augmentation level i. . . . . . . 67 Figure 4.4 Comparison of upper, lower, and asymptotic bound on maximal T-complexity. . . . . . . . . . . . . . . . . . . . . . . . 75

Figure 5.1 T-complexity of random sequence x versus minimal and maximal T-complexity bounds. . . . . . . . . . . . . . . . . . . 80
Figure 5.2 Histogram of υ(x) for |x| = 2³²bits for 512 binary uniform random sequences. . . . . . . . . . . . . . . . . . . . . . . . . . 81
Figure 5.3 Empirical and modelled cumulative distribution function of T-augmentation levels of x. . . . . . . . . . . . . . . 82
Figure 5.4 Probability of the occurrence of a necklace of length m at
T-augmentation level ℓ. . . . . . . . . . . . . . . . . . . . . . . . 84
Figure 5.5 Quantile-quantile plot for necklaces length 1 to 10 over ℓ. . . . 85 Figure 5.6 Quantile-quantile plot for necklaces length 11 to 20 over ℓ. . . 86 Figure 5.7 Empirical and modelled CDFs for T-augmentation level ℓ. . . 89 Figure 5.8 Error between empirical and modelled PDFs for
T-augmentation level ℓ with m from 1 to 10. . . . . . . . . . . . 90
Figure 5.9 Error between empirical and modelled PDFs for
T-augmentation level ℓ with m from 11 to 20. . . . . . . . . . . 91
Figure 5.10 Quantile-quantile plot for necklaces length 1 to 10 over h. . . . 93 x
Figure 5.11 Quantile-quantile plot for necklaces length 11 to 20 over h. . . 94 Figure 5.12 Empirical and modelled CDFs for T-handle length |x˜_i|. . . . . . 95 Figure 5.13 Error between empirical and modelled PDFs for
T-handle length h with m from 1 to 10. . . . . . . . . . . . . . . 96
Figure 5.14 Error between empirical and modelled PDFs for
T-handle length h with m from 11 to 20. . . . . . . . . . . . . . 97
Figure 5.15 Modelled and average empirical necklace count per length m over 512 trials. . . . . . . . . . . . . . . . . . . . . . . 99
Figure 5.16 Sample standard deviation for the necklace count per length m over 512 trials. . . . . . . . . . . . . . . . . . . . . . . 100
Figure 5.17 Error between modelled and average empirical necklace count per length m. . . . . . . . . . . . . . . . . . . . . . . . . . 101
Figure 5.18 Logarithmic ratio of modelled and average empirical necklace count per length m. . . . . . . . . . . . . . . . . . . . . 102 xi

Nomenclature

Mathematical Functions

A(n,k)

A_n(z)

B_n

Eulerian number The nth Eulerian polynomial in z The nth Bernoulli number B_n= B_n(0) with B₀= 1 The nth Bernoulli polynomial in x T-complexity of the string x

B_n(x) C_T(x) C_T(x)

Maximal T-complexity bound of strings of length |x| Normalized information distance of x and y Normalized compression distance of x and y Normalized T-complexity distance of x and y Exponential integral

max

d_NID(x,y)

dd

_{NC D}(x,y) _{NT C}(x,y)

Ei(x)

gcd(a,b)

ℑ(z)

Greatest common divisor of integers a and b Imaginary part of complex number z Least common multiple of integers a and b Logarithmic integral lcm(a,b)

li(x) Li(x)

Oꢁset logarithmic integral

Li_s(z)

Polylogarithm, also known as Jonquière’s function Truncated polylogarithm function

L(z,s,m)

xii

logz

Natural logarithm function

log_bz

L_p(m)

Logarithm function of base b The number of monic irreducible polynomials of degree m or the number of Lyndon words of length m

µ(n)

Möbius function

N_q(m)

The number of distinct monic irreducible polynomials over F_qof degree d 6 m such that d|m

O( · )

Landau gauge in asymptotics and big O notation in computer science

ord(a)

ϕ(n)

Order of the element a of a cyclic group Euler’s totient function

Φ(z,s,a) π(x)

Lerch transcendent Prime counting function enumerating the number of primes less than or equal to x

π_q(m)

Prime polynomial counting function enumerating the monic irreducible polynomials of degree m or less in F_q[t]

PP(g,z) ℜ(z)

Principal part of the Laurent series of the function g about z Real part of complex number z

Res(g,z)

Residue of the function g at z

Mathematical Symbols and Notation

·

Operation or variable placeholder Set product or scalar multiplication Plus or minus

×

,

Negation of equality xiii

≡

∼

≪≫∀

Equality as per modulo operation mod Asymptoticity, f ∼ g implies f /g → 1 Much less than Much greater than For all

⊂∩∪\

Set containment relation Set intersection operator Set union operator Set diꢁerence

∅

Empty set

∈

Set membership

<

Negation of set membership

| · |

Magnitude of complex number, cardinality of set, or length of a string

:

{ }

Deꢀnes properties of elements of set Set of elements a, b, and c

{a,b,c} [a,b]

d|n

→

Set of real numbers between a and b The integer d divides the integer n Convergence

→

Function mapping

C

Set of complex numbers

F_q

Finite ꢀeld withq elements whereq is a prime power, F_q= (Z_q,+,×) Multiplicative group of the ꢀnite ꢀeld F_q, F_q^∗= (Z_q\{0},×)

F_q^∗

xiv

F_q[t]

k_i

Univariate polynomials with coeꢂcients in F_qThe ith copy factor lim mod
Limit value Modulo operation, gives remainder after division of one integer by another

N

N⁺

N_∗

p_i

Set of nonnegative integers, N = {0,1,. . . } Set of positive integers excluding zero The index of least term of an optimally truncated series The ith copy pattern or the ith distinct prime factor of an integer Set of real numbers

R

R⁺

ρ

Set of positive real numbers excluding zero Signiꢀcance level of statistical test

S

Alphabet set S = {a₁,a₂,a₃,. . . ,a_q−1,a_q} where a_iare symbols Set of all strings including the empty string Set of all strings excluding the empty string T-code at T-augmentation level j

S^∗

+

S

(k₁,k₂,...,k_j)

S

(p₁,p₂,...,p_j)

Z

Set of positive and negative integers including zero Set of positive integers excluding zero

Z⁺

Z⁻

Set of negative negative integers excluding zero Cyclic group of q elements

(Z_q,·)

xv
ACKNOWLEDGEMENTS
The three years I have spent studying for this Ph.D. have been a great experience for me. I feel that I have learned a lot in this time, not only academically but also personally, and I am indebted to the many people that have supported me along the way.
First and foremost, I would like to thank my Co-supervisors Aaron Gulliver and Stephen Neville. Thank you Aaron for being so generous with your time, always asking the right questions, and motivating me to push further than I thought I could go. When I needed a little advice, you always had some to spare. Without your guidance I would not have written this dissertation. Your knowledge, enthusiasm, and sense of humour have been an inspiration for me and made my Ph.D. so enjoyable. Thank you Stephen for convincing me to do this Ph.D. in the ꢀrst place, your time, advice, feedback, and ꢀnancial support was much appreciated.
A very special thanks also goes to Ulrich Speidel at the University of Auckland, who has always been willing to share his advice and expertise with me. I loved going on all the hiking trips with you when you where here. Thanks for being such a great person and for reading all my drafts so carefully.
Thank you Dr. Wu-Sheng Lu for opening my eyes to convex optimization, you truly taught the best class I ever took.
Thanks also go to my many friends (you know who you are). Without you guys this Ph.D. would have been much less exciting.
A big thank-you also goes to my family away from home, Penny and Doug, thanks for being so wonderful. Most of all, I would like to thank my parents along with my brothers. Thank you for all your support and your continued love and encouragement over all these years. Papi, Mami, Jan, and Till thanks for always being there for me.

Divergent series are the invention of the devil, and it is shameful to base on them any demonstrations whatsoever.

Niels Henrik Abel xvi
DEDICATION
For my father, with love.
1

Chapter 1 Introduction

In mathematics, hardly any topic has intrigued curious minds more than the study of prime numbers. A prime number is a positive integer larger than one that has no positive divisors other than one and itself. All integers other than zero and one can be factored into a sequence of primes. However, for large integers prime factorization is a computationally hard problem which is exploited in cryptography for the secure exchange of information over an untrusted communication link.
Among the positive integers prime numbers seem to be randomly distributed, yet on close inspection their asymptotic distribution shows remarkable regularity. Towards the end of the 18th century Gauss noticed that the probability that a randomly chosen integer less than x is a prime number is close to 1/ logx. He later conjectured that the prime counting function enumerating the number of primes less than or equal to x is asymptotically given by the oꢁset logarithmic integral which may be approximated in terms of a divergent series expansion as follows