University of Calgary PRISM: University of Calgary's Digital Repository
Graduate Studies The Vault: Electronic Theses and Dissertations
2014-09-30 New Notions of Secrecy and User Generated Randomness in Cryptography
Alimomeni, Mohsen
Alimomeni, M. (2014). New Notions of Secrecy and User Generated Randomness in Cryptography (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/27097 http://hdl.handle.net/11023/1874 doctoral thesis
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY
New Notions of Secrecy and
User Generated Randomness in Cryptography
by
Mohsen Alimomeni
A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
CALGARY, ALBERTA
SEPTEMBER, 2014
© Mohsen Alimomeni 2014
Abstract
Randomness plays a central role in computer science, in particular cryptography. Almost all
cryptographic primitives depend crucially on randomness, because randomness and unpredictability in secret keys provide the means for security. Usually one assumes that perfect randomness, a sequence of independently and uniformly distributed bits, is accessible to algorithms. This is a strong assumption: physical sources of randomness are neither uniformly random nor guaranteed to produce independent bits. Therefore, the aim of this thesis is to start
from a realistic model of randomness, investigate notions of secrecy and their randomness
requirements and finally find practical methods for generation of randomness that matches
the requirements of cryptographic primitives. We consider a model of random source where
the source output follows one distribution from a set of possible distributions, each with the
property that the maximum probability of symbols is bounded and cannot be arbitrarily
close to 1. This model does not assume independence or uniformity of the output symbols,
and is considered to be a realistic model of randomness. From this point, the thesis can be
divided into two main parts:
In the first part, considering various notions of information theoretic secrecy, a fundamental
problem is to find the properties of randomness needed to achieve security in these notions.
Traditional cryptographic protocols simply assume perfect randomness and build on this
assumption. We explore the results that show secrecy can not be based on imperfect random-
ness that is not uniform or independent. Thus a line of work attempts to relax notions of
secrecy in such a way that they can be constructed with non-perfect sources, and possibly
require smaller key sizes. Yet they should match real life applications. An important work
in this context is entropic security where the key size could be smaller than the message
depending on message distribution. Inspired by this, we propose two relaxed notions of
secrecy that are motivated by practical applications. In the first notion, motivated by an
application in biometric authentication, we propose guessing secrecy, where the probability of guessing the message for a computationally unbounded adversary with the best strategy
remains the same when a ciphertext is given or when it is not. We compare the randomness
requirements of guessing secrecy with stronger notions and show that in some cases such
as key length, the requirements are the same. For key distribution however, we found a
family of distributions that provide guessing secrecy but not perfect secrecy. In the second
notion, we investigate randomness requirements of multiple message encryption. Considering
a natural extension of secrecy definition to multiple messages, we show that independent
keys are needed to encrypt each message. We then propose a relaxed notion in which the security of the last message is more important than that of past messages, while the leakage of past messages remains bounded in terms of entropic security. Under this assumption, we achieve a key length smaller than that required for ε-indistinguishability, and comparable to the key length for entropic security.
This notion has applications such as location privacy.
In the second part of the thesis, since secrecy crucially depends on perfect randomness, we
investigate how perfect randomness can be practically generated, specifically from human
game-play. Unlike many random number generators that assume independent or uniform
random bits in the random source, we base our work on the realistic model of randomness.
Our main observation is that human game-play has an element of randomness coming from
the errors in their game-play which is the main entertaining factor of the game. We also
observed that this game-play can distinguish among a group of people if the right features
are collected from the game-play. We incrementally changed our game design until the distinguishability among a small population was maximized, and then ran the experiments required to show the viability of this approach over a larger population. This approach can
also provide a hard-to-delegate authentication property, where a human could not emulate the behavior of another human even given statistical information about their game-play.

Acknowledgments
My research and this thesis would not have been possible without the help and support of
kind people around me; my supervisor, committee members, friends, family and my wife.
First and foremost I wish to express my deepest appreciation to my supervisor, Reihaneh
Safavi-Naini, for her continuous support during my PhD study and research. Your patience
in reading the drafts of my work over and over helped me learn a lot from your comments
and views on topics discussed in this thesis. The joy and enthusiasm you have for research was contagious and motivational for me.
My sincere thanks go to my supervisory committee, Philipp Woelfel and Payman Mohassel, who helped me during different stages of my research at UofC. I would like to thank my
examiners Keith Martin and Christoph Simon for the comments and suggestions that improved
this work. I gratefully acknowledge the funding sources that made my Ph.D. work possible.
The funding was provided as scholarships and teaching and research assistantships I received from the Department of Computer Science at UofC and Alberta Innovates Technology Futures (AITF).
I would like to thank many friends that made our life happier during our stay in Calgary.
Specifically, I appreciate the Johnson family for their kind hospitality. I enjoyed my time in
Calgary with my close friends and their family for the gatherings and trips we had together.
Thank you all my nice friends. I would also like to thank my parents, brothers and sister.
They were always supporting me and encouraging me with their best wishes.
Nobody could have stood by me through the good and bad times of more than 4 years of
my study better than my wife. On occasions when I was very busy with paper deadlines, she tolerated a monotonous life, yet cheered me up and motivated me to work better. I am indebted
to you Narges Mashayekhi.
Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Symbols
1 Introduction
  1.1 Formal model of randomness
  1.2 Randomness for secrecy
    1.2.1 Guessing secrecy
    1.2.2 Correlated keys for multiple messages
  1.3 Generating random numbers, and applications
    1.3.1 TRG from human game-play: Using video games
    1.3.2 TRG from human game-play: Game theoretic approach
    1.3.3 User generated randomness for authentication
  1.4 Other contributions
    1.4.1 Review, partial results, and comparison of secrecy primitives
    1.4.2 Location based storage
  1.5 Subsequent works
  1.6 Thesis structure
    1.6.1 Theorems and proofs
2 Preliminaries and Basics
  2.1 Probability theory
  2.2 Information theoretic measures
  2.3 Information theoretic security
    2.3.1 Computational versus information theoretic model
  2.4 Secrecy
    2.4.1 Secret sharing
  2.5 Concluding remarks
3 Randomness requirement of secrecy
  3.1 Modeling random sources
  3.2 Dealing with weak random sources in cryptography
    3.2.1 Local versus public versus shared randomness
  3.3 Paradigm 1: Randomness extraction
    3.3.1 Deterministic extractors
    3.3.2 Seeded extractors
  3.4 Paradigm 2: Constructions using imperfect randomness
    3.4.1 Randomness requirements for perfect secrecy
    3.4.2 Relaxation of perfect secrecy
    3.4.3 Comparison of perfect secrecy relaxations
    3.4.4 Randomness requirements of indistinguishability
    3.4.5 Secrecy with weak random sources
    3.4.6 t-sources admit randomized encryption
    3.4.7 One-time pad is universal for deterministic encryption
  3.5 Randomness requirement for secret sharing
  3.6 Authentication sources
    3.6.1 Authentication with t-sources
  3.7 Comparison of random sources
  3.8 Concluding remarks
4 Guessing secrecy
  4.1 Introduction
    4.1.1 Motivation
    4.1.2 Related work
    4.1.3 Our contribution
  4.2 Secrecy based on guessing probability
  4.3 Requirements on the key size
  4.4 Requirements on the key distribution
    4.4.1 Guessing secrecy with imperfect randomness
    4.4.2 Relation with perfect secrecy
  4.5 Applications
  4.6 Bounds on conditional min-entropy
  4.7 Concluding remarks
5 Information Theoretic Security of Sequential High Entropy Messages
  5.1 Introduction
    5.1.1 Our contribution
    5.1.2 Applications
    5.1.3 Entropic security
  5.2 Encryption of multiple sequential messages
    5.2.1 Relaxing λ-message security
    5.2.2 Min-entropy vs. ε-indistinguishability
    5.2.3 Encryption of uniformly random messages
    5.2.4 Using correlated keys for min-entropy messages
    5.2.5 Entropic security of the past messages
  5.3 Concluding remarks
  5.4 Proofs of lemmas and theorems
    5.4.1 Proof of Lemma 5.2.1
    5.4.2 Proof of Lemma 5.2.2
    5.4.3 Proof of Lemma 5.2.3
    5.4.4 Proof of security for uniform messages
    5.4.5 Proof of Theorem 5.2.2
6 Human-Assisted Generation of Random Numbers
  6.1 Introduction
    6.1.1 Related work
  6.2 The structure of a TRG
    6.2.1 Entropy estimation of the source
    6.2.2 Extraction module
    6.2.3 Measuring statistical property of the output
  6.3 Human game-play in video games for randomness generation
    6.3.1 Our contribution
    6.3.2 Applications
    6.3.3 The TRG design
    6.3.4 Experimental setup and results
  6.4 Human game-play in zero-sum games for randomness generation
    6.4.1 This work
    6.4.2 Background on expander graphs
    6.4.3 The TRG design
    6.4.4 Experiments
    6.4.5 Measures of randomness for our game
  6.5 Comparison to Halprin et al. approach
  6.6 Concluding remarks
7 Human game-play for user authentication
  7.1 Introduction
    7.1.1 Our contribution
    7.1.2 Related work
  7.2 Delegation in authentication
    7.2.1 HtD Authentication
  7.3 HtE authentication games
    7.3.1 The model
    7.3.2 Criteria to select human features
  7.4 From HtE games to HtD authentication
    7.4.1 Adversarial benefits from delegation
    7.4.2 Game client bypassing attacks
    7.4.3 MiM attacks
    7.4.4 Security of the protocol
  7.5 The proof of concept: HtE game
    7.5.1 The game design
    7.5.2 The verification function
    7.5.3 Experimental setup
  7.6 Concluding remarks
    7.6.1 Extensions and future work
8 Concluding remarks
  8.1 Randomness requirements of secrecy
    8.1.1 Sources for secret sharing
    8.1.2 Guessing secrecy
  8.2 Generation of random numbers
    8.2.1 Randomness generators using human game-play
    8.2.2 Randomness for authentication
Bibliography
A LoSt: Location Based Storage
  A.1 Introduction
    A.1.1 Setting Considered
    A.1.2 Our Contribution
  A.2 Related Work
  A.3 Storage Model
  A.4 Trust Assumptions and Impossibility Results
    A.4.1 Assumptions on Adversarial Behavior
  A.5 Proofs of Location
    A.5.1 PoL Scheme
    A.5.2 Security Model
    A.5.3 Constructing a PoL
    A.5.4 PoR with Recoding
    A.5.5 A Secure PoL Using PoR with Recoding
  A.6 Experiments
    A.6.1 Geolocation Method
    A.6.2 Error Analysis
  A.7 Conclusion
  A.8 Acknowledgments
List of Tables

3.1 Encryption function with uniform keys
3.2 Encryption table with non-uniform keys
4.1 Table for 1/32-guessing secrecy
4.2 Partitioned one-time pad for 3/32-guessing secrecy
6.1 Min-entropy of users in first sub-game
6.2 Result of statistical test
6.3 Min-entropy of users in second sub-game
6.4 Result of statistical test
List of Figures and Illustrations

2.1 Venn diagram of entropy measures
2.2 Secure communication in cryptography
3.1 Impossibility of randomness extraction from an (n, n−1)-source
3.2 Comparison of random sources
6.1 Screen-shot of the game
6.2 The measurement of output
6.3 Min-entropy for players
6.4 Min-entropy in blocks of bits (one user)
6.5 Min-entropy during levels A, B, C for 3 users
6.6 Average min-entropy change during levels over all users
6.7 Min-entropy of bits
6.8 The game
7.1 Screen-shot of the game
7.2 Hit precision
7.3 User verification accuracy; measurements matching the profile
7.4 The smooth edf of the features for 7 random users, and the distance between them, illustrating how the features can distinguish among a group of people
7.5 The smooth histogram of the feature values for 7 users, illustrating how the features can distinguish among a group of people
7.6 A user trying to emulate the behavior of another user; increase in hit accuracy results in increase in targeting time
A.1 Example of File Location Regions
A.2 Bad Verification Example
A.3 Location Faking
A.4 Security Experiment for PoL
A.5 Boxplots of Geolocation Error (km) for File Retrieval and Challenge Sizes 5, 35, 65
List of Symbols and Abbreviations

Symbol    Definition
∆(·; ·)   Statistical Distance
G(·)      Guessing Probability
H(·)      Shannon Entropy
H∞(·)     Min-Entropy
E         Expected Value
I(·, ·)   Mutual Information
P(·)      Probability Distribution
BPP       Bounded-error Probabilistic Polynomial time
CDF       Cumulative Distribution Function
DB        Distance Bounding
FAR       False Acceptance Rate
FRR       False Rejection Rate
HtD       Hard to Delegate
HtE       Hard to Emulate
ITS       Information Theoretic Security
KS        Kolmogorov-Smirnov
MAC       Message Authentication Code
MiM       Man in the Middle
OS        Operating System
OTP       One-Time Pad
PDF       Probability Distribution Function
PPT       Probabilistic Polynomial Time
PS        Perfect Secrecy
RNG       Random Number Generator
SSL       Secure Socket Layer
TRG       True Randomness Generator
U of C    University of Calgary
Chapter 1
Introduction
This thesis is about the theory, applications and generation of randomness with main emphasis
on information theoretic security. Randomness is vital in many areas of computer science,
specifically in cryptography. Almost all cryptographic primitives use randomness somewhere
in their design and they all assume perfect randomness is available, so the proofs are mostly
based on the assumption that randomness is perfect. By perfect, we mean a sequence of
independently generated and uniformly distributed bits, that is completely unpredictable.
In other words, attackers must not be able to guess even one bit of the random secret with probability more than 1/2.

Poor choices of randomness in the past have resulted in complete breakdown of security and
expected functionality of the system. Early reported examples of bad choices of randomness
resulting in security failure include attack on Netscape implementation of the SSL protocol
[GW96] and the weakness of entropy collection in Linux Pseudo-Random Generator [GPR06].
A more recent high profile reported case was the discovery of collisions among secret (and
public) keys generated by individuals around the world [LHA+12, HDWH12]. Further
studies attributed the phenomenon partly to a flaw in the Linux kernel randomness generation subsystem. Therefore, random number generation is an important task in real-world applications, and new sources of randomness would add to the security of systems.
Many of the attacks on secure systems are due to the bad randomness used, not to weaknesses in the underlying security algorithms and protocols.
In practice, computer systems generate random bits heuristically from the accessible
physical sources of randomness that are not guaranteed to generate random bits with required
properties, or at least there is no proof that they do so. Therefore, on one hand, good randomness is very important and inherently needed in many applications of cryptography,
and on the other hand, sources of randomness that can be proved to have the required
security properties for cryptography cannot be easily found. So there is a gap between what
is needed in cryptography and what can be achieved in practice. One approach to fill this
gap is to find constructions for cryptographic primitives that can be proved secure when
perfect randomness is substituted by the output of true random number generators (TRG).
Another approach is to modify constructions such that they demand less of this expensive
computational resource, namely “randomness”.
To prove that a construction is secure when perfect randomness is substituted by the
output of a true randomness generator, a precise modeling of this output is required. In this
thesis, we start with a formal model of randomness that can represent the output of physical
random sources for practical applications. Then the thesis is divided into two main parts, in which we base our arguments on the formalism of random sources. Part 1 consists of three
chapters investigating randomness requirements of different secrecy notions. Part 2 consists
of two chapters on practical generation of random numbers given the formal model and their
applications in cryptography.
1.1 Formal model of randomness
A TRG is a process that reads the raw output from a source of randomness (e.g. physical
sources such as thermal noise) and then transforms this output to perfect randomness by
applying a post-processing function. Real world constructions of TRGs often assume that
the raw output of the physical random source satisfies certain properties (e.g. the symbols
are normally distributed) and they design a post-processing function that transforms the raw
output to perfect randomness. However, the properties are heuristically tested and there is no
proof that the raw output satisfies the properties. Note that “mathematically proving” that
the output has any specific properties seems impossible for known sources of randomness due to their non-deterministic behavior over time, or at least we do not yet know how to prove that.
Thus a line of work sought to relax the assumptions on the properties of the raw output. The
most accepted model of randomness (many works are based on this model) was proposed by
Chor and Goldreich [CG88], where the only assumption on the output distribution is a bound on the probability that each symbol occurs; they call such sources weak random sources. In this model, the symbols are neither assumed independent nor uniformly distributed. Moreover, other models of randomness proposed in the literature can be represented as special cases of this model. We consider weak random sources in this thesis.
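To make the model concrete, the following sketch (an illustrative toy, not code from the thesis) treats a weak source as a distribution whose largest symbol probability is bounded, and computes the corresponding per-symbol min-entropy:

```python
import math

def min_entropy(probs):
    """Min-entropy: -log2 of the largest probability in the distribution."""
    return -math.log2(max(probs))

# A toy weak source over 4 symbols: the output is neither uniform nor
# independent; the only guarantee is a bound on the largest probability.
dist = [0.4, 0.3, 0.2, 0.1]
k = min_entropy(dist)
print(f"min-entropy per symbol: {k:.3f} bits")  # about 1.322 bits
# Every output symbol occurs with probability at most 2^(-k).
assert all(p <= 2 ** (-k) + 1e-12 for p in dist)
```

The bound 2^(-k) is exactly the "maximum probability of symbols" constraint described above; a weak source is then the class of all distributions satisfying it.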
1.2 Randomness for secrecy
In this part of thesis, based on the formalism of the randomness model, we study the ran-
domness requirements of cryptographic primitives related to secrecy in terms of number and
distribution of bits needed for generating secret key or randomness. In particular, secrecy
primitives such as encryption and secret sharing are investigated to determine if realistic
models of random sources can be used effectively to provide security for these primitives. The
task is to seek for a minimal condition on the randomness that can be used in a particular
cryptographic primitive. Here minimal conditions refers to how practical is the assumption
on the random source for a primitive. For example, assuming access to independent and
uniformly distributed random bits is considered costly, while assuming that a source generates
output bits from a distribution with a certain bound on the probability of each element,
is considered more realistic. So a natural question is: to what extent does a cryptographic primitive depend on randomness? And what are the properties (e.g. distribution) of the randomness that are needed for a particular primitive? In other words, what is the “randomness complexity” of a cryptographic primitive that provides reasonable security guarantees? By
randomness complexity, we mean both the quantity, i.e. the number of random bits, and the
quality, i.e. the distribution of the randomness that is needed to accomplish a cryptographic task. To formally define the randomness complexity of a primitive, a measure of randomness
is needed. One estimate for the quality of the randomness is min-entropy which measures
the unpredictability of a random variable by the largest probability that its outputs can
take. This measure enables us to classify different cryptographic primitives based on their
randomness demand. In analogy with complexity theory, where complexity classes are defined to classify algorithms based on their need for time or space, in this context “random sources” are defined to classify algorithms based on their need for randomness. A random source is defined as a class of probability distributions sharing some property, e.g. distributions that are sufficient for a cryptographic task such as encryption.
Modern cryptography was initiated by the seminal work of Shannon [Sha49], where he
formally defined a notion of secrecy, called perfect secrecy; a by-product of his work was to bound the amount of randomness required to achieve perfect secrecy. He proved
results on the size and distribution of the secret key needed for this definition. Perfect
secrecy requires that a ciphertext does not give any more information about the plaintext
than the adversary already knows prior to viewing the ciphertext. Shannon showed that an encryption scheme achieving perfect secrecy is the one-time pad, which requires the key to be selected uniformly at random with a size equal to that of the plaintext. He also proved that
for any encryption scheme with perfect secrecy, Shannon entropy of the keys must be greater
than that of the plaintexts. This is a very expensive demand because even if one could afford
the size of the key, there is no guarantee about uniformity of the key, as discussed before. So
a number of works tried to relax these requirements. In the first natural relaxation, some leakage of the message was allowed. This was formalized as ε-indistinguishability (ε-secrecy). This relaxation, however, could not reduce the requirements on the secret key significantly: it will be shown in Theorem 3.4.2 that even for a key one bit shorter than the message, ε-secrecy is impossible for ε ≤ 1/2.
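For reference, the one-time pad mentioned above is simply a bitwise XOR of the message with a uniform key of the same length; the sketch below shows the standard construction (the function name is ours, not the thesis's):

```python
import secrets

def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each message byte with the corresponding key byte."""
    assert len(key) == len(message), "key must be as long as the message"
    return bytes(m ^ k for m, k in zip(message, key))

message = b"attack at dawn"
key = secrets.token_bytes(len(message))   # fresh uniform key, message length
ciphertext = otp_encrypt(message, key)
# Decryption is the same XOR, since (m ^ k) ^ k = m.
assert otp_encrypt(ciphertext, key) == message
```

With a uniform key, every ciphertext is equally likely regardless of the message, which is exactly why the scheme meets Shannon's definition, and why reusing or biasing the key breaks it.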
Since the above limitations are hard to satisfy in practical scenarios, modern cryptography
turned toward computational security [GM82]. In computational security the adversary
is assumed to have limited computational power, and breaking security is related to a
computationally “hard” problem where hardness of a problem is formalized using complexity
theory. Using this approach, a scheme is secure if any probabilistic polynomial time (PPT) adversary (algorithm) has only a negligible chance of breaking the scheme. Note that
requiring “negligible” success chance instead of 0 success chance is needed to reduce the key
length: it was recently proved by Dodis [Dod12] that for an adversary running in time v
equal to the maximum bit length of the message and the ciphertext, success probability of
0 in breaking the encryption implies that the min-entropy of the key must be greater than
the min-entropy of the message. Compared to information theoretic security, it is important
to note that computational security is based on unproven assumptions such as the hardness of integer factoring or of the discrete logarithm problem.
In the information theoretic setting, Russell and Wang [RW02] relaxed perfect secrecy by
assuming a bound on the prior knowledge of the adversary about the plaintext. With this
relaxed notion, they could achieve a smaller key length depending on the amount of the adversary's knowledge. But their scheme only works when keys are sampled uniformly at random. Dodis and Smith [DS05] built upon their work by showing that the relaxed notion of Russell and Wang is equivalent to the definition of randomness extractors, and by this they could achieve
simpler constructions.
For non-uniform keys, which we refer to as imperfect keys, a number of papers [MP91, DS02, DOPS04] tried to find out whether ε-secrecy is possible if the key is not chosen uniformly at random. ε-secrecy allows a small amount of information to be leaked to the adversary, in contrast to perfect secrecy, which tolerates no information leakage. Bosley and Dodis [BD07] settled the problem by proving that even for ε-secrecy, imperfect keys are not sufficient for encryption. They proved that for practical key lengths that are not exponential in the size of the plaintext, either the key is transformable to a uniform key with a size equal to the size of the plaintext using a deterministic function, or encryption is impossible. This
deterministic function is called an extractor which extracts almost uniform randomness from
non-uniform randomness. If such a function exists, then one can transform the key to an
almost uniform key and then use one-time pad to encrypt a plaintext of the same size as the
output key.
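The classical von Neumann procedure is perhaps the simplest concrete example of an extractor; note that it assumes independent (though possibly biased) bits, so it does not handle the general weak sources discussed earlier, but it illustrates how a deterministic function can turn non-uniform randomness into uniform bits:

```python
def von_neumann_extract(bits):
    """Map the pair 01 -> 0 and 10 -> 1; discard 00 and 11.
    For independent flips with a fixed bias p, both kept outcomes
    occur with probability p*(1-p), so the output bits are uniform."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

# A biased but independent stream still yields uniform output bits.
sample = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]
print(von_neumann_extract(sample))  # [0, 1, 0]
```

The price of extraction is visible here: many input bits are discarded, and without the independence assumption no deterministic function can extract uniform bits from every weak source, which is the impossibility referred to in Chapter 3.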
In this thesis, we propose two new notions of security that are motivated by practical
applications.
1.2.1 Guessing secrecy
In the first relaxation, a definition of secrecy named “guessing secrecy” is proposed, and its properties and randomness requirements are investigated. This definition is based on the min-entropy of the message, requiring that the predictability of the message does not change when an adversary views a ciphertext. Predictability refers to the best advantage of the
adversary in guessing a message sampled from a distribution. The best an adversary can do is to guess the most probable value from the distribution; measuring this in terms of bits gives us min-entropy: for a random variable X over the set of messages 𝒳, the best guess of an adversary is the x* with P_X(x*) = max_{x ∈ 𝒳} P_X(x), and measuring this unpredictability in terms of bits results in −log max_{x ∈ 𝒳} P_X(x). We show that guessing secrecy requires some of the same conditions on the secret key as Shannon's perfect secrecy. For example, it
requires the same length of the secret key as Shannon’s perfect secrecy. However, there exists
a family of distributions on messages and keys such that guessing secrecy is satisfied, but
perfect secrecy is not for that family. The significance of this result is that depending on the
security guarantee that is required in the secrecy system, there exists one reasonable model
of secrecy for which weaker random sources (compared to uniformly distributed ones) suffice to provide security. Compared to entropic security [DS05], where a smaller but uniformly distributed key sufficed to provide security, in guessing secrecy the distribution of the secret key can be relaxed, but not its length. This work was published in ICITS’12 [ASN12].
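As a sanity check of the definition, the hypothetical one-bit example below compares the adversary's best guessing probability for the message with and without the ciphertext, for a one-time pad with a uniform key; the two agree, so guessing secrecy holds in this toy case:

```python
from collections import defaultdict
from itertools import product

def guessing_prob(dist):
    """G(X) = max_x Pr[X = x]: the adversary's best single-guess success."""
    return max(dist.values())

# Toy one-bit one-time pad: C = M XOR K with a uniform key K.
p_m = {0: 0.7, 1: 0.3}        # non-uniform message distribution
p_k = {0: 0.5, 1: 0.5}        # uniform key
joint = defaultdict(float)    # Pr[M = m, C = c]
for (m, pm), (k, pk) in product(p_m.items(), p_k.items()):
    joint[(m, m ^ k)] += pm * pk

# Average guessing probability given the ciphertext:
# G(M | C) = sum_c max_m Pr[M = m, C = c].
g_cond = sum(max(joint[(m, c)] for m in p_m) for c in (0, 1))
assert abs(g_cond - guessing_prob(p_m)) < 1e-12  # the ciphertext does not help
```

In min-entropy terms, the check says H∞(M | C) = H∞(M); guessing secrecy asks exactly this, while allowing key distributions that perfect secrecy would reject.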
1.2.2 Correlated keys for multiple messages
Most relaxations of secrecy in information theoretic security are based on one message security,
and a fresh sampling of the secret key, completely independent of the first key, is required for
the encryption of a second message. In this work, we first extend the definition of secrecy to
multiple messages where an obvious encryption scheme would be to use independent keys to
provide the security for messages. We prove bounds on the length of the uniformly distributed
and independent keys in multiple message security. Then we show how correlated keys can be
used to encrypt a sequence of messages, by relaxing the definition of multiple message security.
We assume that the adversary’s advantage in predicting messages (before encryption) is
sufficiently small, and thus the adversary cannot guess the message before encryption occurs.
In our construction, the secret keys would still be close to uniform distribution, but they
are not independent of the previous keys. In this relaxation, the last message in the sequence is proved to be secure at the price of some leakage in older messages. We show, however, that this leakage is not arbitrary and can be bounded in terms of entropic security [DS05].
This is useful in scenarios in which the importance of encrypted messages expires over time and the security of the last message matters most. This happens in applications such as keeping the privacy of location or health records, where the most recent record (e.g. the current location) is the most important and the past records are not. Compared to entropic security, if the number
of messages λ to be encrypted was known prior to the beginning of communication, one could
use a construction of [DS05] to encrypt the first λ − 1 messages and use a one-time pad to encrypt the last one. However, if the number of messages is not known, or the messages are streamed (with no definite ending), then our construction has the advantage of using correlated keys to achieve a smaller key size.
1.3 Generating random numbers, and applications
In this part of thesis, we discuss two methods of generating random numbers using human
interaction with computers. Generating random numbers is investigated in many works due
to its importance in security applications. The deterministic nature of computers makes it
impossible to generate random numbers unless an external source of randomness is utilized.
Prior works investigated random number generation from physical sources with inherent
randomness such as thermal noise. In many such sources the underlying laws that govern the behavior of the system are unknown, so the phenomenon is perceived as random. One can argue that once these laws are discovered, the source can no longer be considered random.
The quantum source HotBit [Wal01], however, can be constructed to rely on the proven uncertainty laws of quantum mechanics, and so has provably unpredictable properties. Realizations of these sources, however, introduce inaccuracies that affect this property.
There is a debate that many physical sources of randomness are not inherently random,
in contrast to quantum events such as radioactive decays, which are inherently random.
Therefore quantum phenomena are used as a more reliable source of randomness, and many
TRGs, such as HotBit, are built on them. Other works such as [HN09, ZLwW+09]
considered the human as a source of randomness, because human choices and actions,
although biased, contain inherent randomness. In our two proposals, we use the human as an
external source of randomness, where unpredictable human behavior or choices are
used to generate random numbers. Even though human choices and behaviors have biases,
the complexity of modeling human cognitive and perceptual systems makes it
hard to predict a human's choices or randomness while they are engaged in a game and their
goal is to win the game.
1.3.1 TRG from human game-play: Using video games
In the first method, we propose a random number generator based on human game-play in video
games. Our observation is that, for a game to be entertaining, there is always an inevitable
element of error in human game-play that can be used to generate random numbers. Our
experiments showed that even for the most experienced players, these errors suffice to
produce enough unpredictability to generate random numbers. In particular, this approach is
quite effective on gaming consoles and smart phones, where hardware sources of
randomness do not provide enough entropy for server communication. We discuss
our experiments and other real-world applications of this approach. This work is to be published in the post-proceedings of ISC'13 [ASN14].
1.3.2 TRG from human game-play: Game theoretic approach
Halprin et al. [HN09] proposed generating randomness from human game-play in zero-sum
games, based on psychological experiments showing that human behavior is close to random in competitive games that offer the human only a few choices (2 to 3).
Their work, however, used a zero-sum game with many choices for the human. The many
choices were intended to achieve a higher rate of randomness generation, but this contrasts with the
psychological results, which call for few choices. We extended their work by keeping
the rate of randomness generation high while giving the human only a few choices
(3 choices). We also did not use any extra randomness to achieve the security
guarantees required for a TRG. This work is published in GameSec'13 [ASS13].
1.3.3 User generated randomness for authentication
Finally, we discuss how the randomness in human game-play can be used for authentication
purposes. Although the errors of human game-play in video games are sufficiently random to
generate randomness, we found that these errors (and, in general, human behavior in
game-play) form a fingerprint of the human that can be used as an additional factor in
authenticating humans, with interesting properties. Our empirical studies showed that the
behavior of a human entity in games is very hard for others to mimic, even given information
about their game-play behavior and statistical information about how the entity plays the game.
This property has some interesting applications, such as hard-to-delegate authentication: it
should be hard for an authorized entity to delegate its authentication information to others
so that they too get authenticated.
1.4 Other contributions
Comparing random sources across cryptographic primitives is an interesting question, as it
identifies the primitives that are more demanding in terms of randomness. We present a literature
review together with our partial results and discuss this line of work.
1.4.1 Review, partial results, and comparison of secrecy primitives
We review results on the randomness requirements of information-theoretic encryption and
secret sharing, starting from the seminal work of Shannon and highlighting the main results.
We discuss and generalize the proofs on the size of the key needed for encryption. We then
discuss the randomness requirements for encryption in terms of the distribution of the key and the
minimum requirement on secret keys. Comparing random sources in different applications is
an important subject for understanding how demanding an application is in terms of randomness
requirements. In this regard, a number of results on the relation between the randomness
requirements of secret sharing and encryption are proved.
1.4.2 Location based storage
Distributed storage servers are nowadays ubiquitous for storing the files and data of many organi-
zations. In this service, data may be stored on any server around the globe. However, it is
important to know the country in which certain sensitive data (such as health records) are
stored, in order to determine the laws applicable to the data. In this work, using a number of servers
around the world and proof-of-retrievability protocols, an estimate of the location of files is
calculated, under the reasonable scenario that servers in distributed storage systems do not keep
extra copies of files, for economic reasons. Our main contribution was the implementation
and running of the experiment to locate the files, together with a theoretical analysis of how an estimate
of the location can be derived using multiple servers. To do this, a server-client application was developed over the PlanetLab network of computers. This work is published in CCSW'12
[WSNA+12].
1.5 Subsequent works
Jiang [Jia13] compared our definition of guessing secrecy with various secrecy definitions, and showed that guessing secrecy is a weaker notion than ε-secrecy. In other words,
any encryption scheme that provides ε-secrecy also provides guessing secrecy, but a
scheme providing guessing secrecy does not necessarily provide ε-secrecy.
Iwamoto et al. [IS14] generalized guessing secrecy to another type of entropy, called
Rényi entropy, which is based on the collision probability of a random variable, i.e. the
probability that two independent copies of the same variable output the same realization. The
authors extended the results to a general model of entropy of which min-entropy and Shannon
entropy are special cases. They derived bounds on the size of the secret key needed to achieve
secrecy in this model, which conform to the bounds for guessing secrecy and ε-secrecy.
1.6 Thesis structure
The thesis is divided into seven chapters and an appendix. The first chapter is an introduction
to the problem, prior work, and our contributions. In the second chapter, we recall the essential background needed for the thesis, including concepts from probability theory, information measures and information-theoretic security. In the third chapter, we review secrecy in information-theoretic security in terms of randomness requirements; we recall some celebrated results and also prove basic results needed in the following two chapters. In the fourth chapter, we give a definition of secrecy called guessing secrecy. We prove that although the same number of random bits is needed to encrypt messages under this definition, in certain cases the randomness can be biased. In the fifth chapter, we discuss our proposal for encryption of multiple messages. In the sixth chapter, we discuss two practical approaches to generating random numbers using human interaction with computers. In the seventh chapter, we show how human game-play can be used to provide authentication with a new property: the authentication cannot be delegated to a third party. In the last chapter, we give concluding remarks and discuss some of the open problems and interesting directions arising from this work. In the appendix, a joint work on location-based storage is presented.
1.6.1 Theorems and proofs
In this thesis, whenever a theorem or lemma is taken from another work, it is cited to the original paper proving it. If a theorem or lemma is not cited, the proof is our own.
Chapter 2
Preliminaries and Basics
In this chapter, we recall background on probability theory
and information-theoretic security, largely from [CT91],
which is used extensively in this work.
2.1 Probability theory
Information theoretic security (ITS) relies on discrete probability theory for many of its
definitions and concepts. In this section, we remind the reader of the parts of probability theory
that are used in this thesis. We follow the notation of [CM97a]:
A discrete probability space (Ω, P) is defined over a finite or countably infinite set Ω, called the sample space, and a probability function P that assigns a number in the interval [0, 1] to each
element of the sample space, with sum equal to 1 over the whole sample space. P[ω] is
the value assigned to ω ∈ Ω by the probability function P. An event A is a subset of the
sample space, and the probability associated with an event A, denoted by P[A], is the sum
of the probabilities of the elements of A, i.e. P[A] = ∑_{ω∈A} P[ω]. A discrete random variable X is defined over a probability space (Ω, P) and is a mapping from the sample space to an
alphabet 𝒳, with a distribution function P_X that assigns a probability P_X(x) to the event X = x, defined as follows:

P_X(x) = P[X = x] = ∑_{ω∈Ω : X(ω)=x} P[ω].

In this thesis, random variables are always denoted by capital letters (e.g. X), their alphabets
are denoted by the corresponding calligraphic letters (e.g. 𝒳), and their realizations, i.e. the observed values of a random variable, are denoted by the corresponding lower-case letters (e.g.
x). By the size of 𝒳 we mean the number of elements in the set 𝒳, and by the length of 𝒳 we mean the number of bits needed to represent the elements of 𝒳 (the length of x means its bit length).
Given a pair of random variables X and Y over the same sample space with respective
alphabets 𝒳 and 𝒴, the random variable XY is defined over the alphabet 𝒳 × 𝒴 with joint
probability distribution P_XY : 𝒳 × 𝒴 → [0, 1] given by P_XY(x, y) = P[X = x, Y = y]. The conditional probability distribution of the random variable X, given that Y takes on a value
y ∈ 𝒴 with P_Y(y) > 0, is denoted by P_{X|Y}(x|y) and defined by

P_{X|Y}(x|y) = P_{X|Y=y}(x) = P_XY(x, y) / P_Y(y), for x ∈ 𝒳.   (2.1)

Note that for a value y ∈ 𝒴, X|Y = y is a random variable over 𝒳. We say that two random variables X and Y are independent if for all x ∈ 𝒳, y ∈ 𝒴,

P_XY(x, y) = P_X(x) P_Y(y), or equivalently P_{X|Y}(x|y) = P_X(x).   (2.2)
From equation (2.1), it is easy to see that if P_Y(y) > 0, then

P_{X|Y}(x|y) = P_{Y|X}(y|x) P_X(x) / P_Y(y).   (2.3)

This equality is called Bayes' theorem.
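Bayes' theorem (2.3) can be checked numerically on a small joint distribution (the probability values below are arbitrary illustrative choices):

```python
# Joint distribution P_XY over x in {0,1}, y in {0,1} (illustrative values).
P_XY = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}

def marginal_X(x):
    return sum(p for (xx, y), p in P_XY.items() if xx == x)

def marginal_Y(y):
    return sum(p for (x, yy), p in P_XY.items() if yy == y)

def cond_X_given_Y(x, y):          # P_{X|Y}(x|y), equation (2.1)
    return P_XY[(x, y)] / marginal_Y(y)

def cond_Y_given_X(y, x):          # P_{Y|X}(y|x)
    return P_XY[(x, y)] / marginal_X(x)

# Bayes' theorem, equation (2.3), holds for every (x, y) pair.
for x in (0, 1):
    for y in (0, 1):
        lhs = cond_X_given_Y(x, y)
        rhs = cond_Y_given_X(y, x) * marginal_X(x) / marginal_Y(y)
        assert abs(lhs - rhs) < 1e-12
```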
The expected value of a random variable X defined over the real numbers is denoted by E[X] and is given by

E[X] = ∑_{x∈𝒳} x P_X(x).

For a random variable X, the support of X, supp(X), is the set of all elements x ∈ 𝒳 with non-zero probability, i.e. supp(X) = {x ∈ 𝒳 | P_X(x) > 0}.
2.2 Information theoretic measures
In many scenarios in ITS one needs a measure of the amount of information contained in a
random variable, or of the information that is leaked from a message or secret after revealing
a function of it. Information in a random variable is associated with the unpredictability or
uncertainty of that random variable. If the output of a random variable is completely
predictable, then observing its output adds no extra information to one's knowledge; the more
unpredictable the random variable, the more information is gained by learning
its output. The terms information gain, uncertainty and unpredictability of a random variable
refer to different aspects of the same concept. A formalization of this concept was introduced
in the seminal work of Shannon [Sha48]. Shannon defined the entropy or uncertainty of a
random variable as a measure of the information gained by observing a realization of
the random variable.
Definition 2.2.1 (Shannon entropy) The Shannon entropy H(X) of a random variable X with probability distribution P_X is given by

H(X) = − ∑_{x∈𝒳} P_X(x) log(P_X(x)) = E[− log P_X],

where log is in base 2. Note that we use logarithms in base 2 throughout this thesis.
For example, the information contained in a random variable over n-bit strings with uniform
distribution is n bits: we cannot predict any particular realization of the random variable, the output is completely unpredictable, and thus by observing a realization of
the random variable we learn n bits of information. However, if the distribution is not
uniform, then we can guess the most probable element of the space as a candidate, and
thus a realization of the random variable is predictable with better chance. Therefore we
expect the information contained in such a random variable to be less than n bits.
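Definition 2.2.1 can be computed directly; a small sketch (with illustrative distributions) confirms that the uniform distribution over n-bit strings has entropy n, while a biased distribution has less:

```python
from math import log2

def shannon_entropy(dist):
    """H(X) = -sum_x P_X(x) log2 P_X(x), summed over the support of X."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

n = 3
uniform = {x: 1 / 2**n for x in range(2**n)}   # uniform over 3-bit strings
biased  = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}    # a biased 2-bit distribution

print(shannon_entropy(uniform))  # 3.0 bits: one full bit per output bit
print(shannon_entropy(biased))   # about 1.36 bits: less than the 2-bit maximum
```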
The joint Shannon entropy of jointly distributed random variables X and Y is the entropy of the
joint probability distribution P_XY. The conditional Shannon entropy H(X|Y) of the random variable X given Y is the average value of H(X|Y = y) over all possible values of y, i.e.

H(X|Y) = ∑_{y∈𝒴} P_Y(y) H(X|Y = y).
Definition 2.2.2 (Mutual information) The mutual information between two random variables measures their mutual dependence and is given by

I(X; Y) = ∑_{x∈𝒳} ∑_{y∈𝒴} P_XY(x, y) log( P_XY(x, y) / (P_X(x) P_Y(y)) ).

Note that the above definition is symmetric in X and Y. Mutual information can also be
defined as the amount by which knowing Y reduces the uncertainty in X, i.e.

I(X; Y) = H(X) − H(X|Y).
This can be seen from the Venn diagram in Figure 2.1; see [CT91] for a proof.
Figure 2.1: Venn diagram of the entropy measures H(X), H(Y), H(X|Y), I(X; Y), H(Y|X) and H(X, Y)
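The two forms of mutual information in Definition 2.2.2 (the double sum, and H(X) − H(X|Y)) can be checked against each other on a small joint distribution (the values below are illustrative):

```python
from math import log2

# Illustrative joint distribution over {0,1} x {0,1}.
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
P_X = {x: sum(p for (xx, y), p in P_XY.items() if xx == x) for x in (0, 1)}
P_Y = {y: sum(p for (x, yy), p in P_XY.items() if yy == y) for y in (0, 1)}

# Double-sum form of I(X;Y).
I = sum(p * log2(p / (P_X[x] * P_Y[y])) for (x, y), p in P_XY.items() if p > 0)

# H(X), and the conditional entropy H(X|Y) = -sum_{x,y} P_XY(x,y) log2 P_{X|Y}(x|y).
H_X = -sum(p * log2(p) for p in P_X.values())
H_X_given_Y = -sum(p * log2(p / P_Y[y]) for (x, y), p in P_XY.items() if p > 0)

# The two definitions agree: I(X;Y) = H(X) - H(X|Y).
assert abs(I - (H_X - H_X_given_Y)) < 1e-12
```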
We also need a measure of the “closeness” of two random variables over the same alphabet.
This measure is called statistical distance and is defined as follows:
Definition 2.2.3 (Statistical distance) The statistical distance between two random variables
X, Y over 𝒳 is defined by

∆(X; Y) = max_{T⊆𝒳} |Pr[X ∈ T] − Pr[Y ∈ T]| = (1/2) ∑_{x} |P_X(x) − P_Y(x)|.

We use X ≈_ε Y to denote ∆(X; Y) ≤ ε.
We denote the expected statistical distance of two random variables X, Y conditioned on Z by

∆((X, Z); (Y, Z)) = ∆(X; Y | Z) = (1/2) ∑_{x,z} P_Z(z) |P_{X|Z}(x|z) − P_{Y|Z}(x|z)|.

We say X is ε-close to Y if ∆(X; Y) ≤ ε. In particular, if Y is the uniform distribution, then we say X is ε-random. We will use the following lemmas on statistical distance throughout
this thesis.
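The two forms in Definition 2.2.3 can likewise be checked: the half-L1 sum equals the maximum distinguishing advantage over all subsets T, verified here by brute force over a small alphabet (the distributions are illustrative):

```python
from itertools import chain, combinations

# Two illustrative distributions over the alphabet {0, 1, 2}.
P_X = {0: 0.5, 1: 0.3, 2: 0.2}
P_Y = {0: 0.2, 1: 0.3, 2: 0.5}

# Half-L1 form of the statistical distance.
half_l1 = 0.5 * sum(abs(P_X[a] - P_Y[a]) for a in P_X)

# Maximum over all subsets T of |Pr[X in T] - Pr[Y in T]|.
alphabet = list(P_X)
subsets = chain.from_iterable(
    combinations(alphabet, r) for r in range(len(alphabet) + 1))
max_adv = max(abs(sum(P_X[a] for a in T) - sum(P_Y[a] for a in T))
              for T in subsets)

# The two forms agree (here, both equal 0.3).
assert abs(half_l1 - max_adv) < 1e-12
```

The maximizing set T is the set of elements where P_X exceeds P_Y, which is why the half-L1 sum attains the maximum.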
Lemma 2.2.1 (Triangle inequality) For any random variables X, Y, Z we have

∆(X; Y) ≤ ∆(X; Z) + ∆(Z; Y).
The above lemma is a well-known fact about statistical distance which is used in many papers.
Lemma 2.2.2 For any two random variables X, Y over ℳ that are jointly distributed with
a random variable A, and any function f on ℳ,

∆(f(X); f(Y) | A) ≤ ∆(X; Y | A),

with equality if f is a one-to-one function.
Proof. Suppose f(ℳ) is the image of ℳ under f; then by the definition of statistical distance the following holds: