Belief in Information Flow

Michael R. Clarkson    Andrew C. Myers    Fred B. Schneider
Department of Computer Science, Cornell University
{clarkson,andru,fbs}@cs.cornell.edu

Abstract

Information leakage traditionally has been defined to occur when uncertainty about secret data is reduced. This uncertainty-based approach is inadequate for measuring information flow when an attacker is making assumptions about secret inputs and these assumptions might be incorrect; such attacker beliefs are an unavoidable aspect of any satisfactory definition of leakage. To reason about information flow based on beliefs, a model is developed that describes how attacker beliefs change due to the attacker's observation of the execution of a probabilistic (or deterministic) program. The model leads to a new metric for quantitative information flow that measures accuracy rather than uncertainty of beliefs.

1. Introduction

Qualitative security properties, such as noninterference [10], typically either prohibit any flow of information from a high security level to a lower level, or they allow any amount of flow so long as it passes through some release mechanism. For a program whose correctness requires flow from high to low, the former is too restrictive, and the latter can lead to unbounded leakage of information. Quantitative flow properties, such as "at most k bits leak per execution of the program", allow information flows but at restricted rates. Such properties are useful when analyzing programs whose correctness requires that some, but not too much, information be leaked. Examples of these programs include guards, which sit at the boundary between trusted and untrusted networks, and password checkers.

Defining the quantity of information flow is more difficult than it might seem. Consider a password checker PWC that sets an authentication flag a after checking a stored password p against a (guessed) password g supplied by the user.

PWC : if p = g then a := 1 else a := 0

For simplicity, suppose that the password is either A, B, or C. Suppose also that the user is actually an attacker attempting to discover the password, and he believes the password is overwhelmingly likely to be A but has a minuscule and equally likely chance to be either B or C. (This need not be an arbitrary assumption on the attacker's part; perhaps the attacker was told by a usually reliable informant that the password is A.) If the attacker experiments by executing PWC and guessing A, he expects the outcome to be that a is equal to 1. Such a confirmation of the attacker's belief does seem to convey some small amount of information. But suppose that the informant was wrong: the password is C. The outcome of this experiment has a equal to 0, from which the attacker infers that A is not the password. Common sense dictates that his new belief is that B and C each have a 50% chance of being the password. The attacker's belief has greatly changed (he is surprised to discover the password is not A), so this outcome of his experiment seems to convey a larger amount of information than the previous outcome. Thus, the information conveyed by executing PWC depends on what the attacker believes.

How much information flows from p to a in each of the above experiments? Answers to this question have traditionally been based on change in uncertainty [5, 20, 11, 1, 16, 2, 17]: information flow is measured by the reduction in uncertainty about secret data. Observe that, in the case where the password is C, the attacker initially is quite certain (though wrong) about the value of the password and after the experiment is rather uncertain about the value of the password; the change from "quite certain" to "rather uncertain" is an increase in uncertainty.

* This work was supported by the Department of the Navy, Office of Naval Research, ONR Grant N00014-01-1-0968; Air Force Office of Scientific Research, Air Force Materiel Command, USAF, grant number F49620-03-1-0156; and National Science Foundation grants 0208642, 0133302, and 0430161. Michael Clarkson is supported by a National Science Foundation Graduate Research Fellowship; Andrew Myers is supported by an Alfred P. Sloan Research Fellowship. Opinions, findings, conclusions, or recommendations contained in this material are those of the authors and do not necessarily reflect the views of these sponsors. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.

So according to a reduction-in-uncertainty metric, no information flow occurred, which flatly contradicts our intuition.

The problem with metrics based on uncertainty is twofold. First, they do not take accuracy into account. Accuracy and uncertainty are orthogonal properties of the attacker's belief (being certain does not make one correct), and as the password checking example illustrates, the amount of information flow depends on accuracy rather than on uncertainty. Second, uncertainty-based metrics are concerned with some unspecified agent's uncertainty rather than an attacker's. The unspecified agent is able to observe a probability distribution over secret input values but cannot observe the particular secret input used in the program execution. If the attacker were the unspecified agent, there would be no reason in general to assume the probability distribution the attacker uses is correct. Because the attacker's probability distribution is therefore subjective, it must be treated as a belief. Beliefs are thus an essential, though until now uninvestigated, component of information flow.

This paper presents a new way of measuring information flow, based on these observations. Section 2 gives basic representations and notations for beliefs and programs. Section 3 describes a model of the interaction between attackers and systems; it also describes how attackers update beliefs by observing execution of programs. Section 4 defines a new quantitative flow metric, based on information theory, that characterizes the amount of information flow due to changes in the accuracy of an attacker's belief. The model and metric are formulated for use with any programming language (or even any state machine) that can be given a denotational semantics compatible with the representation of beliefs, and Section 5 illustrates with a particular programming language (while-programs plus probabilistic choice). Section 6 discusses related work, and Section 7 concludes.

2. Incorporating beliefs

A belief is a statement an agent makes about the state of the world, accompanied by some measure of how certain the agent is about the truthfulness of the statement. We begin by developing mathematical structures for representing beliefs.

2.1. Distributions

A frequency distribution is a function δ that maps a program state to a frequency, where a frequency is a non-negative real number. A frequency distribution is essentially an unnormalized probability distribution over program states; frequency distributions are often better than probability distributions as the basis for a programming-language semantics [21]. Henceforth, we write "distribution" to mean "frequency distribution".

The set of all program states is State, and the set of all distributions is Dist. The structure of State is mostly unimportant; it can be instantiated according to the needs of any particular language or system. For our examples, states map variables to values, where Var and Val are both countable sets.

v ∈ Var
σ ∈ State ≜ Var → Val
δ ∈ Dist ≜ State → R+

We write a state as a list of mappings; e.g. (g → A, a → 0) is a state in which variable g has value A and a has value 0. The mass in a distribution δ is the sum of frequencies:

‖δ‖ ≜ Σσ δ(σ)

A probability distribution has mass 1, but a frequency distribution may have any non-negative mass. A point mass is a probability distribution that maps a single state to 1. It is denoted by placing a dot over that single state:

σ̇ ≜ λσ′ . if σ′ = σ then 1 else 0

2.2. Programs

Execution of program S is described by a denotational semantics in which the meaning [[S]] of S is a function of type State → Dist. This semantics describes the frequency of termination in a given state: if [[S]]σ = δ, then the frequency of S, when begun in σ, terminating in σ′ should be δ(σ′). This semantics can be lifted to a function of type Dist → Dist by the following definition:

[[S]]δ ≜ Σσ δ(σ) · [[S]]σ

Thus, the meaning of S over a distribution of inputs is completely determined by the meaning of S given a state as input. By defining programs in terms of how they operate on distributions, we permit analysis of probabilistic programs. Section 5 shows how to build such a semantics.

Our examples use while-programs extended with a probabilistic choice construct. Let metavariables S, v, E, and B range over programs, variables, arithmetic expressions, and Boolean expressions, respectively. Evaluation of expressions is assumed side-effect free, but we do not otherwise prescribe their syntax or semantics. The syntax of the language is:

S ::= skip | v := E | S; S | if B then S else S | while B do S | S ⊕p S

The operational semantics for the deterministic subset of this language is standard. Probabilistic choice S1 ⊕p S2 executes S1 with probability p or S2 with probability 1 − p.
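To make this representation concrete, here is a minimal Python sketch (ours, not taken from the paper) of frequency distributions over states and of lifting a State → Dist semantics to Dist → Dist; the names State, Dist, mass, point_mass, and lift are our own.

# Sketch (not from the paper): frequency distributions over program states,
# represented as dictionaries mapping hashable states to non-negative reals.

from typing import Callable, Dict, Hashable

State = Hashable
Dist = Dict[State, float]          # frequency distribution; mass need not be 1

def mass(delta: Dist) -> float:
    """||delta||: the total frequency in the distribution."""
    return sum(delta.values())

def point_mass(sigma: State) -> Dist:
    """The probability distribution that puts all its mass on sigma."""
    return {sigma: 1.0}

def lift(sem: Callable[[State], Dist], delta: Dist) -> Dist:
    """Lift a semantics of type State -> Dist to type Dist -> Dist:
       [[S]]delta = sum over sigma of delta(sigma) * [[S]]sigma."""
    out: Dist = {}
    for sigma, freq in delta.items():
        for sigma_out, f in sem(sigma).items():
            out[sigma_out] = out.get(sigma_out, 0.0) + freq * f
    return out

Because lift simply takes a weighted sum of the output distributions, a semantics defined on single states determines the semantics on distributions, exactly as stated above.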

2.3. Labels and projections

We need a way to identify secret data; confidentiality labels serve this purpose. For simplicity, assume there are only two such labels: a label L that indicates low-confidentiality (public) data, and a label H that indicates high-confidentiality (secret) data. Assume that State is a product of two domains StateL and StateH, which contain the low- and high-labeled data, respectively. A low state is an element σL ∈ StateL; a high state is an element σH ∈ StateH. The projection of state σ ∈ State onto StateL is denoted σ ↾ L; this is the part of σ visible to the attacker. Projection onto StateH, the part of σ not visible to the attacker, is denoted σ ↾ H.

Each variable in a program is labeled to indicate the confidentiality of the information stored in that variable; for example, xL is a variable that contains low information. For convenience, let variable l be labeled L and variable h be labeled H. VarL is the set of variables in a program that are labeled L, so StateL = VarL → Val. The low projection σ ↾ L of state σ is:

σ ↾ L ≜ λv ∈ VarL . σ(v)

States σ and σ′ are low-equivalent, written σ ≈L σ′, if they have the same low projection:

σ ≈L σ′ ≜ (σ ↾ L) = (σ′ ↾ L)

Distributions also have projections. Let δ be a distribution and σL a low state. Then (δ ↾ L)(σL) is the combined frequency of those states whose low projection is σL (footnote 1):

δ ↾ L ≜ λσL ∈ StateL . Σσ′ | (σ′ ↾ L) = σL . δ(σ′)

High projection and high equivalence are defined by replacing occurrences of L with H in the definitions above.

2.4. Belief representation

Many belief representations have been proposed [13]. To be usable in our framework, a belief representation must support certain natural operations. Let b and b′ be beliefs about sets of possible worlds W and W′, respectively, where a world is an elementary outcome about which beliefs can be held.

1. Belief product ⊗ combines b and b′ into a new belief b ⊗ b′ about possible worlds W × W′, where W and W′ are disjoint.
2. Belief update b|U is the belief that results when b is updated to include new information that the actual world is in set U ⊆ W of possible worlds.
3. Belief distance D(b → b′) is a real number r ≥ 0 quantifying the difference between b and b′.

While the results in this paper are, for the most part, independent of any particular representation, the rest of this paper uses distributions to represent beliefs. High states are the possible worlds for beliefs, and a belief b is a probability distribution over high states, i.e. ‖b‖ = 1. Whereas distributions correspond to positive measures, beliefs correspond to probability measures. Probability measures are well-studied as a belief representation [13], and they have several advantages here: they are familiar, quantitative, support the operations required above, and admit a programming language semantics (as shown in Section 5). There is also a nice justification for the numbers they produce: roughly, b(σ) characterizes the amount of money an attacker should be willing to bet that σ is the actual state of the system [13].

For belief product ⊗, we employ a distribution product ⊗ of two distributions δ1 : A → R+ and δ2 : B → R+, with A and B disjoint:

δ1 ⊗ δ2 ≜ λ(σ1, σ2) ∈ A × B . δ1(σ1) · δ2(σ2)

It is easy to check that if b and b′ are beliefs, b ⊗ b′ is too. For belief update |, we use distribution conditioning:

δ|U ≜ λσ . if σ ∈ U then δ(σ) / (Σσ′∈U δ(σ′)) else 0

For belief distance D we use relative entropy, an information-theoretic metric [14] for the distance between distributions:

D(b → b′) ≜ Σσ b′(σ) · lg (b′(σ) / b(σ))

The base of the logarithm in D can be chosen arbitrarily; we use base 2 and write lg to indicate log2, making bits the unit of measurement for distance. The relative entropy of b to b′ is the expected inefficiency (that is, the number of additional bits that must be sent) of an optimal code that is constructed by assuming an inaccurate distribution b over symbols when the real distribution is b′ [14]. Like an analytic metric, D(b → b′) is always at least zero, and D(b → b′) equals zero only when b = b′ (footnote 2).

Relative entropy has the property that if b′(σ) > 0 and b(σ) = 0, then D(b → b′) = ∞. An infinite distance between beliefs would cause difficulty in measuring change in accuracy. To avoid this anomaly, beliefs may be required to satisfy certain restrictions. For example, an attacker's belief might be restricted such that:

(min σH . b(σH)) ≥ ε · 1/|StateH|

for some ε > 0, which ensures that b is never off by more than a factor of ε from a uniform distribution; we call such beliefs admissible. Other admissibility restrictions may be substituted for this one when stronger assumptions can be made about attacker beliefs.

Footnote 1. Formula ⊕ x ∈ D | R . P is a quantification in which ⊕ is the quantifier (such as ∀ or Σ), x is the variable that is bound in R and P, D is the domain of x, R is the range, and P is the body. We omit D, R, and even x when they are clear from context; an omitted range means R = true.

Footnote 2. Unlike an analytic metric, D does not satisfy the triangle inequality. However, it seems unreasonable to assume that the triangle inequality even holds for beliefs, since it can be easier to rule out a possibility from a belief than to add a new one, or vice versa.
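The three belief operations are easy to realize under the distribution representation. The following Python sketch (ours, not the paper's; the helper names product, update, and distance are our own) is one way to do so.

# Sketch (not from the paper): beliefs as probability distributions (dicts),
# with the three operations of Section 2.4.

import math
from typing import Callable, Dict, Hashable, Tuple

Belief = Dict[Hashable, float]

def product(b1: Belief, b2: Belief) -> Dict[Tuple, float]:
    """Belief product over disjoint world sets: (b1 x b2)(w1, w2) = b1(w1) * b2(w2)."""
    return {(w1, w2): p1 * p2 for w1, p1 in b1.items() for w2, p2 in b2.items()}

def update(b: Belief, in_U: Callable[[Hashable], bool]) -> Belief:
    """Distribution conditioning: keep the worlds in U and renormalize."""
    total = sum(p for w, p in b.items() if in_U(w))
    return {w: (p / total if in_U(w) else 0.0) for w, p in b.items()}

def distance(b: Belief, b_prime: Belief) -> float:
    """Relative entropy D(b -> b') = sum_w b'(w) * lg(b'(w) / b(w)), in bits.
       Infinite when b'(w) > 0 but b(w) = 0."""
    d = 0.0
    for w, q in b_prime.items():
        if q > 0.0:
            p = b.get(w, 0.0)
            if p == 0.0:
                return math.inf
            d += q * math.log2(q / p)
    return d

In particular, when b′ is the point mass at a high state σH, distance(b, b′) collapses to −lg b(σH), which is the accuracy measure used in Section 4.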

3. Experiments

We formalize as an experiment how an attacker, an agent that reasons about beliefs, revises his beliefs through interaction with a system, an agent that executes programs. The attacker should not learn about the high input to the program but is allowed to observe (and perhaps influence) low inputs and outputs. Other agents (a system operator, other users of the system with their own high data, an informant upon which the attacker relies, etc.) might be involved when an attacker interacts with a system; however, it suffices to condense all of these to just the attacker and the system.

We are chiefly interested in the program S with which the attacker is interacting, and conservatively assume that the attacker knows the source code of S. For simplicity of presentation, we assume that S always terminates and that it never modifies the high state. Section 3.4 discusses how both restrictions can be lifted without significant changes.

3.1. Experiment protocol

Formally, an experiment E is a tuple:

E = ⟨S, bH, σH, σL⟩

where S is the program, bH is the attacker's belief, σH is the high projection of the initial state, and σL is the low projection of the initial state. The protocol for experiments, which uses some notation defined below, is summarized in Figure 1.

Figure 1. Experiment protocol. An experiment E = ⟨S, bH, σH, σL⟩ is conducted as follows.
1. The attacker chooses a prebelief bH about the high state.
2. (a) The system picks a high state σH.
   (b) The attacker picks a low state σL.
3. The system executes the program S, which produces a state σ′ ∈ Γ(δ′) as output, where δ′ = [[S]](σ̇L ⊗ σ̇H). The attacker observes the low projection of the output state: o = σ′ ↾ L.
4. The attacker infers a postbelief: b′H = ([[S]](σ̇L ⊗ bH)|o) ↾ H.

Here is a justification for the protocol. An attacker's prebelief, describing his belief at the beginning of the experiment (step 1), may be chosen arbitrarily (subject to the admissibility requirement in Section 2.4) or may be informed by previous experiments. In a series of experiments, the postbelief from one experiment typically becomes the prebelief to the next. The attacker might even choose a prebelief bH that contradicts his true subjective probability distribution for the state, and this gives our analysis additional power by allowing the attacker to conduct experiments to answer questions such as "What would happen if I were to believe bH?".

The system chooses σH (step 2(a)), the high projection of the initial state, and this part of the state might remain constant from one experiment to the next or might vary. For example, Unix passwords do not usually change frequently, but the PINs on RSA SecurID tokens change each minute. We conservatively assume that the attacker chooses all of σL (step 2(b)), the low projection of the initial state, since this gives additional power in controlling execution of the program (footnote 3). The attacker's choice of σL is likely to be influenced by bH, but for generality, we do not require there be such a strategy.

Program S is executed (step 3) only once in each experiment; multiple executions are modeled by multiple experiments. The meaning of S given input σ̇L ⊗ σ̇H is an output distribution δ′:

δ′ = [[S]](σ̇L ⊗ σ̇H)

From δ′ the attacker makes an observation, which is a low projection of an output state. Probabilistic programs may yield many possible output states, but in a single execution of the program, only one output state is actually produced. So in a single experiment, the attacker is allowed only a single observation. The choice of a state from a distribution is modeled by sampling operator Γ, where Γ(δ) generates a state σ (from the domain of δ) with probability δ(σ)/‖δ‖. To emphasize the fact that the choice is made randomly, assignment of a sample is written σ ∈ Γ(δ), using ∈ instead of =. The observation o resulting from δ′ is:

o ∈ Γ(δ′) ↾ L

The formula the attacker uses for postbelief b′H (step 4) involves two operations. The first is to use the semantics of S along with prebelief bH as the distribution on high input.

Footnote 3. More generally, both the system and the attacker might contribute to σL. But since we are concerned only with confidentiality, not integrity, of information, we do not need to distinguish which parts are chosen by what agent.

This "thought experiment" allows the attacker to generate a prediction of the output distribution. We define prediction δA to correlate the output state with the high input state:

δA ≜ [[S]](σ̇L ⊗ bH)

The second operation is to incorporate any additional inferences that can be made from the observation by conditioning prediction δA on observation o. The result is projected to H to produce the attacker's postbelief b′H:

b′H = (δA|o) ↾ H

Here, conditioning operator | is a specialization of distribution conditioning operator δ|U. The specialization removes all mass in distribution δ that is inconsistent with observation o, then normalizes the result:

δ|o ≜ δ|{σ | σ ↾ L = o}
    = λσ . if (σ ↾ L) = o then δ(σ) / (δ ↾ L)(o) else 0

3.2. Password checking as an experiment

Adding confidentiality labels to the password checker of Section 1 yields:

PWC : if pH = gL then aL := 1 else aL := 0

An analysis of PWC in terms of our experiment model allows the informal reasoning in Section 1 to be made precise, as follows.

The attacker starts by choosing prebelief bH, perhaps ascribing probability as specified in the column labeled bH of Table 1. Next, the system chooses initial high projection σH, and the attacker chooses initial low projection σL. In the first experiment in Section 1, the password was A, so the system chooses σH = (p → A). Similarly, the attacker chooses σL = (g → A, a → 0). (The initial value of a is actually irrelevant, since it is never used by the program and a is set along all paths.) Next, the system executes PWC. Output distribution δ′ is a point mass at the state σ′ = (p → A, g → A, a → 1); the semantics in Section 5 will validate this intuition. Since σ′ is the only state that can be sampled from δ′, the attacker's observation o1 is σ′ ↾ L = (g → A, a → 1).

Table 1. Beliefs about pH

pH    bH     bH1   bH2
A     0.98   1     0
B     0.01   0     0.5
C     0.01   0     0.5

In the final step of the protocol, the attacker infers a postbelief. He conducts a thought experiment, predicting an output distribution δA = [[PWC]](σ̇L ⊗ bH), given in Table 2. The ellipsis in the final row of the table indicates that all states not shown have frequency 0. This distribution is intuitively correct: the attacker believes that he has a 98% chance of being authenticated, whereas 1% of the time he will fail to be authenticated because the password is B, and another 1% because it is C. The attacker conditions prediction δA on observation o1, obtaining δA|o1, also shown in Table 2. Projecting to high yields the attacker's postbelief bH1, shown in Table 1. This postbelief is what the informal reasoning in Section 1 suggested: the attacker is certain that the password is A.

Table 2. Distributions on PWC output

p  g  a    δA     δA|o1   δA|o2
A  A  0    0      0       0
A  A  1    0.98   1       0
B  A  0    0.01   0       0.5
B  A  1    0      0       0
C  A  0    0.01   0       0.5
C  A  1    0      0       0
...        0      0       0

The second experiment in Section 1 can also be formalized by an experiment. In it, bH and σL remain the same as before, but σH becomes (p → C). Observation o2 is therefore (g → A, a → 0). The prediction δA remains unchanged, and conditioned on o2 it becomes δA|o2, shown in Table 2. Projecting to high yields the new postbelief bH2 in Table 1. This postbelief again agrees with the informal reasoning: the attacker believes that there is a 50% chance each for the password to be B or C.

3.3. Bayesian belief revision

The formula the attacker uses to infer a postbelief in step 4 is an application of Bayesian inference, which is a standard technique in applied statistics for making inferences when uncertainty is made explicit through probability models [9]. Let belief revision operator B yield the postbelief from an experiment E = ⟨S, bH, σH, σL⟩:

B(E) ≜ ([[S]](σ̇L ⊗ bH)|o) ↾ H
    where o ∈ Γ(δ′) ↾ L
          δ′ = [[S]](σ̇L ⊗ σ̇H)

Because it uses Γ, operator B produces values by sampling, so we write b′H ∈ B(E).
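The belief revision just described can be carried out directly in a few lines of code. The following Python sketch (ours, not from the paper; the function names are our own) reproduces the postbeliefs bH1 and bH2 of Table 1 for observations o1 and o2.

# Sketch (not from the paper): Bayesian belief revision for PWC,
# reproducing the postbeliefs bH1 and bH2 of Table 1.

b_H = {"A": 0.98, "B": 0.01, "C": 0.01}     # attacker's prebelief about p
guess = "A"                                  # low input chosen by the attacker

def pwc(p, g):
    """The password checker: returns the authentication flag a."""
    return 1 if p == g else 0

def revise(prebelief, g, observed_a):
    """Condition the prediction [[PWC]](g, prebelief) on the observed low output,
       then project to the high state (the password)."""
    pred = {(p, pwc(p, g)): prebelief[p] for p in prebelief}   # prediction delta_A
    norm = sum(f for (p, a), f in pred.items() if a == observed_a)
    return {p: pred.get((p, observed_a), 0.0) / norm for p in prebelief}

print(revise(b_H, guess, 1))   # observation o1, a = 1: {'A': 1.0, 'B': 0.0, 'C': 0.0}
print(revise(b_H, guess, 0))   # observation o2, a = 0: {'A': 0.0, 'B': 0.5, 'C': 0.5}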

To select a particular b′H from B, we provide observation o:

B(E, o) ≜ ([[S]](σ̇L ⊗ bH)|o) ↾ H

The fundamental Bayesian method of updating a hypothesis Hyp based on an observation obs is Bayes' rule:

Pr(Hyp|obs) = Pr(Hyp) Pr(obs|Hyp) / ΣHyp′ Pr(Hyp′) Pr(obs|Hyp′)

In our model, the attacker's hypothesis is about the values of high states, so the domain of hypotheses is State ↾ H. Therefore Pr(Hyp), the probability the attacker ascribes to a particular hypothesis σ′H, is modeled by bH(σ′H). The probability Pr(obs|Hyp) the attacker ascribes to an observation given the assumed truth of a hypothesis is modeled by the program semantics: the probability of an observation o given an assumed high input σ′H is ([[S]](σ̇L ⊗ σ̇′H) ↾ L)(o). Given experiment E = ⟨S, bH, σH, σL⟩, instantiating Bayes' rule on these probability models yields B′(E, o), which is Pr(σ′H|o):

B′(E, o) = bH(σ′H) · ([[S]](σ̇L ⊗ σ̇′H) ↾ L)(o) / Σσ″H bH(σ″H) · ([[S]](σ̇L ⊗ σ̇″H) ↾ L)(o)

With this instantiation, we can show that the way an attacker updates his belief according to our experiment protocol is equivalent to Bayesian updating.

Theorem 1. B(E, o)(σ′H) = B′(E, o)

Proof. In Appendix B.

3.4. Mutable high state and nontermination

Section 3.1 invokes two simplifying assumptions about program S: it never modifies high input, and it always terminates. We now dispense with these mostly minor technical issues.

To eliminate the first assumption, note that if S were to modify the high state, the attacker's prediction δA would correlate high outputs with low outputs. However, to calculate a postbelief (in step 4), δA must correlate high inputs with low outputs. So our experiment protocol requires the high input state be preserved in δA. Informally, we can do this by copying the high input state and requiring that the copy be immutable. Thus, the copy is preserved in the final output state, and the attacker can again establish a correlation between high inputs and low outputs. Technical details are given in Appendix A.

To eliminate the second assumption, note that program S must terminate for an attacker to obtain a low state as an observation when executing S. There are two ways to model the observation in the case of nontermination, depending on whether the attacker can detect nontermination. If the attacker has an oracle that decides nontermination, then nontermination can be modeled in the standard denotational style with a state ⊥ that represents the divergent state. Details of this approach are given in Appendix A. An attacker that cannot detect nontermination is more difficult to model. At some point during the execution of the program, he will stop waiting for the program to terminate and declare that he has observed nontermination. However, he may be incorrect in doing so, leading to beliefs about nontermination and instruction timings. The interaction of these beliefs with beliefs about high inputs is complex; we leave this for future work.

4. Measuring information flow

The informal analysis of PWC in Section 1 suggests that information flow corresponds to an improvement in the accuracy of an attacker's belief. Recall that the more accurate belief b is with respect to high state σH, the less the distance D(b → σ̇H). We use change in accuracy, as measured by distance, to quantify information flow.

4.1. Information flow from an outcome

Given an experiment E = ⟨S, bH, σH, σL⟩, an outcome is a pair ⟨E, b′H⟩ such that b′H ∈ B(E). The accuracy of the attacker's prebelief bH in outcome ⟨E, b′H⟩ is D(bH → σ̇H); the accuracy of the attacker's postbelief b′H in that outcome is D(b′H → σ̇H). We define the amount of information flow Q caused by ⟨E, b′H⟩ as the difference of these two quantities:

Q(⟨E, b′H⟩) ≜ D(bH → σ̇H) − D(b′H → σ̇H)

Thus the amount of information flow (in bits) in Q corresponds to the improvement in the accuracy of the attacker's belief.
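Because D(b → σ̇H) is just −lg b(σH), the metric is simple to evaluate for a single secret. The Python sketch below (ours, not from the paper) reproduces the flows of the two PWC experiments of Section 3.2, whose values are discussed later in this section.

# Sketch (not from the paper): the flow metric Q for the two PWC experiments.
# Against a point-mass target, D(b -> point(sH)) = -lg b(sH), so
# Q = lg( postbelief(sH) / prebelief(sH) ).

import math

def Q(prebelief, postbelief, true_high):
    d_pre  = -math.log2(prebelief[true_high])
    d_post = -math.log2(postbelief[true_high]) if postbelief[true_high] > 0 else math.inf
    return d_pre - d_post

b_H  = {"A": 0.98, "B": 0.01, "C": 0.01}
b_H1 = {"A": 1.0,  "B": 0.0,  "C": 0.0}    # postbelief after guessing A, password A
b_H2 = {"A": 0.0,  "B": 0.5,  "C": 0.5}    # postbelief after guessing A, password C

print(round(Q(b_H, b_H1, "A"), 4))   # 0.0291 bits
print(round(Q(b_H, b_H2, "C"), 4))   # 5.6439 bits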

With an additional definition from information theory, a more consequential characterization of Q is possible. Let Iδ(F) denote the information contained in event F drawn from probability distribution δ:

Iδ(F) ≜ −lg Prδ(F)

Information is sometimes called "surprise" because I measures how surprising an event is; for example, when an event that has probability 1 occurs, no information (i.e. 0 bits) is conveyed because the occurrence is completely unsurprising.

For an attacker, the outcome of an experiment involves two unknowns: the initial high state σH and the probabilistic choices made by the program. Let δS = [[S]](σ̇L ⊗ σ̇H) ↾ L be the system's distribution on low outputs, and δA = [[S]](σ̇L ⊗ bH) ↾ L be the attacker's distribution on low outputs (see footnote 4). IδA(o) measures the information contained in o about both unknowns, but IδS(o) measures only the probabilistic choices made by the program. For programs that make no probabilistic choices, δA contains information about only the initial high state, and δS is a point mass at some state σ such that σ ↾ L = o. So the amount of information IδS(o) is 0. For probabilistic programs, IδS(o) is generally not equal to 0; subtracting it removes all the information contained in IδA(o) that is solely about the outcomes of probabilistic choices, leaving information about high inputs only.

The following theorem states that Q measures the information about high input σH contained in observation o.

Theorem 2. Q(⟨E, b′H⟩) = IδA(o) − IδS(o)

Proof. In Appendix B.

As an example, consider the experiments involving PWC in Section 3.2. The first experiment E1 has the attacker correctly guess the password A, so:

E1 = ⟨PWC, bH, (p → A), (g → A, a → 0)⟩

where bH (and the other beliefs about to be used) is defined in Table 1. Only one outcome, ⟨E1, bH1⟩, is possible from this experiment. Calculating Q(⟨E1, bH1⟩) yields a flow of 0.0291 bits from the outcome. The small flow makes sense because the outcome has only confirmed something the attacker already believed to be almost certainly true. In experiment E2 the attacker guesses incorrectly:

E2 = ⟨PWC, bH, (p → C), (g → A, a → 0)⟩

Again, only one outcome ⟨E2, bH2⟩ is possible from this experiment, and calculating Q(⟨E2, bH2⟩) yields an information flow of 5.6439 bits. This higher information flow makes sense, because the attacker's postbelief is much closer to correctly identifying the high state. The attacker's prebelief bH ascribed a 0.02 probability to the event p ≠ A, and the information of an event with probability 0.02 is 5.6439 bits. This suggests that Q is a reasonable measure for the information about high input contained in the observation.

Another interpretation of the number produced by our definition of Q is possible: the amount of information flow Q is the improvement in expected inefficiency of the attacker's optimal code for the high input. This is because relative entropy can be interpreted in terms of coding efficiency (see Section 2.4). We have yet to fully investigate this interpretation.

4.2. Comparing accuracy and uncertainty

The information flow in experiment E2 might seem surprisingly high. At most two bits are required to store password p in memory, so how can the program leak more than five bits? In brief, this occurs because the attacker's belief is so erroneous that a large amount of information is required to correct it. This also illuminates the difference between measuring information flow based on uncertainty versus based on accuracy.

Consider how an uncertainty-based approach would analyze the program, if the attacker's belief were used as the input distribution over high inputs. The attacker's initial uncertainty about p is H(bH) = 0.1614 bits, where H is the information-theoretic measure of entropy, or uncertainty, in a probability distribution δ:

H(δ) ≜ −Σσ δ(σ) · lg δ(σ)

Maximum entropy is achieved by uniform distributions [14], so the maximal uncertainty about p is lg 3 ≈ 1.6 bits, the same number of bits required to store p. In the second experiment, the attacker's final uncertainty about p is H(bH2) = 1. The reduction in uncertainty is 0.1614 − 1 = −0.8386. An uncertainty-based analysis, such as Denning's [5], would interpret this negative quantity as an absence of information flow. But this is clearly not the case: the attacker's belief has been guided closer to reality by the experiment. The uncertainty-based analysis ignores reality by measuring bH and bH2 against themselves only, instead of against the high state σH.

Footnote 4. The technique used in Section 3.4 for modeling nontermination ensures that δA and δS are probability distributions. Thus, IδA and IδS are well-defined.
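The entropy figures quoted above are easy to check. The short Python sketch below (ours, not from the paper) reproduces them.

# Sketch (not from the paper): uncertainty (entropy) versus accuracy for the
# second PWC experiment, reproducing the numbers quoted in Section 4.2.

import math

def entropy(b):
    return -sum(p * math.log2(p) for p in b.values() if p > 0)

b_H  = {"A": 0.98, "B": 0.01, "C": 0.01}
b_H2 = {"A": 0.0,  "B": 0.5,  "C": 0.5}

print(round(entropy(b_H), 4))                  # 0.1614  (initial uncertainty about p)
print(round(entropy(b_H2), 4))                 # 1.0     (final uncertainty about p)
print(round(entropy(b_H) - entropy(b_H2), 4))  # -0.8386 (reduction in uncertainty)
# Yet the accuracy metric reports +5.6439 bits of flow for the same outcome.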

Accuracy and uncertainty are orthogonal properties of beliefs, as depicted in Figure 2. (Figure 2, "Effect of FLIP on postbelief", plots accuracy on the horizontal axis, less accurate to more accurate, and certainty on the vertical axis, less certain to more certain; its quadrants I through IV correspond to the columns of Table 3.) The figure shows the change in an attacker's accuracy and uncertainty when the program

FLIP : l := h ⊕0.99 l := ¬h

is analyzed with experiment E = ⟨FLIP, bH, (h → 0), (l → 0)⟩ and observation o is generated by the experiment. The notation bH = ⟨x, y⟩ in Figure 2 means that bH(h → 0) = x and bH(h → 1) = y.

Usually, FLIP sets l to be h, so the attacker will expect this to be the case. Executions in which this occurs will cause his postbelief to be more accurate, but may cause his uncertainty to either increase or decrease, depending on his prebelief; when uncertainty increases, an uncertainty metric would mistakenly say that no flow has occurred.

With probability 0.01, FLIP produces an execution that fools the attacker and sets l to be ¬h, causing his belief to become less accurate. The decrease in accuracy results in misinformation, which is a negative information flow. When the attacker's prebelief is almost completely accurate, such executions will make him more uncertain. But when the attacker's prebelief is uniform, executions that result in misinformation will make him less uncertain; when uncertainty decreases, an uncertainty metric would mistakenly say that flow has occurred. Table 3 demonstrates this phenomenon concretely. The quadrant labels refer to Figure 2. For each quadrant, the attacker's prebelief bH, observation o, and the resulting postbelief b′H are given in the top half of the table. In the bottom half, increase in accuracy is calculated using the information flow metric Q(⟨E, b′H⟩), and reduction in uncertainty is calculated using the difference in entropy H(bH) − H(b′H).

Table 3. Analysis of FLIP

Quadrant                   I          II         III        IV
bH:   h = 0                0.5        0.5        0.99       0.01
      h = 1                0.5        0.5        0.01       0.99
o                          (l → 0)    (l → 1)    (l → 1)    (l → 0)
b′H:  h = 0                0.99       0.01       0.5        0.5
      h = 1                0.01       0.99       0.5        0.5
Increase in accuracy       +0.9855    −5.6439    −0.9855    +5.6439
Reduction in uncertainty   +0.9192    +0.9192    −0.9192    −0.9192

Finally, recall that when the attacker guessed a password incorrectly in Section 1, his belief became more accurate and more uncertain. Table 4 gives the exact changes in his accuracy and uncertainty, using guess g = A and password p = C.

Table 4. Analysis of PWC

p                          A       B       C
bH                         0.98    0.01    0.01
b′H                        0       0.5     0.5
Increase in accuracy       +5.6439
Reduction in uncertainty   −0.8245

In summary, uncertainty is inadequate as a metric for information flow if input distributions represent attacker beliefs. By Theorem 2, information flows when an attacker's belief becomes more accurate, but an uncertainty metric can mistakenly measure a flow of zero or less. Inversely, misinformation flows when an attacker's belief becomes less accurate, but an uncertainty metric can mistakenly measure a positive information flow. Hence, in the presence of beliefs, accuracy is the correct metric for information flow.

4.3. Expected flow for an experiment

Since an experiment on a probabilistic program S can produce many outcomes, it is reasonable to assume that quantitative information flow properties will discuss expected flow over those outcomes. So we define expected flow QE over all outcomes from experiment E:

QE(E) ≜ E o∈δ′↾L [Q(⟨E, B(E, o)⟩)]
      = Σo (δ′ ↾ L)(o) · Q(⟨E, ([[S]](σ̇L ⊗ bH)|o) ↾ H⟩)

where δ′ = [[S]](σ̇L ⊗ σ̇H) gives the distribution on outcomes and Eδ[X] is the expected value of an expression X with respect to distribution δ.

Expected flow is useful in analyzing probabilistic programs. Consider a faulty password checker:

FPWC : if p = g then a := 1 else a := 0;
       a := ¬a ⊕0.1 skip

With probability 0.1, FPWC inverts the authentication flag. Can this program be expected to confound attackers: does FPWC leak less expected information than PWC? This question can be answered by comparing the expected flow from FPWC to the flow of PWC. Table 5 gives information flows from FPWC for experiments E1F and E2F, which are identical to E1 and E2 from Section 4.1, except that they execute FPWC instead of PWC.
Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE ours, based on single experiments allows a more expres-

E o Q( E, B(E,o) ) QE(E) sive language of security properties in which particular inputs or experiments can be tested. Moreover, our anal- E1 (a → 1) 0.0291 0.0291 (a → 0) impossible ysis can be extended to calculate expected flow over all F experiments. E1 (a → 1) 0.0258 0.0018 (a → 0) −0.2142 Rather than choosing particular high and low input states σH and σL, the system and the attacker may more generally E2 (a → 1) impossible 5.6439 (a → 0) 5.6439 choose distributions δH and δL over high and low states, re- EF (a → 1) −3.1844 2.3421 spectively. These distributions are sampled to produce the 2 Q (a → 0) 2.9561 initial input state. Taking the expectation in E with re- spect to σH , σL and o then yields the expected flow over all experiments. Table 5. Leakage of PWC and FPWC This extension also increases the expressive power of the experiment model. A distribution over low inputs allows the execute FPWC instead of PWC . Observe that, for both attacker to use a randomized guessing strategy. His distribu- pairs of experiments, the expected flow of FPWC is less tion might also be a function of his belief, though we leave than the flow of PWC . We have confirmed that the random investigation of such attacker strategies as future work. A corruption of a makes it more difficult for the attacker to in- distribution over high inputs could be used, for example, to crease the accuracy of his belief. determine the expected flow of the password checker when F F Outcomes E1 , (a → 0) and E2 , (a → 1) correspond users’ choice of passwords can be described by a distribu- to an execution where the value of a is inverted. The flow tion. for these outcomes is negative, indicating that the program is giving the attacker misinformation. 4.5. Maximum information flow In general, calculating expected flow can require sum- ming over all o ∈ StateL, which might be a countably in- System designers are likely to want to limit the max- finite set. Thus, expected flow could be infeasible to cal- imum possible information flow. So we characterize the culate either by hand or by machine. Fortunately, expected maximum amount of information flow that program S can flow can be conservatively approximated by conditioning cause in a single outcome as the maximum amount of flow on a single distribution rather than conditioning on many from any outcome of any experiment E = S, bH ,σH ,σL . Conditioning δ on δL has the effect of mak- on S: ing the low projection of δ identical to δL, while leaving the δ Qmax  Q E high projection of unchanged. (S) maxE,bH | bH ∈B(E) ( ,bH ) δ(σ) Q PWC δ|δL  λσ . · δL(σ  L) Consider applying max(S) to . Assuming that   (δ L)(σ L) bH satisfies the admissibility restriction in Section 2.4 and that the attacker guesses an incorrect password yields that A bound on expected flow is then calculated as follows. n−1 PWC can leak at most − lg( · n ) bits per outcome, where n is the number of possible passwords. If  =1,the Theorem 3 Let: attacker is forced to have a uniform distribution over pass- E = S, bH ,σH ,σL words, representing a lack of belief for any particular value k δ =[[S]] ( σ˙ L ⊗ σ˙ H ) for the password. Additionally, if n =2 for some k,then eH = (([[S]] ( σ˙ L ⊗ bH ))|(δ  L))  H we obtain that for k- passwords, PWC can leak at most k − lg(2k − 1) bits in a outcome; for k>12 this is less than Then: 0.0001 bits, supporting the intuition that password check- QE(E) ≤Q( E,eH ) ing leaks little information. Proof. In Appendix B.  4.6. Repeated experiments

4.4. Expected flow over all experiments precludes repetition of experiments. The most interesting case has the attacker return to step 2(b) of the Uncertainty-based metrics typically consider the ex- experiment protocol in Figure 1 after updating his belief in pected information flow over all experiments, rather step 4; that is, the system keeps the high input to the pro- than the flow in a single experiment. An analysis, like gram constant, and the attacker is allowed to check new low

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE Our first task then is to define the semantics [[ S]] : → Repetition # 1 2 State Dist. The semantics is given in Figure 3. We as- sume some semantics [[ E]] : State → Val that gives mean- bH : A 0.98 0 → B 0.01 0.5 ing to expressions, and a semantics [[ B]] : State Bool C 0.01 0.5 that gives meaning to Boolean expressions. The statements skip, if,andwhile have essentially the σL(g) A B 5 o(a) 0 0 same as in the standard deterministic case. State update σ[v → V ],whereV ∈ Val, changes the value b : A 0 0 H of v to V in σ. The distribution update δ[v → E] in the de- B 0.5 0 C 0.5 1 notation of assignment represents the result of substituting the meaning of E for v in all the states of δ: Q( E,b ) 5.6439 1.0 H  δ[v → E]  λσ . ( σ | σ[v→[[ E]] σ]=σ δ(σ )) Table 6. Repeated experiments on PWC The sequential composition of two programs, written S1; S2, is defined using intermediate states. The frequency inputs based on the results of previous experiments. Sup- of S1; S2, starting from σ, reaching a final state σ is the pose that experiment E2 from Section 4.1 is conducted and sum of the probabilities of all the ways that S1 can reach then repeated with σL =(g → B). Then the attacker’s be- some intermediate σ and then S2 from that σ can reach lief about the password evolves as shown in Table 6. σ . Note that ([[S1]] σ)(σ ) is the frequency that S1,begin- Summing the information flow for each repetition yields ning in σ,terminatesinσ , because [[ S1]] σ produces a dis- a total information flow of 6.6439. This total corresponds tribution that, when applied to σ, returns the frequency of to what Q would calculate for a single experiment, if that termination in σ . Similarly, ([[S2]] σ )(σ ) is the frequency experiment changed prebelief bH to postbelief bH2,where that S2, beginning in σ ,terminatesinσ .

bH2 is the attacker’s final postbelief in Table 6: The final program construct is probabilistic choice, S1 p S2,where0 ≤ p ≤ 1. The semantics multiplies the proba-

˙ − ˙ − D(bH σH ) D(bH2 σH )=6.6439 0 bility of choosing a side Si with the frequency that Si pro- =6.6439 duces a particular output state σ. Since the same state σ might actually be produced by both sides of the choice, the This example suggests a general theorem stating that the frequency of its occurrence is the sum of the frequency from postbelief from a series of experiments, where the postbe- either side: p·([[S1]] σ)(σ )+(1−p)·([[S2]] σ)(σ ). This for- lief from one experiment becomes the prebelief to the next, mula is simplified to the definition in Figure 3 using · and contains all the information learned during the series. Let th + as pointwise operators: Ei = S, bHi ,σH ,σLi be the i experiment in the series, E E and let ri = i,bHi be an outcome from i.Letr1,...,rn p · δ  λσ . p · δ(σ) be a series of n outcomes in which prebelief bHi in ex- δ1 + δ2  λσ . δ1(σ)+δ2(σ) E − periment i is postbelief bHi−1 from outcome i 1.Let To show how to lift the semantics in Figure 3 and define bH0 = bH1 be the attacker’s prebelief for the entire series. [[ S]] : Dist → Dist, we use the same intuition as for the se- Theorem 4 quential operator above. There are many states σ in which S could begin execution, and all of them could potentially



˙ − ˙ Q D(bH1 σH ) D(bHn σH )= i | 1≤i≤n (ri) terminate in state σ. So to compute ([[S]] δ)(σ),wetakea weighted average over all input states σ. The weights are  Proof. In Appendix B. δ(σ ), which describes how likely σ is to be used as the in- put state. With σ as input, S terminates in state σ with fre- quency ([[S]] σ)(σ). Thus we define [[ S]] δ as: 5. Language semantics   · [[ S]] δ λσ . σ δ(σ ) ([[S]] σ )(σ) The last piece required for our framework is a seman- = σ δ(σ) · [[ S]] σ tics [[ S]] in which programs are functions that map distri- butions to distributions. Here we build such a semantics in 5 To ensure that the fixed point for while exists, we have to verify that two stages. First, we build a simpler semantics that maps Dist is a complete partial order with a bottom element. To do so, we have to extend the definition Dist to be State → [0, 1]. This makes states to distributions. Second, we lift the simpler seman- distributions correspond to subprobability measures, and it is easy to tics so that it operates on distributions, as suggested by Sec- check that the semantics always produces subprobability measures as tion 2.2. output. The LUB is at most λσ . 1, and the bottom element is λσ . 0.

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE [[ skip]] σ = σ˙ [[ v := E]] σ = σ˙ [v →E] [[ S1; S2]] σ = λσ . σ ([[S1]] σ)(σ ) · ([[S2]] σ )(σ ) [[ if b then S1 else S2]] σ = if [[ B]] σ then [[ S1]] σ else [[ S2]] σ [[ while B do S]] σ =fixf : Dist → Dist. if [[ B]] σ then f([[S]] σ) else σ˙ · − · [[ S1 p S2]] σ = p [[ S1]] σ +(1 p) [[ S2]] σ

Figure 3. Semantics of programs in states

[[ skip]] δ = δ [[ v := E]] δ = δ[v → E] [[ S1; S2]] δ =[[S2]] ( [[ S1]] δ) [[ if B then S1 else S2]] δ =[[S1]] ( δ | B)+[[S2]] ( δ |¬B) [[ while B do S]] δ =fixf : Dist → Dist.f([[S]] ( δ | B)) + (δ |¬B) · − · [[ S1 p S2]] δ =[[S1]] p δ +[[S2]] ( 1 p) δ

Figure 4. Semantics of programs in distributions

where the equality follows from η-reduction. Wittbold and Johnson [26] introduce nondeducibility Applying this definition to the semantics in Figure 3 on strategies, an extension of Sutherland’s nondeducibility yields [[ S]] δ, shown in Figure 4. This lifted semantics corre- [22]. Wittbold and Johnson observe that if a program is run sponds directly to a semantics given by Kozen [15], which multiple and between runs is allowed, then interprets programs as continuous linear operators on mea- information can be leaked by coding schemes across multi- sures. Our semantics uses an extension of the distribution ple runs. A system that is nondeducible on strategies has no conditioning operator | to Boolean expressions. Whereas noiseless channels between high input and distribution conditioning produces a normalized distribu- low output, even in the presence of feedback. tion, Boolean expression conditioning produces an unnor- The flow model (FM) is a security property first given by malized distribution: McLean [19] and later given a quantitative formalization by δ|B  λσ . if [[ B]] σ then δ(σ) else 0 Gray [11], who called it the Applied Flow Model (AFM). The FM stipulates that the probability of a low output may By producing unnormalized distributions as part of the depend on previous low outputs, but not on previous high meaning of if and while statements, we are tracking the fre- outputs. Gray formalizes this in the context of probabilis- quency with which each branch of the statement is chosen. tic state machines, and he relates noninterference to the rate of maximum flow between high and low. Browne [1] devel- 6. Related work ops a novel application of the behind the Turing test to characterize information flow: a system passes Browne’s We believe our work is the first to address and show Turing test exactly when for all finite lengths of time, the in- formation flow over that time is zero. Halpern and O’Neill the importance of attacker beliefs in quantifying informa- tion flow. Perhaps the earliest published connection between [12] construct a framework for reasoning about that generalizes many previous results on qualitative and proba- information theory and information flow is Denning [5], bilistic, but not quantitative, security. which demonstrates the analysis of a few particular assign- ment and if statements by using entropy to calculate leak- Volpano [23] gives a type system that can be used to age. Millen [20], using deterministic state machines, proves establish the security of password checking and one-way that a system satisfies noninterference exactly when the mu- functions such as MD5 and SHA1. Noninterference does tual information between certain inputs and outputs is zero. not allow such functions to be typed, so this type system is He also proposes as a metric for infor- an improvement over previous type systems. However, the mation flow, but does not show how to compute the amount type system does not allow a general analysis of quantita- of flow for programs. tive information flow. Volpano and Smith [24] give another

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE type system that enforces relative secrecy, which requires ability distribution on a property of the confidential data. that well-typed programs cannot leak confidential data in They use Bayesian reasoning, based on observation of ran- polynomial time. domized data, to update the attacker’s probability distri- Weber [25] defines n-limited security, which allows de- butions. Their distributions are similar to our beliefs, but classification at a rate that depends, in part, on the size n of have a strong admissibility restriction: the attacker’s prebe- a buffer shared by the high and low projections of a state. lief must be the same distribution from which the system Lowe [16] defines the information flow quantity of a pro- generates the high input. They also show that relative en- cess with two users H and L to be the number of behaviors tropy can be used to bound the maximum privacy breach of H that L can distinguish. When there are n such distin- for a randomized operator. guishable behaviors, H can use them to transmit lg n bits to L. These both measure the size of channels rather than ac- 7. Conclusion curacy of belief. Di Pierro, Hankin, and Wiklicky [7] relax noninterfer- This paper presents a model for incorporating attacker ence to approximate noninterference, where “approximate” beliefs into analysis of quantitative information flow in pro- is a quantified measure of the similarity of two processes in grams. Our theory reveals that uncertainty, the traditional a process algebra. Similarity is measured using the supre- metric for information flow, is inadequate: it cannot sat- mum norm over the difference of the probability distribu- isfactorily explain even the simple example of password tions the processes create on memory. They show how to in- checking. Accuracy is the appropriate metric for informa- terpret this quantity as a probability on an attacker’s ability tion flow in the presence of attacker beliefs, and we have to distinguish two processes from a finite number of tests, shown how to use it to calculate exact, expected, and maxi- in the sense of statistical hypothesis testing. Finally, the pa- mum information flow. A formal model of experiments we per explores how to build an abstract interpretation that al- give enables precise descriptions of attackers’ actions. We lows approximation of the confinement of a process. Their have instantiated the model with a probabilistic semantics more recent work [6] generalizes this to measuring approx- and have given several examples of applying the model and imate confinement in probabilistic transition systems. metric to the measurement of information flow. Clark, Hunt, and Malacaria [3] apply information the- ory to the analysis of while-programs. They develop a static Acknowledgments analysis that provides bounds on the amount of information that can be leaked by a program. The metric for informa- Stephen Chong participated in an early discussion about tion leakage is based on conditional entropy; the analysis the distinction between attacker beliefs and reality. Sig- consists of a dataflow analysis, which computes a use-def mund Cherem, Stephen Chong, Jed Liu, Kevin O’Neill, graph, accompanied by a set of syntax-directed inference Nathaniel Nystrom, Riccardo Pucella, Lantian Zheng, and rules, which calculate leakage bounds. In other work [2], the reviewers provided helpful feedback on the paper. 
Se- the same authors investigate other leakage metrics, settling bastian Hunt also provided insightful comments on this on conditional mutual information as an appropriate met- work. ric for measuring flow in probabilistic ; they do not consider relative entropy. Mutual information is always References at least 0, so unlike relative entropy it cannot represent mis- information. [1] R. Browne. The Turing test and non-information flow. In McIver and Morgan [17] calculate the S&P 1991, pages 375–385, Oakland, CA, 1991. IEEE. of a program using conditional entropy. They add demonic [2] D. Clark, S. Hunt, and P. Malacaria. Quantified interference: nondeterminism as well as probabilistic choice to the lan- Information theory and information flow. Presented at Work- guage of while-programs, and they show that the perfect shop on Issues in the Theory of Security (WITS’04), April 2004. security (0 bits of leakage) of a program is determined by [3] D. Clark, S. Hunt, and P. Malacaria. Quantified interference the behavior of its deterministic refinements. They also con- for a while language. Electronic Notes in Theoretical Com- sider restricting the power of the demon making the nonde- puter Science, 112:149–166, Jan 2005. terministic choices, such that it can see all data, or just low [4] T. M. Cover and J. A. Thomas. Elements of Information The- data, or no data. ory. John Wiley & Sons, 1991. Evfimievski, Gehrke, and Srikant [8] quantify privacy [5] D. Denning. and Data Security. Addison- breaches in data mining. In their framework, randomized Wesley, 1982. operators are applied to confidential data before the data is [6] A. Di Pierro, C. Hankin, and H. Wiklicky. Measuring the released. A privacy breach occurs when release of the ran- confinement of probabilistic systems. To appear in Theoret- domized data causes a large change in an attacker’s prob- ical Computer Science.

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE [7] A. Di Pierro, C. Hankin, and H. Wiklicky. Approximate A. Relaxing restrictions on programs non-interference. Journal of Computer Security, 12(1):37– 81, 2004. To allow mutable high inputs, as discussed in Section 0 [8] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy 3.4, let the notation bH mean the same distribution as bH , breaches in privacy preserving data mining. In Proc. ACM except that each state of its domain has a 0 as a super- Symposium on of Systems, pages 211– script. So, if bH ascribes probability p to the state σ,then 222, San Diego, CA, 2003. 0 0 bH ascribes probability p to the state σ . We assume that S [9] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. cannot modify states with a superscript 0. In the case that Bayesian . Chapman and Hall/CRC, 2004. states map variables to values, this could be achieved by 0 [10] J. A. Goguen and J. Meseguer. Security policies and security defining σ to be the same state as σ, but with the super- models. In Proc. IEEE Symposium on Security and Privacy, script 0 attached to variables; for example, if σ(v)=1then pages 11–20, Apr. 1982. σ0(v0)=1. Note that S cannot modify σ0 if did not origi- [11] J. W. Gray, III. Toward a mathematical foundation for infor- nally contain any variables with superscripts. mation flow security. In S&P 1991, pages 21–35, Oakland, Using this notation, the belief revision operator is ex- CA, 1991. IEEE. tended to B!, which allows S to modify the high state in ex- [12] J. Halpern and K. O’Neill. Secrecy in multiagent systems. periment E = S, bH ,σH ,σL . In CSFW 2002, pages 32–46, Cape Breton, Nova Scotia, ! 0 0 Canada, 2002. IEEE. B (E)  (([[S]] ( σ˙ L ⊗ bH ⊗ bH )|o))  H [13] J. Y. Halpern. Reasoning about Uncertainty. MIT Press, where o ∈ Γ(δ)  L Cambridge, Massachusetts, 2003. δ =[[S]] ( σ˙ L ⊗ σ˙ H ) [14] G. A. Jones and J. M. Jones. Information and Coding The- ory. Springer, 2000. In the first line of the definition, the high input state is 0 [15] D. Kozen. Semantics of probabilistic programs. Journal of preserved by introducing the product with bH , and the at- Computer and System Sciences, 22:328–350, 1981. tacker’s postbelief about the input is recovered by restrict- [16] G. Lowe. Quantifying information flow. In CSFW 2002, ing to H0, the high input state with the superscript 0. pages 18–31, Cape Breton, Nova Scotia, Canada, 2002. To allow nonterminating programs, let State⊥  State∪ IEEE. {⊥},and⊥  L  ⊥. Nontermination is now allowed as an [17] A. McIver and C. Morgan. A probabilistic approach to in- observation, leading to an extended belief revision operator formation hiding. In Programming , chapter 20, B!⊥: pages 441–460. Springer, 2003. !⊥ 0 0 [18] A. McIver and C. Morgan. , Refinement and B (E)  (out ⊥(S, σ˙ L ⊗ bH ⊗ bH )|o)  H Proof for Probabilistic Systems. Springer, 2004. where o ∈ Γ(δ)  L [19] J. McLean. Security models and information flow. In S&P δ = out ⊥(S, σ˙ L ⊗ σ˙ H ) 1990, pages 180–189, Oakland, CA, 1990. IEEE. [20] J. Millen. Covert channel capacity. In S&P 1987, pages 60– Function out ⊥(S, δ) produces a distribution that yields 66, Oakland, CA, 1987. IEEE. the frequency that S terminates, or fails to terminate, on in- [21] L. H. Ramshaw. Formalizing the . put distribution δ: PhD thesis, Stanford University, 1979. Available as tech- nical report, XEROX PARC, 1981. out ⊥(S, δ)  λσ : State⊥ . if σ = ⊥  −  [22] D. Sutherland. A model of information. 
In Proceedings of then δ [[ S]] δ the 9th National Computer Security Conference, pages 175– else ([[S]] δ)(σ) 183, Sep 1986. If S does not terminate on some input states in δ, then out- [23] D. Volpano. Secure introduction of one-way functions. In [[ S]] δ δ CSFW 2000, pages 246–254, Cambridge, UK, 2000. IEEE. put distribution will contain less mass than ;other- wise, δ will equal [[ S]] δ. Missing mass corresponds to [24] D. Volpano and G. Smith. Verifying secrets and relative se- nontermination [21, 18], so out ⊥ maps the missing mass to crecy. In POPL 2000, pages 268–276, Boston, MA, 2000. ⊥. ACM. [25] D. G. Weber. Quantitative hook-up security for covert chan- nel analysis. In CSFW 1988, pages 58–71, Franconia, NH, B. Proofs 1988. IEEE. [26] J. T. Wittbold and D. Johnson. Information flow in nonde- Theorem 1 Let E = S, bH ,σH ,σL . terministic systems. In S&P 1990, pages 144–161, Oakland, CA, 1990. IEEE. B(E,o)(σH )=B(E,o)

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE Proof. Proof.

B(E,o) Q( E,bH ) = Definition of B = Definition of Q

· ˙ ⊗ ˙ 

˙ − ˙  bH (σH ) ([[S]] ( σL σH ) L)(o) D(bH σH ) D(b σH ) H σ bH (σH ) · ([[S]] ( σ˙ L ⊗ σ˙ H )  L)(o) H = Definitions of D and point mass = Definition of δ  L, apply distribution to o  − lg bH (σH )+lgb (σH ) · ˙ ⊗ ˙ H  bH (σH ) ( σ| σL=o ([[S]] ( σL σH )(σ)) = Lemma 2.1, properties of lg bH (σH ) · ( ([[S]] ( σ˙ L ⊗ σ˙ H )(σ)) σH σ | σL=o − lg PrδA (o)+lgPrδS (o) = Lemma 1.1 · ˙ ⊗ ˙ = Definition of I bH (σH) ( σ | σL=o ([[S]] ( σL σH )(σ)) Iδ (o) −Iδ (o) σ | σL=o [[ S]] ( σ˙ L ⊗ bH )(σ ) A S =  Distributivity, one-point rule  σ | σL=o ∧ σH=σ σ bH (σH )·[[ S]] ( σ˙ L⊗σ˙ H )(σ)  H H Lemma 2.1 σ | σL=o [[ S]] ( σ˙ L ⊗ bH )(σ ) = Lemma 1.1 δS(o)  bH (σH )=bH (σH ) · [[ S]] ( σ˙ L ⊗ bH )(σ) δA(o) σ | σL=o ∧ σH=σH ˙ ⊗ σ | σL=o [[ S]] ( σL bH )(σ ) Proof. = Distributivity   [[ S]] ( σ˙ L⊗bH )(σ) bH (σH ) σ | σL=o ∧ σH=σH [[ S]] ( σ˙ ⊗b )(σ) σ | σL=o L H = Definition of B = Definition of δ  L  (([[S]] ( σ˙ L ⊗ bH )|o)  H)(σH ) (([[S]] ( σ˙ L ⊗ bH ))|o)(σ) σ | σH=σH = Definition of δ  H  = Definition of δ  H, applying distribution to σH ˙ L ⊗ H | σ | σH=σH ([[S]] ( σ b ) o)(σ) ((([[S]] ( σ˙ L ⊗ bH ))|o)  H)(σH ) = Definition of δ|o B E  = Definition of ( ,o) [[ S]] ( σ˙ L ⊗ bH )(σ) σ | σH=σH ∧ σL=o B(E,o)(σH ) ([[S]] ( σ˙ L ⊗ bH )  L)(o) = One-point rule: σ = o, σH  [[ S]] ( σ˙ L ⊗ bH )( o, σH ) ([[S]] ( σ˙ L ⊗ bH )  L)(o)

Lemma 1.1 Let σ  L = o. = Definition of δA 1 · [[ S]] ( σ˙ L ⊗ bH )( o, σH )  δA(o) [[ S]] ( σ˙ L ⊗ bH )(σ)= bH (σH ) · [[ S]] ( σ˙ L ⊗ σ˙ H )(σ) σH = Definition of [[ S]] δ  1 · (σ˙ L ⊗ bH )(σ ) · ([[S]] σ )(o˙ ⊗ σ˙ H ) Proof. δA(o) σ = Definition of ⊗, point mass  [[ S]] ( σ˙ L ⊗ bH )(σ) 1 · bH (σ  H) δA(o) σ | σ L=σL = Definition of [[ S]] δ  ·([[S]] ( σ˙ L ⊗ (σ˙  H)))(o˙ ⊗ σ˙ H ) (σ˙ L ⊗ bH )(σ ) · ([[S]] σ )(σ) σ = High input is immutable  = Definition of point mass 1  · bH (σ  H) δA(o) σ | σ L=σL ∧ σ H=σH σ | σL=σ bH (σ  H) · ([[S]] σ )(σ) L ·([[S]] ( σ˙ L ⊗ (σ˙  H)))(o˙ ⊗ σ˙ H ) = Let σ = σL,σH , nesting, one-point rule  = One-point rule: σ = σL,σH σ bH (σH ) · [[ S]] ( σ˙ L ⊗ σ˙ H )(σ) 1 H · bH (σH ) · ([[S]] ( σ˙ L ⊗ σ˙ ))(o˙ ⊗ σ˙ H ) δA(o) H  = High input is immutable, Definition of δ  L 1 · bH (σH ) · (([[S]] ( σ˙ L ⊗ σ˙ ))  L)(o) δA(o) H Theorem 2 Let E = S, bH ,σH ,σL . = Definition of δS δS(o) Q E I −I bH (σH ) · ( ,bH )= δA (o) δS (o) δA(o)

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE = Definition of δ  L, applied to o  Note that the immutability of high input can be dispensed  ([[S]] ( σ˙ L ⊗ bH ))( o, σH ) o (δ  L)(o) · with using the technique of Section 3.4.  σ | σL=o [[ S]] ( σ˙ L ⊗ bH )(σ ) = Let σ = o, σH , change of dummy: o := σ, Theorem 3 Let: definition of ≈L  E = S, bH ,σH ,σL σ | σH=σ (δ  L)(o) δ =[[S]] ( σ˙ L ⊗ σ˙ H ) H  ([[S]] ( σ˙ L ⊗ bH ))(σ) eH = (([[S]] ( σ˙ L ⊗ bH ))|(δ  L))  H · ˙ L ⊗ H σ | σ ≈Lσ [[ S]] ( σ b )(σ ) Then: = Definition of δ|δL, applied to σ  QE(E) ≤Q( E,eH ) σ | σH=σ ([[S]] ( σ˙ L ⊗ bH )|(δ  L))(σ) Proof. H = Definition of δ  H, applied to σH Q E E( ) (([[S]] ( σ˙ L ⊗ bH )|(δ  L))  H)(σH ) Q = Definition of E = Definition of eH Q E B E Eo∈δ L[ ( , ( ,o) )] eH (σH ) = Definition of Q,letbH = B(E,o) ) 

˙ − ˙ Eo∈δ L[D(bH σH ) D(b σH )] H Theorem 4 = E

Linearity of 

˙ − ˙ Q D(bH σH ) D(b σH )= (ri)

0 Hn i

˙ − ˙ D(bH σH ) Eo∈δ L[D(bH σH )] ≤ Jensen’s inequality and convexity of D,see[4] Proof. 

˙ − ˙ D(bH σH ) D(Eo∈δ L[bH ] σH ) i | 1≤i≤n Q(ri) = Lemma 3.1 = Definition of Q 

˙ − ˙

D(bH σH ) D(eH σH )

˙ − ˙ i | 1≤i≤n D(bHi σH ) D(bHi σH ) = Definition of Q = Lemma 4.1

Q( E,eH )

˙ − ˙ D(bH1 σH ) D(bHn σH )  

E Lemma 3.1 Let , δ , eH be defined as in Theorem 3. Let Lemma 4.1 Assume for f and f that ∀i | 1≤i≤n f(i)= B E  bH = ( ,o) and assume the range of o is always δ L. f (i − 1), n ≥ 2,andf(1) = f (0). Then: Then:  ( f(i) − f (i)) = f(1) − f (n) Eo[bH ]=eH i | 1≤i≤n Proof. (by extensionality) Proof.  − Eo[bH ](σH ) i | 1≤i≤n f(i) f (i) = Definitions of E, b = f(i)=f (i − 1)  H  − − ( o (δ  L)(o) ·B(E,o)(σH ) i | 1≤i≤n f (i 1) f (i) = Definition of B(E,o) = Distributivity    − − o (δ  L)(o) · ((([[S]] ( σ˙ L ⊗ bH ))|o)  H)(σH ) ( i | 1≤i≤n f (i 1)) i | 1≤i≤n f (i) = Definition of δ  H, applying distribution to σH = Change of dummy: i := i − 1    (δ  L)(o) ( f (i)) − f (i) o  i | 0≤i≤n−1 i | 1≤i≤n ·( (([[S]] ( σ˙ L ⊗ bH ))|o)(σ )) = Split off term, n ≥ 2 σ | σ H=σH  = Definition of δ|o, applying distribution to σ f (0) + ( f (i))   i | 1≤i≤n−1 o (δ  L)(o) −( i | 1≤i≤n−1 f (i)) − f (n)  ˙ ⊗ ([[S]] ( σL bH ))(σ ) = f(1) = f (0) ·( σ | σH=σ ∧ σL=o ) Arithmetic, H ([[S]] ( σ˙ L ⊗ bH )  L)(o) f(1) − f (n) = One-point rule  ([[S]] ( σ˙ L ⊗ bH ))( o, σH )  o (δ  L)(o) · ([[S]] ( σ˙ L ⊗ bH )  L)(o)

Proceedings of the 18th IEEE Computer Security Foundations Workshop (CSFW’05) 1063-6900/05 $20.00 © 2005 IEEE