A Logic for Causal Inference in Time Series with Discrete and Continuous Variables

Samantha Kleinberg
Columbia University
New York, NY

Abstract
Many applications of causal inference, such as finding the relationship between stock prices and news reports, involve both discrete and continuous variables observed over time. Inference with these complex sets of temporal data, though, has remained difficult and has required a number of simplifications. We show that recent approaches for inferring temporal relationships (represented as logical formulas) can be adapted for inference with continuous-valued effects. Building on advances in logic, PCTLc (an extension of PCTL with numerical constraints) is introduced here to allow representation and inference of relationships with a mixture of discrete and continuous components. Then, finding significant relationships in the continuous case can be done using the conditional expectation of an effect, rather than its conditional probability. We evaluate this approach on both synthetically generated and actual financial market data, demonstrating that it can allow us to answer different questions than the discrete approach can.

1 Introduction
Relationships such as “smoking causes lung cancer” or “gene A regulates gene B” can help us decide to quit smoking or investigate a pathway during drug development, but in many cases it is useful to understand the relationship between the magnitude of the cause and the probability of the effect, or to explore causes that produce the greatest level of the effect. We may want to know not only whether positive news about a company causes its price to increase, but how much of an increase we can expect, when it will happen, and what other factors are needed. To do this we need to reason about complex relationships that have qualitative, quantitative, and temporal components. Thus far, approaches to causal inference have focused on separate aspects of this problem, with none addressing all areas. Methods based on Bayesian networks [Pearl, 2000] yield compact representations of sparse systems, but neither they nor their temporal extensions, dynamic Bayesian networks [Murphy, 2002], allow for automated representation of and reasoning about relationships more complex than one variable causing another, or involving windows of time. Similarly, Granger causality [1969] allows for only discrete lags between cause and effect and assumes that the relationships are between individual variables. Finally, prior work on inferring complex causal relationships represented as logical formulas identified factors that substantially impact the probability of the effect, but required that all variables be discretized [Kleinberg and Mishra, 2009].

In this work we address the problem of inference when we are interested primarily in the level of the effect. We will show that instead of a difference in probability, a cause’s significance for an effect can be assessed using an average difference in conditional expectation. By extending the underlying logic used to represent the relationships, this approach allows for structured representation and automated inference of complex temporal relationships with discrete and continuous components. The result is a succinct representation of the most significant relationships in a set of data. We evaluate this approach empirically on both synthetic and actual financial market data to validate it and assess its practical use.

2 Background
We begin by reviewing the causal inference framework being extended. Earlier work by Kleinberg et al. [2009] introduced a method for causal inference based on probabilistic notions of causality, where the relationship between cause and effect is described in a structured way using probabilistic computation tree logic (PCTL) formulas. To assess whether the formulas satisfied by the data are significant, the average difference each cause makes to the probability of its effect is computed, and techniques for multiple hypothesis testing are applied to determine the level at which a relationship should be considered statistically significant [Efron, 2004].

2.1 Probabilistic computation tree logic
We briefly discuss PCTL and refer the reader to [Hansson and Jonsson, 1994] for a more in-depth description. Formulas in the logic are defined relative to a probabilistic Kripke structure (also called a discrete-time Markov chain (DTMC)), consisting of a set of states S, a start state $s_i$, a total transition function T that gives the probability of transition between all pairs of states, and a labeling function L that indicates which propositions from the set of atomic propositions A are true at each state. In the case of causal inference we do not normally have or infer these structures, but instead test the relationships directly in the data.

There are two types of formulas in PCTL: state formulas, which describe properties of individual states, and path formulas, which describe properties along sequences of states. These can be defined inductively as follows:¹

1. Each atomic proposition is a state formula.
2. If f and g are state formulas, so are $\neg f$ and $f \wedge g$.
3. If f and g are state formulas, $0 \leq r \leq s \leq \infty$ and $r \neq \infty$, then $f\, U^{\geq r, \leq s}\, g$ and $f\, W^{\geq r, \leq s}\, g$ are path formulas.
4. If f is a path formula and $0 \leq p \leq 1$, then $[f]_{\geq p}$ and $[f]_{> p}$ are state formulas.

The operators in item (2) have their usual meanings. Item (3) describes until (U) and weak-until (W) formulas. The first, $f\, U^{\geq r, \leq s}\, g$, means that f must be true at every state along the path until g becomes true, which must happen between r and s time units. The weak-until formula is similar, but does not guarantee that g will ever hold; in that case, f must hold for at least s time units. Finally, in (4) probabilities are added to path formulas to make state formulas: $[f]_{\geq p}$ holds at a state when the sum of the probabilities of the paths from that state along which f holds is at least p. One shorthand that will be useful for representing causal relationships is “leads-to”. The formula:

$$f \leadsto^{\geq r, \leq s}_{\geq p} g \equiv AG\left[f \rightarrow F^{\geq r, \leq s}_{\geq p}\, g\right] \qquad (1)$$

means that for all paths, from all states, if f is true, then g will become true between r and s time units later with probability at least p. This operator is defined differently relative to traces (where the problem is closer to runtime verification than to model checking), as described in [Kleinberg, 2010] and in the semantics given here.

2.2 Causes as logical formulas
To use this logic for causal inference, Kleinberg et al. [2009] assumed that the system has some underlying structure that is not observed, where the temporal observations (such as stock price movements or longitudinal electronic health records) can be thought of as observations of the sequence of states the system has occupied. In model checking, these sequences are referred to as traces. The standard probabilistic notion of causality, that a cause is earlier than and raises the probability of its effect, can then be translated into PCTL to define potential (prima facie) causes.

Definition 2.1. Where both c and e are PCTL formulas, c is a potential cause of e if, relative to a finite trace (or set of traces) …

… average significance of each cause for its effect. The basic premise of this approach is that when testing for spuriousness, one is trying to find whether there are better explanations for the effect. With X being the set of all potential causes of e, the significance of a particular cause c for an effect e is:

$$\varepsilon_{avg}(c, e) = \frac{\sum_{x \in X \setminus c} \left[ P(e \mid c \wedge x) - P(e \mid \neg c \wedge x) \right]}{|X \setminus c|} \qquad (3)$$

Note that the relationships between c and e, and between x and e, have time windows associated with them (as in equation (2)), so the conjunctions in (3) refer to instances where e occurs such that either c or x could have caused it (that is, thinking of each time window as a constraint on when e could have occurred, both constraints are satisfied). This average significance score can be used to partition the potential causes.

Definition 2.2. A potential cause c of an effect e is an ε-insignificant cause of e if $|\varepsilon_{avg}(c, e)| < \varepsilon$.

Definition 2.3. A potential cause c of an effect e that is not an ε-insignificant cause of e is an ε-significant, or just-so, cause of e.

To determine an appropriate value of ε, the problem is treated as one of multiple hypothesis testing, aiming to control the false discovery rate. Assuming many hypotheses are being tested and the proportion of true positives is small relative to this set, methods for empirically inferring the null hypothesis from the data can be applied [Efron, 2004], since the values of $\varepsilon_{avg}$ for large-scale testing mostly follow a normal distribution, with significant (non-null) values deviating from this distribution [Kleinberg, 2010].

3 Inference of relationships with discrete and continuous components
We now introduce an approach for inferring relationships with continuous-valued effects and for explicitly representing the constraints on continuous-valued causes as part of their logical formulas. In the previous section we described evaluating the significance of a cause for its effect using the average difference in probability, with each other possible cause of the effect held fixed. When an effect is continuous, we instead want to determine the impact of a cause on the level of the effect, and can do this using the average difference in expected value. For instance, we may want to determine the effect of medications on weight, where it may be difficult to discretize this effect of interest, though the potential causes are naturally discrete variables.
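As a rough illustration of equation (3) and of the expected-value variant just described, the Python sketch below computes an average difference in conditional means over a single discrete-time trace. Several choices are assumptions of this sketch, not the paper's method: a cause is treated as present at time t when it occurred between r and s steps earlier (one simple reading of the time windows); `active_within_window` and `epsilon_avg` are hypothetical helper names; and any potential cause x for which either conditioning set is empty is skipped rather than counted. When the effect is a 0/1 indicator, each conditional mean is a conditional probability, so the same function computes (3); with a real-valued effect it gives the average difference in conditional expectation.

```python
from statistics import mean
from typing import Dict, List, Sequence


def active_within_window(occurs: Sequence[bool], r: int, s: int) -> List[bool]:
    """Return a per-time flag that is True at t when the variable occurred at
    some earlier time t0 with r <= t - t0 <= s, i.e. an effect observed at t
    lies inside this cause's [r, s] window (assumes 0 <= r <= s)."""
    n = len(occurs)
    active = [False] * n
    for t0, happened in enumerate(occurs):
        if happened:
            # Mark every effect time this occurrence could explain,
            # clipped to the end of the trace.
            for t in range(t0 + r, min(t0 + s, n - 1) + 1):
                active[t] = True
    return active


def epsilon_avg(active: Dict[str, List[bool]], effect: Sequence[float], c: str) -> float:
    """Average over every other potential cause x of
    mean(effect | c and x present) - mean(effect | c absent, x present).

    With a 0/1 effect the conditional means are conditional probabilities, as in
    equation (3); with a real-valued effect they are conditional expectations.
    Causes x for which either conditioning set is empty are skipped (an
    assumption of this sketch); NaN is returned if no x remains."""
    times = range(len(effect))
    diffs = []
    for x, x_active in active.items():
        if x == c:
            continue
        with_c = [effect[t] for t in times if active[c][t] and x_active[t]]
        without_c = [effect[t] for t in times if not active[c][t] and x_active[t]]
        if with_c and without_c:
            diffs.append(mean(with_c) - mean(without_c))
    return mean(diffs) if diffs else float("nan")


if __name__ == "__main__":
    n = 8
    # Hypothetical occurrences, each with a window of exactly one step (r = s = 1).
    active = {
        "c": active_within_window([t in (0, 4) for t in range(n)], 1, 1),
        "x": active_within_window([t % 2 == 0 for t in range(n)], 1, 1),
        "y": active_within_window([True] * n, 1, 1),
    }
    level = [0.0, 5.0, 0.0, 1.0, 0.0, 5.0, 0.0, 1.0]  # continuous effect
    indicator = [0, 1, 0, 0, 0, 1, 0, 1]              # discrete (0/1) effect
    print(epsilon_avg(active, level, "c"))      # about 4.3
    print(epsilon_avg(active, indicator, "c"))  # about 0.65
```

In the toy data, c is present exactly at the two times when the continuous effect reaches 5, so its score is large on the level scale. Applying Definition 2.2 then reduces to comparing the absolute score with a threshold ε, which the paper chooses through false discovery rate control rather than fixing by hand.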
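Because the leads-to operator of equation (1) is evaluated on observed traces rather than on a known Kripke structure, its probability can be estimated from frequencies. The sketch below is a simplified reading of our own, not the trace semantics of [Kleinberg, 2010]: a trace is assumed to be a list of sets of the atomic propositions true at each time step, f and g are single propositions, and `leads_to_probability` and `satisfies_leads_to` are hypothetical names. It estimates how often g holds between r and s steps after a state where f holds, and compares that frequency with p.

```python
from typing import AbstractSet, Sequence


def leads_to_probability(trace: Sequence[AbstractSet[str]], f: str, g: str, r: int, s: int) -> float:
    """Fraction of states satisfying f that are followed by a state satisfying g
    between r and s steps later (assumes 0 <= r <= s).

    States whose [r, s] window runs past the end of the trace are not counted,
    so truncation is not treated as a failure. Returns NaN if no f-state counts."""
    hits = total = 0
    for t, state in enumerate(trace):
        if f not in state or t + s >= len(trace):
            continue
        total += 1
        if any(g in trace[u] for u in range(t + r, t + s + 1)):
            hits += 1
    return hits / total if total else float("nan")


def satisfies_leads_to(trace: Sequence[AbstractSet[str]], f: str, g: str, r: int, s: int, p: float) -> bool:
    """Frequency-based check that f leads to g within [r, s] with probability >= p."""
    # A NaN estimate (f never observed with a complete window) compares False.
    return leads_to_probability(trace, f, g, r, s) >= p


if __name__ == "__main__":
    trace = [{"f"}, set(), {"g"}, {"f"}, set(), set(), {"f"}, {"g"}, set()]
    print(leads_to_probability(trace, "f", "g", 1, 2))     # 2 of 3 f-states: 0.666...
    print(satisfies_leads_to(trace, "f", "g", 1, 2, 0.6))  # True
```

Excluding f-states whose window is cut off by the end of the trace avoids penalizing late occurrences; counting them as failures would be an equally simple but more conservative alternative.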
