Introspective Forecasting

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Loizos Michael
Open University of Cyprus
[email protected]

Abstract

Science ultimately seeks to reliably predict aspects of the future; but, how is this even possible in light of the logical paradox that making a prediction may cause the world to evolve in a manner that defeats it? We show how learning can naturally resolve this conundrum. The problem is studied within a causal or temporal version of the Probably Approximately Correct semantics, extended so that a learner's predictions are first recorded in the states upon which the learned hypothesis is later applied. On the negative side, we make concrete the intuitive impossibility of predicting reliably, even under very weak assumptions. On the positive side, we identify conditions under which a generic learning schema, akin to randomized trials, supports agnostic learnability.

1 Introduction

Several scientific disciplines seek to reliably forecast a future state of affairs. Often, utility is derived not simply by having one's predictions verified, but by acting upon the predictions prior to their verification. Being able to predict the stock market, for example, would be of little value if one were not to invest based on that prediction, or get paid to share it. In certain cases, taking such actions is accompanied by the distinct risk of altering the temporal evolution of the environment in a way that would lead to a different future than the one anticipated.

The problem is modeled as a forecaster in a possibly adversarial environment that becomes aware of the forecaster's prediction before the environment commits to the outcome that the forecaster attempts to predict. In this context, a forecaster is required to be introspective, and to acknowledge the effect of its predictions on their realizability. The contributions of this work are four-fold: (i) it formalizes introspective forecasting (cf. Definition 4) as an extension of the PAC semantics [Valiant, 1984; Michael, 2011]; (ii) it identifies a realizability quantity (cf. Definition 3) that measures the optimal correctness that any introspective forecaster can achieve; (iii) it proposes a general schema for constructing introspective forecasters and identifies conditions (cf. Theorem 1) under which it applies; (iv) it establishes limitations on what can be introspectively forecasted (cf. Theorem 2) and discusses situations under which these limitations can be lifted (cf. Theorem 3).

Of course, introspective forecasting is not aimed to be used to make "self-fulfilling" predictions, which are realized simply because they were acted upon. Instead, having the ability to make such "self-fulfilling" predictions allows one to rationally choose whether it is warranted to act upon them, if this indeed happens to lead to a future that is preferred over one where no action is taken. We defer a discussion of the philosophical issues that inevitably arise to an extended version of this paper, and focus herein on the technical framework.

2 Overview

We start with an overview of certain key aspects of this work.

2.1 Causal Learnability

Causal learnability [Michael, 2011] extends the PAC learning semantics [Valiant, 1984] to a causal or temporal setting. Below we present a simplified version of that framework.

Given as inputs T ≜ ∪_n T_n, S ≜ ∪_n S_n, and ε, δ ∈ (0, 1], where S_n is a set of states of size n, and T_n is a set of functions that map S_n to S_n, the learning algorithm proceeds as follows: During a training phase, the learner observes pairs of states ⟨s, t(s)⟩, such that s is drawn independently at random from some arbitrary but fixed probability distribution D_n, and such that t is a fixed target function in T_n. After time upper-bounded by a fixed polynomial in n, 1/ε, 1/δ, the learner returns, with probability at least 1 − δ, a hypothesis function h : S_n → S_n such that the probability that h(s) = t(s) on a state s randomly drawn from D_n is at least 1 − ε. If the learner can meet the stated requirements for every choice of n, ε, δ ∈ (0, 1], D_n, and t ∈ T_n, then the class T is causally learnable. Previous work [Michael, 2011] investigates which classes are causally learnable, by establishing, among others, connections to PAC concept learning [Valiant, 1984].

One may allow functions in T_n that map S_n to a set L_n, when modeling scenarios where a learner seeks to predict not the entire successor state t(s) of s ∈ S_n, but only some aspect, or label in L_n, of t(s). When |L_n| = 2, this resembles PAC concept learning, but with a key conceptual difference: the causal dependence of the label of t(s) on s. Even when s is observed, the label of t(s) is not determined until the successor state t(s) materializes; if we were to somehow affect s after observing it, this would also affect the label of t(s).
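The basic protocol may be easier to see in code. The following is a minimal Python sketch of one run of the causal learning loop described above; draw_state, target, learner, m_train, and m_test are illustrative placeholders of ours, not anything specified in the paper, and the learner's internals are left abstract.

```python
def causal_learning_round(draw_state, target, learner, m_train, m_test):
    """One run of the causal learning protocol (illustrative sketch).

    draw_state : () -> state     samples s from the unknown distribution D_n
    target     : state -> state  the hidden temporal dynamics t in T_n
    learner    : object that consumes <s, t(s)> pairs and emits a hypothesis
    """
    # Training phase: the learner observes pairs <s, t(s)>, with each s
    # drawn independently from the arbitrary but fixed distribution D_n.
    for _ in range(m_train):
        s = draw_state()
        learner.observe(s, target(s))

    # The learner returns a hypothesis h : S_n -> S_n.
    h = learner.hypothesis()

    # Evaluation: estimate Pr[h(s) = t(s)] on fresh states from D_n.
    # Causal learnability asks that this probability be at least 1 - eps,
    # with probability at least 1 - delta over the training run.
    hits = 0
    for _ in range(m_test):
        s = draw_state()
        if h(s) == target(s):
            hits += 1
    return hits / m_test
```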
2.2 Recording Predictions

To make the aforementioned conceptual distinction between PAC concept learning and causal learning explicit, let T_n be as in the original causal learning framework, and introduce a new set F_n of functions that map S_n to L_n. Like concepts in PAC learning, each f ∈ F_n maps a state to a label of that same state. Unlike PAC learning, however, the class F ≜ ∪_n F_n does not aim to act as a hidden structure to be learned; the hidden structure is the temporal dynamics of the environment as captured by the class T. Instead, F is available for the learner, who may choose to apply any f ∈ F_n on the successor state t(s) ∈ S_n to produce a label (i.e., to extract a particular feature). For any chosen function f ∈ F_n, then, the definition of causal learnability applies mutatis mutandis as before, except that the learner has access to L_n and f, and aims to return a hypothesis function h : S_n → L_n such that h(s) = f(t(s)). If the learner can meet the learning requirements for every choice of n, ε, δ ∈ (0, 1], D_n, t ∈ T_n, and f ∈ F_n, then the class T is extrospectively learnable through the class F.

Extrospective learnability adopts an assumption that is implicit in several supervised learning frameworks: the learner operates outside the environment, in that the label of a learning instance is determined irrespectively of the learner's prediction. This is so even when the learning instances are chosen adversarially with access to the learner's current hypothesis (e.g., in mistake-bounded learning [Littlestone, 1988]), since even then the label of each (adversarially) chosen learning instance is still independent of the learner. Closer in spirit to the causal nature of extrospective learnability, the stated assumption is also followed by certain statistical and learning approaches that are used to predict the temporal evolution of a sequence of states [Murphy, 2002; Box et al., 2008].

To account for situations where the very act of making a prediction may potentially affect the correct / expected prediction, we introduce the notion of a recorder function ◁ that maps S_n × L_n to S_n; thus, a recorder function records into a state s ∈ S_n a given label l ∈ L_n and produces a new state s ◁ l ∈ S_n. This new state belongs in the same set S_n of states as state s, since the manner in which the environment is represented is independent of whether predictions are made.
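The contrast between the two settings can be phrased as two correctness checks, sketched below in Python. This is illustrative only: record stands in for the recorder function ◁, and h, f, t are abstract stand-ins for the hypothesis, the chosen labeling function, and the environment's dynamics. The introspective check anticipates the condition examined in the next subsection.

```python
def extrospectively_correct(h, f, t, s):
    # Extrospective setting: the prediction is made "from outside" the
    # environment, so the label f(t(s)) is independent of what h predicts.
    return h(s) == f(t(s))

def introspectively_correct(h, f, t, record, s):
    # Introspective setting: the prediction h(s) is first recorded into
    # the state via the recorder function (s, l) -> s ◁ l, and only then
    # does the environment evolve; correctness means h(s) = f(t(s ◁ h(s))).
    prediction = h(s)
    return prediction == f(t(record(s, prediction)))
```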
2.3 Introspective Learnability

With a recorder function in place, the learner's prediction h(s) is first recorded into the state before the environment evolves, and the hypothesis is asked to satisfy h(s) = f(t(s ◁ h(s))). Unlike the extrospective case, however, the hypothesis function h appears on both sides of the equality h(s) = f(t(s ◁ h(s))) that it seeks to obey; it is not necessarily possible to factor out h from the equation. This leads to the consideration of an agnostic setting, where h is required to obey the equality with probability 1 − α − ε, where 1 − α is the optimal probability that any hypothesis can achieve. As is typical in agnostic or noisy learning models, the learner is allowed time polynomial in the usual learning parameters, but also in 1/(1 − α), to compensate for the adversarial environment it faces. These extensions give rise to the final definition of introspective learnability that we adopt.

Under this final definition, we establish certain connections between extrospective (and, in particular, causal) learnability and introspective learnability. We first show that T is introspectively learnable through F if T is causally learnable (cf. Theorem 1), but causal learnability is not also a necessary condition (cf. Theorem 2). We then introduce a metric of how expressive the set F_n is (cf. Definition 5), in terms of how many functions in the set are needed to invert the mappings they produce, and show that if running time polynomial in this invertibility dimension of F_n is also allowed, then the necessity of causal learnability can be established (cf. Theorem 3).

2.4 Randomized Trials

The sufficiency of causal learnability for introspective learnability comes through a process analogous to randomized trials, often used in clinical research and certain social sciences. Roughly, a randomized trial examines the effects of competing intervention strategies by applying a randomly chosen one on each instance of interest. In the case of clinical research, for example, one administers a randomly chosen drug among a certain set of such drugs to each patient participating in the study. The need for randomized trials arises from the fact that administering a drug to cure / prevent a predicted illness in the future may have inadvertent and unpredictable effects. What is needed, then, is making predictions that are introspectively accurate. Ideally, a doctor should administer drugs based not on the extrospectively predicted illness of a patient (i.e., "take this drug because without it you will remain / become sick"), but on the introspectively predicted outcome once the prediction has been acted upon (i.e., "take this drug because with it you will become / remain healthy").
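The sketch below shows one plausible way such a randomized-trial schema could be instantiated; it is an assumption of ours that goes beyond the text above, not necessarily the paper's exact construction from Theorem 1. During training, a uniformly random label is recorded into each state before the environment evolves, so that a causal learner can learn the map s ◁ l ↦ f(t(s ◁ l)) without its own predictions contaminating the data; at prediction time, the forecaster searches for a self-consistent label. All names are illustrative.

```python
import random

def train_by_randomized_trials(draw_state, t, f, record, labels, learner, m):
    # Training akin to a randomized trial: record a uniformly random label
    # (the "intervention") into each training state before it evolves, so
    # the recorded label is independent of any hypothesis being formed.
    for _ in range(m):
        s_rec = record(draw_state(), random.choice(labels))  # s ◁ l
        learner.observe(s_rec, f(t(s_rec)))  # observed label of successor
    # The returned hypothesis g : S_n -> L_n approximates s' |-> f(t(s')).
    return learner.hypothesis()

def introspective_forecast(g, record, labels, s):
    # Prediction: search for a self-consistent label, i.e., one that the
    # learned map still assigns after the label itself has been recorded:
    # g(s ◁ l) = l. Such a label remains realizable once announced.
    for l in labels:
        if g(record(s, l)) == l:
            return l
    return None  # no realizable prediction among the candidate labels
```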
