
Poisson-Networks: A Model for Structured Point Processes

Shyamsundar Rajaram, ECE Department, University of Illinois, Urbana, USA
Thore Graepel, MLP Group, Microsoft Research, Cambridge, UK
Ralf Herbrich, MLP Group, Microsoft Research, Cambridge, UK

Abstract

Modelling structured multivariate point process data has wide ranging applications like understanding neural activity, developing faster file access systems and learning dependencies among servers in large networks. In this paper, we develop the Poisson network model for representing multivariate structured Poisson processes. In our model each node of the network represents a Poisson process. The novelty of our work is that waiting times of a process are modelled by an exponential distribution with a piecewise constant rate function that depends on the event counts of its parents in the network in a generalised linear way. Our choice of model allows us to perform exact sampling from arbitrary structures. We adopt a Bayesian approach for learning the network structure. Further, we discuss fixed point and sampling based approximations for performing inference of rate functions in Poisson networks.

1 Introduction

Structured multivariate point processes appear in many different settings ranging from multiple spike train recordings, file access patterns and failure events in server farms to queuing networks. Inferring the structure underlying such multivariate point processes and answering queries based on the learned structure is an important problem. For example, learning the structure of cooperative activity between multiple neurons is an important task in identifying patterns of information transmission and storage in cortical circuits [Brillinger and Villa, 1994, Aertsen et al., 1989, Oram et al., 1999, Harris et al., 2003, Barbieri et al., 2001, Brown et al., 2004]. Similarly, learning the access patterns of files can be exploited for building faster file access systems.

Consider V time series of events ti. We would like to find a compact representation of the joint probability distribution of the V time series. Assuming each time series is modelled by an inhomogeneous Poisson process, a unique representation is obtained in terms of V rate functions λi(t), each of which depends on all events of the V time series up to time t. Clearly, we need further assumptions on the rate functions in order to infer them from a finite amount of data. The proposed approach makes four crucial assumptions:

1. The rate function of each process depends only on the history in a short time window into the past.

2. The rate function of each process depends only on a small number of other processes. Adopting Bayesian network terminology, these are also referred to as parent nodes.

3. The rate function of each process depends only on the empirical rates of its parents.

4. The rate function of each process is parameterised by a generalised linear model.

The standard approach adopted in modelling such time series is to discretise the time axis into intervals of fixed length δ and to transform each time series into a sequence of counts per interval. The dependency structure between nodes of such a network, which is also known as a dynamic Bayesian network (DBN) [Dean and Kanazawa, 1989, Murphy, 2001], can be modelled using transition probabilities between states at time t and t + δ given the states of all the parents of each node at time t. This approach suffers from the problem that the discretisation is somewhat arbitrary: a small value of δ will result in a redundant representation and a huge computational overhead, but a large value of δ may smooth away important details in the data.
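To make the discretisation trade-off concrete, here is a minimal sketch (our own illustration; `discretise`, its arguments and the toy event times are not from the paper) that bins a list of event times into counts per interval of width δ. A small δ yields a long, mostly-zero count vector, while a large δ merges the burst around t ≈ 2.1 into a single count:

```python
import numpy as np

def discretise(event_times, t_max, delta):
    """Bin sorted event times into counts per interval of width delta."""
    n_bins = int(np.ceil(t_max / delta))
    counts, _ = np.histogram(event_times, bins=n_bins, range=(0.0, t_max))
    return counts

events = np.array([0.3, 0.35, 2.1, 2.15, 2.2, 7.9])
print(discretise(events, t_max=10.0, delta=0.1))  # 100 bins, almost all zero
print(discretise(events, t_max=10.0, delta=5.0))  # 2 bins: the burst structure is lost
```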
We overcome the issues of discretisation by considering the limit δ → 0 and thus modelling the waiting time in each time series directly. Nodelman et al. [2002] and Nodelman et al. [2003] use a similar continuous time approach for modelling homogeneous Markov processes with a finite number of states. In their work the rate functions depend on the current state of the parents only, which leads to an efficient and exact learning algorithm. In our setup, we model the waiting times by an exponential distribution with piecewise constant rates that depend on the count history of parent nodes.

Figure 1: Illustration of a Poisson network with a cycle unwrapped in time. Note that we use dashed lines to indicate parent relationships in Poisson networks. These arrows should always be interpreted as pointing forward in time.

The paper is structured as follows: In Sec. 2 we introduce our Poisson network model. In Sec. 3 we describe an efficient technique for performing exact sampling from a Poisson network. We describe parameter estimation and structure learning using approximate Bayesian inference techniques in Sec. 4. In Sec. 5 we discuss approximate marginalisation and inference based on sampling and fixed point methods. Finally, we describe experiments on data generated by our sampling technique for performing parameter learning, structure estimation and inference in Sec. 6.

2 The Poisson Network Model

Consider time series data $\mathbf{t}_i \in (\mathbb{R}^+)^{N_i}$ from V point processes. Each element of $\mathbf{t}_i = [t_{i,1}, \ldots, t_{i,N_i}]$ corresponds to a time $t_{i,j}$ at which a particular event occurred. We model each time series as an inhomogeneous Poisson process, and the modelling problem is to capture the dependency between different processes, both qualitatively (structure) and quantitatively (parameters).

A Poisson process is an instance of a counting process which is characterised by a rate function λ(t). If the rate function is constant over time, the Poisson process is called homogeneous, otherwise it is called inhomogeneous [Papoulis, 1991]. A very useful property of a homogeneous Poisson process is that the waiting time between two consecutive events is exponentially distributed with rate λ, that is,

$$\forall t \in \mathbb{R}^+ : \quad p(t|\lambda) := \lambda \exp(-\lambda t)\,.$$

Note that the mean of the waiting time distribution is given by λ⁻¹.

The waiting time distribution for an inhomogeneous process is a generalised exponential distribution. In the special case of a piecewise constant rate function λ(t), the waiting time has a piecewise exponential distribution.

Proposition 1 (Piecewise Exponential Distribution). Suppose we are given l rates $\boldsymbol{\lambda} \in (\mathbb{R}^+)^l$ and l sets $\mathcal{T}_k$ of non-overlapping intervals, where the waiting time in each of the intervals in $\mathcal{T}_k$ is governed by $\lambda_k$. The waiting time of the piecewise exponential distribution has the following density:

$$p(t|\boldsymbol{\lambda}, \mathcal{T}) := \prod_{k=1}^{l} \lambda_k^{a_k(t,\mathcal{T}_k)} \exp\big(-\lambda_k b_k(t, \mathcal{T}_k)\big)\,,$$

$$a_k(t, \mathcal{T}_k) := \sum_{[t_0,t_1) \in \mathcal{T}_k} I_{t\in[t_0,t_1)}\,, \qquad b_k(t, \mathcal{T}_k) := \sum_{[t_0,t_1) \in \mathcal{T}_k} (t_1 - t_0)\, I_{t>t_1} + (t - t_0)\, I_{t\in[t_0,t_1)}\,.$$

In the Poisson network model we aim at modelling the independence relations between the V processes. Similarly to Bayesian networks, the independence relations are implicitly represented by the connectivity of a directed graph. Hence, the network structure can be fully represented by all parent relationships, M := {π(i) ⊆ {1, ..., V}}. Interestingly, the semantics of a parent relationship is slightly different from Bayesian networks: since the rate function of each node only depends on the past history of its parents, cycles w.r.t. the parent set M are permissible (see Figure 1).

We model the dependency of a node i on its parents π(i) = {p₁, ..., p_{mᵢ}} by constraining the rate function λᵢ(t) to be a function of the event counts of all parents in time windows of length φ, $\mathbf{n}_i(t) := [n_{i,p_1}(t), \ldots, n_{i,p_{m_i}}(t)]$, where $n_{i,p_j}$ represents the number of events of node $p_j$ in the time window [t − φ, t).¹

We consider a generalised linear model for the rate function λᵢ(t) in terms of the counts $\mathbf{n}_i(t)$. Possible link functions include the probit or sigmoid function, which exhibit a natural saturation characteristic. However, in the following we will consider the canonical link function for the Poisson model, resulting in

$$\lambda_i(t; \mathbf{w}_i, \mathbf{x}_i) = \exp\Big(w_{i,0} + \sum_{j \in \pi(i)} w_{i,j}\, x_{i,j}(t)\Big)\,, \qquad (1)$$

where $w_{i,0}$ represents a bias term and translates into a multiplicative base rate $\exp(w_{i,0})$. We define

$$x_{i,j}(t) := \ln\big(1 + \hat{\lambda}_{i,j}(t)\big)\,, \qquad \hat{\lambda}_{i,j}(t) := \frac{n_{i,j}(t)}{\phi}\,.$$

Note that $\hat{\lambda}_{i,j}(t)$ is the empirical rate of node j w.r.t. node i. Alternatively, the rate function can be written in a way that is more amenable for inference (see Sec. 5):

$$\lambda_i(t) = \exp(w_{i,0}) \prod_{j \in \pi(i)} \big(1 + \hat{\lambda}_{i,j}(t)\big)^{w_{i,j}}\,.$$

¹ Note that we will treat φ as a fixed quantity throughout the paper.
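As an illustration of (1), the following sketch (our own; the function and variable names are hypothetical, not from the paper) evaluates the rate of a node from the event times of its parents by counting events in the window [t − φ, t) and applying the canonical log link:

```python
import numpy as np

def rate(t, w0, w, parent_events, phi):
    """Evaluate the generalised linear rate of eq. (1) at time t.

    w0: bias w_{i,0}; w[j]: weight w_{i,j} of parent j;
    parent_events[j]: sorted array of event times of parent j;
    phi: length of the counting window [t - phi, t).
    """
    log_rate = w0
    for j, events in parent_events.items():
        # number of events of parent j falling in [t - phi, t)
        n = np.searchsorted(events, t) - np.searchsorted(events, t - phi)
        log_rate += w[j] * np.log1p(n / phi)  # w_{i,j} * ln(1 + empirical rate)
    return np.exp(log_rate)

# Example: a single excitatory parent that fired at times 0.2, 0.8 and 0.9
lam = rate(1.0, w0=0.0, w={0: 1.5}, parent_events={0: np.array([0.2, 0.8, 0.9])}, phi=1.0)
print(lam)  # exp(1.5 * ln(1 + 3)) = 4**1.5 = 8.0
```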

A positive value for $w_{i,j}$ indicates an excitatory effect and a negative value corresponds to an inhibitory effect on the rate. The rate of each node at any given time instant is a function of the empirical rates of its parents, resulting in an inhomogeneous process. Note that in practice there are only finitely many different count vectors due to the finite length time windows. The piecewise constant characteristic of the rate function and the absence of cycles in the time-unwrapped network (see Figure 1) enable us to do exact sampling (see Sec. 3).

The probability density for the time series $\mathbf{t} = [t_1, \ldots, t_N]$ of a given node is obtained as the product of the probability densities for each of the mutually exclusive subintervals $(t_0, t_1], \ldots, (t_{N-1}, t_N]$,

$$p(\mathbf{t}|\mathbf{w}, M) = \prod_{l=1}^{N} p(t_l | t_{l-1}, \mathbf{w}, M)\,, \qquad (2)$$

where the initial time $t_0$ is assumed to be 0. Let us derive the probability density of the time series at a given node. Consider the (l − 1)th and lth events occurring at times $t_{l-1}$ and $t_l$ in a given node, and let the instants at which the count vector changes in this interval be represented as $\breve{t}_{l,1}, \ldots, \breve{t}_{l,k_l}$.² The probability density of an event at time $t_l$ given that the previous event happened at $t_{l-1}$ is denoted by $p(t_l|t_{l-1}, \mathbf{w}, M)$. This density is a product of the probabilities of non-occurrence of an event in each of the disjoint subintervals $(t_{l-1}, \breve{t}_{l,1}], (\breve{t}_{l,1}, \breve{t}_{l,2}], \ldots, (\breve{t}_{l,k_l-1}, \breve{t}_{l,k_l}]$ and the probability density of the occurrence of an event at $t_l$ in the subinterval $(\breve{t}_{l,k_l}, t_l]$:

$$p(t_l | t_{l-1}, \mathbf{w}, M) = \lambda_{l,k_l+1} \prod_{j=1}^{k_l+1} \exp\left(-\lambda_{l,j}\, \tau_{l,j}\right)\,,$$

where $\lambda_{l,j}$ is the rate as in (1), a function of the event counts of the node's parents in the jth subinterval corresponding to the lth event, and the durations $\tau_{l,j}$ are defined by

$$\tau_{l,1} := \breve{t}_{l,1} - t_{l-1}\,, \qquad \tau_{l,j} := \breve{t}_{l,j} - \breve{t}_{l,j-1}\,,\; j \in \{2, \ldots, k_l\}\,, \qquad \tau_{l,k_l+1} := t_l - \breve{t}_{l,k_l}\,.$$

² We drop the node subscript and suppress the dependence on the time series of all parent nodes for better readability.

Figure 2: Poisson networks for illustrating sampling.

3 Sampling from a Poisson Network

The piecewise constant behaviour of the rate function λ(t) and the absence of cycles in the time-unwrapped network allow us to perform exact sampling from a Poisson network. Let us consider the simple case of sampling from a single node network as shown in Fig. 2 (a). Sampling is straightforward in this case, because the node represents a homogeneous Poisson process. The standard way to sample from a homogeneous Poisson process is to make use of the property that the waiting times between two adjacent events are iid and distributed exponentially with rate parameter λ.

Let us consider the two node network shown in Fig. 2 (b). Sampling for node A can be done in a straightforward way as explained above because at any time t, $A_t$ is independent of $B_t$. The rate function for node B can be calculated using (1) once the whole time series for node A is known. The rate function for B is piecewise constant because of the finite number of events for node A in the time window [t − φ, t] for varying t. Sampling from a waiting time distribution with a piecewise constant rate function is done by rejection sampling: denote the current value of the rate function by λ(t) and assume the rate remains unchanged until time t̂. Sample τ from the waiting time distribution with parameter λ(t). Accept τ if t + τ ≤ t̂ and reject τ otherwise. The sampling time is then updated to t̂ or t + τ depending on whether the sample is rejected or accepted, respectively.
Figure 3: (a) An arbitrary Poisson network. (b) SCCs indicated by ellipses. (c) Directed acyclic graph in which C′ represents the SCC formed by nodes C and D. (d) Topological ordering, where the number written adjacent to a node indicates its order (1′ and 1 can be sampled independently of each other in parallel).

Now, let us consider a two node network as shown in Fig. 2 (c) where there exists a cyclic relationship between nodes A and B. It is worth mentioning again that node B depends on the event counts of node A in the past only, and vice versa. Sampling from the nodes cannot be done in an independent way because of the cyclic relationship. Let the values $\lambda_A(t)$ and $\lambda_B(t)$ of the rate functions for nodes A and B, respectively, be calculated by (1) and assume that the rates remain constant until $\hat{t}_A$, $\hat{t}_B$ (excluding the mutual interaction in the future). Sample waiting times $\tau_A$ and $\tau_B$ for both nodes using the rates $\lambda_A(t)$ and $\lambda_B(t)$, respectively. The sample corresponding to $\max(\tau_A, \tau_B)$ is rejected because an earlier event at the other node might have changed the value of the rate function; the other sample is accepted if it is within the constant time interval of the firing rate. The sampling time is updated to $\min(\hat{t}_A, \hat{t}_B)$ or $t + \min(\tau_A, \tau_B)$ depending on whether the other sample was rejected or accepted, respectively.

This sampling technique can be generalised to an arbitrary number of nodes. We note that the structure of the network can be exploited in order to perform sampling in a very efficient way. The two key observations are that:

1. Certain groups of nodes have to be sampled from in a synchronised, dependent way because of the mutual dependence of nodes in the group.

2. Sampling has to be done in a certain order.

The first observation is illustrated in Fig. 3 (b); such groups of nodes are known as strongly connected components (SCCs) in graph theory [Cormen et al., 2001]. A strongly connected component is a directed subgraph with the property that for all ordered pairs of vertices (u, v) there exists a path from u to v. A variant of depth first search can be used to find all the SCCs of a graph. A directed acyclic graph is obtained by replacing each SCC by a single node, as shown in Fig. 3 (c). Now we observe that the nodes of the directed acyclic graph can be sampled from in such a way that, when sampling a given node, its parents have been sampled before (see Fig. 3 (d)). Topological sorting is a method to compute such an order efficiently (see Cormen et al. [2001]); a sketch of this decomposition follows below.
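A minimal sketch of the decomposition, assuming the `networkx` library and a hypothetical parent map in the spirit of Fig. 3 (a):

```python
import networkx as nx

# Hypothetical parent map; an edge u -> v means u is a parent of v (u in pi(v))
parents = {'A': [], 'B': ['A'], 'C': ['A', 'D'], 'D': ['C'], 'E': ['B', 'D']}
G = nx.DiGraph((u, v) for v, ps in parents.items() for u in ps)
G.add_nodes_from(parents)

scc_dag = nx.condensation(G)                 # contract each SCC to a single node
order = list(nx.topological_sort(scc_dag))   # valid sampling order over the SCCs
print([scc_dag.nodes[c]['members'] for c in order])
# e.g. [{'A'}, {'B'}, {'C', 'D'}, {'E'}]; C and D must be sampled jointly
```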
4 Parameter Estimation and Structure Learning

Given time series data $\mathbf{T} := [\mathbf{t}_1, \ldots, \mathbf{t}_V]$ we are interested in finding the most plausible structure M* and parameter estimates for w in the linear model of the rate functions (see (1)). We take a Bayesian approach to the problem of parameter estimation which, at the same time, provides a score over structures M by the marginalised likelihood. We choose a Gaussian prior distribution over the weights,

$$p(\mathbf{W}|M) = \prod_{i=1}^{V} \mathcal{N}\left(\mathbf{w}_i; \mathbf{0}, \sigma^2 \mathbf{I}\right)\,.$$

Using Bayes' rule, we obtain the posterior

$$p(\mathbf{W}|\mathbf{T}, M) = \frac{p(\mathbf{T}|\mathbf{W}, M)\, p(\mathbf{W}|M)}{p(\mathbf{T}|M)}\,.$$

The structure learning problem can be written as

$$M^* := \operatorname*{argmax}_{M}\, p(M|\mathbf{T}) = \operatorname*{argmax}_{M}\, p(\mathbf{T}|M)\, p(M)\,,$$

$$p(\mathbf{T}|M) = \prod_{i=1}^{V} p\left(\mathbf{t}_i \,\middle|\, \{\mathbf{t}_j : j \in \pi(i)\}\right)\,, \qquad p(M) := \prod_{i=1}^{V} p(\pi(i))\,,$$

where we make use of a structural prior p(M) that factorises over the parents of the V nodes. Note that $p(\mathbf{t}_i|\{\mathbf{t}_j : j \in \pi(i)\})$ corresponds to (2) marginalised over w. In contrast to structure learning in Bayesian networks, which requires optimisation over the set of directed acyclic graphs, no such constraints are imposed in Poisson networks or continuous time Bayesian networks [Nodelman et al., 2003]. Hence, the structure learning problem can be solved by finding the most likely parents of each node independently (see the sketch below). In theory the exact structure can be learned without regard to the number of parents. However, in practice the running time of the learning procedure is a function of the number of parents of a node, and hence while learning structures we fix the maximum number of parents. Next we present two approximate Bayesian inference techniques for parameter estimation and structure learning.
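The following sketch illustrates this independent per-node search (our own illustration; `log_evidence` is a placeholder for the approximate log marginal likelihood of Sec. 4.1 or 4.2 plus the log structural prior, and self-loops are excluded purely for brevity):

```python
from itertools import combinations

def learn_structure(nodes, log_evidence, pi_max):
    """Select the highest-scoring parent set for every node independently.

    log_evidence(i, parents) stands in for ln p(t_i | {t_j : j in parents})
    under a chosen approximation, plus the log structural prior ln p(pi(i)).
    """
    structure = {}
    for i in nodes:
        others = [j for j in nodes if j != i]
        candidates = [ps for k in range(pi_max + 1) for ps in combinations(others, k)]
        structure[i] = max(candidates, key=lambda ps: log_evidence(i, ps))
    return structure

# Toy score that merely prefers small parent sets containing node 0
score = lambda i, ps: (0 in ps) - 0.1 * len(ps)
print(learn_structure(range(3), score, pi_max=2))  # {0: (), 1: (0,), 2: (0,)}
```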

4.1 Laplace Approximation

In the Laplace approximation [Kass and Raftery, 1995], the posterior is approximated by a multivariate Gaussian density,

$$p(\mathbf{w}_i|\mathbf{T}, M) \approx \mathcal{N}\left(\mathbf{w}_i; \mathbf{w}_{\mathrm{MAP}}, \Sigma\right)\,,$$

where $\mathbf{w}_{\mathrm{MAP}}$ is the mode of the posterior,

$$\mathbf{w}_{\mathrm{MAP}} := \operatorname*{argmax}_{\mathbf{w}_i}\, p(\mathbf{T}, \mathbf{w}_i|M)\,,$$

and the covariance matrix Σ is given by

$$\Sigma := \Big(-\nabla\nabla \ln p(\mathbf{T}, \mathbf{w}_i|M)\big|_{\mathbf{w}_i = \mathbf{w}_{\mathrm{MAP}}}\Big)^{-1}\,.$$

The marginalised likelihood can then be obtained by

$$p(\mathbf{T}|M) = \int p(\mathbf{T}, \mathbf{w}_i|M)\, d\mathbf{w}_i \approx p(\mathbf{T}, \mathbf{w}_{\mathrm{MAP}}|M)\, \sqrt{(2\pi)^{m_i} |\Sigma|}\,.$$

The mode $\mathbf{w}_{\mathrm{MAP}}$ is found by the conjugate gradients algorithm using the gradient of $\ln p(\mathbf{T}, \mathbf{w}_i|M)$ w.r.t. $\mathbf{w}_i$ (see also (2)).

4.2 Variational Approach

The variational approach to the problem of parameter estimation and structure learning is to introduce a family of probability distributions $q_\theta(\mathbf{w}_i)$ parameterised by θ, which gives rise to a lower bound on the marginalised likelihood³:

$$\ln p(\mathbf{T}|M) \geq \mathcal{L}_M(\theta)\,, \qquad \mathcal{L}_M(\theta) := \left\langle \ln\left(\frac{p(\mathbf{w}_i|M)}{q_\theta(\mathbf{w}_i)}\right) \right\rangle_{q_\theta} + \big\langle \ln p(\mathbf{T}|\mathbf{w}_i, M) \big\rangle_{q_\theta}\,,$$

which follows from an application of Jensen's inequality to the concave logarithm function [Jordan et al., 1999]. This lower bound on the log-marginal probability is a function of the parameters θ of the distribution q. The bound is tight if we choose $q_\theta(\mathbf{w}_i)$ to be the true posterior $p(\mathbf{w}_i|\mathbf{T}, M)$. We also observe that the lower bound is the sum of two terms: the first is the negative of the Kullback-Leibler divergence between the distribution $q_\theta(\mathbf{w}_i)$ and the prior, and the second is the log-likelihood of the data averaged over $q_\theta(\mathbf{w}_i)$.

The idea of the variational Bayesian algorithm is to maximise the lower bound $\mathcal{L}_M$ with respect to the parameters of the q distribution. The structure learning problem can then be posed as maximisation of the lower bound $\mathcal{L}_M(\theta)$:

$$M^* := \operatorname*{argmax}_{M} \Big[\max_{\theta} \mathcal{L}_M(\theta)\Big]\,.$$

Similar to the Laplace approximation, we choose $q_\theta(\mathbf{w}_i)$ to be a multivariate Gaussian $\mathcal{N}(\mathbf{w}_i; \mu, \Sigma)$. For a given network structure M, we can use conjugate gradients to find the maximum of $\mathcal{L}_M(\mu, \Sigma)$. Note that the gradients are given by

$$\nabla_{\mu}\mathcal{L}_M(\mu, \Sigma) = -\frac{\mu}{\sigma^2} + \sum_{i=1}^{N} \mathbf{x}_{i,k_i} - \sum_{i=1}^{N} \sum_{j=1}^{k_i} \tau_{i,j}\, \mathbf{x}_{i,j} \exp\left(\frac{\mathbf{x}_{i,j}^\top \Sigma\, \mathbf{x}_{i,j} + 2\mu^\top \mathbf{x}_{i,j}}{2}\right)\,,$$

$$\nabla_{\Sigma}\mathcal{L}_M(\mu, \Sigma) = -\frac{1}{2\sigma^2}\mathbf{I} + \frac{1}{2}\Sigma^{-1} - \frac{1}{2}\sum_{i=1}^{N} \sum_{j=1}^{k_i} \tau_{i,j}\, \mathbf{x}_{i,j} \mathbf{x}_{i,j}^\top \exp\left(\frac{\mathbf{x}_{i,j}^\top \Sigma\, \mathbf{x}_{i,j} + 2\mu^\top \mathbf{x}_{i,j}}{2}\right)\,.$$

³ We use the shorthand notation $\langle f(\cdot)\rangle_q$ to denote the expectation of f w.r.t. the distribution q.
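As a rough illustration of the Laplace computation, the following sketch (our own; it reuses the quasi-Newton inverse-Hessian estimate returned by BFGS in place of the exact second derivative at the mode, whereas the paper optimises with conjugate gradients) turns any log joint ln p(T, w|M) into an approximate log evidence:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_log_evidence(log_joint, w_init):
    """Laplace approximation to ln p(T|M) for a single node.

    log_joint(w) must return ln p(T, w|M); BFGS's hess_inv serves as Sigma.
    """
    res = minimize(lambda w: -log_joint(w), w_init, method='BFGS')
    w_map, sigma = res.x, res.hess_inv   # Sigma = (-grad grad ln p(T,w|M))^{-1}
    _, logdet = np.linalg.slogdet(sigma)
    return log_joint(w_map) + 0.5 * len(w_map) * np.log(2 * np.pi) + 0.5 * logdet

# Sanity check: for ln p(T, w|M) = -0.5 w'w the true log normaliser is (m/2) ln(2 pi)
print(laplace_log_evidence(lambda w: -0.5 * np.dot(w, w), np.ones(3)))  # ~2.76
```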
5 Inference in a Poisson Network

Once the network structure and parameters are learned, several queries regarding the distribution represented by the network can be answered. A common inference problem encountered in a Bayesian network is one where data is available for some nodes (denoted by $M_D$), there are a few query nodes (denoted by $M_Q$) whose behaviour has to be estimated from the data, and the remaining nodes are hidden nodes (denoted by $M_H$) for which no data is available. We pose an analogous problem for Poisson networks. The rate of any arbitrary node can be computed if the time series data for all its parents are known for the period of interest. However, certain configurations of the problem involving hidden nodes are not easy to solve because of the problem of entanglement, as mentioned in Nodelman et al. [2002]. The standard procedure in a Bayesian network is to marginalise over all the hidden nodes and obtain an estimate for the query nodes using the data in the observed nodes. Marginalising over a hidden node amounts to integrating over all possible time series for that node which, in general, is impossible. Hence, approximations are necessary.

Figure 4: The random Poisson network graph that generated the samples in Fig. 5. The red arrow indicates an inhibitory influence and the blue arrow indicates an excitatory influence.

5.1 Inference by Sampling

A straightforward way to solve the inference problem is to perform marginalisation of the hidden nodes by using a few instantiations of them, which can be obtained by sampling from the network. The samples can then be used to obtain averaged rate estimates at each of the query nodes. The sampling procedure is the same as in Sec. 3, with the minor modification that the sampling is performed conditioned on the data that has been observed in the data nodes $M_D$. We observe that if the data for a node i is completely known, then the parent nodes of i do not have any influence on i in the subsequent sampling process. Hence, we can safely remove the parental relationships of all fully observed nodes and obtain a new network $M' = M - \{j \to i : i \in M_D,\, j \in \pi(i)\}$. Sampling is done from the new network M′ for all the unobserved nodes, and average firing rates can then be obtained for all the query nodes.
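This graph surgery on observed nodes can be sketched in a few lines (our own illustration, using a hypothetical parent-map representation):

```python
def prune_observed(parents, observed):
    """Build M' of Sec. 5.1: drop all parent links into fully observed nodes.

    parents: dict node -> list of parents; observed: the set M_D of nodes
    whose whole time series is known, so their parents cannot influence them.
    """
    return {i: ([] if i in observed else ps) for i, ps in parents.items()}

# With node 2 fully observed, the edge 1 -> 2 is removed before sampling
print(prune_observed({1: [], 2: [1], 3: [2]}, observed={2}))  # {1: [], 2: [], 3: [2]}
```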

5.2 Estimation of Steady State Rate

An alternative way to solve the inference problem is to approximate the empirical rate λ̂ with a steady state rate. According to our model, the rate of a node can be written as

$$\lambda_i(t) = \exp\Big(w_{i,0} + \sum_{j=1}^{V} w_{i,j} \ln\big(1 + \hat{\lambda}_{i,j}(t)\big)\Big)\,. \qquad (3)$$

We notice that this is (1) if we consider $w_{i,j} = 0$ for all pairs of nodes which are not dependent. The empirical rate $\hat{\lambda}_{i,j}(t)$ cannot be obtained unless the whole time series is observed. Hence, we approximate the empirical rate by the true rate,

$$\hat{\lambda}_{i,j}(t) = \frac{n_{i,j}(t)}{\phi} \approx \frac{1}{\phi}\int_{t-\phi}^{t} \lambda_j(\tilde{t})\, d\tilde{t} \approx \lambda_j(t)\,,$$

where we assume that the length of the time window, φ, is very small. Substituting back into (3) we obtain

$$\ln \lambda_i(t) \approx w_{i,0} + \sum_{j=1}^{V} w_{i,j} \ln\big(1 + \lambda_j(t)\big)\,. \qquad (4)$$

We make the assumption that the rate is constant in the time interval [t − φ, t] and hence the Poisson process of each of the parents j, j ∈ {1, ..., V}, is assumed to be a homogeneous Poisson process with rate $\lambda_j(t)$. Now, (4) can be constructed for all nodes that are not observed, and the resulting set of equations has the form $\boldsymbol{\lambda}(t) = F(\boldsymbol{\lambda}(t))$, $F : \mathbb{R}^V \to \mathbb{R}^V$, whose solutions are the fixed points of the system. We perform fixed point iterations starting from randomly initialised values for $\lambda_i(t)$, i ∈ {1, ..., V}. In experiments we observe that at convergence, the estimated rate corresponds to the mean rate in the time interval [t − φ, t] calculated from actual data.
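The fixed point iteration for (4) can be sketched as follows (our own illustration; the weight-matrix representation, the clamping of observed nodes and the fixed iteration count are our assumptions, and convergence is not guaranteed for arbitrary weights):

```python
import numpy as np

def steady_state_rates(w0, W, lam_obs, n_iter=200):
    """Fixed point iteration for eq. (4): ln lam_i = w_i0 + sum_j w_ij ln(1 + lam_j).

    w0: length-V bias vector; W: V x V weight matrix with W[i, j] = 0
    whenever j is not a parent of i; lam_obs: rates clamped to observed values.
    """
    lam = np.random.default_rng(0).uniform(0.1, 1.0, size=len(w0))  # random start
    for _ in range(n_iter):
        lam = np.exp(w0 + W @ np.log1p(lam))   # one application of F
        for i, v in lam_obs.items():           # keep observed nodes at their data rate
            lam[i] = v
    return lam

# Two nodes: node 0 observed firing at rate 2.0, node 1 weakly excited by node 0
print(steady_state_rates(np.array([0.0, -1.0]), np.array([[0.0, 0.0], [0.5, 0.0]]), {0: 2.0}))
```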

6 Experimental Results

In this section, we test the presented methods on data sampled using the algorithms from Sec. 3. We show experiments on parameter learning and structure estimation, and on approximate inference of rates using a sampling based approximation and a fixed point approximation.

Figure 5: Samples generated from a random network with V = 5 where the maximum number of parents is 2.

Figure 6: Results for parameter estimation and structure learning using the variational approximation for a network with 15 nodes in which the maximum number of parents is 2. Each node i with a parent j has a true expected weight $w_{i,j}$ (x-axis) and a posterior estimate (y-axis) shown as a 5%-95% posterior quantile interval with a mark indicating the mean. The empty circles indicate the edges that were not identified by the structure learning algorithm.

Sampling. Firstly, we generate a random graph (see Fig. 4) with V nodes. As mentioned before, because of computational issues we fix the maximum number of parents $\pi_{\max}$ for every node. Then, our efficient sampling technique from Sec. 3 is used to generate samples from the network. Samples generated from a randomly generated network with V = 5 and $\pi_{\max} = 2$ are shown in Fig. 5. The time window duration φ was fixed to 1 second.

Parameter estimation and structure learning. The Laplace and variational approximations developed in Sec. 4.1 and Sec. 4.2 are tested using samples generated from random graph structures. We observed that the posterior distribution $p(\mathbf{w}_i|\mathbf{T}, M)$ can be approximated very well by a multivariate Gaussian. Thus, both approximation methods perform very accurately. Fig. 6 shows the parameter estimation and structure learning results obtained using the variational approximation for a 15 node network with the maximum number of parents restricted to 2. The results indicate that the few edges that were missed by the structure learning algorithm (circles) correspond to weak dependencies, i.e. edges with weights close to zero. The results of the Laplace approximation are similar to those of the variational approximation, except that the variational approximation had higher confidence in its estimates.

Approximate inference of rates. The approximate inference techniques developed in Sec. 5 were tested on a random graph having 10 nodes. Samples were generated from the network using our sampling technique. Nodes were chosen at random and marked as observed, hidden or query nodes. The inference task was to estimate the rates of all the nodes marked as query nodes. The samples generated for the observed nodes were used to perform inference of rates using the sampling based approximation and the fixed point approximation. The results shown in Fig. 7 show that the fixed point approximation technique (which is significantly faster than the sampling based approximation) closely tracks the true rate.

Figure 7: Inference of rate functions: the fixed point approximation and the sampling approximation plotted against the actual firing rate.

7 Conclusions and Discussion

Poisson networks are models of structured multivariate point processes and have the potential to be applied in many fields. They are designed such that sampling and approximate learning and inference are tractable using the approaches described above. In particular, structure learning is carried out in a principled yet efficient Bayesian framework.

Future applications of the Poisson network model include biological problems such as analysing multiple neural spike train data [Brown et al., 2004] as well as applications in computer science such as prediction of file access patterns, network failure analysis and queuing networks.

A promising direction for future research is to combine Poisson networks with continuous time Bayesian networks in order to be able to model both event counts and state (transitions). For example, consider file access events and the running states of CPU processes. Clearly, they exhibit an interesting relationship, the discovery of which would make it possible to predict and hence optimise process-file interactions.
Similarly, in biological applications external stimuli can be modelled as states influencing physiological events such as neural spike activity.

Finally, it would be of great interest to study the information processing potential of Poisson networks in the sense of artificial neural networks (see Barber [2002]).

Acknowledgments

Shyamsundar would like to thank Microsoft Research for providing funding for this project during an internship in Cambridge and Thomas Huang for generous support. We are very indebted to Ken Harris for interesting discussions about biological applications of Poisson networks, and we would like to thank Tom Minka for interesting discussions about approximate inference.

References

A. M. Aertsen, G. L. Gerstein, M. K. Habib, and G. Palm. Dynamics of neuronal firing correlation: modulation of "effective connectivity". Journal of Neurophysiology, 61(5):900-917, 1989.

D. Barber. Learning in spiking neural assemblies. In Advances in Neural Information Processing Systems, pages 165-172, 2002.

R. Barbieri, M. C. Quirk, L. M. Frank, M. A. Wilson, and E. N. Brown. Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. Journal of Neuroscience Methods, 105(1):25-37, 2001.

D. R. Brillinger and A. E. P. Villa. Examples of the investigation of neural information processing by point process analysis. Advanced Methods of Physiological System Modelling, 3:111-127, 1994.

E. N. Brown, R. E. Kass, and P. P. Mitra. Multiple neural spike train data analysis: state-of-the-art and future challenges. Nature Neuroscience, 7:456-461, April 2004.

T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms, Second Edition. MIT Press, 2001.

T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5:142-150, 1989.

K. D. Harris, J. Csicsvari, H. Hirase, G. Dragoi, and G. Buzsaki. Organization of cell assemblies in the hippocampus. Nature, 424:552-556, July 2003.

M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. Machine Learning, 37(2):183-233, 1999.

R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773-795, 1995.

K. P. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, UC Berkeley, Computer Science Division, 2001.

U. Nodelman, C. Shelton, and D. Koller. Continuous time Bayesian networks. In Proceedings of the 18th Annual Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 378-387, San Francisco, CA, 2002. Morgan Kaufmann Publishers.

U. Nodelman, C. Shelton, and D. Koller. Learning continuous time Bayesian networks. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03), pages 451-458, San Francisco, CA, 2003. Morgan Kaufmann Publishers.

M. W. Oram, M. C. Wiener, R. Lestienne, and B. J. Richmond. Stochastic nature of precisely timed spike patterns in visual system neuronal responses. Journal of Neurophysiology, 81(6):3021-3033, 1999.

A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, 3rd edition, 1991.