arXiv:1810.08672v2 [cs.LG] 22 Nov 2018

Determinantal thinning of point processes with network learning applications

B. Błaszczyszyn and H.P. Keeler
Inria/ENS, France

Abstract—A new type of dependent thinning for point processes in continuous space is proposed, which leverages the advantages of determinantal point processes defined on finite spaces and, as such, is particularly amenable to statistical, numerical, and simulation techniques. It gives a new point process that can serve as a network model exhibiting repulsion. The properties of the new point process, such as moment measures, the Laplace functional, the void probabilities, as well as conditional (Palm) characteristics, can be estimated accurately by simulating the underlying (non-thinned) point process, which can be taken, for example, to be Poisson. This is in contrast (and preference) to finite Gibbs point processes, which, instead of thinning, require weighting the Poisson realizations, usually involving intractable normalizing constants. Models based on determinantal point processes are also well suited for statistical (supervised) learning techniques, allowing the models to be fitted to observed network patterns with some particular geometric properties. We illustrate this approach by imitating with determinantal thinning the well-known Matérn II hard-core thinning, as well as a soft-core thinning depending on nearest-neighbour triangles. These two examples demonstrate how the proposed approach can lead to new, statistically optimized, probabilistic transmission scheduling schemes.

Index Terms—dependent thinning, determinantal subset, Palm distributions, statistical learning, geometric networks.

I. INTRODUCTION

Originally called fermion point processes by Macchi [22], determinantal point processes have attracted considerable attention in recent years due to their interesting mathematical properties [11]. These point processes admit analytic expressions for several fundamental characteristics, such as the void and Palm distributions, moments and the Laplace functional [25]. They provide useful statistical models for point patterns exhibiting repulsion [3], [19] and, compared to the well-studied Gibbs point processes [7], have advantages such as faster simulation methods and more tractable likelihoods and moment expressions [18], [19]. This has motivated researchers to use these point processes, when defined on the plane R², as spatial models for base stations in cellular networks [10], [20], [21], [24], [27].

Determinantal point processes are defined usually via factorial moment measures admitting densities in the form of determinants of matrices populated with the values of some kernel. But the main obstacle preventing more use of determinantal point processes in R² (or R^d) is the difficulty of finding appropriate kernel functions, which need to define (integral) operators with eigenvalues in the interval [0, 1]. This problem can be largely circumvented when one considers determinantal point processes defined on spaces with finite cardinality, such as bounded lattices, reducing the mathematical technicalities down to problems of linear algebra. Further, this approach allows the use of non-normalized kernels, which we refer to as L-matrices, to more easily define determinantal processes. In this setting, Kulesza and Taskar [16] used the processes to develop a comprehensive framework for statistical (supervised) learning; see also [15], [17].

Researchers have used point processes on the plane to build spatial random models of various wireless network types, but the overwhelming majority of these models relies upon the Poisson point process [4]. To develop a more realistic model, while still keeping it tractable, we propose a thinning operation based on discrete determinantal point processes.

We leverage this line of research and define point processes in continuous space using a doubly stochastic approach. First, an underlying point process in a bounded subset of R^d is considered, for which a natural choice is a Poisson point process. Then, given a realization of this point process, its points are considered as a finite, discrete state space on which a determinantal point process (a subset of the realization) is sampled using some kernel that usually depends on the realization. This usually dependent thinning operation leads to a new point process, existing on the regions of the underlying point process and exhibiting more repulsion than the underlying process. Conditioned on a given realization of the underlying point process, the new point process inherits the closed-form expressions available for discrete determinantal point processes, thus allowing one to accurately estimate characteristics of the new (thinned) point process by simulating at most the underlying (non-thinned) point process. The statistical learning approach proposed by Kulesza and Taskar [16] can then be used to fit the kernel of the determinantal thinning to various types of observed network models.

The paper is structured as follows. In Section II we recall the basics of determinantal point processes in the discrete setting; in Section III we introduce determinantally-thinned point processes and some of their characteristics, including Palm distributions; in Section IV we present the fitting method based on maximum likelihoods; in Section V we demonstrate the fitting method with two illustrative examples, including numerical results; and in Section VI we discuss network applications. The code for all numerical results is available online [12].

II. DETERMINANTAL POINT PROCESSES

We start by detailing determinantal point processes in a discrete setting.

A. Finite state space

We consider an underlying state space S on which we will define a point process (the term carrier space is also used). We assume the important simplification that the cardinality of the state space S is finite, that is #(S) < ∞. We consider a simple point process Ψ on the state space S, which means that Ψ is a random subset of the state space S, that is Ψ ⊆ S. A single realization ψ of this point process Ψ can be interpreted simply as occupied or unoccupied locations in the underlying state space S.
B. Definition

For a state space S with finite cardinality m := #(S), a discrete point process Ψ is a determinantal point process if for all configurations (or subsets) ψ ⊆ S,

    P(Ψ ⊇ ψ) = det(K_ψ),   (1)

where K is some real symmetric m × m matrix, and K_ψ := [K]_{x,y∈ψ} denotes the restriction of K to the entries indexed by the elements or points in ψ, that is x, y ∈ ψ. The matrix K is called the marginal kernel; it has to be positive semi-definite, with all eigenvalues bounded between zero and one, that is, eigenvalues in [0, 1].

To simulate or sample a determinantal point process on a finite state space, one typically uses an algorithm based on the eigenvalues and eigenvectors of the matrix K. The number of points is given by Bernoulli trials (or biased coin flips) with the probabilities of success being equal to the eigenvalues, while the joint location of the points is determined by the eigenvectors corresponding to the successful trials. Each point is randomly placed one after the other; for further details, see [16, Algorithm 1], [19, Algorithm 1] and [28, Algorithm 1].
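To make this algorithm concrete, the following is a minimal sketch in Python/NumPy (our illustration — the paper's accompanying code [12] is in MATLAB, and the function name sample_dpp is ours) of the spectral sampler for a given marginal kernel K:

    import numpy as np

    def sample_dpp(K, rng=None):
        """Sample a subset (returned as sorted indices of S) from a
        determinantal point process with marginal kernel K, a symmetric
        matrix with eigenvalues in [0, 1]."""
        if rng is None:
            rng = np.random.default_rng()
        # Spectral decomposition of the marginal kernel.
        eigvals, eigvecs = np.linalg.eigh(K)
        # Bernoulli trials: keep eigenvector n with probability lambda_n.
        keep = rng.random(len(eigvals)) < eigvals
        V = eigvecs[:, keep]          # columns span the sampled subspace
        sample = []
        while V.shape[1] > 0:
            # Pick a point i with probability (1/|V|) * sum_j V[i, j]^2.
            probs = np.sum(V**2, axis=1)
            probs /= probs.sum()
            i = rng.choice(len(probs), p=probs)
            sample.append(i)
            # Restrict V to the subspace orthogonal to the coordinate
            # vector e_i: eliminate coordinate i using one column with a
            # nonzero i-th entry, drop that column, re-orthonormalize.
            j = int(np.argmax(np.abs(V[i, :])))
            Vj = V[:, j].copy()
            V = np.delete(V, j, axis=1)
            V = V - np.outer(Vj, V[i, :] / Vj[i])
            if V.shape[1] > 0:
                V, _ = np.linalg.qr(V)
        return sorted(sample)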
C. L-ensembles

In the finite state space setting, kernels K can be easily defined by using the formalism of L-ensembles. Instead of finding a K matrix with appropriate eigenvalues, we can work with a family of point processes known as L-ensembles, which are defined through a positive semi-definite matrix L, also indexed by the elements of the space S, but whose eigenvalues, though non-negative, do not need to be less than one. Provided det(I + L) ≠ 0, where I is an m × m identity matrix, we define the kernel

    K = L(I + L)⁻¹.   (2)

This mapping (2) preserves the eigenvectors and maps the corresponding eigenvalues by the function x/(1 + x). Consequently, the kernel K given by (2) is positive semi-definite with eigenvalues between zero and one. The corresponding determinantal point process Ψ satisfies

    P(Ψ = ψ) = det(L_ψ) / det(I + L).   (3)

The relation (2) can be inverted, yielding the L-ensemble representation of the determinantal point process

    L = K(I − K)⁻¹,   (4)

provided all eigenvalues of K are strictly less than one, which is equivalent to P(Ψ = ∅) > 0. For more details, see, for example, the paper by Borodin and Rains [5, Proposition 1.1] or the book [16] by Kulesza and Taskar.

III. DETERMINANTALLY-THINNED POINT PROCESSES

We now define a new point process, which builds upon a homogeneous Poisson point process Φ with intensity λ > 0 on a bounded region R ⊂ R^d of the d-dimensional Euclidean space. Given a realization Φ = φ, we consider it as the state space S = φ on which a determinantal point process (subset) Ψ ⊂ φ is sampled, resulting in a (typically dependent) thinning of the realization Φ = φ. More precisely, the points in a realization φ = {x_i}_i of a Poisson point process Φ form the state space of the finite determinantal point process Ψ, which is defined via

    P(Ψ ⊇ ψ | Φ = φ) = det(K_ψ(φ)),   (5)

where K_ψ(φ) = [K(φ)]_{x_i,x_j∈ψ} is such that ψ ⊂ φ ⊂ R. Note that Ψ is characterized by the intensity λ of the underlying Poisson point process Φ and the function K(·), which maps each (Poisson) realization φ to a semi-definite matrix K(φ) (with eigenvalues in [0, 1]) having elements indexed by the points of φ.

The point process Ψ is defined on a subset of R^d, but uses the discrete approach of determinantal point processes. In other words, the points of the realization φ are dependently thinned such that there is repulsion among the points of Ψ. We call this point process a determinantally-thinned Poisson point process or, for brevity, a determinantal Poisson process.

The doubly stochastic construction of determinantally-thinned point processes can be compared with the classic Matérn hard-core processes (of types I, II and III), which are also constructed through dependent thinning of underlying Poisson point processes. For these point processes, there is zero probability that any two points are within a certain fixed distance of each other. Determinantal thinning of Poisson point processes can provide examples of soft-core processes, where there is a smaller (compared to the Poisson case) probability that any two points are within a certain distance of each other. We return to this theme later in our results section, where we fit our new point process to a Matérn II hard-core process.
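As an illustration of the construction (5), the sketch below (ours, reusing sample_dpp from the previous sketch; the helper names kernel_K, poisson_disk and thin_poisson_dpp are ours) generates one realization of a determinantally-thinned Poisson process. The particular mapping φ ↦ K(φ) — a Gaussian-similarity L-ensemble with unit qualities, converted through (2) — is only one convenient assumption; Section IV introduces more flexible choices:

    import numpy as np

    def kernel_K(pts, sigma):
        """Map a realization phi (rows of pts) to a marginal kernel K(phi):
        Gaussian-similarity L-ensemble [L]_{xy} = exp(-|x-y|^2/sigma^2),
        converted to K via (2)."""
        d2 = np.sum((pts[:, None, :] - pts[None, :, :])**2, axis=-1)
        L = np.exp(-d2 / sigma**2)
        return L @ np.linalg.inv(np.eye(len(pts)) + L)

    def poisson_disk(lam, r_w, rng):
        """Homogeneous Poisson point process of intensity lam on a disk
        of radius r_w centred at the origin."""
        n = rng.poisson(lam * np.pi * r_w**2)
        radii = r_w * np.sqrt(rng.random(n))      # uniform points on the disk
        angles = 2 * np.pi * rng.random(n)
        return np.column_stack((radii * np.cos(angles), radii * np.sin(angles)))

    def thin_poisson_dpp(lam=10.0, r_w=1.0, sigma=0.3, rng=None):
        """One realization (phi, psi) of the determinantally-thinned
        Poisson process defined by (5)."""
        if rng is None:
            rng = np.random.default_rng()
        points = poisson_disk(lam, r_w, rng)
        retained = sample_dpp(kernel_K(points, sigma), rng)
        return points, points[retained]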
A. Functionals of Ψ

The doubly stochastic construction of Ψ gives

    E[h(Ψ)] = E[E[h(Ψ) | Φ]],   (6)

where h is a general real function on the space of realizations of point processes on R (measurable with respect to the usual σ-algebra of counting measures), and the conditional expectation on the right-hand side can be calculated using (5). Several special cases of h admit explicit expressions for this conditional expectation, allowing one to express E[h(Ψ)] in terms of some other functional of the Poisson point process Φ. The evaluation of such expressions requires simulating at most the underlying Poisson point process Φ, but not Ψ. This not only simplifies the task but also reduces the variance.

1) Average retention probability: We consider the average probability that a point is retained (or not removed) after the thinning. The average retention probability of a Poisson point located at x ∈ R is¹

    π(x) := E[P(x ∈ Ψ | Φ ∪ x)] = E[K_x(Φ ∪ x)],   (7)

where K(Φ ∪ x) denotes the kernel matrix with the entries corresponding to the state space S = Φ ∪ x, I is an identity matrix with the cardinality of (Φ ∪ x), that is Φ(R) + 1, and K_x(Φ ∪ x) = K_{{x}}(Φ ∪ x) denotes the restriction of K(Φ ∪ x) to the single element on the diagonal corresponding to x.

¹To simplify the expressions we slightly abuse the notation, writing φ ∪ x instead of φ ∪ {x}.

2) Moment measures and correlation functions: Multiplying the intensity measure Λ(dx) of the underlying Poisson process by the average retention probability, we obtain the intensity measure of Ψ, namely M(dx) := π(x)Λ(dx). In the case of an underlying homogeneous Poisson process of intensity λ (within the considered finite window), we can express the first moment measure as

    M(B) := λ ∫_B π(x) dx = λ|B| E[det[K_U(Φ ∪ U)]],   (8)

where B ⊂ R, |B| is the area of B, and U is a single point uniformly located in B.

The higher moment measures are given similarly by

    M^{(n)}(d(x_1,...,x_n)) = π(x_1,...,x_n) Λ(dx_1) ... Λ(dx_n),   (9)

where

    π(x_1,...,x_n) = E[det[K_{(x_1,...,x_n)}(Φ ∪ (x_1,...,x_n))]]   (10)

for x_1 ≠ ... ≠ x_n and 0 otherwise, and K_{(x_1,...,x_n)}(Φ ∪ (x_1,...,x_n)) is the restriction of the matrix K(Φ ∪ (x_1,...,x_n)) to the elements x_1,...,x_n.

For the case of an underlying homogeneous Poisson process with intensity λ, the second factorial moment measure can be written as

    M^{(2)}(B_1, B_2) = λ²|B_1||B_2| E[det[K_{(U,V)}(Φ ∪ U ∪ V)]],   (11)

where U and V are points uniformly located in B_1 and B_2 respectively, and a similar expression holds for the n-th moment measure. These expressions allow us to more efficiently estimate the measures with stochastic simulation.

Remark III.1. When the matrix K_{(x_1,...,x_n)}(Φ ∪ (x_1,...,x_n)) in (10) does not depend on Φ and [K_{(x_1,...,x_n)}]_{x_i,x_j} = K(x_i, x_j) for some appropriate function K, then Ψ has the moment measures (9) in the form of a (usual, continuous) determinantal point process with kernel K. However, it does not seem evident how one finds K such that the resulting matrix K has eigenvalues in [0, 1] for all configurations of points (x_1,...,x_n), n ≥ 1. Note that this is not equivalent to guaranteeing that the eigenvalues of the integral operator related to K are in [0, 1], the latter condition being usually required in the continuous determinantal point process framework.
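As an illustration of (7) and (8), the following sketch (ours; it reuses poisson_disk and kernel_K from the previous sketch) estimates the average retention probability π(x) by Monte Carlo — note that only the underlying Poisson process is ever simulated:

    import numpy as np

    def retention_prob(x, lam=10.0, r_w=1.0, sigma=0.3, n_sims=2000, rng=None):
        """Monte Carlo estimate of pi(x) in (7): average the diagonal entry
        [K(phi ∪ {x})]_{x,x} over Poisson realizations phi."""
        if rng is None:
            rng = np.random.default_rng()
        vals = []
        for _ in range(n_sims):
            pts = np.vstack((poisson_disk(lam, r_w, rng), x))   # phi ∪ {x}
            vals.append(kernel_K(pts, sigma)[-1, -1])
        return float(np.mean(vals))

    # The first moment measure (8) then follows by integration:
    # M(B) ≈ lam * |B| * average of retention_prob(U, ...) over U uniform in B.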
3) Void probabilities and the complement process: We recall that the number of points of a determinantal point process is equal (in distribution) to the number of successful Bernoulli trials with the eigenvalues of K as the parameters. Using this result, one can show that the probability of no point of φ being retained for Ψ is given by P(Ψ = ∅ | Φ = φ) = ∏_i (1 − λ_i(φ)) = det(I − K(φ)), where λ_i(φ) are the eigenvalues of K(φ). Consequently, we can express the void probabilities of the determinantal thinning Ψ of Φ by the following expression

    ν_Ψ(B) := P(Ψ(B) = 0) = E[det((I − K(Φ))_{Φ∩B})].   (12)

Observing again, in the discrete setting of S, that P(S \ Ψ ⊃ ψ) = P(Ψ ∩ ψ = ∅) = det((I − K)_ψ), one sees easily that the point process Ψᶜ := Φ \ Ψ, formed from the Poisson points removed by the determinantal thinning with kernel K(·), is also a determinantally-thinned Poisson process, with the kernel I − K(·). (But the retained and removed points are not in general independent of each other, as in the case of an independent thinning of Poisson processes.)

4) Laplace functional: For any non-negative function f, the Laplace functional of the determinantally-thinned Poisson process Ψ is given by

    L_Ψ(f) := E[e^{−Σ_{x∈Ψ} f(x)}] = E[det[I − K′(Φ)]],   (13)

where the matrix K′ = K′(φ) has the elements

    [K′]_{x_i,x_j} = [1 − e^{−f(x_i)}]^{1/2} [K]_{x_i,x_j} [1 − e^{−f(x_j)}]^{1/2},   (14)

for all x_i, x_j ∈ φ. Shirai and Takahashi [26] proved this in the general discrete case. But in Appendix A we present a simpler, probabilistic proof of the last equality, which leverages the finite state space assumption of the determinantal process, circumventing the functional-analytic techniques used by Shirai and Takahashi.

5) Palm distributions: The Palm distribution of a point process can be interpreted as the probability distribution of the point process Ψ conditioned on a point of Ψ existing at some location u of the underlying state space S. If we condition on n points of the point process existing at n locations T = {x_1,...,x_n}, then we need the n-fold Palm distribution. The reduced Palm distribution is the Palm distribution when we ignore (remove) the points in T. The reduced Palm version Ψ^T of Ψ, given points at T, is a determinantal thinning of some Gibbs point process having density with respect to the original Poisson process Φ. More precisely, for any real function h on the space of realizations of point processes on R,

    E[h(Ψ^T)] = (1/π(T)) E[E[h(Ψ̄^T) | Φ] det(K_T(Φ ∪ T))],   (15)

where π(T) = π(x_1,...,x_n) is given by (10) and Ψ̄^T is a determinantal thinning of Φ with Palm kernel K^T(φ), given by the Schur complement of the block (or submatrix) K_T(φ ∪ T) of the matrix K(φ ∪ T); see Appendix B. In the case of one-point conditioning, T = {u}, the corresponding Schur complement representation of the Palm kernel K^u = K^u(φ) has elements

    [K^u]_{x,y} = [K]_{x,y} − [K]_{x,u}[K]_{y,u} / [K]_{u,u},   (16)

where K = K(φ ∪ u), so that K and K^u are (n + 1) × (n + 1) and n × n matrices, respectively, with n = #(φ).

The general expression on the right-hand side of (15) can be understood in the following way: Ψ̄^T is the reduced Palm version of Ψ on each realization Φ ∪ T (that is, conditioned to contain T, and with T removed from the conditional realization). The biasing by det(K_T(Φ ∪ T)) transforms conditioning on a given realization of the Poisson process to the average one. We shall prove (15) in Appendix B, where we also recall two further, equivalent characterizations of the reduced Palm distribution Ψ̄^T in the discrete setting.

6) Nearest neighbour distance: Using (16) and (12), one can express the distribution function G_u(r) of the distance from the point u ∈ Ψ^u to its nearest neighbour:

    G_u(r) = 1 − P(min_{x∈Ψ^u} |u − x| > r)
           = 1 − ν_{Ψ^u}(B_u(r))
           = E[(1 − det[(I − K^u(Φ))_{Φ∩B_u(r)}]) [K]_{u,u}(Φ ∪ u)] / π(u),   (17)

where π(u) is the average retention probability (7).
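Combining the one-point Palm kernel (16) with the determinant inside (12) gives a direct Monte Carlo scheme for the semi-analytic nearest-neighbour distribution (17). Below is a sketch (ours; it again reuses poisson_disk and kernel_K from the earlier sketches) for a point of Ψ^u at the origin u = o:

    import numpy as np

    def palm_kernel(K):
        """One-point Palm kernel (16): Schur complement of the last
        diagonal entry, corresponding to the conditioning point u."""
        return K[:-1, :-1] - np.outer(K[:-1, -1], K[:-1, -1]) / K[-1, -1]

    def nn_distribution(r, lam=10.0, r_w=1.0, sigma=0.3, n_sims=2000, rng=None):
        """Semi-analytic estimate of G_o(r) in (17), averaging determinants
        over realizations of the underlying Poisson process only."""
        if rng is None:
            rng = np.random.default_rng()
        num = den = 0.0
        for _ in range(n_sims):
            pts = poisson_disk(lam, r_w, rng)
            K = kernel_K(np.vstack((pts, [0.0, 0.0])), sigma)   # phi ∪ {o}
            Ku = palm_kernel(K)                  # kernel of the Palm version
            inside = np.flatnonzero(np.sum(pts**2, axis=1) < r**2)
            void = np.linalg.det((np.eye(len(pts)) - Ku)[np.ix_(inside, inside)])
            num += (1.0 - void) * K[-1, -1]      # (1 - det[...]) * [K]_{u,u}
            den += K[-1, -1]                     # Monte Carlo for pi(o), cf. (7)
        return num / den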
IV. STATISTICAL FITTING

Discrete determinantal point processes are suitable for fitting techniques such as maximum-log-likelihood methods [16], [19]. Relevant to the present work, Kulesza and Taskar [16] developed a statistical (supervised) learning method allowing one to approximate an empirically-observed thinning mechanism by a determinantal thinning model. In other words, the training data consists of sets coupled with the subsets that need to be fitted. This approach was originally motivated by the automated analysis of documents (the sets) and the generation of their abstracts (the subsets). Inspired by this work, our proposal is to fit determinantally-thinned point processes to real network layouts, with a particular focus on models of (transmission) scheduling schemes: the locations of potential transmitters are the underlying point patterns (the sets) and the locations actually scheduled for transmissions are the retained points (the subsets).

A. Specifying quality and repulsion of points

For an interpretation of the L-matrix, we briefly recall the approach proposed by Kulesza and Taskar [16]. Consider a matrix L whose elements can be written as

    [L]_{x,y} = q_x [S]_{x,y} q_y,   (18)

for x, y ∈ S, where q_x ∈ R and S is a symmetric, positive semi-definite m × m matrix. These two terms are known as the quality and the similarity matrix. The quality q_x measures the goodness of the point x ∈ S, while [S]_{x,y} gives a measure of similarity between points x and y. The larger the q_x value, the more likely there will be a point of the determinantal point process at location x, while the larger the [S]_{x,y} value for two locations x and y, the less likely realizations will occur with two points simultaneously at both locations.

If q_x ≤ 1 for all x ∈ S, then an additional probabilistic interpretation exists: the determinantal point process Ψ characterized by an L-ensemble L (an L-ensemble process for short) in the form of (18) is an independent thinning of the S-ensemble, with retention probabilities equal to q_x², for all x ∈ S.

One way of constructing a positive semi-definite matrix L is to use some known kernel functions, such as those used for covariance functions of Gaussian processes.

Example IV.1 (Squared exponential (or Gaussian) similarity kernel). For S ⊂ R^d, the similarity kernel is [S]_{x,y} = C e^{−|x−y|²/σ²}, where | · | is the Euclidean distance and σ > 0, C > 0 are suitably chosen parameters.

Another possibility is to specify S as some Gram matrix S = B⊤B, where the, often normalized, columns of the matrix B are some vectors v_x representing the points x ∈ S in the state space S. Quantities such as these vectors are called covariates (in statistics) or features (in computer science), among other terms. The dimension of these diversity (covariate or feature) vectors can be arbitrary. Note that in this case the similarity between locations x and y is given by the scalar product of the respective diversity vectors, [S]_{x,y} = v_x⊤ v_y; thus points with more collinear diversity vectors repel each other more.

Similarly, the scalar-valued qualities q_x can be modeled by using some quality (covariate or feature) vectors f_x of some arbitrary dimension. The following construction will be used in our numerical examples.

Example IV.2 (Exponential quality model). The qualities q_x depend on the quality vectors f_x in the following way:

    q_x = q_x(θ) := e^{θ⊤ f_x},   (19)

where θ is a parameter vector with the same dimension as f_x.

B. Learning determinantal thinning parameters

Consider a situation when some number of pairs of patterns of points (φ_t, ψ_t), t = 1,...,T, is observed, where φ_t is a realization of a (say Poisson) process and ψ_t ⊂ φ_t is some subset (due to thinning) of this realization. Our goal is to fit a determinantal thinning model Ψ to this observed data. More precisely, we will find the Ψ which maximizes the likelihood of observing the thinned realizations ψ_t, given the (complete) realizations φ_t, assuming independence of the pairs of realizations across t = 1,...,T. This is equivalent to the maximization of the following log-likelihood:

    L{(φ_t, ψ_t)} = log ∏_{t=1}^T P(Ψ = ψ_t | Φ = φ_t)
                  = Σ_{t=1}^T log( det(L_{ψ_t}(φ_t)) / det(I + L(φ_t)) ),   (20)

where L(φ) is the L-ensemble characterizing the determinantal thinning of Φ. Fitting the determinantal thinning to (φ_t, ψ_t), t = 1,...,T, thus consists in finding the model parameters that maximize (20). The exponential quality model IV.2 with an arbitrary similarity matrix S allows for standard optimization methods, because the expression in (20) is a concave function of θ, as shown by Kulesza and Taskar [16, Proposition 4.1].
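The following sketch (ours, in Python with NumPy and SciPy standing in for MATLAB's fminunc; the data layout and the function names are our assumptions) evaluates the log-likelihood (20) under the exponential quality model of Example IV.2 with the Gaussian similarity kernel of Example IV.1, and maximizes it with a quasi-Newton method:

    import numpy as np
    from scipy.optimize import minimize

    def build_L(pts, feats, theta, sigma):
        """L-ensemble (18) with exponential qualities (19) and the squared
        exponential similarity kernel of Example IV.1 (with C = 1)."""
        q = np.exp(feats @ theta)                # q_x(theta), one per point
        d2 = np.sum((pts[:, None, :] - pts[None, :, :])**2, axis=-1)
        S = np.exp(-d2 / sigma**2)
        return q[:, None] * S * q[None, :]       # [L]_{xy} = q_x [S]_{xy} q_y

    def neg_log_lik(theta, data, sigma):
        """Negative log-likelihood (20); each training pair supplies phi_t as
        point coordinates, its quality feature rows, and psi_t as an index
        array into the rows of phi_t."""
        ll = 0.0
        for pts, feats, kept in data:
            L = build_L(pts, feats, theta, sigma)
            ll += np.linalg.slogdet(L[np.ix_(kept, kept)])[1]   # log det L_psi
            ll -= np.linalg.slogdet(np.eye(len(pts)) + L)[1]    # log det(I + L)
        return -ll

    # data = [(phi_t, features_t, kept_indices_t), ...]; since (20) is concave
    # in theta for fixed sigma, an unconstrained quasi-Newton search suffices:
    # theta_star = minimize(neg_log_lik, np.zeros(4), args=(data, sigma)).x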
V. TEST CASES

We will illustrate the fitting method of the determinantally-thinned Poisson process Ψ, outlined in Section IV, by fitting the point process Ψ to two types of point processes, both constructed through dependent thinning. These two point processes are suitable and demonstrative models as they have a similar two-step construction: 1) simulate a Poisson point process; 2) given a realization of this point process, retain/thin the points according to some rule. We will see that they also represent, in some sense, two extreme cases: one is well captured just by the diversity matrix, the other just by the quality model.

A. Training sets: two dependently-thinned point processes

1) Matérn II case: The first test case is the well-known Matérn II point process. To construct it, all points of the underlying Poisson process are assigned an independent and identically distributed mark, say a uniform random variable on [0, 1], and then the points that have a neighbour within distance r_M with a smaller mark are removed; for details see, for example, the books [6, Section 5.4] or [4, Section 3.5.2]. The Matérn II model is characterized by two parameters: an inhibition radius r_M > 0 and the density λ > 0 of the underlying Poisson point process. The resulting density is λ_M = (1 − e^{−λπr_M²})/(πr_M²).

2) Triangle case: For the second test case, we remove a given Poisson point if the total distance to its first and second nearest neighbours plus the distance between these two neighbours exceeds some parameter r_T > 0. We refer to the resulting random object simply as a triangle (thinned) point process. No explicit expressions will be used for this process.

B. Quality and diversity models

In our model of Ψ, we assume the quality feature or covariate vector f_x(φ) ∈ R⁴ of a point x within a given configuration φ to be a four-dimensional vector composed of a constant, the distances d_{x,1}, d_{x,2} of x to its two nearest neighbours, as well as the distance d_{x,3} between these two neighbours. Consequently, the scalar product θ⊤f_x in (19) is equal to

    θ⊤ f_x(φ) = θ_1 + θ_2 d_{x,1} + θ_3 d_{x,2} + θ_4 d_{x,3}.   (21)

For our similarity matrix S, we use the squared exponential (or Gaussian) kernel given in Example IV.1 with the constant C = 1, which means it is also possible to fit the σ parameter, thus adjusting the repulsion between points.
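For completeness, here is a sketch (ours) of the two training thinning rules described above; matern_II_thin implements the uniform-mark construction, and triangle_thin assumes each realization has at least three points:

    import numpy as np

    def matern_II_thin(pts, r_m, rng):
        """Matérn II: keep a point iff no other point within distance r_m
        carries a smaller uniform mark; returns indices of kept points."""
        marks = rng.random(len(pts))
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        keep = []
        for i in range(len(pts)):
            nbrs = (d[i] < r_m) & (np.arange(len(pts)) != i)
            if not np.any(marks[nbrs] < marks[i]):
                keep.append(i)
        return np.array(keep, dtype=int)

    def triangle_thin(pts, r_t):
        """Triangle thinning: remove a point if the distances to its two
        nearest neighbours plus the distance between those neighbours
        exceed r_t; returns indices of kept points."""
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        keep = []
        for i in range(len(pts)):
            j, k = np.argsort(d[i])[:2]          # two nearest neighbours
            if d[i, j] + d[i, k] + d[j, k] <= r_t:
                keep.append(i)
        return np.array(keep, dtype=int)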
C. Simulation and code details

To remove edge effects, the two test point processes are built on Poisson point processes simulated in windows that are extended versions of the observation windows used in the fitting stage. For example, if the observation window of the thinned point processes is a disk with radius r_W, then we simulate the underlying Poisson point process on the disk with radius r′_W = r_W + r_M in the Matérn case and r′_W = r_W + 2r_T in the triangle case, and then thin the points accordingly. But the fitting (or learning) data will only contain points in the original disk of radius r_W, which means that the fitted determinantal thinning will be dependent on the boundary of the observation window.

We implemented everything in MATLAB [12] and ran it on a standard machine, taking mostly seconds to complete each of the three components: generation of the test cases; fitting of a determinantally-thinned model; and empirical validation of the fitted model. The fitting method (outlined in Section IV) used a standard optimization function (fminunc) in MATLAB that uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is a popular type of gradient method, particularly in machine learning.

D. Numerical results

We simulated a Poisson point process with intensity λ = 10, where the circular observation window had a radius r_W = 1. For the Matérn II case, the parameters were r_M = 0.2530 (yielding λ_M = 4.3076). For the triangular process, they were r_T = 0.6325 (λ̂_T = 4.8961, an empirical estimate).

1) Quality versus diversity fitting: We found that we could fit the point process Ψ to the Matérn II process by just optimizing σ and the leading quality parameters θ_1, θ_2; the addition of the terms θ_3 and θ_4 gave negligible gain. This suggests that the essence of the Matérn II point process is captured through its repulsion, and not by our choice of the quality model q_x(θ).

Conversely, for the triangle process, we could set σ = 0, thus reducing S to an identity matrix, and still fit our point process Ψ very well by fitting the parameters θ_1 to θ_4. This observation suggests that the nature of the triangle model is captured well by its nearest-neighbour distances, which of course agrees completely with its construction. In fact, we were able to fit our point process Ψ to the triangle model so well that, at times, the randomness (or variance) of the fitted Ψ decreased significantly, due to the fact that the quality features q_x(θ) dictated overwhelmingly where points of Ψ should and should not exist. In short, the Matérn II and triangle processes represent well, in some sense, the two model extremes: models captured by just diversity or just quality of points.

Based on 100 samples (or training sets) of our two test cases, we were able to find the θ value that maximized the log-likelihood (20), denoted by θ*. We arrived at the fitting parameters: for Matérn II (with fitted σ = 1.5679), θ* = (0.3067, 0.6315); for the triangle process (with σ = 0), θ* = (−4.0779, 2.7934, 1.2445, 2.1173). We give examples of realizations of the point process Ψ fitted to these two models in Figures 1 and 3.

2) Testing the quality of fit:

a) Nearest neighbour distance: To gauge the quality of the fitted models, we empirically estimated the average nearest-neighbour distribution G(r); see Figures 2 and 4. Unfortunately, this quantity is highly susceptible to edge effects: points near the edge of the observation window generally have fewer neighbours, which means the empirical estimates of our test cases and (fitted) determinantal models are biased. But our semi-analytic formula (17) does not suffer from this bias, giving an accurate description of the nearest-neighbour distribution G^u for a point u at, say, the origin o. To obtain the average nearest-neighbour distribution, one would just need to average G_u(r) with respect to the mean measure (8) over the entire observation window.

Fig. 1. Realizations of a Matérn II process and a fitted determinantally-thinned Poisson process on a unit disk (both generated on the same Poisson point process).

Fig. 2. Nearest-neighbour distributions G(r) of the determinantally-thinned Poisson process fitted to the Matérn II process: empirical (average) and the semi-analytic (17), calculated for the point located at the origin u = o.

Fig. 3. Realizations of a triangle thinning process and a fitted determinantally-thinned Poisson process on a unit disk (both generated on the same Poisson point process).

Fig. 4. Nearest-neighbour distributions G(r) of the determinantally-thinned Poisson process fitted to the triangle thinning process: empirical and semi-analytic, as on Figure 2.
For the Matérn II model, the support of the nearest-neighbour distribution G(r) has a lower bound of the inhibition radius, which is reflected in Figure 2. The nearest-neighbour distribution of the (Matérn-fitted) Ψ does not have such a bound, demonstrating how it has a soft core. Perhaps a better match would be possible by using determinantal kernels that give stronger repulsion, which has been a recent subject of mathematical study [3].

b) Contact distribution: We also studied the (spherical) contact distribution H_x(r), which is the probability distribution of the distance to the nearest point of the point process from an arbitrary location in the region (which we took as the centre of the circular sample window, namely the origin o); see [2, Section 8.6.2] or [6, Section 3.1.7]. This is simply the void probability of a disk with radius r, so

    H_o(r) = 1 − ν_Ψ(B_o(r)).   (22)

(We note that the ratio of the nearest-neighbour distribution and this distribution, known as the J function, is also used as an exploratory test in spatial statistics [2, Section 8.6.2].) For the distribution H_o(r), edge effects are less of an issue because we study H_o(r) at the centre without conditioning on a point existing there. Indeed, we see that edge effects have virtually disappeared for the contact distribution, giving essentially matching curves in Figures 5 and 6.

Fig. 5. Empirical spherical contact distribution function H(r) of a Matérn II process and the semi-analytic evaluation (22) with (12) of the same function for the determinantally-thinned Poisson process fitted to the same Matérn II process.

Fig. 6. Spherical contact distributions H(r) of a triangle process and the fitted determinantally-thinned Poisson process, as on Figure 5.

In summary, the fitted versions of the determinantally-thinned Poisson process Ψ behave statistically like the Matérn II and triangle processes, particularly in terms of the nearest-neighbour and contact distributions. Furthermore, our determinant-based expressions, which only require the Poisson point process to be generated, avoid the statistical bias from edge effects.

VI. WIRELESS NETWORKS APPLICATIONS

A. Models for network layouts

Many real-world cellular phone network layouts do not resemble realizations of Poisson point processes. When such network layouts exhibit repulsion among the base stations, researchers [23], [27] have proposed using determinantal point processes on the plane R² to better model such repulsion. Though some layouts have been fitted to such point processes [21], a problem is finding appropriate kernel functions. Using models based on determinantal thinning of Poisson point processes circumvents this problem by allowing one to construct the kernels via a very flexible L-ensemble formalism, in a way particularly amenable to statistical fitting. One would need to develop a statistical method for fitting complete point patterns in an observation window, and not just their subsets, which would need to address (statistical biasing) issues such as edge effects; see [2, Section 5.6.2].

B. On-off (sleep) schemes

Instead of modelling network patterns, we now look for appropriate models of subsets of various given network patterns. More specifically, we consider power schemes that power down sections of the networks.
A simple model is when each transmitter is independently switched off (or put into sleep mode) with some fixed probability p, resulting in a random uncoordinated power scheme. If the original network formed a Poisson point process, then the resulting network of active transmitters forms another Poisson point process. Researchers have used this power scheme, sometimes called a blinking process, to study latency [8] and routing success in ad hoc networks [13], [14], while more recently it has been used to study so-called green cellular networks [1]. Although the tractability of such a simple power scheme is appealing, it can result in active transmitters that are clustered relatively close together, giving unrealistic and inefficient configurations. We believe our determinantal thinning permits more realistic models of power schemes. Moreover, the presented expressions for the Palm distribution and the Laplace transform will hopefully allow one to evaluate the performance of such power schemes in terms of semi-explicit expressions for the coverage probabilities based on the signal-to-interference-plus-noise ratio (see [4, Chapter 5]), that is, by randomly simulated (or Monte Carlo) evaluation of some functionals of the underlying Poisson point process Φ, without ever simulating the actual power schemes.

C. Pre-optimized transmission schedulers

The previous example of a sleep scheme is just one way to organize wireless network transmissions. Depending on the quantity of interest, there are different optimization goals and methods, resulting in different transmission schedulers. The appeal of determinantally-thinned Poisson processes is that they can be readily fitted to these different schedulers.
By using the statistical (supervised) learning approach, the benefit is not just the performance evaluation of existing schedulers, but also potentially an algorithmic one. Imagine a situation in which finding an optimal subset of transmitters requires solving a computationally heavy problem, so that online implementations are not feasible. Instead, one can solve the optimization problem offline for a sufficiently rich set of configurations of potential transmitters (and receivers) and then use it as a training set to fit a determinantal thinning approximation to the original optimization problem. Such suitably fitted determinantal thinnings could be implemented instead of the original, complex scheduler.

VII. CONCLUSIONS

Motivated to present tractable models for wireless networks exhibiting repulsion, we used determinantal point processes on finite state spaces to define a determinantal thinning operation on point processes. The resulting determinantally-thinned point processes possess many appealing properties, including the ability to accurately estimate fundamental functionals, such as the Laplace functional, void probabilities, as well as Palm (conditional) probabilities, by simulating the underlying (non-thinned) point process, without simulating the actual (thinned) point process. In contrast to Gibbs models, which require weighting the entire realization, determinantal thinning does not involve intractable normalizing constants, and it is also particularly amenable to statistical fitting of its parameters. We illustrated this by presenting two examples where the determinantal thinning model is fitted to two different thinning mechanisms that create repulsion. We see them as prototypes for determinantal schedulers approximating more sophisticated wireless transmission scheduling schemes with (geometry-based) medium access control, offering new avenues for future research.

In this paper we have considered only determinantal thinning of a Poisson point process in a finite window, but completely arbitrary, simple, finite underlying point processes are also possible, including non-stationary ones on R^d having a finite total number of points. On a more theoretical note, one can consider the problem of extending this setting to stationary thinning of point processes on R^d, which raises the question of constructing discrete thinning kernels in a random, countable environment; see [26] for the theory of determinantal processes on deterministic countable spaces. Note that the known kernels used in the continuous setting, such as that of the Ginibre point process, do not necessarily have the required properties when defining the discrete operators with respect to, say, Poisson realizations. Also, the L-ensemble approach does not apply directly to infinite state spaces. A natural extension consists of considering an L with a finite dependence range: more precisely, setting [L(φ)]_{x,y} = 0 for all φ and x, y ∈ φ such that |x − y| > M for some constant M, provided the Gilbert graph with the edge length threshold M does not percolate with probability one on the underlying process (to be thinned). The existence of stationary determinantal thinning mechanisms with infinite dependence range is left for future fundamental research.

ACKNOWLEDGMENT

REFERENCES

[1] E. Altman, M. K. Hanawal, R. ElAzouzi, and S. Shamai. Tradeoffs in green cellular networks. ACM SIGMETRICS Performance Evaluation Review, 39(3):67–71, 2011.
[2] A. Baddeley, E. Rubak, and R. Turner. Spatial point patterns: methodology and applications with R. Chapman and Hall/CRC, 2015.
[3] C. A. N. Biscio and F. Lavancier. Quantifying repulsiveness of determinantal point processes. Bernoulli, 22(4):2001–2028, 2016.
[4] B. Błaszczyszyn, M. Haenggi, P. Keeler, and S. Mukherjee. Stochastic geometry analysis of cellular networks. Cambridge University Press, 2018.
[5] A. Borodin and E. M. Rains. Eynard–Mehta theorem, Schur process, and their Pfaffian analogs. Journal of Statistical Physics, 121(3-4):291–317, 2005.
[6] S. N. Chiu, D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic geometry and its applications. John Wiley & Sons, 2013.
[7] D. Dereudre. Introduction to the theory of Gibbs point processes. arXiv preprint arXiv:1701.08105, 2017.
[8] O. Dousse, P. Mannersalo, and P. Thiran. Latency of wireless sensor networks with uncoordinated power saving mechanisms. In Proceedings of the 5th ACM international symposium on Mobile ad hoc networking and computing, pages 109–120. ACM, 2004.
[9] J. E. Gentle. Matrix Algebra: Theory, Computations and Applications in Statistics. Springer, 2017.
[10] J.-S. Gomez, A. Vasseur, A. Vergne, P. Martins, L. Decreusefond, and W. Chen. A case study on regularity in cellular network deployment. IEEE Wireless Communications Letters, 4(4):421–424, 2015.
[11] J. B. Hough, M. Krishnapur, Y. Peres, and B. Virág. Determinantal processes and independence. Probability Surveys, 3:206–229, 2006.
[12] H. P. Keeler. Simulating and fitting determinantally-thinned Poisson point processes, 2018.
[13] H. P. Keeler and P. G. Taylor. A model framework for greedy routing in a sensor network with a stochastic power scheme. ACM Transactions on Sensor Networks (TOSN), 7(4):34, 2011.
[14] H. P. Keeler and P. G. Taylor. Random transmission radii in greedy routing models for ad hoc sensor networks. SIAM Journal on Applied Mathematics, 72(2):535–557, 2012.
[15] A. Kulesza and B. Taskar. Structured determinantal point processes. In Advances in Neural Information Processing Systems, pages 1171–1179, 2010.
[16] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2–3):123–286, 2012.
[17] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. arXiv preprint arXiv:1207.6083, 2012.
[18] F. Lavancier, J. Møller, and E. Rubak. Determinantal point process models and statistical inference: Extended version. arXiv preprint arXiv:1205.4818, 2014.
[19] F. Lavancier, J. Møller, and E. Rubak. Determinantal point process models and statistical inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(4):853–877, 2015.
[20] Y. Li, F. Baccelli, H. S. Dhillon, and J. G. Andrews. Fitting determinantal point processes to macro base station deployments. In Global Communications Conference (GLOBECOM), 2014 IEEE, pages 3641–3646. IEEE, 2014.
[21] Y. Li, F. Baccelli, H. S. Dhillon, and J. G. Andrews. Statistical modeling and probabilistic analysis of cellular networks with determinantal point processes. IEEE Transactions on Communications, 63(9):3405–3422, 2015.
[22] O. Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1):83–122, 1975.
[23] N. Miyoshi and T. Shirai. Cellular networks with α-Ginibre configurated base stations. In The Impact of Applications on Mathematics, pages 211–226. Springer, 2014.
[24] I. Nakata and N. Miyoshi. Spatial stochastic models for analysis of heterogeneous cellular networks with repulsively deployed base stations. Performance Evaluation, 78:7–17, 2014.
[25] T. Shirai and Y. Takahashi. Random point fields associated with certain Fredholm determinants I: fermion, Poisson and boson point processes. Journal of Functional Analysis, 205(2):414–463, 2003.
[26] T. Shirai and Y. Takahashi. Random point fields associated with certain Fredholm determinants II: fermion shifts and their ergodic and Gibbs properties. The Annals of Probability, 31(3):1533–1564, 2003.
[27] G. L. Torrisi and E. Leonardi. Large deviations of the interference in the Ginibre network model. Stochastic Systems, 4(1):173–205, 2014.
[28] C. Wachinger and P. Golland. Sampling from determinantal point processes for scalable manifold learning. In International Conference on Information Processing in Medical Imaging, pages 687–698. Springer, 2015.

APPENDIX

A. Laplace functional

Our probabilistic proof of (13) exploits an observation allowing one to interpret the Laplace functional of a general point process Ψ,

    L_Ψ(f) = E[e^{−Σ_{x∈Ψ} f(x)}] = E[∏_{x∈Ψ} e^{−f(x)}],

as the probability that an independent thinning of Ψ, with position-dependent retention probabilities equal to 1 − e^{−f(x)}, has no points in the whole space. Such an independent thinning of Ψ introduces the product of the retention probabilities of x_1,...,x_n as a factor to the moment measures of Ψ; see also [18, Proposition A.2]. When Ψ is interpreted as a determinantal thinning of a Poisson process with kernel K, then by combining this factor with the determinant in (10), we can see this independent thinning of Ψ as another determinantal thinning of the same Poisson process, with a kernel given by (14). Using the void probability expression (12) concludes the proof of (13) with (14).

B. Palm distribution

For a discrete determinantal point process Ψ on S with kernel K and a given subset T ⊂ S, the distribution of the reduced Palm version Ψ̄^T of Ψ given T ⊂ S (that is, conditioned to contain T, with T removed from the conditional realization) can be simply expressed directly using the defining property (1):

    P(Ψ̄^T ⊃ ψ) = P(Ψ ⊃ ψ | Ψ ⊃ T) = P(Ψ ⊃ (ψ ∪ T)) / P(Ψ ⊃ T) = det(K_{ψ∪T}) / det(K_T),

for ψ ∩ T = ∅. Schur's determinant identity allows one to express the ratio of the determinants on the right-hand side of the above expression using the Schur complement of the block K_T in K_{ψ∪T}:

    P(Ψ̄^T ⊃ ψ) = Schur(K_T, K_{T∪ψ});   (23)

see, for example, [9, Section 3.4].

Borodin and Rains [5, Proposition 1.2] derived the following characterization of Ψ̄^T in terms of the L-ensemble characterizing Ψ as in (3): Ψ̄^T admits the L-ensemble L^T given by

    L^T := ([(I_{T′} + L)⁻¹]_{T′})⁻¹ − I,   (24)

provided T′ = S \ T ≠ ∅, where I_{T′} is the square matrix of the dimension of S which has ones on the diagonal corresponding to all the points in T′ and zeroes elsewhere. In this case, using (2), one can derive the following equivalent form of the kernel K^T of Ψ̄^T:

    K^T = I − [(I_{T′} + L)⁻¹]_{T′}.   (25)

In the case of Ψ being the determinantal thinning of a Poisson process Φ, denoting by Ψ^{(n)} and Φ^{(n)} the factorial powers of the respective point processes, and writing T = (x_1,...,x_n), dT = d(x_1,...,x_n), we have for any, say non-negative, function f(T, φ):

    E[∫_{R^n} f(T, Ψ \ T) Ψ^{(n)}(dT)]
      = E[∫_{R^n} f(T, Ψ \ T) 1(T ⊂ Ψ) Φ^{(n)}(dT)]
      = E[∫_{R^n} E[f(T, Ψ \ T) 1(T ⊂ Ψ) | Φ] Φ^{(n)}(dT)]
      = ∫_{R^n} E[E[f(T, Ψ \ T) 1(T ⊂ Ψ) | Φ ∪ T]] Λ^n(dT)
      = ∫_{R^n} E[E[f(T, Ψ̄^T) | Φ] P(T ⊂ Ψ | Φ ∪ T)] (1/π(T)) M^{(n)}(dT),   (26)

where the third equality follows from the n-th order Campbell's formula and Slivnyak's theorem for the Poisson process, and the last one from the definition of Ψ̄^T as the reduced Palm version of Ψ on each realization Φ ∪ T, as well as from (9) for the moment measure M^{(n)}, thus proving (15).

Observe that the conditional distribution of Ψ̄^T in (26) given Φ = φ is given by (23), when considering the discrete setting S = φ ∪ T, φ ∩ T = ∅, with K_T = K_T(φ ∪ T), K_{T∪ψ} = K_{T∪ψ}(φ ∪ T) for ψ ⊂ φ. Considering L = L(φ ∪ T) and K^T = K^T(φ), L^T = L^T(φ), the expressions (24) and (25) apply as well.
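As a numerical sanity check of the identities used in this paper — the mapping (2), its inverse (4), and the Schur-complement characterization of the reduced Palm kernel in (16) and (23) — the following snippet (ours, NumPy) verifies them on a random L-ensemble:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    B = rng.standard_normal((n, n))
    L = B @ B.T                                  # a random positive definite L
    K = L @ np.linalg.inv(np.eye(n) + L)         # marginal kernel, cf. (2)

    # (4): the mapping inverts back, K = L(I+L)^{-1}  <=>  L = K(I-K)^{-1}.
    assert np.allclose(L, K @ np.linalg.inv(np.eye(n) - K))

    # (16)/(23): with T = {last element}, the reduced Palm kernel is the
    # Schur complement of the block K_T, and P(Palm ⊇ psi) equals
    # det(K_{psi ∪ T}) / det(K_T) for any psi disjoint from T.
    KT = K[:-1, :-1] - np.outer(K[:-1, -1], K[:-1, -1]) / K[-1, -1]
    psi = [0, 2]                                 # an arbitrary test subset
    lhs = np.linalg.det(KT[np.ix_(psi, psi)])
    rhs = np.linalg.det(K[np.ix_(psi + [n - 1], psi + [n - 1])]) / K[n - 1, n - 1]
    assert np.allclose(lhs, rhs)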