arXiv:2001.08857v1 [math.PR] 24 Jan 2020 ihdensity with n[5 6,Leypoe h eertdrsl that result celebrated L´evy the proved 46], [45, In e od : words Key h rsn itiuinapassrrsnl ntestudy the Let in surprisingly motion. appears distribution arcsine The M 00MteaisSbetClassification: Subject s Mathematics 2010 method, AMS Stein’s walks, random permutation. permutation, Mallows tion, o admwalk random a For oneprsof counterparts ADMPRUAIN IHAPIAIN OGENOMICS TO APPLICATIONS WITH PERMUTATIONS RANDOM RSN ASFRRNO AK EEAE FROM GENERATED WALKS RANDOM FOR LAWS ARCSINE • • • • • IOFN,HNLAGGN UA OMS AYNHAG RLPE EROL HUANG, HAIYAN HOLMES, SUSAN GAN, LIANG HAN FANG, XIAO ehdfrteaciedsrbto,a ela ucinlc limi functional the as Mallows( well to the as bounds of distribution, embedding error arcsine strong explicit the give statisti for also other method We the though dis even laws. walk, same different random the simple exactly the has for as walk generated permutation uniform pca ae,frtersl httenme fsesaoet above steps of number eviden the that numerical result simpl with the the along for for cases, conjecture, as special unexpected distributions an asymptotic give same the have they Abstract. ecnso nfr admpruainadaMallows( a gener and walk w permutation random random genomics, uniform non-Markovian in a the of applications descents for by statistics Motivated these distribution law. of same visi the arcsine last exactly the the have of to all time height the maximum origin, the the of above steps of number the Γ t aiu n[0 on maximum its G G G G tan t aiu au eoetime before value maximum its attains time max n n max : : = sup = : max = R B n : 0 : inf = rsn itiuin rwinmto,L´evy statistics, motion, Brownian distribution, Arcsine 1 min = , : 1 ( = G { { B 0 , lsia eutfrtesml ymti admwl with walk random symmetric simple the for result classical A { s B G ≤ 0 > S 0 { t max 0 n ≤ ; 0 s } ≤ t ds : ≤ ≤ = k ≥ s n r ie by given are Γ and : 1 ≤ eteocpto ieof time occupation the be DINR ADRIAN P k ≤ , f )b n-iesoa rwinmto triga .Let 0. at starting motion Brownian one-dimensional be 0) 1], ( n ≤ k n x : 1 B =1 = ) : s n S B 0 = X k : s π k S 0 = max = 1. p } LI,ADWNI TANG WENPIN AND OLLIN, k ihiceet ( increments with q ¨ emtto hc so needn interest. independent of is which permutation ) etels xttm of time exit last the be x max = Introduction } (1 1 h ne twihtewl atht eobefore zero hits last walk the which at index the − u ∈ [0 x 1 0 ) , ≤ 1] n k B ≤ , o 0 for 00,6J5 05A05. 60J65, 60C05, G u n
, S G B k etefis iea which at time first the be } max X < x < bv eobfr ie1. time before zero above h ne twihtewl first walk the which at index the k q ; emtto n hwthat show and permutation ) n r l rsn distributed arcsine all are Γ and nrllmtterm n a and theorems limit entral frno ak n Brownian and walks random of oteoii,adtetime the and origin, the to t k eoii yse 2 step by origin he sfrteewlshv very have walks these for cs tdfo h set and ascents the from ated eadaprilpofin proof partial a and ce n ovrewe scaled when converge and admwl.W also We walk. random e rt rsn distribution arcsine crete ≥ td h distributions the study e hoesuigStein’s using theorems t 1 B . rn medn,uniform embedding, trong )satn at starting 1) rmzr eoetm 1, time before zero from 2 n tp sthat is steps iiigdistribu- limiting n o the for S K B 0 OZ, ¨ : ,the 0, = achieves (1.1) 2 ARCSINE LAWS FOR RANDOM WALKS
n Γn := k=1 I[Sk > 0] the number of times that the walk is strictly positive up to • n time n, and Nn := k=1 I[Sk 1 0, Sk 0] the number of edges which lie above zero upP to time n. − ≥ ≥ P The discrete analog of L´evy’s arcsine law was established by Andersen [2], where the limiting distribution (1.1) was computed by Erd¨os and Kac [27], and Chung and Feller [18]. Feller [29] gave the following refined treatment:
(i) If the increments (Xk; k 1) of the walk are exchangeable with continuous dis- tribution, then ≥ (d) max Γn = Gn . (d) (ii) For a simple random walk with P(Xk = 1) = 1/2, N2n = G2n which follows the discrete arcsine law given by ± 1 2k 2n 2k α := − for k 0,...,n . (1.2) 2n,2k 22n k n k ∈ { } − (d) (d) In the Brownian scaling limit, the above identities imply that Γ = Gmax = G. The fact (d) max (d) that G = G also follows from L´evy’s identity ( Bt ; t 0) = (sups t Bs Bt; t 0). See Williams [68], Karatzas and Shreve [41], Rogers| and| Will≥ iams [55, Section≤ − 53], Pitman≥ and Yor [54] for various proofs of L´evy’s arcsine law. The arcsine law has further been generalized in several different ways, e.g. Dynkin [25], Getoor and Sharpe [31], and Bertoin and Doney [8] to L´evy processes; Barlow, Pitman and Yor [4], and Bingham and Doney [13] to multidimensional Brownian motion; Akahori [1] and Tak´acs [61] to Brownian motion with drift; Watanabe [67] and Kasahara and Yano [42] to one-dimensional diffusions. See also Pitman [51] for a survey of arcsine laws arising from random discrete structures. In this paper we are concerned with the limiting distribution of the L´evy statistics Gn, max Gn , Γn and Nn of a random walk generated from a class of random permutations. Our motivation comes from a statistical problem in genomics.
1.1. Motivation from genomics. Understanding the relationship between genes is an important goal of systems biology. Systematically measuring the co-expression relation- ships between genes requires appropriate measures of the statistical association between bivariate data. Since gene expression data routinely require normalization, rank correla- tions such as Spearman rank correlation have been commonly used. Compared to many other measures, although some information may be lost in the process of converting numer- ical values to ranks, rank correlations are usually advantageous in terms of being invariant to monotonic transformation, and also robust and less sensitive to outliers. In genomics studies, however, these correlation-based and other kinds of global measures have a prac- tical limitation — they measure a stationary dependent relationship between genes across all samples. It is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from heterogeneous biological conditions. In response to this consideration, several recent efforts have consid- ered statistics that are based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. For instance, denoting the expression profiles for genes X and Y over n conditions (or n samples) by x = (x1,...,xn) and y = (y1,...,yn) respectively, the following statistic, denoted by W2, was introduced ARCSINE LAWS FOR RANDOM WALKS 3 in [65] to consider and aggregate possible local interactions: W = I[φ(x ,...,x )= φ(y ,...,y )]+I[φ(x ,...,x )= φ( y ,..., y )] , 2 i1 ik i1 ik i1 ik − i1 − ik 1 i1<